Identifying Interesting Association Rules with Genetic Algorithms

Elnaz Delpisheh, York University, Department of Computer Science and Engineering. October 11, 2014.

Presentation Transcript


  1. Identifying Interesting Association Rules with Genetic Algorithms • Elnaz Delpisheh • York University, Department of Computer Science and Engineering • October 11, 2014

  2. Data mining • Too much data → data mining → association rules. • I = {i1, i2, ..., in} is a set of items. • D = {t1, t2, ..., tn} is a transactional database. • Each transaction ti is a nonempty subset of I. • An association rule has the form A → B, where A and B are itemsets, A ⊂ I, B ⊂ I, and A ∩ B = ∅. • The Apriori algorithm is the one most commonly used for association rule mining. • Example: {milk, eggs} → {bread}.
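
The definitions on this slide can be made concrete with a small sketch. The item `butter`, the three transactions, and the helper `is_valid_rule` are illustrative assumptions, not part of the presentation; requiring A and B to be nonempty is also an assumption.

```python
# Illustrative item set; only milk, eggs, and bread appear on the slide.
I = frozenset({"milk", "eggs", "bread", "butter"})

# D: a transactional database; every transaction is a nonempty subset of I.
D = [
    frozenset({"milk", "eggs", "bread"}),
    frozenset({"milk", "bread"}),
    frozenset({"eggs", "butter"}),
]

def is_valid_rule(antecedent, consequent, items):
    """Check the structural conditions for a rule A -> B:
    A and B are nonempty (assumption), proper subsets of the item set,
    and disjoint."""
    a, b = frozenset(antecedent), frozenset(consequent)
    return bool(a) and bool(b) and a < items and b < items and a.isdisjoint(b)

# Every transaction must be a nonempty subset of I.
assert all(t and t <= I for t in D)

print(is_valid_rule({"milk", "eggs"}, {"bread"}, I))   # True
print(is_valid_rule({"milk"}, {"milk", "bread"}, I))   # False: A and B overlap
```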

  3. Apriori Algorithm

  4. Apriori Algorithm (Cont.)

  5. Association rule mining • Too much data → data mining → too many association rules.

  6. Interestingness criteria • Comprehensibility. • Conciseness. • Diversity. • Generality. • Novelty. • Utility. • ...

  7. Interestingness measures • Subjective measures: the data and the user’s prior knowledge are considered (comprehensibility, novelty, surprisingness, utility). • Objective measures: the structure of an association rule is considered (conciseness, diversity, generality, peculiarity). • Example: support. It represents the generality of a rule A → B and counts the number of transactions containing both A and B.
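
As a rough illustration of the support measure described on this slide, the sketch below counts the transactions that contain both A and B. The function names and the four sample transactions are assumptions made for the example; the relative form (count divided by database size) is the usual convention rather than something stated on the slide.

```python
def support_count(antecedent, consequent, transactions):
    """Number of transactions that contain every item of A and of B."""
    both = frozenset(antecedent) | frozenset(consequent)
    return sum(1 for t in transactions if both <= t)

def support(antecedent, consequent, transactions):
    """Relative support: the support count divided by the database size."""
    return support_count(antecedent, consequent, transactions) / len(transactions)

# Hypothetical transactions, for illustration only.
transactions = [
    frozenset({"milk", "eggs", "bread"}),
    frozenset({"milk", "eggs"}),
    frozenset({"bread"}),
    frozenset({"milk", "eggs", "bread"}),
]

print(support_count({"milk", "eggs"}, {"bread"}, transactions))  # 2
print(support({"milk", "eggs"}, {"bread"}, transactions))        # 0.5
```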

  8. Drawbacks of objective measures • Database dependence: lack of knowledge about the database; threshold dependence. • Solution: multiple database reanalysis. • Problem: a large number of disk I/O operations. • Aim: database independence.

  9. Genetic algorithm-based learning (ARMGA) • Initialize the population • Evaluate the individuals in the population • Repeat until a stopping criterion is met • Select individuals from the current population • Recombine them to obtain more individuals • Evaluate the new individuals • Replace some or all of the individuals of the current population with the offspring • Return the best individual seen so far
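
The loop on this slide can be sketched as a generic genetic algorithm. This is not the ARMGA implementation: the tournament selection, the full-population replacement, the fixed generation budget, and the toy bit-counting fitness are all assumptions chosen to keep the example short, and every function name is hypothetical.

```python
import random

def run_ga(fitness, init_individual, crossover, mutate,
           pop_size=20, generations=50, seed=0):
    rng = random.Random(seed)
    # Initialize the population and evaluate it.
    population = [init_individual(rng) for _ in range(pop_size)]
    best = max(population, key=fitness)

    def pick():
        # Select: binary tournament between two random individuals (assumption).
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):  # stopping criterion: fixed generation budget
        # Recombine selected parents, then mutate, to obtain new individuals.
        offspring = [mutate(crossover(pick(), pick(), rng), rng)
                     for _ in range(pop_size)]
        # Replace the whole current population with the offspring.
        population = offspring
        # Evaluate the new individuals and keep the best one seen so far.
        candidate = max(population, key=fitness)
        if fitness(candidate) > fitness(best):
            best = candidate
    return best

# Toy problem (not from the slides): maximise the number of 1 bits.
def random_bits(rng):
    return [rng.randint(0, 1) for _ in range(16)]

def one_point(a, b, rng):
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def flip_bits(bits, rng, rate=0.05):
    return [1 - b if rng.random() < rate else b for b in bits]

best = run_ga(sum, random_bits, one_point, flip_bits)
print(best, sum(best))
```

Swapping in a rule-quality fitness and rule-aware operators would turn this skeleton toward association rule mining; the skeleton itself stays the same.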

  10. ARMGA Modeling • Given an association rule X → Y • Requirement: Conf(X → Y) > Supp(Y) • The aim is to maximise a fitness function (the formula is not preserved in this transcript)
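
The requirement on this slide can be checked with a few lines of code. The sketch assumes the standard definitions, Supp(S) as the fraction of transactions containing S and Conf(X → Y) = Supp(X ∪ Y) / Supp(X); the function names and the sample transactions are hypothetical.

```python
def supp(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    s = frozenset(itemset)
    return sum(1 for t in transactions if s <= t) / len(transactions)

def conf(x, y, transactions):
    """Confidence of X -> Y: Supp(X u Y) / Supp(X), or 0.0 if X never occurs."""
    sx = supp(x, transactions)
    if sx == 0:
        return 0.0
    return supp(frozenset(x) | frozenset(y), transactions) / sx

def meets_requirement(x, y, transactions):
    """The slide's requirement: Conf(X -> Y) > Supp(Y)."""
    return conf(x, y, transactions) > supp(y, transactions)

# Hypothetical transactions, for illustration only.
transactions = [
    frozenset({"milk", "eggs", "bread"}),
    frozenset({"milk", "eggs", "bread"}),
    frozenset({"butter"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread"}),
]

print(meets_requirement({"milk", "eggs"}, {"bread"}, transactions))  # True
print(meets_requirement({"butter"}, {"bread"}, transactions))        # False
```

In the second case bread is already present in 80% of the transactions, so knowing that butter was bought lowers, rather than raises, the chance of seeing bread.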

  11. ARMGA Encoding • Michigan strategy • Given an association k-rule X → Y, where X, Y ⊂ I, I = {i1, i2, ..., in} is a set of items, and X ∩ Y = ∅. • For example: {A1, ..., Aj} → {Aj+1, ..., Ak}

  12. ARMGA Encoding (Cont.) • The aforementioned encoding depends heavily on the length of the chromosome. • We use another type of encoding: • Given a set of items {A, B, C, D, E, F} • The association rule ACF → B is encoded as follows • 00A 11B 00C 01D 10E 00F • 00: item is in the antecedent • 11: item is in the consequent • 01/10: item is absent
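
A minimal sketch of this two-bit-per-item encoding follows. The function names are hypothetical, and the encoder always writes `01` for an absent item, which is an arbitrary choice between the two absent codes; the decoder treats any code other than `00` and `11` as absent.

```python
ANTECEDENT, CONSEQUENT, ABSENT = "00", "11", "01"

def encode(items, antecedent, consequent):
    """Return one 2-bit code per item, in item order."""
    codes = []
    for item in items:
        if item in antecedent:
            codes.append(ANTECEDENT)
        elif item in consequent:
            codes.append(CONSEQUENT)
        else:
            codes.append(ABSENT)
    return codes

def decode(items, codes):
    """Recover (antecedent, consequent); codes 01 and 10 mean absent."""
    antecedent = [i for i, c in zip(items, codes) if c == ANTECEDENT]
    consequent = [i for i, c in zip(items, codes) if c == CONSEQUENT]
    return antecedent, consequent

items = ["A", "B", "C", "D", "E", "F"]
codes = encode(items, {"A", "C", "F"}, {"B"})

print("".join(c + i for c, i in zip(codes, items)))  # 00A11B00C01D01E00F
print(decode(items, codes))                          # (['A', 'C', 'F'], ['B'])
```

Because every item has its own two-bit slot, the chromosome length is fixed at twice the number of items, regardless of how long the rule is.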

  13. ARMGA Operators • Select • Crossover • Mutation

  14. ARMGA Operators-Select • select(c, ps): acts as a filter on the chromosome • c: chromosome • ps: pre-specified probability

  15. ARMGA Operators-Crossover • This operation uses a two-point strategy
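
For readers unfamiliar with the two-point strategy, here is a generic two-point crossover on chromosomes stored as lists of two-bit codes. It is the textbook operator, not necessarily the exact ARMGA variant; the function name and the sample parents are assumptions.

```python
import random

def two_point_crossover(parent_a, parent_b, rng=random):
    """Swap the segment between two random cut points of equal-length parents."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same length")
    n = len(parent_a)
    # Two distinct cut points in 0..n, ordered so that i < j.
    i, j = sorted(rng.sample(range(n + 1), 2))
    child_a = parent_a[:i] + parent_b[i:j] + parent_a[j:]
    child_b = parent_b[:i] + parent_a[i:j] + parent_b[j:]
    return child_a, child_b

# Hypothetical parents: one 2-bit code per item, six items each.
p1 = ["00", "11", "00", "01", "10", "00"]
p2 = ["11", "00", "01", "00", "11", "10"]

c1, c2 = two_point_crossover(p1, p2, random.Random(1))
print(c1)
print(c2)
```

Cutting between whole two-bit codes, rather than between individual bits, keeps each item's code intact in the children.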

  16. ARMGA Operators-Mutate

  17. ARMGA Initialization

  18. ARMGA Algorithm

  19. Empirical studies and Evaluation • Implement the entire procedure using Visual C++ • Use WEKA to produce interesting association rules • Compare the results
