Scalable Algorithms for Association Mining Mohammed J. Zaki IEEE Transactions on Knowledge and Data Engineering, Vol

1. Scalable Algorithms for Association Mining
   Mohammed J. Zaki
   IEEE Transactions on Knowledge and Data Engineering, Vol. 12, No. 3, pp. 372-390, May/June 2000

2. Problem statement
   - Find all frequent itemsets
     - Frequent itemsets: itemsets whose support meets the minimum support threshold
   - Maximal frequent itemsets
     - Main task of association mining
     - All other frequent itemsets are subsets of maximal frequent itemsets
   - Use minimum support and minimum confidence to determine association rules
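The two thresholds can be made concrete with a small sketch. The six-transaction database and the names `TRANSACTIONS`, `support`, and `confidence` below are illustrative assumptions, not taken from the slides.

```python
from fractions import Fraction

# Hypothetical toy database: transaction id -> set of items.
TRANSACTIONS = {
    1: {"A", "C", "T", "W"},
    2: {"C", "D", "W"},
    3: {"A", "C", "T", "W"},
    4: {"A", "C", "D", "W"},
    5: {"A", "C", "D", "T", "W"},
    6: {"C", "D", "T"},
}


def support(itemset, transactions=TRANSACTIONS):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for items in transactions.values() if itemset <= items)
    return Fraction(hits, len(transactions))


def confidence(antecedent, consequent, transactions=TRANSACTIONS):
    """support(antecedent and consequent together) / support(antecedent).

    Raises ZeroDivisionError if the antecedent never occurs.
    """
    both = support(set(antecedent) | set(consequent), transactions)
    return both / support(antecedent, transactions)
```

Under these assumptions, an itemset is frequent when `support(...)` reaches the chosen minimum support, and a rule is kept when `confidence(...)` also reaches the chosen minimum confidence.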

3. Itemset enumeration
   - All subsets of a frequent itemset are frequent
   - The maximal frequent itemsets uniquely determine all frequent itemsets
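Because every subset of a frequent itemset is frequent, the full set of frequent itemsets can be recovered by expanding the maximal ones. The following is a minimal sketch of that expansion; the function name and the sample maximal itemsets are assumptions made for illustration, and the sketch recovers only the itemsets, not their support counts.

```python
from itertools import combinations


def frequent_from_maximal(maximal_itemsets):
    """Return every non-empty subset of the given maximal itemsets."""
    frequent = set()
    for maximal in maximal_itemsets:
        items = sorted(maximal)
        for size in range(1, len(items) + 1):
            for subset in combinations(items, size):
                frequent.add(frozenset(subset))
    return frequent


# Hypothetical maximal frequent itemsets over items A, C, D, T, W.
all_frequent = frequent_from_maximal([{"A", "C", "T", "W"}, {"C", "D", "W"}])
```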

4. Key features of six new algorithms
   - Use a vertical tid-list data format, in which each item is associated with the list of transactions in which it occurs
     - All frequent itemsets can be enumerated via simple tid-list intersections
   - Use a lattice-theoretic approach to decompose the original search space into smaller pieces if main memory is not enough
     - Prefix-based approach
     - Maximal-clique-based approach
   - Search strategies: enumerating the frequent itemsets within each class
     - Bottom-up search
     - Top-down search
     - Hybrid search
   - Only a few database scans, minimizing the I/O costs
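The vertical format and the intersection step can be sketched as follows. The toy database and the helper names `to_vertical` and `tidlist_of` are assumptions for illustration; `tidlist_of` expects a non-empty itemset.

```python
# Hypothetical toy database in horizontal form: transaction id -> items.
TRANSACTIONS = {
    1: {"A", "C", "T", "W"},
    2: {"C", "D", "W"},
    3: {"A", "C", "T", "W"},
    4: {"A", "C", "D", "W"},
    5: {"A", "C", "D", "T", "W"},
    6: {"C", "D", "T"},
}


def to_vertical(transactions):
    """Invert {tid: items} into {item: set of tids} (the tid-lists)."""
    tidlists = {}
    for tid, items in transactions.items():
        for item in items:
            tidlists.setdefault(item, set()).add(tid)
    return tidlists


def tidlist_of(itemset, tidlists):
    """Tid-list of a non-empty itemset: intersection of its items' tid-lists."""
    items = list(itemset)
    result = set(tidlists[items[0]])  # copy so the stored list is not mutated
    for item in items[1:]:
        result &= tidlists[item]
    return result


TIDLISTS = to_vertical(TRANSACTIONS)
cdw_tids = tidlist_of("CDW", TIDLISTS)  # support count is len(cdw_tids)
```

The support count of an itemset is simply the length of its tid-list, so no further pass over the horizontal database is needed once the tid-lists are built.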

5. Austen, Christie, Conan Doyle

9. If we do not have enough memory to enumerate all the frequent itemsets in the lattice, we need to decompose the whole lattice into pieces
   - Prefix-based classes
     - Recursive class decomposition
   - Maximal-clique-based approach
     - Smaller sub-lattices have fewer items and can save unnecessary intersections
     - Graph theory: complete graph (clique)
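Both decompositions can be sketched starting from the frequent 2-itemsets. The pair list below is hypothetical, the function names are assumptions, and the clique enumeration uses the basic Bron-Kerbosch procedure over the whole item graph, which is a simplification for illustration rather than the paper's implementation.

```python
# Hypothetical frequent 2-itemsets over items A, C, D, T, W.
PAIRS = ["AC", "AT", "AW", "CD", "CT", "CW", "DW", "TW"]


def prefix_classes(frequent_pairs):
    """Group frequent pairs by their first item (the shared prefix)."""
    classes = {}
    for pair in frequent_pairs:
        first, second = sorted(pair)
        classes.setdefault(first, set()).add(second)
    return classes


def adjacency(frequent_pairs):
    """Undirected graph with one edge per frequent pair."""
    adj = {}
    for pair in frequent_pairs:
        a, b = sorted(pair)
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def maximal_cliques(adj):
    """Enumerate maximal cliques with basic Bron-Kerbosch (no pivoting)."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(r)  # r cannot be extended any further
            return
        for v in sorted(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(frozenset(), frozenset(adj), frozenset())
    return cliques


CLASSES = prefix_classes(PAIRS)
CLIQUES = maximal_cliques(adjacency(PAIRS))
```

In this example the prefix class of C contains D, T, and W, although D and T never form a frequent pair. The clique view separates them, which is how the smaller sub-lattices avoid intersections that cannot succeed.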

11. Prefix-based classes: bottom-up search
    - Bottom-up: search parents of frequent itemsets
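A bottom-up search over prefix-based classes can be sketched as a short recursion on tid-lists. This is a minimal illustration in the spirit of Eclat under assumed data, not the paper's implementation; the tid-lists, `MIN_COUNT`, and the function name are hypothetical.

```python
# Hypothetical vertical database: item -> set of transaction ids.
TIDLISTS = {
    "A": {1, 3, 4, 5},
    "C": {1, 2, 3, 4, 5, 6},
    "D": {2, 4, 5, 6},
    "T": {1, 3, 5, 6},
    "W": {1, 2, 3, 4, 5},
}
MIN_COUNT = 3  # assumed minimum support, as an absolute count


def bottom_up(class_members, min_count, found):
    """Extend each member of a class with the members that follow it.

    class_members: list of (itemset tuple, tid set) pairs that share a prefix
    and differ only in their last item. Frequent itemsets and their support
    counts are recorded in `found`.
    """
    for i, (itemset_i, tids_i) in enumerate(class_members):
        found[itemset_i] = len(tids_i)
        next_class = []
        for itemset_j, tids_j in class_members[i + 1:]:
            tids = tids_i & tids_j  # tid-list intersection
            if len(tids) >= min_count:
                next_class.append((itemset_i + itemset_j[-1:], tids))
        if next_class:
            bottom_up(next_class, min_count, found)


# The frequent single items form the initial class (empty prefix).
initial = [((item,), tids) for item, tids in sorted(TIDLISTS.items())
           if len(tids) >= MIN_COUNT]
FOUND = {}
bottom_up(initial, MIN_COUNT, FOUND)
```

Each recursive call only touches the tid-lists of one class, which is what allows a class to be processed independently once it fits in memory.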

12. Top-down: search children of infrequent itemsets
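The top-down idea can be sketched as a level-by-level descent: start from the largest itemset of a class, stop at any itemset that is frequent, and expand only the infrequent ones. The tid-lists, threshold, and function names below are assumptions for illustration, and the sketch returns only the maximal frequent itemsets of a non-empty item set.

```python
# Hypothetical vertical database: item -> set of transaction ids.
TIDLISTS = {
    "A": {1, 3, 4, 5},
    "C": {1, 2, 3, 4, 5, 6},
    "D": {2, 4, 5, 6},
    "T": {1, 3, 5, 6},
    "W": {1, 2, 3, 4, 5},
}


def support_count(itemset, tidlists):
    """Size of the intersection of the items' tid-lists (itemset non-empty)."""
    return len(set.intersection(*(tidlists[item] for item in itemset)))


def top_down(items, tidlists, min_count):
    """Return the maximal frequent itemsets among the subsets of `items`."""
    maximal = []
    level = {frozenset(items)}
    while level:
        next_level = set()
        for candidate in level:
            if any(candidate <= known for known in maximal):
                continue  # already covered by a frequent superset
            if support_count(candidate, tidlists) >= min_count:
                maximal.append(candidate)
            elif len(candidate) > 1:
                # Infrequent: examine the subsets with one item removed.
                for item in candidate:
                    next_level.add(candidate - {item})
        level = next_level
    return maximal


MAXIMAL = top_down("ACDTW", TIDLISTS, 3)
```

Once a frequent itemset is reached, none of its subsets needs to be tested, since they are all frequent.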

15. Algorithm design and implementation
    - New algorithms
      - Eclat (Equivalence class transformation): prefix-based with bottom-up search
      - MaxEclat: prefix-based with hybrid search
      - Clique: maximal-clique-based with bottom-up search
      - MaxClique: maximal-clique-based with hybrid search
      - TopDown: maximal-clique-based with top-down search
      - AprClique: maximal-clique-based with horizontal layout
    - Horizontal layout: generate all possible subsets of the maximum elements in each sub-lattice

16. Experimental Results

20. Conclusion
    - Partition the search space into small, independent subspaces
    - Decomposition can solve the main-memory problem
      - Prefix-based method
      - Maximal-clique-based method
    - Search strategies
      - Bottom-up search
      - Top-down search
      - Hybrid search
    - The entire process takes only a few database scans
    - Best performance among the new algorithms: MaxClique, which combines hybrid search with maximal-clique-based decomposition
