
Scalable Algorithms for Association Mining Mohammed J. Zaki IEEE Transactions on Knowledge and Data Engineering, Vol

Presentation Transcript


    1. Scalable Algorithms for Association Mining. Mohammed J. Zaki. IEEE Transactions on Knowledge and Data Engineering, Vol. 12, No. 3, pp. 372-390, May/June 2000.

    2. Problem statement. Find all frequent itemsets. Frequent itemsets: the itemsets whose support meets the minimum support percentage. Maximal frequent itemsets: finding them is the main task of association mining, since all other frequent itemsets are subsets of maximal frequent itemsets. Use minimum support and minimum confidence to determine association rules.
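
The two thresholds in the problem statement can be illustrated with a minimal sketch. The toy database and the function names `support` and `confidence` below are assumptions made for illustration; they do not come from the slides.

```python
# Hypothetical toy database: transaction id -> set of items.
TRANSACTIONS = {
    1: {"A", "C", "T", "W"},
    2: {"C", "D", "W"},
    3: {"A", "C", "T", "W"},
    4: {"A", "C", "D", "W"},
    5: {"A", "C", "D", "T", "W"},
    6: {"C", "D", "T"},
}


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    items = set(itemset)
    hits = sum(1 for t in transactions.values() if items <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Confidence of the rule antecedent -> consequent:
    support(antecedent ∪ consequent) / support(antecedent)."""
    both = support(set(antecedent) | set(consequent), transactions)
    return both / support(antecedent, transactions)


if __name__ == "__main__":
    print(support({"A", "C"}, TRANSACTIONS))       # share of transactions with both A and C
    print(confidence({"A"}, {"C"}, TRANSACTIONS))  # how often C appears when A does
```

An itemset counts as frequent when its support reaches the chosen minimum support, and a rule is kept when its confidence also reaches the chosen minimum confidence.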

    3. Itemset enumeration. All subsets of a frequent itemset are frequent. The maximal frequent itemsets uniquely determine all frequent itemsets.
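
As a small sketch of why the maximal frequent itemsets determine all frequent itemsets: every frequent itemset is a non-empty subset of some maximal one, so the full collection can be regenerated by enumerating subsets. The function name and the two example maximal itemsets are assumptions for illustration.

```python
from itertools import combinations


def subsets_of_maximal(maximal_itemsets):
    """Return every non-empty subset of any maximal itemset, as frozensets."""
    result = set()
    for maximal in maximal_itemsets:
        items = sorted(maximal)
        for size in range(1, len(items) + 1):
            for combo in combinations(items, size):
                result.add(frozenset(combo))
    return result


if __name__ == "__main__":
    # Hypothetical maximal frequent itemsets of a toy database.
    maximal = [frozenset("ACTW"), frozenset("CDW")]
    frequent = subsets_of_maximal(maximal)
    print(len(frequent))
```

This recovers which itemsets are frequent, but not their support counts; those still have to be counted against the database.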

    4. Key features of six new algorithms. Use the vertical tid-list format: a data format where we associate with each item a list of the transactions in which it occurs; all frequent itemsets can be enumerated via simple tid-list intersections. Use a lattice-theoretic approach to decompose the original search space into smaller pieces if main memory is not enough: prefix-based approach; maximal-clique-based approach. Search strategies for enumerating the frequent itemsets within each class: bottom-up search; top-down search; hybrid search. Only a few database scans, minimizing the I/O costs.
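
The vertical tid-list format and the intersection step can be sketched as follows. This is an illustrative sketch on a hypothetical toy database, not the paper's implementation; `to_vertical` and `tidlist_of` are assumed names.

```python
# Hypothetical toy database in horizontal layout: tid -> items.
TRANSACTIONS = {
    1: {"A", "C", "T", "W"},
    2: {"C", "D", "W"},
    3: {"A", "C", "T", "W"},
    4: {"A", "C", "D", "W"},
    5: {"A", "C", "D", "T", "W"},
    6: {"C", "D", "T"},
}


def to_vertical(transactions):
    """Convert tid -> items into item -> set of tids (the tid-list)."""
    tidlists = {}
    for tid, items in transactions.items():
        for item in items:
            tidlists.setdefault(item, set()).add(tid)
    return tidlists


def tidlist_of(itemset, tidlists):
    """Tid-list of a non-empty itemset: intersection of its items' tid-lists."""
    return set.intersection(*(tidlists[item] for item in itemset))


if __name__ == "__main__":
    vertical = to_vertical(TRANSACTIONS)
    print(sorted(vertical["A"]))
    # The support count of {A, D} is the size of the intersected tid-list.
    print(len(tidlist_of("AD", vertical)))
```

Once the tid-lists are in memory, support counting needs no further pass over the transactions, which is what keeps the number of database scans low.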

    5. [Slide notes on Austin, Christie, and Canon Doyle; annotation text not recoverable]

    9. If we do not have enough memory to enumerate all the frequent itemsets in the lattice, we need to decompose the whole lattice into pieces. Prefix-based classes: recursive class decomposition. Maximal-clique-based approach: smaller sub-lattices have fewer items and can save unnecessary intersections. Graph theory: complete graph (clique).
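
Prefix-based decomposition can be sketched by grouping itemsets that share a common prefix; each group is a sub-lattice that can be processed independently. The helper `prefix_classes` and the list of frequent 2-itemsets are assumptions for illustration.

```python
from collections import defaultdict


def prefix_classes(itemsets, prefix_len=1):
    """Group itemsets by their first `prefix_len` items (in sorted order)."""
    classes = defaultdict(list)
    for itemset in itemsets:
        ordered = tuple(sorted(itemset))
        classes[ordered[:prefix_len]].append(ordered)
    return dict(classes)


if __name__ == "__main__":
    # Hypothetical frequent 2-itemsets of a toy database.
    frequent_pairs = ["AC", "AT", "AW", "CD", "CT", "CW", "DW", "TW"]
    for prefix, members in prefix_classes(frequent_pairs).items():
        print(prefix, members)
```

Calling the same function with `prefix_len=2` on the 3-itemsets of one class splits that class again, which is one way to picture recursive class decomposition when a class is still too large for memory.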

    11. Prefix-based classes: bottom-up search. Bottom-up: search parents of frequent itemsets.
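
A bottom-up search over prefix-based classes might look like the sketch below: each frequent itemset is extended with the later members of its class by intersecting tid-lists, and only frequent extensions form the next class. This is a simplified illustration in the spirit of Eclat on a hypothetical toy database, not the paper's code; `min_count` is an absolute support count.

```python
# Hypothetical toy database: tid -> items.
TRANSACTIONS = {
    1: {"A", "C", "T", "W"},
    2: {"C", "D", "W"},
    3: {"A", "C", "T", "W"},
    4: {"A", "C", "D", "W"},
    5: {"A", "C", "D", "T", "W"},
    6: {"C", "D", "T"},
}


def bottom_up(prefix, members, min_count, out):
    """Enumerate frequent itemsets in one class.

    `members` is a list of (item, tidset) pairs; each tidset is the tid-list
    of `prefix` extended by that item and is already known to be frequent.
    """
    for i, (item, tids) in enumerate(members):
        itemset = prefix + (item,)
        out[itemset] = len(tids)
        # Build the class below `itemset` by intersecting with later siblings.
        next_members = []
        for other, other_tids in members[i + 1:]:
            common = tids & other_tids
            if len(common) >= min_count:
                next_members.append((other, common))
        if next_members:
            bottom_up(itemset, next_members, min_count, out)


def mine(transactions, min_count):
    """Return {itemset tuple: support count} for all frequent itemsets."""
    tidlists = {}
    for tid, items in transactions.items():
        for item in items:
            tidlists.setdefault(item, set()).add(tid)
    members = sorted(
        ((item, tids) for item, tids in tidlists.items() if len(tids) >= min_count),
        key=lambda pair: pair[0],
    )
    out = {}
    bottom_up((), members, min_count, out)
    return out


if __name__ == "__main__":
    for itemset, count in sorted(mine(TRANSACTIONS, 3).items()):
        print("".join(itemset), count)
```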

    12. Top-down: search children of infrequent itemsets.
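
One way to picture top-down search is the level-wise sketch below: start from the largest candidate, and whenever a candidate is infrequent, examine its subsets with one item fewer; a frequent candidate not contained in an already found itemset is recorded as maximal. The sketch runs over a single class made of all items in a hypothetical vertical database, which is a simplifying assumption; `top_down_maximal` is an assumed name.

```python
# Hypothetical vertical database: item -> tid-list.
TIDLISTS = {
    "A": {1, 3, 4, 5},
    "C": {1, 2, 3, 4, 5, 6},
    "D": {2, 4, 5, 6},
    "T": {1, 3, 5, 6},
    "W": {1, 2, 3, 4, 5},
}


def top_down_maximal(items, tidlists, min_count):
    """Return the maximal frequent itemsets among subsets of `items`."""
    level = {frozenset(items)}  # candidates of the current size
    maximal = []
    while level:
        next_level = set()
        for cand in level:
            # A subset of a known frequent itemset needs no testing.
            if any(cand <= m for m in maximal):
                continue
            tids = set.intersection(*(tidlists[i] for i in cand))
            if len(tids) >= min_count:
                maximal.append(cand)
            elif len(cand) > 1:
                # Infrequent: examine its children (one item removed).
                next_level.update(cand - {i} for i in cand)
        level = next_level
    return maximal


if __name__ == "__main__":
    for m in top_down_maximal("ACDTW", TIDLISTS, 3):
        print("".join(sorted(m)))
```

Processing candidates by decreasing size matters here: a depth-first descent could report a small frequent itemset before its frequent superset has been tested.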

    15. Algorithm design and implementation. New algorithms: Eclat (Equivalence class transformation): prefix-based with bottom-up search; MaxEclat: prefix-based with hybrid search; Clique: maximal-clique-based with bottom-up search; MaxClique: maximal-clique-based with hybrid search; TopDown: maximal-clique-based with top-down search; AprClique: maximal-clique-based, horizontal layout. Horizontal layout: generate all possible subsets of the maximum elements in each sub-lattice.

    16. Experimental Results

    20. Conclusion. Partition the search space into small, independent subspaces. Decomposition can solve the main-memory problem: prefix-based method; maximal-clique-based method. Search strategies: bottom-up search; top-down search; hybrid search. The entire process takes only a few database scans. Best performance among the new algorithms: MaxClique, which combines hybrid search with maximal-clique-based decomposition.
