Parallel Mining of Maximal Frequent Itemsets from Databases

Parallel Mining of Maximal Frequent Itemsets from Databases Soon M. Chung and Congnan Luo Proceedings of the 15th IEEE International Conference on Tools with Artificial Intelligence (ICTAI’03)

Outline • Introduction • Max-Miner Algorithm • Parallel Max-Miner (PMM) Algorithm • Performance Evaluation • Conclusion

Introduction (1) • In mining association rules, the most time-consuming job is finding all frequent itemsets in a large database with respect to a given minimum support • In Apriori, the subset-infrequency based pruning step prevents many candidate k-itemsets from being counted in each pass k • In Apriori-like algorithms, if there is a frequent itemset of length l, then all 2^l of its subsets are generated and counted
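The 2^l growth mentioned on this slide can be checked with a small sketch. The function name `nonempty_subsets` and the example itemset are illustrative assumptions, not part of the paper; the sketch only enumerates the subsets an Apriori-like algorithm would have to count for one long frequent itemset.

```python
from itertools import combinations

def nonempty_subsets(itemset):
    """Return every non-empty subset of `itemset` as a frozenset."""
    items = sorted(itemset)
    return [
        frozenset(combo)
        for size in range(1, len(items) + 1)
        for combo in combinations(items, size)
    ]

# Hypothetical frequent itemset of length l = 5.
long_itemset = {1, 2, 3, 4, 5}
subsets = nonempty_subsets(long_itemset)

# There are 2**l subsets in total, or 2**l - 1 once the empty set is excluded.
print(len(subsets), 2 ** len(long_itemset) - 1)
```

Each extra item doubles the number of subsets, which is why long frequent itemsets are expensive for bottom-up counting.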

Introduction (2) • Our basic idea is that if we find a large frequent itemset early, we can avoid counting all its subsets because they are all frequent • We propose a parallel algorithm, named Parallel Max-Miner (PMM), for mining maximal frequent itemsets • Like the Count Distribution algorithm, PMM requires multiple passes over the database and needs synchronization between nodes at the end of every pass

Max-Miner algorithm • Unlike Apriori, the Max-Miner algorithm extracts only the maximal frequent itemsets • Superset-frequency based pruning • Max-Miner always attempts to look ahead in order to identify large frequent itemsets early • All subsets of these discovered frequent itemsets can then be pruned from the search space
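To make "maximal frequent itemset" concrete, here is a brute-force sketch of the definition only. It is not Max-Miner: it enumerates every frequent itemset first, which is exactly the work Max-Miner tries to avoid. The toy database, the absolute `minsup` count, and the function names are assumptions made for illustration.

```python
from itertools import combinations

def frequent_itemsets(transactions, minsup):
    """Brute force: test every non-empty itemset against an absolute minsup."""
    items = sorted(set().union(*transactions))
    result = []
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            count = sum(1 for t in transactions if candidate <= t)
            if count >= minsup:
                result.append(candidate)
    return result

def maximal_only(frequent):
    """Keep the itemsets that have no frequent proper superset."""
    return [s for s in frequent if not any(s < other for other in frequent)]

# Hypothetical toy database of five transactions.
toy_db = [frozenset(t) for t in ({1, 2, 3}, {1, 2, 3}, {1, 2}, {3, 4}, {3, 4})]
frequent = frequent_itemsets(toy_db, minsup=2)
maximal = maximal_only(frequent)
print(len(frequent), sorted(sorted(s) for s in maximal))
```

The two maximal itemsets implicitly describe all nine frequent ones, since every subset of a frequent itemset is frequent.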

Set-enumeration tree of Max-Miner (1)

Set-enumeration tree of Max-Miner (2) • Each node in the tree is called a candidate group • A candidate group g consists of two components, both of which are itemsets • The first itemset is called the head of the group and is denoted by h(g) • The second itemset is called the tail of the group and is denoted by t(g) • t(g) is an ordered set and contains all the items not in h(g) that can potentially appear in any subnode derived from node g
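One possible way to represent a candidate group is sketched below. The class name `CandidateGroup` and the helper `subnodes` are hypothetical, and the expansion shown here ignores support counting: it only illustrates how heads and ordered tails are derived in a set-enumeration tree.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class CandidateGroup:
    head: Tuple[int, ...]   # h(g)
    tail: Tuple[int, ...]   # t(g), kept in a fixed item order

    def union(self):
        """Return h(g) united with t(g), the longest itemset under this node."""
        return frozenset(self.head) | frozenset(self.tail)

def subnodes(g):
    """One subnode per tail item; each inherits only the tail items after it."""
    return [
        CandidateGroup(g.head + (item,), g.tail[pos + 1:])
        for pos, item in enumerate(g.tail)
    ]

# Root of a tree over items 1..4, with no pruning applied.
root = CandidateGroup(head=(), tail=(1, 2, 3, 4))
level1 = subnodes(root)
for g in level1:
    print(g.head, g.tail)
```

In this unpruned tree the first level-1 node has head (1,) and tail (2, 3, 4), so its union is the full itemset {1, 2, 3, 4}.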

The main procedure of Max-Miner (1) • From the root of the tree at level 0, count the support of the 1-itemsets • Only the 1-itemsets which are frequent can be enumerated at level 1 • 4 nodes are generated at level 1 if 1, 2, 3, and 4 are all frequent 1-itemsets • For the node g1, we need to count the support of h(g1) ∪ t(g1) = {1,2,3,4} • If the support of h(g1) ∪ t(g1) is greater than or equal to minsup, then we do not need to expand the tree from the node g1 any further

The main procedure of Max-Miner (2) • At any node g, if h(g) ∪ t(g) is not frequent, then for each item i in t(g) we check whether h(g) ∪ {i} is frequent • If h(g) ∪ {i} is frequent, a corresponding subnode is generated • For a candidate group node g, an item that appears last in the ordered tail of g will appear in most of the descendants of g • To discover the maximal frequent itemsets early, it is better to order the subnodes of each node in ascending order of their support, so that the most frequent items are placed last
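The two procedure slides can be sketched as a level-wise loop. This is a simplified illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, support is an absolute count recomputed by scanning the whole list each time, and items are ordered once at level 1 by ascending support rather than reordered at every node.

```python
def support(itemset, transactions):
    """Count the transactions that contain `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def max_miner_sketch(transactions, minsup):
    # Level 0: count 1-itemsets, keep the frequent ones, order by ascending support.
    items = sorted(set().union(*transactions))
    counts = {i: support(frozenset([i]), transactions) for i in items}
    order = sorted((i for i in items if counts[i] >= minsup),
                   key=lambda i: (counts[i], i))
    # Level 1: one candidate group (head, tail) per frequent item.
    groups = [((item,), tuple(order[pos + 1:])) for pos, item in enumerate(order)]
    found = []
    while groups:
        next_groups = []
        for head, tail in groups:
            h = frozenset(head)
            full = h | frozenset(tail)
            # Superset-frequency pruning: skip groups covered by a known itemset.
            if any(full <= known for known in found):
                continue
            # Look-ahead: if h(g) united with t(g) is frequent, do not expand g.
            if support(full, transactions) >= minsup:
                found.append(full)
                continue
            # Otherwise keep only the tail items i for which h(g) + {i} is frequent.
            kept = [i for i in tail if support(h | {i}, transactions) >= minsup]
            if not kept:
                found.append(h)
                continue
            for pos, item in enumerate(kept):
                next_groups.append((head + (item,), tuple(kept[pos + 1:])))
        groups = next_groups
    # Drop itemsets that ended up subsumed by a longer one found elsewhere.
    return [s for s in set(found) if not any(s < other for other in found)]

# Hypothetical toy database; minsup is an absolute count.
toy_db = [frozenset(t) for t in ({1, 2, 3}, {1, 2, 3}, {1, 2}, {3, 4}, {3, 4})]
result = max_miner_sketch(toy_db, minsup=2)
print(sorted(sorted(s) for s in result))
```

On this toy input the look-ahead at the node with head (1,) finds {1, 2, 3} directly, so the groups headed by 2 and by 3 are pruned without any further counting.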

Parallel Max-Miner (PMM) algorithm • The database is evenly divided into N partitions {D_0, D_1, D_2, …, D_{N-1}}, one for each of the N nodes {P_0, P_1, P_2, …, P_{N-1}} • Each node is allocated the same number of transactions • PMM requires multiple passes over the database • For each pass k, all the nodes have exactly the same set of candidate groups, C_k • Each node counts the support of C_k in its local database independently • At the end of each pass, all nodes exchange the count information so that they can generate the same set C_{k+1} for the next pass
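The per-pass counting and exchange step can be simulated in a single process. The sketch below is an assumption-laden stand-in: there is no real message passing, the nodes are plain list partitions, and the names `partition`, `local_counts`, and `exchange` are hypothetical. It only shows that summing local counts gives every node the same global supports.

```python
from collections import Counter

def partition(transactions, n_nodes):
    """Split the database into n contiguous partitions of near-equal size."""
    size, extra = divmod(len(transactions), n_nodes)
    parts, start = [], 0
    for rank in range(n_nodes):
        end = start + size + (1 if rank < extra else 0)
        parts.append(transactions[start:end])
        start = end
    return parts

def local_counts(candidates, local_db):
    """Each node counts the same candidates against its own partition only."""
    return Counter({c: sum(1 for t in local_db if c <= t) for c in candidates})

def exchange(per_node_counts):
    """Stand-in for the end-of-pass synchronization: sum counts across nodes."""
    total = Counter()
    for counts in per_node_counts:
        total.update(counts)
    return total

# Hypothetical database, node count, and candidate itemsets for one pass.
db = [frozenset(t) for t in
      ({1, 2, 3}, {1, 2, 3}, {1, 2}, {3, 4}, {3, 4}, {1, 4})]
parts = partition(db, n_nodes=3)
candidates = [frozenset({1, 2}), frozenset({3, 4})]
global_counts = exchange([local_counts(candidates, p) for p in parts])
print({tuple(sorted(c)): n for c, n in global_counts.items()})
```

Because every node ends the pass with the same global counts, each one can derive the next candidate set locally without shipping any transactions.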

Performance Evaluation • Speedup of PMM • Sizeup of PMM

Conclusion • We proposed a parallel maximal frequent itemset mining algorithm, Parallel Max-Miner, for shared-nothing multiprocessor systems • Drawback: it requires synchronization between nodes to exchange the count information at the end of every pass
