Mining Frequent Closed Cubes in 3D Datasets

Liping Ji, Kian-Lee Tan, Anthony K. H. Tung
Computer Science Department, National University of Singapore

Motivation
• Frequent Closed Pattern (FCP) mining is of great importance and has wide application
• All previous work is limited to 2D FCP mining:
  biological data: gene-time, gene-sample
  market basket data: transaction-itemset
• Goal: extend 2D FCP mining to the 3D context:
  biological data: gene-sample-time
  marketing data: region-time-items

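Background
• Frequent Pattern (FP) and Frequent Closed Pattern (FCP), with minimum support threshold minsup = 2
  An itemset is frequent if at least minsup transactions contain it; a frequent itemset is closed if no proper superset has the same support.
  Transactions and their itemsets:
  t1: a1 a2 a3 a5
  t2: a1 a2 a3
  t3: a1 a2 a3 a4
  t4: a3 a5
  [Figure: the FPs and FCPs of these transactions]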
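The definitions can be checked on the slide's four transactions with a brute-force sketch. It enumerates every itemset, keeps those with support of at least minsup, and then keeps a frequent itemset only if no proper superset has the same support. The function names are illustrative, and brute force is only practical for toy inputs like this one.

```python
from itertools import combinations

# The four transactions from the slide, as sets of items.
TRANSACTIONS = {
    "t1": {"a1", "a2", "a3", "a5"},
    "t2": {"a1", "a2", "a3"},
    "t3": {"a1", "a2", "a3", "a4"},
    "t4": {"a3", "a5"},
}
MINSUP = 2


def support(itemset, transactions):
    """Count the transactions that contain every item in `itemset`."""
    return sum(1 for items in transactions.values() if itemset <= items)


def frequent_patterns(transactions, minsup):
    """Brute force: test every non-empty itemset, keep those with support >= minsup."""
    items = sorted(set().union(*transactions.values()))
    frequent = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            itemset = frozenset(combo)
            s = support(itemset, transactions)
            if s >= minsup:
                frequent[itemset] = s
    return frequent


def closed_patterns(frequent):
    """Keep a frequent itemset only if no proper superset has the same support.

    Checking frequent supersets is enough: a superset with equal support is frequent too.
    """
    return {
        x: s
        for x, s in frequent.items()
        if not any(x < y and s == t for y, t in frequent.items())
    }


fp = frequent_patterns(TRANSACTIONS, MINSUP)
fcp = closed_patterns(fp)
print(len(fp))  # 9
for itemset, s in sorted(fcp.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), s)
# ['a1', 'a2', 'a3'] 3
# ['a3'] 4
# ['a3', 'a5'] 2
```

Of the nine frequent itemsets, only three are closed; for example {a1 a2} is frequent but not closed, because {a1 a2 a3} has the same support of 3.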

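Background
• Binary Mapping: the transactions T and items I form a binary matrix with one row per transaction and one column per item; a cell is 1 if the transaction contains the item
  t1: a1 a2 a3 a5
  t2: a1 a2 a3
  t3: a1 a2 a3 a4
  t4: a3 a5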
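A minimal sketch of the binary mapping for the same four transactions follows. The helper name `binary_matrix` is hypothetical; columns are the items in sorted order.

```python
# The slide's transactions, in order.
TRANSACTIONS = [
    ("t1", ["a1", "a2", "a3", "a5"]),
    ("t2", ["a1", "a2", "a3"]),
    ("t3", ["a1", "a2", "a3", "a4"]),
    ("t4", ["a3", "a5"]),
]


def binary_matrix(transactions):
    """Map transactions to a 0/1 matrix: one row per transaction, one column per item."""
    items = sorted({item for _, basket in transactions for item in basket})
    matrix = []
    for _, basket in transactions:
        present = set(basket)
        matrix.append([1 if item in present else 0 for item in items])
    return items, matrix


items, matrix = binary_matrix(TRANSACTIONS)
print(items)
for (tid, _), row in zip(TRANSACTIONS, matrix):
    print(tid, row)
```

In this matrix the closed pattern {a1 a2 a3} (support 3) is an all-1 block over rows t1, t2, t3 and columns a1, a2, a3; the 3D setting generalizes such blocks to cubes.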

Frequent Closed Cube
• 3D Dataset: a binary dataset with three dimensions, Height, Row and Column; fixing one height gives a 2D slice of rows × columns
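One simple way to hold such a dataset is nested lists indexed `D[h][r][c]`. The sketch below uses a hypothetical 3 × 4 × 5 dataset (heights h1 to h3, rows r1 to r4, columns c1 to c5, stored 0-based) and illustrative helpers that cut 2D slices along each of the three dimensions.

```python
# Hypothetical 3 x 4 x 5 dataset: D[h][r][c] with 0-based indices, so D[0] is slice h1.
D = [
    [[1, 1, 1, 0, 1], [1, 1, 1, 0, 0], [1, 1, 1, 1, 1], [0, 1, 1, 1, 1]],  # h1
    [[1, 1, 1, 1, 1], [1, 1, 1, 0, 1], [1, 1, 1, 1, 1], [0, 1, 1, 1, 0]],  # h2
    [[1, 1, 0, 1, 1], [1, 1, 1, 1, 1], [1, 1, 1, 1, 1], [0, 1, 1, 1, 1]],  # h3
]


def shape(data):
    """(number of heights, number of rows, number of columns)."""
    return len(data), len(data[0]), len(data[0][0])


def height_slice(data, h):
    """The Row x Column matrix at height h (a copy)."""
    return [list(row) for row in data[h]]


def row_slice(data, r):
    """The Height x Column matrix at row r."""
    return [list(data[h][r]) for h in range(len(data))]


def column_slice(data, c):
    """The Height x Row matrix at column c."""
    n_h, n_r, _ = shape(data)
    return [[data[h][r][c] for r in range(n_r)] for h in range(n_h)]


print(shape(D))            # (3, 4, 5)
print(height_slice(D, 0))  # slice h1
print(column_slice(D, 0))  # [[1, 1, 1, 0], [1, 1, 1, 0], [1, 1, 1, 0]]
```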

Frequent Closed Cube
• Slices by Height Dimension
[Figure: the dataset split into slices h1, h2, h3]

Frequent Closed Cube
• Closed Cube: a maximal cube
[Figure: a closed cube across slices h1, h2, h3]

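Frequent Closed Cube
• Definition: Frequent Closed Cube (FCC)
  • Cube: a set of heights H, rows R and columns C such that every cell (h, r, c) with h ∈ H, r ∈ R, c ∈ C is 1
  • Maximal (closed): cannot be extended in any dimension
  • Frequent: satisfies the minH, minR and minC thresholds, i.e. |H| ≥ minH, |R| ≥ minR, |C| ≥ minC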
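The three conditions translate directly into checks. The sketch below is illustrative: it uses a small hypothetical 2 × 2 × 3 dataset with 0-based indices, and `is_closed` tests whether any single outside height, row or column could be added while keeping every cell 1.

```python
from itertools import product

# Hypothetical 2 x 2 x 3 dataset, D[h][r][c] with 0-based indices.
D = [
    [[1, 1, 0], [1, 1, 1]],  # h1
    [[1, 1, 1], [1, 1, 0]],  # h2
]


def is_cube(data, H, R, C):
    """Every cell (h, r, c) with h in H, r in R, c in C is 1."""
    return all(data[h][r][c] == 1 for h, r, c in product(H, R, C))


def is_frequent(H, R, C, min_h, min_r, min_c):
    """The cube meets the minH, minR and minC thresholds."""
    return len(H) >= min_h and len(R) >= min_r and len(C) >= min_c


def is_closed(data, H, R, C):
    """Closed: no outside height, row or column can be added while keeping every cell 1."""
    if not is_cube(data, H, R, C):
        return False
    n_h, n_r, n_c = len(data), len(data[0]), len(data[0][0])
    if any(is_cube(data, [h], R, C) for h in range(n_h) if h not in H):
        return False
    if any(is_cube(data, H, [r], C) for r in range(n_r) if r not in R):
        return False
    return not any(is_cube(data, H, R, [c]) for c in range(n_c) if c not in C)


# ({h1, h2}, {r1, r2}, {c1, c2}) is an FCC for minH = minR = minC = 2.
print(is_closed(D, [0, 1], [0, 1], [0, 1]), is_frequent([0, 1], [0, 1], [0, 1], 2, 2, 2))  # True True
# ({h1, h2}, {r1}, {c1, c2}) is a cube but not closed: row r2 can be added.
print(is_cube(D, [0, 1], [0], [0, 1]), is_closed(D, [0, 1], [0], [0, 1]))  # True False
```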

RSM vs. CubeMiner
• Representative Slice Mining (RSM): extends existing 2D FCP mining algorithms to FCC mining
• CubeMiner: operates on the 3D space directly

RSM
• Representative Slice (RS) Generation: enumerate all possible combinations of slices; each combination is merged into one RS in which a cell is 1 only if it is 1 in every slice of the combination
• 2D FCP Mining from each RS
• Post-pruning to Remove Unclosed Cubes: if a 2D FCP is also contained in slices other than its contributing slices, it is unclosed and is removed; otherwise, it is retained
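The three RSM steps can be sketched end to end as follows, assuming the RS is the cell-wise AND of the chosen height slices. The 2D step here is a brute-force stand-in for a real 2D FCP miner (exponential in the number of rows), the dataset is hypothetical, and all thresholds are assumed to be at least 1.

```python
from itertools import combinations, product

# Hypothetical 2 x 2 x 3 dataset, D[h][r][c] with 0-based indices.
D = [
    [[1, 1, 0], [1, 1, 1]],  # h1
    [[1, 1, 1], [1, 1, 0]],  # h2
]


def representative_slice(data, heights):
    """AND the chosen height slices: a cell is 1 only if it is 1 in all of them."""
    n_r, n_c = len(data[0]), len(data[0][0])
    return [[int(all(data[h][r][c] for h in heights)) for c in range(n_c)]
            for r in range(n_r)]


def closed_2d(matrix, min_r, min_c):
    """Stand-in 2D FCP miner (brute force over row sets).

    Returns (rows, cols) where cols are the columns shared by rows and
    rows are exactly the rows containing all of cols (row closure).
    """
    n_r, n_c = len(matrix), len(matrix[0])
    found = []
    for k in range(min_r, n_r + 1):
        for rows in combinations(range(n_r), k):
            cols = tuple(c for c in range(n_c) if all(matrix[r][c] for r in rows))
            if len(cols) < min_c:
                continue
            if tuple(r for r in range(n_r) if all(matrix[r][c] for c in cols)) == rows:
                found.append((rows, cols))
    return found


def rsm(data, min_h, min_r, min_c):
    """Representative Slice Mining sketch; thresholds are assumed to be >= 1."""
    n_h = len(data)
    result = []
    for k in range(min_h, n_h + 1):
        for heights in combinations(range(n_h), k):         # RS generation
            rs = representative_slice(data, heights)
            for rows, cols in closed_2d(rs, min_r, min_c):  # 2D FCP mining
                # Post-pruning: unclosed if the pattern also holds in another slice.
                unclosed = any(
                    all(data[h][r][c] for r, c in product(rows, cols))
                    for h in range(n_h) if h not in heights
                )
                if not unclosed:
                    result.append((heights, rows, cols))
    return sorted(result)


print(rsm(D, 1, 1, 1))
# [((0,), (1,), (0, 1, 2)), ((0, 1), (0, 1), (0, 1)), ((1,), (0,), (0, 1, 2))]
```

For heights {h1} alone, the 2D pattern ({r1, r2}, {c1, c2}) is closed within that RS but also holds in slice h2, so post-pruning drops it; it is reported once, from the combination {h1, h2}.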

RSM
• Slices by Height Dimension
[Figure: slices h1, h2, h3]

CubeMiner Principle

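CubeMiner: Cutters
• A cutter pairs a height and a row with the columns where that row of the slice is 0, e.g. (h1, r1, c4)
[Figure: slice h1 and the cutters derived from it]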
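Reading the example cutters (h1, r1, c4) and (h1, r2, c4c5) as "one height, one row, and the columns where that row is 0", cutter generation is a single scan over the slices. The slice below is hypothetical, chosen so that its first two cutters match those two; `cutters` and `label` are illustrative names.

```python
# Hypothetical slice h1 (rows r1..r4, columns c1..c5), 0-based in code.
D = [
    [[1, 1, 1, 0, 1], [1, 1, 1, 0, 0], [1, 1, 1, 1, 1], [0, 1, 1, 1, 1]],  # h1
]


def cutters(data):
    """One cutter (h, r, zero_columns) per (height, row) pair that contains a 0."""
    result = []
    for h, slice_ in enumerate(data):
        for r, row in enumerate(slice_):
            zeros = tuple(c for c, v in enumerate(row) if v == 0)
            if zeros:
                result.append((h, r, zeros))
    return result


def label(cutter):
    """Render a 0-based cutter with the slides' 1-based names."""
    h, r, cols = cutter
    cols_text = "".join(f"c{c + 1}" for c in cols)
    return f"(h{h + 1}, r{r + 1}, {cols_text})"


for cut in cutters(D):
    print(label(cut))
# (h1, r1, c4)
# (h1, r2, c4c5)
# (h1, r4, c1)
```

Row r3 of this slice has no 0, so it contributes no cutter.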

Mining FCC: CubeMiner Splitting Tree
• Root: (h1h2h3, r1r2r3r4, c1c2c3c4c5); first cutter: (h1, r1, c4)
• Cutter checking: a cutter is applicable (A.) to a node if it cuts cells of that node, i.e. the node contains the cutter's height, its row and at least one of its columns; otherwise it is not applicable (N.A.)
• Applying the cutter to the root produces three children:
  Left tree: remove the cutter's left atom h1 from the parent node → (h2h3, r1~r4, c1~c5)
  Middle tree: remove the cutter's middle atom r1 from the parent node → (h1~h3, r2~r4, c1~c5)
  Right tree: remove the cutter's right atom c4 from the parent node → (h1~h3, r1~r4, c1c2c3c5)
• Next cutter (h1, r2, c4c5): N.A. for the left child (h1 has been removed); A. for the middle child, and for the right child, which has lost c4 but still contains c5
• Splitting the middle child with it gives (h2h3, r2~r4, c1~c5), (h1~h3, r3r4, c1~c5) and (h1~h3, r2~r4, c1~c3)
• Subset cube: (h2h3, r2~r4, c1~c5) is a subset of the root's left child (h2h3, r1~r4, c1~c5)
• Left track checking is used to detect such subset cubes
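The splitting tree can be sketched as a recursion: find the next applicable cutter, then recurse into the left (drop the height), middle (drop the row) and right (drop the cutter's columns that are still in the node) children. When no cutter applies, every cell of the node is 1. This is a simplified sketch, not the paper's implementation: in place of left track checking it uses a brute-force closedness check against the whole dataset plus a result set to absorb duplicates. The dataset is hypothetical and thresholds are assumed to be at least 1.

```python
from itertools import product

# Hypothetical 2 x 2 x 3 dataset, D[h][r][c] with 0-based indices.
D = [
    [[1, 1, 0], [1, 1, 1]],  # h1
    [[1, 1, 1], [1, 1, 0]],  # h2
]


def cutters(data):
    """One cutter (h, r, zero_columns) per (height, row) pair that contains a 0."""
    return [(h, r, frozenset(c for c, v in enumerate(row) if v == 0))
            for h, slice_ in enumerate(data)
            for r, row in enumerate(slice_)
            if 0 in row]


def is_cube(data, H, R, C):
    return all(data[h][r][c] for h, r, c in product(H, R, C))


def is_closed(data, H, R, C):
    """Brute-force closedness check (stands in for left track checking)."""
    n_h, n_r, n_c = len(data), len(data[0]), len(data[0][0])
    return not (
        any(is_cube(data, {h}, R, C) for h in range(n_h) if h not in H)
        or any(is_cube(data, H, {r}, C) for r in range(n_r) if r not in R)
        or any(is_cube(data, H, R, {c}) for c in range(n_c) if c not in C)
    )


def cube_miner(data, min_h, min_r, min_c):
    """Splitting-tree sketch; thresholds are assumed to be >= 1."""
    cuts = cutters(data)
    found = set()

    def split(H, R, C, start):
        if len(H) < min_h or len(R) < min_r or len(C) < min_c:
            return  # children only shrink, so this branch cannot become frequent
        for i in range(start, len(cuts)):
            h, r, zeros = cuts[i]
            overlap = C & zeros
            if h in H and r in R and overlap:    # cutter checking: A.
                split(H - {h}, R, C, i + 1)      # left tree: drop the height
                split(H, R - {r}, C, i + 1)      # middle tree: drop the row
                split(H, R, C - overlap, i + 1)  # right tree: drop the columns
                return
        # No applicable cutter remains, so every cell of (H, R, C) is 1.
        if is_closed(data, H, R, C):
            found.add((tuple(sorted(H)), tuple(sorted(R)), tuple(sorted(C))))

    n_h, n_r, n_c = len(data), len(data[0]), len(data[0][0])
    split(frozenset(range(n_h)), frozenset(range(n_r)), frozenset(range(n_c)), 0)
    return sorted(found)


print(cube_miner(D, 1, 1, 1))
# [((0,), (1,), (0, 1, 2)), ((0, 1), (0, 1), (0, 1)), ((1,), (0,), (0, 1, 2))]
```

Passing `i + 1` to the children is safe because a cutter that has just been applied, or that was not applicable to a node, cannot become applicable to any of that node's descendants, which only lose heights, rows and columns.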

Parallelism
• RSM task: mining of each Representative Slice
• CubeMiner task: mining of each branch
• Processors: each initially keeps a copy of the whole dataset, then works independently and concurrently with little communication cost
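For RSM, the task decomposition can be sketched with Python's `concurrent.futures`: one task per slice combination, each reading the whole dataset and needing nothing from any other task. The 2D mining inside each task is a brute-force stand-in, the dataset is hypothetical, and `mine_one_rs` and `parallel_rsm` are illustrative names.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations, product

# Hypothetical 2 x 2 x 3 dataset, D[h][r][c] with 0-based indices.
D = [
    [[1, 1, 0], [1, 1, 1]],  # h1
    [[1, 1, 1], [1, 1, 0]],  # h2
]


def mine_one_rs(data, heights, min_r, min_c):
    """One independent task: build the RS for `heights` (cell-wise AND), mine its
    closed 2D patterns by brute force, and post-prune patterns that hold in another slice."""
    n_h, n_r, n_c = len(data), len(data[0]), len(data[0][0])
    rs = [[all(data[h][r][c] for h in heights) for c in range(n_c)] for r in range(n_r)]
    out = []
    for k in range(min_r, n_r + 1):
        for rows in combinations(range(n_r), k):
            cols = tuple(c for c in range(n_c) if all(rs[r][c] for r in rows))
            if len(cols) < min_c:
                continue
            if tuple(r for r in range(n_r) if all(rs[r][c] for c in cols)) != rows:
                continue  # rows are not closed in this RS
            if not any(all(data[h][r][c] for r, c in product(rows, cols))
                       for h in range(n_h) if h not in heights):
                out.append((heights, rows, cols))
    return out


def parallel_rsm(data, min_h, min_r, min_c, workers=4):
    """One task per slice combination; tasks share only the read-only dataset."""
    tasks = [hs for k in range(min_h, len(data) + 1)
             for hs in combinations(range(len(data)), k)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(lambda hs: mine_one_rs(data, hs, min_r, min_c), tasks))
    return sorted(cube for part in parts for cube in part)


print(parallel_rsm(D, 1, 1, 1))
```

Threads keep the sketch simple but, because of CPython's global interpreter lock, give little speedup on CPU-bound work. `ProcessPoolExecutor` would run the tasks truly in parallel and, like the processors on the slide, give each worker its own copy of the dataset (it then needs a module-level task function instead of a lambda).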

Mining FCC: Experiments
• Real data: yeast cell-cycle regulated genes
  Elutriation experiments: 14 × 9 × 7161
  CDC15 experiments: 19 × 9 × 7761
• Synthetic data: IBM data generator
  Synthetic 1: H × R × C = (8~20) × 20 × 1000
  Synthetic 2: H × R × C = 100 × 100 × 10000

Experiments: Optimize CubeMiner
• Optimal: sort slices in decreasing order of their number of zeros
• This prunes off infrequent cubes early
Elutriation (14 × 9 × 7161)
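One reading of this optimization is that slices with more zeros are processed first, so that their cutters shrink nodes early and branches fall below the thresholds sooner. The sketch below only computes that ordering; the dataset and helper names are hypothetical.

```python
# Hypothetical 3 x 2 x 3 dataset, D[h][r][c] with 0-based indices.
D = [
    [[1, 1, 1], [1, 1, 0]],  # h1: 1 zero
    [[0, 1, 0], [1, 0, 1]],  # h2: 3 zeros
    [[1, 1, 1], [1, 1, 1]],  # h3: 0 zeros
]


def zeros_per_slice(data):
    """Number of 0 cells in each height slice."""
    return [sum(row.count(0) for row in slice_) for slice_ in data]


def order_by_zeros(data):
    """Height indices in decreasing order of zero count (ties keep their original order)."""
    counts = zeros_per_slice(data)
    return sorted(range(len(data)), key=lambda h: counts[h], reverse=True)


order = order_by_zeros(D)
print(zeros_per_slice(D), order)  # [1, 3, 0] [1, 0, 2]
reordered = [D[h] for h in order]  # slice with the most zeros first
```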

Experiments: Optimize RSM
• Optimal: enumerate slices along the smallest dimension
• Slice enumeration takes a relatively long processing time
Elutriation (14 × 9 × 7161)
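Since RS generation enumerates every combination of slices, a dimension of size n yields 2^n − 1 representative slices (fewer when a minimum number of slices is required). The arithmetic below shows why the smallest dimension is the natural choice for the 14 × 9 × 7161 Elutriation data; the helper names are illustrative, and `math.comb` needs Python 3.8 or later.

```python
from math import comb


def combination_count(n, min_size=1):
    """Ways to choose at least `min_size` of `n` slices: 2**n minus the smaller subsets."""
    return 2 ** n - sum(comb(n, k) for k in range(min_size))


def pick_enumeration_dimension(shape, minimums=(1, 1, 1)):
    """Index of the dimension with the fewest slice combinations, and that count."""
    counts = [combination_count(n, m) for n, m in zip(shape, minimums)]
    best = min(range(len(shape)), key=lambda d: counts[d])
    return best, counts[best]


print(combination_count(14), combination_count(9))  # 16383 511
print(pick_enumeration_dimension((14, 9, 7161)))     # (1, 511)
```

Enumerating along the dimension of size 9 means 511 representative slices instead of 16383 for the dimension of size 14, while the dimension of size 7161 would be hopeless (2^7161 − 1).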

Experiments: RSM vs. CubeMiner
• As the smallest dimension grows, CubeMiner outperforms RSM
Synthetic data (varying the size of the height dimension)

Experiments: Parallelism
• As the degree of parallelism increases, the response time decreases
• There is an optimal number of processors
CDC15 (varying the number of processors)

Conclusion
• Notion of Frequent Closed Cube
• RSM: efficient when one of the dimensions is small
• CubeMiner: superior for large datasets
• Parallel RSM and CubeMiner

Thank You!