Online Mining (Recently) Maximal Frequent Itemsets over Data Streams

Hua-Fu Li, Suh-Yin Lee, Man Kwan Shan. RIDE-SDMA’05
Speaker: 董原賓; Advisor: 柯佳伶

Introduction
• Difficulties of data stream mining: streams are huge, arrive at high speed, and are continuous
• Solution: a one-pass algorithm that
  • maintains a summary data structure
  • mines the maximal frequent itemsets

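[Figure: a data stream drawn along a time axis as basic windows W1 = {abc, bcd, acd}, W2 = {cd, abd, bc}, …, WN, each holding three transactions.]

Definition
• Ψ = {i1, i2, …, in}: a set of items
• Wi: basic window i
• Data stream = [W1, W2, …, WN): an infinite sequence of basic windows
• N: the window identifier of the latest basic window
• Current length of the data stream: CL = |W1| + |W2| + … + |WN|; in the figure every window holds three transactions, so CL = 3 × N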

Definition
• X.tsup: true support of itemset X
• X.esup: estimated support of itemset X, 1 ≤ X.esup ≤ X.tsup
• X.CL = |Wj| + |Wj+1| + … + |WN|, where Wj is the first window in which X appears in the summary data structure
• s: minimum support
• ε: maximum support error threshold
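A minimal Python sketch of the length definitions above, assuming a toy representation (not from the paper) in which a basic window is a list of transactions and each transaction is a string of single-letter items. The function names `current_length`, `itemset_cl` and `true_support` are hypothetical.

```python
from typing import List, Sequence

# Assumed toy representation (not from the paper): a basic window is a list of
# transactions, and a transaction is a string of single-letter items such as "abc".
Window = List[str]


def current_length(stream: Sequence[Window]) -> int:
    """CL = |W1| + |W2| + ... + |WN|: number of transactions seen so far."""
    return sum(len(window) for window in stream)


def itemset_cl(stream: Sequence[Window], j: int) -> int:
    """X.CL = |Wj| + ... + |WN|, where Wj (1-based) is the first window in
    which X was recorded in the summary data structure."""
    return sum(len(window) for window in stream[j - 1:])


def true_support(stream: Sequence[Window], itemset: str) -> int:
    """X.tsup: number of transactions in the stream that contain every item of X."""
    return sum(1 for window in stream for t in window if set(itemset) <= set(t))


if __name__ == "__main__":
    stream = [["abc", "bcd", "acd"], ["cd", "abd", "bc"]]
    print(current_length(stream))      # 6, i.e. 3 x N with N = 2
    print(itemset_cl(stream, 2))       # 3: X first entered the summary in W2
    print(true_support(stream, "cd"))  # 3: bcd, acd and cd
```

Because X.CL only counts windows from Wj onward, an itemset that enters the summary late is judged against a shorter stream length than CL.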

Data Stream Mining for Maximal Frequent Itemsets (DSM-MFI)
• Step 1: read a basic window of transactions
• Step 2: construct and maintain the summary data structure
• Step 3: prune the infrequent information
• Step 4: search for the maximal frequent itemsets

Summary Frequent Itemsets forest (SFI-forest)
• Composed of an FI-list and a set of SFI-trees
• SFI-tree node fields
  • item-id: the item identifier
  • esup: the number of transactions that reach the node with this item-id
  • window-id: when the node is created, set to the identifier of the current basic window
  • node-link: links to the next node with the same item-id in the same SFI-tree

Summary Frequent Itemsets forest (SFI-forest)
• FI-list entry fields
  • item-id: the item identifier
  • esup: the number of transactions containing the item
  • window-id: when the entry is created, set to the identifier of the current basic window
  • head link: links to the root node of item-id.SFI-tree

Summary Frequent Itemsets forest (SFI-forest)
• Each SFI-tree has its own opposite frequent item list (OFI-list)
• OFI-list entry: (item-id, esup, window-id, head link)
  • head link: links to the first node carrying the item-id in the SFI-tree
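To make the node and list fields concrete, here is a hedged Python sketch of one SFI-tree with its OFI-list. The names `SFINode`, `OFIEntry`, `SFITree`, `insert_suffix` and `chain` are hypothetical. Two details are assumptions rather than facts from the slides: a new node is prepended to its item's node-link chain (so the OFI head link points at the most recently created node), and an OFI-list entry's esup counts the inserted suffixes that contain the item.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, Optional


@dataclass
class SFINode:
    item: str
    esup: int
    window_id: int  # identifier of the basic window in which the node was created
    children: Dict[str, "SFINode"] = field(default_factory=dict)
    node_link: Optional["SFINode"] = None  # next node with the same item in this tree


@dataclass
class OFIEntry:
    item: str
    esup: int
    window_id: int
    head: Optional[SFINode] = None  # start of this item's node-link chain


@dataclass
class SFITree:
    """x.SFI-tree: stores the item-suffix transactions that start with item x."""
    root: SFINode
    ofi_list: Dict[str, OFIEntry] = field(default_factory=dict)

    def insert_suffix(self, suffix: str, window_id: int) -> None:
        if not suffix or suffix[0] != self.root.item:
            raise ValueError("suffix must start with the root item")
        self.root.esup += 1
        node = self.root
        for item in suffix[1:]:
            entry = self.ofi_list.get(item)
            if entry is None:  # a new OFI-list entry gets the current window id
                entry = self.ofi_list[item] = OFIEntry(item, 0, window_id)
            child = node.children.get(item)
            if child is None:  # a new node gets the current window id
                child = node.children[item] = SFINode(item, 0, window_id)
                child.node_link, entry.head = entry.head, child  # assumed: prepend
            child.esup += 1
            entry.esup += 1  # assumed: suffixes containing both root item and item
            node = child

    def chain(self, item: str) -> Iterator[SFINode]:
        """Walk the node-link chain of `item`, starting at its OFI-list head."""
        node = self.ofi_list[item].head if item in self.ofi_list else None
        while node is not None:
            yield node
            node = node.node_link


if __name__ == "__main__":
    a_tree = SFITree(SFINode("a", 0, 1))
    for suffix in ("abc", "acd"):  # the W1 projections that start with a
        a_tree.insert_suffix(suffix, window_id=1)
    print(a_tree.root.esup)                                  # 2
    print({i: e.esup for i, e in a_tree.ofi_list.items()})   # {'b': 1, 'c': 2, 'd': 1}
    print([n.esup for n in a_tree.chain("c")])               # [1, 1]
```

The root's esup (2) matches a's esup in the FI-list, because every transaction containing a contributes exactly one item-suffix transaction that starts with a.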

Example: Transaction Projection over W1 = {abc, bcd, acd}

[Figure: T = abc is projected into the item-suffix transactions abc, bc and c; SFI-tree-maintenance(abc), (bc) and (c) insert them into a.SFI-tree, b.SFI-tree and c.SFI-tree and update the FI-list and the OFI-lists.]

[Figure: T = bcd is projected into bcd, cd and d; SFI-tree-maintenance(bcd), (cd) and (d) extend b.SFI-tree and c.SFI-tree and create d.SFI-tree.]

[Figure: T = acd is projected into acd, cd and d; SFI-tree-maintenance(acd) adds the branch a→c→d to a.SFI-tree. After W1 the FI-list holds a: 2, b: 2, c: 3, d: 2.]
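The projection step in these figures can be sketched in Python as follows. This is a simplification under stated assumptions: items are ordered alphabetically, and the hypothetical `forest` maps each item to a multiset (`Counter`) of the item-suffix transactions that start with it, standing in for the prefix-tree merging performed by SFI-tree-maintenance. `project` and `process_window` are illustrative names.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Dict, List, Tuple


def project(transaction: str) -> List[str]:
    """Item-suffix projections, items sorted alphabetically: 'abc' -> ['abc', 'bc', 'c']."""
    items = "".join(sorted(set(transaction)))
    return [items[i:] for i in range(len(items))]


def process_window(
    window: List[str],
    window_id: int,
    fi_list: Dict[str, Tuple[int, int]],   # item -> (esup, window-id of first entry)
    forest: DefaultDict[str, Counter],     # item -> multiset of its item-suffixes
) -> None:
    for transaction in window:
        for suffix in project(transaction):
            head = suffix[0]
            esup, first_window = fi_list.get(head, (0, window_id))
            fi_list[head] = (esup + 1, first_window)
            # Simplified SFI-tree-maintenance(suffix): count the whole suffix under
            # its first item instead of merging it into a prefix tree.
            forest[head][suffix] += 1


if __name__ == "__main__":
    fi_list: Dict[str, Tuple[int, int]] = {}
    forest: DefaultDict[str, Counter] = defaultdict(Counter)
    process_window(["abc", "bcd", "acd"], 1, fi_list, forest)
    print(project("bcd"))                         # ['bcd', 'cd', 'd']
    print({i: e[0] for i, e in fi_list.items()})  # {'a': 2, 'b': 2, 'c': 3, 'd': 2}
    print(dict(forest["c"]))                      # {'c': 1, 'cd': 2}
```

Each transaction yields one suffix per item, so an item's esup equals the number of suffixes filed under it.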

Pruning infrequent items from SFI-forest
• X: a 1-itemset in the FI-list
• If X.esup < X.CL × ε, then X and all its supersets are deleted from the SFI-forest
• Steps
  1. Delete item-id.OFI-list, item-id.SFI-tree, and the entry with item-id from the FI-list
  2. Remove the infrequent item from the other OFI-lists by traversing the FI-list

Pruning infrequent items from SFI-forest
• Steps (continued)
  3. Delete the infrequent item from the other SFI-trees
  4. Reconstruct the SFI-trees by reinserting the modified item-suffix transactions, or by joining the remaining subtrees into the SFI-tree
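A hedged Python sketch of the pruning rule, using a flattened summary that is an assumption rather than the paper's tree layout: each item maps to a `Counter` of its item-suffix transactions, so steps 2 to 4 collapse into stripping the pruned items from every suffix and re-counting (the "reinsert the modified item-suffix transactions" option). `prune` is a hypothetical name, and the item counts in the usage are made up; only CL = 12 and ε = 0.2 come from the example slide.

```python
from collections import Counter
from typing import Dict, Set


def prune(
    fi_list: Dict[str, int],       # item -> esup
    forest: Dict[str, Counter],    # item -> multiset of item-suffix transactions
    cl: Dict[str, int],            # item -> X.CL
    eps: float,
) -> Set[str]:
    """Drop every item X with X.esup < X.CL * eps, together with its supersets."""
    pruned = {x for x, esup in fi_list.items() if esup < cl[x] * eps}
    for x in pruned:
        del fi_list[x]           # step 1: the FI-list entry
        forest.pop(x, None)      # step 1: x.SFI-tree (and with it x.OFI-list)
    for head in list(forest):
        rebuilt: Counter = Counter()
        for suffix, count in forest[head].items():
            # steps 2-4 in this flat model: strip pruned items, then reinsert
            kept = "".join(i for i in suffix if i not in pruned)
            rebuilt[kept] += count
        forest[head] = rebuilt
    return pruned


if __name__ == "__main__":
    # Hypothetical summary of the transactions abc, acd, ac, bcd, cd, d
    fi_list = {"a": 3, "b": 2, "c": 5, "d": 4}
    forest = {
        "a": Counter({"abc": 1, "acd": 1, "ac": 1}),
        "b": Counter({"bc": 1, "bcd": 1}),
        "c": Counter({"c": 2, "cd": 3}),
        "d": Counter({"d": 4}),
    }
    cl = {x: 12 for x in fi_list}            # bar: 12 x 0.2 = 2.4
    print(prune(fi_list, forest, cl, 0.2))   # {'b'}
    print(dict(forest["a"]))                 # {'ac': 2, 'acd': 1}
```

Merging "abc" into "ac" after b is removed is exactly why step 4 is needed: two formerly distinct paths now describe the same item-suffix transaction.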

Example
• a.CL = b.CL = c.CL = d.CL = 12, s = 0.3, ε = 0.2
• Pruning bar: 12 × 0.2 = 2.4, so any item whose esup is below 2.4 is pruned
[Figure: the FI-list, the SFI-trees and the OFI-lists of items a, b, c and d.]

Determining maximal frequent itemsets
• There are k frequent 1-itemsets, e1, e2, …, ek, in the FI-list
• Let o1, o2, …, oj be the items in ei.OFI-list
• Generate a candidate maximal frequent (j+1)-itemset E = (ei, o1, o2, …, oj)
  • start from the frequent item with the smallest estimated support
  • traverse the paths via node-links to count E's estimated support

Determining maximal frequent itemsets
• If E.esup ≥ s × ei.CL, then E is a maximal frequent itemset (MFI)
• Otherwise, enumerate the subsets of E of size |E| − 1 and test them in the same way
• Repeat until all maximal frequent itemsets with respect to entry ei have been found
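The top-down search can be sketched in Python as below. It is an illustration under stated assumptions, not the authors' code: `mine_mfi` and `tree_support` are hypothetical names; all items are assumed to share one CL, so the bar is a single `min_count` = s × CL; and counting support over a `Counter` of item-suffix transactions replaces the node-link traversal. A final pass drops itemsets found from one entry that are contained in an MFI found from another entry.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def tree_support(tree: Counter, itemset: FrozenSet[str]) -> int:
    """Stand-in for walking node-links: count suffixes that contain the itemset."""
    return sum(n for suffix, n in tree.items() if itemset <= set(suffix))


def mine_mfi(fi_list: Dict[str, int], forest: Dict[str, Counter], min_count: float) -> List[str]:
    frequent = {x for x, esup in fi_list.items() if esup >= min_count}
    found: Set[FrozenSet[str]] = set()
    for e in sorted(frequent, key=lambda x: (fi_list[x], x)):  # smallest esup first
        tree = forest[e]
        # OFI-list stand-in: frequent items that follow e in e's suffixes
        others = {i for suffix in tree for i in suffix[1:] if i in frequent}
        level = {frozenset({e} | others)}                       # E = (e, o1, ..., oj)
        while level:
            next_level: Set[FrozenSet[str]] = set()
            for cand in level:
                if any(cand <= m for m in found):
                    continue                                     # already covered
                if tree_support(tree, cand) >= min_count:
                    found.add(cand)
                elif len(cand) > 1:
                    # enumerate the (|E|-1)-subsets that still contain e
                    for rest in combinations(sorted(cand - {e}), len(cand) - 2):
                        next_level.add(frozenset((e, *rest)))
            level = next_level
    maximal = [m for m in found if not any(m < other for other in found)]
    return sorted("".join(sorted(m)) for m in maximal)


if __name__ == "__main__":
    # Item-suffix transactions of W1 = {abc, bcd, acd}; hypothetical bar of 1.5
    fi_list = {"a": 2, "b": 2, "c": 3, "d": 2}
    forest = {
        "a": Counter({"abc": 1, "acd": 1}),
        "b": Counter({"bc": 1, "bcd": 1}),
        "c": Counter({"c": 1, "cd": 2}),
        "d": Counter({"d": 2}),
    }
    print(mine_mfi(fi_list, forest, 1.5))  # ['ac', 'bc', 'cd']
```

For entry b this sketch first counts bcd (support 1, below 1.5) and then falls back to bc and bd, mirroring the order in the example slide.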

Example
• a.CL = b.CL = c.CL = d.CL = 5, s = 0.3, ε = 0.2
• Minimum support count: 5 × 0.3 = 1.5
• Calculate support(bcd) = 1 < 1.5, so bcd is not an MFI; then calculate support(bc)
[Figure: the FI-list, SFI-trees and OFI-lists used to count these supports.]

Sliding Window Mining over Data Streams
• Modifications:
  • use the DSM-MFI algorithm to construct SFI-forest_i for each basic window Wi
  • find the local maximal frequent itemsets of each window (local MFI_i) and store all local MFIs in a queue
  • a global MFI-list stores all local MFIs from W1 to WN

Sliding Window Mining over Data Streams
• When basic window WN+1 arrives:
  • remove local MFI_1 from the queue
  • subtract the supports of local MFI_1 from the global MFI-list
  • use the DSM-MFI algorithm to mine all local maximal frequent itemsets of WN+1
  • increase the supports of matching itemsets in the global MFI-list, or insert the new itemsets of local MFI_{N+1} into it
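A minimal Python sketch of the queue and global MFI-list bookkeeping, assuming each local MFI_i arrives as a hypothetical `{itemset: support}` dict already produced by DSM-MFI for its window. The class name `SlidingWindowMFI`, the fixed window count, and dropping itemsets whose summed support falls to zero are assumptions for illustration.

```python
from collections import Counter, deque
from typing import Deque, Dict


class SlidingWindowMFI:
    """Global MFI-list over the last `num_windows` basic windows.

    Each local MFI_i is passed in as {itemset: support}; mining it with
    DSM-MFI on a per-window SFI-forest is outside this sketch.
    """

    def __init__(self, num_windows: int) -> None:
        self.num_windows = num_windows
        self.queue: Deque[Dict[str, int]] = deque()  # local MFI_1 ... MFI_N
        self.global_mfi: Counter = Counter()          # itemset -> summed support

    def add_window(self, local_mfi: Dict[str, int]) -> None:
        if len(self.queue) == self.num_windows:
            oldest = self.queue.popleft()        # remove local MFI_1
            self.global_mfi.subtract(oldest)     # subtract its supports
            for itemset in [k for k, v in self.global_mfi.items() if v <= 0]:
                del self.global_mfi[itemset]
        self.queue.append(local_mfi)             # local MFI_{N+1}
        self.global_mfi.update(local_mfi)        # increase supports or insert


if __name__ == "__main__":
    sw = SlidingWindowMFI(num_windows=2)
    sw.add_window({"ac": 2, "bc": 2, "cd": 2})
    sw.add_window({"cd": 1, "bd": 2})
    sw.add_window({"ab": 3})
    print(dict(sw.global_mfi))  # {'cd': 1, 'bd': 2, 'ab': 3}
```

This sketch only sums supports; it does not re-check whether an itemset in the global list is still maximal once windows are combined.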

Experiment
• Platform: 1 GHz IBMx24, 384 MB, Visual C++ 6.0
• s = 0.1%, ε = 0.01%
• IBM synthetic datasets: T10.I5.D1000K and T30.I20.D1000K
• Each dataset is split into 20 basic windows to simulate streaming data

Experiment
