AC-Close: Efficiently Mining Approximate Closed Itemsets by Core Pattern Recovery. H. Cheng, P. S. Yu, and J. Han. ICDM '06. Presenter: 林靜怡, 2007/01/17.
Introduction • In real applications, a database contains random noise or measurement errors • Because of this noise, exact frequent itemset mining breaks some interesting patterns into fragments instead of finding them whole • Goal: discover approximate frequent itemsets in the presence of random noise
Definition • D: a transaction database in the form of an n × m binary matrix • $I = \{i_1, i_2, \ldots, i_m\}$: the set of all items • T: the set of transactions in D
Definition • $\mathrm{sup}_e(x)$: the exact support of an itemset x • $T_e(x)$: the exact supporting transactions of x • $\mathrm{sup}_a(x)$: the support of an approximate itemset x • $T_a(x)$: the supporting transactions of an approximate itemset x
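The definitions above can be made concrete with a small sketch. It assumes D is stored as a NumPy boolean matrix (rows are transactions, columns are items). The function names and the row error tolerance `eps_r` (the fraction of an itemset's items a supporting transaction may miss) are illustrative assumptions, not notation taken from the slides.

```python
import math
import numpy as np

def exact_support(D, itemset):
    """Return (support fraction, row indices) of rows containing every item."""
    cols = sorted(itemset)
    rows = np.flatnonzero(D[:, cols].all(axis=1))
    return len(rows) / D.shape[0], rows.tolist()

def row_tolerant_rows(D, itemset, eps_r):
    """Rows that miss at most floor(|itemset| * eps_r) of the items (assumed tolerance)."""
    cols = sorted(itemset)
    max_zeros = math.floor(len(cols) * eps_r + 1e-9)  # small guard against float error
    zeros = len(cols) - D[:, cols].sum(axis=1)
    return np.flatnonzero(zeros <= max_zeros).tolist()

# Toy database: 5 transactions over 4 items (made-up data).
D = np.array([
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 1, 1, 0],
    [1, 1, 1, 1],
], dtype=bool)

print(exact_support(D, {0, 1, 2}))
print(row_tolerant_rows(D, {0, 1, 2}, eps_r=0.34))
```

With no tolerance (`eps_r=0.0`) the second function returns the same rows as the exact support; allowing one missing item lets the fragmented rows count as well.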
Approximate closed frequent itemset mining • The problem of approximate closed frequent itemset mining from core patterns is to mine all itemsets that are (1) core patterns w.r.t. α, (2) approximate frequent itemsets w.r.t. $\varepsilon_r$, $\varepsilon_c$, and min_sup, and (3) closed.
Approximate Closed Itemset Mining • Mine the set of core patterns with the lowered support threshold min_sup = α·s • Treat the core patterns as initial seeds for possible further extension to approximate frequent itemsets • C: the set of core patterns • L: a lattice built over C
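A minimal sketch of the first step, assuming transactions are Python sets of integer item ids and that a plain level-wise search is acceptable for illustration (the slides do not say which exact miner is used). `mine_core_patterns` and `lattice_levels` are hypothetical helper names.

```python
def mine_core_patterns(transactions, n_items, s, alpha):
    """Level-wise search for every itemset with exact support >= alpha * s."""
    n = len(transactions)
    threshold = alpha * s
    core = {}  # itemset -> set of exact supporting transaction ids
    frontier = [frozenset([i]) for i in range(n_items)]
    while frontier:
        next_frontier = set()
        for x in frontier:
            tids = frozenset(t for t, items in enumerate(transactions) if x <= items)
            if len(tids) / n >= threshold:
                core[x] = tids
                # Only frequent itemsets are extended by one more item.
                for i in range(n_items):
                    if i not in x:
                        next_frontier.add(x | {i})
        frontier = list(next_frontier)
    return core

def lattice_levels(core):
    """Group core patterns by size; level k holds the patterns with k items."""
    levels = {}
    for x in core:
        levels.setdefault(len(x), []).append(x)
    return levels

transactions = [{0, 1, 2}, {0, 1}, {0, 2}, {1, 2}, {0, 1, 2}]
core = mine_core_patterns(transactions, n_items=3, s=0.6, alpha=0.5)
levels = lattice_levels(core)
print(sorted(len(levels[k]) for k in levels))
```

Lowering the threshold from s to α·s keeps the fragment {0, 1, 2}, which an exact miner at s = 0.6 would discard.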
Example • For a core pattern y, the number of 0s allowed in a supporting transaction is $\lfloor |y| \cdot \varepsilon_r \rfloor$, which is 2 in this example • Extension space for y: traverse upward in the lattice for 2 levels (i.e., levels 2 and 3)
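The arithmetic in this example can be sketched as follows. The sketch assumes a 4-item core pattern {a, b, c, d} with a row tolerance of 0.5, and assumes that every sub-pattern is itself a core pattern in the lattice. These values and the helper names are illustrative, not read off the slide.

```python
from itertools import combinations
from math import floor

def max_zeros(pattern, eps_r):
    """Number of missing items tolerated per supporting transaction."""
    return floor(len(pattern) * eps_r + 1e-9)  # guard against float rounding

def extension_space(pattern, eps_r, core_patterns):
    """Core sub-patterns reached by dropping 1..max_zeros items from the pattern."""
    items = sorted(pattern)
    space = []
    for drop in range(1, max_zeros(pattern, eps_r) + 1):
        size = len(items) - drop
        if size < 1:
            break
        for sub in combinations(items, size):
            if frozenset(sub) in core_patterns:
                space.append(frozenset(sub))
    return space

pattern = frozenset("abcd")
# Assumption for the demo: all non-empty subsets are core patterns.
all_core = {frozenset(c) for k in range(1, 5) for c in combinations("abcd", k)}
es = extension_space(pattern, eps_r=0.5, core_patterns=all_core)
print(max_zeros(pattern, 0.5), sorted(len(y) for y in es))
```

Two allowed 0s means two levels upward, so the space holds the four 3-item and six 2-item sub-patterns.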
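For a core itemset y and each sub-pattern x in the extension space of y, any transaction supporting x also approximately supports y • The candidate set of transactions that approximately support y is the union of the exact supporting transaction sets of these sub-patterns • Ex: for the pattern in the previous example, it is the union of the transaction sets of all itemsets at levels 2 and 3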
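The union described above can be sketched as follows, assuming the exact supporting transaction ids of each lattice node are already available in a dictionary. The name `exact_tids`, the toy transactions, and the tolerance 0.5 are all assumptions made for illustration.

```python
from itertools import combinations
from math import floor

def candidate_transactions(pattern, eps_r, exact_tids):
    """Union of exact transaction sets over the pattern and its extension space."""
    k = floor(len(pattern) * eps_r + 1e-9)  # items a transaction may miss
    result = set()
    for y, tids in exact_tids.items():
        if y <= pattern and len(pattern) - len(y) <= k:
            result |= tids
    return result

# Made-up transactions over items a..d.
transactions = [set("abcd"), set("abc"), set("ab"), set("a"), set("cd"), set("bcd")]
pattern = frozenset("abcd")

# Exact supporting transactions for every non-empty sub-pattern (demo lattice).
exact_tids = {}
for size in range(1, 5):
    for sub in combinations(sorted(pattern), size):
        y = frozenset(sub)
        exact_tids[y] = {t for t, items in enumerate(transactions) if y <= items}

print(sorted(candidate_transactions(pattern, 0.5, exact_tids)))
```

Transaction 3 holds only one of the four items, so it is the one transaction left out of the candidate set.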
Identifying candidate approximate itemsets • The steps to identify candidate approximate itemsets include
Top-down Mining and Pruning by Closeness • Effective pruning uses the closeness definition and the min_sup threshold • Mining starts with the largest pattern in L and proceeds level by level, in decreasing order of core pattern size.
[Figure: example lattice annotated with the sets {0,2,4}, {0,2,5}, {2,5}, {0,2,6}, {2,6}, {2,3,5,6}, {0,2}, {2}, {2,5}, {2,6}]
Example • The union of the supporting transaction sets is {0,2,3,4,5,6} • The number of 0s allowed in a transaction determines how far to extend => the extension space includes its sub-patterns at level 2.
=> • Prune {a,b,c} without actual computation: (1) if {a,b,c,d} satisfies the min_sup threshold and the $\varepsilon_c$ constraint, then {a,b,c} can be pruned whether {a,b,c,d} is closed or not (2) if {a,b,c,d} does not satisfy min_sup, then {a,b,c} can be pruned
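The two pruning rules can be sketched as a small decision function. The premise in front of "=>" is not recoverable from the slide, so the sketch assumes it is that the sub-pattern and its super-pattern share the same candidate supporting transaction set. The function and parameter names are hypothetical.

```python
def can_prune_subpattern(sub_tids, sup_tids, sup_meets_min_sup, sup_meets_col_constraint):
    """Decide whether a sub-pattern can be skipped without computing its support.

    Assumed premise: both patterns have the same candidate transaction set.
    """
    if sub_tids != sup_tids:
        return False  # premise not met; the rules below do not apply
    if not sup_meets_min_sup:
        # Rule (2): same transactions, so the sub-pattern fails min_sup too.
        return True
    # Rule (1): the super-pattern is approximately frequent with equal support,
    # so the sub-pattern cannot be closed.
    return sup_meets_col_constraint

tids = {0, 2, 3, 4, 5, 6}
print(can_prune_subpattern(tids, tids, True, True))    # rule (1)
print(can_prune_subpattern(tids, tids, False, False))  # rule (2)
print(can_prune_subpattern(tids, tids, True, False))   # undecided, must compute
```

When the super-pattern meets min_sup but fails the column constraint, neither rule applies and the sub-pattern still has to be evaluated.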
Experiment • Data are generated with the IBM synthetic data generator • The dataset T10.I100.D20K has 20K transactions, 100 distinct items, and an average of 10 items per transaction