
AC-Close: Efficiently Mining Approximate Closed Itemsets by Core Pattern Recovery

H. Cheng, P. S. Yu, and J. Han. ICDM '06. Presenter: 林靜怡, 2007/01/17.


Presentation Transcript


  1. AC-Close: Efficiently Mining Approximate Closed Itemsets by Core Pattern Recovery. H. Cheng, P. S. Yu, and J. Han, ICDM '06. Presenter: 林靜怡, 2007/01/17

  2. Introduction • In real applications, a database contains random noise or measurement error • Under exact matching, this noise fragments some interesting patterns • Goal: discover approximate frequent itemsets in the presence of random noise

  3. Definition • D: a transaction database in the form of an n × m binary matrix • $I = \{i_1, \dots, i_m\}$: the set of all items • T: the set of transactions in D

  4. Definition • $sup_e(x)$: the exact support of an itemset x • $T_e(x)$: the exact supporting transactions of x • $sup_a(x)$: the support of an approximate itemset x • $T_a(x)$: the supporting transactions of an approximate itemset x
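
The exact support and exact supporting transactions of an itemset can be sketched on a small binary matrix. This is a minimal illustration, not the authors' code: the function names, the toy matrix, and the choice to report support as a fraction of all transactions are assumptions.

```python
# D is an n x m binary matrix: D[i][j] == 1 means transaction i contains item j.
# The matrix below is made-up toy data (4 transactions, 4 items).
D = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 1, 1, 0],
]


def exact_transactions(D, x):
    """Indices of transactions that contain every item of itemset x."""
    return {i for i, row in enumerate(D) if all(row[j] == 1 for j in x)}


def exact_support(D, x):
    """Fraction of transactions that contain every item of x (assumed relative support)."""
    return len(exact_transactions(D, x)) / len(D)
```

For instance, `exact_transactions(D, {0, 1})` collects the rows in which both column 0 and column 1 are 1.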

  5. Definition

  6. Definition

  7. Approximate closed frequent itemset mining • The problem of approximate closed frequent itemset mining from core patterns is the mining of all itemsets that are (1) core patterns w.r.t. α, (2) approximate frequent itemsets w.r.t. $\epsilon_r$, $\epsilon_c$, and min_sup, and (3) closed.
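
Condition (2) can be made concrete with a small check. The slides that define approximate frequent itemsets did not survive in this transcript, so the sketch below assumes the commonly used row/column noise formulation: every supporting transaction may miss at most a fraction `eps_r` of the itemset's items, every item may be missing from at most a fraction `eps_c` of the supporting transactions, and the supporting set must cover at least a `min_sup` fraction of the database. The function name, parameter names, and toy data are hypothetical.

```python
# Toy n x m binary matrix (made-up data): rows are transactions, columns are items.
D = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 1, 1, 0],
]


def is_approximate_frequent(D, x, transactions, eps_r, eps_c, min_sup):
    """Check the assumed row, column, and support constraints for itemset x
    against a proposed set of supporting transaction indices."""
    items = sorted(x)
    rows = sorted(transactions)
    if not items or not rows:
        return False
    # Support: enough transactions approximately support x.
    if len(rows) < min_sup * len(D):
        return False
    # Row constraint: each transaction holds at least (1 - eps_r) of the items.
    row_ok = all(
        sum(D[i][j] for j in items) >= (1 - eps_r) * len(items) for i in rows
    )
    # Column constraint: each item occurs in at least (1 - eps_c) of the transactions.
    col_ok = all(
        sum(D[i][j] for i in rows) >= (1 - eps_c) * len(rows) for j in items
    )
    return row_ok and col_ok
```

With `eps_r = eps_c = 0` the check reduces to exact frequent itemset support.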

  8. Approximate Closed Itemset Mining • Mine the set of core patterns with min_sup = α·s • Treat core patterns as the initial seeds for possible further extension to approximate frequent itemsets • C: the set of core patterns • L: a lattice built over C
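
The first step, mining core patterns at the lowered threshold α·s, can be sketched with a naive level-wise search. This is an illustrative stand-in for whatever exact frequent itemset miner is actually used; the function name, the relative-support convention, and the toy matrix are assumptions.

```python
from itertools import combinations

# Toy n x m binary matrix (made-up data): rows are transactions, columns are items.
D = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 1, 1, 0],
]


def mine_core_patterns(D, min_sup, alpha):
    """Return {itemset: exact supporting transactions} for every itemset whose
    exact support is at least alpha * min_sup (a naive level-wise search)."""
    n, m = len(D), len(D[0])
    threshold = alpha * min_sup * n  # minimum number of exactly supporting rows
    core = {}
    frontier = [frozenset([j]) for j in range(m)]
    while frontier:
        level = {}
        for x in frontier:
            tids = frozenset(
                i for i, row in enumerate(D) if all(row[j] == 1 for j in x)
            )
            if len(tids) >= threshold:
                level[x] = tids
        core.update(level)
        # Next candidates: unions of two surviving patterns that grow by one item.
        frontier = list(
            {a | b for a, b in combinations(level, 2) if len(a | b) == len(a) + 1}
        )
    return core
```

The returned dictionary plays the role of C, and the subset relation among its keys gives the lattice L.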

  9. Candidate Approximate Itemset Generation

  10. Example • For core pattern ,the number of 0s allowed in a supporting transaction is • extension space for : traverse upward in the lattice for 2 levels (i.e., levels 2 and 3)

  11. For a core itemset y and each sub-pattern x, any transaction supporting x also approximately supports y •  is the union of the transaction sets • Ex:  is the union of the transaction sets of all itemsets at levels 2 and 3
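
The union described on this slide can be sketched as follows. The sketch assumes that the number of 0s allowed in a supporting transaction is the floor of |y| times a row error threshold `eps_r`, and that the relevant sub-patterns are those core patterns obtained from y by dropping at most that many items. The function name, `eps_r`, and the toy dictionary of core patterns are hypothetical.

```python
import math


def candidate_transactions(core, y, eps_r):
    """Union of the exact transaction sets of y and of every core sub-pattern
    of y that drops at most floor(|y| * eps_r) items."""
    y = frozenset(y)
    zeros_allowed = math.floor(len(y) * eps_r)
    result = set(core.get(y, ()))
    for x, tids in core.items():
        # x < y is a proper-subset test on frozensets.
        if x < y and len(y) - len(x) <= zeros_allowed:
            result |= tids
    return result


# Toy core patterns (made-up): itemset -> exact supporting transaction ids.
core = {
    frozenset({0, 1, 2}): {0, 2},
    frozenset({0, 1}): {0, 1, 2},
    frozenset({1, 2}): {0, 2, 3},
    frozenset({0, 2}): {0, 2},
    frozenset({1}): {0, 1, 2, 3, 9},
}
```

Every transaction in the result misses at most `zeros_allowed` items of y, which is why it can approximately support y.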

  12. Identify candidate approximate itemsets • Steps to identify candidate approximate itemsets include

  13. Pruning by

  14. Top-down Mining and Pruning by Closeness • Effective pruning by the closeness definition and the min_sup threshold • Starts with the largest pattern in L and proceeds level by level, in decreasing order of core pattern size.

  15. Figure (node labels only): {0,2,4}, {0,2,5}, {2,5}, {0,2,6}, {2,6}, {2,3,5,6}, {0,2}, {2}, {2,5}, {2,6}

  16. Example •  = {0,2,3,4,5,6} • The number of 0s allowed in a transaction is  => extension space includes its sub-patterns at level 2.

  17. => • Prune {a,b,c} without actual computation: (1) if {a,b,c,d} satisfies the min_sup threshold and the constraint, then, no matter whether it is closed or non-closed, {a,b,c} can be pruned; (2) if {a,b,c,d} does not satisfy min_sup, then {a,b,c} can be pruned

  18. Forward pruning

  19. Backward pruning

  20. Experiment • The IBM synthetic data generator • A dataset T10.I100.D20K is generated with 20K transactions, 100 distinct items and an average of 10 items per transaction
