
COBRA: Closed Sequential Pattern Mining Using Bi-phase Reduction Approach

COBRA: Closed Sequential Pattern Mining Using Bi-phase Reduction Approach. Kuo-Yu Hung, Chia-Hui Chang, Jiun-Hung Tung, Cheng-Tao Ho. DaWaK 2006. Outline: Introduction; Problem Definition; COBRA algorithm; Pruning strategies; Design and implementation; Experimental results; Conclusion.


Presentation Transcript


  1. COBRA: Closed Sequential Pattern Mining Using Bi-phase Reduction Approach • Kuo-Yu Hung, Chia-Hui Chang, Jiun-Hung Tung, Cheng-Tao Ho • DaWaK 2006

  2. Outline • Introduction • Problem Definition • COBRA algorithm • Pruning strategies • Design and implementation • Experimental results • Conclusion

  3. Introduction • CloSpan and BIDE adopt the PrefixSpan framework and grow patterns by itemset extension and sequence extension: the current sequence is extended with a frequent item placed either in its last transaction (itemset extension) or in a different, later transaction (sequence extension) • Drawbacks: duplicate item extensions and expensive matching cost

  4. Problem Definition • Absorb: if α is a proper super-sequence of β and the two have the same support, then α absorbs β • Closed sequential pattern: a sequential pattern β is closed if there exists no proper super-sequence α that absorbs β
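
The two definitions on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the helper names (`seq`, `is_subsequence`, `absorbs`, `closed_patterns`) and the support counts are hypothetical, and a sequence is modelled as a tuple of itemsets.

```python
from typing import Dict, FrozenSet, List, Tuple

Seq = Tuple[FrozenSet[str], ...]  # a sequence is an ordered tuple of itemsets


def seq(*itemsets: str) -> Seq:
    """Build a sequence from strings, one string per itemset, e.g. seq("A", "BC")."""
    return tuple(frozenset(s) for s in itemsets)


def is_subsequence(beta: Seq, alpha: Seq) -> bool:
    """True if the itemsets of beta embed, in order, into distinct itemsets of alpha."""
    pos = 0
    for itemset in beta:
        # Greedily take the earliest itemset of alpha that contains this one.
        while pos < len(alpha) and not itemset <= alpha[pos]:
            pos += 1
        if pos == len(alpha):
            return False
        pos += 1
    return True


def absorbs(alpha: Seq, beta: Seq, support: Dict[Seq, int]) -> bool:
    """alpha absorbs beta: proper super-sequence with the same support."""
    return (alpha != beta
            and is_subsequence(beta, alpha)
            and support[alpha] == support[beta])


def closed_patterns(support: Dict[Seq, int]) -> List[Seq]:
    """Keep only the patterns that no other pattern absorbs."""
    return [b for b in support
            if not any(absorbs(a, b, support) for a in support)]


# Hypothetical supports, invented for illustration only.
support = {seq("A"): 3, seq("A", "B"): 3, seq("AB"): 2, seq("B"): 4}
closed = closed_patterns(support)
```

Here the pattern ⟨A⟩ is absorbed by ⟨A⟩⟨B⟩ (same support 3), so it is the only one dropped from `closed`.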

  5. Problem Definition (cont.) • Sequence support: every non-empty subset of {A,B,C} has sequence support 4 • Transaction support: sup({B}) = sup({B,C}) = 8 and sup({A,B}) = sup({A,C}) = sup({A,B,C}) = 5, so {B}, {A,B} and {A,C} are not closed • With minsup = 3, the frequent closed itemsets are {A}, {C}, {B,C} and {A,B,C} • A closed sequential pattern is composed only of closed itemsets
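
To make the transaction-support argument concrete, the sketch below enumerates closed frequent itemsets by brute force. The example database from the slide is not reproduced in this transcript, so the ten transactions here are a hypothetical toy set chosen only to reproduce the support counts quoted above (8 for {B} and {B,C}, 5 for {A,B}, {A,C} and {A,B,C}). The function names are illustrative, and brute-force enumeration stands in for an efficient closed-itemset miner.

```python
from itertools import combinations


def transaction_support(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def closed_frequent_itemsets(transactions, minsup):
    """Brute force: list frequent itemsets, keep those with no equal-support superset."""
    items = sorted(set().union(*transactions))
    frequent = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            candidate = frozenset(combo)
            sup = transaction_support(candidate, transactions)
            if sup >= minsup:
                frequent[candidate] = sup
    # A superset with the same support is itself frequent, so checking
    # against `frequent` is enough to decide closedness.
    return {s: sup for s, sup in frequent.items()
            if not any(s < t and sup == frequent[t] for t in frequent)}


# Hypothetical transactions (not the slide's database): 5 x ABC, 3 x BC, 1 x A, 1 x C.
transactions = ([frozenset("ABC")] * 5 + [frozenset("BC")] * 3
                + [frozenset("A"), frozenset("C")])
closed_itemsets = closed_frequent_itemsets(transactions, minsup=3)
```

On this toy data the result contains exactly {A}, {C}, {B,C} and {A,B,C}, the same four closed itemsets the slide lists.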

  6. Three major phases of the COBRA algorithm • Phase 1: mine closed frequent itemsets using CHARM • Phase 2: encode the database (vertical-based and horizontal-based) • Phase 3: mine closed sequential patterns

  7. The COBRA algorithm • C.F.I: Closed Frequent Itemset • FML: First Matched Transaction List • Following the idea of PrefixSpan, the locally frequent (extendable) codes in the projected database of a prefix sequence are the frequent C.F.Is (closed frequent codes) • [Figure: vertical-based LocationList and FML over transactions 1 to 14]
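
One way to picture the vertical-based LocationList and the FML is the sketch below. It rests on assumptions the slide does not spell out: the encoded database is taken to be a list of sequences, each a list of (transaction id, set of codes) pairs with globally numbered transactions; the LocationList of a code is taken to be every transaction containing it; and the FML is taken to be the first such transaction in each sequence. The toy database is hypothetical, built so that codes 1 to 4 yield the FMLs quoted on the layer-pruning slide.

```python
from collections import defaultdict


def build_index(database):
    """Return (location_list, fml) for an encoded sequence database.

    location_list[code]: ids of all transactions that contain the code.
    fml[code]: id of the first matching transaction in each sequence
               (sequences that never contain the code contribute nothing).
    """
    location_list = defaultdict(list)
    fml = defaultdict(list)
    for sequence in database:
        seen = set()  # codes already matched in this sequence
        for tid, codes in sequence:
            for code in sorted(codes):
                location_list[code].append(tid)
                if code not in seen:
                    seen.add(code)
                    fml[code].append(tid)
    return dict(location_list), dict(fml)


# Hypothetical encoded database: four sequences, transactions numbered 1..14.
database = [
    [(1, {4}), (2, {1, 2, 3}), (3, {4}), (4, {2})],
    [(5, {2, 4}), (6, {1, 3}), (7, {4})],
    [(8, {2, 4}), (9, {3}), (10, {1}), (11, {3})],
    [(12, {4}), (13, {3}), (14, {1, 2})],
]
location_list, fml = build_index(database)
```

With this data, `fml[4]` is `[1, 5, 8, 12]` while `location_list[4]` also records the later occurrences in transactions 3 and 7.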

  8. Pruning strategies • Layer pruning: prune non-closed sequences during the sequence-extension step of a prefix sequence • Example: #1.FML = {2,6,10,14}, #2.FML = {2,5,8,14}, #3.FML = {2,6,9,13}, #4.FML = {1,5,8,12} • #1.FML >L #4.FML and #3.FML >L #4.FML, so prefixes #1 and #3 are skipped
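
The layer-pruning example can be replayed with a short sketch. The slide does not define the `>L` relation, so the code assumes it means "position by position, every entry is strictly later". That reading is consistent with the example, since #2 ties with #4 at transactions 5 and 8 and is not skipped. The function names are illustrative.

```python
def fml_greater(fml_a, fml_b):
    """Assumed meaning of `a.FML >L b.FML`: same length, every entry strictly later."""
    return (len(fml_a) == len(fml_b)
            and all(x > y for x, y in zip(fml_a, fml_b)))


def layer_prune(fmls):
    """Return the codes whose FML is >L the FML of some other code."""
    skipped = {a for a in fmls for b in fmls
               if a != b and fml_greater(fmls[a], fmls[b])}
    return sorted(skipped)


# FMLs exactly as listed on the slide, keyed by code number.
fmls = {
    1: [2, 6, 10, 14],
    2: [2, 5, 8, 14],
    3: [2, 6, 9, 13],
    4: [1, 5, 8, 12],
}
skipped = layer_prune(fmls)
```

Under this assumption `skipped` comes out as `[1, 3]`, matching the slide. Intuitively, code #4 always occurs earlier than #1 and #3 in every sequence, so any pattern starting with #1 or #3 can be extended with #4 in front without losing support.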

  9. Reduce the cost of comparing any two FMLs (a total of O(|C.F.I|²) comparisons): only C.F.Is that are hashed to the same bucket are compared with each other

  10. EL: Extended list • #2.EL = {3,6,9} • #4.EL = {2,6,9,13} • The number of transactions in the EL is the largest support that an extended sequence of α can have • [Figure: example database, transactions 1 to 14]

  11. PDB (Projected database) • #2.PDB = {3,4,6,7,9,10,11} • #4.PDB = {2,3,4,6,7,9,10,11,13,14} • [Figure: example database, transactions 1 to 14]
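
The EL and PDB values on these two slides can be reproduced from the FMLs on the layer-pruning slide, under two assumptions that are inferred from the numbers rather than stated in the slides: the 14 transactions form four sequences covering ids 1-4, 5-7, 8-11 and 12-14, and, for a prefix with a given FML, the PDB holds every transaction after the first match within the same sequence while the EL holds only the first of those. The names below are illustrative.

```python
# Assumed sequence boundaries (first id, last id), inferred from the example.
SEQUENCES = [(1, 4), (5, 7), (8, 11), (12, 14)]


def sequence_end(tid):
    """Last transaction id of the sequence that contains `tid`."""
    for first, last in SEQUENCES:
        if first <= tid <= last:
            return last
    raise ValueError(f"transaction {tid} is outside every sequence")


def projected_database(fml):
    """All transactions that follow the first match, sequence by sequence."""
    return [t for m in fml for t in range(m + 1, sequence_end(m) + 1)]


def extended_list(fml):
    """First projected transaction of each sequence that still has one."""
    return [m + 1 for m in fml if m < sequence_end(m)]


fml_2 = [2, 5, 8, 14]  # #2.FML from the layer-pruning slide
fml_4 = [1, 5, 8, 12]  # #4.FML from the layer-pruning slide
pdb_2, el_2 = projected_database(fml_2), extended_list(fml_2)
pdb_4, el_4 = projected_database(fml_4), extended_list(fml_4)
```

For #2, the first match in the last sequence is transaction 14, which is also that sequence's final transaction, so it contributes nothing to either list. That is why #2.EL has three entries and any extension of #2 can have support at most 3.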

  12. Phase 3: ExtPruning • No super-sequence of α can be generated as a frequent pattern • The supports of all super-sequences of α are less than the support of α • No extendable code has the same support as α

  13. ExtPruning: for two sequential patterns α and β, the ExtPruning rule states that • 1. if α.FML =L β.FML and α is a super-sequence of β, then β is removed (and vice versa) • 2. if Sup(α) = Sup(β) and α is a super-sequence of β, then β is not a closed pattern (and vice versa)

  14. Experimental Results

  15. Conclusion • COBRA uses more memory but takes less time
