1 / 16

PREFIXSPAN ALGORITHM

PREFIXSPAN ALGORITHM. Mining Sequential Patterns Efficiently by Prefix-Projected Pattern Growth. Contet. 1. Introduction. 2 . Problem statement. 3 . Existing Solutions. 4 . Proposed Solution. 5. A lgorithm. 6 . Conclusion. 7 . References. Introduction.

devaki
Download Presentation

PREFIXSPAN ALGORITHM

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. PREFIXSPAN ALGORITHM Mining Sequential Patterns Efficiently by Prefix-Projected Pattern Growth al113301m@student.etf.rs

  2. Contet 1. Introduction 2. Problem statement 3. Existing Solutions 4. Proposed Solution 5. Algorithm 6. Conclusion 7. References al113301m@student.etf.rs

  3. Introduction al113301m@student.etf.rs • Given a set of sequences, where each sequence consists of a list of elements and each element consists of set of items. • <a(abc)(ac)d(cf)> - 5 elements, 9 items • <a(abc)(ac)d(cf)> - 9-sequence • <a(abc)(ac)d(cf)> ≠ <a(ac)(abc)d(cf)>

  4. Subsequence vs. super sequence al113301m@student.etf.rs • Given two sequences α=<a1a2…an> and β=<b1b2…bm>. • α is called a subsequence of β, denoted as α⊆ β, if there exist integers 1≤j1<j2<…<jn≤msuch that a1⊆bj1, a2 ⊆bj2,…, an⊆bjn. • „β is a super sequence of α. • Example: β =<a(abc)(ac)d(cf)> Correct : α1=<aa(ac)d(c)> α2=<(ac)(ac)d(cf)> α3=<ac> Not correct: α4=<df(cf)> α5=<(cf)d> α6=<(abc)dcf

  5. Sequential Pattern Mining Solution – 53 frequent subsequences: <a><aa> <ab> <a(bc)> <a(bc)a> <aba> <abc> <(ab)> <(ab)c> <(ab)d> <(ab)f> <(ab)dc> <ac> <aca> <acb> <acc> <ad> <adc> <af> <b> <ba> <bc> <(bc)> <(bc)a> <bd> <bdc> <bf> <c> <ca> <cb> <cc> <d> <db> <dc> <dcb> <e> <ea> <eab> <eac> <eacb> <eb> <ebc> <ec> <ecb> <ef> <efb> <efc> <efcb> <f> <fb> <fbc> <fc> <fcb> min_support = 2 al113301m@student.etf.rs Find all the frequent subsequences, i.e. the subsequences whose occurrence frequency in the set of sequences is no less than min_support (user-specified).

  6. Existing Solutions • Apriori-like approaches (AprioriSome (1995.), AprioriAll (1995.), DynamicSome (1995.), GSP (1996.)): • Potentially huge set of candidate sequences, • „ Multiple scans of databases, • „ Difficulties at mining long sequential patterns. • FreeSpan (2000.) - pattern groth method (Frequent pattern-projected Sequential pattern mining)General idea is to use frequent items to recursively project sequence databases into a smaller projected databases ,and grow subsequence fragments in each projected database. • PrefixSpan (Prefix-projected Sequential pattern mining) • „ Less projections and quickly shrinking sequence. al113301m@student.etf.rs

  7. Prefix al113301m@student.etf.rs • Given two sequences α=<a1a2…an> and β=<b1b2…bm>, m≤n. • Sequence β is called a prefix of α if and only if: • bi= ai for i ≤ m-1; • bm ⊆ am; • Example : • α =<a(abc)(ac)d(cf)> • β =<a(abc)a>

  8. Projection al113301m@student.etf.rs • Given sequences α and β, such that β is a subsequence of α. • A subsequence α’ of sequence α is called a projection of α w.r.t. β prefix if and only if: • α’ has prefix β; • There exist no proper super-sequence α’’ of α’ such that:α’’ is a subsequence of α andalso has prefix β. • Example: • α =<a(abc)(ac)d(cf)> • β =<(bc)a> • α’ =<(bc)(ac)d(cf)>

  9. Postfix al113301m@student.etf.rs • Let α’ =<a1,a2…an> be the projection of α w.r.t. prefix β=<a1a2…am-1a’m> (m ≤n) „. • Sequence γ=<a’’mam+1…an> is called the postfix of α w.r.t. prefix β, denoted as γ= α/ β, where a’’m=(am - a’m).„ • We also denote α =β⋅ γ. • Example: • α’ =<a(abc)(ac)d(cf)>, • β =<a(abc)a>, • γ=<(_c)d(cf)>.

  10. PrefixSpan – Algorithm al113301m@student.etf.rs • Input of the algorithm : A sequence database S, and the minimum support threshold min_support. • „Output of the algorithm: The complete set of sequential patterns. • „Subroutine: PrefixSpan(α, L, S|α). • Parameters: • α: sequential pattern, • „L: the length of α; • „ S|α: : the α-projected database, if α ≠<>; otherwise; the sequence database S. • Call PrefixSpan(<>,0,S).

  11. PrefixSpan – Algorithm (2) al113301m@student.etf.rs Method: • 1. Scan S|α once, find the set of frequent items b such that: • b can be assembled to the last element of α to form a sequential pattern; or • <b> can be appended to α to form a sequential pattern. • 2. For each frequent item b: • append it to α to form a sequential pattern α’ and output α’; • output α’; • 3. For each α’: • construct α’-projected database S|α’ and • call PrefixSpan(α’, L+1, S|α’).

  12. PrefixSpan - Example <a><b><c><d><e><f> al113301m@student.etf.rs 1. Find length1sequential patterns: 2. Divide search space

  13. PrefixSpan – Example (2) <db> <dc> <dcb> al113301m@student.etf.rs Find subsets of sequential patterns:

  14. Conclusions al113301m@student.etf.rs • PrefixSpan • „ Efficient pattern growth method. • „ Outperforms both GSP and FreeSpan. • „ Explores prefix-projection in sequential pattern mining. • „ Mines the complete set of patterns,but reduces the effort of candidate subsequence generation. • „ Prefix-projection reduces the size of projected database and leads to efficient processing. • „ Bi-level projection and pseudo-projection may improve mining efficiency.

  15. References al113301m@student.etf.rs Pei J., Han J., Mortazavi-Asl J., Pinto H., Chen Q., Dayal U., Hsu M., “PrefixSpan: Mining Sequential Patterns Efficiently by Prefix-Projected Pattern Growth”, 17th International Conference on Data Engineering (ICDE), April 2001. Agrawal R., Srikant R., “Mining sequential patterns”, Proceedings 1995 Int. Conf. Very Large Data Bases (VLDB’94), pp. 487-499, 1999. Han J., Dong G., Mortazavi-Asl B., Chen Q., Dayal U., Hsu M.-C., ”Freespan: Frequent pattern-projected sequential pattern mining”, Proceedings 2000 Int. Conf. Knowledge Discovery and Data Mining (KDD’00), pp. 355-359, 2000. WojciechStach, “http://webdocs.cs.ualberta.ca/~zaiane/courses/cmput695-04/slides/PrefixSpan-Wojciech.pdf”.

  16. Thank you for attention. Questions?Lazar Arsić2011/3301AL113301m@student.etf.rs al113301m@student.etf.rs

More Related