
Mining Sequential Patterns with Constraints in Large Database

Jian Pei, Jiawei Han, Wei Wang. Proc. of the 2002 IEEE International Conference on Data Mining (ICDM’02). Adviser: Jia-Ling Koh. Speaker: Yu-ting Kung.


Presentation Transcript


  1. Mining Sequential Patterns with Constraints in Large Database • Jian Pei, Jiawei Han, Wei Wang • Proc. of the 2002 IEEE International Conference on Data Mining (ICDM’02) • Adviser: Jia-Ling Koh • Speaker: Yu-ting Kung

  2. Introduction • In past studies, two problems remain: • Many practical constraints are not covered • There is no systematic method for pushing various constraints into the mining process • In this paper: • A framework, Prefix-growth, is developed on the basis of a prefix-monotone property • Under this new framework, the constraints can be pushed deep into sequential pattern mining effectively and efficiently

  3. Categories of constraints • Item constraint • For example: • Length constraint: the number of transactions or occurrences of items… • For example:
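The following minimal Python sketch makes the item and length constraints concrete. It assumes a sequence is a tuple of itemsets (frozensets), and it shows one possible form of item constraint: every item must come from an allowed set. The slide's own example formulas were not preserved. The names `seq`, `item_constraint`, `num_transactions` and `num_item_occurrences` are hypothetical helpers, not taken from the paper.

```python
from typing import FrozenSet, Set, Tuple

# Assumed representation: a sequence is a tuple of itemsets (transactions).
Sequence = Tuple[FrozenSet[str], ...]


def seq(*itemsets: str) -> Sequence:
    """Build a sequence from strings, e.g. seq("a", "bc", "d") for <a(bc)d>."""
    return tuple(frozenset(s) for s in itemsets)


def item_constraint(alpha: Sequence, allowed: Set[str]) -> bool:
    """One possible item constraint: every item of alpha is in `allowed`."""
    return all(item in allowed for itemset in alpha for item in itemset)


def num_transactions(alpha: Sequence) -> int:
    """Length measured as the number of transactions (itemsets)."""
    return len(alpha)


def num_item_occurrences(alpha: Sequence) -> int:
    """Length measured as the total number of item occurrences."""
    return sum(len(itemset) for itemset in alpha)


alpha = seq("a", "bc", "d")  # the pattern <a(bc)d>
```

For <a(bc)d>, the two length measures differ: the pattern has 3 transactions but 4 item occurrences.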

  4. Categories of constraints (Cont.) • Super-pattern constraint, where P is a given set of patterns • For example: • Aggregate constraint • Aggregate functions: sum, avg, max, min, etc. • For example: we want sequential patterns in which the average price of all the items in each pattern is over $100
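Below is a hedged sketch of the super-pattern and aggregate constraints. It reuses the tuple-of-itemsets representation. The slide's super-pattern formula was lost, so the sketch assumes the reading "α contains at least one pattern of P as a subsequence". Note that requiring all patterns of P would also be monotonic. The prices and the names `contains`, `super_pattern` and `avg_price` are hypothetical.

```python
from statistics import mean
from typing import Dict, FrozenSet, Iterable, Tuple

Sequence = Tuple[FrozenSet[str], ...]


def seq(*itemsets: str) -> Sequence:
    return tuple(frozenset(s) for s in itemsets)


def contains(alpha: Sequence, gamma: Sequence) -> bool:
    """True if gamma is a subsequence of alpha.

    Each itemset of gamma is greedily matched to the earliest later itemset
    of alpha that includes it. Earliest matching never loses a solution.
    """
    pos = 0
    for itemset in gamma:
        while pos < len(alpha) and not itemset <= alpha[pos]:
            pos += 1
        if pos == len(alpha):
            return False
        pos += 1
    return True


def super_pattern(alpha: Sequence, patterns: Iterable[Sequence]) -> bool:
    """Assumed reading: alpha contains at least one given pattern."""
    return any(contains(alpha, gamma) for gamma in patterns)


def avg_price(alpha: Sequence, price: Dict[str, float]) -> float:
    """Average price over all item occurrences in alpha."""
    return mean([price[item] for itemset in alpha for item in itemset])


price = {"a": 150.0, "b": 80.0, "c": 120.0, "d": 110.0}  # hypothetical prices
alpha = seq("a", "bc", "d")
```

With these made-up prices, avg_price(alpha, price) is 115.0, so <a(bc)d> satisfies the "average price over $100" example.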

  5. Categories of constraints (Cont.) • Regular expression constraint • A constraint specified as a regular expression • For example: • Duration constraint • Gap constraint • For example: find purchasing patterns such that “the gap between each pair of consecutive purchases is less than 1 month”
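Here is a minimal sketch of the remaining three categories. It simplifies sequences to single-character items, so Python's `re.fullmatch` can test a regular-expression constraint directly. It also assumes that duration and gap are measured on the timestamps of one occurrence of a pattern. The pattern `a*(bb|dd)`, the timestamps and the function names are hypothetical illustrations.

```python
import re
from typing import Sequence


def satisfies_regex(items: Sequence[str], pattern: str) -> bool:
    """Regular-expression constraint on a sequence of single-character items."""
    return re.fullmatch(pattern, "".join(items)) is not None


def duration(timestamps: Sequence[float]) -> float:
    """Time span from the first to the last transaction of an occurrence."""
    return timestamps[-1] - timestamps[0]


def satisfies_gap(timestamps: Sequence[float], max_gap: float) -> bool:
    """Each pair of consecutive transactions is less than `max_gap` apart."""
    return all(later - earlier < max_gap
               for earlier, later in zip(timestamps, timestamps[1:]))


purchase_days = [0, 20, 45]  # hypothetical purchase dates, in days
```

With a 30-day stand-in for "1 month", purchases on days 0, 20 and 45 pass the gap constraint: the gaps are 20 and 25 days, and the duration is 45 days.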

  6. Characterization of constraints • Anti-monotonic • A constraint C is anti-monotonic if, whenever a sequence α satisfies C, every non-empty subsequence of α also satisfies C • For example: dur(α) < 3 • Monotonic • A constraint C_M is monotonic if, whenever a sequence α satisfies C_M, every super-sequence of α also satisfies C_M • For example: len(α) >= 10, super-pattern constraints • Succinct constraint • For example: item constraint
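The sketch below tests the two definitions by brute force on a single sequence of single items. It enumerates all non-empty subsequences of that sequence. Such a check can refute a property with a counterexample, but it cannot prove the property in general. The prices and the example constraints `max_le_100` and `len_ge_2` are hypothetical stand-ins, since dur(α) would need timestamps.

```python
from itertools import combinations
from typing import Callable, Iterator, Tuple

Seq = Tuple[str, ...]
Constraint = Callable[[Seq], bool]


def subsequences(alpha: Seq) -> Iterator[Seq]:
    """All non-empty subsequences of alpha, including alpha itself."""
    for k in range(1, len(alpha) + 1):
        for idx in combinations(range(len(alpha)), k):
            yield tuple(alpha[i] for i in idx)


def anti_monotone_holds_on(alpha: Seq, c: Constraint) -> bool:
    """If alpha satisfies c, every non-empty subsequence must satisfy c."""
    return not c(alpha) or all(c(s) for s in subsequences(alpha))


def monotone_holds_on(alpha: Seq, c: Constraint) -> bool:
    """alpha is a super-sequence of each of its subsequences, so if any
    subsequence satisfies a monotonic c, alpha must satisfy it as well."""
    return c(alpha) or not any(c(s) for s in subsequences(alpha))


price = {"a": 50.0, "b": 90.0, "c": 120.0}  # hypothetical item prices


def max_le_100(s: Seq) -> bool:
    return max(price[i] for i in s) <= 100  # anti-monotonic


def len_ge_2(s: Seq) -> bool:
    return len(s) >= 2  # monotonic, in the spirit of len(alpha) >= 10
```

len_ge_2 fails the anti-monotone check on <abc>, because the subsequence <a> is too short. It passes the monotone check.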

  7. Characterization of constraints (Cont.)

  8. Prefix-Monotone Property • Prefix anti-monotonic: for each sequence α satisfying the constraint, every prefix of α also satisfies it • Prefix monotonic: for each sequence α satisfying the constraint, every sequence having α as a prefix also satisfies it • A constraint is called prefix-monotone if it is prefix anti-monotonic or prefix monotonic
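To see why the prefix-based definitions are weaker than the classical ones, consider a hypothetical constraint `starts_with_a`: "the first item of the pattern is a". It is neither anti-monotonic nor monotonic, yet it is both prefix anti-monotonic and prefix monotonic. The sketch below checks the prefix definitions on one single-item sequence. As before, this can only refute a property, not prove it.

```python
from typing import Callable, List, Tuple

Seq = Tuple[str, ...]
Constraint = Callable[[Seq], bool]


def prefixes(alpha: Seq) -> List[Seq]:
    """Non-empty prefixes of alpha, shortest first."""
    return [alpha[:k] for k in range(1, len(alpha) + 1)]


def prefix_anti_monotone_holds_on(alpha: Seq, c: Constraint) -> bool:
    """If alpha satisfies c, every prefix of alpha must satisfy c."""
    return not c(alpha) or all(c(p) for p in prefixes(alpha))


def prefix_monotone_holds_on(alpha: Seq, c: Constraint) -> bool:
    """alpha extends each of its prefixes, so if a prefix satisfies a
    prefix-monotonic c, alpha must satisfy it as well."""
    return c(alpha) or not any(c(p) for p in prefixes(alpha))


def starts_with_a(alpha: Seq) -> bool:
    """Hypothetical constraint: the pattern begins with item 'a'."""
    return len(alpha) > 0 and alpha[0] == "a"
```

<a> satisfies starts_with_a while its super-sequence <ba> does not, so the constraint is not monotonic. <ab> satisfies it while its subsequence <b> does not, so it is not anti-monotonic. Every prefix and every extension of a satisfying pattern still begins with a.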

  9. Theorem • All the commonly used constraints discussed above, except for g_sum and average, have the prefix-monotone property
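A small counterexample shows why an average constraint such as avg(α) ≤ v falls outside the theorem. The item values and the threshold V below are made up for illustration.

```python
from statistics import mean
from typing import Tuple

Seq = Tuple[str, ...]

price = {"a": 120.0, "b": 20.0, "c": 300.0}  # hypothetical item values
V = 100.0  # hypothetical threshold


def avg_le_v(alpha: Seq) -> bool:
    """The constraint avg(alpha) <= V over item values."""
    return mean([price[i] for i in alpha]) <= V


# Not prefix anti-monotonic: <ab> (avg 70) satisfies it, its prefix <a> (avg 120) does not.
# Not prefix monotonic: <b> (avg 20) satisfies it, its extension <bc> (avg 160) does not.
```

Appending a cheap item can rescue a failing prefix, and appending an expensive one can break a satisfying prefix. That is why average, like g_sum, gets the special handling on slide 13.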

  10. Push Prefix-Monotone Constraints into Sequential Pattern Mining • Regular expression • Min_sup = 2
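A regular expression constraint is not prefix anti-monotonic as stated, because a prefix of a match is usually not a match itself. One way to push it into prefix growth is to keep a prefix only while it can still be completed into a match. That "viable prefix" test is prefix anti-monotonic. The sketch hand-codes a DFA for a hypothetical expression `a*(bb|dd)` over single items, because the slide's own expression and database were not preserved. In this DFA every state can reach the accepting state, so a prefix is viable exactly when the DFA never gets stuck.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical DFA for a*(bb|dd): state 0 loops on 'a'; 'b' or 'd' commits
# to one branch; state 3 (after bb or dd) accepts and has no outgoing edges.
TRANSITIONS: Dict[Tuple[int, str], int] = {
    (0, "a"): 0, (0, "b"): 1, (0, "d"): 2, (1, "b"): 3, (2, "d"): 3,
}
START, ACCEPTING = 0, {3}


def run(items: Iterable[str]) -> Optional[int]:
    """State reached after reading `items`, or None if the DFA gets stuck."""
    state = START
    for item in items:
        nxt = TRANSITIONS.get((state, item))
        if nxt is None:
            return None
        state = nxt
    return state


def viable_prefix(items: Iterable[str]) -> bool:
    """Can this prefix still be extended into a full match?"""
    return run(items) is not None


def matches(items: Iterable[str]) -> bool:
    """Does the whole sequence match the regular expression?"""
    return run(items) in ACCEPTING
```

For example, <aab> is a viable prefix but not yet a match. <abd> can be discarded together with every sequence that extends it.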

  11. Push Prefix-Monotone Constraints into Sequential Pattern Mining (Cont.) • Mining steps: • Find length-1 sequential patterns and remove irrelevant sequences • Patterns <a>, <b>, <c>, <d>, <e> are identified as length-1 patterns; the infrequent item <f> is removed • The sequence with S_id = 10 is removed because it fails the constraint • Divide the set of sequential patterns into non-overlapping subsets • prefix <a>, prefix <b>, prefix <c>, prefix <d>, prefix <e>
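The first mining step can be sketched as a single database scan. It counts the support of each item, drops infrequent items from every sequence, and discards sequences that cannot contribute to any pattern satisfying the constraint. The toy database and the relevance test `could_match` are hypothetical, not the slide's data. The test is a necessary condition for the example expression a*(bb|dd): a sequence needs at least two b's or two d's. The returned frequent items are the length-1 prefixes that split the remaining search into disjoint subsets.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Seq = Tuple[str, ...]


def first_scan(db: Dict[int, Seq], min_sup: int,
               relevant: Callable[[Seq], bool]) -> Tuple[List[str], Dict[int, Seq]]:
    """Return the frequent items (length-1 prefixes) and the reduced database."""
    # Support = number of sequences containing the item (counted once each).
    support = Counter(item for s in db.values() for item in set(s))
    frequent = {item for item, n in support.items() if n >= min_sup}
    reduced: Dict[int, Seq] = {}
    for sid, s in db.items():
        kept = tuple(item for item in s if item in frequent)  # drop infrequent items
        if relevant(kept):  # drop sequences that cannot support any answer
            reduced[sid] = kept
    return sorted(frequent), reduced


def could_match(s: Seq) -> bool:
    """Hypothetical relevance test for a*(bb|dd): needs two b's or two d's."""
    return s.count("b") >= 2 or s.count("d") >= 2


toy_db = {10: ("a", "c", "f"), 20: ("a", "b", "b"),
          30: ("d", "a", "d"), 40: ("b", "d", "d")}
items, reduced = first_scan(toy_db, 2, could_match)
```

In this toy run, c and f are infrequent. After they are removed, sequence 10 is reduced to <a> and is dropped as irrelevant. The frequent items a, b and d become the three non-overlapping prefix subsets.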

  12. Push Prefix-Monotone Constraints into Sequential Pattern Mining (Cont.) • Construct the <a>-projected database and mine it • SDB|<a> = {<(_b)(bc)dd>, <(_e)(abc)(dd)>, <ddcb>} • Locally frequent items that satisfy the constraint: • prefix <ab>, prefix <ac>, prefix <ad> • Recursive mining • To mine patterns with prefixes <ab>, <ac> and <ad>, form their projected databases • Final patterns output • {<a(bc)d>, <add>}
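The projection-and-recursion steps above can be sketched as a PrefixSpan-style miner. This is not the authors' prefix-growth implementation, and it handles only single-item sequences, not itemsets such as (bc). A prefix is extended only while it is frequent and passes `prefix_ok`, which must be prefix anti-monotonic so that pruning never loses an answer. `output_ok` decides which patterns are reported. The DFA for a*(bb|dd), the toy database and all function names are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional, Tuple

Seq = Tuple[str, ...]

# Hypothetical DFA for a*(bb|dd) over single items; state 3 accepts.
TRANSITIONS = {(0, "a"): 0, (0, "b"): 1, (0, "d"): 2, (1, "b"): 3, (2, "d"): 3}
ACCEPTING = {3}


def run(items: Seq) -> Optional[int]:
    """DFA state after reading `items`, or None if the DFA gets stuck."""
    state = 0
    for item in items:
        nxt = TRANSITIONS.get((state, item))
        if nxt is None:
            return None
        state = nxt
    return state


def prefix_growth(db: List[Seq], min_sup: int,
                  prefix_ok: Callable[[Seq], bool],
                  output_ok: Callable[[Seq], bool]) -> Dict[Seq, int]:
    results: Dict[Seq, int] = {}

    def mine(prefix: Seq, projected: List[Seq]) -> None:
        # Local support: number of projected suffixes containing each item.
        counts = Counter(item for suffix in projected for item in set(suffix))
        for item in sorted(counts):
            if counts[item] < min_sup:
                continue  # infrequent: no extension can be frequent either
            new_prefix = prefix + (item,)
            if not prefix_ok(new_prefix):
                continue  # prefix anti-monotone: every extension fails too
            if output_ok(new_prefix):
                results[new_prefix] = counts[item]
            # <new_prefix>-projected database: suffix after the first occurrence.
            next_db = [s[s.index(item) + 1:] for s in projected if item in s]
            mine(new_prefix, next_db)

    mine((), db)
    return results


toy_db = [("a", "b", "b"), ("a", "d", "d"), ("b", "d", "d"), ("a", "d", "a", "d")]
patterns = prefix_growth(toy_db, min_sup=2,
                         prefix_ok=lambda p: run(p) is not None,
                         output_ok=lambda p: run(p) in ACCEPTING)
```

With min_sup = 2, this toy run reports <add> (support 2) and <dd> (support 3). Prefixes such as <ab> are abandoned as soon as they become infrequent, and prefixes that fall out of the DFA would be abandoned even if frequent.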

  13. Handling tough aggregate constraints • Constraint: • Min_sup = 2 • An item i is called a small item if its value i.value <= 25; otherwise, it is called a big item
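The slide's constraint itself was lost. The sketch below assumes it has the form avg(α) ≤ v with v = 25, matching the small/big threshold and the avg(α) ≤ v constraint used in the experiments. Under that assumption, the split into small and big items gives a simple necessary condition. If every item of a pattern were big, its average would exceed v, so every satisfying pattern contains a small item. This is only an illustration of the small/big distinction, not the paper's full handling of tough constraints. The item values and function names are hypothetical.

```python
from statistics import mean
from typing import Dict, Iterable, Set, Tuple

V = 25.0  # threshold from the slide: value <= 25 means "small item"


def small_items(values: Dict[str, float], v: float = V) -> Set[str]:
    """Items whose value is at most v; all other items are big."""
    return {item for item, x in values.items() if x <= v}


def avg_le(alpha: Tuple[str, ...], values: Dict[str, float], v: float = V) -> bool:
    """The assumed tough constraint avg(alpha) <= v."""
    return mean([values[item] for item in alpha]) <= v


def may_support(data_seq: Iterable[str], small: Set[str]) -> bool:
    """A data sequence with no small item cannot support any pattern
    satisfying avg <= v, so it can be skipped."""
    return any(item in small for item in data_seq)


values = {"a": 10.0, "b": 40.0, "c": 30.0}  # hypothetical item values
small = small_items(values)
```

Here only a is small. <ab> satisfies the constraint with an average of exactly 25, while any data sequence built only from b and c cannot support a qualifying pattern.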

  14. Experimental results • Comparison of the efficiency of mining sequential patterns without constraints

  15. Experimental results (Cont.) • Comparison of the efficiency of mining sequential patterns with constraints • Capability of GSP and prefix-growth in pushing an anti-monotonic constraint (dur(α) <= t)

  16. Experimental results (Cont.) • Experimental results on mining with regular expression constraint

  17. Experimental results (Cont.) • Scalability of prefix-growth with constraint avg(α) ≤ v • Number of projected databases in prefix-growth with constraint avg(α) ≤ v

  18. Experimental results (Cont.) • Scalability of prefix-growth w.r.t. support threshold

  19. Experimental results (Cont.) • Scalability of prefix-growth w.r.t. database size

  20. Conclusion • The prefix-monotone property covers many commonly used constraints • Experimental results and a performance study show that prefix-growth is efficient and scalable in mining large databases
