Sequential Patterns & Process Mining


  1. Sequential Patterns & Process Mining: Current State of Research. Edgar de Graaf, LIACS

  2. Mining Sequential Patterns • Sequential Patterns • Sequence Databases • AprioriAll • PrefixSpan • Gap Constraints

  3. Sequential Patterns • A sequence is an ordered list of itemsets, e.g. <(a,b)(c)(a,b,d)>, in general < a1, a2, a3 > • <(3)(4,5)(8)> is contained in <(7)(3,8)(9)(4,5,6)(8)> • <(3)(4,5)(8)> is not contained in <(7)(3,8)(9)(4)(5,6)(8)>
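
The containment test on slide 3 can be sketched in a few lines of Python. This is a minimal sketch, assuming itemsets are represented as Python sets and a sequence as a list of sets; the function name `is_contained` is illustrative and not taken from the slides.

```python
def is_contained(pattern, sequence):
    """Return True if each itemset of `pattern` is a subset of a distinct
    itemset of `sequence`, with the itemsets matched in the same order."""
    pos = 0
    for itemset in pattern:
        # Advance to the first remaining itemset that covers this one.
        while pos < len(sequence) and not itemset <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1  # the next pattern itemset must match strictly later
    return True


pattern = [{3}, {4, 5}, {8}]
print(is_contained(pattern, [{7}, {3, 8}, {9}, {4, 5, 6}, {8}]))     # True
print(is_contained(pattern, [{7}, {3, 8}, {9}, {4}, {5, 6}, {8}]))   # False
```

In the second sequence the items 4 and 5 never occur in the same itemset, so (4,5) cannot be matched.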

  4. Sequential databases The Database with sequences

  5. Sequential databases • A generated candidate pattern: <(3)(4,5)(8)> • Support count: 0

  6. Sequential databases • <(3)(4,5)(8)> • Support count: 0 → 1

  7. Sequential databases • <(3)(4,5)(8)> not contained → not counted • Support count: 1

  8. Sequential databases • Each sequence that contains the pattern is counted: support count 1 → 2 → 3 → 4 → 5 • IF Minimal Support ≤ 50% THEN <(3)(4,5)(8)> is frequent
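
The support counting walked through on slides 5 to 8 might look as follows. This is only a sketch: the database from slide 4 is not preserved in the transcript, so the two-sequence `database` below simply reuses the sequences from slide 3, and the helper names are assumptions.

```python
def contains(pattern, sequence):
    """Ordered containment: each pattern itemset is a subset of a later itemset."""
    remaining = iter(sequence)
    # `any` consumes the iterator up to the first match, so order is respected.
    return all(any(p <= s for s in remaining) for p in pattern)


def support(pattern, database):
    """Return (support count, fraction of sequences containing the pattern)."""
    count = sum(1 for seq in database if contains(pattern, seq))
    return count, count / len(database)


# Illustrative database only; not the one shown on the slides.
database = [
    [{7}, {3, 8}, {9}, {4, 5, 6}, {8}],    # contains the candidate
    [{7}, {3, 8}, {9}, {4}, {5, 6}, {8}],  # does not contain it
]
count, fraction = support([{3}, {4, 5}, {8}], database)
min_sup = 0.5
print(count, fraction, fraction >= min_sup)  # 1 0.5 True
```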

  9. Lifting order (1) • Notation by examples • <A,B,C>, an ordered list of sets ≡ sequence • Each of the sets A, B and C is unordered, e.g. A = (x,y,z) = (y,z,x) = (z,y,x) = … • [x,y,z] is an extension: we ignore the order of x, y and z when counting frequency

  10. Lifting order (2) • <(t1)(t2)(t3)(t4)> and <(t1)(t3)(t2)(t4)> frequent → <(t1)(t3,t2)(t4)> is frequent • Says: t3 and t2 occur frequently in-between t1 and t4, in either order

  11. Lifting Order (3) • Suppose <(t1)(t2)(t3)(t4)> and <(t1)(t3)(t2)(t4)> are both infrequent, but (t1)[t3,t2](t4) is frequent • Says: t3 and t2 often occur in-between t1 and t4, even though neither fixed order is frequent on its own
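
The situation on slide 11 can be reproduced with a small sketch. It assumes one possible reading of the [x,y] notation: a sequence supports the lifted pattern if it contains the bracketed tasks in any order. The ten-sequence database and the function names are hypothetical, and every itemset holds a single task to keep the code short.

```python
from itertools import permutations


def is_subsequence(pattern, sequence):
    """True if the tasks of `pattern` occur in `sequence` in the same order."""
    remaining = iter(sequence)
    return all(task in remaining for task in pattern)


def support(pattern, database):
    return sum(is_subsequence(pattern, seq) for seq in database)


def lifted_support(before, unordered, after, database):
    """Count sequences containing before + (any ordering of unordered) + after."""
    orderings = [before + list(p) + after for p in permutations(unordered)]
    return sum(any(is_subsequence(o, seq) for o in orderings) for seq in database)


# Hypothetical database of ten sequences.
database = (
    [["t1", "t2", "t3", "t4"]] * 3
    + [["t1", "t3", "t2", "t4"]] * 3
    + [["t1", "t4"]] * 4
)

print(support(["t1", "t2", "t3", "t4"], database))              # 3
print(support(["t1", "t3", "t2", "t4"], database))              # 3
print(lifted_support(["t1"], ["t2", "t3"], ["t4"], database))   # 6
```

With a minimal support of 50%, each fixed ordering (3 of 10) is infrequent, while the lifted pattern (6 of 10) is frequent.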

  12. Existing Algorithms • AprioriAll: the first algorithm based on the anti-monotone principle (every subsequence of a frequent sequence is itself frequent) • PrefixSpan: currently the fastest algorithm around; it uses projected databases

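  13. AprioriAll (1)

      AprioriAll(DB, min_sup) {
          L1 = {frequent sequences of size 1}
          k = 2
          while (Lk-1 is not empty) {
              Ck = candidateGeneration(Lk-1, k)       // join frequent (k-1)-sequences
              Ck = candidatePruning(Ck, k)            // drop candidates with an infrequent subsequence
              Lk = supportBasedPruning(Ck, DB, min_sup)   // keep candidates with support ≥ min_sup
              k++
          }
          return L1 ∪ L2 ∪ … ∪ Lk-1
      }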
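
The loop on slide 13 can be made concrete with a short Python sketch. It is a simplification, not the full AprioriAll algorithm: every itemset is assumed to hold a single item, `min_sup` is an absolute count, and the example database is invented for illustration.

```python
def is_subsequence(pattern, sequence):
    remaining = iter(sequence)
    return all(item in remaining for item in pattern)


def support(pattern, db):
    return sum(is_subsequence(pattern, seq) for seq in db)


def apriori_all(db, min_sup):
    """Level-wise mining of frequent sequences of single items."""
    items = sorted({item for seq in db for item in seq})
    level = [(i,) for i in items if support((i,), db) >= min_sup]  # L1
    frequent = list(level)
    k = 2
    while level:
        previous = set(level)
        # Candidate generation: join p and q when p minus its first item
        # equals q minus its last item.
        candidates = {p + (q[-1],) for p in level for q in level
                      if p[1:] == q[:-1]}
        # Candidate pruning: every (k-1)-subsequence must be frequent.
        candidates = {c for c in candidates
                      if all(c[:i] + c[i + 1:] in previous for i in range(k))}
        # Support-based pruning against the database.
        level = sorted(c for c in candidates if support(c, db) >= min_sup)
        frequent.extend(level)
        k += 1
    return frequent


db = [("a", "b", "c"), ("a", "c"), ("a", "b", "c", "d"), ("b", "d")]
for pattern in apriori_all(db, min_sup=2):
    print(pattern)
```

On this database the candidate ("a", "b", "d") is removed by candidate pruning without a database scan, because its subsequence ("a", "d") is not frequent.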

  14. PrefixSpan (1) Assume that the prefix = <(a,b)(c)> • Scan the projected database to find every frequent item x such that • <(a,b)(c,x)> is frequent or • <(a,b)(c)(x)> is frequent • Append x to the prefix and output the pattern • Now call recursively, e.g. PrefixSpan(<(a,b)(c,x)>, newProjDB)
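
The recursion on slide 14 can be sketched as follows. This is a reduced version under stated assumptions: every itemset holds a single item, so only the sequence extension <…(c)(x)> is covered and the itemset extension <…(c,x)> is left out; `min_sup` is an absolute count and the database is invented for illustration.

```python
from collections import Counter


def prefixspan(db, min_sup):
    """Return (pattern, support) pairs for sequences of single items."""
    results = []

    def mine(prefix, projected):
        # Count each item once per suffix in the projected database.
        counts = Counter()
        for suffix in projected:
            counts.update(set(suffix))
        for item, count in sorted(counts.items()):
            if count < min_sup:
                continue
            pattern = prefix + (item,)
            results.append((pattern, count))
            # Project: keep what follows the first occurrence of `item`.
            new_projected = [s[s.index(item) + 1:] for s in projected if item in s]
            mine(pattern, new_projected)

    mine((), [tuple(seq) for seq in db])
    return results


db = [("a", "b", "c"), ("a", "c"), ("a", "b", "c", "d"), ("b", "d")]
for pattern, count in prefixspan(db, min_sup=2):
    print(pattern, count)
```

Each recursive call only scans the suffixes that follow the current prefix, which is the point of using projected databases.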

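  15. Gap Constraint • Simple idea: impose a maximal distance between the itemsets of a sequence that match consecutive itemsets of the pattern • E.g. sequence <(a)(c)(d)(e)>, pattern = <(a)(e)> and gap = 1: (c) and (d) both lie between (a) and (e), so this sequence is not counted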
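
A containment test with a gap constraint might be sketched like this. The slide does not define the gap precisely, so the code assumes that `max_gap` is the maximal number of itemsets that may be skipped between two consecutively matched itemsets; the function name is illustrative. A backtracking search is used because the leftmost match is not always the right one once gaps are limited.

```python
def contains_with_gap(pattern, sequence, max_gap):
    """Ordered containment where at most `max_gap` itemsets may be skipped
    between two consecutively matched itemsets (assumed definition)."""

    def match(p_idx, start, end):
        if p_idx == len(pattern):
            return True
        for pos in range(start, min(end, len(sequence))):
            if pattern[p_idx] <= sequence[pos]:
                # The next itemset must match within max_gap skips of pos.
                if match(p_idx + 1, pos + 1, pos + 2 + max_gap):
                    return True
        return False

    # The first pattern itemset may match anywhere in the sequence.
    return match(0, 0, len(sequence))


sequence = [{"a"}, {"c"}, {"d"}, {"e"}]
pattern = [{"a"}, {"e"}]
print(contains_with_gap(pattern, sequence, max_gap=1))  # False
print(contains_with_gap(pattern, sequence, max_gap=2))  # True
```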

  16. Process Mining • What is process mining? • Using D/F tables and graphs • Genetic Algorithms • Problem areas • Using sequential patterns

  17. What is process mining? (1) • The ordering of events is known, e.g. <(task A)(task B)(task C)> • Process mining constructs a Petri net [figure: example Petri net with node labels pay, ready, claim, register, to_be_evaluated, send_letter] Source: Workflow Management by W. van der Aalst and K. van Hee. (1997)

  18. What is process mining? (2) • Usability of process mining: • Given the audit trails, what is the workflow network? • Mined workflow network ≡ original design? (Delta Analysis) • Mined workflow network better than the original design? (Performance Analysis)

  19. Using D/F tables and graphs (1) • For every task a D/F table: • Intuition: if A is often followed by B then the probability of A causing B increases

  20. Using D/F tables and graphs (2) • A D/F graph is constructed: IF ((A→B ≥ N) AND (A > B ≥ σ) AND (B < A ≤ σ)) THEN connection A to B • More complicated rules deal with recursion and short loops
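
The connection rule above can be illustrated with a rough sketch. Several things here are assumptions, since the slides do not define the symbols: A > B is read as "how often A is directly followed by B", B < A as "how often B is directly followed by A", and the causality score A→B is replaced by a simple stand-in, (A>B − B<A) / (A>B + B<A + 1), which is not the metric behind the slide. The traces and thresholds are invented.

```python
from collections import Counter


def direct_follows(traces):
    """Count how often task a is directly followed by task b."""
    counts = Counter()
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            counts[(a, b)] += 1
    return counts


def df_graph(traces, n, sigma):
    """Apply the slide's IF-rule with a stand-in causality score."""
    follows = direct_follows(traces)
    edges = set()
    for (a, b), a_then_b in follows.items():
        b_then_a = follows.get((b, a), 0)
        # Stand-in for A→B (an assumption, not the slide's metric).
        score = (a_then_b - b_then_a) / (a_then_b + b_then_a + 1)
        if score >= n and a_then_b >= sigma and b_then_a <= sigma:
            edges.add((a, b))
    return edges


# Hypothetical audit trails: five regular traces and one with B and C swapped.
traces = ["ABCD"] * 5 + ["ACBD"]
print(sorted(df_graph(traces, n=0.5, sigma=2)))
# [('A', 'B'), ('B', 'C'), ('C', 'D')]
```

The single swapped trace produces the pairs (A,C), (C,B) and (B,D), which occur only once each and are filtered out by the σ threshold.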

  21. Using D/F tables and graphs (3) • D/F Graph example:

  22. Genetic Algorithms (1) • Step 1: Create an initial population of workflows • Step 2: Calculate their fitness using audit trails • Step 3: Create a child • Step 4: Mutate the child • Step 5: Repeat steps 3 to 4 to create the new population • Step 6: Go to step 2

  23. Genetic Algorithms (2) • Advantages: • Can deal with duplicate tasks and non-free choice. • Disadvantages: • The structure of the “chromosome” • How do we measure fitness? • How do we do cross-over and mutation?

  24. Problem Areas (1) • Hidden tasks • Duplicate tasks: when tasks have the same name [figure: net with tasks B and C]

  25. Problem Areas (2) • Mining non-free-choice [figure: net with tasks A, B, C, D, E]

  26. Problem Areas (3) • Mining loops, e.g. the trace ABCDBCD [figure: net with tasks A, B, C, D]

  27. Problem Areas (4) • Delta analysis: how do we compare two models? • Other problems: time, dealing with noise and incompleteness.

  28. Using sequential patterns • Mining loops? • Fitness measure in a GA? • Use in delta analysis? • Generate the important frequent subsequences to help the designer

  29. Further research in sequences • How about gaps between items in different item sets? • What type of frequent subsequences to use in fitness? • Lifting order, is it useful in workflow generation? • Further research of lifting order

  30. The End Thank you for your attention Edgar de Graaf edegraaf@liacs.nl
