Sequential Patterns & Process Mining

Current State of Research

Edgar de Graaf

LIACS

Mining Sequential Patterns
  • Sequential Patterns
  • Sequence Databases
  • AprioriAll
  • PrefixSpan
  • Gap Constraints
Sequential Patterns
  • A sequence is an ordered list of itemsets < a1, a2, a3 >, e.g. <(a,b)(c)(a,b,d)>

  • <(3)(4,5)(8)> contained in <(7)(3,8)(9)(4,5,6)(8)>
  • <(3)(4,5)(8)> not contained in <(7)(3,8)(9)(4)(5,6)(8)>
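
The containment test in the two examples above can be sketched as follows. This is a minimal illustration, assuming a sequence is stored as a Python list of sets; the function name `contains` is made up for this sketch and does not come from the slides.

```python
def contains(sequence, pattern):
    """Return True if `pattern` is contained in `sequence`.

    Both arguments are lists of sets (itemsets). Every pattern itemset must
    be a subset of a distinct sequence itemset, and the order must be kept.
    """
    pos = 0
    for itemset in pattern:
        # Greedy scan: take the earliest sequence itemset that covers this one.
        while pos < len(sequence) and not itemset <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1  # the next pattern itemset must match strictly later
    return True


pattern = [{3}, {4, 5}, {8}]
print(contains([{7}, {3, 8}, {9}, {4, 5, 6}, {8}], pattern))    # True
print(contains([{7}, {3, 8}, {9}, {4}, {5, 6}, {8}], pattern))  # False
```

In the second sequence the items 4 and 5 never occur in the same itemset, so (4,5) cannot be matched.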
Sequential databases

The Database with sequences

A generated candidate pattern: <(3)(4,5)(8)>

  • The support count starts at 0
  • Every database sequence that contains the pattern adds 1 to the support count
  • Not Contained → Not Counted
  • Here 5 sequences contain the pattern, so the support count ends at 5

IF Minimal Support ≤ 50% THEN <(3)(4,5)(8)> frequent
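
Support counting can be sketched as below. The database `toy_db` is invented for illustration (the database from the slides is not preserved in this transcript), and the helper names are assumptions of this sketch.

```python
def contains(sequence, pattern):
    """True if each pattern itemset is a subset of a later sequence itemset, in order."""
    pos = 0
    for itemset in pattern:
        while pos < len(sequence) and not itemset <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1
    return True


def support(database, pattern):
    """Number of database sequences that contain the pattern."""
    return sum(1 for sequence in database if contains(sequence, pattern))


def is_frequent(database, pattern, min_sup):
    """min_sup is a fraction of the database size, e.g. 0.5 for 50%."""
    return support(database, pattern) / len(database) >= min_sup


# Hypothetical toy database, not the one shown on the slides.
toy_db = [
    [{7}, {3, 8}, {9}, {4, 5, 6}, {8}],    # contained
    [{7}, {3, 8}, {9}, {4}, {5, 6}, {8}],  # not contained, not counted
    [{3}, {4, 5}, {8}],                    # contained
    [{1}, {2}],                            # not contained, not counted
]
pattern = [{3}, {4, 5}, {8}]
print(support(toy_db, pattern))           # 2
print(is_frequent(toy_db, pattern, 0.5))  # True
```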

Lifting order (1)
  • Notation by examples
    • <A,B,C>, an ordered list of sets ≡ sequence
    • Every set A, B and C is unordered. E.g. A = (x,y,z) = (y,z,x) = (z,y,x) = …
    • [x,y,z] is an extension of this notation: the order of x, y and z is ignored when counting frequency
Lifting order (2)
  • <(t1)(t2)(t3)(t4)> and <(t1)(t3)(t2)(t4)> frequent
  • <(t1)(t3,t2)(t4)> is frequent
  • Says: t3 and t2 occur frequently in between t1 and t4, in either order
Lifting order (3)
  • <(t1)(t2)(t3)(t4)> and <(t1)(t3)(t2)(t4)> infrequent
  • Suppose <(t1)[t3,t2](t4)> is frequent

  • Says: often t3 and t2 occur in-between t1 and t4
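
One possible reading of the [t3,t2] notation is sketched below: a sequence supports the lifted pattern if it contains the block's items in any order between the fixed parts. This interpretation, the single-item sequences, the toy data and the function names are all assumptions made for illustration.

```python
from itertools import permutations


def contains_items(sequence, items):
    """True if `items` occur in `sequence` in the given order (gaps allowed)."""
    remaining = iter(sequence)
    # `item in remaining` advances the iterator until the item is found.
    return all(item in remaining for item in items)


def lifted_support(database, before, unordered, after):
    """Count sequences containing before + any ordering of `unordered` + after."""
    count = 0
    for sequence in database:
        if any(contains_items(sequence, before + list(order) + after)
               for order in permutations(unordered)):
            count += 1
    return count


# Hypothetical toy database of single-item sequences.
db = [
    ["t1", "t2", "t3", "t4"],
    ["t1", "t3", "t2", "t4"],
    ["t1", "t2", "t3", "t4"],
    ["t1", "t3", "t2", "t4"],
    ["t1", "t4"],
]
fixed_23 = sum(contains_items(s, ["t1", "t2", "t3", "t4"]) for s in db)
fixed_32 = sum(contains_items(s, ["t1", "t3", "t2", "t4"]) for s in db)
lifted = lifted_support(db, ["t1"], ["t3", "t2"], ["t4"])
print(fixed_23, fixed_32, lifted)  # 2 2 4
```

With a minimal support count of 3, both fixed orders are infrequent in this toy data while the lifted pattern is frequent.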
Existing Algorithms
  • AprioriAll: the first algorithm based on the anti-monotone principle (every subsequence of a frequent sequence is also frequent)
  • PrefixSpan: currently the fastest algorithm around; it uses projected databases
AprioriAll (1)

AprioriAll(DB, min_sup){
    L1 = {frequent sequences of size 1}
    k = 2
    while(Lk-1 is not empty){
        Ck = candidateGeneration(Lk-1, k)
        Ck = candidatePruning(Ck, k)
        Lk = supportBasedPruning(Ck, DB, min_sup)
        k++
    }
    return the union of all Lk
}
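
The level-wise loop can be sketched in Python as follows. This is a simplified illustration, not the original AprioriAll: it assumes every sequence element is a single item (no itemsets) and takes the minimal support as an absolute count. All names are made up for this sketch.

```python
def is_subsequence(pattern, sequence):
    """True if the items of `pattern` occur in `sequence` in order."""
    remaining = iter(sequence)
    return all(item in remaining for item in pattern)


def apriori_sequences(db, min_count):
    """Return all frequent sequences (as tuples) of single items in `db`."""
    def frequent(candidates):
        # Support-based pruning: keep candidates found in enough sequences.
        return {c for c in candidates
                if sum(is_subsequence(c, s) for s in db) >= min_count}

    items = {item for sequence in db for item in sequence}
    level = frequent({(item,) for item in items})  # L1
    result = set(level)
    while level:
        # Candidate generation: join p and q when p minus its first item
        # equals q minus its last item.
        candidates = {p + (q[-1],) for p in level for q in level
                      if p[1:] == q[:-1]}
        # Candidate pruning (anti-monotone principle): dropping any single
        # item must give a sequence that was frequent on the previous level.
        candidates = {c for c in candidates
                      if all(c[:i] + c[i + 1:] in level for i in range(len(c)))}
        level = frequent(candidates)
        result |= level
    return result


# Hypothetical toy database.
db = [("a", "b", "c"), ("a", "c"), ("a", "b", "c", "d"), ("b", "c")]
print(sorted(apriori_sequences(db, min_count=3)))
# [('a',), ('a', 'c'), ('b',), ('b', 'c'), ('c',)]
```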

PrefixSpan (1)

Assume that the prefix = <(a,b)(c)>

  • Scan the projected database to find every frequent item x such that
    • <(a,b)(c,x)> is frequent or
    • <(a,b)(c)(x)> is frequent
  • Append x to the prefix and output the pattern
  • Now call recursively, e.g. PrefixSpan(<(a,b)(c,x)>, newProjDB)
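
The recursion can be sketched as below. This is a simplified illustration, assuming every sequence element is a single item, so only the extension of the form <(a,b)(c)(x)> is covered and not the itemset extension <(a,b)(c,x)>. The function name and signature are assumptions of this sketch.

```python
def prefixspan(projected_db, min_count, prefix=()):
    """Yield (pattern, support) for every frequent pattern extending `prefix`."""
    # Scan the projected database: in how many sequences does each item occur?
    counts = {}
    for sequence in projected_db:
        for item in set(sequence):
            counts[item] = counts.get(item, 0) + 1

    for item, count in sorted(counts.items()):
        if count < min_count:
            continue
        # Append the item to the prefix and output the pattern.
        new_prefix = prefix + (item,)
        yield new_prefix, count
        # Project: keep only what follows the first occurrence of the item.
        new_db = [sequence[sequence.index(item) + 1:]
                  for sequence in projected_db if item in sequence]
        yield from prefixspan(new_db, min_count, new_prefix)


# Hypothetical toy database.
db = [("a", "b", "c"), ("a", "c"), ("a", "b", "c", "d"), ("b", "c")]
patterns = dict(prefixspan(db, min_count=3))
print(patterns)
# {('a',): 3, ('a', 'c'): 3, ('b',): 3, ('b', 'c'): 3, ('c',): 4}
```

Each recursive call only scans the suffixes that follow the current prefix, which is why no separate candidate generation step is needed.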
Gap Constraint
  • Simple idea: impose a maximal distance between consecutive itemsets of a pattern
  • Example: with pattern = <(a)(e)> and gap = 1, the sequence <(a)(c)(d)(e)> is not counted, because (a) and (e) are too far apart
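
A gap-constrained containment test can be sketched as below. The slides do not say how the gap is measured; this sketch assumes `max_gap` is the maximal number of itemsets that may be skipped between two consecutive matched itemsets. The function name is made up for illustration.

```python
def contains_with_gap(sequence, pattern, max_gap):
    """True if `pattern` occurs in `sequence` with at most `max_gap`
    skipped itemsets between consecutive matches (assumed gap definition)."""
    def match(p_idx, start, end):
        if p_idx == len(pattern):
            return True
        # Try every allowed position; backtracking is needed because the
        # earliest match is not always the one that satisfies later gaps.
        for pos in range(start, min(end, len(sequence))):
            if pattern[p_idx] <= sequence[pos]:
                if match(p_idx + 1, pos + 1, pos + 2 + max_gap):
                    return True
        return False

    # The first pattern itemset may match anywhere in the sequence.
    return match(0, 0, len(sequence))


sequence = [{"a"}, {"c"}, {"d"}, {"e"}]
print(contains_with_gap(sequence, [{"a"}, {"e"}], max_gap=1))  # False
print(contains_with_gap(sequence, [{"a"}, {"e"}], max_gap=2))  # True
print(contains_with_gap(sequence, [{"a"}, {"d"}], max_gap=1))  # True
```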
Process Mining
  • What is process mining?
  • Using D/F tables and graphs
  • Genetic Algorithms
  • Problem areas
  • Using sequential patterns
What is process mining? (1)
  • The ordering of events is known e.g. <(task A)(task B)(task C)>
  • Process mining constructs a Petri net:

[Figure: Petri net with nodes labeled pay, ready, claim, register, to_be_evaluated, send_letter]

Source: Workflow Management by W. van der Aalst and K. van Hee. (1997)

What is process mining? (2)
  • Usability of process mining:
    • Given the audit trails, what is the workflow network?
    • Mined workflow network ≡ original design? (Delta Analysis)
    • Mined workflow network better than the original design? (Performance Analysis)
Using D/F tables and graphs (1)
  • A dependency/frequency (D/F) table is built for every task
  • Intuition: if A is often followed by B then the probability of A causing B increases
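
The intuition can be illustrated by counting how often one task is directly followed by another. This covers only one ingredient of a D/F table (the table itself is not preserved in this transcript); the traces and the function name are invented for this sketch.

```python
from collections import Counter


def direct_follows(traces):
    """Count how often task a is directly followed by task b over all traces."""
    counts = Counter()
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            counts[(a, b)] += 1
    return counts


# Hypothetical audit trails; every character is one task.
traces = ["ABCD", "ACBD", "ABCD"]
counts = direct_follows(traces)
print(counts[("A", "B")])  # 2
print(counts[("B", "A")])  # 0
```

Here A is followed by B twice and never the other way around, which makes A a more plausible cause of B than B of A.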
Using D/F tables and graphs (2)
  • A D/F graph is constructed:

IF ((A→B ≥ N) AND (A > B ≥ σ) AND (B < A ≤ σ)) THEN connection A to B

  • More complicated rules deal with recursion and short loops
Genetic Algorithms (1)
  1. Create an initial population of workflows
  2. Calculate their fitness using audit trails
  3. Create a child
  4. Mutate the child
  5. Repeat 3 to 4 to create the new population
  6. Go to 2
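
The loop above can be sketched as follows. Everything specific here is an assumption made for illustration: a workflow is encoded as a set of directed edges between tasks, fitness rewards edges that appear as direct successions in the audit trails and penalizes edges that do not, and the best workflow is carried over unchanged. The slides leave the chromosome structure and the fitness measure open.

```python
import random


def observed_pairs(traces):
    """All (a, b) where a is directly followed by b in some trace."""
    return {(a, b) for trace in traces for a, b in zip(trace, trace[1:])}


def fitness(edges, observed):
    # Hypothetical fitness: observed edges count +1, unobserved edges -1.
    return len(edges & observed) - len(edges - observed)


def crossover(parent1, parent2, all_edges, rng):
    # The child takes each possible edge from a randomly chosen parent.
    return frozenset(e for e in all_edges
                     if e in (parent1 if rng.random() < 0.5 else parent2))


def mutate(edges, all_edges, rng, rate=0.1):
    # Flip the presence of each possible edge with probability `rate`.
    flipped = {e for e in all_edges if rng.random() < rate}
    return frozenset(edges ^ flipped)


def evolve(traces, pop_size=20, generations=30, seed=0):
    rng = random.Random(seed)
    tasks = sorted({task for trace in traces for task in trace})
    all_edges = [(a, b) for a in tasks for b in tasks if a != b]
    observed = observed_pairs(traces)
    # Step 1: initial population of random workflows.
    population = [frozenset(e for e in all_edges if rng.random() < 0.5)
                  for _ in range(pop_size)]
    for _ in range(generations):  # step 6: go to 2
        # Step 2: rank by fitness using the audit trails.
        ranked = sorted(population, key=lambda w: fitness(w, observed),
                        reverse=True)
        parents = ranked[:pop_size // 2]
        new_population = [ranked[0]]  # keep the best workflow found so far
        # Steps 3 to 5: create and mutate children until the population is full.
        while len(new_population) < pop_size:
            parent1, parent2 = rng.sample(parents, 2)
            child = crossover(parent1, parent2, all_edges, rng)
            new_population.append(mutate(child, all_edges, rng))
        population = new_population
    return max(population, key=lambda w: fitness(w, observed))


traces = ["ABCD", "ACBD", "ABCD"]
best = evolve(traces)
print(sorted(best), fitness(best, observed_pairs(traces)))
```

The questions raised on the next slide (chromosome structure, fitness, cross-over and mutation) correspond to the three helper functions, each of which could be replaced by a different design.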
Genetic Algorithms (2)
  • Advantages:
    • Can deal with duplicate tasks and non-free choice.
  • Disadvantages:
    • The structure of the “chromosome”
    • How do we measure fitness?
    • How do we do cross-over and mutation?
Problem Areas (1)
  • Hidden tasks:
  • Duplicate tasks: when tasks have the same name

Problem Areas (2)
  • Mining non-free-choice

[Figure: net over tasks A, B, C, D and E]

Problem Areas (3)
  • Mining Loops:

Example trace: ABCDBCD

[Figure: net over tasks A, B, C and D]

Problem Areas (4)
  • Delta analysis: how do we compare two models?
  • Other problems: time, dealing with noise and incompleteness.
Using sequential patterns
  • Mining loops?
  • Fitness measure in a GA?
  • Use in delta analysis?
  • Generate the important frequent subsequences to help the designer
Further research in sequences
  • How about gaps between items in different item sets?
  • What type of frequent subsequences to use in fitness?
  • Lifting order, is it useful in workflow generation?
  • Further research of lifting order
The End

Thank you for your attention

Edgar de Graaf
