
Sequential Patterns & Process Mining

Current State of Research

Edgar de Graaf

LIACS


Mining Sequential Patterns

  • Sequential Patterns

  • Sequence Databases

  • AprioriAll

  • PrefixSpan

  • Gap Constraints


Sequential Patterns

  • <(a,b)(c)(a,b,d)>

    < a1, a2, a3 >

  • <(3)(4,5)(8)> contained in <(7)(3,8)(9)(4,5,6)(8)>

  • <(3)(4,5)(8)> not contained in <(7)(3,8)(9)(4)(5,6)(8)>
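The containment relation in the two examples above can be sketched in Python. This is a minimal illustration, assuming a sequence is represented as a list of sets; the function name `is_contained` is hypothetical and not taken from the slides.

```python
def is_contained(pattern, sequence):
    """Return True if `pattern` is a subsequence of `sequence`.

    Both arguments are lists of itemsets. Each pattern itemset must be a
    subset of a distinct sequence itemset, and the order must be preserved.
    """
    pos = 0
    for itemset in pattern:
        # Greedily take the earliest sequence itemset that covers this one.
        while pos < len(sequence) and not set(itemset) <= set(sequence[pos]):
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1
    return True


pattern = [{3}, {4, 5}, {8}]
print(is_contained(pattern, [{7}, {3, 8}, {9}, {4, 5, 6}, {8}]))    # True
print(is_contained(pattern, [{7}, {3, 8}, {9}, {4}, {5, 6}, {8}]))  # False
```

In the second sequence the items 4 and 5 are split over two itemsets, so no single itemset covers (4,5) and the pattern is not contained.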


Sequential databases

The Database with sequences


Sequential databases

A generated candidate pattern: <(3)(4,5)(8)>

Support count: 0


Sequential databases

<(3)(4,5)(8)>

Support count: 0 → 1


Sequential databases

<(3)(4,5)(8)>

Support count: 1

Not contained → not counted


Sequential databases

Sequences marked "Contained" are counted.

Support count: 1 → 2 → 3 → 4 → 5

IF minimal support ≤ 50% THEN <(3)(4,5)(8)> is frequent
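The counting procedure walked through on these slides can be sketched as follows. The database table from the slides is not preserved in this transcript, so the four sequences below are a hypothetical stand-in, and the names `support` and `is_frequent` are illustrative only.

```python
def is_contained(pattern, sequence):
    """Subsequence test on lists of sets (order kept, itemsets as subsets)."""
    pos = 0
    for itemset in pattern:
        while pos < len(sequence) and not itemset <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1
    return True


def support(pattern, database):
    """Number of database sequences that contain the pattern."""
    return sum(is_contained(pattern, seq) for seq in database)


def is_frequent(pattern, database, min_support):
    """min_support is a fraction of the database size, e.g. 0.5 for 50%."""
    return support(pattern, database) / len(database) >= min_support


# Hypothetical database (the original table is not available).
database = [
    [{7}, {3, 8}, {9}, {4, 5, 6}, {8}],     # contained
    [{7}, {3, 8}, {9}, {4}, {5, 6}, {8}],   # not contained, not counted
    [{3}, {4, 5}, {8}],                     # contained
    [{1}, {2}],                             # not contained, not counted
]
pattern = [{3}, {4, 5}, {8}]
print(support(pattern, database))           # 2
print(is_frequent(pattern, database, 0.5))  # True
```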


Lifting order (1)

  • Notation by examples

    • <A,B,C>, an ordered list of sets ≡ sequence

    • Each of the sets A, B and C is unordered, e.g. A = (x,y,z) = (y,z,x) = (z,y,x) = …

    • [x,y,z] is an extension: we ignore the order of x, y and z when counting frequency


Lifting order (2)

  • <(t1)(t2)(t3)(t4)> and

    <(t1)(t3)(t2)(t4)> frequent

    <(t1)(t3,t2)(t4)> is frequent

  • Says: t3 and t2 frequently occur in between t1 and t4, in either order


Lifting Order (3)

  • <(t1)(t2)(t3)(t4)> and

    <(t1)(t3)(t2)(t4)> infrequent

    suppose <(t1)[t3,t2](t4)> is frequent

  • Says: often t3 and t2 occur in-between t1 and t4
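One possible reading of the lifted notation, sketched in Python: a sequence supports <(t1)[t3,t2](t4)> if it contains the pattern for at least one ordering of t3 and t2. This interpretation, the list-based encoding of [x,y,…], the toy database and the names `expand` and `lifted_support` are all assumptions made for illustration.

```python
from itertools import permutations, product


def is_contained(pattern, sequence):
    """Subsequence test on lists of sets (order kept, itemsets as subsets)."""
    pos = 0
    for itemset in pattern:
        while pos < len(sequence) and not itemset <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1
    return True


def expand(lifted):
    """Yield every ordinary pattern covered by a lifted pattern.

    A set is an ordinary itemset; a list such as ["t3", "t2"] stands for
    the unordered group [t3,t2] and is expanded into all of its orderings.
    """
    options = []
    for element in lifted:
        if isinstance(element, list):
            options.append([[{x} for x in perm] for perm in permutations(element)])
        else:
            options.append([[element]])
    for choice in product(*options):
        yield [itemset for part in choice for itemset in part]


def lifted_support(lifted, database):
    """Count sequences that contain at least one ordering of the pattern."""
    variants = list(expand(lifted))
    return sum(any(is_contained(v, seq) for v in variants) for seq in database)


# Hypothetical database: each ordering occurs twice out of five sequences.
database = [
    [{"t1"}, {"t2"}, {"t3"}, {"t4"}],
    [{"t1"}, {"t3"}, {"t2"}, {"t4"}],
    [{"t1"}, {"t2"}, {"t3"}, {"t4"}],
    [{"t1"}, {"t3"}, {"t2"}, {"t4"}],
    [{"t1"}, {"t4"}],
]
print(lifted_support([{"t1"}, {"t2"}, {"t3"}, {"t4"}], database))   # 2
print(lifted_support([{"t1"}, {"t3"}, {"t2"}, {"t4"}], database))   # 2
print(lifted_support([{"t1"}, ["t3", "t2"], {"t4"}], database))     # 4
```

With a minimal support of three sequences, both fixed orderings are infrequent here while the lifted pattern is frequent, which is the situation described on this slide.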


Existing Algorithms

  • AprioriAll: the first algorithm based on the anti-monotone principle

  • PrefixSpan: currently the fastest algorithm around; it uses projected databases


AprioriAll (1)

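AprioriAll(DB, min_sup) {
    L[1] = {frequent sequences of size 1}
    k = 2
    while (L[k-1] is not empty) {
        C[k] = candidateGeneration(L[k-1], k)   // build size-k candidates from L[k-1]
        C[k] = candidatePruning(C[k], k)        // drop candidates with an infrequent subsequence
        L[k] = supportBasedPruning(C[k])        // keep candidates with support ≥ min_sup in DB
        k++
    }
}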
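The level-wise loop above can be sketched as runnable Python. This is a simplified illustration, not the full AprioriAll algorithm: it assumes every element of a sequence is a single item (so sequences are plain tuples), uses an absolute support count, and returns all frequent sequences. The function names are hypothetical.

```python
def is_subsequence(pattern, sequence):
    """True if the items of `pattern` occur in `sequence` in the same order."""
    it = iter(sequence)
    return all(item in it for item in pattern)


def apriori_all_sketch(db, min_sup):
    """Level-wise mining of frequent sequences of single items.

    db is a list of tuples; min_sup is an absolute count.
    Returns a dict mapping each frequent pattern to its support.
    """
    def support(pattern):
        return sum(is_subsequence(pattern, seq) for seq in db)

    items = sorted({x for seq in db for x in seq})
    level = [(x,) for x in items if support((x,)) >= min_sup]   # L1
    frequent = {}
    while level:
        frequent.update({p: support(p) for p in level})
        prev = set(level)
        # Candidate generation: join patterns that share their first k-2 items.
        candidates = {p + (q[-1],) for p in level for q in level
                      if p[:-1] == q[:-1]}
        # Candidate pruning: every (k-1)-subsequence must itself be frequent.
        candidates = {c for c in candidates
                      if all(c[:i] + c[i + 1:] in prev for i in range(len(c)))}
        # Support-based pruning against the database.
        level = sorted(c for c in candidates if support(c) >= min_sup)
    return frequent


db = [("a", "b", "c"), ("a", "c"), ("a", "b", "c", "d"), ("b", "c")]
for pattern, sup in sorted(apriori_all_sketch(db, 3).items()):
    print(pattern, sup)
```

The pruning step is where the anti-monotone principle is used: a sequence cannot be frequent if one of its subsequences is infrequent, so such candidates are dropped before any database scan.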


PrefixSpan (1)

Assume that the prefix = <(a,b)(c)>

  • Scan the projected database to find every frequent item x such that

    • <(a,b)(c,x)> is frequent or

    • <(a,b)(c)(x)> is frequent

  • Append x to the prefix and output the pattern

  • Now call recursively e.g. PrefixSpan(<(a,b)(c,x)> , newProjDB)
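The recursion described above can be sketched in Python. This is a simplified illustration that assumes each sequence element is a single item, so only the extension of the form <…(c)(x)> is covered and the itemset extension <…(c,x)> is left out. The function name and the toy database are hypothetical.

```python
def prefixspan_sketch(projected_db, min_sup, prefix=()):
    """Yield (pattern, support) pairs for sequences of single items.

    projected_db holds, for each sequence that contains `prefix`, the
    suffix that follows the first occurrence of the prefix.
    min_sup is an absolute count.
    """
    # Scan the projected database: count each item once per suffix.
    counts = {}
    for suffix in projected_db:
        for item in set(suffix):
            counts[item] = counts.get(item, 0) + 1

    for item in sorted(counts):
        if counts[item] < min_sup:
            continue
        # Append the frequent item to the prefix and output the pattern.
        pattern = prefix + (item,)
        yield pattern, counts[item]
        # Project on the new prefix: keep what follows the first occurrence.
        new_proj_db = [s[s.index(item) + 1:] for s in projected_db if item in s]
        yield from prefixspan_sketch(new_proj_db, min_sup, pattern)


db = [("a", "b", "c"), ("a", "c"), ("a", "b", "c", "d"), ("b", "c")]
for pattern, sup in prefixspan_sketch(db, 3):
    print(pattern, sup)
```

Because each recursive call only scans the suffixes that still matter for the current prefix, no candidate patterns have to be generated and tested against the whole database.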


Gap Constraint

  • Simple idea: impose a maximal distance between the itemsets of a pattern within a sequence

  • E.g. sequence <(a)(c)(d)(e)>, pattern <(a)(e)> and gap = 1: (c) and (d) lie between (a) and (e), so this sequence is not counted
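A gap-constrained containment test can be sketched as follows. It assumes that the gap is the number of sequence itemsets skipped between two consecutive matched pattern itemsets; the slide does not fix the exact definition, so this convention and the name `contained_with_gap` are assumptions.

```python
def contained_with_gap(pattern, sequence, max_gap):
    """Containment test where at most `max_gap` itemsets may be skipped
    between two consecutive matched pattern itemsets."""
    pattern = [frozenset(p) for p in pattern]
    sequence = [frozenset(s) for s in sequence]

    def match(p_idx, prev_pos):
        if p_idx == len(pattern):
            return True
        if prev_pos is None:
            # The first pattern itemset may match anywhere.
            positions = range(len(sequence))
        else:
            # Later itemsets must follow within max_gap skipped itemsets.
            last = min(len(sequence), prev_pos + 2 + max_gap)
            positions = range(prev_pos + 1, last)
        # Try every allowed position; a greedy choice can miss valid matches.
        return any(pattern[p_idx] <= sequence[pos] and match(p_idx + 1, pos)
                   for pos in positions)

    return match(0, None)


seq = [{"a"}, {"c"}, {"d"}, {"e"}]
print(contained_with_gap([{"a"}, {"e"}], seq, max_gap=1))  # False
print(contained_with_gap([{"a"}, {"e"}], seq, max_gap=2))  # True
```

Unlike the unconstrained case, the earliest match is not always the right one, which is why the sketch backtracks over all allowed positions.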


Process Mining

  • What is process mining?

  • Using D/F tables and graphs

  • Genetic Algorithms

  • Problem areas

  • Using sequential patterns


What is process mining? (1)

  • The ordering of events is known, e.g. <(task A)(task B)(task C)>

  • Process mining constructs a Petri net:

[Figure: Petri net with nodes labelled pay, ready, claim, register, to_be_evaluated, send_letter]

Source: Workflow Management by W. van der Aalst and K. van Hee. (1997)


What is process mining? (2)

  • Usability of process mining:

    • Given the audit trails, what is the workflow network?

    • Mined workflow network ≡ original design? (Delta Analysis)

    • Mined workflow network better than the original design? (Performance Analysis)


Using D/F tables and graphs (1)

  • For every task, a D/F (dependency/frequency) table is built

  • Intuition: if A is often followed by B then the probability of A causing B increases
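The intuition above rests on counting how often one task is directly followed by another in the audit trails. The sketch below shows only that counting step and does not reproduce the full D/F table. It assumes each trace is a string of single-letter task names; the traces and the function name are hypothetical.

```python
from collections import Counter


def directly_follows(traces):
    """Count how often task a is immediately followed by task b."""
    counts = Counter()
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            counts[(a, b)] += 1
    return counts


traces = ["ABCD", "ACBD", "ABCD"]
counts = directly_follows(traces)
print(counts[("A", "B")])  # 2
print(counts[("B", "A")])  # 0
```

A high count for (A, B) combined with a low count for (B, A) is the kind of evidence that makes a causal link from A to B more plausible.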


Using D/F tables and graphs (2)

  • A D/F graph is constructed:

    IF ((A→B ≥ N) AND (A > B ≥ σ) AND (B < A ≤ σ)) THEN connection A to B

  • More complicated rules deal with recursion and short loops


Using D/F tables and graphs (3)

  • D/F Graph example:


Genetic Algorithms (1)

  1. Create an initial population of workflows

  2. Calculate their fitness using audit trails

  3. Create a child

  4. Mutate the child

  5. Repeat 3 to 4 to create the new population

  6. Go to 2
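The loop above can be sketched as a generic skeleton in Python. The slides leave the chromosome structure, the fitness measure and the crossover and mutation operators open, so these are passed in as functions, and the bit-string stand-ins below are toy placeholders rather than workflow models. Tournament selection of parents is also an assumption; the slides do not say how parents are chosen.

```python
import random


def evolve(population, fitness, crossover, mutate, generations, rng):
    """Generic genetic-algorithm loop with pluggable operators."""
    for _ in range(generations):
        # Calculate the fitness of every individual.
        scored = [(fitness(ind), ind) for ind in population]

        def select():
            # Tournament of two: the fitter individual becomes a parent.
            a, b = rng.sample(scored, 2)
            return a[1] if a[0] >= b[0] else b[1]

        new_population = []
        while len(new_population) < len(population):
            child = crossover(select(), select(), rng)   # create a child
            new_population.append(mutate(child, rng))    # mutate the child
        population = new_population                      # next generation
    return population


# Toy stand-ins: individuals are bit tuples, fitness is the number of ones.
def one_point(a, b, rng):
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def flip_one(ind, rng):
    if rng.random() < 0.1:
        i = rng.randrange(len(ind))
        return ind[:i] + (1 - ind[i],) + ind[i + 1:]
    return ind


rng = random.Random(0)
start = [tuple(rng.randint(0, 1) for _ in range(8)) for _ in range(10)]
final = evolve(start, sum, one_point, flip_one, generations=20, rng=rng)
print(max(sum(ind) for ind in final))
```

For process mining, `fitness` would score a candidate workflow against the audit trails, which is exactly the part the following slide lists as an open question.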


Genetic Algorithms (2)

  • Advantages:

    • Can deal with duplicate tasks and non-free choice.

  • Disadvantages:

    • The structure of the “chromosome”

    • How do we measure fitness?

    • How do we do cross-over and mutation?


Problem Areas (1)

  • Hidden tasks:

  • Duplicate tasks: when tasks have the same name

[Figure with tasks B and C]


Problem Areas (2)

  • Mining non-free-choice

[Figure with tasks A, D, C, B and E]


Problem Areas (3)

  • Mining Loops:

    Example trace: ABCDBCD

[Figure with tasks A, D, B and C]


Problem Areas (4)

  • Delta analysis: how do we compare two models?

  • Other problems: time, dealing with noise and incompleteness.


Using sequential patterns

  • Mining loops?

  • Fitness measure in a GA?

  • Use in delta analysis?

  • Generate the important frequent subsequences to help the designer


Further research in sequences

  • How about gaps between items in different item sets?

  • What type of frequent subsequences to use in fitness?

  • Is lifting order useful in workflow generation?

  • Further research of lifting order


The End

Thank you for your attention

Edgar de Graaf

edegraaf@liacs.nl