Sequential Patterns & Process Mining

Mining Sequential Patterns

- Sequential Patterns
- Sequence Databases
- AprioriAll
- PrefixSpan
- Gap Constraints

Sequential Patterns

- <(a,b)(c)(a,b,d)> is a sequence of the form <a1, a2, a3>, where each ai is an itemset

- <(3)(4,5)(8)> contained in <(7)(3,8)(9)(4,5,6)(8)>
- <(3)(4,5)(8)> not contained in <(7)(3,8)(9)(4)(5,6)(8)>
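
The two containment examples above can be checked with a short sketch. It assumes the usual definition: a pattern is contained in a sequence if its itemsets can be matched, in order, to distinct itemsets of the sequence that are supersets of them. The function name `is_contained` and the list-of-sets representation are illustrative choices, not taken from the slides.

```python
def is_contained(pattern, sequence):
    """Return True if every itemset of `pattern` is a subset of some
    itemset of `sequence`, with the matches occurring in the same order."""
    pos = 0
    for itemset in pattern:
        # Greedily skip ahead to the first itemset that covers this one.
        while pos < len(sequence) and not itemset <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1  # the next pattern itemset must match strictly later
    return True


pattern = [{3}, {4, 5}, {8}]
seq_a = [{7}, {3, 8}, {9}, {4, 5, 6}, {8}]
seq_b = [{7}, {3, 8}, {9}, {4}, {5, 6}, {8}]

print(is_contained(pattern, seq_a))  # True
print(is_contained(pattern, seq_b))  # False: (4,5) never occurs in one itemset
```

Greedy matching is sufficient here because taking the earliest possible match never rules out a later one.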

Sequential databases

The database with sequences

[Table not recovered: database sequences 1 to 5, each marked "Contained" where it contains <(3)(4,5)(8)>, with the resulting support count]

IF Minimal Support ≤ 50% THEN <(3)(4,5)(8)> is frequent
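
Support counting over a sequence database can be sketched as follows. The five sequences below are a hypothetical database made up for illustration (the original table is not available); only the first two come from the containment examples earlier in the deck. The names `is_contained` and `support` are likewise illustrative.

```python
def is_contained(pattern, sequence):
    """Ordered containment: each pattern itemset must be a subset of a
    later and later itemset of the sequence."""
    remaining = iter(sequence)
    return all(any(itemset <= element for element in remaining)
               for itemset in pattern)


def support(pattern, database):
    """Fraction of database sequences that contain the pattern."""
    return sum(is_contained(pattern, s) for s in database) / len(database)


# Hypothetical database of five sequences (not the one on the slide).
database = [
    [{7}, {3, 8}, {9}, {4, 5, 6}, {8}],
    [{7}, {3, 8}, {9}, {4}, {5, 6}, {8}],
    [{3}, {4, 5}, {8}],
    [{1}, {2}],
    [{3}, {1}, {4, 5, 9}, {2}, {8}],
]

pattern = [{3}, {4, 5}, {8}]
min_support = 0.5
print(support(pattern, database) >= min_support)  # True: 3 of 5 sequences
```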

Lifting order (1)

- Notation by examples
- <A,B,C>, an ordered list of sets ≡ sequence
- Each of the sets A, B and C is unordered, e.g. A = (x,y,z) = (y,z,x) = (z,y,x) = …
- [x,y,z] is an extension: the order of x, y and z is ignored when counting frequency

Lifting order (2)

- <(t1)(t2)(t3)(t4)> and <(t1)(t3)(t2)(t4)> are both frequent

→

<(t1)(t3,t2)(t4)> is frequent

- Meaning: t3 and t2 frequently occur between t1 and t4, in either order

Lifting Order (3)

- <(t1)(t2)(t3)(t4)> and <(t1)(t3)(t2)(t4)> are both infrequent

Suppose (t1)[t3,t2](t4) is frequent

- Meaning: t3 and t2 often occur between t1 and t4, even though neither fixed order is frequent on its own
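
A small sketch of how a lifted pattern could be counted, assuming that "ignoring the order" means a trace supports (t1)[t3,t2](t4) if it contains any ordering of the bracketed tasks between t1 and t4. The traces are hypothetical, and the helper names (`is_subsequence`, `lifted_support`) are illustrative. Each event is a single task here, which is a simplification of the itemset notation.

```python
from itertools import permutations


def is_subsequence(pattern, trace):
    """True if the tasks of `pattern` occur in `trace` in the same order."""
    remaining = iter(trace)
    return all(task in remaining for task in pattern)


def support(pattern, traces):
    return sum(is_subsequence(pattern, t) for t in traces)


def lifted_support(before, unordered, after, traces):
    """Count traces containing `before`, then the `unordered` tasks in
    any order, then `after`."""
    orderings = [before + p + after for p in permutations(unordered)]
    return sum(any(is_subsequence(o, t) for o in orderings) for t in traces)


# Hypothetical audit trail: each fixed order occurs twice out of six traces.
traces = [
    ("t1", "t2", "t3", "t4"), ("t1", "t2", "t3", "t4"),
    ("t1", "t3", "t2", "t4"), ("t1", "t3", "t2", "t4"),
    ("t1", "t4"), ("t1", "t4"),
]
min_sup = 3  # absolute count

print(support(("t1", "t2", "t3", "t4"), traces))               # 2, infrequent
print(support(("t1", "t3", "t2", "t4"), traces))               # 2, infrequent
print(lifted_support(("t1",), ("t2", "t3"), ("t4",), traces))  # 4, frequent
```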

Existing Algorithms

- AprioriAll: the first algorithm based on the anti-monotone principle
- PrefixSpan: one of the fastest algorithms available at the time of this presentation; it uses projected databases

AprioriAll (1)

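AprioriAll(DB, min_sup) {
    L1 = {frequent sequences of size 1}
    k = 2
    while (Lk-1 is not empty) {
        Ck = candidateGeneration(Lk-1, k)   // join frequent (k-1)-sequences
        Ck = candidatePruning(Ck, k)        // drop candidates with an infrequent (k-1)-subsequence
        Lk = supportBasedPruning(Ck)        // keep candidates with support ≥ min_sup
        k++
    }
    return L1 ∪ … ∪ Lk-1                    // all frequent sequences found
}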
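
The level-wise loop above can be sketched in runnable form. This is a simplified illustration, not the full AprioriAll algorithm: it assumes every element of a sequence is a single item (no itemsets) and that min_sup is an absolute count. The function names are illustrative.

```python
def is_subsequence(pattern, sequence):
    remaining = iter(sequence)
    return all(item in remaining for item in pattern)


def support(pattern, db):
    return sum(is_subsequence(pattern, seq) for seq in db)


def apriori_sequences(db, min_sup):
    """Level-wise mining of frequent sequences of single items."""
    level = {}
    for item in {i for seq in db for i in seq}:
        count = support((item,), db)
        if count >= min_sup:
            level[(item,)] = count
    frequent = {}
    k = 2
    while level:
        frequent.update(level)
        # Candidate generation: join two (k-1)-sequences sharing a prefix.
        candidates = {p + (q[-1],) for p in level for q in level
                      if p[:-1] == q[:-1]}
        # Candidate pruning: every (k-1)-subsequence must be frequent.
        candidates = {c for c in candidates
                      if all(c[:i] + c[i + 1:] in level for i in range(k))}
        # Support-based pruning: scan the database for the survivors.
        level = {}
        for c in candidates:
            count = support(c, db)
            if count >= min_sup:
                level[c] = count
        k += 1
    return frequent


db = [("a", "b", "c"), ("a", "c"), ("a", "b", "c", "d"), ("b", "c")]
print(apriori_sequences(db, min_sup=3))
```

On this hypothetical database the candidate pruning step discards both length-3 candidates without a database scan, which is the point of the anti-monotone principle.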

PrefixSpan (1)

Assume that the prefix = <(a,b)(c)>

- Scan the projected database to find every frequent item x such that
- <(a,b)(c,x)> is frequent or
- <(a,b)(c)(x)> is frequent

- Append x to the prefix and output the pattern
- Then recurse, e.g. PrefixSpan(<(a,b)(c,x)>, newProjDB)
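
The recursion above can be sketched for a simplified setting. The assumption, made for brevity, is that every element is a single item, so only the second kind of extension (appending x as a new element) occurs; extending the last itemset is left out. The name `prefixspan` and the dictionary result are illustrative choices.

```python
def prefixspan(db, min_sup, prefix=()):
    """Return {pattern: support} for all frequent patterns that extend
    `prefix`, where `db` is the database projected on `prefix`."""
    # Scan the projected database: count each item once per sequence.
    counts = {}
    for seq in db:
        for item in set(seq):
            counts[item] = counts.get(item, 0) + 1

    patterns = {}
    for item, count in sorted(counts.items()):
        if count < min_sup:
            continue
        # Append x to the prefix and output the pattern.
        new_prefix = prefix + (item,)
        patterns[new_prefix] = count
        # Project: keep the suffix after the first occurrence of x.
        projected = [seq[seq.index(item) + 1:] for seq in db if item in seq]
        # Recurse on the new prefix and its projected database.
        patterns.update(prefixspan(projected, min_sup, new_prefix))
    return patterns


db = [("a", "b", "c"), ("a", "c"), ("a", "b", "c", "d"), ("b", "c")]
print(prefixspan(db, min_sup=3))
```

No candidates are generated here: each recursive call only looks at the suffixes that can still extend the current prefix.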

Gap Constraint

- Simple idea: impose a maximal distance between consecutive itemsets of a pattern
- E.g. with pattern = <(a)(e)> and gap = 1, the sequence <(a)(c)(d)(e)> is not counted, because (a) and (e) lie three positions apart
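
A sketch of containment under a maximum gap. It assumes the gap is measured as the difference in position between consecutive matched itemsets, which is one possible reading of the slide; the function name is illustrative. Unlike unconstrained containment, a greedy leftmost match is not enough, so the sketch tries every admissible position.

```python
def contains_with_max_gap(pattern, sequence, max_gap):
    """True if `pattern` occurs in `sequence` with consecutive matches
    at most `max_gap` positions apart."""

    def match(p_idx, prev_pos):
        if p_idx == len(pattern):
            return True
        if prev_pos is None:
            positions = range(len(sequence))  # first itemset: anywhere
        else:
            last = min(len(sequence), prev_pos + max_gap + 1)
            positions = range(prev_pos + 1, last)
        return any(pattern[p_idx] <= sequence[pos] and match(p_idx + 1, pos)
                   for pos in positions)

    return match(0, None)


sequence = [{"a"}, {"c"}, {"d"}, {"e"}]
print(contains_with_max_gap([{"a"}, {"e"}], sequence, max_gap=1))  # False
print(contains_with_max_gap([{"a"}, {"e"}], sequence, max_gap=3))  # True
```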

Process Mining

- What is process mining?
- Using D/F tables and graphs
- Genetic Algorithms
- Problem areas
- Using sequential patterns

What is process mining? (1)

- The ordering of events is known e.g. <(task A)(task B)(task C)>
- Process mining constructs a Petri net:

[Figure: Petri net; recoverable labels: pay, ready, claim, register, to_be_evaluated, send_letter]

Source: Workflow Management by W. van der Aalst and K. van Hee. (1997)

What is process mining? (2)

- Usability of process mining:
- Given the audit trails, what is the workflow network?
- Mined workflow network ≡ original design? (Delta Analysis)
- Mined workflow network better than the original design? (Performance Analysis)

Using D/F tables and graphs (1)

- For every task, a D/F (dependency/frequency) table is built
- Intuition: if A is often followed by B, it becomes more likely that A causes B

Using D/F tables and graphs (2)

- A D/F graph is constructed:
IF ((A→B ≥ N) AND (A>B ≥ σ) AND (B<A ≤ σ)) THEN connect A to B

- More complicated rules deal with recursion and short loops
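
The connection rule can be sketched on a toy audit trail. Several things here are assumptions: A>B is read as the number of times A is directly followed by B, B<A as the number of times B directly precedes A, and, since the slides do not define the A→B causality metric, the difference between the two direct-succession counts is used as a plainly labelled stand-in. The function names and traces are hypothetical.

```python
from collections import Counter


def direct_follows(traces):
    """Count how often task a is directly followed by task b."""
    counts = Counter()
    for trace in traces:
        counts.update(zip(trace, trace[1:]))
    return counts


def df_graph(traces, n, sigma):
    follows = direct_follows(traces)
    tasks = {task for trace in traces for task in trace}
    edges = set()
    for a in tasks:
        for b in tasks:
            if a == b:
                continue
            # Stand-in for the A→B metric (an assumption, not the slides' metric).
            causality = follows[(a, b)] - follows[(b, a)]
            if (causality >= n
                    and follows[(a, b)] >= sigma    # A>B ≥ σ
                    and follows[(b, a)] <= sigma):  # B<A ≤ σ
                edges.add((a, b))
    return edges


traces = [["A", "B", "C"]] * 3 + [["A", "C", "B"]]
print(sorted(df_graph(traces, n=2, sigma=2)))  # [('A', 'B'), ('B', 'C')]
```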

Using D/F tables and graphs (3)

- D/F graph example: [figure not recovered]

Genetic Algorithms (1)

1. Create an initial population of workflows
2. Calculate their fitness using audit trails
3. Create a child
4. Mutate the child
5. Repeat 3 to 4 to create the new population
6. Go to 2
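
The loop above can be sketched on a toy problem. Everything concrete here is an assumption made for illustration: a workflow is encoded as a set of allowed task-to-task edges, fitness is the share of direct successions in the audit trails that the workflow allows minus a small penalty per edge, and the best individual is carried over unchanged (elitism). The slides themselves leave the chromosome structure and the fitness measure open.

```python
import random


def fitness(workflow, trails):
    """Hypothetical fitness: covered direct successions minus an edge penalty."""
    steps = [(a, b) for trail in trails for a, b in zip(trail, trail[1:])]
    covered = sum(step in workflow for step in steps)
    return covered / len(steps) - 0.01 * len(workflow)


def evolve(trails, tasks, pop_size=20, generations=30, seed=0):
    rng = random.Random(seed)
    all_edges = [(a, b) for a in tasks for b in tasks if a != b]
    # 1. Create an initial population of random workflows.
    population = [
        frozenset(rng.sample(all_edges, k=rng.randint(1, len(all_edges))))
        for _ in range(pop_size)
    ]
    for _ in range(generations):
        # 2. Calculate fitness using the audit trails.
        ranked = sorted(population, key=lambda w: fitness(w, trails),
                        reverse=True)
        parents = ranked[: pop_size // 2]
        new_population = [ranked[0]]  # elitism (an assumption)
        while len(new_population) < pop_size:
            # 3. Create a child: shared edges plus a random half of the rest.
            mother, father = rng.sample(parents, 2)
            child = set(mother & father)
            child |= {e for e in mother ^ father if rng.random() < 0.5}
            # 4. Mutate the child: toggle one random edge.
            child ^= {rng.choice(all_edges)}
            new_population.append(frozenset(child))
        population = new_population
    return max(population, key=lambda w: fitness(w, trails))


trails = [["A", "B", "C"]] * 3
best = evolve(trails, tasks=["A", "B", "C"])
print(sorted(best), round(fitness(best, trails), 2))
```

Because the best workflow of each generation is kept, the best fitness can never decrease from one generation to the next.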

Genetic Algorithms (2)

- Advantages:
- Can deal with duplicate tasks and non-free choice.

- Disadvantages:
- The structure of the “chromosome”
- How do we measure fitness?
- How do we do cross-over and mutation?

Problem Areas (4)

- Delta analysis: how do we compare two models?
- Other problems: time, dealing with noise and incompleteness.

Using sequential patterns

- Mining loops?
- Fitness measure in a GA?
- Use in delta analysis?
- Generate the important frequent subsequences to help the designer

Further research in sequences

- How about gaps between items in different item sets?
- What type of frequent subsequences to use in fitness?
- Lifting order, is it useful in workflow generation?
- Further research of lifting order
