Process Mining : A Research Agenda

Process Mining: A Research Agenda. Group 2: M9301106 謝妹圜, M9401008 李宛柔, M9401304 陳志威, M9401402 林宜萱. Agenda: Preface; Introduction to Process Mining; Challenging Problems in Process Mining; Differences in Mining Algorithms; Special Issue; Conclusion.

Presentation Transcript


  1. Process Mining: A Research Agenda • Group 2: M9301106 謝妹圜, M9401008 李宛柔, M9401304 陳志威, M9401402 林宜萱

  2. Agenda • Preface • Introduction to Process Mining • Challenging Problems in Process Mining • Differences in Mining Algorithms • Special Issue • Conclusion

  3. Preface • The evolution of enterprise information systems: WFM → BPM → BPA. Flexibility, diagnosis, and simulation have become more important for information systems. • The goal of process mining is to extract an explicit process model from event logs, with a focus on the causal relations between activities.

  4. Process Mining (1/2) • Method: we can construct a process model from a process log that records the order in which events take place. • Example log: Case 1: A, B, C, D; Case 2: A, C, B, D; Case 3: A, B, C, D; Case 4: A, C, B, D; Case 5: E, F. • B and C occur in both orders, so they are in parallel.

  5. Process Mining (2/2) • From the log (Case 1: A, B, C, D; Case 2: A, C, B, D; Case 3: A, B, C, D; Case 4: A, C, B, D; Case 5: E, F) we can deduce, for example, the following process model: • Cases 1 to 4 start with task A and finish with task D. • After A is executed, tasks B and C run in parallel.
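
The deduction on this slide can be sketched in a few lines of Python. This is only a minimal illustration, not a full mining algorithm: it assumes that two tasks are parallel when each is seen directly after the other somewhere in the log, and that a pair seen in only one order is causal. The function names are hypothetical.

```python
from itertools import combinations

# The example log from the slides: one list of task names per case.
LOG = [
    ["A", "B", "C", "D"],
    ["A", "C", "B", "D"],
    ["A", "B", "C", "D"],
    ["A", "C", "B", "D"],
    ["E", "F"],
]


def directly_follows(log):
    """Return all pairs (x, y) where y immediately follows x in some case."""
    return {(t[i], t[i + 1]) for t in log for i in range(len(t) - 1)}


def parallel_pairs(log):
    """Pairs observed in both orders are treated as parallel (assumption)."""
    df = directly_follows(log)
    tasks = sorted({task for trace in log for task in trace})
    return [(x, y) for x, y in combinations(tasks, 2)
            if (x, y) in df and (y, x) in df]


def causal_pairs(log):
    """Pairs observed in one order only are treated as causal (assumption)."""
    df = directly_follows(log)
    return sorted(pair for pair in df if (pair[1], pair[0]) not in df)


print(parallel_pairs(LOG))  # [('B', 'C')]
print(causal_pairs(LOG))    # A->B, A->C, B->D, C->D, E->F
```

On the slide's log this reports B and C as parallel and A as the common predecessor of both, which matches the model described above.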

  6. Challenging problems • Mining hidden tasks • Mining duplicate tasks • Mining non-free-choice constructs • Mining loops • Using time • Mining different perspectives • Dealing with noise • Dealing with incompleteness • Gathering data from heterogeneous sources • Visualizing results • Delta analysis

  7. Challenging problems - Mining hidden tasks • Log: Case 1: A, B, C, D; Case 2: A, C, B, D; Case 3: A, B, C, D; Case 4: A, C, B, D; Case 5: E, F. • Suppose that both A and D are removed from the log, while B and C are still in parallel. • In this case it is still possible to construct a process model: we can detect that there must be an AND-split and an AND-join.

  8. Challenging problems - Mining duplicate tasks • A process model can have two nodes that refer to the same task. For example, suppose task E is renamed to task B, giving the log: Case 1: A, B, C, D; Case 2: A, C, B, D; Case 3: A, B, C, D; Case 4: A, C, B, D; Case 5: B, F. • It is difficult to construct the corresponding process model, because the log does not distinguish one occurrence of B from the other.

  9. Challenging problems - Mining non-free-choice constructs • Fig. 4 shows a non-free-choice construct. • After executing task C there is a choice between D and E, but this choice is “controlled” by the earlier choice between A and B, so it is not a free choice.

  10. Challenging problems - Mining loops (1/2) • In a process it may be possible to execute the same task multiple times. Fig. 5 shows an example with a loop. • Possible event sequences are: BD, BCD, BCCD, BCCCD, ... • Loops can also be used to jump back to any place in the process.

  11. Challenging problems - Mining loops (2/2) • There is a relation between loops and duplicate tasks. • In Fig. 5, task A is executed multiple times (twice) but is not in a loop, which makes it different from task C. • Task A is a duplicate task, as described earlier.
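
As a rough illustration of the loop on task C, the sketch below flags any task that directly follows itself in at least one trace. It assumes traces are given as strings of single-letter task names, uses only the sequences listed on the previous slide, and covers only loops of length one; the function name is hypothetical.

```python
import re


def short_loop_tasks(traces):
    """Return tasks that directly follow themselves in at least one trace."""
    return sorted({t[i] for t in traces
                   for i in range(len(t) - 1) if t[i] == t[i + 1]})


TRACES = ["BD", "BCD", "BCCD", "BCCCD"]

print(short_loop_tasks(TRACES))          # ['C']
print(short_loop_tasks(["BD", "BCD"]))   # [] (the loop is not visible yet)

# All four sequences fit the pattern "B, then any number of C, then D".
print(all(re.fullmatch(r"BC*D", t) for t in TRACES))
```

The second call shows why loops are hard to mine: until a trace with a repeated C appears in the log, C looks like an optional task rather than a loop.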

  12. Challenging problems - Using time • In many cases each logged event has a timestamp. The time information can be used for two purposes: • Adding time information to the process model. First mine the process model while ignoring the timestamps, then “replay” the log in the process model; this makes it easy to calculate flow time, waiting time, and processing time. • Improving the quality of the discovered process model. If two events occur within a short time interval, it is likely that there is some causal relation between them.
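
The first use of timestamps can be illustrated with a simplified calculation. The sketch below does not replay the log in a mined model; it assumes a single, strictly sequential case whose events carry a "start" or "complete" type, so that waiting time is simply flow time minus processing time. The event format, the data, and the function name are all hypothetical.

```python
from datetime import datetime

# Hypothetical events for one case: (task, event type, timestamp).
EVENTS = [
    ("A", "start", "2024-01-01 09:00"),
    ("A", "complete", "2024-01-01 09:30"),
    ("B", "start", "2024-01-01 10:00"),
    ("B", "complete", "2024-01-01 10:45"),
]


def case_times(events):
    """Return (flow, processing, waiting) time in minutes for one sequential case."""
    parsed = [(datetime.strptime(ts, "%Y-%m-%d %H:%M"), task, kind)
              for task, kind, ts in events]
    parsed.sort(key=lambda row: row[0])

    # Flow time: from the first event of the case to the last one.
    flow = (parsed[-1][0] - parsed[0][0]).total_seconds() / 60

    # Processing time: sum of (complete - start) over all tasks.
    started = {}
    processing = 0.0
    for ts, task, kind in parsed:
        if kind == "start":
            started[task] = ts
        elif kind == "complete":
            processing += (ts - started.pop(task)).total_seconds() / 60

    # Waiting time: what remains, valid only because the case is sequential.
    return flow, processing, flow - processing


print(case_times(EVENTS))  # (105.0, 75.0, 30.0)
```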

  13. Challenging problems - Mining different perspectives • Control-flow perspective: ordering of tasks, usually including timestamps • Organization perspective: relations between roles and groups • Information perspective: control data and production data • Application perspective: the applications being used to execute tasks

  14. Challenging problems - Dealing with noise • Noise: incorrectly logged information, or information we do not need • The mining algorithm needs to distinguish exceptions from the “normal flow”. • Being robust to noise: determine a threshold value to cut off exceptions
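
One simple way to apply such a threshold is to count how often each pair of tasks occurs in direct succession and keep only the pairs seen often enough. The sketch below assumes this frequency-based cut-off; the log, the threshold value, and the function name are hypothetical, and real mining algorithms may use more refined measures.

```python
from collections import Counter


def frequent_follows(log, threshold):
    """Keep only directly-follows pairs observed at least `threshold` times."""
    counts = Counter((t[i], t[i + 1]) for t in log for i in range(len(t) - 1))
    return {pair for pair, n in counts.items() if n >= threshold}


# Hypothetical log: 99 regular cases plus one incorrectly logged case.
LOG = [list("ABCD")] * 50 + [list("ACBD")] * 49 + [list("ADCB")]

kept = frequent_follows(LOG, threshold=5)
print(sorted(kept))
print(("A", "D") in kept)  # False: seen only once, treated as noise
```

With a threshold of 1 the noisy pairs A→D and D→C would enter the model; with a threshold of 5 they are cut off while the six regular pairs remain.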

  15. Challenging problems - Dealing with incompleteness • Consider the example in the figure: if we change the process such that tasks C1~C9 are executed in parallel, then there are 10! possible routes, so the log is likely to be incomplete.
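
To see how quickly the number of routes grows, note that n tasks executed in parallel can be interleaved in n! different orders. The short sketch below only evaluates that formula for a few values of n; the function name is hypothetical.

```python
import math


def interleavings(n_parallel_tasks):
    """Number of possible orderings of n tasks that run in parallel."""
    return math.factorial(n_parallel_tasks)


for n in (3, 5, 10):
    print(n, interleavings(n))
# 3 6
# 5 120
# 10 3628800
```

A log with a few hundred cases can therefore cover only a tiny fraction of the routes that ten parallel tasks allow.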

  16. Challenging problems - Gathering data from heterogeneous sources • Events may be logged at several levels or in several parts of the system, for example in an ERP system like SAP. • This makes it hard to collect the event log for process mining. • One approach is to use a data warehouse that extracts the information we need from these logs.

  17. Challenging problems - Visualizing results • Another challenge is to present the results of process mining in a way that lets people gain insight into the process. • ARIS PPM is used to display performance measures such as flow time and work in progress in a way that is easy to understand.

  18. Challenging problems - Delta analysis • Delta analysis compares two models and explains their differences and commonalities. The two models are: • Descriptive or normative models - the model that has been drawn up by people before mining • Reference models - the model constructed after mining
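
A very small version of this comparison can be written with set operations. The sketch below assumes each model is reduced to a set of (predecessor, successor) edges, which ignores splits, joins, and other structure that a real delta analysis would consider; both edge sets and the function name are hypothetical.

```python
def delta(predefined, discovered):
    """Split two edge sets into commonalities and differences."""
    return {
        "common": predefined & discovered,
        "only_predefined": predefined - discovered,
        "only_discovered": discovered - predefined,
    }


# Hypothetical model drawn up before mining: a strict sequence A, B, C, D.
PREDEFINED = {("A", "B"), ("B", "C"), ("C", "D")}
# Hypothetical model constructed by mining: B and C run in parallel.
DISCOVERED = {("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")}

result = delta(PREDEFINED, DISCOVERED)
for name, edges in result.items():
    print(name, sorted(edges))
```

Here the edge B→C exists only in the model drawn up beforehand, while A→C and B→D appear only in the mined model, which points to parallel behavior the designers did not anticipate.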

  19. Differences in Mining Algorithms • There is a strong relation between a mining algorithm and the types of problems it can handle • To characterize a mining algorithm, we can start with an enumeration of the types of problems • Noise, incomplete logs, duplicate tasks, ...

  20. Data Mining and Process Mining (1/2) • Impossible to use existing data mining techniques directly for process mining • Most of the process mining techniques have some very specific properties • Process mining can be seen as a sub-domain of data mining • Inductive bias • Local-global dimension • Computational complexity • Memory requirement

  21. Data Mining and Process Mining (2/2) • Workflow logs can contain • Information about the attributes of cases • The actual route taken by a case • Traditional data mining • Mines decision rules that predict the routing of a case • Process mining • Focuses on mining the process model

  22. The Inductive Bias of Process Mining Algorithms (1/5) • Mining means searching through a large space of possible models defined by the process representation language • The goal of the search is to find the process model that best fits the data in the workflow log

  23. The Inductive Bias of Process Mining Algorithms (2/5) • Process model representation languages • Petri nets • Block-oriented process models • Event dependency models • Petri nets are the more expressive representation language

  24. The Inductive Bias of Process Mining Algorithms (3/5) • A more expressive language enlarges the search space, which has negative effects: • The mining technique becomes more sensitive to noise • More data is needed for successful mining • Computational complexity and memory requirements increase

  25. The Inductive Bias of Process Mining Algorithms (4/5) • If we know that we are looking for a linear model and use linear regression as our modeling technique: • A few data examples are sufficient • The approach is less sensitive to noise • The computing time is shorter than in the nonlinear case

  26. The Inductive Bias of Process Mining Algorithms (5/5) • If we know in advance which type of process model we are looking for, and use this information when selecting the model representation language, we have a strong inductive bias

  27. The Local-Global Dimension (1/3) • Different strategies can be used to find the most appropriate model • Local strategies: build the model step by step using local information (e.g., a Markovian approach) • Global strategies: a one-strike search that uses all traces in the workflow log (e.g., genetic search)

  28. The Local-Global Dimension (2/3) • Advantages of local strategies • Less complex from a computational viewpoint • Lower memory requirements • Disadvantage of local strategies • Locally optimal steps do not guarantee a globally optimal process model • For example: the non-free-choice problem • Advantage of global strategies • More robust to noise

  29. The Local-Global Dimension (3/3) • Combine local and global strategies • A local search approach is used • A global check is performed on the whole model and all data in the workflow log
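
To make the idea of a local, Markovian strategy more concrete, the sketch below estimates, for each task, the probability of each directly following task. It assumes a first-order view in which only the immediately preceding task matters; this is an illustration of local information, not the method of any particular paper, and the function name is hypothetical.

```python
from collections import Counter, defaultdict

# The example log used earlier in the slides.
LOG = [
    ["A", "B", "C", "D"],
    ["A", "C", "B", "D"],
    ["A", "B", "C", "D"],
    ["A", "C", "B", "D"],
    ["E", "F"],
]


def transition_probabilities(log):
    """Estimate P(next task | current task) from directly-follows counts."""
    counts = defaultdict(Counter)
    for trace in log:
        for current, following in zip(trace, trace[1:]):
            counts[current][following] += 1
    return {
        current: {nxt: n / sum(c.values()) for nxt, n in c.items()}
        for current, c in counts.items()
    }


probs = transition_probabilities(LOG)
print(probs["A"])  # {'B': 0.5, 'C': 0.5}
print(probs["E"])  # {'F': 1.0}
```

Because each estimate looks only one step ahead, a dependency that spans several tasks, such as a non-free-choice construct, is invisible at this level, which is why a global check on the whole model is still useful.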

  30. Special Issue • Introduces 6 selected papers on process mining • The first 3 papers describe systems that mine complete process models • The 4th paper focuses on the detection of concurrent behavior • The last 2 papers introduce information about some global properties

  31. Workflow Mining with InWoLvE • An overview of the algorithms implemented within the InWoLvE workflow mining system • InWoLvE solves the workflow mining problem in 2 steps • Create a stochastic activity graph from the example set • Transform this graph into a workflow model

  32. Mining Exact Models of Concurrent Workflow • An approach to mine exact workflow models from workflow logs • Uses a block-oriented representation language • Advantage • The resulting workflow models are always exact (complete, specific, ...) • Disadvantage • The inductive bias of the mining technique

  33. Discovering Workflow Models from Activities’ Lifespans • An extension of the work of Agrawal with time information • Presents 2 new algorithms for mining process models from workflow logs • The resulting graphs have fewer excess and absent edges than those of the old algorithm

  34. Discovering Models of Behavior for Concurrent Workflow • Focuses on the concurrent behavior of processes • A probabilistic analysis of the workflow event traces • Discovers patterns using metrics for the number, frequency, and regularity of event occurrences

  35. Discovery of Temporal Patterns from Process Instances • Focuses on discovering frequently occurring temporal patterns • Defines the temporal pattern discovery problem and evaluates 3 temporal pattern discovery algorithms

  36. Business Process Intelligence • BPI supports business and IT users in managing process execution quality • Provide several features • Analysis • Prediction • Monitoring • Control • Optimization

  37. Conclusion • Introduction to process mining • Illustrated the potential of process mining and challenging problems in process mining • Hidden tasks, duplicate tasks, non-free-choice constructs, loops, time, noise…and so on. • Trigger new research efforts to solve some problems
