Discovering Social Networks from Enterprise Data

Laks V.S. Lakshmanan. Based on: Wil M.P. van der Aalst, Hajo A. Reijers, Minseok Song. Discovering Social Networks from Event Logs. Full version of paper in Business Process Management (BPM) 2004.

General Remarks • Discovering/mining a social network (SN) from (some) data vs. mining a given SN to extract some value (we’re talking about the former here). • What kind of data: • Can be email • Event log from a business process (this paper) • Video capturing interactions • Event log – also called audit trail, history, transaction file. • Project opportunity here.

Model • Events = {(case, activity, person), ... ordered by time}. • Case (process instance) = “thing” being handled: e.g., customer order, job application, building permit, license application, loan application, insurance claim, etc. • Activity (task, operation, action, work-item) = some operation performed on the case by a person: e.g., contact customer, check credit rating, contact references, visit site, etc. • What do you look for? • Is there a handover? (e.g., (c,a1,p1) immediately followed by (c,a2,p2)). • Does it happen often enough? • What are the org roles of the persons involved?

Process Mining – A Related Area • Given an event log, mine a process, which can be: • A Petri net. • A model with org/temporal/info/social aspects (most relevant to us). • Here is an example log and an example social graph we can extract right away, based on handover or “immediately followed by” on a case.

An example event log

case   activity   performer
1      A          John
2      A          John
3      A          Sue
3      B          Carol
1      B          Mike
1      C          John
2      C          Mike
4      A          Sue
2      B          John
2      D          Pete
5      A          Sue
4      C          Carol
1      D          Pete
3      C          Sue
3      D          Pete
4      B          Sue
5      E          Clare
5      D          Clare
4      D          Pete
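A minimal Python sketch of how the handover ("immediately followed by" on a case) graph could be extracted from the example log above. The function names are illustrative, and dropping self-handovers (the same person performing two consecutive activities) is an assumption made here, not something the slides state.

```python
from collections import Counter, defaultdict

# (case, activity, performer) in log order, copied from the example log.
LOG = [
    (1, "A", "John"), (2, "A", "John"), (3, "A", "Sue"), (3, "B", "Carol"),
    (1, "B", "Mike"), (1, "C", "John"), (2, "C", "Mike"), (4, "A", "Sue"),
    (2, "B", "John"), (2, "D", "Pete"), (5, "A", "Sue"), (4, "C", "Carol"),
    (1, "D", "Pete"), (3, "C", "Sue"), (3, "D", "Pete"), (4, "B", "Sue"),
    (5, "E", "Clare"), (5, "D", "Clare"), (4, "D", "Pete"),
]

def traces(log):
    """Group events by case, preserving log order within each case."""
    by_case = defaultdict(list)
    for case, activity, performer in log:
        by_case[case].append((activity, performer))
    return dict(by_case)

def handover_counts(log):
    """Count direct handovers p -> q between consecutive events of a case."""
    counts = Counter()
    for events in traces(log).values():
        for (_, p), (_, q) in zip(events, events[1:]):
            if p != q:  # assumption: self-handovers are not drawn as arcs
                counts[(p, q)] += 1
    return counts

edges = handover_counts(LOG)
for (p, q), n in sorted(edges.items()):
    print(f"{p} -> {q}: {n}")
```

The counts returned by `handover_counts` can serve directly as arc weights; thresholding them is one way to answer "does it happen often enough?".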

Social Graph Mined • [Figure: mined handover graph over the nodes John, Sue, Clare, Mike, Pete, Carol.] • The mined graph could be enhanced with (relative) frequencies and handoff delays [not mentioned in paper]. • SNA can be done on it: who are the “power centers”? Which are the cliques? Can we (org) enable other interactions to improve efficiency? ... • This graph need not be viewed purely conjunctively (my thoughts). E.g., “Jack always hands over to Jane or Peter, depending on activity type or case type” (assuming meta-data on both). Most interesting case: when meta-data + timing info is available.

Some challenges • (in)completeness: log may not exhibit all possible orders (when concurrency is in the underlying model); rare occurrences and exceptions (both + and –) should be handled with care. • noise: data could be missing and/or erroneous. • legal issues: affect quality/utility/granularity of data available, if at all.

Social Network Analysis • Here are some interesting measurements one can make on the mined SN. • Convention: distance for us = length of a geodesic (shortest path), unless otherwise stated; $d_{uv}$ = distance between u and v. • What is the density of the whole graph (sociocentric) or of a person’s neighborhood network (egocentric)? density = #edges / #possible edges. What is the diameter? • What is the average distance of v to other nodes? What proportion of the geodesics between other node pairs passes through v?
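The measurements above can be sketched in a few lines of Python. This is an illustration under stated assumptions: the graph is the handover graph of the example log, treated as undirected and connected, and all function names are made up for this sketch.

```python
from collections import deque
from itertools import combinations

# Assumption: example handover graph, with arc directions ignored.
EDGES = [("John", "Mike"), ("John", "Pete"), ("Sue", "Carol"),
         ("Sue", "Pete"), ("Sue", "Clare")]

def adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def bfs(adj, source):
    """Return (dist, sigma): geodesic lengths and number of geodesics from source."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]
    return dist, sigma

def density(adj):
    """#edges divided by the number of possible undirected edges."""
    n = len(adj)
    m = sum(len(nb) for nb in adj.values()) // 2
    return m / (n * (n - 1) / 2)

def diameter(adj):
    """Longest geodesic in the (connected) graph."""
    return max(max(bfs(adj, s)[0].values()) for s in adj)

def average_distance(adj, v):
    dist, _ = bfs(adj, v)
    return sum(dist.values()) / (len(adj) - 1)

def geodesic_share(adj, v):
    """Average fraction of geodesics between other node pairs that pass through v."""
    dist_v, sigma_v = bfs(adj, v)
    pairs = list(combinations([u for u in adj if u != v], 2))
    total = 0.0
    for s, t in pairs:
        dist_s, sigma_s = bfs(adj, s)
        if dist_s[v] + dist_v[t] == dist_s[t]:
            total += sigma_s[v] * sigma_v[t] / sigma_s[t]
    return total / len(pairs)

ADJ = adjacency(EDGES)
print(density(ADJ), diameter(ADJ), average_distance(ADJ, "Pete"), geodesic_share(ADJ, "Pete"))
```

For an egocentric view, the same functions can be applied to the subgraph induced by a person and that person's neighbors.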

SNA (contd.) • Bavelas-Leavitt index of centrality of node u: $BL(u) = \sum_{v,w} d_{vw} / \sum_{v,w} (d_{vu} + d_{uw})$. • Captures how much a shortest route through u stretches an “average” geodesic. The paper doesn’t say this, but the sums make more sense with v ≠ u ≠ w, v ≠ w; we will assume this below. • Example, for node 1 of the slide's example graph: BL(1) = (1+2+2+1+2+1)/(6×2) = 9/12. • Closeness(u) = $1 / \sum_{v} d_{vu}$.
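A small Python sketch of the two indices, using the restriction v ≠ u ≠ w, v ≠ w. The slide's example graph is not available in this transcript, so the edge list below is a hypothetical graph chosen only because it reproduces the distances 1, 2, 2, 1, 2, 1 used in the BL(1) computation (node 1 adjacent to nodes 2 to 5, plus the path 2-3-4-5).

```python
from collections import deque
from fractions import Fraction
from itertools import combinations

# Hypothetical graph, chosen to match the numbers in the BL(1) example.
EDGES = [(1, 2), (1, 3), (1, 4), (1, 5), (2, 3), (3, 4), (4, 5)]

def all_distances(edges):
    """Geodesic distances between all node pairs of an undirected graph (BFS)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    dist = {}
    for s in adj:
        d = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in d:
                    d[v] = d[u] + 1
                    queue.append(v)
        dist[s] = d
    return dist

def bl_index(dist, u):
    """Bavelas-Leavitt index, summing over unordered pairs v, w distinct from u."""
    pairs = list(combinations([v for v in dist if v != u], 2))
    numerator = sum(dist[v][w] for v, w in pairs)
    denominator = sum(dist[v][u] + dist[u][w] for v, w in pairs)
    return Fraction(numerator, denominator)

def closeness(dist, u):
    return Fraction(1, sum(dist[v][u] for v in dist if v != u))

D = all_distances(EDGES)
print(bl_index(D, 1), closeness(D, 1))
```

Summing over unordered rather than ordered pairs does not change the ratio in an undirected graph, since numerator and denominator both double.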

Digression into process mining • Efficient algorithms exist for mining a process model from an event log. • Can reveal if true causality exists between activities. • E.g., can say process = A followed by one of {B,C} in any order OR E, then followed by D. • Note: Will use this later for discriminating between causal and non-causal transfers of work.

A slightly more general def. of event log • Let A be a set of activities and P a set of performers. $E = A \times P$ is the set of (possible) events, i.e., combinations of an activity and a performer ((a, p) denotes the execution of activity a by performer p). $C = E^{*}$ is the set of possible event sequences (traces describing a case). $L \in \mathcal{B}(C)$ is an event log, where $\mathcal{B}(C)$ is the set of all bags (multi-sets) over C. How does this def. abstract actual event logs? • Notation: $\pi_a(e) = a$ and $\pi_p(e) = p$ for event e = (a, p).
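One way to see what this definition abstracts away is to build the bag of traces from a concrete log. The sketch below (names are illustrative) uses a `Counter` as the bag: case identifiers and the interleaving of cases disappear, and cases with identical traces collapse into one entry with a multiplicity.

```python
from collections import Counter, defaultdict

# (case, activity, performer) in log order, copied from the example log.
LOG = [
    (1, "A", "John"), (2, "A", "John"), (3, "A", "Sue"), (3, "B", "Carol"),
    (1, "B", "Mike"), (1, "C", "John"), (2, "C", "Mike"), (4, "A", "Sue"),
    (2, "B", "John"), (2, "D", "Pete"), (5, "A", "Sue"), (4, "C", "Carol"),
    (1, "D", "Pete"), (3, "C", "Sue"), (3, "D", "Pete"), (4, "B", "Sue"),
    (5, "E", "Clare"), (5, "D", "Clare"), (4, "D", "Pete"),
]

def to_bag(log):
    """Map a concrete log to a bag (multi-set) of traces over E = A x P."""
    by_case = defaultdict(list)
    for case, activity, performer in log:
        by_case[case].append((activity, performer))
    return Counter(tuple(trace) for trace in by_case.values())

def pi_a(event):
    """Projection of an event (a, p) onto its activity."""
    return event[0]

def pi_p(event):
    """Projection of an event (a, p) onto its performer."""
    return event[1]

bag = to_bag(LOG)
trace5 = (("A", "Sue"), ("E", "Clare"), ("D", "Clare"))
print(bag[trace5], [pi_p(e) for e in trace5])

# A hypothetical sixth case with the same trace as case 5 only raises a multiplicity.
extra = [(6, "A", "Sue"), (6, "E", "Clare"), (6, "D", "Clare")]
print(to_bag(LOG + extra)[trace5])
```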

Mining SN from an event log • Can use a mining algorithm analogous to frequent itemset mining or more specifically episode mining (to be overviewed soon). • Key is choosing the right metric for filtering arcs. • Some metrics look at just transfer of work, some insist on causality (need knowledge of process).

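Metrics based on (possible) causality • Direct and indirect succession (direct is the special case n = 1); p-n->q means q performs an activity n steps after p within some case: e.g., John-1->Mike, John-3->Pete. • With or without checking causality (=n=> additionally requires the two activities to be causally related in the process): e.g., Mike=1=>John is false and Mike=2=>Pete is true (taking causality into account). • Boolean vs. count version: |John-1->Mike| = 2; |Mike=2=>Pete| = 2. (Verify using the log table.)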
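The counts above can be checked with a short Python sketch. The traces are the five cases of the example log. The causal relation `CAUSAL` is an assumption: it is read off the process described earlier (A, then B and C in any order or E, then D), where B and C are parallel and therefore not causally related.

```python
# Traces of the example log, one list of (activity, performer) per case.
TRACES = [
    [("A", "John"), ("B", "Mike"), ("C", "John"), ("D", "Pete")],
    [("A", "John"), ("C", "Mike"), ("B", "John"), ("D", "Pete")],
    [("A", "Sue"), ("B", "Carol"), ("C", "Sue"), ("D", "Pete")],
    [("A", "Sue"), ("C", "Carol"), ("B", "Sue"), ("D", "Pete")],
    [("A", "Sue"), ("E", "Clare"), ("D", "Clare")],
]

# Assumed direct causal dependencies between activities in the process model.
CAUSAL = {("A", "B"), ("A", "C"), ("A", "E"), ("B", "D"), ("C", "D"), ("E", "D")}

def succession_count(traces, p, q, n=1, causal=None):
    """Count positions where q acts n steps after p in the same case.

    If `causal` is given, the two activities must also be causally related.
    """
    total = 0
    for trace in traces:
        for (a1, p1), (a2, p2) in zip(trace, trace[n:]):
            if p1 == p and p2 == q and (causal is None or (a1, a2) in causal):
                total += 1
    return total

def succeeds(traces, p, q, n=1, causal=None):
    """Boolean version of the metric."""
    return succession_count(traces, p, q, n, causal) > 0

print(succession_count(TRACES, "John", "Mike", 1))          # |John-1->Mike|
print(succeeds(TRACES, "Mike", "John", 1, CAUSAL))           # Mike=1=>John
print(succession_count(TRACES, "Mike", "Pete", 2, CAUSAL))   # |Mike=2=>Pete|
```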

Metrics based on work transfer • p-X->q = #times p transferred work to q / total #possible such times: e.g., John-X->Mike = 2/(3+3+3+3+2) = 2/14. • p-.X->q = #cases in which p transferred work to q at least once / number of cases in the log. • p-βX->q = same as p-X->q, except longer successions (length n) are penalized by $\beta^{n-1}$, where 0 < β < 1. • p-β.X->q = same as p-βX->q, except only distinct successions within each case are counted. • β – “causality fall factor”.
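A hedged Python sketch of these ratios on the example log. The normalization of the β-weighted version (each succession length n contributes β^(n-1) times the number of available positions to the denominator) is one plausible reading of "total #possible such times", not a formula confirmed by these slides; the per-case version divides by the number of cases.

```python
from fractions import Fraction

# Traces of the example log, one list of (activity, performer) per case.
TRACES = [
    [("A", "John"), ("B", "Mike"), ("C", "John"), ("D", "Pete")],
    [("A", "John"), ("C", "Mike"), ("B", "John"), ("D", "Pete")],
    [("A", "Sue"), ("B", "Carol"), ("C", "Sue"), ("D", "Pete")],
    [("A", "Sue"), ("C", "Carol"), ("B", "Sue"), ("D", "Pete")],
    [("A", "Sue"), ("E", "Clare"), ("D", "Clare")],
]

def handover_ratio(traces, p, q, beta=None):
    """p-X->q if beta is None (direct successions only), else the beta-weighted variant."""
    num = den = Fraction(0)
    for trace in traces:
        max_n = 1 if beta is None else len(trace) - 1
        for n in range(1, max_n + 1):
            weight = Fraction(1) if beta is None else Fraction(beta) ** (n - 1)
            hits = sum(1 for (_, p1), (_, p2) in zip(trace, trace[n:])
                       if p1 == p and p2 == q)
            num += weight * hits
            den += weight * (len(trace) - n)  # positions where a length-n succession fits
    return num / den

def case_ratio(traces, p, q):
    """Fraction of cases in which p hands work directly to q at least once."""
    hits = sum(1 for trace in traces
               if any(p1 == p and p2 == q
                      for (_, p1), (_, p2) in zip(trace, trace[1:])))
    return Fraction(hits, len(traces))

print(handover_ratio(TRACES, "John", "Mike"))                       # 2/14, printed reduced
print(case_ratio(TRACES, "John", "Mike"))
print(handover_ratio(TRACES, "John", "Mike", beta=Fraction(1, 2)))
```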

“In between” metrics • p-◊n->q = p performed some activity at position i and another at position i+n, and q performed some activity at a position j with i < j < i+n. • ||p-◊2->q|| = total #times a “◊2-in-between” occurred between p and q / total #possible such occurrences. • We can inject causality into this. • We can introduce the causality fall factor β here too.
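A minimal sketch of the in-between ratio for a fixed n. Two details are assumptions made for illustration: "some other" is taken to mean a different activity, and the number of possible occurrences is taken to be the number of positions i at which a window of length n fits in a trace.

```python
from fractions import Fraction

# Traces of the example log, one list of (activity, performer) per case.
TRACES = [
    [("A", "John"), ("B", "Mike"), ("C", "John"), ("D", "Pete")],
    [("A", "John"), ("C", "Mike"), ("B", "John"), ("D", "Pete")],
    [("A", "Sue"), ("B", "Carol"), ("C", "Sue"), ("D", "Pete")],
    [("A", "Sue"), ("C", "Carol"), ("B", "Sue"), ("D", "Pete")],
    [("A", "Sue"), ("E", "Clare"), ("D", "Clare")],
]

def in_between_ratio(traces, p, q, n=2):
    """Share of length-n windows that start and end with p and have q strictly inside."""
    hits = possible = 0
    for trace in traces:
        for i in range(len(trace) - n):
            possible += 1
            (a1, p1), (a2, p2) = trace[i], trace[i + n]
            inside = {performer for _, performer in trace[i + 1:i + n]}
            if p1 == p2 == p and a1 != a2 and q in inside:
                hits += 1
    return Fraction(hits, possible)

print(in_between_ratio(TRACES, "John", "Mike"))
print(in_between_ratio(TRACES, "Sue", "Clare"))
```

On the example log, John is "wrapped around" Mike in cases 1 and 2, out of nine possible windows of length 2.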

Working together metric • p ⋈_c q = p and q each perform some activity (not necessarily the same) for case c. Then p ⋈_L q = #cases on which they worked together / #cases on which p worked (does that remind you of some familiar measure?). • E.g., John ⋈_L Pete = 2/2 whereas Pete ⋈_L John = 2/4. • Can compute a matrix of users × activities with M[u,a] = #times u did a (e.g.). Then use the row vectors (users) to define similarity (similar to what we will do in RecSys!).
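Both ideas on this slide can be sketched briefly in Python on the example log. Cosine similarity is used for the row vectors purely as an illustrative choice; the slide does not fix a particular similarity measure.

```python
from collections import Counter, defaultdict
from fractions import Fraction
from math import sqrt

# Traces of the example log, one list of (activity, performer) per case.
TRACES = [
    [("A", "John"), ("B", "Mike"), ("C", "John"), ("D", "Pete")],
    [("A", "John"), ("C", "Mike"), ("B", "John"), ("D", "Pete")],
    [("A", "Sue"), ("B", "Carol"), ("C", "Sue"), ("D", "Pete")],
    [("A", "Sue"), ("C", "Carol"), ("B", "Sue"), ("D", "Pete")],
    [("A", "Sue"), ("E", "Clare"), ("D", "Clare")],
]

def together(traces, p, q):
    """#cases where p and q both worked, divided by #cases where p worked."""
    cases_p = [t for t in traces if any(perf == p for _, perf in t)]
    joint = [t for t in cases_p if any(perf == q for _, perf in t)]
    return Fraction(len(joint), len(cases_p))

def profiles(traces):
    """Performer-by-activity matrix M[u][a] = number of times u did a."""
    matrix = defaultdict(Counter)
    for trace in traces:
        for activity, performer in trace:
            matrix[performer][activity] += 1
    return matrix

def cosine(u, v):
    """Cosine similarity between two activity-count vectors (an assumed choice)."""
    dot = sum(u[a] * v[a] for a in u)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm

M = profiles(TRACES)
print(together(TRACES, "John", "Pete"), together(TRACES, "Pete", "John"))
print(round(cosine(M["John"], M["Mike"]), 3))
```

Note that `together` is asymmetric, which is why John and Pete get different values depending on who is taken as the reference person.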

Patterns Found • Case study conducted on the event log of a Dutch national public works department, responsible for road and water infrastructure. • 17 activities, 4,988 cases, 33,603 lines of log, and 43 employees (users).

SN based on handover metrics • 43 nodes, 406 edges, density = 0.225. • We can conduct SNA on this graph.
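As a quick check of the reported density, assuming the handover graph is directed and self-loops are not counted (an assumption; it is the reading that reproduces the reported value):

```latex
\[
\text{density} \;=\; \frac{\#\text{edges}}{n(n-1)}
\;=\; \frac{406}{43 \times 42}
\;=\; \frac{406}{1806}
\;\approx\; 0.225
\]
```

Under an undirected reading the denominator would be 43 × 42 / 2 = 903, giving about 0.45, which does not match the slide.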

Concluding Remarks • See paper for other SN mined by using different metrics. • Challenges: • scalability of actually mining SN using different metrics. • Scalability of conducting requisite SNA on the mined networks.

Other Questions • Can you think of other things worth measuring in event logs? • The key is that the measured patterns/quantities should be actionable and should yield value for the business.

Other Social Network Discovery Papers (for your talks) • Ting Yu, S.-N. Lim, K. Patwardhan, N. Krahnstoever. Monitoring, Recognizing and Discovering Social Networks. IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2009), pp. 1462-1469, 20-25 June 2009. • Sinisa Pajevic and Dietmar Plenz. Efficient Network Reconstruction from Dynamical Cascades Identifies Small-World Topology of Neuronal Avalanches. PLoS Comput Biol 5(1), 2009.
