
Socio-Cognitive Analysis of an E-mail Network

Jaideep Srivastava, University of Minnesota (srivasta@cs.umn.edu). Joint work with Nishith Pathak and Sandeep Mane, University of Minnesota, and Noshir S. Contractor, University of Illinois - Urbana.


Presentation Transcript


  1. Socio-Cognitive Analysis of an E-mail Network Jaideep Srivastava University of Minnesota srivasta@cs.umn.edu Joint work with: Nishith Pathak, Sandeep Mane, University of Minnesota Noshir S. Contractor, University of Illinois - Urbana

  2. Outline • Social networks and their analysis • Automatic social network construction • Socio-cognitive analysis • Modeling socio-cognitive networks • Analysis of a Socio-cognitive network • Extracting concealed relationships • An IR-inspired approach • Analysis of cognitive knowledge networks • A Dempster-Shafer approach • Conclusion Jaideep Srivastava

  3. Social Networks • A social network is a social structure of people, related (directly or indirectly) to each other through a common relation or interest • Social network analysis (SNA) is the study of social networks to understand their structure and behavior Reference: Freeman, Linton C. See you in the Funny Papers: Cartoons and Social Networks. CONNECTIONS, 23(1): 32-42, 2000 Jaideep Srivastava

  4. SNA in organizations • Types of social networks of interest (DeLange et al 2005) • Advice network: Who asks whom for advice and help • Trust network: Who discusses/shares sensitive information with whom • Support network: Who provides emotional support and social companionship to whom • Adversarial network: Who has an adversarial relationship with whom • Social network analysis can identify ‘undesirable structures’ (Nardi et al, 2002) • Imploded relationships: More frequent inter-group communication than intra-group communication • Holes: Missing edges which are expected to be present in the network • Bow ties: High dependency on a single person for communication; a potential bottleneck Jaideep Srivastava

  5. Computer networks as social networks • “Computer networks are inherently social networks, linking people, organizations, and knowledge” (Wellman, Science 2001) • Data sources include newsgroups like USENET; instant messenger logs like AIM; e-mail messages; social networking sites like Orkut and Yahoo groups; weblogs like Blogger; and online gaming communities

  6. Types of Social Network Analysis • Sociocentric (whole) network analysis • Roots lie in sociology • Involves quantification of interaction among a socially well-defined group of people • Focuses on identifying global structural patterns • Most SNA research in organizations concentrates on the sociometric approach • Egocentric (personal) network analysis • Also called socio-cognitive analysis • Roots lie in anthropology and psychology • Involves quantification of interactions between an individual (called ego) and all other persons (called alters) related (directly or indirectly) to ego • Makes generalizations from features found in personal networks • Data are difficult to collect, so studies have been rare until now

  7. All this analysis is great, but … • a BIG question is • How do you construct the social network? Jaideep Srivastava

  8. A shift in approach: from ‘synthesis’ to ‘analysis’ • Problems with employee surveys • High cost of manual surveys • Survey bias: perceptions of individuals may be incorrect • Logistics: organizations are now spread across several countries • Sources of electronic communication • Email • Web logs • [Figure: employee surveys of the cognitive networks for A, B and C are synthesized into a social network; with electronic communication, the social network is analyzed to obtain the cognitive networks]

  9. Modeling a Socio-Cognitive Network

  10. Example of E-mail Communication • A sends an e-mail to B • With Cc to C • And Bcc to D • C forwards this e-mail to E • From analyzing the header, we can infer • A and D know that A, B, C and D know about this e-mail • B and C know that A, B and C know about this e-mail • C also knows that E knows about this e-mail • D also knows that B and C do not know that it knows about this e-mail; and that A knows this fact • E knows that A, B and C exchanged this e-mail; and that neither A nor B know that it knows about it • and so on and so forth … Jaideep Srivastava

  11. Modeling Pair-wise Communication • Modeling pair-wise communication between actors • Consider the pair of actors (Ax,Ay) • Communication from Ax to Ay is modeled using the Bernoulli distribution L(x,y)=[p,1-p] • Where, • p = (# of emails from Ax with Ay as recipient)/(total # of emails exchanged in the network) • For N actors there are N(N-1) such pairs and therefore N(N-1) Bernoulli distributions • Every email is a Bernoulli trial where success for L(x,y) is realized if Ax is the sender and Ay is a recipient Jaideep Srivastava
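
The estimate of p on this slide can be sketched in a few lines of Python. This is only an illustration: the `(sender, recipients)` input layout and the function name `pairwise_probabilities` are assumptions, not part of the original work.

```python
from collections import Counter

def pairwise_probabilities(emails):
    """Estimate p for each ordered pair (sender, recipient).

    `emails` is a list of (sender, recipients) tuples; this input
    layout is an assumption made for illustration.
    """
    total = len(emails)
    counts = Counter()
    for sender, recipients in emails:
        # Each email is one Bernoulli trial per ordered pair; it is a
        # success for (sender, r) when r is among the recipients.
        for r in set(recipients):
            if r != sender:
                counts[(sender, r)] += 1
    # p = (# emails from Ax with Ay as recipient) / (total # of emails)
    return {pair: n / total for pair, n in counts.items()}

emails = [("A", ["B", "C"]), ("A", ["B"]), ("B", ["A"]), ("C", ["A", "B"])]
p = pairwise_probabilities(emails)
```

Pairs that never occur are simply absent from the result, which corresponds to p = 0.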

  12. Modeling an agent’s belief about global communication • Based on its observations, each actor entertains certain beliefs about the communication strength between all actors in the network • A belief about the communication expressed by L(x,y) is modeled as the Beta distribution, J(x,y), over the parameter of L(x,y) • Thus, belief is a probability distribution over all possible communication strengths for a given ordered pair of actors (Ax,Ay) Jaideep Srivastava

  13. Model for Belief Update • Jk(x,y) is the Beta distribution maintained by actor Ak regarding its belief about the communication from Ax to Ay • a and b, the two parameters of Jk(x,y), are associated with the number of emails observed by Ak which are • from Ax to Ay , i.e. number of successes, and • from Ax not to Ay, i.e. number of failures • Initialization • a and b start out with default initial values • Many different possibilities • For example, values can be chosen to be small so that they do not have much of an impact and can be “washed out” by future observations • Belief update • on observing a success or failure, Ak increments a or b respectively Jaideep Srivastava
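
A minimal sketch of this update rule, assuming default initial values of a = b = 1 (the slide only says the defaults should be small, so this choice is an assumption, as is the class name `Belief`):

```python
from dataclasses import dataclass

@dataclass
class Belief:
    """Beta(a, b) belief held by one actor about one ordered pair."""
    a: float = 1.0  # pseudo-count of successes (assumed small default)
    b: float = 1.0  # pseudo-count of failures (assumed small default)

    def update(self, success: bool) -> None:
        # A success is an observed email from Ax that includes Ay;
        # a failure is an observed email from Ax that does not.
        if success:
            self.a += 1
        else:
            self.b += 1

    def mean(self) -> float:
        # Expected value of the Beta distribution.
        return self.a / (self.a + self.b)

belief = Belief()
for outcome in [True, True, False, True]:
    belief.update(outcome)
```

With small initial values, a handful of observations already dominates the prior, which is the "washed out" behavior the slide mentions.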

  14. Belief State of an Actor • Every actor maintains Beta distributions (or beliefs) for all ordered pairs of actors in the network • Actor Ak’s belief state is defined to be the set of all N(N-1) Beta distributions (one for every Bernoulli distribution) • We also introduce a “super-actor” in the network • The super-actor is an actor who observes all the communication in the network • Super-actor is used as the baseline for reality • E-mail server is the “super-actor” Jaideep Srivastava

  15. Algorithmic details for belief update • For an email, the actors involved in the exchange perceive the following: • The sender and the super-actor know all the recipients of the email • Each To and Cc recipient knows only about the To and Cc recipients • Each Bcc recipient knows about itself and all the To and Cc recipients
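
The visibility rules above can be sketched as a function that maps each observer of a single email to the recipients it can see. The function name and the "SUPER" key standing in for the e-mail server are hypothetical choices for illustration.

```python
def observed_recipients(sender, to, cc, bcc):
    """Map each observer of one email to the recipients it can see."""
    visible = set(to) | set(cc)
    everyone = visible | set(bcc)
    views = {}
    for actor in visible:
        # To and Cc recipients see only the To and Cc lists.
        views[actor] = set(visible)
    for actor in bcc:
        # A Bcc recipient sees the To/Cc lists plus itself, not other Bccs.
        views[actor] = visible | {actor}
    # The sender and the super-actor ("SUPER", i.e. the e-mail server)
    # see every recipient.
    views[sender] = set(everyone)
    views["SUPER"] = set(everyone)
    return views

views = observed_recipients("A", to=["B"], cc=["C"], bcc=["D"])
```

Each observer would then update its Beta beliefs for the sender using only the recipients in its own view.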

  16. Quantitative Measures for Perceptual Closeness

  17. Types of Perceptual Closeness • We analyze the following aspects • Closeness between an actor’s belief and reality, i.e. “true knowledge” of an actor • Closeness between the beliefs of two actors, i.e. the “agreement” between two actors • We define two metrics, r-closeness and a-closeness for measuring the closeness to reality and closeness in the belief states of two actors respectively Jaideep Srivastava

  18. Measuring the Closeness Between Beliefs • For measuring the closeness between two belief states, the KL-divergence across the expected Bernoulli distributions for the two respective beliefs is computed. • The expected Bernoulli distribution for a belief is the expectation of the Beta distribution corresponding to that belief • If J(a,b)k,t is the Beta distribution, then the corresponding expected Bernoulli distribution (denoted by E[J(a,b)k,t]) is obtained by normalizing the parameters of Beta distribution J(a,b)k,t Jaideep Srivastava
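
As a rough sketch, the expected Bernoulli distribution and the KL-divergence between two of them could be computed as follows. The base-2 logarithm and the function names are assumptions; the slides do not state the base or the direction of the divergence.

```python
import math

def expected_bernoulli(a, b):
    """Normalize Beta(a, b) parameters into [p, 1 - p]."""
    total = a + b
    return [a / total, b / total]

def kl_divergence(p, q):
    """KL(p || q) in bits for two discrete distributions of equal length."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = expected_bernoulli(3, 1)   # [0.75, 0.25]
q = expected_bernoulli(1, 1)   # [0.5, 0.5]
d = kl_divergence(p, q)
```

Because Beta parameters are strictly positive, both expected distributions have nonzero entries and the divergence is always finite.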

  19. Belief Divergence Measures • The divergence of one belief, expressed by the Beta distribution J(a,b)x,t, from another, expressed by J(a,b)y,t, at a given time t is defined as [equation not preserved in the transcript] • The divergence of a belief state By,t from the belief state Bx,t, for two actors Ay and Ax respectively, at a given time t is defined as [equation not preserved in the transcript]

  20. Belief Divergence Measures (contd.) • The a-closeness measure is defined as the level of agreement between two given actors Ax and Ay with belief states Bx,t and By,t respectively, at a given time t [equation not preserved in the transcript] • The r-closeness measure is defined as the closeness of the given actor Ak’s belief state Bk,t to reality at a given time t [equation not preserved in the transcript], where BS,t is the belief state of the super-actor AS at time t

  21. Interpretation of the Metrics • The r-closeness measure • An actor who has accurate beliefs regarding only a few communications is closer to reality than another actor who has a relatively large number of less accurate beliefs • Thus, accuracy of knowledge is important • The a-closeness measure between actor pairs • Consider three actors Ax, Ay and Az • Suppose we want to determine how far Ay’s and Az’s belief states diverge from that of Ax • If Ay and Ax have few beliefs in common, but low divergence for each of these few common beliefs, then their belief states may be closer than those of Az and Ax, who have a relatively larger number of common beliefs with greater divergence across them • The a-closeness measure can be used to construct an “agreement graph” (a who-agrees-with-whom graph) • Actors are represented as nodes, and an edge exists between two actors only if the a-closeness between them exceeds some threshold t
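
The agreement graph construction can be sketched as a simple threshold filter. The a-closeness values below are made up for illustration, and representing unordered actor pairs as frozensets is an assumption of this sketch.

```python
def agreement_graph(closeness, threshold):
    """Return undirected edges whose a-closeness exceeds `threshold`.

    `closeness` maps unordered actor pairs (as frozensets) to a-closeness
    values; the values used below are made up for illustration.
    """
    return {pair for pair, value in closeness.items() if value > threshold}

closeness = {
    frozenset({"Ax", "Ay"}): 0.97,
    frozenset({"Ax", "Az"}): 0.80,
    frozenset({"Ay", "Az"}): 0.96,
}
edges = agreement_graph(closeness, threshold=0.95)
```

Here Ax and Az are not linked directly, even though both agree with Ay.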

  22. r-closeness and a-closeness experiments with Enron E-mail logs

  23. Enron Email Logs • Publicly available: http://www.cs.cmu.edu/~enron/ • Cleaned version of data • 151 users, mostly senior management of Enron • Approximately 200,399 email messages • Almost all users use folders to organize their emails • The upper bound for number of folders for a user was approximately the log of the number of messages for that user • A visualization of Enron data (Heer, 2005) • For experiments emails exchanged between users for the months of October 2000 and October 2001 were used Jaideep Srivastava

  24. Testing ‘conventional wisdom’ using r-closeness • Conventional wisdom 1: As an actor moves higher up the organizational hierarchy, it has a better perception of the social network • It was observed that the majority of the top r-closeness ranks were occupied by employees • Conventional wisdom 2: The more communication an actor observes, the better its perception of reality will be • Even though some actors observed a lot of communication, they were still ranked low in terms of r-closeness • These actors focus on a certain subset of all communications, so their perceptions of the social network were skewed towards these “favored” communications • Executive management actors who were communicatively active exhibited this “skewed perception” behavior, which explains why they were not ranked higher in r-closeness as conventional wisdom 1 would predict

  25. Experiment with r-closeness – Oct 2000 • For October 2000, based on their r-closeness rankings actors can be roughly divided into three categories • Top ranks: Actors who are communicatively active and observe a lot of diverse communications • Mid ranks: Actors who also observe a lot of communication but had skewed perceptions • Low ranks: Actors who are communicatively inactive and hardly observe any of the communication

      Percentage of actors in each category by r-closeness rank (counts in parentheses):

      Ranks    | Not Available | Higher Management | Employees  | Executive Management | Others
      1 - 10   | 0% (0)        | 2.6% (1)          | 14.6% (6)  | 6.9% (2)             | 6.67% (1)
      11 - 50  | 21.4% (6)     | 28.9% (11)        | 34.1% (14) | 24.1% (7)            | 13.33% (2)
      51 - 151 | 78.6% (22)    | 68.5% (26)        | 51.3% (21) | 69% (20)             | 80% (12)
      Total    | 100% (28)     | 100% (38)         | 100% (41)  | 100% (29)            | 100% (15)

  26. Experiment with r-closeness – Oct 2001 • r-closeness rankings for the crisis month Oct, 2001 show a significant increase (31% to 65.5%) in the percentage of senior executive management level actors in the top 50 ranks, with employees moving down Jaideep Srivastava

  27. Experiment with a-closeness • [Figures: agreement graph for Oct 2000 (threshold = 0.95); agreement graph for Oct 2001 (threshold = 0.95); mean a-closeness against time]

  28. a-closeness between Lavorato, Delainey, Lay and Skilling • [Figure: pairwise a-closeness values among the four actors over successive periods: initial state up to Oct 1999; Nov 1999-Dec 2000; Jan 2001-Aug 2001; Sept 2001-Oct 2001; Nov 2001-Dec 2001]

  29. Automatic Extraction of Concealed Relations

  30. Concealed Relations • Concealed/Covert Relations: Relations between groups of actors that • have high strength • but are known to very few actors in the network outside the group • Problem: Given email log data for all actors, extract the concealed relations from this data Jaideep Srivastava

  31. An IR-Motivated Approach • Use an approach motivated by information retrieval • Use a tf-idf style scheme for relations • an actor’s view of the social network → a document in a corpus • an (unordered) pair-wise actor relation → a term in a document • number of instances of a relation observed by an actor → term frequency (tf) in a document • number of actors that know about a relation → document frequency (df) • the actual frequency of a relation is used to determine a ‘global’ ranking of concealed relations

  32. tf-idf in a Social Network • If N is the total number of actors, then for relation rkl between two actors Ak and Al, we define its tf-idf score skl to be [equation not preserved in the transcript] • The score skl induces a ranking among relations based on decreasing levels of concealment • If nkl is replaced by mklj (# of instances of rkl observed by Aj) then we get the score for relation rkl relative to actor Aj (denoted by sklj) • The higher the value of sklj, the more privy actor Aj is to the relation rkl
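
Since the slide's formula is not available here, the sketch below uses the common textbook tf-idf form n * log(N / d) purely as an assumption. Here `n_kl` is the frequency of the relation, and `d_kl` (a hypothetical name) is the number of actors who know about it. The counts are invented for illustration.

```python
import math

def concealment_score(n_kl, d_kl, num_actors):
    """tf-idf style score: frequent relations known to few actors rank high.

    Uses the common form n * log(N / d); the slide's exact formula is not
    reproduced here, so treat this weighting as an assumption.
    """
    return n_kl * math.log(num_actors / d_kl)

# Hypothetical counts: (emails in the relation, actors aware of it).
relations = {("A", "B"): (40, 2), ("A", "C"): (40, 100), ("B", "C"): (5, 2)}
N = 151
scores = {r: concealment_score(n, d, N) for r, (n, d) in relations.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
```

Under this weighting, a relation that every actor knows about scores zero no matter how frequent it is, which matches the idea of ranking by concealment.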

  33. Distribution of Relation Scores • [Figures: relation scores for October 2000; relation scores for October 2001]

  34. Top 10 Concealed Relations • [Table not preserved in the transcript; an annotation marks one relation whose score has dropped]

  35. Identifying actor clusters with ‘similar knowledge’ of concealed relations • Construct an N x C(N,2) matrix M • Each column corresponds to a relation • Each row corresponds to an actor • Each element corresponds to the score of a relation, relative to an actor • Use SVD to decompose M = U D V^T • U is an N x k matrix • D is a k x k diagonal matrix • V^T is a k x C(N,2) matrix • The matrix UD maps actors into a k-dimensional latent space • Actors privy to the same set of relations get clustered together in this latent space
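
A small NumPy sketch of the latent-space mapping, assuming a truncated SVD that keeps the top k singular values. The toy matrix and the function name `actor_embedding` are illustrative, and the clustering step itself is not shown.

```python
import numpy as np

def actor_embedding(M, k):
    """Project actors (rows of M) into a k-dimensional latent space via U D."""
    U, d, Vt = np.linalg.svd(M, full_matrices=False)
    return U[:, :k] * d[:k]  # same as U_k @ diag(d_k)

# Toy 4-actor x 3-relation score matrix (values are made up).
M = np.array([
    [5.0, 0.0, 0.0],
    [4.0, 0.0, 0.0],
    [0.0, 3.0, 2.0],
    [0.0, 0.0, 0.0],
])
coords = actor_embedding(M, k=2)
```

In this toy case the fourth actor, who has no nonzero relation scores, lands at the origin, much like the large cluster of uninvolved actors described in the experimental results.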

  36. Experimental Results • For the month of October 2000 • around 11 clusters were estimated to be present • One large cluster (size 121) consisted of actors close to the origin, i.e., actors who neither participated in nor were privy to any prominent concealed relations • The other 10 clusters, of sizes 8, 5, 4, 3, 2, 2, 2, 2, 1 and 1 respectively, were smaller groups of actors • For the month of October 2001 • around 20 clusters were estimated to be present • Again one large cluster (size 95) consisted of actors close to the origin • All other clusters were relatively smaller groups of actors (sizes 11, 7, 6, 4, 3, 3, 3, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1 and 1) • Clustering was more pronounced for the crisis period

  37. Top actors from top clusters Jaideep Srivastava

  38. Some observations • Actors belonging to the smaller clusters tend to be aware of each others’ communications and exhibit community behavior • One of the strongest clusters of 3 actors consisting of the top 3 concealed relations for the October 2001 crisis period, is made up of actors who held the positions of Vice President (Government Affairs), Government Relations Executive and Vice President (Regulatory Affairs) • Other statistics • October 2001 – 490 nonzero relations • October 2000 – 129 nonzero relations • Total 11325 relations Jaideep Srivastava

  39. Analysis of Cognitive Knowledge Networks

  40. Cognitive Knowledge Networks • Cognitive knowledge networks • Who thinks who knows what • An actor’s mental (cognitive) model of ‘who knows what’ • Our cognitive models of who knows what drive our actions • We study • the propagation of knowledge perception in an e-mail network, i.e. • how actors develop perceptions about knowledge that other actors have Jaideep Srivastava

  41. Knowledge Perception Analysis • Knowledge is expressed as logical propositions which can be either true or false • Every actor entertains certain beliefs regarding the truthfulness of these knowledge propositions • Every email observed by an actor provides evidence (for/against/neutral) regarding the truth of some (set of) knowledge proposition K • Using Dempster-Shafer theory we can combine the evidence offered by an email along with an actor’s current belief in order to obtain his/her updated belief Jaideep Srivastava

  42. Knowledge Perception Analysis • Knowledge network is represented as a bipartite graph • Edge between an actor A and knowledge proposition K carries a label (s,p) • s is A’s confidence in K being true • p is the plausibility of K being true • (p-s) is the uncertainty that A has regarding K’s truthfulness • (s,p) can be seen as an expression of A’s beliefs regarding the verity of K Jaideep Srivastava
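
One way to sketch the belief update is Dempster's rule of combination on the two-element frame {true, false}. Representing each body of evidence as a `(m_true, m_false, m_unknown)` tuple is an assumption, as are the function names and the evidence values.

```python
def combine(m1, m2):
    """Dempster's rule on the frame {true, false}.

    Each mass function is a tuple (m_true, m_false, m_unknown) summing to 1,
    where m_unknown is the mass on the whole frame.
    """
    t1, f1, u1 = m1
    t2, f2, u2 = m2
    conflict = t1 * f2 + f1 * t2
    if conflict >= 1.0:
        raise ValueError("total conflict: evidence cannot be combined")
    norm = 1.0 - conflict
    t = (t1 * t2 + t1 * u2 + u1 * t2) / norm
    f = (f1 * f2 + f1 * u2 + u1 * f2) / norm
    return (t, f, u1 * u2 / norm)

def label(m):
    """Return the (s, p) edge label: support and plausibility of K."""
    t, f, _ = m
    return (t, 1.0 - f)

current = (0.0, 0.0, 1.0)         # actor starts fully uncertain
email_evidence = (0.6, 0.1, 0.3)  # hypothetical evidence from one email
updated = combine(current, email_evidence)
s, p = label(updated)
```

The gap p - s equals the mass left on the whole frame, i.e. the actor's remaining uncertainty about K.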

  43. Approach Jaideep Srivastava

  44. Evidence Extraction from Email Text • Use topic extraction to determine which knowledge proposition is being discussed • Berry and Browne (2005) show an interesting approach for automatically identifying semantic features (topics) from emails • Use sentiment extraction to determine the sentiment of an email, i.e. whether it speaks for, speaks against, or is neutral towards the proposition • The current state of the art might not deliver acceptable performance • More effective techniques need to be developed for email data • Sentiments need to be quantified to indicate their level of intensity • Pang and Lee (2005) provide an approach for converting sentiments into a rating measure

  45. Labeled Enron E-mail Dataset • Made available by the Enron Email Analysis Project at the University of California, Berkeley (http://bailando.sims.berkeley.edu/enron_email.html) • 1700 labeled email messages • Emails focus on business-related and California Energy Crisis content, and on emails that occurred later in the collection • Emails are classified into 8 major categories • Business-related, purely personal, personal but in a professional context, logistic arrangements, employment arrangements, document editing/checking, empty messages due to missing attachments, and empty messages • The business-related category is further classified into sub-categories • regulations and regulators, internal projects, company image (current), company image (changing), political influence/contributions/contacts, California energy crisis/California politics, internal company policy, internal company operations, alliances/partnerships, legal advice, talking points, meeting minutes, trip reports • Only one knowledge proposition of interest was examined, i.e. • ‘The company image is/remains good’ • Only those emails in the company image (current) and company image (changing) sub-categories were used • Total of 118 emails

  46. Experimental Setup • Emails were manually classified into seven categories indicating the impact on the company image • Each category was associated with a specific confidence (rating on a scale of 1) and for/against sentiment • Knowledge Networks were recorded for year 2000 and year 2001 Jaideep Srivastava

  47. Company Image in Oct 2000 • Year 2000 • Positive Events • Positive news articles • Article in Time magazine • Talk of the most innovative company of the year award • Negative Events • Miscommunication with the public relations section resulting in bad press • Environmental and human rights violations in overseas ventures

  48. Company Image in Oct 2001 • Year 2001 • Positive Events • Talk of a possible merger with a rival company, Dynegy • Negative Events • Bad press generated during the California power crisis • The spread-out points in both plots indicate a lack of consensus • A significant number of points away from the s=p line indicates the presence of uncertainty in actors’ beliefs

  49. Summary • Automatic social network construction • Socio-cognitive analysis • Modeling socio-cognitive networks • Analysis of a Socio-cognitive network • Extracting concealed relationships • An IR-inspired approach • Analysis of cognitive knowledge networks • A Dempster-Shafer approach Jaideep Srivastava

  50. Impact on Organizational Dynamics Research • Analyzing social networks • Hypothesis Testing and model verification • Example[Raymond2001] • Individual job performance was positively related to centrality in advice networks and negatively related to centrality in hindrance networks composed of relationships tending to thwart task behaviors • Hindrance network density was significantly and negatively related to group performance • Identifying who consults whom when confronted with a problem. • Such patterns can be identified by mining the chains of emails • Negative relationships may also be determined by mining changes in patterns of emails across time • The building of trust among individuals can be studied over time • Analyzing knowledge networks • Mining patterns of knowledge in network • These patterns may change/evolve over time • Social Networks data verification • Verifying subjective data (collected via surveys) with observed event sequences Jaideep Srivastava
