
Chayant Tantipathananandh with Tanya Berger-Wolf

Constant-Factor Approximation Algorithms for Identifying Dynamic Communities. Chayant Tantipathananandh with Tanya Berger-Wolf. Social networks: these are snapshots, and networks change over time.


Presentation Transcript


  1. Constant-Factor Approximation Algorithms for Identifying Dynamic Communities Chayant Tantipathananandh with Tanya Berger-Wolf

  2. Social Networks These are snapshots and networks change over time

  3. Dynamic Networks [Figure: groups of individuals 1 to 5 observed at time steps t=1, t=2, …, alongside the aggregated network] • Interactions occur in the form of disjoint groups • Groups are not communities

  4. Communities • What is a community? “Cohesive subgroups are subsets of actors among whom there are relatively strong, direct, intense, frequent, or positive ties.” [Wasserman & Faust 1994] • Dynamic Community Identification • GraphScope [Sun et al 2005] • Metagroups [Berger-Wolf & Saia 2006] • Dynamic Communities [TBK 2007] • Clique Percolation [Palla et al 2007] • FacetNet [Lin et al 2009] • Bayesian approach [Yang et al 2009]

  5. Ship of Theseus from Wikipedia “The ship … was preserved by the Athenians …, for they took away the old planks as they decayed, putting in new and stronger timber in their place, insomuch that this ship became a standing example among the philosophers, for the logical question of things that grow; one side holding that the ship remained the same, and the other contending that it was not the same.” [Plutarch, Theseus] Jeannot's knife “has had its blade changed fifteen times and its handle fifteen times, but is still the same knife.” [French story]

  6. Ship of Theseus Individual parts never change identities Cost for changing identity …

  7. Ship of Theseus Identity changes to match the group Costs for visiting and being absent …

  8. Approach

  9. Community = Color Valid coloring: In each time step, different groups have different colors.
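The validity condition on this slide can be checked mechanically. The following is a minimal sketch, assuming a hypothetical input layout in which `group_colors[t]` maps each group observed at time step `t` to its color; the function name and data layout are illustrative and do not come from the slides.

```python
def is_valid_coloring(group_colors):
    """Return True if, in every time step, all groups have distinct colors.

    group_colors: list indexed by time step; each entry is a dict mapping
    a group id to the color (community) assigned to that group.
    This layout is an assumption made for illustration.
    """
    for colors_at_t in group_colors:
        colors = list(colors_at_t.values())
        # A repeated color within one time step violates validity.
        if len(colors) != len(set(colors)):
            return False
    return True
```

The same color may be reused across time steps; only groups that coexist in a single time step must differ.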

  10. Interpretation Group color: How does community c interact at time t?

  11. Interpretation Individual color: Who belongs to community c at time t?

  12. Social Costs: Conservatism • Switching cost α • Absence cost β1 • Visiting cost β2 [Figure: example coloring annotated with switching costs α]

  13. Social Costs: Loyalty • Switching cost α • Absence cost β1 • Visiting cost β2 [Figure: example coloring annotated with absence costs β1]

  14. Social Costs: Loyalty • Switching cost α • Absence cost β1 • Visiting cost β2 [Figure: example coloring annotated with visiting costs β2]
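To make the three costs concrete, here is a minimal sketch of how a total cost could be tallied for a given coloring. The slides do not spell out the exact definitions, so the following reading is an assumption: an individual pays α when its color changes between consecutive time steps, β2 when it is observed in a group whose color differs from its own (visiting), and β1 when a group of its own color exists at that time step but the individual is not in it (absence). The function name and data layout are hypothetical.

```python
def total_cost(ind_colors, membership, group_colors, alpha, beta1, beta2):
    """Sum switching, visiting, and absence costs (assumed definitions).

    ind_colors[t][i]   -> color of individual i at time step t
    membership[t][i]   -> group id in which i is observed at t (missing if unobserved)
    group_colors[t][g] -> color of group g at time step t
    """
    cost = 0.0
    for t in range(len(ind_colors)):
        for i, c in ind_colors[t].items():
            # Switching: the individual's color differs from the previous step.
            if t > 0 and i in ind_colors[t - 1] and ind_colors[t - 1][i] != c:
                cost += alpha
            g = membership[t].get(i)
            g_color = group_colors[t][g] if g is not None else None
            # Visiting: observed in a group of a different color.
            if g is not None and g_color != c:
                cost += beta2
            # Absence: a group of color c exists at t, but i is not in it.
            if c in group_colors[t].values() and g_color != c:
                cost += beta1
    return cost
```

For example, if an individual moves to a group of another color and keeps its old color, it pays β1 + β2 under this reading, whereas changing its color to match the new group costs α instead.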

  15. Problem Complexity • Minimizing total cost is hard: NP-complete and APX-hard [with Berger-Wolf and Kempe 2007] • Constant-factor approximation [details in paper] • Easy special case: if there are no missing individuals and 2α ≤ β2, then the problem is simply weighted bipartite matching [details in paper]

  16. Group Graph

  17. Approximation via bipartite matching Assume all individuals are observed at all time steps

  18. Greedy Approximation No visiting or absence, and minimizing switching

  19. Greedy Approximation • No visiting or absence, and minimizing switching ≈ maximizing path coverage • The greedy algorithm guarantees an approximation ratio of max{2, 2α/β1, 4α/β2}, which depends only on α, β1, β2 and is independent of input size • Improvement by dynamic programming [Figure: example group graph with numeric labels over time]
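As a quick worked example of the guarantee max{2, 2α/β1, 4α/β2}, the sketch below evaluates the bound for given cost parameters. The function name is hypothetical; only the formula comes from the slide.

```python
def greedy_ratio_bound(alpha, beta1, beta2):
    """Evaluate the stated greedy guarantee max{2, 2*alpha/beta1, 4*alpha/beta2}."""
    return max(2.0, 2.0 * alpha / beta1, 4.0 * alpha / beta2)


# With all costs equal, the visiting term dominates: max{2, 2, 4} = 4.
print(greedy_ratio_bound(1, 1, 1))
```

The bound never drops below 2, and it grows when switching is expensive relative to absence or visiting.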

  20. Southern Women Data Set [DGG 1941] • 18 individuals, 14 time steps • Collected in Natchez, MS, 1935 [Figure: aggregated network]

  21. Ethnography [DGG 1941] [Figure: ethnography table with two Core groups marked] Note: columns not ordered by time

  22. Optimal Communities [Figure: individuals vs. time, with the two Core groups from the ethnography marked; all costs equal; white circles = unknown]

  23. Approximate Optimal [Figure: two panels over time, with the two Core groups from the ethnography marked]

  24. Approximation Power [Figure panels: 28 individuals, 44 time steps; 29 individuals, 82 time steps; 313 individuals, 758 time steps]

  25. Approximation Power [Figure panels: 41 individuals, 418 time steps; 264 individuals, 425 time steps; 96 individuals, 1577 time steps]

  26. Conclusions • Identity of objects that change over time (Ship of Theseus Paradox) • Formulate an optimization problem • Greedy approximation • Fast • Near-optimal • Future Work • Algorithm with guarantee not depending on α, β1, β2 • Network snapshots instead of disjoint groups

  27. Thank You NSF grant, KDD student travel award Mayank Lahiri Chayant Jared Saia David Kempe Arun Maiya Ilya Fischoff Habiba Saad Sheikh Tanya Berger-Wolf Dan Rubenstein Anushka Anand Siva Sundaresan Rajmonda Sulo Robert Grossman

  28. Ravi Kumar, Jasmine Novak, Prabhakar Raghavan, Andrew Tomkins IBM Almaden Research Center On the Bursty Evolution of Blogspace

  29. Blogspace • Blogspace: a collection of blogs with their links • Motivation • Sociological: different from traditional web pages • Technical: from static snapshots to dynamic graphs

  30. Background • Web communities (Ravi Kumar, 1999): groups of individuals who share a common interest, characterized by dense directed bipartite subgraphs • Bursty communities of blogs: exhibit striking temporal characteristics • Extract the community within a time interval

  31. Time graph • A time graph G = (V, E) • Each v in V has an associated duration D(v) • Each e in E is a triple (u, v, t), where t is a time in the interval D(u) ∩ D(v) • The prefix of G at time t is Gt = (Vt, Et), where Vt = {v in V | D(v) ∩ [0, t] ≠ Ø} and Et = {(u, v, t') in E | t' ≤ t}
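The prefix definition translates directly into set comprehensions. This is a minimal sketch, assuming each duration D(v) is a closed interval stored as a `(start, end)` pair and each edge is a `(u, v, time)` tuple; the function name and representation are illustrative.

```python
def prefix(durations, edges, t):
    """Return (V_t, E_t), the prefix of a time graph at time t.

    durations: dict mapping vertex -> (start, end), a closed interval (assumed)
    edges: iterable of (u, v, time) triples
    """
    # D(v) intersects [0, t] exactly when it starts by t and ends at or after 0.
    v_t = {v for v, (start, end) in durations.items() if start <= t and end >= 0}
    # Keep only the edges whose timestamp is at most t.
    e_t = {(u, v, s) for (u, v, s) in edges if s <= t}
    return v_t, e_t
```

Because every edge time lies in D(u) ∩ D(v), any edge kept in E_t has both endpoints in V_t.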

  32. Approach • Two-step approach • Community extraction: extract dense subgraphs (potential communities) • Burst analysis: analyze each dense subgraph to identify and rank bursts in these communities

  33. Community extraction • Finding the densest subgraph: NP-hard • Two steps: • Pruning: remove vertices of degree no more than one; vertices of degree two are K3g; output and remove communities (that pass a threshold); repeat the 3 steps above • Expanding: determine the vertex containing the most links to the community; add it to the community if its number of links is larger than tk

  34. Burst analysis • Kleinberg’s method (SIGKDD 2002) • Model the generation of events by an automaton that is in one of two states, “low” and “high” • The high state is hypothesized as generating bursts of events • A cost is associated with any state transition to discourage short bursts • Find a low-cost state sequence that is likely to generate the stream • Solves the problem of enumerating all the bursts by order of weight (dynamic programming)
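A two-state automaton of this kind can be fitted with a Viterbi-style dynamic program. The sketch below is a simplified illustration, not the presenters' implementation: it assumes events are given as inter-arrival gaps, each state emits gaps from an exponential distribution, the high state's rate is `s` times the base rate, and only the move from low to high is charged (`gamma * ln n`). The names `two_state_bursts`, `s`, and `gamma` are chosen here for illustration.

```python
import math


def two_state_bursts(gaps, s=2.0, gamma=1.0):
    """Label each inter-arrival gap as low (0) or high (1) by minimum total cost."""
    n = len(gaps)
    base_rate = n / sum(gaps)
    rates = [base_rate, s * base_rate]
    up_cost = gamma * math.log(n)  # charged only when moving low -> high

    def emit(q, x):
        # Negative log-density of an exponential with rate rates[q].
        return -math.log(rates[q]) + rates[q] * x

    # The automaton is assumed to start in the low state.
    cost = [emit(0, gaps[0]), up_cost + emit(1, gaps[0])]
    back = []
    for x in gaps[1:]:
        prev0 = 0 if cost[0] <= cost[1] else 1            # high -> low is free
        prev1 = 1 if cost[1] <= cost[0] + up_cost else 0  # low -> high is charged
        c0 = min(cost[0], cost[1]) + emit(0, x)
        c1 = min(cost[1], cost[0] + up_cost) + emit(1, x)
        back.append((prev0, prev1))
        cost = [c0, c1]

    # Trace the cheapest state sequence backwards.
    q = 0 if cost[0] <= cost[1] else 1
    states = [q]
    for prev in reversed(back):
        q = prev[q]
        states.append(q)
    states.reverse()
    return states
```

Maximal runs of state 1 in the returned sequence are the detected bursts; raising `gamma` makes short bursts less likely to be reported.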

  35. Tuning the algorithms • Expansion in community extraction: edges must grow to triangles; communities of size up to six will only grow vertices that link to all but one vertex; communities of size up to nine will only grow vertices that link to all but two vertices; communities up to size 20 will grow only vertices that link to 70% of the community; larger communities will grow only vertices that link to at least 60% of the community
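The size-dependent thresholds above can be combined with the expansion step into a short routine. This is a minimal sketch under several assumptions: the graph is treated as undirected and stored as a dict of neighbor sets, percentages are rounded up, and a vertex is added when its link count is at least the threshold. The names `min_links_required` and `expand` are hypothetical.

```python
def min_links_required(size):
    """Minimum links a new vertex needs to join a community of the given size."""
    if size <= 2:
        return size                  # an edge must grow to a triangle
    if size <= 6:
        return size - 1              # links to all but one vertex
    if size <= 9:
        return size - 2              # links to all but two vertices
    if size <= 20:
        return (7 * size + 9) // 10  # 70% of the community, rounded up
    return (6 * size + 9) // 10      # 60% of the community, rounded up


def expand(adj, community):
    """Greedily grow a community; adj maps each vertex to a set of neighbors."""
    community = set(community)
    while True:
        # Count links from each outside vertex into the community.
        links = {}
        for v in community:
            for w in adj.get(v, ()):
                if w not in community:
                    links[w] = links.get(w, 0) + 1
        if not links:
            return community
        # Pick the outside vertex with the most links (ties broken by label).
        best = max(sorted(links), key=links.get)
        if links[best] < min_links_required(len(community)):
            return community
        community.add(best)
```

Starting from a single edge, the routine keeps adding the best-connected outside vertex until no candidate meets the threshold for the current community size.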

  36. Results
