
Connectivity Structure of Bipartite Graphs via the KNC-Plot

Connectivity Structure of Bipartite Graphs via the KNC-Plot. Erik Vee, joint work with Ravi Kumar and Andrew Tomkins. The fundamental question: given a graph with millions or billions of nodes, how do we understand it?


Presentation Transcript


  1. Connectivity Structure of Bipartite Graphs via the KNC-Plot • Erik Vee • Joint work with Ravi Kumar and Andrew Tomkins

  2. The fundamental question… • Given a graph with millions/billions of nodes, how do we understand it?

  3. Macroscopic Success Stories • Given a graph with millions/billions of nodes, how do we understand it? • Spectral graph analysis • Eigenvalues give intuition for mixing time, connectivity • Conductance of a graph • Degree distribution

  4. Macroscopic models of graphs: Understanding connectivity • Bow tie model [Broder et al.]: Web graph • Jellyfish model [Faloutsos et al.]: Internet AS graph • No equivalent model for bipartite graphs

  5. Our Goals • Develop macroscopic tools to analyze social networks • Massive networks • What are simple, easy-to-understand properties? • Today: KNC-plot for bipartite graphs • Given an implicit graph representation, do something smarter than explicitly building the graph • Bipartite representation gives an implicit graph • Our algorithms never build the actual graph • Same spirit as work of [Feder, Motwani 95]

  6. Outline • Definition of the KNC-plot • k-neighborhood graph • Analysis of real social networks using the KNC-plot • Description of algorithm

  7. The k-neighborhood graph, $G_k$ • Given bipartite graph B, users on left, interests on right • Connect two users if they share at least k interests in common

  8. The k-neighborhood graph, $G_k$ • Given bipartite graph B, users on left, interests on right • Connect two users if they share at least k interests in common • [Figure: $G_1$]

  9. The k-neighborhood graph, $G_k$ • Given bipartite graph B, users on left, interests on right • Connect two users if they share at least k interests in common • [Figure: $G_2$]

  10. The k-neighborhood graph, $G_k$ • Given bipartite graph B, users on left, interests on right • Connect two users if they share at least k interests in common • [Figure: $G_3$]
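The definition on slides 7 to 10 can be sketched directly in code. This is a minimal brute-force illustration, not the algorithm from the talk: the function name `k_neighborhood_edges`, the dict-of-sets input format, and the toy data are all assumptions made for the example.

```python
from itertools import combinations


def k_neighborhood_edges(interests, k):
    """Return the edge set of G_k.

    `interests` maps each user to the set of interests it is linked to
    in the bipartite graph (a hypothetical input format).
    """
    edges = set()
    # Compare every pair of users once; quadratic, for illustration only.
    for u, v in combinations(sorted(interests), 2):
        if len(interests[u] & interests[v]) >= k:
            edges.add((u, v))
    return edges


# Toy bipartite graph: three users, five interests (made-up data).
toy = {
    "u1": {"A", "B", "C"},
    "u2": {"A", "C", "D"},
    "u3": {"D", "E"},
}
g1 = k_neighborhood_edges(toy, 1)
g2 = k_neighborhood_edges(toy, 2)
g3 = k_neighborhood_edges(toy, 3)
```

On this toy input, raising k removes edges: u2 and u3 share only one interest, so their edge is present in G_1 but not in G_2, and no pair shares three interests.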

  11. Illustration k=1

  12. Illustration k=2

  13. Illustration k=3

  14. Illustration k=4

  15. Illustration k=5

  16. The KNC-plot • The k-neighbor connectivity plot • How many connected components does $G_k$ have? • What is the size of the largest component? • Answers the question: how many shared interests are meaningful? • Communities, Cuts
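The two quantities the KNC-plot tracks can be computed for a small example as follows. This is a brute-force sketch under assumptions: the name `knc_plot`, the dict-of-sets input, and the choice to count isolated users as singleton components are illustrative and not taken from the talk.

```python
from itertools import combinations


def knc_plot(interests, max_k):
    """Return rows (k, number of components, largest component size)."""
    users = sorted(interests)
    rows = []
    for k in range(1, max_k + 1):
        # Fresh union-find forest for each k.
        parent = {u: u for u in users}

        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]  # path halving
                x = parent[x]
            return x

        # Merge users that share at least k interests.
        for u, v in combinations(users, 2):
            if len(interests[u] & interests[v]) >= k:
                parent[find(u)] = find(v)

        # Tally component sizes by root.
        sizes = {}
        for u in users:
            root = find(u)
            sizes[root] = sizes.get(root, 0) + 1
        rows.append((k, len(sizes), max(sizes.values())))
    return rows


toy = {
    "u1": {"A", "B", "C"},
    "u2": {"A", "C", "D"},
    "u3": {"D", "E"},
}
rows = knc_plot(toy, 3)
# rows == [(1, 1, 3), (2, 2, 2), (3, 3, 1)]
```

Plotting the second and third columns against k would give the two curves of the plot; the sketch assumes a non-empty user set.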

  17. Analysis • Four graphs: • LiveJournal: blogging site, users can specify interests • Y! query logs (interests = queries): queries issued for Yahoo! Search (try it at www.yahoo.com) • Content match (users = web pages, interests = ads): ads shown on web pages • Flickr photo tags (users = photos, interests = tags) • All data anonymized, sanitized, downsampled • Graphs have hundreds of thousands to a million users

  18. Examples • [Plot legend: largest component; number of components] • At k=5, all connected. • At k=6, interesting! • At k=6, nobody connected

  19. Examples • [Plot legend: largest component; number of components] • At k=5, all connected. • At k=6, interesting! • At k=6, nobody connected • [Panel: Flickr] Photos = “users”, tags = “interests” • [Panel: Content match] Web pages = “users”, ads = “interests”

  20. Examples • [Plot legend: largest component; number of components] • Connectivity smoothly varies • “Heavy-tailed” • At k=14, 10% connected • At k=36, 1% connected

  21. Examples • [Plot legend: largest component; number of components] • Connectivity smoothly varies • “Heavy-tailed” • At k=14, 10% connected • At k=36, 1% connected • [Panel: Y! queries] Users = users, queries = “interests” • [Panel: LiveJournal] Users = users, interests = interests

  22. Algorithms • [Plot legend: naïve vs. ours, for k = 2] • Naïve implementation takes $O(mn)$ time • Impractical for large graphs

  23. Algorithms • [Plot legend: naïve vs. ours, for k = 2] • Naïve implementation takes $O(mn)$ time • Impractical for large graphs • Our implementation takes $O(m^{2-1/k})$ time • Social networks are generally sparse • Faster for power-law distribution (no change in the algorithm) • Very fast for k=2, can trim graph for k=3, etc. • Space $O(km)$

  24. Alg-Intersect • Roughly speaking, for every pair of users, determine whether they have k interests in common • For each node u, record its neighborhood • For each node v, see if u’s and v’s neighborhoods intersect in at least k nodes • If so, connect them; otherwise don’t • Takes $O(nm)$ time (n = # nodes, m = # edges) • Space $O(m)$

  25. Alg-Intersect • Roughly speaking, for every pair of users, determine whether they have k interests in common • For each node $u \in S$, record its neighborhood • For each node v, see if u’s and v’s neighborhoods intersect in at least k nodes • If so, connect them; otherwise don’t • Takes $O(nm)$ time (n = # nodes, m = # edges) • BUT: may explore only nodes in a set S • Takes $O(|S|m)$ time • Space $O(m)$
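A minimal sketch of Alg-Intersect restricted to a set S might look like the following. The function name, the dict-of-sets input, and the return value (an explicit edge set, kept only for readability) are assumptions for illustration; the slides describe the idea, not this code.

```python
def alg_intersect(interests, S, k):
    """Edges of G_k that have at least one endpoint in S.

    `interests` maps user -> set of interests (hypothetical format).
    For each u in S we make one pass over every user's interest set,
    which touches each bipartite edge once: O(m) work per u.
    """
    edges = set()
    for u in S:
        neighborhood = interests[u]  # record u's neighborhood
        for v, items in interests.items():
            if v == u:
                continue
            # Count how many of v's interests also belong to u.
            shared = sum(1 for item in items if item in neighborhood)
            if shared >= k:
                edges.add(tuple(sorted((u, v))))
    return edges


toy = {
    "u1": {"A", "B", "C"},
    "u2": {"A", "C", "D"},
    "u3": {"D", "E"},
}
from_u2 = alg_intersect(toy, ["u2"], 1)
from_u1 = alg_intersect(toy, ["u1"], 2)
from_u3 = alg_intersect(toy, ["u3"], 2)
```

Because set membership tests are constant time on average, the inner pass costs time proportional to the total number of bipartite edges, which matches the $O(|S|m)$ bound on the slide.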

  26. Alg-Tuples • Consider k=2. • Suppose user 1 has interests {A,B,C} and user 2 has interests {A,C,D} • Create “virtual nodes” • Connect user 1 to {AB}, {AC}, {BC} • Connect user 2 to {AC}, {AD}, {CD} • There is an edge between user 1 and user 2 in $G_k$ iff there is a virtual node that both are connected to.

  27. Alg-Tuples • For each node u, • Create virtual nodes for u (if not already created) • Connect u to those virtual nodes • // (note: there are $O(\deg(u)^k)$ of them) • Figure out connectivity of $G_k$ using the virtual graph • Runtime $O(\sum_u \deg(u)^k)$ • Uses a union-find (disjoint-set) structure • Edges not actually explicitly computed • Space $O(\sum_u \deg(u)^k)$
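The virtual-node idea can be sketched with a small union-find. Everything named here (`alg_tuples_components`, the tagged-tuple node keys, the dict-of-sets input) is an assumption made for the example, and the sketch makes no attempt at the space optimizations the talk mentions.

```python
from itertools import combinations


def alg_tuples_components(interests, k):
    """Connected components of G_k via virtual k-tuple nodes.

    Each user is unioned with one virtual node per k-subset of its
    interests; two users end up in the same set exactly when a chain
    of shared k-subsets links them. Edges of G_k are never listed.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    for user, items in interests.items():
        find(("user", user))  # register users with fewer than k interests
        # Sorting gives each k-subset one canonical tuple key.
        for combo in combinations(sorted(items), k):
            union(("user", user), ("virtual", combo))

    groups = {}
    for user in interests:
        groups.setdefault(find(("user", user)), set()).add(user)
    return sorted(sorted(group) for group in groups.values())


toy = {
    "u1": {"A", "B", "C"},
    "u2": {"A", "C", "D"},
    "u3": {"D", "E"},
}
parts_k1 = alg_tuples_components(toy, 1)
parts_k2 = alg_tuples_components(toy, 2)
parts_k3 = alg_tuples_components(toy, 3)
```

For k=2 this reproduces the slide's example: users 1 and 2 both attach to the virtual node for {A, C}, so they land in the same component.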

  28. Combining them • Run Alg-Intersect for some subset S of nodes • We know all edges in $G_k$ that go from $u \in S$ to any node v • Runtime $O(|S|m)$ • [Figure labels: high-degree nodes, other nodes, S]

  29. Combining them • Run Alg-Intersect for some subset S of nodes • We know all edges in $G_k$ that go from $u \in S$ to any node v • Runtime $O(|S|m)$ • Run Alg-Tuples on the rest of the nodes • We “know” all edges in $G_k$ that go from $u \notin S$ to $v \notin S$ • Runtime $O(\sum_{u \notin S} \deg(u)^k)$ • [Figure labels: other nodes, S]

  30. Finding S • Order $u_1, u_2, \ldots$ by decreasing $\deg(u_i)$ • Initialize b=1. Increase b until $\sum_{i \ge b} \deg(u_i)^k \le bm$ • Let $S = \{u_1, u_2, \ldots, u_b\}$ • Run Alg-Intersect on nodes in S • Run Alg-Tuples on nodes not in S • Connect the two • Runtime is $O(bm) + O(\sum_{i \ge b} \deg(u_i)^k) = O(2bm)$ • [Figure label: high-degree nodes]
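The threshold search for b can be sketched as below. The function name `choose_split`, the list-of-degrees input, and the fallback of stopping at b = n when the condition never holds are assumptions of this example; the indexing follows the slide's 1-based $\sum_{i \ge b}$ literally.

```python
def choose_split(degrees, k):
    """Return the smallest b with sum_{i >= b} deg(u_i)^k <= b * m.

    `degrees` lists the user degrees in any order; m is their sum
    (the number of bipartite edges). Indices on the slide are 1-based,
    so the sum over i >= b is suffix[b - 1] in the 0-based list here.
    """
    degs = sorted(degrees, reverse=True)
    m = sum(degs)
    # suffix[i] = sum of degs[j] ** k for j >= i (0-based).
    suffix = [0] * (len(degs) + 1)
    for i in range(len(degs) - 1, -1, -1):
        suffix[i] = suffix[i + 1] + degs[i] ** k
    b = 1
    # Stop at b = n at the latest, so that S is at most all users.
    while b < len(degs) and suffix[b - 1] > b * m:
        b += 1
    return b


b_skewed = choose_split([1, 2, 10, 3, 1, 2, 1], 2)
b_flat = choose_split([1, 1, 1], 2)
```

In the skewed example (m = 20) the single degree-10 user dominates the sum of squares, so the loop advances past it once and stops at b = 2; in the flat example the condition already holds at b = 1.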

  31. Combining them • Runtime is $O(bm) + O(\sum_{i \ge b} \deg(u_i)^k)$ • But, for any graph, $\deg(u_i) \le m/i$ (by Markov) • Do not need power-law • Hence, $bm = \sum_{i \ge b} \deg(u_i)^k \le \sum_{i \ge b} m^k / i^k = O(m^k / b^{k-1})$ • So $b = O(m^{1-1/k})$ • Runtime is $O(m^{2-1/k})$
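The bound on slide 31 can be spelled out step by step. The following is a reconstruction under the assumption $k \ge 2$, using only the quantities defined on the slides; the integral comparison is a standard estimate and is not taken from the talk.

```latex
% Degrees are sorted in decreasing order, so
%   i \cdot \deg(u_i) \le \sum_{j \le i} \deg(u_j) \le m,
% which gives \deg(u_i) \le m / i.
\begin{align*}
\sum_{i \ge b} \deg(u_i)^k
  &\le \sum_{i \ge b} \frac{m^k}{i^k}
   = m^k \sum_{i \ge b} i^{-k} \\
  &\le m^k \left( b^{-k} + \int_b^{\infty} x^{-k}\,dx \right)
   = m^k \left( b^{-k} + \frac{b^{1-k}}{k-1} \right)
   = O\!\left( \frac{m^k}{b^{k-1}} \right).
\end{align*}
% Balancing the two runtime terms:
\begin{align*}
bm = \Theta\!\left( \frac{m^k}{b^{k-1}} \right)
  \;\Longrightarrow\; b^k = \Theta\!\left( m^{k-1} \right)
  \;\Longrightarrow\; b = \Theta\!\left( m^{1-1/k} \right),
\qquad
O(bm) = O\!\left( m^{2-1/k} \right).
\end{align*}
```

For k=2 this gives $b = O(\sqrt{m})$ and a total runtime of $O(m^{3/2})$.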

  32. Extensions • Power-law distributed provably faster • $O(m^{1+(1-1/k)/\alpha})$ for power law with exponent $\alpha$ • Algorithm works exactly the same • No need to know whether power-law ahead of time • When set of interests is logarithmic, can get quasi-linear time algorithms • Different algorithm • In paper

  33. Conclusion • KNC-plot useful tool • Exposes how meaningful shared interests are • The k-neighborhood graph defined implicitly • Efficient algorithm for implicit graph • Other algorithms for $G_k$, given bipartite representation • Find additional social graph properties that are meaningful, computable • Describe macroscopic structure of social networks
