
Measuring and Extracting Proximity in Complex Networks

Emden Gansner, Yehuda Koren, Stephen North, Chris Volinsky. AT&T Labs Research.


Presentation Transcript


  1. Measuring and Extracting Proximity in Complex Networks Emden Gansner, Yehuda Koren, Stephen North, Chris Volinsky, AT&T Labs Research

  2. AT&T “Safe Harbor” The following contains "forward-looking statements" which are based on management's beliefs as well as on a number of assumptions concerning future events made by and information currently available to management. Readers are cautioned not to put undue reliance on such forward-looking statements, which are not a guarantee of performance and are subject to a number of uncertainties and other factors, many of which are outside AT&T's control, that could cause actual results to differ materially from such statements. For a more detailed description of the factors that could cause such a difference, please see AT&T's filings with the Securities and Exchange Commission. AT&T disclaims any intention or obligation to update or revise any forward-looking statements, whether as a result of new information, future events or otherwise.

  3. large social networks data source |V| |E|

  4. Connecting Co-Authors in DBLP 18-node subgraph Proximity: 1.35e+01 Captured: 1.31e+01 (97%) Adam? Glenn? Emden?

  5. 95% of communication between… 5-node subgraph Proximity: 7.10e+00 Captured: 6.74e+00 (95%)

  6. Our goals • Measure proximity between nodes. • Explain proximity by extracting connection subgraphs that are readily visualized.

  7. What is proximity? • proximity [prox·im·i·ty || prɑk'sɪmətɪ /prɒ-]n. adjacency, nearness, closeness, vicinity • Network proximity is an elusive notion! • Let’s work by refining a series of definitions.

  8. Measuring proximity • Simplest approach – length of shortest path • Easily visualized

  9. Measuring proximity • Simplest approach – length of shortest path • Easily illustrated • Disregards alternative paths Captures 56% Captures 98%

  10. Measuring proximity • Simplest approach – length of shortest path • Easily visualized • Disregards alternative paths • Naïve calculation will be fooled by high degrees • Example from a telephone call graph…

  11. Meaningful connection Shankar Suresh Stephen Lefty Which pair is closer? • Both paths are 2-hops, about the same lengths • But when considering node-degrees… Random connection?

  12. Measuring proximity – 2nd try • Net network flow between the nodes • Accounts for multiple paths • Indifferent to distance – might favor long paths • High-degree nodes are still an issue

  13. Measuring proximity – 3rd try • Delivered electric current (effective conductance) • Resistor network model • Accounts for multiple paths • Penalizes long paths • High degrees?? • Getting us closer… • “intuitive” • Physical analogy is not perfect! Figure labels: 1V, 0V; edge weights are conductances (inverse resistances)
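The resistor-network idea on this slide can be sketched in a few lines. This is only an illustration, not the presenters' code: it assumes an undirected graph given as a dictionary of edge conductances, fixes s at 1V and t at 0V, and relaxes every other node to the weighted average of its neighbours (Gauss-Seidel sweeps). The function name `effective_conductance` and the `sweeps` parameter are made up for this example.

```python
def effective_conductance(edges, s, t, sweeps=500):
    """edges: {(u, v): conductance} for an undirected graph (assumed input format)."""
    adj = {}
    for (u, v), c in edges.items():
        adj.setdefault(u, {})[v] = c
        adj.setdefault(v, {})[u] = c
    # Boundary conditions: s held at 1V, t held at 0V.
    volt = {n: 0.0 for n in adj}
    volt[s] = 1.0
    for _ in range(sweeps):
        for n in adj:
            if n == s or n == t:
                continue
            total = sum(adj[n].values())
            # Kirchhoff: voltage is the conductance-weighted mean of the neighbours.
            volt[n] = sum(c * volt[m] for m, c in adj[n].items()) / total
    # With a 1V drop, the current leaving s equals the effective conductance.
    return sum(c * (volt[s] - volt[m]) for m, c in adj[s].items())

two_hops = {("s", "a"): 1.0, ("a", "t"): 1.0}
two_paths = {("s", "a"): 1.0, ("a", "t"): 1.0, ("s", "b"): 1.0, ("b", "t"): 1.0}
four_hops = {("s", "a"): 1.0, ("a", "b"): 1.0, ("b", "c"): 1.0, ("c", "t"): 1.0}
```

On these toy graphs a second parallel path doubles the delivered current relative to `two_hops`, while stretching the path to four hops halves it, which is the "accounts for multiple paths, penalizes long paths" behaviour listed above.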

  14. When is the electrical current analogy misleading? Significant connection Noise? What does current flow mean?

  15. When is the electric current analogy misleading? • Same current flow in both cases! • Degree-1 nodes are neutral (attract no flow) • Degree-1 nodes are very common, due to incomplete information Significant connection Noise?

  16. Augment network by a universal sink [Faloutsos, McCurley & Tomkins, KDD 2004] • Connect all nodes to a grounded universal sink (with 0V) • Tax each node - deliver portion of the flow to the sink • No internal nodes of degree 1 (above problem solved) • Penalizes long paths • A new parameter to worry about: Which tax system? - Constant tax? Proportional tax? Tax brackets? How much? • There is a worse problem…

  17. Universal sink and (non-)monotonicity • In our previous notions of proximity, adding nodes/edges to the network couldn’t decrease proximity • Hmmm…this “blind monotonicity” was part of their shortcoming… Proximity Network size

  18. Universal sink and (non-)monotonicity • For all previous measures, adding nodes/edges to the network couldn’t decrease proximity • With universal sink – no monotonicity: Larger network → proximity tends to zero, sink attracts more flow • Even adding s-t paths can decrease proximity! Proximity Network size

  19. Universal sink and (non-)monotonicity • Problems with non-monotonicity: • Counter-intuitive and hard to use • Size bias makes proximity-comparison across different pairs completely unreliable • Impossible to explain (size-dependent) proximity using a connection subgraph Proximity Network size

  20. A random-walk perspective • Current-flow model has a direct random-walk (r.w.) interpretation • Reminder: We defined proximity by “delivered current” or “effective conductance” • The escape probability, Pesc(s→t), is the probability that a r.w. originating at s will reach t before visiting s again • Let Deg(s) be the number of r.w.’s originating at s • The effective conductance between s and t is Pesc(s→t)*Deg(s)
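The link between escape probability and effective conductance can be checked numerically on a toy graph. The sketch below is a Monte Carlo estimate, not an exact method and not taken from the slides: the adjacency format, the function name `escape_probability`, and the `walks` and `seed` parameters are all assumptions for illustration.

```python
import random

def escape_probability(adj, s, t, walks=20000, seed=0):
    """Estimate the chance that a walk from s reaches t before returning to s.

    adj: {node: {neighbour: weight}} for an undirected graph (assumed format).
    """
    rng = random.Random(seed)
    escaped = 0
    for _ in range(walks):
        node = s
        while True:
            nbrs = list(adj[node])
            weights = [adj[node][n] for n in nbrs]
            # Step to a neighbour with probability proportional to edge weight.
            node = rng.choices(nbrs, weights=weights)[0]
            if node == t:
                escaped += 1
                break
            if node == s:
                break
    return escaped / walks

# Toy graph: a direct s-t edge in parallel with a two-hop path s-a-t.
adj = {
    "s": {"t": 1, "a": 1},
    "a": {"s": 1, "t": 1},
    "t": {"s": 1, "a": 1},
}
p_esc = escape_probability(adj, "s", "t")
conductance_estimate = p_esc * sum(adj["s"].values())
```

For this graph the exact escape probability is 0.75 (half the walks take the direct edge, and half of the remaining ones reach t from a), so multiplying by Deg(s) = 2 should land near 1.5, the conductance of a unit resistor in parallel with two unit resistors in series.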

  21. “Dead end” paths have no influence on escape probability • Both graphs have the same escape probability from red to green Lower red→green escape probability Higher red→green escape probability In both cases higher effective conductance by Rayleigh’s Monotonicity Law

  22. Extending escape probability • The escape probability, Pesc(s→t), is the probability that a r.w. originating at s will reach t before visiting s again • The cycle-free escape probability, Pc.f.esc(s→t), is the probability that a r.w. originating at s will reach t without visiting any node more than once • Multiply by degree to get an absolute quantity (accounting for the number of "actually initiated" r.w.'s): The c.f. effective conductance between s and t is Pc.f.esc(s→t)*Deg(s)
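On a very small graph the cycle-free escape probability can be computed exactly by enumerating every simple path from s to t and summing the path probabilities, where each step u→v contributes w(u,v)/Deg(u). The sketch below does this with a depth-first search; it is exponential in general and meant only to illustrate the definition. The adjacency format, the integer weights, and the function name are assumptions for this example.

```python
from fractions import Fraction

def cycle_free_escape_probability(adj, s, t):
    """adj: {node: {neighbour: integer weight}} (assumed format)."""
    deg = {u: sum(nbrs.values()) for u, nbrs in adj.items()}

    def walk(u, visited):
        # Probability of reaching t from u without revisiting any node.
        if u == t:
            return Fraction(1)
        total = Fraction(0)
        for v, w in adj[u].items():
            if v not in visited:
                total += Fraction(w, deg[u]) * walk(v, visited | {v})
        return total

    return walk(s, {s})

# A two-hop path s-a-t, with and without a dead-end node d hanging off a.
plain = {"s": {"a": 1}, "a": {"s": 1, "t": 1}, "t": {"a": 1}}
dead_end = {"s": {"a": 1}, "a": {"s": 1, "t": 1, "d": 1}, "t": {"a": 1}, "d": {"a": 1}}

p_plain = cycle_free_escape_probability(plain, "s", "t")
p_dead_end = cycle_free_escape_probability(dead_end, "s", "t")
cfec_plain = p_plain * sum(plain["s"].values())  # multiply by Deg(s)
```

The dead-end branch lowers the cycle-free value from 1/2 to 1/3, because a walk that steps into d cannot continue without revisiting a. The ordinary escape probability would be unaffected by that branch.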

  23. The c.f. effective conductance is a good candidate proximity measure: • Accounts for multiple paths • Favors short paths • Penalizes high-degree nodes • Penalizes dead-end paths • Parameter free • Has the “right” monotonicity • Accommodates edge directions • Has a natural extension to multiple endpoints Higher red→green c.f. escape probability Lower red→green c.f. escape probability

  24. Computing c.f. escape probability • Unlike previous measures, exact computation is impossible • Practically, we can estimate it extremely well • Probability of paths declines exponentially (e.g., the 100th path is 10^6 times less probable than the first one) • Estimate using the k most probable paths R_1, …, R_k: Pc.f.esc(s→t) ≈ Prob(R_1) + … + Prob(R_k)

  25. Finding k most probable paths • For an edge u-v of weight w(u,v), define its length as -log(w(u,v)/Deg(u)) • Edge lengths are positive • Exp(-<length of path>) = Prob(path) • Short path ⇔ highly probable path • Compute k shortest simple paths in O(k|E|log|E|) time [Katoh, Ibaraki and Mine, 1982] • Stop searching when the path probability drops below 10^-6 times that of the first path
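The length transform on this slide turns "most probable path" into "shortest path". The sketch below makes that concrete with a plain best-first enumeration of simple paths. It is a stand-in, not the Katoh, Ibaraki and Mine algorithm cited above, and it can take exponential time on large graphs. It assumes each step u→v has probability w(u,v)/Deg(u), hence length -log(w(u,v)/Deg(u)). The function name and the `k` and `cutoff` parameters are illustrative.

```python
import heapq
import math

def most_probable_paths(adj, s, t, k=10, cutoff=1e-6):
    """Return up to k simple s-t paths as (probability, path), most probable first.

    adj: {node: {neighbour: positive weight}} (assumed format).
    """
    deg = {u: sum(nbrs.values()) for u, nbrs in adj.items()}
    heap = [(0.0, [s])]  # (length so far, partial simple path)
    found = []
    while heap and len(found) < k:
        length, path = heapq.heappop(heap)
        u = path[-1]
        if u == t:
            prob = math.exp(-length)
            # Stop once paths are negligible relative to the first one.
            if found and prob < cutoff * found[0][0]:
                break
            found.append((prob, path))
            continue
        for v, w in adj[u].items():
            if v not in path:  # keep the path simple (cycle free)
                step = -math.log(w / deg[u])  # non-negative, since w <= deg[u]
                heapq.heappush(heap, (length + step, path + [v]))
    return found

adj = {
    "s": {"t": 1, "a": 1},
    "a": {"s": 1, "t": 1},
    "t": {"s": 1, "a": 1},
}
paths = most_probable_paths(adj, "s", "t")
estimate = sum(prob for prob, _ in paths)
```

Because every edge length is non-negative, complete paths leave the heap in order of decreasing probability, so the search can stop as soon as the cutoff is reached. Summing the returned probabilities gives the cycle-free escape probability estimate.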

  26. Extracting and explaining proximity

  27. Extracting proximity • Cycle free effective conductance (CFEC) depends on the full graph • Find a small subgraph that captures the most proximity • A tradeoff between “size” and “captured proximity” can be expressed in alternative ways: • Extract a subgraph with at most B nodes that captures maximal CFEC • Maybe with B+1 nodes we can capture much more??? • Extract a minimal-sized subgraph that captures at least P% of total CFEC • Maybe we can capture (P-1)% of total CFEC with a much smaller subgraph???

  28. Extracting proximity • Find a small subgraph that captures most proximity • Achieve an efficient balance between “size” and “proximity” by maximizing an α-weighted ratio of captured proximity to subgraph size • Larger α emphasizes proximity → larger subgraph • α=0 → returns only the shortest path • α=∞ → returns all paths • Optionally, explicitly fix lower and upper bounds on subgraph size

  29. What solutions do we seek? • Overlapping paths delivering the most flow

  30. The path merger algorithm • We already have a collection of paths • Find the subset of the paths that maximizes the proximity-to-size ratio • Combine the selected paths into a “proximity subgraph” • Overlapping paths are cheaper to add • An NP-hard problem…

  31. Optimal algorithm • Scanning all subsets takes O(2^k) time (can we do better?) • A branch-and-bound pruning significantly reduces running time • Huge deviations in path quality make this approach effective, e.g. often it is clear that the best subset must contain the first path(s) • Prematurely terminate the exponential algorithm after scanning “too many” subsets
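The exhaustive scan over path subsets can be sketched as follows. The exact ratio is not preserved in this transcript, so the code assumes a score of (captured proximity)^α divided by the number of nodes in the merged subgraph, and it counts a path as captured when all of its edges lie in the merged subgraph. Both choices, along with the function name and the input format, are assumptions for illustration. There is no branch-and-bound here, only the plain O(2^k) enumeration.

```python
from itertools import combinations

def edges_of(path):
    # Undirected edges along a path, as frozensets so direction is ignored.
    return {frozenset(e) for e in zip(path, path[1:])}

def best_path_subset(paths, alpha=1.0):
    """paths: list of (probability, [nodes]); returns (score, chosen indices)."""
    best_score, best_subset = float("-inf"), ()
    for r in range(1, len(paths) + 1):
        for subset in combinations(range(len(paths)), r):
            nodes = set().union(*(paths[i][1] for i in subset))
            edges = set().union(*(edges_of(paths[i][1]) for i in subset))
            # Any collected path fully contained in the merged subgraph counts.
            captured = sum(p for p, path in paths if edges_of(path) <= edges)
            score = captured ** alpha / len(nodes)
            if score > best_score:
                best_score, best_subset = score, subset
    return best_score, best_subset

# Hypothetical path collection: (probability, node sequence).
paths = [
    (0.60, ["s", "t"]),
    (0.25, ["s", "a", "t"]),
    (0.01, ["s", "b", "c", "t"]),
]
small = best_path_subset(paths, alpha=1.0)[1]
larger = best_path_subset(paths, alpha=3.0)[1]
```

With these made-up numbers, α=1 keeps only the direct path, while α=3 also merges in the two-hop path. The low-probability four-hop path is never worth its two extra nodes.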

  32. Agglomerative algorithm • If optimal algorithm couldn’t finish, improve current result by an agglomerative algorithm • Iteratively, merge the two subsets that maximize the ratio • Record the best subset discovered

  33. Working with large graphs in external storage • Dealing with full graph is sometimes infeasible and usually unnecessary • Prior to running the algorithm, we construct a candidate graph in main memory • We begin by growing increasing neighborhoods around the endpoints

  34. S T

  35. Dist(S,i)=2 Dist(T,i)=2 S T

  36. Dist(S,i)=3 Dist(T,i)=3 S T

  37. Dist(S,i)=4 Dist(T,i)=4 S T

  38. Dist(S,i)=5 Dist(T,i)=5 S T Shortest path of length 10

  39. Most probable path of length 10 was found No use for low-probability paths... Paths longer than “24” unneeded!

  40. S T Dist(S,i)=12 Dist(T,i)=12

  41. S T i Dist(S,i)=12 Dist(T,i)=12 • Stop adding nodes • Any s—t path through unscanned node must be longer than “24”, thus useless • Can we prune the resulting graph? • Yes! • From two circles into an ellipse…

  42. Pruning the candidate graph • We can safely prune a significant portion of the candidate graph • Use the fact: dist(i,s)+dist(i,t)>L ⇒ all s-t paths going via i are longer than L • We ignore much less probable paths: paths longer than “24” are not interesting • Take only nodes within the ellipse defined by: dist(i,s)+dist(i,t)<24

  43. From 2 centers of circles to 2 foci of an ellipse Dist(S,i)+Dist(T,i)=24 S T Dist(S,i)=12 Dist(T,i)=12
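The ellipse test can be sketched with two shortest-path computations. This is a simplification under stated assumptions: the slides grow neighbourhoods incrementally from external storage, whereas the code below assumes the graph already fits in memory as an adjacency dictionary of edge lengths and simply runs Dijkstra from each endpoint. The function names and the `max_len` parameter are illustrative, and nodes lying exactly on the boundary are kept.

```python
import heapq

def dijkstra(adj, source):
    """Shortest distances from source; adj: {node: {neighbour: length}} (assumed)."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, length in adj[u].items():
            nd = d + length
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def prune_to_ellipse(adj, s, t, max_len):
    """Keep only nodes i with dist(i,s) + dist(i,t) <= max_len."""
    ds, dt = dijkstra(adj, s), dijkstra(adj, t)
    keep = {i for i in adj if i in ds and i in dt and ds[i] + dt[i] <= max_len}
    return {u: {v: l for v, l in adj[u].items() if v in keep} for u in keep}

# Toy graph: s-a-t with unit lengths, plus a far-away node x attached to a.
adj = {
    "s": {"a": 1.0},
    "a": {"s": 1.0, "t": 1.0, "x": 5.0},
    "t": {"a": 1.0},
    "x": {"a": 5.0},
}
pruned = prune_to_ellipse(adj, "s", "t", max_len=4.0)
```

Node x is dropped because any s-t path through it has length at least 12, which exceeds the bound of 4, so it cannot contribute a path worth keeping.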

  44. Some statistics…

  45. Distribution of proximities in phone-call network

  46. Distribution of #hops in phone-call network
