The Universal Laws of Structural Dynamics in Large Graphs


  1. The Universal Laws of Structural Dynamics in Large Graphs. Dmitri Krioukov, UCSD/CAIDA; David Meyer & David Rideout, UCSD/Math; F. Papadopoulos, M. Kitsak, M. Á. Serrano, M. Boguñá, M. Ostilli. DARPA's GRAPHS, Washington, DC, Halloween 2012 (cancelled by the Frankenstorm)

  2. High-level project description • Motivation: • Predict network dynamics • Detect anomalies • Goal: • Identify the universal laws of network dynamics • Methods: geometry: random geometric graphs • Past work: static graphs • Present/future work: dynamic graphs

  3. Outline • Hyperbolic = popular × similar • Growing random hyperbolic graphs • Next step • Random Lorentzian graphs

  4. Growing hyperbolic random geometric graph • “Discretization” of a smooth manifold (B. Riemann, Nature, v.7) • Static construction: take a disk of radius R, sprinkle N points into it uniformly at random, and connect each pair of points iff the distance x between them is at most the connection radius r (with r ≤ R) • Growing (hyperbolic) version: R grows to R + dR, one new point appears in the annulus [R, R + dR], and the new point connects to existing points (a sketch of the static construction follows below)
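
A minimal sketch of the static construction in Python, assuming curvature −1, the standard hyperbolic distance formula, and a radial sampling that is uniform with respect to the hyperbolic area element at alpha = 1; the function names and the choice R ≈ 2 ln N are illustrative, not the authors' code:

```python
import numpy as np

def hyperbolic_distance(r1, th1, r2, th2):
    """Hyperbolic distance (curvature -1) between points given in polar
    coordinates (radius, angle) on the hyperbolic disk."""
    dth = np.pi - abs(np.pi - abs(th1 - th2))        # angular gap in [0, pi]
    arg = np.cosh(r1) * np.cosh(r2) - np.sinh(r1) * np.sinh(r2) * np.cos(dth)
    return float(np.arccosh(max(arg, 1.0)))

def hyperbolic_rgg(N, R, alpha=1.0, seed=0):
    """Sprinkle N points into a hyperbolic disk of radius R (alpha = 1 gives
    sampling uniform with respect to the hyperbolic area element) and connect
    every pair of points at hyperbolic distance <= R."""
    rng = np.random.default_rng(seed)
    theta = rng.uniform(0, 2 * np.pi, N)
    # Inverse-CDF sampling of the radial coordinate for density ~ sinh(alpha*r)
    r = np.arccosh(1 + rng.uniform(size=N) * (np.cosh(alpha * R) - 1)) / alpha
    edges = [(i, j) for i in range(N) for j in range(i + 1, N)
             if hyperbolic_distance(r[i], theta[i], r[j], theta[j]) <= R]
    return r, theta, edges

# R ~ 2 ln N is a common disk radius in this literature (illustrative choice)
r, theta, edges = hyperbolic_rgg(N=200, R=2 * np.log(200))
print(len(edges), "links among 200 sprinkled nodes")
```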

  5. Connecting to m closest nodes • The expected hyperbolic distance from t to its m-th closest node sets the connection radius R_t • A new node t located at radial coordinate r_t ≈ ln t, and connecting to all nodes within distance R_t ~ r_t, connects to a fixed number of closest nodes

  6. Closest nodes • The hyperbolic distance between s and t is x_st ≈ r_s + r_t + ln(θ_st/2) ≈ ln(s·t·θ_st/2) (up to curvature-dependent constants) • Find the m nodes s, s < t, with the smallest x_st for a given t: • New node t connects to a fixed number of existing nodes s with the smallest s·θ_st
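
A minimal sketch of this selection rule in Python, using the approximation above; since the new node's own birth time t shifts all distances equally, it drops out of the comparison. Names and parameters are illustrative:

```python
import numpy as np

def m_closest(birth_times, angles, t_angle, m=3):
    """Indices of the m existing nodes hyperbolically closest to a new node,
    using x_st ~ ln(s * t * theta_st / 2): for a fixed new node this is
    equivalent to picking the smallest products s * theta_st."""
    s = np.asarray(birth_times, dtype=float)
    dtheta = np.pi - np.abs(np.pi - np.abs(np.asarray(angles) - t_angle))
    return np.argsort(s * dtheta)[:m]

# Usage: 10 existing nodes born at times 1..10 with random angles; the new
# node appears at a random angle and selects its m = 3 targets.
rng = np.random.default_rng(1)
print(m_closest(np.arange(1, 11), rng.uniform(0, 2 * np.pi, 10),
                rng.uniform(0, 2 * np.pi), m=3))
```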

  7. Hyperbolic = popular × similar • Two dimensions of attractiveness • Radial popularity: birth time s: the smaller the s, the more popular the node s • Angular similarity: angular distance θ_st: the smaller the θ_st, the more similar node s is to t • New node t connects to existing nodes s optimizing trade-offs between popularity and similarity • This trade-off optimization yields hyperbolic geometry

  8. What else it yields • Power-law graphs • With strongest possible clustering • Effective preferential attachment

  9. Clustering • Probability of new connections from t to s so far: a sharp threshold, i.e., connect iff x_st ≤ R_t • If we smoothen the threshold into a Fermi-Dirac connection probability with temperature T • Then average clustering decreases linearly with T, from its maximum at T = 0 to zero at T = 1 • Clustering is always zero at T > 1 • The model becomes identical to PA as T → ∞
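
A sketch of the smoothed connection probability; the explicit form p(x) = 1/(1 + e^((x−R)/(2T))) is taken from the authors' earlier static hyperbolic model (curvature −1) and is an assumption here:

```python
import numpy as np
from scipy.special import expit   # numerically stable logistic sigmoid

def connection_probability(x, R, T):
    """Fermi-Dirac connection probability p(x) = 1 / (1 + exp((x - R)/(2T))),
    the smoothed version of the sharp threshold 'connect iff x <= R'.
    The 2T scaling follows the static hyperbolic model (curvature -1) and is
    an assumption here; as T -> 0 the sharp threshold is recovered."""
    return expit(-(np.asarray(x, dtype=float) - R) / (2.0 * max(T, 1e-12)))

# Sharp threshold (T ~ 0) vs. smoothed threshold at R = 10:
print(connection_probability([8, 10, 12], R=10, T=1e-9))   # ~[1.0, 0.5, 0.0]
print(connection_probability([8, 10, 12], R=10, T=0.5))    # gradual crossover
```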

  10. Effective preferential attachment • The average attractiveness of nodes of degree k is a linear function of k • The probability that new node t connects to an existing node of degree k is therefore also linear in k, i.e., the model exhibits effective (linear) preferential attachment

  11. PSO vs. PA • PSO = PA × S, where • PSO is popularity × similarity optimization • PA is preferential attachment (popularity) • S is similarity (sphere) • PA is 1-dimensional (radial popularity) • PSO is (d+1)-dimensional, where d is the dimensionality of the similarity space

  12. Validation • Take a series of historical snapshots of a real network • Infer angular/similarity coordinates for each node • Test if the probability of new connections follows the model theoretical prediction

  13. Learning similarity coordinates • Take a historical snapshot of a real network • Apply a maximum-likelihood estimation method (e.g., MCMC) using the static hyperbolic model • Metropolis-Hastings example (a code sketch follows below): • Assign random coordinates to all nodes • Compute the current likelihood Lc • Select a random node • Move it to a new random angular coordinate • Compute the new likelihood Ln • If Ln > Lc, accept the move • If not, accept it with probability Ln / Lc • Repeat
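
A minimal, self-contained sketch of the Metropolis-Hastings loop above in Python, assuming the Fermi-Dirac connection probability of the static hyperbolic model (curvature −1), fixed radial coordinates, and log-likelihoods to avoid underflow; the function names and parameters are illustrative, not the authors' implementation, and the full likelihood is recomputed at every step for clarity rather than speed:

```python
import numpy as np

def log_likelihood(adj, r, theta, R, T=0.5):
    """Bernoulli log-likelihood of an observed adjacency matrix under the
    static hyperbolic model with a Fermi-Dirac connection probability
    (assumed form 1 / (1 + exp((x - R) / (2T))))."""
    ll, n = 0.0, len(r)
    for i in range(n):
        for j in range(i + 1, n):
            dth = np.pi - abs(np.pi - abs(theta[i] - theta[j]))
            x = np.arccosh(max(1.0, np.cosh(r[i]) * np.cosh(r[j])
                               - np.sinh(r[i]) * np.sinh(r[j]) * np.cos(dth)))
            p = 1.0 / (1.0 + np.exp(np.clip((x - R) / (2 * T), -700, 700)))
            p = min(max(p, 1e-12), 1 - 1e-12)            # avoid log(0)
            ll += np.log(p) if adj[i, j] else np.log(1 - p)
    return ll

def infer_angles(adj, r, R, T=0.5, steps=5000, seed=0):
    """Metropolis-Hastings over angular coordinates, as on the slide:
    propose a new random angle for one node, accept if the likelihood
    increases, otherwise accept with probability Ln / Lc."""
    rng = np.random.default_rng(seed)
    theta = rng.uniform(0, 2 * np.pi, len(r))            # random initial angles
    lc = log_likelihood(adj, r, theta, R, T)             # current likelihood Lc
    for _ in range(steps):
        i = rng.integers(len(r))                         # random node
        old, theta[i] = theta[i], rng.uniform(0, 2 * np.pi)
        ln = log_likelihood(adj, r, theta, R, T)         # new likelihood Ln
        if ln > lc or rng.random() < np.exp(ln - lc):    # Ln/Lc in log space
            lc = ln                                      # accept the move
        else:
            theta[i] = old                               # reject: undo the move
    return theta
```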

  14. Real networks • PGP web of trust • Nodes: PGP certificates (roughly, people) • Links: Trust relationships • Internet • Nodes: Autonomous systems (ASes) • Links: Business relationships • Metabolic (E. coli) • Nodes: Metabolites • Links: Reactions

  15. Binning and overfitting • The number of parameters (~N node coordinates) is much smaller than the number of unknowns (~N² distances between nodes) • Overfitting is impossible, but we have to bin the hyperbolic distances into a small number of bins to compute the empirical connection probability • More rigorous measures of the fitting quality, independent of any binning, are desired

  16. More rigorous measures of modeling quality • By maximizing the likelihood L, MLE minimizes the logarithmic loss λ = −ln L • The modeling quality is thus measured by either the log-loss difference Δλ = λ_r − λ_i, or the normalized likelihood L_i / L_r = exp(Δλ), where • λ_i is the log-loss with the inferred coordinates • λ_r is the log-loss with random angular coordinates • The normalized likelihood is thus the ratio of the probability that a given network with the inferred coordinates is generated by the model, to the same probability with random coordinates, in which case the network has “nothing to do” with the model
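
A minimal sketch of these measures, assuming a Bernoulli likelihood over node pairs (consistent with the MLE setup above); the symbol names above and the helpers below are illustrative:

```python
import numpy as np

def log_loss(adj, probs):
    """Logarithmic loss -ln L for a Bernoulli likelihood over node pairs:
    adj and probs are matching arrays of observed links (0/1) and model
    connection probabilities for the same pairs."""
    p = np.clip(np.asarray(probs, dtype=float), 1e-12, 1 - 1e-12)
    a = np.asarray(adj, dtype=float)
    return -np.sum(a * np.log(p) + (1 - a) * np.log(1 - p))

def normalized_likelihood(adj, probs_inferred, probs_random):
    """Normalized likelihood L_i / L_r = exp(lambda_r - lambda_i), comparing
    connection probabilities from inferred vs. random angular coordinates."""
    return np.exp(log_loss(adj, probs_random) - log_loss(adj, probs_inferred))
```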

  17. Normalized likelihood • The popularity × similarity model does not describe the actor network well, because very dissimilar actors often collaborate on big movies

  18. Soft community detection effect • Inferred coordinates correlate with meaningful node groups

  19. Capturing network structure • As a “simple” consequence of the fact that the PSO model accurately describes large-scale network growth dynamics, it also reproduces very well the observed large-scale network structure across a wide range of structural properties

  20. Take-home messages (on PSO) • Popularity × similarity optimization dynamics ⇔ geometrical hyperbolicity ⇔ topological heterogeneity + transitivity (real nets) • Popularity is modeled by radial coordinates • Similarity is modeled by angular coordinates • The coordinates are projections of a properly weighted combination of all the factors shaping the network structure

  21. Immediate applications (submitted) • New simple network-embedding method • The idea is to “replay” the growth of a given network snapshot according to PSO • New link prediction method, outperforming all the most popular link prediction methods • Some classes of links can be predicted with 100% accuracy • Perhaps because the method captures all the factors shaping the network structure

  22. Something is definitely wrong • Node density is not uniform unless the degree-distribution exponent γ = 3, while in all the considered real networks γ < 3 • Modeled graphs are thus not random geometric graphs • They do not properly reflect hyperbolic geometry • The main project goal (find fundamental laws of network dynamics using geometry) cannot be achieved using hyperbolic geometry

  23. Plausible solution • Geometry under real networks is not hyperbolic but Lorentzian • Lorentzian manifolds explicitly model time • Proof that PSO graphs are random geometric graphs on de Sitter spacetime (accepted)

  24. Lorentzian manifolds • A pseudo-Riemannian manifold is a manifold with a non-degenerate metric tensor • Distances can be positive, zero, or negative • A Lorentzian manifold is a pseudo-Riemannian manifold with signature (−, +, …, +) • The coordinate corresponding to the minus sign is called time • Negative distances are time-like • Positive distances are space-like
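
As a concrete example of this sign convention (flat Minkowski space, given here only for illustration), the line element and the resulting classification of separations:

```latex
% Lorentzian (Minkowski) metric with signature (-, +, ..., +):
\[
  ds^2 \;=\; -\,dt^2 + dx_1^2 + \dots + dx_d^2 ,
\]
% so a separation with ds^2 < 0 is time-like (causally related points),
% ds^2 = 0 is light-like, and ds^2 > 0 is space-like.
```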

  25. Causal structure • For each point p, the set of points at time-like distances from p can be split into two subsets: • p's future • p's past • If q lies in p's future, then the intersection of p's future with q's past is called the Alexandrov set of p and q

  26. Alexandrov sets • Form a base of the manifold topology • Similar to open balls in the Riemannian case

  27. Lorentzian random geometric graph • “Discretization” of a smooth manifold (B. Riemann, Nature, v.7) • Take a region of radius R • Sprinkle N points into it uniformly at random • Connect each pair of points iff the distance between them is below the threshold • Lorentzian case: the threshold is x ≤ 0, i.e., connect time-like (causally related) pairs, because Alexandrov sets are the “balls” now (a sketch follows below)
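
A minimal sketch of a Lorentzian random geometric graph in Python, sprinkling into a causal diamond (Alexandrov set) of 1+1-dimensional Minkowski space rather than de Sitter spacetime purely for brevity; names and parameters are illustrative:

```python
import numpy as np

def lorentzian_rgg(N, seed=0):
    """Sprinkle N points uniformly into the causal diamond between
    (t=0, x=0) and (t=1, x=0) in 1+1 Minkowski space, and connect every
    pair with negative squared interval, i.e., every causally related pair."""
    rng = np.random.default_rng(seed)
    # In light-cone coordinates u = t + x, v = t - x the diamond is simply
    # the unit square 0 <= u, v <= 1, where uniform sampling is easy.
    u, v = rng.uniform(size=N), rng.uniform(size=N)
    t, x = (u + v) / 2, (u - v) / 2
    # ds^2 = -(dt)^2 + (dx)^2 < 0  <=>  the light-cone gaps du and dv have
    # the same sign, since du * dv = (dt)^2 - (dx)^2.
    edges = [(i, j) for i in range(N) for j in range(i + 1, N)
             if (u[i] - u[j]) * (v[i] - v[j]) > 0]
    return t, x, edges

t, x, edges = lorentzian_rgg(50)
print(len(edges), "causal links among 50 sprinkled points")
```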

  28. Major challenge (in progress) • On the one hand, random Lorentzian graphs are random geometric graphs, and consequently exponential random graphs (equilibrium ensembles) • On the other hand, they are dynamic growing graphs (non-equilibrium ensembles) • Can it be the case that a given ensemble of graphs is static (equilibrium) and dynamic (non-equilibrium) at the same time? • If we prove that it is indeed the case, then we • Discover a previously unseen static-dynamic graph duality • Open the possibility of applying very powerful tools developed for equilibrium systems (e.g., exponential random graphs) to dynamic networks

  29. F. Papadopoulos, M. Kitsak, M. Á. Serrano, M. Boguñá, and D. Krioukov, Popularity versus Similarity in Growing Networks, Nature, v.489, p.537, 2012
