
Network Archaeology: Uncovering Ancient Networks from Present-Day Interactions

Saket Navlakha, Carl Kingsford. Presented by: Geli Fei.


Presentation Transcript


  1. Network Archaeology: Uncovering Ancient Networks from Present-Day Interactions • Saket Navlakha, Carl Kingsford • Presented by: Geli Fei

  2. Overview • Motivation • Network Reconstruction Algorithms • Experiments

  3. Importance of Knowing Network Growth Dynamics • Many networks are the product of an evolutionary process that guided their growth. • Analyses of network growth dynamics are useful for understanding: • Existing network properties • How networks will change in the future

  4. Past Network Unavailable • In many cases, only a static snapshot of a network is available. • Biology domain • Social network domain • The lack of historical data makes it difficult to understand how a network evolved

  5. Network Growth Models • Often, we know a general principle that governs the network’s forward growth. • Preferential attachment (PA) • Duplication-mutation with complementarity (DMC) • Forest fire (FF) • Can be used to understand global changes in a network.

  6. Network Growth Models • However, a randomly grown network will generally not match a target network! • Instead of growing a random graph forward, we decompose the actual observed network backwards in time.

  7. Network Reconstruction Algorithm • The goal is to find the ancestral graphs G1, G2, …, Gt-1 given the observed graph Gt under growth model M

  8. Network Reconstruction Algorithm • Computational issue • Heuristically set • Greedily reverse only a single step of the evolutionary model • First-order Markov model assumption

  9. Network Reconstruction Algorithm

  10. Network Reconstruction Algorithm • Model M is being run forward as intended! • The prior on Gt-1 can be used to guide the choice of Gt-1 • Use a uniform prior for simplicity
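
The greedy single-step choice described on slides 8 to 10 can be written as a maximum a posteriori problem. The notation below is a sketch reconstructed from the bullets (Gt, Gt-1, model M, and a prior), not a copy of the slide's own formula:

```latex
G^{*}_{t-1}
  = \arg\max_{G_{t-1}} P(G_{t-1} \mid G_t, M)
  = \arg\max_{G_{t-1}} \frac{P(G_t \mid G_{t-1}, M)\, P(G_{t-1} \mid M)}{P(G_t \mid M)}
```

The denominator does not depend on Gt-1, and under a uniform prior the second factor is constant, so the choice reduces to maximizing P(Gt | Gt-1, M), which is the probability of one forward step of the model.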

  11. The Duplication-Mutation with Complementarity (DMC) Model • Based on the duplication-divergence principle • Start with a connected two-node graph

  12. DMC Model
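
A minimal sketch of growing a network forward under DMC, assuming the usual formulation of the model: a new node v duplicates a uniformly chosen anchor u, each shared neighbour loses its edge to u or to v with probability qmod, and u and v are linked with probability qcon. The function name `dmc_grow` and the adjacency-dict representation are illustrative choices, not taken from the slides.

```python
import random

def dmc_grow(n, q_mod, q_con, seed=None):
    """Grow an undirected graph to n nodes with the DMC rule.

    Returns (adj, anchors): adj maps node -> set of neighbours, and
    anchors maps every node added after the seed graph to the node
    it duplicated.
    """
    rng = random.Random(seed)
    adj = {0: {1}, 1: {0}}           # connected two-node seed graph
    anchors = {}
    for v in range(2, n):
        u = rng.choice(sorted(adj))  # anchor chosen uniformly at random
        anchors[v] = u
        adj[v] = set()
        # Duplication: v copies every edge of u.
        for x in list(adj[u]):
            adj[v].add(x)
            adj[x].add(v)
        # Mutation: for each shared neighbour, with probability q_mod
        # delete the edge on one side (u or v), chosen by a fair coin.
        for x in list(adj[v]):
            if rng.random() < q_mod:
                loser = u if rng.random() < 0.5 else v
                adj[loser].discard(x)
                adj[x].discard(loser)
        # Complementarity: link u and v with probability q_con.
        if rng.random() < q_con:
            adj[u].add(v)
            adj[v].add(u)
    return adj, anchors
```

For example, `dmc_grow(100, 0.4, 0.7, seed=1)` returns a 100-node graph together with the true anchor of every added node, which is the kind of ground truth a reversibility experiment needs.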

  13. To Reverse DMC Model • Given qmod, qcon, and Gt • Goal is to find a pair of nodes: • <node most recently entered Gt, its anchor node in Gt-1>

  14. To Reverse DMC Model • After (u,v) is found, Gt-1 is formed by removing either u or v • Assuming v is removed, u gains edges to all nodes in N(u) ∪ N(v) • All pairs of nodes in Gt must be considered as candidates
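
One reverse DMC step can be sketched as follows. The merge follows the slide (remove v, give u the union of both neighbourhoods). The pair score is an assumption derived from the forward rule rather than a formula shown in the transcript: a neighbour shared by u and v survived mutation (factor 1 - qmod), a neighbour attached to only one of them lost a specific edge (factor qmod / 2), and the u-v edge is present with probability qcon. The constant factor for choosing the anchor uniformly is dropped because it does not change the argmax. All function names are hypothetical.

```python
from itertools import combinations

def dmc_pair_score(adj, u, v, q_mod, q_con):
    """Assumed likelihood (up to a constant) that v duplicated from u."""
    nu = adj[u] - {v}
    nv = adj[v] - {u}
    shared = len(nu & nv)        # neighbours kept on both sides
    unshared = len(nu ^ nv)      # neighbours that lost one edge
    gamma = q_con if v in adj[u] else 1.0 - q_con
    return gamma * (1.0 - q_mod) ** shared * (q_mod / 2.0) ** unshared

def dmc_reverse_step(adj, q_mod, q_con):
    """Pick the best-scoring pair and merge it; returns (G_{t-1}, (u, v))."""
    # Every pair of nodes is scored, so one step costs O(n^2) evaluations.
    u, v = max(combinations(sorted(adj), 2),
               key=lambda p: dmc_pair_score(adj, p[0], p[1], q_mod, q_con))
    merged = (adj[u] | adj[v]) - {u, v}
    # Drop v everywhere, then give u the union of both neighbourhoods.
    new_adj = {x: set(nb) - {v} for x, nb in adj.items() if x != v}
    new_adj[u] = set(merged)
    for x in merged:
        new_adj[x].add(u)
    return new_adj, (u, v)
```

Calling `dmc_reverse_step` repeatedly until two nodes remain yields a sequence of ancestral graphs and an inferred node arrival order (the removed nodes, newest first).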

  15. The Forest Fire (FF) Model • Proposed by Leskovec et al. to grow networks that mimic certain properties of social networks • Probabilistic process: • A fire starts at some node u • The fire spreads probabilistically to nodes in N(u) • The process stops when the fire ceases to spread

  16. FF Model • Start with a connected two-node graph and a burning probability p

  17. To Reverse FF Model • Given the burning probability p and the current network Gt • As for the DMC model, find <most recently entered node, its anchor node> • It is difficult to write down an analytic expression for the likelihood of Gt-1. • Simulation is used to estimate the likelihood instead.
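
The idea of estimating the FF likelihood by simulation can be sketched as below. This uses a simplified, undirected variant in which each unburned neighbour catches fire independently with probability p; the model as published by Leskovec et al. instead draws the number of burned neighbours from a geometric distribution, so treat this as an illustration of the technique only. `burn`, `ff_grow`, and `estimate_anchor_likelihood` are hypothetical names.

```python
import random

def burn(adj, start, p, rng):
    """Return the set of nodes reached by a fire started at `start`."""
    burned = {start}
    frontier = [start]
    while frontier:
        node = frontier.pop()
        # Each currently unburned neighbour ignites with probability p.
        for x in sorted(adj[node] - burned):
            if rng.random() < p:
                burned.add(x)
                frontier.append(x)
    return burned

def ff_grow(n, p, seed=None):
    """Grow a graph forward: each new node links to everything its fire burns."""
    rng = random.Random(seed)
    adj = {0: {1}, 1: {0}}           # connected two-node seed graph
    anchors = {}
    for v in range(2, n):
        u = rng.choice(sorted(adj))  # anchor where the fire starts
        anchors[v] = u
        targets = burn(adj, u, p, rng)
        adj[v] = set(targets)
        for x in targets:
            adj[x].add(v)
    return adj, anchors

def estimate_anchor_likelihood(adj_prev, u, observed, p, trials=2000, seed=None):
    """Monte Carlo estimate of P(fire from u burns exactly `observed`)."""
    rng = random.Random(seed)
    hits = sum(burn(adj_prev, u, p, rng) == observed for _ in range(trials))
    return hits / trials
```

To score a candidate pair (v, u), one would remove v from Gt to obtain a candidate Gt-1 and call `estimate_anchor_likelihood(G_prev, u, N(v), p)`. Exact matches become rare for large neighbourhoods, so a practical version would likely need many trials or a softer match criterion.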

  18. The Preferential Attachment (PA) Model • Was proposed as a mechanism to emulate the growth of the Web. • New pages make popular pages more popular by linking to them preferentially. • Only the linear version of the PA model is considered.

  19. PA Model • Start with a clique of k+1 nodes and a parameter k
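
A minimal sketch of linear PA growth from a (k+1)-clique, where each new node attaches to k existing nodes chosen with probability proportional to degree. Redrawing until k distinct targets are found is an assumption of this sketch, and `pa_grow` is a hypothetical name.

```python
import random

def pa_grow(n, k, seed=None):
    """Grow an undirected graph to n nodes by linear preferential attachment."""
    rng = random.Random(seed)
    # Seed graph: a clique on nodes 0..k.
    adj = {u: set(range(k + 1)) - {u} for u in range(k + 1)}
    # Each node appears in `stubs` once per incident edge, so a uniform
    # draw from the list is a degree-proportional draw over nodes.
    stubs = [u for u in adj for _ in adj[u]]
    for v in range(k + 1, n):
        targets = set()
        while len(targets) < k:      # redraw duplicates (assumption)
            targets.add(rng.choice(stubs))
        adj[v] = targets
        for x in targets:
            adj[x].add(v)
            stubs.extend((x, v))     # one stub per endpoint of the new edge
    return adj
```

Because nodes are numbered in arrival order, the node labels of `pa_grow(100, 3, seed=0)` double as true arrival times when the growth is later reversed.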

  20. PA Model • No anchor node as in DMC or FF • The most recently added node must be of minimum degree in Gt • Find a node to remove among the set C of minimum-degree nodes
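
One reverse PA step can be sketched as follows: collect the minimum-degree nodes as candidates, score each, and remove the best one. The score used here is an assumption derived from the linear PA rule (each neighbour of the removed node is weighted by its degree before that node arrived, treating the k draws as independent); the transcript does not show the exact expression. Function names are hypothetical.

```python
import math

def pa_candidates(adj):
    """Nodes of minimum degree: the only possible most-recent arrivals."""
    d_min = min(len(nb) for nb in adj.values())
    return sorted(v for v, nb in adj.items() if len(nb) == d_min)

def pa_removal_log_score(adj, v):
    """Assumed log-likelihood that v arrived last and chose its neighbours."""
    # Total degree of the graph once v and its edges are removed.
    total = sum(len(nb) for nb in adj.values()) - 2 * len(adj[v])
    score = 0.0
    for u in adj[v]:
        prior_deg = len(adj[u]) - 1  # degree of u before v attached
        if prior_deg <= 0 or total <= 0:
            return float("-inf")     # u could not have been chosen
        score += math.log(prior_deg / total)
    return score

def pa_reverse_step(adj):
    """Remove the best-scoring minimum-degree node; returns (G_{t-1}, v)."""
    v = max(pa_candidates(adj), key=lambda c: pa_removal_log_score(adj, c))
    new_adj = {x: nb - {v} for x, nb in adj.items() if x != v}
    return new_adj, v
```

Ties between candidates are common under PA because several minimum-degree nodes can have equivalent neighbourhoods; this sketch simply keeps the first one.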

  21. PA Model

  22. Algorithm Measures • Likelihood of the chosen node or node/anchor pair • Spearman’s footrule and Kendall’s τ as measures of arrival-time correlation
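
Both rank measures compare a true arrival order with an inferred one. The sketch below assumes strict orderings without ties and computes the unnormalized footrule; the evaluation behind the slides may handle ties or normalization differently.

```python
from itertools import combinations

def kendall_tau(true_order, inferred_order):
    """(concordant - discordant) / total pairs, for two strict orderings."""
    pos_t = {node: i for i, node in enumerate(true_order)}
    pos_i = {node: i for i, node in enumerate(inferred_order)}
    concordant = discordant = 0
    for a, b in combinations(true_order, 2):
        # A pair is concordant when both orders rank a and b the same way.
        if (pos_t[a] - pos_t[b]) * (pos_i[a] - pos_i[b]) > 0:
            concordant += 1
        else:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)

def spearman_footrule(true_order, inferred_order):
    """Sum of absolute rank displacements between the two orderings."""
    pos_i = {node: i for i, node in enumerate(inferred_order)}
    return sum(abs(i - pos_i[node]) for i, node in enumerate(true_order))
```

A perfect reconstruction gives a Kendall value of 1 and a footrule of 0; a fully reversed order gives a Kendall value of -1.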

  23. Reversibility of Models • Tested in a setting where the evolutionary history is completely known. • For each model, grow a 100-node network forward, then use Gt=100 to reconstruct its history. • Repeat this process 1000 times for each model and average the results.

  24. Reversibility of Models - DMC • Reversibility varies drastically depending on: • the DMC model parameters used to grow the network forward • the match between the parameters used to grow the network and those used to reverse it

  25. Reversibility of Models - DMC • Performance under noise • Most sensitive to noise among the three models

  26. Reversibility of Models - FF • As p increases, the degree of each node increases

  27. Reversibility of Models - FF • Under noise

  28. Reversibility of Models - PA • PA is the most easily reversible of the three models

  29. Reversibility of Models - PA • Under noise • Most resilient to noise

  30. Recovery of ancient protein interaction network • Use the PPI network for the yeast S. cerevisiae from the IntAct database • 2,599 proteins (nodes) and 8,275 physical interactions • Past PPI networks are unavailable, so true node arrival times are not known • Node arrival times inferred from additional information are used as ground truth

  31. Recovery of ancient protein interaction network • Comparison between three models • Duplication-based model is a better fit for PPI network • For DMC, low-to-medium qmod and medium-to-high qcon give the best performance

  32. Recovery of ancient protein interaction network • Actual likelihood values for DMC also indicate the plausibility of the reconstruction • The ratio of log-likelihoods between the inferred history and a random reconstruction is > 5 • The likelihood of the reconstruction with (qmod=0.4, qcon=0.7) is 2.6 times higher than that of the reconstruction with (qmod=0.9, qcon=0.1)

  33. Recovery of past social networks • Music social network data: nodes are users, edges indicate friendships • Predict user arrival order • The best-performing model here was the worst-performing model for the PPI network

  34. Conclusion • A novel framework for uncovering past networks given only a growth model • Works in a principled way and provides likelihood estimates for ancestral graphs • Optimal parameters are chosen using the accuracy of history reconstruction as the optimization criterion

  35. Thank you!
