Link Reconstruction from Partial Information

Gong Xiaofeng, Li Kun & C. H. Lai, TSL@NUS.

Presentation Transcript


  1. Link Reconstruction from Partial Information. Gong Xiaofeng, Li Kun & C. H. Lai, TSL@NUS

  2. General situations where problems may arise. Observed network: an N×N matrix A filled with 0s and 1s. Scenarios: A) No side information: statistical analysis, clustering, modeling, processes, etc. B) Some links are uncertain (positions known): the link reconstruction problem, based on a model or a similarity measure. C) Some 1s have been set to 0s (positions unknown): a variant of the link reconstruction problem, possibly related to link prediction. D) The network is subject to change: a kind of prediction problem (link prediction), node prediction, network evolution, etc.

  3. B.1 Problem of network reconstruction. Some links are unknown: they may be corrupted, missing, or impossible to measure at the time. Guess the values (0 or 1) of the dashed arrows. Presumptions: • The network has structure. • Unknown links are fairly sampled. • The number of unknown links is small. [Figure: example network on nodes 1 to 5; unknown links drawn as dashed arrows.]

  4. B.2 Procedures of reconstruction of links. Available information -> fitted probabilistic model P (N×N) -> connection probability p(i,j) of each unknown link (i,j) -> determine a threshold of connection probability pt -> set (i,j) to 1 if p(i,j) > pt, and to 0 otherwise. [Flowchart labels: observed network, modeling, model, parameters, function optimization, connection probability, threshold, reconstruction or prediction.]
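The last step of the procedure above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `reconstruct_links`, the dictionary layout, and the probabilities are all assumptions made for the example, and the fitted model that would produce p(i,j) is taken as given.

```python
def reconstruct_links(prob, unknown, threshold):
    """Assign 0/1 to each unknown link by thresholding its connection probability.

    prob      : dict mapping (i, j) -> connection probability p(i, j) from a fitted model
    unknown   : iterable of (i, j) pairs whose value is unknown
    threshold : the connection-probability threshold p_t
    """
    return {link: 1 if prob[link] > threshold else 0 for link in unknown}


# Hypothetical connection probabilities, for illustration only.
prob = {(1, 2): 0.82, (2, 4): 0.10, (3, 5): 0.55}
decisions = reconstruct_links(prob, unknown=list(prob), threshold=0.5)
print(decisions)
```

How the threshold is chosen is the subject of the later slides; here it is simply passed in.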

  5. B.3 Reformulated signal detection problem. Observed network -> 3 types of signals: 0, 1 and ?. Fitted model -> connection probabilities P0 and P1. Signals (P?) to be classified -> ?. Problem: given the connection probability P?, determine the type of signal (0 or 1). Assumption under a given model: the unknown links do not significantly influence the reliability of the fitted model (P0 and P1), i.e., the connection probability P? of any unknown link can be regarded as sampled from P0 or P1.

  6. B.4 An equivalent hypothesis testing problem. Observation (data): connection probability p. Hypotheses: H0 (0-link) and H1 (1-link). Data space E: divided into acceptance regions R0 and R1. Decisions D: D0 (accept H0) and D1 (accept H1). How do we find an optimal detection scheme, e.g., by the Neyman-Pearson criterion?

  7. B.5 Measuring reconstruction performance. Contingency table (or confusion matrix).
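The table itself did not survive in this transcript. As a hedged sketch of what a contingency table for this problem usually contains, the snippet below counts the four outcomes for a set of reconstructed links and derives the true and false positive rates. The function names and the toy labels are assumptions for illustration.

```python
def confusion_matrix(true_values, predicted_values):
    """Count the four outcomes of a 0/1 reconstruction."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for t, p in zip(true_values, predicted_values):
        if t == 1 and p == 1:
            counts["TP"] += 1      # 1-link correctly reconstructed
        elif t == 0 and p == 1:
            counts["FP"] += 1      # 0-link wrongly set to 1 (false alarm)
        elif t == 0 and p == 0:
            counts["TN"] += 1      # 0-link correctly reconstructed
        else:
            counts["FN"] += 1      # 1-link wrongly set to 0 (miss)
    return counts


def rates(cm):
    """Return (true positive rate, false positive rate); 0.0 if undefined."""
    positives = cm["TP"] + cm["FN"]
    negatives = cm["FP"] + cm["TN"]
    tpr = cm["TP"] / positives if positives else 0.0
    fpr = cm["FP"] / negatives if negatives else 0.0
    return tpr, fpr


# Toy labels, for illustration only.
true_values = [1, 1, 0, 0, 1, 0]
predicted_values = [1, 0, 0, 1, 1, 0]
cm = confusion_matrix(true_values, predicted_values)
tpr, fpr = rates(cm)
print(cm, tpr, fpr)
```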

  8. B.6 Relation to performance measures. [Figure: densities f0(p) and f1(p) over connection probabilities, with threshold pt and regions R1, R2, R3, R4.]

  9. B.7 Criterion of MAP. For the reconstruction problem, we choose the maximum a posteriori (MAP) criterion: accept whichever of the two hypotheses has the larger a posteriori probability.
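Written out in the standard textbook form, the MAP rule reads as below. The prior probabilities \pi_0 and \pi_1 are notation assumed here (the slide's own formula is not in the transcript); f_0 and f_1 are the densities of the connection probability under H0 and H1.

```latex
% Assumed notation: \pi_0, \pi_1 are the prior probabilities of a 0-link and a 1-link.
\[
  P(H_k \mid p) = \frac{\pi_k f_k(p)}{\pi_0 f_0(p) + \pi_1 f_1(p)}, \qquad k \in \{0, 1\},
\]
\[
  \text{decide } D_1 \iff P(H_1 \mid p) > P(H_0 \mid p)
  \iff \frac{f_1(p)}{f_0(p)} > \frac{\pi_0}{\pi_1} \quad (f_0(p) > 0).
\]
```

When the likelihood ratio f_1(p)/f_0(p) is increasing in p, this reduces to a single threshold pt on the connection probability.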

  10. A.1 Probabilistic model of structured networks

  11. A.2 Estimate model parameters (MLE)

  12. B.8 Example network

  13. B.9 Density function of connection probabilities

  14. B.10 MAP detector minimizes average error. The density function is usually jagged and difficult to work with, so the distribution function is preferred. Consider the minimum average error (cost).
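One way to work with distribution functions rather than densities is to evaluate the empirical average error at each candidate threshold and keep the smallest. The sketch below does this with equal unit costs and with the sample proportions standing in for the priors; both choices, the function names, and the sample values are assumptions for illustration, not the procedure used in the talk.

```python
def average_error(p0, p1, threshold):
    """Empirical average error of the rule 'set link to 1 if p > threshold'.

    p0 : connection probabilities of known 0-links
    p1 : connection probabilities of known 1-links
    """
    false_alarms = sum(1 for p in p0 if p > threshold)   # 0-links set to 1
    misses = sum(1 for p in p1 if p <= threshold)        # 1-links set to 0
    return (false_alarms + misses) / (len(p0) + len(p1))


def best_threshold(p0, p1):
    """Return the candidate threshold with the smallest empirical average error."""
    candidates = [0.0] + sorted(set(p0) | set(p1))
    return min(candidates, key=lambda t: average_error(p0, p1, t))


# Hypothetical samples, for illustration only.
p0 = [0.05, 0.10, 0.20, 0.60]
p1 = [0.40, 0.70, 0.90]
pt = best_threshold(p0, p1)
print(pt, average_error(p0, p1, pt))
```

Ties are broken in favour of the lowest threshold, because `min` returns the first minimum in the sorted candidate list.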

  15. B.11 Distribution of connection probabilities

  16. B.12 Generalizability of algorithm. Do the unknown links follow approximately the same distribution? Possible reasons for the unfavorable burst at the tail; sources of model error.

  17. B.13 Robustness of algorithm. Is it sensitive to the number of unknown links?

  18. B.14 Comparison of operation points

  19. B.15 Reconstruction results. USAir network, 10% missing.

  20. C.1 A variant problem of link reconstruction. Observed network -> signals of types 0 and 1. Some 0s were originally 1s but have been set to 0. Their positions are unknown; their number may be known or unknown. [Figure: example network on nodes 1 to 5.]

  21. C.2 Procedures for the variant problem. Available information -> fitted probabilistic model P (N×N) -> connection probability p(i,j) of each 0-link (i,j) -> (a) number M unknown -> determine a threshold of connection probability pt -> set (i,j) to 1 if p(i,j) > pt, and to 0 otherwise; (b) number M known -> scoring: rank the connection probabilities of the candidate links (all 0-links) -> set the M links with the highest scores to 1.
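Case (b) above, where the number M of hidden links is known, can be sketched as a simple ranking. The function name `top_m_links` and the scores are hypothetical; the scores would come from whatever model or similarity measure is being used.

```python
def top_m_links(score, m):
    """Return the m candidate links with the highest scores.

    score : dict mapping each 0-link (i, j) to its connection probability
    m     : number of links to set to 1
    """
    ranked = sorted(score, key=lambda link: score[link], reverse=True)
    return set(ranked[:m])


# Hypothetical scores for the 0-links of a small network.
score = {(1, 3): 0.20, (1, 4): 0.90, (2, 5): 0.60, (3, 4): 0.05}
restored = top_m_links(score, m=2)
print(restored)
```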

  22. C.3 Algorithm based on common neighbor
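The body of this slide is missing from the transcript. As a hedged illustration, the sketch below computes the standard common-neighbor index, the number of neighbors two unlinked nodes share, which may differ in detail from the variant used in the talk. The network is treated as undirected, and the edge list is a made-up example.

```python
def common_neighbor_scores(edges, nodes):
    """Score every unlinked node pair by its number of shared neighbors.

    edges : iterable of undirected links (i, j)
    nodes : iterable of node labels (must be sortable)
    """
    neighbors = {n: set() for n in nodes}
    for i, j in edges:
        neighbors[i].add(j)
        neighbors[j].add(i)

    scores = {}
    ordered = sorted(neighbors)
    for a, i in enumerate(ordered):
        for j in ordered[a + 1:]:
            if j not in neighbors[i]:          # only 0-links are candidates
                scores[(i, j)] = len(neighbors[i] & neighbors[j])
    return scores


# Hypothetical 5-node network, for illustration only.
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]
cn = common_neighbor_scores(edges, nodes=range(1, 6))
print(cn)
```

The resulting scores can be used in place of model-based connection probabilities when ranking or thresholding candidate links.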

  23. C.4 Comparison between two methods. [Figure panels: probability density functions; distribution functions.]

  24. C.5 Generalizability and robustness of algorithms

  25. C.6 Reconstruction performance by ranking

  26. D.1 Problem of link prediction. The procedure is identical to that of the variant link reconstruction problem. Example: econophysics co-authorship network (N=506, m=519, nL=379).

  27. D.2 Factors affecting prediction performance. Problem of generalizability: a) size of the training set, or time span of prediction; b) time-varying growth mechanism.

  28. D.3 Effects of training set size. Assume the new links are known and examine the variant problem above: the training data set cannot capture the underlying distribution faithfully, either because it is too small or because the growth rule is time dependent.

  29. Conclusions. The problem of network reconstruction is studied thoroughly. Under a more general framework, the problem can be reformulated as a hypothesis testing problem, which deepens our understanding of it and enables us to relate the reconstruction performance of various methods to quantities at a more fundamental level.

  30. THANK YOU
