1 / 35

Diffusion-Aware Sampling and Estimation in Information Diffusion Networks

Diffusion-Aware Sampling and Estimation in Information Diffusion Networks. By: Motahareh Eslami Mehdiabadi , Hamid Reza Rabiee and Mostafa Salehi eslami@ce.sharif.edu 4 th September, 2012. Outline. Introduction Related Work Complete Data Partial Data Problem Formulation

taryn
Download Presentation

Diffusion-Aware Sampling and Estimation in Information Diffusion Networks

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Diffusion-Aware Sampling and Estimation in Information Diffusion Networks By: MotaharehEslamiMehdiabadi, Hamid Reza Rabiee and MostafaSalehi eslami@ce.sharif.edu 4th September, 2012

  2. Outline • Introduction • Related Work • Complete Data • Partial Data • Problem Formulation • Proposed Framework • Sampling Design • Estimation Approach • Experimental Evaluation • Conclusion • References Diffusion-Aware Sampling and Estimation P2P Live Video Streaming DML DML DML 2

  3. Introduction Diffusion-Aware Sampling and Estimation P2P Live Video Streaming DML DML DML 3

  4. Introduction Diffusion-Aware Sampling and Estimation P2P Live Video Streaming DML DML DML 4

  5. Different Sampling Designs in Diffusion Networks • Considering the sampling approaches to study diffusion behaviors of social networks, apart from their topologies Figure 1-(a): Using well-known sampling methods such as BFS and RW without considering the behavior of the diffusion process gathering redundant data + losing parts of diffusion data + decrease the performance of these sampling methods Figure 1-(b): Working with diffusion network directly It is often not feasible to work with this latent network P2P Live Video Streaming Diffusion-Aware Sampling and Estimation DML DML DML 5

  6. Introduction Proposing a Novel Two-Step Framework (Sampling/ Estimation) »DNS« P2P Live Video Streaming Diffusion-Aware Sampling and Estimation DML DML DML 6

  7. Related Work • Complete Data • Partial Data

  8. Complete Data • The most fundamental approach: Collecting the complete diffusion data • Following the Iraq war petitions in the format of e-mail [11], [19] • studying communication events between faculty and staff of a university by e-mails [15] • tracking the flow of information by extracting short textual phrases [16] P2P Live Video Streaming Diffusion-Aware Sampling and Estimation DML DML DML 8

  9. Partial Data P2P Live Video Streaming Diffusion-Aware Sampling and Estimation DML DML DML 9

  10. Problem Formulation • Basic Notations and Definitions • Problem Definition

  11. Basic Notations & Definitions P2P Live Video Streaming Diffusion-Aware Sampling and Estimation DML DML DML 11

  12. Diffusion Characteristics Measurement P2P Live Video Streaming Diffusion-Aware Sampling and Estimation DML DML DML 12

  13. Problem Definition P2P Live Video Streaming Diffusion-Aware Sampling and Estimation DML DML DML 13

  14. Proposed Framework DNS: Diffusion Network Sampling • Sampling Design • Estimation Approach

  15. Sampling Design • Existing sampling methods such as BFS & RW do not consider diffusion paths. P2P Live Video Streaming Diffusion-Aware Sampling and Estimation DML DML DML 15

  16. Sampling Design (cont’d) • Each cascade c can be assigned to a time vector • tc = {t1, t2, …, tn} which shows the infection times of nodes by cascade c • CT = {t1, t2, …, tNc}: the set of cascades’ time vectors • Waiting time model[1] depends on Δ = tv – tu • Ce: set of cascades which pass over link e • The infection probability of link e Exponential Model P2P Live Video Streaming Diffusion-Aware Sampling and Estimation DML DML DML 16

  17. The Algorithm • moving from a node u, to a neighboring node v, through an outgoing link with the infection probability • Using the infection times (i.e. CT) as local information without any prior knowledge about the latent structure of the diffusion network P2P Live Video Streaming Diffusion-Aware Sampling and Estimation DML DML DML 17

  18. Estimation Approach • Correcting the selection bias of a sampling method by re-weighting of the measured values • Hansen-HurwitzEstimator: elements are weighted inversely proportional to their visiting probability • Xi and п(Xi) are the visited element and its visiting probability on the ith draw of sampling method. P2P Live Video Streaming Diffusion-Aware Sampling and Estimation DML DML DML 18

  19. Estimation Approach To use this estimator, the probability of visiting each element in the proposed sampling procedure should be computed. P2P Live Video Streaming Diffusion-Aware Sampling and Estimation DML DML DML 19

  20. Link-based Characteristics • Gaining some information without having any connection to others for propagation will be not valuable in a network. • Link Attendance: shows the amount of presence in diffusion process for a link • Moving over links with the probability of Pe п(e) = Pe • Only Using local knowledge to compute the visiting probabilities of the links. P2P Live Video Streaming Diffusion-Aware Sampling and Estimation DML DML DML 20

  21. Node-based Characteristics P2P Live Video Streaming Diffusion-Aware Sampling and Estimation DML DML DML 21

  22. Cascade-based Characteristics P2P Live Video Streaming Diffusion-Aware Sampling and Estimation DML DML DML 22

  23. Experimental Evaluation • Setup • Dataset • Performance Evaluation • Diffusion Behavior Analysis

  24. Setup P2P Live Video Streaming Diffusion-Aware Sampling and Estimation DML DML DML 24

  25. Dataset • Synthetic Networks • Real Networks Table 1: The network and cascade generation parameters P2P Live Video Streaming Diffusion-Aware Sampling and Estimation DML DML DML 25

  26. Speed of Cascade • The unknown structure of the diffusion network structure unavailability of the cascade speed Figure 2: Speed of cascade effect of link-attendance bias 0.4 ≤⍺≤ 0.7 P2P Live Video Streaming Diffusion-Aware Sampling and Estimation DML DML DML 26

  27. Performance Evaluation Figure 3: Link Attendance characteristic evaluation in different sampling rates P2P Live Video Streaming Diffusion-Aware Sampling and Estimation DML DML DML 27

  28. Performance Evaluation • Decreasing the bias by 30% in average by the DNS framework in comparison to the sampling design without estimation approach Table 2: The average performance difference of DNS with BFS, RW & DNS-WoE P2P Live Video Streaming Diffusion-Aware Sampling and Estimation DML DML DML 28

  29. Diffusion Behavior Analysis Figure 4: Analysis of diffusion rate over sampling frameworks P2P Live Video Streaming Diffusion-Aware Sampling and Estimation DML DML DML 29

  30. Contributions DNS Framework P2P Live Video Streaming Diffusion Network Extraction DML DML DML 30

  31. Conclusion & Future Work… P2P Live Video Streaming Diffusion Network Extraction DML DML DML 31

  32. References [1] M. Gomez-Rodriguez, J. Leskovec and A. Krause, Inferring networks of diffusion and influence, In proc. of KDD ’10, pages 1019-1028, 2010. [2] Twitter Blog: ̸ = numbers. Blog.twitter.com., Retrieved 2012-01-20, http://blog.twitter.com/2011/03/numbers.html. [3] M. Eslami, H.R. Rabiee and M.Salehi, Sampling from Information Diffusion Networks, 2012. [4] M. Gjoka, M. Kurant, C. T. Butts and A. Markopoulou, Walking in Facebook: A Case Study of Unbiased Sampling of OSNs, Proceedings of IEEE INFOCOM, 2010. [5] M. Gjoka, M. Kurant, C. T. Butts and A. Markopoulou, Practical Recommendations on Crawling Online Social Networks, IEEE J. Sel. Areas Commun, 2011. [6] M. Salehi, H. R. Rabiee, N. Nabavi and Sh. Pooya, Characterizing Twitter with Respondent-Driven Sampling, International Workshop on Cloud and Social Networking (CSN2011) in conjunction with SCA2011, No. 9, Vol. 29, pages 5521–5529, 2011. [7] A. Mislove, M. Marcon, K. P. Gummadi, P. Druschel and B. Bhattachar-jee, Measurement and analysis of online social networks, Proceedings of the ACM SIGCOMM conference on Internet measurement, pages 29–42, 2007. [8] J. Leskovec and Ch. Faloutsos, Sampling from large graphs, Proceedings of the ACM SIGKDD conference on Knowledge discovery and data mining, pages 631–636, 2006. [9] M. Salehi, H. R. Rabiee, and A. Rajabi, Sampling from Complex Networks with high Community Structures, Chaos: An Interdisciplinary Journal of Nonlinear Science , 2012. [10] J. Leskovec, M. McGlohon, C. Faloutsos, N. S. Glance and M. Hurst, Patterns of Cascading Behavior in Large Blog Graphs, In proc. of SDM’07, 2007. [11] D. Liben-Nowell and J. Kleinberg, Tracing information flow on a global scale using Internet chain-letter data, Proc. of the National Academy of Sciences, 105(12):4633-4638, 25 Mar, 2008. [12] M. D. Choudhury, Y. Lin, H. Sundaram, K. S. Candan, L. Xie and A.Kelliher, How Does the Data Sampling Strategy Impact the Discovery of Information Diffusion in Social Media?, Proc. of ICWSM , 2010. P2P Live Video Streaming Extracting Network of Information DML DML DML 32

  33. References(cont’d) [13] E. Sadikov, M. Medina, J. Leskovec and H. Garcia-Molina, Correcting for missing data in information cascades, WSDM, pages 55-64, 2011. [14] D. Gruhl, R. Guha, D. Liben-Nowell and A. Tomkins, Information dif-fusion through blogspace, In proc. of of the 13th international conference on World Wide Web, pages 491–501, 2004. [15] G. Kossinets, J. M. Kleinberg and D.J. Watts, The structure of information pathways in a social communication network, KDD ’08, pages 435-443. 2008. [16] J. Leskovec, L. Backstrom and J. Kleinberg, Meme-tracking and the dynamics of the news cycle, KDD ’09: Proc. of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 497-506, 2009. [17] M.Gomez-Rodriguez, D. Balduzzi and B.Scholkopf, Uncovering the Temporal Dynamics of Diffusion Networks, Proc. of the 28 th International Conference on Machine Learning, Bellevue,WA, USA, 2011. [18] M. Eslami, H. R. Rabiee and M. Salehi, DNE: A Method for Extracting Cascaded Diffusion Networks from Social Networks, IEEE Social Computing Proceedings, 2011. [19] F. Chierichetti, J. Kleinberg and D. Liben-Nowell, Reconstructing Pat-terns of Information Diffusion from Incomplete Observations, NIPS 2011. [20] C.X. Lin, Q. Mei, Y.Jiang, J. Han and S. Qi, Inferring the Diffusion and Evolution of Topics in Social Communities, SNA KDD, 2011. [21] Ch. Wilson, B. Boe, A. Sala, K. P. N Puttaswamy and B. Y. Zhao, User interactions in social networks and their implications, EuroSys ’09: Proceedings of the 4th ACM European conference on Computer systems, pages 205–218, 2009. [22] M. Kurant, A. Markopoulou and P. Thiran, On the bias of BFS (Breadth First Search), 22nd IEEE International Teletraffic Congress (ITC), pages 1–8, 2010. [23] L. Lovas, Random walks on graphs: a survey, Combinatorics, 1993. [24] M.R Henzinger, A. Heydon,M. Mitzenmacher and M. Najork, On near-uniform URL sampling, Proceedings of the World Wide Web conference on Computer networks, pages 295-308, 2000. [25] J. Leskovec and C. Faloutsos, Scalable modeling of real graphs using Kronecker multiplication, Proc. of ICML, pages 497-504, 2007. P2P Live Video Streaming Extracting Network of Information DML DML DML 33

  34. References(cont’d) [26] J. Leskovec, J. Kleinberg and C. Faloutsos, Graphs over Time: Densi-cation Laws, Shrinking Diameters and Possible Explanations, Proc. Of KDD , 2005. [27] P.Erds and A. Rnyi, On the evolution of random graphs, Publ. Math. Inst. Hung. Acad. Sci., 5: page 17, 1960. [28] A. Clauset, C. Moore and M. E. J. Newman, Hierarchical structure and the prediction of missing links in networks, Nature, 453: pages 98-101, 2008. [29] J. Leskovec, K.J. Lang, A. Dasgupta and M.W. Mahoney, Statistical properties of community structure in large social and information networks, WWW, pages 695-704, 2008. [30] L.A. Adamic and N. Glance, The political blogosphere and the 2004 US Election, Proc. of the WWW-2005 Workshop on the Weblogging Ecosystem, 2005. [31] M. E. J. Newman, Finding community structure in networks using the eigenvectors of matrices, Preprint physics/0605087, 2006. [32] S.A. Myers and J. Leskovec, On the Convexity of Latent Social Network Inference, Advances in Neural Infromation Processing Systems, 2010. [33] Ch. Gkantsidis, M. Mihail and A. Saberi, Random walks in peer-to-peer networks: algorithms and evaluation, Elsevier Science Publishers B. V., Performance Evaluation, P2P Computing Systems, Vol 63, pages 241–263, 2006. [34] D. Stutzbach, R. Rejaie, N. Duffield,S. Sen and W. Willinger, On Unbiased Sampling for Unstructured Peer-to-Peer Networks, Proceedings of IMC,pages 27–40, 2008. [35] L. Becchetti, C. Castillo, D. Donato and A.Fazzone, On the bias of BFS (Breadth First Search), LinkKDD, pages 1–8, 2006. [36] J. Yang and J. Leskovec, Modeling Information Diffusion in Implicit Networks, ICDM, IEEE Computer Society, pages 599-608, 2010. [37] D. Kempe, J. Kleinberg and E. Tardos, Maximizing the spread of influence through a social network, KDD ’03: Proc. of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, ACM Press, pages 137-146, 2003. [38] M. Hansen and W. Hurwitz, On the Theory of Sampling from Finite Populations, Annals of Mathematical Statistics, No. 3, Vol 14, 1943. [39] E. Volz and D. Heckathorn, Probability based estimation theory for respondent-driven sampling, Official Statistics, pages 79, Vol 24, 2008. [40] M. Girvan and M. E. J. Newman, Community structure in social and biological networks, Proc. Natl. Acad. Sci. USA 99, pages 7821-7826, 2002. P2P Live Video Streaming Extracting Network of Information DML DML DML 34

  35. Q&A

More Related