Analysis of Large-Scale Cell Phone Networks

Analysis of Large-Scale Cell Phone Networks. 10-802 Course Project Leman Akoglu Bhavana Dalvi Skyler Speakman April 22 2010. 3.8 million anonymized customers from India Gender Activation Date Age (sketchy) 6 months of time-stamped directed phone calls Time of day Duration

Presentation Transcript


  1. Analysis of Large-Scale Cell Phone Networks • 10-802 Course Project • Leman Akoglu, Bhavana Dalvi, Skyler Speakman • April 22, 2010

  2. 3.8 million anonymized customers from India • Gender • Activation Date • Age (sketchy) • 6 months of time-stamped directed phone calls • Time of day • Duration • Switching stations removed (bummer) • 220 million text messages • Time of day

  3. Analysis of Tie Strengths and Mutuality • Leman Akoglu • Persistence of Social Ties • Bhavana Dalvi • Pattern & Event Detection in Social Networks • Skyler Speakman

  4. Analysis of Ties in Composite Networks • Link Prediction in Large SMS+CALL Networks • Presented by Leman Akoglu • April 22, 2010

  5. Sub-Problem • Goal: Link prediction • In integrated networks (SMS+VOICE) • Questions: • How do different methods perform? • Does information of edge weights matter? • Does knowledge of VOICE interactions improve SMS predictions, and vice versa? • Similar to: D. Liben-Nowell, J. Kleinberg. The Link Prediction Problem for Social Networks. Proc. 12th International Conference on Information and Knowledge Management (CIKM), 2003 • They use very small graphs with up to 5K nodes and 50K edges. Here we have networks of millions of users. • They did not use the weighted version of most methods.

  6. Methods used in Link Prediction
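The list of methods on this slide did not survive in the transcript. As a hedged illustration only, the sketch below implements three neighborhood-based scores surveyed in the cited Liben-Nowell and Kleinberg paper (common neighbors, Jaccard, Adamic/Adar). The function names and the toy graph are assumptions, not the project's code.

```python
import math

def build_adjacency(edges):
    """Build an undirected adjacency map from (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def common_neighbors(adj, u, v):
    """Number of contacts shared by u and v."""
    return len(adj[u] & adj[v])

def jaccard(adj, u, v):
    """Shared contacts divided by all contacts of either node."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0

def adamic_adar(adj, u, v):
    """Shared contacts weighted so that low-degree contacts count more."""
    return sum(1.0 / math.log(len(adj[z]))
               for z in adj[u] & adj[v] if len(adj[z]) > 1)

# Hypothetical toy graph: score the currently missing pair (a, d).
toy_edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("b", "d")]
adj = build_adjacency(toy_edges)
scores = {
    "common_neighbors": common_neighbors(adj, "a", "d"),
    "jaccard": jaccard(adj, "a", "d"),
    "adamic_adar": adamic_adar(adj, "a", "d"),
}
```

Candidate pairs would be ranked by score and the top-ranked pairs predicted as future links. Weighted variants would replace the set sizes with sums of edge weights.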

  7. Results • In general, prediction accuracy is low (up to ~7%)

  8. Sub-Problem II • Main sub-project goal: Analysis of ties/links • In integrated networks (SMS+VOICE) • Questions: • How do mutual and non-mutual networks differ? • How equal is reciprocity? • Is there a correlation between node degree and its neighbors’ degrees? • How does total duration or number of phone calls/SMSs grow with the number of contacts? • Does strength of a tie depend on neighborhood overlap?

  9. 1. How do mutual and non-mutual networks differ? SMS PHONECALL In the mutual network of SMS, 70% of the nodes become singletons!
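A minimal sketch of how a mutual network and its singleton fraction could be computed, assuming a mutual tie means contact in both directions and a singleton is a node left without any mutual tie. The function names and toy data are hypothetical.

```python
def mutual_edges(directed_edges):
    """Keep only pairs that contacted each other in both directions."""
    edge_set = set(directed_edges)
    return {tuple(sorted((u, v)))
            for u, v in edge_set
            if u != v and (v, u) in edge_set}

def singleton_fraction(nodes, undirected_edges):
    """Fraction of nodes that have no edge left in the given network."""
    connected = {n for edge in undirected_edges for n in edge}
    return sum(1 for n in nodes if n not in connected) / len(nodes)

# Hypothetical directed SMS records (sender, receiver).
toy = [("a", "b"), ("b", "a"), ("a", "c"), ("d", "a"), ("c", "d")]
nodes = {n for edge in toy for n in edge}
mutual = mutual_edges(toy)                      # only a <-> b is mutual
fraction = singleton_fraction(nodes, mutual)    # c and d become singletons
```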

  10. 2. How equal is reciprocity? SMS PHONECALL

  11. 3. Is there a correlation between node degree and its neighbors’ degrees? • Disassortative vs. assortative mixing • High-degree nodes with low-degree neighbors, where all edges also have the same weight • SMS

  12. 3. Is there a correlation between node degree and its neighbors’ degrees? PHONECALL
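One common way to examine this question is to plot the average degree of a node's neighbors against the node's own degree: an increasing trend suggests assortative mixing, a decreasing one disassortative mixing. The sketch below computes that curve on a hypothetical star graph; it is an illustration, not the analysis behind the slides.

```python
from collections import defaultdict

def avg_neighbor_degree_by_degree(edges):
    """Map each degree k to the mean neighbor degree of nodes with degree k."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    degree = {n: len(nbrs) for n, nbrs in adj.items()}
    per_degree = defaultdict(list)
    for n, nbrs in adj.items():
        mean_nbr_degree = sum(degree[m] for m in nbrs) / degree[n]
        per_degree[degree[n]].append(mean_nbr_degree)
    return {k: sum(vals) / len(vals) for k, vals in per_degree.items()}

# A star is maximally disassortative: the hub's neighbors all have degree 1.
star = [("hub", "x"), ("hub", "y"), ("hub", "z")]
knn = avg_neighbor_degree_by_degree(star)
```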

  13. 4. How does total duration or number of phone calls/SMSs grow with the number of contacts? SMS PHONECALL
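The conclusions slide reports power-law growth of node strength with degree. A hedged sketch of how such an exponent could be estimated is a least-squares line through the points in log-log space. The data below is synthetic, generated with a known exponent of 1.5 purely to check the fit.

```python
import math

def fit_power_law_exponent(degrees, strengths):
    """Slope of log(strength) against log(degree) by ordinary least squares."""
    xs = [math.log(k) for k in degrees]
    ys = [math.log(s) for s in strengths]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator

# Synthetic example: strength = 3 * degree^1.5 (not real data).
degrees = [1, 2, 4, 8, 16]
strengths = [3 * k ** 1.5 for k in degrees]
alpha = fit_power_law_exponent(degrees, strengths)
```

An exponent above 1 corresponds to super-linear growth. On real, noisy data the strengths would typically be averaged per degree or binned before fitting.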

  14. 5. Does strength of a tie depend on neighborhood overlap? SMS

  15. 5. Does strength of a tie depend on neighborhood overlap? PHONECALL
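The slides do not state which overlap definition was used. A minimal sketch, assuming the definition from Onnela et al. (shared neighbors divided by all other neighbors of the two endpoints), looks like this; the adjacency map is a hypothetical toy example.

```python
def neighborhood_overlap(adj, i, j):
    """O_ij = n_ij / ((k_i - 1) + (k_j - 1) - n_ij) for an existing tie (i, j)."""
    shared = len(adj[i] & adj[j])
    denominator = (len(adj[i]) - 1) + (len(adj[j]) - 1) - shared
    return shared / denominator if denominator > 0 else 0.0

# Hypothetical undirected adjacency map.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c", "e"},
    "e": {"d"},
}
overlap_bc = neighborhood_overlap(adj, "b", "c")  # both other neighbors shared
overlap_ab = neighborhood_overlap(adj, "a", "b")  # one of two shared
overlap_de = neighborhood_overlap(adj, "d", "e")  # nothing shared
```

To relate overlap to tie strength, ties would be grouped by weight (number of calls or total duration) and the mean overlap computed per group.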

  16. CONCLUSIONS: • How do mutual and non-mutual networks differ? • There is far less mutuality in the SMS network. • Is reciprocity balanced? • Yes, balanced and small reciprocity is more common. • Is there a correlation between node degree and its neighbors’ degrees? • Yes, the degree of a node and the average degree of its neighbors show assortative mixing for nodes of degree > ~10. • How does total duration or number of phone calls/SMSs grow with the number of contacts? • Total node strength grows super-linearly (power-law) with increasing degree. • Does strength of a tie depend on neighborhood overlap? • Yes, tie strength increases with increasing neighborhood overlap on average.

  17. Network Structure and Tie Persistence in a Mobile Network • Bhavana Dalvi

  18. Goal • Predict which of the existing ties will survive • Questions: • Which link features matter? • Which node features matter? • How are they correlated with each other? • Which prediction method to use?

  19. Related Work • Structure and tie strengths in mobile communication networks - Onnela, Barabasi - PNAS 2007 • Coupling between tie strengths and local network structure • Information diffusion through strong ties vs. weak ties • The dynamics of a mobile phone network - Hidalgo et al. ScienceDirect Jan 2008 • Relation between structure of mobile network and link persistence • Rule-based prediction • We formulate it as a prediction problem.

  20. Problem Formulation • Divide the data into time panels • Given the links and network structure in panel 1, predict which links will persist in panels 2, 3, 4, etc.

  21. Concept Definitions • Persistence of tie • Perseverance of user
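The formulas on this slide were not preserved in the transcript. The sketch below assumes the definitions used in the cited Hidalgo et al. work: a tie's persistence is the fraction of time panels in which it is observed, and a user's perseverance is the mean persistence of that user's ties. Panel length, field layout, and names are hypothetical.

```python
from collections import defaultdict

def tie_persistence(calls, panel_length, num_panels):
    """calls: (caller, callee, day) records. Returns {tie: fraction of panels active}."""
    active_panels = defaultdict(set)
    for caller, callee, day in calls:
        panel = day // panel_length
        if 0 <= panel < num_panels:
            active_panels[tuple(sorted((caller, callee)))].add(panel)
    return {tie: len(panels) / num_panels for tie, panels in active_panels.items()}

def user_perseverance(persistence):
    """Mean persistence over all ties that touch each user."""
    per_user = defaultdict(list)
    for (u, v), value in persistence.items():
        per_user[u].append(value)
        per_user[v].append(value)
    return {user: sum(vals) / len(vals) for user, vals in per_user.items()}

# Hypothetical call log over 30 days, split into three 10-day panels.
calls = [("a", "b", 0), ("b", "a", 12), ("a", "b", 25), ("a", "c", 3)]
persistence = tie_persistence(calls, panel_length=10, num_panels=3)
perseverance = user_perseverance(persistence)
```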

  22. Random Sample • Selected a seed uniformly at random • Took a subgraph of the original graph by traversing neighbors and their neighbors • Users: 5K • Links: 14.6K • Duration: 3 months

  23. Tie persistence distribution • Bimodal distribution • Ties are either active most of the time or rarely active

  24. Tie Attributes • Reciprocity (R) • 1 : If the tie is reciprocal • 0 : otherwise • Topological Overlap (TO)

  25. Node Attributes • Degree (K) • Cluster Coefficient (C) • Average reciprocity (r) • fraction of ties containing both incoming and outgoing calls
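A minimal sketch of the three node attributes, assuming the standard local clustering coefficient and reading average reciprocity as the fraction of a node's ties that carry calls in both directions. The helper names and call records are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def clustering_coefficient(adj, node):
    """Fraction of pairs of neighbors that are themselves connected."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for x, y in combinations(nbrs, 2) if y in adj[x])
    return 2.0 * links / (k * (k - 1))

def average_reciprocity(directed_calls, node):
    """Fraction of the node's ties with both incoming and outgoing calls."""
    out_nbrs = {v for u, v in directed_calls if u == node}
    in_nbrs = {u for u, v in directed_calls if v == node}
    ties = out_nbrs | in_nbrs
    return len(out_nbrs & in_nbrs) / len(ties) if ties else 0.0

# Hypothetical directed call records (caller, callee).
directed_calls = [("a", "b"), ("b", "a"), ("a", "c"), ("c", "b"), ("d", "a")]
adj = defaultdict(set)
for u, v in directed_calls:
    adj[u].add(v)
    adj[v].add(u)

degree_a = len(adj["a"])                          # K
clustering_a = clustering_coefficient(adj, "a")   # C
reciprocity_a = average_reciprocity(directed_calls, "a")  # r
```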

  26. Pearson Correlation Coefficient • Measures the linear dependence between two quantities • Corr(X,Y) = cov(X,Y) / sqrt(var(X) * var(Y))
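As a small worked example, the Pearson coefficient is the covariance of the two quantities divided by the product of their standard deviations. The sketch below computes it from scratch on toy lists; the input values are illustrative only.

```python
import math

def pearson(xs, ys):
    """Pearson correlation using population covariance and variances."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    var_y = sum((y - mean_y) ** 2 for y in ys) / n
    return cov / math.sqrt(var_x * var_y)

perfect_positive = pearson([1, 2, 3, 4], [2, 4, 6, 8])
perfect_negative = pearson([1, 2, 3], [3, 2, 1])
```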

  27. Tie Persistence

  28. User Perseverance

  29. Example regression Coefficients for Tie Persistence • Delta_C : -0.0067 • Delta_K : 0.0008 • Delta_r : 0.0643 • R : 0.5441 • TO : 0.2311

  30. Prediction Problem • Input: • Links in panel 1 • For each link: Delta_C, Delta_K, Delta_r, R and TO (from panel 1 data) • Output: • Will a link in panel 1 persist in panel k, for k = 2, 3, 4, 5, 6?
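A hedged sketch of this setup as a logistic regression trained by plain gradient descent. For brevity it uses only two of the listed features (R and TO); the training rows, learning rate, and epoch count are invented for illustration and are not the project's data or settings.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(weights, bias, x):
    """Estimated probability that a tie with features x persists."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))

def train_logistic_regression(rows, labels, learning_rate=0.5, epochs=2000):
    """Batch gradient descent on the mean log-loss."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            error = predict_proba(weights, bias, x) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - learning_rate * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(rows)
    return weights, bias

# Hypothetical panel-1 features (R, TO) and persistence labels in a later panel.
rows = [(1, 0.40), (1, 0.30), (1, 0.05), (0, 0.20), (0, 0.00), (0, 0.05)]
labels = [1, 1, 1, 0, 0, 0]
weights, bias = train_logistic_regression(rows, labels)
p_reciprocal = predict_proba(weights, bias, (1, 0.30))
p_one_way = predict_proba(weights, bias, (0, 0.00))
```

One model would be trained per target panel k, each using the same panel-1 features.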

  31. Variants of logistic regression for tie persistence prediction • Using both node and tie attributes improves the prediction accuracy

  32. Comparison with rule-based method • LR performs better than the rule-based method: if (R = 1 and TO > 0.1) then predict 1, else 0
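The rule-based baseline stated on this slide can be written directly. The ties and labels below are hypothetical and exist only to show how its accuracy would be measured.

```python
def rule_based_prediction(reciprocal, topological_overlap, threshold=0.1):
    """Predict persistence (1) only for reciprocal ties with overlap above the threshold."""
    return 1 if reciprocal == 1 and topological_overlap > threshold else 0

def accuracy(predictions, labels):
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)

# Hypothetical ties: (R, TO, persisted in later panel).
ties = [(1, 0.40, 1), (1, 0.05, 1), (0, 0.30, 0), (1, 0.20, 1), (0, 0.00, 0)]
predictions = [rule_based_prediction(r, to) for r, to, _ in ties]
labels = [persisted for _, _, persisted in ties]
rule_accuracy = accuracy(predictions, labels)
```

The second tie shows the rule's rigidity: a reciprocal tie with low overlap is always predicted to vanish, whereas a fitted model can weigh the two features against each other.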

  33. Conclusion • Local network attributes do help in predicting the persistence of existing ties. • LR-like techniques give better accuracy than rule-based techniques.

  34. Analysis of Social Media Presentation • Contribution from Skyler Speakman • April 22, 2010

  35. Pattern Detection through Subset Scanning (A reminder) (Neill, 2008) • Find the subset of locations for a given region that has the highest score • Figure legend: affected locations; unaffected locations contributing to region score

  36. Connectivity Constraints • Create an adjacency graph of the locations and score every connected subset • Increases power to detect non-circular clusters
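To make "every connected subset" concrete, here is a brute-force sketch that enumerates them on a tiny graph. It is exponential in the number of nodes and is not the GraphScan algorithm mentioned later in the deck; names and the example graph are assumptions.

```python
from itertools import combinations

def is_connected(subset, adj):
    """Depth-first search restricted to the nodes in subset."""
    subset = set(subset)
    start = next(iter(subset))
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for nbr in adj[node] & subset:
            if nbr not in seen:
                seen.add(nbr)
                stack.append(nbr)
    return seen == subset

def connected_subsets(adj):
    """Yield every non-empty connected subset of nodes (feasible for small graphs only)."""
    nodes = sorted(adj)
    for size in range(1, len(nodes) + 1):
        for subset in combinations(nodes, size):
            if is_connected(subset, adj):
                yield frozenset(subset)

# Hypothetical path graph a - b - c.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
subsets = list(connected_subsets(path))
```

The number of subsets grows quickly with graph size and density, which is consistent with the scaling limits reported in the deck's conclusions.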

  37. Social Media • Can pattern detection work with people on ‘societal scale’ ? • Automatic (participatory sensing) • Self-reported (healthmap.org)

  38. In the News… (American Teenagers) • Texting has surpassed: • Face-to-face • Email • Instant Message • Voice calling • 1 in 3 send more than 100 texts a day Pew Internet & American Life Project

  39. Anomaly Detection through Subset Scanning • We wish to maximize a scoring function over all possible connected subsets, S • Assume texts ~ Poisson(b_i) (learned from historical data) • Provides a likelihood score that the counts in S are generated from a different distribution (anomalous)
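The slide does not give the scoring function itself. One common choice for Poisson counts with learned baselines is the expectation-based Poisson log-likelihood ratio, F(S) = C log(C/B) + B - C when C > B and 0 otherwise, where C and B are the summed counts and baselines in S. The sketch below assumes that statistic; the counts are made up.

```python
import math

def poisson_score(counts, baselines, subset):
    """Expectation-based Poisson log-likelihood ratio for the nodes in subset."""
    total_count = sum(counts[i] for i in subset)
    total_baseline = sum(baselines[i] for i in subset)
    if total_baseline <= 0 or total_count <= total_baseline:
        return 0.0  # no evidence of an increase over the expected count
    return (total_count * math.log(total_count / total_baseline)
            + total_baseline - total_count)

# Hypothetical daily text counts and historical baselines per customer.
counts = {"a": 20, "b": 10, "c": 5}
baselines = {"a": 10, "b": 10, "c": 10}
score_a = poisson_score(counts, baselines, {"a"})
score_ab = poisson_score(counts, baselines, {"a", "b"})
score_c = poisson_score(counts, baselines, {"c"})
```

Adding the unremarkable node "b" dilutes the score, so the maximization favors subsets made up of nodes whose counts exceed their baselines.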

  40. Initial Attempt • Formed a very simple social network based off of ‘1 call’ • Add a threshold? • … Still running • Focus on a much smaller group of extremely active texters

  41. Trimming the data… • Require a threshold of monthly activity in order to be considered • 500 incoming & outgoing texts every month • 468 customers • Require a threshold of messages exchanged in order to be connected
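A sketch of the two thresholds described here, assuming each record is a (sender, receiver, month) tuple, that incoming and outgoing texts are counted together, and that the edge threshold applies to messages exchanged in either direction. The tiny thresholds and data are for illustration; the slide's activity threshold is 500 per month.

```python
from collections import Counter

def active_users(texts, months, min_per_month):
    """Users whose combined incoming and outgoing texts meet the threshold in every month."""
    activity = Counter()
    for sender, receiver, month in texts:
        activity[(sender, month)] += 1
        activity[(receiver, month)] += 1
    users = {user for user, _ in activity}
    return {u for u in users
            if all(activity[(u, m)] >= min_per_month for m in months)}

def thresholded_edges(texts, users, min_exchanged):
    """Connect two retained users if they exchanged at least min_exchanged texts."""
    exchanged = Counter()
    for sender, receiver, _ in texts:
        if sender in users and receiver in users and sender != receiver:
            exchanged[frozenset((sender, receiver))] += 1
    return {pair for pair, n in exchanged.items() if n >= min_exchanged}

# Hypothetical records over months 1 and 2.
texts = [("a", "b", 1)] * 3 + [("b", "a", 2)] * 2 + [("a", "c", 1)]
kept = active_users(texts, months=[1, 2], min_per_month=2)
edges = thresholded_edges(texts, kept, min_exchanged=4)
```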

  42. Highest-scoring connected subset for a selection of days • Maximum likelihood ratio score for every day in May

  43. Conclusions • The GraphScan algorithm can reasonably scale to graphs of a few hundred nodes • Performance is highly dependent on the underlying graph structure • Future improvements through heuristics are possible (necessary) • Realistic anomaly detection is difficult with unlabeled data, but we have demonstrated a solid proof of principle
