
Retweeting Behavior and Spectral Graph Analysis in Social Media

Xintao Wu, Jan 18, 2013.

Presentation Transcript


  1. Retweeting Behavior and Spectral Graph Analysis in Social Media • Xintao Wu • Jan 18, 2013

  2. Social Media Customer Analytics • Entity resolution • Patterns • Temporal/spatial • Scalability • Visualization • Sentiment • Privacy • Network topology • Structured profile • Customer profile • Customer transaction • Inventory • Product desc and review • … • Retweet sequence • Unstructured text (e.g., blog, tweet)

  3. Outline • Examining retweeting behavior to understand information propagation • Multi-factor interaction analysis • Coverage prediction • Burst detection • Spectral graph analysis • Community partition • Fraud detection

  4. Multi-factor interaction analysis • For each following relationship (A follows B), what factors affect user A’s decision on whether to forward messages from B to A’s followers? • We examine users’ retweet behaviors using various features • Power ratio (A) • Link structure (B) • Location factor (C) • Gender factor (D) • … • We fit a log-linear model to capture and interpret interaction patterns among features A-D and retweet (E).
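
A minimal sketch of the kind of interaction a log-linear model captures, assuming a hypothetical 2x2 table of link structure (B: friend / non-friend) against retweet (E: yes / no). The counts are invented for illustration and are not from the study, and the function names are placeholders. The sketch fits the independence model (no B-E interaction term), measures its lack of fit with the likelihood-ratio statistic G², and reports the log odds ratio, which summarises the interaction in a 2x2 table.

```python
import math


def independence_fit(table):
    """Expected counts under the independence log-linear model (no interaction)."""
    total = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    return [[r * c / total for c in col_sums] for r in row_sums]


def g_squared(table, expected):
    """Likelihood-ratio statistic G^2 = 2 * sum(obs * ln(obs / exp))."""
    return 2.0 * sum(
        o * math.log(o / e)
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
        if o > 0
    )


def log_odds_ratio(table):
    """Log odds ratio of a 2x2 table; zero means no interaction."""
    (a, b), (c, d) = table
    return math.log((a * d) / (b * c))


# Hypothetical counts (assumption, not data from the talk):
# rows = friend, non-friend; columns = retweeted, not retweeted.
counts = [[120, 380], [40, 460]]

expected = independence_fit(counts)
g2 = g_squared(counts, expected)
lor = log_odds_ratio(counts)

print("expected under independence:", expected)
print("G^2 (1 degree of freedom):", round(g2, 2))
print("log odds ratio:", round(lor, 3))
```

A large G² relative to the chi-square distribution with one degree of freedom suggests the interaction term is needed. The analysis in the talk involves more factors (A-D and E), so its tables have more dimensions than this two-way example.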

  5. Interpreting interaction effect

  6. Interpretation example • Neither gender nor location has a significant effect on retweeting on its own. • However, when link structure is taken into account: • Females are more conservative and have a lower tendency to retweet messages from non-friend (especially female) users, but have a higher tendency to retweet messages from friends or superstars. • Males are more open-minded and have a higher tendency to retweet messages from non-friend (especially female) users.

  7. Outline • Examining retweeting behavior to understand information propagation • Multi-factor interaction analysis • Coverage prediction • Burst detection • Spectral graph analysis • Community partition • Fraud detection

  8. Retweet Sequence • Information dynamically flows through the network. • [Figure: network diagram with users Alice, Bob, Cathy, David, Ellen, Fred; labels D1, D2, D3]

  9. Retweet Sequence • Information dynamically flows through a social network. • [Figure: same network diagram, next animation step]

  10. Flow Through Tree Structure • Information dynamically flows through a social network. • [Figure: same network diagram shown as a tree]

  11. Flow Through Tree Structure • Information dynamically flows through a social network. • [Figure: same tree diagram, next animation step]

  12. WISE12 Challenge • Sina Weibo • # of users: 5,636,858 • # of tweets: 46,584,914 • # of retweets: 190,920,026 • 33 test messages • each with 100 initial retweets • composed by 27 users • from 6 events • For each message, predict • M1: the number of retweets in 30 days • M2: the number of possible views in 30 days

  13. Idea • We treat the retweeting activity of each original message in the training data as a time series • Each value corresponds to the number of times the original message was retweeted during time period t • For each message in the test data, the start of the series is known from its 100 initial retweets; ARMA is used to predict the rest
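
The idea above can be sketched as follows. This is a simplified stand-in, not the model from the talk: it uses AR(1), the simplest ARMA(p, q) case with p = 1 and q = 0, fitted by least squares, because the slides do not state the model order. The function names, bin width, and sample series are all assumptions made for illustration.

```python
def bin_counts(timestamps, bin_width, n_bins):
    """Count retweets falling into each of n_bins consecutive time periods."""
    counts = [0] * n_bins
    for t in timestamps:
        idx = int(t // bin_width)
        if 0 <= idx < n_bins:
            counts[idx] += 1
    return counts


def fit_ar1(series):
    """Least-squares fit of x[t] = c + phi * x[t-1]; returns (c, phi)."""
    x_prev, x_next = series[:-1], series[1:]
    n = len(x_prev)
    mean_prev = sum(x_prev) / n
    mean_next = sum(x_next) / n
    var = sum((p - mean_prev) ** 2 for p in x_prev)
    cov = sum((p - mean_prev) * (q - mean_next) for p, q in zip(x_prev, x_next))
    phi = cov / var if var else 0.0
    c = mean_next - phi * mean_prev
    return c, phi


def forecast(series, steps):
    """Iterate the fitted AR(1) recursion, clipping negative counts to zero."""
    c, phi = fit_ar1(series)
    out, last = [], series[-1]
    for _ in range(steps):
        last = max(0.0, c + phi * last)
        out.append(last)
    return out


# Hypothetical per-period retweet counts built from the known initial retweets.
observed = [64, 32, 16, 8, 4]
future = forecast(observed, steps=3)
predicted_total = sum(observed) + sum(future)
print("forecast:", future)
print("predicted total retweets:", predicted_total)
```

Summing the observed and forecast counts gives an estimate in the spirit of M1; a full ARMA fit would also model the moving-average part of the series.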

  14. Prediction Result • [Plots: Death of Steve Jobs; Yao Jiaxin Murder Case; Xiaomi Release (two panels)] • Runner-up award (2nd place) in the WISE 2012 Challenge, Mining Track.

  15. Outline • Examining retweeting behavior to understand information propagation • Multi-factor interaction analysis • Coverage prediction • Burst detection • Spectral graph analysis • Community partition • Fraud detection

  16. Bursts • [Figure annotations: peak time, duration time]

  17. Topic

  18. Retweet vs. Time

  19. Retweet vs. Time

  20. Burst Analysis: Users • Top 100 users tend to have: shorter path length, shorter peak time, shorter duration time.

  21. Burst Prediction • Extract features • User-related, including profile and history information • Tweet-related, including time series and retweet tree • Run classifiers • Logistic regression • Random forest • Decision tree • Naïve Bayes • SVM • KNN • Achieves 83.2% accuracy
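
A minimal sketch of one of the listed classifiers, k-nearest neighbours, written in plain Python. The two features (log10 of follower count, and an early retweet rate scaled to 0-1), the labels, and every data point are hypothetical; the feature set actually used in the talk is not given in the transcript.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Majority vote among the k training points nearest to query.

    train is a list of (feature_vector, label) pairs.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, test, k=3):
    """Fraction of test points whose predicted label matches the true label."""
    hits = sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(test)


# Hypothetical training data: (log10 followers, scaled early retweet rate).
train = [
    ((5.2, 0.9), "burst"),
    ((4.8, 0.8), "burst"),
    ((5.5, 0.7), "burst"),
    ((2.1, 0.1), "no_burst"),
    ((2.5, 0.2), "no_burst"),
    ((3.0, 0.1), "no_burst"),
]
test = [((5.0, 0.85), "burst"), ((2.3, 0.15), "no_burst")]

print("prediction:", knn_predict(train, (5.0, 0.85)))
print("accuracy on toy test set:", accuracy(train, test))
```

In practice the features would be standardised first, since Euclidean distance is sensitive to scale, and accuracy would be estimated on held-out data.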

  22. Outline • Examining retweeting behavior to understand information propagation • Multi-factor interaction analysis • Coverage prediction • Burst detection • Spectral graph analysis • Community partition • Fraud detection

  23. Spectral graph analysis • Spectral coordinates: Polbook network • [Figure: nodes plotted by spectral coordinates]
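
A minimal sketch of spectral coordinates, assuming they are taken from the leading eigenvectors of the adjacency matrix, so that node i maps to the i-th entries of the first k eigenvectors. The toy graph (two triangles joined by one edge) and all function names are assumptions, and splitting on the sign of the second coordinate is a simple stand-in, not the AdjCluster algorithm. Eigenvectors are found by power iteration with deflation on the shifted matrix A + cI, which has the same eigenvectors as A.

```python
import math


def mat_vec(matrix, vec):
    """Matrix-vector product for a list-of-lists matrix."""
    return [sum(a * v for a, v in zip(row, vec)) for row in matrix]


def normalize(vec):
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]


def top_eigenvectors(adj, k=2, iters=500):
    """Leading k eigenvectors of a symmetric adjacency matrix.

    Shifting by c = max degree makes every eigenvalue non-negative while
    keeping eigenvectors and their ordering, so power iteration is safe.
    """
    n = len(adj)
    shift = max(sum(row) for row in adj)
    shifted = [[adj[i][j] + (shift if i == j else 0) for j in range(n)]
               for i in range(n)]
    found = []
    for idx in range(k):
        # Deterministic start vector with no special symmetry.
        vec = normalize([1.0 + 0.1 * i * (idx + 1) for i in range(n)])
        for _ in range(iters):
            vec = mat_vec(shifted, vec)
            # Deflation: remove components along eigenvectors already found.
            for prev in found:
                dot = sum(a * b for a, b in zip(vec, prev))
                vec = [a - dot * b for a, b in zip(vec, prev)]
            vec = normalize(vec)
        found.append(vec)
    return found


# Toy graph: triangle {0, 1, 2} and triangle {3, 4, 5} joined by edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = [[0] * 6 for _ in range(6)]
for u, v in edges:
    adj[u][v] = adj[v][u] = 1

v1, v2 = top_eigenvectors(adj, k=2)
coords = list(zip(v1, v2))            # spectral coordinate of each node
groups = [c[1] > 0 for c in coords]   # split on sign of second coordinate
lambda1 = sum(a * b for a, b in zip(v1, mat_vec(adj, v1)))  # Rayleigh quotient

for node, (x, y) in enumerate(coords):
    print(node, round(x, 3), round(y, 3), groups[node])
```

On this toy graph the two triangles land on opposite sides of the second axis, which illustrates why communities tend to separate in the spectral plot.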

  24. Accuracy of AdjCluster • Lap [Miller and Teng, 1998]: Laplacian based • Ncut [Shi and Malik, 2000]: Normalized cut • HE’ [Wakita and Tsurumi, 2007]: Modularity based agglomerative clustering • SpokEn [Prakash et al., 2010]: EigenSpoke • Accuracy: [formula not preserved in the transcript], defined over the i-th community produced by the different algorithms • Refer to IJCAI 11 for details

  25. SPCTRA fraud detection • Evaluation on Web spam challenge data • 100-1000 times faster • GREEDY: based on outer-triangles [Shrivastava, ICDE, 2008] • Refer to ICDE 11 for details.

  26. Thank You! Questions? • Acknowledgments: This work was supported in part by U.S. National Science Foundation grants CNS-0831204 and CCF-1047621, and the UNC Charlotte Chancellor’s Special Fund.
