
Flu Gone Viral: Syndromic Surveillance of Flu on Twitter using Temporal Topic Models

Liangzhe Chen, K. S. M. Tozammel Hossain, Patrick Butler, Naren Ramakrishnan, B. Aditya Prakash. Computer Science at Virginia Tech.


Presentation Transcript


  1. Flu Gone Viral: Syndromic Surveillance of Flu on Twitter using Temporal Topic Models • Liangzhe Chen, K. S. M. Tozammel Hossain, Patrick Butler, Naren Ramakrishnan, B. Aditya Prakash • Computer Science at Virginia Tech

  2. Introduction: Surveillance • How to estimate and predict flu trends? • Surveillance report sources: hospital records, lab surveys, population surveys

  3. Introduction: GFT & Twitter • Estimate flu trends using online electronic sources, such as Google Flu Trends (GFT) and Twitter • Example tweets: “So cold today, I’m catching cold.” “I have headache, sore throat, I can’t go to school today.” “My nose is totally congested, I have a hard time understanding what I’m saying.”

  4. Outline • Observations • HFSTM Model • Inference • Experiments • Conclusion • Future work

  5. Observation 1: States • There are different states in an infection cycle. • SEIR model: 1. Susceptible 2. Exposed 3. Infected 4. Recovered
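The SEIR states on this slide come from the standard compartmental model of epidemiology. As a rough illustration (not part of HFSTM itself), the sketch below integrates the textbook SEIR equations with a forward-Euler step; the rates `beta`, `sigma`, `gamma` and the initial population fractions are hypothetical values chosen only for the demo.

```python
def simulate_seir(beta, sigma, gamma, s0, e0, i0, r0, days, dt=0.1):
    """Forward-Euler integration of the textbook SEIR equations.

    Compartments are population fractions:
        dS/dt = -beta*S*I
        dE/dt =  beta*S*I - sigma*E
        dI/dt =  sigma*E - gamma*I
        dR/dt =  gamma*I
    Returns a list of (time, S, E, I, R) tuples.
    """
    s, e, i, r = s0, e0, i0, r0
    history = [(0.0, s, e, i, r)]
    for step in range(1, int(round(days / dt)) + 1):
        exposed = beta * s * i * dt    # flow S -> E
        infected = sigma * e * dt      # flow E -> I
        recovered = gamma * i * dt     # flow I -> R
        s -= exposed
        e += exposed - infected
        i += infected - recovered
        r += recovered
        history.append((step * dt, s, e, i, r))
    return history


if __name__ == "__main__":
    # Hypothetical parameters: R0 = beta/gamma = 2, mean latency 2 days, infectious period 4 days.
    traj = simulate_seir(beta=0.5, sigma=0.5, gamma=0.25,
                         s0=0.99, e0=0.0, i0=0.01, r0=0.0, days=120)
    peak = max(traj, key=lambda row: row[3])
    print(f"peak infected fraction {peak[3]:.3f} on day {peak[0]:.1f}")
```

Each update moves mass between compartments without creating or destroying it, so S + E + I + R stays constant. After the peak, once susceptibles are depleted, the infected fraction decays roughly exponentially, which is the epidemiological side of the gap described in Observation 2.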

  6. Observation 2: Ep. & So. Gap • Infection cases drop exponentially in epidemiology (Hethcote 2000) • Keyword mentions drop in a power-law pattern in social media (Matsubara 2012)
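The two decay patterns can be written out explicitly. Using generic functional forms for illustration (the symbols $n_0$, $\lambda$ and $\alpha$ are introduced here and do not appear on the slides), the difference shows up in how the count $n(t)$ shrinks when the elapsed time doubles:

```latex
\begin{aligned}
\text{Exponential: } n(t) &= n_0\, e^{-\lambda t}
  &&\Rightarrow\ \log n(t) = \log n_0 - \lambda t,
  &&\frac{n(2t)}{n(t)} = e^{-\lambda t}\\
\text{Power law: } n(t) &= n_0\, t^{-\alpha}
  &&\Rightarrow\ \log n(t) = \log n_0 - \alpha \log t,
  &&\frac{n(2t)}{n(t)} = 2^{-\alpha}
\end{aligned}
```

Under exponential decay the doubling ratio keeps shrinking as $t$ grows, while under a power law it stays fixed at $2^{-\alpha}$, so a power-law tail lingers far longer. This is also why the two forms are naturally compared as straight lines on log-linear and log-log axes, respectively.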

  7. Outline • Observations • HFSTM Model • Inference • Experiments • Conclusion • Future work

  8. HFSTM Model • Hidden Flu-State from Tweet Model (HFSTM) • Each word (w) in a tweet ($O_i$) can be generated by: • A background topic • Non-flu related topics • State related topics • Model variables: latent state, initial probability, transition switch, transition probability, binary non-flu related switch, binary background switch, word distribution
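To make the three word sources concrete, here is a minimal sketch of drawing one word given a tweet's latent state. It assumes the background and non-flu-related switches are Bernoulli draws checked in that order; the switch probabilities, the toy vocabulary, and the uniform choice among non-flu topics are all hypothetical, since the slides do not give these details.

```python
import random


def sample_word(state, params, rng=random):
    """Draw one word for a tweet whose latent state is 'S', 'E' or 'I'.

    Assumed (hypothetical) order of the binary switches:
      1. background switch on    -> background topic
      2. else non-flu switch on  -> one of the non-flu topics (picked uniformly)
      3. otherwise               -> the topic tied to the tweet's state
    """
    if rng.random() < params["p_background"]:
        dist = params["background"]
    elif rng.random() < params["p_nonflu"]:
        dist = rng.choice(params["nonflu_topics"])
    else:
        dist = params["state_topics"][state]
    words = list(dist)
    return rng.choices(words, weights=[dist[w] for w in words], k=1)[0]


# Toy parameters, for illustration only.
PARAMS = {
    "p_background": 0.3,
    "p_nonflu": 0.4,
    "background": {"really": 0.5, "today": 0.5},
    "nonflu_topics": [{"movie": 0.6, "good": 0.4}, {"restaurant": 0.7, "food": 0.3}],
    "state_topics": {
        "S": {"cold": 0.5, "weather": 0.5},
        "E": {"headache": 0.5, "throat": 0.5},
        "I": {"flu": 0.7, "fever": 0.3},
    },
}

if __name__ == "__main__":
    rng = random.Random(0)
    print([sample_word("I", PARAMS, rng) for _ in range(8)])
```

Only the third branch depends on the latent state, which is what lets the model separate flu-state vocabulary from everyday chatter that happens to share a tweet with it.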

  9. HFSTM Model • Generating tweets • Generate the state for a tweet (State: S, E, I) • Generate the topic for each word (Topic: Background, Non-flu, State) • Examples: S: “This is really …” (restaurant, good) E: “The movie was good but was it freezing” I: “I think I have flu”
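The state-generation step can be sketched as a Markov chain over a user's tweets in time order. Here the transition switch is modeled, as an assumption, by a single probability `p_transit`: when the switch is on, the next state is drawn from the transition row; when it is off, the previous state is kept. All names and numbers are hypothetical.

```python
import random

STATES = ("S", "E", "I")


def sample_state_sequence(n_tweets, init_prob, trans_prob, p_transit, rng=random):
    """Sample a latent state for each of a user's n_tweets tweets, in time order.

    init_prob:  {state: P(first state)}
    trans_prob: {state: {next_state: P(next | state)}}
    p_transit:  assumed probability that the transition switch is on for a tweet
    """
    if n_tweets <= 0:
        return []
    states = [rng.choices(STATES, weights=[init_prob[s] for s in STATES], k=1)[0]]
    for _ in range(n_tweets - 1):
        prev = states[-1]
        if rng.random() < p_transit:
            row = trans_prob[prev]
            states.append(rng.choices(STATES, weights=[row[s] for s in STATES], k=1)[0])
        else:
            states.append(prev)  # switch off: stay in the same state
    return states


if __name__ == "__main__":
    init = {"S": 0.8, "E": 0.15, "I": 0.05}
    trans = {"S": {"S": 0.9, "E": 0.1, "I": 0.0},
             "E": {"S": 0.2, "E": 0.5, "I": 0.3},
             "I": {"S": 0.3, "E": 0.0, "I": 0.7}}
    print(sample_state_sequence(10, init, trans, p_transit=0.6, rng=random.Random(42)))
```

Each tweet's words would then be drawn conditioned on its state, through the background, non-flu and state topics described on the previous slide.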

  10. Outline • Observations • HFSTM Model • Inference • Experiments • Conclusion • Future work

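  11. Inference • EM-based algorithm: HFSTM-FIT • E-step: • $A_t(i) = P(O_1, O_2, \ldots, O_t, S_t = i)$ • $B_t(i) = P(O_{t+1}, \ldots, O_{T_u} \mid S_t = i)$ • $\gamma_t(i) = P(S_t = i \mid O_u)$, where $O_u = (O_1, \ldots, O_{T_u})$ is the tweet sequence of user $u$ • M-step: • Re-estimate the other parameters, such as state transition probabilities and topic distributions. • Parameters learned: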
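The E-step quantities are the classic forward-backward recursions of a hidden Markov model. The sketch below computes them for one user's tweet sequence, normalizing at every step to avoid numerical underflow. It assumes the per-tweet likelihoods $P(O_t \mid S_t = i)$ (which in HFSTM would come from the switch-and-topic mixture) are already computed, and it ignores the transition switch; it is a generic HMM routine, not the HFSTM-FIT code.

```python
import math


def forward_backward(init, trans, emis):
    """Scaled forward-backward pass for one user's tweet sequence.

    init[i]     = P(S_1 = i)
    trans[i][j] = P(S_{t+1} = j | S_t = i)
    emis[t][i]  = P(O_t | S_t = i), the likelihood of tweet t under state i
    Returns (gamma, log_likelihood), where gamma[t][i] = P(S_t = i | O_1..O_T).
    """
    T, K = len(emis), len(init)
    alpha = [[0.0] * K for _ in range(T)]  # A_t(i), normalized over i at each t
    beta = [[1.0] * K for _ in range(T)]   # B_t(i), rescaled by the same factors
    scale = [0.0] * T

    # Forward pass: A_t(i) = P(O_1..O_t, S_t = i).
    for t in range(T):
        for i in range(K):
            prior = init[i] if t == 0 else sum(alpha[t - 1][j] * trans[j][i] for j in range(K))
            alpha[t][i] = prior * emis[t][i]
        scale[t] = sum(alpha[t])
        alpha[t] = [a / scale[t] for a in alpha[t]]

    # Backward pass: B_t(i) = P(O_{t+1}..O_T | S_t = i).
    for t in range(T - 2, -1, -1):
        for i in range(K):
            beta[t][i] = sum(trans[i][j] * emis[t + 1][j] * beta[t + 1][j]
                             for j in range(K)) / scale[t + 1]

    # Posterior gamma_t(i) is proportional to A_t(i) * B_t(i).
    gamma = []
    for t in range(T):
        g = [alpha[t][i] * beta[t][i] for i in range(K)]
        z = sum(g)
        gamma.append([x / z for x in g])
    # The product of the scale factors equals P(O_1..O_T).
    log_likelihood = sum(math.log(c) for c in scale)
    return gamma, log_likelihood
```

In a standard Baum-Welch M-step, these posteriors (together with pairwise posteriors over consecutive states) serve as expected counts for re-estimating the transition probabilities and topic distributions.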

  12. Outline • Observations • HFSTM Model • Inference • Experiments • Conclusion • Future work

  13. Vocabulary & Dataset • Vocabulary (230 words): • Flu-related keyword list by Chakraborty (SDM 2014) • Extra state-related keyword list • Dataset (34,000 tweets): • Identify infected users and collect their tweets • Train on data from Jun 20, 2013 to Aug 06, 2013 • Test on two time periods: • Dec 01, 2012 to Jul 08, 2013 • Nov 10, 2013 to Jan 26, 2014

  14. Learned word distributions • The most probable words learned in each state: • S: probably healthy • E: having symptoms • I: definitely sick
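Reading off "the most probable words" of a state amounts to ranking that state's word distribution. A minimal sketch, assuming each learned distribution is available as a word-to-probability mapping; the toy probabilities below are hypothetical, not the values learned by HFSTM.

```python
import heapq


def top_words(word_dist, k=10):
    """Return the k highest-probability (word, probability) pairs, most probable first."""
    return heapq.nlargest(k, word_dist.items(), key=lambda item: item[1])


if __name__ == "__main__":
    learned = {  # hypothetical per-state word distributions
        "S": {"cold": 0.05, "weather": 0.03, "school": 0.02},
        "E": {"headache": 0.06, "throat": 0.04, "sore": 0.04},
        "I": {"flu": 0.09, "fever": 0.05, "sick": 0.04},
    }
    for state, dist in learned.items():
        print(state, [w for w, _ in top_words(dist, k=2)])
```

`heapq.nlargest` avoids sorting the whole vocabulary, which matters little for 230 words but scales well to larger vocabularies.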

  15. Learned state transitions • Transition probabilities • Transitions in real tweets, as learned by HFSTM: tweets that are not directly flu-related, yet correctly identified

  16. Flu trend fitting • Ground truth: • Case counts reported by the Pan American Health Organization (PAHO) • Algorithms: • Baseline: • Count the number of keywords weekly as features, and regress to the ground-truth curve. • Google Flu Trends: • Take the Google Flu Trends data as input, and regress to the PAHO curve. • HFSTM: • Distinguish the different states of keywords, and use only the number of keywords in the I state; again, regress to the PAHO curve.
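The baseline and HFSTM features both start from weekly keyword counts. The sketch below assumes each tweet is a hypothetical `(date, text, state)` tuple, tokenizes on whitespace, and approximates the HFSTM variant by counting only tweets whose inferred state is `I`; the paper's exact tokenization and attribution of keywords to states may differ.

```python
from collections import Counter
from datetime import date


def weekly_keyword_counts(tweets, keywords, state_filter=None):
    """Count keyword mentions per ISO (year, week).

    tweets:       iterable of (date, text, state); state is an inferred label such as
                  'S', 'E', 'I', or None when no state model is used (baseline).
    state_filter: if given (e.g. 'I'), only tweets with that state are counted.
    """
    vocab = {k.lower() for k in keywords}
    counts = Counter()
    for day, text, state in tweets:
        if state_filter is not None and state != state_filter:
            continue
        year, week, _ = day.isocalendar()
        tokens = (tok.strip(".,!?;:") for tok in text.lower().split())
        counts[(year, week)] += sum(tok in vocab for tok in tokens)
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    sample = [(date(2013, 12, 2), "I have flu and fever", "I"),
              (date(2013, 12, 3), "Got my flu shot today", "S"),
              (date(2013, 12, 10), "Fever again!", "I")]
    print(weekly_keyword_counts(sample, ["flu", "fever"]))
    print(weekly_keyword_counts(sample, ["flu", "fever"], state_filter="I"))
```

The S-state tweet about a flu shot is exactly the kind of mention that inflates the keyword-only baseline; filtering by state drops it.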

  17. Flu trend fitting • Linear regression to the case count reported by PAHO (the ground-truth)
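With one weekly feature (for example, the I-state keyword count), the regression step reduces to ordinary least squares on two aligned series. The sketch below assumes the feature and the PAHO case counts are already matched week by week; the toy numbers are hypothetical. With several keyword features, as in the baseline, a multivariate solver such as `numpy.linalg.lstsq` would take its place.

```python
def fit_linear(x, y):
    """Ordinary least squares fit y ≈ a + b*x; returns (a, b, r_squared)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("feature is constant; slope is undefined")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot if ss_tot else 1.0
    return a, b, r2


if __name__ == "__main__":
    weekly_feature = [12, 30, 55, 41, 20]        # hypothetical weekly keyword counts
    paho_cases = [150, 340, 610, 470, 230]       # hypothetical weekly case counts
    a, b, r2 = fit_linear(weekly_feature, paho_cases)
    print(f"cases ≈ {a:.1f} + {b:.2f} * count  (R^2 = {r2:.3f})")
```

Comparing the R² (or the error on the test periods) across the baseline, Google Flu Trends and HFSTM features is one way to rank the three inputs against the same ground truth.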

  18. Bridging the Ep. & So. Gap • Select a flu-related keyword • Plot its number of mentions over time • Identify the fall-part • Fit the fall-part with exponential and power-law functions.
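One simple way to identify the fall-part, assumed here because the slides do not define it precisely, is to take the series from its peak onward and stop at the first time step with no mentions, so that logarithms stay defined for the later fits.

```python
def fall_part(counts):
    """Return (t, n): the decaying segment of a mention series, starting at its peak.

    t starts at 1 so that a power law n0 * t**(-alpha) is defined at the first point;
    the segment is cut at the first non-positive count so logarithms stay defined.
    """
    if not counts:
        return [], []
    peak = max(range(len(counts)), key=counts.__getitem__)
    tail = []
    for value in counts[peak:]:
        if value <= 0:
            break
        tail.append(value)
    return list(range(1, len(tail) + 1)), tail


if __name__ == "__main__":
    mentions = [3, 10, 42, 25, 14, 9, 6, 0, 2]  # hypothetical weekly mention counts
    print(fall_part(mentions))
```

Cutting at the first zero is a conservative choice; smoothing the series or ending at a fixed horizon after the peak are reasonable alternatives.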

  19. Bridging the Ep. & So. Gap • Fitting the fall-part with power-law and exponential functions
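A common way to compare the two shapes is to fit each one as a straight line after a log transform. This sketch follows that assumption (the slides do not say how the fits were computed): least squares on log n versus t for the exponential, and on log n versus log t for the power law, reporting the squared error in log space. A nonlinear fit, for example with `scipy.optimize.curve_fit`, weights residuals differently and may give somewhat different parameters.

```python
import math


def _ols(x, y):
    """Least-squares intercept and slope of y ≈ c + m*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    m = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - m * mx, m


def fit_decay(t, n):
    """Fit the fall-part n(t) with an exponential and a power law, in log space.

    exponential: log n = log n0 - lam * t
    power law:   log n = log n0 - alpha * log t
    t must be positive (e.g. 1, 2, ...) and n strictly positive.
    """
    log_n = [math.log(v) for v in n]
    log_t = [math.log(v) for v in t]
    c_exp, m_exp = _ols(t, log_n)
    c_pow, m_pow = _ols(log_t, log_n)
    sse_exp = sum((c_exp + m_exp * ti - y) ** 2 for ti, y in zip(t, log_n))
    sse_pow = sum((c_pow + m_pow * lt - y) ** 2 for lt, y in zip(log_t, log_n))
    return {
        "exponential": {"n0": math.exp(c_exp), "lam": -m_exp, "sse_log": sse_exp},
        "power_law": {"n0": math.exp(c_pow), "alpha": -m_pow, "sse_log": sse_pow},
    }


if __name__ == "__main__":
    t = [1, 2, 3, 4, 5, 6]
    n = [42, 25, 14, 9, 6, 5]  # hypothetical fall-part of a keyword's mentions
    for shape, fit in fit_decay(t, n).items():
        print(shape, fit)
```

A clearly lower `sse_log` for the power law on social-media counts would be consistent with Observation 2, while case counts would be expected to favor the exponential.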

  20. Outline • Observations • HFSTM Model • Inference • Experiments • Conclusion • Future work

  21. Conclusions • HFSTM: • infers biological states for Twitter users. • learns word distributions and state transitions. • helps predict the flu trend. • reconciles the social contagion activity profile with standard epidemiological models.

  22. Outline • Observations • HFSTM Model • Inference • Experiments • Conclusion • Future work

  23. Future work • A possible issue with HFSTM: • It may suffer from a large, noisy vocabulary. • Semi-supervision for improvement: • Introduce weak supervision into HFSTM.

  24. Code at: http://people.cs.vt.edu/~liangzhe • Questions? • Naren Ramakrishnan, Liangzhe Chen, K. S. M. Tozammel Hossain, Patrick Butler, B. Aditya Prakash • Funding:
