
Learning About Medicine by Applying Machine Learning to User Generated Content: The Case of Anorexia



Presentation Transcript


  1. Learning About Medicine by Applying Machine Learning to User Generated Content: The Case of Anorexia Elad Yom-Tov Microsoft Research Israel

  2. Why medicine?
     • People use the Internet extensively:*
       • More than 77% of the US population uses the Internet
       • Every day, 55% of Americans use the Internet. They do so for an average of two hours.
       • More than 80% of Internet users search for medical information online, and significant medically-related activities happen on the Internet
     • Large-scale medical trials are expensive and time consuming.
     • Making sense of Internet data requires processing large amounts of data to produce meaningful insights
     * Pew survey, 2010

  3. Anorexia Nervosa

  4. A lifestyle choice?
     “Thin is perfection, I'll die trying to achieve it”
     “Anorexia is a lifestyle, not a diet”
     “I only feel beautiful when I'm hungry”

  5. Contacts

  6. Data: Users
     • All users who posted at least two photographs with a relevant tag (“thinspo”, “thinspiration”, “pro-ana”)
       • 162 users
     • All users who posted to eating disorder groups on Flickr
       • 71 users
     • Users who commented on or favorited at least two of the above-mentioned photos
       • 683 users

  7. Data: Photos and links
     • Raw data:
       • 543,891 photographs
       • 2,229,489 comments
       • 642,317 favorite markings
       • 237,165 contact links
     • Labeling:
       • Users were labeled on a 5-point scale.
       • Kappa = 0.51 (p < 10^-5)
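The slide reports Kappa = 0.51 but does not name the variant or the number of raters. As a minimal sketch, assuming two raters and Cohen's kappa, the agreement statistic could be computed as follows. The function name and the label lists are hypothetical and are not data from the study.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labeled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label
    return (observed - expected) / (1.0 - expected)


# Hypothetical 5-point labels for eight users (illustration only).
labels_a = [1, 2, 3, 4, 5, 1, 2, 3]
labels_b = [1, 2, 3, 4, 5, 2, 2, 4]
kappa = cohens_kappa(labels_a, labels_b)
```

Kappa corrects raw agreement for the agreement expected by chance, which is why it is usually preferred over a plain percentage when labels are unevenly distributed.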

  8. Contacts Comments Tags Favorites

  9. Tag similarity
     • Modeled users with a TF-IDF weighted bag-of-tags
     • Average cosine similarity:
       • Pro-anorexia: 0.259
       • Pro-recovery: 0.202
       • Pro-recovery to pro-anorexia: 0.225
     • ROC: 0.52
     • Tag usage:
       • “thinspiration”: 37% pro-anorexia, 7% pro-recovery
       • “pro-anorexia”: 1.7% pro-anorexia, 2.4% pro-recovery
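The TF-IDF bag-of-tags comparison can be sketched as below. The slide does not give the exact weighting, so this assumes raw tag counts multiplied by log(N / document frequency). The user names and tag lists are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(user_tags):
    """Map each user to a sparse {tag: weight} TF-IDF vector."""
    n_users = len(user_tags)
    # Document frequency: number of users who used each tag at least once.
    df = Counter()
    for tags in user_tags.values():
        df.update(set(tags))
    vectors = {}
    for user, tags in user_tags.items():
        tf = Counter(tags)
        vectors[user] = {t: c * math.log(n_users / df[t]) for t, c in tf.items()}
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical users and tags (illustration only).
user_tags = {
    "user_a": ["thinspo", "thinspo", "beach"],
    "user_b": ["thinspo", "sunset"],
    "user_c": ["recovery", "sunset"],
}
vectors = tfidf_vectors(user_tags)
sim_ab = cosine(vectors["user_a"], vectors["user_b"])
sim_ac = cosine(vectors["user_a"], vectors["user_c"])
```

Averaging such pairwise similarities within and across the two groups would give figures of the kind listed on the slide. Averages that are this close together, along with an ROC value near 0.5, suggest that tags alone separate the groups poorly.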

  10. Is exposing pro-anorexia users to pro-recovery comments an effective intervention?

  11. Hazard model

  12. How do they get there?

  13. Data
     • Toolbar data over a period of 5 months, in which we identified two types of behavior:
     • Anorexia queries: we define anorexic activity searching (AAS) as one of the following:
       • Tips for proana or anorexia
       • “how to … ” and proana or anorexia
       • Proana buddy
     • Celebrity queries:
       • One of 3,640 known celebrities
       • Each scored for the probability of appearing in conjunction with the word “anorexia”
       • We refer to this probability as the Perceived Anorexia Score (PAS).
     • A total of 5,800,270 users searched for at least one celebrity in the top 2.5% of PAS, of whom 3,615 also made AASs.
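The slide does not say how the PAS probability was estimated. One simple reading, shown here purely as an assumption, is the fraction of queries mentioning a celebrity that also contain the word “anorexia”. The function name, celebrity names, and queries are all hypothetical.

```python
def perceived_anorexia_scores(queries, celebrities):
    """Estimate P("anorexia" in query | celebrity in query) per celebrity."""
    lowered = [q.lower() for q in queries]
    scores = {}
    for name in celebrities:
        # All queries that mention this celebrity by name.
        mentions = [q for q in lowered if name.lower() in q]
        if mentions:
            scores[name] = sum("anorexia" in q for q in mentions) / len(mentions)
        else:
            scores[name] = 0.0  # no evidence either way
    return scores


# Hypothetical query log (illustration only).
queries = [
    "alice example anorexia",
    "alice example new movie",
    "bob example concert",
    "bob example tour dates",
]
pas = perceived_anorexia_scores(queries, ["alice example", "bob example"])
```

With scores like these, the "top 2.5% of PAS" group on the slide would correspond to the celebrities with the highest estimated probabilities.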

  14. Clustering
     • Start with a matrix of users by celebrities
       • 9,188,983 users by 3,640 celebrities
     • Cluster using k-means with cosine similarity
     • Clusters are statistically significant by PAS, but not by occupation.
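K-means with cosine similarity is often implemented as spherical k-means: rows and centroids are scaled to unit length, so the dot product equals the cosine. The sketch below takes that approach on a tiny dense matrix. It is an illustration under that assumption, not the implementation used in the talk, and the matrix values are hypothetical.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def spherical_kmeans(rows, k, iterations=20, seed=0):
    """Cluster rows by cosine similarity; returns (assignment, centroids)."""
    rng = random.Random(seed)
    data = [normalize(r) for r in rows]
    centroids = [list(c) for c in rng.sample(data, k)]
    assignment = [0] * len(data)
    for _ in range(iterations):
        # Assign each row to the centroid with the highest cosine similarity.
        assignment = [
            max(range(k), key=lambda j: dot(x, centroids[j])) for x in data
        ]
        # Recompute each centroid as the normalized mean of its members.
        for j in range(k):
            members = [x for x, a in zip(data, assignment) if a == j]
            if members:
                mean = [sum(col) / len(members) for col in zip(*members)]
                centroids[j] = normalize(mean)
    return assignment, centroids


# Hypothetical users-by-celebrities query counts (4 users, 4 celebrities).
matrix = [
    [5, 0, 0, 1],
    [3, 0, 0, 0],
    [0, 4, 2, 0],
    [0, 1, 1, 0],
]
assignment, centroids = spherical_kmeans(matrix, k=2)
```

At the scale given on the slide, a matrix of this kind would need a sparse representation. The dense lists here only keep the example short.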

  15. Hazard models

  16. Adding the media effect
     • The Spearman correlation between the number of queries for a celebrity and the number of tweets was 0.63, so the bigger the peak (the “media buzz”), the more searches will occur.
     • When focusing on queries and tweets which mentioned anorexia, this correlation is 0.68.
     • AAS searchers were 1.9 times more likely to query for a high PAS celebrity in the days following a media peak compared to all other people, and 2.4 times more likely when the peak was associated with anorexia.
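Spearman correlation is the Pearson correlation of the ranks of two series, so it measures monotonic rather than strictly linear association. A minimal sketch with average ranks for ties follows. The query and tweet counts are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither series is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical daily counts for one celebrity (illustration only).
query_counts = [10, 20, 30, 40]
tweet_counts = [1, 4, 9, 100]
rho = spearman(query_counts, tweet_counts)
```

Because only ranks matter, a single very large spike in tweets does not dominate the statistic, which suits bursty "media buzz" data.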

  17. Hazard models revisited

  18. Why is this interesting?

  19. Summary
     • As people spend ever more time on the Internet, they generate content which we can use to understand (and later hopefully improve) health and healthcare
     • This content is especially useful when:
       • People have less of an incentive to lie, compared to the real world
       • Collecting data in the real world is hard
       • Activity is largely web-driven
     • BUT: Making sense of so much data requires integrating Machine Learning research with medical practice.

  20. Questions?
