How valuable is medical social media data? Content analysis of the medical web

Presenter: Tsai TzungRuei • Authors: Kerstin Denecke, Wolfgang Nejdl • National Yunlin University of Science and Technology (國立雲林科技大學) • InfSci 2009

Outline • Motivation • Objective • Methodology • Experiments • Conclusion • Comments

Motivation • Given the large amount and diversity of available information, it is still an open question where to search in order to satisfy a specific information need. • Finding the best knowledge source for a specific information need is difficult, because relevant information can be either hidden in web pages or encapsulated in social media data such as blogs and Q&A portals. • We focus on health-related information provided on the Internet. Why? First, health-related experiences and medical histories offer unique data for research purposes, for practitioners, and for patients. Second, it is still an open question whether existing text and content analysis tools are able to process medical social media data and to identify relevant (medical) information in it.

Objective • To give an overview of content differences among the various social media resources on health-related topics. • In particular, the content of medical Question & Answer portals, medical weblogs, medical reviews and wikis is compared.

Methodology • Research questions: Which topics do the different health-related web resources focus on? What similarities and differences in content exist between different medical social media data resources? To what extent do medical blogs contain information or experiences? • Data collection: Question & Answer forums, medical weblogs, reviews, wikis and encyclopedias • Data analysis: assessing the medical content; assessing the information type of documents

Methodology • Data collection • Question & Answer forums • Medical weblogs • Reviews • Wikis

Methodology • Data analysis • Assessing the medical content: medical text → SeReMeD → a semantic representation

Methodology • Data analysis • Assessing the information type of documents • Assumptions: 1. Extensive use of medical terminology is an indication of informative content. 2. Adjectives are an indication of affective content.
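The two assumptions can be illustrated with a minimal Python sketch. It is not the authors' implementation: the word lists `MEDICAL_TERMS` and `ADJECTIVES` and the function `information_type` are hypothetical stand-ins, and a real system would presumably rely on a medical vocabulary and a part-of-speech tagger rather than fixed toy lexicons.

```python
import re

# Hypothetical toy lexicons, for illustration only (assumptions, not from the paper).
MEDICAL_TERMS = {"diabetes", "insulin", "hypertension", "dosage", "symptom", "diagnosis"}
ADJECTIVES = {"terrible", "happy", "awful", "great", "scared", "sad"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def information_type(text):
    """Label a post 'informative', 'affective', or 'unknown'.

    Assumption 1: a high share of medical terms suggests informative content.
    Assumption 2: a high share of adjectives suggests affective content.
    """
    tokens = tokenize(text)
    if not tokens:
        return "unknown"
    medical_share = sum(t in MEDICAL_TERMS for t in tokens) / len(tokens)
    adjective_share = sum(t in ADJECTIVES for t in tokens) / len(tokens)
    if medical_share > adjective_share:
        return "informative"
    if adjective_share > medical_share:
        return "affective"
    return "unknown"


if __name__ == "__main__":
    print(information_type("Insulin dosage depends on the diagnosis of diabetes."))
    print(information_type("I felt terrible and scared, it was awful."))
```

In this sketch the first example post is labeled informative and the second affective; ties and empty posts fall back to "unknown" rather than forcing a label.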

Experiments • The manually classified posts of physicians and patients

Experiments • Result 1

Experiments • Excluded data written by nurses • Training results • Assumption: more patient-written posts are informative than nurse-written posts. • Result: contradiction

Conclusion • MAJOR CONTRIBUTION • Several conclusions can be drawn from the aforementioned results. Our hypotheses proved to be only partly true. • Provides an overview of the content available in the (medical) Web. • Disorders and Physiology: weblogs and Q&A • Anatomy and Procedures: AskDrWiki and MedlinePlus • Drugs: drug reviews and AskDrWiki • FUTURE WORK • Plan to test the proposed method for identifying informative (or ‘good’) answers to health-related queries in rather general Q&A portals such as Yedda, where answers can be given by any person. This could help to filter out comments and irrelevant answers.

Comments • Advantage • Helps to identify the best-suited information source for satisfying a specific information need. • A potential application of the algorithm is sorting or ranking search results within a blog post search engine. • Drawback • This work is unique in the sense that such an analysis was still missing, in particular for the domain of medicine. • Application • Information retrieval
