
Resolving Healthcare Forum Posts via Similar Thread Retrieval

Jason H.D. Cho (1,2), Parikshit Sondhi (1), Chengxiang Zhai (1), Bruce R. Schatz (1,2,3). (1) Department of Computer Science, (2) Institute of Genomic Biology, (3) Department of Medical Information Science, University of Illinois at Urbana-Champaign.


Presentation Transcript


  1. Resolving Healthcare Forum Posts via Similar Thread Retrieval Jason H.D. Cho (1,2), Parikshit Sondhi (1), Chengxiang Zhai (1), Bruce R. Schatz (1,2,3) (1) Department of Computer Science, (2) Institute of Genomic Biology, (3) Department of Medical Information Science, University of Illinois at Urbana-Champaign, Urbana, IL

  2. Motivation • 72% of internet users looked online for health information within the past year • 18% of internet users have gone online to find others who might have health concerns similar to theirs • Improving health information retrieval and similar case retrieval will improve search quality for the vast majority of users • Few posts are answered in a timely manner! * Pew Research http://www.pewinternet.org/

  3. Motivation

  4. Envisioned Response • The following threads discuss similar problems: • Doritos Allergy Very Severe and New • Certain Foods + Beer = Flushing and Head Pounding…Help! • Peanut/Food Allergies • …

  5. Case Retrieval Task • Traditionally defined as retrieving relevant cases that doctors may be interested in • Doctors may want to compare cases that are similar to the current patient • In the online domain, we define this as retrieving forum posts written by patients • We tackled cases where we do not know the user's background

  6. Query Characteristics • Queries are written for human experts, not automated systems • Simple, non-technical language • Presence of emotional statements

  7. Document Characteristics

  8. Our Goal • How can we improve the case retrieval task? • How should we represent queries? • Entity-based search or context-based search? • Which posts are most informative in a given thread? • Can we utilize forum categories?

  9. Evaluation via Pooling • 350K threads and 20 queries from HealthBoards • 2 judges first judged 100 query-thread pairs • 88% agreement (κ=0.76) • 730 total judged query-thread pairs • 324 relevant • 406 irrelevant
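The agreement figures on this slide are linked by Cohen's kappa, κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e the agreement expected by chance. Plugging in the reported p_o = 0.88 and κ = 0.76 gives p_e = (0.88 − 0.76) / (1 − 0.76) = 0.12 / 0.24 = 0.5. The minimal Python sketch below assumes binary relevant/irrelevant labels from the two judges. The label lists used in the test are made up for illustration, not the study's judgments.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(judge_a: Sequence[str], judge_b: Sequence[str]) -> float:
    """Cohen's kappa for two judges labeling the same query-thread pairs."""
    if len(judge_a) != len(judge_b) or not judge_a:
        raise ValueError("both judges must label the same non-empty set of pairs")
    n = len(judge_a)
    # Observed agreement: fraction of pairs where both judges gave the same label.
    p_o = sum(a == b for a, b in zip(judge_a, judge_b)) / n
    # Chance agreement: sum over labels of the product of each judge's marginal rate.
    count_a, count_b = Counter(judge_a), Counter(judge_b)
    labels = set(count_a) | set(count_b)
    p_e = sum((count_a[lab] / n) * (count_b[lab] / n) for lab in labels)
    if p_e == 1.0:
        return 1.0  # degenerate case: both judges used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


def implied_chance_agreement(p_o: float, kappa: float) -> float:
    """Solve kappa = (p_o - p_e) / (1 - p_e) for p_e."""
    return (p_o - kappa) / (1 - kappa)


# Reported figures: 88% raw agreement, kappa = 0.76.
print(round(implied_chance_agreement(0.88, 0.76), 2))  # 0.5
```

A chance agreement of 0.5 is what two judges would reach if, for example, each labeled about half of the pairs relevant.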

  10. Method Summary Q: How should we represent queries? • Baseline weighting • First Post BM-25 • Thread BM-25 • Semantic weighting • Medical term extraction • Shallow Information Extraction • Post weighting • Monotonic weighting • Parabolic weighting • Forum Category weighting • Uniform weighting (FCUW) • Feedback weighting (FCFW)

  11. State of the Art Baseline • Baseline BM-25 formula, where c(w,t) is the count of word w in thread t and c(w,q) is the count of word w in query q • FPBM-25: consider only the content of the first post to represent the thread document • TBM-25: consider the content of the entire thread to represent the thread document
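The formula image on this slide did not survive extraction. For reference, here is a sketch of standard Okapi BM25 with a query-term-frequency factor, using c(w,t) and c(w,q) as defined above. The parameter defaults (k1 = 1.2, b = 0.75, k3 = 1000) and the non-negative IDF variant are common choices assumed here, not values confirmed by the slides. The helper `thread_representation` is hypothetical. It shows the only difference between FPBM-25 (first post only) and TBM-25 (all posts concatenated).

```python
import math
from collections import Counter
from typing import Dict, List


def bm25_score(query: List[str], doc: List[str], df: Dict[str, int],
               n_docs: int, avg_len: float,
               k1: float = 1.2, b: float = 0.75, k3: float = 1000.0) -> float:
    """Okapi BM25 score of a tokenized thread representation for a tokenized query."""
    c_q = Counter(query)  # c(w,q): count of word w in the query
    c_t = Counter(doc)    # c(w,t): count of word w in the thread representation
    score = 0.0
    for w, qtf in c_q.items():
        tf = c_t.get(w, 0)
        if tf == 0:
            continue
        n_w = df.get(w, 0)
        # Non-negative IDF variant (assumption; several BM25 IDF variants exist).
        idf = math.log(1 + (n_docs - n_w + 0.5) / (n_w + 0.5))
        doc_part = tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc) / avg_len))
        query_part = qtf * (k3 + 1) / (qtf + k3)
        score += idf * doc_part * query_part
    return score


def thread_representation(posts: List[List[str]], first_post_only: bool) -> List[str]:
    """FPBM-25 keeps only the first post; TBM-25 concatenates every post in the thread."""
    if first_post_only:
        return list(posts[0])
    return [tok for post in posts for tok in post]


posts = [["chips", "allergy", "cramps"], ["see", "an", "allergy", "doctor"]]
query = ["chips", "allergy"]
df = {"chips": 3, "allergy": 5}  # made-up document frequencies
for first_only in (True, False):
    rep = thread_representation(posts, first_post_only=first_only)
    print(first_only, round(bm25_score(query, rep, df, n_docs=100, avg_len=5.0), 3))
```

Because the representation is built before scoring, the same scoring function serves both baselines.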

  12. Results: Query Representation Comparison Representing the first post as the query works better than using all of the posts

  13. Method Summary Q: Which one works better? Entity-based search, or context-based search? • Baseline weighting • First Post BM-25 • Thread BM-25 • Semantic weighting • Medical term extraction • Shallow Information Extraction • Post weighting • Monotonic weighting • Parabolic weighting • Forum Category weighting • Uniform weighting (FCUW) • Feedback weighting (FCFW)

  14. Medical Entity Extraction • Applied the ADEPT toolkit (MacLean and Heer 2013) • High precision but low recall

  15. MedicalEx: Relevance Scoring • Modified query frequency • Count of occurrences labeled as a medical entity • Count of occurrences not labeled as a medical entity
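The slide lists the ingredients of the modified query frequency but not how they are combined. One plausible reading, sketched below under that assumption, is a weighted sum in which occurrences tagged as medical entities count more than untagged ones. The function name and the weights `entity_weight = 2.0` and `other_weight = 1.0` are hypothetical. Entity positions would come from a tagger such as ADEPT. Here they are passed in precomputed so the sketch does not depend on any toolkit API.

```python
from collections import Counter
from typing import Dict, List, Set


def medicalex_query_frequency(query_tokens: List[str], entity_positions: Set[int],
                              entity_weight: float = 2.0,
                              other_weight: float = 1.0) -> Dict[str, float]:
    """Hypothetical modified query frequency c'(w,q).

    c'(w,q) = entity_weight * (occurrences of w labeled as a medical entity)
            + other_weight  * (occurrences of w not labeled as a medical entity)
    """
    labeled = Counter(tok for i, tok in enumerate(query_tokens) if i in entity_positions)
    unlabeled = Counter(tok for i, tok in enumerate(query_tokens) if i not in entity_positions)
    words = set(labeled) | set(unlabeled)
    return {w: entity_weight * labeled[w] + other_weight * unlabeled[w] for w in words}


query = ["stomach", "cramps", "after", "eating", "chips"]
# Pretend an entity tagger marked "stomach cramps" (token positions 0 and 1).
weights = medicalex_query_frequency(query, entity_positions={0, 1})
print(weights["cramps"], weights["chips"])  # 2.0 1.0
```

Under this reading, c'(w,q) would take the place of c(w,q) in the BM-25 baseline from slide 11, leaving the document side unchanged.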

  16. Shallow Information Extraction • Example post: I am severly allergic to some product that is found in both Tostitos and Doritos, as well as random other types of chips. I know the solution is "don't eat chips" but what could the product be? I don't want to accidentally consume it. When I eat this, I get very bad stomach cramps and it ruins the rest of my day/night - the only solution is to go to sleep so I can't feel it. Help! Any ideas on this? • Sentence classes (Sondhi, 2010): • Physical Examination (PE): disease, symptoms • Medication (MED): treatment, prevention • Background (BKG): neither PE nor MED

  17. ShallowEx: Relevance Scoring • Modified query count • Word count in PE sentences • Word count in MED sentences • Word count in BKG sentences • Gives higher importance to PE and MED sentences
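A minimal sketch of the modified query count, assuming it is a weighted sum of per-class word counts in which PE and MED sentences receive a larger weight than BKG sentences. The weights in `DEFAULT_WEIGHTS` (2.0, 2.0, 1.0) and the function name are hypothetical. The slides only state that PE and MED sentences get higher importance. Sentence labels would come from the shallow extraction classifier. The labeled sentences below are made up.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical class weights: PE and MED sentences count more than background ones.
DEFAULT_WEIGHTS = {"PE": 2.0, "MED": 2.0, "BKG": 1.0}


def shallowex_query_count(labeled_sentences: List[Tuple[str, List[str]]],
                          class_weights: Dict[str, float] = DEFAULT_WEIGHTS) -> Dict[str, float]:
    """Weighted query count: sum over sentences of (class weight) * (word count)."""
    counts: Dict[str, float] = defaultdict(float)
    for label, tokens in labeled_sentences:
        weight = class_weights[label]
        for tok in tokens:
            counts[tok] += weight
    return dict(counts)


sentences = [
    ("BKG", ["know", "solution", "chips"]),
    ("PE", ["bad", "stomach", "cramps"]),
    ("MED", ["only", "solution", "sleep"]),
    ("BKG", ["help", "any", "ideas"]),
]
q = shallowex_query_count(sentences)
print(q["solution"], q["cramps"], q["help"])  # 3.0 2.0 1.0
```

A word that appears in several sentence types, like "solution" here, accumulates weight from each of them.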

  18. Results: Semantic Methods Shallow extraction is better than medical entity extraction

  19. Method Summary Q: Which posts are most informative in a given thread? • Baseline weighting • First Post BM-25 • Thread BM-25 • Semantic weighting • Medical term extraction • Shallow Information Extraction • Post weighting • Monotonic weighting • Parabolic weighting • Forum Category weighting • Uniform weighting (FCUW) • Feedback weighting (FCFW)

  20. Post Weighting • Not all posts are equally representative (Sondhi, 2013)

  21. Post Weighting • A weighting function gives the weight of post i in a thread with K posts
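One way a per-post weight can enter the baseline is through the thread-level count, which becomes a weighted sum over posts: c(w,t) = Σ_i weight(i, K) · c(w, p_i). This aggregation and the function name `weighted_thread_counts` are assumptions for illustration. The weighting function is passed in, so any scheme can be plugged in. With a uniform weight of 1, it reduces to the TBM-25 counts. With weight 1 for the first post and 0 elsewhere, it reduces to FPBM-25.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# (position i, 1-based; thread length K) -> weight of post i
WeightFn = Callable[[int, int], float]


def weighted_thread_counts(posts: List[List[str]], weight: WeightFn) -> Dict[str, float]:
    """c(w,t) = sum_i weight(i, K) * c(w, p_i) over the K posts of a thread."""
    k = len(posts)
    counts: Dict[str, float] = defaultdict(float)
    for i, post in enumerate(posts, start=1):
        w_i = weight(i, k)
        for tok in post:
            counts[tok] += w_i
    return dict(counts)


posts = [["chips", "allergy"], ["allergy", "test"], ["thanks"]]
uniform = weighted_thread_counts(posts, lambda i, k: 1.0)              # TBM-25 counts
first_only = weighted_thread_counts(posts, lambda i, k: float(i == 1))  # FPBM-25 counts
print(uniform["allergy"], first_only["allergy"])  # 2.0 1.0
```

Post weighting schemes then sit between these two extremes.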

  22. Monotonic Post Weighting [Figure: relative post weight vs. post position i, for K = 10]

  23. Parabolic Post Weighting
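The slides name two weighting shapes but do not give their formulas. The functions below are hypothetical stand-ins, meant only to make the shapes concrete for a thread of K posts. The monotonic scheme is a linear decay from 1 at the first post to 1/K at the last. The parabolic scheme is one possible U-shaped parabola equal to 1 at the first and last posts and 0 at the midpoint. Neither should be read as the authors' exact definition.

```python
from typing import Callable, List


def monotonic_weight(i: int, k: int) -> float:
    """Hypothetical linear decay: post 1 gets 1.0, post K gets 1/K."""
    return (k - i + 1) / k


def parabolic_weight(i: int, k: int) -> float:
    """Hypothetical U-shaped parabola: 1.0 at the first and last posts, 0.0 at the midpoint."""
    if k == 1:
        return 1.0
    mid = (k + 1) / 2
    return ((i - mid) / (1 - mid)) ** 2


def weight_profile(k: int, scheme: Callable[[int, int], float]) -> List[float]:
    """Weights for post positions i = 1..K, rounded for display."""
    return [round(scheme(i, k), 3) for i in range(1, k + 1)]


print(weight_profile(10, monotonic_weight))
# [1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]
print(weight_profile(10, parabolic_weight))
```

Either function can serve as the per-post weight of post i in a thread of K posts. The K = 10 profile mirrors the setting of the monotonic weighting figure.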

  24. Post Weighting Methods Evaluation

  25. Results: Post Weighting Both post weighting schemes outperform the baseline

  26. Method Summary Q: Can we utilize forum categories? • Baseline weighting • First Post BM-25 • Thread BM-25 • Semantic weighting • Medical term extraction • Shallow Information Extraction • Post weighting • Monotonic weighting • Parabolic weighting • Forum Category weighting • Uniform weighting (FCUW) • Feedback weighting (FCFW)

  27. Forum Categories

  28. Forum Category Weighting • Ratio of the current forum ID among retrieved documents • Randomly selecting a forum ID • Relevance feedback based on the top k retrieved categories • Forum Category Uniform weighting (FCUW) • Forum Category Feedback weighting (FCFW)

  29. Forum Category Weighting Scoring • Weights for forum category weighting • New score • Forum Category Feedback weighting
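The scoring formulas on these slides did not survive extraction. The sketch below illustrates feedback-style category reweighting under stated assumptions. Each forum's weight is its share of the top-k retrieved threads, and the new score multiplies the original score by (1 + beta · share). The multiplicative combination, the parameter names `top_k` and `beta`, and the function name are hypothetical. Scores are assumed non-negative, as with BM25 using a non-negative IDF. The forum IDs and scores below are made up.

```python
from collections import Counter
from typing import Dict, List, Tuple


def category_feedback_rerank(results: List[Tuple[str, str, float]], top_k: int = 10,
                             beta: float = 1.0) -> List[Tuple[str, float]]:
    """Rerank (thread_id, forum_id, score) triples using forum-category feedback.

    share(f)  = fraction of the top_k results that come from forum f
    new_score = score * (1 + beta * share(forum of the thread))
    """
    ranked = sorted(results, key=lambda r: r[2], reverse=True)
    top = ranked[:top_k]
    if not top:
        return []
    forum_counts = Counter(forum for _, forum, _ in top)
    share: Dict[str, float] = {f: c / len(top) for f, c in forum_counts.items()}
    rescored = [(tid, score * (1 + beta * share.get(forum, 0.0)))
                for tid, forum, score in ranked]
    return sorted(rescored, key=lambda r: r[1], reverse=True)


results = [("t1", "allergies", 9.0), ("t2", "digestive", 8.8),
           ("t3", "allergies", 8.5), ("t4", "allergies", 7.0)]
reranked = category_feedback_rerank(results, top_k=3)
print([tid for tid, _ in reranked])  # ['t1', 't3', 't2', 't4']
```

Threads from the forum that dominates the initial top k (here "allergies", with two of the top three) move up. In this example t3 overtakes t2.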

  30. Results: Forum Category Weighting Uniform weighting and feedback weighting perform similarly, but FCFW has fewer parameters to tune.

  31. Results: Method Combinations Monotonic + ShallowEx performs the best

  32. Conclusion • Fairly high precision at 5 (P@5) is achievable • Treating the first post as the query performed better than using all posts in the thread • Shallow information extraction is better for query understanding • Incorporates contextual information • Utility of posts drops steadily with position • Easy extension of the baseline method
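P@5 is the fraction of the top five retrieved threads that were judged relevant. The minimal sketch below treats unjudged threads as non-relevant, a common convention with pooled judgments, though whether the study did the same is an assumption. It also divides by k even when fewer than k threads were retrieved. The thread IDs and judgments are made up for illustration.

```python
from typing import List, Set


def precision_at_k(ranked_ids: List[str], relevant_ids: Set[str], k: int = 5) -> float:
    """Fraction of the top-k retrieved thread IDs that were judged relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    top = ranked_ids[:k]
    # Unjudged threads count as non-relevant; divide by k even if fewer were retrieved.
    return sum(tid in relevant_ids for tid in top) / k


print(precision_at_k(["t1", "t2", "t3", "t4", "t5", "t6"], {"t1", "t3", "t4", "t6"}))  # 0.6
```

Averaging this value over all evaluation queries gives a single P@5 figure for a method.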

  33. Future Work • Recommending relevant forum posts for doctors • Various online forums have an 'ask a doctor' section • Doctors could save time by recommending existing forum posts • Intent-based case retrieval • Identifying intents for both the end user and the existing posts should improve search quality • Examples: cause of symptom, managing disease, adverse effects

  34. Acknowledgements This work is supported in part by the National Science Foundation under Grant Number CNS-1027965. We would also like to thank the anonymous reviewers for their invaluable feedback, and the Institute of Genomic Biology for their computing resources.

  35. Questions? Thank you!

  36. References • J. H. D. Cho, V. Q. Liao, Y. Jiang, and B. Schatz. Aggregating Personal Health Messages for Scalable Comparative Effectiveness Research. ACM BCB, 2013. • J. H. D. Cho, P. Sondhi, C. Zhai, and B. Schatz. Resolving Healthcare Forum Posts via Similar Thread Retrieval. ACM BCB, 2014. • K. Pattabiraman, P. Sondhi, and C. Zhai. Exploiting Forum Thread Structures to Improve Thread Clustering. ICTIR, 2013. • P. Sondhi, M. Gupta, C. Zhai, and J. Hockenmaier. Shallow Information Extraction from Medical Forum Data. COLING, 2010. • B. W. Chee, R. Berlin, and B. Schatz. Predicting Adverse Drug Events from Personal Health Messages. AMIA, 2011. • D. L. MacLean and J. Heer. Identifying Medical Terms in Patient-Authored Text: A Crowdsourcing-Based Approach. Journal of the American Medical Informatics Association, amiajnl-2012-001110, May 2013.

  37. Features & Performance of Shallow Information Extraction Method

  38. ShallowEx: Extraction Model We use the best-performing SVM-based classifier (posts: 175, sentences: 1,494)
