
MedSearch: A Specialized Search Engine for Medical Information Retrieval

MedSearch: A Specialized Search Engine for Medical Information Retrieval. Presented by Navyasri Canumalla. Paper by Gang Luo, Chunqiang Tang, Hao Yang, Xing Wei. Overview: Motivation, MedSearch Objectives, Challenges, Approach, Implementation, Experimental Setup, Conclusion.


Presentation Transcript


  1. MedSearch: A Specialized Search Engine for Medical Information Retrieval. Presented by Navyasri Canumalla. Paper by Gang Luo, Chunqiang Tang, Hao Yang, Xing Wei

  2. Overview • Motivation • MedSearch Objectives • Challenges • Approach • Implementation • Experimental Setup • Conclusion

  3. Motivation • A medical information searcher is uncertain about his exact question. • He prefers to give long queries that describe his symptoms in plain English. • He is unfamiliar with medical terminology and uses Web search afterwards to better digest the information obtained from doctors.

  4. Limitations of existing medical search engines: - They limit query length; e.g., Google and Healthline accept at most 32 words and 20 words, respectively. - They cannot suggest diversified, related medical phrases when the query is written in plain English. • A medical information searcher prefers a search engine that suggests diversified, related medical phrases, which help him quickly digest the search results and refine the query.

  5. MedSearch Objectives • Patient can use MedSearch to facilitate preliminary self-diagnosis. • Patient can use MedSearch to better prepare for doctor’s appointments. • Patient can use MedSearch to help him digest the information that he does not fully understand after consulting the doctor. • Patient can use MedSearch to find more information and clarify his symptom description.

  6. Challenges • Rewriting long queries without losing important information. • Ranking the suggested medical phrases, which requires resolving the terminological discrepancy between medical phrases and queries written in plain English.

  7. Approach • MedSearch crawls Web pages from a few selected, high-quality medical Web sites. • It computes a relevance score for each document with respect to the query and ranks the documents by that score. • MedSearch uses the Medical Subject Headings (MeSH) ontology, a standard vocabulary maintained by the National Library of Medicine, to generate medical phrases.

  8. Implementation MedSearch processes a medical query Q in the following steps: Step 1: Remove stopwords from Q. Step 2: If Q is too long, rewrite it into a moderate-length query. Step 3: Produce diversified search result pages. Step 4: Generate snippets. Step 5: Suggest related medical phrases.

  9. Step 2: Rewriting Queries • MedSearch uses a length threshold l_T = 10. • If ||Q|| > l_T, MedSearch treats Q as a long query and rewrites it into a moderate-length query Q’ by selectively dropping unimportant terms. • The terms in Q are ranked by their tf×idf values, and those with the largest tf×idf values are kept. • The length of the modified query has an upper bound U = 80.
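A minimal Python sketch of Steps 1 and 2 follows. It assumes a regex tokenizer, a small hypothetical stopword list, and a smoothed idf computed from document frequencies in the crawled collection. The slides do not specify MedSearch's exact tf×idf formula. They also do not say how l_T and U interact, so this sketch rewrites only queries longer than l_T and keeps at most U distinct terms, ranked by tf×idf. The function name rewrite_query and its parameters are illustrative.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list for illustration; MedSearch's actual list is not given.
STOPWORDS = {"a", "an", "and", "the", "i", "my", "me", "is", "am", "of", "to",
             "in", "on", "have", "has", "for", "with", "it", "that", "this"}

L_T = 10  # length threshold l_T from the slides
U = 80    # upper bound U on the rewritten query length from the slides


def rewrite_query(query, doc_freq, num_docs, l_t=L_T, u=U):
    """Step 1: drop stopwords. Step 2: if the query is long, keep high tf×idf terms."""
    terms = [t for t in re.findall(r"[a-z0-9]+", query.lower()) if t not in STOPWORDS]
    if len(terms) <= l_t:
        return terms  # not a long query: leave it as is

    tf = Counter(terms)  # term frequency within the query

    def tfidf(term):
        # Smoothed idf (an assumption): terms rare in the collection score higher.
        return tf[term] * math.log((num_docs + 1) / (doc_freq.get(term, 0) + 1))

    # Python's sort is stable, so ties keep their first-appearance order.
    keep = set(sorted(tf, key=tfidf, reverse=True)[:u])
    # Emit each kept term once, in the order it first appeared in the query.
    return [t for t in tf if t in keep]
```

For example, suppose l_t=5 and u=3, and the document frequencies make headache, dizzy, and nausea the rarest terms in a long symptom description. Then only those three terms survive, in their original order.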

  10. Step 3: Diversifying Search Results • Cluster the results in collection C into K clusters using K-means clustering. • A constant j = 20 is chosen, and each cluster contributes its result with the highest relevance score to the top-j results. • Both relevance and diversity are judged using a single metric, usefulness: if Web page P is useful, score_u(P) = 1; otherwise score_u(P) = 0.
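The diversification step can be sketched under a few stated assumptions. Pages are represented as dense feature vectors, although the slides do not say how MedSearch represents them. K-means is plain Lloyd's iteration with a fixed seed. "Each cluster contributes its best result" is read as follows: keep the most relevant page per cluster, then return the j most relevant of those representatives. The names kmeans and diversify are hypothetical.

```python
import random


def kmeans(vectors, k, iters=20, seed=0):
    """Lloyd's K-means; returns one cluster label per input vector."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]

    def nearest(v):
        # Index of the centroid with the smallest squared Euclidean distance to v.
        return min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])))

    labels = []
    for _ in range(iters):
        labels = [nearest(v) for v in vectors]
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def diversify(results, k, j):
    """results: list of (page_id, relevance_score, vector) tuples.

    Keeps the most relevant page of each cluster, then returns the j most
    relevant of these cluster representatives."""
    labels = kmeans([vec for _, _, vec in results], k)
    best = {}  # cluster label -> (page_id, relevance_score)
    for (page, score, _), lab in zip(results, labels):
        if lab not in best or score > best[lab][1]:
            best[lab] = (page, score)
    reps = sorted(best.values(), key=lambda ps: ps[1], reverse=True)
    return [page for page, _ in reps[:j]]
```

With K = 1500 and j = 20 as on the slides, at most one page per cluster can reach the top 20. Two highly relevant near-duplicates therefore cannot both crowd out a page from a different cluster.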

  11. For the returned top-20 Web pages, their weighted average usefulness score is defined as

  12. • When K is too small, relevant Web pages tend to gather in the same clusters, and each such cluster contributes only one page to the top results. • When K is too large, the clustering effect is not significant. • MedSearch uses K = 1500.
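The usefulness metric can be sketched as a weighted average of score_u over the returned pages, together with a simple way to pick K by comparing candidate values. The weights are left to the caller because the exact weighting is not given here; 1/rank weights would be one hypothetical choice. The names choose_k and evaluate are illustrative and not part of MedSearch.

```python
def weighted_usefulness(useful, weights):
    """Weighted average of score_u over the returned pages.

    useful[i] is score_u of the i-th returned page (1 if useful, else 0);
    weights[i] is its weight, supplied by the caller (hypothetical here)."""
    if len(useful) != len(weights) or not weights:
        raise ValueError("need one weight per returned page")
    return sum(w * u for w, u in zip(weights, useful)) / sum(weights)


def choose_k(candidate_ks, evaluate):
    """Pick the K whose evaluation score is highest.

    evaluate(k) is a hypothetical callback that runs the diversified search with
    K = k on a set of judged queries and returns their mean weighted usefulness."""
    return max(candidate_ks, key=evaluate)
```

For instance, weighted_usefulness([1, 0, 1], [1, 1/2, 1/3]) rewards a useful page at rank 1 more than one at rank 3.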

  13. Step 4: Generating Snippets • For each snippet sn, MedSearch highlights the medical phrases and the top-3 common terms between sn and the query Q. Step 5: Suggesting Related Medical Phrases • For each query, MedSearch suggests V related medical phrases, where V = 60.
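Step 4's highlighting can be sketched as below. The sketch makes two illustrative assumptions: the "top-3 common terms" are the query terms shared with the snippet that have the highest idf, and highlights are marked with square brackets. The function name highlight_snippet is also hypothetical. When a common term falls inside a highlighted medical phrase, only the longer phrase is bracketed.

```python
import re


def highlight_snippet(snippet, query_terms, medical_phrases, idf, top_n=3):
    """Bracket the medical phrases in a snippet and the top_n query terms it shares with Q."""
    snippet_words = set(re.findall(r"[a-z0-9]+", snippet.lower()))
    common = [t for t in dict.fromkeys(query_terms) if t in snippet_words]
    # Assumption: the "top" common terms are the ones with the highest idf.
    top_terms = sorted(common, key=lambda t: idf.get(t, 0.0), reverse=True)[:top_n]

    spans = []
    for pattern in list(medical_phrases) + top_terms:
        regex = r"\b" + re.escape(pattern) + r"\b"
        spans.extend(m.span() for m in re.finditer(regex, snippet, flags=re.IGNORECASE))

    # Scan left to right, prefer the longer match at the same start, skip overlaps.
    chosen = []
    for start, end in sorted(spans, key=lambda s: (s[0], s[0] - s[1])):
        if not chosen or start >= chosen[-1][1]:
            chosen.append((start, end))

    out, pos = [], 0
    for start, end in chosen:
        out.append(snippet[pos:start] + "[" + snippet[start:end] + "]")
        pos = end
    out.append(snippet[pos:])
    return "".join(out)
```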

  14. Sub-step 1: Generating the Candidate Set • MedSearch selects the V distinct medical phrases with the largest tf×idf values from the returned top-j Web pages to form a candidate set S. Sub-step 2: Ranking Medical Phrases • For each medical phrase M in S, MedSearch retrieves the top-ranked r Web pages in C for M and uses them as M’s representative Web pages. • It computes the relevance score between M and Q as a weighted average of the relevance scores between Q and M’s representative Web pages.
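The two sub-steps can be sketched as follows, assuming a list of known lowercase medical phrases (for example, drawn from MeSH) and a smoothed idf over the crawled collection. The callbacks search and query_relevance stand in for MedSearch's own retrieval and relevance scoring. The unweighted average simplifies the weighted average on the slide. All names here are hypothetical.

```python
import math
import re
from collections import Counter


def candidate_phrases(top_pages, known_phrases, doc_freq, num_docs, v=60):
    """Sub-step 1: the V distinct known phrases with the largest tf×idf in the top-j pages."""
    tf = Counter()
    for text in top_pages:
        for phrase in known_phrases:
            tf[phrase] += len(re.findall(r"\b" + re.escape(phrase) + r"\b", text.lower()))

    def tfidf(phrase):
        # Smoothed idf over the crawled collection (an assumption).
        return tf[phrase] * math.log((num_docs + 1) / (doc_freq.get(phrase, 0) + 1))

    present = [p for p in known_phrases if tf[p] > 0]
    return sorted(present, key=tfidf, reverse=True)[:v]


def rank_phrases(candidates, search, query_relevance, r=1):
    """Sub-step 2: score each phrase M by the average relevance to Q of its top-r pages.

    search(M) returns page ids in C ranked for M; query_relevance(p) returns the
    relevance score between Q and page p. Both are hypothetical callbacks, and the
    average is unweighted here because the slides do not give the weights."""
    scored = []
    for m in candidates:
        reps = search(m)[:r]  # M's representative Web pages
        if reps:
            scored.append((m, sum(query_relevance(p) for p in reps) / len(reps)))
    scored.sort(key=lambda ms: ms[1], reverse=True)
    return [m for m, _ in scored]
```

With r = 1, as recommended on the next slide, each phrase is scored by the relevance of its single top-ranked page.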

  15. To achieve good performance, it is best to set r=1.

  16. Experimental Setup • Crawled 20 GB of Web pages from WebMD, one of the most popular medical Web sites. • Fed MedSearch with natural medical queries extracted from the Med Help International Medical and Health Forum.

  17. Conclusion • MedSearch is a specialized Web search engine for medical information retrieval. • MedSearch supports queries written in plain English, accepts long queries, provides diversified search results, and suggests related medical phrases with proper ranking and annotation. • These features are attractive to ordinary Internet users who have little medical knowledge and are unfamiliar with medical terminology.

  18. Thank You
