1 / 6

Concept Recognition, Information Retrieval, and Machine Learning in Genomics Question Answering

J. Gregory Caporaso Center for Computational Pharmacology Department of Biochemistry and Molecular Genetics University of Colorado Health Sciences Center. Concept Recognition, Information Retrieval, and Machine Learning in Genomics Question Answering. gregcaporaso@gmail.com

karsen
Download Presentation

Concept Recognition, Information Retrieval, and Machine Learning in Genomics Question Answering

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. J. Gregory Caporaso Center for Computational Pharmacology Department of Biochemistry and Molecular Genetics University of Colorado Health Sciences Center Concept Recognition, Information Retrieval, and Machine Learning in Genomics Question Answering gregcaporaso@gmail.com http://hsc-turing.uchsc.edu

  2. Q: What is the role of huntingtin in Huntington's Disease? Q: What is the role of DRD4 in alcoholism? Q: What is the role of DRD4 in alcoholism? Q: What is the role of DRD4 in alcoholism? dfsfdsfjlgdfsdfslkdfsdfsdfsdfsdfsdfsdf DfsfdsTYYfjlgdf Sdfsl Pkdf sdf Asdfs Ldfsdf Fsdf Psdf by Kilgore Trout dfsfdsfjlgdfsdfslkdfsdfsdfsdfsdfsdfsdf dfsfdsfjlgdfsdfslkdfsdfsdfsdfsdfsdfsdf dfsfdsfjlgdfsdfslkdfsdfsdfsdfsdfsdfsdf dfsfdsfjlgdfsdfslkdfsdfsdfsdfsdfsdfsdf dfsfdsfjlgdfsdfslkdfsdfsdfsdfsdfsdfsdf dfsfdsfjlgdfsdfslkdfsdfsdfsdfsdfsdfsdf dfsfdsfjlgdfsdfslkdfsdfsdfsdfsdfsdfsdf dfsfdsfjlgdfsdfslkdfsdfsdfsdfsdfsdfsdf dfsfdsfjlgdfsdfslkdfsdfsdfsdfsdfsdfsdf dfsfdsfjlgdfsdfslkdfsdfsdfsdfsdfsdfsdf dfsfdsfjlgdfsdfslkdfsdfsdfsdfsdfsdfsdf dfsfdsfjlgdfsdfslkdfsdfsdfsdfsdfsdfsdf dfsfdsfjlgdfsdfslkdfsdfsdfsdfsdfsdfsdf dfsfdsfjlgdfsdfslkdfsdfsdfsdfsdfsdfsdf dfsfdsfjlgdfsdfslkdfsdfsdfsdfsdfsdfsdf dfsfdsfjlgdfsdfslkdfsdfsdfsdfsdfsdfsdf dfsfdsfjlgdfsdfslkdfsdfsdfsdfsdfsdfsdf dfsfdsfjlgdfsdfslkdfsdfsdfsdfsdfsdfsdf dfsfdsfjlgdfsdfslkdfsdfsdfsdfsdfsdfsdf dfsfdsfjlgdfsdfslkdfsdfsdfsdfsdfsdfsdf dfsfdsfjlgdfsdfslkdfsdfsdfsdfsdfsdfsdf dfsfdsfjlgdfsdfslkdfsdfsdfsdfsdfsdfsdf dfsfdsfjlgdfsdfslkdfsdfsdfsdfsdfsdfsdf dfsfdsfjlgdfsdfslkdfsdfsdfsdfsdfsdfsdf dfsfdsfjlgdfsdfslkdfsdfsdfsdfsdfsdfsdf dfsfdsfjlgdfsdfslkdfsdfsdfsdfsdfsdfsdf dfsfdsfjlgdfsdfslkdfsdfsdfsdfsdfsdfsdf dfsfdsfjlgdfsdfslkdfsdfsdfsdfsdfsdfsdf dfsfdsfjlgdfsdfslkdfsdfsdfsdfsdfsdfsdf dfsfdsfjlgdfsdfslkdfsdfsdfsdfsdfsdfsdf dfsfdsfjlgdfsdfslkdfsdfsdfsdfsdfsdfsdf dfsfdsfjlgdfsdfslkdfsdfsdfsdfsdfsdfsdf dfsfdsfjlgdfsdfslkdfsdfsdfsdfsdfsdfsdf dfsfdsfjlgdfsdfslkdfsdfsdfsdfsdfsdfsdf dfsfdsfjlgdfsdfslkdfsdfsdfsdfsdfsdfsdf dfsfdsfjlgdfsdfslkdfsdfsdfsdfsdfsdfsdf dfsfdsfjlgdfsdfslkdfsdfsdfsdfsdfsdfsdf dfsfdsfjlgdfsdfslkdfsdfsdfsdfsdfsdfsdf A: Repeats in the huntingtin gene cause protein aggregation and likely cause HD. Q: What is the role of DRD4 in alcoholism? Q: What is the role of DRD4 in alcoholism? Q: What is the role of DRD4 in alcoholism? TREC Genomics 2006 28 topic questions • Mean average precision at document, passage, and aspect levels • Relevance judged by human domain experts 162,259 full-text journal articles Up to 1000 passages per topic questions

  3. uchsc1 uchsc2 uchsc3 • Document processing: HTML parsing, paragraph splitting, document zoning • Section filtering • Indri indexing • Query expansion • Strict • Loose • Indri query, pseudo-relevance feedback • Result expansion with Latent Semantic Analysis • Naïve Bayes-based result filtering Plasmids and cell lines Concluding remarks Acknowlegements Materials and Methods Experimental Methods Results Acknowledgments Discussion Aknowledgements res-disc-conc introduction acknowledgments UCHSC approach Abstract Conclusions Introduction Introduction References Methods

  4. Above mean on each performance metric Performed well where median was low UCHSC results and conclusions

  5. Best systems: indexed paragraphs, expanded queries, selected sentences Aspect uchsc2 Passage Mean average precision Document mean average precision Graph modified from: TREC 2006 Genomics Track Overview, W. Hersh, A.M. Cohen, P. Roberts, H.K. Rekapalli, TREC 2006 Notebook (68), November 2006. Genomics track results and conclusions Successful techniques included concept-based query expansion, acronym expansion Unsuccessful techniques included cluster-based reranking

  6. Larry Hunter Bill Baumgartner Kevin Cohen Lynne Fox Helen Johnson Hyunmin Kim Anna Lindemann Zhiyong Lu Olga Medvedeva Elizabeth White TREC 2006 co-authors Future Directions TREC Genomics 2007: same corpus, very similar or identical task Gold-standard will be released allowing for in-depth comparison of methods Concept Recognition, Information Retrieval, and Machine Learning in Genomics Question-Answering, J. G. Caporaso, W.A. Baumgartner, Jr., H. Kim, Z. Lu, H.L. Johnson, O. Medvedeva, A. Lindemann, L. Fox, E.K. White, K. B. Cohen, L. Hunter. TREC 2006 Proceedings (723), November, 2006. gregcaporaso@gmail.com http://hsc-turing.uchsc.edu

More Related