
Building an Augmented Index for Genomic Information Retrieval

Hohyon Ryu, Xiangming Mu, Kun Lu. University of Wisconsin-Milwaukee, School of Information Science, Information Intelligence & Architecture Research Lab.


Presentation Transcript


  1. Building an Augmented Index for Genomic Information Retrieval. Hohyon Ryu, Xiangming Mu, Kun Lu, University of Wisconsin-Milwaukee, School of Information Science, Information Intelligence & Architecture Research Lab

  2. Problems in Genomic Information Retrieval

  3. Introduction

  4. Augmented Index using Neural Networks [Diagram labels: Document Collection → Baseline Index (TF, IDF); Neural Networks, trained with part of speech, word location (FO = first occurrence, LO = last occurrence, WD = word distribution), TF, and IDF features against author-assigned keyphrases; Keywords and Keyphrases (A: information, B: science) → Augmented Index]
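The slides do not say how the neural-network output is merged into the index. As a purely hypothetical illustration, a minimal Python sketch could add pseudo-occurrences to a document's term counts for terms whose keyword suitability score passes a threshold. `augment_counts`, `threshold`, and `boost` are invented names and values, not part of the original system.

```python
from collections import Counter


def augment_counts(baseline: Counter, scores: dict[str, float],
                   threshold: float = 0.5, boost: int = 10) -> Counter:
    """Return a copy of `baseline` with extra occurrences for high-scoring keywords.

    Hypothetical scheme: `threshold` and `boost` are illustrative values,
    not parameters reported on the slides.
    """
    augmented = Counter(baseline)  # leave the baseline counts untouched
    for term, score in scores.items():
        if score >= threshold:
            # Add pseudo-occurrences proportional to the suitability score.
            augmented[term] += round(boost * score)
    return augmented
```

For example, with baseline counts {cftr: 5, cystic: 2, protein: 7} and scores {cftr: 0.9, cystic: 0.8, protein: 0.2}, the sketch yields cftr 14, cystic 10, protein 7. A real system might instead index keyphrases in a separate field; this is only one possible design.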

  5. Keyword Extraction: Learning (Training) [Diagram: various features of each word (TF*IDF, part of speech, first occurrence, last occurrence, word distribution) → hidden layer → output: author-assigned keyword? (1 or 0)]
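The word features listed on this slide can be computed from a tokenized document roughly as follows. This is a sketch under stated assumptions: first and last occurrence are taken as positions normalized by document length, word distribution is assumed to be the standard deviation of those normalized positions, IDF uses a smoothed log ratio of collection size to document frequency, and part of speech is left out because it requires a tagger. The function name `word_features` is hypothetical.

```python
import math
from statistics import pstdev


def word_features(tokens: list[str], doc_freq: dict[str, int],
                  n_docs: int) -> dict[str, dict[str, float]]:
    """Compute assumed per-word features for one tokenized document.

    Assumptions (not taken from the slides): positions are normalized by
    document length, "word distribution" is the standard deviation of those
    positions, and IDF is smoothed as log((1 + N) / (1 + df)).
    Part of speech is omitted because it needs a tagger.
    """
    n = len(tokens)
    if n == 0:
        return {}
    positions: dict[str, list[float]] = {}
    for i, token in enumerate(tokens):
        positions.setdefault(token, []).append(i / n)

    features: dict[str, dict[str, float]] = {}
    for word, pos in positions.items():
        tf = len(pos) / n  # term frequency normalized by document length
        idf = math.log((1 + n_docs) / (1 + doc_freq.get(word, 0)))
        features[word] = {
            "tfidf": tf * idf,
            "first_occurrence": pos[0],        # FO: earliest relative position
            "last_occurrence": pos[-1],        # LO: latest relative position
            "word_distribution": pstdev(pos),  # WD (assumed): spread of positions
        }
    return features
```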

  6. Keyword Extraction: Learning (Training) [Diagram: various features of each word (TF*IDF, part of speech, first occurrence, last occurrence, word distribution) → hidden layer → output: keyword suitability score, 0 ≤ x ≤ 1]
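A network of the shape shown on slides 5 and 6 (word features in, one hidden layer, a single score between 0 and 1 out) can be sketched in plain Python. The hidden-layer size, sigmoid activations, cross-entropy loss, learning rate, and the class name `KeywordScorer` are assumptions for illustration; the slides do not specify them. Training uses the 1-or-0 author-assigned-keyword labels, and `score` returns the suitability score.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Numerically stable logistic function, output strictly in (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class KeywordScorer:
    """Hypothetical one-hidden-layer network: features -> hidden -> score."""

    def __init__(self, n_inputs: int, n_hidden: int = 4, seed: int = 0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def _forward(self, x: list[float]) -> tuple[list[float], float]:
        hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        out = sigmoid(sum(w * h for w, h in zip(self.w2, hidden)) + self.b2)
        return hidden, out

    def score(self, x: list[float]) -> float:
        """Keyword suitability score between 0 and 1."""
        return self._forward(x)[1]

    def train(self, data: list[tuple[list[float], int]],
              epochs: int = 2000, lr: float = 0.5) -> None:
        """Stochastic gradient descent on cross-entropy; labels are 1 or 0."""
        for _ in range(epochs):
            for x, y in data:
                hidden, out = self._forward(x)
                d_out = out - y  # gradient w.r.t. output pre-activation
                for j, h in enumerate(hidden):
                    # Backpropagate through the hidden sigmoid before updating w2[j].
                    d_hidden = d_out * self.w2[j] * h * (1.0 - h)
                    self.w2[j] -= lr * d_out * h
                    for i, xi in enumerate(x):
                        self.w1[j][i] -= lr * d_hidden * xi
                    self.b1[j] -= lr * d_hidden
                self.b2 -= lr * d_out
```

In practice the features would be scaled to comparable ranges before training, and a library class such as scikit-learn's `MLPClassifier` (with `predict_proba` for the score) would be a more practical choice than hand-written backpropagation.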

  7. Example of Automatically Extracted Keywords and Keyphrases

  8. Performance of the Keyword Extraction

  9. Performance of the Keyphrase Extraction

  10. Test Collection

  11. Text Retrieval Experiment [Diagram: 26 TREC queries → Indri search engine (based on Lemur and language modeling) → baseline index and augmented index, each evaluated by MAP (mean average precision)]
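MAP, the metric on this slide, is the mean over queries of non-interpolated average precision: for each query, precision is taken at the rank of every relevant document retrieved, summed, and divided by the total number of relevant documents (the convention trec_eval follows). A minimal sketch, with hypothetical function names; the optional `cutoff` truncates the ranking, for example to the top 50 documents.

```python
from typing import Optional


def average_precision(ranked: list[str], relevant: set[str],
                      cutoff: Optional[int] = None) -> float:
    """Non-interpolated AP for one query."""
    if not relevant:
        return 0.0
    if cutoff is not None:
        ranked = ranked[:cutoff]
    hits = 0
    precision_sum = 0.0
    for k, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / k  # precision at this relevant hit
    # Divide by all relevant documents, so unretrieved ones count as zero.
    return precision_sum / len(relevant)


def mean_average_precision(runs: dict[str, list[str]],
                           qrels: dict[str, set[str]],
                           cutoff: Optional[int] = None) -> float:
    """Mean AP over every query in `qrels`; a query with no run scores 0."""
    if not qrels:
        return 0.0
    total = sum(average_precision(runs.get(q, []), rel, cutoff)
                for q, rel in qrels.items())
    return total / len(qrels)
```

For instance, a query whose relevant documents d1 and d3 appear at ranks 1 and 3 has AP = (1/1 + 2/3) / 2 ≈ 0.833.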

  12. Results: +4.54% (df = 25, p < 0.01); +3.12% (df = 25, p < 0.05)
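The slide reports df = 25, which matches a paired test over the 26 queries (n - 1 = 25). Assuming the percentages are relative improvements of the augmented index over the baseline and the test is a paired Student's t-test on per-query scores (the slide states neither), the statistic can be computed as in this sketch. `paired_t_test` is a hypothetical helper; SciPy's `scipy.stats.ttest_rel` computes the same statistic together with a p-value.

```python
import math
from statistics import mean, stdev


def paired_t_test(baseline: list[float],
                  augmented: list[float]) -> tuple[float, int, float]:
    """Return (relative % change in mean, degrees of freedom, t statistic).

    Assumes one score per query in the same query order in both lists.
    """
    if len(baseline) != len(augmented) or len(baseline) < 2:
        raise ValueError("need two equal-length lists with at least two queries")
    diffs = [a - b for a, b in zip(augmented, baseline)]
    n = len(diffs)
    # t = mean difference / standard error of the differences
    # (undefined if every per-query difference is identical).
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    change = 100.0 * (mean(augmented) - mean(baseline)) / mean(baseline)
    return change, n - 1, t
```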

  13. AP Difference by Topic (for top 50 returned documents)
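A per-topic comparison like this one can be produced by computing AP for each topic on both indexes (here restricted to the top 50 returned documents) and sorting the differences. A small sketch, with a hypothetical function name and the AP values assumed to be precomputed:

```python
def ap_differences(baseline_ap: dict[str, float],
                   augmented_ap: dict[str, float]) -> list[tuple[str, float]]:
    """Per-topic AP difference (augmented - baseline), largest gain first.

    Topics missing from either dictionary are skipped.
    """
    diffs = [(topic, augmented_ap[topic] - baseline_ap[topic])
             for topic in baseline_ap if topic in augmented_ap]
    return sorted(diffs, key=lambda item: item[1], reverse=True)
```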

  14. Topic 176 [Figure: ranked results from retrieval by the augmented index and baseline retrieval (MAP = 0.25, MAP = 0.32), with counts of the terms CFTR, cystic, degradation, fibrosis, and Sec61 in documents 12426234 and 16166089 before and after neural network processing; relevant and irrelevant documents are marked]

  15. Topic 170 [Figure: ranked results from retrieval by the augmented index and baseline retrieval (MAP = 0.87, MAP = 1), with counts of the terms CFTR, endoplasm, and reticulum in documents 11799116 and 15459206 before and after neural network processing; relevant and irrelevant documents are marked]

  16. Topic 183 [Figure: ranked results from retrieval by the augmented index and baseline retrieval (MAP = 0.59, MAP = 0.56), with counts of the terms NM23, development, gene, mutation, and tracheal in documents 16106028, 10952986, and 14960567 before and after neural network processing; relevant and irrelevant documents are marked]

  17. Conclusion

  18. Implications for Genomic Information Retrieval

  19. Discussion
