
Inferring Hidden Relationships from Biological Literature with Multi-level Context Terms

keren

Presentation Transcript


  1. Inferring Hidden Relationships from Biological Literature with Multi-level Context Terms

  2. Introduction • Literature-Based Discovery (LBD) • Swanson’s ABC model • Drug repositioning [Figure: a literature network linking Alzheimer, Insulin, PKC1, CATS, and SOS2]

  3. Literature-based discovery (LBD): the very idea • It means deriving new solutions to scientific problems from the public record of science. • The possibility arises, for example, when two articles considered together for the first time suggest new information of scientific interest that is not apparent from either article alone.

  4. Venn Diagram: ABC Model [Figure: Venn diagram of articles about an AB relationship and articles about a BC relationship, overlapping at B] • AB and BC are complementary but disjoint: together they can reveal an implicit relationship between A and C in the absence of any explicit relation.
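The ABC inference step can be sketched as a join on the shared B term. This is a minimal illustration and not the authors' implementation. The helper name `abc_candidates` is hypothetical, and the toy input pairs are borrowed from Swanson's magnesium–epilepsy–migraine example on the next slide.

```python
from collections import defaultdict


def abc_candidates(ab_pairs, bc_pairs, known_ac=frozenset()):
    """Swanson-style ABC inference (hypothetical helper for illustration).

    ab_pairs: iterable of (a, b) relations found in one body of literature.
    bc_pairs: iterable of (b, c) relations found in another.
    known_ac: (a, c) pairs that are already explicitly related; excluded.
    Returns {(a, c): set of linking B terms} for implicit A-C links.
    """
    # Index the BC relations by their shared B term.
    c_by_b = defaultdict(set)
    for b, c in bc_pairs:
        c_by_b[b].add(c)

    candidates = defaultdict(set)
    for a, b in ab_pairs:
        for c in c_by_b.get(b, ()):
            if (a, c) not in known_ac:
                candidates[(a, c)].add(b)
    return dict(candidates)


# Toy input based on the magnesium (A), epilepsy (B), migraine (C) example.
ab = [("magnesium", "epilepsy")]
bc = [("epilepsy", "migraine")]
print(abc_candidates(ab, bc))  # {('magnesium', 'migraine'): {'epilepsy'}}
```

Passing an already-known A–C pair in `known_ac` suppresses it, which mirrors the model's focus on relations that are implicit rather than explicit.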

  5. An ABC example based on title words in Medline • "The relation of migraine and epilepsy." Brain 92: 285-300, 1969 • "Magnesium-deficient rat as a model of epilepsy." Lab Animal Sci 28: 680-5, 1978 • An unintended link [Figure: Venn diagram of sets of Medline records for A = magnesium (8,011), B = epilepsy, and C = migraine (2,756), with further counts of 45 and 22; A and C are disjoint]

  6. Related work • CTD • A manually curated database. • Inferring chemical – gene – disease relations using ABC models • Text mining and manual curation of chemical-gene-disease networks for the comparative toxicogenomics database (2008, NAR)

  7. Related work • CoPub Discovery • ABC models based on co-occurrence scores • Inferring relations among diseases, genes, and drugs • Literature Mining for the Discovery of Hidden Connections between Drugs, Genes and Diseases (2010, PLoS Computational Biology)

  8. MeSH Terms

  9. Objective • Objective • Inferring hidden drug-disease relations accurately from the literature • Limitations of previous models • They generate a large volume of false-positive candidate relations. • They are semi-automatic, labor-intensive techniques that require input from human experts. • Solution strategy • Incorporate context information into relation inference

  10. Suggested approaches • Our approach • Key idea • Inferring drug-disease relations based on context term similarity • Drug-gene relation • Disease-gene relation • Our hypothesis • The similarity of context terms between the drug-gene and gene-disease relations enables the inference of more meaningful drug-disease relations. [Figure: Drug and Disease linked through Gene; the contexts of the Drug-Gene and Gene-Disease links are compared for similarity]

  11. Suggested approaches • Context vector • The set of biomedical terms in paper abstracts [Figure: for an interaction, the biomedical-term vectors of Abstract1 and Abstract2 are averaged to form the context vector of that interaction]
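Here is a minimal sketch of the averaging shown in the figure. It assumes each abstract is reduced to raw counts of UMLS-mapped context terms over a fixed vocabulary. The slides do not say whether counts, binary indicators, or weights are used. The function names and the three-term vocabulary are hypothetical.

```python
from collections import Counter


def abstract_vector(terms, vocabulary):
    """Count how often each vocabulary term occurs in one abstract's tagged terms."""
    counts = Counter(terms)
    return [counts[t] for t in vocabulary]


def context_vector(abstracts, vocabulary):
    """Average the per-abstract vectors of all abstracts that mention an interaction.

    abstracts: list of abstracts, each given as a list of tagged context terms.
    """
    if not abstracts:
        return [0.0] * len(vocabulary)
    vectors = [abstract_vector(terms, vocabulary) for terms in abstracts]
    n = len(vectors)
    # zip(*vectors) walks the vectors column by column (one column per term).
    return [sum(column) / n for column in zip(*vectors)]


vocab = ["amyloid", "memory loss", "insulin resistance"]
abstracts = [
    ["amyloid", "amyloid", "memory loss"],  # Abstract1
    ["insulin resistance", "amyloid"],      # Abstract2
]
print(context_vector(abstracts, vocab))  # [1.5, 0.5, 0.5]
```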

  12. Suggested approaches • Similarity measures and score comparison • Similarity measures: cosine similarity, Spearman correlation • Score comparison: all frequencies vs. frequencies filtered by context similarity [Figure: context vectors for Insulin, PCK1, and Alzheimer compared with a similarity measure, then scored]

  13. Suggested approaches • Experiment overview • Literature analysis: PubMed abstracts → entity tagging (with the entity dictionary) → interaction extraction and context vector extraction • Drug–disease inference → scored result • Evaluation against answer sets of known disease–drug interactions from PharmGKB (1,992) and CTD (336,693) • Performance analysis: previous model vs. our model (CTD: Comparative Toxicogenomics Database; UMLS: Unified Medical Language System)

  14. Dataset • UMLS: 96,031 disease, 45,527 gene, and 6,132 symptom synonyms • PharmGKB: 25,693 disease, 28,091 drug, and 258,840 gene synonyms • CTD: 68,211 disease, 384,141 chemical, and 679,701 gene synonyms • PubMed: 77,711 Alzheimer's disease-related abstracts

  15. Multi-level entity recognition • Dictionary-based entity recognition from the abstracts • We import data from three external databases (PharmGKB, CTD, and UMLS) to generate the multi-level entity dictionaries. • We divide the dictionary entries into four entity levels: gene, drug, disease, and symptom.

  16. Multi-level entity recognition • From 77,711 Alzheimer’s disease-related abstracts: • Split the text into sentences using a Conditional Random Field (CRF) based sentence detector • Extract biomedical entities using LingPipe • Match the extracted entities against the PharmGKB and CTD entity dictionaries to extract interaction data • Map the extracted entities to the UMLS entity dictionary to obtain the members of the context vector
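The dictionary-matching step can be illustrated with a greedy longest-match lookup. This is an assumed stand-in for illustration only. It does not use LingPipe or a CRF sentence detector, and the dictionary entries, their level assignments, and the `max_len` parameter are hypothetical.

```python
import re


def build_dictionary(entries):
    """entries: iterable of (synonym, canonical_name, level) triples."""
    return {syn.lower(): (name, level) for syn, name, level in entries}


def tag_sentence(sentence, dictionary, max_len=4):
    """Greedy longest-match tagging of dictionary terms in one sentence.

    Returns a list of (matched_text, canonical_name, level) tuples.
    """
    tokens = re.findall(r"[\w\-']+", sentence)
    tags = []
    i = 0
    while i < len(tokens):
        match = None
        # Try the longest span first so "Alzheimer's disease" beats "disease".
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            span = " ".join(tokens[i:i + n])
            if span.lower() in dictionary:
                match = (span, *dictionary[span.lower()])
                i += n
                break
        if match:
            tags.append(match)
        else:
            i += 1  # no dictionary term starts here; move to the next token
    return tags


dictionary = build_dictionary([
    ("insulin", "Insulin", "drug"),
    ("PKC1", "PKC1", "gene"),
    ("Alzheimer's disease", "Alzheimer Disease", "disease"),
    ("memory loss", "Memory Loss", "symptom"),
])
tags = tag_sentence(
    "Insulin modulates PKC1 signalling in Alzheimer's disease with memory loss.",
    dictionary,
)
print(tags)
```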

  17. Suggested approaches: Experiment overview (repeat of slide 13)

  18. Interaction Extraction • To extract biologically meaningful interactions, we limited extraction to ‘drug–gene’ and ‘gene–disease’ patterns among the recognized entities. • We generated entity dictionaries from the PharmGKB and CTD databases. Because PharmGKB and CTD contain different numbers of terms, their tagging results differ. • After tagging biological entities in the PubMed records, we extracted a candidate interaction whenever two entities of different types co-occurred within a sentence.
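Below is a sketch of the sentence-level co-occurrence rule. It assumes each sentence has already been tagged into (entity, level) tuples. Counting each distinct pair at most once per sentence is also an assumption, because the slides do not state how repeated mentions are handled. The function name and the example entities are hypothetical.

```python
from collections import Counter
from itertools import product


def extract_interactions(tagged_sentences):
    """Count drug-gene and gene-disease co-occurrences within sentences.

    tagged_sentences: list of sentences, each a list of (entity, level) tuples.
    Returns two Counters keyed by (drug, gene) and (gene, disease).
    """
    drug_gene, gene_disease = Counter(), Counter()
    for entities in tagged_sentences:
        # Group the distinct entity names of this sentence by level.
        by_level = {}
        for name, level in entities:
            by_level.setdefault(level, set()).add(name)
        genes = by_level.get("gene", set())
        for drug, gene in product(by_level.get("drug", set()), genes):
            drug_gene[(drug, gene)] += 1
        for gene, disease in product(genes, by_level.get("disease", set())):
            gene_disease[(gene, disease)] += 1
    return drug_gene, gene_disease


sentences = [
    [("Insulin", "drug"), ("PKC1", "gene")],
    [("PKC1", "gene"), ("Alzheimer Disease", "disease"), ("Memory Loss", "symptom")],
    [("Insulin", "drug"), ("PKC1", "gene"), ("PKC1", "gene")],
]
dg, gd = extract_interactions(sentences)
print(dg, gd)
```

Symptom entities are not paired here. Under the slides' design, they would feed the context vector rather than the interaction list.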

  19. Suggested approaches: Experiment overview (repeat of slide 13)

  20. Evaluation method • We compare our method with the frequency-based ABC model on Alzheimer’s disease-related abstracts. • The comparison was made for the top 100 and top 500 results. • Literature analysis of the top 10 ranked interactions [Figure: ABC model vs. our model, evaluated against the PharmGKB (1,992) and CTD (336,693) answer sets]
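Top-k evaluation against an answer set can be sketched as precision at k. It is an assumption that the slides' top-100/top-500 comparison uses precision exactly as defined here, although the conclusion does mention precision. The drug and disease identifiers below are placeholders, not results from the study.

```python
def precision_at_k(ranked_pairs, answer_set, k):
    """Fraction of the top-k ranked pairs that appear in the answer set.

    Missing positions (fewer than k results) count as misses.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for pair in ranked_pairs[:k] if pair in answer_set)
    return hits / k


# Placeholder identifiers, not results from the study.
answers = {("drug_1", "AD"), ("drug_4", "AD")}
abc_ranking = [("drug_2", "AD"), ("drug_1", "AD"), ("drug_3", "AD"), ("drug_5", "AD")]
context_ranking = [("drug_1", "AD"), ("drug_4", "AD"), ("drug_2", "AD"), ("drug_3", "AD")]
for k in (2, 4):
    print(k, precision_at_k(abc_ranking, answers, k), precision_at_k(context_ranking, answers, k))
```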

  21. Results • Entity tagging • From 77,711 abstracts related to Alzheimer’s disease, 1,640,761 biomedical entities were recognized. • 295,419 were tagged by the PharmGKB entity dictionary. • 438,987 were tagged by the CTD entity dictionary. • 260,291 were tagged by the UMLS entity dictionary. • Interaction extraction • PharmGKB-tagged entities: from 60,415 interactions, we inferred 14,481 new disease-drug interactions. • CTD-tagged entities: from 119,464 interactions, we inferred 136,570 interactions. • Size of the context vector: 1,641 terms

  22. Results PharmGKB • The PharmGKB case performs poorly (between 0% and 1%). • We attribute the weak performance to PharmGKB containing only 1,992 drug-disease interactions. • Furthermore, our dataset covered only Alzheimer’s disease-related abstracts rather than all of PubMed.

  23. Results CTD • The context-based approach outperforms the baseline in all cases (top 100 and top 500). • Filtering the inferred interactions by context-term similarity improved performance over using frequency alone.
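The following sketch shows how the frequency baseline differs from the context-filtered variant. The scoring rule (summing, over linking genes, the smaller of the two co-occurrence counts) and the 0.5 threshold are assumptions, because the slides do not give the formula. The similarity values 0.28 and 0.95 come from the Alzheimer–Insulin example on slide 25. The co-occurrence counts are illustrative, not data from the study.

```python
from collections import defaultdict


def score_drug_disease(drug_gene, gene_disease, similarity=None, threshold=0.5):
    """Rank drug-disease pairs through shared genes (hypothetical scoring).

    drug_gene, gene_disease: dicts mapping (x, y) -> co-occurrence count.
    similarity: optional callable (drug, gene, disease) -> float. When given,
        a linking gene contributes only if its context similarity >= threshold.
    Assumed score: sum over linking genes of min(count_AB, count_BC).
    """
    diseases_by_gene = defaultdict(list)
    for (gene, disease), n_bc in gene_disease.items():
        diseases_by_gene[gene].append((disease, n_bc))

    scores = defaultdict(int)
    for (drug, gene), n_ab in drug_gene.items():
        for disease, n_bc in diseases_by_gene.get(gene, []):
            if similarity is not None and similarity(drug, gene, disease) < threshold:
                continue  # context-based filter: drop this drug-gene-disease path
            scores[(drug, disease)] += min(n_ab, n_bc)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Illustrative counts; similarity values taken from slide 25.
dg = {("Insulin", "CATS"): 5, ("Insulin", "CYC-1"): 2}
gd = {("CATS", "Alzheimer"): 4, ("CYC-1", "Alzheimer"): 3}
sims = {("Insulin", "CATS", "Alzheimer"): 0.28, ("Insulin", "CYC-1", "Alzheimer"): 0.95}

baseline = score_drug_disease(dg, gd)
filtered = score_drug_disease(dg, gd, similarity=lambda a, b, c: sims[(a, b, c)])
print(baseline)  # [(('Insulin', 'Alzheimer'), 6)]
print(filtered)  # [(('Insulin', 'Alzheimer'), 2)]
```

With filtering, the low-similarity CATS path no longer inflates the score. A pair whose every path falls below the threshold drops out of the ranking entirely, which is one way a filter can reduce false-positive candidates.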

  24. Results Top 10 ranked interactions (CTD based)

  25. Results • Alzheimer’s disease – Insulin • A low-score case (0.28): Alzheimer – CATS – Insulin • A relatively high-score case (0.95): Alzheimer – CYC-1 – Insulin

  26. Conclusion • We proposed context vectors to infer unknown relationships based on biologically meaningful terms. • We constructed multi-level entity dictionaries to recognize multi-level entities in the literature. • We used our context vectors to discover putative drug-disease relationships. • We evaluated the results against drug-disease relations curated from the literature (PharmGKB, CTD). • On 77,711 Alzheimer’s disease papers, our context-vector-based hybrid approach achieved better precision than the previous frequency-based ABC model.

  27. Future Study: A Different Approach to Context Terms • Use interaction words (verbs) to identify possible direct interactions between entities, and treat the remaining entities as context. [Figure: three example sentences in which an interaction verb (I-verb) links two interacting entities (I-Ent1, I-Ent2) while the surrounding entities are labeled as context entities (C-Ent)]

  28. Future Study

  29. Questions? • Thank you!
