
Biosemantics group




Presentation Transcript


  1. Biosemantics group Martijn Schuemie

  2. Overview • The biosemantics group • Ontology assembly • Concept tagging • Homonym disambiguation • Concept profile creation • Nucleolus

  3. Biosemantics group • ErasmusMC University Medical Center Rotterdam • Department of Medical Informatics • Biosemantics group • Jan Kors • Barend Mons • Erik van Mulligen • Martijn Schuemie • Rob Jelier • Kristina Hettne • Antoinne van Veldhoven

  4. Biosemantics group Biosemantics • Molecular Biology • High-throughput experiment data (genomics and proteomics) • Gene and protein databases, MEDLINE, Gene Ontology Biosemantics • Concept-based text-mining • Interpretation of experiment data • Knowledge discovery

  5. Ontology assembly • Sources: Entrez Gene, Swiss-Prot, HUGO • Combination: P=37%, R=76% • Add spelling variations: ABC1 -> ABC-1, DEF3 -> DEF-III • Remove highly ambiguous terms: CO2, membrane-bound obesity, open reading frame • Result: P=50%, R=75%
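
The "add spelling variations" step can be sketched as follows. This is a minimal illustration that assumes only the two patterns shown on the slide (a hyphen before a trailing number, and that number rewritten as a Roman numeral). The function names and the cut-off of 40 are hypothetical, not the group's actual implementation.

```python
import re

def to_roman(n: int) -> str:
    """Convert a positive integer to a Roman numeral."""
    numerals = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
                (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
                (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]
    out = []
    for value, symbol in numerals:
        while n >= value:
            out.append(symbol)
            n -= value
    return "".join(out)

def spelling_variations(term: str) -> set:
    """Return the term plus hyphenated and Roman-numeral variants (assumed rules)."""
    variants = {term}
    # Only terms of the form <letters><digits> get variants in this sketch.
    match = re.fullmatch(r"([A-Za-z]+)(\d+)", term)
    if match:
        letters, digits = match.groups()
        number = int(digits)
        variants.add(f"{letters}-{digits}")
        if 0 < number < 40:  # assumed cut-off to keep Roman numerals short
            variants.add(f"{letters}-{to_roman(number)}")
    return variants
```

With these rules, `spelling_variations("DEF3")` contains `DEF3`, `DEF-3` and `DEF-III`, while a plain word such as `estrogen` is returned unchanged.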

  6. Concept tagging • MEDLINE text: Malaria fever is a disease. It is spread by mosquitos. • Sentence splitting: [Malaria fever is a disease.] [It is spread by mosquitos.] • Tokenization: [Malaria] [fever] [is] [a] [disease] • Word normalisation: [malaria] [fever] [be] [a] [disease] • Concept mapping: [malaria fever] C24530 [disease] C12634 • Homonym disambiguation: PSA -> Prostate Specific Antigen or Poultry Science Association? • Concept profile of text
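
The first four tagging steps can be sketched end to end on the slide's example. The two concept IDs are the ones on the slide. The lemma table, the regular expressions and the greedy longest-match lookup are assumptions made for illustration, not a description of the group's tagger.

```python
import re

# Toy resources: the concept IDs come from the slide, the rest is assumed.
LEMMAS = {"is": "be", "are": "be", "mosquitos": "mosquito"}
THESAURUS = {("malaria", "fever"): "C24530", ("disease",): "C12634"}
MAX_TERM_LENGTH = max(len(term) for term in THESAURUS)

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def tokenize(sentence):
    """Keep runs of word characters, dropping punctuation."""
    return re.findall(r"\w+", sentence)

def normalise(tokens):
    """Lower-case each token and replace it by its lemma when one is known."""
    lowered = [t.lower() for t in tokens]
    return [LEMMAS.get(t, t) for t in lowered]

def map_concepts(tokens):
    """Greedy longest-match lookup of token sequences in the thesaurus."""
    concepts, i = [], 0
    while i < len(tokens):
        for length in range(min(MAX_TERM_LENGTH, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + length])
            if key in THESAURUS:
                concepts.append((" ".join(key), THESAURUS[key]))
                i += length
                break
        else:
            i += 1  # no thesaurus term starts at this token
    return concepts

def tag(text):
    """Run all four steps and return one list of (term, concept ID) per sentence."""
    return [map_concepts(normalise(tokenize(s))) for s in split_sentences(text)]
```

Trying the longest token sequence first is what lets "malaria fever" map to a single concept instead of being split into separate terms.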

  7. Homonym disambiguation • Some simple rules: • Is it likely that a term has multiple meanings? • - 3-letter-acronym (e.g. PSA): highly likely • - long forms (e.g. Prostate Specific Antigen): highly unlikely • - terms that refer to several concepts by definition • Is a synonym found? (e.g. “KLK3 (PSA)”) • Is a keyword found? (e.g. “PSA is secreted by the prostate”) • These simple rules change performance from P=50%, R=75% to P=71%, R=71%.
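
The rules above can be sketched as a single accept/reject decision. The three-character threshold, the substring matching and the `concept` dictionary layout are assumptions, and the rule for terms that refer to several concepts by definition is left out.

```python
def is_likely_ambiguous(term):
    """Treat short all-capital acronyms as likely homonyms (assumed threshold)."""
    return term.isupper() and len(term) <= 3

def accept_mapping(term, concept, text):
    """Decide whether `term` found in `text` should be tagged as `concept`.

    `concept` is a hypothetical dict with 'synonyms' and 'keywords' lists.
    """
    if not is_likely_ambiguous(term):
        return True  # long forms are accepted without further evidence
    lowered = text.lower()
    # A different synonym of the same concept in the text, e.g. "KLK3 (PSA)".
    other_synonyms = [s for s in concept["synonyms"] if s.lower() != term.lower()]
    if any(s.lower() in lowered for s in other_synonyms):
        return True
    # A keyword of the concept in the text, e.g. "... secreted by the prostate".
    return any(k.lower() in lowered for k in concept["keywords"])

# Illustrative concept record; the field contents are made up for the example.
psa = {"synonyms": ["PSA", "KLK3", "Prostate Specific Antigen"],
       "keywords": ["prostate"]}
```

For example, `accept_mapping("PSA", psa, "KLK3 (PSA)")` accepts the mapping because a second synonym is present, whereas a sentence that mentions PSA with no synonym or keyword is rejected.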

  8. Homonym disambiguation • Concept profile of text containing PSA (unknown meaning) • Similarity? Concept profile of Prostate Specific Antigen • Similarity? Concept profile of Phosphoserine Aminotransferase • Previous tests showed an overall accuracy of 93%
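
Profile-based disambiguation can be sketched as picking the candidate meaning whose concept profile is most similar to the profile of the text. The slide does not name the similarity measure, so cosine similarity is an assumption here, and the weights are made up for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse profiles (dicts of concept -> weight)."""
    dot = sum(w * b[c] for c, w in a.items() if c in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty profile is similar to nothing
    return dot / (norm_a * norm_b)

def disambiguate(text_profile, candidate_profiles):
    """Return the candidate meaning whose profile is most similar to the text's."""
    return max(candidate_profiles,
               key=lambda name: cosine(text_profile, candidate_profiles[name]))

# Made-up weights, for illustration only.
candidates = {
    "Prostate Specific Antigen": {"prostate": 0.6, "neoplasm": 0.4},
    "Phosphoserine Aminotransferase": {"serine": 0.7, "biosynthesis": 0.3},
}
text = {"prostate": 0.5, "screening": 0.2}
best = disambiguate(text, candidates)
```

Here the text shares the concept "prostate" only with the first candidate, so that meaning is selected.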

  9. Concept profile creation • From databases: Concept -> Text, Text, Text • By concept mapping: each Text -> Concept profile of text • Concept profiles of text -> Concept profile of concept
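
One way to picture this step is to build a profile per text and then combine them into the profile of the concept. The sketch below uses relative frequencies and a plain mean as stand-ins. The talk does not state how text profiles are combined, and the weighting schemes it lists are not reproduced here.

```python
from collections import Counter, defaultdict

def text_profile(concept_ids):
    """Profile of one text: relative frequency of each tagged concept."""
    counts = Counter(concept_ids)
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()} if total else {}

def concept_profile(text_profiles):
    """Profile of a concept: mean weight of each concept over its texts (assumed)."""
    if not text_profiles:
        return {}
    sums = defaultdict(float)
    for profile in text_profiles:
        for concept, weight in profile.items():
            sums[concept] += weight
    return {c: s / len(text_profiles) for c, s in sums.items()}

# Two hypothetical texts linked to the same concept, already tagged with concept IDs.
texts = [["C1", "C2"], ["C1"]]
profile = concept_profile([text_profile(t) for t in texts])
```

In this toy case `C1` ends up with weight 0.75 and `C2` with 0.25, because `C1` appears in both texts.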

  10. Concept profile creation • Uncertainty • cf. X IDF • Log likelihood • Binary

  11. Concept profile creation Profile of gene ESR1: estrogen receptor 1 • breast neoplasm 0.5 • BRCA1 0.34 • PGR 0.30 • Estrogen 0.28 • BRCA2 0.25 • TP53 0.15 • gene suppressor tumor 0.12 • genetics polymorphism 0.12 • genetic predisposition to disease 0.10 • female 0.05

  12. Concept profile comparison

  13. Concept profile comparison

  14. Nucleolus • main function: ribosome biogenesis • over 700 proteins identified and classified into 8 main categories

  15. Nucleolus – Concept profiles • From databases: Protein -> MEDLINE article, MEDLINE article, MEDLINE article • Each MEDLINE article -> Concept profile of text • Concept profiles of text -> Concept profile of protein

  16. Nucleolus – Concept profiles • BLAST (Basic Local Alignment Search Tool) • Query: nucleolar protein • Results: homologs in human, mouse, fruit fly, yeast

  17. Nucleolus – Concept profiles

  18. Nucleolus – fun with protein profiles • 2D visualization of high-dimensional space • Automatic functional annotation of proteins • Finding similar proteins

  19. Nucleolus – visualisation • Multi-Dimensional Scaling plot; labelled points: Exosome comp. 10, P98179, O43390, SRP, PARN, Q8N220

  20. Nucleolus – Assigning GO terms • From GO: GO term -> MEDLINE article, MEDLINE article, MEDLINE article • Each MEDLINE article -> Concept profile of text • Concept profiles of text -> Concept profile of GO term

  21. Nucleolus – Assigning GO terms AuC : Area under Curve
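
As a reminder of what the measure captures, the sketch below computes the area under a ROC curve as the fraction of (positive, negative) pairs that are ranked correctly. That the slide's curve is a ROC curve is an assumption, and the function name is hypothetical.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    A tie between a positive and a negative counts as half a correct ordering.
    """
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))
```

A value of 1.0 means every correct GO term is scored above every incorrect one, and 0.5 corresponds to a random ranking.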

  22. Nucleolus – Assigning GO terms ‘Mistakes’ in automatic annotation • Manual assignment to one category only • - e.g. SFRS protein kinase 1 plays a role in splicing, but is also a kinase • Assumptions do not always hold • - Sequence homology ≠ function homology • - Concept co-occurrence ≠ functional relationship • Homonyms

  23. Nucleolus – Finding new proteins • Concept profile of nucleolar protein • Compared with: Concept profile of human protein, Concept profile of human protein, Concept profile of human protein

  24. Nucleolus – Finding new proteins • 60S ribosomal protein L3-like • Probable ATP-dependent RNA helicase DDX4 • ATP-dependent RNA helicase DDX3Y • Guanine nucleotide binding protein-like 3 • Importin-11 (importin beta family) • Putative Brix domain containing protein 1P • Probable ATP-dependent RNA helicase DDX20 (Gemin 3) • 60S acidic ribosomal protein P0 • Helicase SKI2W • ATP-dependent RNA helicase DDX39 • 40S ribosomal protein S20 • Probable ATP-dependent RNA helicase DDX6 • Probable ATP-dependent RNA helicase DDX23 • Double-stranded RNA-binding protein Staufen homolog 1 • ATP-dependent RNA helicase DDX25 • Probable nucleolar complex protein 14 • Eukaryotic initiation factor 4A-II • ATP-dependent RNA helicase DDX19B • 40S ribosomal protein S3
  Labels (in slide order): Ribosomal protein, DEAD-box, DEAD-box, Found in nucleolus, Associated with nucleolar p., DEAD-box, DEAD-box, DEAD-box, Found in nucleolus, DEAD-box, Ribosomal protein, DEAD-box, DEAD-box, Indirect evidence, DEAD-box, Nucleolar, DEAD-box, DEAD-box, Ribosomal protein
