Enriching Bio-ontologies with Non-hierarchical Relations - PowerPoint PPT Presentation

slide1 n.
Download
Skip this Video
Loading SlideShow in 5 Seconds..
Enriching Bio-ontologies with Non-hierarchical Relations PowerPoint Presentation
Download Presentation
Enriching Bio-ontologies with Non-hierarchical Relations

play fullscreen
1 / 38
Enriching Bio-ontologies with Non-hierarchical Relations
121 Views
Download Presentation
riona
Download Presentation

Enriching Bio-ontologies with Non-hierarchical Relations

- - - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript

  1. Workshop on bio-ontologiesOctober 28, 2005 Enriching Bio-ontologieswith Non-hierarchical Relations Olivier Bodenreider Lister Hill National Centerfor Biomedical CommunicationsBethesda, Maryland - USA

  2. Acknowledgments • Marc AubryUMR 6061 CNRS,Rennes, France • Anita BurgunUniversity of Rennes, France

  3. Gene Ontology • Annotate gene products • Coverage • Molecular functions • Cellular components • Biological processes • Explicit relations to other terms within the same hierarchy • No (explicit) relations • To terms across hierarchies • To concepts from other biological ontologies

  4. Gene Ontology Molecularfunctions Cellularcomponents Biologicalprocesses BP: metal ion transport MF: metal ion transporter activity

  5. Motivation • Richer ontology • Beyond hierarchies • Easier to maintain • Explicit dependence relations • More consistent annotations • Quality assurance • Assisted curation

  6. Related work • Ontologizing GO • GONG • Identifying relations among GO terms across hierarchies • Lexical approach • Non-lexical approaches • Identifying relations between GO terms and OBO terms • ChEBI • Representing relations among GO terms and between GO terms and OBO terms • Obol • See also: [Wroe & al., PSB 2003] [Ogren & al., PSB 2004-2005] [Bodenreider & al., PSB 2005] [Burgun & al., SMBM 2005] [Mungall, CFG 2005] [Bada & al., 2004], [Kumar & al., 2004], [Dolan & al., 2005]

  7. Non-lexical approaches to identifyingassociative relations in the Gene Ontology

  8. GO and annotation databases • 5 model organisms • FlyBase • GOA-Human • MGI • SGD • WormBase 270,000 gene-term associations

  9. Three non-lexical approaches All based on annotation databases Similarity in the vector space model Statistical analysis of co-occurring GO terms Association rule mining

  10. GO terms Genes Genes GO terms g1 g2 … gn t1 t2 … tn t1 g1 t2 g2 … … gn tn  Similarity in the vector space model Annotationdatabase

  11. Genes   Sim(ti,tj) = ti · tj GO terms t1 t2 … tn t1 GO terms t2 GO terms … tn tj g1 g2 … gn ti t1 t2 … tn  Similarity in the vector space model Similaritymatrix

  12. GO terms t2 t7 t9 Genes t3 t5 t7 t9 g5 g2 t1 t2 … tn g1 g2 … gn  Analysis of co-occurring GO terms … Annotationdatabase

  13. Co-oc. = 46 GO:0008009 immune response GO:0006955 chemokine activity  Analysis of co-occurring GO terms • Statistical analysis: test independence • Likelihood ratio test (G2) • Chi-square test (Pearson’s χ2) • Example from GOA (22,720 annotations) • C0006955 [BP] Freq. = 588 • C0008009 [MF] Freq. = 53 G2 = 298.7 p < 0.000

  14. GO terms t2 t7 t9 transaction Genes g2 t1 t2 … tn g1 g2 … gn  Association rule mining Annotationdatabase apriori • Rules: t1 => t2 • Confidence: > .9 • Support: .05

  15. Examples of associations • Vector space model • MF: ice binding • BP: response to freezing • Co-occurring terms • MF: chromatin binding • CC: nuclear chromatin • Association rule mining • MF: carboxypeptidase A activity • BP: peptolysis and peptidolysis

  16. Associations identified 7665 by at least one approach

  17. Associations identified VSM MF: ice binding BP: response to freezing

  18. Associations identified COC MF: chromatin binding CC: nuclear chromatin

  19. Associations identified ARM MF: carboxypeptidase A activity BP: peptolysis and peptidolysis

  20. Lexical vs. non-lexical Among non-lexical Limited overlap among approaches VSM 7665 5493 COC 3689 121 2587 201 305 309 230 453 ARM

  21. Linking the Gene Ontologyto other biological ontologies

  22. Related domains • Organisms • Cell types • Physical entities • Gross anatomy • Molecules • Functions • Organism functions • Cell functions • Pathology cytosolic ribosome (sensu Eukaryota) T-cell activation brain development transferrin receptor activity visual perception T-cell activation regulation of blood pressure

  23. GO and other domains [adapted from B. Smith]

  24. GO and other domains (revisited) [adapted from B. Smith]

  25. Biological ontologies (OBO)

  26. GO and other domains (revisited) [adapted from B. Smith]

  27. Integrating biological ontologies Organism Biologocalprocesses Other OBO ontologies Cellularcomponents Molecule Molecularfunctions Cell

  28. Linking GO to ChEBI [Burgun & al., SMBM 2005]

  29. ChEBI • Member of the OBO family • Ontology ofChemical Entities of Biological Interest • Atom • Molecule • Ion • Radical • 10,516 entities • 27,097 terms [Dec. 22, 2005]

  30. Methods • Every ChEBI term searched in every GO term • Maximize precision • Ignored ChEBI terms of 3 characters or less • Proper substring • Maximize recall • Case insensitive matches • Normalized ChEBI names(generated singular forms from plurals)

  31. Examples • iron [CHEBI:18248] • uronic acid [CHEBI:27252] • carbon [CHEBI:27594] BPiron ion transport[GO:0006826] MFiron superoxide dismutase activity [GO:0008382] CCvanadium-iron nitrogenase complex[GO:0016613] BPuronic acid metabolism[GO:0006063] MFuronic acid transporter activity[GO:0015133] BPresponse to carbon dioxide[GO:0010037] MFcarbon-carbon lyase activity[GO:0016830]

  32. 2,700 ChEBI entities (27%) identified in some GO term 9,431 GO terms (55%) include some ChEBI entity in their names Quantitative results 10,516 entities 17,250 terms 20,497 associations

  33. Leydic cell tumor Cell types MousePathology FMA cerebellar aplasia CC MF BP irondeposition cellmembraneviscosity enzymaticreaction CHEBI REX FIX Generalization

  34. Conclusions

  35. Conclusions (1) • Links across OBO ontologies need to be made explicit • Between GO terms across GO hierarchies • Between GO terms and OBO terms • Between terms across OBO ontologies • Automatic approaches • Effective (GO-GO, GO-ChEBI) • At least to bootstrap the process • Needs to be refined

  36. Conclusions (2) • Affordable relations • Computer-intensive, not labor-intensive • Methods must be combined • Cross-validation • Redundancy as a surrogate for reliability • Relations identified specifically by one approach • False positives • Specific strength of a particular method • Requires (some) manual curation • Biologists must be involved

  37. References • Bodenreider O, Aubry M, Burgun A. Non-lexical approaches to identifying associative relations in the Gene Ontology. In: Altman RB, Dunker AK, Hunter L, Jung TA, Klein TE, editors. Pacific Symposium on Biocomputing 2005: World Scientific; 2005. p. 91-102.http://mor.nlm.nih.gov/pubs/pdf/2005-psb-ob.pdf • Burgun A, Bodenreider O. An ontology of chemical entities helps identify dependence relations among Gene Ontology terms. Proceedings of the First International Symposium on Semantic Mining in Biomedicine (SMBM-2005)Electronic proceedings: CEUR-WS/Vol-148http://mor.nlm.nih.gov/pubs/pdf/2005-smbm-ab.pdf

  38. MedicalOntologyResearch Contact: Web: olivier@nlm.nih.gov mor.nlm.nih.gov Olivier Bodenreider Lister Hill National Centerfor Biomedical CommunicationsBethesda, Maryland - USA