
Applying Semantic Technologies to the Glycoproteomics Domain

Applying Semantic Technologies to the Glycoproteomics Domain. W. S. York, May 15, 2006. Some Goals of Glycoproteomics: How do changes in the expression levels of specific genes alter the expression of specific glycans on the cell surface?


Presentation Transcript


  1. Applying Semantic Technologies to the Glycoproteomics Domain. W. S. York, May 15, 2006

  2. Some Goals of Glycoproteomics
     • How do changes in the expression levels of specific genes alter the expression of specific glycans on the cell surface?
     • Are changes in the expression of specific glycans at the cell surface related to cell function, cell development, and disease?
     • What are the mechanisms by which specific glycans at the cell surface affect cell function, cell development, and the progression of disease?

  3. Challenges of Glycoproteomics
     • Vast amounts of data collected by high-throughput experiments: better methods for data archival, retrieval, and analysis are needed
     • Complex structures of glycans and glycoproteins: better methods for representing branched structures and finding structural and functional homologies are needed
     • Complex biology and biochemistry: better methods to find relationships between the glycoproteome and biological processes are needed

  4. Glycoproteomics Solutions
     • Brute-force analysis of flat data files
        • Too much data
        • Data is heterogeneous
        • What does the data represent?
     • Relational databases
        • Data is well organized
        • Data organization is relatively rigid
        • What does the data represent?
     • Semantic Technologies
        • Data is well organized
        • Data organization is flexible
        • Concepts represented by data are accessible
        • Relationships between concepts are accessible

  5. What is Semantic Technology?
     • Semantics: 1. (Linguistics) The study or science of meaning in language. 2. (Linguistics) The study of relationships between signs and symbols and what they represent. (The American Heritage® Dictionary of the English Language, Fourth Edition)
     • Semantic Technology: The use of formal representations of concepts and their relationships to enable efficient, intelligent software.
     • Ontology (Computer Science): A model that represents a domain and is used to reason about the objects in that domain and the relations between them. (http://en.wikipedia.org/wiki/Ontology_(computer_science))
     • The implication is that enabling computers to “understand” the meanings of and relationships between concepts will allow them to reason and communicate in a way that is analogous to the way humans do.

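  6. A Simple Ontology [Diagram: an is_a hierarchy. Animal and Plant are kinds of Organism; Lion, Cow, and Deer are kinds of Animal; Hosta and Alfalfa are kinds of Plant. Individuals: Elsa and Simba (Lion), Elsie (Cow), Bambi (Deer), My Hosta (Hosta), Peter’s Alfalfa (Alfalfa). Four "ate" relationships connect individuals.]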

  7. A Simple Ontology [Diagram: the same hierarchy with two classes inserted under Animal, Carnivore and Herbivore, each carrying an "eats" relationship. Lion now sits under Carnivore; Cow and Deer sit under Herbivore; Hosta and Alfalfa remain kinds of Plant. The individuals and their four "ate" relationships are unchanged.]
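The two "Simple Ontology" slides can be illustrated with a small sketch. It is not taken from the presentation: the is_a links follow the class and individual names on the slides, while the targets of the class-level "eats" relationships (Carnivore eats Animal, Herbivore eats Plant) are assumptions, because the transcript does not preserve which boxes the arrows connect. The function names are hypothetical.

```python
# Class hierarchy read from the slides (child class -> parent class).
SUBCLASS_OF = {
    "Animal": "Organism", "Plant": "Organism",
    "Carnivore": "Animal", "Herbivore": "Animal",
    "Lion": "Carnivore", "Cow": "Herbivore", "Deer": "Herbivore",
    "Hosta": "Plant", "Alfalfa": "Plant",
}

# Individuals and the class each one is an instance of.
INSTANCE_OF = {
    "Elsa": "Lion", "Simba": "Lion", "Elsie": "Cow", "Bambi": "Deer",
    "My Hosta": "Hosta", "Peter's Alfalfa": "Alfalfa",
}

# ASSUMPTION: the targets of "eats" are not recoverable from the transcript.
EATS = {"Carnivore": "Animal", "Herbivore": "Plant"}


def types_of(individual):
    """Return the individual's class followed by every ancestor class."""
    cls = INSTANCE_OF[individual]
    chain = [cls]
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        chain.append(cls)
    return chain


def is_a(individual, cls):
    """True if the individual belongs to cls directly or by inheritance."""
    return cls in types_of(individual)


def may_eat(eater, food):
    """True if some class of the eater has an 'eats' rule covering the food."""
    food_types = types_of(food)
    return any(EATS.get(cls) in food_types for cls in types_of(eater))


print(types_of("Elsa"))              # class chain inferred through is_a
print(is_a("Bambi", "Organism"))     # implicit knowledge made explicit
print(may_eat("Elsie", "My Hosta"))  # checks an instance against a class rule
```

The point of the sketch is that nothing states directly that Bambi is an Organism or that Elsie may eat a Hosta; both follow from the relationships, which is the kind of inference the slides attribute to ontologies.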

  8. The Structure of GlycO: Concept Taxonomy [Diagram: is_a taxonomy containing the concepts chemical entity, molecule, molecular fragment, carbohydrate moiety, monoglycosyl moiety, glycan moiety, N-glycan, O-glycan, residue, amino acid residue, and carbohydrate residue.]

  9. The Structure of GlycO: Concept Taxonomy [Diagram: detail of the is_a taxonomy showing residue, amino acid residue, carbohydrate residue, glycan moiety, N-glycan, and O-glycan.]

  10. The Structure of GlycO: Concept Taxonomy, Instances and Properties [Diagram: the taxonomy detail (residue, amino acid residue, carbohydrate residue, glycan moiety, N-glycan, O-glycan) with the instances N-glycan_00020, "N-glycan a-D-Manp 4", and "N-glycan core b-D-Manp" attached by is_instance_of, and the properties has_residue and is_linked_to.]

  11. The GlycO Ontology in Protégé 3 Top-Level Classes are Defined in GlycO

  12. The GlycO Ontology in Protégé Semantics Include Chemical Context This Class Inherits from 2 Parents

  13. The GlycO Ontology in Protégé The a-D-Manp residues in N-glycans are found in 8 different chemical environments

  14. GlycoTree: A Canonical Representation of N-Glycans
      [Diagram: canonical N-glycan tree with the residues b-D-GlcpNAc -(1-6)+, b-D-GlcpNAc -(1-2)-, b-D-GlcpNAc -(1-2)+, b-D-GlcpNAc -(1-4)-, a-D-Manp -(1-6)+, b-D-Manp -(1-4)-, b-D-GlcpNAc -(1-4)-, b-D-GlcpNAc, a-D-Manp -(1-3)+]
      We give a residue in this position the same name, regardless of the specific structure it resides in. Semantics!
      N. Takahashi and K. Kato, Trends in Glycosciences and Glycotechnology, 15: 235-251
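The GlycoTree idea of naming a residue by its position, regardless of the glycan it occurs in, can be sketched as follows. This is an illustration only, not the GlycoTree or GlycO naming scheme: it assumes a position can be identified by the chain of (linkage, residue) steps leading to it from the reducing end, and the helper names `residue`, `core`, and `position_keys` are hypothetical.

```python
def residue(name, linkage=None, children=()):
    """A residue as (name, linkage to its parent, list of child residues)."""
    return (name, linkage, list(children))


def core(arm3=(), arm6=()):
    """N-glycan core with optional substituents on the 3- and 6-linked mannose."""
    return residue("b-D-GlcpNAc", None, [
        residue("b-D-GlcpNAc", "1-4", [
            residue("b-D-Manp", "1-4", [
                residue("a-D-Manp", "1-3", arm3),
                residue("a-D-Manp", "1-6", arm6),
            ]),
        ]),
    ])


def position_keys(node, path=()):
    """Yield one key per residue: the (linkage, name) steps from the root."""
    name, linkage, children = node
    key = path + ((linkage, name),)
    yield key
    for child in children:
        yield from position_keys(child, key)


# Two different structures that share the core and one antenna.
glycan_a = core(arm3=[residue("b-D-GlcpNAc", "1-2")])
glycan_b = core(arm3=[residue("b-D-GlcpNAc", "1-2")],
                arm6=[residue("b-D-GlcpNAc", "1-2")])

keys_a = set(position_keys(glycan_a))
keys_b = set(position_keys(glycan_b))

# Residues in the same position receive the same key in both glycans.
print(len(keys_a & keys_b), "shared positions")
```

Because the key depends only on position, data attached to a residue in one glycan can be compared directly with data for the residue at that position in any other glycan.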

  15. The GlycO Ontology in Protégé Bisecting b-D-GlcpNAc

  16. The GlycO Ontology in Protégé

  17. The GlycO Ontology in Protégé 1,3-linked a-L-Fucp

  18. The GlycO Ontology in Protégé

  19. Ontology Population Workflow

  20. Ontology Population Workflow [][Asn]{[(4+1)][b-D-GlcpNAc] {[(4+1)][b-D-GlcpNAc] {[(4+1)][b-D-Manp] {[(3+1)][a-D-Manp] {[(2+1)][b-D-GlcpNAc] {}[(4+1)][b-D-GlcpNAc] {}}[(6+1)][a-D-Manp] {[(2+1)][b-D-GlcpNAc]{}}}}}}

  21. Ontology Population Workflow
      <Glycan>
        <aglycon name="Asn"/>
        <residue link="4" anomeric_carbon="1" anomer="b" chirality="D" monosaccharide="GlcNAc">
          <residue link="4" anomeric_carbon="1" anomer="b" chirality="D" monosaccharide="GlcNAc">
            <residue link="4" anomeric_carbon="1" anomer="b" chirality="D" monosaccharide="Man">
              <residue link="3" anomeric_carbon="1" anomer="a" chirality="D" monosaccharide="Man">
                <residue link="2" anomeric_carbon="1" anomer="b" chirality="D" monosaccharide="GlcNAc">
                </residue>
                <residue link="4" anomeric_carbon="1" anomer="b" chirality="D" monosaccharide="GlcNAc">
                </residue>
              </residue>
              <residue link="6" anomeric_carbon="1" anomer="a" chirality="D" monosaccharide="Man">
                <residue link="2" anomeric_carbon="1" anomer="b" chirality="D" monosaccharide="GlcNAc">
                </residue>
              </residue>
            </residue>
          </residue>
        </residue>
      </Glycan>
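The step from the bracketed linear notation on slide 20 to the XML on slide 21 could be sketched as a small recursive parser. This is a minimal sketch under stated assumptions, not the converter used in the actual workflow: the grammar is inferred from the single example on the slides, `(4+1)` is read as link position plus anomeric carbon, and the ring-form letter `p` is dropped from residue names because the slide's XML shows "GlcNAc" and "Man". The function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# The linear string from slide 20, with whitespace removed.
LINEAR = (
    "[][Asn]{[(4+1)][b-D-GlcpNAc]{[(4+1)][b-D-GlcpNAc]{[(4+1)][b-D-Manp]"
    "{[(3+1)][a-D-Manp]{[(2+1)][b-D-GlcpNAc]{}[(4+1)][b-D-GlcpNAc]{}}"
    "[(6+1)][a-D-Manp]{[(2+1)][b-D-GlcpNAc]{}}}}}}"
)

# ASSUMPTION: grammar inferred from the one example on the slides.
HEADER = re.compile(r"\[\]\[(\w+)\]\{")
RESIDUE = re.compile(r"\[\((\d+)\+(\d+)\)\]\[([ab])-([DL])-(\w+)\]\{")


def drop_ring_form(name):
    """Turn 'GlcpNAc' into 'GlcNAc' and 'Manp' into 'Man' (assumed rule)."""
    if len(name) > 3 and name[3] == "p":
        return name[:3] + name[4:]
    return name


def parse_children(text, pos, parent):
    """Parse sibling residues at text[pos], then the closing brace."""
    while True:
        match = RESIDUE.match(text, pos)
        if match is None:
            break
        link, carbon, anomer, chirality, name = match.groups()
        node = ET.SubElement(
            parent, "residue", link=link, anomeric_carbon=carbon,
            anomer=anomer, chirality=chirality,
            monosaccharide=drop_ring_form(name),
        )
        pos = parse_children(text, match.end(), node)
    if pos >= len(text) or text[pos] != "}":
        raise ValueError(f"expected closing brace at offset {pos}")
    return pos + 1


def parse_linear(text):
    """Convert the linear notation into a <Glycan> element tree."""
    text = "".join(text.split())
    header = HEADER.match(text)
    if header is None:
        raise ValueError("missing aglycon header")
    root = ET.Element("Glycan")
    ET.SubElement(root, "aglycon", name=header.group(1))
    if parse_children(text, header.end(), root) != len(text):
        raise ValueError("unexpected trailing characters")
    return root


glycan = parse_linear(LINEAR)
print(ET.tostring(glycan, encoding="unicode"))
```

Each pair of braces in the linear form becomes one level of nesting in the XML, so the branched structure of the glycan is preserved explicitly.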

  22. The ProPreO Ontology in Protégé 3 Top-Level Classes are Defined in ProPreO

  23. The ProPreO Ontology in Protégé This Class Inherits from 2 Parents

  24. The ProPreO Ontology in Protégé This Class Inherits from 2 Parents

  25. Semantic Annotation of MS Data
      ms/ms peaklist data
      parent ion m/z    parent ion abundance    parent ion charge
      830.9570          194.9604                2
      fragment ion m/z    fragment ion abundance
      580.2985            0.3592
      688.3214            0.2526
      779.4759            38.4939
      784.3607            21.7736
      1543.7476           1.3822
      1544.7595           2.9977
      1562.8113           37.4790
      1660.7776           476.5043

  26. Semantically Annotated MS Data
      <ms/ms_peak_list>
        <parameter instrument="micromass_QTOF_2_quadrupole_time_of_flight_mass_spectrometer" mode="ms/ms"/>
        <parent_ion m/z="830.9570" abundance="194.9604" z="2"/>
        <fragment_ion m/z="580.2985" abundance="0.3592"/>
        <fragment_ion m/z="688.3214" abundance="0.2526"/>
        <fragment_ion m/z="779.4759" abundance="38.4939"/>
        <fragment_ion m/z="784.3607" abundance="21.7736"/>
        <fragment_ion m/z="1543.7476" abundance="1.3822"/>
        <fragment_ion m/z="1544.7595" abundance="2.9977"/>
        <fragment_ion m/z="1562.8113" abundance="37.4790"/>
        <fragment_ion m/z="1660.7776" abundance="476.5043"/>
      </ms/ms_peak_list>
      Ontological Concepts
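The annotation step from the raw peak list (slide 25) to tagged data (slide 26) might look like the sketch below. It is illustrative rather than the presenters' tooling. Because "/" is not legal in XML element or attribute names, the sketch substitutes the hypothetical names `ms_ms_peak_list` and `mz` for the slide's `ms/ms_peak_list` and `m/z`; the numeric values are the ones shown on the slides.

```python
import xml.etree.ElementTree as ET

# Values copied from the peak list on slide 25.
PARENT_ION = {"mz": "830.9570", "abundance": "194.9604", "z": "2"}
FRAGMENT_IONS = [
    ("580.2985", "0.3592"), ("688.3214", "0.2526"),
    ("779.4759", "38.4939"), ("784.3607", "21.7736"),
    ("1543.7476", "1.3822"), ("1544.7595", "2.9977"),
    ("1562.8113", "37.4790"), ("1660.7776", "476.5043"),
]
INSTRUMENT = "micromass_QTOF_2_quadrupole_time_of_flight_mass_spectrometer"


def annotate(parent_ion, fragment_ions, instrument, mode="ms/ms"):
    """Wrap bare numbers in elements whose names say what they mean."""
    # ASSUMPTION: 'ms_ms_peak_list' and 'mz' stand in for the slide's names.
    root = ET.Element("ms_ms_peak_list")
    ET.SubElement(root, "parameter", instrument=instrument, mode=mode)
    ET.SubElement(root, "parent_ion", **parent_ion)
    for mz, abundance in fragment_ions:
        ET.SubElement(root, "fragment_ion", mz=mz, abundance=abundance)
    return root


peak_list = annotate(PARENT_ION, FRAGMENT_IONS, INSTRUMENT)
print(ET.tostring(peak_list, encoding="unicode"))
```

Once every number carries a name tied to an ontology concept, software can select, for example, all fragment ions of a given parent ion without knowing the column order of the original file.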

  27. Web Services Based Workflow for Proteomics [1]
      [Diagram: a workflow of five stages with four agents: Biological Sample Analysis by MS/MS; Raw Data to Standard Format; Data Pre-process [2]; DB Search (Mascot/Sequest); Results Post-process (ProValt [3]). Stage outputs (O) and inputs (I) pass through Storage: Raw Data, Standard Format Data, Filtered Data, Search Results, Final Output. The end product is Biological Information.]
      [1] Design and Implementation of Web Services based Workflow for proteomics. Journal of Proteome Research. Submitted.
      [2] Computational tools for increasing confidence in protein identifications. Association of Biomolecular Resource Facilities Annual Meeting, Portland, OR, 2004.
      [3] A Heuristic method for assigning a false-discovery rate for protein identifications from Mascot database search results. Mol. Cell. Proteomics. 4(6), 762-772.

  28. An Integrated Semantic Information System
      • Formalized domain knowledge is in ontologies
         • The schema defines the concepts
         • Instances represent individual objects
         • Relationships provide expressiveness
      • Data is annotated using concepts from the ontologies
      • The semantic annotations facilitate the identification and extraction of relevant information
      • The semantic relationships allow knowledge that is implicit in the data to be discovered

  29. Satya Sahoo Christopher Thomas Cory Henson Ravi Pavagada Amit Sheth Krzysztof Kochut John Miller James Atwood Lin Lin Alison Nairn Gerardo Alvarez-Manilla Saeed Roushanzamir Michael Pierce Ron Orlando Kelley Moremen Parastoo Azadi Alfred Merrill
