Web Services for N-Glycosylation Process. Satya S. Sahoo, Amit P. Sheth, William S. York, John A. Miller. Presentation at International Symposium on Web Services For Computational Biology and Bioinformatics, VBI, Blacksburg, VA, May 26-27, 2005.


Web Services for N-Glycosylation Process

Satya S. Sahoo, Amit P. Sheth, William S. York, John A. Miller

Presentation at

International Symposium on Web Services For Computational Biology and Bioinformatics, VBI, Blacksburg, VA, May 26-27, 2005

Integrated Technology Resource for Biomedical Glycomics

NCRR/NIH


Glycomics

  • Study of the structure, function and quantity of the ‘complex carbohydrates’ synthesized by an organism

  • Glycosylation: the addition of carbohydrates to the basic protein structure

Folded protein structure (schematic)


Glycosylation – why is it important?

  • The genome (DNA) and the proteome (proteins) are not the only factors in the life functions of an organism

  • Carbohydrates attached to different protein structures (by glycosylation) are important for:

    • Identification of foreign entities by immune system cells

    • Markers for the accurate diagnosis of diseases

    • Regulation of signaling activities

  • Glycosylation is categorized by the way carbohydrates are attached to proteins. Example: N-glycosylation


N-Glycosylation Process (NGP)

By N-glycosylation Process, we mean the identification and quantification of glycopeptides

[Workflow diagram, stages in order:]

Cell Culture -> (extract) -> Glycoprotein Fraction -> (proteolysis) -> Glycopeptides Fraction (1)

-> (Separation technique I) -> Glycopeptides Fraction (n) -> (PNGase) -> Peptide Fraction (n)

-> (Separation technique II) -> Peptide Fraction (n*m) -> (Mass spectrometry) -> ms data, ms/ms data

ms data -> (Data reduction) -> ms peaklist -> (binning) -> N-dimensional array

ms/ms data -> (Data reduction) -> ms/ms peaklist -> (Peptide identification) -> Peptide list

N-dimensional array, Peptide list -> (Data correlation, Signal integration) -> Glycopeptide identification and quantification


NGP – part of the Bioinformatics core, Integrated Technology Resource for Biomedical Glycomics

  • This Resource was established by the National Center for Research Resources

  • The aim is to develop the tools and technology to analyze glycoprotein and glycolipid expression of embryonic stem cells

  • Our research provides bioinformatics support for four research groups:

    • Embryonic Stem Cell Culture Program

    • Glycomic Analysis of Glycoproteins

    • Glycomic Analyses of Glycosphingolipids and Sphingolipids

    • Transcript analysis by kinetic RT-PCR



NGP – need in Glycomics

  • Unlike proteomics or genomics, high-throughput experimental protocols are still being established in Glycomics

  • NGP involves a multitude of heterogeneous tasks, including human-mediated tasks

  • NGP attempts to encapsulate particular computational steps as platform-independent, scalable and Web-accessible tools – Web Services

  • Enables glycobiologists to integrate automated data generation tasks with data processing tools (Web Services) across the end-to-end experimental lifecycle



N-Glycosylation identification - Problems

  • Extremely difficult to identify glycosylated peptide sequences using standard analytical methods

  • N-glycosylation occurs at particular sites on the protein structure – consensus sequences

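[Figure: An example glycopeptide (schematic). Labels: Peptide; Glycan; Consensus Sequence N-X-S/T, with the consensus N written as J; PNGaseF, which releases the glycan and converts Asparagine (N) to Aspartate (D)]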
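The consensus-sequence idea on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the NGP code: the function names are hypothetical, excluding proline at position X follows the textbook N-X-S/T definition rather than anything stated on the slide, and the mass constants are standard monoisotopic residue masses that do not appear in the presentation.

```python
import re

# N-X-S/T consensus sequence. Excluding proline at X is the textbook
# convention and an assumption here; the slide only writes "X".
SEQUON = re.compile(r"N(?=[^P][ST])")

# Monoisotopic residue masses in Da (standard values, not from the slides).
ASN_RESIDUE = 114.042927
ASP_RESIDUE = 115.026943


def find_sequon_sites(sequence: str) -> list[int]:
    """Return 0-based positions of each N that starts an N-X-S/T motif."""
    # The lookahead consumes only the N itself, so overlapping motifs
    # such as "NNST" are all reported.
    return [m.start() for m in SEQUON.finditer(sequence.upper())]


def pngase_mass_shift(n_sites: int) -> float:
    """Mass gained when PNGaseF turns n_sites asparagines into aspartates."""
    return n_sites * (ASP_RESIDUE - ASN_RESIDUE)


if __name__ == "__main__":
    peptide = "AANGSK"  # hypothetical peptide, for illustration only
    sites = find_sequon_sites(peptide)
    print(sites, round(pngase_mass_shift(len(sites)), 6))
```

The small mass shift per site (just under 1 Da) is one reason a deglycosylated peptide is hard to tell apart from an unmodified one by mass alone, which is where the consensus sequence helps.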


NGP - implementation

  • NGP currently implements a Web Process consisting of two Web Services:

    • DB Modifier Web Service – modifies the search database by replacing N (in consensus sequences) with J

    • Collator Web Service – identifies a probable N-glycosylated peptide, using three parameters:

      • Calculated molecular mass

      • Presence of ‘J’ in a peptide sequence

      • MASCOT* Score assigned to a hit

  • NGP also involves a proprietary mass spectrometry search engine service (MASCOT*) as an intermediate task

  • Hence, the NGP Web Process identifies probable glycosylated peptides, enabling rapid processing of data from high-throughput experiments

*http://www.matrixscience.com/
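The DB Modifier step described above can be sketched roughly as follows. This is a guess at the behaviour, not the service's actual code: the function names are hypothetical, the database is assumed to be in FASTA format, and a consensus sequence is assumed to be N-X-S/T with X not proline.

```python
import re

# Assumption: a consensus sequence is N-X-S/T with X != P (textbook rule).
SEQUON_N = re.compile(r"N(?=[^P][ST])")


def mark_consensus_sites(sequence: str) -> str:
    """Replace the N of every N-X-S/T motif with J; other N's are untouched."""
    return SEQUON_N.sub("J", sequence)


def modify_fasta(fasta_text: str) -> str:
    """Apply mark_consensus_sites to the sequence lines of a FASTA database."""
    out = []
    for line in fasta_text.splitlines():
        # Header lines start with '>' and are copied unchanged.
        out.append(line if line.startswith(">") else mark_consensus_sites(line))
    return "\n".join(out)


if __name__ == "__main__":
    # Hypothetical entry, for illustration only.
    print(modify_fasta(">example protein\nMKNGSAANPT"))
```

Working line by line keeps the sketch short, but it would miss a motif that straddles a line break in a wrapped FASTA record; a fuller version would join each record's sequence lines first.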


NGP – Architecture (current)

[Architecture diagram, data flow:]

ms/ms raw data -> PEAK LIST FILE

Primary Sequence Database -> ModifyDB Web Service

PEAK LIST FILE + modified database -> MASCOT* Mass Spectrometer Search Engine -> MASCOT* output file (contains both glycosylated and non-glycosylated peptide sequences)

MASCOT* output file -> Collator Web Service -> Deglycosylated peptide list

*http://www.matrixscience.com/



NGP Results

q1_p1=-1

q2_p1=0,626.349945,-0.023321,2,APGVAGR,18,000000000,1.49,00020000000000000,0,0;"gi|51465537":0:190:196:1

q2_p2=1,626.361191,-0.034567,2,APARGR,18,00000000,1.33,00020000000000000,0,0;"gi|10140845":0:2:7:2

q2_p3=0,626.349945,-0.023321,2,APAVGGR,18,000000000,1.33,00020000000000000,0,0;"gi|51470766":0:212:218:1,"gi|51470768":0:212:218:1

q3_p3=0,634.368973,0.006151,4,DIIFK,12,0000000,25.26,00010020000000000,0,0;"gi|47078238":0:364:368:2,"gi|47078240":0:328:332:2

q3_p4=0,634.351227,0.023897,4,MPLFK,12,0000000,25.24,00010020000000000,0,0;"gi|41197108":0:95:99:1,"gi|4557311":0:1:5:2

q3_p5=0,634.343811,0.031313,3,NNLFK,12,0000000,15.34,00010020000000000,0,0;"gi|31377725":0:539:543:1

q3_p6=0,634.368973,0.006151,3,LDIFK,12,0000000,15.34,00010020000000000,0,0;"gi|39725634":0:891:895:1

q3_p7=0,634.343811,0.031313,3,NNIFK,12,0000000,15.34,00010020000000000,0,0;"gi|7661646":0:212:216:1

q3_p8=0,634.368973,0.006151,3,LDLFK,12,0000000,15.34,00010020000000000,0,0;"gi|51474898":0:237:241:1

q3_p9=0,634.368958,0.006166,3,EVIFK,12,0000000,13.61,00010020000000000,0,0;"gi|28376662":0:67:71:1

q3_p10=0,634.368958,0.006166,3,VELFK,12,0000000,13.61,00010020000000000,0,0;"gi|51467300":0:493:497:1,"gi|51467535":0:99:103:1

q4_p1=-1

q5_p1=0,662.375122,0.004702,5,DLLFR,14,0000000,18.41,00020020000000000,0,0;"gi|21536369":0:84:88:1,"gi|21536367":0:17:21:1,"gi|4557871":0:647:651:1

q5_p2=0,662.375122,0.004702,3,DLFLR,14,0000000,12.81,00010020000000000,0,0;"gi|33695153":0:407:411:1,"gi|4504043":0:330:334:1,"gi|11968045":0:6:10:1

q5_p3=0,662.375122,0.004702,3,DIFIR,14,0000000,12.81,00010020000000000,0,0;"gi|4505725":0:924:928:1,"gi|29788751":0:1170:1174:1

q5_p4=0,662.349960,0.029864,3,NNFIR,14,0000000,11.84,00010020000000000,0,0;"gi|24416002":0:667:671:1

q5_p5=0,662.375122,0.004702,4,IDLFR,14,0000000,9.98,00020020000000000,0,0;"gi|12957488":0:602:606:1,"gi|41148707":0:536:540:1,"gi|51464463":0:646:650:1

q5_p6=0,662.375122,0.004702,4,LDLFR,14,0000000,9.98,00020020000000000,0,0;"gi|42657517":0:335:339:1

q5_p7=0,662.375107,0.004717,4,VELFR,14,0000000,9.98,00020020000000000,0,0;"gi|6912230":0:436:440:1

q5_p8=0,662.375122,0.004702,4,LDIFR,14,0000000,9.98,00020020000000000,0,0;"gi|8922081":0:2699:2703:1

q5_p9=0,662.349960,0.029864,4,NLNFR,64,0000000,5.89,00010020000000000,0,0;"gi|19923416":0:816:820:1

q5_p10=1,662.361191,0.018633,2,NRFAR,14,0000000,3.37,00010020000000000,0,0;"gi|4758704":0:97:101:1

q6_p1=0,674.359863,-0.006639,4,VSDNIK,35,00000000,11.27,00010020000000000,0,0;"gi|32130516":0:935:940:1

q6_p2=0,674.323456,0.029768,5,EGDLGGK,21,000000000,7.97,00020020000000000,0,0;"gi|13569928":0:1058:1064:1

q6_p3=0,674.359848,-0.006624,5,EATVAGK,21,000000000,7.88,00020020000000000,0,0;"gi|51475822":0:527:533:1

q6_p4=1,674.389740,-0.036516,3,QRMLK,14,0000000,7.46,00020010000000000,0,0;"gi|24307905":0:467:471:2,"gi|24307905":0:638:642:2

q6_p5=0,674.359863,-0.006639,5,LSSSPGK,56,000000000,7.38,00000020000000000,0,0;"gi|8922075":0:806:812:1

q6_p6=0,674.338730,0.014494,4,WDLGGK,42,00000000,6.40,00010020000000000,0,0;"gi|13375817":0:123:128:1

q6_p7=0,674.359879,-0.006655,4,QATDLK,56,00000000,6.21,00020010000000000,0,0;"gi|21361684":0:451:456:1

q6_p8=1,674.371094,-0.017870,3,QTNKGK,14,00000000,6.03,00020010000000000,0,0;"gi|41117716":0:85:90:1

q6_p9=1,674.389740,-0.036516,6,QMRIK,28,0000000,5.77,00020020000000000,0,0;"gi|28329439":0:269:273:1,"gi|28558993":0:278:282:1

q6_p10=1,674.389740,-0.036516,6,QMRLK,28,0000000,5.77,00020020000000000,0,0;"gi|40255096":0:300:304:1

q7_p1=0,695.348969,0.007855,4,YDASLK,14,00000000,8.86,00020020000000000,0,0;"gi|4758454":0:2761:2766:1

  • A typical MASCOT output file is about 3MB!

  • High-throughput experimental protocols generate thousands of such files, so manual identification is not feasible
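To make the automation argument concrete, here is a minimal sketch of reading lines like those above and filtering them the way the Collator is described. It is an illustration under assumptions, not the Collator's code: the class and function names are hypothetical, the second field is read as the calculated mass and the eighth as the MASCOT score, and only the 'J' and score checks are applied because the slides do not say how the mass parameter is used.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PeptideHit:
    query: int
    rank: int
    calc_mass: float      # second field, read here as the calculated mass
    delta: float          # third field
    peptide: str          # fifth field
    score: float          # eighth field, read here as the MASCOT score
    accessions: List[str]


def parse_hit(line: str) -> Optional[PeptideHit]:
    """Parse one 'qN_pM=...' line; return None for empty queries ('=-1')."""
    key, _, value = line.strip().partition("=")
    if value in ("", "-1"):
        return None
    query, rank = (int(part[1:]) for part in key.split("_"))
    fields, _, proteins = value.partition(";")
    cols = fields.split(",")
    # Each protein entry looks like "accession":frame:start:end:multiplicity.
    accessions = [p.split('":')[0].strip('"') for p in proteins.split(",") if p]
    return PeptideHit(query, rank, float(cols[1]), float(cols[2]),
                      cols[4], float(cols[7]), accessions)


def collate(hits: List[PeptideHit], min_score: float) -> List[PeptideHit]:
    """Keep hits whose peptide contains 'J' and whose score reaches min_score."""
    return [h for h in hits if "J" in h.peptide and h.score >= min_score]


if __name__ == "__main__":
    line = ('q3_p3=0,634.368973,0.006151,4,DIIFK,12,0000000,25.26,'
            '00010020000000000,0,0;"gi|47078238":0:364:368:2,'
            '"gi|47078240":0:328:332:2')
    print(parse_hit(line))
```

None of the hits shown on the slide contains a 'J', so `collate` would return an empty list for them; the score threshold `min_score` is a free parameter in this sketch.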


NGP Web Services – Adding Semantics

  • Two Ontologies developed as part of the NCRR-Glycomics project:

    • GlycO: a domain Ontology embodying knowledge of the structure and metabolism of glycans

      • Contains 770 classes that describe structural features of glycans

      • URL: http://lsdis.cs.uga.edu/projects/glycomics/glyco

    • ProPreO: a comprehensive process Ontology modeling experimental proteomics

      • Contains 296 classes

      • Models three phases of experimental proteomics*: separation techniques, analytical techniques, and data analysis

      • URL: http://lsdis.cs.uga.edu/projects/glycomics/propreo

*http://pedro.man.ac.uk/uml.html (PEDRO UML schema)


ProPreO - Experimental Proteomics Process Ontology

  • ProPreO models the phases of a proteomics experiment using five fundamental concepts:

    • Data: (Example: a peaklist file from ms/ms raw data)

    • Data_processing_applications: (Example: MASCOT* search engine)

    • Hardware: embodies instrument types used in proteomics (Example: ABI_Voyager_DE_Pro_MALDI_TOF)

    • Parameter_list: describes the different types of parameter lists associated with experimental phases

    • Task: (Example: component separation, used in chromatography)

*http://www.matrixscience.com/


Service description using WSDL-S

  • Formalize description and classification of Web Services using ProPreO concepts

Description of a Web Service using Web Services Description Language (WSDL ModifyDB):

<?xml version="1.0" encoding="UTF-8"?>
<wsdl:definitions targetNamespace="urn:ngp"
    …..
    xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <wsdl:types>
    <schema targetNamespace="urn:ngp"
        xmlns="http://www.w3.org/2001/XMLSchema">
      …..
      </complexType>
    </schema>
  </wsdl:types>
  <wsdl:message name="replaceCharacterRequest">
    <wsdl:part name="in0" type="soapenc:string"/>
    <wsdl:part name="in1" type="soapenc:string"/>
    <wsdl:part name="in2" type="soapenc:string"/>
  </wsdl:message>
  <wsdl:message name="replaceCharacterResponse">
    <wsdl:part name="replaceCharacterReturn" type="soapenc:string"/>
  </wsdl:message>

The same description annotated with a ProPreO concept (WSDL-S ModifyDB):

<?xml version="1.0" encoding="UTF-8"?>
<wsdl:definitions targetNamespace="urn:ngp"
    ……
    xmlns:wssem="http://www.ibm.com/xmlns/WebServices/WSSemantics"
    xmlns:ProPreO="http://lsdis.cs.uga.edu/ontologies/ProPreO.owl" >
  <wsdl:types>
    <schema targetNamespace="urn:ngp"
        xmlns="http://www.w3.org/2001/XMLSchema">
      ……
      </complexType>
    </schema>
  </wsdl:types>
  <wsdl:message name="replaceCharacterRequest"
      wssem:modelReference="ProPreO#peptide_sequence">
    <wsdl:part name="in0" type="soapenc:string"/>
    <wsdl:part name="in1" type="soapenc:string"/>
    <wsdl:part name="in2" type="soapenc:string"/>
  </wsdl:message>

Concepts defined in process Ontology (ProPreO process Ontology): data, sequence, peptide_sequence
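As a rough illustration of how a registry or client might read the annotation shown on this slide, the sketch below pulls `wssem:modelReference` attributes out of a WSDL-S document with Python's standard `xml.etree.ElementTree`. The sample document is a cut-down, hypothetical stand-in for the elided listing: the `wssem` namespace and the message names come from the slide, while the `wsdl` and `soapenc` namespace URIs are the standard WSDL 1.1 and SOAP encoding ones, and `model_references` is an invented helper name.

```python
import xml.etree.ElementTree as ET
from typing import Dict

WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"  # standard WSDL 1.1 namespace
WSSEM_NS = "http://www.ibm.com/xmlns/WebServices/WSSemantics"  # from the slide

# Hypothetical, shortened document; the slide's own listing is elided.
SAMPLE = """<wsdl:definitions targetNamespace="urn:ngp"
    xmlns:wsdl="http://schemas.xmlsoap.org/wsdl/"
    xmlns:soapenc="http://schemas.xmlsoap.org/soap/encoding/"
    xmlns:wssem="http://www.ibm.com/xmlns/WebServices/WSSemantics">
  <wsdl:message name="replaceCharacterRequest"
      wssem:modelReference="ProPreO#peptide_sequence">
    <wsdl:part name="in0" type="soapenc:string"/>
  </wsdl:message>
  <wsdl:message name="replaceCharacterResponse">
    <wsdl:part name="replaceCharacterReturn" type="soapenc:string"/>
  </wsdl:message>
</wsdl:definitions>
"""


def model_references(wsdl_text: str) -> Dict[str, str]:
    """Map each annotated wsdl:message name to its modelReference value."""
    root = ET.fromstring(wsdl_text)
    refs = {}
    for message in root.iter(f"{{{WSDL_NS}}}message"):
        # ElementTree stores namespaced attributes as '{uri}local'.
        ref = message.get(f"{{{WSSEM_NS}}}modelReference")
        if ref is not None:
            refs[message.get("name")] = ref
    return refs


if __name__ == "__main__":
    print(model_references(SAMPLE))
```

Only the request message carries an annotation in this sample, so the response message is simply absent from the result rather than mapped to an empty value.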


Biological UDDI (BUDDI) WS Registry for Proteomics and Glycomics

  • There are no current registries that use semantic classification of Web Services in glycoproteomics

  • BUDDI classification is based on proteomics and glycomics classification; BUDDI is part of the integrated glycoproteomics Web Portal called Stargate

  • NGP to be published in BUDDI

  • Can enable other systems such as myGrid to use NGP Web Services to build a glycomics workbench


Conclusions

  • As part of the NCRR Integrated Technology Resource for Biomedical Glycomics, we implemented a Semantic Web Process for high-throughput glycomics in an open, web-centric environment

  • Large domain-specific ontologies with process (ProPreO) and domain (GlycO) knowledge concepts were used to describe and classify Web Services at the semantic level

  • Used the proposed Semantic Web Service specification (WSDL-S) to add semantics to the Web Service description

  • Biological UDDI (BUDDI), part of Stargate, is being developed as a single-window resource to discover and publish Web Services in the glycoproteomics domain



Resources

  • NCRR (Integrated Technology Resource for Biomedical Glycomics): http://cell.ccrc.uga.edu/world/glycomics/glycomics.php

  • Bioinformatics core of Glycomics project: http://lsdis.cs.uga.edu/projects/glycomics/

  • ProPreO process Ontology: http://lsdis.cs.uga.edu/projects/glycomics/propreo/

  • GlycO domain Ontology:

    http://lsdis.cs.uga.edu/projects/glycomics/glyco/

  • Stargate – GlycoProteomics Web Portal:

    http://128.192.9.86/stargate

  • WSDL-S: joint UGA-IBM technical note

    http://lsdis.cs.uga.edu/library/download/WSDL-S-V1.pdf



Acknowledgement

Special Thanks:

James Atwood (CCRC, UGA)

Meenakshi Nagarajan (LSDIS Lab, UGA)

Blake Hunter (LSDIS Lab, UGA)



Extra Slides: Stargate subsystems – a bit of detail

  • BUDDI – BioUDDI is envisioned as the ‘yellow pages’ for all WS in life sciences

    • The classification of WS uses biological taxonomy

    • Open resource for the worldwide community of life sciences research

  • Format Converter – Enables conversion of two available representation formats into an XML-based representation

    • IUPAC to LINUCS to GLYDE (an XML-based representation)

  • Web Service Generator – Enables an existing Java application to be exposed as a Web Service

    • Generates the required files from a Java application to allow deployment as a Web Service

    • Enables the newly generated Web Service to be published on BioUDDI



Extra Slides: Stargate subsystems – a bit of detail

  • Group Forum – Members of the research group use it to foster a sense of community

    • Schedule meetings, discuss issues, collaborate on papers…

    • Post papers for peer review and publications on relevant topics

  • Stargate Search – an integrated unit of Stargate

    • Enables search for research publications within the research group

    • Enables search on the internet

  • Login – Allows restrictions on accessibility of selected parts of Stargate


Extra Slides: The take home message…

[Figure: Stargate components: Forum, Internet, Search, Web Service Generator, BUDDI, Login]