Explore the development of an integrated platform for proteomic data resources enabled as Grid/Web services, including middleware support for distributed querying and workflows. The goal is to create novel databases specific to proteomics, increase data capture and modeling, and demonstrate the infrastructure's effectiveness in proteomic studies.
ISPIDER – A Pilot Grid for Integrative Proteomics
BEP-II grantholders meeting, Edinburgh, 24th Nov 2004
Diversity of proteome data
• Gels
• Sequences, e.g.
  >A01562
  MAPKATYLIGAADKFHW
  >A01567
  MAQQPKEMLNILADKFHWFLYC
• Mass spec
• Structures/folds
• Other data: species, PTMs, pathways, functional annotation, transcriptome data
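The two sequence records on this slide are in FASTA layout. As a minimal sketch of how a client might read such records, the snippet below parses them into a dictionary keyed by accession. The function name `parse_fasta` is illustrative and not part of ISPIDER; it assumes the first token after `>` is the accession.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Return {accession: sequence} from FASTA-formatted lines."""
    records: Dict[str, str] = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header line: take the first whitespace-delimited token as the accession.
            parts = line[1:].split()
            current = parts[0] if parts else ""
            records[current] = ""
        elif current is not None:
            # Sequence lines may wrap, so concatenate them.
            records[current] += line
    return records


# The two records shown on the slide.
example = """>A01562
MAPKATYLIGAADKFHW
>A01567
MAQQPKEMLNILADKFHWFLYC
"""

proteins = parse_fasta(example.splitlines())
for accession, sequence in proteins.items():
    print(accession, len(sequence), sequence)
```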
Integration problems
• Lack of specific middleware
• Existing resources not wrapped
• Lack of data standards
• Standards for proteomics, incl. MS and protein identification, are emerging
• Data not modelled
• New challenges from proteomics
• Data not captured/modelled
• Data not captured
• No mature repositories/databases for some proteome data
• But there is lots of data …
Aims
• To develop an integrated platform of proteomic data resources enabled as Grid/Web services
• To integrate existing proteome resources, enabling them as Grid/Web services
• To develop novel, proteome-specific databases as part of ISPIDER, delivered as Grid/Web and browser-based services:
  • A repository for experimental proteome data
  • A proteome protein identification server and database
  • A phosphoproteome-specific database
• To develop middleware & support for distributed querying, workflows and other integrated data analysis tasks
• To demonstrate the effectiveness of the resulting infrastructure in proteomics studies, including:
  • Visualisation clients for proteomic data, e.g. LRF data
  • Analyses for fungal species of industrial interest
  • Protein structural/functional trends in experimental proteomics, e.g. linking domain structural patterns
Integrated Proteomics Informatics Platform - Architecture
[Diagram: layered architecture, with components annotated by responsible RA (RA1-6) and work package (WP1-6)]
• ISPIDER Proteomics Clients: 2D Gel Visualisation Client (+ Phosph. Extensions, + Aspergil. Extensions), Vanilla Query Client, PPI Validation + Analysis Client, Protein ID Client
• ISPIDER Proteomics Grid Infrastructure: Proteome Request Handler, Proteomic Ontologies/Vocabularies, Source Selection Services, Instance Ident/Mapping Services, Data Cleaning Services
• Existing E-Science Infrastructure: myGrid Ontology Services, myGrid DQP, myGrid Workflows, AutoMed, DAS
• Web services (WS) wrapping the data resources: PRIDE, PEDRo, GS, PS, PF, TR, FA, PPI, Phos, PID
• Resource groups: Public Proteomic Resources, ISPIDER Resources, Existing Resources
KEY: WS = Web services, GS = Genome sequence, TR = transcriptomic data, PS = protein structure, PF = protein family, FA = functional annotation, PPI = protein-protein interaction data, WP = Work Package
Work packages
• WP1 – A Skeleton Integrated Proteomics Grid
• WP2 – Integration of gel-based data with structural and functional annotation
• WP3 – Data mining tools for the phosphoproteome
• WP4 – Structural and functional proteomics for the Aspergilli
• WP5 – Integration of protein:protein interaction data with structural & functional annotations
• WP6 – A protein identification server and database
Personnel
• RA1, Manchester: Khalid Belhajjame (WP1, WP2, WP4, WP6)
• RA2, Manchester: Jennifer Siepen (WP6, WP4, WP3)
• RA3, UCL: TBA (WP2, WP1, WP3)
• RA4, Birkbeck: Lucas Zamboulis / Hao Fan (WP1, WP2, WP5)
• RA5, EBI: Nishia Vinod (WP1, WP2, WP3, WP4, WP5, WP6)
• RA6, EBI: TBA (WP1, WP2, WP3, WP4, WP5, WP6)
Deliverables
• PRIDE db
• Protein ID server
• Phosphoproteome db
• Extended isoform model
• Integrated generic workflows/DQP/etc
• “2D”-DAS clients
• Grid wrapped BIOMAP
• Integrated Protein-protein workflows
Primary RA / Also involved (labels in slide order): RA6 RA2 RA5 RA2 RA6 RA2 RA5 RA6 RA6 RA3 RA4 RA1 RA3 RA1 RA4 RA4 RA3 RA1 RA6
Existing infrastructure and skills
• myGrid
• OGSA-DQP
• AutoMed
• PSI/PEDRo infrastructure/standards
• Protein ID tools at Manchester
• 3 primary data integration strategies:
  • Workflows
  • DQP using OGSA-DAI
  • Heterogeneous schema integration technologies
Workflow Components
• Taverna: writing and running workflows & examining results
• Scufl: Simple Conceptual Unified Flow Language
• Freefluo: workflow engine to run workflows
• SOAPLAB: makes applications available (any application exposed as a Web service)
• Web Service: e.g. DDBJ BLAST
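To illustrate what a workflow engine does with a workflow description, the sketch below runs named processors in dependency order. It is a toy model only: it does not use Scufl syntax or the Freefluo API, and the class `Workflow`, the step names, and the simplified tryptic digest (cleaving after every K or R, ignoring the proline rule) are all assumptions made for the example.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Callable, Dict, List, Sequence


class Workflow:
    """A toy workflow: named processors plus the steps each one depends on."""

    def __init__(self) -> None:
        self.processors: Dict[str, Callable[..., object]] = {}
        self.inputs: Dict[str, List[str]] = {}

    def add(self, name: str, func: Callable[..., object],
            depends_on: Sequence[str] = ()) -> None:
        self.processors[name] = func
        self.inputs[name] = list(depends_on)

    def run(self) -> Dict[str, object]:
        results: Dict[str, object] = {}
        # static_order() yields every step after all of its dependencies.
        for name in TopologicalSorter(self.inputs).static_order():
            args = [results[dep] for dep in self.inputs[name]]
            results[name] = self.processors[name](*args)
        return results


def digest(sequence: str) -> List[str]:
    """Simplified tryptic digest: cleave after every K or R."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        if residue in "KR":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


wf = Workflow()
wf.add("fetch", lambda: "MAPKATYLIGAADKFHW")  # stands in for a Web service call
wf.add("digest", digest, depends_on=["fetch"])
wf.add("longest", lambda peptides: max(peptides, key=len), depends_on=["digest"])

results = wf.run()
print(results["digest"])
print(results["longest"])
```

In a real deployment each processor would wrap a remote service call rather than a local function; the ordering logic is the part this sketch is meant to show.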
OGSA-DQP (http://www.ogsa-dai.org.uk/dqp)
• Uses OGSA-DAI data access services to access individual data resources
• A single query can access and join data from more than one OGSA-DAI wrapped data resource
• Supports orchestration of computational as well as data access services
• Interactive interface for integrating resources and executing requests
• Implicit, pipelined and partitioned parallelism and optimisation
• Used in Graves' disease
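The idea of one query joining data from more than one wrapped resource can be sketched as a hash join over two row sources. This is not the OGSA-DQP or OGSA-DAI API: the two functions standing in for wrapped resources, the field names, and all row values are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List


def identification_resource() -> List[Dict[str, object]]:
    """Hypothetical wrapped resource: protein identifications."""
    return [
        {"accession": "A01562", "peptide_count": 3},
        {"accession": "A01567", "peptide_count": 1},
    ]


def annotation_resource() -> List[Dict[str, object]]:
    """Hypothetical wrapped resource: functional annotation."""
    return [
        {"accession": "A01562", "annotation": "example annotation"},
    ]


def hash_join(left: Iterable[Dict[str, object]],
              right: Iterable[Dict[str, object]],
              key: str) -> Iterator[Dict[str, object]]:
    """Join two row streams on `key`, building a hash index over `right`."""
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)
    for row in left:
        # Emit one merged row per matching pair (inner join).
        for match in index.get(row[key], []):
            yield {**row, **match}


joined = list(hash_join(identification_resource(), annotation_resource(), "accession"))
for row in joined:
    print(row)
```

A distributed query processor would additionally decide where each operator runs and how rows are partitioned and pipelined between nodes; only the join logic is shown here.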
AutoMed infrastructure
• Bidirectional mappings between schemas
• Available in global and local views
• Transformations between schemas
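A bidirectional mapping can be pictured as a sequence of schema transformations in which every step has an inverse, so the same pathway can be walked in either direction. The sketch below models this with attribute renames only. It is a toy illustration, not AutoMed's API; the class names `Rename` and `Pathway` and the field names are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Rename:
    """Rename one attribute; the inverse renames it back."""
    old: str
    new: str

    def apply(self, record: Dict[str, object]) -> Dict[str, object]:
        return {(self.new if k == self.old else k): v for k, v in record.items()}

    def inverse(self) -> "Rename":
        return Rename(self.new, self.old)


class Pathway:
    """An ordered list of reversible transformations between two schemas."""

    def __init__(self, steps: List[Rename]) -> None:
        self.steps = list(steps)

    def apply(self, record: Dict[str, object]) -> Dict[str, object]:
        for step in self.steps:
            record = step.apply(record)
        return record

    def inverse(self) -> "Pathway":
        # Undo the steps in reverse order.
        return Pathway([step.inverse() for step in reversed(self.steps)])


# Hypothetical local schema -> hypothetical global schema.
local_to_global = Pathway([
    Rename("acc", "accession"),
    Rename("seq", "sequence"),
])

local_record = {"acc": "A01562", "seq": "MAPKATYLIGAADKFHW"}
global_record = local_to_global.apply(local_record)
round_trip = local_to_global.inverse().apply(global_record)
print(global_record)
print(round_trip == local_record)
```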
Potential clients and outputs
• A Vanilla client
• Markup with:
  • Identified peptides
  • Across different tissues
  • Different species
  • PTMs
  • etc
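Marking up a protein with identified peptides amounts to locating each peptide in the sequence and recording which residues are covered. A minimal sketch follows; the function names and the choice of upper/lower case as the markup are assumptions for illustration, and the peptides are simply substrings of the second sequence shown earlier in the deck.

```python
from typing import List, Tuple

Hit = Tuple[str, int, int]  # (peptide, start, end) with end exclusive


def find_peptides(protein: str, peptides: List[str]) -> List[Hit]:
    """Locate every occurrence of each peptide in the protein sequence."""
    hits: List[Hit] = []
    for peptide in peptides:
        if not peptide:
            continue
        start = protein.find(peptide)
        while start != -1:
            hits.append((peptide, start, start + len(peptide)))
            start = protein.find(peptide, start + 1)
    return hits


def coverage(protein: str, hits: List[Hit]) -> float:
    """Fraction of residues covered by at least one identified peptide."""
    covered = set()
    for _, start, end in hits:
        covered.update(range(start, end))
    return len(covered) / len(protein) if protein else 0.0


def render(protein: str, hits: List[Hit]) -> str:
    """Show covered residues in upper case and the rest in lower case."""
    covered = set()
    for _, start, end in hits:
        covered.update(range(start, end))
    return "".join(aa if i in covered else aa.lower()
                   for i, aa in enumerate(protein))


protein = "MAQQPKEMLNILADKFHWFLYC"
hits = find_peptides(protein, ["EMLNILADK", "FHW"])
print(hits)
print(render(protein, hits))
print(round(coverage(protein, hits), 3))
```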
2D gel visualisation client
Potential annotations:
• Comparative proteomics
• Real vs virtual
• Add/subtract PTMs
• Display pathways
• Functional annotation
• PPIs
• Folds
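Placing a protein on a virtual gel requires a predicted mass (and isoelectric point), and adding or subtracting a PTM shifts that mass. The sketch below covers only the mass part, using standard monoisotopic residue masses and the mass of a phosphate group (HPO3). The function name `peptide_mass` is an assumption, and a gel-oriented tool might prefer average masses instead.

```python
# Standard monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056    # one water per peptide chain (free N- and C-termini)
PHOSPHO = 79.96633  # mass added by one phosphorylation (HPO3)


def peptide_mass(sequence: str, phospho_sites: int = 0) -> float:
    """Monoisotopic mass of a peptide, optionally with phosphorylations."""
    if phospho_sites < 0:
        raise ValueError("phospho_sites must be non-negative")
    mass = sum(RESIDUE_MASS[aa] for aa in sequence) + WATER
    return mass + phospho_sites * PHOSPHO


sequence = "MAPKATYLIGAADKFHW"
unmodified = peptide_mass(sequence)
phosphorylated = peptide_mass(sequence, phospho_sites=1)
print(f"unmodified:     {unmodified:.4f} Da")
print(f"phosphorylated: {phosphorylated:.4f} Da")
print(f"shift:          {phosphorylated - unmodified:.4f} Da")
```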
Summary
in silico Proteome Integrated Data Resource Environment
• Alex Poulovassilis • Nigel Martin • Lucas Zamboulis • Hao Fan • Simon Hubbard • Suzanne Embury • Steve Oliver • Norman Paton • Carole Goble • Robert Stevens • Jennifer Siepen • Khalid Belhajjame • Rolf Apweiler • Weimin Zhu • Henning Hermjakob • Chris Taylor • Nishia Vinod • TBA • David Jones • Christine Orengo • TBA