ISPIDER: Integrative Proteomics Grid Platform - Meeting the Challenges

ISPIDER – A Pilot Grid for Integrative Proteomics
BEP-II grantholders meeting, Edinburgh, 24th Nov 2004

Diversity of proteome data
• Gels
• Sequences, e.g. >A01562 MAPKATYLIGAADKFHW, >A01567 MAQQPKEMLNILADKFHWFLYC
• Mass spec
• Structures/folds
• Other data: species, PTMs, pathways, functional annotation, transcriptome data

Integration problems
• Lack of specific middleware
• Existing resources not wrapped
• Lack of data standards
• Standards for proteomics, incl. MS and protein identification, are emerging
• Data not modelled
• New challenges from proteomics
• Data not captured/modelled
• Data not captured
• No mature repositories/databases for some proteome data
• But there is lots of data …

Aims
• To develop an integrated platform of proteomic data resources enabled as Grid/Web services
• To integrate existing proteome resources, enabling them as Grid/Web services
• To develop novel, proteome-specific databases as part of ISPIDER, delivered as Grid/Web and browser-based services:
  • A repository for experimental proteome data
  • A proteome protein identification server and database
  • A phosphoproteome-specific database
• To develop middleware & support for distributed querying, workflows and other integrated data analysis tasks
• To demonstrate the effectiveness of the resulting infrastructure through studies in proteomics, including:
  • Visualisation clients for proteomic data, e.g. LRF data
  • Analyses for fungal species of industrial interest
  • Protein structural/functional trends in experimental proteomics, e.g. linking domain structural patterns

Integrated Proteomics Informatics Platform - Architecture
[Architecture diagram; components are annotated with the responsible RAs (RA1-6) and work packages (WP1-6). Layers, from top to bottom:]
• ISPIDER Proteomics Clients: Vanilla Query Client; 2D Gel Visualisation Client (+ Phosph. Extensions, + Aspergil. Extensions); PPI Validation + Analysis Client; Protein ID Client
• ISPIDER Proteomics Grid Infrastructure: Proteome Request Handler; Proteomic Ontologies/Vocabularies; Source Selection Services; Instance Ident/Mapping Services; Data Cleaning Services
• Existing E-Science Infrastructure: myGrid Ontology Services; myGrid DQP; myGrid Workflows; AutoMed; DAS
• Web services: one WS wrapper per resource
• Public Proteomic Resources (divided in the diagram into ISPIDER Resources and Existing Resources): PRIDE, PEDRo, GS, PS, PF, TR, FA, PPI, Phos, PID
KEY: WS = Web services, GS = Genome sequence, TR = transcriptomic data, PS = protein structure, PF = protein family, FA = functional annotation, PPI = protein-protein interaction data, WP = Work Package

Work packages
• WP1 - A Skeleton Integrated Proteomics Grid
• WP2 - Integration of gel-based data with structural and functional annotation
• WP3 - Data mining tools for the phosphoproteome
• WP4 - Structural and functional proteomics for the Aspergilli
• WP5 - Integration of protein:protein interaction data with structural & functional annotations
• WP6 - A protein identification server and database

Personnel
• RA1 - Manchester: Khalid Belhajjame (WP1, WP2, WP4, WP6)
• RA2 - Manchester: Jennifer Siepen (WP6, WP4, WP3)
• RA3 - UCL: TBA (WP2, WP1, WP3)
• RA4 - Birkbeck: Lucas Zamboulis / Hao Fan (WP1, WP2, WP5)
• RA5 - EBI: Nishia Vinod (WP1, WP2, WP3, WP4, WP5, WP6)
• RA6 - EBI: TBA (WP1, WP2, WP3, WP4, WP5, WP6)

Deliverables
• PRIDE db
• Protein ID server
• Phosphoproteome db
• Extended isoform model
• Integrated generic workflows/DQP/etc
• “2D”-DAS clients
• Grid wrapped BIOMAP
• Integrated Protein-protein workflows
[Table columns: Primary RA; Also involved. RA labels as extracted, row alignment lost: RA6 RA2 RA5 RA2 RA6 RA2 RA5 RA6 RA6 RA3 RA4 RA1 RA3 RA1 RA4 RA4 RA3 RA1 RA6]

Existing infrastructure and skills
• myGrid
• OGSA-DQP
• AutoMed
• PSI/PEDRo infrastructure/standards
• Protein ID tools at Manchester
• 3 primary data integration strategies:
  • Workflows
  • DQP using OGSA-DAI
  • Heterogeneous schema integration technologies

Workflow Components
• Scufl: Simple Conceptual Unified Flow Language
• Taverna: writing and running workflows & examining results
• Freefluo: workflow engine to run workflows
• SOAPLAB: makes applications available (any application, as a web service)
• Web Service: e.g. DDBJ BLAST
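To make the division of labour among these workflow components concrete, here is a minimal Python sketch of a dataflow workflow: processors are wired output-to-input and a tiny engine runs them in dependency order. It is not Scufl, Taverna or Freefluo code. The `Workflow` class, the processor names and the toy `fake_blast` function are all hypothetical stand-ins; a real workflow would invoke web services such as a SOAPLAB-wrapped application.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


class Workflow:
    """Toy dataflow workflow: named processors linked by data dependencies."""

    def __init__(self):
        self.processors = {}  # processor name -> callable
        self.inputs = {}      # processor name -> list of upstream names

    def add(self, name, func, depends_on=()):
        self.processors[name] = func
        self.inputs[name] = list(depends_on)

    def run(self, **workflow_inputs):
        """Run every processor once, in dependency order; return all results."""
        results = dict(workflow_inputs)
        for name in TopologicalSorter(self.inputs).static_order():
            if name in results:  # a workflow input, nothing to run
                continue
            args = [results[dep] for dep in self.inputs[name]]
            results[name] = self.processors[name](*args)
        return results


def fake_blast(sequence):
    # Hypothetical stand-in for a remote similarity-search service
    return [f"hit_for_{sequence[:4]}"]


def count_hits(hits):
    return len(hits)


wf = Workflow()
wf.add("blast", fake_blast, depends_on=["sequence"])
wf.add("n_hits", count_hits, depends_on=["blast"])
out = wf.run(sequence="MAPKATYLIGAADKFHW")
print(out["blast"], out["n_hits"])
```

The workflow description (which processors exist and how they are linked) is kept separate from the engine that executes it, which is roughly the split between a workflow language and a workflow engine.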

OGSA-DQP
• Used in Graves' disease
• Uses OGSA-DAI data access services to access individual data resources
• A single query can access and join data from more than one OGSA-DAI wrapped data resource
• Supports orchestration of computational as well as data access services
• Interactive interface for integrating resources and executing requests
• Implicit, pipelined and partitioned parallelism and optimisation
http://www.ogsa-dai.org.uk/dqp
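The idea of one query joining data from several wrapped resources can be illustrated with a small Python sketch. It is not the OGSA-DQP or OGSA-DAI API. `WrappedResource`, `hash_join` and the example rows are hypothetical, and the peptide counts and functions are invented for illustration. Rows are streamed with generators, which gives a simple form of pipelining.

```python
class WrappedResource:
    """Hypothetical stand-in for a data resource behind a data access service."""

    def __init__(self, name, rows):
        self.name = name
        self._rows = rows

    def scan(self):
        # Yield rows one at a time so downstream operators can pipeline
        yield from self._rows


def hash_join(left, right, key):
    """Join two resources on `key`: build a hash table on left, probe with right."""
    table = {}
    for row in left.scan():
        table.setdefault(row[key], []).append(row)
    for row in right.scan():
        for match in table.get(row[key], []):
            yield {**match, **row}


# Invented example data: protein identifications and functional annotation
identifications = WrappedResource("identifications", [
    {"accession": "A01562", "peptides": 3},
    {"accession": "A01567", "peptides": 5},
])
annotations = WrappedResource("annotations", [
    {"accession": "A01567", "function": "kinase"},
    {"accession": "A09999", "function": "transporter"},
])

joined = list(hash_join(identifications, annotations, key="accession"))
print(joined)
```

A distributed query processor would additionally decide where each operator runs and could partition the join across several nodes; the sketch runs everything in one process.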

AutoMed infrastructure
• Bidirectional mappings between schemas
• Available in global and local views
• Transformations between schemas
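A bidirectional mapping can be pictured as a pathway of primitive transformation steps, each with an automatic inverse, so that the same pathway maps a local schema to a global one and back. The Python sketch below shows only that idea and is not the AutoMed API. Schemas are modelled as plain sets of names, the step vocabulary is reduced to `add`, `delete` and `rename`, any queries attached to steps are omitted, and the example schemas are invented.

```python
# Each primitive step has an automatic inverse, so a pathway can be reversed.
INVERSE = {"add": "delete", "delete": "add", "rename": "rename"}


def apply_step(schema, step):
    """Apply one primitive step to a schema, modelled here as a set of names."""
    op, *args = step
    schema = set(schema)
    if op == "add":
        schema.add(args[0])
    elif op == "delete":
        schema.remove(args[0])
    elif op == "rename":
        old, new = args
        schema.remove(old)
        schema.add(new)
    else:
        raise ValueError(f"unknown step: {op}")
    return schema


def apply_pathway(schema, pathway):
    for step in pathway:
        schema = apply_step(schema, step)
    return schema


def reverse_pathway(pathway):
    """Invert every step and reverse their order."""
    reversed_steps = []
    for op, *args in reversed(pathway):
        if op == "rename":
            args = args[::-1]  # rename(old, new) is undone by rename(new, old)
        reversed_steps.append((INVERSE[op], *args))
    return reversed_steps


# Invented local schema and pathway, for illustration only
local = {"protein.acc", "protein.seq", "gel_spot"}
pathway = [
    ("rename", "protein.acc", "protein.accession"),
    ("add", "protein.ptm"),
    ("delete", "gel_spot"),
]
global_schema = apply_pathway(local, pathway)
round_trip = apply_pathway(global_schema, reverse_pathway(pathway))
print(sorted(global_schema))
print(round_trip == local)
```

Because the reverse pathway is derived mechanically, only one direction of the mapping has to be written by hand.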

Potential clients and outputs
• A Vanilla client
• Markup with:
  • Identified peptides
  • Across different tissues
  • Different species
  • PTMs
  • etc

2D gel visualisation client
Potential annotations:
• Comparative proteomics
• Real vs virtual
• Add/subtract PTMs
• Display pathways
• Functional annotation
• PPIs
• Folds

in silico Proteome Integrated Data Resource Environment
Summary
• Alex Poulovassilis • Nigel Martin • Lucas Zamboulis • Hao Fan
• Simon Hubbard • Suzanne Embury • Steve Oliver • Norman Paton • Carole Goble • Robert Stevens • Jennifer Siepen • Khalid Belhajjame
• Rolf Apweiler • Weimin Zhu • Henning Hermjakob • Chris Taylor • Nishia Vinod • TBA
• David Jones • Christine Orengo • TBA
