EMBRACE An example of Grid Integration (I): The EMBRACE project Jean SALZEMANN CNRS/IN2P3
Introduction EMBRACE is an EU-sponsored Network of Excellence aimed at enabling bioinformatics research through better operability of databases, servers, and services.
Example You want to predict phosphorylation sites just outside transmembrane helices in 1329 membrane proteins. Yesterday: 1) Obtain software to predict transmembrane helices; 2) Obtain software to predict phosphorylation sites; 3) Install both programs; 4) Write software that calls both programs; 5) Write software that combines outputs and presents results. Tomorrow: 1) Import APIs for the two services; 2) Write software that combines outputs and presents results.
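The "Tomorrow" step 2 (combining the two outputs) can be sketched as follows. This is a minimal illustration, not EMBRACE code: the function name `sites_near_helices`, the `flank` window of 5 residues, and the sample positions are all assumptions, and the two prediction results are hard-coded where a real workflow would call the remote services.

```python
from typing import List, Tuple

# A helix is a (start, end) pair of 1-based, inclusive residue positions.
Helix = Tuple[int, int]


def sites_near_helices(helices: List[Helix], sites: List[int], flank: int = 5) -> List[int]:
    """Keep phosphorylation sites lying just outside a transmembrane helix.

    A site is kept when it is not inside any helix but is within `flank`
    residues of a helix boundary. `flank` is an illustrative assumption.
    """
    selected = []
    for pos in sites:
        if any(start <= pos <= end for start, end in helices):
            continue  # inside a helix, so not "just outside"
        if any(start - flank <= pos < start or end < pos <= end + flank
               for start, end in helices):
            selected.append(pos)
    return selected


if __name__ == "__main__":
    # Hard-coded stand-ins for the outputs of the two remote services.
    helices = [(10, 30), (50, 70)]
    sites = [3, 8, 20, 33, 40, 72]
    print(sites_near_helices(helices, sites))  # prints [8, 33, 72]
```

With the services exposed through APIs, this combining logic is the only code the user has to write and maintain for each of the 1329 proteins.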
The Goal of EMBRACE EMBRACE aims to build a « knowledge grid » that allows integrated exploitation of data: • collection, curation and provision of biomolecular information • availability of most of the popular databases and software products • tools and programming interfaces to exploit that information • no further need to maintain local copies of databases and software
Data EMBRACE includes nearly all European bioinformaticians with longstanding track-records in terms of providing databases, servers, and services. Data types that they will make available: DNA sequences, protein sequences, macromolecular structures, SNPs, expression information, alignments, untranslated regions, structure domains, protein families, literature, electron micrographs, orthologs, ORFs, genome annotation, proteomics patterns, GPCRs, protein interactions, nucleotid
Software EMBRACE includes nearly all large European bioinformatics centers, which will all make their servers, services, and computational tools available through the EMBRACE-GRID. Computational facilities that all European bioinformaticians will have at their fingertips include: DNA sequence analysis, genome annotation, homology searches at sequence and structure level, structure analysis, visualization, protein sequence analysis, phylogeny, protein domain mapping, pattern matching, HMM, neural nets, micro-arrays, workflow management, text-mining, systems biology, database techno
Education The EMBRACE portal (http://www.embracegrid.info/) lists the courses that EMBRACE has presented and will present:
• July 2005, France: Grid technology
• October 2005, England: Data modeling and integration
• February 2006, England: Portal tools
• October 2006, Finland: Tools for grid usage
• February 2007, Denmark: Bioinformatics of immunology
• April 2007, Sweden: Regulatory sequence motifs
• (10 more courses not listed)
• July 2009, Spain: Databases and gene annotation
The EMBRACE Challenge • Applied bioinformatics needs a variety of computer resources • The number and size of databases and tools are growing rapidly • Systems biology is predicted to become more important • Many existing tools and data sources have to be integrated
Technology Recommendation • Use Web Services, especially the WS-I profile • Use XML Schema to describe data types • Give data types standard definitions • Use standardized database interfaces (make workflows with the EMBRACE services)
Web Services advantages • Replace local resources with remote resources • Web Services provide a standardized access method • Web Services are widely adopted in the bioinformatics community • They are evolving constantly with new specifications
The EMBRACE VO on EGEE • Infrastructure to deploy CPU-intensive and data-intensive applications • Testbed to validate the technology recommendation • 400 CPUs and 3 TB of data storage
An Example of Application: PDB Database Refinement • Recomputation of 19000 protein structures in 3 steps, using the WISDOM Environment. • Deployment on EGEE (Spring 2007): • 673 CPUs used • 70000 jobs submitted • 17 CPU years • 500 GB of data produced
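To put the deployment figures in perspective, a few averages can be derived from them. The sketch below uses only the numbers on the slide; the conversion factors (365-day years, 1 GB = 1000 MB) and the reading of "673 CPUs used" as CPUs kept continuously busy are assumptions, so the wall-clock figure is a rough lower bound at best.

```python
# Figures taken from the slide.
STRUCTURES = 19_000
CPUS = 673
JOBS = 70_000
CPU_YEARS = 17
DATA_GB = 500

# Assumed conversion factors (not stated in the source).
HOURS_PER_YEAR = 365 * 24
MB_PER_GB = 1000


def deployment_averages() -> dict:
    """Derive per-job and per-structure averages from the slide's totals."""
    cpu_hours = CPU_YEARS * HOURS_PER_YEAR
    return {
        "cpu_hours_total": cpu_hours,
        "avg_job_hours": cpu_hours / JOBS,
        "jobs_per_structure": JOBS / STRUCTURES,
        "mb_per_structure": DATA_GB * MB_PER_GB / STRUCTURES,
        # Lower bound only: assumes all CPUs were busy the whole time.
        "min_wall_clock_days": cpu_hours / CPUS / 24,
    }


if __name__ == "__main__":
    for name, value in deployment_averages().items():
        print(f"{name}: {value:.2f}")
```

Under these assumptions the average job runs for roughly two hours, each structure accounts for three to four jobs and about 26 MB of output, and the whole recomputation would need at least about nine days of wall-clock time.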
Application example: WISDOM • Type of computations: docking with proteins and ligands databases. • Web service interface to submit jobs. • Users can use the interface to send docking jobs without specific knowledge of the grid, and embed dockings into their workflows
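As an illustration of what submitting a docking job through a web service interface could look like, the sketch below builds a SOAP 1.1 request with the Python standard library. The service namespace `urn:example:docking`, the operation `submitDocking`, and the parameters `target` and `ligandDatabase` are hypothetical placeholders; the actual WISDOM interface is not described in the slides.

```python
import xml.etree.ElementTree as ET

# Standard SOAP 1.1 envelope namespace.
SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
# Hypothetical service namespace, for illustration only.
SVC_NS = "urn:example:docking"


def build_docking_request(target_id: str, ligand_db: str) -> str:
    """Return a SOAP envelope for a hypothetical `submitDocking` operation."""
    ET.register_namespace("soapenv", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    operation = ET.SubElement(body, f"{{{SVC_NS}}}submitDocking")
    ET.SubElement(operation, f"{{{SVC_NS}}}target").text = target_id
    ET.SubElement(operation, f"{{{SVC_NS}}}ligandDatabase").text = ligand_db
    return ET.tostring(envelope, encoding="unicode")


if __name__ == "__main__":
    # The identifiers below are made-up example values.
    print(build_docking_request("example-target", "example-ligands"))
```

The request carries only the scientific inputs; job splitting, scheduling, and data placement on the grid stay behind the service, which is what lets a docking step be embedded in a workflow without grid-specific knowledge.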
Application example: Automatic update of databases • Service that automatically replicates and updates biological databases (file databases) • Web service interface to deploy new databases or retrieve the status of a deployment • The service can also be used in a workflow to run an update before an experiment • Hides data management from users who are not grid experts
Contacts EMBRACE is coordinated by Graham Cameron and Kerstin Nyberg at the EBI. • Peter Rice coordinates the content integration • Alan Bleasby coordinates the tools integration • Vincent Breton coordinates technology recommendation • Erik Bongcam Rudloff coordinates the test cases • Gert Vriend coordinates outreach and education
Acknowledgements The EMBRACE project is funded by the European Commission within its FP6 Programme, under the thematic area "Life sciences, genomics and biotechnology for health," contract number LHSG-CT-2004-512092.