

EMBRACE. An example of Grid Integration (I): The EMBRACE project. Jean SALZEMANN, CNRS/IN2P3. EMBRACE is an EU-sponsored Network of Excellence aimed at enabling bioinformatics research through better operability of databases, servers, and services.



Presentation Transcript

  1. EMBRACE. An example of Grid Integration (I): The EMBRACE project. Jean SALZEMANN, CNRS/IN2P3

  2. Introduction EMBRACE is an EU-sponsored Network of Excellence aimed at enabling bioinformatics research through better operability of databases, servers, and services.

  3. Example You want to predict phosphorylation sites just outside transmembrane helices in 1329 membrane proteins. Yesterday: 1) Obtain software to predict transmembrane helices; 2) Obtain software to predict phosphorylation sites; 3) Install both programs; 4) Write software that calls both programs; 5) Write software that combines outputs and presents results. Tomorrow: 1) Import APIs for the two services; 2) Write software that combines outputs and presents results.
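The final "combine outputs" step from the example slide can be sketched as follows. This is a minimal illustration, not the EMBRACE implementation: the function name `sites_near_helices`, the `flank` parameter, and the choice of 5 residues as the meaning of "just outside" are all assumptions, and the hard-coded inputs stand in for whatever the two remote prediction services would return.

```python
from typing import List, Tuple

# A helix is an inclusive (start, end) range of residue positions.
Helix = Tuple[int, int]


def sites_near_helices(helices: List[Helix], sites: List[int], flank: int = 5) -> List[int]:
    """Return predicted sites that lie outside every helix but within
    `flank` residues of some helix boundary (hypothetical definition of
    "just outside"; the slides do not specify a distance)."""
    selected = []
    for pos in sites:
        # Skip sites that fall inside any transmembrane helix.
        if any(start <= pos <= end for start, end in helices):
            continue
        # Keep sites in the flanking window before or after some helix.
        if any(start - flank <= pos < start or end < pos <= end + flank
               for start, end in helices):
            selected.append(pos)
    return selected


if __name__ == "__main__":
    # Illustrative values only, not real predictions for any protein.
    helices = [(10, 30), (50, 70)]
    sites = [5, 8, 20, 33, 40, 72, 90]
    print(sites_near_helices(helices, sites))  # prints [5, 8, 33, 72]
```

With remote services, only the two input lists would change: they would be filled from the service responses instead of being typed in, and the same function would run once per protein.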

  4. The Goal Of EMBRACE EMBRACE aims at building a « knowledge grid » allowing integrated exploitation of data: • collection, curation and provision of biomolecular information • availability of most of the popular databases and software products • tools and programming interfaces to exploit that information • taking away the need for maintaining local copies of databases and software

  5. Data EMBRACE includes nearly all European bioinformaticians with longstanding track-records in terms of providing databases, servers, and services. Data types that they will make available: DNA sequences, protein sequences, macromolecular structures, SNPs, expression information, alignments, untranslated regions, structure domains, protein families, literature, electron micrographs, orthologs, ORFs, genome annotation, proteomics patterns, GPCRs, protein interactions, nucleotid

  6. Software EMBRACE includes nearly all large European bioinformatics centers, which will all make their servers, services, and computational tools available using the EMBRACE-GRID. Computational facilities that all European bioinformaticians will get at their fingertips include: DNA sequence analysis, genome annotation, homology searches at sequence and structure level, structure analysis, visualization, protein sequence analysis, phylogeny, protein domain mapping, pattern matching, HMM, neural nets, micro-arrays, workflow management, text-mining, systems biology, database techno

  7. Education The EMBRACE portal (http://www.embracegrid.info/) lists the courses that EMBRACE has presented and will present: • July 2005, France: Grid technology • October 2005, England: Data modeling and integration • February 2006, England: Portal tools • October 2006, Finland: Tools for grid usage • February 2007, Denmark: Bioinformatics of immunology • April 2007, Sweden: Regulatory sequence motifs • (10 more courses not listed) • July 2009, Spain: Databases and gene annotation

  8. The EMBRACE Challenge • Applied bioinformatics needs various computer resources • The number and size of databases and tools are growing rapidly • Systems Biology is predicted to become more important • A lot of existing tools and data sources to integrate

  9. Technology Recommendation • Use Web Services, especially the WS-I profile • Use XML Schema to describe DataTypes • Give standard definitions to DataTypes • Use standardized database interfaces (make workflows with the EMBRACE services)

  10. Web Services advantages • Replace local resources with remote resources • Web Services provide a standardized access method • Web Services are widely adopted in the bioinformatics community • They are evolving constantly with new specifications

  11. The EMBRACE VO on EGEE • Infrastructure to deploy CPU-intensive and data-intensive applications • Testbed to validate the technology recommendation • 400 CPUs and 3 TB of data storage

  12. Statistics on EGEE

  13. Sites of EMBRACE VO

  14. An Example of Application: PDB Database Refinement • Recomputation of 19000 protein structures in 3 steps, using the WISDOM Environment. • Deployment on EGEE (Spring 2007): • 673 CPUs used • 70000 jobs submitted • 17 CPU years • 500 GB of data produced
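As a rough sense of scale, the deployment figures on the PDB refinement slide can be turned into per-job averages. The sketch below is plain arithmetic on the numbers quoted there; the function name `per_job_averages` is hypothetical, and the conversion factors (a 365.25-day year, 1 GB = 1000 MB) are assumptions. The slides give no information about how job lengths were actually distributed, so these are averages only.

```python
HOURS_PER_YEAR = 365.25 * 24  # assumed length of a CPU year, in hours
MB_PER_GB = 1000              # assumed decimal gigabytes


def per_job_averages(cpu_years: float, jobs: int, data_gb: float):
    """Return (average CPU hours per job, average MB of output per job)."""
    cpu_hours = cpu_years * HOURS_PER_YEAR
    return cpu_hours / jobs, data_gb * MB_PER_GB / jobs


if __name__ == "__main__":
    # Figures from the slide: 17 CPU years, 70000 jobs, 500 GB produced.
    hours, megabytes = per_job_averages(17, 70000, 500)
    print(f"{hours:.2f} CPU hours per job, {megabytes:.1f} MB per job")
    # prints: 2.13 CPU hours per job, 7.1 MB per job
```

So an average job ran for roughly two CPU hours and produced about 7 MB of output, under the stated assumptions.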

  15. Application example: WISDOM • Type of computations: docking against protein and ligand databases. • Web service interface to submit jobs. • Users can use the interface to send docking jobs without specific knowledge of the grid, and embed dockings into their workflows.

  16. Application example: Automatic update of databases • Service that automatically replicates and updates biological databases (file databases) • Web service interface to deploy new databases or retrieve the status of a deployment • Service can also be used in a workflow to run an update before an experiment • Hides data management from users who are not grid experts

  17. Contacts EMBRACE is coordinated by Graham Cameron and Kerstin Nyberg at the EBI. • Peter Rice coordinates the content integration • Alan Bleasby coordinates the tools integration • Vincent Breton coordinates technology recommendation • Erik Bongcam Rudloff coordinates the test cases • Gert Vriend coordinates outreach and education

  18. Acknowledgements The EMBRACE project is funded by the European Commission within its FP6 Programme, under the thematic area "Life sciences, genomics and biotechnology for health," contract number LHSG-CT-2004-512092.
