
The BioDA Project
Shirley Crompton, Brian Matthews (CCLRC); Alex Gray, Andrew Jones, Richard White (Cardiff University)
Data Integration in Bioinformatics Using OGSA-DAI


Presentation Transcript


  1. The BioDA Project Shirley Crompton, Brian Matthews (CCLRC) Alex Gray, Andrew Jones, Richard White (Cardiff University) Data Integration in Bioinformatics Using OGSA-DAI

  2. Overview • Bioinformatics Data Access and Integration Requirements • Generic • BioDA Workshop and Questionnaire • BDWorld-specific • OGSA-DAI exemplar

  3. The BioDA Project • Independent Evaluation of OGSA-DAI • the suitability of the software in its present form • how to leverage OGSA-DAI in the bioinformatics GRID • OGSA-DAI Product Improvement • Feedback to the DAIT Team • Knowledge Dissemination • Evaluation Report • Publications/Presentations • Workshop on OGSA-DAI for the bioinformatics eResearch community

  4. Bioinformatics The application and development of computing and mathematics to the management, analysis and understanding of data to solve biological questions. Attwood, TK and Parry-Smith, DJ 1999 • Data Management • Data Analysis

  5. Grid Computing ... “... flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions and resources…” Foster, Kesselman and Tuecke, 2001

  6. 1st BioDA Workshop • Objectives • to examine the bioinformatics community’s needs for data access and integration (DAI) on the grid, and • to explore the application of OGSA-DAI, a middleware developed expressly to address the DAI requirements of eScience projects

  7. The BioDA Survey

  8. The Results 17 key requirements; the top of the list includes: • schema integration • schema mapping • mixed language query • complex join across databases • provenance data • flexible resource discovery • RDF database access

  9. The BioDA Exemplar The BioDiversity World • To create a GRID-based problem solving environment.  • Enable collaborative exploration and analysis of global biodiversity patterns using workflow and rich data sources from around the world • Example applications would be modeling species distributions against climate change, conservation prioritization and linking evolutionary changes to past climates. 

  10. BDWorld (Source: BDWorld) [Architecture diagram] • Components shown: taxonomic index (Species 2000 & ITIS Catalogue of Life), GSDs, thematic data source, abiotic data source, analytic tools, local tools, proxies, BDGrid, user, Problem Solving Environment user interface • Ontology: • Metadata • Intelligent links • Resource & analytic tool descriptions • Maintenance tools • Problem Solving Environment: • Broker agents • Facilitator agents • Presentation agents

  11. BDWorld Data Resources: Key Issues • geographically distributed and autonomous • heterogeneous in structure and data standards • mainly read via HTTP/XML protocols using custom wrappers • SQL queries are limited to the EBI EMBL store and BDWorld cache databases • potentially resource-intensive to harvest • a single taxon name may resolve into a large number of ‘accepted’ taxon names • the same query is repeated on different data collections
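
The last two issues on this slide compound each other: one submitted name can expand into many accepted names, and each of those is then queried against every collection. A minimal Python sketch of that fan-out follows. The synonym table, the collection names and `plan_requests` are all hypothetical placeholders, not BDWorld data or APIs.

```python
from itertools import product

# Hypothetical synonym index: one submitted name -> accepted taxon names.
ACCEPTED_NAMES = {
    "Name A": ["Accepted A1", "Accepted A2", "Accepted A3"],
}

# Hypothetical data collections that must each receive the same query.
COLLECTIONS = ["collection_1", "collection_2"]


def plan_requests(name):
    """Return every (accepted name, collection) pair that must be queried."""
    # A name with no entry in the index is treated as already accepted.
    accepted = ACCEPTED_NAMES.get(name, [name])
    return list(product(accepted, COLLECTIONS))


planned = plan_requests("Name A")
print(len(planned))  # 3 accepted names x 2 collections = 6 requests
```

The request count is the product of the two list lengths, which is why harvesting can become resource-intensive as either list grows.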

  12. Resource Wrapping (Source: BDWorld) [Diagram] • Components shown: user, workflow enactment engine, BDWorld-GRID Interface (BGI), BGI API, the GRID, wrapper, remote resource

  13. Implications for BioDA • abstraction layer (BGI): proprietary invocation mechanism • InvokeOperation(ResourceHandler, Operation, XmlDataCollection) • prepared search statements defined in each individual data resource wrapper • BGI protocols: BDW communication objects; search parameters and results are passed as XmlDataCollection
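
To make the invocation pattern concrete, here is a minimal Python sketch of a single entry point that delegates to per-resource wrappers holding their own prepared search statements. It is an illustration only: the class bodies, the `HANDLERS` registry, the `searchByName` operation and the query format are assumptions, and only the three-argument shape of `InvokeOperation` comes from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class XmlDataCollection:
    """Stand-in for the BDW communication object (hypothetical structure)."""
    items: dict = field(default_factory=dict)


class SpeciesWrapper:
    """Hypothetical resource wrapper with its own prepared search statements."""
    prepared = {"searchByName": "name={name}"}

    def run(self, operation, params):
        # Fill the prepared statement with the caller's search parameters.
        query = self.prepared[operation].format(**params.items)
        # A real wrapper would send `query` to the remote resource here.
        return XmlDataCollection({"result": f"<hits query='{query}'/>"})


# Hypothetical registry mapping resource handler names to wrappers.
HANDLERS = {"species": SpeciesWrapper()}


def invoke_operation(resource_handler, operation, data):
    """Single entry point: look up the wrapper and delegate the operation."""
    wrapper = HANDLERS[resource_handler]
    if operation not in wrapper.prepared:
        raise ValueError(f"{operation!r} not supported by {resource_handler!r}")
    return wrapper.run(operation, data)


reply = invoke_operation("species", "searchByName", XmlDataCollection({"name": "Quercus"}))
print(reply.items["result"])  # <hits query='name=Quercus'/>
```

Because callers can only name an operation that a wrapper has prepared in advance, an arbitrary query has no route through this interface.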

  14. BioDA Exemplar • Two main possibilities within BDW: • Augment BGI to support inclusion of queries in workflows and to be sent directly to OGSA-DAI enabled databases. • Distributed query processing facilities could assist in planning execution & distribution of data-orientated parts of a workflow. (For the current status of OGSA-DQP see Section 4.) • Very major revision to BDW protocols; also, • many resources of interest are simply not exposed as databases. • Provide facilities within individual wrappers that benefit from OGSA-DAI.

  15. OGSA-DAI Prototype (what we’d have liked) [Sequence diagram] (1) BGI InvokeOperation() (2) Create GDS and query (3) Invoke wrapper (4) Query Web DBs (5) Download URL (6) Download url: deliverFromURL(url) (7) url (8) XSL transform to BDW format: deliverFromURL(xsl), XSLTransform (9) To WF unit: deliverToURL/GFTP • Components shown: Wrapper Module, wrappers, Web DBs, OGSA-DAI R5 GDS, BDWQueryActivity, OGSA-DAI Client

  16. Key Issues Encountered • Complex client-side coding to orchestrate the application flow • requires several GDS perform requests… • Difficult to synchronise • Remote web databases have different response times (or no response at all!) • Different data transformation series apply to different data resources • BDW protocols specify that data is returned as a BDW XmlDataCollection object
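
The synchronisation problem can be illustrated with a small Python sketch: several sources are queried in parallel, and a per-source timeout stops a late or unreachable source from blocking the rest. This is not OGSA-DAI code. The three source functions, the timeout value and the sample taxon name are assumptions made for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError


def fast_source(name):
    return f"<record source='fast'>{name}</record>"


def slow_source(name):
    time.sleep(0.5)  # simulates a remote database that answers too late
    return f"<record source='slow'>{name}</record>"


def dead_source(name):
    raise ConnectionError("no response")  # simulates an unreachable database


def query_all(sources, name, timeout_s):
    """Query every source in parallel; map each label to its reply or None."""
    results = {}
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        futures = {label: pool.submit(fn, name) for label, fn in sources.items()}
        for label, future in futures.items():
            try:
                results[label] = future.result(timeout=timeout_s)
            except (TimeoutError, ConnectionError):
                results[label] = None  # late or failed: carry on without it
    return results


SOURCES = {"fast": fast_source, "slow": slow_source, "dead": dead_source}
replies = query_all(SOURCES, "Quercus robur", timeout_s=0.1)
print(replies)
```

Only the fast source contributes a record here; the other two map to `None`, so the caller must decide whether a partial result is acceptable.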

  17. OGSA-DAI Prototype (what we ended up doing) [Sequence diagram] (1) BGI InvokeOperation() (2) Create GDS and query (3) Invoke wrapper/s (4) Query, transform (5) Write cache file (6) return XmlRemoteData (7) return XmlDataCollection • Components shown: Wrapper Module, wrappers, Web DBs, Cache File, OGSA-DAI R5 GDS, BDWQueryActivity, OGSA-DAI Client
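
The flow in this diagram can be sketched in Python as follows. It is a simplified illustration under stated assumptions: the function names, the cache file name and the XML shape are hypothetical, and returning a file path stands in for whatever `XmlRemoteData` actually carries, which the slide does not specify.

```python
import tempfile
from pathlib import Path


def query_and_transform(name):
    """Hypothetical wrapper step: query the source, convert to target XML."""
    raw = {"taxon": name}  # stand-in for a remote reply
    return f"<XmlDataCollection><taxon>{raw['taxon']}</taxon></XmlDataCollection>"


def run_activity(name, cache_dir):
    """Server side: write the transformed result to a cache file."""
    cache_file = Path(cache_dir) / "result.xml"
    cache_file.write_text(query_and_transform(name), encoding="utf-8")
    return cache_file  # plays the role of the remote-data reference


def fetch_collection(reference):
    """Client side: resolve the reference into the final document."""
    return Path(reference).read_text(encoding="utf-8")


with tempfile.TemporaryDirectory() as cache_dir:
    reference = run_activity("Quercus", cache_dir)
    document = fetch_collection(reference)

print(document)
```

Querying and transforming happen in one server-side step, so the client makes a single request and resolves a single reference instead of orchestrating a chain of deliveries and transforms.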

  18. Conclusion • Highlighted key bioinformatics eScience project requirements for OGSA-DAI • support for metadata-driven, two-step access to data and data integration… • Reviewed BDWorld DAI requirements • uniform access to disparate, heterogeneous data resources • including anonymous access to web information systems • Reviewed the BDWorld OGSA-DAI exemplar and the issues encountered
