
ENABLE project notes



Presentation Transcript


  1. ENABLE project notes Don Gilbert, gilbertd@indiana.edu Sept 2003

  2. Biology information access projects
     • Bio-info archiving and distribution
       • IUBio Archive, http://iubio.bio.indiana.edu/ -- public molecular biology data / software archive
       • Bio-Mirrors, http://www.bio-mirror.net/ -- Sequence and related biology databanks
     • Genome information systems
       • FlyBase, http://flybase.bio.indiana.edu/ -- genome infosystem of Drosophila fruitfly
       • euGenes, http://eugenes.org/ -- infosystem for 8 important eukaryotes with 180,000 genes
     • Bio-Data Grids
       • http://iubio.bio.indiana.edu/grid/ -- experimental distributed computing

  3. FlyBase and euGenes

  4. ENABLE: data sets & access
     • Data sets and access to data
     • Back-end architecture and protocols
     • Bio-Grids and Bio-Directories

  5. Major Bio Databanks from EBI (www.ebi.ac.uk), Sept. 2002

  6. Constellation of Bio-Data (SRS - Lion Bioscience)

  7. ENABLE access architecture

  8. Data Access Needs
     Computable genome data access
     -- Page scraping and bulk files are not enough
     -- Internet search & retrieval of all genome objects distributed among many sources
     -- Simple, flexible client program model
     -- Efficient for high volumes (10^5 objects; >1 GB sizes)
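The "efficient for high volumes" need above can be illustrated with a minimal sketch of batched retrieval: a client asks for identifiers in fixed-size chunks and streams records back, rather than making one request per object. The names `batched`, `fetch_all`, and `fetch_batch`, and the default batch size, are assumptions for illustration and do not come from the ENABLE project.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def batched(ids: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield successive lists of at most `size` identifiers."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(ids)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def fetch_all(
    ids: Iterable[str],
    fetch_batch: Callable[[List[str]], List[dict]],
    size: int = 1000,
) -> Iterator[dict]:
    """Request records in batches but hand them to the caller one at a time.

    `fetch_batch` is a hypothetical stand-in for whatever back end answers
    the request (a directory server, a web service, a local index).
    """
    for chunk in batched(ids, size):
        yield from fetch_batch(chunk)
```

Because `fetch_all` is a generator, a client can process hundreds of thousands of records without holding them all in memory at once.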

  9. Directories of Genome Data
     • Directories are a necessary step for bio grids
     • "broad and shallow" directories federate the "narrow and deep" databases
     • Bio-Data Access Tools
       • SRS, Sequence Retrieval System; Entrez; AceDB; Genome relational databases (Ensembl, FlyBase, WormBase); IBM DiscoveryLink; BioDAS; BioMoby
     • Directory services for data access
       • Layer onto access tools for common query/retrieval
       • LDAP: mature, efficient for high volumes, query distributed directories; works well with bio-access tools
       • Web Services: XML messages over Web; wide industry support, standards are in progress
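One way to picture a "broad and shallow" directory layered over "narrow and deep" databases is the sketch below: the directory keeps only an identifier, a kind, and a pointer to the source, and delegates retrieval of the full record to that source. All class names, source names, and identifiers here are hypothetical, and the in-memory dictionaries merely stand in for real back ends such as SRS or a relational database.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass(frozen=True)
class DirectoryEntry:
    """A shallow record: just enough to locate the full object."""
    object_id: str
    kind: str    # e.g. "gene" or "protein"
    source: str  # name of the deep database holding the full record


@dataclass
class FederatedDirectory:
    """Broad index over several sources; stores pointers, not full records."""
    entries: Dict[str, DirectoryEntry] = field(default_factory=dict)
    backends: Dict[str, Callable[[str], dict]] = field(default_factory=dict)

    def add_source(self, name: str, kind: str,
                   fetch: Callable[[str], dict], object_ids: List[str]) -> None:
        """Register a deep database and index the identifiers it holds."""
        self.backends[name] = fetch
        for object_id in object_ids:
            self.entries[object_id] = DirectoryEntry(object_id, kind, name)

    def find(self, kind: str) -> List[str]:
        """Answer a query from the shallow index alone."""
        return sorted(e.object_id for e in self.entries.values() if e.kind == kind)

    def fetch(self, object_id: str) -> dict:
        """Delegate retrieval of the full record to the owning source."""
        entry = self.entries[object_id]
        return self.backends[entry.source](object_id)


# Hypothetical sources with placeholder records.
gene_db = {"gene-001": {"symbol": "placeholder-a"}, "gene-002": {"symbol": "placeholder-b"}}
protein_db = {"prot-001": {"length": 120}}

directory = FederatedDirectory()
directory.add_source("gene-db", "gene", gene_db.__getitem__, list(gene_db))
directory.add_source("protein-db", "protein", protein_db.__getitem__, list(protein_db))
```

The client sees a single `find`/`fetch` interface regardless of how many sources are registered, which is the point of layering a directory onto existing access tools.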

  10. Directory components

  11. Bio Directory Needs
      • Build on existing technology for finding distributed objects
      • Efficient for millions of objects, by the gigabyte and terabyte
      • Queries distributed across directories of collaborating services
      • Support existing and new bioinformatics data access (relational dbs, object and XML dbs, SRS, Entrez, AceDB)
      • Simple client program methods for computable use of directories
      • Flexible, common schema for describing objects
      • Replicate directories and objects among bioinformatics centers
      • Peer-to-peer directories for collaborative projects
      • Strong authentication and security for data access

  12. Directory Standards
      • Open Grid Services Architecture (OGSA)
        • SOAP based; query support for XML-SQL, XPath, XQuery
        • Data Access project: http://www.ogsa-dai.org.uk/
      • Lightweight Directory Access Protocol (LDAP)
        • Robust system for distributed search and retrieval
        • Object-centric, optimized for efficient read operations
        • Hierarchical, distributed and replicated in nature
      • Life Sciences ID (LSID)
        • new standard for bio-object naming, with LDAP and WebServices implementations
      • Moby project web services repository system
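As an illustration of LSID-style object naming, here is a minimal parsing sketch. It assumes the commonly documented URN layout `urn:lsid:authority:namespace:object[:revision]`; the `LSID` tuple, the `parse_lsid` function, and the example identifier are hypothetical and are not taken from the ENABLE code.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str           # who issued the name, e.g. a domain name
    namespace: str           # a collection within that authority
    object_id: str           # the object's identifier in the namespace
    revision: Optional[str]  # optional version of the object


def parse_lsid(text: str) -> LSID:
    """Split an LSID URN into its parts, or raise ValueError if malformed."""
    parts = text.split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"not an LSID: {text!r}")
    # The "urn:lsid" prefix is matched case-insensitively.
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    authority, namespace, object_id = parts[2:5]
    if not (authority and namespace and object_id):
        raise ValueError(f"empty LSID component in {text!r}")
    revision = parts[5] if len(parts) == 6 and parts[5] else None
    return LSID(authority, namespace, object_id, revision)


# Hypothetical identifier, for illustration only.
example = parse_lsid("urn:lsid:example.org:genes:gene-001:2")
```

A directory can then key its entries on the authority and namespace to decide which service should resolve a given name.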

  13. Directory Tests
      • Design and test distributed access with LDAP and Web Services
      • SRS backend for efficient search/retrieval from GenBank, SwissProt/TrEMBL, LocusLink, Medline, many others
      • Find & fetch 20,000 to 1.2 million objects
      • LDAP is ~10x faster than WebServices
      • Tests in progress for IUBio, FlyBase
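A find-and-fetch comparison of this kind can be sketched as a small timing harness. This is not the harness used for the tests reported above; `time_fetch` and `speedup` are hypothetical names, and the `fetch` argument stands in for whichever client (LDAP or Web Services) is being measured.

```python
import time
from typing import Callable, Dict, List


def time_fetch(fetch: Callable[[List[str]], List[dict]],
               ids: List[str]) -> Dict[str, float]:
    """Time one find-and-fetch call and report throughput."""
    start = time.perf_counter()
    records = fetch(ids)
    elapsed = time.perf_counter() - start
    rate = len(records) / elapsed if elapsed > 0 else float("inf")
    return {"objects": len(records), "seconds": elapsed, "objects_per_second": rate}


def speedup(baseline: Dict[str, float], candidate: Dict[str, float]) -> float:
    """How many times faster the candidate ran than the baseline."""
    if candidate["seconds"] <= 0:
        return float("inf")
    return baseline["seconds"] / candidate["seconds"]
```

For a fair comparison, both clients should be given the same identifier list and the same back end, and each measurement should be repeated several times, since network and cache effects can dominate a single run.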

  14. Directory Tests

  15. ENABLE biodata access issues
      • Basic Web-Services and LDAP access working in testing form; neither stable nor finalized
      • Bio-Data categorization, schema, and meta-data for directories need work
      • Grid (OGSA), OAI, other interfaces to be developed
      Directory tests at http://iubio.bio.indiana.edu/biogrid/directories/
