ENABLE project notes

Don Gilbert, gilbertd@indiana.edu

Sept 2003

Biology information access projects
  • Bio-info archiving and distribution
    • IUBio Archive, http://iubio.bio.indiana.edu/ -- public molecular biology data / software archive
    • Bio-Mirrors, http://www.bio-mirror.net/ -- Sequence and related biology databanks
  • Genome information systems
    • FlyBase, http://flybase.bio.indiana.edu/ -- genome infosystem of Drosophila fruitfly
    • euGenes, http://eugenes.org/ -- infosystem for 8 important eukaryotes with 180,000 genes
  • Bio-Data Grids
    • http://iubio.bio.indiana.edu/grid/ -- experimental distributed computing
ENABLE: data sets & access
  • Data sets and access to data
  • Back-end architecture and protocols
  • Bio-Grids and Bio-Directories
Major Bio Databanks

from EBI (www.ebi.ac.uk), Sept. 2002

Data Access Needs

Computable genome data access

-- Page scraping and bulk files not enough

-- Internet search & retrieval of all genome objects distributed among many sources

-- Simple, flexible client program model

-- Efficient for high volumes (10^5 objects; >1 GB sizes)
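One way to picture a simple client model that stays efficient at high volumes is to stream objects in fixed-size batches instead of loading everything at once. The following is only a sketch under assumptions: `fetch_batch` is a hypothetical stand-in for whatever retrieval call a real service would offer, and the batch size is arbitrary.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def in_batches(ids: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield successive lists of at most `size` identifiers."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(ids)
    while batch := list(islice(it, size)):
        yield batch


def fetch_all(
    ids: Iterable[str],
    fetch_batch: Callable[[List[str]], List[str]],
    size: int = 1000,
) -> Iterator[str]:
    """Stream records one batch at a time so memory use stays bounded."""
    for batch in in_batches(ids, size):
        yield from fetch_batch(batch)


# Hypothetical stand-in for a remote retrieval call (not a real service API).
def fake_fetch(batch: List[str]) -> List[str]:
    return [f"record for {object_id}" for object_id in batch]


records = list(fetch_all(["a", "b", "c"], fake_fetch, size=2))
print(records)
```

Because `fetch_all` is a generator, a client can start processing the first records while later batches are still being requested.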

Directories of Genome Data
  • Directories are a necessary step for bio grids
    • "broad and shallow" directories federate the "narrow and deep" databases
  • Bio-Data Access Tools
    • SRS (Sequence Retrieval System); Entrez; AceDB; genome relational databases (Ensembl, FlyBase, WormBase); IBM DiscoveryLink; BioDAS; BioMoby
  • Directory services for data access
    • Layer onto access tools for common query/retrieval
    • LDAP: mature; efficient for high volumes; queries distributed directories; works well with bio-access tools
    • Web Services: XML messages over the Web; wide industry support; standards are in progress
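To make the LDAP option concrete, here is a minimal sketch of how a client might build a search filter string in the standard LDAP filter syntax (RFC 4515). The attribute names `objectClass=bioSequence` and `organism` are illustrative assumptions, not a schema taken from these notes; only the filter syntax and escaping rules are standard.

```python
# Characters that must be escaped inside an LDAP filter value (RFC 4515).
_ESCAPES = {"\\": r"\5c", "*": r"\2a", "(": r"\28", ")": r"\29", "\0": r"\00"}


def escape_filter_value(value: str) -> str:
    """Escape special characters so the value is matched literally."""
    return "".join(_ESCAPES.get(ch, ch) for ch in value)


def and_filter(**attrs: str) -> str:
    """Combine attribute=value assertions into a single AND filter."""
    if not attrs:
        raise ValueError("at least one attribute is required")
    parts = [f"({name}={escape_filter_value(val)})" for name, val in attrs.items()]
    return parts[0] if len(parts) == 1 else "(&" + "".join(parts) + ")"


# Hypothetical attribute names; a real bio directory would define its own schema.
query = and_filter(objectClass="bioSequence", organism="Drosophila melanogaster")
print(query)
```

A string built this way could be handed to any LDAP client library as the search filter; the same query expressed as a Web Services call would instead be wrapped in an XML message.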
Bio Directory Needs
  • Build on existing technology for finding distributed objects
  • Efficient for millions of objects, by the gigabyte and terabyte
  • Queries distributed across directories of collaborating services
  • Support existing and new bioinformatics data access (relational dbs, object and XML dbs, SRS, Entrez, AceDB)
  • Simple client program methods for computable use of directories
  • Flexible, common schema for describing objects
  • Replicate directories and objects among bioinformatics centers
  • Peer-to-peer directories for collaborative projects
  • Strong authentication and security for data access
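The idea of one query distributed across collaborating directories can be sketched with a toy in-memory model. Everything here is an assumption for illustration: the `Directory` class, the metadata fields, and the entry identifiers are placeholders, not real records or a real directory API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Directory:
    """A 'broad and shallow' index: object ID -> a few descriptive fields."""
    name: str
    entries: dict[str, dict[str, str]] = field(default_factory=dict)

    def search(self, **criteria: str) -> list[dict[str, str]]:
        """Return entries whose metadata matches every criterion exactly."""
        hits = []
        for object_id, meta in self.entries.items():
            if all(meta.get(key) == value for key, value in criteria.items()):
                hits.append({"id": object_id, "directory": self.name, **meta})
        return hits


def federated_search(directories: list[Directory], **criteria: str) -> list[dict[str, str]]:
    """Send the same query to each directory and merge the results."""
    results: list[dict[str, str]] = []
    for directory in directories:
        results.extend(directory.search(**criteria))
    return results


# Placeholder entries for illustration only.
site_a = Directory("site-a", {"gene-001": {"type": "gene", "organism": "fly"}})
site_b = Directory("site-b", {
    "gene-002": {"type": "gene", "organism": "fly"},
    "prot-001": {"type": "protein", "organism": "worm"},
})

hits = federated_search([site_a, site_b], type="gene", organism="fly")
print([hit["id"] for hit in hits])
```

Each hit records which directory answered, so a client can go back to that source for the full "narrow and deep" record.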
Directory Standards
  • Open Grid Services Architecture (OGSA)
    • SOAP-based; query support for XML-SQL, XPath, XQuery
    • Data Access project: http://www.ogsa-dai.org.uk/
  • Lightweight Directory Access Protocol (LDAP)
    • Robust system for distributed search and retrieval
    • Object-centric, optimized for efficient read operations
    • Hierarchical, distributed and replicated in nature
  • Life Sciences ID (LSID)
    • New standard for bio-object naming, with LDAP and Web Services implementations
  • Moby project web services repository system
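As an illustration of LSID-style naming, the sketch below parses an identifier of the form `urn:lsid:authority:namespace:object[:revision]`, which is the layout described in the LSID specification. The example identifier uses a made-up authority and object name, and the checks are deliberately minimal rather than a full validator.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(text: str) -> LSID:
    """Split an LSID URN into its components, with minimal validation."""
    parts = text.split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"expected 5 or 6 colon-separated fields: {text!r}")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID URN: {text!r}")
    if any(not part for part in parts[2:]):
        raise ValueError(f"empty component in LSID: {text!r}")
    return LSID(*parts[2:])


# Hypothetical identifier for illustration; not a real registered LSID.
lsid = parse_lsid("urn:lsid:example.org:genes:gene-001:2")
print(lsid.authority, lsid.namespace, lsid.object_id, lsid.revision)
```

The authority component is what a resolver would use to locate the service responsible for the object, whether that service is reached through LDAP or Web Services.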
Directory Tests
  • Design and test distributed access with LDAP and Web Services
  • SRS backend for efficient search/retrieval from GenBank, SwissProt/TrEMBL, LocusLink, Medline, many others
  • Find & fetch 20,000 to 1.2 million objects
  • LDAP is ~10x faster than Web Services
  • Tests in progress for IUBio, FlyBase
ENABLE biodata access issues
  • Basic Web Services and LDAP access working in test form; neither stable nor finalized
  • Bio-data categorization, schema, and metadata for directories need work
  • Grid (OGSA), OAI, other interfaces to be developed

Directory tests at