BODHI, A Bio-diversity Database Pla(n)tform

BODHI, A Bio-diversity Database Pla(n)tform. Jayant Haritsa, Database Systems Lab, Supercomputer Education and Research Centre, Indian Institute of Science. Team: B. J. Srikanta (next talk), Prof. Madhav Gadgil and Prof. V. Nanjundiah (Centre for Ecological Sciences, IISc).


Presentation Transcript


  1. BODHI, A Bio-diversity Database Pla(n)tform • Jayant Haritsa, Database Systems Lab, Supercomputer Education and Research Centre, Indian Institute of Science BODHI

  2. Team • B. J. Srikanta (next talk) • Prof. Madhav Gadgil, Prof. V. Nanjundiah (Centre for Ecological Sciences, IISc) • Several Master's students • Funded by DBT

  3. Motivation • GATT: patent laws • To be in place by 2005 • Losses: Neem; Basmati (estimated export value: Rs. 1,198 crore); Turmeric • Global and local efforts • GBIF (Global Biodiversity Information Facility) • Karnataka Bio-diversity Board [Deccan Herald, Aug 26 2000]

  4. Bio-diversity Data • Taxonomy of species • Phenetic (physical) characteristics • Phylogenetic (evolutionary) characteristics • Habitat / Spatial distribution • Political Layout • Geographic Layout • Biospheres • Genetic information • Bio-molecular sequences • Structural information

  5. MULTI-DOMAIN QUERY • Retrieve all plant species that, with respect to “Michelia champaca”, share a common habitat, have identical inflorescence characteristics, and have a DNA sequence within a BLAST score of 80.
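The multi-domain query can be sketched as a conjunction of three predicates, one per data domain. This is a toy illustration, not BODHI's query processor: the `Species` record, the sample data and `match_score` are hypothetical, and `match_score` is a simple ungapped match/mismatch count standing in for a real BLAST score. It also assumes that “within a BLAST score of 80” means a score of at least 80.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Species:
    """Hypothetical record joining the three data domains."""
    name: str
    habitat: str        # spatial domain (simplified to one label)
    inflorescence: str  # taxonomic/phenetic domain
    dna: str            # genomic domain (toy sequence)


def match_score(a: str, b: str, match: int = 5, mismatch: int = -4) -> int:
    """Toy ungapped score over aligned positions; a stand-in for BLAST."""
    return sum(match if x == y else mismatch for x, y in zip(a, b))


def multi_domain_query(catalog, ref_name, min_score):
    """Species sharing habitat and inflorescence with the reference and scoring >= min_score."""
    ref = next(s for s in catalog if s.name == ref_name)
    return [
        s.name
        for s in catalog
        if s.name != ref.name
        and s.habitat == ref.habitat                   # spatial predicate
        and s.inflorescence == ref.inflorescence       # phenetic predicate
        and match_score(s.dna, ref.dna) >= min_score   # genomic predicate
    ]


REF_DNA = "ACGTACGTACGTACGTACGT"  # 20 bases, so the maximum toy score is 100
catalog = [  # hypothetical sample data
    Species("Michelia champaca", "evergreen forest", "solitary", REF_DNA),
    Species("Species A", "evergreen forest", "solitary", "ACGTACGTACGTACGTACGA"),  # 1 mismatch: 91
    Species("Species B", "evergreen forest", "solitary", "TCGTACGAACGTACGTACGA"),  # 3 mismatches: 73
    Species("Species C", "dry deciduous forest", "solitary", REF_DNA),  # other habitat
    Species("Species D", "evergreen forest", "cyme", REF_DNA),  # other inflorescence
]

result = multi_domain_query(catalog, "Michelia champaca", 80)
print(result)  # ['Species A']
```

In a real system each predicate would be served by a different index and operator, and the order in which they are evaluated matters: the cheap habitat and inflorescence filters should run before the expensive sequence comparison.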

  6. Difficulties: • Complex range of data types • sets, hierarchies, aggregations, sequences, geometries, maps, audio, images, … • Multidimensional data • from spatial (latitude, longitude, elevation) to proteins (hundreds of coordinates) • Computationally intensive operators • species relationships, spatial distributions, sequence alignments, …
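To give a sense of why sequence alignment is a computationally intensive operator, here is a minimal Smith-Waterman local-alignment scorer. It is a swapped-in illustration, not what BODHI uses: BLAST and FASTA are faster heuristics that avoid filling the full table, and the scoring parameters below are assumptions.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local-alignment score of a and b with a linear gap penalty.

    Fills a (len(a)+1) x (len(b)+1) dynamic-programming table row by row,
    keeping only the previous row, so time is O(len(a) * len(b)).
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # a[i-1] aligned against a gap
            left = curr[j - 1] + gap  # b[j-1] aligned against a gap
            curr[j] = max(0, diag, up, left)  # 0 lets a local alignment restart
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman("GGACGTGG", "ACGT"))  # 8: the exact local match ACGT
```

Each call touches every cell of a len(a) × len(b) table, so comparing two 1,000-base sequences costs about a million cell updates, and a query against a stored collection multiplies that by the number of sequences, which is why sequence indexes matter.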

  7. Current Solutions • Small-Scale • MS-Access / FoxPro / Excel / ... • Pentium PCs • Large-Scale • RDBMS: Oracle / DB2 / Informix / Sybase / … • Unix servers: Sun / SGI / IBM / HP / ...

  8. Limitations: • The RDBMS view that “the world is a flat collection of tables with simple attributes” suits financial applications, NOT scientific (biological) ones • In particular, modeling and processing taxonomic / spatial / sequence / multimedia data is very cumbersome and coarse
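As an illustration of the flat-table limitation, consider storing the taxonomy in a single relational table with a parent pointer. Even the simple question “which species belong to this family?” then needs a recursive self-join. The sketch below uses Python's built-in `sqlite3` module; the `taxon` table, its columns and its rows are hypothetical.

```python
import sqlite3

# Hypothetical flat "taxon" table: each row points to its parent taxon.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE taxon (id INTEGER PRIMARY KEY, name TEXT, taxon_rank TEXT, parent_id INTEGER)"
)
conn.executemany(
    "INSERT INTO taxon VALUES (?, ?, ?, ?)",
    [
        (1, "Magnoliales", "order", None),
        (2, "Magnoliaceae", "family", 1),
        (3, "Michelia", "genus", 2),
        (4, "Magnolia", "genus", 2),
        (5, "Michelia champaca", "species", 3),
        (6, "Magnolia grandiflora", "species", 4),
    ],
)

# Walking down the hierarchy requires a recursive self-join on parent_id.
SPECIES_UNDER = """
WITH RECURSIVE sub(id, name, taxon_rank) AS (
    SELECT id, name, taxon_rank FROM taxon WHERE name = ?
    UNION ALL
    SELECT t.id, t.name, t.taxon_rank FROM taxon AS t JOIN sub AS s ON t.parent_id = s.id
)
SELECT name FROM sub WHERE taxon_rank = 'species' ORDER BY name
"""


def species_under(taxon_name):
    """All species at any depth below the named taxon."""
    return [row[0] for row in conn.execute(SPECIES_UNDER, (taxon_name,))]


print(species_under("Magnoliaceae"))  # ['Magnolia grandiflora', 'Michelia champaca']
```

The query works, but the hierarchy is invisible to the schema: nothing stops a genus from pointing at another genus, and every hierarchical question must be re-expressed as recursion over `parent_id`.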

  9. Limitations (contd) • Spatial and other applications are not within the database kernel but are connected externally. For example, many GIS systems hook up ArcInfo and MS-Access in a “black-box” manner; similarly, BLAST/FASTA run on sequence files exported from Oracle. • Problem: slow and ugly!

  10. Is there Hope? • Object-Oriented DBMS • “Natural” for biological applications • High-performance data access methods • Path Dictionary Index, Multi-key Type Index, Pyramid Tree, ... • High-performance specialized operators • spatial join, data mining, sequence processing, … • XML = HTML + Semantics
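The claim that an object-oriented model is “natural” for biological data can be illustrated by representing the taxonomy as linked objects and answering questions by following a path such as `genus.family.order.name`. The sketch below is a simplified, hypothetical illustration of indexing one such path expression with a dictionary; it is not the Path Dictionary Index itself, whose structure is more elaborate, and the class names are assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical taxonomy classes: each level holds a reference to its parent.
@dataclass
class Order:
    name: str


@dataclass
class Family:
    name: str
    order: Order


@dataclass
class Genus:
    name: str
    family: Family


@dataclass
class Species:
    name: str
    genus: Genus


def resolve(obj, path):
    """Follow a dotted attribute path, e.g. 'genus.family.name'."""
    for attr in path.split("."):
        obj = getattr(obj, attr)
    return obj


def build_path_index(objects, path):
    """Map each value reached via `path` to the objects that reach it."""
    index = defaultdict(list)
    for obj in objects:
        index[resolve(obj, path)].append(obj)
    return index


magnoliales = Order("Magnoliales")
magnoliaceae = Family("Magnoliaceae", magnoliales)
species = [
    Species("Michelia champaca", Genus("Michelia", magnoliaceae)),
    Species("Magnolia grandiflora", Genus("Magnolia", magnoliaceae)),
]

by_order = build_path_index(species, "genus.family.order.name")
print([s.name for s in by_order["Magnoliales"]])  # ['Michelia champaca', 'Magnolia grandiflora']
```

Precomputing the path lookup means a query on the order name no longer walks genus, family and order objects one by one; the trade-off is that the index must be updated whenever any object along the path changes.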

  11. Goals of BODHI • Seamless integration of taxonomic, spatial and genomic data using OO technology • Latest access methods and operators for all three types of data • Use XML for data exchange • Low cost (ideally, free!)
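For the XML data-exchange goal, a species record could be serialized and parsed with Python's standard `xml.etree.ElementTree` module, as in the sketch below. The element and attribute names (`species`, `family`, `location`, `habitat`, `lat`, `lon`) and the coordinates are hypothetical; the slides do not describe BODHI's actual XML schema.

```python
import xml.etree.ElementTree as ET


def species_to_xml(name, family, habitat, lat, lon):
    """Serialize one hypothetical species observation as an XML string."""
    root = ET.Element("species", {"name": name})
    ET.SubElement(root, "family").text = family
    location = ET.SubElement(root, "location", {"lat": str(lat), "lon": str(lon)})
    ET.SubElement(location, "habitat").text = habitat
    return ET.tostring(root, encoding="unicode")


doc = species_to_xml("Michelia champaca", "Magnoliaceae", "evergreen forest", 12.97, 77.59)
print(doc)

# The receiving side can parse the same document back.
parsed = ET.fromstring(doc)
print(parsed.get("name"), parsed.find("location/habitat").text)
```

Nesting `habitat` under `location` keeps the spatial part of the record together, which is the kind of semantic structure the “XML = HTML + Semantics” slide alludes to.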

  12. The Internet Architecture of BODHI • Client Interface Framework • Query Processor • Operations: Spatial / Object / Genome • Indexes: Spatial / Object / Genome • Models: Spatial / Taxonomy / Genome • Services: Spatial / Object / Sequence • OBJECT STORAGE MANAGER

  13. The Internet Implementation of BODHI • Client Interface Framework – DB • Operations: Spatial (Overlaps, Contains, Closest, Within); Object (Inheritance, Aggregation); Genome (Alignment: BLAST, FASTA) • Indexes: Spatial (R*-tree, Hilbert R-tree); Object (Multi-Key Type, Path-Dictionary); Genome (??? Indexes, next talk) • Models: Spatial (Country, State, City, River, Road); Taxonomy (Species, Genera, Family, Order); Genome (DNA, Protein) • Spatial Services / Object Services / Sequence Services • Basic Types (Point, Line, Polygon, Sets, Sequences, ...) • SHORE MICRO-KERNEL
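The spatial operators listed above (Overlaps, Contains, Within, Closest) can be sketched on minimum bounding rectangles (MBRs), the approximation that R-tree-family indexes such as the R*-tree and Hilbert R-tree store. This is an illustrative filter-step sketch under that assumption, not BODHI's code: exact answers for states, rivers or roads would need a refinement step on the true geometry, and the `Rect` type and sample regions are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned minimum bounding rectangle (MBR)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def overlaps(a: Rect, b: Rect) -> bool:
    """True if the rectangles share at least one point."""
    return a.xmin <= b.xmax and b.xmin <= a.xmax and a.ymin <= b.ymax and b.ymin <= a.ymax


def contains(a: Rect, b: Rect) -> bool:
    """True if a fully encloses b."""
    return a.xmin <= b.xmin and a.ymin <= b.ymin and b.xmax <= a.xmax and b.ymax <= a.ymax


def within(a: Rect, b: Rect) -> bool:
    """True if a lies entirely inside b (the converse of contains)."""
    return contains(b, a)


def min_dist(p, r: Rect) -> float:
    """Euclidean distance from point p = (x, y) to r; 0 if p is inside r."""
    dx = max(r.xmin - p[0], 0.0, p[0] - r.xmax)
    dy = max(r.ymin - p[1], 0.0, p[1] - r.ymax)
    return math.hypot(dx, dy)


def closest(p, regions: dict) -> str:
    """Name of the region whose MBR is nearest to p (linear scan, no index)."""
    return min(regions, key=lambda name: min_dist(p, regions[name]))


regions = {  # hypothetical sample regions
    "forest_reserve": Rect(0, 0, 10, 10),
    "lake": Rect(2, 2, 4, 4),
    "town": Rect(20, 20, 25, 25),
}
print(contains(regions["forest_reserve"], regions["lake"]))  # True
print(overlaps(regions["forest_reserve"], regions["town"]))  # False
print(closest((18, 18), regions))                            # town
```

`closest` scans every region here; an R-tree would instead use the same `min_dist` bound to prune whole subtrees whose MBRs are farther away than the best candidate found so far.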

  14. Query Flow

  15. Project Status • Prototype (minus Client Interface Framework) has been operational since last month! • Platform: PIII 700 MHz running Red Hat Linux • For code, contact “bodhi@dsl.serc.iisc.ernet.in”

  16. Performance Evaluation • SEQUOIA 2000 spatial benchmark: competitive with Paradise GIS from Wisconsin • Taxonomy + spatial queries: reasonably fast • But genomic queries are much slower, owing to the absence of sequence indexes (next talk)

  17. More details • “Design and Implementation of a Biodiversity Information System”, Proc. Intl. Conf. on Management of Data (COMAD), Pune, December 2000 • “The Building of BODHI, A Bio-diversity Database System”, TechRep-2001-02, DSL/SERC, IISc • Available at http://dsl.serc.iisc.ernet.in

  18. End of Talk

More Related