BioJava in 2002

An Open-Source Java Library for Bioinformatics (Matthew Pocock, BioJava Consulting LTD)

Presentation Transcript


  1. BioJava in 2002
     An Open-Source Java Library for Bioinformatics
     Matthew Pocock, BioJava Consulting LTD

  2. What is BioJava?
     • Java code (Java 2 required: version 1.2 and higher)
     • Open-source
     • Bioinformatics
     • A library for building applications
     • Sequence-centric (we’d love to do more)
     • Part of the Open Bioinformatics Foundation (OBF)
     • Drop biojava.jar into your CLASSPATH and go

  3. Where is BioJava?
     • http://www.biojava.org
     • mailto:biojava-l@biojava.org
     • #biojava on irc.openprojects.net

  4. Who is BioJava?
     • 35+ developers on most continents and in most time zones
     • A core team of more than 5 individuals
     • An ever-expanding user group

  5. A look at some API Stuff

  6. What’s Been There for a While?
     • Sequences with hierarchical features
     • Sequence databases
     • Sequence I/O
        • Various sequence formats (EMBL, GenBank, GFF, Swiss-Prot…)
        • The object model can be bypassed for high-performance scanning
     • Probability distributions over symbols and a dynamic programming toolkit
     • BLAST parsers
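
Slide 6 lists sequences with hierarchical features. As a rough illustration of the idea only, the sketch below models a feature as a typed interval that can contain child features (a gene holding its exons) and walks the tree recursively to collect every feature of one type. `SimpleFeature` and `FeatureTree` are hypothetical names written in plain, modern Java; they are not BioJava's own feature API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical feature class (not BioJava's feature API): a typed interval
// on a sequence that may contain child features, e.g. a gene holding exons.
class SimpleFeature {
    final String type;
    final int start; // 1-based, inclusive
    final int end;   // 1-based, inclusive
    final List<SimpleFeature> children = new ArrayList<>();

    SimpleFeature(String type, int start, int end) {
        this.type = type;
        this.start = start;
        this.end = end;
    }

    // Adds a child feature and returns this feature, so trees can be built fluently.
    SimpleFeature add(SimpleFeature child) {
        children.add(child);
        return this;
    }
}

class FeatureTree {
    // Depth-first search returning every feature of the given type, at any depth.
    static List<SimpleFeature> findByType(List<SimpleFeature> roots, String type) {
        List<SimpleFeature> hits = new ArrayList<>();
        for (SimpleFeature root : roots) {
            collect(root, type, hits);
        }
        return hits;
    }

    private static void collect(SimpleFeature f, String type, List<SimpleFeature> hits) {
        if (f.type.equals(type)) {
            hits.add(f);
        }
        for (SimpleFeature child : f.children) {
            collect(child, type, hits);
        }
    }

    public static void main(String[] args) {
        SimpleFeature gene = new SimpleFeature("gene", 100, 900)
                .add(new SimpleFeature("exon", 100, 250))
                .add(new SimpleFeature("exon", 600, 900));
        List<SimpleFeature> topLevel = Arrays.asList(gene, new SimpleFeature("repeat", 1000, 1200));
        for (SimpleFeature exon : findByType(topLevel, "exon")) {
            System.out.println(exon.type + " " + exon.start + ".." + exon.end);
        }
    }
}
```

Running `main` prints the two exons nested under the gene (100..250 and 600..900) and skips the top-level repeat.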

  7. What’s Reasonably New?
     • TagValue parser API
     • Sequence search APIs
        • Interoperable with BioJava XML-based parsers for many common sequence search algorithms
     • Pure-Java SSAHA implementation
     • Bit-packed sequence storage
     • Taxonomies
     • Literature references
     • Phred
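
Slide 7 mentions bit-packed sequence storage. The general trick, sketched below under the assumption of an unambiguous A/C/G/T alphabet, is to give each base a 2-bit code and pack 32 bases into each 64-bit `long`. `PackedDna` is a hypothetical class for illustration, not BioJava's implementation.

```java
// Hypothetical sketch (not BioJava's packed-storage classes): store an
// unambiguous DNA sequence at 2 bits per base, 32 bases per 64-bit long.
final class PackedDna {
    private static final String BASES = "ACGT"; // base -> code 0..3 by position
    private final long[] words;
    private final int length;

    PackedDna(String seq) {
        length = seq.length();
        words = new long[(length + 31) / 32];
        for (int i = 0; i < length; i++) {
            int code = BASES.indexOf(Character.toUpperCase(seq.charAt(i)));
            if (code < 0) {
                // This simple scheme has no room for ambiguity codes such as N, or for gaps.
                throw new IllegalArgumentException("Cannot pack '" + seq.charAt(i) + "' at index " + i);
            }
            words[i / 32] |= ((long) code) << (2 * (i % 32));
        }
    }

    // Decode the base at a 0-based index by shifting its 2-bit code down and masking.
    char charAt(int i) {
        if (i < 0 || i >= length) {
            throw new IndexOutOfBoundsException("index " + i + ", length " + length);
        }
        int code = (int) ((words[i / 32] >>> (2 * (i % 32))) & 3L);
        return BASES.charAt(code);
    }

    int length() {
        return length;
    }

    // Number of 64-bit words used for storage.
    int wordCount() {
        return words.length;
    }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(charAt(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        PackedDna dna = new PackedDna("GATTACAGATTACA");
        System.out.println(dna + " uses " + dna.wordCount() + " long(s)");
    }
}
```

At 2 bits per base, 100,000 bases fit in 3,125 longs (25,000 bytes), against 200,000 bytes for the same sequence held as Java `char`s. Supporting `n` or gaps would need extra bits per symbol or a side table of exceptions.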

  8. What’s Recently Improved?
     • Gap handling
        • A consistent algebra for representing ambiguities (e.g. n), compound symbols (e.g. codons) and gaps
     • The DAS client is now very robust
     • A distributed sequence API allows DAS-like distributed sequence databases to be built and implemented easily
     • More ‘framey’ annotation bundles
     • Sequence rendering
        • Looks much better now
        • Handles ‘dotter-style’ 2D rendering
     • We now actually write JUnit tests!
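
Slide 8 describes a consistent algebra for ambiguities and gaps. One common way to get such an algebra, shown here as a hypothetical sketch rather than BioJava's actual symbol classes, is to treat every nucleotide symbol as the set of bases it may stand for: IUPAC ambiguity codes become unions (R = {A, G}, N = {A, C, G, T}) and the gap becomes the empty set, so "can these two symbols match?" reduces to a set intersection.

```java
// Hypothetical sketch (not BioJava's symbol API): each nucleotide symbol is the
// set of bases it may stand for, stored as a 4-bit mask. Ambiguity codes are
// unions of bases; the gap is the empty set.
final class NucleotideSets {
    static final int A = 1, C = 2, G = 4, T = 8;
    static final int GAP = 0;

    // Map an IUPAC nucleotide code (or '-') to its base set.
    static int toMask(char c) {
        switch (Character.toUpperCase(c)) {
            case 'A': return A;
            case 'C': return C;
            case 'G': return G;
            case 'T': return T;
            case 'R': return A | G;
            case 'Y': return C | T;
            case 'S': return C | G;
            case 'W': return A | T;
            case 'K': return G | T;
            case 'M': return A | C;
            case 'B': return C | G | T;
            case 'D': return A | G | T;
            case 'H': return A | C | T;
            case 'V': return A | C | G;
            case 'N': return A | C | G | T;
            case '-': return GAP;
            default: throw new IllegalArgumentException("Unknown symbol: " + c);
        }
    }

    // Two symbols are compatible if they share at least one possible base;
    // a gap (empty set) is therefore compatible with nothing.
    static boolean compatible(char x, char y) {
        return (toMask(x) & toMask(y)) != 0;
    }

    // True if the symbol stands for exactly one base.
    static boolean isAtomic(char c) {
        return Integer.bitCount(toMask(c)) == 1;
    }
}
```

A compound symbol such as a codon could be modelled in the same spirit as an ordered triple of these sets.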

  9. Java 1.4-reliant Source
     • Java 1.4 offers APIs that are really useful for bioinformatics
        • Logging
        • NIO interfaces for fast I/O and raw data access
        • Regular expressions
        • Cascading exceptions
     • BioJava code relying on 1.4 APIs is built conditionally
        • SSAHA implementation
        • Some parsers and handlers for TagValue
        • Restriction enzyme digests
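
Slide 9 pairs Java 1.4's regular expressions with restriction enzyme digests. The sketch below shows how `java.util.regex` can locate recognition sites and turn them into fragment lengths for a linear molecule. The class and method names and the single-strand simplification are assumptions for illustration, not BioJava's restriction-digest API; the example enzyme is EcoRI, which recognises `GAATTC` and cuts between the G and the first A.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical single-enzyme digest built on java.util.regex (part of Java since 1.4).
// It scans one strand only, which is enough for palindromic sites such as EcoRI's GAATTC.
final class RegexDigest {
    // Returns 0-based cut positions: a cut at p falls between residues p-1 and p.
    // cutOffset is where the enzyme cuts relative to the start of its site (1 for G^AATTC).
    static List<Integer> cutPositions(String seq, String site, int cutOffset) {
        // A zero-width lookahead reports overlapping sites too.
        Pattern pattern = Pattern.compile("(?=" + Pattern.quote(site.toUpperCase()) + ")");
        Matcher m = pattern.matcher(seq.toUpperCase());
        List<Integer> cuts = new ArrayList<>();
        while (m.find()) {
            cuts.add(m.start() + cutOffset);
        }
        return cuts;
    }

    // Fragment lengths for a linear molecule, given ascending cut positions.
    static List<Integer> fragmentLengths(int seqLength, List<Integer> cuts) {
        List<Integer> lengths = new ArrayList<>();
        int previous = 0;
        for (int cut : cuts) {
            lengths.add(cut - previous);
            previous = cut;
        }
        lengths.add(seqLength - previous);
        return lengths;
    }

    public static void main(String[] args) {
        String seq = "AAGAATTCAAGAATTCTT";
        List<Integer> cuts = cutPositions(seq, "GAATTC", 1); // EcoRI: G^AATTC
        System.out.println("cuts " + cuts + ", fragments " + fragmentLengths(seq.length(), cuts));
    }
}
```

Running `main` prints `cuts [3, 11], fragments [3, 8, 7]`. Non-palindromic sites or recognition sequences containing ambiguity codes would need a reverse-complement scan and IUPAC-aware patterns.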

  10. OBDA and Fun Trips
     • Sponsored by O’Reilly and Electric Genetics
     • Developers attended a two-part hackathon in Tucson, AZ, USA and Cape Town, South Africa
     • Representatives from BioJava, BioPerl, BioPython, BioRuby, Ensembl, EMBOSS and others
     • We hammered out and implemented a range of standards designed from the ground up to be
        • interoperable between the Bio* projects
        • relatively easy to implement from scratch
     • We drank lots of red wine

  11. OBDA Support
     • BioCORBA: CORBA sequence interfaces
     • BioSQL: relational tables and standard semantics for storing sequences
     • BioFetch: CGI-based sequence fetching
     • XEMBL: XML-based sequence fetching
     • Bio Directories: a configuration file for resolving resources
     • Flat-file indexing: fetch records by ID and secondary ID from multiple ASCII files
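
The flat-file indexing entry on slide 11 rests on a simple idea: scan a flat file once, remember the byte offset and length of each record under its ID, then seek straight to a record on request. The sketch below assumes EMBL-style records that start with an `ID` line and end with `//`, indexes a single file by primary ID only, and keeps the index in memory; `FlatFileIndex` is a hypothetical class and does not reproduce the on-disk index format defined by the OBDA standard.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory flat-file index: record ID -> (byte offset, byte length).
// Assumes ASCII, EMBL-style records beginning with "ID" and ending with "//".
final class FlatFileIndex {
    private static final class Location {
        final long offset;
        final int length;

        Location(long offset, int length) {
            this.offset = offset;
            this.length = length;
        }
    }

    private final String path;
    private final Map<String, Location> locations = new HashMap<>();

    FlatFileIndex(String path) throws IOException {
        this.path = path;
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            long recordStart = -1;
            String id = null;
            long lineStart = raf.getFilePointer();
            String line;
            while ((line = raf.readLine()) != null) {
                if (line.startsWith("ID ")) {
                    recordStart = lineStart;
                    // The ID is the first token after the "ID" tag, minus any trailing ';'.
                    id = line.substring(2).trim().split("[\\s;]+")[0];
                } else if (line.startsWith("//") && id != null) {
                    long recordEnd = raf.getFilePointer(); // just past the terminator line
                    locations.put(id, new Location(recordStart, (int) (recordEnd - recordStart)));
                    id = null;
                }
                lineStart = raf.getFilePointer();
            }
        }
    }

    // Seek directly to the record and return its raw text, or null if the ID is unknown.
    String fetch(String id) throws IOException {
        Location loc = locations.get(id);
        if (loc == null) {
            return null;
        }
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            byte[] buf = new byte[loc.length];
            raf.seek(loc.offset);
            raf.readFully(buf);
            return new String(buf, StandardCharsets.US_ASCII);
        }
    }

    int size() {
        return locations.size();
    }
}
```

Covering several files and secondary IDs would mean storing a file name in each `Location` and keeping a second map from secondary ID to primary ID.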

  12. Things We’d Like To Do in the Near Future
     • Support non-DNA areas of bioinformatics
        • Cladistics, evolutionary trees, clusters
        • Expression data
        • Proteomics
        • Networks/pathways
        • Biochemical reactions
     • Integrate pre- and post-1.4 exception systems
     • Modify the change notification system
        • Better synchronization and transaction support
        • Easier to optimize events that don’t have listeners
        • More robust handling of event cascades
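
Slide 12 wants events that have no listeners to be cheap. One way to sketch that, assuming nothing about BioJava's real change-notification classes, is to pass the event as a factory and return immediately when the listener list is empty, so an unobserved object never allocates an event. `LazyChangeNotifier` and its nested `Listener` are hypothetical names; `Supplier` requires Java 8 or later.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Supplier;

// Hypothetical sketch (not BioJava's change API): notification that only
// constructs an event object when at least one listener is registered.
final class LazyChangeNotifier<E> {
    // Nested interfaces are implicitly static, so Listener declares its own type parameter.
    interface Listener<T> {
        void changed(T event);
    }

    // Copy-on-write list: listeners may be added or removed while an event is being delivered.
    private final List<Listener<E>> listeners = new CopyOnWriteArrayList<>();

    void addListener(Listener<E> listener) {
        listeners.add(listener);
    }

    void removeListener(Listener<E> listener) {
        listeners.remove(listener);
    }

    // The event is passed as a factory, so it is only built if someone is listening.
    void fire(Supplier<E> eventFactory) {
        if (listeners.isEmpty()) {
            return; // fast path: no listeners, no event object
        }
        E event = eventFactory.get();
        for (Listener<E> listener : listeners) {
            listener.changed(event);
        }
    }

    public static void main(String[] args) {
        LazyChangeNotifier<String> notifier = new LazyChangeNotifier<>();
        notifier.fire(() -> "never built: nobody is listening");
        notifier.addListener(event -> System.out.println("got " + event));
        notifier.fire(() -> "residue 42 edited");
    }
}
```

Running `main` prints only `got residue 42 edited`; the first `fire` call never invokes its supplier. Transactions and event cascades, also listed on the slide, are outside the scope of this sketch.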

  13. What Will We See in BioJava 2?
     • Pervasive use of ontologies
        • Storing annotation data
        • Definition of processing pipelines (e.g. customizing parsers)
     • Bindings between BioJava interfaces and external data sources
        • DAS, BioSQL, BioCORBA
     • Pervasive querying, making any BioJava application an object data store with easy routes for data providers to optimize searches
     • Much more code generation
        • Push most repetitive code into code generators
        • Auto-generate much of the event notification web
     • Much better transactionality
     • Reduce implementation cost for developers
     • Expose any/all BioJava instances through SOAP
     • Naming and directory services

  14. And the Biggest Change of All?
     • Make the library accessible to casual developers writing throw-away scripts as well as to system architects
        • Documentation
        • Tutorials
        • Training
        • Utility classes (e.g. SeqIOTools)

  15. Some Contributors
