BioJava in 2002

An Open-Source Java Library for Bioinformatics
Matthew Pocock, BioJava Consulting LTD

What is BioJava?
• Java code (Java 2 required: 1.2 and higher)
• Open-Source
• Bioinformatics
• Library for building applications
• Sequence-centric (we’d love to do more)
• Part of the Open Bioinformatics Foundation (OBF)
• Drop biojava.jar into your CLASSPATH and go

Where is BioJava?
• http://www.biojava.org
• mailto:biojava-l@biojava.org
• #biojava on irc.openprojects.net

Who is BioJava?
• 35+ developers on most continents and in most time zones
• Core team of more than 5 individuals
• Ever-expanding user group

A look at some API Stuff

What’s Been There for a While?
• Sequences with hierarchical features
• Sequence databases
• Sequence IO
• Various sequence formats (EMBL, GenBank, GFF, SwissProt…)
• Object model can be bypassed for high-performance scanning
• Probability distributions over symbols and a dynamic programming toolkit
• BLAST parsers

What’s Reasonably New?
• TagValue parser API
• Sequence search APIs
  • Interoperable with BioJava XML-based parsers for many common sequence search algorithms
• Pure-Java SSAHA implementation
• Bit-packed sequence storage
• Taxonomies
• Literature references
• Phred
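To make "bit-packed sequence storage" concrete, here is a minimal sketch of the general idea: each unambiguous DNA base takes 2 bits, so 32 bases fit in one 64-bit word. The class name `PackedDna` and its methods are hypothetical and are not BioJava's own packing classes. The sketch is written for a modern JDK and rejects ambiguity codes and gaps outright.

```java
// Illustrative sketch only: PackedDna is a hypothetical class, not part of BioJava.
final class PackedDna {
    private final long[] words; // 32 bases per 64-bit word, 2 bits per base
    private final int length;   // number of bases stored

    PackedDna(String dna) {
        length = dna.length();
        words = new long[(length + 31) / 32];
        for (int i = 0; i < length; i++) {
            long code = encode(dna.charAt(i));
            words[i / 32] |= code << ((i % 32) * 2);
        }
    }

    // Map an unambiguous base to a 2-bit code; anything else is rejected.
    private static long encode(char base) {
        switch (Character.toLowerCase(base)) {
            case 'a': return 0L;
            case 'c': return 1L;
            case 'g': return 2L;
            case 't': return 3L;
            default:
                throw new IllegalArgumentException("Not an unambiguous DNA base: " + base);
        }
    }

    // Unpack the base at a 0-based position.
    char baseAt(int index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        int code = (int) ((words[index / 32] >>> ((index % 32) * 2)) & 3L);
        return "acgt".charAt(code);
    }

    int length() { return length; }

    int wordCount() { return words.length; }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(baseAt(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        PackedDna packed = new PackedDna("gattacagattacagattacagattacagattaca");
        System.out.println(packed + " stored in " + packed.wordCount() + " long(s)");
    }
}
```

The trade-off is that a 2-bit code has no room for ambiguity symbols or gaps. A real implementation would need a wider code or a side table for those.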

What’s Recently Improved?
• Gap handling
  • Consistent algebra for representing ambiguities (e.g. n), compound symbols (e.g. codons) and gaps
• DAS client is now very robust
• Distributed sequence API allows DAS-like distributed sequence databases to be easily built and implemented
• More ‘framey’ annotation bundles
• Sequence rendering
  • Looks much better now
  • Handles ‘dotter-style’ 2D rendering
• We now actually write JUnit tests!
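One way to picture a single algebra covering ambiguities, compound symbols and gaps is to treat every symbol as the set of atomic symbols it matches. The sketch below encodes those sets as bitmasks over a, c, g and t, using the standard IUPAC nucleotide codes. Modelling the gap as the empty set is an assumption of this sketch, and the class `DnaSymbols` and its methods are hypothetical rather than the BioJava symbol API.

```java
// Illustrative sketch only: DnaSymbols is a hypothetical class, not part of BioJava.
final class DnaSymbols {
    static final int A = 1, C = 2, G = 4, T = 8; // atomic symbols, one bit each
    static final int GAP = 0;                    // assumption: a gap matches nothing
    static final int N = A | C | G | T;          // 'n' matches every atomic symbol

    // Translate an IUPAC nucleotide code (or '-') into its match set.
    static int fromIupac(char code) {
        switch (Character.toLowerCase(code)) {
            case 'a': return A;
            case 'c': return C;
            case 'g': return G;
            case 't': return T;
            case 'r': return A | G;
            case 'y': return C | T;
            case 's': return C | G;
            case 'w': return A | T;
            case 'k': return G | T;
            case 'm': return A | C;
            case 'b': return C | G | T;
            case 'd': return A | G | T;
            case 'h': return A | C | T;
            case 'v': return A | C | G;
            case 'n': return N;
            case '-': return GAP;
            default:
                throw new IllegalArgumentException("Unknown symbol: " + code);
        }
    }

    // True when the two match sets share at least one atomic symbol.
    static boolean matches(int left, int right) {
        return (left & right) != 0;
    }

    // A compound symbol (e.g. a codon) is a tuple of symbols; the number of
    // concrete tuples it stands for is the product of the per-position set sizes.
    static long expansions(String compound) {
        long count = 1;
        for (int i = 0; i < compound.length(); i++) {
            count *= Integer.bitCount(fromIupac(compound.charAt(i)));
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("gcn stands for " + expansions("gcn") + " concrete codons");
        System.out.println("r matches g: " + matches(fromIupac('r'), G));
    }
}
```

With this representation, union and intersection of symbols are plain bitwise operations, and a gap falls out as the symbol that matches nothing.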

Java 1.4-reliant Source
• Java 1.4 offers APIs that are really useful for Bioinformatics
  • Logging
  • NIO interfaces for fast IO and raw data access
  • Regular expressions
  • Cascading exceptions
• BioJava code relying on 1.4 APIs is conditionally built
  • SSAHA implementation
  • Some parsers and handlers for TagValue
  • Restriction enzyme digests
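As an illustration of why the regular expression API introduced in Java 1.4 suits tasks such as restriction digests, the sketch below uses `java.util.regex` to cut a linear sequence at EcoRI sites (recognition sequence GAATTC, cut between G and AATTC). The class `EcoRiDigest` is hypothetical and is not BioJava's restriction enzyme code. It works on the top strand only, and it uses generics for brevity, so it needs a newer JDK than 1.4.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only: EcoRiDigest is a hypothetical class, not part of BioJava.
final class EcoRiDigest {
    // EcoRI recognition site; matching is case-insensitive.
    private static final Pattern SITE = Pattern.compile("GAATTC", Pattern.CASE_INSENSITIVE);
    // EcoRI cuts after the first base of the site: G^AATTC.
    private static final int CUT_OFFSET = 1;

    // Split a linear sequence into the fragments left after a complete digest.
    static List<String> digest(String dna) {
        List<String> fragments = new ArrayList<>();
        Matcher matcher = SITE.matcher(dna);
        int fragmentStart = 0;
        while (matcher.find()) {
            int cut = matcher.start() + CUT_OFFSET;
            fragments.add(dna.substring(fragmentStart, cut));
            fragmentStart = cut;
        }
        fragments.add(dna.substring(fragmentStart)); // tail after the last cut
        return fragments;
    }

    public static void main(String[] args) {
        for (String fragment : digest("AAGAATTCTTGAATTCGG")) {
            System.out.println(fragment);
        }
    }
}
```

An enzyme with a degenerate recognition site could be handled the same way by writing the site as a character-class pattern, for example `[AG]` for the IUPAC code r.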

OBDA and Fun Trips
• Sponsored by O’Reilly and Electric Genetics
• Developers attended a two-part Hackathon in Tucson, AZ, USA and Cape Town, South Africa
• Representatives from BioJava, BioPerl, BioPython, BioRuby, Ensembl, EMBOSS and others
• We hammered out and implemented a range of standards designed from the ground up to be
  • Interoperable between the Bio* projects
  • Relatively easy to implement from scratch
• We drank lots of red wine

OBDA Support
• BioCORBA: CORBA sequence interfaces
• BioSQL: relational tables and standard semantics for storing sequences
• BioFetch: cgi-bin-based sequence fetching
• XEMBL: XML-based sequence fetching
• Bio Directories: configuration file for resolving resources
• Flat-file Indexing: fetch records by ID and secondary ID from multiple ASCII files
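The flat-file indexing idea can be sketched as a map from record ID to byte offset, so that a record is fetched with a single seek rather than a full scan. The example below is not the OBDA index format. It assumes a single ASCII FASTA file, handles primary IDs only, and uses a hypothetical class named `FastaOffsetIndex`.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: FastaOffsetIndex is hypothetical and does not
// implement the OBDA flat-file index specification.
final class FastaOffsetIndex {
    private final Map<String, Long> offsets = new HashMap<>();
    private final String path;

    // Scan the file once and remember where each record's header line starts.
    FastaOffsetIndex(String path) throws IOException {
        this.path = path;
        try (RandomAccessFile in = new RandomAccessFile(path, "r")) {
            long lineStart = in.getFilePointer();
            String line;
            while ((line = in.readLine()) != null) {
                if (line.startsWith(">")) {
                    // The ID is the first whitespace-delimited token after '>'.
                    String id = line.substring(1).trim().split("\\s+")[0];
                    offsets.put(id, lineStart);
                }
                lineStart = in.getFilePointer();
            }
        }
    }

    // Return the full record text for an ID, or null if the ID is unknown.
    String fetch(String id) throws IOException {
        Long offset = offsets.get(id);
        if (offset == null) {
            return null;
        }
        StringBuilder record = new StringBuilder();
        try (RandomAccessFile in = new RandomAccessFile(path, "r")) {
            in.seek(offset);
            String line = in.readLine(); // header line of the requested record
            while (line != null) {
                record.append(line).append('\n');
                line = in.readLine();
                if (line != null && line.startsWith(">")) {
                    break; // reached the next record
                }
            }
        }
        return record.toString();
    }
}
```

`RandomAccessFile.readLine` is unbuffered and treats each byte as one character, which is acceptable for a sketch over ASCII files but would be slow on large databases. Supporting secondary IDs would mean a second map from secondary ID to primary ID.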

Things We’d Like To Do in the Near Future
• Support non-DNA areas of Bioinformatics
  • Cladistics, evolutionary trees, clusters
  • Expression data
  • Proteomics
  • Networks/pathways
  • Biochemical reactions
• Integrate pre- and post-1.4 exception systems
• Modify the change notification system
  • Better synchronization and transaction support
  • Easier to optimize events that don’t have listeners
  • More robust handling of event cascades

What Will We See in BioJava 2?
• Pervasive use of ontologies
  • Storing annotation data
  • Definition of processing pipelines (e.g. customizing parsers)
• Bindings between BioJava interfaces and external data sources
  • DAS, BioSQL, BioCORBA
• Pervasive querying, making any BioJava application an object data store with easy routes for data providers to optimize searches
• Much more code generation
  • Push most repetitive code into code generators
  • Auto-generate much of the event notification web
• Much better transactionality
• Reduce implementation cost for developers
• Expose any/all BioJava instances through SOAP
• Naming and directory services

And the Biggest Change of All?
• Make the library accessible to casual developers writing throw-away scripts as well as to system architects
  • Documentation
  • Tutorials
  • Training
  • Utility classes (e.g. SeqIOTools)

Some Contributors
