
DAS developer workshop

DAS developer workshop. Tim Hubbard th@sanger.ac.uk 26th February 2007 Wellcome Trust Sanger Institute. Distributed Annotation System. Origins: xml client/server specification (http://biodas.org/) Lincoln Stein, Sean Eddy, Robin Dowell and LaDeana Hillier acedb based prototype server


Presentation Transcript


  1. DAS developer workshop Tim Hubbard th@sanger.ac.uk 26th February 2007 Wellcome Trust Sanger Institute

  2. Distributed Annotation System
     • Origins:
       • xml client/server specification (http://biodas.org/)
       • Lincoln Stein, Sean Eddy, Robin Dowell and LaDeana Hillier
       • acedb based prototype server
       • Java based prototype client
       • Dowell, R.D., Jokerst, R.M., Day, A., Eddy, S.R. & Stein, L. (2001) BioMedCentral Bioinformatics 2.
     • Genome campus adoption
       • Initially via Ensembl becoming a DAS client (now also a DAS server)
       • Code: Dazzle and Proserver servers; Bio::DASLite and biojava client libraries
       • Hosts DAS registry

  3. DAS in a nutshell
     • Standardized set of web services
       • Reference servers (the sequence)
       • Annotation servers (features: chr:start-end)
       • Alignment servers (chr:start-end matches chr:start-end)
       • Identifier based servers (ref item X rather than coordinate)
     • Standardization allows clients to connect to different DAS sources without additional programming
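To make the annotation server idea concrete, here is a minimal Python sketch of how a client might ask for the features on a segment and read the reply. It assumes the DAS/1 convention of a `features` command with a `segment=ref:start,stop` argument and a DASGFF XML response. The host `das.example.org`, the source name `demo`, and the sample XML are invented for illustration, and no network request is made.

```python
from urllib.parse import quote
import xml.etree.ElementTree as ET


def features_url(server, source, ref, start, end):
    """Build a DAS/1-style 'features' request for the segment ref:start,end."""
    segment = f"{ref}:{start},{end}"
    return (f"{server.rstrip('/')}/das/{quote(source)}/features"
            f"?segment={quote(segment, safe=':,')}")


# Hand-written sample reply (illustrative data, not from a real server).
SAMPLE_DASGFF = """<DASGFF>
  <GFF version="1.0" href="http://das.example.org/das/demo/features">
    <SEGMENT id="1" start="1000" stop="2000">
      <FEATURE id="feat1" label="example exon">
        <TYPE id="exon">exon</TYPE>
        <START>1200</START>
        <END>1350</END>
        <ORIENTATION>+</ORIENTATION>
      </FEATURE>
      <FEATURE id="feat2" label="example repeat">
        <TYPE id="repeat">repeat</TYPE>
        <START>1600</START>
        <END>1750</END>
        <ORIENTATION>-</ORIENTATION>
      </FEATURE>
    </SEGMENT>
  </GFF>
</DASGFF>"""


def parse_features(xml_text):
    """Turn a DASGFF document into a list of plain dictionaries."""
    root = ET.fromstring(xml_text)
    features = []
    for segment in root.iter("SEGMENT"):
        for feature in segment.iter("FEATURE"):
            features.append({
                "segment": segment.get("id"),
                "id": feature.get("id"),
                "type": feature.findtext("TYPE"),
                "start": int(feature.findtext("START")),
                "end": int(feature.findtext("END")),
                "strand": feature.findtext("ORIENTATION"),
            })
    return features


url = features_url("http://das.example.org", "demo", "1", 1000, 2000)
feats = parse_features(SAMPLE_DASGFF)
```

Because every annotation server answers the same query in the same format, the same two functions would serve for any source; only the `server` and `source` arguments change.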

  4. Data integration
     • Complete genomes provide the framework to pull all biological data together such that each piece says something about biology as a whole
     • Biology is too complex for any organisation to have a monopoly of ideas or data
     • The more organisations provide data or analysis separately, the harder it becomes for anyone to make use of the results

  5. Utility of bioinformatics Scientific impact Too little bioinformatics Too many databases Too diverse interfaces

  6. Split data and presentation
     • Databases responsible for curating data and serving it as primitive datatypes defined by open standards (high cost)
     • Different front ends or components of front ends compete for users (development of each low cost) c.f. browsers.

  7. Data Services

  8. Data Services

  9. e! contigview epigenome Apollo 3D structure Servers Campus DAS systems Clients Genome Coordinates Dazzle CDS Coordinates Sources Ensembl Pfam Swissprot PubMed Proserver e! geneview Protein Coordinates LDAS otterlace Stable Identifiers Pfam Sequence Alignments Registry

  10. DAS infrastructure status
     • Lots of progress
       • Servers: Dazzle, Proserver, Bio::Daslite
       • Clients: Ensembl, Vega, Dasty, SPICE, Pfam, Jalview, Pepper, IGB
       • >200 sources in DAS registry (http://www.dasregistry.org/)
       • Broadly adopted by Ensembl, biosapiens, efamily, ZF-models, eProtein
     • Lots still to do…
       • Slow adoption rate, particularly in US: upload still easier than distributed…
       • Lack of searching, write back: slow development of DAS2
       • Encourage/facilitate programming against DAS servers
     • Opportunities
       • Source ranking, credit, social networking
       • Inter-client communications protocol
       • Async delivery/caching; servers built on servers/workflows
       • Alternative entry points from servers? Next left/right? Date of addition?

  11. Modern day maps: topography…

  12. … plus annotation

  13. New synteny aware vertebrate curation environment based on rewrite of acedb (zmap)

  14. Consensus Annotation Assembly DAS viewer Annotation Servers of data derived from other servers

  15. Consensus Annotation Assembly DAS viewer Annotation Servers of data derived from other servers tracing back evidence

  16. Acknowledgements Ewan Birney Tony Cox Thomas Down Rob Finn Stefan Graf David Jackson Andreas Kahari Eugene Kulesha Roger Pettett Matt Pocock Andreas Prlic James Smith Jim Stalker Ensembl/Sanger Web team efamily, biosapiens, eProtein Zebrafish analysis (ZF-models) Anacode/Acedb (otterlace/Zmap)

  17. Coordinate Synchronisation Server Server Server Server Sequence Programs Annotation Viewer Distributed Annotation External Contributors Database providers html xml Users xml Hubbard & Birney, Open annotation offers a democratic solution to genome sequencing (2000) Nature, 403, 825.

  18. WWW browser Ensembl MySQL Database Ensembl WWW server http BioJava DAS viewer Data Adaptor Dazzle BioJava DAS server XFF BioJava DAS client library DASGFF (http) Apollo viewer/ editor Data Adaptor Dazzle BioJava DAS server Data Adaptor AceDB GFF files Local GFF files BioJava DAS implementation

  19. WWW browser Ensembl MySQL Database Ensembl WWW server http BioJava DAS client library Data Adaptor Dazzle BioJava DAS server Dazzle BioJava DAS server Data Adaptor AceDB GFF files Data from DAS servers integrated into web displays

  20. DAS v Web
     • Web model (links): different web sites, different interfaces, no integration
     • DAS model: different DAS sites, automatic integration, single interface
     • Diagram: several DAS servers feeding one viewer
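The "automatic integration, single interface" point can be sketched in a few lines of Python. The example below assumes features from each source have already been fetched and parsed into a common record. The `Feature` class, the `integrate` function, and all the data are hypothetical, chosen only to show how one viewer could merge several sources into a single coordinate-sorted display.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    source: str  # which DAS source supplied the feature
    ref: str     # reference sequence, e.g. a chromosome name
    start: int
    end: int
    label: str


def integrate(tracks, ref, start, end):
    """Collect features overlapping ref:start-end from every source, sorted by position."""
    hits = [
        f
        for feats in tracks.values()
        for f in feats
        if f.ref == ref and f.start <= end and f.end >= start
    ]
    return sorted(hits, key=lambda f: (f.start, f.end, f.source))


# Illustrative data only; real sources would be queried over HTTP.
TRACKS = {
    "ensembl": [
        Feature("ensembl", "1", 1200, 1800, "gene model"),
        Feature("ensembl", "2", 500, 900, "gene model on another chromosome"),
    ],
    "vega": [Feature("vega", "1", 1150, 1850, "curated gene")],
    "user": [
        Feature("user", "1", 1500, 1510, "uploaded marker"),
        Feature("user", "1", 5000, 5100, "outside the view"),
    ],
}

view = integrate(TRACKS, "1", 1000, 2000)
for f in view:
    print(f"{f.ref}:{f.start}-{f.end}\t{f.source}\t{f.label}")
```

Adding another source means adding another entry to `TRACKS`; the viewer code does not change, which is the contrast with the link-based web model.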

  21. Distributed Annotation System
     • xml client/server specification (http://biodas.org/)
       • Lincoln Stein, Sean Eddy, Robin Dowell and LaDeana Hillier
       • acedb based prototype server
       • Java based prototype client
       • Dowell, R.D., Jokerst, R.M., Day, A., Eddy, S.R. & Stein, L. (2001) BioMedCentral Bioinformatics 2.
     • Ensembl (http://www.ensembl.org/das/)
       • das mailing list
       • server/client combination available (alpha release)
       • Based on BioJava, with BioJava viewer
       • Interface to Apollo, as an alternative viewer

  22. External data from DAS sources Data integration with Ensembl User data (Upload from flat file) NCBI data (DAS server)

  23. All data from DAS sources Virtual data integration User data Vega genes Ensembl

  24. DAS like model applied to other data types
     • features on a linear sequence
       • DNA, protein sequences, protein structures
       • Campus wide MRC ‘grid’ protein family integration project (SCOP, CATH, Pfam, InterPro, MSD) will develop DAS for protein structures.
     • annotation connected to stable identifiers
       • References, experimental observations
       • Sanger note book, attached to genes
     • group relationships between identifiers
       • protein-protein interactions; protein families, orthologues
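The two non-positional cases above (annotation attached to a stable identifier, and group relationships between identifiers) can be pictured with a small in-memory Python sketch. Everything here is hypothetical: the `IdentifierStore` class, its methods, and placeholder identifiers such as `GENE_A` are invented for illustration and are not part of any DAS specification or library.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Note:
    identifier: str  # stable identifier the note is attached to
    text: str


@dataclass(frozen=True)
class Group:
    kind: str           # e.g. "orthologue" or "interaction"
    members: frozenset  # identifiers that belong together


class IdentifierStore:
    """Annotation keyed by identifier rather than by coordinate."""

    def __init__(self):
        self._notes = defaultdict(list)
        self._groups = []

    def annotate(self, identifier, text):
        self._notes[identifier].append(Note(identifier, text))

    def group(self, kind, *members):
        self._groups.append(Group(kind, frozenset(members)))

    def notes(self, identifier):
        return [n.text for n in self._notes.get(identifier, [])]

    def related(self, identifier, kind):
        """Other identifiers sharing a group of the given kind."""
        out = set()
        for g in self._groups:
            if g.kind == kind and identifier in g.members:
                out |= g.members - {identifier}
        return sorted(out)


store = IdentifierStore()
store.annotate("GENE_A", "notebook entry: knockout shows no phenotype")
store.group("orthologue", "GENE_A", "GENE_A_MOUSE", "GENE_A_ZFISH")
store.group("interaction", "GENE_A", "GENE_B")
```

A client asks about `GENE_A` directly, with no chromosome or coordinates involved, so the annotation survives when the underlying assembly changes.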

  25. Ensembl MySQL Database Ensembl WWW server Dazzle BioJava DAS server Upload to Sanger DAS server Setup local DAS server and load Data into it Dazzle BioJava DAS server Data from DAS servers integrated into web displays WWW browser Data mapped to Genome Sequence Sanger

  26. Ensembl MySQL Database Ensembl WWW server Dazzle BioJava DAS server Setup local DAS server and load Data into it Data from DAS servers integrated into web displays WWW browser Data mapped to Genome Sequence Sanger

  27. Ensembl MySQL Database Virtual server using Ensembl WWW code Dazzle BioJava DAS server Setup local DAS server and load Data into it Dazzle BioJava DAS server Data from DAS servers integrated into web displays CustomWWW views Data mapped to Genome Sequence Sanger

  28. DAS annotation From other research projects HumanENSGxxx MouseENSMUSGxxx Zebrafish Worm? Yeast? Orthologueview pages OTTOxxxxx1

  29. Identifier Synchronisation Server Server xml 2D Distributed Annotation External Contributors Database providers Server xml Viewer Users

  30. Component models
     • Do one thing, but do it well
     • Would rely on databases providing public APIs to components of their services
     • Interoperability: standardised return (e.g. XML) as well as standardised query interface
     • Example: OpenDoc
       • Apple attempt to split desktop applications into components, which users would mix and match. Would have allowed competition at component level. Failed. (Microsoft? Poor implementation?)

  31. Database apoptosis
     • Software developers think nothing of rewriting software and throwing the old version away
     • More features, more complexity, more confusing (different, incompatible ways of getting same or worse result)
     • Retire feature if another database does it better and it can be used as a component?

  32. Solution 3: integrate using DAS
     • Many Ensembl web views are DAS clients
     • Whole of Ensembl is a DAS server (from release 38)
     • Ensembl site integrated with other DAS clients (e.g. SPICE for protein structure)

  33. Integration using DAS
     • Whole of Ensembl is a DAS server (from release 38)
     • Viewing Ensembl annotation on PDB
     • SPICE DAS client linked to contigview
