DAS developer workshop

Tim Hubbard

[email protected]

26th February 2007

Wellcome Trust Sanger Institute


Distributed Annotation System

  • Origins:

    • xml client/server specification (http://biodas.org/)

    • Lincoln Stein, Sean Eddy, Robin Dowell and LaDeana Hillier

    • acedb based prototype server

    • Java based prototype client

    • Dowell, R.D., Jokerst, R.M., Day, A., Eddy, S.R. & Stein, L. (2001) BioMedCentral Bioinformatics 2.

  • Genome campus adoption

    • Initially via Ensembl becoming a DAS client (now also a DAS server)

    • Code: Dazzle and Proserver servers; Bio::DasLite and BioJava client libraries

    • Hosts DAS registry


DAS in a nutshell

  • Standardized set of web services

    • Reference servers (the sequence)

    • Annotation servers (features: chr:start-end)

    • Alignment servers (chr:start-end matches chr:start-end)

    • Identifier based servers (ref item X rather than coordinate)

  • Standardization allows clients to connect to different DAS sources without additional programming
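
As a rough illustration of what "connecting without additional programming" could look like from the client side, the sketch below builds a DAS/1-style `features` request for a chr:start-end segment and parses a DASGFF-style reply. The server URL, the source name `demo`, and the sample XML are made-up placeholders, and only a handful of the elements of a real response are shown; the specification at http://biodas.org/ remains the authority on the actual format.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote


def features_url(server, source, ref, start, end):
    """Build a DAS/1-style 'features' request for one segment (ref:start,end)."""
    segment = f"{ref}:{start},{end}"
    base = server.rstrip("/")
    return f"{base}/das/{quote(source)}/features?segment={quote(segment, safe=':,')}"


def parse_features(xml_text):
    """Extract a minimal view of each FEATURE in a DASGFF-style document."""
    root = ET.fromstring(xml_text)
    features = []
    for seg in root.iter("SEGMENT"):
        for feat in seg.iter("FEATURE"):
            features.append({
                "segment": seg.get("id"),
                "id": feat.get("id"),
                "type": (feat.findtext("TYPE") or "").strip(),
                "start": int(feat.findtext("START")),
                "end": int(feat.findtext("END")),
            })
    return features


# Hypothetical response, trimmed to the elements parsed above.
SAMPLE = """<DASGFF>
  <GFF version="1.0" href="http://das.example.org/das/demo/features">
    <SEGMENT id="1" start="1000" stop="2000">
      <FEATURE id="feat1" label="demo exon">
        <TYPE id="exon">exon</TYPE>
        <START>1100</START>
        <END>1250</END>
      </FEATURE>
    </SEGMENT>
  </GFF>
</DASGFF>"""

if __name__ == "__main__":
    print(features_url("http://das.example.org", "demo", "1", 1000, 2000))
    print(parse_features(SAMPLE))
```

Because every annotation server answers the same request shape, the same two functions would, under these assumptions, work against any source by changing only the `server` and `source` arguments.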


Data integration

  • Complete genomes provide the framework to pull all biological data together such that each piece says something about biology as a whole

  • Biology is too complex for any organisation to have a monopoly of ideas or data

  • The more organisations provide data or analysis separately, the harder it becomes for anyone to make use of the results


Utility of bioinformatics

[Figure labels: Scientific impact; Too little bioinformatics; Too many databases; Too diverse interfaces]


Split data and presentation

  • Databases responsible for curating data and serving it as primitive datatypes defined by open standards (high cost)

  • Different front ends, or components of front ends, compete for users (each is cheap to develop), cf. browsers.


Data services


Servers / Campus DAS systems / Clients

[Diagram labels: e! contigview; epigenome; Apollo; 3D structure; Genome Coordinates; Dazzle; CDS Coordinates; Sources; Ensembl; Pfam; Swissprot; PubMed; Proserver; e! geneview; Protein Coordinates; LDAS; otterlace; Stable Identifiers; Pfam; Sequence Alignments; Registry]


DAS infrastructure status

  • Lots of progress

    • Servers: Dazzle, Proserver, Bio::DasLite

    • Clients: Ensembl, Vega, Dasty, SPICE, Pfam, Jalview, Pepper, IGB

    • >200 sources in DAS registry (http://www.dasregistry.org/)

    • Broadly adopted by Ensembl, biosapiens, efamily, ZF-models, eProtein

  • Lots still to do…

    • Slow adoption rate, particularly in US: upload still easier than distributed…

    • Lack of searching, write back: slow development of DAS2

    • Encourage/facilitate programming against DAS servers

  • Opportunities

    • Source ranking, credit, social networking

    • Inter-client communications protocol

    • Async delivery/caching; servers built on servers/workflows

    • Alternative entry points from servers? Next left/right? Date of addition?


Modern day maps: topography…


… plus annotation


New synteny aware vertebrate curation environment based on rewrite of acedb (zmap)


Servers of data derived from other servers

[Diagram labels: Consensus Annotation; Assembly; DAS viewer; Annotation]


Servers of data derived from other servers: tracing back evidence

[Diagram labels: Consensus Annotation; Assembly; DAS viewer; Annotation]


Acknowledgements

Ewan Birney

Tony Cox

Thomas Down

Rob Finn

Stefan Graf

David Jackson

Andreas Kahari

Eugene Kulesha

Roger Pettett

Matt Pocock

Andreas Prlic

James Smith

Jim Stalker

Ensembl/Sanger Web team

efamily, biosapiens, eProtein

Zebrafish analysis (ZF-models)

Anacode/Acedb (otterlace/Zmap)


Distributed Annotation

[Diagram labels: Coordinate Synchronisation; Server (four instances); Sequence; Programs; Annotation; Viewer; External Contributors; Database providers; Users; html; xml]

Hubbard & Birney, Open annotation offers a democratic solution to genome sequencing (2000) Nature, 403, 825.


BioJava DAS implementation

[Diagram labels: WWW browser; Ensembl MySQL Database; Ensembl WWW server; http; BioJava DAS viewer; Data Adaptor; Dazzle BioJava DAS server; XFF; BioJava DAS client library; DASGFF (http); Apollo viewer/editor; AceDB GFF files; Local GFF files]


Data from DAS servers integrated into web displays

[Diagram labels: WWW browser; Ensembl MySQL Database; Ensembl WWW server; http; BioJava DAS client library; Data Adaptor; Dazzle BioJava DAS server; AceDB GFF files]


DAS v Web

  • Web model: different web sites, different interfaces, no integration (links)

  • DAS model: different DAS sites, automatic integration, single interface

[Diagram labels: DAS Server (three instances); Viewer]
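
To make the "automatic integration, single interface" idea concrete, here is a minimal sketch of what a viewer might do once it has fetched features from several DAS sources: filter them to the region on display and merge them into one position-sorted list. The `Feature` type, the `integrate` function, and the track contents are hypothetical names chosen for illustration, not part of any DAS library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    source: str  # name of the DAS source the feature came from
    start: int
    end: int
    label: str


def integrate(tracks, start, end):
    """Merge features from every source that overlap [start, end].

    `tracks` maps a source name to a list of (start, end, label) tuples,
    standing in for whatever each DAS server returned.
    """
    merged = [
        Feature(source, f_start, f_end, label)
        for source, feats in tracks.items()
        for (f_start, f_end, label) in feats
        if f_start <= end and f_end >= start  # closed-interval overlap test
    ]
    return sorted(merged, key=lambda f: (f.start, f.end, f.source))


if __name__ == "__main__":
    tracks = {
        "genes": [(100, 500, "geneA"), (5000, 6000, "geneB")],
        "repeats": [(50, 120, "rep1")],
    }
    for feature in integrate(tracks, 1, 1000):
        print(feature)
```

The viewer code never needs to know which site a feature came from in order to draw it, which is the contrast with the web model of separate sites joined only by links.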


Distributed Annotation System

  • xml client/server specification (http://biodas.org/)

    • Lincoln Stein, Sean Eddy, Robin Dowell and LaDeana Hillier

    • acedb based prototype server

    • Java based prototype client

    • Dowell, R.D., Jokerst, R.M., Day, A., Eddy, S.R. & Stein, L. (2001) BioMedCentral Bioinformatics 2.

  • Ensembl (http://www.ensembl.org/das/)

    • das mailing list

    • server/client combination available (alpha release)

      • Based on BioJava, with BioJava viewer

      • Interface to Apollo, as an alternative viewer


Data integration with Ensembl

[Diagram labels: External data from DAS sources; User data (upload from flat file); NCBI data (DAS server)]


Virtual data integration

[Diagram labels: All data from DAS sources; User data; Vega genes; Ensembl]


DAS like model applied to other data types

  • features on a linear sequence

    • DNA, protein sequences, protein structures

    • Campus wide MRC ‘grid’ protein family integration project (SCOP, CATH, Pfam, InterPro, MSD) will develop DAS for protein structures.

  • annotation connected to stable identifiers

    • References, experimental observations

    • Sanger note book, attached to genes

  • group relationships between identifiers

    • protein-protein interactions; protein families, orthologues
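
The three kinds of annotation listed above can be sketched as simple data types: positional features on a linear sequence, notes attached to a stable identifier, and groups relating identifiers to one another. This is only an illustrative model under assumed names (`PositionalFeature`, `IdentifierAnnotation`, `Group`, `related`) and invented identifiers; it does not reflect any actual DAS schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PositionalFeature:
    reference: str  # a DNA sequence, protein sequence, or protein structure
    start: int
    end: int
    label: str


@dataclass(frozen=True)
class IdentifierAnnotation:
    identifier: str  # stable identifier the note is attached to
    note: str        # e.g. a reference or an experimental observation


@dataclass(frozen=True)
class Group:
    kind: str           # e.g. "interaction", "family", "orthologues"
    members: frozenset  # stable identifiers that belong together


def related(identifier, groups):
    """Map each group kind to the other identifiers grouped with `identifier`."""
    found = {}
    for group in groups:
        if identifier in group.members:
            found.setdefault(group.kind, set()).update(group.members - {identifier})
    return found


if __name__ == "__main__":
    groups = [
        Group("orthologues", frozenset({"GENE_H", "GENE_M"})),
        Group("interaction", frozenset({"GENE_H", "GENE_X"})),
    ]
    print(related("GENE_H", groups))
```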


Data from DAS servers integrated into web displays

[Diagram labels: WWW browser; Ensembl WWW server; Ensembl MySQL Database; Dazzle BioJava DAS server (two instances); Upload to Sanger DAS server; Setup local DAS server and load data into it; Data mapped to Genome Sequence; Sanger]


Data from DAS servers integrated into web displays

[Diagram labels: WWW browser; Ensembl WWW server; Ensembl MySQL Database; Dazzle BioJava DAS server; Setup local DAS server and load data into it; Data mapped to Genome Sequence; Sanger]


Data from DAS servers integrated into web displays

[Diagram labels: Custom WWW views; Virtual server using Ensembl WWW code; Ensembl MySQL Database; Dazzle BioJava DAS server (two instances); Setup local DAS server and load data into it; Data mapped to Genome Sequence; Sanger]


Orthologueview pages

[Diagram labels: DAS annotation from other research projects; Human ENSGxxx; Mouse ENSMUSGxxx; Zebrafish; Worm? Yeast?; OTTOxxxxx1]


2D Distributed Annotation

[Diagram labels: Identifier Synchronisation; Server (three instances); Viewer; External Contributors; Database providers; Users; xml]


Component models

  • Do one thing, but do it well

  • Would rely on databases providing public APIs to components of their services

    • Interoperability: standardised return (e.g. XML) as well as standardised query interface

  • Example: OpenDoc

    • Apple's attempt to split desktop applications into components that users could mix and match. It would have allowed competition at the component level, but it failed. (Microsoft? Poor implementation?)


Database apoptosis

  • Software developers think nothing of rewriting software and throwing the old version away

  • More features, more complexity, more confusing (different, incompatible ways of getting same or worse result)

  • Retire feature if another database does it better and it can be used as a component?


Solution 3: integrate using DAS

  • Many Ensembl web views are DAS clients

  • Whole of Ensembl is a DAS server (from release 38)

  • Ensembl site integrated with other DAS clients (e.g. SPICE for protein structure)


Integration using DAS

  • Whole of Ensembl is a DAS server (from release 38)

  • Viewing Ensembl annotation on PDB

  • SPICE DAS client linked to contigview

