GadFly




  1. GadFly Building a Genome Annotation Database

  2. What is it? • SQL Database • Perl Objects • Perl API • Client Applications • Analysis Pipeline

  3. History • BFD • Celera Annotation Jamboree • GAME XML (Suzi, Erwin) • Ensembl • BioPerl • GO

  4. Data Stored • Sequence • Analyses (genomic, cDNA, peptide) • Genome Annotations • Gene Ontology

  5. Architecture

  6. SQL Database Design • Generic Modeling • Simplicity • Abstraction • Extensibility/Evolvability • Heavily Normalised

  7. Diversion: RDBs

  8. Normalization

  9. GadFly Tables: seq_feature

  10. GadFly Tables: seq

  11. GadFly Tables: seq(2)

  12. Location Graphs

  13. Location Graphs (II) • Locations are transitive relationships • Locations can be transformed • e.g. gene loc in arm or contig coordinates • Linear transforms: a simple function • Nonlinear transforms: more difficult • e.g. if the seq_feature to seq relationship involves splicing or translation
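
The linear case on this slide can be sketched in a few lines. This is a minimal illustration in Python, not GadFly code: the `Placement` class, the `to_parent` function, and the 0-based half-open coordinate convention are all assumptions made for the example. It maps a gene interval from contig coordinates to arm coordinates, and applying it twice shows the transitive step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    """Hypothetical record of where a child sequence sits on its parent."""
    offset: int  # parent coordinate of the child's leftmost base
    length: int  # length of the child sequence
    strand: int  # +1 if the child runs with the parent, -1 if reversed


def to_parent(start, end, placement):
    """Map the half-open interval [start, end) on the child onto the parent."""
    if not 0 <= start <= end <= placement.length:
        raise ValueError("interval lies outside the child sequence")
    if placement.strand == 1:
        return placement.offset + start, placement.offset + end
    # Reverse strand: child base i lands on offset + length - 1 - i,
    # so the interval flips around the child's midpoint.
    return (placement.offset + placement.length - end,
            placement.offset + placement.length - start)


# A gene at [100, 400) on a contig, with the contig placed on an arm.
contig_on_arm = Placement(offset=5000, length=1000, strand=1)
gene_on_arm = to_parent(100, 400, contig_on_arm)  # (5100, 5400)

# Transitivity: place the arm on a larger sequence and map again.
arm_on_chromosome = Placement(offset=1_000_000, length=20_000, strand=1)
gene_on_chromosome = to_parent(*gene_on_arm, arm_on_chromosome)
```

A spliced or translated relationship cannot be written as a single offset and strand like this, which is why the slide treats the nonlinear case separately.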

  14. Extensible Properties

  15. Composition Graphs

  16. Expect the unexpected

  17. Minimal Graphs • Not necessary to store everything • e.g. Exon => Intron • e.g. Gene to translation implied • Arcs implied from spatial relationships • Some redundancy useful • Flexibility essential • Sets vs lists
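
The "Exon => Intron" bullet says introns need not be stored because they follow from exon positions. A minimal sketch of that derivation, assuming exons of one transcript are given as 0-based half-open `(start, end)` pairs on the same sequence; the function name `implied_introns` is hypothetical:

```python
def implied_introns(exons):
    """Return the gaps between consecutive exons of a single transcript.

    exons: iterable of (start, end) half-open intervals on one sequence.
    Adjacent or overlapping exons imply no intron between them.
    """
    ordered = sorted(exons)
    introns = []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start > prev_end:
            introns.append((prev_end, next_start))
    return introns


# Three stored exons imply two introns that never need their own rows.
exons = [(300, 450), (100, 200), (500, 600)]
introns = implied_introns(exons)  # [(200, 300), (450, 500)]
```

Storing only the exons keeps the graph minimal; the trade-off noted on the slide is that some redundancy can still be useful when the derived features are queried often.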

  18. seq_feature relationships

  19. Unlimited Possibilities • Evidence networks • TFs + binding sites • Intersection graphs • precompute cytology • insertions + gene features • Yeast 2 hybrid / P-P interactions • Similarity Graphs

  20. Similarity Results: Pairs

  21. Structured Controlled Vocabularies

  22. Strawman alternative to generic modeling

  23. GadFly Object Model • Objects: • in-memory representation • Inheritance • Gene is a kind of SeqFeature • Interfaces • bioperl/gbrowse • Methods and attributes • e.g. length(), get_seq(), start(), etc
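
To make the inheritance bullet concrete, here is a small sketch of "Gene is a kind of SeqFeature". It is written in Python for illustration only and does not reproduce the GadFly Perl classes: the constructor arguments, the `symbol` attribute, and the 0-based half-open coordinates are assumptions; only the method names `length()`, `get_seq()` and `start()` come from the slide.

```python
# Translation table used to complement DNA bases.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


class SeqFeature:
    """In-memory feature located on a sequence (0-based, half-open)."""

    def __init__(self, residues, start, end, strand=1):
        self._residues = residues  # sequence the feature is located on
        self._start = start
        self._end = end
        self._strand = strand      # +1 forward, -1 reverse

    def start(self):
        return self._start

    def length(self):
        return self._end - self._start

    def get_seq(self):
        """Return the feature's bases, reverse-complemented on strand -1."""
        sub = self._residues[self._start:self._end]
        if self._strand == -1:
            return sub.translate(_COMPLEMENT)[::-1]
        return sub


class Gene(SeqFeature):
    """A Gene is a kind of SeqFeature with an extra, hypothetical symbol."""

    def __init__(self, symbol, residues, start, end, strand=1):
        super().__init__(residues, start, end, strand)
        self.symbol = symbol


gene = Gene("demo-gene", "AACCGGTTAC", start=0, end=4, strand=-1)
```

Because `Gene` inherits from `SeqFeature`, `gene.length()` and `gene.get_seq()` work without being redefined, which is the point of the inheritance bullet.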

  24. SeqFeature Class

  25. Inheritance

  26. GadFly Perl API • How do we get/put objects? • Application Programmer Interface • Means of making requests about objects • Fetching Objects • database, file, XML, GFF • Putting Objects • database, file, XML, GFF, text, HTML • Adapters

  27. API Requests • fetch all Genes that are transcription factors on 2L • write an annotated sequence to XML • fetch all the blastp results against human • find all sim4 hits to SD ESTs in the first megabase of 2L
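
The first request, "fetch all Genes that are transcription factors on 2L", can be sketched against a toy database. Everything below is hypothetical: the two-table schema, the column names, the placeholder gene names, and the function `fetch_genes_by_term` are invented for the example and are not the GadFly schema or API. It uses Python's standard `sqlite3` module.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical two-table layout."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE seq_feature (
            id INTEGER PRIMARY KEY, type TEXT, name TEXT,
            arm TEXT, fmin INTEGER, fmax INTEGER);
        CREATE TABLE feature_term (
            feature_id INTEGER REFERENCES seq_feature(id), term TEXT);
    """)
    db.executemany(
        "INSERT INTO seq_feature VALUES (?, ?, ?, ?, ?, ?)",
        [(1, "gene", "geneA", "2L", 1000, 3000),
         (2, "gene", "geneB", "2L", 5000, 7000),
         (3, "gene", "geneC", "2L", 9000, 9500),
         (4, "gene", "geneD", "3R", 200, 900)])
    db.executemany(
        "INSERT INTO feature_term VALUES (?, ?)",
        [(1, "transcription factor"), (2, "kinase"),
         (3, "transcription factor"), (4, "transcription factor")])
    return db


def fetch_genes_by_term(db, arm, term):
    """Return names of genes on `arm` annotated with `term`, left to right."""
    rows = db.execute(
        """SELECT f.name FROM seq_feature AS f
           JOIN feature_term AS t ON t.feature_id = f.id
           WHERE f.type = 'gene' AND f.arm = ? AND t.term = ?
           ORDER BY f.fmin""",
        (arm, term))
    return [name for (name,) in rows]


db = build_demo_db()
tf_genes = fetch_genes_by_term(db, "2L", "transcription factor")
```

A positional request such as "in the first megabase of 2L" would add a range condition on the coordinate columns to the same query.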

  28. Transcription Factors

  29. Fetching Blastp ResultSets

  30. Fetching a sequence

  31. Adapters • Objects are datasource-ignorant • Different In/Out adapters have different properties • No constraint on the number of database adapters • GadFly db: GxAdapters
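
The idea that objects are datasource-ignorant can be sketched as two interchangeable adapters behind the same pair of methods. This is an illustrative Python sketch, not the GxAdapter code: the `Feature` class, the `store`/`fetch_all` method names, and the simplified nine-column GFF-style line (with unused columns written as ".") are assumptions.

```python
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """Plain in-memory object that knows nothing about where it is stored."""
    seq_id: str
    kind: str
    start: int   # 1-based inclusive, as in GFF
    end: int
    strand: str  # "+" or "-"


class MemoryAdapter:
    """Keeps features in a Python list."""

    def __init__(self):
        self._items = []

    def store(self, feature):
        self._items.append(feature)

    def fetch_all(self):
        return list(self._items)


class GffTextAdapter:
    """Reads and writes tab-separated GFF-style lines on a text handle."""

    def __init__(self, handle):
        self._handle = handle

    def store(self, feature):
        self._handle.seek(0, io.SEEK_END)  # always append
        cols = [feature.seq_id, ".", feature.kind, str(feature.start),
                str(feature.end), ".", feature.strand, ".", "."]
        self._handle.write("\t".join(cols) + "\n")

    def fetch_all(self):
        self._handle.seek(0)
        features = []
        for line in self._handle:
            if not line.strip() or line.startswith("#"):
                continue
            c = line.rstrip("\n").split("\t")
            features.append(Feature(c[0], c[2], int(c[3]), int(c[4]), c[6]))
        return features


def copy_features(source, target):
    """Move every feature from one adapter to another."""
    for feature in source.fetch_all():
        target.store(feature)


memory = MemoryAdapter()
memory.store(Feature("2L", "gene", 1001, 3000, "+"))
memory.store(Feature("2L", "exon", 1001, 1200, "+"))
gff = GffTextAdapter(io.StringIO())
copy_features(memory, gff)
```

`copy_features` never inspects which adapter it was given, so adding another datasource means writing one more class with the same two methods.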

  32. Client Applications • flyshell • Web/CGI interface • multitude of scripts • pipeline • Apollo (kind of)

  33. Future • intelligent denormalisation • Ontologies • GMOD • pan-flybase database (with Dave) • Data • other species, comparative, expression, proteomic • UI

  34. Discussion • Object Models - the way to go? • language lock-in • insulated from db • complexity • Utilise DBMS more? • postgres: views, procedures • Ontologies + graph based systems?

  35. Acknowledgements • BDGP • FlyBase • Ensembl - Ian, Ewan • WormBase - Lincoln Stein • In advance - • UC Davis • new folks
