
The GMOD Project

Lincoln Stein, Cold Spring Harbor Laboratory. Test subject: Michael Caudy, a Drosophila neurobiologist (proneural differentiation, Notch pathway, HLH transcriptional activators/repressors, achaete/scute complex) with no computer science training.


Presentation Transcript


  1. The GMOD Project Lincoln Stein Cold Spring Harbor Laboratory

  2. Test Subject: Michael Caudy • Drosophila neurobiologist • Proneural differentiation • Notch pathway • HLH transcriptional activators/repressors • achaete/scute complex • No computer science training • Took my “bioinformatics for biologists” course

  3. “Simple” Problem • Discover the transcription factor binding site code controlling proneural differentiation.

  4. Regular Expression Search • Using achaete promoter as exemplar, search for combinations of known binding sites in particular architectures
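The slide does not show the patterns that were actually searched for. As a minimal sketch of what "combinations of known binding sites in particular architectures" could look like as a regular expression, the Python below looks for one motif followed by a second within a bounded gap. Both motifs, the gap limits, and the function name `find_architectures` are illustrative assumptions, not taken from the talk.

```python
import re

# Illustrative motifs only (assumptions, not from the presentation).
E_BOX = "CA[ACGT][ACGT]TG"   # generic E-box consensus, CANNTG
SECOND_SITE = "CAC[ACGT]AG"  # placeholder for a second known site


def find_architectures(seq, first=E_BOX, second=SECOND_SITE,
                       min_gap=0, max_gap=20):
    """Find every `first` site followed by a `second` site with between
    min_gap and max_gap bases in between.

    Returns a list of (start, end, matched_text) tuples using 0-based,
    end-exclusive coordinates.
    """
    # The lookahead lets matches that overlap each other all be reported;
    # the lazy gap pairs each first site with its nearest second site.
    pattern = re.compile(
        f"(?=({first}[ACGT]{{{min_gap},{max_gap}}}?{second}))"
    )
    return [(m.start(1), m.end(1), m.group(1))
            for m in pattern.finditer(seq.upper())]


if __name__ == "__main__":
    promoter = "tttCAGCTGaaaaCACGAGttt"  # made-up sequence for illustration
    for hit in find_architectures(promoter):
        print(hit)
```

Tightening or loosening `max_gap` is the simplest way to express a different architecture; a real search would also need to scan the reverse strand.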

  5. Mike’s Got Lots of Data • 90-11,000 TF binding site clusters • 100s-1000s of genes • millions of interactions • Which genes are involved in neural differentiation? • Which have interactions with the pathway? • Which have suggestive mutant phenotypes?

  6. Mike Needs a Database • Database management system for proneural differentiation genes. • Visualization/exploration tools for relationship of genes to putative TF clusters. • Literature citations • Link out to FlyBase, Genbank & other DBs. • Add notes and other annotations.

  7. Try to do it with Filemaker • “Cluster-centric” vs “gene-centric”? • Data import from FlyBase? • Storing images? • Maintaining relationships between genes & clusters? • Updates?

  8. Mike Needs a MOD • Model Organism Database • Repository for reagents • Stocks, vectors, clones • Genetic & physical maps • Large-scale data sets • Genome • EST sets, microarray results, two-hybrid interactions • Literature • Ontologies & Nomenclature • Meetings, announcements

  9. Example MOD: WormBase

  10. Looking for Sex

  11. An Author Entry

  12. Bibliography

  13. Citation

  14. Gene

  15. Genome

  16. Proteome

  17. Comparative Genomics

  18. Functional Genomics

  19. Anatomy

  20. How WormBase Works • Diagram labels: You • Web server • Perl scripts • Images, Movies • Database access library • ACeDB, MySQL • Genomic Data

  21. Can Mike reuse WormBase to manage his data? No!

  22. Sorry Mike • WormBase website difficult to install • Data model nematode-centric • Data entry tools very process-specific • Customization difficult • Software documentation uneven • Standard operating procedure documentation uneven

  23. MOD Redux • SGD, MGD, FlyBase, TAIR, RGD… • The same basic idea as WormBase • Implementation entirely different • Wheel reinvented many times • Little software sharing • This madness must stop!

  24. The GMOD Project • Portable, open source software to support model organism databases • Multiple MODs involved • Worm, fly, yeast, mouse, arabidopsis, rat, monocot, [fugu], [E. coli] • Funded by NIH as of June 2002 • Programmers, coordinator, quarterly meetings • http://www.gmod.org

  25. GMOD Home Page

  26. The GMOD Pyramid • Modular Applications • Modular Schema • Open Source DBMS & Middleware

  27. A MOD Construction Set • Application Layer: map browser, map editor, annotation pipeline, genome browser, genome editor, citation browser, citation editor • Middleware Layer: Bioperl, BioJava, BioPython • Database Layer: genomes, maps, citations • Data: genetic maps, literature, genome

  28. Chado – Modular Schema • Common schema for use by FlyBase and WormBase • Ontology Driven • Small number of generic tables e.g. “feature” • Controlled vocabulary names object types and relationships among them: • “achaete protein is a HLH activator” • “m8 protein inhibits achaete transcription” • Evidence-Savvy
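To make the "small number of generic tables plus a controlled vocabulary" idea concrete, here is a minimal sketch using Python's built-in `sqlite3`. The three tables are loosely modelled on the idea described in the slide and are heavily simplified; they are not the actual Chado DDL. The vocabulary terms, the "achaete gene" feature, and the helper names `build_demo_db`, `feature_type`, and `describe` are assumptions made for illustration.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a generic feature table whose
    types and relationships are named by controlled-vocabulary terms."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE cvterm (
            cvterm_id INTEGER PRIMARY KEY,
            name      TEXT UNIQUE NOT NULL
        );
        CREATE TABLE feature (
            feature_id INTEGER PRIMARY KEY,
            name       TEXT NOT NULL,
            type_id    INTEGER NOT NULL REFERENCES cvterm(cvterm_id)
        );
        CREATE TABLE feature_relationship (
            subject_id INTEGER NOT NULL REFERENCES feature(feature_id),
            type_id    INTEGER NOT NULL REFERENCES cvterm(cvterm_id),
            object_id  INTEGER NOT NULL REFERENCES feature(feature_id)
        );
    """)

    # Hypothetical vocabulary terms chosen to mirror the slide's examples.
    terms = ["HLH activator", "protein", "gene", "inhibits transcription of"]
    conn.executemany("INSERT INTO cvterm (name) VALUES (?)",
                     [(t,) for t in terms])

    def term(name):
        row = conn.execute(
            "SELECT cvterm_id FROM cvterm WHERE name = ?", (name,)).fetchone()
        return row[0]

    def add_feature(name, type_name):
        cur = conn.execute(
            "INSERT INTO feature (name, type_id) VALUES (?, ?)",
            (name, term(type_name)))
        return cur.lastrowid

    # "achaete protein is a HLH activator": the type is just a vocabulary term.
    add_feature("achaete protein", "HLH activator")
    m8 = add_feature("m8 protein", "protein")
    achaete_gene = add_feature("achaete gene", "gene")

    # "m8 protein inhibits achaete transcription": a typed relationship row.
    conn.execute(
        "INSERT INTO feature_relationship (subject_id, type_id, object_id) "
        "VALUES (?, ?, ?)",
        (m8, term("inhibits transcription of"), achaete_gene))
    conn.commit()
    return conn


def feature_type(conn, feature_name):
    """Return the vocabulary term naming a feature's type."""
    row = conn.execute(
        "SELECT t.name FROM feature f "
        "JOIN cvterm t ON t.cvterm_id = f.type_id WHERE f.name = ?",
        (feature_name,)).fetchone()
    return row[0]


def describe(conn):
    """Render each relationship row as a 'subject relation object' string."""
    rows = conn.execute("""
        SELECT s.name, t.name, o.name
        FROM feature_relationship r
        JOIN feature s ON s.feature_id = r.subject_id
        JOIN cvterm  t ON t.cvterm_id  = r.type_id
        JOIN feature o ON o.feature_id = r.object_id
    """).fetchall()
    return [" ".join(row) for row in rows]
```

The point of the design is that adding a new kind of object or relationship means adding a vocabulary row, not a new table.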

  29. GMOD Applications • Apollo genome annotation editor • GBrowse generic genome browser • PubSearch literature curation editor • CMap comparative map browser • IMD insertional mutagenesis database management system

  30. Apollo – BDGP & Sanger Center

  31. Apollo Data adapters • Parser -> data models -> display • Existing data adapters • GAME XML • GFF • Ensembl CGI server • DAS • Write your own data adapter! • Extend AbstractDataAdapter class • Display options defined in config file

  32. Who is Using Apollo? • BDGP • Reannotated Drosophila genome • Bristol-Myers Squibb • Launching Apollo from web browser via mime types • GNF • JDBC adapter layer over BioSQL • Biogen • View human genome alignment between public and Biogen internal database • Connected BLAT pipeline to Apollo • HGMP-RC Fugu Genomics group • Displaying annotations on fugu scaffolds

  33. PubSearch – TAIR & RatDB

  34. PubSearch – Gene Association

  35. IMD – Insertional Mutagenesis Db

  36. CMap – Gramene

  37. CMap – Detailed View

  38. GBrowse – WormBase

  39. GBrowse – Zoomed in

  40. GBrowse – Zoomed Way In

  41. GBrowse – Zoomed Way Way In

  42. GBrowse – Keyword Search

  43. GBrowse – Third Party Annotations

  44. Sequence dumps & other reports

  45. Extensively Customizable • End-user • Turn tracks on and off, change order, change packing & labeling attributes (stored in cookie) • Data provider • Change fonts, colors, text. • Change overview – genetic map, contigs, coverage, karyotype. • Define new tracks using simple config file. • Tinker with track appearance to heart's content.

  46. Adding a New Track

      (a) Create a GFF file named “deletions.gff”

          Chr1  targeted  deletion  1293224  1294901  .  .  .  Deletion d101k2
          Chr1  targeted  deletion  8239811  8241116  .  .  .  Deletion d680k2
          Chr2  targeted  deletion  5866382  5866500  .  .  .  Deletion d007k2

      (b) Run the load_gff.pl script

          > load_gff.pl -d example_database deletions.gff
          Loading features… Done. 3 features loaded.

      (c) Add a new track “stanza” to the gbrowse configuration file

          [Knockout]
          feature  = deletion
          glyph    = span
          fgcolor  = red
          key      = Knockouts
          link     = http://example.org/cgi-bin/knockout_details?$name
          citation = These are deletion knockouts produced by the example knockout consortium (http://example.org/knockouts.html)
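To show how the pieces of the slide above relate, here is a small Python sketch that parses one line of the nine-column GFF shown in step (a) and fills in the `$name` placeholder from the stanza's `link` option. It is a rough imitation for illustration, not GBrowse's or BioPerl's own code; the `Feature` class and the function names are hypothetical, and it assumes the columns are tab-separated (needed because the last column contains a space).

```python
from dataclasses import dataclass


@dataclass
class Feature:
    ref: str          # reference sequence, e.g. "Chr1"
    source: str       # column 2, e.g. "targeted"
    method: str       # column 3, matched by "feature = deletion" in the stanza
    start: int
    end: int
    group_class: str  # first word of column 9, e.g. "Deletion"
    name: str         # second word of column 9, e.g. "d101k2"


def parse_gff_line(line):
    """Parse one tab-separated, nine-column GFF line into a Feature."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
    ref, source, method, start, end, _score, _strand, _phase, group = cols
    group_class, name = group.split(None, 1)
    return Feature(ref, source, method, int(start), int(end),
                   group_class, name)


def expand_link(template, feature):
    """Substitute the feature name for $name in a link template."""
    return template.replace("$name", feature.name)


if __name__ == "__main__":
    line = "Chr1\ttargeted\tdeletion\t1293224\t1294901\t.\t.\t.\tDeletion d101k2"
    feature = parse_gff_line(line)
    print(feature)
    print(expand_link(
        "http://example.org/cgi-bin/knockout_details?$name", feature))
```

Only `$name` is handled here; any other placeholders the real configuration format supports are out of scope for this sketch.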

  47. Extensively Extensible • Diagram labels: Apache Web Server • gbrowse CGI script, Plugins • Bio::Graphics library, Glyphs • BioPerl library • Adaptors: Oracle adaptor, Flat File adaptor, Bio::DB::GFF adaptor, Chado adaptor • Storage: Oracle, MySQL/Postgres, Flat Files

  48. GBrowse on GenBank? GBrowse on GenBank! • Diagram labels: Apache Web Server • gbrowse CGI script, Plugins • Bio::Graphics library, Glyphs • BioPerl library • Adaptors: GenBank Proxy Adaptor, Bio::DB::GFF adaptor • Storage: GenBank, MySQL

  49. B. burgdorferi via GenBank proxy

  50. Who is Using GBrowse? • GMOD Members • WormBase, FlyBase, RatDB • HGMP-RC Fugu genomics group • KEGG (multiple microorganisms) • Ingenium AG (mouse) • Bristol-Myers Squibb (Drosophila) • Texas A&M University (Salmonella) • McGill University (human chr7) • Institute for Systems Biology (human)
