generic model/many/my organism database

GMOD: generic model/many/my organism database. Oct/Nov 2007. Don Gilbert, Genome Informatics Lab, Biology Dept., Indiana University, gilbertd@indiana.edu. Indiana GMOD Potpourri: recent updates for GMOD-CSHL-0711 (Genome Grid, GMODTools update, Gene Summary Pages in XML).


Presentation Transcript


  1. generic model/many/my organism database (GMOD), Oct/Nov 2007. Don Gilbert, Genome Informatics Lab, Biology Dept., Indiana University, gilbertd@indiana.edu

  2. Indiana GMOD Potpourri
     Recent updates for GMOD-CSHL-0711:
     • Genome Grid
     • GMODTools update
     • Gene Summary Pages in XML
     http://eugenes.org/gmod/docs/gmod-intro-07oct.pdf

  3. Genome Grid
     • Middleware to easily use TeraGrid (and other grids) for genome analyses
     • Give me your genomes to Gridalyze
     • Collaborators wanted!
     • Apply BioMart, Ergatis, LuceGene, Galaxy
     • Science gateway to use TeraGrid for genome analyses
       • BLAST: proteome x non-redundant; organisms x genome
       • gene finders, InterProScan, others
     gmod.org/Genome_grid
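Running "proteome x non-redundant" BLAST on a grid usually means cutting the query set into independent pieces, one per job. The sketch below shows one way to do that with round-robin splitting of FASTA records; the helper names (`read_fasta`, `split_round_robin`, `to_fasta`) are hypothetical and are not taken from Genome Grid, which the slides do not describe at this level.

```python
"""Split a FASTA proteome into N chunks so each chunk can run as a separate
BLAST job on a grid. Hypothetical helpers; not Genome Grid code."""

from typing import Iterable, Iterator


def read_fasta(text: str) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, seq_lines = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq_lines)
            header, seq_lines = line[1:], []
        else:
            seq_lines.append(line)
    if header is not None:
        yield header, "".join(seq_lines)


def split_round_robin(records: Iterable[tuple[str, str]], n_chunks: int) -> list[list[tuple[str, str]]]:
    """Deal records into n_chunks lists, one record at a time, to balance job sizes."""
    chunks: list[list[tuple[str, str]]] = [[] for _ in range(n_chunks)]
    for i, rec in enumerate(records):
        chunks[i % n_chunks].append(rec)
    return chunks


def to_fasta(records: Iterable[tuple[str, str]]) -> str:
    """Serialize (header, sequence) pairs back to FASTA, one sequence line each."""
    return "".join(f">{h}\n{s}\n" for h, s in records)


if __name__ == "__main__":
    proteome = ">p1\nMKT\nAYI\n>p2\nMSS\n>p3\nMAL\n"
    for i, chunk in enumerate(split_round_robin(read_fasta(proteome), 2)):
        print(f"# job {i}")
        print(to_fasta(chunk), end="")
```

Round-robin keeps chunk record counts within one of each other; balancing by total residue length would be a reasonable alternative when sequence lengths vary widely.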

  4. GMODTools update
     • Update: configs for new genome Chado DBs (sea urchin, Paramecium)
       • loaded via GMOD gff2chado
     • New: GO gene-association output
     • Please publish your Chado DB
       • gmod.org/Public_Chado_Databases
       • each project's Chado has variations
       • Cleans database contents for public use
     • To do: add gene page XML, others?
     gmod.org/GMODTools
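The "GO gene-association output" is a tab-delimited file, one GO annotation per line. As an illustration only, the sketch below writes GAF 1.0-style lines (15 columns) for the GO terms of the wFleaBase gene shown in the Gene Page XML slide. It is not GMODTools code; the reference string is a hypothetical placeholder, the `IEA` evidence code and `taxon:6669` (Daphnia pulex) are assumed values chosen for the example, and `split_goterm_label` assumes labels use the slide's `C:`/`F:`/`P:` aspect prefixes.

```python
"""Emit GAF 1.0-style GO gene-association lines (15 tab-separated columns).
Illustrative sketch; reference, evidence and taxon values are assumptions."""


def split_goterm_label(label: str) -> tuple[str, str]:
    """'C:integral to membrane' -> ('C', 'integral to membrane')."""
    aspect, _, name = label.partition(":")
    return aspect, name


def gaf_line(db, gene_id, symbol, go_id, aspect, reference, evidence,
             taxon, date, assigned_by, object_type="gene"):
    columns = [
        db,           # 1  DB
        gene_id,      # 2  DB_Object_ID
        symbol,       # 3  DB_Object_Symbol
        "",           # 4  Qualifier (empty)
        go_id,        # 5  GO ID
        reference,    # 6  DB:Reference
        evidence,     # 7  Evidence code
        "",           # 8  With/From (empty)
        aspect,       # 9  Aspect: P, F or C
        "",           # 10 DB_Object_Name (empty)
        "",           # 11 DB_Object_Synonym (empty)
        object_type,  # 12 DB_Object_Type
        taxon,        # 13 Taxon
        date,         # 14 Date, YYYYMMDD
        assigned_by,  # 15 Assigned_By
    ]
    return "\t".join(columns)


if __name__ == "__main__":
    goterms = {"GO:0016021": "C:integral to membrane",
               "GO:0007602": "P:phototransduction"}
    for go_id, label in goterms.items():
        aspect, _name = split_goterm_label(label)
        print(gaf_line("wFleaBase", "NCBI_GNO_200214", "NCBI_GNO_200214",
                       go_id, aspect,
                       reference="wFleaBase:example_ref",  # hypothetical
                       evidence="IEA",                      # assumed
                       taxon="taxon:6669",                  # Daphnia pulex
                       date="20070902", assigned_by="wFleaBase"))
```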

  5. Gene Summary Pages
     • Simple, readable XML summarizes gene info
     • In use at the Daphnia base (wFleaBase.org)
       • wfleabase.org/lucegene/lookup?id=NCBI_GNO_149114
     • Created from a Chado DB or overloaded GFF
     • Software is a simple Perl lib plus an XML DTD
       • eugenes.org/gmod/gene-report-examples/
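To make "created from overloaded GFF" concrete, here is a sketch that builds a minimal `GeneSummary` element from one GFF3 gene line, reading the standard `ID` and `Ontology_term` attributes from column 9. It uses the element names from the Gene Page XML slide, but it is written in Python rather than the Perl library mentioned above, and which attributes the real software reads is an assumption. The GFF line in the test (seqid `scaffold_1`, coordinates) is hypothetical.

```python
"""Build a minimal GeneSummary XML element from an 'overloaded' GFF3 line.
Sketch only; not the Perl gene-report library."""

import xml.etree.ElementTree as ET
from urllib.parse import unquote


def parse_attributes(col9: str) -> dict[str, list[str]]:
    """Parse GFF3 column 9 ('tag=v1,v2;tag2=v') into tag -> list of values.
    Split on ';' and ',' first, then percent-decode, so escaped separators survive."""
    attrs: dict[str, list[str]] = {}
    for pair in col9.strip().split(";"):
        if not pair:
            continue
        tag, _, value = pair.partition("=")
        attrs[unquote(tag)] = [unquote(v) for v in value.split(",")]
    return attrs


def gene_summary_from_gff(gff_line: str, species: str, db_prefix: str) -> ET.Element:
    cols = gff_line.rstrip("\n").split("\t")
    attrs = parse_attributes(cols[8])
    gene_id = attrs["ID"][0]
    root = ET.Element("GeneSummary", id=f"{db_prefix}:{gene_id}")
    ET.SubElement(root, "Type").text = "Gene Summary"
    basic = ET.SubElement(root, "BASIC_INFORMATION")
    ET.SubElement(basic, "GeneID").text = gene_id
    ET.SubElement(basic, "Species").text = species
    go_ids = attrs.get("Ontology_term", [])
    if go_ids:
        terms = ET.SubElement(ET.SubElement(root, "GENE_ONTOLOGY"), "terms")
        for go_id in go_ids:
            # GFF carries only the GO ID; a term label would come from an ontology lookup.
            ET.SubElement(terms, "goterm", id=go_id)
    return root


if __name__ == "__main__":
    line = ("scaffold_1\t.\tgene\t1000\t2500\t.\t+\t.\t"
            "ID=NCBI_GNO_200214;Ontology_term=GO:0016021,GO:0007602")
    root = gene_summary_from_gff(line, "Daphnia pulex", "wFleaBase")
    print(ET.tostring(root, encoding="unicode"))
```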

  6. Gene Page XML
     <GeneSummary id="wFleaBase:NCBI_GNO_200214">
       <Type>Gene Summary</Type>
       <BASIC_INFORMATION>
         <Date>2007-Sep-02</Date>
         <GeneID>NCBI_GNO_200214</GeneID>
         <Species>Daphnia pulex</Species>
       </BASIC_INFORMATION>
       <GENE_ONTOLOGY>
         <terms>
           <goterm id="GO:0016021">C:integral to membrane</goterm>
           <goterm id="GO:0001584">F:rhodopsin-like receptor activity</goterm>
           <goterm id="GO:0007186">P:G-protein coupled receptor protein signalin...</goterm>
           <goterm id="GO:0007602">P:phototransduction</goterm>
         </terms>
       </GENE_ONTOLOGY>
       <SIMILAR_GENES>
         <Similarity>
           <Description>Rh3-PA</Description>
           <Species>Drosophila virilis</Species>
           <db_xref>UniProt:Q8I138</db_xref>
         </Similarity>
       </SIMILAR_GENES>
       <FUNCTION>
         <Expression type="biotic">Bacterial infection</Expression>
         <Protein_domains>
           <db_xref>Pfam:PF00001 7tm_1</db_xref>
         </Protein_domains>
       </FUNCTION>
       <REAGENTS>
         <Reagent type="EST">
           <db_xref>WFes0143594</db_xref>
         </Reagent>
       </REAGENTS>
     </GeneSummary>
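Because the gene page is plain XML, a consumer can read it with any standard XML parser. The sketch below uses Python's `xml.etree.ElementTree` on an abridged copy of the example above, splitting each GO label into its aspect prefix and name, and each `db_xref` into a database name and accession (loosely mirroring Chado's db/dbxref split). The `summarize` function and its output shape are hypothetical; values without a `DB:` prefix, such as the EST reagent ID, are returned with `None` as the database.

```python
"""Read a GeneSummary gene page and pull out GO terms and cross-references."""

import xml.etree.ElementTree as ET

# Abridged copy of the example gene page above.
GENE_SUMMARY = """\
<GeneSummary id="wFleaBase:NCBI_GNO_200214">
  <BASIC_INFORMATION>
    <GeneID>NCBI_GNO_200214</GeneID>
    <Species>Daphnia pulex</Species>
  </BASIC_INFORMATION>
  <GENE_ONTOLOGY>
    <terms>
      <goterm id="GO:0016021">C:integral to membrane</goterm>
      <goterm id="GO:0007602">P:phototransduction</goterm>
    </terms>
  </GENE_ONTOLOGY>
  <SIMILAR_GENES>
    <Similarity>
      <db_xref>UniProt:Q8I138</db_xref>
    </Similarity>
  </SIMILAR_GENES>
  <REAGENTS>
    <Reagent type="EST"><db_xref>WFes0143594</db_xref></Reagent>
  </REAGENTS>
</GeneSummary>"""


def summarize(xml_text: str) -> dict:
    root = ET.fromstring(xml_text)
    go_terms = []
    for term in root.iterfind("GENE_ONTOLOGY/terms/goterm"):
        # Labels look like 'P:phototransduction': aspect letter, then term name.
        aspect, _, name = (term.text or "").partition(":")
        go_terms.append((term.get("id"), aspect, name))
    xrefs = []
    for x in root.iter("db_xref"):  # every db_xref, in document order
        db, sep, acc = (x.text or "").partition(":")
        xrefs.append((db, acc) if sep else (None, db))
    return {
        "id": root.get("id"),
        "gene_id": root.findtext("BASIC_INFORMATION/GeneID"),
        "species": root.findtext("BASIC_INFORMATION/Species"),
        "go_terms": go_terms,
        "xrefs": xrefs,
    }


if __name__ == "__main__":
    print(summarize(GENE_SUMMARY))
```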

  7. … on to Introduction to GMOD …

  8. GMOD Introduction
     • Generic Model Organism Database
     • Built by and for many contributing projects
     • Loosely coupled tool kit
       • Works as separate parts and together
       • Complex and simple
     • No more complex than necessary; complexity is part of this territory

  9. Your project needs?
     • New genome?
       • Draft assembly in parts; many computed annotations; little literature
     • Known genome?
       • Large literature base; rich and complex biology knowledge
     • Lab integration?
       • Support and integrate with a focused lab research project

  10. Getting Started with GMOD
      • gmod.org/Getting_Started
      • Documentation is now rich and improving
      • Installation options:
        • distribution tarball
        • VMware virtual machine for demo
        • YUM Unix packages

  11. GMOD Components
      • Chado – database schema and middleware
      • GBrowse – Web-based genome annotation viewing
      • Apollo – Desktop-based genome annotation editing
      • CMap – Web-based comparative map viewing
      • BioMart – Genome data mining from Ensembl/GMOD

  12. Chado Database How-To
      • Chado: Getting Started
      • gmod.org/Chado_Manual: modules, conventions, design principles
      • Worked examples @ gmod.org:
        • Load_RefSeq_Into_Chado
        • Load_BLAST_Into_Chado
        • Sample_Chado_SQL
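In the spirit of the Sample_Chado_SQL page, the sketch below shows the typical Chado join: a `feature` row gets its type from `cvterm` (here, terms in the `sequence` cv) and its species from `organism`. To keep it self-contained it runs against an in-memory SQLite database holding a heavily trimmed subset of the real tables (real Chado runs on PostgreSQL and these tables have many more columns and constraints); the inserted rows, including the `-RA` transcript name, are hypothetical. The query text itself is plain SQL that should carry over to a real Chado instance.

```python
"""Query a trimmed, in-memory mock of Chado's feature/cvterm/organism tables."""

import sqlite3

# Minimal subset of Chado columns; the real schema is much larger.
SCHEMA = """
CREATE TABLE organism (organism_id INTEGER PRIMARY KEY, genus TEXT, species TEXT);
CREATE TABLE cv (cv_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY,
                     cv_id INTEGER REFERENCES cv, name TEXT);
CREATE TABLE feature (feature_id INTEGER PRIMARY KEY,
                      organism_id INTEGER REFERENCES organism,
                      name TEXT, uniquename TEXT,
                      type_id INTEGER REFERENCES cvterm);
"""

# All gene features for one species, typed via the Sequence Ontology cv.
GENES_FOR_SPECIES = """
SELECT f.uniquename
FROM feature f
JOIN cvterm t   ON f.type_id = t.cvterm_id
JOIN cv c       ON t.cv_id = c.cv_id
JOIN organism o ON f.organism_id = o.organism_id
WHERE c.name = 'sequence' AND t.name = 'gene'
  AND o.genus = ? AND o.species = ?
ORDER BY f.uniquename
"""


def demo() -> list[str]:
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.execute("INSERT INTO organism VALUES (1, 'Daphnia', 'pulex')")
    con.execute("INSERT INTO cv VALUES (1, 'sequence')")
    con.executemany("INSERT INTO cvterm VALUES (?, 1, ?)", [(1, "gene"), (2, "mRNA")])
    con.executemany("INSERT INTO feature VALUES (?, 1, NULL, ?, ?)", [
        (1, "NCBI_GNO_200214", 1),
        (2, "NCBI_GNO_200214-RA", 2),  # hypothetical transcript, filtered out
        (3, "NCBI_GNO_149114", 1),
    ])
    rows = con.execute(GENES_FOR_SPECIES, ("Daphnia", "pulex")).fetchall()
    con.close()
    return [uniquename for (uniquename,) in rows]


if __name__ == "__main__":
    print(demo())
```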

  13. Chado Design
      • Modularity: inherent in the Chado schema; a core module plus biology groupings, with a common structure
      • Ontologies: standard biology vocabularies are a core of Chado design
      • Associated software: Perl and Java middleware; stand-alone programs with Chado adaptors

  14. Chado Design [2]
      • Complexity and detail: inherent in genome data; Chado embraces them with room to grow, plus long-term stability
      • Data integration: a key component of Chado; public and lab data sets can be combined
      • Support: shared responsibility among the GMOD community

  15. Chado Schema: Core
      • CV: controlled vocabularies and ontologies
      • Sequence: biological sequences and objects which can be localized on them
      • Companalysis: adjunct to sequence module for in-silico analysis
      • Map: adjunct to sequence module for non-sequence localization
      • Organism: taxonomy / species information
      • Pub: publication / bibliographic / reference information
      • General: general information / database cross-references
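The CV module stores ontology terms plus the relationships between them, and Chado also keeps a precomputed transitive closure of those relationships (the `cvtermpath` table) so "all ancestors of term X" becomes a simple lookup. The sketch below computes such a closure from child-to-parent `is_a` edges with a breadth-first search, recording the shortest path distance. It is an illustration of the idea, not the loader Chado actually uses, and the term names `A`–`D` are hypothetical.

```python
"""Transitive closure of is_a edges, analogous in spirit to Chado's cvtermpath."""

from collections import deque


def transitive_closure(edges):
    """edges: iterable of (child, parent) pairs.
    Returns {(term, ancestor): shortest number of is_a steps}."""
    parents: dict[str, set[str]] = {}
    for child, parent in edges:
        parents.setdefault(child, set()).add(parent)

    closure: dict[tuple[str, str], int] = {}
    for start in parents:
        # BFS upward from each term; first visit gives the shortest distance.
        queue = deque([(start, 0)])
        seen = {start}
        while queue:
            term, dist = queue.popleft()
            for parent in parents.get(term, ()):
                if parent not in seen:
                    seen.add(parent)
                    closure[(start, parent)] = dist + 1
                    queue.append((parent, dist + 1))
    return closure


if __name__ == "__main__":
    # Hypothetical terms: C is_a B, B is_a A, D is_a A.
    print(transitive_closure([("C", "B"), ("B", "A"), ("D", "A")]))
```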

  16. Chado Schema: More
      • Expression: transcript and protein expression events
      • Mage: for microarray data
      • Genetics: genetic/phenotypic interactions in genotypic/environmental context
      • Phenotype: for phenotypic data
      • Library: for descriptions of molecular libraries
      • Phylogeny: for organisms and phylogenetic trees
      • Stock: for specimens and biological collections
      • Contact: for people, groups, and organizations

  17. Chado Middleware
      • GFF to Chado data loader, with BioPerl extensions (GenBank2GFF -> Chado, …)
      • GMODTools: output bulk genome data
      • XORT: Chado XML input and output
      • Modware: OO-Perl Chado access package (in/out)
      • Java middleware (Hibernate; others)
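One detail any GFF-to-Chado loader has to get right is coordinates: GFF3 uses 1-based, fully closed ranges, while Chado's `featureloc` stores interbase (0-based, half-open) `fmin`/`fmax`. The sketch below converts one GFF3 line into those values; it is not the GMOD bulk loader's code, the `FeatureLoc` class is hypothetical, and mapping the unstranded `.` to `0` is an assumption of this example.

```python
"""Convert a GFF3 feature line to Chado-style interbase featureloc values."""

from dataclasses import dataclass

STRAND = {"+": 1, "-": -1, ".": 0}  # assumption: unstranded '.' -> 0


@dataclass
class FeatureLoc:
    srcfeature: str  # sequence the feature is located on (GFF column 1)
    fmin: int        # interbase start = GFF start - 1
    fmax: int        # interbase end   = GFF end
    strand: int


def gff3_to_featureloc(line: str) -> FeatureLoc:
    seqid, _source, _type, start, end, _score, strand, _phase, _attrs = (
        line.rstrip("\n").split("\t"))
    start_i, end_i = int(start), int(end)
    if start_i > end_i:
        raise ValueError("GFF3 requires start <= end")
    return FeatureLoc(seqid, start_i - 1, end_i, STRAND[strand])


if __name__ == "__main__":
    # Hypothetical gene on a hypothetical scaffold.
    loc = gff3_to_featureloc("scaffold_1\t.\tgene\t1000\t2500\t.\t-\t.\tID=g1")
    print(loc, "length:", loc.fmax - loc.fmin)
```

With interbase coordinates the feature length is simply `fmax - fmin`, which is one reason Chado prefers them.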


  19. GMOD Components [2]
      • Sybil – Web-based synteny viewing at gene & chromosome level
      • Turnkey – "Skinnable" Chado-based web site
      • Pathway Tools – metabolic pathways
      • PubFetch – literature management
      • Textpresso – automatic paper classification
      • LuceGene – genome object/text/web search system

  20. GMOD Components [3]
      • Wikipedia community annotation (in development; EcoliWiki ++)
      • Comparative visualization: SynBrowse & SynView
      • Genome Grid: TeraGrid methods for genome computations (in development)

  21. WikiGenomes (ecoliwiki.net)

  22. GMOD Components [4]
      Database frameworks:
      • VMware: virtual machine package with basic GMOD components for demo
      • YUM distribution package
      • ARGOS: replication framework for genome databases

  23. Putting GMOD together
      • Core: PostgreSQL database; Chado schema; Sequence & OBO ontologies
      • System: Apache web server; Unix; BioPerl; …
      • Load data: GFF to Chado
      • View: GBrowse (Chado; MySQL; …)
      • Edit/Update: Apollo, Wiki (coming), bulk-file updates
      • Output: BulkFiles; BioMart

  24. Example new MOD

  25. Recap: Your project needs?
      • New genome? Known? Lab integration?
      • Assess your customers' needs
      • Full database/toolset is overkill for some
      • Loosely coupled tools; complex and simple
        • Pick the parts you need
      • Learn tools with examples first

  26. Chado-centric Genome
      • Genome annotations
        • Proteome annotations, EST/cDNA, gene predictions, RNA, transposon, promoter, etc.
        • Database cross-refs: UniProt, Gene Ontology, KEGG, KOG, etc.
      • Web database
        • GBrowse maps, BLAST server with Chado output, gene detail reports, BioMart data mining; Wikipedia community editing

  27. Contributing to GMOD
      • Current components
        • Need adopters to share effort
        • Re-use rather than re-invent
        • Describe: the GMOD.org wiki needs more examples
      • New components
        • Discuss with other projects: common need?
        • Shared specifications, use cases
        • GMOD recommended practices

  28. Active GMOD Mailing Lists
      • https://lists.sourceforge.net/lists/listinfo/
        • gmod-announce
        • gmod-schema: all Chado schema issues
        • gmod-gbrowse: GBrowse mailing list
        • gmod-devel: general development
      • Related: Ontologies (SO, OBO); BioPerl; Apollo; BioMart

