
GUS Overview

GUS Overview. June 18, 2002. GUS-3.0. Genomics Unified Schema. Supports application and data integration. Uses an extensible architecture. Is object-oriented even though it uses an underlying relational database management system (Oracle).

jamuna

Presentation Transcript


  1. GUS Overview June 18, 2002

  2. GUS-3.0 Genomics Unified Schema • Supports application and data integration • Uses an extensible architecture • Is object-oriented even though it uses an underlying relational database management system (Oracle) • Uses a warehouse rather than a federation, to keep a stable local copy • Uses standards for bulk data exchange (e.g., MAGE)

  3. GUS Usage
     • Annotation
       • of genomes - gene models, sequence features
       • of genes - gene function, gene expression, gene regulation
     • Data mining
       • Develop algorithms and queryable resource
     • Publish
       • Map identifiers with other resources/databases
       • URL for entry retrieval/ad hoc queries in web interface

  4. GUS-3.0 Name Spaces GUS has 5 name spaces compartmentalizing different types of information.

  5. Application Integration: PlasmoDB [Diagram; the legend distinguishes existing from future implementation.] Labels: Public Databases; TIGR, Sanger, Stanford; Plasmodium Investigators; QTL, POP, SNP, Clinical; GenBank, InterPro, GO, etc.; Genomic Sequence; microArray & SAGE Experiments; GSSs & ESTs; Mapping Data; Annotation; Object Layer; Oracle/SQL; DoTS, TESS, RAD, Core, SRes; Automated Analysis & Integration; Annotator’s Interface; Java Servlets & Perl CGI; GenePlot CD; WWW queries, browsing, & download; GenePlot Software

  6. GUS Supports Multiple Projects [Diagram] Labels: DoTS, RAD, TESS, SRes, Core; AllGenes, PlasmoDB, EPConDB; Java Servlets; Oracle RDBMS; Other sites, Other projects; Object Layer for Data Loading

  7. Main Aspects of GUS Development
     • Choice of development tools
       • Schema:
         • CREATE TABLE statements
         • Documentation plug-in: input is tab-delimited text
         • UML - Rational Rose, PowerDesigner
       • Code: CVS
     • Areas to emphasize
       • Plug-ins
       • Work flow
       • TESS
       • Proteomics
       • Images
     • Preferred type of user interface
       • JSP
       • PHP

  8. Data Integration [Diagram] Namespaces: DoTS, SRes, RAD, TESS, Core. Box titles: Genomic Sequence, Ontologies, Transcribed Sequence, Gene Regulation, Transcript Expression, Protein Sequence, Data Provenance. Box contents, in transcript order: • GO • Species • Tissue • Dev. Stage • Genes, gene models • STSs, repeats, etc. • Cross-species analysis • Characterize transcripts • RH mapping • Library analysis • Cross-species analysis • DOTS • Arrays • SAGE • Conditions • Binding Sites • Patterns • Grammars • Domains • Function • Structure • Cross-species analysis • Ownership • Protection • Algorithms • Similarity • Versioning • Workflow. Caption: Transcription factors up-regulated in acute myeloid leukemia with sequence similarity to c-fos and common promoter motifs

  9. [Diagram] Labels: Identify shared TF binding sites; Genomic alignment and comparative sequence analysis; TESS; RAD; GUS; EST clustering and assembly

  10. GUS Approach to Schema
      • Think objects
        • Parents and children
        • Subclassing with views
      • Views
        • Start with generic Imp table (e.g., NAFeatureImp) that contains base attributes plus generic attributes of various datatypes
        • Superclass view (e.g., NAFeature) just has base attributes
        • Subclass views (e.g., RNAFeature) have additional attributes using generic attributes
      • Strongly-typed
        • Tend to avoid “name-value” pairs
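The view-based subclassing on this slide can be sketched with a small, self-contained example. This is only an illustration under stated assumptions: it uses SQLite through Python's standard `sqlite3` module rather than Oracle, and every column name (`subclass_view`, `string1`, `int1`, `rna_type`, `exon_count`) as well as the `GeneFeature` subclass label is hypothetical. Only the names NAFeatureImp, NAFeature, and RNAFeature come from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Generic "Imp" table: base attributes plus untyped generic slots.
-- All column names here are hypothetical.
CREATE TABLE NAFeatureImp (
    na_feature_id INTEGER PRIMARY KEY,
    subclass_view TEXT NOT NULL,      -- which subclass view owns the row
    name          TEXT,
    parent_id     INTEGER REFERENCES NAFeatureImp(na_feature_id),
    string1       TEXT,               -- generic attribute, meaning set per subclass
    int1          INTEGER             -- generic attribute, meaning set per subclass
);

-- Superclass view: base attributes only, all rows.
CREATE VIEW NAFeature AS
    SELECT na_feature_id, subclass_view, name, parent_id
    FROM NAFeatureImp;

-- Subclass view: base attributes plus generic slots renamed to typed,
-- meaningful attributes, restricted to this subclass's rows.
CREATE VIEW RNAFeature AS
    SELECT na_feature_id, name, parent_id,
           string1 AS rna_type,
           int1    AS exon_count
    FROM NAFeatureImp
    WHERE subclass_view = 'RNAFeature';
""")

# A parent feature and a child RNA feature (parent/child via parent_id).
conn.execute(
    "INSERT INTO NAFeatureImp (na_feature_id, subclass_view, name, parent_id) "
    "VALUES (1, 'GeneFeature', 'gene-A', NULL)"
)
conn.execute(
    "INSERT INTO NAFeatureImp VALUES (2, 'RNAFeature', 'transcript-A1', 1, 'mRNA', 4)"
)

superclass_rows = conn.execute(
    "SELECT na_feature_id, name FROM NAFeature ORDER BY na_feature_id"
).fetchall()
rna_rows = conn.execute(
    "SELECT name, rna_type, exon_count FROM RNAFeature"
).fetchall()

print(superclass_rows)
print(rna_rows)
```

Querying the superclass view returns every feature with only the shared attributes, while the subclass view exposes the generic columns under typed names. That renaming is what lets the schema avoid “name-value” pairs while still storing all subclasses in one table.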

  11. DoTS Central Dogma [Diagram] Labels: Gene, Genomic Sequence, Gene Instance, Gene Feature, NA Feature, NA Sequence; RNA, RNA Sequence, RNA Instance, RNA Feature; Protein, Protein Sequence, Protein Instance, Protein Feature, AA Sequence, AA Feature

  12. DoTS Schema Has Been Driven By Building Gene Indices [Pipeline diagram] Labels: Genomic Sequence; mRNA/EST Sequence; Clustering and Assembly; Gene predictions; GenScan/HMMer, PHAT; SIM4 or BLAT; Predicted Genes; DoTS consensus Sequences; Merge Genes; Gene/RNA cluster assignment; Annotate DoTS Manual Annotation Tasks; Gene Index; framefinder; RNAs; Proteins; translation; BLASTX; PFAM, Smart, ProDom; BLASTP; Other computed annotation (EPCR, AssemblyAnatomyPercent, Index Key Words, SNP analysis); BLAST Similarities; Functional predictions; Protein Motifs; GO Functions

  13. DoTS Gene Indices Are Based on Clustering and Assembling ESTs

  14. RAD 3.0 Schema Incorporates MAGE and Experience With Microarrays LIMS for Data Analysis. Also holds SAGE.

  15. Status of GUS Namespaces
      • Core
        • Tables exist, Workflow documented
      • SRes
        • Tables exist
      • DoTS
        • Tables exist, some documentation
      • RAD
        • Version 3.0 to include MAGE, experience
        • Pretty much complete
        • Tables exist, mostly documented
      • TESS
        • Tables ready but not created

  16. Schema Development
      • Releases on SourceForge:
        • CREATE TABLE statements
        • Table dumps from Core::TableInfo, Core::DatabaseDocumentation
        • GIFs of ER diagrams
      • Adding tables between releases
        • In CVS tree?
        • Use message forum for discussion

  17. Documentation
      • Schema Browser looks at TableInfo
      • Plug-in
        • Populates DatabaseDocumentation
        • Input:
          Table\t\tDescription of table
          Table\tAttribute\tDescription of attribute
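The tab-delimited input described on this slide can be read with a few lines of Python. The sketch below is not the GUS plug-in itself; the function name `parse_documentation` and the sample descriptions are hypothetical. It assumes each record has exactly three tab-separated fields, with an empty middle field marking a table-level description.

```python
def parse_documentation(text):
    """Split documentation records into table-level and attribute-level entries.

    Each record is: Table <TAB> Attribute <TAB> Description.
    An empty Attribute field means the description applies to the table.
    """
    tables = {}       # table name -> description
    attributes = {}   # (table name, attribute name) -> description
    for line_number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) != 3:
            raise ValueError(
                f"line {line_number}: expected 3 tab-separated fields, got {len(fields)}"
            )
        table, attribute, description = (field.strip() for field in fields)
        if attribute:
            attributes[(table, attribute)] = description
        else:
            tables[table] = description
    return tables, attributes


# Hypothetical sample input in the two record shapes shown on the slide.
sample = (
    "NAFeature\t\tSuperclass view over NAFeatureImp\n"
    "NAFeature\tname\tDisplay name of the feature\n"
)
tables, attributes = parse_documentation(sample)
print(tables)
print(attributes)
```

A loader would then write these entries into DatabaseDocumentation, which is where the slide says the plug-in's output goes. That step is omitted here because the table's columns are not given in the slides.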

  18. GUS Schema Browser
      • http://www.cbil.upenn.edu/cgi-bin/GUS30/schemaBrowser.pl?db=GUS30
        • Points at GUS30 on CBIL development database server (erebus).
        • Need to move? Maintain release view?
      • DoTS Tables:
        • Central dogma
        • Evidence/Similarity
        • ProjectLink
        • SequenceGroupImp/SequenceGroupExperimentImp
        • Plasmomap?
      • Other tables of interest?
