


GUS Overview

June 18, 2002


GUS-3.0

Genomics Unified Schema

  • Supports application and data integration

  • Uses an extensible architecture.

  • Is object-oriented even though it uses an underlying relational database management system (Oracle).

  • Uses a warehouse rather than a federation, so that a stable copy of the data is kept locally

  • Uses standards for bulk data exchange (e.g., MAGE)


GUS Usage

  • Annotation

    • of genomes - gene models, sequence features

    • of genes - gene function, gene expression, gene regulation

  • Data mining

    • Develop algorithms and a queryable resource

  • Publish

    • Map identifiers to other resources/databases

    • URL for entry retrieval and ad hoc queries in the web interface


GUS-3.0 Name Spaces

GUS has five namespaces (Core, SRes, DoTS, RAD, and TESS) that compartmentalize different types of information.


Application Integration: PlasmoDB

[Architecture diagram; the legend distinguishes existing from future implementation. Labels:]

  • Data sources: public databases (GenBank, InterPro, GO, etc.); TIGR, Sanger, Stanford; Plasmodium investigators

  • Data types: genomic sequence; microarray & SAGE experiments; GSSs & ESTs; mapping data; annotation; QTL, POP, SNP, clinical

  • Database: Oracle/SQL with an object layer; DoTS, TESS, RAD, Core, SRes

  • Processing: automated analysis & integration; annotator's interface

  • Access: Java Servlets & Perl CGI; WWW queries, browsing, & download; GenePlot CD; GenePlot software


GUS Supports Multiple Projects

[Diagram labels:]

  • Projects: AllGenes, PlasmoDB, EPConDB; other sites, other projects

  • Java Servlets

  • Object Layer for Data Loading

  • Oracle RDBMS: DoTS, RAD, TESS, SRes, Core


Main Aspects of GUS Development

  • Choice of development tools

    • Schema:

      • CREATE TABLE statements

      • Documentation plug-in: input is tab-delimited text

      • UML - Rational Rose, PowerDesigner

    • Code: CVS

  • Areas to emphasize

    • Plug-ins

    • Workflow

    • TESS

    • Proteomics

    • Images

  • Preferred type of user interface

    • JSP

    • PHP


Data Integration

[Diagram labels, grouped by namespace:]

  • DoTS

    • Genomic Sequence: genes, gene models; STSs, repeats, etc.; cross-species analysis

    • Transcribed Sequence: characterize transcripts; RH mapping; library analysis; cross-species analysis; DoTS

    • Protein Sequence: domains; function; structure; cross-species analysis

  • SRes

    • Ontologies: GO; species; tissue; dev. stage

  • RAD

    • Transcript Expression: arrays; SAGE; conditions

  • TESS

    • Gene Regulation: binding sites; patterns; grammars

  • Core

    • Data Provenance: ownership; protection; algorithms; similarity; versioning; workflow

Transcription factors up-regulated in acute myeloid leukemia with sequence similarity to c-fos and common promoter motifs

[Diagram labels:]

  • Identify shared TF binding sites

  • Genomic alignment and comparative sequence analysis

  • EST clustering and assembly

  • TESS, RAD, GUS
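
The question posed on this slide (transcription factors up-regulated in acute myeloid leukemia, similar to c-fos, with common promoter motifs) combines expression, similarity, and binding-site data. A minimal sketch of such a cross-namespace join is given below. It uses SQLite through Python in place of Oracle, and every table name, column name, threshold, and data value is a toy assumption made for illustration; none of them come from the GUS schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical stand-ins for data held in different namespaces.
CREATE TABLE expression   (gene TEXT, condition_name TEXT, fold_change REAL);  -- RAD-like
CREATE TABLE similarity   (gene TEXT, subject TEXT, pct_identity REAL);        -- DoTS-like
CREATE TABLE binding_site (gene TEXT, motif TEXT);                             -- TESS-like
""")

# Toy rows; assume every gene listed here is a transcription factor.
conn.executemany("INSERT INTO expression VALUES (?, ?, ?)", [
    ("geneA", "AML", 3.5), ("geneB", "AML", 2.8), ("geneC", "AML", 0.9),
])
conn.executemany("INSERT INTO similarity VALUES (?, ?, ?)", [
    ("geneA", "c-fos", 62.0), ("geneB", "c-fos", 55.0), ("geneC", "c-fos", 70.0),
])
conn.executemany("INSERT INTO binding_site VALUES (?, ?)", [
    ("geneA", "motif1"), ("geneB", "motif1"), ("geneB", "motif2"), ("geneC", "motif1"),
])

# Genes up-regulated in AML and similar to c-fos, restricted to motifs
# that at least two such genes share. Thresholds are arbitrary.
shared = conn.execute("""
    WITH hits AS (
        SELECT e.gene
        FROM expression e
        JOIN similarity s ON s.gene = e.gene
        WHERE e.condition_name = 'AML' AND e.fold_change >= 2.0
          AND s.subject = 'c-fos' AND s.pct_identity >= 50.0
    )
    SELECT b.motif, b.gene
    FROM binding_site b
    JOIN hits h ON h.gene = b.gene
    WHERE b.motif IN (
        SELECT b2.motif
        FROM binding_site b2
        JOIN hits h2 ON h2.gene = b2.gene
        GROUP BY b2.motif
        HAVING COUNT(DISTINCT b2.gene) >= 2
    )
    ORDER BY b.motif, b.gene
""").fetchall()
print(shared)
```

Because the warehouse holds all three kinds of data in one database, the question reduces to a single query rather than a chain of calls to separate resources.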


GUS Approach to Schema

  • Think objects

    • Parents and children

    • Subclassing with views

  • Views

    • Start with generic Imp table (e.g., NAFeatureImp) that contains base attributes plus generic attributes of various datatypes

    • Superclass view (e.g., NAFeature) just has base attributes

    • Subclass views (e.g., RNAFeature) have additional attributes using generic attributes

  • Strongly-typed

    • Tend to avoid “name-value” pairs
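
The "subclassing with views" pattern described above can be sketched as follows. This is a minimal illustration that uses SQLite through Python in place of Oracle. The table and view names NAFeatureImp, NAFeature, and RNAFeature come from the slide; the column names (`subclass_view`, `string1`, `number1`, `rna_type`, `exon_count`) and the sample rows are assumptions chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Generic implementation table: base attributes plus generic slots.
CREATE TABLE NAFeatureImp (
    na_feature_id  INTEGER PRIMARY KEY,
    na_sequence_id INTEGER NOT NULL,
    subclass_view  TEXT    NOT NULL,  -- which subclass view owns the row
    name           TEXT,
    string1        TEXT,              -- generic string attribute
    number1        INTEGER            -- generic numeric attribute
);

-- Superclass view: base attributes only, over all rows.
CREATE VIEW NAFeature AS
SELECT na_feature_id, na_sequence_id, subclass_view, name
FROM NAFeatureImp;

-- Subclass view: base attributes plus generic slots exposed
-- under typed, meaningful names.
CREATE VIEW RNAFeature AS
SELECT na_feature_id, na_sequence_id, name,
       string1 AS rna_type,
       number1 AS exon_count
FROM NAFeatureImp
WHERE subclass_view = 'RNAFeature';
""")

conn.executemany(
    "INSERT INTO NAFeatureImp VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 100, "RNAFeature", "transcript-1", "mRNA", 4),
        (2, 100, "GeneFeature", "gene-1", None, None),
    ],
)

all_features = conn.execute(
    "SELECT name FROM NAFeature ORDER BY na_feature_id"
).fetchall()
rna_features = conn.execute(
    "SELECT name, rna_type, exon_count FROM RNAFeature"
).fetchall()
print(all_features)
print(rna_features)
```

Exposing the generic slots under named columns in each subclass view is what keeps the schema strongly typed from the caller's side, in contrast to a "name-value" table where attribute names are stored as data.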


DoTS Central Dogma

[Diagram labels, by level:]

  • Gene: Genomic Sequence, Gene Instance, Gene Feature, NA Feature, NA Sequence

  • RNA: RNA Sequence, RNA Instance, RNA Feature

  • Protein: Protein Sequence, Protein Instance, Protein Feature, AA Sequence, AA Feature
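
One possible reading of the diagram labels is that each abstract entity (Gene, RNA, Protein) is tied through its instances to features located on concrete sequences. The sketch below models that reading with plain Python dataclasses. It is not the GUS object layer: the class layout, the field names, and the identifiers are all hypothetical, and an instance is represented simply as an entry in a list of features.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Feature:
    kind: str         # "GeneFeature", "RNAFeature", or "ProteinFeature"
    sequence_id: str  # hypothetical identifier of the NA or AA sequence


@dataclass
class Protein:
    name: str
    instances: List[Feature] = field(default_factory=list)


@dataclass
class RNA:
    name: str
    instances: List[Feature] = field(default_factory=list)
    proteins: List[Protein] = field(default_factory=list)


@dataclass
class Gene:
    name: str
    instances: List[Feature] = field(default_factory=list)
    rnas: List[RNA] = field(default_factory=list)


def evidence_trail(gene):
    """List (entity, feature kind, sequence) triples from gene down to protein."""
    trail = [(gene.name, f.kind, f.sequence_id) for f in gene.instances]
    for rna in gene.rnas:
        trail += [(rna.name, f.kind, f.sequence_id) for f in rna.instances]
        for protein in rna.proteins:
            trail += [(protein.name, f.kind, f.sequence_id) for f in protein.instances]
    return trail


protein = Protein("protein-1", [Feature("ProteinFeature", "aa-seq-1")])
rna = RNA("rna-1", [Feature("RNAFeature", "na-seq-2")], [protein])
gene = Gene("gene-1", [Feature("GeneFeature", "na-seq-1")], [rna])
trail = evidence_trail(gene)
print(trail)
```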


DoTS Schema Has Been Driven By Building Gene Indices

[Workflow diagram. Labels, in slide order: Genomic Sequence; mRNA/EST Sequence; Clustering and Assembly; Gene predictions; GenScan/HMMer, PHAT; SIM4 or BLAT; Predicted Genes; DoTS consensus Sequences; Merge Genes; Gene/RNA cluster assignment; Annotate DoTS; Manual Annotation Tasks; Gene Index; framefinder; RNAs; Proteins; translation; BLASTX; PFAM, Smart, ProDom; BLASTP; Other computed annotation (EPCR, AssemblyAnatomyPercent, Index Key Words, SNP analysis); BLAST Similarities; Functional predictions; Protein Motifs; GO Functions]


RAD 3.0 Schema Incorporates MAGE and Experience With Microarrays

LIMS for Data Analysis. Also holds SAGE.


Status of GUS Namespaces

  • Core

    • Tables exist, Workflow documented

  • SRes

    • Tables exist

  • DoTS

    • Tables exist, some documentation

  • RAD

    • Version 3.0 will incorporate MAGE and experience with microarrays

      • Nearly complete

    • Tables exist, mostly documented

  • TESS

    • Table definitions are ready, but the tables have not yet been created


Schema Development

  • Releases on Sourceforge:

    • CREATE TABLE statements

    • Table dumps from Core::TableInfo, Core::DatabaseDocumentation

    • GIFs of ER diagrams

  • Adding tables between releases

    • In CVS tree?

    • Use message forum for discussion


Documentation

  • Schema Browser looks at TableInfo

  • Plug-in

    • Populates DatabaseDocumentation

    • Input:

      Table\t\tDescription of table

      Table\tAttribute\tDescription of attribute
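
The plug-in's input format above has three tab-separated columns, with an empty middle column marking a table-level description. A minimal sketch of a reader for that format is given below. It is written in Python for illustration and is not the plug-in itself; the function name `parse_documentation` and the sample descriptions are assumptions.

```python
def parse_documentation(text):
    """Split tab-delimited documentation lines into table and attribute entries."""
    table_docs = {}
    attribute_docs = {}
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        table, attribute, description = (f.strip() for f in fields)
        if attribute:
            # Table<TAB>Attribute<TAB>Description of attribute
            attribute_docs[(table, attribute)] = description
        else:
            # Table<TAB><TAB>Description of table
            table_docs[table] = description
    return table_docs, attribute_docs


# Hypothetical sample input.
sample = (
    "NAFeature\t\tFeatures located on nucleic acid sequences\n"
    "NAFeature\tname\tName of the feature\n"
)
table_docs, attribute_docs = parse_documentation(sample)
print(table_docs)
print(attribute_docs)
```

Keeping table-level and attribute-level entries in one file with the same column count means a single pass over the input is enough to populate both kinds of documentation.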


GUS Schema Browser

  • http://www.cbil.upenn.edu/cgi-bin/GUS30/schemaBrowser.pl?db=GUS30

  • Points at GUS30 on CBIL development database server (erebus).

    • Need to move? Maintain release view?

  • DoTS Tables:

    • Central dogma

    • Evidence/Similarity

    • ProjectLink

    • SequenceGroupImp/SequenceGroupExperimentImp

    • Plasmomap?

  • Other tables of interest?

