Annotator Interface

Sharon Diskin

GUS 3.0 Workshop

June 18-21, 2002

Outline
  • Current annotation efforts
  • Motivation for new annotation tool
  • Requirements for new annotation tool
  • Thoughts on design and implementation
  • Future plans
Overview of Current Efforts
  • Automated annotation has been applied to the DoTS transcripts
    • Predicted gene ownership (clustering of assemblies)
    • BlastX against NR
      • Automated assignment of descriptions based on similarity
    • BlastX against ProDom and RPS-Blast against CDD
      • Predicted GO Functions
    • Framefinder
      • Predicted Protein Sequences
    • Blat alignments
    • EPCR, Index Words, etc…
  • Manual annotation efforts have focused on
    • validating the automated annotation and
    • adding additional information at the central dogma level
  • Manual annotation of the gene index utilizes an annotation tool, the GUS Annotator Interface, which directly updates the GUSdev database.
DoTS RNA transcripts

The assembly of sequences generates a consensus sequence, or DoTS transcript.

  • Incoming sequences (EST/mRNA)
    • GenBank, dbEST sequences
    • Make quality (remove vector, polyA, NNNs)
  • “Quality” sequences
    • Block with RepeatMasker
  • Blocked sequences
    • Blastn to cluster sequences
  • “Unassembled” clusters
    • Assemble sequences with CAP4
  • CAP4 assemblies (generate consensus sequences)
  • DoTS consensus sequences
    • BLASTn of DoTS consensus sequences (98% identity, 150 bp)
  • Gene cluster (RNAs in the gene)
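
The final clustering step of this pipeline can be sketched as single-linkage grouping: any two consensus sequences joined by a BLASTn hit that meets the thresholds (98% identity, 150 bp) end up in the same gene cluster. This is a minimal sketch, not the DoTS implementation. The `Hit` type, the function name, and the reading of both thresholds as minimums are assumptions made for illustration, and hits are assumed to be already parsed.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Hit(NamedTuple):
    """One pairwise BLASTn hit in a hypothetical, pre-parsed form."""
    query: str
    subject: str
    percent_identity: float
    aligned_length: int


def cluster_assemblies(ids: Iterable[str], hits: Iterable[Hit],
                       min_identity: float = 98.0,
                       min_length: int = 150) -> list[set[str]]:
    """Group assembly ids by single linkage over qualifying hits.

    Every id named in a hit is assumed to be present in `ids`.
    """
    parent = {i: i for i in ids}

    def find(x: str) -> str:
        # Walk to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for hit in hits:
        if (hit.percent_identity >= min_identity
                and hit.aligned_length >= min_length):
            parent[find(hit.query)] = find(hit.subject)

    groups: dict[str, set[str]] = defaultdict(set)
    for i in parent:
        groups[find(i)].add(i)
    return list(groups.values())
```

Single linkage means two assemblies can share a cluster without a direct hit between them, as long as a chain of qualifying hits connects them. That is one reason the predicted clusters are validated manually.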

Current Efforts: Gene Annotation (1)

[Figure: generating DoTS transcripts populates the chain Gene, RNA, RNAInstance, RNAFeature, Assembly. Gene_A groups RNA_1 through RNA_5; each RNA_n is shown with its Instance_n, Feature_n, and Assembly_n.]

Task 1: Validation of Gene Membership

Current Efforts: Gene Annotation (2)

[Figure: the same Gene, RNA, RNAInstance, RNAFeature, Assembly chain after validation. RNA_1 through RNA_5, each with its Instance_n, Feature_n, and Assembly_n, are now divided between Gene_A and Gene_B.]

  • Removing RNAs from the cluster results in the creation of a new Gene
  • An entry is made in the MergeSplit table for tracking purposes
  • Similar process followed when an RNA is added to a Gene
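
The split-and-track behaviour described above can be illustrated with a small in-memory model. This is a sketch under stated assumptions: the `GeneIndex` class and the log fields (`old_id`, `new_id`, `is_merge`) are hypothetical, since the slides do not give the actual columns of the MergeSplit table.

```python
from dataclasses import dataclass, field


@dataclass
class GeneIndex:
    """In-memory stand-in for gene membership plus a MergeSplit-style log."""
    genes: dict[str, set[str]] = field(default_factory=dict)
    merge_split_log: list[dict] = field(default_factory=list)

    def split(self, gene_id: str, rnas: set[str], new_gene_id: str) -> None:
        """Move `rnas` out of `gene_id` into a newly created gene."""
        if new_gene_id in self.genes:
            raise ValueError(f"{new_gene_id} already exists")
        if not rnas or not rnas <= self.genes[gene_id]:
            raise ValueError("rnas must be a non-empty subset of the gene")
        self.genes[gene_id] -= rnas
        self.genes[new_gene_id] = set(rnas)
        # Record old and new ids so earlier references can be traced.
        self.merge_split_log.append(
            {"old_id": gene_id, "new_id": new_gene_id, "is_merge": False})
```

For example, starting from `GeneIndex(genes={"Gene_A": {"RNA_1", "RNA_2", "RNA_3"}})`, calling `split("Gene_A", {"RNA_3"}, "Gene_B")` leaves two genes and one log entry. The choice of which RNAs move is arbitrary here.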
Current Efforts: Gene Annotation (3)
  • Task 2: Assign Reference RNA
    • The reference RNA will be annotated further
    • RNA table
  • Task 3: Assign Approved Gene Name/Symbol
    • Gene Table
    • Evidence: Comment (specifies database link)
  • Task 4: Assign Gene Description
    • Gene Table
    • Evidence: Comment
  • Task 5: Associate known Gene synonyms
    • GeneSynonym table
    • Evidence: Comment
Current Efforts: RNA Annotation

Annotation of “Reference Sequence”

  • Task 1: Assign/Confirm Description of assembly
    • RNA table
  • Task 2: Confirm/Add/Delete GO Functions
    • ProteinGOFunction (in GUSdev, GO tables have been re-designed in GUS3.0)
    • Evidence: Comments or Similarity (ProDom, CDD-Pfam, CDD-Smart, or NR)
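Current Annotator Interface Architecture

[Figure: architecture diagram; hosts labelled Erebus and Zeus.]

  • Annotator Interface
    • JDBC (query only) to GUSdev
    • JavaServlet
      • writes “XML” file
      • executes AnnotatorInterface Submitter (GA-Plugin)
  • AnnotatorInterface Submitter (GA-Plugin)
    • reads “XML” file
    • Perl Object Layer, DBI (insert/update/delete) to GUSdev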
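
The write path in this architecture (one component serialises a change to a file, a second component reads the file and applies it to the database) can be sketched as follows. This is only an illustration of the pattern: the real system uses a Java servlet and a Perl plugin, whereas the sketch uses Python with `sqlite3` standing in for GUSdev. The XML element and attribute names, the function names, and the `allowed` whitelist are all assumptions, since the slides do not describe the file format.

```python
import sqlite3
import xml.etree.ElementTree as ET


def write_submission(path: str, table: str, row_id: int,
                     values: dict[str, str]) -> None:
    """Servlet side: serialise one pending update to an XML file."""
    root = ET.Element("update", table=table, id=str(row_id))
    for column, value in values.items():
        ET.SubElement(root, "set", column=column).text = value
    ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)


def submit(path: str, db: sqlite3.Connection,
           allowed: dict[str, set[str]]) -> None:
    """Submitter side: read the XML file and apply it to the database."""
    root = ET.parse(path).getroot()
    table = root.get("table")
    row_id = int(root.get("id"))
    for node in root.findall("set"):
        column = node.get("column")
        # Identifiers cannot be bound as SQL parameters, so they are
        # checked against a whitelist before being interpolated.
        if table not in allowed or column not in allowed[table]:
            raise ValueError(f"unexpected target {table}.{column}")
        db.execute(f"UPDATE {table} SET {column} = ? WHERE id = ?",
                   (node.text, row_id))
    db.commit()
```

Keeping the interface's own database connection query-only and routing every write through a single submitter gives one place to validate changes before they reach the database.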

Current Gene Annotation

Validate Cluster and Assign Reference RNA/Assembly

Current Gene Annotation (cont.)

Assign Gene Name/Symbol

Assign Gene Description

Assign Gene Synonym(s)

Evidence

Current RNA (and Protein) Annotation

RNA Description

Evidence

GOFunctions

Allgenes Display of RNA Annotation

RNA Description

(Confirmed or manually added GO Functions)

Status of Current Annotation (as of June 20, 2002)
  • 1289 manually reviewed genes
    • 1003 with gene name
    • 697 with gene synonyms
    • 1046 with description
  • 6146 manually reviewed RNAs/DoTS assemblies
  • 949 ‘proteins’ with reviewed GO function
Motivation for new tool

Want to annotate using genomic sequence

  • Create “curated” gene models specifying structure
  • Increase structure of annotation in GUS
  • Annotation of proteins
  • Redefinition of annotation tasks
  • Current interface not designed for this purpose
Some Other Annotation Tools
  • Artemis
    • Developed and used at Sanger
    • Reads and writes flat files
    • Supports rich set of annotations
      • Save as EMBL format
  • Apollo
    • Combined effort including members from Sanger and Berkeley
    • Flat files (CORBA access to ENSEMBL)
    • 2 versions, currently being merged
      • Sanger: annotation viewer
      • Berkeley: focus on editing

No Existing Tool To Meet All of Our Needs

Requirements: Graphical View
  • Provide alignment of features on genomic sequence
    • could potentially display any feature type currently stored in GUS3.0
    • features can be selected and used to generate “curated” features
    • similar to display and functionality in Apollo
  • Toggle (or configure) the display of each feature type
  • Zoom to sequence level, with links to functionality relevant to the highlighted feature
  • Also support creation of features “from scratch”
    • based on literature, etc.
  • Detail editors provide ability to change endpoints, etc.
Gene Annotation
  • Create curated gene model
    • specify gene boundaries
    • specify location of exons (and thus introns)
      • 5' exon boundary (putative transcription start site)
      • 3' exon boundary (including polyadenylation signal)
    • automatic creation of Gene entry
    • merge with existing gene instances through GeneInstance table
    • tables/views affected:
      • GeneFeature
      • ExonFeature
      • GeneInstance
      • Gene
      • MergeSplit
    • evidence: features used to create model, PubMed ID
    • should be as easy as clicking on existing features and choosing “make curated” (endpoints, etc. can then be modified if needed)
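
A curated gene model of this kind can be represented as gene boundaries plus a list of exons, with introns derived from the gaps between exons rather than stored. The sketch below is hypothetical: the class names, the 1-based inclusive coordinate convention, and the validation rules are assumptions, and it does not reflect the GeneFeature or ExonFeature schema in GUS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Exon:
    start: int  # 1-based, inclusive (assumed convention)
    end: int


@dataclass
class CuratedGeneModel:
    gene_start: int
    gene_end: int
    exons: list[Exon]
    evidence: list[str]  # e.g. ids of source features, or a PubMed ID

    def __post_init__(self) -> None:
        self.exons = sorted(self.exons, key=lambda e: e.start)
        for exon in self.exons:
            if not (self.gene_start <= exon.start <= exon.end <= self.gene_end):
                raise ValueError(f"{exon} lies outside the gene boundaries")
        for left, right in zip(self.exons, self.exons[1:]):
            if right.start <= left.end:
                raise ValueError(f"{left} and {right} overlap")

    def introns(self) -> list[tuple[int, int]]:
        """Introns are the gaps between consecutive exons."""
        return [(left.end + 1, right.start - 1)
                for left, right in zip(self.exons, self.exons[1:])
                if right.start - left.end > 1]
```

Deriving introns on demand means that editing an exon endpoint in a detail editor can never leave the stored introns out of date.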
Gene Annotation (2)
  • Assign (HUGO or MGI approved) abbreviated gene name/symbol
    • Gene Table
    • Evidence: ExternalDatabaseLink
  • Assign full gene name (MGI or HUGO full gene name)
    • Gene Table
    • Evidence: ExternalDatabaseLink
  • Assign abbreviated gene name/symbol synonyms (non-approved gene symbols)
    • GeneSynonym Table
    • Evidence: ExternalDatabaseLink
  • Assign full gene name aliases
    • GeneAlias Table
    • Evidence: ExternalDatabaseLink
Gene Annotation (3)
  • Assign gene category (e.g. non-coding)
    • Gene Table
    • Evidence:
      • ExternalDatabaseLink/Literature Reference
      • Similarity (e.g. to known non-coding RNA)
  • Confirm/assign gene chromosomal location
    • GeneChromosomalLocation
    • Evidence:
      • ExternalDatabaseLink/Literature Reference
      • RH mapping data
      • Alignments/Features
  • OMIM Link assignment (verification if computationally determined)
    • ExternalDatabaseLink
RNA Annotation (1)
  • Create “curated RNAs”
    • Define RNA transcript forms of gene (create RNAs)
    • Using exons defined by curated gene
    • 5' and 3' UTRs
    • Automatic creation of RNA entry
    • Merge existing RNA instances
    • Tables affected:
      • RNAFeature
      • UTRFeature
      • RNAInstance
      • RNA
    • Evidence: Features used to create
  • Assign RNA categories to created RNAs (e.g. alternative form)
    • RNARNACategory Table
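
To illustrate how 5' and 3' UTRs relate to the exons of a curated gene, the sketch below splits exon intervals around a coding region. It assumes a plus-strand transcript, 1-based inclusive coordinates, and that the coding start and end are already known. The function name and inputs are hypothetical; the slides do not say how UTR boundaries are determined in the tool.

```python
def utr_segments(exons: list[tuple[int, int]], cds_start: int,
                 cds_end: int) -> tuple[list[tuple[int, int]],
                                        list[tuple[int, int]]]:
    """Return (5' UTR pieces, 3' UTR pieces) for a plus-strand transcript."""
    five_prime: list[tuple[int, int]] = []
    three_prime: list[tuple[int, int]] = []
    for start, end in sorted(exons):
        if start < cds_start:
            # Exon begins upstream of the coding region.
            five_prime.append((start, min(end, cds_start - 1)))
        if end > cds_end:
            # Exon extends downstream of the coding region.
            three_prime.append((max(start, cds_end + 1), end))
    return five_prime, three_prime
```

A minus-strand transcript would need the two roles swapped, which is omitted here to keep the example short.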
RNA Annotation
  • Assign (or confirm computed) RNA description
    • RNA table
    • Evidence: Gene from which it is derived
  • Anatomy expression assignment(s)
    • RNAAnatomy
    • RNAAnatomyLOE
    • Evidence:
      • ExternalDatabaseLink/Literature references
      • Assembly anatomy percent from DoTS
      • RAD experiments
  • Assign GO terms to curated RNA (non-coding RNAs, e.g. small RNA involved in splicing)
    • GOTermAssociation
    • GOTermAssociationEvid
    • Evidence: ExternalDatabaseLink, Literature References
  • Computational analysis performed on curated RNA sequences
    • Annotation workflow
      • Framefinder translation, GO terms, Similarities, etc.
Requirements: Protein Annotation
  • Confirm/assign GO Function
    • GOTermAssociation, GOTermAssociationEvid
    • Evidence: ExternalDatabaseLink and/or Literature References
  • Confirm/assign GO Biological Process
    • GOTermAssociation, GOTermAssociationEvid
    • Evidence: ExternalDatabaseLink and/or Literature References
  • Confirm/assign GO Cellular Component
    • GOTermAssociation, GOTermAssociationEvid
    • Evidence: ExternalDatabaseLink and/or Literature References
  • Assign protein name
    • Protein Table
    • Evidence: ExternalDatabaseLink, Literature Ref, Similarities
  • Assign protein name synonyms
    • Protein Table
    • Evidence: ExternalDatabaseLink, Literature Ref, Similarities
Protein Annotation (2)
  • Assign protein category (post-translational modifications)
    • ProteinProteinCategory
    • Evidence: ExternalDatabaseLink, Literature References
  • Protein-protein interactions assigned
    • Interaction
    • InteractionInteractionLOE
    • Evidence: PubMed ID, etc.
  • Protein pathway assignments
    • PathwayInteraction (for newly created interactions)
    • Still under consideration: What is the best way to link with an existing pathway?
      • For example, a pathway is represented in DoTS, and we want to say that this curated Protein is really the same as a protein in a pathway.
  • Assign post-translational modification category
  • Assign interactions involving this protein
  • Assign the pathway the protein is known to be involved in
  • Assign protein family
  • Ability to modify and/or delete curated protein

Evidence will be associated with all annotation
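
The rule that evidence accompanies every annotation can be enforced in the data model itself, so that an annotation without evidence cannot be constructed. The sketch below is one hypothetical way to do this. The class and field names are illustrative, and the evidence kinds are taken from those listed on the slides rather than from the GUS schema.

```python
from dataclasses import dataclass
from enum import Enum


class EvidenceKind(Enum):
    EXTERNAL_DATABASE_LINK = "ExternalDatabaseLink"
    LITERATURE_REFERENCE = "LiteratureReference"
    SIMILARITY = "Similarity"
    FEATURE = "Feature"


@dataclass(frozen=True)
class Evidence:
    kind: EvidenceKind
    reference: str  # e.g. a database accession or PubMed ID


@dataclass
class Annotation:
    table: str       # target table name, e.g. "Protein"
    attribute: str
    value: str
    evidence: tuple[Evidence, ...]

    def __post_init__(self) -> None:
        # Reject any annotation that arrives without supporting evidence.
        if not self.evidence:
            raise ValueError("every annotation must carry evidence")
```

Putting the check in the constructor means every editor in the interface inherits it, instead of each form having to validate evidence separately.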

Next Steps / Open Issues
  • Completion of Java Object Layer
  • Decision regarding BioJava wrappers
    • What exactly will this give us to aid in interface development (e.g. FeatureRenderer, etc.)?
  • Discussion on layout of interface
    • Joan’s input after experimentation with other tools
  • Depending on the above:
    • Client Side portion which communicates with remote GUS Server
    • Interface Implementation