Genome analysis and annotation Part II


Modeling a gene

S. mansoni PASA assemblies

S. japonicum EST alignments

Genewise alignments (predictions)

nr Protein Alignments

Caenorhabditis sp. Protein Alignments

Brugia malayi Protein Alignments

Evidence View

Attributes of individual annotated genes

Sequence Database Hits

Top: Protein matches

Bottom: EST matches

Not shown graphically: gene name, nucleotide and protein sequence, MW, pI, organellar targeting sequence, membrane spanning regions, other domains.



Annotated Gene

Top: editing panel

Bottom: final curation

Splice site predictions:

red: acceptor sites

blue: donor sites

Screenshot of a component within Neomorphic’s annotation station:




[Figure labels: H. influenzae, M. genitalium]

Assigning function to predicted gene products

The primary tool for assigning function is homology to well characterized proteins

However, transitive annotation (copying a function from one homolog to the next without returning to experimental evidence) can propagate errors.

The modular nature of proteins can provide the basis for functional annotation
  • Proteins may share features that give clues to their structure and/or function
  • A domain is a region of a protein that can adopt a particular three-dimensional structure. Together a group of proteins that share a domain is called a family. There are several databases of protein families such as Pfam (
  • Motifs are short, conserved regions of proteins, typically consisting of a pattern of amino acids that characterizes a protein family (


  • HMM domains can also be defined and used to group proteins into families

Domain-based Paralogous Families can be generated

Domain Content of Entire Proteome can be computed

All the proteins from a genome

HMM search against Pfam profiles

Alignment search against homology-based domain alignments

The search results are stored in the database in the form of domain-based alignments

Organize the proteins into domain-based paralogous families

  • Related families share one or more domains with other families
  • Many putative novel domains are extensions of existing domains
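The grouping step above can be sketched in a few lines. This is only an illustration under stated assumptions: a "family" is taken to be the set of proteins with the same domain content, "related" families are those sharing at least one domain, and the protein and domain names (`p1`, `domA`, ...) are hypothetical stand-ins for real search results.

```python
from collections import defaultdict
from itertools import combinations

def build_families(protein_domains):
    """Group proteins by identical domain content (assumed family definition)."""
    families = defaultdict(list)
    for protein, domains in protein_domains.items():
        families[frozenset(domains)].append(protein)
    return {content: sorted(members) for content, members in families.items()}

def related_families(families):
    """Return pairs of families whose domain content overlaps."""
    ordered = sorted(families, key=sorted)  # stable, readable ordering
    return [(a, b) for a, b in combinations(ordered, 2) if a & b]

# Hypothetical domain hits per protein (e.g. parsed from HMM search results).
hits = {
    "p1": {"domA"},
    "p2": {"domA"},
    "p3": {"domA", "domB"},
    "p4": {"domC"},
}

families = build_families(hits)
related = related_families(families)
```

Here `p1` and `p2` form one family, and that family is related to the `p3` family through the shared `domA`.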
Hidden Markov Models (HMMs)

Statistical representations of sequence patterns.

A query sequence is scored by how likely it is that the HMM would produce it.
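As a rough illustration of this scoring idea, the forward algorithm sums the probability of a sequence over all state paths of an HMM. The two-state model, its two-letter alphabet and all probabilities below are toy assumptions; real profile HMMs use match, insert and delete states and work in log space.

```python
import math

def forward_probability(seq, states, start, trans, emit):
    """P(seq | HMM), summed over all state paths (forward algorithm)."""
    # alpha[s]: probability of emitting the prefix so far and ending in state s
    alpha = {s: start[s] * emit[s][seq[0]] for s in states}
    for symbol in seq[1:]:
        alpha = {
            s: emit[s][symbol] * sum(alpha[r] * trans[r][s] for r in states)
            for s in states
        }
    return sum(alpha.values())

def log_odds(seq, hmm_prob, background):
    """Log2 ratio of the HMM likelihood to a simple background (null) model."""
    null = math.prod(background[c] for c in seq)
    return math.log2(hmm_prob / null)

# Toy model (all values are illustrative assumptions).
states = ("A", "B")
start = {"A": 0.5, "B": 0.5}
trans = {"A": {"A": 0.9, "B": 0.1}, "B": {"A": 0.1, "B": 0.9}}
emit = {"A": {"x": 0.8, "y": 0.2}, "B": {"x": 0.2, "y": 0.8}}
background = {"x": 0.5, "y": 0.5}

p_xx = forward_probability("xx", states, start, trans, emit)
score_xx = log_odds("xx", p_xx, background)
```

A positive log-odds score means the sequence is better explained by the model than by the background.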








Procedure for Preparing an HMM Seed
  • Inspect and edit a pairwise aligned group of gene products:

- Eliminate fragments

- Correct the alignment

- Remove sequence outside domain

- Eliminate redundancy

- BLAST, annotate and possibly expand the seed.
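Some of the mechanical steps above (trimming to the domain, eliminating fragments, eliminating redundancy) could be sketched as follows. The thresholds, the domain coordinates and the sequences are hypothetical, the sketch trims before checking for fragments, and manual alignment correction and the BLAST step are not covered.

```python
def percent_identity(a, b):
    """Fraction of identical residues over columns where both rows are ungapped."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)

def clean_seed(alignment, start, end, min_coverage=0.8, max_identity=0.9):
    """Trim to alignment columns [start, end), drop fragments, drop near-duplicates."""
    width = end - start
    kept = {}
    for name, row in alignment.items():
        domain = row[start:end]                      # remove sequence outside domain
        coverage = (width - domain.count("-")) / width
        if coverage < min_coverage:                  # eliminate fragments
            continue
        if any(percent_identity(domain, other) > max_identity
               for other in kept.values()):          # eliminate redundancy
            continue
        kept[name] = domain
    return kept

# Hypothetical aligned sequences (equal length, '-' marks a gap).
alignment = {
    "seqA": "MKTAYIAKQRGG",
    "seqB": "MKTAYIAKQRGG",  # identical to seqA, so redundant
    "seqC": "--TAY-------",  # fragment
    "seqD": "MRSGFLSRQKAA",
}

seed = clean_seed(alignment, start=0, end=10)
```

With these inputs only `seqA` and `seqD` remain in the seed.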


Homology-Based Alignment:

HMM Seed:

Trusted Hits:

What is Gene Ontology (GO)?

The Gene Ontology is a set of dynamic controlled vocabularies used to describe gene products in terms of their associated biological processes, cellular components and molecular functions in a species-independent manner (

The Three Ontologies

Molecular function, biological process and cellular component are considered attributes of gene products.

  • Biological Process (a)
    • A biological objective
    • has more than one distinct step
  • Molecular Function (b)
    • what the gene product does
    • Think ‘activity’
  • Cellular Component (c)
    • location in the cell (or smaller unit)
    • or part of a complex
Assigning GO IDs

Each GO ID is qualified with an evidence code.

Evidence codes are:

IMP: inferred from mutant phenotype

IGI: inferred from genetic interaction

IPI: inferred from physical interaction

IDA: inferred from direct assay

IEP: inferred from expression pattern

ISS: inferred from sequence or structural similarity

IEA: inferred from electronic annotation

IC: inferred by curator

TAS: traceable author statement

NAS: non-traceable author statement

ND: no biological data available

NR: not recorded (no longer used)

  • Experimental evidence
  • Sequence similarity
  • Calculated by algorithm
  • Author statement

The “with/from” field

ISS, IPI and IGI annotations must also record an accession: that of the similarity hit (ISS) or of the interacting entity (IPI, IGI).
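One way to picture these rules is a small record type that rejects unknown evidence codes and insists on an accession for ISS, IPI and IGI. This is a minimal sketch, not an official GO data model: the class name, the field name `with_field`, the gene product name and the accession are all hypothetical.

```python
from dataclasses import dataclass

EVIDENCE_CODES = {
    "IMP", "IGI", "IPI", "IDA", "IEP", "ISS",
    "IEA", "IC", "TAS", "NAS", "ND",
}
# Codes that must name the similarity hit or the interacting entity.
REQUIRES_WITH = {"ISS", "IPI", "IGI"}

@dataclass(frozen=True)
class GoAnnotation:
    gene_product: str
    go_id: str
    evidence: str
    with_field: str = ""  # accession of the hit / interacting entity

    def __post_init__(self):
        if self.evidence not in EVIDENCE_CODES:
            raise ValueError(f"unknown evidence code: {self.evidence}")
        if self.evidence in REQUIRES_WITH and not self.with_field:
            raise ValueError(f"{self.evidence} requires an accession")

# Hypothetical gene product; GO:0006412 is the biological process 'translation'.
direct = GoAnnotation("geneX", "GO:0006412", "IDA")
by_similarity = GoAnnotation("geneX", "GO:0006412", "ISS", "DB:hypothetical_acc")
```

Creating an ISS annotation without the accession raises `ValueError`, which mirrors the requirement stated above.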

Gene ontologies can help interpret large-scale datasets

K-means clustering using TIGR Multi-Experiment Viewer (TMEV)


Cluster 4

Cluster 10

Translation, transcription
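For readers unfamiliar with k-means, the idea can be sketched in plain Python: each gene's expression profile is assigned to the nearest cluster centroid, and the centroids are recomputed until assignments stop changing. This is not TMEV's implementation; the gene names and expression values are made up, and the seeded random initialization is an assumption chosen for reproducibility.

```python
import random

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(profiles, k, iterations=50, seed=0):
    """Return {gene: cluster index} for expression profiles of equal length."""
    rng = random.Random(seed)
    names = sorted(profiles)
    # Start from k randomly chosen profiles as initial centroids.
    centroids = [list(profiles[n]) for n in rng.sample(names, k)]
    assignment = {}
    for _ in range(iterations):
        new_assignment = {
            n: min(range(k), key=lambda c: squared_distance(profiles[n], centroids[c]))
            for n in names
        }
        if new_assignment == assignment:  # converged
            break
        assignment = new_assignment
        for c in range(k):
            members = [profiles[n] for n in names if assignment[n] == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment

# Hypothetical expression profiles over four conditions.
profiles = {
    "g1": [0.0, 1.0, 2.0, 3.0],
    "g2": [0.0, 1.1, 2.1, 2.9],
    "g3": [3.0, 2.0, 1.0, 0.0],
    "g4": [2.9, 2.1, 1.0, 0.1],
}

clusters = kmeans(profiles, k=2)
```

Once genes are clustered, the GO terms attached to the members of each cluster can be inspected to see which processes dominate it.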