
BIOINFORMATICS - PowerPoint PPT Presentation

BIOINFORMATICS. Ayesha M. Khan Spring 2013. GENE PREDICTION/GENE FINDING. The vast amount of raw sequence data generated because of advancement in sequencing technology needs biological interpretation Known as ‘annotation’ To find genes and determine their functions. Protein coding genes



Ayesha M. Khan

Spring 2013


Gene prediction / gene finding

  • The vast amount of raw sequence data generated because of advancement in sequencing technology needs biological interpretation

  • Known as ‘annotation’

  • To find genes and determine their functions



  • Protein coding genes

    • Prokaryotic

      • No introns, simpler regulatory features

    • Eukaryotic

      • Exon-intron structure

      • Complex regulatory features



  • Coding sequence

    • The actual region of DNA that is translated to form protein. While an ORF in genomic DNA may also contain introns, the CDS refers only to those nucleotides that are divided into codons and translated into amino acids by the ribosomal translation machinery. In prokaryotes, which have no introns, the ORF and the CDS are the same.
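
The difference between a genomic region and its CDS can be sketched in a few lines. This is only an illustration: the toy sequence and the exon coordinates (0-based, end-exclusive) are made up for the example and do not come from the slides.

```python
def splice_cds(genomic, exons):
    """Concatenate exon slices (0-based, end-exclusive) into a CDS."""
    return "".join(genomic[start:end] for start, end in exons)


def codons(cds):
    """Split a CDS into consecutive triplets, dropping any trailing partial codon."""
    usable = len(cds) - len(cds) % 3
    return [cds[i:i + 3] for i in range(0, usable, 3)]


# Toy gene: two exons (upper case) separated by one intron (lower case).
genomic = "ATGGCC" + "gtaagtttcag" + "AAATAA"
exons = [(0, 6), (17, 23)]  # hypothetical coordinates for the toy sequence

cds = splice_cds(genomic, exons)
cds_codons = codons(cds)
print(cds)
print(cds_codons)
```

In a prokaryotic gene the exon list would hold a single interval, so the spliced CDS and the ORF would be identical.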


What is gene prediction?

  • Which region codes for a protein?

  • Which DNA strand is used to encode the gene?

  • Where does the gene start and end?

  • Where are the exon-intron boundaries in eukaryotes?

  • Where (optionally) are the regulatory sequences for that gene?

The characterization of genomic features using computational and experimental methods is called gene prediction or annotation.


Computational methods of gene prediction

Computational gene finding is a process of:

  • Identifying common phenomena in known genes

  • Building a computational framework/model that can accurately describe the common phenomena

  • Using the model to scan uncharacterized sequence to identify regions that match the model, which become putative genes

  • Testing and validating the predictions
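
One possible way to carry out the final validation step is to compare predicted coding intervals with a trusted reference annotation, base by base. The sketch below assumes intervals are 0-based and end-exclusive; the function names and the example coordinates are hypothetical and are not taken from the slides.

```python
def covered_positions(intervals):
    """Expand (start, end) intervals, 0-based end-exclusive, into a set of positions."""
    positions = set()
    for start, end in intervals:
        positions.update(range(start, end))
    return positions


def nucleotide_accuracy(predicted, reference):
    """Return (sensitivity, precision) of predicted coding bases against a reference."""
    pred = covered_positions(predicted)
    ref = covered_positions(reference)
    true_pos = len(pred & ref)
    sensitivity = true_pos / len(ref) if ref else 0.0
    precision = true_pos / len(pred) if pred else 0.0
    return sensitivity, precision


reference = [(100, 200), (300, 400)]  # hypothetical curated exons
predicted = [(110, 200), (300, 420)]  # hypothetical model output
sens, prec = nucleotide_accuracy(predicted, reference)
print(f"sensitivity={sens:.3f} precision={prec:.3f}")
```

Sensitivity here is the fraction of reference coding bases that were predicted, and precision is the fraction of predicted bases that are truly coding.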


Biological overview of ‘gene’

  • Gene: defined as a segment of DNA that contains the necessary information to produce a functional product, usually a protein.

    • DNA (or RNA in some viruses)

    • Promoter: controls the activity of a gene

    • Coding sequence: determines what the gene produces

Core promoter: the minimal portion of the promoter required to initiate transcription properly

Proximal promoter: tends to contain the primary regulatory elements; serves as a binding site for specific transcription factors

ORF: open reading frame

Usually starts with ATG (the start codon), though not always

Terminates with TAA, TAG or TGA (stop codons)
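
The ORF definition above can be turned into a simple scanner. The following is a minimal sketch, assuming ATG is the only start codon (the slides note this is not always true) and that the sequence has no introns; the function names and the `min_codons` parameter are illustrative choices. Both strands are scanned, since either strand may encode the gene.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=2):
    """Yield (strand, frame, start, end, orf) for ATG...stop stretches.

    start and end are 0-based, end-exclusive offsets on the scanned strand.
    min_codons counts codons including the start and stop codons.
    """
    seq = seq.upper()
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first in-frame ATG opens the ORF
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        yield strand, frame, start, i + 3, s[start:i + 3]
                    start = None  # look for the next ORF in this frame


orfs = list(find_orfs("AAATGGCCTAAGG"))
print(orfs)
```

A scanner like this only answers the prokaryote-style question of where an uninterrupted reading frame lies; locating exon-intron boundaries in eukaryotes needs the signal and content sensors described later.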


Extrinsic/Homology Method

  • Based on sequence similarity of query sequence with annotated genes present in databases.

  • It is estimated that only approximately half of all genes can be found by homology to other known genes or proteins.

  • Based on the following principles:

    • Coding regions evolve more slowly than non-coding regions, so local sequence similarity can be used as a gene finder

    • Homologous sequences reflect a common evolutionary origin and possibly a common gene structure.

    • Standard pairwise comparison methods can be used (BLAST or Smith-Waterman)

    • Gene syntax information (start/stop codons etc.) can be included

    • Useful to confirm predictions inferred by other methods
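
As an illustration of the pairwise comparison step, here is a minimal Smith-Waterman local alignment sketch. The scoring values (match 2, mismatch -1, linear gap -2) are arbitrary assumptions chosen for the example, and the two sequences are toy data; production tools such as BLAST use heuristics and substitution matrices instead of this exhaustive dynamic programme.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best local score, aligned part of a, aligned part of b)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    def substitution(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the matrix; scores are never allowed to drop below zero.
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                0,
                score[i - 1][j - 1] + substitution(i, j),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until a zero score is reached.
    i, j = best_pos
    top, bottom = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        if score[i][j] == score[i - 1][j - 1] + substitution(i, j):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


query = "GGGATGGCCAAAGGG"   # toy genomic fragment
subject = "TTATGGCCAAATT"   # toy annotated sequence
sw_score, sw_query, sw_subject = smith_waterman(query, subject)
print(sw_score, sw_query, sw_subject)
```

The locally conserved stretch is recovered even though the flanking bases differ, which is the behaviour the homology principle relies on.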


Intrinsic/Ab initio Method

  • Predicts genes based on statistical properties of the given DNA sequence.

  • Uses statistical patterns inside and outside of gene regions, as well as the typical patterns at their boundaries.
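
A minimal sketch of the ab initio idea: train one statistical model on known coding sequence and one on non-coding sequence, then score a new region by the log-odds of the two. The example uses first-order Markov chains with a pseudocount purely for brevity; the training sequences are invented toy data, and real gene finders typically use higher-order, reading-frame-aware models.

```python
import math

ALPHABET = "ACGT"


def train_markov(sequences, pseudocount=1.0):
    """Estimate first-order transition probabilities P(next | current)."""
    counts = {a: {b: pseudocount for b in ALPHABET} for a in ALPHABET}
    for seq in sequences:
        for cur, nxt in zip(seq, seq[1:]):
            counts[cur][nxt] += 1
    return {
        a: {b: counts[a][b] / sum(counts[a].values()) for b in ALPHABET}
        for a in ALPHABET
    }


def log_odds(seq, coding_model, noncoding_model):
    """Sum of log(P_coding / P_noncoding) over the transitions in seq."""
    return sum(
        math.log(coding_model[cur][nxt] / noncoding_model[cur][nxt])
        for cur, nxt in zip(seq, seq[1:])
    )


# Invented training data: GC-rich "coding" and AT-rich "non-coding" examples.
coding_model = train_markov(["ATGGCCGCGGCCGGCTAA", "ATGGCGGCCGCCGGGTGA"])
noncoding_model = train_markov(["ATATTTAATTATAAATAT", "TTTAAATATATTAATTTA"])

gc_rich_score = log_odds("GCCGCGGCC", coding_model, noncoding_model)
at_rich_score = log_odds("ATATTATAT", coding_model, noncoding_model)
print(gc_rich_score, at_rich_score)
```

A positive score means the region looks more like the coding training set, a negative score more like the non-coding one.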


Features for gene prediction in eukaryotes

Signal sensors

Content sensors (extrinsic and intrinsic content sensors)

  • Signal sensors

    Evaluate fixed-length features in DNA

    Signals: splice sites, start/stop codons, branch points, promoters and terminators of transcription, polyadenylation sites, ribosome-binding sites, topoisomerase II binding sites, various transcription factor-binding sites etc.

    • These are measures that try to detect the presence of the functional sites specific to a gene.

    • The basic signal sensor is a simple consensus sequence or an expression that describes a consensus sequence along with allowable variations.

    • Use of weight matrices
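
A weight matrix signal sensor can be sketched as follows. The aligned example sites are made-up, donor-splice-like hexamers used only for illustration; the uniform background of 0.25 and the pseudocount of 1 are likewise assumptions, and the code expects sequences containing only A, C, G and T.

```python
import math

ALPHABET = "ACGT"


def build_pwm(sites, background=0.25, pseudocount=1.0):
    """Build a log-odds position weight matrix from equal-length aligned sites."""
    total = len(sites) + pseudocount * len(ALPHABET)
    pwm = []
    for pos in range(len(sites[0])):
        column = [site[pos] for site in sites]
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in ALPHABET
        })
    return pwm


def scan(seq, pwm):
    """Score every window of seq and return (best_offset, best_score)."""
    width = len(pwm)
    scores = [
        (sum(pwm[k][seq[i + k]] for k in range(width)), i)
        for i in range(len(seq) - width + 1)
    ]
    best_score, best_offset = max(scores)
    return best_offset, best_score


example_sites = ["GTAAGT", "GTGAGT", "GTAAGA", "GTAAGT"]  # hypothetical sites
pwm = build_pwm(example_sites)
best_offset, best_score = scan("CCACGGTAAGTCCA", pwm)
print(best_offset, round(best_score, 2))
```

Unlike a plain consensus string, the matrix tolerates the allowable variations at each position and ranks candidate sites by how closely they match.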


Features for gene prediction in eukaryotes (contd.)

  • Content sensors

    Evaluate variable-length features, which extend from one signal to another

    They classify a DNA region into different types, e.g. coding vs. non-coding

    Extrinsic content sensor

    • These sensors perform similarity searching between a genomic sequence region and a protein or DNA sequence present in a database.

    • Basic tools are needed for similarity searching, e.g. BLAST, FASTA etc.

    • Intragenomic and Intergenomic comparisons


Features for gene prediction in eukaryotes (contd.)

Intrinsic content sensor

Based on statistical models of the nucleotide frequencies and dependencies present in codon structure

  • Use of Markov models (MM)

  • CpG islands (regions that often mark the beginning of genes, where the frequency of the CG dinucleotide is not as low as it is in the rest of the genome)

  • Sensors for repetitive DNA (e.g. Alu sequences)
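
A CpG island sensor can be approximated by comparing the observed number of CG dinucleotides with the number expected from the C and G counts. The sketch below computes that ratio together with the GC fraction for a single window; the function name is illustrative. Commonly cited criteria call a window an island when it is at least about 200 bp long, has a GC fraction above 0.5 and an observed/expected ratio above 0.6, but these thresholds are an assumption here and are not stated in the slides.

```python
def cpg_stats(seq):
    """Return (GC fraction, observed/expected CpG ratio) for one window."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    observed = seq.count("CG")           # CG cannot overlap itself, so count() is exact
    expected = c * g / n if n else 0.0   # expected CG count if C and G were independent
    gc_fraction = (c + g) / n if n else 0.0
    ratio = observed / expected if expected else 0.0
    return gc_fraction, ratio


island_like = cpg_stats("CGCGCGCG")
at_rich = cpg_stats("ATATATAT")
print(island_like, at_rich)
```

In practice the function would be applied to a sliding window along the genome, and runs of windows passing the chosen thresholds would be reported as candidate islands.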


Gene prediction tools

Software based on ab initio methods

GENSCAN, FGENESH, GeneMark.hmm, Glimmer, Genie, GeneID

Software based on similarity-based methods

GeneWise, SYNCOD, ORFgene2, EbEST