An Introduction to Bioinformatics
PowerPoint slideshow about 'An Introduction to Bioinformatics' - shanta



Finding genes in prokaryotes


AIMS

To establish the concept of ORFs and their relationship to genes

To describe the features used by software to find ORFs/genes

To become familiar with Web-based programmes used to find ORFs/genes

OBJECTIVES

To be able to distinguish between the concepts of ORF and gene

To be able to use ORF Finder to find ORFs in prokaryotic nucleotide sequences


Usually the primary challenge that follows the sequencing of anything from a small segment of DNA to a complete genome is to establish the location of functional elements such as:

genes (intron/exon boundaries)

promoters

terminators, etc.

DNA sequences that could potentially encode proteins are called Open Reading Frames (ORFs)

The situation in prokaryotes is relatively straightforward, since very few eubacterial or archaeal genes contain introns


FINDING ORFs

The simplest method in prokaryotes is to scan the DNA for start and stop codons

The DNA is double-stranded and each strand has three potential reading frames (codons are groups of 3 bases)

THE CAT ATE THE RAT Frame 1

T HEC ATA TET HER AT Frame 2

TH ECA TAT ETH ERA T Frame 3

The scan must look at all 6 reading frames
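To make the six-frame idea concrete, here is a minimal Python sketch (the slides contain no code). It assumes a DNA string containing only A, C, G and T; `reverse_complement` and `six_frames` are hypothetical helper names, not part of any tool mentioned here. The three reverse-strand frames are read from the reverse complement, and incomplete trailing codons are dropped.

```python
from typing import Dict, List

# Minimal sketch: split a DNA sequence into codons in all six reading frames.
# Assumes the sequence contains only A, C, G, T (upper or lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement, i.e. the other strand read 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def six_frames(seq: str) -> Dict[str, List[str]]:
    """Map frame labels '+1'..'+3' and '-1'..'-3' to lists of codons.

    Incomplete codons at the 3' end of each frame are dropped.
    """
    seq = seq.upper()
    frames: Dict[str, List[str]] = {}
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            codons = [s[i:i + 3] for i in range(offset, len(s) - 2, 3)]
            frames[f"{strand}{offset + 1}"] = codons
    return frames


if __name__ == "__main__":
    for label, codons in six_frames("ATGCCCTAA").items():
        print(label, " ".join(codons))
```

For ATGCCCTAA, frame +1 reads ATG CCC TAA and frame -1 (on the reverse complement) reads TTA GGG CAT.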


Any region of DNA between a start codon and a stop codon in the same reading frame could potentially code for a polypeptide and is therefore an ORF

Start codon: AUG (methionine)

Stop codons: UAA, UAG, UGA

(In the DNA these are scanned for as ATG, and TAA, TAG, TGA)

Short potential coding sequences like this occur frequently by chance; therefore, the longer an ORF is, the more likely it is to represent a real coding region (a gene)
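Why length matters can be checked with a back-of-envelope calculation. Assuming an idealised random sequence in which all 64 codons are equally likely (real genomes have biased composition), 3 of the 64 codons are stops in the standard code, so the chance that n consecutive codons contain no stop is (61/64)^n. The helper names below are hypothetical.

```python
import math

# Idealised model: every codon equally likely, 3 of the 64 are stop codons.
STOP_FRACTION = 3 / 64


def p_no_stop(n_codons: int, stop_fraction: float = STOP_FRACTION) -> float:
    """Probability that n random codons in a row contain no stop codon."""
    return (1 - stop_fraction) ** n_codons


def mean_codons_to_stop(stop_fraction: float = STOP_FRACTION) -> float:
    """Mean number of codons read until (and including) the first stop.

    The waiting time is geometric, so the mean is 1 / stop_fraction.
    """
    return 1 / stop_fraction


def min_codons_for(alpha: float, stop_fraction: float = STOP_FRACTION) -> int:
    """Smallest n for which a random stretch of n codons is stop-free
    with probability at most alpha."""
    return math.ceil(math.log(alpha) / math.log(1 - stop_fraction))


if __name__ == "__main__":
    print(round(mean_codons_to_stop(), 1))
    print(p_no_stop(100))
    print(min_codons_for(0.01))
```

Under this model `mean_codons_to_stop()` is about 21.3 codons, `p_no_stop(100)` is about 0.008, and `min_codons_for(0.01)` returns 96: chance ORFs are typically short, so a long ORF is unlikely to be a random artefact.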

Problems

Small genes may be missed if they fall below the minimum length cut-off

The actual start codon may be internal to the ORF

There may be overlapping genes


The simplest tool for finding ORFs is ORF Finder at NCBI

It simply scans all 6 reading frames and shows the positions of ORFs longer than a user-defined minimum size

The genetic code used for the analysis can be changed by the user

This is important if, e.g., mitochondrial or ciliate nuclear DNA is being analysed, because these use non-standard genetic codes
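The kind of scan that ORF Finder performs can be sketched in a few lines of Python. This is an illustrative re-implementation under simplifying assumptions, not NCBI's code: only ATG is used as a start codon, each ORF runs from the first ATG after the previous in-frame stop up to and including the next stop, and the stop-codon sets are those of NCBI translation tables 1 (standard), 2 (vertebrate mitochondrial) and 6 (ciliate nuclear). `find_orfs`, `ORF` and the default `min_codons=100` are hypothetical choices.

```python
from typing import FrozenSet, List, NamedTuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Stop codons (DNA alphabet) for three NCBI translation tables.
STANDARD_STOPS = frozenset({"TAA", "TAG", "TGA"})          # table 1
VERT_MITO_STOPS = frozenset({"TAA", "TAG", "AGA", "AGG"})  # table 2
CILIATE_STOPS = frozenset({"TGA"})                         # table 6


class ORF(NamedTuple):
    strand: str  # "+" or "-"
    frame: int   # 1, 2 or 3, counted on the strand being read
    start: int   # 0-based start on the forward strand
    end: int     # 0-based exclusive end on the forward strand
    codons: int  # length in codons, including the stop codon


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 100,
              stops: FrozenSet[str] = STANDARD_STOPS) -> List[ORF]:
    """Report ATG...stop ORFs of at least `min_codons` codons in all 6 frames."""
    seq = seq.upper()
    n = len(seq)
    orfs: List[ORF] = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            start = None  # index of the currently open ATG in this frame
            for i in range(offset, n - 2, 3):
                codon = s[i:i + 3]
                if start is None:
                    if codon == "ATG":
                        start = i
                elif codon in stops:
                    end = i + 3
                    length = (end - start) // 3
                    if length >= min_codons:
                        # Convert reverse-strand indices to forward-strand ones.
                        a, b = (start, end) if strand == "+" else (n - end, n - start)
                        orfs.append(ORF(strand, offset + 1, a, b, length))
                    start = None
    return orfs
```

Swapping the genetic code only changes the `stops` argument: `find_orfs("ATGTAGAAATGA", min_codons=1)` stops at TAG after 2 codons, whereas with `stops=CILIATE_STOPS` the same ORF extends to TGA (4 codons). Non-standard codes can also alter start codons, which this sketch ignores.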


To overcome the limitations of ORF Finder, more sophisticated programmes detect compositional biases, which increases the reliability of gene detection

These compositional biases are regular, though very diffuse, and arise for a variety of reasons:

in many organisms there is a detectable preference for G or C over A or T at the third ("wobble") position of a codon

organisms do not use synonymous codons with equal frequency; consequently there is a codon bias

amino acids are used unequally in proteins, which is sufficient to cause a bias at all three codon positions and increases the overall codon bias


the %GC content of the first two codon positions of the universal genetic code is approximately 50%; therefore, organisms with a low or high overall %GC content show a marked bias at the third codon position in order to achieve that overall %GC content
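Both kinds of bias are easy to measure on a known coding sequence. The sketch below assumes an in-frame coding sequence whose length is a multiple of three; it computes the GC fraction at each codon position (the third-position value is often called GC3) and a raw codon-usage count. The function names are illustrative.

```python
from collections import Counter
from typing import Tuple


def gc_by_codon_position(cds: str) -> Tuple[float, float, float]:
    """GC fraction at codon positions 1, 2 and 3 of an in-frame coding sequence."""
    cds = cds.upper()
    if not cds or len(cds) % 3:
        raise ValueError("expected a non-empty, in-frame coding sequence")
    n_codons = len(cds) // 3
    gc1, gc2, gc3 = (
        sum(base in "GC" for base in cds[pos::3]) / n_codons
        for pos in range(3)
    )
    return gc1, gc2, gc3


def codon_usage(cds: str) -> Counter:
    """Count each codon separately, so synonymous codons can be compared."""
    cds = cds.upper()
    return Counter(cds[i:i + 3] for i in range(0, len(cds) - 2, 3))
```

For GCA GCC GCG (three alanine codons) positions 1 and 2 are entirely G/C, while the third, wobble position varies; comparing such profiles between known genes and non-coding DNA is the kind of signal compositional gene finders exploit.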

The most recent approaches to using compositional features to distinguish coding from non-coding regions employ ‘Markov models’

Such approaches include the popular GENEMARK and GLIMMER programs
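The idea behind Markov-model scoring can be shown with a deliberately simplified sketch: train one order-k Markov chain on known coding sequence and another on non-coding sequence, then score a candidate by the log-likelihood ratio. The coding model here is "three-periodic" (separate transition counts for each codon position), which is in the spirit of GENEMARK; GLIMMER instead uses interpolated Markov models, and both programs are considerably more elaborate than this. Class and function names are hypothetical, and the toy training strings are invented for illustration.

```python
import math
from collections import defaultdict
from typing import Iterable


class MarkovModel:
    """Order-k Markov chain over DNA.

    period=3 keeps separate counts for each codon position (a simple
    three-periodic coding model); period=1 gives a homogeneous model.
    Training and scored sequences are assumed to start at codon position 1.
    """

    def __init__(self, order: int = 2, period: int = 1, pseudocount: float = 1.0):
        self.order = order
        self.period = period
        self.pseudocount = pseudocount
        # counts[phase][context][next_base] -> count
        self.counts = [defaultdict(lambda: defaultdict(float)) for _ in range(period)]

    def train(self, seqs: Iterable[str]) -> "MarkovModel":
        for seq in seqs:
            seq = seq.upper()
            for i in range(self.order, len(seq)):
                self.counts[i % self.period][seq[i - self.order:i]][seq[i]] += 1
        return self

    def log_prob(self, seq: str) -> float:
        """Log probability of seq, conditioned on its first `order` bases."""
        seq = seq.upper()
        total = 0.0
        for i in range(self.order, len(seq)):
            table = self.counts[i % self.period].get(seq[i - self.order:i], {})
            denom = sum(table.values()) + 4 * self.pseudocount
            total += math.log((table.get(seq[i], 0.0) + self.pseudocount) / denom)
        return total


def coding_log_odds(seq: str, coding: MarkovModel, noncoding: MarkovModel) -> float:
    """Positive values favour 'coding', negative values favour 'non-coding'."""
    return coding.log_prob(seq) - noncoding.log_prob(seq)


if __name__ == "__main__":
    # Invented toy training data, for illustration only.
    coding = MarkovModel(order=2, period=3).train(["GCTGCAGCCGCG" * 10])
    noncoding = MarkovModel(order=2, period=1).train(["ATATTATAATTA" * 10])
    print(coding_log_odds("GCTGCCGCAGCT", coding, noncoding))  # positive
    print(coding_log_odds("TATATTAATA", coding, noncoding))    # negative
```

Pseudocounts keep unseen contexts from producing log(0); real programs use far larger training sets and higher orders.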



Finding Genes in Eukaryotes


AIMS

To establish the concept of ORFs and their relationship to genes

To describe the features used by software to find ORFs/genes

To become familiar with Web-based programmes used to find ORFs/genes

To describe the complications of eukaryotic “signals”

To be aware of the available Web-based programmes

OBJECTIVES

To be able to distinguish between the concepts of ORF and gene

To be able to use ORF Finder to find ORFs in prokaryotic nucleotide sequences

To be able to use eukaryotic gene-finding programmes for a number of organisms


Eukaryotes: organisms whose cells have a membrane-bound nucleus and many specialised structures (organelles) located within their cell boundary.

In these organisms, genetic material is organised into chromosomes that reside in the nucleus.


Principles

  • Content: codon usage

    • often species- or class-specific

  • Signals: PWMs (position weight matrices)

    • the principle is the same as in prokaryotes, but the signals are different

  • Complication of introns/exons
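A position weight matrix (PWM) scores a candidate signal by adding, position by position, the log-odds of each base relative to a background. The sketch below assumes equal-length aligned example sites and a uniform 25% background; the four donor-like training sites in the usage lines are invented for illustration, not real data, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

BASES = "ACGT"


def build_pwm(sites: Sequence[str], pseudocount: float = 0.5,
              background: float = 0.25) -> List[Dict[str, float]]:
    """Log2-odds PWM from aligned, equal-length example sites."""
    if not sites or any(len(s) != len(sites[0]) for s in sites):
        raise ValueError("need aligned sites of equal length")
    pwm = []
    for pos in range(len(sites[0])):
        column = [s[pos].upper() for s in sites]
        total = len(column) + pseudocount * len(BASES)
        pwm.append({b: math.log2((column.count(b) + pseudocount) / total / background)
                    for b in BASES})
    return pwm


def pwm_score(pwm: List[Dict[str, float]], window: str) -> float:
    """Sum of per-position log-odds; the window must contain only A, C, G, T."""
    return sum(col[base] for col, base in zip(pwm, window.upper()))


def scan(pwm: List[Dict[str, float]], seq: str) -> List[Tuple[int, float]]:
    """Score every window of the PWM's width along seq."""
    width = len(pwm)
    return [(i, pwm_score(pwm, seq[i:i + width])) for i in range(len(seq) - width + 1)]


if __name__ == "__main__":
    sites = ["GTAAGT", "GTGAGT", "GTAAGA", "GTAAGT"]  # invented examples
    pwm = build_pwm(sites)
    print(max(scan(pwm, "CCCGTAAGTCCC"), key=lambda hit: hit[1]))
```

The highest-scoring window here starts at index 3, where the sequence matches the consensus of the example sites; in practice a score threshold, rather than a single maximum, is used to call candidate sites.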


Eukaryotic promoter

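[Figure: eukaryotic promoter upstream of the transcription start site (+1) of the mRNA (5' to 3'), showing the CAAT box, GC box and TATA box; positions -110, -40, -25 and +1 are marked]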

In addition, there are transcription factor binding sites

Genes can be enormous!

Transcription can be controlled by “distant” enhancers


Signals on the mRNA

Kozak sequence: at the translational start (AUG)

STOP codon: at the end of the coding region

Polyadenylation signal: AAUAAA, with the poly(A) tail (AAAAA...) added at a site ~12 bp downstream
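Two of these mRNA signals can be checked with simple string rules. The sketch below works on the DNA (cDNA) sequence, so the polyadenylation signal AAUAAA is searched for as AATAAA. For the Kozak context it uses a common simplification, treating a purine (A or G) at position -3 and a G at position +4 (counting the A of ATG as +1) as the key features; the "strong/adequate/weak" labels and the function names are illustrative assumptions rather than any program's output.

```python
import re
from typing import List


def kozak_context(seq: str, atg: int) -> str:
    """Classify the context of the ATG whose A is at 0-based index `atg`.

    Simplified rule: purine at -3 and G at +4 -> "strong";
    only one of the two -> "adequate"; neither -> "weak".
    """
    seq = seq.upper()
    if seq[atg:atg + 3] != "ATG":
        raise ValueError(f"no ATG at index {atg}")
    purine_minus3 = atg >= 3 and seq[atg - 3] in "AG"
    g_plus4 = atg + 3 < len(seq) and seq[atg + 3] == "G"
    return ("weak", "adequate", "strong")[int(purine_minus3) + int(g_plus4)]


def polya_signals(seq: str, start: int = 0) -> List[int]:
    """0-based positions of AATAAA at or after `start` (overlaps allowed)."""
    seq = seq.upper()
    return [m.start() for m in re.finditer(r"(?=AATAAA)", seq) if m.start() >= start]
```

Passing the index of the stop codon as `start` restricts the polyadenylation search to the region downstream of the coding sequence.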


Introns and exons

    Chicken 12 collagen gene

    has - 38 kb > 50 Introns

    Muscular Dystrophy gene is 2.5 Mb and has

    ? Exons!


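Splicing signals

Consensus sequence at the exon/intron boundaries (DNA strand; | marks each splice site):

5' exon ...(C/A)AG|GT(A/G)AGT ... intron ... (T/C)n N (C/T)AG|G... 3' exon, with a pyrimidine tract of n > 11 bases

GT-AG rule: almost all introns begin with GT and end with AG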
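The consensus above can be turned into a crude candidate-intron finder with regular expressions. This is a sketch only: it requires exact matches to the consensus (donor (C/A)AG|GT(A/G)AGT; acceptor with more than 11 pyrimidines, any base, then (C/T)AG|G), whereas real splice-site predictors score sites with PWMs or richer models and tolerate mismatches. The function name and the default minimum intron length are hypothetical.

```python
import re
from typing import List, Tuple

# Exact-match versions of the consensus (DNA, forward strand).
# Group 1 covers the intronic part, so its boundaries are the splice sites.
DONOR = re.compile(r"(?=[AC]AG(GT[AG]AGT))")            # exon MAG|GTRAGT intron
ACCEPTOR = re.compile(r"(?=([CT]{12,}[ACGT][CT]AG)G)")  # intron Y(n>11) N YAG|G exon


def candidate_introns(seq: str, min_len: int = 20) -> List[Tuple[int, int]]:
    """(start, end) pairs, 0-based and end-exclusive, for stretches that begin
    at a consensus donor and end at a consensus acceptor (so GT...AG)."""
    seq = seq.upper()
    donors = sorted({m.start(1) for m in DONOR.finditer(seq)})
    acceptors = sorted({m.end(1) for m in ACCEPTOR.finditer(seq)})
    return [(d, a) for d in donors for a in acceptors if a - d >= min_len]
```

The zero-width lookahead `(?=...)` lets overlapping matches be found; many start positions inside one pyrimidine tract map to the same acceptor, which the set removes.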


Exon finding

Four types of exon are distinguished:

  • Initial exons, from the initiation codon to the first splice site;

  • Internal exons, from splice site to splice site;

  • Terminal exons, from the last splice site to the stop codon;

  • Single exons, corresponding to uninterrupted, intronless genes, i.e. running from the initiation codon to the stop codon.
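The four exon types are defined purely by which kind of signal bounds each end, so they can be expressed as a lookup table. The sketch below assumes exons on the forward strand, where an exon's left end follows a 3' splice site (acceptor) or the start codon and its right end precedes a 5' splice site (donor) or ends at the stop codon; `Boundary` and `exon_type` are hypothetical names.

```python
from enum import Enum


class Boundary(Enum):
    START_CODON = "initiation codon"
    ACCEPTOR = "3' splice site (intron ends ...AG)"
    DONOR = "5' splice site (intron starts GT...)"
    STOP_CODON = "stop codon"


# (left boundary, right boundary) -> exon type, following the four classes above.
EXON_TYPES = {
    (Boundary.START_CODON, Boundary.DONOR): "initial",
    (Boundary.ACCEPTOR, Boundary.DONOR): "internal",
    (Boundary.ACCEPTOR, Boundary.STOP_CODON): "terminal",
    (Boundary.START_CODON, Boundary.STOP_CODON): "single",
}


def exon_type(left: Boundary, right: Boundary) -> str:
    """Return the exon class for a pair of boundary signals."""
    try:
        return EXON_TYPES[(left, right)]
    except KeyError:
        raise ValueError(f"not a coding exon: {left.name} .. {right.name}") from None
```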


Integrated gene parsing

  • Search for signals

  • Perform a content analysis

  • Define the intron/exon boundaries
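The last step, choosing one consistent set of exons from many scored candidates, is often done by dynamic programming. The sketch below is a simplified "exon chaining" step under stated assumptions: each candidate exon already carries a single combined score (in practice derived from the signal and content analyses), chosen exons must not overlap and must be separated by at least `min_intron` bases, and reading-frame compatibility across splice junctions, which real gene parsers such as GENSCAN must also enforce, is ignored. All names and scores are hypothetical.

```python
from bisect import bisect_right
from typing import List, NamedTuple, Tuple


class Exon(NamedTuple):
    start: int    # 0-based
    end: int      # exclusive
    score: float  # hypothetical combined signal + content score


def best_exon_chain(candidates: List[Exon], min_intron: int = 0) -> Tuple[float, List[Exon]]:
    """Highest-scoring set of non-overlapping exons, in order (weighted
    interval scheduling); reading-frame compatibility is ignored."""
    exons = sorted(candidates, key=lambda e: e.end)
    ends = [e.end for e in exons]
    n = len(exons)
    best = [0.0] * (n + 1)   # best[i]: best total using only the first i exons
    back = [None] * (n + 1)  # predecessor count if exon i-1 is used, else None
    for i in range(1, n + 1):
        exon = exons[i - 1]
        # Number of exons that end at least min_intron bases before this one starts.
        j = bisect_right(ends, exon.start - min_intron)
        with_exon = best[j] + exon.score
        if with_exon > best[i - 1]:
            best[i], back[i] = with_exon, j
        else:
            best[i] = best[i - 1]
    chain: List[Exon] = []
    i = n
    while i > 0:
        if back[i] is None:
            i -= 1
        else:
            chain.append(exons[i - 1])
            i = back[i]
    return best[n], chain[::-1]
```

With candidates (0-100, score 5), (50-150, 8), (120-250, 3) and (200-300, 4) and `min_intron=20`, the best chain is 50-150 followed by 200-300 with total score 12.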


Gene finding web sites

More than 25 gene-finding sites are listed, including:

GENSCAN

FGENES

http://www.tigr.org/~salzberg/appendixa.html

