How to find a gene
PowerPoint slideshow about 'How to find a gene?*', uploaded by naava


How to find a gene?*

  • One way is to search for an open reading frame (ORF).

  • An ORF is a sequence of codons in DNA that starts with a Start codon, ends with a Stop codon, and has no other Stop codons inside.

    * = inexact science
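The ORF definition above can be turned into a simple scan. The Python below is a minimal sketch, not a production gene finder: it assumes ATG is the only start codon and TAA, TAG and TGA are the stop codons (the standard genetic code), and the function name `find_orfs` is hypothetical.

```python
from __future__ import annotations

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, frame: int) -> list[tuple[int, int]]:
    """Return (start, end) pairs, 0-based and end-exclusive, for each ORF in
    one reading frame (frame 0, 1 or 2) of a single strand.

    An ORF here starts at ATG and ends with the first in-frame stop codon,
    which is included in the interval.
    """
    seq = seq.upper()
    orfs = []
    start = None  # position of the currently open ATG, if any
    for i in range(frame, len(seq) - 2, 3):  # every complete codon in this frame
        codon = seq[i:i + 3]
        if start is None and codon == "ATG":
            start = i
        elif start is not None and codon in STOP_CODONS:
            orfs.append((start, i + 3))
            start = None
    return orfs


seq = "atgcccaagctgaatagcgtagaggggttttcatcatttgagtaa"
for frame in range(3):
    print(frame + 1, find_orfs(seq, frame))
# 1 [(0, 45)]
# 2 []
# 3 []
```

Because the scan keeps the first ATG it sees until a stop codon closes the frame, nested ATGs are not reported separately, and an ATG with no downstream in-frame stop is not reported at all.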


Each strand has 3 possible reading frames (6 per double-stranded DNA molecule).

5' atgcccaagctgaatagcgtagaggggttttcatcatttgagtaa 3'

Frame 1: atg ccc aag ctg aat agc gta gag ggg ttt tca tca ttt gag taa
         M   P   K   L   N   S   V   E   G   F   S   S   F   E   *

Frame 2: tgc cca agc tga ata gcg tag agg ggt ttt cat cat ttg agt
         C   P   S   *   I   A   *   R   G   F   H   H   L   S

Frame 3: gcc caa gct gaa tag cgt aga ggg gtt ttc atc att tga gta
         A   Q   A   E   *   R   R   G   V   F   I   I   *   V

Only frame 1 is an ORF: it starts with a Start codon (atg), ends with a Stop codon (taa), and has no Stop codon in between.
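The three-frame table above can be reproduced mechanically. This sketch assumes the standard genetic code and input containing only A, C, G and T; `translate_frame` is a hypothetical helper name. The codon table is built from the conventional TCAG ordering, in which the 64-letter string lists the amino acid for TTT, TTC, TTA, ..., GGG in turn.

```python
from itertools import product

# Standard genetic code. Codons are enumerated in TCAG order
# (TTT, TTC, TTA, TTG, TCT, ... GGG); '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_frame(seq: str, frame: int) -> str:
    """Translate reading frame 0, 1 or 2 of seq (A/C/G/T only).

    Leftover bases at the 3' end that do not fill a codon are ignored.
    """
    seq = seq.upper()
    return "".join(
        CODON_TABLE[seq[i:i + 3]] for i in range(frame, len(seq) - 2, 3)
    )


seq = "atgcccaagctgaatagcgtagaggggttttcatcatttgagtaa"
for frame in range(3):
    print(f"Frame {frame + 1}: {translate_frame(seq, frame)}")
# Frame 1: MPKLNSVEGFSSFE*
# Frame 2: CPS*IA*RGFHHLS
# Frame 3: AQAE*RRGVFII*V
```

Applying the same function to the reverse complement of the sequence would give the three frames of the other strand.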


Eukaryotic Genomes

  • Finding a gene is much more difficult in eukaryotic genomes than in prokaryotic genomes. WHY??


Prokaryotic (bacterial) genomes:

  • Are much smaller than eukaryotic genomes:

    E. coli = 4,639,221 bp (4.6 Mb)

    Human = ~3,300 Mb


Prokaryotic (bacterial) genomes:

  • Contain fewer genes:

    E. coli: 4,285 protein-coding genes and 122 structural RNA genes

    Human: ~32,000 genes (an early estimate; current annotations list roughly 20,000 protein-coding genes)


Prokaryotic (bacterial) genomes:

Contain a small amount of noncoding DNA:

E. coli: ~11% (average intergenic distance = 130 bp)

H: >95% (there are stretches of hundreds of thousands of bp that apparently contain no gene)
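Noncoding percentages like these are computed from gene coordinates. A minimal sketch, assuming gene positions are given as 0-based, end-exclusive intervals on a single genome of known length; the function name and the toy numbers are hypothetical, not real E. coli data. Genes may overlap, so intervals are merged first so that shared bases are not counted twice.

```python
from __future__ import annotations


def noncoding_fraction(genome_length: int, genes: list[tuple[int, int]]) -> float:
    """Fraction of a genome not covered by any gene.

    genes holds (start, end) intervals, 0-based and end-exclusive. Genes may
    overlap, so intervals are merged before their lengths are summed.
    """
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(genes):
        if cur_end is not None and start <= cur_end:
            cur_end = max(cur_end, end)          # overlaps or touches: extend
        else:
            if cur_end is not None:
                covered += cur_end - cur_start   # close the previous block
            cur_start, cur_end = start, end
    if cur_end is not None:
        covered += cur_end - cur_start
    return (genome_length - covered) / genome_length


# Hypothetical 1,000 bp toy genome: three genes, the first two overlapping.
print(noncoding_fraction(1000, [(0, 400), (350, 600), (700, 900)]))  # 0.2
```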


Eukaryotic Genomes:

  • Contain massive amounts of repetitive DNA sequences (Define).

  • Human: repeat sequences make up over 50% of the genome.

  • E. coli: DNA is almost entirely unique (single-copy).


What are the human repetitive DNA sequences?

  • Simple ‘stutters’ (CAGCAGCAGCAGCAGCAG . . . .)

  • Pseudogenes

  • Transposable elements (>40% of the human genome)

  • Segmental duplications (~10-300 kb)

  • Gene families (possibly a reflection of genome duplications)
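Simple 'stutters' are the easiest repeats to find by computer because each copy is exact. This sketch uses a regular expression with a back-reference to find a 1-6 base unit repeated back-to-back at least `min_copies` times; the function name, the unit-length limit and the default threshold are assumptions. It does not detect imperfect repeats or interspersed repeats such as transposable elements, which need dedicated repeat libraries.

```python
import re


def find_tandem_repeats(seq: str, min_copies: int = 4) -> list:
    """Find exact tandem repeats: a 1-6 base unit repeated back-to-back.

    Returns (start, unit, copies) tuples, 0-based. The lazy {1,6}? makes the
    regex try the shortest unit first at each position.
    """
    pattern = re.compile(r"([ACGT]{1,6}?)\1{%d,}" % (min_copies - 1))
    return [
        (m.start(), m.group(1), len(m.group(0)) // len(m.group(1)))
        for m in pattern.finditer(seq.upper())
    ]


print(find_tandem_repeats("TTCAGCAGCAGCAGCAGCAGTT"))  # [(2, 'CAG', 6)]
```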


Shocking discovery in the mid-1970s:

Eukaryotic genes are interrupted by noncoding DNA!

Almost all transcripts (mRNA) are spliced before leaving the nucleus.


Exon = genetic code

Intron = non-essential DNA??
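Splicing itself is simple once exon boundaries are known; finding those boundaries is the hard part. A sketch assuming 0-based, end-exclusive exon coordinates on a made-up gene sequence (`splice` is a hypothetical name). The toy intron begins with GT and ends with AG, as most real introns do.

```python
from __future__ import annotations


def splice(gene: str, exons: list[tuple[int, int]]) -> str:
    """Join exon segments (0-based, end-exclusive) in genomic order,
    discarding everything between them (the introns)."""
    return "".join(gene[start:end] for start, end in sorted(exons))


# Hypothetical toy gene: exon 1, a lower-case intron (GT...AG), exon 2.
gene = "ATGGCC" + "gtaagtcctttcag" + "GATTAA"
print(splice(gene, [(0, 6), (20, 26)]))  # ATGGCCGATTAA
```

The spliced product reads ATG GCC GAT TAA, a short uninterrupted ORF.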



Variable mutation rate?

  • Most mutations in introns and intergenic DNA are (apparently) harmless

  • Consequently, intron and intergenic DNA sequences diverge much faster than exon sequences.


Shocking discovery in the late 1990s:

  • Some eukaryotic genomes have thousands of genes that are alternatively spliced.

  • In the human genome, it was estimated at the time that 35% of genes undergo alternative splicing; later RNA-seq studies put the figure above 90% of multi-exon genes.
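Alternative splicing multiplies the number of possible transcripts per gene. This sketch models only one form of it, cassette-exon skipping: the first and last exons are always kept and each internal exon is either included or skipped, giving 2^k isoforms for k internal exons. Real genes also use alternative 5' and 3' splice sites, intron retention and mutually exclusive exons; the function name is hypothetical.

```python
from itertools import combinations


def exon_skipping_isoforms(exons: list) -> list:
    """All isoforms that keep the first and last exon and include any subset
    of the internal exons, always in genomic order (needs >= 2 exons)."""
    first, *internal, last = exons
    isoforms = []
    for k in range(len(internal) + 1):
        for kept in combinations(internal, k):  # combinations preserve order
            isoforms.append([first, *kept, last])
    return isoforms


for isoform in exon_skipping_isoforms(["E1", "E2", "E3", "E4"]):
    print("-".join(isoform))
# E1-E4
# E1-E2-E4
# E1-E3-E4
# E1-E2-E3-E4
```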



Bacterial cells are different:

  • Prokaryotic cells: no splicing (i.e., no split genes)

  • Eukaryotic cells: intronless genes are rare (the average number of introns per human gene is 3-7; the highest is 234); the dystrophin gene is >2.4 Mb.


Identifying all of the human genes

  • Is tough

  • Is easy

  • Is really tough


Making it tough:

  • Pseudogenes

  • Large intergenic regions

  • Prevalent and long introns

  • Alternative splicing



8 genes in C. elegans: 5 intronic genes



Is there a gene in there?

5’ CAGACTGTAGTCGTAGTCGTGTAGTCGTATGGCCGTAGTCGTAGTCGATCGTGATTCGTAGTCGTAGTCGTAGTCGTAGTCGTAGTCGTAGTCGAGTCGTAGTCGTAGTCGTAGTCGTAGCTGTAGTCGTAGTCGTAGTCGAGTCGTAGTCGTGTACGTGTAGTCGTAGTCGTAGCTGTACTAGTCGTATGCGTAGTCGTAGTCGTAGCGAGTCTGAGTGTACGTCGTAGTGCTAGTTGCGTAGTCGTAGTCGTAGTGTCGTAGTCGTGTAGCTGTAGTCGTAGTCGTAGTCGTAGTCGTGTACGTAGTGTCGTATGCGAGGCTAGTCAGGTCGTATGGCTAGTATGCGTAGTCGAGTCGTAGTCGTAGTGTACGTCGTAGTGTCAGTCGTCAGTTGACGTACGTAGTGTCGTAGTCGTAGTCGTAGTCGTAGTCGTAGTGAGTGTACGTTGCGTATGGCTATGTATGTGCAGTGCTGTAGTCGTAGTGCTGTAGTCAGTTGCGTAGTGATGTACGTGTATGCGTATGCGTAGTCTGAGTTGCTGAGTGCTAGTCTGAGTGTCGTAGTCGTAGTGCGTAGTCGTATGCGTATGCGTATCGGATTGCGTAGTGTAGCTGTAGTCGTAGTCGTAGTGTCGTAGTCGTGTAGTCAGTCGTGTAGTAGTCGTATGACCGCGGCGCGAGTTGGTGCGGCGGGGGCTATTTTTCGGAGCGTGTAAGGTTATTAGGTTTTTCCTATTATATGCGCTTAGCGTAGCGCGATTAGCGTATAGCGCATTATATATGCGCCTTCTCTCTTCGAGAGATCTCAGCGTCGTAGTGTACGTCGT

CGAGGCACTGTAGTCGTAGTCGTGTAGTCGTATGGCCGTAGTCGTAGTCGATCGTGATTCGTAGTGGTAGTCGTAGTCGTAGTCGTAGTCGTAGTCGAGTCGTAGTCGTAGTCGTAGTCGTAGCTGTAGTCGTAGTCGTAGTCGAGTCGTAGTCGTGTACGTGTAGTCGTAGTCGTAGCTGTACTAGTCGTATGCGTAGTCGTAGTCGTAGCGAGTCTGAGTGTACGTCGTAGTGCTAGTTGCGTAGTCGTAGTCGTAGTGTCGTAGTCGTGTAGCTGTAGTCGTAGTCGTAGTCGTAGTCGTGTACGTAGTGTCGTATGCGAGGCTAGTCAGGTCGTATGGCTAGTATGCGTAGTCGAGTCGTAGTCGTAGTGTACGTCGTAGTGTCAGTCGTCAGTTGACGTACGTAGTGTCGTAGTCGTAGTCGTAGTCGTAGTCGTAGTGAGTGTACGTTGCGTATGGCTATGTATGTGCAGTGCTGTAGTCGTAGTGCTGTAGTCAGTTGCGTAGTGATGTACGTGTATGCGTATGCGTAGTCTGAGTTGCTGAGTGCTAGTCTGAGTGTCGTAGTCGTAGTGCGTAGTCGTATGCGTATGCGTATCGGATTGCGTAGTGTAGCTGTAGTCGTAGTCGTAGTGTCGTAGTCGTGTAGTCAGTCGTGTAGTAGTCGTATGACCGCGGCGCGAGTTGGTGCGGCGGGGGCTATTTTTCGGAGCGTGTAAGGTTATTAGGTTTTTCCTATTATATGCGCTTAGCGTAGCGCGATTAGCGTATAGCGCATTATATATGCGCCTTCTCTCTTCGAGAGATCTCAGCGTCGTAGTGTACGT

CAGACTGTAGTCGTAGTCGTGTAGTCGTATGGCCGTAGTCGTAGTCGATCGTGATTCGTAGTCGTAGTCGTAGTCGTAGTCGGGCTTGTAGTCGAGTCGTAGTCGTAGTCGTAGTCGTAGCTGTAGTCGTAGTCGTAGTCGAGTCGTAGTCGTGTACGTGTAGTCGTAGTCGTAGCTGTACTAGTCGTATGCGTAGTCGTAGTCGTAGCGAGTCTGAGTGTACGTCGTAGTGCTAGTTGCGTAGTCGTAGTCGTAGTGTCGTAGTCGTGTAGCTGTAGTCGTAGTCGTAGTCGTAGTCGTGTACGTAGTGTCGTATGCGAGGCTAGTCAGGTCGTATGGCTAGTATGCGTAGTCGAGTCGTAGTCGTAGTGTACGTCGTAGTGTCAGTCGTCAGTTGACGTACGTAGTGTCGTAGTCGTAGTCGTAGTCGTAGTCGTAGTGAGTGTACGTTGCGTATGGCTATGTATGTGCAGTGCTGTAGTCGTAGTGCTGTAGTCAGTTGCGTAGTGATGTACGTGTATGCGTATGCGTAGTCTGAGTTGCTGAGTGCTAGTCTGAGTGTCGTAGTCGTAGTGCGTAGTCGTATGCGTATGCGTATCGGATTGCGTAGTGTAGCTGTAGTCGTAGTCGTAGTGTCGTAGTCGTGTAGTCAGTCGTGTAGTAGTCGTATGACCGCGGCGCGAGTTGGTGCGGCGGGGGCTATTTTTCGGAGCGTGTAAGGTTATTAGGTTTTTCCTATTATATGCGCTTAGCGTAGCGCGATTAGCGTATAGCGCATTATATATGCGCCTTCTCTCTTCGAGAGATCTCAGCGTCGTAGTGTACGTCGC

3’


How to confirm the identification of a gene?

  • Possible answer: identify the gene by identifying its promoter.


Promoters are DNA regions that control when genes are activated.

Promoter coding region 

[ ]




Demonstration of a consensus sequence.

  • De
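A consensus sequence takes the most common base at each position of a set of aligned sites. The sketch below uses a simple majority vote; the function name and the input sequences are invented for illustration, chosen so the result is TATAAT, the textbook consensus for the E. coli -10 promoter element.

```python
from __future__ import annotations

from collections import Counter


def consensus(aligned: list[str]) -> str:
    """Most common base in each column of equal-length aligned sequences.

    Counter.most_common breaks ties in order of first appearance, so a tie
    goes to whichever tied base occurs first in that column.
    """
    if not aligned or len({len(s) for s in aligned}) != 1:
        raise ValueError("need one or more sequences of equal length")
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*aligned)
    )


# Invented example sites, not real promoter sequences.
sites = ["TATAAT", "TATGAT", "TACAAT", "GATAAT", "TATACT"]
print(consensus(sites))  # TATAAT
```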


Three current bioinformatic challenges:

  • 1) Verification of the data (is it correct?)

  • 2) Thorough annotation of the data (including developing appropriate means of annotating)

  • 3) Handling ever-larger chunks of data


A dot = a promoter. Dark purple = left to right, light purple = right to left. Overlapping genes = green


Inner circle = counterclockwise direction, outer circle = clockwise direction


How to find a gene?

  • Look for substantial ORFs and associated ‘features’.

    ORFs: open reading frames
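The ORF scan can be extended to all six reading frames (three per strand) and filtered for 'substantial' length. This is a sketch under stated assumptions: ATG-only starts, the standard stop codons, and a minimum length `min_codons` whose default of 100 is an arbitrary illustrative cutoff, not a published rule. It ignores the other 'features' (promoters, splice sites, codon bias) that real gene finders combine with ORF length; all function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq: str) -> str:
    """The other strand, read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def six_frame_orfs(seq: str, min_codons: int = 100) -> list:
    """ORFs of at least min_codons codons (stop included) in all six frames.

    Returns (strand, frame, start, end, codons) tuples, longest first.
    Coordinates are 0-based, end-exclusive, on the strand being scanned,
    so '-' strand positions refer to the reverse complement.
    """
    hits = []
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    length = (i + 3 - start) // 3
                    if length >= min_codons:
                        hits.append((strand, frame, start, i + 3, length))
                    start = None
    return sorted(hits, key=lambda hit: hit[4], reverse=True)


print(six_frame_orfs("atgcccaagctgaatagcgtagaggggttttcatcatttgagtaa", min_codons=10))
# [('+', 0, 0, 45, 15)]
```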



Recombinant DNA techniques?

  • Many popular tools of recDNA rely on the principle of DNA hybridization.

  • In large mixes of DNA molecules, complementary sequences will pair.
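Base pairing between two strands can be checked in silico by laying one strand antiparallel against the other. A sketch assuming two equal-length strands, both written 5' to 3', and no gaps; the function names are hypothetical. Real hybridization tolerates some mismatches depending on temperature and salt (stringency), so the sketch returns a fraction rather than a yes/no answer.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def pairing_fraction(strand_a: str, strand_b: str) -> float:
    """Fraction of positions forming Watson-Crick pairs when strand_b
    (written 5'->3') lies antiparallel against strand_a (also 5'->3').

    strand_a[i] pairs with strand_b[-1 - i], which is the same as comparing
    strand_a with the reverse complement of strand_b position by position.
    """
    if len(strand_a) != len(strand_b):
        raise ValueError("strands must have equal length")
    target = reverse_complement(strand_b)
    paired = sum(a == t for a, t in zip(strand_a.upper(), target))
    return paired / len(strand_a)


print(pairing_fraction("ATGCCG", "CGGCAT"))  # 1.0 (perfect complements)
print(pairing_fraction("ATGCCG", "CGGAAT"))  # 5 of 6 positions pair
```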


Hybridization ‘in silico’

  • Algorithms have been written to compare two nucleic acid sequences. Two similar DNA sequences (ones that would hybridize in solution) are said ‘to match’ when the software determines that their similarity is significant.


8/10 = 80%

Mouse ATGCCGTGCTA
      : : : : : : : :
Human ATG--CGGGCAA
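Percent identity is the number behind figures such as "8/10 = 80%". Several conventions exist; this sketch assumes identity = identical columns divided by columns in which neither row has a gap. The function name is hypothetical, and the aligned pair is invented (it is not the mouse/human pair above), built so that 8 of its 10 gap-free columns match.

```python
def percent_identity(row_a: str, row_b: str) -> float:
    """Percent identity of two rows of a pairwise alignment ('-' = gap).

    Assumed convention: identical columns / columns with no gap in either row.
    """
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    compared = [(a, b) for a, b in zip(row_a.upper(), row_b.upper())
                if a != "-" and b != "-"]
    if not compared:
        raise ValueError("no gap-free columns to compare")
    identical = sum(a == b for a, b in compared)
    return 100 * identical / len(compared)


# Invented alignment: 10 gap-free columns, 8 of them identical.
print(percent_identity("ACGTTGCAAGCT", "ACG--GCATGGT"))  # 80.0
```

Dividing by the full alignment length (12 columns here) would instead give 66.7%, which is why it matters to state the convention when quoting an identity figure.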


Protein-protein similarity searches?

  • Many algorithms have been designed to compare strings of amino acids (written in the single-letter amino acid code) and find those with a defined degree of similarity.


60 70 80 90

#1 TSIDQLRATTSYDELRQDGSTTISYDDYSR
   :::.::::::::::::::::::::.:::::
#2 TSIEQLRATTSYDELRQDGSTTISTDDYSR
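The match line under a protein alignment can be generated directly. In this simplified sketch ':' marks an identical residue and '.' any difference; real search tools instead distinguish similar from dissimilar substitutions using a scoring matrix such as BLOSUM62. The function name is hypothetical; the two sequences are the ones shown above.

```python
from __future__ import annotations


def compare_proteins(seq1: str, seq2: str) -> tuple[str, float]:
    """Match line (':' identical, '.' different) and percent identity for two
    equal-length, ungapped amino-acid strings in single-letter code."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have equal length")
    line = "".join(":" if a == b else "." for a, b in zip(seq1, seq2))
    identity = 100 * line.count(":") / len(seq1)
    return line, identity


s1 = "TSIDQLRATTSYDELRQDGSTTISYDDYSR"
s2 = "TSIEQLRATTSYDELRQDGSTTISTDDYSR"
line, identity = compare_proteins(s1, s2)
print(s1, line, s2, sep="\n")
print(f"{identity:.1f}% identical")  # 93.3% identical (28 of 30)
```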


Significance of sequence similarity

  • DNA similarity suggests:

  • Similar function

  • Similar structure

  • Evolutionary relationship


