DNA sequence analysis

  • Gene prediction methods
  • Gene indices
  • Mapping cDNA on genomic DNA
  • Genome-genome comparison
  • Applications

Computational Molecular Biology

MPI for Molecular Genetics

DNA sequences: gene structure (eukaryotes)

[Figure: structure of a eukaryotic gene. Labels: promoter, 5' UTR, exon 1, exon 2, ..., exon n-1, exon n, 3' UTR, protein-coding sequence.]

DNA sequences: repeats, repetitive elements

  • LINE (Long INterspersed Elements)
  • SINE (Short INterspersed Elements, e.g. Alu)
  • Transposons
  • Simple repeats (e.g. ATATA...)

DNA sequences: repeats, repetitive elements

  • High copy number
  • Sequence variability
  • Mostly located in untranslated regions

Gene prediction: strategies for detecting ORFs / exons
  • Distribution of stop codons
  • Codon usage
  • Hexamer frequencies
  • Prediction of the coding frame
  • Splice site recognition (eukaryotes only)
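The first strategy, the distribution of stop codons, can be illustrated with a minimal sketch. It scans the three forward reading frames for stretches that start with ATG and run to the next in-frame stop codon. The function name `find_orfs` and the `min_codons` cutoff are assumptions made for this example; the programs named in this presentation combine several signals and are far more elaborate.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return (frame, start, end) tuples for ATG...stop stretches.

    frame is 0, 1 or 2; start and end are 0-based, end is exclusive
    and includes the stop codon. Only the forward strand is scanned.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                # keep the candidate only if it is long enough
                if (i - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None
    return orfs


print(find_orfs("CCATGAAACCCGGGTAGTT"))
```

A real exon finder would also score codon usage and hexamer frequencies inside each candidate; this sketch covers only the stop-codon criterion.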

Gene prediction: by sequence comparison
  • Comparison of genomic DNA and cDNA/ESTs
  • Comparison of related genomic DNA of different organisms

Gene prediction: codon usage (single exon)

[Figure: codon usage profile for Frame 1, Frame 2 and Frame 3; labels: coding, non-coding.]

Gene prediction: codon usage (single exon)

[Figure: codon usage profile for Frame 1, Frame 2 and Frame 3; labels: coding, non-coding, coding sequence, correct start.]

Gene prediction: codon usage (multiple exons)

Exons: 208..295, 1029..1349, 1500..1688, 2686..2934, 3326..3444, 3573..3680, 4135..4309, 4708..4846, 4993..5096, 7301..7389, 7860..8013, 8124..8405, 8553..8713, 9089..9225, 13841..14244

[Figure: codon usage profile for Frame 1, Frame 2 and Frame 3; labels: coding, non-coding, splice sites.]


Gene prediction: additional criteria
  • Detection of start codons
  • Detection of potential promoter elements
  • Detection of repetitive sequences (mostly untranslated)
  • Homology to known genes of related organisms

Gene prediction: software
  • GENSCAN (C. Burge & S. Karlin)
  • Grail (neural network; Uberbacher et al.)
  • MZEF (M. Zhang, 1997)
  • FGeneH, Hexon (V. Solovyev et al., 1994)
  • Genie, etc.

All of these programs use dynamic programming to find the optimal solution.

DNA sequences in public databases

H: ~ 2.8 million ESTs + 130 000 RNAs
Mouse: ~ 1.8 million ESTs + 30 000 RNAs

Expressed sequence tags (EST)

  • cDNA synthesis is usually primed with oligo(dT) or with random primers
  • Reverse transcriptase stops 'randomly'
  • Several cDNAs may be generated for the same mRNA

[Figure: mRNA with poly(A) tail (AAAAAA...) and cDNA with the matching TTTTTT... stretch.]

Expressed sequence tags (EST)

[Figure: a clone in a vector. Labels: deciphered sequence (EST), < 700 bp; clone = mRNA fragment; 3' primer; vector (known sequence); average: 1500 bp.]

Expressed sequence tags (EST)
  • Isolation of mRNAs from tissue(s)
  • Generation of cDNAs reflecting parts of the RNAs
  • Cloning of cDNAs into a vector (often random orientation)
  • End sequencing of the clones

Generation of ESTs: basecalling problems

[Figure: examples close to the 5' end of the EST and close to the 3' end of the EST; label: missing bases.]

Coverage of an mRNA by ESTs

[Figure: expressed sequence tags (ESTs) aligned along a putative mRNA with 5' UTR, exon 1, exon 2, 3' UTR and poly(A) tail (AAAAAA...).]

Characteristics of ESTs
  • Highly redundant
  • Low sequence quality
  • (Cheap)
  • Reflect expressed genes
  • May be tissue/stage specific

Gene indices

Clustering of the EST and mRNA sequences of an organism to reduce redundancy in the sequence data.

Goal: each cluster represents one gene or mRNA.

  • UniGene (NCBI)
  • TIGR Gene Indices
  • STACK (SANBI)
  • GeneNest (DKFZ, MPI)

Gene indices: GeneNest workflow

[Flow chart: EMBL database and UniGene database, each followed by quality clipping; then BLAST/QUASAR search and clustering; then assembly and consensus sequences; then visualization.]

Gene indices: quality clipping

To cluster on gene-specific sequence data, the following steps have to be performed:

  • Removal of vector sequence
  • Masking of repetitive sequences (e.g. Alu)
  • Removal of terminal sequences of low quality
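The clipping steps can be sketched in a few lines. This is a simplified illustration, not the GeneNest implementation: the function names, the quality threshold of 20 and the exact-match handling of vector and repeat sequences are all assumptions made for the example. Real pipelines locate vector and repeats by alignment rather than by exact string matching.

```python
def remove_vector_prefix(seq, vector_end):
    """Drop a leading vector fragment if the read starts with it (exact match)."""
    if vector_end and seq.startswith(vector_end):
        return seq[len(vector_end):]
    return seq


def mask_repeat(seq, repeat):
    """Replace every exact occurrence of a known repeat with N's."""
    return seq.replace(repeat, "N" * len(repeat))


def clip_low_quality_ends(seq, quals, min_qual=20):
    """Trim bases from both ends until a base with quality >= min_qual is met."""
    start, end = 0, len(seq)
    while start < end and quals[start] < min_qual:
        start += 1
    while end > start and quals[end - 1] < min_qual:
        end -= 1
    return seq[start:end], quals[start:end]


read = "ACGTACGT"
quals = [5, 10, 30, 30, 30, 30, 8, 3]
print(clip_low_quality_ends(read, quals))
print(mask_repeat("AAGGCCTT", "GGCC"))
```

Masking with N instead of deleting keeps the coordinates of the read intact, so later alignment positions still refer to the original sequence.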

Gene indices: clustering

Sequences are usually clustered if the matching part between two sequences fulfils several (empirical) criteria:

  • Minimal % identity (e.g. > 95%)
  • Minimal length of match (e.g. > 40 bp)
  • No internal matches (TIGR Gene Indices)
  • Same tissue of origin (STACK only)
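The first two criteria can be written down as a small check on an aligned match, followed by a grouping step. The sketch below assumes the match is already available as two equal-length aligned strings, and it assumes that pairwise links are joined transitively (single linkage), which the slides do not state. All function names are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Fraction of alignment columns with identical, non-gap characters."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        return 0.0
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return matches / len(aligned_a)


def passes_criteria(aligned_a, aligned_b, min_identity=0.95, min_length=40):
    """Apply the example thresholds: > 95% identity over > 40 bp."""
    return (len(aligned_a) > min_length
            and percent_identity(aligned_a, aligned_b) > min_identity)


def single_linkage(n, links):
    """Group n sequences into clusters given pairs (i, j) that passed the check."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in links:
        parent[find(a)] = find(b)
    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


a = "ACGT" * 12
b = "T" + a[1:]  # one mismatch in 48 columns
print(passes_criteria(a, b))
print(single_linkage(4, [(0, 1), (2, 3)]))
```

The remaining criteria (no internal matches, same tissue of origin) depend on where the match lies within each sequence and on library annotation, so they are left out here.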

Gene indices: assembly

Sequences in a cluster are assembled to group those that are globally similar, resulting in:

  • Contigs, reflecting partially different sequences
  • One consensus sequence per contig
  • A relative order of the sequences (alignment)

Gene indices: consensus sequences
  • Reduced error rate
  • Consensus often longer than any single sequence contributing
  • Efficient database search
  • Detection of exon/intron boundaries and alternative splice variants
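Why a consensus has a lower error rate and can be longer than any contributing sequence is easiest to see with a column-wise majority vote. The sketch below assumes the reads are already placed in a common alignment, with "." marking positions a read does not cover. That padding convention and the function name are assumptions for this example; assemblers such as Phrap or CAP3 also weigh base qualities.

```python
from collections import Counter


def consensus(aligned_reads, pad="."):
    """Column-wise majority vote over aligned reads.

    pad marks positions that a read does not cover. Ties are resolved
    in favour of the base seen first in the column.
    """
    width = max(len(r) for r in aligned_reads)
    out = []
    for col in range(width):
        bases = [r[col] for r in aligned_reads if col < len(r) and r[col] != pad]
        if bases:
            out.append(Counter(bases).most_common(1)[0][0])
    return "".join(out)


reads = [
    "ACGTAC....",
    "..GTTCGG..",  # T at column 4 is outvoted by the two other reads
    "....ACGGTA",
]
print(consensus(reads))
```

Each read covers six bases, while the consensus spans ten, and the single discordant base in the second read does not reach the result.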

Gene indices: alignment

[Figure: aligned sequences with their consensus.]

Gene indices: alignment software
  • Phrap (Phil Green)
  • CAP3 (X. Huang)
  • TIGR assembler
  • GAP4 (R. Staden)

GeneNest visualization (http://genenest.molgen.mpg.de)

TIGR Gene Indices (http://www.tigr.org/)

[Figure: alignment scheme.]

UniGene (http://www.ncbi.nlm.nih.gov/UniGene)

Mapping of EST consensus sequences on genomic DNA

[Figure: consensus sequence (mRNA) aligned to the genomic sequence. Labels: exons, missing intron.]

Mapping cDNA on genomic DNA (http://splicenest.molgen.mpg.de)

Genome-genome comparison

[Figure: an ancestral gene and the corresponding mouse and human genes; some stretches are marked with x.]

x = region with low mutation rate

Genome-genome comparison
  • Conserved coding regions (protein similarity, similar function)
  • Conserved coding exons (protein domain similarity, functional feature)
  • Conserved non-coding regions (regulatory sites, transcription factor binding sites)

Gene indices: applications
  • Detection of exon/intron boundaries
  • Detection of alternative splicing
  • Detection of single nucleotide polymorphisms
  • Genome annotation
  • Analysis of gene expression
  • Design of DNA chips/arrays

Alternative splicing

  • hnRNA: 5' UTR, exon 1, exon 2, exon 3
  • mRNA 1: 5' UTR, exon 1, exon 3
  • mRNA 2: 5' UTR, exon 1, exon 2

Alternative splicing

[Figure: consensus sequence (mRNA) aligned to the genomic sequence. Labels: exons, splice variant.]

Alternative splicing (additional exon)

Splice variants of adenylosuccinate lyase

[Figure labels: unspliced?, skipped exon, gene prediction errors?]

Alternative splicing

Splice variants of the APECED gene

[Figure labels: alternative variants, number of sequences, genomic sequence.]

Alternative splicing (alternative donor site)

Alternative splicing (alternative exons)

Alternative splicing (unknown gene Hs16936)

Single nucleotide polymorphisms (SNP)
  • SNPs are single-base differences within one species
  • Several million SNPs have been detected in human
  • SNPs may be related to diseases

SNP or basecalling error?
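One simple way to separate candidate SNPs from isolated basecalling errors is to require that the second most frequent base in an alignment column is supported by more than one sequence. The sketch below assumes gap-padded, equal-orientation aligned reads. The function name and the `min_support` threshold are assumptions for this example, not values from the presentation.

```python
from collections import Counter


def candidate_snps(aligned_reads, min_support=2, pad="."):
    """Return (column, major_base, minor_base) for columns where the
    second most frequent base occurs in at least min_support reads."""
    width = max(len(r) for r in aligned_reads)
    hits = []
    for col in range(width):
        counts = Counter(
            r[col] for r in aligned_reads
            if col < len(r) and r[col] not in (pad, "-")
        )
        ranked = counts.most_common(2)
        if len(ranked) == 2 and ranked[1][1] >= min_support:
            hits.append((col, ranked[0][0], ranked[1][0]))
    return hits


reads = ["ACGTA", "ACGTA", "ACCTA", "ACCTA", "ACGTT"]
print(candidate_snps(reads))
```

Column 2 is reported because two reads agree on the alternative base; the single deviating base in column 4 is treated as a likely basecalling error.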

Genome annotation / Ensembl (http://www.ensembl.org)

Analysis of gene expression: tissue specificity
  • Counting the frequency of ESTs derived from a specific tissue within one sequence cluster
  • Searching for clusters/contigs which are tissue specific (e.g. tumor)
  • Searching for alternative splice variants which are potentially tissue specific
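Counting ESTs per tissue within a cluster amounts to building a frequency table. The sketch below assumes each EST is annotated with the tissue of its library. The data layout, function names and the `min_fraction` and `min_ests` thresholds are hypothetical and serve only to illustrate the idea.

```python
from collections import Counter


def tissue_profile(cluster_ests):
    """cluster_ests: iterable of (est_id, tissue). Returns tissue -> fraction."""
    counts = Counter(tissue for _, tissue in cluster_ests)
    total = sum(counts.values())
    return {tissue: n / total for tissue, n in counts.items()}


def tissue_specific_clusters(clusters, tissue, min_fraction=0.9, min_ests=5):
    """Names of clusters dominated by one tissue.

    min_ests guards against calling a cluster specific when only a
    handful of clones were sequenced.
    """
    hits = []
    for name, ests in clusters.items():
        if len(ests) < min_ests:
            continue
        if tissue_profile(ests).get(tissue, 0.0) >= min_fraction:
            hits.append(name)
    return hits


cluster = [("e1", "brain"), ("e2", "brain"), ("e3", "liver"), ("e4", "brain")]
print(tissue_profile(cluster))
```

Raw counts reflect both the expression level and how deeply each library was sequenced, so a production analysis would normalise by library size.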

Analysis of gene expression: tissue specificity

[Figure: neuron-specific gene (Hs90005).]

Analysis of gene expression: internal priming

Analysis of gene expression: tissue specificity

Analysis of tissue specificity depends on
  • expression level
  • number of clones sequenced

Design of DNA chips/arrays: non-redundant gene set
  • Selection of 'optimal' clones
  • Generation of gene-specific PCR products

Design of DNA chips/arrays: 'optimal' clones
  • clone availability
  • type of clone library
  • length of the clone
  • relative position to the consensus sequence
  • homology to other genes
  • existence of repetitive elements

Design of DNA chips/arrays: gene-specific PCR products

[Figure: consensus sequence of a putative gene with exon A, exon B and exon C. Labels: similarity to another gene, repetitive sequence, potential gene-specific fragment (twice).]

Design of DNA chips/arrays: optimal gene-specific PCR product
  • minimal similarity to other genes
  • minimal content of repetitive sequences
  • not spanning several exons
  • roughly constant length of the PCR products of different genes

Primer design: what are primers?
  • short oligonucleotides (15-25 bp)
  • unique sequence
  • defined melting temperature

Primer design: primer hybridization/elongation

  5' TTTCAGTAATTAAAAAGATTTCTGT 3'
     |||||||||||||||||||||||||
3'...AAAGTCATTAATTTTTCTAAAGACACCGGTAAA...5'
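The hybridization shown above can be checked programmatically: a primer binds where the template strand, read 5' to 3', contains the reverse complement of the primer. The sketch uses the two sequences from this slide; the function names are assumptions for the example, and only perfect matches are considered.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def binding_site(primer, template_5to3):
    """0-based start of the perfect binding site on the template, or -1."""
    return template_5to3.find(reverse_complement(primer))


primer = "TTTCAGTAATTAAAAAGATTTCTGT"
# The slide writes the template 3'->5'; reverse it to get 5'->3'.
template_3to5 = "AAAGTCATTAATTTTTCTAAAGACACCGGTAAA"
template_5to3 = template_3to5[::-1]

print(reverse_complement(primer))
print(binding_site(primer, template_5to3))
```

The polymerase extends the primer from its 3' end, which is why the part of the template to the right of the primer in the slide is copied next.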

Primer design: applications
  • DNA sequencing
  • Polymerase Chain Reaction (PCR)
  • DNA chip/array design

Primer design: features
  • Melting temperature
  • Self-complementarity
  • Secondary binding capacity

Primer design: melting temperature / 2+4 rule

TTTCAGTAATTAAAAAGATTTCTGT

5 G/C x 4°C + 20 A/T x 2°C = 60°C
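The 2+4 rule is simple enough to express directly: each G or C contributes 4°C and each A or T contributes 2°C. The function name below is an assumption for this example, and the rule itself is only a rough estimate intended for short oligonucleotides.

```python
def tm_2_plus_4(primer):
    """Estimate the melting temperature in degrees Celsius with the 2+4 rule."""
    primer = primer.upper()
    gc = primer.count("G") + primer.count("C")
    at = primer.count("A") + primer.count("T")
    return 4 * gc + 2 * at


print(tm_2_plus_4("TTTCAGTAATTAAAAAGATTTCTGT"))  # 5 G/C and 20 A/T -> 60
```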

Primer design: thermodynamic stability / nearest neighbour

TTTCAGTAATTAAAAAGATTTCTGT

[Figure: nearest-neighbour stability values marked along the primer: -1.2, -1.7, -1.5 kcal/mol.]

Primer design: self-complementarity

       5' TTTCAGTAATTAAAAAGATTTCTGT 3'
          | |   ||||||   | |
3' TGTCTTTAGAAAAATTAATGACTTT 5'

All primers that are able to form internal loops are also able to form dimers.
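Self-complementarity can be estimated by sliding a primer along an antiparallel copy of itself and counting complementary base pairs at each offset. The following sketch does exactly that and nothing more. The function name is an assumption for this example, and a real primer design tool would weight the pairs by thermodynamic stability instead of simply counting them.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def max_self_pairs(primer):
    """Return (pairs, offset) for the antiparallel self-alignment with
    the largest number of complementary base pairs."""
    top = primer.upper()
    bottom = top[::-1]  # second copy of the primer, written 3'->5'
    n = len(top)
    best = (0, 0)
    for offset in range(-(n - 1), n):
        pairs = sum(
            1
            for i in range(n)
            if 0 <= i + offset < n and COMPLEMENT.get(top[i]) == bottom[i + offset]
        )
        if pairs > best[0]:
            best = (pairs, offset)
    return best


print(max_self_pairs("GAATTC"))  # a fully self-complementary site
print(max_self_pairs("TTTCAGTAATTAAAAAGATTTCTGT"))
```

The count does not require the pairs to be contiguous; runs of consecutive pairs, such as the palindromic TAATTA inside the example primer, are what make a dimer stable in practice.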

Primer design: secondary binding sites

5' TTTCAGTAATTAAAAAGATTTCTGT 3'
      ||    | | |    |||||||
3' ACGGTAGGCATTCTACGAAAAGACA 5'

The stability of the 3'-terminal bases gets a higher weight, reflecting its importance for the polymerase.

Primer design: secondary binding sites / suffix tree

[Figure: suffix tree with branches labelled by single bases (A, C, G, T). Sequences shown: AACGTAGCC..., ...NACGTCAAA..., ...NACGTCGCA...; leaf labels: GTAGCC..., AGCC..., AAA..., GCA...]
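To show how an index over suffixes speeds up the search for secondary binding sites, here is a minimal sketch of an uncompressed suffix trie. It is a simplification of the suffix tree named on the slide: a true suffix tree merges chains of single-child nodes and can be built in linear time. The function names, the `max_depth` limit and the use of the key "#" to store start positions are all assumptions for this example.

```python
def build_suffix_trie(text, max_depth=None):
    """Index every suffix of text (optionally truncated to max_depth).

    Each node is a dict mapping a base to a child node; the key "#"
    holds the start positions of all suffixes passing through the node.
    """
    root = {}
    for start in range(len(text)):
        end = len(text) if max_depth is None else min(len(text), start + max_depth)
        node = root
        for ch in text[start:end]:
            node = node.setdefault(ch, {})
            node.setdefault("#", []).append(start)
    return root


def find_occurrences(trie, pattern):
    """Start positions of all exact occurrences of pattern in the indexed text."""
    node = trie
    for ch in pattern:
        if ch not in node:
            return []
        node = node[ch]
    return node.get("#", [])


trie = build_suffix_trie("AACGTAGCCACGT", max_depth=6)
print(find_occurrences(trie, "ACGT"))
```

Once the index is built, each lookup costs time proportional to the length of the query rather than to the length of the indexed sequence, which matters when many candidate primers are checked against the same target.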