ALIGNMENT - PowerPoint PPT Presentation


PowerPoint Slideshow about 'ALIGNMENT' - jana



ALIGNMENT

How do we tell whether two macromolecules are similar? Why?

SEQUENCE

STRUCTURE

FUNCTION

Chuck Staben

Alignments
  • DNA:DNA
  • polypeptide:polypeptide

Alignments
  • One-to-One
  • One-to-Database
  • Many-to-Many

Origins of Sequence Similarity
  • Homology
    • common evolutionary descent
  • Similarity in function
    • convergence
  • Chance

History

Necessity

Serendipity


Similarity

GAACAAT
||||||| 7/7 OR 100%
GAACAAT

MISMATCH

GAACAAT
||| ||| 6/7 OR 86%
GAATAAT

Mismatches

GAACAAT
||| ||| 6/7 OR 86%
GAATAAT

Same??

GAACAAT
||| ||| 6/7 OR 86%
GAAGAAT
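
The percentages on these slides are plain identity counts over an ungapped alignment. A minimal sketch, assuming two pre-aligned sequences of equal length (the function name `percent_identity` is illustrative, not from the slides):

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of aligned positions that match in two equal-length, ungapped sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)

print(round(percent_identity("GAACAAT", "GAACAAT")))  # 100
print(round(percent_identity("GAACAAT", "GAATAAT")))  # 86 (6/7)
print(round(percent_identity("GAACAAT", "GAAGAAT")))  # 86 (6/7)
```

Identity treats the C/T and C/G mismatches alike, which is exactly the question "Same??" raises: a simple count cannot tell one kind of substitution from another.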


Terminal Mismatch

     GAACAATttttt
     ||| |||      6/7 OR 86%
aaaccGAATAAT

Count this?


INDELS

GAAgCAAT
||| |||| 7/7 OR 100%
GAA*CAAT

INDEL

(alignment-challenged?)


Indels, cont’d

GAAgCAAT
||| ||||
GAA*CAAT

vs.

GAAggggCAAT
|||    ||||
GAA****CAAT

Similarity Scoring
  • Terminal mismatches (0)
  • Match score (10)
  • Mismatch penalty (-9)
  • Gap penalty (50)
  • Gap extension penalty (3)

DNA Defaults-Bestfit
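
These defaults can be applied mechanically to an alignment that has already been laid out. The sketch below assumes a gap costs the gap penalty once plus the extension penalty for every gapped position, and that unaligned overhangs are simply left out of the input (terminal mismatches score 0). The function name and the `*` gap character are illustrative choices.

```python
MATCH = 10       # match score
MISMATCH = -9    # mismatch penalty
GAP_OPEN = 50    # gap penalty, charged once per gap
GAP_EXTEND = 3   # gap extension penalty, charged per gapped position

def score_alignment(top: str, bottom: str, gap: str = "*") -> int:
    """Score two pre-aligned, equal-length strings; `gap` marks indel positions."""
    if len(top) != len(bottom):
        raise ValueError("aligned strings must be the same length")
    score = 0
    in_gap = False
    for x, y in zip(top, bottom):
        if x == gap or y == gap:
            if not in_gap:          # first position of a new gap
                score -= GAP_OPEN
            score -= GAP_EXTEND
            in_gap = True
        else:
            score += MATCH if x == y else MISMATCH
            in_gap = False
    return score

# Ungapped: 5 matches and 5 mismatches, overhang ignored.
print(score_alignment("GGGGGGGGGG", "GGGGGAAAAA"))            # 5
# Gapped: 10 matches and one gap of length 5.
print(score_alignment("GGGGG*****GGGGG", "GGGGGAAAAAGGGGG"))  # 35
```

For brevity the sketch treats adjacent gaps in opposite sequences as one gap; a fuller scorer would track which sequence the gap is in.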

DNA Scoring

GGGGGGGGGG
|||||*****       5(10)-5(9)=5
GGGGGAAAAAGGGGG

GGGGG*****GGGGG
|||||*****|||||  10(10)-50-5(3)=35
GGGGGAAAAAGGGGG

Absurdity of Low Gap Penalty

GATCGCTACGCTCAGC
 A.C.C..C..T

Perfect similarity, every time!

Algorithms

Optimal Score=Optimal Alignment

Needleman-Wunsch

Dynamic Programming

Optimal Local Alignment

Smith-Waterman
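
Both algorithms fill the same dynamic-programming table and differ only in how the edges and negative cells are handled. The sketch below is a simplified illustration, not the GCG implementation: it returns only the optimal score, and it assumes a linear gap cost (`gap` per gapped position) rather than separate opening and extension penalties.

```python
def align_score(a: str, b: str, local: bool,
                match: int = 10, mismatch: int = -9, gap: int = -20) -> int:
    """Optimal alignment score by dynamic programming.

    local=False: Needleman-Wunsch (global, end-to-end).
    local=True:  Smith-Waterman (best-scoring local segment).
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    if not local:
        # Global alignment pays for leading gaps along both edges.
        for i in range(1, rows):
            H[i][0] = i * gap
        for j in range(1, cols):
            H[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if local:
                cell = max(cell, 0)      # a local alignment may restart anywhere
                best = max(best, cell)
            H[i][j] = cell
    return best if local else H[-1][-1]

print(align_score("ggggg", "ttgggggtt", local=True))   # 50
print(align_score("ggggg", "ttgggggtt", local=False))  # -30
```

The local score keeps only the five matching bases, while the global score must also pay for the four unmatched flanking positions.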

Programs
  • BESTFIT
    • Smith-Waterman
    • SINGLE BEST SIMILARITY
  • GAP
    • Needleman-Wunsch
    • End-to-end ALWAYS
  • COMPARE/DOTPLOT
    • COMPLETE surface of comparison

BESTFIT vs GAP

BESTFIT:

1 ggggg 5
  |||||
3 ggggg 7

GAP:

1 ...gggggaaaaaggggccccc 19
     ||        ||||   ||
1 gggggttttttttggggtttcc 22

Statistical Significance

RaNdOmIzE

Quality: 50 Length: 5

Similarity: 100.000 Identity: 100.000

Average quality, 20 randomizations: 34.2 +/- 9.4

Quality > RANDOM + 2(σ)
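
The randomization test can be sketched as follows. This is an illustration under stated assumptions, not the GCG procedure: the scoring function is a simple ungapped match/mismatch sum, the second sequence is shuffled 20 times, and the names `ungapped_score` and `shuffle_significance` are made up for the example.

```python
import random
import statistics

def ungapped_score(a: str, b: str) -> int:
    """Toy quality score: +10 per match, -9 per mismatch over the overlap."""
    return sum(10 if x == y else -9 for x, y in zip(a, b))

def shuffle_significance(score_fn, a: str, b: str, n: int = 20, seed: int = 0):
    """Return (real score, mean shuffled score, SD, passes the 2-SD rule)."""
    rng = random.Random(seed)
    real = score_fn(a, b)
    letters = list(b)
    shuffled = []
    for _ in range(n):
        rng.shuffle(letters)  # same composition, random order
        shuffled.append(score_fn(a, "".join(letters)))
    mean = statistics.mean(shuffled)
    sd = statistics.stdev(shuffled)
    return real, mean, sd, real > mean + 2 * sd

real, mean, sd, significant = shuffle_significance(
    ungapped_score, "GATCGCTACGCTCAGC", "GATCGCTACGCTCAGC")
print(real, round(mean, 1), round(sd, 1), significant)
```

Applying the rule to the figures on this slide gives a threshold of 34.2 + 2(9.4) = 53.0, so a quality of 50 falls just short even at 100% identity: a perfect match over only five bases is not enough.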

Program Limitations
  • BESTFIT
    • 1000 vs 10,000
  • GAP
    • 1000 vs 1000
  • COMPARE
    • 1000 vs 1000

Memory

Protein Similarity
  • Identity-Easy

WEAK Alignments

  • Chemical Similarity
    • L vs I, K vs R…
  • Evolutionary Similarity

Single-Base Evolution

CAU=H

3rd base   2nd base   1st base
CAC=H      CGU=R      UAU=Y
CAA=Q      CCU=P      GAU=D
CAG=Q      CUU=L      AAU=N
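
The nine neighbors of CAU can be generated from the standard genetic code. A small sketch, assuming the standard code written in U, C, A, G order (the helper name `single_base_neighbors` is illustrative):

```python
BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... GGG; '*' is stop.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

def single_base_neighbors(codon: str) -> dict:
    """Map codon position (1-3) to {mutant codon: amino acid} for every single substitution."""
    result = {}
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                result.setdefault(pos + 1, {})[mutant] = CODON_TABLE[mutant]
    return result

for position, mutants in single_base_neighbors("CAU").items():
    print(position, mutants)
```

For CAU, only one of the nine changes (CAC, at the third base) leaves histidine unchanged.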

Substitution Matrices
  • PAM-Dayhoff
  • BLOSUM-Henikoff

PAM-Dayhoff
  • Related proteins, substitutions constrained by evolution and function
  • “accepted” by evolution (point accepted mutation)
  • 1 PAM = 1% divergence
      • PAM120=closely related proteins
      • PAM250=divergent proteins
  • Log-odds approach
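
A log-odds score compares how often two residues are seen aligned in related proteins with how often they would pair by chance. The sketch below shows the general form only. The frequencies are made-up illustrative numbers, and the log base and scale factor are assumptions, since they differ between matrix families.

```python
import math

def log_odds(q_ij: float, p_i: float, p_j: float, scale: float = 2.0) -> int:
    """Rounded score: scale * log2(observed pair frequency / frequency expected by chance)."""
    return round(scale * math.log2(q_ij / (p_i * p_j)))

# Hypothetical frequencies, for illustration only.
print(log_odds(q_ij=0.02, p_i=0.1, p_j=0.1))   # pair seen twice as often as chance: 2
print(log_odds(q_ij=0.005, p_i=0.1, p_j=0.1))  # pair seen half as often as chance: -2
```

Positive scores mark substitutions that evolution has accepted more often than chance predicts, and negative scores mark those it has accepted less often.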

BLOSUM-Henikoff&Henikoff
  • Align “BLOCKS”
  • Merge sequences within a block that are more than a given % similar into one sequence
  • Calculate “target” frequencies
  • BLOSUM62=62% similar blocks
    • good general purpose
  • BLOSUM30
    • weak similarities

BLOSUM62

BLOSUM62-2

Glu  Asp  Gln  Lys  Arg  His  Gly  Ala
GAA  GAU  CAA  AAA  AGA  CAU  GGA  GCA
GAG  GAC  CAG  AAG  AGG  CAC  GGG  GCG

Gaps
  • No general theory!!
  • Gap penalty = G + L(n): opening cost G plus extension cost L times gap length n
    • indel mutations rare
    • variation in length “easy”

Alignment Statistics
  • Ungapped, local alignments (HSPs)
    • extreme value, not normal distribution
  • S (observed score) vs expected distribution: p
  • E=expected number, chance alignments
  • K, λ: distribution parameters

“chance of finding a needle in a haystack depends on the size of the haystack”
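
The haystack remark follows directly from the Karlin-Altschul expression for ungapped local alignments, E = K·m·n·e^(−λS), where m and n are the query and database lengths. A minimal sketch; the K and λ values used here are placeholders for illustration, since real values depend on the scoring matrix and sequence composition.

```python
import math

def expected_hits(score: float, m: int, n: int, K: float, lam: float) -> float:
    """E: expected number of chance HSPs scoring at least `score`."""
    return K * m * n * math.exp(-lam * score)

def p_value(e: float) -> float:
    """P: probability of at least one chance HSP, given expectation e."""
    return 1.0 - math.exp(-e)

K, LAM = 0.13, 0.32  # placeholder parameters, not taken from the slides
small = expected_hits(60, m=300, n=1_000_000, K=K, lam=LAM)
large = expected_hits(60, m=300, n=100_000_000, K=K, lam=LAM)
print(small, large, p_value(small), p_value(large))
```

The same score of 60 is expected by chance 100 times more often in the database that is 100 times larger.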

“Real” Alignments
  • Multiple HSPs
  • Karlin-Altschul Sum Statistics
  • Heuristic qualities
    • alignments proceed end-to-end ????

Real Alignments

Protein-Protein

Close-Distant

DNA-DNA

Phylogeny

GCG

Myoglobin

Cow-to-Pig

88% identical

Cow-to-Pig cDNA

80% Identity

(88% at aa!)

Coding vs Non-coding Regions

90% in Coding

74% in Non-coding

Third Base of Codon Hypervariable

28 third base

11 second

8 first
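
Counts like these come from tallying differences by codon position in an aligned pair of coding sequences. A small sketch, assuming the two sequences are ungapped, the same length, and start in frame; the example sequences are hypothetical, not the cow and pig cDNAs.

```python
def mismatches_by_codon_position(a: str, b: str) -> list[int]:
    """Return [first, second, third] mismatch counts for two aligned in-frame sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length")
    counts = [0, 0, 0]
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            counts[i % 3] += 1  # i % 3 == 2 is the third base of a codon
    return counts

# Hypothetical pair: Glu-Asp-Gln encoded with different synonymous codons.
print(mismatches_by_codon_position("GAAGAUCAA", "GAGGACCAG"))  # [0, 0, 3]
```

All three differences fall on the third base and none changes the encoded amino acid, which is why the DNA identity drops while the protein identity does not.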

Cow-to-Fish Protein

42% identity

51% similarity

Cow-to-Fish DNA

48% similarity

Significant

Protein vs DNA Alignments
  • Polypeptide similarity > DNAs
  • Coding DNA > Non-coding
  • 3rd base of codon hypervariable
  • Moderate distance → poor DNA similarity
