Aligning Alignments
Soni Mukherjee, 11/11/04

Pairwise Alignment
Given two sequences, find their optimal alignment.
Score = (#matches) * m - (#mismatches) * s - (#gaps) * d
The optimal alignment is the alignment with the maximum score.

Dynamic Programming
We want to align x_1…x_M and y_1…y_N.
Consider the prefixes x_1…x_i and y_1…y_j. Their alignment ends in one of three ways:
1. x_i aligns to y_j
   x_1 …… x_{i-1}  x_i
   y_1 …… y_{j-1}  y_j
2. x_i aligns to a gap
   x_1 …… x_{i-1}  x_i
   y_1 …… y_j      -
3. y_j aligns to a gap
   x_1 …… x_i      -
   y_1 …… y_{j-1}  y_j
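The scoring rule (matches rewarded by m, mismatches penalized by s, gap positions penalized by d) can be illustrated with a small helper that scores an alignment that is already given. This is a minimal sketch: the function name, the default parameter values, and the use of "-" as the gap symbol are assumptions made for illustration, not part of the slides.

```python
def alignment_score(row_a, row_b, m=1, s=1, d=2, gap="-"):
    """Score two aligned rows: (#matches)*m - (#mismatches)*s - (#gaps)*d.

    The default values of m, s, d and the gap symbol are illustrative only.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    matches = mismatches = gaps = 0
    for p, q in zip(row_a, row_b):
        if p == gap or q == gap:
            gaps += 1          # a column containing a gap symbol
        elif p == q:
            matches += 1       # identical residues
        else:
            mismatches += 1    # different residues
    return matches * m - mismatches * s - gaps * d
```

For example, `alignment_score("AC-GT", "ACTGA")` counts three matches, one mismatch, and one gap.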
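Dynamic Programming
Example sequences:
CGCCTAGCTAG
CTGCTATCTTTAG

1. x_i aligns to y_j:    D(i, j) = D(i-1, j-1) + m, if x_i = y_j
                         D(i, j) = D(i-1, j-1) - s, otherwise
2. x_i aligns to a gap:  D(i, j) = D(i-1, j) - d
3. y_j aligns to a gap:  D(i, j) = D(i, j-1) - d

D(i, j) = max {
  D(i-1, j-1) + s(x_i, y_j),
  D(i-1, j) - d,
  D(i, j-1) - d
}
where s(x_i, y_j) = m if x_i = y_j; -s otherwise

[Figure: dynamic programming matrix, with y_1 … y_N along one axis and x_1 … x_M along the other]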
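The three cases can be combined into one table-filling procedure. The sketch below computes only the optimal score, not the alignment itself. The boundary values D(i, 0) = -i*d and D(0, j) = -j*d are an assumption (the slides do not state the initialization), as are the function name and the default parameter values.

```python
def global_align_score(x, y, m=1, s=1, d=2):
    """Fill the D table for global alignment with a linear gap penalty."""
    M, N = len(x), len(y)
    D = [[0] * (N + 1) for _ in range(M + 1)]
    # Assumed boundary: a prefix aligned entirely against gaps.
    for i in range(1, M + 1):
        D[i][0] = -i * d
    for j in range(1, N + 1):
        D[0][j] = -j * d
    for i in range(1, M + 1):
        for j in range(1, N + 1):
            sub = m if x[i - 1] == y[j - 1] else -s
            D[i][j] = max(
                D[i - 1][j - 1] + sub,  # x_i aligns to y_j
                D[i - 1][j] - d,        # x_i aligns to a gap
                D[i][j - 1] - d,        # y_j aligns to a gap
            )
    return D[M][N]
```

It can be called on the example sequences as `global_align_score("CGCCTAGCTAG", "CTGCTATCTTTAG")`. The table has (M+1)(N+1) cells and each cell takes constant time to fill.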
Linear gap penalty: a gap of length n incurs penalty p(n) = n*d
General gap penalty: for all n, p(n+1) - p(n) < p(n) - p(n-1), so each additional gap position costs less than the previous one

D(i, j) = max {
  D(i-1, j-1) + s(x_i, y_j),
  max_{k=0…i-1} D(k, j) - p(i-k),
  max_{k=0…j-1} D(i, k) - p(j-k)
}

Running time = O(N^3)
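A direct transcription of this recurrence shows where the cubic running time comes from: every cell scans its whole row and column for the start of a gap. This is a sketch under assumptions: the gap penalty is passed in as a Python function `p`, D(0, 0) = 0 is taken as the only base case, and the names and default scores are illustrative.

```python
def general_gap_align_score(x, y, p, m=1, s=1):
    """Global alignment score with an arbitrary gap penalty function p(n)."""
    NEG = float("-inf")
    M, N = len(x), len(y)
    D = [[NEG] * (N + 1) for _ in range(M + 1)]
    D[0][0] = 0  # assumed base case: two empty prefixes
    for i in range(M + 1):
        for j in range(N + 1):
            if i == 0 and j == 0:
                continue
            best = NEG
            if i > 0 and j > 0:
                sub = m if x[i - 1] == y[j - 1] else -s
                best = D[i - 1][j - 1] + sub
            # x_{k+1}..x_i aligned to a gap of length i-k
            for k in range(i):
                best = max(best, D[k][j] - p(i - k))
            # y_{k+1}..y_j aligned to a gap of length j-k
            for k in range(j):
                best = max(best, D[i][k] - p(j - k))
            D[i][j] = best
    return D[M][N]
```

With `p = lambda n: 2 * n` this reproduces the linear model. A penalty such as `lambda n: 2 + n ** 0.5` is one example of a function whose increments shrink as the gap grows.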
d = gap open penalty
e = gap extend penalty
A gap of length n costs d + (n-1)*e.
[Figure: gap penalty function, labeled with d and e]

D(i, j) = score of alignment x_1…x_i to y_1…y_j if x_i aligns to y_j
H(i, j) = score of alignment x_1…x_i to y_1…y_j if y_j aligns to a gap
V(i, j) = score of alignment x_1…x_i to y_1…y_j if x_i aligns to a gap

D(i, j) = max {
  D(i-1, j-1) + s(x_i, y_j),
  H(i-1, j-1) + s(x_i, y_j),
  V(i-1, j-1) + s(x_i, y_j)
}
H(i, j) = max {
  D(i, j-1) - d,
  H(i, j-1) - e,
  V(i, j-1) - d
}
V(i, j) = max {
  D(i-1, j) - d,
  H(i-1, j) - d,
  V(i-1, j) - e
}

Running time = O(MN)
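The three tables can be filled together in one pass. The sketch below follows the D, H, V recurrences directly. Treating unreachable states as minus infinity and starting from D(0, 0) = 0 is an assumption, since the slides do not give the initialization; the function name and default scores are also illustrative.

```python
def affine_align_score(x, y, m=1, s=1, d=3, e=1):
    """Global alignment score with gap open penalty d and gap extend penalty e."""
    NEG = float("-inf")
    M, N = len(x), len(y)
    D = [[NEG] * (N + 1) for _ in range(M + 1)]  # x_i aligned to y_j
    H = [[NEG] * (N + 1) for _ in range(M + 1)]  # y_j aligned to a gap
    V = [[NEG] * (N + 1) for _ in range(M + 1)]  # x_i aligned to a gap
    D[0][0] = 0  # assumed base case
    for i in range(M + 1):
        for j in range(N + 1):
            if i > 0 and j > 0:
                sub = m if x[i - 1] == y[j - 1] else -s
                D[i][j] = sub + max(D[i - 1][j - 1],
                                    H[i - 1][j - 1],
                                    V[i - 1][j - 1])
            if j > 0:
                H[i][j] = max(D[i][j - 1] - d,   # open a gap
                              H[i][j - 1] - e,   # extend the same gap
                              V[i][j - 1] - d)   # open after a gap in the other row
            if i > 0:
                V[i][j] = max(D[i - 1][j] - d,
                              H[i - 1][j] - d,
                              V[i - 1][j] - e)
    return max(D[M][N], H[M][N], V[M][N])
```

Each of the 3(M+1)(N+1) cells looks at a constant number of neighbors, which is consistent with the O(MN) running time.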
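Multiple alignment m of x, y, z:
x: AC_GCGG_C
y: AC_GC_GAG
z: GCCGC_GAG

Induced pairwise alignments:
(x, y)          (x, z)           (y, z)
x: ACGCGG_C     x: AC_GCGG_C     y: AC_GCGAG
y: ACGC_GAG     z: GCCGC_GAG     z: GCCGCGAG

Sum of pairs:
S(m) = sum_{k<l} s(m^k, m^l)
where s(m^k, m^l) = score of induced alignment (k, l)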
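The sum over induced pairwise alignments can be computed mechanically: project each pair of rows, drop the columns where both rows have a gap, and score what remains. The sketch below assumes the linear pairwise score with illustrative values m = 1, s = 1, d = 2 and uses "_" as the gap symbol, as in the example. The function names are hypothetical.

```python
from itertools import combinations


def induced_pair(row_k, row_l, gap="_"):
    """Project two rows of a multiple alignment, dropping gap-gap columns."""
    cols = [(a, b) for a, b in zip(row_k, row_l)
            if not (a == gap and b == gap)]
    return "".join(a for a, _ in cols), "".join(b for _, b in cols)


def pair_score(row_a, row_b, m=1, s=1, d=2, gap="_"):
    """Linear pairwise score; the parameter values are illustrative."""
    total = 0
    for a, b in zip(row_a, row_b):
        if a == gap or b == gap:
            total -= d
        elif a == b:
            total += m
        else:
            total -= s
    return total


def sum_of_pairs(rows, m=1, s=1, d=2, gap="_"):
    """S(m) = sum over k < l of the score of induced alignment (k, l)."""
    return sum(
        pair_score(*induced_pair(rows[k], rows[l], gap), m, s, d, gap)
        for k, l in combinations(range(len(rows)), 2)
    )
```

Calling `sum_of_pairs(["AC_GCGG_C", "AC_GC_GAG", "GCCGC_GAG"])` scores the three-row example, with the three induced alignments contributing 0, -4, and 3 under these assumed parameters.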
F(i, j, k) = max { F(i-1, j-1, k-1) + S(x_i, y_j, z_k),
                   F(i-1, j-1, k  ) + S(x_i, y_j, -),
                   F(i-1, j,   k-1) + S(x_i, -, z_k),
                   F(i-1, j,   k  ) + S(x_i, -, -),
                   F(i,   j-1, k-1) + S(-, y_j, z_k),
                   F(i,   j-1, k  ) + S(-, y_j, -),
                   F(i,   j,   k-1) + S(-, -, z_k) }
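The seven-way recurrence for three sequences can be sketched by enumerating which of the three sequences contributes a character to the last column. The column score S used here is an assumption: a sum over the three pairs in the column, with a gap-gap pair scoring 0 and a linear gap penalty. The names and default values are illustrative.

```python
from itertools import product


def column_score(a, b, c, m=1, s=1, d=2):
    """Assumed column score: sum over the three pairs; None marks a gap."""
    total = 0
    for p, q in ((a, b), (a, c), (b, c)):
        if p is None and q is None:
            continue            # gap against gap contributes nothing
        elif p is None or q is None:
            total -= d
        elif p == q:
            total += m
        else:
            total -= s
    return total


def align3_score(x, y, z, m=1, s=1, d=2):
    """Optimal score of a three-way alignment under the assumed column score."""
    F = {(0, 0, 0): 0}
    moves = [mv for mv in product((0, 1), repeat=3) if mv != (0, 0, 0)]
    for i in range(len(x) + 1):
        for j in range(len(y) + 1):
            for k in range(len(z) + 1):
                if i == 0 and j == 0 and k == 0:
                    continue
                best = float("-inf")
                for di, dj, dk in moves:  # the seven predecessor cells
                    if di > i or dj > j or dk > k:
                        continue
                    a = x[i - 1] if di else None
                    b = y[j - 1] if dj else None
                    c = z[k - 1] if dk else None
                    prev = F[(i - di, j - dj, k - dk)]
                    best = max(best, prev + column_score(a, b, c, m, s, d))
                F[(i, j, k)] = best
    return F[(len(x), len(y), len(z))]
```

The table has one cell per triple (i, j, k), so time and memory grow with the product of the three sequence lengths.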
[Figure: diagram with labels x, y, z, w]
[Figure: three-dimensional dynamic programming lattice with axes x_1 … x_M, y_1 … y_N, z_1 … z_L]
[Figure: columns of x, y, and z containing A, one per case]
D(i-1, j-1) - d + s(A, A)
D(i-1, j) - d - d
D(i, j-1) + 0 - d
[Figure: a column of x against y and z in three cases, labeled "Starts new gap", "Starts new gap", and "Continues old gap"; a further case is labeled "Starts or continues a gap???"]
D(i, j) = score of alignment a_1…a_i to b_1…b_j if a_i aligns to b_j
H(i, j) = score of alignment a_1…a_i to b_1…b_j if b_j aligns to a gap
V(i, j) = score of alignment a_1…a_i to b_1…b_j if a_i aligns to a gap
… are the cases HH, HV, and HD, generalized as HX
Comparison
Only three types of paths can cause … [figure]
Alignment vs. Alignment: any path can cause … [figure]
A: AGGCTATCACCTGACCTCCAGG
B: TAGCTATCACGACCGC
C: CAGCTATCACGACCGC
D: CAGCCTATCACCGAACGCCA
O((3 + sqrt(2))^k (n-k)^2 k^(3/2)), if k < n
O((3 + sqrt(2))^n k^2 n^(1/2)), if k >= n