CS 5263 Bioinformatics
Lecture 3: Dynamic Programming and Sequence Alignment
Roadmap: review of last lecture (biology); dynamic programming; sequence alignment.
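Figure (protein zoom-in): a protein is a chain of amino acids, each carrying a side chain R. The chain runs from the amino group (H2N) at the N-terminal end to the carboxyl group (COOH) at the C-terminal end. A single amino acid has the structure H2N-C(H)(R)-COOH: a central carbon bonded to an amino group, a carboxyl group, a hydrogen (H), and a side chain (R).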
Double-stranded DNA (the two strands are complementary and antiparallel):
5’-ACATGATAA-3’
3’-TGTACTATT-5’
Coding strand (where genetic information is stored): 5’-ACGTAGACGTATAGAGCCTAG-3’
Template strand (for making mRNA): 3’-TGCATCTGCATATCTCGGATC-5’
mRNA: 5’-ACGUAGACGUAUAGAGCCUAG-3’
Coding strand and mRNA have the same sequence, except that T’s in DNA are replaced by U’s in mRNA.
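A minimal Python sketch of the relationships above (the function names are illustrative, not from the lecture): the template strand is the base-by-base complement of the coding strand, and mRNA is the coding strand with T replaced by U.

```python
# Watson-Crick base pairing for DNA (A-T, C-G).
DNA_COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def template_strand(coding: str) -> str:
    """Complement of the coding strand, read 3'->5' so that it lines up
    base-by-base under the coding strand written 5'->3'."""
    return "".join(DNA_COMPLEMENT[base] for base in coding)


def transcribe(coding: str) -> str:
    """mRNA (5'->3') has the coding-strand sequence with T replaced by U."""
    return coding.replace("T", "U")


coding = "ACGTAGACGTATAGAGCCTAG"
print("Coding:   5'-" + coding + "-3'")
print("Template: 3'-" + template_strand(coding) + "-5'")
print("mRNA:     5'-" + transcribe(coding) + "-3'")
```

The same `template_strand` call reproduces the double-stranded example: the complement of ACATGATAA is TGTACTATT.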
Figure: the genetic code table (codons looked up by first, second, and third letter).
Figure: dynamic programming on a grid graph from the source s = (0,0) to the goal g = (3,3), with a weight on each edge. The subpath (0,0) → (2,0) is used by 3 different paths, so its value should be computed once and reused rather than recomputed for every path. The animation fills in the optimal score of each node from its predecessors, one node at a time; the final value at g is 10.
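The edge weights in the figure are not recoverable, so the sketch below uses made-up weights on a smaller grid. It assumes the goal is a maximum-weight path that moves only down or right (replacing `max` with `min` would give a shortest path instead). The function and variable names are hypothetical; the point is that each node's score is computed once from its two predecessors.

```python
def best_path_score(down, right):
    """Highest total edge weight over all paths from (0, 0) to (n, m)
    that move only down or right on an (n+1) x (m+1) grid of nodes.

    down[i][j]  : weight of edge (i, j) -> (i+1, j); n rows of m+1 entries
    right[i][j] : weight of edge (i, j) -> (i, j+1); n+1 rows of m entries
    """
    n, m = len(down), len(right[0])
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue  # the source scores 0
            candidates = []
            if i > 0:  # arrive from the node above
                candidates.append(score[i - 1][j] + down[i - 1][j])
            if j > 0:  # arrive from the node to the left
                candidates.append(score[i][j - 1] + right[i][j - 1])
            # Each node's best score is computed once and reused by every
            # path that passes through it.
            score[i][j] = max(candidates)
    return score[n][m]


# Hypothetical weights for a 2 x 2 block grid (not the lecture's figure).
down = [[1, 0, 2],
        [4, 3, 1]]
right = [[3, 2],
         [0, 5],
         [1, 1]]
print(best_path_score(down, right))  # 9
```

Here the best of the 6 possible paths is right, down, right, down (3 + 0 + 5 + 1 = 9), found in O(nm) time instead of by listing every path.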
function fib(n)
    if (n == 0 or n == 1)
        return 1;
    else
        return fib(n-1) + fib(n-2);
Time complexity: O(1.62^n), because the same subproblems are recomputed many times
function fib(n)
    F[0] = 1; F[1] = 1;
    for i = 2 to n
        F[i] = F[i-1] + F[i-2];
    end
    return F[n];
Time: O(n), space: O(n)
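Both versions of the pseudocode can be written in Python as follows (a sketch keeping the lecture's convention fib(0) = fib(1) = 1). The recursive version makes an exponential number of calls; the table version computes each F[i] exactly once.

```python
def fib_recursive(n: int) -> int:
    """Direct recursion: recomputes the same subproblems, O(1.62^n) calls."""
    if n == 0 or n == 1:
        return 1
    return fib_recursive(n - 1) + fib_recursive(n - 2)


def fib_dp(n: int) -> int:
    """Bottom-up dynamic programming: O(n) time and O(n) space."""
    F = [1] * (n + 1)  # F[0] = F[1] = 1
    for i in range(2, n + 1):
        F[i] = F[i - 1] + F[i - 2]  # each entry reuses two stored answers
    return F[n]


print([fib_dp(n) for n in range(10)])  # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
```

Since only the last two entries are ever read, the list could be replaced by two variables to get O(1) space.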
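Figure: sequence evolution. Edits accumulate from one generation to the next:
…ACGGTGCAGTCACCA…
…ACGTTGCGTCCACCA…
Sequence edits: mutation, deletion, insertion.
Some edits are tolerated in the next generation (OK), others are not (X). Still OK?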
AGGCTATCACCTGACCTCCAGGCCGATGCCC
TAGCTATCACGACCGCGGTCGATTTGCCCGAC
Alignment:
The “best” way to match the letters of one sequence with those of the other
How do we define “best”?
S’: AGGCTATCACCTGACCTCCAGGCCGATGCCC
T’: TAGCTATCACGACCGCGGTCGATTTGCCCGAC
AGGCCTC
Scoring function:
Match: +m
Mismatch: -s
Gap (indel): -d
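To make the scoring function concrete, the sketch below scores an alignment that has already been written out as two equal-length rows, with "-" marking a gap. The defaults m = s = d = 1 are the values used in the lecture's later example; the function name is hypothetical.

```python
GAP = "-"


def alignment_score(top: str, bottom: str, m: int = 1, s: int = 1, d: int = 1) -> int:
    """Score a finished alignment column by column:
    +m for a match, -s for a mismatch, -d for a character aligned to a gap."""
    if len(top) != len(bottom):
        raise ValueError("both rows of an alignment must have the same length")
    total = 0
    for a, b in zip(top, bottom):
        if a == GAP and b == GAP:
            raise ValueError("a column cannot contain two gaps")
        if a == GAP or b == GAP:
            total -= d  # indel
        elif a == b:
            total += m  # match
        else:
            total -= s  # mismatch
    return total


print(alignment_score("AGTA", "A-TA"))  # 3 matches, 1 gap: 2
print(alignment_score("AGTA", "AT-A"))  # 2 matches, 1 mismatch, 1 gap: 0
```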
for all subseqs A of S, B of T s.t. |A| = |B| do
    align A[i] with B[i], 1 ≤ i ≤ |A|
    align all other chars to spaces
    compute its value
    retain the max
end
output the retained alignment
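A direct Python rendering of this brute-force loop, as a sketch with hypothetical names and the scoring m = s = d = 1. A subsequence is represented by the positions it uses, and characters outside A and B each cost -d. The number of (A, B) pairs is the sum over k of C(M, k)·C(N, k) = C(M+N, M), which grows exponentially, so this is only usable on very short strings.

```python
from itertools import combinations


def brute_force_align(S: str, T: str, m: int = 1, s: int = 1, d: int = 1):
    """Try every pair of equal-length subsequences A of S and B of T.

    A[r] is aligned with B[r]; every other character faces a gap. Returns
    (best score, positions of A in S, positions of B in T).
    """
    best = None
    for k in range(min(len(S), len(T)) + 1):
        for pos_a in combinations(range(len(S)), k):
            for pos_b in combinations(range(len(T)), k):
                value = sum(m if S[i] == T[j] else -s
                            for i, j in zip(pos_a, pos_b))
                # The len(S) - k and len(T) - k leftover characters cost -d each.
                value -= d * (len(S) + len(T) - 2 * k)
                if best is None or value > best[0]:
                    best = (value, pos_a, pos_b)
    return best


print(brute_force_align("AGTA", "ATA"))  # (2, (0, 2, 3), (0, 1, 2))
```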
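Example: S = abcd, T = wxyz, with subsequences A = cd and B = xz. Aligning c with x and d with z, and every other character with a space (gap), gives for instance:
ab-c-d
--wxyz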
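Turning a chosen pair of subsequences into the two printed rows can be sketched as below. The helper name and its convention (within each stretch between aligned pairs, S's leftover characters come before T's) are assumptions for illustration; other orderings of the gap columns give the same score.

```python
def render_alignment(S: str, T: str, pairs, gap: str = "-"):
    """Build the two rows of an alignment from aligned position pairs.

    pairs: increasing (i, j) tuples meaning S[i] is aligned with T[j].
    Between consecutive pairs, unpaired characters of S are written first,
    then unpaired characters of T, each facing a gap.
    """
    top, bottom = [], []
    i = j = 0
    # A sentinel pair past both ends flushes the trailing characters.
    for a, b in list(pairs) + [(len(S), len(T))]:
        for ch in S[i:a]:
            top.append(ch)
            bottom.append(gap)
        for ch in T[j:b]:
            top.append(gap)
            bottom.append(ch)
        if a < len(S) and b < len(T):
            top.append(S[a])
            bottom.append(T[b])
        i, j = a + 1, b + 1
    return "".join(top), "".join(bottom)


top, bottom = render_alignment("abcd", "wxyz", [(2, 1), (3, 3)])
print(top)     # ab-c-d
print(bottom)  # --wxyz
```

The position tuples returned by a brute-force search can be zipped into `pairs` to print the alignment it found.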
Suppose we wish to align x1……xM with y1……yN.
Let F(i, j) = optimal score of aligning x1……xi with y1……yj.
Notice three possible cases for the last column:
1. xM aligns to yN
~~~~~~~ xM
~~~~~~~ yN
2. xM aligns to a gap
~~~~~~~ xM
~~~~~~~ -
3. yN aligns to a gap
~~~~~~~ -
~~~~~~~ yN
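$$
\begin{aligned}
\text{Case 1:}\quad & F(M,N) = F(M-1,N-1) + \begin{cases} m, & \text{if } x_M = y_N \\ -s, & \text{if not} \end{cases} \\
\text{Case 2:}\quad & F(M,N) = F(M-1,N) - d \\
\text{Case 3:}\quad & F(M,N) = F(M,N-1) - d
\end{aligned}
$$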
$$
F(M,N) = \max\begin{cases} F(M-1,N-1) + \sigma(x_M,y_N) \\ F(M-1,N) - d \\ F(M,N-1) - d \end{cases}
$$
where $\sigma(x_M, y_N) = m$ if $x_M = y_N$, and $-s$ otherwise.
In general, for any i and j:
$$
F(i,j) = \max\begin{cases} F(i-1,j-1) + \sigma(x_i,y_j) & \text{[case 1]} \\ F(i-1,j) - d & \text{[case 2]} \\ F(i,j-1) - d & \text{[case 3]} \end{cases}
$$
Figure: F(i, j) is computed from three neighbors: F(i-1, j-1) via arrow 1 (diagonal), F(i-1, j) via arrow 2, and F(i, j-1) via arrow 3.
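The recurrence can be filled in row by row, as in the Python sketch below (hypothetical function name). It assumes the standard global-alignment boundary F(i, 0) = -i·d and F(0, j) = -j·d, i.e. a prefix aligned entirely to gaps; strings are 0-indexed in Python, so x_i is `x[i - 1]`.

```python
def nw_score_table(x: str, y: str, m: int = 1, s: int = 1, d: int = 1):
    """Fill F[i][j] = best score for aligning x[:i] with y[:j]."""
    M, N = len(x), len(y)
    F = [[0] * (N + 1) for _ in range(M + 1)]
    # Assumed global-alignment boundary: a prefix aligned entirely to gaps.
    for i in range(1, M + 1):
        F[i][0] = -i * d
    for j in range(1, N + 1):
        F[0][j] = -j * d
    for i in range(1, M + 1):
        for j in range(1, N + 1):
            sigma = m if x[i - 1] == y[j - 1] else -s
            F[i][j] = max(F[i - 1][j - 1] + sigma,  # case 1: x_i aligned to y_j
                          F[i - 1][j] - d,          # case 2: x_i aligned to a gap
                          F[i][j - 1] - d)          # case 3: y_j aligned to a gap
    return F


F = nw_score_table("AGTA", "ATA")
print(F[4][3])  # 2
```

Each cell reads only three already-filled cells, so filling the whole table takes O(MN) time.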
Example: x = AGTA, y = ATA; m = 1, s = 1, d = 1.
The table F(i, j) has columns i = 0, 1, 2, 3, 4 and rows j = 0, 1, 2, 3, and is filled in one cell at a time.
Optimal alignment score: F(4,3) = 2
This only tells us the best score. To recover the alignment itself, trace back from F(4,3) to F(0,0) through the cells that produced each maximum:
AGTA
A-TA
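For reference, here is the completed table for this example, worked out by hand from the recurrence. It assumes the standard global-alignment boundary F(i, 0) = -i·d and F(0, j) = -j·d; the letters in parentheses are x_i and y_j.

```latex
\begin{array}{c|ccccc}
F(i,j) & i=0 & 1\,(\mathrm{A}) & 2\,(\mathrm{G}) & 3\,(\mathrm{T}) & 4\,(\mathrm{A}) \\ \hline
j=0 & 0 & -1 & -2 & -3 & -4 \\
j=1\,(\mathrm{A}) & -1 & 1 & 0 & -1 & -2 \\
j=2\,(\mathrm{T}) & -2 & 0 & 0 & 1 & 0 \\
j=3\,(\mathrm{A}) & -3 & -1 & -1 & 0 & 2
\end{array}
```

Tracing back from F(4,3) = 2: diagonal (A with A) to F(3,2) = 1, diagonal (T with T) to F(2,1) = 0, then x2 = G against a gap to F(1,1) = 1, then diagonal (A with A) to F(0,0). Read forwards, this gives AGTA over A-TA.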
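For each i = 1……M
    For each j = 1……N
$$
F(i,j) = \max\begin{cases} F(i-1,j-1) + \sigma(x_i,y_j) & \text{[case 1]} \\ F(i-1,j) - d & \text{[case 2]} \\ F(i,j-1) - d & \text{[case 3]} \end{cases}
\qquad
\mathrm{Ptr}(i,j) = \begin{cases} \text{DIAG}, & \text{if [case 1]} \\ \text{UP}, & \text{if [case 2]} \\ \text{LEFT}, & \text{if [case 3]} \end{cases}
$$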
Time: O(NM), space: O(NM)
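Storing Ptr(i, j) alongside F(i, j) lets us walk back from (M, N) and rebuild the alignment. The sketch below follows the pointer convention above (UP comes from F(i-1, j), LEFT from F(i, j-1)) and assumes the same boundary F(i, 0) = -i·d, F(0, j) = -j·d. Ties are broken in the order DIAG, UP, LEFT; that choice is arbitrary, and other tie-breaks can return a different alignment with the same score.

```python
def needleman_wunsch(x: str, y: str, m: int = 1, s: int = 1, d: int = 1):
    """Global alignment score plus one optimal alignment, via Ptr traceback."""
    M, N = len(x), len(y)
    F = [[0] * (N + 1) for _ in range(M + 1)]
    ptr = [[None] * (N + 1) for _ in range(M + 1)]
    for i in range(1, M + 1):
        F[i][0], ptr[i][0] = -i * d, "UP"
    for j in range(1, N + 1):
        F[0][j], ptr[0][j] = -j * d, "LEFT"
    for i in range(1, M + 1):
        for j in range(1, N + 1):
            sigma = m if x[i - 1] == y[j - 1] else -s
            # max() returns the first maximal option, so ties favor DIAG, then UP.
            F[i][j], ptr[i][j] = max(
                (F[i - 1][j - 1] + sigma, "DIAG"),
                (F[i - 1][j] - d, "UP"),
                (F[i][j - 1] - d, "LEFT"),
                key=lambda option: option[0],
            )
    # Traceback from (M, N) to (0, 0), collecting columns in reverse.
    top, bottom = [], []
    i, j = M, N
    while i > 0 or j > 0:
        move = ptr[i][j]
        if move == "DIAG":
            top.append(x[i - 1])
            bottom.append(y[j - 1])
            i, j = i - 1, j - 1
        elif move == "UP":  # x_i aligned to a gap
            top.append(x[i - 1])
            bottom.append("-")
            i -= 1
        else:  # LEFT: y_j aligned to a gap
            top.append("-")
            bottom.append(y[j - 1])
            j -= 1
    return F[M][N], "".join(reversed(top)), "".join(reversed(bottom))


print(needleman_wunsch("AGTA", "ATA"))  # (2, 'AGTA', 'A-TA')
```

The traceback visits at most M + N cells, so it adds only O(M + N) time to the O(NM) fill.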
CTATCACCTGACCTCCAGGCCGATGCCCCTTCCGGC
GCGAGTTCATCTATCACGACCGCGGTCG
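Changes (so that gaps at the start and end of either sequence are not penalized):
For all i, j: F(i, 0) = 0 and F(0, j) = 0
$$
F_{\mathrm{OPT}} = \max\left( \max_i F(i, N),\ \max_j F(M, j) \right)
$$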
Figure: x1……xM and yN……y1, with panels showing how x and y can overlap.
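The two changes can be sketched as a small variant of the global-alignment fill (hypothetical function name, scoring m = s = d = 1 by default). The zero first row and column make leading gaps free, and taking the best cell in the last row or column makes trailing gaps free, so an overlap between the end of one sequence and the start of the other is scored without penalty for the overhangs.

```python
def overlap_score(x: str, y: str, m: int = 1, s: int = 1, d: int = 1) -> int:
    """Alignment score with free end gaps (overlap detection)."""
    M, N = len(x), len(y)
    # Change 1: F(i, 0) = F(0, j) = 0, so leading gaps cost nothing.
    F = [[0] * (N + 1) for _ in range(M + 1)]
    for i in range(1, M + 1):
        for j in range(1, N + 1):
            sigma = m if x[i - 1] == y[j - 1] else -s
            F[i][j] = max(F[i - 1][j - 1] + sigma,
                          F[i - 1][j] - d,
                          F[i][j - 1] - d)
    # Change 2: end anywhere along the last column (all of y used)
    # or the last row (all of x used), so trailing gaps cost nothing.
    best_all_of_y = max(F[i][N] for i in range(M + 1))
    best_all_of_x = max(F[M][j] for j in range(N + 1))
    return max(best_all_of_y, best_all_of_x)


print(overlap_score("GATTACA", "TACAGG"))  # 4: suffix TACA of x matches prefix of y
```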
$ diff file1 file2
1c1
< A
---
> G
4c4
< D
---
> 
LCS = 4
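Changes:
For all i, j: F(i, 0) = F(0, j) = 0
$$
F(i,j) = \max\begin{cases} F(i-1,j) \\ F(i,j-1) \\ F(i-1,j-1) + \sigma(x_i,y_j) \end{cases}
$$
where $\sigma(x_i, y_j) = 1$ if $x_i = y_j$ and 0 otherwise.
$$
F_{\mathrm{OPT}} = \max\left( \max_i F(i, N),\ \max_j F(M, j) \right)
$$
Since no term is negative, F never decreases along a row or column, so here F_OPT equals F(M, N).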
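The LCS recurrence is the alignment DP with σ = 1 for a match, 0 for a mismatch, and no gap penalty. A Python sketch (hypothetical function name) is below. It works on any sequences, including lists of text lines, which is how a line-by-line file comparison can be framed as an LCS problem; it is not meant to describe how any particular `diff` tool is implemented.

```python
def lcs_length(x, y) -> int:
    """Length of a longest common subsequence of two sequences."""
    M, N = len(x), len(y)
    F = [[0] * (N + 1) for _ in range(M + 1)]  # F(i, 0) = F(0, j) = 0
    for i in range(1, M + 1):
        for j in range(1, N + 1):
            sigma = 1 if x[i - 1] == y[j - 1] else 0
            F[i][j] = max(F[i - 1][j], F[i][j - 1], F[i - 1][j - 1] + sigma)
    # With no negative terms, F[M][N] is also the largest entry in the
    # last row and column, i.e. the F_OPT of the slide.
    return F[M][N]


print(lcs_length("ABCBDAB", "BDCABA"))  # 4 (e.g. BCBA)
print(lcs_length(["A", "B", "C", "D", "E"], ["G", "B", "C", "F", "E"]))  # 3
```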
See you next week