CS 5263 Bioinformatics. Lecture 3: Dynamic Programming and Sequence Alignment. Roadmap: review of last lecture, biology, dynamic programming, sequence alignment.
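[Figure: protein zoom-in. A chain of amino acid residues, each with a side chain R, running from the N-terminal (amino group, H2N) to the C-terminal (carboxyl group, COOH).]
[Figure: a single amino acid, H2N-C-COOH, with H and the side chain R attached to the central carbon.]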
5’ACATGATAA3’
3’TGTACTATT5’
(where genetic information is stored)
(for making mRNA)
Coding strand: 5’ACGTAGACGTATAGAGCCTAG3’
Template strand: 3’TGCATCTGCATATCTCGGATC5’
mRNA: 5’ACGUAGACGUAUAGAGCCUAG3’
Coding strand and mRNA have the same sequence, except that T’s in DNA are replaced by U’s in mRNA.
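The relationship between the three strands above can be checked with a few lines of Python. This is a minimal sketch; the function names `template_strand` and `transcribe` are illustrative and do not come from the lecture.

```python
# Watson-Crick base pairing for DNA.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def template_strand(coding: str) -> str:
    """Base-by-base complement of the coding strand.

    The result is written 3'->5', so it lines up position by position
    under the coding strand written 5'->3'.
    """
    return "".join(COMPLEMENT[base] for base in coding)

def transcribe(coding: str) -> str:
    """mRNA carries the coding-strand sequence with every T replaced by U."""
    return coding.replace("T", "U")

coding = "ACGTAGACGTATAGAGCCTAG"
print(template_strand(coding))  # TGCATCTGCATATCTCGGATC
print(transcribe(coding))       # ACGUAGACGUAUAGAGCCUAG
```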
function fib(n)
    if (n == 0 or n == 1)
        return 1;
    else
        return fib(n-1) + fib(n-2);
function fib(n)
    F[0] = 1; F[1] = 1;
    for i = 2 to n
        F[i] = F[i-1] + F[i-2];
    end
    return F[n];
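The two versions of fib can be compared side by side in Python. This is a sketch that keeps the lecture's convention fib(0) = fib(1) = 1; the names `fib_recursive` and `fib_dp` are chosen here for illustration.

```python
def fib_recursive(n: int) -> int:
    """Direct recursion: the same subproblems are recomputed many times."""
    if n == 0 or n == 1:
        return 1
    return fib_recursive(n - 1) + fib_recursive(n - 2)

def fib_dp(n: int) -> int:
    """Bottom-up table: each F[i] is computed exactly once."""
    F = [1] * (n + 1)  # F[0] = F[1] = 1
    for i in range(2, n + 1):
        F[i] = F[i - 1] + F[i - 2]
    return F[n]

print(fib_recursive(10))  # 89
print(fib_dp(10))         # 89
```

The table version does a linear number of additions, while the number of calls made by the recursive version grows exponentially with n.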
…ACGGTGCAGTCACCA…
…ACGTTGCGTCCACCA…
Sequence edits:
Mutation, deletion, insertion
AGGCTATCACCTGACCTCCAGGCCGATGCCC
TAGCTATCACGACCGCGGTCGATTTGCCCGAC
Alignment:
The “best” way to match the letters of one sequence with those of the other
How do we define “best”?
S’: AGGCTATCACCTGACCTCCAGGCCGATGCCC
T’: TAGCTATCACGACCGCGGTCGATTTGCCCGAC
AGGCCTC
Scoring function:
Match: +m
Mismatch: -s
Gap (indel): -d
The score of an alignment is the sum of the scores of its columns.
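As a small illustration, the score of one given alignment can be computed column by column. The sketch below assumes that a match adds m, a mismatch subtracts s, and a gap subtracts d; the function name `alignment_score`, the use of `-` as the gap character, and the default values m = s = d = 1 are choices made here for the example.

```python
def alignment_score(row_a: str, row_b: str, m: int = 1, s: int = 1, d: int = 1) -> int:
    """Score two gapped rows of equal length; '-' marks a gap."""
    if len(row_a) != len(row_b):
        raise ValueError("both rows of an alignment must have the same length")
    score = 0
    for p, q in zip(row_a, row_b):
        if p == "-" and q == "-":
            raise ValueError("a column cannot consist of two gaps")
        if p == "-" or q == "-":
            score -= d   # gap (indel)
        elif p == q:
            score += m   # match
        else:
            score -= s   # mismatch
    return score

print(alignment_score("AGTA", "A-TA"))  # 2  (three matches, one gap)
print(alignment_score("AGTA", "ATA-"))  # -2 (one match, two mismatches, one gap)
```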
for all subsequences A of S, B of T such that |A| = |B| do
    align A[i] with B[i], 1 ≤ i ≤ |A|
    align all other chars to spaces
    compute its value
    retain the max
end
output the retained alignment
Example: S = abcd, A = cd; T = wxyz, B = xz
Aligning c with x and d with z, and every other character with a space, gives for example:
ab-c-d
--wxyz
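The brute-force enumeration can be sketched in Python for very short sequences. This is only an illustration under assumed scores (match +m, mismatch -s, gap -d, all defaulting to 1); the function name `brute_force_score` is hypothetical, and the number of subsequence pairs grows exponentially, so this is not usable beyond toy inputs.

```python
from itertools import combinations

def brute_force_score(S: str, T: str, m: int = 1, s: int = 1, d: int = 1) -> int:
    """Best alignment score found by trying every pair of equal-length subsequences."""
    best = None
    for k in range(min(len(S), len(T)) + 1):
        # I and J are the positions of subsequences A of S and B of T, |A| = |B| = k.
        for I in combinations(range(len(S)), k):
            for J in combinations(range(len(T)), k):
                # A[i] is aligned with B[i]: each pair is a match or a mismatch.
                paired = sum(m if S[i] == T[j] else -s for i, j in zip(I, J))
                # Every other character is aligned to a space.
                unpaired = (len(S) - k) + (len(T) - k)
                value = paired - d * unpaired
                if best is None or value > best:
                    best = value
    return best

print(brute_force_score("AGTA", "ATA"))  # 2
```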
Suppose we wish to align
$x_1 \dots x_M$
$y_1 \dots y_N$
Let $F(i, j)$ = optimal score of aligning
$x_1 \dots x_i$
$y_1 \dots y_j$
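Notice three possible cases:
1. x_M aligns to y_N
~~~~~~~ x_M
~~~~~~~ y_N
$$F(M, N) = F(M-1, N-1) + \begin{cases} m & \text{if } x_M = y_N \\ -s & \text{if not} \end{cases}$$
2. x_M aligns to a gap
~~~~~~~ x_M
~~~~~~~ -
$$F(M, N) = F(M-1, N) - d$$
3. y_N aligns to a gap
~~~~~~~ -
~~~~~~~ y_N
$$F(M, N) = F(M, N-1) - d$$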
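$$F(M, N) = \max \begin{cases} F(M-1, N-1) + \sigma(x_M, y_N) \\ F(M-1, N) - d \\ F(M, N-1) - d \end{cases}$$
where $\sigma(x_M, y_N) = m$ if $x_M = y_N$, and $-s$ otherwise.
The same holds for every pair of prefixes:
$$F(i, j) = \max \begin{cases} F(i-1, j-1) + \sigma(x_i, y_j) & \text{[case 1]} \\ F(i-1, j) - d & \text{[case 2]} \\ F(i, j-1) - d & \text{[case 3]} \end{cases}$$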
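The recurrence can be written almost literally as a memoized recursive function. The sketch below makes one assumption that the text here does not spell out: a prefix aligned against the empty prefix consists only of gaps, so F(i, 0) = -i·d and F(0, j) = -j·d. The name `align_score` and the default scores are illustrative.

```python
from functools import lru_cache

def align_score(x: str, y: str, m: int = 1, s: int = 1, d: int = 1) -> int:
    """Optimal global alignment score F(M, N), computed top-down with memoization."""

    @lru_cache(maxsize=None)
    def F(i: int, j: int) -> int:
        # Assumed boundary conditions: a prefix against nothing is all gaps.
        if i == 0:
            return -j * d
        if j == 0:
            return -i * d
        sigma = m if x[i - 1] == y[j - 1] else -s  # strings are 0-indexed
        return max(
            F(i - 1, j - 1) + sigma,  # x_i aligns to y_j
            F(i - 1, j) - d,          # x_i aligns to a gap
            F(i, j - 1) - d,          # y_j aligns to a gap
        )

    return F(len(x), len(y))

print(align_score("AGTA", "ATA"))  # 2
```

Because each F(i, j) is cached, only (M+1)(N+1) distinct subproblems are solved. For long sequences a bottom-up table is preferable, since Python limits recursion depth.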
[Figure: F(i, j) is computed from three neighboring cells: F(i-1, j-1) (arrow 1), F(i-1, j) (arrow 2), and F(i, j-1) (arrow 3).]
Example: x = AGTA, y = ATA, with m = 1, s = 1, d = 1
[Table: F(i, j) for i = 0, 1, 2, 3, 4 (columns) and j = 0, 1, 2, 3 (rows).]
Optimal alignment score: F(4,3) = 2
This only tells us the best score
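The table for this example can be regenerated with a short bottom-up computation. This sketch assumes the usual global-alignment boundary values F(i, 0) = -i·d and F(0, j) = -j·d, which the extracted text does not show; `fill_table` is a name chosen here.

```python
def fill_table(x: str, y: str, m: int = 1, s: int = 1, d: int = 1):
    """Return F as a list of lists with F[i][j] = optimal score of x[:i] vs y[:j]."""
    M, N = len(x), len(y)
    F = [[0] * (N + 1) for _ in range(M + 1)]
    for i in range(1, M + 1):
        F[i][0] = -i * d  # assumed boundary: x[:i] against gaps only
    for j in range(1, N + 1):
        F[0][j] = -j * d  # assumed boundary: y[:j] against gaps only
    for i in range(1, M + 1):
        for j in range(1, N + 1):
            sigma = m if x[i - 1] == y[j - 1] else -s
            F[i][j] = max(F[i - 1][j - 1] + sigma, F[i - 1][j] - d, F[i][j - 1] - d)
    return F

F = fill_table("AGTA", "ATA")
for j in range(4):  # one printed row per j, columns i = 0..4
    print(" ".join(f"{F[i][j]:3d}" for i in range(5)))
# Printed table (rows j = 0..3, columns i = 0..4):
#   0  -1  -2  -3  -4
#  -1   1   0  -1  -2
#  -2   0   0   1   0
#  -3  -1  -1   0   2
```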
Tracing back from F(4,3) through the choices that produced each maximum recovers the alignment itself.
Example: x = AGTA, y = ATA, with m = 1, s = 1, d = 1
Optimal alignment: F(4,3) = 2
AGTA
A-TA
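For each i = 1……M
For each j = 1……N
$$F(i, j) = \max \begin{cases} F(i-1, j) - d & \text{[case 1]} \\ F(i, j-1) - d & \text{[case 2]} \\ F(i-1, j-1) + \sigma(x_i, y_j) & \text{[case 3]} \end{cases}$$
$$\mathrm{Ptr}(i, j) = \begin{cases} \text{UP} & \text{if [case 1]} \\ \text{LEFT} & \text{if [case 2]} \\ \text{DIAG} & \text{if [case 3]} \end{cases}$$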
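The fill step with pointers, followed by a traceback, might look like the sketch below. Several details are assumptions made here rather than taken from the lecture: the boundary values F(i, 0) = -i·d and F(0, j) = -j·d, the boundary pointers, the tie-breaking rule (the first maximal candidate wins), and the function name `needleman_wunsch`. The labels UP, LEFT and DIAG follow the definitions above: UP means the value came from F(i-1, j), LEFT from F(i, j-1), DIAG from F(i-1, j-1).

```python
def needleman_wunsch(x: str, y: str, m: int = 1, s: int = 1, d: int = 1):
    """Return (score, aligned_x, aligned_y) for a global alignment of x and y."""
    M, N = len(x), len(y)
    F = [[0] * (N + 1) for _ in range(M + 1)]
    ptr = [[None] * (N + 1) for _ in range(M + 1)]
    for i in range(1, M + 1):          # assumed boundary: x[:i] against gaps
        F[i][0], ptr[i][0] = -i * d, "UP"
    for j in range(1, N + 1):          # assumed boundary: y[:j] against gaps
        F[0][j], ptr[0][j] = -j * d, "LEFT"

    for i in range(1, M + 1):
        for j in range(1, N + 1):
            sigma = m if x[i - 1] == y[j - 1] else -s
            candidates = [
                (F[i - 1][j] - d, "UP"),            # x_i aligned to a gap
                (F[i][j - 1] - d, "LEFT"),          # y_j aligned to a gap
                (F[i - 1][j - 1] + sigma, "DIAG"),  # x_i aligned to y_j
            ]
            F[i][j], ptr[i][j] = max(candidates, key=lambda c: c[0])

    # Traceback: follow the pointers from (M, N) back to (0, 0).
    ax, ay = [], []
    i, j = M, N
    while i > 0 or j > 0:
        move = ptr[i][j]
        if move == "DIAG":
            ax.append(x[i - 1]); ay.append(y[j - 1]); i -= 1; j -= 1
        elif move == "UP":
            ax.append(x[i - 1]); ay.append("-"); i -= 1
        else:  # LEFT
            ax.append("-"); ay.append(y[j - 1]); j -= 1
    return F[M][N], "".join(reversed(ax)), "".join(reversed(ay))

print(needleman_wunsch("AGTA", "ATA"))  # (2, 'AGTA', 'A-TA')
```

Filling the table takes time and space proportional to M·N, and the traceback takes at most M + N steps.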
CTATCACCTGACCTCCAGGCCGATGCCCCTTCCGGC
GCGAGTTCATCTATCACGACCGCGGTCG
Changes:
For all i, j: F(i, 0) = 0 and F(0, j) = 0
$$F_{\mathrm{OPT}} = \max \left\{ \max_i F(i, N),\ \max_j F(M, j) \right\}$$
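A sketch of this variant in Python follows. It assumes that the recurrence in the interior of the table is unchanged from the global algorithm (match +m, mismatch -s, gap -d) and that only the boundary values and the final maximum differ; the name `overlap_score` and the test sequences are made up for the example.

```python
def overlap_score(x: str, y: str, m: int = 1, s: int = 1, d: int = 1) -> int:
    """Best score when gaps at the start and end of the alignment are not penalized."""
    M, N = len(x), len(y)
    # Change 1: the first row and column are all zeros (free leading gaps).
    F = [[0] * (N + 1) for _ in range(M + 1)]
    for i in range(1, M + 1):
        for j in range(1, N + 1):
            sigma = m if x[i - 1] == y[j - 1] else -s
            F[i][j] = max(F[i - 1][j - 1] + sigma, F[i - 1][j] - d, F[i][j - 1] - d)
    # Change 2: the answer is the best value where x or y has been used up
    # (free trailing gaps), not necessarily F[M][N].
    best_end_of_y = max(F[i][N] for i in range(M + 1))
    best_end_of_x = max(F[M][j] for j in range(N + 1))
    return max(best_end_of_y, best_end_of_x)

# The suffix ATG of the first sequence overlaps the prefix ATG of the second.
print(overlap_score("CCCATG", "ATGTTT"))  # 3
```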
x1……………………………… xM
yN……………………………… y1
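Changes:
For all i, j: F(i, 0) = F(0, j) = 0
$$F(i, j) = \max \begin{cases} F(i-1, j) \\ F(i, j-1) \\ F(i-1, j-1) + \sigma(x_i, y_j) \end{cases}$$
where $\sigma(x_i, y_j) = 1$ if $x_i = y_j$ and 0 otherwise.
$$F_{\mathrm{OPT}} = \max \left\{ \max_i F(i, N),\ \max_j F(M, j) \right\}$$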
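This last variant can be sketched the same way. With a score of 1 for a match and 0 for everything else, F(i, j) is the length of the longest common subsequence of the two prefixes. The function name `match_count_score` is chosen here for illustration.

```python
def match_count_score(x: str, y: str) -> int:
    """F_OPT for the variant with sigma = 1 on a match, 0 otherwise, and no gap penalty."""
    M, N = len(x), len(y)
    F = [[0] * (N + 1) for _ in range(M + 1)]  # F(i, 0) = F(0, j) = 0
    for i in range(1, M + 1):
        for j in range(1, N + 1):
            sigma = 1 if x[i - 1] == y[j - 1] else 0
            F[i][j] = max(F[i - 1][j], F[i][j - 1], F[i - 1][j - 1] + sigma)
    return max(max(F[i][N] for i in range(M + 1)),
               max(F[M][j] for j in range(N + 1)))

print(match_count_score("AGTA", "ATA"))  # 3
```

Since nothing is ever subtracted, F never decreases as i or j grows, so the maximum over the last row and column is attained at F(M, N).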