Sequence Alignment

Oct 9, 2002

Joon Lee

Genomics & Computational Biology

- Optimization problems: build the best solution through a sequence of decisions
- Subproblems are not independent
- Subproblems share subsubproblems
- Solve subproblem, save its answer in a table
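
A minimal sketch of the "save its answer in a table" idea, applied to alignment scoring. It assumes the scoring values used later in these slides (match +2, mismatch -1, gap -2); the function name `align_score` is made up for illustration. The cache plays the role of the table, so each shared subproblem is solved only once.

```python
from functools import lru_cache

MATCH, MISMATCH, GAP = 2, -1, -2  # assumed: the scores used later in these slides


def align_score(a: str, b: str) -> int:
    """Best global alignment score of a and b (illustrative helper)."""

    @lru_cache(maxsize=None)  # the "table": each (i, j) is solved once
    def best(i: int, j: int) -> int:
        # best(i, j) is the best score for aligning a[:i] with b[:j]
        if i == 0:
            return j * GAP  # only gaps remain
        if j == 0:
            return i * GAP
        pair = MATCH if a[i - 1] == b[j - 1] else MISMATCH
        return max(
            best(i - 1, j - 1) + pair,  # align a[i-1] with b[j-1]
            best(i - 1, j) + GAP,       # a[i-1] opposite a gap
            best(i, j - 1) + GAP,       # b[j-1] opposite a gap
        )

    return best(len(a), len(b))
```

Without the cache, the three recursive calls would revisit the same (i, j) pairs many times, which is exactly the "subproblems share subsubproblems" situation.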

- Characterize the structure of an optimal solution
- Recursively define the value of an optimal solution
- Compute the value of an optimal solution in a bottom-up fashion
- Construct an optimal solution from computed information

Sequence 1: G A A T T C A G T T A

Sequence 2: G G A T C G A

G A A T T C A G T T A
|   |   | |   |     |
G G A _ T C _ G _ _ A

G _ A A T T C A G T T A
|     |   | |   |     |
G G _ A _ T C _ G _ _ A
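
To compare two candidate alignments like these, each column can be scored and the results summed. The sketch below assumes the scoring values that appear later in this deck (match +2, mismatch -1, gap -2) and uses `_` for a gap; `score_alignment` is a hypothetical helper name.

```python
def score_alignment(top: str, bottom: str, match=2, mismatch=-1, gap=-2) -> int:
    """Sum the column scores of two equal-length gapped strings ('_' = gap)."""
    if len(top) != len(bottom):
        raise ValueError("aligned strings must have equal length")
    total = 0
    for x, y in zip(top, bottom):
        if x == "_" or y == "_":
            total += gap        # a residue opposite a gap
        elif x == y:
            total += match
        else:
            total += mismatch
    return total


first = score_alignment("GAATTCAGTTA", "GGA_TC_G__A")
second = score_alignment("G_AATTCAGTTA", "GG_A_TC_G__A")
print(first, second)  # 3 0
```

Both alignments have six matches, but the second spends six gaps where the first uses four gaps and one mismatch, so the first scores higher under this scheme.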

- Initialization: gap penalty
- Scoring: matrix fill
- Alignment: trace back

- A = a1a2…an, B = b1b2…bm
- Sij : score at (i,j)
- s(ai, bj) : matching score between ai and bj
- w : gap penalty
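
With these definitions, the score matrix is usually filled by the recurrence below. This is the standard global-alignment form with a linear gap penalty, written here on the assumption that w is a negative number that is added (as with the gap score of -2 used in this deck).

```latex
\begin{aligned}
S_{0,0} &= 0, \qquad S_{i,0} = i \cdot w, \qquad S_{0,j} = j \cdot w \\
S_{i,j} &= \max
\begin{cases}
S_{i-1,j-1} + s(a_i, b_j) & \text{align } a_i \text{ with } b_j \\
S_{i-1,j} + w & a_i \text{ opposite a gap} \\
S_{i,j-1} + w & b_j \text{ opposite a gap}
\end{cases}
\end{aligned}
```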

- Match: +2
- Mismatch: -1
- Gap: -2
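
A minimal bottom-up sketch of the initialization and matrix-fill steps with these scores. The function name `fill_matrix` and the row/column orientation (rows follow the first sequence) are assumptions made for illustration.

```python
def fill_matrix(a: str, b: str, match=2, mismatch=-1, gap=-2):
    """Return S, where S[i][j] is the best score for a[:i] against b[:j]."""
    n, m = len(a), len(b)
    S = [[0] * (m + 1) for _ in range(n + 1)]
    # Initialization: the first row and column accumulate gap penalties
    for i in range(1, n + 1):
        S[i][0] = i * gap
    for j in range(1, m + 1):
        S[0][j] = j * gap
    # Scoring: each cell takes the best of its three neighbours
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            S[i][j] = max(
                S[i - 1][j - 1] + pair,  # diagonal: match or mismatch
                S[i - 1][j] + gap,       # from above: gap
                S[i][j - 1] + gap,       # from the left: gap
            )
    return S


S = fill_matrix("GAATTCAGTTA", "GGATCGA")
print(S[1][1], S[1][2], S[2][1])  # 2 0 0
print(S[-1][-1])                  # 3
```

The first three cells come out as 2, 0 and 0, which agrees with the hand calculations in this deck, and the bottom-right cell holds the score of the best global alignment.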

0 + 2 = 2

-2 + (-2) = -4

-2 + (-2) = -4

-2 + (-1) = -3

-4 + (-2) = -6

2 + (-2) = 0

-2 + 2 = 0

2 + (-2) = 0

-4 + (-2) = -6

G A A T T C A G T T A

G G A _ T C _ G _ _ A

G A A T T C A G T T A

G G A T _ C _ G _ _ A
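
A sketch of the trace-back step that produces alignments like the two above. It assumes the deck's scores (match +2, mismatch -1, gap -2) and prefers the diagonal move on ties, which is an arbitrary choice; `global_align` is a hypothetical name.

```python
def global_align(a: str, b: str, match=2, mismatch=-1, gap=-2):
    """Return (aligned_a, aligned_b, score) for one optimal global alignment."""
    n, m = len(a), len(b)
    S = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = i * gap
    for j in range(1, m + 1):
        S[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            S[i][j] = max(S[i - 1][j - 1] + pair,
                          S[i - 1][j] + gap,
                          S[i][j - 1] + gap)

    # Trace back from the bottom-right cell to the top-left cell,
    # at each step following a move that explains the current score.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            pair = match if a[i - 1] == b[j - 1] else mismatch
            if S[i][j] == S[i - 1][j - 1] + pair:
                top.append(a[i - 1])
                bottom.append(b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and S[i][j] == S[i - 1][j] + gap:
            top.append(a[i - 1])   # a[i-1] opposite a gap
            bottom.append("_")
            i -= 1
        else:
            top.append("_")        # b[j-1] opposite a gap
            bottom.append(b[j - 1])
            j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom)), S[n][m]


top, bottom, score = global_align("GAATTCAGTTA", "GGATCGA")
print(top)
print(bottom)
print(score)  # 3
```

Several paths through the matrix can reach the same best score, which is why more than one optimal alignment exists for this pair. A different tie-breaking rule would return a different one of them.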

G C A T C C G

G A T C G

- Match/mismatch → Substitution matrix

- Global: Needleman-Wunsch Algorithm
- Local: Smith-Waterman Algorithm
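
The local variant changes only two things in the fill: scores are never allowed to drop below zero, and the answer is the largest cell anywhere in the matrix. A minimal score-only sketch, assuming the same match/mismatch/gap values as above (`local_score` is a hypothetical name, and no trace-back is included):

```python
def local_score(a: str, b: str, match=2, mismatch=-1, gap=-2) -> int:
    """Best local alignment score between a and b (score only)."""
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]  # first row and column stay 0
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(
                0,                       # start a new local alignment here
                H[i - 1][j - 1] + pair,  # extend with a match or mismatch
                H[i - 1][j] + gap,       # extend with a gap
                H[i][j - 1] + gap,
            )
            best = max(best, H[i][j])
    return best


print(local_score("GCATCCG", "GATCG"))  # 6
```

For the pair GCATCCG and GATCG shown earlier, the best local score is 6, reached for example by the shared stretch ATC.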

From Mount, Bioinformatics, Chapter 3

- Sequence alignment with Java applet
- http://linneus20.ethz.ch:8080/5_4_5.html

