Dynamic programming
Dynamic Programming. Presenters: Michal Karpinski Eric Hoffstetter. Background. “Dynamic programming” originates with Richard Bellman (1940s) in multistage decision process problems.


Dynamic Programming

Presenters:

Michal Karpinski

Eric Hoffstetter


Background

“Dynamic programming” originated with Richard Bellman (1940s), who developed it for multistage decision process problems.

While at the RAND Corporation, Bellman wanted his work to appear practical (“real work”) rather than theoretical. To shield himself from scrutiny, he chose the word “programming,” which implies fruitful, deliberate effort, and embellished it with “dynamic.” As he put it, “it’s impossible to use the word dynamic in a pejorative sense.”

Applications:

  • String alignment problems

  • Pattern recognition:

    • Image matching / image recognition (2D & 3D)

    • Speech recognition (Viterbi algorithm)

  • Manufacturing – find the fastest way through a factory

  • Matrix-chain multiplication – choose the order in which to multiply a chain of matrices (the parenthesization) to minimize cost

  • Optimal binary search tree – minimize the expected number of nodes visited during a search

    • Example: a language translator places the most common words near the root of the tree


Used to solve problems exhibiting:

  • Overlapping Subproblems: subproblems recur, i.e., “they occur as a subproblem of different problems,” so each can be solved once and its solution reused

  • Optimal Substructure: “An optimal solution to the problem contains within it optimal solutions to subproblems.”

  • Subproblem Independence: “the solution to one subproblem does not affect the solution to another subproblem, i.e., they do not share resources”
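To make “overlapping subproblems” concrete, here is a minimal Python sketch (the function name `naive_fib` and the call counter are our own illustration, not from the slides) that counts how often plain recursion re-solves each Fibonacci subproblem.

```python
from collections import Counter


def naive_fib(n: int, calls: Counter) -> int:
    """Plain recursive Fibonacci that records every call it makes in `calls`."""
    calls[n] += 1  # one more time this subproblem is (re)solved
    if n < 2:
        return n
    return naive_fib(n - 1, calls) + naive_fib(n - 2, calls)


calls = Counter()
result = naive_fib(10, calls)
print(result, sum(calls.values()), calls[2])  # 55 177 34
```

Only 11 distinct subproblems (n = 0…10) exist, yet the recursion makes 177 calls and solves fib(2) alone 34 times; storing each answer once removes this repeated work.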


Top-Down and Bottom-Up

  • Top-down: the problem is broken down into subproblems, which are solved recursively; memoization remembers the solutions to subproblems already solved, so each is computed only once.

    Top-down (plain recursion, no memoization):

        function fib(n)
            if n = 0 return 0
            if n = 1 return 1
            else return fib(n − 1) + fib(n − 2)

    Top-down with memoization (not “memorization”):

        var m := map(0 → 0, 1 → 1)
        function fib(n)
            if map m does not contain key n
                m[n] := fib(n − 1) + fib(n − 2)
            return m[n]

  • Bottom-up: the smallest subproblems are solved first, and their solutions are combined to build solutions to larger problems.

        function fib(n)
            if n = 0 return 0
            var previousFib := 0, currentFib := 1
            repeat n − 1 times
                var newFib := previousFib + currentFib
                previousFib := currentFib
                currentFib := newFib
            return currentFib
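The Fibonacci pseudocode translates directly into Python. This is a minimal sketch: the names `memo`, `fib_memo`, and `fib_bottom_up` are our own, and a plain dictionary stands in for the map `m`.

```python
# Top-down with memoization: the dictionary plays the role of the map m.
memo = {0: 0, 1: 1}


def fib_memo(n: int) -> int:
    if n not in memo:  # solve each subproblem only the first time it is needed
        memo[n] = fib_memo(n - 1) + fib_memo(n - 2)
    return memo[n]


# Bottom-up: start from the smallest subproblems and keep only the last two values.
def fib_bottom_up(n: int) -> int:
    if n == 0:
        return 0
    previous_fib, current_fib = 0, 1
    for _ in range(n - 1):
        previous_fib, current_fib = current_fib, previous_fib + current_fib
    return current_fib


print(fib_memo(50), fib_bottom_up(50))  # 12586269025 12586269025
```

The bottom-up version needs only constant extra space and no recursion, while the memoized version keeps every solved subproblem in `memo`.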


Biological Sequence Matching Problems 1

  • DNA

    • Two strands

    • Four-letter alphabet (four bases)

    • Base-pairing rules

    • Strands are directional and, within a gene, only one strand is transcribed

  • RNA

    • Functional molecule, or intermediate step in protein manufacturing

    • Four-letter alphabet

  • Proteins

    • 20-letter alphabet (amino acids)


Biological Sequence Matching Problems 2

  • Applications

    • Identify strains of viruses and bacteria

    • Identify genes (hair, skin, eye color, height) and genetic basis for diseases (lethal or susceptibility to cancer, etc.)

    • Identify evolutionary relationships

  • BLAST (Basic Local Alignment Search Tool), a fast heuristic approximation of dynamic-programming local alignment, is among the three most cited papers in recent bioscience history (it was #1 in the 1990s)


Sequence Alignment Algorithm 1

AGGCGGATC

TAGCATCTAC

-AGGCGGATC---

TAG-C--ATCTAC

Given two strings x = x1x2…xM and y = y1y2…yN, find the alignment with maximum score

$$F = (\#\text{matches}) \times m - (\#\text{mismatches}) \times s - (\#\text{gaps}) \times d$$
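The score F can be checked for any given alignment by walking its columns. A minimal Python sketch, assuming a linear gap penalty (every gap position costs d) and the illustrative values m = 1, s = 0.5, d = 1; the helper name `alignment_score` is hypothetical.

```python
def alignment_score(a: str, b: str, m: float, s: float, d: float) -> float:
    """Score a gapped alignment: +m per match, -s per mismatch, -d per gap position."""
    if len(a) != len(b):
        raise ValueError("aligned strings must have equal length")
    score = 0.0
    for p, q in zip(a, b):  # one column of the alignment at a time
        if p == "-" or q == "-":
            score -= d  # linear gap penalty (assumption)
        elif p == q:
            score += m
        else:
            score -= s
    return score


print(alignment_score("-AGGCGGATC---", "TAG-C--ATCTAC", m=1, s=0.5, d=1))  # -1.0
```

The example alignment has 6 matches, 0 mismatches, and 7 gap positions, so F = 6 − 7 = −1 under these assumed parameters; it is one possible alignment, not necessarily the optimal one.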


Sequence Alignment Algorithm 2

AGTGCCCTGGAACCCTGACGGTGGGTCACAAAACTTCTGGA

There are more than $2^N$ possible alignments.

AGTGACCTGGGAAGACCCTGACCCTGGGTCACAAAACTC
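Each alignment corresponds to a path through an (M + 1) × (N + 1) grid made of diagonal, horizontal, and vertical moves, so the number of alignments can itself be counted by dynamic programming. A minimal sketch, assuming alignments that differ only in the order of adjacent gaps are counted as distinct (these counts are the Delannoy numbers); `count_alignments` is our own name.

```python
def count_alignments(M: int, N: int) -> int:
    """Number of distinct alignment paths for strings of lengths M and N."""
    # A[i][j] = number of alignments of a length-i prefix with a length-j prefix.
    A = [[1] * (N + 1) for _ in range(M + 1)]  # row 0 / column 0: all gaps, one way
    for i in range(1, M + 1):
        for j in range(1, N + 1):
            # The last column is a diagonal, horizontal, or vertical move.
            A[i][j] = A[i - 1][j - 1] + A[i - 1][j] + A[i][j - 1]
    return A[M][N]


for n in (1, 2, 3, 10):
    print(n, count_alignments(n, n), 2 ** n)
```

For N = 10 this gives 8,097,453 alignments versus 2^10 = 1,024, which is why enumerating alignments is hopeless and the score table is needed.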


Sequence Alignment Algorithm 3

Note:

The score of aligning x1…xM with y1…yN is additive. Say that x1…xi aligns to y1…yj and xi+1…xM aligns to yj+1…yN. Add the two scores:

$$F(x_1 \dots x_M,\ y_1 \dots y_N) = F(x_1 \dots x_i,\ y_1 \dots y_j) + F(x_{i+1} \dots x_M,\ y_{j+1} \dots y_N)$$


Sequence Alignment Algorithm 4

  • Original problem

    • Align x1…xM to y1…yN

  • Divide into a finite number of subproblems, (M + 1)(N + 1) in all, so that each can be solved once and reused

    • Align x1…xi to y1…yj

  • Subdivide the subproblem and construct the solution from smaller subproblems

  • Classic problem type for dynamic programming

    Let F(i, j) = optimal score of aligning x1…xi with y1…yj

  • F is the “matrix” or “table” or “program.” Hence the term “dynamic programming.”


Sequence Alignment Algorithm 5

$$F = (\#\text{matches}) \times m - (\#\text{mismatches}) \times s - (\#\text{gaps}) \times d$$

F(i, j) is calculated with the scoring function s(xi, yj) or the gap function g (here, a penalty d per gap position).

Three cases:

  1. xi aligns to yj (diagonal move, scoring function):

    x1……xi-1 xi

    y1……yj-1 yj

    $$F(i,j) = F(i-1,j-1) + \begin{cases} m & \text{if } x_i = y_j \\ -s & \text{otherwise} \end{cases}$$

  2. xi aligns to a gap (horizontal move, gap function):

    x1……xi-1 xi

    y1……yj -

    F(i, j) = F(i – 1, j) – d

  3. yj aligns to a gap (vertical move, gap function):

    x1……xi -

    y1……yj-1 yj

    F(i, j) = F(i, j – 1) – d


Sequence Alignment Algorithm 6

How do we choose the case for each matrix position?

Assume that the subproblems are solved: F(i, j – 1), F(i – 1, j), and F(i – 1, j – 1) are optimal.

Therefore,

$$F(i,j) = \max\begin{cases} F(i-1,j-1) + s(x_i,y_j) \\ F(i-1,j) - d \\ F(i,j-1) - d \end{cases}$$

where s(xi, yj) = m if xi = yj, and –s if not.


Sequence Alignment Algorithm 7

Set d = 1, m = 1, s = 0.5:

$$F(i,j) = \max\begin{cases} F(i-1,j-1) + s(x_i,y_j) \\ F(i-1,j) - 1 \\ F(i,j-1) - 1 \end{cases}$$

where s(xi, yj) = 1 if xi = yj, and –0.5 if not.


Needleman-Wunsch Algorithm 1: Finds Global Optimal Alignment

  • Initialization

    • F(0, 0) = 0

    • F(0, j) = -j × d

    • F(i, 0) = -i × d

  • Main iteration: filling in partial alignments

    • For each i = 1…M, for each j = 1…N:

      $$F(i,j) = \max\begin{cases} F(i-1,j-1) + s(x_i,y_j) & \text{[case 1]} \\ F(i-1,j) - d & \text{[case 2]} \\ F(i,j-1) - d & \text{[case 3]} \end{cases}$$

      Ptr(i, j) = % if [case 1], ! if [case 2], # if [case 3]

  • Termination: F(M, N) is the optimal score, and tracing back from Ptr(M, N) recovers an optimal alignment.
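The Initialization, Main iteration, and Termination steps above can be sketched in Python as follows. This is a minimal sketch, assuming a linear gap penalty and the example parameters m = 1, s = 0.5, d = 1; ties are broken in the order case 1, case 2, case 3 (an arbitrary choice), and the pointer symbols mirror the slides' %, !, #. The function name is our own.

```python
def needleman_wunsch(x: str, y: str, m: float = 1.0, s: float = 0.5, d: float = 1.0):
    """Global alignment sketch: returns (optimal score, aligned x, aligned y)."""
    M, N = len(x), len(y)
    F = [[0.0] * (N + 1) for _ in range(M + 1)]
    ptr = [[""] * (N + 1) for _ in range(M + 1)]
    # Initialization: F(i, 0) = -i*d and F(0, j) = -j*d (leading gaps).
    for i in range(1, M + 1):
        F[i][0], ptr[i][0] = -i * d, "!"
    for j in range(1, N + 1):
        F[0][j], ptr[0][j] = -j * d, "#"
    # Main iteration; Python strings are 0-indexed, so x_i is x[i - 1].
    for i in range(1, M + 1):
        for j in range(1, N + 1):
            case1 = F[i - 1][j - 1] + (m if x[i - 1] == y[j - 1] else -s)
            case2 = F[i - 1][j] - d  # x_i aligned to a gap
            case3 = F[i][j - 1] - d  # y_j aligned to a gap
            F[i][j] = max(case1, case2, case3)
            if F[i][j] == case1:
                ptr[i][j] = "%"
            elif F[i][j] == case2:
                ptr[i][j] = "!"
            else:
                ptr[i][j] = "#"
    # Termination: trace back from (M, N) to (0, 0), collecting columns in reverse.
    ax, ay = [], []
    i, j = M, N
    while i > 0 or j > 0:
        if ptr[i][j] == "%":
            ax.append(x[i - 1])
            ay.append(y[j - 1])
            i, j = i - 1, j - 1
        elif ptr[i][j] == "!":
            ax.append(x[i - 1])
            ay.append("-")
            i -= 1
        else:
            ax.append("-")
            ay.append(y[j - 1])
            j -= 1
    return F[M][N], "".join(reversed(ax)), "".join(reversed(ay))


score, ax, ay = needleman_wunsch("AGGCGGATC", "TAGCATCTAC")
print(score)
print(ax)
print(ay)
```

Filling the table costs O(MN) time and the traceback O(M + N); a different tie-breaking order can return a different alignment with the same optimal score.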


Needleman-Wunsch Algorithm 2

Initialization

F(0, 0) = 0

F(0, j) = -j × d

F(i, 0) = -i × d

$$F(i,j) = \max\begin{cases} F(i-1,j-1) + s(x_i,y_j) & (1) \\ F(i-1,j) - d & (2) \\ F(i,j-1) - d & (3) \end{cases}$$

Ptr(i, j) = % if (1), ! if (2), # if (3)


Smith-Waterman Algorithm 1: Finds Local Optimal Alignment(s)

Ignore poorly aligned regions

  • Initialization

    • F(0, 0) = 0

    • F(0, j) = 0

    • F(i, 0) = 0

  • Main iteration: filling in partial alignments

    • For each i = 1…M, for each j = 1…N:

      $$F(i,j) = \max\begin{cases} 0 \\ F(i-1,j-1) + s(x_i,y_j) & \text{[case 1]} \\ F(i-1,j) - d & \text{[case 2]} \\ F(i,j-1) - d & \text{[case 3]} \end{cases}$$

      Ptr(i, j) = % if [case 1], ! if [case 2], # if [case 3]

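  • Termination

    The optimal local score is the largest F(i, j) anywhere in the matrix, not necessarily F(M, N); tracing back via Ptr from that cell until a cell with F(i, j) = 0 is reached recovers an optimal local alignment.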
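A minimal Python sketch of Smith-Waterman under the same assumptions as the global case (linear gap penalty, illustrative m = 1, s = 0.5, d = 1, ties broken case 1, 2, 3). The traceback starts at the highest-scoring cell and stops at the first cell whose score is 0; the function name is our own.

```python
def smith_waterman(x: str, y: str, m: float = 1.0, s: float = 0.5, d: float = 1.0):
    """Local alignment sketch: returns (best local score, aligned part of x, aligned part of y)."""
    M, N = len(x), len(y)
    # Initialization: first row and column are 0; None marks "alignment starts here".
    F = [[0.0] * (N + 1) for _ in range(M + 1)]
    ptr = [[None] * (N + 1) for _ in range(M + 1)]
    best, bi, bj = 0.0, 0, 0
    for i in range(1, M + 1):
        for j in range(1, N + 1):
            case1 = F[i - 1][j - 1] + (m if x[i - 1] == y[j - 1] else -s)
            case2 = F[i - 1][j] - d
            case3 = F[i][j - 1] - d
            F[i][j] = max(0.0, case1, case2, case3)
            if F[i][j] == 0.0:
                ptr[i][j] = None  # a poorly aligned prefix is dropped here
            elif F[i][j] == case1:
                ptr[i][j] = "%"
            elif F[i][j] == case2:
                ptr[i][j] = "!"
            else:
                ptr[i][j] = "#"
            if F[i][j] > best:  # remember the highest-scoring cell
                best, bi, bj = F[i][j], i, j
    # Trace back from the best cell until a cell with F = 0 (ptr None) is reached.
    ax, ay = [], []
    i, j = bi, bj
    while ptr[i][j] is not None:
        if ptr[i][j] == "%":
            ax.append(x[i - 1])
            ay.append(y[j - 1])
            i, j = i - 1, j - 1
        elif ptr[i][j] == "!":
            ax.append(x[i - 1])
            ay.append("-")
            i -= 1
        else:
            ax.append("-")
            ay.append(y[j - 1])
            j -= 1
    return best, "".join(reversed(ax)), "".join(reversed(ay))


print(smith_waterman("TTTACGTTT", "GGACGGG"))  # (3.0, 'ACG', 'ACG')
```

The surrounding mismatches are simply ignored: only the well-aligned core "ACG" is reported, which is the point of the 0 option in the recurrence.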


Smith-Waterman Algorithm 2

Initialization

F(0, 0) = 0

F(0, j) = 0

F(i, 0) = 0

$$F(i,j) = \max\begin{cases} 0 \\ F(i-1,j-1) + s(x_i,y_j) & (1) \\ F(i-1,j) - d & (2) \\ F(i,j-1) - d & (3) \end{cases}$$

Ptr(i, j) = % if (1), ! if (2), # if (3)




Overlap Detection 1

  • When searching for matches of a short string in a database of long strings, we don’t want to penalize overhangs

[Figure: two overlap configurations of x = x1…xM and y = y1…yN, in which the end of one string overhangs the other]
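One standard way to leave overhangs unpenalized is the “overlap match” variant described by Durbin et al.: skipping a prefix of either string is free (F(i, 0) = F(0, j) = 0), and the best score is taken anywhere on the last row or last column, so a trailing overhang is also free. A minimal Python sketch under the same illustrative parameters; the function name is our own, and this is not the thresholded recurrence with T.

```python
def overlap_score(x: str, y: str, m: float = 1.0, s: float = 0.5, d: float = 1.0) -> float:
    """Best overlap-alignment score: overhanging ends of either string cost nothing."""
    M, N = len(x), len(y)
    # F(i, 0) = F(0, j) = 0: skipping a prefix of x or of y is free.
    F = [[0.0] * (N + 1) for _ in range(M + 1)]
    for i in range(1, M + 1):
        for j in range(1, N + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + (m if x[i - 1] == y[j - 1] else -s),
                F[i - 1][j] - d,
                F[i][j - 1] - d,
            )
    # Ending on the last row or last column leaves an unpenalized suffix overhang.
    last_row = max(F[M][j] for j in range(N + 1))
    last_col = max(F[i][N] for i in range(M + 1))
    return max(last_row, last_col)


print(overlap_score("ACGTTT", "TTTGCA"))  # 3.0
```

Here the suffix "TTT" of x overlaps the prefix "TTT" of y, and the non-overlapping ends "ACG" and "GCA" cost nothing.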


Overlap Detection 2

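$$F(i,0) = \max\begin{cases} F(i-1,0) \\ F(i-1,j) - T, & j = 1,\dots,N \end{cases}$$

$$F(i,j) = \max\begin{cases} F(i-1,j-1) + s(x_i,y_j) \\ F(i-1,j) - d \\ F(i,j-1) - d \end{cases}$$

where T is a score threshold.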


Overlap Detection 3

$$F(i,j) = \max\begin{cases} 0 \\ F(i-1,j-1) + s(x_i,y_j) \\ F(i-1,j) - d \\ F(i,j-1) - d \end{cases}$$

$$F(i,0) = \max\begin{cases} F(i-1,0) \\ F(i-1,j) - T, & j = 1,\dots,N \end{cases}$$

[Figures: Needleman-Wunsch with overlap detection; Smith-Waterman with overlap detection]


Bounded Dynamic Programming

Initialization:

F(i, 0), F(0, j) undefined for i, j > k

Iteration:

For i = 1…M

    For j = max(1, i – k)…min(N, i + k)

$$F(i,j) = \max\begin{cases} F(i-1,j-1) + s(x_i,y_j) \\ F(i,j-1) - d, & \text{if } j > i - k(N) \\ F(i-1,j) - d, & \text{if } j < i + k(N) \end{cases}$$

Termination: same

[Figure: band of width k(N) around the main diagonal of the alignment matrix for x1…xM and y1…yN]
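A minimal Python sketch of bounded (banded) dynamic programming, assuming a fixed band half-width k in place of k(N) and the same illustrative scores; undefined cells are represented by −∞, so F(M, N) is −∞ whenever |M − N| > k. The function name is our own.

```python
import math


def banded_nw_score(x: str, y: str, k: int, m: float = 1.0, s: float = 0.5, d: float = 1.0) -> float:
    """Global score restricted to cells with |i - j| <= k (-inf if (M, N) is outside the band)."""
    M, N = len(x), len(y)
    F = [[-math.inf] * (N + 1) for _ in range(M + 1)]  # -inf stands for "undefined"
    F[0][0] = 0.0
    for j in range(1, min(N, k) + 1):
        F[0][j] = -j * d
    for i in range(1, min(M, k) + 1):
        F[i][0] = -i * d
    for i in range(1, M + 1):
        for j in range(max(1, i - k), min(N, i + k) + 1):
            best = F[i - 1][j - 1] + (m if x[i - 1] == y[j - 1] else -s)
            if j > i - k:  # left neighbour (i, j - 1) lies inside the band
                best = max(best, F[i][j - 1] - d)
            if j < i + k:  # upper neighbour (i - 1, j) lies inside the band
                best = max(best, F[i - 1][j] - d)
            F[i][j] = best
    return F[M][N]


print(banded_nw_score("ACGT", "AGT", k=1))  # 2.0
```

Only about (2k + 1) × M cells are filled, so the time drops from O(MN) to O(kM); the result matches the unbanded score whenever an optimal alignment stays inside the band.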


Longest Common Subsequence 1

  • Initialization

    • F(0, 0) = 0

    • F(0, j) = 0

    • F(i, 0) = 0

  • Main iteration

    • For each i = 1…M, for each j = 1…N:

      $$F(i,j) = \max\begin{cases} F(i-1,j-1) + 1, & \text{if } x_i = y_j \quad \text{[case 1]} \\ F(i-1,j), & \text{if } x_i \neq y_j \quad \text{[case 2]} \\ F(i,j-1), & \text{if } x_i \neq y_j \quad \text{[case 3]} \end{cases}$$

      Ptr(i, j) = % if [case 1], ! if [case 2], # if [case 3]

  • Termination: F(M, N) is the length of a longest common subsequence, and tracing back from Ptr(M, N) recovers one such subsequence.


Longest Common Subsequence 2

Initialization

F(0, 0) = 0

F(0, j) = 0

F(i, 0) = 0

$$F(i,j) = \max\begin{cases} F(i-1,j-1) + 1, & \text{if } x_i = y_j \quad (1) \\ F(i-1,j), & \text{if } x_i \neq y_j \quad (2) \\ F(i,j-1), & \text{if } x_i \neq y_j \quad (3) \end{cases}$$

Ptr(i, j) = % if (1), ! if (2), # if (3)


Longest Common Subsequence 3

Cormen et al.: error on page 353

Corrected (to obtain Figure 15.6):

    m = length[X]
    n = length[Y]
    for i = 1 to m
        do c[i, 0] = 0
    for j = 0 to n
        do c[0, j] = 0
    for i = 1 to m
        for j = 1 to n
            if xi = yj then
                c[i, j] = c[i-1, j-1] + 1
                b[i, j] = “%”
            else if c[i-1, j] > c[i, j-1] then
                c[i, j] = c[i-1, j]
                b[i, j] = “!”
            else
                c[i, j] = c[i, j-1]
                b[i, j] = “#”
    return c and b
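The corrected pseudocode maps onto Python almost line for line. A minimal sketch: the function name `lcs` and the traceback loop (which follows b from (m, n) and emits a character at every "%") are our own additions, and it keeps the same strict ">" comparison.

```python
def lcs(X: str, Y: str):
    """Returns (length of an LCS, one LCS) using tables c and b as in the pseudocode."""
    m, n = len(X), len(Y)
    c = [[0] * (n + 1) for _ in range(m + 1)]  # c[i][0] = c[0][j] = 0
    b = [[""] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:  # x_i = y_j (Python strings are 0-indexed)
                c[i][j] = c[i - 1][j - 1] + 1
                b[i][j] = "%"
            elif c[i - 1][j] > c[i][j - 1]:
                c[i][j] = c[i - 1][j]
                b[i][j] = "!"
            else:
                c[i][j] = c[i][j - 1]
                b[i][j] = "#"
    # Follow b from (m, n); each "%" contributes one character of the LCS.
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if b[i][j] == "%":
            out.append(X[i - 1])
            i, j = i - 1, j - 1
        elif b[i][j] == "!":
            i -= 1
        else:
            j -= 1
    return c[m][n], "".join(reversed(out))


print(lcs("ABCBDAB", "BDCABA"))
```

For X = ABCBDAB and Y = BDCABA the LCS length is 4; which length-4 subsequence comes back depends on the tie-breaking rule in the comparison.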


Performance

  • Running time: O(mn) to fill the table, plus O(m + n) to trace back the output

  • Storage: O(mn)

    • Possible to eliminate backpointer matrix for some problems

  • Improvements

    • Overlap detection

    • Partitioning: Find local alignments to seed global alignment

    • Bounded DP

    • Gap opening vs. gap extension (affine gap penalties; Gotoh 1982)

    • Biochemically significant scoring function
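When only the optimal score is needed, the backpointer matrix and all but two rows of F can be dropped, because row i depends only on row i − 1. A minimal Python sketch for the global (Needleman-Wunsch) score under the same illustrative parameters; the function name is our own, and recovering the alignment itself in linear space needs a further technique not shown here.

```python
def nw_score_linear_space(x: str, y: str, m: float = 1.0, s: float = 0.5, d: float = 1.0) -> float:
    """Needleman-Wunsch score only, keeping two rows of F instead of the full matrix."""
    prev = [-j * d for j in range(len(y) + 1)]  # row 0: F(0, j) = -j*d
    for i in range(1, len(x) + 1):
        curr = [-i * d] + [0.0] * len(y)  # F(i, 0) = -i*d
        for j in range(1, len(y) + 1):
            curr[j] = max(
                prev[j - 1] + (m if x[i - 1] == y[j - 1] else -s),  # case 1
                prev[j] - d,  # case 2
                curr[j - 1] - d,  # case 3
            )
        prev = curr  # row i becomes the previous row for row i + 1
    return prev[-1]


print(nw_score_linear_space("AGGCGGATC", "TAGCATCTAC"))
```

Storage falls from O(mn) to O(n); swapping the arguments so the shorter string indexes the row gives O(min(m, n)).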


Sources

Altschul, S.F., et al. 1990. Basic local alignment search tool. Journal of Molecular Biology 215(3): 403-410.

Bellman, R. 1957. Dynamic Programming. Princeton University Press, Princeton.

Cormen, T.H., et al. 2001. Introduction to Algorithms. MIT Press, Cambridge.

Dreyfus, S. 2002. Richard Bellman on the birth of dynamic programming. Operations Research 50: 48-51.

Durbin, R., et al. 1998. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press, New York.

Gotoh, O. 1982. An improved algorithm for matching biological sequences. Journal of Molecular Biology 162: 705-708.

Gusfield, D. 1997. Algorithms on Strings, Trees, and Sequences. Cambridge University Press, New York.

Needleman, S.B. and Wunsch, C.D. 1970. A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of Molecular Biology 48: 443-453.

Preiss, B.R. Data Structures and Algorithms with Object-Oriented Design Patterns in C#.

Smith, T.F. and Waterman, M.S. 1981. Identification of common molecular subsequences. Journal of Molecular Biology 147: 195-197.

Wikipedia


Sequence Alignment Algorithm X

AGGCTATCACCTGACCTCCAGGCCGATGCCC

TAGCTATCACGACCGCGGTCGATTTGCCCGAC

-AGGCTATCACCTGACCTCCAGGCCGA--TGCCC---

TAG-CTATCAC--GACCGC--GGTCGATTTGCCCGAC

Given two strings x = x1x2…xM and y = y1y2…yN, find the alignment with maximum score

$$F = (\#\text{matches}) \times m - (\#\text{mismatches}) \times s - (\#\text{gaps}) \times d$$

