# Data Structures and Algorithms (I)

## PowerPoint Slideshow about 'Data Structures and Algorithms (I)' - nerita


### Data Structures and Algorithms (I)

http://www.csie.ntu.edu.tw/~hil/

Data Structures and Algorithms (I)

Outline of this slide
• Dynamic programming
• Fibonacci sequence
• Stamp problem
• Sequence alignment
• Matrix multiplication

Leonardo Fibonacci (1170-1250)

Old hens never die… They just lay eggs!
• At the beginning of Day 1, there is a hen.
• Each hen lays an egg every 24 hours.
• Each egg takes 24 hours to become a hen.
• F(n) = the number of hens at the end of day n.
• Give an algorithm to compute F(n).

[Figures: the hens and eggs on Day 1, Day 2, Day 3, Day 4, and Day 5, one picture per day.]

E(n) = the number of eggs at the end of Day n
• E(n) = ?
• F(n) = ?

The recurrence relation

$F(1) = F(2) = 1$, and $F(n) = F(n-1) + F(n-2)$ for $n \ge 3$.

The recursive algorithm
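The slide's own code did not survive extraction. A minimal Python sketch of a direct recursive implementation, assuming the standard Fibonacci recurrence with F(1) = F(2) = 1 (the function name `fib_recursive` is illustrative, not from the slides):

```python
def fib_recursive(n: int) -> int:
    """Return F(n) by direct recursion, with F(1) = F(2) = 1."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    if n <= 2:
        return 1
    # Assumed recurrence: F(n) = F(n-1) + F(n-2) for n >= 3.
    return fib_recursive(n - 1) + fib_recursive(n - 2)


print(fib_recursive(8))  # 21
```

Each call spawns two further calls, so the same subproblems are recomputed many times.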

Very Inefficient!

It takes time $O(1.61803399^{n})$ to compute F(n) using this recursive algorithm. For scale, $1.618^{30} \approx 1.86 \times 10^{6}$ and $1.618^{365} \approx 1.89 \times 10^{76}$.

Recursion tree for F(8), level by level:
• F(8)
• F(7), F(6)
• F(6), F(5), F(5), F(4)
• F(5), F(4), F(4), F(3), F(4), F(3), F(3), F(2)

Dynamic-Programming Approach

The DP-algorithm takes only O(n) time and space!
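As a sketch of what such a DP algorithm might look like (the slide's code is not in the transcript, so the table layout and the name `fib_dp` are assumptions), each value is stored once and reused:

```python
def fib_dp(n: int) -> int:
    """Return F(n) using a table of size O(n); each entry is computed once."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    table = [0] * (max(n, 2) + 1)  # table[i] will hold F(i); index 0 is unused
    table[1] = table[2] = 1
    for i in range(3, n + 1):
        table[i] = table[i - 1] + table[i - 2]
    return table[n]


print(fib_dp(30))  # 832040
```

The loop runs n - 2 times and the table has n + 1 entries, which matches the O(n) time and space stated above (counting each addition as one step).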

Illustration

Dynamic Programming
• A clever way to implement recursion:
• Using storage to avoid unnecessarily duplicated efforts.
• Let every path you have walked leave a trace.

Question
• Is it possible to keep the running time linear while reducing the space to O(1)?
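One possible answer, sketched in Python: since each value depends only on the two before it, two variables suffice. The name `fib_constant_space` is illustrative, and "O(1) space" here counts stored values, not the bits of the growing integers.

```python
def fib_constant_space(n: int) -> int:
    """Return F(n) in linear time while keeping only the last two values."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    prev, curr = 1, 1  # F(1), F(2)
    for _ in range(n - 2):
        prev, curr = curr, prev + curr
    return curr
```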

### Another example

Choosing stamps

The problem
• If the postage is n, what is the minimum number of stamps to cover the postage?

A recursive algorithm

The DP-version

The DP-algorithm takes only O(n) time and space!
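The stamp denominations on the original slide were not captured in this transcript. The sketch below therefore assumes hypothetical denominations (1, 5, 12) and assumes the stamps must add up to exactly n; `min_stamps` is an illustrative name.

```python
from typing import Sequence


def min_stamps(n: int, denominations: Sequence[int] = (1, 5, 12)) -> int:
    """Minimum number of stamps summing to exactly n (hypothetical denominations)."""
    unreachable = n + 1  # more stamps than any valid solution could use
    best = [0] + [unreachable] * n  # best[a] = fewest stamps for postage a
    for amount in range(1, n + 1):
        for d in denominations:
            if d <= amount:
                best[amount] = min(best[amount], best[amount - d] + 1)
    if best[n] == unreachable:
        raise ValueError("postage cannot be formed from these denominations")
    return best[n]


print(min_stamps(15))  # 3, using 5 + 5 + 5
```

With a fixed set of denominations, each of the n table entries is filled in constant time, which is where the O(n) bound comes from. Note that a greedy choice (12 + 1 + 1 + 1) would use four stamps here.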

Illustration

Question
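• So far we have only asked how many stamps are needed.
• If we want to know how to achieve the minimum number of stamps, that is, how many stamps of each denomination to use, what should we do? Does it require extra space?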

### Sequence Alignment

Aligning two strings
• A = attgatcctag
• B = acttagtccttcgc
• A → a-ttga-tcc-tag-
• B → actt-agtccttcgc

(Each "-" in the two aligned rows marks a gap.)

Measuring an alignment

Scoring matrix

BLAST matrix

Transition/Transversion matrix

Other scoring matrices

Scoring matrix is an art
• Log odds matrix
• score[i, j] = log(q(i, j) / (p(i) p(j))).
• PAM matrix
• Point accepted mutations
• BLOSUM matrix
• Blocks substitution matrix
• Steven Henikoff and Jorja G. Henikoff (1992).
• Other specialized scoring matrices
• Domenico Bordo and Patrick Argos (1991).
• Jean-Michel Claverie (JCB 1993).
• Lee F. Kowlakowski and Kenneth A. Rice (Nature 1994)

Scoring an alignment
• a - t t g a - t c c - t a g -
• c c t t - a g t c c t t c g c

Column scores: -2 -1 +2 +2 -1 +2 -1 +2 +2 +2 -1 +2 -2 +2 -1

• score = 7
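The column scores above are consistent with +2 for a match, -2 for a mismatch, and -1 for a character against a gap. That scoring is inferred from this one example, not stated on the slide. A small sketch that scores a given alignment under that assumption (`alignment_score` is an illustrative name):

```python
def alignment_score(row_a: str, row_b: str,
                    match: int = 2, mismatch: int = -2, gap: int = -1) -> int:
    """Sum the column scores of two equal-length aligned rows ('-' is a gap)."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    total = 0
    for x, y in zip(row_a, row_b):
        if x == "-" or y == "-":
            total += gap
        elif x == y:
            total += match
        else:
            total += mismatch
    return total


print(alignment_score("a-ttga-tcc-tag-", "cctt-agtccttcgc"))  # 7
```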

String alignment problem
• Input:
• two strings A and B; and
• a scoring table 分 (分 means "score").
• Output:
• an alignment of A and B that has the maximum score with respect to 分.

Q: Any naïve methods?
• A = attgatcctag
• B = ccttagtccttcgc

Q: Is there a recursive method?
• A = attgatcctag
• B = ccttagtccttcgc

Yes, but very inefficient!

    int align(m, n) {
        if (m = n = 0) return 0;
        let x = y = z = −∞;
        if (m > 0 and n > 0) let x = align(m − 1, n − 1) + Score[A[m], B[n]];
        if (m > 0) let y = align(m − 1, n) + Score[A[m], −];
        if (n > 0) let z = align(m, n − 1) + Score[−, B[n]];
        return max(x, y, z);
    }
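A runnable Python sketch of the recursive method, assuming the +2 / -2 / -1 scoring inferred from the earlier scored alignment (the helper `score` and the name `align_recursive` are illustrative). It tries the three possible last columns and keeps the best, so the running time is exponential in the string lengths.

```python
def score(x: str, y: str) -> int:
    """Assumed scoring: +2 match, -2 mismatch, -1 against a gap ('-')."""
    if x == "-" or y == "-":
        return -1
    return 2 if x == y else -2


def align_recursive(a: str, b: str) -> int:
    """Maximum alignment score of a and b by plain recursion (exponential time)."""
    m, n = len(a), len(b)
    if m == 0 and n == 0:
        return 0
    x = y = z = float("-inf")
    if m > 0 and n > 0:  # last column pairs a[m-1] with b[n-1]
        x = align_recursive(a[:-1], b[:-1]) + score(a[-1], b[-1])
    if m > 0:            # last column pairs a[m-1] with a gap
        y = align_recursive(a[:-1], b) + score(a[-1], "-")
    if n > 0:            # last column pairs a gap with b[n-1]
        z = align_recursive(a, b[:-1]) + score("-", b[-1])
    return max(x, y, z)


print(align_recursive("attga", "ccttagtc"))  # -1
```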

Alignment graph

[Figure: a grid graph whose columns are labeled with B = ccttagtc and whose rows are labeled with A = attga.]

Observations
• Each alignment corresponds to a maximal path on the alignment graph.
• The score of an alignment is the score of its corresponding maximal path.

[Figure: the alignment graph for B = ccttagtc (columns) and A = attga (rows), with the path of the following alignment.]

c c t t - a g t c

a - t t g a - - -

Score of edges

[Figure: the edges of the alignment graph around row A[i] and column B[j], labeled with their scores.]

### The graph problem

Finding a maximal path with maximum score on the alignment graph (a directed acyclic graph)

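Idea

For each i = 0, 1, …, |A| and each j = 0, 1, …, |B|, let 點[i, j] (點 means "node") keep the maximum score of aligning A[1…i] and B[1…j].

[Figure: the grid of nodes, with rows indexed by i = 0, 1, …, |A| and columns by j = 0, 1, …, |B|; row i is labeled A[i] and column j is labeled B[j].]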

An observation

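For example

|   |   | c | c | t | t | a | g | t | c |
|---|---|---|---|---|---|---|---|---|---|
|   | 0 | -1 | -2 | -3 | -4 | -5 | -6 | -7 | -8 |
| a | -1 | -2 | -3 | -4 | -5 | -2 | -3 | -4 | -5 |
| t | -2 | -3 | -4 | -1 | -2 | -3 | -4 | -1 | -2 |
| t | -3 | -4 | -5 | -2 | 1 | 0 | -1 | -2 | -3 |
| g | -4 | -5 | -6 | -3 | 0 | -1 | 2 | 1 | 0 |
| a | -5 | -6 | -7 | -4 | -1 | 2 | 1 | 0 | -1 |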

The DP-version.

    int align(m, n) {
        let C[0, 0] = 0;
        for i = 1 to m let C[i, 0] = C[i − 1, 0] + Score[A[i], −];
        for j = 1 to n let C[0, j] = C[0, j − 1] + Score[−, B[j]];
        for i = 1 to m
            for j = 1 to n {
                let x = C[i − 1, j − 1] + Score[A[i], B[j]];
                let y = C[i − 1, j] + Score[A[i], −];
                let z = C[i, j − 1] + Score[−, B[j]];
                let C[i, j] = max(x, y, z);
            }
        return C[m, n];
    }

Here C[i, j] plays the role of 點[i, j].

Complexity
• Space = O(|A|×|B|).
• Each node keeps a score and a pointer, and thus requires only O(1) space.
• Time = O(|A|×|B|).
• The content of each node can be obtained from those of at most three nodes in O(1) time.
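A Python sketch of the table-filling algorithm with these bounds, assuming the +2 / -2 / -1 scoring inferred from the worked example (`score` and `align_table` are illustrative names). Each cell is derived from its three neighbors (diagonal, above, left) in constant time.

```python
from typing import List


def score(x: str, y: str) -> int:
    """Assumed scoring: +2 match, -2 mismatch, -1 against a gap ('-')."""
    if x == "-" or y == "-":
        return -1
    return 2 if x == y else -2


def align_table(a: str, b: str) -> List[List[int]]:
    """Return the (len(a)+1) x (len(b)+1) table of best prefix-alignment scores."""
    m, n = len(a), len(b)
    C = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        C[i][0] = C[i - 1][0] + score(a[i - 1], "-")
    for j in range(1, n + 1):
        C[0][j] = C[0][j - 1] + score("-", b[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            C[i][j] = max(C[i - 1][j - 1] + score(a[i - 1], b[j - 1]),
                          C[i - 1][j] + score(a[i - 1], "-"),
                          C[i][j - 1] + score("-", b[j - 1]))
    return C


table = align_table("attga", "ccttagtc")
print(table[5][8])  # -1, the bottom-right entry of the example table
```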

Question
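• So far we have only computed the optimal score.
• If we want to know an alignment that achieves this optimal score, what should we do? Does it require extra space?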
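One possible answer, sketched below under the assumed +2 / -2 / -1 scoring: once the table is full, walk back from the bottom-right cell and, at each step, move to a neighbor whose score explains the current one. This needs no storage beyond the table itself; storing an explicit pointer per cell is an equivalent alternative. All names are illustrative.

```python
from typing import Tuple


def score(x: str, y: str) -> int:
    """Assumed scoring: +2 match, -2 mismatch, -1 against a gap ('-')."""
    if x == "-" or y == "-":
        return -1
    return 2 if x == y else -2


def best_alignment(a: str, b: str) -> Tuple[str, str, int]:
    """Return (row_a, row_b, score) for one optimal alignment of a and b."""
    m, n = len(a), len(b)
    C = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        C[i][0] = C[i - 1][0] + score(a[i - 1], "-")
    for j in range(1, n + 1):
        C[0][j] = C[0][j - 1] + score("-", b[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            C[i][j] = max(C[i - 1][j - 1] + score(a[i - 1], b[j - 1]),
                          C[i - 1][j] + score(a[i - 1], "-"),
                          C[i][j - 1] + score("-", b[j - 1]))

    # Trace back from (m, n) to (0, 0), emitting columns in reverse order.
    row_a, row_b = [], []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and C[i][j] == C[i - 1][j - 1] + score(a[i - 1], b[j - 1]):
            row_a.append(a[i - 1])
            row_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and C[i][j] == C[i - 1][j] + score(a[i - 1], "-"):
            row_a.append(a[i - 1])
            row_b.append("-")
            i -= 1
        else:
            row_a.append("-")
            row_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(row_a)), "".join(reversed(row_b)), C[m][n]
```

Several alignments may share the optimal score; which one is returned depends on the order in which the three cases are tested.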

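For example

|   |   | c | c | t | t | a | g | t | c |
|---|---|---|---|---|---|---|---|---|---|
|   | 0 | -1 | -2 | -3 | -4 | -5 | -6 | -7 | -8 |
| a | -1 | -2 | -3 | -4 | -5 | -2 | -3 | -4 | -5 |
| t | -2 | -3 | -4 | -1 | -2 | -3 | -4 | -1 | -2 |
| t | -3 | -4 | -5 | -2 | 1 | 0 | -1 | -2 | -3 |
| g | -4 | -5 | -6 | -3 | 0 | -1 | 2 | 1 | 0 |
| a | -5 | -6 | -7 | -4 | -1 | 2 | 1 | 0 | -1 |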


### Challenge

Reducing the space complexity

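First attempt

|   |   | c | c | t | t | a | g | t | c |
|---|---|---|---|---|---|---|---|---|---|
|   | 0 | -1 | -2 | -3 | -4 | -5 | -6 | -7 | -8 |
| a | -1 | -2 | -3 | -4 | -5 | -2 | -3 | -4 | -5 |
| t | -2 | -3 | -4 | -1 | -2 | -3 | -4 | -1 | -2 |
| t | -3 | -4 | -5 | -2 | 1 | 0 | -1 | -2 | -3 |
| g | -4 | -5 | -6 | -3 | 0 | -1 | 2 | 1 | 0 |
| a | -5 | -6 | -7 | -4 | -1 | 2 | 1 | 0 | -1 |

What is the problem?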

### Knowing the maximum score, but …

Not knowing the corresponding alignment
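One plausible reading of the first attempt is to keep only the most recent column of the table, since each column depends only on the one before it. The sketch below does this under the assumed +2 / -2 / -1 scoring (names are illustrative). It uses O(|A|) space and returns the optimal score, but the discarded columns are exactly what a traceback would need.

```python
from typing import List


def score(x: str, y: str) -> int:
    """Assumed scoring: +2 match, -2 mismatch, -1 against a gap ('-')."""
    if x == "-" or y == "-":
        return -1
    return 2 if x == y else -2


def last_column(a: str, b: str) -> List[int]:
    """Best scores of aligning a[:i] with all of b, for i = 0..len(a)."""
    m = len(a)
    col = [0] * (m + 1)
    for i in range(1, m + 1):
        col[i] = col[i - 1] + score(a[i - 1], "-")
    for ch in b:  # advance one column of the table per character of b
        new = [col[0] + score("-", ch)] + [0] * m
        for i in range(1, m + 1):
            new[i] = max(col[i - 1] + score(a[i - 1], ch),
                         col[i] + score("-", ch),
                         new[i - 1] + score(a[i - 1], "-"))
        col = new
    return col


print(last_column("attga", "ccttagtc")[-1])  # -1
```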

### Q: Can we deduce an optimal alignment from the optimal score?

A key observation

[Figure: the alignment graph for B = ccttagtc (columns) and A = attga (rows).]

Finding an index i …

Time = O(|A||B|)?

Space = O(|A|)?

[Figure: the alignment graph for B = ccttagtc (columns) and A = attga (rows), with columns 0, |B|/2, and |B| marked.]

Trick

The following two scores are the same.

[Figure: the alignment graph for B = ccttagtc (columns) and A = attga (rows), with columns 0, |B|/2, and |B| marked.]

After locating the index i

|A1| |B1| + |A2| |B2| = |A| |B| / 2.

[Figure: the alignment graph for B = ccttagtc (columns) and A = attga (rows), with columns 0, |B|/2, and |B| marked.]

Overall complexity
• Time = O(|A||B|).
• Why?
• O(|A||B| + |A||B|/2 + |A||B|/4 + |A||B|/8 + …) = O(|A||B|).
• Space = O(|A|).
• Why?
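The divide-and-conquer scheme on these slides (split B at |B|/2, locate the index i where an optimal path crosses that column, then recurse on both sides) is commonly known as Hirschberg's algorithm. A Python sketch follows, assuming the +2 / -2 / -1 scoring inferred from the worked example; all function names are illustrative, and the string slicing is kept for readability, not for tight memory use.

```python
from typing import List, Tuple


def score(x: str, y: str) -> int:
    """Assumed scoring: +2 match, -2 mismatch, -1 against a gap ('-')."""
    if x == "-" or y == "-":
        return -1
    return 2 if x == y else -2


def last_column(a: str, b: str) -> List[int]:
    """Best scores of aligning a[:i] with all of b, in O(len(a)) space."""
    m = len(a)
    col = [0] * (m + 1)
    for i in range(1, m + 1):
        col[i] = col[i - 1] + score(a[i - 1], "-")
    for ch in b:
        new = [col[0] + score("-", ch)] + [0] * m
        for i in range(1, m + 1):
            new[i] = max(col[i - 1] + score(a[i - 1], ch),
                         col[i] + score("-", ch),
                         new[i - 1] + score(a[i - 1], "-"))
        col = new
    return col


def align_linear_space(a: str, b: str) -> Tuple[str, str]:
    """Return one optimal alignment (row_a, row_b) of a and b."""
    m = len(a)
    if not b:
        return a, "-" * m
    if len(b) == 1:
        # Either b pairs with the best a[i], or every character faces a gap.
        all_gaps = sum(score(x, "-") for x in a)
        best_i, best = None, all_gaps + score("-", b)
        for i, x in enumerate(a):
            s = all_gaps - score(x, "-") + score(x, b)
            if s > best:
                best_i, best = i, s
        if best_i is None:
            return a + "-", "-" * m + b
        return a, "-" * best_i + b + "-" * (m - best_i - 1)
    mid = len(b) // 2
    left = last_column(a, b[:mid])                # a[:i] against the left half
    right = last_column(a[::-1], b[mid:][::-1])   # a[i:] against the right half
    i = max(range(m + 1), key=lambda k: left[k] + right[m - k])
    a1, b1 = align_linear_space(a[:i], b[:mid])
    a2, b2 = align_linear_space(a[i:], b[mid:])
    return a1 + a2, b1 + b2
```

After the split, the two subproblems together cover about half of the original table, which is the source of the geometric series in the time bound above.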

### Application 1

Longest common subsequence

Subsequence
• For any indices 1 ≤ i1 < i2 < … < ik ≤ |A|, A[i1] A[i2] A[i3] … A[ik] is a subsequence of A.
• For example, A = 0 1 1 0 1 0 1
• 0 1 1 1, 0 0 0, and 1 0 1 0 1 are subsequences of A.
• 0 1 0 1 1 0 is not a subsequence of A.

Longest Common Subsequence
• Input: two strings A and B
• Output: a longest string C that is a subsequence of both A and B.

Any naïve algorithm?

It’s an alignment problem…
• …with respect to the following scoring matrix:

Why?
• Each alignment with score k corresponds to a common subsequence of length k.

0 1 1 - 1 0 - - 0 1 1 -

- 1 0 1 1 0 1 0 - 1 1 0

Common subsequence read off the matching columns: 1 1 0 1 1
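The scoring matrix on the slide was an image and is not in the transcript. The sketch below assumes the natural choice, 1 for a matching column and 0 for everything else, so that the alignment score counts matched characters; `lcs` is an illustrative name.

```python
def lcs(a: str, b: str) -> str:
    """Return one longest common subsequence of a and b."""
    m, n = len(a), len(b)
    # C[i][j] = best alignment score of a[:i] and b[:j] (match = 1, else 0).
    C = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            match = 1 if a[i - 1] == b[j - 1] else 0
            C[i][j] = max(C[i - 1][j - 1] + match, C[i - 1][j], C[i][j - 1])

    # Trace back, collecting the characters of matching columns.
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1] and C[i][j] == C[i - 1][j - 1] + 1:
            out.append(a[i - 1])
            i, j = i - 1, j - 1
        elif C[i][j] == C[i - 1][j]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


print(len(lcs("ABCBDAB", "BDCABA")))  # 4
```

The alignment shown above has score 5 under this scoring; it illustrates the correspondence and is not necessarily an optimal alignment of those two strings.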

### Application 2

Edit distance between two strings

Edit operations
• Inserting a character at position i
• Deleting a character at position i
• Replacing a character at position i by a new character

Edit distance
• The edit distance between two strings A and B is the minimum number of edit operations required to turn A into B.

The edit distance problem
• Input: two strings A and B
• Output: the edit distance of A and B.

Any naïve algorithm?

It’s an alignment problem…
• …with respect to the following scoring matrix:

Why?
• Each alignment with score -k corresponds to a sequence of k edit operations that turns A into B.

0 1 1 - 1 0 - - 0 1 1 -

- 1 0 1 1 0 1 0 - 1 1 0

Scores of the seven non-matching columns: -1 -1 -1 -1 -1 -1 -1

Corresponding edit operations, left to right: d r i i i d i (d = delete, r = replace, i = insert)
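The scoring matrix for this application was an image and is missing from the transcript. The example is consistent with 0 for a matching column and -1 for every other column, and the sketch below assumes exactly that. It computes the maximum alignment score one row at a time and returns its negation; `edit_distance` is an illustrative name.

```python
def edit_distance(a: str, b: str) -> int:
    """Edit distance of a and b, computed as minus the best alignment score."""
    def score(x: str, y: str) -> int:
        # Assumed matrix: 0 for a match, -1 for a mismatch or a gap.
        return 0 if (x == y and x != "-") else -1

    n = len(b)
    prev = [0] * (n + 1)
    for j in range(1, n + 1):
        prev[j] = prev[j - 1] + score("-", b[j - 1])
    for x in a:
        curr = [prev[0] + score(x, "-")] + [0] * n
        for j in range(1, n + 1):
            curr[j] = max(prev[j - 1] + score(x, b[j - 1]),
                          prev[j] + score(x, "-"),
                          curr[j - 1] + score("-", b[j - 1]))
        prev = curr
    return -prev[n]


print(edit_distance("kitten", "sitting"))  # 3
```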

### A challenge

Speeding up the edit-distance algorithm

The challenge
• Input: two strings A and B with |A| ≤ |B|.
• Output: the edit distance k between A and B.
• Objective:
• Time: O(k|A|).
• Note that we do not know k in advance, since otherwise it does not make any sense to solve this problem. 

Observation 1
• Although we do not know k, we still know |B| – |A| ≤ k. Why?

[Figure: the alignment graph for B = ccttagtc (columns) and A = attga (rows).]

Observation 2

[Figure: the alignment graph for B = ccttagtc (columns) and A = attga (rows).]

Why?

Just computing 點[i,j] for 2k+1 diagonals

[Figure: the alignment graph for B = ccttagtc (columns) and A = attga (rows).]

Some thoughts
• Idea:
• it suffices to evaluate 點[i, j] for all indices i and j with |i – j| ≤ k.
• But we don’t know k…
• Modified idea:
• It suffices to evaluate 點[i, j] for all indices i and j with |i – j| ≤ t for some number t ≥ k.
• Q: How to find such a t?

A key lemma
• Suppose we evaluate only those 點[i, j] with |i – j| ≤ t.
• If t ≥ k, then 點[|A|,|B|] = – k ≥ – t.
• If t < k, then 點[|A|,|B|] ≤ – k < – t.
• Therefore, we can determine whether t ≥ k by whether 點[|A|,|B|] ≥ – t after evaluating those 2t + 1 diagonals.

Why?

Algorithm
• For s = 1, 2, 4, 8, …
• Let t = s (|B| – |A|);
• Evaluate those 點[i, j] with |i – j| ≤ t.
• If 點[|A|,|B|] ≥ – t, then return – 點[|A|,|B|];

Time complexity: O(k|A|)
• Each iteration takes time O(t|A|).
• The last iteration dominates the time complexity, since at the beginning of each iteration the value of t is increased by a factor of 2.
• In the last iteration, we have t < 2k. Why?
• The last iteration takes time O(k|A|).
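A Python sketch of the doubling algorithm, with two stated deviations from the slides: it works with non-negative costs (the negated scores), so the stopping test reads cost ≤ t, and it uses t = s · max(1, |B| – |A|) so that t is never stuck at 0 when the strings have equal length. Function names are illustrative.

```python
from typing import Dict


def banded_cost(a: str, b: str, t: int) -> float:
    """Edit cost using only cells (i, j) with |i - j| <= t; inf if (|a|, |b|) is outside."""
    inf = float("inf")
    m, n = len(a), len(b)
    prev: Dict[int, float] = {j: j for j in range(0, min(n, t) + 1)}  # row 0
    for i in range(1, m + 1):
        curr: Dict[int, float] = {}
        for j in range(max(0, i - t), min(n, i + t) + 1):
            best = inf
            if j > 0 and (j - 1) in prev:   # diagonal: match or replace
                best = prev[j - 1] + (a[i - 1] != b[j - 1])
            if j in prev:                   # from above: delete a[i-1]
                best = min(best, prev[j] + 1)
            if j > 0 and (j - 1) in curr:   # from the left: insert b[j-1]
                best = min(best, curr[j - 1] + 1)
            curr[j] = best
        prev = curr
    return prev.get(n, inf)


def edit_distance_doubling(a: str, b: str) -> int:
    """Edit distance via bands of doubling width, following the key lemma."""
    if len(a) > len(b):
        a, b = b, a  # ensure |A| <= |B|; edit distance is symmetric
    base = max(1, len(b) - len(a))
    s = 1
    while True:
        t = s * base
        cost = banded_cost(a, b, t)
        if cost <= t:  # by the lemma, the band was wide enough
            return int(cost)
        s *= 2


print(edit_distance_doubling("kitten", "sitting"))  # 3
```

Each row of a band touches at most 2t + 1 cells, so one iteration costs O(t|A|) dictionary operations.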

### Matrix multiplication

Multiplying two matrices

Let A be a p × q matrix. Let B be a q × r matrix. Then the product of A and B is the p × r matrix M = A × B such that

$$M[i, j] = \sum_{1 \le k \le q} A[i, k] \cdot B[k, j]$$

holds for all indices i and j with 1 ≤ i ≤ p and 1 ≤ j ≤ r.

It takes O(pqr) time to obtain A × B from A and B. Why?
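A direct Python sketch of this definition, using lists of lists for matrices (the name `mat_mul` is illustrative, and non-empty inputs are assumed). The three nested loops run p, r, and q times respectively, which is one way to see the O(pqr) bound.

```python
from typing import List

Matrix = List[List[float]]


def mat_mul(A: Matrix, B: Matrix) -> Matrix:
    """Return the p x r product of a p x q matrix A and a q x r matrix B."""
    p, q, r = len(A), len(B), len(B[0])
    if any(len(row) != q for row in A):
        raise ValueError("columns of A must equal rows of B")
    M = [[0] * r for _ in range(p)]
    for i in range(p):
        for j in range(r):
            for k in range(q):
                M[i][j] += A[i][k] * B[k][j]
    return M


print(mat_mul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```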

Illustration

[Three figures illustrating the matrix product.]

Multiplying 3 matrices

A × B × C = (A × B) × C = A × (B × C).

The time required to obtain A × B × C can be affected by which two matrices are multiplied first.

An example

Consider three matrices of sizes n × 1, 1 × n, and n × n, multiplied in this order.
• Multiplying the first two first: the (n × 1)(1 × n) product takes Θ(n²) time and yields an n × n matrix; multiplying that by the n × n matrix takes Θ(n³) time. The overall time is Θ(n²) + Θ(n³) = Θ(n³).
• Multiplying the last two first: the (1 × n)(n × n) product takes Θ(n²) time and yields a 1 × n matrix; multiplying the n × 1 matrix by it takes Θ(n²) time. The overall time is Θ(n²) + Θ(n²) = Θ(n²).

The problem
• Input: a sequence of positive integers $\ell_0, \ell_1, \ldots, \ell_n$, where $\ell_{i-1}$ is the number of rows of matrix $M_i$, and $\ell_i$ is the number of columns of matrix $M_i$.
• Output: an order of performing those n − 1 matrix multiplications in the minimum number of operations to obtain the product $M_1 \times M_2 \times \cdots \times M_n$.

Illustration

[Figure: a chain of six matrices with dimensions $\ell_0, \ell_1, \ldots, \ell_6$.]

Any naïve algorithm?

C(i,j)

Let C(i, j) be the minimum number of operations required to obtain the product $M_i \times M_{i+1} \times \cdots \times M_j$.

Clearly, the minimum number of operations required to obtain the product of all n matrices is exactly C(1, n).

Recurrence

$$C(i, j) = \begin{cases} 0 & \text{if } i \ge j, \\ \min\limits_{i \le k < j} \bigl( C(i, k) + C(k+1, j) + \ell_{i-1}\,\ell_k\,\ell_j \bigr) & \text{otherwise.} \end{cases}$$

Dynamic programming
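The slide's algorithm is not in the transcript. A Python sketch that fills a table for C(i, j) in order of increasing chain length, so that every entry it needs is already available (`matrix_chain_cost` and the parameter `dims`, holding ℓ_0, …, ℓ_n, are illustrative names):

```python
from typing import List


def matrix_chain_cost(dims: List[int]) -> int:
    """Minimum number of scalar multiplications to compute M_1 x ... x M_n.

    dims[i-1] x dims[i] is the shape of M_i, so len(dims) == n + 1.
    """
    n = len(dims) - 1
    # C[i][j] = minimum cost of M_i x ... x M_j (1-indexed; 0 when i >= j).
    C = [[0] * (n + 1) for _ in range(n + 1)]
    for length in range(2, n + 1):            # number of matrices in the chain
        for i in range(1, n - length + 2):
            j = i + length - 1
            C[i][j] = min(C[i][k] + C[k + 1][j] + dims[i - 1] * dims[k] * dims[j]
                          for k in range(i, j))
    return C[1][n] if n >= 1 else 0


print(matrix_chain_cost([10, 1, 10, 10]))  # 200
```

The printed case uses shapes 10 × 1, 1 × 10, and 10 × 10: multiplying the last two first costs 100 + 100 = 200 operations, while the other order costs 100 + 1000 = 1100. The table has O(n²) entries and each takes O(n) time to fill.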

Complexity
• Time complexity: O(n³).
• Space complexity: O(n²).
• Can this be reduced to o(n²)?

### Have a wonderful weekend!!

See you next time
