Simple and fast linear space computation of longest common subsequences


Claus Rick, 1999

Example: A = AABAC, B = ABC

What is the LCS? Finding a sequence of the greatest possible length that can be obtained from both A and B by deleting zero or more (not necessarily adjacent) symbols. For the example above, the LCS is ABC.
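For reference, this is the textbook quadratic dynamic program that the linear-space methods in this deck improve on. It is a minimal Python sketch (the name lcs_quadratic is ours, not from the slides): it fills the full m x n table of LCS lengths and traces back one LCS, so it needs O(mn) space.

```python
def lcs_quadratic(a, b):
    """One LCS of a and b via the textbook O(mn)-time, O(mn)-space table."""
    m, n = len(a), len(b)
    L = [[0] * (n + 1) for _ in range(m + 1)]  # L[i][j] = LCS length of a[:i], b[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                L[i][j] = L[i - 1][j - 1] + 1
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])
    # Trace back from the bottom-right corner to recover one LCS.
    out, i, j = [], m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif L[i - 1][j] >= L[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


print(lcs_quadratic("AABAC", "ABC"))  # ABC
```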

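- Divide and conquer: as in Hirschberg's linear-space algorithm, split the problem into two smaller LCS problems and solve each one recursively.
- Midpoint: the split is made at a midpoint, a match that lies on some LCS.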
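The deck builds on Hirschberg's 1975 divide and conquer scheme. As a point of reference, here is a minimal Python sketch of that classic scheme (our own code; the function names are hypothetical): it splits A in half, uses two linear-space passes to find where B should be split, and recurses. This is Hirschberg's split rule, not the contour-based midpoint search described in these slides, and for brevity it copies string slices instead of passing index ranges.

```python
def _lcs_row(a, b):
    """Last row of the LCS table of a against every prefix of b, in O(len(b)) space."""
    row = [0] * (len(b) + 1)
    for ch in a:
        diag = 0  # holds L[i-1][j-1] while row[j] is being updated
        for j in range(1, len(b) + 1):
            up = row[j]  # L[i-1][j], about to be overwritten
            row[j] = diag + 1 if ch == b[j - 1] else max(up, row[j - 1])
            diag = up
    return row


def hirschberg(a, b):
    """One LCS of a and b via Hirschberg's divide and conquer."""
    if not a or not b:
        return ""
    if len(a) == 1:
        return a if a in b else ""
    mid = len(a) // 2
    n = len(b)
    left = _lcs_row(a[:mid], b)               # left[s]  = LCS(a[:mid], b[:s])
    right = _lcs_row(a[mid:][::-1], b[::-1])  # right[t] = LCS(a[mid:], b[n-t:])
    # Split b where the two halves together reach the maximum LCS length.
    s = max(range(n + 1), key=lambda s: left[s] + right[n - s])
    return hirschberg(a[:mid], b[:s]) + hirschberg(a[mid:], b[s:])


print(hirschberg("AABAC", "ABC"))  # ABC
```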

Ordered pair (i,j): position i of A paired with position j of B. With A = AABAC and B = ABC, (2,3) = (A,C), since A[2] = A and B[3] = C.

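Some basic terms (for A = AABAC, B = ABC)

Match: an ordered pair (i,j) with A[i] = B[j], e.g. (1,1) or (5,3).

Chain: a sequence of matches that increase strictly in both coordinates, e.g. (1,1), (3,2), (5,3). Every chain spells a common subsequence, here ABC.

Rank k: a match has rank k if the longest chain ending at it has length k; e.g. (5,3) has rank 3.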
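The three terms translate almost literally into code. This minimal sketch (helper names are ours) lists the matches, tests the chain condition, and computes each rank straight from the definition as one plus the largest rank among matches strictly above and to the left, which is quadratic in the number of matches and meant only for illustration.

```python
def matches(a, b):
    """All matches (i, j), 1-based, with a[i] == b[j], in row-major order."""
    return [(i, j) for i in range(1, len(a) + 1)
            for j in range(1, len(b) + 1) if a[i - 1] == b[j - 1]]


def is_chain(seq):
    """A chain: consecutive matches increase strictly in both coordinates."""
    return all(i1 < i2 and j1 < j2 for (i1, j1), (i2, j2) in zip(seq, seq[1:]))


def ranks(a, b):
    """Rank of each match: length of the longest chain ending at it."""
    rank = {}
    for i, j in matches(a, b):  # row-major: all predecessors are ranked first
        rank[(i, j)] = 1 + max((r for (i2, j2), r in rank.items()
                                if i2 < i and j2 < j), default=0)
    return rank


print(ranks("AABAC", "ABC"))
```

For the example this maps (1,1), (2,1), (4,1) to rank 1, (3,2) to rank 2 and (5,3) to rank 3.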

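Matching matrix (figure): rows labeled a b a c b c b a, columns labeled c b a b b a c a c; the cell in row i and column j is marked when the two symbols are equal, i.e. when (i,j) is a match.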
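To reproduce the figure in text form, a small helper (matching_matrix is our own, hypothetical name) prints the column string on top, the row string down the left, and marks each match with `*`.

```python
def matching_matrix(a, b):
    """Text rendering: rows follow a, columns follow b, '*' marks a match."""
    lines = ["  " + " ".join(b)]
    for ch in a:
        lines.append(ch + " " + " ".join("*" if ch == bj else "." for bj in b))
    return "\n".join(lines)


print(matching_matrix("abacbcba", "cbabbacac"))
```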

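Dominant matches: in each rank, the upper-left matches, i.e. the matches of rank k with no other match of rank k above and to the left of them.

Figure: the dominant matches of ranks 1 to 5 in the matching matrix of abacbcba and cbabbacac.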
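A minimal sketch of extracting the dominant matches (our own helper, not the paper's procedure): the quadratic DP gives every match its rank, and within one rank a match is dominant exactly when no earlier match in row-major order has a column index less than or equal to its own. The dominant matches of a rank therefore form a staircase, with rows increasing and columns decreasing.

```python
from collections import defaultdict


def dominant_matches(a, b):
    """Map each rank k to its dominant matches, listed top to bottom."""
    m, n = len(a), len(b)
    L = [[0] * (n + 1) for _ in range(m + 1)]  # L[i][j] = LCS length of a[:i], b[:j]
    by_rank = defaultdict(list)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                L[i][j] = L[i - 1][j - 1] + 1  # equals the rank of match (i, j)
                by_rank[L[i][j]].append((i, j))
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])
    dominant = {}
    for k, ms in sorted(by_rank.items()):
        dominant[k], best_j = [], n + 1
        for i, j in ms:  # already in row-major order
            if j < best_j:  # no rank-k match lies weakly above-left of (i, j)
                dominant[k].append((i, j))
                best_j = j
    return dominant


print(dominant_matches("abacbcba", "cbabbacac"))
```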

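Forward and backward contours (figure: FC_1 to FC_5 and BC_1 to BC_5 drawn on the matching matrix of abacbcba and cbabbacac)

FC_k, the kth forward contour: the matches of rank k; its upper-left corners are the dominant matches of rank k.

BC_k, the kth backward contour: defined the same way with both strings read from right to left, so ranks are counted from the lower-right corner of the matrix.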
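A minimal sketch, assuming FC_k and BC_k contain all matches of forward and backward rank k (our reading of the figure). Backward contours need no new machinery: they are the forward contours of the reversed strings, with positions mapped back by i -> m+1-i and j -> n+1-j. Function names are hypothetical.

```python
from collections import defaultdict


def forward_contours(a, b):
    """Map k -> matches of forward rank k (longest chain ending at the match)."""
    m, n = len(a), len(b)
    F = [[0] * (n + 1) for _ in range(m + 1)]  # F[i][j] = LCS length of a[:i], b[:j]
    fc = defaultdict(list)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                F[i][j] = F[i - 1][j - 1] + 1
                fc[F[i][j]].append((i, j))
            else:
                F[i][j] = max(F[i - 1][j], F[i][j - 1])
    return dict(fc)


def backward_contours(a, b):
    """BC_k: forward contours of the reversed strings, mapped back to original positions."""
    m, n = len(a), len(b)
    rev = forward_contours(a[::-1], b[::-1])
    return {k: sorted((m + 1 - i, n + 1 - j) for i, j in ms) for k, ms in rev.items()}


print(forward_contours("AABAC", "ABC"))
print(backward_contours("AABAC", "ABC"))
```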

Lemma 1: Let p be the length of an LCS between strings A and B. Then for every match (i,j) the following holds:

- There is an LCS containing (i,j) if and only if, for some k, (i,j) is on the kth forward contour and on the (p-k+1)st backward contour.
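The lemma can be checked mechanically on small inputs. This sketch (our own code, hypothetical names) again assumes contours contain all matches of a given rank. It selects the matches whose forward rank k and backward rank p-k+1 fit together, and compares the result with a brute-force enumeration of every common subsequence of length p, which is only feasible for tiny strings.

```python
from itertools import combinations


def match_ranks(a, b):
    """Forward rank and backward rank of every match (1-based), plus the LCS length p."""
    m, n = len(a), len(b)
    F = [[0] * (n + 1) for _ in range(m + 1)]  # F[i][j] = LCS(a[:i], b[:j])
    R = [[0] * (n + 1) for _ in range(m + 1)]  # R[i][j] = LCS(a[i:], b[j:])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            F[i][j] = F[i - 1][j - 1] + 1 if a[i - 1] == b[j - 1] else max(F[i - 1][j], F[i][j - 1])
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            R[i][j] = R[i + 1][j + 1] + 1 if a[i] == b[j] else max(R[i + 1][j], R[i][j + 1])
    fwd = {(i, j): F[i - 1][j - 1] + 1
           for i in range(1, m + 1) for j in range(1, n + 1) if a[i - 1] == b[j - 1]}
    bwd = {(i, j): R[i][j] + 1 for (i, j) in fwd}  # longest chain starting at (i, j)
    return fwd, bwd, F[m][n]


def on_some_lcs(a, b):
    """Lemma 1: (i, j) is on an LCS iff it is on FC_k and on BC_{p-k+1} for some k."""
    fwd, bwd, p = match_ranks(a, b)
    return {mt for mt in fwd if bwd[mt] == p - fwd[mt] + 1}


def on_some_lcs_brute_force(a, b):
    """Same set, by trying every pair of increasing index p-tuples."""
    _, _, p = match_ranks(a, b)
    found = set()
    for I in combinations(range(1, len(a) + 1), p):
        for J in combinations(range(1, len(b) + 1), p):
            if all(a[i - 1] == b[j - 1] for i, j in zip(I, J)):
                found.update(zip(I, J))
    return found


print(sorted(on_some_lcs("AABAC", "ABC")))  # [(1, 1), (2, 1), (3, 2), (5, 3)]
```

Only (4,1) is excluded in the small example: after the A at position 4 of A, just C remains, so no chain through it is longer than 2.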

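Proof idea (figure): if (i,j) is on FC_k, the longest chain ending at (i,j) has length |FC| = k; if it is on BC_{p-k+1}, the longest chain starting at (i,j) has length |BC| = p-k+1. Joining the two chains at (i,j) gives a chain of length k + (p-k+1) - 1 = p, i.e. an LCS. If the backward index were smaller than p-k+1, every chain through (i,j) would be shorter than p.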

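Peeling contours (figure): remove FC_1, then BC_1, then FC_2, then BC_2, and so on, alternating between the front and the back of the matrix. Sooner or later the two sides meet.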

Define the sets M_i, where M is the set of all matches:

M_0 = M

M_1 = M_0 \ FC_1

M_2 = M_1 \ BC_1

M_{2i-1} = M_{2(i-1)} \ FC_i

M_{2i} = M_{2i-1} \ BC_i
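A worked instance on the small example (our own computation, assuming FC_k and BC_k contain all matches of forward and backward rank k): for A = AABAC and B = ABC, the matches (1,1), (2,1), (3,2), (4,1), (5,3) have forward ranks 1, 1, 2, 1, 3 and backward ranks 3, 3, 2, 2, 1, so FC_1 = {(1,1),(2,1),(4,1)}, BC_1 = {(5,3)} and FC_2 = {(3,2)}.

```latex
\begin{aligned}
M_0 &= \{(1,1),(2,1),(3,2),(4,1),(5,3)\}\\
M_1 &= M_0 \setminus FC_1 = M_0 \setminus \{(1,1),(2,1),(4,1)\} = \{(3,2),(5,3)\}\\
M_2 &= M_1 \setminus BC_1 = M_1 \setminus \{(5,3)\} = \{(3,2)\}\\
M_3 &= M_2 \setminus FC_2 = M_2 \setminus \{(3,2)\} = \emptyset
\end{aligned}
```

The first empty set is M_3, and 3 is the LCS length (ABC); the match left in M_2, (3,2), is the B of that LCS.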

Figures: the set M of all matches for abacbcba and cbabbacac, and the sets M_1, M_2, M_3, M_4, M_5 obtained from it.

Let M_{p'} be the first empty set M_i.

- Lemma 2: The length of an LCS is p', and each match in M_{p'-1} is a possible midpoint, i.e. it lies on some LCS.
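A direct, quadratic-space sketch of the peeling process (our own code; peel is a hypothetical name), again assuming FC_i and BC_i contain all matches of forward and backward rank i. It builds M_0, M_1, ... until the first empty set and returns p' together with the candidate midpoints M_{p'-1}. Storing every set explicitly is exactly the cost the next slide warns about, so this only illustrates Lemma 2.

```python
def peel(a, b):
    """Return (p_prime, midpoints): M_{p'} is the first empty M_t, midpoints = M_{p'-1}."""
    m, n = len(a), len(b)
    F = [[0] * (n + 1) for _ in range(m + 1)]  # F[i][j] = LCS(a[:i], b[:j])
    R = [[0] * (n + 1) for _ in range(m + 1)]  # R[i][j] = LCS(a[i:], b[j:])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            F[i][j] = F[i - 1][j - 1] + 1 if a[i - 1] == b[j - 1] else max(F[i - 1][j], F[i][j - 1])
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            R[i][j] = R[i + 1][j + 1] + 1 if a[i] == b[j] else max(R[i + 1][j], R[i][j + 1])
    fwd = {(i, j): F[i - 1][j - 1] + 1
           for i in range(1, m + 1) for j in range(1, n + 1) if a[i - 1] == b[j - 1]}
    bwd = {(i, j): R[i][j] + 1 for (i, j) in fwd}

    sets = [set(fwd)]  # M_0 = M, the set of all matches
    while sets[-1]:
        t = len(sets)     # index of the set being built
        k = (t + 1) // 2  # M_{2k-1} drops FC_k, M_{2k} drops BC_k
        rank = fwd if t % 2 == 1 else bwd
        sets.append({mt for mt in sets[-1] if rank[mt] != k})
    p_prime = len(sets) - 1
    return p_prime, (sets[p_prime - 1] if p_prime > 0 else set())


print(peel("AABAC", "ABC"))  # (3, {(3, 2)})
```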

Figure: the nested sets M_0 ⊇ M_1 ⊇ M_2 ⊇ ... ⊇ M_{k-1} ⊇ M_k, where M_k is the first empty set and k = p.

- We can't keep track of each set explicitly: that is very expensive, since M alone may contain up to mn matches for strings of lengths m and n.


Keep only the dominant matches of each contour.

When we see a dominant match below the contour computed from the other side, the two sides have met and we are done.
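A minimal sketch of the "dominant matches only" idea, under our own assumptions: each forward contour is stored only as its dominant matches, and the dominant matches of rank k+1 are obtained from those of rank k by jumping, for every symbol, to its next occurrence in each string and keeping the upper-left-most results. This is a standard construction, not necessarily Rick's exact procedure; it covers only the forward side (no backward contours, no stopping test), and its next-occurrence tables take O(σ(m+n)) space for an alphabet of size σ. The name forward_dominant_contours is hypothetical.

```python
def forward_dominant_contours(a, b):
    """Forward contours, each stored only as its list of dominant matches (1-based)."""
    n = len(b)
    alphabet = set(a) & set(b)

    def next_table(s):
        # nxt[q][c] = smallest 1-based position > q holding symbol c (absent if none)
        nxt = [dict() for _ in range(len(s) + 1)]
        for q in range(len(s) - 1, -1, -1):
            nxt[q] = dict(nxt[q + 1])
            nxt[q][s[q]] = q + 1
        return nxt

    na, nb = next_table(a), next_table(b)
    contours = []
    frontier = [(0, 0)]  # virtual match of rank 0 before both strings
    while True:
        candidates = set()
        for i, j in frontier:
            for c in alphabet:
                if c in na[i] and c in nb[j]:
                    candidates.add((na[i][c], nb[j][c]))
        # Pareto-minimal candidates (none weakly above-left) are the next dominant matches.
        dominant, best_j = [], n + 1
        for i, j in sorted(candidates):
            if j < best_j:
                dominant.append((i, j))
                best_j = j
        if not dominant:
            return contours
        contours.append(dominant)
        frontier = dominant


print(len(forward_dominant_contours("abacbcba", "cbabbacac")))  # 5
```

The number of contours equals the LCS length, and the first contour here is [(1, 3), (2, 2), (4, 1)], the dominant matches of rank 1.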


- FC_{f'}, BC_{b'}: the contours with the minimal indices f' and b', as stated above.

- Lemma 3: The length of an LCS is b' + f' - 1.

Complexity

Finding the dominant matches of each contour: O(min(m, n-p))

Number of contours: p

Total: O(min(pm, p(n-p)))
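The total is the per-contour cost multiplied by the number of contours; the only step is distributing p into the minimum, which is valid because p ≥ 0. We read m and n as the lengths of the two strings, which the slides do not spell out.

```latex
\underbrace{p}_{\text{number of contours}} \cdot
\underbrace{O\bigl(\min(m,\; n-p)\bigr)}_{\text{dominant matches per contour}}
= O\bigl(p \cdot \min(m,\; n-p)\bigr)
= O\bigl(\min(pm,\; p(n-p))\bigr)
```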


The End

Simple and fast linear space computation of longest common subsequence

Written by:

Claus Rick, 1999

Based on algorithm by:

D. Hirschberg, 1975

Cast:

Matrices

Lines

Arrows

Squares

Blue

Red

Brown

Grey

Black

String A

String B

Presentation: Uri Scheiner

No Dominant Matches were harmed during the making of this presentation

Outline:

- What is the LCS
- Divide and Conquer
- Match
- Chain
- Dominant Matches
- FC
- BC
- Lemma 1
- Define M…
- Lemma 2
- Keep just Dominant…
- Lemma 3
- Complexity