
# Simple and fast linear space computation of longest common subsequences


### Simple and fast linear space computation of Longest common subsequences

Claus Rick, 1999

What is the LCS problem?

A = A A B A C

B = A B C

Finding a sequence of greatest possible length that can be obtained from both A and B by deleting zero or more (not necessarily adjacent) symbols.
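The definition can be checked on the two strings of this slide with the textbook quadratic dynamic program. This is only a baseline sketch, not the linear-space method the talk is about, and the function name `lcs` is made up for illustration.

```python
def lcs(a: str, b: str) -> str:
    """Return one longest common subsequence of a and b (quadratic-space DP)."""
    m, n = len(a), len(b)
    # length[i][j] = LCS length of the prefixes a[:i] and b[:j]
    length = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                length[i][j] = length[i - 1][j - 1] + 1
            else:
                length[i][j] = max(length[i - 1][j], length[i][j - 1])
    # Walk back from the bottom-right corner to recover one LCS.
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif length[i - 1][j] >= length[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


print(lcs("AABAC", "ABC"))  # ABC
```

The table needs O(mn) space, which is the cost the linear-space approach is meant to avoid.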

Some boring history…

Pre-Info

• Divide and conquer

• Midpoint
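As a rough sketch of how divide and conquer with a midpoint works, here is the classic linear-space scheme usually attributed to Hirschberg (1975), whom the closing slide names as the basis of the algorithm. It finds the split point from two rows of LCS lengths, which is not the contour-based way of finding a midpoint discussed later in the talk. The names `lcs_row` and `hirschberg` are illustrative.

```python
def lcs_row(a, b):
    """Last row of the LCS-length table of a against every prefix of b.

    Only two rows are kept, so the working space is O(len(b)).
    """
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev


def hirschberg(a, b):
    """Return one LCS of a and b by divide and conquer."""
    if not a or not b:
        return ""
    if len(a) == 1:
        return a if a in b else ""
    mid = len(a) // 2
    # LCS lengths of the top half against prefixes of b ...
    left = lcs_row(a[:mid], b)
    # ... and of the bottom half against suffixes of b (via reversal).
    right = lcs_row(a[mid:][::-1], b[::-1])
    n = len(b)
    # The midpoint is the split of b that maximises the combined length.
    split = max(range(n + 1), key=lambda j: left[j] + right[n - j])
    return hirschberg(a[:mid], b[:split]) + hirschberg(a[mid:], b[split:])


print(hirschberg("AABAC", "ABC"))  # ABC
```

Each level of the recursion solves two subproblems whose sizes add up to the original, which is what keeps the total work within a constant factor of a single table computation.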

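Some basic terms

Ordered Pair (i,j)

A = A A B A C

B = A B C

(2,3) = (A,C)

Match

An ordered pair (i,j) whose two symbols are equal, A[i] = B[j]. Here (1,1) = (A,A) is a match; (2,3) = (A,C) is not.

Chain

A sequence of matches that is strictly increasing in both i and j. Here (1,1), (3,2), (5,3) is a chain spelling A B C.

Rank k

A match has rank k if the longest chain ending at it has length k. Here (3,2) has rank 2.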

Matching Matrix

[Figure: matching matrix with columns c b a b b a c a c and rows a b a c b c b a]

Dominant matches

All upper-left matches in each rank: a match (i,j) of rank k is dominant if no other match (i',j') of rank k has i' ≤ i and j' ≤ j.
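A minimal sketch of ranks and dominant matches on the small example A = AABAC, B = ABC. It assumes the usual definitions (the rank of a match is the length of the longest chain ending at it; a match is dominant if no other match of the same rank lies weakly above and to the left of it). The helper names `match_ranks` and `dominant_matches` are hypothetical, and the quadratic table is used only for clarity.

```python
def match_ranks(a, b):
    """Map every match (i, j), 1-based with a[i-1] == b[j-1], to its rank."""
    m, n = len(a), len(b)
    # length[i][j] = LCS length of the prefixes a[:i] and b[:j]
    length = [[0] * (n + 1) for _ in range(m + 1)]
    rank = {}
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                length[i][j] = length[i - 1][j - 1] + 1
                rank[(i, j)] = length[i][j]  # longest chain ending at (i, j)
            else:
                length[i][j] = max(length[i - 1][j], length[i][j - 1])
    return rank


def dominant_matches(rank):
    """Group matches by rank and keep only the upper-left (dominant) ones."""
    by_rank = {}
    for match, k in rank.items():
        by_rank.setdefault(k, []).append(match)
    dominant = {}
    for k, matches in by_rank.items():
        dominant[k] = sorted(
            (i, j)
            for (i, j) in matches
            if not any(
                (i2, j2) != (i, j) and i2 <= i and j2 <= j for (i2, j2) in matches
            )
        )
    return dominant


print(dominant_matches(match_ranks("AABAC", "ABC")))
# {1: [(1, 1)], 2: [(3, 2)], 3: [(5, 3)]}
```

The matches (2,1) and (4,1) also have rank 1, but (1,1) lies above them in the same column, so they are not dominant.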

[Figure: dominant matches in the matching matrix (columns c b a b b a c a c, rows a b a c b c b a), labeled 1 to 5]


Some last basic terms

FCk: the kth forward contour

BCk: the kth backward contour

Forward contours (FC)

[Figure: forward contours 1 to 5 on the matching matrix (columns c b a b b a c a c, rows a b a c b c b a)]

Backward contours (BC)

[Figure: backward contours 5 down to 1 on the matching matrix (columns c b a b b a c a c, rows a b a c b c b a)]

Lemma 1

Let p be the length of an LCS between strings A and B. Then for every match (i,j) the following holds:

• There is an LCS containing (i,j) if and only if, for some k, (i,j) is on the kth forward contour and on the (p-k+1)st backward contour.
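Lemma 1 can be tried out on the small example A = AABAC, B = ABC. The sketch below assumes that a match is on the kth forward contour when the longest chain ending at it has length k, and on the kth backward contour when the longest chain starting at it has length k; the slides do not spell this out. The function names are illustrative, and full tables are used only to keep the check short.

```python
def lcs_table(a, b):
    """table[i][j] = LCS length of the prefixes a[:i] and b[:j]."""
    m, n = len(a), len(b)
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table


def matches_on_some_lcs(a, b):
    """Matches (i, j), 1-based, that satisfy the condition of Lemma 1."""
    m, n = len(a), len(b)
    fwd = lcs_table(a, b)              # longest chain ending at a match
    bwd = lcs_table(a[::-1], b[::-1])  # longest chain starting at a match
    p = fwd[m][n]
    result = []
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                k = fwd[i][j]                      # forward contour index
                back = bwd[m - i + 1][n - j + 1]   # backward contour index
                if back == p - k + 1:
                    result.append((i, j))
    return result


print(matches_on_some_lcs("AABAC", "ABC"))
# [(1, 1), (2, 1), (3, 2), (5, 3)]
```

The match (4,1) is left out: it is on the first forward contour but only on the second backward contour, so the longest chain through it has length 1 + 2 - 1 = 2, which is less than p = 3.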

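Lemma 1 - proof

[Figure: proof sketch. A chain of length p through (i,j) is made of a forward part of length k (|FC| = k) and a backward part of length p-k+1 (|BC| = p-k+1) that share the match (i,j); if the backward part is shorter than p-k+1, the chain is shorter than p.]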

Start calculating

FC1, BC1, FC2, BC2

Sooner or later…

Really really last terms

Define sets Mi as:

M0 = M

M1 = M0 \ FC1

M2 = M1 \ BC1

M(2i-1) = M(2(i-1)) \ FCi

M(2i) = M(2i-1) \ BCi

[Figure: the matching matrix (columns c b a b b a c a c, rows a b a c b c b a) shown for M and for the successive sets M1, M2, M3, M4, M5]

Let us call the first empty set Mi:

M(p’)

Lemma 2

• The length of an LCS is p’, and each match in M(p’-1) is a possible midpoint.
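The peeling process and Lemma 2 can be simulated directly on the small example A = AABAC, B = ABC. The sketch assumes that FCk and BCk are the contours of the original match set M (matches whose longest chain ending, respectively starting, there has length k). It keeps every set Mi in memory purely for illustration, and the names `lcs_table` and `peel` are hypothetical.

```python
def lcs_table(a, b):
    """table[i][j] = LCS length of the prefixes a[:i] and b[:j]."""
    m, n = len(a), len(b)
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table


def peel(a, b):
    """Return [M0, M1, ...] up to and including the first empty set."""
    m, n = len(a), len(b)
    fwd = lcs_table(a, b)
    bwd = lcs_table(a[::-1], b[::-1])
    # For each match: (forward contour index, backward contour index).
    contour = {
        (i, j): (fwd[i][j], bwd[m - i + 1][n - j + 1])
        for i in range(1, m + 1)
        for j in range(1, n + 1)
        if a[i - 1] == b[j - 1]
    }
    sets = [set(contour)]  # M0 = M
    step = 0
    while sets[-1]:
        step += 1
        k = (step + 1) // 2          # contour index used at this step
        side = 0 if step % 2 else 1  # odd step removes FCk, even step removes BCk
        sets.append({mt for mt in sets[-1] if contour[mt][side] != k})
    return sets


sets = peel("AABAC", "ABC")
p_prime = len(sets) - 1  # index of the first empty set
print(p_prime, sorted(sets[p_prime - 1]))  # 3 [(3, 2)]
```

Here M1 = {(3,2), (5,3)}, M2 = {(3,2)} and M3 is empty, so p’ = 3, which is the LCS length, and the surviving match (3,2) is the B in the middle of A B C.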

Lemma 2 - proof

[Figure: proof sketch relating the sets M0, M1, M2, …, M(k-1), Mk to the indices 0, 1, …, k-2, k-1, k, with k = p]

Little problem…

• We can't keep track of every set: that is very expensive.

[Figure: two copies of the matching matrix (columns c b a b b a c a c, rows a b a c b c b a)]

What do we do?

Keep only dominant matches…

When we see a dominant match below, we are done.

[Figure: two copies of the matching matrix (columns c b a b b a c a c, rows a b a c b c b a), showing only dominant matches]

Let's define:

• f’ and b’: the minimal indices, as stated above, of the contours FC(f’) and BC(b’)

Lemma 3

• The length of an LCS is b’ + f’ - 1.

Complexity

Finding the dominant matches of each contour:

O(min(m, n-p))

Number of contours:

p

Total:

O(min(pm, p(n-p)))

Here m ≤ n are the lengths of the two strings and p is the length of an LCS.

The End

Written by:

Claus Rick, 1999

Based on algorithm by:

D. Hirschberg, 1975

Cast:

Matrices

Lines

Arrows

Squares

Blue

Red

Brown

Grey

Black

String A

String B

Presentation: Uri Scheiner

No Dominant Matches were harmed during the making of this presentation

Appendix subsequence

• What is the LCS

• Divide and Conquer

• Match

• Chain

• Dominant Matches

• FC

• BC

• Lemma 1

• Define M…

• Lemma 2

• Keep just Dominant…

• Lemma 3

• Complexity