Simple and fast linear space computation of longest common subsequences

Claus Rick, 1999


What is the LCS problem?

A = A A B A C

B = A B C

Finding a sequence of greatest possible length that can be obtained from both A and B by deleting zero or more (not necessarily adjacent) symbols.
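To make the definition concrete, here is a minimal Python sketch. It is not taken from the presentation: the function names `is_subsequence` and `lcs` are illustrative, and the code uses the textbook quadratic-space dynamic program rather than the linear-space method the talk is about.

```python
def is_subsequence(s, t):
    """True if s can be obtained from t by deleting zero or more symbols."""
    it = iter(t)
    return all(ch in it for ch in s)  # each search resumes where the last one stopped


def lcs(a, b):
    """Return one longest common subsequence of a and b (O(len(a)*len(b)) space)."""
    m, n = len(a), len(b)
    # L[i][j] = LCS length of the prefixes a[:i] and b[:j]
    L = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                L[i][j] = L[i - 1][j - 1] + 1
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])
    # Walk back from the bottom-right corner to recover one solution.
    out, i, j = [], m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i, j = i - 1, j - 1
        elif L[i - 1][j] >= L[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


print(lcs("AABAC", "ABC"))  # the two strings from the slide
```

For the slide's strings the whole of the second string is a subsequence of the first, so the LCS has length 3.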


Some boring history…


Pre-Info

  • Divide and conquer

  • Midpoint
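The two bullets name the ingredients of the linear-space scheme that the closing credits attribute to Hirschberg (1975): split the problem at a midpoint and solve both halves recursively. The sketch below assumes the classic formulation, in which the midpoint is found by adding LCS lengths of a prefix half and a reversed suffix half. The names `lcs_lengths` and `hirschberg` are hypothetical, and this is not the contour-based method developed later in the deck.

```python
def lcs_lengths(a, b):
    """Last row of the LCS-length table of a against all prefixes of b.

    Only two rows are kept, so the space used is O(len(b)).
    """
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev


def hirschberg(a, b):
    """Return one LCS of strings a and b by divide and conquer."""
    if not a or not b:
        return ""
    if len(a) == 1:
        return a if a in b else ""
    mid = len(a) // 2
    # LCS lengths of the top half against prefixes of b ...
    left = lcs_lengths(a[:mid], b)
    # ... and of the bottom half against suffixes of b (via reversal).
    right = lcs_lengths(a[mid:][::-1], b[::-1])
    n = len(b)
    # The midpoint is a column where the two halves add up to the maximum.
    split = max(range(n + 1), key=lambda j: left[j] + right[n - j])
    return hirschberg(a[:mid], b[:split]) + hirschberg(a[mid:], b[split:])


print(hirschberg("AABAC", "ABC"))
```

Each level of the recursion needs only a few rows of length len(b) + 1, which is where the linear space bound comes from.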


Some basic terms

Ordered pair (i,j)

A = A A B A C

B = A B C

(2,3) = (A,C)


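Match: an ordered pair (i,j) whose two symbols are equal, i.e. the ith symbol of A equals the jth symbol of B.

[Figure: matches between A A B A C and A B C]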


Chain: a sequence of matches that is strictly increasing in both components.

[Figure: a chain of matches between A A B A C and A B C]


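Rank k: a match has rank k if the longest chain ending at that match has length k.

[Figure: ranks of the matches between A A B A C and A B C]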
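The terms match, chain and rank can be tried out on the slide's two strings with a short Python sketch. The helper names `matches`, `is_chain` and `ranks` are made up for illustration, indices are 1-based as on the slides, and rank is taken to mean the length of the longest chain ending at a match (an assumption about the slide's definition).

```python
def matches(a, b):
    """All matches (i, j), 1-based, where the ith symbol of a equals the jth of b."""
    return [(i, j)
            for i, x in enumerate(a, 1)
            for j, y in enumerate(b, 1)
            if x == y]


def is_chain(pairs):
    """A chain is strictly increasing in both components."""
    return all(i1 < i2 and j1 < j2
               for (i1, j1), (i2, j2) in zip(pairs, pairs[1:]))


def ranks(a, b):
    """Map every match to its rank: the length of the longest chain ending there."""
    rank = {}
    # matches() lists pairs row by row, so every possible predecessor
    # (smaller i and smaller j) has already been ranked.
    for (i, j) in matches(a, b):
        best = max((r for (i2, j2), r in rank.items() if i2 < i and j2 < j),
                   default=0)
        rank[(i, j)] = best + 1
    return rank


print(ranks("AABAC", "ABC"))
```

The highest rank that occurs is the length of an LCS, since an LCS is simply a longest chain.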


Matching matrix

[Figure: matching matrix with rows a b a c b c b a and columns c b a b b a c a c]


Dominant matches

All upper-left matches in each rank: a match (i,j) of rank k is dominant if no other match (i',j') of rank k has i' ≤ i and j' ≤ j.
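As a rough illustration, the dominant matches of each rank can be computed by brute force. The sketch assumes that "upper-left" means no other match of the same rank has both coordinates less than or equal; the function names are hypothetical, and the code is a quadratic-space check rather than the efficient contour computation of the talk.

```python
def forward_ranks(a, b):
    """Rank of every match (i, j), 1-based, read off the usual LCS table."""
    m, n = len(a), len(b)
    L = [[0] * (n + 1) for _ in range(m + 1)]
    rank = {}
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                L[i][j] = L[i - 1][j - 1] + 1
                rank[(i, j)] = L[i][j]
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])
    return rank


def dominant_matches(a, b):
    """Map each rank k to the list of dominant matches of that rank."""
    rank = forward_ranks(a, b)
    dominant = {}
    for (i, j), k in rank.items():
        # (i, j) is dominated if another match of the same rank lies
        # weakly above and weakly to the left of it.
        dominated = any(k2 == k and (i2, j2) != (i, j) and i2 <= i and j2 <= j
                        for (i2, j2), k2 in rank.items())
        if not dominated:
            dominant.setdefault(k, []).append((i, j))
    return dominant


print(dominant_matches("AABAC", "ABC"))
```

With the strings of the matching matrix, `dominant_matches("abacbcba", "cbabbacac")` yields five ranks, in line with the contours numbered 1 to 5 on the slides.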


[Figure: dominant matches in the matching matrix of a b a c b c b a (rows) and c b a b b a c a c (columns), with contours numbered 1 to 5]



Backward contours (BC)

[Figure: backward contours in the matching matrix of a b a c b c b a (rows) and c b a b b a c a c (columns), numbered 5 down to 1]


Some last basic terms

$FC_k$: the kth forward contour

$BC_k$: the kth backward contour


Forward contours (FC)

[Figure: forward contours in the matching matrix of a b a c b c b a (rows) and c b a b b a c a c (columns), numbered 1 to 5]


Lemma 1

Let p be the length of an LCS between strings A and B. Then for every match (i,j) the following holds:

  • There is an LCS containing (i,j) if and only if, for some k, (i,j) is on the kth forward contour and on the (p-k+1)st backward contour.
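Lemma 1 can be checked on small inputs with a brute-force Python sketch. It rests on one assumption that the slide does not spell out: a match is "on the kth forward contour" when the longest chain ending at it has length k, and "on the kth backward contour" when the longest chain starting at it has length k. The function names are illustrative only.

```python
def chain_ranks(a, b):
    """Length of the longest chain ending at each match (i, j), 1-based."""
    m, n = len(a), len(b)
    L = [[0] * (n + 1) for _ in range(m + 1)]
    rank = {}
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                L[i][j] = L[i - 1][j - 1] + 1
                rank[(i, j)] = L[i][j]
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])
    return rank


def lcs_matches(a, b):
    """Set of matches that lie on at least one LCS, using the test of Lemma 1."""
    m, n = len(a), len(b)
    fwd = chain_ranks(a, b)
    # Backward ranks: run the same computation on the reversed strings
    # and map the indices back to the original positions.
    bwd = {(m + 1 - i, n + 1 - j): k
           for (i, j), k in chain_ranks(a[::-1], b[::-1]).items()}
    p = max(fwd.values(), default=0)  # length of an LCS
    return {match for match, k in fwd.items() if bwd[match] == p - k + 1}


print(sorted(lcs_matches("AABAC", "ABC")))
```

For A A B A C and A B C the match (4,1) is rejected: only one symbol of A follows position 4, so no chain through it can reach length 3.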


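Lemma 1: proof

[Figure: a match lying on $FC_k$ and on $BC_{p-k+1}$, with a chain of length k ending at it and a chain of length p-k+1 starting at it]

A longest chain ending at the match has length k and a longest chain starting at it has length p-k+1. Joined at the match they form a chain of length k + (p-k+1) - 1 = p. If the backward contour index were smaller than p-k+1, no chain through the match could reach length p.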


Start calculating

Compute the contours alternately: $FC_1$, $BC_1$, $FC_2$, $BC_2$, …

Sooner or later…


Really really last terms

Define sets $M_i$ as follows, where $M$ is the set of all matches:

$M_0 = M$

$M_1 = M_0 \setminus FC_1$

$M_2 = M_1 \setminus BC_1$

$M_{2i-1} = M_{2(i-1)} \setminus FC_i$

$M_{2i} = M_{2i-1} \setminus BC_i$


[Figure: the set $M$ of all matches in the matching matrix of a b a c b c b a (rows) and c b a b b a c a c (columns)]

[Figure: the sets $M_1$ to $M_5$ in the same matching matrix]


Let us call the first empty $M_i$: $M_{p'}$


Lemma 2

  • The length of an LCS is $p'$, and each match in $M_{p'-1}$ is a possible midpoint.
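The peeling process behind Lemma 2 can be simulated directly. This sketch assumes that the contours $FC_k$ and $BC_k$ are the matches whose longest chain ending (respectively starting) there has length k, computed once on the full set of matches. The names `chain_ranks` and `peel` are hypothetical, and all sets are stored explicitly, so this is an illustration and not the space-efficient algorithm.

```python
def chain_ranks(a, b):
    """Length of the longest chain ending at each match (i, j), 1-based."""
    m, n = len(a), len(b)
    L = [[0] * (n + 1) for _ in range(m + 1)]
    rank = {}
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                L[i][j] = L[i - 1][j - 1] + 1
                rank[(i, j)] = L[i][j]
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])
    return rank


def peel(a, b):
    """Return [M_0, M_1, ..., M_p'], where M_p' is the first empty set."""
    m, n = len(a), len(b)
    fwd = chain_ranks(a, b)
    bwd = {(m + 1 - i, n + 1 - j): k
           for (i, j), k in chain_ranks(a[::-1], b[::-1]).items()}
    sets = [set(fwd)]  # M_0 = M, the set of all matches
    step = 1
    while sets[-1]:
        k = (step + 1) // 2
        if step % 2 == 1:   # odd step 2k-1: remove the kth forward contour
            nxt = {mt for mt in sets[-1] if fwd[mt] != k}
        else:               # even step 2k: remove the kth backward contour
            nxt = {mt for mt in sets[-1] if bwd[mt] != k}
        sets.append(nxt)
        step += 1
    return sets


sets = peel("AABAC", "ABC")
print(len(sets) - 1, sets[-2])  # p' and the candidate midpoints
```

The index of the first empty set is `len(sets) - 1`, and `sets[-2]` holds the matches that Lemma 2 calls possible midpoints.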


Lemma 2: proof

[Figure: the sets $M_0$, $M_1$, $M_2$, …, $M_{k-1}$, $M_k$ and the levels k, k-1, k-2, …, 1, 0, with k = p]


Little problem…

  • We can't keep track of every set: that is very expensive.



What do we do?

Keep only dominant matches…

When we see a dominant match below: done.



Let's define:

  • $FC_{f'}$, $BC_{b'}$: the contours with the minimal indices $f'$ and $b'$, as stated above


Lemma 3

  • The length of an LCS is $b' + f' - 1$.


Complexity

Finding the dominant matches of each contour: $O(\min(m, n-p))$

Number of contours: $p$

Total: $O(\min(pm, p(n-p)))$


The End


Simple and fast linear space computation of longest common subsequence

Written by: Claus Rick, 1999

Based on algorithm by: D. Hirschberg, 1975

Cast: Matrices, Lines, Arrows, Squares, Blue, Red, Brown, Grey, Black, String A, String B

Presentation: Uri Scheiner

No Dominant Matches were harmed during the making of this presentation


Appendix

  • What is the LCS
  • Divide and conquer
  • Match
  • Chain
  • Dominant matches
  • FC
  • BC
  • Lemma 1
  • Define M…
  • Lemma 2
  • Keep just dominant…
  • Lemma 3
  • Complexity