Simple and fast linear space computation of longest common subsequences

Claus Rick, 1999


What is the LCS problem?

A = A A B A C

B = A B C

Finding a sequence of greatest possible length that can be obtained from both A and B by deleting zero or more (not necessarily adjacent) symbols.
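
As a baseline for the definition above, here is a minimal sketch of the textbook quadratic-space dynamic program. It is not the linear-space method this talk is about; the function name `lcs` is only illustrative.

```python
def lcs(a: str, b: str) -> str:
    """Return one longest common subsequence of a and b (textbook DP)."""
    m, n = len(a), len(b)
    # L[i][j] = LCS length of the prefixes a[:i] and b[:j]
    L = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                L[i][j] = L[i - 1][j - 1] + 1
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])
    # Trace back from the bottom-right corner to recover one LCS.
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif L[i - 1][j] >= L[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


print(lcs("AABAC", "ABC"))  # prints ABC
```

For the slide's strings the whole of B is a subsequence of A, so the LCS is B itself.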


Some boring history…


Pre-Info

  • Divide and conquer

  • Midpoint
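
The slide gives only these two keywords. Assuming they refer to the classical scheme of Hirschberg (1975), which the credits name as the basis of this work, the idea can be sketched as follows: split A in half, find the position in B where an LCS can cross the split (the midpoint), and recurse on both sides. The names `lcs_row` and `hirschberg` are illustrative.

```python
def lcs_row(a: str, b: str) -> list[int]:
    """LCS lengths of a against every prefix of b, keeping one row at a time."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev


def hirschberg(a: str, b: str) -> str:
    """Return one LCS of a and b by divide and conquer on a."""
    if not a or not b:
        return ""
    if len(a) == 1:
        return a if a in b else ""
    mid = len(a) // 2
    n = len(b)
    # left[j]  = LCS length of a[:mid] and b[:j]
    # right[t] = LCS length of a[mid:] and the last t symbols of b
    left = lcs_row(a[:mid], b)
    right = lcs_row(a[mid:][::-1], b[::-1])
    # Midpoint: the split of b that maximizes the combined length.
    k = max(range(n + 1), key=lambda j: left[j] + right[n - j])
    return hirschberg(a[:mid], b[:k]) + hirschberg(a[mid:], b[k:])


print(hirschberg("AABAC", "ABC"))  # prints ABC
```

Only two rows of the table are alive at any time, which is where the linear space bound comes from; the string slices here are a convenience of the sketch, not part of the idea.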


Some basic terms

Ordered pair (i, j)

A = A A B A C

B = A B C

(2, 3) = (A, C)


Some basic terms

Match: an ordered pair (i, j) with A[i] = B[j]

A = A A B A C

B = A B C
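
A small sketch that lists every match for the two example strings, using 1-based positions as on the slides. It assumes a match is an ordered pair (i, j) whose symbols agree; the helper name `matches` is illustrative.

```python
def matches(a: str, b: str) -> list[tuple[int, int]]:
    """All 1-based ordered pairs (i, j) with a[i] == b[j]."""
    return [(i, j)
            for i, x in enumerate(a, start=1)
            for j, y in enumerate(b, start=1)
            if x == y]


A, B = "AABAC", "ABC"
print(matches(A, B))            # [(1, 1), (2, 1), (3, 2), (4, 1), (5, 3)]
print((2, 3) in matches(A, B))  # False: (2, 3) pairs 'A' with 'C'
```

The ordered pair (2, 3) from the previous slide is therefore a valid pair of positions but not a match.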


Some basic terms

Chain: a sequence of matches that is strictly increasing in both components

A = A A B A C

B = A B C


Rank k: a match has rank k if the longest chain ending at it has length k

A = A A B A C

B = A B C
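
To make the rank concrete, here is a minimal sketch that computes it for every match of the example strings. It assumes the usual definition (the rank of a match is the length of the longest chain ending at it, which equals the LCS length of the two prefixes ending there) and reuses the quadratic table for clarity. The name `forward_ranks` is illustrative.

```python
def forward_ranks(a: str, b: str) -> dict[tuple[int, int], int]:
    """Map each 1-based match (i, j) to the length of the longest chain ending at it."""
    m, n = len(a), len(b)
    L = [[0] * (n + 1) for _ in range(m + 1)]  # L[i][j] = LCS length of a[:i], b[:j]
    rank = {}
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                L[i][j] = L[i - 1][j - 1] + 1
                rank[(i, j)] = L[i][j]
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])
    return rank


print(forward_ranks("AABAC", "ABC"))
# {(1, 1): 1, (2, 1): 1, (3, 2): 2, (4, 1): 1, (5, 3): 3}
```

The highest rank that occurs is the length of an LCS, here 3.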


Some basic terms

Matching Matrix

[Figure: matching matrix with columns labeled c b a b b a c a c and rows labeled a b a c b c b a]


Some basic terms

Dominant matches

All upper-left matches in each rank
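
A brute-force sketch of this definition. It assumes "upper-left" means that no other match of the same rank has both coordinates less than or equal; the function names are illustrative and the quadratic table is used only for clarity.

```python
def forward_ranks(a, b):
    """Rank of each 1-based match (i, j): longest chain ending at it."""
    m, n = len(a), len(b)
    L = [[0] * (n + 1) for _ in range(m + 1)]
    rank = {}
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                L[i][j] = L[i - 1][j - 1] + 1
                rank[(i, j)] = L[i][j]
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])
    return rank


def dominant_matches(a, b):
    """For each rank k, the matches of rank k with no other rank-k match up and to the left."""
    by_rank = {}
    for match, k in forward_ranks(a, b).items():
        by_rank.setdefault(k, []).append(match)
    return {
        k: sorted((i, j) for (i, j) in ms
                  if not any((x, y) != (i, j) and x <= i and y <= j for (x, y) in ms))
        for k, ms in by_rank.items()
    }


print(dominant_matches("AABAC", "ABC"))
# {1: [(1, 1)], 2: [(3, 2)], 3: [(5, 3)]}
```

In the small example the three rank-1 matches (1, 1), (2, 1) and (4, 1) collapse to the single dominant match (1, 1).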


Dominant matches

[Figure: dominant matches on the matching matrix (columns c b a b b a c a c, rows a b a c b c b a), grouped by rank 1 to 5]



Backward contours (BC)

[Figure: backward contours on the matching matrix (columns c b a b b a c a c, rows a b a c b c b a), numbered 5 down to 1]


Some last basic terms

FC_k: the k-th forward contour

BC_k: the k-th backward contour
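
The following sketch groups the matches of the small example into forward and backward contours. It assumes that the k-th forward contour consists of the matches whose longest chain ending there has length k, and that the k-th backward contour consists of the matches whose longest chain starting there has length k. All names are illustrative, and ranks are computed by a simple quadratic scan over the matches.

```python
from collections import defaultdict


def matches(a, b):
    """All 1-based matches, sorted by i and then j."""
    return [(i, j) for i, x in enumerate(a, 1) for j, y in enumerate(b, 1) if x == y]


def contours(a, b):
    """Return (FC, BC): dicts mapping k to the matches on the k-th contour."""
    ms = matches(a, b)
    fwd, bwd = {}, {}
    for (i, j) in ms:            # predecessors have smaller i, so they come first
        fwd[(i, j)] = 1 + max((fwd[q] for q in fwd if q[0] < i and q[1] < j), default=0)
    for (i, j) in reversed(ms):  # successors have larger i, so they come first here
        bwd[(i, j)] = 1 + max((bwd[q] for q in bwd if q[0] > i and q[1] > j), default=0)
    fc, bc = defaultdict(list), defaultdict(list)
    for match in ms:
        fc[fwd[match]].append(match)
        bc[bwd[match]].append(match)
    return dict(fc), dict(bc)


fc, bc = contours("AABAC", "ABC")
print(fc)  # {1: [(1, 1), (2, 1), (4, 1)], 2: [(3, 2)], 3: [(5, 3)]}
print(bc)  # {3: [(1, 1), (2, 1)], 2: [(3, 2), (4, 1)], 1: [(5, 3)]}
```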


Forward contours (FC)

[Figure: forward contours on the matching matrix (columns c b a b b a c a c, rows a b a c b c b a), numbered 1 to 5]




Lemma 1

Let p be the length of an LCS of strings A and B. Then for every match (i, j) the following holds:

  • There is an LCS containing (i, j) if and only if, for some k, (i, j) is on the k-th forward contour and on the (p-k+1)-st backward contour.
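
Lemma 1 can be checked exhaustively on the small example A = AABAC, B = ABC. The sketch below assumes that contour membership means "forward rank k" and "backward rank p-k+1", finds all longest chains by brute force, and compares the two characterizations. All names are illustrative, and the exhaustive search is only feasible for tiny inputs.

```python
from itertools import combinations


def matches(a, b):
    """All 1-based matches, sorted by i and then j."""
    return [(i, j) for i, x in enumerate(a, 1) for j, y in enumerate(b, 1) if x == y]


def chain_ranks(ms):
    """Forward rank (longest chain ending at a match) and backward rank (starting at it)."""
    fwd, bwd = {}, {}
    for (i, j) in ms:
        fwd[(i, j)] = 1 + max((fwd[q] for q in fwd if q[0] < i and q[1] < j), default=0)
    for (i, j) in reversed(ms):
        bwd[(i, j)] = 1 + max((bwd[q] for q in bwd if q[0] > i and q[1] > j), default=0)
    return fwd, bwd


def longest_chains(ms):
    """Exhaustive search: (p, set of matches lying on at least one chain of length p)."""
    def is_chain(c):
        return all(u[0] < v[0] and u[1] < v[1] for u, v in zip(c, c[1:]))
    for size in range(len(ms), 0, -1):
        found = [c for c in combinations(ms, size) if is_chain(c)]
        if found:
            return size, {match for c in found for match in c}
    return 0, set()


ms = matches("AABAC", "ABC")
fwd, bwd = chain_ranks(ms)
p, on_lcs = longest_chains(ms)
for match in ms:
    k = fwd[match]
    # Lemma 1: on some LCS  <=>  backward rank equals p - k + 1
    assert (match in on_lcs) == (bwd[match] == p - k + 1)
print(p, sorted(on_lcs))  # 3 [(1, 1), (2, 1), (3, 2), (5, 3)]
```

The match (4, 1) has forward rank 1 but backward rank 2 rather than 3, so it lies on no LCS.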


Lemma 1 - proof

[Figure: proof sketch on the matching matrix. Surviving labels: |FC| = k, |BC| = p-k+1, < (p-k+1), p]


Start calculating

FC_1, BC_1, FC_2, BC_2

Sooner or later…


Really really last terms

Define sets M_i as:

M_0 = M

M_1 = M_0 \ FC_1

M_2 = M_1 \ BC_1

M_{2i-1} = M_{2(i-1)} \ FC_i

M_{2i} = M_{2i-1} \ BC_i


[Figure: matching matrices for c b a b b a c a c (columns) and a b a c b c b a (rows), showing the set M]


[Figure: matching matrices for c b a b b a c a c (columns) and a b a c b c b a (rows), showing the sets M_1 to M_5]


Let us call the first empty M_i: M_{p'}


Lemma 2

  • The length of an LCS is p', and each match in M_{p'-1} is a possible midpoint.
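
A brute-force illustration of the peeling process and of Lemma 2. It keeps every set M_i explicitly, which is exactly what the talk goes on to avoid, so it is meant only to make the definitions tangible. It assumes FC_k and BC_k are the matches of forward and backward rank k; all names are illustrative.

```python
def matches(a, b):
    """All 1-based matches, sorted by i and then j."""
    return [(i, j) for i, x in enumerate(a, 1) for j, y in enumerate(b, 1) if x == y]


def chain_ranks(ms):
    """Forward rank (longest chain ending at a match) and backward rank (starting at it)."""
    fwd, bwd = {}, {}
    for (i, j) in ms:
        fwd[(i, j)] = 1 + max((fwd[q] for q in fwd if q[0] < i and q[1] < j), default=0)
    for (i, j) in reversed(ms):
        bwd[(i, j)] = 1 + max((bwd[q] for q in bwd if q[0] > i and q[1] > j), default=0)
    return fwd, bwd


def peel_contours(a, b):
    """Return [M_0, M_1, ...] up to and including the first empty set."""
    ms = matches(a, b)
    fwd, bwd = chain_ranks(ms)
    sets = [set(ms)]                      # M_0 = M
    while sets[-1]:
        t = len(sets)                     # index of the set being built
        k = (t + 1) // 2                  # contour index: FC_k for odd t, BC_k for even t
        rank = fwd if t % 2 == 1 else bwd
        sets.append({match for match in sets[-1] if rank[match] != k})
    return sets


sets = peel_contours("AABAC", "ABC")
p_prime = len(sets) - 1                   # index of the first empty set
print(p_prime, sets[p_prime - 1])         # 3 {(3, 2)}
```

For the small example the first empty set is M_3, matching the LCS length 3, and the only survivor in M_2 is (3, 2), the middle match of the LCS "ABC".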


Lemma 2 - proof

[Figure: a chain with levels 0, 1, …, K-2, K-1, K, where K = p, drawn against the sets M_0, M_1, M_2, …, M_{k-1}, M_k]


Little problem…

  • We can't keep track of each set: that is very expensive




What do we do?

Keep only dominant matches…

When we see a dominant match below: done.




Let's define:

  • FC_{f'}, BC_{b'}: the minimal indices as stated above


Lemma 3

  • The length of an LCS is b' + f' - 1.


Complexity

Finding the dominant matches of each contour: O(min(m, n-p))

Number of contours: p

Total: O(min(pm, p(n-p)))
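
The total presumably follows by multiplying the number of contours by the cost per contour. The slide does not define m and n; the line below assumes they are the lengths of the two strings and that p is the LCS length from Lemma 1.

```latex
\underbrace{p}_{\text{contours}} \cdot \underbrace{O\bigl(\min(m,\; n-p)\bigr)}_{\text{per contour}}
  \;=\; O\bigl(\min(pm,\; p(n-p))\bigr)
```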


The End


Simple and fast linear space computation of longest common subsequences

Written by: Claus Rick, 1999

Based on algorithm by: D. Hirschberg, 1975

Cast: Matrices, Lines, Arrows, Squares, Blue, Red, Brown, Grey, Black, String A, String B

Presentation: Uri Scheiner

No Dominant Matches were harmed during the making of this presentation


Appendix

  • What is the LCS

  • Divide and Conquer

  • Match

  • Chain

  • Dominant Matches

  • FC

  • BC

  • Lemma 1

  • Define M…

  • Lemma 2

  • Keep just Dominant…

  • Lemma 3

  • Complexity

