Bioinformatics Techniques and Tools (Tècniques i Eines Bioinformàtiques)

  • David W. Mount, Bioinformatics: Sequence and Genome Analysis

  • Gonzalo Navarro and Mathieu Raffinot, Flexible Pattern Matching in Strings (2002)

  • M. Crochemore, C. Hancart and T. Lecroq, Algorithms on Strings (2001)

  • http://www-igm.univ-mlv.fr/~lecroq/string/index.html


Efficient search algorithms and data structures

String matching: definition of the problem (text, pattern)

The approach depends on what is known in advance: the text or the patterns.

  • Exact matching:

  • The patterns are known ---> Data structures for the patterns

  • 1 pattern ---> The algorithm depends on |p| and |Σ| (pattern length and alphabet size)

  • k patterns ---> The algorithm depends on k, |p| and |Σ|

  • The text is known ---> Data structure for the text (suffix tree, ...)


Exact string matching: one pattern

How do string matching algorithms perform the search?

For instance, given the sequence

CTACTACTACGTCTATACTGATCGTAGCTACTACATGC

search for the pattern ACTGA.

and for the pattern TACTACGGTATGACTAA


Exact string matching: Brute force algorithm

Example:

Given the pattern ATGTA, the search is

G T A C T A G A G G A C G T A T G T A C T G ...
A T G T A
  A T G T A
    A T G T A
      A T G T A
        A T G T A
          A T G T A


Exact string matching: Brute force algorithm

  • How is the comparison made?

From left to right (prefix search).

  • Which is the next position of the window?

The window is always shifted by exactly one cell.
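The brute force strategy can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `brute_force_search` and the choice of returning 0-based positions are assumptions, not part of the slides.

```python
def brute_force_search(text, pattern):
    """Return the 0-based start positions of all occurrences of pattern in text."""
    n, m = len(text), len(pattern)
    occurrences = []
    # The window is shifted by exactly one cell after every attempt.
    for pos in range(n - m + 1):
        j = 0
        # Left-to-right comparison of the pattern against the window.
        while j < m and text[pos + j] == pattern[j]:
            j += 1
        if j == m:
            occurrences.append(pos)
    return occurrences


text = "GTACTAGAGGACGTATGTACTG"
print(brute_force_search(text, "ATGTA"))
```

In the worst case every window is compared in full, so the cost grows with the product of the text length and the pattern length; the algorithms that follow try to shift the window by more than one cell.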


Exact string matching: one pattern

How do matching algorithms perform the search?

There is a sliding window along the text against which the pattern is compared.

At each step the comparison is made and the window is shifted to the right.

What distinguishes the algorithms from one another?

  • How the comparison is made.

  • The length of the shift.


Exact string matching: one pattern (text on-line)

Experimental efficiency (Navarro & Raffinot)

BNDM : Backward Nondeterministic Dawg Matching

BOM : Backward Oracle Matching

[Figure: map of the experimentally most efficient algorithm (Horspool, BOM or BNDM) as a function of the pattern length (horizontal axis, 2 to 256) and the alphabet size |Σ| (vertical axis, 2 to 64); w denotes the machine word length.]


Horspool algorithm

  • How is the comparison made?

Suffix search: the pattern is compared with the window from right to left.

  • Which is the next position of the window?

Let “a” be the text character aligned with the last position of the window.

Shift the window until the next occurrence of “a” in the pattern is aligned with that text character.

We need a preprocessing phase to construct the shift table.
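A minimal Python sketch of the preprocessing and search phases follows. The names `horspool_shift_table` and `horspool_search` are illustrative choices; characters that do not occur in the pattern (ignoring its last position) are assumed to shift the window by the full pattern length.

```python
def horspool_shift_table(pattern):
    """Shift for each character: distance from its last occurrence in
    pattern[:-1] to the end of the pattern. Later occurrences overwrite
    earlier ones, so the smallest valid shift is kept."""
    m = len(pattern)
    return {c: m - 1 - i for i, c in enumerate(pattern[:-1])}


def horspool_search(text, pattern):
    """Return the 0-based start positions of all occurrences of pattern."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    shift = horspool_shift_table(pattern)
    occurrences = []
    pos = 0
    while pos <= n - m:
        # Suffix search: compare the window from right to left.
        j = m - 1
        while j >= 0 and text[pos + j] == pattern[j]:
            j -= 1
        if j < 0:
            occurrences.append(pos)
        # The shift depends only on the last character of the window.
        pos += shift.get(text[pos + m - 1], m)
    return occurrences


print(horspool_shift_table("ATGTA"))
print(horspool_search("GTACTAGAGGACGTATGTACTG", "ATGTA"))
```

For the pattern ATGTA the table holds A, T and G; C is absent from the dictionary and falls back to the default shift of 5.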


Horspool algorithm: example

Given the pattern ATGTA

  • The shift table is:

A 4
C 5
G 2
T 1

  • The searching phase:

G T A C T A G A G G A C G T A T G T A C T G ...
A T G T A
  A T G T A
          A T G T A
              A T G T A
                        A T G T A
                            A T G T A
                                    A T G T A

Questions about the Horspool algorithm

Given the pattern ATGTA, the shift table is

A 4
C 5
G 2
T 1

Given a random text over an equally likely probability distribution (EPD):

1.- Determine the expected shift of the window. What if the probability distribution is not equally likely?

2.- Determine the expected number of shifts, assuming a text of length n.

3.- Determine the expected number of comparisons in the suffix search phase.
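As a starting point for the first question, the expected shift is the shift of each character weighted by its probability. The sketch below assumes independent text characters, so that the last character of each window follows the given distribution; the helper name `expected_shift` and the non-uniform distribution `skewed` are hypothetical.

```python
from fractions import Fraction


def expected_shift(shift_table, default_shift, probabilities):
    """Sum over the alphabet of P(c) * shift(c)."""
    return sum(p * shift_table.get(c, default_shift)
               for c, p in probabilities.items())


shift_table = {"A": 4, "G": 2, "T": 1}   # pattern ATGTA; C uses the default 5

# Equally likely distribution over {A, C, G, T}: (4 + 5 + 2 + 1) / 4
uniform = {c: Fraction(1, 4) for c in "ACGT"}
print(expected_shift(shift_table, 5, uniform))

# A hypothetical skewed distribution, only to show the effect on the result.
skewed = {"A": Fraction(1, 2), "C": Fraction(1, 6),
          "G": Fraction(1, 6), "T": Fraction(1, 6)}
print(expected_shift(shift_table, 5, skewed))
```

Under the same assumptions, dividing the text length n by the expected shift gives a rough estimate of the number of windows visited, which is a hint for the second question rather than an exact answer.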


Exact string matching: one pattern (text on-line)

Experimental efficiency (Navarro & Raffinot)

BNDM : Backward Nondeterministic Dawg Matching

BOM : Backward Oracle Matching

[Figure: map of the experimentally most efficient algorithm (Horspool, BOM or BNDM) as a function of the pattern length (horizontal axis, 2 to 256) and the alphabet size |Σ| (vertical axis, 2 to 64); w denotes the machine word length.]


BNDM algorithm

  • How is the comparison made?

Search for suffixes of the text window that are factors of the pattern P.

The positions of the pattern where the suffix read so far occurs are kept in a bit vector, for instance

D2 = ( 1 0 0 0 1 0 0 )

Once the next character x is read,

D3 = (D2 << 1) & B(x)

B(x): mask of x in the pattern P.

For instance, if B(x) = ( 0 0 1 1 0 0 0 ), then

D3 = ( 0 0 0 1 0 0 0 ) & ( 0 0 1 1 0 0 0 ) = ( 0 0 0 1 0 0 0 )

  • Which is the next position of the window?

It depends on the value of the leftmost bit of D, which is set exactly when the suffix read so far is a prefix of P.
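The bit-parallel update can be sketched in Python as follows. This is an illustrative sketch, not the reference implementation: the function name `bndm_search` is an assumption, and since Python integers have arbitrary precision the restriction of the pattern length to the machine word length w is not enforced here.

```python
def bndm_search(text, pattern):
    """Return the 0-based start positions of all occurrences of pattern."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    # B(c): bit for pattern[i] is set when pattern[i] == c, with
    # pattern[0] mapped to the leftmost (most significant) of m bits.
    masks = {}
    for i, c in enumerate(pattern):
        masks[c] = masks.get(c, 0) | (1 << (m - 1 - i))
    high_bit = 1 << (m - 1)
    all_ones = (1 << m) - 1
    occurrences = []
    pos = 0
    while pos <= n - m:
        j, last, d = m, m, all_ones
        while d != 0:
            # Read the window from right to left.
            d &= masks.get(text[pos + j - 1], 0)
            j -= 1
            if d & high_bit:
                if j > 0:
                    last = j            # a prefix of the pattern starts at offset j
                else:
                    occurrences.append(pos)
            d = (d << 1) & all_ones     # keep only m bits
        pos += last                     # shift to the longest prefix found
    return occurrences


print(bndm_search("GTACTAGAGGACGTATGTACTG", "ATGTA"))
```

The variable `last` records the window offset of the most recent prefix detected through the leftmost bit, which is what determines the shift.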


BNDM algorithm: example

Given the pattern ATGTA

  • The mask of characters is:

B(A) = ( 1 0 0 0 1 )
B(C) = ( 0 0 0 0 0 )
B(G) = ( 0 0 1 0 0 )
B(T) = ( 0 1 0 1 0 )

  • The searching phase:

G T A C T A G A G G A C G T A T G T A C T G ...
A T G T A
          A T G T A
                    A T G T A
                            A T G T A

First window:

D1 = ( 0 1 0 1 0 )
D2 = ( 1 0 1 0 0 ) & ( 0 0 0 0 0 ) = ( 0 0 0 0 0 )

Second window:

D1 = ( 0 0 1 0 0 )
D2 = ( 0 1 0 0 0 ) & ( 0 0 1 0 0 ) = ( 0 0 0 0 0 )

Third window:

D1 = ( 1 0 0 0 1 )
D2 = ( 0 0 0 1 0 ) & ( 0 1 0 1 0 ) = ( 0 0 0 1 0 )
D3 = ( 0 0 1 0 0 ) & ( 0 0 1 0 0 ) = ( 0 0 1 0 0 )
D4 = ( 0 1 0 0 0 ) & ( 0 0 0 0 0 ) = ( 0 0 0 0 0 )


BNDM algorithm: example

  • Given the pattern ATGTA

  • The mask of characters is:

B(A) = ( 1 0 0 0 1 )
B(C) = ( 0 0 0 0 0 )
B(G) = ( 0 0 1 0 0 )
B(T) = ( 0 1 0 1 0 )

  • The searching phase:

G T A C T A G A G G A C G T A T G T A C T G ...
                            A T G T A
                                    A T G T A

D1 = ( 1 0 0 0 1 )
D2 = ( 0 0 0 1 0 ) & ( 0 1 0 1 0 ) = ( 0 0 0 1 0 )
D3 = ( 0 0 1 0 0 ) & ( 0 0 1 0 0 ) = ( 0 0 1 0 0 )
D4 = ( 0 1 0 0 0 ) & ( 0 1 0 1 0 ) = ( 0 1 0 0 0 )
D5 = ( 1 0 0 0 0 ) & ( 1 0 0 0 1 ) = ( 1 0 0 0 0 )
D6 = ( 0 0 0 0 0 ) & ( * * * * * ) = ( 0 0 0 0 0 )

Found!


BNDM algorithm: example

Given the pattern ATGTA

  • The mask of characters is:

B(A) = ( 1 0 0 0 1 )
B(C) = ( 0 0 0 0 0 )
B(G) = ( 0 0 1 0 0 )
B(T) = ( 0 1 0 1 0 )

  • The searching phase:

G T A C T A G A A T A C G T A T G T A C T G ...
A T G T A
          A T G T A
                A T G T A

How is the shift determined?

First window:

D1 = ( 0 1 0 1 0 )
D2 = ( 1 0 1 0 0 ) & ( 0 0 0 0 0 ) = ( 0 0 0 0 0 )

Second window:

D1 = ( 0 1 0 1 0 )
D2 = ( 1 0 1 0 0 ) & ( 1 0 0 0 1 ) = ( 1 0 0 0 0 )
D3 = ( 0 0 0 0 0 ) & ( 1 0 0 0 1 ) = ( 0 0 0 0 0 )


Exact search algorithms for one pattern (text on-line)

Most efficient algorithms (Navarro & Raffinot)

BNDM : Backward Nondeterministic Dawg Matching

BOM : Backward Oracle Matching

[Figure: map of the experimentally most efficient algorithm (Horspool, BOM or BNDM) as a function of the pattern length (horizontal axis, 2 to 256) and the alphabet size |Σ| (vertical axis, 2 to 64); w denotes the machine word length.]


Factor Oracle automaton: properties

Factor Oracle of the word G T A T G T A

[Figure: step-by-step construction of the Factor Oracle automaton of GTATGTA]

Hypothesis: the automaton recognizes all the factors of GTA.

The new state recognizes all the factors ending at the fourth letter T that were not yet recognized: GTAT, TAT and AT (T already was).

It therefore recognizes all the factors of the first 4 letters.

All the states are final ==> it recognizes all the factors ... and more.


Factor Oracle automaton: algorithm

Algorithm: for i = 1 to |p| do

Add the transitions that recognize the factors ending at position i;

Factor Oracle automaton: algorithm

What happens if the transition for the next character already exists?

Factor Oracle automaton: algorithm

What happens if the transition for the next character does not exist?


Factor Oracle automaton: example of the algorithm

[Figure: Factor Oracle automaton of GTATGTA]

... and it also recognizes words that are not factors, such as GTGTA.

However, if it does not recognize a word ==> that word is not a factor!

This is the strategy of the BOM algorithm.
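The construction can be sketched in Python with the usual on-line scheme based on a supply (failure) link per state. The slides do not spell out this bookkeeping, so the `supply` array and the names `build_factor_oracle` and `oracle_accepts` are assumptions made for illustration.

```python
def build_factor_oracle(word):
    """Return the transitions of the factor oracle of word as a list of
    dicts: transitions[state][char] -> next state. State 0 is initial
    and every state is final."""
    m = len(word)
    transitions = [dict() for _ in range(m + 1)]
    supply = [-1] * (m + 1)
    for i in range(1, m + 1):
        c = word[i - 1]
        transitions[i - 1][c] = i            # transition along the word itself
        k = supply[i - 1]
        # While the transition for c does not exist, add it and follow the link.
        while k > -1 and c not in transitions[k]:
            transitions[k][c] = i
            k = supply[k]
        # If the transition exists, stop there and link state i to its target.
        supply[i] = 0 if k == -1 else transitions[k][c]
    return transitions


def oracle_accepts(transitions, candidate):
    """True if candidate can be read from the initial state."""
    state = 0
    for c in candidate:
        if c not in transitions[state]:
            return False
        state = transitions[state][c]
    return True


oracle = build_factor_oracle("GTATGTA")
print(oracle_accepts(oracle, "TATG"))    # a true factor
print(oracle_accepts(oracle, "GTGTA"))   # accepted although it is not a factor
print(oracle_accepts(oracle, "GG"))      # rejected, so it cannot be a factor
```

The two branches of the inner loop correspond to the two cases discussed for the next character: a missing transition is added, an existing one ends the step.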


BOM algorithm (Backward Oracle Matching)

  • How is the comparison made?

Automaton: Factor Oracle

It checks whether the suffix of the window is a factor of the pattern.

  • How is the next position of the window determined?

  • If the transition for the character a is not found, the window is shifted so that it starts just after a.

  • If the last state of the automaton is reached with the character a, an occurrence of the pattern is reported and the window is shifted.


Factor Oracle automaton: example of the algorithm

  • How is the comparison made?

  • The automaton of the reversed pattern is built. Suppose the pattern is ATGTATG:

A T G T A T G

[Figure: Factor Oracle automaton of the reversed pattern GTATGTA]

  • And the search over the text:

G T A C T A G A A T G T G T A G A C A T G T A T G G T G ...
A T G T A T G
            A T G T A T G
                A T G T A T G
                      A T G T A T G
                                    A T G T A T G
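The search traced above can be sketched in Python as follows. The oracle construction is repeated so that the block is self-contained; the names `build_factor_oracle` and `bom_search` are illustrative, and the sketch relies on the factor oracle property that the only word of length |p| it accepts is the word it was built from.

```python
def build_factor_oracle(word):
    """Factor oracle of word as a list of dicts (state -> {char: state})."""
    m = len(word)
    transitions = [dict() for _ in range(m + 1)]
    supply = [-1] * (m + 1)
    for i in range(1, m + 1):
        c = word[i - 1]
        transitions[i - 1][c] = i
        k = supply[i - 1]
        while k > -1 and c not in transitions[k]:
            transitions[k][c] = i
            k = supply[k]
        supply[i] = 0 if k == -1 else transitions[k][c]
    return transitions


def bom_search(text, pattern):
    """Return the 0-based start positions of all occurrences of pattern."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    oracle = build_factor_oracle(pattern[::-1])   # oracle of the reversed pattern
    occurrences = []
    pos = 0
    while pos <= n - m:
        state, j = 0, m
        # Read the window from right to left while the oracle accepts.
        while j > 0 and state is not None:
            state = oracle[state].get(text[pos + j - 1])
            j -= 1
        if state is not None:
            occurrences.append(pos)   # whole window read: it is the pattern
        # On failure the new window starts just after the rejected character;
        # after an occurrence j is 0 and the window moves by one cell.
        pos += j + 1
    return occurrences


print(bom_search("GTACTAGAATGTGTAGACATGTATGGTG", "ATGTATG"))
```

When the oracle rejects a character, no occurrence of the pattern can overlap the rejected suffix, which is why the window can jump past it.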

