Bioinformatics PhD course

Bioinformatics

Xavier Messeguer Peypoch (http://www.lsi.upc.es/~alggen)

LSI Dep. de Llenguatges i Sistemes Informàtics

BSC Barcelona Supercomputing Center

Universitat Politècnica de Catalunya

Contents

1. Biological introduction

2. Comparison of short sequences (up to 10,000 bp)

Dot matrix, pairwise alignment, multiple alignment, hash algorithms

3. Comparison of large sequences (more than 10,000 bp)

Data structures, suffix trees, MUMs

4. String matching

Exact, extended, approximate

5. Sequence assembly

6. Projects: PROMO, MREPATT, …

Pairwise alignment

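Recall that with two strings S1 and S2 of length n, each cell of the dynamic programming matrix is computed from $2^2 - 1 = 3$ neighbouring cells, so the cost is $O(n^2)$.

[Figure: dynamic programming matrix with S1 and S2 on its axes.]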
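The quadratic cost can be seen in a minimal sketch of global pairwise alignment by dynamic programming (the Needleman-Wunsch recurrence). The function name `pairwise_score` and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for illustration, not values taken from the course.

```python
def pairwise_score(s1, s2, match=1, mismatch=-1, gap=-1):
    """Optimal global alignment score of s1 and s2.

    Scoring values are illustrative assumptions.
    """
    n, m = len(s1), len(s2)
    # score[i][j] = best score aligning the prefixes s1[:i] and s2[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # s1[:i] aligned against gaps only
    for j in range(1, m + 1):
        score[0][j] = j * gap  # s2[:j] aligned against gaps only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if s1[i - 1] == s2[j - 1] else mismatch
            # Each cell looks at its 2^2 - 1 = 3 neighbours.
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # s1[i-1] aligned with s2[j-1]
                score[i - 1][j] + gap,      # s1[i-1] aligned with a gap
                score[i][j - 1] + gap,      # s2[j-1] aligned with a gap
            )
    return score[n][m]


if __name__ == "__main__":
    print(pairwise_score("ACA", "ACA"))
```

The two nested loops visit (n+1)(m+1) cells and do constant work in each, which is where the $O(n^2)$ bound comes from when both strings have length n.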

And with 3 strings?

Multiple alignment

[Figure: three-dimensional dynamic programming matrix with S1, S2 and S3 on its axes.]

What happens with three strings?

Let n be their length. Each cell of the three-dimensional matrix is now computed from $2^3 - 1 = 7$ neighbouring cells, so the cost becomes $O(n^3)$.
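The three-string case can be sketched in the same way. The code below is a minimal illustration, assuming sum-of-pairs scoring with match +1, mismatch -1, gap -1, and a gap-gap pair scoring 0. These values and the names `MOVES`, `column_score` and `three_way_score` are assumptions made for the example, not part of the course material.

```python
from itertools import product

# The 2^3 - 1 = 7 ways to step back from a cell: each string either
# contributes a character (1) or a gap (0), but not all three gaps.
MOVES = [m for m in product((0, 1), repeat=3) if any(m)]


def column_score(a, b, c, match=1, mismatch=-1, gap=-1):
    """Sum-of-pairs score of one alignment column; '-' denotes a gap."""
    total = 0
    for x, y in ((a, b), (a, c), (b, c)):
        if x == "-" and y == "-":
            continue  # a gap-gap pair scores 0 (assumption)
        if x == "-" or y == "-":
            total += gap
        else:
            total += match if x == y else mismatch
    return total


def three_way_score(s1, s2, s3):
    """Optimal sum-of-pairs score of a global alignment of three strings."""
    n1, n2, n3 = len(s1), len(s2), len(s3)
    best = {(0, 0, 0): 0}
    for i in range(n1 + 1):
        for j in range(n2 + 1):
            for k in range(n3 + 1):
                if (i, j, k) == (0, 0, 0):
                    continue
                top = float("-inf")
                for di, dj, dk in MOVES:
                    pi, pj, pk = i - di, j - dj, k - dk
                    if pi < 0 or pj < 0 or pk < 0:
                        continue  # this neighbour lies outside the cube
                    a = s1[pi] if di else "-"
                    b = s2[pj] if dj else "-"
                    c = s3[pk] if dk else "-"
                    top = max(top, best[(pi, pj, pk)] + column_score(a, b, c))
                best[(i, j, k)] = top
    return best[(n1, n2, n3)]


if __name__ == "__main__":
    print(three_way_score("ACA", "ACA", "ACA"))
```

Three nested loops over the string lengths give the $n^3$ cells, and the inner loop over `MOVES` is the constant factor of 7 neighbours per cell.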

And with k strings?

$O(n^k \, 2^k \, k^2)$
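One way to read the three factors, assuming sum-of-pairs scoring (the slide does not name the scoring scheme): $n^k$ cells in the k-dimensional matrix, $2^k - 1$ neighbours per cell, and on the order of $k^2$ pairwise comparisons to score each column. The sketch below counts these quantities; the function names `predecessor_moves` and `estimated_work` are hypothetical helpers introduced for illustration.

```python
from itertools import product


def predecessor_moves(k):
    """All ways to step back from a cell of a k-dimensional matrix.

    Each string contributes either a character (1) or a gap (0);
    the all-gap move is excluded, leaving 2^k - 1 moves.
    """
    return [m for m in product((0, 1), repeat=k) if any(m)]


def estimated_work(n, k):
    """Rough operation count for aligning k strings of length n."""
    cells = n ** k              # cells in the k-dimensional matrix
    neighbours = 2 ** k - 1     # predecessors examined per cell
    pairs = k * (k - 1) // 2    # pair scores per column, O(k^2)
    return cells * neighbours * pairs


if __name__ == "__main__":
    for k in (2, 3, 4, 5):
        print(k, len(predecessor_moves(k)), estimated_work(100, k))
```

The count grows exponentially in k, which is why the programs used in practice rely on heuristics such as progressive alignment rather than on the exact k-dimensional recurrence.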

Multiple alignment programs

Multi-alignment programs:

• Malig (Progressive alignment)

http://alggen.lsi.upc.edu

• Clustal (Progressive alignment)

http://www.ebi.ac.uk/clustalw

• TCoffee (Progressive alignment + databases)

http://igs-server.cnr-mrs.fr/Tcoffee_cgi/index.cgi

• HMM (Hidden Markov Models)

Multiple progressive alignment

Run alggen-program

Run Malig (Progressive alignment)

http://alggen.lsi.upc.edu

Run Clustal (Progressive alignment)

http://www.ebi.ac.uk/clustalw