# Basic String Alignment

### Basic String Alignment

Probability theory and statistics

String alignment problem

Basic string alignment algorithms

Author: Roel Wijgers email: rwijgers@cs.uu.nl

Probability Theory
• Conditional probability: P(A|B) = P(A /\ B) / P(B), defined when P(B) > 0
• A and B are independent when P(A /\ B) = P(A)P(B)
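
Both definitions can be checked on a small finite example. The sketch below assumes one roll of a fair six-sided die; the die, the events, and the helper names `prob`, `conditional` and `independent` are illustrative choices, not part of the slides.

```python
from fractions import Fraction

# Sample space: one roll of a fair six-sided die (illustrative assumption).
OMEGA = frozenset(range(1, 7))


def prob(event):
    """Probability of an event (a subset of OMEGA) under the uniform distribution."""
    return Fraction(len(event & OMEGA), len(OMEGA))


def conditional(a, b):
    """P(A|B) = P(A and B) / P(B); only defined when P(B) > 0."""
    return prob(a & b) / prob(b)


def independent(a, b):
    """A and B are independent when P(A and B) = P(A) * P(B)."""
    return prob(a & b) == prob(a) * prob(b)


even = frozenset({2, 4, 6})
at_most_four = frozenset({1, 2, 3, 4})
at_most_three = frozenset({1, 2, 3})

print(conditional(even, at_most_four))   # P(even | roll <= 4)
print(independent(even, at_most_four))   # independent pair
print(independent(even, at_most_three))  # dependent pair
```

Exact fractions are used so that the independence test is not affected by floating-point rounding.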

String Alignment
• No gaps allowed:
• Gaps allowed in one of the strings:
• Gaps allowed in both strings:

Matching models

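The random model: each letter a occurs independently with some frequency q_a.

This means that the probability of two sequences x and y is the product of the frequencies of all their letters: P(x, y | R) = Π_i q_{x_i} · Π_j q_{y_j}, where R denotes the random model.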
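
A minimal sketch of the random model, assuming letter frequencies are given as a dictionary. The function name `random_model_probability` and the frequency values are hypothetical and only serve the example.

```python
from math import prod


def random_model_probability(x, y, q):
    """Probability of sequences x and y when every letter is drawn
    independently with frequency q[letter]."""
    return prod(q[a] for a in x) * prod(q[b] for b in y)


# Hypothetical letter frequencies (they sum to 1).
q = {"A": 0.5, "C": 0.25, "G": 0.25}

p_xy = random_model_probability("AC", "A", q)
print(p_xy)  # 0.5 * 0.25 * 0.5
```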

Matching models (2)

Independence between the values x_i and y_j is not very useful:

odds ratio:
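
A minimal sketch of the odds ratio, assuming the usual match model in which an aligned pair of letters (a, b) occurs with joint probability p_ab. Under that assumption the odds ratio of an ungapped alignment is the product over positions i of p_{x_i y_i} / (q_{x_i} q_{y_i}). The table `p`, the frequencies `q` and the two-letter alphabet are made-up values for illustration.

```python
from math import prod


def odds_ratio(x, y, p, q):
    """Odds ratio of the match model against the random model
    for an ungapped alignment of equal-length sequences x and y."""
    if len(x) != len(y):
        raise ValueError("an ungapped alignment needs equal-length sequences")
    return prod(p[a, b] / (q[a] * q[b]) for a, b in zip(x, y))


# Hypothetical letter frequencies and joint pair probabilities.
q = {"A": 0.5, "B": 0.5}
p = {("A", "A"): 0.4, ("B", "B"): 0.4, ("A", "B"): 0.1, ("B", "A"): 0.1}

print(odds_ratio("AB", "AB", p, q))  # above 1: the match model is favoured
print(odds_ratio("AB", "BA", p, q))  # below 1: the random model is favoured
```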

Matching models (3)

We would rather have an additive scoring system, i.e.:

This scoring system is called the log-odds ratio, and associated with it is the log-likelihood ratio:
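
Taking logarithms turns the product of per-position odds into a sum. The sketch below assumes the standard definitions s(a, b) = log(p_ab / (q_a q_b)) for a single aligned pair and S = sum over i of s(x_i, y_i) for a whole ungapped alignment. The probability tables are made-up values, and the function names are hypothetical.

```python
from math import log


def substitution_score(a, b, p, q):
    """Log-likelihood ratio of the pair (a, b): aligned versus independent."""
    return log(p[a, b] / (q[a] * q[b]))


def alignment_score(x, y, p, q):
    """Additive score of an ungapped alignment: the sum of the pair scores."""
    if len(x) != len(y):
        raise ValueError("an ungapped alignment needs equal-length sequences")
    return sum(substitution_score(a, b, p, q) for a, b in zip(x, y))


# Hypothetical letter frequencies and joint pair probabilities.
q = {"A": 0.5, "B": 0.5}
p = {("A", "A"): 0.4, ("B", "B"): 0.4, ("A", "B"): 0.1, ("B", "A"): 0.1}

print(alignment_score("AB", "AB", p, q))  # positive: matches score above 0
print(alignment_score("AB", "BA", p, q))  # negative: mismatches score below 0
```

A table of `substitution_score` values for every letter pair is what a substitution matrix stores.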

Log likelihood table

Gap penalties

Gaps should be penalised. Different penalty functions can be used, although the linear function is the most common:
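
As a sketch, the linear penalty charges the same cost d for every gap position, so a gap of length g scores -g·d. A common alternative, shown here as an assumption about which other function is meant, is the affine penalty with a gap-open cost d and a smaller gap-extension cost e. The parameter names d and e are illustrative.

```python
def linear_gap_penalty(g, d):
    """Score of a gap of length g when every gap position costs d."""
    return -g * d


def affine_gap_penalty(g, d, e):
    """Score of a gap of length g >= 1 with gap-open cost d
    and gap-extension cost e for each further position."""
    return -d - (g - 1) * e


print(linear_gap_penalty(3, 8))      # three positions at cost 8 each
print(affine_gap_penalty(3, 12, 2))  # open once, extend twice
```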

Gap penalties (2)

Where f(g) is a geometric distribution:
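
One way to read this, assuming the gap score is taken as the logarithm of the probability f(g) of a gap of length g: with a geometric f(g), each extra gap position changes the score by the same constant, which is exactly the behaviour of a linear or affine penalty. The parametrisation f(g) = (1 - r)·r^(g-1) and the name `r` are assumptions made for this sketch.

```python
from math import log


def geometric_gap_probability(g, r):
    """f(g) = (1 - r) * r**(g - 1) for gap length g >= 1 and 0 < r < 1."""
    return (1 - r) * r ** (g - 1)


def log_gap_score(g, r):
    """Logarithm of the gap-length probability, used as a gap score."""
    return log(geometric_gap_probability(g, r))


r = 0.5
# The score drops by the same amount, log(r), for every extra gap position.
steps = [log_gap_score(g + 1, r) - log_gap_score(g, r) for g in range(1, 5)]
print(steps)
```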

Alignment algorithms

Global alignment: Needleman-Wunsch algorithm

Find the optimal global alignment between two sequences, allowing gaps.
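
A minimal sketch of Needleman-Wunsch by dynamic programming, assuming a linear gap cost d and a caller-supplied substitution function `score`. The match/mismatch values and the example sequences are illustrative only.

```python
def needleman_wunsch(x, y, score, d):
    """Return (best score, aligned x, aligned y) for a global alignment.

    score(a, b) is the substitution score; d > 0 is the linear gap cost.
    """
    n, m = len(x), len(y)
    # F[i][j] = best score of aligning the prefixes x[:i] and y[:j].
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = -i * d
    for j in range(1, m + 1):
        F[0][j] = -j * d
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + score(x[i - 1], y[j - 1]),  # align x_i with y_j
                F[i - 1][j] - d,                              # x_i against a gap
                F[i][j - 1] - d,                              # y_j against a gap
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    ax, ay = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + score(x[i - 1], y[j - 1]):
            ax.append(x[i - 1])
            ay.append(y[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] - d:
            ax.append(x[i - 1])
            ay.append("-")
            i -= 1
        else:
            ax.append("-")
            ay.append(y[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(ax)), "".join(reversed(ay))


def simple_score(a, b):
    """Illustrative scoring: +1 for a match, -1 for a mismatch."""
    return 1 if a == b else -1


print(needleman_wunsch("ACGT", "AGT", simple_score, d=2))
```

The table has (n+1)·(m+1) cells and each cell is filled in constant time, so the running time and memory are both proportional to n·m.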

Local alignment: Smith-Waterman algorithm

Find the best alignment between subsequences of x and y.
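
A minimal sketch of Smith-Waterman, assuming a linear gap cost d and a caller-supplied substitution function `score`. Compared with global alignment, each cell may also take the value 0 (start a new alignment), and the traceback starts at the highest-scoring cell and stops at a 0. The scoring values and sequences are illustrative.

```python
def smith_waterman(x, y, score, d):
    """Return (best score, aligned part of x, aligned part of y)
    for the best local alignment."""
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                0,                                            # start afresh
                F[i - 1][j - 1] + score(x[i - 1], y[j - 1]),  # align x_i with y_j
                F[i - 1][j] - d,                              # x_i against a gap
                F[i][j - 1] - d,                              # y_j against a gap
            )
            if F[i][j] > best:
                best, best_pos = F[i][j], (i, j)

    # Trace back from the best cell until a cell with score 0 is reached.
    ax, ay = [], []
    i, j = best_pos
    while i > 0 and j > 0 and F[i][j] > 0:
        if F[i][j] == F[i - 1][j - 1] + score(x[i - 1], y[j - 1]):
            ax.append(x[i - 1])
            ay.append(y[j - 1])
            i, j = i - 1, j - 1
        elif F[i][j] == F[i - 1][j] - d:
            ax.append(x[i - 1])
            ay.append("-")
            i -= 1
        else:
            ax.append("-")
            ay.append(y[j - 1])
            j -= 1
    return best, "".join(reversed(ax)), "".join(reversed(ay))


def simple_score(a, b):
    """Illustrative scoring: +2 for a match, -1 for a mismatch."""
    return 2 if a == b else -1


print(smith_waterman("TTACGTAA", "GGACGTCC", simple_score, d=2))
```

For local alignment to be meaningful, the expected score of a random pair should be negative; otherwise long chance alignments outscore the real ones.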

Repeated Matches

Search for multiple local matches.

• One of the sequences is fixed and contains the domain or motif.
• We have some threshold T to exclude short local alignments.
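
A sketch of the repeated-match recurrence in its common textbook formulation, which is assumed here: y is the fixed motif, x is the sequence searched for copies of it, column 0 holds the total score of the matches completed so far, and a match only counts if it scores more than T. The function returns the total score only (no traceback); the names and the scoring values are illustrative.

```python
def repeated_matches_score(x, y, score, d, T):
    """Total score of the best set of non-overlapping local matches of the
    motif y inside x, where each match contributes (its score - T)."""
    n, m = len(x), len(y)
    # F[i][0]: best total over x[:i] with x_i unmatched.
    # F[i][j], j >= 1: best total with x_i inside a match that ends at y_j.
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        # Either x_i stays unmatched, or a match ending at x_(i-1) is closed
        # and pays the threshold T.
        F[i][0] = max([F[i - 1][0]] + [F[i - 1][j] - T for j in range(1, m + 1)])
        for j in range(1, m + 1):
            F[i][j] = max(
                F[i][0],                                      # start a new match
                F[i - 1][j - 1] + score(x[i - 1], y[j - 1]),  # align x_i with y_j
                F[i - 1][j] - d,                              # x_i against a gap
                F[i][j - 1] - d,                              # y_j against a gap
            )
    # Close a match that runs up to the end of x, if that is better.
    return max([F[n][0]] + [F[n][j] - T for j in range(1, m + 1)])


def simple_score(a, b):
    """Illustrative scoring: +2 for a match, -1 for a mismatch."""
    return 2 if a == b else -1


# Two copies of the motif "ACG", each scoring 6 and each paying T = 3.
print(repeated_matches_score("ACGTTACG", "ACG", simple_score, d=2, T=3))
```

Raising T makes the search stricter: with a threshold above the best possible match score, no match is reported and the total is 0.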

Overlap matches

We expect that one of the sequences contains the other, or they overlap.
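
A minimal sketch of overlap matching, assuming the usual formulation: the recurrence is the same as for global alignment with a linear gap cost d, but the top row and left column are initialised to 0 so that leading overhangs are free, and the best score is taken anywhere on the bottom row or right column so that trailing overhangs are free too. Only the score is returned; names and scoring values are illustrative.

```python
def overlap_score(x, y, score, d):
    """Best score of an alignment that starts on the top or left border
    and ends on the bottom or right border of the table."""
    n, m = len(x), len(y)
    # Border cells stay 0: unaligned overhanging ends cost nothing.
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + score(x[i - 1], y[j - 1]),  # align x_i with y_j
                F[i - 1][j] - d,                              # x_i against a gap
                F[i][j - 1] - d,                              # y_j against a gap
            )
    # The corner cells F[n][0] and F[0][m] are 0 and stand for "no overlap".
    return max(max(F[n]), max(row[m] for row in F))


def simple_score(a, b):
    """Illustrative scoring: +2 for a match, -1 for a mismatch."""
    return 2 if a == b else -1


print(overlap_score("TTTACG", "ACGCCC", simple_score, d=2))  # suffix/prefix overlap
print(overlap_score("GGACGTGG", "ACGT", simple_score, d=2))  # x contains y
```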

### Questions