Sequence Alignment Tutorial #2

Sequence Comparison

Much of bioinformatics involves sequences

- DNA sequences
- RNA sequences
- Protein sequences
We can think of these sequences as strings of letters

- DNA & RNA: |alphabet|=4
- Protein: |alphabet|=20

Global Alignment

Input: two sequences over the same alphabet

Output: an alignment of the two sequences

Example:

- GCGCATGGATTGAGCGA and TGCGCCATTGATGACCA
- A possible alignment:

      -GCGC-ATGGATTGAGCGA
      TGCGCCATTGAT-GACC-A

(Figure labels: biological data; best biological explanation)

Global Alignment

Example (cont):

    -GCGC-ATGGATTGAGCGA
    TGCGCCATTGAT-GACC-A

Three elements:

- Perfect matches
- Mismatches
- Insertions & deletions (indel)

Symmetric view of evolution: an insertion in one sequence is a deletion in the other, hence the single term "indel".

Global Alignment scoring scheme

Score each position independently:

- Match: +1
- Mismatch: -1
- Indel: -2
The score of an alignment is the sum of its position scores

Example:

    -GCGC-ATGGATTGAGCGA
    TGCGCCATTGAT-GACC-A

Score: (+1x13) + (-1x2) + (-2x4) = 3

A worse alignment of the same sequences:

    ------GCGCATGGATTGAGCGA
    TGCGCC----ATTGATGACCA--

Score: (+1x5) + (-1x6) + (-2x12) = -25
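The position-by-position scoring above can be checked mechanically. The following is a minimal sketch, assuming the two alignment rows are given as equal-length strings with `-` marking a gap; the function name `score_alignment` is illustrative and not part of the original slides.

```python
def score_alignment(row_s, row_t, match=1, mismatch=-1, indel=-2):
    """Sum the per-column scores of two aligned rows ('-' marks a gap)."""
    if len(row_s) != len(row_t):
        raise ValueError("alignment rows must have the same length")
    total = 0
    for a, b in zip(row_s, row_t):
        if a == "-" and b == "-":
            raise ValueError("a column cannot contain two gaps")
        if a == "-" or b == "-":
            total += indel      # insertion or deletion
        elif a == b:
            total += match      # perfect match
        else:
            total += mismatch   # substitution
    return total


# First alignment of the example: 13 matches, 2 mismatches, 4 indels.
print(score_alignment("-GCGC-ATGGATTGAGCGA", "TGCGCCATTGAT-GACC-A"))  # 3
```

Counting columns this way is also a quick sanity check: the numbers of matches, mismatches and indels must add up to the alignment length.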

Sequence Alignment Variants

Two basic variants of sequence alignment:

- Global alignment (Needleman-Wunsch)
- Local alignment (Smith-Waterman)
Today we’ll see:

- Overlap alignment
- Affine cost for gaps
We’ll use ideas of dynamic programming presented in the lecture
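As a reminder of the dynamic programming idea, here is a minimal sketch of the global alignment score computation. It assumes the standard Needleman-Wunsch recurrence with a linear gap penalty (the lecture's exact formulation is not reproduced in these slides), and the name `global_alignment_score` is illustrative.

```python
def global_alignment_score(s, t, match=1, mismatch=-1, indel=-2):
    """Best global alignment score of s and t under a linear gap penalty."""
    n, m = len(s), len(t)
    # V[i][j] = best score of aligning s[:i] with t[:j]
    V = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        V[i][0] = i * indel          # s[:i] aligned to gaps only
    for j in range(1, m + 1):
        V[0][j] = j * indel          # t[:j] aligned to gaps only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            V[i][j] = max(V[i - 1][j - 1] + sub,   # align s[i] with t[j]
                          V[i - 1][j] + indel,     # s[i] against a gap
                          V[i][j - 1] + indel)     # t[j] against a gap
    return V[n][m]


print(global_alignment_score("GCGCATGGATTGAGCGA", "TGCGCCATTGATGACCA"))
```

Both variants discussed next reuse this table-filling scheme and change only the initialization, the place where the final score is read, or the number of tables.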

Overlap Alignment

Consider the following problem:

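- Find the most significant overlap between two sequences S and T.
- Possible overlap relations:
  a. A suffix of one sequence overlaps a prefix of the other.
  b. One sequence is contained in the other.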

Difference from local alignment:

Here we require alignment between the endpoints of the two sequences.

Overlap Alignment

Formally:

Given S[1..n] and T[1..m], find i, j such that

d = max{ D(S[1..i], T[j..m]), D(S[i..n], T[1..j]), D(S[1..n], T[i..j]), D(S[i..j], T[1..m]) }

is maximal, where D(X, Y) is the alignment score of X and Y.

Solution: same as global alignment, except that we do not penalise overhanging ends.

Overlap Alignment

- Initialization: V[i,0] = 0, V[0,j] = 0

Recurrence: as in global alignment

Score: maximum value over the bottom row and the rightmost column
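The three ingredients above (zero initialization, the global recurrence, and a maximum over the last row and last column) can be sketched as follows. This is a minimal illustration, not the tutorial's own code: the name `overlap_alignment_score` is hypothetical, and the default scores are those of the PAWHEAE example in this section.

```python
def overlap_alignment_score(s, t, match=4, mismatch=-1, indel=-5):
    """Best overlap alignment score: overhanging ends are not penalised."""
    n, m = len(s), len(t)
    # Initialization: V[i][0] = V[0][j] = 0, so a leading overhang is free.
    V = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            # Recurrence: exactly as in global alignment.
            V[i][j] = max(V[i - 1][j - 1] + sub,
                          V[i - 1][j] + indel,
                          V[i][j - 1] + indel)
    # Score: the alignment must reach the end of s (bottom row) or the
    # end of t (rightmost column); the trailing overhang is free.
    return max(max(V[n]), max(row[m] for row in V))


print(overlap_alignment_score("PAWHEAE", "HEAGAWGHEE"))  # 11
```

Recovering the alignment itself would additionally require a traceback from the maximal cell, which is omitted here to keep the sketch short.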

Overlap Alignment (Example)

S = PAWHEAE

T = HEAGAWGHEE

Scoring scheme:

- Match: +4
- Mismatch: -1
- Indel: -5

The best overlap is:

    PAWHEAE------
    ---HEAGAWGHEE

Pay attention!

A different scoring scheme (for example, Indel: -2 instead of -5, with Match: +4 and Mismatch: -1 unchanged) could yield a different result, such as:

    ---PAW-HEAE
    HEAGAWGHEE-
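To see how the choice of indel penalty changes the ranking of these two alignments, one can score each of them while ignoring the overhanging ends. The sketch below assumes, as a simplification, that every leading or trailing column containing a gap is an unpenalised overhang; the helper name `overlap_score_of_alignment` is illustrative.

```python
def overlap_score_of_alignment(row_s, row_t, match=4, mismatch=-1, indel=-5):
    """Score two aligned rows, treating leading/trailing gap columns as free."""
    if len(row_s) != len(row_t):
        raise ValueError("alignment rows must have the same length")
    cols = list(zip(row_s, row_t))
    start, end = 0, len(cols)
    while start < end and "-" in cols[start]:      # leading overhang
        start += 1
    while end > start and "-" in cols[end - 1]:    # trailing overhang
        end -= 1
    total = 0
    for a, b in cols[start:end]:
        if a == "-" or b == "-":
            total += indel
        elif a == b:
            total += match
        else:
            total += mismatch
    return total


first = ("PAWHEAE------", "---HEAGAWGHEE")
second = ("---PAW-HEAE", "HEAGAWGHEE-")
for indel in (-5, -2):
    print(indel,
          overlap_score_of_alignment(*first, indel=indel),
          overlap_score_of_alignment(*second, indel=indel))
# prints "-5 11 9" and then "-2 11 12"
```

With an indel penalty of -5 the first alignment wins (11 against 9); with -2 the second one overtakes it (12 against 11). This only ranks the two alignments shown and does not establish which alignment is optimal under the milder penalty.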

Affine gap scores

- Observation: Insertions and deletions often occur in blocks longer than a single nucleotide.

- Consequence:
- The current scoring scheme charges a constant penalty per gap unit.
- A single block of g gap units is therefore penalised exactly like g separate one-unit gaps, which models the observation above poorly.

Question: How do we modify the scheme to incorporate this?

Alignment with affine gap scores

- The penalty for a gap of length g is built from two parameters:

d - penalty for introducing (opening) a gap

e - penalty for elongating the gap by one unit

Typically d > e

- Problem:
When aligning S[i] to a gap we do not know whether to penalize by d or e.

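Solution: we compute 3 matrices simultaneously

M(i,j) - the best score of aligning S[1..i] with T[1..j], given that S[i] is aligned to T[j]

IS(i,j) - the best score of aligning S[1..i] with T[1..j], given that S[i] is aligned to a gap

IT(i,j) - the best score of aligning S[1..i] with T[1..j], given that T[j] is aligned to a gap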

We assume that a deletion will not be followed directly by an insertion.

This can be obtained by using

Affine gap scores

- Initialization: depending on the problem (global, local, …)
- Recurrence: uses already known values M(i’,j’), IS(i’,j’), IT(i’,j’)
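The slides do not spell out the recurrence, so the following is only a sketch under stated assumptions: a gap of length g costs d + (g - 1)e (a common convention), the problem is global alignment, and the recurrences are the usual three-matrix form in which a gap in one sequence is never followed directly by a gap in the other. The function name `affine_global_score` and the default values of `d` and `e` are illustrative.

```python
NEG_INF = float("-inf")


def affine_global_score(s, t, match=1, mismatch=-1, d=3, e=1):
    """Best global alignment score with affine gap penalty d + (g - 1) * e."""
    n, m = len(s), len(t)
    # M: S[i] aligned to T[j]; IS: S[i] aligned to a gap; IT: T[j] aligned to a gap.
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    IS = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    IT = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    # Global initialization (an assumption): leading gaps are fully penalised.
    M[0][0] = 0
    for i in range(1, n + 1):
        IS[i][0] = -d - (i - 1) * e
    for j in range(1, m + 1):
        IT[0][j] = -d - (j - 1) * e
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            M[i][j] = sub + max(M[i - 1][j - 1],
                                IS[i - 1][j - 1],
                                IT[i - 1][j - 1])
            # Open a new gap (pay d) or extend the current one (pay e).
            IS[i][j] = max(M[i - 1][j] - d, IS[i - 1][j] - e)
            IT[i][j] = max(M[i][j - 1] - d, IT[i][j - 1] - e)
    return max(M[n][m], IS[n][m], IT[n][m])


# Two matches plus one gap of length 2: 2 - (3 + 1) = -2
print(affine_global_score("AAAA", "AA"))  # -2
```

Keeping separate matrices is what resolves the "d or e" question: the matrix a value is read from records whether the previous column already ended in a gap, so the recurrence knows whether it is opening or extending.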
