Sequence Alignment Tutorial #2

© Ydo Wexler & Dan Geiger


Sequence Comparison

Much of bioinformatics involves sequences

  • DNA sequences

  • RNA sequences

  • Protein sequences

    We can think of these sequences as strings of letters

  • DNA & RNA: |alphabet|=4

  • Protein: |alphabet|=20


Global Alignment

Input: two sequences over the same alphabet

Output: an alignment of the two sequences

Example:

  • GCGCATGGATTGAGCGA and TGCGCCATTGATGACCA

  • A possible alignment:

    -GCGC-ATGGATTGAGCGA

    TGCGCCATTGAT-GACC-A


Global Alignment

Example (cont):

-GCGC-ATGGATTGAGCGA

TGCGCCATTGAT-GACC-A

Three elements:

  • Perfect matches

  • Mismatches

  • Insertions & deletions (indel)

Symmetric view of evolution

[Figure labels: Biological data; Hypotheses space; Best biological explanation]


Global Alignment Scoring Scheme

Score each position independently:

  • Match: +1

  • Mismatch: -1

  • Indel: -2

    The score of an alignment is the sum of its position scores.

Example:

-GCGC-ATGGATTGAGCGA

TGCGCCATTGAT-GACC-A

Score: (+1 x 13) + (-1 x 2) + (-2 x 4) = 3

------GCGCATGGATTGAGCGA

TGCGCC----ATTGATGACCA--

Score: (+1 x 5) + (-1 x 6) + (-2 x 12) = -25
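
The position-by-position scoring can be checked mechanically. The following is a minimal Python sketch, not part of the original slides; the function name `score_alignment` and its parameter names are illustrative assumptions. It takes the two gapped rows of an alignment and sums the column scores under the scheme above.

```python
def score_alignment(row_s, row_t, match=1, mismatch=-1, indel=-2):
    """Sum the per-column scores of two equal-length gapped rows."""
    if len(row_s) != len(row_t):
        raise ValueError("aligned rows must have the same length")
    total = 0
    for a, b in zip(row_s, row_t):
        if a == "-" and b == "-":
            raise ValueError("a column cannot contain two gaps")
        if a == "-" or b == "-":
            total += indel      # one symbol aligned to a gap
        elif a == b:
            total += match      # perfect match
        else:
            total += mismatch   # two different symbols
    return total


# First example alignment: 13 matches, 2 mismatches, 4 indels.
print(score_alignment("-GCGC-ATGGATTGAGCGA", "TGCGCCATTGAT-GACC-A"))  # 3

# The same function can be applied to the second example alignment.
print(score_alignment("------GCGCATGGATTGAGCGA", "TGCGCC----ATTGATGACCA--"))
```

Counting the columns this way is a quick sanity check: matches, mismatches and indels must add up to the length of the aligned rows.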


Sequence Alignment Variants

Two basic variants of sequence alignment:

  • Global alignment (Needleman-Wunsch)

  • Local alignment (Smith-Waterman)

    Today we’ll see:

  • Overlap alignment

  • Affine cost for gaps

    We’ll use ideas of dynamic programming presented in the lecture
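
As a reminder of the dynamic programming idea the variants build on, here is a minimal Python sketch of global alignment scoring. The recurrence is the standard Needleman-Wunsch one and is not spelled out in this tutorial; the function name and the default scores (the +1 / -1 / -2 scheme from the scoring slide) are assumptions for illustration.

```python
def global_alignment_score(s, t, match=1, mismatch=-1, indel=-2):
    """Best global alignment score of s and t with a linear gap penalty."""
    n, m = len(s), len(t)
    # V[i][j] = best score of aligning the prefixes s[:i] and t[:j].
    V = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        V[i][0] = i * indel          # s[:i] aligned to gaps only
    for j in range(1, m + 1):
        V[0][j] = j * indel          # t[:j] aligned to gaps only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            V[i][j] = max(
                V[i - 1][j - 1] + sub,   # s[i] aligned to t[j]
                V[i - 1][j] + indel,     # s[i] aligned to a gap
                V[i][j - 1] + indel,     # t[j] aligned to a gap
            )
    return V[n][m]


print(global_alignment_score("ACGT", "AGT"))  # 1
```

The function returns only the score; recovering the alignment itself would need a traceback through V, which is omitted to keep the sketch short.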


Overlap Alignment

Consider the following problem:

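  • Find the most significant overlap between two sequences S and T.

  • Possible overlap relations (diagrams a. and b. are not reproduced here): an end of one sequence overlaps an end of the other, or one sequence lies entirely within the other.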

Difference from local alignment:

Here we require alignment between the endpoints of the two sequences.


Overlap Alignment

Formally:

Given S[1..n] and T[1..m], find i, j such that

d = max{ D(S[1..i],T[j..m]), D(S[i..n],T[1..j]), D(S[1..n],T[i..j]), D(S[i..j],T[1..m]) }

is maximal, where D(X,Y) is the alignment score of X and Y.

Solution: same as global alignment, except that we do not penalise overhanging ends.


Overlap Alignment

  • Initialization: V[i,0] = 0, V[0,j] = 0

  • Recurrence: as in global alignment

  • Score: the maximum value in the bottom row and the rightmost column


Overlap Alignment (Example)

S = PAWHEAE

T = HEAGAWGHEE

Scoring scheme:

  • Match: +4

  • Mismatch: -1

  • Indel: -5


Overlap Alignment (Example)

With the scoring scheme above (match +4, mismatch -1, indel -5), the best overlap is:

PAWHEAE------

---HEAGAWGHEE

Pay attention!

A different scoring scheme (for example, indel -2 instead of -5) could yield a different result, such as:

---PAW-HEAE

HEAGAWGHEE-
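
The overlap procedure (zero initialization, the global recurrence, and the maximum over the bottom row and rightmost column) can be sketched in Python as follows. This is an illustrative sketch rather than the authors' code; the function name and parameter names are assumptions, and only the score is computed, without a traceback.

```python
def overlap_alignment_score(s, t, match=4, mismatch=-1, indel=-5):
    """Best overlap score: overhanging ends of s and t are not penalised."""
    n, m = len(s), len(t)
    # V[i][0] = V[0][j] = 0, so a leading overhang costs nothing.
    V = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            V[i][j] = max(
                V[i - 1][j - 1] + sub,   # s[i] aligned to t[j]
                V[i - 1][j] + indel,     # s[i] aligned to a gap
                V[i][j - 1] + indel,     # t[j] aligned to a gap
            )
    # A trailing overhang is free too: take the best cell in the
    # bottom row or the rightmost column.
    bottom_row = V[n]
    right_column = [V[i][m] for i in range(n + 1)]
    return max(max(bottom_row), max(right_column))


print(overlap_alignment_score("PAWHEAE", "HEAGAWGHEE"))  # 11
```

The value 11 is the score of the overlap of HEAE with HEAG (4 + 4 + 4 - 1). Calling the function with `indel=-2` lets one explore how a cheaper gap changes the best overlap.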


Affine gap scores

  • Observation: Insertions and deletions often occur in blocks longer than a single nucleotide.

  • Consequence:

    • The current scoring scheme charges the same penalty for every gap position, so a block of g consecutive gaps costs as much as g separate gaps.

    • This does not model the above phenomenon well.

Question: How do we modify the scheme to incorporate this?


Alignment with affine gap scores

  • Penalty score for a gap of length g: d + (g-1)e, where

    d - penalty for introducing a gap

    e - penalty for extending the gap by one unit.

Typically d > e

  • Problem:

    When aligning S[i] to a gap we do not know whether to penalize by d or e; that depends on whether the previous position was already aligned to a gap.

Solution: we compute 3 matrices simultaneously, each holding the best score for S[1..i] and T[1..j] under a different last column:

M(i,j) - the score obtained by aligning S[i] to T[j]

IS(i,j) - the score obtained by aligning S[i] to a gap

IT(i,j) - the score obtained by aligning T[j] to a gap
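
To see why an affine penalty favours gaps that occur in blocks, the short Python sketch below compares the two schemes. It assumes the common affine form in which a gap of length g costs d + (g-1)e; the function names and the values d = 5, e = 1 and a linear cost of 2 per position are illustrative assumptions, not values from the slides.

```python
def linear_gap_penalty(g, per_position=2):
    """Constant penalty per gap position."""
    return g * per_position


def affine_gap_penalty(g, d=5, e=1):
    """d for opening the gap, e for each of the g - 1 extensions."""
    if g < 1:
        raise ValueError("gap length must be at least 1")
    return d + (g - 1) * e


# One block of three gap positions versus three separate single gaps.
print(linear_gap_penalty(3), 3 * linear_gap_penalty(1))   # 6 6
print(affine_gap_penalty(3), 3 * affine_gap_penalty(1))   # 7 15
```

Under the linear scheme the two situations cost the same, while under the affine scheme the single block is much cheaper, which matches the observation that indels tend to occur in blocks.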


Affine gap scores

  • Initialization: depending on the problem (global, local, …)

  • Recurrence: uses already known values - M(i’,j’), IS(i’,j’), IT(i’,j’)

We assume that a deletion will not be followed directly by an insertion.

This can be obtained by using a suitable choice of scores (the condition itself is not preserved in this transcript).
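
The recurrence formulas are not preserved in this transcript. The Python sketch below uses the standard three-matrix recurrence for global alignment with affine gaps, under the assumption stated on this slide that a deletion is never followed directly by an insertion. The function name, the global initialization and the default scores (match +1, mismatch -1, d = 5, e = 1) are assumptions chosen for illustration.

```python
NEG_INF = float("-inf")


def affine_global_score(s, t, match=1, mismatch=-1, d=5, e=1):
    """Best global alignment score when a gap of length g costs d + (g-1)e."""
    n, m = len(s), len(t)
    # M:  best score with S[i] aligned to T[j]
    # IS: best score with S[i] aligned to a gap
    # IT: best score with T[j] aligned to a gap
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    IS = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    IT = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        IS[i][0] = -d - (i - 1) * e   # S[1..i] aligned to one long gap
    for j in range(1, m + 1):
        IT[0][j] = -d - (j - 1) * e   # T[1..j] aligned to one long gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            M[i][j] = sub + max(M[i - 1][j - 1],
                                IS[i - 1][j - 1],
                                IT[i - 1][j - 1])
            # Open a new gap (pay d) or extend the current one (pay e).
            IS[i][j] = max(M[i - 1][j] - d, IS[i - 1][j] - e)
            IT[i][j] = max(M[i][j - 1] - d, IT[i][j - 1] - e)
    return max(M[n][m], IS[n][m], IT[n][m])


print(affine_global_score("ACGTT", "ACG"))  # -3
```

In the example, the three matches score +3 and the single gap of length 2 costs 5 + 1 = 6, giving -3. Keeping IS and IT separate is what lets each cell know whether the previous column already ended in a gap of the same kind.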


Affine gap scores

  • Simplification:

Why are two matrices enough?

