Sequence Alignment Tutorial #2

© Ydo Wexler & Dan Geiger

Sequence Comparison

Much of bioinformatics involves sequences

  • DNA sequences
  • RNA sequences
  • Protein sequences

We can think of these sequences as strings of letters

  • DNA & RNA: |alphabet|=4
  • Protein: |alphabet|=20

Global Alignment

Input: two sequences over the same alphabet

Output: an alignment of the two sequences

Example:

  • GCGCATGGATTGAGCGA and TGCGCCATTGATGACCA
  • A possible alignment:

-GCGC-ATGGATTGAGCGA
TGCGCCATTGAT-GACC-A

Global Alignment

Example (cont.):

-GCGC-ATGGATTGAGCGA
TGCGCCATTGAT-GACC-A

Three elements:

  • Perfect matches
  • Mismatches
  • Insertions & deletions (indels)

Symmetric view of evolution

[Figure: hypotheses space, biological data, best biological explanation]

Global Alignment: scoring scheme

Score each position independently:

  • Match: +1
  • Mismatch: -1
  • Indel: -2

Score of an alignment is sum of position scores

Example:

-GCGC-ATGGATTGAGCGA
TGCGCCATTGAT-GACC-A

Score: (+1x13) + (-1x2) + (-2x4) = 3

------GCGCATGGATTGAGCGA
TGCGCC----ATTGATGACCA--

Score: (+1x5) + (-1x6) + (-2x12) = -25
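The position-by-position scoring above can be sketched in a few lines of Python. The function name `score_alignment` and its parameter names are illustrative assumptions, not part of the tutorial; the default values are the Match/Mismatch/Indel scores listed on this slide.

```python
def score_alignment(row_s, row_t, match=1, mismatch=-1, indel=-2):
    """Sum the independent column scores of two equal-length gapped rows."""
    if len(row_s) != len(row_t):
        raise ValueError("aligned rows must have equal length")
    total = 0
    for a, b in zip(row_s, row_t):
        if a == "-" and b == "-":
            raise ValueError("a column cannot contain two gaps")
        if a == "-" or b == "-":
            total += indel        # one symbol aligned to a gap
        elif a == b:
            total += match        # perfect match
        else:
            total += mismatch     # two different symbols
    return total


# First alignment from the slide: 13 matches, 2 mismatches, 4 indels.
print(score_alignment("-GCGC-ATGGATTGAGCGA", "TGCGCCATTGAT-GACC-A"))  # 3
```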

Sequence Alignment Variants

Two basic variants of sequence alignment:

  • Global alignment (Needleman-Wunsch)
  • Local alignment (Smith-Waterman)

Today we’ll see:

  • Overlap alignment
  • Affine cost for gaps

We’ll use ideas of dynamic programming presented in the lecture
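As a reminder of the dynamic-programming idea that the variants below build on, here is a minimal sketch of the global alignment (Needleman-Wunsch) score computation. It assumes the simple Match/Mismatch/Indel scheme from the earlier slide; the function name `global_alignment_score` and the table name `V` are illustrative choices, and only the score (not the alignment itself) is returned.

```python
def global_alignment_score(s, t, match=1, mismatch=-1, indel=-2):
    """Best global alignment score of s and t under a linear gap cost."""
    n, m = len(s), len(t)
    # V[i][j] = best score of aligning the prefixes s[:i] and t[:j]
    V = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        V[i][0] = i * indel       # s[:i] aligned entirely to gaps
    for j in range(1, m + 1):
        V[0][j] = j * indel       # t[:j] aligned entirely to gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            V[i][j] = max(
                V[i - 1][j - 1] + sub,   # s[i] aligned to t[j]
                V[i - 1][j] + indel,     # s[i] aligned to a gap
                V[i][j - 1] + indel,     # t[j] aligned to a gap
            )
    return V[n][m]


print(global_alignment_score("ACGT", "AGT"))  # 1
```

The table has (n+1) x (m+1) cells and each cell is filled in constant time, so the sketch runs in O(nm) time.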

Overlap Alignment

Consider the following problem:

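  • Find the most significant overlap between two sequences S and T.
  • Possible overlap relations:

a. A suffix of one sequence overlaps a prefix of the other.

b. One sequence is contained in the other.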

Difference from local alignment:

Here we require alignment between the endpoints of the two sequences.

Overlap Alignment

Formally:

Given S[1..n] and T[1..m], find i, j such that:

d = max{ D(S[1..i], T[j..m]), D(S[i..n], T[1..j]), D(S[1..n], T[i..j]), D(S[i..j], T[1..m]) }

is maximal (D denotes the alignment score of its two arguments).

Solution: same as global alignment, except that we do not penalise overhanging ends.

Overlap Alignment
  • Initialization: V[i,0] = 0, V[0,j] = 0
  • Recurrence: as in global alignment
  • Score: the maximum value in the bottom row and the rightmost column
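The three ingredients above (zero initialization, the global recurrence, and a maximum over the last row and last column) can be sketched as follows. The function name `overlap_alignment_score` is a hypothetical choice, and the default scores are assumed to be the simple scheme from the global alignment slide; any other scheme can be passed in.

```python
def overlap_alignment_score(s, t, match=1, mismatch=-1, indel=-2):
    """Best overlap score: global recurrence, free overhanging ends."""
    n, m = len(s), len(t)
    # Initialization: V[i][0] = 0 and V[0][j] = 0, so leading overhangs are free.
    V = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            V[i][j] = max(
                V[i - 1][j - 1] + sub,   # s[i] aligned to t[j]
                V[i - 1][j] + indel,     # s[i] aligned to a gap
                V[i][j - 1] + indel,     # t[j] aligned to a gap
            )
    # Score: maximum over the bottom row and the rightmost column,
    # so trailing overhangs are free as well.
    bottom_row = max(V[n])
    right_column = max(row[m] for row in V)
    return max(bottom_row, right_column)


# The suffix "C" of the first string overlaps the prefix "C" of the second.
print(overlap_alignment_score("AAAC", "CGGG"))  # 1
```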

Overlap Alignment (Example)

S =PAWHEAE

T =HEAGAWGHEE

Scoring scheme :

  • Match: +4
  • Mismatch: -1
  • Indel: -5

The best overlap is:

PAWHEAE------
---HEAGAWGHEE

Pay attention!

A different scoring scheme (for example, one with Indel: -2 instead of -5) could yield a different result, such as:

---PAW-HEAE
HEAGAWGHEE-
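To check the example by hand, the overlap table can be filled and traced back programmatically. The sketch below is one possible implementation, not the tutorial's own code: the function name `overlap_align`, the tie-breaking order in the traceback (diagonal first, then a gap in T, then a gap in S), and the choice of the first maximal end cell are all assumptions.

```python
def overlap_align(s, t, match, mismatch, indel):
    """Return (score, gapped_s, gapped_t) for the best overlap of s and t."""
    n, m = len(s), len(t)
    V = [[0] * (m + 1) for _ in range(n + 1)]   # first row and column stay 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            V[i][j] = max(V[i - 1][j - 1] + sub,
                          V[i - 1][j] + indel,
                          V[i][j - 1] + indel)

    # The alignment must end in the bottom row or the rightmost column.
    cells = [(n, j) for j in range(m + 1)] + [(i, m) for i in range(n + 1)]
    i, j = max(cells, key=lambda c: V[c[0]][c[1]])
    score = V[i][j]

    # Unpenalised overhang after the end cell (one of s[i:], t[j:] is empty).
    tail_s = s[i:] + "-" * (m - j)
    tail_t = "-" * (n - i) + t[j:]

    # Trace back until the first row or first column is reached.
    mid_s, mid_t = [], []
    while i > 0 and j > 0:
        sub = match if s[i - 1] == t[j - 1] else mismatch
        if V[i][j] == V[i - 1][j - 1] + sub:
            mid_s.append(s[i - 1]); mid_t.append(t[j - 1]); i -= 1; j -= 1
        elif V[i][j] == V[i - 1][j] + indel:
            mid_s.append(s[i - 1]); mid_t.append("-"); i -= 1
        else:
            mid_s.append("-"); mid_t.append(t[j - 1]); j -= 1

    # Unpenalised overhang before the start cell (i == 0 or j == 0 here).
    head_s = "-" * j + s[:i]
    head_t = t[:j] + "-" * i
    return (score,
            head_s + "".join(reversed(mid_s)) + tail_s,
            head_t + "".join(reversed(mid_t)) + tail_t)


score, row_s, row_t = overlap_align("PAWHEAE", "HEAGAWGHEE",
                                    match=4, mismatch=-1, indel=-5)
print(score)   # 11
print(row_s)   # PAWHEAE------
print(row_t)   # ---HEAGAWGHEE
```

With Match +4, Mismatch -1 and Indel -5, the aligned region is HEAE against HEAG (three matches and one mismatch), which gives the score 11.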

Affine gap scores
  • Observation: Insertions and deletions often occur in blocks longer than a single nucleotide.
  • Consequence:
    • The current scoring scheme gives a constant penalty per gap unit.
    • This does not model the above phenomenon well.

Question: How do we modify the scheme to incorporate this?

Alignment with affine gap scores
  • Penalty score for a gap of length g: γ(g) = -d - (g - 1)e

d - penalty for introduction of a gap

e - penalty for elongating the gap by one unit.

Typically d > e
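A small sketch of the difference between the two gap models, assuming the usual affine form in which a gap of length g scores -d - (g - 1)e. The function names and the values d = 5, e = 1 are illustrative assumptions only.

```python
def linear_gap_score(g, indel=-2):
    """Constant penalty per gap unit, as in the earlier scoring scheme."""
    return g * indel


def affine_gap_score(g, d, e):
    """Affine gap score: -d to open the gap, -e for each further unit."""
    if g < 1:
        raise ValueError("gap length must be at least 1")
    return -d - (g - 1) * e


# With d > e, one gap of length 4 costs less than four separate gaps of length 1.
print(affine_gap_score(4, d=5, e=1))        # -8
print(4 * affine_gap_score(1, d=5, e=1))    # -20
```

Under the linear model the two cases would cost the same, which is why it cannot express a preference for indels that occur in blocks.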

  • Problem:

When aligning S[i] to a gap we do not know whether to penalize by d or e.

Solution: we compute 3 matrices simultaneously

M(i,j) - the best score of aligning S[1..i] with T[1..j], given that S[i] is aligned to T[j]

IS(i,j) - the best score given that S[i] is aligned to a gap

IT(i,j) - the best score given that T[j] is aligned to a gap

Affine gap scores

We assume that a deletion will not be followed directly by an insertion.

This can be obtained by a suitable choice of the gap penalties d and e relative to the mismatch scores.

  • Initialization: depending on the problem (global, local, …)
  • Recurrence: uses already known values - M(i’,j’), IS(i’,j’), IT(i’,j’)
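The recurrence formulas themselves are not reproduced in this transcript. The sketch below uses the standard three-matrix formulation for affine gaps with a global initialization, which is an assumption about what the slide showed. The function name `affine_global_score` and the Match/Mismatch defaults are illustrative, and the names `M`, `IS`, `IT` follow the definitions given earlier.

```python
NEG_INF = float("-inf")


def affine_global_score(s, t, d, e, match=1, mismatch=-1):
    """Global alignment score with gap score -d - (g - 1) * e per gap."""
    n, m = len(s), len(t)
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]    # S[i] aligned to T[j]
    IS = [[NEG_INF] * (m + 1) for _ in range(n + 1)]   # S[i] aligned to a gap
    IT = [[NEG_INF] * (m + 1) for _ in range(n + 1)]   # T[j] aligned to a gap

    # Global initialization: a leading gap of length k scores -d - (k - 1) * e.
    M[0][0] = 0
    for i in range(1, n + 1):
        IS[i][0] = -d - (i - 1) * e
    for j in range(1, m + 1):
        IT[0][j] = -d - (j - 1) * e

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            M[i][j] = sub + max(M[i - 1][j - 1],
                                IS[i - 1][j - 1],
                                IT[i - 1][j - 1])
            # Open a new gap (-d) from M, or extend an existing one (-e).
            # There is no IS <-> IT transition: a deletion is assumed not
            # to be followed directly by an insertion.
            IS[i][j] = max(M[i - 1][j] - d, IS[i - 1][j] - e)
            IT[i][j] = max(M[i][j - 1] - d, IT[i][j - 1] - e)

    return max(M[n][m], IS[n][m], IT[n][m])


# Four matches plus one gap of length 4: 4 + (-5 - 3 * 1) = -4.
print(affine_global_score("ACGTACGT", "ACGT", d=5, e=1))  # -4
```

Each cell of each matrix depends only on cells with smaller indices, so the three tables can be filled together in a single pass in O(nm) time.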
Affine gap scores
  • Simplification:

Why are two matrices enough?