Sequence Alignment Tutorial #3

© Ydo Wexler & Dan Geiger

Sequence Alignment (Reminder)

Global Alignment:

Input: two sequences S1, S2 over the same alphabet

Output: two sequences S’1, S’2 of equal length

(S’1, S’2 are S1, S2 with possibly additional gaps)

Example:

  • S1= GCGCATGGATTGAGCGA
  • S2= TGCGCCATTGATGACC
  • A possible alignment:

S’1= -GCGC-ATGGATTGAGCGA
S’2= TGCGCCATTGAT-GACC--

Goal: Measure how similar the two sequences S1 and S2 are.

Sequence Alignment (Reminder)

Local Alignment:

Input: two sequences S1, S2 over the same alphabet

Output: two sequences S’1, S’2 of equal length

(S’1, S’2 are substrings of S1, S2 with possibly additional gaps)

Example:

  • S1=GCGCATGGATTGAGCGA
  • S2=TGCGCCATTGATGACC
  • A possible alignment:

S’1= ATTGA-G
S’2= ATTGATG

Goal: Find the pair of substrings of the two input sequences that has the highest similarity.

Sequence Alignment (Reminder)

-GCGC-ATGGATTGAGCGA
TGCGCCATTGAT-GACC-A

Three elements:

  • Perfect matches
  • Mismatches
  • Insertions & deletions (indels)

Scoring:

  • Score each position independently
  • The score of an alignment is the sum of its position scores
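
As a minimal sketch of position-wise scoring, the snippet below sums independent column scores for a pair of gapped rows. The function name `alignment_score` and the values match = 1, mismatch = -1, indel = -2 are illustrative assumptions, not taken from the slides.

```python
def alignment_score(row1, row2, match=1, mismatch=-1, indel=-2):
    """Sum independent per-column scores of two gapped rows of equal length.

    The default score values are hypothetical; substitute any scheme.
    """
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have equal length")
    total = 0
    for a, b in zip(row1, row2):
        if a == "-" or b == "-":
            total += indel      # insertion or deletion
        elif a == b:
            total += match      # perfect match
        else:
            total += mismatch   # mismatch
    return total


print(alignment_score("-GCGC-ATGGATTGAGCGA", "TGCGCCATTGAT-GACC-A"))
```

For the alignment shown on this slide the loop counts 13 matches, 2 mismatches and 4 indels, so under the assumed values the total is 13 - 2 - 8 = 3.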
Breaking Number
  • Input: Two sequences M, E over the same alphabet (|M| ≥ |E|)
  • Output: The smallest k such that there exist partitions

M = M1M2…Mk, E = E1E2…Ek

in which Ei is a substring of Mi for all i = 1..k.

If no such k exists, return ∞.

Example:

M=AAAATTTAAATTTA

E=AATTATA

M1=AAAATTT M2=AAATTT M3=A

E1=AATT E2=AT E3=A

AAAATTTAAATTTA
--AATT---AT--A

Exercise: find an O(|M||E|) algorithm that computes the breaking number of M, E.

Breaking Number (cont)
  • Solution: Reduce the problem to global alignment with modifications:
    • Do not allow mismatches
    • Do not allow gaps in M
    • No penalty for gaps at the start/end of the sequence
    • Constant penalty for each gap (regardless of its length)
  • Scoring scheme (an affine gap penalty):
    • Match: 0
    • Mismatch: −∞
    • Gap introduction: −1
    • Gap elongation: 0

AAAATTTAAATTTA
--AATT---AT--A

Breaking number = −(score of the alignment) + 1.

In the example the alignment has two internal gaps, so its score is −2 and the breaking number is 3.
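
The reduction above can be sketched as a direct O(|M||E|) dynamic program. This version counts blocks instead of scoring gaps, which is an equivalent formulation chosen here for brevity. The function name `breaking_number` and the two table names are assumptions made for illustration.

```python
import math


def breaking_number(M, E):
    """Smallest k such that E splits into k blocks matched, in order,
    to disjoint contiguous runs of M; math.inf if impossible.

    Assumes E is non-empty (an empty E yields 0).
    """
    n, m = len(M), len(E)
    INF = math.inf
    # end_here[i][j]: fewest blocks for E[:j] inside M[:i], with E[j-1]
    #                 matched exactly to M[i-1].
    # best[i][j]:     fewest blocks for E[:j] inside M[:i].
    end_here = [[INF] * (m + 1) for _ in range(n + 1)]
    best = [[INF] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        best[i][0] = 0                       # empty prefix of E needs no block
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if M[i - 1] == E[j - 1]:         # mismatches are never allowed
                end_here[i][j] = min(
                    end_here[i - 1][j - 1],  # extend the current block
                    best[i - 1][j - 1] + 1,  # open a new block
                )
            # either skip M[i-1] (a gap in E) or end a block at M[i-1]
            best[i][j] = min(best[i - 1][j], end_here[i][j])
    return best[n][m]


print(breaking_number("AAAATTTAAATTTA", "AATTATA"))
```

Opening a new block plays the role of the gap-introduction penalty, and extending a block or skipping further letters of M costs nothing, mirroring the zero elongation penalty.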

Breaking Number (cont)
  • Complexity: Standard O(|M||E|) dynamic programming
  • Correctness: Two-way argument
    • (1) An alignment of score −(k−1) corresponds to a partition of M, E into k blocks
    • (2) A partition of M, E into k blocks has an alignment of score −(k−1)
  • Optimal alignment has score −∞ ⇒ there is no valid partition (by 2)
  • Optimal alignment has score −k ⇒
      • There is a valid partition into k+1 blocks (by 1)
      • There is no valid partition into fewer blocks (by 2)
Multiple Sequence Alignment

S1=AGGTC

S2=GTTCG

S3=TGAAC

[Figure: two possible alignments of S1, S2, S3]

Multiple Sequence Alignment (cont)
  • Input: Sequences S1, S2,…, Sk over the same alphabet
  • Output: Gapped sequences S’1, S’2,…, S’k of equal length
    • |S’1|= |S’2|=…= |S’k|
    • Removing the spaces from S’i yields Si

Sum-of-pairs (SP) score for a multiple global alignment is the sum of scores of all pairwise alignments induced by it.

Multiple Sequence Alignment Example

Consider the following alignment:

AC-CDB-

-C-ADBD

A-BCDAD

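Scoring scheme: match: 0, mismatch/indel: −1 (a column pairing two spaces scores 0)

SP score, summed over the three induced pairwise alignments:

  • rows 1 and 2: −3
  • rows 1 and 3: −4
  • rows 2 and 3: −5

Total: −12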
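
The SP computation for this example can be checked with a short sketch. The function name `sp_score` is an assumption. The default scores follow the slide's scheme, with a column of two spaces scored 0.

```python
from itertools import combinations


def sp_score(rows, match=0, mismatch=-1, indel=-1):
    """Sum-of-pairs score: add up the scores of all induced pairwise alignments."""

    def column(a, b):
        if a == "-" and b == "-":
            return 0                 # two spaces: no cost
        if a == "-" or b == "-":
            return indel
        return match if a == b else mismatch

    return sum(
        column(a, b)
        for r1, r2 in combinations(rows, 2)   # every unordered pair of rows
        for a, b in zip(r1, r2)               # every column of that pair
    )


print(sp_score(["AC-CDB-", "-C-ADBD", "A-BCDAD"]))  # -12
```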

Multiple Sequence Alignment Complexity

Given k strings of length n, there is a generalization of the DP algorithm that finds an optimal SP alignment:

  • Instead of a 2-dimensional table we have a k-dimensional table
  • Each dimension is of length n+1
  • Each entry depends on 2^k − 1 adjacent entries

Complexity: O(2^k n^k)

This problem is known to be NP-hard (no polynomial-time algorithm is known)
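
To see where the 2^k − 1 dependencies come from, the sketch below lists the index offsets of the neighbouring entries for one cell of a k-dimensional table. Each coordinate either steps back by one (that sequence contributes a letter to the column) or stays put (it contributes a space), and the all-zero offset is excluded. The helper name `predecessor_offsets` is hypothetical.

```python
from itertools import product


def predecessor_offsets(k):
    """All non-zero 0/1 vectors of length k: the cells one step 'behind'."""
    return [d for d in product((0, 1), repeat=k) if any(d)]


for k in (2, 3, 4):
    print(k, len(predecessor_offsets(k)))   # 2**k - 1 offsets each
```

For k = 2 this gives the three familiar neighbours of pairwise alignment: diagonal, up and left.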

Multiple Sequence Alignment Approximation Algorithm

We use cost instead of score ⇒ find an alignment of minimal cost

Assumption: the cost function δ is a distance function

  • δ(x,x) = 0
  • δ(x,y) = δ(y,x) ≥ 0
  • δ(x,y) + δ(y,z) ≥ δ(x,z) (triangle inequality)

(e.g. the cost of a mismatch ≤ the cost of two indels)

D(S,T): the cost of a minimum-cost global alignment between S and T
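
A minimal sketch of computing D(S,T) with the standard two-row dynamic program follows. The unit costs (mismatch = 1, indel = 1) are an assumption chosen because they satisfy the three distance properties above. The function name `global_alignment_cost` is likewise hypothetical.

```python
def global_alignment_cost(s, t, mismatch=1, indel=1):
    """D(S,T) under an assumed unit-cost delta (a match costs 0)."""
    prev = [j * indel for j in range(len(t) + 1)]   # row for the empty prefix of s
    for i, a in enumerate(s, start=1):
        curr = [i * indel]                          # align s[:i] against nothing
        for j, b in enumerate(t, start=1):
            sub = 0 if a == b else mismatch
            curr.append(min(prev[j - 1] + sub,      # match or mismatch
                            prev[j] + indel,        # space in t
                            curr[j - 1] + indel))   # space in s
        prev = curr
    return prev[-1]


strings = ["AGGTC", "GTTCG", "TGAAC"]
for s in strings:
    print(s, [global_alignment_cost(s, t) for t in strings])
```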

Multiple Sequence Alignment Approximation Algorithm

The ‘star’ algorithm:

Input: Γ, a set of k strings S1,…,Sk.

  • Find the string S’ ∈ Γ (the center) that minimizes Σ D(S’,S), summed over all S ∈ Γ
  • Denote S1 = S’ and the rest of the strings as S2,…,Sk
  • Iteratively add S2,…,Sk to the alignment as follows:
    • Suppose S1,…,Si-1 are already aligned as S’1,…,S’i-1
    • Align Si to S’1 to produce S’i and S’’1 aligned
    • Adjust S’2,…,S’i-1 by adding spaces where spaces were added to S’’1
    • Replace S’1 by S’’1
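
The steps above can be sketched as follows. This is an illustrative implementation under assumed unit costs (mismatch = 1, indel = 1, and δ(-,-) = 0), not the authors' code. The names `delta`, `align` and `star_align` are hypothetical.

```python
GAP = "-"


def delta(a, b, mismatch=1, indel=1):
    """Assumed unit-cost distance; note delta(GAP, GAP) == 0."""
    if a == b:
        return 0
    return indel if GAP in (a, b) else mismatch


def align(s, t):
    """Minimum-cost global alignment; returns (cost, columns).

    Each column is a pair (i, j) of indices into s and t, with None
    marking a space inserted by this alignment.
    """
    n, m = len(s), len(t)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = D[i - 1][0] + delta(s[i - 1], GAP)
    for j in range(1, m + 1):
        D[0][j] = D[0][j - 1] + delta(GAP, t[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = min(D[i - 1][j - 1] + delta(s[i - 1], t[j - 1]),
                          D[i - 1][j] + delta(s[i - 1], GAP),
                          D[i][j - 1] + delta(GAP, t[j - 1]))
    cols, i, j = [], n, m
    while i > 0 or j > 0:                      # trace back one optimal path
        if i > 0 and j > 0 and D[i][j] == D[i - 1][j - 1] + delta(s[i - 1], t[j - 1]):
            cols.append((i - 1, j - 1)); i -= 1; j -= 1
        elif i > 0 and D[i][j] == D[i - 1][j] + delta(s[i - 1], GAP):
            cols.append((i - 1, None)); i -= 1
        else:
            cols.append((None, j - 1)); j -= 1
    return D[n][m], cols[::-1]


def star_align(strings):
    """Return (center_index, rows), with rows listed in input order."""
    k = len(strings)
    # The center minimizes the sum of pairwise costs to all other strings.
    center = min(range(k),
                 key=lambda c: sum(align(strings[c], s)[0] for s in strings))
    order = [center] + [i for i in range(k) if i != center]
    rows = [strings[center]]                   # rows[0] plays the role of S'1
    for idx in order[1:]:
        _, cols = align(rows[0], strings[idx])
        # Earlier rows get a space wherever the center row just received one.
        rows = ["".join(GAP if i is None else row[i] for i, _ in cols)
                for row in rows]
        rows.append("".join(GAP if j is None else strings[idx][j]
                            for _, j in cols))
    result = [None] * k
    for pos, idx in enumerate(order):
        result[idx] = rows[pos]
    return center, result


center, rows = star_align(["AGGTC", "GTTCG", "TGAAC"])
print(center, *rows, sep="\n")
```

Returning explicit column index pairs from `align` makes the adjustment step unambiguous: a `None` in the first position marks exactly the columns where a space was added to the center row.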

Multiple Sequence Alignment Approximation Algorithm

Time analysis:

  • Choosing S1: execute DP for all sequence pairs, O(k^2 n^2)
  • Adding Si to the alignment: execute DP for Si, S’1, O(i·n^2)

(In the i-th stage the length of S’1 can be up to i·n)

Total complexity: O(k^2 n^2), since the k−1 stages together cost O(n^2 · (2 + 3 + … + k)) = O(k^2 n^2)

Multiple Sequence Alignment Approximation Algorithm

Approximation ratio:

  • M*: an optimal alignment
  • M: the alignment produced by this algorithm
  • d(i,j): the distance M induces on the pair Si, Sj

For all i: d(1,i) = D(S1,Si)

(we perform an optimal alignment between S’1 and Si, and δ(-,-) = 0)

Multiple Sequence Alignment Approximation Algorithm

Triangle inequality

Approximation ratio:

Definition of S1:
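
The labels above (triangle inequality, definition of S1) match the standard argument that the star alignment costs at most 2 − 2/k times the optimum. A sketch of that argument follows. The notation c(·) for the SP cost of an alignment and d*(i,j) for the distance M* induces on Si, Sj is an assumption introduced here, not taken from the slides.

```latex
\begin{align*}
2\,c(M) &= \sum_{i=1}^{k}\sum_{j\neq i} d(i,j)
  \;\le\; \sum_{i=1}^{k}\sum_{j\neq i}\bigl[d(i,1)+d(1,j)\bigr]
  && \text{(triangle inequality)}\\
 &= 2(k-1)\sum_{j=2}^{k} d(1,j)
  \;=\; 2(k-1)\sum_{j=2}^{k} D(S_1,S_j)
  && \text{(since } d(1,j)=D(S_1,S_j)\text{)}\\[6pt]
2\,c(M^{*}) &= \sum_{i=1}^{k}\sum_{j\neq i} d^{*}(i,j)
  \;\ge\; \sum_{i=1}^{k}\sum_{j\neq i} D(S_i,S_j)
  && \text{(}D\text{ is the minimum pairwise cost)}\\
 &\ge\; k\sum_{j=2}^{k} D(S_1,S_j)
  && \text{(definition of } S_1\text{)}\\[6pt]
\frac{c(M)}{c(M^{*})} &\le \frac{2(k-1)}{k} \;=\; 2-\frac{2}{k} \;<\; 2
\end{align*}
```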