PowerPoint Slideshow about 'Multiple Sequence Alignment' - badrani


Multiple Sequence Alignment

S1 = AGGTC

S2 = GTTCG

S3 = TGAAC

[Figure: two possible alignments of S1, S2 and S3]

Multiple Sequence Alignment (cont.)

Input: Sequences S1, S2, …, Sk over the same alphabet

Output: Gapped sequences S’1, S’2, …, S’k of equal length

  • |S’1| = |S’2| = … = |S’k|
  • Removing the spaces from S’i yields Si

The sum-of-pairs (SP) score of a multiple global alignment is the sum of the scores of all pairwise alignments it induces.

Multiple Sequence Alignment Example

Consider the following alignment:

AC-CDB-

-C-ADBD

A-BCDAD

Scoring scheme: match = 0, mismatch/indel = -1 (a column pairing two gaps scores 0)

SP score, one term per pair of rows: (-4) + (-3) + (-5) = -12
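The SP computation in this example can be checked mechanically. The following is a minimal Python sketch, not part of the original slides: the names `pair_score` and `sp_score` are illustrative, and the sketch assumes that a column pairing two gaps contributes 0.

```python
from itertools import combinations

def pair_score(a, b, match=0, penalty=-1):
    """Score the pairwise alignment that two gapped rows induce."""
    total = 0
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue  # assumption: gap against gap contributes nothing
        total += match if x == y else penalty
    return total

def sp_score(rows):
    """Sum-of-pairs score: add up the induced score of every pair of rows."""
    if len({len(r) for r in rows}) > 1:
        raise ValueError("all rows of an alignment must have equal length")
    return sum(pair_score(a, b) for a, b in combinations(rows, 2))

rows = ["AC-CDB-", "-C-ADBD", "A-BCDAD"]
print(sp_score(rows))  # -12
```

The three pairwise terms are -3 (rows 1 and 2), -4 (rows 1 and 3) and -5 (rows 2 and 3).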

Multiple Sequence Alignment Complexity
  • Given k strings of length n, there is a generalization of the DP algorithm that finds an optimal SP alignment:
    • Instead of a 2-dimensional table we have a k-dimensional table
    • Each dimension is of length n+1
    • Each entry depends on 2^k - 1 adjacent entries

Complexity: O(2^k n^k)

This problem is known to be NP-hard (no polynomial-time algorithm exists unless P = NP)
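To see where the 2^k - 1 factor comes from, the sketch below enumerates the cells that one table entry depends on: every way of stepping back by 0 or 1 in each of the k dimensions, except stepping back in none. The function name `predecessors` is hypothetical and the code only illustrates the counting; it is not the full k-dimensional DP.

```python
from itertools import product

def predecessors(cell):
    """List the table cells that `cell` depends on in the k-dimensional DP."""
    result = []
    for step in product((0, 1), repeat=len(cell)):
        if not any(step):
            continue  # the all-zero step would be the cell itself
        prev = tuple(c - s for c, s in zip(cell, step))
        if min(prev) >= 0:  # stay inside the table
            result.append(prev)
    return result

print(len(predecessors((2, 2, 2))))  # 7, i.e. 2**3 - 1 for an interior cell
```

Cells on the boundary of the table have fewer predecessors, which is why the bound is stated per interior entry.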

Multiple Sequence Alignment Approximation Algorithm
  • We use cost instead of score
  • Find an alignment of minimal cost
  • Assumption: the cost function δ is a distance function
        • δ(x,x) = 0
        • δ(x,y) = δ(y,x) ≥ 0
        • δ(x,y) + δ(y,z) ≥ δ(x,z) (triangle inequality)
          • (e.g. cost of a mismatch ≤ cost of two indels)

D(S,T) - cost of a minimum-cost global alignment between S and T
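As an illustration of D(S,T), here is a minimal sketch of the standard pairwise DP, keeping only two rows of the table. The unit costs (mismatch = 1, indel = 1) are an assumption chosen so that δ is a distance function; the slides do not fix particular values, and the name `alignment_cost` is illustrative.

```python
def alignment_cost(s, t, mismatch=1, indel=1):
    """Cost D(s, t) of a minimum-cost global alignment (assumed unit costs)."""
    prev = [j * indel for j in range(len(t) + 1)]  # row for the empty prefix of s
    for i in range(1, len(s) + 1):
        cur = [i * indel]
        for j in range(1, len(t) + 1):
            diag = prev[j - 1] + (0 if s[i - 1] == t[j - 1] else mismatch)
            cur.append(min(diag, prev[j] + indel, cur[j - 1] + indel))
        prev = cur
    return prev[-1]

print(alignment_cost("kitten", "sitting"))  # 3
```

With these unit costs D coincides with the edit distance, so it is symmetric and satisfies the triangle inequality.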

Multiple Sequence Alignment Approximation Algorithm
  • The ‘star’ algorithm:
  • Input: Γ - set of k strings S1,…,Sk.
    • Find the string S’ (center) that minimizes Σ_{S∈Γ} D(S’,S)
    • Denote S1 = S’ and the rest of the strings as S2,…,Sk
    • Iteratively add S2,…,Sk to the alignment as follows:
      • Suppose S1,…,Si-1 are already aligned as S’1,…,S’i-1
      • Align Si to S’1 to produce S’i and S’’1 aligned
      • Adjust S’2,…,S’i-1 by adding spaces where spaces were added to S’’1
      • Replace S’1 by S’’1
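The steps of the star algorithm can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the slides' own code: costs are unit costs (mismatch = 1, indel = 1), a gap already present in the center costs nothing when placed opposite a new gap, and the names `delta`, `align_pair`, `expand` and `star_alignment` are hypothetical.

```python
GAP = "-"

def delta(x, y, mismatch=1, indel=1):
    """Column cost; identical symbols (including gap against gap) cost 0."""
    if x == y:
        return 0
    return indel if GAP in (x, y) else mismatch

def align_pair(a, b):
    """Optimal global alignment of a (may already contain gaps) with b.

    Returns (cost, columns). A column is (i, j), indices into a and b,
    with None standing for a newly inserted gap.
    """
    n, m = len(a), len(b)
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = cost[i - 1][0] + delta(a[i - 1], GAP)
    for j in range(1, m + 1):
        cost[0][j] = cost[0][j - 1] + delta(GAP, b[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = min(
                cost[i - 1][j - 1] + delta(a[i - 1], b[j - 1]),
                cost[i - 1][j] + delta(a[i - 1], GAP),
                cost[i][j - 1] + delta(GAP, b[j - 1]),
            )
    columns, i, j = [], n, m
    while i > 0 or j > 0:  # trace one optimal path back to the origin
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + delta(a[i - 1], b[j - 1]):
            i, j = i - 1, j - 1
            columns.append((i, j))
        elif i > 0 and cost[i][j] == cost[i - 1][j] + delta(a[i - 1], GAP):
            i -= 1
            columns.append((i, None))
        else:
            j -= 1
            columns.append((None, j))
    return cost[n][m], columns[::-1]

def expand(row, indices):
    """Rebuild a row over new columns, writing a gap where the index is None."""
    return "".join(GAP if i is None else row[i] for i in indices)

def star_alignment(seqs):
    # Center: the string minimising the sum of pairwise alignment costs.
    totals = [sum(align_pair(s, t)[0] for t in seqs) for s in seqs]
    c = totals.index(min(totals))
    rows = {c: seqs[c]}  # rows[c] plays the role of the running S'1
    for i, s in enumerate(seqs):
        if i == c:
            continue
        _, columns = align_pair(rows[c], s)
        a_idx = [ai for ai, _ in columns]
        b_idx = [bj for _, bj in columns]
        # New gaps in the center are copied into every row aligned so far.
        rows = {r: expand(row, a_idx) for r, row in rows.items()}
        rows[i] = expand(s, b_idx)
    return [rows[i] for i in range(len(seqs))]

for row in star_alignment(["AGGTC", "GTTCG", "TGAAC"]):
    print(row)
```

Returning column indices from `align_pair`, rather than gapped strings, makes it unambiguous which center columns are new, so the earlier rows can be adjusted with a single pass.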

Multiple Sequence Alignment Approximation Algorithm
  • Time analysis:
  • Choosing S1 - execute DP for all sequence pairs - O(k^2 n^2)
  • Adding Si to the alignment - execute DP for Si and S’1 - O(i·n^2)
    • (In the i-th stage the length of S’1 can be up to i·n)
  • Total complexity: O(k^2 n^2), since the additions sum to Σ_i O(i·n^2) = O(k^2 n^2)

Multiple Sequence Alignment Approximation Algorithm
  • Approximation ratio:
  • M* - an optimal alignment
  • M - the alignment produced by this algorithm
  • d(i,j) - the distance M induces on the pair Si, Sj

For all i: d(1,i) = D(S1,Si)

(we perform an optimal alignment between S’1 and Si, and δ(-,-) = 0)

Multiple Sequence Alignment Approximation Algorithm

Approximation ratio: the cost of M is bounded from above using the triangle inequality, and the cost of M* is bounded from below using the definition of S1.
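The formulas of this slide did not survive in the transcript. The standard argument for the center-star bound runs as sketched below; the symbol d*(i,j), the distance that M* induces on Si, Sj, is introduced here for illustration and does not appear in the original.

```latex
\begin{align*}
\sum_{i<j} d(i,j)
  &\le \sum_{i<j} \bigl[ d(1,i) + d(1,j) \bigr]
  && \text{(triangle inequality)} \\
  &= (k-1) \sum_{i=2}^{k} D(S_1,S_i)
  && \text{(since } d(1,i) = D(S_1,S_i)\text{)} \\[4pt]
\sum_{i<j} d^{*}(i,j)
  &\ge \sum_{i<j} D(S_i,S_j)
   = \frac{1}{2} \sum_{i=1}^{k} \sum_{j=1}^{k} D(S_i,S_j) \\
  &\ge \frac{k}{2} \sum_{j=2}^{k} D(S_1,S_j)
  && \text{(definition of } S_1\text{)} \\[4pt]
\frac{\sum_{i<j} d(i,j)}{\sum_{i<j} d^{*}(i,j)}
  &\le \frac{2(k-1)}{k} < 2
\end{align*}
```

The first bound uses that each index i ≥ 2 occurs in exactly k-1 pairs; the second uses that S1 minimizes the sum of distances to all other strings.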


Multiple Sequence Alignment Reminder

Input: Sequences S1, S2, …, Sk over the same alphabet

Output: Gapped sequences S’1, S’2, …, S’k of equal length

  • |S’1| = |S’2| = … = |S’k|
  • Removing the spaces from S’i yields Si

The sum-of-pairs (SP) score of a multiple global alignment is the sum of the scores of all pairwise alignments it induces.

Multiple Sequence Alignment Reminder
  • The ‘star’ algorithm:
  • Input: Γ - set of k strings S1,…,Sk.
    • Find the string S1 (center) that minimizes Σ_{S∈Γ} D(S1,S)
    • Iteratively add S2,…,Sk to the alignment
  • Finds an MA costing at most twice the optimal cost!

Problem: conventional MA does not correctly model evolutionary relationships

Tree Alignment
  • Input: X - set of sequences
  • T - phylogenetic tree on X (leaves labeled by X)
  • Output: labels on the internal vertices of T, such that the sum of the costs of all edges of T is minimal (the cost of an edge is the alignment cost between the labels at its two ends).
  • How do we label internal vertices?
    • Sequences
    • Profiles (multiple alignments)

Profile Alignment

AGGT-C-

-G-TTCG

TG-AAC-

A profile of an MA of length n over alphabet Σ is a (|Σ|+1) × n table.

Column i holds the distribution of Σ (and gap) in that position
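The definition of a profile can be made concrete with a short sketch. It builds the (|Σ|+1) × n table of relative frequencies from an alignment of the three example sequences AGGTC, GTTCG and TGAAC; the function name `profile` and the dictionary layout (one list per symbol) are illustrative choices, not taken from the slides.

```python
from collections import Counter

def profile(rows, alphabet="ACGT", gap="-"):
    """Return {symbol: [frequency in column 0, column 1, ...]}."""
    symbols = alphabet + gap
    table = {s: [] for s in symbols}
    for column in zip(*rows):  # walk the alignment column by column
        counts = Counter(column)
        for s in symbols:
            table[s].append(counts[s] / len(rows))
    return table

rows = ["AGGT-C-", "-G-TTCG", "TG-AAC-"]
table = profile(rows)
print(table["G"])  # frequency of G in each of the 7 columns
```

Each column of the table sums to 1, since every row contributes exactly one symbol (letter or gap) to every position.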

Profile Alignment
  • Aligning a sequence to a profile:
  • Matching a letter to a position: weighted average of scores
  • Indels: introducing new columns gets special consideration
  • (the same goes for aligning two profiles)
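A sketch of the "weighted average" rule for matching a letter to a profile position: the cost of the letter against each symbol in the column is weighted by that symbol's frequency. Unit costs (mismatch = 1, indel = 1) and the name `column_cost` are assumptions for illustration; the special handling of indels mentioned above is not covered here.

```python
def column_cost(letter, column, mismatch=1.0, indel=1.0, gap="-"):
    """Expected cost of placing `letter` against a profile column.

    `column` maps each symbol (letters and gap) to its frequency.
    """
    total = 0.0
    for symbol, freq in column.items():
        if symbol == letter:
            cost = 0.0
        elif symbol == gap:
            cost = indel
        else:
            cost = mismatch
        total += freq * cost  # weight each pairwise cost by its frequency
    return total

column = {"A": 1 / 3, "T": 1 / 3, "-": 1 / 3}
print(column_cost("A", column))  # one third matches, two thirds cost 1 each
```

A column holding a single letter with frequency 1 reduces to the ordinary letter-to-letter cost.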

Clustal Algorithm
  • Iteratively constructs an MA for the intermediate nodes
  • At each point holds profiles for all leaves
  • Chooses the closest pair of neighbors
    • neighbors - have a common father in T
    • distance - cost of optimal (pairwise) alignment
  • Aligns the two profiles to get the ‘father-profile’
  • Replaces the two leaves with their father
  • Analysis:
  • Initialization - O(k^2) alignments
  • k-1 iterations
  • Iteration i involves k-i-1 new pairwise alignments

ClustalW - a more advanced version, in which sequences/profiles are weighted.

Lifted Tree Alignments

Lifted tree alignment - each internal node is labeled by one of the labels of its daughters

Internal nodes are sequences and not profiles

Example:

[Figure: example tree with node labels drawn from S1,…,S6; each internal node carries the label of one of its daughters]

We’ll show:

DP algorithm for optimal lifted tree alignment

Optimal lifted alignment is a 2-approximation of optimal tree alignment

Lifted Tree Alignments Algorithm

Input: X - set of sequences

T - phylogenetic tree on X (leaves labeled by X)

Output: lifted labels on the internal vertices of T, such that the sum of the costs of all edges of T is minimal.

Basic principle: calculate for every node v in T, and sequence S in X:

d(v,S) - the optimal cost of v’s subtree when v is labeled by S

The cost of the optimal tree is min_S d(r,S), where r is the root of T

Lifted Tree Alignments Algorithm

d(v,S) - the optimal cost of v’s subtree when v is labeled by S

Initialization: for leaf v labeled Sv - d(v,Sv) = 0

Recurrence: for internal node v with daughters u1,…,ul - d(v,S) sums, over the daughters uj, the cheapest way to label uj: the edge cost D(S,T) plus d(uj,T), minimized over the labels T available to uj

Correctness: check for suboptimal solution property

Complexity:

O(k^2) pairwise alignments - O(n^2 k^2)

k-1 iterations

For internal node v - O(k_v^2) work, which sums to O(k^2 depth(T)) = O(k^3)

Total: O(k^2 (n^2 + depth(T)))
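The DP over d(v,S) can be sketched as follows. Everything concrete here is an assumption for illustration: the tree is encoded as nested tuples with sequence strings at the leaves, costs are unit costs, and the names `alignment_cost`, `lifted_table` and `optimal_lifted_cost` are hypothetical. The sketch enforces the lifting constraint directly: when v carries label S, the daughter whose subtree contains S must carry S as well.

```python
from functools import lru_cache

@lru_cache(maxsize=None)  # each pairwise cost D(S, T) is computed only once
def alignment_cost(s, t, mismatch=1, indel=1):
    """Cost of a minimum-cost global alignment (assumed unit costs)."""
    prev = [j * indel for j in range(len(t) + 1)]
    for i in range(1, len(s) + 1):
        cur = [i * indel]
        for j in range(1, len(t) + 1):
            diag = prev[j - 1] + (0 if s[i - 1] == t[j - 1] else mismatch)
            cur.append(min(diag, prev[j] + indel, cur[j - 1] + indel))
        prev = cur
    return prev[-1]

def lifted_table(node):
    """Map each admissible label S of `node` to d(node, S)."""
    if isinstance(node, str):
        return {node: 0}  # a leaf can only carry its own sequence
    tables = [lifted_table(child) for child in node]
    result = {}
    for own in tables:  # the daughter whose label is lifted to this node
        for s, base in own.items():
            total = base  # that daughter keeps label s, so its edge costs 0
            for other in tables:
                if other is not own:
                    # cheapest label t for this daughter: edge cost + subtree cost
                    total += min(alignment_cost(s, t) + d for t, d in other.items())
            result[s] = min(total, result.get(s, float("inf")))
    return result

def optimal_lifted_cost(tree):
    """Return (cost, root label) of an optimal lifted tree alignment."""
    table = lifted_table(tree)
    label = min(table, key=table.get)
    return table[label], label

tree = (("AGGTC", "GTTCG"), "TGAAC")
print(optimal_lifted_cost(tree))
```

Only labels that occur at leaves below a node are stored in its table, which is what keeps the work at node v quadratic in the number of leaves under it.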

Lifted Tree Alignments Approximation analysis
  • Claim: optimal LTA 2-approximates general tree alignments
  • We’ll show a construction of an LTA which costs at most twice the optimal TA with sequence-labeled nodes
  • (Can this be generalized to profile-labeled nodes?)
  • Notations:
  • T* - optimal TA labels
  • Sv* - label of node v in T*
  • TL - our constructed LTA
  • SvL - label of node v in TL

Lifted Tree Alignments Approximation analysis
  • Construction:
  • We label the nodes bottom-up.
  • For node v with daughters u1,…,ul -
    • we choose the label (from Su1L,…,SulL) closest to Sv*
  • We need to show: D(TL) ≤ 2D(T*)

Lifted Tree Alignments Approximation analysis
  • Analysis:
  • Some edges in TL have cost 0
  • Observe edges (v,u) of cost > 0:
    • Si - label of the father (v)
    • Sj - label of the daughter (u)
    • P(v,u) - the path in T* from v to the leaf labeled by Sj
    • D(Si,Sj) ≤ D(Si,Sv*) + D(Sj,Sv*) ≤ 2D(Sj,Sv*) ≤ 2D(P(v,u))
      • first step: triangle inequality
      • second step: choice of i (Si is the daughter label closest to Sv*)
      • third step: triangle inequality along the path P(v,u)

Lifted Tree Alignments Approximation analysis
  • D(Si,Sj) ≤ 2D(P(v,u))

If (v,u) and (v’,u’) are two different edges with cost > 0 in TL, then P(v,u) and P(v’,u’) are edge-disjoint

Summing over all edges of TL therefore gives D(TL) ≤ 2D(T*). Q.E.D.

  • Final Remarks:
  • The lifted tree alignment TL is only conceptual (we don’t have T*)
  • The optimal LTA cannot cost more than TL
  • In case of profile-labeled nodes:
      • the construction and analysis still hold when the cost is a distance function