1 / 22

Multiple Sequence Alignment

Multiple Sequence Alignment. Multiple Alignment versus Pairwise Alignment. Up until now we have only tried to align two sequences. What about more than two? And what for? A faint similarity between two sequences becomes significant if present in many

taini
Download Presentation

Multiple Sequence Alignment

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Multiple Sequence Alignment

  2. Multiple Alignment versus Pairwise Alignment • Up until now we have only tried to align two sequences. • What about more than two? And what for? • A faint similarity between two sequences becomes significant if present in many • Multiple alignments can reveal subtle similarities that pairwise alignments do not reveal

  3. Generalizing the Notion of Pairwise Alignment • Alignment of 2 sequences is represented as a 2-row matrix • In a similar way, we represent alignment of 3 sequences as a 3-row matrix A T - G C G - A _ C G T - A A T C A C - A • Score: more conserved columns, better alignment

  4. Aligning Three Sequences source • Same strategy as aligning two sequences • Use a 3-D “Manhattan Cube”, with each axis representing a sequence to align • For global alignments, go from source to sink sink

  5. 2-D cell versus 2-D Alignment Cell In 2-D, 3 edges in each unit square In 3-D, 7 edges in each unit cube

  6. Architecture of 3-D Alignment Cell (i-1,j,k-1) (i-1,j-1,k-1) (i-1,j,k) (i-1,j-1,k) (i,j,k-1) (i,j-1,k-1) (i,j,k) (i,j-1,k)

  7. si-1,j-1,k-1 + (vi, wj, uk) si-1,j-1,k + (vi, wj, _ ) si-1,j,k-1 + (vi, _, uk) si,j-1,k-1 + (_, wj, uk) si-1,j,k + (vi, _ , _) si,j-1,k + (_, wj, _) si,j,k-1 + (_, _, uk) Multiple Alignment: Dynamic Programming cube diagonal: no indels • si,j,k = max • (x, y, z) is an entry in the 3-D scoring matrix face diagonal: one indel edge diagonal: two indels

  8. Multiple Alignment: Running Time • For 3 sequences of length n, the run time is 7n3; O(n3) • For ksequences, build a k-dimensional Manhattan, with run time (2k-1)(nk); O(2knk) • Conclusion: dynamic programming approach for alignment between two sequences is easily extended to k sequences but it is impractical due to exponential running time

  9. Practically speaking • Multiple alignment is a hard problem • Yet, it is of extreme practical importance • Comparing several species is a very common task • Doing this for the entire genome is an increasingly common demand! • Several heuristic-based algorithms have been developed that employ greedy, divide-and-conquer, dynamic programming approaches in various combinations • The algorithms we will see today are actually used by current multiple aligners

  10. Multiple Alignment Induces Pairwise Alignments Every multiple alignment induces pairwise alignments x: AC-GCGG-C y: AC-GC-GAG z: GCCGC-GAG Induces: x: ACGCGG-C; x: AC-GCGG-C; y: AC-GCGAG y: ACGC-GAC; z: GCCGC-GAG; z: GCCGCGAG

  11. Reverse Problem: Constructing Multiple Alignment from Pairwise Alignments Given 3 arbitrary pairwise alignments: x: ACGCTGG-C; x: AC-GCTGG-C; y: AC-GC-GAG y: ACGC--GAC; z: GCCGCA-GAG; z: GCCGCAGAG can we construct a multiple alignment that induces them?

  12. Reverse Problem: Constructing Multiple Alignment from Pairwise Alignments Given 3 arbitrary pairwise alignments: x: ACGCTGG-C; x: AC-GCTGG-C; y: AC-GC-GAG y: ACGC--GAC; z: GCCGCA-GAG; z: GCCGCAGAG can we construct a multiple alignment that induces them? NOT ALWAYS Pairwise alignments may be inconsistent

  13. Multiple Alignment: Greedy Approach • Choose most similar pair of strings and combine into a profile , thereby reducing alignment of k sequences to an alignment of of k-1 sequences/profiles. Repeat • This is a heuristic greedy method u1= ACg/tTACg/tTACg/cT… u2 = TTAATTAATTAA… … uk = CCGGCCGGCCGG… u1= ACGTACGTACGT… u2 = TTAATTAATTAA… u3 = ACTACTACTACT… … uk = CCGGCCGGCCGG k-1 k

  14. Progressive Alignment • Progressive alignment is a variation of greedy algorithm with a somewhat more intelligent strategy for choosing the order of alignments.

  15. Clustal • Popular multiple alignment tool today • Uses “progressive alignment” • Three-step process 1.) Construct pairwise alignments 2.) Build Guide Tree 3.) Progressive Alignment guided by the tree

  16. v1 v2 v3 v4 v1 - v2 .17 - v3 .87 .28 - v4 .59 .33 .62 - Step 1: Pairwise Alignment • Aligns each sequence again each other giving a similarity matrix • Similarity = exact matches / sequence length (percent identity)

  17. Step 2: Guide Tree • Create Guide Tree using the similarity matrix • ClustalW uses the “neighbor-joining method” • Guide tree roughly reflects evolutionary relations

  18. v1 v2 v3 v4 v1 - v2 .17 - v3 .87 .28 - v4 .59 .33 .62 - Step 2: Guide Tree (cont’d) v1 v3 v4 v2 Calculate:v1,3 = alignment (v1, v3)v1,3,4 = alignment((v1,3),v4)v1,2,3,4 = alignment((v1,3,4),v2)

  19. Step 3: Progressive Alignment • Start by aligning the two most similar sequences • Following the guide tree, add in the next sequences, aligning to the existing alignment • An alignment is stored as a “consensus sequence”, to be aligned with other sequences or alignments later • Consensus sequence: Residue aif 75% of aligned sequences have an aat that position.Otherwise “X”.

  20. Evaluation • Evaluating alignment programs is very difficult • What is a benchmark here ? • We haven’t witnessed the process of evolution, so we cannot say for certain what the true alignment of “extant” sequences should be • One approach: “simulate” evolution

  21. Simulating evolution • Generate a random sequence and introduce realistic evolutionary changes to it, along branches of an assumed phylogeny • Substitutions, insertions, deletions, insertion & deletion rates, duplications, introduction of repeat elements, etc.

  22. Evaluating alignment • Once simulation done, take all the sequences at the leaf nodes of the phylogeny (started with root) • Align these sequences using software • Compare computed alignment and known (“true”) alignment • sensitivity and specificity

More Related