CAP5510 – Bioinformatics Multiple Alignment

Presentation Transcript

  1. CAP5510 – Bioinformatics: Multiple Alignment • Tamer Kahveci, CISE Department, University of Florida

  2. Goals • Understand • What is multiple alignment • Why align multiple sequences • Learn • How multiple alignments are scored • Major multiple alignment methods • Dynamic programming • Standard • MSA • Progressive alignment • Star • CLUSTALW

  3. What is Multiple Alignment? • Alignment of more than two sequences • Global: multiple alignment • http://www-igbmc.u-strasbg.fr/BioInfo/BAliBASE/ • Example (shown in two blocks):
     scxa_buteu  vrdgyiaddk dcayfcgr.. .naycdeeck ...kgaesgk cwyagqygna
     scx1_titse  .kdgypveyd ncayicwnyd .naycdklck ..dkkadsgy cyw...vhil
     scx6_titse  .regypadsk gckitcflta .agycntect ..lkkgssgy caw.....pa
     scx1_cenno  .kdgylvdak gckkncyklg kndycnrecr mkhrggsygy c.....ygfg
     six2_leiqu  ..dgyirkrd gcklsclfg. .negcnkeck ..syggsygy cwt...wgla

     scxa_buteu  cwcyklpdwv pikqkvsgk. cn....
     scx1_titse  cycyglpdse ptktn..gk. cksgkk
     scx6_titse  cycyglpesv kiwtsetnk. c.....
     scx1_cenno  cyceglsdst ptwplp.nkt csgk..
     six2_leiqu  cwceglpd.e ktwksetn.t cg....

  4. What is Local Multiple Alignment? • Local: motif (http://blocks.fhcrc.org/blocks-bin/getblock.sh?PR00624)
     ID   HISTONEH5; BLOCK
     AC   PR00624A; distance from previous block=(9,12)
     DE   Histone H5 signature
     BL   adapted; width=22; seqs=9; 99.5%=986; strength=1407
     H10_HUMAN|P07305  ( 10) AKPKRAKASKKSTDHPKYSDMI   63
     H5A_XENLA|P22844  ( 11) AKPKRSKALKKSTDHPKYSDMI   71
     H10_RAT|P43278    ( 10) AKPKRAKAAKKSTDHPKYSDMI   70
     H10_MOUSE|P10922  ( 10) AKPKRAKASKKSTDHPKYSDMI   63
     Q91759            (  9) AKPRRSKASKKSTDHPKYSDMI   71
     H5B_XENLA|P22845  (  9) AKPRRSKASKKSTDHPKYSDMI   71
     H5_CHICK|P02259   ( 11) AKPKRVKASRRSASHPTYSEMI  100
     H5_CAIMO|P06513   ( 12) AKPKRAKAPRKPASHPSYSEMI   91
     H5_ANSAN|P02258   ( 12) AKPKRARAPRKPASHPTYSEMI  100

  5. Why Multiple Alignment? • Basis for phylogeny • Helps find conserved regions in sets of proteins • Conserved regions • Provide insight into substitution patterns • Give hints about functional sites

  6. How to Evaluate Multiple Alignments

  7. Sum of Pairs (SP) • Sum of the induced pairwise alignment scores over all pairs of sequences • Columns in which both sequences of a pair have a gap are ignored • Example multiple alignment:
     A cwcyklpdwv pikqkvsgk. cn....
     B cycyglpdse ptktn..gk. cksgkk
     C cycyglpesv kiwtsetnk. c.....
     D cyceglsdst ptwplp.nkt csgk..
     • SP score = score(A,B) + score(A,C) + score(A,D) + score(B,C) + score(B,D) + score(C,D), each taken on the induced pairwise alignment, e.g. for A and B:
     A cwcyklpdwv pikqkvsgk cn....
     B cycyglpdse ptktn..gk cksgkk
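
The SP definition above can be sketched in a few lines of Python. The scoring values (match = 1, mismatch = -1, gap = -2) and the function names are illustrative assumptions, not taken from the slides; a real scorer would use a substitution matrix such as BLOSUM62.

```python
from itertools import combinations

GAP_CHARS = "-."  # the slides write gaps as '.', other tools use '-'

def induced_pair_score(a, b, match=1, mismatch=-1, gap=-2):
    """Score the pairwise alignment induced by two rows of a multiple alignment."""
    total = 0
    for x, y in zip(a, b):
        x_gap, y_gap = x in GAP_CHARS, y in GAP_CHARS
        if x_gap and y_gap:
            continue  # gap aligned with gap: ignored
        if x_gap or y_gap:
            total += gap
        else:
            total += match if x == y else mismatch
    return total

def sp_score(rows):
    """Sum-of-pairs score: add the induced score of every pair of rows."""
    if len({len(r) for r in rows}) > 1:
        raise ValueError("all rows of an alignment must have the same length")
    return sum(induced_pair_score(a, b) for a, b in combinations(rows, 2))
```

For example, `sp_score(["M-PE", "M-KE", "MSKE", "S-KE"])` adds up six induced pairwise scores under the assumed scoring scheme.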

  8. BAliBASE Benchmark • Compare to a set of hand-aligned sequences • Check positions of letters • If the letters appear at the same position as in the benchmark => good • Score between 0 (worst) and 1 (best) • http://www-igbmc.u-strasbg.fr/BioInfo/BAliBASE/prog_scores.html
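
One way to turn "letters appear at the same position as the benchmark" into a number between 0 and 1 is to count how many residue pairs aligned in the reference are also aligned in the test alignment. The sketch below takes that approach; it is an assumption about the general idea, not a reimplementation of the BAliBASE scoring program, and the function names are hypothetical.

```python
GAP_CHARS = "-."

def aligned_pairs(rows):
    """Return the set of residue pairs that share a column.

    A residue is identified by (row index, index within the ungapped sequence).
    """
    pairs = set()
    seen = [0] * len(rows)  # residues consumed so far in each row
    for column in zip(*rows):
        present = []
        for r, ch in enumerate(column):
            if ch not in GAP_CHARS:
                present.append((r, seen[r]))
                seen[r] += 1
        for a in range(len(present)):
            for b in range(a + 1, len(present)):
                pairs.add((present[a], present[b]))
    return pairs

def reference_agreement(test_rows, reference_rows):
    """Fraction of reference residue pairs that the test alignment reproduces."""
    reference = aligned_pairs(reference_rows)
    if not reference:
        return 0.0
    return len(reference & aligned_pairs(test_rows)) / len(reference)
```

A test alignment identical to the reference scores 1.0; one that shares no aligned residue pair with it scores 0.0.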

  9. Finding Multiple Alignments

  10. Dynamic Programming

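  11. Dynamic Programming • Similar to pairwise alignment • Example: compare NV and NS; the last column has $2^2 - 1 = 3$ cases: (V, S), (V, -), (-, S) • $S(NV, NS) = \max\{\, S(N, N) + \delta(V, S),\ S(N, NS) + \delta(V, -),\ S(NV, N) + \delta(-, S) \,\}$ • If k sequences are aligned => a k-dimensional matrix is filled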

  12. Dynamic Programming • For k = 3 sequences (figure: cell with last letters A, S, V) there are $2^k - 1 = 7$ cases
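
The $2^k - 1$ cases generalize directly to code: every non-empty subset of the k sequences may contribute its last letter to the final column, and the rest contribute a gap. The sketch below is a minimal memoized version of that recurrence using a sum-of-pairs column score; the scoring values and function names are assumptions for illustration, and it is only practical for a handful of short sequences.

```python
from functools import lru_cache
from itertools import combinations, product

def predecessor_moves(k):
    """All 2^k - 1 ways to consume a letter (1) or place a gap (0) per sequence."""
    return [m for m in product((0, 1), repeat=k) if any(m)]

def column_score(column, match=1, mismatch=-1, gap=-2):
    """Sum-of-pairs score of one column (plays the role of delta on the slides)."""
    total = 0
    for x, y in combinations(column, 2):
        if x == "-" and y == "-":
            continue
        total += gap if "-" in (x, y) else (match if x == y else mismatch)
    return total

def optimal_sp_score(seqs):
    """Best sum-of-pairs score over all multiple alignments of seqs."""
    k = len(seqs)
    moves = predecessor_moves(k)

    @lru_cache(maxsize=None)
    def best(pos):
        if not any(pos):
            return 0  # all prefixes empty
        candidates = []
        for move in moves:
            if any(p < m for p, m in zip(pos, move)):
                continue  # would step outside the matrix
            prev = tuple(p - m for p, m in zip(pos, move))
            column = tuple(seqs[i][pos[i] - 1] if move[i] else "-" for i in range(k))
            candidates.append(best(prev) + column_score(column))
        return max(candidates)

    return best(tuple(len(s) for s in seqs))
```

With two sequences this reduces to ordinary pairwise global alignment with three cases per cell; with three sequences each cell considers seven.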

  13. Complexity • Space complexity: $O(n^k)$ for k sequences, each of length n • Computing a cell: $O(2^k)$ × cost of computing δ • Time complexity: $O(2^k n^k)$ × cost of computing δ • Finding the optimal solution is exponential in k • Proven to be NP-complete for a number of cost functions

  14. MSA (Carrillo, Lipman '88)

  15. MSA – Idea (figure with labels 1, 2, 3 not reproduced)

  16. MSA algorithm (1/3) • Find the optimal pairwise alignments • Produce a trial multiple alignment from a tree, cost = d • This provides a limit to the volume within which optimal alignments are found • Specifics • Sequences $x_1, \dots, x_r$ • Alignment A, cost = c(A) • Optimal alignment A* • $A_{ij}$ = alignment of $x_i, x_j$ induced by A • $D(x_i, x_j)$ = cost of the optimal pairwise alignment of $x_i, x_j$, so $D(x_i, x_j) \le c(A_{ij})$

  17. MSA algorithm (2/3) • $d \ge c(A^*) = c(A^*_{uv}) + \sum_{i<j,\ (i,j) \ne (u,v)} c(A^*_{ij}) \ge c(A^*_{uv}) + \sum_{i<j,\ (i,j) \ne (u,v)} D(x_i, x_j)$ • $c(A^*_{uv}) \le d - \sum_{i<j,\ (i,j) \ne (u,v)} D(x_i, x_j) = B(u,v)$ • Compute B(u,v) for each pair u, v • Consider any cell f with projection (s,t) on the u,v plane • If A* passes through f, then $A^*_{uv}$ passes through (s,t) • $\mathrm{best}^{st}_{uv}$ = cost of the best pairwise alignment of $x_u, x_v$ that passes through (s,t) • $\mathrm{best}^{st}_{uv}$ = distance of the prefixes up to (s,t) + cost of aligning the letters of $x_u$ and $x_v$ at (s,t) + distance of the suffixes after (s,t)

  18. MSA algorithm (3/3) • If $\mathrm{best}^{st}_{uv} > B(u,v)$, then A* cannot pass through cell f • Discard such cells from the DP computation
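
The pruning test can be sketched for a single pair of sequences. The code below assumes unit edit costs (substitution = 1, gap = 1) and treats (s,t) as a lattice point between prefixes and suffixes, so the best alignment through (s,t) costs prefix distance plus suffix distance; this is a simplified variant of the slide's definition, and all function names are hypothetical.

```python
def prefix_costs(x, y, sub=1, gap=1):
    """F[s][t] = minimum cost of aligning x[:s] with y[:t]."""
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for s in range(1, n + 1):
        F[s][0] = s * gap
    for t in range(1, m + 1):
        F[0][t] = t * gap
    for s in range(1, n + 1):
        for t in range(1, m + 1):
            diag = F[s - 1][t - 1] + (0 if x[s - 1] == y[t - 1] else sub)
            F[s][t] = min(diag, F[s - 1][t] + gap, F[s][t - 1] + gap)
    return F

def suffix_costs(x, y):
    """R[s][t] = minimum cost of aligning x[s:] with y[t:] (via reversed strings)."""
    n, m = len(x), len(y)
    rev = prefix_costs(x[::-1], y[::-1])
    return [[rev[n - s][m - t] for t in range(m + 1)] for s in range(n + 1)]

def projection_bound(d, pair_costs, u, v):
    """B(u,v) = d minus the optimal pairwise costs of all other pairs."""
    return d - sum(cost for pair, cost in pair_costs.items() if pair != (u, v))

def surviving_cells(x, y, bound):
    """Points (s,t) whose best alignment through them does not exceed the bound."""
    F, R = prefix_costs(x, y), suffix_costs(x, y)
    return {(s, t)
            for s in range(len(x) + 1)
            for t in range(len(y) + 1)
            if F[s][t] + R[s][t] <= bound}
```

Cells of the k-dimensional matrix whose projection onto any (u,v) plane falls outside `surviving_cells` for that pair can be skipped during the multiple-alignment DP.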

  19. Question • Align s1: MPE, s2: MKE, s3: MSKE, s4: SKE using BLOSUM62

  20. Progressive Alignment

  21. Star Alignment

  22. Star Alignments • Heuristic method for multiple sequence alignment • Select a sequence c as the center of the star • For each sequence x1, …, xk such that xi ≠ c, perform a Needleman-Wunsch global alignment of xi and c • Aggregate the pairwise alignments into a multiple alignment

  23. Star Alignments Example • s1: MPE, s2: MKE, s3: MSKE, s4: SKE • Center: s2 (MKE), with s1, s3, s4 as the points of the star • Pairwise alignments with the center:
      MPE    MSKE    SKE
      | |    | ||     ||
      MKE    M-KE    MKE
      • Merged after adding s1 and s3:
      M-PE
      M-KE
      MSKE
      • Final multiple alignment:
      M-PE
      M-KE
      MSKE
      S-KE
      • Every induced pairwise alignment with the center sequence is optimal • How should we choose a center? (Exercise: try s4 as the center) • Try all of them?
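
The example can be reproduced with a short sketch: align each sequence to the center with Needleman-Wunsch, then merge the pairwise alignments so that a gap inserted into the center is propagated to every row already merged. The scoring values (match = 1, mismatch = -1, gap = -2) are an assumption standing in for BLOSUM62, and the function names are hypothetical.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of a and b; returns the two gapped strings."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    out_a, out_b, i, j = [], [], n, m
    while i > 0 or j > 0:  # trace back from the bottom-right corner
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))

def merge(rows, center_aln, seq_aln):
    """Add one pairwise alignment to the rows; rows[0] is the gapped center."""
    current = rows[0]
    new_rows, new_seq = [[] for _ in rows], []
    i = j = 0
    while i < len(current) or j < len(center_aln):
        cur_gap = i < len(current) and current[i] == "-"
        aln_gap = j < len(center_aln) and center_aln[j] == "-"
        if cur_gap and not aln_gap:      # old gap column: new sequence gets a gap
            for out, row in zip(new_rows, rows):
                out.append(row[i])
            new_seq.append("-"); i += 1
        elif aln_gap and not cur_gap:    # new gap in center: pad all old rows
            for out in new_rows:
                out.append("-")
            new_seq.append(seq_aln[j]); j += 1
        else:                            # same center position in both
            for out, row in zip(new_rows, rows):
                out.append(row[i])
            new_seq.append(seq_aln[j]); i += 1; j += 1
    return ["".join(r) for r in new_rows] + ["".join(new_seq)]

def star_align(seqs, center_index):
    """Star alignment; returns the center row first, then the others in input order."""
    rows = [seqs[center_index]]
    for k, seq in enumerate(seqs):
        if k != center_index:
            rows = merge(rows, *needleman_wunsch(seqs[center_index], seq))
    return rows
```

Calling `star_align(["MPE", "MKE", "MSKE", "SKE"], 1)` uses s2 as the center; trying every index as the center and keeping the result with the best SP score is one way to answer the "try all of them?" question.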

  24. CLUSTAL-W (Thompson, Higgins, Gibson 1994)

  25. CLUSTAL-W (1/4) • Given sequences A, B, C, D, E • Compare all pairs and construct a distance matrix
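
As a rough illustration of the all-pairs step, the sketch below fills a symmetric distance matrix. It uses unit-cost edit distance divided by the longer sequence length as a stand-in distance; this is an assumption chosen for brevity and not the distance CLUSTAL-W itself computes. Sequences are assumed to be non-empty.

```python
def edit_distance(a, b):
    """Unit-cost edit distance between two strings (row-by-row DP)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j - 1] + (ca != cb),  # substitute or match
                           prev[j] + 1,               # delete from a
                           cur[j - 1] + 1))           # insert into a
        prev = cur
    return prev[-1]

def distance_matrix(seqs):
    """Symmetric matrix of normalized pairwise distances, zeros on the diagonal."""
    k = len(seqs)
    dist = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            longest = max(len(seqs[i]), len(seqs[j]))
            dist[i][j] = dist[j][i] = edit_distance(seqs[i], seqs[j]) / longest
    return dist
```

The resulting matrix is the input a tree-building method such as neighbor joining would consume in the next step.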

  26. CLUSTAL-W (2/4) • Find phylogenetic tree for A, B, C, D, E using neighbor joining (figure: tree over A, B, C, D, E)

  27. CLUSTAL-W (3/4) • Align sequences starting from leaf level (figure: tree with leaves A, B, C, D, E) • Edge weights are used to compute the score of the alignment • $O(k^2 n^2)$ time • $O(n^2)$ space • Result depends on sequence order

  28. CLUSTAL-W (4/4) • Sample query using ClustalW • http://www.cise.ufl.edu/~tamer/teaching/fall2007/other/sampleMSAquery • http://www.ebi.ac.uk/clustalw/

  29. Other Progressive Methods • T-COFFEE • PILEUP • MUSCLE • …

  30. T-COFFEE (Notredame, Higgins, Heringa 2000) • Find a library of alignments between pairs of sequences • Create a new scoring matrix for each pair of sequences using the library • Directly from the alignment of s1 and s2 • Indirectly through the alignments of s1, s3 and s3, s2 • Use these scoring matrices during progressive alignment (figure: scoring matrix for s1 and s2)

  31. Iterative Alignment

  32. PRRP (Gotoh 1996) • Motivation: if the sequences aligned first are aligned poorly, progressive alignment fails • Idea: iteratively update the alignment

  33. PRRP • 1. Find some initial alignment • 2. Construct phylogenetic tree (over A, B, C, D, E) based on the multiple alignment • 3. Align sequences • Go back if the result has improved • Example alignment:
      A cwcyklpdwv pikqkvsgk. cn....
      B cycyglpdse ptktn..gk. cksgkk
      C cycyglpesv kiwtsetnk. c.....
      D cyceglsdst ptwplp.nkt csgk..
      E cyceglpdst piwplp.nkt ctgk..

  34. Other methods • Genetic algorithm (machine learning) • Partial order graphs (graph matching) • HMMER (hidden Markov model) • For a comparison: • http://www.cise.ufl.edu/~tamer/papers/psb2006.pdf

  35. Motif Logos
      ID   HISTONEH5; BLOCK
      AC   PR00624A; distance from previous block=(9,12)
      DE   Histone H5 signature
      BL   adapted; width=22; seqs=9; 99.5%=986; strength=1407
      H10_HUMAN|P07305  ( 10) AKPKRAKASKKSTDHPKYSDMI   63
      H5A_XENLA|P22844  ( 11) AKPKRSKALKKSTDHPKYSDMI   71
      H10_RAT|P43278    ( 10) AKPKRAKAAKKSTDHPKYSDMI   70
      H10_MOUSE|P10922  ( 10) AKPKRAKASKKSTDHPKYSDMI   63
      Q91759            (  9) AKPRRSKASKKSTDHPKYSDMI   71
      H5B_XENLA|P22845  (  9) AKPRRSKASKKSTDHPKYSDMI   71
      H5_CHICK|P02259   ( 11) AKPKRVKASRRSASHPTYSEMI  100
      H5_CAIMO|P06513   ( 12) AKPKRAKAPRKPASHPSYSEMI   91
      H5_ANSAN|P02258   ( 12) AKPKRARAPRKPASHPTYSEMI  100
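
A logo draws each column of a block with a total height equal to its information content. The sketch below computes that quantity for the histone block above as log2(20) minus the column's Shannon entropy, assuming a 20-letter amino-acid alphabet and no small-sample correction; the function name is hypothetical.

```python
import math
from collections import Counter

# The nine aligned segments of block PR00624A, copied from the slide.
BLOCK = [
    "AKPKRAKASKKSTDHPKYSDMI",
    "AKPKRSKALKKSTDHPKYSDMI",
    "AKPKRAKAAKKSTDHPKYSDMI",
    "AKPKRAKASKKSTDHPKYSDMI",
    "AKPRRSKASKKSTDHPKYSDMI",
    "AKPRRSKASKKSTDHPKYSDMI",
    "AKPKRVKASRRSASHPTYSEMI",
    "AKPKRAKAPRKPASHPSYSEMI",
    "AKPKRARAPRKPASHPTYSEMI",
]

def column_information(rows, alphabet_size=20):
    """Return (information content in bits, most frequent letter) per column."""
    result = []
    for column in zip(*rows):
        counts = Counter(column)
        total = len(column)
        entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
        bits = math.log2(alphabet_size) - entropy
        result.append((bits, counts.most_common(1)[0][0]))
    return result

if __name__ == "__main__":
    for position, (bits, letter) in enumerate(column_information(BLOCK), 1):
        print(f"{position:2d} {letter} {bits:.2f} bits")
```

Fully conserved columns reach the maximum of log2(20) bits, while columns with several different letters score lower; in a logo each letter's height is its frequency times the column's information content.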