
Multiple sequence alignment




  1. Multiple sequence alignment Jarno Tuimala

  2. Scoring matrices

  3. Uses of matrices • Sequence alignment • Database searches • Phylogenetics • Distances between sequences • As evolutionary models • For amino acids: PAM, BLOSUM, JTT… • For DNA: IUB… (match 1.9, mismatch 0) • For evolutionary work with DNA sequence data, matrices are replaced by mathematical models

  4. Adenine Guanine Adapted from images: http://www.bigchalk.com/cgi-bin/WebObjects/WOPortal.woa/wa/HWCDA/file?fileid=18373&flt=ga Cytosine Thymine

  5. An example of a DNA matrix • For local alignments with this matrix, a gap opening penalty of -16 and a gap extension penalty of -4 are typically used.

  6. Sequence alignment

  7. How to align sequences • On paper / with a computer • To describe an alignment problem to the computer you need: • a scoring matrix • gap penalties • Aligning is not objective • Check the results the computer gives you! • Alignments can be used for • searching for conserved sequence areas • searching for point mutations • studying the evolution of genes and species

  8. Gap penalties • Gaps are evolutionarily expensive. • Opening a gap is more costly than extending it • Affine gap model • Mathematically • P = c + gd • P is the total gap penalty • c is the gap opening penalty • d is the gap extension penalty • g is (length of the gap - 1)
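The affine gap model can be written out as a small function. This is a minimal sketch: the function name `affine_gap_penalty` is made up for illustration, and the default values 16 and 4 are the magnitudes of the opening and extension penalties quoted elsewhere in these slides.

```python
def affine_gap_penalty(gap_length, opening=16, extension=4):
    """Total penalty P = c + g*d for a single gap.

    c = opening penalty, d = extension penalty,
    g = gap_length - 1 (the first gap position is paid for by c).
    """
    if gap_length < 1:
        return 0  # no gap, no penalty
    g = gap_length - 1
    return opening + g * extension


# A longer gap costs more, but each extra position is cheaper than opening.
for length in (1, 2, 5):
    print(length, affine_gap_penalty(length))  # 16, 20, 32
```

With these values one gap of length 5 costs 32, whereas five separate gaps of length 1 would cost 80, which is how the model favours a few long gaps over many short ones.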

  9. How to calculate an alignment score? • match: +4 • mismatch: -5 • gap opening: -16 • gap extension: -4 • 4+4+(-4)+4+(-16)+4+4+4+4+4 = 12
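The same bookkeeping can be sketched in code. The alignment pictured on the slide is not part of this transcript, so the two sequences below are a hypothetical example, and `alignment_score` is an illustrative name. The sketch assumes the first position of a gap is charged the opening penalty and every following position the extension penalty; for simplicity it does not distinguish a gap in one sequence from an immediately adjacent gap in the other.

```python
def alignment_score(seq_a, seq_b, match=4, mismatch=-5,
                    gap_open=-16, gap_extend=-4):
    """Score two already-aligned sequences column by column."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    score = 0
    in_gap = False  # True while we are inside a run of gap columns
    for x, y in zip(seq_a, seq_b):
        if x == "-" or y == "-":
            score += gap_extend if in_gap else gap_open
            in_gap = True
        else:
            score += match if x == y else mismatch
            in_gap = False
    return score


# Hypothetical alignment: 5 matches, 1 mismatch, one gap of length 2.
# 4 + 4 + (-16) + (-4) + 4 + 4 + 4 + (-5) = -5
print(alignment_score("ACGTACGT", "AC--ACGA"))  # -5
```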

  10. Multiple sequence alignment (MSA)

  11. What is MSA? • MSA is an alignment generated from three or more sequences. • MSA is usually a global alignment, i.e., the aim is to align homologous residues (nucleotides or amino acids) in columns across the length of the whole sequences.
      A--GT
      AC-GT
      ACGGT
      -CGGT

  12. Alignability of sequences • If the similarity of the sequences drops too low, they can’t be reliably aligned (accuracy drops below an acceptable level). • For proteins <20% similarity • For DNA <~75% similarity • This cut-off is called the twilight zone. • In other words, the twilight zone marks the sequence similarity below which the observed similarity is mainly due to random variation, and not due to evolution.

  13. MSA and dynamic programming • There are methods that can produce the optimal alignment (in terms of gap penalties and scoring matrices), but they are computationally very heavy. • The program MSA uses dynamic programming • In practice, dynamic programming is feasible for up to about 10 sequences, and is not usually used for MSA. • For pairwise alignment, however, it can be used.
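For the pairwise case, dynamic programming can be sketched as a Needleman-Wunsch style global alignment. This is a simplified illustration, not what any of the programs named in these slides does: it assumes a linear gap penalty rather than the affine model, the scores (match 1, mismatch -1, gap -2) are arbitrary, and the name `needleman_wunsch` is just a label for the sketch.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global pairwise alignment; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),  # align residues
                              score[i - 1][j] + gap,            # gap in b
                              score[i][j - 1] + gap)            # gap in a

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


best, top, bottom = needleman_wunsch("ACGGT", "ACGT")
print(best)
print(top)
print(bottom)
```

The table here has (n+1) x (m+1) cells. Extending the same idea to N sequences needs an N-dimensional table, so the work grows exponentially with the number of sequences, which is why the exact approach is rarely used for MSA.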

  14. MSA methods • There are two popular methods to perform a multiple sequence alignment: • Progressive alignment • Clustal (ClustalW and ClustalX), Pileup… • Clustal is the most commonly used alignment program • Iterative alignment • SAGA… • We will review the Pileup method first

  15. Progressive alignment

  16. Progressive alignment • Produce pairwise alignments between all the sequences you want to include in the MSA. • Dynamic programming, ktup methods, dot matrix method… (you choose the method) • Produce a “guide tree” on the basis of the pairwise distances calculated from the pairwise alignments. • UPGMA, neighbor joining (you choose the method) • Produce an MSA using the guide tree. • Sequences are aligned in the order given by the guide tree.

  17. Pairwise alignments

  18. Pairwise distances • No. of nucleotide differences • Absolute distance, used in Pileup/Clustal • JC distance (Jukes-Cantor)
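Three common distance measures between two aligned DNA sequences can be sketched as follows. The function names are illustrative, and which of the first two measures the slide labels "absolute distance" is not clear from the transcript, so treat that mapping as open. The Jukes-Cantor correction uses the standard formula d = -(3/4) ln(1 - (4/3) p), where p is the proportion of differing sites. The two sequences are the human and chimp sequences from the "Constructing MSA" slide.

```python
import math


def count_differences(a, b):
    """Number of aligned positions at which the two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(1 for x, y in zip(a, b) if x != y)


def p_distance(a, b):
    """Proportion of differing sites."""
    return count_differences(a, b) / len(a)


def jukes_cantor_distance(a, b):
    """Jukes-Cantor corrected distance; infinite when p >= 0.75."""
    p = p_distance(a, b)
    if p >= 0.75:
        return math.inf  # the correction is undefined at saturation
    return -0.75 * math.log(1 - 4 * p / 3)


human = "ACGTACGTCC"
chimp = "ACCTACGTCC"
print(count_differences(human, chimp))                # 1
print(p_distance(human, chimp))                       # 0.1
print(round(jukes_cantor_distance(human, chimp), 4))  # 0.1073
```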

  19. UPGMA • Unweighted Pair Group Method with Arithmetic mean • One of the fastest tree construction methods • Used in Pileup (GCG package) • Clustal uses neighbor joining, but calculating an NJ tree is much more demanding; thus, UPGMA is demonstrated here
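A minimal UPGMA sketch: repeatedly merge the two clusters with the smallest average distance between their members. It returns only the tree topology as nested tuples, with no branch lengths, and it is not the Pileup implementation. The name `upgma` and the input format are assumptions made for illustration. The distances are the numbers of differing sites between the four ape sequences from the "Constructing MSA" slide, so the resulting tree need not match the one pictured in the deck.

```python
from itertools import combinations


def upgma(names, dist):
    """Build a UPGMA topology.

    names: list of leaf labels
    dist:  dict mapping frozenset({a, b}) -> distance between leaves a and b
    """
    # Each cluster is (subtree, list of the leaf names it contains).
    clusters = [(name, [name]) for name in names]

    def average_distance(c1, c2):
        # Arithmetic mean over all leaf pairs drawn from the two clusters.
        total = sum(dist[frozenset((x, y))] for x in c1[1] for y in c2[1])
        return total / (len(c1[1]) * len(c2[1]))

    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: average_distance(clusters[ij[0]],
                                                   clusters[ij[1]]))
        a, b = clusters[i], clusters[j]
        merged = ((a[0], b[0]), a[1] + b[1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]


names = ["human", "chimp", "gorilla", "orangutan"]
dist = {
    frozenset(("human", "chimp")): 1,
    frozenset(("human", "gorilla")): 3,
    frozenset(("human", "orangutan")): 4,
    frozenset(("chimp", "gorilla")): 2,
    frozenset(("chimp", "orangutan")): 3,
    frozenset(("gorilla", "orangutan")): 2,
}
print(upgma(names, dist))
# (('human', 'chimp'), ('gorilla', 'orangutan'))
```

In a progressive alignment, a tree like this serves only as the guide tree that fixes the order in which sequences are added; it is not meant as a phylogenetic result.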

  20. UPGMA tree

  21. Constructing MSA
      human     ACGTACGTCC
      chimp     ACCTACGTCC
      gorilla   ACCACCGTCC
      orangutan ACCCCCCTCC

      human     ACGTACGTCC
      chimp     ACCTACGTCC
      gorilla   ACCACCGTCC
      orangutan ACCCCCCTCC
      macaque   CCCCCCCCCC

      human     ACGTACGTCC
      chimp     ACCTACGTCC
      gorilla   ACCACCGTCC
      orangutan ACCCCCCTCC

  22. Score of alignment • match = 1, mismatch = 0
      1234
      ACGT
      ACGA
      AGGA
      • 1: A-A + A-A + A-A = 1+1+1 = 3 • 2: C-C + C-G + C-G = 1+0+0 = 1 • 3: G-G + G-G + G-G = 1+1+1 = 3 • 4: T-A + T-A + A-A = 0+0+1 = 1 • S(alignment) = S(1) + S(2) + S(3) + S(4) = 3+1+3+1 = 8 • The higher the score, the better the alignment
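The column-by-column calculation above is a sum-of-pairs score, and it can be reproduced with a few lines of code. This is a sketch with illustrative function names; it uses the slide's match = 1 and mismatch = 0 and does not handle gap characters.

```python
from itertools import combinations


def column_score(column, match=1, mismatch=0):
    """Sum the scores of every pair of residues in one alignment column."""
    return sum(match if x == y else mismatch
               for x, y in combinations(column, 2))


def sum_of_pairs(alignment, match=1, mismatch=0):
    """Total alignment score: the sum of all column scores."""
    return sum(column_score(col, match, mismatch)
               for col in zip(*alignment))


alignment = ["ACGT", "ACGA", "AGGA"]
print([column_score(col) for col in zip(*alignment)])  # [3, 1, 3, 1]
print(sum_of_pairs(alignment))                         # 8
```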

  23. Progressive alignment - pros and cons • Pros • Fast • Quite accurate • Cons • Once gaps are opened they can never be closed • Errors in the alignment of the first few sequences can have catastrophic effects on the whole alignment

  24. Muscle – both progressive and iterative

  25. Muscle algorithm From http://nar.oxfordjournals.org/cgi/content/full/32/5/1792/GKH340F2

  26. Muscle – comparison results • As fast as Clustal, but at the same time: • As accurate as T-COFFEE! • T-COFFEE was previously the most accurate alignment method (or software) available
