
Sequence Alignment and Phylogenetic Analysis




  1. Sequence Alignment and Phylogenetic Analysis

  2. Evolution

  3. Sequence Alignment

      Two sequences before alignment:

      AGGCTATCACCTGACCTCCAGGCCGATGCCC
      TAGCTATCACGACCGCGGTCGATTTGCCCGAC

      The same sequences after alignment, with gaps (-) inserted:

      -AGGCTATCACCTGACCTCCAGGCCGA--TGCCC---
      TAG-CTATCAC--GACCGC--GGTCGATTTGCCCGAC

      Definition: Given two strings x = x1x2…xM and y = y1y2…yN, an alignment is an assignment of gaps to positions 0,…, M in x and 0,…, N in y, so as to line up each letter in one sequence with either a letter or a gap in the other sequence
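The definition above can be checked mechanically. The following is a minimal sketch, not part of the original slides: the function name `is_valid_alignment` is made up for illustration, and the rule that no column may pair a gap with a gap is read from the phrase "each letter ... with either a letter, or a gap".

```python
def is_valid_alignment(x, y, ax, ay, gap="-"):
    """Return True if rows (ax, ay) form an alignment of strings x and y."""
    # Both rows must have the same number of columns.
    if len(ax) != len(ay):
        return False
    # Removing the gaps must give back the original strings unchanged.
    if ax.replace(gap, "") != x or ay.replace(gap, "") != y:
        return False
    # A column may hold letter/letter or letter/gap, but never gap/gap.
    return all(not (a == gap and b == gap) for a, b in zip(ax, ay))


# The two sequences and the alignment shown on the slide.
x = "AGGCTATCACCTGACCTCCAGGCCGATGCCC"
y = "TAGCTATCACGACCGCGGTCGATTTGCCCGAC"
ax = "-AGGCTATCACCTGACCTCCAGGCCGA--TGCCC---"
ay = "TAG-CTATCAC--GACCGC--GGTCGATTTGCCCGAC"

print(is_valid_alignment(x, y, ax, ay))
```

A check like this says nothing about how good an alignment is; it only confirms that the two gapped rows are a legal arrangement of the original sequences.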

  4. Example

  5. The Blosum50 Scoring Matrix

  6. Multiple Alignment

  7. Example

  8. ClustalW • Popular multiple alignment tool today • ‘W’ stands for ‘weighted’ (different parts of alignment are weighted differently). • Three-step process 1.) Construct pairwise alignments 2.) Build Guide Tree 3.) Progressive Alignment guided by the tree

  9. Step 1: Pairwise Alignment
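As a rough illustration of the pairwise step, here is a generic Needleman-Wunsch global alignment in Python. It is a sketch under stated assumptions, not ClustalW's own procedure: the scores (match +1, mismatch -1, gap -2) and the helper `identity_distance` are chosen for illustration and do not come from the slides.

```python
def needleman_wunsch(x, y, match=1, mismatch=-1, gap=-2):
    """Global alignment by dynamic programming.

    Returns (score, aligned_x, aligned_y). Scores are illustrative assumptions.
    """
    m, n = len(x), len(y)
    # F[i][j] holds the best score for aligning x[:i] with y[:j].
    F = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        F[i][0] = i * gap
    for j in range(1, n + 1):
        F[0][j] = j * gap
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s,   # letter against letter
                          F[i - 1][j] + gap,     # letter of x against a gap
                          F[i][j - 1] + gap)     # letter of y against a gap
    # Trace back from the bottom-right corner to recover one best alignment.
    ax, ay = [], []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if x[i - 1] == y[j - 1] else mismatch
            if F[i][j] == F[i - 1][j - 1] + s:
                ax.append(x[i - 1]); ay.append(y[j - 1])
                i -= 1; j -= 1
                continue
        if i > 0 and F[i][j] == F[i - 1][j] + gap:
            ax.append(x[i - 1]); ay.append("-")
            i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1])
            j -= 1
    return F[m][n], "".join(reversed(ax)), "".join(reversed(ay))


def identity_distance(ax, ay):
    """1 minus the fraction of identical letters among gap-free columns."""
    pairs = [(a, b) for a, b in zip(ax, ay) if a != "-" and b != "-"]
    if not pairs:
        return 1.0
    return 1.0 - sum(a == b for a, b in pairs) / len(pairs)


score, ax, ay = needleman_wunsch("GATTACA", "GATACA")
print(score, ax, ay, identity_distance(ax, ay))
```

Running this for every pair of input sequences yields a table of pairwise distances, which is the kind of input a guide tree can be built from.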

  10. Step 3: Progressive Alignment • Start by aligning the two most similar sequences • Following the guide tree, add in the next sequences, aligning to the existing alignment • Insert gaps as necessary
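The order in which sequences are added in step 3 comes from the guide tree built in step 2. The sketch below shows one simple way to turn pairwise distances into such a tree. It uses average-linkage (UPGMA-style) clustering as a stand-in, which is a simplification and not the neighbor-joining method ClustalW uses; the function name and the toy distances are invented for illustration.

```python
def upgma_guide_tree(names, dist):
    """Build a nested-tuple guide tree by average-linkage clustering.

    dist maps frozenset({a, b}) to the distance between sequences a and b.
    """
    # Map each cluster (a name or a nested tuple) to the leaves it contains.
    clusters = {name: [name] for name in names}

    def average_distance(c1, c2):
        pairs = [(a, b) for a in clusters[c1] for b in clusters[c2]]
        return sum(dist[frozenset(p)] for p in pairs) / len(pairs)

    while len(clusters) > 1:
        keys = list(clusters)
        # Pick the closest pair of clusters; ties go to the earliest pair.
        _, i, j = min((average_distance(keys[i], keys[j]), i, j)
                      for i in range(len(keys))
                      for j in range(i + 1, len(keys)))
        a, b = keys[i], keys[j]
        clusters[(a, b)] = clusters.pop(a) + clusters.pop(b)
    return next(iter(clusters))


# Toy distances (hypothetical values, for illustration only).
names = ["A", "B", "C", "D"]
dist = {
    frozenset({"A", "B"}): 0.10,
    frozenset({"C", "D"}): 0.20,
    frozenset({"A", "C"}): 0.60,
    frozenset({"A", "D"}): 0.70,
    frozenset({"B", "C"}): 0.65,
    frozenset({"B", "D"}): 0.75,
}
print(upgma_guide_tree(names, dist))
```

Read from the innermost tuples outward, the result gives the joining order: the most similar pair is aligned first, then the next closest group, and finally the two sub-alignments are aligned to each other.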

  11. Some Guidelines for Choosing the Right Sequences

  12. Gathering Sequences with BLAST • The most convenient way to select your sequences is to use a BLAST server • Some BLAST servers are integrated with multiple-alignment methods: • www.expasy.ch (protein only) • srs.ebi.ac.uk (DNA/protein) • npsa-pbil.ibcp.fr

  13. Selecting a Method • Many alternative methods exist for MSAs • Most of them use the progressive algorithm • They are all approximate methods • None is guaranteed to deliver the best alignments • All existing methods have pros and cons • ClustalW is the most popular (21,000 citations) • T-Coffee and ProbCons are more accurate but slower • MUSCLE is very fast, ideal for very large datasets

  14. ClustalW • www.ebi.ac.uk/clustalw • pir.georgetown.edu/pirwww/search/multialn.shtml • www.ddbj.nig.ac.jp/search/clustalw-e.html

  15. Tcoffee • TCOFFEE: www.tcoffee.org • CORE: evaluate MSA • MCOFFEE: run many and combine • EXPRESSO: with structural information

  16. Running Many Methods at Once • MCOFFEE is a meta-method • It runs all the individual MSA methods • It gathers all the produced MSAs • It combines the MSAs into a single MSA • MCOFFEE is more accurate than any individual method • Its color output lets you estimate the reliability of your MSA • MCOFFEE is available on www.tcoffee.org

  17. Editing and Publishing Alignments

  18. Alignments and Formats • Many alternative formats exist for MSAs • One format does not always have a clear advantage over another • Not all formats contain the same information • Changing formats is possible • Annotation may change, or be lost entirely, when an alignment is reformatted

  19. The Most Common Sequence Formats

  20. Interleaved and Non-interleaved • The MSF Format • Interleaved • The FASTA Format • Non-interleaved
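The difference between the two layouts is easiest to see by writing the same alignment both ways. This is a minimal sketch: `to_fasta` produces ordinary FASTA, while `to_interleaved` produces a generic block layout that only mimics the interleaved idea and is not a valid MSF file. Both function names and the toy alignment are assumptions made for illustration.

```python
def to_fasta(alignment, width=60):
    """Non-interleaved: each sequence is written in full before the next one."""
    lines = []
    for name, seq in alignment.items():
        lines.append(">" + name)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines)


def to_interleaved(alignment, width=60):
    """Interleaved: the alignment is cut into blocks of columns, and every
    block carries one slice of every sequence."""
    length = len(next(iter(alignment.values())))
    pad = max(len(name) for name in alignment) + 2
    blocks = []
    for start in range(0, length, width):
        rows = [name.ljust(pad) + seq[start:start + width]
                for name, seq in alignment.items()]
        blocks.append("\n".join(rows))
    return "\n\n".join(blocks)


# Toy alignment (hypothetical data).
alignment = {"seq1": "ACGT-ACGT", "seq2": "ACGTTAC-T"}
print(to_fasta(alignment, width=5))
print()
print(to_interleaved(alignment, width=5))
```

Interleaved output keeps aligned columns stacked on screen, which is easier to read; non-interleaved output keeps each sequence contiguous, which is easier for simple scripts to parse.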

  21. Choosing Your Format • When choosing a format, ask yourself four questions: • Is it supported by the programs I need to use? • Can my collaborators use it? • Can it support all of my annotation? • Is it easy to read and manipulate?

  22. Converting Formats • Don’t re-compute your MSA if it is not in the right format • Convert your file using one of the online conversion tools • The 3 most popular reformatting utilities: • Fmtseq: the most complete • READSEQ: very popular and robust • SeqCheck: can clean FASTA sequences

  23. An Alignment

      CLUSTAL 2.1 multiple sequence alignment

      sp|P02620|PRVB_MERME      ---------------------------------------------AFAGI 5
      sp|P02622|PRVB_GADCA      ---------------------------------------------AFKGI 5
      sp|P02619|PRVB_ESOLU      ---------------------------------------------SFAGL 5
      sp|Q91482|PRVB1_SALSA     --------------------------------------------MACAHL 6
      sp|P43305|PRVU_CHICK      --------------------------------------------MSLTDI 6
      sp|P20472|PRVA_HUMAN      --------------------------------------------MSMTDL 6
      sp|P80079|PRVA_FELCA      --------------------------------------------MSMTDL 6
      sp|P02627|PRVA_RANES      ---------------------------------------------PMTDL 5
      sp|P02626|PRVA_AMPME      ---------------------------------------------SMTDV 5
      sp|P02586|TNNC2_RABIT     MTDQQAEARSYLSEEMIAEFKAAFDMFDADGGGDISVKELGTVMRMLGQT 50

      sp|P02620|PRVB_MERME      LADADITAALAACKAEGS--FKHGEFFTKIG------LKGKSAADIKKVF 47
      sp|P02622|PRVB_GADCA      LSNADIKAAEAACFKEGS--FDEDGFYAKVG------LDAFSADELKKLF 47
      sp|P02619|PRVB_ESOLU      -KDADVAAALAACSAADS--FKHKEFFAKVG------LASKSLDDVKKAF 46
      sp|Q91482|PRVB1_SALSA     CKEADIKTALEACKAADT--FSFKTFFHTIG------FASKSADDVKKAF 48
      sp|P43305|PRVU_CHICK      LSPSDIAAALRDCQAPDS--FSPKKFFQISG------MSKKSSSQLKEIF 48
      sp|P20472|PRVA_HUMAN      LNAEDIKKAVGAFSATDS--FDHKKFFQMVG------LKKKSADDVKKVF 48
      sp|P80079|PRVA_FELCA      LGAEDIKKAVEAFTAVDS--FDYKKFFQMVG------LKKKSPDDIKKVF 48
      sp|P02627|PRVA_RANES      LAAGDISKAVSAFAAPES--FNHKKFFELCG------LKSKSKEIMQKVF 47
      sp|P02626|PRVA_AMPME      IPEADINKAIHAFKAGEA--FDFKKFVHLLG------LNKRSPADVTKAF 47
      sp|P02586|TNNC2_RABIT     PTKEELDAIIEEVDEDGSGTIDFEEFLVMMVRQMKEDAKGKSEEELAECF 100
                                    ::           :  :.   *               *   : : *

      sp|P02620|PRVB_MERME      GIIDQDKSDFVEEDELKLFLQNFSAGARALTDAETATFLKAGDSDGDGKI 97
      sp|P02622|PRVB_GADCA      KIADEDKEGFIEEDELKLFLIAFAADLRALTDAETKAFLKAGDSDGDGKI 97
      sp|P02619|PRVB_ESOLU      YVIDQDKSGFIEEDELKLFLQNFSPSARALTDAETKAFLADGDKDGDGMI 96
      sp|Q91482|PRVB1_SALSA     KVIDQDASGFIEVEELKLFLQNFCPKARELTDAETKAFLKAGDADGDGMI 98
      sp|P43305|PRVU_CHICK      RILDNDQSGFIEEDELKYFLQRFECGARVLTASETKTFLAAADHDGDGKI 98
      sp|P20472|PRVA_HUMAN      HMLDKDKSGFIEEDELGFILKGFSPDARDLSAKETKMLMAAGDKDGDGKI 98
      sp|P80079|PRVA_FELCA      HILDKDKSGFIEEDELGFILKGFYPDARDLSVKETKMLMAAGDKDGDGKI 98
      sp|P02627|PRVA_RANES      HVLDQDQSGFIEKEELCLILKGFTPEGRSLSDKETTALLAAGDKDGDGKI 97
      sp|P02626|PRVA_AMPME      HILDKDRSGYIEEEELQLILKGFSKEGRELTDKETKDLLIKGDKDGDGKI 97
      sp|P02586|TNNC2_RABIT     RIFDRNADGYIDAEELAEIFR---ASGEHVTDEEIESLMKDGDKNNDGRI 147
                                 : *.: ..::: :**  ::       . ::  *   ::  .* :.** *

      sp|P02620|PRVB_MERME      GVEEFAAMV-----KG 108
      sp|P02622|PRVB_GADCA      GVDEFGALVDKWGAKG 113
      sp|P02619|PRVB_ESOLU      GVDEFAAMI-----KA 107
      sp|Q91482|PRVB1_SALSA     GIDEFAVLV-----KQ 109
      sp|P43305|PRVU_CHICK      GAEEFQEMV-----QS 109
      sp|P20472|PRVA_HUMAN      GVDEFSTLVA----ES 110
      sp|P80079|PRVA_FELCA      DVDEFFSLVA----KS 110
      sp|P02627|PRVA_RANES      GVDEFVTLVS----ES 109
      sp|P02626|PRVA_AMPME      GVDEFTSLVA----ES 109
      sp|P02586|TNNC2_RABIT     DFDEFLKMMEG---VQ 160
                                . :**  ::

  24. READSEQ • http://www.ebi.ac.uk/cgi-bin/readseq.cgi

  25. Different Formats (PHYLIP)

  26. PHYLIP (no gap)

  27. Different Formats (MSF)

  28. Converting Formats Can Be Dangerous • Format conversion can result in data loss • After converting your file, you must make sure your data is still intact • The following slide shows the most common losses that occur during conversion
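One way to act on the advice above is to compare the alignment before and after conversion. The sketch below assumes both files have already been read into name-to-sequence dictionaries; the function name `compare_alignments` and the report wording are invented for illustration. The example simulates a converter that cuts names to 10 characters, the limit imposed by the strict PHYLIP format.

```python
def compare_alignments(before, after, gap="-"):
    """List the differences between two name -> aligned-sequence mappings."""
    problems = []
    for name, seq in before.items():
        if name not in after:
            problems.append("missing sequence: " + name)
        elif after[name] != seq:
            if after[name].replace(gap, "") == seq.replace(gap, ""):
                # Same residues, but the gaps moved or disappeared.
                problems.append("gaps changed: " + name)
            else:
                problems.append("residues changed: " + name)
    for name in after:
        if name not in before:
            problems.append("unexpected sequence: " + name)
    return problems


# Two rows taken from the last block of the CLUSTAL alignment on slide 23.
before = {
    "sp|P20472|PRVA_HUMAN": "GVDEFSTLVA----ES",
    "sp|P02586|TNNC2_RABIT": "DFDEFLKMMEG---VQ",
}
# Simulated conversion that keeps only the first 10 characters of each name.
after = {name[:10]: seq for name, seq in before.items()}

for problem in compare_alignments(before, after):
    print(problem)
```

An empty list means names, residues, and gaps all survived. Annotation that lives outside the sequences (colors, features, comments) is not covered by this check and has to be inspected separately.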

  29. Potential Information Loss When Converting MSAs

  30. Editing your MSA • If your MSA looks bad . . . • Don’t torture the online server • Edit the MSA yourself locally • Never, ever, ever (ever) use a standard word processor • Always use a dedicated MSA editor • The most popular online tool is Jalview • You can get it at www.jalview.org

  31. With Jalview You Can . . . • Modify your MSA • Remove some of the redundant sequences • Insert/remove gaps • Shift portions of the MSA • Modify the alignment of a sub-group of sequences • Recompute some portions of your alignment

  32. Click a sequence to select

  33. Drag to select columns

  34. Some Special Features of Jalview • Computation of a consensus sequence • Computation of a phylogenetic tree • Removal of the redundancy • Applying any color scheme to your MSA
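To make the idea of a consensus sequence concrete, here is a minimal majority-vote sketch. It is not Jalview's implementation: the function name is invented, gaps are simply ignored when counting, and ties go to the residue seen first in the column.

```python
from collections import Counter


def consensus(rows, gap="-"):
    """Most frequent non-gap residue in each column of an alignment."""
    out = []
    for column in zip(*rows):
        counts = Counter(c for c in column if c != gap)
        # A column made only of gaps stays a gap in the consensus.
        out.append(counts.most_common(1)[0][0] if counts else gap)
    return "".join(out)


# First nine columns of five rows from the last block on slide 23.
rows = [
    "GVEEFAAMV",
    "GVDEFGALV",
    "GVDEFAAMI",
    "GIDEFAVLV",
    "GAEEFQEMV",
]
print(consensus(rows))
```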

  35. Preparing Your MSA for Publication • MSAs in publications usually come with shaded colors • You can improve your MSAs using online tools like Boxshade • Boxshade will shade your MSA according to its degree of conservation

  36. MSA => LOGO Graph • A LOGO graph summarizes an MSA • Tall letters indicate highly conserved positions • Short letters indicate poorly conserved positions • LOGO graphs are ideal for identifying conserved patterns • weblogo.berkeley.edu/
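The letter heights in a LOGO graph reflect the information content of each column. A minimal sketch for DNA, assuming the usual definition (maximum entropy minus observed entropy, in bits) and leaving out the small-sample correction that logo tools may apply; the function name and the toy columns are made up for illustration.

```python
import math
from collections import Counter


def column_information(column, alphabet_size=4):
    """Information content of one alignment column, in bits.

    Assumes the column contains residues only (no gaps).
    """
    counts = Counter(column)
    total = sum(counts.values())
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return math.log2(alphabet_size) - entropy


# Toy DNA columns: fully conserved, half and half, and evenly mixed.
for column in ("AAAA", "AACC", "ACGT"):
    print(column, round(column_information(column), 3))
```

A fully conserved DNA column reaches the maximum of 2 bits and is drawn tall, while a column with all four bases equally frequent carries 0 bits and is drawn flat. For proteins the same calculation would use `alphabet_size=20`.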

  37. Going Farther • Your imagination is the limit when it comes to making MSAs nice-looking and informative • Four very popular and easy-to-install MSA editors: • CINEMA • Seaview • Belvu • Kalignview • Boxshade is the simplest shading tool • If you need heavier capabilities, try Espript • Available at espript.ibpc.fr

  38. Molecular Evolution and Phylogenetic Reconstruction

  39. Early Evolutionary Studies • From Darwin’s time until the early 1960s, anatomical features were the dominant criteria used to derive evolutionary relationships between species • The evolutionary relationships derived from these relatively subjective observations were often inconclusive, and some were later proved incorrect

  40. Evolution and DNA Analysis: the Giant Panda Riddle • For roughly 100 years scientists were unable to figure out which family the giant panda belongs to • Giant pandas look like bears but have features that are unusual for bears and typical for raccoons, e.g., they do not hibernate • In 1985, Steven O’Brien and colleagues solved the giant panda classification problem using DNA sequences and algorithms

  41. Evolutionary Tree of Bears and Raccoons
