
Multiple sequence alignments and motif discovery


demetriusl

Presentation Transcript


  1. Tutorial 5: Multiple sequence alignments and motif discovery

  2. Multiple sequence alignments and motif discovery • Multiple sequence alignment • ClustalW • Muscle • Motif discovery • MEME • Jaspar

  3. Multiple Sequence Alignment • More than two sequences • DNA • Protein • Evolutionary relation • Homology → Phylogenetic tree • Detect motif
     [Figure labels: A, C, D, B. Sequence text from the figure:]
     GTCGTAGTCGGCTCGACGTCTAGCGAGCGTGATGCGAAGAGGCGAGCGCCGTCGCGTCGTAAC
     GTCGTAGTCG-GC-TCGACGTC-TAG-CGAGCGT-GATGC-GAAG-AG-GCG-AG-CGCCGTCG-CG-TCGTA-AC

  4. Multiple Sequence Alignment • Dynamic programming: optimal alignment, but exponential in the number of sequences • Progressive: efficient, but heuristic
     [Figure: same labels and sequence text as slide 3]
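The trade-off on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the scoring values (match=1, mismatch=-1, gap=-2) and both function names are hypothetical and do not come from the slides. The first function is the standard pairwise dynamic-programming (Needleman-Wunsch) score; the second counts the cells a full dynamic-programming table would need, which is what makes exact alignment of many sequences impractical.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Score of the best global alignment of a and b (Needleman-Wunsch).

    The scoring values are illustrative assumptions, not ClustalW defaults.
    """
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # a[:i] aligned against an empty prefix of b
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def dp_table_cells(lengths):
    """Number of cells in a full DP table over sequences of these lengths."""
    cells = 1
    for n in lengths:
        cells *= n + 1
    return cells


print(global_alignment_score("ACGT", "AGT"))  # 3 matches and 1 gap -> 1
print(dp_table_cells([100, 100]))             # 10201 cells for two sequences
print(dp_table_cells([100] * 5))              # about 1e10 cells for five
```

The table grows by a factor of (length + 1) with every added sequence, which is why progressive heuristics are used in practice.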

  5. ClustalW • “CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice”, J. D. Thompson et al., Nucleic Acids Research, 1994

  6. ClustalW • Progressive • At each step, align two existing alignments or sequences • Gaps present in older alignments remain fixed
     -TGTTAAC
     -TGT-AAC
     -TGT--AC
     ATGT---C
     ATGT-GGC
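The "gaps remain fixed" rule can be illustrated with a small Python sketch. It is not ClustalW's implementation: the function name `add_sequence` is hypothetical, and it assumes the new sequence has already been aligned pairwise against the first row of the existing alignment (ClustalW aligns against profiles, not a single row). Any gap that this pairwise step opens in the first row is copied into every existing row as a whole gap column, and old gaps are never removed.

```python
def add_sequence(alignment, aligned_rep, aligned_new):
    """Merge one new sequence into an existing alignment.

    alignment   -- equal-length gapped rows already aligned to each other
    aligned_rep -- alignment[0] after pairwise alignment with the new
                   sequence (it may contain additional '-' characters)
    aligned_new -- the new sequence, gapped to the length of aligned_rep
    """
    if len(aligned_rep) != len(aligned_new):
        raise ValueError("pairwise alignment rows differ in length")
    rep = alignment[0]
    merged = [[] for _ in alignment]
    col = 0  # next unread column of the old alignment
    for ch in aligned_rep:
        if col < len(rep) and ch == rep[col]:
            # Existing column: copy it unchanged, old gaps included.
            for out, row in zip(merged, alignment):
                out.append(row[col])
            col += 1
        elif ch == "-":
            # Gap opened by the new pairwise step: pad every old row.
            for out in merged:
                out.append("-")
        else:
            raise ValueError("aligned_rep does not match alignment[0]")
    if col != len(rep):
        raise ValueError("aligned_rep is shorter than alignment[0]")
    return ["".join(out) for out in merged] + [aligned_new]


old = ["TGTTAAC", "TGT-AAC", "TGT--AC"]
for row in add_sequence(old, "-TGTTAAC", "ATGT---C"):
    print(row)
```

With these inputs the sketch prints the first four rows of the slide's example. When a new gap falls next to an old gap in the first row, the placement is ambiguous and this sketch simply reads the old column first.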

  7. ClustalW - Input • http://www.ebi.ac.uk/Tools/clustalw2/index.html • Input sequences • Scoring matrix • Gap scoring • Output format • Email address

  8. ClustalW - Output • Match strength in decreasing order: * (identical) : (strongly similar) . (weakly similar)

  9. ClustalW - Output

  10. ClustalW - Output

  11. ClustalW - Output

  12. ClustalW - Output • Pairwise alignment scores • Building tree • Building alignment • Final score

  13. ClustalW - Output

  14. ClustalW - Output • Sequence names • Sequence positions • Match strength in decreasing order: * (identical) : (strongly similar) . (weakly similar)

  15. ClustalW - Output

  16. ClustalW - Output • Branch length

  17. ClustalW - Output

  18. ClustalW - Output

  19. Muscle http://www.ebi.ac.uk/Tools/muscle/index.html

  20. Muscle - output

  21. What’s the difference between Muscle and ClustalW? ClustalW Muscle

  22. http://www.megasoftware.net/index.html

  23. Can we find motifs using multiple sequence alignment? • Motif: a widespread pattern with a biological significance
       1 3 5 7 9
     ..YDEEGGDAEE..
     ..YDEEGGDAEE..
     ..YGEEGADYED..
     ..YDEEGADYEE..
     ..YNDEGDDYEE..
     ..YHDEGAADEE..
       * :**   *:
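The conservation line under the alignment on this slide can be reproduced with a short Python sketch. The residue groups below are an assumption: they are the "strong" and "weak" groups as commonly listed for ClustalW's protein conservation symbols, and should be checked against the ClustalW documentation before being relied on. The function name is hypothetical.

```python
# Assumed ClustalW residue groups (verify against the ClustalW docs).
STRONG_GROUPS = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK",
                 "MILV", "MILF", "HY", "FYW"]
WEAK_GROUPS = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND",
               "SNDEQK", "NDEQHK", "NEQHRK", "FVLIM", "HFY"]


def conservation_line(rows):
    """Return one symbol per column: '*', ':', '.' or ' '."""
    symbols = []
    for column in zip(*rows):
        residues = set(column)
        if "-" in residues:
            symbols.append(" ")   # columns with gaps get no symbol here
        elif len(residues) == 1:
            symbols.append("*")   # identical in every sequence
        elif any(residues <= set(g) for g in STRONG_GROUPS):
            symbols.append(":")   # all residues share a strong group
        elif any(residues <= set(g) for g in WEAK_GROUPS):
            symbols.append(".")   # all residues share a weak group
        else:
            symbols.append(" ")
    return "".join(symbols)


rows = ["YDEEGGDAEE", "YDEEGGDAEE", "YGEEGADYED",
        "YDEEGADYEE", "YNDEGDDYEE", "YHDEGAADEE"]
print(repr(conservation_line(rows)))  # '* :**   *:'
```

Runs of "*" and ":" mark the conserved core of the candidate motif, which is how an alignment can hint at a motif when the sequences are similar enough to align.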

  24. Can we find motifs using multiple sequence alignment? YES! NO

  25. MEME – Multiple EM* for Motif Elicitation • http://meme.sdsc.edu/ • Motif discovery from unaligned sequences • Genomic or protein sequences • Flexible model of motif presence (a motif can be absent from some sequences or appear several times in one sequence) • *EM: expectation-maximization
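To make the expectation-maximization idea concrete, here is a minimal Python sketch. It is not MEME's algorithm: it assumes exactly one motif occurrence per sequence (MEME's models are more flexible, as the slide notes), a uniform background, a fixed motif width, and a user-supplied seed word for initialization. The function name, the 0.7/0.1 starting probabilities and the pseudocount are all illustrative assumptions.

```python
ALPHABET = "ACGT"


def em_motif(seqs, seed, iterations=20, pseudocount=0.1, background=0.25):
    """Toy EM for one motif occurrence per DNA sequence.

    Returns (pwm, posteriors): pwm is a list of per-position dicts of
    letter probabilities; posteriors[i][j] is the probability that the
    motif in sequence i starts at position j.
    """
    w = len(seed)
    # Start from the seed word: 0.7 for its letter, 0.1 for each other.
    pwm = [{c: (0.7 if c == s else 0.1) for c in ALPHABET} for s in seed]
    posteriors = []
    for _ in range(iterations):
        # E-step: posterior over start positions in each sequence.
        posteriors = []
        for seq in seqs:
            weights = []
            for start in range(len(seq) - w + 1):
                ratio = 1.0
                for k in range(w):
                    ratio *= pwm[k][seq[start + k]] / background
                weights.append(ratio)
            total = sum(weights)
            posteriors.append([x / total for x in weights])
        # M-step: re-estimate the PWM from posterior-weighted counts.
        counts = [{c: pseudocount for c in ALPHABET} for _ in range(w)]
        for seq, post in zip(seqs, posteriors):
            for start, z in enumerate(post):
                for k in range(w):
                    counts[k][seq[start + k]] += z
        pwm = [{c: col[c] / sum(col.values()) for c in ALPHABET}
               for col in counts]
    return pwm, posteriors


seqs = ["CCCCTATAATCCCC", "GGGTATAATGGGGG", "TATAATCGCGCGCG"]
pwm, posteriors = em_motif(seqs, "TATAAT")
for post in posteriors:
    print(max(range(len(post)), key=post.__getitem__))  # likeliest start
```

The sequences here are made up, with the word TATAAT planted in each. MEME additionally tries many starting points and can model zero or repeated occurrences per sequence.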

  26. MEME - Input • Email address • Input file (FASTA file) • How many times in each sequence? • Range of motif lengths • How many motifs? • How many sites?

  27. MEME - Output • Motif score

  28. MEME - Output • Motif score • Motif length • Number of times

  29. MEME - Output Low uncertainty = High information content
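The relation "low uncertainty = high information content" can be written as information content = log2(alphabet size) minus the Shannon entropy of the column. A minimal Python sketch follows; the function name is hypothetical, and it omits any small-sample correction that logo tools may apply.

```python
import math


def information_content(column, alphabet_size=4):
    """Information content in bits of one motif column.

    column is a list of letter probabilities that sum to 1.
    """
    # Shannon entropy; terms with p == 0 contribute nothing.
    entropy = -sum(p * math.log2(p) for p in column if p > 0)
    return math.log2(alphabet_size) - entropy


print(information_content([1.0, 0.0, 0.0, 0.0]))      # 2.0 bits, fully conserved
print(information_content([0.25, 0.25, 0.25, 0.25]))  # 0.0 bits, no preference
print(information_content([0.5, 0.5, 0.0, 0.0]))      # 1.0 bit
```

For DNA the maximum is 2 bits per position; for protein, pass alphabet_size=20.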

  30. MEME - Output Multilevel Consensus

  31. MEME - Output • Position in sequence • Strength of match • Sequence names • Motif within sequence

  32. MEME - Output • Sequence names • Motif location in the input sequence • Overall strength of motif matches

  33. MAST • http://meme.sdsc.edu/meme4_4_0/cgi-bin/mast.cgi • Searches for motifs (one or more) in sequence databases: like BLAST, but with motifs as input; similar to iterations of PSI-BLAST; the profile defines the strength of a match; multiple motif matches per sequence; combined E-value for all motifs • MEME uses MAST to summarize results: each MEME result is accompanied by the MAST result of searching the given sequences for the discovered motifs

  34. MAST - Input • Email address • Database • Input file (motifs)

  35. JASPAR • Profiles • Transcription factor binding sites • Multicellular eukaryotes • Derived from published collections of experiments • Open data access

  36. JASPAR • Profiles • Modeled as matrices • Can be converted into a PSSM for scanning genomic sequences
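A minimal Python sketch of the conversion mentioned on this slide: a count matrix is turned into a log-odds PSSM and slid along a sequence. The counts below are a made-up toy profile, not a JASPAR entry, and the function names, pseudocount and uniform background are illustrative assumptions.

```python
import math


def counts_to_pssm(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into log2-odds scores.

    counts maps each base to a list with one count per motif position.
    """
    width = len(counts["A"])
    pssm = []
    for k in range(width):
        total = sum(counts[b][k] + pseudocount for b in "ACGT")
        pssm.append({
            b: math.log2((counts[b][k] + pseudocount) / total / background)
            for b in "ACGT"
        })
    return pssm


def scan(sequence, pssm):
    """Score every window of the sequence; returns (start, score) pairs."""
    w = len(pssm)
    return [
        (start, sum(pssm[k][sequence[start + k]] for k in range(w)))
        for start in range(len(sequence) - w + 1)
    ]


# Hypothetical 3-position profile that prefers the word ACG.
toy_counts = {"A": [8, 0, 0], "C": [0, 8, 0], "G": [0, 0, 8], "T": [0, 0, 0]}
pssm = counts_to_pssm(toy_counts)
best_start, best_score = max(scan("TTACGTT", pssm), key=lambda hit: hit[1])
print(best_start, round(best_score, 2))  # window starting at index 2 scores highest
```

In practice a score threshold, and often a scan of the reverse strand, would be added before reporting hits.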

  37. Search profile http://jaspar.genereg.net/

  38. Logo • Name of gene/protein • Organism • Score
