Phyloinformatics or How to analyze LOTS of sequences

Heath Blackmon, University of Texas at Arlington. Bioinformatics – Spring 2014.

Presentation Transcript


  1. Phyloinformatics or How to analyze LOTS of sequences • Heath Blackmon • University of Texas at Arlington • Bioinformatics – Spring 2014

  2. Phyloinformatic workflow

  4. www.phylota.net

  5. Select and Download Data • Find a sequence cluster with > 500 sequences and < 2000 base pairs • Tetrapoda • Teleostei • eudicotyledons • Arthropoda

  6. Select and Download Data • Find a sequence cluster with > 500 sequences and < 2000 base pairs • Download the example file of 18S sequences from the class Google Drive: 18S.fa • Tetrapoda • Teleostei • eudicotyledons • Arthropoda
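
The selection criteria on this slide can be checked on a downloaded file before any alignment is attempted. The following is a minimal sketch in plain Python, assuming that "< 2000 base pairs" means every sequence in the cluster is shorter than 2000 bp (the slide does not say whether the limit applies per sequence or to the alignment). The function names `read_fasta` and `meets_criteria` are hypothetical, chosen only for this illustration.

```python
from typing import Dict, Iterable


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into {header: sequence}.

    Records with identical headers overwrite each other in this sketch.
    """
    records: Dict[str, str] = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            records[name] = ""
        elif name is not None:
            # Sequence data may be wrapped over several lines.
            records[name] += line
    return records


def meets_criteria(records: Dict[str, str],
                   min_seqs: int = 500,
                   max_len: int = 2000) -> bool:
    """True if there are more than min_seqs sequences, each shorter than max_len.

    Assumption: the length limit is applied to every individual sequence.
    """
    enough = len(records) > min_seqs
    short = all(len(seq) < max_len for seq in records.values())
    return enough and short


# Usage with the class example file (path assumed to be in the working directory):
# with open("18S.fa") as handle:
#     cluster = read_fasta(handle)
# print(len(cluster), "sequences; usable:", meets_criteria(cluster))
```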

  7. Phyloinformatic workflow • Retrieve Sequences: Phylota, GenBank • Align: MAFFT • Evaluate Alignment: LAST, Gblocks / Guidance

  9. Alignment Programs • Clustal Omega • MAFFT • ProbCons • T-Coffee • PRRN • DECIPHER • MUSCLE • Clustal • Kalign • DIALIGN-T • BAli-Phy

  10. Balance Between Scalability & Accuracy

  11. MAFFT • Align 1,000s of sequences in minutes/hours • Progressive and iterative methods supported • Multiple scoring schemes • Install locally or run on the CBRC servers

  12. Go ahead and try aligning the 18S.fa file that you downloaded from the class Google Drive.
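
For a local MAFFT installation, the alignment step could be scripted roughly as follows. This is a sketch rather than the course's prescribed procedure: it assumes a `mafft` executable is on the PATH and uses MAFFT's `--auto` option, which lets the program choose a strategy based on the size of the data set. MAFFT prints the alignment to standard output, so the sketch redirects that stream to a file. The output file name is an assumption.

```python
import subprocess
from typing import List


def mafft_command(fasta_path: str, executable: str = "mafft") -> List[str]:
    """Build the argument list for a MAFFT run with automatic strategy selection."""
    return [executable, "--auto", fasta_path]


def run_mafft(fasta_path: str, out_path: str) -> None:
    """Run MAFFT and save the alignment it prints on stdout to out_path.

    Raises subprocess.CalledProcessError if MAFFT exits with an error.
    """
    with open(out_path, "w") as out:
        subprocess.run(mafft_command(fasta_path), stdout=out, check=True)


# Usage (file names are assumptions):
# run_mafft("18S.fa", "18S_aligned.fa")
```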

  13. Phyloinformatic workflow • Retrieve Sequences: Phylota, GenBank • Align: MAFFT • Evaluate Alignment: LAST, Gblocks / Guidance

  14. Dot Plot

  15. DELETION / INSERTION

  16. INVERSION

  17. INVERSION • Matches between opposite strand • Matches between same strand
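
The idea behind a dot plot can be illustrated with exact word matches between two sequences. The sketch below is not how LAST computes its plots; it is a simplified stand-in that records every position where a word of length `k` is shared, once for the same strand and once for the opposite strand (the reverse complement). It assumes uppercase A/C/G/T input, and all function names are hypothetical.

```python
from typing import Dict, List, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def word_matches(x: str, y: str, k: int) -> List[Tuple[int, int]]:
    """Return (i, j) pairs where the k-mer starting at x[i] equals the one at y[j]."""
    positions: Dict[str, List[int]] = {}
    for j in range(len(y) - k + 1):
        positions.setdefault(y[j:j + k], []).append(j)
    hits = []
    for i in range(len(x) - k + 1):
        for j in positions.get(x[i:i + k], []):
            hits.append((i, j))
    return hits


def dot_plot(x: str, y: str, k: int = 3):
    """Return (same-strand hits, opposite-strand hits) in coordinates of x and y."""
    same = word_matches(x, y, k)
    rc = reverse_complement(y)
    # A hit at position j on the reverse complement starts at len(y) - k - j on y.
    opposite = [(i, len(y) - k - j) for i, j in word_matches(x, rc, k)]
    return same, opposite
```

Same-strand hits that line up along the main diagonal indicate colinear sequences. Hits that run along the anti-diagonal in the opposite-strand list are the pattern labelled as an inversion on the slide.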

  18. Evaluating the 18S alignment • Look at your dot plots first. What is wrong with the sequences? • How would you fix/prevent this problem?

  19. Evaluating Sites in an Alignment • Bootstrapping - Guidance • ID regions with strong support - Gblocks

  20. GBlocks

  21. GBlocks

  22. GBlocks • 6 I residues • 8 F residues • 9 W residues
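
The residue counts on this slide suggest that columns are judged by how many sequences share the most common residue. The sketch below illustrates only that counting step. The thresholds (more than half of the sequences for "conserved", at least 85% for "highly conserved") and the rule that any gap disqualifies a column are assumptions made for illustration; Gblocks applies further rules when it assembles blocks, and those are not reproduced here.

```python
from collections import Counter
from typing import List


def classify_columns(alignment: List[str],
                     conserved: float = 0.5,
                     highly: float = 0.85) -> List[str]:
    """Label each alignment column by the count of its most common residue.

    alignment: equal-length aligned sequences, with '-' marking gaps.
    Thresholds are fractions of the number of sequences (illustrative values).
    """
    n = len(alignment)
    labels = []
    for column in zip(*alignment):
        if "-" in column:
            labels.append("gap")
            continue
        top = Counter(column).most_common(1)[0][1]
        if top >= highly * n:
            labels.append("highly conserved")
        elif top > conserved * n:
            labels.append("conserved")
        else:
            labels.append("nonconserved")
    return labels
```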

  23. Bootstrapping

  24. Bootstrapping • The scores across the bottom, scaled between 0 and 1, report the proportion of alternative alignments that agree with the assignment of nucleotides in each column of the original MSA
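
A column score of this kind can be sketched as follows. Each column is described by the residues it places together, and its score is the fraction of alternative alignments that contain exactly the same column. This is a simplified illustration and not the scoring code used by Guidance; how the alternative alignments are generated is outside the sketch. It assumes every alignment lists the same sequences in the same order.

```python
from typing import FrozenSet, List, Tuple

Column = FrozenSet[Tuple[int, int]]


def columns(alignment: List[str]) -> List[Column]:
    """Describe each column as the set of (sequence index, residue index) it aligns."""
    counters = [0] * len(alignment)
    result = []
    for col in zip(*alignment):
        members = set()
        for s, ch in enumerate(col):
            if ch != "-":
                members.add((s, counters[s]))
                counters[s] += 1
        result.append(frozenset(members))
    return result


def column_scores(reference: List[str],
                  replicates: List[List[str]]) -> List[float]:
    """Fraction of replicate alignments that reproduce each reference column."""
    replicate_cols = [set(columns(r)) for r in replicates]
    scores = []
    for col in columns(reference):
        agree = sum(col in cols for cols in replicate_cols)
        scores.append(agree / len(replicates))
    return scores
```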

  25. Try The Data You Downloaded • Make an alignment • Check the dot plots • Use Gblocks to remove uncertain sites • How many sites are in the initial alignment? • How many sites are in the filtered alignment? • Did you lose any taxa?
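
The three questions at the end of this exercise can be answered with a short comparison of the two alignments. The sketch below assumes each alignment has already been read into a dictionary of name to aligned sequence, and it treats a taxon as "lost" if it is absent from the filtered alignment or consists only of gaps there. That definition is an assumption, as is the function name.

```python
from typing import Dict, List, Tuple


def summarize_filtering(initial: Dict[str, str],
                        filtered: Dict[str, str]) -> Tuple[int, int, List[str]]:
    """Return (sites in initial alignment, sites in filtered alignment, taxa lost)."""

    def n_sites(aln: Dict[str, str]) -> int:
        # All sequences in an alignment have the same length, so check the first.
        return len(next(iter(aln.values()))) if aln else 0

    lost = [name for name in initial
            if name not in filtered or set(filtered[name]) <= {"-"}]
    return n_sites(initial), n_sites(filtered), lost
```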

  26. Treat your alignment as a model parameter! • BAli-Phy: Estimates phylogenetic trees across all possible alignments without conditioning on a single alignment being “true” • Thanks for listening to me!
