
Meta’omic Analysis with MetaPhlAn, HUMAnN, and LEfSe




  1. Meta’omic Analysis with MetaPhlAn, HUMAnN, and LEfSe Curtis Huttenhower 08-08-13 Harvard School of Public Health Department of Biostatistics

  2. Some setup notes • Slides with green titles or text include instructions not needed today, but useful for your own analyses • Keep an eye out for red warnings of particular importance • Command lines and program/file names appear in a monospaced font.

  3. Getting some HMP data • Go to http://hmpdacc.org • Click “Get Data”

  4. Getting some HMP data • Check out what’s available • Click “HMIWGS”

  5. Getting some HMP data • Check out what’s available • Click on your favorite body site

  6. Getting some HMP data • Check out what’s available • Don’t click on anything!

  7. Getting some (prepped) HMP data • Connect to the server instead and run: • for S in `ls /class/stamps-shared/chuttenh/7*.fasta`; do ln -s $S; done • These are subsamples of six HMP files: • SRS014459.tar.bz2  763577454-SRS014459-Stool.fasta • SRS014464.tar.bz2  763577454-SRS014464-Anterior_nares.fasta • SRS014470.tar.bz2  763577454-SRS014470-Tongue_dorsum.fasta • SRS014472.tar.bz2  763577454-SRS014472-Buccal_mucosa.fasta • SRS014476.tar.bz2  763577454-SRS014476-Supragingival_plaque.fasta • SRS014494.tar.bz2  763577454-SRS014494-Posterior_fornix.fasta • All six shotgunned body sites from one subject, first visit • Subsampled to 20,000 reads
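
  A quick sanity check (a minimal sketch using standard shell tools):

    # should list six symlinks, one per body site
    ls -l 7*.fasta
    # each subsample should contain ~20,000 reads (one FASTA header per read)
    grep -c '>' 763577454-SRS014459-Stool.fasta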

  8. Who’s there: MetaPhlAn • [Slide diagram: gene X shown as a core gene for clade Y vs. gene X as a unique marker gene for clade Y]

  9. Who’s there: MetaPhlAn • Go to http://huttenhower.sph.harvard.edu/metaphlan • Scroll down

  10. Who’s there: MetaPhlAn • You could download MetaPhlAn by clicking here

  11. Who’s there: MetaPhlAn • But don’t! Instead, we’ve downloaded MetaPhlAn already for you by clicking here

  12. Who’s there: MetaPhlAn • And saving just the file metaphlan.py in stamps-software/bin

  13. Who’s there: MetaPhlAn • You could download the bowtie2 database here

  14. From the command line... • But don’t! Instead, get it by: • for S in `ls /class/stamps-shared/chuttenh/*.bt2`; do ln -s $S; done • To see what you can do, run: • module load stamps • metaphlan.py -h | less • Use the arrow keys to move up and down, q to quit back to the prompt

  15. Who’s there: MetaPhlAn • [Screenshot: metaphlan.py help text]

  16. Who’s there: MetaPhlAn • For future reference: these extra options aren’t necessary if you download the whole “default” MetaPhlAn package • metaphlan.py your_input.fasta > your_output.txt • To launch your first analysis, run: • metaphlan.py --bowtie2db mpa 763577454-SRS014459-Stool.fasta > 763577454-SRS014459-Stool.txt

  17. Who’s there: MetaPhlAn • What did you just do? • Two new output files: • 763577454-SRS014459-Stool.fasta.bowtie2out.txt • Contains a mapping of reads to MetaPhlAn markers • 763577454-SRS014459-Stool.txt • Contains taxonomic abundances as percentages

  18. Who’s there: MetaPhlAn • less 763577454-SRS014459-Stool.fasta.bowtie2out.txt

  19. Who’s there: MetaPhlAn • less 763577454-SRS014459-Stool.txt

  20. Who’s there: MetaPhlAn • Now finish the job: • metaphlan.py --bowtie2db mpa 763577454-SRS014464-Anterior_nares.fasta > 763577454-SRS014464-Anterior_nares.txt • ... • Note that you can use the up arrow key to make your life easier!
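
  Rather than re-typing the command five times, a minimal bash loop sketch (assuming all six subsamples follow the 763577454-*.fasta naming above):

    # profile each body site, skipping any sample already done (e.g. Stool)
    for FASTA in 763577454-*.fasta; do
        OUT="${FASTA%.fasta}.txt"
        [ -e "$OUT" ] || metaphlan.py --bowtie2db mpa "$FASTA" > "$OUT"
    done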

  21. Who’s there: MetaPhlAn • Let’s make a single table containing all six samples: • mkdir tmp • mv *.bowtie2out.txt tmp • merge_tables.py -l -d *.txt | zero.py > 763577454.tsv • You can look at this file using less • Note 1: The arguments less -x4 -S will help • Note 2: You can set this “permanently” using export LESS="-x4 -S"
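
  To confirm the merge worked, a quick sketch (assuming merge_tables.py writes the sample names as a header row):

    # expect 7 columns: clade name plus 6 samples
    head -1 763577454.tsv | tr '\t' '\n' | wc -l
    # one row per clade detected in any sample
    wc -l 763577454.tsv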

  22. Who’s there: MetaPhlAn • But it’s easier using MeV; go to http://www.tm4.org/mev.html • Click “Download”

  23. An interlude: MeV • Don’t forget to transfer your 763577454.tsv file locally for viewing using scp • Unzip, launch MeV, and select File/Load data

  24. An interlude: MeV • Click “Browse” and select your TSV file, then • Tell MeV it’s a two-color array • Uncheck “Load annotation” • Click on the upper-leftmost data value

  25. An interlude: MeV • “Load” your data, then make it visible by: • Display/Set Color Scale Limits • Choose Single Gradient, min 0, max 10

  26. An interlude: MeV • Finally, to play around a bit: • Display/Set Element Size/whatever you’d like • Clustering/Hierarchical Clustering • Optimize both gene and sample order • And select Manhattan Distance (imperfect!)

  27. An interlude: MeV • If you’d like, you can • Display/Sample-Column Labels/Abbr. Names

  28. An interlude: MeV • MeV is a tool; imperfect, but convenient • You should likely include just “leaf” nodes • Species, whose names include “s__” • You can filter your file using: cat 763577454.tsv | grep -E '(Stool)|(s__)' > 763577454_species.tsv • You can, but might not want to, z-score normalize • Adjust Data/Gene-Row Adjustments/Normalize Genes-Rows • Many other tools built in; experiment!
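
  Note that the '(Stool)' half of that pattern keeps the header line only because one column name happens to contain “Stool”; a sketch that keeps the header regardless of sample names (assuming the header is the first line of the merged table):

    ( head -1 763577454.tsv; grep 's__' 763577454.tsv ) > 763577454_species.tsv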

  29. What they’re doing: HUMAnN • Back to the task at hand; you could download HUMAnN at: http://huttenhower.sph.harvard.edu/humann • Click here

  30. What they’re doing: HUMAnN • ...but instead we’ve already downloaded it • Expand HUMAnN (no install!) • tar -xzf /class/stamps-shared/chuttenh/sources/humann-0.98.tar.gz • Set up a link to the KEGG reference DB: • ln -s /class/stamps-shared/chuttenh/kegg.reduced.udb • And although you would normally download USEARCH from here: • http://www.drive5.com/usearch/download.html • We’re going to use it preinstalled instead

  31. What they’re doing: HUMAnN • If we weren’t all running this, you’d need to: • Get KEGG (used to be free; now it’s not!) • Fortunately, we have a HUMAnN-compatible distributable version; contact me... • Index it for USEARCH: • usearch6 -makeudb_usearch kegg.reduced.fasta -output kegg.reduced.udb • This takes a minute or two, so we’ve precomputed it; thus, forge ahead...

  32. What they’re doing: HUMAnN • Did you notice that we didn’t QC our data at all? • MetaPhlAn is very robust to junk sequence • HUMAnN is pretty robust, but not quite as much • We’ve already run a standard metagenomic QC: • Quality trim by removing bad bases (typically Q ~15) • Length filter to remove reads trimmed below ~75% of their original length

  33. What they’re doing: HUMAnN • Must start from FASTQ files to do this • Quality trim by removing bad bases: • TrimBWAstyle.py < your_data.fastq > your_trimmed_data.fastq • Length filter by removing short sequences: • 75% of original length is standard (thus 75nt from 100nt reads) • remove_bad_seqs.py 75 < your_trimmed_data.fastq > your_filtered_data.fastq • Now convert your FASTQ to a FASTA: • fastq2fasta.py < your_filtered_data.fastq > your_filtered_data.fasta • Some final caveats: • If you’re using paired end reads, match filters! • See my course homeworks at http://huttenhower.sph.harvard.edu/bio508 • Aren’t you glad you’re not doing this today?
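
  Since each script reads stdin and writes stdout, the three steps can also be chained into one pipeline (a sketch under exactly that assumption):

    # quality trim, length filter at 75%, then convert to FASTA in one pass
    TrimBWAstyle.py < your_data.fastq \
        | remove_bad_seqs.py 75 \
        | fastq2fasta.py > your_filtered_data.fasta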

  34. What they’re doing: HUMAnN • Enter the humann directory • cd humann-0.98 • Run your first translated BLAST search: • usearch6.0.192_i86linux32 -usearch_local ../763577454-SRS014459-Stool.fasta -db ../kegg.reduced.udb -id 0.8 -blast6out input/763577454-SRS014459-Stool.txt • What did you just do? • less input/763577454-SRS014459-Stool.txt • Recall BLAST’s tab-delimited output headers: • qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore • Rinse and repeat for the remaining samples
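
  “Rinse and repeat” can also be a loop; a minimal sketch (run from inside humann-0.98, with the sample FASTAs one directory up as above):

    for FASTA in ../763577454-*.fasta; do
        NAME=$(basename "$FASTA" .fasta)
        # skip the Stool sample already searched above
        [ -e "input/$NAME.txt" ] || usearch6.0.192_i86linux32 -usearch_local "$FASTA" \
            -db ../kegg.reduced.udb -id 0.8 -blast6out "input/$NAME.txt"
    done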

  35. What they’re doing: HUMAnN • Normally you’d need to install SCons from: • http://www.scons.org • Instead, we’ll use it preinstalled as well, so... GO! • /class/stamps-software/scons/bin/scons • You should see a bunch of text scroll by • Note: you can run scons -j8 to parallelize tasks

  36. What they’re doing: HUMAnN • After a minute or two, you should see: [Screenshot: HUMAnN’s finished build output]

  37. What they’re doing: HUMAnN • This has created four main files: • Two each for pathways (big) and modules (small) • Two each for coverage and relative abundance • Each is tab-delimited text with one column per sample • All four are in the output directory: • output/04a-hit-keg-mpt-cop-nul-nve-nve-xpe.txt • Coverage (a) of pathways (t) • output/04a-hit-keg-mpm-cop-nul-nve-nve-xpe.txt • Coverage (a) of modules (m) • output/04b-hit-keg-mpt-cop-nul-nve-nve.txt • Abundance (b) of pathways (t) • output/04b-hit-keg-mpm-cop-nul-nve-nve.txt • Abundance (b) of modules (m) • I almost always just use 04b-mpm (module abundances)
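
  For a quick peek at just the module IDs and the first sample column (a sketch; the file is tab-delimited as noted above):

    cut -f1,2 output/04b-hit-keg-mpm-cop-nul-nve-nve.txt | less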

  38. What they’re doing: HUMAnN • Let’s take a look: • less output/04b-hit-keg-mpm-cop-nul-nve-nve.txt

  39. What they’re doing: HUMAnN • That’s ugly; it gets much better in Excel • Note: this is very sparse since we’re using a small subset of KEGG • Note: the mock community demo data is included on the right

  40. What they’re doing: HUMAnN • And there’s nothing stopping us from using MeV • Or R, or QIIME, or anything that’ll read tab-delimited text

  41. What matters: LEfSe • All of these analyses give you tables • 16S → OTUs • MetaPhlAn → species • HUMAnN → modules (or pathways or genes) • If you know something about your samples... • Case/control • Different habitats • Different time points • How can you identify features that change?

  42. What matters: LEfSe • Let’s get all of the HMP species data: http://hmpdacc.org/resources/data_browser.php • Click “HMSMCP”

  43. What matters: LEfSe • Download the MetaPhlAn table for all 700 samples • Right-click this irritatingly tiny icon

  44. Downloading from the command line • Instead of saving this, download it by: • Right-click to copy the URL • Run wget <paste URL here> • Note: curl -O <URL> works just as well

  45. What matters: LEfSe • Make sure this file is in your home directory, and expand it: • bunzip2 HMP.ab.txt.bz2 • Look at the result: • less HMP.ab.txt • IMPORTANT!!! This file’s too big to analyze directly today; link to the prefiltered version instead: • ln -s /class/stamps-shared/chuttenh/HMP.ab.filtered.txt • This is great: tons of data, but no metadata • HUMAnN to the rescue • From the humann-0.98 directory: python src/metadata.py input/hmp_metadata.dat < ../HMP.ab.filtered.txt > ../HMP.ab.filtered.metadata.tsv • NOW take a look again

  46. What matters: LEfSe • Let’s modify this file to be LEfSe-compatible • Open it up in Excel

  47. What matters: LEfSe • Delete all of the metadata rows except: • RANDSID and STSite • Save it as tab-delimited text: HMP.ab.filtered.metadata.txt

  48. What matters: LEfSe • Visit LEfSe at: http://huttenhower.sph.harvard.edu/lefse • Click here

  49. What matters: LEfSe • Then upload your formatted table • After you upload, wait for the progress meter to turn green! • [Screenshot callouts: 1. Click here 2. Then here 3. Then here 4. Then watch here]

  50. What matters: LEfSe • Then tell LEfSe about your metadata: • [Screenshot callouts: 1. Click here 2. Then select STSite 3. Then select RANDSID 4. Then here]
