
Computational metagenomics and the human microbiome

Computational metagenomics and the human microbiome. Curtis Huttenhower, 01-21-11, Harvard School of Public Health, Department of Biostatistics. What to do with your metagenome? A reservoir of gene and protein functional information, and a comprehensive snapshot of microbial ecology and evolution.


Presentation Transcript


  1. Computational metagenomics and the human microbiome. Curtis Huttenhower, 01-21-11, Harvard School of Public Health, Department of Biostatistics

  2. What to do with your metagenome?
     • Reservoir of gene and protein functional information
     • Comprehensive snapshot of microbial ecology and evolution
     • Who’s there? What are they doing?
     • What do functional genomic data tell us about microbiomes?
     • What can our microbiomes tell us about us?* (×10^10)
     • Public health tool for monitoring population health and interactions
     • Diagnostic or prognostic biomarker for host disease
     *Using terabases of sequence and thousands of experimental results

  3. The Human Microbiome Project (2007, ongoing)
     • All healthy subjects; follow-up projects in psoriasis, Crohn’s disease, colitis, obesity, acne, cancer, antibiotic-resistant infection…
     • 300 “normal” adults, aged 18-40
     • 16S rDNA + WGS
     • 5 sites/18 samples + blood
       Oral cavity: saliva, tongue, palate, buccal mucosa, gingiva, tonsils, throat, teeth
       Skin: ears, inner elbows
       Nasal cavity
       Gut: stool
       Vagina: introitus, mid, fornix
     • Reference genomes (~200+800)
     (Kolenbrander, 2010; Hamady, 2009)

  4. HMP organisms: Everyone and everywhere is different
     [Figure: organisms (taxa) across body sites and individuals; sites: gut, nose, mouth, arm, vagina, ear, mucosa, palate, gingiva, tonsils, saliva, subgingival plaque, supragingival plaque, throat, tongue]
     • Aerobicity, interaction with the immune system, and extracellular medium appear to be major determinants
     • Every microbiome is surprisingly different
     • Even common organisms vary tremendously in abundance among individuals
     • There are few organismal biotypes in health
     • Most organisms are rare in most places
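To make “every microbiome is surprisingly different” concrete, one common way to quantify between-sample variation is Bray-Curtis dissimilarity on relative abundances. The slide does not say which measure underlies the figure, so this is a swapped-in illustration, not the HMP analysis; the taxon names and abundances in the usage example are made up.

```python
from itertools import combinations


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two profiles given as {taxon: abundance}.

    0.0 means identical profiles; 1.0 means no taxa in common.
    """
    taxa = set(a) | set(b)
    shared = sum(min(a.get(t, 0.0), b.get(t, 0.0)) for t in taxa)
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        return 0.0  # two empty profiles: treat as identical
    return 1.0 - 2.0 * shared / total


def mean_pairwise_dissimilarity(profiles):
    """Average Bray-Curtis dissimilarity over all pairs of samples."""
    pairs = list(combinations(profiles, 2))
    if not pairs:
        raise ValueError("need at least two profiles")
    return sum(bray_curtis(x, y) for x, y in pairs) / len(pairs)


# Hypothetical relative abundances for three individuals' stool samples.
subjects = [
    {"Bacteroides": 0.60, "Prevotella": 0.05, "Faecalibacterium": 0.35},
    {"Bacteroides": 0.10, "Prevotella": 0.70, "Faecalibacterium": 0.20},
    {"Bacteroides": 0.45, "Faecalibacterium": 0.25, "Akkermansia": 0.30},
]
print(round(mean_pairwise_dissimilarity(subjects), 3))
```

A mean pairwise value near 1.0 would indicate that individuals share little of their community composition.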

  5. HUMAnN: Community metabolic and functional reconstruction
     • HMP Unified Metabolic Analysis Network (HUMAnN)
     • Data: 300 subjects, 1-3 visits/subject, ~6 body sites/visit, 10-200M reads/sample, 100 bp reads
     • Functional sequence databases: KEGG + MetaCyc; CAZy, TCDB, VFDB, MEROPS…
     • Flow: WGS reads → Genes (KOs) → Pathways (KEGGs) and modules
     • BLAST → Genes
     • Smoothing: Witten-Bell
     • Genes → Pathways: MinPath (Ye 2009)
     • Taxonomic limitation: remove pathways in taxa below average (?)
     • Xipe: distinguish zero from low abundance (Rodriguez-Mueller, in review)
     • Gap filling: c(g) = max(c(g), median)
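The slide names two post-processing steps without spelling them out. The sketch below shows the textbook unigram Witten-Bell estimate and the gap-filling rule c(g) = max(c(g), median) exactly as written on the slide, both applied to the genes of a single pathway. How HUMAnN itself applies the smoothing, and over which set of genes the median is taken, are assumptions here; the KO labels are hypothetical.

```python
from statistics import median


def witten_bell(counts):
    """Textbook unigram Witten-Bell estimate over a fixed set of genes.

    counts: {gene_id: read count}, including genes observed zero times.
    Seen genes get c(g) / (N + T); the reserved mass T / (N + T) is split
    evenly among the Z unseen genes, where N is the total count and T the
    number of distinct genes seen.
    """
    n = sum(counts.values())
    t = sum(1 for c in counts.values() if c > 0)
    z = len(counts) - t
    if n == 0:
        return {g: 1.0 / len(counts) for g in counts}  # nothing observed: uniform
    if z == 0:
        return {g: c / n for g, c in counts.items()}  # no unseen genes to reserve mass for
    return {
        g: (c / (n + t) if c > 0 else t / (z * (n + t)))
        for g, c in counts.items()
    }


def gap_fill(abundances):
    """Gap filling as written on the slide: c(g) = max(c(g), median).

    Assumption: the median is taken over the genes of one pathway.
    """
    m = median(abundances.values())
    return {g: max(c, m) for g, c in abundances.items()}


# Hypothetical read counts for the four genes (KOs) of one pathway.
pathway_counts = {"K_a": 3, "K_b": 1, "K_c": 0, "K_d": 0}
print(witten_bell(pathway_counts))
print(gap_fill({"K_a": 4.0, "K_b": 0.0, "K_c": 2.0}))  # K_b is raised to the median, 2.0
```

Witten-Bell reserves probability for unseen genes in proportion to how many distinct genes were seen, so a pathway with many detected genes gives its missing ones more benefit of the doubt.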

  6. HUMAnN: Community metabolic and functional reconstruction
     [Figure panels: pathway coverage; pathway abundance]
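The two panels contrast pathway coverage (is the pathway there?) with pathway abundance (how much of it is there?). Below is a minimal sketch of one way to compute both from gene abundances. The detection threshold and the “mean of the more abundant half of the genes” rule are illustrative assumptions, not HUMAnN’s published formulas, and the gene IDs are hypothetical.

```python
def pathway_coverage(gene_abund, pathway_genes, threshold=0.0):
    """Fraction of a pathway's genes detected above `threshold`."""
    detected = sum(1 for g in pathway_genes if gene_abund.get(g, 0.0) > threshold)
    return detected / len(pathway_genes)


def pathway_abundance(gene_abund, pathway_genes):
    """Mean abundance of the more abundant half of the pathway's genes.

    Averaging only the top half keeps one or two missing genes from
    dragging the estimate to zero (an illustrative choice).
    """
    values = sorted((gene_abund.get(g, 0.0) for g in pathway_genes), reverse=True)
    top = values[: max(1, (len(values) + 1) // 2)]
    return sum(top) / len(top)


genes = {"K1": 5.0, "K2": 3.0, "K3": 0.0}  # K4 not observed at all
pathway = ["K1", "K2", "K3", "K4"]
print(pathway_coverage(genes, pathway))   # 0.5
print(pathway_abundance(genes, pathway))  # 4.0
```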

  7. HUMAnN: Validating gene and pathway abundances on synthetic data
     • Validated on individual genes, module coverage + abundance
     • False negatives: short genes (<100 bp), taxonomically rare pathways
     • False positives: large and multicopy genes (not many in bacteria)
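Validation on synthetic data comes down to comparing what the pipeline calls present against what was planted in the simulated community. A minimal sketch of that bookkeeping follows; the set-based scoring and the pathway IDs are assumptions for illustration, not the evaluation code used for HUMAnN.

```python
def evaluate_calls(predicted, truth):
    """Compare predicted vs. true sets of present genes or pathways."""
    tp = len(predicted & truth)
    fp = len(predicted - truth)  # called present but not planted
    fn = len(truth - predicted)  # planted but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "precision": precision, "recall": recall}


truth = {"P1", "P2", "P3", "P4"}  # pathways planted in a hypothetical synthetic community
predicted = {"P1", "P2", "P3", "P9"}
print(evaluate_calls(predicted, truth))
```

Inspecting which items land in the false-negative and false-positive sets is what surfaces patterns like the short-gene and multicopy-gene effects listed on the slide.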

  8. HUMAnN: The steps that didn’t make the cut
     [Figure panels: abundance; coverage]

  9. Functional modules in 741 HMP samples
     • Zero microbes (of ~1,000) are core among body sites
     • Zero microbes are core among individuals
     • 19 (of ~220) pathways are present in every sample
     • 53 pathways are present in 90%+ of samples
     [Figure: pathway coverage and abundance, pathways × samples; body sites PF, O(BM), S, O(SP), O(TD), RC, AN]
     • Only 31 (of 1,110) pathways are present/absent from exactly one body site
     • 263 pathways are differentially abundant in exactly one body site
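Counts such as “19 pathways present in every sample” and “53 present in 90%+ of samples” come from a prevalence calculation over a presence/absence table. A minimal sketch follows. How a pathway is called present in a sample is left to an upstream step, the reading of “present/absent from exactly one body site” (present only there, or absent only there) is our interpretation, and the sample, site, and pathway names are hypothetical.

```python
from collections import Counter


def prevalence(presence):
    """presence: {sample_id: set of pathways detected}. Returns {pathway: fraction of samples}."""
    counts = Counter(p for paths in presence.values() for p in paths)
    n = len(presence)
    return {p: c / n for p, c in counts.items()}


def core_pathways(presence, min_fraction=1.0):
    """Pathways detected in at least `min_fraction` of samples (1.0 = every sample)."""
    return {p for p, f in prevalence(presence).items() if f >= min_fraction}


def single_site_pathways(presence_by_site):
    """Pathways whose presence pattern singles out exactly one body site:
    present in only one site, or absent from only one site."""
    n_sites = len(presence_by_site)
    counts = Counter(p for paths in presence_by_site.values() for p in paths)
    return {p for p, c in counts.items() if c == 1 or c == n_sites - 1}


samples = {"s1": {"A", "B"}, "s2": {"A", "B"}, "s3": {"A"}, "s4": {"A", "C"}}
sites = {"oral": {"A", "B"}, "gut": {"A", "C"}, "skin": {"A", "B"}}
print(core_pathways(samples))                    # {'A'}
print(core_pathways(samples, min_fraction=0.5))  # A and B
print(single_site_pathways(sites))               # B (absent only from gut), C (present only in gut)
```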

  10. Microbial environment trumps host environment (in health)
     [Figure: pathways present in all body sites (“core”), microbes vs. pathways; HMP stool colored by BMI; MetaHIT stool colored by IBD; aerobic vs. gastrointestinal body sites]
     • Human microbiome structure is dictated primarily by microbial niche, not host (in health)
     • Huge variation in who’s there; small variation in what they’re doing
     • Note: there is definitely variation in how these functions are implemented
     • Does not yet speak to environment (diet!), genetics, or disease

  11. Metagenomic biomarker discovery
     • Phenotypes: intervention/perturbation, healthy/IBD, BMI, diet
     • Features: gene expression, taxa & pathways, SNP genotypes, niches & phylogeny
     • Test for correlates, accounting for confounds/stratification/environment (batch effects? population structure?)
     • Feature selection (p >> n), with multiple hypothesis correction
     • Cross-validate; confirm in an independent sample
     • Biological story?
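Multiple hypothesis correction matters here because p >> n: thousands of taxa, genes, and pathways are tested against a single phenotype. The Benjamini-Hochberg false discovery rate procedure is one standard choice; the slide does not say which method was used, so treat this as an illustrative sketch with hypothetical p-values.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices from smallest to largest p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each q-value is the minimum of
    # p * m / rank over all ranks at or above it (enforces monotonicity).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values from testing four pathways for association with IBD.
pvals = [0.01, 0.04, 0.03, 0.5]
qvals = benjamini_hochberg(pvals)
significant = [i for i, q in enumerate(qvals) if q < 0.05]
print(qvals, significant)
```

Note that the pathways with raw p = 0.03 and p = 0.04 both fall above 0.05 after adjustment; only the first survives.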

  12. LEfSe: Metagenomic class comparison and explanation
     • LEfSe: LDA Effect Size (Nicola Segata)
     • http://huttenhower.sph.harvard.edu/lefse

  13. LEfSe: Evaluation on synthetic data

  14. Microbes characteristic of the oral and gut microbiota

  15. Aerobic, microaerobic and anaerobic communities
     • High oxygen: skin, nasal
     • Mid oxygen: vaginal, oral
     • Low oxygen: gut

  16. LEfSe: The TRUC murine colitis microbiota (with Wendy Garrett)

  17. MetaHIT: The gut microbiome and IBD (with Ramnik Xavier, Joshua Korzenik)
     • 124 subjects: 99 healthy, 21 UC + 4 CD (Qin 2010)
     • Taxa: Phymm (Brady 2009)
     • WGS reads → Genes (KOs) → Pathways (KEGGs) and modules; reads were re-BLASTed against KEGG because the published data obscure read counts
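Re-BLASTing reads against KEGG means turning per-read alignments back into per-gene (KO) counts. The simplest version keeps each read’s single best hit by bit score from BLAST’s tabular output (-outfmt 6, whose 12th column is the bit score) and counts hits per KO. The best-hit rule is an assumption for illustration (HUMAnN may weight multiple hits differently), and the gene-to-KO mapping below is hypothetical.

```python
import csv
from collections import Counter


def best_hits(lines):
    """Map each query read to its subject gene with the highest bit score.

    `lines` is an iterable of BLAST tabular (-outfmt 6) rows:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    best = {}
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        query, subject, bitscore = row[0], row[1], float(row[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def ko_counts(hits, gene_to_ko):
    """Count reads per KO; reads whose best-hit gene has no KO are dropped."""
    return Counter(gene_to_ko[gene] for gene in hits.values() if gene in gene_to_ko)


blast_rows = [
    "read1\tgeneA\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-20\t150",
    "read1\tgeneB\t80.0\t100\t20\t0\t1\t100\t1\t100\t1e-10\t90",
    "read2\tgeneB\t95.0\t100\t5\t0\t1\t100\t1\t100\t1e-30\t180",
]
gene_to_ko = {"geneA": "KO_1", "geneB": "KO_2"}  # hypothetical mapping
print(ko_counts(best_hits(blast_rows), gene_to_ko))
```

In practice you would pass an open file handle instead of a list; `csv.reader` accepts either.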

  18. MetaHIT: Taxonomic CD biomarkers
     [Figure: taxa up and down in CD; labeled: Firmicutes, UC, Enterobacteriaceae]

  19. MetaHIT: Functional CD biomarkers
     [Figure: subsets of enriched pathways and modules in CD patients, up and down in CD; labeled categories: growth/replication, motility, transporters, sugar metabolism]

  20. Sleipnir: Software for scalable functional genomics
     Massive datasets require efficient algorithms and implementations.
     • Sleipnir C++ library for computational functional genomics
     • Data types for biological entities
       Microarray data, interaction data, genes and gene sets, functional catalogs, etc.
     • Network communication, parallelization
     • Efficient machine learning algorithms
       Generative (Bayesian) and discriminative (SVM)
     • And it’s fully documented!
     It’s also speedy: microbial data integration computation takes <3 hours.
     http://huttenhower.sph.harvard.edu/sleipnir
     http://huttenhower.sph.harvard.edu/lefse
     http://huttenhower.sph.harvard.edu/humann

  21. Thanks!
     Human Microbiome Project
     George Weinstock, Jennifer Wortman, Owen White, Makedonka Mitreva, Erica Sodergren, Vivien Bonazzi, Jane Peterson, Lita Proctor, Sahar Abubucker, Yuzhen Ye, Beltran Rodriguez-Mueller, Jeremy Zucker, Qiandong Zeng, Mathangi Thiagarajan, Brandi Cantarel, Maria Rivera, Barbara Methe, Bill Klimke, Daniel Haft, Dirk Gevers, Jacques Izard, Nicola Segata, Pinaki Sarder, Ramnik Xavier
     HMP Metabolic Reconstruction
     Wendy Garrett, Sarah Fortune, Bruce Birren, Mark Daly, Doyle Ward, Eric Alm, Ashlee Earl, Lisa Cosimi, Levi Waldron, Larisa Miropolsky
     Interested? We’re recruiting students and postdocs! http://huttenhower.sph.harvard.edu/

  22. The LEfSe algorithm
     • Statistical consistency
     • Biological consistency
     • Overall effect size
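The three LEfSe criteria can be illustrated with a simplified two-class sketch: a Kruskal-Wallis test for statistical consistency, then an effect-size filter. It omits the pairwise Wilcoxon tests among subclasses that LEfSe uses for biological consistency, and it substitutes a plain log10 difference of class means for LEfSe’s LDA score, so it is a stand-in rather than LEfSe itself. The parts-per-million rescaling, the 0.05 and 2.0 thresholds, and the feature names are assumptions.

```python
import math
from collections import Counter


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_wallis_2(a, b):
    """Kruskal-Wallis H (tie-corrected) for two groups; p from chi-square with 1 df."""
    values = list(a) + list(b)
    n = len(values)
    ranks = average_ranks(values)
    ra, rb = sum(ranks[: len(a)]), sum(ranks[len(a):])
    h = 12.0 / (n * (n + 1)) * (ra ** 2 / len(a) + rb ** 2 / len(b)) - 3 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(values).values())
    if ties == n ** 3 - n:
        return 0.0, 1.0  # every value identical: no evidence of a difference
    h /= 1 - ties / (n ** 3 - n)
    h = max(h, 0.0)  # guard against tiny negative rounding error
    return h, math.erfc(math.sqrt(h / 2))  # chi-square (1 df) survival function


def effect_score(a, b, scale=1e6):
    """Stand-in for the LDA score: log10(1 + |difference in class means|),
    with relative abundances rescaled to parts per `scale` (assumption)."""
    diff = abs(sum(a) / len(a) - sum(b) / len(b)) * scale
    return math.log10(1.0 + diff)


def lefse_like(features, labels, alpha=0.05, min_score=2.0):
    """Keep features that differ significantly between two classes and have a large effect."""
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("this sketch handles exactly two classes")
    selected = {}
    for name, values in features.items():
        a = [v for v, lab in zip(values, labels) if lab == classes[0]]
        b = [v for v, lab in zip(values, labels) if lab == classes[1]]
        _, p = kruskal_wallis_2(a, b)  # statistical consistency
        if p >= alpha:
            continue
        score = effect_score(a, b)  # overall effect size
        if score >= min_score:
            selected[name] = (p, score)
    return selected


# Hypothetical relative abundances for 4 CD and 4 healthy samples.
features = {
    "taxon_X": [0.30, 0.35, 0.40, 0.32, 0.01, 0.02, 0.015, 0.03],
    "taxon_Y": [0.10, 0.20, 0.15, 0.12, 0.11, 0.19, 0.14, 0.13],
}
labels = ["CD"] * 4 + ["healthy"] * 4
print(lefse_like(features, labels))
```

Filtering on significance first and effect size second mirrors the slide’s ordering: a feature must differ consistently before its magnitude is considered.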

  23. HMP: Metabolism, host-microbiome interactions, and microbial taxa
     • >3,200 gene families differential in the mucosa
     • >1,500 upregulated outside the mucosa and not in any Actinobacterial genome
     [Figure panels: WGS; 16S]
