
Canadian Bioinformatics Workshops






Presentation Transcript


  1. Canadian Bioinformatics Workshops www.bioinformatics.ca

  2. Module #: Title of Module

  3. Module 6 David Wishart Informatics and Statistics for Metabolomics June 16-17, 2011

  4. A Typical Metabolomics Experiment

  5. [Figure: PCA scores plot (PC1 vs. PC2) with groups labeled ANIT, Control and PAP, alongside annotated NMR spectra (7 to 1 ppm) showing TMAO, creatinine, hippurate, allantoin, taurine, citrate, urea, 2-oxoglutarate, water, succinate and fumarate] 2 Routes to Metabolomics • Quantitative (Targeted) Methods • Chemometric (Profiling) Methods

  6. Metabolomics Data Workflow • Chemometric Methods: Data Integrity Check → Spectral alignment or binning → Data normalization → Data QC/outlier removal → Data reduction & analysis → Compound ID • Targeted Methods: Data Integrity Check → Compound ID and quantification → Data normalization → Data QC/outlier removal → Data reduction & analysis

  7. Data Integrity/Quality • LC-MS and GC-MS produce a high number of false positive peaks • Problems with adducts (LC), extra derivatization products (GC), isotopes, breakdown products (ionization issues), etc. • Not usually a problem with NMR • Check using replicates and adduct calculators MZedDB http://maltese.dbs.aber.ac.uk:8888/hrmet/index.html HMDB http://www.hmdb.ca/search/spectra?type=ms_search
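An adduct calculator essentially adds a fixed mass shift to a compound's neutral monoisotopic mass and checks whether an observed peak falls within a ppm tolerance. The sketch below assumes a short, hand-picked adduct list with approximate literature shift values and a hypothetical `explain_peak` helper; it is not the MZedDB or HMDB implementation, and real tools cover many more adducts, charges and fragments.

```python
# Approximate mass shifts (Da) for a few common singly charged ESI adducts
# of a neutral monoisotopic mass M. Assumed values: verify against a
# curated adduct table before relying on them.
PROTON = 1.007276
ADDUCT_SHIFTS = {
    "[M+H]+": PROTON,
    "[M+NH4]+": 18.033823,
    "[M+Na]+": 22.989218,
    "[M+K]+": 38.963158,
    "[M-H]-": -PROTON,
}


def adduct_mz(neutral_mass: float) -> dict:
    """Return the expected m/z of each listed adduct (charge 1)."""
    return {name: neutral_mass + shift for name, shift in ADDUCT_SHIFTS.items()}


def explain_peak(observed_mz: float, candidates: dict, tol_ppm: float = 10.0) -> list:
    """Hypothetical helper: list (compound, adduct) pairs whose predicted
    m/z lies within tol_ppm of the observed peak."""
    hits = []
    for compound, mass in candidates.items():
        for adduct, mz in adduct_mz(mass).items():
            if abs(observed_mz - mz) / mz * 1e6 <= tol_ppm:
                hits.append((compound, adduct))
    return hits


# Glucose (monoisotopic mass ~180.0634) seen as a sodium adduct.
print(explain_peak(203.0526, {"glucose": 180.063388}))
```

A peak explained only as an adduct of a compound already present as [M+H]+ is a candidate false positive rather than a new metabolite.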

  8. Data/Spectral Alignment • Important for LC-MS and GC-MS studies • Less critical for NMR (mainly needed when pH variation shifts peaks) • Many programs available (XCMS, ChromA, MZmine) • Most are based on time-warping algorithms http://mzmine.sourceforge.net/ http://bibiserv.techfak.uni-bielefeld.de/chroma http://metlin.scripps.edu/download/
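To illustrate the time-warping idea, here is a minimal sketch of classic dynamic time warping (DTW) between two 1-D intensity traces. It is a textbook DTW, not the specific algorithm used by XCMS, ChromA or MZmine, and it assumes the traces are already on comparable intensity scales.

```python
import numpy as np


def dtw(a, b):
    """Classic dynamic time warping between two 1-D intensity traces.

    Returns the cumulative alignment cost and the warping path as a list
    of (i, j) pairs meaning a[i] is matched to b[j]."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])  # local mismatch
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    # Backtrack from the end, always stepping to the cheapest predecessor.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    path.reverse()
    return cost[n, m], path


reference = [0, 0, 1, 5, 1, 0, 0, 0]
shifted = [0, 0, 0, 1, 5, 1, 0, 0]  # same peak, one scan later
distance, path = dtw(reference, shifted)
print(distance, path)
```

The warped distance between the two traces is zero even though a point-by-point comparison differs, because the path maps the apex of one peak onto the apex of the other.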

  9. Binning (3000 pts to 14 bins) [Figure: spectrum divided into consecutive bins (bin1, bin2, ..., bin8, ...); each bin is reduced to one point (xi, yi), e.g. x = 232.1 (AOC), y = 10 (bin #)]
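A minimal sketch of uniform binning, assuming equal-width bins across the full ppm range and using the summed intensity in each bin as a stand-in for its area. Real pipelines often exclude regions (e.g. water) or use variable-width bins; neither is shown here.

```python
import numpy as np


def bin_spectrum(ppm, intensity, n_bins):
    """Uniform binning: split the ppm axis into n_bins equal-width bins and
    sum the intensities in each bin (an approximation of the bin's area)."""
    ppm = np.asarray(ppm, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    edges = np.linspace(ppm.min(), ppm.max(), n_bins + 1)
    # np.histogram with weights sums the intensities falling into each bin;
    # the last bin includes its right edge, so no point is lost.
    areas, _ = np.histogram(ppm, bins=edges, weights=intensity)
    centers = (edges[:-1] + edges[1:]) / 2
    return centers, areas


# 3000 spectral points reduced to 14 bins, as on the slide.
ppm = np.linspace(0.5, 10.0, 3000)
intensity = np.abs(np.sin(ppm))  # synthetic stand-in spectrum
centers, areas = bin_spectrum(ppm, intensity, 14)
print(len(areas))
```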

  10. Data Normalization/Scaling [Figure: "Same or different?"] • Can scale to sample or scale to feature • Scaling to the whole sample controls for dilution • Normalize to integrated area, by the probabilistic quotient method, to an internal standard, or to a sample-specific quantity (weight or volume of sample) • Choice depends on the sample & circumstances
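Two of the sample-wise options above can be sketched in a few lines. This assumes a samples-by-features matrix of non-negative intensities and, for probabilistic quotient normalization, uses the median of all samples as the reference; Dieterle et al. (cited on the next slide) apply integral normalization first and recommend a median of control samples as the reference, which this sketch leaves to the caller.

```python
import numpy as np


def total_area_normalize(X):
    """Divide each sample (row) by its total integrated intensity."""
    X = np.asarray(X, dtype=float)
    return X / X.sum(axis=1, keepdims=True)


def pqn_normalize(X, reference=None):
    """Probabilistic quotient normalization, sketched after Dieterle et al.

    Each sample is divided by the median of its feature-wise quotients
    against a reference profile (default here: the median sample)."""
    X = np.asarray(X, dtype=float)
    if reference is None:
        reference = np.median(X, axis=0)
    usable = reference > 0  # skip features with no reference signal
    quotients = X[:, usable] / reference[usable]
    dilution = np.median(quotients, axis=1)  # most probable dilution per sample
    return X / dilution[:, None], dilution


X = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],   # same profile as row 0, twice as concentrated
              [1.0, 1.0, 1.0]])
X_pqn, dilution = pqn_normalize(X)
print(dilution)  # per-sample dilution factors
```

Rows 0 and 1 end up identical after normalization, which is the point: a 2-fold dilution difference should not look like a biological difference.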

  11. Data Normalization/Scaling • Can scale to sample or scale to feature • Scaling to feature(s) helps manage outliers • Several feature scaling options are available: log transformation, autoscaling, Pareto scaling and range scaling (the probabilistic quotient method, by contrast, normalizes whole samples) MetaboAnalyst http://www.metaboanalyst.ca Dieterle F et al. Anal Chem. 2006 Jul 1;78(13):4281-90.
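The feature-wise (column) options can be sketched as below, assuming the standard textbook definitions with the sample standard deviation (ddof=1). The `offset` in the log transform is an assumed workaround for zeros, not a value taken from MetaboAnalyst.

```python
import numpy as np


def autoscale(X):
    """Mean-center each feature and divide by its standard deviation."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)


def pareto_scale(X):
    """Mean-center and divide by the square root of the standard deviation."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / np.sqrt(X.std(axis=0, ddof=1))


def range_scale(X):
    """Mean-center and divide by the feature's range (max - min)."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / (X.max(axis=0) - X.min(axis=0))


def log_transform(X, offset=1.0):
    """log10 transform; `offset` (assumed) keeps zero values finite."""
    return np.log10(np.asarray(X, dtype=float) + offset)


X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]])
print(autoscale(X))  # both columns now have mean 0 and SD 1
```

Autoscaling gives every feature equal weight; Pareto scaling shrinks large features less aggressively, so abundant metabolites still count for somewhat more.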

  12. Data QC, Outlier Removal & Data Reduction • Data filtering (remove solvent peaks, noise filtering, false positives; outlier removal needs justification) • Dimension reduction or feature selection to reduce the number of features or factors to consider (PCA or PLS-DA) • Clustering to find similarity

  13. MetaboAnalyst • Web server designed to handle large sets of LC-MS, GC-MS or NMR-based metabolomic data • Supports both univariate and multivariate data analysis, including t-tests, ANOVA, PCA and PLS-DA • Identifies significantly altered metabolites, produces colorful plots, provides detailed explanations & summaries • Links significant metabolites to pathways via SMPDB http://www.metaboanalyst.ca

  14. [Figure: MetaboAnalyst module overview] • Inputs: GC/LC-MS raw spectra, MS/NMR peak lists, MS/NMR spectral bins, metabolite concentrations • Raw spectra processing: peak detection, retention time correction, baseline filtering, peak alignment • Data integrity check • Missing value imputation • Data normalization: row-wise normalization (4 methods), column-wise normalization (4 methods) • Statistical analysis: univariate analysis, dimension reduction, feature selection, cluster analysis, classification • Enrichment analysis: over-representation analysis, single sample profiling, quantitative enrichment analysis • Pathway analysis: enrichment analysis, topology analysis, interactive visualization • Time-series/two-factor: visualization, two-way ANOVA, ASCA, temporal comparison • Resources & utilities: peak searching, pathway mapping, name conversion, lipidomics, metabolite set libraries • Downloads: processed data, PDF report, images
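Missing value imputation can be as simple as filling gaps with a small positive number, on the reasoning that a missing peak is often below the detection limit. The sketch below uses one common heuristic, half of each feature's smallest observed positive value; it is an assumption for illustration and not necessarily the default MetaboAnalyst applies.

```python
import numpy as np


def impute_half_min(X):
    """Replace missing values (NaN) in each feature with half of that
    feature's smallest observed positive value (0 if none is observed)."""
    X = np.array(X, dtype=float)  # copy, so the caller's data is untouched
    for j in range(X.shape[1]):
        col = X[:, j]  # a view: edits below write back into X
        positive = col[(~np.isnan(col)) & (col > 0)]
        fill = positive.min() / 2 if positive.size else 0.0
        col[np.isnan(col)] = fill
    return X


X = [[1.0, np.nan],
     [np.nan, 4.0],
     [3.0, 2.0]]
print(impute_half_min(X))  # NaNs become 0.5 (column 0) and 1.0 (column 1)
```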

  15. MetaboAnalyst Overview • Raw data processing (using MetaboAnalyst) • Data reduction & statistical analysis (using MetaboAnalyst) • Functional enrichment analysis (using MSEA in MetaboAnalyst) • Metabolic pathway analysis (using MetPA in MetaboAnalyst)

  16. Example Datasets

  17. Example Datasets

  18. Metabolomic Data Processing

  19. Common Tasks • Purpose: to convert various raw data forms into data matrices suitable for statistical analysis • Supported data formats • Concentration tables (Targeted Analysis) • Peak lists (Untargeted) • Spectral bins (Untargeted) • Raw spectra (Untargeted)
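For the simplest input, a concentration table, "converting to a data matrix" amounts to separating sample IDs and group labels from the numeric values. The sketch assumes a hypothetical CSV layout (sample ID, group label, then one column per compound, blank cells meaning missing); check the exact format MetaboAnalyst expects before uploading real data. The compound names in the example are illustrative only.

```python
import csv
import io

import numpy as np


def read_concentration_table(text):
    """Parse a CSV concentration table into sample IDs, group labels,
    compound names and a samples-by-compounds matrix.

    Assumed (hypothetical) layout: column 1 = sample ID, column 2 = group
    label, remaining columns = one compound each; empty cells become NaN."""
    rows = [r for r in csv.reader(io.StringIO(text)) if r]  # skip blank lines
    header, body = rows[0], rows[1:]
    compounds = header[2:]
    samples = [r[0] for r in body]
    groups = [r[1] for r in body]
    X = np.array([[float(v) if v.strip() else np.nan for v in r[2:]] for r in body])
    return samples, groups, compounds, X


table = "Sample,Group,Acetate,Butyrate\nc1,0,10.5,2.1\nc2,45,8.0,\n"
samples, groups, compounds, X = read_concentration_table(table)
print(compounds, X.shape)
```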

  20. Data Upload

  21. Alternatively …

  22. Data Set Selected • Here we select a data set from dairy cattle fed different proportions of cereal grain (0%, 15%, 30%, 45%) • Rumen samples were analyzed by NMR spectroscopy using quantitative metabolomic techniques • High-grain diets are thought to be stressful for cows

  23. Data Integrity Check

  24. Data Normalization

  25. Data Normalization • At this point, the data have been transformed into a matrix with the samples in rows and the variables (compounds/peaks/bins) in columns • MetaboAnalyst offers three types of normalization: row-wise, column-wise and combined • Row-wise normalization aims to make the samples (rows) comparable to one another (e.g. urine samples with different degrees of dilution)

  26. Data Normalization • Column-wise normalization aims to make the variables (columns) comparable to one another. This is useful when variables differ by orders of magnitude. Four methods have been implemented for this purpose: log transformation, autoscaling, Pareto scaling and range scaling

  27. Normalization Result

  28. Quality Control • Dealing with outliers • Detected mainly by visual inspection • May be corrected by normalization • May be excluded • Noise reduction • More of a concern for spectral bins/peak lists • Usually improves downstream results

  29. Visual Inspection • What does an outlier look like?

  30. Outlier Removal

  31. Noise Reduction

  32. Noise Reduction (cont.) • Characteristics of noise • Low intensities • Low variances (default)
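Following the two noise characteristics above, a filter ranks features by a spread or intensity statistic and drops the lowest-ranked fraction. In this sketch the interquartile range stands in for "low variance", the mean for "low intensity", and the 10% cutoff is an assumption; it does not reproduce MetaboAnalyst's exact filter rules.

```python
import numpy as np


def filter_features(X, stat="iqr", drop_fraction=0.1):
    """Drop the drop_fraction of features with the lowest statistic.

    stat="iqr"  : interquartile range (low variance -> likely noise)
    stat="mean" : mean intensity (low intensity -> likely noise)
    Returns the filtered matrix and the indices of the kept features."""
    X = np.asarray(X, dtype=float)
    if stat == "iqr":
        q75, q25 = np.percentile(X, [75, 25], axis=0)
        score = q75 - q25
    elif stat == "mean":
        score = X.mean(axis=0)
    else:
        raise ValueError("stat must be 'iqr' or 'mean'")
    n_drop = int(round(drop_fraction * X.shape[1]))
    keep = np.sort(np.argsort(score)[n_drop:])  # keep original column order
    return X[:, keep], keep


# Feature 0 is flat (no variation); the other nine vary.
X = np.column_stack([np.full(6, 5.0)] + [np.arange(6.0) * (k + 1) for k in range(9)])
X_filtered, kept = filter_features(X)
print(kept)  # feature 0 is removed
```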

  33. Data Reduction and Statistical Analysis

  34. Common tasks • Detect interesting patterns • Identify important features • Assess differences between phenotypes • Classification/prediction

  35. ANOVA
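A one-way ANOVA is run separately for each compound, and with many compounds the p-values are usually adjusted for multiple testing. The sketch assumes SciPy is available, uses `scipy.stats.f_oneway` per feature and adds a Benjamini-Hochberg false discovery rate adjustment; answering pairwise questions such as those on the next slides additionally requires a post-hoc test, which is not shown.

```python
import numpy as np
from scipy import stats


def bh_fdr(p):
    """Benjamini-Hochberg adjusted p-values."""
    p = np.asarray(p, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity: each adjusted value is the min over larger ranks.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(n)
    q[order] = np.minimum(ranked, 1.0)
    return q


def anova_per_feature(X, groups):
    """One-way ANOVA p-value for every column of X, plus FDR-adjusted values."""
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    levels = np.unique(groups)
    pvals = np.array([
        stats.f_oneway(*[X[groups == g, j] for g in levels]).pvalue
        for j in range(X.shape[1])
    ])
    return pvals, bh_fdr(pvals)


X = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0],
              [4.0, 1.0], [5.0, 2.0], [6.0, 3.0]])
groups = ["a", "a", "a", "b", "b", "b"]
p, q = anova_per_feature(X, groups)
print(p, q)  # column 0 differs between groups, column 1 does not
```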

  36. View Individual Compounds

  37. Questions • Q: Which compounds differ significantly between every pair of neighboring groups (0-15, 15-30 and 30-45)? • Q: For Uracil, are groups 15, 30 and 45 significantly different from each other?

  38. Template Matching • Looking for compounds showing interesting patterns of change

  39. Template Matching (cont.)
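Template matching can be sketched as correlating each compound's profile with a user-defined pattern, one template value per group. This sketch assumes Pearson correlation and a hypothetical `match_template` helper; the tool on the slides may also offer rank-based correlations, which are not shown.

```python
import numpy as np


def match_template(X, groups, template, group_order):
    """Pearson correlation between each feature (column of X) and a
    template that assigns one value to each group."""
    X = np.asarray(X, dtype=float)
    lookup = dict(zip(group_order, template))
    pattern = np.array([lookup[g] for g in groups], dtype=float)
    pc = pattern - pattern.mean()
    Xc = X - X.mean(axis=0)
    return (Xc * pc[:, None]).sum(axis=0) / (
        np.sqrt((Xc ** 2).sum(axis=0)) * np.sqrt((pc ** 2).sum())
    )


groups = ["0", "0", "15", "15", "30", "30", "45", "45"]
steady_rise = np.array([1, 1, 2, 2, 3, 3, 4, 4], dtype=float)
X = np.column_stack([2 * steady_rise + 1, -steady_rise])
r = match_template(X, groups, template=[1, 2, 3, 4], group_order=["0", "15", "30", "45"])
print(r)  # +1 for the rising compound, -1 for the falling one
```

Sorting compounds by `r` (descending) lists those best matching the pattern; a template such as decrease-then-increase is expressed simply by changing the template values.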

  40. Question • Q: Which compounds decrease across the first three groups but increase in the last group?

  41. PCA Scores Plot

  42. PCA Loading Plot
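The scores and loadings in these two plots come from the same decomposition. A minimal PCA via singular value decomposition of the mean-centered matrix is sketched below; it assumes normalization/scaling has already been applied and is not MetaboAnalyst's own code.

```python
import numpy as np


def pca(X, n_components=2):
    """PCA by SVD of the mean-centered data matrix.

    scores   : sample coordinates (what the scores plot shows)
    loadings : feature weights per component (what the loading plot shows)
    explained: fraction of total variance captured by each component"""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    loadings = Vt[:n_components].T
    explained = S ** 2 / np.sum(S ** 2)
    return scores, loadings, explained[:n_components]


# Two groups that differ only in feature 0.
X = np.array([[0.0, 1.0], [0.0, -1.0], [10.0, 1.0], [10.0, -1.0]])
scores, loadings, explained = pca(X)
print(explained)  # PC1 carries ~96% of the variance
```

Here PC1 separates the two groups and its loading is concentrated on feature 0, which is how a loading plot points to the compounds driving a separation seen in the scores plot.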

  43. Question Q: Identify compounds that contribute most to the separation between group 15 and 45

  44. PLS-DA Score Plot

  45. Determine # of Components

  46. Important Compounds

  47. Model Validation

  48. Questions • Q: What does p < 0.01 mean? • Q: How many permutations need to be performed if you want to claim p value < 0.0001?
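Permutation-based validation compares a statistic computed with the true class labels against its distribution under randomly shuffled labels. In this sketch the statistic is a simple between/within sum-of-squares ratio, swapped in for illustration rather than the statistic any particular PLS-DA tool reports, and the p-value uses the (hits + 1) / (n_perm + 1) convention, so the smallest reportable value with n_perm permutations is 1 / (n_perm + 1).

```python
import numpy as np


def bw_ratio(X, labels):
    """Between-group / within-group sum of squares over all features
    (an assumed class-separation statistic for illustration)."""
    grand = X.mean(axis=0)
    between = within = 0.0
    for g in np.unique(labels):
        Xg = X[labels == g]
        between += len(Xg) * np.sum((Xg.mean(axis=0) - grand) ** 2)
        within += np.sum((Xg - Xg.mean(axis=0)) ** 2)
    return between / within


def permutation_p(X, labels, n_perm=999, seed=0):
    """Observed statistic and permutation p-value."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    observed = bw_ratio(X, labels)
    hits = sum(bw_ratio(X, rng.permutation(labels)) >= observed
               for _ in range(n_perm))
    return observed, (hits + 1) / (n_perm + 1)


rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (5, 3)), rng.normal(5, 1, (5, 3))])
labels = np.array(["A"] * 5 + ["B"] * 5)
observed, p = permutation_p(X, labels)
print(observed, p)
```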

  49. Heatmap Visualization
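A clustered heatmap is usually the data matrix with rows and columns reordered by hierarchical clustering, so that similar samples and similar compounds sit next to each other. The sketch assumes SciPy is available and uses average linkage on Euclidean distances; both are assumptions, and other linkage methods or distance measures are equally common choices.

```python
import numpy as np
from scipy.cluster.hierarchy import leaves_list, linkage


def cluster_order(X, method="average", metric="euclidean"):
    """Row (sample) and column (feature) orders from hierarchical
    clustering, as used to arrange a clustered heatmap."""
    X = np.asarray(X, dtype=float)
    row_order = leaves_list(linkage(X, method=method, metric=metric))
    col_order = leaves_list(linkage(X.T, method=method, metric=metric))
    return row_order, col_order


X = np.array([[0.0, 0.0, 1.0],
              [10.0, 10.0, 9.0],
              [0.1, 0.0, 1.2],
              [10.0, 10.1, 9.0]])
rows, cols = cluster_order(X)
print(X[rows][:, cols])  # similar samples (0 and 2, 1 and 3) end up adjacent
```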
