Canadian Bioinformatics Workshops www.bioinformatics.ca
Module 6 David Wishart Informatics and Statistics for Metabolomics June 16-17, 2011
2 Routes to Metabolomics • Quantitative (Targeted) Methods • Chemometric (Profiling) Methods
[Figures: PCA scores plot (PC1 vs. PC2) with ANIT, Control and PAP groups; NMR spectra (7 to 1 ppm), one annotated with TMAO, creatinine, hippurate, allantoin, taurine, citrate, urea, 2-oxoglutarate, water, succinate and fumarate]
Metabolomics Data Workflow
Chemometric Methods: • Data Integrity Check • Spectral alignment or binning • Data normalization • Data QC/outlier removal • Data reduction & analysis • Compound ID
Targeted Methods: • Data Integrity Check • Compound ID and quantification • Data normalization • Data QC/outlier removal • Data reduction & analysis
Data Integrity/Quality • LC-MS and GC-MS produce a high number of false-positive peaks • Problems with adducts (LC), extra derivatization products (GC), isotopes, breakdown products (ionization issues), etc. • Not usually a problem with NMR • Check using replicates and adduct calculators • MZedDB http://maltese.dbs.aber.ac.uk:8888/hrmet/index.html • HMDB http://www.hmdb.ca/search/spectra?type=ms_search
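To make the adduct check concrete, here is a minimal sketch of what an adduct calculator does: it adds known mass shifts to a neutral monoisotopic mass and compares the result with an observed m/z. The function names, the short adduct list and the 10 ppm tolerance are illustrative assumptions, not the behaviour of MZedDB or HMDB.

```python
# Mass shifts (Da) for a few common singly charged adducts.
# Illustrative subset only; real calculators cover many more species.
ADDUCT_SHIFTS = {
    "[M+H]+": 1.007276,
    "[M+NH4]+": 18.033823,
    "[M+Na]+": 22.989218,
    "[M+K]+": 38.963158,
    "[M-H]-": -1.007276,
}


def adduct_mz(neutral_mass):
    """Return the expected m/z of each adduct for a neutral monoisotopic mass."""
    return {name: neutral_mass + shift for name, shift in ADDUCT_SHIFTS.items()}


def match_adducts(observed_mz, neutral_mass, tol_ppm=10.0):
    """List the adducts whose expected m/z is within tol_ppm of observed_mz."""
    hits = []
    for name, expected in adduct_mz(neutral_mass).items():
        ppm_error = abs(observed_mz - expected) / expected * 1e6
        if ppm_error <= tol_ppm:
            hits.append(name)
    return hits


# Example: glucose (monoisotopic mass 180.06339 Da) observed as a sodium adduct.
glucose_mass = 180.06339
print(adduct_mz(glucose_mass))
print(match_adducts(203.0526, glucose_mass))
```

A peak that matches an adduct of an already identified compound is a candidate for removal rather than a new metabolite.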
Data/Spectral Alignment • Important for LC-MS and GC-MS studies • Less important for NMR (peak shifts arise mainly from pH variation) • Many programs available (XCMS, ChromA, MZmine) • Most are based on time-warping algorithms • http://mzmine.sourceforge.net/ • http://bibiserv.techfak.uni-bielefeld.de/chroma • http://metlin.scripps.edu/download/
Binning (3000 pts to 14 bins) • Each bin becomes one data point (xi, yi): x = 232.1 (AOC), y = 10 (bin #) [Figure: spectrum divided into bin1, bin2, bin3, bin4, bin5, bin6, bin7, bin8...]
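The binning step can be sketched as follows. This is a minimal illustration that assumes equally spaced spectral points and uses a plain sum of intensities as a stand-in for the area in each bin; `bin_spectrum` is a hypothetical helper, not part of any tool named on the slides.

```python
def bin_spectrum(intensities, n_bins):
    """Sum consecutive points into n_bins bins of (nearly) equal width.

    Assumes equally spaced points, so the sum is proportional to the area.
    """
    n_points = len(intensities)
    binned = []
    for b in range(n_bins):
        start = b * n_points // n_bins        # first point in this bin
        stop = (b + 1) * n_points // n_bins   # one past the last point
        binned.append(sum(intensities[start:stop]))
    return binned


# Example matching the slide: 3000 points reduced to 14 bins.
spectrum = [1.0] * 3000          # flat toy spectrum, for illustration only
bins = bin_spectrum(spectrum, 14)
print(len(bins), sum(bins))
```

Every point lands in exactly one bin, so the total intensity is preserved.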
Data Normalization/Scaling Same or different? • Can scale to sample or scale to feature • Scaling to whole sample controls for dilution • Normalize to integrated area, probabilistic quotient method, internal standard, sample specific (weight or volume of sample) • Choice depends on sample & circumstances
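Two of the sample-wise options above can be sketched in a few lines: normalization to integrated area and the probabilistic quotient method. This is a minimal sketch that assumes a samples-by-features matrix of positive values and uses the per-feature median across samples as the reference profile; the function names are illustrative.

```python
from statistics import median


def normalize_to_sum(row, target=1.0):
    """Scale one sample so that its features sum to `target` (integrated area)."""
    total = sum(row)
    return [target * value / total for value in row]


def pqn(samples):
    """Probabilistic quotient normalization of a samples-by-features matrix."""
    # Step 1: integral normalization of every sample.
    normed = [normalize_to_sum(row) for row in samples]
    # Step 2: reference profile = median of each feature across samples (assumed choice).
    reference = [median(column) for column in zip(*normed)]
    result = []
    for row in normed:
        # Step 3: quotient of each feature against the reference.
        quotients = [v / ref for v, ref in zip(row, reference) if ref > 0]
        # Step 4: the median quotient is the estimated dilution factor.
        factor = median(quotients)
        result.append([v / factor for v in row])
    return result


# Toy data: three samples with the same profile at different dilutions.
samples = [
    [10.0, 20.0, 30.0, 40.0],
    [5.0, 10.0, 15.0, 20.0],
    [20.0, 40.0, 60.0, 80.0],
]
normalized = pqn(samples)
print(normalized)
```

After normalization the three rows coincide, which is the sense in which scaling to the whole sample controls for dilution.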
Data Normalization/Scaling • Can scale to sample or scale to feature • Scaling to feature(s) helps manage outliers • Several feature scaling options available: log transformation, auto-scaling, Pareto scaling, probabilistic quotient, and range scaling • MetaboAnalyst http://www.metaboanalyst.ca • Dieterle F et al. Anal Chem. 2006 Jul 1;78(13):4281-90.
Data QC, Outlier Removal & Data Reduction • Data filtering (remove solvent peaks, noise filtering, false positives, outlier removal -- needs justification) • Dimensional reduction or feature selection to reduce number of features or factors to consider (PCA or PLS-DA) • Clustering to find similarity
MetaboAnalyst • Web server designed to handle large sets of LC-MS, GC-MS or NMR-based metabolomic data • Supports both univariate and multivariate data processing, including t-tests, ANOVA, PCA, PLS-DA • Identifies significantly altered metabolites, produces colorful plots, provides detailed explanations & summaries • Links significant metabolites to pathways via SMPDB • http://www.metaboanalyst.ca
MetaboAnalyst module overview
Inputs: • GC/LC-MS raw spectra • MS / NMR peak lists • MS / NMR spectra bins • Metabolite concentrations
Raw data processing: • Peak detection • Retention time correction • Baseline filtering • Peak alignment
Data checks: • Data integrity check • Missing value imputation
Data normalization: • Row-wise normalization (4) • Column-wise normalization (4)
Statistical analysis: • Univariate analysis • Dimension reduction • Feature selection • Cluster analysis • Classification
Enrichment analysis: • Over representation analysis • Single sample profiling • Quantitative enrichment analysis
Pathway analysis: • Enrichment analysis • Topology analysis • Interactive visualization
Time-series / two factor: • Visualization • Two-way ANOVA • ASCA • Temporal comparison
Resources & utilities: • Peak searching • Pathway mapping • Name conversion • Lipidomics • Metabolite set libraries
Downloads: • Processed data • PDF report • Images
MetaboAnalyst Overview • Raw data processing • Using MetaboAnalyst • Data Reduction & Statistical analysis • Using MetaboAnalyst • Functional enrichment analysis • Using MSEA in MetaboAnalyst • Metabolic pathway analysis • Using MetPA in MetaboAnalyst
Common Tasks • Purpose: to convert various raw data forms into data matrices suitable for statistical analysis • Supported data formats • Concentration tables (Targeted Analysis) • Peak lists (Untargeted) • Spectral bins (Untargeted) • Raw spectra (Untargeted)
Data Set Selected • Here we will select a data set from dairy cattle fed different proportions of cereal grains (0%, 15%, 30%, 45%) • The rumen was analyzed by NMR spectroscopy using quantitative metabolomic techniques • High-grain diets are thought to be stressful for cows
Data Normalization • At this point, the data have been transformed into a matrix with the samples in rows and the variables (compounds/peaks/bins) in columns • MetaboAnalyst offers three types of normalization: row-wise normalization, column-wise normalization and combined normalization • Row-wise normalization aims to make the samples (rows) comparable to each other (e.g. urine samples with different dilution effects)
Data Normalization • Column-wise normalization aims to make each variable (column) comparable to each other. This procedure is useful when variables are of very different orders of magnitude. Four methods have been implemented for this purpose – log transformation, autoscaling, Pareto scaling and range scaling
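The four column-wise methods named above can be written out using their textbook definitions. This is a minimal sketch: the base-10 logarithm and the sample standard deviation are assumptions made here, and MetaboAnalyst's own implementation may differ in such details.

```python
import math
from statistics import mean, stdev


def log_transform(column):
    """Log transformation (base 10 assumed; values must be positive)."""
    return [math.log10(value) for value in column]


def autoscale(column):
    """Autoscaling: mean-center, then divide by the standard deviation."""
    m, s = mean(column), stdev(column)
    return [(value - m) / s for value in column]


def pareto_scale(column):
    """Pareto scaling: mean-center, then divide by the square root of the SD."""
    m, s = mean(column), stdev(column)
    return [(value - m) / math.sqrt(s) for value in column]


def range_scale(column):
    """Range scaling: mean-center, then divide by (max - min)."""
    m = mean(column)
    spread = max(column) - min(column)
    return [(value - m) / spread for value in column]


# One variable (column) measured in three samples.
column = [1.0, 10.0, 100.0]
print(log_transform(column))
print(autoscale(column))
print(pareto_scale(column))
print(range_scale(column))
```

Pareto scaling sits between no scaling and autoscaling: large-variance variables are shrunk, but less aggressively than when dividing by the full standard deviation.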
Quality Control • Dealing with outliers • Detected mainly by visual inspection • May be corrected by normalization • May be excluded • Noise reduction • More of a concern for spectral bins/ peak lists • Usually improves downstream results
Visual Inspection • What does an outlier look like?
Noise Reduction (cont.) • Characteristics of noise • Low intensities • Low variances (default)
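A variance-based noise filter can be sketched as below. The idea of dropping the lowest-variance features follows the slide; the 25% cut-off, the use of population variance and the function name are illustrative assumptions.

```python
from statistics import pvariance


def filter_low_variance(matrix, feature_names, drop_fraction=0.25):
    """Drop the given fraction of features (columns) with the lowest variance."""
    columns = list(zip(*matrix))
    variances = [pvariance(column) for column in columns]
    n_drop = int(len(columns) * drop_fraction)
    # Column indices ordered from lowest to highest variance.
    order = sorted(range(len(columns)), key=lambda j: variances[j])
    dropped = set(order[:n_drop])
    keep = [j for j in range(len(columns)) if j not in dropped]
    filtered = [[row[j] for j in keep] for row in matrix]
    return filtered, [feature_names[j] for j in keep]


# Toy matrix: 3 samples x 4 spectral bins; bin2 barely changes across samples.
matrix = [
    [1.0, 0.50, 9.0, 3.0],
    [4.0, 0.51, 2.0, 8.0],
    [7.0, 0.49, 5.0, 1.0],
]
names = ["bin1", "bin2", "bin3", "bin4"]
filtered, kept = filter_low_variance(matrix, names)
print(kept)
```

The same structure works for an intensity-based filter by replacing the variance with the mean or median of each column.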
Common tasks • To detect interesting patterns; • To identify important features; • To assess difference between the phenotypes • Classification / prediction
Questions • Q: Which compounds show significant difference among all the neighboring groups (0-15, 15-30, and 30-45)? • Q: For Uracil, are groups 15, 30, 45 significantly different from each other?
Template Matching • Looking for compounds showing interesting patterns of change
Question • Q: Identify compounds that decrease across the first three groups but increase in the last group.
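One way to express such a pattern is as a numeric template, for example 3-2-1-2 across the four diet groups, and to rank compounds by how well their group profiles correlate with it. The sketch below assumes Pearson correlation on group means; the compound names and values are made up for illustration and are not taken from the cattle data set.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (neither constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rank_by_template(profiles, template):
    """Sort compounds by correlation with the template, best match first."""
    scores = {name: pearson(values, template) for name, values in profiles.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Template: decrease over the first three groups, increase in the last.
template = [3, 2, 1, 2]

# Hypothetical group means for the 0%, 15%, 30% and 45% grain diets.
profiles = {
    "compound_A": [9.0, 6.0, 3.0, 6.0],
    "compound_B": [1.0, 2.0, 3.0, 4.0],
    "compound_C": [5.0, 5.5, 4.0, 6.0],
}
ranked = rank_by_template(profiles, template)
for name, r in ranked:
    print(name, round(r, 3))
```

Only the shape of the template matters, since correlation is unaffected by shifting or rescaling the pattern values.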
Question Q: Identify compounds that contribute most to the separation between group 15 and 45
Questions • Q: What does p < 0.01 mean? • Q: How many permutations need to be performed if you want to claim p value < 0.0001?
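The second question can be explored with a small sketch. It assumes the common convention that an empirical p-value is (b + 1) / (N + 1), where b is the number of permuted statistics at least as extreme as the observed one, so the smallest reportable value with N permutations is 1 / (N + 1). The test statistic (absolute difference in group means) and all names are illustrative assumptions.

```python
import math
import random
from fractions import Fraction
from statistics import mean


def min_permutations(p_threshold):
    """Smallest N for which 1 / (N + 1) is strictly below p_threshold."""
    # Exact arithmetic avoids floating-point surprises at the boundary.
    return math.floor(1 / Fraction(str(p_threshold)))


def permutation_p_value(group_a, group_b, n_perm=1000, seed=0):
    """Empirical p-value for a difference in means, via label permutation."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign group labels at random
        permuted = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if permuted >= observed:
            extreme += 1
    return (extreme + 1) / (n_perm + 1)


print(min_permutations(0.0001))
p = permutation_p_value([1, 2, 3, 4, 5], [11, 12, 13, 14, 15])
print(p)
```

Under this convention, claiming p < 0.0001 requires at least 10,000 permutations, and the claim only holds if none of the permuted statistics reaches the observed one.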