Canadian Bioinformatics Workshops www.bioinformatics.ca
Module 7: Metabolomic Data Analysis Using MetaboAnalyst. David Wishart. Informatics and Statistics for Metabolomics, July 8-9, 2013
Learning Objectives • To become familiar with the standard metabolomics data analysis workflow • To become aware of key elements such as: data integrity checking, outlier detection, quality control, normalization, scaling, etc. • To learn how to use MetaboAnalyst to facilitate data analysis
2 Routes to Metabolomics • Quantitative (Targeted) Methods [Figure: annotated NMR spectrum (ppm 7 to 1) with peaks assigned to TMAO, creatinine, hippurate, allantoin, taurine, citrate, urea, 2-oxoglutarate, water, succinate and fumarate] • Chemometric (Profiling) Methods [Figure: unannotated NMR spectrum (ppm 7 to 1) and PCA scores plot (PC1 vs. PC2) separating the Control, ANIT and PAP groups]
Metabolomics Data Workflow
Chemometric Methods: • Data Integrity Check • Spectral alignment or binning • Data normalization • Data QC/outlier removal • Data reduction & analysis • Compound ID
Targeted Methods: • Data Integrity Check • Compound ID and quantification • Data normalization • Data QC/outlier removal • Data reduction & analysis
Data Integrity/Quality • LC-MS and GC-MS produce a high number of false positive peaks • Problems with adducts (LC), extra derivatization products (GC), isotopes, breakdown products (ionization issues), etc. • Not usually a problem with NMR • Check using replicates and adduct calculators MZedDB http://maltese.dbs.aber.ac.uk:8888/hrmet/index.html HMDB http://www.hmdb.ca/search/spectra?type=ms_search
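To make the adduct check concrete, here is a minimal sketch of what an adduct calculator does: it adds a known ion mass shift to a neutral monoisotopic mass and compares the result with an observed m/z. The function names, the short adduct list and the 10 ppm tolerance are assumptions chosen for illustration; this is not the MZedDB or HMDB implementation.

```python
# Monoisotopic mass shifts (Da) for a few common positive-mode adducts.
# The selection of adducts is an illustrative assumption, not the list
# used by MZedDB or HMDB.
ADDUCT_SHIFTS = {
    "[M+H]+": 1.007276,
    "[M+NH4]+": 18.033823,
    "[M+Na]+": 22.989218,
    "[M+K]+": 38.963158,
}


def adduct_mz(neutral_mass):
    """Return the expected m/z of each singly charged adduct."""
    return {name: neutral_mass + shift for name, shift in ADDUCT_SHIFTS.items()}


def match_adducts(observed_mz, neutral_mass, tol_ppm=10.0):
    """List adducts whose expected m/z is within tol_ppm of observed_mz."""
    hits = []
    for name, mz in adduct_mz(neutral_mass).items():
        if abs(observed_mz - mz) / mz * 1e6 <= tol_ppm:
            hits.append(name)
    return hits


glucose = 180.06339  # monoisotopic mass of C6H12O6
print(adduct_mz(glucose))
print(match_adducts(203.0526, glucose))  # peak consistent with a sodium adduct
```

Several peaks that all map back to the same neutral mass through different adducts are candidates for being one compound rather than several.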
Data/Spectral Alignment • Important for LC-MS and GC-MS studies • Less important for NMR (peak shifts come mainly from pH variation) • Many programs available (XCMS, ChromA, MZmine) • Most are based on time warping algorithms http://mzmine.sourceforge.net/ http://bibiserv.techfak.uni-bielefeld.de/chroma http://metlin.scripps.edu/download/
Binning (3000 pts to 14 bins) • Each bin becomes one data point (xi, yi), e.g. x = 232.1 (AOC), y = 10 (bin #) [Figure: spectrum divided into bin1, bin2, bin3, bin4, bin5, bin6, bin7, bin8, ...]
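The binning idea can be sketched in a few lines: the spectrum is cut into equal-width regions and the intensities inside each region are summed. The function name, the 0 to 10 ppm range and the toy spectrum below are assumptions made for illustration only.

```python
def bin_spectrum(ppm, intensity, n_bins, lo, hi):
    """Sum intensities into n_bins equal-width bins covering [lo, hi]."""
    width = (hi - lo) / n_bins
    bins = [0.0] * n_bins
    for x, y in zip(ppm, intensity):
        if lo <= x <= hi:
            # A point exactly at hi is assigned to the last bin.
            idx = min(int((x - lo) / width), n_bins - 1)
            bins[idx] += y
    return bins


# Toy spectrum (made-up data): 3000 points from 0 to 10 ppm,
# a flat baseline of 1.0 with a single peak near 5 ppm.
n_points = 3000
ppm = [10.0 * i / (n_points - 1) for i in range(n_points)]
intensity = [1.0] * n_points
intensity[1500] = 101.0

binned = bin_spectrum(ppm, intensity, n_bins=14, lo=0.0, hi=10.0)
print(len(binned))
print(binned.index(max(binned)))  # index of the bin containing the peak
```

Each spectrum is reduced from 3000 points to 14 values, so samples can be stacked into a matrix with one row per sample and one column per bin.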
Data Normalization/Scaling Same or different? • Can scale to sample or scale to feature • Scaling to whole sample controls for dilution • Normalize to integrated area, probabilistic quotient method, internal standard, sample specific (weight or volume of sample) • Choice depends on sample & circumstances
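As an illustration of sample-wise normalization, the sketch below combines integrated-area normalization with the probabilistic quotient method: each sample is scaled to unit total, a reference profile is taken as the per-feature median, and each sample is then divided by the median of its quotients against that reference. The function names and the toy concentrations are assumptions; MetaboAnalyst's own implementation may differ in detail.

```python
from statistics import median


def sum_normalize(row):
    """Scale one sample so that its values sum to 1 (integrated area)."""
    total = sum(row)
    return [v / total for v in row]


def pqn(samples):
    """Probabilistic quotient normalization of a samples-by-features table."""
    rows = [sum_normalize(r) for r in samples]
    n_feat = len(rows[0])
    # Reference profile: median of each feature across all samples.
    reference = [median(r[j] for r in rows) for j in range(n_feat)]
    normalized = []
    for r in rows:
        quotients = [r[j] / reference[j] for j in range(n_feat) if reference[j] > 0]
        factor = median(quotients)  # most probable dilution factor
        normalized.append([v / factor for v in r])
    return normalized


# Made-up data: sample 2 is a 2x dilution of sample 1; sample 3 matches
# sample 1 except for a tenfold increase in the last metabolite.
samples = [
    [10.0, 20.0, 30.0, 40.0],
    [5.0, 10.0, 15.0, 20.0],
    [10.0, 20.0, 30.0, 400.0],
]
normalized = pqn(samples)
for row in normalized:
    print([round(v, 3) for v in row])
```

In this toy case the first three metabolites of sample 3 line up with the other samples after PQN, whereas area normalization alone would have pushed them down because of the single large peak.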
Data Normalization/Scaling • Can scale to sample or scale to feature • Scaling to feature(s) helps manage outliers • Several feature scaling options available: log transformation, auto-scaling, Pareto scaling, probabilistic quotient, and range scaling MetaboAnalyst http://www.metaboanalyst.ca Dieterle F et al. Anal Chem. 2006 Jul 1;78(13):4281-90.
Data QC, Outlier Removal & Data Reduction • Data filtering (remove solvent peaks, noise filtering, false positives, outlier removal -- needs justification) • Dimensional reduction or feature selection to reduce number of features or factors to consider (PCA or PLS-DA) • Clustering to find similarity
MetaboAnalyst • Web server designed to handle large sets of LC-MS, GC-MS or NMR-based metabolomic data • Supports both univariate and multivariate data processing, including t-tests, ANOVA, PCA, PLS-DA • Identifies significantly altered metabolites, produces colorful plots, provides detailed explanations & summaries • Links significant metabolites to pathways via SMPDB http://www.metaboanalyst.ca
Data input: • GC/LC-MS raw spectra • Peak lists • Spectral bins • Concentration table
Data processing: • Spectra processing • Peak processing • Noise filtering • Missing value estimation
Data integrity check
Data normalization: • Row-wise normalization • Column-wise normalization • Combined approach
Statistical Exploration
Two/multi-group analysis: • Univariate analysis • Correlation analysis • Chemometric analysis • Feature selection • Cluster analysis • Classification
Time-series analysis: • Data overview • Two-way ANOVA • ANOVA-SCA • Time-course analysis
Functional Interpretation
Enrichment analysis: • Over representation analysis • Single sample profiling • Quantitative enrichment analysis
Pathway analysis: • Enrichment analysis • Topology analysis • Interactive visualization
Outputs: • Processed data • Result tables • Analysis report • Images
Image Center: • Resolution: 150/300/600 dpi • Format: png, tiff, pdf, svg, ps
Quality checking: • Methods comparison • Temporal drift • Batch effect • Biological checking
Other utilities: • Peak searching • Pathway mapping • Name/ID conversion • Lipidomics
MetaboAnalyst Overview • Raw data processing • Using MetaboAnalyst • Data Reduction & Statistical analysis • Using MetaboAnalyst • Functional enrichment analysis • Using MSEA in MetaboAnalyst • Metabolic pathway analysis • Using MetPA in MetaboAnalyst
Common Tasks • Purpose: to convert various raw data forms into data matrices suitable for statistical analysis • Supported data formats • Concentration tables (Targeted Analysis) • Peak lists (Untargeted) • Spectral bins (Untargeted) • Raw spectra (Untargeted)
Data Set Selected • Here we will be selecting a data set from dairy cattle fed different proportions of cereal grains (0%, 15%, 30%, 45%) • The rumen was analyzed by NMR spectroscopy using quantitative metabolomic techniques • High grain diets are thought to be stressful for cows
Data Normalization • At this point, the data has been transformed to a matrix with the samples in rows and the variables (compounds/peaks/bins) in columns • MetaboAnalyst offers three types of normalization: row-wise normalization, column-wise normalization and combined normalization • Row-wise normalization aims to make the samples (rows) comparable to each other (e.g. urine samples with different dilution effects)
Data Normalization • Column-wise normalization aims to make each variable (column) comparable in scale to each other, thereby generating a “normal” distribution • This procedure is useful when variables are of very different orders of magnitude • Four methods have been implemented for this purpose – log transformation, autoscaling, Pareto scaling and range scaling
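The four column-wise methods can be written out directly. The sketch below uses the usual textbook definitions (autoscaling divides the mean-centered values by the standard deviation, Pareto scaling by its square root, range scaling by max minus min); the function names and the toy table are assumptions, and MetaboAnalyst's internals may differ.

```python
import math
from statistics import mean, stdev


def log_transform(values):
    """Log10 transform; assumes all values are strictly positive."""
    return [math.log10(v) for v in values]


def scale_column(values, method):
    """Mean-center one variable and divide by a method-specific factor."""
    m = mean(values)
    if method == "auto":
        divisor = stdev(values)
    elif method == "pareto":
        divisor = math.sqrt(stdev(values))
    elif method == "range":
        divisor = max(values) - min(values)
    else:
        raise ValueError("unknown method: " + method)
    # A constant column has divisor 0 and should be filtered out beforehand.
    return [(v - m) / divisor for v in values]


def scale_table(table, method):
    """Apply scale_column to every column of a samples-by-variables table."""
    columns = [list(c) for c in zip(*table)]
    scaled = [scale_column(c, method) for c in columns]
    return [list(row) for row in zip(*scaled)]


# Made-up data: two variables that differ by three orders of magnitude.
table = [[1.0, 1000.0], [2.0, 2000.0], [3.0, 3000.0]]
for method in ("auto", "pareto", "range"):
    print(method, scale_table(table, method))
```

After autoscaling or range scaling both variables occupy the same numeric range, while Pareto scaling reduces the gap but keeps the larger variable more influential.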
Quality Control • Dealing with outliers • Detected mainly by visual inspection • May be corrected by normalization • May be excluded • Noise reduction • More of a concern for spectral bins/peak lists • Usually improves downstream results
Visual Inspection • What does an outlier look like? • Finding outliers via PCA • Finding outliers via Heatmap
Noise Reduction (cont.) • Characteristics of noise & uninformative features • Low intensities • Low variances (default)
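A low-variance filter of the kind described above can be sketched as follows: features are ranked by variance across samples and the lowest-ranked fraction is dropped. The function name, the 25% cut-off and the toy table are assumptions for illustration; they are not MetaboAnalyst's actual defaults.

```python
from statistics import variance


def filter_low_variance(table, names, drop_fraction=0.25):
    """Drop the fraction of features with the lowest variance across samples."""
    columns = list(zip(*table))
    variances = [variance(c) for c in columns]
    n_drop = int(len(columns) * drop_fraction)
    # Indices sorted from lowest to highest variance.
    order = sorted(range(len(columns)), key=lambda j: variances[j])
    keep = sorted(order[n_drop:])  # restore original column order
    kept_names = [names[j] for j in keep]
    kept_table = [[row[j] for j in keep] for row in table]
    return kept_names, kept_table


# Made-up data: three samples, four spectral bins; bin3 barely changes.
names = ["bin1", "bin2", "bin3", "bin4"]
table = [
    [1.0, 5.0, 0.10, 20.0],
    [1.1, 9.0, 0.10, 35.0],
    [0.9, 2.0, 0.11, 50.0],
]
kept_names, kept_table = filter_low_variance(table, names)
print(kept_names)
```

The same ranking approach works with mean intensity in place of variance when the goal is to remove low-intensity features.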
Common Tasks • To identify important features • To detect interesting patterns • To assess differences between phenotypes • To facilitate classification or prediction
Questions • Q: Which compounds show a significant difference between all neighboring groups (0-15, 15-30, and 30-45)? • Q: For Uracil, are groups 15, 30 and 45 significantly different from each other?
High resolution image • Specify format • Specify resolution • Specify size
Question • Q: In untargeted metabolomics using NMR, researchers often look for region(s) of the spectra showing the biggest change in their correlation patterns under different conditions. Can you do that in MetaboAnalyst? • Hint: check the available parameters of Correlation analysis
Template Matching • Looking for compounds showing interesting patterns of change • Essentially a method to look for linear trends or periodic trends in the data • Best for data that has 3 or more groups
Template Matching (cont.) • Strong positive linear correlation with grain % • Strong negative linear correlation with grain %
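Template matching can be sketched as a correlation ranking: each compound's profile across samples is correlated with a template vector (here the grain percentage of each sample), and compounds are sorted by the resulting coefficient. The function names, the compound labels and all concentration values below are made up for illustration and are not taken from the cattle data set.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def template_match(profiles, template):
    """Rank compounds by their correlation with the template, highest first."""
    scores = {name: pearson(values, template) for name, values in profiles.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Template: grain % for eight hypothetical samples (two per diet group).
template = [0, 0, 15, 15, 30, 30, 45, 45]
profiles = {
    "compound_A": [1.0, 1.2, 2.1, 1.9, 3.0, 3.2, 4.1, 3.9],  # rises with grain
    "compound_B": [4.0, 4.2, 3.1, 2.9, 2.0, 2.1, 1.0, 1.1],  # falls with grain
    "compound_C": [2.0, 2.1, 1.9, 2.0, 2.1, 1.9, 2.0, 2.0],  # no clear trend
}
ranked = template_match(profiles, template)
for name, r in ranked:
    print(name, round(r, 3))
```

The template does not have to be linear: a pattern such as high, medium, low, high across four groups can be encoded as a vector and scored the same way.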
Question • Q: Identify compounds that decrease across the first three groups but increase in the last group.
PCA Loading Plot • Compounds most responsible for separation
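To show where loadings come from, the sketch below computes the first principal component of a small table by power iteration on the covariance matrix and ranks variables by the absolute value of their loading. This is a simplified stand-in using only the standard library, not the routine MetaboAnalyst uses; the function names, metabolite labels and data are assumptions.

```python
import math


def first_pc_loadings(table, n_iter=200):
    """Approximate PC1 loadings by power iteration on the covariance matrix."""
    n, p = len(table), len(table[0])
    means = [sum(row[j] for row in table) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in table]
    cov = [
        [sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(p)]
        for a in range(p)
    ]
    # Non-uniform start vector to reduce the chance of starting
    # orthogonal to the dominant eigenvector.
    v = [1.0 / (j + 1) for j in range(p)]
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def rank_by_loading(names, loadings):
    """Sort variable names by absolute loading, largest first."""
    return sorted(zip(names, loadings), key=lambda item: abs(item[1]), reverse=True)


# Made-up data: six samples in two groups; metabolite_1 separates the groups.
names = ["metabolite_1", "metabolite_2", "metabolite_3"]
table = [
    [1.0, 5.0, 2.0],
    [1.2, 5.1, 2.1],
    [0.9, 4.9, 1.9],
    [5.0, 5.3, 2.0],
    [5.2, 5.2, 2.1],
    [4.9, 5.4, 1.9],
]
loadings = first_pc_loadings(table)
ranked_loadings = rank_by_loading(names, loadings)
for name, value in ranked_loadings:
    print(name, round(value, 3))
```

Variables far from the origin of a loading plot correspond to the largest absolute loadings in this ranking. The data here are unscaled, so the variable with the largest raw variance dominates; scaling the columns first changes the picture.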
Question • Q: Identify compounds that contribute most to the separation between groups 15 and 45.