Canadian Bioinformatics Workshops - PowerPoint PPT Presentation

Presentation Transcript

  1. Canadian Bioinformatics Workshops www.bioinformatics.ca

  2. Module #: Title of Module 2

  3. Module 7: Metabolomic Data Analysis Using MetaboAnalyst • David Wishart • Informatics and Statistics for Metabolomics • July 8-9, 2013

  4. Learning Objectives • To become familiar with the standard metabolomics data analysis workflow • To become aware of key elements such as: data integrity checking, outlier detection, quality control, normalization, scaling, etc. • To learn how to use MetaboAnalyst to facilitate data analysis

  5. A Typical Metabolomics Experiment

  6. 2 Routes to Metabolomics • Quantitative (Targeted) Methods • Chemometric (Profiling) Methods [Figures: NMR spectra (ppm axis, 7 to 1) with peaks labeled TMAO, creatinine, hippurate, allantoin, taurine, citrate, urea, 2-oxoglutarate, water, succinate and fumarate; PCA scores plot (PC1 vs. PC2) with groups ANIT, Control and PAP]

  7. Metabolomics Data Workflow
     Chemometric Methods: • Data Integrity Check • Spectral alignment or binning • Data normalization • Data QC/outlier removal • Data reduction & analysis • Compound ID
     Targeted Methods: • Data Integrity Check • Compound ID and quantification • Data normalization • Data QC/outlier removal • Data reduction & analysis

  8. Data Integrity/Quality • LC-MS and GC-MS data contain a high number of false positive peaks • Problems with adducts (LC), extra derivatization products (GC), isotopes, breakdown products (ionization issues), etc. • Not usually a problem with NMR • Check using replicates and adduct calculators • MZedDB: http://maltese.dbs.aber.ac.uk:8888/hrmet/index.html • HMDB: http://www.hmdb.ca/search/spectra?type=ms_search

  9. Data/Spectral Alignment • Important for LC-MS and GC-MS studies • Not so important for NMR (pH variation) • Many programs available (XCMS, ChromA, MZmine) • Most are based on time warping algorithms • http://mzmine.sourceforge.net/ • http://bibiserv.techfak.uni-bielefeld.de/chroma • http://metlin.scripps.edu/download/

  10. Binning (3000 pts to 14 bins) • Each bin becomes one point (xi, yi), e.g. x = 232.1 (AOC), y = 10 (bin #) [Figure: spectrum divided into bin1, bin2, bin3, bin4, bin5, bin6, bin7, bin8...]
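
The binning idea on this slide can be sketched in a few lines of Python. This is only an illustration, assuming equal-width bins and summed intensity as the per-bin area; the function name `bin_spectrum` and the flat test spectrum are hypothetical, and real tools may use adaptive bin boundaries instead.

```python
def bin_spectrum(intensities, n_bins):
    """Sum consecutive intensity points into n_bins near-equal-width bins."""
    n_points = len(intensities)
    if n_bins <= 0 or n_bins > n_points:
        raise ValueError("n_bins must be between 1 and the number of points")
    binned = []
    for b in range(n_bins):
        # Integer arithmetic spreads any remainder evenly across the bins.
        start = b * n_points // n_bins
        stop = (b + 1) * n_points // n_bins
        binned.append(sum(intensities[start:stop]))
    return binned


# Hypothetical flat spectrum of 3000 points reduced to 14 bins.
spectrum = [1.0] * 3000
bins = bin_spectrum(spectrum, 14)
```

Each entry of `bins` then plays the role of one (bin #, area) pair, so a 3000-point spectrum becomes a 14-value feature vector.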

  11. Data Normalization/Scaling Same or different? • Can scale to sample or scale to feature • Scaling to whole sample controls for dilution • Normalize to integrated area, probabilistic quotient method, internal standard, sample specific (weight or volume of sample) • Choice depends on sample & circumstances
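
As a rough illustration of scaling to the whole sample, the sketch below shows normalization to integrated area followed by a probabilistic quotient step. It assumes the reference profile is the feature-wise median of the area-normalized samples, which is one common choice; the function names and the toy data are hypothetical and this is not MetaboAnalyst's own code.

```python
from statistics import median


def normalize_by_sum(sample, target=1.0):
    """Scale one sample (row) so that its values add up to `target`."""
    total = sum(sample)
    if total == 0:
        raise ValueError("cannot normalize an all-zero sample")
    return [target * value / total for value in sample]


def probabilistic_quotient(samples):
    """Probabilistic quotient normalization (assumed reference: median profile)."""
    # Step 1: normalize every sample to its integrated area.
    scaled = [normalize_by_sum(sample) for sample in samples]
    # Step 2: build the reference profile as the per-feature median.
    reference = [median(column) for column in zip(*scaled)]
    normalized = []
    for sample in scaled:
        # Step 3: quotient of each feature against the reference.
        quotients = [v / r for v, r in zip(sample, reference) if r > 0]
        # Step 4: the median quotient is taken as the dilution factor.
        factor = median(quotients)
        normalized.append([v / factor for v in sample])
    return normalized


# Hypothetical data: the second sample is a two-fold dilution of the first.
samples = [[2.0, 4.0, 6.0], [1.0, 2.0, 3.0]]
corrected = probabilistic_quotient(samples)
```

After correction the two rows should coincide, which is the sense in which scaling to the sample controls for dilution.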

  12. Data Normalization/Scaling • Can scale to sample or scale to feature • Scaling to feature(s) helps manage outliers • Several feature scaling options available: log transformation, auto-scaling, Pareto scaling, probabilistic quotient, and range scaling • MetaboAnalyst http://www.metaboanalyst.ca • Dieterle F et al. Anal Chem. 2006 Jul 1;78(13):4281-90.

  13. Data QC, Outlier Removal & Data Reduction • Data filtering (remove solvent peaks, noise filtering, false positives, outlier removal -- needs justification) • Dimensional reduction or feature selection to reduce number of features or factors to consider (PCA or PLS-DA) • Clustering to find similarity

  14. MetaboAnalyst • Web server designed to handle large sets of LC-MS, GC-MS or NMR-based metabolomic data • Supports both univariate and multivariate data processing, including t-tests, ANOVA, PCA, PLS-DA • Identifies significantly altered metabolites, produces colorful plots, provides detailed explanations & summaries • Links significant metabolites to pathways via SMPDB • http://www.metaboanalyst.ca

  15. MetaboAnalyst Workflow

  16. [MetaboAnalyst workflow diagram]
     Data input: • GC/LC-MS raw spectra • Peak lists • Spectral bins • Concentration table
     Data processing: • Spectra processing • Peak processing • Noise filtering • Missing value estimation
     Data integrity check
     Data normalization: • Row-wise normalization • Column-wise normalization • Combined approach
     Statistical Exploration
       Two/multi-group analysis: • Univariate analysis • Correlation analysis • Chemometric analysis • Feature selection • Cluster analysis • Classification
       Time-series analysis: • Data overview • Two-way ANOVA • ANOVA-SCA • Time-course analysis
     Functional Interpretation
       Enrichment analysis: • Over representation analysis • Single sample profiling • Quantitative enrichment analysis
       Pathway analysis: • Enrichment analysis • Topology analysis • Interactive visualization
     Outputs: • Processed data • Result tables • Analysis report • Images
     Image Center: • Resolution: 150/300/600 dpi • Format: png, tiff, pdf, svg, ps
     Quality checking: • Methods comparison • Temporal drift • Batch effect • Biological checking
     Other utilities: • Peak searching • Pathway mapping • Name/ID conversion • Lipidomics

  17. MetaboAnalyst Overview • Raw data processing • Using MetaboAnalyst • Data Reduction & Statistical analysis • Using MetaboAnalyst • Functional enrichment analysis • Using MSEA in MetaboAnalyst • Metabolic pathway analysis • Using MetPA in MetaboAnalyst

  18. Example Datasets

  19. Example Datasets

  20. Metabolomic Data Processing

  21. Common Tasks • Purpose: to convert various raw data forms into data matrices suitable for statistical analysis • Supported data formats • Concentration tables (Targeted Analysis) • Peak lists (Untargeted) • Spectral bins (Untargeted) • Raw spectra (Untargeted)

  22. Data Upload

  23. Alternatively …

  24. Data Set Selected • Here we will be selecting a data set from dairy cattle fed different proportions of cereal grains (0%, 15%, 30%, 45%) • Rumen samples were analyzed by NMR spectroscopy using quantitative metabolomic techniques • High grain diets are thought to be stressful on cows

  25. Data Integrity Check

  26. Data Normalization

  27. Data Normalization • At this point, the data have been transformed into a matrix with the samples in rows and the variables (compounds/peaks/bins) in columns • MetaboAnalyst offers three types of normalization: row-wise normalization, column-wise normalization and combined normalization • Row-wise normalization aims to make the samples (rows) comparable to each other (e.g. urine samples with different dilution effects)

  28. Data Normalization • Column-wise normalization aims to make the variables (columns) comparable in scale to each other and to bring their distributions closer to a “normal” distribution • This procedure is useful when variables are of very different orders of magnitude • Four methods have been implemented for this purpose: log transformation, autoscaling, Pareto scaling and range scaling
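
The four column-wise methods named on this slide can be sketched as follows. The definitions used here are the usual textbook ones (mean-centering, then dividing by the standard deviation, its square root, or the range); the log base, the function names and the use of the sample standard deviation are assumptions rather than a statement of how MetaboAnalyst implements them.

```python
import math
from statistics import mean, stdev


def scale_column(values, method):
    """Scale one variable (column) with the chosen method."""
    if method == "log":
        # Assumes strictly positive values; base 2 is an arbitrary choice.
        return [math.log2(v) for v in values]
    centre = mean(values)
    sd = stdev(values)
    if method == "auto":        # mean-centre, divide by the standard deviation
        divisor = sd
    elif method == "pareto":    # mean-centre, divide by sqrt(standard deviation)
        divisor = math.sqrt(sd)
    elif method == "range":     # mean-centre, divide by (max - min)
        divisor = max(values) - min(values)
    else:
        raise ValueError(f"unknown method: {method}")
    if divisor == 0:
        raise ValueError("constant column cannot be scaled")
    return [(v - centre) / divisor for v in values]


def scale_table(table, method):
    """Apply scale_column to every column of a samples-by-variables table."""
    columns = [scale_column(list(col), method) for col in zip(*table)]
    return [list(row) for row in zip(*columns)]


# Hypothetical table: two variables on very different scales.
table = [[2.0, 1000.0], [4.0, 2000.0], [6.0, 3000.0]]
autoscaled = scale_table(table, "auto")
```

After autoscaling, both columns of this toy table end up with identical values, which shows how scaling removes the difference in magnitude between variables.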

  29. Normalization Result

  30. Quality Control • Dealing with outliers • Detected mainly by visual inspection • May be corrected by normalization • May be excluded • Noise reduction • More of a concern for spectral bins/ peak lists • Usually improves downstream results

  31. Visual Inspection • What does an outlier look like? • Finding outliers via PCA • Finding outliers via Heatmap

  32. Outlier Removal

  33. Noise Reduction

  34. Noise Reduction (cont.) • Characteristics of noise & uninformative features • Low intensities • Low variances (default)
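
A variance-based filter of the kind described here might look like the sketch below. It assumes features are ranked by sample variance and the lowest-ranked fraction is dropped; the function name, the `drop_fraction` parameter and the toy table are hypothetical.

```python
from statistics import variance


def filter_low_variance(table, names, drop_fraction=0.25):
    """Drop the lowest-variance fraction of features (columns)."""
    columns = list(zip(*table))
    variances = [variance(column) for column in columns]
    n_drop = int(len(columns) * drop_fraction)
    # Column indices ordered from lowest to highest variance.
    order = sorted(range(len(columns)), key=lambda j: variances[j])
    dropped = set(order[:n_drop])
    keep = [j for j in range(len(columns)) if j not in dropped]
    kept_names = [names[j] for j in keep]
    kept_table = [[row[j] for j in keep] for row in table]
    return kept_names, kept_table


# Hypothetical table: 3 samples (rows) by 4 features (columns).
names = ["a", "b", "c", "d"]
table = [
    [1.0, 10.0, 5.0, 0.1],
    [1.1, 20.0, 5.0, 0.9],
    [0.9, 30.0, 5.0, 0.5],
]
kept_names, kept_table = filter_low_variance(table, names, drop_fraction=0.5)
```

Note that variance depends on the scale of each feature, so a filter like this is sensitive to whether it runs before or after column-wise scaling.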

  35. Data Reduction and Statistical Analysis

  36. Common tasks • To identify important features • To detect interesting patterns • To assess differences between the phenotypes • To facilitate classification or prediction

  37. ANOVA
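
For readers who want to see what a one-way ANOVA computes for a single compound, here is a minimal sketch of the F statistic across the four diet groups. The concentrations are made up for illustration and the function name is hypothetical; a p-value would additionally need the F distribution (for example via `scipy.stats.f_oneway`), which is left out to keep the example dependency-free.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of groups of values."""
    k = len(groups)
    n_total = sum(len(group) for group in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and more values than groups")
    grand_mean = sum(sum(group) for group in groups) / n_total
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of values around their own group mean.
    ss_within = sum((value - mean(g)) ** 2 for g in groups for value in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


# Hypothetical concentrations of one compound at each grain percentage.
groups = {
    "0%": [1.0, 2.0, 3.0],
    "15%": [2.0, 3.0, 4.0],
    "30%": [5.0, 6.0, 7.0],
    "45%": [6.0, 7.0, 8.0],
}
f_value = one_way_anova_f(list(groups.values()))
```

A large F value only says that at least one group mean differs; identifying which pairs of groups differ requires a post-hoc comparison.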

  38. View Individual Compounds

  39. Questions • Q: Which compounds show a significant difference between all the neighboring groups (0-15, 15-30, and 30-45)? • Q: For Uracil, are groups 15, 30, 45 significantly different from each other?

  40. Overall correlation pattern

  41. High resolution image • Specify format • Specify resolution • Specify size

  42. Question • Q: In untargeted metabolomics using NMR, researchers often look for the region(s) of the spectra showing the biggest change in their correlation patterns under different conditions. Can you do that in MetaboAnalyst? • Hint: check the available parameters of Correlation analysis

  43. Template Matching • Looking for compounds showing interesting patterns of change • Essentially a method to look for linear trends or periodic trends in the data • Best for data that has 3 or more groups

  44. Template Matching (cont.) • Strong positive linear correlation with grain % • Strong negative linear correlation with grain %
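
One way to picture template matching is as a correlation between each compound's profile across the groups and a user-defined pattern. The sketch below assumes Pearson correlation against a linear template over the four diet groups; the compound names, values and function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def match_template(profiles, template):
    """Rank compounds by correlation with the template, highest first."""
    scores = {name: pearson(values, template) for name, values in profiles.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical group means at 0%, 15%, 30% and 45% grain.
profiles = {
    "compound_up": [1.0, 2.0, 3.0, 4.0],
    "compound_down": [8.0, 6.0, 4.0, 2.0],
    "compound_mixed": [3.0, 2.0, 1.0, 5.0],
}
ranked = match_template(profiles, template=[1, 2, 3, 4])
```

Compounds at the top of `ranked` follow the increasing trend and those at the bottom follow the opposite trend. A non-monotonic template, for example `[3, 2, 1, 3]`, would instead favour compounds that fall and then rise.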

  45. Question • Q: Which compounds decrease across the first three groups but increase in the last group?

  46. PCA Scores Plot

  47. PCA Loading Plot • Compounds most responsible for separation
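
To make the link between loadings and "compounds most responsible for separation" concrete, here is a small dependency-free sketch that estimates the first principal component by power iteration on mean-centered data. Power iteration is a stand-in chosen for brevity, not necessarily what MetaboAnalyst uses; the function name and the toy table are hypothetical.

```python
import math


def first_principal_component(table, n_iter=200):
    """Return (loadings, scores) of PC1 for a samples-by-variables table."""
    n_rows, n_cols = len(table), len(table[0])
    means = [sum(row[j] for row in table) / n_rows for j in range(n_cols)]
    centred = [[row[j] - means[j] for j in range(n_cols)] for row in table]
    # Start from a uniform unit vector and repeatedly apply X^T X.
    loadings = [1.0 / math.sqrt(n_cols)] * n_cols
    for _ in range(n_iter):
        scores = [sum(x * w for x, w in zip(row, loadings)) for row in centred]
        updated = [
            sum(centred[i][j] * scores[i] for i in range(n_rows))
            for j in range(n_cols)
        ]
        norm = math.sqrt(sum(v * v for v in updated))
        if norm == 0:
            break  # no variance in the direction explored; keep last estimate
        loadings = [v / norm for v in updated]
    scores = [sum(x * w for x, w in zip(row, loadings)) for row in centred]
    return loadings, scores


# Hypothetical table: the first variable carries almost all of the variance.
table = [[0.0, 1.0], [10.0, 1.1], [20.0, 0.9], [30.0, 1.0]]
loadings, scores = first_principal_component(table)
```

Variables with the largest absolute loading dominate PC1, so on a loading plot they sit farthest from the origin along that axis. In practice the data would usually be scaled first, otherwise high-magnitude variables dominate simply because of their units.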

  48. 3D-PCA

  49. Question • Q: Which compounds contribute most to the separation between groups 15 and 45?