Metabolomics: A Promising ‘omics Science

By Susan Simmons

University of North Carolina Wilmington

Collaborators

  • Dr. David Banks, Duke

  • Dr. Chris Beecher, University of Michigan

  • Dr. Xiaodong Lin, University of Cincinnati

  • Dr. Young Truong, UNC

  • Dr. Jackie Hughes-Oliver, NC State

  • Dr. Stanley Young, NISS

  • Dr. Ann Stapleton, UNCW Biology

  • Dr. Robert Simmons, MD

What is Metabolomics?

  • The word metabolome was first used in 1998 and referred to all low-molecular-mass compounds synthesized and modified by a living cell or organism (Villas-Boas, 2007)

  • The complete human metabolome consists of endogenous metabolites (~1,800) and a far larger number of exogenous metabolites

  • Human Metabolome Project

Fluorene degradation: reference pathway (KEGG, Kyoto Encyclopedia of Genes and Genomes, www.genome.jp/kegg)

Mass Distribution of Compounds in the Human Metabolome

  • Metabolome

    • natively biosynthesized

    • monomeric

  • Complex metabolites

  • Xenobiome

History of Metabolomics

  • Machinery to detect metabolites has existed since the late 1960s

  • The first paper appeared in 1971 (Robinson and Pauling)

  • The first paper using the term “metabolomics” appeared in the late 1990s

Why Metabolomics can be promising

  • Easy-to-use screening for disease

  • Assist in identifying gene function

  • Drug discovery

  • Assessment of toxicity (especially liver toxicity) in new drugs.

  • Nutrigenomics and diet strategies

Genomics, Proteomics and Metabolomics

The emerging science of Metabolomics


  • Genomics – 25,000 genes

  • Transcriptomics – 100,000 transcripts

  • Proteomics – 1,000,000 proteins

  • Metabolomics – 1,800 compounds (biochemicals, i.e., metabolites)


Biochemical Profile Map to Metabolic Pathways

Data Collection and Measurement Issues

To obtain data, a tissue sample is taken from a patient. Then:

  • The sample is prepared and placed into wells on a silicon plate.

  • Each well’s aliquot is subjected to gas and/or liquid chromatography.

  • After separation, the sample goes to a mass spectrometer.

MS platforms

  • Data extraction: peak identification, peak alignment, peak deconvolution

  • Chemical identification: reference databases, ion spectra, grouping related ions, compound identification

  • Quality control

  • Data reduction

  • No interpretation interface

Data Collection and Measurement Issues

Sample preparation involves stabilizing the sample, adding spiked-in calibrants, and creating multiple aliquots (some of which are frozen) for QC purposes. This step is robotically automated.

Sources of error in this step include:

  • within-subject variation

  • within-tissue variation

  • contamination by cleaning solvents

  • calibrant uncertainty

  • evaporation of volatiles.

Data Collection and Measurement Issues

The result is a set of m/z ratios and timestamps for each ion, which can be viewed as a 2-D histogram in the m/z × time plane.
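A minimal sketch of that 2-D histogram follows, assuming each detected ion arrives as an (m/z, time) pair and that fixed-width rectangular bins are acceptable. The function name `ion_histogram` and the bin widths are illustrative choices, not part of any instrument's software.

```python
from collections import Counter

def ion_histogram(ions, mz_width, t_width):
    """Count ions in fixed-width cells of the m/z x time plane.

    ions: iterable of (mz, t) pairs, one per detected ion (assumed format).
    mz_width, t_width: bin widths (illustrative; in practice they depend on
    the instrument's mass resolution and scan rate).
    Returns a sparse histogram: Counter mapping (mz_bin, t_bin) -> count.
    """
    counts = Counter()
    for mz, t in ions:
        cell = (int(mz // mz_width), int(t // t_width))
        counts[cell] += 1
    return counts

# Toy ions: three near m/z 100 at about t = 5, one at m/z 250.5, t = 12.
ions = [(100.02, 5.1), (100.04, 5.3), (250.5, 12.0), (100.01, 5.2)]
hist = ion_histogram(ions, mz_width=0.5, t_width=1.0)
print(hist[(200, 5)], hist[(501, 12)])  # 3 1
```

A sparse Counter is used because most cells of the m/z × time plane are empty.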

One now estimates the amount of each metabolite. This entails normalization, which also introduces error.

The caveats pointed out in Baggerly et al. (Proteomics, 2003) apply.

Data Collection and Measurement Issues

  • Baseline correction

  • Alignment

  • Estimating the quantity of specific metabolites

Data Collection and Measurement Issues

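Let $\mathbf{z}$ be the vector of raw data and let $\mathbf{x}$ be the vector of estimates. Then the measurement equation is

$$G(\mathbf{z}) = \mathbf{x} = \boldsymbol{\mu} + \boldsymbol{\varepsilon}$$

where $\boldsymbol{\mu}$ is the vector of unknown true values and $\boldsymbol{\varepsilon}$ is decomposable into separate components.

For metabolite $i$, the estimate $X_i$ is

$$X_i = g_i(\mathbf{z}) = \ln \sum_j w_{ij} \iint \left[ s_m(\mathbf{z}) - c(m,t) \right] dm\, dt$$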
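To make the estimator for metabolite i concrete, here is a hypothetical discretization. It assumes the signal and the baseline c(m, t) are sampled on a regular (m, t) grid, that each index j is a rectangular peak window, that the baseline is subtracted inside the integral, and that the weights w_ij are already known (in practice they come from the calibrants). The double integral becomes a Riemann sum; none of the function or argument names come from the presentation.

```python
import math

def metabolite_estimate(signal, baseline, windows, weights, dm=1.0, dt=1.0):
    """Hypothetical discretization of X_i = ln sum_j w_ij * iint [s - c] dm dt.

    signal, baseline: 2-D lists indexed [m_index][t_index] on a regular grid
    windows: one (m_lo, m_hi, t_lo, t_hi) half-open index box per peak j
    weights: one w_ij per window (assumed already derived from calibrants)
    dm, dt: grid spacings; the double integral becomes a Riemann sum
    """
    total = 0.0
    for (m_lo, m_hi, t_lo, t_hi), w in zip(windows, weights):
        area = 0.0
        for a in range(m_lo, m_hi):
            for b in range(t_lo, t_hi):
                area += (signal[a][b] - baseline[a][b]) * dm * dt
        total += w * area
    if total <= 0:
        raise ValueError("weighted, baseline-corrected area must be positive")
    return math.log(total)

# Toy 3 x 3 grid: constant signal 2.0 over a constant baseline 1.0.
signal = [[2.0] * 3 for _ in range(3)]
baseline = [[1.0] * 3 for _ in range(3)]
est = metabolite_estimate(signal, baseline,
                          windows=[(0, 2, 0, 2), (2, 3, 2, 3)],
                          weights=[1.0, 0.5])
# Areas are 4 and 1, so est = ln(1.0 * 4 + 0.5 * 1) = ln(4.5).
```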

Data Collection and Measurement Issues

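The law of propagation of error (essentially the delta method) says that the variance of $X$ is approximately

$$\operatorname{Var}[X] \approx \sum_{i=1}^{n} \left( \frac{\partial g}{\partial z_i} \right)^2 \operatorname{Var}[z_i] + 2 \sum_{i<k} \frac{\partial g}{\partial z_i} \frac{\partial g}{\partial z_k} \operatorname{Cov}[z_i, z_k]$$

The weights $w_{ij}$ depend on the values of the spiked-in calibrants, so this gets complicated.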
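The propagation-of-error formula is the quadratic form (gradient of g)ᵀ · Cov(z) · (gradient of g). The sketch below estimates the gradient by central finite differences (our choice; analytic derivatives are preferable when available) and uses a toy g, so it illustrates the formula rather than the full calibrant-weighted estimator. The function name and the step size `h` are assumptions.

```python
import math

def delta_method_variance(g, z, cov, h=1e-6):
    """First-order (delta-method) approximation to Var[g(z)].

    g: function taking a list of floats and returning a float
    z: point at which g is linearized (e.g. the observed raw data)
    cov: covariance matrix of z as a list of lists
    h: finite-difference step (an arbitrary default)
    """
    n = len(z)
    grad = []
    for i in range(n):
        up, down = list(z), list(z)
        up[i] += h
        down[i] -= h
        grad.append((g(up) - g(down)) / (2 * h))
    # Summing over all (i, k) with a symmetric cov gives
    # sum_i grad_i^2 Var[z_i] + 2 * sum_{i<k} grad_i grad_k Cov[z_i, z_k].
    return sum(grad[i] * cov[i][k] * grad[k] for i in range(n) for k in range(n))

# Toy example: X = ln(z1 + z2) with correlated inputs.
cov = [[0.10, 0.02],
       [0.02, 0.20]]
var_x = delta_method_variance(lambda v: math.log(v[0] + v[1]), [2.0, 3.0], cov)
```

Here both partial derivatives equal 1/5, so the variance is (1/25)(0.10 + 0.20 + 2 · 0.02) = 0.0136.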

Data Collection and Measurement Issues

Cross-platform experiments are also crucial for medical use. They lead to key comparison designs, in which the same sample (or aliquots of a standard solution or sample) is sent to multiple labs and each lab produces its own spectrogram.

It is impossible to decide which lab is best, but one can estimate how to adjust for interlab differences.

Data Collection and Measurement Issues

We suggest the Mandel bundle-of-lines model for interlaboratory comparisons. It assumes

$$X_{ik} = \alpha_i + \beta_i \theta_k + \varepsilon_{ik}$$

where $X_{ik}$ is the estimate at lab $i$ for metabolite $k$, $\theta_k$ is the unknown true quantity of metabolite $k$, and $\varepsilon_{ik} \sim N(0, \sigma_{ik}^2)$.

Data Collection and Measurement Issues

To solve for the parameters given the labs' values, one must impose identifiability constraints. A Bayesian can put priors on the laboratory coefficients and the error variance.
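One simple set of constraints, common in Mandel-style analyses, estimates each θk by the average over labs. That choice forces the intercepts αi to sum to zero and the slopes βi to average one, and each lab's line is then an ordinary least-squares fit against those averages. The sketch below shows only this non-Bayesian fit on noise-free toy data with made-up lab coefficients; it is not the Bayesian or multivariate version the presentation calls for, and `fit_mandel` is a hypothetical name.

```python
def fit_mandel(X):
    """Fit X[i][k] = alpha_i + beta_i * theta_k + e_ik by regressing on lab averages.

    X: list of rows, one per lab, one column per metabolite.
    Constraint (an assumption): theta_k is estimated by the mean over labs,
    which makes sum(alpha) = 0 and mean(beta) = 1 exactly.
    """
    n_labs, n_items = len(X), len(X[0])
    theta = [sum(row[k] for row in X) / n_labs for k in range(n_items)]
    t_bar = sum(theta) / n_items
    s_tt = sum((t - t_bar) ** 2 for t in theta)
    if s_tt == 0:
        raise ValueError("the metabolite averages must not all be equal")
    alphas, betas = [], []
    for row in X:
        x_bar = sum(row) / n_items
        # Ordinary least-squares slope and intercept of this lab against theta.
        beta = sum((x - x_bar) * (t - t_bar) for x, t in zip(row, theta)) / s_tt
        alphas.append(x_bar - beta * t_bar)
        betas.append(beta)
    return alphas, betas, theta

# Noise-free toy data for three labs (coefficients chosen for illustration).
true_theta = [1.0, 2.0, 3.0, 4.0]
lab_coefs = [(0.5, 1.2), (-0.5, 0.8), (0.0, 1.0)]
X = [[a + b * t for t in true_theta] for a, b in lab_coefs]
alphas, betas, theta = fit_mandel(X)
```

Because the toy coefficients already satisfy the constraints, the fit recovers them exactly; with real data the residuals would feed estimates of the σ terms.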

Metabolomics needs a multivariate version, with models for the rates at which compounds volatilize.

Statistical issues

  • Many missing values

  • Outliers

  • Metabolite abundances are not normally distributed

  • n < p (more metabolites than samples)

  • Correlated metabolites

Statistical Issues

  • PCA or ICA

  • Partial Least Squares

  • Clustering

  • Random Forest, SVM

  • rSVD

Statistical issues

Dealing with missing values

  • Replacing missing values with 0s is not necessarily a good idea: the true abundances are not actually 0.

  • Replacing them with the minimum, half the minimum, or a draw from Uniform(0, minimum)

  • Random forest imputation

  • Observing the conditional distribution (Dr. Young Truong at UNC)
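The simple substitution rules in this list can be sketched per metabolite as follows. The sketch assumes missing abundances are stored as None and that the minimum is taken over each metabolite's observed values; the presentation does not say whether a per-metabolite or a global minimum was used. Random-forest and conditional-distribution imputation are not shown, and the function names are hypothetical.

```python
import random

def impute_column(values, method="half_min", rng=None):
    """Replace None entries in one metabolite's abundances.

    method: "min", "half_min", or "uniform" (a draw from Uniform(0, min)),
    where min is the smallest observed value in this column.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to impute from")
    lo = min(observed)
    if rng is None:
        rng = random.Random()
    if method == "min":
        fill = lambda: lo
    elif method == "half_min":
        fill = lambda: lo / 2
    elif method == "uniform":
        fill = lambda: rng.uniform(0, lo)
    else:
        raise ValueError(f"unknown method: {method}")
    return [fill() if v is None else v for v in values]

def impute_matrix(X, method="half_min", seed=0):
    """Impute each column (metabolite) of a samples-by-metabolites matrix."""
    rng = random.Random(seed)
    columns = [impute_column([row[k] for row in X], method, rng)
               for k in range(len(X[0]))]
    return [list(row) for row in zip(*columns)]

# Three samples, two metabolites; None marks a value below detection.
X_missing = [[4.0, None],
             [None, 2.0],
             [8.0, 6.0]]
print(impute_matrix(X_missing))  # [[4.0, 1.0], [2.0, 2.0], [8.0, 6.0]]
```

The uniform option draws a fresh value for each missing entry, which avoids piling every imputed value onto a single point.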

Statistical Issues

Prediction and Classification

  • Partial least squares

  • Random Forest

  • SVM

  • Neural networks

Statistical Issues

Identifying relationships

  • MDS

  • Clustering

  • rSVD (PowerMV from NISS)

ALS metabolomic data set

We had abundance data on 317 metabolites from 63 subjects. Of these, 32 were healthy, 22 had ALS but were not on medication, and 9 had ALS and were taking medication.

The goal was to classify subjects into the healthy group and the two ALS groups.

Here p > n (317 metabolites, 63 subjects). Also, some abundances were below the detection limit.

ALS metabolomic data set

Using the Breiman-Cutler code for Random Forests, the out-of-bag error rate was 7.94%: 29 of the 31 ALS patients and 29 of the 32 healthy subjects were correctly classified, so 5 of the 63 subjects were misclassified.

Twenty of the 317 metabolites were important in the classification, and three were dominant.

Random forests can also detect outliers via proximity scores; four outliers were found.
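Breiman and Cutler's random-forest documentation describes an outlier measure built from the proximity matrix: a case is outlying when its proximities to the other cases in its own class are small. The sketch below assumes a precomputed proximity matrix (the fraction of trees in which two cases land in the same terminal node) and standardizes within each class by the median and median absolute deviation. Excluding self-proximity, the zero-MAD fallback, the function name, and the toy matrix are our assumptions.

```python
import statistics

def rf_outlier_scores(prox, labels):
    """Outlier measure from a random-forest proximity matrix.

    prox: symmetric n-by-n list of lists with entries in [0, 1].
    labels: class label for each case.
    raw(a) = n / (sum of squared proximities from a to the other cases in
    a's class), then standardized within each class by median and MAD.
    """
    n = len(labels)
    raw = []
    for a in range(n):
        p = sum(prox[a][b] ** 2 for b in range(n)
                if b != a and labels[b] == labels[a])
        raw.append(n / p if p > 0 else float("inf"))
    scores = [0.0] * n
    for cls in set(labels):
        idx = [a for a in range(n) if labels[a] == cls]
        vals = [raw[a] for a in idx]
        med = statistics.median(vals)
        mad = statistics.median(abs(v - med) for v in vals)
        for a in idx:
            # Assumption: with a zero MAD, report the unscaled deviation.
            scores[a] = (raw[a] - med) / mad if mad > 0 else raw[a] - med
    return scores

# Toy proximities: cases 0-3 are close to each other, case 4 is not.
prox = [
    [1.0, 0.9, 0.8, 0.7, 0.1],
    [0.9, 1.0, 0.6, 0.8, 0.1],
    [0.8, 0.6, 1.0, 0.9, 0.1],
    [0.7, 0.8, 0.9, 1.0, 0.1],
    [0.1, 0.1, 0.1, 0.1, 1.0],
]
scores = rf_outlier_scores(prox, ["healthy"] * 5)
```

Large positive scores flag candidate outliers; here case 4 stands far above the rest.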

ALS Metabolomic data set

Several support vector machine approaches were tried on these data:

  • Linear SVM

  • Polynomial SVM

  • Gaussian SVM

  • L1 SVM (Bradley and Mangasarian, 1998)

  • SCAD SVM (Fan and Li, 2000)

The SCAD SVM had the best leave-one-out (LOO) error rate, 14.3%.
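Leave-one-out error refits the classifier n times, each time holding out one subject and checking its predicted class. The sketch takes any classifier as a pair of fit/predict functions. The nearest-centroid classifier and the toy data are stand-ins so the code runs; they are not the SVMs compared in the presentation, and all names are hypothetical.

```python
def loo_error_rate(X, y, fit, predict):
    """Leave-one-out error rate for a classifier given as fit/predict functions."""
    errors = 0
    for i in range(len(X)):
        model = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        if predict(model, X[i]) != y[i]:
            errors += 1
    return errors / len(X)

def fit_centroids(X, y):
    """Stand-in classifier: the mean feature vector of each class."""
    groups = {}
    for x, label in zip(X, y):
        groups.setdefault(label, []).append(x)
    return {label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in groups.items()}

def predict_centroid(centroids, x):
    """Assign x to the class whose centroid is nearest in squared distance."""
    return min(centroids,
               key=lambda label: sum((a - b) ** 2 for a, b in zip(x, centroids[label])))

X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5], [5.5, 5.5]]
y = ["a", "a", "a", "b", "b", "b", "a"]  # last point deliberately mislabeled
rate = loo_error_rate(X, y, fit_centroids, predict_centroid)
# Only the mislabeled point is misclassified when held out: 1 of 7.
```

Swapping in an SVM only requires new `fit` and `predict` functions; the LOO loop is unchanged.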

ALS Metabolomic data set

Robust SVD (Liu et al., 2003) is used to cluster patients (rows) and metabolites (columns) simultaneously. Given the patient-by-metabolite matrix $X$, one writes

$$X_{ik} = r_i c_k + \varepsilon_{ik}$$

where $r_i$ and $c_k$ are row and column effects. One can then sort the array by the effect magnitudes.

ALS metabolomic data set

To compute an rSVD, use alternating L1 regression without an intercept to estimate the row and column effects: first fit the row effects with the column effects held fixed, then reverse the roles. Robustness comes from using L1 regression instead of ordinary least squares (OLS).

Repeating the procedure on the residuals gives the second singular-value component.
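The alternating scheme can be sketched in a few lines. The key fact is that L1 regression through the origin on a single predictor is a weighted median of ratios, so each half-step has a closed form. Starting values (column medians), the fixed iteration count, the unit-norm scaling of the row effects, and all function names are our assumptions; this is not the PowerMV implementation or the code of Liu et al. (2003).

```python
import math
import statistics

def weighted_median(values, weights):
    """A minimizer of sum_k w_k * |v_k - m| (the lower weighted median)."""
    pairs = sorted((v, w) for v, w in zip(values, weights) if w > 0)
    if not pairs:
        return 0.0  # nothing to fit (all weights zero); an arbitrary fallback
    half = sum(w for _, w in pairs) / 2
    acc = 0.0
    for v, w in pairs:
        acc += w
        if acc >= half:
            return v
    return pairs[-1][0]

def l1_through_origin(y, x):
    """argmin_b sum_k |y_k - b * x_k|, i.e. L1 regression with no intercept.

    Since |y_k - b x_k| = |x_k| * |y_k / x_k - b|, the answer is the
    weighted median of the ratios y_k / x_k with weights |x_k|.
    """
    ratios = [(yk / xk, abs(xk)) for yk, xk in zip(y, x) if xk != 0]
    return weighted_median([q for q, _ in ratios], [w for _, w in ratios])

def rank_one_l1(X, n_iter=50):
    """Alternating L1 fit of X[i][k] ~ r[i] * c[k]."""
    n_rows, n_cols = len(X), len(X[0])
    c = [statistics.median(row[k] for row in X) for k in range(n_cols)]
    for _ in range(n_iter):
        # Row effects given column effects, then the reverse.
        r = [l1_through_origin(X[i], c) for i in range(n_rows)]
        c = [l1_through_origin([row[k] for row in X], r) for k in range(n_cols)]
    norm = math.sqrt(sum(v * v for v in r))
    if norm == 0:
        return r, c
    # Fix the scale ambiguity: r gets unit length, c carries the magnitude.
    return [v / norm for v in r], [v * norm for v in c]

def robust_svd(X, n_components=2, n_iter=50):
    """Fit components one at a time, each to the residuals of the previous fit."""
    resid = [list(row) for row in X]
    components = []
    for _ in range(n_components):
        r, c = rank_one_l1(resid, n_iter)
        components.append((r, c))
        resid = [[resid[i][k] - r[i] * c[k] for k in range(len(c))]
                 for i in range(len(r))]
    return components, resid

# Toy 4 x 5 matrix that is exactly rank one, plus one gross outlier.
row_eff = [1.0, 2.0, 3.0, 4.0]
col_eff = [1.0, -1.0, 2.0, 0.5, 3.0]
clean = [[a * b for b in col_eff] for a in row_eff]
X = [list(row) for row in clean]
X[0][0] = 100.0
components, resid = robust_svd(X, n_components=2)
r, c = components[0]
row_order = sorted(range(len(r)), key=lambda i: abs(r[i]), reverse=True)
```

Sorting rows (and, in the same way, columns) by the fitted effects gives the reordered array used for clustering. In the toy example the corrupted entry ends up in the residuals instead of distorting the first component.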

NCI data set

  • NCI-60 cell lines

  • 9 cancer types: breast, CNS, colon, melanoma, renal, leukemia, prostate, ovarian, lung

  • GC-LS

  • Melanoma vs CNS (8 cell lines for melanoma and 6 cell lines for CNS)

Variable Importance using RF

Component 1 versus 2

Useful websites

  • AMDIS software for peak deconvolution (NIST, Gaithersburg, USA)

  • Human Metabolome Database

  • KEGG


  • Many, many others

Concluding Remarks

  • Many interesting statistical issues still need to be addressed.

    • Measurement issues and interlaboratory differences need to be properly addressed.

    • Statistical issues in analyzing metabolomic data remain an interesting challenge.

  • Metabolomics is an important part of understanding systems biology.
