1 / 35

The metabonomics

Combination of Independent Component Analysis and statistical modeling for the identification of metabonomic biomarkers in 1 H-NMR spectroscopy Réjane Rousseau (Institut de Statistique, UCL, Belgium). The metabonomics. specific region = biomarker. Biofluid (Urine). 1 H-NMR spectroscopy.

Download Presentation

The metabonomics

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Combination of Independent Component Analysis and statistical modeling for the identification of metabonomic biomarkers in 1H-NMR spectroscopy Réjane Rousseau (Institut de Statistique, UCL, Belgium)

  2. The metabonomics specific region = biomarker Biofluid (Urine) 1H-NMR spectroscopy Whithout contact METABOLITES TISSUES Organs After contact Frequency domain • One molecule = several peaks ( 1 to 3 ) with specific positions in the frequency domain. • The concentration of a molecule is proportional to the area under the curve in its peaks. Spectral alterations = Altered metabolites = detection Identification of biomarkers «  which part of the spectrum to examine? » • How? • In an experimental database • with a methodology based on Principal Component Analysis (“PCA”) Objective: propose a methodology combining ICA and statistical modeling

  3. Identification of biomarkers: • Experimental data: • Biologists have a pool of n rats with a control pathological state. • They collect n samples of urine . TheH-NMR produces n spectra of m values = Spectral data • Each sample or spectrum is described by l variables in a matrix ofdesign Y. One of these variables describes the characteristic related to the biomarkers: yk • Biomarker identification= statistical methods to answer to the question: « In these multivariate data, which are the most altered variables xj when ykchanges? » Spectral data: X(n x m) Design dataY(n x l) y1= age of the rat yk=y2= severity of diabetis xj

  4. Example: controlled data • Advantage of controlled data: we know the spectral regions that should be identified as biomarkers. • The controlled data : • 28 spectra of 600 points: X(28 x 600) • Each spectrum = a sample of urine + a chosen concentration of Citrate + a chosen concentration of Hippurate X(28x600) Y (28x2) y1= concentration of citrate y2= concentration of hippurate • We need a biomarker to detect changes of the level of citrate described by y1 « Which are the spectral regions xj the most altered when the y1 changes?» Spectral regions corresponding to Citrate = the biomarkers to identify. hippurate (y2) citrate (y1)

  5. spectrum 3000 points … A spectrum of 600 values xj with ∑ xj = 1 • Urine • Citrate • Hippurate = Natural urine Hippurate y2 + Hippurate + Citrate y1 • 14 mixtures in 2 replications • 28 samples Citrate The biomarkers to identify.= spectral regions corresponding to Citrate

  6. NEW USUAL I. Reduction of the dimension: PCA I. Reduction of the dimension: ICA XC = TP • XTC = SAT • Components : • are independent • with a biological meaning • Examination of the ALL components: • to visualize unconnected • molecules in samples • Principal components are : • uncorrelated • in the direction of maximum of variance • Examination of the 2 first components: Score plot Loadings L1 L2 II. Biomarker discovery through Statistical modelling ex: Citrate plays an important role Identificationof biomarker Comparison of the intensities of biomarkers between spectra from ≠ conditions Identificationof biomarker This is only powerful if the biological question is related to the highest variance in the dataset!

  7. The proposed methodology: • Resulting components: • meaningfullcomponent • - with some advantages over • the principal components Part I : Dimension reduction with ICA on the spectral data: XTC= S.AT Part II: Biomarker discovery through statistical modelling Identification of biomarker Comparison of the values in biomarker between spectra from different conditions

  8. What is Independent component analysis (ICA)? • The idea: • Each observed vector of data is a linear combination of unknown independent (not only linearly independent) components • The ICA provides the independent components (sources, sk) which have created a vector of data and the corresponding mixing weights aki. • How do we estimate the sources? with linear transformations of observed signals that maximize the independence of the sources. • How do we evaluate this property of independence? Using theCentral Limit Theorem(*),the independence of sources components can be reflect by non-gaussianity. Solving the ICA problem consists of finding a demixing matrix which maximises the non-gaussianity of the estimated sources under the constraint that their variances are constant. • Fast-ICA algorithm: - uses an objective function related to negentropy - uses fixed-point iteration scheme. * almost any measured quantity which depends on several underlying independent factors has a Gaussian PDF

  9. I. Dimension reduction by ICA : X(nxm)n spectra defined by m variables ex: (28x600) Each spectrum is a weighted sum of the independent spectral expressions which each one can correspond to an independent (composite) metabolite contained in the studied sample. (aT , weight  quantity) Transposition XT(mxn) Centering mixture and sjhave zero mean XTC= S.AT XTC(mxn) “Whitening”: PCA • we obtain uncorrelated scores with unit variance → demixing matrix is orthogonal • possibility to discard irrelevant scores = chose the number of sources to estimate T(mxq) = XTC. P ICA S (mxq) = XTC. P.W = XTC. A

  10. Example: I. Dimension reduction by ICA XTC= S.AT XTC(600 x 28) = S (600 x 6) AT (6x28) s1 s2 s3 s4 s5 s6 xTC1 xTC28 at1 at2 at3 at4 at5 at6 Urine + citrate + hippurate

  11. S (600 x 6) AT 28 spectra Natural urine aTi,8 Citrate Hippurate

  12. Hippurate Hippurate Hippurate Citrate Citrate Citrate Note: Comparison with the usual PCA • Similarities:projection methods linearly decomposing multi-dimensional into components. • Differences: • The number of sources, q, has to be fixed • Sources are not naturally sorted according to their importances • The independence condition = the biggest advantage of the ICA: - independent components are more meaningful than uncorrelated components - more suitable for our question in which the component of interest are not always in the direction with the maximum variance . PCA ICA 1 2 Natural urine

  13. PCA ICA Hippurate & Citrate Natural urine Loading 1 s1 Citrate Loading 2 s2 Hippurate & Citrate Hippurate Loading 3 s3 PC2 aT3 PC1 aT2

  14. The proposed methodology: Part I : Dimension reduction with ICA on the spectral data: XTC= S.AT q sources representing the spectra of independent (unrelated) composite metabolites contained in the samples. Part II: Biomarker discovery through statistical modeling - on the mixing matrix AT - with covariates chosen in the design matrix Identification of biomarkers Comparison of the intensities in biomarkers between spectra from different conditions

  15. PART II: Biomarker discovery with statistical model The idea: Among the q recovered sj , we suppose that some sources: • present biomarker regions for a chosen factor yk • are interpretable as the spectra of pure or composite independent metabolite which has a concentration in the samples influenced by a chosen factor yk • have weights influenced by a chosen factor yk AT (q x n) Modelisation of the relation between the weight vector and the design variables

  16. PART II: Part I : ICA on the spectral data: XTC= S.AT Part II: Biomarker discovery through statistical modelling Step 1: Fit a linear model on AT • relation between the weight vector and the covariates in design variables • different models. Step 2: Biomarker identification • apply statistical tests on the parameters • of the models • selection of sources with • significant effects. Step 3: comparison of the intensities in biomarkers between spectra from different conditions • prediction of mixing weights by the model • reconstruction with biomarkers sources • comparison between factor levels.

  17. Step 1: Fit a model on AT • The design matrix Y is rewritten into 2 separate matrixs: • Z1 the (n x p1) incidence matrix for the p1 covariates with fixed effects • Z2the (n x p2) incidence matrix for the p2 covariates with random effects • For each of the q recovered sj, we assume a linear relation between its vector of weights and the design variables: aj= Z1βj+ Z2 γj +εj • Models with only fixed effects covariates : aj= Z1β j + εj • Case 1: categorical covariates: ANOVA →ex1: biomarker to discriminate 2 groups of subjects: disease & sane. ex2: biomarker to discriminate 3 groups of subjects: disease1, disease2 & sane • Case 2: quantitative covariates : linear regression → ex: biomarker to explore the severity of an illness, the concentration of a drug • Models with only random effects covariates : aj= Z2 γj+ εj → ex: biomarker to explore variance component (machines, subjects, laboratories) • Models with fixed and random effects covariates : Mixed model: aj= Z1β j+ Z2 γj +εj

  18. Step 1: Fit a model: example • For each of the q = 6recovered sj,we construct a multiple linear regression model with 2 quantitative covariates ( p = 3) and no interaction: aj= βj0 + βj1 y1 + βj2 y2 +εj with βj0 = the intercept y1= the citrate concentration in mg (quantitative) y2= the hippurate concentration in mg (quantitative) β1= the effect of citrate on the mean aj for a fixed value of hippurate β2= the effect of hippurate on the mean aj for a fixed value of citrate εj= the vector of independent random error ~ N (0,σ2) • For each of the q recovered sj, the fitted model by least square technique is : âj= bj0 + bj1 y1 + bj2 y2 • In this example, we want to identify biomarkers for the concentration of citrate. The covariate of interest yk= y1

  19. Step 1: Fit a model: example s3:Hippurate s2: Citrate a2 a3 Citrate (y1) Citrate (y1) hippurate (y2) hippurate (y2) (y1) (y1)

  20. Step 2: Biomarker identification • Goal: • Among the q sources, we want to select the onespresenting a significant effect of the chosen covariate yk on their weights. • These “discriminant sources” represent the spectrum of an independent metabolite with a concentration depending on the chosen covariate = biomarkers. • For each source sj, test the significance of the parameter βjk of the covariate of interest yk : (ex: research of biomarkers for the dose of citrate y1, we testeach ofthe 6 βj1) • H0: βjk = 0 vs H1: βjk ≠ 0 • compute the following statistic: tj = bjk / s(bjk) ~ t(n-p) • take the corresponding p-value: pj= P( t (n-p) |tj| ) • We are in a multiple tests situation: the selection of a significant set of r coefficients βjk based on q pj obtained from q individual tests. → Bonferroni correction: select, in a (m x r) matrix S*, the r sources with pj < 0.05/q

  21. Step 2: Biomarker identification: example We research of biomarkers for the dose of citrate y1=yK → we testeach ofthe 6 βj1 P-values α = 0.05/6 Sources 9.18 x 10-13 1.84x10-15 2.86 x 10-31

  22. Step 3: Comparison of the intensities in biomarkers • Goal:comparison of the effects on the biomarker caused by ≠ changes in yk. • Choose 3 or more values of yk: • yk1 : a first value of reference of yk • yk2 : a new value of interest of yk • yk3 : a second new value of interest of yk • Compute: • The effect on the biomarker of the change of yk from yk1 to yk2 : C1= S* βk* (yk2- yk1 ) • The effect on the biomarker of the change of yk from yk1 to yk3 : C2= S* βk* (yk3- yk1 )

  23. Hippurate Step 3: example yk1 yk2 yk3 yk4 Citrate yk

  24. Conclusions: • With the presented methodology combining ICA with statistical modeling, • we visualize the independent metabolites contained in the studied biofluid (through the sources) and their quantity (through the mixing weights) • we identify biomarkers or spectral regions changing according to a chosen factor by a selection of source. • we compare the effects on this spectral biomarkers caused by different changes of this factor. • In comparison with the PCA, ICA: • gives more biologically meaningful and natural representations of this data.

  25. Thank you for your attention

  26. Hippurate Citrate Example2: the data • 18 spectra of 600 values • 1 characteristic in Y X(18x600) Y (18x1) y1= disease group of the rat (qualitative) • We want biomarkers for group of disease described in y1. → a model with qualitative covariates Group 1= disease 1 Group 2= disease 2 Group 3= no disease

  27. Example2 :Part I. Dimension reduction by ICA XTC= S.AT S (600 x 5) AT (5x18)

  28. Example 2: Part II: biomarkers discovery through statistical modeling Step 1: Fit a model on ATModels with only a categorical covariate with fixed effects: ANOVA I aj= Z1β j + εj Step 2: Biomarker identification: • For each of the q recovered sj, test the effect of y1 → Fj statistics→ pj • Bonferroni correction: select, in a (m x r) matrix S*, the r sources with pj < 0.05/q 0.009797431 0.0002412604 0.005710213

  29. Step 3: Comparison of the intensities in biomarkers • Goal:comparison of the effects on the biomarker caused by ≠ changes in yk. • Choose 3 or more values of yk: • yk1 : a first value of reference of yk • yk2 : a new value of interest of yk • yk3 : a second new value of interest of yk • Compute: • The effect on the biomarker of the change of yk from yk1 to yk2 : C1= S* βk* (yk2- yk1 ) • The effect on the biomarker of the change of yk from yk1 to yk3 : C2= S* βk* (yk3- yk1 )

  30. Hippurate Step 3: Comparison of the intensities in biomarkers Goal:comparison of the effects on the biomarker caused by the changes of group. Citrate

  31. Others slides

  32. Example 1 : Step 4: Comparison of the intensities in biomarkers • For a first chosen value of interest of yk (not necessary observed): yk0 • Choose a value of reference for the other factors: y0≠k → Z01 vector of values : (1, yk0,y0≠k) • For each of the r source selected in S*, use the model to predict weights for the biomarkers: âj* (yk0) = bj*Z01 → â* (yk0) a vector of r new weights • Reconstruction of the values to expect in the biomarkers: S*â*(yk0)

  33. Example 2: the reconstructed spectra

More Related