Multivariate Regression Analysis

Aim
• Establish a predictive model between one or more response variables and one or more input variables
[Figure: measurement → response]

Areas where Regression Analysis is useful
• Process and Environmental Monitoring
• Process Control
• Product Quality/Product Properties

Why?
• Reveal correspondences/correlations
• Increased accuracy/precision in the information process
• Improved (reduced) response time in the information process (“on-line”, “at-line”)

How?
1. Collect Data
2. Analyse Data
3. Establish a Predictive Model
   Several responses: $Y = XB$, $y_i = f(x_1, x_2, \ldots, x_m)$
   One response: $y = Xb$, $y = f(x_1, x_2, \ldots, x_m)$


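Multivariate Regression Model: $y = Xb + e$
Here $y$ and $e$ are $n \times 1$, $X$ is $n \times m$ and $b$ is $m \times 1$ ($n$ objects, $m$ variables).
Predicted response: $\hat{y} = Xb$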

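The solution of regression problems
Model: $y = Xb + e$
Minimise $e^{\mathsf T}e$ with respect to $b_0, b_1, \ldots, b_M$. When $e$ is minimised: $\hat{y} = Xb$ and $X^{\mathsf T}y = X^{\mathsf T}Xb$
The “Normal equation”: $b = (X^{\mathsf T}X)^{-1}X^{\mathsf T}y$
Condition: $X^{\mathsf T}X$ must have full rank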
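The normal equation can be tried out numerically. The following is a minimal sketch, assuming NumPy and simulated data; the function name `fit_normal_equation`, the coefficients in `b_true` and the noise level are illustrative choices, not part of the slides.

```python
import numpy as np

def fit_normal_equation(X, y):
    """Least-squares estimate of b from the normal equation (X^T X) b = X^T y."""
    XtX = X.T @ X
    # The slide's condition: X^T X must have full rank.
    if np.linalg.matrix_rank(XtX) < XtX.shape[0]:
        raise np.linalg.LinAlgError("X^T X is rank deficient")
    # Solving the linear system avoids forming the inverse explicitly.
    return np.linalg.solve(XtX, X.T @ y)

# Simulated data (assumption): 20 objects, an intercept column for b0 and two x-variables.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(20), rng.normal(size=(20, 2))])
b_true = np.array([1.0, 2.0, -0.5])
y = X @ b_true + 0.01 * rng.normal(size=20)

b_hat = fit_normal_equation(X, y)
y_hat = X @ b_hat  # predicted responses
```

With many objects and uncorrelated x-variables, as simulated here, the estimate is well determined; the situation changes when the full-rank condition is violated or nearly violated.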

Problems
• Many x-variables, few objects (measurements)
• Correlation between the x-variables: $\det(X^{\mathsf T}X) \approx 0$, so $(X^{\mathsf T}X)^{-1}$ does not exist (or is numerically unstable)
• “Noise” in X

Generalised inverse
Generalised inverse: $X^{+} = (X^{\mathsf T}X)^{-1}X^{\mathsf T}$
Normal equation: $b = X^{+}y$
Biased regression methods differ in the way that the generalised inverse is calculated
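To illustrate why the way the generalised inverse is calculated matters, here is a small sketch with two almost identical x-variables. It uses NumPy's Moore-Penrose pseudo-inverse with small singular values discarded, which is only one possible choice of generalised inverse and is an assumption of this example, as are the data and the cut-off `1e-6`.

```python
import numpy as np

rng = np.random.default_rng(1)
x1 = rng.normal(size=30)
x2 = x1 + 1e-8 * rng.normal(size=30)  # nearly a copy of x1: strong collinearity
X = np.column_stack([x1, x2])
y = x1 + x2                            # exact relation y = 1*x1 + 1*x2

# X^T X is close to singular, so its condition number is enormous
# and (X^T X)^-1 cannot be computed reliably.
cond_XtX = np.linalg.cond(X.T @ X)

# Pseudo-inverse that ignores singular values below 1e-6 times the largest one.
X_plus = np.linalg.pinv(X, rcond=1e-6)
b = X_plus @ y  # b = X+ y
```

Discarding the tiny singular value introduces a small bias but gives a stable coefficient vector, which is the trade-off that biased regression methods exploit.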

Latent Variable Regression

Problem Specification
Standards with known concentrations are measured at two highly correlated wavelengths. Make a calibration model between the concentrations and the measured intensities at the two wavelengths: $c = f(x_1, x_2)$

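Dimensionality Reduction
[Figure: samples 1 to 7 plotted against the axes x1 and x2 and projected onto the first principal component, PC1, giving the scores t1, t2, ..., tN]
t, score vector; c, concentration vector
Quantitative information about the concentration is contained in t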

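The Regression
The scores depend on the measured intensities and therefore on the concentrations: $t = f(x_1, x_2) = f(c)$
[Figure: the response y plotted against the scores on PC1]
With mean-centred responses, the predictions follow from the scores: $(\hat{y}_1, \hat{y}_2, \ldots, \hat{y}_N)^{\mathsf T} = \hat{b}_{PC1}\,(t_1, t_2, \ldots, t_N)^{\mathsf T}$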

Calculation of the Regression Coefficient
Model: $y - \bar{y} = b_{PC1}\,t + e$
Least-squares estimate: $\hat{b}_{PC1} = \dfrac{t^{\mathsf T}(y - \bar{y})}{t^{\mathsf T}t}$
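The two-wavelength calibration can be sketched as follows. The concentrations, the sensitivities 1.0 and 0.8 and the noise level are invented for illustration, and PC1 is obtained from a singular value decomposition of the centred data, which is an assumption about how the scores are computed.

```python
import numpy as np

rng = np.random.default_rng(2)
c = np.linspace(1.0, 7.0, 7)  # hypothetical known concentrations of 7 standards
# Intensities at two highly correlated wavelengths (assumed sensitivities 1.0 and 0.8).
X = np.column_stack([1.0 * c, 0.8 * c]) + 0.01 * rng.normal(size=(7, 2))

# Project the centred data onto PC1 to get the score vector t.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
p1 = Vt[0]   # loading vector of PC1
t = Xc @ p1  # scores on PC1

# Regression coefficient: b = t^T (y - y_mean) / (t^T t), with y = c here.
c_bar = c.mean()
b_pc1 = t @ (c - c_bar) / (t @ t)
c_hat = c_bar + b_pc1 * t  # predicted concentrations
```

The sign of `p1` returned by the decomposition is arbitrary; it does not matter here because the sign of `b_pc1` changes with it.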

Regression modelling
[Figure: a system described by a response (output) variable y and by instrumental (spectral) variables X]
Model: $y = f(X)$

Solution
1. Decompose the matrix of spectral data (X) into (orthogonal) latent variables (LVs):
   $X = TP^{\mathsf T} + E = \sum_{a=1}^{A} t_a p_a^{\mathsf T} + E$
2. Model the dependent variable in terms of the latent-variable score vectors:
   $y = \bar{y} + \sum_{a=1}^{A} b_a t_a + e$
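Step 1 can be sketched with a principal component decomposition, which is the choice made in PCR; PLS extracts its latent variables differently. The helper name `pca_decompose` and the simulated spectra (15 samples, 6 variables, 2 underlying factors) are assumptions for illustration.

```python
import numpy as np

def pca_decompose(X, A):
    """Return scores T (n x A), loadings P (m x A) and residuals E of centred X."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    T = U[:, :A] * s[:A]  # orthogonal score vectors t_a
    P = Vt[:A].T          # orthonormal loading vectors p_a
    E = Xc - T @ P.T      # part of X not described by the A latent variables
    return T, P, E

# Simulated spectral data (assumption): two underlying factors plus a little noise.
rng = np.random.default_rng(3)
X = rng.normal(size=(15, 2)) @ rng.normal(size=(2, 6)) + 0.01 * rng.normal(size=(15, 6))

T, P, E = pca_decompose(X, A=2)
```

Because the score vectors in `T` are mutually orthogonal, the coefficients in step 2 can be estimated one latent variable at a time.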

Scores and Loadings
Scores: $t = f(c_1, c_2, \ldots)$. Contains quantitative information about the concentrations.
Loadings: p is a function of the spectral properties of constituents 1, 2, …. Contains qualitative information about the spectra.

Regression Methods
• Partial Least Squares (PLS): best for prediction
• Principal Component Regression (PCR): best for outlier checking
Recommendation: combine the methods

Visualisation of PLS
[Figure: the X and Y data blocks]

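Data described by several Latent Variables
Model: $y = \bar{y} + b_{LV1}t_1 + b_{LV2}t_2 + \ldots + b_{LVA}t_A + e$
In matrix form: $y - \bar{y} = [\,t_1\ t_2\ \cdots\ t_A\,]\,b_{LV} + e$, where the score vectors $t_1, \ldots, t_A$ are mutually orthogonal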

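Calculation of the regression vector
Model: $y - \bar{y} = \sum_{a=1}^{A} b_{LV,a}\,t_a + e$
Multiply from the left by $t_b^{\mathsf T}$: $t_b^{\mathsf T}(y - \bar{y}) = \sum_{a=1}^{A} b_{LV,a}\,t_b^{\mathsf T}t_a + t_b^{\mathsf T}e$
The products $t_b^{\mathsf T}t_a$ are zero, except for $a = b$, and the residual term vanishes for the least-squares fit, so
$b_{LV,b} = \dfrac{t_b^{\mathsf T}(y - \bar{y})}{t_b^{\mathsf T}t_b}$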
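The orthogonality argument can be checked numerically: with mutually orthogonal score vectors, estimating each coefficient separately gives the same result as a joint least-squares fit. The score matrix below is constructed artificially (centred and orthogonalised with a QR decomposition), and the coefficients in `b_true` are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(4)
M = rng.normal(size=(12, 3))
M -= M.mean(axis=0)                # centre the columns
Q, _ = np.linalg.qr(M)             # orthonormal, still centred columns
T = Q * np.array([3.0, 2.0, 1.0])  # orthogonal score vectors t_1, t_2, t_3

b_true = np.array([1.0, -2.0, 0.5])  # hypothetical regression coefficients
y = 5.0 + T @ b_true + 0.01 * rng.normal(size=12)
yc = y - y.mean()

# One coefficient at a time: b_a = t_a^T (y - y_mean) / (t_a^T t_a)
b_lv = (T.T @ yc) / np.sum(T ** 2, axis=0)

# Joint least-squares fit on all score vectors, for comparison.
b_joint = np.linalg.lstsq(T, yc, rcond=None)[0]
```

A practical consequence is that adding a further latent variable does not change the coefficients already estimated for the earlier ones.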

Latent-Variable Regression Modelling: The Modelling Process
• Number of latent variables (explained variance in X and Y, cross validation, regression coefficients, loadings, etc.)
• Outlier detection
• Validation
• Interpretation (regression coefficients, loadings)

Cross Validation (statistical validation)
i) Divide the samples into a number of groups, ng.
ii) For each LV dimension, a = 1, 2, .., A+1, perform the following calculations:
   1. Estimate LV a with group k of samples excluded.
   2. Predict the responses for the samples in group k.
   3. Calculate the squared prediction error for the left-out samples.
iii) Repeat step ii) until all samples have been kept out once, and only once, then calculate SEP(a).
iv) If SEP(a) < SEP(a-1), go to ii); otherwise stop and select the number of dimensions (LVs) in the model as A = a-1.
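The procedure above can be sketched for a PCR model as follows. Several details are assumptions of this sketch rather than statements from the slides: SEP(a) is taken as the square root of the mean squared prediction error over all left-out samples, SEP(0) is the error of a mean-only model, groups are formed by interleaving the samples, and the function names and simulated data are hypothetical.

```python
import numpy as np

def pcr_fit_predict(X_train, y_train, X_test, a):
    """Fit PCR with a latent variables on the training set and predict the test set."""
    x_mean, y_mean = X_train.mean(axis=0), y_train.mean()
    Xc = X_train - x_mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:a].T  # loadings of the first a components
    T = Xc @ P    # training scores
    b = (T.T @ (y_train - y_mean)) / np.sum(T ** 2, axis=0)
    return y_mean + ((X_test - x_mean) @ P) @ b

def sep_by_cross_validation(X, y, a, n_groups):
    """SEP(a): every sample is left out once, and only once."""
    groups = np.arange(len(y)) % n_groups  # interleaved group labels (assumption)
    press = 0.0
    for k in range(n_groups):
        test = groups == k
        y_pred = pcr_fit_predict(X[~test], y[~test], X[test], a)
        press += np.sum((y[test] - y_pred) ** 2)  # squared prediction errors
    return np.sqrt(press / len(y))

def select_n_components(X, y, n_groups, max_a):
    """Increase a while SEP(a) < SEP(a-1); return the last a that improved SEP."""
    sep_prev = sep_by_cross_validation(X, y, 0, n_groups)  # mean-only model
    for a in range(1, max_a + 1):
        sep_a = sep_by_cross_validation(X, y, a, n_groups)
        if not sep_a < sep_prev:
            return a - 1
        sep_prev = sep_a
    return max_a

# Simulated data (assumption): 40 samples, 6 x-variables, 2 underlying factors.
rng = np.random.default_rng(5)
S = rng.normal(size=(40, 2)) * np.array([3.0, 1.0])
L = np.array([[1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
              [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]])
X = S @ L + 0.05 * rng.normal(size=(40, 6))
y = 2.0 * S[:, 0] + S[:, 1] + 0.05 * rng.normal(size=40)

A = select_n_components(X, y, n_groups=5, max_a=5)
```

Stopping at the first increase in SEP follows the rule in step iv); it may stop early if SEP has a local minimum, so inspecting SEP(a) for all a is a reasonable alternative.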

Application Example 1
Process industry, where the principal qualities [1] of products are linked to the chemical composition of the raw material and to the manufacturing process.
[1] O. M. Kvalheim, Chemom. Intell. Lab. Syst. 19 (1993) iii-iv.

Application Example 2
Environmental sciences, such as the prediction of the diversity of a biological system from instrumental fingerprinting of the chemical environment (principal environmental responses).
