CALIBRATION Prof. Dr. Cevdet Demir firstname.lastname@example.org
LINKING TWO SETS OF DATA TOGETHER • Peak height to concentration • Spectra to concentrations • Taste to chemical constituents • Biological activity to structure • Biological classification to chromatographic peak areas
NORMALLY WE ARE INTERESTED IN SOME FUNDAMENTAL PARAMETER e.g. concentration or biological classification WE TAKE SOME MEASUREMENTS e.g. spectra or chromatograms WE WANT TO USE THESE MEASUREMENTS TO GIVE US A PREDICTION OF THE FUNDAMENTAL PARAMETER
UNIVARIATE CALIBRATION • One measurement e.g. a peak height • MULTIVARIATE CALIBRATION • Several measurements e.g. spectra
NOTATION “x” block is measured data e.g. spectra, chromatograms, GCMS of biological extract, structural parameters “c” block is what we are trying to predict e.g. concentration, species, acceptability of a product, taste
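[Diagram: the “x” block (measurement or response, e.g. spectroscopic data) is related by calibration to the “c” block (independent variable or predicted parameter, e.g. concentration), with the calibration samples chosen by experimental design. Either block may be a single column (x, c) or a matrix (X, C).]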
MULTIVARIATE CALIBRATION IN ANALYTICAL CHEMISTRY • Single component. • Example, concentration of chlorophyll a by UV/vis spectra. • Mixture of components, all compounds known. • Example, mixture of pharmaceuticals, all pure compounds known.
• Mixture of components, only some compounds known. • Example, coal tar pitch volatiles in industrial waste studied by spectroscopy, only some known. • Statistical parameters. • Example, protein in wheat by NIR spectroscopy.
UNIVARIATE CALIBRATION “x” and “c” blocks consist of single measurements. Traditional analytical chemistry.
CLASSICAL CALIBRATION
$x \approx c \cdot s$
Unknown: s
$s \approx c^{+} \cdot x$, where $c^{+}$ is the pseudo-inverse of c
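A minimal sketch of classical calibration in Python with NumPy. The concentrations and peak heights are hypothetical values invented for illustration, and the variable names are assumptions, not taken from the lecture.

```python
import numpy as np

# Hypothetical calibration data: known concentrations c and peak heights x.
c = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x = np.array([0.21, 0.39, 0.62, 0.80, 0.99])

# Classical calibration x ≈ c . s, solved as s = c+ . x,
# where c+ is the pseudo-inverse of the column vector c.
c_col = c.reshape(-1, 1)
s = (np.linalg.pinv(c_col) @ x).item()

# For a single column the pseudo-inverse reduces to sum(c*x) / sum(c*c).
s_direct = np.sum(c * x) / np.sum(c * c)

# Predict the concentration of an unknown from its measured peak height.
x_unknown = 0.50
c_pred = x_unknown / s
print(s, c_pred)
```

Note that the model is fitted in the direction x from c, so predicting a concentration requires rearranging the fitted equation.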
TREATMENT OF ERRORS IN CLASSICAL CALIBRATION [Figure: plot of x against c.]
PROBLEMS
1. In a modern lab, dilution and sample preparation errors (in “c”) are probably bigger than spectroscopic errors (in “x”), because spectra are more reproducible. This differs from the assumption made in classical statistics.
2. We want to predict concentration from spectra etc., not vice versa.
Most classical textbooks in analytical chemistry and most spreadsheets incorrectly recommend classical calibration.
INVERSE CALIBRATION
$c \approx x \cdot b$
Unknown: b
$b \approx x^{+} \cdot c$, where $x^{+}$ is the pseudo-inverse of x
INCLUDING THE INTERCEPT: first column of “x” is 1s
• $c \approx b_0 + b_1 x$
• $c \approx X \cdot b$
• $b \approx X^{+} \cdot c$
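Inverse calibration with an intercept can be sketched as follows in Python with NumPy. The data are hypothetical and the variable names are illustrative assumptions. The matrix X is built with a first column of 1s, and b is obtained from the pseudo-inverse of X.

```python
import numpy as np

# Hypothetical calibration data: known concentrations c and peak heights x.
c = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x = np.array([0.21, 0.39, 0.62, 0.80, 0.99])

# Inverse calibration c ≈ X . b; the first column of X is 1s (intercept).
X = np.column_stack([np.ones_like(x), x])

# b = X+ . c gives the intercept b0 and the slope b1.
b = np.linalg.pinv(X) @ c
b0, b1 = b

# The concentration of an unknown is predicted directly from its peak height.
x_unknown = 0.50
c_pred = b0 + b1 * x_unknown
print(b0, b1, c_pred)
```

Here the prediction needs no rearrangement, because the model is fitted in the direction in which it is used.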
HOW WELL IS THE MODEL PREDICTED?
• Huge number of approaches.
• Root mean square error, $E = \sqrt{\sum_{i=1}^{I} (x_i - \hat{x}_i)^2 / d}$, where d is the number of degrees of freedom (the number of samples minus 1 or 2, according to the number of parameters in the model).
• Often expressed as a percentage, either of the mean measurement or of the standard deviation of the measurements.
• Correlation coefficient of predicted versus true: has problems if the number of samples is small.
• ANOVA and replicates analysis using lack-of-fit error, as discussed in the experimental design lectures.
• Leaving samples out and predicting them: cross-validation and testing will be discussed later.
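The root mean square error can be computed as in the sketch below (Python with NumPy). The function name rms_error and the observed and predicted values are hypothetical, chosen only to illustrate dividing by the degrees of freedom and expressing the error as a percentage of the mean.

```python
import numpy as np

def rms_error(observed, predicted, n_params):
    """Root mean square error with d = (number of samples - n_params)."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    d = observed.size - n_params
    return np.sqrt(np.sum((observed - predicted) ** 2) / d)

# Hypothetical observed and model-predicted measurements.
x = np.array([0.21, 0.39, 0.62, 0.80, 0.99])
x_hat = np.array([0.20, 0.40, 0.60, 0.80, 1.00])

# One parameter in the model (slope only), so d = 5 - 1 = 4.
E = rms_error(x, x_hat, n_params=1)

# The same error as a percentage of the mean measurement.
E_percent_of_mean = 100 * E / x.mean()
print(E, E_percent_of_mean)
```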
PROBLEMS • Outliers can be a major difficulty. Graphical ways of looking for outliers – big area. • Undue influence on least squares models.
MULTIWAVELENGTH • Example: four compounds, four wavelengths. • MULTIPLE LINEAR REGRESSION (MLR) • $C = X \cdot B$ • Know • X: a series of spectra • C: concentrations
WAYS OF PERFORMING THE CALIBRATION • Producing a series of mixture spectra of known concentrations by weighing different amounts and adding together • Taking a series of spectra and calibrating against an independent method e.g. HPLC.
$B = X^{+} \cdot C$
estimated [pyrene] = $-3.870\,A_{330} + 8.609\,A_{335} - 5.098\,A_{340} + 1.848\,A_{345}$
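A minimal sketch of this multiwavelength inverse calibration in Python with NumPy, for four compounds and four wavelengths. The pure spectra S and concentrations C are simulated, noise-free values invented for illustration; they are not the pyrene data behind the coefficients above.

```python
import numpy as np

rng = np.random.default_rng(0)
n_mixtures, n_compounds = 10, 4

# Simulated pure-component absorbances at four wavelengths (assumption).
S = np.eye(n_compounds) + 0.3 * rng.random((n_compounds, n_compounds))

# Known concentrations of the calibration mixtures and their spectra.
C = rng.random((n_mixtures, n_compounds))
X = C @ S

# Inverse calibration: B = X+ . C, one column of coefficients per compound.
B = np.linalg.pinv(X) @ C
C_hat = X @ B

# Predict all four concentrations of an unknown from its four absorbances.
x_unknown = np.array([0.2, 0.5, 0.1, 0.7]) @ S
c_unknown = x_unknown @ B
print(B[:, 0])       # coefficients for the first compound
print(c_unknown)
```

Each column of B plays the role of the four coefficients in the pyrene equation: it weights the absorbances to give one estimated concentration.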
Can also use classical methods: $\hat{C} = X \cdot S^{+}$. This can be done from knowledge of the pure spectra. This differs from calibration, where a series of mixtures is recorded.
MULTIPLE LINEAR REGRESSION • Why use only 4 wavelengths? • Why not 10 or 100 wavelengths? • More information – not arbitrary choice of wavelengths. • Number of wavelengths can be greater than number of compounds.
$C = X \cdot B$ • Example • 25 spectra • 10 compounds • 100 wavelengths
$B = X^{+} \cdot C$
• In this case
• B is a matrix of coefficients, 100 × 10
• X is a spectral matrix, 25 × 100
• C is a concentration matrix, 25 × 10
• There are some technical problems using inverse calibration in this case (there are fewer spectra than wavelengths), and often it does not work.
Better approach
1. First predict the spectra S.
• Either they are known from the calibration of the pure standards
• Or they can be predicted from the mixture spectra: $S \approx C^{+} \cdot X$
2. Then use these predictions in a model (e.g. of unknowns): $C \approx X \cdot S^{+}$
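The two steps can be sketched in Python with NumPy for the 25 spectra, 10 compounds and 100 wavelengths of the example. All data here are simulated and noise-free (an assumption made so the result can be checked exactly), and the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_spectra, n_compounds, n_wavelengths = 25, 10, 100

# Simulated "true" pure spectra and calibration concentrations (assumptions).
S_true = rng.random((n_compounds, n_wavelengths))
C = rng.random((n_spectra, n_compounds))
X = C @ S_true                       # mixture spectra, 25 x 100

# Step 1: estimate the pure spectra from the mixtures, S = C+ . X.
S_hat = np.linalg.pinv(C) @ X        # 10 x 100

# Step 2: use the estimated spectra to predict unknowns, C = X . S+.
c_true = rng.random(n_compounds)     # concentrations of one unknown sample
x_unknown = c_true @ S_true          # its spectrum
c_hat = x_unknown @ np.linalg.pinv(S_hat)
print(np.round(c_hat - c_true, 6))
```

With real, noisy data the recovered spectra and concentrations would only be approximate.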
MLR effectively models a spectrum as a sum of spectra of the components, e.g. for a 3 component model: Observed spectrum = conc A × spectrum A + conc B × spectrum B + conc C × spectrum C
ENHANCEMENTS • Selecting only certain variables, not all the wavelengths. • Weighting of variables.
ERROR ANALYSIS This now becomes more sophisticated. In addition to errors in the “c” block (concentration errors), now also errors in the “x” block (reconstruction of spectra). Discuss later.
LIMITATIONS AND PROBLEMS WITH MLR • Number of experiments and number of wavelengths must never be less than number of compounds • All significant compounds must be known. If still unknowns, then these are mixed up with the knowns. Problems if no pure standards and no reliable reference method. THIS IS THE BIGGEST LIMITATION. • Sometimes extra wavelengths can be bad ones e.g. noise or background. • Assume that concentrations are perfectly known, errors in only one variable, using classical approach.
However, if information on all the significant compounds is known then MLR is a simple and effective method.
PRINCIPAL COMPONENTS REGRESSION (PCR) Do not need to know all components in advance, simply "how many components", and the compounds of interest. Overcomes a major limitation of MLR
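[Diagram: the data matrix X (samples × detector channels, e.g. wavelengths) is decomposed by PCA into scores T and loadings P; the concentration vector c is then regressed on the scores, $c \approx T \cdot r$.]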
The first step is to perform PCA. Obtain a scores matrix T, retaining A components. The value of A may be a guess of the number of compounds in the mixture. Then $r = T^{+} \cdot c$
Can extend to more than one concentration: $C \approx T \cdot R$
Example
• 25 spectra taken at 100 wavelengths
• We know about and want to predict 4 compounds
• We think there are around 10 compounds in the mixture, 6 are unknown.
• T is a matrix of dimensions 25 × 10
• C is a matrix of dimensions 25 × 4
• R is a matrix of dimensions 10 × 4
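A minimal PCR sketch in Python with NumPy, using the dimensions of this example. The spectra are simulated and noise-free, PCA is done by singular value decomposition without mean-centring, and all variable names are illustrative; these are assumptions of the sketch rather than details given in the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_wavelengths = 25, 100
n_total, n_known, A = 10, 4, 10

# Simulated mixtures of 10 compounds; only the first 4 are "known".
S = rng.random((n_total, n_wavelengths))
C_all = rng.random((n_samples, n_total))
X = C_all @ S                      # spectral matrix, 25 x 100
C = C_all[:, :n_known]             # concentration matrix, 25 x 4

# Step 1: PCA on X, retaining A components (scores T, loadings P).
U, sv, Vt = np.linalg.svd(X, full_matrices=False)
T = U[:, :A] * sv[:A]              # scores, 25 x 10
P = Vt[:A]                         # loadings, 10 x 100

# Step 2: regress the known concentrations on the scores, R = T+ . C.
R = np.linalg.pinv(T) @ C          # 10 x 4
C_hat = T @ R

# Predict the 4 known compounds in a new sample from its spectrum alone.
c_new_all = rng.random(n_total)
x_new = c_new_all @ S
c_new_hat = (x_new @ P.T) @ R
print(T.shape, C.shape, R.shape)
```

The six unknown compounds never enter the regression by name; they only influence the scores, which is why partial knowledge is enough.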
Example of the calculation of the concentration of pyrene in a set of 25 UV/vis spectra containing 10 different PAHs. How many PCA components to use? The prediction gets better as more components are used.
ERRORS – “x” block Simply as in PCA, look at eigenvalues as more principal components are calculated
ERRORS – “c” block Look at errors in calculation of concentrations – often different behaviour
[Figure: three plots of predicted concentration against observed concentration, both axes running from 0 to 0.8.] Predictions for pyrene concentration using 1, 5 and 10 principal components.
Why not use a large number of PCA components? Then one can get perfect prediction? FALLACY: the idea is to predict unknowns, after the knowns have been modelled. Later PCs often model noise. Choose the number of PCs equal to the number of compounds in the mixture? Methods for determining the number of PCs when this is unknown are described later.
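The fallacy can be illustrated with a small simulation in Python with NumPy. The data are simulated with added noise, PCA is uncentred, and the helper name simulate is hypothetical; all of these are assumptions of the sketch. The calibration error necessarily falls to zero when as many PCs as samples are used, whereas the error on independent test samples has no such guarantee.

```python
import numpy as np

rng = np.random.default_rng(1)
n_compounds, n_wavelengths = 10, 100
S = rng.random((n_compounds, n_wavelengths))   # simulated pure spectra

def simulate(n_samples):
    """Return noisy mixture spectra and the concentration of compound 0."""
    C = rng.random((n_samples, n_compounds))
    noise = 0.01 * rng.standard_normal((n_samples, n_wavelengths))
    return C @ S + noise, C[:, 0]

X_cal, c_cal = simulate(25)     # calibration set
X_test, c_test = simulate(25)   # independent test set

U, sv, Vt = np.linalg.svd(X_cal, full_matrices=False)

cal_rmse, test_rmse = [], []
for A in range(1, 26):
    T = U[:, :A] * sv[:A]                  # calibration scores
    P = Vt[:A]                             # loadings
    r = np.linalg.pinv(T) @ c_cal          # r = T+ . c
    cal_rmse.append(np.sqrt(np.mean((T @ r - c_cal) ** 2)))
    # Test scores come from projecting the test spectra onto the loadings.
    c_test_hat = (X_test @ P.T) @ r
    test_rmse.append(np.sqrt(np.mean((c_test_hat - c_test) ** 2)))

for A in (1, 5, 10, 25):
    print(A, cal_rmse[A - 1], test_rmse[A - 1])
```

Comparing the two columns of output is a simple form of the testing mentioned earlier for judging how many PCs to keep.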
• Advantage over MLR: only partial knowledge necessary.
• Disadvantage: assumption that all errors are in the "x" block.
• Practical situation:
• Modern instruments are very reproducible.
• Volumetrics, measuring cylinders, syringes are inaccurate.
PARTIAL LEAST SQUARES (PLS) This technique assumes that errors in both “x” and “c” block are equally significant.
[Diagram of the PLS model: $X = T \cdot P + E$ and $c = T \cdot q + f$.]
What does this mean? X = T.P + E c = T.q + f
THERE IS A COMMON SCORES MATRIX FOR BOTH “x” AND “c” BLOCKS. In PCR we calculate the scores just for the “x” block and then use a separate step for regression. A big difference between PCR and PLS is that in PCR there is only one scores matrix, whereas for PLS (using one column of “c” at a time) there is a different scores matrix for each compound. The vector q is analogous to loadings.
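A sketch of one common way to compute such a model for a single “c” column, using a NIPALS-style PLS1 algorithm in Python with NumPy. The lecture does not specify the algorithm, so the function pls1, the mean-centring step and the simulated data are all assumptions of this example.

```python
import numpy as np

def pls1(X, c, n_components):
    """PLS1 sketch: returns T, P, q, E, f with Xc = T.P + E and cc = T.q + f,
    where Xc and cc are the mean-centred "x" and "c" blocks."""
    E = X - X.mean(axis=0)            # residual "x" block
    f = c - c.mean()                  # residual "c" block
    n_samples, n_vars = X.shape
    T = np.zeros((n_samples, n_components))
    P = np.zeros((n_components, n_vars))
    q = np.zeros(n_components)
    for a in range(n_components):
        w = E.T @ f
        w /= np.linalg.norm(w)        # weights depend on both blocks
        t = E @ w                     # scores common to "x" and "c"
        tt = t @ t
        p = E.T @ t / tt              # "x" loadings
        q[a] = f @ t / tt             # "c" loading
        E = E - np.outer(t, p)        # remove this component from "x"
        f = f - q[a] * t              # and from "c"
        T[:, a], P[a] = t, p
    return T, P, q, E, f

# Simulated noisy mixture spectra: 25 samples, 100 wavelengths, 10 compounds.
rng = np.random.default_rng(0)
S = rng.random((10, 100))
C = rng.random((25, 10))
X = C @ S + 0.01 * rng.standard_normal((25, 100))

T0, P0, q0, E0, f0 = pls1(X, C[:, 0], 3)   # model for compound 0
T1, P1, q1, E1, f1 = pls1(X, C[:, 1], 3)   # model for compound 1
print(np.linalg.norm(f0), np.linalg.norm(f1))
```

Because the weights are computed from both blocks, the scores T0 and T1 differ even though X is the same, which is the difference from PCR described above.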