Calibration in Environmental Analysis Issues and Proposals for Improvement Richard Burrows

Calibration in Environmental Analysis: Issues and Proposals for Improvement • Richard Burrows • FSEA, May 2008 • ©2007, TestAmerica Analytical Testing Corp. All rights reserved. TestAmerica & Design TM are trademarks of TestAmerica Analytical Testing Corp.

Premise 1 • The impact of calibration models on the ability to detect and quantify analytes is substantial. • Nonetheless, use of the appropriate calibration models is poorly understood and poorly controlled, and in many cases we are instructed to use calibration models that produce false positives, false negatives, and wildly inaccurate quantitation.

Instrument Calibration • What do we want from the calibration? • Accurate translation of instrument response to analyte amount • Minimize the errors introduced by the calibration itself

Relative vs. Absolute Errors • How does the system behave? • Which kind of error are we measuring with our QC? • Which is more important from the risk standpoint?

What do we care most about? Calibration curve 1-100 ppb • Do we prefer and expect: • +/- 5 ppb at all levels (absolute error) • +/- 10% at all levels (relative error)
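To make the contrast concrete, the following minimal sketch tabulates what each tolerance means at several points of a 1-100 ppb curve. The five levels and the helper names are illustrative assumptions, not values from the presentation.

```python
# Hypothetical calibration levels (ppb) spanning the 1-100 ppb range above.
LEVELS_PPB = [1, 5, 10, 50, 100]


def relative_error_pct(abs_error_ppb, level_ppb):
    """Express a fixed absolute error (ppb) as a percentage of the level."""
    return 100.0 * abs_error_ppb / level_ppb


def absolute_error_ppb(rel_error_pct, level_ppb):
    """Express a fixed relative error (%) in ppb at the level."""
    return rel_error_pct * level_ppb / 100.0


if __name__ == "__main__":
    print("level (ppb) | +/-5 ppb as % | +/-10% as ppb")
    for level in LEVELS_PPB:
        pct = relative_error_pct(5, level)
        ppb = absolute_error_ppb(10, level)
        print(f"{level:11d} | {pct:13.1f} | {ppb:13.2f}")
```

At the 1 ppb level a +/- 5 ppb tolerance amounts to a 500% error, while a +/- 10% tolerance is only 0.1 ppb.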

Risk The difference in risk level between a concentration of 100 and 110 is small, but the difference between 0 and 1 may be very large.

Characteristics of Variance • Method 3520/8270, 8 replicates prepared and analyzed at 100 ppb, 10 ppb, and 1 ppb • Average of 84 analytes • Unweighted regression is valid only if the standard deviation is constant across the range.

What type of Error? We are concerned about relative error We need a: • Calibration model that minimizes relative error • Measure for the calibration curve that evaluates how much relative error exists

EPA calibration criteria • Linear regression is an option for almost all EPA methods • For linear regression, unweighted is usually the first option • Correlation coefficient is always the measure for a linear regression • Some methods have an option for average response factor

Least squares curve fitting Assumption: Constant Variance

A Simple Calibration Curve

Effect of Weighting

Weighted Regressions • Unweighted: minimizes Σ (predicted – actual)^2 • 1/X weighting: minimizes Σ [(predicted – actual)^2 / conc] • 1/X^2 weighting: minimizes Σ [(predicted – actual) / conc]^2 • Weighted regressions tend to minimize relative error as opposed to absolute error • Weighted regressions are appropriate when variance is not constant (for example, most environmental analysis methods have fairly constant relative standard deviation (RSD) across the calibration range)
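The three objective functions can be sketched with a small closed-form weighted least-squares fit. This is an illustration under stated assumptions: the concentrations and responses are invented, and `weighted_linear_fit` and `objective` are hypothetical helper names, not part of any method or data system.

```python
def weighted_linear_fit(x, y, w):
    """Return (slope, intercept) minimizing sum(w_i * (pred_i - y_i)**2)."""
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swy = sum(wi * yi for wi, yi in zip(w, y))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    slope = (sw * swxy - swx * swy) / (sw * swxx - swx * swx)
    intercept = (swy - slope * swx) / sw
    return slope, intercept


def objective(x, y, w, slope, intercept):
    """Weighted sum of squared residuals for a given line."""
    return sum(wi * (slope * xi + intercept - yi) ** 2
               for wi, xi, yi in zip(w, x, y))


# Hypothetical calibration data (concentration, instrument response).
CONC = [1.0, 5.0, 20.0, 50.0, 100.0]
RESP = [1.2, 4.7, 21.0, 48.0, 103.0]

WEIGHTINGS = {
    "unweighted": [1.0 for _ in CONC],
    "1/x": [1.0 / c for c in CONC],
    "1/x^2": [1.0 / c ** 2 for c in CONC],
}
FITS = {name: weighted_linear_fit(CONC, RESP, w)
        for name, w in WEIGHTINGS.items()}

if __name__ == "__main__":
    for name, (slope, intercept) in FITS.items():
        print(f"{name:10s} slope={slope:.4f} intercept={intercept:.4f}")
```

With 1/x^2 weights the objective equals the sum of squared residuals divided by concentration squared, so the fit trades absolute agreement at the top of the curve for relative agreement across the whole range.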

Unweighted Linear Regression • Unweighted regressions minimize the absolute residuals • In a calibration from 1-100, an error (residual) of 5 at the 1.0 point has the same weight as an error of 5 at the 100 point. • 1/(Conc)^2 weighted regressions minimize the relative residuals • In a calibration from 1-100, an error (residual) of 5% at the 1.0 point has the same weight as an error of 5% at the 100 point.

Calibration Approaches • SW-846 Method 8000B • …begin with the simplest approach, the linear model through the origin, and progress through other options until calibration criteria are met • If RSD is < 20%, linearity through the origin may be assumed, and the average calibration or response factor may be used to determine sample concentrations
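The average response factor approach can be sketched as follows. The calibration data are invented for illustration, the response factor is assumed here to be simply response divided by concentration, and the helper names are hypothetical. The 20% limit is the one quoted above.

```python
import statistics

RSD_LIMIT_PCT = 20.0  # acceptance limit quoted from Method 8000B above


def response_factors(conc, resp):
    """Response factor at each level, assumed here to be response / concentration."""
    return [r / c for c, r in zip(conc, resp)]


def rsd_percent(values):
    """Relative standard deviation (sample SD / mean) as a percentage."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


def quantify_by_average_rf(sample_response, factors):
    """Convert a sample response to a concentration with the mean factor."""
    return sample_response / statistics.mean(factors)


# Hypothetical five-point calibration.
CONC = [1.0, 5.0, 20.0, 50.0, 100.0]
RESP = [0.9, 5.0, 20.0, 50.0, 110.0]
FACTORS = response_factors(CONC, RESP)
RSD = rsd_percent(FACTORS)

if __name__ == "__main__":
    print(f"RSD of response factors: {RSD:.1f}%")
    if RSD < RSD_LIMIT_PCT:
        print("Average response factor may be used")
        print(f"Response 12.0 -> {quantify_by_average_rf(12.0, FACTORS):.2f}")
    else:
        print("Linearity through the origin cannot be assumed")
```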

Linear Regression • SW-846 Method 8000B • If the RSD is > 20% then linearity through the origin cannot be assumed. In this case, the analyst may employ a regression equation that does not pass through the origin. • 8000C • Linear least squares regression may be employed based on past experience or at the discretion of the analyst.

Weighting • 8000B • The analyst may also employ a weighted least squares regression if replicate multi-point curves have been performed (weighting factor 1/SD^2) • 8000C • Weighting may significantly improve the ability of the regression to fit the linear model to the data. • The mathematics used in the least squares regression has a tendency to favor numbers of larger value over numbers of smaller value. Thus the regression curves that are generated will tend to fit points that are at the upper calibration levels better than those points at the lower calibration levels.

Method 8000C • Weighting • Examples of weighting factors which can place more emphasis on numbers of smaller value are: • w_i = 1/y_i or w_i = 1/y_i^2 • These weighting factors are recommended if weighting other than w_i = 1 is to be used

Method 1631 Guidance • Weighting • “An unweighted regression is incorrect for nearly all instruments and analytical systems.” • “The calibration included a data point at the Method 1631 MDL (0.2 ng/L). The RSD for the CF/WR approach was 7.8 percent. The coefficient of determination (r^2) for the unweighted approach was 1.000, indicating no error in calibration. The reason for the indication of zero error is that the low calibration points are, essentially, unweighted. Therefore, the unweighted regression is equivalent to a single-point calibration at the highest calibration point. We do not believe that this form of calibration is consistent with the best science.”
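The effect described in the guidance can be reproduced with a small sketch. The data below are invented; only the choice of 0.2 as the low point echoes the MDL mentioned in the quote. The helper name `linear_fit` is hypothetical. The fit is an ordinary unweighted regression, and each standard is then read back through the curve to see its relative error.

```python
def linear_fit(x, y):
    """Unweighted least-squares line; returns (slope, intercept, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept, sxy * sxy / (sxx * syy)


# Hypothetical calibration with a small positive bias at the low end.
CONC = [0.2, 1.0, 5.0, 25.0, 100.0]
RESP = [0.5, 1.3, 5.2, 25.1, 100.0]

SLOPE, INTERCEPT, R2 = linear_fit(CONC, RESP)

# Read each standard back through the curve and compare with its true value.
REL_ERR_PCT = [100.0 * ((r - INTERCEPT) / SLOPE - c) / c
               for c, r in zip(CONC, RESP)]

if __name__ == "__main__":
    print(f"r^2 = {R2:.6f}")
    for c, err in zip(CONC, REL_ERR_PCT):
        print(f"standard {c:6.1f}: relative error {err:+7.2f}%")
```

For these invented numbers r^2 rounds to 1.0000, yet the lowest standard reads back more than 20% high, while the top standard is recovered almost exactly.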

Acceptance Criteria

Calibration criteria • Acceptance criteria • RSD must be less than 20% (15% in some methods) or • r, COD or r^2 must be greater than or equal to 0.99

Second Premise • The correlation coefficient and the coefficient of determination are pretty much useless for evaluating the suitability of a calibration curve.

Correlation Coefficient • Correlation Coefficient Pros • Included in most EPA methods • Correlation coefficient Cons • Not technically justified • Does not establish the best fit for environmental analysis • Cannot be compared to the RSE of an average RF

Correlation Coefficient • For most applications, and calibration curves in particular, the correlation coefficient must be regarded as a relic of the past • Meier and Zund, Statistical Methods in Analytical Chemistry, 2000

Correlation Coefficient • “The correlation coefficient in the context of linearity testing is potentially misleading and should be avoided” • Royal Society of Chemistry, Technical brief • “The author has seen cases where a correlation coefficient of 0.997 was believed to be a better fit than 0.996 of a 5 point calibration curve. One can even find requirements in quality assurance plans to recalibrate if the correlation coefficient is less than 0.995!” • Taylor, Statistical Techniques for Data Analysis, 1990

IUPAC • Guidelines for Calibration in Analytical Chemistry, 1998 • “The correlation coefficient which is a measure of relationship of two random variables, has no meaning in calibration … because the values x are not random quantities in the calibration experiment”

Correlation Coefficient • “One practice that should be discouraged is the use of the correlation coefficient as a means of evaluating goodness of fit of linear models” • Van Arendonk and Skogerboe, Anal. Chem. 53, 1981, 2349-2350

What alternatives are available?

Calibration Objectives • The calibration model should minimize relative error • The calibration measure(s) should determine how well this objective is met

One Additional Requirement • If we accept different ways of evaluating the curve, we want some consistency. • We don’t want one measure to say a curve is good, and another measure to say that it is bad

Relative Standard Deviation • RSD Pros • Simple • Evaluates relative error • Reasonable criteria established and included in many SW-846 methods • RSD Cons • Can only be applied to the average response factor type calibration – not suitable for linear or quadratic regressions

RSD and RSE • Symbols used in the RSD and RSE formulas: predicted response from the curve; C = curve coefficient; x = concentration; y = response • RSE = RSD when calculated for the average response factor • Could use the same criteria

EPA clarification memo on the use of SW-846 methods, Aug 7 1998 • “Further, the Agency recognizes that the relative standard error (RSE) is a useful measure of the goodness of fit of a calibration model that the Agency had not previously considered. The RSE is useful for both linear regression models as well as non-linear models, as it considers the error at each point in the calibration model as a function of the concentration of that standard.”

EPA clarification memo on the use of SW-846 methods, Aug 7 1998 • “Using the RSE as a metric has the added advantage of allowing the same numerical standard to be applied to the calibration model, regardless of the form of the model. Thus, if a method states that the RSD should be <20% for the traditional linear model through the origin, then the RSE acceptance limit can remain 20% as well. Similarly, if a method provides an RSD acceptance limit of 15%, then that same figure can be used as the acceptance limit for the RSE.”
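A minimal sketch of an RSE calculation follows. The formula is assumed here to follow the SW-846 Method 8000C form, 100 × sqrt(Σ[(x'_i − x_i)/x_i]^2 / (n − p)), where x_i is the true concentration, x'_i the concentration read back from the curve, n the number of standards, and p the number of fitted coefficients. That form should be checked against the method text. The data and helper names are hypothetical.

```python
import math
import statistics


def rse_percent(true_conc, calc_conc, n_params):
    """Relative standard error (%) of a calibration model.

    true_conc: nominal concentration of each standard
    calc_conc: concentration read back from the fitted model
    n_params:  number of fitted coefficients (1 for an average RF,
               2 for a line with intercept, 3 for a quadratic)
    """
    n = len(true_conc)
    sum_sq = sum(((calc - true) / true) ** 2
                 for true, calc in zip(true_conc, calc_conc))
    return 100.0 * math.sqrt(sum_sq / (n - n_params))


# Hypothetical three-point calibration evaluated with an average RF.
CONC = [1.0, 10.0, 100.0]
RESP = [0.9, 10.0, 110.0]
FACTORS = [r / c for c, r in zip(CONC, RESP)]
MEAN_RF = statistics.mean(FACTORS)
CALC = [r / MEAN_RF for r in RESP]

RSE = rse_percent(CONC, CALC, n_params=1)
RSD = 100.0 * statistics.stdev(FACTORS) / MEAN_RF

if __name__ == "__main__":
    print(f"RSE = {RSE:.2f}%  RSD = {RSD:.2f}%")
```

For the average response factor model the two statistics coincide, which is why the same numerical limit can be carried over from RSD to RSE.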

The Calibration Curve that Can’t Fail! (A Digression) • “We really want to make sure we carefully define the low end of the curve.”

Reporting limit corresponds to the low point on the curve RSE = 149%

The Impact of Calibration Models on Analyte Detection and Accuracy at Low Concentrations • Example GC/MS Data • Example ICP/MS Data • Example IC Data

GC/MS Data • Three calibration models: • Average response factor • Linear regression with no weighting • Linear regression with inverse square weighting • If a sample gave the same response as our low standard, what would we detect and report?

One calibration, processed three different ways

Three different results
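The comparison can be illustrated with a sketch that processes one calibration three ways and reports the concentration assigned to a sample whose response equals that of the low standard. The numbers are invented (they are not the GC/MS data from the presentation) and the helper name is hypothetical.

```python
import statistics


def weighted_linear_fit(x, y, w):
    """Return (slope, intercept) minimizing sum(w_i * (pred_i - y_i)**2)."""
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swy = sum(wi * yi for wi, yi in zip(w, y))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    slope = (sw * swxy - swx * swy) / (sw * swxx - swx * swx)
    intercept = (swy - slope * swx) / sw
    return slope, intercept


# Hypothetical calibration: the top standard responds slightly low.
CONC = [1.0, 5.0, 20.0, 50.0, 100.0]
RESP = [1.0, 5.0, 20.0, 50.0, 95.0]
LOW_RESPONSE = RESP[0]  # sample response equal to the low standard

# Model 1: average response factor (response / concentration).
MEAN_RF = statistics.mean([r / c for c, r in zip(CONC, RESP)])

# Model 2: unweighted linear regression.
M_U, B_U = weighted_linear_fit(CONC, RESP, [1.0 for _ in CONC])

# Model 3: linear regression with inverse square (1/x^2) weighting.
M_W, B_W = weighted_linear_fit(CONC, RESP, [1.0 / c ** 2 for c in CONC])

RESULTS = {
    "average RF": LOW_RESPONSE / MEAN_RF,
    "unweighted": (LOW_RESPONSE - B_U) / M_U,
    "1/x^2 weighted": (LOW_RESPONSE - B_W) / M_W,
}

if __name__ == "__main__":
    for model, conc in RESULTS.items():
        print(f"{model:15s} reports {conc:.3f} (true value {CONC[0]})")
```

With these invented numbers, the average response factor and the 1/x^2 weighted line both report a value close to the true 1.0, while the unweighted line reports roughly a third of it.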

ICP/MS Data • Compare Continuing Calibration Blank results using two different calibration models, linear regression without weighting and linear regression with 1/X weighting.

The test • If the CCB result is greater than the MDL, you have a high risk of false positives • If the CCB result is less than the negative value of the MDL, you have a high risk of false negatives
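The test reduces to a two-sided comparison of each CCB result against the MDL. A minimal sketch follows; the element names, results, and MDLs are placeholders rather than data from the presentation, and the function names are hypothetical.

```python
def ccb_check(ccb_result, mdl):
    """Classify a continuing calibration blank result against the MDL."""
    if ccb_result > mdl:
        return "high risk of false positives"
    if ccb_result < -mdl:
        return "high risk of false negatives"
    return "pass"


def fail_fraction(results):
    """Fraction of (ccb_result, mdl) pairs that fail the test."""
    failures = sum(1 for ccb, mdl in results if ccb_check(ccb, mdl) != "pass")
    return failures / len(results)


# Placeholder values: element -> (CCB result, MDL), same units.
CCB_RESULTS = {
    "element A": (0.02, 0.05),
    "element B": (0.09, 0.05),
    "element C": (-0.12, 0.10),
    "element D": (-0.01, 0.03),
}

if __name__ == "__main__":
    for element, (ccb, mdl) in CCB_RESULTS.items():
        print(f"{element}: {ccb_check(ccb, mdl)}")
    print(f"fail fraction: {fail_fraction(list(CCB_RESULTS.values())):.0%}")
```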

CCB 1

Weighted versus unweighted • Unweighted – 50% fail test. CCB results are either > MDL or < -MDL • Weighted – 1.2% fail test. One high result for molybdenum • Same data, same instrument, same sensitivity; the only difference is the calibration model

Method 300 example
