
Bivariate data – Graphical & statistical techniques


Bivariate data

- Graphical techniques
- Scatter-plot
- Difference-plot
- Residual-plot
- Krouwer-plot
- Influences on the plots (data-range; subgroups; outliers; scaling)
- Influences of random- and systematic errors on the plots
- Linearity
- Specifications in plots

- Combined graphical/statistical techniques
- The Bland & Altman approach

- Correlation
- The statistical model
- Correlation in method comparison
- Non-parametric correlation

- Regression
- Ordinary linear regression (OLR)
- Deming regression
- Passing-Bablok regression (non-parametric)
- Weighted regression
- Regression & method comparison
- Regression & calibration

Datasets; GraphBivariate-EXCEL;

Correlation&Regression; CorrRegr-EXCEL; Bland&Altman

Statistics & graphics for the laboratory

Graphical techniques

- Construction of the axes
- x-axis: comparative method (A)
- y-axis: test method (B)
- line of equality ( y = x ): - - -

- The absolute difference plot
- Construction of the x-axis
- Hierarchically higher (A) and lower method (B)
- x-axis: hierarchically higher method (A)

- Hierarchically equivalent methods
- x-axis: (A + B)/2

- Hierarchically higher (A) and lower method (B)
- Construction of the y-axis
- y = B - A

Usually, both axes extend from 0 to the highest result

– y-axis is freely scalable

– x-axis bisects the y-axis at 0

Graphical techniques

- Construction of the x-axis
- Hierarchically higher (A) and lower method (B)
- x-axis: hierarchically higher method (A)

- Hierarchically equivalent methods
- x-axis: (A + B)/2

- Hierarchically higher (A) and lower method (B)
- Construction of the y-axis
- y = [(B − A)/A] × 100, or [(B − A)/(0.5 × (A + B))] × 100

- x-axis: comparison method (A)
- y-axis: regression (OLR) residuals: yi - ŷ

– y-axis is freely scalable

– x-axis bisects the y-axis at 0

– y-axis is freely scalable

– x-axis bisects the y-axis at 0
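The axis rules for the absolute and the %-difference plot can be sketched in a few lines of Python. This is a minimal illustration, not part of the original template: the function name `difference_plot_points` and its parameters are hypothetical, and the %-difference denominator is assumed to follow the x-axis choice (A for different hierarchy, (A + B)/2 for equivalent methods).

```python
def difference_plot_points(a, b, same_hierarchy=False, percent=False):
    """Return (x, y) coordinates for an absolute or %-difference plot.

    a: results of the comparative method (A); b: results of the test method (B).
    same_hierarchy=True puts the mean (A + B)/2 on the x-axis, otherwise A.
    """
    points = []
    for ai, bi in zip(a, b):
        # x-axis: hierarchically higher method, or the mean of both methods
        x = (ai + bi) / 2 if same_hierarchy else ai
        diff = bi - ai                      # y = B - A
        # For % differences the denominator follows the x-axis choice.
        y = 100 * diff / x if percent else diff
        points.append((x, y))
    return points
```

For example, `difference_plot_points([100], [110], percent=True)` yields one point at x = 100 with a difference of 10%.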

Graphical techniques

- Construction of the axes
- x-axis: %-bias
- y-axis: “folded” cumulative percentage

- Construction of the Krouwer plot
- "Folded cumulative percentage"
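The "folded" cumulative percentage of the Krouwer plot can be sketched as follows. The function name is hypothetical, and the percentile convention (100 × rank/n) is an assumption; other plotting positions are in use. Percentages above 50 are folded to 100 minus the percentage, which produces the typical mountain shape.

```python
def folded_cumulative_percentage(pct_bias):
    """Return sorted (%-bias, folded cumulative %) pairs for a Krouwer plot."""
    ordered = sorted(pct_bias)
    n = len(ordered)
    points = []
    for rank, value in enumerate(ordered, start=1):
        cum = 100 * rank / n                # empirical cumulative % (assumed convention)
        folded = cum if cum <= 50 else 100 - cum
        points.append((value, folded))
    return points
```

Plotting the second element against the first gives a peak near the median %-bias; the width of the "mountain" reflects the dispersion of the differences.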

Graphical techniques

- Scatter plot (with the line y = x)
- Simple construction: the same for methods with the same/different hierarchy
- Good overview about the data through the comparison with the y = x line
- “Difference” plots (absolute, %, residuals)
- y-axis is freely scalable
- Construction depends on method hierarchy
- The residuals plot can only be constructed with knowledge of regression data
- Not a pure graphical technique, but: useful for the judgement of linearity (shown later)

Datasets-MethComp

Graphical techniques

- Graphical resolution of the scatter plot: worse than that of the bias plots.
- The resolution of the scatter plot can be improved by an insert or a logarithmic scale.
- Don't expect that "one size fits all"

Graphical techniques

- Note: the y-axis of the difference plot is freely scalable! Its graphical resolution is therefore usually better than that of the scatter plot.
- A “subgroup” is easier to see in the %-difference plot than in the scatter plot.

Outliers

- Glucose “normal”
- Glucose “with outliers”
- Outliers have no influence on the resolution of the scatter plot, but reduce the resolution of the difference plots.
- Scatter plot more robust than difference plots

Graphical techniques

- y-scaling and its effect
- A: "as the data are": good resolution, but x- and y-axis cannot be compared directly
- B: free: good/poor agreement can be manipulated graphically
- C: identical (graphical distance) x and y scaling: loss of resolution, but better direct comparison possible
- Graphs and errors
- Random errors
- SD constant (small range; e.g., sodium)
- CV constant (medium range; e.g., glucose)
- SD/CV variable (wide range; e.g., estradiol)

Common situation

CV constant/SD decreasing down to a certain concentration, then SD constant and CV increasing

Graphical techniques

- Systematic errors
- Constant
- Proportional
- Combination (constant/proportional)
- Non-linearity

- Graphs and errors
- Examples
- Systematic errors
- y = x
- y = 1.1 • x
- y = x + 1
- Random errors
- General examples with CV = 2% and SD = 0.1

Graphical techniques

- What could be noted?
- For case 1: y = x
- Better resolution of the difference plots
- Scatter plot
- At constant CV, typical V-form of the random error limits
- At constant SD, parallel limits for random error

- Absolute difference plot
- At constant CV, typical V-form of the random error limits
- At constant SD, parallel limits for random error

- %-difference plot
- At constant CV, parallel limits for random error
- At constant SD, typical hyperbolic limits for random error

Graphical techniques

- What could be noted?
- (additionally to y = x)
- A large proportional error
- Deteriorates the resolution of the absolute difference plot
- Has no influence on the %-difference plot

- A large constant error (as compared to the random error)
- Has no influence on the absolute difference plot
- The hyperbolic error limits in the %-difference plot become “one-sided”

- Summary
- The difference plots, generally, have a better resolution than the scatter plot
- The scatter plot is robust against all sorts of errors
The limits for random error are

- V-shaped (constant CV)
- parallel (constant SD)

- The absolute difference plot is robust against constant errors, but sensitive to proportional errors (loss of resolution)
The limits for random error are

- V-shaped (constant CV)
- parallel (constant SD)

- The %-difference plot is robust against proportional errors,
- but sensitive towards constant errors
The limits for random error are

- parallel (CV constant and no constant error),
- 2-sided hyperbolic (SD constant and no const. error), or
- 1-sided hyperbolic (existence of a relatively big constant error)

Graphical techniques

- Judgement of linearity
- Consider the following ways
- Best with regression (residuals plot)
- For a broad range
- Logarithmic
- Easier with a logarithmic plot
- Conclusion: no "one size fits all"

Graphical techniques

- Specifications are needed for the interpretation of a method comparison.
- We look for specifications in
- The scatter plot
- The absolute difference plot
- The %-difference plot

Graphical techniques

- "Error grid analysis" (glucose)
- Summary
- The scatter plot is useful for all sorts of specifications
The limits for specifications (around y = x) are

- parallel (absolute specification)
- or V-shaped (% specification)

- The absolute difference plot is most appropriate for absolute specifications
The limits for specifications (around 0) are

- parallel (absolute specification)
- or V-shaped (% specification)

- The %-difference plot is most appropriate for % specifications
The limits for specifications (around 0) are

- parallel (% specification)
- 2-sided hyperbolic (absolute specification)

- Annex
- More examples
- Examples sorted according to plot-type

Exercises

GraphBivariate-EXCEL

- This file is a template for a
- Scatter plot (with line of equality)
- Absolute bias plot (x-axis with hierarchically higher method, only)
- % bias plot (x-axis with hierarchically higher method, only)
- Absolute bias plot (x-axis with average x&y)
- % bias plot (x-axis with average x&y)
- Residuals plot
- It may be adapted to the needs of the user.
- This file can also be used to reproduce most of the plots in this tutorial by using the datasets in:

Datasets(Method comparison: Sodium, Glucose, Estradiol)


Combined graphical/statistical techniques

- The Bland&Altman approach for the interpretation of method comparison studies
- References
- Altman DG, Bland JM. Measurement in medicine: the analysis of method comparison studies. Statistician 1983;32:307-17.
- Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1986;i:307-10.
- Bland JM, Altman DG. Measuring agreement in method comparison studies. Stat Methods Med Res 1999;8:135-60.
- Approach
- The goal of the Bland & Altman approach is to compare the outcome of method comparison studies in terms of systematic (SE) and total error (TE) with quality specifications for systematic (SEspec) and total error (TEspec).
- Calculations
- This requires the following calculations (note: the B&A symbols are used here)
- Mean difference (đ) and its 95% confidence limits (CL) (equivalent to SE)
- 1.96 SDdiff and its CL (equivalent to TE)
- SDdiff = standard deviation of the differences between the methods
- Those are to be compared with the specifications in the following way:
- đ ± CL ≤ SEspec and
- 1.96 SDdiff ± CL ≤ TEspec
- Graphics
- At the same time, Bland & Altman recommended presenting the data in an absolute bias plot including the lines for đ and 1.96 SDdiff.

Original plot

Adapted from:

Bland JM, Altman DG. Lancet 1986;i:307-10.
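The Bland & Altman calculations listed above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the template's implementation: the function name is hypothetical, `t_crit = 1.96` is a large-sample placeholder (for small n, the t-value with n − 1 degrees of freedom would be used), and the confidence limit of 1.96 SDdiff uses the large-sample approximation SE(SD) ≈ SD/√(2(n − 1)).

```python
import math
import statistics


def bland_altman_estimates(a, b, t_crit=1.96):
    """Mean difference, 1.96 SDdiff and approximate 95% confidence limits.

    a: comparative method results; b: test method results (paired).
    t_crit: two-sided critical value (assumption: large-sample default).
    """
    diffs = [bi - ai for ai, bi in zip(a, b)]
    n = len(diffs)
    d_mean = statistics.mean(diffs)             # estimate of SE (systematic error)
    sd_diff = statistics.stdev(diffs)           # SD of the differences
    cl_mean = t_crit * sd_diff / math.sqrt(n)   # CL of the mean difference
    # Assumed approximation: standard error of an SD is SD / sqrt(2(n - 1))
    cl_196sd = t_crit * 1.96 * sd_diff / math.sqrt(2 * (n - 1))
    return {
        "d_mean": d_mean,
        "cl_mean": cl_mean,
        "limit_196sd": 1.96 * sd_diff,          # estimate of TE (total error)
        "cl_196sd": cl_196sd,
    }
```

The returned values can then be compared with SEspec and TEspec as described in the list above.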

Combined graphical/statistical techniques

- Limitations of the original plot
- Does not recognize different method hierarchies
- Same hierarchy: x = (A+B)/2
- Different hierarchy: x = A
- In many cases a %-bias plot is more appropriate. In that connection, it is better to calculate the 1.96 CV values, because the SD is often increasing with the level, so that no mean SD exists.
- Does not include confidence limits
- Does not include TE/SE-specifications

Combined graphical/statistical techniques

- Remember: quality specifications in the absolute bias plot and the %-bias plot
- Because of these limitations, it is recommended to use an "extended" Bland&Altman plot (see next page)
- See also following references
- Stöckl D. Beyond the myths of difference plots [letter]. Ann Clin Biochem 1996;33:575-7.
- Dewitte K, Fierens C, Stöckl D, Thienpont LM. Application of the Bland-Altman plot for the interpretation of method-comparison studies: a critical investigation of its practice. Clin Chem 2002;48:799-801.
- Stöckl D, Rodríguez Cabaleiro D, Van Uytfanghe K, Thienpont LM. Interpreting method comparison studies by use of the Bland-Altman plot: reflecting the importance of sample size by incorporating confidence limits and predefined error limits in the graphic. Clin Chem 2004;50:2216-8.

Combined graphical/statistical techniques

- Recommendations
- Construct the x-axis according to the hierarchy of the methods
- Choose a bias-plot (absolute, %) that fits your data
- Use the "extended" version of the plot (+specifications and CL's)
- (the 1-sided limits are chosen because the comparison is versus a specification).
- Be aware of the meaning of the calculated estimates "mean" bias or "mean" SD/CV
- This file contains a template for the Bland&Altman plot with pre-programmed confidence limits and entries for the SE and TE specifications. It may be adapted to the needs of the user.

Bland&Altman


Correlation and Regression

- Correlation
- The statistical model
- Correlation in method comparison
- Non-parametric correlation

- Regression
- Ordinary linear regression (OLR)
- Deming regression
- Passing-Bablok regression (non-parametric)
- Weighted regression
- Regression & method comparison
- Regression & calibration

Correlation

- Correlation
- Correlation concerns association between variables, e. g. serum cholesterol and indicators of heart disease.
- Correlation is a descriptive measure that does not allow conclusions concerning causal relationships.
- Correlation is also used together with regression (method comparison studies)
- Comparison Correlation <> Regression
- Regression model: one variable (the dependent variable, y) is a function of another variable (the independent variable, x)
- Example: Blood pressure may be considered a function of age

- Correlation model: both variables are random effects factors
- Example: Human arm and leg lengths are correlated

- Univariate and multivariate correlation
- Univariate (simple): between two variables
- Multivariate: between several variables and an outcome measure, e.g. between serum cholesterol and triglyceride and an indicator of heart disease
- Univariate correlation and relationships of data
- Linear relationship (often implicitly assumed)
- A curvilinear relationship, e.g. a polynomial model
- Cyclical relationship

- Linear correlation – Computations
- Pearson's product-moment correlation coefficient r
- Computation from the cross product and sums of squared deviations from the mean values: r = Σ(xi − xm)(yi − ym) / [Σ(xi − xm)² • Σ(yi − ym)²]^0.5
- The correlation coefficient can be computed regardless of variable distribution types.
- Associated significance tests depend on the type of distributions and are valid for the bivariate normal distribution.
- Coefficient of determination (r²)
- Squaring r gives the coefficient of determination, which is the proportion of variance that the two variables have in common. For a height-weight example, r = 0.807 and r² = 0.651, which means that height explains about 65% of the variance in weight; the other 35% is attributable to other factors, perhaps nature and nurture.

r is dimensionless and can take values from -1 to +1
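As a minimal sketch, Pearson's r can be computed directly from the cross product and the sums of squared deviations. The function name and the variable names `p`, `u` and `q` are chosen here for illustration only.

```python
import math


def pearson_r(x, y):
    """Pearson's r from the cross product and the sums of squared deviations."""
    n = len(x)
    xm, ym = sum(x) / n, sum(y) / n
    p = sum((xi - xm) * (yi - ym) for xi, yi in zip(x, y))  # cross product
    u = sum((xi - xm) ** 2 for xi in x)                     # sum of squares of x
    q = sum((yi - ym) ** 2 for yi in y)                     # sum of squares of y
    return p / math.sqrt(u * q)
```

Squaring the returned value gives the coefficient of determination.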

Correlation

- Testing against zero:
- Standard error of r: SEr = [(1 − r²)/(N − 2)]^0.5
- t-test for significance against zero: t = (r − 0)/SEr, with (N − 2) degrees of freedom

- Non-zero correlation
- r is transformed to z = 0.5 ln[(1 + r)/(1 − r)] (Fisher's z-transformation, which yields a symmetric normal-like distribution)
- SE of z: [1/(N − 3)]^0.5
- Hypothesis testing and confidence intervals are based on the z-transformation
- Critical values for r
- Correlation and P
- A weak correlation may be highly statistically significant given a large N, e.g. as observed in large epidemiological studies.
- The clinical importance of a given degree of correlation depends on the situation.
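The two tests above can be sketched as follows. The function names are hypothetical, and the default `z_crit = 1.96` assumes a two-sided 95% interval based on the normal distribution; the back-transformation from z to r uses the hyperbolic tangent, the inverse of Fisher's transformation.

```python
import math


def r_t_statistic(r, n):
    """t-value for testing r against zero, with n - 2 degrees of freedom."""
    se_r = math.sqrt((1 - r ** 2) / (n - 2))
    return (r - 0) / se_r


def r_confidence_interval(r, n, z_crit=1.96):
    """Approximate confidence interval for r via Fisher's z-transformation."""
    z = 0.5 * math.log((1 + r) / (1 - r))
    se_z = math.sqrt(1 / (n - 3))
    lower, upper = z - z_crit * se_z, z + z_crit * se_z
    return math.tanh(lower), math.tanh(upper)   # back-transform z to r
```

The t-value is compared with the tabulated critical value; this sketch does not compute P-values.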

Correlation

- Meaning of the Pearson correlation coefficient, r
- Measure for the strength of linear correlation
- Becomes smaller (e.g., <1) when dispersion in y occurs
- r is a measure for random analytical error
Correlation in method comparison studies

- Systematic errors have no influence on r

A: No SE

B: Constant SE

C: Proportional SE

D: Constant & proportional SE

Correlation

- The Pearson correlation coefficient, r
- Influence of the data range:
- r: increases with the range
- Inclusion of extreme values: artificial improvement of r
- Conclusion: correlation in method comparison studies
- The Pearson correlation coefficient, r is, as measure for method comparability, difficult to interpret
- r depends on the range of x-values. The greater the range is, the higher are the values for r
- r is not influenced by systematic errors
- Often, values of r that are much too small (e.g., r = 0.8) are judged as a good correlation in method comparison
- Some advocate using r as an indicator of adequate data distribution before applying linear regression and recommend for this purpose r-values >0.975 (small range) or >0.99 (wide range).

- However, when several methods are compared with the same data-set, r is a useful index for ranking the methods.
- Nonparametric correlation
- The parametric correlation coefficient is sensitive towards outliers.
- Nonparametric correlation coefficients (Spearman or Kendall) are more robust and are calculated on the basis of the ordered (ranked) observations.
- The computation principle is an assessment of how well the rank order of the second variable corresponds to the rank order of the first variable.
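The rank-based principle can be illustrated with Spearman's coefficient, computed here as Pearson's r on the ranked observations. This is a sketch with hypothetical function names; tied values receive the average of their ranks, which is a common convention but an assumption here.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the block of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson's r applied to the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    p = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    u = sum((a - mx) ** 2 for a in rx)
    q = sum((b - my) ** 2 for b in ry)
    return p / math.sqrt(u * q)
```

Because only the rank order enters the calculation, a single extreme value changes the result far less than it changes Pearson's r.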


Regression

- Linear regression procedures
- Linear regression procedures assume a linear relationship between 2 variables (e.g., 2 methods): yi = a0 + b • xi (a0 = intercept; b = slope)
- Slope and intercept of the regression line are determined by minimizing the sum of the squared distances between the data points and the regression line (parametric procedures)

- Linear regression in method comparison gives information on:
- Constant systematic error (intercept)
- Proportional systematic error (slope)
- Random error (SDy/x)

- Non-linear or curvilinear regression procedures
- Minimize the sum of squares of the residuals on the basis of any clear mathematical relationship (polynomial, logarithmic, etc.) between two methods
- In the easiest case, the curve can be approximated by several linear regression calculations performed over different ranges of x (e.g., the low, middle, and high range)
- Curvilinear regression is most adequate for calibration purposes, either for the dose/response case, or for calibration of a routine method through method comparison with a reference method

Regression

- Linear regression procedures: Overview
- Ordinary least-squares regression (OLR)
- Weighted variant

- Deming regression
- Weighted variant

- Passing Bablok regression (non-parametric)
- Ordinary least-squares regression (OLR)
- Assumptions (see also figure):
- x: error-free, which implies that SDax = 0
- y: measurement uncertainty is present, with the assumption that SDay is constant throughout the measurement range and that the errors are normally distributed.

Regression

- Computations
- OLR: REMARKS
- OLR minimizes the sum of the squares of the y-residuals (= deviations of yi from the regression line in y-direction)
- The regression line will pass through a centroid point that is the mean of x and the mean of y
- Disadvantage of OLR is its sensitivity towards outliers (i.e. extreme values of x or big residuals in the y-direction)
- OLR gives biased slope estimate in case of narrow range and measurement error in x
- Linear regression estimates: graphical presentation
- OLR: Limitation SDay constant
- SDay is normally not constant but increases with increasing values of x (when measurement values are distributed over a decade or more). This is reflected in the residual plot by a trend towards increasing scatter at high levels.
- Because of this, weighted forms of linear regression have been introduced.

- Statistical estimates of OLR
- Slope (b), SE(b) & 0.95-confidence limits (CLs) for b
- Intercept (a0), SE(a0) & 0.95-CLs for a0
- Standard error of the y-estimate (SDy/x)
- Regression residuals
- 95% prediction interval for single points

Regression

- Weighted linear regression
- In a weighted regression procedure, SDa is regarded as proportional to a function of the level x (SDa = c • h(x)), and the weights are: wi = 1/h(xi)²

- Weights are inversely proportional to the square of SDay
- Example: proportional relationship between SDa and xi (Fig., upper part): wi = 1/(xi)², possibly truncated at a low limit (Fig., lower part)
- Centroid of a weighted regression line
- The weighted regression line does not pass through the mean of x and y; its centroid lies closer to the origin
- Further limitations of OLR
- Biased slope estimate in case of narrow range (x) and analytical error (a) in x: β' = β/(1 + λ), with λ = σa²/σX²
- This leads to incorrect testing of significance
- Limitation: error (a) in x
- In method comparison studies, the assumption of an error-free x is often not valid. For that reason, regression techniques have been developed that allow error in both variables (x & y), e.g., Deming regression.
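The weighted regression described earlier on this slide can be sketched as a weighted least-squares fit. The function name is hypothetical; the default weights wi = 1/xi² correspond to the proportional case and require all x to be non-zero (no truncation at a low limit is implemented in this sketch).

```python
def weighted_olr(x, y, weights=None):
    """Weighted least-squares line; default weights are wi = 1/xi**2."""
    if weights is None:
        weights = [1 / xi ** 2 for xi in x]     # SD assumed proportional to x
    sw = sum(weights)
    xw = sum(w * xi for w, xi in zip(weights, x)) / sw   # weighted mean of x
    yw = sum(w * yi for w, yi in zip(weights, y)) / sw   # weighted mean of y
    sxy = sum(w * (xi - xw) * (yi - yw) for w, xi, yi in zip(weights, x, y))
    sxx = sum(w * (xi - xw) ** 2 for w, xi in zip(weights, x))
    b = sxy / sxx
    a0 = yw - b * xw                            # line passes through the weighted centroid
    return b, a0, (xw, yw)
```

With 1/x² weights the weighted centroid `(xw, yw)` is pulled towards the low results, which illustrates why it lies closer to the origin than the unweighted means.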

Regression

- Assumptions:
- x & y: measurement errors may be present in both, with SDax = SDay, or with SDax and SDay in a known relation (SDax/SDay)
- SDax and SDay: constant throughout the measurement range; errors normally distributed

- Deming regression estimates a straight line by minimizing the sum of squared distances at an angle to the regression line that depends on the relation between the x and y precision. This results in an unbiased estimate (in contrast to OLR, which gives a biased slope estimate when SDax ≠ 0)
- Graphical representation of the assumptions:
- measurement uncertainty in both x and y
- Computation of the Deming regression line
- Minimization of the weighted sum of squared distances to the line: S = Σ[(xi − Xi)² + λ(yi − Yi)²]; λ = SDax²/SDay²
- which provides the solution:
- b = [(λq − u) + [(u − λq)² + 4λp²]^0.5] / (2λp), with u = Σ(xi − xm)², q = Σ(yi − ym)², p = Σ(xi − xm)(yi − ym)
- a0 = ym − b • xm
- Computation of standard errors SE(a0) and SE(b) requires a specialized procedure, e.g. the jackknife method.
- Jackknife principle
- Computerized resampling principle. Sampling variation is simulated by consecutively withdrawing one (x, y) pair from the set and recalculating the estimates
- From the dispersion of estimates, SEs are derived.
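The Deming solution and the jackknife principle can be sketched together. This is an unweighted illustration with hypothetical function names, not the CBstat implementation; the jackknife uses the common pseudo-value form (n × full estimate − (n − 1) × leave-one-out estimate), which is an assumption about the exact variant. Inputs are expected as Python lists.

```python
import math
import statistics


def deming(x, y, lam=1.0):
    """Deming slope and intercept; lam = SDax**2 / SDay**2."""
    n = len(x)
    xm, ym = sum(x) / n, sum(y) / n
    u = sum((xi - xm) ** 2 for xi in x)
    q = sum((yi - ym) ** 2 for yi in y)
    p = sum((xi - xm) * (yi - ym) for xi, yi in zip(x, y))
    b = ((lam * q - u) + math.sqrt((u - lam * q) ** 2 + 4 * lam * p ** 2)) / (2 * lam * p)
    return b, ym - b * xm


def jackknife_se(x, y, lam=1.0):
    """Jackknife standard errors of the Deming slope and intercept."""
    n = len(x)
    b_all, a_all = deming(x, y, lam)
    pseudo_b, pseudo_a = [], []
    for i in range(n):
        # leave out the i-th (x, y) pair and re-estimate
        b_i, a_i = deming(x[:i] + x[i + 1:], y[:i] + y[i + 1:], lam)
        pseudo_b.append(n * b_all - (n - 1) * b_i)
        pseudo_a.append(n * a_all - (n - 1) * a_i)
    se_b = statistics.stdev(pseudo_b) / math.sqrt(n)
    se_a = statistics.stdev(pseudo_a) / math.sqrt(n)
    return se_b, se_a
```

With `lam=1.0` both methods are assumed to have the same analytical SD, which corresponds to orthogonal regression.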

Figure: the model assumed in Deming regression analysis

Figure: the model assumed in weighted Deming regression analysis

Regression

- Weighted Deming regression

- Higher efficiency in case of proportional measurement uncertainty (constant CV), reflected by more homogenous scatter of standardized residuals and smaller SEs of slope and intercept estimates.
- Method comparison example (Datasets-MethComp)
- (n = 50; Statistics: CBstat, K. Linnet)
- Deming regression:
- Slope: b = 1.053
- SE(b) = 0.023; 0.02 < P < 0.05
- Intercept: a0 = -0.22
- SE(a0) = 0.19; n.s. from 0
- No significant deviation from linearity
- No outliers
- Residuals show increased scatter at high levels: poor model fit.
- Weighted Deming regression:
- Slope: b = 1.032
- SE(b) = 0.012; 0.01 < P < 0.02
- Intercept: a0 = -0.01
- SE(a0) = 0.07; n.s. from 0
- Homogeneous scatter of residuals: better model fit.
- Weighted versus unweighted Deming
- Smaller SEs of slope and intercept:
- SE(b) = 0.012 versus 0.023 unweighted
- SE(a0) = 0.07 versus 0.19 unweighted

Regression

- Weighted versus unweighted Deming regression
- Given measurement values distributed over a decade or more, the analytical SD seldom is constant but varies often proportionally, so that the CV% is about constant.
- In this case, it is advantageous to apply a weighted analysis, which provides lower SEs of estimates.
- Weighted Deming regression analysis covers the probably most commonly occurring data situation in method comparison studies.
- Passing-Bablok regression
- Assumptions:
- Passing-Bablok regression is a non-parametric method that makes no assumptions about the distribution of errors. It may be used in case of constant or proportional error
- It assumes that the ratio SDax/SDay is equal to the slope
- Passing-Bablok regression uses the slopes between all pairs of data points (xi, yi) to calculate the slope of the regression line (as a shifted median of these slopes). The intercept is estimated so that at least half of the data points are located above or on the regression line and at least half of the data points below or on the regression line.
- An advantage of Passing-Bablok regression is its robustness to outliers.
- A disadvantage is the broader confidence intervals (due to the nature of non-parametric procedures).
- Geometrical interpretation of regression techniques (minimizing residuals)
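The Passing-Bablok point estimates described above can be sketched as follows. The function name is hypothetical, confidence intervals are not computed, and the sketch assumes positively related methods (with many slopes below −1 the shifted index could run out of range).

```python
import statistics


def passing_bablok(x, y):
    """Passing-Bablok slope (shifted median of pairwise slopes) and intercept."""
    slopes = []
    n = len(x)
    for i in range(n - 1):
        for j in range(i + 1, n):
            dx, dy = x[j] - x[i], y[j] - y[i]
            if dx == 0 and dy == 0:
                continue                        # identical points: slope undefined
            if dx == 0:
                slopes.append(float("inf") if dy > 0 else float("-inf"))
                continue
            s = dy / dx
            if s != -1:                         # slopes of exactly -1 are discarded
                slopes.append(s)
    slopes.sort()
    m = len(slopes)
    k = sum(1 for s in slopes if s < -1)        # offset for the shifted median
    if m % 2:
        b = slopes[(m - 1) // 2 + k]
    else:
        b = 0.5 * (slopes[m // 2 - 1 + k] + slopes[m // 2 + k])
    # median of yi - b*xi: half of the points lie on or above the line
    a0 = statistics.median(yi - b * xi for xi, yi in zip(x, y))
    return b, a0
```

Replacing the last y-value of a perfectly linear data set by a gross outlier leaves both estimates unchanged, which illustrates the robustness mentioned above.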

Regression

- Slope (b), SE(b) & 0.95-confidence limits (CLs) for b
- Intercept (a0), SE(a0) & 0.95-CLs for a0
- Standard error of the y-estimate (SDy/x)
- Correlation coefficient (+ P-value)
- Outlier identification (4s)
- Scatter plot with regression line, 0.95-confidence region and x = y line
- Residuals plot (normalized)
- Additional: runs test for linearity
- Residuals for linearity testing
- Relationship correlation & regression
- r is related to the regression slope(s):
- r = [byx • bxy]^0.5: r is the geometric mean of the two regression slopes
- byx = r • SDy/SDx: i.e. r is a rescaled version of the regression slope (identity given SDy = SDx)
- r is related to SDy/x: r² ≈ 1 − SDy/x²/SDy²
- Linear regression – In method comparison
- Calculation of a bias (DC)
- Bias may consist of:
- Constant part (α0)
- (e.g. fixed matrix effect)
- Proportional part (β − 1)
- (e.g. calibration difference)
- DC = YC − XC = a0 + (b − 1) • XC

Runs test:

Sequences of residuals with the same sign are counted and related to critical limits (= testing of randomness of residuals)
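The counting step of the runs test can be sketched as follows. The function name is hypothetical. The source relates the number of runs to tabulated critical limits; the normal approximation (Wald-Wolfowitz) used here to obtain a z-value is an assumption and is only reasonable for larger data sets. Residuals are expected in order of increasing x.

```python
import math


def runs_test(residuals):
    """Number of sign runs, expected number of runs, and approximate z-value."""
    signs = [r > 0 for r in residuals if r != 0]    # zero residuals are skipped
    n_pos = sum(signs)
    n_neg = len(signs) - n_pos
    # a new run starts whenever the sign changes
    runs = 1 + sum(1 for s, t in zip(signs, signs[1:]) if s != t)
    n = n_pos + n_neg
    expected = 2 * n_pos * n_neg / n + 1
    variance = (2 * n_pos * n_neg * (2 * n_pos * n_neg - n)) / (n ** 2 * (n - 1))
    z = (runs - expected) / math.sqrt(variance)
    return runs, expected, z
```

Too few runs (a clearly negative z) indicate long stretches of residuals with the same sign, as expected for a curved relationship.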

Regression

- Confidence interval of a bias or Prediction interval
- systematic difference (SE)
- Statistical significance of estimates:
- Does the slope deviate from 1? t = (b-1)/SE(b)
- Indication for proportional error
- Does the intercept deviate from 0? t = (a0 - 0)/SE(a0)
- Indication for constant error
- SDy/x (from OLR)
- Measure for random error
- Are the data pairs linearly related?
- Additional in CBstat
- Runs test or visual inspection of residuals plot
- Indication for (non)linear relationship
- Further application
- CI for SE at a critical concentration
- Statistical test for slope equal to 1
- (95% confidence limits to consider)

- Simulation with CV = 5%: regression y = 0.9443x − 0.1521; 95% CLs for slope: 0.9106 to 0.9780; significantly different from 1
- Simulation with CV = 15%: regression y = 0.9414x − 0.1233; 95% CLs for slope: 0.8656 to 1.0172; NOT significantly different from 1: high RE!

Regression

- SDy/x is a measure for the random error component in method comparison, i.e. in both x and y. Thus, SDy/x is related to the expected total imprecision: SDy/x² = SDay² + b² • SDax²
- Given proportional analytical errors (and intercept around 0), approximately: CVy/x² = CVay² + CVax², or CVy/x = √2 • CVay for CVay = CVax
- If only imprecision effects play a role in the method comparison, SDy/x² ≈ SDay² + b² • SDax² (to convert SDy/x into CVy/x, take value of y)
- If SDy/x² >> SDay² + b² • SDax²
- Proof of sample-related effects (see exercises)
- Comparison of regression procedures in practice
- Example 1 (note: r = 0.993): all regression procedures, i.e. ordinary least-squares regression (OLR), Deming regression (DR), and Passing-Bablok regression (PBR), give nearly the same results
- Example 2 (note: r = 0.871): the regression procedures (OLR, DR, PBR) give different results
- Conclusion
- Choose the "statistically best" regression method?
- Answer: No, look for analytical reasons of the poor comparability!
- Notice also that the 95% CLs of the slope are: 1.08 – 1.44

Regression

- When different regression procedures give different results …
- OLR (red): y = 0.750 x – 0.006 (r = 0.996)
- Passing Bablok (blue): y = 0.686 x + 0.022

- … look whether the data are linear!
- The residuals plot demonstrates non-linearity
- CI for SE at a critical concentration
- Therapeutic interval for drug assay: 300 – 2000 nmol/L
Delta = Ŷ − X = a0 + (b − 1)X = 20.3 + (1.014 − 1)X

- X = 300: Delta = 24.5; SE(Delta) = 9.5
Significance test: texp = (Delta − 0)/SE = 2.6; tcrit(0.05; n − 2) = 1.998: significant

- X = 2000: Delta = 48.9; SE(Delta) = 34.2
t = 1.4: not significant

- Conclusion:
- At the lower decision point, a statistically significant difference exists, but it is judged to be clinically unimportant
- At the upper decision point, no difference of statistical significance
- The assays can be interchanged without clinical consequences
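The calculation for a decision concentration can be sketched in two small functions. The function names are hypothetical, and SE(Delta) is taken as given, because its computation depends on the regression procedure and is not shown in the source.

```python
def bias_at(xc, a0, b):
    """Systematic difference Delta = a0 + (b - 1) * Xc at concentration Xc."""
    return a0 + (b - 1) * xc


def t_for_bias(delta, se_delta):
    """t-value for testing Delta against zero."""
    return (delta - 0) / se_delta
```

With the slide's values, `bias_at(300, 20.3, 1.014)` gives 24.5 and `t_for_bias(24.5, 9.5)` gives about 2.6, above the critical value of 1.998. With the rounded coefficients, `bias_at(2000, 20.3, 1.014)` gives about 48.3 rather than the 48.9 shown, presumably because the original calculation used unrounded estimates; the conclusion (t about 1.4, not significant) is the same.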

Regression

- Summary
- Perform correlation analysis before regression [r-values >0.975 (small range) or >0.99 (wide range)]
- In case of a method comparison of methods of the same hierarchy, regression techniques, that take the error in x and y into account, should be used. We recommend Deming regression.
- Classical OLR is only applicable in case of method comparison with a reference method or in the calibration case (weighed-in concentrations).
- The greater the random error and the smaller the data range, the less reliable the regression data.
- The 95% confidence limits of slope and intercept are often forgotten when regression results are reported.
- Linear regression always results in a line, even when the data are not linear. Therefore, linear regression data always should be accompanied by a graphical presentation of results (scatter plot with x = y line or residuals plot) and the indication of the number of observations. The graph should be visually inspected for adequate range, distribution of data, and linearity
- Regression analysis provides information about:
- constant systematic difference (intercept)
- proportional systematic difference (slope)
- random error (SDy/x from OLR)
- sample-related effects (SDy/x² >> SDay² + b² • SDax²)
- Regression software
- CBstat (K. Linnet): A Windows program
- (weighted) OLR
- (weighted) Deming regression
- Passing Bablok regression

- (www.cbstat.com)
- MedCalc
- (www.medcalc.com)
- EP-Evaluator (D. G. Rhoads Ass., USA)
- (www.dgrhoads.com)
- Analyse-It (Excel-plug-in)
- (www.analyse-it.com)

Regression

- Calculations
- Concentration of unknown and its random error
- Limit of Detection (LoD)
- Graphical model
- Concentration (x0) and its random error (Sx0)
- NOTE: do not confuse with x at zero (0) concentration!
- Calculate x0 from the signal (y0) via the regression equation y = bx + a: x0 = (y0 − a)/b
- Sx0: approximation: Sx0 = (Sy/x / b) • [1/m + 1/n + (y0 − ym)² / (b² • Σ(xi − xm)²)]^0.5
- m = number of measurements of unknown
- n = number of calibration points
- The confidence interval of x0 is:
- CI = ± t(n − 2, α) • Sx0
- Calculation of LoD
- Yb = "Signal of blank" via regression = intercept a
- Sb = "Standard deviation of blank" = Sy/x
- "Signal" LoD = a + 3 Sy/x
- Calculate CLoD via regression equation.

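The calibration calculations can be sketched as follows. The function names are hypothetical. The expression for Sx0 is the commonly used textbook approximation, which is an assumption about what the original slide showed; the LoD follows the definition above (signal of blank = intercept a, SD of blank = Sy/x).

```python
import math


def calibration(x, y):
    """OLR calibration line y = b*x + a with Sy/x, mean of y, and sum of squares of x."""
    n = len(x)
    xm, ym = sum(x) / n, sum(y) / n
    u = sum((xi - xm) ** 2 for xi in x)
    b = sum((xi - xm) * (yi - ym) for xi, yi in zip(x, y)) / u
    a = ym - b * xm
    s_yx = math.sqrt(sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2))
    return a, b, s_yx, ym, u


def unknown_concentration(y0, x, y, m=1):
    """Concentration x0 and its random error Sx0 for a mean signal y0 of m measurements."""
    a, b, s_yx, ym, u = calibration(x, y)
    n = len(x)
    x0 = (y0 - a) / b
    s_x0 = (s_yx / abs(b)) * math.sqrt(1 / m + 1 / n + (y0 - ym) ** 2 / (b ** 2 * u))
    return x0, s_x0


def limit_of_detection(x, y):
    """Signal at the LoD (a + 3 Sy/x) and the corresponding concentration."""
    a, b, s_yx, _, _ = calibration(x, y)
    signal_lod = a + 3 * s_yx
    return signal_lod, (signal_lod - a) / b     # concentration via the regression equation
```

The confidence interval of x0 is then ± t(n − 2) × Sx0, with the t-value taken from a table.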

Data transformation

- Introduction of non-linearity by data transformation in method comparison and commutability studies.
- Stöckl D, Thienpont LM. Clin Chem Lab Med 2008;46:1784-5.

Exercises

CorrRegr-EXCEL

- This EXCEL-file describes the advantages and disadvantages of the different EXCEL options for performing correlation and regression analysis.
- These options are:
- 1. With the fx icon
- 2. With Tools>Data Analysis
- 3. With a figure
- It also contains a worksheet with additional regression features:
- 95% confidence interval of the slope
- 95% prediction interval
- This tutorial contains interactive exercises for self-education in:
- Correlation, and
- Regression
- Worksheet "correlation" shows the influence of dispersion, slope, intercept, and range on r.
- Worksheet "regression1" shows the influence of dispersion, slope, and intercept on the standard errors of slope, intercept, and Sy/x.
- Worksheet "regression2" shows the influence of the range on the standard errors of slope, intercept, and Sy/x (this example is constructed with a constant SD over the range).
- Note: r and r-square are given for information only.
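
The influence of the data range on r, which the correlation worksheet demonstrates interactively, can also be sketched outside EXCEL. The example below is a simplification: instead of random scatter it uses a fixed, alternating error of ±1 unit, so that both data sets have exactly the same dispersion and the result is reproducible. All names and numbers are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of paired data."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / math.sqrt(sxx * syy)

# Same error pattern (+1, -1, +1, ...) for both data sets
errors = [1.0 if i % 2 == 0 else -1.0 for i in range(10)]

narrow_x = [10.0 + i for i in range(10)]         # range 10 to 19
wide_x = [10.0 + 10.0 * i for i in range(10)]    # range 10 to 100

# y = x + error: slope 1, intercept 0, identical dispersion
narrow_y = [x + e for x, e in zip(narrow_x, errors)]
wide_y = [x + e for x, e in zip(wide_x, errors)]

r_narrow = pearson_r(narrow_x, narrow_y)
r_wide = pearson_r(wide_x, wide_y)
print(f"r (narrow range) = {r_narrow:.4f}, r (wide range) = {r_wide:.4f}")
```

Although the agreement between x and y is identical in both sets, r is clearly higher for the wide range, which is why r is of limited use for judging method agreement.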

Correlation&Regression

Datasets (Method comparison: Weighted Deming, PractRegr1, PractRegr2)


Annex

- EXCEL® requirements
- The "Data Analysis" Add-in
- In the "Worksheet Menu Bar", under "Tools", "Data Analysis" should appear
- If it is not present, click "Tools" and "Add-Ins", then activate "Analysis ToolPak" and "Analysis ToolPak - VBA"
- If they are not present in "Add-Ins", install them from the EXCEL or Office package


Annex

Data&DataPresentation

("Figure")

- Create a figure: "Chart-wizard"
- Modify a figure with:
- "Chart-wizard"
- "Chart-menu"
- Double-click (left) on an element

- Move or size
- Left mouse click depressed: notice the full squares!
- Shift & left click: notice the empty squares!
- Move with the mouse pointer; size with right click > Format object, or directly with the "Format" menu


Annex

- Make your own "templates": Activate figure
- Chart>Chart type>Custom types>User defined>Add

- IMPORTANT: Scale names and sizes are kept too!
- Layout tips for EXCEL-figures
- Not more than 8 columns (standard width 8.43)
- Not more than 22 rows (standard height 12.75)
- Font: minimum 16 (14), bold preferred
- Use thick lines
- Symbol size 6 or 7
- Click off autoscaling
- Click "Don't move or size with cells"


Annex

- Windows 98 with Office 2000 experience
- Copy & paste direct if animation is intended
- Copy & paste direct, then >Copy>Delete>Edit>Paste special: Picture (Enhanced metafile: "EMF") = Easy magnification without loss of quality
- Copy & paste direct, then >Copy>Delete>Edit>Paste special: GIF, preferably keep 100% size: often looks more attractive
- Note: It is often preferable to copy the cell range where the figure is placed ("What you see is what you get": colours, layout)
- Adding text: often preferable in PowerPoint!

- Print EXCEL®-figures from PowerPoint
- Windows 98 with Office 2000 experience
- Note: Printing of ppt-Figures may pose problems.
- Check the print early if you want to make handouts!
- EMF figures ("Cells direct", then EMF) print well in the absence of Gaussian-type lines
- Preferably incorporate text in the .ppt slide and copy both as EMF

- [Bigger] GIF figures ("Cells direct", then GIF)
- Advantage: better print of Gaussian-type lines
- Problems: incorporated text and scales have poor resolution; this can be improved as follows:
- Paste direct, then GIF, then add text (and axes, if needed) in ppt, then copy & paste special both as EMF
- Note: Overpaste of figure scales with .ppt text fields works only with GIF, but not with EMF.


Figure panels: Direct; Cells direct; Cells direct & EMF; Cells direct & GIF

Annex

- Example: 5 columns, 16 rows, font 16 & 14 bold


Annex

- Glossary of statistical terms
- http://linkage.rockefeller.edu/wli/glossary/stat.html
- http://www.statsoft.com/textbook/glosfra.html
- http://www.stats.gla.ac.uk/steps/glossary/index.html (most practical)
- http://davidmlane.com/hyperstat/glossary.html
- http://stat-www.berkeley.edu/~stark/SticiGui/Text/gloss.htm
- Interesting educational resources
- http://www.ruf.rice.edu/%7Elane/rvls.html
- http://www.math.uah.edu/stat/index.xml
- http://cast.massey.ac.nz/
- http://www.anu.edu.au/nceph/surfstat/surfstat-home/surfstat.html (with progress tests!)
- http://www.stat.vt.edu/~sundar/java/applets/
- http://www.kuleuven.ac.be/ucs/java/version2.0/Content.htm
- http://www.seeingstatistics.com/seeing1999/resources/opening.html (many possibilities, own data!)
- http://www.margaret.net/statistics/p02.htm
- http://bmj.bmjjournals.com/collections/statsbk/index.shtml
- http://science.widener.edu/svb/stats/stats.html
- http://www.vam.org.uk/vamstatdemo/demolist.asp
- http://www.stat.sc.edu/~west/applets/tdemo1.html (t-distribution)
- http://www.visualstatistics.net/ (t-distribution for EXCEL!)
- http://www.stat.uiowa.edu/~rlenth/Power/ (power)
- Statistical software
- General
- http://www.spss.com/sigmastat/
- http://www.sas.com/technologies/analytics/statistics/index.html
- http://www.stata.com/
- http://www.minitab.com/
- http://www.graphpad.com/ (also educational!)
- "Laboratory statistics"
- http://www.medcalc.be
- http://www.cbstat.com
- http://www.analyse-it.com
- http://www.dgrhoads.com/


Annex

- Books
- Biometry: The Principles and Practice of Statistics in Biological Research. Robert R. Sokal, F. James Rohlf
- Statistics and Chemometrics for Analytical Chemistry. James N. Miller, Jane C. Miller
- Clinical Investigation and Statistics in Laboratory Medicine. Richard Jones, Brian Payne
- Statistics at Square One. Ninth Edition. T D V Swinscow
- (see also: http://bmj.bmjjournals.com/collections/statsbk/index.shtml)
- http://www.statsoft.com/textbook/stathome.html
- http://davidmlane.com/hyperstat/
- http://faculty.vassar.edu/lowry/webtext.html
- http://www.tufts.edu/~gdallal/LHSP.HTM
- Books (PDF) from the net
- Analyzing Data with GraphPad Prism. A companion to GraphPad Prism version 3 (graphpad.com).
- The InStat Guide to Choosing and Interpreting Statistical Tests. A manual for GraphPad InStat Version 3 (graphpad.com).
- NIST/SEMATECH e-Handbook of Statistical Methods (http://www.itl.nist.gov/div898/handbook/)


Statistical tables

- Factors for control limits of range rules


Cochran C – Critical values


Annex

- Publications
- Stöckl D. Beyond the myths of difference plots [letter]. Ann Clin Biochem 1996;33:575-7.
- Stöckl D. Difference versus mean plots [reply]. Ann Clin Biochem 1997;34:571.
- Hyltoft Petersen P, Stöckl D, Blaabjerg O, Pedersen B, Birkemose E, Thienpont L, Flensted Lassen J, Kjeldsen J. Graphical interpretation of analytical data from comparison of a field method with a reference method by use of difference plots [opinion]. Clin Chem 1997;43:2039-46.
- Stöckl D, Dewitte K, Thienpont LM. Validity of linear regression in method comparison studies: is it limited by the statistical model or the quality of the analytical input data? Clin Chem 1998;44:2340-6.
- Stöckl D, Dewitte K, Fierens C, Thienpont LM. Evaluating clinical accuracy of systems for self-monitoring of blood glucose by error grid analysis. Comment on constructing the “upper A-line”. Diabetes Care 2000;11:1711-2.
- Dewitte K, Fierens C, Stöckl D, Thienpont LM. Application of the Bland-Altman plot for interpretation of method-comparison studies: a critical investigation of its practice. Clin Chem 2002;48:799-801;discussion 801-2.
- Cabaleiro DR, Stöckl D, Thienpont LM. Error messages when calculating chi-square statistics with Microsoft EXCEL. Clin Chem Lab Med 2004;42:243.
- Stöckl D, Rodríguez Cabaleiro D, Van Uytfanghe K, Thienpont LM. Interpreting method comparison studies by use of the Bland-Altman plot: reflecting the importance of sample size by incorporating confidence limits and predefined error limits in the graphic. Clin Chem 2004;50:2216-8.
- Stöckl D, Rodríguez Cabaleiro D, Thienpont LM. Peculiarities and problems with the EXCEL F-test. Clin Chem Lab Med 2004;42:273.
- Courses
- Analytical quality in the medical laboratory: Concepts for method selection, evaluation, and control. In cooperation with Belgian Association of Laboratory Technologists and Hogeschool Gent (Gent, Belgium, 1998).
- Practice-oriented strategies for the development and evaluation of analytical methods. FOCUS: Graphical and statistical techniques for the interpretation of method comparison studies (academic year 2000/1). In cooperation with Prof. LM Thienpont (University of Ghent).
- Graphical and statistical techniques for the interpretation of method comparison studies. 14th IFCC European Congress of Clinical Chemistry and Laboratory Medicine - Euromedlab 2001 (Prague, Czech Republic).
- Graphical techniques for the interpretation of method comparison studies. Education days for Clinical Biochemists: Method validation. Odense, Denmark: 17-20 December 2001.
- Educational course on biostatistics. 18th International Congress of Clinical Chemistry and Laboratory Medicine IFCC Worldlab 2002 (Kyoto, Japan).
- Statistical and graphical techniques for the interpretation of method comparison studies. 15th IFCC European Congress of Clinical Chemistry and Laboratory Medicine - Euromedlab 2003 (Barcelona, Spain).
- Statistical and graphical tools for the medical laboratory – A problem oriented journey from test utility to internal quality control. 2003 (Bratislava, Slovakia).
- Statistical and graphical tools for the laboratory – from test utility to IQC. Full-day Workshop, AACC 2004.
