Practical applications of statistical methods in the clinical laboratory

Practical Applications of Statistical Methods in the Clinical Laboratory. Roger L. Bertholf, Ph.D., DABCC Associate Professor of Pathology Director of Clinical Chemistry & Toxicology UF Health Science Center/Jacksonville.


Practical Applications of Statistical Methods in the Clinical Laboratory

Roger L. Bertholf, Ph.D., DABCC

Associate Professor of Pathology

Director of Clinical Chemistry & Toxicology

UF Health Science Center/Jacksonville


“[Statistics are] the only tools by which an opening can be cut through the formidable thicket of difficulties that bars the path of those who pursue the Science of Man.”

[Sir] Francis Galton (1822-1911)



“There are three kinds of lies: Lies, damned lies, and statistics”

Benjamin Disraeli (1804-1881)


What are statistics, and what are they used for?

  • Descriptive statistics are used to characterize data

  • Statistical analysis is used to distinguish between random and meaningful variations

  • In the laboratory, we use statistics to monitor and verify method performance, and interpret the results of clinical laboratory tests



“Do not worry about your difficulties in mathematics, I assure you that mine are greater”

Albert Einstein (1879-1955)


“I don't believe in mathematics”

Albert Einstein


Summation function


Product function


The Mean (average)

The mean is a measure of the centrality of a set of data.


Mean (arithmetical)


Mean (geometric)


Use of the Geometric mean:

The geometric mean is primarily used to average ratios or rates of change.


Mean (harmonic)


Example of the use of Harmonic mean:

Suppose you spend $6 on pills costing 30 cents per dozen, and $6 on pills costing 20 cents per dozen. What was the average price of the pills you bought?


Example of the use of Harmonic mean:

You spent $12 on 50 dozen pills, so the average cost is 12/50=0.24, or 24 cents.

This also happens to be the harmonic mean of 20 and 30:
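The pill example can be checked numerically. The sketch below is an illustration added here, not the slide's own formula (which is not preserved in this transcript); it compares the direct calculation with the harmonic mean, N divided by the sum of the reciprocals. Variable names are hypothetical.

```python
from statistics import harmonic_mean

# Prices in cents per dozen; the same amount ($6 = 600 cents) is spent at each price.
prices = [30, 20]
spent_cents = 600

# Direct calculation: total money spent divided by total dozens bought.
dozens = [spent_cents / p for p in prices]            # 20 dozen and 30 dozen
direct_average = len(prices) * spent_cents / sum(dozens)

# Harmonic mean: number of values divided by the sum of their reciprocals.
by_formula = len(prices) / sum(1 / p for p in prices)

print(direct_average, by_formula, harmonic_mean(prices))
```

All three values agree at 24 cents per dozen (up to floating-point rounding), because equal money, not equal quantity, was spent at each price.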


Root mean square (RMS)


For the data set:

1, 2, 3, 4, 5, 6, 7, 8, 9, 10:
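The result for this data set is not preserved in the transcript. A minimal sketch, assuming the usual definition of RMS as the square root of the mean of the squared values:

```python
import math

data = list(range(1, 11))  # 1, 2, ..., 10

# RMS: square each value, average the squares, take the square root.
rms = math.sqrt(sum(x * x for x in data) / len(data))
arithmetic_mean = sum(data) / len(data)

print(f"mean = {arithmetic_mean}, RMS = {rms:.3f}")
```

The RMS (the square root of 38.5, about 6.2) is larger than the arithmetic mean of 5.5 because squaring gives more weight to the larger values.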


The Weighted Mean


Other measures of centrality

  • Mode


The Mode

The mode is the value that occurs most often


Other measures of centrality

  • Mode

  • Midrange


The Midrange

The midrange is the mean of the highest and lowest values


Other measures of centrality

  • Mode

  • Midrange

  • Median


The Median

The median is the value for which half of the remaining values are above it and half are below it. For example, in an ordered array of 15 values, the 8th value is the median. If the array has 16 values, the median is the mean of the 8th and 9th values.
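The measures of centrality described so far can be compared on one small data set. The values below are hypothetical, chosen so that a single high value pulls the mean and midrange upward while the median and mode stay put.

```python
from statistics import mean, median, mode

# Hypothetical data set with one unusually high value.
values = [2, 3, 3, 4, 5, 6, 19]

midrange = (min(values) + max(values)) / 2   # mean of the lowest and highest values

print("mean    :", mean(values))
print("median  :", median(values))
print("mode    :", mode(values))
print("midrange:", midrange)
```

With seven ordered values the median is the 4th one; the midrange depends only on the two extremes, so it is the measure most affected by the outlier.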


Example of the use of median vs. mean:

Suppose you’re thinking about building a house in a certain neighborhood, and the real estate agent tells you that the average (mean) size house in that area is 2,500 sq. ft. Astutely, you ask “What’s the median size?” The agent replies “1,800 sq. ft.”

What does this tell you about the sizes of the houses in the neighborhood?


Measuring variance

Two sets of data may have similar means, but otherwise be very dissimilar. For example, males and females have similar baseline LH concentrations, but there is much wider variation in females.

How do we express quantitatively the amount of variation in a data set?


The Variance

The variance is the mean of the squared differences between individual data points and the mean of the array.

Or, after simplifying, the mean of the squares minus the squared mean.
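Both descriptions of the variance can be verified on a small example. This sketch uses hypothetical data and divides by N, as the wording above ("the mean of the squared differences") implies.

```python
from statistics import pvariance

data = [4, 8, 6, 5, 7]          # hypothetical measurements
n = len(data)
data_mean = sum(data) / n

# Definition: mean of the squared differences from the mean.
by_definition = sum((x - data_mean) ** 2 for x in data) / n

# Simplified form: mean of the squares minus the squared mean.
shortcut = sum(x * x for x in data) / n - data_mean ** 2

print(by_definition, shortcut, pvariance(data))
```

The two forms are algebraically identical; the shortcut can lose precision in floating point when the mean is large compared with the spread.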


The Variance

In what units is the variance?

Is that a problem?


The Standard Deviation

The standard deviation is the square root of the variance. Standard deviation is not the mean difference between individual data points and the mean of the array.


The Standard Deviation

In what units is the standard deviation?

Is that a problem?


The Coefficient of Variation*

*Sometimes called the Relative Standard Deviation (RSD or %RSD)


Standard Deviation (or Error) of the Mean

The standard deviation of an average is the standard deviation of the individual measurements multiplied by the reciprocal of the square root of the number of data points used to calculate the average.


Exercises

How many measurements must we average to improve our precision by a factor of 2?


Answer

To improve precision by a factor of 2:


Exercises

  • How many measurements must we average to improve our precision by a factor of 2?

  • How many to improve our precision by a factor of 10?


Answer

To improve precision by a factor of 10:


Exercises

  • How many measurements must we average to improve our precision by a factor of 2?

  • How many to improve our precision by a factor of 10?

  • If an assay has a CV of 7%, and we decide to run samples in duplicate and average the measurements, what should the resulting CV be?


Answer

Improvement in CV by running duplicates:
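The worked answers to these three exercises are not preserved in the transcript. They follow from the square-root rule for the standard deviation of a mean; a minimal sketch, with hypothetical function names:

```python
import math

def measurements_needed(improvement_factor):
    """Precision improves with sqrt(N), so N is the square of the desired factor."""
    return improvement_factor ** 2

def cv_of_mean(cv_percent, n):
    """CV of the average of n replicates, given the CV of a single measurement."""
    return cv_percent / math.sqrt(n)

print(measurements_needed(2))              # factor of 2
print(measurements_needed(10))             # factor of 10
print(round(cv_of_mean(7, 2), 2))          # 7% CV assay run in duplicate
```

So 4 measurements halve the imprecision, 100 are needed for a tenfold improvement, and duplicates reduce a 7% CV only to about 5%.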


Population vs. Sample standard deviation

  • When we speak of a population, we’re referring to the entire data set, which will have a mean µ


Population vs. Sample standard deviation

  • When we speak of a population, we’re referring to the entire data set, which will have a mean µ

  • When we speak of a sample, we’re referring to a subset of the population, whose mean is customarily designated “x-bar”

  • Which is used to calculate the standard deviation?



“Sir, I have found you an argument. I am not obliged to find you an understanding.”

Samuel Johnson (1709-1784)


Population vs. Sample standard deviation
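The formulas from this slide are not preserved in the transcript. The sketch below assumes the conventional distinction: the population standard deviation divides the sum of squared differences by N, while the sample standard deviation divides by N - 1. The data are hypothetical.

```python
import math
from statistics import pstdev, stdev

data = [4, 8, 6, 5, 7]          # hypothetical measurements
n = len(data)
data_mean = sum(data) / n
sum_sq = sum((x - data_mean) ** 2 for x in data)

population_sd = math.sqrt(sum_sq / n)        # divide by N
sample_sd = math.sqrt(sum_sq / (n - 1))      # divide by N - 1

# The standard library provides both forms directly.
print(population_sd, pstdev(data))
print(sample_sd, stdev(data))
```

The sample form is always the larger of the two, and the difference shrinks as N grows.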


Distributions

  • Definition


Statistical (probability) Distribution

  • A statistical distribution is a mathematically-derived probability function that can be used to predict the characteristics of certain applicable real populations

  • Statistical methods based on probability distributions are parametric, since certain assumptions are made about the data


Distributions

  • Definition

  • Examples


Binomial distribution

The binomial distribution applies to events that have two possible outcomes. The probability of r successes in n attempts, when the probability of success in any individual attempt is p, is given by:


Example

What is the probability that 10 of the 12 babies born one busy evening in your hospital will be girls?


Solution
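The slide's worked solution is not preserved. A sketch of the calculation, assuming the standard binomial formula (the number of ways to choose r of n, times p^r, times (1-p)^(n-r)) and assuming that a girl is as likely as a boy (p = 0.5):

```python
from math import comb

def binomial_probability(r, n, p):
    """Probability of exactly r successes in n independent attempts."""
    return comb(n, r) * p ** r * (1 - p) ** (n - r)

# Exactly 10 girls among 12 births, assuming p = 0.5 for each birth.
p_10_of_12 = binomial_probability(10, 12, 0.5)
print(f"{p_10_of_12:.4f}")
```

There are 66 ways to choose which 10 of the 12 babies are girls, and 4096 equally likely birth sequences, so the probability is 66/4096, or about 1.6%.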


Distributions

  • Definition

  • Examples

    • Binomial


“God does arithmetic”

Karl Friedrich Gauss (1777-1855)


The Gaussian Distribution

What is the Gaussian distribution?


63, 81, 36, 12, 28, 7, 79, 52, 96, 17, 22, 4, 61, 85, etc.


  63  +  22  =   85
  81  +  73  =  154
  36  +  54  =   90
  12  +  33  =   45
  28  +  99  =  127
   7  +   5  =   12
  79  +  61  =  140
  52  +  28  =   80
  96  +  58  =  154
  17  +  24  =   41
  22  +  16  =   38
   4  +  77  =   81
  61  +  43  =  104
  85  +   8  =   93


. . . etc.


[Figure: probability plotted against x]


The Gaussian Probability Function

The probability of x in a Gaussian distribution with mean µ and standard deviation σ is given by:
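The equation itself is missing from the transcript. For reference, the standard form of the Gaussian probability density, written with the same symbols (mean µ, standard deviation σ), is:

```latex
P(x) = \frac{1}{\sigma\sqrt{2\pi}} \, \exp\!\left[-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right]
```

The factor in front normalizes the total area under the curve to 1; the exponent depends only on how many standard deviations x lies from the mean.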


The Gaussian Distribution

  • What is the Gaussian distribution?

  • What types of data fit a Gaussian distribution?


“Like the ski resort full of girls hunting for husbands and husbands hunting for girls, the situation is not as symmetrical as it might seem.”

Alan Lindsay Mackay (1926- )


Are these Gaussian?

  • Human height

  • Outside temperature

  • Raindrop size

  • Blood glucose concentration

  • Serum CK activity

  • QC results

  • Proficiency results


The Gaussian Distribution

  • What is the Gaussian distribution?

  • What types of data fit a Gaussian distribution?

  • What is the advantage of using a Gaussian distribution?


Gaussian probability distribution

[Figure: Gaussian curve of probability against x, with the x axis marked at µ-3σ, µ-2σ, µ-σ, µ, µ+σ, µ+2σ, and µ+3σ; the area within µ±σ is labeled .67 and the area within µ±2σ is labeled .95]


What are the odds of an observation . . .

  • more than 1σ from the mean (+/-)

  • more than 2σ greater than the mean

  • more than 3σ from the mean
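These three questions can be answered from the Gaussian cumulative distribution. A minimal sketch using the error function from the standard library; the function name is hypothetical, and the first and third questions are read as "in either direction" while the second is read as one-sided.

```python
import math

def fraction_within(k):
    """Fraction of a Gaussian population lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

beyond_1_either_side = 1 - fraction_within(1)
above_2_one_side = (1 - fraction_within(2)) / 2
beyond_3_either_side = 1 - fraction_within(3)

print(f"more than 1 SD from the mean : {beyond_1_either_side:.4f}")
print(f"more than 2 SD above the mean: {above_2_one_side:.4f}")
print(f"more than 3 SD from the mean : {beyond_3_either_side:.4f}")
```

The results are roughly 1 in 3, 1 in 44, and 1 in 370, respectively.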


Some useful Gaussian probabilities

Range          Probability   Odds (of falling outside)
+/- 1.00 σ     68.3%         1 in 3
+/- 1.64 σ     90.0%         1 in 10
+/- 1.96 σ     95.0%         1 in 20
+/- 2.58 σ     99.0%         1 in 100


Example

[Figure: two items labeled “This” and “That”]


[On the Gaussian curve] “Experimentalists think that it is a mathematical theorem while the mathematicians believe it to be an experimental fact.”

Gabriel Lippman (1845-1921)


Distributions

  • Definition

  • Examples

    • Binomial

    • Gaussian



"Life is good for only two things, discovering mathematics and teaching mathematics"

Siméon Poisson (1781-1840)


The Poisson Distribution

The Poisson distribution predicts the frequency of r events occurring randomly in time, when the expected frequency is λ


Examples of events described by a Poisson distribution?

  • Lightning

  • Accidents

  • Laboratory?



Using the Poisson distribution

How many counts must be collected in an RIA in order to ensure an analytical CV of 5% or less?


Answer
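The slide's worked answer is not preserved. A sketch of the reasoning, assuming that radioactive counts follow Poisson statistics, so that the standard deviation of N counts is the square root of N and the CV is therefore 1 divided by the square root of N:

```python
import math

def counts_for_cv(cv_percent):
    """Smallest number of counts N for which 100 / sqrt(N) is at most cv_percent."""
    return math.ceil((100 / cv_percent) ** 2)

print(counts_for_cv(5))   # counts needed for a counting CV of 5% or less
```

A 5% CV requires 400 counts. This covers counting error only; other sources of analytical imprecision add to it.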


Distributions

  • Definition

  • Examples

    • Binomial

    • Gaussian

    • Poisson


The Student’s t Distribution

When a small sample is selected from a large population, we sometimes have to make certain assumptions in order to apply statistical methods


Questions about our sample

  • Is the mean of our sample, x-bar, the same as the mean of the population, µ?

  • Is the standard deviation of our sample, s, the same as the standard deviation for the population, σ?

  • Unless we can answer both of these questions affirmatively, we don’t know whether our sample has the same distribution as the population from which it was drawn.


Recall that the Gaussian distribution is defined by the probability function:

Note that the exponential factor contains both µ and σ, both population parameters. The factor is often simplified by making the substitution:
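The substitution is missing from the transcript. Given that the resulting variable z is described as having a mean of zero and a standard deviation of 1, it is presumably the usual standardization:

```latex
z = \frac{x - \mu}{\sigma}
\qquad\Longrightarrow\qquad
\exp\!\left[-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right] = \exp\!\left[-\frac{z^{2}}{2}\right]
```

Expressed in z, every Gaussian distribution has the same shape, which is why a single table of z values serves for all of them.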


The variable z in the equation:

is distributed according to a unit Gaussian, since it has a mean of zero and a standard deviation of 1


Gaussian probability distribution

[Figure: unit Gaussian curve of probability against z, with the z axis marked from -3 to 3; the area within ±1 is labeled .67 and the area within ±2 is labeled .95]



But if we use the sample mean and standard deviation instead, we get:

and we’ve defined a new quantity, t, which is not distributed according to the unit Gaussian. It is distributed according to the Student’s t distribution.


Important features of the Student’s t distribution

  • Use of the t statistic assumes that the parent distribution is Gaussian

  • The degree to which the t distribution approximates a Gaussian distribution depends on N (the degrees of freedom)

  • As N gets larger (above 30 or so), the differences between t and z become negligible


Application of Student’s t distribution to a sample mean

The Student’s t statistic can also be used to analyze differences between the sample mean and the population mean:


Comparison of Student’s t and Gaussian distributions

Note that, for a sufficiently large N (>30), t can be replaced with z, and a Gaussian distribution can be assumed


Exercise

The mean age of the 20 participants in one workshop is 27 years, with a standard deviation of 4 years. Next door, another workshop has 16 participants with a mean age of 29 years and standard deviation of 6 years.

Is the second workshop attracting older technologists?


Preliminary analysis

  • Is the population Gaussian?

  • Can we use a Gaussian distribution for our sample?

  • What statistic should we calculate?


Solution

First, calculate the t statistic for the two means:


Solution, cont.

Next, determine the degrees of freedom:


Statistical Tables


Conclusion

Since 1.16 is less than 1.64 (the t value corresponding to 90% confidence limit), the difference between the mean ages for the participants in the two workshops is not significant
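The formulas used in this solution are not preserved in the transcript, so the following is only a sketch under stated assumptions: the difference between the means is divided by an unpooled standard error, the square root of (s1²/N1 + s2²/N2), and the degrees of freedom are taken as N1 + N2 - 2. With these assumptions the statistic comes out near 1.15, close to but not identical with the 1.16 quoted above, so the presenter may have used a slightly different form.

```python
import math

def t_two_means(mean1, sd1, n1, mean2, sd2, n2):
    """t statistic for the difference between two sample means (unpooled standard error)."""
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean2 - mean1) / standard_error

# Workshop 1: 20 participants, mean age 27, SD 4.
# Workshop 2: 16 participants, mean age 29, SD 6.
t = t_two_means(27, 4, 20, 29, 6, 16)
degrees_of_freedom = 20 + 16 - 2

print(f"t = {t:.2f}, degrees of freedom = {degrees_of_freedom}")
```

Either value is well below the tabulated limit, so the conclusion is unchanged.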


The Paired t Test

Suppose we are comparing two sets of data in which each value in one set has a corresponding value in the other. Instead of calculating the difference between the means of the two sets, we can calculate the mean difference between data pairs.


Instead of:

we use:

to calculate t:
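The equations on this slide are missing. A sketch of the paired calculation, assuming the conventional form in which t is the mean of the pairwise differences divided by its standard error (the sample SD of the differences over the square root of the number of pairs). The data and names are hypothetical.

```python
import math
from statistics import mean, stdev

def paired_t(first, second):
    """Paired t statistic: mean pairwise difference over its standard error."""
    differences = [b - a for a, b in zip(first, second)]
    standard_error = stdev(differences) / math.sqrt(len(differences))
    return mean(differences) / standard_error

# Hypothetical results for the same four specimens on two methods.
method_a = [10, 12, 14, 16]
method_b = [11, 14, 15, 18]

print(f"paired t = {paired_t(method_a, method_b):.2f}")
```

Because each specimen serves as its own control, the large specimen-to-specimen variation (10 to 16 here) drops out of the differences, leaving only the small method-to-method differences in the denominator.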


Advantage of the Paired t

If the type of data permit paired analysis, the paired t test is much more sensitive than the unpaired t.

Why?


Applications of the Paired t

  • Method correlation

  • Comparison of therapies


Distributions

  • Definition

  • Examples

    • Binomial

    • Gaussian

    • Poisson

    • Student’s t


The χ² (Chi-square) Distribution

There is a general formula that relates actual measurements to their predicted values


The χ² (Chi-square) Distribution

A special (and very useful) application of the χ² distribution is to frequency data


Exercise

In your hospital, you have had 83 cases of iatrogenic strep infection in your last 725 patients. St. Elsewhere, across town, reports 35 cases of strep in their last 416 patients.

Do you need to review your infection control policies?


Analysis

If your infection control policy is roughly as effective as St. Elsewhere’s, we would expect that the rates of strep infection for the two hospitals would be similar. The expected frequency, then, would be the average


Calculating χ²

First, calculate the expected frequencies at your hospital (f1) and St. Elsewhere (f2)


Calculating χ²

Next, we sum the squared differences between actual and expected frequencies


Degrees of freedom

In general, when comparing k sample proportions, the degrees of freedom for χ² analysis are k - 1. Hence, for our problem, there is 1 degree of freedom.


Conclusion

A table of χ² values lists 3.841 as the χ² corresponding to a probability of 0.05.

So the variation (χ²) between strep infection rates at the two hospitals is within statistically-predicted limits, and therefore is not significant.
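The numerical steps of this exercise are not preserved in the transcript. The sketch below makes two assumptions: the "average" rate is the pooled rate (all cases divided by all patients), and χ² is the sum over the two hospitals of (observed - expected)² / expected for the infected counts, consistent with the k - 1 degrees of freedom stated above.

```python
cases = [83, 35]          # your hospital, St. Elsewhere
patients = [725, 416]

# Expected rate if both hospitals share one underlying infection rate.
pooled_rate = sum(cases) / sum(patients)

# Expected number of cases at each hospital (f1 and f2).
expected = [pooled_rate * n for n in patients]

chi_square = sum((obs - exp) ** 2 / exp for obs, exp in zip(cases, expected))

print(f"pooled rate = {pooled_rate:.4f}")
print(f"expected cases = {expected[0]:.1f}, {expected[1]:.1f}")
print(f"chi-square = {chi_square:.2f}")
```

Under these assumptions χ² is about 2.35, below the tabulated 3.841, which agrees with the conclusion above.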


Distributions

  • Definition

  • Examples

    • Binomial

    • Gaussian

    • Poisson

    • Student’s t

    • 2


The F distribution

  • The F distribution predicts the expected differences between the variances of two samples

  • This distribution has also been called Snedecor’s F distribution, Fisher distribution, and variance ratio distribution


The F distribution

The F statistic is simply the ratio of two variances

(by convention, the larger V is the numerator)


Applications of the F distribution

There are several ways the F distribution can be used. Applications of the F statistic are part of a more general type of statistical analysis called analysis of variance (ANOVA). We’ll see more about ANOVA later.


Example

You’re asked to do a “quick and dirty” correlation between three whole blood glucose analyzers. You prick your finger and measure your blood glucose four times on each of the analyzers.

Are the results equivalent?


Data


Analysis

The mean glucose concentrations for the three analyzers are 70, 85, and 76.

If the three analyzers are equivalent, then we can assume that all of the results are drawn from an overall population with mean µ and variance σ².


Analysis, cont.

Approximate µ by calculating the mean of the means:


Analysis, cont.

Calculate the variance of the means:


Analysis, cont.

But what we really want is the variance of the population. Recall that:


Analysis, cont.

Since we just calculated

we can solve for σ²


Analysis, cont.

So we now have an estimate of the population variance, which we’d like to compare to the real variance to see whether they differ. But what is the real variance?

We don’t know, but we can calculate the variance based on our individual measurements.


Analysis, cont.

If all the data were drawn from a larger population, we can assume that the variances are the same, and we can simply average the variances for the three data sets.


Analysis, cont.

Now calculate the F statistic:


Conclusion

A table of F values indicates that 4.26 is the limit for the F statistic at a 95% confidence level (when the appropriate degrees of freedom are selected). Our value of 10.6 exceeds that, so we conclude that there is significant variation between the analyzers.
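The individual glucose readings and the intermediate results are not preserved in the transcript, so the F value of 10.6 cannot be reproduced here. The sketch below follows the steps described above: the variance of the analyzer means, multiplied by the number of readings per analyzer, gives one estimate of the population variance, and the average of the within-analyzer variances gives the other. The three means are from the slide; the individual readings are hypothetical values chosen only to have those means.

```python
from statistics import mean, variance

def one_way_f(groups):
    """F = (n x variance of the group means) / (mean of the within-group variances).

    Assumes every group contains the same number of measurements n.
    """
    n = len(groups[0])
    between = n * variance([mean(g) for g in groups])
    within = mean([variance(g) for g in groups])
    return between / within

# Between-analyzer estimate of the population variance, from the quoted means
# and four readings per analyzer.
between_from_slide = 4 * variance([70, 85, 76])

# Hypothetical readings with means 70, 85, and 76 (not the presenter's data).
readings = [[66, 70, 70, 74], [80, 84, 86, 90], [72, 75, 77, 80]]

print("between-analyzer estimate:", between_from_slide)
print(f"F for the hypothetical readings: {one_way_f(readings):.1f}")
```

With three analyzers and four readings each, the degrees of freedom are 2 and 9, which is where the tabulated limit of 4.26 comes from.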


Distributions

  • Definition

  • Examples

    • Binomial

    • Gaussian

    • Poisson

    • Student’s t

    • 2

    • F


Unknown or irregular distribution

  • Transform


Log transform

[Figure: two probability distributions, one plotted against x and one against log x]


Unknown or irregular distribution

  • Transform

  • Non-parametric methods


Non-parametric methods

  • Non-parametric methods make no assumptions about the distribution of the data

  • There are non-parametric methods for characterizing data, as well as for comparing data sets

  • These methods are also called distribution-free, robust, or sometimes non-metric tests


Application to Reference Ranges

The concentrations of most clinical analytes are not usually distributed in a Gaussian manner. Why?

How do we determine the reference range (limits of expected values) for these analytes?


Application to Reference Ranges

  • Reference ranges for normal, healthy populations are customarily defined as the “central 95%”.

  • An entirely non-parametric way of expressing this is to eliminate the upper and lower 2.5% of data, and use the remaining upper and lower values to define the range.

  • NCCLS recommends 120 values, dropping the two highest and two lowest.
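The trimming procedure in the bullets above can be sketched as follows. The function name and the example data are hypothetical; the default of two values dropped from each end follows the recommendation quoted for 120 specimens.

```python
def nonparametric_reference_range(values, drop_each_end=2):
    """Sort the results, discard the lowest and highest few, and report the remaining extremes."""
    ordered = sorted(values)
    if len(ordered) <= 2 * drop_each_end:
        raise ValueError("not enough values to trim")
    return ordered[drop_each_end], ordered[-drop_each_end - 1]

# Hypothetical study: 120 results that happen to be the integers 1 to 120.
results = list(range(1, 121))
low, high = nonparametric_reference_range(results)
print(low, high)
```

No mean, standard deviation, or distribution shape enters the calculation, which is what makes the method non-parametric.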


Application to Reference Ranges

What happens when we want to compare one reference range with another? This is precisely what CLIA ‘88 requires us to do.

How do we do this?



Solution #1: Simple comparison

Suppose we just do a small internal reference range study, and compare our results to the manufacturer’s range.

How do we compare them?

Is this a valid approach?


NCCLS recommendations

  • Inspection Method: Verify reference populations are equivalent

  • Limited Validation: Collect 20 reference specimens

    • No more than 2 exceed range

    • Repeat if failed

  • Extended Validation: Collect 60 reference specimens; compare ranges.


Solution #2: Mann-Whitney*

Rank normal values (x1,x2,x3...xn) and the reference population (y1,y2,y3...yn):

x1, y1, x2, x3,y2, y3 ... xn, yn

Count the number of y values that follow each x, and call the sum Ux. Calculate Uy also.

*Also called the U test, rank sum test, or Wilcoxon’s test.


Mann-Whitney, cont.

It should be obvious that: Ux + Uy = NxNy

If the two distributions are the same, then:

Ux = Uy = (1/2)NxNy

Large differences between Ux and Uy indicate that the distributions are not equivalent
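The counting procedure can be sketched directly. The data are hypothetical and arranged to reproduce the ordering shown earlier (x1, y1, x2, x3, y2, y3); ties are ignored in this sketch, which is a simplifying assumption.

```python
def u_statistics(x_values, y_values):
    """Ux: for each x, count the y values that follow it in rank order; Uy likewise."""
    u_x = sum(1 for x in x_values for y in y_values if y > x)
    u_y = sum(1 for y in y_values for x in x_values if x > y)
    return u_x, u_y

# Hypothetical data giving the rank order x1, y1, x2, x3, y2, y3.
x = [1, 3, 4]
y = [2, 5, 6]

u_x, u_y = u_statistics(x, y)
print(u_x, u_y, u_x + u_y, len(x) * len(y))
```

Every (x, y) pair is counted exactly once, in Ux if y follows x and in Uy otherwise, which is why the two always add up to NxNy when there are no ties.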



“‘Obvious’ is the most dangerous word in mathematics.”

Eric Temple Bell (1883-1960)


Solution #3: Run test

In the run test, order the values in the two distributions as before:

x1, y1, x2, x3, y2, y3 ... xn, yn

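Add up the number of runs (consecutive values from the same distribution). If the two data sets are randomly selected from one population, the values will be well mixed and there will be many runs; an unusually small number of runs suggests that the two distributions differ.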
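Counting runs is easy to automate. A minimal sketch with hypothetical data; the first pair of data sets reproduces the ordering x1, y1, x2, x3, y2, y3 shown above, and the second pair does not overlap at all.

```python
from itertools import groupby

def count_runs(x_values, y_values):
    """Merge both data sets in rank order and count stretches of consecutive same-source values."""
    labeled = sorted([(v, "x") for v in x_values] + [(v, "y") for v in y_values])
    labels = [label for _, label in labeled]
    return sum(1 for _ in groupby(labels))

print(count_runs([1, 3, 4], [2, 5, 6]))   # order x y x x y y
print(count_runs([1, 2, 3], [4, 5, 6]))   # order x x x y y y
```

The first ordering contains four runs; two data sets that do not overlap at all give only two, the minimum possible.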


Solution #4: The Monte Carlo method

Sometimes, when we don’t know anything about a distribution, the best thing to do is independently test its characteristics.


The Monte Carlo method

[Figure: plot of y against x]


The Monte Carlo method

[Figure: a reference population from which four samples of size N are drawn, each characterized by its mean and SD]


The Monte Carlo method

With the Monte Carlo method, we have simulated the test we wish to apply--that is, we have randomly selected samples from the parent distribution, and determined whether our in-house data are in agreement with the randomly-selected samples.
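One way to carry out such a simulation is sketched below. This is an illustration under stated assumptions, not the presenter's procedure: the comparison is made on the sample mean only, the function name and data are hypothetical, and a fixed seed is used so that the result is repeatable.

```python
import random
from statistics import mean

def monte_carlo_fraction(reference, in_house, trials=2000, seed=0):
    """Fraction of random subsamples whose mean lies at least as far from the
    reference mean as the in-house mean does."""
    rng = random.Random(seed)
    reference_mean = mean(reference)
    observed = abs(mean(in_house) - reference_mean)
    n = len(in_house)
    hits = 0
    for _ in range(trials):
        subsample = rng.sample(reference, n)   # random draw, same size as the in-house set
        if abs(mean(subsample) - reference_mean) >= observed:
            hits += 1
    return hits / trials

reference = list(range(50, 150))             # hypothetical reference population
typical = [97, 98, 99, 100, 102]             # in-house values near the reference mean
extreme = [145, 146, 147, 148, 149]          # in-house values at the upper edge

print(monte_carlo_fraction(reference, typical))
print(monte_carlo_fraction(reference, extreme))
```

A large fraction means that random samples routinely differ from the reference population as much as the in-house data do; a fraction near zero means the in-house data are unlikely to have come from that population. The same loop can compare SDs or range limits instead of means.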


Analysis of paired data

  • For certain types of laboratory studies, the data we gather are paired

  • We typically want to know how closely the paired data agree

  • We need quantitative measures of the extent to which the data agree or disagree

  • Examples?


Examples of paired data

  • Method correlation data

  • Pharmacodynamic effects

  • Risk analysis

  • Pathophysiology


Correlation

[Figure: scatter plot with both axes running from 0 to 50]


Linear regression (least squares)

Linear regression analysis generates an equation for a straight line

y = mx + b

where m is the slope of the line and b is the value of y when x = 0 (the y-intercept).

The calculated equation minimizes the sum of the squared differences between the actual y values and the values predicted by the linear regression line.
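The slope and intercept can be computed in a few lines. A minimal sketch of ordinary least squares, using hypothetical data that lie exactly on a known line so the result is easy to check:

```python
def least_squares(xs, ys):
    """Return slope m and intercept b of the ordinary least-squares line y = mx + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    b = mean_y - m * mean_x      # the fitted line passes through (mean_x, mean_y)
    return m, b

# Hypothetical data lying exactly on y = 2x + 1.
xs = [0, 10, 20, 30, 40, 50]
ys = [2 * x + 1 for x in xs]

m, b = least_squares(xs, ys)
print(m, b)
```

Only the vertical (y) distances from the line are minimized, so the x values are treated as if they were free of error.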


Correlation

[Figure: scatter plot with both axes running from 0 to 50 and the fitted line y = 1.031x - 0.024]


Covariance

Do x and y values vary in concert, or randomly?


  • What if y increases when x increases?

  • What if y decreases when x increases?

  • What if y and x vary independently?


Covariance

It is clear that the greater the covariance, the stronger the relationship between x and y.

But . . . what about units?

e.g., if you measure glucose in mg/dL, and I measure it in mmol/L, who’s likely to have the highest covariance?



The Correlation Coefficient

  • The correlation coefficient is a unitless quantity that roughly indicates the degree to which x and y vary in the same direction.

  • ρ is useful for detecting relationships between parameters, but it is not a very sensitive measure of the spread.
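The effect of units on covariance, and the absence of any such effect on the correlation coefficient, can be shown with the glucose example. The paired results below are hypothetical; the conversion assumes that 1 mmol/L of glucose is approximately 18 mg/dL, and the sample (N - 1) form of the covariance is used.

```python
import math

def covariance(xs, ys):
    """Sample covariance: mean product of the deviations from the two means (N - 1 form)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    """Covariance divided by the product of the two standard deviations (unitless)."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))

# Hypothetical paired glucose results from two methods, in mg/dL.
method_a_mg = [70, 85, 100, 120, 150]
method_b_mg = [72, 83, 104, 118, 153]

# The same results expressed in mmol/L.
method_a_mmol = [v / 18 for v in method_a_mg]
method_b_mmol = [v / 18 for v in method_b_mg]

print(covariance(method_a_mg, method_b_mg), covariance(method_a_mmol, method_b_mmol))
print(correlation(method_a_mg, method_b_mg), correlation(method_a_mmol, method_b_mmol))
```

The covariance in mg/dL is 18 x 18 = 324 times larger than in mmol/L for identical data, while the correlation coefficient is the same in both units.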


Correlation

[Figure: scatter plot with both axes running from 0 to 50; fitted line y = 1.031x - 0.024, ρ = 0.9986]


Correlation

[Figure: scatter plot with both axes running from 0 to 50; fitted line y = 1.031x - 0.024, ρ = 0.9894]


Standard Error of the Estimate

The linear regression equation gives us a way to calculate an “estimated” y for any given x value, given the symbol ŷ (y-hat):


Standard Error of the Estimate

Now what we are interested in is the average difference between the measured y and its estimate, ŷ :
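The equation from this slide is missing. The conventional definition of the standard error of the estimate, written here with N for the number of data pairs, is given below; the divisor N - 2 (two degrees of freedom are used by the slope and intercept) is an assumption about the form the presenter used.

```latex
s_{y/x} = \sqrt{\frac{\sum_{i=1}^{N} \left(y_i - \hat{y}_i\right)^{2}}{N - 2}}
```

Like a standard deviation, it is a root-mean-square difference rather than a simple average of the differences, and it has the same units as y.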


Correlation

[Figure: scatter plot with both axes running from 0 to 50; fitted line y = 1.031x - 0.024, ρ = 0.9986, sy/x = 1.83]


Correlation

[Figure: scatter plot with both axes running from 0 to 50; fitted line y = 1.031x - 0.024, ρ = 0.9894, sy/x = 5.32]


Standard Error of the Estimate

If we assume that the errors in the y measurements are Gaussian (is that a safe assumption?), then the standard error of the estimate gives us the boundaries within which 67% of the y values will fall.

±2 sy/x defines the 95% boundaries.


Limitations of linear regression

  • Assumes no error in x measurement

  • Assumes that variance in y is constant throughout concentration range


Alternative approaches

  • Weighted linear regression analysis can compensate for non-constant variance among y measurements

  • Deming regression analysis takes into account variance in the x measurements

  • Weighted Deming regression analysis allows for both


Evaluating method performance

  • Precision


Method Precision

  • Within-run: 10 or 20 replicates

    • What types of errors does within-run precision reflect?

  • Day-to-day: NCCLS recommends evaluation over 20 days

    • What types of errors does day-to-day precision reflect?


Evaluating method performance

  • Precision

  • Sensitivity


Method Sensitivity

  • The analytical sensitivity of a method refers to the lowest concentration of analyte that can be reliably detected.

  • The most common definition of sensitivity is the analyte concentration that will result in a signal two or three standard deviations above background.


Signal/Noise threshold

[Figure: signal plotted against time]


Other measures of sensitivity

  • Limit of Detection (LOD) is sometimes defined as the concentration producing an S/N > 3.

    • In drug testing, LOD is customarily defined as the lowest concentration that meets all identification criteria.

  • Limit of Quantitation (LOQ) is sometimes defined as the concentration producing an S/N > 5.

    • In drug testing, LOQ is customarily defined as the lowest concentration that can be measured within ±20%.


Question

At an S/N ratio of 5, what is the minimum CV of the measurement?

If the S/N is 5, 20% of the measured signal is noise, which is random. Therefore, the CV must be at least 20%.
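The same reasoning applies at any signal-to-noise ratio: the noise fraction, 1/(S/N), sets a floor on the CV. A one-function sketch (the function name is hypothetical):

```python
def minimum_cv_percent(signal_to_noise):
    """Lower bound on the CV (in percent) implied by a signal-to-noise ratio."""
    return 100.0 / signal_to_noise

for sn in (3, 5, 10):
    print(f"S/N = {sn}: CV is at least {minimum_cv_percent(sn):.1f}%")
```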


Evaluating method performance

  • Precision

  • Sensitivity

  • Linearity


Method Linearity

  • A linear relationship between concentration and signal is not absolutely necessary, but it is highly desirable. Why?

  • CLIA '88 requires that the linearity of analytical methods be verified periodically.


Ways to evaluate linearity

  • Visual/linear regression


[Figure: signal plotted against concentration]


Outliers

We can eliminate any point that differs from its nearest neighboring value by more than 0.765 (p=0.05) times the spread between the highest and lowest values (Dixon test).

Example: 4, 5, 6, 13

(13 - 4) x 0.765 = 6.89

The suspect value, 13, differs from its nearest neighbor by 13 - 6 = 7, which exceeds 6.89, so it can be eliminated.
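The test in the example can be sketched in a few lines. The factor 0.765 is the one quoted on the slide; tabulated Dixon critical values depend on the number of observations, and the assumption here is that this factor applies to a set of four. The function name is hypothetical.

```python
def dixon_outlier(values, critical_q=0.765):
    """Return the suspect value if it fails the Dixon test, otherwise None.

    critical_q = 0.765 is the factor quoted on the slide; it is assumed
    to apply to four observations and should be changed for other n.
    """
    ordered = sorted(values)
    spread = ordered[-1] - ordered[0]
    if spread == 0:
        return None  # all values identical, nothing to reject
    gap_low = ordered[1] - ordered[0]
    gap_high = ordered[-1] - ordered[-2]
    # The suspect is whichever extreme lies farther from its neighbor.
    if gap_low > gap_high:
        suspect, gap = ordered[0], gap_low
    else:
        suspect, gap = ordered[-1], gap_high
    return suspect if gap > critical_q * spread else None

print(dixon_outlier([4, 5, 6, 13]))  # gap of 7 against a limit of 9 x 0.765
print(dixon_outlier([4, 5, 6, 7]))
```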


Limitation of linear regression method

If the analytical method has a high variance (CV), it is likely that small deviations from linearity will not be detected due to the high standard error of the estimate.


[Figure: signal plotted against concentration]


Ways to evaluate linearity

  • Visual/linear regression

  • Quadratic regression


Quadratic regression

Recall that, for linear data, the relationship between x and y can be expressed as

y = f(x) = a + bx


Quadratic regression

A curve is described by the quadratic equation:

y = f(x) = a + bx + cx²

which is identical to the linear equation except for the addition of the cx² term.


Quadratic regression

It should be clear that the smaller the x² coefficient, c, the closer the data are to linear (since the equation reduces to the linear form when c approaches 0).

What is the drawback to this approach?
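For readers who want to try the approach, the coefficients a, b and c can be obtained by least squares. The sketch below solves the three normal equations with Cramer's rule so that it needs no external libraries; the function names and calibration data are hypothetical.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def quadratic_fit(x, y):
    """Least-squares fit of y = a + b*x + c*x**2; returns (a, b, c)."""
    s = [sum(xi ** k for xi in x) for k in range(5)]  # sums of x^0 .. x^4
    t = [sum(yi * xi ** k for xi, yi in zip(x, y)) for k in range(3)]
    matrix = [[s[0], s[1], s[2]],
              [s[1], s[2], s[3]],
              [s[2], s[3], s[4]]]
    d = det3(matrix)
    coeffs = []
    for col in range(3):
        # Cramer's rule: replace one column with the right-hand side.
        m = [row[:] for row in matrix]
        for r in range(3):
            m[r][col] = t[r]
        coeffs.append(det3(m) / d)
    return tuple(coeffs)

# Hypothetical calibration data that flatten at high concentration.
conc = [0, 10, 20, 30, 40, 50]
signal = [0.02, 1.01, 1.96, 2.85, 3.62, 4.30]
a, b, c = quadratic_fit(conc, signal)
print(f"a = {a:.4f}, b = {b:.4f}, c = {c:.6f}")
```

Note that the numerical size of c depends on the units of x and y, so it cannot be judged on its own.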


Ways to evaluate linearity

  • Visual/linear regression

  • Quadratic regression

  • Lack-of-fit analysis


Lack-of-fit analysis

  • There are two components of the variation from the regression line

    • Intrinsic variability of the method

    • Variability due to deviations from linearity

  • The problem is to distinguish between these two sources of variability

  • What statistical test do you think is appropriate?


[Figure: signal plotted against concentration]


Lack-of-fit analysis

The ANOVA technique requires that method variance is constant at all concentrations. Cochran’s test is used to test whether this is the case.


Lack-of-fit method calculations

  • Total sum of the squares (TSS): the sum of squared deviations of all of the y values from their overall mean

  • Linear regression sum of the squares (LSS): the part of the TSS that is explained by the regression line

  • Residual sum of the squares (RSS): difference between TSS and LSS

  • Lack of fit sum of the squares (LOF): the RSS minus the pure error (the sum of squared deviations of replicates from the mean at each concentration)


Lack-of-fit analysis

  • The LOF is compared to the pure error to give the “G” statistic (which is actually F)

  • If the LOF is small compared to the pure error, G is small and the method is linear

  • If the LOF is large compared to the pure error, G will be large, indicating significant deviation from linearity


Significance limits for G

  • 90% confidence = 2.49

  • 95% confidence = 3.29

  • 99% confidence = 5.42
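The lack-of-fit calculation can be sketched as below, using the textbook form of the test in which each sum of squares is divided by its degrees of freedom before the ratio is taken. The limits quoted above are consistent with an F distribution with 3 and 15 degrees of freedom, which is what five concentrations measured in quadruplicate would give; that design, the data, and the function name are assumptions for illustration.

```python
def lack_of_fit_f(levels):
    """Lack-of-fit F ratio for replicate data.

    levels maps each concentration to its list of replicate signals.
    Returns (F, df_lack_of_fit, df_pure_error).
    """
    x = [conc for conc, reps in levels.items() for _ in reps]
    y = [val for reps in levels.values() for val in reps]
    n, m = len(y), len(levels)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    # Residual sum of squares about the fitted line.
    ss_resid = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    # Pure error: scatter of replicates about their own level mean.
    ss_pure = sum((val - sum(reps) / len(reps)) ** 2
                  for reps in levels.values() for val in reps)
    ss_lof = ss_resid - ss_pure
    df_lof, df_pure = m - 2, n - m
    return (ss_lof / df_lof) / (ss_pure / df_pure), df_lof, df_pure

# Hypothetical data: five concentrations in quadruplicate, with the
# response flattening at the highest concentration.
data = {
    1: [9, 11, 10, 10],
    2: [19, 21, 20, 20],
    3: [29, 31, 30, 30],
    4: [39, 41, 40, 40],
    5: [39, 41, 40, 40],
}
f_ratio, df1, df2 = lack_of_fit_f(data)
limit_95 = 3.29  # 95% limit quoted on the slide
print(f"G = {f_ratio:.1f} with {df1} and {df2} degrees of freedom")
print("significant deviation from linearity" if f_ratio > limit_95 else "linear")
```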


“If your experiment needs statistics, you ought to have done a better experiment.”

Ernest Rutherford (1871-1937)


Evaluating Clinical Performance of laboratory tests

  • The clinical performance of a laboratory test defines how well it predicts disease

  • The sensitivity of a test indicates the likelihood that it will be positive when disease is present


Clinical Sensitivity

If TP is the number of “true positives”, and FN is the number of “false negatives”, the sensitivity is defined as:

Sensitivity = TP / (TP + FN) × 100%


Example

Of 25 admitted cocaine abusers, 23 tested positive for urinary benzoylecgonine and 2 tested negative. What is the sensitivity of the urine screen?
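Worked through in code, with sensitivity taken as TP / (TP + FN), the 23 positive results are the true positives and the 2 negative results are the false negatives. The function name is hypothetical.

```python
def sensitivity_percent(true_positives, false_negatives):
    """Clinical sensitivity: percentage of diseased subjects who test positive."""
    return 100.0 * true_positives / (true_positives + false_negatives)

# 23 of 25 admitted cocaine abusers tested positive, 2 tested negative.
print(f"{sensitivity_percent(23, 2):.0f}%")  # prints "92%"
```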


Evaluating Clinical Performance of laboratory tests

  • The clinical performance of a laboratory test defines how well it predicts disease

  • The sensitivity of a test indicates the likelihood that it will be positive when disease is present

  • The specificity of a test indicates the likelihood that it will be negative when disease is absent


Clinical Specificity

If TN is the number of “true negative” results, and FP is the number of “false positive” results, the specificity is defined as:

Specificity = TN / (TN + FP) × 100%


Example

What would you guess is the specificity of any particular clinical laboratory test? (Choose any one you want)


Answer

Since reference ranges are customarily set to include the central 95% of values in healthy subjects, we expect 5% of values from healthy people to be “abnormal”--this is the false positive rate.

Hence, the specificity of most clinical tests is no better than 95%.


Sensitivity vs. Specificity

  • Sensitivity and specificity are inversely related.


[Figure: marker concentration in subjects without (-) and with (+) disease]


Sensitivity vs. Specificity

  • Sensitivity and specificity are inversely related.

  • How do we determine the best compromise between sensitivity and specificity?


Receiver Operating Characteristic

[Figure: true positive rate (sensitivity) plotted against false positive rate (1-specificity)]
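The points of such a curve can be generated by sweeping the decision cutoff across the observed marker values and recording the two rates at each cutoff. This sketch assumes that a higher marker concentration indicates disease; the data and the function name are hypothetical.

```python
def roc_points(healthy, diseased):
    """Return (cutoff, false positive rate, true positive rate) per cutoff.

    A result is called positive when the marker is >= cutoff, which
    assumes that higher concentrations indicate disease.
    """
    points = []
    for cutoff in sorted(set(healthy) | set(diseased)):
        tpr = sum(v >= cutoff for v in diseased) / len(diseased)
        fpr = sum(v >= cutoff for v in healthy) / len(healthy)
        points.append((cutoff, fpr, tpr))
    return points

# Hypothetical marker concentrations (illustration only).
healthy = [2, 3, 3, 4, 5, 5, 6, 7]
diseased = [5, 6, 7, 8, 8, 9, 10, 12]

for cutoff, fpr, tpr in roc_points(healthy, diseased):
    print(f"cutoff {cutoff:>2}: sensitivity = {tpr:.2f}, 1 - specificity = {fpr:.2f}")
```

Raising the cutoff moves along the curve toward lower sensitivity and higher specificity, which is the inverse relationship described earlier.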


Evaluating Clinical Performance of laboratory tests

  • The sensitivity of a test indicates the likelihood that it will be positive when disease is present

  • The specificity of a test indicates the likelihood that it will be negative when disease is absent

  • The predictive value of a test indicates the probability that the test result correctly classifies a patient


Predictive Value

The predictive value of a clinical laboratory test takes into account the prevalence of a certain disease, to quantify the probability that a positive test is associated with the disease in a randomly-selected individual, or alternatively, that a negative test is associated with health.


Illustration

  • Suppose you have invented a new screening test for Addison disease.

  • The test correctly identified 98 of 100 patients with confirmed Addison disease (What is the sensitivity?)

  • The test was positive in only 2 of 1000 patients with no evidence of Addison disease (What is the specificity?)


Test performance

  • The sensitivity is 98.0%

  • The specificity is 99.8%

  • But Addison disease is a rare disorder--incidence = 1:10,000

  • What happens if we screen 1 million people?


Analysis

  • In 1 million people, there will be 100 cases of Addison disease.

  • Our test will identify 98 of these cases (TP)

  • Of the 999,900 non-Addison subjects, the test will be positive in 0.2%, or about 2,000 (FP).
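The bookkeeping above can be reproduced with a short function that splits a screened population into the four outcome classes. The function name is hypothetical, and rounding to whole people is a simplification.

```python
def screening_counts(diseased, healthy, sensitivity, specificity):
    """Split a screened population into TP, FN, FP and TN counts.

    sensitivity and specificity are fractions (0.98, not 98).
    Counts are rounded to whole people.
    """
    tp = round(sensitivity * diseased)
    fn = diseased - tp
    fp = round((1 - specificity) * healthy)
    tn = healthy - fp
    return {"TP": tp, "FN": fn, "FP": fp, "TN": tn}

# 1 million people screened, 1 in 10,000 with Addison disease.
counts = screening_counts(diseased=100, healthy=999_900,
                          sensitivity=0.98, specificity=0.998)
print(counts)
```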


Predictive value of the positive test

The predictive value is the % of all positives that are true positives:

PV+ = TP / (TP + FP) × 100% = 98 / (98 + 2,000) × 100% = 4.7%


What about the negative predictive value?

  • TN = 999,900 - 2,000 = 997,900

  • FN = 100 - 98 = 2

  • PV- = TN / (TN + FN) × 100% = 997,900 / 997,902 × 100% > 99.99%
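Both predictive values follow directly from the four counts. The sketch below takes FN = 2, since the test identified 98 of the 100 true cases; the function name is hypothetical.

```python
def predictive_values(tp, fp, tn, fn):
    """Return (PV+, PV-) in percent."""
    pv_pos = 100.0 * tp / (tp + fp)  # share of positives that are true
    pv_neg = 100.0 * tn / (tn + fn)  # share of negatives that are true
    return pv_pos, pv_neg

# Addison screen: 98 true positives, about 2,000 false positives,
# 997,900 true negatives, and 2 missed cases (100 - 98).
pv_pos, pv_neg = predictive_values(tp=98, fp=2000, tn=997_900, fn=2)
print(f"PV+ = {pv_pos:.1f}%, PV- = {pv_neg:.4f}%")
```

A negative result is almost always correct, while fewer than 1 in 20 positive results reflects actual disease.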


Summary of predictive value

Predictive value describes the usefulness of a clinical laboratory test in the real world.

Or does it?


Lessons about predictive value

  • Even when you have a very good test, it is generally not cost effective to screen for diseases which have low incidence in the general population. Exception?

  • The higher the clinical suspicion, the better the predictive value of the test. Why?
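The second lesson can be explored numerically: holding sensitivity and specificity fixed at the values of the Addison screen, the positive predictive value rises steeply as the proportion of tested subjects who have the disease increases. The prevalence values and the function name below are hypothetical.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """PV+ as a fraction, from prevalence, sensitivity and specificity."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)

# Sensitivity 98.0% and specificity 99.8%, as for the Addison screen.
for prevalence in (0.0001, 0.001, 0.01, 0.1, 0.5):
    ppv = positive_predictive_value(prevalence, 0.98, 0.998)
    print(f"prevalence {prevalence:>7.2%}: PV+ = {ppv:.1%}")
```

Testing only patients in whom the disease is clinically suspected is equivalent to raising the prevalence in the tested group.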


Efficiency

We can combine the PV+ and PV- to give a quantity called the efficiency:

Efficiency = (TP + TN) / (TP + FP + TN + FN) × 100%

The efficiency is the percentage of all patients that are classified correctly by the test result.


Efficiency of our Addison screen

Efficiency = (98 + 997,900) / 1,000,000 × 100% = 99.8%
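As a check, efficiency is the number of correctly classified subjects (TP + TN) divided by everyone tested. The sketch below applies that to the screening counts, taking FN = 2 because 98 of 100 cases were detected; the function name is hypothetical.

```python
def efficiency_percent(tp, tn, fp, fn):
    """Percentage of all subjects classified correctly by the test."""
    return 100.0 * (tp + tn) / (tp + tn + fp + fn)

addison = efficiency_percent(tp=98, tn=997_900, fp=2000, fn=2)
print(f"Efficiency = {addison:.1f}%")
```

The efficiency is high even though most positive results are false, because the true negatives dominate the total.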


“To call in the statistician after the experiment is done may be no more than asking him to perform a postmortem examination: he may be able to say what the experiment died of.”

Ronald Aylmer Fisher (1890 - 1962)


Application of Statistics to Quality Control

  • We expect quality control to fit a Gaussian distribution

  • We can use Gaussian statistics to predict the variability in quality control values

  • What sort of tolerance will we allow for variation in quality control values?

  • Generally, we will question variations that have a statistical probability of less than 5%
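Under the Gaussian assumption, the chance that a single control value falls outside a given number of standard deviations can be computed from the complementary error function. The function name is hypothetical.

```python
import math

def prob_outside(k):
    """Probability that a Gaussian value lies more than k SD from the mean."""
    return math.erfc(k / math.sqrt(2))

for k in (1, 2, 3):
    p = prob_outside(k)
    print(f"beyond +/-{k} SD: p = {p:.4f} (about 1 in {1 / p:.0f})")
```

A control value beyond 2 SD has a probability just under 5%, which is why the 2 SD limit is the usual point at which a result is questioned.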


“He uses statistics as a drunken man uses lamp posts -- for support rather than illumination.”

Andrew Lang (1844-1912)


Westgard’s rules

  • 1_2s: 1 in 20

  • 1_3s: 1 in 300

  • 2_2s: 1 in 400

  • R_4s: 1 in 800

  • 4_1s: 1 in 600

  • 10_x: 1 in 1000
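A minimal sketch of how these rules might be checked against a series of control results follows. It uses the commonly stated definitions of each rule and works on z-scores, (result - mean) / SD. Applying R_4s to consecutive results is a simplification, since that rule is usually applied across control levels within one run; the function name and data are hypothetical.

```python
def westgard_violations(z_scores):
    """Return the set of rules violated anywhere in a series of controls.

    z_scores are (result - mean) / SD, in the order the controls were run.
    """
    z = list(z_scores)
    violated = set()

    def run_of(length, test):
        # True if some window of `length` consecutive results all pass `test`.
        return any(all(test(v) for v in z[i:i + length])
                   for i in range(len(z) - length + 1))

    if any(abs(v) > 2 for v in z):
        violated.add("1_2s")   # one result beyond 2 SD (warning)
    if any(abs(v) > 3 for v in z):
        violated.add("1_3s")   # one result beyond 3 SD
    if run_of(2, lambda v: v > 2) or run_of(2, lambda v: v < -2):
        violated.add("2_2s")   # two in a row beyond 2 SD on the same side
    if any((p > 2 and q < -2) or (p < -2 and q > 2) for p, q in zip(z, z[1:])):
        violated.add("R_4s")   # consecutive results more than 4 SD apart
    if run_of(4, lambda v: v > 1) or run_of(4, lambda v: v < -1):
        violated.add("4_1s")   # four in a row beyond 1 SD on the same side
    if run_of(10, lambda v: v > 0) or run_of(10, lambda v: v < 0):
        violated.add("10_x")   # ten in a row on the same side of the mean
    return violated

# Hypothetical control series (illustration only).
print(sorted(westgard_violations([0.3, -0.8, 2.3, 2.4, 0.1])))
```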


Some examples

[Figures: four control charts, each with horizontal lines at the mean and at ±1, ±2, and ±3 SD]


“In science one tries to tell people, in such a way as to be understood by everyone, something that no one ever knew before. But in poetry, it's the exact opposite.”

Paul Adrien Maurice Dirac (1902- 1984)

