# Update on statistics

Phil Rowe, Liverpool School of Pharmacy. Topics: plan ahead; interpretation of significance/non-significance; sample size calculations in the real world; keep it simple - keep it clear; keep it simple - keep it powerful; avoid multiple analyses.


Phil Rowe

Liverpool School of Pharmacy

Update on statistics
• Plan ahead
• Interpretation of significance/non-significance
• Sample size calculations in the real world
• Keep it simple - keep it clear
• Keep it simple - keep it powerful
• Avoid multiple analyses
• Other techniques are available

Plan ahead, or else …

• Poor experimental design
• Results incapable of analysis
• Optimal analysis no longer legitimate

An amazing proportion of published experiments were virtually guaranteed to produce a non-significant result, even if a sizeable experimental effect had been present.

Poor experimental design

Which statistical methods are applicable depends on the details of the experimental design. If you perform an unpaired experiment, you may have to use (say) a two-sample t-test, whereas a paired design could have allowed the more powerful paired t-test.
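A minimal sketch of why the paired design can be more powerful. The six pairs of values are hypothetical (not from the lecture): subjects differ widely from one another, but each one shifts by a similar small amount. Only the t statistics are computed, using the Python standard library.

```python
import math
import statistics

# Hypothetical measurements on the same six subjects under two conditions.
# Subjects differ a lot from one another, but each shifts by a similar amount.
control = [10, 20, 30, 40, 50, 60]
treated = [12, 23, 31, 43, 52, 64]

def paired_t(x, y):
    """t statistic for a paired design: works on within-subject differences."""
    diffs = [b - a for a, b in zip(x, y)]
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / se

def two_sample_t(x, y):
    """t statistic for an unpaired design (unequal-variance form)."""
    se = math.sqrt(statistics.variance(x) / len(x) + statistics.variance(y) / len(y))
    return (statistics.mean(y) - statistics.mean(x)) / se

t_paired = paired_t(control, treated)
t_unpaired = two_sample_t(control, treated)
print(f"paired t = {t_paired:.2f}, two-sample t = {t_unpaired:.2f}")
```

With these made-up numbers the paired statistic is far larger, because the between-subject spread drops out of the differences. If the same values had come from two separate groups of subjects, only the two-sample form would be legitimate.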

Results incapable of analysis

Fundamental flaws in experimental design should be identifiable even without considering statistical analysis. However, if you produce a satisfactory statistical analysis plan in advance, you can rule out a lot of errors (eg lack of proper controls).

Optimal analysis no longer legitimate

Some statistical tweaks are only legitimate if planned in advance, eg one-tailed tests and equivalence limits.

Interpretation of significance/non-significance

A statistically significant result provides evidence against the null hypothesis and therefore shifts the balance of evidence in favour of the alternative hypothesis (There is an experimental effect).

But remember several things …

Significant results are not absolute

The evidence against the null hypothesis is not absolute. If (say) P = 0.01, we have merely demonstrated that the results we obtained would have been unlikely to arise if the null hypothesis were true. We have not shown that the null hypothesis is impossible.

Results might have been unlikely to arise if the null hypothesis is true, but the alternative hypothesis may be even less likely!

A clinical trial of a homeopathic medicine suggests there is a pharmacological effect (P = 0.02).

Explanation 1: Null hypothesis is true. Homeopathic medicine has no real effect. The apparent effect we saw was due to a statistical fluke that would arise on 1 occasion in 50.

Explanation 2: Alternative hypothesis is true. Homeopathic medicine does work.

First explanation is difficult to believe, but the other is even harder. Rational conclusion is still that homeopathy doesn’t work. (A series of successful trials would eventually force acceptance of effectiveness.)
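The reasoning above can be sketched with Bayes' rule. All three inputs are assumptions made for illustration: a prior belief of 1 in 1000 that the medicine works, 80% power, and the 1-in-50 fluke rate standing in for the chance of a false positive. It is a rough sketch of the argument, not a formal analysis of any real trial.

```python
def update(prior, power, alpha):
    """Probability the effect is real after one significant trial (Bayes' rule)."""
    real_and_significant = prior * power
    fluke_and_significant = (1 - prior) * alpha
    return real_and_significant / (real_and_significant + fluke_and_significant)

PRIOR = 0.001  # assumed prior belief that the medicine works
POWER = 0.8    # assumed chance of a significant result if it does work
ALPHA = 0.02   # chance of a fluke this large if it does not (1 in 50)

belief = PRIOR
history = []
for trial in range(1, 4):
    belief = update(belief, POWER, ALPHA)
    history.append(belief)
    print(f"after {trial} significant trial(s): belief = {belief:.3f}")
```

Under these assumptions a single significant trial leaves the belief below 5%, while three in a row push it above 95%, which is the sense in which a series of successful trials would eventually force acceptance.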

• Where P < 0.05 all you have produced is evidence that an effect does exist. You should always consider the size of the effect.
• With a measured end point – How much does the mean value change as a result of the change in treatment?
• With a classified end point – How great is the change in the proportion of individuals falling into each category?
Non-significant doesn’t justify a negative conclusion
• A non-significant result may arise either because
  • there is no effect present, or
  • there is an effect but your experiment lacks the power to detect it.

If you want to achieve an effective exclusion of any difference, you must establish “Equivalence limits” and compare your results to these.

See standard statistics lecture 9.

Taken from Stats lecture 9

Determining whether two digoxin preparations are equivalent

Minitab reports the 95% C.I. for the difference in AUCs as: -0.303 to +0.45

[Figure: the 95% C.I. plotted on an axis of change in AUC (µg.h.L⁻¹) running from -0.8 to +0.8]

Superimpose a ‘Region of equivalence’. Judgement is that a difference of ± 0.6 µg.h.L⁻¹ (or less) is of no practical significance.

Conclusion: Two preparations are ‘Equivalent’.
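The comparison against the equivalence limits can be written as a one-line check. This is a minimal sketch: the function name is made up for illustration, and the numbers are the interval and limit quoted above.

```python
def within_equivalence_limits(ci_low, ci_high, limit):
    """True only if the whole confidence interval lies inside -limit .. +limit."""
    return -limit <= ci_low and ci_high <= limit

# 95% C.I. for the difference in AUCs, with equivalence limits of +/- 0.6
digoxin_equivalent = within_equivalence_limits(-0.303, 0.45, 0.6)
print("Equivalent" if digoxin_equivalent else "Equivalence not demonstrated")
```

Had the limits been tighter than the upper end of the interval (say ± 0.4), the same interval would not have demonstrated equivalence, which is why the limits need to be fixed before the data are seen.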

Sample size calculations in the real world

Say we want to look at the effect of training on successful completion of a task by hospital pharmacists.

Randomise pharmacists into 2 groups. Train one group and leave the others alone (Controls). Test ability to complete the task. Classify each individual as ‘Successful’ or ‘Unsuccessful’.

Assume that 60% of controls will be successful, that we want to be able to detect an increase to 80% among the trained group and that we want 80% power.

Sample size for contingency χ² test

[Diagram: size of difference between outcomes to be detected + power required → sample size calculation → n]

Calculating necessary sample size

Minitab menu: Stat > Power and Sample Size > 2 Proportions ...

Values entered: 60% success, 80% success, power of 80%.

Minitab output

Power and Sample Size

Test for Two Proportions

Testing proportion 1 = proportion 2 (versus not =)

Calculating power for proportion 2 = 0.8

Alpha = 0.05

              Sample  Target
Proportion 1    Size   Power  Actual Power
         0.6      82     0.8      0.803780

The sample size is for each group.

Require 82 controls plus 82 trained.
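A rough cross-check of the Minitab figure, assuming the usual normal approximation for comparing two proportions (Minitab's exact method is not stated in the lecture). The function names are illustrative only.

```python
import math
from statistics import NormalDist

def power_two_proportions(p1, p2, n, alpha=0.05):
    """Approximate power of a two-sided test of p1 = p2 with n per group."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    p_bar = (p1 + p2) / 2
    sd_null = math.sqrt(2 * p_bar * (1 - p_bar))      # spread if no real difference
    sd_alt = math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))  # spread under the real difference
    z = (abs(p1 - p2) * math.sqrt(n) - z_alpha * sd_null) / sd_alt
    return NormalDist().cdf(z)

def sample_size_two_proportions(p1, p2, power=0.8, alpha=0.05):
    """Smallest group size whose approximate power reaches the target."""
    n = 2
    while power_two_proportions(p1, p2, n, alpha) < power:
        n += 1
    return n

n_per_group = sample_size_two_proportions(0.6, 0.8)
achieved = power_two_proportions(0.6, 0.8, n_per_group)
print(f"n per group = {n_per_group}, power = {achieved:.4f}")
```

Under this approximation the answer agrees with the 82 per group and the actual power of about 0.804 reported above.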

Unrealistic – approach it another way

Within an undergraduate project, there is no way that we will be able to experiment on 164 pharmacists!

Start out by deciding the maximum number we might conceivably deal with. Say this is 25 controls and 25 trained. Now use Minitab to calculate the size of change that would be detectable.

Values entered: maximum group size we can deal with (25 per group), 60% success rate for controls.

Minitab output

Power and Sample Size

Test for Two Proportions

Testing proportion 1 = proportion 2 (versus not =)

Calculating power for proportion 2 = 0.6

Alpha = 0.05

Sample
  Size  Power  Proportion 1  Proportion 1
    25    0.8      0.928484      0.219428

The sample size is for each group.

With 25 per group and 80% power, we could distinguish a control success rate of 60% from a success rate of 93% or higher (or of 22% or lower).

Is the experiment worth doing?
• May decide either that …
• There is no realistic probability that the training method will raise success rates to 93%. So, even if the training was pretty successful (say, raising success rates to 85%), the experiment would still have a high chance of producing a non-significant result. Abandon the whole proposal.
• or …
• Training might be that successful, so it is worth carrying on. If we do, we must remember that the experiment has less than optimum power and a non-significant result must not be interpreted as definite evidence that the training failed. Non-significance could simply reflect the lack of power of our experiment.
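To put numbers on this decision, the detectable success rates for 25 per group, and the power if training raised success to 85%, can be approximated as follows. This is a sketch using a normal approximation (not necessarily Minitab's method); the function names and the bisection search are illustrative choices.

```python
import math
from statistics import NormalDist

def power_two_proportions(p1, p2, n, alpha=0.05):
    """Approximate power of a two-sided test of p1 = p2 with n per group."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    p_bar = (p1 + p2) / 2
    sd_null = math.sqrt(2 * p_bar * (1 - p_bar))
    sd_alt = math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    z = (abs(p1 - p2) * math.sqrt(n) - z_alpha * sd_null) / sd_alt
    return NormalDist().cdf(z)

def detectable_proportion(p_control, n, toward, power=0.8, alpha=0.05):
    """Bisect between p_control and `toward` (0.0 or 1.0) for the rate
    that would be detected with the requested power."""
    near, far = p_control, toward
    for _ in range(60):
        mid = (near + far) / 2
        if power_two_proportions(p_control, mid, n, alpha) < power:
            near = mid   # not yet detectable: move further from the control rate
        else:
            far = mid    # detectable: try a smaller difference
    return far

upper = detectable_proportion(0.6, 25, toward=1.0)
lower = detectable_proportion(0.6, 25, toward=0.0)
power_at_85 = power_two_proportions(0.6, 0.85, 25)
print(f"detectable rates: {upper:.3f} or {lower:.3f}")
print(f"power if training gives 85% success: {power_at_85:.2f}")
```

The detectable rates should land close to the 0.928 and 0.219 in the Minitab output. For an 85% success rate the approximate power comes out at roughly one half, so a real and sizeable training effect would be missed about as often as it was found.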
Keep it simple - keep it clear

t-test versus ANOVA

Compare 2 treatments (A & B). If the t-test produces a significant result the interpretation is unambiguous. Treatment A leads to higher/lower values than B.

Compare 5 treatments (A … E). If an ANOVA produces significance, where are the differences? Can use ‘follow-up’ tests such as the Tukey test, but even that is never as clear as the t-test.

2×2 contingency table versus large table

Even worse! There are no follow up tests.

Keep it simple - keep it powerful

If treatments A & B genuinely differ from one another, a simple t-test comparison of the 2 may show a significant difference. However if these 2 are accompanied by a string of additional treatments that produce results intermediate between A & B and an ANOVA is used, the significance may be masked.

eg: Real purpose of experiment is to see whether an Iridium catalyst will increase the yield of a chemical process where a platinum catalyst is currently used. However various other metals (Palladium, Pt/Ir alloy and Pd/Ir alloy) are available, so we try these as well.

Yields of product (g)

Pt    Ir    Pd    Pt/Ir  Pd/Ir
3.45  5.34  2.23  3.71   2.61
1.81  3.61  2.92  3.97   3.14
2.95  3.25  2.12  1.41   2.42
0.89  2.66  2.25  3.22   3.67
2.22  2.26  3.56  2.43   3.11
3.57  3.97  1.24  1.83   3.27
2.79  3.39  2.01  3.73   1.93
2.06  1.41  4.93  4.45   3.20
2.38  4.13  3.10  2.39   2.08
1.94  4.99  3.06  1.70   3.91

If we’d kept it simple …

Two-Sample T-Test and CI: Platinum, Iridium

Two-sample T for Platinum vs Iridium

           N   Mean  StDev  SE Mean
Platinum  10  2.406  0.811     0.26
Iridium   10   3.50   1.20     0.38

Difference = mu (Platinum) - mu (Iridium)

Estimate for difference: -1.09500

95% CI for difference: (-2.07008, -0.11992)

T-Test of difference = 0 (vs not =): T-Value = -2.39 P-Value = 0.030 DF = 15

Statistical significance is achieved

But we would get smart …

One-way ANOVA: Platinum, Iridium, Palladium, Pt/Ir, Pd/Ir

Source  DF      SS     MS     F      P
Factor   4   6.314  1.578  1.68  0.172
Error   45  42.372  0.942
Total   49  48.686

S = 0.9704 R-Sq = 12.97% R-Sq(adj) = 5.23%

Statistical significance is no longer achieved

[Minitab plot: individual 95% CIs for the mean of each level (Platinum, Iridium, Pt/Ir and Pd/Ir are shown) on an axis running from 1.80 to 3.60]

Platinum and Iridium contrast strongly (Sig), but within the group of 5 metals contrasts are generally weaker (Non-sig.)
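The test statistics quoted in the Minitab output can be reproduced from the yield data with the standard library alone. This sketch computes only the t statistic (unequal-variance form, which matches the DF of 15 reported) and the one-way ANOVA F ratio; P values would need a statistics package. The dictionary keys and function names are illustrative.

```python
import math
import statistics

# Yields of product (g) for each catalyst, read column-wise from the table.
yields = {
    "Pt":    [3.45, 1.81, 2.95, 0.89, 2.22, 3.57, 2.79, 2.06, 2.38, 1.94],
    "Ir":    [5.34, 3.61, 3.25, 2.66, 2.26, 3.97, 3.39, 1.41, 4.13, 4.99],
    "Pd":    [2.23, 2.92, 2.12, 2.25, 3.56, 1.24, 2.01, 4.93, 3.10, 3.06],
    "Pt/Ir": [3.71, 3.97, 1.41, 3.22, 2.43, 1.83, 3.73, 4.45, 2.39, 1.70],
    "Pd/Ir": [2.61, 3.14, 2.42, 3.67, 3.11, 3.27, 1.93, 3.20, 2.08, 3.91],
}

def welch_t(x, y):
    """Two-sample t statistic and degrees of freedom (unequal variances)."""
    vx = statistics.variance(x) / len(x)
    vy = statistics.variance(y) / len(y)
    t = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df

def one_way_anova_f(groups):
    """F ratio: between-group mean square over within-group mean square."""
    values = [v for g in groups for v in g]
    grand = statistics.mean(values)
    ss_between = sum(len(g) * (statistics.mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - statistics.mean(g)) ** 2 for g in groups for v in g)
    ms_between = ss_between / (len(groups) - 1)
    ms_within = ss_within / (len(values) - len(groups))
    return ms_between / ms_within

t_value, df = welch_t(yields["Pt"], yields["Ir"])
f_value = one_way_anova_f(list(yields.values()))
print(f"Pt vs Ir: t = {t_value:.2f}, df = {df:.1f}")
print(f"All five catalysts: F = {f_value:.2f}")
```

The same Pt and Ir data sit inside both analyses. The extra three groups add little to the between-group sum of squares but spread it over four degrees of freedom, which is how the contrast of interest gets diluted.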

Avoid multiple analyses

If you test a treatment that has absolutely no effect, there is always a 5% risk that random sampling error will lead to an apparent effect great enough to pass as statistically significant. That level of risk is considered acceptable.

However, if you make 10 independent comparisons, there is a 40% risk that at least one will generate a false positive. As the number of tests grows, this becomes a real problem.
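The 40% figure follows from the chance that none of the tests gives a false positive. A short sketch, assuming the tests are independent and each uses the 5% level (the function name is illustrative):

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Chance of at least one false positive among n independent tests
    when no treatment has any real effect."""
    return 1 - (1 - alpha) ** n_tests

for n_tests in (1, 5, 10, 20):
    risk = familywise_error_rate(n_tests)
    print(f"{n_tests:2d} tests: risk of at least one false positive = {risk:.0%}")
```

For 10 tests this gives 1 - 0.95^10, which is about 0.40.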

• Need to apply some common sense. Many projects will realistically need more than one statistical analysis, but …
• Avoid unnecessary proliferation of tests
• Consider declaring (in advance) one or two tests as being ‘Primary’ and others as ‘Secondary’. If the latter are significant, the results would need to be confirmed by further work.
• Be especially wary of an odd isolated “Significant” result amid a sea of non-significance, after a long series of tests.
Other techniques are available
• In my lectures (L2) I only covered a limited range of possible experimental structures. Your project may well not fit any of these. For example:
  • Does a measured variable affect a categorical one? (eg Does age affect the likelihood that patients will comply with instructions?)
  • Is a measured endpoint affected by two factors, one of which is a classification and the other a measured value? (eg Is blood pressure affected by age and gender?)
• Don’t panic!!! There’s a statistical procedure for most experimental structures that you are likely to use.
Summary
• Write statistical analysis plan before generating data
• Think about how you will interpret significance or non-significance
• Can be flexible about sample size calculations but you must consider power
• Keep it simple
• Beware of multiple analyses