ANOVA (Analysis of Variance). Martina Litschmannová, martina.litschmannova@vsb.cz, K210. The basic ANOVA situation. Two variables: 1 Categorical, 1 Quantitative


ANOVA (Analysis of Variance)

Martina Litschmannová

martina.litschmannova@vsb.cz

K210


The basic ANOVA situation

  • Two variables: 1 Categorical, 1 Quantitative

  • Main Question: Does the mean of the quantitative variable depend on which group (given by the categorical variable) the individual is in?

  • If the categorical variable has only 2 values:

    • null hypothesis: $H_0: \mu_1 = \mu_2$ (2-sample t test)

  • ANOVA allows for 3 or more groups


An example ANOVA situation

  • Subjects: 25 patients with blisters

  • Treatments: Treatment A, Treatment B, Placebo

  • Measurement: # of days until blisters heal

  • Data [and means]:

    • A: 5,6,6,7,7,8,9,10 [7.25]

    • B: 7,7,8,9,9,10,10,11 [8.875]

    • P: 7,9,9,10,10,10,11,12,13 [10.11]

  • Are these differences significant?
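The bracketed means can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library; the dictionary name `days` is an illustrative choice, and the values are copied from the list above.

```python
from statistics import mean

# Days until blisters heal, copied from the slide (A, B = treatments, P = placebo).
days = {
    "A": [5, 6, 6, 7, 7, 8, 9, 10],
    "B": [7, 7, 8, 9, 9, 10, 10, 11],
    "P": [7, 9, 9, 10, 10, 10, 11, 12, 13],
}

# Mean of each group and the grand mean over all patients.
group_means = {name: mean(values) for name, values in days.items()}
n_total = sum(len(values) for values in days.values())
grand_mean = sum(sum(values) for values in days.values()) / n_total

for name, m in group_means.items():
    print(f"{name}: n = {len(days[name])}, mean = {m:.3f}")
print(f"all: n = {n_total}, grand mean = {grand_mean:.3f}")
```

The group means differ, but that alone does not say whether the differences are larger than what chance variation within the groups would produce.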


Informal Investigation

  • Graphical investigation:

    • side-by-side box plots

    • multiple histograms

  • Whether the differences between the groups are significant depends on

    • the difference in the means

    • the standard deviations of each group

    • the sample sizes

  • ANOVA determines the p-value from the F statistic



What does ANOVA do?

At its simplest (there are extensions), ANOVA tests the following hypotheses:

H0: The means of all the groups are equal.

HA: Not all the means are equal.

It doesn’t say how or which ones differ.

We can follow up with “multiple comparisons”.

Note: We usually refer to the sub-populations as “groups” when doing ANOVA.


Assumptions of ANOVA

  • each group is approximately normal

    • check this by looking at histograms and/or normal quantile plots, or rely on assumptions about the population

    • can handle some nonnormality, but not severe outliers

    • test of normality

  • standard deviations of each group are approximately equal

    • rule of thumb: ratio of largest to smallest sample st. dev. must be less than 2:1

    • test of homoscedasticity


Normality Check

  • We should check for normality using:

    • assumptions about population

    • histograms for each group

    • normal quantile plot for each group

    • test of normality (Shapiro-Wilk, Lilliefors, Anderson-Darling test, …)

  • With such small data sets, there isn’t a really good way to check normality from the data, but we make the common assumption that physical measurements of people tend to be normally distributed.


Shapiro-Wilk test

  • One of the most powerful tests of normality. [Shapiro, Wilk]

    An online computer applet (Simon Ditto, 2009) for this test can be found here.


Standard Deviation Check

Variable  treatment  N    Mean   Median  StDev
days      A          8   7.250    7.000  1.669
          B          8   8.875    9.000  1.458
          P          9  10.111   10.000  1.764

  • Compare largest and smallest standard deviations:

    • largest: 1.764

    • smallest: 1.458

    • 1.764/1.458 = 1.210 < 2, OK

  • Note: A standard deviation ratio greater than 2 indicates heteroscedasticity.
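The rule-of-thumb check can be sketched in Python as follows. The data are the blister healing times from the example slide; the variable names are illustrative, and `statistics.stdev` is used because it computes the sample standard deviation (n - 1 in the denominator), which is what the StDev column reports.

```python
from statistics import stdev

# Days until blisters heal, copied from the example slide.
days = {
    "A": [5, 6, 6, 7, 7, 8, 9, 10],
    "B": [7, 7, 8, 9, 9, 10, 10, 11],
    "P": [7, 9, 9, 10, 10, 10, 11, 12, 13],
}

# Sample standard deviation of each group.
sds = {name: stdev(values) for name, values in days.items()}

# Rule of thumb: largest SD divided by smallest SD should stay below 2.
ratio = max(sds.values()) / min(sds.values())

for name, s in sds.items():
    print(f"{name}: StDev = {s:.3f}")
print(f"ratio = {ratio:.3f}, rule of thumb satisfied: {ratio < 2}")
```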


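ANOVA Notation

Number of individuals all together: $n = \sum_{i=1}^{k} n_i$, where $k$ is the number of groups and $n_i$ is the size of group $i$

Sample means: $\bar{x}_i = \frac{1}{n_i} \sum_{j=1}^{n_i} x_{ij}$, $i = 1, \dots, k$

Grand mean: $\bar{x} = \frac{1}{n} \sum_{i=1}^{k} n_i \bar{x}_i$

Sample standard deviations: $s_i$, $i = 1, \dots, k$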


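Levene Test

Null and alternative hypothesis:

H0: $\sigma_1^2 = \sigma_2^2 = \dots = \sigma_k^2$, HA: not all variances are equal

Test statistic:

$W = \frac{n-k}{k-1} \cdot \frac{\sum_{i=1}^{k} n_i (\bar{Z}_i - \bar{Z})^2}{\sum_{i=1}^{k} \sum_{j=1}^{n_i} (Z_{ij} - \bar{Z}_i)^2}$,

where $Z_{ij} = |x_{ij} - \bar{x}_i|$, $\bar{Z}_i = \frac{1}{n_i} \sum_{j=1}^{n_i} Z_{ij}$, $\bar{Z} = \frac{1}{n} \sum_{i=1}^{k} \sum_{j=1}^{n_i} Z_{ij}$, with $k$ groups, $n_i$ observations $x_{ij}$ in group $i$, and $n$ observations in total.

p-value: $1 - F(x_{OBS})$, where $x_{OBS}$ is the observed value of $W$ and $F$ is the CDF of the Fisher-Snedecor distribution with $k-1$ and $n-k$ degrees of freedom.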
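A minimal sketch of the Levene test in Python, applied to the blister data from the example slide. It assumes the mean-centred form of the statistic (absolute deviations from the group means) and, to stay within the standard library, it uses the closed-form upper tail of the F distribution that holds only when the numerator has 2 degrees of freedom, i.e. for exactly three groups. The function name `levene_three_groups` is hypothetical.

```python
from statistics import mean

def levene_three_groups(groups):
    """Mean-centred Levene statistic W and its p-value for exactly 3 groups."""
    k = len(groups)
    if k != 3:
        raise ValueError("the closed-form p-value below assumes exactly 3 groups")
    n = sum(len(g) for g in groups)

    # Absolute deviations of every observation from its own group mean.
    z = []
    for g in groups:
        centre = mean(g)
        z.append([abs(x - centre) for x in g])

    z_bar_i = [mean(zi) for zi in z]           # mean deviation per group
    z_bar = sum(sum(zi) for zi in z) / n       # overall mean deviation

    between = sum(len(zi) * (zb - z_bar) ** 2 for zi, zb in zip(z, z_bar_i))
    within = sum((v - zb) ** 2 for zi, zb in zip(z, z_bar_i) for v in zi)
    w = (n - k) / (k - 1) * between / within

    # For F with 2 and d degrees of freedom: P(F > w) = (1 + 2w/d) ** (-d/2).
    p_value = (1 + 2 * w / (n - k)) ** (-(n - k) / 2)
    return w, p_value

days = [
    [5, 6, 6, 7, 7, 8, 9, 10],            # treatment A
    [7, 7, 8, 9, 9, 10, 10, 11],          # treatment B
    [7, 9, 9, 10, 10, 10, 11, 12, 13],    # placebo
]
w, p_value = levene_three_groups(days)
print(f"W = {w:.4f}, p-value = {p_value:.3f}")
```

A large p-value here means the data give no reason to doubt equal variances, which agrees with the rule-of-thumb check on the standard deviations.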


How ANOVA works (outline)

ANOVA measures two sources of variation in the data and compares their relative sizes.

  • Sum of Squares between Groups (difference between means),

    $SS_B = \sum_{i=1}^{k} n_i (\bar{x}_i - \bar{x})^2$,

    and the corresponding Mean of Squares between groups

    $MS_B = \frac{SS_B}{df_B}$,

    where $df_B = k - 1$ is the degrees of freedom.

  • Sum of Squares of errors (difference within groups),

    $SS_e = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2 = \sum_{i=1}^{k} (n_i - 1) s_i^2$,

    and the corresponding Mean of Squares of errors

    $MS_e = \frac{SS_e}{df_e}$,

    where $df_e = n - k$ is the degrees of freedom.


The ANOVA F-statistic is the ratio of the Between Group Variation to the Within Group Variation:

$F = \frac{MS_B}{MS_e}$, where $MS_B$ is the mean of squares between groups and $MS_e$ is the mean of squares of errors.

A large F is evidence against H0, since it indicates that there is more difference between groups than within groups.
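The two sums of squares and the F ratio can be computed directly from the raw blister data. The following is a sketch using only the standard library; the function names are hypothetical, and the p-value helper relies on a closed form that is valid only when the numerator has 2 degrees of freedom (three groups), which is an assumption that happens to fit this example.

```python
from statistics import mean

def one_way_anova(groups):
    """Return (SS between, SS error, F statistic) for a list of samples."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    group_means = [mean(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of the observations around their own group mean.
    ss_error = sum((x - m) ** 2
                   for g, m in zip(groups, group_means) for x in g)

    ms_between = ss_between / (k - 1)
    ms_error = ss_error / (n - k)
    return ss_between, ss_error, ms_between / ms_error

def f_upper_tail_df1_2(f, df2):
    """P(F > f) for an F distribution with 2 and df2 degrees of freedom."""
    return (1 + 2 * f / df2) ** (-df2 / 2)

days = [
    [5, 6, 6, 7, 7, 8, 9, 10],            # treatment A
    [7, 7, 8, 9, 9, 10, 10, 11],          # treatment B
    [7, 9, 9, 10, 10, 10, 11, 12, 13],    # placebo
]
ss_between, ss_error, f_stat = one_way_anova(days)
n_total = sum(len(g) for g in days)
p_value = f_upper_tail_df1_2(f_stat, n_total - len(days))

print(f"SS_B = {ss_between:.2f}, SS_e = {ss_error:.2f}")
print(f"F = {f_stat:.2f}, p-value = {p_value:.3f}")
```

For more than three groups the p-value would need a general F distribution function, for example from a statistics library.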


ANOVA Output


How are these computations made?


Using the results of exploratory analysis and the ANOVA test, verify whether age has a statistically significant effect on BMI.

Age group     Count   Average   Variance
----------------------------------------
<35 years        53   25.0796    10.3825
35-50 years     123   25.9492    16.2775
>50 years        76   26.0982    12.3393
----------------------------------------
Total           252   25.8113    13.8971


Assumptions:

  • Normality

  • Homoscedasticity

    H0: $\sigma_1^2 = \sigma_2^2 = \sigma_3^2$, HA: not all variances are equal (Levene test)


Null and alternative hypothesis:

H0: $\mu_1 = \mu_2 = \mu_3$, HA: not all means are equal

Calculation of the p-value:



k … number of samples (groups)

n … total number of individuals


$F = \frac{SS_B / (k-1)}{SS_e / (n-k)} = \frac{MS_B}{MS_e}$



$p\text{-value} = 1 - F(x_{OBS})$,

where $x_{OBS}$ is the observed value of the F statistic and $F(x)$ is the CDF of the Fisher-Snedecor distribution with 2 and 249 degrees of freedom


Result:

We do not reject the null hypothesis at the significance level 0.05. There is no statistically significant difference in mean BMI between the age groups.
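Because only counts, averages and variances are given for the BMI data, the ANOVA has to be computed from summary statistics. The sketch below assumes the Variance column holds sample variances (n - 1 in the denominator) and again uses the closed-form F tail that holds for 2 numerator degrees of freedom; the variable names are illustrative.

```python
# (age group, count, average BMI, variance of BMI), copied from the summary table.
groups = [
    ("<35 years", 53, 25.0796, 10.3825),
    ("35-50 years", 123, 25.9492, 16.2775),
    (">50 years", 76, 26.0982, 12.3393),
]

k = len(groups)
n = sum(count for _, count, _, _ in groups)
grand_mean = sum(count * avg for _, count, avg, _ in groups) / n

# Between-group sum of squares from the group averages.
ss_between = sum(count * (avg - grand_mean) ** 2 for _, count, avg, _ in groups)
# Error sum of squares recovered from the sample variances.
ss_error = sum((count - 1) * var for _, count, _, var in groups)

f_stat = (ss_between / (k - 1)) / (ss_error / (n - k))

# P(F > f_stat) for 2 and n - k = 249 degrees of freedom (closed form for df1 = 2).
p_value = (1 + 2 * f_stat / (n - k)) ** (-(n - k) / 2)

print(f"n = {n}, grand mean = {grand_mean:.4f}")
print(f"SS_B = {ss_between:.2f}, SS_e = {ss_error:.2f}")
print(f"F = {f_stat:.3f}, p-value = {p_value:.3f}")
print("reject H0 at 0.05:", p_value < 0.05)
```

With the figures from the table the p-value comes out well above 0.05, in line with the result stated on the slide.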


Where’s the Difference?

Once ANOVA indicates that the groups do not all appear to have the same means, what do we do?

Analysis of Variance for days
Source     DF      SS      MS      F      P
treatmen    2   34.74   17.37   6.45  0.006
Error      22   59.26    2.69
Total      24   94.00

                           Individual 95% CIs For Mean
                           Based on Pooled StDev
Level   N    Mean   StDev  ----------+---------+---------+------
A       8   7.250   1.669  (-------*-------)
B       8   8.875   1.458             (-------*-------)
P       9  10.111   1.764                      (------*-------)
                           ----------+---------+---------+------
Pooled StDev = 1.641                7.5       9.0      10.5

Clearest difference: P is worse than A (CI’s don’t overlap)


Multiple Comparisons

  • Once ANOVA indicates that the groups do not all have the same means, we can compare them two by two using the 2-sample t test.

    • We need to adjust our p-value threshold because we are doing multiple tests with the same data.

    • There are several methods for doing this.

    • If we really just want to test the difference between one pair of treatments, we should set the study up that way.


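Bonferroni method – post hoc analysis

We reject the null hypothesis $H_0: \mu_i = \mu_j$ if

$|\bar{x}_i - \bar{x}_j| \geq t_{1-\alpha^*/2}(n-k) \sqrt{MS_e \left( \frac{1}{n_i} + \frac{1}{n_j} \right)}$,

where $\alpha^*$ is the corrected significance level, $\alpha^* = \alpha / \binom{k}{2}$, $MS_e$ is the mean of squares of errors,

and $t_{1-\alpha^*/2}(n-k)$ is the $(1-\alpha^*/2)$ quantile of the Student distribution with $n-k$ degrees of freedom.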
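A sketch of the Bonferroni post hoc comparison for the blister data. Instead of looking up the Student quantile, it compares the two-sided p-value of each pairwise t statistic with the corrected level, which is an equivalent decision rule. The standard library has no Student distribution, so the helper `t_two_sided_p` (a hypothetical name) integrates the density numerically with Simpson's rule; a statistics library would normally be used for that step.

```python
from itertools import combinations
from math import exp, lgamma, log, pi, sqrt
from statistics import mean

def t_two_sided_p(t, df, steps=2000):
    """Two-sided p-value of Student's t, by Simpson's rule on the density."""
    log_c = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)

    def pdf(x):
        return exp(log_c - (df + 1) / 2 * log(1 + x * x / df))

    a = abs(t)
    h = a / steps                      # steps must be even for Simpson's rule
    area = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    central = area * h / 3             # P(0 < T < |t|)
    return max(0.0, 1 - 2 * central)

days = {
    "A": [5, 6, 6, 7, 7, 8, 9, 10],
    "B": [7, 7, 8, 9, 9, 10, 10, 11],
    "P": [7, 9, 9, 10, 10, 10, 11, 12, 13],
}
k = len(days)
n = sum(len(g) for g in days.values())

# Mean of squares of errors (pooled within-group variance) from the ANOVA.
ms_error = sum((x - mean(g)) ** 2 for g in days.values() for x in g) / (n - k)

alpha = 0.05
alpha_star = alpha / (k * (k - 1) / 2)   # corrected level for all pairs

significant = []
for (a, xa), (b, xb) in combinations(days.items(), 2):
    se = sqrt(ms_error * (1 / len(xa) + 1 / len(xb)))
    t_stat = (mean(xa) - mean(xb)) / se
    p = t_two_sided_p(t_stat, n - k)
    if p < alpha_star:
        significant.append((a, b))
    print(f"{a} vs {b}: t = {t_stat:.2f}, p = {p:.4f}, reject: {p < alpha_star}")
```

On these data only the comparison of treatment A with the placebo falls below the corrected level.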


Kruskal-Wallis test

  • The Kruskal–Wallis test is most commonly used when there is one nominal variable and one measurement variable, and the measurement variable does not meet the normality assumption of an ANOVA.

  • It is the non-parametric analogue of a one-way ANOVA.

  • Like most non-parametric tests, it is performed on ranked data, so the measurement observations are converted to their ranks in the overall data set: the smallest value gets a rank of 1, the next smallest gets a rank of 2, and so on. The loss of information involved in substituting ranks for the original values can make this a less powerful test than an anova, so the anova should be used if the data meet the assumptions.

  • If the original data set actually consists of one nominal variable and one ranked variable, you cannot do an anova and must use the Kruskal–Wallis test.
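The ranking procedure can be sketched as follows. The blister data from the earlier example are reused purely for illustration. The sketch assumes the usual tie-corrected H statistic and the chi-square approximation for its distribution; with three groups the chi-square distribution has 2 degrees of freedom, whose upper tail is simply exp(-H/2). The function name `kruskal_wallis_three_groups` is hypothetical.

```python
from math import exp

def kruskal_wallis_three_groups(groups):
    """Tie-corrected Kruskal-Wallis H and approximate p-value for 3 groups."""
    if len(groups) != 3:
        raise ValueError("the chi-square tail below assumes exactly 3 groups")
    pooled = sorted(x for g in groups for x in g)
    n = len(pooled)

    # Tied values share the average of the ranks they occupy.
    rank_of = {}
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        t = j - i                                # size of this tie group
        rank_of[pooled[i]] = (i + 1 + j) / 2     # mean of ranks i+1 .. j
        tie_term += t ** 3 - t
        i = j

    rank_sums = [sum(rank_of[x] for x in g) for g in groups]
    h = 12 / (n * (n + 1)) * sum(r ** 2 / len(g)
                                 for r, g in zip(rank_sums, groups)) - 3 * (n + 1)
    h /= 1 - tie_term / (n ** 3 - n)             # correction for ties

    p_value = exp(-h / 2)    # upper tail of chi-square with 2 degrees of freedom
    return h, p_value

days = [
    [5, 6, 6, 7, 7, 8, 9, 10],            # treatment A
    [7, 7, 8, 9, 9, 10, 10, 11],          # treatment B
    [7, 9, 9, 10, 10, 10, 11, 12, 13],    # placebo
]
h, p_value = kruskal_wallis_three_groups(days)
print(f"H = {h:.3f}, p-value = {p_value:.4f}")
```

The chi-square approximation is rough for groups this small, so the p-value should be read as indicative only.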


A farm raised three breeds of rabbits. An experiment was carried out (rabbits.xls) to determine whether there is a statistically significant difference in weight between the breeds. Verify.


The effects of three drugs on blood clotting were determined. Among other indicators, the thrombin time was measured. Information about the 45 monitored patients is recorded in the file thrombin.xls. Does the thrombin time depend on the drug used?


Study materials:

  • http://homel.vsb.cz/~bri10/Teaching/Bris%20Prob%20&%20Stat.pdf

    (p. 142 - p.154)

  • Shapiro, S.S., Wilk, M.B. An Analysis of Variance Test for Normality (Complete Samples). Biometrika. 1965, vol. 52, no. 3/4, pp. 591-611. Available from: http://www.jstor.org/stable/2333709.

