
# ANALYSIS OF VARIANCE (ANOVA)


## PowerPoint Slideshow about ' ANALYSIS OF VARIANCE (ANOVA)' - eliana-moody


STATISTICAL DATA ANALYSIS

• COMMON TYPES OF ANALYSIS?

• Examine Strength and Direction of Relationships

• Bivariate (e.g., Pearson Correlation—r)

• Between one variable and another: rxy or Y = a + b1 x1

• Multivariate (e.g., Multiple Regression Analysis)

• Between one dep. var. and each of several indep. variables, while holding all other indep. variables constant:

• Y = a + b1 x1 + b2 x2 + b3 x3 + . . . + bk xk

• Compare Groups

• Compare Proportions (e.g., Chi-Square Test, χ²)

• H0: P1 = P2 = P3 = … = Pk

• Compare Means (e.g., Analysis of Variance)

• H0: µ1 = µ2 = µ3 = …= µk

ONE-WAY ANOVA

When Do You Use ANOVA?

• To compare the mean values of a certain characteristic among two or more groups.

• To see whether two or more groups are equal (or different) on a given metric characteristic.

• To examine whether a metric dependent variable is a function of a categorical independent variable.

ANOVA was developed in 1919 by Sir Ronald Fisher (1890-1962), a British statistician and geneticist/evolutionary biologist.

Remember: Level of measurement determines choice of statistical method.

Statistical Techniques and Levels of Measurement:

| DEPENDENT variable | INDEPENDENT: NOMINAL/CATEGORICAL | INDEPENDENT: METRIC (ORDERED METRIC or HIGHER) |
|---|---|---|
| NOMINAL | Chi-Square; Fisher’s Exact Prob. | Discriminant Analysis; Logit Regression |
| METRIC | T-Test; Analysis of Variance | Correlation Analysis; Regression Analysis |

(An Example ?)

ONE-WAY ANOVA

H0 in ANOVA?

H0: There are no differences among the mean values of the groups being compared (i.e., the group means are all equal): µ1 = µ2 = µ3 = … = µk

Ha (Conclusion if H0 rejected)?

Not all group means are equal (i.e., at least one group mean is different from the rest).

ONE-WAY ANOVA

So, the number of steps involved in ANOVA depends on whether we are comparing 2 groups or more than 2 groups:

Scenario 1. When comparing 2 groups, it is a one-step test:

2 Groups: A B

Step 1: Check to see if the two groups are different or not, and if so, how.

Scenario 2. When comparing 3 or more groups, if H0 is rejected, it is a two-step test:

3 or More Groups: A B C

Step 1: Overall test that examines if all groups are equal or not. And, if not all are equal (H0 rejected), then:

Step 2: Pairwise (post hoc) comparison tests to see where (i.e., among which groups) the differences exist, and how.

ANOVA TABLE

Test Statistic

Let’s see the intuitive logic…

ONE-WAY ANOVA

EXAMPLE: Whether or not average earnings per share (EPS) for commercial banks, retailing operations, and utility companies (variable Industry) was the same last year.

Sample Data: A random sample of 9 banks, 10 retailers, and 10 utilities.

Table 1. Earnings Per Share (EPS) of Sample Firms in the Three Industries

| Banking | Retailing | Utility |
|---|---|---|
| \$6.42 | \$3.52 | \$3.55 |
| 2.83 | 4.21 | 2.13 |
| 8.94 | 4.36 | 3.24 |
| 6.80 | 2.67 | 6.47 |
| 5.70 | 3.49 | 3.06 |
| 4.65 | 4.68 | 1.80 |
| 6.20 | 3.30 | 5.29 |
| 2.71 | 2.68 | 2.96 |
| 8.34 | 7.25 | 2.90 |
| (none) | 0.16 | 1.73 |
| nB = 9 | nR = 10 | nU = 10 |

n = 29

H0: There were no differences in average EPS of Banks, Utilities, and Retailers.

First logical thing you do? Compute the group means and the overall (grand) mean:

$\bar{x}_B = 5.84 \qquad \bar{x}_R = 3.63 \qquad \bar{x}_U = 3.31 \qquad \bar{\bar{x}} = 4.21$
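The group means and the grand mean reported with Table 1 can be reproduced in a few lines. This is only a minimal sketch in Python (the course itself uses SPSS); the dictionary name `eps` and the variable names are illustrative choices, not part of the slides.

```python
from statistics import mean

# EPS values copied from Table 1.
eps = {
    "Banking": [6.42, 2.83, 8.94, 6.80, 5.70, 4.65, 6.20, 2.71, 8.34],
    "Retailing": [3.52, 4.21, 4.36, 2.67, 3.49, 4.68, 3.30, 2.68, 7.25, 0.16],
    "Utility": [3.55, 2.13, 3.24, 6.47, 3.06, 1.80, 5.29, 2.96, 2.90, 1.73],
}

# Mean EPS within each industry.
group_means = {industry: mean(values) for industry, values in eps.items()}

# Grand mean: the mean over all 29 firms pooled together.
all_values = [x for values in eps.values() for x in values]
grand_mean = mean(all_values)

for industry, m in group_means.items():
    print(f"{industry}: n = {len(eps[industry])}, mean = {m:.2f}")
print(f"All firms: n = {len(all_values)}, grand mean = {grand_mean:.2f}")
```

Note that the grand mean is the mean of all 29 observations, not the simple average of the three group means, because the groups differ in size.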

Why is it called ANOVA?

Differences in EPS (Dep. Var.) among all 29 firms have two components: differences among the groups and differences within the groups. That is,

There are some differences in EPS among the three groups of firms (Banks vs. Retailers vs. Utilities), and

There are also some differences/variations in EPS of the firms within each of these groups (among banks themselves, among retailers themselves, and among utilities themselves).

ONE-WAY ANOVA

• ANOVA partitions/analyzes the variance of the dependent variable (i.e., the differences in EPS) and traces it to its two components/sources, i.e., to differences between groups vs. differences within groups.

• WHY?

ONE-WAY ANOVA

• The underlying intuitive logic in ANOVA:

• If the groups that are being compared come from the same population (i.e., if groups are alike/equal):

• They should exhibit similar differences (have equal variability)

• Hence, the differences among these groups

• should be no more than the differences within them (i.e., among members within same groups).

• That is, groups that are alike/similar are expected to have about as much variability between them as they have within them.

On the other hand…

If the groups being compared are divergent/dissimilar/unequal?

They would exhibit more difference between them than they show within them (i.e., among members within the same groups).

That is, they will have greater similarity/commonality internally than they have externally (with members of the other groups).
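The intuition above can be checked with a small simulation. The sketch below is illustrative only: the normal populations, their means, the standard deviation of 1.8, and the function names are assumptions made for the demonstration, not values from the slides. It draws many samples of sizes 9, 10, and 10 and averages the ratio of between-group to within-group variance.

```python
import random

def variance_ratio(groups):
    """Between-group variance divided by within-group variance (MSB / MSW)."""
    n = sum(len(g) for g in groups)
    k = len(groups)
    grand = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ssb = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ssw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ssb / (k - 1)) / (ssw / (n - k))

def average_ratio(true_means, sizes=(9, 10, 10), sd=1.8, reps=2000, seed=0):
    """Average MSB / MSW over many samples from hypothetical normal populations."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        groups = [[rng.gauss(mu, sd) for _ in range(size)]
                  for mu, size in zip(true_means, sizes)]
        total += variance_ratio(groups)
    return total / reps

same_population = average_ratio(true_means=(4.2, 4.2, 4.2))
different_populations = average_ratio(true_means=(5.8, 3.6, 3.3))
print(f"average ratio, identical populations: {same_population:.2f}")
print(f"average ratio, different populations: {different_populations:.2f}")
```

When all three groups come from the same population, the ratio hovers around 1; when the population means differ, the between-group variance is typically several times the within-group variance.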

ONE-WAY ANOVA

• CRITERION USED BY ANOVA: Groups can be considered different if there exist…?

• …if there exist larger differences among these groups than there are among members within them.

• QUESTION:

• Given the above, what would one have to do to conduct ANOVA?

• That is, what do you have to do to judge whether or not two or more groups can be considered different/equal (with respect to a given characteristic)?

Compute the differences that exist among these groups, and

Compare them with the differences that exist within these groups.

And, that is exactly what ANOVA does….

ONE-WAY ANOVA

QUESTION: How do we usually measure differences/variations?

VARIANCE: A useful index of differences/variations/dispersion among a set of values/scores.

Estimate of the average (i.e., per observation) squared difference from the mean.

Computation?

$$S^2 = \frac{\text{Sum of squared deviations from the mean}}{\text{Sample Size} - 1} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
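As a quick illustration of the variance formula, the sketch below applies it to the nine banking EPS values from Table 1. The function name `sample_variance` is an illustrative choice; Python's standard library offers the same calculation as `statistics.variance`.

```python
from statistics import mean, variance

# Banking EPS values from Table 1.
banking_eps = [6.42, 2.83, 8.94, 6.80, 5.70, 4.65, 6.20, 2.71, 8.34]

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by (sample size - 1)."""
    m = mean(values)
    return sum((x - m) ** 2 for x in values) / (len(values) - 1)

s2 = sample_variance(banking_eps)
print(f"S^2 for the banking EPS values: {s2:.2f}")

# statistics.variance uses the same n - 1 denominator, so the two agree.
assert abs(s2 - variance(banking_eps)) < 1e-9
```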

So, steps in performing ANOVA:

1. Compute the BETWEEN-GROUP VARIANCE for the characteristic under study (i.e., the dependent variable),

2. Compute the WITHIN-GROUP VARIANCE for the same characteristic/variable, and then

3. COMPARE the two (i.e., check to see if Between-Group Var. > Within-Group Var.)

NOTE: In ANOVA the term “MEAN SQUARE,” rather than variance, is utilized.

ONE-WAY ANOVA


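Total WITHIN Group Variance (or Mean Square WITHIN)?

Mean Square WITHIN Groups (MSW):

$$MSW = \frac{\sum_i (x_{iB} - \bar{x}_B)^2 + \sum_i (x_{iR} - \bar{x}_R)^2 + \sum_i (x_{iU} - \bar{x}_U)^2}{(n_B - 1) + (n_R - 1) + (n_U - 1)} = 3.35$$

Let’s see what we just did: we pooled the squared deviations of each firm's EPS from its own group mean and divided by the “Degrees of Freedom” $= (n_B - 1) + (n_R - 1) + (n_U - 1) = 8 + 9 + 9 = 26$.

The generic mathematical formula for MSW, for $k$ groups and $n$ observations in total:

$$MSW = \frac{\sum_{j=1}^{k} \sum_{i=1}^{n_j} (x_{ij} - \bar{x}_j)^2}{n - k}$$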
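The within-group mean square can be verified directly from the Table 1 data. This is a minimal sketch; the helper name `mean_square_within` and the dictionary `eps` are illustrative, not taken from the slides.

```python
# EPS values copied from Table 1.
eps = {
    "Banking": [6.42, 2.83, 8.94, 6.80, 5.70, 4.65, 6.20, 2.71, 8.34],
    "Retailing": [3.52, 4.21, 4.36, 2.67, 3.49, 4.68, 3.30, 2.68, 7.25, 0.16],
    "Utility": [3.55, 2.13, 3.24, 6.47, 3.06, 1.80, 5.29, 2.96, 2.90, 1.73],
}

def mean_square_within(groups):
    """Return (SSW, degrees of freedom, MSW) for an iterable of groups."""
    ssw = 0.0
    df_within = 0
    for values in groups:
        group_mean = sum(values) / len(values)
        # Squared deviations of each observation from its OWN group mean.
        ssw += sum((x - group_mean) ** 2 for x in values)
        # Each group contributes (group size - 1) degrees of freedom.
        df_within += len(values) - 1
    return ssw, df_within, ssw / df_within

ssw, df_within, msw = mean_square_within(eps.values())
print(f"SSW = {ssw:.2f}, df = {df_within}, MSW = {msw:.2f}")
```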


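Let’s now compute the BETWEEN Group Variance (Mean Square BETWEEN, MSB).

Mean Square BETWEEN Groups (MSB):

$$MSB = \frac{n_B (\bar{x}_B - \bar{\bar{x}})^2 + n_R (\bar{x}_R - \bar{\bar{x}})^2 + n_U (\bar{x}_U - \bar{\bar{x}})^2}{k - 1} = 17.698$$

Let’s see what we just did: we took the squared deviation of each group mean from the grand mean, weighted by the respective group sizes, and divided the sum by the Degrees of Freedom $= k - 1 = 3 - 1 = 2$.

Mathematical formula for MSB:

$$MSB = \frac{\sum_{j=1}^{k} n_j (\bar{x}_j - \bar{\bar{x}})^2}{k - 1}$$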
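The between-group mean square can be checked the same way. Again a sketch only, with illustrative names (`mean_square_between`, `eps`); it uses unrounded group means, which is why it matches the reported 17.698 more closely than a hand calculation from the rounded means would.

```python
# EPS values copied from Table 1.
eps = {
    "Banking": [6.42, 2.83, 8.94, 6.80, 5.70, 4.65, 6.20, 2.71, 8.34],
    "Retailing": [3.52, 4.21, 4.36, 2.67, 3.49, 4.68, 3.30, 2.68, 7.25, 0.16],
    "Utility": [3.55, 2.13, 3.24, 6.47, 3.06, 1.80, 5.29, 2.96, 2.90, 1.73],
}

def mean_square_between(groups):
    """Return (SSB, degrees of freedom, MSB) for an iterable of groups."""
    groups = [list(g) for g in groups]
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    # Squared deviation of each group mean from the grand mean,
    # weighted by the size of that group.
    ssb = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    df_between = len(groups) - 1
    return ssb, df_between, ssb / df_between

ssb, df_between, msb = mean_square_between(eps.values())
print(f"SSB = {ssb:.2f}, df = {df_between}, MSB = {msb:.3f}")
```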

ONE-WAY ANOVA

• Mean Square Between Groups = MSB = 17.698

• MSB represents the portion of the total differences/variations in EPS (the dependent variable) that is attributable to (or explained by) differences BETWEEN groups (e.g., industries)

• That is, the part of differences in companies’ EPS that results from whether they are banks, retailers, or utilities.

ONE-WAY ANOVA

• Mean Square Within Groups (MSResidual/Error) = MSW = 3.35

• MSW represents:

• The differences in EPS (the dependent variable) that are due to all other factors that are not examined and not controlled for in the study (e.g., diversification level, firm size, etc.)

• Plus . . .

• The natural variability of EPS (the dependent variable) among members within each of the comparison groups (Note that even banks with the same size and same level of diversification would have different EPS levels).

ONE-WAY ANOVA

• Now, let’s compare MSB & MSW:

• MSB = 17.698 and MSW = 3.35. QUESTION: Based on the logic of ANOVA, when would we consider two (or more) groups as different/unequal?

• When MSB is significantly larger than MSW.

• QUESTION:

• What would be a reasonable index (a single number) that will show how large MSB is compared to MSW?

• (i.e., a single number that will show if MSB is larger than, equal to, or smaller than MSW)?

Compare BETWEEN and WITHIN Group Variances/Mean Squares: Compute the F-Ratio

• Ratio of MSB and MSW (call it the F-ratio): $F = \dfrac{MSB}{MSW} = \dfrac{17.698}{3.35} = 5.28$

• What can we infer when the F-ratio is close to 1?

• MSB and MSW are likely to be equal and, thus, there is a strong likelihood that NO difference exists among the comparison groups.

• How about when the F-ratio is significantly larger than 1?

• The more the F-ratio exceeds 1, the larger MSB is compared to MSW and, thus, the stronger would be the likelihood/evidence that group difference(s) exist.

• Results of the above computations are usually summarized in an ANOVA TABLE such as the one that follows:

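ANOVA TABLE

| Source of Variation | Sum of Squares | df | Mean Square | F |
|---|---|---|---|---|
| Between Groups | 35.40 | 2 | 17.698 | 5.28 |
| Within Groups | 87.11 | 26 | 3.35 | |
| Total | 122.51 | 28 | | |

For our sample companies, the EPS difference across the three industries (MSB) is more than 5 times the EPS difference among firms within the industries (MSW).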
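Putting the pieces together, the whole ANOVA table for the EPS example can be generated from the raw data. The sketch below is one possible layout, not SPSS output; the function name `one_way_anova` and the row format are illustrative assumptions.

```python
# EPS values copied from Table 1.
eps = {
    "Banking": [6.42, 2.83, 8.94, 6.80, 5.70, 4.65, 6.20, 2.71, 8.34],
    "Retailing": [3.52, 4.21, 4.36, 2.67, 3.49, 4.68, 3.30, 2.68, 7.25, 0.16],
    "Utility": [3.55, 2.13, 3.24, 6.47, 3.06, 1.80, 5.29, 2.96, 2.90, 1.73],
}

def one_way_anova(groups):
    """Return the rows of a one-way ANOVA table as a list of dicts."""
    groups = [list(g) for g in groups]
    n, k = sum(len(g) for g in groups), len(groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ssw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    msb, msw = ssb / (k - 1), ssw / (n - k)
    return [
        {"source": "Between groups", "ss": ssb, "df": k - 1, "ms": msb, "f": msb / msw},
        {"source": "Within groups", "ss": ssw, "df": n - k, "ms": msw, "f": None},
        {"source": "Total", "ss": ssb + ssw, "df": n - 1, "ms": None, "f": None},
    ]

table = one_way_anova(eps.values())
print(f"{'Source':<16}{'SS':>9}{'df':>5}{'MS':>9}{'F':>7}")
for row in table:
    ms = f"{row['ms']:.3f}" if row["ms"] is not None else ""
    f_ratio = f"{row['f']:.2f}" if row["f"] is not None else ""
    print(f"{row['source']:<16}{row['ss']:>9.2f}{row['df']:>5}{ms:>9}{f_ratio:>7}")
```

The between and within sums of squares add up to the total sum of squares, which is the "partitioning of variance" that gives ANOVA its name.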

QUESTION: What is our null Hypothesis?

QUESTION: Is the above F-ratio of 5.28 large enough to warrant rejecting the null?

ANSWER: It would be if the chance of being wrong (in rejecting the null) does not exceed 5%.

So, look up the F-value in the table of the F-distribution (under the appropriate degrees of freedom) to find out what the α-level will be if, given this F-value, we decide to reject the null.

Degrees of Freedom: v1 = k – 1 = 2

v2 = n – k = 26

ONE-WAY ANOVA

Interpretation and Conclusion:

QUESTION: What does the F = 5.28 mean, intuitively?

• F = 3.37 is significant at α = 0.05.

• That is, if F = 3.37 and we reject H0, we would face a 5% chance of being wrong.

• Our F = 5.28 > 3.37

• So, what can we say about our α-level?

• F = 4.27 is significant at α = 0.025.

• That is, if F = 4.27 and we reject H0, we would face a 2.5% chance of being wrong.

• But, our F = 5.28 > 4.27

• So, what can we say about our α-level? Will it be larger or smaller than 0.025?

The odds of being wrong, if we decide to reject the null, would be less than 2.5% (i.e., α < 0.025).

Would rejecting the null be a safe bet?

Conclusion?

Reject the null and conclude that the average EPS is NOT EQUAL FOR ALL GROUPS (industries) being compared.
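Instead of bracketing the α-level with table values, the tail probability can be computed directly. In general this requires the F-distribution from a statistics package, but when the numerator degrees of freedom equal 2 (as here, with k = 3 groups) the upper-tail probability has a simple closed form, $P(F > f) = (1 + 2f / v_2)^{-v_2/2}$. The sketch below relies on that special case only; the function name is illustrative.

```python
def f_upper_tail_num_df_2(f_value, df_within):
    """P(F > f_value) for an F distribution with 2 and df_within degrees of freedom.

    This closed form is valid ONLY when the numerator degrees of freedom equal 2.
    """
    return (1 + 2 * f_value / df_within) ** (-df_within / 2)

df_within = 26  # v2 = n - k = 29 - 3
for f_value in (3.37, 4.27, 5.28):
    tail = f_upper_tail_num_df_2(f_value, df_within)
    print(f"F = {f_value:.2f}: chance of being wrong if H0 is rejected = {tail:.4f}")
```

The two table values reproduce the 0.05 and 0.025 levels, and the observed F = 5.28 falls below 0.025, in line with the conclusion above.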

ONE-WAY ANOVA

Is our analysis complete?

It would be if we were comparing only two groups; simply examine which sample mean is larger than which and report!!

HOWEVER, …

If the null is rejected and more than two groups are being compared:

REMAINING QUESTION: Where exactly (i.e., between which groups) do the differences lie? And, which group(s) of firms exhibit relatively higher, lower, or equal EPS levels?

ANSWER: Perform post hoc, multiple comparison tests.

SPSS (and other software packages) offer a variety of options (e.g., LSD, Bonferroni, Tukey, etc.) to choose from.
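To give a feel for what a post hoc procedure does, here is a sketch of pairwise comparisons on the EPS data. It uses Scheffé's method, which is NOT one of the options named above (LSD, Bonferroni, Tukey); it is swapped in here only because, with three groups, its p-value can be computed without a statistics package. The results of Tukey or Bonferroni tests in SPSS may differ somewhat. All names in the code are illustrative.

```python
from itertools import combinations

# EPS values copied from Table 1.
eps = {
    "Banking": [6.42, 2.83, 8.94, 6.80, 5.70, 4.65, 6.20, 2.71, 8.34],
    "Retailing": [3.52, 4.21, 4.36, 2.67, 3.49, 4.68, 3.30, 2.68, 7.25, 0.16],
    "Utility": [3.55, 2.13, 3.24, 6.47, 3.06, 1.80, 5.29, 2.96, 2.90, 1.73],
}

k = len(eps)
assert k == 3  # the closed-form tail probability below needs k - 1 == 2
n = sum(len(v) for v in eps.values())
means = {name: sum(v) / len(v) for name, v in eps.items()}
ssw = sum((x - means[name]) ** 2 for name, v in eps.items() for x in v)
df_within = n - k
msw = ssw / df_within  # pooled within-group variance from the overall ANOVA

def scheffe_p_value(a, b):
    """Scheffe-adjusted p-value for the pairwise contrast mean(a) - mean(b)."""
    diff = means[a] - means[b]
    se_sq = msw * (1 / len(eps[a]) + 1 / len(eps[b]))
    f_s = diff ** 2 / (se_sq * (k - 1))
    # Upper tail of F(2, df_within); valid only for numerator df = 2.
    return (1 + 2 * f_s / df_within) ** (-df_within / 2)

p_values = {pair: scheffe_p_value(*pair) for pair in combinations(eps, 2)}
for (a, b), p in p_values.items():
    print(f"{a} vs {b}: mean difference = {means[a] - means[b]:+.2f}, p = {p:.3f}")
```

Under this method, banks differ from both retailers and utilities at the 0.05 level, while retailers and utilities do not differ from each other.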

Let’s now review the steps involved…

ONE-WAY ANOVA

Overall H0: All group means are equal. H1: Not all group means are equal.

Is the overall F significant (i.e., α < 0.05)?

• No (α > .05): Don’t reject H0; no group difference found. Stop.

• Yes (α < .05): Reject H0; not all group means are equal (i.e., at least 2 groups are different). How many groups are being compared?

    • If only 2: Examine the group means. Report which group has the higher/lower mean. Stop.

    • If more than 2: Conduct post hoc pairwise comparison tests to see where the differences lie. Examine the results. Examine the group means. Report which groups have higher/lower means. Stop.
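The decision procedure in the flowchart can be written down as a small function. This is only a sketch of the logic; the function name `next_steps` and the wording of the returned steps are illustrative, and the p-value in the usage example is a placeholder consistent with α < 0.025 rather than SPSS output.

```python
def next_steps(p_value, n_groups, alpha=0.05):
    """Return the follow-up actions the one-way ANOVA flowchart prescribes."""
    if p_value >= alpha:
        return ["Do not reject H0: no group difference found", "Stop"]
    steps = ["Reject H0: not all group means are equal"]
    if n_groups > 2:
        # With more than two groups, the overall F does not say WHERE
        # the differences lie, so pairwise tests are needed.
        steps.append("Conduct post hoc pairwise comparison tests")
    steps += [
        "Examine the group means",
        "Report which group(s) have higher/lower means",
        "Stop",
    ]
    return steps

for step in next_steps(p_value=0.012, n_groups=3):
    print("-", step)
```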

ANOVA in SPSS

Let’s now use SPSS to perform the same analysis.

NOTE: Students are supposed to have printed and brought the “SPSS OUTPUT One-Way ANOVA” PDF file with them to class.

ONE_WAY_EPS_SPSS_FILE

TWO-WAY ANOVA (with Interaction)

In our EPS example, suppose you suspect that a company’s size category (small vs. large) may also have a significant effect on EPS. Since you did not attempt to control for company size when selecting your sample firms, small and large companies may not have been equally represented in the three industry groups (e.g., what if, compared to the banks in the sample, all or a much greater % of retailers and utilities were small?). You are therefore concerned that the potential confounding effect of company size may have distorted your earlier results. So, you now wish to examine possible EPS differences among the 3 industries while controlling for the possible confounding effect of company size (i.e., holding size constant/equal for the firms in our three industries). In other words, you wish to know if there are any differences among average EPS of banks, retailers, and utilities of equal size.

• So, Two-Way ANOVA will help us learn if banks in general, even after controlling for company size, would, on average, have higher EPS than retailers and utilities.

• But an additional advantage of Two-Way ANOVA is that it can also show us whether CERTAIN COMBINATIONS of industry and size category (e.g., a particular group of banks) are more/less conducive to EPS than other combinations of the two characteristics.

• As just one example, it can show us if only the larger banks (and not all banks in general) have significantly higher EPS compared to firms in the other two industries (or compared to only the smaller firms in the other two industries).
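One common textbook way to write the two-way model with interaction is shown below. The notation is an assumption introduced here for illustration (the slides do not give a formula): $i$ indexes industry, $j$ indexes size category, and $k$ indexes firms within each industry-by-size cell.

```latex
EPS_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}
% \mu                 : overall mean EPS
% \alpha_i            : main effect of industry i (bank, retailer, utility)
% \beta_j             : main effect of size category j (small, large)
% (\alpha\beta)_{ij}  : interaction effect of the industry-size combination
% \varepsilon_{ijk}   : residual for firm k in cell (i, j)
```

In these terms, the main effects answer whether industries (or size categories) differ on average, while a nonzero interaction term corresponds to cases such as only the larger banks having higher EPS.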

TWO-WAY ANOVA (with Main & Interaction Effects):

Analyze: General Linear Models

Univariate: Y to “Dependent” box, Categorical X1 & X2 to the “Fixed Factors” box

Model: Full, Continue

Plots: X1 to “Horizontal”, X2 to “Separate Lines”, Add, Continue

Post Hoc: Move factors (IVs) with >2 groups to “Post Hoc Tests” box, select “Tukey” or “Bonferroni”, Continue

Options: Move Overall, X1, X2, and X1*X2 to “Display Means” Box, check “Descriptive Stats.”, Continue

OK

NOTE: Students are supposed to have printed and brought the “SPSS OUTPUT Two-Way ANOVA with Interaction” PDF file with them to class.

ANOVA Using SPSS

TWO_WAY_EPS_SPSS_FILE

TWO-WAY ANOVA (Main & Interaction Effects Model)

H0: There are no differences among the groups represented by either variable.

Is the overall F significant (i.e., α < 0.05)?

• No: Don’t reject H0; no group difference found. STOP.

• Yes: Reject H0; some differences exist among the groups represented by at least one of the variables. Determine if the interaction effect is significant.

    • YES: Examine the plot of the interaction effect for results. STOP.

    • NO:

        a. Examine which main effect, if any, is significant (i.e., differences exist across categories of which independent variable).

        b. Is the significant independent variable dichotomous (i.e., does it represent only 2 groups)?

            • Yes, only 2 groups: Examine the group means for that variable; report which group has the higher/lower mean. STOP.

            • No, more than 2 groups: Conduct post hoc pairwise comparison tests for that variable to see where the differences lie. Examine the results. Examine the group means for that variable; report which groups have higher/lower means. STOP.

ANOVA

CAUTION:

Don’t get carried away with the number of factors (independent categorical variables); DON’T DO N-WAY ANOVA !!!

ANOTHER EXAMPLE:

Using the gss.sav data file, we wish to find out if the age at which one gets married (agewed) is a function of one’s gender (sex) and highest educational degree (degree). That is, if average marriage age is different among the two genders and various educational groups. If so, in what way?

NOTE: Here, we are considering/treating educational degree as a nominal/categorical variable, and NOT as an ordered metric variable.

ANOVA Using SPSS

ASSIGNMENT 4

1. Suppose, as a social scientist, you are interested in studying gender differences in preference for different types of music. Specifically, you wish to know if there are differences between men and women relative to how much they like classical music (variable classical). The gss.sav data file (on your SPSS Data Disk) includes data regarding such issues. This data set represents 1500 randomly selected cases from the 1993 General Social Survey. Use the data from this SPSS file to address the above questions.

NOTE:

If you check the value labels for the variables classical, opera, and country in the gss.sav file, you will see that they were measured on 5-point scales (1=Like Very Much, 5=Dislike Very Much) and, thus, can be considered metric.

2. As a staff researcher in the HR Department of a major company, you are interested in learning if there are differences among male and female employees and among employees who have different levels of education regarding the level of importance that they attach (a) to having a fulfilling job. Data regarding such issues have been obtained through the General Social Survey using a representative sample of approximately 1500 working men and women in the U.S. You have access to the resulting data (see gss.sav SPSS data file, variables sex, impjob, and degree). Use this data set to address the above issues.

IMPORTANT NOTES FOR QUESTIONS 2, 3, AND 4:

• Treat variable “degree” as a categorical/nominal variable.

• When interpreting the results, please pay attention to the value labels for the dependent variables: they were measured on 5-point scales (1=One of Most Important, 5=Not at All Important).

• If you find it necessary to conduct post hoc multiple comparison tests, use the Tukey option.

• IMPORTANT: If alpha level for a given test is just slightly higher than 0.05 (e.g., 0.054) consider that difference statistically significant.

REMINDERS:

• For each analysis, include the Notes part of the SPSS output in the printout. Also edit the first page of every output to include your name. Make sure that you state your complete interpretations and explanations on the appropriate pages of the output. Be specific as to how you have used what parts of the output to reach your conclusions. Make sure that your explanations are complete. For example, it is not enough to say that there is a difference between groups A and B regarding characteristic C. You have to go on to indicate how the two groups are different on characteristic C (e.g., “on average, group A exhibits more/less of the characteristic C”).

QUESTIONS OR COMMENTS