
Applied Statistics Using SAS and SPSS PowerPoint PPT Presentation



Be humble in our attribute, be loving and varying in our attitude, that is the way to live in heaven. Applied Statistics Using SAS and SPSS. Topic: One Way ANOVA By Prof Kelly Fan, Cal State Univ, East Bay. Statistical Tools vs. Variable Types. Example: Battery Lifetime.


Presentation Transcript


Be humble in our attribute, be loving and varying in our attitude, that is the way to live in heaven.


Applied Statistics Using SAS and SPSS

Topic: One Way ANOVA

By Prof Kelly Fan, Cal State Univ, East Bay


Statistical Tools vs. Variable Types


Example: Battery Lifetime

  • 8 brands of battery are studied. We would like to find out whether or not the brand of a battery affects its lifetime and, if so, which brand's batteries last longer than the others.

  • Data collection: For each brand, 3 batteries are tested for their lifetime.

  • What is the Y variable? The X variable?


Data: Y = LIFETIME (HOURS), 3 replications per level of BRAND

BRAND         1     2     3     4     5     6     7     8

            1.8   4.2   8.6   7.0   4.2   4.2   7.8   9.0
            5.0   5.4   4.6   5.0   7.8   4.2   7.0   7.4
            1.0   4.2   4.2   9.0   6.6   5.4   9.8   5.8

Mean        2.6   4.6   5.8   7.0   6.2   4.6   8.2   7.4

Grand mean = 5.8
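As a quick check on the data, the brand means and the grand mean can be recomputed from the 24 observations. This is only a sketch in Python (the lecture itself uses SAS and SPSS); the names `lifetimes`, `brand_means`, and `grand_mean` are made up for illustration.

```python
# Battery lifetimes (hours): Y = lifetime, X = brand, 3 replications per brand.
lifetimes = {
    1: [1.8, 5.0, 1.0],
    2: [4.2, 5.4, 4.2],
    3: [8.6, 4.6, 4.2],
    4: [7.0, 5.0, 9.0],
    5: [4.2, 7.8, 6.6],
    6: [4.2, 4.2, 5.4],
    7: [7.8, 7.0, 9.8],
    8: [9.0, 7.4, 5.8],
}

# Mean lifetime within each brand (one value per level of X).
brand_means = {brand: sum(ys) / len(ys) for brand, ys in lifetimes.items()}

# Mean of all 24 observations.
all_values = [y for ys in lifetimes.values() for y in ys]
grand_mean = sum(all_values) / len(all_values)

for brand, mean in brand_means.items():
    print(f"brand {brand}: mean lifetime = {mean:.1f} h")
print(f"grand mean = {grand_mean:.1f} h")
```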


Statistical Model

(Brand is, of course, represented as “categorical”.)

Data layout (columns are the levels of BRAND, rows are the replications):

                     “LEVEL” OF BRAND
Replication      1       2      • • •     C

     1          Y11     Y21     • • •    YC1
     2          Y12     Y22     • • •    YC2
     •           •       •      Yij       •
     n          Y1n     Y2n     • • •    YCn

$Y_{ij} = \mu_i + \varepsilon_{ij}$

$i = 1, \dots, C$

$j = 1, \dots, n$


Hypotheses Setup

H0: Level of X has no impact on Y

H1: Level of X does have an impact on Y

H0: $\mu_1 = \mu_2 = \cdots = \mu_8$

H1: not all $\mu_i$ are equal


ONE WAY ANOVA

Analysis of Variance for life

Source    DF       SS      MS      F       P
brand      7    69.12    9.87   3.38   0.021
Error     16    46.72    2.92
Total     23   115.84

MS Error = 2.92 is the estimate of the common variance $\sigma^2$.

S = 1.709   R-Sq = 59.67%   R-Sq(adj) = 42.02%
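The entries of this table can be reproduced by hand from the battery data. The sketch below uses plain Python rather than SAS or SPSS, and the variable names are illustrative. It does not compute the P value, which needs the F distribution with 7 and 16 degrees of freedom.

```python
import math

# Battery lifetimes (hours), one list per brand.
groups = [
    [1.8, 5.0, 1.0], [4.2, 5.4, 4.2], [8.6, 4.6, 4.2], [7.0, 5.0, 9.0],
    [4.2, 7.8, 6.6], [4.2, 4.2, 5.4], [7.8, 7.0, 9.8], [9.0, 7.4, 5.8],
]

n_total = sum(len(g) for g in groups)          # 24 observations
n_levels = len(groups)                         # 8 brands
grand_mean = sum(sum(g) for g in groups) / n_total

# Between-brand and within-brand (error) sums of squares.
ss_brand = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
ss_error = sum((y - sum(g) / len(g)) ** 2 for g in groups for y in g)
ss_total = ss_brand + ss_error

df_brand = n_levels - 1                        # 7
df_error = n_total - n_levels                  # 16
ms_brand = ss_brand / df_brand
ms_error = ss_error / df_error                 # estimate of the common variance
f_stat = ms_brand / ms_error

s = math.sqrt(ms_error)
r_sq = ss_brand / ss_total
r_sq_adj = 1 - (1 - r_sq) * (n_total - 1) / df_error

print(f"brand  DF={df_brand:2d}  SS={ss_brand:7.2f}  MS={ms_brand:5.2f}  F={f_stat:.2f}")
print(f"Error  DF={df_error:2d}  SS={ss_error:7.2f}  MS={ms_error:5.2f}")
print(f"Total  DF={n_total - 1:2d}  SS={ss_total:7.2f}")
print(f"S = {s:.3f}  R-Sq = {100 * r_sq:.2f}%  R-Sq(adj) = {100 * r_sq_adj:.2f}%")
```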


Review

  • Fitted value = Predicted value

  • Residual = Observed value – fitted value
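In a one-way ANOVA the fitted value for an observation is the mean of its group, so the residuals within each group sum to zero. A minimal sketch for the battery data, with illustrative names:

```python
# Battery lifetimes (hours), keyed by brand.
lifetimes = {
    1: [1.8, 5.0, 1.0], 2: [4.2, 5.4, 4.2], 3: [8.6, 4.6, 4.2], 4: [7.0, 5.0, 9.0],
    5: [4.2, 7.8, 6.6], 6: [4.2, 4.2, 5.4], 7: [7.8, 7.0, 9.8], 8: [9.0, 7.4, 5.8],
}

rows = []  # (brand, observed, fitted, residual)
for brand, ys in lifetimes.items():
    fitted = sum(ys) / len(ys)            # fitted value = brand mean
    for y in ys:
        rows.append((brand, y, fitted, y - fitted))

for brand, observed, fitted, residual in rows[:3]:
    print(f"brand {brand}: observed={observed:.1f} fitted={fitted:.1f} residual={residual:+.1f}")
```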


Diagnosis: Normality

  • The points on the normality plot must more or less follow a straight line for us to claim the residuals are “normally distributed”.

  • There are statistical tests to check this formally.

  • The ANOVA method we learn here is not sensitive to the normality assumption. That is, a mild departure from the normal distribution will not change our conclusions much.

Normality plot: normal scores vs. residuals


[Figure: normality plot for the battery lifetime data]
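A rough sketch of how the coordinates of such a plot can be computed for the battery residuals. The plotting positions `(i - 0.375) / (n + 0.25)` are an assumed convention (Blom's); statistical packages may use slightly different ones. The correlation `r` between normal scores and sorted residuals is an informal summary of how straight the plot is, not a formal test.

```python
import math
from statistics import NormalDist

groups = [
    [1.8, 5.0, 1.0], [4.2, 5.4, 4.2], [8.6, 4.6, 4.2], [7.0, 5.0, 9.0],
    [4.2, 7.8, 6.6], [4.2, 4.2, 5.4], [7.8, 7.0, 9.8], [9.0, 7.4, 5.8],
]

# Residual = observed value - fitted value (the brand mean), sorted ascending.
residuals = sorted(y - sum(g) / len(g) for g in groups for y in g)
n = len(residuals)

# Normal score for the i-th smallest residual (assumed Blom plotting positions).
normal_scores = [NormalDist().inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]

def correlation(xs, ys):
    """Pearson correlation, written out to avoid version-specific helpers."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

r = correlation(normal_scores, residuals)
for score, res in zip(normal_scores, residuals):
    print(f"{score:+.3f}  {res:+.2f}")
print(f"correlation between normal scores and residuals: {r:.3f}")
```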


Diagnosis: Equal Variances

  • The points on the residual plot must fall more or less within a horizontal band for us to claim “constant variances”.

  • There are statistical tests to check this formally.

  • The ANOVA method we learn here is not sensitive to the constant variances assumption. That is, slightly different variances within groups will not change our conclusions much.

Residual plot: fitted values vs. residuals


[Figure: residual plot for the battery lifetime data]
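Alongside the plot, the sample variance of each brand can be listed directly. The sketch below is informal: with only 3 batteries per brand the individual variances are very noisy, and the largest-to-smallest ratio is shown only as a descriptive number, not a test. In a balanced design the average of the group variances equals the MS Error of the ANOVA table.

```python
from statistics import mean, variance

lifetimes = {
    1: [1.8, 5.0, 1.0], 2: [4.2, 5.4, 4.2], 3: [8.6, 4.6, 4.2], 4: [7.0, 5.0, 9.0],
    5: [4.2, 7.8, 6.6], 6: [4.2, 4.2, 5.4], 7: [7.8, 7.0, 9.8], 8: [9.0, 7.4, 5.8],
}

# Sample variance (divisor n - 1) within each brand.
group_variances = {brand: variance(ys) for brand, ys in lifetimes.items()}

# Equal group sizes: the pooled variance is the plain average of the group variances.
pooled_variance = mean(group_variances.values())
ratio = max(group_variances.values()) / min(group_variances.values())

for brand, var in group_variances.items():
    print(f"brand {brand}: sample variance = {var:.2f}")
print(f"pooled variance (MS Error) = {pooled_variance:.2f}")
print(f"largest / smallest variance = {ratio:.1f}")
```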


Multiple Comparison Procedures

Once we reject H0: $\mu_1 = \mu_2 = \cdots = \mu_C$ in favor of H1: NOT all $\mu$’s are equal, we don’t yet know the way in which they’re not all equal, but simply that they’re not all the same. If there are 4 columns, are all 4 $\mu$’s different? Are 3 the same and one different? If so, which one? etc.


These “more detailed” inquiries into the process are called MULTIPLE COMPARISON PROCEDURES.

Errors (Type I):

We set up $\alpha$ as the significance level for a hypothesis test. Suppose we test 3 independent hypotheses, each at $\alpha = .05$; each test has a type I error probability (rejecting H0 when it is true) of .05. However,

$P(\text{at least one type I error in the 3 tests}) = 1 - P(\text{accept all 3, given all 3 true}) = 1 - (.95)^3 \approx .14$

In other words, the probability is .14 that at least one type I error is made. For 5 tests, the probability is .23.
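The arithmetic for 3 and 5 independent tests can be checked with a short function. This is a sketch; the function name `overall_error_rate` is made up for illustration, and the formula assumes the tests are independent.

```python
def overall_error_rate(alpha: float, n_tests: int) -> float:
    """P(at least one type I error) across n_tests independent tests, each at level alpha."""
    return 1 - (1 - alpha) ** n_tests

for k in (1, 3, 5):
    print(f"{k} test(s) at alpha = .05: overall error rate = {overall_error_rate(0.05, k):.3f}")
```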

Question: Should we choose $\alpha = .05$, and suffer (for 5 tests) a .23 OVERALL error rate (or “a”, or $a_{\text{experimentwise}}$)?

OR

Should we choose/control the overall error rate, “a”, to be .05, and find the individual test $\alpha$ from $1 - (1-\alpha)^5 = .05$ (which gives us $\alpha \approx .010$)?
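Solving $1 - (1-\alpha)^k = a$ for the individual level gives $\alpha = 1 - (1-a)^{1/k}$. A small sketch of that calculation, again assuming independent tests; the function name is illustrative.

```python
def individual_alpha(overall: float, n_tests: int) -> float:
    """Per-test level alpha such that n_tests independent tests have the given overall error rate."""
    return 1 - (1 - overall) ** (1 / n_tests)

alpha = individual_alpha(0.05, 5)
print(f"individual alpha for 5 tests, overall .05: {alpha:.4f}")

# Plugging alpha back in recovers the overall rate.
print(f"check: 1 - (1 - alpha)^5 = {1 - (1 - alpha) ** 5:.4f}")
```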


The formula

$1 - (1-\alpha)^5 = .05$

would be valid only if the tests are independent; often they’re not.

[ e.g., $\mu_1 = \mu_2$, $\mu_2 = \mu_3$, $\mu_1 = \mu_3$:

IF $\mu_1 = \mu_2$ is accepted & $\mu_2 = \mu_3$ is rejected, isn’t it more likely that $\mu_1 = \mu_3$ is rejected? ]


When the tests are not independent, it’s usually very difficult to arrive at the correct $\alpha$ for an individual test so that a specified value results for the overall error rate.


Categories of multiple comparison tests

- “Planned” / “a priori” comparisons (stated in advance, usually a linear combination of the column means equal to zero)

- “Post hoc” / “a posteriori” comparisons (decided after a look at the data: which comparisons “look interesting”)

- “Post hoc” multiple comparisons (every column mean compared with each other column mean)


  • There are many multiple comparison procedures. We’ll cover only a few.

  • Post hoc multiple comparisons

  • Pairwise comparisons: Do a series of pairwise tests; Duncan and SNK tests

  • (Optional) Comparisons to control: Dunnett tests


Example: Broker Study

A financial firm would like to determine if brokers they use to execute trades differ with respect to their ability to provide a stock purchase for the firm at a low buying price per share. To measure cost, an index, Y, is used.

Y=1000(A-P)/A

where

P=per share price paid for the stock;

A=average of high price and low price per share, for the day.

“The higher Y is the better the trade is.”


Col: broker (R = 6 trades per broker)

Broker      1     2     3     4     5

           12     7     8    21    24
            3    17     1    10    13
            5    13     7    15    14
           -1    11     4    12    18
           12     7     3    20    14
            5    17     7     6    19

Mean        6    12     5    14    17

Five brokers were in the study and six trades were randomly assigned to each broker.
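Before any multiple comparison, the overall one-way ANOVA for the broker data can be worked out by hand. The sketch below uses plain Python with illustrative names; the P value is not computed because it requires the F distribution with 4 and 25 degrees of freedom.

```python
# Index Y for six trades per broker.
brokers = {
    1: [12, 3, 5, -1, 12, 5],
    2: [7, 17, 13, 11, 7, 17],
    3: [8, 1, 7, 4, 3, 7],
    4: [21, 10, 15, 12, 20, 6],
    5: [24, 13, 14, 18, 14, 19],
}

means = {b: sum(ys) / len(ys) for b, ys in brokers.items()}
n_total = sum(len(ys) for ys in brokers.values())
grand_mean = sum(sum(ys) for ys in brokers.values()) / n_total

# Between-broker and within-broker sums of squares.
ss_between = sum(len(ys) * (means[b] - grand_mean) ** 2 for b, ys in brokers.items())
ss_within = sum((y - means[b]) ** 2 for b, ys in brokers.items() for y in ys)

df_between = len(brokers) - 1          # 4
df_within = n_total - len(brokers)     # 25
ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_stat = ms_between / ms_within

print("broker means:", means)
print(f"SS between = {ss_between:.1f}, SS within = {ss_within:.1f}")
print(f"MS between = {ms_between:.2f}, MS within = {ms_within:.2f}, F = {f_stat:.2f}")
```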


SPSS Output

Analyze>>General Linear Model>>Univariate…


Homogeneous Subsets


Conclusion : 3, 1 2 4 5

???

Conclusion : 3, 1 2, 4, 5


Brokers 1 and 3 are not significantly different from each other, but they are significantly different from the other 3 brokers.

Brokers 2 and 4 are not significantly different, and brokers 4 and 5 are not significantly different, but broker 2 is significantly different from (smaller than) broker 5.

Conclusion: (3, 1), (2, 4), (4, 5)
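The SNK and Duncan procedures both start from the ordered means and a studentized range statistic for each pair, q = (difference of means) / sqrt(MS within / R). The sketch below only computes these statistics for the broker means; the critical values, which depend on the procedure and on how many ordered means a pair spans, come from tables or software and are not computed here. The value MS within = 530 / 25 is the within-group mean square of the broker data.

```python
import math
from itertools import combinations

broker_means = {1: 6.0, 2: 12.0, 3: 5.0, 4: 14.0, 5: 17.0}
ms_within = 530 / 25     # within-group mean square for the broker data (25 df)
r = 6                    # trades per broker

# Brokers ordered from smallest to largest mean.
ordered = sorted(broker_means, key=broker_means.get)
std_error = math.sqrt(ms_within / r)

q_stats = {}
for pos_a, pos_b in combinations(range(len(ordered)), 2):
    a, b = ordered[pos_a], ordered[pos_b]
    span = pos_b - pos_a + 1                    # number of ordered means spanned
    q = (broker_means[b] - broker_means[a]) / std_error
    q_stats[(a, b)] = q
    print(f"brokers {a} vs {b}: spans {span} means, q = {q:.2f}")
```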


Comparisons to Control

Dunnett’s test

Designed specifically for (and incorporating the interdependencies of) comparing several “treatments” to a “control.”

Example (R = 6 observations per column; column 1 is the CONTROL):

Col      1 (CONTROL)    2     3     4     5
Mean     6             12     5    14    17

In our example:

- Cols 4 and 5 differ from the control [1].

- Cols 2 and 3 are not significantly different from the control.
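For a balanced design, the statistic behind each comparison to control is (treatment mean − control mean) / sqrt(2 · MS within / R). The sketch below computes it for the column means above, taking MS within = 530 / 25 from the broker data. The decision still requires Dunnett's tabulated critical value for 4 comparisons and 25 error degrees of freedom, which is not computed here.

```python
import math

column_means = {1: 6.0, 2: 12.0, 3: 5.0, 4: 14.0, 5: 17.0}
control = 1
ms_within = 530 / 25     # within-group mean square for the broker data (25 df)
r = 6                    # observations per column

# Standard error of the difference between two column means.
se_diff = math.sqrt(2 * ms_within / r)

dunnett_stats = {
    col: (mean - column_means[control]) / se_diff
    for col, mean in column_means.items()
    if col != control
}

for col, stat in dunnett_stats.items():
    diff = column_means[col] - column_means[control]
    print(f"col {col} vs control: difference = {diff:+.0f}, statistic = {stat:+.2f}")
```

The two largest statistics belong to columns 4 and 5, in line with the conclusion stated above.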


Exercise: Sales Data

Sales


Exercise.

  • Find the Anova table.

  • Perform SNK tests at a = 5% to group treatments.

  • Perform Duncan tests at a = 5% to group treatments.

  • Which treatment would you use?


Post Hoc and A Priori Comparisons

  • F test for a linear combination of column means (contrast)

  • Scheffé test: tests all linear combinations at once. Very conservative; not recommended when only a few comparisons are of interest.
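As an illustration of the contrast F test, the sketch below uses the broker means with a hypothetical contrast (brokers 1 and 3 against brokers 2, 4, and 5); the contrast is chosen here only as an example and is not part of the lecture. For a balanced design, SS contrast = L² / (Σ c² / R) with L = Σ c · mean, and F = SS contrast / MS within on 1 and 25 degrees of freedom. For Scheffé's test the same F is divided by (number of columns − 1) before comparison with the F critical value on 4 and 25 degrees of freedom; critical values are not computed here.

```python
broker_means = {1: 6.0, 2: 12.0, 3: 5.0, 4: 14.0, 5: 17.0}
ms_within = 530 / 25     # within-group mean square for the broker data (25 df)
r = 6                    # trades per broker

# Hypothetical contrast: average of brokers 1 and 3 versus average of brokers 2, 4, 5.
coefficients = {1: 3, 2: -2, 3: 3, 4: -2, 5: -2}
assert sum(coefficients.values()) == 0   # contrast coefficients must sum to zero

estimate = sum(coefficients[b] * broker_means[b] for b in broker_means)
ss_contrast = estimate ** 2 / (sum(c ** 2 for c in coefficients.values()) / r)

f_contrast = ss_contrast / ms_within                 # planned (a priori) contrast F
scheffe_stat = f_contrast / (len(broker_means) - 1)  # statistic used by Scheffé's test

print(f"L = {estimate:.1f}, SS contrast = {ss_contrast:.1f}")
print(f"F (planned contrast) = {f_contrast:.2f}")
print(f"F / (C - 1) for Scheffe = {scheffe_stat:.3f}")
```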

