The Use of Dummy Variables
slide2
In the examples so far, the independent variables have been continuous numerical variables.
  • Suppose that some of the independent variables are categorical.
  • Dummy variables are artificially defined variables designed to convert a model with categorical independent variables into the standard multiple regression model.
slide4
Situation:
  • k treatments or k populations are being compared.
  • For each of the k treatments we have measured both
    • Y (the response variable) and
    • X (an independent variable)
  • Y is assumed to be linearly related to X with
    • the slope dependent on treatment (population), while
    • the intercept is the same for each treatment (a sketch of this model follows).
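In symbols, a minimal sketch of this situation (notation mine; the slide's own equation was an image):

$$
Y = b_0 + b_i X + e \qquad \text{for an observation from treatment } i,\; i = 1, \dots, k,
$$

with a common intercept b0 and a treatment-specific slope bi.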
slide6
This model can be artificially put into the form of the Multiple Regression model by the use of dummy variables to handle the categorical independent variable Treatments.
  • Dummy variables are variables that are artificially defined
slide7
In this case we define a new variable for each category of the categorical variable.

That is, we will define Xi for each category of treatments as follows:
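The definition on the original slide was an image; consistent with the data file shown later (where, for example, X1 equals the amount X for brand A and is 0 otherwise), it is presumably

$$
X_i = \begin{cases} X & \text{if the observation receives treatment } i,\\ 0 & \text{otherwise,} \end{cases} \qquad i = 1, \dots, k.
$$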

slide9
In this case

Dependent Variable: Y

Independent Variables: X1, X2, ... , Xk
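The complete model is then a standard multiple regression model (a sketch in my notation; the slide's equation was an image):

$$
Y = b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_k X_k + e,
$$

where bi is the slope for treatment i and b0 is the common intercept.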

slide10
In the above situation we would likely be interested in testing the equality of the slopes. Namely the Null Hypothesis

(q = k – 1)
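The hypothesis, shown on the slide as an image, is presumably

$$
H_0: b_1 = b_2 = \cdots = b_k,
$$

which imposes q = k – 1 restrictions. It can be tested with the usual F statistic comparing the complete and reduced models,

$$
F = \frac{(\mathrm{RSS}_{\text{reduced}} - \mathrm{RSS}_{\text{complete}})/q}{\mathrm{RSS}_{\text{complete}}/(n - k - 1)} .
$$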

slide11
The Reduced Model:

Dependent Variable: Y

Independent Variable:

X = X1+ X2+... + Xk
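Since for each observation exactly one of the dummies is nonzero (and equal to X), the sum X1 + X2 + … + Xk is simply X, so the reduced model under the null hypothesis of a common slope b is

$$
Y = b_0 + b\,X + e .
$$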

slide12
Example:

In the following example we are measuring

  • Yield Y

as it depends on

  • the amount (X) of a pesticide.

Again we will assume that the dependence of Y on X will be linear.

(I should point out that the concepts that are used in this discussion can easily be adapted to the non-linear situation.)

slide13
Suppose that the experiment is going to be repeated for three brands of pesticides:
  • A, B and C.
  • The quantity, X, of pesticide in this experiment was set at 3 different levels:
    • 2 units/hectare,
    • 4 units/hectare and
    • 8 units/hectare.
  • Four test plots were randomly assigned to each of the nine combinations of brand and level of pesticide.
slide14
Note that we would expect a common intercept for each brand of pesticide, since when the amount of pesticide, X, is zero the three brands of pesticide would be equivalent.
slide15

The data for this experiment is given in the following table (four plots per brand at each level; X in units/hectare):

Brand          X = 2    X = 4    X = 8
A              29.63    28.16    28.45
               31.87    33.48    37.21
               28.02    28.13    35.06
               35.24    28.25    33.99
B              32.95    29.55    44.38
               24.74    34.97    38.78
               23.38    36.35    34.92
               32.08    38.38    27.45
C              28.68    33.79    46.26
               28.70    43.95    50.77
               22.67    36.89    50.21
               30.02    33.56    44.14

slide17

The data as it would appear in a data file. The variables X1, X2 and X3 are the “dummy” variables.

Pesticide   X (Amount)   X1   X2   X3   Y
A           2            2    0    0    29.63
A           2            2    0    0    31.87
A           2            2    0    0    28.02
A           2            2    0    0    35.24
B           2            0    2    0    32.95
B           2            0    2    0    24.74
B           2            0    2    0    23.38
B           2            0    2    0    32.08
C           2            0    0    2    28.68
C           2            0    0    2    28.70
C           2            0    0    2    22.67
C           2            0    0    2    30.02
A           4            4    0    0    28.16
A           4            4    0    0    33.48
A           4            4    0    0    28.13
A           4            4    0    0    28.25
B           4            0    4    0    29.55
B           4            0    4    0    34.97
B           4            0    4    0    36.35
B           4            0    4    0    38.38
C           4            0    0    4    33.79
C           4            0    0    4    43.95
C           4            0    0    4    36.89
C           4            0    0    4    33.56
A           8            8    0    0    28.45
A           8            8    0    0    37.21
A           8            8    0    0    35.06
A           8            8    0    0    33.99
B           8            0    8    0    44.38
B           8            0    8    0    38.78
B           8            0    8    0    34.92
B           8            0    8    0    27.45
C           8            0    0    8    46.26
C           8            0    0    8    50.77
C           8            0    0    8    50.21
C           8            0    0    8    44.14

slide18

Fitting the complete model:

ANOVA
             df    SS             MS             F              Significance F
Regression    3    1095.815813    365.2719378    18.33114788    4.19538E-07
Residual     32     637.6415754    19.92629923
Total        35    1733.457389

Coefficients
Intercept   26.24166667
X1           0.981388889
X2           1.422638889
X3           2.602400794

slide19

Fitting the reduced model:

ANOVA
             df    SS             MS             F              Significance F
Regression    1     623.8232508    623.8232508    19.11439978    0.000110172
Residual     34    1109.634138     32.63629818
Total        35    1733.457389

Coefficients
Intercept   26.24166667
X            1.668809524

slide20

The ANOVA table for testing the equality of slopes:

                     df    SS             MS             F              Significance F
Common slope zero     1     623.8232508    623.8232508    31.3065283     3.51448E-06
Slope comparison      2     471.9925627    235.9962813    11.84345766    0.000141367
Residual             32     637.6415754     19.92629923
Total                35    1733.457389
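The same analysis can be reproduced outside a spreadsheet. Below is a minimal sketch in Python (numpy only; my code, not part of the original slides) that rebuilds the dummy variables X1, X2, X3 from the data file, fits the complete and reduced models by least squares, and forms the F test for equality of slopes. It should reproduce, up to rounding, the coefficients and the F value of 11.84 shown in the tables above.

```python
import numpy as np

# (brand, amount X, yield Y): the 36 observations from the data file above
data = [
    ("A", 2, 29.63), ("A", 2, 31.87), ("A", 2, 28.02), ("A", 2, 35.24),
    ("B", 2, 32.95), ("B", 2, 24.74), ("B", 2, 23.38), ("B", 2, 32.08),
    ("C", 2, 28.68), ("C", 2, 28.70), ("C", 2, 22.67), ("C", 2, 30.02),
    ("A", 4, 28.16), ("A", 4, 33.48), ("A", 4, 28.13), ("A", 4, 28.25),
    ("B", 4, 29.55), ("B", 4, 34.97), ("B", 4, 36.35), ("B", 4, 38.38),
    ("C", 4, 33.79), ("C", 4, 43.95), ("C", 4, 36.89), ("C", 4, 33.56),
    ("A", 8, 28.45), ("A", 8, 37.21), ("A", 8, 35.06), ("A", 8, 33.99),
    ("B", 8, 44.38), ("B", 8, 38.78), ("B", 8, 34.92), ("B", 8, 27.45),
    ("C", 8, 46.26), ("C", 8, 50.77), ("C", 8, 50.21), ("C", 8, 44.14),
]

brand = np.array([d[0] for d in data])
x = np.array([d[1] for d in data], dtype=float)
y = np.array([d[2] for d in data], dtype=float)

# Dummy variables: Xi = X for brand i, 0 otherwise
x1 = np.where(brand == "A", x, 0.0)
x2 = np.where(brand == "B", x, 0.0)
x3 = np.where(brand == "C", x, 0.0)

def fit(design, y):
    """Ordinary least squares: return (residual sum of squares, coefficients)."""
    coef, _, _, _ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return float(resid @ resid), coef

ones = np.ones_like(y)
rss_complete, coef_c = fit(np.column_stack([ones, x1, x2, x3]), y)  # common intercept, 3 slopes
rss_reduced, _ = fit(np.column_stack([ones, x]), y)                 # common slope

n, p_complete, q = len(y), 4, 2          # q = k - 1 = 2 slope restrictions
df_resid = n - p_complete                # 36 - 4 = 32
F = ((rss_reduced - rss_complete) / q) / (rss_complete / df_resid)

print(coef_c)        # ~ [26.24, 0.98, 1.42, 2.60]
print(rss_complete)  # ~ 637.64
print(F)             # ~ 11.84, as in the ANOVA table above
```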

slide21

Example: Comparison of Intercepts of k Regression Lines with a Common Slope (One-way Analysis of Covariance)

slide22
Situation:
  • k treatments or k populations are being compared.
  • For each of the k treatments we have measured both Y (the response variable) and X (an independent variable)
  • Y is assumed to be linearly related to X with the intercept dependent on treatment (population), while the slope is the same for each treatment.
  • Y is called the response variable, while X is called the covariate.
slide25
This model can be artificially put into the form of the Multiple Regression model by the use of dummy variables to handle the categorical independent variable Treatments.
slide26
In this case we define a new variable for each of the first (k – 1) categories of the categorical variable.

That is, we will define Xi for each category i = 1, 2, …, (k – 1) of treatments as follows:
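The definition on the slide was an image; consistent with the workbook coding described later (X1 = 1 for workbook A, and all dummies 0 for workbook E), it is presumably the indicator coding

$$
X_i = \begin{cases} 1 & \text{if the observation is in category } i,\\ 0 & \text{otherwise,} \end{cases} \qquad i = 1, \dots, k-1,
$$

with category k serving as the reference (all dummies equal to 0).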

slide28
In this case

Dependent Variable: Y

Independent Variables:

X1, X2, ... , Xk-1, X
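The complete model is then (a sketch in my notation; the slide's equation was an image)

$$
Y = b_0 + d_1 X_1 + \cdots + d_{k-1} X_{k-1} + b\,X + e,
$$

where b is the common slope, b0 is the intercept for the reference category, and b0 + di is the intercept for category i.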

slide29
In the above situation we would likely be interested in testing the equality of the intercepts. Namely the Null Hypothesis

(q = k – 1)
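The hypothesis, again an image on the slide, is presumably

$$
H_0: d_1 = d_2 = \cdots = d_{k-1} = 0,
$$

i.e. all k intercepts are equal, which imposes q = k – 1 restrictions.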

slide30
The Reduced Model:

Dependent Variable: Y

Independent Variable: X

slide31
Example:

In the following example we are interested in comparing the effects of five workbooks (A, B, C, D, E) on the performance of students in Mathematics. For each workbook, 15 students are selected (a total of n = 15×5 = 75). Each student is given a pretest (pretest score ≡ X) and a final test (final score ≡ Y). The data is given on the following slide.

the data
The data

The Model:

some comments
Some comments
  • The linear relationship between Y (Final Score) and X (Pretest Score) models the differing aptitudes for mathematics.
  • The shifting up and down of this linear relationship measures the effect of workbooks on the final score Y.
slide38

Here is the data file in SPSS with the dummy variables (X1, X2, X3, X4) added. These can be created within SPSS.

slide39

Fitting the complete model

The dependent variable is the final score, Y.

The independent variables are the Pre-score X and the four dummy variables X1, X2, X3, X4.

the interpretation of the coefficients43
The interpretation of the coefficients

The intercept for workbook E

the interpretation of the coefficients44
The interpretation of the coefficients

The changes in the intercept when we change from workbook E to other workbooks.

slide45

The model can be written as follows:

The Complete Model:

  • When the workbook is E then X1 = 0, …, X4 = 0 and the intercept is b0.
  • When the workbook is A then X1 = 1, X2 = X3 = X4 = 0 and the intercept is b0 + d1,

hence d1 is the change in the intercept when we change from workbook E to workbook A.

slide46

Testing for the equality of the intercepts

The reduced model

The only independent variable is X (the pre-score).

slide47

Fitting the reduced model

The dependent variable is the final score, Y.

The only independent variable is the Pre-score X.

the output continued49
The Output - continued

Increased R.S.S

slide51

The Reduced model

The Complete model

slide53

Testing for zero slope

The reduced model

The independent variables are X1, X2, X3, X4 (the dummies).

slide54

The Reduced model

The Complete model

the analysis of covariance
The Analysis of Covariance
  • This analysis can also be performed by using a package that can perform Analysis of Covariance (ANACOVA)
  • The package sets up the dummy variables automatically
slide58

In SPSS to perform ANACOVA you select from the menu –

Analyze -> General Linear Model -> Univariate

slide60

You now select:

  • The dependent variable Y (Final Score)
  • The Fixed Factor (the categorical independent variable – workbook)
  • The covariate (the continuous independent variable – pretest score)
slide61

The output: The ANOVA TABLE

Compare this with the previously computed table.

slide62

The output: The ANOVA TABLE

This is the sum of squares in the numerator when we attempt to test if the slope is zero (and allow the intercepts to be different)

another application of the use of dummy variables
Another application of the use of dummy variables
  • The dependent variable, Y, is linearly related to X, but the slope changes at one or several known values of X (nodes).

[Figure: Y plotted against X, with the slope changing at the nodes.]

slide64

[Figure: a piecewise linear relationship between Y and X, with slopes b1, b2, …, bk changing at the nodes x1, x2, …, xk.]

The model:

slide66
Then the model

can be written
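The rewritten form was an image on the slide. A plausible reconstruction for a single node at x1 (consistent with the reduced model used later, in which X1 + X2 = X) is to define

$$
X_1 = \min(X, x_1), \qquad X_2 = \max(X - x_1,\, 0),
$$

so that the model becomes

$$
Y = b_0 + b_1 X_1 + b_2 X_2 + e,
$$

with slope b1 before the node and slope b2 after it (and the fitted line automatically continuous at the node).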

an example
An Example

In this example we are measuring Y at time X.

Y is growing linearly with time.

At time X = 10, an additive is added to the process which may change the rate of growth.

The data
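The data table itself was not captured in this transcript. As an illustration only, here is a minimal Python sketch (hypothetical values, not the slide's data) of how the two dummy variables for a node at X = 10 can be built and the complete model fit:

```python
import numpy as np

# Hypothetical times and responses: linear growth with a slope change at X = 10
x = np.arange(1.0, 21.0)
y = 5.0 + 0.8 * x + np.where(x > 10.0, 0.5 * (x - 10.0), 0.0)

# Dummy variables for a single node at x = 10
x1 = np.minimum(x, 10.0)        # carries the slope before the node
x2 = np.maximum(x - 10.0, 0.0)  # carries the slope after the node

# Complete model: Y = b0 + b1*X1 + b2*X2 + e
design = np.column_stack([np.ones_like(x), x1, x2])
b, *_ = np.linalg.lstsq(design, y, rcond=None)
print(b)  # ~ [5.0, 0.8, 1.3]: intercept, slope before the node, slope after the node
```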

testing for no change in slope
Testing for no change in slope

Here we want to test

H0: b1 = b2 vs HA: b1≠ b2

The reduced model is

Y = b0 + b1(X1 + X2) + e

  = b0 + b1X + e

slide75
Fitting the reduced model

We now regress y on x.