I. Qualitative (or Dummy) Independent Variables

## PowerPoint Slideshow about ' I. Qualitative (or Dummy) Independent Variables' - lamar-james

“Binary” vs. “Dummy”
• I was taught to call these variables “dummy variables”
• I now believe that “binary variables” is a better (more descriptive) name
• So, let’s call them “binary variables” from now on
III. Introduction
• A. New type of variable
• 1. Past: used quantitative variables (numerically measurable); continuous
• 2. Now: variables that take small number of values; discrete
• a) Gender
• b) Market size
• c) Region of country
• d) Marital status (married vs. not), etc
Introduction (cont.)
• B. Used as IV in this section
• C. Used as DV later in course
Introduction (cont.)
• Institute of Management Accountants (IMA) publishes an annual Salary Guide
• In Strategic Finance magazine
• Annual survey of members
• “…based on a regression equation derived from survey results.”
IMA Salary Guide (cont.)

SALARY = 35,491 + 18,393 TOP + 8,392 SENIOR - 10,615 ENTRY + 914 YEARS + 10,975 ADVDEGREE - 8,684 NODEGREE + 9,195 PROFCERT + 8,417 MALE

• TOP=1 if top level mgmt, 0 if not
• SENIOR=1 if senior level mgmt , 0 if not
• ENTRY=1 if entry level , 0 if not
• NODEGREE=1 if no degree , 0 if not
• PROFCERT=1 if hold professional certification , 0 if not
• MALE=1 if male , 0 if not
• YEARS=years of experience
IMA Salary Guide (cont.)
• Average IMA member (1999)
• Male
• 14.5 years experience
• Professional certification
• Salary = \$66,356
• Figure obtained from substituting values into regression equation
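
The substitution behind the \$66,356 figure can be sketched in a few lines of Python. The coefficients are taken from the Salary Guide equation; the function name `predict_salary` is hypothetical, and the sketch assumes that every binary variable not mentioned for the average member (TOP, SENIOR, ENTRY, ADVDEGREE, NODEGREE) equals 0.

```python
# Coefficients from the IMA Salary Guide regression equation (dollars).
COEF = {
    "INTERCEPT": 35491,
    "TOP": 18393,
    "SENIOR": 8392,
    "ENTRY": -10615,
    "YEARS": 914,
    "ADVDEGREE": 10975,
    "NODEGREE": -8684,
    "PROFCERT": 9195,
    "MALE": 8417,
}

def predict_salary(**values):
    """Predicted salary; any variable not supplied is treated as 0."""
    return COEF["INTERCEPT"] + sum(COEF[name] * x for name, x in values.items())

# Average member: male, 14.5 years of experience, professional certification.
# Assumption: all other binary variables are 0.
average_member = predict_salary(MALE=1, YEARS=14.5, PROFCERT=1)
print(round(average_member))  # 66356
```

Under that assumption the result reproduces the slide's figure, which suggests the average member falls in the control group for both management level and degree.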

Are Wins Worth More in a Large Market?

See regression output for binary variables as IVs. (note)

Introduction (cont.)
• D. Example #1
• 1. $Y = \alpha + \beta X_2 + \varepsilon$
• 2. Y: social program expenditures per state
• 3. X2: state’s total revenue
• 4. Suppose states’ legislatures controlled by Democrats spend more from same revenue than those controlled by Republicans
• 5. How account for this in model?
• 6. What’s the categorical variable?
Introduction (cont.)
• E. Example #2
• 1. $Y = \alpha + \beta X_2 + \varepsilon$
• 2. Y: coach’s earnings
• 3. X2: coach’s experience
• 4. Suppose women earn less than men with equal experience (& other characteristics)
• 5. How account for this in model?
• 6. What’s the categorical variable?
Introduction (cont.)
• F. Example #3
• 1. $Y = \alpha + \beta X_2 + \varepsilon$
• 2. Y: sales of swimsuits in Minnesota
• 3. X2: Minnesota’s population
• 4. Suppose sales peak in warm months
• 5. How account for this in model?
• 6. What’s the categorical variable?
Introduction (cont.)
• G. Example #4
• 1. $Y = \alpha + \beta X_2 + \varepsilon$
• 2. Y: profits of NBA teams
• 3. X2: wins
• 4. Suppose teams in large markets make more profit on their wins than teams in other markets
• 5. How account for this in model?
• 6. What’s the categorical variable?
Introduction (cont.)
• H. Will use Binary (or Dummy) Independent Variables
• 1. Create a special variable that takes a value of
• a) 1 if the unit of observation falls into one category
• b) 0 if the unit falls into the other category

A. A MAN NAMED “ALFRED DUMMY” INVENTED THEM

B. ANYONE WHO USES THEM IS . . . A DUMMY

C. THEY REPRESENT CATEGORICAL VARIABLES

Introduction (cont.)
• 2. Example
• a) GENDER = 1 for all females in the sample
• b) GENDER = 0 for all males
Introduction (cont.)
• c. you pick which category gives a value of 1 and which category gives a value of 0
• EXAMPLE: the variable GENDER = 1 for all females in the sample and GENDER = 0 for all males

| OBS # | Category | GENDER |
| --- | --- | --- |
| 1 | male | 0 |
| 2 | male | 0 |
| 3 | female | 1 |
| 4 | male | 0 |
| 5 | female | 1 |
| 6 | female | 1 |
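
The coding step in the GENDER example can be sketched in Python as follows. The list `genders` simply restates the six observations from the example; the variable names are illustrative.

```python
# Observations 1 through 6 from the example.
genders = ["male", "male", "female", "male", "female", "female"]

# GENDER = 1 for females, 0 for males. Which category gets the 1 is the
# analyst's choice, but it must be remembered when interpreting results.
GENDER = [1 if g == "female" else 0 for g in genders]

for obs, (g, code) in enumerate(zip(genders, GENDER), start=1):
    print(obs, g, code)
```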

Only Use 0 & 1 Values
• Only use 0 & 1 values
• Never use 1, 2, 3,… (for example)
• Why not?
• 2 is how many times bigger than 1? (2/1)
• 3 is how many times bigger than 2? (3/2)
• 1 is how many times bigger than 0? (1/0)
IV. Binary Variables Change Intercept-Two Categories
• A. Intercept term changes according to the two values of the one binary variable
• intercept is one value when D = 0
• intercept is different value when D = 1
• B. Use only ONE binary variable per variable with two categories
Binary Variables Change Intercept-Two Categories (cont.)
• C. In model $Y = \alpha + \beta X_2 + \varepsilon$
• 1. for same value of X, Y (in group #1) ≠ Y (in group #2)
• Male: when $X_2 = 16$, $Y = 34$
• Female: when $X_2 = 16$, $Y = 29$
• 2. Since X is the same value for both groups,
• a) either $\alpha$ or $\beta$ must be different to cause Y (in group #1) ≠ Y (in group #2)

Binary Variables Change Intercept-Two Categories (cont.)
• D. Different cases
• 1. $\alpha$ differs between groups OR
• 2. $\beta$ differs between groups OR
• 3. both $\alpha$ and $\beta$ differ between groups

$Y = \alpha + \beta X_2 + \varepsilon$

Binary Variables Change Intercept-Two Categories (cont.)
• E. In the model $Y = \alpha + \beta X_2 + \varepsilon$
• 1. Y: profits per NBA team (\$1,000,000s)
• 2. X2: wins per season
• 3. Suppose teams in large markets make more profit on same number of wins than teams in other markets
Binary Variables Change Intercept-Two Categories (cont.)
• 4. How account for this in model?
• 5. D is the binary variable
• a) D = 1 if the team is in a large market
• b) D = 0 if the team is in a mid-sized or small market
Binary Variables Change Intercept-Two Categories (cont.)
• D = 1 if the team is in a large market
• D = 0 if the team is not
• $\alpha = \alpha_0 + \alpha_1 D$
• $Y = \alpha + \beta X_2 + \varepsilon$
• $Y = \alpha_0 + \alpha_1 D + \beta X_2 + \varepsilon$
Students
• Write $Y = \alpha_0 + \alpha_1 D + \beta X_2 + \varepsilon$

for two cases:

• mid or small market (D = 0)
• large market (D = 1)
Binary Variables Change Intercept-Two Categories (cont.)
• 7. $Y = \alpha_0 + \alpha_1 D + \beta X_2 + \varepsilon$
• 8. mid/small: $Y = \alpha_0 + \beta X_2 + \varepsilon$ (D = 0)
• 9. large: $Y = (\alpha_0 + \alpha_1) + \beta X_2 + \varepsilon$ (D = 1)
• 10. What differs between 2 models?
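Changing Intercept

[Figure: profits per team plotted against wins per team. Two parallel lines: large market, $Y = (\alpha_0 + \alpha_1) + \beta X_2 + \varepsilon$, and mid/small market, $Y = \alpha_0 + \beta X_2 + \varepsilon$. The mid/small intercept is $\alpha_0$ and the vertical gap between the lines is $\alpha_1$ (assuming $\alpha_1 > 0$).]

$\alpha_1$: 3 Equivalent Meanings
• 13. $\alpha_1$ shows change in intercept relative to control group
• 14. $\alpha_1$ shows change in intercept due to change in market size
• 15. $\alpha_1$ measures difference in profits for same number of wins between teams in large markets vs. those in other markets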
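Changing Intercept

[Figure: profits per team plotted against wins per team. Two parallel lines: large market (LA Clippers), $Y = (\alpha_0 + \alpha_1) + \beta X_2 + \varepsilon$, and mid/small market (SD Clippers), $Y = \alpha_0 + \beta X_2 + \varepsilon$, with $\alpha_0$ and $\alpha_1$ marked. At 50 wins the two lines are labeled \$16.5M and \$0.5M.]

What’s the value of $\alpha_1$?

$\alpha_1$ measures difference in profits for same number of wins between teams in large markets vs. those in other markets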

Binary Variables Change Intercept-Two Categories (cont.)
• 16. Comparison group or control group
• a) Group for which binary variable = 0
• 17. Who decides which group is control group?
• a) You do
• b) It doesn’t matter statistically
• c) Remember which group is control when interpret results
Binary Variables Change Intercept-Two Categories (cont.)
• 18. Hypothesis Test ($Y = \alpha_0 + \alpha_1 D + \beta X_2 + \varepsilon$)
• a) H0: no difference in Y (for same X) between markets OR
• b) H0: $\alpha_1 = 0$
• Both: $Y = \alpha_0 + \beta X_2 + \varepsilon$
• c) HA: is difference in Y (for same X) between markets OR
• d) HA: $\alpha_1 \neq 0$
• Large: $Y = (\alpha_0 + \alpha_1) + \beta X_2 + \varepsilon$
• Mid/small: $Y = \alpha_0 + \beta X_2 + \varepsilon$

• e) What test statistic use?
Binary Variables Change Intercept-Two Categories (cont.)
• F. Example

See regression output for binary variables as IVs – case #1A. (note)

Binary Variables Change Intercept-Two Categories (cont.)
• How interpret p-value on LARGE in model?
• Interpret coefficient on LARGE. (see note p.)
• a) "Between 2 teams with same number of wins, the one in the large market is expected to earn \$ ??? more (or less?) ”

PROFIT = -8.339 + 0.282 WINS + 16.524 LARGE
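
A quick way to see what the coefficient on LARGE means is to evaluate the estimated equation for two teams with the same number of wins. The sketch below assumes PROFIT is measured in \$1,000,000s, as in the model definition earlier in this section; the win total of 41 is a made-up value and the function name is hypothetical.

```python
def predicted_profit(wins, large):
    """Fitted PROFIT from the estimated equation (assumed $1,000,000s)."""
    return -8.339 + 0.282 * wins + 16.524 * large

wins = 41  # hypothetical win total, identical for both teams
mid_small = predicted_profit(wins, large=0)
large_mkt = predicted_profit(wins, large=1)

# The gap equals the coefficient on LARGE, whatever the win total.
print(round(large_mkt - mid_small, 3))  # 16.524
```

Changing `wins` moves both predictions by the same amount, so the gap stays at the coefficient on LARGE. That constant gap is what an intercept shift means.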

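Changing Intercept

PROFIT = -8.339 + 0.282 WINS + 16.524 LARGE

[Figure: PROFIT plotted against WINS. Two parallel lines with slope 0.282: the MID/SMALL market line has intercept -8.339, and the LARGE market line lies 16.524 above it.]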

Coefficient Interpretation Exercise
• SEE DRAWING
• Questions repeated on next three slides
• Q1: Interpret the number –8.339
• Q2: Interpret the number 0.282
• Q3: Interpret the number 16.524
Changing Intercept

PROFIT = -8.339 + 0.282 WINS + 16.524 LARGE

Q1: Interpret the number -8.339

[Figure: PROFIT plotted against WINS, with parallel LARGE market and MID/SMALL market lines; labels 0.282, 16.524 and -8.339.]

Changing Intercept

PROFIT = -8.339 + 0.282 WINS + 16.524 LARGE

Q2: Interpret the number 0.282

[Figure: same plot.]

Changing Intercept

PROFIT = -8.339 + 0.282 WINS + 16.524 LARGE

Q3: Interpret the number 16.524

[Figure: same plot.]

Review
• F. Example
• A. $\text{PRICE} = \beta_1 + \beta_2\,\text{SQFT} + \varepsilon$
• IGNORE OTHER IVs for this example
• 2. Add POOL to model
• a) POOL = 1 if house has pool
• b) POOL = 0 otherwise
Review (cont.)

A. $\text{PRICE} = \beta_1 + \beta_2\,\text{SQFT} + \varepsilon$

B. $\text{PRICE} = \beta_1 + \beta_2\,\text{SQFT} + \beta_5\,\text{POOL} + \varepsilon$

• ESTIMATE MODEL B
Review (cont.)

| Variable | Model B |
| --- | --- |
| CONSTANT | 22.673 (0.09) |
| SQFT | 0.1444 (0.001) |
| POOL | 52.790 (0.03) |

Review (cont.)
• How interpret p-value on POOL in Model B?
• Interpret coefficient on POOL. (see note p.)
• a) "Between 2 houses of same size, the one with a pool is expected to sell for \$52,790 more (or less?)”

Review (cont.)
• Estimated model:

PRICE = 22.673 + 0.1444SQFT +52.79POOL

No Pool: (POOL=0)

PRICE = 22.673 + 0.1444SQFT

With Pool: (POOL=1)

PRICE = 22.673 + 0.1444SQFT +52.79*1

= (22.673+ 52.79 ) + 0.1444SQFT

Review (cont.)
• What’s price for 1000 sq. ft house . . .
• And NO pool?

(notes page)

• WITH pool?
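
The two predictions asked for here can be worked out by plugging into the estimated Model B. In this sketch the function name is hypothetical, and PRICE is assumed to be in thousands of dollars, since the slides equate the coefficient 52.790 with \$52,790.

```python
def predicted_price(sqft, pool):
    """Fitted PRICE from Model B; pool is 1 if the house has a pool, else 0."""
    return 22.673 + 0.1444 * sqft + 52.79 * pool

no_pool = predicted_price(1000, pool=0)
with_pool = predicted_price(1000, pool=1)

print(round(no_pool, 3))    # 167.073
print(round(with_pool, 3))  # 219.863
```

Under the units assumption, that is roughly \$167,073 without a pool and \$219,863 with one. The difference is the POOL coefficient.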
Review (cont.)

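[Figure: PRICE plotted against SQFT. Two parallel lines with slope 0.1444, both labeled Model F on the slide: the "no POOL" line has intercept 22.673, and the "with POOL" line lies 52.790 above it.]

Q2: Interpret the number 52.790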

Exercise

Binary Variables #3

Suppose reverse 0 & 1 cases for POOL:

POOL = 1 if NO pool

POOL = 0 if HAVE pool

$\text{PRICE} = \beta_1 + \beta_2\,\text{SQFT} + \beta_5\,\text{POOL} + \varepsilon$

ESTIMATE THIS MODEL

One-Minute Essay Response
• Estimated model:

PRICE = 75.463 + 0.1444 SQFT - 52.79 POOL

HAVE Pool: (POOL=0)

PRICE = 75.463 + 0.1444 SQFT

NO Pool: (POOL=1)

PRICE = 75.463 + 0.1444 SQFT - 52.79*1

= (75.463 - 52.79) + 0.1444 SQFT

= 22.673 + 0.1444 SQFT
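
The point of the reversed coding, that the choice of control group does not change the fitted prices, can be checked numerically. The sketch below compares the two estimated equations from the slides; the function names and the 1,500 sq. ft house are illustrative assumptions.

```python
def price_original(sqft, has_pool):
    # Original coding: POOL = 1 if the house has a pool.
    pool = 1 if has_pool else 0
    return 22.673 + 0.1444 * sqft + 52.79 * pool

def price_reversed(sqft, has_pool):
    # Reversed coding: POOL = 1 if the house has NO pool.
    pool = 0 if has_pool else 1
    return 75.463 + 0.1444 * sqft - 52.79 * pool

# Same house, both codings: the predictions agree (up to rounding).
for has_pool in (True, False):
    a = price_original(1500, has_pool)
    b = price_reversed(1500, has_pool)
    print(has_pool, round(a, 3), round(b, 3))
```

Only the intercept and the sign of the POOL coefficient change. The slope on SQFT and every predicted price stay the same.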

V. Binary Variables Change Intercept-Many Categories
• A. Intercept term changes according to the two values of each binary variable
• B. Use MULTIPLE binary variables for variable with many categories
• C. Contrast with “two categories” case above
Binary Variables Change Intercept-Many Categories (cont.)
• D. In model $Y = \alpha + \beta X_2 + \varepsilon$
• 1. Y: profits per NBA team (\$1,000,000s)
• 2. X2: wins per season
Binary Variables Change Intercept-Many Categories (cont.)
• 3. Expect profit level per team to differ (for same wins) across different size markets
• Probably expect profits to rise as market size increases (for same number of wins)
• 4. How account for this in model?
Binary Variables Change Intercept-Many Categories (cont.)
• 6. L & M are the binary variables
• a) L = 1 if team in large market
• L = 0 otherwise
• b) M = 1 if team in mid-sized market
• M = 0 otherwise
• c) Which market size is the control group?
• 7. $\alpha = \alpha_0 + \alpha_1 L + \alpha_2 M$

| Team | L | M | Type |
| --- | --- | --- | --- |
| 1 | 0 | 0 | small |
| 2 | 1 | 0 | large |
| 3 | 0 | 1 | mid |
| 4 | 1 | 0 | large |
| 5 | 0 | 1 | mid |
| 6 | 0 | 0 | small |

Students

Write $Y = \alpha_0 + \alpha_1 L + \alpha_2 M + \beta X_2 + \varepsilon$

for three cases:

• Small (L= 0, M = 0)
• Medium (L = 0, M = 1)
• Large (L = 1, M = 0)
Binary Variables Change Intercept-Many Categories (cont.)
• 8. $Y = \alpha_0 + \alpha_1 L + \alpha_2 M + \beta X_2 + \varepsilon$
• 9. Small
• a) L = 0; M = 0
• b) $Y = \alpha_0 + \beta X_2 + \varepsilon$
• 10. Large
• a) L = 1; M = 0
• b) $Y = (\alpha_0 + \alpha_1) + \beta X_2 + \varepsilon$
Binary Variables Change Intercept-Many Categories (cont.)
• 11. Mid-sized
• a) L = 0; M = 1
• b) $Y = (\alpha_0 + \alpha_2) + \beta X_2 + \varepsilon$

Small: $Y = \alpha_0 + \beta X_2 + \varepsilon$

Large: $Y = (\alpha_0 + \alpha_1) + \beta X_2 + \varepsilon$

Mid: $Y = (\alpha_0 + \alpha_2) + \beta X_2 + \varepsilon$

• 12. What differs among the three models?
• 13. $\alpha_1$ & $\alpha_2$ show changes in intercept relative to control group
$\alpha_1$ & $\alpha_2$: Equivalent Meanings
• 14. $\alpha_1$ & $\alpha_2$ show changes in intercept due to change in market size
• 15. $\alpha_1$ measures difference in Y between large and small market teams
• 16. $\alpha_2$ measures difference in Y between mid-sized and small market teams
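
Building the two binary variables from a three-category market variable might look like the following sketch. The `market` list restates the six-team example; the variable names follow the slides.

```python
# Market type for each of the six teams in the example.
market = ["small", "large", "mid", "large", "mid", "small"]

# Two binary variables cover three categories. "small" is the control
# group because it is the category with L = 0 and M = 0.
L = [1 if m == "large" else 0 for m in market]
M = [1 if m == "mid" else 0 for m in market]

print(L)  # [0, 1, 0, 1, 0, 0]
print(M)  # [0, 0, 1, 0, 1, 0]
```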

Are Wins Worth More in a Large Market?

See regression output for binary variables as IVs – case #1B.

Binary Variables Change Intercept-Many Categories (cont.)
• E. Dummy Variable Trap
• 1. Notice:
• a) One variable with two categories
• (1) Use one binary variable
• b) One variable with three categories
• (1) Use two binary variables

(note: despite the fact that we’re using “binary”, this still uses the word “dummy”)

Binary Variables Change Intercept-Many Categories (cont.)
• 2. What's the rule for how many binary variables to create?
• 3. Use one fewer binary variable than the number of categories
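
One standard way to see why the rule holds: if a model with an intercept also includes a binary variable for every category, the binary variables add up to 1 for every observation, so they duplicate the constant term and the coefficients cannot be estimated separately. The sketch below illustrates this with the six-team example; the third dummy `S` is hypothetical and is not used anywhere in the slides.

```python
market = ["small", "large", "mid", "large", "mid", "small"]

L = [1 if m == "large" else 0 for m in market]
M = [1 if m == "mid" else 0 for m in market]
S = [1 if m == "small" else 0 for m in market]  # hypothetical third dummy

constant = [1] * len(market)  # the column that goes with the intercept

# With all three dummies, L + M + S reproduces the constant column exactly,
# so the regressors are perfectly collinear (the dummy variable trap).
print([l + m + s for l, m, s in zip(L, M, S)] == constant)  # True

# Dropping one dummy (here S, making "small" the control group) removes
# the exact dependence.
print([l + m for l, m in zip(L, M)] == constant)  # False
```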

Binary Variables Change Intercept-Many Categories (cont.)

SALARY = 35,491 + 18,393 TOP + 8,392 SENIOR - 10,615 ENTRY + 914 YEARS + 10,975 ADVDEGREE - 8,684 NODEGREE + 9,195 PROFCERT + 8,417 MALE

• Male workers earn, on average, ?? more (or less?) than females.
• An advanced degree is worth ??% more (or less?) than what education level?
• A professional certification is worth ??% more (or less?) than what?
VI. Binary Variables Change Slope-Two Categories
• A. Slope changes according to the two values of the one binary variable
• B. Use only ONE binary variable per variable with two categories
Binary Variables Change Slope-Two Categories (cont.)
• C. In the model $Y = \alpha + \beta X_2 + \varepsilon$
• 1. Y: profits per NBA team (\$1,000,000s)
• 2. X2: wins per season
• 3. Suppose teams in large markets make more profit on each additional win than teams in other markets
• a) How does this differ from intercept shifting case?
Binary Variables Change Slope-Two Categories (cont.)
• C. In the model $Y = \alpha + \beta X_2 + \varepsilon$
• Suppose teams in large markets make more profit on each additional win than teams in other markets

| Change Intercept | Change Slope |
| --- | --- |
| Different total profit from total no. of wins | Different extra profit from one MORE win |

Binary Variables Change Slope-Two Categories (cont.)
• 4. How account for this in model?
• 5. D is the binary variable
• a) D = 1 if the team is in a large market
• b) D = 0 if the team is in a mid-sized or small market
• 6. $\beta = \beta_0 + \beta_1 D$
Binary Variables Change Slope-Two Categories (cont.)
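• Recall: $Y = \alpha + \beta X_2 + \varepsilon$ and $\beta = \beta_0 + \beta_1 D$
• 7. $Y = \alpha + (\beta_0 + \beta_1 D) X_2 + \varepsilon$
• 8. $Y = \alpha + \beta_0 X_2 + \beta_1 D X_2 + \varepsilon$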

ESTIMATE MODEL 8. ABOVE
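
To estimate model 8, the product of D and X2 has to be created as its own regressor. A minimal sketch of that step, using made-up win totals for four hypothetical teams:

```python
# Hypothetical data: wins per season and market indicator for four teams.
X2 = [30, 45, 52, 60]  # wins (made-up values)
D = [0, 1, 0, 1]       # 1 = large market, 0 = mid-sized or small

# The slope binary variable enters the model as the product D * X2:
# it equals wins for large-market teams and 0 for all other teams.
DX2 = [d * x for d, x in zip(D, X2)]
print(DX2)  # [0, 45, 0, 60]
```

The regression is then run on X2 and DX2 together. D itself appears as a separate regressor only if the intercept is also allowed to differ.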

Binary Variables Change Slope (cont.)
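• recall: $Y = \alpha + \beta_0 X_2 + \beta_1 D X_2 + \varepsilon$
• mid/small: $Y = \alpha + \beta_0 X_2 + \varepsilon$ (D = 0)
• large: $Y = \alpha + (\beta_0 + \beta_1) X_2 + \varepsilon$ (D = 1)
• large: $Y = \alpha + \beta^{*} X_2 + \varepsilon$, where $\beta^{*} = \beta_0 + \beta_1$ (D = 1)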
• 13. What differs between 2 models?
Changing Slope

[Figure: profits per team plotted against wins per team. Two lines sharing the intercept $\alpha$: the large market line, $Y = \alpha + (\beta_0 + \beta_1) X_2 + \varepsilon$, is steeper than the mid/small market line, $Y = \alpha + \beta_0 X_2 + \varepsilon$ (assuming $\beta_1 > 0$).]

$\beta_1$: 3 Equivalent Meanings
• 15. $\beta_1$ shows change in slope relative to control group
• 16. $\beta_1$ shows change in slope due to change in market size
• 17. $\beta_1$ measures difference in profits from each extra win between teams in different size markets, on average
Changing Slope

[Figure: profits per team plotted against wins per team. Large market: $Y = \alpha + (\beta_0 + \beta_1) X_2 + \varepsilon$, with slope $\beta_0 + \beta_1$. Mid/small market: $Y = \alpha + \beta_0 X_2 + \varepsilon$, with slope $\beta_0$.]

$\beta_1$ measures difference in profits from each extra win between teams in different size markets, on average

Changing Slope

[Figure: the same two lines, with the rise from one extra win marked as $\beta_0$ on the mid/small market line and $\beta_0 + \beta_1$ on the large market line.]

Changing Slope

[Figure: the same two lines, with the rise from one extra win labeled 246 on the mid/small market line and 556 on the large market line.]

Are Wins Worth More in a Large Market?

See regression output for binary variables as IVs – case #2.

Binary Variables Change Slope-Two Categories (cont.)

$Y = \alpha + \beta_0 X_2 + \beta_1 D X_2 + \varepsilon$

$Y = \alpha + \beta_0 X_2 + \varepsilon$

• 18. Hypothesis Test
• a) H0: no difference in extra profit (per extra win) between markets
• b) H0: $\beta_1 = 0$
• c) HA: is difference in extra profit (per extra win) between markets
• d) HA: $\beta_1 \neq 0$
• e) What test statistic use?
Binary Variables Change Slope-Two Categories (cont.)
• D. Exercise (notes page )
• 1. Suppose you estimate

$Y = \alpha + \beta_0 X_2 + \beta_1 D X_2 + \varepsilon$ and . . .

• 2. $\hat{\beta}_1 = .310$
• 3. $\hat{\beta}_0 = .246$
• 4. What is value of slope for large market teams?
• 5. What is interpretation of each number?
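
The arithmetic for this exercise is short enough to sketch directly. The variable names are illustrative, and the units comment assumes Y is in \$1,000,000s, as defined for the NBA profit model earlier in this section.

```python
beta0_hat = 0.246  # slope for mid/small-market teams (D = 0)
beta1_hat = 0.310  # extra slope for large-market teams (D = 1)

slope_mid_small = beta0_hat
slope_large = beta0_hat + beta1_hat

# Assuming Y is in $1,000,000s, each extra win is worth about 0.556
# (million) to a large-market team versus 0.246 to the others.
print(round(slope_large, 3))  # 0.556
```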
Intercept or Slope Vary?
• A. Intercept Different Across Qualitative Variable's (Political Party, Gender,...) Categories
• 1. This happens whenever the relationship between Y and X (in $\beta$) is unchanged for different categories of the qualitative variable, yet Y differs, even for the same values of X.
Intercept or Slope Vary? (cont.)
• 2. Examples
• Profit for same total number of wins differs across market size
• Earnings for females less than earnings for males even with same years of schooling
Intercept or Slope Vary? (cont.)
• B. Slope Coefficient Different Across Qualitative Variable's Categories
• 1. This happens whenever the relationship between Y and X (in $\beta$) differs for different categories of the qualitative variable and causes Y to differ, even for the same values of X.
Intercept or Slope Vary? (cont.)
• 2. Example
• Profit from each further win differs across market size
• It’s plausible (likely?) that an additional year of experience yields less extra earnings for females than for males
Intercept or Slope Vary? (cont.)
• C. How know which is correct?
• 1. Possibilities
• a) intercept binary variable correct
• b) slope binary variable correct
• c) both
• d) neither
• 2. Try and let tests tell you
Exercise
• For each case below, state whether you would use a slope or intercept dummy. (NOTES)
• 1. Southern workers earn less than those in other regions of the country even with same years of schooling & experience
• 2. An additional year of education yields less extra earnings for single males than for married males
• 3. More swimsuits are bought in Minnesota during the summer than during the winter
• 4. Each extra year of experience yields less additional earnings for female coaches than for males
Exercise

Binary Variables #2

Exercise