I. Qualitative (or Dummy) Independent Variables

- I was taught to call these variables “dummy variables”
- I now believe that “binary variables” is a better (more descriptive) name
- So, let’s call them “binary variables” from now on

- A. New type of variable
- 1. Past: used quantitative variables (numerically measurable); continuous
- 2. Now: variables that take a small number of values; discrete
- a) Gender
- b) Market size
- c) Region of country
- d) Marital status (married vs. not), etc.

- B. Used as IV in this section
- C. Used as DV later in course

- Institute of Management Accountants (IMA) publishes an annual Salary Guide
- In Strategic Finance magazine
- Annual survey of members
- “…based on a regression equation derived from survey results.”

SALARY = 35,491 + 18,393 TOP + 8,392 SENIOR - 10,615 ENTRY + 914 YEARS + 10,975 ADVDEGREE - 8,684 NODEGREE + 9,195 PROFCERT + 8,417 MALE

- TOP=1 if top level mgmt, 0 if not
- SENIOR=1 if senior level mgmt , 0 if not
- ENTRY=1 if entry level , 0 if not
- ADVDEGREE=1 if advanced degree , 0 if not
- NODEGREE=1 if no degree , 0 if not
- PROFCERT=1 if hold professional certification , 0 if not
- MALE=1 if male , 0 if not
- YEARS=years of experience

- Average IMA member (1999)
- Male
- 14.5 years experience
- Professional certification
- Salary = $66,356
- Figure obtained from substituting values into regression equation
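
The substitution behind the $66,356 figure can be checked directly. A minimal Python sketch: the name `predicted_salary` and its keyword arguments are illustrative, not part of the IMA guide, and it assumes the average member sits in the baseline category for management level and degree (those binary variables are 0).

```python
# Coefficients copied from the IMA salary equation above.
COEF = {
    "TOP": 18393, "SENIOR": 8392, "ENTRY": -10615, "YEARS": 914,
    "ADVDEGREE": 10975, "NODEGREE": -8684, "PROFCERT": 9195, "MALE": 8417,
}
INTERCEPT = 35491

def predicted_salary(**values):
    """Return the fitted salary; any variable not passed is treated as 0."""
    return INTERCEPT + sum(COEF[name] * x for name, x in values.items())

# Average member: male, 14.5 years of experience, professionally certified.
average = predicted_salary(MALE=1, YEARS=14.5, PROFCERT=1)
print(average)  # 66356.0
```

Switching `MALE=1` to `MALE=0` lowers the fitted salary by exactly the coefficient on MALE, which is how each binary coefficient in the equation is read.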

Are Wins Worth More in a Large Market?

See regression output for binary variables as IVs. (note)

- D. Example #1
- 1. \(Y = \alpha + \beta X_2 + \varepsilon\)
- 2. Y: social program expenditures per state
- 3. X2: state’s total revenue
- 4. Suppose states’ legislatures controlled by Democrats spend more from same revenue than those controlled by Republicans
- 5. How account for this in model?
- 6. What’s the categorical variable?

- E. Example #2
- 1. \(Y = \alpha + \beta X_2 + \varepsilon\)
- 2. Y: coach’s earnings
- 3. X2: coach’s experience
- 4. Suppose women earn less than men with equal experience (& other characteristics)
- 5. How account for this in model?
- 6. What’s the categorical variable?

- F. Example #3
- 1. \(Y = \alpha + \beta X_2 + \varepsilon\)
- 2. Y: sales of swimsuits in Minnesota
- 3. X2: Minnesota’s population
- 4. Suppose sales peak in warm months
- 5. How account for this in model?
- 6. What’s the categorical variable?

- G. Example #4
- 1. \(Y = \alpha + \beta X_2 + \varepsilon\)
- 2. Y: profits of NBA teams
- 3. X2: wins
- 4. Suppose teams in large markets make more profit on their wins than teams in other markets
- 5. How account for this in model?
- 6. What’s the categorical variable?

- H. Will use Binary (or Dummy) Independent Variables
- 1. Create a special variable that takes a value of
- a) 1 if the unit of observation falls into one category
- b) 0 if the unit falls into the other category

A. A MAN NAMED “ALFRED DUMMY” INVENTED THEM

B. ANYONE WHO USES THEM IS. . . A DUMMY

C. THEY REPRESENT CATEGORICAL VARIABLES

- 2. Example
- a) GENDER = 1 for all females in the sample
- b) GENDER = 0 for all males
- c) You pick which category gets a value of 1 and which category gets a value of 0

OBS #   Category   GENDER
1       male       0
2       male       0
3       female     1
4       male       0
5       female     1
6       female     1
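
The coding in this example can be written out in a few lines. A minimal Python sketch, where the helper name `to_binary` is illustrative only:

```python
# The six observations from the example.
sample = ["male", "male", "female", "male", "female", "female"]

def to_binary(categories, one_category):
    """Return 1 where the category matches `one_category`, else 0."""
    return [1 if c == one_category else 0 for c in categories]

gender = to_binary(sample, "female")   # female = 1, male = 0
print(gender)                          # [0, 0, 1, 0, 1, 1]

# Choosing the other category as 1 simply flips every value.
male = to_binary(sample, "male")
print(male)                            # [1, 1, 0, 1, 0, 0]
```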

- Only use 0 & 1 values
- Never use 1, 2, 3,… (for example)
- Why not?
- 2 is how many times bigger than 1? (2/1)
- 3 is how many times bigger than 2? (3/2)
- 1 is how many times bigger than 0? (1/0)
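
One common way to see the problem with codes such as 1, 2, 3: a single coefficient on such a variable forces the same change in Y for every step between categories, whether or not the data agree. A small Python sketch, with a hypothetical coefficient value chosen only for illustration:

```python
def shift_from_numeric_code(code, coef):
    """Shift in Y implied by ONE variable coded 1, 2, 3 with one coefficient."""
    return coef * code

coef = 4.0  # hypothetical coefficient, not an estimate from any data
shifts = [shift_from_numeric_code(c, coef) for c in (1, 2, 3)]
steps = [b - a for a, b in zip(shifts, shifts[1:])]
print(steps)  # [4.0, 4.0]: every step between categories is forced to be equal
```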

- A. Intercept term changes according to the two values of the one binary variable
- the intercept is one value when D = 0
- the intercept is a different value when D = 1

- B. Use only ONE binary variable per variable with two categories

- C. In model \(Y = \alpha + \beta X_2 + \varepsilon\)
- 1. for the same value of X, Y (in group #1) ≠ Y (in group #2)

Male: when X2 = 16, Y = 34

Female: when X2 = 16, Y = 29

- 2. Since X is the same value for both groups,
- a) either \(\alpha\) or \(\beta\) must be different to cause Y (in group #1) ≠ Y (in group #2)

- D. Different cases in \(Y = \alpha + \beta X_2 + \varepsilon\)
- 1. \(\alpha\) differs between groups OR
- 2. \(\beta\) differs between groups OR
- 3. both \(\alpha\) and \(\beta\) differ between groups

- E. In the model \(Y = \alpha + \beta X_2 + \varepsilon\)
- 1. Y: profits per NBA team ($1,000,000s)
- 2. X2: wins per season
- 3. Suppose teams in large markets make more profit on same number of wins than teams in other markets

- 4. How account for this in model?
- 5. D is the binary variable
- a) D = 1 if the team is in a large market
- b) D = 0 if the team is in a mid-sized or small market

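- 6. \(\alpha = \alpha_0 + \alpha_1 D\)

- Write \(Y = \alpha_0 + \alpha_1 D + \beta X_2 + \varepsilon\) for two cases:

- mid or small market (D = 0)
- large market (D = 1)

- 7. \(Y = \alpha_0 + \alpha_1 D + \beta X_2 + \varepsilon\)
- 8. mid/small: \(Y = \alpha_0 + \beta X_2 + \varepsilon\) (D = 0)
- 9. large: \(Y = (\alpha_0 + \alpha_1) + \beta X_2 + \varepsilon\) (D = 1)
- 10. What differs between the 2 models?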

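[Figure: profits per team (vertical axis) against wins per team (horizontal axis). Two parallel lines with slope \(\beta\): large market, \(Y = (\alpha_0 + \alpha_1) + \beta X_2 + \varepsilon\), and mid/small market, \(Y = \alpha_0 + \beta X_2 + \varepsilon\). The mid/small intercept is \(\alpha_0\) and the vertical gap between the lines is \(\alpha_1\) (drawn assuming \(\alpha_1 > 0\)).]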

- 13. \(\alpha_1\) shows change in intercept relative to control group
- 14. \(\alpha_1\) shows change in intercept due to change in market size
- 15. \(\alpha_1\) measures difference in profits for same number of wins between teams in large markets vs. those in other markets

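[Figure: profits per team (vertical axis) against wins per team (horizontal axis). Two parallel lines: large market (LA Clippers), \(Y = (\alpha_0 + \alpha_1) + \beta X_2 + \varepsilon\), and mid/small market (SD Clippers), \(Y = \alpha_0 + \beta X_2 + \varepsilon\). Values marked on the drawing: $16.5M, $0.5M, and 50 wins.]

What’s the value of \(\alpha_1\)?

\(\alpha_1\) measures difference in profits for same number of wins between teams in large markets vs. those in other markets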

- 16. Comparison group or control group
- a)Group for which binary variable = 0

- 17. Who decides which group is control group?
- a)You do
- b)It doesn’t matter statistically
- c)Remember which group is control when interpret results

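- 18. Hypothesis Test (\(Y = \alpha_0 + \alpha_1 D + \beta X_2 + \varepsilon\))
- a) H0: no difference in Y (for same X) between markets OR
- b) H0: \(\alpha_1 = 0\), so both markets: \(Y = \alpha_0 + \beta X_2 + \varepsilon\)
- c) HA: there is a difference in Y (for same X) between markets OR
- d) HA: \(\alpha_1 \neq 0\), so large: \(Y = (\alpha_0 + \alpha_1) + \beta X_2 + \varepsilon\) and mid/small: \(Y = \alpha_0 + \beta X_2 + \varepsilon\)

- e) What test statistic use?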
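
For item e), the usual choice for a hypothesis about a single coefficient is a t-test. A sketch, assuming the standard regression assumptions hold; the hat and \(SE(\cdot)\) notation for the estimated coefficient on D and its standard error is introduced here and is not from the slides:

```latex
t = \frac{\hat{\alpha}_1 - 0}{SE(\hat{\alpha}_1)},
\qquad \text{reject } H_0 \text{ when } |t| \text{ exceeds the critical value,
i.e. when the p-value is below the chosen significance level.}
```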

- F.Example

See regression output for binary variables as IVs – case #1A. (note)

- How interpret p-value on LARGE in model?
- Interpret coefficient on LARGE. (see note p.)
- a) "Between 2 teams with same number of wins, the one in the large market is expected to earn $ ??? more (or less?) ”

PROFIT = -8.339 + 0.282 WINS + 16.524 LARGE

[Figure: PROFIT (vertical axis) against WINS (horizontal axis) for PROFIT = -8.339 + 0.282 WINS + 16.524 LARGE. Two parallel lines with slope 0.282, one for the LARGE market and one for the MID/SMALL market. The MID/SMALL line has intercept -8.339 (labeled -8.340 on the drawing) and the LARGE line lies 16.524 above it.]

- Q1: Interpret the number -8.339
- Q2: Interpret the number 0.282
- Q3: Interpret the number 16.524
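
Plugging numbers into the fitted equation PROFIT = -8.339 + 0.282 WINS + 16.524 LARGE shows what each coefficient does. A minimal Python sketch; `predicted_profit` is an illustrative name, the 41-win season is hypothetical, and profit is taken to be in $ millions as defined for Y earlier:

```python
def predicted_profit(wins, large):
    """Fitted profit in $ millions; large is 1 for a large market, else 0."""
    return -8.339 + 0.282 * wins + 16.524 * large

wins = 41  # hypothetical season, chosen only for illustration
gap = predicted_profit(wins, 1) - predicted_profit(wins, 0)
print(round(gap, 3))  # 16.524, the same at any number of wins

extra_win = predicted_profit(wins + 1, 0) - predicted_profit(wins, 0)
print(round(extra_win, 3))  # 0.282, the same in either market
```

The gap between the two markets does not depend on `wins`, which is what "intercept shift" means here.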

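- F. Example
- 1. Model A: \(\text{PRICE} = \beta_1 + \beta_2\,\text{SQFT} + \varepsilon\)
- Ignore other IVs for this example
- 2. Add POOL to model
- a) POOL = 1 if house has pool
- b) POOL = 0 otherwise

A. \(\text{PRICE} = \beta_1 + \beta_2\,\text{SQFT} + \varepsilon\)

B. \(\text{PRICE} = \beta_1 + \beta_2\,\text{SQFT} + \beta_5\,\text{POOL} + \varepsilon\)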

- ESTIMATE MODEL B

Variable    Model B
CONSTANT    22.673
            (0.09)
SQFT        0.1444
            (0.001)
POOL        52.790
            (0.03)
Adj. R2     0.890

- How interpret p-value on POOL in Model B?
- Interpret coefficient on POOL. (see note p.)
- a) "Between 2 houses of same size, the one with a pool is expected to sell for $ ___ more (or less?)"

Answer: $52,790 more

- Estimated model:
PRICE = 22.673 + 0.1444 SQFT + 52.79 POOL

No Pool (POOL = 0):

PRICE = 22.673 + 0.1444 SQFT

With Pool (POOL = 1):

PRICE = 22.673 + 0.1444 SQFT + 52.79*1

= (22.673 + 52.79) + 0.1444 SQFT

- What’s price for 1000 sq. ft house . . .
- And NO pool?
(notes page)

- WITH pool?
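
Both prices follow from substituting into the estimated model PRICE = 22.673 + 0.1444 SQFT + 52.79 POOL. A minimal Python sketch; `predicted_price` is an illustrative name, and the price is assumed to be in $1,000s, which is what the $52,790 reading of the POOL coefficient implies:

```python
def predicted_price(sqft, pool):
    """Fitted price in $1,000s; pool is 1 if the house has a pool, else 0."""
    return 22.673 + 0.1444 * sqft + 52.79 * pool

no_pool = predicted_price(1000, 0)
with_pool = predicted_price(1000, 1)
print(round(no_pool, 3))    # 167.073
print(round(with_pool, 3))  # 219.863
```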

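[Figure: PRICE (vertical axis) against SQFT (horizontal axis). Two parallel lines with slope 0.1444, labeled "Model F: no POOL" (intercept 22.673) and "Model F: with POOL" (lying 52.790 above it).]

Q2: Interpret the number 52.790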

Binary Variables #3

Suppose reverse 0 & 1 cases for POOL:

POOL = 1 if NO pool

POOL = 0 if HAVE pool

\(\text{PRICE} = \beta_1 + \beta_2\,\text{SQFT} + \beta_5\,\text{POOL} + \varepsilon\)

ESTIMATE THIS MODEL

One-Minute Essay Response

- Estimated model:
PRICE = 75.463 + 0.1444 SQFT - 52.79 POOL

HAVE Pool (POOL = 0):

PRICE = 75.463 + 0.1444 SQFT

NO Pool (POOL = 1):

PRICE = 75.463 + 0.1444 SQFT - 52.79*1

= (75.463 - 52.79) + 0.1444 SQFT

= 22.673 + 0.1444 SQFT
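
Reversing the 0/1 coding changes the intercept and the sign of the POOL coefficient but not the fitted prices. A minimal Python check of that claim using the two estimated models; the function names are illustrative and the square footages are hypothetical:

```python
def price_pool_is_one(sqft, has_pool):
    pool = 1 if has_pool else 0          # original coding: POOL = 1 if pool
    return 22.673 + 0.1444 * sqft + 52.79 * pool

def price_no_pool_is_one(sqft, has_pool):
    pool = 0 if has_pool else 1          # reversed coding: POOL = 1 if NO pool
    return 75.463 + 0.1444 * sqft - 52.79 * pool

for sqft in (1000, 2000):                # hypothetical house sizes
    for has_pool in (False, True):
        a = price_pool_is_one(sqft, has_pool)
        b = price_no_pool_is_one(sqft, has_pool)
        assert abs(a - b) < 1e-9         # same fitted price either way
```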

- A.Intercept term changes according to the two values of each binary variable
- B.Use MULTIPLE binary variables for variable with many categories
- C.Contrast with “two categories” case above

- D. In model \(Y = \alpha + \beta X_2 + \varepsilon\)
- 1. Y: profits per NBA team ($1,000,000s)
- 2. X2: wins per season

- 3. Expect profit level per team to differ (for same wins) across different size markets
- Probably expect profits to rise as market size increases (for same number of wins)

- 4. How account for this in model?

- 6. L & M are the binary variables
- a) L = 1 if team in large market
- L = 0 otherwise
- b) M = 1 if team in mid-sized market
- M = 0 otherwise
- c) Which market size is the control group?

- 7. \(\alpha = \alpha_0 + \alpha_1 L + \alpha_2 M\)

Team   L   M   Type
1      0   0   small
2      1   0   large
3      0   1   mid
4      1   0   large
5      0   1   mid
6      0   0   small

Write \(Y = \alpha_0 + \alpha_1 L + \alpha_2 M + \beta X_2 + \varepsilon\)

for three cases:

- Small (L= 0, M = 0)
- Medium (L = 0, M = 1)
- Large (L = 1, M = 0)

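- 8. \(Y = \alpha_0 + \alpha_1 L + \alpha_2 M + \beta X_2 + \varepsilon\)
- 9. Small
- a) L = 0; M = 0
- b) \(Y = \alpha_0 + \beta X_2 + \varepsilon\)

- 10. Large
- a) L = 1; M = 0
- b) \(Y = (\alpha_0 + \alpha_1) + \beta X_2 + \varepsilon\)

- 11. Mid-sized
- a) L = 0; M = 1
- b) \(Y = (\alpha_0 + \alpha_2) + \beta X_2 + \varepsilon\)

small: \(Y = \alpha_0 + \beta X_2 + \varepsilon\)

large: \(Y = (\alpha_0 + \alpha_1) + \beta X_2 + \varepsilon\)

mid: \(Y = (\alpha_0 + \alpha_2) + \beta X_2 + \varepsilon\)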

- 12. What differs among the three models?
- 13. \(\alpha_1\) & \(\alpha_2\) show changes in intercept relative to control group

- 14. \(\alpha_1\) & \(\alpha_2\) show changes in intercept due to change in market size
- 15. \(\alpha_1\) measures difference in Y between large and small market teams
- 16. \(\alpha_2\) measures difference in Y between mid-sized and small market teams
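
The three-category coding and the resulting group intercepts can be sketched as follows. This is a minimal Python illustration: the function names are made up for the example and the coefficient values are hypothetical, not estimates.

```python
def market_dummies(market):
    """Map a market size to (L, M); small is the control group (0, 0)."""
    return {"small": (0, 0), "mid": (0, 1), "large": (1, 0)}[market]

def intercept(market, a0, a1, a2):
    """Group intercept a0 + a1*L + a2*M."""
    L, M = market_dummies(market)
    return a0 + a1 * L + a2 * M

# Hypothetical coefficient values, for illustration only.
a0, a1, a2 = 2.0, 5.0, 3.0
for m in ("small", "mid", "large"):
    print(m, market_dummies(m), intercept(m, a0, a1, a2))
```

Each non-control group's intercept differs from the small-market intercept by exactly its own coefficient, which is why both are read relative to the control group.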

Are Wins Worth More in a Large Market?

See regression output for binary variables as IVs – case #1B.

- E. Dummy Variable Trap
- 1. Notice:
- a) One variable with two categories
- (1) Use one binary variable
- b) One variable with three categories
- (1) Use two binary variables
(note: despite the fact that we’re using “binary”, this still uses the word “dummy”)

- 2. What's the rule for how many binary variables to create?
- 3. Use one less binary variable than the number of categories
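
The reason for the rule: in a model with an intercept, one binary variable per category would always add up to a column of ones, duplicating the intercept. A minimal Python sketch of that, using the six-team market-size data; the helper `dummies` and its `drop_first` flag are illustrative names, not a library API:

```python
def dummies(values, categories, drop_first=True):
    """One 0/1 column per category; drop the first to leave a control group."""
    kept = categories[1:] if drop_first else categories
    return {c: [1 if v == c else 0 for v in values] for c in kept}

markets = ["small", "large", "mid", "large", "mid", "small"]
cats = ["small", "mid", "large"]

ok = dummies(markets, cats)                       # 2 columns for 3 categories
trap = dummies(markets, cats, drop_first=False)   # 3 columns: the trap

# With all three columns, every row sums to 1, which duplicates the
# intercept's column of ones (perfect collinearity).
row_sums = [sum(col[i] for col in trap.values()) for i in range(len(markets))]
print(row_sums)  # [1, 1, 1, 1, 1, 1]
```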

SALARY = 35,491 + 18,393 TOP + 8,392 SENIOR - 10,615 ENTRY + 914 YEARS + 10,975 ADVDEGREE - 8,684 NODEGREE + 9,195 PROFCERT + 8,417 MALE


- Male workers earn, on average, ?? more (or less?) than females.
- An advanced degree is worth ??% more (or less?) than what education level?
- A professional certification is worth ??% more (or less?) than what?

- A.Slope changes according to the two values of the one binary variable
- B.Use only ONE binary variable per variable with two categories

- C. In the model \(Y = \alpha + \beta X_2 + \varepsilon\)
- 1. Y: profits per NBA team ($1,000,000s)
- 2. X2: wins per season
- 3. Suppose teams in large markets make more profit on each additional win than teams in other markets
- a) How does this differ from intercept shifting case?

Change Intercept            Change Slope
Different TOTAL profit      Different EXTRA profit
from total no. of wins      from one MORE win

- 4. How account for this in model?
- 5. D is the binary variable
- a)D = 1 if the team is in a large market
- b)D = 0 if the team is in a mid-sized or small market

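- 6. \(\beta = \beta_0 + \beta_1 D\)

- Recall: \(Y = \alpha + \beta X_2 + \varepsilon\) and \(\beta = \beta_0 + \beta_1 D\)

- 7. \(Y = \alpha + (\beta_0 + \beta_1 D) X_2 + \varepsilon\)
- 8. \(Y = \alpha + \beta_0 X_2 + \beta_1 D X_2 + \varepsilon\)
ESTIMATE MODEL 8. ABOVE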
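
Estimating model 8 requires one new regressor: the product of D and X2. A minimal Python sketch of building that column; the wins and market values are hypothetical and the variable names are illustrative:

```python
wins = [50, 30, 45, 20]      # hypothetical X2 values
large = [1, 0, 1, 0]         # hypothetical D values (1 = large market)

# The regressor that carries the slope difference is the product D * X2:
# it equals wins for large-market teams and 0 for everyone else.
d_times_wins = [d * x for d, x in zip(large, wins)]
print(d_times_wins)  # [50, 0, 45, 0]
```

The regression is then run on `wins` and `d_times_wins` together.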

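- recall: \(Y = \alpha + \beta_0 X_2 + \beta_1 D X_2 + \varepsilon\)
- mid/small: \(Y = \alpha + \beta_0 X_2 + \varepsilon\) (D = 0)
- large: \(Y = \alpha + (\beta_0 + \beta_1) X_2 + \varepsilon\) (D = 1)
- large: \(Y = \alpha + \beta^* X_2 + \varepsilon\) (D = 1), where \(\beta^* = \beta_0 + \beta_1\)
- 13. What differs between the 2 models?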

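[Figure: profits per team (vertical axis) against wins per team (horizontal axis). Two lines with the same intercept \(\alpha\): large market, \(Y = \alpha + (\beta_0 + \beta_1) X_2 + \varepsilon\), and mid/small market, \(Y = \alpha + \beta_0 X_2 + \varepsilon\). The large-market line is steeper (drawn assuming \(\beta_1 > 0\)).]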

- 15. \(\beta_1\) shows change in slope relative to control group
- 16. \(\beta_1\) shows change in slope due to change in market size
- 17. \(\beta_1\) measures difference in profits from each extra win between teams in different size markets, on average

[Figure: profits per team (vertical axis) against wins per team (horizontal axis). Large market: \(Y = \alpha + (\beta_0 + \beta_1) X_2 + \varepsilon\). Mid/small market: \(Y = \alpha + \beta_0 X_2 + \varepsilon\). For one extra win, the rise in profit is \(\beta_0\) on the mid/small line and \(\beta_0 + \beta_1\) on the large line; the drawing labels these 246 (mid/small) and 556 (large).]

\(\beta_1\) measures difference in profits from each extra win between teams in different size markets, on average

Are Wins Worth More in a Large Market?

See regression output for binary variables as IVs – case #2.

\(Y = \alpha + \beta_0 X_2 + \beta_1 D X_2 + \varepsilon\)

\(Y = \alpha + \beta_0 X_2 + \varepsilon\)

- 18. Hypothesis Test
- a) H0: no difference in extra profit (per extra win) between markets
- b) H0: \(\beta_1 = 0\)
- c) HA: there is a difference in extra profit (per extra win) between markets
- d) HA: \(\beta_1 \neq 0\)
- e) What test statistic use?

- D. Exercise (notes page)
- 1. Suppose you estimate \(Y = \alpha + \beta_0 X_2 + \beta_1 D X_2 + \varepsilon\) and . . .
- 2. \(\hat{\beta}_1 = 0.310\)
- 3. \(\hat{\beta}_0 = 0.246\)
- 4. What is value of slope for large market teams?
- 5. What is interpretation of each number?
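
The slope for each group comes from setting D to 0 or 1 in the slope expression. A minimal Python sketch using the two estimates from the exercise; the function name `slope` is illustrative:

```python
def slope(d, beta0_hat=0.246, beta1_hat=0.310):
    """Estimated slope on wins: beta0_hat + beta1_hat * D."""
    return beta0_hat + beta1_hat * d

print(round(slope(0), 3))  # 0.246  (mid/small markets, D = 0)
print(round(slope(1), 3))  # 0.556  (large markets, D = 1)
```

With Y in $1,000,000s, this reads as roughly $0.246 million of extra profit per additional win for mid/small-market teams and $0.556 million for large-market teams.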

- A. Intercept Different Across Qualitative Variable's (Political Party, Gender,...) Categories
- 1. This happens whenever the relationship between Y and X (in \(\beta\)) is unchanged for different categories of the qualitative variable, yet Y differs, even for the same values of X.

- 2. Examples
- Profit for same total number of wins differs across market size
- Earnings for females less than earnings for males even with same years of schooling

- B. Slope Coefficient Different Across Qualitative Variable's Categories
- 1. This happens whenever the relationship between Y and X (in \(\beta\)) differs for different categories of the qualitative variable and causes Y to differ, even for the same values of X.

- 2. Example
- Profit from each further win differs across market size
- It’s plausible (likely?) that an additional year of experience yields less extra earnings for females than for males

- C.How know which is correct?
- 1. Possibilities
- a)intercept binary variable correct
- b)slope binary variable correct
- c)both
- d)neither

- 2. Try and let tests tell you


- For each case below, state whether you would use a slope or intercept dummy. (NOTES)
- 1. Southern workers earn less than those in other regions of the country even with same years of schooling & experience
- 2. An additional year of education yields less extra earnings for single males than for married males
- 3. More swimsuits are bought in Minnesota during the summer than during the winter
- 4. Each extra year of experience yields less additional earnings for female coaches than for males

Binary Variables #2

- “The Advertising Experiment”

Test Next Class

- 8-10 questions - most have several parts
- some: calculations
- some: interpret regression results
- some: show you understand a concept
- bring calculator – NO CELL PHONES!
- bring 5 x 8 card