Categorical Variables - PowerPoint PPT Presentation

Categorical Variables

  • Independent (X) variables that take on only a limited number of values are termed categorical variables, dummy variables, or indicator variables.

  • Common examples are: time periods in which there is a price surge or bubble; months of the year; days of the week; gender; educational level.

Salary Data Example



  • Salary is the dependent variable that is to be estimated.

  • Executive is a categorical variable with only two values: 0 indicates that the employee is not an executive; 1 indicates that the employee is an executive.

  • A linear regression model may be constructed: Salary = a + b * Executive
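As a minimal sketch of what fitting Salary = a + b * Executive produces, the Python below fits the model by ordinary least squares on a small invented data set. The numbers and the helper name `fit_simple_ols` are assumptions made for illustration and are not the salary data from the slides. With a single 0/1 variable, the intercept a should come out as the mean salary of non-executives and a + b as the mean salary of executives.

```python
# Hypothetical salary data, invented for illustration (not from the slides).
executive = [0, 0, 0, 0, 1, 1, 1]
salary = [50_000, 55_000, 60_000, 55_000, 90_000, 100_000, 110_000]


def fit_simple_ols(x, y):
    """Least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


a, b = fit_simple_ols(executive, salary)

# Group means computed directly, for comparison with the fitted coefficients.
non_exec = [s for s, e in zip(salary, executive) if e == 0]
execs = [s for s, e in zip(salary, executive) if e == 1]
mean_non_exec = sum(non_exec) / len(non_exec)
mean_exec = sum(execs) / len(execs)

print(f"a     = {a:,.0f}  (mean non-executive salary: {mean_non_exec:,.0f})")
print(f"a + b = {a + b:,.0f}  (mean executive salary: {mean_exec:,.0f})")
```

The coefficient b is therefore the difference between the two group means, which is why its significance test amounts to asking whether the two groups differ.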

Executive Model Results

Salary = a + b * Executive

Highly Significant

Average Salary for non-Executive: a (the intercept)

Average Salary for Executive: a + b


Categorical Variables Effect

  • The effect of a categorical variable is to add an additional constant amount to the y-intercept for the subset of points included in the category.

  • Graphically, it creates a separate regression line for each category: the lines share the same slope, but their y-intercepts differ.
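The parallel-lines picture can be sketched numerically. The example below assumes a model with one extra numeric variable, Salary = a + b * Executive + c * Years, where `years` (years of experience) and all three coefficient values are hypothetical choices for illustration, not results from the slides. It shows that the gap between the two lines is the same constant b at every value of the numeric variable.

```python
# Hypothetical coefficients, chosen only for illustration.
a = 40_000  # intercept for the baseline category (Executive = 0)
b = 45_000  # constant shift added when Executive = 1
c = 2_000   # slope on a hypothetical numeric variable (years of experience)


def predicted_salary(executive, years):
    """Salary = a + b * Executive + c * Years."""
    return a + b * executive + c * years


# Both lines rise by c per year; they differ only in their intercepts.
for years in (0, 5, 10):
    non_exec = predicted_salary(0, years)
    exec_ = predicted_salary(1, years)
    print(f"years={years:2d}  non-exec={non_exec:,}  exec={exec_:,}  gap={exec_ - non_exec:,}")

# The vertical gap between the two lines at each year from 0 to 20.
gaps = [predicted_salary(1, y) - predicted_salary(0, y) for y in range(21)]
```

Letting the slopes differ by category would require an interaction term, which this sketch deliberately leaves out.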

Executive Variable Effect

Gender Model Results

Salary = a + b * Gender

Not Significant

Average Salary for 0-Gender: a (the intercept)

Average Salary for 1-Gender: a + b


Gender Variable Effect

Education Model Results

Salary = a + b * Education


Average Salary for 0 Education: a (the intercept)

Additional Value per year: b


Education Variable Effect

Education Variable

  • The Education variable has multiple values: 0, 2, 4, 6, 8. This variable was used directly in the regression estimation. Although it appeared categorical, it was in fact used as a numerical variable.

  • Implicit in entering any explanatory variable directly into a linear regression is the assumption that its effect increases or decreases linearly. For the education variable, this would mean that the effect on Salary of having a two-year degree is exactly ½ of the effect of having a four-year degree.

  • This linearity may be questionable.
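To make the linearity assumption concrete, here is a small sketch that fits Salary = a + b * Education with Education used as a number. The data are invented for illustration (they are not the slides' data) and are deliberately non-linear across levels. Whatever the data look like, the fitted effect of two years is forced to be exactly half the fitted effect of four years, so the fitted line can miss the actual group means.

```python
# Hypothetical data, invented for illustration (not from the slides).
education = [0, 0, 2, 2, 4, 4, 6, 6, 8, 8]
salary = [40_000, 44_000, 43_000, 45_000, 60_000, 64_000,
          70_000, 74_000, 95_000, 99_000]

# Closed-form least-squares fit of Salary = a + b * Education.
n = len(education)
mean_x = sum(education) / n
mean_y = sum(salary) / n
b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(education, salary))
     / sum((x - mean_x) ** 2 for x in education))
a = mean_y - b * mean_x


def fitted_effect(years):
    """Fitted change in salary relative to 0 years of education."""
    return b * years


# Actual mean salary at each education level, for comparison with the line.
levels = sorted(set(education))
group_mean = {
    lvl: sum(y for x, y in zip(education, salary) if x == lvl) / education.count(lvl)
    for lvl in levels
}

for lvl in levels:
    print(f"Education={lvl}: fitted={a + b * lvl:,.0f}  group mean={group_mean[lvl]:,.0f}")
print(f"effect of 2 years = {fitted_effect(2):,.0f}; effect of 4 years = {fitted_effect(4):,.0f}")
```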

Education Results

  • Linearity would imply, incorrectly, that dropping out after three years of college would cost only $8448 in salary compared with finishing one’s Bachelor’s degree.

  • But since a degree is either held or not (really “0” or “1”), a better approach is to treat each degree level as a separate categorical variable.

Constructed Education Categories

  • If the linearity of a limited-value variable is questionable, then the variable may be better modeled by constructing a series of indicator or dummy variables, each representing exactly one value: Education0, Education2, Education4, Education6, Education8. In this way, the effect of each level can be considered independently.

  • This technique is frequently used with time variables, e.g. months. One should not implicitly assume that the monthly effect in December (12) is 12 times as large as the monthly effect in January (1).
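The construction of the indicator variables can be sketched as follows. The data are invented for illustration and are not the slides' data; only the column names Education0 through Education8 follow the slide. The sketch assumes the indicators are used as a full set with no separate intercept, in which case the least-squares coefficient on each indicator is simply the mean salary of that group. The code checks this by confirming that the residuals are orthogonal to every indicator column.

```python
# Hypothetical data, invented for illustration (not from the slides).
education = [0, 0, 2, 2, 4, 4, 6, 6, 8, 8]
salary = [40_000, 44_000, 43_000, 45_000, 60_000, 64_000,
          70_000, 74_000, 95_000, 99_000]

levels = sorted(set(education))

# One 0/1 indicator column per level: Education0, Education2, ...
dummies = {
    f"Education{lvl}": [1 if x == lvl else 0 for x in education]
    for lvl in levels
}

# With a full set of indicators and no intercept, each least-squares
# coefficient equals the mean salary of the corresponding group.
coef = {
    name: sum(d * y for d, y in zip(col, salary)) / sum(col)
    for name, col in dummies.items()
}

# Least-squares check: residuals must be orthogonal to every indicator column.
fitted = [
    sum(coef[name] * col[i] for name, col in dummies.items())
    for i in range(len(salary))
]
residuals = [y - f for y, f in zip(salary, fitted)]
orthogonality = {
    name: sum(d * r for d, r in zip(col, residuals))
    for name, col in dummies.items()
}

for name in dummies:
    print(f"{name}: coefficient = {coef[name]:,.0f}")
```

If the model also includes an intercept, one indicator has to be left out as the baseline, since the full set of indicators sums to 1 in every row and would be perfectly collinear with the intercept. The remaining coefficients then measure each level's difference from the baseline. The same construction applies to months: twelve indicators (or eleven plus an intercept) rather than a single variable running from 1 to 12.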

Education Results

Results Discussion

  • The previous results reveal some effects but obscure others.

  • The results show that having less than a Bachelor’s degree has a significant $30,000 effect on average salary at this company.

  • The lack of statistical significance for the Bachelor’s degree and Master’s degree variables obscures a more basic problem: it is unreasonable to use these variables alone, since doing so would conflate the salaries of Ph.D.s with the salaries of those with no college, when it is clear that at this company the two groups earn nowhere near the same salary.

A Peek Ahead
