Categorical variables

Categorical Variables - PowerPoint PPT Presentation




PowerPoint Slideshow about 'Categorical Variables' - bridie

Presentation Transcript

Categorical Variables

  • Independent (X) variables that take on only a limited number of values are termed categorical variables, dummy variables, or indicator variables.

  • Common examples are: time periods in which there is a price surge or bubble; months of the year; days of the week; gender; educational level.


  • Salary is the dependent variable that is to be estimated.

  • Executive is a categorical variable that has only two values: 0 indicates that the employee is not an executive; 1 indicates that the employee is an executive.

  • A linear regression model may be constructed: Salary = a + b * Executive

Executive Model Results

Salary = a + b * Executive

Highly Significant

Average Salary for non-Executive: a (the intercept)

Average Salary for Executive: a + b
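
The link between the coefficients and the two group averages can be checked numerically. The sketch below is only an illustration: the salary figures are invented for the example and are not the data behind the slides. It fits Salary = a + b * Executive by least squares and compares a and a + b with the plain group means.

```python
import numpy as np

# Hypothetical data: salaries and a 0/1 Executive indicator (not from the slides).
salary = np.array([40000.0, 45000.0, 50000.0, 90000.0, 110000.0])
executive = np.array([0.0, 0.0, 0.0, 1.0, 1.0])

# Design matrix: a column of ones for the intercept a, the dummy column for b.
X = np.column_stack([np.ones_like(executive), executive])
(a, b), *_ = np.linalg.lstsq(X, salary, rcond=None)

# Plain group averages, computed without any regression.
mean_non_exec = salary[executive == 0].mean()
mean_exec = salary[executive == 1].mean()

print(f"a     = {a:.2f}  (mean non-executive salary: {mean_non_exec:.2f})")
print(f"a + b = {a + b:.2f}  (mean executive salary: {mean_exec:.2f})")
```

With a single 0/1 variable and an intercept, the least-squares fit reproduces the two group means, so b is the difference between the average executive and non-executive salary.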


Categorical Variables Effect

  • The effect of a categorical variable is to add a constant amount to the y-intercept for the subset of points included in the category.

  • Graphically, it creates a separate regression line for each category. The lines share the same slope, but their y-intercepts differ.
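
The parallel-lines picture can be sketched in code. This example assumes a second, numerical regressor called Experience, which does not appear in the slides, and uses made-up data constructed to lie exactly on two parallel lines. The fitted model is Salary = a + c * Experience + b * Executive.

```python
import numpy as np

# Hypothetical data (not from the slides): years of experience, a 0/1
# Executive indicator, and salaries generated from two parallel lines.
experience = np.array([1.0, 3.0, 5.0, 7.0, 2.0, 4.0, 6.0, 8.0])
executive = np.array([0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0])
salary = 30000.0 + 2000.0 * experience + 25000.0 * executive

# Columns: intercept a, slope c on Experience, shift b for executives.
X = np.column_stack([np.ones_like(experience), experience, executive])
(a, c, b), *_ = np.linalg.lstsq(X, salary, rcond=None)

# Two parallel lines: same slope c, intercepts a and a + b.
print(f"non-executive line: Salary = {a:.0f} + {c:.0f} * Experience")
print(f"executive line:     Salary = {a + b:.0f} + {c:.0f} * Experience")
```

The dummy variable only moves the line up or down by b. Letting the slope differ between categories would require an additional interaction term, which this sketch leaves out.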

Gender Model Results

Salary = a + b * Gender

Not Significant

Average Salary for 0-Gender: a (the intercept)

Average Salary for 1-Gender: a + b


Education Model Results

Salary = a + b * Education


Average Salary for 0 Education: a (the intercept of the fitted line)

Additional Value per year: b (the slope)


Education Variable

  • The Education variable has multiple values: 0, 2, 4, 6, 8. This variable was used directly in the regression estimation. Although it appeared categorical, it was in fact used as a numerical variable.

  • Implicit in the use of any explanatory variable is that its effect is linearly increasing or decreasing. For the education variable, this would mean that the effect on Salary of having a two-year degree would be exactly ½ of the effect of having a four-year degree.

  • This linearity may be questionable.
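
The implication can be written out from the fitted line. Assuming Education is measured in years of college (consistent with the values 0, 2, 4, 6, 8) and writing a and b for the intercept and slope of Salary = a + b * Education:

```latex
\begin{align*}
\text{Salary}(E) &= a + b \cdot E \\
\text{Salary}(2) - \text{Salary}(0) &= 2b \\
\text{Salary}(4) - \text{Salary}(0) &= 4b \\
\frac{\text{Salary}(2) - \text{Salary}(0)}{\text{Salary}(4) - \text{Salary}(0)} &= \frac{2b}{4b} = \frac{1}{2}
\end{align*}
```

Whatever value b takes, the model forces the two-year effect to be exactly half of the four-year effect.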

Education Results

  • Linearity would imply, incorrectly, that dropping out after three years of college would cost only $8,448 in salary compared with finishing one’s Bachelor’s degree.

  • But since holding a degree is really a 0-or-1 condition, a better approach is to treat each degree level as a separate categorical variable.

Constructed Education Categories

  • If the linearity of a limited-value variable is questionable, then the variable may be better modeled by constructing a series of indicator or dummy variables that each represent exactly one value: Education0, Education2, Education4, Education6, Education8. In this way, the effect of each level can be considered independently.

  • This technique is frequently used with time variables, e.g. months. One should not implicitly assume that the monthly effect in December (12) is 12 times as large as the monthly effect in January (1).
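
Constructing the indicator variables might look like the sketch below. The data are invented for illustration. Leaving Education0 out as the baseline is an assumption of this example, not something the slides specify: with an intercept in the model, including all five indicators would make the columns linearly dependent.

```python
import numpy as np

# Hypothetical data (not from the slides): years of college and salary.
education = np.array([0, 0, 2, 2, 4, 4, 6, 6, 8, 8])
salary = np.array([30000.0, 34000.0, 35000.0, 37000.0, 52000.0,
                   56000.0, 60000.0, 64000.0, 90000.0, 94000.0])

levels = [0, 2, 4, 6, 8]
# One 0/1 indicator column per level: Education0, Education2, ...
dummies = {f"Education{lv}": (education == lv).astype(float) for lv in levels}

# Intercept plus every indicator except the baseline Education0.
X = np.column_stack([np.ones(len(education))] +
                    [dummies[f"Education{lv}"] for lv in levels[1:]])
coef, *_ = np.linalg.lstsq(X, salary, rcond=None)

baseline = coef[0]  # average salary in the Education0 group
for lv, b in zip(levels[1:], coef[1:]):
    print(f"Education{lv}: {b:+.0f} relative to Education0 "
          f"(group mean {baseline + b:.0f})")
```

Each coefficient is the gap between that level's average salary and the baseline average, so no level is forced to be a fixed multiple of another.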

Results Discussion

  • The previous results reveal some effects but obscure others.

  • The results show that having less than a Bachelor’s degree has a significant $30,000 effect on average salary at this company.

  • The lack of statistical significance for the Bachelor’s degree and Master’s degree variables should be read with care. It is unreasonable to use either of these variables alone, since doing so conflates the salaries of Ph.D.s with the salaries of those with no college, when it is clear that at this company the two groups do not earn anything near the same salary.