Categorical Variables

- Independent X-variables that take on only a limited number of values are termed categorical variables, dummy variables, or indicator variables.
- Common examples are: time periods in which there is a price surge or bubble; months of the year; days of the week; gender; educational level.

- Salary is the dependent variable that is to be estimated.
- Executive is a categorical variable with only two values: 0 indicates that the employee is not an executive; 1 indicates that the employee is an executive.
- A linear regression model may be constructed: Salary = a + b * Executive

Salary = a + b * Executive

- Result: highly significant
- Average salary for non-executives: $37,514
- Average salary for executives: $90,601
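
As a minimal sketch of why the two averages fall out of this model: with a single 0/1 regressor, the least-squares intercept a equals the mean of the 0 group, and a + b equals the mean of the 1 group. The six salaries below are made-up illustrative values, not the company's data, and `fit_simple_ols` is a hypothetical helper name.

```python
from statistics import mean

def fit_simple_ols(x, y):
    """Closed-form least squares for y = a + b * x with one regressor."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

# Hypothetical data: Executive indicator and salary (not from the source).
executive = [0, 0, 0, 0, 1, 1]
salary = [30000, 35000, 40000, 45000, 85000, 95000]

a, b = fit_simple_ols(executive, salary)

mean_non_exec = mean([s for e, s in zip(executive, salary) if e == 0])
mean_exec = mean([s for e, s in zip(executive, salary) if e == 1])

print(round(a), round(mean_non_exec))   # intercept vs. non-executive mean
print(round(a + b), round(mean_exec))   # a + b vs. executive mean

# Applying the same identity to the averages reported above:
reported_a = 37514
reported_b = 90601 - 37514   # implied coefficient on Executive
print(reported_a, reported_b)
```

Under this reading, the reported averages imply a = $37,514 and b = $53,087, the estimated salary difference associated with being an executive.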

- The effect of a categorical variable is to add a constant amount to the y-intercept for the subset of points included in the category.
- Graphically, it creates a separate regression line for each category: the slope of the lines is the same, but the y-intercepts differ.
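
To make the "same slope, different intercept" picture concrete, here is a small sketch. It assumes the model also contains a numerical variable (a hypothetical `years` of experience) alongside the 0/1 indicator; the coefficient values are invented for illustration and are not estimates from the presentation.

```python
def predict_salary(years, executive, a=30000.0, b=50000.0, c=1500.0):
    """Salary = a + b * Executive + c * Years (all coefficients hypothetical)."""
    return a + b * executive + c * years

for years in (0, 5, 10):
    non_exec = predict_salary(years, executive=0)
    exec_ = predict_salary(years, executive=1)
    # The gap between the two lines is b at every value of years.
    print(years, non_exec, exec_, exec_ - non_exec)
```

Both lines rise by c per year; the indicator only shifts the executive line upward by b.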

Salary = a + b * Gender

- Result: not significant
- Average salary for Gender = 0: $52,126
- Average salary for Gender = 1: $43,646

Salary = a + b * Education

- Result: significant
- Average salary for Education = 0: $11,656
- Additional value per year of education: $8,448

- The Education variable has multiple values: 0, 2, 4, 6, 8. This variable was used directly in the regression estimation. Although it appeared categorical, it was in fact used as a numerical variable.
- Implicit in the use of any explanatory variable is that its effect is linearly increasing or decreasing. For the education variable, this would mean that the effect on Salary of having a two-year degree would be exactly ½ of the effect of having a four-year degree.
- This linearity may be questionable.
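
A short sketch of what the linear treatment implies, using the two reported estimates ($11,656 at Education = 0 and $8,448 per additional year). The function name `linear_prediction` is chosen here for illustration.

```python
INTERCEPT = 11656   # reported average salary at Education = 0
PER_YEAR = 8448     # reported additional salary per year of education

def linear_prediction(education_years):
    """Predicted salary when Education is used as a numerical variable."""
    return INTERCEPT + PER_YEAR * education_years

levels = [0, 2, 4, 6, 8]
predictions = {lvl: linear_prediction(lvl) for lvl in levels}
for lvl, pred in predictions.items():
    print(lvl, pred)

# Each two-year step adds the same amount, whatever the starting level.
steps = [predictions[hi] - predictions[lo] for lo, hi in zip(levels, levels[1:])]
print(steps)
```

Every two-year step is worth the same $16,896, which is exactly the assumption the text calls questionable.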

- Linearity would imply, incorrectly, that dropping out after three years of college costs only $8,448 in salary compared with finishing one’s Bachelor’s degree.
- But since a degree is either held or not (a 0 or 1 outcome), a better approach is to treat each degree level as a separate categorical variable.

- If the linearity of a limited-value variable is questionable, then the variable may be better modeled by constructing a series of indicator or dummy variables that each represent exactly one value: Education0, Education2, Education4, Education6, Education8. In this way, the effect of each level can be considered independently.
- This technique is frequently used with time variables, e.g. months. One should not implicitly assume that the monthly effect in December (12) is 12 times as large as the monthly effect in January (1).
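
One way to build such indicator columns is sketched below. The helper `make_indicators` and the sample values are hypothetical; only the column names Education0 through Education8 come from the text.

```python
def make_indicators(values, levels, prefix):
    """Return {column_name: [0/1, ...]} with one indicator column per level."""
    return {
        f"{prefix}{level}": [1 if v == level else 0 for v in values]
        for level in levels
    }

# Hypothetical education values for six employees.
education = [0, 4, 2, 8, 4, 6]
columns = make_indicators(education, levels=[0, 2, 4, 6, 8], prefix="Education")
for name, column in columns.items():
    print(name, column)

# The same helper works for months: December becomes its own 0/1 column
# instead of entering the model as the number 12.
months = [1, 12, 12, 7]
month_columns = make_indicators(months, levels=range(1, 13), prefix="Month")
print(month_columns["Month12"])
```

When the regression also includes an intercept, one indicator is normally left out to serve as the baseline, since the full set of columns always sums to 1 for every row.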

- The previous results reveal some effects but obscure others.
- The results show that having less than a Bachelor’s degree has a significant $30,000 effect on average salary at this company.
- The lack of statistical significance for the Bachelor’s degree and Master’s degree variables is misleading. It is unreasonable to use either variable alone, because its comparison group would conflate the salaries of Ph.D.’s with the salaries of those with no college, when it is clear that at this company the two groups earn nowhere near the same salary.
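
A small sketch of the conflation problem, using invented salaries (not the company's data) and assuming the coding 0 = no college, 4 = Bachelor's, 8 = Ph.D. With only a Bachelor's indicator in the model, its 0 group mixes employees with no college and Ph.D.'s, so the comparison is against an average of two very different groups.

```python
from statistics import mean

# Hypothetical (education level, salary) pairs; values are illustrative only.
employees = [
    (0, 20000), (0, 24000),   # no college
    (4, 48000), (4, 52000),   # Bachelor's
    (8, 90000), (8, 94000),   # Ph.D.
]

def group_mean(rows, keep):
    """Mean salary over the rows whose education level satisfies keep()."""
    return mean([salary for level, salary in rows if keep(level)])

bachelors = group_mean(employees, lambda lvl: lvl == 4)
everyone_else = group_mean(employees, lambda lvl: lvl != 4)
no_college = group_mean(employees, lambda lvl: lvl == 0)
phd = group_mean(employees, lambda lvl: lvl == 8)

# A lone Bachelor's indicator estimates bachelors - everyone_else.
print(bachelors - everyone_else)
# Separate indicators compare each level with a chosen baseline instead.
print(bachelors - no_college, phd - no_college)
```

In this toy data the lone indicator suggests a small negative Bachelor's effect, while the comparison against the no-college baseline shows a large positive one.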