
Categorical Models


nansen

Presentation Transcript


  1. Categorical Models Multivariate Cross Tabulation and Analysis of Variance

  2. Models • A Model: A statement of the relationship between a phenomenon to be explained and the factors, or variables, which explain it. • Steps in the Process of Quantitative Analysis: • Specification of the model • Estimation of the model • Evaluation of the model

  3. Types of Models • Regression Models • Dependent variable is interval. • Independent variables may be interval or categorical (dummy variables) • ANOVA Models • Dependent variable is interval • Independent variables are categorical • Covariates may be interval

  4. Types of Models, for Today, cont. • Multivariate Cross Tabulated Models • Dependent variable is categorical • Independent variables are categorical • Example: • Remember our bivariate cross tabulations

  5. Let’s think about the issues again… First we simplify a bit… Combining wards 20 and 22, we have three neighborhoods.

     Frequencies: OCC$ (rows) by NEIGH$ (columns)

                  EASTSIDE        NW  SOUTHSID      Total
               +-------------------------------+
     profcler  |        55        87         9 |      151
     prop      |        33        99        27 |      159
     skilled   |        16       263        90 |      369
     skillpart |        12        66        13 |       91
     unskilled |        12       117       175 |      304
               +-------------------------------+
     Total             128       632       314       1074

     The chi-square statistic is statistically significant, p < .001.

  6. Row and Column Percents

     Row percents: OCC$ (rows) by NEIGH$ (columns)

                  EASTSIDE        NW  SOUTHSID      Total      N
               +-------------------------------+
     profcler  |    36.424    57.616     5.960 |  100.000    151
     prop      |    20.755    62.264    16.981 |  100.000    159
     skilled   |     4.336    71.274    24.390 |  100.000    369
     skillpart |    13.187    72.527    14.286 |  100.000     91
     unskilled |     3.947    38.487    57.566 |  100.000    304
               +-------------------------------+
     Total          11.918    58.845    29.236    100.000
     N                 128       632       314               1074

     Column percents: OCC$ (rows) by NEIGH$ (columns)

                  EASTSIDE        NW  SOUTHSID      Total      N
               +-------------------------------+
     profcler  |    42.969    13.766     2.866 |   14.060    151
     prop      |    25.781    15.665     8.599 |   14.804    159
     skilled   |    12.500    41.614    28.662 |   34.358    369
     skillpart |     9.375    10.443     4.140 |    8.473     91
     unskilled |     9.375    18.513    55.732 |   28.305    304
               +-------------------------------+
     Total         100.000   100.000   100.000    100.000
     N                 128       632       314               1074

  7. We can say… • There are statistically significant differences in the distribution of occupation groups by neighborhood. • Measures of association for the table:

     Coefficient           Value
     Phi                   0.515
     Cramér's V            0.364
     Contingency Coeff.    0.458
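The three coefficients on this slide are all simple functions of the Pearson chi-square statistic and the sample size. The sketch below recomputes them from the occupation-by-neighborhood frequencies on slide 5, using only the Python standard library. The function and variable names are illustrative and are not part of the original analysis, which appears to have been run in a statistics package.

```python
import math

# Occupation (rows) by neighborhood (columns): EASTSIDE, NW, SOUTHSID.
# Counts are copied from the frequency table on slide 5.
observed = [
    [55, 87, 9],     # profcler
    [33, 99, 27],    # prop
    [16, 263, 90],   # skilled
    [12, 66, 13],    # skillpart
    [12, 117, 175],  # unskilled
]


def pearson_chi_square(table):
    """Return (chi-square, N): the sum of (observed - expected)^2 / expected."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (obs - expected) ** 2 / expected
    return chi2, n


chi2, n = pearson_chi_square(observed)
k = min(len(observed), len(observed[0])) - 1  # smaller of (rows - 1, cols - 1)

phi = math.sqrt(chi2 / n)
cramers_v = math.sqrt(chi2 / (n * k))
contingency_c = math.sqrt(chi2 / (chi2 + n))

print(f"chi-square = {chi2:.3f}, N = {n}")
print(f"phi = {phi:.3f}, Cramer's V = {cramers_v:.3f}, C = {contingency_c:.3f}")
```

Because the table is larger than 2 x 2, phi is not bounded by 1, which is one reason Cramér's V (rescaled by the smaller table dimension) is usually the easier coefficient to interpret here.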

  8. Extend the analysis to examine determinants of home ownership • Is there a relationship between home ownership and neighborhood? (bivariate) • Is there a relationship between home ownership and occupational group? (bivariate) • Is homeownership affected by both one’s neighborhood and one’s occupational group? (multivariate)

  9. Homeownership and Neighborhood

     Frequencies: OWN (rows) by NEIGH$ (columns)

                  EASTSIDE        NW  SOUTHSID      Total
               +-------------------------------+
     0         |        31       215        70 |      316
     1         |        79       409       240 |      728
               +-------------------------------+
     Total             110       624       310       1044

     Row percents: OWN (rows) by NEIGH$ (columns)

                  EASTSIDE        NW  SOUTHSID      Total      N
               +-------------------------------+
     0         |     9.810    68.038    22.152 |  100.000    316
     1         |    10.852    56.181    32.967 |  100.000    728
               +-------------------------------+
     Total          10.536    59.770    29.693    100.000
     N                 110       624       310               1044

     Column percents: OWN (rows) by NEIGH$ (columns)

                  EASTSIDE        NW  SOUTHSID      Total      N
               +-------------------------------+
     0         |    28.182    34.455    22.581 |   30.268    316
     1         |    71.818    65.545    77.419 |   69.732    728
               +-------------------------------+
     Total         100.000   100.000   100.000    100.000
     N                 110       624       310               1044

     Test statistic                   Value      df     Prob
     Pearson Chi-square              14.090   2.000    0.001
     Likelihood ratio Chi-square     14.467   2.000    0.001
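Both test statistics reported on this slide can be reproduced from the frequency table alone. The following is a minimal sketch using only the standard library; the function name is hypothetical. The p-value line relies on the fact that, for 2 degrees of freedom specifically, the upper-tail chi-square probability is exp(-x/2), so it does not generalize to other tables without a proper chi-square distribution function.

```python
import math

# OWN (rows: 0, 1) by neighborhood (columns: EASTSIDE, NW, SOUTHSID).
own_by_neigh = [
    [31, 215, 70],   # OWN = 0
    [79, 409, 240],  # OWN = 1
]


def chi_square_tests(table):
    """Return (Pearson chi-square, likelihood-ratio chi-square, df)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    pearson = 0.0
    likelihood_ratio = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            pearson += (obs - expected) ** 2 / expected
            if obs > 0:  # a zero cell contributes nothing to the LR statistic
                likelihood_ratio += 2 * obs * math.log(obs / expected)
    df = (len(table) - 1) * (len(table[0]) - 1)
    return pearson, likelihood_ratio, df


pearson, likelihood_ratio, df = chi_square_tests(own_by_neigh)

# Valid only because df == 2: P(chi-square > x) = exp(-x / 2).
p_pearson = math.exp(-pearson / 2)
p_lr = math.exp(-likelihood_ratio / 2)

print(f"Pearson chi-square = {pearson:.3f}, df = {df}, p = {p_pearson:.3f}")
print(f"LR chi-square      = {likelihood_ratio:.3f}, df = {df}, p = {p_lr:.3f}")
```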

  10. Homeownership and Occupation Group

     Frequencies: OCC$ (rows) by OWN (columns)

                         0         1      Total
               +---------------------+
     profcler  |        56        85 |      141
     prop      |        44       106 |      150
     skilled   |       125       239 |      364
     skillpart |        28        63 |       91
     unskilled |        63       231 |      294
               +---------------------+
     Total             316       724       1040

     Row percents: OCC$ (rows) by OWN (columns)

                         0         1      Total      N
               +---------------------+
     profcler  |    39.716    60.284 |  100.000    141
     prop      |    29.333    70.667 |  100.000    150
     skilled   |    34.341    65.659 |  100.000    364
     skillpart |    30.769    69.231 |  100.000     91
     unskilled |    21.429    78.571 |  100.000    294
               +---------------------+
     Total          30.385    69.615    100.000
     N                 316       724               1040

     Column percents: OCC$ (rows) by OWN (columns)

                         0         1      Total      N
               +---------------------+
     profcler  |    17.722    11.740 |   13.558    141
     prop      |    13.924    14.641 |   14.423    150
     skilled   |    39.557    33.011 |   35.000    364
     skillpart |     8.861     8.702 |    8.750     91
     unskilled |    19.937    31.906 |   28.269    294
               +---------------------+
     Total         100.000   100.000    100.000
     N                 316       724               1040

     Test statistic                   Value      df     Prob
     Pearson Chi-square              19.731   4.000    0.001
     Likelihood ratio Chi-square     20.159   4.000    0.000

  11. Three Way Table of Homeownership, Occupational Group and Neighborhood: 1

     Frequencies: OCC$ (rows) by NEIGH$ (columns)

     OWN = 0
                  EASTSIDE        NW  SOUTHSID      Total
               +-------------------------------+
     profcler  |        11        43         2 |       56
     prop      |        12        27         5 |       44
     skilled   |         6        97        22 |      125
     skillpart |         2        21         5 |       28
     unskilled |         0        27        36 |       63
               +-------------------------------+
     Total              31       215        70        316

     OWN = 1
                  EASTSIDE        NW  SOUTHSID      Total
               +-------------------------------+
     profcler  |        34        44         7 |       85
     prop      |        15        70        21 |      106
     skilled   |        10       161        68 |      239
     skillpart |        10        45         8 |       63
     unskilled |        10        86       135 |      231
               +-------------------------------+
     Total              79       406       239        724

  12. Three Way Table of Homeownership, Occupational Group and Neighborhood: 2

     Frequencies: OCC$ (rows) by OWN (columns)

     NEIGH$ = EASTSIDE
                         0         1      Total
               +---------------------+
     profcler  |        11        34 |       45
     prop      |        12        15 |       27
     skilled   |         6        10 |       16
     skillpart |         2        10 |       12
     unskilled |         0        10 |       10
               +---------------------+
     Total              31        79        110

     NEIGH$ = NW
                         0         1      Total
               +---------------------+
     profcler  |        43        44 |       87
     prop      |        27        70 |       97
     skilled   |        97       161 |      258
     skillpart |        21        45 |       66
     unskilled |        27        86 |      113
               +---------------------+
     Total             215       406        621

     NEIGH$ = SOUTHSID
                         0         1      Total
               +---------------------+
     profcler  |         2         7 |        9
     prop      |         5        21 |       26
     skilled   |        22        68 |       90
     skillpart |         5         8 |       13
     unskilled |        36       135 |      171
               +---------------------+
     Total              70       239        309

  13. How to Think About the Issues • We can examine how neighborhood affects homeownership, controlling for occupational group • We can examine how occupation group affects homeownership, controlling for neighborhood
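One way to make "controlling for" concrete is to compute the homeownership rate for each neighborhood separately within each occupation group, using the three-way frequencies from slides 11 and 12. The sketch below does that with plain dictionaries. The data layout and helper name are illustrative choices, and the treatment of OWN = 0 as "does not own" and OWN = 1 as "owns" is an assumption based on how the slides use the variable.

```python
# counts[neighborhood][occupation] = (OWN = 0 count, OWN = 1 count),
# copied from the three-way tables on slides 11 and 12.
counts = {
    "EASTSIDE": {"profcler": (11, 34), "prop": (12, 15), "skilled": (6, 10),
                 "skillpart": (2, 10), "unskilled": (0, 10)},
    "NW":       {"profcler": (43, 44), "prop": (27, 70), "skilled": (97, 161),
                 "skillpart": (21, 45), "unskilled": (27, 86)},
    "SOUTHSID": {"profcler": (2, 7), "prop": (5, 21), "skilled": (22, 68),
                 "skillpart": (5, 8), "unskilled": (36, 135)},
}
occupations = ["profcler", "prop", "skilled", "skillpart", "unskilled"]


def ownership_rate(non_owners, owners):
    """Share of households in a cell that own their home."""
    return owners / (owners + non_owners)


# rates[occupation][neighborhood]: neighborhood effect within one occupation.
rates = {
    occ: {neigh: ownership_rate(*cells[occ]) for neigh, cells in counts.items()}
    for occ in occupations
}

print(f"{'':10}" + "".join(f"{neigh:>10}" for neigh in counts))
for occ in occupations:
    print(f"{occ:10}" + "".join(f"{rates[occ][neigh]:>10.3f}" for neigh in counts))
```

Reading across a row shows how neighborhood relates to ownership with occupation held fixed; reading down a column shows how occupation relates to ownership with neighborhood held fixed. Several East Side and South Side cells are small, so those rates are unstable.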

  14. Another Example: School Attendance in Milwaukee in 1910 • Data: Federal Census sample from 1910 • Cases: 1500 people organized into households, so information on everyone in the family • 390 cases are school aged children: 5-18 years old • Variables include: school attendance, ethnicity, age, sex, occupation of the household head, and more • We ask how sex, Dad’s occupation and ethnicity affect school attendance

  15. Bivariate Patterns

                                       Sex
                                 Female      Male     Total
     Not in school    Count          71        51       122
                      %           36.2%     26.3%     31.3%
     Yes, in school   Count         125       143       268
                      %           63.8%     73.7%     68.7%
     Total            Count         196       194       390
                      %            100%      100%      100%

     Chi-Square Tests
                             Value      df    Asymp. Sig. (2-sided)
     Pearson Chi-Square      4.478(b)    1    .034
     N of Valid Cases        390
     b. 0 cells (.0%) have expected count less than 5. The minimum expected count is 60.69.

  16. Bivariate Patterns

                                 Father’s Occupation Group
                                      1         2         3     Total
     Not in school    Count          17        59        46       122
                      %           23.3%     29.2%     40.0%     31.3%
     Yes, in school   Count          56       143        69       268
                      %           76.7%     70.8%     60.0%     68.7%
     Total            Count          73       202       115       390
                      %            100%      100%      100%      100%

     Chi-Square Tests
                             Value      df    Asymp. Sig. (2-sided)
     Pearson Chi-Square      6.641(a)    2    .036
     N of Valid Cases        390
     a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 22.84.
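The footnote about expected counts is a check on whether the chi-square approximation is reasonable: each cell's expected count is its row total times its column total, divided by N. A short sketch, with illustrative names, that recomputes the expected counts, the minimum, and the Pearson statistic for the school attendance by father's occupation table:

```python
# School attendance (rows) by father's occupation group (columns 1, 2, 3).
school_by_occ = [
    [17, 59, 46],    # not in school
    [56, 143, 69],   # in school
]


def expected_counts(table):
    """Expected cell counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


expected = expected_counts(school_by_occ)
min_expected = min(e for row in expected for e in row)
cells_below_5 = sum(1 for row in expected for e in row if e < 5)

chi2 = sum(
    (obs - exp) ** 2 / exp
    for obs_row, exp_row in zip(school_by_occ, expected)
    for obs, exp in zip(obs_row, exp_row)
)

print(f"minimum expected count = {min_expected:.2f}")
print(f"cells with expected count < 5: {cells_below_5}")
print(f"Pearson chi-square = {chi2:.3f}")
```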

  17. Bivariate Patterns

  18. Some concepts: Odds and Probabilities • Odds: frequency of being in one category relative to the frequency of not being in that category. • Example: The odds that the first card dealt in a card game is the queen of hearts are 1/51 (1 to 51). • Probability: frequency of being in one category relative to the total of all categories. • Example: The probability that the first card dealt in a card game is the queen of hearts is 1/52 (one in 52).

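  19. Odds and Probabilities, cont. • Probability: number of successes / total number of trials (cases), written p_i • Odds: frequency of the event relative to the frequency of not being in the category • Odds = p_i/(1 - p_i) • An odds ratio is the ratio of two such odds, for example the odds in one group divided by the odds in another.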
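The conversion between probability and odds can be written as two small functions. The sketch below uses exact fractions so the card example from the previous slide comes out as 1/52 and 1/51 without rounding; the function names are illustrative.

```python
from fractions import Fraction


def odds_from_probability(p):
    """Odds = p / (1 - p)."""
    return p / (1 - p)


def probability_from_odds(odds):
    """Inverse conversion: p = odds / (1 + odds)."""
    return odds / (1 + odds)


# Card example: one queen of hearts in a 52-card deck.
p_queen = Fraction(1, 52)
odds_queen = odds_from_probability(p_queen)

print(p_queen, odds_queen)                 # 1/52 1/51
print(probability_from_odds(odds_queen))   # 1/52
```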

  20. So, • Probability of being a homeowner from example above: 724/(316 + 724) = .696, or 724 in 1040 • Odds of being a homeowner: 724 to 316 • Odds as a single number: 724/316 = 2.29 • NOTE: If the probability is > .5, the odds are > 1. If the probability is < .5, the odds are < 1.

  21. We can also calculate conditional probabilities and odds • For the East Side: probability of being a homeowner is 79/110 or .718. Odds are 79/31 = 2.55 • For the South Side: probability of being a homeowner is 239/309 or .773. Odds are 239/70 = 3.41 • For the Northwest Side: probability of being a homeowner is 406/621 or .654. Odds are 406/215 = 1.89
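The conditional figures can be computed directly from the neighborhood column totals of the three-way table (slide 12). This is a minimal sketch with illustrative names; the last line additionally forms one odds ratio, South Side against Northwest Side, which is not reported on the slides and is included only to show how two conditional odds are compared.

```python
# (OWN = 0 count, OWN = 1 count) per neighborhood, from the slide 12 totals.
by_neighborhood = {
    "EASTSIDE": (31, 79),
    "NW": (215, 406),
    "SOUTHSID": (70, 239),
}

summary = {}
for neigh, (non_owners, owners) in by_neighborhood.items():
    probability = owners / (owners + non_owners)  # conditional probability of owning
    odds = owners / non_owners                    # conditional odds of owning
    summary[neigh] = (probability, odds)
    print(f"{neigh:9} probability = {probability:.3f}  odds = {odds:.2f}")

# Odds ratio: how many times larger the odds of owning are on the South Side
# than on the Northwest Side (an added illustration, not a slide result).
odds_ratio_south_vs_nw = summary["SOUTHSID"][1] / summary["NW"][1]
print(f"odds ratio, SOUTHSID vs NW = {odds_ratio_south_vs_nw:.2f}")
```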
