Categorical Models: Multivariate Cross Tabulation and Analysis of Variance
Models
• A model: a statement of the relationship between a phenomenon to be explained and the factors, or variables, that explain it.
• Steps in the process of quantitative analysis:
  • Specification of the model
  • Estimation of the model
  • Evaluation of the model
Types of Models
• Regression models
  • Dependent variable is interval.
  • Independent variables may be interval or categorical (dummy variables).
• ANOVA models
  • Dependent variable is interval.
  • Independent variables are categorical.
  • Covariates may be interval.
Types of Models for Today, cont.
• Multivariate cross-tabulated models
  • Dependent variable is categorical.
  • Independent variables are categorical.
• Example: recall our bivariate cross tabulations.
Let’s think about the issues again. First we simplify a bit: combining wards 20 and 22, we have three neighborhoods.

Frequencies: OCC$ (rows) by NEIGH$ (columns)
             EASTSIDE        NW  SOUTHSID      Total
          +-------------------------------+
profcler  |        55        87         9 |      151
prop      |        33        99        27 |      159
skilled   |        16       263        90 |      369
skillpart |        12        66        13 |       91
unskilled |        12       117       175 |      304
          +-------------------------------+
Total             128       632       314       1074

The chi-square is statistically significant, p < .001.
Row and Column Percents

Row percents: OCC$ (rows) by NEIGH$ (columns)
             EASTSIDE        NW  SOUTHSID      Total      N
          +-------------------------------+
profcler  |    36.424    57.616     5.960 |  100.000    151
prop      |    20.755    62.264    16.981 |  100.000    159
skilled   |     4.336    71.274    24.390 |  100.000    369
skillpart |    13.187    72.527    14.286 |  100.000     91
unskilled |     3.947    38.487    57.566 |  100.000    304
          +-------------------------------+
Total          11.918    58.845    29.236    100.000
N                 128       632       314              1074

Column percents: OCC$ (rows) by NEIGH$ (columns)
             EASTSIDE        NW  SOUTHSID      Total      N
          +-------------------------------+
profcler  |    42.969    13.766     2.866 |   14.060    151
prop      |    25.781    15.665     8.599 |   14.804    159
skilled   |    12.500    41.614    28.662 |   34.358    369
skillpart |     9.375    10.443     4.140 |    8.473     91
unskilled |     9.375    18.513    55.732 |   28.305    304
          +-------------------------------+
Total         100.000   100.000   100.000    100.000
N                 128       632       314              1074
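The row and column percents above can be reproduced from the frequency table. The following is a minimal Python sketch, not the statistical package that produced the slides; the function and variable names are illustrative assumptions.

```python
# Occupation (rows) by neighborhood (columns) counts from the frequency table.
ROWS = ["profcler", "prop", "skilled", "skillpart", "unskilled"]
COLS = ["EASTSIDE", "NW", "SOUTHSID"]
COUNTS = [
    [55, 87, 9],
    [33, 99, 27],
    [16, 263, 90],
    [12, 66, 13],
    [12, 117, 175],
]


def row_percents(table):
    """Each cell as a percentage of its row total."""
    return [[100 * cell / sum(row) for cell in row] for row in table]


def column_percents(table):
    """Each cell as a percentage of its column total."""
    col_totals = [sum(col) for col in zip(*table)]
    return [[100 * cell / total for cell, total in zip(row, col_totals)]
            for row in table]


row_pct = row_percents(COUNTS)
col_pct = column_percents(COUNTS)

# Print the row percents in roughly the layout used on the slide.
for label, pcts in zip(ROWS, row_pct):
    print(f"{label:<10}" + "".join(f"{p:10.3f}" for p in pcts))
```

Row percents answer "where do members of an occupation group live?", while column percents answer "what is the occupational makeup of a neighborhood?".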
We can say…
• There are statistically significant differences in the distribution of occupation groups by neighborhood.
• Measures of association:
  Coefficient                Value
  Phi                        0.515
  Cramér's V                 0.364
  Contingency coefficient    0.458
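The three coefficients are all functions of the Pearson chi-square and the sample size. The sketch below assumes the usual textbook definitions (phi = sqrt(chi2/n), Cramér's V = sqrt(chi2/(n·min(r−1, c−1))), contingency coefficient = sqrt(chi2/(chi2+n))); the helper names are hypothetical, and the counts are the occupation by neighborhood frequencies.

```python
import math

# Occupation (rows) by neighborhood (columns) counts.
COUNTS = [
    [55, 87, 9],
    [33, 99, 27],
    [16, 263, 90],
    [12, 66, 13],
    [12, 117, 175],
]


def pearson_chi_square(table):
    """Return (chi-square, n) for a two-way table of counts."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, n


def association_measures(table):
    """Phi, Cramer's V, and the contingency coefficient from chi-square."""
    chi2, n = pearson_chi_square(table)
    smaller_dim = min(len(table), len(table[0])) - 1
    return {
        "phi": math.sqrt(chi2 / n),
        "cramers_v": math.sqrt(chi2 / (n * smaller_dim)),
        "contingency": math.sqrt(chi2 / (chi2 + n)),
    }


measures = association_measures(COUNTS)
for name, value in measures.items():
    print(f"{name:<12}{value:.3f}")
```

For tables larger than 2 x 2, phi is not bounded by 1, which is one reason Cramér's V is often preferred when comparing tables of different sizes.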
Extend the analysis to examine determinants of homeownership
• Is there a relationship between homeownership and neighborhood? (bivariate)
• Is there a relationship between homeownership and occupational group? (bivariate)
• Is homeownership affected by both one’s neighborhood and one’s occupational group? (multivariate)
Homeownership and Neighborhood

Frequencies: OWN (rows) by NEIGH$ (columns)
             EASTSIDE        NW  SOUTHSID      Total
          +-------------------------------+
0         |        31       215        70 |      316
1         |        79       409       240 |      728
          +-------------------------------+
Total             110       624       310       1044

Row percents: OWN (rows) by NEIGH$ (columns)
             EASTSIDE        NW  SOUTHSID      Total      N
          +-------------------------------+
0         |     9.810    68.038    22.152 |  100.000    316
1         |    10.852    56.181    32.967 |  100.000    728
          +-------------------------------+
Total          10.536    59.770    29.693    100.000
N                 110       624       310              1044

Column percents: OWN (rows) by NEIGH$ (columns)
             EASTSIDE        NW  SOUTHSID      Total      N
          +-------------------------------+
0         |    28.182    34.455    22.581 |   30.268    316
1         |    71.818    65.545    77.419 |   69.732    728
          +-------------------------------+
Total         100.000   100.000   100.000    100.000
N                 110       624       310              1044

Test statistic                   Value       df     Prob
Pearson Chi-square              14.090    2.000    0.001
Likelihood ratio Chi-square     14.467    2.000    0.001
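Both test statistics reported for this table can be computed from the observed counts and the counts expected under independence. The sketch below is an illustration with hypothetical function names, not the package output itself. The p-value line uses a closed form that is valid only because this table has 2 degrees of freedom.

```python
import math

# OWN (rows: 0 = not owner, 1 = owner) by NEIGH$ (columns) counts.
OWN_BY_NEIGH = [
    [31, 215, 70],
    [79, 409, 240],
]


def expected_counts(table):
    """Expected counts under independence: row total * column total / n."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    return [[r * c / n for c in col_totals] for r in row_totals]


def pearson_chi_square(table):
    """Sum over cells of (observed - expected)^2 / expected."""
    expected = expected_counts(table)
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))


def likelihood_ratio_chi_square(table):
    """2 * sum over cells of observed * ln(observed / expected)."""
    expected = expected_counts(table)
    # Cells with an observed count of 0 contribute nothing to the sum.
    return 2 * sum(o * math.log(o / e)
                   for obs_row, exp_row in zip(table, expected)
                   for o, e in zip(obs_row, exp_row) if o > 0)


pearson = pearson_chi_square(OWN_BY_NEIGH)
lr = likelihood_ratio_chi_square(OWN_BY_NEIGH)
df = (len(OWN_BY_NEIGH) - 1) * (len(OWN_BY_NEIGH[0]) - 1)

# Upper-tail chi-square probability; exp(-x / 2) holds only when df == 2.
p_pearson = math.exp(-pearson / 2)

print(f"Pearson = {pearson:.3f}, LR = {lr:.3f}, df = {df}, p = {p_pearson:.3f}")
```

The two statistics usually agree closely in large samples, as they do here; large disagreements tend to signal sparse cells.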
Homeownership and Occupation Group

Frequencies: OCC$ (rows) by OWN (columns)
                    0         1      Total
          +---------------------+
profcler  |        56        85 |      141
prop      |        44       106 |      150
skilled   |       125       239 |      364
skillpart |        28        63 |       91
unskilled |        63       231 |      294
          +---------------------+
Total             316       724       1040

Row percents: OCC$ (rows) by OWN (columns)
                    0         1      Total      N
          +---------------------+
profcler  |    39.716    60.284 |  100.000    141
prop      |    29.333    70.667 |  100.000    150
skilled   |    34.341    65.659 |  100.000    364
skillpart |    30.769    69.231 |  100.000     91
unskilled |    21.429    78.571 |  100.000    294
          +---------------------+
Total          30.385    69.615    100.000
N                 316       724              1040

Column percents: OCC$ (rows) by OWN (columns)
                    0         1      Total      N
          +---------------------+
profcler  |    17.722    11.740 |   13.558    141
prop      |    13.924    14.641 |   14.423    150
skilled   |    39.557    33.011 |   35.000    364
skillpart |     8.861     8.702 |    8.750     91
unskilled |    19.937    31.906 |   28.269    294
          +---------------------+
Total         100.000   100.000    100.000
N                 316       724              1040

Test statistic                   Value       df     Prob
Pearson Chi-square              19.731    4.000    0.001
Likelihood ratio Chi-square     20.159    4.000    0.000
Three-Way Table of Homeownership, Occupational Group, and Neighborhood: 1

Frequencies: OCC$ (rows) by NEIGH$ (columns)

OWN = 0
             EASTSIDE        NW  SOUTHSID      Total
          +-------------------------------+
profcler  |        11        43         2 |       56
prop      |        12        27         5 |       44
skilled   |         6        97        22 |      125
skillpart |         2        21         5 |       28
unskilled |         0        27        36 |       63
          +-------------------------------+
Total              31       215        70        316

OWN = 1
             EASTSIDE        NW  SOUTHSID      Total
          +-------------------------------+
profcler  |        34        44         7 |       85
prop      |        15        70        21 |      106
skilled   |        10       161        68 |      239
skillpart |        10        45         8 |       63
unskilled |        10        86       135 |      231
          +-------------------------------+
Total              79       406       239        724
Three-Way Table of Homeownership, Occupational Group, and Neighborhood: 2

Frequencies: OCC$ (rows) by OWN (columns)

NEIGH$ = EASTSIDE
                    0         1      Total
          +---------------------+
profcler  |        11        34 |       45
prop      |        12        15 |       27
skilled   |         6        10 |       16
skillpart |         2        10 |       12
unskilled |         0        10 |       10
          +---------------------+
Total              31        79        110

NEIGH$ = NW
                    0         1      Total
          +---------------------+
profcler  |        43        44 |       87
prop      |        27        70 |       97
skilled   |        97       161 |      258
skillpart |        21        45 |       66
unskilled |        27        86 |      113
          +---------------------+
Total             215       406        621

NEIGH$ = SOUTHSID
                    0         1      Total
          +---------------------+
profcler  |         2         7 |        9
prop      |         5        21 |       26
skilled   |        22        68 |       90
skillpart |         5         8 |       13
unskilled |        36       135 |      171
          +---------------------+
Total              70       239        309
How to Think About the Issues
• We can examine how neighborhood affects homeownership, controlling for occupational group.
• We can examine how occupational group affects homeownership, controlling for neighborhood.
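One simple way to "control for" a third variable in a cross tabulation is to compare ownership rates within each of its categories. The sketch below does this in both directions using the three-way frequencies; the data structure and names are illustrative assumptions, not part of the original analysis.

```python
# (non-owners, owners) by neighborhood and occupation, from the three-way table.
COUNTS = {
    "EASTSIDE": {"profcler": (11, 34), "prop": (12, 15), "skilled": (6, 10),
                 "skillpart": (2, 10), "unskilled": (0, 10)},
    "NW": {"profcler": (43, 44), "prop": (27, 70), "skilled": (97, 161),
           "skillpart": (21, 45), "unskilled": (27, 86)},
    "SOUTHSID": {"profcler": (2, 7), "prop": (5, 21), "skilled": (22, 68),
                 "skillpart": (5, 8), "unskilled": (36, 135)},
}


def ownership_rate(not_own, own):
    """Share of households in a cell that own their home."""
    return own / (not_own + own)


# Occupation effect, controlling for neighborhood:
# compare occupations within each neighborhood.
by_neighborhood = {
    neigh: {occ: ownership_rate(*cell) for occ, cell in occs.items()}
    for neigh, occs in COUNTS.items()
}

# Neighborhood effect, controlling for occupation:
# compare neighborhoods within each occupation.
by_occupation = {}
for neigh, occs in COUNTS.items():
    for occ, cell in occs.items():
        by_occupation.setdefault(occ, {})[neigh] = ownership_rate(*cell)

for occ, rates in by_occupation.items():
    print(occ, {neigh: round(rate, 3) for neigh, rate in rates.items()})
```

Some cells are small (for example, only 10 unskilled households on the East Side), so individual rates should be read with caution.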
Another Example: School Attendance in Milwaukee in 1910
• Data: Federal Census sample from 1910
• Cases: 1,500 people organized into households, so there is information on everyone in the family
• 390 cases are school-aged children, 5 to 18 years old
• Variables include school attendance, ethnicity, age, sex, occupation of the household head, and more
• We ask how sex, Dad’s occupation, and ethnicity affect school attendance
Bivariate Patterns

                           Sex
                         Female     Male    Total
Not in school    Count       71       51      122
                 %        36.2%    26.3%    31.3%
Yes, in school   Count      125      143      268
                 %        63.8%    73.7%    68.7%
Total            Count      196      194      390
                 %         100%     100%     100%

Chi-Square Tests
                      Value       df   Asymp. Sig. (2-sided)
Pearson Chi-Square    4.478(b)     1   .034
N of Valid Cases      390
b. 0 cells (.0%) have expected count less than 5. The minimum expected count is 60.69.
Bivariate Patterns

                         Father’s Occupation Group
                              1        2        3    Total
Not in school    Count       17       59       46      122
                 %        23.3%    29.2%    40.0%    31.3%
Yes, in school   Count       56      143       69      268
                 %        76.7%    70.8%    60.0%    68.7%
Total            Count       73      202      115      390
                 %         100%     100%     100%     100%

Chi-Square Tests
                      Value       df   Asymp. Sig. (2-sided)
Pearson Chi-Square    6.641(a)     2   .036
N of Valid Cases      390
a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 22.84.
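The footnotes in the two school attendance tables report the smallest expected count, a common check on whether the chi-square approximation is reasonable. The sketch below recomputes the statistic, the significance level, and that footnote for both tables. The function name is hypothetical, and the p-value shortcuts cover only 1 and 2 degrees of freedom, which is all these two tables need.

```python
import math

# Rows: not in school, in school.
SCHOOL_BY_SEX = [[71, 51], [125, 143]]          # columns: female, male
SCHOOL_BY_OCC = [[17, 59, 46], [56, 143, 69]]   # columns: occupation groups 1-3


def chi_square_summary(table):
    """Pearson chi-square with df, p-value, and expected-count diagnostics."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(table) - 1) * (len(table[0]) - 1)
    if df == 1:
        p = math.erfc(math.sqrt(chi2 / 2))  # exact upper tail for df = 1
    elif df == 2:
        p = math.exp(-chi2 / 2)             # exact upper tail for df = 2
    else:
        p = None  # other df need a general chi-square distribution function
    flat = [e for row in expected for e in row]
    return {
        "chi2": chi2,
        "df": df,
        "p": p,
        "min_expected": min(flat),
        "cells_below_5": sum(1 for e in flat if e < 5),
    }


by_sex = chi_square_summary(SCHOOL_BY_SEX)
by_occ = chi_square_summary(SCHOOL_BY_OCC)
for name, result in (("sex", by_sex), ("occupation", by_occ)):
    print(name, {key: round(value, 3) for key, value in result.items()})
```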
Some Concepts: Odds and Probabilities
• Odds: frequency of being in one category relative to the frequency of not being in that category.
  • Example: The odds that the first card dealt in a card game is the queen of hearts are 1/51 (1 to 51).
• Probability: frequency of being in one category relative to the total of all categories.
  • Example: The probability that the first card dealt in a card game is the queen of hearts is 1/52 (1 in 52).
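Odds and Probabilities, cont.
• Probability (pi): number of successes / total number of trials (cases)
• Odds: frequency of being in the category relative to the frequency of not being in it
• Odds ratio, as used here: the odds written as a single number, pi/(1 − pi). (Strictly, the term "odds ratio" refers to the ratio of two such odds.)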
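The relationship between a probability and the corresponding odds can be checked with exact fractions. This is a small sketch using Python's standard `fractions` module and the card example; the two function names are illustrative.

```python
from fractions import Fraction


def odds_from_probability(p):
    """Odds in favor of an event: p / (1 - p). Undefined when p == 1."""
    return p / (1 - p)


def probability_from_odds(odds):
    """Inverse conversion: odds / (1 + odds)."""
    return odds / (1 + odds)


# One queen of hearts in a 52-card deck.
p_queen = Fraction(1, 52)
odds_queen = odds_from_probability(p_queen)

print(odds_queen)                         # 1/51, i.e. 1 to 51
print(probability_from_odds(odds_queen))  # 1/52
```

A probability of exactly one half corresponds to odds of 1 (even odds), which is the dividing line between odds above and below 1.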
So,
• Probability of being a homeowner from the example above: 724/(316 + 724) = 724/1040 = .696, that is, 724 in 1040
• Odds of being a homeowner: 724 to 316
• Odds ratio for homeowners: 724/316 = 2.29
• NOTE: If the probability is > .5, the odds ratio is > 1. If the probability is < .5, the odds ratio is < 1.
We can also calculate conditional probabilities and odds ratios
• For the East Side: probability of being a homeowner is 79/110 or .718. Odds ratio is 79/31 = 2.55
• For the South Side: probability of being a homeowner is 239/309 or .773. Odds ratio is 239/70 = 3.41
• For the Northwest Side: probability of being a homeowner is 406/621 or .654. Odds ratio is 406/215 = 1.89
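The conditional figures can be recomputed directly from the neighborhood counts. The sketch below computes the probability and the odds of owning (owners divided by non-owners, the quantity the slide labels "odds ratio") for each neighborhood. As an addition not in the original, it also compares two neighborhoods by dividing one set of odds by the other. Working from counts rather than rounded percentages can shift the second decimal place slightly. All names are illustrative.

```python
# (non-owners, owners) by neighborhood, from the three-way table totals.
OWNERSHIP = {
    "EASTSIDE": (31, 79),
    "NW": (215, 406),
    "SOUTHSID": (70, 239),
}


def probability(not_own, own):
    """Owners as a share of all households in the neighborhood."""
    return own / (not_own + own)


def odds(not_own, own):
    """Owners relative to non-owners."""
    return own / not_own


summary = {
    neigh: {"probability": probability(*cell), "odds": odds(*cell)}
    for neigh, cell in OWNERSHIP.items()
}


def odds_ratio(neigh_a, neigh_b):
    """How many times larger the odds of owning are in A than in B."""
    return summary[neigh_a]["odds"] / summary[neigh_b]["odds"]


for neigh, stats in summary.items():
    print(f"{neigh:<9} p = {stats['probability']:.3f}  odds = {stats['odds']:.2f}")
print(f"South Side vs. Northwest: {odds_ratio('SOUTHSID', 'NW'):.2f}")
```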