Sociology 601 Lecture 11: October 6, 2009

ilaird


  1. Sociology 601 Lecture 11: October 6, 2009
     • No office hours Oct. 15, but available all day Oct. 16
     • Homework
     • Contingency Tables for Categorical Variables (8.1)
     • some useful probabilities and hypothesis tests based on contingency tables
     • independence redefined
     • The Chi-Squared Test (8.2) [Thursday]
     • When to use Chi-squared tests (8.3) [Thursday]
     • chi-squared residuals

  2. Homework
     • Stata ttests: means and proportions – using categorical, dummy, interval/continuous variables
     • P values with the t table: t=3, n=9, what is P?
     • # 30 – industrial plant – part C
     • # 52 – random number generator
     • Small sample significance test
     • # 54 – e is incorrect

  4. Definitions for a 2X2 contingency table
     • Let X and Y denote two categorical variables
     • Variable X (Explanatory/Independent variable) can have one of two values: X = 1 or X = 2
     • Variable Y (Response/Dependent variable) can have one of two values: Y = 1 or Y = 2
     • nij denotes the count of responses in the cell of the table where X = i and Y = j

  5. Structure for a 2X2 contingency table
     • Values for X and Y variables are arrayed as follows:

                   Y = 1    Y = 2
          X = 1     n11      n12
          X = 2     n21      n22

  6. Some useful definitions
     • The unconditional probability P(Y = 1):
         = (n11 + n21) / (n11 + n12 + n21 + n22)
         = the marginal probability that Y equals 1
     • The conditional probability P(Y = 1, given X = 1):
         = n11 / (n11 + n12)
         = P((Y = 1) | (X = 1))
     • The joint probability P(Y = 1 and X = 1):
         = n11 / (n11 + n12 + n21 + n22)
         = P((Y = 1) ∩ (X = 1))
         = the cell probability for cell (1,1)

  7. Example:

                                   Support Law Enforcement?
                                   Yes      No     Tot
        Support health    Yes      292      25     317
        care spending?    No        14       9      23
                          Tot      306      34     340

     • What is the unconditional probability of favoring increased spending on law enforcement?
     • What is the conditional probability of favoring increased spending on law enforcement for respondents who opposed increased spending on health?
     • What is the joint probability of favoring increased spending on law enforcement and opposing increased spending on health?
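The three questions on slide 7 can be checked numerically. What follows is a minimal Python sketch (used here for illustration instead of the lecture's Stata), assuming the rows of the table are support for health care spending and the columns are support for law enforcement spending. The variable names are illustrative and not from the lecture.

```python
# Counts from the slide 7 table.
# Key: (support health care spending, support law enforcement spending)
counts = {
    ("yes", "yes"): 292, ("yes", "no"): 25,
    ("no", "yes"): 14, ("no", "no"): 9,
}

total = sum(counts.values())  # all respondents in the table

# Unconditional (marginal) probability of favoring law enforcement spending
law_yes = counts[("yes", "yes")] + counts[("no", "yes")]
p_unconditional = law_yes / total

# Conditional probability of favoring law enforcement spending,
# given opposition to health care spending
health_no = counts[("no", "yes")] + counts[("no", "no")]
p_conditional = counts[("no", "yes")] / health_no

# Joint probability of favoring law enforcement spending
# and opposing health care spending
p_joint = counts[("no", "yes")] / total

print(f"unconditional: {p_unconditional:.3f}")
print(f"conditional:   {p_conditional:.3f}")
print(f"joint:         {p_joint:.3f}")
```

The conditional probability divides by the row total (23) while the joint probability divides by the table total (340), which is the only difference between the two calculations.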

  8. Hypothesis tests based on contingency tables:
     • Usually we ask: is the distribution of Y when X = 1 different from the distribution of Y when X = 2?
     • Null Hypothesis: the conditional distributions of Y, given X, are equal.
         Ho: P((Y = 1) | (X = 1)) – P((Y = 1) | (X = 2)) = 0
         alternatively, Ho: π(Y|X=1) – π(Y|X=2) = 0
     • This type of question often comes up because of its causal implications.
     • For example: “Are childless adults more likely to vote for school funding than parents?”

  9. A confusing new definition for independence
     • Previously we used the term independence to refer to groups of observations.
     • “White and Hispanic respondents were sampled independently.”
     • In this chapter, we use independence to refer to a property of variables, not observations.
     • “Political orientation is independently distributed with respect to ethnicity.”
     • Two categorical variables are independent if the conditional distributions of one variable are identical at each category of the other variable.

  10. Contingency tables in STATA
      • The 1991 General Social Survey contains data on Party Identification and Gender for 980 respondents.
      • See Table 8.1, page 250 in A&F
      • Here is a program for inputting the data into STATA interactively:

        input str10 gender str12 party number
        female democrat 279
        male democrat 165
        female independent 73
        male independent 47
        female republican 225
        male republican 191
        end

  11. Contingency tables in STATA
      • Here is a command to create a contingency table, and its output

        . tabulate gender party [freq=number]

                   |              party
            gender |  democrat  independe  republica |     Total
        -----------+---------------------------------+----------
            female |       279         73        225 |       577
              male |       165         47        191 |       403
        -----------+---------------------------------+----------
             Total |       444        120        416 |       980

      • The following slide adds row, column, and cell %

  12. . tabulate gender party [freq=number], row column cell

        +-------------------+
        | Key               |
        |-------------------|
        |     frequency     |
        |  row percentage   |
        | column percentage |
        |  cell percentage  |
        +-------------------+

                   |              party
            gender |  democrat  independe  republica |     Total
        -----------+---------------------------------+----------
            female |       279         73        225 |       577
                   |     48.35      12.65      38.99 |    100.00
                   |     62.84      60.83      54.09 |     58.88
                   |     28.47       7.45      22.96 |     58.88
        -----------+---------------------------------+----------
              male |       165         47        191 |       403
                   |     40.94      11.66      47.39 |    100.00
                   |     37.16      39.17      45.91 |     41.12
                   |     16.84       4.80      19.49 |     41.12
        -----------+---------------------------------+----------
             Total |       444        120        416 |       980
                   |     45.31      12.24      42.45 |    100.00
                   |    100.00     100.00     100.00 |    100.00
                   |     45.31      12.24      42.45 |    100.00
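The row, column, and cell percentages in the tabulate output can be reproduced by hand from the six counts. The following is a small Python sketch of that arithmetic, not a replacement for the Stata command. The helper `pct` and the dictionary layout are assumptions made for illustration.

```python
# Counts from Table 8.1 as entered on slide 10 (rows: gender, columns: party).
parties = ["democrat", "independent", "republican"]
counts = {
    "female": [279, 73, 225],
    "male": [165, 47, 191],
}

row_totals = {g: sum(c) for g, c in counts.items()}
col_totals = [sum(counts[g][j] for g in counts) for j in range(len(parties))]
n = sum(row_totals.values())

def pct(part, whole):
    """Percentage rounded to two decimals, as in the tabulate display."""
    return round(100 * part / whole, 2)

# Row %: each cell divided by its row total (distribution of party within gender)
row_pct = {g: [pct(c, row_totals[g]) for c in counts[g]] for g in counts}
# Column %: each cell divided by its column total
col_pct = {g: [pct(c, col_totals[j]) for j, c in enumerate(counts[g])] for g in counts}
# Cell %: each cell divided by the table total
cell_pct = {g: [pct(c, n) for c in counts[g]] for g in counts}

for g in counts:
    print(g, "row %:   ", row_pct[g])
    print(g, "column %:", col_pct[g])
    print(g, "cell %:  ", cell_pct[g])
```

The row percentages are the conditional distributions of party given gender. Under the definition of independence on slide 9, the female and male row-percentage lists would be identical if party and gender were independent in this sample.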

  13. 8.2 Developing a new statistical significance test for contingency tables.

                                   support tax reform?
                                   Yes      No     Tot
        support           Yes      150     100     250
        environment?      No       200      50     250
                          Tot      350     150     500

      • “Is the level of support for the environment dependent on the level of support for tax reform?”
      • If so, these two measures are likely to have some causal link worth investigating.

  14. With a 2x2 table, we can use a z-test for independent-sample proportions.

        . prtesti 250 .6 250 .8

        Two-sample test of proportion                      x: Number of obs =      250
                                                           y: Number of obs =      250
        ------------------------------------------------------------------------------
            Variable |       Mean   Std. Err.      z    P>|z|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
                   x |         .6   .0309839                      .5392727    .6607273
                   y |         .8   .0252982                      .7504164    .8495836
        -------------+----------------------------------------------------------------
                diff |        -.2        .04                     -.2783986   -.1216014
                     |  under Ho:   .0409878    -4.88   0.000
        ------------------------------------------------------------------------------
                diff = prop(x) - prop(y)                                  z =  -4.8795
            Ho: diff = 0

            Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
         Pr(Z < z) = 0.0000         Pr(|Z| < |z|) = 0.0000          Pr(Z > z) = 1.0000
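The "under Ho" standard error and the z statistic in the prtesti output can be reproduced with a few lines of arithmetic. The following Python sketch assumes the usual pooled-proportion standard error for a two-sample test of proportions. The function names `two_proportion_z` and `normal_cdf` are illustrative, not part of Stata or the lecture.

```python
import math

def two_proportion_z(n1, p1, n2, p2):
    """Return (z, se_null) for Ho: p1 - p2 = 0 using the pooled proportion."""
    pooled = (n1 * p1 + n2 * p2) / (n1 + n2)
    se_null = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se_null, se_null

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Same inputs as: prtesti 250 .6 250 .8
z, se_null = two_proportion_z(250, 0.6, 250, 0.8)
p_two_sided = 2 * normal_cdf(-abs(z))

print(f"se under Ho = {se_null:.7f}")
print(f"z = {z:.4f}")
print(f"two-sided P = {p_two_sided:.7f}")
```

The inputs .6 and .8 are the proportions supporting tax reform among the 250 respondents who support the environment (150/250) and the 250 who do not (200/250).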

  15. Moving beyond 2x2 tables:
      • Comparing conditional probabilities is fine when there are only two comparisons and two possible outcomes for each comparison.
      • The Chi-Square (χ²) test is a new technique that makes comparisons more flexible.
      • χ² tests a null hypothesis that every cell should have the frequency you would expect if the variables were independently distributed.
      • fe is the expected count for each cell.
          fe = total N * unconditional row probability * unconditional column probability
      • A test for the whole table combines, across every cell, the comparison of the observed count with fe.
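As an illustration of the fe formula, the following Python sketch computes expected counts for the tax reform and environment table from slide 13. It covers only the expected counts and does not build the test statistic. The variable names are assumptions made for this example.

```python
# Observed counts from the slide 13 table.
# Rows: support environment (yes, no); columns: support tax reform (yes, no).
observed = [
    [150, 100],
    [200, 50],
]

n = sum(sum(row) for row in observed)
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]

# fe = total N * unconditional row probability * unconditional column probability
expected = [
    [n * (r / n) * (c / n) for c in col_totals]
    for r in row_totals
]

for obs_row, exp_row in zip(observed, expected):
    print("observed:", obs_row, " expected:", [f"{fe:.1f}" for fe in exp_row])
```

Because both rows have the same total (250), both rows get the same expected counts, 175 and 75. Those are the counts that would make the two conditional distributions identical.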
