Describing Association for Discrete Variables

Presentation Transcript


  1. Describing Association for Discrete Variables

  2. Discrete variables can have one of two different qualities: 1. ordered categories 2. non-ordered categories

  3. 1. Ordered categories, e.g., “High,” “Medium,” and “Low” [both variables must be ordered]
     2. Non-ordered categories, e.g., “Yes” and “No”

  4. Relationships between two variables may be either 1. symmetrical or 2. asymmetrical

  5. Symmetrical means that we are only interested in describing the extent to which two variables “hang around together” [non-directional]. Symbolically, X ↔ Y

  6. Asymmetrical means that we want a measure of association that yields a different description of X’s influence on Y from Y’s influence on X [directional]. Symbolically, X → Y and Y → X

  7. [Table: choice of statistic, classified by Ordered Categories (No / Yes) and Asymmetrical Relationship (No / Yes)]

  8. For symmetrical relationships between two non-ordered variables, there are two choices: 1. Yule’s Q (for 2x2 tables) 2. Cramer’s V (for larger tables)

  9. Respondents in the 1997 General Social Survey (GSS 1997) were asked:
     Were they strong supporters of any political party (yes or no)? and
     Did they vote in the 1996 presidential election (yes or no)?
                         Party Identification
     Voting Turnout   Not Strong   Strong   Total
     Voted                a           b     a + b
     Not Voted            c           d     c + d
     Total              a + c       b + d   a + b + c + d

  10.                    Party Identification
      Voting Turnout   Not Strong   Strong   Total
      Voted                   615      339     954
      Not Voted               318       59     377
      Total                   933      398   1,331

  11. Q = [(339)(318) - (615)(59)] / [(339)(318) + (615)(59)]
        = [(107,802) - (36,285)] / [(107,802) + (36,285)]
        = (71,517) / (144,087)
        = 0.496
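The calculation on this slide can be checked with a short Python sketch. The function name `yules_q` is illustrative, and the arguments a, b, c, d follow the lettered cells of the 2 x 2 layout shown earlier. The slide multiplies b by c first because of how the categories are ordered in this table; many textbooks write Q as (ad - bc) / (ad + bc), which would flip the sign for the same layout.

```python
def yules_q(a, b, c, d):
    """Yule's Q for a 2 x 2 table laid out as

                   Not Strong   Strong
        Voted           a          b
        Not Voted       c          d

    following the slide's ordering: (bc - ad) / (bc + ad).
    """
    same_direction = b * c      # Strong & Voted  times  Not Strong & Not Voted
    opposite_direction = a * d  # Not Strong & Voted  times  Strong & Not Voted
    return (same_direction - opposite_direction) / (same_direction + opposite_direction)


q = yules_q(615, 339, 318, 59)
print(round(q, 3))
```

The same function can be applied to any other 2 x 2 table in the deck by passing its four cell counts in the same order.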

  12. What does this mean? Yule’s Q varies from 0.00 (statistical independence; no association) to +1.00 (perfect direct association) and -1.00 (perfect inverse association).

  13. Use the following rule of thumb (for now):
      0.00 to 0.24   "No relationship"
      0.25 to 0.49   "Weak relationship"
      0.50 to 0.74   "Moderate relationship"
      0.75 to 1.00   "Strong relationship"
      Yule’s Q = +0.496 (0.50 to two decimal places) ". . . represents a moderate positive association between party identification strength and voting turnout."
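One possible way to apply this rule of thumb in code is sketched below. The function name `describe_strength` is hypothetical, and two choices are assumptions rather than anything stated on the slide: the sign is ignored, and the value is rounded to two decimals before it is placed in a band (which is how 0.496 lands in the moderate band).

```python
def describe_strength(value):
    """Label a measure of association using the rule-of-thumb bands.

    Assumptions (not stated on the slide): the sign is ignored and the
    value is rounded to two decimals before it is placed in a band.
    """
    size = round(abs(value), 2)
    if size < 0.25:
        return "No relationship"
    if size < 0.50:
        return "Weak relationship"
    if size < 0.75:
        return "Moderate relationship"
    return "Strong relationship"


print(describe_strength(0.496))
```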

  14.                    Party Identification
      Voting Turnout   Not Strong   Strong   Total
      Voted                     0      954     954
      Not Voted               377        0     377
      Total                   377      954   1,331

  15. What would be the value of Yule's Q?
      Q = [(954)(377) - (0)(0)] / [(954)(377) + (0)(0)]
        = [(359,658) - (0)] / [(359,658) + (0)]
        = (359,658) / (359,658)
        = 1.000

  16.                    Party Identification
      Voting Turnout   Not Strong   Strong   Total
      Voted                   477      477     954
      Not Voted               189      188     377
      Total                   666      665   1,331

  17. In this case, Yule's Q would be:
      Q = [(477)(189) - (477)(188)] / [(477)(189) + (477)(188)]
        = [(90,153) - (89,676)] / [(90,153) + (89,676)]
        = (477) / (179,829)
        = 0.003

  18. Yule's Q can only be calculated for 2 x 2 tables. For larger tables (e.g., 3 x 4 tables having three rows and four columns), most statistical programs such as SAS report the Cramer's V statistic. Cramer's V has properties similar to Yule's Q, but since it is computed from χ² it cannot take negative values:
      $$V = \sqrt{\frac{\chi^2}{N \cdot \min(R - 1,\ C - 1)}}$$
      where min(R - 1, C - 1) means either the number of rows less one or the number of columns less one, whichever is smaller, and N is the sample size.

  19. In the example above, χ² = 50.968 and Cramer's V = 0.196
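As a rough check on these figures, the sketch below computes an uncorrected Pearson chi-square and Cramer's V for the party identification by voting turnout table. The function names are illustrative, and the assumption is that the slide's chi-square is the ordinary Pearson statistic without a continuity correction; computed this way the statistic comes out at about 50.97, so the last decimals may differ slightly from the reported value.

```python
import math


def chi_square(table):
    """Pearson chi-square for a table given as a list of rows of counts.

    Returns the statistic together with the sample size N.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat, n


def cramers_v(table):
    """Cramer's V: sqrt(chi-square / (N * min(R - 1, C - 1)))."""
    stat, n = chi_square(table)
    smaller = min(len(table) - 1, len(table[0]) - 1)
    return math.sqrt(stat / (n * smaller))


voting = [[615, 339],   # Voted:     Not Strong, Strong
          [318, 59]]    # Not Voted: Not Strong, Strong
v = cramers_v(voting)
print(round(v, 3))
```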

  20. For asymmetrical relationships between two non-ordered variables, the statistic of choice is: Lambda (λ)

  21. Lambda is calculated as follows:
      λ = [(Non-modal responses on Y) - (Sum of non-modal responses for each category of X)] / (Non-modal responses on Y)

  22.                    Party Identification
      Voting Turnout   Not Strong   Strong   Total
      Voted                   615      339     954
      Not Voted               318       59     377
      Total                   933      398   1,331

  23. In this example,
      λ = [(377) - (318 + 59)] / (377)
        = [(377) - (377)] / (377)
        = (0) / (377)
        = 0.00
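The lambda calculation can be sketched in Python as follows. The function name `lambda_y_given_x` is illustrative, and the sketch assumes the dependent variable (Y, voting turnout) is in the rows and the independent variable (X, party identification) is in the columns, as in the table above.

```python
def lambda_y_given_x(table):
    """Lambda for predicting the row variable (Y) from the column variable (X).

    `table` is a list of rows of counts.
    """
    row_totals = [sum(row) for row in table]
    n = sum(row_totals)
    # Non-modal responses on Y: everyone outside the largest row.
    nonmodal_y = n - max(row_totals)
    # Non-modal responses within each category of X: everyone outside the
    # largest cell of each column, summed over the columns.
    nonmodal_within_x = sum(sum(col) - max(col) for col in zip(*table))
    return (nonmodal_y - nonmodal_within_x) / nonmodal_y


voting = [[615, 339],   # Voted:     Not Strong, Strong
          [318, 59]]    # Not Voted: Not Strong, Strong
lam = lambda_y_given_x(voting)
print(lam)
```

Lambda is zero here because "Voted" is the modal response in both party identification categories, so knowing X does not change the best guess for Y.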

  24. For symmetrical relationships between two variables having ordered categories, the statistic of choice is: Gamma (G)

  25. $$G = \frac{n_s - n_d}{n_s + n_d}$$
      where n_s is the number of concordant pairs and n_d is the number of discordant pairs

  26. The concepts of concordant and discordant pairs are simple and are based on a generalization of the diagonal and off-diagonal in the Yule’s Q statistic.

  27. To construct concordant pairs: "Starting with the upper right cell (i.e., the first row, last column in the table), add together all frequencies in cells below AND to the left of this cell, then multiply that sum by the cell frequency. Move to the next cell (i.e., still row one, but now one column to the left) and do the same thing. Repeat until there are NO cells to the left AND below the target cell. Then sum up all these products to form the value for the concordant pairs."

  28. To illustrate, take the crosstabulation below, which shows the relationship between a measure of social class and respondents' satisfaction with their current financial situation:
                                    Social Class
      Financially Satisfied   Lower   Working   Middle   Upper   Total
      Very well                  10       131      251      36     428
      More or less               19       309      343      19     690
      Not at all                 43       190       84       7     324
      Total                      72       630      678      62   1,442

  29-32. [The same crosstabulation, repeated on four slides that step through the target cells used for the concordant pairs.]

  33. For this table, the calculations are:
      36 x (343 + 309 + 19 + 84 + 190 + 43) = 35,568
      251 x (309 + 19 + 190 + 43) = 140,811
      131 x (19 + 43) = 8,122
      19 x (84 + 190 + 43) = 6,023
      343 x (190 + 43) = 79,919
      309 x (43) = 13,287
      These are NOT the value of the concordant pairs; they are the values that must be added together to determine the value of concordant pairs.
      ns = (35,568 + 140,811 + 8,122 + 6,023 + 79,919 + 13,287)
      ns = 283,730
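The counting rule for concordant pairs can be written as a short Python sketch. The function name `concordant_pairs` and the variable `satisfaction_by_class` are illustrative; the rows and columns are in the same order as the crosstabulation, and "concordant" follows the slide's convention for this layout (cells below AND to the left of the target cell).

```python
def concordant_pairs(table):
    """Sum, over every cell, of the cell count times the total of all
    cells below AND to the left of it (the slide's rule for this layout)."""
    total = 0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            below_left = sum(table[r][c]
                             for r in range(i + 1, len(table))
                             for c in range(j))
            total += count * below_left
    return total


satisfaction_by_class = [
    [10, 131, 251, 36],   # Very well:    Lower, Working, Middle, Upper
    [19, 309, 343, 19],   # More or less
    [43, 190, 84, 7],     # Not at all
]
n_s = concordant_pairs(satisfaction_by_class)
print(n_s)
```

Cells in the last row or first column have nothing below and to their left, so they contribute zero and the loop can safely visit every cell.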

  34. To construct discordant pairs: "Starting with the upper left cell (i.e., the first row, first column in the table), add together all frequencies in cells below AND to the right of this cell, then multiply that sum by the cell frequency. Move to the next cell (i.e., still row one, but now one column to the right) and do the same thing. Repeat until there are NO cells to the right AND below the target cell. Then sum up all these products to form the value for the discordant pairs."

  35-38. [The same crosstabulation, repeated on four slides that step through the target cells used for the discordant pairs.]

  39. For the discordant pairs in this table, the calculations are:
      10 x (309 + 343 + 19 + 190 + 84 + 7) = 9,520
      131 x (343 + 19 + 84 + 7) = 59,343
      251 x (19 + 7) = 6,526
      19 x (190 + 84 + 7) = 5,339
      309 x (84 + 7) = 28,119
      343 x (7) = 2,401
      Again, these are NOT the value of the discordant pairs; they are the values that must be added together to determine the value of discordant pairs.
      nd = (9,520 + 59,343 + 6,526 + 5,339 + 28,119 + 2,401)
      nd = 111,248
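The discordant-pair rule mirrors the concordant one, looking below and to the right instead. A minimal Python sketch, with the illustrative names `discordant_pairs` and `satisfaction_by_class`, and the table entered in the same row and column order as the crosstabulation:

```python
def discordant_pairs(table):
    """Sum, over every cell, of the cell count times the total of all
    cells below AND to the right of it (the slide's rule for this layout)."""
    total = 0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            below_right = sum(table[r][c]
                              for r in range(i + 1, len(table))
                              for c in range(j + 1, len(row)))
            total += count * below_right
    return total


satisfaction_by_class = [
    [10, 131, 251, 36],   # Very well:    Lower, Working, Middle, Upper
    [19, 309, 343, 19],   # More or less
    [43, 190, 84, 7],     # Not at all
]
n_d = discordant_pairs(satisfaction_by_class)
print(n_d)
```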

  40. G = [(283,730) - (111,248)] / [(283,730) + (111,248)] = (172,482) / (394,978) = 0.437
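For completeness, Gamma can also be computed directly from the table in one pass. This is a sketch under the same layout assumptions as the slides (concordant means below and to the left, discordant means below and to the right); the name `gamma_from_table` is hypothetical.

```python
def gamma_from_table(table):
    """Return (G, n_s, n_d) for a table given as a list of rows of counts."""
    n_s = n_d = 0
    rows, cols = len(table), len(table[0])
    for i in range(rows):
        for j in range(cols):
            for r in range(i + 1, rows):      # only cells in lower rows
                for c in range(cols):
                    if c < j:                 # below and to the left
                        n_s += table[i][j] * table[r][c]
                    elif c > j:               # below and to the right
                        n_d += table[i][j] * table[r][c]
    return (n_s - n_d) / (n_s + n_d), n_s, n_d


satisfaction_by_class = [
    [10, 131, 251, 36],   # Very well:    Lower, Working, Middle, Upper
    [19, 309, 343, 19],   # More or less
    [43, 190, 84, 7],     # Not at all
]
g, n_s, n_d = gamma_from_table(satisfaction_by_class)
print(round(g, 3), n_s, n_d)
```

Cells in the same column as the target cell are skipped, since those pairs are tied on the column variable and count as neither concordant nor discordant.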

  41. For asymmetrical relationships between two variables having ordered categories, the statistic of choice is: Somers’ dyx

  42. For this crosstabulation, we specify Social Class (the column variable) as the independent variable (X) and Financial Satisfaction (the row variable) as the dependent variable (Y).
                                        Social Class (X)
      Financially Satisfied (Y)   Lower   Working   Middle   Upper   Total
      Very well                      10       131      251      36     428
      More or less                   19       309      343      19     690
      Not at all                     43       190       84       7     324
      Total                          72       630      678      62   1,442

  43. Somers' dyx statistic is created by adjusting concordant and discordant pairs for tied pairs on the dependent variable (Y).
      In the example we have been using, the only asymmetrical relationship that makes sense is one with the dependent variable (Y) as the row variable. Therefore Somers' dyx will be shown only for this situation, that is, for tied pairs on the row variable. (Tied pairs for the column variable follow the identical logic.)
      A tied pair consists of two respondents who fall in the same category of the dependent variable but in different categories of the independent variable. In the case of financial satisfaction, these are respondents who express the same satisfaction level but who identify themselves with different social classes. In other words, for a dependent row variable, a cell's ties are all the observations in the other cells of the same row.

  44. The computational rule is: Target the upper left-hand cell (in the first row, first column); multiply its value by the sum of the cell frequencies to its right in the same row; move one cell to the right and multiply its value by the sum of the cell frequencies to its right in the same row; repeat until there are no more cells to the right in the same row; then move to the first cell in the next row (first column) and continue in the same way through every row of the table. Add up these products.
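The tied-pairs rule can be sketched in Python as below. The name `tied_pairs_on_rows` is illustrative. The last lines go one step beyond this slide and are an assumption: they use the usual definition dyx = (ns - nd) / (ns + nd + Ty), where Ty is the number of pairs tied on Y only, together with the ns and nd totals from the Gamma calculation.

```python
def tied_pairs_on_rows(table):
    """Pairs tied on the row variable (Y) but differing on the column
    variable (X): each cell count times the cells to its right in its row."""
    total = 0
    for row in table:
        for j, count in enumerate(row):
            total += count * sum(row[j + 1:])
    return total


satisfaction_by_class = [
    [10, 131, 251, 36],   # Very well:    Lower, Working, Middle, Upper
    [19, 309, 343, 19],   # More or less
    [43, 190, 84, 7],     # Not at all
]
t_y = tied_pairs_on_rows(satisfaction_by_class)

# Assumption: the usual definition of Somers' dyx, using the concordant
# and discordant totals worked out for Gamma earlier in the deck.
n_s, n_d = 283730, 111248
d_yx = (n_s - n_d) / (n_s + n_d + t_y)
print(t_y, round(d_yx, 3))
```

Because the tied pairs are added to the denominator only, dyx can never exceed Gamma in absolute value for the same table.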

  45-48. [The same crosstabulation, repeated on four slides that step through the target cells used for the tied pairs.]
