Contingency Tables

Contingency Tables • Chapters Seven, Sixteen, and Eighteen • Chapter Seven • Definition of Contingency Tables • Basic Statistics • SPSS program (Crosstabulation) • Chapter Sixteen • Basic Probability Theory Concepts • Test of Hypothesis of Independence

Basic Empirical Situation • Unit of data. • Two nominal scales measured for each unit. • Example: interview study, sex of respondent, variable such as whether or not subject has a cellular telephone. • Objective is to compare males and females with respect to what fraction have cellular telephones.

Contingency Table • One column for each value of the column variable; C is the number of columns. • One row for each value of the row variable; R is the number of rows. • R x C contingency table.

Contingency Table • Each entry is the OBSERVED COUNT O(i,j): the number of units having the (i,j) contingency. • Column of marginal totals. • Row of marginal totals.

Basic Hypothesis • ASSUME column variable is the independent variable. • Hypothesis is independence. • That is, the conditional distribution in any column is the same as the conditional distribution in any other column.

Expected Count • Basic idea is to allocate each column total across the rows in proportion to the row totals. • Expected count in (i,j) contingency = E(i,j) = (total number in row i * total number in column j) / total number in table. • Expected count need not be an integer; one expected count for each contingency.
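The expected-count rule can be sketched in a few lines of Python. This is a minimal illustration, not the SPSS procedure; the table below is hypothetical (invented counts for a cellular telephone by sex table), and the function name `expected_counts` is an assumption introduced here.

```python
# Hypothetical 2 x 2 table (counts invented for illustration).
# Rows: has a cellular telephone (yes, no). Columns: sex (male, female).
observed = [
    [30, 20],
    [15, 40],
]

def expected_counts(table):
    """Return E(i,j) = row total i * column total j / table total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

expected = expected_counts(observed)
for row in expected:
    # Expected counts are generally not integers.
    print([round(e, 2) for e in row])
```

The expected counts reproduce the observed row and column totals, which is one quick check that the allocation was done correctly.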

Residual • Residual in (i,j) contingency = observed count in (i,j) contingency - expected count in (i,j) contingency. • That is, R(i,j)= O(i,j)-E(i,j) • One residual for each contingency.

Pearson Chi-squared Component • Chi-squared component for (i,j) contingency = C(i,j) = (Residual in (i,j) contingency)^2 / expected count in (i,j) contingency. • C(i,j) = (R(i,j))^2 / E(i,j)

Assessing Pearson Component • Rough guides on whether the (i,j) contingency has an excessively large chi-squared component C(i,j), based on the chi-squared distribution with 1 degree of freedom: • The observed significance level of 3.84 is about 0.05. • Of 6.63, about 0.01. • Of 10.83, about 0.001.
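The three cutoffs match the upper tail of a chi-squared distribution with 1 degree of freedom, which can be checked with the standard library alone. The sketch below assumes that reference distribution; the helper name `chi2_sf_1df` is introduced here for illustration. It uses the fact that a chi-squared variable with 1 degree of freedom is the square of a standard normal variable.

```python
import math

def chi2_sf_1df(x):
    """Upper-tail probability P(X > x) for chi-squared with 1 df.

    Since X = Z^2 for standard normal Z, P(X > x) = P(|Z| > sqrt(x)),
    which equals erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2.0))

for cutoff in (3.84, 6.63, 10.83):
    print(cutoff, round(chi2_sf_1df(cutoff), 4))
```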

Pearson Chi-Squared Test • Sum C(i,j) over all contingencies. • Pearson chi-squared test has (R-1)(C-1) degrees of freedom. • Under the null hypothesis: • Expected value of the chi-squared statistic equals its degrees of freedom. • Variance is twice its degrees of freedom.

Marijuana Use at Time 4 by Marijuana Use at Time 3
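The table for this slide is not reproduced in the text, but its four cell counts can be recovered from the figures quoted on the PRE slides (120 of 215, 142 of 151, 237 and 129). The sketch below treats those reconstructed counts as an assumption and runs the full Pearson calculation on them: expected counts, residuals, components, and their sum. The function name `pearson_chi2` is introduced here for illustration.

```python
# Cell counts reconstructed from figures on the PRE slides (an assumption;
# the original table is not available in this text).
# Rows: time 4 (no use, use). Columns: time 3 (no use, use).
observed = [
    [120, 9],
    [95, 142],
]

def pearson_chi2(table):
    """Return (statistic, degrees of freedom, components C(i,j))."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    components = []
    for i, row in enumerate(table):
        comp_row = []
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n   # expected count E(i,j)
            r = o - e                               # residual R(i,j)
            comp_row.append(r * r / e)              # component C(i,j)
        components.append(comp_row)
    stat = sum(sum(comp_row) for comp_row in components)
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, components

stat, df, components = pearson_chi2(observed)
print(round(stat, 3), df)
```

With these counts the statistic agrees with the value 96.595 used for the phi coefficient later in the slides, which supports the reconstruction.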

Contingency Tables • Chapter Eighteen • Measures of Association • For nominal variables • For ordinal variables

Measures of Association • Measure the strength of an association • usually, a dimensionless number between 0 and 1 in absolute value. • Values near 0 indicate no association; values near 1 indicate strong association. • Correlation coefficient is a measure of association • The chi-squared statistic is not • its value depends on the number of observations.

Measures of Association for Nominal Scale Variables • Chi-squared based • Phi coefficient • Coefficient of contingency • Cramér’s V • Proportional reduction in error • Lambda, symmetric • Lambda, not symmetric

Chi-squared Measure: Phi Coefficient • Definition of the Phi Coefficient: phi = (chi-squared / N)^0.5, where chi-squared is the Pearson chi-squared statistic.

Phi Coefficient • Can be greater than one in tables larger than 2 x 2. • N is the total number of observations in the table. • For the marijuana use at time 3 and time 4 data, the phi coefficient is (96.595/366)^0.5 = 0.51.

Coefficient of Contingency • Definition of coefficient of contingency: c = (chi-squared / (chi-squared + N))^0.5

Coefficient of Contingency • Can never get as large as one. • Largest possible value depends on the number of rows and columns in the table. • For the marijuana example, c = 0.46.

Cramér’s V • Definition of statistic: V = (chi-squared / (N(k-1)))^0.5; k is the smaller of the number of rows and the number of columns.
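The three chi-squared based measures can be computed directly from the statistic and the table total. The sketch below uses the standard textbook formulas with the values quoted on the phi slide (chi-squared 96.595, N = 366); treating the marijuana table as 2 x 2, so that k = 2, is an assumption based on the use / no use categories in the PRE slides.

```python
import math

chi2 = 96.595  # Pearson chi-squared quoted on the phi coefficient slide
n = 366        # total number of observations in the table
k = 2          # smaller of rows and columns (assumes a 2 x 2 table)

phi = math.sqrt(chi2 / n)                   # phi coefficient
c = math.sqrt(chi2 / (chi2 + n))            # coefficient of contingency
v = math.sqrt(chi2 / (n * (k - 1)))         # Cramér's V

print(round(phi, 2), round(c, 2), round(v, 2))
```

For a 2 x 2 table k - 1 = 1, so Cramér's V coincides with phi; the two differ only in larger tables, where V is rescaled to stay at or below one.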

Interpretation of Chi-squared measures of association • An approximate observed level of significance is given for each measure. • Use this in the usual way.

Proportional Reduction in Error (PRE) Measures • Prediction is the modal category. • Predict overall, using only the totals: • Predict use of marijuana at time 4; correct for 237 and wrong for 129. • Number misclassified is 129.

Proportional Reduction in Error (PRE) Measures • Predict for each condition of the independent variable. • Predict no use at time 4 for those not using at time 3 • correct 120 of 215 times • misclassify 95 times. • Predict use at time 4 for those using at time 3 • correct 142 of 151 times • misclassify 9 times.

Proportional Reduction in Error (PRE) Measures • Using only totals, number of misclassified is 129. • Using marijuana at time 3, number misclassified is 104. • The lambda measure is λ= (129-104) /129=0.19

Lambda PRE Measures • There is also a lambda measure using marijuana use at time 4 as the independent variable. • Total: predict no usage at time 3: 151 errors. • Conditional • no usage at time 4: predict no usage at time 3 with 9 errors • usage at time 4: predict usage at time 3 with 95 errors • 104 total errors. • Lambda measure is (151-104)/151 = 0.31

Lambda PRE Measures • There is a symmetric lambda measure. • [(129-104)+(151-104)]/(129+151)=0.26
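The lambda calculations above can be reproduced from the four cell counts implied by the PRE slides (120 of 215, 142 of 151). The sketch below assumes that reconstructed table; the helper names `prediction_errors` and `transpose` are introduced here for illustration.

```python
# Cell counts implied by the PRE slides (a reconstruction, not the original table).
# Rows: time 4 (no use, use). Columns: time 3 (no use, use).
table = [
    [120, 9],
    [95, 142],
]

def prediction_errors(table):
    """Errors made when predicting the row variable by the modal category.

    Returns (errors using row totals only, errors using each column's mode).
    """
    row_totals = [sum(row) for row in table]
    n = sum(row_totals)
    overall = n - max(row_totals)
    conditional = sum(sum(col) - max(col) for col in zip(*table))
    return overall, conditional

def transpose(table):
    return [list(col) for col in zip(*table)]

# Predict time 4 from time 3 (time 3 is the independent variable).
e1_t4, e2_t4 = prediction_errors(table)
# Predict time 3 from time 4 (time 4 is the independent variable).
e1_t3, e2_t3 = prediction_errors(transpose(table))

lambda_t4 = (e1_t4 - e2_t4) / e1_t4
lambda_t3 = (e1_t3 - e2_t3) / e1_t3
lambda_sym = ((e1_t4 - e2_t4) + (e1_t3 - e2_t3)) / (e1_t4 + e1_t3)

print(round(lambda_t4, 2), round(lambda_t3, 2), round(lambda_sym, 2))
```

Transposing the table swaps the roles of the two variables, so one function covers both asymmetric versions of lambda.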

Text Example Data Set

Comparing Pairs of Cases • Concordant pair of cases: sign of difference on variable 1 is the same as the sign of the difference on variable 2. • Case 1 and Case 2: concordant. • Case 2 and Case 3: discordant • Case 1 and Case 3: tied • Let P be number of concordant pairs and Q be the number of discordant pairs.

Measures Based on Concordant and Discordant Pairs • Goodman and Kruskal’s Gamma • (P-Q)/(P+Q) • Kendall’s Tau-b • Kendall’s Tau-c • Somers’ d
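Counting concordant and discordant pairs, and Goodman and Kruskal's gamma, can be sketched as follows. The three cases are hypothetical values chosen only to reproduce the pattern described in the pairs slide (cases 1 and 2 concordant, 2 and 3 discordant, 1 and 3 tied); they are not the text's example data set, and the function names are introduced here for illustration.

```python
from itertools import combinations

# Hypothetical (variable 1, variable 2) values for three cases.
cases = [(1, 1), (2, 2), (1, 3)]

def count_pairs(cases):
    """Return (P, Q): numbers of concordant and discordant pairs."""
    p = q = 0
    for (x1, y1), (x2, y2) in combinations(cases, 2):
        product = (x2 - x1) * (y2 - y1)
        if product > 0:      # differences have the same sign
            p += 1
        elif product < 0:    # differences have opposite signs
            q += 1
        # product == 0: tied on at least one variable, counted in neither
    return p, q

def gamma(cases):
    """Goodman and Kruskal's gamma, (P - Q) / (P + Q).

    Undefined (raises ZeroDivisionError) when every pair is tied.
    """
    p, q = count_pairs(cases)
    return (p - q) / (p + q)

print(count_pairs(cases), gamma(cases))
```

Tied pairs are ignored by gamma; Kendall's tau-b, tau-c, and Somers' d use the same P and Q but differ in how ties enter the denominator.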

Choosing a measure • Choose a measure “interpretable for the purpose in hand”! • Avoid data dredging (taking the measure that is largest for the data set that you have).

Other measures • Correlation based • Pearson’s correlation • Spearman correlation: replace values by ranks. • Measures of agreement • Cohen’s kappa.

Summary • Contingency table methods crucial to the analysis of market research and social science data. • Hypothesis of independence • Measures of association describe the strength of the dependence between two variables.
