Contingency tables

Contingency Tables

  • Chapters Seven, Sixteen, and Eighteen

  • Chapter Seven

    • Definition of Contingency Tables

    • Basic Statistics

    • SPSS program (Crosstabulation)

  • Chapter Sixteen

    • Basic Probability Theory Concepts

    • Test of Hypothesis of Independence

Basic Empirical Situation

  • Unit of data.

  • Two nominal scales measured for each unit.

    • Example: in an interview study, record the sex of each respondent and a second variable, such as whether or not the respondent has a cellular telephone.

    • Objective is to compare the fraction of males with the fraction of females who have cellular telephones.

Contingency Table

  • One column for each value of the column variable; C is the number of columns.

  • One row for each value of the row variable; R is the number of rows.

  • R x C contingency table.

Contingency Table

  • Each entry is the OBSERVED COUNT O(i,j): the number of units in row i and column j, that is, the units having the (i,j) contingency.

  • Column of marginal totals.

  • Row of marginal totals.
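A minimal sketch of this layout in Python, assuming the table is held as a list of rows. The counts are made up for the cellular telephone example and the helper name `marginal_totals` is hypothetical; neither comes from the presentation.

```python
def marginal_totals(observed):
    """Return (row totals, column totals, grand total) for an R x C table of counts."""
    row_totals = [sum(row) for row in observed]
    column_totals = [sum(column) for column in zip(*observed)]
    return row_totals, column_totals, sum(row_totals)


# Made-up counts, not from the presentation.
# Rows: has a cellular telephone (yes, no); columns: sex (male, female).
observed = [[30, 45],
            [20, 5]]

row_totals, column_totals, n = marginal_totals(observed)
print(row_totals, column_totals, n)
```

The row totals form the column of marginal totals and the column totals form the row of marginal totals.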

Basic Hypothesis

  • ASSUME column variable is the independent variable.

  • Hypothesis is independence.

  • That is, the conditional distribution in any column is the same as the conditional distribution in any other column.
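To make "the conditional distribution in a column" concrete, here is a small sketch that divides each column by its own total. The counts are the same made-up cellular telephone figures as before, and `column_distributions` is a hypothetical helper name.

```python
def column_distributions(observed):
    """For each column, return the fraction of that column's units falling in each row."""
    distributions = []
    for column in zip(*observed):
        total = sum(column)
        distributions.append([count / total for count in column])
    return distributions


# Made-up counts: rows = has a cellular telephone (yes, no); columns = sex (male, female).
observed = [[30, 45],
            [20, 5]]

male, female = column_distributions(observed)
print(male, female)
```

Under the hypothesis of independence the two lists would agree apart from sampling variation; in this made-up table they clearly differ.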

Expected Count

  • Basic idea is proportional allocation: the observations in each column are spread over the rows in proportion to the row totals.

  • Expected count in (i,j) contingency = E(i,j) = (total number in row i) x (total number in column j) / (total number in table).

  • Expected count need not be an integer; one expected count for each contingency.



Residual

  • Residual in (i,j) contingency = observed count in (i,j) contingency - expected count in (i,j) contingency.

  • That is, R(i,j) = O(i,j) - E(i,j).

  • One residual for each contingency.
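The expected counts and residuals can be sketched as follows. This is an illustration under assumptions: the counts are made up and the function names are hypothetical.

```python
def expected_counts(observed):
    """E(i,j) = (row i total) * (column j total) / (table total)."""
    row_totals = [sum(row) for row in observed]
    column_totals = [sum(column) for column in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in column_totals] for r in row_totals]


def residuals(observed, expected):
    """R(i,j) = O(i,j) - E(i,j), one residual per contingency."""
    return [[o - e for o, e in zip(o_row, e_row)]
            for o_row, e_row in zip(observed, expected)]


# Made-up counts, not from the presentation.
observed = [[30, 45],
            [20, 5]]

expected = expected_counts(observed)
resid = residuals(observed, expected)
print(expected)
print(resid)
```

Note that the expected counts here (37.5 and 12.5) are not integers, and that the residuals in every row and every column sum to zero.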

Pearson Chi-squared Component

  • Chi-squared component for (i,j) contingency = C(i,j) = (residual in (i,j) contingency)^2 / (expected count in (i,j) contingency).

  • C(i,j) = R(i,j)^2 / E(i,j)
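A short sketch of the component calculation, again on made-up counts with a hypothetical function name:

```python
def chi_squared_components(observed):
    """Return the table of C(i,j) = (O(i,j) - E(i,j))^2 / E(i,j)."""
    row_totals = [sum(row) for row in observed]
    column_totals = [sum(column) for column in zip(*observed)]
    n = sum(row_totals)
    components = []
    for i, row in enumerate(observed):
        component_row = []
        for j, o in enumerate(row):
            e = row_totals[i] * column_totals[j] / n  # expected count E(i,j)
            component_row.append((o - e) ** 2 / e)
        components.append(component_row)
    return components


# Made-up counts, not from the presentation.
observed = [[30, 45],
            [20, 5]]

components = chi_squared_components(observed)
print(components)
```

Cells with small expected counts contribute larger components for the same residual, which is why the second row dominates here.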

Assessing Pearson Component

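  • Rough guides on whether the (i,j) contingency has an excessively large chi-squared component C(i,j), treating it as chi-squared with 1 degree of freedom:

    • A component of 3.84 has an observed significance level of about 0.05.

    • A component of 6.63 has an observed significance level of about 0.01.

    • A component of 10.83 has an observed significance level of about 0.001.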
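These cutoffs can be checked numerically. The sketch below assumes a single component is compared with a chi-squared distribution with 1 degree of freedom, whose upper tail has a closed form through the complementary error function; that is a rough guide, not an exact test, and the function name is hypothetical.

```python
import math


def significance_level_1df(value):
    """Upper-tail probability of a chi-squared variable with 1 degree of freedom.

    If Z is standard normal, P(Z^2 > value) = erfc(sqrt(value / 2)).
    """
    return math.erfc(math.sqrt(value / 2.0))


for value in (3.84, 6.63, 10.83):
    print(value, round(significance_level_1df(value), 4))
```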

Pearson Chi-Squared Test

  • Test statistic is the sum of C(i,j) over all contingencies.

  • Pearson chi-squared test has (R-1)(C-1) degrees of freedom.

  • Under the null hypothesis

    • Expected value of chi-square equals its degrees of freedom.

    • Variance is twice its degrees of freedom.

Marijuana Use at Time 4 by Marijuana Use at Time 3
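The table for this slide did not survive in the transcript. The sketch below uses counts reconstructed from the figures quoted on the later PRE slides (120 of 215, 142 of 151, 237 and 129); treating them as the original table is an assumption, although they do reproduce the chi-squared value of 96.595 used for the phi coefficient. The function name is hypothetical.

```python
import math

# Reconstructed counts (an assumption; the slide's table is missing from the transcript).
# Rows: marijuana use at time 4 (no, yes); columns: marijuana use at time 3 (no, yes).
observed = [[120, 9],
            [95, 142]]


def pearson_chi_squared(observed):
    """Return (chi-squared statistic, degrees of freedom) for an R x C table."""
    row_totals = [sum(row) for row in observed]
    column_totals = [sum(column) for column in zip(*observed)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * column_totals[j] / n  # expected count E(i,j)
            statistic += (o - e) ** 2 / e             # component C(i,j)
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, df


statistic, df = pearson_chi_squared(observed)
# For 1 degree of freedom the upper-tail probability has a closed form.
p_value = math.erfc(math.sqrt(statistic / 2.0))
print(round(statistic, 3), df, p_value)
```

With 1 degree of freedom and a statistic this large, the observed significance level is far below 0.001, so the hypothesis of independence is rejected for these counts.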

Contingency Tables

  • Chapter Eighteen

    • Measures of Association

    • For nominal variables

    • For ordinal variables

Measures of Association

  • A measure of association describes the strength of an association.

    • Usually a dimensionless number between 0 and 1 in absolute value.

    • Values near 0 indicate no association; values near 1 indicate strong association.

  • The correlation coefficient is a measure of association.

  • The chi-square test statistic is not.

    • It depends on the number of observations.
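A quick way to see the dependence on the number of observations is to double every count in a table: the proportions are unchanged, yet the chi-square statistic doubles. The sketch below uses made-up counts and a hypothetical helper; it also computes (chi-squared / N)^0.5, the phi coefficient, which stays the same.

```python
import math


def chi_squared(observed):
    """Pearson chi-squared statistic for an R x C table of counts."""
    row_totals = [sum(row) for row in observed]
    column_totals = [sum(column) for column in zip(*observed)]
    n = sum(row_totals)
    return sum((o - row_totals[i] * column_totals[j] / n) ** 2
               / (row_totals[i] * column_totals[j] / n)
               for i, row in enumerate(observed)
               for j, o in enumerate(row))


small = [[30, 45], [20, 5]]                               # made-up counts, 100 units
large = [[2 * count for count in row] for row in small]   # same proportions, 200 units

chi_small, chi_large = chi_squared(small), chi_squared(large)
phi_small = math.sqrt(chi_small / 100)
phi_large = math.sqrt(chi_large / 200)
print(chi_small, chi_large, phi_small, phi_large)
```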

Measures of Association for Nominal Scale Variables

  • Chi-square based

    • Phi coefficient

    • Coefficient of contingency

    • Cramer’s V

  • Proportional reduction in error

    • Lambda, symmetric

    • Lambda, not symmetric

Chi-squared Measure: Phi Coefficient

  • Definition of the Phi Coefficient: phi = (chi-squared / N)^0.5

Phi Coefficient

  • Can be greater than one in tables larger than 2 x 2.

  • N is the total number of units in the table.

  • For marijuana at time 3 and 4 data, phi coefficient is (96.595/366)^0.5 = 0.51.
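The arithmetic above, and the remark that phi can exceed one, can be sketched as follows. The 3 x 3 table is made up purely to show a value above one, and the function names are hypothetical.

```python
import math


def chi_squared(observed):
    """Pearson chi-squared statistic for an R x C table of counts."""
    row_totals = [sum(row) for row in observed]
    column_totals = [sum(column) for column in zip(*observed)]
    n = sum(row_totals)
    return sum((o - row_totals[i] * column_totals[j] / n) ** 2
               / (row_totals[i] * column_totals[j] / n)
               for i, row in enumerate(observed)
               for j, o in enumerate(row))


def phi_coefficient(chi_sq, n):
    """phi = (chi-squared / N)^0.5"""
    return math.sqrt(chi_sq / n)


# Value quoted on the slide for the marijuana data.
phi_marijuana = phi_coefficient(96.595, 366)

# Made-up 3 x 3 table with perfect association, to show phi above one.
perfect = [[10, 0, 0],
           [0, 10, 0],
           [0, 0, 10]]
phi_perfect = phi_coefficient(chi_squared(perfect), 30)

print(round(phi_marijuana, 2), round(phi_perfect, 3))
```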

Coefficient of Contingency

  • Definition of coefficient of contingency: c = (chi-squared / (chi-squared + N))^0.5

Coefficient of Contingency

  • Can never get as large as one.

  • Largest possible value depends on the number of rows and columns in the table.

  • For the marijuana example, c = 0.46.
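A sketch of the calculation, using the standard definition c = (chi-squared / (chi-squared + N))^0.5 and the chi-squared value of 96.595 with N = 366 quoted for the marijuana data. The upper bound ((k - 1) / k)^0.5, with k the smaller of the number of rows and columns, follows from the largest possible chi-squared value N(k - 1); the function names are hypothetical.

```python
import math


def contingency_coefficient(chi_sq, n):
    """c = (chi-squared / (chi-squared + N))^0.5"""
    return math.sqrt(chi_sq / (chi_sq + n))


def largest_possible(k):
    """Upper bound on c, where k is the smaller of the number of rows and columns."""
    return math.sqrt((k - 1) / k)


c = contingency_coefficient(96.595, 366)
print(round(c, 2), round(largest_possible(2), 3))
```

For a 2 x 2 table the bound is about 0.707, so a value of 0.46 is larger than it first appears on a 0 to 1 scale.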

Cramér’s V

  • Definition of statistic: V = (chi-squared / (N (k - 1)))^0.5, where k is the smaller of the number of rows and the number of columns.
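A minimal sketch using the standard definition V = (chi-squared / (N (k - 1)))^0.5, applied to the chi-squared value of 96.595 and N = 366 quoted for the marijuana data. The function name is hypothetical.

```python
import math


def cramers_v(chi_sq, n, rows, columns):
    """V = (chi-squared / (N * (k - 1)))^0.5, with k = min(rows, columns)."""
    k = min(rows, columns)
    return math.sqrt(chi_sq / (n * (k - 1)))


v = cramers_v(96.595, 366, rows=2, columns=2)
print(round(v, 2))
```

For a 2 x 2 table k - 1 = 1, so V coincides with the phi coefficient; in larger tables the divisor keeps V at or below one.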

Interpretation of Chi-squared measures of association

  • An approximate observed level of significance is given for each measure.

  • Use this in the usual way: a small observed significance level is evidence against the hypothesis of no association.

Proportional Reduction in Error (PRE) Measures

  • Prediction is the modal category.

  • Predict overall, ignoring the independent variable.

    • Predict used marijuana at time 4 for everyone; correct for 237 and wrong for 129.

  • Number misclassified is 129.

Proportional Reduction in Error (PRE) Measures

  • Predict for each condition of the independent variable.

    • Predict not use at time 4 for those not using at time 3

      • correct 120 of 215 times

      • misclassify 95 times

    • Predict use at time 4 for those using at time 3

      • correct 142 of 151 times

      • misclassify 9 times.

Proportional Reduction in Error (PRE) Measures

  • Using only totals, number misclassified is 129.

  • Using marijuana use at time 3, number misclassified is 95 + 9 = 104.

  • The lambda measure is λ = (129 - 104)/129 = 0.19.
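The lambda calculation can be sketched in a few lines. The 2 x 2 counts are reconstructed from the figures on the PRE slides (120 of 215, 142 of 151), which is an assumption since the original table is missing from the transcript; the function name is hypothetical.

```python
def lambda_measure(table):
    """Goodman and Kruskal's lambda.

    Rows are the categories being predicted; columns are the categories
    of the independent variable.
    """
    row_totals = [sum(row) for row in table]
    # Errors when always predicting the overall modal category.
    errors_overall = sum(row_totals) - max(row_totals)
    # Errors when predicting the modal category within each column.
    errors_conditional = sum(sum(column) - max(column) for column in zip(*table))
    return (errors_overall - errors_conditional) / errors_overall


# Reconstructed counts (assumption). Rows: use at time 4 (no, yes);
# columns: use at time 3 (no, yes).
time4_by_time3 = [[120, 9],
                  [95, 142]]

lam = lambda_measure(time4_by_time3)
print(round(lam, 2))
```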

Lambda PRE Measures

  • There is a lambda measure using marijuana use at time 4 as the independent variable.

    • Total: predict no usage at time 3, with 151 errors.

    • Conditional on usage at time 4:

      • no usage at time 4: predict no usage at time 3, with 9 errors.

      • usage at time 4: predict usage at time 3, with 95 errors.

      • 104 total errors.

    • Lambda measure is (151 - 104)/151 = 0.31.

Lambda PRE Measures

  • There is a symmetric lambda measure.

  • [(129-104)+(151-104)]/(129+151)=0.26
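Both directional lambdas and the symmetric version can be obtained from one table by transposing it. As before, the counts are reconstructed from the figures on the PRE slides (an assumption) and the helper name is hypothetical.

```python
def prediction_errors(table):
    """Return (errors ignoring the predictor, errors using the predictor).

    Rows are the categories being predicted; columns are the predictor's categories.
    """
    row_totals = [sum(row) for row in table]
    errors_overall = sum(row_totals) - max(row_totals)
    errors_conditional = sum(sum(column) - max(column) for column in zip(*table))
    return errors_overall, errors_conditional


# Reconstructed counts (assumption). Rows: use at time 4 (no, yes);
# columns: use at time 3 (no, yes).
time4_by_time3 = [[120, 9],
                  [95, 142]]
# Transposing swaps the roles of the two variables.
time3_by_time4 = [list(column) for column in zip(*time4_by_time3)]

e4, c4 = prediction_errors(time4_by_time3)   # predicting time 4 from time 3
e3, c3 = prediction_errors(time3_by_time4)   # predicting time 3 from time 4

lambda_time4 = (e4 - c4) / e4
lambda_time3 = (e3 - c3) / e3
lambda_symmetric = ((e4 - c4) + (e3 - c3)) / (e4 + e3)
print(round(lambda_time4, 2), round(lambda_time3, 2), round(lambda_symmetric, 2))
```

The symmetric value always lies between the two directional values because it pools their numerators and denominators.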

Text Example Data Set

Comparing Pairs of Cases

  • Concordant pair of cases: sign of the difference on variable 1 is the same as the sign of the difference on variable 2.

    • Case 1 and Case 2: concordant.

    • Case 2 and Case 3: discordant (the signs are opposite).

    • Case 1 and Case 3: tied (no difference on at least one variable).

  • Let P be the number of concordant pairs and Q be the number of discordant pairs.

Measures Based on Concordant and Discordant Pairs

  • Goodman and Kruskal’s Gamma

    • (P-Q)/(P+Q)

  • Kendall’s Tau-b

  • Kendall’s Tau-c

  • Somers’ d
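Counting concordant and discordant pairs and forming gamma = (P - Q)/(P + Q) can be sketched as below. The text example data set is not in the transcript, so the four cases here are made up; they are chosen only so that the first three cases show the concordant, discordant and tied pattern. The function names are hypothetical.

```python
from itertools import combinations


def count_pairs(cases):
    """Return (P, Q): the numbers of concordant and discordant pairs.

    Each case is a (variable 1, variable 2) tuple. Pairs tied on either
    variable are counted in neither P nor Q.
    """
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(cases, 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:        # differences have the same sign
            concordant += 1
        elif product < 0:      # differences have opposite signs
            discordant += 1
    return concordant, discordant


def gamma(cases):
    """Goodman and Kruskal's gamma; undefined when every pair is tied."""
    p, q = count_pairs(cases)
    return (p - q) / (p + q)


# Made-up ordinal scores, not the presentation's data set.
cases = [(1, 1), (2, 3), (1, 4), (3, 4)]
p, q = count_pairs(cases)
print(p, q, gamma(cases))
```

Gamma ignores tied pairs entirely; Kendall's tau-b, tau-c and Somers' d differ mainly in how they bring ties into the denominator.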

Choosing a measure

  • Choose a measure “interpretable for the purpose in hand”!

  • Avoid data dredging (taking the measure that is largest for the data set that you have).

Other measures

  • Correlation based

    • Pearson’s correlation

    • Spearman correlation: replace values by ranks.

  • Measures of agreement

    • Cohen’s kappa.
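The "replace values by ranks" step for the Spearman correlation can be sketched as follows: rank each variable, then apply Pearson's correlation to the ranks. Tied values are given the mean of their ranks, a common convention that is assumed here; the data and function names are made up for illustration.

```python
import math


def ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson's correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman correlation: Pearson's correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Made-up data: y increases with x, but not linearly.
x = [10, 20, 30, 40, 50]
y = [1, 4, 9, 16, 100]
print(pearson(x, y), spearman(x, y))
```

Because y rises with x throughout, the rank correlation is 1 even though the Pearson correlation on the raw values is below 1.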



Summary

  • Contingency table methods are crucial to the analysis of market research and social science data.

  • Hypothesis of independence is tested with the Pearson chi-squared test.

  • Measures of association describe the strength of the dependence between two variables.
