Contingency Tables

- Chapters Seven, Sixteen, and Eighteen
- Chapter Seven
- Definition of Contingency Tables
- Basic Statistics
- SPSS program (Crosstabulation)

- Chapter Sixteen
- Basic Probability Theory Concepts
- Test of Hypothesis of Independence

- Unit of data (for example, one interview respondent).
- Two nominal variables are measured for each unit.
- Example: an interview study recording the sex of each respondent and whether or not the respondent has a cellular telephone.
- The objective is to compare males and females with respect to the fraction that have cellular telephones.

- One column for each value of the column variable; C is the number of columns.
- One row for each value of the row variable; R is the number of rows.
- Together these form an R × C contingency table.

- Each cell entry is the observed count O(i,j): the number of units falling in the (i,j) contingency (row i, column j).
- A column of marginal (row) totals.
- A row of marginal (column) totals.
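As a minimal sketch (Python, standard library only), such a table can be built from unit-level records by counting each (row value, column value) pair. The function name `crosstab` and the six interview responses are hypothetical, chosen only to illustrate the layout; categories are simply put in sorted order, and the column variable is sex so that it can play the role of the independent variable.

```python
from collections import Counter


def crosstab(pairs):
    """Build an R x C table of observed counts O(i, j) from (row_value, column_value) pairs."""
    pairs = list(pairs)
    row_values = sorted({r for r, _ in pairs})
    col_values = sorted({c for _, c in pairs})
    counts = Counter(pairs)
    # One row per value of the row variable, one column per value of the column variable.
    table = [[counts[(r, c)] for c in col_values] for r in row_values]
    row_totals = [sum(row) for row in table]        # column of marginal totals
    col_totals = [sum(col) for col in zip(*table)]  # row of marginal totals
    return row_values, col_values, table, row_totals, col_totals


# Hypothetical interview responses: (has cellular telephone?, sex of respondent).
responses = [("yes", "female"), ("no", "female"), ("yes", "female"),
             ("yes", "male"), ("no", "male"), ("no", "male")]
rows, cols, table, row_totals, col_totals = crosstab(responses)
print(rows, cols)  # ['no', 'yes'] ['female', 'male']
print(table)       # [[1, 2], [2, 1]]
```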

- Assume the column variable is the independent variable.
- The null hypothesis is independence.
- That is, the conditional distribution of the row variable within any column is the same as within any other column.
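A small sketch of what the hypothesis says, computing the conditional distribution of the row variable within each column. The `marijuana` table is assembled from the counts given on the lambda slides later in this section (rows: no use / use at time 4; columns: no use / use at time 3); the function name is hypothetical. Under independence the column distributions would be the same; here they differ sharply.

```python
def column_conditional_distributions(table):
    """For each column j, return the proportions O(i, j) / (total in column j)."""
    col_totals = [sum(col) for col in zip(*table)]
    return [[row[j] / col_totals[j] for row in table] for j in range(len(col_totals))]


# Rows: use at time 4 (no, yes); columns: use at time 3 (no, yes).
marijuana = [[120, 9],
             [95, 142]]
for j, dist in enumerate(column_conditional_distributions(marijuana)):
    print(j, [round(p, 3) for p in dist])
# 0 [0.558, 0.442]
# 1 [0.06, 0.94]
```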

- The basic idea is proportional allocation: each column total is divided among the rows in proportion to the row totals.
- Expected count in the (i,j) contingency: E(i,j) = (total number in column j) × (total number in row i) / (total number in table).
- Expected counts need not be integers; there is one expected count for each contingency.
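The expected-count formula translates directly into code. This is a minimal sketch with a hypothetical function name, applied to the marijuana table built from the counts on the lambda slides (rows: use at time 4; columns: use at time 3).

```python
def expected_counts(table):
    """E(i, j) = (total in row i) * (total in column j) / (total in table)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


marijuana = [[120, 9],
             [95, 142]]
expected = expected_counts(marijuana)
print([[round(e, 3) for e in row] for row in expected])
# [[75.779, 53.221], [139.221, 97.779]]
```

Each row of expected counts sums to the corresponding observed row total, and likewise for columns, which is what proportional allocation guarantees.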

- Residual in (i,j) contingency = observed count in (i,j) contingency - expected count in (i,j) contingency.
- That is, R(i,j)= O(i,j)-E(i,j)
- One residual for each contingency.

- Chi-squared component for the (i,j) contingency: C(i,j) = (residual in (i,j) contingency)² / (expected count in (i,j) contingency).
- C(i,j) = R(i,j)² / E(i,j)

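- Rough guides on whether the (i,j) contingency has an excessively large chi-squared component C(i,j) (upper-tail probabilities of the chi-squared distribution with 1 degree of freedom):
- a component of 3.84 has an observed significance level of about 0.05;
- a component of 6.63, about 0.01;
- a component of 10.83, about 0.001.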
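The residuals, chi-squared components, and rough guides above can be combined in one pass. This is a minimal sketch; the function names and the `ROUGH_GUIDES` list are hypothetical, with cutoffs copied from the slides. For the marijuana table every component exceeds 10.83, and the components sum to the 96.595 used on the phi-coefficient slide.

```python
def chi_squared_components(table):
    """Return the matrix of C(i, j) = R(i, j)**2 / E(i, j), with R(i, j) = O(i, j) - E(i, j)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    components = []
    for i, row in enumerate(table):
        comp_row = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # E(i, j)
            residual = observed - expected                # R(i, j)
            comp_row.append(residual ** 2 / expected)     # C(i, j)
        components.append(comp_row)
    return components


# Rough guides from the slides (chi-squared, 1 degree of freedom): (cutoff, level).
ROUGH_GUIDES = [(10.83, 0.001), (6.63, 0.01), (3.84, 0.05)]


def rough_level(component):
    """Smallest rough-guide level whose cutoff the component reaches, else None."""
    for cutoff, level in ROUGH_GUIDES:
        if component >= cutoff:
            return level
    return None


marijuana = [[120, 9],
             [95, 142]]
components = chi_squared_components(marijuana)
for row in components:
    print([(round(c, 2), rough_level(c)) for c in row])
```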

- The Pearson chi-squared statistic is the sum of C(i,j) over all contingencies.
- It has (R-1)(C-1) degrees of freedom.
- Under the null hypothesis, the statistic approximately follows a chi-squared distribution:
- its expected value equals its degrees of freedom;
- its variance is twice its degrees of freedom.
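A minimal sketch of the full test statistic and its degrees of freedom (function names hypothetical). For the 1-degree-of-freedom case it also computes the upper-tail probability exactly with `math.erfc`, using P(χ² with 1 df > x) = P(|Z| > √x) = erfc(√(x/2)); this reproduces the rough-guide levels above. For more degrees of freedom a chi-squared survival function, such as `scipy.stats.chi2.sf`, would be needed instead.

```python
import math


def pearson_chi_squared(table):
    """Pearson chi-squared statistic (sum of C(i, j)) and its (R-1)(C-1) degrees of freedom."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df


def upper_tail_1df(x):
    """P(chi-squared with 1 df > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2))


marijuana = [[120, 9],
             [95, 142]]
stat, df = pearson_chi_squared(marijuana)
print(round(stat, 3), df)  # 96.595 1
print([round(upper_tail_1df(x), 3) for x in (3.84, 6.63, 10.83)])  # [0.05, 0.01, 0.001]
```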

- Chapter Eighteen
- Measures of Association
- For nominal variables
- For ordinal variables

- A measure of association measures the strength of an association:
- usually a dimensionless number between 0 and 1 in absolute value.
- Values near 0 indicate little or no association; values near 1 indicate strong association.

- A correlation coefficient is a measure of association.
- The chi-squared test statistic is not:
- its value depends on the number of observations.

- Chi-square based
- Phi coefficient
- Coefficient of contingency
- Cramer’s V

- Proportional reduction in error
- Lambda, symmetric
- Lambda, not symmetric

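- Definition of the phi coefficient: φ = (χ²/N)^0.5, where χ² is the Pearson chi-squared statistic.

- Can be greater than one (possible only when both the number of rows and the number of columns are at least 3).
- N is the total number of observations in the table.
- For the marijuana data at times 3 and 4, the phi coefficient is (96.595/366)^0.5 = 0.51.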

- Definition of the coefficient of contingency: c = (χ²/(χ² + N))^0.5.

- It can never reach one.
- Its largest possible value depends on the number of rows and columns in the table.
- For the example given, c = 0.46.

- Definition of Cramer's V: V = (χ²/(N(k - 1)))^0.5, where k is the smaller of the number of rows and the number of columns.
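The three chi-squared-based measures are simple functions of χ², N, and the table dimensions. This sketch (hypothetical function name) uses the standard formulas for phi, the coefficient of contingency, and Cramer's V, and reproduces the slide values for the marijuana table; for a 2 × 2 table k = 2, so Cramer's V equals phi.

```python
import math


def chi_squared_measures(chi2, n, n_rows, n_cols):
    """Return (phi, coefficient of contingency, Cramer's V) from a Pearson chi-squared value."""
    phi = math.sqrt(chi2 / n)
    contingency = math.sqrt(chi2 / (chi2 + n))
    k = min(n_rows, n_cols)  # smaller of the number of rows and columns
    cramers_v = math.sqrt(chi2 / (n * (k - 1)))
    return phi, contingency, cramers_v


phi, c, v = chi_squared_measures(96.595, 366, 2, 2)
print(round(phi, 2), round(c, 2), round(v, 2))  # 0.51 0.46 0.51
```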

- An approximate observed significance level is reported for each measure.
- Use it in the usual way.

- The prediction is the modal category.
- Predicting from the overall distribution alone:
- predict "used marijuana at time 4" for everyone; correct for 237 and wrong for 129.

- The number misclassified is 129.

- Predict separately for each category of the independent variable.
- Predict no use at time 4 for those not using at time 3:
- correct 120 of 215 times
- misclassified 95 times

- Predict use at time 4 for those using at time 3:
- correct 142 of 151 times
- misclassified 9 times.


- Using only the totals, the number misclassified is 129.
- Using marijuana use at time 3 as the predictor, the number misclassified is 95 + 9 = 104.
- The lambda measure is λ = (129 - 104)/129 = 0.19.
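The proportional-reduction-in-error calculation can be sketched as follows (hypothetical function name; table rows are use at time 4, columns are use at time 3, as assembled from the counts above). It counts errors when predicting the overall modal row category, then when predicting the modal row category within each column.

```python
def lambda_rows_given_columns(table):
    """Goodman and Kruskal's lambda for predicting the row variable from the column variable.

    Returns (errors using totals only, errors using the column variable, lambda).
    """
    row_totals = [sum(row) for row in table]
    n = sum(row_totals)
    # Totals only: predict the overall modal row category for every unit.
    errors_totals = n - max(row_totals)
    # Conditional: within each column, predict that column's modal row category.
    errors_conditional = sum(sum(col) - max(col) for col in zip(*table))
    if errors_totals == 0:
        raise ValueError("lambda is undefined when all units fall in one row category")
    return errors_totals, errors_conditional, (errors_totals - errors_conditional) / errors_totals


marijuana = [[120, 9],
             [95, 142]]
e_totals, e_cond, lam = lambda_rows_given_columns(marijuana)
print(e_totals, e_cond, round(lam, 2))  # 129 104 0.19
```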

- There is also a lambda measure that uses marijuana use at time 4 as the independent variable.
- Totals only: predict no use at time 3; 151 errors.
- Conditional:
- no use at time 4: predict no use at time 3, with 9 errors
- use at time 4: predict use at time 3, with 95 errors
- 104 total errors.

- The lambda measure is (151 - 104)/151 = 0.31.

- There is also a symmetric lambda measure:
- [(129 - 104) + (151 - 104)]/(129 + 151) = 0.26
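Both asymmetric lambdas and the symmetric version can be computed from the same error counts: predicting columns from rows is just predicting rows from columns in the transposed table. A minimal sketch with hypothetical function names, checked against the three values on the slides.

```python
def prediction_errors(table):
    """(errors from totals only, errors within columns) when predicting the row variable."""
    row_totals = [sum(row) for row in table]
    errors_totals = sum(row_totals) - max(row_totals)
    errors_conditional = sum(sum(col) - max(col) for col in zip(*table))
    return errors_totals, errors_conditional


def lambdas(table):
    """Lambda (rows from columns), lambda (columns from rows), and symmetric lambda."""
    rows_t, rows_c = prediction_errors(table)
    transposed = [list(col) for col in zip(*table)]
    cols_t, cols_c = prediction_errors(transposed)
    lam_rows = (rows_t - rows_c) / rows_t
    lam_cols = (cols_t - cols_c) / cols_t
    lam_sym = ((rows_t - rows_c) + (cols_t - cols_c)) / (rows_t + cols_t)
    return lam_rows, lam_cols, lam_sym


# Rows: use at time 4 (no, yes); columns: use at time 3 (no, yes).
marijuana = [[120, 9],
             [95, 142]]
print([round(x, 2) for x in lambdas(marijuana)])  # [0.19, 0.31, 0.26]
```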

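- A pair of cases is concordant if the sign of the difference on variable 1 is the same as the sign of the difference on variable 2, discordant if the signs are opposite, and tied if the difference on either variable is zero.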
- Case 1 and Case 2: concordant.
- Case 2 and Case 3: discordant
- Case 1 and Case 3: tied

- Let P be the number of concordant pairs and Q the number of discordant pairs.

- Goodman and Kruskal’s Gamma
- (P-Q)/(P+Q)
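For data already in a contingency table with ordered categories, P and Q can be counted cell by cell: every case in a cell pairs with every case in a cell that is higher on both variables (concordant) or higher on one and lower on the other (discordant). A minimal sketch with hypothetical function names; treating "no use" < "use" at both times as an ordering of the marijuana table is an assumption made only for illustration (for a 2 × 2 table, P and Q reduce to the two diagonal products).

```python
def concordant_discordant(table):
    """Count concordant (P) and discordant (Q) pairs in a table whose row and
    column categories are both listed in increasing order."""
    n_rows, n_cols = len(table), len(table[0])
    p = q = 0
    for i in range(n_rows):
        for j in range(n_cols):
            # Compare cell (i, j) only with cells in strictly higher rows, so each
            # untied pair is counted once; pairs tied on either variable are skipped.
            for k in range(i + 1, n_rows):
                for m in range(n_cols):
                    if m > j:
                        p += table[i][j] * table[k][m]  # higher on both: concordant
                    elif m < j:
                        q += table[i][j] * table[k][m]  # higher row, lower column: discordant
    return p, q


def goodman_kruskal_gamma(table):
    p, q = concordant_discordant(table)
    return (p - q) / (p + q)


marijuana = [[120, 9],
             [95, 142]]
print(concordant_discordant(marijuana))            # (17040, 855)
print(round(goodman_kruskal_gamma(marijuana), 3))  # 0.904
```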

- Kendall’s Tau-b
- Kendall’s Tau-c
- Somers’ d

- Choose a measure “interpretable for the purpose in hand”!
- Avoid data dredging (choosing whichever measure happens to be largest for the data set you have).

- Correlation based
- Pearson’s correlation
- Spearman correlation: replace values by ranks.
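"Replace values by ranks" can be sketched directly: rank each variable (tied values share their average rank), then compute Pearson's correlation on the ranks. The function names and the data below are hypothetical. Because y rises with x at every step, the ranks agree perfectly and Spearman's correlation is 1 even though the relationship is far from linear.

```python
import math


def average_ranks(values):
    """Replace values by ranks 1..n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda idx: values[idx])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # positions start..end hold ranks start+1..end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman correlation: Pearson's correlation applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: y increases with x, but not linearly.
x = [1, 2, 3, 4, 5]
y = [2, 4, 8, 16, 1000]
print(round(pearson(x, y), 3), spearman(x, y))  # Spearman is 1.0; Pearson is below 1
```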

- Measures of agreement
- Cohen’s kappa.
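Cohen's kappa compares the observed agreement on the diagonal of a square table with the agreement expected by chance from the margins: kappa = (p_o - p_e)/(1 - p_e). This is a minimal sketch with a hypothetical function name; applying it to the marijuana table (the same question answered at times 3 and 4) only illustrates the arithmetic and is not an analysis from the slides.

```python
def cohens_kappa(table):
    """Cohen's kappa for a square table whose rows and columns use the same categories."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    p_observed = sum(table[i][i] for i in range(len(table))) / n          # agreement on the diagonal
    p_chance = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2  # agreement expected by chance
    return (p_observed - p_chance) / (1 - p_chance)


# Rows: use at time 4 (no, yes); columns: use at time 3 (no, yes).
marijuana = [[120, 9],
             [95, 142]]
print(round(cohens_kappa(marijuana), 2))  # 0.46
```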

- Contingency table methods are crucial to the analysis of market research and social science data.
- Hypothesis of independence
- Measures of association describe the strength of the dependence between two variables.