**Biostatistics** Unit 10 Categorical Data Analysis

**Categorical Data Analysis** • Categorical data analysis deals with discrete data that can be organized into categories. • The data are organized into a contingency table, whose basic structure has two rows and two columns. • The χ² distribution is used to test significance in categorical data analysis.

**Basic Contingency Table Structure** • The basic structure of a 2×2 contingency table has two columns and two rows.

**Structure of Contingency Tables** • Cells are labeled A through D. • Columns and rows are added for labels.
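A common labeling convention, shown here as an assumption rather than as the deck's own figure, places the exposure groups in rows and the outcome in columns, with totals in the margins:

```latex
\begin{array}{l|cc|c}
 & \text{Disease} & \text{No disease} & \text{Total} \\ \hline
\text{Exposed} & A & B & A+B \\
\text{Not exposed} & C & D & C+D \\ \hline
\text{Total} & A+C & B+D & n = A+B+C+D
\end{array}
```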

**Using the contingency table as a comparison table** • Contingency tables are used to compare outcomes between groups, for example the outcomes of laboratory tests.

**Absolute and Relative Risk** • Each row of the table gives an absolute risk: the proportion of that group who get the disease. • Relative risk is the ratio of these two proportions.

**Absolute Risk**
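In terms of the cells A through D, and assuming rows are exposure groups with cases counted in the first column, the absolute risk in each row can be written as:

```latex
AR_{\text{exposed}} = \frac{A}{A+B}, \qquad AR_{\text{not exposed}} = \frac{C}{C+D}
```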

**Relative Risk**
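With the same assumed layout (exposure in rows, cases in the first column), the relative risk is the exposed row's absolute risk divided by the unexposed row's:

```latex
RR = \frac{A/(A+B)}{C/(C+D)}
```

A value of 1 indicates no association; a value above 1 indicates higher risk in the exposed group.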

**Example** A total of 452 children in elementary schools in Georgia and Florida were served burritos for lunch. Among these, 304 children reported eating the burritos. Among those who ate burritos, 145 reported getting sick from bacterial contamination. There were also 148 children who did not eat burritos. Among these, 10 cases of illness were reported. A case of disease was defined as gastrointestinal upset, fever and other symptoms. The CDC studied this event using categorical data analysis. They reported relative risk, significance and a confidence interval.

**Contingency Table** • Data from the reports of the incident were entered into a contingency table.
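A sketch of the table for this outbreak, with rows for exposure and columns for outcome. It takes 145 cases among the 304 children who ate burritos, the count that reproduces the relative risk of 7.06 reported for this example; the non-case counts are obtained by subtraction.

```latex
\begin{array}{l|cc|c}
 & \text{Sick} & \text{Not sick} & \text{Total} \\ \hline
\text{Ate burritos} & 145 & 159 & 304 \\
\text{Did not eat burritos} & 10 & 138 & 148 \\ \hline
\text{Total} & 155 & 297 & 452
\end{array}
```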

**Absolute risk—ate burritos**
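Using 145 cases among the 304 children who ate burritos (the count consistent with the reported relative risk of 7.06), the absolute risk works out as:

```latex
AR_{\text{ate}} = \frac{145}{304} \approx 0.477
```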

**Absolute risk—did not eat burritos**
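From the example, 10 of the 148 children who did not eat burritos became ill, so the absolute risk in this group is:

```latex
AR_{\text{did not eat}} = \frac{10}{148} \approx 0.068
```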

**Relative Risk** • Relative risk is the ratio of the two absolute risks. • Conclusion: A child who ate burritos was 7.06 times as likely to get sick as a child who did not.
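The calculation can be sketched in a few lines of Python. The function names are illustrative, and the counts assume 145 cases among the 304 children who ate burritos, which is the count that reproduces the reported value of 7.06.

```python
def absolute_risk(cases: int, group_size: int) -> float:
    """Proportion of a group that became ill."""
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    return cases / group_size


def relative_risk(cases_exp, n_exp, cases_unexp, n_unexp):
    """Ratio of the exposed group's absolute risk to the unexposed group's."""
    return absolute_risk(cases_exp, n_exp) / absolute_risk(cases_unexp, n_unexp)


# Assumed counts: 145 of 304 who ate burritos, 10 of 148 who did not.
rr = relative_risk(145, 304, 10, 148)
print(f"RR = {rr:.2f}")  # prints "RR = 7.06"
```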

**Significance in relative risk** Significance in relative risk is found using the χ² distribution. The general formula is below.
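The general χ² statistic compares each observed count O with the count E expected if there were no association. The symbols O, E, r (rows), and c (columns) are notation assumed here:

```latex
\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},
\qquad E_{ij} = \frac{(\text{row } i \text{ total})(\text{column } j \text{ total})}{n},
\qquad df = (r-1)(c-1)
```

For a 2×2 table this gives 1 degree of freedom.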

**Significance in relative risk** In contingency table calculations, the values from the table are used to give a χ² value according to the formula below.
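For a 2×2 table with cells A through D, the sum over cells collapses to a standard shortcut form (shown without a continuity correction, which is an assumption about the intended formula):

```latex
\chi^2 = \frac{n\,(AD - BC)^2}{(A+B)(C+D)(A+C)(B+D)}, \qquad n = A+B+C+D
```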

**Find significance using the TI-83** A. Matrix setup

**Find significance using the TI-83** • Calculation results • Conclusion: With p this small, the result is highly significant.
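For readers without the calculator, roughly the same computation (observed matrix in, expected counts, χ², df, and p out) can be sketched in Python using only the standard library. The function names are hypothetical, the counts assume 145 cases among the 304 children who ate burritos, and the p-value helper is valid only for 1 degree of freedom.

```python
import math


def chi_square_test(observed):
    """Pearson chi-square test of independence for a table of counts.

    Returns (chi2, df, expected). No continuity correction is applied.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Expected count = row total * column total / grand total.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df, expected


def p_value_df1(chi2):
    """Upper-tail probability of a chi-square variable with 1 df.

    Uses P(chi2_1 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(chi2 / 2))


# Assumed counts: rows = ate / did not eat, columns = sick / not sick.
observed = [[145, 159], [10, 138]]
chi2, df, expected = chi_square_test(observed)
p = p_value_df1(chi2)
print(f"chi2 = {chi2:.2f}, df = {df}, p = {p:.3g}")
```

With these counts χ² comes out near 74, so p is far below any conventional significance level.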

**CI for a Relative Risk Calculation** The confidence interval consists of the usual components of estimator, reliability coefficient and standard error. Standard error is found using the formula
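A standard large-sample form of this standard error, on the log scale, is given below as an assumption about which formula is intended. A through D are the table cells with exposure in rows, and z is the reliability coefficient (1.96 for 95% confidence):

```latex
SE\bigl(\ln RR\bigr) = \sqrt{\frac{1}{A} - \frac{1}{A+B} + \frac{1}{C} - \frac{1}{C+D}},
\qquad
\ln RR \pm z_{1-\alpha/2}\, SE\bigl(\ln RR\bigr)
```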

**CI for a Relative Risk Calculation** A logarithmic transformation is used because the sampling distribution of the relative risk is right-skewed, whereas ln(RR) is approximately normally distributed. The antilog of each limit gives the boundaries of the confidence interval.

**CI for a Relative Risk Calculation**

**CI for a Relative Risk Calculation**

**CI for a Relative Risk Calculation** • Take antilog to complete the calculation. • Conclusion: The relative risk is 7.06. We are 95% confident that the true value lies between 3.575 and 13.93.
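The log-scale interval can be checked with a short Python sketch. It assumes 145 cases among the 304 children who ate burritos and computes the interval twice: once with the standard large-sample standard error of ln(RR), and once with the four-term form sqrt(1/A + 1/B + 1/C + 1/D), which is normally used for a log odds ratio. The four-term form reproduces the interval quoted above; the standard form gives a somewhat narrower interval of about 3.8 to 13.0. Which formula the original calculation used is an inference from the arithmetic.

```python
import math


def log_interval(estimate, se_log, z=1.96):
    """Interval built on the log scale, transformed back with the antilog."""
    center = math.log(estimate)
    return math.exp(center - z * se_log), math.exp(center + z * se_log)


# Assumed cells: a, b = sick / not sick among those who ate;
# c, d = sick / not sick among those who did not.
a, b, c, d = 145, 159, 10, 138
rr = (a / (a + b)) / (c / (c + d))

# Standard large-sample SE of ln(RR).
se_rr = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
# Four-term SE (normally used for ln(OR)).
se_four_term = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)

print("standard SE: ", log_interval(rr, se_rr))
print("four-term SE:", log_interval(rr, se_four_term))
```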

**Odds Ratio** • The odds of an event are the ratio of the probability that it occurs to the probability that it does not. • The odds ratio is the ratio of two such odds. • The odds ratio is generally calculated from data in a case-control study. • The following gives the theoretical basis for the calculation of the odds ratio. The result reduces to a cross-product of the table cells.

**Contingency Table**

**Odds ratio and the contingency table** • The probability of getting sick when exposed (success) is P(E). The probability of not getting sick when exposed (failure) is 1 – P(E). • The probability of getting sick when not exposed is P(E’), while the probability of not getting sick when not exposed is 1 – P(E’).

**Determining Odds Ratio** • Odds of getting sick when exposed • Odds of getting sick when not exposed
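Written with the probabilities P(E) and P(E’) defined for the contingency table, the two odds are:

```latex
\text{odds}_{\text{exposed}} = \frac{P(E)}{1 - P(E)}, \qquad
\text{odds}_{\text{not exposed}} = \frac{P(E')}{1 - P(E')}
```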

**Determining Odds Ratio** The odds ratio is the ratio of these two odds. The probability values are related to the cells in the contingency table.

**Determining Odds Ratio** The final ratio of cells gives the odds ratio. This calculation is the cross-product ratio: AD divided by BC.
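The reduction to the cross-product can be written out step by step, assuming the layout in which P(E) = A/(A+B) and P(E’) = C/(C+D):

```latex
OR = \frac{P(E)/\bigl(1-P(E)\bigr)}{P(E')/\bigl(1-P(E')\bigr)}
   = \frac{\dfrac{A}{A+B} \Big/ \dfrac{B}{A+B}}{\dfrac{C}{C+D} \Big/ \dfrac{D}{C+D}}
   = \frac{A/B}{C/D} = \frac{AD}{BC}
```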

**Case study for odds ratio** In the case-control study, 52 children were involved. There were 13 children who ate the burritos, among whom 8 got sick. There were also 39 children who did not eat the burritos, among whom 6 reported symptoms of the illness. The odds ratio was calculated.

**Odds Ratio Calculation** Conclusion: The odds ratio is 8.8
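As a check on the arithmetic, here is a minimal Python sketch of the cross-product. The function name is illustrative, and the non-case counts (5 and 33) are obtained by subtraction from the case study totals.

```python
def odds_ratio(a, b, c, d):
    """Cross-product ratio AD / BC for a 2x2 table."""
    if b == 0 or c == 0:
        raise ZeroDivisionError("odds ratio is undefined when B or C is zero")
    return (a * d) / (b * c)


# Case-control counts: 8 sick and 5 not sick among the 13 who ate;
# 6 sick and 33 not sick among the 39 who did not.
print(odds_ratio(8, 5, 6, 33))  # prints 8.8
```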

**Find significance using the TI-83** A. Matrix setup

**Find significance using the TI-83** • Calculation results • Conclusion: p ≈ .001, so the association is statistically significant.
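The same test can be sketched in Python with the 2×2 shortcut formula. This assumes no continuity correction, and the function name is hypothetical. The p-value uses the identity that a χ² variable with 1 degree of freedom is a squared standard normal.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Shortcut chi-square for a 2x2 table (1 df, no continuity correction)."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p


# Case-control counts: rows = ate / did not eat, columns = sick / not sick.
chi2, p = chi_square_2x2(8, 5, 6, 33)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```

With these counts χ² is about 10.6 and p is close to .001.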

**CI for an Odds Ratio Calculation** Calculation for SE after logarithmic transformation
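The usual large-sample standard error of ln(OR), given here as an assumption about the formula intended, uses the reciprocals of all four cells. With the case study counts (8, 5, 6, 33):

```latex
SE\bigl(\ln OR\bigr) = \sqrt{\frac{1}{A} + \frac{1}{B} + \frac{1}{C} + \frac{1}{D}}
= \sqrt{\frac{1}{8} + \frac{1}{5} + \frac{1}{6} + \frac{1}{33}} \approx 0.722
```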

**CI for an Odds Ratio Calculation**

**CI for an Odds Ratio Calculation** Take antilog to complete the calculation. Conclusion: The odds ratio is 8.8. We are 95% confident that the true value lies between 2.14 and 36.3.
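The full calculation can be reproduced with a short Python sketch. It assumes the log-method interval with a reliability coefficient of 1.96, and the function name is illustrative. With the case study counts it agrees with the interval quoted above to the precision shown.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Log-method confidence interval for an odds ratio."""
    or_hat = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_hat) - z * se_log)
    upper = math.exp(math.log(or_hat) + z * se_log)
    return or_hat, lower, upper


# Case-control counts: 8 sick / 5 not sick (ate); 6 sick / 33 not sick (did not).
or_hat, lower, upper = odds_ratio_ci(8, 5, 6, 33)
print(f"OR = {or_hat:.1f}, 95% CI ({lower:.2f}, {upper:.1f})")
```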
