Categorical Outcomes: Making Comparisons (Chapter 4)
Outline • Describing: numerical summaries, graphical summaries • One-sample comparisons: historical controls • Multiple-sample comparisons: dichotomous outcome, categorical outcomes • Measures of association
Categorical Outcomes • Gaps: only a limited number of values/categories is possible, with nothing “in-between” • Examples: dichotomous (two categories), nominal (categories without order), ordinal (categories with order)
Learning Objectives • How do I describe categorical data? • How do I make comparisons? • How do I investigate associations?
Public Health Application • More than three-quarters of global malaria deaths occur in children under five living in malarious countries in sub-Saharan Africa. 25% of all childhood mortality below the age of five is attributable to malaria. About 30–40% of all fevers seen in health centers in Africa are due to malaria, with large seasonal variation between the rainy and dry seasons.
Data Description • Cross-sectional study conducted to investigate factors related to insecticide-treated net (ITN) use in 1876 households with an ITN: • Demographic variables (age of the head of household, household wealth, miles to the nearest healthcare facility, rural/urban, family size, etc.) • Presence of children under the age of five • Whether an ITN was used the previous night
Research Question What factors are associated with ITN use?
Describing the Data • Numerical summaries: counts, proportions, and percentages • Graphical summaries: pie charts, bar graphs
Most Important Step in Data Analysis • Describe the data: before drawing conclusions or making inferences, an investigator needs to fully understand what the data look like. • Numerical and graphical summaries cannot be skipped! This information is needed to choose the most appropriate statistical method and to make valid statistical inferences.
Bar Graphs • Provide a visual comparison among groups. • Vertical axis represents the number of subjects: the higher the bar, the more subjects. • Horizontal axis represents categories: for ordinal variables, order matters; for nominal variables, order does not matter.
Bar Graphs [Figure: bar graph of an ordinal variable; bar graph of a nominal variable]
Bar Graphs • Compare groups graphically on a categorical outcome.
Pie Charts • Provide a visual description of how the parts compare to the whole
Numerical Summaries • Categorical variables are described by reporting the number of subjects within each category as counts, proportions, and percentages.
Proportion • The fraction of subjects belonging to a particular category • The proportion in the population is a parameter. • The proportion in the sample is a statistic.
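A minimal Python sketch of the three numerical summaries, assuming the responses are stored as a list of category labels. The function name `describe_categorical` and the responses are hypothetical illustrations, not the study data; the sample proportion computed here is the statistic that estimates the population parameter.

```python
from collections import Counter

def describe_categorical(values):
    """Return {category: (count, proportion, percentage)} for a list of labels."""
    counts = Counter(values)
    n = len(values)
    return {
        category: (count, count / n, 100 * count / n)
        for category, count in counts.items()
    }

# Hypothetical answers to "Was an ITN used the previous night?"
# (illustrative only; not the study data)
responses = ["no", "no", "yes", "no", "yes", "no", "no", "no"]
for category, (count, prop, pct) in describe_categorical(responses).items():
    print(f"{category}: n = {count}, proportion = {prop:.3f}, {pct:.1f}%")
```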
Description of the Sample • A sample of 1876 households living in a tropical region where malaria is a problem: • The majority of households are more than 50 miles from a healthcare facility (51%) and live in a rural area (53%). • Almost half (44%) of the households have a child under the age of five. • The average age of the head of household is 48 years (SD = 7.4). • The median family size is 6, with a range of 1–12. • Most (73%) of the households did not use an ITN the previous night.
Why a One-Sample Study? • Obtaining an additional group or sample for comparison may not be practical. • In that case, comparisons are made against historical control(s).
Historical Controls • Want to compare what you found in the sample to something: do your results differ from what has been previously published/reported? • Historical controls: control data are not collected concurrently within the same study, but come from a • different time period • different region • different population • different kind of exposure • This seems economical, so why not use historical controls all the time?
One-Sample Study: ITN Utilization • Data for this study were collected during the rainy season. How do the results compare with those of the dry season? • Is the season (rainy or dry) associated with ITN utilization?
Inference for the One-Sample Study • Hypothesis tests: assume the null parameter is the true parameter (in a historical control study, the null parameter is the historical value), then decide whether the data support this assumption. • Confidence intervals: estimate the true parameter with an interval, and use the interval to judge whether assumptions about the parameter are reasonable.
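As a sketch of the confidence-interval approach, the Python below computes a normal-approximation (Wald) interval for a single proportion, using the 500 ITN-using households out of 1876 reported in the sample description. The Wald interval is an assumed choice here; other intervals (e.g., Wilson or Clopper-Pearson) may be preferable when n is small or the proportion is near 0 or 1. The function name `wald_ci` is hypothetical.

```python
import math
from statistics import NormalDist

def wald_ci(successes, n, level=0.95):
    """Normal-approximation (Wald) confidence interval for a single proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for a 95% interval
    se = math.sqrt(p_hat * (1 - p_hat) / n)        # estimated standard error
    return p_hat - z * se, p_hat + z * se

# 500 of the 1876 households reported ITN use the previous night.
low, high = wald_ci(500, 1876)
print(f"95% CI for p: ({low:.3f}, {high:.3f})")
```

If a hypothesized value for the proportion falls outside the interval, the data cast doubt on that value.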
Inference for the One-Sample Study: Historical Controls • Research hypothesis: the true proportion (p) in the rainy season is not 0.20. • Null hypothesis: the true proportion (p) in the rainy season is 0.20.
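A minimal sketch of a two-sided one-sample z-test for these hypotheses, assuming the normal approximation is reasonable and using the 500 of 1876 households that reported ITN use. The helper name `one_sample_prop_ztest` is hypothetical; the standard error is computed under the null value 0.20, which is the usual convention for this test.

```python
import math
from statistics import NormalDist

def one_sample_prop_ztest(successes, n, p0):
    """Two-sided z-test of H0: p = p0 using the normal approximation."""
    p_hat = successes / n
    se0 = math.sqrt(p0 * (1 - p0) / n)   # standard error under H0
    z = (p_hat - p0) / se0
    p_value = 2 * NormalDist().cdf(-abs(z))
    return z, p_value

# Rainy-season sample: 500 of 1876 households used an ITN; null value 0.20.
z, p_value = one_sample_prop_ztest(500, 1876, 0.20)
print(f"z = {z:.2f}, two-sided p-value = {p_value:.2g}")
```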
Planning • Estimation: width of the interval, estimate of the proportion • Comparison of proportions: power, significance level, effect size
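The planning quantities above can be turned into sample sizes with standard normal-approximation formulas. This Python sketch assumes those formulas; the function names and all planning values (guessed proportion 0.20, margin of error 0.03, alternative proportion 0.25) are hypothetical.

```python
import math
from statistics import NormalDist

def n_for_ci_width(p_guess, margin, level=0.95):
    """Sample size so that a Wald CI for p has half-width at most `margin`."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return math.ceil(z**2 * p_guess * (1 - p_guess) / margin**2)

def n_for_one_sample_test(p0, p1, alpha=0.05, power=0.80):
    """Sample size for a two-sided test of H0: p = p0 when the true p is p1."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)  # significance level
    z_b = NormalDist().inv_cdf(power)          # desired power
    numerator = z_a * math.sqrt(p0 * (1 - p0)) + z_b * math.sqrt(p1 * (1 - p1))
    return math.ceil((numerator / (p1 - p0)) ** 2)  # effect size = p1 - p0

# Hypothetical planning values
print(n_for_ci_width(0.20, 0.03))           # estimation: width of the interval
print(n_for_one_sample_test(0.20, 0.25))    # comparison: 0.20 vs. 0.25
```

A narrower interval, higher power, a smaller significance level, or a smaller effect size all increase the required sample size.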
Exact Tests • When the sample size is large (and the proportion is not too close to 0 or 1), the normal approximation is used. What if this approximation is not reasonable? • Exact tests allow comparisons without the normal distribution; they use the binomial distribution instead.
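A minimal sketch of a two-sided exact binomial test in Python. It assumes one common definition of the two-sided p-value: the sum of the probabilities of all outcomes that are no more likely than the observed outcome under the null hypothesis. The function names and the small example sample are hypothetical.

```python
from math import comb

def binom_pmf(k, n, p):
    """Binomial probability of exactly k successes in n trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def exact_binomial_test(k, n, p0):
    """Two-sided exact binomial test of H0: p = p0.

    Sums the probabilities of all outcomes that are no more likely than the
    observed one under H0 (one common definition of the two-sided p-value).
    """
    pmf = [binom_pmf(i, n, p0) for i in range(n + 1)]
    observed = pmf[k]
    tolerance = 1 + 1e-7  # guard against floating-point ties
    return min(1.0, sum(q for q in pmf if q <= observed * tolerance))

# Hypothetical small sample: 9 of 20 households used an ITN; H0: p = 0.20.
print(round(exact_binomial_test(9, 20, 0.20), 4))
```

For large samples this sum becomes expensive and the normal approximation is usually adequate.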
Multiple-Sample Comparisons: Comparing a dichotomous outcome between two groups
Description of the Sample • Households with children under five (n = 833) and without (n = 1043): • Similar with respect to age and family size • Households with children under five report more net use than those without (34% vs. 21%).
Description of the Sample • Compared with households not using an ITN (n = 1376), households using an ITN (n = 500): • More often have children under five • Are more likely to live under a thatched roof • Are more likely to live within 15 miles of a healthcare facility • Are more likely to live in a rural area • Have, on average, younger heads of household • Have larger families
Why a Two-Sample Study? • Provides an independent comparator group: treatment vs. control, exposed vs. unexposed • Different outcomes between the groups may indicate that group membership is associated with the outcome.
Conditional Probabilities • The proportion of subjects in a category, given that some other condition is true • Really a question of which denominator is used • The choice changes the interpretation: row proportions vs. column proportions
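To make the denominator question concrete, this Python sketch computes row and column proportions from a 2×2 table. The counts are hypothetical illustrations, not the study data, and the function names are assumptions.

```python
def row_proportions(table):
    """Divide each cell by its row total (condition on the row variable)."""
    return [[cell / sum(row) for cell in row] for row in table]

def column_proportions(table):
    """Divide each cell by its column total (condition on the column variable)."""
    col_totals = [sum(col) for col in zip(*table)]
    return [[cell / col_totals[j] for j, cell in enumerate(row)] for row in table]

# Hypothetical 2x2 table (illustrative counts, not the study data):
# rows = child under five (yes, no); columns = ITN used (yes, no)
table = [[30, 70],
         [20, 80]]
print(row_proportions(table))     # P(ITN use | child status)
print(column_proportions(table))  # P(child status | ITN use)
```

The same cell (30) is 30% of its row but 60% of its column, which is why the denominator must be stated when interpreting a proportion.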
Difference in Proportions • The statistical test gives the same result whether you compare row proportions or column proportions. • A difference in proportions means that the two categorical variables are dependent.
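For a 2×2 table, the symmetry claimed above can be checked numerically. This sketch assumes the pooled two-proportion z-test: applied to rows and to columns (the transposed table) it gives the same z, and z² equals the Pearson chi-square statistic. The counts and function names are hypothetical.

```python
import math

def pooled_z(table):
    """Pooled two-proportion z statistic comparing the first-column
    proportions of the two rows in a 2x2 table."""
    (a, b), (c, d) = table
    n1, n2 = a + b, c + d
    p1, p2 = a / n1, c / n2
    p = (a + c) / (n1 + n2)  # pooled proportion under H0
    return (p1 - p2) / math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))

def pearson_chi2(table):
    """Pearson chi-square statistic for a table of counts."""
    n = sum(map(sum, table))
    row_totals = [sum(r) for r in table]
    col_totals = [sum(c) for c in zip(*table)]
    return sum((table[i][j] - row_totals[i] * col_totals[j] / n) ** 2
               / (row_totals[i] * col_totals[j] / n)
               for i in range(len(table)) for j in range(len(table[0])))

table = [[30, 70], [20, 80]]                     # hypothetical counts
transposed = [list(col) for col in zip(*table)]
z_rows = pooled_z(table)        # compares row proportions
z_cols = pooled_z(transposed)   # compares column proportions
print(z_rows, z_cols, z_rows**2, pearson_chi2(table))
```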
Two-Sample Study • Does having a child under the age of five affect ITN use?
Inference for the Two-Sample Study • Hypothesis tests: assume the null parameter is the true parameter, which can be stated in equivalent ways: • The groups have the same proportion. • The true difference between the proportions is 0. • The two categorical variables are independent. • Then decide whether the data support this assumption.
Inference for the Two-Sample Study • pU5 = the true proportion of ITN use in households with children under five • pO5 = the true proportion of ITN use in households with no children under five • Null hypothesis: pU5 = pO5 (using an ITN and having children under the age of five are independent) • Research hypothesis: pU5 ≠ pO5 (using an ITN and having children under the age of five are dependent)
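A minimal sketch of the pooled two-sample z-test for H0: pU5 = pO5, assuming the normal approximation applies. Because the sample description reports rounded percentages (34% of 833 vs. 21% of 1043) rather than exact counts, the result is approximate; with the actual counts, the sample proportions would be computed from them. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def two_prop_ztest(p1, n1, p2, n2):
    """Two-sided pooled z-test of H0: p1 = p2 from sample proportions and sizes."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)   # common proportion under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * NormalDist().cdf(-abs(z))
    return z, p_value

# Rounded percentages from the sample: 34% of 833 households with children
# under five vs. 21% of 1043 households without.
z, p_value = two_prop_ztest(0.34, 833, 0.21, 1043)
print(f"difference = {0.34 - 0.21:.2f}, z = {z:.2f}, p = {p_value:.2g}")
```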
Planning • Balanced design? • Overall test or comparison between groups? • Estimation: width of the interval, amount of variability • Comparison of proportions: power, significance level, effect size
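For a balanced design comparing two proportions, a common normal-approximation formula (without continuity correction) gives the sample size per group. This Python sketch assumes that formula; the function name and the planning values (0.30 vs. 0.20, α = 0.05, power 0.80) are hypothetical.

```python
import math
from statistics import NormalDist

def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Per-group sample size for a two-sided comparison of two proportions
    with equal group sizes (normal approximation, no continuity correction)."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)   # significance level
    z_b = NormalDist().inv_cdf(power)           # desired power
    p_bar = (p1 + p2) / 2                       # average proportion under H0
    numerator = (z_a * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_b * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2)))
    return math.ceil(numerator**2 / (p1 - p2) ** 2)  # effect size = p1 - p2

# Hypothetical planning values
print(n_per_group(0.30, 0.20))
```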
Multiple-Sample Comparisons: Comparing categorical outcomes between two or more groups
Categorical Variables • Different research questions result in different types of categorical variables. The outcome does not have to be dichotomous. There can be more than two groups to compare.
Conditional Probabilities • The proportion of subjects in a category, given that some other condition is true • Really a question of which denominator is used • The choice changes the interpretation: row proportions vs. column proportions • Same as when there were only two groups and only two categories in the outcome
Inference • Because categorical variables can be dichotomous, nominal, or ordinal, different hypotheses are possible, and these may require different tests. • Hypothesis tests: assume the null hypothesis is true, then decide whether the data support this assumption.
Two Nominal Variables • Is there an association between the type of roof and the type of net used?
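For two nominal variables, the Pearson chi-square test of independence is a common choice when expected counts are adequate; this Python sketch assumes that test. The counts are hypothetical (two roof types by three net types, not the study data), and `chi2_sf` is a hand-written helper based on the series for the regularized incomplete gamma function, not a library call; in practice a routine such as SciPy's `scipy.stats.chi2_contingency` would typically be used.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with df degrees
    of freedom, via the series for the regularized lower incomplete gamma."""
    if x <= 0:
        return 1.0
    a, z = df / 2, x / 2
    term = total = 1 / a
    n = 1
    while term > total * 1e-15:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return max(0.0, 1.0 - lower)

def chi2_independence(table):
    """Pearson chi-square test of independence for an r x c table of counts."""
    n = sum(map(sum, table))
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count expected under independence
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, chi2_sf(stat, df)

# Hypothetical counts: rows = roof type (2 levels), columns = net type (3 levels)
table = [[40, 25, 15],
         [30, 45, 25]]
stat, df, p_value = chi2_independence(table)
print(f"chi-square = {stat:.2f}, df = {df}, p = {p_value:.3g}")
```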