## Categorical Outcomes Making Comparisons


Chapter 4

**Outline**
• Describing:
  • Numerical summaries
  • Graphical summaries
• One-sample comparisons:
  • Historical controls
• Multiple-sample comparisons:
  • Dichotomous outcome
  • Categorical outcomes
• Measures of association

**Categorical Outcomes**
• Gaps:
  • Only limited number of values/categories possible
  • Nothing “in-between”
• Examples:
  • Dichotomous (two categories)
  • Nominal (categories without order)
  • Ordinal (categories with order)

**Learning Objectives**
• How do I describe categorical data?
• How do I make comparisons?
• How do I investigate associations?

**Public Health Application**
• More than three-quarters of global malaria deaths occur in under-five children living in malarious countries in sub-Saharan Africa.
• 25% of all childhood mortality below the age of five is attributable to malaria.
• About 30–40% of all fevers seen in health centers in Africa are due to malaria, with huge seasonal variability between rainy and dry seasons.

**Data Description**
• Cross-sectional study conducted to investigate factors related to insecticide-treated net (ITN) use
• 1876 households with an ITN:
  • Demographic variables (age of the head of household, household wealth, miles to the nearest healthcare facility, rural/urban, family size, etc.)
  • Children under the age of five?
  • Was an ITN used the previous night?

**Research Question**
What factors are associated with ITN use?

**Describing the Data**
• Numerical summaries:
  • Counts, proportions, and percentages
• Graphical summaries:
  • Pie charts
  • Bar graphs

**Most Important Step in Data Analysis**
• Describe the data: Before making conclusions or inferences, an investigator needs to fully understand what the data look like.
• Numerical and graphical summaries cannot be skipped!
  • Need this information to choose the most appropriate statistical method
  • Need this information for valid statistical inferences

**Bar Graphs**
• Provide a visual comparison among groups.
  • Vertical axis represents the number of subjects.
  • The higher the bar, the more subjects.
  • Horizontal axis represents categories.
  • Ordinal: Order matters.
  • Nominal: Order does not matter.

**Bar Graphs**
Panels: Ordinal Variable; Nominal Variable

**Bar Graphs**
• Graphically compare groups for some categorical outcome.

**Pie Charts**
• Provide a visual description of how parts compare to a whole.

**Numerical Summaries**
• Categorical variables are described by reporting the number of subjects within each category.
  • Counts
  • Proportions
  • Percentages

**Proportion**
• The fraction of the subjects belonging to a particular category.
• The proportion of the population is a parameter.
• The proportion of the sample is a statistic.

**Description of the Sample**
• A sample of 1876 households living in a tropical region where malaria is problematic:
  • The majority (51%) of the households are more than 50 miles from a healthcare facility and live in a rural area (53%).
  • Almost half (44%) of the households have a child under the age of five.
  • The average age for the head of the household is 48 (SD = 7.4).
  • Median family size of 6 with a range of 1–12.
  • Most (73%) of the households did not use an ITN the previous night.

**Why a One-Sample Study?**
• Obtaining an additional group or sample for comparisons may not be practical.
• Comparisons involve historical control(s).

**Historical Controls**
• Want to compare what you found in the sample to something: Do your results differ from what has been previously published/reported?
• Historical controls: Control data are not collected concurrently within the same study.
  • Different time period
  • Different region
  • Different population
  • Different kind of exposure
• Seems economical, so why not use historical controls all the time?

**One-Sample Study: ITN Utilization**
• Data for this study were collected during the rainy season. How do the results compare with those of the dry season?
• Is the season (rainy or dry) associated with the utilization of ITN?

**Inference for the One-Sample Study**
• Hypothesis tests
  • Assume the null parameter is the true parameter.
  • Historical control study: Null parameter = Historical value
  • Decide whether the data support this assumption.
• Confidence intervals
  • Estimate the true parameter using an interval.
  • Can use the interval estimate to determine if assumptions about the parameter are reasonable.

**Inference for the One-Sample Study: Historical Controls**
• Research hypothesis: The true proportion (p) in the rainy season is not 0.20.
• Null hypothesis: The true proportion (p) in the rainy season is 0.20.

**Planning**
• Estimation:
  • Width of the interval
  • Estimate of the proportion
• Comparison of proportions:
  • Power
  • Significance level
  • Effect size

**Exact Tests**
• When the sample size is large (and the proportion is not too small), the normal approximation is used. What if this is not reasonable?
• Exact tests allow for comparisons without using the normal distribution.
  • Use the binomial distribution.

**Comparing a dichotomous outcome between two groups**
Multiple-Sample Comparisons

**Description of the Sample**
• Households with children under five (n = 833) and without (n = 1043):
  • Similar with respect to age and family size.
  • Those with children under five in the household report more net use than those without children under five (34% vs 21%).

**Description of the Sample**
Households using ITN (n = 500):
• Report a higher percentage of children under five
• Are more likely to live under a thatched roof
• Have a higher percentage of households living within 15 miles of a healthcare facility
• Are more likely to live in a rural area
• Have, on average, younger household heads
• Have larger families

**Why a Two-Sample Study?**
• Provides an independent comparator group:
  • Treatment vs control
  • Exposed vs unexposed
• Different outcomes between the groups may mean that group membership is associated with the outcome.

**Conditional Probabilities**
• Proportion of subjects in a category given that some other condition is true
• Really an issue of what the denominator is
• Makes a difference in how you interpret:
  • Row proportion
  • Column proportion

**Difference in Proportions**
• The statistical test does not care whether you are comparing column proportions or row proportions.
• A difference in proportions translates to the two categorical variables being dependent.

**Two-Sample Study**
• Does having a child under the age of five impact the utilization of ITN?

**Inference for the Two-Sample Study**
• Hypothesis tests
  • Assume the null parameter is the true parameter:
    • The groups have the same proportion.
    • The true difference between proportions is 0.
    • The two categorical variables are independent.
  • Decide whether the data support this assumption.

**Inference for the Two-Sample Study**
pU5 = The true proportion of ITN use in households with children under five
pO5 = The true proportion of ITN use in households with no children under five
• Null hypothesis: pU5 = pO5
  • Using ITN and having children under the age of five are independent.
• Research hypothesis: pU5 ≠ pO5
  • Using ITN and having children under the age of five are dependent.

**Planning**
• Balanced design?
• Overall test or comparison between groups?
• Estimation:
  • Width of the interval
  • Amount of variability
• Comparison of means:
  • Power
  • Significance level
  • Effect size

**Comparing categorical outcomes between two or more groups**
Multiple-Sample Comparisons

**Categorical Variables**
• Different research questions result in different types of categorical variables.
  • The outcome does not have to be dichotomous.
  • There can be more than two groups to compare.

**Conditional Probabilities**
• Proportion of subjects in a category given that some other condition is true
• Really an issue of what the denominator is
• Makes a difference in how you interpret:
  • Row proportion
  • Column proportion
• Same as when there were only two groups and only two categories in the outcome

**Inference**
• As categorical variables can be dichotomous, nominal, or ordinal, different hypotheses are possible.
  • May require different tests
• Hypothesis tests
  • Assume the null hypothesis is true.
  • Decide whether the data support this assumption.

**Two Nominal Variables**
• Is there an association between the type of roof and the type of net used?
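
**Worked sketch: testing independence from a table of counts**
The question of whether two categorical variables are independent can be illustrated with a Pearson chi-square test, which works for any table of counts. This is a minimal sketch, not the authors' analysis: the slides do not say which test or software was used, the function names are illustrative, and the counts are approximations rebuilt from the rounded figures reported earlier (34% of 833 households with children under five and 21% of 1043 households without), so the totals do not match the study exactly.

```python
import math


def row_proportions(table):
    """Proportion of each row's subjects that fall in each column."""
    return [[count / sum(row) for count in row] for row in table]


def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df


# ASSUMPTION: counts reconstructed from rounded percentages, not study data.
# Rows: children under five (n = 833), no children under five (n = 1043).
# Columns: used an ITN the previous night, did not use an ITN.
table = [
    [283, 550],
    [219, 824],
]

props = row_proportions(table)
stat, df = chi_square_independence(table)

# With one degree of freedom the chi-square tail probability equals
# erfc(sqrt(stat / 2)); this shortcut is NOT valid for larger tables.
p_value = math.erfc(math.sqrt(stat / 2))

print(f"ITN use with children under five: {props[0][0]:.2f}")
print(f"ITN use without children under five: {props[1][0]:.2f}")
print(f"chi-square = {stat:.2f}, df = {df}, p = {p_value:.2g}")
```

The row proportions here condition on household type, which matches how the 34% vs 21% comparison is stated. For a table with more than two rows or columns, such as roof type by net type, the same statistic applies, but the p-value needs the chi-square distribution with the corresponding degrees of freedom rather than the one-degree-of-freedom shortcut.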