
Criteria for choosing a reference category




  1. Criteria for choosing a reference category Jane E. Miller, PhD

  2. Overview • What is a reference category? • For independent variables (IVs) • For the dependent variable (DV) • Choosing reference categories based on: • Theoretical criteria • Previous literature on the topic • Writing patterns • Sample size • Joint distribution of variables

  3. What is a reference category? • For each nominal or ordinal variable, the reference category is the one against which all other categories of that variable will be compared. • A multivariate model specification will not include a dummy variable for that category. • Sometimes called the “omitted” category. • Choice of a reference category for each categorical variable in your model should NOT be arbitrary.
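A minimal sketch of what "omitting" a category means in practice, using only the Python standard library. The helper name `dummy_code` and the treatment-arm data are hypothetical illustrations, not part of the original slides.

```python
def dummy_code(values, reference):
    """Return one 0/1 indicator column per non-reference category.

    The reference category gets no column: a case in the reference
    category is coded 0 on every indicator.
    """
    categories = sorted(set(values))
    if reference not in categories:
        raise ValueError(f"reference {reference!r} not found in data")
    kept = [c for c in categories if c != reference]
    return {c: [1 if v == c else 0 for v in values] for c in kept}


# Hypothetical data: treatment arm for six participants.
arm = ["placebo", "low dose", "high dose", "placebo", "low dose", "placebo"]
dummies = dummy_code(arm, reference="placebo")

for name, column in dummies.items():
    print(name, column)
```

A three-category variable therefore contributes two dummy variables to the model, and the placebo group is identified by zeros on both.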

  4. Multivariate coefficients and the reference category • OLS coefficients will estimate the difference in the DV for each of the other categories, compared to the reference category. • Logit coefficients will estimate the difference in the log-odds of the outcome for each of the other categories of the IV, compared to the reference category; exponentiating a coefficient gives the corresponding odds ratio.
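The two interpretations above can be illustrated without a regression library in the special case of a model with a single categorical IV and no other predictors. In that case the OLS intercept equals the reference-group mean and each coefficient equals a difference in group means, and the odds ratio from a logit model equals the odds ratio computed from the cross-tabulation. The function names and all numbers below are hypothetical; a model with additional covariates would need a real estimation routine.

```python
from statistics import mean


def ols_single_categorical(y, groups, reference):
    """Intercept and coefficients for OLS with one categorical IV only."""
    by_group = {}
    for value, group in zip(y, groups):
        by_group.setdefault(group, []).append(value)
    intercept = mean(by_group[reference])  # mean of the reference group
    coefs = {
        group: mean(values) - intercept  # difference from the reference mean
        for group, values in by_group.items()
        if group != reference
    }
    return intercept, coefs


def odds_ratio(events, totals, group, reference):
    """Odds of the outcome in `group` relative to `reference`."""
    def odds(g):
        return events[g] / (totals[g] - events[g])
    return odds(group) / odds(reference)


# Hypothetical continuous outcome for three treatment arms.
y = [10, 12, 14, 16, 20, 22]
groups = ["placebo", "placebo", "low", "low", "high", "high"]
intercept, coefs = ols_single_categorical(y, groups, reference="placebo")

# Hypothetical binary outcome: number with the outcome out of 50 per arm.
events = {"placebo": 10, "drug": 20}
totals = {"placebo": 50, "drug": 50}
or_drug = odds_ratio(events, totals, group="drug", reference="placebo")

print(intercept, coefs, round(or_drug, 2))
```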

  5. Choosing a reference category based on theoretical criteria • Your specific research question will often determine choice of reference category. E.g., • If you are analyzing effects of a drug compared to placebo, the placebo condition is the logical reference category. • If you are comparing other states to your home state, your home state should be the reference category.

  6. Choosing a reference category based on prior literature • If previous studies of your topic conventionally use a particular reference category, you will often use that category as well. • Doing so facilitates comparison of results. • BUT, it is important to think through whether their choice fits your study. • Identify the reasons why others have chosen that reference category. • Check those reasons against your own.

  7. Choosing a different reference category than the prior literature • If you have strong reasons to use a different reference category than a major study of your topic: • In your methods section, explain the theoretical or empirical basis for choosing a different reference category. • In the discussion section, translate your results to compare against the same reference category as other leading studies.
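Translating results to another reference category is simple arithmetic for point estimates: subtract the new reference category's coefficient from every coefficient (or, for odds ratios, divide by the new reference category's odds ratio). The sketch below assumes hypothetical coefficients and helper names. It does not re-base standard errors or confidence intervals, which also depend on the covariances among the estimated coefficients.

```python
def rebase_coefficients(coefs, old_reference, new_reference):
    """Re-express additive coefficients (OLS or log-odds) against a new reference."""
    full = dict(coefs)
    full[old_reference] = 0.0  # the old reference has an implicit coefficient of 0
    shift = full[new_reference]
    return {g: b - shift for g, b in full.items() if g != new_reference}


def rebase_odds_ratios(odds_ratios, old_reference, new_reference):
    """Re-express odds ratios against a new reference category."""
    full = dict(odds_ratios)
    full[old_reference] = 1.0  # the old reference has an implicit odds ratio of 1
    scale = full[new_reference]
    return {g: r / scale for g, r in full.items() if g != new_reference}


# Hypothetical estimates with "placebo" as the original reference category.
coefs = {"low": 4.0, "high": 10.0}
ors = {"low": 2.0, "high": 3.0}

coefs_vs_low = rebase_coefficients(coefs, "placebo", "low")
ors_vs_low = rebase_odds_ratios(ors, "placebo", "low")

print(coefs_vs_low)
print(ors_vs_low)
```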

  8. Choosing a reference category based on writing patterns • If your sentences tend to read “compared to group X,” then group X should be your reference category. • Doing so will ensure that your statistical calculations are consistent with how you will write about the results. • But also weigh: • Empirical criteria for sample size • Precedent in the literature

  9. Choosing a reference category based on sample size • Lacking some other basis for selecting a reference category, choose the largest (modal) group. • Doing so tends to maximize statistical power for estimating coefficients. • Sometimes this will mesh with theoretical criteria, as when the majority racial/ethnic group is chosen as the reference category. • Sometimes, your “natural” reference category includes very few cases. • You might need to pick a different group to provide stable statistical estimates.
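A short sketch of the sample-size check: tabulate the categories, identify the modal one, and flag any category too small to serve as a stable reference. The region data and the `min_count` threshold are hypothetical; the slides do not specify a cutoff for "very few cases."

```python
from collections import Counter


def modal_category(values):
    """Return the most frequent category and its count."""
    return Counter(values).most_common(1)[0]


def small_categories(values, min_count):
    """Categories with fewer than `min_count` cases (threshold is an assumption)."""
    counts = Counter(values)
    return sorted(c for c, n in counts.items() if n < min_count)


# Hypothetical categorical IV.
region = ["south"] * 5 + ["north"] * 3 + ["west"] * 1

mode, mode_n = modal_category(region)
too_small = small_categories(region, min_count=2)

print(mode, mode_n, too_small)
```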

  10. Choosing reference categories based on joint distribution of variables • The overall reference category for a multivariate regression model is the combination of reference categories for each of your categorical variables. • Be sure that this combination isn’t too rare. • E.g., teenagers with at least a college degree will be pretty unusual (if not definitionally impossible!), so don’t pick teenagers as the reference category for age and college+ as the reference category for education.
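One way to check the joint distribution is to count how many cases fall in the cell defined by all of the candidate reference categories at once. The sketch below uses hypothetical records and a hypothetical helper, `reference_cell_count`, following the age-by-education example above.

```python
from collections import Counter


def reference_cell_count(rows, references):
    """Count cases that fall in the reference category of every variable."""
    return sum(
        all(row[var] == ref for var, ref in references.items())
        for row in rows
    )


# Hypothetical records with two categorical IVs.
rows = [
    {"age": "teen", "education": "<HS"},
    {"age": "teen", "education": "HS"},
    {"age": "adult", "education": "college+"},
    {"age": "adult", "education": "HS"},
    {"age": "adult", "education": "college+"},
    {"age": "teen", "education": "<HS"},
]

# Full cross-tabulation, for inspecting every combination.
crosstab = Counter((r["age"], r["education"]) for r in rows)

bad_choice = reference_cell_count(rows, {"age": "teen", "education": "college+"})
better_choice = reference_cell_count(rows, {"age": "adult", "education": "college+"})

print(crosstab)
print(bad_choice, better_choice)
```

An empty or nearly empty reference cell, as with teenagers who hold a college degree here, signals that at least one of the reference categories should be changed.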

  11. Reference category for dependent variables • If you are analyzing a categorical dependent variable, you also need to decide which category to model, and which category is omitted. • If the DV is dichotomous (2-category), • You will model one category. • The other will be the omitted category of the DV. • E.g., if you model having health insurance, then being uninsured is the reference category.

  12. Reference category for a multichotomous dependent variable • If the DV is multichotomous (N-category), • You will separately model (N – 1) categories. • The other category will be the omitted category, for which no model is estimated. • E.g., if type of health insurance is a 4-category variable, • You will estimate separate models for 3 (= 4 – 1) of those categories. • For instance, you might model having public insurance, self-pay, and uninsured. • The other category (in this case private insurance) is the reference category.
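As a sketch of what the three models look like, assume a multinomial logit specification with a single predictor $x$ (the slides do not name the model type, and the symbols below are introduced here for illustration). Each modeled category $j$ is contrasted with the reference category, private insurance:

```latex
\ln\!\left(\frac{P(Y = j \mid x)}{P(Y = \text{private} \mid x)}\right)
  = \beta_{0j} + \beta_{1j}\, x,
\qquad j \in \{\text{public},\ \text{self-pay},\ \text{uninsured}\}
```

There are three equations, one per non-reference category, and no equation for private insurance, since its log-odds relative to itself is zero by construction.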

  13. Summary • Choice of a reference category for each categorical variable in your model should NOT be arbitrary. • Consider the following criteria when selecting a reference category for each of your variables: • Theoretical • Previous literature • Writing patterns • Sample size • Joint distribution of variables in your data • Use the same criteria for choosing a reference category for the DV as for IVs.

  14. Suggested resources • Miller, J. E. 2013. The Chicago Guide to Writing about Multivariate Analysis, 2nd Edition. • Chapter 8, section on choosing a reference category • Chapter 9, section on interpreting coefficients on categorical variables

  15. Suggested practice exercises • Study guide to The Chicago Guide to Writing about Multivariate Analysis, 2nd Edition. • Questions #3e and 8e in the problem set for chapter 9 • Suggested course extensions for • Chapter 8 • “Applying statistics” exercise #2 • Chapter 9 • “Reviewing” exercise #1

  16. Contact information Jane E. Miller, PhD jmiller@ifh.rutgers.edu Online materials available at http://press.uchicago.edu/books/miller/multivariate/index.html
