  1. Clinical Research Training Program 2021 DUMMY VARIABLES IN REGRESSION Fall 2004 www.edc.gsph.pitt.edu/faculty/dodge/clres2021.html

  2. OUTLINE • Indicator Variables (Dummy Variables) • Comparing Two Straight-Line Regression Equations • Test for equal slope (testing for parallelism) • Test for equal intercept • Test for coincidence • Comparing Three Regression Equations

  3. Indicator Variables in Regression • An indicator variable (or dummy variable) is any variable in a regression that takes on a finite number of values for the purpose of identifying different categories of a nominal variable. • Example: the coding of geographical residence described on the next slide.

  4. Indicator Variables in Regression • If the nominal independent variable has k levels, k − 1 dummy variables must be defined to index those levels, provided the regression model contains a constant term. Example: Variables X1 and X2, working in tandem, describe the nominal variable “geographical residence”, which has three levels: residence in the western, eastern, or central US. Residence in the central US corresponds to X1 = 0 and X2 = 0, and we call this group the “baseline” or “reference” group.
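A minimal code sketch of this k − 1 coding (the pandas column name "region" and the toy data are hypothetical, not the course data; the two generated indicator columns play the roles of X1 and X2):

import pandas as pd

# Hypothetical nominal variable with k = 3 levels
df = pd.DataFrame({"region": ["western", "eastern", "central", "eastern", "central"]})

# drop_first=True keeps k - 1 = 2 indicator columns; the dropped level
# ("central", the first alphabetically) becomes the baseline/reference group,
# i.e. the rows in which both dummies equal 0.
dummies = pd.get_dummies(df["region"], prefix="region", drop_first=True)
print(dummies)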

  5. Indicator Variables in Regression • If an intercept is used in the regression equation, proper definition of the k − 1 dummy variables automatically indexes all k categories. • If k dummy variables are used to describe a nominal variable with k categories in a model containing an intercept, the coefficients of the model cannot all be uniquely estimated, because the k dummies and the intercept are collinear.

  6. Indicator Variables in Regression • The k-1 dummy variables for indexing the k categories of a given nominal variable can be properly defined in many different ways.

  7. Comparing Two Straight-line Regression Equations Example: Comparison by gender of straight-line regression of systolic blood pressure on age. [Figure: fitted straight lines of SBP on age, shown separately for males and females]

  8. Comparing Two Straight-line Regression Equations Three basic questions to consider when comparing two straight-line regression equations: • Are the two slopes the same or different (regardless of whether the intercepts are different)? • Are the two intercepts the same or different (regardless of whether the slopes are different)? • Are the two lines coincident (that is, the same), or do they differ in slope and/or intercept?

  9. Methods of Comparing Two Straight Lines • Treat the male and female data separately by fitting two separate regression equations and then make the appropriate two-sample t tests. Male: Y = β0M + β1M X + E   Female: Y = β0F + β1F X + E
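A rough sketch of this first method in code (simulated data; the column names sbp, age, sex and the statsmodels calls are illustrative assumptions, not the course data set):

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated systolic blood pressure, age, and sex for n subjects
rng = np.random.default_rng(0)
n = 200
sex = rng.choice(["M", "F"], size=n)
age = rng.uniform(20, 80, size=n)
sbp = 100 + 0.8 * age + 5 * (sex == "F") + rng.normal(0, 10, size=n)
df = pd.DataFrame({"sbp": sbp, "age": age, "sex": sex})

# Method I: fit one straight line per sex
fit_m = smf.ols("sbp ~ age", data=df[df.sex == "M"]).fit()
fit_f = smf.ols("sbp ~ age", data=df[df.sex == "F"]).fit()
print(fit_m.params)  # intercept and slope for males
print(fit_f.params)  # intercept and slope for females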

  10. HYPOTHESIS TESTS Using separate regression fits to test for equal slope (testing for parallelism) • H0: β1M = β1F vs. H1: β1M ≠ β1F • Test statistic: T, where Sp² is the MSE for the combined data (a standard form is sketched below). • Decision: If |T| > t(n − 4, 1 − α/2), then reject H0.
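One standard form of this two-sample t statistic (a sketch, assuming the usual pooled-variance comparison of two independently fitted slopes; the SSE, n_M, n_F notation is introduced here for illustration):

T = \frac{\hat{\beta}_{1M} - \hat{\beta}_{1F}}{\sqrt{S_p^2 \left( \dfrac{1}{\sum_i (X_{Mi} - \bar{X}_M)^2} + \dfrac{1}{\sum_i (X_{Fi} - \bar{X}_F)^2} \right)}},
\qquad
S_p^2 = \frac{SSE_M + SSE_F}{n_M + n_F - 4}

Under H0 this statistic follows a t distribution with n_M + n_F − 4 degrees of freedom, which matches the critical value t(n − 4, 1 − α/2) above when n is the combined sample size.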

  11. HYPOTHESIS TESTS Using separate regression fits to test for equal intercept • H0: β0M = β0F vs. H1: β0M ≠ β0F • Test statistic: T, where Sp² is the MSE for the combined data (a standard form is sketched below). • Decision: If |T| > t(n − 4, 1 − α/2), then reject H0.
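A standard form of this statistic (again a sketch, using the pooled MSE Sp² and the usual variance of an estimated intercept; notation assumed here):

T = \frac{\hat{\beta}_{0M} - \hat{\beta}_{0F}}{\sqrt{S_p^2 \left( \dfrac{1}{n_M} + \dfrac{\bar{X}_M^2}{\sum_i (X_{Mi} - \bar{X}_M)^2} + \dfrac{1}{n_F} + \dfrac{\bar{X}_F^2}{\sum_i (X_{Fi} - \bar{X}_F)^2} \right)}}

which is compared with the same t(n − 4, 1 − α/2) critical value.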

  12. Methods of Comparing Two Straight Lines: Using a Single Regression Equation • Define the dummy variable Z to be 0 if the subject is male and 1 if female. Then, for the combined data, fit the regression model: Y = β0 + β1X + β2Z + β3XZ + E

  13. Methods of Comparing Two Straight Lines • This regression model yields the following two models for the two values of Z: Male (Z = 0): Y = β0 + β1X + E   Female (Z = 1): Y = (β0 + β2) + (β1 + β3)X + E

  14. Methods of Comparing Two Straight Lines • This allows us to write the regression coefficients of the separate models in the first method in terms of the coefficients of the single model in the second method: β0M = β0, β1M = β1, β0F = β0 + β2, β1F = β1 + β3. Thus, the model in the second method incorporates the two separate regression equations within a single model and allows for different slopes and different intercepts.

  15. HYPOTHESIS TESTS Using the single regression fit to test for equal slope (testing for parallelism) • H0: β3 = 0 vs. H1: β3 ≠ 0 • Test statistic: a partial F statistic (a standard form is sketched below). • Decision: If F > F(1, df of MSE(X, Z, XZ); 1 − α), then reject H0.
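A standard form of this partial F statistic (a sketch, writing SS_reg(·) for the regression sum of squares of the model containing the listed terms):

F = \frac{SS_{reg}(X, Z, XZ) - SS_{reg}(X, Z)}{MSE(X, Z, XZ)}

with 1 numerator degree of freedom; it is equivalent to the square of the t statistic for \hat{\beta}_3 in the full model.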

  16. HYPOTHESIS TESTS Using the single regression fit to test for equal intercept • H0: β2 = 0 vs. H1: β2 ≠ 0 • Test statistic: a partial F statistic (a standard form is sketched below). • Decision: If F > F(1, df of MSE(X, Z, XZ); 1 − α), then reject H0.
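A standard form (same sketch notation as above), testing the contribution of Z given X and XZ:

F = \frac{SS_{reg}(X, Z, XZ) - SS_{reg}(X, XZ)}{MSE(X, Z, XZ)}

with 1 numerator degree of freedom.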

  17. HYPOTHESIS TESTS Using the single regression fit to test for coincidence • H0: β2 = β3 = 0 vs. H1: β2 ≠ 0 or β3 ≠ 0 • Test statistic: a multiple partial F statistic (a standard form is sketched below). • Decision: If F > F(2, df of MSE(X, Z, XZ); 1 − α), then reject H0.
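A standard form of this multiple partial F statistic (sketch notation as above), testing the joint contribution of Z and XZ given X:

F = \frac{[SS_{reg}(X, Z, XZ) - SS_{reg}(X)] / 2}{MSE(X, Z, XZ)}

with 2 numerator degrees of freedom.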

  18. HYPOTHESIS TESTS Comparison of Methods I and II • The tests for parallel lines are exactly equivalent. • The tests for coincident lines differ, and the one using the dummy variable model is generally preferable. • If we test for coincidence with two separate tests from the separate straight-line fits rather than with a single test, the overall significance level of the two tests combined is greater than α; that is, there is more chance of rejecting a true H0. (α is the significance level of each separate test.)
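A rough sketch of Method II in code (same simulated data layout as in the earlier sketch, so column names and the simulation are assumptions; the nested-model comparisons give the partial F tests for parallelism and coincidence):

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

# Simulated data: sbp, age, and the dummy z (0 = male, 1 = female)
rng = np.random.default_rng(0)
n = 200
z = rng.integers(0, 2, size=n)
age = rng.uniform(20, 80, size=n)
sbp = 100 + 0.8 * age + 5 * z + rng.normal(0, 10, size=n)
df = pd.DataFrame({"sbp": sbp, "age": age, "z": z})

# Method II: single model with dummy Z and interaction X*Z
full = smf.ols("sbp ~ age + z + age:z", data=df).fit()

# Parallelism: partial F for the interaction term (H0: beta3 = 0)
print(anova_lm(smf.ols("sbp ~ age + z", data=df).fit(), full))

# Coincidence: partial F for Z and X*Z jointly (H0: beta2 = beta3 = 0)
print(anova_lm(smf.ols("sbp ~ age", data=df).fit(), full))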

  19. Comparing Three Regression Equations Example: Y = SBP, X1 = Age, X2 = Weight, Z1 = high school education, Z2 = education greater than high school (baseline = education less than high school). Regressions: one equation each for Edu < HS, Edu = HS, and Edu > HS (the full model and the three group-specific equations are sketched below).
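A sketch of the full model these slides work with, written under the assumption that β5–β8 attach to the interaction terms in the order X1Z1, X1Z2, X2Z1, X2Z2 (this ordering is an assumption, consistent with the hypotheses on the next two slides):

Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 Z_1 + \beta_4 Z_2 + \beta_5 X_1 Z_1 + \beta_6 X_1 Z_2 + \beta_7 X_2 Z_1 + \beta_8 X_2 Z_2 + E

so that the three group-specific regressions are

Edu < HS (Z_1 = Z_2 = 0):    Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + E
Edu = HS (Z_1 = 1, Z_2 = 0): Y = (\beta_0 + \beta_3) + (\beta_1 + \beta_5) X_1 + (\beta_2 + \beta_7) X_2 + E
Edu > HS (Z_1 = 0, Z_2 = 1): Y = (\beta_0 + \beta_4) + (\beta_1 + \beta_6) X_1 + (\beta_2 + \beta_8) X_2 + E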

  20. HYPOTHESIS TESTS Test that all three regression lines are parallel • H0: β5 = β6 = β7 = β8 = 0 • Test statistic: a multiple partial F statistic (a standard form is sketched below). • Decision: If F > F(4, df of MSE(full model); 1 − α), then reject H0.
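A standard form of this multiple partial F statistic (sketch notation as before), comparing the full model with the reduced model containing only X1, X2, Z1, Z2:

F = \frac{[SS_{reg}(\text{full}) - SS_{reg}(X_1, X_2, Z_1, Z_2)] / 4}{MSE(\text{full})}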

  21. HYPOTHESIS TESTS Test that all three regression lines are coincident • H0: β3 = β4 = β5 = β6 = β7 = β8 = 0 • Test statistic: a multiple partial F statistic (a standard form is sketched below). • Decision: If F > F(6, df of MSE(full model); 1 − α), then reject H0.
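A standard form (same sketch notation), comparing the full model with the model containing only X1 and X2:

F = \frac{[SS_{reg}(\text{full}) - SS_{reg}(X_1, X_2)] / 6}{MSE(\text{full})}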
