Clinical Research Training Program 2021 DUMMY VARIABLES IN REGRESSION Fall 2004 www.edc.gsph.pitt.edu/faculty/dodge/clres2021.html
OUTLINE • Indicator Variables (Dummy Variables) • Comparing Two Straight-Line Regression Equations • Test for equal slope (testing for parallelism) • Test for equal intercept • Test for coincidence • Comparing Three Regression Equations
Indicator Variables in Regression • An indicator variable (or dummy variable) is any variable in a regression that takes on a finite number of values for the purpose of identifying different categories of a nominal variable. • Example:
Indicator Variables in Regression • If the nominal independent variable has k levels, k-1 dummy variables must be defined to index those levels, provided the regression model contains a constant term. Example: Variables X1 and X2, working in tandem, describe the nominal variable “geographical residence”, which has three levels: residence in the western, eastern, or central US. Residence in the central US corresponds to X1 = 0 and X2 = 0, and we call this group the “baseline” or “reference” group.
Indicator Variables in Regression • If an intercept is used in the regression equation, proper definition of the k-1 dummy variables automatically indexes all k categories. • If k dummy variables are used to describe a nominal variable with k categories in a model containing an intercept, the coefficients in the model cannot all be uniquely estimated, because the k dummy variables sum to the intercept column (perfect collinearity).
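The collinearity point can be checked numerically. The following is a minimal sketch, assuming NumPy and a made-up coding of residence for six subjects (0 = central, 1 = western, 2 = eastern; this numeric coding is an assumption for illustration, not taken from the slides). It compares the rank of the design matrix with k-1 dummies against the one with k dummies.

```python
import numpy as np

# Hypothetical data: six subjects, residence coded 0 = central, 1 = western, 2 = eastern.
region = np.array([0, 0, 1, 1, 2, 2])
intercept = np.ones(len(region))

# One 0/1 indicator column per category (k = 3 dummies).
all_dummies = (region[:, None] == np.arange(3)).astype(float)

# k - 1 dummies: drop the central (reference) column.
X_reference = np.column_stack([intercept, all_dummies[:, 1:]])
# k dummies plus an intercept: the dummy columns sum to the intercept column.
X_overparam = np.column_stack([intercept, all_dummies])

rank_reference = np.linalg.matrix_rank(X_reference)
rank_overparam = np.linalg.matrix_rank(X_overparam)

print(X_reference.shape[1], rank_reference)  # 3 columns, rank 3: full column rank
print(X_overparam.shape[1], rank_overparam)  # 4 columns, rank 3: rank deficient
```

With four columns but rank three, the normal equations have infinitely many solutions, which is why the coefficients cannot be uniquely estimated.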
Indicator Variables in Regression • The k-1 dummy variables for indexing the k categories of a given nominal variable can be properly defined in many different ways.
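As an illustration of "many different ways", the sketch below fits the same made-up response with two codings: 0/1 dummies with a reference group, and effect coding (1, 0, -1). The data, the group labels, and the choice of effect coding as the second scheme are assumptions for illustration. The coefficients differ between codings, but the fitted values (the group means) are the same.

```python
import numpy as np

# Hypothetical data: a response y for six subjects in three categories (0, 1, 2).
group = np.array([0, 0, 1, 1, 2, 2])
y = np.array([10.0, 12.0, 15.0, 17.0, 20.0, 24.0])
ones = np.ones(len(y))

# Coding A: 0/1 dummies with category 0 as the reference group.
A = np.column_stack([ones, group == 1, group == 2]).astype(float)

# Coding B: effect coding, where category 0 is coded -1 on both columns.
B = np.column_stack([
    ones,
    np.where(group == 1, 1.0, np.where(group == 0, -1.0, 0.0)),
    np.where(group == 2, 1.0, np.where(group == 0, -1.0, 0.0)),
])

coef_a, *_ = np.linalg.lstsq(A, y, rcond=None)
coef_b, *_ = np.linalg.lstsq(B, y, rcond=None)

fitted_a = A @ coef_a
fitted_b = B @ coef_b

print(coef_a)  # intercept = mean of reference group; others = differences from it
print(coef_b)  # intercept = average of group means; others = deviations from it
print(np.allclose(fitted_a, fitted_b))
```

Only the interpretation of the coefficients changes with the coding; the fitted model is the same.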
Comparing Two Straight-line Regression Equations Example: Comparison by gender of the straight-line regression of systolic blood pressure on age. [Figure panels: Male, Female]
Comparing Two Straight-line Regression Equations Three basic questions to consider when comparing two straight-line regression equations: • Are the two slopes the same or different (regardless of whether the intercepts are different)? • Are the two intercepts the same or different (regardless of whether the slopes are different)? • Are the two lines coincident (that is, the same), or do they differ in slope and/or intercept?
Methods of Comparing Two Straight Lines • Treat the male and female data separately by fitting the two separate regression equations and then making the appropriate two-sample t tests. Male: $Y = \beta_{0M} + \beta_{1M} X + E$ Female: $Y = \beta_{0F} + \beta_{1F} X + E$
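HYPOTHESIS TESTS Using separate regression fits to test for equal slope (testing for parallelism) • $H_0: \beta_{1M} = \beta_{1F}$ vs. $H_1: \beta_{1M} \neq \beta_{1F}$ • Test statistic: $T = \dfrac{\hat\beta_{1M} - \hat\beta_{1F}}{\sqrt{s_p^2 \left[ \dfrac{1}{S_{XX,M}} + \dfrac{1}{S_{XX,F}} \right]}}$, where $S_{XX,M} = \sum_i (X_{Mi} - \bar{X}_M)^2$ (and similarly $S_{XX,F}$), and $s_p^2$ is the pooled MSE from the two fits, $s_p^2 = (SSE_M + SSE_F)/(n - 4)$ with $n = n_M + n_F$. • Decision: If $|T| > t_{n-4,\,1-\alpha/2}$, then reject $H_0$.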
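HYPOTHESIS TESTS Using separate regression fits to test for equal intercept • $H_0: \beta_{0M} = \beta_{0F}$ vs. $H_1: \beta_{0M} \neq \beta_{0F}$ • Test statistic: $T = \dfrac{\hat\beta_{0M} - \hat\beta_{0F}}{\sqrt{s_p^2 \left[ \dfrac{1}{n_M} + \dfrac{\bar{X}_M^2}{S_{XX,M}} + \dfrac{1}{n_F} + \dfrac{\bar{X}_F^2}{S_{XX,F}} \right]}}$, where $S_{XX,M} = \sum_i (X_{Mi} - \bar{X}_M)^2$ (and similarly $S_{XX,F}$), and $s_p^2$ is the pooled MSE from the two fits, $s_p^2 = (SSE_M + SSE_F)/(n - 4)$ with $n = n_M + n_F$. • Decision: If $|T| > t_{n-4,\,1-\alpha/2}$, then reject $H_0$.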
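The two t statistics from separate fits can be sketched in a few lines. This is a minimal illustration, assuming NumPy and the standard pooled-variance form of the tests; the helper names (`fit_line`, `compare_lines`) and the age and blood-pressure values are made up for illustration and are not from the course data. The resulting statistics would be compared with the t critical value on n-4 degrees of freedom from a table or a statistics package.

```python
import numpy as np

def fit_line(x, y):
    """Least-squares line: returns intercept, slope, SSE, Sxx, mean of x, and n."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    sxx = np.sum((x - x.mean()) ** 2)
    slope = np.sum((x - x.mean()) * (y - y.mean())) / sxx
    intercept = y.mean() - slope * x.mean()
    sse = np.sum((y - intercept - slope * x) ** 2)
    return intercept, slope, sse, sxx, x.mean(), len(x)

def compare_lines(x_m, y_m, x_f, y_f):
    """t statistics for equal slopes and equal intercepts, plus their df (n - 4)."""
    b0m, b1m, sse_m, sxx_m, xbar_m, n_m = fit_line(x_m, y_m)
    b0f, b1f, sse_f, sxx_f, xbar_f, n_f = fit_line(x_f, y_f)
    df = n_m + n_f - 4
    sp2 = (sse_m + sse_f) / df  # pooled MSE from the two separate fits
    t_slope = (b1m - b1f) / np.sqrt(sp2 * (1 / sxx_m + 1 / sxx_f))
    t_intercept = (b0m - b0f) / np.sqrt(
        sp2 * (1 / n_m + xbar_m**2 / sxx_m + 1 / n_f + xbar_f**2 / sxx_f)
    )
    return t_slope, t_intercept, df

# Hypothetical data: age (years) and systolic blood pressure (mm Hg).
age_m = [20, 30, 40, 50, 60, 70]
sbp_m = [118, 125, 129, 138, 142, 151]
age_f = [20, 30, 40, 50, 60, 70]
sbp_f = [108, 116, 126, 133, 145, 152]

t_slope, t_intercept, df = compare_lines(age_m, sbp_m, age_f, sbp_f)
print(t_slope, t_intercept, df)
```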
Methods of Comparing Two Straight Lines: Using a Single Regression Equation • Define the dummy variable Z to be 0 if the subject is male and 1 if female. Then, for the combined data, fit the regression model: $Y = \beta_0 + \beta_1 X + \beta_2 Z + \beta_3 XZ + E$
Methods of Comparing Two Straight Lines • This regression model yields the following two models for the two values of Z: Male (Z=0): $Y = \beta_0 + \beta_1 X + E$ Female (Z=1): $Y = (\beta_0 + \beta_2) + (\beta_1 + \beta_3) X + E$
Methods of Comparing Two Straight Lines • This allows us to write the regression coefficients for the separate models in the first method in terms of the coefficients of the model in the second method as: $\beta_{0M} = \beta_0$, $\beta_{1M} = \beta_1$, $\beta_{0F} = \beta_0 + \beta_2$, $\beta_{1F} = \beta_1 + \beta_3$. Thus, the model in the second method incorporates the two separate regression equations within a single model and allows for different slopes and different intercepts.
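The correspondence between the two methods can be verified numerically. The sketch below, assuming NumPy and made-up age and blood-pressure values, fits the single model with an intercept, X, Z, and the product XZ, then compares the implied male and female lines with separate straight-line fits.

```python
import numpy as np

# Hypothetical data: first six subjects male (Z = 0), last six female (Z = 1).
age = np.array([20, 30, 40, 50, 60, 70, 20, 30, 40, 50, 60, 70], float)
sbp = np.array([118, 125, 129, 138, 142, 151,
                108, 116, 126, 133, 145, 152], float)
z = np.array([0] * 6 + [1] * 6, float)

# Single model: columns are intercept, X, Z, and the interaction XZ.
X = np.column_stack([np.ones_like(age), age, z, age * z])
beta, *_ = np.linalg.lstsq(X, sbp, rcond=None)
b0, b1, b2, b3 = beta

# Separate straight-line fits (np.polyfit returns the slope first for degree 1).
slope_m, intercept_m = np.polyfit(age[z == 0], sbp[z == 0], 1)
slope_f, intercept_f = np.polyfit(age[z == 1], sbp[z == 1], 1)

# Male line: (b0, b1). Female line: (b0 + b2, b1 + b3).
print(np.allclose([b0, b1], [intercept_m, slope_m]))
print(np.allclose([b0 + b2, b1 + b3], [intercept_f, slope_f]))
```

So the coefficient on Z estimates the difference in intercepts (female minus male) and the coefficient on XZ estimates the difference in slopes.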
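HYPOTHESIS TESTS Using single regression fits to test for equal slope (testing for parallelism) • $H_0: \beta_3 = 0$ vs. $H_1: \beta_3 \neq 0$ • Test statistic: the partial F for adding XZ to a model containing X and Z, $F = F(XZ \mid X, Z) = \dfrac{SS_{reg}(X,Z,XZ) - SS_{reg}(X,Z)}{MSE(X,Z,XZ)}$ • Decision: If $F > F_{1,\,\mathrm{df}(MSE(X,Z,XZ)),\,1-\alpha}$, where $\mathrm{df}(MSE(X,Z,XZ)) = n - 4$, then reject $H_0$.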
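HYPOTHESIS TESTS Using single regression fits to test for equal intercept. • $H_0: \beta_2 = 0$ vs. $H_1: \beta_2 \neq 0$ • Test statistic: the partial F for adding Z to a model containing X and XZ, $F = F(Z \mid X, XZ) = \dfrac{SS_{reg}(X,Z,XZ) - SS_{reg}(X,XZ)}{MSE(X,Z,XZ)}$ • Decision: If $F > F_{1,\,\mathrm{df}(MSE(X,Z,XZ)),\,1-\alpha}$, where $\mathrm{df}(MSE(X,Z,XZ)) = n - 4$, then reject $H_0$.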
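HYPOTHESIS TESTS Using single regression fits to test for coincidence • $H_0: \beta_2 = \beta_3 = 0$ vs. $H_1: \beta_2 \neq 0$ or $\beta_3 \neq 0$ • Test statistic: the multiple partial F for adding Z and XZ to a model containing X, $F = F(Z, XZ \mid X) = \dfrac{[SS_{reg}(X,Z,XZ) - SS_{reg}(X)]/2}{MSE(X,Z,XZ)}$ • Decision: If $F > F_{2,\,\mathrm{df}(MSE(X,Z,XZ)),\,1-\alpha}$, where $\mathrm{df}(MSE(X,Z,XZ)) = n - 4$, then reject $H_0$.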
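The three partial F statistics for the single-equation method can be sketched as below, assuming NumPy. The helper names (`sse`, `two_line_f_tests`) and the data are made up for illustration. The sketch uses the identity that the gain in regression sum of squares from the reduced to the full model equals the drop in error sum of squares, so only residual sums of squares are needed.

```python
import numpy as np

def sse(columns, y):
    """Residual sum of squares from regressing y on an intercept plus the given columns."""
    X = np.column_stack([np.ones(len(y))] + list(columns))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def two_line_f_tests(x, z, y):
    """Partial F statistics for parallelism, equal intercepts, and coincidence."""
    x, z, y = (np.asarray(a, float) for a in (x, z, y))
    xz = x * z
    df_error = len(y) - 4                      # df of MSE(X, Z, XZ)
    sse_full = sse([x, z, xz], y)
    mse_full = sse_full / df_error
    return {
        "parallel": (sse([x, z], y) - sse_full) / mse_full,     # F(XZ | X, Z), 1 df
        "intercept": (sse([x, xz], y) - sse_full) / mse_full,   # F(Z | X, XZ), 1 df
        "coincide": (sse([x], y) - sse_full) / 2 / mse_full,    # F(Z, XZ | X), 2 df
        "df_error": df_error,
    }

# Hypothetical data: first six subjects male (Z = 0), last six female (Z = 1).
age = [20, 30, 40, 50, 60, 70, 20, 30, 40, 50, 60, 70]
sbp = [118, 125, 129, 138, 142, 151, 108, 116, 126, 133, 145, 152]
z = [0] * 6 + [1] * 6

results = two_line_f_tests(age, z, sbp)
print(results)
```

Each statistic would then be compared with the appropriate F critical value (1 or 2 numerator df, n-4 denominator df) from a table or a statistics package.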
HYPOTHESIS TESTS Comparison of Method I and II • The tests for parallel lines are exactly equivalent. • The tests for coincident lines differ, and the one using the dummy variable model is generally preferable. • If we test for coincidence using the two separate tests (equal slopes and equal intercepts) from the separate straight-line fits rather than a single test, the overall significance level for the two tests combined is greater than α; that is, there is more chance of rejecting a true H0. (α is the significance level of each separate test.)
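The equivalence of the two parallelism tests can be illustrated numerically: the square of the t statistic from the separate fits equals the partial F for the product term in the single model. The sketch below assumes NumPy, the pooled-variance form of the t test, and made-up data; the helper name `fit` is hypothetical.

```python
import numpy as np

def fit(X, y):
    """Least-squares fit; returns the coefficient vector and the residual sum of squares."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return beta, float(resid @ resid)

# Hypothetical data: first six subjects male (Z = 0), last six female (Z = 1).
age = np.array([20, 30, 40, 50, 60, 70, 20, 30, 40, 50, 60, 70], float)
sbp = np.array([118, 125, 129, 138, 142, 151,
                108, 116, 126, 133, 145, 152], float)
z = np.array([0] * 6 + [1] * 6, float)
n = len(sbp)

# Method I: separate fits, pooled MSE, t statistic for equal slopes.
slopes, sse_total, inv_sxx = [], 0.0, 0.0
for g in (0, 1):
    xg, yg = age[z == g], sbp[z == g]
    beta_g, sse_g = fit(np.column_stack([np.ones_like(xg), xg]), yg)
    slopes.append(beta_g[1])
    sse_total += sse_g
    inv_sxx += 1.0 / np.sum((xg - xg.mean()) ** 2)
sp2 = sse_total / (n - 4)
t_slope = (slopes[0] - slopes[1]) / np.sqrt(sp2 * inv_sxx)

# Method II: single model, partial F for XZ given X and Z.
ones = np.ones_like(age)
_, sse_full = fit(np.column_stack([ones, age, z, age * z]), sbp)
_, sse_parallel = fit(np.column_stack([ones, age, z]), sbp)
f_parallel = (sse_parallel - sse_full) / (sse_full / (n - 4))

print(t_slope**2, f_parallel)  # the two values agree up to rounding error
```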
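Comparing Three Regression Equations Example: Y = SBP, X1 = Age, X2 = Weight. Z1 = 1 if high school education, 0 otherwise; Z2 = 1 if education greater than high school, 0 otherwise (baseline = education less than high school). Full model: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 Z_1 + \beta_4 Z_2 + (\text{the four product terms } X_j Z_k \text{, with coefficients } \beta_5, \ldots, \beta_8) + E$. Regressions: Edu<HS (Z1 = 0, Z2 = 0): $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + E$. Edu=HS (Z1 = 1, Z2 = 0): intercept $\beta_0 + \beta_3$, with slopes shifted by the coefficients of the Z1 product terms. Edu>HS (Z1 = 0, Z2 = 1): intercept $\beta_0 + \beta_4$, with slopes shifted by the coefficients of the Z2 product terms.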
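HYPOTHESIS TESTS Test that all three regression lines are parallel • $H_0: \beta_5 = \beta_6 = \beta_7 = \beta_8 = 0$ • Test statistic: the multiple partial F for adding the four product terms to a model containing X1, X2, Z1, Z2, $F = \dfrac{[SS_{reg}(\text{full model}) - SS_{reg}(X_1, X_2, Z_1, Z_2)]/4}{MSE(\text{full model})}$ • Decision: If $F > F_{4,\,\mathrm{df}(MSE(\text{full model})),\,1-\alpha}$, then reject $H_0$.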
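HYPOTHESIS TESTS Test that all three regression lines are coincident • $H_0: \beta_3 = \beta_4 = \beta_5 = \beta_6 = \beta_7 = \beta_8 = 0$ • Test statistic: the multiple partial F for adding Z1, Z2, and the four product terms to a model containing X1 and X2, $F = \dfrac{[SS_{reg}(\text{full model}) - SS_{reg}(X_1, X_2)]/6}{MSE(\text{full model})}$ • Decision: If $F > F_{6,\,\mathrm{df}(MSE(\text{full model})),\,1-\alpha}$, then reject $H_0$.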
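The two tests for three groups can be sketched the same way. The example below assumes NumPy and simulated data: the sample size, the coefficients used to generate SBP, and the helper name `sse` are all made up for illustration. The full model has nine parameters, so its MSE has n-9 degrees of freedom.

```python
import numpy as np

def sse(columns, y):
    """Residual sum of squares from regressing y on an intercept plus the given columns."""
    X = np.column_stack([np.ones(len(y))] + list(columns))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

# Simulated (hypothetical) data: 20 subjects per education group.
rng = np.random.default_rng(0)
group = np.repeat([0, 1, 2], 20)          # 0: Edu<HS, 1: Edu=HS, 2: Edu>HS
age = rng.uniform(30, 70, size=group.size)
weight = rng.uniform(50, 100, size=group.size)
sbp = (90 + 0.5 * age + 0.2 * weight + 4.0 * (group == 1)
       + rng.normal(0, 5, size=group.size))

z1 = (group == 1).astype(float)           # high school education
z2 = (group == 2).astype(float)           # education greater than high school

# Full model: X1, X2, Z1, Z2 and the four products of each X with each Z.
full = [age, weight, z1, z2, age * z1, age * z2, weight * z1, weight * z2]
df_error = sbp.size - 9
sse_full = sse(full, sbp)
mse_full = sse_full / df_error

# Parallelism: drop the four product terms (4 numerator df).
f_parallel = (sse([age, weight, z1, z2], sbp) - sse_full) / 4 / mse_full
# Coincidence: drop Z1, Z2 and the four product terms (6 numerator df).
f_coincide = (sse([age, weight], sbp) - sse_full) / 6 / mse_full

print(f_parallel, f_coincide, df_error)
```

Because all four product terms are tested together, the result does not depend on the order in which they are numbered.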