Statistics Micro Mini Multiple Regression

January 5-9, 2008

Beth Ayers

- Dummy Variables
- Multiple regression
- Using quantitative and categorical explanatory variables
- Interactions among explanatory variables
- Linear regression vs. ANCOVA

- Two article critiques

- Categorical explanatory variables can be used in a linear regression if they are coded as dummy variables
- For binary variables, the most frequently used codes are 0/1 and -1/+1
- For a nominal variable with k levels, create k-1 explanatory variables that are each 0/1
- each subject can have a value of one for at most one of the explanatory variables
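
As an illustration of the k-1 coding rule above, here is a minimal Python sketch that builds 0/1 dummy columns for a nominal variable. The function name `dummy_code`, the sample data, and the choice of baseline level are assumptions made for this example; they are not part of the original slides.

```python
def dummy_code(values, baseline):
    """Return {level: 0/1 column} for every level except the baseline."""
    levels = sorted(set(values))
    if baseline not in levels:
        raise ValueError("baseline must be one of the observed levels")
    return {
        level: [1 if v == level else 0 for v in values]
        for level in levels
        if level != baseline
    }


tutors = ["A", "B", "C", "B", "A"]          # hypothetical data
columns = dummy_code(tutors, baseline="A")  # k = 3 levels -> 2 dummy columns
for level, column in columns.items():
    print(level, column)

# Each subject has a 1 in at most one column; baseline subjects have all zeros.
row_sums = [sum(col[i] for col in columns.values()) for i in range(len(tutors))]
print(row_sums)
```

Subjects in the baseline group are identified by having zeros in every dummy column, which is why only k-1 columns are needed.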

- Suppose we’d like to use a categorical variable that indicates which tutor (A or B) the student used. Define:
- X2 = 0 if the student used Tutor A
- X2 = 1 if the student used Tutor B

- To test whether X2 has an effect
- H0: β2 = 0
- H1: β2 ≠ 0
- This is the usual t-test for a regression coefficient; we don’t need to do anything different for dummy variables

- If β2 = 0, then there is no difference between the mean response of Tutor A and Tutor B
- If β2 ≠ 0, then β2 is the difference between the mean response for Tutor A and Tutor B, at the same value of X1

- Y = β0 + β1*X1 + β2*X2
- Can think of this as two equations
- When X2 = 0
- Y = β0 + β1*X1

- When X2 = 1
- Y = β0 + β1*X1 + β2*1 = (β0 + β2) + β1*X1

- Then β0 + β2 is the new intercept for the case where X2 = 1; the slope β1 is the same in both equations

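- Suppose we have three tutors (A, B, C). Define:
- X2 = 1 if the student used Tutor B, 0 otherwise
- X3 = 1 if the student used Tutor C, 0 otherwise
- Tutor A is considered the baseline (X2 = 0 and X3 = 0)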

- Y = β0 + β1*X1 + β2*X2 + β3*X3
- Can think of this as three equations
- When X2 = 0 and X3 = 0
- Y = β0 + β1*X1

- When X2 = 1 and X3 = 0
- Y = β0 + β1*X1 + β2*1 = (β0 + β2) + β1*X1

- When X2 = 0 and X3 = 1
- Y = β0 + β1*X1 + β3*1 = (β0 + β3) + β1*X1

- β2 is then the difference between the mean response for Tutor A and Tutor B
- β3 is then the difference between the mean response for Tutor A and Tutor C
- To formally compare Tutor B to Tutor C, one must rerun the regression using either Tutor B or C as the baseline
- To informally compare them, one can look at the difference between β2 and β3

- Again, we can use the usual t-test for a regression coefficient; we don’t need to do anything different

- Want to see if there is a gender effect in predicting efficiency
- Efficiency = β0 + β1*WPM + β2*Gender
- where WPM is words per minute and Gender = 0 for males, 1 for females

- Step 1
- F-statistic: 791
- P-value = 0.0000

- So at least one of the two variables is important in predicting Efficiency

- Step 2
- Test words per minute
- T-statistic: -38.34
- P-value = 0.000

- Test Gender
- T-statistic: -11.32
- P-value = 0.000

- Both words per minute and gender are important in predicting efficiency
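
For reference, the two test statistics used in Steps 1 and 2 have the standard forms below. The notation is introduced here for illustration and does not appear in the original slides: n is the number of observations, p the number of explanatory variables, SSR and SSE the regression and error sums of squares, and b_j the estimated coefficient of the j-th explanatory variable.

```latex
% Step 1: overall F-test of H0: beta_1 = ... = beta_p = 0
\[
  F = \frac{\mathrm{SSR}/p}{\mathrm{SSE}/(n-p-1)}
\]
% Step 2: t-test of H0: beta_j = 0 for a single coefficient,
% referred to a t distribution with n - p - 1 degrees of freedom
\[
  t_j = \frac{b_j}{\mathrm{SE}(b_j)}
\]
```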

- Regression Equations
- Males
- Efficiency = 84.77 - 0.49*WPM

- Females
- Efficiency = 84.77 - 0.49*WPM - 3.14*1
- Efficiency = 81.63 - 0.49*WPM

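- For words per minute: holding gender constant, for each additional word per minute that a student can type, their predicted Efficiency decreases by 0.49 minutes (the WPM coefficient is negative)
- For Gender: holding words per minute constant, the predicted Efficiency for females is, on average, 3.14 minutes lower than for males (read as more efficient if a lower time means greater efficiency)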
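
The fitted equations above can be evaluated directly. The sketch below plugs the reported coefficients into a small function; the function name and the example typing speed of 40 WPM are assumptions for illustration, and Gender is assumed to be coded 0 for males and 1 for females, matching the two equations shown.

```python
# Coefficients reported in the fitted equations above.
INTERCEPT = 84.77
B_WPM = -0.49
B_GENDER = -3.14


def predict_efficiency(wpm, gender):
    """Predicted Efficiency; gender is assumed coded 0 = male, 1 = female."""
    return INTERCEPT + B_WPM * wpm + B_GENDER * gender


male = predict_efficiency(wpm=40, gender=0)
female = predict_efficiency(wpm=40, gender=1)
print(f"male:   {male:.2f}")
print(f"female: {female:.2f}")
print(f"gap:    {female - male:.2f}")  # equals the Gender coefficient at any WPM
```

Because the model has no interaction term, the gap between the two predictions is the same at every typing speed; the two lines are parallel.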

- An interaction occurs between two or more explanatory variables (not between an explanatory variable and the response variable)
- An interaction occurs when the effect of a change in the level or value of one explanatory variable depends on the level or value of another explanatory variable
- In regression we account for an interaction by adding a variable that is the product of two existing explanatory variables

- Y = β0 + β1*X1 + β2*X2 + β3*X1*X2
- Assume that X2 is a dummy variable and X1*X2 is the interaction
- Again, can think of this as two equations
- When X2 = 0
- Y = β0 + β1*X1

- When X2 = 1
- Y = β0 + β1*X1 + β2*1 + β3*X1*1 = (β0 + β2) + (β1 + β3)*X1

- We can think of this as a new intercept (β0 + β2) and a new slope (β1 + β3) for the case where X2 = 1
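
With one dummy variable and its interaction with X1, the two equations above amount to one line per group, so fitting a separate least-squares line within each group recovers the same intercepts and slopes. The sketch below checks this on synthetic, noise-free data; the coefficient values, the data, and the helper `fit_line` are all made up for illustration.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for one explanatory variable."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Synthetic, noise-free data generated from made-up coefficients.
b0, b1, b2, b3 = 2.0, 0.5, 3.0, 0.25
x1 = [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0]
x2 = [0, 0, 0, 0, 1, 1, 1, 1]
y = [b0 + b1 * a + b2 * d + b3 * a * d for a, d in zip(x1, x2)]

# Fit one line per group defined by the dummy variable.
group0 = [(a, r) for a, d, r in zip(x1, x2, y) if d == 0]
group1 = [(a, r) for a, d, r in zip(x1, x2, y) if d == 1]
int0, slope0 = fit_line(*zip(*group0))
int1, slope1 = fit_line(*zip(*group1))

print("X2 = 0:", int0, slope0)  # estimates of b0 and b1
print("X2 = 1:", int1, slope1)  # estimates of b0 + b2 and b1 + b3
print("b2, b3:", int1 - int0, slope1 - slope0)
```

The differences between the two fitted lines give the dummy coefficient (change in intercept) and the interaction coefficient (change in slope).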

- Y = β0 + β1*X1 + β2*X2 + β3*X1*X2
- β3 is called the interaction effect
- β1 and β2 are called main effects

- If β3 is not significant, drop the interaction and rerun the regression
- Including the interaction when it is not significant can alter the interpretations of the other variables

- If β3 is significant, we do not need to check whether β1 and β2 are significant. We will always keep X1 and X2 in the regression

- Suppose we have two versions of a tutor and we want to know which helps students study for a math test
- In addition, we want to know if a student’s SAT math score affects their exam score
- We know which tutor each student used, and we also have each student’s Math SAT score and exam score

- Sample output

- Step 1: are any of the variables significant in predicting exam score?
- F-statistic: 6025
- P-value = 0.000

- Step 2: check interaction first
- T-statistic: 15.980
- P-value = 0.000

- Do not need to check main effects since the interaction is significant

- Regression equation
- Tutor A (tutor = 1)
- Exam score = (2.62 + 6.39) + (0.06 + 0.05)*MathSAT
- Exam score = 9.01 + 0.11*MathSAT

- Tutor B (tutor = 0)
- Exam score = 2.62 + 0.06*MathSAT

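- At a Math SAT score of 0, students using Tutor A score on average 6.39 points higher than students using Tutor B; because the interaction is significant, this gap grows by 0.05 points for each additional Math SAT point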
- For students using Tutor A, for each point that their Math SAT score increases, their exam score increases by 0.11
- For students using Tutor B, for each point that their Math SAT score increases, their exam score increases by 0.06
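
The per-tutor equations above follow mechanically from the four fitted coefficients. The sketch below reproduces that arithmetic and also shows how the predicted gap between the tutors depends on the Math SAT score once an interaction is in the model. The function names and the Math SAT values in the loop are assumptions for illustration.

```python
# Coefficients from the fitted equations above (tutor = 1 for Tutor A, 0 for Tutor B).
B0, B_SAT, B_TUTOR, B_INTERACTION = 2.62, 0.06, 6.39, 0.05


def line_for(tutor):
    """Return (intercept, slope on MathSAT) for tutor = 0 or 1."""
    return B0 + B_TUTOR * tutor, B_SAT + B_INTERACTION * tutor


def tutor_gap(math_sat):
    """Predicted Tutor A minus Tutor B exam score at a given Math SAT score."""
    return B_TUTOR + B_INTERACTION * math_sat


for tutor, name in [(1, "Tutor A"), (0, "Tutor B")]:
    intercept, slope = line_for(tutor)
    print(f"{name}: Exam score = {intercept:.2f} + {slope:.2f} * MathSAT")

for sat in (0, 500, 700):  # illustrative Math SAT values, not from the slides
    print(f"MathSAT = {sat}: gap = {tutor_gap(sat):.2f}")
```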

- Explanatory variables
- GPA (0-5 scale)
- Math SAT score
- Time on tutor (in hours)
- Tutor used (A, B, C)

- Response variable
- Exam score

- Think that time on tutor and the type of tutor may have an interaction

- Step 1
- F-statistic: 769.5
- P-value = 0.000

- Step 2
- Test the interactions first
- Test Time * Tutor B
- T-statistic: -0.727
- P-value = 0.471

- Test Time * Tutor C
- T-statistic: -0.195
- P-value = 0.847

- Since neither interaction is significant, I would drop those two variables and rerun the regression
- Including the interaction, when it is not significant, can alter the interpretations of the other variables

- Step 1
- F-statistic: 1111
- P-value = 0.000

- Step 2
- Test gpa
- T-statistic: 10.28
- P-value = 0.000

- Test Math SAT score
- T-statistic: 70.03
- P-value = 0.000

- Test time on tutor
- T-statistic: -0.43
- P-value = 0.672

- Test Tutor B
- T-statistic: -10.52
- P-value = 0.000

- Test Tutor C
- T-statistic: 2.60
- P-value = 0.0128

- Time on tutor is not significant
- Drop time and rerun

- Step 1
- F-statistic: 1414
- P-value = 0.000

- Step 2
- Test gpa
- T-statistic: 10.51
- P-value = 0.000

- Test Math SAT score
- T-statistic: 70.69
- P-value = 0.000

- Test Tutor B
- T-statistic: -10.80
- P-value = 0.000

- Test Tutor C
- T-statistic: 2.67
- P-value = 0.011
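
The sequence followed across the three regressions above (test the interactions first, drop them if none is significant, then drop any remaining non-significant term and rerun) can be sketched as a small helper. The 0.05 cutoff, the function name `next_drop`, and the term labels are assumptions for illustration; the slides do not state a significance level. Only p-values reported above are used.

```python
def next_drop(p_values, interactions, alpha=0.05):
    """Suggest which terms to drop before the next refit.

    p_values:     {term: p-value} for the terms tested in the current model
    interactions: {interaction term: (component, component)}
    alpha:        assumed significance cutoff (not stated in the slides)
    """
    present = {name: parts for name, parts in interactions.items() if name in p_values}
    # Interactions are checked first: drop them together if none is significant.
    if present and all(p_values[name] > alpha for name in present):
        return sorted(present)
    # Otherwise keep the interactions and the main effects they are built from.
    protected = set(present)
    for parts in present.values():
        protected.update(parts)
    return sorted(t for t, p in p_values.items() if t not in protected and p > alpha)


interactions = {"time:TutorB": ("time", "TutorB"), "time:TutorC": ("time", "TutorC")}

fit1 = {"time:TutorB": 0.471, "time:TutorC": 0.847}  # only these were reported
fit2 = {"gpa": 0.000, "MathSAT": 0.000, "time": 0.672, "TutorB": 0.000, "TutorC": 0.0128}
fit3 = {"gpa": 0.000, "MathSAT": 0.000, "TutorB": 0.000, "TutorC": 0.011}

for label, fit in [("first fit", fit1), ("second fit", fit2), ("third fit", fit3)]:
    print(label, "-> drop", next_drop(fit, interactions))
```

An empty list for the third fit means every remaining term is significant at the assumed cutoff, so that model is the one interpreted.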

- For each additional GPA point, a student scores on average 2.1 points higher on the final exam, holding the other variables constant
- For each additional Math SAT point, a student scores on average 0.11 points higher on the final exam, holding the other variables constant

- Students who used Tutor B scored on average 4.6 points lower on the final exam, compared to students using Tutor A
- Students who used Tutor C scored on average 1.1 points higher on the final exam, compared to students using Tutor A

- We can say that students who used Tutor C scored on average 1.10 - (-4.63) = 5.73 points higher than students who used Tutor B
- However, to say whether this difference is significant, one would need to rerun the regression with either Tutor B or C as the baseline
- Although 5.73 is large, since we do NOT have a test statistic and p-value we cannot make any claims about significance
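
The informal comparison above is a subtraction of two coefficients, and the same subtraction gives the point estimates one would see after changing the baseline. The sketch below shows only that arithmetic; the function name `rebaseline` is an assumption for illustration. It does not produce a standard error or p-value, which cannot be obtained from the two coefficients alone, and that is why the regression is rerun.

```python
def rebaseline(effects, new_baseline):
    """Re-express group effects relative to a different baseline group.

    `effects` maps each group to its coefficient relative to the current
    baseline, whose own effect is 0 by definition.
    """
    shift = effects[new_baseline]
    return {group: value - shift for group, value in effects.items()}


# Tutor coefficients reported above, with Tutor A as the baseline.
relative_to_a = {"A": 0.0, "B": -4.63, "C": 1.10}
relative_to_b = rebaseline(relative_to_a, "B")
for group, value in relative_to_b.items():
    print(f"Tutor {group} vs Tutor B: {value:+.2f}")
```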

- Suppose we have the following regression
- Exam Score = 2.7 + 3.21*gpa + 0.18*MathSAT + 1.3*time + 1.01*TutorB - 1.44*TutorC + 1.8*time*TutorB - 1.7*time*TutorC
- Assume that in Step 1 we reject the null hypothesis and that in Step 2 gpa, Math SAT, and the interactions are significant.
- Remember, since the interactions are significant, we are not concerned with the significance of time or tutor alone

- Tutor A
- Exam Score = 2.7 + 3.21*gpa + 0.18*MathSAT + 1.3*time

- Tutor B
- Exam Score = 2.7 + 3.21*gpa + 0.18*MathSAT + 1.3*time + 1.01*TutorB + 1.8*time*TutorB
- Exam Score = (2.7 +1.01) + 3.21*gpa + 0.18*MathSAT + (1.3 + 1.8)*time
- Exam Score = 3.71 + 3.21*gpa + 0.18*MathSAT + 3.1*time

- Tutor C
- Exam Score = 2.7 + 3.21*gpa + 0.18*MathSAT + 1.3*time - 1.44*TutorC - 1.7*time*TutorC
- Exam Score = (2.7 -1.44) + 3.21*gpa + 0.18*MathSAT + (1.3 - 1.7)*time
- Exam Score = 1.26 + 3.21*gpa + 0.18*MathSAT - 0.40*time

- For each additional point in GPA, a student’s exam score increases on average by 3.21, holding the other variables constant
- For each additional point in Math SAT, a student’s exam score increases on average by 0.18, holding the other variables constant
- With no time on the tutor (time = 0), students who use Tutor B score on average 1.01 points higher on the final exam than students using Tutor A
- With no time on the tutor (time = 0), students who use Tutor C score on average 1.44 points lower on the final exam than students using Tutor A

- Students using Tutor A
- For each additional minute on the tutor, students’ exam scores increase on average by 1.3

- Students using Tutor B
- For each additional minute on the tutor, students’ exam scores increase on average by 3.1

- Students using Tutor C
- For each additional minute on the tutor, students’ exam scores decrease on average by 0.40
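
The three per-tutor equations and time slopes above can be derived from the single hypothetical regression by switching the dummy variables on and off. The sketch below does that bookkeeping; the dictionary keys and function names are assumptions for illustration, and the coefficients are those of the hypothetical regression above.

```python
# Coefficients of the hypothetical regression above.
COEF = {
    "intercept": 2.7, "gpa": 3.21, "MathSAT": 0.18, "time": 1.3,
    "TutorB": 1.01, "TutorC": -1.44,
    "time:TutorB": 1.8, "time:TutorC": -1.7,
}


def tutor_equation(tutor):
    """Return (intercept, time slope) for tutor 'A', 'B', or 'C'."""
    if tutor == "A":  # baseline: both dummy variables are 0
        return COEF["intercept"], COEF["time"]
    return (COEF["intercept"] + COEF[f"Tutor{tutor}"],
            COEF["time"] + COEF[f"time:Tutor{tutor}"])


def predict(gpa, math_sat, time, tutor):
    """Predicted exam score from the tutor-specific intercept and time slope."""
    intercept, time_slope = tutor_equation(tutor)
    return intercept + COEF["gpa"] * gpa + COEF["MathSAT"] * math_sat + time_slope * time


for tutor in "ABC":
    intercept, time_slope = tutor_equation(tutor)
    print(f"Tutor {tutor}: intercept = {intercept:.2f}, time slope = {time_slope:+.2f}")
```

The gpa and MathSAT coefficients are shared by all three tutors; only the intercept and the time slope change with the tutor.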

- Analysis of Covariance
- At least one quantitative and one categorical explanatory variable
- In general, the main interest is the effects of the categorical variable and the quantitative variable is considered to be a control variable
- It is a blending of regression and ANOVA

- Can either run a linear regression with dummy variables or fit an ANCOVA model, in which case the output is similar to that of ANOVA models
- Will get the same results in either case!
- Different statistical packages make one or the other easier to run
- It is a matter of preference and interpretation