# Statistics Micro Mini Multiple Regression

Statistics Micro Mini Multiple Regression. January 5-9, 2008 Beth Ayers. Tuesday 1pm-4pm Session. Dummy Variables Multiple regression Using quantitative and categorical explanatory variables Interactions among explanatory variables Linear regression vs. ANCOVA Two article critiques.

## Statistics Micro Mini Multiple Regression

January 5-9, 2008

Beth Ayers

### Tuesday 1pm-4pm Session

• Dummy Variables

• Multiple regression

• Using quantitative and categorical explanatory variables

• Interactions among explanatory variables

• Linear regression vs. ANCOVA

• Two article critiques

### Dummy Variables

• Categorical explanatory variables can be used in a linear regression if they are coded as dummy variables

• For binary variables, the most frequently used codes are 0/1 and -1/+1

• For a nominal variable with k levels, create k-1 explanatory variables that are each 0/1

• Each subject can have a value of 1 for at most one of these k-1 variables; a subject at the baseline level has 0 for all of them
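
The k-1 coding described above can be sketched in a few lines of plain Python. This is a minimal illustration, not part of the original slides: the function name `dummy_code` and the five-student tutor list are hypothetical.

```python
def dummy_code(values, baseline):
    """Code a nominal variable with k levels as k-1 columns of 0/1 dummies.

    The baseline level gets no column of its own: its rows are all zeros.
    """
    # Keep the remaining levels in a stable, sorted order.
    levels = sorted(set(values) - {baseline})
    rows = [[1 if v == level else 0 for level in levels] for v in values]
    return levels, rows


# Hypothetical data: which tutor each of five students used.
tutors = ["A", "B", "C", "B", "A"]
levels, rows = dummy_code(tutors, baseline="A")

print(levels)  # ['B', 'C']
for tutor, row in zip(tutors, rows):
    print(tutor, row)
```

Each row sums to at most 1, and students at the baseline level (Tutor A here) are coded as all zeros.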

### Dummy Variables

• Suppose we’d like to use a categorical variable that indicates which tutor (A or B) the student used. Define:

### Significance testing

• To test if $X_2$ has an effect

• $H_0: \beta_2 = 0$

• $H_1: \beta_2 \neq 0$

• This is the usual t-test for a regression coefficient; we don’t need to do anything different for dummy variables

• If $\beta_2 = 0$, then there is no difference between the mean response of Tutor A and Tutor B

• If $\beta_2 \neq 0$, then $\beta_2$ is the difference between the mean response for Tutor A and Tutor B, holding $X_1$ constant
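
For reference, the usual t statistic for the coefficient on the dummy variable takes the standard form below. The notation is added here and does not come from the slides: $\widehat{\mathrm{SE}}$ is the estimated standard error reported by the regression software, $n$ is the number of subjects, and $p$ is the number of explanatory variables (here $p = 2$).

```latex
t = \frac{\hat{\beta}_2}{\widehat{\mathrm{SE}}(\hat{\beta}_2)},
\qquad \text{compared with a } t \text{ distribution with } n - p - 1 \text{ degrees of freedom}
```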

### Interpretation

• $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2$

• Can think of this as two equations

• When $X_2 = 0$

• $Y = \beta_0 + \beta_1 X_1$

• When $X_2 = 1$

• $Y = \beta_0 + \beta_1 X_1 + \beta_2 \cdot 1 = (\beta_0 + \beta_2) + \beta_1 X_1$

• Then $\beta_0 + \beta_2$ is the new intercept for the case where $X_2 = 1$
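
The two-equation view can be checked numerically. The sketch below uses made-up coefficients (they are assumptions chosen for illustration, not estimates from the slides) to show that the two groups share a slope and differ only by a constant shift in the intercept.

```python
def predict(x1, x2, b0, b1, b2):
    """Fitted value from Y = b0 + b1*X1 + b2*X2, with X2 a 0/1 dummy."""
    return b0 + b1 * x1 + b2 * x2


# Hypothetical coefficients, chosen only for illustration.
b0, b1, b2 = 10.0, 2.0, 3.0

for x1 in (0, 5, 10):
    y_base = predict(x1, 0, b0, b1, b2)   # group coded X2 = 0
    y_other = predict(x1, 1, b0, b1, b2)  # group coded X2 = 1
    # The gap equals b2 at every X1: same slope, shifted intercept.
    print(x1, y_base, y_other, y_other - y_base)
```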

### Dummy Variables

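• Suppose we have three tutors (A, B, C). Define:

• $X_2 = 1$ if the student used Tutor B, and 0 otherwise

• $X_3 = 1$ if the student used Tutor C, and 0 otherwise

• Tutor A is considered the baseline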

### Interpretation

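• $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3$

• Can think of this as three equations

• When $X_2 = 0$ and $X_3 = 0$

• $Y = \beta_0 + \beta_1 X_1$

• When $X_2 = 1$ and $X_3 = 0$

• $Y = \beta_0 + \beta_1 X_1 + \beta_2 \cdot 1 = (\beta_0 + \beta_2) + \beta_1 X_1$

• When $X_2 = 0$ and $X_3 = 1$

• $Y = \beta_0 + \beta_1 X_1 + \beta_3 \cdot 1 = (\beta_0 + \beta_3) + \beta_1 X_1$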

### Interpretation

• $\beta_2$ is then the difference between the mean response for Tutor A and Tutor B

• $\beta_3$ is then the difference between the mean response for Tutor A and Tutor C

• To formally compare Tutor B to Tutor C, one must rerun the regression using either Tutor B or C as the baseline

• To informally compare them, one can look at the difference between $\beta_2$ and $\beta_3$
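
The effect of changing the baseline can be illustrated on simulated data. The sketch below assumes NumPy is available; the data, the seed, and the helper `fit` are hypothetical and only serve to show that the B-versus-C coefficient from a refit with Tutor B as baseline equals the difference of the two coefficients from the Tutor A fit.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data (hypothetical, for illustration only).
n = 90
tutor = np.repeat(["A", "B", "C"], n // 3)
x1 = rng.normal(50, 10, size=n)
shift = 4.0 * (tutor == "B") - 2.0 * (tutor == "C")
y = 5.0 + 0.5 * x1 + shift + rng.normal(0, 1, size=n)


def fit(baseline):
    """Least-squares fit of Y on X1 plus 0/1 dummies for the non-baseline tutors."""
    others = [t for t in ("A", "B", "C") if t != baseline]
    columns = [np.ones(n), x1] + [(tutor == t).astype(float) for t in others]
    X = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return dict(zip(["intercept", "x1"] + others, coef))


fit_a = fit("A")  # coefficients 'B' and 'C' are differences from Tutor A
fit_b = fit("B")  # coefficient 'C' is the difference from Tutor B

# The informal comparison (C minus B from the first fit) matches the refit.
print(fit_a["C"] - fit_a["B"], fit_b["C"])
```

Only the point estimate can be read off the original fit this way. The refit is still what supplies a standard error, t statistic, and p-value for the B-versus-C comparison.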

### Significance testing

• Again, we can use the usual t-test for a regression coefficient; we don’t need to do anything different

### Example

• Want to see if there is a gender effect in predicting efficiency

• $\text{Efficiency} = \beta_0 + \beta_1 \cdot \text{WPM} + \beta_2 \cdot \text{Gender}$

• where Gender = 0 for males and 1 for females

### Example

• Step 1

• F-statistic: 791

• P-value = 0.0000

• So at least one of the two variables is important in predicting Efficiency

### Example

• Step 2

• Test words per minute

• T-statistic: -38.34

• P-value = 0.000

• Test Gender

• T-statistic: -11.32

• P-value = 0.000

• Both words per minute and gender are important in predicting efficiency

### Example

• Regression Equations

• Males

• Efficiency = 84.77 - 0.49*WPM

• Females

• Efficiency = 84.77 - 0.49*WPM - 3.14*1

• Efficiency = 81.63 - 0.49*WPM

### Interpretation of the Parameters

• For words per minute: holding gender constant, for each additional word per minute that a student can type, their predicted efficiency time drops by 0.49 minutes (they become more efficient by about half a minute)

• For Gender: holding words per minute constant, females are, on average, more efficient by 3.14 minutes
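
The fitted equations can be evaluated directly. The sketch below uses the coefficients reported on the slides; the typing speed of 40 words per minute is a hypothetical input, and the 0/1 gender coding is inferred from the male and female equations above.

```python
def predicted_efficiency(wpm, gender):
    """Fitted efficiency (in minutes) from the regression on the slides.

    gender: 0 for males, 1 for females (coding inferred from the equations).
    """
    return 84.77 - 0.49 * wpm - 3.14 * gender


wpm = 40  # hypothetical typing speed
male = predicted_efficiency(wpm, 0)
female = predicted_efficiency(wpm, 1)

print(round(male, 2), round(female, 2))  # 65.17 62.03
print(round(female - male, 2))           # -3.14
```

The gap between the two predictions is the gender coefficient, whatever value of WPM is plugged in.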

### Interaction

• An interaction occurs between two or more explanatory variables (not between an explanatory variable and the response variable)

• An interaction occurs when the effect of a change in the level or value of one explanatory variable depends on the level or value of another explanatory variable

• In regression we account for an interaction by adding a variable that is the product of two existing explanatory variables
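
Constructing the interaction variable is just a row-by-row multiplication. A minimal sketch in plain Python, with hypothetical data values:

```python
# Hypothetical data: X1 is quantitative, X2 is a 0/1 dummy.
x1 = [520, 610, 480, 700]
x2 = [0, 1, 1, 0]

# The interaction variable is the row-by-row product X1 * X2.
x1_x2 = [a * b for a, b in zip(x1, x2)]

# Rows of the design matrix: intercept, X1, X2, X1*X2.
design = [[1, a, b, ab] for a, b, ab in zip(x1, x2, x1_x2)]
for row in design:
    print(row)
```

With a 0/1 dummy, the product column is zero for the baseline group and equal to X1 for the other group.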

### Interpretation

• $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2$

• Assume that $X_2$ is a dummy variable and $X_1 X_2$ is the interaction

• Again, can think of this as two equations

• When $X_2 = 0$

• $Y = \beta_0 + \beta_1 X_1$

• When $X_2 = 1$

• $Y = \beta_0 + \beta_1 X_1 + \beta_2 \cdot 1 + \beta_3 X_1 \cdot 1 = (\beta_0 + \beta_2) + (\beta_1 + \beta_3) X_1$

• We can think of this as a new intercept and new slope for the case where $X_2 = 1$

### Interpretation

• $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2$

• $\beta_3$ is called the interaction effect

• $\beta_1$ and $\beta_2$ are called main effects

### Interpretation

• If $\beta_3$ is not significant, drop the interaction and rerun the regression

• Including the interaction, when it is not significant, can alter the interpretations of the other variables

• If $\beta_3$ is significant, we do not need to check whether $\beta_1$ and $\beta_2$ are significant; we always keep $X_1$ and $X_2$ in the regression

### Interaction Example

• Suppose we have two versions of a tutor and we want to know which helps students study for a math test

• In addition, we want to know if a student’s SAT math score affects their exam score

• We know which tutor each student used and we also have their SAT score and

### Interaction Example

• Sample output

### Interaction Example

• Step 1: are any of the variables significant in predicting exam score

• F-statistic: 6025

• P-value = 0.000

• Step 2: check interaction first

• T-statistic: 15.980

• P-value = 0.000

• Do not need to check main effects since the interaction is significant

### Interaction Example

• Regression equation

• Tutor A (tutor = 1)

• Exam score = (2.62 + 6.39) + (0.06 + 0.05)*MathSAT

• Exam score = 9.01 + 0.11*MathSAT

• Tutor B (tutor = 0)

• Exam score = 2.62 + 0.06*MathSAT

### Interpretation of Coefficients

• The intercept for Tutor A is 6.39 points higher than for Tutor B; because the slopes differ, the gap between the two tutors also grows by 0.05 points for each additional Math SAT point

• For students using Tutor A, for each point that their Math SAT score increases, their exam score increases by 0.11

• For students using Tutor B, for each point that their Math SAT score increases, their exam score increases by 0.06
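
To see how the interaction changes the comparison between tutors, the fitted equation can be evaluated at a few Math SAT scores. The coefficients below are the rounded values from the slides; the three SAT scores are hypothetical inputs chosen for illustration.

```python
def exam_score(math_sat, tutor):
    """Fitted exam score from the slides; tutor is 1 for Tutor A, 0 for Tutor B."""
    return 2.62 + 0.06 * math_sat + 6.39 * tutor + 0.05 * math_sat * tutor


for sat in (400, 600, 800):
    a = exam_score(sat, 1)
    b = exam_score(sat, 0)
    # The gap is 6.39 + 0.05 * sat, so it depends on the SAT score.
    print(sat, round(a, 2), round(b, 2), round(a - b, 2))
```

Because the slopes differ, no single number summarizes the difference between the tutors; the gap has to be stated at a particular Math SAT score.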

### Example

• Explanatory variables

• GPA (0-5 scale)

• Math SAT score

• Time on tutor (in hours)

• Tutor used (A, B, C)

• Response variable

• Exam score

### The Regression

• Think that time on tutor and the type of tutor may have an interaction

### Analysis

• Step 1

• F-stat = 769.5, p-value = 0.000

• Step 2

• Test the interactions first

• Test Time * Tutor B

• T-statistic: -0.727, P-value = 0.471

• Test Time * Tutor C

• T-statistic: -0.195, P-value = 0.847

### Next Steps

• Since neither interaction is significant, I would drop those two variables and rerun the regression

• Including the interaction, when it is not significant, can alter the interpretations of the other variables
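
The decision rule used in this step can be written down explicitly. The sketch below uses the two interaction p-values reported above; the 0.05 cutoff is an assumption, since the slides do not state a significance level.

```python
ALPHA = 0.05  # assumed significance level; the slides do not state one

# p-values for the two interaction terms, as reported above.
interaction_p_values = {
    "Time * Tutor B": 0.471,
    "Time * Tutor C": 0.847,
}

# Flag interaction terms whose p-value does not fall below ALPHA.
to_drop = [term for term, p in interaction_p_values.items() if p >= ALPHA]

if len(to_drop) == len(interaction_p_values):
    print("Drop all interaction terms and refit:", to_drop)
else:
    print("Keep the interaction terms and the main effects they involve.")
```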

### Analysis

• Step 1

• F-stat = 1111, p-value = 0.000

• Step 2

• Test GPA

• T-statistic: 10.28, P-value = 0.000

• Test Math SAT score

• T-statistic: 70.03, P-value = 0.000

• Test time on tutor

• T-statistic: -0.43, P-value = 0.672

• Test Tutor B

• T-statistic: -10.52, P-value = 0.000

• Test Tutor C

• T-statistic: 2.60, P-value = 0.0128

### Next step

• Time on tutor is not significant

• Drop time and rerun

### Analysis

• Step 1

• F-stat = 1414, p-value = 0.000

• Step 2

• Test GPA

• T-statistic: 10.51, P-value = 0.000

• Test Math SAT score

• T-statistic: 70.69, P-value = 0.000

• Test Tutor B

• T-statistic: -10.80, P-value = 0.000

• Test Tutor C

• T-statistic: 2.67, P-value = 0.011

### Interpretation

• For each additional GPA point, a student scores on average 2.1 points higher on the final exam

• For each additional Math SAT point, a student scores on average 0.11 points higher on the final exam

### Interpretation of Dummy Variables

• Students who used Tutor B scored on average 4.6 points lower on the final exam, compared to students using Tutor A

• Students who used Tutor C scored on average 1.1 points higher on the final exam, compared to students using Tutor A

### Interpretation of Dummy Variables

• We can say that students who used Tutor C scored on average 1.10-(-4.63) = 5.73 points higher than students who used Tutor B

• However, to say if it is a significant difference one would need to rerun the regression equation with either Tutor B or C as the baseline

• Although 5.73 is large, since we do NOT have a test statistic and p-value, we cannot make any claims about significance

### Example

• Suppose we have the following regression

• Exam Score = 2.7 + 3.21*gpa + 0.18*MathSAT + 1.3*time + 1.01*TutorB - 1.44*TutorC + 1.8*time*TutorB - 1.7*time*TutorC

• Assume that in Step 1 we reject the null and that in Step 2 gpa, Math SAT, and the interaction are significant.

• Remember, since the interaction is significant, we are not concerned with the significance of time or tutor alone

### Interpretation

• Tutor A

• Exam Score = 2.7 + 3.21*gpa + 0.18*MathSAT + 1.3*time

• Tutor B

• Exam Score = 2.7 + 3.21*gpa + 0.18*MathSAT + 1.3*time + 1.01*TutorB + 1.8*time*TutorB

• Exam Score = (2.7 +1.01) + 3.21*gpa + 0.18*MathSAT + (1.3 + 1.8)*time

• Exam Score = 3.71 + 3.21*gpa + 0.18*MathSAT + 3.1*time

• Tutor C

• Exam Score = 2.7 + 3.21*gpa + 0.18*MathSAT + 1.3*time - 1.44*TutorC - 1.7*time*TutorC

• Exam Score = (2.7 -1.44) + 3.21*gpa + 0.18*MathSAT + (1.3 - 1.7)*time

• Exam Score = 1.26 + 3.21*gpa + 0.18*MathSAT - 0.40*time
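
The per-tutor equations above come from adding each tutor's dummy coefficient to the intercept and its interaction coefficient to the time slope. A minimal sketch that reproduces the arithmetic, using the coefficients from the slides (the dictionary layout and the helper `tutor_equation` are illustrative assumptions):

```python
# Coefficients from the regression on the slides (Tutor A is the baseline).
coef = {
    "intercept": 2.7, "gpa": 3.21, "MathSAT": 0.18, "time": 1.3,
    "TutorB": 1.01, "TutorC": -1.44,
    "time*TutorB": 1.8, "time*TutorC": -1.7,
}


def tutor_equation(tutor):
    """Return (intercept, time slope) for tutor 'A', 'B', or 'C'."""
    intercept = coef["intercept"]
    slope = coef["time"]
    if tutor != "A":  # non-baseline tutors shift both intercept and slope
        intercept += coef[f"Tutor{tutor}"]
        slope += coef[f"time*Tutor{tutor}"]
    return round(intercept, 2), round(slope, 2)


for tutor in "ABC":
    print(tutor, tutor_equation(tutor))
# A (2.7, 1.3)
# B (3.71, 3.1)
# C (1.26, -0.4)
```

The GPA and Math SAT coefficients are the same for all three tutors, since neither variable is involved in an interaction.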

### Interpretation

• For each additional point in GPA, a student’s exam score increases by 3.21

• For each additional point in Math SAT, a student’s exam score increases by 0.18

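• With zero time on the tutor, students who use Tutor B score on average 1.01 points higher on the final exam than students using Tutor A; because of the interaction, this gap changes by 1.8 points for each additional unit of time

• With zero time on the tutor, students who use Tutor C score on average 1.44 points lower on the final exam than students using Tutor A; because of the interaction, this gap changes by -1.7 points for each additional unit of time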

### Interpretation

• Students using Tutor A

• For each additional minute on the tutor, students’ exam scores increase by 1.3

• Students using Tutor B

• For each additional minute on the tutor, students’ exam scores increase by 3.1

• Students using Tutor C

• For each additional minute on the tutor, students’ exam scores decrease by 0.40

### ANCOVA

• Analysis of Covariance

• At least one quantitative and one categorical explanatory variable

• In general, the main interest is the effects of the categorical variable and the quantitative variable is considered to be a control variable

• It is a blending of regression and ANOVA

### ANCOVA

• Can run the analysis either as a linear regression with dummy variables or as an ANCOVA model, in which case the output is similar to that of ANOVA models

• Will get the same results in either case!

• Different statistical packages make one or the other easier to run

• It is a matter of preference and interpretation