
Statistics Micro Mini Multiple Regression



Statistics Micro Mini Multiple Regression. January 5-9, 2008 Beth Ayers. Tuesday 1pm-4pm Session. Dummy Variables Multiple regression Using quantitative and categorical explanatory variables Interactions among explanatory variables Linear regression vs. ANCOVA Two article critiques.


Presentation Transcript



Statistics Micro Mini Multiple Regression

January 5-9, 2008

Beth Ayers



Tuesday 1pm-4pm Session

  • Dummy Variables

  • Multiple regression

    • Using quantitative and categorical explanatory variables

    • Interactions among explanatory variables

    • Linear regression vs. ANCOVA

  • Two article critiques



Dummy Variables

  • Categorical explanatory variables can be used in a linear regression if they are coded as dummy variables

  • For binary variables, the most frequently used codes are 0/1 and -1/+1

  • For a nominal variable with k levels, create k-1 explanatory variables that are each 0/1

    • each subject can have a value of one for at most one of the explanatory variables
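The k-1 coding rule can be sketched in a few lines of Python. This is only an illustration: the helper name `dummy_code` and the five-student tutor list are made up for the example and do not come from the slides.

```python
def dummy_code(values, baseline):
    """Return (levels, rows): one 0/1 column per non-baseline level."""
    levels = sorted(set(values) - {baseline})
    rows = [[1 if v == level else 0 for level in levels] for v in values]
    return levels, rows


# Hypothetical data: the tutor each of five students used (k = 3 levels).
tutors = ["A", "B", "C", "B", "A"]
levels, rows = dummy_code(tutors, baseline="A")

# Each student has a one in at most one column; the baseline has all zeros.
for tutor, row in zip(tutors, rows):
    print(tutor, dict(zip(levels, row)))
```

With three levels the function produces two columns, and the baseline level (here Tutor A) is the row of all zeros.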



Dummy Variables

  • Suppose we’d like to use a categorical variable that indicates which tutor (A or B) the student used. Define:



Significance testing

  • To test if X2 has an effect

    • H0: β2 = 0

    • H1: β2 ≠ 0

    • This is the usual t-test for a regression coefficient; we don’t need to do anything different for dummy variables

  • If β2 = 0, then there is no difference between the mean response of Tutor A and Tutor B

  • If β2 ≠ 0, then β2 is the difference between the mean response for Tutor A and Tutor B



Interpretation

  • Y = β0 + β1*X1 + β2*X2

  • Can think of this as two equations

    • When X2 = 0

      • Y = β0 + β1*X1

    • When X2 = 1

      • Y = β0 + β1*X1 + β2*1 = (β0 + β2) + β1*X1

  • Then β0 + β2 is the new intercept for the case where X2 = 1
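The two-equation reading can be checked numerically. The sketch below fits the model by ordinary least squares on made-up, noise-free data generated from Y = 10 + 2*X1 + 5*X2; the data and the small `ols` helper are assumptions for illustration, not part of the slides.

```python
def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    p = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda k: abs(a[k][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for k in range(p):
            if k != col:
                factor = a[k][col] / a[col][col]
                a[k] = [v - factor * w for v, w in zip(a[k], a[col])]
    return [a[i][p] / a[i][i] for i in range(p)]


# Made-up, noise-free data: Y = 10 + 2*X1 + 5*X2, with X2 a 0/1 dummy.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [0, 0, 0, 1, 1, 1]
y = [10 + 2 * u + 5 * v for u, v in zip(x1, x2)]

design = [[1, u, v] for u, v in zip(x1, x2)]  # columns: intercept, X1, X2
b0, b1, b2 = ols(design, y)

# Same slope for both groups; the intercept shifts by the dummy coefficient.
print(f"X2 = 0: Y = {b0:.2f} + {b1:.2f}*X1")
print(f"X2 = 1: Y = {b0 + b2:.2f} + {b1:.2f}*X1")
```

Because the data contain no noise, the fit should recover the generating coefficients, so the two printed lines are parallel and differ only in the intercept.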



Dummy Variables

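  • Suppose we have three tutors (A, B, C). Define:

    • X2 = 1 if the student used Tutor B, 0 otherwise

    • X3 = 1 if the student used Tutor C, 0 otherwise

  • Tutor A is considered the baseline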



Interpretation

  • Y = β0 + β1*X1 + β2*X2 + β3*X3

  • Can think of this as three equations

    • When X2 = 0 and X3 = 0

      • Y = β0 + β1*X1

    • When X2 = 1 and X3 = 0

      • Y = β0 + β1*X1 + β2*1 = (β0 + β2) + β1*X1

    • When X2 = 0 and X3 = 1

      • Y = β0 + β1*X1 + β3*1 = (β0 + β3) + β1*X1



Interpretation

  • β2 is then the difference between the mean response for Tutor A and Tutor B

  • β3 is then the difference between the mean response for Tutor A and Tutor C

  • To formally compare Tutor B to Tutor C, one must rerun the regression using either Tutor B or C as the baseline

    • To informally compare them, one can look at the difference between β2 and β3
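The informal comparison is plain arithmetic on the fitted coefficients. The sketch below shows what the point estimates would be after switching the baseline to Tutor B. The function name and the coefficient values are hypothetical. The rerun is still needed for the standard error, t-statistic and p-value of the B versus C comparison, which this arithmetic does not provide.

```python
def rebaseline_to_b(beta0, beta2, beta3):
    """Re-express a Tutor-A-baseline fit with Tutor B as the baseline.

    beta2 and beta3 are the Tutor B and Tutor C dummy coefficients.
    Returns (new intercept, Tutor A coefficient, Tutor C coefficient).
    """
    return beta0 + beta2, -beta2, beta3 - beta2


# Hypothetical coefficients from a fit with Tutor A as the baseline.
new_intercept, coef_a, coef_c = rebaseline_to_b(beta0=50.0, beta2=-4.0, beta3=1.5)
print(new_intercept, coef_a, coef_c)  # C versus B is beta3 - beta2
```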



Significance testing

  • Again, we can use the usual t-test for a regression coefficient; we don’t need to do anything different



Example

  • Want to see if there is a gender effect in predicting efficiency

  • Efficiency = β0 + β1*WPM + β2*Gender

    • where Gender = 0 for males and 1 for females



Example

  • Step 1

    • F-statistic: 791

    • P-value = 0.0000

  • So at least one of the two variables is important in predicting Efficiency



Example

  • Step 2

    • Test words per minute

      • T-statistic: -38.34

      • P-value = 0.000

    • Test Gender

      • T-statistic: -11.32

      • P-value = 0.000

  • Both words per minute and gender are important in predicting efficiency



Example

  • Regression Equations

    • Males

      • Efficiency = 84.77 - 0.49·WPM

    • Females

      • Efficiency = 84.77 - 0.49·WPM - 3.14·1

      • Efficiency = 81.63 - 0.49·WPM
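As a quick check of the two equations, the sketch below plugs the reported coefficients into a single prediction function. The function name and the chosen WPM values are illustrative only. The 0/1 gender coding is the one implied by the male and female equations.

```python
B0, B_WPM, B_GENDER = 84.77, -0.49, -3.14  # coefficients reported in the slides


def predict_efficiency(wpm, gender):
    """gender: 0 for males, 1 for females (coding implied by the two equations)."""
    return B0 + B_WPM * wpm + B_GENDER * gender


# The male-female gap is the same at every typing speed (no interaction term).
for wpm in (20, 40, 60):
    male = predict_efficiency(wpm, 0)
    female = predict_efficiency(wpm, 1)
    print(f"WPM={wpm}: male={male:.2f}, female={female:.2f}, gap={female - male:.2f}")
```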



Interpretation of the Parameters

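  • For words per minute: holding gender constant, for each additional word per minute that a student can type, their efficiency improves by about 0.49 minutes (the predicted value decreases by 0.49)

  • For Gender: holding words per minute constant, females are, on average, more efficient by 3.14 minutes (their predicted value is 3.14 lower than for males)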



Interaction

  • An interaction occurs between two or more explanatory variables (not between an explanatory variable and the response variable)

  • An interaction occurs when the effect of a change in the level or value of one explanatory variable depends on the level or value of another explanatory variable

  • In regression we account for an interaction by adding a variable that is the product of two existing explanatory variables



Interpretation

  • Y = β0 + β1*X1 + β2*X2 + β3*X1*X2

  • Assume that X2 is a dummy variable and X1*X2 is the interaction

  • Again, can think of this as two equations

    • When X2 = 0

      • Y = β0 + β1*X1

    • When X2 = 1

      • Y = β0 + β1*X1 + β2*1 + β3*X1*1

        = (β0 + β2) + (β1 + β3)*X1

  • We can think of this as a new intercept and new slope for the case where X2 = 1
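The new intercept and new slope can be read off mechanically. The sketch below does this for hypothetical coefficients; both the helper name and the numbers are made up for illustration.

```python
def line_for_group(beta0, beta1, beta2, beta3, x2):
    """Return (intercept, slope) of Y against X1 for a dummy value x2 (0 or 1)."""
    return beta0 + beta2 * x2, beta1 + beta3 * x2


# Hypothetical coefficients: beta0 = 1, beta1 = 2, beta2 = 3, beta3 = 0.5.
for x2 in (0, 1):
    intercept, slope = line_for_group(1.0, 2.0, 3.0, 0.5, x2)
    print(f"X2 = {x2}: Y = {intercept} + {slope}*X1")
```

Unlike the model without an interaction, the two lines are no longer parallel: the dummy shifts the slope as well as the intercept.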



Interpretation

  • Y = β0 + β1*X1 + β2*X2 + β3*X1*X2

  • β3 is called the interaction effect

  • β1 and β2 are called main effects



Interpretation

  • If β3 is not significant, drop the interaction and rerun the regression

    • Including the interaction, when it is not significant, can alter the interpretations of the other variables

  • If β3 is significant, we do not need to check whether β1 and β2 are significant. We will always keep X1 and X2 in the regression



Interaction Example

  • Suppose we have two versions of a tutor and we want to know which helps students study for a math test

  • In addition, we want to know if a student’s SAT math score affects their exam score

  • We know which tutor each student used and we also have their SAT score and



Interaction Example - EDA



Interaction Example

  • Sample output



Interaction Example

  • Step 1: are any of the variables significant in predicting exam score

    • F-statistic: 6025

    • P-value = 0.000

  • Step 2: check interaction first

    • T-statistic: 15.980

    • P-value = 0.000

  • We do not need to check the main effects since the interaction is significant



Interaction Example

  • Regression equation

  • Tutor A (tutor = 1)

    • Exam score = (2.62 + 6.39) + (0.06 + 0.05) · MathSAT

    • Exam score = 9.01 + 0.11 · MathSAT

  • Tutor B (tutor = 0)

    • Exam score = 2.62 + 0.06 · MathSAT



Interpretation of Coefficients

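  • For students with the same Math SAT score, students using Tutor A score on average 6.39 + 0.05 · MathSAT points higher than students using Tutor B; 6.39 is the difference in intercepts (the gap at a Math SAT score of 0), so the gap is not constant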

  • For students using Tutor A, for each point that their Math SAT score increases, their exam score increases by 0.11

  • For students using Tutor B, for each point that their Math SAT score increases, their exam score increases by 0.06
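Because the slopes differ, the gap between the two tutors depends on the Math SAT score. The sketch below evaluates that gap from the rounded coefficients in the regression equation (6.39 for the tutor dummy, 0.05 for the interaction). The function name and the SAT values are chosen for illustration only.

```python
B_TUTOR, B_INTERACTION = 6.39, 0.05  # rounded coefficients from the slides


def tutor_gap(math_sat):
    """Predicted Tutor A minus Tutor B exam score at a given Math SAT score."""
    return B_TUTOR + B_INTERACTION * math_sat


for sat in (400, 600, 800):
    print(f"MathSAT={sat}: Tutor A - Tutor B = {tutor_gap(sat):.2f}")
```

Since the published coefficients are rounded to two decimals, these gaps are approximate.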





Example

  • Explanatory variables

    • GPA (0-5 scale)

    • Math SAT score

    • Time on tutor (in hours)

    • Tutor used (A, B, C)

  • Response variable

    • Exam score



Exploratory Analysis



Plots



The Regression

  • We think that time on tutor and the type of tutor may have an interaction



Analysis

  • Step 1

    • F-stat = 769.5, p-value = 0.000

  • Step 2

    • Test the interactions first

    • Test Time * Tutor B

      • T-statistic: -0.727, P-value = 0.471

    • Test Time * Tutor C

      • T-statistic: -0.195, P-value = 0.847



Next Steps

  • Since neither interaction is significant, I would drop those two variables and rerun the regression

  • Including the interaction, when it is not significant, can alter the interpretations of the other variables
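The drop-and-rerun decision can be written as a small rule. The sketch below applies it to the two interaction p-values reported above. The 0.05 cutoff is an assumption, since the slides do not state a significance level, and the function name is made up.

```python
ALPHA = 0.05  # assumed significance level; the slides do not state one


def interactions_to_drop(p_values, alpha=ALPHA):
    """Return the interaction terms whose p-value is not below alpha."""
    return [term for term, p in p_values.items() if p >= alpha]


# P-values reported for the two interaction terms.
interaction_p = {"Time*TutorB": 0.471, "Time*TutorC": 0.847}
to_drop = interactions_to_drop(interaction_p)
print(to_drop)  # terms to remove before rerunning the regression
```

Both terms fall above the cutoff, which matches the decision to drop them and rerun the regression with main effects only.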



Updated regression



Analysis

  • Step 1

    • F-stat = 1111, p-value = 0.000

  • Step 2

    • Test gpa

      • T-statistic: 10.28, P-value = 0.000

    • Test Math SAT score

      • T-statistic: 70.03, P-value = 0.000

    • Test time on tutor

      • T-statistic: -0.43, P-value = 0.672

    • Test Tutor B

      • T-statistic: -10.52, P-value = 0.000

    • Test Tutor C

      • T-statistic: 2.60, P-value = 0.0128



Next step

  • Time on tutor is not significant

    • Drop time and rerun



Analysis

  • Step 1

    • F-stat = 1414, p-value = 0.000

  • Step 2

    • Test gpa

      • T-statistic: 10.51, P-value = 0.000

    • Test Math SAT score

      • T-statistic: 70.69, P-value = 0.000

    • Test Tutor B

      • T-statistic: -10.80, P-value = 0.000

    • Test Tutor C

      • T-statistic: 2.67, P-value = 0.011



Interpretation

  • For each additional GPA point, a student scores on average 2.1 points higher on the final exam

  • For each additional Math SAT point, a student scores on average 0.11 points higher on the final exam



Interpretation of Dummy Variables

  • Students who used Tutor B scored on average 4.6 points lower on the final exam, compared to students using tutor A

  • Students who used Tutor C scored on average 1.1 points higher on the final exam, compared to students using tutor A



Interpretation of Dummy Variables

  • We can say that students who used Tutor C scored on average 1.10-(-4.63) = 5.73 points higher than students who used Tutor B

  • However, to say if it is a significant difference one would need to rerun the regression equation with either Tutor B or C as the baseline

  • Although 5.73 is large, since we do NOT have a test statistic and p-value we cannot make any claims about significance



Check Assumptions



Example

  • Suppose we have the following regression

  • Exam Score = 2.7 + 3.21*gpa + 0.18*MathSAT + 1.3*time + 1.01*TutorB - 1.44*TutorC + 1.8*time*TutorB - 1.7*time*TutorC

  • Assume that in Step 1 we reject the null and that in Step 2 gpa, Math SAT, and the interactions are significant.

  • Remember, since the interaction is significant, we are not concerned with the significance of time or tutor alone



Interpretation

  • Tutor A

    • Exam Score = 2.7 + 3.21*gpa + 0.18*MathSAT + 1.3*time

  • Tutor B

    • Exam Score = 2.7 + 3.21*gpa + 0.18*MathSAT + 1.3*time + 1.01*TutorB + 1.8*time*TutorB

    • Exam Score = (2.7 +1.01) + 3.21*gpa + 0.18*MathSAT + (1.3 + 1.8)*time

    • Exam Score = 3.71 + 3.21*gpa + 0.18*MathSAT + 3.1*time

  • Tutor C

    • Exam Score = 2.7 + 3.21*gpa + 0.18*MathSAT + 1.3*time - 1.44*TutorC - 1.7*time*TutorC

    • Exam Score = (2.7 -1.44) + 3.21*gpa + 0.18*MathSAT + (1.3 - 1.7)*time

    • Exam Score = 1.26 + 3.21*gpa + 0.18*MathSAT - 0.40*time
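The three per-tutor equations follow from the single fitted equation by adding the relevant dummy and interaction coefficients. The sketch below reproduces that bookkeeping with the coefficients from the example. The dictionary layout and function names are illustrative choices, not part of the slides.

```python
# Coefficients from the example regression; Tutor A is the baseline.
COEF = {"intercept": 2.7, "gpa": 3.21, "MathSAT": 0.18, "time": 1.3,
        "TutorB": 1.01, "TutorC": -1.44,
        "time*TutorB": 1.8, "time*TutorC": -1.7}


def tutor_equation(tutor):
    """Return (intercept, time slope) for tutor 'A', 'B', or 'C'."""
    if tutor == "A":
        return COEF["intercept"], COEF["time"]
    return (COEF["intercept"] + COEF[f"Tutor{tutor}"],
            COEF["time"] + COEF[f"time*Tutor{tutor}"])


def predict(gpa, math_sat, time, tutor):
    """Predicted exam score for one student."""
    intercept, time_slope = tutor_equation(tutor)
    return (intercept + COEF["gpa"] * gpa
            + COEF["MathSAT"] * math_sat + time_slope * time)


for tutor in "ABC":
    intercept, time_slope = tutor_equation(tutor)
    print(f"Tutor {tutor}: intercept={intercept:.2f}, time slope={time_slope:.2f}")
```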



Interpretation

  • For each additional point in GPA, a student’s exam score increases by 3.21

  • For each additional point in Math SAT, a student’s exam score increases by 0.18

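  • At zero time on the tutor, students who use Tutor B score on average 1.01 points higher on the final exam than students using Tutor A; because of the interaction, this gap changes with time on the tutor

  • At zero time on the tutor, students who use Tutor C score on average 1.44 points lower on the final exam than students using Tutor A; again, this gap changes with time on the tutor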



Interpretation

  • Students using Tutor A

    • For each additional hour on the tutor, students’ exam scores increase by 1.3

  • Students using Tutor B

    • For each additional hour on the tutor, students’ exam scores increase by 3.1

  • Students using Tutor C

    • For each additional hour on the tutor, students’ exam scores decrease by 0.40



ANCOVA

  • Analysis of Covariance

    • At least one quantitative and one categorical explanatory variable

    • In general, the main interest is the effects of the categorical variable and the quantitative variable is considered to be a control variable

    • It is a blending of regression and ANOVA



ANCOVA

  • We can either run a linear regression with dummy variables or fit an ANCOVA model, in which case the output is similar to that of ANOVA models

  • Will get the same results in either case!

  • Different statistical packages make one or the other easier to run

  • It is a matter of preference and interpretation
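The claim that both routes give the same results can be illustrated on a small made-up data set. The sketch below fits a regression with a dummy variable, then computes the same two quantities with ANCOVA-style arithmetic: a pooled within-group slope and a covariate-adjusted difference in group means. The data, the `ols` helper and the variable names are assumptions for illustration, and the sketch covers only the common-slope model with two groups.

```python
def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    p = len(rows[0])
    a = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda k: abs(a[k][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for k in range(p):
            if k != col:
                factor = a[k][col] / a[col][col]
                a[k] = [v - factor * w for v, w in zip(a[k], a[col])]
    return [a[i][p] / a[i][i] for i in range(p)]


def mean(values):
    return sum(values) / len(values)


# Made-up data: covariate x and response y for two groups (0 = Tutor A, 1 = Tutor B).
groups = {
    0: ([1, 2, 3, 4], [3.1, 4.9, 7.2, 8.8]),
    1: ([2, 3, 4, 5], [8.0, 10.1, 11.9, 14.2]),
}

# Route 1: linear regression with a dummy variable for the group.
design = [[1, x, g] for g, (xs, _) in groups.items() for x in xs]
y = [yi for _, ys in groups.values() for yi in ys]
b0, b_x, b_group = ols(design, y)

# Route 2: ANCOVA-style arithmetic with a pooled within-group slope.
sxy = sum((x - mean(xs)) * (yi - mean(ys))
          for xs, ys in groups.values() for x, yi in zip(xs, ys))
sxx = sum((x - mean(xs)) ** 2 for xs, _ in groups.values() for x in xs)
pooled_slope = sxy / sxx
(xa, ya), (xb, yb) = groups[0], groups[1]
adjusted_diff = (mean(yb) - mean(ya)) - pooled_slope * (mean(xb) - mean(xa))

print(f"regression: slope={b_x:.4f}, group effect={b_group:.4f}")
print(f"ANCOVA:     slope={pooled_slope:.4f}, adjusted difference={adjusted_diff:.4f}")
```

The two printed lines should agree up to floating-point error, which is the sense in which the choice between the two is a matter of preference and presentation.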

