Statistics Micro Mini Multiple Regression

January 5-9, 2008

Beth Ayers


Tuesday 1pm-4pm Session

  • Dummy Variables

  • Multiple regression

    • Using quantitative and categorical explanatory variables

    • Interactions among explanatory variables

    • Linear regression vs. ANCOVA

  • Two article critiques


Dummy Variables

  • Categorical explanatory variables can be used in a linear regression if they are coded as dummy variables

  • For binary variables, the most frequently used codes are 0/1 and -1/+1

  • For a nominal variable with k levels, create k-1 explanatory variables that are each 0/1

    • each subject can have a value of one for at most one of the explanatory variables
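The k-1 coding rule above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the helper name `dummy_code` and the sample tutor labels are hypothetical.

```python
def dummy_code(values, baseline):
    """Return k-1 columns of 0/1 indicators for a nominal variable.

    `baseline` is the level coded as all zeros; every other level
    gets its own 0/1 column.
    """
    levels = sorted(set(values) - {baseline})
    return {level: [1 if v == level else 0 for v in values] for level in levels}


tutors = ["A", "B", "C", "B", "A"]  # hypothetical data, k = 3 levels
dummies = dummy_code(tutors, baseline="A")
print(dummies)

# Each subject has a 1 in at most one column; baseline subjects have none.
row_sums = [sum(col[i] for col in dummies.values()) for i in range(len(tutors))]
print(row_sums)
```

With three levels this produces two columns, and the row sums confirm that no subject is flagged in more than one of them.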


Dummy Variables

  • Suppose we’d like to use a categorical variable that indicates which tutor (A or B) the student used. Define a 0/1 dummy variable X2 that records which tutor the student used


Significance testing

  • To test if X2 has an effect

    • H0: β2 = 0

    • H1: β2 ≠ 0

    • This is the usual t-test for a regression coefficient; we don’t need to do anything different for dummy variables

  • If β2 = 0, then there is no difference between the mean response of Tutor A and B

  • If β2 ≠ 0, then β2 is the difference between the mean response for Tutor A and Tutor B


Interpretation

  • Y = β0 + β1*X1 + β2*X2

  • Can think of this as two equations

    • When X2 = 0

      • Y = β0 + β1*X1

    • When X2 = 1

      • Y = β0 + β1*X1 + β2*1 = (β0 + β2) + β1*X1

  • Then β0 + β2 is the new intercept for the case where X2 = 1
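A small sketch of the two-equation view: with a 0/1 dummy and no interaction, the two groups share a slope and differ only in intercept. The coefficient values below are hypothetical and chosen only for illustration; `b0`, `b1`, `b2` stand for the slide's intercept, the coefficient on X1, and the coefficient on X2.

```python
def predict(b0, b1, b2, x1, x2):
    """Linear predictor Y = b0 + b1*X1 + b2*X2, where X2 is a 0/1 dummy."""
    return b0 + b1 * x1 + b2 * x2


b0, b1, b2 = 10.0, 2.0, 3.0  # hypothetical coefficients

# The gap between the two groups is the same at every value of X1.
gaps = [predict(b0, b1, b2, x1, 1) - predict(b0, b1, b2, x1, 0)
        for x1 in (0.0, 1.0, 5.0)]
print(gaps)

# At X1 = 0 the predictions are the two intercepts: b0 and b0 + b2.
intercept_x2_0 = predict(b0, b1, b2, 0.0, 0)
intercept_x2_1 = predict(b0, b1, b2, 0.0, 1)
print(intercept_x2_0, intercept_x2_1)
```

The constant gap is what makes the two fitted lines parallel.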


Dummy Variables

  • Suppose we have three tutors (A, B, C). Define:

    • X2 = 1 if the student used Tutor B, and 0 otherwise

    • X3 = 1 if the student used Tutor C, and 0 otherwise

  • Tutor A is considered the baseline


Interpretation

  • Y = β0 + β1*X1 + β2*X2 + β3*X3

  • Can think of this as three equations

    • When X2 = 0 and X3 = 0

      • Y = β0 + β1*X1

    • When X2 = 1 and X3 = 0

      • Y = β0 + β1*X1 + β2*1 = (β0 + β2) + β1*X1

    • When X2 = 0 and X3 = 1

      • Y = β0 + β1*X1 + β3*1 = (β0 + β3) + β1*X1
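The three equations can be generated mechanically from the dummy coding. The sketch below assumes X2 flags Tutor B and X3 flags Tutor C with Tutor A as the baseline (the coding implied by the interpretation slides that follow); the coefficient values are hypothetical.

```python
def tutor_dummies(tutor):
    """Map a tutor label to (X2, X3), with Tutor A as the baseline."""
    return (1 if tutor == "B" else 0, 1 if tutor == "C" else 0)


def intercept_for(tutor, b0, b2, b3):
    """Intercept of the regression line for one tutor: b0 + b2*X2 + b3*X3."""
    x2, x3 = tutor_dummies(tutor)
    return b0 + b2 * x2 + b3 * x3


b0, b2, b3 = 50.0, -4.0, 1.5  # hypothetical coefficients
intercepts = {t: intercept_for(t, b0, b2, b3) for t in "ABC"}
print(intercepts)
```

All three lines share the slope on X1; only the intercept moves.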


Interpretation

  • β2 is then the difference between the mean response for Tutor A and Tutor B

  • β3 is then the difference between the mean response for Tutor A and Tutor C

  • To formally compare Tutor B to Tutor C, one must rerun the regression using either Tutor B or C as the baseline

    • To informally compare them, one can look at the difference between β2 and β3


Significance testing

  • Again, we can use the usual t-test for a regression coefficient; we don’t need to do anything different


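Example

  • Want to see if there is a gender effect in predicting efficiency

  • Efficiency = β0 + β1*WPM + β2*Gender

    • where WPM is the student’s typing speed in words per minute and Gender is a dummy variable coded 0 for males and 1 for females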


Example

  • Step 1

    • F-statistic: 791

    • P-value = 0.0000

  • So at least one of the two variables is important in predicting Efficiency
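Step 1 is the overall F-test of the null hypothesis that all slope coefficients are zero. As a rough sketch of where such a statistic comes from, the function below computes it from the regression and error sums of squares. The slides report only the resulting F-statistic, so the sums of squares and sample size used here are made-up values, and the function name is hypothetical.

```python
def overall_f(ss_regression, ss_error, n, p):
    """Overall F-statistic for a regression with p explanatory variables
    fitted to n subjects: (SSR / p) / (SSE / (n - p - 1))."""
    ms_regression = ss_regression / p
    ms_error = ss_error / (n - p - 1)
    return ms_regression / ms_error


# Made-up inputs for illustration only.
f_stat = overall_f(ss_regression=300.0, ss_error=100.0, n=23, p=2)
print(f_stat)
```

A large F relative to the F distribution with p and n - p - 1 degrees of freedom gives a small p-value, which is what leads to the conclusion above.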


Example

  • Step 2

    • Test words per minute

      • T-statistic: -38.34

      • P-value = 0.000

    • Test Gender

      • T-statistic: -11.32

      • P-value = 0.000

  • Both words per minute and gender are important in predicting efficiency


Example

  • Regression Equations

    • Males

      • Efficiency = 84.77 – 0.49*WPM

    • Females

      • Efficiency = 84.77 – 0.49*WPM – 3.14*1

      • Efficiency = 81.63 – 0.49*WPM
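The two equations can be checked with a short sketch that plugs the fitted coefficients into the model. The coding Gender = 0 for males and 1 for females is the one implied by the equations above; the typing speed of 40 words per minute is a hypothetical input.

```python
B0, B_WPM, B_GENDER = 84.77, -0.49, -3.14  # fitted coefficients from the slide


def predicted_efficiency(wpm, gender):
    """Predicted efficiency; gender is 0 for males and 1 for females."""
    return B0 + B_WPM * wpm + B_GENDER * gender


# At WPM = 0 the prediction for females is the female intercept.
female_intercept = predicted_efficiency(0, 1)
print(round(female_intercept, 2))

# The female-minus-male gap is the same at any typing speed.
gap_at_40 = predicted_efficiency(40, 1) - predicted_efficiency(40, 0)
print(round(gap_at_40, 2))
```

Because there is no interaction term, the gap between the two lines equals the Gender coefficient at every value of WPM.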


Interpretation of the Parameters

  • For words per minute: holding gender constant, for each additional word per minute that a student can type, their predicted efficiency value decreases by 0.49 minutes

  • For Gender: holding words per minute constant, females have, on average, an efficiency value 3.14 minutes lower than males. If efficiency is measured as a time in minutes, as the units suggest, a lower value means more efficient


Interaction

  • An interaction occurs between two or more explanatory variables (not between an explanatory variable and the response variable)

  • An interaction occurs when the effect of a change in the level or value of one explanatory variable depends on the level or value of another explanatory variable

  • In regression we account for an interaction by adding a variable that is the product of two existing explanatory variables


Interpretation

  • Y = β0 + β1*X1 + β2*X2 + β3*X1*X2

  • Assume that X2 is a dummy variable and X1*X2 is the interaction

  • Again, can think of this as two equations

    • When X2 = 0

      • Y = β0 + β1*X1

    • When X2 = 1

      • Y = β0 + β1*X1 + β2*1 + β3*X1*1

        = (β0 + β2) + (β1 + β3)*X1

  • We can think of this as a new intercept and new slope for the case where X2 = 1
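To make the product-variable idea concrete, the sketch below builds the interaction column as X1*X2, fits the model by ordinary least squares, and reads off the intercept and slope for each group. Everything here is an assumption for illustration: the data are synthetic and noise-free, the "true" coefficients are made up, and the tiny solver stands in for whatever statistical package would normally be used.

```python
def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination
    with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Synthetic, noise-free data generated from made-up coefficients.
true_b = [1.0, 2.0, 3.0, 0.5]  # intercept, X1, X2, X1*X2
data = [(x1, x2) for x1 in (0.0, 1.0, 2.0, 3.0) for x2 in (0, 1)]
rows = [[1.0, x1, x2, x1 * x2] for x1, x2 in data]  # last column is the product
y = [sum(b * v for b, v in zip(true_b, r)) for r in rows]

b0, b1, b2, b3 = ols(rows, y)
print("X2 = 0: intercept", round(b0, 6), "slope", round(b1, 6))
print("X2 = 1: intercept", round(b0 + b2, 6), "slope", round(b1 + b3, 6))
```

Up to floating-point rounding, the fit recovers the made-up coefficients, so the X2 = 1 group gets both a shifted intercept and a different slope.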


Interpretation

  • Y = β0 + β1*X1 + β2*X2 + β3*X1*X2

  • β3 is called the interaction effect

  • β1 and β2 are called main effects


Interpretation

  • If β3 is not significant, drop the interaction and rerun the regression

    • Including the interaction, when it is not significant, can alter the interpretations of the other variables

  • If β3 is significant, we do not need to check whether β1 and β2 are significant. We will always keep X1 and X2 in the regression


Interaction Example

  • Suppose we have two versions of a tutor and we want to know which helps students study for a math test

  • In addition, we want to know if a student’s SAT math score affects their exam score

  • We know which tutor each student used, and we also have each student’s Math SAT score and exam score


Interaction Example - EDA


Interaction Example

  • Sample output


Interaction Example

  • Step 1: are any of the variables significant in predicting exam score?

    • F-statistic: 6025

    • P-value = 0.000

  • Step 2: check interaction first

    • T-statistic: 15.980

    • P-value = 0.000

  • Do not need to check main effects since the interaction is significant


Interaction Example

  • Regression equation

  • Tutor A (tutor = 1)

    • Exam score = (2.62 + 6.39) + (0.06 + 0.05)*MathSAT

    • Exam score = 9.01 + 0.11*MathSAT

  • Tutor B (tutor = 0)

    • Exam score = 2.62 + 0.06*MathSAT
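As a quick check of the arithmetic, the sketch below derives each tutor's intercept and slope from the four fitted coefficients in the equations above, and also computes the predicted gap between tutors at a given Math SAT score. The function names and the score of 500 are hypothetical.

```python
# Fitted coefficients read off the regression equation on the slide.
B0, B_TUTOR, B_SAT, B_INTERACTION = 2.62, 6.39, 0.06, 0.05


def line_for(tutor):
    """Return (intercept, slope) for tutor = 1 (Tutor A) or 0 (Tutor B)."""
    return (B0 + B_TUTOR * tutor, B_SAT + B_INTERACTION * tutor)


def gap(math_sat):
    """Predicted exam-score gap, Tutor A minus Tutor B, at a Math SAT score."""
    return B_TUTOR + B_INTERACTION * math_sat


tutor_a = line_for(1)
tutor_b = line_for(0)
print("Tutor A:", [round(v, 2) for v in tutor_a])
print("Tutor B:", [round(v, 2) for v in tutor_b])
print("Gap at Math SAT 500:", round(gap(500), 2))
```

Because the slopes differ, the gap between the two tutors is not a single number: it depends on the Math SAT score at which the lines are compared.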


Interpretation of Coefficients

  • At a Math SAT score of 0, students using Tutor A are predicted to score 6.39 points higher than students using Tutor B. Because the slopes differ, this gap is not constant: it grows by 0.05 points for each additional Math SAT point

  • For students using Tutor A, for each point that their Math SAT score increases, their exam score increases by 0.11 on average

  • For students using Tutor B, for each point that their Math SAT score increases, their exam score increases by 0.06 on average


Example

  • Explanatory variables

    • GPA (0-5 scale)

    • Math SAT score

    • Time on tutor (in hours)

    • Tutor used (A, B, C)

  • Response variable

    • Exam score


Exploratory Analysis


Plots


The Regression

  • We think that time on tutor and the type of tutor may have an interaction


Analysis

  • Step 1

    • F-stat = 769.5, p-value = 0.000

  • Step 2

    • Test the interactions first

    • Test Time * Tutor B

      • T-statistic: -0.727, P-value = 0.471

    • Test Time * Tutor C

      • T-statistic: -0.195, P-value = 0.847


Next Steps

  • Since neither interaction is significant, I would drop those two variables and rerun the regression

  • Including the interaction, when it is not significant, can alter the interpretations of the other variables
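The decision at this step can be written as a small rule. This is only a sketch of the workflow described in the slides: the 0.05 significance level is an assumption (the slides do not state one), the function name is hypothetical, and treating the two Time*Tutor dummies as a group that is dropped together is an interpretation of "drop those two variables".

```python
ALPHA = 0.05  # assumed significance level; not stated in the slides


def terms_to_drop(p_values, interaction_terms, alpha=ALPHA):
    """Check the interactions first, as in Step 2.

    If every interaction term is non-significant, return all of them
    (drop them and rerun the regression). Otherwise return an empty
    list: the interactions and their main effects stay in the model.
    """
    if all(p_values[t] >= alpha for t in interaction_terms):
        return list(interaction_terms)
    return []


# P-values for the two interaction terms, as reported on the Analysis slide.
p_values = {"Time*TutorB": 0.471, "Time*TutorC": 0.847}
drop = terms_to_drop(p_values, ["Time*TutorB", "Time*TutorC"])
print(drop)
```

After dropping the listed terms, the regression is refit and the remaining coefficients are tested again.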


Updated regression


Analysis

  • Step 1

    • F-stat = 1111, p-value = 0.000

  • Step 2

    • Test gpa

      • T-statistic: 10.28, P-value = 0.000

    • Test Math SAT score

      • T-statistic: 70.03, P-value = 0.000

    • Test time on tutor

      • T-statistic: -0.43, P-value = 0.672

    • Test Tutor B

      • T-statistic: -10.52, P-value = 0.000

    • Test Tutor C

      • T-statistic: 2.60, P-value = 0.0128


Next step

  • Time on tutor is not significant

    • Drop time and rerun


Analysis

  • Step 1

    • F-stat = 1414, p-value = 0.000

  • Step 2

    • Test gpa

      • T-statistic: 10.51, P-value = 0.000

    • Test Math SAT score

      • T-statistic: 70.69, P-value = 0.000

    • Test Tutor B

      • T-statistic: -10.80, P-value = 0.000

    • Test Tutor C

      • T-statistic: 2.67, P-value = 0.011


Interpretation

  • For each additional GPA point, a student scores on average 2.1 points higher on the final exam

  • For each additional Math SAT point, a student scores on average 0.11 points higher on the final exam


Interpretation of Dummy Variables

  • Students who used Tutor B scored on average 4.6 points lower on the final exam, compared to students using tutor A

  • Students who used Tutor C scored on average 1.1 points higher on the final exam, compared to students using tutor A


Interpretation of Dummy Variables

  • We can say that students who used Tutor C scored on average 1.10-(-4.63) = 5.73 points higher than students who used Tutor B

  • However, to say if it is a significant difference one would need to rerun the regression equation with either Tutor B or C as the baseline

  • Although 5.73 is large, since we do NOT have a test statistic and p-value we cannot make any claims about significance
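The informal comparison is plain arithmetic on the two dummy coefficients, and the same arithmetic shows what the coefficients would become if Tutor B were the baseline instead. The sketch below uses the two coefficient values quoted above; the dictionary names are hypothetical. It reproduces only the point estimates: the standard error, test statistic, and p-value for the B versus C comparison still have to come from the rerun regression.

```python
# Dummy coefficients with Tutor A as the baseline, as quoted above.
coef = {"TutorB": -4.63, "TutorC": 1.10}

# Informal comparison: Tutor C minus Tutor B.
c_minus_b = coef["TutorC"] - coef["TutorB"]
print(round(c_minus_b, 2))

# Point estimates the dummies would take with Tutor B as the baseline.
rebased = {
    "TutorA": -coef["TutorB"],
    "TutorC": coef["TutorC"] - coef["TutorB"],
}
print({name: round(value, 2) for name, value in rebased.items()})
```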


Check Assumptions


Example

  • Suppose we have the following regression

  • Exam Score = 2.7 + 3.21*gpa + 0.18*MathSAT + 1.3*time + 1.01*TutorB - 1.44*TutorC + 1.8*time*TutorB - 1.7*time*TutorC

  • Assume that in Step 1 we reject the null and that in Step 2 gpa, Math SAT, and the interaction are significant.

  • Remember, since the interaction is significant, we are not concerned with the significance of time or tutor alone


Interpretation

  • Tutor A

    • Exam Score = 2.7 + 3.21*gpa + 0.18*MathSAT + 1.3*time

  • Tutor B

    • Exam Score = 2.7 + 3.21*gpa + 0.18*MathSAT + 1.3*time + 1.01*TutorB + 1.8*time*TutorB

    • Exam Score = (2.7 +1.01) + 3.21*gpa + 0.18*MathSAT + (1.3 + 1.8)*time

    • Exam Score = 3.71 + 3.21*gpa + 0.18*MathSAT + 3.1*time

  • Tutor C

    • Exam Score = 2.7 + 3.21*gpa + 0.18*MathSAT + 1.3*time - 1.44*TutorC - 1.7*time*TutorC

    • Exam Score = (2.7 -1.44) + 3.21*gpa + 0.18*MathSAT + (1.3 - 1.7)*time

    • Exam Score = 1.26 + 3.21*gpa + 0.18*MathSAT - 0.40*time
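The per-tutor equations above can be reproduced by collecting the terms that involve the tutor dummies. The sketch below does this for the intercept and the time slope, which are the only two quantities that change between tutors; the coefficients are those of the supposed regression, and the function and dictionary names are hypothetical.

```python
# Coefficients of the supposed regression, keyed by term.
COEF = {
    "intercept": 2.7, "gpa": 3.21, "MathSAT": 0.18, "time": 1.3,
    "TutorB": 1.01, "TutorC": -1.44,
    "time*TutorB": 1.8, "time*TutorC": -1.7,
}


def tutor_line(tutor):
    """Return (intercept, time slope) for tutor 'A', 'B', or 'C'."""
    b = 1 if tutor == "B" else 0
    c = 1 if tutor == "C" else 0
    intercept = COEF["intercept"] + COEF["TutorB"] * b + COEF["TutorC"] * c
    slope = COEF["time"] + COEF["time*TutorB"] * b + COEF["time*TutorC"] * c
    return intercept, slope


lines = {t: tutor_line(t) for t in "ABC"}
for t, (a, s) in lines.items():
    print(f"Tutor {t}: intercept {a:.2f}, time slope {s:.2f}")
```

The gpa and MathSAT coefficients are the same for all three tutors because neither variable interacts with the tutor dummies.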


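Interpretation

  • For each additional point in GPA, a student’s exam score increases by 3.21 on average, holding the other variables constant

  • For each additional point in Math SAT, a student’s exam score increases by 0.18 on average, holding the other variables constant

  • At zero time on the tutor, students who use tutor B score on average 1.01 points higher on the final exam than students using tutor A

  • At zero time on the tutor, students who use tutor C score on average 1.44 points lower on the final exam than students using tutor A

  • Because of the interaction with time, these gaps between tutors change as time on the tutor increases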


Interpretation

  • Students using Tutor A

    • For each additional minute on the tutor, students’ exam scores increase by 1.3 on average

  • Students using Tutor B

    • For each additional minute on the tutor, students’ exam scores increase by 3.1 on average

  • Students using Tutor C

    • For each additional minute on the tutor, students’ exam scores decrease by 0.40 on average


ANCOVA

  • Analysis of Covariance

    • At least one quantitative and one categorical explanatory variable

    • In general, the main interest is the effects of the categorical variable and the quantitative variable is considered to be a control variable

    • It is a blending of regression and ANOVA


ANCOVA

  • Can run the analysis either as a linear regression with a dummy variable or as an ANCOVA model, in which case the output is similar to that of ANOVA models

  • Will get the same results in either case!

  • Different statistical packages make one or the other easier to run

  • It is a matter of preference and interpretation

