Testing the strength of the multiple regression model

TESTING THE STRENGTH OF THE MULTIPLE REGRESSION MODEL


Test 1: Are Any of the x’s Useful in Predicting y?

We are asking: Can we conclude that at least one of the β’s (other than β0) ≠ 0?

H0: β1 = β2 = β3 = β4 = 0

HA: At least one of these β’s ≠ 0

α = .05


Idea of the Test

  • Measure the overall “average variability” due to changes in the x’s

  • Measure the overall “average variability” that is due to randomness (error)

  • If the overall “average variability” due to changes in the x’s IS A LOT LARGER than the “average variability” due to error, we conclude that at least one β is non-zero, i.e. at least one factor (x) is useful in predicting y


“Total Variability”

  • Just as with simple linear regression, we have the sum of squares due to regression, SSR, and the sum of squares due to error, SSE, both of which are printed on the Excel output.

    • The formulas are more complicated (they involve matrix operations)
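As a rough illustration of the matrix operations behind SSR and SSE, the following sketch fits a multiple regression by solving the normal equations in plain Python and then splits the total variability into its two parts. The function names and the data are hypothetical (they are not from the slides), and Excel's own computation may differ in numerical detail.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    Assumes the system has a unique solution (the x's are not perfectly collinear).
    """
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_and_decompose(xs, y):
    """Least-squares fit with an intercept; returns (coefficients, SSR, SSE, SST)."""
    rows = [[1.0] + list(r) for r in xs]  # prepend the intercept column
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve_linear_system(xtx, xty)  # normal equations: (X'X) b = X'y
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    y_bar = sum(y) / len(y)
    ssr = sum((f - y_bar) ** 2 for f in fitted)           # due to regression
    sse = sum((yi - f) ** 2 for yi, f in zip(y, fitted))  # due to error
    sst = sum((yi - y_bar) ** 2 for yi in y)              # total
    return beta, ssr, sse, sst


# Hypothetical data: two predictors (x1, x2) and a response y; not from the slides.
predictors = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6], [6, 5]]
response = [9.1, 7.8, 19.2, 17.9, 29.0, 28.1]

beta, ssr, sse, sst = fit_and_decompose(predictors, response)
print("coefficients:", [round(b, 4) for b in beta])
print("SSR =", round(ssr, 4), " SSE =", round(sse, 4), " SST =", round(sst, 4))
```

Because the model includes an intercept, SSR + SSE should equal the total sum of squares up to rounding error.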


“Average Variability”

  • “Average variability” (Mean variability) for a group is defined as the Total Variability divided by the degrees of freedom associated with that group:

  • Mean Squares Due to Regression

    MSR = SSR/DFR

  • Mean Squares Due to Error

    MSE = SSE/DFE


Degrees of Freedom

  • Total number of degrees of freedom DF(Total) always = n-1

  • Degrees of freedom for regression (DFR) = the number of factors in the regression (i.e. the number of x’s in the linear regression)

  • Degrees of freedom for error (DFE) = difference between the two = DF(Total) - DFR
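The degrees-of-freedom bookkeeping and the two mean squares can be sketched as a small helper. The function name and the sums of squares are made up for illustration. The values n = 38 and k = 3 are chosen only because they reproduce the degrees of freedom (3 and 34) that appear in the F-test example later in these slides.

```python
def mean_squares(ssr, sse, n, k):
    """Return (DFR, DFE, MSR, MSE) for n observations and k predictors (x's)."""
    df_total = n - 1
    dfr = k                   # one degree of freedom per x in the model
    dfe = df_total - dfr      # what is left over for error
    if dfe <= 0:
        raise ValueError("need more than k + 1 observations")
    return dfr, dfe, ssr / dfr, sse / dfe


# Placeholder sums of squares; only n and k mirror the slide example.
dfr, dfe, msr, mse = mean_squares(ssr=600.0, sse=340.0, n=38, k=3)
print(dfr, dfe, msr, mse)  # 3 34 200.0 10.0
```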


The F-Statistic

  • The F-statistic is defined as the ratio of two measures of variability. Here, F = MSR/MSE.

  • Recall we are saying if MSR is “large” compared to MSE, at least one β ≠ 0.

  • Thus if F is “large”, we conclude that HA is true, i.e. at least one β ≠ 0.


The F-test

  • “Large” compared to what?

  • F-tables give critical values for given values of α

  • TEST: REJECT H0 (Accept HA) if:

    F = MSR/MSE > Fα,DFR,DFE
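When a printed F-table has no row for the required DFE, one common workaround is to interpolate linearly between the neighboring rows. The sketch below does this using the two table entries quoted later in these slides (F.05,3,30 = 2.92 and F.05,3,40 = 2.84) and then applies the rejection rule. The function names are hypothetical, and linear interpolation is only an approximation to the exact critical value.

```python
def interpolate_critical_value(dfe, dfe_low, f_low, dfe_high, f_high):
    """Linearly interpolate an F-table critical value between two tabulated DFE rows."""
    weight = (dfe - dfe_low) / (dfe_high - dfe_low)
    return f_low + weight * (f_high - f_low)


def reject_h0(f_statistic, f_critical):
    """Reject H0 when the observed F exceeds the critical value."""
    return f_statistic > f_critical


f_crit = interpolate_critical_value(34, 30, 2.92, 40, 2.84)
print(round(f_crit, 3))             # 2.888
print(reject_h0(20.89762, f_crit))  # True
```

The interpolated 2.888 is close to the exact 2.882601 that the slides obtain from Excel, and the conclusion is the same either way.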


RESULTS

  • If we do not get a large F statistic

    • We cannot conclude that any of the variables in this model are significant in predicting y.

  • If we do get a large F statistic

    • We can conclude at least one of the variables is significant for predicting y.

    • NATURAL QUESTION --

      • WHICH ONES?


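[Excel regression output with the ANOVA table annotated:]

  • df column: DFR = number of x’s; DFE = Total DF - DFR; Total DF = n - 1

  • SS column: SSR; SSE; Total SS = Σ(yi - ȳ)²

  • MS column: MSR = SSR/DFR; MSE = SSE/DFE

  • F column: F = MSR/MSE

  • Significance F: p-value for the F test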


Results

  • We see that the F statistic is 20.89762

  • This would be compared to F.05,3,34

    • From the F.05 Table, the value of F.05,3,34 is not given.

    • But F.05,3,30 = 2.92 and F.05,3,40 = 2.84.

    • 20.89762 is greater than both of these numbers.

    • The actual value of F.05,3,34 can be calculated in Excel: FINV(.05,3,34) = 2.882601

  • USE SIGNIFICANCE F

    • This is the p-value for the F-test

    • Significance F = 7.46 × 10⁻⁸ = .0000000746 < .05

    • We can conclude that at least one x is useful in predicting y
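Significance F is the upper-tail probability of the F distribution at the observed statistic. As a sketch of where that number comes from, the code below evaluates the tail probability through the regularized incomplete beta function, using a standard continued-fraction expansion. All function names are hypothetical, and this is an illustrative stand-in written with the Python standard library only, not a description of how Excel computes the value.

```python
from math import exp, lgamma, log


def _beta_continued_fraction(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        # even-numbered term of the continued fraction
        num = m * (b - m) * x / ((a + 2 * m - 1.0) * (a + 2 * m))
        d = 1.0 + num * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + num / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # odd-numbered term of the continued fraction
        num = -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1.0))
        d = 1.0 + num * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + num / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def regularized_incomplete_beta(a, b, x):
    """I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = lgamma(a + b) - lgamma(a) - lgamma(b) + a * log(x) + b * log(1.0 - x)
    front = exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_continued_fraction(a, b, x) / a
    # use the symmetry I_x(a, b) = 1 - I_(1-x)(b, a) for faster convergence
    return 1.0 - front * _beta_continued_fraction(b, a, 1.0 - x) / b


def f_test_p_value(f_statistic, dfr, dfe):
    """Upper-tail probability P(F > f_statistic) for an F(dfr, dfe) distribution."""
    x = dfe / (dfe + dfr * f_statistic)
    return regularized_incomplete_beta(dfe / 2.0, dfr / 2.0, x)


p = f_test_p_value(20.89762, 3, 34)
print(f"{p:.3g}")
print(p < 0.05)
```

With the F statistic and degrees of freedom from the slide, the result should agree with the reported Significance F to the precision shown there.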


Test 2: Which Variables Are Significant IN THIS MODEL?

  • The question we are asking is, “Taking all the other factors (x’s) into consideration, does a change in a particular x value (x3, say) significantly affect y?”

  • This is another hypothesis test (a t-test).

  • To test if the age of the house is significant:

    H0: β3 = 0 (x3 is not significant in this model)

    HA: β3 ≠ 0 (x3 is significant in this model)

[Excel regression output with callouts marking the t-value and the p-value for the test of β3 = 0]


Reading Printout for the t-test

  • Simply look at the p-value

    • p-value for β3 = 0 is .02194 < .05

      • Thus the age of the house is significant in this model

  • The other variables

    • p-value for β1 = 0 is .0000839 < .05

      • Thus square feet is significant in this model

    • p-value for β2 = 0 is .15503 > .05

      • Thus the land (acres) is not significant in this model
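Reading the printout amounts to comparing each coefficient's p-value with α. A minimal sketch of that comparison, using the three p-values quoted on this slide (the dictionary keys and the function name are illustrative labels, not Excel output):

```python
ALPHA = 0.05

# p-values for the individual t-tests, as quoted on the slide.
p_values = {
    "square feet (x1)": 0.0000839,
    "land, acres (x2)": 0.15503,
    "age (x3)": 0.02194,
}


def significant_in_model(p_value, alpha=ALPHA):
    """A variable is significant in this model when its p-value is below alpha."""
    return p_value < alpha


for name, p in p_values.items():
    verdict = "significant" if significant_in_model(p) else "not significant"
    print(f"{name}: p = {p} -> {verdict} in this model")
```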


Does A Poor t-value Imply the Variable is not Useful in Predicting y?

  • NO

  • It says the variable is not significant IN THIS MODEL when we consider all the other factors.

  • In this model – land is not significant when included with square footage and age.

  • But if we had run this model without square footage, we would have obtained the output on the next slide.


p-value for land is .00000717.

In this model Land is significant.


Can it even happen that F says at least one variable is significant, but none of the t’s indicate a useful variable?

  • YES

    EXAMPLES IN WHICH THIS MIGHT HAPPEN:

    • Miles per gallon vs. horsepower and engine size

    • Salary vs. GPA and GPA in major

    • Income vs. age and experience

    • HOUSE PRICE vs. SQUARE FOOTAGE OF HOUSE AND LAND

  • There is a relation between the x’s –

    • Multicollinearity
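One simple way to look for a relation between two x’s is their sample correlation: a value near +1 or -1 suggests that the two predictors carry largely the same information. The sketch below uses made-up square footage and lot sizes, not data from the slides. A pairwise correlation is only a first check and will not reveal every form of multicollinearity.

```python
from math import sqrt


def pearson_correlation(u, v):
    """Sample correlation coefficient between two equally long lists of numbers."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    return cov / sqrt(var_u * var_v)


# Made-up values for illustration only: bigger houses tend to sit on bigger lots.
square_feet = [1200, 1500, 1800, 2100, 2400, 3000]
land_acres = [0.20, 0.25, 0.35, 0.40, 0.50, 0.65]

r = pearson_correlation(square_feet, land_acres)
print(round(r, 3))
```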


Approaches That Could Be Used When Multicollinearity Is Detected

  • Eliminate some variables and run again

  • Stepwise regression

    This is discussed in a future module.


Test 3: What Proportion of the Overall Variability in y Is Due to Changes in the x’s?

R²

  • R² = .442197

  • Overall 44% of the total variation in sales price is explained by changes in square footage, land, and age of the house.


What is Adjusted R²?

  • Adjusted R² adjusts R² to take into account degrees of freedom.

  • By assuming a higher order equation for y, we can force the curve to fit this one set of data points in the model – eliminating much of the variability (See next slide).

  • But this is not what is going on!

    R² might be higher – but adjusted R² might be much lower

  • Adjusted R² takes this into account

  • Adjusted R² = 1 - MSE/MST, where MST = SST/DF(Total) = SST/(n-1)
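Under the usual definition, adjusted R² = 1 - (SSE/DFE)/(SST/DF(Total)), which can be rewritten in terms of R², the number of observations n, and the number of x’s k. The sketch below applies that form to the R² reported on the slides. The function names are hypothetical, and n = 38 is an assumption inferred from the degrees of freedom in the F-test example, so the printed number is illustrative rather than a value taken from the Excel output.

```python
def r_squared(sse, sst):
    """Proportion of the total variability explained by the regression."""
    return 1.0 - sse / sst


def adjusted_r_squared(r2, n, k):
    """1 - (SSE/DFE)/(SST/(n - 1)), written in terms of R^2, n observations, k x's."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# R^2 is from the slides; n = 38 and k = 3 are assumed from the F-test example.
adj = adjusted_r_squared(0.442197, n=38, k=3)
print(round(adj, 4))  # 0.393
```

Adding a predictor always uses up one more degree of freedom for error, so adjusted R² rises only if the new term reduces SSE by enough to compensate.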


Scatterplot

This is not what is really going on


Review

  • Are any of the x’s useful in predicting y IN THIS MODEL?

    • Look at p-value for F-test – Significance F

    • F = MSR/MSE would be compared to Fα,DFR,DFE

  • Which variables are significant in this model?

    • Look at p-values for the individual t-tests

  • What proportion of the total variance in y can be explained by changes in the x’s?

    • R²

    • Adjusted R² takes into account the degrees of freedom lost from the error term when more terms are included in the model


4 Places to Look on Excel Printout

1 - Regression equation

2 - Significance F: Are any variables useful?

3 - p-values for t-tests: Which variables are significant in this model?

4 - R²: What proportion of the variability in y can be explained by changes in the x’s?

