STAT E-150 Statistical Methods. Multiple Regression.

Three percent of a man's body is essential fat, which is necessary for a healthy body. However, too much body fat can be dangerous. For men between the ages of 18 and 39, a healthy body fat percent is 8% to 19%. (For women it is 21% to 32%.)

It is not easy to measure body fat percent, but we can find a model for the relationship between body fat percent and waist size and use it to find the body fat percent associated with a given waist size.

The scatterplot indicates a positive linear relationship between waist size and body fat percent:

The SPSS output shows a significant linear relationship between the two variables.

R² = .678, so we know that almost 68% of the variability in body fat percentage is accounted for by waist size.

What other variables might be used to predict body fat percentage? Can we improve the prediction by including additional variables?

The Multiple Linear Regression Model

We have n observations on k explanatory variables X1, X2, X3, …, Xk and a response variable, Y. The multiple regression model is:

Y = β0 + β1X1 + β2X2 + ... + βkXk + ε

where ε ~ N(0, σε) and the errors are independent from one another.

The predictor variables may be higher powers or other functions of quantitative variables, coded categorical variables, or interaction terms. The main restriction is that the model is linear in the coefficients; that is, each term is a constant multiple of a predictor.
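As a rough illustration of how such predictors can be built, the sketch below assembles one row of predictor values from raw measurements. The variable names (`waist`, `age`, `smoker`) and the choice of terms are hypothetical and are not taken from the body fat data. The point is only that the model stays linear in the coefficients even when the predictors are powers, indicator codes, or products.

```python
def predictor_row(waist, age, smoker):
    """Return the predictor values for one observation (hypothetical variables).

    waist, age : quantitative measurements
    smoker     : categorical value, "yes" or "no"
    """
    smoker_code = 1 if smoker == "yes" else 0   # coded categorical variable
    return [
        waist,                # quantitative predictor
        waist ** 2,           # higher power of a quantitative predictor
        age,                  # second quantitative predictor
        smoker_code,          # indicator (0/1) code
        waist * smoker_code,  # interaction term
    ]


def linear_model(betas, row):
    """Compute beta0 + beta1*x1 + ... + betak*xk for one row of predictors."""
    intercept, slopes = betas[0], betas[1:]
    return intercept + sum(b * x for b, x in zip(slopes, row))


row = predictor_row(waist=36.0, age=30, smoker="yes")
print(row)
```

Each entry of `row` plays the role of one x in the model equation, so a squared term or an interaction is just another column of predictor values.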

Fitting a Multiple Linear Regression Model

As we did in Simple Linear Regression, we will choose a possible set of predictors, estimate the coefficients based on sample data, and assess the fit. We will again use the sum of squared residuals, where the residuals are the differences between the actual Y values and the Y values predicted by the prediction equation

ŷ = b0 + b1x1 + b2x2 + ... + bkxk

and use SPSS to determine the estimates bi of the coefficients βi that minimize the sum of the squared residuals.
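To make the idea of minimizing the sum of squared residuals concrete, here is a minimal sketch that solves the least squares normal equations in plain Python. It is one textbook way to get the estimates, not a description of how SPSS computes them, and the data are made up: they are generated exactly from y = 2 + 3·x1 − x2, so the fit should recover those coefficients.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):                      # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_least_squares(X, y):
    """Return [b0, b1, ..., bk] minimizing the sum of squared residuals."""
    rows = [[1.0] + list(r) for r in X]               # add intercept column
    p = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(XtX, Xty)


def sum_squared_residuals(coefs, X, y):
    """Sum of (actual - predicted)^2 over all observations."""
    preds = [coefs[0] + sum(b * x for b, x in zip(coefs[1:], r)) for r in X]
    return sum((yi - pi) ** 2 for yi, pi in zip(y, preds))


# Made-up data generated exactly from y = 2 + 3*x1 - 1*x2 (no noise)
X = [(1, 2), (2, 1), (3, 5), (4, 3), (5, 8), (6, 2)]
y = [2 + 3 * x1 - 1 * x2 for x1, x2 in X]

coefs = fit_least_squares(X, y)
print([round(c, 6) for c in coefs])
print(sum_squared_residuals(coefs, X, y))
```

With real data the residuals would not be zero, and any other choice of coefficients would give a larger sum of squared residuals than the fitted ones.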

We will test the hypotheses

H0: β1 = β2 = β3 = ... = βk = 0

Ha: The slopes are not all zero.

Our assumptions are:

- The y-values are independent of each other
- Y has a constant variance for any combination of predictors
- The values of y are normally distributed for any fixed set of values for the explanatory variables

That is, the errors are independent values from a N(0, σε) distribution.

If the null hypothesis is rejected, then test a null hypothesis for each of the coefficients:

H0: βj = 0

Ha: βj ≠ 0

Note: If the null hypothesis is not rejected, it does not mean that the corresponding predictor variable has no relationship to y; it means that the predictor variable contributes nothing to modeling y after allowing for all the other predictors.

The hypotheses for fitting a multiple linear regression model to predict body fat percentage based on waist size and height are

H0: βheight = βwaist = 0

Ha: The slopes are not both zero.

Here are the scatterplots using the individual predictors:

Although this suggests a linear relationship between waist size and body fat percentage, there doesn't appear to be a linear relationship between height and body fat percentage.

Here are some of the results for a multiple regression analysis with both height and waist as predictors:

The p-value for height is close to 0, so we know that height does contribute to the multiple regression model.

The graph shown below is called a scatterplot matrix. It shows the scatterplots for all pairs of the variables we are using.

Which pair of variables shows a strong linear relationship?

PctBF and Waist

Which pair of variables shows a weak linear relationship?

Height and Waist

Which pair of variables shows no linear relationship?

PctBF and Height

Residual Analysis

These plots tell us that there is no particular pattern in the residuals, and that the distribution of the residuals is close to normal.

Use the SPSS output provided to answer the questions below:

What is the fitted regression equation?

%BodyFat= 1.773 waist - .601 height - 3.110

What does the value 1.773 tell you? An increase of one inch in the waist measurement is associated with an increase of 1.773 in body fat percentage for men of a particular height.

What change in body fat percentage is associated with each additional inch of height? An increase of one inch in height is associated with a decrease of .601 in body fat percentage for men of a particular waist size.
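The interpretation of the two slopes can be checked numerically. The sketch below codes the fitted equation from the SPSS output and compares predictions when one measurement changes by an inch while the other is held fixed. The particular waist and height values are hypothetical, chosen only for illustration.

```python
def predict_body_fat(waist_in, height_in):
    """Predicted body fat percentage from the fitted equation in the text.

    Both measurements are in inches.
    """
    return 1.773 * waist_in - 0.601 * height_in - 3.110


# Hypothetical measurements, chosen only for illustration
base = predict_body_fat(waist_in=38, height_in=70)
wider = predict_body_fat(waist_in=39, height_in=70)    # waist + 1, same height
taller = predict_body_fat(waist_in=38, height_in=71)   # height + 1, same waist

print(round(base, 3))            # 22.194
print(round(wider - base, 3))    # 1.773
print(round(taller - base, 3))   # -0.601
```

The differences reproduce the coefficients exactly because the other predictor is unchanged in each comparison, which is what "for men of a particular height" and "for men of a particular waist size" mean.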

What is the value of R²? What does it tell you? R² = .713, which tells us that height and waist size together account for about 71.3% of the variation in body fat percentage for men.

Use the SPSS results to complete the hypothesis test:

The value of the test statistic is F = 307.096, with p = 0+.

What can you conclude? Since p is close to zero, the null hypothesis is rejected. These data indicate that there is a linear relationship between body fat percentage and at least one of the predictor variables waist and height.
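The overall F statistic can also be written in terms of R², the number of predictors k, and the sample size n: F = (R²/k) / ((1 − R²)/(n − k − 1)). The sketch below evaluates that formula. R² = .713 and k = 2 come from the text, but the sample size is not stated in this section, so n = 250 is an assumed placeholder; the actual n should be read from the degrees of freedom in the SPSS ANOVA table.

```python
def f_statistic(r_squared, n, k):
    """Overall F statistic for a regression with k predictors and n observations."""
    df_model = k
    df_error = n - k - 1
    return (r_squared / df_model) / ((1 - r_squared) / df_error)


# R^2 = .713 and k = 2 are from the text; n = 250 is an ASSUMED sample size.
print(round(f_statistic(0.713, n=250, k=2), 1))
```

Under that assumed n, the result is close to the reported 307.096. A small difference is expected because R² is rounded to three decimal places.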

We also want to estimate the standard deviation of the error term, σε.

As we add a new predictor to the model, we have a new coefficient to estimate, and so we lose one more degree of freedom.

The estimate for the standard error of the multiple regression model with k predictors is

σ̂ε = √(SSE / (n - k - 1)),

where SSE is the sum of the squared residuals.
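As a small worked example of this estimate, the sketch below computes √(SSE / (n − k − 1)) from observed and fitted values. The numbers are made up for illustration and are not from the body fat data.

```python
import math


def regression_standard_error(y, y_hat, k):
    """Estimate sigma_epsilon as sqrt(SSE / (n - k - 1)) for k predictors."""
    n = len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    return math.sqrt(sse / (n - k - 1))


# Made-up observed and fitted values, for illustration only
y = [12.0, 18.0, 25.0, 9.0, 21.0, 30.0]
y_hat = [13.0, 17.0, 23.0, 10.0, 22.0, 29.0]

# Residuals are -1, 1, 2, -1, -1, 1, so SSE = 9 and n - k - 1 = 6 - 2 - 1 = 3
print(regression_standard_error(y, y_hat, k=2))
```

Adding a predictor lowers n − k − 1 by one, so the estimate shrinks only if the new predictor reduces SSE by enough to offset the lost degree of freedom.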

Use the SPSS output to find the standard error of this regression model:

Assessing a Multiple Regression Model

Individual t-Tests for Coefficients in Multiple Regression

In order to determine whether any one of the predictor variables is helpful to include in the model, we test the coefficient for that predictor:

H0: βi = 0

Ha: βi ≠ 0

The test statistic is t = bi / SE(bi), where bi is the estimated coefficient and SE(bi) is its standard error, with n - k - 1 degrees of freedom.

It is important to remember that the meaning of each coefficient depends on all of the predictors in the regression model.

If we fail to reject the null hypothesis, it means that the corresponding predictor variable contributes nothing to the multiple regression model after allowing for all other predictors.

Use the SPSS output to test the coefficients in our model:

H0: βheight = 0

Ha: βheight ≠ 0

t = -5.47, p = 0+

What is your conclusion?

Since p is close to 0, we will reject the null hypothesis.

There is evidence that body fat percentage is related to height.

We can conclude that the body fat percentage changes as the height changes, for men with the same waist size.

Use the SPSS output to test the coefficients in our model:

H0: βwaist = 0

Ha: βwaist ≠ 0

t = 24.768, p = 0+

What is your conclusion?

Since p is close to 0, we will reject the null hypothesis.

There is evidence that body fat percentage is related to waist size.

We can conclude that the body fat percentage changes as the waist size changes, for men of the same height.
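Each t statistic is the estimated coefficient divided by its standard error, so the standard errors can be backed out from the values reported above. The sketch below does that arithmetic. Because the coefficients and t values are rounded in the output, the results are approximations, not the exact figures from the SPSS coefficients table.

```python
def t_statistic(coef, std_error):
    """t = b_i / SE(b_i), compared to a t distribution with n - k - 1 df."""
    return coef / std_error


def implied_std_error(coef, t):
    """Back out SE(b_i) from a reported coefficient and its t statistic."""
    return coef / t


# Coefficients and t values as reported in the text
se_waist = implied_std_error(1.773, 24.768)
se_height = implied_std_error(-0.601, -5.47)

print(round(se_waist, 4), round(se_height, 4))
```

The waist coefficient is larger and has the smaller implied standard error, which is why its t statistic is so much further from zero than the one for height.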

Can we do a one-tailed test?

H0: βwaist = 0

Ha: βwaist > 0

t = 24.768, p = .000/2 = 0+

What is your conclusion?

Since p is close to 0, we will reject the null hypothesis.

There is evidence that body fat percentage is positively related to waist size.

We can conclude that the body fat percentage increases as the waist size increases, for men of the same height.
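Halving the two-sided p-value is only appropriate when the sign of t agrees with the direction of the alternative hypothesis. The helper below is a minimal sketch of that rule; the function name and the example numbers are hypothetical and are not taken from the SPSS output.

```python
def one_sided_p(two_sided_p, t, alternative="greater"):
    """Convert a two-sided p-value for a t test into a one-sided p-value.

    alternative="greater" tests Ha: beta > 0; "less" tests Ha: beta < 0.
    """
    if alternative not in ("greater", "less"):
        raise ValueError("alternative must be 'greater' or 'less'")
    agrees = t > 0 if alternative == "greater" else t < 0
    half = two_sided_p / 2
    # If t points the wrong way, the one-sided p-value is large, not small.
    return half if agrees else 1 - half


# Hypothetical values for illustration
print(one_sided_p(0.04, t=2.1))    # t agrees with Ha: beta > 0
print(one_sided_p(0.04, t=-2.1))   # t contradicts Ha: beta > 0
```

For the waist coefficient, t = 24.768 is positive and the alternative is βwaist > 0, so dividing the reported two-sided p-value by 2 is justified.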

Adjusted R²

The adjusted R² is an adjustment to R² that takes the sample size and the number of parameters (βj) into consideration.

Unlike R², which never decreases when a predictor is added, the adjusted R² increases only if the new predictor lowers the estimated error variance, SSE/(n - k - 1), and so it can be useful in comparing regression models with different numbers of predictor variables.
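One common form of the adjustment is adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1). The sketch below applies it to two models. R² = .713 with k = 2 is from the text; the sample size n = 50 and the three-predictor model with R² = .714 are hypothetical, included only to show how a tiny gain in R² can still lower the adjusted value.

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjusted R^2 for a model with k predictors fitted to n observations."""
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)


n = 50  # ASSUMED sample size, for illustration only

two_predictors = adjusted_r_squared(0.713, n, k=2)    # R^2 from the text
three_predictors = adjusted_r_squared(0.714, n, k=3)  # hypothetical extra predictor

print(round(two_predictors, 4), round(three_predictors, 4))
```

Here R² rises slightly when the third predictor is added, but the adjusted R² falls, which would favor the smaller model.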

Creating a Scatterplot Matrix

Click on Graphs > Chart Builder.

Select Scatter/Dot from the list of charts. Drag the Scatterplot Matrix to the window.

Drag the matrix variables to the horizontal axis.

Click on OK.

The scatterplot matrix will appear in the Output Viewer.

Estimating the Model

Click on Analyze > Regression > Linear.

Drag the dependent variable and all independent variables to the appropriate locations. Click on OK.

This will produce several tables:

If you click on Plots in the Linear Regression dialog box, you will get this dialog box:

- Plot the *ZRESID values on the Y axis against the *ZPRED values on the X axis.
- You may also choose to create a Normal Probability Plot and/or histogram of the residuals.

Click on Continue and then OK. Here are the results: