### Chapter 11 (Continued)

Regression and Correlation methods

EPI809/Spring 2008

### Linear Multiple Regression Model

Types of Regression Models

Learning Objectives:

This part focuses on the linear multiple regression model. After studying the materials in this section, you should be able to:

• Understand the general concepts behind the linear multiple regression model
• Fit a linear multiple regression model and interpret the computer output
• Perform model diagnosis: test the overall and partial significance of a multiple regression model, and perform residual analysis
• Describe linear regression pitfalls

Regression Modeling Steps
• Specify the model and estimate all unknown parameters
• Evaluate Model
• Use Model for Prediction & Estimation

### Linear regression Model specification:

Decide what you want to do and select the dependent variable

List all potential independent variables for your model

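Linear Multiple Regression Model

1. Relationship between 1 dependent & 2 or more independent variables is a linear function

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \varepsilon_i$$

• $Y_i$: dependent (response) variable
• $X_{1i}, \ldots, X_{ki}$: independent (explanatory) variables
• $\beta_0$: population Y-intercept
• $\beta_1, \ldots, \beta_k$: population slopes
• $\varepsilon_i$: random error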

Linear Regression Assumptions
• Mean of Distribution of Error Is 0
• Distribution of Error Has Constant Variance
• Distribution of Error is Normal
• Errors Are Independent

Extremely Important

Population Multiple Regression Model

Bivariate model: $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$

### Parameter Estimation: You gather the observations for all variables and estimate model parameters

Multiple Linear Regression Equations

Too complicated by hand!

Ouch!
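For reference, the estimates that software computes can be written compactly in matrix form. This is the standard least-squares result, given here only as a sketch. The symbols $\mathbf{X}$ (the $n \times (k+1)$ design matrix whose first column is all ones) and $\mathbf{y}$ (the vector of observed responses) are notation introduced for illustration and do not appear on the slides.

```latex
\hat{\boldsymbol{\beta}}
  = (\mathbf{X}^{\top}\mathbf{X})^{-1}\mathbf{X}^{\top}\mathbf{y},
\qquad
\hat{\boldsymbol{\beta}}
  = (\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_k)^{\top}
```

Solving this by hand means inverting a $(k+1) \times (k+1)$ matrix, which is why the computation is left to software.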

Sample Multiple Regression Model

Bivariate model: $\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_{1i} + \hat{\beta}_2 X_{2i}$

Interpretation of Estimated Coefficients

1. Slope ($\hat{\beta}_j$)

• The estimated average Y changes by $\hat{\beta}_j$ for each 1-unit increase in $X_j$, holding all other variables constant
• Example from textbook: if $\hat{\beta}_1 = 0.13$, then systolic blood pressure (Y) is expected to increase by 0.13 for each 1-unit increase in birthweight ($X_1$), given fixed age ($X_2$)

2. Y-intercept ($\hat{\beta}_0$): predicted average value of Y when all $X_j$'s are set to 0

Variance of Error estimate
• Assuming model is correctly specified…
• Best (unbiased) estimator of $\sigma^2$ is $s^2 = \dfrac{SSE}{n-k-1}$
• It is used in the formula for computing the standard errors of the coefficient estimates, $s_{\hat{\beta}_j}$
• Exact formula is too complicated to show
• But a higher value for $s$ leads to a higher $s_{\hat{\beta}_j}$

Parameter Estimation Example
• You’re a Vet epidemiologist for the county cooperative. You gather the following data:

| Milk (lb) | Food (lb) | Weight (×100 lb) |
|---|---|---|
| 1 | 1 | 2 |
| 4 | 8 | 8 |
| 1 | 3 | 1 |
| 3 | 5 | 7 |
| 2 | 6 | 4 |
| 4 | 10 | 6 |

• What is the linear relationship between cows’ food intake, weight and milk yield?

### Model Specification Example

Dependent variable is milk yield (lb)

Independent variables for our model are food intake (lb) and weight (×100 lb)

Sample SAS codes for plotting DATA

Data Cow; /*Reading data in SAS*/

input Milk Food weight@@;

cards;

1 1 2 4 8 8 1 3 1

3 5 7 2 6 4 4 10 6

;

run;

proc gplot data=Cow; /*Scatter plots of milk against each predictor*/

plot milk*food milk*weight;

run;

Some plots

Sample SAS codes for fitting a multiple linear regression

PROC REG data=Cow;

model milk = food weight;

run;
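To make the estimation step less of a black box, here is a minimal Python sketch that fits the same model to the six cows by solving the least-squares normal equations directly. It is not how SAS is implemented. The helper names `solve` and `fit_ols` are hypothetical, and only the Python standard library is assumed.

```python
# Cow data from the slides: (milk yield lb, food intake lb, weight x100 lb)
DATA = [(1, 1, 2), (4, 8, 8), (1, 3, 1), (3, 5, 7), (2, 6, 4), (4, 10, 6)]


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(y, xs):
    """Least-squares coefficients of y on the predictor columns in xs, intercept first."""
    rows = [[1.0] + [float(col[i]) for col in xs] for i in range(len(y))]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)  # solves (X'X) b = X'y


milk = [d[0] for d in DATA]
food = [d[1] for d in DATA]
weight = [d[2] for d in DATA]

b0, b1, b2 = fit_ols(milk, [food, weight])
print(f"Intercept={b0:.5f}  Food={b1:.5f}  weight={b2:.5f}")
```

The three printed values can be compared with the Parameter Estimate column that PROC REG reports for this data set.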

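Parameter Estimation SAS Output

Parameter Estimates

| Variable | DF | Parameter Estimate | Standard Error | t Value | Pr > \|t\| |
|---|---|---|---|---|---|
| Intercept | 1 | 0.06397 | 0.25986 | 0.25 | 0.8214 |
| Food | 1 | 0.20492 | 0.05882 | 3.48 | 0.0399 |
| weight | 1 | 0.28049 | 0.06860 | 4.09 | 0.0264 |

Slide annotations: the Parameter Estimate column holds $\hat{\beta}_0$, $\hat{\beta}_1$, $\hat{\beta}_2$, and the Standard Error column holds the corresponding $s_{\hat{\beta}_j}$.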
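The t Value and Pr > |t| columns can be checked from the first two columns. The sketch below divides each estimate by its standard error and then uses a closed-form expression for the Student t distribution that holds only for 3 degrees of freedom (here n - k - 1 = 6 - 2 - 1 = 3). The function name `t_sf_3df` is hypothetical. Small differences from the SAS values are expected because the inputs are rounded.

```python
import math

# (estimate, standard error) pairs copied from the SAS "Parameter Estimates" table
ESTIMATES = {
    "Intercept": (0.06397, 0.25986),
    "Food": (0.20492, 0.05882),
    "weight": (0.28049, 0.06860),
}


def t_sf_3df(t):
    """Upper-tail probability P(T > t) for Student's t with exactly 3 df (closed form)."""
    u = t / math.sqrt(3.0)
    cdf = 0.5 + (u / (1.0 + u * u) + math.atan(u)) / math.pi
    return 1.0 - cdf


results = {}
for name, (est, se) in ESTIMATES.items():
    t_value = est / se                      # t Value = Estimate / Standard Error
    p_value = 2.0 * t_sf_3df(abs(t_value))  # two-sided Pr > |t| with 3 error df
    results[name] = (t_value, p_value)
    print(f"{name:<10} t = {t_value:5.2f}   Pr > |t| = {p_value:.4f}")
```

For other error degrees of freedom this closed form does not apply, and a statistical library or a t table would be needed instead.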

Parameter Estimation SAS Output

Analysis of Variance

| Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| Model | 2 | 9.24974 | 4.62487 | 55.44 | 0.0043 |
| Error | 3 | 0.25026 | 0.08342 | | |
| Corrected Total | 5 | 9.50000 | | | |

• Root MSE (this is S) = 0.28883
• R-Square = 0.9737
• Dependent Mean = 2.50000
• Adj R-Sq = 0.9561
• Coeff Var = 11.55309
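Every summary statistic in this output follows from the sums of squares and the degrees of freedom. The sketch below recomputes them from the printed table values using the standard definitions. The variable names are illustrative, and the last digits can differ slightly from SAS because the inputs are already rounded.

```python
import math

# Values copied from the SAS analysis-of-variance output for the cow example
n, k = 6, 2                      # observations, independent variables
ss_model, ss_error, ss_total = 9.24974, 0.25026, 9.50000
dependent_mean = 2.5

ms_model = ss_model / k                      # Mean Square (Model)
ms_error = ss_error / (n - k - 1)            # Mean Square (Error) = s^2
f_value = ms_model / ms_error                # F Value
root_mse = math.sqrt(ms_error)               # Root MSE = s
r_square = ss_model / ss_total               # R-Square
adj_r_square = 1 - ms_error / (ss_total / (n - 1))   # Adj R-Sq
coeff_var = 100 * root_mse / dependent_mean  # Coeff Var, in percent

print(f"F={f_value:.2f}  Root MSE={root_mse:.5f}  R-Square={r_square:.4f}")
print(f"Adj R-Sq={adj_r_square:.4f}  Coeff Var={coeff_var:.3f}")
```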

Interpretation of Coefficients Solution

1. Slope ($\hat{\beta}_1$)

• Milk yield is expected to increase by 0.2049 lb for each 1 lb increase in food intake, holding weight constant

2. Slope ($\hat{\beta}_2$)

• Milk yield is expected to increase by 0.2805 lb for each 1-unit (×100 lb) increase in weight, holding food intake constant
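The "holding the other variable constant" reading can be made concrete by plugging values into the fitted equation. In the sketch below the coefficients are the rounded SAS estimates. The cow with 6 lb of food and 500 lb of weight is a made-up illustration, not a case from the data, and `predict_milk` is a hypothetical helper name.

```python
# Rounded estimates from the SAS "Parameter Estimates" table
B0, B1, B2 = 0.06397, 0.20492, 0.28049


def predict_milk(food_lb, weight_100lb):
    """Estimated mean milk yield (lb) from the fitted equation."""
    return B0 + B1 * food_lb + B2 * weight_100lb


base = predict_milk(6, 5)        # hypothetical cow: 6 lb food, 500 lb weight
more_food = predict_milk(7, 5)   # +1 lb food, same weight
heavier = predict_milk(6, 6)     # same food, +100 lb weight

print(f"baseline prediction:      {base:.4f} lb")
print(f"effect of +1 lb food:     {more_food - base:.4f} lb")
print(f"effect of +100 lb weight: {heavier - base:.4f} lb")
```

The two differences reproduce the slope estimates, which is exactly what the interpretation statements say.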

### Model Evaluation

Evaluating Multiple Regression Models

1. Examine Variation Measures

2. Test Significance of Overall Model, portions of overall model and Individual Coefficients

3. Check conditions of a multiple linear regression model using Residuals

4. Assess multicollinearity among independent variables

### Variation Measures

Coefficient of Multiple Determination
• Proportion of Variation in Y ‘Explained’ by All X Variables Taken Together

• If you add a variable to the model, how will that affect the R-squared value for the model?

• $R^2$ never decreases when a new X variable is added to the model (a disadvantage when comparing models)
• Each additional variable reduces adjusted $R^2$, unless SSE goes down enough to compensate
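The two measures can be written side by side to show where the penalty for extra variables comes from. These are the standard definitions, given as a sketch. $SS_{\text{Total}}$ denotes the corrected total sum of squares, a label chosen here to match the SAS output rather than taken from the slides.

```latex
R^2 = 1 - \frac{SSE}{SS_{\text{Total}}},
\qquad
R^2_{\text{adj}} = 1 - \frac{SSE/(n-k-1)}{SS_{\text{Total}}/(n-1)}
```

Adding a variable raises $k$ and so shrinks the divisor $n-k-1$. Adjusted $R^2$ therefore rises only when SSE falls by enough to offset the lost error degree of freedom.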

Using the Vet example: if you add a variable to the model, how will that affect R-squared and the estimate of the standard deviation of the error term?

• Model with food intake only:

S = 0.64126, R-Square = 0.8269

• Model with food intake and weight:

S = 0.28883, R-Square = 0.9737
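The comparison above can be reproduced by fitting both models to the six cows. The sketch below is a plain-Python illustration using only the standard library. The helper names `solve` and `fit_summary` are hypothetical, and the adjusted R-square values are an addition that the slide does not report.

```python
import math

# Cow data from the slides: (milk lb, food lb, weight x100 lb)
DATA = [(1, 1, 2), (4, 8, 8), (1, 3, 1), (3, 5, 7), (2, 6, 4), (4, 10, 6)]


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_summary(y, xs):
    """Return (s, R-square, adjusted R-square) for a least-squares fit with intercept."""
    n, k = len(y), len(xs)
    rows = [[1.0] + [float(col[i]) for col in xs] for i in range(n)]
    p = k + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    sse = sum((yi - sum(b * x for b, x in zip(beta, r))) ** 2
              for r, yi in zip(rows, y))
    mean_y = sum(y) / n
    sst = sum((yi - mean_y) ** 2 for yi in y)
    mse = sse / (n - k - 1)
    return math.sqrt(mse), 1 - sse / sst, 1 - mse / (sst / (n - 1))


milk = [d[0] for d in DATA]
food = [d[1] for d in DATA]
weight = [d[2] for d in DATA]

s_food, r2_food, adj_food = fit_summary(milk, [food])
s_both, r2_both, adj_both = fit_summary(milk, [food, weight])
print(f"food only:     S={s_food:.5f}  R-Square={r2_food:.4f}  Adj R-Sq={adj_food:.4f}")
print(f"food + weight: S={s_both:.5f}  R-Square={r2_both:.4f}  Adj R-Sq={adj_both:.4f}")
```

Here adding weight lowers S and raises both R-square and adjusted R-square, so the extra variable earns its degree of freedom.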

Thinking challenge
• 18 variables
• N=20
• R-squared=.95

### Testing Overall Significance of regression parameters

Testing Overall Significance
• Tests if there is a linear relationship between all X variables together & Y
• Hypotheses
• $H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0$ (no linear relationship)
• $H_a$: at least one coefficient is not 0 (at least one X variable linearly affects Y)
• Uses F test statistic

Overall Significance Test statistic
• Test statistic: $F = \dfrac{MSR}{MSE} = \dfrac{SSR/k}{SSE/(n-k-1)}$
• Denotation in SAS: $F = \dfrac{MS(\text{Model})}{MS(\text{Error})}$

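Overall Significance Rejection Rule
• Reject $H_0$ in favor of $H_a$ if $f_{calc}$ falls in the colored (upper-tail) area, i.e. $f_{calc} > F_{(k,\,n-k-1,\,1-\alpha)}$
• Reject $H_0$ for $H_a$ if P-value $= P(F > f_{calc}) < \alpha$

[Figure: F distribution starting at 0, with the "Do Not Reject $H_0$" region below the critical value $F_{(k,\,n-k-1,\,1-\alpha)}$ and the "Reject $H_0$" region above it]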

Testing Overall Significance Example
• You’re a Vet epidemiologist for the county cooperative. You gather the following data:

| Milk (lb) | Food (lb) | Weight (×100 lb) |
|---|---|---|
| 1 | 1 | 2 |
| 4 | 8 | 8 |
| 1 | 3 | 1 |
| 3 | 5 | 7 |
| 2 | 6 | 4 |
| 4 | 10 | 6 |

• Are cows’ food intake and weight both linearly related to cows’ milk yield? Test at the 5% significance level

### Testing Overall Significance Example

Model: $E(Y) = \beta_0 + \beta_1 X_1 + \beta_2 X_2$, where $X_1$ is food intake and $X_2$ is weight

Hypotheses

$H_0: \beta_1 = \beta_2 = 0$ (no linear relationship)

$H_a$: at least one coefficient is not 0

Testing Overall Significance SAS Computer Output

Analysis of Variance

| Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| Model | 2 (= k) | 9.24974 | 4.62487 | 55.44 (= MS(Model) / MS(Error)) | 0.0043 (P-value) |
| Error | 3 (= n - k - 1) | 0.25026 | 0.08342 | | |
| Corrected Total | 5 (= n - 1) | 9.50000 | | | |
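The F value and its P-value can be checked from the two mean squares. The sketch below relies on a closed form for the upper tail of the F distribution that is valid only when the numerator has 2 degrees of freedom, as it does here with k = 2. The function name `f_sf_num_df2` is hypothetical. For other values of k a statistical library or an F table would be needed.

```python
# Values from the SAS ANOVA table for the cow example
ms_model, ms_error = 4.62487, 0.08342
df_error = 3          # n - k - 1 = 6 - 2 - 1
alpha = 0.05          # significance level stated in the example


def f_sf_num_df2(f, d2):
    """P(F > f) for an F distribution with 2 numerator df and d2 denominator df."""
    return (d2 / (d2 + 2.0 * f)) ** (d2 / 2.0)


f_calc = ms_model / ms_error                 # F = MS(Model) / MS(Error)
p_value = f_sf_num_df2(f_calc, df_error)     # Pr > F
reject_h0 = p_value < alpha                  # rejection rule: P-value < alpha

print(f"F = {f_calc:.2f}, P-value = {p_value:.4f}, reject H0: {reject_h0}")
```

Since the P-value is below 0.05, the overall test rejects $H_0$: at least one of food intake and weight is linearly related to milk yield.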

Thinking Challenge
• k=18, n=20, R-squared=.95
• Would need an F-value >247.3 to reject the null hypothesis!
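To see how far this model falls short of that threshold, the F statistic can be written in terms of R-squared as $F = \dfrac{R^2/k}{(1-R^2)/(n-k-1)}$. The sketch below applies that standard identity to the numbers on the slide and also computes adjusted R-squared. The critical value 247.3 is taken from the slide, not recomputed.

```python
k, n, r_square = 18, 20, 0.95

df_error = n - k - 1                                     # only 1 error degree of freedom
f_calc = (r_square / k) / ((1 - r_square) / df_error)    # overall F from R-squared
adj_r_square = 1 - (1 - r_square) * (n - 1) / df_error   # adjusted R-squared

f_critical = 247.3                                       # value quoted on the slide
reject_h0 = f_calc > f_critical

print(f"F = {f_calc:.3f}, adjusted R-squared = {adj_r_square:.3f}, reject H0: {reject_h0}")
```

With 18 variables and only 20 observations, an R-squared of 0.95 gives an F close to 1 and an adjusted R-squared close to 0, so the high R-squared is no evidence of a useful model.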

Thinking challenge
• F-test for model is significant
• Does the model have the best available predictors for y?
• Are all the terms in the model important for predicting y?
• Or what?
