
Chapter 11 (Continued)

Regression and Correlation methods

EPI809/Spring 2008

Types of Regression Models

Learning Objectives:

This part focuses on the linear multiple regression model. After studying the materials in this section, you should be able to:

  • Understand the general concepts behind the linear multiple regression model
  • Fit a linear multiple regression model and interpret the computer output
  • Perform model diagnosis: test the overall and partial significance of a multiple regression model, and perform residual analysis
  • Describe linear regression pitfalls

Regression Modeling Steps
  • Specify the model and estimate all unknown parameters
  • Evaluate Model
  • Use Model for Prediction & Estimation


Linear regression Model specification:

Decide what you want to do and select the dependent variable

List all potential independent variables for your model

Linear Multiple Regression Model

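1. Relationship between 1 dependent & 2 or more independent variables is a linear function:

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \varepsilon_i$$

  • $Y_i$: dependent (response) variable
  • $X_{1i}, \ldots, X_{ki}$: independent (explanatory) variables
  • $\beta_0$: population Y-intercept
  • $\beta_1, \ldots, \beta_k$: population slopes
  • $\varepsilon_i$: random error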

Linear Regression Assumptions
  • Mean of Distribution of Error Is 0
  • Distribution of Error Has Constant Variance
  • Distribution of Error is Normal
  • Errors Are Independent

Extremely Important

Population Multiple Regression Model

Bivariate model (two independent variables): $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$


Parameter Estimation: You gather the observations for all variables and estimate the model parameters

Multiple Linear Regression Equations

Too complicated by hand!

Ouch!
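
The estimating equations are unwieldy when written out term by term, but they have a compact matrix form. The block below is a standard textbook result, not a transcription of the slide. It assumes $X$ is the $n \times (k+1)$ design matrix whose first column is all ones and $\mathbf{y}$ is the vector of observed responses.

```latex
% Least-squares normal equations and their solution
X^{\top} X \, \hat{\boldsymbol{\beta}} = X^{\top} \mathbf{y}
\qquad\Longrightarrow\qquad
\hat{\boldsymbol{\beta}}
  = \begin{pmatrix} \hat\beta_0 \\ \hat\beta_1 \\ \vdots \\ \hat\beta_k \end{pmatrix}
  = (X^{\top} X)^{-1} X^{\top} \mathbf{y}
```

In practice the system is solved by software such as SAS rather than by hand.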

Sample Multiple Regression Model

Bivariate model (two independent variables): $\hat{Y}_i = \hat\beta_0 + \hat\beta_1 X_{1i} + \hat\beta_2 X_{2i}$

Interpretation of Estimated Coefficients

1. Slope ($\hat\beta_k$)

  • Estimated average Y changes by $\hat\beta_k$ for each 1-unit increase in $X_k$, holding all other variables constant
    • Example from textbook: if $\hat\beta_1 = 0.13$, then the systolic blood pressure (Y) is expected to increase by 0.13 for each 1-unit increase in birthweight ($X_1$), given fixed age ($X_2$)

2. Y-intercept ($\hat\beta_0$): predicted average value of Y when all $X_k$'s are set to 0

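Variance of Error Estimate
  • Assuming model is correctly specified…
  • Best (unbiased) estimator of $\sigma^2$ is $s^2 = \dfrac{SSE}{n-k-1}$ (the mean squared error)
  • It is used in the formula for computing the standard errors $s_{\hat\beta_k}$ of the coefficient estimates
    • Exact formula is too complicated to show
    • But a higher value for s leads to higher $s_{\hat\beta_k}$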
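
For readers who want the formula anyway, the coefficient standard errors have a compact matrix form. This is a standard result stated here as a supplement, not something taken from the slide. It assumes $X$ is the design matrix with a leading column of ones and indexes coefficients by $j = 0, 1, \ldots, k$.

```latex
% Error variance estimate and coefficient standard errors
s^2 = \frac{SSE}{n-k-1},
\qquad
s_{\hat\beta_j} = s \sqrt{\left[ (X^{\top} X)^{-1} \right]_{jj}}
```

Because $s$ multiplies every standard error, a larger error variance estimate inflates all of them proportionally.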

Parameter Estimation Example
  • You’re a Vet epidemiologist for the county cooperative. You gather the following data:

Milk (lb)   Food (lb)   Weight (100 lb)
1           1           2
4           8           8
1           3           1
3           5           7
2           6           4
4           10          6

  • What is the linear relationship between cows’ food intake, weight, and milk yield?


Model Specification Example

Dependent variable is milk yield (lb)

Independent variables for our model are food intake (lb) and weight (×100 lb)

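Sample SAS code for plotting the data

data Cow;                     /* Read the data into SAS */
  input Milk Food weight @@;  /* @@ reads several observations from each line */
  cards;
1 1 2  4 8 8  1 3 1
3 5 7  2 6 4  4 10 6
;
run;

proc gplot data=Cow;          /* Scatter plots of milk yield against each predictor */
  plot milk*food milk*weight;
run;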

Some plots

Sample SAS code for fitting a multiple linear regression

PROC REG data=Cow;
  model milk = food weight;   /* Regress milk yield on food intake and weight */
run;

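Parameter Estimation SAS Output

Parameter Estimates

Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept   1    0.06397              0.25986          0.25      0.8214
Food        1    0.20492              0.05882          3.48      0.0399
weight      1    0.28049              0.06860          4.09      0.0264

The Parameter Estimate column holds $\hat\beta_0$, $\hat\beta_1$, and $\hat\beta_2$; the Standard Error column holds the corresponding $s_{\hat\beta_p}$.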
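
As a cross-check outside SAS, the estimates and standard errors can be reproduced from the six observations with the normal equations. The following is a minimal sketch in standard-library Python, not the method PROC REG uses internally. The helper `solve` and all variable names are illustrative choices.

```python
from math import sqrt

milk   = [1, 4, 1, 3, 2, 4]     # Y: milk yield (lb)
food   = [1, 8, 3, 5, 6, 10]    # X1: food intake (lb)
weight = [2, 8, 1, 7, 4, 6]     # X2: weight (x100 lb)

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination
    with partial pivoting (illustrative helper; inputs are not modified)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

# Design matrix with a leading column of ones for the intercept.
X = [[1.0, f, w] for f, w in zip(food, weight)]
p = 3                                               # number of coefficients

# Normal equations: (X'X) b = X'y
xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
xty = [sum(row[i] * y for row, y in zip(X, milk)) for i in range(p)]
b0, b1, b2 = solve(xtx, xty)

# Error variance estimate s^2 = SSE / (n - k - 1), with k = 2 predictors.
resid = [y - (b0 + b1 * f + b2 * w) for y, f, w in zip(milk, food, weight)]
mse = sum(e * e for e in resid) / (len(milk) - p)

# Standard errors: s * sqrt of the diagonal of (X'X)^-1.
# Solving against the j-th unit vector yields column j of the inverse.
se = []
for j in range(p):
    unit = [1.0 if i == j else 0.0 for i in range(p)]
    se.append(sqrt(mse * solve(xtx, unit)[j]))

for name, est, err in zip(["Intercept", "Food", "weight"], [b0, b1, b2], se):
    print(f"{name:10s} estimate={est:.5f}  std.err={err:.5f}  t={est / err:.2f}")
```

The printed estimates and standard errors should agree with the SAS table to the displayed precision.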

Parameter Estimation SAS Output

Analysis of Variance

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model             2    9.24974          4.62487       55.44     0.0043
Error             3    0.25026          0.08342
Corrected Total   5    9.50000

Root MSE         0.28883    R-Square   0.9737
Dependent Mean   2.50000    Adj R-Sq   0.9561
Coeff Var        11.55309

Root MSE is S, the estimate of the standard deviation of the error term.
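
The summary statistics in this output follow directly from the fitted values. As a rough sketch, the Python below plugs the coefficient estimates printed by SAS (rounded to five decimals) into the model and recomputes the sums of squares. The variable names are illustrative, and tiny differences from SAS in the last digit are possible because of that rounding.

```python
from math import sqrt

milk   = [1, 4, 1, 3, 2, 4]     # Y: milk yield (lb)
food   = [1, 8, 3, 5, 6, 10]    # X1: food intake (lb)
weight = [2, 8, 1, 7, 4, 6]     # X2: weight (x100 lb)

# Estimates as printed in the SAS Parameter Estimates table.
b0, b1, b2 = 0.06397, 0.20492, 0.28049

n, k = len(milk), 2                       # observations, predictors
y_bar = sum(milk) / n                     # "Dependent Mean"
fitted = [b0 + b1 * f + b2 * w for f, w in zip(food, weight)]

sse = sum((y - yh) ** 2 for y, yh in zip(milk, fitted))   # SS(Error)
sst = sum((y - y_bar) ** 2 for y in milk)                 # SS(Corrected Total)
ssm = sst - sse                                           # SS(Model)

mse = sse / (n - k - 1)                   # Mean Square Error
root_mse = sqrt(mse)                      # S, "Root MSE"
r2 = ssm / sst                            # R-Square
adj_r2 = 1 - mse / (sst / (n - 1))        # Adj R-Sq
f_value = (ssm / k) / mse                 # F Value
coeff_var = 100 * root_mse / y_bar        # Coeff Var

print(f"SS(Model)={ssm:.5f}  SS(Error)={sse:.5f}  SS(Total)={sst:.5f}")
print(f"Root MSE={root_mse:.5f}  R-Square={r2:.4f}  Adj R-Sq={adj_r2:.4f}")
print(f"F Value={f_value:.2f}  Coeff Var={coeff_var:.5f}")
```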

Interpretation of Coefficients Solution

1. Slope ($\hat\beta_1$)

  • Milk yield is expected to increase by 0.2049 lb for each 1 lb increase in food intake, holding weight constant

2. Slope ($\hat\beta_2$)

  • Milk yield is expected to increase by 0.2805 lb for each 1-unit (×100 lb) increase in weight, holding food intake constant


Model Evaluation

Evaluating Multiple Regression Models

1. Examine Variation Measures

2. Test Significance of Overall Model, portions of overall model and Individual Coefficients

3. Check conditions of a multiple linear regression model using Residuals

4. Assess multicollinearity among independent variables


Variation Measures

Coefficient of Multiple Determination
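  • Proportion of variation in Y ‘explained’ by all X variables taken together: $R^2 = \dfrac{SS(\text{Model})}{SS(\text{Total})} = 1 - \dfrac{SS(\text{Error})}{SS(\text{Total})}$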

Check Your Understanding
  • If you add a variable to the model, how will that affect the R-squared value for the model?

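Adjusted $R^2$
  • $R^2$ never decreases when a new X variable is added to the model (a disadvantage when comparing models)
  • Solution: adjusted $R^2$, which penalizes the number of predictors k: $R^2_{adj} = 1 - \dfrac{SSE/(n-k-1)}{SST/(n-1)}$
    • Each additional variable reduces adjusted $R^2$, unless SSE goes down enough to compensate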
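
The penalty can be seen numerically by rewriting adjusted R-squared in terms of R-squared, n, and k. The snippet below is a small illustrative sketch. The function name `adjusted_r2` is made up for this example, and the inputs are the rounded values from the SAS output shown earlier (R-Square 0.9737, n = 6, k = 2).

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared from R-squared, sample size n, and k predictors.

    Equivalent to 1 - [SSE/(n-k-1)] / [SST/(n-1)], since SSE/SST = 1 - r2.
    """
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Two-predictor cow model: close to the Adj R-Sq of 0.9561 reported by SAS
# (small differences arise because R-squared is rounded here).
print(adjusted_r2(0.9737, n=6, k=2))

# Same R-squared with one more predictor: the adjusted value drops,
# because the extra variable did not reduce SSE at all.
print(adjusted_r2(0.9737, n=6, k=3))
```

Holding R-squared fixed while raising k mimics adding a variable that explains nothing new, which is exactly the case the adjustment is meant to penalize.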

Check Your Understanding

Using the Vet example: If you add a variable to the model, How will that affect R-squared and the estimate of standard deviation (of the error term)?

Check Your Understanding: Solution
  • Model with food intake only: S = 0.64126, R-Square = 0.8269, Adj R-Sq = 0.7836
  • Model with food intake and weight: S = 0.28883, R-Square = 0.9737, Adj R-Sq = 0.9561
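
The food-only figures can be verified with the closed-form simple regression formulas. This is a short sketch in standard-library Python with illustrative variable names. It fits only the one-predictor model, for comparison against the two-predictor values quoted above.

```python
from math import sqrt

milk = [1, 4, 1, 3, 2, 4]     # Y: milk yield (lb)
food = [1, 8, 3, 5, 6, 10]    # X: food intake (lb)

n, k = len(milk), 1
x_bar, y_bar = sum(food) / n, sum(milk) / n

# Corrected sums of squares and cross-products.
sxx = sum((x - x_bar) ** 2 for x in food)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(food, milk))
sst = sum((y - y_bar) ** 2 for y in milk)

slope = sxy / sxx
intercept = y_bar - slope * x_bar

sse = sst - slope * sxy                       # SS(Error) for simple regression
s = sqrt(sse / (n - k - 1))                   # estimate of the error std. dev.
r2 = 1 - sse / sst
adj_r2 = 1 - (sse / (n - k - 1)) / (sst / (n - 1))

print(f"Food only: S={s:.5f}  R-Square={r2:.4f}  Adj R-Sq={adj_r2:.4f}")
```

Adding weight lowers S and raises both R-squared measures, so in this example the second predictor earns its place.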

Thinking challenge
  • 18 variables
  • N=20
  • R-squared=.95

Testing Overall Significance
  • Tests if there is a linear relationship between all X variables together & Y
  • Hypotheses
    • H0: $\beta_1 = \beta_2 = \cdots = \beta_k = 0$
      • No linear relationship
    • Ha: At least one coefficient is not 0
      • At least one X variable linearly affects Y
  • Uses F test statistic

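Overall Significance Test Statistic
  • Test statistic: $F = \dfrac{SS(\text{Model})/k}{SS(\text{Error})/(n-k-1)}$, with k and n - k - 1 degrees of freedom
  • Denotation in SAS: F Value = MS(Model) / MS(Error)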

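Overall Significance Rejection Rule
  • Reject H0 in favor of Ha if $f_{calc}$ falls in the rejection region, that is, if $f_{calc} > F_{(k,\, n-k-1,\, 1-\alpha)}$
  • Equivalently, reject H0 in favor of Ha if P-value = $P(F > f_{calc}) < \alpha$

[Figure: F distribution starting at 0; "Do Not Reject H0" to the left of the critical value $F_{(k,\, n-k-1,\, 1-\alpha)}$, "Reject H0" in the shaded right tail]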

Testing Overall Significance Example
  • You’re a Vet epidemiologist for the county cooperative. You gather the following data:

Milk (lb)   Food (lb)   Weight (100 lb)
1           1           2
4           8           8
1           3           1
3           5           7
2           6           4
4           10          6

  • Are cows’ food intake and weight, taken together, linearly related to cows’ milk yield? Test at the 5% significance level


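Testing Overall Significance Example

Model: $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$, where Y is milk yield, $X_1$ is food intake, and $X_2$ is weight

Hypotheses

H0: $\beta_1 = \beta_2 = 0$ (No Linear Relationship)

Ha: At Least One Coefficient Is Not 0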

Testing Overall Significance: SAS Computer Output

Analysis of Variance

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model             2    9.24974          4.62487       55.44     0.0043
Error             3    0.25026          0.08342
Corrected Total   5    9.50000

Reading the table: Model DF = k, Error DF = n - k - 1, Corrected Total DF = n - 1, F Value = MS(Model) / MS(Error), and Pr > F is the P-value.
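
The decision can be traced step by step from the table entries. The sketch below recomputes the F statistic and then the P-value. Note the assumption in the P-value line: it uses a closed form for the upper tail of the F distribution that holds only when the numerator degrees of freedom equal 2, as they do here. For general k a statistics package would be needed. Variable names are illustrative.

```python
alpha = 0.05                 # significance level from the example
n, k = 6, 2                  # observations and predictors

ss_model, ss_error = 9.24974, 0.25026   # sums of squares from the SAS table

df_model = k                 # numerator degrees of freedom
df_error = n - k - 1         # denominator degrees of freedom

ms_model = ss_model / df_model
ms_error = ss_error / df_error
f_calc = ms_model / ms_error

# Upper-tail probability P(F > f_calc).
# Special case: with 2 numerator d.f. and v denominator d.f.,
# P(F > f) = (1 + 2 f / v) ** (-v / 2). Not valid for other numerator d.f.
assert df_model == 2
p_value = (1 + 2 * f_calc / df_error) ** (-df_error / 2)

reject_h0 = p_value < alpha
print(f"F = {f_calc:.2f}, P-value = {p_value:.4f}, reject H0: {reject_h0}")
```

Since the P-value is below 0.05, H0 is rejected: at least one of food intake and weight is linearly related to milk yield.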

Thinking Challenge
  • k=18, n=20, R-squared=.95
  • Would need an F-value >247.3 to reject the null hypothesis!
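
To see how far this model falls short, the F statistic can be written in terms of R-squared alone. The function below is an illustrative sketch with a made-up name, `f_from_r2`. The critical value 247.3 is the one quoted on the slide, not computed here.

```python
def f_from_r2(r2, n, k):
    """Overall F statistic expressed through R-squared:
    F = (R^2 / k) / ((1 - R^2) / (n - k - 1))."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))

f_calc = f_from_r2(0.95, n=20, k=18)
critical = 247.3             # value quoted on the slide for 18 and 1 d.f.

print(f"F = {f_calc:.3f}; exceeds critical value: {f_calc > critical}")
```

With only one error degree of freedom left, an R-squared of 0.95 yields an F statistic barely above 1, so the apparently excellent fit is not statistically significant.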

Thinking challenge
  • F-test for model is significant
    • Does the model have the best available predictors for y?
    • Are all the terms in the model important for predicting y?
    • Or what?
