Chapter 11 (Continued): Regression and Correlation Methods
EPI809/Spring 2008
Linear Multiple Regression Model
Types of Regression Models
Learning Objectives
This part focuses on the linear multiple regression model. After studying the material in this section, you should be able to:
• Understand the general concepts behind the linear multiple regression model
• Fit a linear multiple regression model and interpret the computer output
• Perform model diagnosis: test the overall and partial significance of a multiple regression model, and perform residual analysis
• Describe linear regression pitfalls
Regression Modeling Steps
• Specify the model and estimate all unknown parameters
• Evaluate the model
• Use the model for prediction and estimation
Linear Regression Model Specification
• Decide what you want to do and select the dependent variable
• List all potential independent variables for your model
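Linear Multiple Regression Model
The relationship between one dependent variable and two or more independent variables is a linear function:

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \varepsilon_i$$

where $Y$ is the dependent (response) variable, $X_1, \ldots, X_k$ are the independent (explanatory) variables, $\beta_0$ is the population Y-intercept, $\beta_1, \ldots, \beta_k$ are the population slopes, and $\varepsilon$ is the random error.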
Linear Regression Assumptions
• The mean of the error distribution is 0
• The error distribution has constant variance
• The error distribution is normal
• The errors are independent
These assumptions are extremely important.
Population Multiple Regression Model (figure not reproduced: bivariate model)
Parameter Estimation: gather observations on all variables and estimate the model parameters.
Multiple Linear Regression Equations
The estimating equations are too complicated to solve by hand. Ouch!
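For reference, the least-squares estimates solve the normal equations $(X'X)\hat\beta = X'Y$, where $X$ is the design matrix with a leading column of 1s for the intercept. The sketch below is a minimal plain-Python illustration of that idea using Gauss-Jordan elimination with partial pivoting; the function name `ols_normal_equations` is hypothetical, and in practice PROC REG or a numerical library should do this work.

```python
def ols_normal_equations(X, y):
    """Hypothetical helper: least-squares coefficients for y = b0 + b1*x1 + ... + bk*xk.

    X is a list of rows [x1, ..., xk] (no intercept column); y is a list.
    Solves (X'X) b = X'y by Gauss-Jordan elimination with partial pivoting.
    """
    # Design matrix with a leading column of 1s for the intercept.
    A = [[1.0] + [float(v) for v in row] for row in X]
    p = len(A[0])
    # Build the normal equations X'X and X'y.
    xtx = [[sum(r[i] * r[j] for r in A) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(A, y)) for i in range(p)]
    # Augmented matrix [X'X | X'y].
    M = [xtx[i] + [xty[i]] for i in range(p)]
    for col in range(p):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        piv = M[col][col]
        M[col] = [v / piv for v in M[col]]
        # Eliminate this column from every other row.
        for r in range(p):
            if r != col:
                factor = M[r][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    # The last column now holds b0, b1, ..., bk.
    return [M[i][p] for i in range(p)]

# Illustrative data generated exactly from y = 1 + 2*x1 + 3*x2.
b = ols_normal_equations([[0, 0], [1, 0], [0, 1], [1, 1], [2, 3]], [1, 3, 4, 6, 14])
print(b)
```

Because the illustrative data lie exactly on a plane, the solver should recover the intercept 1 and slopes 2 and 3 up to floating-point error.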
Sample Multiple Regression Model (figure not reproduced: bivariate model)
Interpretation of Estimated Coefficients
1. Slope ($\hat\beta_k$)
• The estimated average of Y changes by $\hat\beta_k$ for each 1-unit increase in $X_k$, holding all other variables constant
• Example from the textbook: if $\hat\beta_1 = 0.13$, then systolic blood pressure (Y) is expected to increase by 0.13 for each 1-unit increase in birthweight ($X_1$), given a fixed age ($X_2$)
2. Y-intercept ($\hat\beta_0$): the predicted average value of Y when all $X_k$ are set to 0
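Variance of Error Estimate
• Assuming the model is correctly specified, the best (unbiased) estimator of the error variance $\sigma^2$ is

$$s^2 = \frac{SSE}{n-k-1}$$

where SSE is the error sum of squares, n is the number of observations, and k is the number of independent variables
• It is used in the formula for computing the standard errors $s_{\hat\beta_k}$ of the estimated coefficients
• The exact formula is too complicated to show here
• But a higher value of $s$ leads to a higher $s_{\hat\beta_k}$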
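A minimal sketch of the error-variance estimate, assuming the Error sum of squares reported in the SAS output for the cow example in this section (SSE = 0.25026 with n = 6 observations and k = 2 predictors). The helper name `error_variance` is hypothetical. The square root of $s^2$ should match the Root MSE that SAS prints.

```python
import math

def error_variance(sse, n, k):
    """Hypothetical helper: unbiased estimate s^2 = SSE / (n - k - 1)."""
    return sse / (n - k - 1)

# Cow example (values taken from the SAS ANOVA output): SSE = 0.25026, n = 6, k = 2.
s2 = error_variance(0.25026, n=6, k=2)   # the "Mean Square" for Error
s = math.sqrt(s2)                        # the "Root MSE"
print(s2, s)
```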
Parameter Estimation Example
• You're a veterinary epidemiologist for the county cooperative. You gather the following data:

Milk (lb)   Food (lb)   Weight (×100 lb)
1           1           2
4           8           8
1           3           1
3           5           7
2           6           4
4           10          6

• What is the linear relationship between cows' food intake, weight, and milk yield?
Model Specification Example
• Dependent variable: milk yield (lb)
• Independent variables: food intake (lb) and weight (×100 lb)
Sample SAS Code for Plotting

DATA Cow;                     /* read the data into SAS */
  INPUT Milk Food Weight @@;  /* @@ reads several observations per data line */
  CARDS;
1 1 2 4 8 8 1 3 1 3 5 7 2 6 4 4 10 6
;
RUN;

PROC GPLOT DATA=Cow;          /* scatter plots of milk against each predictor */
  PLOT Milk*Food Milk*Weight;
RUN;
QUIT;
Some Plots (figure not reproduced: milk yield plotted against food intake and against weight)
Sample SAS Code for Fitting a Multiple Linear Regression

PROC REG DATA=Cow;
  MODEL Milk = Food Weight;
RUN;
QUIT;
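Parameter Estimation: SAS Output

Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept   1    0.06397              0.25986          0.25      0.8214
Food        1    0.20492              0.05882          3.48      0.0399
weight      1    0.28049              0.06860          4.09      0.0264

The Parameter Estimate column gives $\hat\beta_0$, $\hat\beta_1$, $\hat\beta_2$; the Standard Error column gives $s_{\hat\beta_k}$; the Pr > |t| column gives the p-values.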
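To see where these numbers come from, the sketch below refits the cow model in plain Python using centered sums of squares and Cramer's rule for the two slopes. This is one way to solve the normal equations when k = 2, not necessarily how SAS computes them; `fit_two_predictors` is a hypothetical helper. The slope standard errors use the diagonal of $s^2$ times the inverse of the centered $X'X$ matrix.

```python
import math

# Cow data from the example: milk yield (lb), food intake (lb), weight (x100 lb).
milk = [1, 4, 1, 3, 2, 4]
food = [1, 8, 3, 5, 6, 10]
weight = [2, 8, 1, 7, 4, 6]

def fit_two_predictors(y, x1, x2):
    """Hypothetical helper: OLS for y = b0 + b1*x1 + b2*x2 via centered sums."""
    n = len(y)
    my, m1, m2 = sum(y) / n, sum(x1) / n, sum(x2) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    # Cramer's rule on the 2x2 centered normal equations.
    det = s11 * s22 - s12 ** 2
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    b0 = my - b1 * m1 - b2 * m2
    # Residual variance s^2 = SSE / (n - k - 1) with k = 2.
    sse = sum((c - (b0 + b1 * a + b2 * b)) ** 2 for a, b, c in zip(x1, x2, y))
    s2 = sse / (n - 3)
    # Slope standard errors: diagonal of s^2 * (centered X'X)^(-1).
    se_b1 = math.sqrt(s2 * s22 / det)
    se_b2 = math.sqrt(s2 * s11 / det)
    return b0, b1, b2, se_b1, se_b2

b0, b1, b2, se_b1, se_b2 = fit_two_predictors(milk, food, weight)
print(b0, b1, b2, se_b1, se_b2)
```

Running it should reproduce the Parameter Estimate column and the two slope standard errors above to about four decimal places; the intercept's standard error is left out for brevity.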
Parameter Estimation: SAS Output

Analysis of Variance
Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model             2    9.24974          4.62487       55.44     0.0043
Error             3    0.25026          0.08342
Corrected Total   5    9.50000

Root MSE         0.28883    R-Square   0.9737
Dependent Mean   2.50000    Adj R-Sq   0.9561
Coeff Var        11.55309

Root MSE is $s$, the estimate of the error standard deviation.
Interpretation of Coefficients: Solution
1. Slope ($\hat\beta_1$)
• Milk yield is expected to increase by 0.2049 lb for each 1 lb increase in food intake, holding weight constant
2. Slope ($\hat\beta_2$)
• Milk yield is expected to increase by 0.2805 lb for each 1-unit (×100 lb) increase in weight, holding food intake constant
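One way to check the "holding constant" interpretation is to plug values into the fitted equation and change one predictor at a time. This sketch uses the rounded SAS estimates; `predicted_milk` is a hypothetical helper, and the inputs (food 6 lb, weight 5 × 100 lb) are arbitrary points inside the observed data range.

```python
# Fitted coefficients as printed in the SAS output (rounded to 5 decimals).
B0, B1, B2 = 0.06397, 0.20492, 0.28049

def predicted_milk(food, weight):
    """Hypothetical helper: estimated mean milk yield (lb) from the fitted equation."""
    return B0 + B1 * food + B2 * weight

# Increase food intake by 1 lb while holding weight fixed at 5 (x100 lb).
d_food = predicted_milk(7, 5) - predicted_milk(6, 5)
# Increase weight by 1 unit (100 lb) while holding food intake fixed at 6 lb.
d_weight = predicted_milk(6, 6) - predicted_milk(6, 5)
print(round(d_food, 5), round(d_weight, 5))
```

Each difference equals the corresponding slope, whatever fixed value the other predictor is held at, because the model has no interaction term.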
Model Evaluation
Evaluating Multiple Regression Models
1. Examine variation measures
2. Test the significance of the overall model, portions of the overall model, and individual coefficients
3. Check the conditions of the multiple linear regression model using residuals
4. Assess multicollinearity among the independent variables
Variation Measures
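Coefficient of Multiple Determination
• Proportion of variation in Y "explained" by all X variables taken together:

$$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$$

where SSR is the regression (model) sum of squares, SSE is the error sum of squares, and SST = SSR + SSE is the total sum of squares.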
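A minimal sketch computing $R^2$ for the cow example from the sums of squares in the SAS output (SSR = 9.24974 for Model, SSE = 0.25026 for Error); `r_squared` is a hypothetical helper name.

```python
def r_squared(ssr, sse):
    """Hypothetical helper: R^2 = SSR / SST, where SST = SSR + SSE."""
    sst = ssr + sse
    return ssr / sst

# Sums of squares from the cow example's SAS output.
r2 = r_squared(9.24974, 0.25026)
print(round(r2, 4))
```

The two forms of the formula agree because SST = SSR + SSE; here SST is the Corrected Total of 9.5.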
Check Your Understanding
• If you add a variable to the model, how will that affect the R-squared value for the model?
Adjusted R²
• R² never decreases when a new X variable is added to the model (a disadvantage when comparing models)
• Solution: adjusted R²

$$R^2_{adj} = 1 - (1 - R^2)\,\frac{n-1}{n-k-1}$$

• Each additional variable reduces adjusted R², unless SSE goes down enough to compensate
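The adjusted R² penalizes each added predictor through the factor (n - 1)/(n - k - 1). A minimal sketch, assuming the cow example's sums of squares from the SAS output (n = 6, k = 2); `adjusted_r_squared` is a hypothetical helper. The result should agree with the Adj R-Sq that SAS reports.

```python
def adjusted_r_squared(r2, n, k):
    """Hypothetical helper: 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

# Cow example: R^2 = 1 - SSE/SST with SSE = 0.25026 and SST = 9.5.
r2 = 1 - 0.25026 / 9.5
adj = adjusted_r_squared(r2, n=6, k=2)
print(round(adj, 4))
```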
Check Your Understanding
Using the vet example: if you add a variable to the model, how will that affect R-squared and the estimate of the standard deviation of the error term?
Check Your Understanding: Solution
• Model with food intake only: S = 0.64126, R-Square = 0.8269, Adj R-Sq = 0.7836
• Model with food intake and weight: S = 0.28883, R-Square = 0.9737, Adj R-Sq = 0.9561
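The food-only numbers can be checked by refitting the simple regression of milk yield on food intake from the raw cow data. This is a sketch in plain Python (the variable names are illustrative, not from the source); it should reproduce S, R-Square, and Adj R-Sq for the one-predictor model to about four decimal places.

```python
import math

# Cow data: milk yield (lb) and food intake (lb).
milk = [1, 4, 1, 3, 2, 4]
food = [1, 8, 3, 5, 6, 10]

n, k = len(milk), 1
my, mx = sum(milk) / n, sum(food) / n
sxx = sum((x - mx) ** 2 for x in food)
sxy = sum((x - mx) * (y - my) for x, y in zip(food, milk))
b1 = sxy / sxx                 # least-squares slope
b0 = my - b1 * mx              # least-squares intercept

sst = sum((y - my) ** 2 for y in milk)
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(food, milk))
s = math.sqrt(sse / (n - k - 1))                    # "Root MSE" (S)
r2 = 1 - sse / sst                                  # R-Square
adj_r2 = 1 - (sse / (n - k - 1)) / (sst / (n - 1))  # Adj R-Sq
print(round(s, 5), round(r2, 4), round(adj_r2, 4))
```

Adding weight cuts S from about 0.64 to about 0.29 and raises both R² and adjusted R².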
Thinking Challenge
• 18 variables
• n = 20
• R-squared = 0.95
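Plugging the challenge numbers into the adjusted R² formula, $1 - (1 - R^2)(n-1)/(n-k-1)$, shows why a high R² with almost as many predictors as observations is suspicious: with only n - k - 1 = 1 residual degree of freedom, the penalty factor is 19. A minimal arithmetic sketch:

```python
# Thinking-challenge values: k = 18 predictors, n = 20 observations, R^2 = 0.95.
n, k, r2 = 20, 18, 0.95

# Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1); here n - k - 1 = 1.
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
print(round(adj_r2, 2))
```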
Testing Overall Significance of Regression Parameters
Testing Overall Significance
• Tests whether there is a linear relationship between all X variables together and Y
• Hypotheses:
  • $H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0$ (no linear relationship)
  • $H_a$: at least one coefficient is not 0 (at least one X variable is linearly related to Y)
• Uses the F test statistic
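Overall Significance Test Statistic
• Test statistic:

$$F = \frac{SSR/k}{SSE/(n-k-1)} = \frac{MSR}{MSE}$$

which, under $H_0$, follows an F distribution with k and n - k - 1 degrees of freedom
• Denotation in SAS: F Value = Mean Square for Model / Mean Square for Error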
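A minimal sketch of the test statistic, assuming the sums of squares from the cow example's SAS output (SSR = 9.24974, SSE = 0.25026, n = 6, k = 2); `overall_f` is a hypothetical helper. It should reproduce the F Value that SAS reports.

```python
def overall_f(ssr, sse, n, k):
    """Hypothetical helper: overall F = (SSR/k) / (SSE/(n - k - 1))."""
    msr = ssr / k              # mean square for the model
    mse = sse / (n - k - 1)    # mean square for error
    return msr / mse

f = overall_f(9.24974, 0.25026, n=6, k=2)
print(round(f, 2))
```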
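Overall Significance: Rejection Rule
• Reject $H_0$ in favor of $H_a$ if $F_{calc} > F(k, n-k-1, 1-\alpha)$, i.e. if $F_{calc}$ falls in the shaded rejection region in the upper tail
• Equivalently, reject $H_0$ in favor of $H_a$ if the p-value $= P(F > F_{calc}) < \alpha$
(Figure not reproduced: F density with the "Do not reject $H_0$" region from 0 to $F(k, n-k-1, 1-\alpha)$ and the "Reject $H_0$" region beyond it.)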
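General F quantiles and tail probabilities need a statistical library or an F table. For the special case of k = 2 numerator degrees of freedom, though, the upper tail has a closed form, $P(F > f) = (1 + 2f/d_2)^{-d_2/2}$ with $d_2 = n - k - 1$, which makes the rejection rule easy to sketch. The helpers below are hypothetical and valid only when k = 2; applied to the cow example ($d_2 = 3$, $F_{calc} = 55.44$) they should give a critical value of about 9.55 and a p-value of about 0.0043.

```python
def f_pvalue_df1_2(f, df2):
    """Hypothetical helper: P(F > f) for an F(2, df2) distribution (closed form)."""
    return (1.0 + 2.0 * f / df2) ** (-df2 / 2.0)

def f_critical_df1_2(alpha, df2):
    """Hypothetical helper: critical value F(2, df2, 1 - alpha).

    Obtained by solving f_pvalue_df1_2(f, df2) = alpha for f.
    """
    return (df2 / 2.0) * (alpha ** (-2.0 / df2) - 1.0)

# Cow example: k = 2, n = 6, so df2 = n - k - 1 = 3; alpha = 0.05.
crit = f_critical_df1_2(0.05, 3)
p = f_pvalue_df1_2(55.44, 3)
reject = 55.44 > crit   # same decision as p < 0.05
print(round(crit, 2), round(p, 4), reject)
```

Both forms of the rule lead to the same decision, because the p-value falls below α exactly when $F_{calc}$ exceeds the critical value.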
Testing Overall Significance Example
• You're a veterinary epidemiologist for the county cooperative. Use the milk yield, food intake, and weight data from the parameter estimation example.
• Are cows' food intake and weight, taken together, linearly related to milk yield? Test at the 5% significance level.
Testing Overall Significance Example
Model: $\text{Milk} = \beta_0 + \beta_1\,\text{Food} + \beta_2\,\text{Weight} + \varepsilon$
Hypotheses:
• $H_0: \beta_1 = \beta_2 = 0$ (no linear relationship)
• $H_a$: at least one coefficient is not 0
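Testing Overall Significance: SAS Computer Output

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model             2    9.24974          4.62487       55.44     0.0043
Error             3    0.25026          0.08342
Corrected Total   5    9.50000

• DF: Model = k, Error = n - k - 1, Corrected Total = n - 1
• Mean Square for Model = MS(Model), Mean Square for Error = MS(Error), and F Value = MS(Model)/MS(Error)
• Pr > F is the p-value
Since the p-value 0.0043 < 0.05, reject $H_0$ at the 5% level: at least one of food intake and weight is linearly related to milk yield.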
Thinking Challenge
• k = 18, n = 20, R-squared = 0.95
• You would need an F value greater than 247.3 (the 5% critical value with 18 and 1 degrees of freedom) to reject the null hypothesis!
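Because SSR = R²·SST and SSE = (1 - R²)·SST, the overall F statistic can be written in terms of R² alone: $F = \frac{R^2/k}{(1-R^2)/(n-k-1)}$. A minimal sketch with the challenge numbers (the helper name `f_from_r2` is hypothetical; the critical value 247.3 is the one quoted on the slide) shows that F is only about 1.06, far below what is needed to reject $H_0$.

```python
def f_from_r2(r2, n, k):
    """Hypothetical helper: overall F statistic expressed through R^2."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))

f = f_from_r2(0.95, n=20, k=18)
critical = 247.3   # critical value quoted on the slide
print(round(f, 3), f > critical)
```

As a consistency check, the same formula applied to the cow example (R² = 9.24974/9.5, n = 6, k = 2) gives back the F Value of about 55.44.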
Thinking Challenge
• The F-test for the model is significant
• Does the model have the best available predictors for y?
• Are all the terms in the model important for predicting y?
• Or what?