
Linear Regression



Presentation Transcript


  1. Linear Regression (Lesson - 06/A) Building a Model for the Relationship Dr. C. Ertuna

  2. Dependent and Independent Variables A dependent variable is the variable to be predicted or explained in a regression model. This variable is assumed to be functionally related to the independent variable.

  3. Dependent and Independent Variables An independent variable is the variable related to the dependent variable in a regression equation. The independent variable is used in a regression model to estimate the value of the dependent variable.

  4. Two Variable Relationships [Scatter plot of Y vs. X, panel (a): Linear]

  5. Two Variable Relationships [Scatter plot of Y vs. X, panel (b): Linear]

  6. Two Variable Relationships [Scatter plot of Y vs. X, panel (c): Curvilinear]

  7. Two Variable Relationships [Scatter plot of Y vs. X, panel (d): Curvilinear]

  8. Two Variable Relationships [Scatter plot of Y vs. X, panel (e): No Relationship]

  9. Correlation The correlation coefficient is a quantitative measure of the strength of the linear relationship between two variables. The correlation ranges from +1.0 to −1.0. A correlation of ±1.0 indicates a perfect linear relationship, whereas a correlation of 0 indicates no linear relationship.

  10. Correlation SAMPLE CORRELATION COEFFICIENT: r = Σ(x − x̄)(y − ȳ) / √[Σ(x − x̄)² · Σ(y − ȳ)²] where: r = Sample correlation coefficient, n = Sample size, x = Value of the independent variable, y = Value of the dependent variable, x̄ and ȳ = Sample means of x and y
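The sample correlation coefficient can be computed directly from its definition. A minimal Python sketch (the function name and the tiny dataset are mine, for illustration only, not from the lesson):

```python
import math

def sample_correlation(x, y):
    """Pearson sample correlation coefficient r between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Small illustrative dataset (assumed, not from the slides)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = sample_correlation(x, y)  # ≈ 0.775
```

A perfectly linear dataset, e.g. y = 2x, gives r = 1.0, matching the slide's statement about ±1.0.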

  11. Correlation TEST STATISTIC FOR CORRELATION: t = r / √[(1 − r²) / (n − 2)], with df = n − 2 where: t = Number of standard deviations that r is away from 0, r = Simple correlation coefficient, n = Sample size
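The test statistic on this slide is a one-liner in Python. A sketch (the illustrative values of r and n are assumed, not from the lesson):

```python
import math

def correlation_t_stat(r, n):
    """t = r / sqrt((1 - r^2) / (n - 2)); compare against a t distribution
    with n - 2 degrees of freedom to test H0: population correlation = 0."""
    return r / math.sqrt((1 - r ** 2) / (n - 2))

# e.g. r = 0.7746 computed from n = 5 observations (illustrative)
t = correlation_t_stat(0.7746, 5)  # ≈ 2.12
```

With only 5 observations the critical t value is large, so even a sizeable r may not be statistically significant.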

  12. Correlation [Diagram: three variables A, B, C] Spurious correlation occurs when there is a correlation between two otherwise unrelated variables.

  13. Linear Regression Analysis Simple Linear Regression analyzes the linear relationship that exists between a dependent variable and a single independent variable. Multiple Linear Regression analyzes the linear relationship that exists between a dependent variable and two or more independent variables.

  14. Linear Regression Analysis SIMPLE LINEAR REGRESSION MODEL (POPULATION MODEL): y = β₀ + β₁x + ε where: y = Value of the dependent variable, x = Value of the independent variable, β₀ = Population's y-intercept, β₁ = Slope of the population regression line, ε = Error term, or residual

  15. Linear Regression Analysis The linear regression model has four assumptions: • The mean of the dependent variable (y), for all specified values of the independent variable, can be connected by a straight line (linearity) called the population regression model. • The error terms, εᵢ, are statistically independent of one another (for time-series data). • The distribution of the error terms, ε, is normal. • The distributions of possible εᵢ values have equal variances for all values of x.

  16. Linear Regression Analysis REGRESSION COEFFICIENTS In the simple regression model there are two coefficients: the intercept and the slope. In the multiple regression model there are more than two coefficients: the intercept and a regression coefficient for each independent variable.

  17. Linear Regression Analysis The interpretation of the regression coefficient is that it gives the average change in the dependent variable for a unit change in the independent variable. The slope coefficient may be positive or negative, depending on the relationship between the dependent and the particular independent variable.

  18. Linear Regression Analysis A residual is the difference between the actual value of the dependent variable and the value predicted by the regression model.

  19. Linear Regression Analysis The least squares criterion is used for determining a regression line that minimizes the sum of squared residuals (SSE).

  20. Linear Regression Analysis [Scatter plot: Sales in Thousands (Y) vs. Years with Company (X); at X = 4, the actual value is y = 312 and the predicted value is ŷ = 390, so Residual = 312 − 390 = −78]

  21. Linear Regression Analysis ESTIMATED REGRESSION MODEL (SAMPLE MODEL): ŷ = b₀ + b₁x where: ŷ = Estimated, or predicted, y value, b₀ = Unbiased estimate of the regression intercept, b₁ = Unbiased estimate of the regression slope, x = Value of the independent variable
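The sample coefficients can be estimated with the standard least squares formulas. A minimal sketch (function name and data are assumed for illustration):

```python
def least_squares(x, y):
    """Estimate intercept b0 and slope b1 by ordinary least squares:
    b1 = sum((x - mean_x)(y - mean_y)) / sum((x - mean_x)^2), b0 = mean_y - b1 * mean_x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b1 = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
          / sum((xi - mx) ** 2 for xi in x))
    b0 = my - b1 * mx
    return b0, b1

# Illustrative data (assumed, not from the slides)
b0, b1 = least_squares([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])  # b0 = 2.2, b1 = 0.6
```

Note that the fitted line passes through (x̄, ȳ) by construction, which is one of the least squares properties listed on a later slide.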

  22. Least Squares Regression Properties • The sum of the residuals from the least squares regression line is 0. • The sum of the squared residuals is a minimum. • The simple regression line always passes through the mean of the y variable and the mean of the x variable. • The least squares coefficients b₀ and b₁ are unbiased estimates of β₀ and β₁.

  23. Linear Regression Analysis SUM OF RESIDUALS: Σ(y − ŷ) = 0 SUM OF SQUARED RESIDUALS: SSE = Σ(y − ŷ)²

  24. Linear Regression Analysis TOTAL SUM OF SQUARES: TSS = Σ(y − ȳ)² where: TSS = Total sum of squares, n = Sample size, y = Values of the dependent variable, ȳ = Average value of the dependent variable

  25. Linear Regression Analysis SUM OF SQUARED ERROR (RESIDUALS): SSE = Σ(y − ŷ)² where: SSE = Sum of squared error, n = Sample size, y = Values of the dependent variable, ŷ = Estimated value for the average of y for the given x value

  26. Linear Regression Analysis SUM OF SQUARES REGRESSION: SSR = Σ(ŷ − ȳ)² where: SSR = Sum of squares regression, ȳ = Average value of the dependent variable, y = Values of the dependent variable, ŷ = Estimated value for the average of y for the given x value

  27. Linear Regression Analysis SUMS OF SQUARES: TSS = SSR + SSE, that is, Σ(y − ȳ)² = Σ(ŷ − ȳ)² + Σ(y − ŷ)²

  28. Linear Regression Analysis The coefficient of determination is the portion of the total variation in the dependent variable that is explained by its relationship with the independent variable. The coefficient of determination is also called R-squared and is denoted as R².

  29. Linear Regression Analysis COEFFICIENT OF DETERMINATION (R²): R² = SSR / TSS = 1 − SSE / TSS
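The sum-of-squares decomposition and R² can be verified numerically. A sketch using a small assumed dataset and a fitted line ŷ = 2.2 + 0.6x (values chosen for illustration, not from the lesson):

```python
def sums_of_squares(x, y, b0, b1):
    """Return TSS, SSR, SSE, and R^2 for the fitted line y_hat = b0 + b1 * x."""
    my = sum(y) / len(y)
    y_hat = [b0 + b1 * xi for xi in x]
    tss = sum((yi - my) ** 2 for yi in y)          # total variation
    ssr = sum((yh - my) ** 2 for yh in y_hat)      # explained variation
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # unexplained
    return tss, ssr, sse, ssr / tss

# Illustrative data and least squares coefficients (assumed)
tss, ssr, sse, r2 = sums_of_squares([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], 2.2, 0.6)
# tss = 6.0, ssr = 3.6, sse = 2.4, so TSS = SSR + SSE and r2 = 0.6
```

This also illustrates the identity TSS = SSR + SSE from the preceding slide.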

  30. Regression Analysis COEFFICIENT OF DETERMINATION, SINGLE INDEPENDENT VARIABLE CASE: R² = r² where: R² = Coefficient of determination, r = Simple correlation coefficient

  31. Linear Regression Analysis MEAN SQUARE REGRESSION: MSR = SSR / k where: SSR = Sum of squares regression, k = Number of independent variables in the model

  32. Linear Regression Analysis MEAN SQUARE ERROR: MSE = SSE / (n − k − 1) where: SSE = Sum of squares error, n = Sample size, k = Number of independent variables in the model
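MSR and MSE together give the F statistic of the regression ANOVA table. A sketch with illustrative sums of squares (SSR = 3.6, SSE = 2.4, n = 5, k = 1 are assumed values, not from the lesson):

```python
def anova_f(ssr, sse, n, k):
    """MSR = SSR / k, MSE = SSE / (n - k - 1), F = MSR / MSE
    (F tested against an F distribution with k and n - k - 1 df)."""
    msr = ssr / k
    mse = sse / (n - k - 1)
    return msr, mse, msr / mse

msr, mse, f = anova_f(3.6, 2.4, 5, 1)  # msr = 3.6, mse = 0.8, f = 4.5
```

In simple regression (k = 1) this F equals the square of the t statistic for the slope, which is a convenient cross-check on software output.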

  33. Regression Steps • Develop a scatter plot of y against each of the x variables. Check for linearity. • Compute the least squares regression line for the sample data (save the residuals). • Run independence (if necessary), normality, and equality-of-variance tests on the residuals. • Check the significance of the coefficients. • Check the significance of the overall regression. • Check the magnitude of the coefficient of determination.

  34. Running Regression on SPSS Analyze / Regression / Linear / • Method: Stepwise • Statistics: Estimates; Model Fit; Collinearity; Covariance Matrix; Part and Partial Correlation; Descriptives; Casewise diagnostics (3) • Save: Unstandardized Residuals; Unstandardized Predicted; Cook's (Distances); Standardized DfBeta

  35. Residual Analysis Before using a regression model for description or prediction, check whether the assumptions concerning the normal distribution, independence, and constant variance of the error terms have been satisfied.

  36. Checking the Assumptions There are assumptions that need to be met to accept the results of regression analysis and use the model for future decision making: • Linearity, • Independence of errors (no autocorrelation), • Normality of errors, • Constant variance of errors.

  37. Tests for Linearity Linearity: • Plot the dependent variable against each of the independent variables separately. • Decide whether linear regression is a "reasonable" description of the tendency in the data. • Consider curvilinear patterns, • Consider undue influence of one data point on the regression line, etc.

  38. Tests for Independence Independence of Errors (valid only for time-series data): • Ljung-Box test on residuals: Graphs / Time Series / Autocorrelations • If there are no spikes at any lag of the Partial Autocorrelation Function (PACF), then the errors are independent.
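The Ljung-Box statistic that SPSS reports can also be computed by hand, which clarifies what it measures. A sketch (a minimal implementation under the standard definition; in practice you would read the value from SPSS output):

```python
def ljung_box_q(residuals, max_lag):
    """Ljung-Box Q = n(n+2) * sum_{k=1..max_lag} rho_k^2 / (n - k),
    where rho_k is the lag-k sample autocorrelation of the residuals.
    Under H0 (independent errors), Q is approximately chi-square
    distributed with max_lag degrees of freedom."""
    n = len(residuals)
    mean = sum(residuals) / n
    denom = sum((r - mean) ** 2 for r in residuals)
    q = 0.0
    for k in range(1, max_lag + 1):
        rho_k = sum((residuals[i] - mean) * (residuals[i - k] - mean)
                    for i in range(k, n)) / denom
        q += rho_k ** 2 / (n - k)
    return n * (n + 2) * q

# A strongly alternating residual series (assumed) has large negative
# lag-1 autocorrelation, so Q at lag 1 is large and H0 is rejected.
q = ljung_box_q([1, -1, 1, -1, 1, -1, 1, -1], 1)  # 8.75
```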

  39. Tests for Normality Normal Distribution of Errors: • Shapiro-Wilk test on residuals: Analyze / Descriptive Statistics / Explore / Plots (check: Normality Plots with Tests)

  40. Tests for Constant Variance Checking Constant Variance of Errors in SPSS: 1. Create a grouping variable for the residuals: Transform / Categorize Variables / {copy residuals into the "Create Categories for" pane; in the "Number of Categories" box insert [2]} / Ok. 2. The name of the new grouping variable is the same as the residual variable's name with an "n" attached in front of it. 3. Analyze / Descriptive Statistics / Explore {copy residuals into the "Dependent List" pane; copy nresiduals into the "Factor List" pane; click "Plots", check "Untransformed"} / Ok. 4. Check the significance (Based on Means) in the table "Test of Homogeneity of Variance". 5. If p-value > α, conclude equality of variance.

  41. Tests for Constant Variance Checking Constant Variance of Errors in Excel: 1. Copy the residuals into Excel. 2. Divide the residuals into two equal groups. 3. Compute the standard deviation and the number of observations for each group. 4. Name the group with the largest standard deviation Group-1. 5. Run a 2-sample, 2-tailed variance test: PHStat / Two Sample Tests / F-test for Differences in Two Variances. 6. If p-value > α, conclude equality of variance.
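The steps above amount to an F ratio of the two groups' sample variances, with the larger variance in the numerator. A minimal sketch (function name and residual values are assumed for illustration):

```python
def f_variance_ratio(residuals):
    """F statistic comparing the sample variances of the first and second
    half of the residuals; the larger variance goes in the numerator,
    matching the Group-1 convention on the slide."""
    half = len(residuals) // 2

    def sample_var(g):
        m = sum(g) / len(g)
        return sum((v - m) ** 2 for v in g) / (len(g) - 1)

    v1, v2 = sample_var(residuals[:half]), sample_var(residuals[half:])
    return max(v1, v2) / min(v1, v2)

# Illustrative residuals (assumed): the second half is visibly more spread out,
# so the F ratio is large and constant variance would be rejected.
f = f_variance_ratio([1, 2, 3, 4, 2, 6, 10, 14])  # 16.0
```

The p-value then comes from an F distribution with (n₁ − 1, n₂ − 1) degrees of freedom, which is what PHStat reports.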

  42. Regression Results What is the regression model? Do the independent variables have a significant effect on the dependent variable? Do the independent variables exhibit collinearity? Which independent variable has more influence on the dependent variable? Data: Levine-K-B; Advertise

  43. Regression Results Is the overall regression model significant?

  44. Regression Results How good is the explanatory power of the independent variables?

  45. Correlation Table (with Venn-diagram areas: a = variance unique to x1, b = unique to x2, c = shared, e = unexplained) • Coefficient of Determination: R² = (a+b+c)/(a+b+c+e) • Squared Part (Semipartial) Correlation: r²y(x1·x2) = a/(a+b+c+e) • Squared Partial Correlation: r²yx1·x2 = a/(a+e) • Squared Zero-Order Correlation: r²yx1 = (a+c)/(a+b+c+e)
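In the standard Venn-diagram notation, with a = variance unique to x1, b = variance unique to x2, c = variance shared by x1 and x2, and e = unexplained variance, these ratios can be checked with simple arithmetic. The numeric shares below are hypothetical, chosen only for illustration:

```python
# Hypothetical variance shares (assumed, not from the lesson)
a, b, c, e = 0.20, 0.15, 0.10, 0.55

R2 = (a + b + c) / (a + b + c + e)           # coefficient of determination
part_sq = a / (a + b + c + e)                # squared part (semipartial) correlation
partial_sq = a / (a + e)                     # squared partial correlation
zero_order_sq = (a + c) / (a + b + c + e)    # squared zero-order correlation
```

The partial correlation removes x2's share from both numerator and denominator, so it is always at least as large as the part correlation; here partial_sq ≈ 0.267 versus part_sq = 0.20.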

  46. Next Lesson (Lesson - 06/B) Multiple Linear Regression
