
Spreadsheet Modeling & Decision Analysis



Presentation Transcript


    1. Spreadsheet Modeling & Decision Analysis A Practical Introduction to Management Science

    2. Regression Analysis

    3. Introduction to Regression Analysis (RA) Regression Analysis is used to estimate a function f( ) that describes the relationship between a continuous dependent variable and one or more independent variables. Y = f(X1, X2, X3,…, Xn) + e Note: f( ) describes systematic variation in the relationship. e represents the unsystematic variation (or random error) in the relationship.

    4. In other words, the observations we have can be separated into two parts: Y = f(X1, X2, X3,…, Xn) + e Observations = Model + Error Observations = Signal + Noise Ideally, the noise should be very small compared to the model.

    10. An Example Consider the relationship between advertising (X1) and sales (Y) for a company. There probably is a relationship... ...as advertising increases, sales should increase. But how would we measure and quantify this relationship?

    11. A Scatter Plot of the Data

    12. The Nature of a Statistical Relationship

    13. A Simple Linear Regression Model The scatter plot shows a linear relation between advertising and sales.

    14. Determining the Best Fit Numerical values must be assigned to b0 and b1
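The least-squares values of b0 and b1 have a closed-form solution. A minimal sketch in Python, using made-up advertising/sales figures (the book's actual data are not reproduced in this transcript):

```python
# Least-squares estimates of b0 (intercept) and b1 (slope) for a simple
# linear regression, computed from the closed-form normal equations.
# The advertising/sales figures below are hypothetical stand-ins.

def simple_ols(xs, ys):
    """Return (b0, b1) minimizing the sum of squared errors."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx              # slope
    b0 = y_bar - b1 * x_bar     # intercept
    return b0, b1

advertising = [0, 1, 2, 3]      # hypothetical X values
sales = [1, 3, 4, 7]            # hypothetical Y values
b0, b1 = simple_ols(advertising, sales)
print(b0, b1)                   # approximately 0.9 and 1.9 for this data
```

Solver minimizes the same sum of squared errors numerically; the closed form above gives the answer directly for the simple linear case.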

    16. Using Solver...

    17. The Estimated Regression Function The estimated regression function is:

    18. Using the Regression Tool Excel also has a built-in tool for performing regression that: is easier to use provides a lot more information about the problem

    19. The TREND() Function TREND(Y-range, X-range, X-value for prediction) where: Y-range is the spreadsheet range containing the dependent Y variable, X-range is the spreadsheet range containing the independent X variable(s), and X-value for prediction is a cell (or cells) containing the value(s) of the independent X variable(s) for which we want an estimated value of Y. Note: The TREND( ) function is dynamically updated whenever any of its inputs change. However, it does not provide the statistical information produced by the regression tool. It is best to use these two approaches to regression in conjunction with one another.
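Outside Excel, the behavior of TREND() for a single X variable can be approximated in a few lines; the function name `trend` and the sample points below are illustrative, not part of Excel:

```python
# A rough Python analogue of Excel's TREND() function for one X
# variable: fit a simple linear regression to the Y-range and X-range,
# then evaluate the fitted line at a new X value. (Excel's TREND()
# also handles multiple X columns, which this sketch does not.)

def trend(y_range, x_range, x_new):
    n = len(x_range)
    x_bar = sum(x_range) / n
    y_bar = sum(y_range) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(x_range, y_range))
    sxx = sum((x - x_bar) ** 2 for x in x_range)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0 + b1 * x_new

# These points lie exactly on y = 1 + 2x, so predicting at x = 5
# gives 11.0.
print(trend([1, 3, 5], [0, 1, 2], 5))  # -> 11.0
```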

    20. Evaluating the “Fit”

    21. The R2 Statistic The R2 statistic indicates how well an estimated regression function fits the data. 0 ≤ R2 ≤ 1. It measures the proportion of the total variation in Y around its mean that is accounted for by the estimated regression equation. To understand this better, consider the following graph...

    22. Error Decomposition

    23. Partition of the Total Sum of Squares
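The partition can be verified numerically. A small sketch using hypothetical data and its least-squares line (both made up for illustration): the total sum of squares splits exactly into an explained part and an error part, and R2 is the explained share.

```python
# Decomposing the total sum of squares for a fitted line:
#   SST (total) = SSR (explained by regression) + SSE (error)
# and R^2 = SSR / SST. The four data points and the fitted line
# (yhat = 0.9 + 1.9x) are hypothetical illustrations.

xs = [0, 1, 2, 3]
ys = [1, 3, 4, 7]
b0, b1 = 0.9, 1.9                      # least-squares fit for this data

y_bar = sum(ys) / len(ys)
y_hat = [b0 + b1 * x for x in xs]

sse = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))   # unexplained
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # explained
sst = sum((y - y_bar) ** 2 for y in ys)                # total

r_squared = ssr / sst
print(round(sse, 4), round(ssr, 4), round(sst, 4), round(r_squared, 4))
```

The identity SST = SSR + SSE holds exactly only when b0 and b1 are the least-squares estimates.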

    29. An Example of Inappropriate Interpretation A study shows that, in elementary schools, the ability of spelling is stronger for the students with larger feet. ? Could we conclude that the size of foot can influence the ability of spelling? ? Or there exists another factor that can influence the foot size and the spelling ability?

    30. Making Predictions

    31. The Standard Error

    33. An Approximate Prediction Interval An approximate 95% prediction interval for a new value of Y when X1=X1h is given by

    34. An Exact Prediction Interval A (1−α)% prediction interval for a new value of Y when X1=X1h is given by

    35. Example If $65,000 is spent on advertising: 95% lower prediction interval = 397.092 - 2.306*21.489 = 347.556 95% upper prediction interval = 397.092 + 2.306*21.489 = 446.666 If we spend $65,000 on advertising we are 95% confident actual sales will be between $347,556 and $446,666. This interval is only about $20,000 wider than the approximate one calculated earlier but was much more difficult to create. The greater accuracy is not always worth the trouble.
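The interval arithmetic can be checked directly from the rounded inputs shown on the slide; note the endpoints come out a few hundredths away from the slide's printed values, which presumably were computed from unrounded intermediates.

```python
# Exact 95% prediction interval, yhat +/- t * s_p, using the rounded
# values from the slide. The slide prints 347.556 and 446.666; the
# rounded inputs reproduce those endpoints to within a few hundredths.

y_hat = 397.092      # predicted sales at X1 = 65 ($1000s)
t = 2.306            # t-value used on the slide
s_p = 21.489         # standard error of the prediction

lower = y_hat - t * s_p
upper = y_hat + t * s_p
print(round(lower, 3), round(upper, 3))  # -> 347.538 446.646
```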

    36. Comparison of Prediction Interval Techniques

    37. Confidence Intervals for the Mean A (1−α)% confidence interval for the true mean value of Y when X1=X1h is given by

    38. A Note About Extrapolation Predictions made using an estimated regression function may have little or no validity for values of the independent variables that are substantially different from those represented in the sample.

    39. What Does “Regression” Mean?

    40. What Does “Regression” Mean? Draw a “best-fit” line free hand. For mothers of height 60”, find the average daughter’s height; repeat for mothers’ heights of 62”, 64”, … 70”, and draw a “best-fit” line through these points. Also draw the line daughter’s height = mother’s height. For a given mother’s height, the daughter’s height tends to fall between the mother’s height and the mean height: “regression toward the mean.”

    41. What Does “Regression” Mean?

    45. Multiple Regression Analysis Most regression problems involve more than one independent variable.

    46. Example Regression Surface for Two Independent Variables

    47. Multiple Regression Example: Real Estate Appraisal A real estate appraiser wants to develop a model to help predict the fair market values of residential properties. Three independent variables will be used to estimate the selling price of a house: total square footage number of bedrooms size of the garage

    48. Selecting the Model We want to identify the simplest model that adequately accounts for the systematic variation in the Y variable. Arbitrarily using all the independent variables may result in overfitting. A sample reflects two kinds of characteristics: those representative of the population and those specific to that particular sample. We want to avoid fitting sample-specific characteristics -- that is, overfitting the data.

    49. Models with One Independent Variable With simplicity in mind, suppose we fit three simple linear regression functions:

    50. Important Software Note When using more than one independent variable, all variables for the X-range must be in one contiguous block of cells (that is, in adjacent columns).

    51. Models with Two Independent Variables Now suppose we fit the following models with two independent variables:

    52. The Adjusted R2 Statistic As additional independent variables are added to a model: The R2 statistic can only increase. The Adjusted-R2 statistic can increase or decrease.
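The adjusted-R2 statistic applies a penalty for each added variable. A small sketch of the standard formula, with hypothetical R2 values chosen to show the penalty at work:

```python
# Adjusted R^2 penalizes additional independent variables:
#   adj R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)
# where n is the number of observations and k the number of
# independent variables. Unlike R^2, it can fall when an added
# variable contributes little explanatory power.

def adjusted_r2(r2, n, k):
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Hypothetical: on 20 observations, adding a second variable nudges
# R^2 up from 0.900 to 0.902, yet adjusted R^2 goes down.
print(round(adjusted_r2(0.900, 20, 1), 4))  # one-variable model
print(round(adjusted_r2(0.902, 20, 2), 4))  # two-variable model, lower
```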

    53. A Comment On Multicollinearity It should not be surprising that adding X3 (# of bedrooms) to the model with X1 (total square footage) did not significantly improve the model. Both variables represent the same (or very similar) things -- the size of the house. These variables are highly correlated (or collinear). Multicollinearity should be avoided.
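One way to screen for multicollinearity is to compute the correlation between candidate independent variables before fitting. A sketch with hypothetical square-footage and bedroom counts (not the appraisal data):

```python
# Checking for multicollinearity by computing the Pearson correlation
# between two candidate independent variables. The square footage and
# bedroom counts below are hypothetical; a correlation this high is
# why adding bedrooms to a model that already contains square footage
# improves it so little.
from math import sqrt

def pearson_r(xs, ys):
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

sq_footage = [1500, 1800, 2100, 2400, 2700]   # hypothetical X1
bedrooms = [3, 3, 4, 4, 5]                    # hypothetical X3
r = pearson_r(sq_footage, bedrooms)
print(round(r, 3))   # strongly positive correlation
```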

    55. Model with Three Independent Variables Now suppose we fit the following model with three independent variables:

    56. Making Predictions Let’s estimate the average selling price of a house with 2,100 square feet and a 2-car garage:

    57. Binary Independent Variables Other non-quantitative factors can be included as independent variables in the analysis using binary variables.

    58. Polynomial Regression Sometimes the relationship between a dependent and independent variable is not linear.

    59. The Regression Model An appropriate regression function in this case might be a quadratic, ŷ = b0 + b1X1 + b2X1².
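A quadratic regression is still a linear least-squares problem: treat X1² as a second independent variable. A sketch using hypothetical points that lie exactly on a known quadratic, so the fit should recover the coefficients:

```python
# Fitting the quadratic model yhat = b0 + b1*X1 + b2*X1^2 by treating
# X1^2 as a second independent variable and solving the least-squares
# normal equations (X'X)b = X'y with Gaussian elimination. The data
# lie exactly on y = 1 + 2x + 3x^2 (hypothetical illustration).

def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

xs = [0, 1, 2, 3, 4]
ys = [1 + 2 * x + 3 * x ** 2 for x in xs]        # 1, 6, 17, 34, 57
X = [[1.0, x, x ** 2] for x in xs]               # design matrix
XtX = [[sum(row[i] * row[j] for row in X) for j in range(3)] for i in range(3)]
Xty = [sum(row[i] * y for row, y in zip(X, ys)) for i in range(3)]
b0, b1, b2 = solve(XtX, Xty)
print(round(b0, 6), round(b1, 6), round(b2, 6))  # -> 1.0 2.0 3.0
```

The same trick extends to the third-order model: add an X1³ column to the design matrix.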

    60. Implementing the Model

    61. Graph of Estimated Quadratic Regression Function

    62. Fitting a Third Order Polynomial Model We could also fit a third order polynomial model, ŷ = b0 + b1X1 + b2X1² + b3X1³.

    63. Graph of Estimated Third Order Polynomial Regression Function

    64. Overfitting When fitting polynomial models, care must be taken to avoid overfitting. The adjusted-R2 statistic can be used for this purpose here also.

    65. Example: Programmer Salary Survey A software firm collected data for a sample of 20 computer programmers. A suggestion was made that regression analysis could be used to determine if salary was related to the years of experience and the score on the firm’s programmer aptitude test. The years of experience, score on the aptitude test, and corresponding annual salary ($1000s) for the sample of 20 programmers are shown on the next slide.

    66. Example: Programmer Salary Survey

    Exper.  Score  Salary      Exper.  Score  Salary
      4      78     24.0         9      88     38.0
      7     100     43.0         2      73     26.6
      1      86     23.7        10      75     36.2
      5      82     34.3         5      81     31.6
      8      86     35.8         6      74     29.0
     10      84     38.0         8      87     34.0
      0      75     22.2         4      79     30.1
      1      80     23.1         6      94     33.9
      6      83     30.0         3      70     28.2
      6      91     33.0         3      89     30.0

    67. Example: Programmer Salary Survey Multiple Regression Model Suppose we believe that salary (y) is related to the years of experience (x1) and the score on the programmer aptitude test (x2) by the following regression model: y = β0 + β1x1 + β2x2 + ε where y = annual salary ($1000s), x1 = years of experience, x2 = score on programmer aptitude test.

    68. Example: Programmer Salary Survey Multiple Regression Equation Using the assumption E(ε) = 0, we obtain E(y) = β0 + β1x1 + β2x2. Estimated Regression Equation: b0, b1, b2 are the least squares estimates of β0, β1, β2. Thus ŷ = b0 + b1x1 + b2x2.

    69. Example: Programmer Salary Survey Solving for the Estimates of β0, β1, β2

    70. Example: Programmer Salary Survey Data Analysis Output The estimated regression equation is Salary = 3.17 + 1.40 Exper + 0.251 Score

    Predictor   Coef      Stdev     t-ratio   p
    Constant    3.174     6.156     0.52      0.613
    Exper       1.4039    0.1986    7.07      0.000
    Score       0.25089   0.07735   3.24      0.005

    s = 2.419   R-sq = 83.4%   R-sq(adj) = 81.5%
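The reported coefficients and R-sq can be reproduced from the 20 observations. A sketch that solves the least-squares normal equations directly (a plain computation, not the regression tool the slides used):

```python
# Reproducing the slide's regression output
#   Salary = 3.17 + 1.40 Exper + 0.251 Score,  R-sq = 83.4%
# from the 20 observations by solving the normal equations
# (X'X)b = X'y with Gaussian elimination.

exper  = [4, 7, 1, 5, 8, 10, 0, 1, 6, 6, 9, 2, 10, 5, 6, 8, 4, 6, 3, 3]
score  = [78, 100, 86, 82, 86, 84, 75, 80, 83, 91,
          88, 73, 75, 81, 74, 87, 79, 94, 70, 89]
salary = [24, 43, 23.7, 34.3, 35.8, 38, 22.2, 23.1, 30, 33,
          38, 26.6, 36.2, 31.6, 29, 34, 30.1, 33.9, 28.2, 30]

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

X = [[1.0, e, s] for e, s in zip(exper, score)]
XtX = [[sum(row[i] * row[j] for row in X) for j in range(3)] for i in range(3)]
Xty = [sum(row[i] * y for row, y in zip(X, salary)) for i in range(3)]
b0, b1, b2 = solve(XtX, Xty)

y_hat = [b0 + b1 * e + b2 * s for e, s in zip(exper, score)]
y_bar = sum(salary) / 20
sse = sum((y - yh) ** 2 for y, yh in zip(salary, y_hat))
sst = sum((y - y_bar) ** 2 for y in salary)
r2 = 1 - sse / sst

print(round(b0, 3), round(b1, 4), round(b2, 5))  # close to 3.174, 1.4039, 0.25089
print(round(r2, 3))                              # close to 0.834
```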

    71. Example: Programmer Salary Survey Computer Output (continued) Analysis of Variance

    SOURCE       DF    SS       MS       F       P
    Regression    2   500.33   250.16   42.76   0.000
    Error        17    99.46     5.85
    Total        19   599.79
