
Lecture 10: Intro to Multiple Regression

Lecture 10: Intro to Multiple Regression. February 17, 2014. Question. Download and open: mall_sales.csv Predict Sales by Median Household Income (in thousands of $) of the area

keegan

Presentation Transcript


  1. Lecture 10: Intro to Multiple Regression February 17, 2014

  2. Question Download and open: mall_sales.csv Predict Sales by Median Household Income (in thousands of $) of the area • The estimated marginal effect of Income is approximately 6.46 ($/SqFt), but is not statistically different than zero • The estimated marginal effect of Income is approximately 6.46 ($/SqFt), and is statistically different than zero • The estimated marginal effect of Income is approximately 97.07 ($/SqFt), but is not statistically different than zero • I have no idea.

  3. Administrative • Problem Set 5 up • Due Monday at 9am • Workshop on p-values/t-statistics and hypothesis testing • Exam not graded yet (will be at least a few more days)

  4. Multiple Regression Model • The SRM is very useful but often too simple. • Many things often influence a variable of interest (sales quantity, economic growth, etc.) • We can usually make better predictions with more information. • We’ll now model a response variable Y by k predictors, x1, x2, … , xk. • Many things will look familiar • Several additional complications and assumptions • Multiple Regression (MRM) generalizes the SRM and separates the effects of each explanatory variable

  5. Multiple Regression Model Still a linear model: y = β0 + β1 x1 + β2 x2 + … + βk xk + ε • In the SRM anything besides the explanatory variable was assumed to be part of the error term ε • Now, in the MRM, we can include those variables. Notice the similar structure: the errors are still assumed to be • Independent of one another • Of equal variance • Normally distributed around the regression line • A departure from normality might suggest an omitted variable that we could/should possibly include.
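
To make the model on this slide concrete, here is a minimal sketch that simulates data from a two-predictor MRM and recovers the coefficients by least squares. The data are synthetic because mall_sales.csv is not reproduced in this transcript; the coefficient values, sample size, and variable names are all assumptions chosen for illustration.

```python
import numpy as np

# Synthetic data only: every number below is made up for illustration.
rng = np.random.default_rng(0)
n = 200
income = rng.normal(60.0, 10.0, n)               # x1: median income, thousands of $
competitors = rng.poisson(3.0, n).astype(float)  # x2: competitors in the mall

true_beta = np.array([50.0, 8.0, -20.0])         # beta0, beta1, beta2 (hypothetical)
eps = rng.normal(0.0, 5.0, n)                    # independent, equal-variance, normal errors

# Design matrix: a column of ones for the intercept, then one column per predictor.
X = np.column_stack([np.ones(n), income, competitors])
y = X @ true_beta + eps                          # y = b0 + b1*x1 + b2*x2 + eps

# Least-squares estimates of (beta0, beta1, beta2).
b_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ b_hat
print("estimated coefficients:", np.round(b_hat, 2))
```

With an intercept in the model, the residuals sum to zero up to rounding, just as in the SRM.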

  6. Multiple Regression Model • Example: mall_sales.csv File contains data on • Amount of sales (per sqft of space) • Median Income of the surrounding area (in thousands of $) • Number of Competitors in the same mall • Number of mall visitors per month (in thousands) • What would you hypothesize the relationship to be between • Income and Sales: • Positive? Wealthier families can shop more. • Number of Competitors and Sales: • Negative? They might steal your business. • Number of Visitors and Sales: • Weakly Positive?

  7. Multiple Regression Model Example: mall_sales.csv • Predict Sales by number of Competitors in the same mall. • Each additional Competitor decreases sales on average by 12.84 ($/SqFt), and is statistically different than zero. • Each additional Competitor increases sales on average by 502.20 ($/SqFt), but is not statistically different than zero. • Each additional Competitor decreases sales on average by 105.78 ($/SqFt), and is statistically different than zero. • Each additional Competitor increases sales on average by 4.64 ($/SqFt), but is not statistically different than zero. • I have no idea.

  8. Multiple Regression Model Example: mall_sales.csv • Fit these simple regression models • Predict Sales by Median Household Income (in thousands of $) of the area • r2 =0.501 • se = 74.87 • Predict Sales by Number of Competing stores in the same mall. • r2 = 0.004 • se = 105.79 • So competitors don’t matter for sales?

  9. Multiple Regression Model Multiple Regression: use both Income and # of Competitors. • Like simple regression, it’s often helpful to initially look at scatterplots. But with multiple variables, it’s harder • Scatterplot matrix • Easy with some software, but not so easy with Excel • Correlation matrix • Not as good but often helpful [Figure: scatterplot matrix of Sales, Income, and Competition]
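
As a rough illustration of the correlation matrix mentioned on this slide, the sketch below builds one with NumPy. The data are synthetic stand-ins (not mall_sales.csv), generated under the made-up assumption that richer areas attract more competitors; the printed values therefore say nothing about the real data set.

```python
import numpy as np

# Synthetic stand-in for the three columns; all parameters are hypothetical.
rng = np.random.default_rng(1)
n = 200
income = rng.normal(60.0, 10.0, n)
# Made-up story: richer areas attract more competitors.
competitors = 3.0 + 0.2 * (income - 60.0) + rng.normal(0.0, 2.0, n)
sales = 50.0 + 8.0 * income - 20.0 * competitors + rng.normal(0.0, 30.0, n)

names = ["Sales", "Income", "Competition"]
# np.corrcoef treats each row as one variable and returns a 3 x 3 matrix.
corr = np.corrcoef(np.vstack([sales, income, competitors]))

for name, row in zip(names, corr):
    print(f"{name:>12}", " ".join(f"{r:6.2f}" for r in row))
```

Each off-diagonal entry summarizes one panel of the scatterplot matrix with a single number, which is why it is less informative than the plots themselves.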

  10. Multiple Regression Model Multiple Regression: • Fit the model using both Income and # of Competitors. • Fit the Regression Model in Excel: [Table: Excel regression output] • se = $68.03, with the same interpretation as in the SRM. • R2 indicates that the fitted model explains 59.47% of the variation in Store Sales • The adjusted R2 adjusts for sample size and number of predictors.

  11. Multiple Regression Model Why adjusted R2? • R2 is the square of the correlation between the actual Y and the fitted Y • R2 will weakly increase (it never decreases) when you add another predictor, whether or not that predictor actually helps predict the response variable. • The adjusted R2 penalizes you for adding predictors. You don’t want to throw everything into a model just to improve R2 • Degrees of Freedom: n – k – 1 • A generalization from the SRM (n – 2)
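
The sketch below illustrates the point about adding predictors, using the standard formula adjusted R2 = 1 − (1 − R2)(n − 1)/(n − k − 1). It fits a model on synthetic data, then adds a predictor that is pure noise by construction. The helper name fit_r2 and all data values are assumptions made for this example.

```python
import numpy as np

def fit_r2(X, y):
    """Return (R2, adjusted R2) for a least-squares fit of y on X plus an intercept."""
    n, k = X.shape
    design = np.column_stack([np.ones(n), X])
    b, *_ = np.linalg.lstsq(design, y, rcond=None)
    sse = np.sum((y - design @ b) ** 2)              # residual sum of squares
    sst = np.sum((y - y.mean()) ** 2)                # total sum of squares
    r2 = 1.0 - sse / sst
    adj = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)   # n - k - 1 degrees of freedom
    return r2, adj

# Synthetic data; the numbers are hypothetical.
rng = np.random.default_rng(2)
n = 60
income = rng.normal(60.0, 10.0, n)
y = 50.0 + 8.0 * income + rng.normal(0.0, 60.0, n)
junk = rng.normal(size=n)                            # unrelated to y by construction

r2_small, adj_small = fit_r2(income.reshape(-1, 1), y)
r2_big, adj_big = fit_r2(np.column_stack([income, junk]), y)
print(f"income only:   R2={r2_small:.4f}  adjusted R2={adj_small:.4f}")
print(f"income + junk: R2={r2_big:.4f}  adjusted R2={adj_big:.4f}")
```

R2 cannot go down when the junk column is added, while the adjusted R2 is free to fall because the larger k shrinks the degrees of freedom.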

  12. Multiple Regression Model • Interpreting slopes from the fitted model • Similar to SRM • It’s still a slope; it’s the change in Y given a 1-unit change in xk • But now it’s statistically holding all of the other variables constant, or “controlling for” the other variables. • Coefficient on Competitors: in areas that have the same level of median income, with one additional competitor we would expect sales to drop on average $24.17 per sqft. • Coefficient on Income: if we could force the number of competitors to be the same and increase Median Income by 10K, we’d expect an increase in sales of $79.66 per sqft.
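
The arithmetic behind these interpretations can be sketched in a few lines. The two slopes come from the slide (Income is measured in thousands of $, so $79.66 per 10K corresponds to 7.966 per thousand); the function name and the combined scenario are illustrative assumptions. The intercept is not needed because only changes in predicted sales are computed.

```python
# Partial slopes taken from the slide, in $ per sqft.
B_COMPETITORS = -24.17       # per additional competitor, income held fixed
B_INCOME = 79.66 / 10        # per additional thousand $ of median income

def predicted_change(d_income_thousands=0.0, d_competitors=0.0):
    """Change in predicted sales ($/sqft) for the given changes in the predictors."""
    return B_INCOME * d_income_thousands + B_COMPETITORS * d_competitors

print(predicted_change(d_competitors=1))          # one more competitor, same income
print(predicted_change(d_income_thousands=10))    # 10K more income, same competitors
# Hypothetical combined scenario: 10K more income and two more competitors.
print(round(predicted_change(10, 2), 2))
```

In the combined scenario the two effects partly cancel: 79.66 − 2 × 24.17 = 31.32 $/sqft.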

  13. Multiple Regression Model • Interpreting slopes from the fitted model • Partial vs Marginal slopes • The book calls SRM slopes marginal slopes, whereas MRM slopes are called partial slopes because they statistically exclude the effects of the other predictors. • If Income and Competitors were uncorrelated, then MRM estimates and SRM estimates would be the same.
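
The last bullet can be checked numerically. The sketch below uses a made-up balanced grid in which every income level is paired with every competitor count, so the two predictors have zero sample correlation, and then compares the SRM and MRM slopes on Income. The helper slopes and all data values are assumptions for illustration.

```python
import numpy as np

def slopes(X, y):
    """Least-squares slopes (intercept dropped) of y on the columns of X."""
    design = np.column_stack([np.ones(len(y)), X])
    b, *_ = np.linalg.lstsq(design, y, rcond=None)
    return b[1:]

# Balanced grid: each income level appears with each competitor count,
# so the predictors are uncorrelated in this sample by construction.
income_levels = np.array([40.0, 50.0, 60.0, 70.0])
competitor_levels = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
income, competitors = [a.ravel() for a in np.meshgrid(income_levels, competitor_levels)]

rng = np.random.default_rng(3)
sales = 50.0 + 8.0 * income - 20.0 * competitors + rng.normal(0.0, 30.0, income.size)

marginal_income = slopes(income.reshape(-1, 1), sales)[0]                  # SRM slope
partial_income = slopes(np.column_stack([income, competitors]), sales)[0]  # MRM slope
print(marginal_income, partial_income)
```

The two slopes agree up to rounding error even though the data are noisy, because the agreement depends only on the predictors being uncorrelated in the sample.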

  14. Multiple Regression Model • Compare the simple regression models to the multiple regression model: • Predict Sales by Median Household Income (in thousands of $) • r2 = 0.501 • se = 74.87 • Predict Sales by Number of Competing stores in the same mall. • r2 = 0.004 • se = 105.79 • Predict Sales by both Income and # of Competitors: • R2 = 0.5947 • se = 68.03 • Why the change in Competitors?

  15. Multiple Regression Model • Path diagrams: schematic drawing of the relationships between the variables. • We hypothesized a positive relationship between Income and Sales, and a negative one between Competition and Sales • But what about Income and Competition? • High median income means it’s attractive for you to set up shop, but that means that it’s also attractive for your competition.

  16. Multiple Regression Model • Path diagrams (continued) • High median income makes an area attractive to your competitors as well, so we expect a positive relationship between Income and Competition.

  17. Multiple Regression Model • Direct and Indirect effects: • Income has both a positive direct effect on sales and an indirect negative effect, via the number of competitors • Collinearity: correlation between the explanatory variables • High collinearity is a problem (we’ll talk about it later). • It could cause an issue in interpreting your results. • Think about the “controlling for” aspect of interpreting the slopes. • In our example: income and competition are collinear
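
The direct and indirect effects can be tied to the fitted slopes through a least-squares identity: the marginal (SRM) slope on Income equals the partial (MRM) slope on Income plus the partial slope on Competitors times the slope from regressing Competitors on Income. The sketch below checks this on synthetic data that mimics the story in the path diagram; the data-generating numbers and the helper ols are assumptions, not estimates from mall_sales.csv.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients, intercept first."""
    design = np.column_stack([np.ones(len(y)), X])
    b, *_ = np.linalg.lstsq(design, y, rcond=None)
    return b

# Synthetic data (hypothetical): income raises sales directly but also
# attracts competitors, who lower sales.
rng = np.random.default_rng(4)
n = 200
income = rng.normal(60.0, 10.0, n)
competitors = 3.0 + 0.2 * (income - 60.0) + rng.normal(0.0, 2.0, n)
sales = 50.0 + 8.0 * income - 20.0 * competitors + rng.normal(0.0, 30.0, n)

_, direct, b_comp = ols(np.column_stack([income, competitors]), sales)  # partial slopes
_, comp_on_income = ols(income.reshape(-1, 1), competitors)             # Income -> Competition
_, marginal = ols(income.reshape(-1, 1), sales)                         # SRM slope

indirect = b_comp * comp_on_income   # effect of income that runs through competitors
print(f"direct {direct:.3f} + indirect {indirect:.3f} = marginal {marginal:.3f}")
```

Because the indirect effect is negative here, the marginal slope understates the direct effect of Income, which is the pattern the slides describe.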

  18. Multiple Regression Model Check the Residuals. Same assumptions from before: • Independent of one another • Have equal variance • Normally distributed around the regression line Similar to before: • Now multiple plots of residuals • versus fitted values – the ‘y-hats’: • versus each of the explanatory variables • Identify outliers • Check similar variances condition • quantile plot of residuals to check normality.
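
The plots listed on this slide can be backed by a few numeric summaries. The sketch below is one possible set of checks on synthetic data, not the lecture's procedure: the 3·se outlier cutoff and the split-in-half variance comparison are rules of thumb introduced here as assumptions, and the data are made up.

```python
import numpy as np
from statistics import NormalDist

# Synthetic data; all parameters are hypothetical.
rng = np.random.default_rng(5)
n, k = 200, 2
income = rng.normal(60.0, 10.0, n)
competitors = rng.poisson(3.0, n).astype(float)
sales = 50.0 + 8.0 * income - 20.0 * competitors + rng.normal(0.0, 30.0, n)

X = np.column_stack([np.ones(n), income, competitors])
b, *_ = np.linalg.lstsq(X, sales, rcond=None)
fitted = X @ b
resid = sales - fitted
s_e = np.sqrt(np.sum(resid ** 2) / (n - k - 1))    # uses n - k - 1 degrees of freedom

# Similar variances: compare residual spread for low versus high fitted values.
order = np.argsort(fitted)
spread_low = resid[order[: n // 2]].std(ddof=1)
spread_high = resid[order[n // 2 :]].std(ddof=1)

# Outliers: flag residuals more than 3 s_e from zero (rule of thumb, an assumption).
outliers = np.flatnonzero(np.abs(resid) > 3 * s_e)

# Normality: coordinates of a normal quantile plot (sorted residuals vs normal quantiles).
probs = (np.arange(1, n + 1) - 0.5) / n
theoretical = np.array([NormalDist().inv_cdf(p) for p in probs])
qq_corr = np.corrcoef(theoretical, np.sort(resid))[0, 1]

print(f"s_e = {s_e:.2f}, spread low/high = {spread_low:.2f}/{spread_high:.2f}")
print(f"flagged outliers: {outliers.size}, quantile-plot correlation = {qq_corr:.4f}")
```

A quantile-plot correlation close to 1 and similar spreads in the two halves are consistent with the assumptions; the plots remain the primary tool because they show where a problem occurs.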

  19. Multiple Regression Model Check the Residuals. • Plot of residuals versus fitted values of y (ŷ):

  20. Multiple Regression Model Check the Residuals. • Plot of residuals versus Income:

  21. Multiple Regression Model Check the Residuals. • Plot of residuals versus Competitors:

  22. Multiple Regression Model Check the Residuals. • Quantile plot of residuals

  23. Multiple Regression Model Calibration plot: another common plot to examine in multiple regression • Scatterplot of the actual response, y, on the fitted values, ŷ • Remember R2 is the square of the correlation between these two • The tighter the data cluster along the line in the calibration plot, the larger the R2
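
The link between the calibration plot and R2 can be verified numerically. The sketch below, on synthetic data with made-up coefficients, computes R2 from the sums of squares and compares it with the squared correlation between y and ŷ; it also fits the line through the calibration plot.

```python
import numpy as np

# Synthetic data; all parameters are hypothetical.
rng = np.random.default_rng(6)
n = 200
income = rng.normal(60.0, 10.0, n)
competitors = rng.poisson(3.0, n).astype(float)
sales = 50.0 + 8.0 * income - 20.0 * competitors + rng.normal(0.0, 30.0, n)

X = np.column_stack([np.ones(n), income, competitors])
b, *_ = np.linalg.lstsq(X, sales, rcond=None)
fitted = X @ b

# R2 from sums of squares, and the correlation between actual and fitted values.
r2 = 1.0 - np.sum((sales - fitted) ** 2) / np.sum((sales - sales.mean()) ** 2)
corr = np.corrcoef(sales, fitted)[0, 1]

# Regressing actual on fitted gives the calibration line.
cal_slope, cal_intercept = np.polyfit(fitted, sales, 1)

print(f"R2 = {r2:.4f}, corr^2 = {corr ** 2:.4f}")
print(f"calibration line: slope {cal_slope:.4f}, intercept {cal_intercept:.4f}")
```

For a least-squares fit with an intercept, the calibration line has slope 1 and intercept 0, so the scatter around that line is exactly the residual scatter.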
