Introduction to Regression Analysis for Predicting Sales in Social Sciences

This lecture introduces the concept of regression analysis for predicting sales, using correlation, regression lines, and equations. The lecture also covers the calculation of coefficients of correlation and determination, as well as the standard error of the estimate.

Presentation Transcript


  1. Introduction to Statistics for the Social Sciences: SBS200, COMM200, GEOG200, PA200, POL200, or SOC200. Lecture Section 001, Spring 2016. Room 150 Harvill Building, 9:00 - 9:50 Mondays, Wednesdays & Fridays. Welcome

  2. Homework • On class website: • No Homework Due: Wednesday, April 20th

  3. By the end of lecture today (4/18/16) • Simple and multiple regression • Using correlation for predictions • r versus r² • Regression uses the predictor variable (independent) to make predictions about the predicted variable (dependent) • Coefficient of correlation is the name for “r” • Coefficient of determination is the name for “r²” (remember it is never negative – no direction info) • Standard error of the estimate is our measure of the variability of the dots around the regression line (average deviation of each data point from the regression line – like standard deviation) • Coefficient of regression will be “b” for each variable (like slope)

  4. Schedule of readings before our fourth and final exam (May 2nd) • OpenStax Chapters 1 – 13 (Chapter 12 is emphasized) • Plous Chapter 17: Social Influences • Plous Chapter 18: Group Judgments and Decisions

  5. Lab sessions • Labs will meet this week • Project 4

  6. Regression Example: Rory is the owner of a small software company and employs 10 sales staff. Rory sends his staff all over the world consulting, selling and setting up his system. He wants to evaluate his staff in terms of who are the most (and least) productive salespeople, and also whether more sales calls actually result in more systems being sold. So, he simply measures the number of sales calls made by each salesperson and how many systems they successfully sold. Review

  7. Regression: Predicting sales. Step 1: Draw the prediction line: r = 0.71, b = 11.579 (slope), a = 20.526 (intercept). Draw a regression line and state the regression equation. What are we predicting? Review

  10. Rory’s Regression: Predicting sales from number of visits (sales calls). Describe the relationship with the regression line (and equation). Correlation (r = 0.71): this is a strong positive correlation; sales tend to increase as sales calls increase. Predict using the regression line (and regression equation). Slope (b = 11.579): as sales calls increase by 1, predicted sales increase by 11.579 systems. Intercept (a = 20.526): the line predicts about 20.526 systems for a salesperson who makes zero sales calls (a prediction from the line, not a guaranteed minimum). (Figure: scatterplot with sales calls, the independent variable, on the X axis and systems sold, the dependent variable, on the Y axis.) Review
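The slides report r, b and a but not the raw data behind them. As a rough sketch of where those three numbers come from, the least-squares formulas can be written out in Python. Rory's actual call and sales counts are not in the transcript, so the toy data below are hypothetical, and `fit_line` is an illustrative name rather than part of any library.

```python
from math import sqrt

def fit_line(xs, ys):
    """Least-squares fit of Y' = a + bx; returns (a, b, r)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    b = sxy / sxx              # slope: change in Y' per one-unit change in x
    a = mean_y - b * mean_x    # intercept: the line passes through (mean x, mean y)
    r = sxy / sqrt(sxx * syy)  # Pearson correlation coefficient
    return a, b, r

# Hypothetical toy data (NOT Rory's): sales calls and systems sold
calls = [1, 2, 3, 4]
sold = [2, 4, 5, 4]
a, b, r = fit_line(calls, sold)
print(f"Y' = {a:.3f} + {b:.3f}x, r = {r:.3f}")
```

The slope computed this way is the same as r multiplied by the ratio of the standard deviations of Y and x, which is why a stronger correlation tends to go with a steeper line when the spreads are held fixed.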

  11. Regression: Predicting sales. Step 1: Predict sales for a certain number of sales calls. Step 2: State the regression equation: Y’ = a + bx, so Y’ = 20.526 + 11.579x. Step 3: Solve for Y’ at some value of x. If a salesperson makes one sales call: Y’ = 20.526 + 11.579(1) = 32.105. What should you expect from a salesperson who makes 1 call? They should sell 32.105 systems. If they sell more → overperforming; if they sell fewer → underperforming. (Figure: labeled points for Madison and Joshua; caption: “You should sell 32.105 systems.”) Review
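Steps 2 and 3 can be sketched in Python using the slide's a and b. The function names `predict_sales` and `classify_performance` are hypothetical, chosen for illustration, and the "on the line" label for an exact match is an assumption, since the slide only covers selling more or fewer. The actual-sales value of 40 is likewise a made-up input.

```python
A = 20.526  # intercept a from the slides
B = 11.579  # slope b from the slides

def predict_sales(calls):
    """Step 3: solve Y' = a + bx for a given number of sales calls x."""
    return A + B * calls

def classify_performance(calls, actual_sold):
    """Compare a salesperson's actual sales Y with the predicted Y'."""
    predicted = predict_sales(calls)
    if actual_sold > predicted:
        return "overperforming"
    if actual_sold < predicted:
        return "underperforming"
    return "on the line"  # assumption: exact match gets its own label

print(round(predict_sales(1), 3))    # 32.105
print(classify_performance(1, 40))   # overperforming, since 40 > 32.105
```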

  12. Regression: Predicting sales. Step 1: Predict sales for a certain number of sales calls. Step 2: State the regression equation: Y’ = a + bx, so Y’ = 20.526 + 11.579x. Step 3: Solve for Y’ at some value of x. If a salesperson makes two sales calls: Y’ = 20.526 + 11.579(2) = 43.684. What should you expect from a salesperson who makes 2 calls? They should sell 43.684 systems. If they sell more → overperforming; if they sell fewer → underperforming. (Figure: labeled points for Isabella and Jacob; caption: “You should sell 43.684 systems.”) Review

  13. Regression: Predicting sales. Step 1: Predict sales for a certain number of sales calls. Step 2: State the regression equation: Y’ = a + bx, so Y’ = 20.526 + 11.579x. Step 3: Solve for Y’ at some value of x. If a salesperson makes three sales calls: Y’ = 20.526 + 11.579(3) = 55.263. What should you expect from a salesperson who makes 3 calls? They should sell 55.263 systems. If they sell more → overperforming; if they sell fewer → underperforming. (Figure: labeled points for Ava and Emma; caption: “You should sell 55.263 systems.”) Review

  14. Regression: Predicting sales. Step 1: Predict sales for a certain number of sales calls. Step 2: State the regression equation: Y’ = a + bx, so Y’ = 20.526 + 11.579x. Step 3: Solve for Y’ at some value of x. If a salesperson makes four sales calls: Y’ = 20.526 + 11.579(4) = 66.842. What should you expect from a salesperson who makes 4 calls? They should sell 66.842 systems. If they sell more → overperforming; if they sell fewer → underperforming. (Figure: labeled point for Emily; caption: “You should sell 66.842 systems.”) Review
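Slides 11 to 14 repeat the same calculation for 1 to 4 calls. A short Python loop, sketched here with the slide's a and b, makes the pattern visible: each additional sales call adds exactly b = 11.579 to the predicted sales.

```python
A, B = 20.526, 11.579  # intercept a and slope b from the slides

# Predicted systems sold, Y' = a + bx, for 1 to 4 sales calls
predictions = {calls: round(A + B * calls, 3) for calls in range(1, 5)}
for calls, y_hat in predictions.items():
    print(f"{calls} call(s): predicted {y_hat} systems")

# Differences between consecutive predictions all equal the slope b
steps = [round(predictions[c + 1] - predictions[c], 3) for c in range(1, 4)]
print(steps)
```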

  15. Does the prediction line perfectly predict the predicted variable when using the predictor variable? No, we are wrong sometimes… How can we estimate exactly how much “error” we have? The difference between the expected Y’ and the actual Y is called the “residual” (it’s a deviation score). How would we find our “average residual”? The green lines show how much “error” there is in our prediction line… how much we are wrong in our predictions. (Figure: residuals of 14.7 and -23.7 marked as green lines.) Review

  16. Residual scores. How do we find the average amount of error in our prediction? We want the average amount by which actual scores deviate on either side of the predicted score. Residuals: Ava is 14.7, Jacob is -23.7, Emily is -6.8, Madison is 7.9. Step 1: Find the error for each value (just the residuals): Y – Y’. The difference between the expected Y’ and the actual Y is called the “residual” (it’s a deviation score). Step 2: Add up the residuals. Big problem: Σ(Y – Y’) = 0. So square the deviations: Σ(Y – Y’)². Then divide by the degrees of freedom (n – 2) and take the square root: √(Σ(Y – Y’)² / (n – 2)). How would we find our “average residual”? Compare the standard deviation, √(Σx² / N). The green lines show how much “error” there is in our prediction line… how much we are wrong in our predictions. Review
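The recipe on this slide (find the residuals, notice that they sum to zero, square them, divide by n – 2, take the square root) can be sketched in Python. The data are hypothetical toy values, not the salespeople on the slide, and the helper names are illustrative. Note that the residuals sum to zero only for the least-squares line itself, which is why the sketch fits that line first.

```python
from math import sqrt

def least_squares(xs, ys):
    """Return intercept a and slope b of the least-squares line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    return mean_y - b * mean_x, b

def standard_error_of_estimate(xs, ys):
    """Return the residuals and sqrt(sum((Y - Y')**2) / (n - 2))."""
    a, b = least_squares(xs, ys)
    # Step 1: residual Y - Y' for each data point
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    # Step 2: raw residuals cancel out (their sum is 0), so square them
    ss_residual = sum(e ** 2 for e in residuals)
    # Divide by df = n - 2, then take the square root
    return residuals, sqrt(ss_residual / (len(xs) - 2))

xs, ys = [1, 2, 3, 4], [2, 4, 5, 4]  # hypothetical toy data
residuals, see = standard_error_of_estimate(xs, ys)
print([round(e, 2) for e in residuals])
print(round(see, 3))
```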

  17. How do we find the average amount of error in our prediction? Deviation scores: Diallo is 0”, Preston is 2”, Mike is -4”, Hunter is -2”. Sound familiar? Step 1: Find the error for each value (just the residuals): Y – Y’. The difference between the expected Y’ and the actual Y is called the “residual” (it’s a deviation score). Step 2: Find the average: √(Σ(Y – Y’)² / (n – 2)), just as the standard deviation is √(Σx² / N). How would we find our “average residual”? The green lines show how much “error” there is in our prediction line… how much we are wrong in our predictions. Review

  18. Standard error of the estimate (line) = √(Σ(Y – Y’)² / (n – 2)). These would be helpful to know by heart – please memorize these formulas. Review

  19. How well does the prediction line predict the predicted variable when using the predictor variable? What if we want to know the “average deviation score”? Finding the standard error of the estimate (line). Standard error of the estimate: • a measure of the average amount of predictive error • the average amount that Y’ scores differ from Y scores • roughly the average length of the green lines • Slope doesn’t give “variability” info • Intercept doesn’t give “variability” info • Correlation “r” does give “variability” info • Residuals do give “variability” info

  20. How well does the prediction line predict the Ys from the Xs? Residuals • Shorter green lines suggest better prediction – smaller error • Longer green lines suggest worse prediction – larger error • Why are green lines vertical? • Remember, we are predicting the variable on the Y axis • So, error would be how we are wrong about Y (vertical)

  21. Does the prediction line perfectly predict the predicted variable when using the predictor variable? No, we are wrong sometimes… How can we estimate how much “error” we have? The difference between the expected Y’ and the actual Y is called the “residual” (it’s a deviation score). The green lines show how much “error” there is in our prediction line… how much we are wrong in our predictions. (Figure: residuals of 14.7 and -23.7 marked as green lines.) Perfect correlation = +1.00 or -1.00: each variable perfectly predicts the other, there is no variability around the line in the scatterplot, and the dots fall exactly on a straight line.

  22. Regression Analysis – Least Squares Principle. When we calculate the regression line we try to: • minimize the distance between the predicted Ys and the actual (data) Y points (the length of the green lines) • remember, because the negative and positive values cancel each other out, we have to square those distances (deviations) • so we are trying to minimize the “sum of squares of the vertical distances between the actual Y values and the predicted Y values”
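A quick numerical check of the least-squares principle, sketched in Python on hypothetical toy data: the fitted line has a smaller sum of squared vertical distances than any nudged alternative, and also beats the flat line drawn at the mean of Y. The helper names and the size of the nudges are illustrative choices.

```python
def sum_of_squares(xs, ys, a, b):
    """Sum of squared vertical distances between actual Y and predicted Y' = a + bx."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

def least_squares(xs, ys):
    """Intercept a and slope b that minimize sum_of_squares."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    return mean_y - b * mean_x, b

xs, ys = [1, 2, 3, 4], [2, 4, 5, 4]  # hypothetical toy data
a, b = least_squares(xs, ys)
best = sum_of_squares(xs, ys, a, b)

# Every nudged line (different intercept and/or slope) does worse
for da in (-0.5, 0.0, 0.5):
    for db in (-0.2, 0.0, 0.2):
        if da or db:
            assert sum_of_squares(xs, ys, a + da, b + db) > best

print(round(best, 3))
```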

  23. Which minimizes error better? Is the regression line better than just guessing the mean of the Y variable? How much does the information about the relationship actually help? How much better does the regression line predict the observed results? r² Wow!
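One way to answer "how much better than guessing the mean?" is to compare the squared error from always predicting the mean of Y with the squared error from the regression line; the proportional reduction in error equals r². A Python sketch on hypothetical toy data (the function name is illustrative):

```python
from math import sqrt

def error_reduction(xs, ys):
    """Return (1 - SS_line / SS_mean, r**2) for the least-squares line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    ss_mean = sum((y - mean_y) ** 2 for y in ys)  # error when guessing the mean of Y
    b = sxy / sxx
    a = mean_y - b * mean_x
    ss_line = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # error using the line
    r = sxy / sqrt(sxx * ss_mean)
    return 1 - ss_line / ss_mean, r ** 2

reduction, r2 = error_reduction([1, 2, 3, 4], [2, 4, 5, 4])  # hypothetical toy data
print(round(reduction, 4), round(r2, 4))  # the two values agree
```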

  24. What is r²? r² = the proportion of the total variance in one variable that is predictable by its relationship with the other variable. Example: If mother’s and daughter’s heights are correlated with r = .8, then what amount (proportion or percentage) of the variance of mother’s height is accounted for by daughter’s height? .64, because (.8)² = .64

  25. What is r²? r² = the proportion of the total variance in one variable that is predictable by its relationship with the other variable. Example: If mother’s and daughter’s heights are correlated with r = .8, then what proportion of the variance of mother’s height is not accounted for by daughter’s height? .36, because 1.0 - .64 = .36, or 36%, because 100% - 64% = 36%

  26. What is r²? r² = the proportion of the total variance in one variable that is predictable by its relationship with the other variable. Example: If ice cream sales and temperature are correlated with r = .5, then what amount (proportion or percentage) of the variance of ice cream sales is accounted for by temperature? .25, because (.5)² = .25

  27. What is r²? r² = the proportion of the total variance in one variable that is predictable by its relationship with the other variable. Example: If ice cream sales and temperature are correlated with r = .5, then what amount (proportion or percentage) of the variance of ice cream sales is not accounted for by temperature? .75, because 1.0 - .25 = .75, or 75%, because 100% - 25% = 75%
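The four worked examples reduce to squaring r and subtracting the result from 1. A small Python sketch (the function name is illustrative) reproduces them and also shows why r² carries no direction information: r = -.8 gives the same r² as r = .8.

```python
def variance_accounted_for(r):
    """Return (proportion accounted for, proportion not accounted for)."""
    r_squared = r ** 2
    return r_squared, 1 - r_squared

examples = [
    ("mother's and daughter's heights", 0.8),
    ("ice cream sales and temperature", 0.5),
]
for label, r in examples:
    explained, unexplained = variance_accounted_for(r)
    print(f"{label}: r = {r}, accounted for = {explained:.2f}, "
          f"not accounted for = {unexplained:.2f}")

# Squaring removes the sign, so a negative r gives the same proportions
print(variance_accounted_for(-0.8) == variance_accounted_for(0.8))
```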

  28. Some useful terms • Regression uses the predictor variable (independent) to make predictions about the predicted variable (dependent) • Coefficient of correlation is the name for “r” • Coefficient of determination is the name for “r²” (remember it is never negative – no direction info) • Standard error of the estimate is our measure of the variability of the dots around the regression line (average deviation of each data point from the regression line – like standard deviation)

  29. Thank you! See you next time!!
