
Chapters 8, 9, 10 Linear Regression


Presentation Transcript


  1. Chapters 8, 9, 10: Linear Regression. Fitting a Line to Bivariate Data

  2. Basic Terminology • Explanatory variable: explains or causes changes in the other variable; the x-variable (independent variable). • Response variable: the y-variable; it responds to changes in the x-variable (dependent variable).

  3. Simplest Relationship • The simplest equation that describes the dependence of variable y on variable x: y = b0 + b1x • a linear equation • its graph is a line with slope b1 and y-intercept b0

  4. Graph of y = b0 + b1x [Figure: a line crossing the y-axis at b0, with slope b1 = rise/run.]

  5. Notation • (x1, y1), (x2, y2), . . . , (xn, yn) • draw the line y = b0 + b1x through the scatterplot; the point on the line corresponding to xi is the predicted value ŷi = b0 + b1xi

  6. Observed y, Predicted y • the predicted y when x = 2.7 is ŷ = b0 + b1x = b0 + b1(2.7)

  7. Scatterplot: Fuel Consumption vs Car Weight “Best” line?

  8. Scatterplot with least squares prediction line

  9. How do we determine the line? Use residuals.

  10. Residuals: graphically

  11. Criterion for choosing what line to draw: the method of least squares • The method of least squares chooses the line that makes the sum of the squares of the residuals as small as possible • This line has the slope b1 and intercept b0 that minimize Σ(yi - (b0 + b1xi))²

  12. Least Squares Line ŷ = b0 + b1x: slope b1 = Σ(xi - x̄)(yi - ȳ) / Σ(xi - x̄)² and intercept b0 = ȳ - b1x̄

  13. Car Weight, Fuel Consumption Example, cont. (xi = weight in 1,000s of lbs, yi = fuel consumption) (xi, yi): (3.4, 5.5) (3.8, 5.9) (4.1, 6.5) (2.2, 3.3) (2.6, 3.6) (2.9, 4.6) (2, 2.9) (2.7, 3.6) (1.9, 3.1) (3.4, 4.9)

  14. Calculation table [columns xi, yi, xi², xiyi with column sums; recomputed from the slide 13 data: Σxi = 29, Σyi = 43.9, Σxi² = 89.28, Σxiyi = 135.8]

  15. Calculations (recomputed from the data above): x̄ = 2.9, ȳ = 4.39; b1 = (Σxiyi - n·x̄·ȳ) / (Σxi² - n·x̄²) = (135.8 - 127.31) / (89.28 - 84.1) = 8.49 / 5.18 ≈ 1.639; b0 = ȳ - b1·x̄ = 4.39 - 1.639(2.9) ≈ -0.363. So ŷ ≈ -0.363 + 1.639x.
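To check the hand calculation, here is a minimal Python sketch (standard library only, nothing assumed beyond the slide 13 data) that recomputes the slope and intercept using the formulas on slide 12:

```python
# Sketch: recompute the least squares fit for the car-weight data on
# slide 13 (weight in 1,000s of lbs vs. fuel consumption). Plain Python,
# so the arithmetic mirrors the hand calculation above.
xs = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
ys = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]
n = len(xs)

x_bar = sum(xs) / n                       # 2.9
y_bar = sum(ys) / n                       # 4.39
sxy = sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar
sxx = sum(x * x for x in xs) - n * x_bar ** 2

b1 = sxy / sxx                            # slope, about 1.639
b0 = y_bar - b1 * x_bar                   # intercept, about -0.363
print(f"yhat = {b0:.3f} + {b1:.3f}x")
```

Running it prints yhat = -0.363 + 1.639x, matching the calculation on slide 15.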

  16. Scatterplot with least squares prediction line

  17. The least squares line always goes through (x̄, ȳ). Here (x̄, ȳ) = (2.9, 4.39).

  18. Using the least squares line for prediction. Fuel consumption of a 3,000 lb car? (x = 3) Using the fitted line from slide 15: ŷ ≈ -0.363 + 1.639(3) ≈ 4.55.

  19. Be careful! Fuel consumption of a 500 lb car? (x = 0.5) x = 0.5 is outside the range of the x-data used to determine the least squares line, so this is extrapolation: the fitted line may not describe the relationship there.
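A small sketch of this warning, using a hypothetical predict helper (not from the slides) whose coefficients and data range come from the fit recomputed above:

```python
# Sketch: prediction with a crude extrapolation guard. The coefficients
# are the ones fitted earlier (rounded); the range check just flags
# x-values outside the observed data, which is the point slide 19 makes.
def predict(x, b0=-0.363, b1=1.639, x_min=1.9, x_max=4.1):
    if not (x_min <= x <= x_max):
        print(f"warning: x = {x} is outside [{x_min}, {x_max}]; extrapolating")
    return b0 + b1 * x

print(predict(3.0))   # ~4.55, inside the data range
print(predict(0.5))   # ~0.46, flagged: a 500 lb car is far outside the data
```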

  20. Avoid GIGO! Evaluating the least squares line • Create scatterplot. Approximately linear? • Calculate r2, the square of the correlation coefficient • Examine residual plot

  21. r2 : The Variation Accounted For • The square of the correlation coefficient r gives important information about the usefulness of the least squares line

  22. r²: important information for evaluating the usefulness of the least squares line. -1 ≤ r ≤ 1 implies 0 ≤ r² ≤ 1. The square of the correlation coefficient, r², is the fraction of the variation in y that is explained by the least squares regression of y on x, that is, by differences in x.
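As a sketch of this "fraction of variation explained" reading, the following Python computes r² for the slide 13 car-weight data two equivalent ways: as the squared correlation, and as 1 - SSE/SST, where SST is the variation of y about its mean and SSE is what the line leaves unexplained:

```python
# Sketch: r^2 computed two ways for the car-weight data.
from math import sqrt

xs = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
ys = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

r = sxy / sqrt(sxx * syy)                 # correlation coefficient
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
print(r ** 2, 1 - sse / syy)              # both ~0.95 for this data
```

The two printed values agree, which is exactly the identity r² = 1 - SSE/SST for simple linear regression.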

  23. March Madness: S(k) = Sagarin rating of the kth-seeded team; Yij = Vegas point spread between seeds i and j, i < j. Here r² = .948: 94.8% of the variation in point spreads is explained by the variation in Sagarin rating differences.

  24. SAT scores result: r² = (-.86845)² ≈ .7542. Approximately 75.4% of the variation in mean SAT math scores is explained by differences in the percent of seniors taking the SAT.

  25. Avoid GIGO! Evaluating the least squares line • Create scatterplot. Approximately linear? • Calculate r2, the square of the correlation coefficient • Examine residual plot

  26. Residuals • residual = observed y - predicted y = y - ŷ • Properties of residuals: • The residuals always sum to 0 (therefore the mean of the residuals is 0) • The least squares line always goes through the point (x̄, ȳ)
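Both properties can be checked numerically. A minimal sketch on the car-weight data, with the fit recomputed as before:

```python
# Sketch: numeric check of the two residual properties on slide 26.
xs = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
ys = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
print(sum(residuals))           # ~0, up to floating-point rounding
print(b0 + b1 * x_bar, y_bar)   # equal: the line goes through (x_bar, y_bar)
```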

  27. Graphically: the residual ei = yi - ŷi [Figure: the residual is the vertical distance from the observed point (xi, yi) to the fitted value ŷi on the line.]

  28. Residual Plot • Residuals help us determine if fitting a least squares line to the data makes sense • When a least squares line is appropriate, it should model the underlying relationship; nothing interesting should be left behind • We make a scatterplot of the residuals in the hope of finding… NOTHING!
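A sketch of such a residual plot for the car-weight data, assuming matplotlib is available; the rounded coefficients come from the fit above:

```python
# Sketch of the residual plot idea on slide 28: plot residuals against x
# and look for... nothing. Any curve or fan shape left in this plot means
# the line missed structure in the data.
import matplotlib.pyplot as plt

xs = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
ys = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]
b0, b1 = -0.363, 1.639           # coefficients fitted earlier (rounded)
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

plt.scatter(xs, residuals)
plt.axhline(0, linestyle="--")   # residuals should scatter evenly about 0
plt.xlabel("car weight (1,000s of lbs)")
plt.ylabel("residual")
plt.show()
```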

  29. Residuals: Sagarin Ratings and Point Spreads

  Yij    Predicted Yij   Residual
  20     23.48573586     -3.485735859
  24     21.3717734       2.628226598
  18     13.96719139      4.032808608
  11     11.52185104     -0.521851036
  6       5.774158519     0.225841481
  8.5     7.613877198     0.886122802
  4       1.683355495     2.316644505
  4       2.186135755     1.813864245
  28     27.26801463      0.731985367
  16     15.53266629      0.467333708
  11.5   10.56199781      0.938002187
  12     10.11635167      1.883648327
  4       5.397073324    -1.397073324
  7       6.836853159     0.163146841
  -1.5    1.500526309    -3.000526309
  2       1.946172449     0.053827551
  25     23.58857728      1.411422725
  18.5   18.34366502      0.156334982
  10.5   12.85878945     -2.358789455
  11.5   10.95050983      0.549490168
  4.5     2.597501422     1.902498578
  5       6.631170326    -1.631170326
  4       3.203123099     0.796876901
  -3.5    0.095026946    -3.595026946
  23     24.15991848     -1.15991848
  20.5   21.24607834     -0.746078337
  18     20.0919691      -2.091969104
  10.5   11.62469245     -1.124692453
  9       6.836853159     2.163146841
  7       5.979841353     1.020158647
  2       3.283110867    -1.283110867
  5       6.745438567    -1.745438567

  30. Plot of Sagarin Residuals

  31. Linear Relationship?

  32. Garbage In Garbage Out

  33. Residual Plot – Clue to GIGO
