
Presentation Transcript


  1. Chapter 20 Linear Regression

  2. What if… • We believe that an important relation between two measures exists? • For example, we ask 5 people about their salary and education level • For each observation we have two measures, and those two measures came from the same person

  3. What would we “predict”? • Does more education mean more salary? • Does more salary mean more education? • Does more education mean less salary? • Does more salary mean less education? • Are salary and education related?

  4. Regression • Descriptive vs. Inferential • Bivariate data - measurements on two variables for each observation • Heights (X) and weights (Y) • IQ (X) and SAT(Y) scores • Years of educ. (X) and Annual salary (Y) • Number of Policemen (X) and Number of crimes (Y) in US cities

  5. Regression • How are the two sets of scores related? • Using a scatterplot we can “look” at the relationship • Constructed by plotting each of the bivariate observations (X, Y)

  6. Regression • Which one’s X and which one’s Y? • That’s up to you, but… • Generally, the X variable is thought of as the “predictor” variable • We try to predict a Y score given an X score

  7. Regression • If the scores seem to “line up,” we call this a “linear relationship”

  8. Interpreting Scatterplots • If the following relations hold: low x with high y, mid x with mid y, high x with low y • “A negative linear relationship”

  9. Interpreting Scatterplots • If the following relations hold: low x with low y, mid x with mid y, high x with high y • “A positive linear relationship”

  10. Interpreting Scatterplots • However, there can also be “no relation”

  11. Interpreting Scatterplots • Curvilinear (the points follow a curve rather than a straight line)

  12. Measuring Linear Relationships • The first measure of a linear relationship (not in the book) is COVARIANCE (s_XY)

  13. Or SP_XY is known as the “Sum of Products”, or the sum of the products of the deviations of X and Y from their means
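The formula on this slide appears only as an image in the transcript. Written out from the verbal definition above, and assuming the usual sample (n − 1) denominator for the covariance:

    SP_XY = Σ (Xi − X̄)(Yi − Ȳ)
    s_XY  = SP_XY / (n − 1)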

  14. Easy Calculation
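The “easy calculation” slide is also an image; a standard computational shortcut for the sum of products, presumably what the slide shows, is:

    SP_XY = Σ XiYi − (Σ Xi)(Σ Yi) / n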

  15. Covariance • Interpretation: • positive = positive linear relationship • negative = negative linear relationship • zero = no relationship • Magnitude (strength of the relationship)? • Uninterpretable • for example, a large covariance does not necessarily mean a strong relationship
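A minimal sketch (not from the slides) of why the magnitude of the covariance is hard to interpret: rescaling X, say measuring education in months instead of years, inflates the covariance even though the relationship itself is unchanged. The data below are made up purely for illustration.

    # Hypothetical data: years of education (X) and annual salary (Y) for 5 people.
    # These numbers are invented; they are not the slides' data.
    X = [8, 10, 12, 16, 20]
    Y = [22000, 30000, 38000, 52000, 65000]

    def covariance(x, y):
        """Sample covariance: SP_XY / (n - 1)."""
        n = len(x)
        mean_x, mean_y = sum(x) / n, sum(y) / n
        return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)

    print(covariance(X, Y))                      # education measured in years
    print(covariance([12 * xi for xi in X], Y))  # same data in months: exactly 12x larger

The second covariance is twelve times the first even though the scatterplot, and the strength of the relationship, is identical.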

  16. But, we can use covariance • Which line best fits our data? • Do we just draw one that looks good? • No, we can use something called “least squares regression” to find the equation of the best-fit line (“Best-fit linear regression”)

  17. Linear Equations • Yi = mXi + b • m = slope • b = y-intercept

  18. Finding the Slope

  19. Or…

  20. Finding the y-intercept (b) • After finding the slope (m), find b using:
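The slope and intercept formulas on slides 18–20 appear only as images in the transcript; the standard least-squares results, consistent with the definitions above, are:

    SS_X = Σ (Xi − X̄)²
    m = SP_XY / SS_X        (equivalently, m = s_XY / s_X²)
    b = Ȳ − m X̄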

  21. Least Squares Criterion • The best line has the property of least squares • The sum of the squared deviations of the points from the line are a minimum

  22. What’s the “least” again? • What are we trying to minimize? • The best fit line will be described by the function Yi = mXi + b • Thus, for any Xi, we can estimate a corresponding Yi value • Problem: for some Xi’s we already have Yi’s • So, let’s call the estimated value Ŷi (“Y-sub-i-hat”) to differentiate it from the “real” Yi

  23. Least Squares Criterion • For example, when Xi = 15 we would estimate that Ŷi = 44,000 • But we have a “real” Yi value corresponding to Xi = 15: 35,000

  24. Minimize this… • For every Xi, we have a real value Yi and an estimate of Yi (Ŷi) • Consider the quantity Yi − Ŷi • Which is the deviation of the real score from the estimated score, for any given Xi value • The sum of these deviations will be zero

  25. But, by squaring those deviations and summing them, we get the sum of squared deviations (written out below) • We want the line that makes this quantity the minimum (the least squares criterion) • This is also called the sum of squares error or SSE (how much do our estimates “err” from our real values?)
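Written out (the slide shows it only as an image), the quantity to minimize is:

    SSE = Σ (Yi − Ŷi)²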

  26. How accurate are our Estimates? • Two ways to measure how “good” our estimates are: • Standard Error of the Estimate • Coefficient of Determination (not covered in our book, yet)

  27. Standard Error of the Estimate • But this term is very hard to interpret (Hurrah, there are better ways to measure the goodness of the fit!)
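The formula on this slide is not reproduced; the conventional definition for simple linear regression, assuming the usual n − 2 degrees of freedom, is:

    s_est = sqrt( SSE / (n − 2) )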

  28. Coefficient of Determination • cd = r²
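A standard equivalent form, tying r² back to SSE: r² = 1 − SSE / SS_Y, the proportion of the variability in Y accounted for by the regression line (where SS_Y = Σ (Yi − Ȳ)²).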

  29. Now You:

  30. Practice:

  31. Practice:

  32. Practice:
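The practice slides are images in the original transcript. As a stand-in, here is a short Python sketch that runs the chapter’s computations end to end on hypothetical data (the numbers are made up, not taken from the slides):

    # Hypothetical bivariate data: years of education (X) and annual salary (Y).
    # Invented numbers for practice only; not the data from the slides.
    X = [8, 10, 12, 16, 20]
    Y = [22000, 30000, 38000, 52000, 65000]

    n = len(X)
    mean_x, mean_y = sum(X) / n, sum(Y) / n

    SP_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y))  # sum of products
    SS_x  = sum((x - mean_x) ** 2 for x in X)                       # sum of squares of X
    SS_y  = sum((y - mean_y) ** 2 for y in Y)                       # sum of squares of Y

    m = SP_xy / SS_x           # least-squares slope
    b = mean_y - m * mean_x    # y-intercept

    Y_hat = [m * x + b for x in X]                         # estimated Y values
    SSE   = sum((y - yh) ** 2 for y, yh in zip(Y, Y_hat))  # sum of squared errors
    r2    = 1 - SSE / SS_y                                 # coefficient of determination

    print(f"slope = {m:.1f}, intercept = {b:.1f}, SSE = {SSE:.1f}, r^2 = {r2:.3f}")

The same slope and intercept can be checked against a library fit such as numpy.polyfit(X, Y, 1).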
