
4. Regression Analysis


todd


  1. 4. Regression Analysis Finding an equation that best explains a functional relationship between two or more related random variables.

  2. Remember Graphing Functions In High School…. Y is said to be a function of X; that is, if you know the value of X, you know exactly what Y is: Y = mX + b, where m = slope = rise/run = ∆Y/∆X and b is the Y-intercept. You usually start with a function, then graph it. [Graph: line Y = mX + b with slope m and intercept b on the Y-axis]

  3. Imperfect Functional Relationships What if you had data points that suggested a statistical, functional relationship? Now you would like to draw a line that inexactly “fits” the data points, similar to what you did with a function. [Scatter plot: e.g., more staff → more sales]

  4. The function would look something like this … We still have the same math relationship as before, but instead of Y = mX + b, an exact functional relationship, we have Ŷ = b0 + b1X, where the “hat” on Y indicates it is a predicted value only! [Graph: fitted line Ŷ = b0 + b1X drawn through the data points]

  5. The Estimated Function is… The function that relates the actual data points is Y = β0 + β1X + ε, where the “ε” term is the vertical error in the estimate: ε = Y − Ŷ. The estimated function is Ŷ = b0 + b1X, where the “hat” on Y indicates it is a predicted value only! [Graph: data points, fitted line, and the vertical error ε between a point and the line]

  6. Statistical Estimates of β0, β1 The least-squares estimates are b1 = Σ(X − X̄)(Y − Ȳ) / Σ(X − X̄)² and b0 = Ȳ − b1X̄, where X̄ and Ȳ are the sample means of X and Y.

  7. Statistical Estimates from Previous Example So, b1 = 12.5/10 = 1.25, and b0 = 7 − 1.25(4) = 2. Question: In terms of statistics (rather than calculations), what is the estimate b1?
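The slide's calculation can be reproduced in a few lines. A minimal sketch in Python, using six hypothetical (X, Y) data points chosen so they match the slide's figures (X̄ = 4, Ȳ = 7, Σ(X − X̄)(Y − Ȳ) = 12.5, Σ(X − X̄)² = 10) — the original data table is not shown in the transcript:

```python
# Hypothetical data chosen to reproduce the slide's numbers;
# the slide's own data table did not survive in the transcript.
xs = [3.0, 4.0, 6.0, 4.0, 2.0, 5.0]
ys = [6.0, 8.0, 9.0, 5.0, 4.5, 9.5]

n = len(xs)
x_bar = sum(xs) / n            # X-bar = 4
y_bar = sum(ys) / n            # Y-bar = 7

# Least-squares estimates: b1 = Sxy / Sxx, b0 = Y-bar - b1 * X-bar
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))  # 12.5
sxx = sum((x - x_bar) ** 2 for x in xs)                       # 10.0
b1 = sxy / sxx                 # 12.5 / 10 = 1.25
b0 = y_bar - b1 * x_bar        # 7 - 1.25(4) = 2.0

print(f"Y-hat = {b0} + {b1}X")
```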

  8. Statistics Review

  9. Sum of Squares

  10. Sum of Squares: Example Note: SST = SSE + SSR

  11. Sum of Squares Graphically

  12. Coefficient of Determination “r2” The SSR is sometimes called the explained variability in Y while the SSE is the unexplained variability in Y. The proportion of the variability in Y that is explained by the regression is called the coefficient of determination, or “r2”. In the example, r2 = 15.625/22.5 = 0.6944, meaning that about 69% of the variability in Y was explained by the regression.
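The sums of squares and r² can be sketched with the same hypothetical six-point data set used above (chosen to reproduce the slide's figures; the fitted line is Ŷ = 2 + 1.25X):

```python
# Same hypothetical data as before; b0 = 2, b1 = 1.25 from the fit.
xs = [3.0, 4.0, 6.0, 4.0, 2.0, 5.0]
ys = [6.0, 8.0, 9.0, 5.0, 4.5, 9.5]
b0, b1 = 2.0, 1.25

y_bar = sum(ys) / len(ys)
y_hat = [b0 + b1 * x for x in xs]

sst = sum((y - y_bar) ** 2 for y in ys)               # total variability
sse = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))  # unexplained
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)          # explained

r2 = ssr / sst   # coefficient of determination
print(sst, sse, ssr, round(r2, 4))   # 22.5 6.875 15.625 0.6944
```

Note that SST = SSE + SSR holds in the output, matching the slide.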

  13. Correlation Coefficient The correlation coefficient “r” is simply the (positive or negative) square root of the coefficient of determination. −1 ≤ r ≤ +1; r > 0 when the slope is positive, and r < 0 when the slope is negative.

  14. Assumptions of the Regression Model

  15. Example of a “Good” Error Pattern

  16. Estimating the Variance The variance σ2 is typically not known. Its estimate is known as the Mean Squared Error (MSE), and is denoted by s2: s2 = MSE = SSE/(n − k − 1), where n is the number of observations and k is the number of independent variables. In the example, MSE = 6.8750/(6 − 1 − 1) = 1.7188

  17. Standard Error The square root of the MSE is called the standard error of the estimate, or the standard deviation of the regression. The standard error is used in many tests of the model. In the example, s = √(1.7188) = 1.31
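Both quantities follow directly from SSE; a minimal sketch using the slide's example numbers:

```python
# MSE and standard error from the slide's example:
# SSE = 6.875, n = 6 observations, k = 1 independent variable.
sse = 6.875
n, k = 6, 1

mse = sse / (n - k - 1)   # 6.875 / 4 = 1.71875
s = mse ** 0.5            # standard error of the estimate, ~1.31

print(mse, round(s, 2))   # 1.71875 1.31
```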

  18. Hypothesis Testing In statistics, rather than trying to prove that a relationship is important, we try to disprove, or reject, the idea that the relationship is not important. Null Hypothesis (H0): no relationship between X and Y. Alternate Hypothesis (H1): there is a relationship between X and Y. H0: β1 = 0; H1: β1 ≠ 0 We want to test to see if we can reject H0; if we reject H0, we would accept H1.

  19. The F-Test for Significance (reject H0) Define the Mean Squared Regression MSR: MSR = SSR/k, where “k” is the number of independent variables in the model. The “F-statistic” is then computed as: F = MSR/MSE = explained variability/unexplained variability, where F ≥ 0; a large F favors rejecting H0. In the example, MSR = SSR/k = 15.625/1 = 15.625 And F = 15.625/1.7188 = 9.0909
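The F computation is a two-line ratio; a sketch with the slide's numbers:

```python
# F-statistic from the slide's example: SSR = 15.625, SSE = 6.875,
# n = 6 observations, k = 1 independent variable.
ssr, sse = 15.625, 6.875
n, k = 6, 1

msr = ssr / k             # explained variability per variable: 15.625
mse = sse / (n - k - 1)   # unexplained variability: 1.71875
f = msr / mse             # large F => reject H0

print(round(f, 4))        # 9.0909
```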

  20. Analysis of Variance (ANOVA) Table Regression analysis will generate several statistics that can be used to test significance and other important aspects of the regression. Using statistical software (like SAS), these statistics will be summarized in an ANOVA table like the one below.

  21. Multiple Regression Analysis In multiple regression analysis, there is more than one independent variable. The technique expands naturally for k > 1 independent variables as: Y = β0 + β1X1 + β2X2 + …+ βkXk + ε with sample estimate: Ŷ = b0 + b1X1 + b2X2 + …+ bkXk

  22. Textbook Example of Multiple Regression: House selling price is a function of size (square footage) and age. (Condition will be introduced later)

  23. Example (cont.): Multiple Regression Estimate Ŷ = b0 + b1X1 + b2X2 where X1 = sqr. ft. and X2 = age Regression: Ŷ = 60,815 + 22X1 – 1,449X2 Thus, for example, a 1,900 sqr. ft. house that is 10 years old is estimated to cost = 60,815 + 22(1,900) – 1,449(10) = $88,125
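The point estimate can be sketched as a one-line function, with the coefficients taken from the regression above:

```python
# House-price model from the slide: Y-hat = 60,815 + 22*X1 - 1,449*X2,
# where X1 = square footage and X2 = age in years.
def predicted_price(sqft: float, age: float) -> float:
    return 60_815 + 22 * sqft - 1_449 * age

price = predicted_price(1_900, 10)   # 60,815 + 41,800 - 14,490
print(price)                         # 88125
```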

  24. Multiple Regression Notes

  25. Binary “Dummy” Variables

  26. House-Price Example (cont.) With Dummy Variables for Condition Same example and data as before, but add the house condition into the regression with dummy variables: X3 = 1 for “excellent”; X3 = 0 otherwise X4 = 1 for “mint” (i.e. perfect); X4 = 0 otherwise *Note: By implication: X3 = X4 = 0 for “good” Regression: Ŷ = 48,329 + 28.2X1 – 1,981X2 + 16,581X3 + 23,684X4
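A sketch of how the dummy variables enter the prediction; mapping a condition string to the two dummies is an illustrative choice, not part of the slide:

```python
# Dummy-variable model from the slide:
# Y-hat = 48,329 + 28.2*X1 - 1,981*X2 + 16,581*X3 + 23,684*X4
# X3 = 1 for "excellent", X4 = 1 for "mint"; both 0 for "good".
def predicted_price(sqft: float, age: float, condition: str) -> float:
    x3 = 1.0 if condition == "excellent" else 0.0
    x4 = 1.0 if condition == "mint" else 0.0
    return 48_329 + 28.2 * sqft - 1_981 * age + 16_581 * x3 + 23_684 * x4

base = predicted_price(1_900, 10, "good")
print(base)                                             # good-condition estimate
print(predicted_price(1_900, 10, "excellent") - base)   # +16,581 premium
print(predicted_price(1_900, 10, "mint") - base)        # +23,684 premium
```

The condition dummies simply shift the intercept: an "excellent" house of the same size and age is predicted to sell for $16,581 more than a "good" one, and a "mint" house for $23,684 more.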

  27. Non-Linear Models

  28. Textbook Example of Non-Linearity: How far a car can go with its petrol is a non-linear function of its weight. (Heavier cars don’t go as far!) MPG = b0 + b1X1 + b2X2 where X1 = weight and X2 = (weight)² Regression: Ŷ = 79.8 – 30.2X1 + 3.4X2 “MPG” = miles per gallon; i.e. how far the car can go on its petrol.
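The quadratic model can be evaluated directly. A sketch; note that the units of weight (likely thousands of pounds, as in the usual textbook data) are an assumption here:

```python
# Non-linear (quadratic-in-weight) model from the slide:
# MPG-hat = 79.8 - 30.2*X1 + 3.4*X2, with X1 = weight, X2 = weight**2.
# Weight is assumed to be in thousands of pounds.
def predicted_mpg(weight: float) -> float:
    return 79.8 - 30.2 * weight + 3.4 * weight ** 2

# Heavier cars are predicted to get fewer miles per gallon
# over the realistic weight range.
print(predicted_mpg(2.0))   # lighter car
print(predicted_mpg(4.0))   # heavier car
```

Treating (weight)² as just another independent variable X2 is what keeps the model linear in its coefficients, so ordinary multiple regression still applies.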

  29. Caution About Regression Analysis
