Statistical Inference and Regression Analysis: GB.3302.30 - PowerPoint PPT Presentation

Presentation Transcript

  1. Statistical Inference and Regression Analysis: GB.3302.30 Professor William Greene Stern School of Business IOMS Department Department of Economics

  2. Statistics and Data Analysis Part 6 – Regression Model-1 Conditional Mean

  3. U.S. Gasoline Price: 6 Months and 5 Years

  4. Impact of Change in Gasoline Price on Consumer Demand? • Demand for gasoline • Long term vs. short term • Income • Elasticity concepts • Demand for food

  5. Movie Success vs. Movie Online Buzz Before Release (2009)

  6. Internet Buzz and Movie Success Box office sales vs. Can’t wait votes 3 weeks before release

  7. Is There Really a Relationship? Box office is obviously not exactly equal to f(Buzz) for any function f. But the two do appear to be “related,” perhaps statistically, that is, stochastically. There is a covariance. The linear regression summarizes it. A predictor would be Box Office = a + b Buzz. Is b really > 0? What would be implied by b > 0?

  8. Covariation – Education and Life Expectancy Causality? Covariation? Does more education make people live longer? Is there a hidden driver of both? (Per capita GDP?)

  9. Using Regression to Predict The equation would not predict Titanic. Predictor: Overseas box office = a + b Domestic box office. The prediction will not be perfect. We construct a range of “uncertainty.”

  10. Conditional Variation and Regression • Conditional distribution of a pair of random variables • f(y|x) or P(y|x) • Mean function, E[y|x] = Regression of y on x.

  11. Expected Income Depends on Household Size y|x ~ Normal[20 + 3x, 4^2], x = 1, 2, 3, 4; Poisson (Figure labels: X=1, X=2, X=3, X=4.)
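
The mean function on this slide can be checked by simulation. This is a minimal sketch, assuming the second argument of the slide's Normal[...] is a variance of 4^2 (a standard deviation of 4). The seed, the number of draws, and the function name `simulate_conditional_means` are illustrative choices, not part of the slides.

```python
import random
import statistics

def simulate_conditional_means(draws_per_x=20000, seed=0):
    """Draw y | x ~ Normal(20 + 3x, sd = 4) for x = 1..4 and average y within each x."""
    rng = random.Random(seed)
    means = {}
    for x in (1, 2, 3, 4):
        # Assumption: standard deviation 4, i.e. variance 4^2.
        ys = [rng.gauss(20 + 3 * x, 4) for _ in range(draws_per_x)]
        means[x] = statistics.fmean(ys)
    return means

means = simulate_conditional_means()
for x, m in means.items():
    print(f"x = {x}: sample mean of y = {m:.2f}, E[y|x] = {20 + 3 * x}")
```

The within-group sample means should sit close to the line 20 + 3x, which is what "regression of y on x" means here.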

  12. Average Box Office by Internet Buzz Index = Average Box Office for Buzz in Interval

  13. Linear Regression? Fuel Bills vs. Number of Rooms

  14. Independent vs. Dependent Variables • Y in the model • Dependent variable • Response variable • X in the model • Independent variable: Meaning of ‘independent’ • Regressor • Covariate • Conditional vs. joint distribution

  15. Linearity and Functional Form • y = g(x) • h(y) = α + βf(x) • y = α + βx • y = exp(α + βx); log y = α + βx • y = α + β(1/x) = α + βf(x) • y = e^α x^β; log y = α + β log x • Etc.
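
The exponential and power forms in this list become linear after taking logs. As a short worked step, writing the intercept and slope as α and β and assuming x > 0 and y > 0 so that the logs exist:

```latex
\begin{aligned}
y = \exp(\alpha + \beta x) \;&\Longrightarrow\; \log y = \alpha + \beta x, \\
y = e^{\alpha} x^{\beta} \;&\Longrightarrow\; \log y = \alpha + \beta \log x,
\qquad
\beta = \frac{d \log y}{d \log x} = \frac{dy}{dx} \cdot \frac{x}{y}.
\end{aligned}
```

In the log-log form, β is the elasticity of y with respect to x, which ties back to the elasticity concepts listed on slide 4.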

  16. Inference and Regression Least Squares

  17. Fitting a Line to a Set of Points Gauss’s method of least squares. Predictions: a + bxi. Residuals: yi - (a + bxi). Choose a and b to minimize the sum of squared residuals. (Figure axes: Xi, Yi.)

  18. Least Squares Regression

  19. Least Squares Algebra

  20. Least Squares

  21. Normal Equations

  22. Computing the Least Squares Parameters a and b (We will use s_y^2 later.)
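
The formulas on slides 18 to 22 are not part of this transcript. As a sketch of the standard textbook algebra behind those slide titles, in the a, b notation used elsewhere in the deck (the definition of s_y^2 as the sample variance of y is an assumption about what the slide intends):

```latex
\begin{aligned}
\min_{a,b}\; S(a,b) &= \sum_{i=1}^{n} (y_i - a - b x_i)^2 \\
\frac{\partial S}{\partial a} = 0 &:\quad \sum_{i} (y_i - a - b x_i) = 0 \\
\frac{\partial S}{\partial b} = 0 &:\quad \sum_{i} x_i (y_i - a - b x_i) = 0 \\
b &= \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2},
\qquad a = \bar{y} - b\,\bar{x} \\
s_y^2 &= \frac{1}{n-1} \sum_i (y_i - \bar{y})^2
\end{aligned}
```

The two first-order conditions are the normal equations. The first says the residuals sum to zero, and the second says they are uncorrelated with x in the sample.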

  23. Least Absolute Deviations

  24. Least Squares vs. LAD
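
A small numerical comparison may help here. The sketch below fits both criteria to made-up data with one outlier. The LAD fit uses a brute-force search that relies on the fact that some LAD-optimal line passes through two data points; this is an illustration, not necessarily the method used in the course. The function names and the data are hypothetical.

```python
from itertools import combinations

def least_squares(xs, ys):
    """Return (a, b) minimizing the sum of squared residuals."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b

def least_absolute_deviations(xs, ys):
    """Return (a, b) minimizing the sum of absolute residuals.

    Brute force: some LAD-optimal line passes through two data points,
    so it is enough to try every pair of points with distinct x values.
    """
    best = None
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        if x1 == x2:
            continue  # a vertical line is not a candidate
        b = (y2 - y1) / (x2 - x1)
        a = y1 - b * x1
        loss = sum(abs(y - a - b * x) for x, y in zip(xs, ys))
        if best is None or loss < best[0]:
            best = (loss, a, b)
    return best[1], best[2]

# Made-up data: y = 1 + 2x exactly, except for one outlier at x = 5.
xs = [0, 1, 2, 3, 4, 5]
ys = [1, 3, 5, 7, 9, 30]

ls_a, ls_b = least_squares(xs, ys)
lad_a, lad_b = least_absolute_deviations(xs, ys)
print(f"Least squares: a = {ls_a:.2f}, b = {ls_b:.2f}")
print(f"LAD:           a = {lad_a:.2f}, b = {lad_b:.2f}")
```

On this data the LAD line stays on the five collinear points, while the least squares line is pulled toward the outlier because squaring gives large residuals more weight.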

  25. Inference and Regression Regression Model

  26. b Measures Covariation Predictor Box Office = a + b Buzz.

  27. Interpreting the Function a = the life expectancy associated with 0 years of education. No country has 0 average years of education. The regression only applies in the range of experience. b = the increase in life expectancy associated with each additional year of average education. (Figure labels: a, b, and the range of experience (education).)

  28. Covariation and Causality Does more education make you live longer (on average)?

  29. Causality? Correlation = 0.84 (!) Height (inches) and Income ($/mo.) in first post-MBA Job (men). WSJ, 12/30/86.

      Ht.  Inc.    Ht.  Inc.    Ht.  Inc.
      70   2990    68   2910    75   3150
      67   2870    66   2840    68   2860
      69   2950    71   3180    69   2930
      70   3140    68   3020    76   3210
      65   2790    73   3220    71   3180
      73   3230    73   3370    66   2670
      64   2880    70   3180    69   3050
      70   3140    71   3340    65   2750
      69   3000    69   2970    67   2960
      73   3170    73   3240    70   3050

      Estimated Income = -451 + 50.2 Height
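
The estimated equation and the correlation on this slide can be recomputed from the 30 listed pairs. This is a minimal sketch using the usual least squares formulas, b = Sxy/Sxx and a = ȳ - b x̄. The variable names are illustrative, and the data are the (height, income) pairs as transcribed above.

```python
import math

# (height in inches, income in $/month), as transcribed from the slide.
data = [
    (70, 2990), (68, 2910), (75, 3150), (67, 2870), (66, 2840), (68, 2860),
    (69, 2950), (71, 3180), (69, 2930), (70, 3140), (68, 3020), (76, 3210),
    (65, 2790), (73, 3220), (71, 3180), (73, 3230), (73, 3370), (66, 2670),
    (64, 2880), (70, 3180), (69, 3050), (70, 3140), (71, 3340), (65, 2750),
    (69, 3000), (69, 2970), (67, 2960), (73, 3170), (73, 3240), (70, 3050),
]

heights = [h for h, _ in data]
incomes = [inc for _, inc in data]
n = len(data)

h_bar = sum(heights) / n
inc_bar = sum(incomes) / n

# Sums of squares and cross products around the means.
sxx = sum((h - h_bar) ** 2 for h in heights)
syy = sum((inc - inc_bar) ** 2 for inc in incomes)
sxy = sum((h - h_bar) * (inc - inc_bar) for h, inc in data)

b = sxy / sxx                    # slope: $/month per additional inch
a = inc_bar - b * h_bar          # intercept
r = sxy / math.sqrt(sxx * syy)   # sample correlation

print(f"Estimated Income = {a:.0f} + {b:.1f} Height, correlation = {r:.2f}")
```

The result should land close to the slide's -451 + 50.2 Height and correlation of 0.84. Recomputing them does not answer the causality question the slide raises.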

  30. Inference and Regression Analysis of Variance

  31. Regression Fits Regression of salary vs. years of experience. Regression of fuel bill vs. number of rooms for a sample of homes.

  32. Regression Arithmetic

  33. Variance Decomposition

  34. Fit of the Equation to the Data

  35. Regression vs. Residual SS

  36. Analysis of Variance Table

  37. Explained Variation • The proportion of variation “explained” by the regression is called R-squared (R^2) • It is also called the Coefficient of Determination • (It is the square of something – to be shown later)
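
The formulas on slides 32 to 36 are not part of this transcript. As a sketch of the standard decomposition that these titles and the R-squared definition refer to, for a least squares line with an intercept, with fitted values ŷ_i = a + b x_i and residuals e_i = y_i - ŷ_i:

```latex
\underbrace{\sum_i (y_i - \bar{y})^2}_{\text{total SS}}
= \underbrace{\sum_i (\hat{y}_i - \bar{y})^2}_{\text{regression SS}}
+ \underbrace{\sum_i e_i^2}_{\text{residual SS}},
\qquad
R^2 = \frac{\text{regression SS}}{\text{total SS}}
    = 1 - \frac{\text{residual SS}}{\text{total SS}}.
```

The cross term vanishes because of the normal equations, which is why the split is exact and R^2 lies between 0 and 1.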

  38. ANOVA Table

  39. Movie Madness Fit

  40. Regression Fits (four panels): R^2 = 0.360, R^2 = 0.522, R^2 = 0.424, R^2 = 0.880

  41. R Squared Benchmarks • Aggregate time series: expect .9+ • Cross sections: .5 is good. Sometimes we do much better. • Large survey data sets: .2 is not bad. R^2 = 0.924 in this cross section.

  42. Correlation Coefficient

  43. Correlations (four panels): r_xy = 0.6, r_xy = 0.723, r_xy = +1.000, r_xy = -.402

  44. R-Squared is r_xy^2 • R-squared is the square of the correlation between yi and the predicted yi, which is a + bxi. • The correlation between yi and (a + bxi) is the same as the correlation between yi and xi, up to the sign of b. • Therefore,…. • A regression with a high R^2 predicts yi well.
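
The step from the regression to the squared correlation can be written out in one line. This is a sketch using the standard shorthand S_xx, S_yy, S_xy for the sums of squares and cross products around the means; the shorthand is not notation taken from the slides.

```latex
\hat{y}_i - \bar{y} = b\,(x_i - \bar{x}),
\quad b = \frac{S_{xy}}{S_{xx}}
\;\Longrightarrow\;
R^2 = \frac{\sum_i (\hat{y}_i - \bar{y})^2}{\sum_i (y_i - \bar{y})^2}
    = \frac{b^2 S_{xx}}{S_{yy}}
    = \frac{S_{xy}^2}{S_{xx}\,S_{yy}}
    = r_{xy}^2 .
```

This holds for a regression on a single x with an intercept; the sign of r_xy is the sign of b.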

  45. Squared Correlations (four panels): r_xy^2 = 0.36, r_xy^2 = 0.522, r_xy^2 = .924, r_xy^2 = .161

  46. Movie Madness Annotations: Estimated equation • Estimated coefficients a and b • S = s_e = estimated std. deviation of ε • Square of the sample correlation between x and y • N-2 = degrees of freedom • Sum of squared residuals, Σ_i e_i^2 • S^2 = s_e^2
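
The quantities annotated on this slide can be computed by hand for any simple regression. The sketch below does so for a small made-up data set; it does not use the movie data, which is not in this transcript. The function name `regression_summary` and the numbers in `xs` and `ys` are hypothetical.

```python
import math

def regression_summary(xs, ys):
    """Least squares fit of y on x, plus the summary quantities named on the slide."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

    b = sxy / sxx
    a = y_bar - b * x_bar
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    sse = sum(e * e for e in residuals)   # sum of squared residuals
    df = n - 2                            # two estimated coefficients: a and b
    s2 = sse / df                         # S^2 = s_e^2
    return {
        "a": a,
        "b": b,
        "sse": sse,
        "df": df,
        "s2": s2,
        "s": math.sqrt(s2),               # S = s_e
        "r2": 1 - sse / syy,              # R-squared from the variance split
        "r_xy_squared": sxy ** 2 / (sxx * syy),  # squared sample correlation
    }

# Made-up illustration data.
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]

summary = regression_summary(xs, ys)
for name, value in summary.items():
    print(f"{name:>13}: {value:.4f}")
```

The last two entries agree, up to rounding error, because R-squared equals the squared sample correlation between x and y.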

  47. Software