Statistics and Data Analysis

1. Statistics and Data Analysis Professor William Greene Stern School of Business IOMS Department Department of Economics

2. Statistics and Data Analysis

3. Multiple Regression Models Using Minitab To Compute A Multiple Regression Basic Multiple Regression Using Binary Variables Logs and Elasticities Hedonic Regression and Interpretation Trends in Time Series Data Using Quadratic Terms to Improve the Model

4. Application: WHO Data Used in Assignment 1: WHO data on 191 countries in 1995-1999. Analysis of Disability Adjusted Life Expectancy = DALE. EDUC = average years of education. PCHexp = per capita health expenditure. DALE = a + β1 EDUC + β2 PCHexp + e
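The slides compute this regression in Minitab. As a rough illustration of the same calculation, here is a minimal Python sketch that fits DALE = a + β1 EDUC + β2 PCHexp + e by least squares. The data are synthetic stand-ins generated in the code (not the WHO data), and the "true" coefficients 40, 2.5 and 0.004 are assumptions chosen only so the fit has something to recover.

```python
import numpy as np

def fit_ols(y, *regressors):
    """Least-squares fit of y on a constant plus the given regressors."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef  # [a, b1, b2, ...]

# Synthetic stand-in data (NOT the WHO data): 191 made-up "countries".
rng = np.random.default_rng(0)
n = 191
educ = rng.uniform(1, 12, n)         # hypothetical years of education
pchexp = rng.uniform(10, 3000, n)    # hypothetical per capita health spending
dale = 40 + 2.5 * educ + 0.004 * pchexp + rng.normal(0, 3, n)

a, b1, b2 = fit_ols(dale, educ, pchexp)
print(f"a = {a:.2f}, b1 (EDUC) = {b1:.3f}, b2 (PCHexp) = {b2:.5f}")
```

The estimates should land near the assumed values; with the real data the coefficients would of course be whatever the WHO sample implies.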

5. The (Famous) WHO Data

7. Specify the Variables in the Model

9. Graphs? Maybe

10. Regression Results

11. Practical Model Building Understanding the regression: The left out variable problem Using different kinds of variables Dummy variables Logs Time trend Quadratic

12. A Fundamental Result What happens when you leave a crucial variable out of your model? (Bad things)
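One way to see the "bad things" is a small simulation. The sketch below uses made-up data in which y depends on two correlated variables, then fits the model with and without the second one. All numbers (coefficients 2 and 3, the 0.8 link between x1 and x2) are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(size=n)    # x2 is correlated with x1
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)

def ols(y, *cols):
    """Coefficients from regressing y on a constant and the given columns."""
    X = np.column_stack([np.ones(len(y)), *cols])
    return np.linalg.lstsq(X, y, rcond=None)[0]

full = ols(y, x1, x2)     # both variables included
short = ols(y, x1)        # x2 left out

print("coefficient on x1, full model :", round(full[1], 2))
print("coefficient on x1, short model:", round(short[1], 2))
```

In the short regression the coefficient on x1 absorbs part of the effect of the omitted x2 (here roughly 2 + 3 × 0.8), so it no longer measures the effect of x1 alone.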

13. Using Dummy Variables Dummy variable = binary variable = a variable that takes values 0 and 1. E.g., OECD life expectancies compared to the rest of the world: DALE = a + β1 EDUC + β2 PCHexp + β3 OECD + e

14. OECD Life Expectancy

15. Binary Variable in Regression

16. Plotting

17. Two Plots

18. Dummy Variable in Log Regression E.g., Monet's signature equation: log Price = a + β1 log Area + β2 Signed. Unsigned: PriceU = exp(a) Area^β1. Signed: PriceS = exp(a) Area^β1 exp(β2). Signed/Unsigned = exp(β2). %Difference = 100%(Signed - Unsigned)/Unsigned = 100%[exp(β2) - 1]
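The conversion from a dummy coefficient in a log equation to a percentage difference is easy to get wrong, so a short sketch may help. The function names are hypothetical, and the coefficient 0.10 is an illustrative value, not an estimate from the Monet data.

```python
import math

def pct_effect_of_dummy(beta):
    """Percentage difference implied by a dummy coefficient in a log equation."""
    return 100.0 * (math.exp(beta) - 1.0)

def dummy_coef_for_pct(pct):
    """Inverse: the log-equation coefficient that implies a given % difference."""
    return math.log(1.0 + pct / 100.0)

# Illustrative coefficient (an assumption): 0.10 is close to, but not exactly, 10%.
print(f"beta = 0.10 implies {pct_effect_of_dummy(0.10):.2f}%")

# A 253% effect corresponds to a coefficient of about 1.26.
print(f"253% implies beta = {dummy_coef_for_pct(253.0):.3f}")
```

For small coefficients, 100β is a fair approximation to the percentage effect; for large ones, such as a coefficient above 1, the exact formula matters a great deal.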

19. The Signature Effect: 253%

20. Monet Paintings in Millions

22. Dummy Variable for One Observation Single out one observation for special attention. The equation will predict that observation perfectly. For the other coefficients, it is the same as removing that observation from the sample.
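Both claims on this slide can be checked numerically. The sketch below uses synthetic data with one deliberately unusual observation (index 7, an arbitrary choice), then compares a regression that includes a dummy for that observation with a regression that simply drops it.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 30
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)
y[7] += 10.0                       # make observation 7 unusual

d = np.zeros(n)
d[7] = 1.0                         # dummy = 1 only for observation 7

# Regression with the one-observation dummy.
X_dummy = np.column_stack([np.ones(n), x, d])
b_dummy = np.linalg.lstsq(X_dummy, y, rcond=None)[0]

# Regression with observation 7 removed from the sample.
keep = np.arange(n) != 7
X_drop = np.column_stack([np.ones(n - 1), x[keep]])
b_drop = np.linalg.lstsq(X_drop, y[keep], rcond=None)[0]

resid_7 = y[7] - X_dummy[7] @ b_dummy   # residual for the singled-out observation

print("constant and slope, with dummy  :", b_dummy[:2])
print("constant and slope, obs. dropped:", b_drop)
print("residual for observation 7      :", resid_7)
```

The constant and slope agree across the two fits, and the residual for the singled-out observation is zero up to rounding error.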

23. A "London Effect" on UK Electronic Store Sales?

25. Logs in Regression

26. Elasticity The coefficient on log(Area) is 1.346. For each 1% increase in area, price goes up by about 1.35%, even accounting for the signature effect. The elasticity is +1.35. Remarkable: not only does price increase with area, it increases faster than area.

27. Monet: By the Square Inch

28. Elasticities of Demand for Gasoline

29. Logs and Elasticities Theory: In the equation y = a + β1 x1 + β2 x2 + ... + βK xK + e, β = (change in y) / (unit change in x), and Elasticity = β * (mean of x) / (mean of y). When the variables are in logs, a change in log x is approximately the proportional (%) change in x: log y = a + β1 log x1 + β2 log x2 + ... + βK log xK + e, and Elasticity = β. These will often give approximately the same answer. When in doubt, use logs.
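To see the two elasticity calculations side by side, the following sketch generates synthetic data with a constant elasticity of 1.3 (an assumed value), then estimates the elasticity once from a linear regression evaluated at the means and once from a log-log regression.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
x = rng.uniform(50, 150, n)
# Synthetic data with a true elasticity of 1.3 (an assumption for illustration).
y = 2.0 * x ** 1.3 * np.exp(rng.normal(0, 0.1, n))

def slope(y, x):
    """Slope from a simple regression of y on a constant and x."""
    X = np.column_stack([np.ones(len(x)), x])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

# Linear model: elasticity = beta * mean(x) / mean(y).
elasticity_linear = slope(y, x) * x.mean() / y.mean()

# Log-log model: the slope is the elasticity.
elasticity_logs = slope(np.log(y), np.log(x))

print(f"linear model, at the means: {elasticity_linear:.2f}")
print(f"log-log model             : {elasticity_logs:.2f}")
```

The two numbers should be close but not identical: the linear version is an approximation at the sample means, while the log-log slope estimates the elasticity directly.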

30. Elasticities

31. A Set of Dummy Variables A complete set of dummy variables divides the sample into groups. Fit the regression with "group" effects. Need to drop one (any one) of the variables to compute the regression. (Avoid the "dummy variable trap.")
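The reason one dummy must be dropped is that the full set of dummies adds up to the constant term, so the columns of the regression are not independent. A minimal sketch with three made-up groups shows this through the rank of the data matrix.

```python
import numpy as np

# Ten observations in three hypothetical groups, labeled 0, 1, 2.
groups = np.array([0, 1, 2, 0, 1, 2, 0, 1, 2, 0])
D = (groups[:, None] == np.arange(3)).astype(float)   # one dummy column per group
const = np.ones((len(groups), 1))

X_trap = np.hstack([const, D])          # constant + all 3 dummies (4 columns)
X_ok = np.hstack([const, D[:, 1:]])     # constant + 2 dummies, group 0 dropped

rank_trap = np.linalg.matrix_rank(X_trap)
rank_ok = np.linalg.matrix_rank(X_ok)

print("trap : columns =", X_trap.shape[1], " rank =", rank_trap)
print("fixed: columns =", X_ok.shape[1], " rank =", rank_ok)
```

When the rank is smaller than the number of columns, the regression coefficients are not uniquely determined. Dropping any one dummy restores full rank, and the remaining dummy coefficients are then read as differences from the dropped group.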

32. Rankings of 132 U.S. Liberal Arts Colleges

33. Minitab to the Rescue

34. Unordered Categorical Variables

35. Transform Style to Types

37. House Price Regression

38. Ordered Categories Health Satisfaction: 1=Poor, 2=So_so, 3=OK, 4=Good, 5=Great. How to handle such a variable? Just use as is? No: So_so - Poor = 1, but this is not (necessarily) equal to Great - Good = 1. Use 4 of the 5 indicator variables. Coding: it is not useful to consider modifications of the variable, such as -2,-1,0,1,2 or 2,4,6,8,10. None make sense, as this is just a label. Could also use 1,4,8,17,26, which would also make no sense. This needs a special kind of model if it is the dependent variable, not a regression equation.
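The point that the codes are "just a label" can be illustrated with a sketch: with indicator variables, the fitted regression is the same whether the categories are labeled 1 to 5 or 1,4,8,17,26, whereas using the codes as numbers gives a different fit for each labeling. The data and group effects below are synthetic assumptions.

```python
import numpy as np

def indicators(labels):
    """Indicator columns for every category except the lowest one."""
    cats = np.unique(labels)              # sorted distinct labels
    return (labels[:, None] == cats[1:]).astype(float)

def fitted(y, regressors):
    """Fitted values from regressing y on a constant and the regressors."""
    X = np.column_stack([np.ones(len(y)), regressors])
    return X @ np.linalg.lstsq(X, y, rcond=None)[0]

rng = np.random.default_rng(4)
health = rng.integers(1, 6, size=200)                  # codes 1..5
relabel = np.array([0, 1, 4, 8, 17, 26])[health]       # same categories, other labels
# Made-up, unevenly spaced category effects plus noise.
y = np.array([0.0, 0.0, 0.2, 1.5, 1.6, 3.0])[health] + rng.normal(size=200)

same_with_indicators = np.allclose(
    fitted(y, indicators(health)), fitted(y, indicators(relabel)))
same_with_codes = np.allclose(
    fitted(y, health.astype(float)), fitted(y, relabel.astype(float)))

print("indicator coding, same fit under relabeling:", same_with_indicators)
print("numeric codes, same fit under relabeling   :", same_with_codes)
```

This covers the case where the ordered variable is a regressor. As the slide notes, a different kind of model is needed when it is the dependent variable.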

39. Hedonic Regression A theory of prices: Price = sum of prices for components. House price: land size; rooms (fixed amount per room); swimming pool; view; N car garage; etc. Computers: speed; screen size; other features...

40. Fumiro Computer Data

41. Transform Manufacturer Names to Indicator Variables

42. Hedonic Regression

43. Time Trends in Regression y = a + β1 x + β2 t + e: β2 is the year to year increase not explained by anything else. log y = a + β1 log x + β2 t + e (not log t, just t): 100β2 is the year to year % increase not explained by anything else.
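As an illustration of reading 100β2 as a yearly percentage increase, the sketch below builds 40 years of synthetic data with an assumed 3% annual trend and recovers it from the log equation. The series and all coefficients are made up.

```python
import numpy as np

rng = np.random.default_rng(5)
T = 40
t = np.arange(T, dtype=float)         # time index: 0, 1, 2, ... (just t, not log t)
x = rng.uniform(50, 150, T)           # hypothetical explanatory variable

# Assumed truth: elasticity 0.7 with respect to x, 3% growth per year.
log_y = 0.5 + 0.7 * np.log(x) + 0.03 * t + rng.normal(0, 0.02, T)

X = np.column_stack([np.ones(T), np.log(x), t])
a, b1, b2 = np.linalg.lstsq(X, log_y, rcond=None)[0]

print(f"elasticity with respect to x: {b1:.2f}")
print(f"unexplained yearly growth   : {100 * b2:.1f}% per year")
```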

44. Time Trend Regression

45. Nonlinear Equation Using a quadratic (like using logs): y = a + β1 x + β2 x^2 + e. Usually β1 > 0. If β2 > 0 If β2 < 0
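In a quadratic model the effect of x is no longer a single number: the slope is β1 + 2β2 x, and it changes sign at x = -β1 / (2β2). The sketch below computes both. The function names and the coefficients used (2.0 and -0.02) are hypothetical values for illustration, not estimates from any regression in these slides.

```python
def slope_at(b1, b2, x):
    """Slope of y = a + b1*x + b2*x**2 at a given x."""
    return b1 + 2.0 * b2 * x

def turning_point(b1, b2):
    """Value of x where the quadratic stops rising (b2 < 0) or falling (b2 > 0)."""
    if b2 == 0:
        raise ValueError("no turning point when b2 is zero: the model is linear")
    return -b1 / (2.0 * b2)

# Hypothetical coefficients: b1 > 0 and b2 < 0 give a rise followed by a decline.
b1, b2 = 2.0, -0.02
peak = turning_point(b1, b2)

print("turning point at x =", peak)
print("slope at x = 30:", slope_at(b1, b2, 30))
print("slope at x = 60:", slope_at(b1, b2, 60))
```

With β1 > 0 and β2 < 0 the slope is positive before the turning point and negative after it; with β2 > 0 the curve instead bends upward.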

46. A Quadratic Income vs. Age Regression

47. Implied By The Model

48. Case Study: A Huge Sports Contract Alex Rodriguez hired by the Texas Rangers for something like $25 million per year. Costs: the salary, plus and minus some fine tuning of the numbers. Benefits: more fans in the stands. How to determine if the benefits exceed the costs? Use a regression model.

49. PDV of the Costs Using 8% discount factor Accounting for all costs Roughly $21M to $28M in each year from 2001 to 2010, then the deferred payments from 2010 to 2020 Total costs: About $165 Million in 2001 (Present discounted value)
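For readers unfamiliar with present discounted value, here is a minimal sketch of the calculation at an 8% rate. The payment schedule in the code is hypothetical (a flat $25M a year for 10 years, first payment at time 0) and is not the actual contract schedule, so it is not expected to reproduce the $165 million figure.

```python
def present_value(payments, rate):
    """Discount payments[t], made t years from now, back to year 0."""
    return sum(p / (1.0 + rate) ** t for t, p in enumerate(payments))

# Hypothetical schedule (NOT the contract's actual terms), in $ millions.
hypothetical_payments = [25.0] * 10
pv = present_value(hypothetical_payments, 0.08)

print(f"undiscounted total: ${sum(hypothetical_payments):.0f}M")
print(f"PDV at 8%         : ${pv:.1f}M")
```

Later payments count for less, which is why deferring part of the salary lowers the cost of the contract in present value terms.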

50. Benefits More fans in the seats Gate Parking Merchandise Increased chance at playoffs and world series Sponsorships (Loss to revenue sharing) Franchise value

51. How Many New Fans? Projected 8 more wins per year. What is the relationship between wins and attendance? Not known precisely Many empirical studies (The Journal of Sports Economics) Use a regression model to find out.

52. A Regression Model Based on 10 years of baseball data on wins and attendance. Approximately (depends on your model): This year's attendance = team specific constant + 20,000 * Number of Wins + 6,000 * Last Year's Number of Wins + .42 * Last Year's Attendance + error

53. Marginal Value of a Win Roughly, the increase in this year's attendance if the team wins one more game: (20,000 + 6,000) / (1 - .42). About 45,000 fans per year per win.
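The division by (1 - .42) comes from the lagged attendance term: extra fans this year bring extra fans next year, and so on. The sketch below computes the formula and, as a check, traces the attendance equation forward for a team that wins one more game every year. The function names are hypothetical; the coefficients are the approximate ones quoted in the slides.

```python
def long_run_effect(b_wins, b_lag_wins, b_lag_att):
    """Eventual yearly attendance gain from permanently winning one more game."""
    return (b_wins + b_lag_wins) / (1.0 - b_lag_att)

def simulate_extra_fans(b_wins, b_lag_wins, b_lag_att, years):
    """Extra attendance in each year after the team starts winning one more game."""
    extra, path = 0.0, []
    for year in range(years):
        lag_wins_effect = b_lag_wins if year > 0 else 0.0   # no extra win the year before
        extra = b_wins + lag_wins_effect + b_lag_att * extra
        path.append(extra)
    return path

long_run = long_run_effect(20_000, 6_000, 0.42)
path = simulate_extra_fans(20_000, 6_000, 0.42, 30)

print(f"formula          : {long_run:,.0f} fans per year per win")
print(f"year 1 of the sim: {path[0]:,.0f}")
print(f"year 30 of sim   : {path[-1]:,.0f}")
```

The first-year gain is only the 20,000 direct effect; the figure of about 45,000 is what the gain builds up to once the lagged effects have worked through.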

54. Marginal Value of an A Rod 8 games * 45,000 fans = 360,000 fans. 360,000 fans * ($18 per ticket + $2.50 parking etc. + $1.80 stuff (hats, bobble head dolls, ...)) = $8.0 Million per year !!!!! It's not close. (Marginal cost is about $16.5M / year)
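The arithmetic on this slide can be laid out step by step. The sketch uses only the figures quoted in the slides; the variable names are ours.

```python
extra_wins = 8                 # projected additional wins per year
fans_per_win = 45_000          # marginal attendance per win, rounded
revenue_per_fan = 18.00 + 2.50 + 1.80   # ticket + parking etc. + merchandise, in $

extra_fans = extra_wins * fans_per_win
benefit = extra_fans * revenue_per_fan
cost = 16.5e6                  # approximate marginal cost per year, in $

print(f"extra fans per year : {extra_fans:,}")
print(f"benefit per year    : ${benefit / 1e6:.1f}M")
print(f"cost per year       : ${cost / 1e6:.1f}M")
print(f"benefit as % of cost: {100 * benefit / cost:.0f}%")
```

On these numbers the attendance revenue covers roughly half of the yearly cost, before counting the other benefits listed earlier (playoffs, sponsorships, franchise value).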

55. Postscripts (1) Texas was not out of last place for a single day while A-Rod was on the team. Was it worth it? You make the call. (2) What about the Yankees? They now pay most of the same costs. Is it worth it? How would you find out? (3) What about that David Beckham contract with Major League Soccer?

56. Summary Using Minitab To Compute a Regression Building a Model Logs Dummy variables Qualitative variables Trends Quadratics Effects across time All Assuming You Know the Right Variables!
