
Statistics and Data Analysis

Statistics and Data Analysis: Part 18


Presentation Transcript


    1. Statistics and Data Analysis. Professor William Greene, Stern School of Business, IOMS Department, Department of Economics

    2. Statistics and Data Analysis

    3. Multiple Regression Models: Using Minitab to Compute a Multiple Regression; Basic Multiple Regression; Using Binary Variables; Logs and Elasticities; Hedonic Regression and Interpretation; Trends in Time Series Data; Using Quadratic Terms to Improve the Model

    4. Application: WHO Data. Used in Assignment 1: WHO data on 191 countries in 1995-1999. Variables: DALE = disability-adjusted life expectancy; EDUC = average years of education; PCHexp = per capita health expenditure. Model: $\text{DALE} = a + \beta_1\,\text{EDUC} + \beta_2\,\text{PCHexp} + e$
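The slides compute this regression in Minitab. As a rough illustration of what such a fit does, the sketch below solves the least-squares normal equations in plain Python for a model with a constant and two regressors. The data are synthetic and noise-free, chosen only so the known coefficients can be recovered; they are not the WHO data, and the function names `solve` and `ols` are hypothetical helpers.

```python
# Minimal sketch: ordinary least squares for y = a + b1*x1 + b2*x2,
# obtained by solving the normal equations (X'X) b = X'y.
# Synthetic, noise-free data; NOT the WHO data used in the slides.

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def ols(rows, y):
    """Return [a, b1, ..., bK]; a constant term is added to each row."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    XtX = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    Xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)

# Hypothetical (EDUC, PCHexp) pairs and a made-up exact relationship.
data = [(2.0, 50.0), (4.0, 120.0), (6.0, 90.0),
        (8.0, 400.0), (10.0, 250.0), (12.0, 900.0)]
dale = [30.0 + 3.0 * educ + 0.01 * pch for educ, pch in data]

a, b1, b2 = ols(data, dale)
print(round(a, 6), round(b1, 6), round(b2, 6))
```

Because the synthetic outcome is an exact linear function of the two regressors, the fit should return the coefficients used to build it, up to floating-point error. With real data the estimates would carry sampling error, which is what Minitab's standard errors and t-ratios report.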

    5. The (Famous) WHO Data

    7. Specify the Variables in the Model

    8.

    9. Graphs? Maybe

    10. Regression Results

    11. Practical Model Building. Understanding the regression: the left-out variable problem. Using different kinds of variables: dummy variables, logs, time trend, quadratic.

    12. A Fundamental Result What happens when you leave a crucial variable out of your model? (Bad things)

    13. Using Dummy Variables. Dummy variable = binary variable = a variable that takes values 0 and 1. E.g., OECD life expectancies compared to the rest of the world: $\text{DALE} = a + \beta_1\,\text{EDUC} + \beta_2\,\text{PCHexp} + \beta_3\,\text{OECD} + e$

    14. OECD Life Expectancy

    15. Binary Variable in Regression

    16. Plotting

    17. Two Plots

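    18. Dummy Variable in Log Regression. E.g., Monet’s signature equation: $\log \text{Price} = a + \beta_1 \log \text{Area} + \beta_2\,\text{Signed}$. Unsigned: $\text{Price}_U = \exp(a)\,\text{Area}^{\beta_1}$. Signed: $\text{Price}_S = \exp(a)\,\text{Area}^{\beta_1}\exp(\beta_2)$. Ratio: $\text{Price}_S / \text{Price}_U = \exp(\beta_2)$. Percentage difference: $100\%\,(\text{Price}_S - \text{Price}_U)/\text{Price}_U = 100\%\,[\exp(\beta_2) - 1]$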
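The percentage formula on this slide is easy to check numerically. The sketch below compares the exact effect, 100[exp(β2) − 1], with the naive reading 100·β2 for a few hypothetical coefficient values (none of them is the estimated Monet coefficient). The function name `percent_effect` is an illustrative choice.

```python
import math

def percent_effect(beta_dummy):
    """Exact % difference implied by a dummy coefficient in a log equation."""
    return 100.0 * (math.exp(beta_dummy) - 1.0)

# Hypothetical coefficients: the naive reading 100*beta drifts away from
# the exact effect as the coefficient grows.
for beta in (0.05, 0.5, 1.0):
    naive = 100.0 * beta
    exact = percent_effect(beta)
    print(f"beta={beta}: naive {naive:.1f}%, exact {exact:.1f}%")
```

For small coefficients the two readings nearly coincide, but they diverge quickly. A 253% effect, for instance, corresponds to exp(β2) = 3.53, which is far larger than any 100·β2 reading would suggest.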

    19. The Signature Effect: 253%

    20. Monet Paintings in Millions

    21.

    22. Dummy Variable for One Observation. Single out one observation for special attention. The equation will predict that observation perfectly. For the other coefficients, it is the same as removing that observation from the sample.
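Both claims on this slide can be verified on a toy data set. The sketch below fits one regression with a dummy for a single observation and another with that observation dropped, then compares the results. The numbers are made up for illustration, and `solve` and `ols` are hypothetical helper names.

```python
# Sketch: a dummy for one observation vs. dropping that observation.
# Toy data; not taken from the slides.

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def ols(rows, y):
    """Least squares coefficients [a, b1, ...]; a constant is added."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    XtX = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    Xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 25.0, 12.1]  # the fifth value is the odd one out
j = 4                                  # index of the singled-out observation

# (1) Keep every observation, add a dummy equal to 1 only for observation j.
with_dummy = ols([(xi, 1.0 if i == j else 0.0) for i, xi in enumerate(x)], y)

# (2) No dummy, but observation j removed from the sample.
dropped = ols([(xi,) for i, xi in enumerate(x) if i != j],
              [yi for i, yi in enumerate(y) if i != j])

a, b, d = with_dummy
residual_j = y[j] - (a + b * x[j] + d)
print(with_dummy[:2], dropped, residual_j)
```

The constant and slope should agree across the two fits, and the residual for the singled-out observation should be zero up to rounding, because the dummy coefficient absorbs whatever the rest of the equation misses for that one point.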

    23. A “London Effect” on UK Electronic Store Sales?

    25. Logs in Regression

    26. Elasticity. The coefficient on log(Area) is 1.346. For each 1% increase in area, price goes up by about 1.35%, even accounting for the signature effect. The elasticity is +1.346. Remarkable: not only does price increase with area, it increases faster than area.

    27. Monet: By the Square Inch

    28. Elasticities of Demand for Gasoline

    29. Logs and Elasticities. Theory: in the equation $y = a + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_K x_K + e$, $\beta$ = (change in $y$) / (unit change in $x$), and elasticity = $\beta \times$ (mean of $x$) / (mean of $y$). When the variables are in logs, the change in $\log x$ is approximately the proportional change in $x$, so in $\log y = a + \beta_1 \log x_1 + \beta_2 \log x_2 + \dots + \beta_K \log x_K + e$, elasticity = $\beta$. These will often give approximately the same answer. When in doubt, use logs.
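To make the two elasticity calculations concrete, the sketch below applies both to one synthetic data set generated from a constant-elasticity relationship, y = 2·x^1.5. The data and the helper name `slope` are assumptions for illustration only.

```python
import math

def slope(x, y):
    """Simple-regression slope: sum of cross-deviations over sum of squares."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

# Synthetic data with a true elasticity of 1.5 at every point.
x = [1.0, 2.0, 4.0, 8.0, 16.0]
y = [2.0 * xi ** 1.5 for xi in x]

# Linear model: elasticity evaluated at the means, beta * mean(x) / mean(y).
beta_linear = slope(x, y)
elasticity_at_means = beta_linear * (sum(x) / len(x)) / (sum(y) / len(y))

# Log-log model: the slope itself is the elasticity.
elasticity_log = slope([math.log(v) for v in x], [math.log(v) for v in y])

print(elasticity_at_means, elasticity_log)
```

On these data the log-log slope recovers the true elasticity, while the linear calculation only approximates it, since a straight line is being fitted to a curved relationship. That is one way to read the slide's advice to use logs when in doubt.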

    30. Elasticities

    31. A Set of Dummy Variables. A complete set of dummy variables divides the sample into groups. Fit the regression with “group” effects. Need to drop one (any one) of the variables to compute the regression. (Avoid the “dummy variable trap.”)
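The reason one dummy must be dropped can be seen by building the indicators by hand: a complete set sums to 1 in every row, which duplicates the constant term exactly. The sketch below uses hypothetical category labels and a hypothetical helper, `make_dummies`.

```python
def make_dummies(labels, drop=None):
    """Return (category names, rows of 0/1 indicators), omitting `drop`."""
    categories = sorted(set(labels))
    kept = [c for c in categories if c != drop]
    rows = [[1 if lab == c else 0 for c in kept] for lab in labels]
    return kept, rows

# Hypothetical group labels for five observations.
labels = ["ranch", "colonial", "split", "ranch", "colonial"]

# Complete set: every row sums to 1, the same as the constant column,
# so constant + all dummies are perfectly collinear (the trap).
names_all, full = make_dummies(labels)
trap = all(sum(row) == 1 for row in full)

# Dropping one category (any one) breaks the exact collinearity. The
# dropped group becomes the baseline that the others are compared to.
names_kept, reduced = make_dummies(labels, drop="ranch")

print(names_all, trap)
print(names_kept, reduced)
```

Which category is dropped changes the interpretation of each coefficient (a difference from the baseline group) but not the fit of the regression.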

    32. Rankings of 132 U.S. Liberal Arts Colleges

    33. Minitab to the Rescue

    34. Unordered Categorical Variables

    35. Transform Style to Types

    37. House Price Regression

    38. Ordered Categories. Health Satisfaction: 1=Poor, 2=So_so, 3=OK, 4=Good, 5=Great. How to handle such a variable? Just use it as is? No: So_so – Poor = 1 and Great – Good = 1, but the two differences are not (necessarily) equal. Use 4 of the 5 indicator variables. Coding: it is not useful to consider modifications of the variable, such as -2,-1,0,1,2 or 2,4,6,8,10. None of these makes sense, as the value is just a label. One could also use 1,4,8,17,26, which would also make no sense. This needs a special kind of model if it is the dependent variable, not a regression equation.

    39. Hedonic Regression. A theory of prices: price = sum of prices for components. House price: land size; rooms (fixed amount per room); swimming pool; view; N-car garage; etc. Computers: speed; screen size; other features…

    40. Fumiro Computer Data

    41. Transform Manufacturer Names to Indicator Variables

    42. Hedonic Regression

    43. Time Trends in Regression. $y = a + \beta_1 x + \beta_2 t + e$: $\beta_2$ is the year-to-year increase not explained by anything else. $\log y = a + \beta_1 \log x + \beta_2 t + e$ (not $\log t$, just $t$): $100\beta_2$ is the year-to-year % increase not explained by anything else.
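The reading of 100·β2 as a yearly percentage increase is an approximation that works well when the trend coefficient is small. The sketch below compares it with the exact growth rate, 100[exp(β2) − 1], for a hypothetical coefficient of 0.03 (not an estimate from the slides).

```python
import math

def trend_growth(beta_t):
    """Return (approximate, exact) yearly % growth implied by a log-trend."""
    approx = 100.0 * beta_t
    exact = 100.0 * (math.exp(beta_t) - 1.0)
    return approx, exact

approx, exact = trend_growth(0.03)  # hypothetical trend coefficient
print(f"approximate: {approx:.3f}%  exact: {exact:.3f}%")
```

For a coefficient this small the two figures differ only in the second decimal place, so the slide's shortcut is usually adequate for annual trends.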

    44. Time Trend Regression

    45. Nonlinear Equation. Using a quadratic (like using logs): $y = a + \beta_1 x + \beta_2 x^2 + e$. Usually $\beta_1 > 0$. Two cases: $\beta_2 > 0$ and $\beta_2 < 0$.
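In a quadratic model the effect of x on y is no longer a single number: it equals β1 + 2·β2·x, and the curve turns where that expression is zero, at x = −β1 / (2·β2). The sketch below evaluates both for hypothetical coefficients, which are not estimates from the slides.

```python
def marginal_effect(b1, b2, x):
    """Slope of a + b1*x + b2*x**2 at the point x."""
    return b1 + 2.0 * b2 * x

def turning_point(b1, b2):
    """Value of x where the slope is zero (requires b2 != 0)."""
    return -b1 / (2.0 * b2)

# Hypothetical coefficients with b1 > 0 and b2 < 0: y rises, then falls.
b1, b2 = 4.0, -0.05
peak = turning_point(b1, b2)

print(peak)
print(marginal_effect(b1, b2, 20.0), marginal_effect(b1, b2, 60.0))
```

With β2 < 0 the slope is positive before the turning point and negative after it, the shape one would expect for something like income against age. With β2 > 0 the curve bends upward instead.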

    46. A Quadratic Income vs. Age Regression

    47. Implied By The Model

    48. Case Study: A Huge Sports Contract. Alex Rodriguez was hired by the Texas Rangers for something like $25 million per year. Costs: the salary, plus and minus some fine tuning of the numbers. Benefits: more fans in the stands. How to determine if the benefits exceed the costs? Use a regression model.

    49. PDV of the Costs. Using an 8% discount rate and accounting for all costs: roughly $21M to $28M in each year from 2001 to 2010, then the deferred payments from 2010 to 2020. Total costs: about $165 million in 2001 (present discounted value).
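A present discounted value divides each payment by (1 + r)^t, where t is the number of years until it is paid, and sums the results. The sketch below shows that calculation at the 8% rate from the slide. The payment amounts are toy values, since the actual contract schedule is not given here, and `present_value` is a hypothetical helper name.

```python
def present_value(payments, rate):
    """Discount payments[t], paid t years from now, back to today."""
    return sum(p / (1.0 + rate) ** t for t, p in enumerate(payments))

# Toy schedule in $ millions: 100 today and 108 one year from now.
# At an 8% rate the second payment is worth 100 today.
toy = [100.0, 108.0]
print(present_value(toy, 0.08))
```

Applied to a full schedule of annual and deferred payments, the same function would yield the kind of single present-value figure the slide reports.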

    50. Benefits: more fans in the seats (gate, parking, merchandise); increased chance at playoffs and World Series; sponsorships; (loss to revenue sharing); franchise value.

    51. How Many New Fans? Projected 8 more wins per year. What is the relationship between wins and attendance? Not known precisely. Many empirical studies (The Journal of Sports Economics). Use a regression model to find out.

    52. A Regression Model. Based on 10 years of baseball data on wins and attendance. Approximately (depends on your model): This year’s attendance = team-specific constant + 20,000 * number of wins + 6,000 * last year’s number of wins + 0.42 * last year’s attendance + error

    53. Marginal Value of a Win. Roughly, the increase in yearly attendance if the team wins one more game: (20,000 + 6,000) / (1 - 0.42) = about 45,000 fans per year per win
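The division by (1 − 0.42) comes from the lagged attendance term: an extra win raises attendance this year, and part of that gain carries into every later year. One way to read the formula, assuming the team wins one more game in every season, is as the level that the yearly gain settles at. The sketch below uses the slide's rounded coefficients and compares the formula with a year-by-year simulation. The function names are hypothetical.

```python
def long_run_effect(b_wins, b_lag_wins, b_lag_att):
    """Steady-state attendance gain from one extra win every year."""
    return (b_wins + b_lag_wins) / (1.0 - b_lag_att)

def simulate(b_wins, b_lag_wins, b_lag_att, years):
    """Extra attendance in each year after wins rise permanently by one."""
    extra, prev = [], 0.0
    for t in range(years):
        lag_wins_term = b_lag_wins if t > 0 else 0.0  # no lagged win in year 0
        prev = b_wins + lag_wins_term + b_lag_att * prev
        extra.append(prev)
    return extra

steady = long_run_effect(20_000.0, 6_000.0, 0.42)
path = simulate(20_000.0, 6_000.0, 0.42, 30)

print(round(steady))            # the slide rounds this to about 45,000
print([round(v) for v in path[:5]])
```

The simulated gain starts at 20,000 in the first year and climbs toward the steady-state value within a few seasons, which is why the slide treats roughly 45,000 fans per win as the yearly figure.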

    54. Marginal Value of an A-Rod. 8 games * 45,000 fans = 360,000 fans. 360,000 fans * ($18 per ticket + $2.50 parking etc. + $1.80 stuff (hats, bobblehead dolls, …)) = about $8.0 million per year! It’s not close. (Marginal cost is about $16.5M per year.)
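The arithmetic on this slide can be laid out step by step. The sketch below uses only the figures quoted in the slides; the variable names are illustrative.

```python
# Figures quoted in the slides.
extra_wins = 8                   # projected additional wins per year
fans_per_win = 45_000            # rounded marginal attendance per win
ticket, parking, merchandise = 18.00, 2.50, 1.80   # dollars per fan

extra_fans = extra_wins * fans_per_win
revenue_per_fan = ticket + parking + merchandise
extra_revenue = extra_fans * revenue_per_fan

marginal_cost = 16.5e6           # about $16.5M per year, from the slide

print(extra_fans)
print(round(extra_revenue / 1e6, 1), "million dollars per year")
print(extra_revenue < marginal_cost)
```

On these numbers the added revenue is about half the yearly cost, which is the sense in which the slide says it is not close.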

    55. Postscripts (1) Texas was not out of last place for a single day while A-Rod was on the team. Was it worth it? You make the call. (2) What about the Yankees – they now pay most of the same costs. Is it worth it? How would you find out? (3) What about that David Beckham contract with Major League Soccer?

    56. Summary. Using Minitab to compute a regression. Building a model: logs, dummy variables, qualitative variables, trends, quadratics, effects across time. All assuming you know the right variables!
