1. Statistics and Data Analysis Professor William Greene
Stern School of Business
IOMS Department
Department of Economics
3. Multiple Regression Models
Using Minitab to Compute a Multiple Regression
Basic Multiple Regression
Using Binary Variables
Logs and Elasticities
Hedonic Regression and Interpretation
Trends in Time Series Data
Using Quadratic Terms to Improve the Model
4. Application: WHO Data Used in Assignment 1
WHO data on 191 countries in 1995-1999.
Analysis of Disability-Adjusted Life Expectancy (DALE)
EDUC = average years of education
PCHexp = per capita health expenditure
$DALE = a + \beta_1 EDUC + \beta_2 PCHexp + e$
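The slides fit this model in Minitab. As a rough stand-in, the same least-squares fit can be sketched in Python with NumPy. The data below are simulated, not the WHO figures, and the "true" coefficients used to generate them are arbitrary assumptions; the point is only that regressing DALE on a constant, EDUC and PCHexp recovers a, β1 and β2.

```python
import numpy as np

# Hypothetical illustration: simulated numbers, NOT the WHO data.
rng = np.random.default_rng(0)
n = 191
educ = rng.uniform(1, 12, n)        # average years of education
pchexp = rng.uniform(10, 3000, n)   # per capita health expenditure
# Assumed "true" coefficients, used only to generate the fake data
dale = 35.0 + 2.0 * educ + 0.005 * pchexp + rng.normal(0, 3, n)

# Design matrix: a column of ones for the constant a, then EDUC and PCHexp
X = np.column_stack([np.ones(n), educ, pchexp])
coef, *_ = np.linalg.lstsq(X, dale, rcond=None)
a_hat, b1_hat, b2_hat = coef
print(f"a = {a_hat:.2f}, b1 = {b1_hat:.3f}, b2 = {b2_hat:.5f}")
```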
5. The (Famous) WHO Data
7. Specify the Variables in the Model
9. Graphs? Maybe
10. Regression Results
11. Practical Model Building
Understanding the regression: the left-out variable problem
Using different kinds of variables
Dummy variables
Logs
Time trend
Quadratic
12. A Fundamental Result
What happens when you leave a crucial variable out of your model? (Bad things)
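A small simulation shows the "bad things". When a variable that matters (x2 here) is correlated with an included one (x1), leaving it out makes the coefficient on x1 pick up part of x2's effect. With the made-up values below, the short regression's slope should be near 2 + 3 × 0.8 = 4.4 instead of the true 2. All names and numbers are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(scale=0.6, size=n)    # x2 is correlated with x1
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)

def ols(X, y):
    """Least-squares coefficients, with a constant column prepended."""
    X = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(X, y, rcond=None)[0]

full = ols(np.column_stack([x1, x2]), y)   # correct model
short = ols(x1, y)                         # x2 left out
print("with x2:   ", full.round(2))
print("without x2:", short.round(2))
```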
13. Using Dummy Variables
Dummy variable = binary variable = a variable that takes the values 0 and 1.
E.g., OECD life expectancies compared to the rest of the world:
$DALE = a + \beta_1 EDUC + \beta_2 PCHexp + \beta_3 OECD + e$
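Adding the OECD dummy shifts the intercept for OECD countries by β3 while the slopes on EDUC and PCHexp stay shared by all countries. The sketch below uses simulated data with an arbitrary assumed OECD effect of 4 years; it checks that two countries with identical EDUC and PCHexp but different OECD status get predictions that differ by exactly the estimated β3.

```python
import numpy as np

# Hypothetical data; every coefficient below is invented for illustration.
rng = np.random.default_rng(2)
n = 191
educ = rng.uniform(1, 12, n)
pchexp = rng.uniform(10, 3000, n)
oecd = (rng.uniform(size=n) < 0.15).astype(float)   # 1 = OECD member, 0 = not
dale = 35 + 2.0 * educ + 0.005 * pchexp + 4.0 * oecd + rng.normal(0, 3, n)

X = np.column_stack([np.ones(n), educ, pchexp, oecd])
coef = np.linalg.lstsq(X, dale, rcond=None)[0]
b3 = coef[3]

# Same EDUC and PCHexp, different OECD status
non_member = np.array([1.0, 8.0, 500.0, 0.0])
member = np.array([1.0, 8.0, 500.0, 1.0])
gap = member @ coef - non_member @ coef
print(f"OECD shift b3 = {b3:.2f}, predicted gap = {gap:.2f}")
```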
14. OECD Life Expectancy
15. Binary Variable in Regression
16. Plotting
17. Two Plots
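18. Dummy Variable in Log Regression
E.g., Monet’s signature equation:
$\log(\text{Price}) = a + \beta_1 \log(\text{Area}) + \beta_2\,\text{Signed}$
Unsigned: $\text{Price}_U = e^{a}\,\text{Area}^{\beta_1}$
Signed: $\text{Price}_S = e^{a}\,\text{Area}^{\beta_1}\,e^{\beta_2}$
Signed/Unsigned $= e^{\beta_2}$
% Difference $= 100\% \times (\text{Signed} - \text{Unsigned})/\text{Unsigned} = 100\%\,[e^{\beta_2} - 1]$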
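The percentage formula is easy to check numerically. The helper name pct_difference is hypothetical. Working backwards from the 253% on the next slide, the implied β2 is ln(3.53), roughly 1.26. Reading 100β2 (about 126%) as the percentage effect would badly understate it, which is why the exp(β2) - 1 form matters for a coefficient this large.

```python
import math

def pct_difference(b2):
    """Percent difference between signed and unsigned prices implied by
    log(Price) = a + b1*log(Area) + b2*Signed, holding Area fixed."""
    return 100.0 * (math.exp(b2) - 1.0)

# Back out the coefficient that corresponds to a 253% signature effect
b2 = math.log(3.53)
print(round(b2, 3), round(pct_difference(b2), 1))
```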
19. The Signature Effect: 253%
20. Monet Paintings in Millions
22. Dummy Variable for One Observation
Single out one observation for special attention.
The equation will predict that observation perfectly.
For the other coefficients, it is the same as removing that observation from the sample.
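Both claims can be verified on simulated data. This sketch (arbitrary numbers; observation 7 is an arbitrary choice) fits the model once with a dummy for that single observation and once with the observation deleted. The constant and slope should match, and the dummy model's residual for observation 7 should be zero.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 30
x = rng.uniform(0, 10, n)
y = 1.0 + 0.5 * x + rng.normal(0, 1, n)
y[7] += 6.0                      # make observation 7 unusual (hypothetical)

def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

# (1) Full sample plus a dummy equal to 1 only for observation 7
d = np.zeros(n)
d[7] = 1.0
X_dummy = np.column_stack([np.ones(n), x, d])
b_dummy = ols(X_dummy, y)
resid_7 = y[7] - X_dummy[7] @ b_dummy    # predicted perfectly, so about 0

# (2) Same regression without the dummy, observation 7 deleted
keep = np.arange(n) != 7
b_drop = ols(np.column_stack([np.ones(keep.sum()), x[keep]]), y[keep])

print(b_dummy[:2], b_drop, resid_7)
```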
23. A “London Effect” on UK Electronic Store Sales?
25. Logs in Regression
26. Elasticity
The coefficient on log(Area) is 1.346.
For each 1% increase in area, price goes up by about 1.346%, even accounting for the signature effect.
The elasticity is +1.346.
Remarkable. Not only does price increase with area, it increases proportionally faster than area.
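The 1.346 describes small changes. For a larger change in area, the log-log model implies the exact percentage change 100[(1 + %ΔArea/100)^1.346 - 1]. A minimal sketch using only the coefficient from the slide; the function name is made up.

```python
elasticity = 1.346          # coefficient on log(Area) from the slide

def price_change_pct(area_change_pct, b=elasticity):
    """Exact % change in price implied by log(Price) = ... + b*log(Area) + ...
    for a given % change in Area, other variables held fixed."""
    return 100.0 * ((1 + area_change_pct / 100.0) ** b - 1.0)

for pct in (1, 10, 50):
    print(pct, round(price_change_pct(pct), 2))
```

A 1% increase gives about 1.35%, essentially the coefficient; a 50% increase gives roughly 73%, more than 1.346 × 50 ≈ 67%.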
27. Monet: By the Square Inch
28. Elasticities of Demand for Gasoline
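29. Logs and Elasticities
Theory: in the equation $y = a + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_K x_K + e$,
$\beta_k$ = (change in y) / (unit change in $x_k$)
Elasticity = $\beta_k \times \bar{x}_k / \bar{y}$ (evaluated at the means)
When the variables are in logs, a change in log x is approximately the proportional change in x (multiply by 100 for the % change):
$\log y = a + \beta_1 \log x_1 + \beta_2 \log x_2 + \dots + \beta_K \log x_K + e$
Elasticity = $\beta_k$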
These will often give approximately the same answer.
When in doubt, use logs.
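To see how the two formulas compare, the sketch below generates hypothetical data from a constant-elasticity curve y = 3x^0.8 (an assumption made only for this example), then computes the linear-model elasticity at the means and the log-log slope. The log-log slope recovers 0.8 exactly here; the at-the-means value comes out close but not identical.

```python
import numpy as np

# Hypothetical data from a constant-elasticity curve y = 3 * x**0.8
x = np.linspace(1, 10, 50)
y = 3.0 * x ** 0.8

# Linear model y = a + b*x: elasticity evaluated at the means
b_lin = np.polyfit(x, y, 1)[0]          # polyfit returns [slope, intercept]
elast_linear = b_lin * x.mean() / y.mean()

# Log-log model log y = a + b*log x: the slope is the elasticity
elast_log = np.polyfit(np.log(x), np.log(y), 1)[0]

print(round(elast_linear, 3), round(elast_log, 3))
```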
30. Elasticities
31. A Set of Dummy Variables
A complete set of dummy variables divides the sample into groups.
Fit the regression with "group" effects.
You need to drop one (any one) of the variables to compute the regression, because the complete set adds up to the constant column. (Avoid the "dummy variable trap.")
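The trap is a rank problem: with a constant in the model, the full set of dummies is perfectly collinear with it. A small illustration with three hypothetical groups of three observations each:

```python
import numpy as np

# Hypothetical group labels for 9 observations in 3 groups
groups = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
D = np.eye(3)[groups]                 # complete set of 3 dummy columns
const = np.ones((len(groups), 1))

X_trap = np.hstack([const, D])        # constant + all 3 dummies
X_ok = np.hstack([const, D[:, 1:]])   # drop one dummy (group 0 becomes the base)

# The dummies sum to the constant column, so X_trap is rank deficient
print(np.linalg.matrix_rank(X_trap), X_trap.shape[1])
print(np.linalg.matrix_rank(X_ok), X_ok.shape[1])
```

Once one dummy is dropped, its group becomes the base, and each remaining dummy coefficient measures that group's difference from the base.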
32. Rankings of 132 U.S. Liberal Arts Colleges
33. Minitab to the Rescue
34. Unordered Categorical Variables
35. Transform Style to Types
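The slides do this conversion in Minitab. An equivalent step in plain Python might look like the sketch below; to_indicators is a hypothetical helper and the house styles are invented. It creates one 0/1 column per category and leaves out a base category so the regression avoids the dummy variable trap.

```python
import numpy as np

def to_indicators(labels, base=None):
    """Turn an unordered categorical variable into 0/1 indicator columns,
    leaving out one base category (the first in sorted order by default)."""
    categories = sorted(set(labels))
    if base is None:
        base = categories[0]
    kept = [c for c in categories if c != base]
    matrix = np.array([[1.0 if lab == c else 0.0 for c in kept] for lab in labels])
    return kept, matrix

# Hypothetical house styles (not the data from the slides)
styles = ["Ranch", "Colonial", "Tudor", "Ranch", "Colonial"]
names, D = to_indicators(styles)
print(names)
print(D)
```

The same approach applies to an ordered variable such as the health satisfaction scale on slide 38: five categories become four indicators.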
37. House Price Regression
38. Ordered Categories
Health Satisfaction: 1 = Poor, 2 = So_so, 3 = OK, 4 = Good, 5 = Great
How should such a variable be handled?
Just use it as is? No. The coding makes So_so - Poor = 1 and Great - Good = 1, but the two gaps need not represent the same difference in satisfaction.
Use 4 of the 5 indicator variables (one category is left out as the base).
Coding: relabeling the variable does not help. Schemes such as -2, -1, 0, 1, 2 or 2, 4, 6, 8, 10 keep the same equal spacing, and 1, 4, 8, 17, 26 is just as arbitrary, because the values are only labels.
This needs a special kind of model if it is the dependent variable – not a regression equation.
39. Hedonic Regression
A theory of prices: price = sum of prices for components
House price
  Land size
  Rooms: fixed amount per room
  Swimming pool
  View
  N-car garage
  Etc.
Computers
  Speed
  Screen size
  Other features…
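Under the hedonic theory, the regression coefficients are the implicit prices of the components. In the sketch below, house prices are built from assumed component prices (all numbers hypothetical, in $1,000s) with no noise, so least squares recovers those prices exactly; real data would add an error term.

```python
import numpy as np

# Hypothetical houses: land size (1,000 sq ft), rooms, pool (0/1), garage spaces
X_feat = np.array([
    [5.0, 6, 0, 1],
    [8.0, 7, 1, 2],
    [4.5, 5, 0, 1],
    [10.0, 9, 1, 3],
    [6.0, 6, 1, 2],
    [7.5, 8, 0, 2],
    [3.0, 4, 0, 0],
    [9.0, 8, 1, 2],
])
# Assumed component prices ($1,000s), used only to build the example
true_prices = np.array([20.0, 15.0, 30.0, 10.0])
base = 50.0
price = base + X_feat @ true_prices

X = np.column_stack([np.ones(len(price)), X_feat])
coef = np.linalg.lstsq(X, price, rcond=None)[0]
print(coef.round(3))    # constant, then the implicit price of each component
```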
40. Fumiro Computer Data
41. Transform Manufacturer Names to Indicator Variables
42. Hedonic Regression
43. Time Trends in Regression
$y = a + \beta_1 x + \beta_2 t + e$: $\beta_2$ is the year-to-year increase not explained by anything else.
$\log y = a + \beta_1 \log x + \beta_2 t + e$ (not log t, just t): $100\beta_2$ is approximately the year-to-year % increase not explained by anything else.
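The approximation 100β2 ≈ % growth comes from log(1 + g) ≈ g for small g; the exact year-to-year growth is 100[exp(β2) - 1]. The sketch below builds a hypothetical noiseless series that grows 5% a year after controlling for log x, then fits the log-trend model. Every number in it is an assumption made for the illustration.

```python
import numpy as np

# Hypothetical series: 20 years with an assumed 5% trend growth per year
t = np.arange(20, dtype=float)
x = 10.0 + (7 * t) % 5                   # some explanatory variable
log_y = 0.5 + 0.7 * np.log(x) + np.log(1.05) * t

X = np.column_stack([np.ones_like(t), np.log(x), t])   # t enters as t, not log t
a, b1, b2 = np.linalg.lstsq(X, log_y, rcond=None)[0]

print(f"100*b2 = {100 * b2:.2f}% (approximate growth)")
print(f"100*(exp(b2) - 1) = {100 * (np.exp(b2) - 1):.2f}% (exact growth)")
```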
44. Time Trend Regression
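45. Nonlinear Equation
Using a quadratic (like using logs)
$y = a + \beta_1 x + \beta_2 x^2 + e$
Usually $\beta_1 > 0$.
If $\beta_2 > 0$, the curve bends upward (U-shaped); if $\beta_2 < 0$, it bends downward (an inverted U).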
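When β1 > 0 and β2 < 0, the fitted curve rises, flattens, and turns down at x* = -β1/(2β2), which is where the derivative β1 + 2β2x equals zero. The sketch below fits a quadratic to a made-up, noiseless income-age profile (not the data on the next slide) and computes that turning point.

```python
import numpy as np

# Hypothetical income-age profile, in $1,000s:
# income = -20 + 3*age - 0.03*age**2
age = np.arange(20, 66, dtype=float)
income = -20 + 3.0 * age - 0.03 * age ** 2

b2, b1, a = np.polyfit(age, income, 2)   # polyfit returns highest power first
peak_age = -b1 / (2 * b2)                # where b1 + 2*b2*age = 0
print(f"b1 = {b1:.3f}, b2 = {b2:.4f}, income peaks at age {peak_age:.1f}")
```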
46. A Quadratic Income vs. Age Regression
47. Implied By The Model
48. Case Study: A Huge Sports Contract
Alex Rodriguez was hired by the Texas Rangers for something like $25 million per year.
Costs: the salary, plus and minus some fine-tuning of the numbers
Benefits: more fans in the stands
How can we determine whether the benefits exceed the costs? Use a regression model.
49. PDV of the Costs
Using an 8% discount rate
Accounting for all costs
Roughly $21M to $28M in each year from 2001 to 2010, then the deferred payments from 2010 to 2020
Total costs: About $165 Million in 2001 (Present discounted value)
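A present discounted value sums each payment divided by (1 + r)^k, where k is the number of years after the base year. The sketch uses r = 0.08 as on the slide, but the payment stream is a hypothetical flat $25M a year for ten years, not the actual contract, and treating the first payment as undiscounted is an assumed timing convention.

```python
def pdv(payments, rate=0.08):
    """Present discounted value of payments[k] received k years from now,
    discounted at the given annual rate (payments[0] is not discounted)."""
    return sum(p / (1.0 + rate) ** k for k, p in enumerate(payments))

# Hypothetical flat stream, NOT the actual contract: $25M a year for 10 years
flat = [25.0] * 10
print(round(pdv(flat), 1))
```

That hypothetical stream totals $250M in nominal terms but is worth only about $181M in base-year dollars; the same discounting is why the actual payments, $21M or more per year for ten years plus deferred amounts, come to only about $165M in 2001 dollars.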
50. Benefits
More fans in the seats
  Gate
  Parking
  Merchandise
Increased chance at playoffs and World Series
Sponsorships
(Loss to revenue sharing)
Franchise value
51. How Many New Fans?
Projected 8 more wins per year.
What is the relationship between wins and attendance?
Not known precisely
Many empirical studies (The Journal of Sports Economics)
Use a regression model to find out.
52. A Regression Model
Based on 10 years of baseball data on wins and attendance
Approximately (the numbers depend on your model):
$\text{Attendance}_t = \text{team-specific constant} + 20{,}000 \times \text{Wins}_t + 6{,}000 \times \text{Wins}_{t-1} + 0.42 \times \text{Attendance}_{t-1} + e_t$
53. Marginal Value of a Win
Roughly, the long-run increase in annual attendance if the team wins one more game every season:
$(20{,}000 + 6{,}000)/(1 - 0.42) \approx 44{,}800$
About 45,000 fans per year per win
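Solving the attendance model forward shows where (20,000 + 6,000)/(1 - 0.42) comes from. If the team wins one more game every season, attendance rises by 20,000 in the first year, and the lagged-wins and lagged-attendance terms keep adding to it until it settles near 44,800. The sketch uses only the slide's approximate coefficients; extra_attendance is a hypothetical helper.

```python
# Approximate coefficients from the attendance model on slide 52
B_WINS, B_LAG_WINS, B_LAG_ATT = 20_000, 6_000, 0.42

def extra_attendance(years, extra_wins=1):
    """Extra attendance in each year if the team wins `extra_wins` more games
    every season, starting in year 0 (all else equal)."""
    extra, path = 0.0, []
    for year in range(years):
        lag_wins = extra_wins if year > 0 else 0   # no extra wins before year 0
        extra = B_WINS * extra_wins + B_LAG_WINS * lag_wins + B_LAG_ATT * extra
        path.append(extra)
    return path

path = extra_attendance(15)
print([round(v) for v in path[:4]], round(path[-1]))
print(round((B_WINS + B_LAG_WINS) / (1 - B_LAG_ATT)))   # long-run value
```

The path starts at 20,000, 34,400 and 40,448, so after three seasons the effect is already about 90% of its long-run value.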
54. Marginal Value of an A-Rod
8 games × 45,000 fans = 360,000 fans
360,000 fans × ($18 per ticket + $2.50 parking etc. + $1.80 stuff (hats, bobble-head dolls, …))
= 360,000 × $22.30 ≈ $8.0 million per year!
It's not close. (Marginal cost is about $16.5M per year.)
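The same arithmetic in code, using only the figures on the slide. It counts gate, parking and merchandise only; the other benefits listed on slide 50 (playoff chances, sponsorships, franchise value) are not valued here.

```python
# Figures from the slide; everything is per year
extra_wins = 8
fans_per_win = 45_000
revenue_per_fan = 18.00 + 2.50 + 1.80     # ticket + parking etc. + merchandise
marginal_cost = 16.5e6                    # marginal cost quoted on the slide

extra_fans = extra_wins * fans_per_win
extra_revenue = extra_fans * revenue_per_fan
print(f"{extra_fans:,} fans -> ${extra_revenue / 1e6:.2f}M per year "
      f"vs. ${marginal_cost / 1e6:.1f}M cost")
```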
55. Postscripts
(1) Texas was not out of last place for a single day while A-Rod was on the team. Was it worth it? You make the call.
(2) What about the Yankees? They now pay most of the same costs. Is it worth it? How would you find out?
(3) What about that David Beckham contract with Major League Soccer?
56. Summary
Using Minitab to Compute a Regression
Building a Model
Logs
Dummy variables
Qualitative variables
Trends
Quadratics
Effects across time
All Assuming You Know the Right Variables!