
# Discrete Choice Modeling



### Presentation Transcript

1. William Greene, Stern School of Business, New York University: Discrete Choice Modeling

2. Econometric Methodology

3. Objectives in Model Building • Specification: guided by underlying theory • Modeling framework • Functional forms • Estimation: coefficients, partial effects, model implications • Statistical inference: hypothesis testing • Prediction: individual and aggregate • Model assessment (fit, adequacy) and evaluation • Model extensions • Interdependencies, multiple part models • Heterogeneity • Endogeneity • Exploration: Estimation and inference methods

4. Regression Basics: The “MODEL” • Modeling the conditional mean – regression • Other features of interest: • Modeling quantiles • Conditional variances or covariances • Modeling probabilities for discrete choice • Modeling other features of the population

5. Application: Health Care Usage
German Health Care Usage Data, 7,293 individuals, varying numbers of periods. Data downloaded from the Journal of Applied Econometrics Archive. This is an unbalanced panel with 7,293 individuals and 27,326 observations in all; the number of observations per individual ranges from 1 to 7 (frequencies: 1=1525, 2=2158, 3=825, 4=926, 5=1051, 6=1000, 7=987). The data can be used for regression, count models, binary choice, ordered choice, and bivariate binary choice. Variables in the file are:
DOCTOR = 1(Number of doctor visits > 0)
HOSPITAL = 1(Number of hospital visits > 0)
HSAT = health satisfaction, coded 0 (low) – 10 (high)
DOCVIS = number of doctor visits in last three months
HOSPVIS = number of hospital visits in last calendar year
PUBLIC = insured in public health insurance = 1; otherwise = 0
ADDON = insured by add-on insurance = 1; otherwise = 0
HHNINC = household nominal monthly net income in German marks / 10000 (4 observations with income = 0 were dropped)
HHKIDS = children under age 16 in the household = 1; otherwise = 0
EDUC = years of schooling
AGE = age in years
MARRIED = marital status

6. Household Income Kernel Density Estimator Histogram

7. Regression – Income on Education

```
----------------------------------------------------------------------
Ordinary     least squares regression ............
LHS=LOGINC   Mean                 =       -.92882
             Standard deviation   =        .47948
             Number of observs.   =           887
Model size   Parameters           =             2
             Degrees of freedom   =           885
Residuals    Sum of squares       =     183.19359
             Standard error of e  =        .45497
Fit          R-squared            =        .10064
             Adjusted R-squared   =        .09962
Model test   F[  1,   885] (prob) =    99.0(.0000)
Diagnostic   Log likelihood       =    -559.06527
             Restricted(b=0)      =    -606.10609
             Chi-sq [  1]  (prob) =    94.1(.0000)
Info criter. LogAmemiya Prd. Crt. =      -1.57279
--------+-------------------------------------------------------------
Variable| Coefficient  Standard Error  b/St.Er.  P[|Z|>z]   Mean of X
--------+-------------------------------------------------------------
Constant|   -1.71604***      .08057    -21.299     .0000
    EDUC|     .07176***      .00721      9.951     .0000     10.9707
--------+-------------------------------------------------------------
Note: ***, **, * = Significance at 1%, 5%, 10% level.
----------------------------------------------------------------------
```
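The mechanics of the regression above can be sketched in a few lines of numpy. The data here are simulated (the German-panel values are not reproduced), so only the structure of the output table, not its numbers, carries over:

```python
import numpy as np

# Sketch: OLS of log income on education, on simulated data.
# Sample size matches the slide; everything else is an assumption.
rng = np.random.default_rng(0)
n = 887
educ = rng.uniform(7, 18, n)                  # hypothetical schooling years
loginc = -1.7 + 0.07 * educ + rng.normal(0, 0.45, n)

X = np.column_stack([np.ones(n), educ])       # regressors: constant, EDUC
b, *_ = np.linalg.lstsq(X, loginc, rcond=None)
e = loginc - X @ b                            # residuals
s2 = e @ e / (n - X.shape[1])                 # e'e / degrees of freedom
se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))
t = b / se                                    # the table's b/St.Er. column
```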

8. Specification and Functional Form

```
----------------------------------------------------------------------
Ordinary     least squares regression ............
LHS=LOGINC   Mean                 =       -.92882
             Standard deviation   =        .47948
             Number of observs.   =           887
Model size   Parameters           =             3
             Degrees of freedom   =           884
Residuals    Sum of squares       =     183.00347
             Standard error of e  =        .45499
Fit          R-squared            =        .10157
             Adjusted R-squared   =        .09954
Model test   F[  2,   884] (prob) =    50.0(.0000)
Diagnostic   Log likelihood       =    -558.60477
             Restricted(b=0)      =    -606.10609
             Chi-sq [  2]  (prob) =    95.0(.0000)
Info criter. LogAmemiya Prd. Crt. =      -1.57158
--------+-------------------------------------------------------------
Variable| Coefficient  Standard Error  b/St.Er.  P[|Z|>z]   Mean of X
--------+-------------------------------------------------------------
Constant|   -1.68303***      .08763    -19.207     .0000
    EDUC|     .06993***      .00746      9.375     .0000     10.9707
  FEMALE|    -.03065         .03199      -.958     .3379      .42277
--------+-------------------------------------------------------------
```

9. Interesting Partial Effects

```
----------------------------------------------------------------------
Ordinary     least squares regression ............
LHS=LOGINC   Mean                 =       -.92882
             Standard deviation   =        .47948
             Number of observs.   =           887
Model size   Parameters           =             5
             Degrees of freedom   =           882
Residuals    Sum of squares       =     171.87964
             Standard error of e  =        .44145
Fit          R-squared            =        .15618
             Adjusted R-squared   =        .15235
Model test   F[  4,   882] (prob) =    40.8(.0000)
Diagnostic   Log likelihood       =    -530.79258
             Restricted(b=0)      =    -606.10609
             Chi-sq [  4]  (prob) =   150.6(.0000)
Info criter. LogAmemiya Prd. Crt. =      -1.62978
--------+-------------------------------------------------------------
Variable| Coefficient  Standard Error  b/St.Er.  P[|Z|>z]   Mean of X
--------+-------------------------------------------------------------
Constant|   -5.26676***      .56499     -9.322     .0000
    EDUC|     .06469***      .00730      8.860     .0000     10.9707
  FEMALE|    -.03683         .03134     -1.175     .2399      .42277
     AGE|     .15567***      .02297      6.777     .0000     50.4780
    AGE2|    -.00161***      .00023     -7.014     .0000     2620.79
--------+-------------------------------------------------------------
```
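The AGE and AGE2 coefficients in this table imply a concave age profile for log income. The age at which predicted income peaks follows directly from the first-order condition:

```python
# Derivative of fitted log income with respect to AGE is
# b_age + 2 * b_age2 * AGE; setting it to zero locates the peak.
b_age, b_age2 = 0.15567, -0.00161     # coefficients from the table
peak_age = -b_age / (2 * b_age2)      # roughly 48, near the mean age 50.478
```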

10. Modeling Categorical Variables • Theoretical foundations • Econometric methodology • Models • Statistical bases • Econometric methods • Applications

11. Categorical Variables • Observed outcomes • Inherently discrete: number of occurrences, e.g., family size • Multinomial: The observed outcome indexes a set of unordered labeled choices. • Implicitly continuous: The observed data are discrete by construction, e.g., revealed preferences; our main subject • Implications • For model building • For analysis and prediction of behavior

12. Simple Binary Choice: Insurance

13. Multinomial Unordered Choice

14. Ordered Outcome: Self-Reported Health Satisfaction

15. Binary Choice Models

16. A Random Utility Approach • Underlying preference scale, U*(choices) • Revelation of preferences: • U*(choices) < 0: Choice “0” • U*(choices) > 0: Choice “1”

17. Binary Outcome: Visit Doctor

18. A Model for Binary Choice • Yes or no decision (Buy/Not Buy, Do/Not Do) • Example: choose to visit physician or not • Random utility model: net utility of visiting at least once is Uvisit = α + β1Age + β2Income + β3Sex + ε • Choose to visit if net utility is positive: net utility = Uvisit – Unot visit • Data: X = [1, age, income, sex]; y = 1 if choose visit (Uvisit > 0), 0 if not

19. Modeling the Binary Choice • Uvisit = α + β1 Age + β2 Income + β3 Sex + ε • Chooses to visit: Uvisit > 0 • α + β1 Age + β2 Income + β3 Sex + ε > 0 • ε > –[α + β1 Age + β2 Income + β3 Sex] • Choosing between the two alternatives

20. Probability Model for Choice Between Two Alternatives • Probability is governed by ε, the random part of the utility function: ε > –[α + β1Age + β2Income + β3Sex]

21. Application • 27,326 observations • 1 to 7 years, unbalanced panel • 7,293 households observed • We use the 1994 wave: 3,377 household observations

22. Binary Choice Data

23. An Econometric Model • Choose to visit iff Uvisit > 0 • Uvisit = α + β1 Age + β2 Income + β3 Sex + ε • Uvisit > 0 means ε > –(α + β1 Age + β2 Income + β3 Sex), i.e., –ε < α + β1 Age + β2 Income + β3 Sex • Probability model: for any person observed by the analyst, Prob(visit) = Prob[–ε < α + β1 Age + β2 Income + β3 Sex] • Note the relationship between the unobserved ε and the outcome
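The choice mechanism above can be sketched in a short simulation. Every coefficient value below is a hypothetical assumption, and the disturbance is taken to be logistic, which makes the implied choice probability the logit formula:

```python
import numpy as np

# Random-utility sketch: y = 1 exactly when alpha + beta'x + eps > 0.
rng = np.random.default_rng(1)
n = 100_000
age = rng.uniform(25, 65, n)
income = rng.uniform(0.1, 1.5, n)
female = rng.integers(0, 2, n)

alpha, b1, b2, b3 = -2.0, 0.03, 0.5, 0.6   # illustrative values only
index = alpha + b1 * age + b2 * income + b3 * female
eps = rng.logistic(0, 1, n)                # the random part of utility
y = (index + eps > 0).astype(int)          # choose "visit" iff U > 0

# With logistic eps, Prob(y=1|x) = 1/(1+exp(-index)); the simulated
# choice frequencies track this probability closely.
p_model = 1 / (1 + np.exp(-index))
```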

24. α + β1Age + β2Income + β3Sex

25. Modeling Approaches • Nonparametric – “relationship”: minimal assumptions; minimal conclusions • Semiparametric – “index function”: stronger assumptions; robust to model misspecification (heteroscedasticity); still weak conclusions • Parametric – “probability function and index”: strongest assumptions – complete specification; strongest conclusions; possibly less robust (not necessarily)

26. Nonparametric Regressions P(Visit)=f(Age) P(Visit)=f(Income)

27. Klein and Spady: Semiparametric • No specific distribution assumed • Prob(yi = 1 | xi) = G(β’xi), where G is estimated by kernel methods • Note the necessary normalizations: coefficients are relative to FEMALE.

28. Fully Parametric • Index Function: U* = β’x + ε • Observation Mechanism: y = 1[U* > 0] • Distribution: ε ~ f(ε); Normal, Logistic, … • Maximum Likelihood Estimation:Max(β) logL = Σi log Prob(Yi = yi|xi)
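The estimation step on this slide, Maxβ logL = Σi log Prob(Yi = yi|xi), can be sketched for the logit case. The data are simulated, beta_true is an assumption for the demo, and Newton's method stands in for whatever optimizer a package actually uses:

```python
import numpy as np

# Maximum likelihood for a two-parameter logit by Newton-Raphson.
rng = np.random.default_rng(2)
n = 20_000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([0.5, -1.0])                  # hypothetical truth
y = (rng.uniform(size=n) < 1 / (1 + np.exp(-X @ beta_true))).astype(float)

beta = np.zeros(2)
for _ in range(25):
    p = 1 / (1 + np.exp(-X @ beta))                # Prob(y=1|x) at beta
    grad = X.T @ (y - p)                           # score of logL
    hess = -(X * (p * (1 - p))[:, None]).T @ X     # Hessian of logL
    beta = beta - np.linalg.solve(hess, grad)      # Newton update

p = 1 / (1 + np.exp(-X @ beta))
logL = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
```

Because each term of logL is a log probability, the maximized logL is necessarily negative, as slide 46 notes.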

29. Parametric: Logit Model What do these mean?

30. Parametric Model Estimation • How to estimate α, β1, β2, β3? The technique of maximum likelihood • Prob[y=1] = Prob[ε > –(α + β1 Age + β2 Income + β3 Sex)] • Prob[y=0] = 1 – Prob[y=1] • Requires a model for the probability

31. Completing the Model: F(ε) • The distribution: • Normal: PROBIT, natural for behavior • Logistic: LOGIT, allows “thicker tails” • Gompertz: EXTREME VALUE, asymmetric • Others… • Does it matter? • Yes: large differences in estimates • Not much: quantities of interest are more stable

32. Estimated Binary Choice Models Ignore the t ratios for now.

33. Effect on Predicted Probability of an Increase in Age: α + β1(Age+1) + β2(Income) + β3Sex (β1 is positive)

34. Partial Effects in Probability Models • Prob[Outcome] = some F(α + β1Income + …) • “Partial effect” = ∂F(α + β1Income + …)/∂x (a derivative) • Partial effects are derivatives, and the result varies with the model: • Logit: ∂F/∂x = β × Prob × (1 – Prob) • Probit: ∂F/∂x = β × normal density • Extreme value: ∂F/∂x = β × Prob × (–log Prob) • Scaling usually erases the model differences
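The three formulas can be checked numerically. The index value xb and the coefficient b below are illustrative assumptions; in practice the estimated coefficients themselves differ across models, which is why, after scaling, the resulting partial effects are usually similar:

```python
import numpy as np

xb, b = 0.3, 0.07                        # hypothetical index and coefficient

# Logit: dF/dx = b * P * (1 - P), with P = 1/(1+exp(-xb))
P_logit = 1 / (1 + np.exp(-xb))
pe_logit = b * P_logit * (1 - P_logit)

# Probit: dF/dx = b * phi(xb), the standard normal density
phi = np.exp(-0.5 * xb**2) / np.sqrt(2 * np.pi)
pe_probit = b * phi

# Extreme value: F(z) = exp(-exp(-z)), so dF/dx = b * P * (-log P)
P_ev = np.exp(-np.exp(-xb))
pe_ev = b * P_ev * (-np.log(P_ev))
```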

35. Estimated Partial Effects

36. Partial Effect for a Dummy Variable • Prob[yi = 1 | xi, di] = F(β’xi + γdi) = conditional mean • Partial effect of d: Prob[yi = 1 | xi, di = 1] – Prob[yi = 1 | xi, di = 0] • Probit: Φ(β’xi + γ) – Φ(β’xi)
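For a dummy variable the appropriate measure is this discrete probability change, not the derivative. A small numeric sketch, with hypothetical index value xb and dummy coefficient gamma, shows the two need not agree:

```python
from math import erf, sqrt, exp, pi

def Phi(z):                                   # standard normal CDF
    return 0.5 * (1 + erf(z / sqrt(2)))

def phi(z):                                   # standard normal density
    return exp(-0.5 * z * z) / sqrt(2 * pi)

xb, gamma = 0.2, 0.65                         # illustrative values
pe_discrete = Phi(xb + gamma) - Phi(xb)       # correct effect for a dummy
pe_derivative = gamma * phi(xb)               # naive derivative analogue
```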

37. Partial Effect – Dummy Variable

38. Computing Partial Effects • Compute at the data means? Simple, and inference is well defined. • Average the individual effects? More appropriate, but asymptotic standard errors are problematic.
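The two computations can be contrasted in a sketch for a one-regressor probit. Data and coefficients are simulated assumptions; the point is only that, because the model is nonlinear, the effect at the means and the average of the individual effects generally differ:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(1.0, 2.0, 10_000)             # a single covariate
b0, b1 = 0.3, 0.4                            # hypothetical coefficients

phi = lambda z: np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)

pe_at_means = b1 * phi(b0 + b1 * x.mean())   # evaluate once, at x-bar
ape = np.mean(b1 * phi(b0 + b1 * x))         # average the individual effects
```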

39. Average Partial Effects

40. Average Partial Effects vs. Partial Effects at Data Means

41. Practicalities of Nonlinearities
The software does not know that AGESQ = AGE²:

```
PROBIT ; Lhs = doctor
       ; Rhs = one,age,agesq,income,female
       ; Partial effects $
```

The software does know that AGE * AGE is AGE²:

```
PROBIT   ; Lhs = doctor ; Rhs = one,age,age*age,income,female $
PARTIALS ; Effects : age $
```
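What the second form buys is the chain rule over the whole index. As an illustration with hypothetical probit coefficients (not the estimates in this deck), the partial effect of AGE when the index contains both AGE and AGE² is:

```python
from math import exp, sqrt, pi

# "other" stands for the remaining part of beta'x, held fixed.
b_age, b_agesq, other = -0.10, 0.0015, 1.2    # illustrative coefficients
age = 40.0

phi = lambda z: exp(-0.5 * z * z) / sqrt(2 * pi)
index = other + b_age * age + b_agesq * age**2
pe_age = phi(index) * (b_age + 2 * b_agesq * age)   # chain rule over index
# Using phi(index) * b_age alone would ignore the squared term,
# which is the mistake the slide warns about.
```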

42. Partial Effect for Nonlinear Terms

43. Average Partial Effect: Averaged over Sample Incomes and Genders for Specific Values of Age

44. Measures of Fit in Binary Choice Models

45. How Well Does the Model Fit? • There is no R squared: least squares for linear models is computed to maximize R²; there are no residuals or sums of squares in a binary choice model; the model is not computed to optimize the fit of the model to the data. • How can we measure the “fit” of the model to the data? • “Fit measures” computed from the log likelihood: “pseudo R squared” = 1 – logL/logL0, also called the “likelihood ratio index”; others… These do not measure fit. • Direct assessment of the effectiveness of the model at predicting the outcome.

46. Log Likelihoods • logL = Σi log density(yi|xi,β) • For probabilities: the density is a probability, so the log density is < 0 and logL is < 0 • For other models, the log density can be positive or negative • For linear regression, logL = –(N/2)[1 + log 2π + log(e’e/N)] • Positive if s² (= e’e/N) < .058497
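The cutoff for a positive linear-regression logL follows from setting the bracketed term to zero: logL > 0 exactly when e’e/N < exp(–(1 + log 2π)), which evaluates to about .0585. A quick check, with an assumed sample size and residual variance:

```python
from math import exp, log, pi

# logL = -(N/2)[1 + log 2*pi + log(e'e/N)] is positive iff the
# bracket is negative, i.e. e'e/N below the threshold.
threshold = exp(-(1 + log(2 * pi)))           # about .0585

# Example with assumed values: N = 100 and e'e/N = .05 < threshold.
N, msr = 100, 0.05
logL = -(N / 2) * (1 + log(2 * pi) + log(msr))
```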

47. Likelihood Ratio Index

48. The Likelihood Ratio Index • Bounded by 0 and 1 – ε • Rises when the model is expanded • Values between 0 and 1 have no meaning • Can be strikingly low • Should not be used to compare models: use logL, and use information criteria to compare nonnested models

49. Fit Measures Based on LogL

```
----------------------------------------------------------------------
Binary Logit Model for Binary Choice
Dependent variable                 DOCTOR
Log likelihood function       -2085.92452   Full model LogL
Restricted log likelihood     -2169.26982   Constant term only LogL0
Chi squared [ 5 d.f.]           166.69058
Significance level                 .00000
McFadden Pseudo R-squared        .0384209   1 - LogL/LogL0
Estimation based on N =  3377, K =  6
Information Criteria:  Normalization=1/N
               Normalized   Unnormalized
AIC               1.23892     4183.84905   -2LogL + 2K
Fin.Smpl.AIC      1.23893     4183.87398   -2LogL + 2K + 2K(K+1)/(N-K-1)
Bayes IC          1.24981     4220.59751   -2LogL + KlnN
Hannan Quinn      1.24282     4196.98802   -2LogL + 2Kln(lnN)
--------+-------------------------------------------------------------
Variable| Coefficient  Standard Error  b/St.Er.  P[|Z|>z]   Mean of X
--------+-------------------------------------------------------------
        |Characteristics in numerator of Prob[Y = 1]
Constant|    1.86428***      .67793      2.750     .0060
     AGE|    -.10209***      .03056     -3.341     .0008     42.6266
   AGESQ|     .00154***      .00034      4.556     .0000     1951.22
  INCOME|     .51206         .74600       .686     .4925      .44476
 AGE_INC|    -.01843         .01691     -1.090     .2756     19.0288
  FEMALE|     .65366***      .07588      8.615     .0000      .46343
--------+-------------------------------------------------------------
```
