Discrete Choice Modeling

William Greene, Stern School of Business. IFS at UCL, February 11-13, 2004.

Discrete Choice Modeling • Econometric Methodology • Binary Choice Models • Multinomial Choice • Model Building • Specification • Estimation • Analysis • Applications • NLOGIT Software

Our Agenda • Methodology • Discrete Choice Models • Binary Choice Models • Panel Data Models for Binary Choice • Introduction to NLOGIT • Discrete Choice Settings • The Multinomial Logit Model • Heteroscedasticity in Utility Functions • Nested Logit Modeling • Latent Class Models • Mixed Logit Models and Simulation Based Estimation • Revealed and Stated Preference Data Sets

Part 1 Methodology

Measurement as Observation [Diagram: Population, Measurement, Theory; Characteristics, Behavior Patterns]

Individual Behavioral Modeling • Assumptions about behavior • Common elements across individuals • Unique elements • Prediction • Population aggregates • Individual behavior

Modeling Choice • Activity as choices • Preferences • Behavioral axioms • Choice as utility maximization

Inference [Diagram: Population, Measurement, Econometrics; Characteristics, Behavior Patterns, Choices]

Econometric Frameworks • Nonparametric • Parametric • Classical (Sampling Theory) • Bayesian

Likelihood Based Inference Methods [Diagram: Behavioral Theory, Likelihood Function, Statistical Theory, Observed Measurement] • The likelihood function embodies the theoretical description of the population. • Characteristics of the population are inferred from the characteristics of the likelihood function. (Bayesian and Classical)

Modeling Discrete Choice • Theoretical foundations • Econometric methodology • Models • Statistical bases • Econometric methods • Estimation with econometric software • Applications

Part 2 Basics of Discrete Choice Modeling

Modeling Consumer Choice: Continuous Measurement • Example: Travel expenditure based on price and income • What do we measure? • What is revealed by the data? • What is the underlying model? • What are the empirical tools? [Figure: Expenditure against Income, at low price and at high price]

Discrete Choice • Observed outcomes • Inherently discrete: number of occurrences (e.g., family size; considered separately) • Implicitly continuous: the observed data are discrete by construction (e.g., revealed preferences; our main subject) • Implications • For model building • For analysis and prediction of behavior

Two Fundamental Building Blocks • Underlying Behavioral Theory: Random utility model. The link between underlying behavior and observed data. • Empirical Tool: Stochastic, parametric model for binary choice. A platform for models of discrete choice.

Random Utility: A Theoretical Proposition About Behavior • Consumer making a choice among several alternatives • Example: brand choice (car, food) • Choice setting for a consumer, notation: Consumer i, i = 1, …, N; Choice setting t, t = 1, …, Ti (may be one); Choice set j, j = 1, …, Ji (may be fixed)

Behavioral Assumptions • Preferences are transitive and complete with respect to choice situations • Utility is defined over alternatives: Uijt • Utility maximization assumption: If Ui1t > Ui2t, consumer chooses alternative 1, not alternative 2. • Revealed preference (duality): If the consumer chooses alternative 1 and not alternative 2, then Ui1t > Ui2t.

Random Utility Functions
Uitj = αj + βi′xitj + γi′zit + εijt
αj = Choice specific constant
xitj = Attributes of choice presented to person
βi = Person specific taste weights
zit = Characteristics of the person
γi = Weights on person specific characteristics
εijt = Unobserved random component of utility
Mean: E[εijt] = 0, Var[εijt] = 1

Part 3 Modeling Binary Choice

A Model for Binary Choice • Yes or No decision (Buy/Not buy) • Example: choose to fly or not to fly to a destination when there are alternatives. • Model: Net utility of flying, Ufly = α + β1Cost + β2Time + γIncome + ε. Choose to fly if net utility is positive. • Data: X = [1, cost, terminal time], Z = [income], y = 1 if choose fly (Ufly > 0), 0 if not.

What Can Be Learned from the Data? (A Sample of Consumers, i = 1,…,N) • Are the attributes “relevant?” • Predicting behavior • Individual • Aggregate • Analyze changes in behavior when attributes change

Application • 210 Commuters Between Sydney and Melbourne • Available modes = Air, Train, Bus, Car • Observed: • Choice • Attributes: Cost, terminal time, other • Characteristics: Household income • First application: Fly or other

Binary Choice Data
Choose Air   Gen.Cost   Term Time   Income
  1.0000      86.000     25.000     70.000
  .00000      67.000     69.000     60.000
  .00000      77.000     64.000     20.000
  .00000      69.000     69.000     15.000
  .00000      77.000     64.000     30.000
  .00000      71.000     64.000     26.000
  .00000      58.000     64.000     35.000
  .00000      71.000     69.000     12.000
  .00000      100.00     64.000     70.000
  1.0000      158.00     30.000     50.000
  1.0000      136.00     45.000     40.000
  1.0000      103.00     30.000     70.000
  .00000      77.000     69.000     10.000
  1.0000      197.00     45.000     26.000
  .00000      129.00     64.000     50.000
  .00000      123.00     64.000     70.000

An Econometric Model • Choose to fly iff Ufly > 0 • Ufly = α + β1Cost + β2Time + γIncome + ε • Ufly > 0 ⇔ ε > -(α + β1Cost + β2Time + γIncome) • Probability model: For any person observed by the analyst, Prob(fly) = Prob[ε > -(α + β1Cost + β2Time + γIncome)] • Note the relationship between the unobserved ε and the outcome

[Figure: α + β1Cost + β2Time + γIncome]

Econometrics • How to estimate α, β1, β2, γ? • It’s not regression • The technique of maximum likelihood • Prob[y=1] = Prob[ε > -(α + β1Cost + β2Time + γIncome)], Prob[y=0] = 1 - Prob[y=1] • Requires a model for the probability
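As a rough illustration of what maximum likelihood works with, the sketch below evaluates the binary choice log-likelihood, the sum of log Prob[y=1] over those who fly plus log Prob[y=0] over those who do not. It uses only the constants-only case, where every person is assigned the sample share of fliers. The counts (58 fly, 152 do not) are taken from the fit-measure output later in these slides; the function name is illustrative, not NLOGIT's.

```python
import math

def bernoulli_loglik(y, p):
    """Log-likelihood: sum of log(p_i) where y_i = 1 and log(1 - p_i) where y_i = 0."""
    return sum(math.log(pi) if yi == 1 else math.log(1.0 - pi)
               for yi, pi in zip(y, p))

# Constants-only model: the fitted probability is the sample share of fliers.
n1, n0 = 58, 152                 # counts N1 and N0 reported in the slides
share = n1 / (n1 + n0)
y = [1] * n1 + [0] * n0
loglik0 = bernoulli_loglik(y, [share] * len(y))

# Should be close to the Log-L(0) value of -123.757 shown with the estimates.
print(round(loglik0, 3))
```

A full estimator would maximize the same function over α, β1, β2, γ, with p replaced by F(α + β1Cost + β2Time + γIncome) for the chosen distribution F.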

Completing the Model: F(ε) • The distribution • Normal: PROBIT, natural for behavior • Logistic: LOGIT, allows “thicker tails” • Gompertz: EXTREME VALUE, asymmetric, underlies the basic logit model for multiple choice • Does it matter? • Yes, large difference in estimates • Not much, quantities of interest are more stable.

Estimated Binary Choice Models
                 LOGIT                  PROBIT                EXTREME VALUE
Variable    Estimate    t-ratio    Estimate    t-ratio    Estimate    t-ratio
Constant    1.78458     1.40591    0.438772    0.702406   1.45189     1.34775
GC          0.0214688   3.15342    0.012563    3.41314    0.0177719   3.14153
TTME       -0.098467   -5.9612    -0.0477826  -6.65089   -0.0868632  -5.91658
HINC        0.0223234   2.16781    0.0144224   2.51264    0.0176815   2.02876
Log-L      -80.9658               -84.0917               -76.5422
Log-L(0)   -123.757               -123.757               -123.757
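To see why differently scaled estimates can still tell a similar story, the following sketch plugs the logit and probit estimates from the table above into their own distribution functions for one traveler. The traveler is the first row of the Binary Choice Data slide (GC = 86, TTME = 25, HINC = 70, chose air). The extreme value column is left out because the slides do not state which form of that distribution function the software uses. Helper names are illustrative.

```python
import math

def logistic_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Estimates in the order (Constant, GC, TTME, HINC), copied from the table.
logit_b = (1.78458, 0.0214688, -0.098467, 0.0223234)
probit_b = (0.438772, 0.012563, -0.0477826, 0.0144224)

# First row of the Binary Choice Data slide, with a leading 1 for the constant.
x = (1.0, 86.0, 25.0, 70.0)

def index(b, x):
    """Linear index b'x."""
    return sum(bk * xk for bk, xk in zip(b, x))

p_logit = logistic_cdf(index(logit_b, x))
p_probit = normal_cdf(index(probit_b, x))
print(f"logit: {p_logit:.3f}  probit: {p_probit:.3f}")
```

The coefficients differ by a factor of roughly two across the columns, yet both models assign this traveler a probability of flying above 0.9.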

Effect on predicted probability of an increase in income [Figure: α + β1Cost + β2Time + γ(Income+1)] (γ is positive)

Marginal Effects in Probability Models • Prob[Outcome] = some F(α + β1Cost…) • “Partial effect” = ∂F(α + β1Cost…)/∂“x” (derivative) • Partial effects are derivatives • Result varies with model • Logit: ∂F(α + β1Cost…)/∂x = Prob * (1 - Prob) * β • Probit: ∂F(α + β1Cost…)/∂x = Normal density * β • Scaling usually erases model differences
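A minimal sketch of the two formulas, evaluated at the data means. The coefficients are the logit and probit estimates reported in these slides. The means are assumptions pieced together from other slides: GC and TTME from the "Mean of X" column of the NLOGIT output, income from the "Mean Income = 34.55" line of the elasticity slide.

```python
import math

def logistic_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))

def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

names = ("Constant", "GC", "TTME", "HINC")
logit_b = (1.78458, 0.0214688, -0.098467, 0.0223234)
probit_b = (0.438772, 0.012563, -0.0477826, 0.0144224)
means = (1.0, 102.64762, 61.009524, 34.55)   # assumed sample means (see lead-in)

def index(b, x):
    return sum(bk * xk for bk, xk in zip(b, x))

# Logit: Prob * (1 - Prob) * beta
p = logistic_cdf(index(logit_b, means))
logit_me = {n: p * (1.0 - p) * b for n, b in zip(names[1:], logit_b[1:])}

# Probit: normal density * beta
density = normal_pdf(index(probit_b, means))
probit_me = {n: density * b for n, b in zip(names[1:], probit_b[1:])}

for n in names[1:]:
    print(f"{n:5s} logit {logit_me[n]: .6f}   probit {probit_me[n]: .6f}")
```

Under these assumptions the probit effect of income comes out close to the "Estimated ME = .004539" quoted on the elasticity slide, and the logit effects are of similar size even though the raw coefficients are not.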

The Delta Method

Marginal Effects for Binary Choice • Logit • Probit

Estimated Marginal Effects Logit Probit Extreme Value

Marginal Effect for a Dummy Variable • Prob[yi = 1|xi,di] = F(β′xi + γdi) = conditional mean • Marginal effect of d: Prob[yi = 1|xi,di=1] - Prob[yi = 1|xi,di=0] • Logit:
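For the logit case, the difference in probabilities can be sketched as below. The index value and the dummy coefficient are hypothetical numbers chosen so the arithmetic is easy to follow; they are not estimates from the slides.

```python
import math

def logistic_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))

def dummy_effect(index_without_dummy, gamma, cdf=logistic_cdf):
    """P|1 - P|0: probability with d = 1 minus probability with d = 0."""
    return cdf(index_without_dummy + gamma) - cdf(index_without_dummy)

# Hypothetical inputs: beta'x = 0 and gamma = log(3).
bx, gamma = 0.0, math.log(3.0)
effect = dummy_effect(bx, gamma)

# For comparison, the derivative formula Prob * (1 - Prob) * gamma at d = 0.
p0 = logistic_cdf(bx)
derivative_approx = p0 * (1.0 - p0) * gamma

print(effect, derivative_approx)
```

With these inputs the discrete difference is 0.25, while treating the dummy as if it were continuous gives about 0.27, which is why the difference in probabilities is used for dummy variables.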

(Marginal) Effect – Dummy Variable • HighIncm = 1(Income > 50)
+-------------------------------------------+
| Partial derivatives of probabilities with |
| respect to the vector of characteristics. |
| They are computed at the means of the Xs. |
| Observations used are All Obs.            |
+-------------------------------------------+
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient  | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
 Characteristics in numerator of Prob[Y = 1]
 Constant    .4750039483       .23727762       2.002   .0453
 GC          .3598131572E-02   .11354298E-02   3.169   .0015  102.64762
 TTME       -.1759234212E-01   .34866343E-02  -5.046   .0000  61.009524
 Marginal effect for dummy variable is P|1 - P|0.
 HIGHINCM    .8565367181E-01   .99346656E-01    .862   .3886  .18571429

Computing Effects • Compute at the data means? • Simple • Inference is well defined • Average individual effects • More appropriate? • Asymptotic standard errors. (Not done correctly in the literature – terms are correlated!)

Elasticities • Elasticity = • How to compute standard errors? • Delta method • Bootstrap • Bootstrap the individual elasticities? (Will neglect variation in parameter estimates.) • Bootstrap model estimation?

Estimated Income Elasticity for Air Choice Model
+------------------------------------------+
| Results of bootstrap estimation of model.|
| Model has been reestimated 25 times.     |
| Statistics shown below are centered      |
| around the original estimate based on    |
| the original full sample of observations.|
| Result is ETA = .71183                   |
| bootstrap samples have 840 observations. |
| Estimate RtMnSqDev Skewness Kurtosis     |
|   .712     .266     -.779     2.258      |
| Minimum = .125     Maximum = 1.135       |
+------------------------------------------+
Mean Income = 34.55, Mean P = .2716, Estimated ME = .004539, Estimated Elasticity = 0.5774.
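The last line of the output can be checked by hand, assuming the usual definition of an elasticity as the marginal effect scaled by x/P. The sketch below uses only the three numbers printed on that line.

```python
# Numbers copied from the final line of the output above.
mean_income = 34.55
mean_prob = 0.2716
marginal_effect = 0.004539

# Elasticity = (dP/dx) * (x / P), evaluated at the reported means.
elasticity = marginal_effect * mean_income / mean_prob
print(round(elasticity, 4))
```

The result agrees with the reported 0.5774. It differs from the bootstrap figure ETA = .71183 in the box, which appears to be computed another way (for example by averaging individual elasticities); the slides do not say which.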

Odds Ratio – Logit Model Only • Effect Measure? “Effect of a unit change in the odds ratio.”

Inference for Odds Ratios • Logit coefficient = β, estimate = b • Coefficient = exp(β), estimate = exp(b) • Standard error = exp(b) times se(b) • t ratio is the same
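A short sketch of the standard error rule, using the logit income (HINC) estimate and t-ratio reported earlier in these slides. The standard error of b is not printed there, so it is recovered here as estimate divided by t-ratio, which is an assumption about how the t-ratio was formed.

```python
import math

# Logit estimate and t-ratio for HINC from the estimated models table.
b, t_ratio = 0.0223234, 2.16781
se_b = b / t_ratio                 # assumed: t-ratio = estimate / standard error

odds_ratio = math.exp(b)           # estimate of exp(beta)
se_odds_ratio = odds_ratio * se_b  # exp(b) times se(b)

print(f"b = {b}, se(b) = {se_b:.6f}")
print(f"exp(b) = {odds_ratio:.5f}, se = {se_odds_ratio:.6f}")
```

An odds ratio a little above 1 says that each additional unit of income multiplies the odds of choosing air by about 1.02.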

How Well Does the Model Fit? • There is no R squared • “Fit measures” computed from log L • “pseudo R squared” = 1 – logL/logL0 • Others… - these do not measure fit. • Direct assessment of the effectiveness of the model at predicting the outcome

Fit Measures for Binary Choice • Likelihood Ratio Index • Bounded by 0 and 1 • Rises when the model is expanded • Cramer (and others)

Fit Measures for the Probit Model
+----------------------------------------+
| Fit Measures for Binomial Choice Model |
| Probit model for variable MODE         |
+----------------------------------------+
| Proportions P0= .723810   P1= .276190  |
| N =     210 N0=     152   N1=      58  |
| LogL =  -84.09172 LogL0 =  -123.7570   |
| Estrella = 1-(L/L0)^(-2L0/n) = .36583  |
+----------------------------------------+
|   Efron    |  McFadden  |  Ben./Lerman |
|   .45620   |   .32051   |    .75897    |
|   Cramer   | Veall/Zim. |   Rsqrd_ML   |
|   .40834   |   .50682   |    .31461    |
+----------------------------------------+
| Information  Akaike I.C. Schwarz I.C.  |
| Criteria       .83897     189.57187    |
+----------------------------------------+
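Several entries in this box depend only on the two log-likelihoods and the sample size, so they can be reproduced directly. The sketch below assumes K = 4 estimated parameters (constant, GC, TTME, HINC) and the textbook forms of each measure; the remaining entries (Efron, Ben-Akiva/Lerman, Cramer) need the individual predictions and are not attempted.

```python
import math

# Values printed in the output above.
logl, logl0, n = -84.09172, -123.7570, 210
k = 4  # assumed number of estimated parameters

mcfadden = 1.0 - logl / logl0                      # likelihood ratio index
estrella = 1.0 - (logl / logl0) ** (-2.0 * logl0 / n)
rsqrd_ml = 1.0 - math.exp(-2.0 * (logl - logl0) / n)
akaike = (-2.0 * logl + 2.0 * k) / n               # per-observation form
schwarz = -2.0 * logl + k * math.log(n)

for name, value in [("McFadden", mcfadden), ("Estrella", estrella),
                    ("Rsqrd_ML", rsqrd_ml), ("Akaike", akaike),
                    ("Schwarz", schwarz)]:
    print(f"{name:9s} {value:.5f}")
```

Note that the Akaike value in the box matches only when divided by n, while the Schwarz value matches without that division, so the two criteria are not on the same scale in this output.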

Predicting the Outcome • Predicted probabilities P = F(a + b1Cost + b2Time + cIncome) • Predicting outcomes • Predict y=1 if P is large • Use 0.5 for “large” (more likely than not) • Count successes and failures

Individual Predictions from a Logit Model
Observation   Observed Y   Predicted Y   Residual    x(i)b    Pr[Y=1]
     81         .00000       .00000        .0000    -3.3944    .0325
     85         .00000       .00000        .0000    -2.1901    .1006
     89         1.0000       .00000       1.0000    -2.6766    .0644
     93         1.0000       1.0000        .0000      .8113    .6924
     97         1.0000       1.0000        .0000     2.6845    .9361
    101         1.0000       1.0000        .0000     2.4457    .9202
    105         1.0000       .00000       1.0000    -3.2204    .0384
    109         1.0000       1.0000        .0000      .0311    .5078
    113         .00000       .00000        .0000    -2.1704    .1024
    117         .00000       .00000        .0000    -3.3729    .0332
    445         .00000       1.0000      -1.0000      .0295    .5074
Note two types of errors and two types of successes.

Predictions in Binary Choice Predict y = 1 if P > P* Success depends on the assumed P*
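How much the threshold matters can be seen with the eleven observations listed on the Individual Predictions slide. This is only a sketch on that small excerpt, not on the full sample, and the second threshold (0.05) is an arbitrary value picked for contrast.

```python
# (observed y, Pr[Y=1]) pairs copied from the Individual Predictions slide.
rows = [(0, .0325), (0, .1006), (1, .0644), (1, .6924), (1, .9361), (1, .9202),
        (1, .0384), (1, .5078), (0, .1024), (0, .0332), (0, .5074)]

def classify(rows, p_star):
    """Count true/false positives and negatives when predicting y = 1 if P > p_star."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for y, p in rows:
        predicted = 1 if p > p_star else 0
        key = ("T" if predicted == y else "F") + ("P" if predicted == 1 else "N")
        counts[key] += 1
    return counts

at_half = classify(rows, 0.5)    # the conventional "more likely than not" rule
at_low = classify(rows, 0.05)    # arbitrary lower threshold for comparison
print(at_half)
print(at_low)
```

Lowering P* catches one more actual flier in this excerpt but mislabels two more non-fliers, which is the trade-off behind the statement that success depends on the assumed P*.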

ROC Curve • Plot %y=1 correctly predicted vs. %y=1 incorrectly predicted • The 45° line is no fit. Curvature implies fit. • Area under the curve compares models
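A rough sketch of how the curve and its area are built: sweep the threshold from high to low, and at each step record the share of actual 1s predicted as 1 against the share of actual 0s predicted as 1. The data are the eleven observations shown on the Individual Predictions slide, used here purely as an illustration; the function names are not from any particular package, and the code assumes no tied probabilities.

```python
# (observed y, Pr[Y=1]) pairs copied from the Individual Predictions slide.
rows = [(0, .0325), (0, .1006), (1, .0644), (1, .6924), (1, .9361), (1, .9202),
        (1, .0384), (1, .5078), (0, .1024), (0, .0332), (0, .5074)]

def roc_points(rows):
    """(false positive rate, true positive rate) as the threshold is lowered."""
    n_pos = sum(y for y, _ in rows)
    n_neg = len(rows) - n_pos
    points = [(0.0, 0.0)]
    tp = fp = 0
    for y, _ in sorted(rows, key=lambda r: -r[1]):   # highest probability first
        if y == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_neg, tp / n_pos))
    return points

def area_under(points):
    """Trapezoid rule over consecutive ROC points."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

points = roc_points(rows)
auc = area_under(points)
print(round(auc, 3))
```

An area of 0.5 corresponds to the 45° line; for this small excerpt the area is 0.8.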

Aggregate Predictions
Frequencies of actual & predicted outcomes
Predicted outcome has maximum probability.
Threshold value for predicting Y=1 = .5000
            Predicted
------  ----------  +  -----
Actual      0    1  |  Total
------  ----------  +  -----
  0       151    1  |    152
  1        20   38  |     58
------  ----------  +  -----
Total     171   39  |    210
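The table can be summarized with a few ratios. The sketch below takes the four cell counts as given and computes the overall share of correct predictions, the share of actual 1s and actual 0s predicted correctly, and, as a benchmark, the share that would be correct by always predicting the more common outcome. The variable names are illustrative.

```python
# Cell counts keyed by (actual, predicted), copied from the table above.
table = {(0, 0): 151, (0, 1): 1, (1, 0): 20, (1, 1): 38}

n = sum(table.values())
actual_0 = table[(0, 0)] + table[(0, 1)]
actual_1 = table[(1, 0)] + table[(1, 1)]

accuracy = (table[(0, 0)] + table[(1, 1)]) / n   # all correct predictions
sensitivity = table[(1, 1)] / actual_1           # actual 1s predicted as 1
specificity = table[(0, 0)] / actual_0           # actual 0s predicted as 0
naive = max(actual_0, actual_1) / n              # always predict the majority

print(f"accuracy {accuracy:.3f}, sensitivity {sensitivity:.3f}, "
      f"specificity {specificity:.3f}, naive benchmark {naive:.3f}")
```

The model predicts 90% of outcomes correctly against about 72% for the naive rule, but it misses 20 of the 58 travelers who actually flew.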
