Models with limited dependent variables


Limited dependent variables

- Discrete dependent variable: choice models
- Continuous dependent variable
  - Truncated or censored: truncated/censored regression models
  - Duration (hazard) models

Discrete choice models

- Choice between different options (j)
  - Single choice (binary choice models), e.g. buy a product or not, follow higher education or not, ...
    - j = 1 (yes/accept) or 0 (no/reject)
  - Multiple choice (multinomial choice models), e.g. cars, stores, transportation modes
    - j = 1 (option 1), 2 (option 2), ..., J (option J)

Truncated/censored regression models

- Truncated variable: observed only beyond a certain threshold level (‘truncation point’), e.g. store expenditures, income
- Censored variable: values in a certain range are all transformed to (or reported as) a single value (Greene, p. 761), e.g. demand (stockouts, unfulfilled demand), hours worked
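The difference between the two mechanisms can be illustrated on one made-up sample. The sketch below is only an illustration: the latent values, the threshold of 0 and all variable names are assumptions, not data from the slides.

```python
# Hypothetical latent values (e.g. desired hours worked or intended spending).
latent = [-2.0, 0.5, 3.0, -1.0, 4.5]
threshold = 0.0  # assumed truncation/censoring point

# Censoring: every unit stays in the sample, but values at or below the
# threshold are all reported as the threshold itself.
censored = [max(v, threshold) for v in latent]

# Truncation: units at or below the threshold are not observed at all.
truncated = [v for v in latent if v > threshold]

print("censored :", censored)
print("truncated:", truncated)
```

In the censored sample the number of observations is unchanged, while the truncated sample is smaller; this is why the two cases call for different likelihoods.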

Duration/Hazard models

- Time between two events, e.g.
- Time between two purchases
- Time until a consumer becomes inactive/cancels a subscription
- Time until a consumer responds to direct mail/ a questionnaire
- ...

Need to use adjusted models: Illustration

Franses and Paap (2001)

Overview

- Part I. Discrete Choice Models
- Part II. Censored and Truncated Regression Models
- Part III. Duration Models

Recommended Literature

- Kenneth Train, Discrete Choice Methods with Simulation, Cambridge University Press, 2003 (Part I)
- Ph.H.Franses and R.Paap, Quantitative Models in Market Research, Cambridge University Press, 2001 (Part I-II-III; Data: www.few.eur.nl/few/people/paap)
- D.A.Hensher, J.M.Rose and W.H.Greene, Applied Choice Analysis, Cambridge University Press, 2005 (Part I)

Overview Part I, DCM

- Properties of DCM
- Estimation of DCM
- Types of Discrete Choice Models
  - Binary Logit Model
  - Multinomial Logit Model
  - Nested logit model
  - Probit Model
  - Ordered Logit Model
- Heterogeneity

Notation

- n = decision maker
- i,j = choice options
- y = decision outcome
- x = explanatory variables
- β = parameters
- ε = error term
- I[.] = indicator function, equal to 1 if the expression within brackets is true, 0 otherwise,
  e.g. I[y=j|x] = 1 if j was selected (given x), equal to 0 otherwise

A. Properties of DCM

Kenneth Train

- Characteristics of the choice set
  - Alternatives must be mutually exclusive: no combination of choice alternatives (e.g. different brands, combination of different transportation modes)
  - Choice set must be exhaustive, i.e., include all relevant alternatives
  - Finite number of alternatives

A. Properties of DCM

Kenneth Train

- Random utility maximization
  Assumption: the decision maker selects the alternative that provides the highest utility, i.e. selects i if $U_{ni} > U_{nj} \;\; \forall j \neq i$

  Decomposition of utility into a deterministic (observed) and a random (unobserved) part:

  $U_{nj} = V_{nj} + \varepsilon_{nj}$
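The random utility rule can be simulated directly. The sketch below is a minimal illustration under stated assumptions: the error terms are taken to be iid standard Gumbel (the assumption the logit slides make), and the deterministic utilities in `v` and all function names are hypothetical. Under that assumption the simulated choice shares should come close to the logit probabilities.

```python
import math
import random

def gumbel_draw(rng):
    """Inverse-CDF draw from the standard Gumbel (extreme value) distribution."""
    u = rng.random()
    while u == 0.0:  # guard against log(0)
        u = rng.random()
    return -math.log(-math.log(u))

def simulate_shares(v, n_draws, seed=0):
    """Share of draws in which each alternative has the highest U = V + eps."""
    rng = random.Random(seed)
    counts = [0] * len(v)
    for _ in range(n_draws):
        utilities = [vj + gumbel_draw(rng) for vj in v]
        counts[utilities.index(max(utilities))] += 1
    return [c / n_draws for c in counts]

def logit_probs(v):
    """Closed-form logit probabilities exp(V_i) / sum_j exp(V_j)."""
    m = max(v)  # subtract the maximum for numerical stability
    expv = [math.exp(vj - m) for vj in v]
    total = sum(expv)
    return [e / total for e in expv]

v = [1.0, 0.5, 0.0]  # assumed deterministic utilities V_nj
shares = simulate_shares(v, 100_000)
print("simulated:", shares)
print("logit    :", logit_probs(v))
```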

A. Properties of DCM

Kenneth Train

- Identification problems
  - Only differences in utility matter: choice probabilities do not change when a constant is added to each alternative’s utility
  - Implication: some parameters cannot be identified/estimated, namely alternative-specific constants and coefficients of variables that change over decision makers but not over alternatives
  - → Normalization of parameter(s)
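A quick numerical check of the first point, as a sketch only: it assumes logit choice probabilities, and the utilities and the shift of 5.0 are arbitrary illustrative values.

```python
import math

def logit_probs(v):
    """Logit probabilities exp(V_i) / sum_j exp(V_j)."""
    m = max(v)
    expv = [math.exp(vj - m) for vj in v]
    total = sum(expv)
    return [e / total for e in expv]

v = [1.2, 0.4, -0.3]             # hypothetical utilities
shifted = [vj + 5.0 for vj in v]  # same constant added to every alternative

p = logit_probs(v)
p_shifted = logit_probs(shifted)
print(p)
print(p_shifted)
```

Both calls print the same probabilities, which is why one constant has to be fixed (for example at 0 for a reference alternative) before the others can be estimated.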

A. Properties of DCM

Kenneth Train

- Identification problems
  - Overall scale of utility is irrelevant: choice probabilities do not change when the utilities of all alternatives are multiplied by the same factor
  - Implication: coefficients of different models (data sets) are not directly comparable
  - → Normalization (variance of the error terms)

A. Properties of DCM

Kenneth Train

- Aggregation
  Biased estimates when aggregate values of the explanatory variables are used as input

  Consistent estimates can be obtained by sample enumeration:

  - compute the probability/elasticity for each decision maker
  - compute the (weighted) average of these values

Swait and Louvière (1993), Andrews and Currim (2002)
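Because the choice probability is nonlinear in x, the probability evaluated at the average x differs from the average of the individual probabilities. The sketch below illustrates this with a binary logit; the coefficient `beta` and the values in `x` are hypothetical, and equal weights are assumed.

```python
import math

def logit(z):
    """Binary logit probability for index value z."""
    return 1.0 / (1.0 + math.exp(-z))

beta = 1.0                  # assumed coefficient
x = [-3.0, -1.0, 0.0, 4.0]  # hypothetical values, one per decision maker

# Aggregate input: plug the average x into the model (biased).
p_at_mean = logit(beta * sum(x) / len(x))

# Sample enumeration: probability per decision maker, then average.
p_enum = sum(logit(beta * xi) for xi in x) / len(x)

print(p_at_mean, p_enum)
```

With these numbers the average x is 0, so the aggregate approach gives 0.5, while sample enumeration gives roughly 0.45.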

B. Estimation DCM

- Numerical maximization (ML-estimation)
- Simulation-assisted estimation
- Bayesian estimation

(see Train)

B. ML-estimation

- Objective: “find those parameter values most likely to have produced the sample observations” (Judge et al.)
- Likelihood for one observation: $P_n(X, \beta)$
- Likelihood function: $L(\beta) = \prod_n P_n(X, \beta)$
- Log-likelihood: $LL(\beta) = \sum_n \ln P_n(X, \beta)$

B. ML Estimation

Determine the value of $\beta$ for which $LL(\beta)$ reaches its maximum

- First derivative = 0 → no closed-form solution
- Iterative procedure:
  - i. Choose starting values $\beta_0$
  - ii. Determine a new value $\beta_{t+1}$ for which $LL(\beta_{t+1}) > LL(\beta_t)$
  - iii. Repeat step ii until convergence (small change in $LL(\beta)$)

B. ML Estimation

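- Direction and step size $\beta_t \to \beta_{t+1}$?

Based on a second-order Taylor approximation of $LL(\beta_{t+1})$ around $\beta_t$:

$LL(\beta_{t+1}) = LL(\beta_t) + (\beta_{t+1} - \beta_t)' g_t + \tfrac{1}{2} (\beta_{t+1} - \beta_t)' H_t (\beta_{t+1} - \beta_t)$   [1]

with $g_t = \left.\dfrac{\partial LL(\beta)}{\partial \beta}\right|_{\beta_t}$ (gradient) and $H_t = \left.\dfrac{\partial^2 LL(\beta)}{\partial \beta \, \partial \beta'}\right|_{\beta_t}$ (Hessian)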

B. ML Estimation

- Direction and step size $\beta_t \to \beta_{t+1}$?

Optimization of [1] leads to:

$\beta_{t+1} = \beta_t + (-H_t)^{-1} g_t$

Computation of the Hessian may cause problems
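The iteration can be sketched for a binary logit with an intercept and one explanatory variable, where gradient and Hessian have simple closed forms. This is a minimal illustration, not the procedure of any particular package: the toy data, the starting values of 0 and the convergence tolerance are all assumptions.

```python
import math

def prob(b0, b1, x):
    """Binary logit probability P(y=1 | x) with intercept b0 and slope b1."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

def loglik(b0, b1, xs, ys):
    """LL(beta) = sum_n [y ln P + (1 - y) ln(1 - P)]."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = prob(b0, b1, x)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

def newton_step(b0, b1, xs, ys):
    """One update beta_{t+1} = beta_t + (-H_t)^{-1} g_t."""
    g0 = g1 = 0.0      # gradient g_t
    a = b = d = 0.0    # entries of -H_t = [[a, b], [b, d]]
    for x, y in zip(xs, ys):
        p = prob(b0, b1, x)
        g0 += y - p
        g1 += (y - p) * x
        w = p * (1.0 - p)
        a += w
        b += w * x
        d += w * x * x
    det = a * d - b * b  # explicit inverse of the 2x2 matrix -H_t
    return (b0 + (d * g0 - b * g1) / det,
            b1 + (a * g1 - b * g0) / det)

# Hypothetical toy data; outcomes overlap in x, so the ML estimate is finite.
xs = [-2, -1, -1, 0, 0, 1, 1, 2]
ys = [0, 0, 1, 0, 1, 0, 1, 1]

b0, b1 = 0.0, 0.0  # starting values
ll_start = ll_old = loglik(b0, b1, xs, ys)
for _ in range(50):
    b0, b1 = newton_step(b0, b1, xs, ys)
    ll_new = loglik(b0, b1, xs, ys)
    if abs(ll_new - ll_old) < 1e-10:  # convergence: small change in LL
        break
    ll_old = ll_new

print(b0, b1, ll_new)
```

With more parameters the 2x2 inverse would be replaced by a general linear solve, which is where the cost and numerical problems of the Hessian show up.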

B. ML Estimation

Alternative procedures:

- Approximations to the Hessian
- Other procedures, such as steepest-ascent

See e.g. Train, Judge et al.(1985)

B. ML Estimation

Properties ML estimator

Consistency

Asymptotic Normality

Asymptotic Efficiency

See e.g. Greene (ch.17), Judge et al.

B.Diagnostics and Model Selection

- Goodness-of-Fit
  - Joint significance of explanatory variables
    LR-test: $LR = -2\,(LL(0) - LL(\hat\beta))$, with $LR \sim \chi^2(k)$
  - Pseudo $R^2 = 1 - \dfrac{LL(\hat\beta)}{LL(0)}$

B.Diagnostics and Model Selection

- Goodness-of-Fit
  - Akaike Information Criterion: $AIC = \frac{1}{N}\,(-2\,LL(\hat\beta) + 2k)$
  - $CAIC = -2\,LL(\hat\beta) + k\,(\log(N) + 1)$
  - $BIC = \frac{1}{N}\,(-2\,LL(\hat\beta) + k \log(N))$
  - sometimes conflicting results
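The three criteria are easy to compute once the log-likelihood is known. The sketch below uses the formulas exactly as written on this slide; the two models and their fit values are hypothetical numbers chosen to show how the criteria can disagree.

```python
import math

def aic(ll, k, n):
    """AIC = (1/N) * (-2 LL + 2k)."""
    return (-2.0 * ll + 2.0 * k) / n

def caic(ll, k, n):
    """CAIC = -2 LL + k * (log(N) + 1)."""
    return -2.0 * ll + k * (math.log(n) + 1.0)

def bic(ll, k, n):
    """BIC = (1/N) * (-2 LL + k log(N))."""
    return (-2.0 * ll + k * math.log(n)) / n

# Hypothetical fit results for two competing models on N = 500 observations.
n_obs = 500
models = {"small": {"ll": -1000.0, "k": 5},
          "large": {"ll": -990.0, "k": 12}}

def best(criterion):
    """Name of the model with the lowest value of the given criterion."""
    return min(models, key=lambda m: criterion(models[m]["ll"], models[m]["k"], n_obs))

best_by_aic, best_by_bic, best_by_caic = best(aic), best(bic), best(caic)
print(best_by_aic, best_by_bic, best_by_caic)
```

Here AIC prefers the larger model, while BIC and CAIC, which penalize extra parameters more heavily, prefer the smaller one.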

B.Diagnostics and Model Selection

- Model selection based on GoF
  - Nested models: LR-test
    $LR = -2\,(LL(\beta_r) - LL(\beta_{ur}))$
    r = restricted model; ur = unrestricted (full) model
    $LR \sim \chi^2(k)$ (k = difference in number of parameters)
  - Non-nested models: AIC, CAIC, BIC → lowest value

C. Discrete Choice Models

- Binary Logit Model
- Multinomial Logit Model
- Nested logit model
- Probit Model
- Ordered Logit Model

1. Binary Logit Model

- Choice between 2 alternatives
- Often ‘accept/reject’ or ‘yes/no’ decisions
  - E.g. purchase incidence: make a purchase in the category or not
- Dependent variable: $y_n = 1$ if the option is selected, $y_n = 0$ if the option is not selected
- Model: $P(y_n = 1 \mid x_n)$

1. Binary Logit Model

- Based on the general RUM-model
- Ass.: error terms are iid and follow an extreme value or Gumbel distribution

1. Binary Logit Model

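- Based on the general RUM-model

$P_n = \int I[\beta' x_n + \varepsilon_n > 0] \, f(\varepsilon) \, d\varepsilon$

$\quad = \int I[\varepsilon_n > -\beta' x_n] \, f(\varepsilon) \, d\varepsilon$

$\quad = \int_{\varepsilon = -\beta' x_n}^{\infty} f(\varepsilon) \, d\varepsilon$

$\quad = 1 - F(-\beta' x_n)$

$\quad = 1 - \dfrac{1}{1 + \exp(\beta' x_n)}$

$\quad = \dfrac{\exp(\beta' x_n)}{1 + \exp(\beta' x_n)}$

Ass.: error terms are iid and follow an extreme value/Gumbel distribution, so that their difference $\varepsilon_n$ follows a logistic distribution with CDF $F$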

1. Binary Logit Model

- Leads to the following expression for the logit choice probability: $P(y_n = 1 \mid x_n) = \dfrac{\exp(\beta' x_n)}{1 + \exp(\beta' x_n)}$

1. Binary Logit Model

Properties

- Nonlinear effect of explanatory var’s on dependent variable
- Logistic curve with inflection point at P=0.5
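Both properties can be checked numerically. For a single explanatory variable the marginal effect of the binary logit is dP/dx = β P (1 − P), which is largest where P = 0.5. The sketch below evaluates it on a grid; the coefficient `beta` and the grid are illustrative assumptions.

```python
import math

def logistic(z):
    """Logistic CDF, i.e. the binary logit probability for index z."""
    return 1.0 / (1.0 + math.exp(-z))

beta = 0.8                                # assumed coefficient
grid = [i / 10 for i in range(-50, 51)]   # x values from -5 to 5

# Marginal effect dP/dx = beta * P * (1 - P) at every grid point.
effects = [beta * logistic(beta * x) * (1.0 - logistic(beta * x)) for x in grid]

x_star = grid[effects.index(max(effects))]
print("largest effect at x =", x_star, "where P =", logistic(beta * x_star))
print("largest effect      =", max(effects), "(= beta / 4)")
```

The same change in x therefore moves the probability most for decision makers who are close to indifferent and hardly at all for those with probabilities near 0 or 1.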

1. Binary Logit Model

Estimation: ML

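- Likelihood function: $L(\beta) = \prod_n P(y_n = 1 \mid x, \beta)^{y_n} \, (1 - P(y_n = 1 \mid x, \beta))^{1 - y_n}$
- Log-likelihood: $LL(\beta) = \sum_n \left[ y_n \ln P(y_n = 1 \mid x, \beta) + (1 - y_n) \ln(1 - P(y_n = 1 \mid x, \beta)) \right]$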

1. Binary Logit Model

- Forecasting accuracy
  - Predictions: $\hat{y}_n = 1$ if $F(X_n \hat\beta) > c$ (e.g. c = 0.5); $\hat{y}_n = 0$ if $F(X_n \hat\beta) \le c$
  - Compute hit rate = % of correct predictions
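A hit rate computation might look like the sketch below. The predicted probabilities and observed outcomes are made-up values, and `hit_rate` is a hypothetical helper name.

```python
def hit_rate(probs, outcomes, cutoff=0.5):
    """Share of observations whose predicted class matches the observed one."""
    predictions = [1 if p > cutoff else 0 for p in probs]
    hits = sum(1 for pred, y in zip(predictions, outcomes) if pred == y)
    return hits / len(outcomes)

# Hypothetical predicted probabilities F(X_n beta) and observed outcomes y_n.
probs = [0.9, 0.4, 0.7, 0.2, 0.55]
outcomes = [1, 0, 0, 0, 1]

rate = hit_rate(probs, outcomes)
print(rate)
```

Four of the five predictions are correct here; only the third observation (predicted 1, observed 0) is a miss.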

1. Binary Logit Model

Example: Purchase Incidence Model

ptn(inc) = probability that household n engages in a category purchase in the store on purchase occasion t,

Wtn = the utility of the purchase option.

Bucklin and Gupta (1992)

1. Binary Logit Model

Example: Purchase Incidence Model

With

CRn = rate of consumption for household n

INVnt = inventory level for household n, time t

CVnt= category value for household n, time t

Bucklin and Gupta (1992)

1. Binary Logit Model

- Data
- A.C.Nielsen scanner panel data
- 117 weeks: 65 for initialization, 52 for estimation
- 565 households: 300 selected randomly for estimation, remaining hh = holdout sample for validation
- Data set for estimation: 30,966 shopping trips, 2,275 purchases in the category (liquid laundry detergent)
- Estimation limited to the 7 top-selling brands (80% of category purchases), representing 28 brand-size combinations (= level of analysis for the choice model)

Bucklin and Gupta (1992)

1. Binary Logit Model

Goodness-of-Fit

1. Binary Logit Model

Parameter estimates

| Variable | Coefficient | Std. Error | z-Statistic | Prob. |
|---|---|---|---|---|
| C | 0.222121 | 0.668483 | 0.332277 | 0.7397 |
| DISPLHEINZ | 0.573389 | 0.239492 | 2.394186 | 0.0167 |
| DISPLHUNTS | -0.557648 | 0.247440 | -2.253674 | 0.0242 |
| FEATHEINZ | 0.505656 | 0.313898 | 1.610896 | 0.1072 |
| FEATHUNTS | -1.055859 | 0.349108 | -3.024445 | 0.0025 |
| FEATDISPLHEINZ | 0.428319 | 0.438248 | 0.977344 | 0.3284 |
| FEATDISPLHUNTS | -1.843528 | 0.468883 | -3.931748 | 0.0001 |
| PRICEHEINZ | -135.1312 | 10.34643 | -13.06066 | 0.0000 |
| PRICEHUNTS | 222.6957 | 19.06951 | 11.67810 | 0.0000 |

Binary Logit Model (Franses and Paap: www.few.eur.nl/few/people/paap)

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| Mean dependent var | 0.890279 | S.D. dependent var | 0.312598 |
| S.E. of regression | 0.271955 | Akaike info criterion | 0.504027 |
| Sum squared resid | 206.2728 | Schwarz criterion | 0.523123 |
| Log likelihood | -696.1344 | Hannan-Quinn criter. | 0.510921 |
| Restr. log likelihood | -967.918 | Avg. log likelihood | -0.248797 |
| LR statistic (8 df) | 543.5673 | McFadden R-squared | 0.280792 |
| Probability(LR stat) | 0.000000 | | |
| Obs with Dep=0 | 307 | Total obs | 2798 |
| Obs with Dep=1 | 2491 | | |
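Several of the reported statistics follow from the two log-likelihood values and the number of observations, using the formulas from the diagnostics slides. The sketch below recomputes them; k = 9 is an assumption based on counting the constant plus the eight explanatory variables in the coefficient table.

```python
import math

ll_full = -696.1344        # reported log likelihood
ll_restricted = -967.918   # reported restricted log likelihood (constant only)
n_obs = 2798               # reported total number of observations
k = 9                      # assumed: constant + 8 explanatory variables

lr = -2.0 * (ll_restricted - ll_full)           # LR statistic
mcfadden_r2 = 1.0 - ll_full / ll_restricted     # pseudo R-squared
avg_ll = ll_full / n_obs                        # average log likelihood
aic = (-2.0 * ll_full + 2.0 * k) / n_obs        # Akaike info criterion
schwarz = (-2.0 * ll_full + k * math.log(n_obs)) / n_obs  # Schwarz (BIC)

print(round(lr, 4), round(mcfadden_r2, 6), round(avg_ll, 6))
print(round(aic, 6), round(schwarz, 6))
```

The recomputed values agree with the reported output up to rounding of the restricted log-likelihood.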

2. Multinomial Logit Model

- Choice between J>2 categories
- Dependent variable yn = 1, 2, 3, .... J
- Explanatory variables
- Different across individuals, not across categories (standard MNL model)
- Different across (individuals and) categories (conditional MNL model)

- Model: P(yn=j|Xn)

2. Multinomial Logit Model

- Based on the general RUM-model
- Ass.: error terms are iid following an extreme value or Gumbel distribution

2. Multinomial Logit Model

- Identification problem → select a reference category and set its coefficients equal to 0

- Conditional MNL model

2. Multinomial Logit Model

- Interpretation of parameters
- Derivative (marginal effect)
- Cross-effects

(Traditional MNL model, see Franses and Paap, p. 80)

2. Multinomial Logit Model

- Interpretation of parameters
- Overall effect

2. Multinomial Logit Model

- Interpretation of parameters
- Probability-ratio
- Does not depend on the other alternatives!

2. Multinomial Logit Model

- Estimation
  - Alternative estimation procedures:
    - Simulation-assisted estimation (Train, Ch. 10)
    - Bayesian estimation (Train, Ch. 12)

2. Multinomial Logit Model

- Variables
- Ui = constant for brand-size i
- BLhi = loyalty of household h to brand of brand-size i
- LBPhit = 1 if i was last brand purchased, 0 otherwise
- SLhi = loyalty of household h to size of brand-size i
- LSPhit = 1 if i was last size purchased, 0 otherwise
- Priceit = actual shelf price of brand-size i at time t
- Promoit = promotional status of brand-size i at time t

Bucklin and Gupta (1992)

2. Multinomial Logit Model

- Data
- A.C.Nielsen scanner panel data
- 117 weeks: 65 for initialization, 52 for estimation
- 565 households: 300 selected randomly for estimation, remaining hh = holdout sample for validation
- Data set for estimation: 30,966 shopping trips, 2,275 purchases in the category (liquid laundry detergent)
- Estimation limited to the 7 top-selling brands (80% of category purchases), representing 28 brand-size combinations (= level of analysis for the choice model)

Bucklin and Gupta (1992)

2. Multinomial Logit Model

Scale parameter

- Variance of the extreme value distribution = $\pi^2/6$
- If true utility is $U^*_{nj} = \beta^{*\prime} x_{nj} + \varepsilon^*_{nj}$ with $\mathrm{var}(\varepsilon^*_{nj}) = \sigma^2 (\pi^2/6)$, the estimated representative utility $V_{nj} = \beta' x_{nj}$ involves a rescaling of $\beta^*$ → $\beta = \beta^* / \sigma$
- $\beta^*$ and $\sigma$ cannot be estimated separately
- Take into account that the estimated coefficients indicate the variable’s effect relative to the variance of unobserved factors
- Include scale parameters if subsamples in a pooled estimation (may) have different error variances

2. Multinomial Logit Model

Scale parameter in case of pooled estimation of subsamples with different error variance

- For each subsample s, multiply utility by µs, which is estimated simultaneously with β
- Normalization: set µs equal to 1 for one subsample
- Values of µs reflect differences in error variation
  - µs > 1: error variance is smaller in s than in the reference subsample
  - µs < 1: error variance is larger in s than in the reference subsample

Swait and Louviere (1993), Andrews and Currim (2002)

2. Multinomial Logit Model

- Example
  - Data from online experiment, 2 product categories
  - Three different assortments, assigned to different respondent groups
    - Assortment 1: small assortment
    - Assortment 2 = assortment 1 extended with additional brands
    - Assortment 3 = assortment 1 extended with additional types
  - Explanatory variables are the same (household characteristics, marketing mix), with the exception of the constants
  - A scale factor is introduced for assortments 2 and 3 (assortment 1 is the reference, with scale factor = 1)

Breugelmans et al (2005)

2. Multinomial Logit Model

Table 1: Descriptives for each assortment (margarine and cereals)

MARGARINE

| Attribute | Assortment 1 (limited) | Assortment 2 (add new flavors of existing brands) | Assortment 3 (add new brands of existing flavors) |
|---|---|---|---|
| Brand | Common a | Common | Common; add new brands |
| Flavor | Common | Common; add new flavors | Common |
| # alternatives | 11 | 19 | 17 |
| # respondents | 105 | 116 | 100 |
| # purchase occasions | 275 | 279 | 278 |
| # screens needed | < 1 | > 1 | > 1 |

CEREALS

| Attribute | Assortment 1 (limited) | Assortment 2 (add new flavors of existing brands) | Assortment 3 (add new brands of existing flavors) |
|---|---|---|---|
| Brand | Common | Common | Common; add new brands |
| Flavor | Common | Common; add new flavors | Common |
| # alternatives | 21 | 32 | 46 |
| # respondents | 81 | 97 | 87 |
| # purchase occasions | 271 | 261 | 281 |
| # screens needed | > 1 | > 1 | > 1 |

a "Common" refers to attribute levels that are present in all three assortments

Breugelmans et al (2005)

2. Multinomial Logit Model

- MNL-model – Pooled estimation

  $P_{hit,a} = \dfrac{\exp(\mu_a u_{hit,a})}{\sum_{j \in C_{ha}} \exp(\mu_a u_{hjt,a})}$

  - Phit,a = the probability that household h chooses item i at time t, facing assortment a
  - uhit,a = the choice utility of item i for household h facing assortment a = f(household variables, MM-variables)
  - Cha = set of category items available to household h within assortment a
  - µa = Gumbel scale factor

Breugelmans et al, based on Andrews and Currim 2002; Swait and Louvière 1993

2. Multinomial Logit Model

Estimation results

- Goodness-of-Fit
- (average) LL: -0.045 (M), -0.040 (C)
- BIC: 2929 (M), 4763(C)
- CAIC: 2871 (M), 4699 (C)

- Scale factors:
- M: 1.2498 (ass2), 1.2627 (ass3)
- C: 1.0562 (ass2), 0.7573 (ass3)

Breugelmans et al (2005)
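Since multiplying utility by µ is equivalent to dividing the error term by µ, the error variance of a subsample relative to the reference assortment is 1/µ². The sketch below applies this to the reported scale factors; the dictionary layout and names are illustrative assumptions.

```python
# Reported scale factors (assortment 1 is the reference with scale factor 1).
scale_factors = {
    "margarine": {"assortment 2": 1.2498, "assortment 3": 1.2627},
    "cereals": {"assortment 2": 1.0562, "assortment 3": 0.7573},
}

# Error variance relative to the reference assortment: 1 / mu^2.
relative_variance = {
    category: {name: 1.0 / mu ** 2 for name, mu in factors.items()}
    for category, factors in scale_factors.items()
}

for category, values in relative_variance.items():
    for name, ratio in values.items():
        print(f"{category:9s} {name}: {ratio:.3f}")
```

Ratios below 1 (both margarine assortments and cereals assortment 2) indicate less error variance than in assortment 1; the ratio above 1 for cereals assortment 3 indicates more.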

2. Multinomial Logit Model

- Limitations of the MNL model:
- Independence of Irrelevant Alternatives (proportional substitution pattern)
- Order (where relevant) is not taken into account
- Systematic taste variation can be represented, not random taste variation
- No correlation between error terms (iid errors)

2. Multinomial Logit Model

- Independence of irrelevant alternatives
  - Ratio of choice probabilities for 2 alternatives i and j does not depend on other alternatives (see above)
  - Implication: proportional substitution patterns
  - Cf. Blue Bus – Red Bus example
    - T1: Blue bus (P=50%), Car (P=50%)
    - T2: Blue bus (P=33%), Car (P=33%), Red bus (P=33%)
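The bus example can be reproduced with the MNL formula. In the sketch below all alternatives are given the same deterministic utility (an assumption that yields the 50/50 split at T1), and the red bus is treated as an exact copy of the blue bus.

```python
import math

def logit_probs(v):
    """MNL probabilities exp(V_i) / sum_j exp(V_j)."""
    m = max(v)
    expv = [math.exp(vj - m) for vj in v]
    total = sum(expv)
    return [e / total for e in expv]

v_bus = 0.0  # assumed utility of a bus (blue or red)
v_car = 0.0  # assumed utility of the car

p_t1 = logit_probs([v_bus, v_car])          # blue bus, car
p_t2 = logit_probs([v_bus, v_car, v_bus])   # blue bus, car, red bus

ratio_t1 = p_t1[1] / p_t1[0]  # car vs blue bus before the red bus exists
ratio_t2 = p_t2[1] / p_t2[0]  # car vs blue bus after it is added

print(p_t1, p_t2, ratio_t1, ratio_t2)
```

The car-to-blue-bus ratio stays at 1, so the car share falls to one third, although one would expect the two buses to split the original bus share and leave the car near 50%.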

2. Multinomial Logit Model

- Independence of irrelevant alternatives
  New alternatives, or alternatives for which utility has increased, draw proportionally from all other alternatives

- Elasticity of Pni wrt variable xnj
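For a linear-in-parameters MNL, the standard closed forms are a cross-elasticity of −β·x_nj·P_nj (identical for every alternative i ≠ j) and an own-elasticity of β·x_nj·(1 − P_nj). The sketch below compares these expressions with finite-difference elasticities; the price coefficient `beta`, the prices in `x` and the step size `h` are hypothetical.

```python
import math

def logit_probs(v):
    """MNL probabilities exp(V_i) / sum_j exp(V_j)."""
    m = max(v)
    expv = [math.exp(vj - m) for vj in v]
    total = sum(expv)
    return [e / total for e in expv]

beta = -2.0          # assumed price coefficient
x = [1.0, 1.2, 0.9]  # hypothetical prices of three alternatives
j = 1                # alternative whose price is changed
h = 1e-6             # finite-difference step

base = logit_probs([beta * xk for xk in x])
x_bumped = list(x)
x_bumped[j] += h
bumped = logit_probs([beta * xk for xk in x_bumped])

# Numerical elasticity of each P_i with respect to x_j.
numeric = [(bumped[i] - base[i]) / h * x[j] / base[i] for i in range(len(x))]

analytic_cross = -beta * x[j] * base[j]        # same for every i != j
analytic_own = beta * x[j] * (1.0 - base[j])

print(numeric, analytic_cross, analytic_own)
```

The two cross-elasticities coincide, which is the proportional substitution pattern implied by IIA.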

2. Multinomial Logit Model

- Independence of irrelevant alternatives
  Hausman-McFadden specification test

  Basic idea: if a subset of the choice set is truly irrelevant, omitting it should not significantly affect the estimates.

2. Multinomial Logit Model

- Independence of irrelevant alternatives
  Hausman-McFadden specification test

  Procedure:

  - Estimate the logit model twice:
    - a. on the full set of alternatives
    - b. on a subset of alternatives (and the subsample with choices from this set)
  - When IIA is true, the two sets of estimates should not differ significantly

2. Multinomial Logit Model

- Independence of irrelevant alternatives
  Alternative procedure:

  - Estimate the logit model twice:
    - a. on the full set of alternatives
    - b. on a subset of alternatives (and the subsample with choices from this set)
  - Compute LL for subset b with the parameters obtained for set a
  - Compare with LLb: GoF should be similar

2. Multinomial Logit Model

- Solutions to IIA
- Model with attribute-specific constants (intrinsic preferences)
- Nested Logit Model
- Models that allow for correlation among the error terms, such as Probit Models
