Models with limited dependent variables

Doctoral Program 2006-2007

Katia Campo



Limited dependent variables

  • Types: discrete dependent variable; continuous dependent variable (truncated, censored)

  • Models: Discrete Choice Models; Truncated/Censored Regression Models; Duration (Hazard) Models

Discrete choice models

  • Choice between different options (j)

    • Single Choice (binary choice models)

      e.g. Buy a product or not, follow higher education or not, ...

    • j=1 (yes/accept) or 0 (no/reject)

    • Multiple Choice (multinomial choice models),

      e.g. cars, stores, transportation modes

    • j = 1 (option 1), 2 (option 2), ..., J (option J)


Truncated/censored regression models

  • Truncated variable:

    observed only beyond a certain threshold level (‘truncation point’)

    e.g. store expenditures, income

  • Censored variables:

    values in a certain range are all transformed to (or reported as) a single value (Greene, p.761)

    e.g. demand (stockouts, unfulfilled demand), hours worked


Duration/Hazard models

  • Time between two events, e.g.

    • Time between two purchases

    • Time until a consumer becomes inactive/cancels a subscription

    • Time until a consumer responds to direct mail/ a questionnaire

    • ...



Overview

  • Part I. Discrete Choice Models

  • Part II. Censored and Truncated Regression Models

  • Part III. Duration Models


Recommended Literature

  • Kenneth Train, Discrete Choice Methods with Simulation, Cambridge University Press, 2003 (Part I)

  • Ph. H. Franses and R. Paap, Quantitative Models in Market Research, Cambridge University Press, 2001 (Parts I-II-III; data: www.few.eur.nl/few/people/paap)

  • D. A. Hensher, J. M. Rose and W. H. Greene, Applied Choice Analysis, Cambridge University Press, 2005 (Part I)



Overview Part I, DCM

  • Properties of DCM

  • Estimation of DCM

  • Types of Discrete Choice Models

    • Binary Logit Model

    • Multinomial Logit Model

    • Nested logit model

    • Probit Model

    • Ordered Logit Model

  • Heterogeneity


Notation

  • n = decision maker

  • i,j = choice options

  • y = decision outcome

  • x = explanatory variables

  • $\beta$ = parameters

  • $\varepsilon$ = error term

  • I[.] = indicator function, equal to 1 if expression within brackets is true, 0 otherwise

    e.g. I[y=j|x] = 1 if j was selected (given x), equal to 0 otherwise


A. Properties of DCM

Kenneth Train

  • Characteristics of the choice set

    • Alternatives must be mutually exclusive

      no combination of choice alternatives

      (e.g. different brands, combination of diff. transportation modes)

    • Choice set must be exhaustive

      i.e., include all relevant alternatives

    • Finite number of alternatives


A. Properties of DCM

Kenneth Train

  • Random utility maximization

    Assumption: the decision maker selects the alternative that provides the highest utility,

    i.e. selects i if $U_{ni} > U_{nj} \;\; \forall j \neq i$

    Decomposition of utility into a deterministic (observed) and a random (unobserved) part:

    $U_{nj} = V_{nj} + \varepsilon_{nj}$
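From this decomposition, the choice probability can be written out step by step. The lines below are a sketch of the standard random-utility argument (as in Train's textbook), not a transcription of the slide; $f(\varepsilon_n)$, the joint density of the error vector, is a symbol introduced here for illustration.

```latex
\begin{aligned}
P_{ni} &= \Pr(U_{ni} > U_{nj} \;\; \forall j \neq i) \\
       &= \Pr(\varepsilon_{nj} - \varepsilon_{ni} < V_{ni} - V_{nj} \;\; \forall j \neq i) \\
       &= \int I[\varepsilon_{nj} - \varepsilon_{ni} < V_{ni} - V_{nj} \;\; \forall j \neq i] \, f(\varepsilon_n) \, d\varepsilon_n
\end{aligned}
```

Different assumptions about $f(\varepsilon_n)$ lead to the different discrete choice models listed in the overview.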


A. Properties of DCM

Kenneth Train

  • Identification problems

    • Only differences in utility matter

      Choice probabilities do not change when a constant is added to each alternative’s utility

    • Implication

      Some parameters cannot be identified/estimated: alternative-specific constants; coefficients of variables that change over decision makers but not over alternatives

      Normalization of parameter(s)
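The statement that only utility differences matter can be checked numerically. The sketch below assumes logit choice probabilities and uses made-up utilities; the helper name `logit_probs` and all numbers are hypothetical.

```python
import math

def logit_probs(utilities):
    """Logit choice probabilities exp(V_j) / sum_l exp(V_l)."""
    m = max(utilities)  # subtracting the max is itself a harmless constant shift
    expv = [math.exp(v - m) for v in utilities]
    total = sum(expv)
    return [e / total for e in expv]

V = [1.0, 0.5, -0.2]             # hypothetical representative utilities
shifted = [v + 3.0 for v in V]   # add the same constant to every alternative

p = logit_probs(V)
p_shifted = logit_probs(shifted)
print(p)
print(p_shifted)                 # same probabilities as p
```

Because the constant cancels, at most J-1 alternative-specific constants can be estimated, which is why one of them is normalized.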


A. Properties of DCM

Kenneth Train

  • Identification problems

    • Overall scale of utility is irrelevant

      Choice probabilities do not change when the utilities of all alternatives are multiplied by the same factor

    • Implication

      Coefficients of different models (data sets) are not directly comparable

      Normalization (variance of the error terms)


A. Properties of DCM

Kenneth Train

  • Aggregation

    Biased estimates when aggregate values of the explanatory variables are used as input

    Consistent estimates can be obtained by sample enumeration:

    - compute the probability/elasticity for each decision maker

    - compute the (weighted) average of these values

Swait and Louviere (1993), Andrews and Currim (2002)
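The difference between plugging in average explanatory variables and sample enumeration can be illustrated with a small example. This is a sketch under assumed values: a binary logit with hypothetical coefficients `beta0`, `beta1` and a made-up sample `x`.

```python
import math

def logit(v):
    """Logistic cdf, i.e. the binary logit choice probability."""
    return 1.0 / (1.0 + math.exp(-v))

beta0, beta1 = -1.0, 2.0            # hypothetical coefficients
x = [-2.0, -1.0, 0.0, 0.5, 3.0]     # hypothetical x, one value per decision maker
weights = [1.0] * len(x)            # equal weights; replace with sampling weights if needed

# Sample enumeration: probability per decision maker, then (weighted) average
probs = [logit(beta0 + beta1 * xi) for xi in x]
p_enumeration = sum(w * p for w, p in zip(weights, probs)) / sum(weights)

# Shortcut: probability evaluated at the average x
x_mean = sum(x) / len(x)
p_at_mean = logit(beta0 + beta1 * x_mean)

print(p_enumeration, p_at_mean)     # the two values differ because the model is nonlinear
```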


B. Estimation DCM

  • Numerical maximization (ML-estimation)

  • Simulation-assisted estimation

  • Bayesian estimation

(see Train)


B. ML-estimation

  • Objective: “find those parameter values most likely to have produced the sample observations” (Judge et al.)

  • Likelihood for one observation: $P_n(X, \beta)$

  • Likelihood function

    $L(\beta) = \prod_n P_n(X, \beta)$

  • Log-likelihood

    $LL(\beta) = \sum_n \ln P_n(X, \beta)$


B. ML Estimation

Determine the $\beta$ for which $LL(\beta)$ reaches its maximum

  • First derivative = 0 → no closed-form solution

  • Iterative procedure:

    • Starting values $\beta_0$

    • Determine a new value $\beta_{t+1}$ for which $LL(\beta_{t+1}) > LL(\beta_t)$

    • Repeat the previous step until convergence (small change in $LL(\beta)$)



B. ML Estimation

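- Direction and step size $\beta_t \to \beta_{t+1}$?

Based on a second-order Taylor approximation of $LL(\beta_{t+1})$ around $\beta_t$:

$LL(\beta_{t+1}) = LL(\beta_t) + (\beta_{t+1} - \beta_t)' g_t + \tfrac{1}{2} (\beta_{t+1} - \beta_t)' H_t (\beta_{t+1} - \beta_t)$   [1]

with $g_t = \partial LL(\beta) / \partial \beta$ (gradient) and $H_t = \partial^2 LL(\beta) / \partial \beta \, \partial \beta'$ (Hessian), both evaluated at $\beta_t$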


B. ML Estimation

- Direction and step size $\beta_t \to \beta_{t+1}$?

Optimization of [1] leads to:

$\beta_{t+1} = \beta_t + (-H_t)^{-1} g_t$

→ Computation of the Hessian may cause problems


B. ML Estimation

Alternative procedures:

  • Approximations to the Hessian

  • Other procedures, such as steepest-ascent

See e.g. Train, Judge et al.(1985)
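The iterative procedure with the exact Hessian (Newton-Raphson) can be sketched for a binary logit with a constant and one regressor. This is a minimal illustration in plain Python, not the routine used by any particular package; the data are made up, and convergence is judged here by the size of the step rather than by the change in LL.

```python
import math

def newton_raphson_logit(x, y, tol=1e-10, max_iter=50):
    """ML estimation of P(y=1|x) = 1 / (1 + exp(-(b0 + b1*x))) by Newton-Raphson."""
    b0, b1 = 0.0, 0.0                       # starting values
    for _ in range(max_iter):
        g0 = g1 = 0.0                       # gradient g_t
        h00 = h01 = h11 = 0.0               # Hessian H_t (symmetric 2x2)
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            g0 += yi - p
            g1 += (yi - p) * xi
            w = p * (1.0 - p)
            h00 -= w
            h01 -= w * xi
            h11 -= w * xi * xi
        det = h00 * h11 - h01 * h01
        # H_t^{-1} g_t for a 2x2 matrix, written out explicitly
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (-h01 * g0 + h00 * g1) / det
        b0, b1 = b0 - step0, b1 - step1     # beta_{t+1} = beta_t - H_t^{-1} g_t
        if abs(step0) + abs(step1) < tol:   # stop when the step is negligible
            break
    return b0, b1

# Hypothetical data: the outcome becomes more likely as x increases
x = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
y = [0, 0, 1, 0, 1, 0, 1, 1]
b0_hat, b1_hat = newton_raphson_logit(x, y)
print(b0_hat, b1_hat)
```

Approximations to the Hessian replace the `h00`, `h01`, `h11` terms by quantities that are cheaper or safer to compute; steepest ascent drops them altogether and moves along the gradient.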


B. ML Estimation

Properties ML estimator

Consistency

Asymptotic Normality

Asymptotic Efficiency

See e.g. Greene (ch.17), Judge et al.


B. Diagnostics and Model Selection

  • Goodness-of-Fit

    • Joint significance of explanatory variables

      LR test: $LR = -2\,(LL(0) - LL(\hat\beta))$

      $LR \sim \chi^2(k)$

    • Pseudo $R^2 = 1 - LL(\hat\beta) / LL(0)$


B. Diagnostics and Model Selection

  • Goodness-of-Fit

    • Akaike Information Criterion

      $AIC = \frac{1}{N} (-2\,LL(\hat\beta) + 2k)$

    • $CAIC = -2\,LL(\hat\beta) + k(\log(N) + 1)$

    • $BIC = \frac{1}{N} (-2\,LL(\hat\beta) + k \log(N))$

    • The criteria sometimes give conflicting results


B. Diagnostics and Model Selection

  • Model selection based on GoF

    • Nested models: LR test

      $LR = -2\,(LL(\hat\beta_r) - LL(\hat\beta_{ur}))$

      r = restricted model; ur = unrestricted (full) model

      $LR \sim \chi^2(k)$ (k = difference in number of parameters)

    • Non-nested models

      AIC, CAIC, BIC → lowest value


C. Discrete Choice Models

  • Binary Logit Model

  • Multinomial Logit Model

  • Nested logit model

  • Probit Model

  • Ordered Logit Model


1. Binary Logit Model

  • Choice between 2 alternatives

  • Often ‘accept/reject’ or ‘yes/no’ decisions

    • E.g. Purchase incidence: make a purchase in the category or not

  • Dependent variable: $y_n = 1$ if the option is selected, $y_n = 0$ if the option is not selected

  • Model: $P(y_n = 1 \mid x_n)$


1. Binary Logit Model

  • Based on the general RUM-model

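  • $P_n = \int I[\beta' x_n + \varepsilon_n > 0] \, f(\varepsilon) \, d\varepsilon$

    $= \int I[\varepsilon_n > -\beta' x_n] \, f(\varepsilon) \, d\varepsilon$

    $= \int_{\varepsilon = -\beta' x_n}^{\infty} f(\varepsilon) \, d\varepsilon$

    $= 1 - F(-\beta' x_n)$

    $= 1 - 1 / (1 + \exp(\beta' x_n))$

    $= \exp(\beta' x_n) / (1 + \exp(\beta' x_n))$

    Assumption: the error terms are iid and follow an extreme value (Gumbel) distribution, so that their difference $\varepsilon_n$ follows a logistic distribution with cdf $F$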


1. Binary Logit Model

  • Leads to the following expression for the logit choice probability:

    $P(y_n = 1 \mid x_n) = \exp(\beta' x_n) / (1 + \exp(\beta' x_n))$


1. Binary Logit Model

Properties

  • Nonlinear effect of explanatory var’s on dependent variable

  • Logistic curve with inflection point at P=0.5



1. Binary Logit Model

Effect of explanatory variables?

  • Quasi-elasticity of the choice probability with respect to an explanatory variable
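The effect of a single explanatory variable follows from differentiating the logit probability. The sketch below assumes that the quasi-elasticity is defined as the derivative multiplied by the level of the variable, which is the usual convention; $x_{nk}$ and $\beta_k$ (the k-th variable and its coefficient) are symbols introduced here for illustration.

```latex
\begin{aligned}
P_n &= \frac{\exp(\beta' x_n)}{1 + \exp(\beta' x_n)} \\
\frac{\partial P_n}{\partial x_{nk}} &= \beta_k \, P_n (1 - P_n) \\
\frac{\partial P_n}{\partial x_{nk}} \, x_{nk} &= \beta_k \, x_{nk} \, P_n (1 - P_n)
\end{aligned}
```

The effect is largest at $P_n = 0.5$ and vanishes as $P_n$ approaches 0 or 1, in line with the logistic curve's inflection point.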


1. Binary Logit Model

Effect of explanatory variables?

  • The odds ratio is equal to $P_n / (1 - P_n) = \exp(\beta' x_n)$


1. Binary Logit Model

Estimation: ML

  • Likelihood function

    $L(\beta) = \prod_n P(y_n = 1 \mid x, \beta)^{y_n} \, (1 - P(y_n = 1 \mid x, \beta))^{1 - y_n}$

  • Log-likelihood

    $LL(\beta) = \sum_n \left[ y_n \ln P(y_n = 1 \mid x, \beta) + (1 - y_n) \ln(1 - P(y_n = 1 \mid x, \beta)) \right]$


1. Binary Logit Model

  • Forecasting accuracy

    • Predictions: $\hat y_n = 1$ if $F(X_n \hat\beta) > c$ (e.g. c = 0.5)

      $\hat y_n = 0$ if $F(X_n \hat\beta) \le c$

    • Compute the hit rate = % of correct predictions
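The hit rate calculation can be sketched as follows. The function name `hit_rate` and the fitted probabilities are hypothetical; in practice the probabilities come from the estimated model.

```python
def hit_rate(probabilities, outcomes, cutoff=0.5):
    """Share of observations whose predicted outcome matches the observed one."""
    predictions = [1 if p > cutoff else 0 for p in probabilities]
    hits = sum(1 for y_hat, y in zip(predictions, outcomes) if y_hat == y)
    return hits / len(outcomes)

fitted = [0.9, 0.2, 0.6, 0.4, 0.5]   # hypothetical fitted probabilities F(X_n * beta_hat)
observed = [1, 0, 0, 1, 0]           # hypothetical observed choices
rate = hit_rate(fitted, observed)
print(rate)                          # 3 of the 5 predictions are correct
```

With very unbalanced outcomes, a cutoff other than 0.5 (for example the sample share of ones) may be more informative.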


1. Binary Logit Model

Example: Purchase Incidence Model

ptn(inc) = probability that household n engages in a category purchase in the store on purchase occasion t,

Wtn = the utility of the purchase option.

Bucklin and Gupta (1992)


1. Binary Logit Model

Example: Purchase Incidence Model

With

CRn = rate of consumption for household n

INVnt = inventory level for household n, time t

CVnt= category value for household n, time t

Bucklin and Gupta (1992)


1. Binary Logit Model

  • Data

    • A.C.Nielsen scanner panel data

    • 117 weeks: 65 for initialization, 52 for estimation

    • 565 households: 300 selected randomly for estimation, remaining hh = holdout sample for validation

    • Data set for estimation: 30,966 shopping trips, 2275 purchases in the category (liquid laundry detergent)

    • Estimation limited to the 7 top-selling brands (80% of category purchases), representing 28 brand-size combinations (= level of analysis for the choice model)

Bucklin and Gupta (1992)


1. Binary Logit Model

Goodness-of-Fit


1. Binary Logit Model

Parameter estimates


Binary Logit Model (Franses and Paap: www.few.eur.nl/few/people/paap)

Variable          Coefficient   Std. Error   z-Statistic   Prob.
C                    0.222121     0.668483      0.332277   0.7397
DISPLHEINZ           0.573389     0.239492      2.394186   0.0167
DISPLHUNTS          -0.557648     0.247440     -2.253674   0.0242
FEATHEINZ            0.505656     0.313898      1.610896   0.1072
FEATHUNTS           -1.055859     0.349108     -3.024445   0.0025
FEATDISPLHEINZ       0.428319     0.438248      0.977344   0.3284
FEATDISPLHUNTS      -1.843528     0.468883     -3.931748   0.0001
PRICEHEINZ        -135.1312      10.34643     -13.06066    0.0000
PRICEHUNTS         222.6957      19.06951      11.67810    0.0000

Mean dependent var       0.890279    S.D. dependent var       0.312598
S.E. of regression       0.271955    Akaike info criterion    0.504027
Sum squared resid        206.2728    Schwarz criterion        0.523123
Log likelihood          -696.1344    Hannan-Quinn criter.     0.510921
Restr. log likelihood   -967.918     Avg. log likelihood     -0.248797
LR statistic (8 df)      543.5673    McFadden R-squared       0.280792
Probability(LR stat)     0.000000
Obs with Dep=0           307         Total obs                2798
Obs with Dep=1           2491
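Several of the summary statistics in this output follow directly from the two log-likelihood values, using the goodness-of-fit formulas given earlier. The sketch below recomputes them; the only inputs are numbers copied from the output, and k = 9 assumes the constant plus the eight regressors listed.

```python
import math

# Values copied from the estimation output above
ll_full = -696.1344         # log likelihood of the estimated model
ll_restricted = -967.918    # log likelihood of the constant-only model
n_obs = 2798
k = 9                       # constant + 8 explanatory variables

lr_stat = -2.0 * (ll_restricted - ll_full)            # LR statistic (8 df)
mcfadden_r2 = 1.0 - ll_full / ll_restricted           # pseudo R-squared
aic = (-2.0 * ll_full + 2 * k) / n_obs                # Akaike info criterion
bic = (-2.0 * ll_full + k * math.log(n_obs)) / n_obs  # Schwarz criterion
avg_ll = ll_full / n_obs                              # average log likelihood

print(lr_stat, mcfadden_r2, aic, bic, avg_ll)
```

The recomputed values agree with the reported ones up to rounding, which confirms that the output uses the per-observation (1/N) versions of AIC and BIC.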


2. Multinomial Logit Model

  • Choice between J>2 categories

  • Dependent variable yn = 1, 2, 3, .... J

  • Explanatory variables

    • Different across individuals, not across categories (standard MNL model)

    • Different across (individuals and) categories (conditional MNL model)

  • Model: P(yn=j|Xn)


2. Multinomial Logit Model

  • Based on the general RUM-model

  • Ass.: error terms are iid following an extreme value or Gumbel distribution


2. Multinomial Logit Model

  • Identification problem → select a reference category and set its coefficients equal to 0
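Why a reference category is needed can be seen by computing the standard MNL probabilities $\exp(\beta_j' x_n) / \sum_l \exp(\beta_l' x_n)$ for two parameter sets that differ by a common shift. The sketch below uses made-up values; `mnl_probs` and all numbers are hypothetical.

```python
import math

def mnl_probs(x, betas):
    """Standard MNL: P(y=j|x) = exp(beta_j'x) / sum_l exp(beta_l'x)."""
    v = [sum(b * xi for b, xi in zip(beta_j, x)) for beta_j in betas]
    m = max(v)                                    # for numerical stability
    expv = [math.exp(vj - m) for vj in v]
    total = sum(expv)
    return [e / total for e in expv]

x = [1.0, 0.3]                                    # constant and one individual characteristic
betas = [[0.5, 1.0], [-0.2, 0.4], [0.0, 0.0]]     # J = 3; the last category is the reference

# Adding the same vector to every beta_j leaves all probabilities unchanged,
# so the category-specific coefficients are identified only relative to a reference.
shift = [2.0, -1.0]
betas_shifted = [[b + d for b, d in zip(beta_j, shift)] for beta_j in betas]

p = mnl_probs(x, betas)
p_shifted = mnl_probs(x, betas_shifted)
print(p)
print(p_shifted)
```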


2. Multinomial Logit Model

  • Conditional MNL model


2. Multinomial Logit Model

  • Interpretation of parameters

    • Derivative (marginal effect)

    • Cross-effects

(Traditional MNL model, see Franses and Paap, p. 80)


2. Multinomial Logit Model

  • Interpretation of parameters

    • Overall effect


2. Multinomial Logit Model

  • Interpretation of parameters

    • Probability-ratio

    • Does not depend on the other alternatives!
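The probability ratio can be written out from the MNL choice probabilities. This is a sketch in the notation of the random utility slides, with $V_{ni}$ the representative utility of alternative i for decision maker n:

```latex
\frac{P_{ni}}{P_{nj}}
  = \frac{\exp(V_{ni}) / \sum_{l} \exp(V_{nl})}{\exp(V_{nj}) / \sum_{l} \exp(V_{nl})}
  = \exp(V_{ni} - V_{nj})
```

The common denominator cancels, so neither the presence nor the attributes of any third alternative enter the ratio.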


2. Multinomial Logit Model

  • Estimation

    • ML estimation

(znj=1 if j is selected, 0 otherwise)
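With the indicator z_nj, the MNL log-likelihood is the sum over decision makers and alternatives of z_nj times the log choice probability. The sketch below evaluates it for a conditional MNL with utility $\beta' x_{nj}$; the function name `cmnl_loglik` and the data are hypothetical.

```python
import math

def cmnl_loglik(beta, X, z):
    """LL(beta) = sum_n sum_j z_nj * ln P_nj, with P_nj = exp(beta'x_nj) / sum_l exp(beta'x_nl)."""
    ll = 0.0
    for X_n, z_n in zip(X, z):
        v = [sum(b * xk for b, xk in zip(beta, x_nj)) for x_nj in X_n]
        m = max(v)
        log_denom = m + math.log(sum(math.exp(vj - m) for vj in v))
        ll += sum(z_nj * (v_nj - log_denom) for z_nj, v_nj in zip(z_n, v))
    return ll

# Two decision makers, three alternatives, two attributes per alternative (made-up values)
X = [
    [[1.0, 0.0], [0.5, 1.0], [0.0, 2.0]],
    [[0.2, 1.0], [1.0, 0.0], [0.3, 0.3]],
]
z = [[1, 0, 0], [0, 1, 0]]          # z_nj = 1 for the chosen alternative

ll_zero = cmnl_loglik([0.0, 0.0], X, z)   # all alternatives equally likely
ll_beta = cmnl_loglik([1.0, 0.0], X, z)
print(ll_zero, ll_beta)
```

A maximization routine would search over `beta` for the value that makes this function as large as possible.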


2. Multinomial Logit Model

  • Estimation

    • Alternative estimation procedures

      Simulation-assisted estimation (Train, Ch.10)

      Bayesian estimation (Train, Ch.12)


2. Multinomial Logit Model

  • Example

Bucklin and Gupta (1992)


2. Multinomial Logit Model

  • Variables

    • Ui = constant for brand-size i

    • BLhi = loyalty of household h to brand of brandsize i

    • LBPhit = 1 if i was last brand purchased, 0 otherwise

    • SLhi = loyalty of household h to size of brandsize i

    • LSPhit = 1 if i was last size purchased, 0 otherwise

    • Priceit = actual shelf price of brand-size i at time t

    • Promoit = promotional status of brand-size i at time t

Bucklin and Gupta (1992)


2. Multinomial Logit Model

  • Data

    • A.C.Nielsen scanner panel data

    • 117 weeks: 65 for initialization, 52 for estimation

    • 565 households: 300 selected randomly for estimation, remaining hh = holdout sample for validation

    • Data set for estimation: 30,966 shopping trips, 2275 purchases in the category (liquid laundry detergent)

    • Estimation limited to the 7 top-selling brands (80% of category purchases), representing 28 brand-size combinations (= level of analysis for the choice model)

Bucklin and Gupta (1992)


2. Multinomial Logit Model

Goodness-of-Fit

Bucklin and Gupta (1992)


2. Multinomial Logit Model

Estimation Results

Bucklin and Gupta (1992)


2. Multinomial Logit Model

Scale parameter

  • Variance of the extreme value distribution = $\pi^2 / 6$

  • If true utility is $U^*_{nj} = \beta^{*\prime} x_{nj} + \varepsilon^*_{nj}$ with $\mathrm{var}(\varepsilon^*_{nj}) = \sigma^2 (\pi^2 / 6)$, the estimated representative utility $V_{nj} = \beta' x_{nj}$ involves a rescaling of $\beta^*$: $\beta = \beta^* / \sigma$

  • $\beta^*$ and $\sigma$ cannot be estimated separately

  • Take into account that the estimated coefficients indicate the variable's effect relative to the variance of unobserved factors

  • Include scale parameters if subsamples in a pooled estimation (may) have different error variances


2. Multinomial Logit Model

Scale parameter in case of pooled estimation of subsamples with different error variance

  • For each subsample s, multiply utility by $\mu_s$, which is estimated simultaneously with $\beta$

  • Normalization: set $\mu_s$ equal to 1 for one subsample

  • Values of $\mu_s$ reflect differences in error variation

    • $\mu_s > 1$: error variance is smaller in s than in the reference subsample

    • $\mu_s < 1$: error variance is larger in s than in the reference subsample

Swait and Louviere (1993), Andrews and Currim (2002)
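The role of the scale factor can be illustrated by multiplying the same representative utilities by different values of the scale. This is a sketch with made-up utilities; the function `scaled_logit_probs` is hypothetical.

```python
import math

def scaled_logit_probs(utilities, mu=1.0):
    """Logit probabilities with utilities multiplied by the scale factor mu."""
    v = [mu * u for u in utilities]
    m = max(v)
    expv = [math.exp(vj - m) for vj in v]
    total = sum(expv)
    return [e / total for e in expv]

V = [1.0, 0.0, -0.5]                       # hypothetical representative utilities

p_ref = scaled_logit_probs(V, mu=1.0)      # reference subsample
p_sharp = scaled_logit_probs(V, mu=2.0)    # smaller error variance
p_flat = scaled_logit_probs(V, mu=0.5)     # larger error variance

print(p_ref, p_sharp, p_flat)
```

A larger scale concentrates the probability on the alternative with the highest utility, because the unobserved part matters less; a smaller scale pushes the probabilities toward equal shares.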


2. Multinomial Logit Model

  • Example

  • Data from online experiment, 2 product categories

  • Three different assortments, assigned to different respondent groups

    • Assortment 1: small assortment

    • Assortment 2 = assortment 1 extended with additional brands

    • Assortment 3 = assortment 1 extended with additional types

  • Explanatory variables are the same (household characteristics, marketing mix), with the exception of the constants

  • A scale factor is introduced for assortments 2 and 3 (assortment 1 is the reference, with scale factor = 1)

Breugelmans et al (2005)


2. Multinomial Logit Model

Table 1: Descriptives for each assortment (margarine and cereals)

MARGARINE

Attribute              Assortment 1    Assortment 2              Assortment 3
                       (limited)       (add new flavors of       (add new brands of
                                       existing brands)          existing flavors)
Brand                  Common (a)      Common                    Common + new brands
Flavor                 Common          Common + new flavors      Common
# alternatives         11              19                        17
# respondents          105             116                       100
# purchase occasions   275             279                       278
# screens needed       < 1             > 1                       > 1

CEREALS

Attribute              Assortment 1    Assortment 2              Assortment 3
                       (limited)       (add new flavors of       (add new brands of
                                       existing brands)          existing flavors)
Brand                  Common          Common                    Common + new brands
Flavor                 Common          Common + new flavors      Common
# alternatives         21              32                        46
# respondents          81              97                        87
# purchase occasions   271             261                       281
# screens needed       > 1             > 1                       > 1

(a) Common refers to attribute levels that are present in all three assortments

Breugelmans et al. (2005)


2. Multinomial Logit Model

  • MNL-model – Pooled estimation

  • Phit,a= the probability that household h chooses item i at time t, facing assortment a

  • uhit,a= the choice utility of item i for household h facing assortment a

    = f(household variables, MM-variables)

  • Cha= set of category items available to household h within assortment a

  • µa = Gumbel scale factor

Breugelmans et al., based on Andrews and Currim (2002); Swait and Louviere (1993)
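With these definitions, a pooled MNL model with assortment-specific scale factors would take the following form. This is a sketch assuming the usual scaled-logit specification of Swait and Louviere (1993); the exact expression on the slide is not preserved in the transcript.

```latex
P_{hit,a} = \frac{\exp(\mu_a \, u_{hit,a})}{\sum_{j \in C_{ha}} \exp(\mu_a \, u_{hjt,a})},
\qquad \mu_1 = 1 \ \text{(reference assortment)}
```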


2. Multinomial Logit Model

Estimation results

  • Goodness-of-Fit

    • (average) LL: -0.045 (M), -0.040 (C)

    • BIC: 2929 (M), 4763(C)

    • CAIC: 2871 (M), 4699 (C)

  • Scale factors:

    • M: 1.2498 (ass2), 1.2627 (ass3)

    • C: 1.0562 (ass2), 0.7573 (ass3)

Breugelmans et al (2005)


2. Multinomial Logit Model

(Excluding brand/size constants)

Breugelmans et al (2005)


2. Multinomial Logit Model

  • Limitations of the MNL model:

    • Independence of Irrelevant Alternatives (proportional substitution pattern)

    • Order (where relevant) is not taken into account

    • Systematic taste variation can be represented, not random taste variation

    • No correlation between error terms (iid errors)


2. Multinomial Logit Model

  • Independence of irrelevant alternatives

  • Ratio of choice probabilities for 2 alternatives i and j does not depend on other alternatives (see above)

  • Implication: proportional substitution patterns

  • Cf. Blue Bus – Red Bus Example

    • T1: Blue bus (P=50%), Car (P=50%)

    • T2: Blue bus (P=33%), Car (P=33%), Red bus (P=33%)
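The bus example can be reproduced with logit probabilities. The sketch assumes that the car and both buses have the same representative utility (set to 0 here), which is what the 50/50 split at T1 implies.

```python
import math

def logit_probs(utilities):
    """Logit choice probabilities exp(V_j) / sum_l exp(V_l)."""
    expv = [math.exp(v) for v in utilities]
    total = sum(expv)
    return [e / total for e in expv]

v_bus, v_car = 0.0, 0.0                       # equal utilities, as implied by T1

p_t1 = logit_probs([v_bus, v_car])            # blue bus, car
p_t2 = logit_probs([v_bus, v_car, v_bus])     # blue bus, car, red bus (identical to blue bus)

ratio_t1 = p_t1[0] / p_t1[1]                  # blue bus vs. car before the red bus is added
ratio_t2 = p_t2[0] / p_t2[1]                  # the same ratio afterwards
print(p_t1, p_t2, ratio_t1, ratio_t2)
```

The ratio of blue bus to car stays the same, so the car loses as much share to the red bus as the blue bus does, although one would expect the red bus to draw mainly from the blue bus.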


2. Multinomial Logit Model

  • Independence of irrelevant alternatives

    New alternatives – or alternatives for which utility has increased - draw proportionally from all other alternatives

  • Elasticity of Pni wrt variable xnj
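The proportional substitution pattern shows up in the cross-elasticity. The sketch below assumes a utility that is linear in $x_{nj}$ with coefficient $\beta_x$ (a symbol introduced here for illustration) and considers $i \neq j$:

```latex
E_{i, x_{nj}}
  = \frac{\partial P_{ni}}{\partial x_{nj}} \cdot \frac{x_{nj}}{P_{ni}}
  = -\beta_x \, x_{nj} \, P_{nj}
```

The expression does not depend on i: a change in an attribute of alternative j changes the probabilities of all other alternatives by the same percentage.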


2. Multinomial Logit Model

  • Independence of irrelevant alternatives

    Hausman-McFadden specification test

Basic idea: if a subset of the choice set is truly irrelevant, omitting it should not significantly affect the estimates.


2. Multinomial Logit Model

  • Independence of irrelevant alternatives

    Hausman-McFadden specification test

    Procedure:

    - Estimate the logit model twice:

      a. on the full set of alternatives

      b. on a subset of alternatives (and the subsample with choices from this set)

    - When IIA is true, the two sets of estimates should not differ significantly
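The comparison of the two sets of estimates is usually formalized with the test statistic of Hausman and McFadden. The expression below is a sketch of that standard form; $\hat\beta_a$, $\hat\beta_b$ are the estimates from the full set and the subset (restricted to the coefficients identified in both), and $\hat\Omega_a$, $\hat\Omega_b$ their estimated covariance matrices, all symbols introduced here for illustration.

```latex
q = (\hat\beta_b - \hat\beta_a)' \, [\hat\Omega_b - \hat\Omega_a]^{-1} \, (\hat\beta_b - \hat\beta_a)
\;\sim\; \chi^2(K)
```

Here K is the number of coefficients compared; a large value of q is evidence against IIA.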


2. Multinomial Logit Model

  • Independence of irrelevant alternatives

    Alternative procedure:

    - Estimate the logit model twice:

      a. on the full set of alternatives

      b. on a subset of alternatives (and the subsample with choices from this set)

    - Compute the LL for subset b with the parameters obtained for set a

    - Compare with LLb: GoF should be similar


2. Multinomial Logit Model

  • Solutions to IIA

    • Model with attribute-specific constants (intrinsic preferences)

    • Nested Logit Model

    • Models that allow for correlation among the error terms, such as Probit Models

