Topic 3

zenda
Presentation Transcript


  1. Topic 3 Multinomial Logit Model

  2. Multinomial Logit Model
     • Used to model a categorical dependent variable with more than two categories.
     • Example: a survey of 195 undergraduates at the University of Pennsylvania, conducted to study the effects of parenting styles on altruistic behaviour.

  3. Question of interest: If you found a wallet on the street, would you
     • Keep the wallet and the money (WALLET = 1)
     • Keep the money and return the wallet (WALLET = 2)
     • Return both the wallet and the money (WALLET = 3)
     The distribution of responses was:

  4. Possible explanatory variables are:
     • MALE: 1 = male, 0 = female
     • BUSINESS: 1 = enrolled in business school, 0 = otherwise
     • PUNISH: a variable describing whether the student was physically punished by parents at various ages:
         1 = punished in elementary school but not middle or high school
         2 = punished in elementary and middle school but not high school
         3 = punished at all three levels

  5. • EXPLAIN: “When you were punished, did your parents explain what you did was wrong?” 1 = almost always, 0 = sometimes or never
     Define
         Pi1 = probability that WALLET = 1 for person i
         Pi2 = probability that WALLET = 2 for person i
         Pi3 = probability that WALLET = 3 for person i
     Let Xi be a column vector of explanatory variables for person i (with a leading 1 for the intercept):
         \[ X_i = (1,\ \mathrm{MALE}_i,\ \mathrm{BUSINESS}_i,\ \mathrm{PUNISH}_i,\ \mathrm{EXPLAIN}_i)' \]

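  6. The model is formulated as follows:
         \[ \log\frac{P_{i1}}{P_{i3}} = b_1 X_i, \qquad \log\frac{P_{i2}}{P_{i3}} = b_2 X_i \]
     where b1 and b2 are row vectors of coefficients for equations 1 and 2.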

  7. But note that
         \[ P_{i1} + P_{i2} + P_{i3} = 1 \]

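  8. Solving for these three probabilities, we can write
         \[ P_{i1} = \frac{e^{b_1 X_i}}{1 + e^{b_1 X_i} + e^{b_2 X_i}}, \qquad P_{i2} = \frac{e^{b_2 X_i}}{1 + e^{b_1 X_i} + e^{b_2 X_i}}, \qquad P_{i3} = \frac{1}{1 + e^{b_1 X_i} + e^{b_2 X_i}} \]
     One can immediately verify that the sum of the probabilities is 1.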
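The three-category formulas can be checked numerically. The following is a minimal Python sketch, not part of the original slides; the function name `wallet_probabilities` and all coefficient values are hypothetical and are not estimates from the wallet survey.

```python
import math

def wallet_probabilities(b1, b2, x):
    """Return (P1, P2, P3) for one person under the three-category logit model.

    b1, b2: coefficients of the 1-vs-3 and 2-vs-3 equations.
    x: explanatory variables, with a leading 1 for the intercept.
    """
    eta1 = sum(b * v for b, v in zip(b1, x))  # log(P1 / P3)
    eta2 = sum(b * v for b, v in zip(b2, x))  # log(P2 / P3)
    denom = 1.0 + math.exp(eta1) + math.exp(eta2)
    return math.exp(eta1) / denom, math.exp(eta2) / denom, 1.0 / denom

# Hypothetical coefficients, ordered as (intercept, MALE, BUSINESS, PUNISH, EXPLAIN).
b1 = [-3.0, 1.0, 0.5, 1.0, -1.5]
b2 = [-1.5, 0.5, 0.3, 0.5, -1.0]
x = [1, 1, 0, 2, 1]  # male, not in business school, PUNISH = 2, parents explained

p1, p2, p3 = wallet_probabilities(b1, b2, x)
print(p1, p2, p3, p1 + p2 + p3)  # the last value is 1 up to rounding error
```

Because category 3 is the reference, its numerator is 1, which is what makes the three probabilities sum to 1.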

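  9. DATA WALLET;
       INFILE 'D:\TEACHING\MS4225\WALLET.TXT';
       INPUT WALLET MALE BUSINESS PUNISH EXPLAIN;
     PROC CATMOD DATA=WALLET;
       /* DIRECT: treat these variables as quantitative rather than categorical */
       DIRECT MALE BUSINESS PUNISH EXPLAIN;
       /* NOITER: suppress the iteration history */
       MODEL WALLET = MALE BUSINESS PUNISH EXPLAIN / NOITER;
     RUN;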

  10. Interpretation of results
      • The ANOVA table gives the chi-square statistic for testing the null hypothesis that an explanatory variable has no effect on the outcome variable, e.g. for MALE, H0: b11 = b12 = 0, where b11 is the MALE coefficient in equation 1 and b12 is the MALE coefficient in equation 2.
      • The same table also reports the likelihood ratio statistic.

  11. In PROC CATMOD, the reference category is always the one with the highest value of the dependent variable (so the first equation is a model for 1 vs 3). For example,
      • For the MALE coefficient, exp(1.27) = 3.56: the odds that a male will keep both money and wallet rather than return both are about 3.56 times the odds for a female.

  12. • For PUNISH, exp(1.08) = 2.94: each one-level increase in PUNISH multiplies the odds of keeping both vs returning both by about 3.
      • For EXPLAIN, exp(-1.60) = 0.20: students whose parents explained their punishment had odds that were only about 1/5 of the odds for those whose parents did not explain.
      • The coefficients in the second column have the same sign but are generally smaller. Why?
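The odds ratios quoted for the first equation can be reproduced by exponentiating the coefficients. This short Python sketch uses only the three coefficient values given on the slides; the dictionary and variable names are illustrative.

```python
import math

# Coefficients quoted on the slides for equation 1 (keep both vs return both).
coefficients = {"MALE": 1.27, "PUNISH": 1.08, "EXPLAIN": -1.60}

# exp(b) is the multiplicative change in the odds for a one-unit increase.
odds_ratios = {name: math.exp(b) for name, b in coefficients.items()}
for name, ratio in odds_ratios.items():
    print(f"{name}: exp({coefficients[name]}) = {ratio:.2f}")

# A two-level increase in PUNISH (from 1 to 3) multiplies the odds by exp(2 * 1.08).
punish_two_levels = math.exp(2 * coefficients["PUNISH"])
print(f"PUNISH 1 -> 3: {punish_two_levels:.2f}")
```

Note that exp(-1.60) evaluates to about 0.20, which matches the "one fifth of the odds" reading for EXPLAIN.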

  13. General form of the model
      • Let Pij be the probability that individual i falls into category j, j = 1, 2, …, J. The model is then
            \[ \log\frac{P_{ij}}{P_{iJ}} = b_j X_i, \qquad j = 1, \dots, J-1 \]
        where Xi is a column vector of variables describing individual i and bj is a row vector of coefficients for category j. Note that each category is compared with the highest category J.

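  14. These equations can be solved to yield
            \[ P_{ij} = \frac{e^{b_j X_i}}{1 + \sum_{k=1}^{J-1} e^{b_k X_i}}, \qquad j = 1, \dots, J-1 \]
      and
            \[ P_{iJ} = \frac{1}{1 + \sum_{k=1}^{J-1} e^{b_k X_i}} \]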

  15. After the coefficients are estimated, the logit equation for comparing any two categories j and k of the dependent variable can be obtained from
            \[ \log\frac{P_{ij}}{P_{ik}} = (b_j - b_k) X_i \]
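The general form and the contrast between two non-reference categories can be illustrated with a small Python sketch. The function `category_probabilities`, the choice J = 4 and every number below are hypothetical, chosen only to show that the log odds of category j versus k equal the difference of the two coefficient vectors applied to Xi.

```python
import math

def category_probabilities(coefs, x):
    """Probabilities for J categories, given J-1 coefficient vectors.

    coefs: [b_1, ..., b_{J-1}]; category J is the reference (coefficients 0).
    x: explanatory variables, with a leading 1 for the intercept.
    """
    exps = [math.exp(sum(b * v for b, v in zip(bj, x))) for bj in coefs]
    denom = 1.0 + sum(exps)
    return [e / denom for e in exps] + [1.0 / denom]

# Hypothetical example with J = 4 categories and X = (1, x1, x2).
coefs = [
    [0.2, 0.5, -1.0],   # b_1
    [-0.4, 0.1, 0.3],   # b_2
    [1.0, -0.7, 0.6],   # b_3
]
x = [1.0, 2.0, 0.5]
probs = category_probabilities(coefs, x)

# Log odds of category 1 versus category 2, computed two ways.
contrast = [bj - bk for bj, bk in zip(coefs[0], coefs[1])]
log_odds_from_coefs = sum(c * v for c, v in zip(contrast, x))
log_odds_from_probs = math.log(probs[0] / probs[1])
print(log_odds_from_coefs, log_odds_from_probs)  # the two values agree
```

The denominator cancels in the ratio of any two probabilities, which is why only the two coefficient vectors involved appear in the contrast.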

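  16. Contingency Table Analysis for Multinomial Logit Model
      Consider the following table (cell counts, coded as in the data step on the next slide):

          white  female  belief=1  belief=2  belief=3
            1      1        371        49        74
            1      0        250        45        71
            0      1         64         9        15
            0      0         25         5        13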

  17. DATA afterlif;
        INPUT white female belief freq;
        DATALINES;
      1 1 1 371
      1 1 2 49
      1 1 3 74
      1 0 1 250
      1 0 2 45
      1 0 3 71
      0 1 1 64
      0 1 2 9
      0 1 3 15
      0 0 1 25
      0 0 2 5
      0 0 3 13
      ;
      PROC CATMOD DATA=afterlif;
        WEIGHT freq;
        DIRECT white female;
        MODEL belief = white female / NOITER;
      RUN;

  18. Interpretation of results
      • The ANOVA table indicates a significant gender effect but not a significant race effect.
      • The likelihood ratio statistic indicates that the model fits very well, which simply means there is no evidence of an interaction between race and gender in their effects on belief in an afterlife.
      • Analysis of the maximum likelihood estimates reveals that the gender effect is entirely concentrated in the yes/no contrast.

  19. exp(0.4186) = 1.52
      • The odds that a woman believes in an afterlife (yes vs no) are about 52% higher than the odds for a man.
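As a rough cross-check, the yes/no odds ratio for women versus men can be computed directly from the cell counts in the data step. This Python sketch assumes belief = 1 means "yes" and belief = 3 means "no", which is how the yes/no contrast is read above; it ignores race, so it gives a crude (unadjusted) odds ratio rather than the model estimate.

```python
import math

# Cell counts from the DATALINES block: (white, female, belief) -> freq.
counts = {
    (1, 1, 1): 371, (1, 1, 2): 49, (1, 1, 3): 74,
    (1, 0, 1): 250, (1, 0, 2): 45, (1, 0, 3): 71,
    (0, 1, 1): 64,  (0, 1, 2): 9,  (0, 1, 3): 15,
    (0, 0, 1): 25,  (0, 0, 2): 5,  (0, 0, 3): 13,
}

def total(female, belief):
    """Sum the counts over race for one gender and one belief category."""
    return sum(n for (w, f, b), n in counts.items() if f == female and b == belief)

odds_women = total(1, 1) / total(1, 3)  # yes vs no among women
odds_men = total(0, 1) / total(0, 3)    # yes vs no among men
crude_odds_ratio = odds_women / odds_men
model_odds_ratio = math.exp(0.4186)     # coefficient quoted on the slide

print(f"crude: {crude_odds_ratio:.2f}, model: {model_odds_ratio:.2f}")
```

The crude ratio comes out close to, but not identical to, exp(0.4186), because the fitted model also includes the race variable.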

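  20. Independence from Irrelevant Alternatives (IIA)
      Recall that in the multinomial logit model,
            \[ \log\frac{P_{ij}}{P_{ik}} = (b_j - b_k) X_i \]
      so the log odds of category j versus category k depend only on the coefficients of those two categories, and the existence of the other alternatives is irrelevant for these log odds.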

  21. The IIA assumption may not be realistic in some situations. Consider the following example:
      • Let A and B be two mobile telephone service providers.
      • A offers a lower fixed cost but charges a higher price per minute.
      • B offers a higher fixed cost but charges a lower price per minute.
      • Assume that the odds for an individual are 2 to 1 in favour of provider A, so P(A) = 2/3 and P(B) = 1/3.
      • Suppose a third provider C enters the market and offers exactly the same service as B.

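  22. • If the multinomial logit model holds, then the odds of A versus B will still be 2, because the odds do not depend on the characteristics of other alternatives.
      • But the odds of A versus C will also be 2, since C is identical to B.
      • So
            \[ P(A) = \tfrac{1}{2}, \qquad P(B) = P(C) = \tfrac{1}{4} \]
      • This implies that the odds of A versus an alternative with high fixed cost and low variable cost (B or C) are now equal to 1, although they were 2 before C entered and nothing about that type of service has changed. That is, the IIA assumption breaks down.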
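The arithmetic of the provider example can be written out with exact fractions. In this Python sketch the helper `shares` and the integer "weights" (relative odds, with B as the unit) are illustrative names, not something defined on the slides.

```python
from fractions import Fraction

def shares(weights):
    """Turn relative odds into choice probabilities that sum to 1."""
    total = sum(weights.values())
    return {name: Fraction(w, total) for name, w in weights.items()}

# Before C enters: odds of A versus B are 2 to 1.
before = shares({"A": 2, "B": 1})

# After C enters as an exact copy of B: IIA keeps A:B = 2 and forces A:C = 2.
after = shares({"A": 2, "B": 1, "C": 1})

print(before)  # A = 2/3, B = 1/3
print(after)   # A = 1/2, B = 1/4, C = 1/4

# Odds of A versus "a provider with high fixed cost and low price per minute".
odds_before = before["A"] / before["B"]
odds_after = after["A"] / (after["B"] + after["C"])
print(odds_before, odds_after)  # 2 and 1
```

Under IIA, adding a duplicate of B takes market share from A as well, which is the implausible implication the example is meant to highlight.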

  23. The Hausman-McFadden Test for the IIA assumption
      We can test whether the data are consistent with this property of the model. If so, the multinomial logit model is adequate. If not, we can use another distribution (e.g. multinomial probit) or add more structure to the problem (e.g. a nested logit model).

  24. DATA afterlif;
        INPUT female belief freq;
        DATALINES;
      1 1 371
      1 2 49
      1 3 74
      0 1 250
      0 2 45
      0 3 71
      1 1 64
      1 2 9
      1 3 15
      0 1 25
      0 2 5
      0 3 13
      ;
      PROC CATMOD DATA=afterlif;
        WEIGHT freq;
        DIRECT female;
        /* COVB: print the covariance matrix of the parameter estimates */
        MODEL belief = female / NOITER COVB;
      RUN;

  25. DATA afterlif;
        INPUT female belief freq;
        DATALINES;
      1 1 371
      1 3 74
      0 1 250
      0 3 71
      1 1 64
      1 3 15
      0 1 25
      0 3 13
      ;
      /* Same model with category 2 dropped from the data */
      PROC CATMOD DATA=afterlif;
        WEIGHT freq;
        DIRECT female;
        MODEL belief = female / COVB;
      RUN;
      PROC GENMOD DATA=afterlif;
        FREQ freq;
        MODEL belief = female / D=B COVB;
      RUN;
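The COVB option in the two runs supplies the ingredients of the Hausman-McFadden statistic: the estimates and covariance matrix from the full model, and those from the model fitted after dropping one category. The statistic is (b_r - b_f)' [V_r - V_f]^{-1} (b_r - b_f), compared with a chi-square distribution whose degrees of freedom equal the number of coefficients compared. The Python sketch below handles the two-parameter case (intercept and FEMALE); every number in it is hypothetical and is not SAS output.

```python
def hausman_mcfadden(b_restricted, b_full, v_restricted, v_full):
    """Hausman-McFadden statistic for two parameters (2x2 covariance matrices)."""
    d = [r - f for r, f in zip(b_restricted, b_full)]
    # Difference of the covariance matrices, V_r - V_f.
    m = [[v_restricted[i][j] - v_full[i][j] for j in range(2)] for i in range(2)]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    # Inverse of a 2x2 matrix via the adjugate.
    inv = [[m[1][1] / det, -m[0][1] / det],
           [-m[1][0] / det, m[0][0] / det]]
    # Quadratic form d' inv d.
    return sum(d[i] * inv[i][j] * d[j] for i in range(2) for j in range(2))

# Hypothetical estimates for (intercept, FEMALE) in the 1-vs-3 equation.
b_full = [1.25, 0.42]        # from the run that keeps all three categories
b_restricted = [1.30, 0.45]  # from the run with category 2 removed
v_full = [[0.015, -0.012], [-0.012, 0.024]]
v_restricted = [[0.020, -0.015], [-0.015, 0.030]]

statistic = hausman_mcfadden(b_restricted, b_full, v_restricted, v_full)
critical_value = 5.991  # 5% critical value of chi-square with 2 degrees of freedom
print(round(statistic, 3), statistic > critical_value)
```

With these made-up inputs the statistic is well below the critical value, so IIA would not be rejected. In practice V_r - V_f may fail to be positive definite, in which case the statistic can be negative and is hard to interpret.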
