
Multiple Regression Analysis with Qualitative Information


smithr

Presentation Transcript


  1. Multiple Regression Analysis with Qualitative Information • Dummy variables as independent variables • Dummy variable trap • Importance of the "reference group" • Using dummy variables to test for equal means • Dummy variables for multiple categories and ordinal variables • Interaction terms allowing different slopes across groups • Testing for equal coefficients across groups • Dummy variables as dependent variables • Linear probability model • Heteroskedasticity and other issues • Interpretation of coefficients

  2. Dummy variable as independent variable • Dummy variables can be used to represent qualitative information • Examples: gender, race, industry, occupation, year, month, … • Each characteristic can be measured with a set of "dummy variables": 1 if true; 0 if false • Example: a single dummy independent variable in a wage equation • Dummy variable: = 1 if the person is a woman, = 0 if the person is a man • Coefficient on the dummy = the wage gain/loss if the person is a woman rather than a man (holding other things fixed)

  3. Dummy variable as independent variable • Graphical illustration • Alternative interpretation of the coefficient: the difference in mean wage between men and women with the same level of education • The dummy produces an intercept shift

  4. Dummy variable trap • A model that includes an intercept together with both a male and a female dummy cannot be estimated because of perfect collinearity • male + female = 1, so the two dummies are perfectly collinear with the intercept • Infinitely many parameter values yield the same sum of squared errors (SSE), so no unique estimates minimize it • To "fix" the dummy variable trap, omit one of the dummies or the intercept
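
The collinearity behind the trap can be checked numerically. The following is a minimal sketch on made-up data (the six observations and the variable names are assumptions for illustration, not taken from the slides): with an intercept and both dummies the design matrix loses a rank, and dropping one dummy restores full column rank.

```python
import numpy as np

# Hypothetical toy data: female = 1 for women, male = 1 - female.
female = np.array([1, 0, 1, 0, 0, 1])
male = 1 - female
educ = np.array([12, 16, 14, 12, 18, 10])
const = np.ones_like(female)

# Intercept plus both dummies: male + female reproduces the constant column.
X_trap = np.column_stack([const, male, female, educ])
# Dropping the male dummy makes men the reference group.
X_ok = np.column_stack([const, female, educ])

rank_trap = np.linalg.matrix_rank(X_trap)  # 3, although X_trap has 4 columns
rank_ok = np.linalg.matrix_rank(X_ok)      # 3, equal to the number of columns

print("trap:", rank_trap, "of", X_trap.shape[1], "columns")
print("ok:  ", rank_ok, "of", X_ok.shape[1], "columns")
```

Because X_trap is rank deficient, X'X cannot be inverted and the least-squares coefficients are not unique.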

  5. Wage equation as example • Estimated wage equation with intercept shift • Holding education, experience, and tenure fixed, women earn $1.81 less per hour than men • What would the coefficients be if the male dummy replaced the female dummy? If the intercept was dropped, but both the male and female dummies were included? • Does the above regression imply that women are discriminated against? • Omitted variables bias • Walmart class action gender discrimination case

  6. Comparing means of subpopulations described by dummies • Not holding other factors constant, women earn $2.51 per hour less than men, i.e. the difference between the mean wage of men and that of women is $2.51 • Simple regression can be used to test whether the difference in means is significant • The wage difference between men and women is larger if no other things are controlled for • Part of the difference is due to differences in education, experience, and tenure between men and women • -2.51 without controls vs. -1.81 with controls
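
The link between a simple regression on a dummy and a difference in means can be illustrated with a short sketch. The eight wage observations below are invented for the example (they are not the data behind the $2.51 figure), and the standard error is the conventional one that assumes a constant error variance.

```python
import numpy as np

# Hypothetical hourly wages and a female dummy.
wage = np.array([10.0, 12.0, 9.0, 15.0, 11.0, 8.0, 14.0, 13.0])
female = np.array([1, 0, 1, 0, 1, 1, 0, 0])

# Simple regression of wage on an intercept and the dummy.
X = np.column_stack([np.ones_like(wage), female])
beta, *_ = np.linalg.lstsq(X, wage, rcond=None)

# Raw difference in group means (women minus men).
diff = wage[female == 1].mean() - wage[female == 0].mean()

# Conventional t-statistic on the dummy coefficient.
resid = wage - X @ beta
sigma2 = resid @ resid / (len(wage) - X.shape[1])
cov = sigma2 * np.linalg.inv(X.T @ X)
t_stat = beta[1] / np.sqrt(cov[1, 1])

print("intercept (mean wage of men):", beta[0])
print("dummy coefficient:", beta[1], "difference in means:", diff)
print("t-statistic:", t_stat)
```

The intercept is the mean wage of the reference group (men) and the dummy coefficient equals the difference in means, so its t-statistic tests whether the two means are equal.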

  7. Dummy variables for treatment effects • Effects of training grants on hours of training • Dependent variable: hours of training per employee • Regressor of interest: dummy variable indicating whether the firm received a training grant • This is an example of program evaluation • Treatment group (= grant receivers) vs. control group (= no grant) • Is the effect of treatment on the outcome of interest causal?

  8. Dummy variables in log regressions • Using dummy explanatory variables in equations for log(y) • Example regressor: dummy indicating whether the house is of colonial style • As the dummy for colonial style changes from 0 to 1, the house price increases by approximately 5.4 percent
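
In a log(y) equation, 100 times the dummy coefficient is only an approximation to the percentage change; the exact change is 100·(exp(coefficient) − 1). The sketch below assumes the slide's 5.4 corresponds to a coefficient of 0.054 on the colonial dummy, which is an inference from the slide rather than a number shown on it.

```python
import math

# Assumed coefficient on the colonial dummy in a log(price) equation.
coef_colonial = 0.054

approx_pct = 100 * coef_colonial                  # usual approximation
exact_pct = 100 * (math.exp(coef_colonial) - 1)   # exact percentage change

print(f"approximate: {approx_pct:.2f}%  exact: {exact_pct:.2f}%")
```

For coefficients this small the two numbers are close; the gap widens as the coefficient grows in absolute value.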

  9. Dummy variables for multiple categories • Define membership in each category by a dummy variable • Leave out one category (which becomes the base category or reference group) • Could leave out the intercept instead • How would the coefficients change if marrmale were made the reference group? • What hypotheses do the t-statistics on the dummies test?

  10. Incorporating ordinal information using dummy variables • Example: city credit ratings and municipal bond interest rates • Variables: municipal bond rate; credit rating from 0 to 4 (0 = worst, 4 = best) • A specification that enters the credit rating directly would probably not be appropriate, as the credit rating only contains ordinal information • A better way to incorporate this information is to define dummies • Other examples: education groups, age groups, monthly or seasonal effects
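
One way to build the rating dummies is sketched below. The ten ratings and bond rates are invented for illustration, and rating 0 is chosen as the omitted reference group; neither choice comes from the slides.

```python
import numpy as np

# Hypothetical credit ratings (0 = worst, 4 = best) and municipal bond rates.
rating = np.array([0, 1, 2, 3, 4, 2, 1, 4, 3, 0])
mbr = np.array([5.0, 4.6, 4.5, 4.0, 3.2, 4.3, 4.8, 3.4, 3.8, 5.2])

# One dummy for each of the ratings 1..4; rating 0 is the reference group.
dummies = np.column_stack([(rating == k).astype(float) for k in range(1, 5)])
X = np.column_stack([np.ones(len(rating)), dummies])

beta, *_ = np.linalg.lstsq(X, mbr, rcond=None)
steps = np.diff(np.concatenate([[0.0], beta[1:]]))  # change from one rating to the next

print("mean rate at rating 0:", beta[0])
print("differences relative to rating 0:", beta[1:])
print("step between adjacent ratings:", steps)
```

Each dummy coefficient is the difference in mean bond rate relative to rating 0. Entering the rating as a single regressor would force all steps between adjacent ratings to be equal, which the dummy specification does not impose.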

  11. Interactions involving dummy variables • Interactions with dummies allow different slopes across groups • Example: a wage equation with a female dummy, education, and their interaction term, which determines the intercept and slope for men and the intercept and slope for women • Interesting hypotheses: the return to education is the same for men and women; the whole wage equation is the same for men and women
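
The slide's equation did not survive in the transcript. A standard way to write such a model is sketched below; the symbols and the use of log(wage) as the dependent variable are assumptions, chosen to match the labels on the slide (interaction term, intercept and slope for each group).

```latex
\begin{align*}
\log(\text{wage}) &= \beta_0 + \delta_0\,\text{female} + \beta_1\,\text{educ}
                     + \delta_1\,\text{female}\cdot\text{educ} + u \\
\text{men (female = 0):}\quad   & \text{intercept } \beta_0,\quad \text{slope } \beta_1 \\
\text{women (female = 1):}\quad & \text{intercept } \beta_0 + \delta_0,\quad \text{slope } \beta_1 + \delta_1 \\
H_0:\ \delta_1 = 0 \quad & \text{(same return to education for men and women)} \\
H_0:\ \delta_0 = 0,\ \delta_1 = 0 \quad & \text{(same wage equation for men and women)}
\end{align*}
```

The first hypothesis is a t-test on a single coefficient; the second is a joint hypothesis and calls for an F-test.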

  12. Interactions involving dummy variables • Graphical illustration • Interacting both the intercept and the slope with the female dummy enables one to model completely independent wage equations for men and women

  13. Interactions involving dummy variables • Estimated wage equation with interaction term • Does this mean that there is no significant evidence of lower pay for women at the same levels of educ, exper, and tenure? No: this is only the effect for educ = 0. To answer the question one has to recenter the interaction term, e.g. around educ = 12.5 (= average education) • No evidence against the hypothesis that the return to education is the same for men and women
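
The effect of recentering can be seen in a small simulation. This is a sketch on simulated data with made-up coefficients, not a reproduction of the slide's estimates: replacing female·educ by female·(educ − 12.5) leaves the interaction coefficient unchanged and turns the coefficient on female into the gap at educ = 12.5.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
female = rng.integers(0, 2, size=n).astype(float)
educ = rng.integers(8, 19, size=n).astype(float)
# Hypothetical data-generating process for log(wage).
log_wage = (0.4 - 0.2 * female + 0.08 * educ
            - 0.01 * female * educ + rng.normal(0, 0.3, size=n))

def ols(X, y):
    """Least-squares coefficients."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

ones = np.ones(n)
center = 12.5

# Uncentered interaction: the female coefficient is the gap at educ = 0.
b_raw = ols(np.column_stack([ones, female, educ, female * educ]), log_wage)
# Centered interaction: the female coefficient is the gap at educ = 12.5.
b_ctr = ols(np.column_stack([ones, female, educ, female * (educ - center)]), log_wage)

gap_at_center = b_raw[1] + center * b_raw[3]
print("gap at educ = 0:    ", b_raw[1])
print("gap at educ = 12.5: ", b_ctr[1], "(from uncentered fit:", gap_at_center, ")")
```

The two fits are the same model written differently, so the fitted values agree; only the meaning (and the standard error) of the female coefficient changes.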

  14. Testing for differences in regression functions across groups • Variables: college grade point average; standardized aptitude test score; high school rank percentile; total hours spent in college courses • Unrestricted model (contains full set of interactions) • Restricted model (same regression for both groups) • F-test for equal regressions: how many degrees of freedom in the numerator? In the denominator?

  15. Testing for differences in regression functions across groups • Null hypothesis: all interaction effects are zero, i.e. the same regression coefficients apply to men and women • Estimation of the unrestricted model • Tested individually, the hypothesis that the interaction effects are zero cannot be rejected

  16. Multiple Regression Analysis with Qualitative Information • Joint test with F-statistic: the null hypothesis is rejected • Alternative way to compute the F-statistic in the given case • Run separate regressions for men and for women; the unrestricted SSR is given by the sum of the SSRs of these two regressions • Run the regression for the restricted model and store its SSR • If the test is computed in this way it is called the Chow test • Important: the test assumes a constant error variance across groups
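
The Chow computation described above can be sketched as follows. The data are simulated with k = 2 regressors and made-up coefficients; with k regressors the numerator has k + 1 degrees of freedom (intercept plus slopes) and the denominator n − 2(k + 1). The sketch also checks that the sum of the two group SSRs equals the SSR of the fully interacted model.

```python
import numpy as np

def ssr(X, y):
    """Sum of squared residuals from a least-squares fit of y on X."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    return float(resid @ resid)

rng = np.random.default_rng(1)
n, k = 300, 2
female = rng.integers(0, 2, size=n).astype(bool)
x = rng.normal(size=(n, k))
# Hypothetical outcome with an intercept shift for women.
y = 1.0 + x @ np.array([0.5, -0.3]) + 0.4 * female + rng.normal(size=n)
X = np.column_stack([np.ones(n), x])

# Restricted model: one regression for everyone.
ssr_restricted = ssr(X, y)
# Unrestricted model: separate regressions for women and men.
ssr_unrestricted = ssr(X[female], y[female]) + ssr(X[~female], y[~female])

q = k + 1                  # restrictions: intercept and k slopes
df_denom = n - 2 * (k + 1)
chow_f = ((ssr_restricted - ssr_unrestricted) / q) / (ssr_unrestricted / df_denom)

# Same unrestricted SSR from a single regression with a full set of interactions.
X_full = np.column_stack([X, X * female[:, None]])
ssr_full = ssr(X_full, y)

print("Chow F:", chow_f, "with df", q, "and", df_denom)
```

Under this counting, a model with three regressors, as in the grade point average example, would have 4 numerator degrees of freedom and n − 8 in the denominator.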

  17. The linear probability model • Linear regression when the dependent variable is binary • If the dependent variable only takes on the values 1 and 0, the regression is a linear probability model (LPM) • In the linear probability model, the coefficients describe the effect of the explanatory variables on the probability that y = 1
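
The equations on this slide are missing from the transcript. The usual argument behind the interpretation, written in assumed notation and under the zero conditional mean assumption E(u | x) = 0, runs as follows.

```latex
\begin{align*}
y &= \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k + u, \qquad y \in \{0, 1\} \\
E(y \mid x) &= 1 \cdot P(y = 1 \mid x) + 0 \cdot P(y = 0 \mid x) = P(y = 1 \mid x) \\
P(y = 1 \mid x) &= \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k \\
\beta_j &= \frac{\partial P(y = 1 \mid x)}{\partial x_j}
\end{align*}
```

Each coefficient is therefore the change in the probability that y = 1 when the corresponding regressor increases by one unit, holding the others fixed.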

  18. The linear probability model • Example: labor force participation of married women • Dependent variable: = 1 if in labor force, = 0 otherwise • Non-wife income (in thousand dollars per year) • If the number of kids under six years increases by one, the probability that the woman works falls by 26.2 percentage points • Does not look significant (but is it "exogenous", i.e. is Cov(kids, error) = 0?)

  19. Multiple Regression Analysis with Qualitative Information • Example: labor force participation of married women (cont.) • Graph for nwifeinc = 50, exper = 5, age = 30, kidslt6 = 1, and kidsge6 = 0 • The maximum level of education in the sample is educ = 17; for the given case, this leads to a predicted probability of being in the labor force of about 50% • The predicted probability can be negative, but this is no problem because no woman in the sample has educ < 5

  20. Multiple Regression Analysis with Qualitative Information • Disadvantages of the linear probability model • Predicted probabilities may be larger than one or smaller than zero • Marginal probability effects are sometimes logically impossible • The linear probability model is necessarily heteroskedastic (the error variance is the variance of a Bernoulli variable) • Heteroskedasticity-consistent standard errors need to be computed • Advantages of the linear probability model • Easy estimation and interpretation • Estimated effects and predictions are often reasonably good in practice
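
Since y is Bernoulli given x, Var(y | x) = P(y = 1 | x)·[1 − P(y = 1 | x)], which changes with x unless all slopes are zero. The sketch below fits an LPM on simulated data (the data-generating process is made up) and computes both conventional and White-type (HC0) standard errors by hand; it is one simple variant of heteroskedasticity-consistent standard errors, not necessarily the one used in the slides.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
x = rng.uniform(0, 1, size=n)
p = 0.2 + 0.6 * x                           # hypothetical true P(y = 1 | x)
y = (rng.uniform(size=n) < p).astype(float)

# OLS fit of the linear probability model.
X = np.column_stack([np.ones(n), x])
XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
resid = y - X @ beta

# Conventional standard errors (assume a constant error variance).
sigma2 = resid @ resid / (n - X.shape[1])
se_conv = np.sqrt(np.diag(sigma2 * XtX_inv))

# Heteroskedasticity-consistent (HC0) standard errors: sandwich formula.
meat = X.T @ (X * resid[:, None] ** 2)
se_robust = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))

# Error variance implied by the fitted probabilities: p_hat * (1 - p_hat).
fitted = X @ beta
implied_var = fitted * (1 - fitted)

print("coefficients:", beta)
print("conventional se:", se_conv)
print("robust se:      ", se_robust)
print("implied variance ranges from", implied_var.min(), "to", implied_var.max())
```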

  21. Multiple Regression Analysis with Qualitative Information • More on policy analysis and program evaluation • Example: effect of job training grants on worker productivity • Dependent variable: the firm's scrap rate • Grant dummy: = 1 if firm received training grant, = 0 otherwise • No apparent effect of grant on productivity • Treatment group: grant receivers; control group: firms that received no grant • Grants were given on a first-come, first-served basis. This is not the same as giving them out randomly. It might be the case that firms with less productive workers saw an opportunity to improve productivity and applied first.

  22. Multiple Regression Analysis with Qualitative Information • Self-selection into treatment as a source of endogeneity • In the given and in related examples, the treatment status is probably related to other characteristics that also influence the outcome • The reason is that subjects self-select into treatment depending on their individual characteristics and prospects • Experimental evaluation • In experiments, assignment to treatment is random • In this case, causal effects can be inferred using a simple regression • The dummy indicating whether or not there was treatment is unrelated to other factors affecting the outcome
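
The contrast between self-selection and random assignment can be illustrated with a simulation. Everything below is hypothetical: a true treatment effect of 1.0, an unobserved characteristic that raises the outcome, and a selection rule under which units with low values of that characteristic are more likely to take the treatment. The simple-regression estimate is computed as a difference in means, which equals the slope on the treatment dummy.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20_000
true_effect = 1.0
unobserved = rng.normal(size=n)   # characteristic the researcher does not see

def diff_in_means(treated, outcome):
    """Slope of a simple regression of outcome on the treatment dummy."""
    return outcome[treated == 1].mean() - outcome[treated == 0].mean()

def simulate_outcome(treated):
    """Outcome depends on treatment, the unobserved characteristic, and noise."""
    return 2.0 + true_effect * treated + unobserved + rng.normal(size=n)

# Random assignment: treatment is unrelated to the unobserved characteristic.
treat_random = rng.integers(0, 2, size=n)
# Self-selection: units with a low unobserved characteristic opt in more often.
treat_self = (unobserved + rng.normal(size=n) < 0).astype(int)

est_random = diff_in_means(treat_random, simulate_outcome(treat_random))
est_self = diff_in_means(treat_self, simulate_outcome(treat_self))

print("true effect:", true_effect)
print("estimate under random assignment:", est_random)
print("estimate under self-selection:   ", est_self)
```

Under random assignment the estimate is close to the true effect; under this selection rule it is biased downward because the treated group starts from a worse position.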

  23. Multiple Regression Analysis with Qualitative Information • Further example of an endogenous dummy regressor • Are nonwhite customers discriminated against? • Variables: dummy indicating whether the loan was approved; race dummy; credit rating • It is important to control for other characteristics that may be important for loan approval (e.g. profession, unemployment) • Omitting important characteristics that are correlated with the nonwhite dummy will produce spurious evidence for discrimination
