
  1. Identifying and Measuring the Effect of Firm Clusters Among Certified Organic Processors and Handlers
     Edward C. Jaenicke, Stephan J. Goetz, Ping-Chao Wu, Carolyn Dimitri (USDA-ERS)

  2. Format/Outline:
     • Formulation of Research Idea
     • Preliminary Analysis: Data and estimation check, and decision to go forward
     • Research Plan: Double check theory, proceed with empirics
     • Focus on Methods: What econometric model should we use?
     • Mistakes Along the Way: Econometrics
     • Turning the Results Into a Paper: Revisit theory and other thorny issues
     • Improving the Paper: Future research (hopefully)

  3. 1. Research Idea Formation
     • Goetz volunteers to fund a grad student for the summer IF we can find a topic that fits the Northeast Regional Center’s program.
     • Jaenicke and Goetz – ongoing projects, available datasets, research interests
     • Goetz: Can you analyze the impact of “clusters” of certified organic handling firms on firm behavior?
     • Jaenicke: I don’t know. (But I’ll check.)

  4. 2. Preliminary Analysis: Data and estimation check, and decision to go forward

  5. 3. Research Plan: Double check theory, proceed with empirics
     Decisions: Theoretical contribution, empirical contribution, or a mixture? Proceed?
     • Prior: Existing literature shows very few studies measuring the impacts of clusters on firm performance
     Lit review:
     • Confirms prior, though some related empirical examples exist
     • More empirical literature on cluster formation (not cluster impact)
     Bottom line:
     • The link between firm agglomeration theory and measurement of agglomeration impact is not well established.
     • Difficult to build on theory.
     • On the other hand, there is an empirical gap.
     • If we could empirically measure the impact of firm clusters (agglomeration), we would have a paper.
     • I.e.: Proceed

  6. 3. (Continued) Preliminary Empirics
     Lots of empirical choices and decisions:
     1. How should firm agglomerations or clusters be defined? Issues to consider?
     • As a continuous variable – e.g., industrial intensity?
     • As a binary variable – e.g., presence of a minimum # of firms within an area?
     • Area?
     • Minimum #?
     Answers:
     • Cluster as binary variable because we have micro (firm-level) data.
     • But stay flexible on the Minimum # and Area (i.e., try both county and zip code)
     2. What variables should be included in a “cluster impact” equation, and how would they be justified? “Impact” on what variable(s)?
     Answers:
     • Draw from available survey data.
     • Plus Census data?
     • Proceed by considering RHS variables as controlling factors
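The binary cluster definition above can be sketched in a few lines. This is only an illustration, not the authors' procedure: the county codes are made up, the function name `cluster_indicator` is hypothetical, and it assumes a firm counts toward its own area's total.

```python
from collections import Counter

def cluster_indicator(firm_areas, min_firms):
    """Return 1 for each firm whose area holds at least `min_firms` firms, else 0.

    Assumption: a firm counts toward its own area's total.
    """
    firms_per_area = Counter(firm_areas)
    return [1 if firms_per_area[area] >= min_firms else 0 for area in firm_areas]

# Hypothetical county codes, one per firm (not from the survey data).
county_of_firm = ["A", "A", "A", "B", "B", "C", "A", "B"]

# Stay flexible on the minimum number n: build one indicator per candidate n.
indicators = {n: cluster_indicator(county_of_firm, n) for n in (2, 3, 4)}
for n, values in indicators.items():
    print(n, values)
```

Swapping the county codes for zip codes gives the alternative area definition without changing the function.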

  7. 3. (Continued) Preliminary Econometric Model
     (1) $y_{ji} = \delta C_{n,i} + x_i \beta + \varepsilon_i$,
     • $y_{ji}$ is a firm-level decision of type j
     • $C_{n,i}$ is a binary variable, where n is the minimum # of firms in a cluster.
     • $x_i$ is the vector of controlling factors
     • $\delta$ and $\beta$ are parameters: $\delta$ is our main interest

  8. 3. (Continued) Preliminary Empirics
     Time to go to the data and try a preliminary estimation
     • OLS:
       LHS ($y_j$) = Total Gross Sales
       RHS = Cluster variable ($C_{n,i}$) plus survey variables ($x_i$) such as dummy variables for types of firm, years in business, etc. More on these variables later.
     • Results: not so good (See word doc 1)

  9. 3. (Continued) Preliminary Empirics
     Time to go to the data and try a preliminary estimation
     • OLS:
       LHS ($y_j$) = Total Gross Sales
       RHS = Cluster variable ($C_{n,i}$) plus survey variables ($x_i$) such as dummy variables for types of firm, years in business, etc. More on these variables later.
     • Results: not so good (See word doc 1)
     Change in plans: Because existing literature focuses on (endogenous) cluster formation, account for endogeneity by adding a cluster formation equation to the system
     • 3SLS:
       LH1 = Total Sales
       RH1 = Cluster variable plus survey variables
       LH2 = Cluster variable
       RH2 = Cluster variable plus more survey variables

  10. 3. Revised Econometric Model
     (1) $y_{ji} = \delta C_{n,i} + x_i \beta_1 + \varepsilon_{1i}$,
     (2) $C_{n,i} = z_i \beta_2 + \varepsilon_{2i}$,
     • $y_{ji}$ is a firm-level decision of type j
     • $C_{n,i}$ is a binary variable, where n is the minimum # of firms in a cluster.
     • $x_i$ is the vector of controlling factors in the cluster impact equation
     • $z_i$ is the vector of controlling factors in the cluster formation equation
     • $\delta$, $\beta_1$, and $\beta_2$ are parameters: $\delta$ is still our main interest
     • We also added county-level demographics to x and z:
       • # of farms, land values, education-college, nonfarm per-capita income, population
     • Results: Seem better! (See word doc 1 again)

  11. 3. Preliminary Empirics: 3SLS Results Looks good: Two “stories” emerge. But something’s wrong: What is it?

  12. 3. Two Stories, and a Mistake
     Two stories:
     1. In many cases, the cluster variable has a significant impact on firm performance or firm decisions.
     2. Varying n (the minimum number of firms that defines a cluster) appears to change the impact.
     What’s wrong? We need to account for the endogeneity of $C_n$, so yes, we need a system.
     • But not 3SLS. Why not?
     • Because $C_n$ is binary and Three Stage Least Squares would yield biased estimates.
     • Equation (2) needs to be a binary-choice model (e.g., logit or probit)

  13. 4. Refocus on Methods: What econometric model should we use?
     Back to the drawing board: revisit key features
     • $C_n$ is binary and endogenous – seems relatively simple
     • Are the two error terms contemporaneously correlated?
     • Worst case, construct a likelihood function for (1) and (2), and program it manually.
     • Turns out there is a fairly large literature on this econometric model
     • Estimation of “Treatment Effects” (from labor economics)
     Thanks to Ping-Chao for first suggesting it.

  14. 4. Treatment Effects
     Agronomic example:
     • Suppose we wanted to estimate the effect that a new pesticide (Pesticide X) had on corn yields
     • How would you design an agronomic experiment to measure this effect?
       • Some sort of randomized plots, control the inputs, measure the output (yield): treatment effect measured by analysis of variance
     • Instead of experimental data, suppose you had observed data on inputs and yields from some farmers who used Pesticide X, and some who didn’t. Would you just estimate a regression with Pesticide X use on the right-hand side?
     • Under what scenario would that method be accurate?
       • If the pesticide company distributed Pesticide X to farmers randomly (as in an experimental trial), then this method would be fine.
       • But, more likely, farmers self-select according to some non-random criteria (such as “prior success with new chemical use” or something similar)
     • Therefore, we must account for selection bias.
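A small simulation can make the pesticide example concrete. Everything in it is an assumption chosen for illustration (a true yield gain of 5, an unobserved "skill" term that raises both yield and the chance of adopting Pesticide X); it is not based on any data from the presentation. It compares the treated-minus-untreated gap in mean yields under random assignment and under self-selection.

```python
import random
from statistics import mean

TRUE_EFFECT = 5.0  # hypothetical yield gain from Pesticide X

def simulate(self_selected, n=20000, seed=0):
    """Return the treated-minus-untreated difference in mean yield."""
    rng = random.Random(seed)
    treated_yields, untreated_yields = [], []
    for _ in range(n):
        skill = rng.gauss(0, 1)  # unobserved farmer ability (assumed)
        if self_selected:
            # Abler farmers are more likely to adopt the pesticide.
            treated = skill + rng.gauss(0, 1) > 0
        else:
            # Random assignment, as in an experimental trial.
            treated = rng.random() < 0.5
        y = 100 + TRUE_EFFECT * treated + 10 * skill + rng.gauss(0, 5)
        (treated_yields if treated else untreated_yields).append(y)
    return mean(treated_yields) - mean(untreated_yields)

randomized_gap = simulate(self_selected=False)
self_selected_gap = simulate(self_selected=True)
print(f"true effect:       {TRUE_EFFECT:.2f}")
print(f"randomized gap:    {randomized_gap:.2f}")
print(f"self-selected gap: {self_selected_gap:.2f}")
```

Under random assignment the gap should land near the true effect; under self-selection it also picks up the skill difference between adopters and non-adopters, which is the selection bias the slide refers to.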

  15. 4. Econometric model: Treatment Effects and Evaluation
     A good example is found in the Stata manual. Topic: women’s labor market. Similar econometric structure:
     (1) $y_i = \delta C_i + x_i \beta_1 + \varepsilon_{1i}$,
     (2) $C_i = z_i \beta_2 + \varepsilon_{2i}$,
     where $\varepsilon_{1i}$ and $\varepsilon_{2i}$ have covariance matrix:
     $\begin{pmatrix} \sigma^2 & \rho\sigma \\ \rho\sigma & 1 \end{pmatrix}$
     Now, let
     • $y_i$ = women’s wages
     • C (again binary) = 1 if the woman has a college degree, 0 otherwise.
     Is C endogenous?
     • Probably. Hence estimating (1) alone would be incorrect.
     What if (1) and (2) were estimated jointly? What would $\delta$ signify?
     • Normally it would be the $\Delta y$ due to C switching from 0 to 1.
     • But not in this case. Why not?
     • “Selection bias”

  16. 4. Econometric model: Average Treatment Effect, etc.
     Measured Treatment Effects come in two forms:
     1. Average Treatment Effect: $ATE = E[y_1 \mid x, C = 1] - E[y_0 \mid x, C = 0]$
     2. Average Treatment Effect on the Treated: $ATET = E[y_1 - y_0 \mid C = 1]$
     In both cases, the conditional expectations can introduce selection bias.
     If $\rho > 0$, $\delta$ underestimates ATE.
     If $\rho = 0$, $\delta$ is an unbiased estimate of ATE.
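Why the sign of the error correlation matters can be seen by writing out the two conditional expectations. The following is a sketch under assumptions not spelled out on the slide: the errors of equations (1) and (2) are bivariate normal with correlation $\rho$, the error of (1) has standard deviation $\sigma$, the error of (2) has unit variance (a probit for C), $\delta$ is the coefficient on C, and $\phi$ and $\Phi$ are the standard normal density and distribution function.

```latex
\begin{align*}
E[y_i \mid C_i = 1, x_i, z_i] &= \delta + x_i\beta_1
    + \rho\sigma\,\frac{\phi(z_i\beta_2)}{\Phi(z_i\beta_2)},\\
E[y_i \mid C_i = 0, x_i, z_i] &= x_i\beta_1
    - \rho\sigma\,\frac{\phi(z_i\beta_2)}{1-\Phi(z_i\beta_2)},\\
E[y_i \mid C_i = 1, x_i, z_i] - E[y_i \mid C_i = 0, x_i, z_i]
    &= \delta + \rho\sigma\,\frac{\phi(z_i\beta_2)}{\Phi(z_i\beta_2)\,[1-\Phi(z_i\beta_2)]}.
\end{align*}
```

The last term is positive whenever $\rho > 0$, so the difference in conditional expectations exceeds $\delta$; when $\rho = 0$ it vanishes and the difference equals $\delta$.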

  17. 4. Treatment Effects: Stata Command
     Stata: “TreatReg”. ML estimation of the likelihood function
     Stata language:
         treatreg sales_per_employee x1 x2 x3 x_etc, treat(C = z1 z2 z3 z_etc)
     Key options: ML or two-step procedure
     Key estimation issues:
     • The Treatment Effect model seems appropriate. Will it work/converge?
     • Will $\delta$ be significant? Yes (mostly) means clusters have an impact.
     • Will $\rho$ be significant? Yes means that selection bias plays a role.
     • 13 choices for $y_j$. How will results vary?
     • 8 choices for $C_n$. Again, how will results vary?
     • Which variables should be in the x and z vectors?

  18. 5. Mistakes Along the Way: Re-estimation with TreatReg
     See results (Word document 2):
     • At first glance, our results do not look so hot.
     • If you look very carefully, you’ll see an indication of another mistake on my part.

  19. 5. Mistakes Along the Way: Re-estimation with TreatReg
     • Our excitement over some preliminary success (significant estimates of $\delta$) caused me to be too hasty in choosing RHS variables in x and z.
     • The mistake caused me to revisit the variable choices.
     • See the “raw” survey results (Word doc 3).
     • See the estimation results for one ML estimated model (Word doc 4).

  20. 5. Mistakes Along the Way: New Results (no mistakes?)
     Signs of ML Estimates for $\delta$, with Different Cluster Definitions

  21. 5. Calculating ATE
     • We could use the formula on Slide 16.
     • Or, we could ask Stata to calculate the predicted values for y, conditional on whether the observation was treated (C = 1 → yctrt) or not treated (C = 0 → ycntrt).
     Stata code:
         predict yctrtspe_5, yctrt      // y conditional on treatment
         predict ycntrtspe_5, ycntrt    // y conditional on no treatment
         generate diffspe_5 = yctrtspe_5 - ycntrtspe_5
         summarize diffspe_5
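The same calculation can be sketched outside Stata to show what the two predictions contain. This is a hedged illustration: the parameter values and linear indices below are invented, the function names are hypothetical, and the formulas assume the bivariate normal treatment effects model (error correlation `rho`, outcome error standard deviation `sigma`, probit selection).

```python
import math

def normal_pdf(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2 * math.pi)

def normal_cdf(t):
    return 0.5 * (1 + math.erf(t / math.sqrt(2)))

def mean_difference(xb, zg, delta, rho, sigma):
    """Average of E[y | C = 1] - E[y | C = 0] over observations.

    xb, zg: per-observation linear indices for equations (1) and (2).
    """
    diffs = []
    for xb_i, zg_i in zip(xb, zg):
        # Expected y conditional on treatment (analogue of yctrt).
        y_treated = xb_i + delta + rho * sigma * normal_pdf(zg_i) / normal_cdf(zg_i)
        # Expected y conditional on no treatment (analogue of ycntrt).
        y_untreated = xb_i - rho * sigma * normal_pdf(zg_i) / (1 - normal_cdf(zg_i))
        diffs.append(y_treated - y_untreated)
    return sum(diffs) / len(diffs)

# Hypothetical fitted indices and parameters, for illustration only.
xb = [10.0, 12.0, 9.0]
zg = [-0.5, 0.2, 1.0]
no_selection = mean_difference(xb, zg, delta=2.0, rho=0.0, sigma=3.0)
with_selection = mean_difference(xb, zg, delta=2.0, rho=0.4, sigma=3.0)
print(no_selection, with_selection)
```

With `rho = 0` the averaged difference collapses to `delta`; with `rho > 0` it is larger, which is why the averaged difference and the coefficient on the cluster variable need not agree.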

  22. 5. Results – Average Treatment Effects
     ATE Accounting for Selection Bias, with Different Cluster Definitions

  23. 5. Results – Average Treatment Effects
     ATE Accounting for Selection Bias, with Different Cluster Definitions

  24. 6. Turning the Results Into a Paper: Revisit theory and other thorny issues
     • We believe that the previous table provides the type of results needed for an empirical contribution. (Of course we could be wrong.)
     • So now we need to anticipate the issues that would affect a paper’s “publish-ability”
     Two* big issues (or weak spots)
     1. Leading readers from “theory” on firm agglomeration and clusters to the empirical model.
     • Be forthright in noting that some previous research resembles our approach, while some does not.
     • Focus on econometric model (Treatment Effects) rather than theoretical model
     • Brief mention of how one might try to link the econometric model to theory (e.g., a restricted profit function)
     * A third issue could be additional endogeneity in the RHS of equation (1) or (2). If there’s time, I’ll revisit this issue.

  25. 6. Two big issues (or weak spots)
     2. Providing a rationale for the choice of regressors (especially x and z)
     • Admit it’s ad hoc (don’t try to skip over that)
     • Here, we again try to point to prior research.
     • One paper in particular categorizes regressors in a way that works well for us:
     Categories for C, x and z:
     (i) agglomeration variables (C),
     (ii) urban encroachment and population characteristic variables,
     (iii) input availability variables,
     (iv) firm productivity and specialization variables,
     (v) local economic variables.
     Our Table uses these categories. (See word doc 5)

  26. 7. Improving the Paper: Future research (hopefully)
     Two main refinements (hopefully these can wait for a sequel):
     1. Incorporate spatial econometrics:
     • Does a neighboring firm cluster (in the next county) affect a firm’s performance?
     • Try a spatial lag or a spatial error model.
     2. As an alternative to the Treatment Effects model,
     • Use “Propensity Scores” to get an accurate estimate of the ATET (Average Treatment Effect on the Treated)
     • Propensity scores can match treated against non-treated observations, recreating a quasi-experimental design from secondary data.
     • May work better or worse than Treatment Effect models.
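The propensity score idea can be sketched as one-nearest-neighbor matching. This is a minimal illustration, not the planned analysis: the records are invented, the function name is hypothetical, and the propensity scores are taken as given (in practice they would first be estimated, for example with a logit or probit of C on z).

```python
def atet_nearest_neighbor(records):
    """ATET by 1-nearest-neighbor matching (with replacement) on the score.

    records: list of (propensity_score, treated, outcome) tuples.
    """
    treated = [(p, y) for p, t, y in records if t]
    controls = [(p, y) for p, t, y in records if not t]
    gaps = []
    for p_t, y_t in treated:
        # Closest untreated observation in terms of propensity score.
        _, y_match = min(controls, key=lambda c: abs(c[0] - p_t))
        gaps.append(y_t - y_match)
    return sum(gaps) / len(gaps)

# Hypothetical (score, in_cluster, sales_per_employee) records.
records = [
    (0.80, 1, 14.0), (0.60, 1, 11.0), (0.30, 1, 9.0),
    (0.75, 0, 12.0), (0.55, 0, 10.0), (0.35, 0, 8.5), (0.10, 0, 6.0),
]
atet = atet_nearest_neighbor(records)
print(round(atet, 3))
```

Each treated firm is compared only with the untreated firm most similar in its estimated probability of being in a cluster, which is how matching tries to mimic an experimental comparison.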

  27. 8. Thanks
     Discussion and questions? Critique of format?
