Genevieve Knight and Michael White, July 8 2014

ESRC Research Methods Festival 2014: Using Secondary Analysis to Research Individual Behaviour. On the job training and accounting for endogeneity using BHPS longitudinal data.


Presentation Transcript


Genevieve Knight


The impact of economic conditions on the outcomes of job mobility and the mediating role of training: Sectoral differences in the private returns to in-job training in the 1990s UK recession

  • This working paper has been prepared as part of the outputs of a UK ESRC-funded project [ES/K00476X/1 The 1990’s: sectoral rebalancing, mobility and adaptation – the employment, self-employment and training policy lessons for the current UK recession].

  • Part of the first ESRC Secondary Data Analysis Initiative (SDAI) 2013.


Sectoral differences in the private returns to in-job training in the 1990s

  • Other outputs:

  • A detailed description of the mobility of public sector employees relative to market employees during the 1990s, a period of extensive public sector cuts;

  • A comparative sectoral analysis of movements into non-employee destinations (see ‘The public sector in the 1990s recession: employee exits to non-employee destinations’);

  • An analysis of the earnings and job-satisfaction outcomes consequent upon employee mobility from the public to the market sector and within the public sector (see ‘The 1990s recession – consequences of mobility from and in the Public Sector’).




  • The story

  • The data

  • Causality and the analysis methods

  • The results


1. The story

To inform the response to the large-scale UK public sector job cuts announced during an economic recession, we looked for evidence on what happened last time… UK public sector contraction, and the forced movement of public service employees to the private sector, have happened before, under the John Major government of the 1990s. We can therefore use the experience of that period to gauge the effects of public-to-private job moves now.

[Figure: UK public sector employment]


1. The story

  • We examine returns to in-job training over the period 1991-8, a period when the UK public sector underwent a severe contraction, and which also experienced widespread turbulence as a result of technological and organizational change.

  • We use publicly available longitudinal data from the British Household Panel Survey (BHPS).

  • The effects on earnings are estimated twice (thrice, counting OLS! Why?): by a treatment-effects matching method, by fixed-effects panel regression, and by OLS.

  • THE NEW BIT: we estimate sectoral returns to in-job training.


1. Why?

  • The methods and data are the most difficult part of the puzzle

  • How do we attribute causality? (That is, how do we deal with endogeneity?)

  • How can we explore timing dynamics in the follow-up period? (we use t and t+1 from panel data)


1. The difficulties of predicting the effects of training in an analytical way.

  • Both employers and employees are involved in choices concerning training, and training itself can be regarded as an outcome variable.

  • Selection, including self-selection, into training is an ever-present complication.

  • A further difficulty raised by self-selection is the possibility that training is sought by people of relatively high ability, and that the earnings gains reported in the literature are upwardly biased by ability differences between the trained and non-trained.


1. How to get causality attribution for training?

Through the choice of methods and the design of the analysis:

  • Treatment effect methods (matching)

  • Fixed Effect/regression methods

    Two criteria for inferring a causal relationship:

    covariation (correlation) of the causal (X, explanatory) and outcome (Y, dependent) variables, AND time order: the cause comes before the effect.


1. How to get causality attribution for training?

  • We focus the analysis by defining a training ‘treatment period’ time-point

  • We ensure the X covariates are measured at or before the treatment

  • We measure outcomes after the treatment

  • We include appropriate X


1. Why Treatment effect methods (matching)? What does it mean when we use these methods?

  • How does a matching analysis differ from regression methods (FE/OLS)?

  • Matching focuses on the outcome(s) of an intervention, or ‘treatment’ (here, the receipt of in-job training) that takes place at a particular time for some people but not others.

  • Since it is impossible to observe the same individual being both treated and not treated in the same period, the treated individual is instead ‘matched’ to one or more non-treated individuals whose characteristics and circumstances are so similar that they have virtually the same propensity, or probability, of receiving treatment.

  • Thus the matching method provides between-person comparisons.
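Matching first needs an estimate of each person's propensity, the probability of receiving training given X. The sketch below is only an illustration. It assumes a logistic model with two hypothetical binary covariates (`degree`, `union`) and uses made-up data. The presentation's own propensity model uses the full set of X controls and is not reproduced here, and in practice a statistics package would replace the hand-rolled gradient ascent.

```python
import math


def sigmoid(z):
    """Logistic function mapping a linear index to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def propensity(coefs, x):
    """P(T = 1 | x) under a logistic model; coefs[0] is the intercept."""
    return sigmoid(coefs[0] + sum(b * xj for b, xj in zip(coefs[1:], x)))


def fit_logit(X, t, lr=0.5, n_iter=5000):
    """Fit logistic-regression coefficients by batch gradient ascent on the mean log-likelihood."""
    coefs = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(n_iter):
        grad = [0.0] * len(coefs)
        for xi, ti in zip(X, t):
            err = ti - propensity(coefs, xi)  # this person's score contribution
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        coefs = [b + lr * g / n for b, g in zip(coefs, grad)]
    return coefs


# Hypothetical data: covariates [degree, union]; t = 1 if the person received training
X = [[1, 1], [1, 1], [1, 1], [1, 0], [1, 0],
     [0, 1], [0, 1], [0, 0], [0, 0], [0, 0]]
t = [1, 1, 0, 1, 0, 1, 0, 0, 0, 1]

coefs = fit_logit(X, t)
scores = [propensity(coefs, xi) for xi in X]
```

Trained and untrained people with nearly equal `scores` are the candidates for matching to one another.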


1. Matching - Impact evaluation – the counterfactual

Individual A receives training.

She then earns £280 per week.

She would earn £220 per week if she had received no training. This counterfactual has to be estimated!

Impact on individual A = £280 - £220 = £60
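The counterfactual arithmetic for Individual A can be written out as below, together with its average over trained people (the average treatment effect on the treated). The second person is hypothetical. In a real analysis the counterfactual values are the quantities that have to be estimated.

```python
def individual_impact(earnings_with_training, counterfactual_earnings):
    """Impact for one trained person: observed outcome minus estimated counterfactual."""
    return earnings_with_training - counterfactual_earnings


def average_impact_on_treated(pairs):
    """Average of individual impacts over trained people (the ATT).

    pairs: list of (observed earnings with training, estimated earnings without training).
    """
    return sum(individual_impact(y1, y0) for y1, y0 in pairs) / len(pairs)


print(individual_impact(280, 220))  # 60 (Individual A, pounds per week)
# Hypothetical second trained person whose counterfactual equals observed earnings
print(average_impact_on_treated([(280, 220), (300, 300)]))  # 30.0
```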

1. Matching - Impact evaluation & the counterfactual

Clearly, we do not observe the situation in which training is not received for those who actually do receive training.

Matching impact analysis involves carefully trying to estimate the counterfactual for those who receive training.


1. Why FE regression? What does it mean when we use these methods?

  • FE regression provides estimates that can be interpreted as approximately ‘causal’, especially in removing bias from unobserved constant individual differences such as ability or personality (see Allison 2009).

  • However, the method is not based on a formal causal model in the sense that the method of matching is.

  • See Wooldridge (2002).


1. Why FE regression? What does it mean when we use these methods?

  • Fixed effect (FE) panel regression.

    (AKA ‘within regression’ since estimates reflect within-person variation around her/his mean values rather than comparisons between people).

  • A chief advantage: it accounts for the influence of unobserved personal factors (such as ability or personality) that are constant over time.
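The following example shows why the within transformation helps. It uses a hypothetical, noise-free panel in which a time-constant "ability" term raises both log earnings and the chance of training. Pooled OLS lets between-person differences leak into the training coefficient, whereas subtracting each person's own mean removes the ability term. Persons 3 and 4 never change training status, so they contribute nothing to the FE estimate. The panel is also unbalanced, and the person-specific means handle that directly.

```python
from collections import defaultdict


def demean_by_person(ids, values):
    """Within transformation: subtract each person's own mean (unequal waves per person are fine)."""
    totals, counts = defaultdict(float), defaultdict(int)
    for i, v in zip(ids, values):
        totals[i] += v
        counts[i] += 1
    return [v - totals[i] / counts[i] for i, v in zip(ids, values)]


def pooled_ols_slope(x, y):
    """Slope from a pooled regression of y on x with an intercept (between + within variation)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def fe_slope(ids, x, y):
    """Fixed-effect (within) slope: regress person-demeaned y on person-demeaned x."""
    xd, yd = demean_by_person(ids, x), demean_by_person(ids, y)
    return sum(a * b for a, b in zip(xd, yd)) / sum(a * a for a in xd)


# Hypothetical panel: unobserved ability raises both log earnings and the chance of training
ability = {1: 7.0, 2: 7.5, 3: 8.0, 4: 6.8}
training = {1: [0, 0, 1], 2: [0, 1, 1], 3: [1, 1], 4: [0, 0]}  # unbalanced: 3 or 2 waves
TRUE_EFFECT = 0.05

ids, t, y = [], [], []
for person, history in training.items():
    for trained in history:
        ids.append(person)
        t.append(trained)
        y.append(ability[person] + TRUE_EFFECT * trained)  # no noise, for clarity

print(round(pooled_ols_slope(t, y), 3))  # 0.63: inflated by ability differences
print(round(fe_slope(ids, t, y), 3))     # 0.05: ability is differenced away
```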


1. Why FE regression? Causation: X, Y, Z relationships complicated!




  • Direct causal: X → Y;

  • Indirect causal: X → Z → Y;

  • Spurious: both Z → X and Z → Y;

  • A combination of direct, indirect and spurious relationships.
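The spurious case can be illustrated with made-up numbers in which Z drives both X and Y while X has no effect on Y. Holding Z fixed removes the spurious slope. Here this is done with the Frisch-Waugh-Lovell residual-on-residual regression. It works only because Z is observed, and the difficulty for training analysis is that some Z are not.

```python
def simple_fit(x, y):
    """Intercept and slope from regressing y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - slope * mx, slope


def residualize(z, v):
    """Residuals of v after removing its linear dependence on z."""
    a, b = simple_fit(z, v)
    return [vi - (a + b * zi) for zi, vi in zip(z, v)]


def slope_controlling_for(x, y, z):
    """Frisch-Waugh-Lovell: slope of y on x holding z fixed, via residual-on-residual regression."""
    rx, ry = residualize(z, x), residualize(z, y)
    return sum(a * b for a, b in zip(rx, ry)) / sum(a * a for a in rx)


# Hypothetical spurious case: Z drives both X and Y; X has no effect on Y
z = [1, 2, 3, 4, 5, 6]
x = [zi + e for zi, e in zip(z, [1, -1, -1, 1, 1, -1])]  # Z -> X (plus other variation)
y = [2 * zi for zi in z]                                 # Z -> Y only

print(round(simple_fit(x, y)[1], 3))             # 1.535: naive slope, entirely spurious
print(round(slope_controlling_for(x, y, z), 3))  # 0.0 once Z is held fixed
```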

1. Why FE regression? Why matching? Impact evaluation: the experimental ideal, an example

[Diagram: eligible population → baseline data → allocation at random to the programme group (outcome = Op) or the control group, i.e. the counterfactual (outcome = Oc)]

  • Two groups statistically equivalent at allocation/assignment

  • Random allocation ensures no systematic differences between control and programme groups at assignment/allocation

  • No systematic differences in what we can observe about the two groups and, importantly, what we can’t observe

  • Impact of programme = difference in means or proportions between 2 groups
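Here is a minimal sketch of the experimental ideal, using a hypothetical eligible population of ten people and made-up outcomes. Random allocation splits the population into two groups, and the impact estimate is the difference in means, Op - Oc.

```python
import random


def allocate_at_random(eligible, seed=0):
    """Randomly split the eligible population into programme and control groups of (nearly) equal size."""
    rng = random.Random(seed)
    pool = list(eligible)
    rng.shuffle(pool)
    half = len(pool) // 2
    return pool[:half], pool[half:]


def difference_in_means(outcomes, programme, control):
    """Impact estimate Op - Oc: mean outcome of the programme group minus that of the control group."""
    op = sum(outcomes[i] for i in programme) / len(programme)
    oc = sum(outcomes[i] for i in control) / len(control)
    return op - oc


programme, control = allocate_at_random(range(10), seed=42)
# Hypothetical outcomes: a true programme effect of 20 on top of individual variation
outcomes = {i: 200 + 5 * i + (20 if i in programme else 0) for i in range(10)}
print(difference_in_means(outcomes, programme, control))  # 20 plus any chance imbalance
```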


1. Why FE regression? Why matching?

  • Matching tries to replicate the experimental control design

  • FE uses the panel to control for what we can’t observe about each individual (to the extent that it is constant over time)

  • How well they work in practice with the data is an empirical question….


1. The teaser

  • We find:

    Positive overall effects for some, but with sectoral differences (no effect for the market sector, 7.5% for the public sector) and phasing (timing), with public sector training providing a more persistent gain in protecting earnings when employees change sector or change employment.


1. Final teaser…

  • FE indicates a smaller effect than matching.

  • We suspect FE does a better job of allowing for individual unobserved ability/financial motivation, which no amount of X will get rid of.


2. The data: BHPS

  • The British Household Panel Survey (BHPS)

  • Interviewed respondents at regular annual intervals.

  • The initial sample was representative of the British population in 1990.

  • We identify people who were employees during the 1990s and, among these, the individuals who received in-job training within the period 1991-97 (the training question was discontinued in 1998).


2. The data: Y = earnings

  • Y = natural logarithm of usual monthly earnings, either in the current year of the present job where training is received or in the year following training.

  • Earnings are used rather than the wage, since earnings reflect paid hours worked as well as the wage. Maintaining hours, and hence earnings capability, is an important objective for most job-movers (the scant availability of full-time jobs and the enforced shortening of hours are frequently noted issues in the adverse British conditions post-2008).

  • Usual earnings rather than most recent earnings are used so as to reduce variation for reasons such as absence or exceptional overtime working.
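Because Y is the natural log of earnings, differences in Y are log points. The sketch below shows the conversion using hypothetical earnings figures. For a small gap b, 100*b is close to the percentage difference, and 100*(exp(b) - 1) gives it exactly. The slides do not say which convention their reported percentages use, so this is illustrative only.

```python
import math


def log_earnings(usual_monthly_earnings):
    """Outcome Y: natural logarithm of usual monthly earnings."""
    return math.log(usual_monthly_earnings)


def log_gap_to_percent(b):
    """Exact percentage difference in earnings implied by a gap b in log earnings."""
    return 100.0 * (math.exp(b) - 1.0)


# Hypothetical usual monthly earnings of two people
gap = log_earnings(1075) - log_earnings(1000)
print(round(gap, 4))                      # 0.0723 log points
print(round(log_gap_to_percent(gap), 1))  # 7.5 per cent
```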


2. The data: mobility

  • We use information about employment between two consecutive waves to classify labour mobility:

    1) change in sector, i.e. the mobility between public and market sectors;


2. The data: on the job training

  • The measure captures relatively formal training provision rather than informal provision.

  • The survey asked whether, in the past year, the individual had “taken part in any education or training schemes or courses as part of your present employment”.

  • For our analyses, training is represented as a dummy variable taking value 1 when training has taken place during a one-year period of employment.

  • This variable is defined on the year 1994-5 for the matching analyses, and on the year prior to the outcome year in the panel FE.
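The two training definitions can be sketched on person-year records. Two choices here are assumptions made for illustration and do not reflect the BHPS file structure: the data layout (a dict keyed by hypothetical `(person_id, year)` pairs) and the use of 1994 to stand for the 1994-5 reference year.

```python
def training_last_wave(trained):
    """Lag the training dummy by one wave within each person (for the panel FE).

    trained: dict mapping (person_id, year) -> 1 if trained in that year's reference period, else 0.
    Returns (person_id, year) -> training in year - 1, kept only where year - 1 is observed.
    """
    return {
        (person, year): trained[(person, year - 1)]
        for (person, year) in trained
        if (person, year - 1) in trained
    }


def training_in_year(trained, year=1994):
    """Cross-sectional treatment dummy for one focal year (for matching)."""
    return {person: d for (person, y), d in trained.items() if y == year}


# Hypothetical person-year records
trained = {(1, 1993): 0, (1, 1994): 1, (1, 1995): 0, (2, 1994): 0, (2, 1996): 1}
print(training_last_wave(trained))  # {(1, 1994): 0, (1, 1995): 1}
print(training_in_year(trained))    # {1: 1, 2: 0}
```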


2. The data: X controls

  • educational and professional qualifications,

  • broad occupational level as an indicator of acquired skill (Tahlin 2007),

  • age as a proxy for experience,

  • family structure variables (marital status, employment status of spouse, and age of youngest child), separately specified for men and women (for the FE modeling of interactions involving fixed variables, see Allison 2009).

  • employer variables known to affect wages or training: sector, living in the prosperous London or South-East region, working in a small (less than 50 employees) workplace, and presence of a recognized union.

  • FE: age in quadratic form.

  • Matching: age as a set of five ‘decade’ dummies representing the 20s through to the 60s, with 16-19 as the omitted category.
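The two age codings can be sketched as below; the function names are hypothetical. Ages of 70 and over would also fall into the all-zero category, which is harmless only if, as assumed here, the sample is of working age.

```python
DECADES = (20, 30, 40, 50, 60)


def age_terms_fe(age):
    """FE specification: age and age squared (quadratic form)."""
    return {"age": age, "age_sq": age * age}


def age_dummies_matching(age):
    """Matching specification: one dummy per decade, 20s to 60s; ages 16-19 are the omitted category."""
    return {f"age_{d}s": int(d <= age < d + 10) for d in DECADES}


print(age_dummies_matching(17))  # all zeros: omitted 16-19 category
print(age_dummies_matching(34))  # only age_30s == 1
print(age_terms_fe(34))          # {'age': 34, 'age_sq': 1156}
```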


2. The data: X controls

  • Matching: 16 ‘Card-Sullivan’ variables which summarize profiles of individual advancement over the period 1991-94.

  • Card and Sullivan (1988) originally devised the method as a form of exact matching, but we follow the application of Dolton et al. (2006) who use similar derived variables as matching regressors. It is hoped that this removes bias from unmeasured ability, on the assumption that underlying ability tends to be recognized in patterns of earnings prior to the focal training episode.

  • Matching: a variable for the number of waves observed in employee status, plus its square, over 1991-5. This is intended to correct for variations in recent experience resulting from years out of employment.

  • FE: year dummies to control for movements in economic conditions that are likely to affect all employment.

  • PLUS: to compensate for the impossibility of weighting, we incorporated variables that were used in the original construction of the strata and weights for the survey sample. These included variables representing non-labour income and wealth assets, notably car ownership and home ownership.


3. The methods

We use a combination of

  • (1) treatment effect estimation by the method of matching, and

  • (2) panel data analysis by fixed effect (FE) regression.

    We use matching for the large samples (all, market, public) and FE for the mobility groups, since with FE we can pool across waves.


3. Comparison of the methods?

  • (1) how to contrast matching with the FE analysis?

  • (2) how does a matching analysis differ from regression methods?

3. Comparison of the methods? What does it mean when we use these methods?

  • how does a matching analysis differ from regression methods?

  • We want to test whether T leads to Y (along with correlates X)

  • Some other variables Z that have not been measured could have led to the change in Y (unless we can find a way to measure Z or the holy grail of an instrument for Z)

  • Matching gives a different weighting to the cases than (OLS) regression would, but it still doesn’t solve the problem of Z. Both control only for observed X and rely on the conditional independence assumption (CIA), so both suffer from omitted variable bias…

  • Heckman et al. (1998) ‘Matching as an econometric evaluation estimator’ & ‘Characterizing selection bias using experimental data’

  • Imai and Kim (2011) ‘on the use of linear fixed effects regression estimators for causal inference’

3. Comparison of the methods? What does it mean when we use these methods?

  • Does FE offer much more than OLS/matching for training analysis?

  • A little: we can account for individual ‘ability’, which is important in the training context

  • There appear to be heterogeneous treatment (training) effects by sector, which is important in our context

  • But we still can’t account for time varying confounders (Z)…

That’s not all folks…. 3. Other ‘methods/data’ problems…

Observational data… (sigh!)

Violation of ignorable treatment assignment

[there are unobserved variables related to both treatment assignment (who gets training) and the outcome (wages)]

‘Self-selection’

The only true solution: get better data!

In practice, we have to implement solutions to try to fix up these issues….

3. Other problems to solve in (panel) data: analytical problems (biases)

Attrition: survey ‘drop-outs’ can unbalance the information, leading to ‘selection bias’, and should be accounted for (a form of unit missing data).

Missing data (Y or covariate values): the method of accounting for missing data will affect the results (another bias). For X we use simple mean imputation, with missing-value dummy indicators in the propensity model and in the FE model. For Y, see attrition above.
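The missing-value approach for X can be sketched as follows; the function name and data are hypothetical. Each imputed covariate enters the model together with its missing-value dummy.

```python
def mean_impute_with_flag(values):
    """Simple mean imputation plus a missing-value dummy.

    values: list of floats, with None marking missing entries.
    Returns (filled values, missing indicator), both to be entered as covariates.
    """
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    filled = [mean if v is None else v for v in values]
    missing = [1 if v is None else 0 for v in values]
    return filled, missing


# Hypothetical covariate with one missing value
filled, missing = mean_impute_with_flag([2.0, None, 4.0])
print(filled)   # [2.0, 3.0, 4.0]
print(missing)  # [0, 1, 0]
```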

Choosing the variance estimator for matching: there is still debate about the best way to account for the uncertainty that arises because the estimated propensity is just an estimate, not the true propensity. (This leads to more conservative, i.e. wider, confidence intervals on the impact than necessary!) These are more bias and variance-reduction trade-off decisions; we use the Abadie and Imbens estimator.

You have to choose a matching method, which involves more bias and variance-reduction trade-off decisions. We specify two-match nearest-neighbour matching with replacement.
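Below is a minimal sketch of two-match nearest-neighbour matching with replacement on an already estimated propensity score; the propensity scores and outcomes are hypothetical. It returns only the point estimate of the average effect on the treated. The Abadie and Imbens variance estimator mentioned above is not implemented here, and ties are broken by list order.

```python
def att_k_nearest(pscore, treated, outcome, k=2):
    """Average treatment effect on the treated from k-nearest-neighbour matching with replacement.

    Each treated person is matched to the k controls with the closest propensity score;
    a control may serve as a match for several treated people (replacement).
    """
    controls = [j for j, d in enumerate(treated) if d == 0]
    effects = []
    for i, d in enumerate(treated):
        if d != 1:
            continue
        matches = sorted(controls, key=lambda j: abs(pscore[j] - pscore[i]))[:k]
        counterfactual = sum(outcome[j] for j in matches) / len(matches)
        effects.append(outcome[i] - counterfactual)
    return sum(effects) / len(effects)


# Hypothetical propensity scores, treatment flags and outcomes
pscore = [0.80, 0.30, 0.75, 0.70, 0.35, 0.20]
treated = [1, 1, 0, 0, 0, 0]
outcome = [10.0, 8.0, 9.0, 7.0, 6.0, 5.0]
print(att_k_nearest(pscore, treated, outcome))  # 2.25
```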

You have to choose how much covariate balance is enough…
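One common way to judge covariate balance is the standardized difference in means between treated people and their matched controls. The slides do not say which balance criterion was used, so this is an assumed diagnostic with hypothetical data, and no single threshold is universally agreed.

```python
import statistics


def standardized_difference(x_treated, x_control):
    """Standardized difference in means, in per cent of the pooled standard deviation.

    100 * (mean_T - mean_C) / sqrt((var_T + var_C) / 2), using sample variances.
    """
    diff = statistics.mean(x_treated) - statistics.mean(x_control)
    pooled_sd = ((statistics.variance(x_treated) + statistics.variance(x_control)) / 2) ** 0.5
    return 100.0 * diff / pooled_sd


# Hypothetical covariate (e.g. a union-presence dummy) before and after matching
print(standardized_difference([1.0, 1.0, 1.0, 0.0], [0.0, 0.0, 1.0, 0.0]))  # 100.0: large imbalance
print(standardized_difference([1.0, 1.0, 1.0, 0.0], [1.0, 1.0, 1.0, 0.0]))  # 0.0: balanced
```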

For FE, there are repeated observations on the same individuals, so we use a robust variance estimator.
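The slides say only that a robust variance estimator is used. One common choice for repeated observations is a standard error clustered by individual. The sketch below assumes that choice, uses a single regressor and hypothetical data, and omits the finite-sample corrections that statistical packages usually apply.

```python
import math
from collections import defaultdict


def demean_by_person(ids, values):
    """Within transformation: subtract each person's own mean."""
    totals, counts = defaultdict(float), defaultdict(int)
    for i, v in zip(ids, values):
        totals[i] += v
        counts[i] += 1
    return [v - totals[i] / counts[i] for i, v in zip(ids, values)]


def fe_slope_with_cluster_se(ids, x, y):
    """Within (FE) slope for one regressor, with a standard error clustered by person.

    Sandwich form: Var(b) = sum_i (sum_t xd_it * u_it)^2 / (sum_it xd_it^2)^2,
    where xd is demeaned x and u is the within residual; no finite-sample correction.
    """
    xd, yd = demean_by_person(ids, x), demean_by_person(ids, y)
    sxx = sum(a * a for a in xd)
    beta = sum(a * b for a, b in zip(xd, yd)) / sxx
    score = defaultdict(float)  # per-person sum of x * residual
    for i, a, b in zip(ids, xd, yd):
        score[i] += a * (b - beta * a)
    var = sum(s * s for s in score.values()) / sxx ** 2
    return beta, math.sqrt(var)


# Hypothetical panel: three people, two waves each
ids = [1, 1, 2, 2, 3, 3]
x = [0, 1, 0, 1, 1, 0]
y = [1.0, 2.0, 1.0, 3.0, 2.5, 1.0]
beta, se = fe_slope_with_cluster_se(ids, x, y)
print(beta, round(se, 4))  # 1.5 0.2357
```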

For FE, we use an ‘unbalanced panel’ approach (Wansbeek and Kapteyn 1989), since restricting the analysis to the balanced panel would lose too much data.

Inclusion of numerous controls for time-varying variables helps to strengthen the causal interpretation of results as well as substituting for sample weights.

We compensate for the absence of weighting by including a wide range of control variables that have been used by the survey originators in structuring the survey (see Taylor et al. 2011).


Finally, 3. What can we say about causality? Panel fixed effects

  • Intuitively the removal of fixed effects is likely to be very important in removing selection bias in a model of training effects.

  • However there is also the possibility of unobserved time-varying bias.

  • We strive to minimize such bias by the inclusion of an extensive set of time-varying control variables, as described above.

  • But some such bias is likely to remain, especially because of a rather limited set of information about employers.

  • Accordingly we do not argue that all sources of bias have been removed nor that a definitive causal effect has been identified.


4. The results: matching


The FE results model 1

  • FE mean marginal predictions: the difference associated with having received training, for each of the mobility conditions:

  • 1. Staying in the public sector after previous-year training improved earnings by 3.2 per cent by comparison with staying in the public sector without such training.

  • 2. Moving from the public to the market sector: those with prior training are 7.5 per cent better off than those who lack recent training when they change sector. Put the other way, there is an earnings loss of 7.5 per cent unless protected by prior training. Significant at the 10 per cent level.

  • 3. Staying in the market sector: no difference to earnings from training.

  • 4. Moving from the market to the public sector: no difference from training (a small, not statistically significant, negative effect).


Final, final conclusion …

We suspect FE does a better job of allowing for individual unobserved ability/financial motivation that no amount of X will get rid of.


Follow PSI on Twitter: @PSI_London
