
Confidence Intervals for Capture-Recapture Data With Matching

Stephen Sharp, National Records of Scotland.





Presentation Transcript


  1. Confidence Intervals for Capture-Recapture Data With Matching Stephen Sharp, National Records of Scotland

  2. The Problem (i) • You have undertaken a (presumably imperfect) enumeration of a given population. • You have then undertaken a second (also presumably imperfect) coverage survey. • You have matched the two, so you know how many people were in both surveys (N12), in the first survey only (N1) and in the second survey only (N2). • You need to estimate the number of people in neither survey (N0).

  3. Summary

  4. The Problem (ii) • The classical estimate of N0 is the product of N1 and N2 divided by N12, i.e. N0 = N1 × N2 / N12. • However, this assumes that absence from the first survey does not change the probability of absence from the second (that inclusion in the two surveys is independent). • For human populations this assumption is very unlikely to hold.
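The classical estimate takes only a couple of lines of Python. This is a minimal sketch; the function names and the counts below are hypothetical, chosen only to illustrate the arithmetic. Adding N0 = N1 × N2 / N12 to the observed total gives the familiar Lincoln-Petersen form (N12 + N1)(N12 + N2) / N12, which the sketch also computes as a cross-check.

```python
def classical_n0(n12: int, n1: int, n2: int) -> float:
    """Classical (independence) estimate of the number missed by both surveys.

    Independence implies N12 * N0 is approximately N1 * N2.
    """
    if n12 <= 0:
        raise ValueError("need at least one person matched in both surveys")
    return n1 * n2 / n12


def classical_total(n12: int, n1: int, n2: int) -> float:
    """Estimated population: everyone observed plus the estimated N0."""
    return n12 + n1 + n2 + classical_n0(n12, n1, n2)


def lincoln_petersen_total(n12: int, n1: int, n2: int) -> float:
    """Same estimate written as (first-survey total * second-survey total) / matches."""
    return (n12 + n1) * (n12 + n2) / n12


# Hypothetical counts, for illustration only.
n12, n1, n2 = 400, 60, 50
print(classical_n0(n12, n1, n2))            # 7.5
print(classical_total(n12, n1, n2))         # 517.5
print(lincoln_petersen_total(n12, n1, n2))  # 517.5
```

With these made-up counts the independence assumption adds only 7.5 people to the 510 observed; the later slides argue that figures obtained this way are likely to be too low.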

  5. A Bayesian approach • As we do not know N0, we require its probability distribution conditional on N12, N1 and N2, which we do know. • We get this from Bayes’ theorem: • p(N0 | N12, N1, N2) = constant × p(N12, N1, N2 | N0) × p(N0). • The posterior is proportional to the likelihood times the prior. • We therefore need a likelihood and a prior.
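On a finite grid of candidate N0 values, the "constant" in Bayes’ theorem is simply whatever makes the posterior sum to one. A minimal Python sketch, assuming the log-likelihood and log-prior have already been evaluated at the same grid points (the function name is hypothetical):

```python
import math


def normalise_log_posterior(log_lik, log_prior):
    """Posterior probabilities on a grid from log-likelihood and log-prior values.

    Adds the logs, subtracts the maximum so exp() cannot overflow, then
    divides by the sum; that division plays the role of the "constant".
    """
    if len(log_lik) != len(log_prior):
        raise ValueError("likelihood and prior must be on the same grid")
    log_post = [a + b for a, b in zip(log_lik, log_prior)]
    peak = max(log_post)
    weights = [math.exp(v - peak) for v in log_post]
    total = sum(weights)
    return [w / total for w in weights]


# Grid N0 = 0, 1, 2 with likelihoods in ratio 1 : 2 : 1 and a flat prior.
print(normalise_log_posterior([0.0, math.log(2.0), 0.0], [0.0, 0.0, 0.0]))
```

Working in logs matters here because multinomial likelihoods for populations of hundreds of people are far too small to represent directly as floating-point numbers.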

  6. The likelihood function (i) • The distribution of N12, N1 and N2 conditional on N0 is multinomial with probability parameters p12, p1, p2 and p0. • The four probabilities must sum to one, which leaves three free parameters, so three further constraints are needed to specify them uniquely. • We assume that p12, p1 and p2 stand in the same proportions as N12, N1 and N2. • This gives us two of the three constraints.
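A sketch of this likelihood in Python, with the two proportionality constraints built in. Because the third constraint is still open at this point, p0 is taken as an input. The function names are hypothetical, and the likelihood is written for the multinomial total N = N12 + N1 + N2 + N0; this is one common reading of the setup, not necessarily the author's exact formulation.

```python
import math


def cell_probabilities(n12, n1, n2, p0):
    """Share 1 - p0 among p12, p1, p2 in the proportions N12 : N1 : N2."""
    n = n12 + n1 + n2
    scale = (1.0 - p0) / n
    return n12 * scale, n1 * scale, n2 * scale


def log_likelihood(n0, n12, n1, n2, p0):
    """Log multinomial probability of the cell counts (N12, N1, N2, N0).

    The multinomial total is N = N12 + N1 + N2 + N0; p0 must lie strictly
    between 0 and 1 and is supplied by the caller (hypothetical interface).
    """
    if not 0.0 < p0 < 1.0:
        raise ValueError("p0 must lie strictly between 0 and 1")
    p12, p1, p2 = cell_probabilities(n12, n1, n2, p0)
    total = n12 + n1 + n2 + n0
    # log of N! / (N12! N1! N2! N0!)
    log_coef = (math.lgamma(total + 1) - math.lgamma(n12 + 1)
                - math.lgamma(n1 + 1) - math.lgamma(n2 + 1)
                - math.lgamma(n0 + 1))
    return (log_coef + n12 * math.log(p12) + n1 * math.log(p1)
            + n2 * math.log(p2) + n0 * math.log(p0))


# Hypothetical counts: log-likelihood of N0 = 10 when p0 = 0.02.
print(log_likelihood(10, 400, 60, 50, 0.02))
```

Evaluated over a range of N0 values for a fixed p0, this gives the likelihood term that enters Bayes’ theorem.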

  7. The likelihood function (ii) • Instead of imposing a third constraint, however, we let the posterior distribution of N0 depend on the dichotomous (phi) correlation ϕ between inclusion in the two surveys, which measures their stochastic dependence. • We can then specify the likelihood for any given value of ϕ and observe the effect of changing it.
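One way to turn a chosen ϕ into the missing third constraint is to solve for the p0 at which the 2×2 inclusion table (in/not in survey 1 by in/not in survey 2) has that phi correlation, while keeping p12 : p1 : p2 = N12 : N1 : N2. The Python sketch below does this by bisection, relying on the fact that ϕ rises monotonically with p0 under the proportionality constraint. The function names are hypothetical, and this is an illustration rather than necessarily the parameterization used in the presentation. At ϕ = 0 it reproduces the classical estimate N1 × N2 / N12.

```python
import math


def phi_from_cells(p12, p1, p2, p0):
    """Phi correlation of the 2x2 table of inclusion probabilities."""
    num = p12 * p0 - p1 * p2
    den = math.sqrt((p12 + p1) * (p2 + p0) * (p12 + p2) * (p1 + p0))
    return num / den


def p0_for_phi(n12, n1, n2, phi, tol=1e-13):
    """p0 giving correlation phi when p12 : p1 : p2 = N12 : N1 : N2."""
    n = n12 + n1 + n2
    r12, r1, r2 = n12 / n, n1 / n, n2 / n

    def phi_at(p0):
        q = 1.0 - p0
        return phi_from_cells(q * r12, q * r1, q * r2, p0)

    lo, hi = 0.0, 1.0 - 1e-9
    if not phi_at(lo) <= phi <= phi_at(hi):
        raise ValueError("phi is outside the range attainable for these counts")
    # phi_at is increasing in p0, so bisection converges to the unique root.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if phi_at(mid) < phi:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def expected_n0(n12, n1, n2, phi):
    """Implied number missed by both surveys: observed total * p0 / (1 - p0)."""
    p0 = p0_for_phi(n12, n1, n2, phi)
    return (n12 + n1 + n2) * p0 / (1.0 - p0)


# Hypothetical counts.
for phi in (0.0, 0.25, 0.40):
    print(phi, expected_n0(400, 60, 50, phi))
```

For these hypothetical counts, ϕ = 0 gives about 7.5, matching 60 × 50 / 400, and positive values of ϕ give larger N0. Not every ϕ is attainable for a given set of counts, which is why the function checks the range before bisecting.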

  8. The prior distribution • What did we know about the likely size of the population before we took the two surveys? • This knowledge is reflected in the prior distribution. • A safe choice is an uninformative prior (perhaps a normal or uniform distribution with a very large variance). • If you are confident about the population size, however, you may do better with an informative prior (i.e. one with a smaller variance). • This reduces the variance of the posterior distribution (but check that the prior is consistent with the likelihood).
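The priors mentioned here can be written as log-weights over candidate population totals. A minimal Python sketch, assuming (an assumption, not stated on the slide) that the prior is placed on the total N = N12 + N1 + N2 + N0; the function names are hypothetical. A Poisson(λ) prior has variance λ, so a Poisson prior with λ = 550 implies a prior standard deviation of about 23.5, whereas a normal prior with an enormous standard deviation is effectively flat over any realistic range.

```python
import math


def log_poisson_prior(total, lam):
    """Log Poisson(lam) probability of a population total (informative prior)."""
    return total * math.log(lam) - lam - math.lgamma(total + 1)


def log_normal_prior(total, mean, sd):
    """Unnormalised log density of a normal prior; a huge sd is nearly flat."""
    return -0.5 * ((total - mean) / sd) ** 2


def log_flat_prior(total):
    """Uniform prior over the grid: every total gets the same log-weight."""
    return 0.0


# Prior spread implied by a Poisson prior with lam = 550.
print(math.sqrt(550.0))
# Relative weight of totals 520 and 560 under that prior (log scale).
print(log_poisson_prior(520, 550.0) - log_poisson_prior(560, 550.0))
```

Any of these can be added to a log-likelihood evaluated on the same grid of totals before normalizing; comparing where the prior and the likelihood put their mass is one simple way to check that they are consistent.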

  9. Some examples

  10. Example 1 – Using a Poisson prior with λ = 550

  11. Example 2 – Using a totally uninformative prior

  12. Further work (i) • So we can model the point estimate and confidence intervals as a function of the dichotomous correlation ϕ. • But what is the value of ϕ? • It will vary from one subgroup of the population to another. • It will depend on how much the propensity to take part in public surveys like the Census and the coverage survey varies within the subgroup.
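The Python sketch below puts the pieces together: it computes the posterior for N0 at several values of ϕ and reports the posterior mean and a central 95% interval (what the slides call a confidence interval). It is a self-contained illustration under stated assumptions: the counts are hypothetical; p12, p1, p2 and p0 are fixed by the proportionality constraints plus ϕ; the prior is flat unless another log_prior is supplied; and all function names are invented for this example. It is not the author's code and will not reproduce the figures in the examples.

```python
import math


def p0_for_phi(n12, n1, n2, phi, tol=1e-13):
    """p0 giving phi correlation when p12 : p1 : p2 = N12 : N1 : N2 (bisection)."""
    n = n12 + n1 + n2
    r12, r1, r2 = n12 / n, n1 / n, n2 / n

    def phi_at(p0):
        q = 1.0 - p0
        p12, p1, p2 = q * r12, q * r1, q * r2
        num = p12 * p0 - p1 * p2
        den = math.sqrt((p12 + p1) * (p2 + p0) * (p12 + p2) * (p1 + p0))
        return num / den

    lo, hi = 0.0, 1.0 - 1e-9
    if not phi_at(lo) <= phi <= phi_at(hi):
        raise ValueError("phi is outside the attainable range")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if phi_at(mid) < phi else (lo, mid)
    return 0.5 * (lo + hi)


def posterior_n0(n12, n1, n2, phi, max_n0=2000, log_prior=lambda total: 0.0):
    """Posterior over N0 = 0..max_n0 for a fixed phi (flat prior by default)."""
    p0 = p0_for_phi(n12, n1, n2, phi)
    n = n12 + n1 + n2
    log_post = []
    for n0 in range(max_n0 + 1):
        # Multinomial terms that vary with N0: N! / N0! * p0**N0.
        log_lik = math.lgamma(n + n0 + 1) - math.lgamma(n0 + 1) + n0 * math.log(p0)
        log_post.append(log_lik + log_prior(n + n0))
    peak = max(log_post)
    weights = [math.exp(v - peak) for v in log_post]
    total = sum(weights)
    return [w / total for w in weights]


def summarise(post, level=0.95):
    """Posterior mean of N0 and a central credible interval (grid indices)."""
    mean = sum(k * p for k, p in enumerate(post))
    tail = (1.0 - level) / 2.0
    cum, lower, upper = 0.0, None, None
    for k, p in enumerate(post):
        cum += p
        if lower is None and cum >= tail:
            lower = k
        if cum >= 1.0 - tail:
            upper = k
            break
    return mean, lower, upper


n12, n1, n2 = 400, 60, 50  # hypothetical counts
n = n12 + n1 + n2
for phi in (0.0, 0.25, 0.40):
    mean, lower, upper = summarise(posterior_n0(n12, n1, n2, phi))
    print(f"phi={phi:.2f}  N={n + mean:.1f}  95% interval for N: {n + lower}-{n + upper}")
```

In this sketch, raising ϕ from 0 to 0.40 both shifts the estimate upward and widens the interval, which is the qualitative pattern the presentation describes; the size of each shift depends entirely on the counts used.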

  13. Further work (ii) • Attempts to model this have suggested that typical values for ϕ lie between 0.25 and 0.40. • This suggests that, with an uninformative prior, the population point estimate might be about 560, against 520 under the independence assumption; the independence assumption therefore underestimates by about 7%. • The confidence intervals are ±14 or 15 rather than ±6 or 7, i.e. about twice as wide.

  14. Conclusion • The assumption of independence introduces error into both the point estimate and the confidence intervals when population size is estimated from capture-recapture data. • The CI error is in the “wrong” direction: the intervals are too narrow rather than too wide, so the error is not on the side of caution. • Departure from independence arises because members of the population who are unlikely to be included in one sample are also less likely to be included in the other. • Assessing the extent of dependence is difficult, but its effects make it important to try.

  15. Confidence Intervals for Capture-Recapture Data With Matching Stephen Sharp National Records of Scotland Ladywell House Ladywell Road Edinburgh EH12 7TF 0131 314 4649 Stephen.sharp@gro-scotland.gsi.gov.uk
