13 Collecting Statistical Data

13.1 The Population
13.2 Sampling
13.3 Random Sampling
13.4 Sampling: Terminology and Key Concepts
13.5 The Capture-Recapture Method
13.6 Clinical Studies

Finding the N-value

We have already observed that finding the exact N-value of a large and elusive population can be extremely difficult and sometimes impossible. In many cases, a good estimate is all we really need, and such estimates can be obtained through sampling methods. The simplest sampling method for estimating the N-value of a population is called the capture-recapture method.

THE CAPTURE-RECAPTURE METHOD

■ Step 1. Capture (sample): Capture (choose) a sample of size n1, tag (mark, identify) the animals (objects, people), and release them back into the general population.

■ Step 2. Recapture (resample): After a certain period of time, capture a new sample of size n2 and take an exact head count of the tagged individuals (i.e., those that were also in the first sample). Call this number k.

■ Step 3. Estimate: The N-value of the population can be estimated to be approximately (n1 × n2)/k.
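Once n1, n2, and k are known, the three steps reduce to a single formula. A minimal Python sketch follows. The function name capture_recapture_estimate is hypothetical, and the input checks (rejecting k = 0, or a k larger than either sample) are our own safety assumptions, not part of the method as stated.

```python
def capture_recapture_estimate(n1: int, n2: int, k: int) -> float:
    """Estimate the N-value of a population from a capture-recapture study.

    n1: number of individuals captured, tagged, and released (Step 1)
    n2: size of the second (recapture) sample (Step 2)
    k:  number of tagged individuals found in the second sample
    """
    # Assumption (not from the text): reject inputs the formula cannot handle.
    if k <= 0:
        # No tagged recaptures means the ratio n1*n2/k is undefined.
        raise ValueError("k must be positive to form an estimate")
    if k > min(n1, n2):
        # The second sample cannot contain more tagged individuals
        # than were tagged, or than the sample itself holds.
        raise ValueError("k cannot exceed n1 or n2")
    # Step 3: N is approximately (n1 * n2) / k.
    return n1 * n2 / k
```

For instance, capture_recapture_estimate(200, 150, 21) returns about 1428.57, the figure worked out by hand in Example 13.6.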

Capture-Recapture Method

The capture-recapture method rests on the assumption that both the captured and recaptured samples are representative of the entire population. Under this assumption, the proportion of tagged individuals in the recaptured sample is approximately equal to the proportion of tagged individuals in the population. In other words, the ratio k/n2 is approximately equal to the ratio n1/N. Cross-multiplying gives k × N ≈ n1 × n2, and solving for N gives N ≈ (n1 × n2)/k.
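The proportion argument can be checked by simulation. The sketch below is our own illustration, not part of the text. It assumes a closed population of known size N (no births, deaths, or migration between the two samples) and models "representative" samples as simple random samples drawn with Python's standard random module. The function name simulate_capture_recapture is hypothetical.

```python
import random


def simulate_capture_recapture(N: int, n1: int, n2: int, seed: int = 0) -> tuple[int, float]:
    """Simulate one capture-recapture study on a population of known size N.

    Assumption: both samples are simple random samples from the same,
    unchanged population (our model of 'representative' samples).
    Returns (k, estimate of N).
    """
    rng = random.Random(seed)
    population = range(N)  # individuals labeled 0 .. N-1

    # Step 1: capture n1 individuals at random and tag them.
    tagged = set(rng.sample(population, n1))

    # Step 2: draw an independent second sample of size n2
    # and count how many of its members carry a tag.
    recaptured = rng.sample(population, n2)
    k = sum(1 for individual in recaptured if individual in tagged)

    # Step 3: estimate N; with no tagged recaptures there is no finite estimate.
    if k == 0:
        return k, float("inf")
    return k, n1 * n2 / k
```

Calling it with, say, N = 1500, n1 = 200, n2 = 150 and many different seeds gives values of k that scatter around n1 × n2 / N = 200 × 150 / 1500 = 20, which is the relation k/n2 ≈ n1/N in numerical form.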

Example 13.6 Small Fish in a Big Pond

A large pond is stocked with catfish. As part of a research project, we need to estimate the number of catfish in the pond. An actual head count is out of the question (short of draining the pond), so our best bet is the capture-recapture method.

Step 1. For our first sample we capture a predetermined number n1 of catfish, say n1 = 200. The fish are tagged and released unharmed back into the pond.

Step 2. After giving the released fish enough time to mingle and disperse throughout the pond, we capture a second sample of n2 catfish. While n2 does not have to equal n1, it is a good idea for the two samples to be of approximately the same order of magnitude. Let's say that n2 = 150. Of the 150 catfish in the second sample, 21 have tags (that is, they were part of the original sample).

Assuming the second sample is representative of the catfish population in the pond, the ratio of tagged fish in the second sample (21/150) is approximately the same as the ratio of tagged fish in the pond (200/N). This gives the approximate proportion

21/150 ≈ 200/N,

which in turn gives

N ≈ (200 × 150)/21 ≈ 1428.57.

Obviously, the value N = 1428.57 cannot be taken literally, since N must be a whole number. Besides, even in the best of cases, the computation is only an estimate. A sensible conclusion is that there are approximately N = 1400 catfish in the pond.
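The arithmetic of Example 13.6 can be reproduced in a few lines of Python as a check on the numbers. The variable names are ours, and using round(..., -2) to round to the nearest hundred is our way of matching the text's choice of 1400.

```python
# Numbers from Example 13.6
n1 = 200   # catfish tagged and released in the first sample
n2 = 150   # catfish in the second sample
k = 21     # tagged catfish found in the second sample

# Step 3: N is approximately (n1 * n2) / k
raw_estimate = n1 * n2 / k
print(f"raw estimate: {raw_estimate:.2f}")  # prints "raw estimate: 1428.57"

# N must be a whole number and the figure is only an estimate,
# so report it rounded to the nearest hundred (a negative ndigits
# in round() rounds to tens, hundreds, ...).
reported = round(raw_estimate, -2)
print(f"reported estimate: about {reported:.0f} catfish")  # prints "reported estimate: about 1400 catfish"
```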
