Presented by

- Desislava Nedyalkova
- Swiss Federal Statistical Office

The Sample Selection topic

- The topic covers two main subjects which correspond to two complementary phases in the process of designing and conducting business surveys:
- Sample design and selection
- Sample coordination

Overview of the topic (I)

The sample selection part consists of:

- Main theme module which covers the most used sampling designs in business surveys
- Two method modules:
- Balanced sampling for multi-way stratification
- Subsampling for preliminary estimation

Overview of the topic (II)

The sample coordination part consists of:

- Main theme module on sample coordination
- Three method modules:
- Sample co-ordination using simple random sampling (SRS) with permanent random numbers (PRNs)
- Sample coordination using Poisson sampling with permanent random numbers (PRNs)
- Assigning random numbers when co-ordination of surveys based on different unit types is considered

Sample selection

- Designing a sample in business statistics is a challenging task (Sigman and Monsour, 1995):
- The population is often skewed.
- Dynamic membership:
- Creation of new businesses
- Change in structure of businesses
- Closed-down businesses
- Changes in type or level of activity
- Inter-business relationship.

Stratified sampling I

- Advantages:
- The population can be divided into distinct, independent subpopulations called strata.
- Leads to more efficient statistical estimates.
- Different sampling techniques, e.g. simple random sampling, can be used for different subpopulations.
- Disadvantages:
- Requires the selection of relevant stratification variables.
- It is not useful when there are no homogeneous subgroups.

Stratified sampling II

- Questions:
- How should strata be constructed?
- How should sample size be allocated to strata?
- Optimal conditions for stratification:
- Elements within a stratum are more similar to each other than to elements in other strata (homogeneous strata).
- Large variability between strata, good size variable.
- The stratification variables are strongly correlated with the variables of interest.
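The allocation question can be illustrated with Neyman allocation, which is one common choice and is not named on the slides. The sketch below assumes hypothetical stratum sizes and within-stratum standard deviations of the variable of interest, and allocates the total sample size proportionally to their product. Rounding and the constraint that a stratum sample cannot exceed the stratum size are left out.

```python
def neyman_allocation(stratum_sizes, stratum_sds, n_total):
    """Allocate n_total across strata proportionally to N_h * S_h."""
    weights = [size * sd for size, sd in zip(stratum_sizes, stratum_sds)]
    total_weight = sum(weights)
    # Unrounded allocation; a production version would round the results
    # and cap each n_h at the stratum size N_h.
    return [n_total * w / total_weight for w in weights]


# Hypothetical strata: small, medium and large businesses.
sizes = [5000, 800, 100]
sds = [2.0, 15.0, 80.0]
allocation = neyman_allocation(sizes, sds, n_total=300)
# allocation == [100.0, 120.0, 80.0]
```

The stratum of large businesses is small but highly variable, so it receives a much larger sampling fraction than the stratum of small businesses.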

Probability proportional to size (pps) sampling

- Alternative to stratification
- Main characteristics:
- The probability of inclusion of a unit in the sample is proportional to some numeric size variable (e.g. turnover, number of employees).
- PPS designs can have a fixed sample size (sequential Poisson sampling) or a random sample size (Poisson sampling).
- Easy implementation (e.g., Hartley and Rao, 1962).
- Preferred usage: small samples.
- In business statistics: Price Index Surveys.
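As a minimal sketch of the idea, the code below computes inclusion probabilities proportional to a size variable and draws a Poisson sample from them. The employee counts and the function names are hypothetical, and the sketch assumes that no unit is so large that its probability would exceed 1 (such units would normally be taken with certainty).

```python
import random


def pps_probabilities(sizes, n):
    """Inclusion probabilities proportional to size, summing to n."""
    total = sum(sizes)
    return [n * x / total for x in sizes]


def poisson_sample(probs, rng):
    """One independent Bernoulli trial per unit, so the sample size is random."""
    return [k for k, p in enumerate(probs) if rng.random() < p]


employees = [5, 10, 20, 25, 40]             # hypothetical size variable
probs = pps_probabilities(employees, n=2)   # expected sample size of 2
rng = random.Random(2024)
sample = poisson_sample(probs, rng)         # indices of the selected units
```

The probabilities sum to the expected sample size, but the realized size of `sample` varies from draw to draw, which is the random-size property mentioned above.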

Other sampling schemes

- Cut-off sampling (Knaub, 2008)
- Non-probability sampling design where some elements of the population have no chance of selection.
- Use: in very skewed populations (very many small businesses and a few large ones).
- Systematic sampling (Cochran, 1977)
- Balanced sampling (Deville & Tillé, 2004)
- The Horvitz-Thompson estimate of the total of the auxiliary variable is equal to the population total of the auxiliary variable (design-based approach).
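The balancing condition can be written as an equation. The notation is an assumption, not taken from the slides: s is the sample, U the population, x_k the auxiliary value of unit k and \pi_k its inclusion probability.

```latex
\hat{X}_{HT} = \sum_{k \in s} \frac{x_k}{\pi_k} = \sum_{k \in U} x_k = X
```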

One-way stratification

- Stratified sampling (one-way stratified design):
- Can be used when the objective of the survey is to produce estimates for subpopulations.
- Planned sample size for each domain.
- May have some drawbacks, especially in structural business surveys (large-scale surveys).
- Overall sample size could be too large for the survey’s economic constraints.
- Sample allocation may be far from the theoretically desired one.
- Strata with only a few units can lead to higher response burden.
- An alternative: multi-way stratification (see e.g., Falorsi and Righi, 2008).

Multi-way stratification

- Multi-way stratified designs
- Controlled selection methods, including methods that solve a controlled rounding problem via linear programming
- Methods based on sample coordination
- Theoretical and operative problems for large-scale surveys can arise with some of these methods.
- Balanced sampling by the cube method can overcome these drawbacks.

Subsampling for preliminary estimates (I)

- In short-term statistics, preliminary estimates are demanded from the NSIs (EU Regulation).
- A common approach for dealing with them:
- Efficient estimators based on auxiliary information.
- There is no explicit definition of a sampling design for preliminary estimates.
- The preliminary sample is usually drawn by a non-probabilistic sample design.
- An alternative overall strategy involving sample design and estimator definition can be found in the module on preliminary estimates.

Subsampling for preliminary estimates (II)

- Given a sample survey, a preliminary estimate is defined on the basis of a sample of quick respondents. Main strategy:
- A planned subsample for preliminary estimates, the preliminary theoretical sample (PTS), is drawn.
- Aim: Planned Preliminary Observed Sample (PPOS) as close as possible to PTS.
- Intensive follow-up of the PTS.
- Design-based or model-based approaches for defining the PTS.

Sample co-ordination (I)

- Sample overlap between surveys: number of common units at two different sampling occasions.
- Independent selection: sample overlaps are not controlled.
- Negative coordination: aims at spreading the response burden; sample overlap is minimized.
- Positive coordination: for repeated surveys, sample overlap is maximized.

Sample co-ordination (II)

- Three main dimensions:
- Sample coordination between surveys.
- Sample coordination over time for the same survey.
- Sample coordination of surveys based on different unit types.
- Two main types of methods:
- Methods based on PRNs (used by most NSIs).
- Methods based on linear programming (non-PRN methods): optimal solution, but computationally intensive.

Co-ordination between surveys

- Positive coordination:
- Can facilitate the comparisons between variables of interest on the micro level.
- Can facilitate the production of comparable and coherent statistics required by the National Accounts for compiling the GDP using results from different economic surveys.
- Negative coordination:
- Depends on the size of the sampling fractions in the different surveys.
- Very effective mainly for small businesses.

Co-ordination over time

- Panel: a sample measured repeatedly in time (a period could be a week, a month, a quarter or a year).
- Positive coordination over time:
- Used to obtain high precision in estimates of change.
- The size of the overlap is random.
- It depends mainly on the sampling design and changes in the business population.
- Sample rotation: a tool for spreading the response burden.

Co-ordination of surveys based on different unit types (I)

- This kind of coordination is used in Australia, France and Sweden (PRN methods).
- The business register (BR) generally consists of different unit types.
- Each business survey uses a unit type in accordance with the statistics to be produced.
- PRNs should be assigned to each unit type.

Co-ordination of surveys based on different unit types (II)

Methods for assigning the PRNs:

- PRNs are assigned to each unit type separately.
- Advantage: a simple method, samples are independent of each other.
- Disadvantage: does not admit co-ordination between surveys using different unit types.
- PRNs are assigned so that co-ordination of unit types through their PRNs is possible.
- Works well for single-location and single-activity businesses where each unit in a business receives the same PRN.
- For multiple-location and/or multiple-activity businesses: less efficient.
- Top-down or bottom-up approach to assign the PRNs (see Lindblom, 2003).

Method: Sample co-ordination using SRS with PRNs (I)

- The Swedish system for co-ordination of business samples (SAMU) is based on sequential simple random sampling without replacement (SRSWOR).
- Sequential SRS (SRSWOR):
- Consider a population U of size N (may be a stratum).
- Each unit is assigned a PRN uniformly distributed over the interval [0,1].
- Units are sorted in ascending order of their PRNs.
- The first n units in the sorted list are selected in the sample.
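The steps above can be sketched in a few lines. The unit labels, the seed and the function name are hypothetical; the PRNs are drawn once and would then be stored with the units.

```python
import random


def sequential_srswor(prns, n):
    """Sort units by ascending PRN and keep the first n."""
    ordered = sorted(prns, key=prns.get)
    return set(ordered[:n])


rng = random.Random(1)
# Each unit receives a PRN that is uniform on the unit interval.
prns = {f"unit{i}": rng.random() for i in range(10)}
sample = sequential_srswor(prns, 3)
```

Because the PRNs are kept, drawing again from the same frame returns the same units, which is what makes coordination between occasions possible.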

Method: Sample co-ordination using SRS with PRNs (II)

- Due to the symmetry of the uniform distribution:
- the selection of the last n units in the sorted list also gives a sequential SRSWOR,
- the selection of the first n units to the left or to the right of a given point a in [0,1] also yields an SRSWOR (wrap-around if not enough units).
- Dynamic population
- New businesses in the frame (births) receive a new PRN.
- Closed-down businesses (deaths) are withdrawn from the frame.

Method: Sample co-ordination using SRS with PRNs (III)

- Positive co-ordination
- Over time: on each occasion a new sequential SRSWOR is drawn from the updated frame (same starting point).
- Of two surveys: the same starting point and direction are used for both surveys.
- Negative co-ordination
- For two surveys: we must choose properly the starting points and directions, e.g. different starting points and the same direction.
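The role of the starting point and direction can be sketched as follows. The frame, the PRN values and the function name are hypothetical and chosen for readability; this is an illustration of the principle, not the SAMU implementation.

```python
def select_from_point(prns, n, start=0.0, direction="right"):
    """Return the first n units met when moving from `start` in `direction`.

    Distances are taken modulo 1, so the selection wraps around the ends
    of the unit interval when there are not enough units on one side.
    """
    sign = 1.0 if direction == "right" else -1.0

    def distance(unit):
        return (sign * (prns[unit] - start)) % 1.0

    return set(sorted(prns, key=distance)[:n])


# Hypothetical frame with fixed PRNs.
prns = {"a": 0.05, "b": 0.15, "c": 0.30, "d": 0.45,
        "e": 0.60, "f": 0.75, "g": 0.90}

survey_1 = select_from_point(prns, 3, start=0.0)   # {"a", "b", "c"}
positive = select_from_point(prns, 2, start=0.0)   # {"a", "b"}
negative = select_from_point(prns, 3, start=0.5)   # {"e", "f", "g"}
wrapped = select_from_point(prns, 3, start=0.8)    # {"g", "a", "b"}

# Over time: "b" closes down and "h" is born with a fresh PRN.
updated = {u: p for u, p in prns.items() if u != "b"}
updated["h"] = 0.20
survey_1_next = select_from_point(updated, 3, start=0.0)  # {"a", "h", "c"}
```

Using the same starting point gives full overlap between `survey_1` and `positive`, a different starting point gives no overlap with `negative`, and the redraw on the updated frame keeps two of the three original units.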

Method: Sample co-ordination using SRS with PRNs (IV)

- SAMU allows for positive or negative coordination when different stratifications are used.
- SAMU has implemented a system of rotation of samples:
- Each unit in the frame is randomly assigned to one of five rotation groups.
- Random numbers are shifted in only one rotation group each year (RRC method of Ohlsson, 1992).

Method: Sample co-ordination using SRS with PRNs (V)

- A somewhat different method is used in France (Cotton & Hesse, 1992):
- Each unit in the frame receives a uniform random number in [0,1].
- Units are sorted in ascending order of their RNs.
- A sequential SRSWOR of size n is drawn from the ordered list.
- Negative co-ordination is obtained by permuting the random numbers so that selected units receive the largest RNs and non-selected units the smallest. The rank of the RNs should be respected.
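The permutation step can be sketched as below. This is a simplified reading of the description above for a single unstratified frame, not the full Cotton & Hesse procedure; it assumes that "respecting the rank" means keeping the relative order of RNs within the selected group and within the non-selected group. Function names and values are hypothetical.

```python
def draw_smallest(rns, n):
    """Sequential SRSWOR: the n units with the smallest random numbers."""
    return set(sorted(rns, key=rns.get)[:n])


def permute_after_selection(rns, sample):
    """Reassign the same RN values so that selected units hold the largest
    ones and non-selected units the smallest, keeping rank within each group."""
    by_rank = sorted(rns, key=rns.get)
    non_selected = [u for u in by_rank if u not in sample]
    selected = [u for u in by_rank if u in sample]
    return dict(zip(non_selected + selected, sorted(rns.values())))


rns = {"a": 0.1, "b": 0.2, "c": 0.3, "d": 0.4, "e": 0.5}
first = draw_smallest(rns, 2)                   # {"a", "b"}
new_rns = permute_after_selection(rns, first)
second = draw_smallest(new_rns, 2)              # {"c", "d"}
```

After the permutation the previously selected units sit at the end of the ordered list, so the next draw avoids them as long as enough other units are available.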

Method: Sample co-ordination using SRS with PRNs (VI)

- The Cotton & Hesse method:
- Can be used only for negative co-ordination.
- Is based on permutation of the RNs.
- Allows the use of different stratifications when co-ordinating stratified samples.
- A minimum of the expected overlap between two successive stratified samples is guaranteed.
- Can be used to co-ordinate sampling units of different types, e.g. enterprises and establishments.

Method: Sample co-ordination using Poisson sampling with PRNs (I)

- Implemented at SFSO (Qualité, 2009).
- Extension of the method of Brewer et al. (1972).
- Algorithm:
- For each survey, one defines for each unit a zone of selection (can be a union of disjoint intervals).
- The total length of the zone of selection corresponds to the inclusion probability for that unit.
- A unit is selected if its PRN falls within its zone of selection.
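A minimal sketch of the selection rule for one unit follows. The three surveys, their inclusion probabilities and the way the zones are laid out are hypothetical illustrations; the actual zone construction used at SFSO is not described on the slides and is not reproduced here.

```python
def zone_length(zone):
    """Total length of a zone given as disjoint half-open intervals [a, b)."""
    return sum(b - a for a, b in zone)


def is_selected(prn, zone):
    """A unit is selected when its PRN falls inside its zone of selection."""
    return any(a <= prn < b for a, b in zone)


prn = 0.42  # the unit's permanent random number

# Hypothetical zones for the same unit in three surveys.
zone_a = [(0.0, 0.5)]              # survey A, inclusion probability 0.5
zone_b = [(0.0, 0.3)]              # positive coordination with A, probability 0.3
zone_c = [(0.5, 1.0), (0.0, 0.1)]  # negative coordination with A, probability 0.6

selected = {name: is_selected(prn, zone)
            for name, zone in [("A", zone_a), ("B", zone_b), ("C", zone_c)]}
# selected == {"A": True, "B": False, "C": False}
```

Zone B lies entirely inside zone A, so selection in B implies selection in A. Zone C first uses the part of the interval outside zone A and overlaps it only by the 0.1 that is unavoidable for an inclusion probability of 0.6.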

Method: Sample co-ordination using Poisson sampling with PRNs (II)

- Advantages:
- Theoretically simple and easy to implement.
- Dynamic populations are easily handled.
- Disadvantages:
- The sample size is random.
- Previously, stratified sampling was used at SFSO.
- Optimal allocation procedures do not need to be modified, except for small sampling strata because of the risk of selecting an empty sample.

Example of co-ordination (I)

- We consider the selection of a unit in 6 samples (PRN equal to 0.42). We have:
- The inclusion probability (pi).
- The desired types of coordination: negative (N) or positive (P).
- Two panels: samples 1, 3 and 6 are three waves of panel 1 and samples 2 and 5 are two waves of panel 2.
- Sample 4 is for a survey conducted only once.
- Positive coordination in a panel has a higher priority than negative coordination with the other samples.

Discussion

- Sample design and selection:
- The sample design determines a survey’s characteristics such as cost, variance and respondent burden.
- Sample co-ordination:
- An important tool for spreading the response burden.
- Higher precision in estimates over time.
- A co-ordination system provides a common sampling frame for all surveys.
- Sample rotation:
- Reducing response burden in periodic surveys.

