
Sample Selection

Sample Selection. Presented by Desislava Nedyalkova, Swiss Federal Statistical Office. The Sample Selection topic covers two main subjects, which correspond to two complementary phases in the process of designing and conducting business surveys: sample design and selection, and sample coordination.

Presentation Transcript


  1. Sample Selection

  2. Presented by • Desislava Nedyalkova • Swiss Federal Statistical Office

  3. The Sample Selection topic • The topic covers two main subjects, which correspond to two complementary phases in the process of designing and conducting business surveys: • Sample design and selection • Sample coordination

  4. Overview of the topic (I) The sample selection part consists of: • Main theme module which covers the most used sampling designs in business surveys • Two method modules: • Balanced sampling for multi-way stratification • Subsampling for preliminary estimation

  5. Overview of the topic (II) The sample coordination part consists of: • Main theme module on sample coordination • Three method modules: • Sample co-ordination using simple random sampling (SRS) with permanent random numbers (PRNs) • Sample coordination using Poisson sampling with permanent random numbers (PRNs) • Assigning random numbers when co-ordination of surveys based on different unit types is considered

  6. Sample selection • Designing a sample in business statistics is a challenging task (Sigman and Monsour, 1995): • The population is often skewed. • Dynamic membership: • Creation of new businesses • Change in structure of businesses • Closed-down businesses • Changes in type or level of activity • Inter-business relationship.

  7. Stratified sampling I • Advantages: • The population can be divided into distinct, independent subpopulations called strata. • Leads to more efficient statistical estimates. • Different sampling techniques, e.g. simple random sampling, can be used for different subpopulations. • Disadvantages: • Requires the selection of relevant stratification variables. • It is not useful when there are no homogeneous subgroups.

  8. Stratified sampling II • Questions: • How should strata be constructed? • How should sample size be allocated to strata? • Optimal conditions for stratification: • Elements within a stratum are more similar to each other than to elements in other strata (homogeneous strata). • Large variability between strata, good size variable. • The stratification variables are strongly correlated with the variables of interest.

  9. Probability proportional to size (pps) sampling • Alternative to stratification • Main characteristics: • The probability of inclusion of a unit in the sample is proportional to some numeric size variable (e.g. turnover, number of employees). • PPS designs of fixed (sequential Poisson sampling) or random (Poisson sampling) sample size. • Easy implementation (e.g., Hartley and Rao, 1962). • Preferred usage: small samples. • In business statistics: Price Index Surveys.
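The two ideas on this slide, inclusion probabilities proportional to a size variable and Poisson sampling with a random sample size, can be sketched as follows. This is a minimal illustration, not the method of any particular statistical office. The function names are hypothetical, and the rule that units whose proportional share reaches 1 are taken with certainty is an assumed convention that the slides do not state.

```python
import random


def pps_inclusion_probabilities(sizes, n):
    """Inclusion probabilities proportional to size, summing to n.

    Assumption (not from the slides): a unit whose proportional share
    would reach 1 is selected with certainty, and the probabilities are
    recomputed among the remaining units. Sizes are assumed positive.
    """
    probs = [0.0] * len(sizes)
    remaining = list(range(len(sizes)))
    n_left = n
    while True:
        total = sum(sizes[i] for i in remaining)
        certain = [i for i in remaining if n_left * sizes[i] >= total]
        if not certain:
            for i in remaining:
                probs[i] = n_left * sizes[i] / total
            return probs
        for i in certain:
            probs[i] = 1.0
        remaining = [i for i in remaining if i not in certain]
        n_left -= len(certain)


def poisson_sample(probs, rng):
    """Poisson sampling: each unit is selected independently with its own
    probability, so the realised sample size is random."""
    return [i for i, p in enumerate(probs) if rng.random() < p]


# Hypothetical frame: one large business and four small ones (e.g. employees).
sizes = [100, 10, 10, 10, 10]
probs = pps_inclusion_probabilities(sizes, n=2)
sample = poisson_sample(probs, random.Random(0))
```

In this made-up frame the large business is selected with certainty and each small business has probability 0.25, so the expected sample size is 2 while the realised size varies from draw to draw.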

  10. Other sampling schemes • Cut-off sampling (Knaub, 2008) • Non-probability sampling design where some elements of the population have no chance of selection. • Use: in very skewed populations (very many small businesses and a few large ones). • Systematic sampling (Cochran, 1977) • Balanced sampling (Deville & Tillé, 2004) • The Horvitz-Thompson estimate of the total of the auxiliary variable is equal to the population total of the auxiliary variable (design-based approach).
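The balancing condition stated in words on this slide can be written as an equation. The notation is an assumption, since the slides introduce none: U is the population, s the selected sample, x_k the value of the auxiliary variable for unit k, and π_k its inclusion probability.

```latex
\hat{X}_{\mathrm{HT}} \;=\; \sum_{k \in s} \frac{x_k}{\pi_k} \;=\; \sum_{k \in U} x_k \;=\; X
```

A sample s is balanced on x when this equality holds for that sample, exactly or approximately.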

  11. One-way stratification • Stratified sampling (one-way stratified design): • Can be used when the objective of the survey is to produce estimates for subpopulations. • Planned sample size for each domain. • May have some drawbacks, especially in structural business surveys (large-scale surveys). • Overall sample size could be too large for the survey’s economic constraints. • Sample allocation may be far from the theoretically desired one. • Strata with only a few units can lead to higher response burden. • An alternative: multi-way stratification (see e.g., Falorsi and Righi, 2008).

  12. Multi-way stratification • Multi-way stratified designs • Controlled selection methods, including methods based on the controlled rounding problem via linear programming • Methods based on sample coordination • Theoretical and operative problems for large-scale surveys can arise with some of these methods. • Balanced sampling by the cube method can overcome these drawbacks.

  13. Subsampling for preliminary estimates (I) • In short-term statistics, preliminary estimates are demanded from the NSIs (EU Regulation). • A common approach for dealing with them: • Efficient estimators based on auxiliary information. • No explicit definition of a sampling design for preliminary estimates. • Usually drawn by a non-probabilistic sample design. • An alternative overall strategy involving sample design and estimator definition can be found in the module on preliminary estimates.

  14. Subsampling for preliminary estimates (II) • Given a sample survey, a preliminary estimate is defined on the basis of a sample of quick respondents. Main strategy: • A planned subsample for preliminary estimates, the preliminary theoretical sample (PTS), is drawn. • Aim: Planned Preliminary Observed Sample (PPOS) as close as possible to the PTS. • Intensive follow-up of the PTS. • Design-based or model-based approaches for defining the PTS.

  15. Sample co-ordination (I) • Sample overlap between surveys: number of common units at two different sampling occasions. • Independent selection: sample overlaps are not controlled. • Negative coordination: aims at spreading the response burden, sample overlap is minimized. • Positive coordination: for repeated surveys, sample overlap is maximized.

  16. Sample co-ordination (II) • Three main dimensions: • Sample coordination between surveys. • Sample coordination over time for the same survey. • Sample coordination of surveys based on different unit types. • Two main types of methods: • Methods based on PRNs (used by most NSIs). • Methods based on linear programming (non-PRN methods): optimal solution, but computationally intensive.

  17. Co-ordination between surveys • Positive coordination: • Can facilitate the comparisons between variables of interest on the micro level. • Can facilitate the production of comparable and coherent statistics required by the National Accounts for compiling the GDP using results from different economic surveys. • Negative coordination: • Depends on the size of the sampling fractions in the different surveys. • Very effective mainly for small businesses.

  18. Co-ordination over time • Panel: a sample measured repeatedly in time (a period could be a week, a month, a quarter or a year). • Positive coordination over time: • Used to obtain high precision in estimates of change. • The size of the overlap is random. • It depends mainly on the sampling design and changes in the business population. • Sample rotation: a tool for spreading the response burden.

  19. Co-ordination of surveys based on different unit types (I) • This kind of coordination is used in Australia, France and Sweden (PRN methods). • The business register (BR) generally consists of different unit types. • Each business survey uses a unit type in accordance with the statistics to be produced. • PRNs should be assigned to each unit type.

  20. Co-ordination of surveys based on different unit types (II) Methods for assigning the PRNs: • PRNs are assigned to each unit type separately. • Advantage: a simple method, samples are independent of each other. • Disadvantage: does not admit co-ordination between surveys using different unit types. • PRNs are assigned so that co-ordination of unit types through their PRNs is possible. • Works well for single-location and single-activity businesses where each unit in a business receives the same PRN. • For multiple-location and/or multiple-activity businesses: less efficient. • Top-down or bottom-up approach to assign the PRNs (see Lindblom, 2003).

  21. Method: Sample co-ordination using SRS with PRNs (I) • The Swedish system for co-ordination of business samples (SAMU) is based on sequential simple random sampling without replacement (SRSWOR). • Sequential SRS (SRSWOR): • Consider a population U of size N (may be a stratum). • Each unit is assigned a PRN uniformly distributed over the interval [0,1]. • Units are sorted in ascending order of their PRNs. • The first n units in the sorted list are selected in the sample.
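The four steps of sequential SRSWOR listed on this slide can be sketched in a few lines. This is only an illustration of the steps as stated, not SAMU itself; the function names and the integer unit identifiers are hypothetical.

```python
import random


def assign_prns(unit_ids, rng):
    """Give every unit in the frame a permanent random number in [0, 1)."""
    return {unit: rng.random() for unit in unit_ids}


def sequential_srswor(prns, n):
    """Sort the units by ascending PRN and keep the first n."""
    ordered = sorted(prns, key=prns.get)
    return ordered[:n]


# Hypothetical frame (or stratum) of N = 100 units, sample of n = 10.
rng = random.Random(1)
prns = assign_prns(range(100), rng)
sample = sequential_srswor(prns, 10)
```

Because the PRNs are stored with the units rather than redrawn, repeating the selection on the same frame returns the same sample, which is what makes coordination across occasions and surveys possible.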

  22. Method: Sample co-ordination using SRS with PRNs (II) • Due to the symmetry of the uniform distribution: • the selection of the last n units in the sorted list also gives a sequential SRSWOR, • the selection of the first n units to the left or to the right of a given point a in [0,1] also yields an SRSWOR (wrap-around if there are not enough units). • Dynamic population • New businesses in the frame (births) receive a new PRN. • Closed-down businesses (deaths) are withdrawn from the frame.
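Selection from an arbitrary starting point a, in either direction and with wrap-around, can be expressed by sorting units on their circular distance from a. The sketch below assumes that reading of the slide; the function name, the `direction` argument and the PRN values are hypothetical.

```python
def srswor_from_point(prns, n, start=0.0, direction="right"):
    """Select the first n units to the right (or left) of `start`.

    The interval [0, 1) is treated as a circle, so the selection wraps
    around when fewer than n units lie beyond the starting point.
    """
    if direction == "right":
        def distance(unit):
            return (prns[unit] - start) % 1.0
    elif direction == "left":
        def distance(unit):
            return (start - prns[unit]) % 1.0
    else:
        raise ValueError("direction must be 'right' or 'left'")
    return sorted(prns, key=distance)[:n]


# Made-up PRNs for a frame of five units.
prns = {"a": 0.10, "b": 0.35, "c": 0.60, "d": 0.85, "e": 0.95}

# Three units to the right of 0.8: "d" and "e", then wrap around to "a".
right_sample = srswor_from_point(prns, 3, start=0.8, direction="right")

# Two units to the left of 0.2: "a", then wrap around to "e".
left_sample = srswor_from_point(prns, 2, start=0.2, direction="left")
```

Births and deaths fit into this scheme by adding a new key with a fresh PRN or deleting a key before the next selection; surviving units keep their PRNs.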

  23. Method: Sample co-ordination using SRS with PRNs (III) • Positive co-ordination • Over time: on each occasion a new sequential SRSWOR is drawn from the updated frame (same starting point). • Of two surveys: same starting point and direction are used for both surveys. • Negative co-ordination • For two surveys: we must choose properly the starting points and directions, e.g. different starting points and the same direction.
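The effect of the starting point on sample overlap can be seen in a small example. The sketch below uses evenly spaced PRNs so that the outcome is easy to follow by hand; real PRNs would be uniform random draws. The `select` helper, the frame and the chosen starting points are all hypothetical.

```python
def select(prns, n, start=0.0):
    """First n units to the right of `start`, wrapping around at 1."""
    return sorted(prns, key=lambda unit: (prns[unit] - start) % 1.0)[:n]


# Occasion 1: frame of 20 units with PRNs 0.025, 0.075, ..., 0.975.
frame_t1 = {i: (i + 0.5) / 20 for i in range(20)}
sample_t1 = select(frame_t1, 5)  # units 0, 1, 2, 3, 4

# Occasion 2: unit 2 has closed down and unit 20 is a birth with a new PRN.
frame_t2 = {u: p for u, p in frame_t1.items() if u != 2}
frame_t2[20] = 0.01
sample_t2 = select(frame_t2, 5)  # same starting point as on occasion 1

# Positive coordination over time: only the death and the birth differ.
overlap_over_time = set(sample_t1) & set(sample_t2)

# Negative coordination with a second survey: same direction, different start.
other_survey = select(frame_t1, 5, start=0.5)
overlap_between_surveys = set(sample_t1) & set(other_survey)
```

Here four of the five units are retained between the two occasions, and the two surveys share no unit. With larger sampling fractions the two selection ranges would eventually meet, which is why the slide says that the starting points must be chosen properly.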

  24. Method: Sample co-ordination using SRS with PRNs (IV) • SAMU allows for positive or negative coordination when different stratifications are used. • SAMU has implemented a system of rotation of samples: • Each unit in the frame is randomly designated to one of five rotation groups. • Random numbers are shifted only in one rotation group each year (RRC method of Ohlsson, 1992).

  25. Method: Sample co-ordination using SRS with PRNs (V) • A somewhat different method is used in France (Cotton & Hesse, 1992): • Each unit in the frame receives a uniform random number in [0,1]. • Units are ordered in ascending order of their RNs. • A sequential SRSWOR of size n is drawn in the ordered list. • Negative co-ordination is obtained by permuting the random numbers so that selected units receive the largest RNs and non-selected units the smallest. The rank of the RNs should be respected.
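One possible reading of the permutation step is sketched below: the set of random number values stays the same, non-selected units take over the smallest values, selected units the largest, and each group keeps its internal rank order. This interpretation of "the rank of the RNs should be respected" is an assumption, and the function name and data are hypothetical; the original report should be consulted for the exact procedure.

```python
def cotton_hesse_permute(rns, n):
    """Draw a sequential SRSWOR of size n, then permute the random numbers.

    Returns the selected units and a new unit -> RN mapping in which
    non-selected units hold the smallest values and selected units the
    largest, with the rank order preserved inside each group.
    """
    ordered = sorted(rns, key=rns.get)      # units by ascending RN
    selected, non_selected = ordered[:n], ordered[n:]
    values = sorted(rns.values())           # the same RN values are reused
    new_order = non_selected + selected     # non-selected units come first
    return selected, dict(zip(new_order, values))


# Made-up random numbers for a frame of five units.
rns = {"a": 0.1, "b": 0.3, "c": 0.5, "d": 0.7, "e": 0.9}
first_sample, permuted = cotton_hesse_permute(rns, 2)

# The next sample is drawn from the permuted numbers in the same way.
second_sample = sorted(permuted, key=permuted.get)[:2]
```

In this example the first sample is "a" and "b", and after the permutation the next sample is "c" and "d", so the two samples do not overlap.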

  26. Method: Sample co-ordination using SRS with PRNs (VI) • The Cotton & Hesse method: • Can be used only for negative co-ordination. • Is based on permutation of the RNs. • Allows the use of different stratifications when co-ordinating stratified samples. • A minimum of the expected overlap between two successive stratified samples is guaranteed. • Can be used to co-ordinate sampling units of different types, e.g. enterprises and establishments.

  27. Method: Sample co-ordination using Poisson sampling with PRNs (I) • Implemented at SFSO (Qualité, 2009). • Extension of the method of Brewer et al. (1972). • Algorithm: • For each survey, one defines for each unit a zone of selection (can be a union of disjoint intervals). • The total length of the zone of selection corresponds to the inclusion probability for that unit. • A unit is selected if its PRN falls within its zone of selection.
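The selection rule on this slide reduces to an interval membership test. The sketch below shows only that rule; how the zones are constructed to achieve the desired coordination is the substance of the method and is not reproduced here. The function names and the zones are hypothetical, and representing intervals as half-open [lo, hi) is an assumption.

```python
def zone_length(zone):
    """Total length of a zone given as a list of disjoint (lo, hi) intervals.

    This length is the unit's inclusion probability in the survey.
    """
    return sum(hi - lo for lo, hi in zone)


def in_zone(prn, zone):
    """True if the PRN falls inside one of the zone's intervals."""
    return any(lo <= prn < hi for lo, hi in zone)


def poisson_prn_sample(prns, zones):
    """Select every unit whose PRN falls within its own zone of selection."""
    return [unit for unit in prns if in_zone(prns[unit], zones[unit])]


# One unit with PRN 0.42 and three illustrative zones, one per survey.
prn = 0.42
zone_survey_1 = [(0.0, 0.5)]              # inclusion probability 0.5
zone_survey_2 = [(0.5, 0.8)]              # disjoint from survey 1
zone_survey_3 = [(0.0, 0.2), (0.9, 1.0)]  # union of two intervals, length 0.3

selected_1 = in_zone(prn, zone_survey_1)  # PRN lies in [0.0, 0.5)
selected_2 = in_zone(prn, zone_survey_2)  # PRN lies outside [0.5, 0.8)
selected_3 = in_zone(prn, zone_survey_3)  # PRN lies in neither interval
```

Zones that overlap across surveys give positive coordination for the unit, and disjoint zones give negative coordination, because a single PRN decides all selections.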

  28. Method: Sample co-ordination using Poisson sampling with PRNs (II) • Advantages: • Theoretically simple and easy to implement. • Dynamic populations are easily handled. • Disadvantages: • The random sample size. • Previously, stratified sampling was used at SFSO. • Optimal allocation procedures do not need to be modified, except for small sampling strata because of the risk of selecting an empty sample.
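The risk of an empty sample in a small stratum can be quantified directly, because under Poisson sampling units are selected independently. The sketch below computes the expected sample size and the probability of selecting nobody; the function names and the stratum are hypothetical.

```python
import math


def expected_sample_size(probs):
    """Expected size of a Poisson sample: the sum of inclusion probabilities."""
    return sum(probs)


def prob_empty_sample(probs):
    """Probability that no unit is selected: the product of (1 - pi_k)."""
    return math.prod(1.0 - p for p in probs)


# Hypothetical small stratum: 10 units, each with inclusion probability 0.2.
stratum = [0.2] * 10
size = expected_sample_size(stratum)   # about 2 units on average
risk = prob_empty_sample(stratum)      # 0.8 ** 10, roughly 0.107
```

Even with an expected size of two units, roughly one draw in nine would leave this stratum empty, which illustrates why small strata need special treatment.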

  29. Example of co-ordination (I) • We consider the selection of a unit in 6 samples (PRN equal to 0.42). We have: • The inclusion probability (pi). • The desired types of coordination: negative (N) or positive (P). • Two panels: samples 1, 3 and 6 are three waves of panel 1 and samples 2 and 5 are two waves of panel 2. • Sample 4 is for a survey conducted only once. • Positive coordination in a panel has a higher priority than negative coordination with the other samples.

  30. Example of co-ordination (II)

  31. Selection zones

  32. Discussion • Sample design and selection: • The sample design determines a survey’s characteristics such as cost, variance and respondent burden. • Sample co-ordination: • An important tool for spreading the response burden. • Higher precision in estimates over time. • A co-ordination system provides a common sampling frame for all surveys. • Sample rotation: • Reducing response burden in periodic surveys.

  33. References (I) • Brewer, K., Early, L., and Joyce, S. (1972). Selecting several samples from a single population, Australian Journal of Statistics, 3:231-239. • Cochran, W.G. (1977). Sampling Techniques, Wiley, New York. • Cotton, F. and Hesse, C. (1992b). Tirages coordonnés d'échantillons, Technical report, INSEE, Paris. • Deville, J.-C. and Tillé, Y. (2004). Efficient balanced sampling: the cube method, Biometrika, 91:893-912. • Falorsi, P. D. and Righi, P. (2008). A Balanced Sampling Approach for Multi-way Stratification Designs for Small Area Estimation, Survey Methodology, 34:223-234. • Hartley, H. and Rao, J. (1962). Sampling with unequal probabilities and without replacement, Annals of Mathematical Statistics, 33:350-374. • Hesse, C. (1999). Sampling co-ordination: A review by country. Technical Report E9908, Direction des Statistiques d'Entreprises, INSEE, Paris.

  34. References (II) • Knaub, J.R., Jr. (2008). Cutoff Sampling, In Encyclopedia of Survey Research Methods (ed. P.J. Lavrakas), Sage, London. • Lindblom, A. (2003). SAMU - The system for coordination of frame populations and samples from the Business Register at Statistics Sweden, Background Facts on Economic Statistics 2003:3, Statistics Sweden. • Ohlsson, E. (1992). The system for co-ordination of samples from the business register at Statistics Sweden. R&D report 1992:18, Statistics Sweden. • Qualité, L. (2009). Unequal probability sampling and repeated surveys. Ph.D. thesis, University of Neuchâtel, Switzerland (http://doc.rero.ch/record/12284). • Sigman, R. S. and Monsour, N. J. (1995). Selecting Samples from List Frames of Businesses, In Cox, B. G. et al., editors, Business Survey Methods, chapter 8, pages 133-152, Wiley, Inc., New York, USA.
