A Probability Sample Strategy for improving the quality of the Consumer Price Index Survey using the Information of the Business Register. Luigi Biggeri, Piero Demetrio Falorsi, National Statistical Institute of Italy (ISTAT)
Summary • The presentation describes a proposal for a new sampling strategy for the Italian CPI survey, aiming to address some of the problems of the current design, which is based on purposive sampling and can therefore introduce bias in the estimates. A complex multi-stage random pps (probability proportional to size) sampling scheme is proposed, in which the inclusion (or selection) probabilities at the different stages are proportional to turnover. • Two relevant innovations proposed here concern the procedure for selecting elementary items and the estimation procedure, which is based on an observational strategy that makes it possible: (i) to calculate proxy values of the weights w, which are unknown at the elementary item level; (ii) to define a consistent estimation method by which the national CPI estimate is obtained as a weighted sum of the estimates of the subpopulation indices.
Summary • The Current CPI construction: characteristics and issues • Analysis and new studies • A proposal for a probability sampling strategy • Sampling frame and design • Estimation method • Concluding remarks
The Laspeyres-type index, where: P is the price; y the year; m the month; a = geographic area; c = local district; v = outlet; j = elementary item. 1. The Current CPI construction: characteristics and issues (a)
The current purposive sample strategy of the CPI survey. The collection of prices for a fixed basket of 562 representative products (purposively chosen) is carried out in two different ways: (a) centrally (roughly 60 products), by Istat staff through specific sample procedures; (b) locally (roughly 500 products), directly by staff of the Municipal Statistical Offices involved in the survey. Local survey, three sampling stages: the first-stage units (PSUs) are the chief towns of provinces (86 municipalities out of 103); the second-stage units are the outlets (roughly 40,000), purposively chosen in December of each year in each PSU to be representative of consumer behaviour, as a kind of quota sampling; the most sold elementary items of the fixed basket of products (roughly 400,000), chosen in December of each year, are observed in each selected outlet. 1. The Current CPI construction: characteristics and issues (b)
The elementary indexes are obtained at municipality level as unweighted geometric means. The national index is calculated by successive territorial aggregation of the elementary indexes, using weights at different levels based on population, national accounts data and the household expenditure survey. A CPI is also calculated for each sampled municipality. 1. The Current CPI construction: characteristics and issues (c)
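The two computation steps on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the price relatives, the municipality labels and the aggregation weights below are made up, and the function names `elementary_index` and `aggregate_index` are hypothetical, not Istat's.

```python
import math

def elementary_index(price_relatives):
    """Unweighted geometric mean of price relatives (current price / base price)."""
    logs = [math.log(r) for r in price_relatives]
    return math.exp(sum(logs) / len(logs))

def aggregate_index(indices, weights):
    """Weighted arithmetic mean of lower-level indices (Laspeyres-type aggregation)."""
    total = sum(weights)
    return sum(w * i for w, i in zip(weights, indices)) / total

# Hypothetical data: price relatives observed in two municipalities for one product.
relatives_by_municipality = {"A": [1.02, 1.05, 0.99], "B": [1.10, 1.00]}
# Hypothetical aggregation weights (e.g. shares of consumer expenditure).
weights = {"A": 0.7, "B": 0.3}

elementary = {m: elementary_index(r) for m, r in relatives_by_municipality.items()}
national = aggregate_index(
    [elementary[m] for m in elementary],
    [weights[m] for m in elementary],
)
print(elementary, national)
```

The geometric mean is applied only at the lowest level; every higher level is a weighted arithmetic mean, which is what makes the overall index of Laspeyres type.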
Some issues of the current survey. The current survey structure, based on a purposive sampling strategy, does not allow the accuracy of the estimates to be evaluated; attempts to evaluate the variance should be carried out. Not all the chief towns of provinces are included in the survey, and the small municipalities are not included at all. The selection criterion of the “most sold elementary item” of the product in each outlet could introduce unknown bias. The lack of adequately detailed information on household consumer expenditure prevents the use of weights at the elementary aggregate level and at municipal and regional level. 2. Analysis and new studies (a)
The need for experimental analysis. To get information on the importance of the possible biases, analyses and computations must be carried out by implementing adequate experiments. To evaluate and improve the quality of the Italian CPIs, last year Istat established a Scientific Committee that is reviewing the different aspects of the index construction process. The Committee has stressed the need to study and verify the feasibility of constructing a probabilistic sampling strategy. 2. Analysis and new studies (b)
The proposal is tailored to the survey of prices collected locally. It relies on the recent availability of a yearly updated business register of local units (outlets), and on the possibility of estimating the turnover of each outlet for each product, to be used for the construction of weights. The proposed survey framework, based on a probability sample strategy, guarantees unbiased estimates and should deal with most of the issues mentioned. The sample design consists of a three-stage selection scheme (local districts, outlets and items) using probabilities proportional to turnover, used as a proxy for consumer expenditure. The index estimation is based on an observational scheme that yields proxy measures of the weights. A generalised regression estimator is used. The calculated indexes are coherent across different estimation domains (planned or not). 3. A proposal for a probability sampling strategy
The parameter of interest is the national price index, with c = local district, v = outlet, d = type of product, j = item. It is defined from the price index of item (d,j,c,v) and the weight of item (d,j,c,v), expressed in terms of sales in the base period. 4. Sampling frame and design (a)
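The formula on this slide did not survive extraction. A plausible form, consistent with the summary's description of the national index as a weighted sum, is sketched below. The symbols I and w and the normalisation of the weights to one are assumptions, not the authors' notation.

```latex
I \;=\; \sum_{d}\sum_{c}\sum_{v}\sum_{j} w_{djcv}\, I_{djcv},
\qquad
\sum_{d}\sum_{c}\sum_{v}\sum_{j} w_{djcv} \;=\; 1 ,
```

Here I_{djcv} stands for the price index of item (d,j,c,v) and w_{djcv} for its weight, i.e. its share of sales in the base period.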
PSUs: local districts (municipalities in Italy) are selected within the geographical area through balanced sampling, which aims to define a sample whose direct estimates of the totals of some auxiliary variables equal the known totals (Deville and Tillé, 2004). SSUs: the sampling design for the outlets consists of linking D distinct samples, one for each type of product (TP). The outlet selection is made through a coordinated selection technique based on permanent random numbers (PRN), aiming at a high level of overlap among the samples selected for each type of product; this reduces the total number of sampled outlets for a given number of observed items (Ohlsson, 1995). FINAL UNITS: a probability sampling scheme for item selection is proposed, based on iterative hierarchical drawing of groups of products. Such a scheme is feasible and solves the current problem of defining the fixed basket of products. 4. Sampling frame and design (b)
Planned domains for the survey estimates: the most detailed domain is the geographical area by Type of Product (TP), where a TP is an element of the four-digit COICOP classification. 4. Sampling frame and design (b)
Sampling frame: Local Unit Archive, updated yearly. The information contained in the archive (display surface area, number of employees, economic activity code, geographical zone) is used for stratification by size, outlet typology, etc. The NACE code may make it possible to establish which outlets sell the TP items to households. A table linking NACE codes and types of products has been constructed. 4. Sampling frame and design (c)
4. Sampling frame and design (d) Table 1. Example of table linking Types of products and NACE codes
Sampling frame construction: definition of turnover. Outlet turnover: taken from the business register, it is known exactly only for enterprises with a single local unit; otherwise it is imputed using different data sources. Turnover by outlet and type of product: estimated using different data sources (fiscal data, business register, National Accounts, Household Budget Survey). Note that possible imputation errors do not introduce bias into the sampling strategy; they can only increase the variance. 4. Sampling frame and design (e)
Local district selection. The sample local districts are drawn from the local districts of area a by means of a balanced sampling design with inclusion probabilities proportional to turnover. The balancing equations are defined on the overall turnover of the c-th local district for the d-th type of product, calculated by summing up frame data. 4. Sampling frame and design (f)
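The equations on this slide were lost in extraction. As a hedged sketch, the standard form of inclusion probabilities proportional to size and of balancing equations in the sense of Deville and Tillé (2004) would read as follows. All symbols are assumptions: n_a is the number of sampled districts in area a, U_a and s_a are the population and the sample of districts in area a, t_{dc} is the frame turnover of district c for type of product d, and t_c is its sum over d.

```latex
\pi_c \;=\; n_a \,\frac{t_c}{\sum_{c' \in U_a} t_{c'}},
\qquad
\sum_{c \in s_a} \frac{t_{dc}}{\pi_c} \;=\; \sum_{c \in U_a} t_{dc}
\quad (d = 1, \dots, D).
```

The second relation states that the Horvitz-Thompson estimate of each turnover total, computed on the selected districts, reproduces the known frame total.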
Outlet selection. Separate samples are selected, one for each type of product. Each sample is drawn through a PRN coordination technique that maximises the overlap of the outlets selected for the different types of products. In the sample selected for the generic type of product d (d=1,…,D), the outlets are stratified by typology within the local district. The final inclusion probability of an outlet is proportional to its size in terms of turnover for the d-th type of product in the area. 4. Sampling frame and design (g)
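A minimal sketch of how permanent random numbers produce overlapping samples is given below. It uses Poisson pps sampling, which is the simplest PRN scheme and is an assumption here: the slides do not say which PRN variant is proposed, and stratification by outlet typology is left out. Outlet identifiers, turnover values and function names are all hypothetical.

```python
import random

def assign_prn(outlet_ids, seed=0):
    """Give every outlet one permanent random number in [0, 1), reused for all samples."""
    rng = random.Random(seed)
    return {v: rng.random() for v in outlet_ids}

def pps_probabilities(turnover, n):
    """Inclusion probabilities proportional to turnover, capped at 1 (no re-scaling step)."""
    total = sum(turnover.values())
    return {v: min(1.0, n * t / total) for v, t in turnover.items()}

def select_with_prn(prn, probs):
    """Poisson pps selection: an outlet is sampled when its PRN falls below its probability."""
    return {v for v, p in probs.items() if prn[v] < p}

# Hypothetical frame: 20 outlets with made-up turnover for two types of product.
outlets = [f"outlet_{i}" for i in range(20)]
prn = assign_prn(outlets, seed=42)
rng = random.Random(7)
turnover_by_tp = {
    "sport": {v: rng.uniform(1, 100) for v in outlets},
    "food": {v: rng.uniform(1, 100) for v in outlets},
}

# One sample per type of product, all driven by the same permanent random numbers.
samples = {
    d: select_with_prn(prn, pps_probabilities(t, n=5))
    for d, t in turnover_by_tp.items()
}
all_outlets = set().union(*samples.values())
print({d: len(s) for d, s in samples.items()}, len(all_outlets))
```

Because the same number is compared with each product-specific probability, an outlet with a small PRN tends to enter several samples at once, so the union of visited outlets stays small. Poisson sampling gives a random sample size; a fixed-size variant would need a different selection rule.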
Item selection (1). In order to perform the probability selection of the items, the main operational difficulty is constructing the list of all items sold in the outlet that belong to the type of product for which the outlet was included in the sample. One way to overcome this difficulty is to define: a hierarchical tree classification of elementary products for each type of product; a selection procedure for each level of this structure. The procedure should be translated into a specific algorithm, implemented on the laptop used by the interviewer for data collection. This operation quickly identifies a very small subset of homogeneous items to be used for the item selection in the outlet. 4. Sampling frame and design (h)
Example of a hierarchical tree classification, TP = Equipment for Sport. Level 1: Swimming, Skiing, Body Building, Other Sports. Level 2: Swimming, Water Polo, Downhill skiing, Cross-country skiing, Body Building, Tennis, Football. Level 3: Swimsuit, Flippers, Masks, Water polo equipment, Ski, Ski boots, Skiwear, Ski, Ski boots, Skiwear, Body Building, Sportswear, Tennis racket, Sportswear, Football. Level 4: Swimsuit, Flippers, Water polo equipment, Ski, Ski boots, Snowsuit, Gloves, Ski, Ski boots, Snowsuit, Gloves, Body Building, Sportswear, Tennis racket, Undershirt, Short trousers, Sneakers, Football.
Item selection (2). At each level, the item selection procedure uses inclusion probabilities defined on the basis of information available in the sampled outlet or available a priori as auxiliary information. The optimal situation would occur if the probabilities used at each level were proportional to the turnover of the unit relative to the total turnover, in the outlet, of the set of units among which the selection is carried out at that level. Probability selection makes it possible to define unbiased estimators. The efficiency of the estimates depends on the kind of selection probabilities used. 4. Sampling frame and design (i)
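The level-by-level drawing can be illustrated with a small Python sketch. It assumes a single item is drawn per outlet and that a turnover proxy is available for every node of the tree; the tree, the turnover figures and the function names are hypothetical and only loosely follow the sports equipment example.

```python
import random

# Hypothetical tree: child name -> (turnover proxy, subtree), with None marking a leaf item.
tree = {
    "Swimming": (30.0, {"Swimsuit": (20.0, None), "Flippers": (10.0, None)}),
    "Skiing": (50.0, {"Ski": (25.0, None), "Ski boots": (15.0, None), "Skiwear": (10.0, None)}),
    "Other Sports": (20.0, {"Tennis racket": (12.0, None), "Football": (8.0, None)}),
}

def draw_item(node, rng):
    """Walk down the tree, drawing one child per level with probability proportional to turnover."""
    path, prob = [], 1.0
    while node is not None:
        names = list(node)
        sizes = [node[name][0] for name in names]
        chosen = rng.choices(names, weights=sizes, k=1)[0]
        prob *= node[chosen][0] / sum(sizes)  # conditional selection probability at this level
        path.append(chosen)
        node = node[chosen][1]
    return path, prob

def leaf_probabilities(node, prefix=(), prob=1.0):
    """Selection probability of every leaf item: product of the level-wise probabilities."""
    total = sum(size for size, _ in node.values())
    out = {}
    for name, (size, child) in node.items():
        p = prob * size / total
        if child is None:
            out[prefix + (name,)] = p
        else:
            out.update(leaf_probabilities(child, prefix + (name,), p))
    return out

path, prob = draw_item(tree, random.Random(1))
print(path, prob)
```

The interviewer only needs the turnover shares of the few nodes met along the path, never the full list of items sold in the outlet, while the selection probability of the drawn item remains known as a product of level-wise shares.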
Final inclusion probabilities. The sampling scheme is implemented by giving each item an inclusion probability proportional to the ratio between the item turnover and the overall turnover at the level of the d-th TP and area a. This shows that the proposed sample design is approximately self-weighting. 4. Sampling frame and design (l)
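The expression referred to on this slide is missing from the extracted text. The following short derivation sketches why such a design would be approximately self-weighting. The symbols are assumptions: t_{djcv} is the item turnover, t_{da} the overall turnover for type of product d in area a, n_{da} the number of items sampled in that domain, and t the national turnover used to normalise the weights.

```latex
\pi_{djcv} \;=\; n_{da}\,\frac{t_{djcv}}{t_{da}},
\qquad
w_{djcv} \;\approx\; \frac{t_{djcv}}{t}
\quad\Longrightarrow\quad
\frac{w_{djcv}}{\pi_{djcv}} \;\approx\; \frac{t_{da}}{n_{da}\, t}.
```

The ratio on the right does not depend on j, c or v, so within each domain (d, a) every sampled item enters the estimate with roughly the same factor.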
In the estimation phase it is useful to express the weight through a factorisation. A proxy observable value of this weight can therefore be calculated by replacing the unknown factors with their imputed values. 5. Estimation method (a)
The general index estimate can be obtained by means of the generalised regression estimator proposed by Valliant (1999), which relies on a model. In this way the sample estimates equal the population totals, whether known or estimated from external sources (Household Budget Survey, National Accounts). 5. Estimation method (b)
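The calibration property stated on this slide (sample estimates reproducing known totals) can be shown with a deliberately simplified generalised regression estimator. The sketch assumes a single auxiliary variable and a linear model through the origin with constant variance; this is not the model of Valliant (1999), which is not recoverable from the slide, and the data and function names are made up.

```python
def greg_weights(design_weights, x, x_total):
    """GREG weights for one auxiliary variable x under a no-intercept linear model."""
    ht_x = sum(d * xi for d, xi in zip(design_weights, x))       # Horvitz-Thompson total of x
    sxx = sum(d * xi * xi for d, xi in zip(design_weights, x))   # weighted sum of squares of x
    adjustment = (x_total - ht_x) / sxx
    return [d * (1.0 + adjustment * xi) for d, xi in zip(design_weights, x)]

def greg_total(design_weights, x, y, x_total):
    """GREG estimate of the total of y, using the known total of x."""
    w = greg_weights(design_weights, x, x_total)
    return sum(wi * yi for wi, yi in zip(w, y))

# Hypothetical sample of three units with design weights 1/pi.
d = [10.0, 10.0, 10.0]
x = [1.0, 2.0, 3.0]          # auxiliary variable, e.g. turnover proxy
y = [1.1, 2.3, 2.9]          # study variable
X_TOTAL = 70.0               # total of x assumed known from an external source

w = greg_weights(d, x, X_TOTAL)
print(sum(wi * xi for wi, xi in zip(w, x)), greg_total(d, x, y, X_TOTAL))
```

By construction the adjusted weights reproduce the known total of the auxiliary variable exactly, which is the mechanism that keeps estimates for different domains coherent with the external totals.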
The proposed strategy is coherent with current Italian practice: the sample of elementary items and outlets is updated each year to take into account rapid changes in the products and in the outlet universe. Selecting outlets and items with permanent random number techniques makes it simple to update the samples yearly while guaranteeing a prefixed rotation rate (Ohlsson, 1995). Meanwhile, the sample of local districts, once selected, remains unchanged for several years. This is justified by cost considerations, connected with the high cost of training interviewers for the local districts, and by the fact that the structure of local districts changes very slowly over time. 6. Concluding remarks (a)
To verify the feasibility of the proposed probability sampling design, an experimental version of the frame has been implemented for testing various aspects of the sampling strategy. The outcomes of the experiments have been encouraging. An experimental selection of local districts (corresponding to municipalities) and outlets for the Italian survey has been carried out. Many other experiments have to be carried out to evaluate: (i) the feasibility and cost-efficient implementation of the proposed probability sampling strategy; (ii) the quality improvements that can be obtained using the proposed strategy only partially. 6. Concluding remarks (b)