A Probability Sample Strategy for improving the quality of the Consumer Price Index Survey using the Information of the Business Register
Luigi Biggeri, Piero Demetrio Falorsi
National Statistical Institute of Italy (ISTAT)
Where: P is the price; y the year; m the month;
a = geographic area; c = local district; v = outlet; j = elementary item
1. The Current CPI construction: characteristics and issues (a)
The collection of prices of a fixed basket of 562 representative products (purposively chosen) is carried out in two different ways:
(a) centrally (roughly 60 products), by Istat staff through specific sampling procedures
(b) locally (roughly 500 products), directly by the staff of the Municipal Statistical Offices involved in the survey.
Local survey: Three sampling stages:
The first stage units (PSUs) are the chief towns of provinces (86 municipalities out of 103)
The second stage units are the outlets (roughly 40,000), purposively chosen in December of each year in each PSU to be representative of consumer behaviour, as a kind of quota sampling
The most sold elementary items of the fixed basket of products (chosen in December of each year) are observed in each selected outlet (roughly 400,000)
1. The Current CPI construction: characteristics and issues (b)
The national index is calculated by successive territorial aggregation of elementary indexes, using weights at different levels based on population, national accounts data and the household expenditure survey
CPI for each sampled municipality is also calculated
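As a minimal sketch of this kind of step-wise aggregation, the Python below combines municipal indexes into regional indexes and then into a national index, each time as a weighted arithmetic mean. The function name `aggregate`, the town and region labels and all figures are hypothetical; the actual weights come from the sources listed above.

```python
def aggregate(indexes, weights):
    """Weighted arithmetic mean of the indexes, using the matching weights."""
    total_weight = sum(weights[k] for k in indexes)
    return sum(weights[k] * indexes[k] for k in indexes) / total_weight

# Hypothetical municipal indexes (base = 100) and expenditure weights.
municipal_index = {"town_A": 102.0, "town_B": 104.0, "town_C": 101.0}
municipal_weight = {"town_A": 30.0, "town_B": 10.0, "town_C": 60.0}
regions = {"region_1": ["town_A", "town_B"], "region_2": ["town_C"]}

# Step 1: aggregate municipalities into regions.
regional_index = {}
regional_weight = {}
for region, towns in regions.items():
    regional_index[region] = aggregate(
        {t: municipal_index[t] for t in towns}, municipal_weight
    )
    regional_weight[region] = sum(municipal_weight[t] for t in towns)

# Step 2: aggregate regions into the national index.
national_index = aggregate(regional_index, regional_weight)

# Direct aggregation over all municipalities, for comparison.
direct_index = aggregate(municipal_index, municipal_weight)
print(national_index, direct_index)
```

When each regional weight is the sum of the municipal weights it covers, the two-step result coincides with direct aggregation over all municipalities, which is what keeps indexes at different territorial levels consistent.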
1. The Current CPI construction: characteristics and issues (c)
The current survey structure, based on a purposive sampling strategy, does not allow the accuracy of the indexes to be evaluated; attempts to evaluate the variance should be carried out.
Not all the chief towns of provinces are included in the survey and the small municipalities are not included at all.
The selection criterion of the “most sold elementary item” of the product in each outlet could introduce an unknown bias
The lack of adequately detailed information on household consumer expenditure prevents the use of weights at the elementary aggregate level and at the municipal and regional levels
2. Analysis and new studies (a)
To assess the size of the possible biases, analyses and computations must be carried out through suitable experiments
To evaluate and improve the quality of the Italian CPIs, last year Istat established a Scientific Committee that is reviewing the different aspects of the index construction process. The Committee has stressed the need to study and verify the feasibility of constructing a probability sampling strategy.
2. Analysis and new studies (b)
Recent availability of a yearly updated business register of local units (outlets)
Possibility to estimate the turnover of each outlet for each product, to be used for construction of weights.
The proposed survey framework, based on a probability sampling strategy, guarantees unbiased estimates and should address most of the issues mentioned above
The sample design consists of a three-stage selection scheme (local districts, outlets and items) with probabilities proportional to turnover, used as a proxy for consumer expenditure.
The index estimation is based on an observational scheme that yields proxy measures of the weights. A generalised regression estimator is used. The indexes calculated for different estimation domains (planned or not) are coherent with one another
3. A proposal for a probability sampling strategy
c = local district, v = outlet, d = type of product, j = item
price index of item (d,j,c,v)
weight of item (d,j,c,v), in terms of value sold in the base period
4. Sampling frame and design (a)
SSUs: The sampling design for the outlets links D distinct samples, one for each type of product (TP). The outlets are selected through a coordinated selection technique based on permanent random numbers (PRN), which aims at a high degree of overlap among the samples selected for the different types of product, so that the total sample of outlets is smaller for a given number of observed items (Ohlsson, 1995)
FINAL UNITS: A probability sampling scheme for item selection is proposed, based on the iterative hierarchical drawing of groups of products. Such a scheme is feasible and solves the current problem of defining the fixed basket of products
4. Sampling frame and design (b)
The most detailed domain is the geographical area by type of product (TP), where a TP is an element of the four-digit COICOP classification
The information contained in the archive (display floor area, number of employees, economic activity code, geographical zone) is used for stratification by size, outlet typology, etc.
The NACE code may make it possible to establish which outlets sell the TP items to households
A table linking NACE codes and types of products has been constructed
4. Sampling frame and design (c)
Table 1. Example of table linking Types of products and NACE codes
outlet turnover: taken from the business register; it is known exactly only for enterprises with a single local unit, otherwise it is imputed using different data sources
turnover by outlet and type of product: estimated using different data sources (fiscal data, business register, National Accounts, Household Budget Survey)
Note that possible imputation errors do not bias the sampling strategy; they can only increase the variance
4. Sampling frame and design (e)
Sample local districts are drawn from the local districts of area a by means of a balanced sampling design with inclusion probabilities proportional to the turnover:
The balancing equations are
where is the overall turnover of the c-th local district for the d-th type of product calculated by summing up frame data
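A balanced design with these properties can be sketched as follows. All symbols are assumed notation introduced here for illustration, and taking the inclusion probability proportional to the district turnover summed over all types of product is likewise an assumption.

```latex
% Assumed notation (not taken from the source):
%   U_a     local districts of area a;  s_a  the selected sample
%   m_a     number of local districts to be selected in area a
%   X_{dc}  turnover of district c for type of product d (summed from frame data)
%   X_c     = \sum_{d=1}^{D} X_{dc}, overall turnover of district c
\pi_c = m_a \, \frac{X_c}{\sum_{c' \in U_a} X_{c'}}, \qquad c \in U_a ,
\qquad\qquad
\sum_{c \in s_a} \frac{X_{dc}}{\pi_c} = \sum_{c \in U_a} X_{dc}, \qquad d = 1, \dots, D .
```

The second set of equalities states the balancing condition: for every type of product, the expanded sample turnover reproduces the frame total of the area.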
4. Sampling frame and design (f)
Separate samples are drawn (one for each type of product).
Each sample is drawn through a PRN coordination technique that maximises the overlap of the outlets selected for the different types of products
In the sample selected for the generic type of product d (d=1,…,D) the outlets are stratified by typology within the local district
The outlet final inclusion probability is defined as proportional to the outlet size in terms of turnover for the d-th type of product in the area.
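The idea of PRN coordination can be illustrated with a small Python sketch. It uses Poisson sampling, which is only one of several PRN-based schemes and is an assumption here, not necessarily the technique adopted in the proposal; stratification is left out, and the outlet labels, turnover figures and function names are hypothetical.

```python
import random

def assign_prn(outlet_ids, seed=2024):
    """Give each outlet one permanent random number (PRN) in [0, 1)."""
    rng = random.Random(seed)
    # Sorting makes the assignment reproducible whatever the input order.
    return {o: rng.random() for o in sorted(outlet_ids)}

def inclusion_probs(turnover, n):
    """Probabilities proportional to turnover, capped at 1.

    Simplification: no re-scaling of the remaining units after capping.
    """
    total = sum(turnover.values())
    return {o: min(1.0, n * t / total) for o, t in turnover.items()}

def draw(prn, probs):
    """Poisson sampling: an outlet is selected when its PRN is below its probability."""
    return {o for o, p in probs.items() if prn[o] < p}

# Hypothetical turnover of the outlets for two types of product.
turnover_bread = {"v1": 50.0, "v2": 30.0, "v3": 15.0, "v4": 5.0}
turnover_milk = {"v1": 40.0, "v2": 35.0, "v4": 20.0, "v5": 5.0}

# The same PRN is reused for every type of product.
prn = assign_prn(set(turnover_bread) | set(turnover_milk))

probs_bread = inclusion_probs(turnover_bread, n=2)
probs_milk = inclusion_probs(turnover_milk, n=2)
sample_bread = draw(prn, probs_bread)
sample_milk = draw(prn, probs_milk)

print(sorted(sample_bread), sorted(sample_milk), sorted(sample_bread & sample_milk))
```

Because each outlet keeps the same random number in all D selections, an outlet drawn for one type of product is also drawn for every other type for which its inclusion probability is at least as large, which is what drives the overlap. Note that Poisson sampling yields a random sample size.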
4. Sampling frame and design (g)
In order to perform the probability selection of the items, the main operational difficulty is the construction of the list of all items sold in the outlet belonging to the type of product for which the outlet has been included in the sample
A way to solve such a difficulty is to define:
A hierarchical tree classification of elementary products for each type of product;
A selection procedure for each level of this structure.
The procedure should be translated into a specific algorithm, implemented on the laptop used by the interviewer for the data collection. This operation makes it possible to identify quickly a very small subset of homogeneous items to be used for the item selection in the outlet.
4. Sampling frame and design (h)
(Example of the hierarchical tree classification; node shown: Water polo equipment)
The item selection procedure uses, at each level, inclusion probabilities defined on the basis of information available in the sampled outlet or available as a priori auxiliary information.
The optimal situation would occur if the probabilities used at each level were proportional to the turnover of the unit with respect to the total turnover of the outlet for the set of units among which the selection has to be carried out at the specific level.
Probability selection makes it possible to define unbiased estimators
The efficiency of the estimates depends on the kind of the selection probabilities used
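A level-by-level selection of this kind could look like the following Python sketch, which draws a single item by walking down the tree and choosing one branch at each level with probability proportional to turnover. The tree, its turnover figures and the helper names (`leaf`, `group`, `select_item`) are hypothetical; only the label "Water polo equipment" comes from the example above.

```python
import random

def leaf(name, turnover):
    """An elementary item with its (hypothetical) turnover in the outlet."""
    return {"name": name, "turnover": turnover, "children": []}

def group(name, children):
    """A group of products; its turnover is the sum over its children."""
    total = sum(child["turnover"] for child in children)
    return {"name": name, "turnover": total, "children": children}

def select_item(node, rng):
    """Walk down the tree, drawing one child per level with probability
    proportional to turnover. Returns the item name and the probability
    of the whole path (the product of the level probabilities)."""
    prob = 1.0
    while node["children"]:
        children = node["children"]
        weights = [child["turnover"] for child in children]
        chosen = rng.choices(children, weights=weights, k=1)[0]
        prob *= chosen["turnover"] / sum(weights)
        node = chosen
    return node["name"], prob

tree = group("Sports equipment", [
    group("Water sports", [
        leaf("Water polo equipment", 2.0),
        leaf("Swimming goggles", 6.0),
    ]),
    group("Ball games", [
        leaf("Footballs", 12.0),
    ]),
])

item, prob = select_item(tree, random.Random(1))
print(item, prob)
```

When the turnover of every group equals the sum of its children, the path probability collapses to the item turnover divided by the total turnover of the tree, which is the optimal situation described above. With less precise information at some level the draw stays a valid probability selection as long as the probabilities used are recorded, but the estimates become less efficient.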
4. Sampling frame and design (i)
The sampling scheme is implemented by giving each item an inclusion probability proportional to the ratio between the item turnover and the overall turnover, at the level of the d-th TP and area a.
This expression shows that the proposed sample design is approximately self-weighting
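Why a design of this kind is self-weighting can be seen from a short derivation in assumed notation (the symbols are introduced here for illustration and are not taken from the source). If the weight of an item and its inclusion probability are both proportional to its turnover, they cancel in the expansion estimator:

```latex
% Assumed notation:
%   t_k     turnover of item k in the base period
%   T_{da}  = \sum_{k \in U_{da}} t_k, total turnover for TP d in area a
%   n_{da}  number of items sampled for TP d in area a;  s_{da} the sample
%   I_k     price index of item k
w_k = \frac{t_k}{T_{da}}, \qquad \pi_k = n_{da}\,\frac{t_k}{T_{da}}
\quad\Longrightarrow\quad
\hat{I}_{da} = \sum_{k \in s_{da}} \frac{w_k \, I_k}{\pi_k}
             = \frac{1}{n_{da}} \sum_{k \in s_{da}} I_k .
```

The cancellation is exact only when the inclusion probabilities are exactly proportional to the true weights; with imputed turnover it holds approximately.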
4. Sampling frame and design (l)
Therefore, a proxy observable value of this weight can be calculated as
5. Estimation method (a)
where are respectively the imputed values of
The expression of the estimator is
In this way the sample estimates equal the population totals, known or estimated from external sources (Household Budget Survey, National Accounts).
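The property that the estimates reproduce known totals can be illustrated with linear calibration, the weighting form in which a generalised regression estimator is commonly written. The Python sketch below uses a single auxiliary variable; the function name `calibrate`, the choice of auxiliary variable and all figures are hypothetical.

```python
def calibrate(design_weights, x, x_total):
    """Linear calibration with one auxiliary variable x.

    Returns weights w_k = d_k * (1 + lam * x_k), with lam chosen so that
    sum(w_k * x_k) equals the known population total x_total.
    """
    x_hat = sum(d * xi for d, xi in zip(design_weights, x))
    denom = sum(d * xi * xi for d, xi in zip(design_weights, x))
    lam = (x_total - x_hat) / denom
    return [d * (1.0 + lam * xi) for d, xi in zip(design_weights, x)]

# Hypothetical sample of three units.
design_weights = [10.0, 10.0, 20.0]   # inverse inclusion probabilities
turnover = [5.0, 8.0, 2.0]            # auxiliary variable observed in the sample
known_total = 190.0                   # total known from an external source

weights = calibrate(design_weights, turnover, known_total)
calibrated_total = sum(w * xi for w, xi in zip(weights, turnover))
print(weights, calibrated_total)
```

The design-weighted total of the auxiliary variable is 170 in this example, and the calibrated weights move it to the external figure of 190 while staying as close as possible to the design weights.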
5. Estimation method (b)
Meanwhile, the sample of local districts, once selected, remains unchanged for several years. This is justified by cost considerations, namely the high cost of training the interviewers for the local districts, and by the fact that the structure of local districts changes very slowly over time.
6. Concluding remarks (a)
Many other experiments have to be carried out to evaluate: (i) the feasibility and the cost-efficient implementation of the proposed probability sampling strategy; (ii) the quality improvements that can be obtained using only partially the proposed strategy.
6. Concluding remarks (b)