
Alfredo Cirianni, Fernanda Panizon, Roberto Gismondi

Data Editing and its Effects on Estimates: a Short-term Business Surveys Appraisal. Alfredo Cirianni, Fernanda Panizon, Roberto Gismondi. Q2008, Roma, 9 July 2008.


Presentation Transcript


  1. Data Editing and its Effects on Estimates: a Short-term Business Surveys Appraisal. Alfredo Cirianni, Fernanda Panizon, Roberto Gismondi. Q2008, Roma, 9 July 2008.

  2. Summary of this presentation
     • Main features of short-term business surveys on other services.
     • Review of non-sampling errors.
     • Strategy of editing and imputation procedures:
        • methods for imputing missing values;
        • macroediting approach for detecting influential units;
        • methods for detecting and dealing with influential outliers.
     • Effects of editing and imputation procedures on final estimates.

  3. Main features of the surveys
     • Surveys are repeated each quarter.
     • Main parameter: year-to-year quarterly change of turnover, and index numbers, in specific service sectors.
     • Target population: active firms in the field of reference, drawn from the business register ASIA, which is released with a delay of 18 months. ASIA contains information on turnover, employees and other variables used in the sample design. The archive is also used to refresh the sample and for the periodic rotation of ineligible or never-responding units.
     • Sample design: panel updated periodically to reflect business demography and to take attrition into account.
     • Different kinds of surveys: cut-off for oligopolistic sectors (air and maritime transport; postal and telecommunications services); stratified random sample for wholesale trade and computer services; stratified balanced sample for maintenance and repair of vehicles.
     • Sample size: 8,000 firms for wholesale, 2,700 for maintenance, 1,800 for computer services, 600 for oligopolistic sectors (about 13,000 businesses in the theoretical sample).
     • Variables in the questionnaire: turnover (net of VAT) as the target variable and number of employees (auxiliary variable used as stratifier).
     • Methods of data collection: fax, mail, web.

  4. Main kinds of non-sampling errors
     • Errors in the list of firms in ASIA: incorrect addresses and ineligible units. These are corrected using sources such as the "white pages" and documents from the Chambers of Commerce. If addresses cannot be recovered, the units are excluded from the sample (next year).
     • Measurement errors: errors in completing the questionnaire (turnover expressed in other units of measurement, data cumulated over several quarters, misinterpretation of the turnover definition, for example including ancillary revenues or not following the accrual principle, …). Such errors are corrected during micro-editing by calling the business back.
     • Processing errors: errors in the data capture phase (euro decimals not recognised by the system), unreadable faxes, or errors during data entry of questionnaires.
     • Non-response: specific strategy for each survey.
     • Detection and treatment of influential outliers: specific strategy for each survey.

  5. Strategies of data editing and imputation
     • The guidelines of the Methodological Manual (EUROSTAT) provide a wide range of methods for dealing with non-response and for the treatment of outliers.
     • Prevention of non-response: 1) e-mail reminder for panel businesses that have provided an e-mail address; 2) postal reminder for all; 3) phone reminder for the largest units only.
     • Treatment of non-response, basic idea: for small units, the missing at random (MAR) assumption; for large units, ad hoc methods.

  6. Imputation of missing data
     Methods for imputation:
     • Ratio method in each stratum (X = turnover in (t-4)) for computer services and small businesses in oligopolistic sectors, excluding firms with a year-to-year change (YC) over 50% in absolute terms.
     • Regression method in each stratum for maintenance (X = ASIA yearly turnover), excluding respondent units identified as anomalous by the Studentised residuals technique.
     • In some cases, ad hoc methods for dominant firms in oligopolistic sectors (use of auxiliary information: flows of goods and passengers by air transport) and validation with two yearly reviews.
     • No imputation for wholesale; this implies that non-responding units are implicitly assigned the average year-to-year change of the respondent enterprises (identified as not anomalous) belonging to the same stratum.
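A minimal sketch of the ratio method described on this slide may help fix ideas. It assumes the stratum ratio is the sum of current turnover over the sum of turnover in (t-4) among retained respondents; the slide does not give the exact estimator. The function name `ratio_impute`, the dictionary keys and the `imputed` flag are hypothetical; only the 50% exclusion rule comes from the slide.

```python
from collections import defaultdict


def ratio_impute(units, max_abs_change=0.5):
    """Impute missing current turnover with a stratum-level ratio.

    units: list of dicts with hypothetical keys
      'stratum', 'x' (turnover in t-4), 'y' (current turnover or None).
    Respondents with |YC| > max_abs_change are not used as donors.
    """
    sum_y = defaultdict(float)
    sum_x = defaultdict(float)
    for u in units:
        if u["y"] is None or u["x"] <= 0:
            continue  # non-respondent or unusable base value
        yc = u["y"] / u["x"] - 1.0  # year-to-year change
        if abs(yc) > max_abs_change:
            continue  # excluded from the ratio, as on the slide
        sum_y[u["stratum"]] += u["y"]
        sum_x[u["stratum"]] += u["x"]

    result = []
    for u in units:
        v = dict(u)
        s = u["stratum"]
        if u["y"] is None and sum_x[s] > 0:
            # Assumed estimator: x_i times the stratum ratio sum(y) / sum(x)
            v["y"] = u["x"] * sum_y[s] / sum_x[s]
            v["imputed"] = True
        else:
            v["imputed"] = False
        result.append(v)
    return result
```

Units in a stratum with no usable donors are left missing rather than imputed, which is one possible design choice; the source does not say how such cases are handled.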

  7. Macroediting for detecting influential businesses
     For non-oligopolistic sectors:
     • Macroediting is applied to identify influential units.
     • Identification of outliers: automatic approach (Hidiroglou-Berthelot) for large samples (wholesale trade) and an interactive approach with phone recalls for smaller samples (maintenance of vehicles and computer services).
     • Selection of influential strata through the MACRO score function.
     • Selection of influential businesses through the MICRO score function in each stratum.
     • Influential businesses fall into the tails of the distribution of the micro score function.

  8. Selection of influential strata through the MACRO score function
     $SF1_s = YC_s \cdot W_s$
     where: $SF1$ is the MACRO score function; $YC_s$ is the year-to-year change of turnover in stratum $s$; $W_s$ is the weight of stratum $s$ in its domain in the base year.
     $SF1$ expresses the contribution of the stratum to the estimate of the index of its domain. The largest $SF1_s$ (in absolute terms) are selected so as to cover 80% of the variation of the domain.
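The stratum selection can be sketched as follows. This is an illustration under a stated assumption: the "variation of the domain" is read here as the sum of the absolute macro scores, which the slide does not define precisely. The function name and the input layout are hypothetical.

```python
def select_influential_strata(strata, coverage=0.8):
    """Rank strata by |SF1| and keep the largest until `coverage` is reached.

    strata: dict mapping stratum name -> (yc, weight), where yc is the
    year-to-year change of turnover and weight the base-year stratum weight.
    Returns (scores, selected_names).
    """
    # Macro score: change of the stratum times its weight in the domain
    scores = {s: yc * w for s, (yc, w) in strata.items()}
    # Assumption: total variation = sum of absolute macro scores
    total = sum(abs(v) for v in scores.values())

    selected, covered = [], 0.0
    for s in sorted(scores, key=lambda k: abs(scores[k]), reverse=True):
        if total == 0 or covered >= coverage * total:
            break
        selected.append(s)
        covered += abs(scores[s])
    return scores, selected
```

Using absolute values keeps strata with large negative changes in the selection, in line with the slide's "largest in absolute terms".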

  9. Selection of influential businesses through the MICRO score function
     Units that fall in the tails of the distribution of the micro score function (SF2) and belong to influential strata are defined as influential on the basis of:
     $SF2_i = YC_i \cdot W_{i,(t-4)}$
     where: $SF2$ is the MICRO score function; $YC_i$ is the year-to-year change of turnover for business unit $i$; $W_{i,(t-4)}$ is the weight of business unit $i$ at time $(t-4)$.
     The sum of the micro score functions is the average change of turnover in the stratum.
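The additivity stated on this slide can be checked numerically. The sketch below assumes that the unit weight at (t-4) is the unit's share of stratum turnover at (t-4); under that assumption the micro scores sum exactly to the stratum's year-to-year change. The slide does not say how the tails are cut, so the rank-based rule in `tail_units` (the `k` lowest and `k` highest scores) is purely illustrative, and both function names are hypothetical.

```python
def micro_scores(prev, curr):
    """Micro score per unit within one stratum.

    prev, curr: dicts mapping unit id -> turnover at (t-4) and at t.
    """
    total_prev = sum(prev.values())
    scores = {}
    for i in prev:
        yc = curr[i] / prev[i] - 1.0   # year-to-year change of unit i
        w = prev[i] / total_prev       # assumed weight: turnover share at (t-4)
        scores[i] = yc * w
    return scores


def tail_units(scores, k=1):
    """Illustrative tail rule: the k lowest and k highest micro scores."""
    ranked = sorted(scores, key=scores.get)
    return set(ranked[:k]) | set(ranked[-k:])
```

For example, with turnover at (t-4) of 100, 300 and 600 and current turnover of 150, 300 and 540, the scores are about 0.05, 0 and -0.06, and their sum matches the stratum change of 990/1000 - 1.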

  10. Method for detection and treatment of influential outliers
     Automatic procedures for wholesale trade:
     • The Hidiroglou-Berthelot (H-B) method is implemented to detect anomalous turnover data.
     • Open problem: choosing the c parameter of the H-B method. In the current procedure c is equal to 1.5, and with this threshold 8% of businesses are identified as influential outliers.
     Interactive procedures for computer services and maintenance of vehicles:
     • Outliers: units with a year-to-year change (YC) above 50% in absolute terms.
     • In case of erratic trends, the unit is called back to verify and correct measurement errors.
     • Detected outliers are imputed as if they were missing values in the case of maintenance, and are instead excluded from the calculation in the case of computer services.
     • Outliers are units whose anomalous “development” is explained by events such as the effects of secondary activities, misclassifications, changes of stratum and billing problems.
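For readers unfamiliar with the H-B method, the sketch below follows its commonly described form: ratios of current to previous turnover are centred on their median with a symmetric transformation, scaled by unit size, and compared with quartile-based bounds. Only c = 1.5 comes from the slide. The size exponent `u = 0.5`, the guard `a = 0.05`, the quartile convention of `statistics.quantiles` and the function name are assumptions, not the authors' settings. Turnover values must be strictly positive.

```python
from statistics import median, quantiles


def hb_outliers(prev, curr, c=1.5, u=0.5, a=0.05):
    """Flag units outside the Hidiroglou-Berthelot acceptance interval.

    prev, curr: sequences of strictly positive turnover at (t-4) and t.
    Returns a list of booleans (True = flagged as outlier).
    """
    ratios = [y / x for x, y in zip(prev, curr)]
    r_med = median(ratios)

    # Symmetric transformation of the ratios around their median
    s = [1.0 - r_med / r if r < r_med else r / r_med - 1.0 for r in ratios]

    # Size effect: larger units get larger scores for the same ratio
    e = [si * max(x, y) ** u for si, x, y in zip(s, prev, curr)]

    e_med = median(e)
    q1, _, q3 = quantiles(e, n=4)

    # Guard term a * |e_med| avoids a degenerate interval when scores cluster
    d_q1 = max(e_med - q1, abs(a * e_med))
    d_q3 = max(q3 - e_med, abs(a * e_med))

    lower, upper = e_med - c * d_q1, e_med + c * d_q3
    return [not (lower <= ei <= upper) for ei in e]
```

Raising `c` widens the acceptance interval and so lowers the share of flagged units, which is the trade-off the slide refers to when discussing the choice of c.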

  11. Percentage of detected outliers as the acceptance threshold varies with the c parameter of the HB method – Wholesale trade

  12. Problem of defining the c parameter of H-B method in the automatic procedure - Wholesale trade

  13. Outliers in oligopolistic sectors
     • Ranking of the year-to-year quarterly changes in decreasing order of absolute value.
     • Review of time series and phone recalls to correct measurement and processing errors.
     • Temporary suspension from the process of businesses that appear anomalous in a single survey occasion; permanent exclusion from the panel when the anomalous behaviour persists over several occasions.

  14. Evaluation of automatic procedures for detection of the outliers in the wholesale trade sector

  15. Evaluation of automatic procedures for detection of the outliers in maintenance of vehicles

  16. Evaluation of automatic procedures for the detection of outliers in computer service sector

  17. Number of outliers (influential and not representative) excluded from calculation of computer service sector

  18. Conclusions and future developments
     • We are planning to use the automatic HB procedure for maintenance of vehicles as well, and to widen the HB acceptance threshold for wholesale trade, in order to reduce the number of outliers and the revision, that is, the difference between raw and final estimates.
     • Unsolved problem: the treatment of representative outliers remains to some extent subjective. The choice of excluding them, or of considering them self-representative, may produce biases.
     • Wider use of models, and a closer connection between the estimation strategy and the treatment of non-response and outliers.
     • Need to reduce the heterogeneity of the methods applied to short-term service data.
