## UNECE - Conference of European Statisticians Work Session on Statistical Data Editing


**The Editing Process in the Italian Short-Term Survey on Labour Cost based on Administrative Data**

Silvia Pacini (pacini@istat.it), M. Carla Congia, Donatella Tuzi
ISTAT - ITALY

UNECE - Conference of European Statisticians, Work Session on Statistical Data Editing
Vienna, Austria, 21 – 23 April 2008

**Outline**

• The main features of the Oros Survey
• The peculiarities of the administrative sources used and their main impact on the E&I process
  - Different from traditional surveys, where many non-sampling errors may be prevented or reduced ex ante during the planning phase
• The main steps of the Oros editing process
• Final remarks

**Main features of the Oros Survey (1)**

In Italy the Oros Survey is an innovative example of administrative data used extensively to produce short-term business statistics.

• Variables: gross wages, other labour costs, total labour cost
• Coverage: all Italian firms with at least one employee in the private non-agricultural sector
• Timeliness: 70 days from the end of the reference quarter

**Main features of the Oros Survey (2)**

Until 2002: gross wages and total labour cost were produced only by the Monthly Survey on Large Enterprises (LES), covering firms with more than 500 employees.

Since 2003: the Oros Survey has released indicators for all Italian private firms with at least one employee, using the administrative data collected by the Italian National Social Security Institute (INPS) integrated with LES data.

The Oros Survey: administrative data of INPS (100% of private employees) combined with LES data (20% of total private employees).

**The administrative sources (1)**

Administrative Register (AR)

• Structural information on administrative units (id number, fiscal code, birth date…)
• The AR represents the current population updated at the end of the reference quarter, but it suffers from over-coverage problems (temporary suspensions and firm closures are under-recorded)

Impact on the E&I process:

• Making the AR suitable for statistical purposes requires:
  - checks on the quality of the fiscal code (used as firm identification code)
  - drawing the NACE code from the Italian Statistical Business Register (BR-ASIA); 90% of the active INPS units are linked

**The administrative sources (2)**

Electronic Monthly Social Contribution Declarations (DM10 forms archive)

• RAW DATA transmitted to Istat 35 days after the end of the reference quarter. Because of the tight time constraints, these data are not subjected to previous aggregations and checks by INPS
• PROVISIONAL POPULATION used to produce provisional estimates (95-98% of the population used to produce final estimates 5 quarters later)
• 1.3 mln employers, 10 mln employees

Impact on the E&I process:

• preliminary checks
• a complex retrieval process of the statistical variables, based on a "metadata database" built ad hoc in-house
• the high number of units makes selective editing necessary

**Main steps of the Oros Survey E&I process**

• Monthly administrative micro data
• Preliminary checks and retrieval of statistical variables
• Micro editing
• Imputation of temporary employment agencies
• The large firms: checks for survey data integration
• Macroediting
• Oros Survey indicators

Given the peculiarities of the administrative information used, checks have been developed along the whole process.

**Preliminary checks and the retrieval of statistical variables (1)**

The social contribution declaration, or DM10 form, is a detailed grid containing information about the firm, the number of employees by type of employment, paid days, wage bill, social contributions, credit terms and tax reliefs. Information is declared at a highly disaggregated level identified by 4-digit administrative codes (more than 5,000).

• Complete and updated METADATA are necessary for:
  - the translation of the administrative information
  - the estimation of some components of other labour costs not declared in the DM10 form
• METADATA DATABASE
  - built in-house to standardize and use information on laws and regulations, contribution rates, codes, and other technical aspects of Social Security
  - updated quarterly

**Preliminary checks and the retrieval of statistical variables (2)**

Preliminary checks: to investigate and possibly correct errors in codes, duplications, inconsistencies with current legislation…

• Retrieval of statistical variables:
  - the number of employees and the related wage bills have to be calculated by selecting and aggregating the appropriate codes
  - other labour costs have to be calculated, and some components not recorded in the DM10 (e.g. employers' injuries insurance premium and severance payment) have to be estimated

In this step E&I is mainly automatic and based on the metadata database, BUT the updating of the metadata database cannot be completely automated.

**Preliminary checks and the retrieval of statistical variables (3)**

FROM (input): the administrative micro data (10 mln records each month), 8 records on average for 1 DM10

• translation
• checks
• aggregation

TO (output): the statistical micro data (1.3 mln records each month), 1 record for 1 DM10

**Preliminary checks and the retrieval of statistical variables (4)**

Implications of highly disaggregated raw data:

• a mine of information for multiple statistical aims
• the possibility of keeping under control the translation of administrative information (an activity not done by INPS, given the short time available for the release of the Oros indicators)
• very complex ad hoc, in-house procedures for the translation of administrative information into statistical data
• the building and continuous updating of the metadata database, which requires multiple skills (legal, statistical, etc.)
• the handling of a huge quantity of data in a very short time

**Micro editing (1)**

Once statistical data have been made available, a more traditional micro-editing procedure is set up:

• it is based on a score function assigning to each of the 1.3 mln units the probability that an error occurs in the target variables
• selective editing criteria, based on cut-off thresholds, select the anomalous values, which are interactively analysed and, if necessary, corrected

In this step E&I is mainly interactive.

**Micro editing (2)**

Units are checked through edit rules based mainly on well-known functional relations among the analysed variables.
Edits are aimed at evaluating, at unit record level, both cross-sectional and longitudinal consistency, using the information on the previous month:

• a positive amount of wage bills must correspond to a positive amount of employment, and often to a particular rate of social contributions
• the number of employees recorded in the current month should not differ significantly from that of the previous month
• the gross per capita wages, or the per capita paid days, should have similar and acceptable amounts in the analysed period
• the rate of social contributions on gross wages should fall within an expected range

**Micro editing (3)**

Given the peculiarities of the administrative data used, the distributions of the target variables may have significant tail areas, so that identifying the cut-off threshold is particularly problematic:

• very low per capita wages (e.g. firms whose employees all receive only supplementary earnings from employers)
• negative per capita other labour costs (e.g. social contribution rebates)

These aspects affect both the calculation of suitable check indicators and the singling out of potential errors.

**Imputation of temporary employment agencies (1)**

The INPS provisional population covers 95-98% of total active units, but evidence shows that unit nonresponse does not affect changes in Oros wages and other labour costs.
Temporary employment agencies are an exception:

• 3% of private sector employment
• 100 enterprises
• 300,000 employees
• 20% of employees of sector K (Real estate, renting and business activities), where they are all classified by INPS

These large enterprises are not included in the LES data. Their imputation is essential because of their weight: the absence of even one of these units may have a significant impact not only on levels but also on changes of the per capita indicators.

**Imputation of temporary employment agencies (2)**

Which units are nonrespondents?

• In a traditional survey: a list of theoretical respondents is available
• In the Oros Survey: the AR is available, but it suffers from over-coverage problems

The activity state of a unit is predicted through a longitudinal analysis of its activity in the nearby quarters (based on the evidence that a position that is a latecomer in one quarter has a low probability of being a latecomer in the nearby quarters as well).

Given the dynamic nature of these firms, it is necessary to follow their frequent changes (e.g. mergers, split-ups, etc.) over time to correctly single out unit nonresponses.

**Imputation of temporary employment agencies (3)**

Imputation criteria

• Deterministic imputation based mainly on the longitudinal information available on each unit:
  - suitable values are selected from the closest quarter in which the currently missing unit was a respondent
  - those values are then updated using panel information drawn from the current respondents

**The large firms: checks for survey data integration (1)**

Why the integration with survey data?

The administrative data could guarantee the coverage of all firms in the private sectors, but for the estimation of Large Enterprises (LE) the use of LES data is preferable because:

• each of them has a relevant influence on the estimates (1,000 enterprises, 2 million employees)
• they are frequently subject to changes over time

Direct contact with LE can guarantee a higher quality of data and a more rapid and efficient management of their changes (spin-offs, mergers,…).

**The large firms: checks for survey data integration (2)**

The integration implies:

• the production of variables harmonised with those produced using administrative data
• a check and editing procedure to correctly single out the LES enterprises in the administrative data.
Starting from the list of firms belonging to the survey, a complementary list of INPS firms must be defined, avoiding omissions and duplications.

Record linkage problems

• The FISCAL CODE is the only matching variable between the two archives, but it is not sufficient to correctly identify the SAME firms because of:
  - formal errors
  - updating at different times (mergers, hive-offs and split-ups might be recorded in several periods)

**The large firms: checks for survey data integration (3)**

A firm may have different fiscal codes in the two archives. A signal of possible problems is a significantly different number of employees; such cases are manually checked and joined to the corresponding INPS firms.

**Macroediting (1)**

Final quality controls on macro data are a key step in the E&I process to identify possible residual errors that may significantly affect the series.

• Changes in contribution legislation, as well as economic events with an impact on macro data, are frequent, so irregular but acceptable trends must be distinguished as far as possible from anomalies due, for example, to:
  - an erroneous updating of the "metadata database"
  - outliers/errors not singled out and corrected in the previous editing steps

If errors are detected, a drill-down to micro data is necessary.

**Macroediting (2)**

• analytic and graphical inspection of the time series at sub-population detail, through statistical measures which have to respect pre-determined acceptance boundaries
• automatic detection of outliers based on TERROR, an application of the software TRAMO-SEATS (Caporello and Maravall, 2002), which detects suspected errors in the last observations by comparing them with their forecasts estimated through REG-ARIMA models
• comparison with figures drawn from other Istat statistical sources (e.g. National Accounts data, indices of wages according to collective agreements, etc.)
• variable relationships, whose coherence always has to be guaranteed (e.g. the ratio of other labour costs to wages, the evolution of their trends, etc.)

**Main steps of the Oros Survey E&I process**

• Monthly administrative micro data
• Preliminary checks and retrieval of statistical variables
• Micro editing
• Imputation of temporary employment agencies
• The large firms: checks for survey data integration
• Macroediting
• Oros Survey indicators
• Documentation

**Documentation of the E&I process**

• The Oros E&I process, developed ad hoc in SAS and fully integrated in the general survey production process, is updated and documented quarterly:
  - metadata are archived
  - methodological information is documented
  - imputed data are flagged (and pre-imputation data are archived)
  - quality indicators on the impact of the imputation are calculated

The documentation of the Oros process guarantees its reproducibility and repeatability.

**Final remarks**

• The E&I process of the quarterly Italian Oros Survey:
  - was developed without any previous experience in the use of administrative data for the production of short-term indicators
  - was implemented gradually, learning from experience
  - is characterized by the implementation of a complex process translating administrative information into statistical data
  - is continuously updated because of the evolution of the social security legislation
  - consists of a systematic sequence of checks and editing steps which should assure the quality of the indicators produced
  - turns out to be reliable both in terms of effectiveness (quality of the entire process) and efficiency (relatively little time and low use of human and economic resources)

Less "standardizable" than the E&I process of a traditional survey?

**Final remarks**

Although the E&I process of a survey based on administrative data is very source-specific, what have we learnt from the Oros experience?

• The greater the file size, the more selective the E&I process has to be, while still remaining interactive (only partially automatic)
• The greater the disaggregation of the source, the more detailed the checks that are necessary and the more E&I steps that have to be developed
• The greater the complexity of the nature of the data and the frequency of metadata changes, the more attention and human resources have to be used to monitor frequent modifications

Links with administrative institutions are fundamental; nevertheless, an in-house metadata database and ad hoc procedures have to be built.

**References**

Baldi C., Ceccato F., Cimino E., Congia M.C., Pacini S., Rapiti F., Tuzi D. (2004) Use of Administrative Data to produce Short Term Statistics on Employment, Wages and Labour Cost. Essays, n.15/2004, Istat, Rome.

Caporello G., Maravall A. (2002) A tool for quality control of time series data. Program TERROR. Bank of Spain.

Istat (2006) Rilevazione mensile sull'occupazione, gli orari di lavoro e le retribuzioni nelle grandi imprese. Metodi e Norme n.29, Roma.

Istat, CBS, SFSO, Eurostat (2007) Recommended Practices for Editing and Imputation in Cross-Sectional Business Surveys, available on the web site: http://edimbus.istat.it/dokeos/document/document.php?openDir=%2FRPM_EDIMBUS

Thank you for your attention

Silvia Pacini, pacini@istat.it
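
As an illustration of the cross-sectional and longitudinal edits listed under Micro editing (2), the following is a minimal Python sketch, not the Oros SAS implementation. The record layout (`UnitMonth`), the edit names and every threshold are hypothetical placeholders chosen only to make the example runnable; the slides do not give the actual acceptance ranges.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical thresholds for illustration only; not the Oros parameters.
MAX_EMPLOYEE_CHANGE = 0.5      # max relative month-on-month change in employees
WAGE_RANGE = (200.0, 20000.0)  # acceptable monthly gross wage per employee
RATE_RANGE = (0.10, 0.50)      # acceptable ratio of contributions to gross wages


@dataclass
class UnitMonth:
    """One statistical record (one DM10 declaration) for a unit and a month."""
    employees: int
    wage_bill: float
    contributions: float


def failed_edits(cur: UnitMonth, prev: Optional[UnitMonth] = None) -> List[str]:
    """Return the names of the edits that the current record fails."""
    failed = []
    # Cross-sectional: a positive wage bill requires positive employment.
    if cur.wage_bill > 0 and cur.employees <= 0:
        failed.append("wages_without_employees")
    # Longitudinal: employment should not jump relative to the previous month.
    if prev is not None and prev.employees > 0:
        change = abs(cur.employees - prev.employees) / prev.employees
        if change > MAX_EMPLOYEE_CHANGE:
            failed.append("employee_jump")
    # Per capita gross wages should fall within an acceptable range.
    if cur.employees > 0:
        per_capita = cur.wage_bill / cur.employees
        if not WAGE_RANGE[0] <= per_capita <= WAGE_RANGE[1]:
            failed.append("per_capita_wage_out_of_range")
    # The contribution rate on gross wages should fall within the expected range.
    if cur.wage_bill > 0:
        rate = cur.contributions / cur.wage_bill
        if not RATE_RANGE[0] <= rate <= RATE_RANGE[1]:
            failed.append("contribution_rate_out_of_range")
    return failed


previous = UnitMonth(employees=10, wage_bill=20000.0, contributions=6000.0)
current = UnitMonth(employees=30, wage_bill=60000.0, contributions=3000.0)
print(failed_edits(current, previous))
```

In a selective editing setting, a failed edit would only mark the unit as a candidate for interactive review. As Micro editing (3) notes, legitimate records such as very low per capita wages or contribution rebates can fall outside fixed ranges, so a flag is not in itself evidence of an error.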