Evaluating the Quality of Editing and Imputation: the Simulation Approach

M. Di Zio, U. Guarnera, O. Luzi, A. Manzari
ISTAT – Italian Statistical Institute
UN/ECE Work Session on Statistical Data Editing, Ottawa, 16-18 May 2005

Outline
• Introduction
• The simulation approach
• Performance indicators
• An example: the Istat software ESSE

Quality of E&I = Accuracy
• Accuracy at micro level: the capability of editing to correctly identify errors, and the capability of imputation to correctly recover the true data
• Accuracy at macro level: the capability of editing/imputation to preserve the data distributions and the target estimates
The quality of E&I in terms of accuracy can be measured only when the edited and imputed data can be compared with the corresponding true data.

Why evaluate the quality of E&I
• Analyse the performance of an editing/imputation method:
  • for a specific type of data/error
  • under different data/error scenarios
• Improve the performance of an editing/imputation method for a specific type of data/error
• Choose among alternative editing/imputation methods for a specific type of data/error

The evaluation framework
[Figure: diagram relating true values (super-population / finite population), error/missing mechanisms, observed (corrupted) values, the editing model, localized errors, the imputation model and final values]
“E&I represent additional sources of non-sampling errors in the statistical production process”

Evaluating the quality of E&I
• The quality of editing and/or imputation has to be evaluated taking into account the other mechanisms involved in the statistical production process
• This corresponds to measuring the effects on the data induced by the editing and/or imputation mechanisms, conditionally on the other mechanisms influencing the survey results

The simulation approach
Artificial generation of some of the key elements of the evaluation framework, based on predefined mechanisms/models:
• Controlled experiments on:
  • data distributions and data relations
  • error and missing data mechanisms
  • error and missing data incidence
• Variability due to each stochastic mechanism (repeated simulations)
• Low cost
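Repeated simulations make the variability induced by a stochastic mechanism measurable. The minimal Python sketch below keeps the true data fixed, regenerates MCAR non-response in each replication, imputes every missing value with the respondents' mean, and summarises the error of the estimated mean across replications. The mechanism, the imputation method and the function name `simulate_mean_error` are illustrative assumptions, not part of ESSE.

```python
import random
import statistics

def simulate_mean_error(true_values, missing_rate, replications, seed=0):
    """Hypothetical helper: repeat (MCAR non-response -> mean imputation ->
    estimate) and return the average and standard deviation of the error
    of the estimated mean across replications."""
    rng = random.Random(seed)
    true_mean = statistics.fmean(true_values)
    errors = []
    for _ in range(replications):
        # Stochastic mechanism: each value is missing with the same probability.
        missing = [rng.random() < missing_rate for _ in true_values]
        observed = [v for v, m in zip(true_values, missing) if not m]
        if not observed:  # no respondents at all: skip this replication
            continue
        # Imputation: every missing value receives the respondents' mean.
        fill = statistics.fmean(observed)
        final = [fill if m else v for v, m in zip(true_values, missing)]
        errors.append(statistics.fmean(final) - true_mean)
    return statistics.fmean(errors), statistics.stdev(errors)

true_values = [float(v) for v in range(1, 101)]
bias, spread = simulate_mean_error(true_values, missing_rate=0.3, replications=200)
print(f"average error {bias:.3f}, standard deviation {spread:.3f}")
```

Under MCAR, mean imputation leaves the estimated mean unbiased, so the average error should be close to zero, while its standard deviation reflects the variability due to the non-response mechanism alone.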

The simulation approach (continued)
• High modelling effort for:
  • true data
  • raw data

Simulation of true data
Let $(X_1, \ldots, X_p)$ be a random vector with distribution function $F(x_1, \ldots, x_p; \theta)$.
• $F(x_1, \ldots, x_p; \theta)$ is unknown
• Parametric approaches: specify a data model, estimate its parameters, use re-sampling techniques
• Non-parametric approaches: no distributional assumptions, re-sampling techniques
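As an illustration of the two approaches, the sketch below generates synthetic "true" records from a small set of clean records. It is a minimal Python sketch, not ESSE code: the function names are hypothetical, the non-parametric version simply resamples whole records with replacement, and the parametric version assumes a bivariate normal model whose means, standard deviations and correlation are estimated from the clean data.

```python
import math
import random
import statistics

def bootstrap_records(records, n, seed=None):
    """Non-parametric: draw n records with replacement from clean records,
    which keeps the observed joint relations between variables."""
    rng = random.Random(seed)
    return [rng.choice(records) for _ in range(n)]

def bivariate_normal_records(records, n, seed=None):
    """Parametric (bivariate normal assumed): estimate means, standard
    deviations and correlation from (x, y) records, then sample n new pairs."""
    xs = [x for x, _ in records]
    ys = [y for _, y in records]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)
    cov = sum((x - mx) * (y - my) for x, y in records) / (len(records) - 1)
    rho = cov / (sx * sy)
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        # 2x2 Cholesky factor: makes the correlation of z1 and zy equal to rho.
        zy = rho * z1 + math.sqrt(max(0.0, 1.0 - rho * rho)) * z2
        pairs.append((mx + sx * z1, my + sy * zy))
    return pairs

clean = [(10.0, 12.0), (20.0, 19.0), (30.0, 33.0), (40.0, 41.0), (50.0, 48.0)]
resampled = bootstrap_records(clean, 100, seed=1)
modelled = bivariate_normal_records(clean, 2000, seed=1)
```

Resampling can only reproduce records already observed, but keeps their joint structure exactly; the parametric model can generate new values, but only reproduces the dependence its model allows (here, linear correlation).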

Simulation of true data: additional problems
• Modelling multivariate distributions (reproducing joint relations/dependencies between variables)
• Modelling asymmetric multivariate distributions
• Modelling under edit constraints
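One simple way to generate true data that satisfy edit constraints is rejection sampling: draw candidate records from an unconstrained model and keep only those that pass every edit. The sketch below assumes hypothetical edit rules and a hypothetical candidate model for a record with `employees` and `wages`; it shows one possible technique, not necessarily the one used by the authors.

```python
import random

# Hypothetical edit rules: each returns True when the record passes the edit.
EDITS = [
    lambda r: r["employees"] >= 0,
    lambda r: r["wages"] >= 0,
    lambda r: r["employees"] > 0 or r["wages"] == 0,  # no wages without staff
]

def passes_edits(record, edits=EDITS):
    return all(edit(record) for edit in edits)

def sample_under_edits(draw, n, rng, edits=EDITS, max_tries=100_000):
    """Rejection sampling: draw candidates until n of them pass every edit."""
    accepted, tries = [], 0
    while len(accepted) < n:
        tries += 1
        if tries > max_tries:
            raise RuntimeError("acceptance rate too low for rejection sampling")
        candidate = draw(rng)
        if passes_edits(candidate, edits):
            accepted.append(candidate)
    return accepted

def draw_candidate(rng):
    # Hypothetical unconstrained model that can violate the edits.
    employees = round(rng.gauss(5, 4))
    wages = round(rng.gauss(30_000 * max(employees, 0), 20_000))
    return {"employees": employees, "wages": wages}

sample = sample_under_edits(draw_candidate, 200, random.Random(0))
```

Rejection truncates the simulated distribution to the region allowed by the edits and becomes expensive when few candidates pass, hence the `max_tries` guard.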

Simulation of raw data
Parametric/non-parametric approaches:
• Generating missing data
• Generating errors (deviations from the true data)

Simulation of missing data
• Assumptions on the non-response mechanism (MCAR, MAR, NMAR)
• Assumptions on the incidence of non-response (non-response rates)
• In multivariate contexts, modelling patterns of non-response:
  • assumptions on multivariate non-response mechanisms (e.g. independence)
  • assumptions on the rates of non-response patterns
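The difference between MCAR and MAR non-response can be made concrete with two missingness masks. In this Python sketch (function names hypothetical) MCAR removes every value with the same probability, while MAR makes the probability that y is missing depend only on an observed covariate x, through a logistic model chosen purely for illustration. Under NMAR the probability would depend on the unobserved value of y itself.

```python
import math
import random

def mcar_mask(n, rate, rng):
    """MCAR: each of n values is missing with the same probability `rate`."""
    return [rng.random() < rate for _ in range(n)]

def mar_mask(covariate, base_rate, slope, rng):
    """MAR: the missingness probability of y depends only on the observed
    covariate x, through a logistic model centred on the mean of x."""
    mean_x = sum(covariate) / len(covariate)
    base_logit = math.log(base_rate / (1.0 - base_rate))
    mask = []
    for x in covariate:
        p = 1.0 / (1.0 + math.exp(-(base_logit + slope * (x - mean_x))))
        mask.append(rng.random() < p)
    return mask

def apply_mask(values, mask):
    """Replace masked values with None (the missing-value marker assumed here)."""
    return [None if m else v for v, m in zip(values, mask)]

rng = random.Random(42)
x = [float(i % 100) for i in range(10_000)]
y = [2.0 * xi + 5.0 for xi in x]
y_mcar = apply_mask(y, mcar_mask(len(y), 0.2, rng))
y_mar = apply_mask(y, mar_mask(x, 0.2, 0.05, rng))
```

Because y grows with x in this example, MAR non-response removes large y values more often, so the respondents' mean of y is biased even though the mechanism is ignorable given x.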

Simulation of errors
• Assumptions on the error mechanism (EAR, ECAR, ENAR)
• Assumptions on the incidence of errors (error rates)
• Assumptions on the intensity of errors (error magnitude; intermittent nature of errors)
• In a multivariate context, modelling error patterns:
  • assumptions on multivariate error mechanisms (e.g. independence)
  • assumptions on the rates of error patterns
• Overlapping mechanisms (e.g. stochastic + systematic)
• Simulation of errors under constraints
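Error incidence and error intensity can be simulated separately: the rate decides how many values are hit, the magnitude decides how far a corrupted value moves from the true one. This Python sketch assumes an ECAR mechanism (every value has the same probability of being corrupted) and a uniform multiplicative deviation; both the function name and the error model are illustrative assumptions.

```python
import random

def inject_ecar_errors(true_values, rate, rng, magnitude=0.5):
    """ECAR: each value is corrupted with the same probability `rate`.
    A corrupted value is multiplied by a factor drawn uniformly from
    [1 - magnitude, 1 + magnitude]. Returns raw values and error flags."""
    raw, flags = [], []
    for v in true_values:
        new = v
        if rng.random() < rate:
            new = v * (1.0 + rng.uniform(-magnitude, magnitude))
        raw.append(new)
        # Flag only values that actually differ (e.g. a zero stays zero).
        flags.append(new != v)
    return raw, flags

rng = random.Random(7)
true_values = [100.0] * 1000
raw, flags = inject_ecar_errors(true_values, rate=0.1, rng=rng)
```

Returning the flags alongside the raw values keeps track of which values are actually erroneous, which is what a micro-level comparison with the editing outcome needs.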

How to measure: evaluation indicators under the simulation approach
• Evaluation objectives:
  • accuracy at micro level
  • accuracy w.r.t. distributions and target estimates
• Indicators:
  • level (micro/macro; local/global)
  • identification
  • priority
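At macro level, two simple indicators compare the final (edited and imputed) data with the true data: the relative difference of a target estimate, here the mean, and a distance between the two distributions, here the Kolmogorov-Smirnov statistic (the largest gap between the empirical distribution functions). These particular indicators and function names are an illustrative choice, not a list taken from the presentation.

```python
from bisect import bisect_right
import statistics

def relative_difference(final_estimate, true_estimate):
    """Relative difference of a target estimate computed on final vs true data."""
    return (final_estimate - true_estimate) / true_estimate

def ks_distance(true_values, final_values):
    """Kolmogorov-Smirnov statistic: the largest absolute gap between the
    empirical distribution functions of the true and the final data."""
    a, b = sorted(true_values), sorted(final_values)
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in a + b
    )

def macro_indicators(true_values, final_values):
    return {
        "rel_diff_mean": relative_difference(
            statistics.fmean(final_values), statistics.fmean(true_values)
        ),
        "ks_distance": ks_distance(true_values, final_values),
    }

true_values = [12.0, 15.0, 18.0, 20.0, 35.0]
final_values = [12.0, 15.0, 20.0, 20.0, 35.0]  # one value imputed as 20, true 18
indicators = macro_indicators(true_values, final_values)
```

The mean summarises a single target estimate, while the distribution distance also reacts to distortions that leave the mean unchanged.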

An Istat tool for evaluating E&I under the simulation approach
ESSE (Editing Systems Standard Evaluation) system, built with the SAS language and the SAS/AF environment:
• Module for raw data simulation
• Module for evaluation

Module for raw data simulation
• Approach: non-parametric
• Missing data mechanisms: MCAR, MAR and independent non-responses
• Error mechanisms: errors completely at random (ECAR) and independent errors (e.g. misplacement errors, interchange of values, interchange errors, loss or addition of zeroes, …)
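Some of the error types listed above can be written as small record-level transformations. The Python sketch below is a hypothetical illustration, not ESSE's SAS implementation: "loss or addition of zeroes" divides or multiplies a value by a power of ten, and "interchange of values" swaps the values of two variables in the same record. Each error type is applied independently with the same probability, in line with the ECAR and independent-errors assumptions; the variable names `costs` and `revenues` are made up.

```python
import random

def zeroes_error(value, rng, max_zeroes=3):
    """Loss or addition of zeroes: divide or multiply by 10**k, 1 <= k <= max_zeroes."""
    k = rng.randint(1, max_zeroes)
    return value * 10 ** k if rng.random() < 0.5 else value / 10 ** k

def interchange_error(record, var_a, var_b):
    """Interchange of values: swap the values of two variables in a record."""
    swapped = dict(record)
    swapped[var_a], swapped[var_b] = record[var_b], record[var_a]
    return swapped

def corrupt_record(record, rate, rng, pair=("costs", "revenues")):
    """Apply each error type independently with probability `rate` (ECAR)."""
    out = dict(record)
    if rng.random() < rate:
        var = rng.choice(sorted(out))
        out[var] = zeroes_error(out[var], rng)
    if rng.random() < rate:
        out = interchange_error(out, *pair)
    return out

rng = random.Random(3)
true_record = {"costs": 10.0, "revenues": 20.0}
raw_record = corrupt_record(true_record, rate=1.0, rng=rng)
```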

Module for evaluation: assumptions
• Editing is a classification procedure that assigns each raw value to one of two states: (1) acceptable, (2) not acceptable
• Imputation affects only the values previously classified by the editing process as not acceptable
• Imputation is successful if the newly assigned value equals the original (true) value

Module for evaluation
• Evaluation objective: assessing the accuracy of E&I at micro level (capability to detect as many errors as possible; capability to restore the true values)
• Evaluation approach: a single application of E&I (no variability)
• Evaluation level: micro level
• Indicators: local indicators (hit rates) based on the numbers of detected, undetected, introduced and corrected errors
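The micro-level counts can be derived value by value by comparing true, raw and final data with the editing outcome. The Python sketch below uses an assumed formulation, since the slides do not give ESSE's exact definitions: an error is a raw value different from the true one; it is corrected when it is flagged and imputed with the true value; an error is introduced when a correct value is flagged and replaced with a wrong one. The function name and the two rates are hypothetical.

```python
def micro_indicators(true, raw, flagged, final):
    """Value-level comparison. `flagged[i]` is True when editing classified
    raw[i] as not acceptable; final[i] is the value after imputation."""
    errors = detected = undetected = corrected = introduced = 0
    for t, r, f, z in zip(true, raw, flagged, final):
        if r != t:  # the raw value is erroneous
            errors += 1
            if f:
                detected += 1
                if z == t:  # imputation restored the true value
                    corrected += 1
            else:
                undetected += 1
        elif f and z != t:  # a correct value was flagged and replaced wrongly
            introduced += 1
    return {
        "errors": errors,
        "detected": detected,
        "undetected": undetected,
        "corrected": corrected,
        "introduced": introduced,
        # Hit rates, left as None when there is nothing to divide by.
        "detection_rate": detected / errors if errors else None,
        "correction_rate": corrected / detected if detected else None,
    }

counts = micro_indicators(
    true=[1, 2, 3, 4, 5],
    raw=[1, 20, 3, 40, 5],
    flagged=[False, True, True, False, False],
    final=[1, 2, 30, 40, 5],
)
```

In this example the function counts one detected and corrected error, one undetected error and one introduced error.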

Future work at ISTAT
• Identify standard measures to assess the accuracy of E&I at macro level
• Simulate multivariate patterns of errors/missing values (dependent errors/non-response)
• Evaluate the impact of E&I on variability at micro/macro level
