A handbook on validation methodology Marco Di Zio Istat

Presentation Transcript


  1. A handbook on validation methodology. Marco Di Zio, Istat. Workshop ValiDat Foundation – Wiesbaden, 10-11 November 2015

  2. Underlying idea of the handbook (HB) • Why a handbook on methodology for data validation (DV)? • To standardize language and elements, and to provide common measures for evaluation… • To establish a common reference framework and develop metrics for evaluating DV • The HB is composed of two main parts: • A generic framework for data validation • A discussion of metrics to evaluate a validation procedure (tuning, evaluating the procedure..)

  3. Generic framework for data validation. The objective of this first section is to clarify • What • Why • How and …

  4. Generic framework for data validation • Clearly establish the relation with other phases of the statistical production process and with international standards such as • GSBPM • GSDEMs • GSIM • Describe the data validation life cycle, which is useful for managing the data validation process

  5. What is data validation… Definition • Data validation is an activity verifying whether or not a combination of values is a member of a set of acceptable combinations. • This is not far from the UNECE definition: an activity aimed at verifying whether the value of a data item comes from the given (finite or infinite) set of acceptable values • but essentially different…
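
The difference between the two definitions can be illustrated with a minimal Python sketch. The variables (`age_class`, `marital_status`), their code lists and the rule that a child cannot be married are hypothetical and are not taken from the handbook; the point is only that every value can be acceptable on its own while the combination is not.

```python
from itertools import product

# Hypothetical code lists for two variables (illustrative assumptions).
AGE_CLASSES = {"child", "adult"}
MARITAL_STATUSES = {"single", "married"}

# Set of acceptable combinations: every pairing except a married child.
ACCEPTABLE = set(product(AGE_CLASSES, MARITAL_STATUSES)) - {("child", "married")}


def item_level_valid(age_class: str, marital_status: str) -> bool:
    """Item-level check: each value lies in its own set of acceptable values."""
    return age_class in AGE_CLASSES and marital_status in MARITAL_STATUSES


def combination_valid(age_class: str, marital_status: str) -> bool:
    """Combination-level check: the pair belongs to the acceptable set."""
    return (age_class, marital_status) in ACCEPTABLE


record = ("child", "married")
print(item_level_valid(*record))   # each value is acceptable on its own
print(combination_valid(*record))  # the combination is not acceptable
```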

  6. What… • It is a decisional procedure ending with the acceptance or refusal of the data. • The decisional procedure is generally based on rules expressing the acceptable combinations of values.

  7. Why do we perform data validation… • The purpose of data validation is to ensure a certain level of quality of the final data • but quality has several aspects. We clarified which aspects are related to DV • Essentially the ones related to the ‘structure of the data’, namely accuracy, comparability and coherence. • But others are connected, e.g., timeliness can be seen as a constraining factor

  8. How to perform DV… Two main elements • Validation levels • to what extent a data set has been validated • Validation rules • Rules are applied to data; a failure of a rule implies that the corresponding validation level is not attained by the data at hand (decisional process: accept/not accept)
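
A rough Python sketch of this decisional process follows. The two levels, the rules and the record fields are hypothetical and are not the levels defined in the handbook. The sketch also assumes that levels are ordered and that a level is attained only if all of its rules, and those of every lower level, are satisfied.

```python
from typing import Callable, Dict, List

Record = Dict[str, float]
Rule = Callable[[Record], bool]

# Hypothetical rules grouped by hypothetical validation level.
RULES_BY_LEVEL: Dict[int, List[Rule]] = {
    1: [lambda r: r["turnover"] >= 0, lambda r: r["costs"] >= 0],
    2: [lambda r: r["profit"] == r["turnover"] - r["costs"]],
}


def attained_level(record: Record) -> int:
    """Return the highest level whose rules (and all lower ones) pass; 0 if none."""
    level = 0
    for candidate in sorted(RULES_BY_LEVEL):
        if not all(rule(record) for rule in RULES_BY_LEVEL[candidate]):
            break  # a failed rule means this level is not attained
        level = candidate
    return level


consistent = {"turnover": 100.0, "costs": 60.0, "profit": 40.0}
inconsistent = {"turnover": 100.0, "costs": 60.0, "profit": 50.0}
print(attained_level(consistent))    # passes both levels
print(attained_level(inconsistent))  # fails the level-2 balance rule
```

Grouping rules by level keeps the accept/not accept decision per level explicit: the data are accepted at a given level exactly when `attained_level` returns at least that level.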

  9. Validation levels. They are related to the perspective of the ‘validator’ … In the HB: • Business perspective • Starting from the elements that usually characterise the DV process (increasing information) • A formal approach • Looking at the elements characterising a point in a statistical setting

  10. Validation levels: business perspective

  11. Validation levels: formal approach. Metadata aspects that are necessary to identify a data point: • The universe U from which a statistical object originates (e.g., household, company) • The time t of selecting an element u from the current population p(t) • The selected element u. This determines the values of the variables X that may be observed over time. • The variable selected for measurement.
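
One way to picture these four aspects is as a composite key that identifies a single data point. The Python sketch below is an illustration only: the class name, the field names and the example values are assumptions, not notation from the handbook.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class DataPointKey:
    """Hypothetical key built from the four metadata aspects."""
    universe: str   # U, e.g. "household" or "company"
    time: str       # t, the time of selection from the population p(t)
    unit: str       # u, the selected element
    variable: str   # the variable selected for measurement


# A data set maps each key to an observed value (values are made up).
data: Dict[DataPointKey, float] = {
    DataPointKey("company", "2015", "unit-001", "turnover"): 100.0,
    DataPointKey("company", "2015", "unit-001", "costs"): 60.0,
}

key = DataPointKey("company", "2015", "unit-001", "turnover")
print(data[key])
```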

  12. Data validation - GSDEMs • Generic Statistical Data Editing Models • Statistical data editing is composed of three different function types: Review, Selection and Amendment • The review functions are defined as: functions that examine the data to identify potential problems. This may be by evaluating formally specified quality measures or edit rules or by assessing the plausibility of the data in a less formal sense, for instance by using graphical displays

  13. Data validation - GSDEMs • Among the different GSDEMs function categories there is ‘Review of data validity’, defined as: functions that check the validity of data values against a specified range or a set of values and also the validity of specified combinations of values. Each check leads to a binary value (TRUE, FALSE)
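
As a small illustration, the Python sketch below runs the three kinds of check named in this definition (range, set of values, combination of values) and returns one boolean per check. The variable names, the limits, the code list and the combination rule are hypothetical.

```python
from typing import Dict

SEX_CODES = {"F", "M"}  # hypothetical code list


def review_validity(age: int, sex: str, employment: str) -> Dict[str, bool]:
    """Run each hypothetical validity check and report TRUE/FALSE per check."""
    return {
        # Range check on a single value.
        "age_in_range": 0 <= age <= 120,
        # Check of a single value against a set of values.
        "sex_in_code_list": sex in SEX_CODES,
        # Check of a combination of values (assumed rule: no retired under 16).
        "age_employment_combination": not (age < 16 and employment == "retired"),
    }


result = review_validity(age=15, sex="F", employment="retired")
print(result)
```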

  14. Data Validation - GSBPM

  15. Data validation life cycle

  16. Second part of the document: Metrics • Evaluating validation procedure • …next presentation…

  17. Thanks for your attention
