On Privacy-Preserving Utility-Based Statistical Disclosure Limitation Methods

Daniela Ichim. Outline: dissemination of microdata files; confidentiality issues; quality/utility issues; strategies to balance confidentiality and utility. Microdata release: official statistics, dissemination portfolio, purposes.


Presentation Transcript


  1. On Privacy-Preserving Utility-Based Statistical Disclosure Limitation Methods • Daniela Ichim

  2. Outline • Dissemination of microdata files • Confidentiality issues • Quality/utility issues • Strategies to balance confidentiality and utility

  3. Microdata release • Official statistics • Dissemination portfolio • Purposes: educational, research, policy-making • Examples: MFR, PUF (European statistical law) • Consider both confidentiality and utility • Dissemination at European level: comparability

  4. Confidentiality issues • Official statistics • Law: “confidentiality of any statistical unit should not be breached when account is taken of all relevant means that might reasonably be used” • There is a law: risk definition + risk assessment + protection • Privacy is an individual concept • Risk management, not risk avoidance: evaluate risk in realistic scenarios; apply risk reduction methods; ...; design different microdata releases; consider the needs of the counterpart (users)

  5. Statistical disclosure limitation methods • Local/global recoding • Local/global suppression • Subsampling • Top/bottom coding • Rounding • Adding noise • Swapping • Shuffling • Microaggregation • Post-Randomization (PRAM) • IPSO • ROMM • Model-based perturbation methods • … • Guided by utility measures!
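As an illustration of three of the methods listed on this slide, here is a minimal Python sketch of top coding, additive noise, and univariate fixed-size microaggregation. The slides give no implementation, so the function names, the Gaussian noise model, and the rule that the last group absorbs any remainder are assumptions made for this example only.

```python
import random
from statistics import mean


def top_code(values, cap):
    """Top coding: replace every value above `cap` with `cap`."""
    return [min(v, cap) for v in values]


def add_noise(values, sd, seed=0):
    """Additive noise: add independent Gaussian noise (assumed model) to each value."""
    rng = random.Random(seed)
    return [v + rng.gauss(0.0, sd) for v in values]


def microaggregate(values, k):
    """Univariate microaggregation: sort the records, form groups of k
    consecutive records, and replace each value by its group mean.
    The last group absorbs the remainder, so every group has at least k
    records (assuming len(values) >= k)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    n = len(values)
    start = 0
    while start < n:
        end = start + k
        if n - end < k:  # too few records left for another full group
            end = n
        group = order[start:end]
        group_mean = mean([values[i] for i in group])
        for i in group:
            out[i] = group_mean
        start = end
    return out


# Hypothetical income values, including one outlier.
incomes = [12, 15, 18, 22, 30, 41, 250]
top_coded = top_code(incomes, 100)
noisy = add_noise(incomes, sd=2.0, seed=1)
aggregated = microaggregate(incomes, k=3)
```

Microaggregation keeps the overall total (and hence the mean) unchanged while making every released value shared by at least k records, which is one way a method can be "guided by" a utility measure.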

  6. Data utility • No law • Utility depends on: the survey; the user; the most important variables (users and survey) • Measure it w.r.t. the original data • Utility is an aggregate concept • Main approaches (no widely accepted measure): mathematical approach; statistical approach; analyses (model) based

  7. Data utility: mathematical approach • Information theory (Shannon) • Entropy change • Kullback-Leibler divergence • Figure labels: distribution before protection, distribution after protection • Issues: rigorous formulation (Willenborg); high implementation costs; how do users perceive these measures?; how do these measures relate to the analyses?
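The two information-theoretic quantities named on this slide can be sketched for a single categorical variable as follows. This is a minimal example under stated assumptions: distributions are estimated as relative frequencies, logarithms are base 2, and the function names and sample data are hypothetical.

```python
import math
from collections import Counter


def distribution(values):
    """Empirical distribution of a categorical variable as {category: share}."""
    counts = Counter(values)
    n = len(values)
    return {k: c / n for k, c in counts.items()}


def entropy(p):
    """Shannon entropy in bits."""
    return -sum(q * math.log2(q) for q in p.values() if q > 0)


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits.
    Returns infinity when q gives zero mass to a category that p uses."""
    total = 0.0
    for k, pk in p.items():
        if pk == 0:
            continue
        qk = q.get(k, 0.0)
        if qk == 0:
            return math.inf
        total += pk * math.log2(pk / qk)
    return total


# Hypothetical variable before and after protection.
before = distribution(["a", "a", "b", "b"])
after = distribution(["a", "a", "a", "b"])

entropy_change = entropy(after) - entropy(before)
divergence = kl_divergence(before, after)
```

The divergence is zero only when the two distributions coincide, and it is not symmetric, so the direction (original versus protected) has to be fixed when reporting it.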

  8. Data utility: statistical approach • Continuous variables: e.g. Hellinger distance, e.g. total variation distance • Goodness-of-fit tests (qualitative indication): e.g. Kolmogorov-Smirnov • Categorical variables: measures of association (concordant and discordant pairs), e.g. Gamma, e.g. Goodman • Issues: statistical foundations; very easy to implement; how should/could they be used by analysts?
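A rough Python sketch of the measures named on this slide is given below. Several assumptions apply: the Hellinger and total variation distances are computed on discrete (for continuous variables, binned) distributions; the Kolmogorov-Smirnov function returns only the two-sample statistic, not a p-value; and the gamma function is the Goodman-Kruskal gamma computed from concordant and discordant pairs. All function names are hypothetical.

```python
import math
from bisect import bisect_right


def hellinger(p, q):
    """Hellinger distance between two discrete distributions (range 0 to 1)."""
    keys = set(p) | set(q)
    s = sum((math.sqrt(p.get(k, 0.0)) - math.sqrt(q.get(k, 0.0))) ** 2 for k in keys)
    return math.sqrt(s / 2)


def total_variation(p, q):
    """Total variation distance between two discrete distributions (range 0 to 1)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def ks_statistic(x, y):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between
    the two empirical distribution functions."""
    xs, ys = sorted(x), sorted(y)
    d = 0.0
    for v in xs + ys:
        fx = bisect_right(xs, v) / len(xs)
        fy = bisect_right(ys, v) / len(ys)
        d = max(d, abs(fx - fy))
    return d


def goodman_kruskal_gamma(pairs):
    """Gamma = (C - D) / (C + D), with C concordant and D discordant pairs
    of observations; tied pairs are ignored."""
    conc = disc = 0
    for i in range(len(pairs)):
        for j in range(i + 1, len(pairs)):
            s = (pairs[i][0] - pairs[j][0]) * (pairs[i][1] - pairs[j][1])
            if s > 0:
                conc += 1
            elif s < 0:
                disc += 1
    if conc + disc == 0:
        raise ValueError("gamma is undefined when every pair is tied")
    return (conc - disc) / (conc + disc)


# Hypothetical distributions before (p) and after (q) protection.
p = {"a": 0.5, "b": 0.5}
q = {"a": 0.75, "b": 0.25}
```

One possible use is to compute gamma on the original and on the protected file and report the difference, so that the analyst sees how much an ordinal association was altered.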

  9. Data utility: analyses • The idea is to simulate how the data would be used in future analyses. • Compare the two analyses (one using the original data and the other using the protected data). • If the difference is acceptable, the information loss is considered negligible. • Example (Karr) • Figure labels: CI for the original data, CI for the protected data • Difficult to imagine all possible data analyses • Multiple outputs from a single analysis • Collaboration with expert users!
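The slide shows confidence intervals for the original and the protected data but gives no formula. One commonly used comparison is a confidence-interval overlap measure: the length of the intersection of the two intervals, averaged relative to each interval's own length. The sketch below assumes that measure, a normal-approximation interval for a mean, and hypothetical function names; it may differ from what the speaker actually presented.

```python
import math
from statistics import NormalDist, mean, stdev


def mean_ci(values, level=0.95):
    """Normal-approximation confidence interval for the mean (assumed estimator)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    m = mean(values)
    half = z * stdev(values) / math.sqrt(len(values))
    return (m - half, m + half)


def ci_overlap(orig_ci, prot_ci):
    """Average relative overlap of two intervals:
    1.0 for identical intervals, 0.0 for disjoint ones."""
    lo = max(orig_ci[0], prot_ci[0])
    hi = min(orig_ci[1], prot_ci[1])
    if hi <= lo:
        return 0.0
    width = hi - lo
    return 0.5 * (width / (orig_ci[1] - orig_ci[0]) + width / (prot_ci[1] - prot_ci[0]))


# Hypothetical original values and a protected (shifted) version of them.
original = [10.0, 12.0, 11.0, 13.0, 9.0, 14.0]
protected = [v + 0.5 for v in original]
overlap = ci_overlap(mean_ci(original), mean_ci(protected))
```

A threshold on the overlap (for example, accepting a release only when it stays above a value agreed with expert users) would turn this comparison into the "acceptable difference" check described on the slide.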

  10. Dissemination strategies • Microdata -> apply an SDL method -> evaluate utility • Microdata -> risk assessment -> apply an SDL method to reduce the risk, maintaining some utility -> measure utility • Business Process Model: User Needs -> Design -> Build -> Collect -> Process -> Analyse -> Disseminate
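The second strategy on this slide (risk assessment, then protection, then utility measurement) can be sketched as a small loop. Everything specific here is an assumption for illustration: risk is approximated by the share of records that are unique on the key variables, the SDL method is global recoding of age into bands, utility loss is the mean absolute change in age, and the records are made up.

```python
from collections import Counter


def share_unique(records, keys):
    """Risk proxy: share of records whose key combination occurs exactly once."""
    combos = Counter(tuple(r[k] for k in keys) for r in records)
    unique = sum(1 for r in records if combos[tuple(r[k] for k in keys)] == 1)
    return unique / len(records)


def recode_age(records, width):
    """Global recoding: replace age by the lower bound of its band."""
    return [{**r, "age": (r["age"] // width) * width} for r in records]


def protect(records, keys, max_risk, widths=(5, 10, 20)):
    """Try increasingly coarse bands until the risk proxy is acceptable.
    Returns the protected records, the band width used (None if no recoding
    was needed), and the remaining risk."""
    protected, width, risk = records, None, share_unique(records, keys)
    for w in widths:
        if risk <= max_risk:
            break
        protected, width = recode_age(records, w), w
        risk = share_unique(protected, keys)
    return protected, width, risk


def mean_abs_change(original, protected, var):
    """Utility loss: mean absolute change of one variable."""
    return sum(abs(o[var] - p[var]) for o, p in zip(original, protected)) / len(original)


# Hypothetical microdata.
records = [
    {"age": 31, "region": "N"}, {"age": 34, "region": "N"}, {"age": 37, "region": "N"},
    {"age": 52, "region": "S"}, {"age": 58, "region": "S"}, {"age": 55, "region": "S"},
]
keys = ("age", "region")
released, width, risk = protect(records, keys, max_risk=0.1)
loss = mean_abs_change(records, released, "age")
```

Stopping at the first band width that meets the risk threshold keeps the recoding as fine as possible, which reflects the "reduce the risk, maintaining some utility" step.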

  11. Utility-based SDL methods • Adding noise • ROMM • IPSO • Data swapping • Priority-driven approaches • Model-based perturbations • Regression • Classification • Descriptive statistics • …

  12. Utility-based SDL methods • Advantage: • Utility indicators = minimum standards • Comparable dissemination • Harmonised dissemination (EU level) • Identify user needs -> apply flexible SDL methods -> deliver quality

  13. THANK YOU!
