The ARGUS Software of the SDC-project


The ARGUS Software of the SDC-project. Anco Hundepool, Statistics Netherlands. Washington, August 1999. Statistical Disclosure Control: the balance between the need for (more and more) information and the privacy of the respondents.

Presentation Transcript

The ARGUS Software of the SDC-project

Anco Hundepool

Statistics Netherlands

Washington, August 1999

Statistical Disclosure Control
  • the balance between the need for (more and more) information and
  • the privacy of the respondents
Statistical Disclosure Control
  • Need for detailed micro data files
    • electronic publications
    • computing power of users
  • Need for more detailed tables

But....!!!

Statistical Disclosure Control
  • Protection of privacy of respondents

persons, enterprises, institutions

  • Respondents must be able to trust Statistical Offices!
  • Risks:
    • Intruders/ hackers
    • Accidental recognition
    • Advanced record linkage techniques
Statistical Disclosure Control
  • Produce ‘safe’ datafiles and tables
  • Apply data modification techniques
  • Preserve as much information as possible

Implemented in ARGUS!

Framework of development of ARGUS
  • SDC project
  • partly subsidised by EU (4th Framework)
  • Co-operation between The Netherlands, Italy (+Spain) and UK
General aims of SDC project
  • Methodological research in SDC
    • microdata, tables
    • concerning statistics, operations research (OR)
    • geographical data
  • (general) SDC Software development
    • microdata (m-ARGUS)
    • tables (t-ARGUS)
SDC project members
  • Netherlands
    • CBS (ARGUS)
    • TU-Eindhoven (OR for microdata)
  • Italy
    • Istat (with Univ. of Rome) (Research/testing)
    • CPR-Padova (with Univ. Tenerife) (OR for tabular data)
SDC project members
  • UK
    • ONS (data)
    • Univ. Manchester (with Univ. of Southampton) (Research on SARs)
    • Univ. of Leeds (Geographical data)
Main software developed in SDC-project
  • m-ARGUS (CBS and TUE)
    • micro data
  • t-ARGUS (CBS and CPR)
    • tabular data
Ideas of m-ARGUS
  • An intruder uses information on identifying variables (e.g. region, sex, age, education, occupation) to identify records.
  • Identification then discloses the sensitive information in those records
m-ARGUS
  • Levels of protection
    • public use files (PUF)
    • micro files for researchers (MUC): universities, contract etc.
    • safe-setting
Ideas of m-ARGUS
  • a list of combinations of identifying variables must be checked
  • find value combinations that are unsafe
    • e.g. |a x b x c| <= threshold
    • threshold depends on level of protection
      • Public use files
      • Micro data for researchers (contract)
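
The frequency check above can be sketched in a few lines of Python. This is only an illustration of the idea, not the m-ARGUS implementation: the function name `unsafe_combinations`, the sample records and the threshold value are assumptions made for the example.

```python
from collections import Counter

def unsafe_combinations(records, variables, threshold):
    """Return the value combinations of `variables` that occur at most
    `threshold` times in `records`, together with their frequency."""
    counts = Counter(tuple(rec[v] for v in variables) for rec in records)
    return {combo: n for combo, n in counts.items() if n <= threshold}

# Hypothetical microdata: each record holds three identifying variables.
records = [
    {"region": "North", "sex": "F", "age": 34},
    {"region": "North", "sex": "F", "age": 34},
    {"region": "North", "sex": "M", "age": 71},
    {"region": "South", "sex": "F", "age": 34},
    {"region": "South", "sex": "F", "age": 34},
]

# A stricter level of protection would use a higher threshold.
rare = unsafe_combinations(records, ["region", "sex", "age"], threshold=1)
print(rare)
```

Here only the combination (North, M, 71) is flagged, because it occurs once; every other combination occurs twice.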
Ideas of m-ARGUS
  • eliminate the unsafe combinations
    • by global recoding (age -> agegroup, region -> province)
    • local suppression (imputing missings)
    • interactively/automatically
  • with minimum information loss (entropy)
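
A minimal sketch of the two techniques, assuming ten-year age groups, a hypothetical missing code "99" and Shannon entropy as the information-loss measure. The slides name entropy but do not give the exact formula used in m-ARGUS, so the measure below is an assumption.

```python
import math
from collections import Counter

def recode_age(age, width=10):
    """Global recoding: map an exact age to an age-group label."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

def entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    total = len(values)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

ages = [31, 34, 38, 42, 47, 71]

# Global recoding is applied to the whole variable.
groups = [recode_age(a) for a in ages]

# Local suppression: a value that is still unique after recoding is
# replaced by a missing code in that record only.
MISSING = "99"  # hypothetical missing code
freq = Counter(groups)
protected = [g if freq[g] > 1 else MISSING for g in groups]

# Entropy lost by recoding (assumed loss measure, see the note above).
loss = entropy(ages) - entropy(groups)
print(groups, protected, round(loss, 3))
```

Recoding removes detail from every record, while suppression touches only the records that remain unsafe, which is why a mix of the two can keep the overall loss low.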
m-ARGUS
  • For microdata
  • Developed in Borland C++
  • Windows-95/98
  • Version 3.0 last SDC-version
    • interactive/automatic global recoding
    • automatic local suppression
Features of m-ARGUS
  • can handle large microdata files
    • only tables derived from microdata are being used
  • flexible global recoding
  • options for automatic mix of global recoding and local suppression (TU Eindhoven)
Additional features of m-ARGUS
  • Micro-aggregation
  • Top/Bottom coding
  • Rounding
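
The three techniques can be illustrated on a numeric variable. The sketch below is an assumption about how such operations typically work, not a description of the m-ARGUS routines; the function names, the group size `k` and the sample values are hypothetical.

```python
def top_code(values, cap):
    """Top coding: replace every value above `cap` by `cap`."""
    return [min(v, cap) for v in values]

def round_to_base(values, base):
    """Round non-negative integers to the nearest multiple of `base`
    (exact halves are rounded up)."""
    return [(v + base // 2) // base * base for v in values]

def micro_aggregate(values, k):
    """Micro-aggregation: sort the values, form groups of k neighbours and
    replace each value by its group mean. A short final group is merged
    into the previous one so that every group has at least k members."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    groups = [order[i:i + k] for i in range(0, len(order), k)]
    if len(groups) > 1 and len(groups[-1]) < k:
        tail = groups.pop()
        groups[-1].extend(tail)
    result = [0.0] * len(values)
    for group in groups:
        mean = sum(values[i] for i in group) / len(group)
        for i in group:
            result[i] = mean
    return result

incomes = [12, 15, 18, 43, 300]
print(top_code(incomes, 100))
print(round_to_base(incomes, 5))
print(micro_aggregate([10, 20, 30, 40, 50], 2))
```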

m-ARGUS (flow diagram)
  • Input: metadata, microdata
  • Generate tables
  • Recoding schemes
  • Global recoding
  • Local suppression
  • Micro aggregation
  • Top/bottom coding
  • Rounding
  • Output: report, metadata, microdata

m-ARGUS input data
  • Data: Fixed format ASCII
  • Metadata
    • Name
    • Position
    • Missing values (2)
    • Identification level
    • Hierarchical coding
    • Codelist (opt.)
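
To make the role of the metadata concrete, the sketch below reads one fixed-format ASCII record using a per-variable description. The `Variable` class, its field names, the explicit field length and the sample layout are assumptions for illustration; the actual m-ARGUS metadata file format is not shown in the slides.

```python
from dataclasses import dataclass

@dataclass
class Variable:
    name: str
    start: int        # 1-based column of the first character
    length: int       # field width (assumed; the slide lists only "Position")
    missing: tuple    # up to two missing-value codes
    ident_level: int  # identification level of the variable

def parse_record(line, variables):
    """Cut one fixed-format line into named fields; missing codes become None."""
    record = {}
    for var in variables:
        raw = line[var.start - 1:var.start - 1 + var.length].strip()
        record[var.name] = None if raw in var.missing else raw
    return record

# Hypothetical layout: region in columns 1-2, sex in column 3, age in 4-6.
meta = [
    Variable("region", 1, 2, ("98", "99"), 1),
    Variable("sex", 3, 1, ("8", "9"), 2),
    Variable("age", 4, 3, ("998", "999"), 2),
]

print(parse_record("051034", meta))
print(parse_record("059034", meta))
```

In the second record the code 9 for sex is one of the two declared missing values, so the field is returned as None.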
Using m-ARGUS
  • reading data file
  • generating tables
  • apply global recodes
  • local suppression
  • generate safe file
  • generate report
Ideas of t-ARGUS
  • identification of sensitive cells using e.g. dominance rule
    • at least n (e.g. 2) contributors to a cell
    • sum of largest 3 contributors >= 75% of the cell total (otherwise one large contributor could recalculate the contribution of its competitor)
  • easy part
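
The two checks above can be written down directly. The sketch below uses the example values from the slide (at least 2 contributors, largest 3 contributors, 75%); the function name `is_sensitive` and the sample cells are assumptions, and the code is an illustration of the rule rather than the t-ARGUS routine.

```python
def is_sensitive(contributions, min_contributors=2, n=3, k=75.0):
    """Flag a cell as sensitive if it has too few contributors or if its
    n largest contributors account for at least k percent of the cell total."""
    if len(contributions) < min_contributors:
        return True
    total = sum(contributions)
    if total <= 0:
        return False  # no positive total, so no share can be computed
    largest = sorted(contributions, reverse=True)[:n]
    return 100.0 * sum(largest) / total >= k

# Hypothetical cells: each list holds the contributions of the respondents.
cells = {
    "bakeries/North": [50, 30, 10, 5, 5],   # largest 3 hold 90 percent
    "bakeries/South": [10] * 10,            # largest 3 hold 30 percent
    "breweries/North": [400],               # a single contributor
}

flags = {name: is_sensitive(values) for name, values in cells.items()}
print(flags)
```

With the default parameters any cell with three or fewer contributors is flagged, since its largest three contributors always hold 100 percent of the total.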
Ideas of t-ARGUS
  • Eliminate/protect sensitive cells (hard part)
  • by applying SDC techniques
    • table redesign
    • cell suppression
    • rounding
    • interactively and/or automatically
  • with minimum information loss (e.g. cell weights)
Ideas of t-ARGUS
  • cell suppression in tables with marginals
  • identify primary sensitive cells
  • protect primary cells by suppressing additional (secondary) cells to prevent recalculation (to some approximation)
  • with minimal information loss (CPR)
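
Why secondary suppression is needed can be shown with a small check: if all marginals are published, any row or column containing exactly one suppressed cell gives that cell away by subtraction. The sketch below only tests for such exact recalculation in a two-dimensional table; it does not choose the secondary cells and it ignores approximate (interval) recalculation, for which t-ARGUS relies on the optimisation routines. The function name and the suppression patterns are assumptions.

```python
def recoverable_cells(suppressed, n_rows, n_cols):
    """Return the suppressed interior cells that can be recomputed exactly
    from published row and column totals. `suppressed` is a set of
    (row, column) pairs; all marginals are assumed to be published."""
    hidden = set(suppressed)
    recovered = set()
    changed = True
    while changed:
        changed = False
        # A row with a single hidden cell reveals it: total minus the rest.
        for r in range(n_rows):
            cells = [(r, c) for c in range(n_cols) if (r, c) in hidden]
            if len(cells) == 1:
                hidden.discard(cells[0])
                recovered.add(cells[0])
                changed = True
        # The same argument applies to columns.
        for c in range(n_cols):
            cells = [(r, c) for r in range(n_rows) if (r, c) in hidden]
            if len(cells) == 1:
                hidden.discard(cells[0])
                recovered.add(cells[0])
                changed = True
    return recovered

# Primary suppression only: the cell is recalculated immediately.
print(recoverable_cells({(0, 0)}, 3, 3))

# Primary cell (0, 0) plus three secondary cells forming a rectangle:
# every affected row and column now hides two cells.
print(recoverable_cells({(0, 0), (0, 1), (1, 0), (1, 1)}, 3, 3))
```

The first pattern is fully recoverable, the second is not. A rectangle pattern blocks exact recalculation, but the suppressed values may still be bounded within a narrow interval, which is the part that requires optimisation.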
t-ARGUS
  • 3-D tables
  • interactive table redesign
  • primary & secondary cell suppression
  • optimisation routines for automatic cell suppression
  • rounding

t-ARGUS (flow diagram)
  • Input: metadata, microdata
  • Tabulation
  • Codelists
  • Redesign
  • Rounding
  • Suppression
  • Output: report, safe table

Features of t-ARGUS
  • Initial run through microdata
    • Determine also top k per cell -> sensitive cells
    • Table redesign possible without going back to microdata
  • Uses procedures for secondary cell suppression using state-of-the-art optimisation algorithms (CPR)
  • Prepared for linked tables
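
Keeping the top k contributions per cell is what makes redesign possible without a second pass over the microdata: when two cells are merged, the top k of the merged cell can be taken from the two stored top-k lists. The sketch below illustrates this under assumed names (`summarise`, `merge_cells`) and an assumed cell summary of total, count and top-k list; the internal data structures of t-ARGUS are not described in the slides.

```python
import heapq

def summarise(contributions, k=3):
    """Cell summary collected during the single run through the microdata."""
    return {
        "total": sum(contributions),
        "count": len(contributions),
        "top": heapq.nlargest(k, contributions),
    }

def merge_cells(cell_a, cell_b, k=3):
    """Combine two cell summaries, e.g. when two categories are collapsed
    during table redesign. Only the stored summaries are needed."""
    return {
        "total": cell_a["total"] + cell_b["total"],
        "count": cell_a["count"] + cell_b["count"],
        "top": heapq.nlargest(k, cell_a["top"] + cell_b["top"]),
    }

# Hypothetical contributions to two neighbouring cells.
a = summarise([50, 5, 5, 2])
b = summarise([40, 30, 1])

merged = merge_cells(a, b)
print(merged)
```

The merged summary is identical to the one obtained by summarising all seven contributions directly, so a dominance check can be rerun on the redesigned table straight away.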
t-ARGUS
  • Data: fixed format ASCII
  • Meta data:
    • Variable name
    • Start position
    • Field length
    • Status
t-ARGUS
  • Apply global recoding
  • Protect file with secondary suppression
  • Rounding
  • Safe table as ASCII or .WK1 (plus report)
t-ARGUS
  • Version 2.0 final SDC-version
  • requires commercial OR-solver (Xpress by Dash, UK, 600 GBP)
Future / CASC
  • Computational Aspects of Statistical Confidentiality
  • New European project-proposal (2000-2002)
    • Extending ARGUS
    • New research
  • Additional joint USA/EU-project?
CASC-m
  • Concentration on business/economic data
    • microaggregation
    • PRAM
    • Noise-addition/ masking
CASC-t
  • Hierarchical tables
  • Linked tables
  • Optimal solution vs. heuristics
  • Different input formats
CASC-team
  • Statistics Netherlands
  • Istat (Italy)
  • ONS, Univ. Southampton, Manchester, London, Plymouth (UK)
  • Bundesamt, IAB (Germany)
  • Stat. Catalunya, Univ Tenerife (Spain)
Contact
  • Anco Hundepool
  • Statistics Netherlands
  • PO box 4000
  • 2200 JM Voorburg
  • The Netherlands
  • email [email protected]
  • fax: +31 70 3375990
  • phone: +31 70 3375038