
The ARGUS Software of the SDC-project

Anco Hundepool

Statistics Netherlands

Washington, August 1999


Statistical Disclosure Control

  • the balance between the need for (more and more) information and

  • the privacy of the respondents


Statistical Disclosure Control

  • Need for detailed micro data files

    • electronic publications

    • computing power of users

  • Need for more detailed tables

    But....!!!


Statistical Disclosure Control

  • Protection of privacy of respondents

    persons, enterprises, institutions

  • Respondents must be able to trust Statistical Offices!

  • Risks:

    • Intruders/hackers

    • Accidental recognition

    • Advanced record linkage techniques


Statistical Disclosure Control

  • Produce ‘safe’ datafiles and tables

  • Apply data modification techniques

  • Preserve as much information as possible

    Implemented in ARGUS!


Framework of development of ARGUS

  • SDC project

  • partly subsidised by the EU (4th Framework Programme)

  • Co-operation between the Netherlands, Italy (+Spain) and the UK


General aims of SDC project

  • Methodological research in SDC

    • microdata, tables

    • statistics and operations research (OR)

    • geographical data

  • (general) SDC Software development

    • microdata (m-ARGUS)

    • tables (t-ARGUS)


SDC project members

  • Netherlands

    • CBS (ARGUS)

    • TU-Eindhoven (OR for microdata)

  • Italy

    • Istat (with Univ. of Rome) (Research/testing)

    • CPR-Padova (with Univ. Tenerife) (OR for tabular data)


SDC project members

  • UK

    • ONS (data)

    • Univ. Manchester (with Univ. of Southampton) (Research on SARs, the Samples of Anonymised Records)

    • Univ. of Leeds (Geographical data)


Main software developed in SDC-project

  • m-ARGUS (CBS and TUE)

    • micro data

  • t-ARGUS (CBS and CPR)

    • tabular data


Ideas of m-ARGUS

  • An intruder uses the values of identifying variables (e.g. region, sex, age, education, occupation) to identify records.

  • Once a record is identified, the intruder learns the sensitive information it contains


m-ARGUS

  • Levels of protection

    • public use files (PUF)

    • microdata files for researchers (MUC): universities, under contract, etc.

    • safe setting


Ideas of m-ARGUS

  • a list of combinations of identifying variables must be checked

  • find value combinations that are unsafe

    • e.g. |a × b × c| <= threshold (the number of records sharing that combination of values)

    • threshold depends on level of protection

      • Public use files

      • Micro data for researchers (contract)
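The frequency check described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not m-ARGUS code: records are assumed to be Python dicts, and the function `unsafe_combinations` and the example data are hypothetical. A value combination is flagged when its frequency is at or below the threshold, matching the rule |a × b × c| <= threshold.

```python
from collections import Counter


def unsafe_combinations(records, combos, threshold):
    """Flag value combinations whose frequency is <= threshold.

    records:   list of dicts, one per respondent (assumed representation)
    combos:    the list of combinations of identifying variables to check
    threshold: depends on the level of protection (e.g. PUF vs. MUC)
    Returns a list of (variables, values, frequency) tuples.
    """
    unsafe = []
    for combo in combos:
        # Count how often each value combination occurs in the microdata.
        counts = Counter(tuple(rec[var] for var in combo) for rec in records)
        for values, freq in counts.items():
            if freq <= threshold:
                unsafe.append((combo, values, freq))
    return unsafe


records = [
    {"region": "North", "sex": "F", "age": 34},
    {"region": "North", "sex": "F", "age": 34},
    {"region": "North", "sex": "M", "age": 34},
    {"region": "South", "sex": "F", "age": 71},
]
combos = [("region", "sex"), ("region", "sex", "age")]
for combo, values, freq in unsafe_combinations(records, combos, threshold=1):
    print(combo, values, freq)
```

Note that only frequency counts are needed, which is why such checks can work on tables derived from the microdata rather than on the records themselves.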


Ideas of m-ARGUS

  • eliminate the unsafe combinations

    • by global recoding (age -> agegroup, region -> province)

    • local suppression (replacing individual values by missing values)

    • interactively/automatically

  • with minimum information loss (entropy)
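The two protection techniques above can be illustrated with a small Python sketch. It is an assumption-laden illustration, not the m-ARGUS implementation: the helper names (`age_group`, `global_recode`, `local_suppress`) are hypothetical, and unlike m-ARGUS the sketch does not search for the recoding or suppression with minimum information loss; the caller chooses them.

```python
def age_group(age, width=10):
    """Global recode of age into fixed-width groups, e.g. 34 -> '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def global_recode(records, var, recode):
    """Global recoding: the same recode is applied to every record."""
    return [{**rec, var: recode(rec[var])} for rec in records]


def local_suppress(records, combo, unsafe_values, var_to_suppress, missing=None):
    """Local suppression: set one variable to missing, but only in records
    whose values for `combo` form one of the unsafe combinations."""
    result = []
    for rec in records:
        if tuple(rec[v] for v in combo) in unsafe_values:
            rec = {**rec, var_to_suppress: missing}
        result.append(rec)
    return result


records = [
    {"region": "North", "age": 34},
    {"region": "North", "age": 37},
    {"region": "South", "age": 71},
]
recoded = global_recode(records, "age", age_group)
# After recoding, ("South", "70-79") is still unique, so suppress region there.
safe = local_suppress(recoded, ("region", "age"), {("South", "70-79")}, "region")
print(safe)
```

Global recoding changes a variable in every record, while local suppression touches only the few records that remain unsafe; combining the two is what lets the loss of information stay small.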


m-ARGUS

  • For microdata

  • Developed in Borland C++

  • Windows 95/98

  • Version 3.0: final SDC-project version

    • interactive/automatic global recoding

    • automatic local suppression


Features of m-ARGUS

  • can handle large microdata files

    • only tables derived from the microdata are used

  • flexible global recoding

  • options for automatic mix of global recoding and local suppression (TU Eindhoven)


Additional features of m-ARGUS

  • Micro-aggregation

  • Top/Bottom coding

  • Rounding
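The three additional techniques can be sketched in Python as below. These are simple textbook variants chosen for illustration (clamping for top/bottom coding, fixed-size univariate groups for micro-aggregation, rounding to a base); the function names are hypothetical and m-ARGUS may implement each technique differently.

```python
def top_bottom_code(values, bottom, top):
    """Top/bottom coding: values above `top` become `top`, values below
    `bottom` become `bottom` (one simple variant of the technique)."""
    return [min(max(v, bottom), top) for v in values]


def microaggregate(values, k):
    """Univariate micro-aggregation with fixed group size k: sort the values,
    form groups of k consecutive values (the last group takes the remainder)
    and replace each value by its group mean."""
    if not values:
        return []
    order = sorted(range(len(values)), key=lambda i: values[i])
    n_groups = max(len(values) // k, 1)
    result = [0.0] * len(values)
    for g in range(n_groups):
        start = g * k
        end = len(values) if g == n_groups - 1 else start + k
        group = order[start:end]
        mean = sum(values[i] for i in group) / len(group)
        for i in group:
            result[i] = mean
    return result


def round_to_base(value, base):
    """Rounding: replace a value by the nearest multiple of `base`
    (Python's round() breaks exact ties towards the even multiple)."""
    return base * round(value / base)


print(top_bottom_code([5, 20, 150, 40], bottom=10, top=100))  # [10, 20, 100, 40]
print(microaggregate([10, 50, 12, 48, 11, 49, 30], k=3))
print(round_to_base(1234, 100))  # 1200
```

In the micro-aggregation example the three smallest values share the mean 11.0 and the remaining four share 44.25, so every published value is common to at least k records.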


m-ARGUS

Flow diagram. Inputs: metadata, microdata. Steps: generate tables, recoding schemes, global recoding, local suppression, micro-aggregation, top/bottom coding, rounding. Outputs: report, metadata, microdata.


m-ARGUS input data

  • Data: Fixed format ASCII

  • Metadata

    • Name

    • Position

    • Missing values (two codes)

    • Identification level

    • Hierarchical coding

    • Codelist (optional)
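A fixed-format ASCII record can be cut into variables using only a name, a starting position and a length per variable. The Python sketch below assumes a hypothetical metadata structure (`Variable`, `parse_record`); it is not the actual m-ARGUS metadata file format, and hierarchical coding and codelists are left out.

```python
from dataclasses import dataclass


@dataclass
class Variable:
    """Hypothetical metadata entry for one variable in a fixed-format file."""
    name: str
    start: int            # 1-based starting column
    length: int           # field width in characters
    missing: tuple = ()   # up to two missing-value codes
    id_level: int = 0     # identification level; not used by parse_record


def parse_record(line, variables):
    """Cut one fixed-format ASCII line into fields; missing codes become None."""
    record = {}
    for var in variables:
        raw = line[var.start - 1 : var.start - 1 + var.length].strip()
        record[var.name] = None if raw in var.missing else raw
    return record


metadata = [
    Variable("region", start=1, length=2, missing=("99",), id_level=1),
    Variable("sex", start=3, length=1, missing=("9",), id_level=2),
    Variable("age", start=4, length=3, missing=("998", "999"), id_level=2),
]
for line in ["041034", "992999"]:
    print(parse_record(line, metadata))
```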


Using m-ARGUS

  • reading data file

  • generating tables

  • apply global recodes

  • local suppression

  • generate safe file

  • generate report



t-ARGUS


Ideas of t-ARGUS

  • identification of sensitive cells using, e.g., a dominance rule

    • a cell needs at least n (e.g. 2) contributors

    • a cell is sensitive if the largest 3 contributors make up >= 75% of the cell total (one large contributor could then closely estimate the contribution of its competitors)

  • this is the easy part
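The two rules above can be combined into a single check per cell. The Python sketch below is illustrative only: `is_sensitive` is a hypothetical name, contributions are assumed to be non-negative, and the defaults simply reuse the example parameters from the slide (2 contributors; 3 largest >= 75%).

```python
def is_sensitive(contributions, min_contributors=2, n=3, k=75.0):
    """Primary sensitivity check for one table cell.

    contributions:    non-negative contributions of the respondents in the cell
    min_contributors: minimum-frequency rule (slide example: 2)
    n, k:             dominance rule, sensitive if the n largest contributions
                      make up at least k percent of the cell total
    """
    if len(contributions) < min_contributors:
        return True
    total = sum(contributions)
    largest_n = sum(sorted(contributions, reverse=True)[:n])
    return largest_n >= k / 100 * total


print(is_sensitive([100]))                 # True: too few contributors
print(is_sensitive([50, 30, 10, 5, 5]))    # True: top 3 = 90% of total
print(is_sensitive([20, 20, 20, 20, 20]))  # False: top 3 = 60% of total
```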


Ideas of t-ARGUS

  • Eliminate/protect sensitive cells (the hard part)

  • by applying SDC techniques

    • table redesign

    • cell suppression

    • rounding

    • interactively and/or automatically

  • with minimum information loss (e.g. cell weights)


Ideas of t-ARGUS

  • cell suppression in tables with marginals

  • identify primary sensitive cells

  • protect primary cells by suppressing additional (secondary) cells to prevent recalculation (to some approximation)

  • with minimal information loss (CPR)
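Why secondary suppressions are needed can be shown with a small 2-D table whose row and column totals are published. The Python sketch below (hypothetical names `exposed_cells`, `recalculate`) checks only a necessary condition: a suppressed cell that is alone in its row or column can be recovered exactly by subtraction. Passing this check does not guarantee protection, since suppressed values may still be bounded within narrow intervals; choosing a safe suppression pattern with minimal information loss is the optimisation problem that t-ARGUS solves with the CPR routines, which this sketch does not attempt.

```python
from collections import Counter


def exposed_cells(suppressed):
    """Return suppressed interior cells (row, col) that are the only suppressed
    cell in their row or in their column. With published marginal totals such a
    cell can be recalculated exactly by subtraction."""
    per_row = Counter(r for r, _ in suppressed)
    per_col = Counter(c for _, c in suppressed)
    return sorted(cell for cell in suppressed
                  if per_row[cell[0]] == 1 or per_col[cell[1]] == 1)


def recalculate(row_values, row_total, hidden_col):
    """What an intruder does: total minus the published cells of the row."""
    return row_total - sum(v for j, v in enumerate(row_values) if j != hidden_col)


table = [[20, 5, 30],
         [10, 40, 15]]
row_totals = [sum(row) for row in table]  # [55, 65], published

print(recalculate(table[0], row_totals[0], 1))          # 5: primary cell disclosed
print(exposed_cells({(0, 1)}))                          # [(0, 1)]
print(exposed_cells({(0, 1), (0, 2), (1, 1), (1, 2)}))  # []
```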


t-ARGUS

  • 3-D tables

  • interactive table redesign

  • primary & secondary cell suppression

  • optimisation routines for automatic cell suppression

  • rounding


t-ARGUS

Flow diagram. Inputs: metadata, microdata. Steps: tabulation, codelists, redesign, rounding, suppression. Outputs: report, safe table.


Features of t-ARGUS

  • Initial run through microdata

    • Also determines the top k contributors per cell, used to find sensitive cells

    • Table redesign possible without going back to microdata

  • Uses procedures for secondary cell suppression based on state-of-the-art optimisation algorithms (CPR)

  • Prepared for linked tables
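Keeping the k largest contributions per cell during the initial run is what allows table redesign without rereading the microdata. The Python sketch below is a hypothetical illustration (`tabulate_with_top_k`, `merge_cells` and the example records are not from t-ARGUS): it uses a small min-heap per cell, and shows that when two cells are merged, the k largest contributions of the merged cell are always among the two stored top-k lists.

```python
import heapq
from collections import defaultdict


def tabulate_with_top_k(records, row_var, col_var, value_var, k=3):
    """One pass through the microdata: per cell, keep the total, the number of
    contributors and the k largest contributions."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    top = defaultdict(list)  # per cell: min-heap holding the k largest values
    for rec in records:
        cell = (rec[row_var], rec[col_var])
        x = rec[value_var]
        totals[cell] += x
        counts[cell] += 1
        heap = top[cell]
        if len(heap) < k:
            heapq.heappush(heap, x)
        elif x > heap[0]:
            heapq.heapreplace(heap, x)  # drop the smallest of the current top k
    return {cell: (totals[cell], counts[cell], sorted(top[cell], reverse=True))
            for cell in totals}


def merge_cells(a, b, k=3):
    """Combine two cells during table redesign without rereading the
    microdata: the k largest values of the union are among the two top-k lists."""
    return (a[0] + b[0], a[1] + b[1], sorted(a[2] + b[2], reverse=True)[:k])


records = [
    {"region": "N", "sector": "A", "turnover": 100},
    {"region": "N", "sector": "A", "turnover": 50},
    {"region": "N", "sector": "A", "turnover": 10},
    {"region": "N", "sector": "A", "turnover": 5},
    {"region": "S", "sector": "A", "turnover": 7},
]
cells = tabulate_with_top_k(records, "region", "sector", "turnover")
print(cells[("N", "A")])                                  # (165.0, 4, [100, 50, 10])
print(merge_cells(cells[("N", "A")], cells[("S", "A")]))  # (172.0, 5, [100, 50, 10])
```

The stored totals, counts and top-k lists are exactly what a dominance rule needs, so sensitivity can be re-evaluated for redesigned cells.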


t-ARGUS

  • Data: fixed format ASCII

  • Meta data:

    • Variable name

    • Starting position

    • Field length

    • Status


t-ARGUS

  • Apply global recoding

  • Protect file with secondary suppression

  • Rounding

  • Safe table as ASCII or .WK1 file (plus report)


t-ARGUS

  • Version 2.0: final SDC-project version

  • requires a commercial OR solver (Xpress by Dash, UK, 600 GBP)


Future / CASC

  • Computational Aspects of Statistical Confidentiality

  • New European project proposal (2000-2002)

    • Extending ARGUS

    • New research

  • Additional joint USA/EU-project?


CASC-m

  • Concentration on business/economic data

    • microaggregation

    • PRAM

    • Noise addition/masking


CASC-t

  • Hierarchical tables

  • Linked tables

  • Optimal solution vs. heuristics

  • Different input formats


CASC-team

  • Statistics Netherlands

  • Istat (Italy)

  • ONS, Univ. Southampton, Manchester, London, Plymouth (UK)

  • Statistisches Bundesamt, IAB (Germany)

  • Stat. Catalunya, Univ. Tenerife (Spain)


Contact

  • Anco Hundepool

  • Statistics Netherlands

  • PO box 4000

  • 2200 JM Voorburg

  • The Netherlands

  • email [email protected]

  • fax: +31 70 3375990

  • phone: +31 70 3375038

