The ARGUS Software of the SDC-project

Anco Hundepool

Statistics Netherlands

Washington, August 1999

Statistical Disclosure Control

  • the balance between the need for (more and more) information and

  • the privacy of the respondents

Statistical Disclosure Control

  • Need for detailed micro data files

    • electronic publications

    • computing power of users

  • Need for more detailed tables


Statistical Disclosure Control

  • Protection of privacy of respondents

    persons, enterprises, institutions

  • Respondents must be able to trust Statistical Offices!

  • Risks:

    • Intruders/ hackers

    • Accidental recognition

    • Advanced record linkage techniques

Statistical Disclosure Control

  • Produce ‘safe’ datafiles and tables

  • Apply data modification techniques

  • Preserve as much information as possible

    Implemented in ARGUS!

Framework of development of ARGUS

  • SDC project

  • partly subsidised by EU (4th Framework)

  • Co-operation between The Netherlands, Italy (+Spain) and UK

General aims of SDC project

  • Methodological research in SDC

    • microdata, tables

    • concerning statistics, operations research (OR)

    • geographical data

  • (general) SDC Software development

    • microdata (m-ARGUS)

    • tables (t-ARGUS)

SDC project members

  • Netherlands

    • CBS (ARGUS)

    • TU-Eindhoven (OR for microdata)

  • Italy

    • Istat (with Univ. of Rome) (Research/testing)

    • CPR-Padova (with Univ. Tenerife) (OR for tabular data)

SDC project members

  • UK

    • ONS (data)

    • Univ. Manchester (with Univ. of Southampton) (Research on SARs)

    • Univ. of Leeds (Geographical data)

Main software developed in SDC-project

  • m-ARGUS (CBS and TUE)

    • micro data

  • t-ARGUS (CBS and CPR)

    • tabular data

Ideas of m-ARGUS

  • Intruder uses information of identifying variables (e.g. region, sex, age, education, occupation) to identify records.

  • This leads to the sensitive information

m-ARGUS

  • Levels of protection

    • public use files (PUF)

    • micro files for researchers (MUC): universities, contract, etc.

    • safe-setting

Ideas of m-ARGUS

  • a list of combinations of identifying variables must be checked

  • find value combinations that are unsafe

    • e.g. |a x b x c| <= threshold

    • threshold depends on level of protection

      • Public use files

      • Micro data for researchers (contract)
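
The check described on this slide can be sketched in Python as a frequency count over combinations of identifying variables. This is only an illustration, not the ARGUS implementation: the function name `unsafe_combinations`, the sample records and the threshold value are all assumptions made for the example.

```python
from collections import Counter

def unsafe_combinations(records, key_vars, threshold):
    """Return the value combinations of key_vars that occur <= threshold times.

    Hypothetical helper: a combination is treated as unsafe when its
    frequency in the file does not exceed the chosen threshold.
    """
    counts = Counter(tuple(rec[v] for v in key_vars) for rec in records)
    return {combo: n for combo, n in counts.items() if n <= threshold}

# Invented sample data; region, sex and age play the role of a, b and c.
records = [
    {"region": "North", "sex": "F", "age": 34},
    {"region": "North", "sex": "F", "age": 34},
    {"region": "North", "sex": "M", "age": 71},
    {"region": "South", "sex": "F", "age": 34},
    {"region": "South", "sex": "F", "age": 34},
    {"region": "South", "sex": "F", "age": 34},
]

# A stricter protection level would use a higher threshold.
rare = unsafe_combinations(records, ("region", "sex", "age"), threshold=1)
print(rare)
```

Raising the threshold (for example for a public use file) flags more combinations, which is how the level of protection enters the check.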

Ideas of m-ARGUS

  • eliminate the unsafe combinations

    • by global recoding (age -> agegroup, region -> province)

    • by local suppression (imputing missing values)

    • interactively/automatically

  • with minimum information loss (entropy)
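
A minimal Python sketch of the two techniques named above, under stated assumptions: global recoding replaces a variable by a coarser version in every record, local suppression sets a value to missing only in records that are still rare, and the drop in Shannon entropy of the recoded variable is used as a rough stand-in for information loss. The function names, the ten-year age bands and the data are hypothetical; the slides do not say how ARGUS computes its entropy measure.

```python
import math
from collections import Counter

def global_recode(records, var, mapping):
    """Replace var by a coarser category in every record of the file."""
    return [{**r, var: mapping(r[var])} for r in records]

def local_suppress(records, key_vars, var, threshold, missing=None):
    """Set var to missing only in records whose key combination is rare."""
    counts = Counter(tuple(r[v] for v in key_vars) for r in records)
    return [
        {**r, var: missing}
        if counts[tuple(r[v] for v in key_vars)] <= threshold
        else r
        for r in records
    ]

def entropy(values):
    """Shannon entropy in bits of the empirical distribution of values."""
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in Counter(values).values())

def age_group(age):
    """Assumed recoding: ten-year age bands such as '30-39'."""
    lo = age // 10 * 10
    return f"{lo}-{lo + 9}"

# Invented sample data: every (region, age) pair is unique before recoding.
records = [
    {"region": "North", "age": 31},
    {"region": "North", "age": 34},
    {"region": "North", "age": 38},
    {"region": "North", "age": 71},
]

recoded = global_recode(records, "age", age_group)
safe = local_suppress(recoded, ("region", "age"), "age", threshold=1)
loss = entropy([r["age"] for r in records]) - entropy([r["age"] for r in recoded])
print(safe, round(loss, 3))
```

Recoding removes most of the rare combinations at the cost of detail in every record; the one record that remains unique is then handled by a single local suppression.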

m-ARGUS

  • For microdata

  • Developed in Borland C++

  • Windows-95/98

  • Version 3.0 last SDC-version

    • interactive/automatic global recoding

    • automatic local suppression

Features of m-ARGUS

  • can handle large microdata files

    • only tables derived from microdata are being used

  • flexible global recoding

  • options for automatic mix of global recoding and local suppression (TU Eindhoven)

Additional features of m-ARGUS

  • Micro-aggregation

  • Top/Bottom coding

  • Rounding
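
The three techniques listed here can be illustrated on a numeric variable with a short Python sketch. It assumes simple textbook variants (clamping to fixed limits, fixed-size groups replaced by their mean, rounding to the nearest multiple of a base); the function names, limits and data are hypothetical and the slides do not specify which variants m-ARGUS uses.

```python
import math

def top_bottom_code(values, bottom, top):
    """Replace values below `bottom` or above `top` by those limits."""
    return [min(max(v, bottom), top) for v in values]

def micro_aggregate(values, group_size):
    """Replace each value by the mean of its group of neighbours.

    Values are sorted, cut into consecutive groups of `group_size`, and
    every value is replaced by its group mean. In this simple variant the
    last group may be smaller than `group_size`.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    for start in range(0, len(order), group_size):
        group = order[start:start + group_size]
        mean = sum(values[i] for i in group) / len(group)
        for i in group:
            out[i] = mean
    return out

def round_to_base(values, base):
    """Round each non-negative value to the nearest multiple of `base`."""
    return [base * math.floor(v / base + 0.5) for v in values]

# Invented sample data with one extreme value.
incomes = [12, 15, 18, 40, 44, 250]

print(top_bottom_code(incomes, bottom=15, top=100))
print(micro_aggregate(incomes, group_size=3))
print(round_to_base(incomes, base=10))
```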

[Diagram: m-ARGUS steps: generate tables, global recoding, local suppression, micro-aggregation, top/bottom coding]

m-ARGUS input data

  • Data: Fixed format ASCII

  • Metadata

    • Name

    • Position

    • Missing values (2)

    • Identification level

    • Hierarchical coding

    • Codelist (opt.)
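
Reading a fixed-format ASCII file with this kind of metadata might look as follows in Python. The `VarSpec` layout is a hypothetical stand-in, not the actual ARGUS metadata file format; the field width and the convention for the identification level are assumptions added for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VarSpec:
    """Hypothetical metadata entry for one variable."""
    name: str
    start: int            # 1-based start column in the record
    length: int           # field width (assumed; the slide lists only a position)
    missing: tuple = ()   # up to two missing-value codes
    ident_level: int = 0  # assumed convention: 0 means not identifying

def parse_record(line, specs):
    """Cut one fixed-format line into fields; missing codes become None."""
    rec = {}
    for s in specs:
        raw = line[s.start - 1:s.start - 1 + s.length].strip()
        rec[s.name] = None if raw in s.missing else raw
    return rec

# Invented layout: a 4-character region, 1-character sex, 3-character age.
specs = [
    VarSpec("region", start=1, length=4, ident_level=1),
    VarSpec("sex", start=5, length=1, missing=("9",), ident_level=2),
    VarSpec("age", start=6, length=3, missing=("998", "999"), ident_level=3),
]

for line in ["NL31F034", "NL329999"]:
    print(parse_record(line, specs))
```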

Using m-ARGUS

  • reading data file

  • generating tables

  • apply global recodes

  • local suppression

  • generate safe file

  • generate report

t-ARGUS

Ideas of t-ARGUS

  • identification of sensitive cells using e.g. a dominance rule

    • at least n (e.g. 2) contributors to a cell

    • sum of largest 3 contributors >= 75% of the cell total (one large contributor could recalculate the contribution of its competitor)

  • easy part
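
The sensitivity check can be sketched in a few lines of Python. The defaults mirror the example figures on the slide (at least 2 contributors, largest 3 contributors, 75%); the function name and the sample cells are hypothetical, and contributions are assumed to be non-negative.

```python
def is_sensitive(contributions, min_contributors=2, n=3, k=75.0):
    """Flag a cell as sensitive under a minimum-frequency and dominance rule.

    A cell is flagged when it has fewer than `min_contributors`
    contributors, or when its `n` largest contributions make up at least
    `k` percent of the cell total.
    """
    if len(contributions) < min_contributors:
        return True
    total = sum(contributions)
    if total <= 0:
        return False
    top_n = sum(sorted(contributions, reverse=True)[:n])
    return 100.0 * top_n / total >= k

# Invented cells: the largest three hold 90%, 60% and 100% of the total.
print(is_sensitive([50, 30, 10, 5, 5]))
print(is_sensitive([20, 20, 20, 20, 20]))
print(is_sensitive([100]))
```

With these defaults any cell with three or fewer contributors is flagged, because its largest three contributions always add up to the whole cell.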

Ideas of t-ARGUS

  • Eliminate/protect sensitive cells (hard part)

  • by applying SDC techniques

    • table redesign

    • cell suppression

    • rounding

    • interactively and/or automatically

  • with minimum information loss (e.g. cell weights)

Ideas of t-ARGUS

  • cell suppression in tables with marginals

  • identify primary sensitive cells

  • protect primary cells by suppressing additional (secondary) cells to prevent recalculation (to some approximation)

  • with minimal information loss (CPR)
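
Why secondary suppression is needed can be shown with a small Python sketch that plays the intruder: it fills in every suppressed cell that follows exactly from the published marginals. It only checks exact recalculation, not how tightly a cell can be bounded, and it is not the optimisation approach developed by CPR; the function name and the table are invented for the example.

```python
def recover_cells(table, row_totals, col_totals):
    """Fill in suppressed cells (None) that follow from the marginal totals.

    Repeatedly looks for a row or column with exactly one suppressed cell
    and recomputes it by subtraction, as an intruder could.
    """
    t = [row[:] for row in table]
    changed = True
    while changed:
        changed = False
        for i, row in enumerate(t):
            gaps = [j for j, v in enumerate(row) if v is None]
            if len(gaps) == 1:
                row[gaps[0]] = row_totals[i] - sum(v for v in row if v is not None)
                changed = True
        for j in range(len(col_totals)):
            col = [t[i][j] for i in range(len(t))]
            gaps = [i for i, v in enumerate(col) if v is None]
            if len(gaps) == 1:
                t[gaps[0]][j] = col_totals[j] - sum(v for v in col if v is not None)
                changed = True
    return t

# Invented 3 x 3 table with its marginals.
full = [[20, 50, 10], [8, 19, 22], [17, 32, 12]]
row_totals = [80, 49, 61]
col_totals = [45, 101, 44]

# Primary suppression only: the cell with value 8 is hidden.
primary_only = [[20, 50, 10], [None, 19, 22], [17, 32, 12]]
# Three secondary suppressions added so that each affected row and column has two gaps.
protected = [[20, 50, 10], [None, 19, None], [None, 32, None]]

print(recover_cells(primary_only, row_totals, col_totals))
print(recover_cells(protected, row_totals, col_totals))
```

In the first case the hidden value is recomputed from its row total; in the second the four suppressed cells can no longer be recovered by subtraction, at the price of three extra suppressed cells.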

t-ARGUS

  • 3-D tables

  • interactive table redesign

  • primary & secondary cell suppression

  • optimisation routines for automatic cell suppression

  • rounding

[Diagram of the t-ARGUS process; surviving label: Safe table]

Features of t-ARGUS

  • Initial run through microdata

    • Determine also top k per cell -> sensitive cells

    • Table redesign possible without going back to microdata

  • Uses procedures for secondary cell suppression using state-of-the-art optimisation algorithms (CPR)

  • Prepared for linked tables
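
The idea of a single run through the microdata can be sketched in Python: each cell keeps its total, its number of contributors and its k largest contributions, and these summaries are enough to merge cells later without rereading the microdata. The data structure, function names and sample values are hypothetical, not the internal format of t-ARGUS.

```python
import heapq
from collections import defaultdict

def build_cells(microdata, k=3):
    """One pass over (cell_key, value) pairs, keeping a summary per cell."""
    cells = defaultdict(lambda: {"total": 0.0, "count": 0, "top": []})
    for key, value in microdata:
        cell = cells[key]
        cell["total"] += value
        cell["count"] += 1
        # Keep only the k largest contributions seen so far.
        cell["top"] = heapq.nlargest(k, cell["top"] + [value])
    return dict(cells)

def merge_cells(a, b, k=3):
    """Combine two cells (table redesign) from their stored summaries only."""
    return {
        "total": a["total"] + b["total"],
        "count": a["count"] + b["count"],
        "top": heapq.nlargest(k, a["top"] + b["top"]),
    }

# Invented microdata: (region, sector) cell key and a contribution.
microdata = [
    (("North", "Retail"), 50), (("North", "Retail"), 30),
    (("North", "Retail"), 5), (("North", "Retail"), 5),
    (("South", "Retail"), 70), (("South", "Retail"), 10),
]

cells = build_cells(microdata)
merged = merge_cells(cells[("North", "Retail")], cells[("South", "Retail")])
print(merged)
```

The k largest contributions of a merged cell are always among the stored top-k lists of its parts, so a dominance check on the redesigned table needs no further pass over the microdata.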

t-ARGUS

  • Data: fixed format ASCII

  • Meta data:

    • Variable name

    • Start position

    • Field length

    • Status

t-ARGUS

  • Apply global recoding

  • Protect file with secondary suppression

  • Rounding

  • Safe table as ASCII or .WK1 (plus report)

t-ARGUS

  • Version 2.0 final SDC-version

  • requires commercial OR-solver (Xpress by Dash, UK, 600 GBP)

Future / CASC

  • Computational Aspects of Statistical Confidentiality

  • New European project-proposal (2000-2002)

    • Extending ARGUS

    • New research

  • Additional joint USA/EU-project?

CASC: m-ARGUS

  • Concentration on business/economic data

    • microaggregation

    • PRAM

    • Noise-addition/ masking

CASC: t-ARGUS

  • Hierarchical tables

  • Linked tables

  • Optimal solution vs. heuristics

  • Different input formats

CASC team

  • Statistics Netherlands

  • Istat (Italy)

  • ONS, Univ. Southampton, Manchester, London, Plymouth (UK)

  • Bundesamt, IAB (Germany)

  • Stat. Catalunya, Univ Tenerife (Spain)

Contact

  • Anco Hundepool

  • Statistics Netherlands

  • PO box 4000

  • 2200 JM Voorburg

  • The Netherlands

  • email [email protected]

  • fax: +31 70 3375990

  • phone: +31 70 3375038