
G-Confid: Turning the tables on disclosure risk

G-Confid: Turning the tables on disclosure risk. Joint UNECE/Eurostat Work Session on Statistical Data Confidentiality, Ottawa, Canada, 30 October 2013. Peter Wright. G-Confid: a cell suppression application. Use with any table size and any number of dimensions.

Presentation Transcript


  1. G-Confid: Turning the tables on disclosure risk. Joint UNECE/Eurostat Work Session on Statistical Data Confidentiality, Ottawa, Canada, 30 October 2013. Peter Wright

  2. G-Confid: a cell suppression application
     • Use with any table size and any number of dimensions (subject to hardware / memory limitations)
     • Available for SAS 9.2 and 9.3; SAS EG 4.3 and 5.1
     Overview by component
     • PROC SENSITIVITY identifies sensitive cells
       • Highlights, inputs, strategies
     • Macro SUPPRESS creates a suppression pattern
       • Inputs, outputs, strategies
     • Macro AUDIT audits a suppression pattern

  3. PROC SENSITIVITY identifies confidential cells
     Highlights:
     • Choice of sensitivity rule: p-percent, (n,k), arbitrary
     • Allows multiple decomposition where
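
The two named rules can be illustrated with a minimal Python sketch. It uses one common textbook formulation of each rule as a linear sensitivity measure (a cell is sensitive when the value is positive); the scaling G-Confid itself applies is not given in the slides, so the function names and the exact form below are assumptions, and contributions are assumed non-negative.

```python
def p_percent_sensitivity(contributions, p):
    """p-percent rule (one common formulation, assumed here).

    The cell is sensitive when the contributions other than the two
    largest add up to less than p% of the largest contribution.
    Returns a value that is positive exactly when the cell is sensitive.
    """
    xs = sorted(contributions, reverse=True)
    largest = xs[0] if xs else 0.0
    remainder = sum(xs[2:])  # everything except the two largest
    return (p / 100.0) * largest - remainder


def nk_sensitivity(contributions, n, k):
    """(n,k) dominance rule (one common formulation, assumed here).

    The cell is sensitive when its n largest contributions exceed
    k% of the cell total.
    """
    xs = sorted(contributions, reverse=True)
    return sum(xs[:n]) - (k / 100.0) * sum(xs)


# Hypothetical cell: one dominant enterprise and a few small ones.
cell = [100, 50, 5, 3]
is_sensitive_p = p_percent_sensitivity(cell, p=20) > 0
is_sensitive_nk = nk_sensitivity(cell, n=2, k=90) > 0
```

Both measures are linear in the sorted contributions, which is what lets a suppression tool work with a single sensitivity value per cell.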

  4. Inputs for PROC SENSITIVITY
     • Definition of hierarchy(ies) for each table dimension
     • Microdata file
       • Classification variables (e.g., geography, industry)
       • Enterprise identifier
       • Enterprise value
     Tip: to reduce the sensitivity of a cell by the value of an enterprise, set the enterprise identifier to missing.

  5. Example of SAS code to run PROC SENSITIVITY

     proc sensitivity data=microfile
          outconstraint=consfile
          outcell=cellfile
          outlargest=largestfile
          hierarchy="0 East West; 0 1 2 3;"
          srule="pq .20"
          range="East A B: West C D; 1 101 201 301: 2 102 202 302: 3 103 203 303;"
          minresp=5;
       id Enterpriseid;
       var Income;
       dimension EastWest Industry;
     run;

  6. Strategies using PROC SENSITIVITY
     • Use the MINRESP=r option to set the minimum number of respondents
       • Any cell with fewer than r respondents is assigned a sensitivity of max{1, S}, where S is the sensitivity of the cell
       • Only positive (>0) values are counted as respondents
       • The MINRESP rule is ignored for a cell with a value contributed by an anonymous enterprise
     • Note: MINRESP can be used without applying a sensitivity rule
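
The MINRESP adjustment described on this slide can be sketched in a few lines of Python. This is an illustration of the stated rule only, not G-Confid code: the function name, the `(enterprise_id, value)` record layout, the use of `None` for a missing identifier, and the assumption of one record per enterprise are all hypothetical.

```python
def apply_minresp(sensitivity, contributions, r):
    """Adjust a cell's sensitivity S according to the MINRESP=r rule.

    contributions: list of (enterprise_id, value) pairs for the cell,
    assumed to hold one record per enterprise; enterprise_id is None
    for an anonymous enterprise (identifier set to missing).
    """
    # The rule is ignored when an anonymous enterprise contributes to the cell.
    if any(eid is None for eid, _ in contributions):
        return sensitivity
    # Only strictly positive values count as respondents.
    respondents = sum(1 for _, value in contributions if value > 0)
    if respondents < r:
        return max(1.0, sensitivity)
    return sensitivity


# Hypothetical cell with two positive contributors and MINRESP=5.
cell = [("E1", 10.0), ("E2", 0.0), ("E3", 5.0)]
adjusted = apply_minresp(-3.0, cell, r=5)
```

Because the result is at least 1 whenever the respondent count falls short, such a cell is flagged as sensitive even if the sensitivity rule alone would have cleared it.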

  7. Strategies using PROC SENSITIVITY (continued)
     • To reduce oversuppression, apply rules that make use of sampling weights
     Example: if the sampling weight w_i > 3, make the enterprise anonymous (set the ID value to missing). G-Confid will use its contribution to reduce the sensitivity of the cell.
     Find more strategies in: Tambay and Fillion (Proceedings of the JSM 2013)
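
The weighting example amounts to a preprocessing step on the microdata before PROC SENSITIVITY runs. A minimal Python sketch of that step follows; the record field names (`enterprise_id`, `weight`, `value`) and the use of `None` for a missing identifier are assumptions made for illustration, and the threshold of 3 is simply the one quoted on the slide.

```python
def anonymize_by_weight(records, threshold=3.0):
    """Return a copy of the microdata with heavily weighted units anonymized.

    records: list of dicts with hypothetical keys 'enterprise_id',
    'weight' and 'value'. Any record whose sampling weight exceeds the
    threshold gets its identifier set to None (missing).
    """
    return [
        {**rec, "enterprise_id": None} if rec["weight"] > threshold else dict(rec)
        for rec in records
    ]


# Hypothetical microdata: the second unit represents many others in the population.
microdata = [
    {"enterprise_id": "E1", "weight": 1.0, "value": 120.0},
    {"enterprise_id": "E2", "weight": 4.5, "value": 15.0},
]
prepared = anonymize_by_weight(microdata)
```

The original records are left unchanged, so the anonymized copy can be compared against the input when tuning the threshold.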

  8. Macro SUPPRESS – complementary suppression
     • Uses the SAS/OR® LP solver
     • Input files: (i) cell sensitivities file, and (ii) linear constraints file
     • Syntax: %Suppress(InCell=, Constraint=, CFunction1=, CFunction2=, CVar1=, CVar2=, OutCell=, ByVars=, OutComplement=, ScaleCost=);
     • Output file has the final status (Suppress, Publish) and the net variation (largest amount the cell was "moved")

  9. Strategies using the macro SUPPRESS
     • Choice of cost functions (functions of the cell total, tot):
       • SIZE (=tot)
       • DIGITS (=log[tot+1])
       • CONSTANT (=1)
       • INFORMATION (=log[tot+1]/[tot+1])
     • Can run the LP process twice to reduce the number of suppressions (e.g., SIZE or DIGITS, then INFORMATION)
     • Can favour publishing certain cells by defining higher cost values (by default, cost=tot)
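
To compare how the four cost functions rank cells, here is a small Python sketch of the formulas as written on the slide. The slide does not state the base of the logarithm; base 10 is assumed here because of the name DIGITS, and the function name `cell_cost` is hypothetical.

```python
import math


def cell_cost(tot, function="SIZE"):
    """Cost of suppressing a cell with total `tot` under a named cost function.

    Assumption: logarithms are base 10 (the slide does not say).
    """
    if tot < 0:
        raise ValueError("cell total is assumed to be non-negative")
    if function == "SIZE":
        return float(tot)
    if function == "DIGITS":
        return math.log10(tot + 1)
    if function == "CONSTANT":
        return 1.0
    if function == "INFORMATION":
        return math.log10(tot + 1) / (tot + 1)
    raise ValueError(f"unknown cost function: {function}")


# Costs of a hypothetical cell with total 99 under each function.
costs = {name: cell_cost(99, name)
         for name in ("SIZE", "DIGITS", "CONSTANT", "INFORMATION")}
```

SIZE penalizes large cells most heavily, DIGITS flattens that penalty, CONSTANT minimizes the count of suppressed cells, and INFORMATION decreases for large totals, so it tends to make large cells cheaper to suppress than mid-sized ones.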

  10. Macro AUDIT – validates a suppression pattern
     • Calculates minimum and maximum values for each suppressed cell using the LP solver
     • Provides results for each cell (protection achieved, not achieved, or exact disclosure)
     • Coming soon: pre-set narrower starting intervals than the default values (0.5tot and 1.5tot) using the Shuttle algorithm (Buzzigoli and Giusti, 2006)
     Using the Shuttle algorithm to pre-set the starting intervals reduces run time.
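
The idea of bounding each suppressed cell can be illustrated without an LP solver by iterative bound propagation over the additive constraints, which is in the spirit of shuttle-type algorithms. The Python sketch below is a simplified stand-in, not the AUDIT macro: the function name, the constraint representation, and the starting intervals are assumptions, and on general tables the propagated bounds are valid but can be looser than the LP bounds.

```python
def propagate_bounds(constraints, lower, upper, max_iter=100):
    """Tighten intervals for suppressed cells using additive constraints.

    constraints: list of (total, cells), meaning the suppressed cells in
    `cells` sum to `total` (published total minus published cells).
    lower, upper: dicts of starting bounds per suppressed cell.
    """
    lower, upper = dict(lower), dict(upper)
    for _ in range(max_iter):
        changed = False
        for total, cells in constraints:
            for c in cells:
                others = [o for o in cells if o != c]
                # c is largest when the others are smallest, and vice versa.
                new_hi = total - sum(lower[o] for o in others)
                new_lo = total - sum(upper[o] for o in others)
                if new_hi < upper[c]:
                    upper[c] = new_hi
                    changed = True
                if new_lo > lower[c]:
                    lower[c] = new_lo
                    changed = True
        if not changed:
            break
    return lower, upper


# Hypothetical 2x2 table with all interior cells suppressed and margins published.
constraints = [(30, ["a", "b"]), (70, ["c", "d"]),   # row totals
               (40, ["a", "c"]), (60, ["b", "d"])]   # column totals
start_lo = {c: 0 for c in "abcd"}
start_hi = {c: 100 for c in "abcd"}                  # grand total as a crude upper bound
lo, hi = propagate_bounds(constraints, start_lo, start_hi)

# A cell whose interval collapses to one point is exactly disclosed.
exact = [c for c in lo if lo[c] == hi[c]]
```

In this example every interval keeps a positive width, so no cell is exactly disclosed; whether the width is enough for "protection achieved" depends on the sensitivity rule in force, which this sketch does not evaluate.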

  11. Conclusion
     • PROC SENSITIVITY
       • Use pre-defined or customized sensitivity rule
       • Can do multiple decomposition
       • MINRESP function
       • Can apply weighting strategies
     • Macro SUPPRESS
       • Can favour cells to publish (or suppress)
     • Macro AUDIT
     Coming soon: additive controlled rounding

  12. For more information, please contact: Peter Wright, Peter.Wright@statcan.gc.ca
