Protecting the Confidentiality of Tables by Adding Noise to the Underlying Microdata

Paul Massell and Jeremy Funk

Statistical Research Division

U.S. Census Bureau

Washington, DC 20233

[email protected]


Talk Outline

  • Overview of EZS Noise

  • Measuring Effectiveness of Perturbative Protection

  • Noise Applied to Weighted Data

  • Noise Applied to Unweighted Data: Random vs. Balanced Noise

  • Conclusions and Future Research


The EZS Noise Method (Evans, Zayatz, Slanta)

  • Developed by Tim Evans, Laura Zayatz, and John Slanta in the 1990s

  • Multiplicative noise is added to the underlying microdata, before table creation

  • A noise factor or multiplier is randomly generated for each record


The EZS Noise Method (Evans, Zayatz, Slanta)

  • The distribution of the multipliers should produce unbiased estimates, and ensure that no multipliers are too close to 1

  • Weights both known and unknown to users are combined with the noise factors to obtain ‘noisy’ values for all records

  • When tabulated, in general, sensitive cells are changed quite a bit and non-sensitive cells are changed only by a small amount


Attractive Features of EZS

  • Tables with noisy data are created in the same way as the original tables: simply replace var X with var X-noisy

  • Tables are automatically additive

  • An approximate value could be released for every cell (depends on agency policy)

  • No complementary suppressions


Attractive Features of EZS

  • Linked tables and special tabs are automatically protected consistently

  • EZS allows for protection at the company level (Census requirement)

  • Ease of implementation compared to methods such as cell suppression


Measuring Effectiveness of the EZS Method

  • Step 1: Determine which cells in a table are sensitive – e.g., using p% Sensitivity Rule

  • Step 2: Measure level of protection to sensitive cells (using protection multipliers)

  • Step 3: Measure amount of perturbation to non-sensitive cells (via % change graph)


The p% Sensitivity Rule

  • Unweighted Data:

  • Let T = cell total; x1, x2 = the top 2 contributions

  • Let ‘rem’ denote remainder

  • Set rem = T – (x1 + x2)

  • Let ‘prot’ denote suggested protection

  • Set prot = (p/100) * x1 – rem

  • If prot > 0, then when Contributor 2 tries to estimate x1, rem does NOT provide enough uncertainty; additional protection is needed; noise may provide this uncertainty
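
The unweighted rule above can be sketched in a few lines of Python. This is only an illustration of the formulas on the slide; the function names and the example cell values are hypothetical, not taken from the talk.

```python
def p_percent_protection(contributions, p):
    """Return (rem, prot) for one unweighted cell under the p% rule.

    contributions: non-negative contributor values in the cell.
    p: the p% parameter (e.g. 10 for a 10% rule).
    """
    values = sorted(contributions, reverse=True)
    total = sum(values)                              # T = cell total
    x1 = values[0] if values else 0.0                # largest contribution
    x2 = values[1] if len(values) > 1 else 0.0       # second largest
    rem = total - (x1 + x2)                          # rem = T - (x1 + x2)
    prot = (p / 100.0) * x1 - rem                    # prot = (p/100) * x1 - rem
    return rem, prot


def is_sensitive(contributions, p):
    """A cell needs additional protection when prot > 0."""
    return p_percent_protection(contributions, p)[1] > 0


# Hypothetical cells, for illustration only.
dominated_cell = [800.0, 150.0, 30.0, 20.0]   # one large contributor
balanced_cell = [400.0, 300.0, 200.0, 100.0]  # no dominant contributor

print(p_percent_protection(dominated_cell, p=10))
print(is_sensitive(dominated_cell, p=10), is_sensitive(balanced_cell, p=10))
```

In the first cell the remainder is small relative to 10% of x1, so prot is positive and the cell is flagged; in the second the remainder alone gives Contributor 2 enough uncertainty.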


p% Sensitivity Rule

  • Weighted Data:

  • TA = Fully Weighted Cell Estimate

  • X1 = Largest Cell Respondent Contribution

  • X2 = 2nd Largest Cell Contribution

  • wkn = Known Weights

  • wun = Unknown Weights


Extended p% rule w. weights & rounding

  • rem = TA – (X1 * wkn1 + X2 * wkn2 )

  • prot = ( (p/100) * X1 * wkn1 ) – rem
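
A minimal Python sketch of the two weighted formulas above follows. The function name and the numbers are hypothetical; the rounding adjustment mentioned in the slide title is not spelled out in the transcript, so it is left out here.

```python
def weighted_p_percent(TA, X1, X2, wkn1, wkn2, p):
    """Return (rem, prot) for a weighted cell.

    TA:   fully weighted cell estimate
    X1:   largest respondent contribution, with known weight wkn1
    X2:   second largest contribution, with known weight wkn2
    p:    the p% parameter
    """
    rem = TA - (X1 * wkn1 + X2 * wkn2)       # rem = TA - (X1*wkn1 + X2*wkn2)
    prot = (p / 100.0) * X1 * wkn1 - rem     # prot = (p/100)*X1*wkn1 - rem
    return rem, prot


# Hypothetical values, for illustration only.
rem, prot = weighted_p_percent(TA=1200.0, X1=1000.0, X2=100.0,
                               wkn1=1.0, wkn2=1.5, p=10)
print(rem, prot)  # the cell is sensitive when prot > 0
```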


Measuring the Effectiveness of a Perturbative Protection Method

  • Protection of Sensitive Cells :

  • Define Protection Multiplier (PM)

  • PM = abs (perturbation) / prot

  • Find how many (or %) have PM < 1

  • Data Quality:

  • Important: % change for non-sensitive cells

  • Less important: % over-perturbation for sensitive cells
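
The two measures above can be sketched as follows. This assumes that "perturbation" means the difference between the noisy and the original cell total, which the slide does not state explicitly; the function names and the example cells are hypothetical.

```python
def protection_multiplier(original_total, noisy_total, prot):
    """PM = abs(perturbation) / prot, defined for sensitive cells (prot > 0)."""
    if prot <= 0:
        raise ValueError("PM is only defined for sensitive cells (prot > 0)")
    return abs(noisy_total - original_total) / prot


def under_protected(cells):
    """Count and share of sensitive cells with PM < 1.

    cells: list of (original_total, noisy_total, prot) tuples.
    """
    pms = [protection_multiplier(o, n, pr) for o, n, pr in cells]
    count = sum(1 for pm in pms if pm < 1)
    return count, count / len(pms)


def percent_change(original_total, noisy_total):
    """Data-quality measure for non-sensitive cells (original_total != 0)."""
    return 100.0 * abs(noisy_total - original_total) / abs(original_total)


# Hypothetical sensitive cells: (original, noisy, prot).
sensitive = [(1000.0, 1060.0, 30.0), (500.0, 510.0, 25.0), (200.0, 178.0, 10.0)]
print(under_protected(sensitive))
print(percent_change(4000.0, 4020.0))
```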


EZS Noise Factors for Unweighted Data

  • Let X = original microdata value

  • Let Y = perturbed value

  • Let M = noise multiplier; i.e. a draw from a specified noise distribution of EZS type

  • Y = X * M
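
As an illustration of Y = X * M, the sketch below draws multipliers from a split uniform distribution on [1-b, 1-a] ∪ [1+a, 1+b]. That shape is an assumption chosen because it is symmetric about 1 (so estimates are unbiased in expectation) and keeps every multiplier at least a away from 1; the transcript does not specify the distribution actually used, and the parameters a, b and the function names are hypothetical.

```python
import random


def draw_multiplier(rng, a, b):
    """Draw M from an assumed split uniform on [1-b, 1-a] U [1+a, 1+b]."""
    offset = rng.uniform(a, b)          # distance of M from 1, at least a
    direction = rng.choice((-1, 1))     # noise direction, equally likely
    return 1.0 + direction * offset


def add_noise(values, a=0.05, b=0.15, seed=0):
    """Return (multipliers, noisy_values) with one multiplier per record."""
    rng = random.Random(seed)
    multipliers = [draw_multiplier(rng, a, b) for _ in values]
    noisy = [x * m for x, m in zip(values, multipliers)]   # Y = X * M
    return multipliers, noisy


# Hypothetical microdata values.
X = [120.0, 80.0, 45.0, 300.0]
M, Y = add_noise(X)
print(M)
print(Y)
```

Tabulating Y in place of X then yields the noisy tables; because the noise lives in the microdata, any table built from Y is additive by construction.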



Noise Applied to Weighted Data

  • Key idea: weights (e.g., sample weights) provide protection to microdata since users typically “know” weights only roughly (except when close to 1)

  • Not necessary to apply full M factor to X unless w = 1


EZS Noise Factor for Weighted Data

  • Weighted Data:

  • For a simple weight w with associated uncertainty interval at least as wide as 2*b*w, the noise factor S can be combined with w to form the Joint Noise-Weight Factor


Noise Formula for Known and Unknown Weights

  • Calculation of Perturbed Values:

  • wkn is the known weight

  • wun is the unknown weight.


Noise for Weighted Data: Commodity Flow Survey (CFS)

  • Measures flow of goods via transport system in U.S.

  • Estimates volume and value of each commodity shipped: by origin, destination, modes of transport

  • Used for transport modeling, planning, ...

  • Some users have objected to disclosure suppressions


Effect of Noise on High Level Aggregate Cells

  • CFS Table: National 2-Digit Commodity

  • Data Quality Measure: 43 cells; 0 are sensitive

  • 41 cells change by [0 - 1] %

  • 2 cells change by [1 - 2] %


CFS Test Table

  • (Origin State by Destination State by 2 digit Commodity)

  • 61,174 cells of which 230 are sensitive

  • Data Quality and Protection Assessments (following slides)


CFS Noise Results: Data Quality Assessment

  • While some cells may receive large doses of noise, the vast majority get less than 1% or 2%


CFS Random Noise: Protection Assessment

  • Most sensitive cells receive significant noise, i.e. 5% to 11%

  • Only 2 out of 230 sensitive cells do not receive full protection from noise, as measured by Protection Multipliers (PM)


Noise for Unweighted Data: Non-Employers Statistics

  • Special Features of Microdata

  • Unweighted administrative data

  • Only 1 variable to protect: receipts

  • Many small integers (after rounding to $1000)

  • Special Features of Key Table

  • Many cells have a small number of contributors; these include many safe cells

  • Many sensitive cells with only 1 or 2 contributors


NE Noise Results: Data Quality Assessment

  • Lack of weights results in much more distortion to non-sensitive cells than occurs for CFS


NE Noise Results: Protection Assessment

  • Resembles noise factor distribution, due to prevalence of 1 respondent cells in NE test table and no weights


Noise Balancing

  • Is there a way to improve data quality in this situation?

  • Yes, if one can focus on one key table T

  • Idea: balance noise at each cell in a ‘balancing sub-table B of T’ (definition: every micro value is in at most one cell of B)

  • Choose noise directions to maximize noise cancellation for each cell of B
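
One way to picture the balancing idea is a greedy pass over each cell of B: take the records in order of decreasing noise amount and point each one against the running net noise of its cell. This is a hypothetical sketch, not the procedure used in the talk; it assumes non-negative values, one multiplier per record, and the same assumed split uniform noise magnitudes (parameters a and b) as a plain random-noise application.

```python
import random
from collections import defaultdict


def balanced_multipliers(records, a=0.05, b=0.15, seed=0):
    """records: list of (cell_id, value), each value in exactly one cell of B.

    Returns one multiplier per record, with directions chosen greedily so
    that noise within each cell tends to cancel.
    """
    rng = random.Random(seed)
    by_cell = defaultdict(list)
    for idx, (cell_id, value) in enumerate(records):
        offset = rng.uniform(a, b)               # noise magnitude, as in random noise
        by_cell[cell_id].append((idx, value, offset))

    multipliers = [1.0] * len(records)
    for members in by_cell.values():
        # Largest noise amounts first, so later records can offset them.
        members.sort(key=lambda m: m[1] * m[2], reverse=True)
        net = 0.0
        for idx, value, offset in members:
            if net > 0:
                direction = -1                   # push against positive net noise
            elif net < 0:
                direction = 1                    # push against negative net noise
            else:
                direction = rng.choice((-1, 1))  # no net noise yet: random direction
            multipliers[idx] = 1.0 + direction * offset
            net += direction * value * offset
    return multipliers


def cell_net_noise(records, multipliers):
    """Net noise (noisy total minus original total) for each cell of B."""
    net = defaultdict(float)
    for (cell_id, value), m in zip(records, multipliers):
        net[cell_id] += value * (m - 1.0)
    return dict(net)


# Hypothetical records: (cell of B, value).
records = [("A", 120.0), ("A", 80.0), ("A", 45.0),
           ("B", 300.0), ("C", 10.0), ("C", 12.0)]
multipliers = balanced_multipliers(records)
print(cell_net_noise(records, multipliers))
```

Under this greedy rule the net noise in a cell never exceeds the largest single noise amount in that cell, while a one-contributor cell (such as "B" here) still receives its full, randomly directed noise.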


Noise Balancing: Supportive NE Characteristics

  • Balancing works especially well for NE because a high % of microdata is single unit

  • After balancing interior cells, need to check noise effect on aggregate cells in same table

  • Also need to check noise effect in higher and lower tables; these we call “trickle up” and “trickle down” effects

  • For NE, there are few of these other tables; this makes the balancing decision easier


NE – Balanced Noise: Data Quality Assessment

  • Vast improvement in data quality

  • Resembles that of weighted data in CFS


NE – Balanced Noise: Protection Assessment

  • Very similar to Random Noise application

  • 91.7% of sensitive cells fully protected


Random Noise vs. Balanced Noise: Non-Employer Test Data

  • Data Quality is greatly improved

  • Protection Level is not significantly reduced

  • Thus Balanced Noise is a Good Choice Here

PM density curves on [0,1] are nearly identical for 2 methods


Conclusions

  • Conclusions:

  • EZS Noise is a useful method for protecting tables from a variety of economic programs

  • There are now several variations of the basic EZS method; which is best for a survey depends on both microdata and table characteristics


Future Research

  • 1. Should some sensitive cells be suppressed; high noise cells flagged?

  • 2. How to handle multiple variables?

  • 3. What is the most that users can be told about noise process without compromising data protection?

  • 4. How to handle company dynamics (births, deaths, mergers, ...)?

  • 5. How to coordinate survey protection?

