### Protecting the Confidentiality of Tables by Adding Noise to the Underlying Microdata

Paul Massell and Jeremy Funk

Statistical Research Division

U.S. Census Bureau

Washington, DC 20233

Talk Outline

- Overview of EZS Noise
- Measuring Effectiveness of Perturbative Protection
- Noise Applied to Weighted Data
- Noise Applied to Unweighted Data: Random vs. Balanced Noise
- Conclusions and Future Research

The EZS Noise Method (Evans, Zayatz, Slanta)

- Developed by Tim Evans, Laura Zayatz, and John Slanta in the 1990s
- Multiplicative noise is added to the underlying microdata, before table creation
- A noise factor or multiplier is randomly generated for each record

- The distribution of the multipliers should produce unbiased estimates, and ensure that no multipliers are too close to 1
- Weights both known and unknown to users are combined with the noise factors to obtain ‘noisy’ values for all records
- When tabulated, in general, sensitive cells are changed quite a bit and non-sensitive cells are changed only by a small amount

Attractive Features of EZS

- Tables with noisy data are created in the same way as the original tables: simply replace var X with var X-noisy
- Tables are automatically additive
- An approximate value could be released for every cell (depending on agency policy)
- No complementary suppressions

- Linked tables and special tabs are automatically protected consistently
- EZS allows for protection at the company level (Census requirement)
- Ease of implementation compared to methods such as cell suppression

Measuring Effectiveness of the EZS Method

- Step 1: Determine which cells in a table are sensitive – e.g., using p% Sensitivity Rule
- Step 2: Measure level of protection to sensitive cells (using protection multipliers)
- Step 3: Measure amount of perturbation to non-sensitive cells (via % change graph)

The p% Sensitivity Rule

- Unweighted Data:
- Let T = cell total; x1, x2 = top 2 contributions
- Let ‘rem’ denote the remainder: rem = T – (x1 + x2)
- Let ‘prot’ denote the suggested protection: prot = (p/100) * x1 – rem
- If prot > 0, then when Contributor 2 tries to estimate x1, rem does NOT provide enough uncertainty; additional protection is needed, and noise may provide this uncertainty
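
The unweighted rule above can be sketched in a few lines of Python. This is a minimal illustration, not the Census Bureau's implementation: the function names and the sample cell are hypothetical, and treating a missing second contributor as x2 = 0 is an assumption made here so that single-contributor cells can be evaluated.

```python
def p_percent_protection(contributions, p):
    """Return (rem, prot) for one cell under the unweighted p% rule."""
    ordered = sorted(contributions, reverse=True)
    x1 = ordered[0]
    # Assumption: a cell with one contributor has x2 = 0.
    x2 = ordered[1] if len(ordered) > 1 else 0.0
    total = sum(ordered)
    rem = total - (x1 + x2)
    prot = (p / 100.0) * x1 - rem
    return rem, prot


def is_sensitive(contributions, p):
    """A cell is sensitive when the suggested protection is positive."""
    return p_percent_protection(contributions, p)[1] > 0


# Hypothetical cell: T = 1000, x1 = 800, x2 = 150, so rem = 50.
cell = [800, 150, 30, 20]
rem, prot = p_percent_protection(cell, p=10)
print(rem, prot, is_sensitive(cell, p=10))
```

With p = 10, the remainder of 50 is smaller than 10% of x1 (80), so prot = 30 and the cell needs additional protection.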

p% Sensitivity Rule

- Weighted Data:
- TA = Fully Weighted Cell Estimate
- X1 = Largest Cell Respondent Contribution
- X2 = 2nd Largest Cell Contribution
- wkn = Known Weights
- wun = Unknown Weights

Extended p% Rule with Weights and Rounding

- rem = TA – (X1 * wkn1 + X2 * wkn2 )
- prot = ( (p/100) * X1 * wkn1 ) – rem
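
The two weighted formulas translate directly into code. The sketch below only restates them; the function name and the sample numbers are hypothetical, and the rounding adjustment mentioned in the slide title is not covered because its details are not given here.

```python
def weighted_p_percent_protection(ta, x1, wkn1, x2, wkn2, p):
    """Return (rem, prot) for one cell under the weighted p% rule.

    ta    -- fully weighted cell estimate (TA)
    x1    -- largest respondent contribution, with known weight wkn1
    x2    -- second-largest contribution, with known weight wkn2
    """
    rem = ta - (x1 * wkn1 + x2 * wkn2)
    prot = (p / 100.0) * x1 * wkn1 - rem
    return rem, prot


# Hypothetical cell where the top two weighted contributions dominate.
rem_w, prot_w = weighted_p_percent_protection(
    ta=2000, x1=800, wkn1=2, x2=150, wkn2=2, p=10
)
print(rem_w, prot_w)  # prot_w > 0 means the cell is sensitive
```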

Measuring the Effectiveness of a Perturbative Protection Method

- Protection of Sensitive Cells :
- Define Protection Multiplier (PM)
- PM = abs (perturbation) / prot
- Find how many (or %) have PM < 1
- Data Quality:
- Important: % change for non-sensitive cells
- Less important: % over-perturbation for sensitive cells
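
Both measures are simple to compute once original and noisy cell totals are available. The following is a minimal sketch with hypothetical function names and made-up cell values; it takes "perturbation" to mean the difference between the noisy and original cell totals.

```python
def protection_multiplier(original_total, noisy_total, prot):
    """PM = abs(perturbation) / prot, defined for sensitive cells (prot > 0)."""
    if prot <= 0:
        raise ValueError("PM is only defined for sensitive cells (prot > 0)")
    return abs(noisy_total - original_total) / prot


def percent_change(original_total, noisy_total):
    """Absolute % change of a cell, the data quality measure."""
    return 100.0 * abs(noisy_total - original_total) / original_total


def share_under_protected(sensitive_cells):
    """Fraction of sensitive cells with PM < 1.

    sensitive_cells is a list of (original_total, noisy_total, prot) tuples.
    """
    pms = [protection_multiplier(o, n, prot) for o, n, prot in sensitive_cells]
    return sum(pm < 1 for pm in pms) / len(pms)


# Made-up sensitive cells: (original total, noisy total, prot).
example_cells = [(1000, 1060, 30), (500, 510, 25), (200, 224, 12)]
print(share_under_protected(example_cells))
```

A PM of at least 1 means the noise moved the cell by at least the amount the p% rule asked for; the second example cell, with a perturbation of 10 against a prot of 25, falls short.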

EZS Noise Factors for Unweighted Data

- Let X = original microdata value
- Let Y = perturbed value
- Let M = noise multiplier; i.e. a draw from a specified noise distribution of EZS type
- Y = X * M

Noise Distribution used for all examples:

- (a=1.05, b=1.15) 5% to 15% noise
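
A minimal sketch of drawing multipliers and applying Y = X * M follows. The slides give only the parameters (a, b); the specific shape used here is an assumption: a uniform draw on [a, b], reflected to [2 - b, 2 - a] with probability 1/2. That shape is symmetric about 1 (so the noise is unbiased) and keeps every multiplier at least a - 1 away from 1, which matches the stated requirements. The function names are hypothetical.

```python
import random


def draw_multiplier(rng, a=1.05, b=1.15):
    """Draw one noise multiplier M from an assumed split-uniform distribution."""
    magnitude = rng.uniform(a, b)   # upward multiplier in [a, b]
    if rng.random() < 0.5:
        return magnitude
    return 2.0 - magnitude          # mirrored downward, in [2 - b, 2 - a]


def add_noise(values, rng, a=1.05, b=1.15):
    """Apply Y = X * M with an independent multiplier per record."""
    return [x * draw_multiplier(rng, a, b) for x in values]


rng = random.Random(12345)          # fixed seed only to make the demo repeatable
microdata = [120.0, 45.0, 980.0]
print(add_noise(microdata, rng))
```

In practice one multiplier would be generated per record (or per company) and reused for every table, so that linked tables are perturbed consistently.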

Noise Applied to Weighted Data

- Key idea: weights (e.g., sample weights) provide protection to microdata, since users typically “know” weights only roughly (except when close to 1)
- Not necessary to apply the full M factor to X unless w = 1

EZS Noise Factor for Weighted Data

- Weighted Data:
- For a simple weight w with associated uncertainty interval at least as wide as 2*b*w, the noise factor S can be combined with w to form the Joint Noise-Weight Factor
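
The joint factor itself is not shown in this transcript. As a hedged illustration only, the sketch below assumes the form M + (w - 1) found in published descriptions of the EZS method, and takes the noise factor (S on the slide) to be the multiplier M. Under that assumption the full multiplier applies only when w = 1, which agrees with the earlier slide. The function names and numbers are hypothetical.

```python
def joint_noise_weight_factor(m, w):
    """Assumed joint factor: noise hits the respondent's own unit only;
    the other (w - 1) units it represents are left unperturbed."""
    return m + (w - 1.0)


def noisy_weighted_value(x, m, w):
    """Noisy weighted contribution of one record under the assumed factor."""
    return x * joint_noise_weight_factor(m, w)


x, m = 100.0, 1.10
for w in (1.0, 20.0):
    exact = x * w
    noisy = noisy_weighted_value(x, m, w)
    print(w, exact, noisy, 100.0 * (noisy - exact) / exact)
```

With w = 1 the record moves by the full 10%; with w = 20 the same multiplier moves the weighted contribution by only 0.5%, which is one way to see why heavily weighted cells change little.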

Noise Formula for Known and Unknown Weights

- Calculation of Perturbed Values:
- wkn is the known weight
- wun is the unknown weight.

Noise for Weighted Data: Commodity Flow Survey (CFS)

- Measures flow of goods via transport system in U.S.
- Estimates volume and value of each commodity shipped: by origin, destination, modes of transport
- Used for transport modeling, planning, ...
- Some users have objected to disclosure suppressions

Effect of Noise on High Level Aggregate Cells

- CFS Table: National 2-Digit Commodity
- Data Quality Measure: 43 cells; 0 are sensitive
- 41 cells change by 0% to 1%
- 2 cells change by 1% to 2%

CFS Test Table

- (Origin State by Destination State by 2-digit Commodity)
- 61,174 cells, of which 230 are sensitive
- Data Quality and Protection Assessments (following slides)

CFS Noise Results: Data Quality Assessment

- While some cells may receive large doses of noise, the vast majority get less than 1% or 2%

CFS Random Noise: Protection Assessment

- Most sensitive cells receive significant noise, i.e. 5% to 11%
- Only 2 out of 230 sensitive cells do not receive full protection from noise, as measured by Protection Multipliers (PM)

Noise for Unweighted Data: Non-Employers Statistics

- Special Features of Microdata
- Unweighted administrative data
- Only 1 variable to protect: receipts
- Many small integers (after rounding to $1000)
- Special Features of Key Table
- Many cells have a small number of contributors; these include many safe cells
- Many sensitive cells with only 1 or 2 contributors

NE Noise Results: Data Quality Assessment

- Lack of weights results in much more distortion to non-sensitive cells than occurs for CFS

NE Noise Results: Protection Assessment

- Resembles the noise factor distribution, due to the prevalence of 1-respondent cells in the NE test table and the absence of weights

Noise Balancing

- Is there a way to improve data quality in this situation?
- Yes, if one can focus on one key table T
- Idea: balance noise at each cell in ‘balancing sub-table B of T ’ (defn: every micro value is in at most one cell of B)
- Choose noise directions to maximize noise cancellation for each cell of B
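
The slides do not spell out how the noise directions are chosen. One simple possibility, offered here as an assumption and not as the authors' algorithm, is a greedy pass over each cell of B: process records from largest to smallest and point each record's noise against the running net noise of the cell. Function names and the sample cell are hypothetical, and the noise magnitudes reuse the (a, b) = (1.05, 1.15) parameters with an assumed uniform draw.

```python
import random


def balanced_multipliers(values, rng, a=1.05, b=1.15):
    """Greedy noise balancing for the records of ONE cell of sub-table B.

    Each record still gets a multiplier at least a - 1 away from 1;
    only the direction (up or down) is chosen to cancel net noise.
    """
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    multipliers = [1.0] * len(values)
    net = 0.0                                   # running net noise in the cell
    for i in order:
        frac = rng.uniform(a, b) - 1.0          # noise magnitude as a fraction
        if net > 0:
            direction = -1
        elif net < 0:
            direction = 1
        else:
            direction = rng.choice((-1, 1))     # first record: random direction
        multipliers[i] = 1.0 + direction * frac
        net += values[i] * direction * frac
    return multipliers


cell_values = [500.0, 480.0, 20.0, 10.0]
mults = balanced_multipliers(cell_values, random.Random(7))
net_noise = sum(v * (m - 1.0) for v, m in zip(cell_values, mults))
print(mults, net_noise)
```

Because every micro value lies in at most one cell of B, the directions can be chosen cell by cell without conflict. A single-contributor cell still receives its full noise, so sensitive one-respondent cells remain perturbed.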

Noise Balancing: Supportive NE Characteristics

- Balancing works especially well for NE because a high % of microdata is single unit
- After balancing interior cells, need to check noise effect on aggregate cells in same table
- Also need to check noise effect in higher and lower tables; these we call “trickle up” and “trickle down” effects
- For NE, there are few of these other tables; this makes the balancing decision easier

NE – Balanced Noise: Data Quality Assessment

- Vast improvement in data quality
- Resembles that of weighted data in CFS

NE – Balanced Noise: Protection Assessment

- Very similar to Random Noise application
- 91.7% of sensitive cells fully protected

Random Noise vs. Balanced Noise: Non-Employer Test Data

- Data Quality is greatly improved
- Protection Level is not significantly reduced
- Thus Balanced Noise is a Good Choice Here

PM density curves on [0,1] are nearly identical for 2 methods

Conclusions

- Conclusions:
- EZS Noise is a useful method for protecting tables from a variety of economic programs
- There are now several variations of the basic EZS method; which is best for a survey depends on both microdata and table characteristics

Future Research

- 1. Should some sensitive cells be suppressed and high-noise cells flagged?
- 2. How to handle multiple variables?
- 3. What is the most that users can be told about the noise process without compromising data protection?
- 4. How to handle company dynamics (births, deaths, mergers, …)?
- 5. How to coordinate survey protection?
