Protecting Statistical Databases Against Snoopers
Comparison of Two Methods

Disclosure vs. Anonymity

  • Information disclosure necessary for planning and numerical measurements

  • Anonymity necessary for protection of the individual and the public’s trust in systems


Medical Data

Necessary for:

  • Measuring effectiveness of current treatments

  • Finding sources of common medical mistakes

  • Tracking contagious disease

  • Planning government spending

  • Health insurance companies


Anonymity: Not as Easy as It Looks

Complete identification is possible without any uniquely identifying information: combining ordinary attributes (e.g., ZIP code, birth date, and sex) can single out an individual.


Outside Factors Affecting Privacy

  • Snooper’s supplementary knowledge

  • Public data sources

  • Rarity


Comparing Two Methods of Protection

  • What are the privacy guarantees?

  • Can useful information be gained?


Sensitivity-based Noise-adding Algorithm

  • Proposed by Dwork, McSherry, Nissim and Smith

  • Adds noise to each answer based on the sensitivity of the series of queries

  • Amount of privacy based on ε, a coefficient in the noise-generating formula


Sensitivity

How much could changing one row change an answer?

[Diagram: example sensitivities for MEAN, COUNT, and HISTOGRAM queries.]

The sensitivity of a series of queries is the sum of the sensitivities of the individual queries

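To make the mechanism concrete, here is a minimal sketch in Python. The Laplace distribution with scale sensitivity/ε is the calibration described by Dwork et al.; the example data, the value ε = 0.5, and the function name are illustrative assumptions, not values from the slides.

```python
import numpy as np

def noisy_answer(true_answer: float, sensitivity: float, epsilon: float) -> float:
    # Dwork et al.: add Laplace noise with scale = sensitivity / epsilon;
    # smaller epsilon means more noise and therefore more privacy.
    return true_answer + np.random.laplace(loc=0.0, scale=sensitivity / epsilon)

# Hypothetical COUNT query: "how many patients are under 40?"
# Changing one row changes a count by at most 1, so its sensitivity is 1.
ages = [25, 31, 47, 52, 38]
count_under_40 = sum(1 for a in ages if a < 40)
print(noisy_answer(count_under_40, sensitivity=1.0, epsilon=0.5))
```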


Coin-flip Algorithm

  • Proposed by Mishra and Sandler

  • A way for individuals to publish their own personal data

  • Amount of privacy based on ε, the bias in the coin-flip


Implementing the Coin-flip Algorithm

  • Each of the k possible answers to a query is ordered and numbered

  • If an individual’s answer to the query is the ith answer, the profile is a string of k bits in which the ith bit is one and the others are zero

  • To sanitize, each bit is flipped with probability ½ + ε/2

  • All sanitized profiles resemble random strings of ones and zeros (a sketch of the sanitization step follows)
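
A minimal sketch of the sanitization step in Python, assuming the flip probability ½ + ε/2 stated above (the function and variable names are mine):

```python
import random

def sanitize(profile: list[int], epsilon: float) -> list[int]:
    # Flip each bit independently with probability 1/2 + epsilon/2,
    # as described above; the bias epsilon is public knowledge.
    flip_prob = 0.5 + epsilon / 2
    return [1 - bit if random.random() < flip_prob else bit for bit in profile]

# A one-hot profile for the 2nd of k = 3 possible answers:
print(sanitize([0, 1, 0], epsilon=0.2))
```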


Example: HIV Status

  • Ordered possible responses: “POSITIVE, NEGATIVE, UNKNOWN”

  • The original profile of an HIV+ individual: “1, 0, 0”

  • Results of coin-flips: “STAY, FLIP, STAY”

  • Resulting sanitized profile: “1, 1, 0”

  • What do we know about the individual from the sanitized profile? (see the estimation sketch below)
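
The sanitized profile by itself reveals very little about the individual, but because the flip probability q = ½ + ε/2 is public, an analyst with many sanitized profiles can still estimate population frequencies. A sketch of that estimate, with the inversion algebra derived from the stated flip probability (names are mine):

```python
def estimate_true_frequency(sanitized_bits: list[int], epsilon: float) -> float:
    # With flip probability q = 1/2 + epsilon/2 the expected observed mean is
    #   E[mean] = q - true_freq * epsilon,
    # so inverting gives an unbiased estimate of the true frequency.
    q = 0.5 + epsilon / 2
    observed_mean = sum(sanitized_bits) / len(sanitized_bits)
    return (q - observed_mean) / epsilon
```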


My Research

  • Compare the total error the two algorithms generate on histogram / frequency queries

  • Hypothesis: The noise-adding algorithm will generate less error for few queries and the coin-flip algorithm will generate less error for many queries

  • Research question: where is the “sweet spot” at which the two algorithms’ error curves cross? (a simulation sketch follows this list)
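
Below is a sketch of the kind of simulation such a comparison could use. The population size, ε, per-query sensitivity, and random histogram frequencies are all hypothetical, so the crossover it produces will not match the research numbers; it only illustrates why a crossover is expected: noise-adding error grows with total sensitivity, while coin-flip estimation error per query stays roughly constant.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n_people, epsilon = 1000, 0.1            # hypothetical population and privacy level

def noise_adding_total_error(num_queries: int, sensitivity_per_query: float = 2.0) -> float:
    # Noise is calibrated to the sensitivity of the whole query series
    # (sum of per-query sensitivities), so every answer gets noisier as
    # more queries are asked.
    scale = sensitivity_per_query * num_queries / epsilon
    return float(np.abs(rng.laplace(0.0, scale, size=num_queries)).sum())

def coin_flip_total_error(num_queries: int) -> float:
    # Each person publishes one sanitized profile, so the estimation error
    # per query does not grow with the number of queries asked.
    q = 0.5 + epsilon / 2                # flip probability from the slides
    total = 0.0
    for _ in range(num_queries):
        true_freq = rng.random()
        true_bits = rng.random(n_people) < true_freq
        flips = rng.random(n_people) < q
        sanitized = np.where(flips, ~true_bits, true_bits)
        estimate = (q - sanitized.mean()) / epsilon
        total += abs(estimate - true_freq) * n_people   # error measured in counts
    return total

for k in (5, 20, 100, 400):
    print(k, round(noise_adding_total_error(k)), round(coin_flip_total_error(k)))
```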





[Graph: total error vs. number of queries for the two algorithms; the first “sweet spot” occurs at 189 queries.]

A Second Look

  • Range of sensitivity: 2 to 136

  • Unordered histograms:

    • At first “sweet spot”, sensitivity = 30

  • Smallest histograms first:

    • At first “sweet spot”, sensitivity = 32

  • Largest histograms first:

    • At first “sweet spot”, sensitivity = 34


Conclusions

  • For histogram / frequency queries, “sweet spots” occur between sensitivity = 30 and sensitivity = 40, so for least error (see the helper sketched after this list):

    • If sensitivity < 30, use NOISE-ADDING algorithm

    • If sensitivity > 40, use COIN-FLIP algorithm
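
The rule above could be captured in a tiny helper; the thresholds come from the slides, and the middle branch reflects the measured crossover region (a sketch, not part of the original work):

```python
def choose_algorithm(total_sensitivity: float) -> str:
    # Decision rule from the conclusions slide.
    if total_sensitivity < 30:
        return "noise-adding"
    if total_sensitivity > 40:
        return "coin-flip"
    return "either (measured crossover region)"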


Quick Bibliography

  • Survey:

    • N. R. Adam and J. C. Wortmann. Security-control methods for statistical databases: a comparative study. ACM Computing Surveys, 21(4), December 1989.

  • Noise-adding algorithm:

    • C. Dwork, F. McSherry, K. Nissim, and A. Smith. Calibrating noise to sensitivity in private data analysis. 3rd Theory of Cryptography Conference (TCC), 2006.

  • Coin-flip algorithm:

    • N. Mishra and M. Sandler. Privacy via pseudorandom sketches. Symposium on Principles of Database Systems (PODS), 2006.


Professor Alf Weaver, PhD

Professor Nina Mishra, PhD

  • REU program at UVa, sponsored by the National Science Foundation

