Protecting Statistical Databases Against Snoopers

Comparison of two methods

Presentation Transcript
Disclosure vs. Anonymity
  • Information disclosure necessary for planning and numerical measurements
  • Anonymity necessary for protection of the individual and the public’s trust in systems
Medical Data

Necessary for:

  • Measuring effectiveness of current treatments
  • Finding sources of common medical mistakes
  • Tracking contagious disease
  • Government spending planning
  • Health Insurance Companies
Anonymity: Not as Easy as it Looks

Complete Identification Without Uniquely Identifying Information

Outside Factors Affecting Privacy
  • Snooper’s supplementary knowledge
  • Public data sources
  • Rarity
Comparing Two Methods of Protection
  • What are the privacy guarantees?
  • Can useful information be gained?
Sensitivity-based Noise-adding Algorithm
  • Proposed by Dwork, McSherry, Nissim and Smith
  • Adds noise to each answer based on the sensitivity of the series of queries
  • Amount of privacy based on ε, a coefficient in the noise-generating formula
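
The noise-adding idea can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the Laplace noise distribution with scale equal to (total sensitivity) / ε, as in the Dwork et al. paper listed in the bibliography. The function names `laplace_noise` and `noisy_answers` are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a zero-mean Laplace distribution with this scale."""
    # The difference of two independent exponential samples with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_answers(true_answers, total_sensitivity, epsilon, rng=None):
    """Add independent noise to each answer in a series of queries.

    Assumption: noise scale = total_sensitivity / epsilon, so a smaller
    epsilon (more privacy) or a larger sensitivity means more noise.
    """
    rng = rng or random.Random()
    scale = total_sensitivity / epsilon
    return [answer + laplace_noise(scale, rng) for answer in true_answers]


if __name__ == "__main__":
    # Hypothetical histogram counts; sensitivity 2 is assumed for one histogram.
    print(noisy_answers([120, 45, 9], total_sensitivity=2, epsilon=0.5))
```

Because the scale grows with the total sensitivity, every extra query in the series makes all answers noisier.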
Sensitivity

How much could changing one row change an answer?

  • MEAN
  • COUNT
  • HISTOGRAMS

The sensitivity of a series of queries is the sum of the sensitivities of the individual queries.
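
The summation rule can be illustrated with a short sketch. The per-query values are assumptions, since the slides do not give them: a count changes by at most 1 when one row changes, and a histogram by at most 2 (one bin loses a row, another gains it). The names `ASSUMED_SENSITIVITY` and `series_sensitivity` are hypothetical.

```python
# Assumed per-query sensitivities (not stated in the slides).
ASSUMED_SENSITIVITY = {
    "count": 1.0,      # one changed row moves a count by at most 1
    "histogram": 2.0,  # one changed row leaves one bin and enters another
}


def series_sensitivity(query_kinds):
    """Sensitivity of a series of queries: the sum of the individual sensitivities."""
    return sum(ASSUMED_SENSITIVITY[kind] for kind in query_kinds)


if __name__ == "__main__":
    print(series_sensitivity(["histogram"]))             # a single histogram
    print(series_sensitivity(["count", "histogram"]))    # a mixed series
    print(series_sensitivity(["histogram"] * 68))        # many histograms
```

Under these assumed values, a series of 1 to 68 histogram queries would have a total sensitivity of 2 to 136.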
Coin-flip Algorithm
  • Proposed by Mishra and Sandler
  • A way for individuals to publish their own personal data
  • Amount of privacy based on ε, the bias in the coin-flip
Implementing the Coin-flip Algorithm

The k possible answers to a query are ordered and numbered

If an individual’s answer to the query is the ith answer, the profile is a string of k bits in which the ith bit is one and the others are zero

To sanitize, each bit is flipped with probability ½ + ε/2

All sanitized profiles resemble a random string of ones and zeros
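
The steps above can be sketched as follows. This is an illustration under the flip probability stated on the slide (½ + ε/2), not the authors' implementation. The function names `make_profile` and `sanitize` are hypothetical, and answers are indexed from 0 here rather than from 1.

```python
import random


def make_profile(answer_index, k):
    """Build a k-bit profile with a one at the individual's answer (0-based)."""
    if not 0 <= answer_index < k:
        raise ValueError("answer_index must be in range(k)")
    return [1 if j == answer_index else 0 for j in range(k)]


def sanitize(profile, epsilon, rng=None):
    """Flip each bit independently with probability 1/2 + epsilon/2."""
    rng = rng or random.Random()
    flip_probability = 0.5 + epsilon / 2
    return [bit ^ 1 if rng.random() < flip_probability else bit
            for bit in profile]


if __name__ == "__main__":
    profile = make_profile(answer_index=0, k=3)
    print(profile)                          # [1, 0, 0]
    print(sanitize(profile, epsilon=0.1))   # varies from run to run
```

With ε close to zero each bit is flipped about half the time, so the sanitized string carries very little information about the original profile.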
Example: HIV status
  • Ordered possible responses: “POSITIVE, NEGATIVE, UNKNOWN”
  • The original profile of an HIV+ individual: “1, 0, 0”
  • Results of coin-flips: “STAY, FLIP, STAY”
  • Resulting sanitized profile: “1, 1, 0”
  • What do we know about the individual from the sanitized profile?
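
One way to explore the closing question is to compute how likely the sanitized profile “1, 1, 0” is under each possible true answer. The sketch below assumes independent flips with probability ½ + ε/2, as stated on the implementation slide. The value ε = 0.1 is an arbitrary illustration, and the names `likelihood` and `likelihoods` are hypothetical.

```python
RESPONSES = ["POSITIVE", "NEGATIVE", "UNKNOWN"]


def likelihood(true_profile, observed_profile, epsilon):
    """Probability of seeing observed_profile given true_profile."""
    p_flip = 0.5 + epsilon / 2
    result = 1.0
    for true_bit, observed_bit in zip(true_profile, observed_profile):
        # A differing bit was flipped; a matching bit stayed.
        result *= p_flip if true_bit != observed_bit else 1.0 - p_flip
    return result


def likelihoods(observed_profile, epsilon):
    """Likelihood of the observation under each possible true response."""
    k = len(observed_profile)
    return {
        name: likelihood([1 if j == i else 0 for j in range(k)],
                         observed_profile, epsilon)
        for i, name in enumerate(RESPONSES)
    }


if __name__ == "__main__":
    for name, value in likelihoods([1, 1, 0], epsilon=0.1).items():
        print(name, round(value, 6))
```

POSITIVE and NEGATIVE each require one flip and two stays, so the observation cannot tell them apart. UNKNOWN requires three flips, and its likelihood differs from the others only by a factor that approaches 1 as ε approaches 0.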
My Research
  • Compare the total amount of error generated by histogram / frequency queries
  • Hypothesis: The noise-adding algorithm will generate less error for few queries and the coin-flip algorithm will generate less error for many queries
  • Research question: Where is the “sweet spot” where the error lines cross on a graph?
A Second Look
  • Range of sensitivity: 2 to 136
  • Unordered histograms:
    • At first “sweet spot”, sensitivity = 30
  • Smallest histograms first:
    • At first “sweet spot”, sensitivity = 32
  • Largest histograms first:
    • At first “sweet spot”, sensitivity = 34
Conclusions
  • For histogram / frequency queries, “sweet spots” occur between sensitivity = 30 and sensitivity = 40, so for least error:
    • If sensitivity < 30, use NOISE-ADDING algorithm
    • If sensitivity > 40, use COIN-FLIP algorithm
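
The rule of thumb above can be written as a small helper. This is a sketch of the stated thresholds only. The function name `recommend_algorithm` is hypothetical, and the slides do not say which algorithm to prefer between 30 and 40, so that range is reported as undecided.

```python
def recommend_algorithm(sensitivity):
    """Pick the lower-error algorithm for histogram / frequency queries."""
    if sensitivity < 30:
        return "noise-adding"
    if sensitivity > 40:
        return "coin-flip"
    # Assumption: the crossover region is left undecided.
    return "near crossover"


if __name__ == "__main__":
    for s in (2, 34, 136):
        print(s, recommend_algorithm(s))
```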
Quick Bibliography
  • Survey:
    • N R Adam and J C Wortmann. Security-control methods for statistical databases: a comparative study. ACM Computing Surveys, 21(4), December 1989.
  • Noise-adding algorithm:
    • C Dwork, F McSherry, K Nissim, A Smith. Calibrating noise to sensitivity in private data analysis. 3rd Theory of Cryptography Conference, 2006.
  • Coin-flip algorithm:
    • N Mishra, M Sandler. Symposium on Principles of Database Systems, 2006.
Professor Alf Weaver, PhD

Professor Nina Mishra, PhD

  • REU program at UVa, sponsored by the National Science Foundation