
Setting the Stage: How De-Identification Came into U.S. Law, and Why the Debate Matters Today

Professor Peter Swire

Ohio State University/Future of Privacy Forum

FPF Conference on DeIdentification

National Press Club

December 5, 2011

Overview
  • U.S. history: Census, federal agency statistics, & HIPAA
  • Why Deidentification (DeID) matters today
    • The debate – it works or it doesn’t
    • Three threat models
    • Analogy to law enforcement
  • Big picture – useful for many tasks, even with the limits shown by scientists
Census, Statistics & DeID
  • Many years of Census experience
    • Highly useful data
    • Deidentified
      • Periodic opposition to mandatory reporting
      • Needed strong confidentiality promises
    • Suppress small cell size
      • Only home in a census tract
    • Fuzz data
    • Strict rules against release even for national security purposes
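A minimal sketch of two of the techniques listed above, small-cell suppression and fuzzing. The threshold of 5, the tract names, the income figures, and the noise spread are all assumptions made for illustration; they are not taken from the slides and do not describe actual Census Bureau rules.

```python
import random
from collections import Counter


def suppress_small_cells(counts, threshold=5):
    """Withhold (None) any cell whose count is below the threshold."""
    return {cell: (n if n >= threshold else None) for cell, n in counts.items()}


def fuzz(value, spread, rng):
    """Add bounded random noise so the exact value is not released."""
    return value + rng.uniform(-spread, spread)


# Hypothetical records: (census tract, household income).
records = [
    ("tract_A", 52000), ("tract_A", 61000), ("tract_A", 48000),
    ("tract_A", 75000), ("tract_A", 58000),
    ("tract_B", 90000),  # the only home in its tract
]

counts = Counter(tract for tract, _ in records)
published = suppress_small_cells(counts, threshold=5)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
fuzzed_incomes = [fuzz(income, 1000, rng) for _, income in records]
```

Here `tract_B` contains a single home, so its count is withheld rather than published, which mirrors the "only home in a census tract" case on the slide.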
Federal Agency Statistics
  • Codification in Confidential Information Protection & Statistical Efficiency Act of 2002 (CIPSEA)
    • Good history by Sylvester & Lohr
  • Basic rule: if collect data for statistical purposes, use only for statistical purposes, don’t ReID
  • Funny thing: same culture & practice for years in private sector polling (Gallup-style) and market research
  • Many years of practice here
  • Perhaps a basic guideline going forward?
HIPAA
  • 1999-2000 regs informed by Sweeney research
  • Safe harbor – delete a lot of specified data fields
  • Expert determination (I pushed for this) – where there is a statistical basis, can achieve DeID based on risk, not safe harbor
  • Data use agreements – release for research, with enforceable promise not to ReID
  • In short:
    • If scrubbed enough, can release publicly
    • If scrubbed less, then enforceable promise not to ReID
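The safe harbor idea above, deleting specified data fields before release, can be sketched as follows. The field names and the sample record are hypothetical and are not the regulatory list of identifiers; this is only an illustration of the "scrub, then release" pattern.

```python
# Hypothetical set of fields to delete; NOT the actual HIPAA safe harbor list.
SPECIFIED_FIELDS = {"name", "street_address", "phone", "ssn"}


def scrub(record, fields_to_delete=SPECIFIED_FIELDS):
    """Return a copy of the record with the specified fields removed."""
    return {k: v for k, v in record.items() if k not in fields_to_delete}


# Hypothetical patient record.
patient = {
    "name": "Jane Doe",
    "ssn": "000-00-0000",
    "phone": "555-0100",
    "diagnosis": "asthma",
    "state": "OH",
}

released = scrub(patient)
```

Under the slide's summary, a record scrubbed this heavily could be released publicly, while a less heavily scrubbed record would travel only with an enforceable promise not to ReID.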
Why It Matters Today
  • Now data mining far beyond specialized researchers
    • The Internet (commercial since only 1993) gives me access to data
    • Storage & processing on my laptop > mainframe of 25 years ago
    • Search is way better
    • The erosion of practical obscurity – “they” really may figure out who “we” are
The Debate is Joined
  • Ohm (and others) draw on Sweeney-type research
    • DeID likely to lead to ReID
  • Yakowitz (and others) respond
    • Benefits of public data enormous
    • Practical risk/harm from ReID low
  • Anonymization creates huge risks or low risks?
  • Worth doing anonymization/DeID at all?
  • Today’s conference to shed light on this …
Threat Models – Which Attackers?
  • Three types of attackers on “anonymized” data:
    • Insiders “peeping”
    • Outside hackers intruding
    • The public who doesn’t get into the database
  • DeID often effective for first two
  • Ohm/Yakowitz debate primarily on the third
Insiders Peeping
  • Swire 2009 Peeping article, at peterswire.net
  • Threat: employee or employee of sub-contractor sees the data and “peeps”
    • Sees celebrity information - Clooney
    • Sees information about friend/family/ex
    • Sees information to create harm (ID theft, blackmail)
  • Anonymization useful part of anti-peeping strategy
    • Employee doesn’t search or stumble upon Clooney
    • Employee may lack tools to do Sweeney-type analysis
    • Audit logs catch employees who try
    • Give employees access to statistical data, not PII
Outside Hackers
  • Hacker may intrude for a short while
    • Anonymization may prevent “ah hah” – Clooney
  • Hacker may download database
    • If so, then hacker becomes similar to the public
    • May or may not be good at Sweeney-type tricks
    • May be focused on specific types of information, and not try to ReID
  • Less-than-perfect DeID may substantially reduce incidence of ReID
Re-ID by “The Public”
  • So, masking may help against some threats
  • The debate, though, is whether “the public” (i.e., the experts) can ReID
  • Sweeney & other research provides startling & important results of ReID
    • Can everything be ReIdentified?
ReID & 2 Famous Studies
  • Date of birth, zip, & gender -> 80%+ unique
    • Yes
    • BUT, DOB is off-the-charts different
      • Gender – splits population in half
      • DOB = 366 (days) x 80 (years) = over 25,000 cells
      • Moral – DOB ridiculously strong to ReID
  • Netflix study: can ReID over 60% of movie reviews
    • BUT, takes known IMDb reviewers and matches to Netflix
    • Can ReID a lot, but not a big effect
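The date-of-birth arithmetic on this slide can be worked through directly. The sketch below uses the slide's own figures (366 days, 80 years, two genders); the ZIP code population of 30,000 is a hypothetical number, and spreading people evenly across cells is a simplifying assumption that real populations do not satisfy.

```python
DAYS_PER_YEAR = 366   # from the slide (counts Feb 29)
BIRTH_YEARS = 80      # from the slide
GENDERS = 2           # "splits population in half"

# Number of distinct date-of-birth values.
dob_cells = DAYS_PER_YEAR * BIRTH_YEARS

# Adding gender doubles the number of cells within a single ZIP code.
dob_gender_cells = dob_cells * GENDERS


def expected_people_per_cell(zip_population, cells=dob_gender_cells):
    """Average occupancy per (DOB, gender) cell, assuming an even spread."""
    return zip_population / cells


zip_population = 30_000  # hypothetical ZIP code population
avg = expected_people_per_cell(zip_population)
```

With these numbers `dob_cells` is 29,280 (the slide's "over 25,000") and the average cell holds fewer than one person, which is why the combination of date of birth, ZIP code, and gender is unique for so many people. Gender contributes only a factor of 2; date of birth contributes almost all of the identifying power.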
Law Enforcement Analogy
  • So, is ReID generally easy or hard, useful or useless?
  • Consider cop with a bunch of clues (male, tall, red hair, etc.)
    • Enough to ReID? No
    • Helpful to ReID? Yes
    • A matter of how much legwork, analysis, extra data is available and accurate
    • Very big range for difficulty of finding the suspect
    • Same is true for ability of “the public” to ReID, to name the suspect
Conclusion
  • Issue matters today -- more data potentially available to “the public”
  • History of useful anonymization in statistics
    • If collect data for statistical purposes, use only for statistical purposes, store that way, don’t ReID
  • DeID helps against insider & hacker threats
  • ReID by “the public” varies widely in the effort needed to find the “suspect”
  • Our conference today to help policymakers learn where DeID likely to be most useful