
Data Privacy


Presentation Transcript


  1. Data Privacy CS 656 Spring 2009

  2. Should We Be Worried?

  3. Medical Records Misuse • Burlington Northern allegedly conducted genetic tests on employees who had filed worker’s compensation claims for carpal tunnel syndrome, without their knowledge. The company’s intention was presumably to be able to reject some claims because of genetic predisposition to the condition. Source - The Dark Side of Genetic Testing: Railroad Workers Allege Secret Testing, by Dana Hawkins, U.S. News and World Report, February 2001 Dr. Indrajit Ray, Associate Professor, Computer Science Department

  4. Latanya Sweeney’s Work (1) • In Massachusetts, the Group Insurance Commission (GIC) is responsible for purchasing health insurance for state employees • GIC has to publish the data for research purposes GIC (zip, dob, gender, diagnosis, procedure, ...)

  5. Latanya Sweeney’s Work (2) • Sweeney paid $20 and bought the voter registration list for Cambridge, MA GIC (zip, dob, gender, diagnosis, procedure, ...) VOTER (name, party, ..., zip, dob, gender)

  6. Latanya Sweeney’s Work (3) • William Weld (former governor) lives in Cambridge, hence is in VOTER • 6 people in VOTER share his date of birth • Only 3 of them were men (same gender) • Weld was the only one in that zip • Sweeney learned Weld’s medical records
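The linking attack on the two slides above can be sketched in a few lines: join the "de-identified" GIC table with the public voter list on the quasi-identifier (zip, dob, gender) and report every medical row that matches exactly one voter. All records below are made up for illustration; only the column names come from the slides.

```python
# Toy sketch of Sweeney's linking attack: join a "de-identified" medical
# table with a public voter list on the quasi-identifier (zip, dob, gender).
# All records are hypothetical.

gic = [  # (zip, dob, gender, diagnosis)
    ("02138", "1945-07-31", "M", "condition-A"),
    ("02138", "1945-07-31", "F", "condition-B"),
    ("02139", "1962-03-12", "F", "condition-C"),
]

voter = [  # (name, zip, dob, gender)
    ("W. Weld", "02138", "1945-07-31", "M"),
    ("J. Doe", "02139", "1962-03-12", "F"),
]

def link(gic, voter):
    """Re-identify medical rows whose quasi-identifier matches exactly
    one voter record."""
    matches = {}
    for name, zip_, dob, gender in voter:
        hits = [r for r in gic if (r[0], r[1], r[2]) == (zip_, dob, gender)]
        if len(hits) == 1:          # unique match -> re-identification
            matches[name] = hits[0][3]
    return matches

print(link(gic, voter))   # {'W. Weld': 'condition-A', 'J. Doe': 'condition-C'}
```

Neither table contains names alongside diagnoses, yet the join recovers both, which is exactly the failure mode the slide describes.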

  7. On-line Privacy Concerns • Data is often collected silently • Web allows large quantities of data to be collected inexpensively and unobtrusively • Data from multiple sources may be merged • Non-identifiable information can become identifiable when merged • Data collected for business purposes may be used in civil and criminal proceedings • Users given no meaningful choice • Few sites offer alternatives

  8. Privacy International’s Privacy Ranking of Internet Service Companies interimrankings.pdf

  9. Privacy Risks • Identity theft • Demographic re-identification and its consequences • 87% of the US population is uniquely identified by <gender, dob, zip> • Latanya Sweeney’s study in 2001 • Real-world stalking • On-line stalking • Censorship
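A back-of-envelope calculation shows why the 87% figure is plausible. The numbers below (plausible birth years, US ZIP codes, population around 2000) are rough assumptions for illustration, not figures from the slide:

```python
# Back-of-envelope: why (gender, dob, zip) is so identifying.
# All figures are rough assumptions.
genders = 2
birthdates = 365 * 80        # ~80 plausible birth years
zip_codes = 42_000           # roughly the number of US ZIP codes
population = 280_000_000     # US population around 2000

cells = genders * birthdates * zip_codes   # ~2.45 billion combinations
print(cells / population)                  # ~8.8 cells per person on average

# With far more cells than people, most occupied cells hold a single
# person -- consistent with Sweeney's finding.
```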

  10. Source - Beth Rosenberg (Sandstorm.net). Available from Privacy Rights Clearinghouse www.privacyrights.org

  11. Analysis of Privacy Breaches in 2006 • Total number of reported data breach incidents: 327 • Approximate minimum total number of personal records potentially compromised: 100,453,730 • Number of data-breach identity thieves sentenced: 5 • Number of individual victims of sentenced identity thieves: 238

  12. Average Cost of Data Loss • Total number of affected records = 250,000 • Source: Tech//404 Data Loss Calculator http://www.tech-404.com/calculator.html

  13. Surveys Identify Concerns • Increasingly, people say they are concerned about online privacy (80-90% of US Net users) • Improved privacy protection is the factor most likely to persuade non-Net users to go online • 27% of US Net users have abandoned online shopping carts due to privacy concerns • 64% of US Net users decided not to use a web site or make an online purchase due to privacy concerns • 34% of US Net users who do not buy online would buy online if they didn’t have privacy concerns

  14. Legislation?

  15. Ruling Limits Prosecution of People Who Violate Law on Privacy of Medical Records. By Robert Pear, The New York Times, June 7, 2005. ... If a hospital sells a list of patients’ names to a firm for marketing purposes, the hospital can be held criminally liable ... But if a hospital clerk does the same thing, in defiance of hospital policy, the clerk cannot be prosecuted under the 1996 law, because the clerk is not a “covered entity.”

  16. How Do They Get My Data?

  17. Workplace Monitoring • 75% of employers monitor their employees’ website visits • Most computer monitoring equipment allows remote monitoring without the user’s knowledge • Almost all employers review employee email • Deleted emails are not really deleted • 33% track keystrokes and time spent at the keyboard • Currently there are very few laws regulating employee monitoring

  18. Browser Chatter • Browsers chatter about: IP address, domain name, organization, referring page; platform (OS, browser); what information is requested (URLs and search terms); cookies • To anyone who might be listening: end servers, system administrators, Internet Service Providers, other third parties (advertising networks), anyone who might subpoena log files later
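To make the "chatter" concrete, here is a sketch of what a server-side log can pull out of a single ordinary request. The header values and IP address are hypothetical; the header names (Referer, User-Agent, Cookie) are standard HTTP:

```python
# Sketch of the privacy-relevant fields a server sees on every request.
# Header values below are made up for illustration.

def extract_chatter(client_ip, headers):
    """Collect the fields from one request that identify or profile the user."""
    return {
        "ip": client_ip,                        # locates the user/organization
        "referer": headers.get("Referer"),      # page (and search terms!) you came from
        "platform": headers.get("User-Agent"),  # OS and browser
        "cookie": headers.get("Cookie"),        # tracking identifiers
    }

headers = {
    "Referer": "https://search.example/?q=carpal+tunnel+claims",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0) Firefox/115.0",
    "Cookie": "uid=8f2c91",
}
chatter = extract_chatter("129.82.45.7", headers)
print(chatter["referer"])   # the Referer header leaks the user's search terms
```

Note that no special spyware is involved: every one of these fields is sent by the browser on a normal page load.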

  19. Linking With Cookies • (diagram) A user searches for medical information at a search service and buys a CD at a CD store; the ad network sets a cookie at one site and replays it at the other • The ad company can get your name and address from the CD order and link them to your search

  20. Monitoring on the Internet: What Your Browsing Reveals Privacy.net Analyzer Results.pdf

  21. Data Dissemination • Personally identifiable information is collected whenever a user • creates an account • submits an application • signs up for a newsletter • participates in a survey • … • Data sharing and dissemination may be done • to study trends or to make useful statistical inferences • to share knowledge • to outsource the management of data • …

  22. Macrodata vs. Microdata • In the past, data were mainly released in summary form (macrodata) and through statistical databases • Today many situations require that the specific stored data themselves, called microdata, be released • increased flexibility and availability of information • Microdata form a table of rows (tuples) and columns (attributes) • Microdata are subject to a higher risk of privacy breaches

  23. Online and Offline Merging • In November 1999, DoubleClick purchased Abacus Direct, a company possessing detailed consumer profiles on more than 90% of US households • In mid-February 2000, DoubleClick announced plans to merge “anonymous” online data with personal information obtained from offline databases • By the first week in March 2000 the plans were put on hold • Stock dropped from $125 (12/99) to $80 (03/00)

  24. Subpoenas • Data on online activities is increasingly of interest in civil and criminal cases • The only way to avoid subpoenas is to not have data • In the US, the files on your computer in your home have much greater legal protection than your files stored on a server on the network

  25. Privacy Enhancing Technologies Educating Users about Privacy Threats

  26. Anti-Phishing Phil • From the website An interactive game that teaches users how to identify phishing URLs, where to look for cues in web browsers, and how to use search engines to find legitimate sites

  27. Privacy Enhancing Technologies Knowing Privacy Policies of Web Sites

  28. Platform for Privacy Preferences (P3P) • Allows websites to express their privacy practices in a machine-readable as well as human-readable way • Policies can be retrieved automatically by P3P-enabled web browsers and interpreted • Users can be made aware of privacy practices • Enables automated decision making based on these practices List of P3P Enabled Tools
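The "machine-readable" part is an XML vocabulary. The fragment below is a simplified sketch of the general shape of a P3P 1.0 policy statement; the element names follow the P3P vocabulary, but the fragment is illustrative only, not a validated policy, and the URL is hypothetical.

```xml
<!-- Simplified, illustrative P3P 1.0 policy statement (not validated) -->
<POLICY name="sample" discuri="http://www.example.com/privacy.html">
  <STATEMENT>
    <PURPOSE><current/><admin/></PURPOSE>   <!-- why the data is collected -->
    <RECIPIENT><ours/></RECIPIENT>          <!-- who receives it -->
    <RETENTION><stated-purpose/></RETENTION><!-- how long it is kept -->
    <DATA-GROUP>
      <DATA ref="#dynamic.clickstream"/>    <!-- what is collected -->
    </DATA-GROUP>
  </STATEMENT>
</POLICY>
```

A P3P-enabled browser can fetch such a policy and compare each statement against the user's stated preferences automatically, which is what enables the automated decision making mentioned above.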

  29. PrivacyFinder • From the PrivacyFinder web site PrivacyFinder is a privacy-enhanced search engine. Once you state your privacy preferences (low, medium, high, or custom), the search results are ordered based on how their computer-readable privacy policies comply with your preferences. A red bird indicates that the site has conflicts with your preferences while a green bird indicates compliance. The absence of any bird means that a valid computer-readable privacy policy, known as a P3P policy, could not be located.

  30. Work @ CSU • P3P-based efforts are simple statements • Can we trust a site to adhere to its stated policies? How much? • Solution • Evaluate the trustworthiness of a site to actually follow its privacy policies • Use prior experience with the site, its properties (like P3P policies, technology used, compliance certificates, etc.), reputation (e.g., privacy rankings created by others), and recommendations from somebody you trust • Warn the user of the trust level of a site by integrating trust computation into Privacy Bird (or a similar tool)

  31. Privacy Enhancing Technologies Anonymizing Protocols for Communication

  32. Anonymizing Protocols • Make it difficult for someone to trace a message back to its source • Prevent • Linkability • Traceability • Examples • Anonymizing proxies • Mix relays • Tarzan, Tor • Protocols using these for anonymous communication

  33. Traffic Analysis Breaks Anonymity (diagram: an anonymizing proxy)

  34. Mix Networks (diagram: database of mix nodes and their public keys) • Prevent edge analysis by introducing cover traffic

  35. Tor • From the website Tor is a toolset for a wide range of organizations and people that want to improve their safety and security on the Internet. Using Tor can help you anonymize web browsing and publishing, instant messaging, IRC, SSH, and other applications that use the TCP protocol. Tor also provides a platform on which software developers can build new applications with built-in anonymity, safety, and privacy features. • Based on the Mix Network concept
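The mix-network layering that Tor builds on can be sketched as follows: the sender wraps the message in one layer per relay, and each relay peels exactly one layer, so no single relay sees both the sender and the plaintext. This is a toy, not Tor's protocol: XOR with a hashed per-relay key stands in for the public-key encryption real mixes use, and the relay keys are hypothetical.

```python
# Toy sketch of mix-network layering (NOT real crypto: XOR with a
# SHA-256-derived keystream stands in for public-key encryption).
import hashlib

def keystream(key: bytes, n: int) -> bytes:
    """Derive n pseudo-random bytes from a key (toy construction)."""
    out = b""
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:n]

def xor_layer(key: bytes, data: bytes) -> bytes:
    """Apply/remove one encryption layer (XOR is its own inverse)."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

relay_keys = [b"relay-1", b"relay-2", b"relay-3"]   # hypothetical relay keys

# Sender: wrap innermost layer first, so relay 1 peels the outermost.
message = b"GET /page"
onion = message
for key in reversed(relay_keys):
    onion = xor_layer(key, onion)

# Each relay peels one layer; only the last relay sees the plaintext.
for key in relay_keys:
    onion = xor_layer(key, onion)

print(onion)   # b'GET /page'
```

In a real mix or onion-routing network each layer is public-key encrypted to a specific relay and also hides the next hop's address, which this toy omits.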

  36. Anonymous Communication • Not easy to use and administer • Most systems rely on a majority of entities being trusted • Susceptible to collusion among some subsets of entities • Susceptible to some types of traffic analysis • Scalability • Ease of adaptation

  37. Microdata Disclosure Control

  38. The Anonymity Problem: Example

  39. Microdata Disclosure Control • Disclosure can • occur based on the released data alone • result from a combination of the released data with publicly available information or external data sources • Data should be released to the public via techniques that • do not reveal identities and/or sensitive information • preserve the utility of the data for a wide range of analyses • Microdata disclosure control is for “safe” and “useful” data dissemination

  40. Preserving Privacy: k-Anonymity • The released data should be indistinguishably related to no less than a certain number, k, of respondents • The respondents must be indistinguishable with respect to a set of attributes (quasi-identifiers) • k-Anonymity requires that every combination of values of quasi-identifiers in the released table must have at least k occurrences • Enforced using generalization and suppression
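The k-anonymity condition stated above is mechanical to check: count how often each combination of quasi-identifier values occurs and require every count to be at least k. A minimal sketch, with a made-up four-row table:

```python
# Minimal k-anonymity check: every combination of quasi-identifier
# values must occur at least k times. The table is hypothetical.
from collections import Counter

def is_k_anonymous(rows, quasi_ids, k):
    counts = Counter(tuple(row[a] for a in quasi_ids) for row in rows)
    return all(c >= k for c in counts.values())

table = [
    {"zip": "8052*", "dob": "1964", "gender": "M", "diagnosis": "A"},
    {"zip": "8052*", "dob": "1964", "gender": "M", "diagnosis": "B"},
    {"zip": "8053*", "dob": "1971", "gender": "F", "diagnosis": "C"},
    {"zip": "8053*", "dob": "1971", "gender": "F", "diagnosis": "A"},
]

print(is_k_anonymous(table, ["zip", "dob", "gender"], 2))   # True
print(is_k_anonymous(table, ["zip", "dob", "gender"], 3))   # False
```

Note the sensitive attribute (diagnosis) is excluded from the check: k-anonymity constrains only the quasi-identifiers.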

  41. Generalization and Suppression • Generalization: the values of a given attribute are replaced by more general values • ZIP codes 80521 and 80523 can be generalized to 8052* • date of birth 12/04/64 and 12/10/64 can be generalized to 64 or 12/64 • types: local, global, single-dimensional, multi-dimensional • Suppression: remove the information altogether • usually done locally • suppression can reduce the amount of generalization necessary to satisfy the k-anonymity requirement
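The two generalizations on this slide can be sketched directly; the function names and the `keep` parameter are illustrative, not part of any standard API:

```python
# Sketch of the slide's two generalizations: ZIP truncation
# (80521, 80523 -> 8052*) and date-of-birth -> birth year.

def generalize_zip(zip_code: str, keep: int = 4) -> str:
    """Replace the trailing digits of a ZIP code with '*'."""
    return zip_code[:keep] + "*" * (len(zip_code) - keep)

def generalize_dob(dob: str) -> str:
    """Keep only the year from a 'MM/DD/YY' date of birth."""
    return dob.split("/")[-1]

SUPPRESSED = "*"   # suppression: the value is removed altogether

print(generalize_zip("80521"))      # '8052*'
print(generalize_zip("80523"))      # '8052*'  -- now indistinguishable
print(generalize_dob("12/04/64"))   # '64'
```

After generalization, 80521 and 80523 map to the same value, which is how records are merged into larger equivalence classes.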

  42. A k-Anonymized Table (diagram: original table and a 2-anonymized table; rows sharing the same generalized quasi-identifier values form an equivalence class)

  43. Too Much Sanitization • Will reduce the quality of the data to such an extent that it may not be useful anymore • What is too much? • Need to assess the degree of data disclosure • Need to assess the quality of the data resulting from disclosure control

  44. Preserving Data Utility (diagram: generalization maps the microdata to many possible k-anonymous tables for a given k; choose the one with the lowest information loss, given by some metric)
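The "lowest information loss" selection can be sketched with a deliberately crude metric: the fraction of cells that were generalized (marked with '*'). Real metrics (e.g., discernibility or generalization-height based) are more refined; both the metric and the candidate tables below are illustrative assumptions.

```python
# Sketch: among candidate k-anonymous tables, pick the one with the
# lowest information loss. The metric here -- fraction of generalized
# cells -- is a deliberately crude stand-in for real utility metrics.

def info_loss(table):
    cells = [v for row in table for v in row.values()]
    return sum("*" in v for v in cells) / len(cells)

candidates = [  # two hypothetical 2-anonymous generalizations
    [{"zip": "805**", "dob": "196*"}, {"zip": "805**", "dob": "196*"}],
    [{"zip": "8052*", "dob": "1964"}, {"zip": "8052*", "dob": "1964"}],
]

best = min(candidates, key=info_loss)
print(info_loss(best))   # 0.5 -- only the ZIP column was generalized
```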

  45. Two Ignored Aspects • The data publisher’s dilemma • a data publisher must weigh the risk of publicly disseminated information against the statistical utility of the content • how to decide what a good value of k is? • how to assure that higher k values or lower information loss are not possible in the neighborhood of a chosen value? • Biased privacy • k-anonymity only specifies a minimum privacy level for all individuals in the microdata • individual privacy levels can be very different for different individuals

  46. An Example of Bias (diagram: a 2-anonymized table in which the probability of breach is 1/3 for one equivalence class but 1/2 for another)
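The 1/3 vs. 1/2 figures can be reproduced with a small sketch: model the attacker as knowing the target's quasi-identifier, so the chance of guessing the sensitive value is one over the number of distinct sensitive values in the target's equivalence class. The table below is hypothetical but constructed to match the slide's numbers.

```python
# Sketch of biased privacy under 2-anonymity: breach probability is
# 1 / (# distinct sensitive values in the equivalence class), so it
# varies per individual. The table is made up to match the slide.
from collections import defaultdict

def breach_probability(rows, quasi_ids, sensitive):
    groups = defaultdict(set)
    for row in rows:
        groups[tuple(row[a] for a in quasi_ids)].add(row[sensitive])
    return {qi: 1 / len(vals) for qi, vals in groups.items()}

table = [  # 2-anonymized on zip
    {"zip": "8052*", "diagnosis": "A"},
    {"zip": "8052*", "diagnosis": "B"},
    {"zip": "8052*", "diagnosis": "C"},
    {"zip": "8053*", "diagnosis": "A"},
    {"zip": "8053*", "diagnosis": "B"},
]

probs = breach_probability(table, ["zip"], "diagnosis")
print(probs)   # first class: 1/3; second class: 1/2
```

Both classes satisfy 2-anonymity, yet individuals in the second class face a noticeably higher breach probability, which is exactly the bias the slide points out.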

  47. Minimalistic View of Privacy • Other models have been proposed around the privacy issues identified by k-anonymity • l-diversity and t-closeness • Mondrian multidimensional k-anonymity • k-type anonymizations • (α,k)-anonymity, p-sensitive k-anonymity, (k,l)-anonymity • Anatomy, personalized privacy, skyline privacy • All existing anonymity models are minimalistic-view models • privacy of a table is characterized by the minimum privacy level of all individuals • can become a source of biased privacy

  48. Current Research Focus • Multi-objective analysis • to resolve the data publisher’s dilemma • Quantification of anonymization bias • bias may be infused to cater to personal privacy requirements • Fair comparison of anonymization techniques in the presence of bias • Alternative characterization of privacy • privacy from an individualistic viewpoint • Optimization framework for alternative privacy models

  49. Data Anonymization for Network Traces

  50. Use of Public Trace* *Source: Jelena Mirkovic, Privacy-Safe Network Trace Sharing via Secure Queries, NDS ‘08.
