
Limiting Disclosure in Hippocratic Databases

VLDB, August 31, 2004. Kristen LeFevre, Rakesh Agrawal, Vuk Ercegovac, Raghu Ramakrishnan, Yirong Xu, David DeWitt.

Presentation Transcript


  1. Limiting Disclosure in Hippocratic Databases
     VLDB, August 31, 2004
     Kristen LeFevre, Rakesh Agrawal, Vuk Ercegovac, Raghu Ramakrishnan, Yirong Xu, David DeWitt

  2. Presentation Outline
     • Hippocratic Databases: a framework for managing privacy, including the problem of limiting disclosure
     • Overview of our proposal for integrating policy-driven disclosure control into an existing relational database environment
     • Brief discussion of alternative cell-level enforcement models
     • Optimized implementation of opt-in and opt-out choices
     • Overview of performance evaluation
     • Conclusions

  3. Hippocratic Databases and Limited Disclosure
     • Hippocratic Databases have been proposed as a framework for managing privacy-sensitive information
     • Limited disclosure is one of the defining principles of this framework
     • Limited disclosure includes three main ideas:
        • Privacy policy: organizations define a set of rules describing to whom data may be disclosed (recipients) and how the data may be used (purposes)
        • Consent: data subjects are given control over who may see their personal information and under what circumstances
        • Disclosure control: the database ensures that the privacy policy and data subject consent are enforced with respect to all data access, limiting the outflow of information from the database

  4. Motivating Example
     • Consider a group of athletes registering for a major international competition
     • Personal information is collected from each athlete, possibly including:
        • Name, age, nationality, address, phone number, visa status
     • Data must be managed according to the organizing committee’s privacy policy:
        • Government officials are allowed to see visa information for the purpose of venue security
        • Team travel agents may see the contact information for athletes from their own country for making travel arrangements
        • The organizing committee may not disclose athletes’ information to journalists without the athlete’s consent

  5. Limited Disclosure Framework Goals
     • Provide techniques for enforcing a broad class of privacy policy rules
     • Privacy policy enforcement should require little or no modification to existing application code
     • Policy rules should be stored and managed by the database
     • Provide limited disclosure enforcement at the cell level

  6. Limited Disclosure Framework Overview
     [Figure: framework components: Policy Definition, Subject Consent, Privacy Meta-Data, Consent Info, Data Table, Query Modifier]
     • Start with an existing database environment with associated applications
     • Privacy policy is defined and stored in the database in privacy meta-data tables
     • When providing information, data subjects also provide consent for various data uses
     • Queries are modified so that results respect the privacy policy and consent

  7. Policy Definition
     • Privacy policy is defined using one of the following XML-based policy definition languages:
        • Platform for Privacy Preferences (P3P)
        • Enterprise Privacy Authorization Language (EPAL)

  8. Privacy Meta-Data and Policy Meta-Language
     • A privacy “meta-language” for expressing the privacy policy in the database
        • Not tied to one particular policy language
        • Many practical P3P and EPAL policies can be translated to this language
     • A privacy policy is a set of rules of the form <data, purpose, recipient, condition>
        • The condition must be a predicate that can be expressed in SQL
     • Privacy policy rules are stored in the database

  9. Privacy Meta-Data Example
     • Journalists may only see athletes’ names for the purpose of writing articles with explicit consent
     • Government officials may see athletes’ visa information for security purposes

     Policy  Rule  Purpose   Recipient   Table     Column   CondID
     P1      R1    Security  Gov’t Off.  Athletes  Visa     -
     P1      R2    Security  Gov’t Off.  Athletes  Name     -
     P1      R3    Travel    Travel Ag.  Athletes  Name     -
     P1      R4    Travel    Travel Ag.  Athletes  Phone    -
     P1      R5    Articles  Journalist  Athletes  Name     C1
     P1      R6    Articles  Journalist  Athletes  Address  C2

     CondID  Predicate
     C1      "EXISTS (SELECT Name_choice FROM Athlete_choices WHERE Athletes.Athlete# = Athlete_choices.Athlete# AND Athlete_choices.Name_choice = 1)"
     C2      "EXISTS (SELECT Name_choice FROM Athlete_choices WHERE Athletes.Athlete# = Athlete_choices.Athlete# AND Athlete_choices.Address_choice = 1)"
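The rule form <data, purpose, recipient, condition> and the two meta-data tables above fit naturally into an ordinary relational database. The following is a minimal sketch under stated assumptions. It uses Python's `sqlite3` module purely for illustration and is not the DB2 implementation used in the experiments. The table names `policy_rules` and `conditions`, their columns, and the helper `disclosure_conditions` are hypothetical. The predicates are stored as text, copied from the slide, and are not executed here.

```python
import sqlite3

# Rows transcribed from the privacy meta-data example; a None cond_id marks an
# unconditional rule. Table, column, and function names are hypothetical.
RULES = [
    ("P1", "R1", "Security", "Gov't Off.", "Athletes", "Visa", None),
    ("P1", "R2", "Security", "Gov't Off.", "Athletes", "Name", None),
    ("P1", "R3", "Travel", "Travel Ag.", "Athletes", "Name", None),
    ("P1", "R4", "Travel", "Travel Ag.", "Athletes", "Phone", None),
    ("P1", "R5", "Articles", "Journalist", "Athletes", "Name", "C1"),
    ("P1", "R6", "Articles", "Journalist", "Athletes", "Address", "C2"),
]
CONDITIONS = [
    ("C1", "EXISTS (SELECT Name_choice FROM Athlete_choices "
           "WHERE Athletes.Athlete# = Athlete_choices.Athlete# "
           "AND Athlete_choices.Name_choice = 1)"),
    ("C2", "EXISTS (SELECT Name_choice FROM Athlete_choices "
           "WHERE Athletes.Athlete# = Athlete_choices.Athlete# "
           "AND Athlete_choices.Address_choice = 1)"),
]


def load_metadata() -> sqlite3.Connection:
    """Create and fill the two privacy meta-data tables in memory."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE policy_rules (policy TEXT, rule TEXT, purpose TEXT, "
                 "recipient TEXT, tbl TEXT, col TEXT, cond_id TEXT)")
    conn.execute("CREATE TABLE conditions (cond_id TEXT PRIMARY KEY, predicate TEXT)")
    conn.executemany("INSERT INTO policy_rules VALUES (?, ?, ?, ?, ?, ?, ?)", RULES)
    conn.executemany("INSERT INTO conditions VALUES (?, ?)", CONDITIONS)
    return conn


def disclosure_conditions(conn, purpose, recipient, table):
    """Map each column the recipient may see for this purpose to its SQL
    condition (None = unconditional). Columns without a rule are omitted,
    meaning they are never disclosed."""
    cur = conn.execute(
        "SELECT r.col, c.predicate FROM policy_rules r "
        "LEFT JOIN conditions c ON r.cond_id = c.cond_id "
        "WHERE r.purpose = ? AND r.recipient = ? AND r.tbl = ?",
        (purpose, recipient, table),
    )
    return dict(cur.fetchall())


conn = load_metadata()
print(disclosure_conditions(conn, "Security", "Gov't Off.", "Athletes"))
print(sorted(disclosure_conditions(conn, "Articles", "Journalist", "Athletes")))
```

With this layout, adding or changing a rule means an INSERT or UPDATE on the meta-data tables. No application code has to change.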

  10. Query Modification
     • Implemented two alternative algorithms for modifying queries to incorporate policy rules and consent information
     • Queries are modified so that query results follow one of our cell-level semantic models

  11. Enforcement Models
     • Row (tuple)-level enforcement is insufficient for enforcing arbitrary policies when existing database schemas were not designed with the policy in mind

  12. An Example
     [Figure: Table “Athletes” with consent information for journalists writing stories]

  13. Row-Level Enforcement
     [Figure: Table “Athletes” with consent information for journalists writing stories]

  14. Row-Level Enforcement
     [Figure: consent information for journalists writing stories]
     Result of row-level enforcement (Athlete #2 is filtered because no consent is provided):

     Athlete#  Name            Age  Address    Phone
     1         Michael Phelps  19   Baltimore  111-1111
     3         Ian Thorpe      23   Sydney     333-3333
     4         Jenny Thompson  31   New York   444-4444

     • A row-level model must either disclose prohibited information or restrict information that should be available!

  15. Enforcement Models
     • Cell-level enforcement
        • Table Semantics model
        • Query Semantics model

  16. Table Semantics Enforcement
     • “Mask” prohibited cells with the null value
     • Filter rows where the primary key is prohibited
     • Conceptually, the query is performed on top of this “view”
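The following Python sketch is a rough illustration of the Table Semantics view. It masks prohibited cells with `None`, which stands in for SQL null, and it drops every row whose primary key is prohibited. The athlete rows and the consent flags follow the running example: the consent matrix is the one shown for journalists writing stories. The slides never show athlete 2's values, so the values below are hypothetical placeholders. The names `ATHLETES`, `CONSENT`, and `table_semantics_view` are also illustrative.

```python
COLUMNS = ("Athlete#", "Name", "Age", "Address", "Phone")
PRIMARY_KEY = "Athlete#"

# Example rows; athlete 2's values are hypothetical placeholders.
ATHLETES = [
    (1, "Michael Phelps", 19, "Baltimore", "111-1111"),
    (2, "(hypothetical)", 27, "(hypothetical)", "222-2222"),
    (3, "Ian Thorpe", 23, "Sydney", "333-3333"),
    (4, "Jenny Thompson", 31, "New York", "444-4444"),
]

# Consent flags per athlete, in COLUMNS order (1 = disclosure allowed).
CONSENT = {
    1: (1, 1, 1, 1, 1),
    2: (0, 0, 0, 0, 0),
    3: (1, 0, 0, 1, 1),
    4: (1, 1, 0, 0, 0),
}


def table_semantics_view(rows):
    """Return the masked view that queries conceptually run against."""
    key_pos = COLUMNS.index(PRIMARY_KEY)
    view = []
    for row in rows:
        flags = CONSENT[row[key_pos]]
        if not flags[key_pos]:
            continue  # primary key prohibited: filter the whole row
        # Replace each prohibited cell with None (SQL null).
        view.append(tuple(v if ok else None for v, ok in zip(row, flags)))
    return view


for masked in table_semantics_view(ATHLETES):
    print(masked)
```

Athlete 2 disappears completely. Athletes 3 and 4 remain, with only their prohibited cells nulled. This is exactly the middle ground that row-level enforcement cannot express.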

  17. Table Semantics Enforcement
     • SQL’s null value represents “no value”
     • Desirable semantics for prohibited values:
        • Predicates applied to null never evaluate to true
        • Null does not join with other values
        • Null is ignored when computing aggregates over a column (COUNT(*) still counts the row)
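The null properties listed above are easy to check directly. This sketch uses Python's built-in `sqlite3` module with two throwaway tables, `t` and `u`. Both tables are hypothetical and are not part of the example schema. SQLite follows standard SQL three-valued logic for these cases. The last query shows the one subtlety: column aggregates skip nulls, but `COUNT(*)` does not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (id INTEGER, age INTEGER);
    INSERT INTO t VALUES (1, 19), (2, NULL), (3, 31);
    CREATE TABLE u (age INTEGER);
    INSERT INTO u VALUES (19), (NULL);
""")

# A comparison with NULL is unknown, never true: row 2 satisfies neither half.
preds = conn.execute(
    "SELECT id FROM t WHERE age > 20 OR age <= 20 ORDER BY id").fetchall()

# NULL does not join with anything, not even another NULL.
joins = conn.execute(
    "SELECT t.id FROM t JOIN u ON t.age = u.age ORDER BY t.id").fetchall()

# Column aggregates skip NULL; COUNT(*) still counts every row.
aggs = conn.execute("SELECT COUNT(age), COUNT(*), AVG(age) FROM t").fetchone()

print(preds)  # [(1,), (3,)]
print(joins)  # [(1,)]
print(aggs)   # (2, 3, 25.0)
```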

  18. Table Semantics Enforcement
     [Figure: Table “Athletes” and its consent information]
     • Mask prohibited cells with null
     • Filter rows where the primary key is prohibited

  19. Enforcement Models
     • Cell-level enforcement
        • Table Semantics model
        • Query Semantics model

  20. Query Semantics Enforcement
     • “Mask” prohibited cells with the null value
     • Execute the query on top of the masked table
     • Filter rows that are entirely null from the result set

  21. Query Semantics Enforcement
     [Figure: results of the same query under Query Semantics and Table Semantics]
     • Mask prohibited cells with null
     • Issue query: SELECT Name, Age FROM Athletes
     • Filter rows that are entirely null from the result set
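The two models diverge on the slide's query `SELECT Name, Age FROM Athletes`. The Python sketch below is a simplified model that handles projections only, with no WHERE clause, joins, or aggregates. It masks the example table and then applies each model's filtering rule. Athlete 2's name and age are hypothetical placeholders. The function names are illustrative and are not taken from the paper.

```python
COLUMNS = ("Athlete#", "Name", "Age")

# Example rows; athlete 2's name and age are hypothetical placeholders.
ROWS = [
    (1, "Michael Phelps", 19),
    (2, "(hypothetical)", 27),
    (3, "Ian Thorpe", 23),
    (4, "Jenny Thompson", 31),
]
# Consent flags in COLUMNS order (1 = disclosure allowed).
CONSENT = {1: (1, 1, 1), 2: (0, 0, 0), 3: (1, 0, 0), 4: (1, 1, 0)}


def mask(row):
    """Replace every prohibited cell with None (SQL null)."""
    return tuple(v if ok else None for v, ok in zip(row, CONSENT[row[0]]))


def table_semantics(select):
    # Drop rows whose primary key (position 0) is prohibited, then project.
    masked = [mask(r) for r in ROWS]
    return [tuple(m[i] for i in select) for m in masked if m[0] is not None]


def query_semantics(select):
    # Project the masked table, then drop result rows that are entirely null.
    result = [tuple(m[i] for i in select) for m in map(mask, ROWS)]
    return [r for r in result if any(v is not None for v in r)]


NAME_AGE = (1, 2)  # column positions for SELECT Name, Age FROM Athletes
print(table_semantics(NAME_AGE))  # [('Michael Phelps', 19), (None, None), ('Jenny Thompson', None)]
print(query_semantics(NAME_AGE))  # [('Michael Phelps', 19), ('Jenny Thompson', None)]
```

Only athlete 3's key is disclosable, so it shows up as an all-null row under Table Semantics and is dropped under Query Semantics. Filtering these extra rows is one reason the query semantics model is often faster.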

  22. Query Modification Example (Table Semantics)
     Original query:
        SELECT Name
        FROM Athletes
        WHERE Name = 'Michael Phelps'
     Modified query:
        SELECT CASE WHEN EXISTS (SELECT Name_Choice
                                 FROM Athlete_Choices
                                 WHERE Athletes.Athlete# = Athlete_Choices.Athlete#
                                   AND Athlete_Choices.Name_Choice = 1)
                    THEN Name ELSE null END
        FROM Athletes
        WHERE Name = 'Michael Phelps'
          AND EXISTS (SELECT Athlete#_Choice
                      FROM Athlete_Choices
                      WHERE Athletes.Athlete# = Athlete_Choices.Athlete#
                        AND Athlete_Choices.Athlete#_Choice = 1)
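Rewrites like this one can be generated mechanically. The Python/SQLite sketch below shows one way, under stated assumptions, and it is not the paper's algorithm. Instead of wrapping only the SELECT list in CASE expressions, it replaces the table reference with a masked derived table. WHERE-clause predicates then see the masked values, so a predicate such as `Name = 'Ian Thorpe'` cannot match a row whose name is prohibited. To avoid quoting the `#` character, `Athlete#` is renamed to the hypothetical `athlete_id` and `Athlete#_Choice` to `athlete_id_Choice`. The helpers `choice`, `masked_view`, and `rewrite` are also hypothetical.

```python
import sqlite3

# Hypothetical schema: Athlete# is renamed athlete_id; Athlete_Choices holds
# one 0/1 opt-in flag per column, following the example's naming.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Athletes (athlete_id INTEGER PRIMARY KEY, Name TEXT, Age INTEGER);
    CREATE TABLE Athlete_Choices (athlete_id INTEGER PRIMARY KEY,
        athlete_id_Choice INTEGER, Name_Choice INTEGER, Age_Choice INTEGER);
    INSERT INTO Athletes VALUES
        (1, 'Michael Phelps', 19), (3, 'Ian Thorpe', 23), (4, 'Jenny Thompson', 31);
    INSERT INTO Athlete_Choices VALUES (1, 1, 1, 1), (3, 1, 0, 0), (4, 1, 1, 0);
""")


def choice(col):
    """Opt-in condition: the subject's choice flag for `col` equals 1."""
    return ("EXISTS (SELECT 1 FROM Athlete_Choices c "
            f"WHERE c.athlete_id = a.athlete_id AND c.{col}_Choice = 1)")


def masked_view(columns):
    """Derived table with Table Semantics: mask cells, filter prohibited keys."""
    cells = ", ".join(f"CASE WHEN {choice(col)} THEN a.{col} ELSE NULL END AS {col}"
                      for col in columns)
    return (f"(SELECT a.athlete_id, {cells} FROM Athletes a "
            f"WHERE {choice('athlete_id')}) AS Athletes")


def rewrite(query, columns=("Name", "Age")):
    # Naive textual rewrite: assumes a single "FROM Athletes" reference.
    return query.replace("FROM Athletes", "FROM " + masked_view(columns), 1)


for q in ("SELECT Name FROM Athletes WHERE Name = 'Michael Phelps'",
          "SELECT Name FROM Athletes WHERE Name = 'Ian Thorpe'",
          "SELECT Name, Age FROM Athletes WHERE Age > 30"):
    print(conn.execute(rewrite(q)).fetchall())
```

A real rewriter would operate on the parsed query rather than on strings. The string replacement here only keeps the sketch short.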

  23. Database-level disclosure control
     • The database is the best place to enforce limited disclosure
        • More efficient, flexible, and secure than an application-level approach
        • Need not fetch prohibited data from the database
     • A naive application-level approach leaks private information when applied at the cell level
        • Consider the query SELECT Name, Age FROM Athletes WHERE Age > 30

  24. Example: Difficulties of application-level disclosure control
     [Figure: Table “Athletes”]
     Consent information (√ = disclosure allowed, X = prohibited):

     #  Athlete#  Name  Age  Address  Phone
     1  √         √     √    √        √
     2  X         X     X    X        X
     3  √         X     X    √        √
     4  √         √     X    X        X

     • Query the database; retrieve results to the application
     • Check policy and consent info; replace prohibited cells with null
     • Result:
          Name            Age
          Jenny Thompson  (null)
     • Based on this query, it is easy to infer that Jenny Thompson’s age is greater than 30!
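The inference on this slide is easy to reproduce. In the Python/SQLite sketch below, the "application" runs the query unmodified and only then nulls out prohibited cells. Because `Age > 30` was evaluated against the real ages, Jenny Thompson's row still comes back. Evaluating the same predicate over the masked values returns nothing. The schema renames `Athlete#` to the hypothetical `athlete_id`, and the helper names are also assumptions. The query fetches `athlete_id` because the application needs the key to look up consent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Athletes (athlete_id INTEGER PRIMARY KEY, Name TEXT, Age INTEGER);
    INSERT INTO Athletes VALUES
        (1, 'Michael Phelps', 19), (3, 'Ian Thorpe', 23), (4, 'Jenny Thompson', 31);
""")
# Disclosable columns per athlete, from the consent information above
# (athlete 2, with every cell prohibited, is left out of this small table).
ALLOWED = {1: {"Name", "Age"}, 3: set(), 4: {"Name"}}


def masked(aid, col, value):
    return value if col in ALLOWED[aid] else None


def application_level():
    """Run the query as-is, then null out prohibited cells in the result."""
    rows = conn.execute(
        "SELECT athlete_id, Name, Age FROM Athletes WHERE Age > 30").fetchall()
    return [(masked(a, "Name", n), masked(a, "Age", g)) for a, n, g in rows]


def database_level():
    """Mask first, then evaluate Age > 30 on the masked values."""
    rows = conn.execute("SELECT athlete_id, Name, Age FROM Athletes").fetchall()
    cells = [(masked(a, "Name", n), masked(a, "Age", g)) for a, n, g in rows]
    return [(n, g) for n, g in cells if g is not None and g > 30]


print(application_level())  # [('Jenny Thompson', None)]: reveals Age > 30
print(database_level())     # []
```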

  25. Database-level disclosure control
     • The database is a logical place to enforce limited disclosure
        • More efficient and flexible than an application-level rule-engine approach
        • Need not fetch prohibited data from the database
     • A naive application-level approach leaks private information when applied at the cell level
        • Consider the query SELECT Name, Age FROM Athletes WHERE Age > 30
        • Avoiding the leak requires performing much of the query processing in the application
        • Computing aggregates and joins is even more complicated when some cells are prohibited!

  26. Optimized Implementation of Opt-in and Opt-out Conditions
     • SQL queries offer much flexibility for defining disclosure conditions
     • In practice, however, subject consent is often expressed as simple opt-in and opt-out choices, so these conditions are especially important to optimize:
        • Sufficient for expressing P3P policy rules
        • Sufficient for expressing many HIPAA-mandated policies, for example
     • Implemented several techniques for storing consent and optimizing this type of condition
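Opt-in and opt-out differ in what happens when a subject has recorded no choice at all. The Python/SQLite sketch below assumes a hypothetical `Choices` table in which 1 records an opt-in, 0 records an opt-out, and a missing row means no choice was made. Under that assumption, an opt-in condition discloses only on an explicit 1. An opt-out condition discloses unless there is an explicit 0. The table, column, and function names are all illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Athletes (athlete_id INTEGER PRIMARY KEY, Name TEXT);
    INSERT INTO Athletes VALUES
        (1, 'Michael Phelps'), (3, 'Ian Thorpe'), (4, 'Jenny Thompson');
    -- 1 = opted in, 0 = opted out; athlete 4 has recorded no choice.
    CREATE TABLE Choices (athlete_id INTEGER PRIMARY KEY, name_choice INTEGER);
    INSERT INTO Choices VALUES (1, 1), (3, 0);
""")


def consent_condition(column, mode):
    """SQL predicate deciding whether `column` of row alias `a` may be disclosed."""
    lookup = ("SELECT 1 FROM Choices c WHERE c.athlete_id = a.athlete_id "
              f"AND c.{column}_choice = ")
    if mode == "opt-in":
        return f"EXISTS ({lookup}1)"      # disclose only after an explicit 1
    if mode == "opt-out":
        return f"NOT EXISTS ({lookup}0)"  # disclose unless an explicit 0
    raise ValueError(f"unknown mode: {mode}")


def disclosed_names(mode):
    sql = (f"SELECT CASE WHEN {consent_condition('name', mode)} THEN a.Name END "
           "FROM Athletes a ORDER BY a.athlete_id")
    return [row[0] for row in conn.execute(sql).fetchall()]


print(disclosed_names("opt-in"))   # ['Michael Phelps', None, None]
print(disclosed_names("opt-out"))  # ['Michael Phelps', None, 'Jenny Thompson']
```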

  27. Optimized Implementation of Opt-in and Opt-out Conditions
     • Several alternative storage techniques:
        • Internal column (inline) representation
        • External, single table representation
        • External, multiple table representation

  28. Optimized Implementation of Opt-in and Opt-out Conditions
     Internal column representation
     [Figure: Table “Athletes”]

  29. Optimized Implementation of Opt-in and Opt-out Conditions
     External, single table representation
     [Figure: Table “Athletes” and Consent Table]

  30. Optimized Implementation of Opt-in and Opt-out Conditions
     External, multiple table representation
     [Figure: Table “Athletes” and Positive Consent Tables]
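The figures for these three layouts did not survive. The following Python/SQLite sketch shows one plausible reading of each, using a single opt-in choice for `Name`:

- an inline choice column in the data table;
- one external table of choice flags;
- one positive-consent table per choice, listing only the subjects who opted in.

All table and column names are hypothetical, and the exact layouts in the paper may differ. Each query produces the same masked result. What differs is where the condition has to look.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Internal column representation: the choice flag lives in the data table.
    CREATE TABLE AthletesInline (athlete_id INTEGER PRIMARY KEY, Name TEXT,
                                 name_choice INTEGER);
    INSERT INTO AthletesInline VALUES
        (1, 'Michael Phelps', 1), (3, 'Ian Thorpe', 0), (4, 'Jenny Thompson', 1);

    -- Data table shared by both external representations.
    CREATE TABLE Athletes (athlete_id INTEGER PRIMARY KEY, Name TEXT);
    INSERT INTO Athletes VALUES
        (1, 'Michael Phelps'), (3, 'Ian Thorpe'), (4, 'Jenny Thompson');

    -- External, single table: one row of choice flags per athlete.
    CREATE TABLE Choices (athlete_id INTEGER PRIMARY KEY, name_choice INTEGER);
    INSERT INTO Choices VALUES (1, 1), (3, 0), (4, 1);

    -- External, multiple tables: one positive-consent table per choice,
    -- holding only the athletes who opted in.
    CREATE TABLE NameConsent (athlete_id INTEGER PRIMARY KEY);
    INSERT INTO NameConsent VALUES (1), (4);
""")

QUERIES = {
    "internal": (
        "SELECT CASE WHEN name_choice = 1 THEN Name END "
        "FROM AthletesInline ORDER BY athlete_id"),
    "external single": (
        "SELECT CASE WHEN EXISTS (SELECT 1 FROM Choices c "
        "WHERE c.athlete_id = a.athlete_id AND c.name_choice = 1) "
        "THEN a.Name END FROM Athletes a ORDER BY a.athlete_id"),
    "external multiple": (
        "SELECT CASE WHEN EXISTS (SELECT 1 FROM NameConsent n "
        "WHERE n.athlete_id = a.athlete_id) "
        "THEN a.Name END FROM Athletes a ORDER BY a.athlete_id"),
}

results = {label: conn.execute(sql).fetchall() for label, sql in QUERIES.items()}
for label, rows in results.items():
    print(f"{label}: {rows}")
```

Intuitively, the inline layout widens the data table as more choices are stored. The multiple-table layout adds one subquery for each choice a query enforces.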

  31. Overview of Performance Experiments
     • Implemented query modification algorithms on top of DB2 version 8.1
     • Focused on measuring performance for unconditional rules and for rules with opt-in and opt-out choices
     • Experimental setup:
        • Synthetic dataset based on the Wisconsin Benchmark
        • Dual-processor 1.8 GHz AMD machine running Windows 2000 Server
        • 2 gigabytes of memory
        • 50 megabyte buffer pool
        • Queries run warm and cold
     • Here we report the warm numbers (error less than ±5% with 95% confidence)

  32. [Figure: elapsed time (seconds) vs. choice selectivity (%) for the Unmodified, Modified Internal, and Modified External Multiple queries]
     • Measured performance of a query selecting all records from a 5-million-record table
     • Compared performance of original and modified queries for varied choice selectivity
     • Not surprisingly, modified queries actually perform better when privacy enforcement acts as an additional selection condition
        • Able to use indexes on choice values
     • Shows the importance of database-level privacy enforcement for performance

  33. Measured overhead cost using a query that selects all records
     • Choice selectivity = 100%
     • Observed the worst-case scenario, in which no rows are filtered due to privacy constraints but all costs of cell-level checking are incurred
     • Full bar represents elapsed time; bottom portion of the bar is CPU time
     • Much of the cost of privacy enforcement is CPU cost, so the relative overhead shrinks as queries become more I/O-intensive

  34. Additional Performance Results
     • Cost of rewriting queries is small
        • Must only be done once if the query is pre-compiled
     • Found that the query semantics enforcement model is often faster than table semantics because more rows are frequently filtered
     • Tradeoffs between choice storage techniques:
        • Number of choices stored for a particular table: as more choices are stored, performance of the internal representation suffers
        • Number of choices enforced for a particular query: as more choices are enforced, performance of the external multiple representation suffers
     • Tradeoffs between query modification algorithms are described in the paper

  35. Conclusions
     • Limited disclosure is a necessary component of a comprehensive data privacy management system
     • Proposed a framework for enforcing limited disclosure at the database level
        • More efficient and flexible than application-level disclosure control
        • Techniques also apply to other applications requiring policy-driven, fine-grained disclosure control
     • Framework can be deployed in an existing environment with minimal modification to legacy applications and existing schemas
     • Query modification and consent storage approaches are efficient enough to be viable in practice

  36. Questions
