
Anonymizing Tables for Privacy Protection

Explore anonymization techniques for tables to protect privacy. Learn about k-anonymity models and algorithms for de-identifying sensitive information.




Presentation Transcript


  1. Anonymizing Tables for Privacy Protection Gagan Aggarwal, Tomás Feder, Krishnaram Kenthapadi, Rajeev Motwani, Rina Panigrahy, Dilys Thomas, An Zhu

  2. An example: Medical Records

  3. Medical Records: De-identify & Release

  4. Not sufficient! [Swe02, SS98] • Quasi-identifiers, joined with a public database, can uniquely identify you. • k-anonymity model: reveal less information.

  5. k-anonymity: Problem Definition • Input: A database consisting of n rows, each with m attributes drawn from a finite alphabet. • Goal: Suppress some entries in the table such that each modified row becomes identical to at least k-1 other rows. • The more entries are suppressed, the lower the utility of the modified table. • Objective: Minimize the number of suppressed entries.
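
The definition on this slide can be checked mechanically. The following is a minimal sketch, assuming rows are tuples of strings and a suppressed entry is written as "*"; the helper names `is_k_anonymous` and `suppression_cost` and the toy table are illustrative and do not come from the slides.

```python
from collections import Counter

def is_k_anonymous(rows, k):
    """Return True if every row is identical to at least k-1 other rows."""
    counts = Counter(tuple(row) for row in rows)
    return all(count >= k for count in counts.values())

def suppression_cost(rows, star="*"):
    """Count the suppressed entries (the objective to be minimized)."""
    return sum(entry == star for row in rows for entry in row)

# Hypothetical table after suppression; "*" marks a suppressed entry.
table = [
    ("*", "21", "M"),
    ("*", "21", "M"),
    ("Ann", "*", "F"),
    ("Ann", "*", "F"),
]

print(is_k_anonymous(table, 2))   # True: each row has one identical twin
print(suppression_cost(table))    # 4 suppressed entries
```

The table above is 2-anonymous but not 3-anonymous, since each distinct row appears exactly twice.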

  6. Medical Records: 2-anonymized table • Suppress entries • Cost = 10

  7. k-anonymity: Results • [MW04]: • NP-hardness for a linear-size alphabet • O(k log k)-approximation algorithm • This work: • NP-hardness (even for a ternary alphabet) • O(k)-approximation for k-anonymity • 1.5-approximation for 2-anonymity • 2-approximation for 3-anonymity

  8. O(k)-approximation algorithm (for k=3) • Create a complete graph such that: • Each row vector in the table is a vertex. • The weight of an edge is the number of attributes on which the two rows differ (Hamming distance). [Figure: example graph with weighted edges]
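
As a rough illustration of this graph construction, the sketch below computes the Hamming distance between every pair of rows. The function names `hamming` and `distance_graph` and the sample rows are assumptions made for the example, not part of the presentation.

```python
from itertools import combinations

def hamming(u, v):
    """Number of attributes on which rows u and v differ."""
    return sum(a != b for a, b in zip(u, v))

def distance_graph(rows):
    """Edge weights of the complete graph, keyed by (i, j) with i < j."""
    return {
        (i, j): hamming(rows[i], rows[j])
        for i, j in combinations(range(len(rows)), 2)
    }

# Hypothetical three-row table with three attributes each.
rows = [("a", "b", "c"), ("a", "b", "d"), ("x", "y", "c")]
weights = distance_graph(rows)
print(weights)  # {(0, 1): 1, (0, 2): 2, (1, 2): 3}
```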

  9. O(k)-approximation algorithm (for k=3) • We create a forest as follows: • Each node picks its nearest neighbor and connects to it. • If the resulting graph has a component with only two nodes, connect this component to the second nearest neighbor of one of the two nodes.
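
The two forest-building steps on this slide can be sketched as follows for k=3. This is only an illustration under stated assumptions: ties are broken by lowest row index, a union-find structure is used so that no cycle is ever created, and for a two-node component the cheaper of the two candidate second-nearest-neighbor edges is taken (the slide only says "one of the two nodes"). The name `build_forest` and the sample rows are hypothetical.

```python
def hamming(u, v):
    """Number of positions on which rows u and v differ."""
    return sum(a != b for a, b in zip(u, v))

def build_forest(rows):
    """Return (edges, labels); labels[i] identifies the component of row i."""
    n = len(rows)
    if n < 3:
        raise ValueError("need at least 3 rows for k = 3")
    parent = list(range(n))
    size = [1] * n

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra == rb:
            return False  # already connected: adding the edge would form a cycle
        if size[ra] < size[rb]:
            ra, rb = rb, ra
        parent[rb] = ra
        size[ra] += size[rb]
        return True

    def nearest_outside(u):
        """Closest row that is not yet in u's component."""
        outside = [v for v in range(n) if find(v) != find(u)]
        return min(outside, key=lambda v: (hamming(rows[u], rows[v]), v))

    edges = []
    # Step 1: each node picks its nearest neighbor and connects to it.
    for u in range(n):
        others = (w for w in range(n) if w != u)
        v = min(others, key=lambda w: (hamming(rows[u], rows[w]), w))
        if union(u, v):
            edges.append((u, v))
    # Step 2: attach each two-node component via a second-nearest neighbor.
    for u in range(n):
        if size[find(u)] == 2:
            members = [w for w in range(n) if find(w) == find(u)]
            a = min(members,
                    key=lambda w: hamming(rows[w], rows[nearest_outside(w)]))
            b = nearest_outside(a)
            union(a, b)
            edges.append((a, b))
    return edges, [find(u) for u in range(n)]

# Hypothetical rows; strings are treated as sequences of attributes.
rows = ["aaaa", "aaab", "bbbb", "bbba", "bbaa"]
edges, labels = build_forest(rows)
print(edges)
```

In a two-node component each node's nearest neighbor is the other node, so the nearest row outside the component is exactly that node's second-nearest neighbor.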

  10. An example graph [Figure: weighted graph; legend: nearest-neighbor edges, other edges]

  11. The forest obtained [Figure: the forest, with edge weights]

  12. O(k)-approximation algorithm (for k=3) • The forest has: • Components of size at least 3. • The total cost of the edges in the forest is no more than the cost of the optimal solution. • In an optimal solution, each node has at least as many suppressed entries (*s) as its Hamming distance to its second-nearest neighbor. • Each node has at most as many *s as the cost of the tree containing the node. • If there is any component of size greater than 5, break it into components of size at least 3 (resp. k).

  13. The final partition [Figure: the final partition, with edge weights]

  14. Analysis of the algorithm • Cluster the row vectors according to this partition. • Cost incurred ≤ OPT × (size of the largest part) = 5 × OPT. • For general k, the cost of this solution is within a factor of max{3k-5, 2k-1} of the cost of the optimal solution.
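
To make the clustering step and the stated factor concrete, here is a small sketch. It assumes that, within a cluster, every column on which the rows disagree is suppressed in all rows of that cluster; the names `anonymize` and `approx_factor` and the sample data are illustrative only.

```python
def anonymize(rows, clusters, star="*"):
    """Suppress, per cluster, every column on which its rows disagree.

    Returns the anonymized rows and the total number of suppressed entries.
    """
    out = [None] * len(rows)
    cost = 0
    for cluster in clusters:
        width = len(rows[cluster[0]])
        # keep[c] is True when all rows of the cluster agree on column c
        keep = [len({rows[i][c] for i in cluster}) == 1 for c in range(width)]
        for i in cluster:
            out[i] = tuple(x if kept else star for x, kept in zip(rows[i], keep))
        cost += len(cluster) * keep.count(False)
    return out, cost

def approx_factor(k):
    """The factor max{3k-5, 2k-1} quoted on the slide."""
    return max(3 * k - 5, 2 * k - 1)

# Hypothetical table and partition into two clusters of size 3.
rows = ["aab", "aac", "abc", "bbb", "bbb", "bbc"]
clusters = [[0, 1, 2], [3, 4, 5]]
anonymized, cost = anonymize(rows, clusters)
print(anonymized[0], cost)   # ('a', '*', '*') 9
print(approx_factor(3))      # 5, matching the 5 × OPT bound for k = 3
```

For k=3 the two terms are 3k-5 = 4 and 2k-1 = 5, so the maximum is 5, which agrees with the largest component size allowed before splitting.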

  15. Better than O(k)-approximation? • Not possible, using only the graph representation • Lose information about the structure of the problem • There exist two instances with: • Same underlying graph • k-anonymity costs differing by a factor of O(k)

  16. Open problems • Lower bounds on the approximation factor (without assuming the graph representation) • Extend the k-anonymity model to account for changes in the database: • Handle inserts, deletes and updates
