
Shaibal Chakrabarty

Privacy Preserving Data Mining: Introduction. August 2nd, 2013. Shaibal Chakrabarty. PPDM - Context. Motivation: There is an inherent tension in mining sensitive databases: we want to release aggregate information about the data without leaking individual information about participants.


Presentation Transcript


  1. Privacy Preserving Data Mining: Introduction. August 2nd, 2013. Shaibal Chakrabarty

  2. PPDM - Context
     Motivation: There is an inherent tension in mining sensitive databases. We want to release aggregate information about the data without leaking individual information about participants.
     • Aggregate info: the number of A students in a school district.
     • Individual info: whether a particular student is an A student.
     Problem: Exact aggregate info may leak individual info. E.g., releasing both the number of A students in the district and the number of A students in the district not named Dan Waymel reveals, by subtraction, whether Dan Waymel is an A student.
     Goal: A method that protects individual info while still releasing aggregate info.
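The leak described in slide 2 can be sketched in a few lines. This is only an illustration: the student records and the `count_a` helper are hypothetical, and only the name Dan Waymel comes from the slide.

```python
# Hypothetical toy records (name, grade); invented for illustration only.
students = [("Ann", "A"), ("Dan Waymel", "A"), ("Bo", "B"), ("Cy", "A")]


def count_a(records, exclude_name=None):
    """Exact aggregate query: number of A students, optionally skipping one name."""
    return sum(
        1 for name, grade in records
        if grade == "A" and name != exclude_name
    )


# Two aggregate answers that each look harmless on their own.
total = count_a(students)
without_dan = count_a(students, exclude_name="Dan Waymel")

# Subtracting them recovers one individual's grade exactly.
dan_is_a_student = (total - without_dan) == 1
print(total, without_dan, dan_is_a_student)
```

Neither query names a grade for any individual, yet the difference between the two answers does, which is why exact aggregates alone do not protect participants.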

  3. PPDM – What and How
     A growing number of data mining applications must deal with data sources that are distributed, possibly proprietary, and privacy-sensitive. Financial transactions, health-care records, and network communication traffic are a few examples. Privacy is also an increasingly important issue in data mining applications for counter-terrorism and homeland defense, which may require creating profiles, constructing social network models, and detecting terrorist communications from distributed, privacy-sensitive, multi-party data. Combining such diverse data sets belonging to different parties may violate privacy laws. We therefore need algorithms that can mine the data while guaranteeing that the privacy of the data is not compromised. This need has led to the development of several privacy-preserving data mining techniques. Many of them use randomization to perturb the data, preserving privacy while keeping the underlying patterns invariant.

  4. Perturbation Based Approaches for Privacy Preserving Data Mining
     Goal: Distort the data while still preserving some properties for data mining purposes.
     • Additive based
     • Multiplicative based
     • Condensation based
     • Decomposition
     • Data swapping
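Three of the approaches listed in slide 4 can be sketched on a single numeric attribute. This is a minimal illustration under stated assumptions, not the slide's own method: the data, the noise levels, and the function names are hypothetical, Gaussian noise is one possible choice, and data swapping is simplified to shuffling one attribute across records.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical sensitive attribute (e.g. ages); invented for illustration.
original = [rng.uniform(20, 80) for _ in range(10_000)]


def additive(values, sigma, rng):
    """Additive perturbation: release x + noise, with zero-mean noise."""
    return [v + rng.gauss(0.0, sigma) for v in values]


def multiplicative(values, sigma, rng):
    """Multiplicative perturbation: release x * noise, with noise centred on 1."""
    return [v * rng.gauss(1.0, sigma) for v in values]


def swap(values, rng):
    """Simplified data swapping: shuffle the attribute's values across records."""
    shuffled = list(values)
    rng.shuffle(shuffled)
    return shuffled


added = additive(original, 10.0, rng)
scaled = multiplicative(original, 0.1, rng)
swapped = swap(original, rng)

# Individual values change, but aggregate properties are roughly preserved.
print(statistics.mean(original), statistics.mean(added), statistics.mean(scaled))
```

In this sketch the additive and multiplicative variants keep the mean approximately unchanged, while swapping keeps the attribute's marginal distribution exactly and only breaks the link between a value and its record.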

  5. PPDM - Methods
     Randomization approach
     • Hide the original data by randomly modifying the data values with additive noise, while preserving the patterns of the original data (the underlying probabilistic properties).
     • Reconstruct the distribution of the original data values from the perturbed data. The original values themselves cannot be reconstructed.
     • A decision tree classifier is then built from the perturbed data using this reconstructed distribution.
     • Privacy breaches
     Cryptographic approach
     • Party X owns database D1; Party Y owns database D2.
     • Build a decision tree on D1 and D2 without revealing information about D1 to Party Y or about D2 to Party X, except what might be revealed by the decision tree itself.
     • Horizontally partitioned data: records (entities) are split across parties.
     • Vertically partitioned data: attributes are split across parties.
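The randomization approach in slide 5 can be sketched as follows: perturb values with additive noise, then estimate the original distribution (not the original values) from the perturbed data. The sketch assumes the iterative Bayesian reconstruction described by Agrawal and Srikant (2000), discretized over fixed bins; the bimodal test data, the Gaussian noise, the bin layout, and all function names are hypothetical choices made for illustration.

```python
import math
import random

NOISE_SIGMA = 15.0  # assumed noise level; the noise distribution is public


def gaussian_pdf(d, sigma):
    """Density of the zero-mean Gaussian noise at offset d."""
    return math.exp(-0.5 * (d / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def reconstruct(perturbed, midpoints, sigma, iterations=50):
    """Estimate the probability mass of the ORIGINAL values in each bin."""
    k, n = len(midpoints), len(perturbed)
    # kernel[i][a] = f_Y(w_i - m_a): likelihood of w_i if x_i sat at midpoint m_a
    kernel = [[gaussian_pdf(w - m, sigma) for m in midpoints] for w in perturbed]
    estimate = [1.0 / k] * k  # start from a uniform guess
    for _ in range(iterations):
        updated = [0.0] * k
        for row in kernel:
            denom = sum(f * p for f, p in zip(row, estimate))
            for a in range(k):
                updated[a] += row[a] * estimate[a] / denom  # Bayes posterior
        estimate = [u / n for u in updated]
    return estimate


def histogram(values, k=10, width=10.0):
    """Naive histogram; out-of-range values are clamped to the edge bins."""
    counts = [0] * k
    for v in values:
        counts[min(max(int(v // width), 0), k - 1)] += 1
    return [c / len(values) for c in counts]


rng = random.Random(0)
# Hypothetical original data: two clusters, around 20 and around 80.
true_values = [
    rng.gauss(20.0, 3.0) if i % 2 == 0 else rng.gauss(80.0, 3.0)
    for i in range(2000)
]
perturbed = [x + rng.gauss(0.0, NOISE_SIGMA) for x in true_values]

midpoints = [5.0 + 10.0 * a for a in range(10)]  # bins [0,10), ..., [90,100)
estimate = reconstruct(perturbed, midpoints, NOISE_SIGMA)
naive = histogram(perturbed)

for m, p_naive, p_est in zip(midpoints, naive, estimate):
    print(f"bin {m:5.1f}  perturbed={p_naive:.3f}  reconstructed={p_est:.3f}")
```

Only the perturbed values and the noise distribution are used in `reconstruct`; the individual originals never enter it. A classifier would then be trained against the reconstructed distribution rather than against the raw perturbed values.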

  6. PPDM – The Math (Randomization)
     Randomization
     Reconstruction
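The formulas on slide 6 did not survive in the transcript. As a hedged sketch only, assuming the slide follows the additive-noise scheme of Agrawal and Srikant (2000) cited in slide 7, the two steps can be written as below. The symbols ($x_i$ for original values, $y_i$ for noise, $w_i$ for released values, $f_X$ and $f_Y$ for their densities) are notation chosen here, not taken from the slide.

```latex
% Randomization: each original value x_i is released as a perturbed value w_i
w_i = x_i + y_i, \qquad y_i \sim f_Y \quad \text{(noise density, publicly known)}

% Reconstruction: iterative Bayesian estimate of the original density f_X,
% started from f_X^{(0)} = \text{uniform} and repeated until it stabilizes
f_X^{(j+1)}(a) = \frac{1}{n} \sum_{i=1}^{n}
  \frac{f_Y(w_i - a)\, f_X^{(j)}(a)}
       {\int_{-\infty}^{\infty} f_Y(w_i - z)\, f_X^{(j)}(z)\, dz}
```

Each term in the sum is the posterior density of $x_i = a$ given the observed $w_i$, so the update averages the posteriors over all $n$ released values; it recovers the distribution of the originals, not the originals themselves.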

  7. PPDM – Who’s Who
     • R. Agrawal, R. Srikant. Privacy-Preserving Data Mining. ACM SIGMOD Conference, 2000.
     • Hillol Kargupta, Souptik Datta, Qi Wang, Krishnamoorthy Sivakumar. Random Data Perturbation Techniques and Privacy Preserving Data Mining.
     • C. Clifton, M. Kantarcioglu, J. Vaidya, X. Lin, M. Zhu. Tools for Privacy Preserving Distributed Data Mining. ACM SIGKDD Explorations 4(2), January 2003.
     • Wenliang Du, Mikhail J. Atallah. Privacy Preserving Cooperative Statistical Analysis.
     • Chris Clifton, Murat Kantarcioglu, Jaideep Vaidya. Defining Privacy for Data Mining.
     • Jiawei Han, Micheline Kamber. Data Mining: Concepts and Techniques.

  8. Future Directions
     Perturbation Based Approaches for Privacy Preserving Data Mining
     • Privacy is a personal choice, so perturbation should be adaptable to the individual (Liu, Kantarcioglu and Thuraisingham, ICDM’06).
