
Deriving Private Information from Association Rule Mining Results

Deriving Private Information from Association Rule Mining Results. Zutao Zhu, Guan Wang, and Wenliang Du ICDE 2009. Outline. Motivation Problem Formulation Maximum Entropy Modeling Deriving Constraints From Association Rules Deriving Constraints From NAR-Association Rules Algorithm


Presentation Transcript


  1. Deriving Private Information from Association Rule Mining Results Zutao Zhu, Guan Wang, and Wenliang Du ICDE 2009

  2. Outline • Motivation • Problem Formulation • Maximum Entropy Modeling • Deriving Constraints From Association Rules • Deriving Constraints From NAR-Association Rules • Algorithm • Conclusion

  3. Motivation • Data publishing can provide enormous benefits to society; however, due to privacy concerns, data cannot be published in its original form. • One option is to publish a sanitized version of the original data. • Another is to publish aggregate information derived from the original data, such as data mining results. • The objective of this paper is to develop a systematic method to quantify the privacy disclosure caused by publishing data mining results.

  4. (Cont.) • Assumptions • The original dataset consists of two parts: QI (quasi-identifier) attributes and SA (sensitive attributes). • Adversaries have all the data of the QI attributes. • Adversaries know the domain of the SA.

  5. (Cont.) • The goal of privacy-preserving data publishing is to prevent adversaries from inferring any individual’s SA information, while keeping the published information as useful as possible. • Linking attack • The severity of a linking attack is determined by the conditional probability P(SA|QI). • As P(SA|QI) → 1, adversaries can infer the SA value of an individual with the given QI values with increasing certainty.

  6. (Cont.) • Min_sup = 0.3 and min_conf = 0.8. • The domain of Salary is {50K+, 50K-}. • The useful association rules are those of the pattern QI → SA. • From the published association rules in Figure (b), we can directly derive P(SA|QI) and P(QI, SA). • Even if the exact confidence and support of each rule are suppressed from the disclosure, we can still derive inequalities from the thresholds.
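
To make the link between a rule's statistics and the adversary's target concrete, here is a minimal Python sketch. The table is a hypothetical toy example (the data behind the slide's figure is not preserved in this transcript), and the names `rule_stats` and `published_rules` are illustrative only. For a rule QI → SA, the support is P(QI, SA) and the confidence is P(SA|QI), so publishing either value hands the adversary that probability directly.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical toy table of (Gender, Salary) records; NOT the slide's data.
records = [
    ("Male", "50K+"), ("Male", "50K+"), ("Male", "50K+"), ("Male", "50K+"),
    ("Male", "50K-"),
    ("Female", "50K+"), ("Female", "50K-"), ("Female", "50K-"),
    ("Female", "50K-"), ("Female", "50K-"),
]

def rule_stats(records):
    """Map each observed (QI, SA) pair to (support, confidence) of QI -> SA."""
    n = len(records)
    joint = Counter(records)               # counts of (QI, SA)
    qi = Counter(q for q, _ in records)    # counts of QI alone
    return {
        (q, x): (Fraction(c, n), Fraction(c, qi[q]))  # P(QI, SA), P(SA | QI)
        for (q, x), c in joint.items()
    }

def published_rules(records, min_sup, min_conf):
    """Keep only the rules that meet both thresholds."""
    return {
        rule: (sup, conf)
        for rule, (sup, conf) in rule_stats(records).items()
        if sup >= min_sup and conf >= min_conf
    }

# Thresholds from the slide: min_sup = 0.3, min_conf = 0.8.
rules = published_rules(records, Fraction(3, 10), Fraction(8, 10))
for (q, x), (sup, conf) in rules.items():
    print(f"{q} -> {x}: P(QI,SA) = {sup}, P(SA|QI) = {conf}")
```

If the exact values were withheld, the adversary would still know that each published rule satisfies P(QI, SA) ≥ min_sup and P(SA|QI) ≥ min_conf.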

  7. (Cont.) • If QI → SA is not a published association rule, its absence also gives adversaries useful information. • Example: min_sup = 0.6, min_conf = 0.9. • The pattern “Gender = Female → Salary = 50K+” is not published.

  8. Problem Formulation • Let D be the original data set that is used to generate the data mining results (Ω). Let variable X represent SA attributes, and variable Q represent QI attributes. Given Ω and the QI part of all the records in D, derive P(X|Q) for all the combinations of Q and X values.

  9. (Cont.) • We treat P(X|Q) as a variable for each combination of X ∈ SA and Q ∈ QI. • The goal of deriving P(X|Q) is to assign probability values to these variables. • Data mining results contain information about P(X|Q), so the assignment of these probability variables should be consistent with the information embedded in the data mining results. • The embedded information can be formulated as constraints, which take the form of equations or inequalities.

  10. Maximum Entropy (ME) principle • According to the ME principle, the inference is the most unbiased when the entropy of these variables is maximized. • Our problem becomes finding a distribution P(X|Q) such that the conditional entropy H(X|Q) is maximized.
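
The slide's formula is not preserved in this transcript. As a reference point, the standard definition of conditional entropy, written with the deck's symbols, is sketched below. The normalization constraint is an assumption added for completeness; P(Q) is treated as known because adversaries are assumed to hold all QI data.

```latex
% Standard conditional entropy; P(Q) is known to the adversary.
H(X \mid Q) \;=\; -\sum_{Q,\,X} P(Q, X)\,\log P(X \mid Q)
           \;=\; -\sum_{Q,\,X} P(Q)\,P(X \mid Q)\,\log P(X \mid Q)

% Maximize H(X | Q) over the variables P(X | Q), subject to
\sum_{X} P(X \mid Q) = 1 \quad \text{for every } Q,
% together with the constraints derived from the data mining results.
```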

  11. Deriving Constraints From Association Rules • To estimate P(X|Q) based on data mining results, we need to convert the embedded knowledge into equations or inequalities that use P(X|Q) or P(Q, X) as variables. • We call these equations and inequalities ME constraints. • AR-constraints: two potential scenarios • The exact support and confidence are withheld. • The exact support and confidence are published.
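
The two scenarios can be written out as follows. This is a sketch that follows from the definition of an association rule with thresholds min_sup and min_conf, not necessarily the exact form used by the authors; the symbols s and c (a rule's published support and confidence) are introduced here for illustration.

```latex
% Published rule Q -> X, exact support s and confidence c disclosed:
P(Q, X) = s, \qquad P(X \mid Q) = c

% Published rule Q -> X, exact values withheld:
P(Q, X) \ge \mathit{min\_sup}, \qquad P(X \mid Q) \ge \mathit{min\_conf}

% The two kinds of variable are linked through the known P(Q):
P(Q, X) = P(Q)\,P(X \mid Q)
```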

  12. Deriving Constraints From Non-Association Rules • If Q → X is not one of the published association rules, its absence also yields constraints on P(Q, X) and P(X|Q).
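
The slide's formula is not preserved in this transcript. One way to reason about it, offered as a reconstruction from the rule definition and not as the authors' exact statement: a rule is published only when both thresholds are met, so an unpublished Q → X must fail at least one of them.

```latex
% Q -> X unpublished: at least one threshold is violated.
P(Q, X) < \mathit{min\_sup} \quad \text{or} \quad P(X \mid Q) < \mathit{min\_conf}

% Using P(Q, X) = P(Q) P(X | Q) with P(Q) known, this is equivalent to
P(Q, X) < \max\bigl(\mathit{min\_sup},\; \mathit{min\_conf}\cdot P(Q)\bigr)
```

When P(Q) itself is below min_sup, the bound is implied by P(Q, X) ≤ P(Q) and reveals nothing new; the constraint is informative only for frequent Q.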

  13. Algorithm to derive AR- and NAR-Constraints • Apriori-based algorithm
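
The transcript gives no detail of the algorithm beyond "Apriori-based", so the following Python is only a sketch under stated assumptions, not the authors' procedure. It assumes that exact support and confidence are withheld, that the adversary enumerates QI itemsets Q level by level and keeps those with P(Q) ≥ min_sup, and that each (Q, X) pair is then bounded by max(min_sup, min_conf · P(Q)): from below if Q → X was published (both thresholds met), from above if it was not (at least one threshold missed). All function names and the toy data are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def frequent_qi_itemsets(qi_records, min_sup):
    """Level-wise (Apriori-style) search for QI itemsets Q with P(Q) >= min_sup."""
    n = len(qi_records)
    # Each record becomes a set of (attribute index, value) items.
    rows = [frozenset(enumerate(r)) for r in qi_records]

    def prob(q):
        return Fraction(sum(q <= row for row in rows), n)

    level = {frozenset([item]) for row in rows for item in row}
    frequent = {}
    while level:
        kept = {q: p for q in level if (p := prob(q)) >= min_sup}
        frequent.update(kept)
        # Join step only: merge frequent k-itemsets that differ in one item.
        # (The usual subset-pruning step is omitted; counting still filters.)
        level = {a | b for a, b in combinations(kept, 2)
                 if len(a | b) == len(a) + 1}
    return frequent

def derive_constraints(qi_records, sa_domain, published, min_sup, min_conf):
    """Return (Q, x, kind, op, bound) tuples constraining P(Q, X = x)."""
    constraints = []
    for q, p_q in frequent_qi_itemsets(qi_records, min_sup).items():
        bound = max(min_sup, min_conf * p_q)
        for x in sa_domain:
            if (q, x) in published:
                constraints.append((q, x, "AR", ">=", bound))
            else:
                constraints.append((q, x, "NAR", "<", bound))
    return constraints

# Hypothetical toy input: one QI attribute (Gender), 7 Female and 3 Male.
qi_records = [("Female",)] * 7 + [("Male",)] * 3
female = frozenset({(0, "Female")})
published = {(female, "50K-")}  # assumed set of published rules
constraints = derive_constraints(
    qi_records, ["50K+", "50K-"], published, Fraction(6, 10), Fraction(9, 10)
)
for q, x, kind, op, bound in constraints:
    print(sorted(q), "->", x, f"[{kind}] P(Q,X) {op} {bound}")
```

Itemsets with P(Q) < min_sup are skipped because P(Q, X) ≤ P(Q) already keeps every such rule below the support threshold, so its absence carries no extra information. The resulting bounds would then be passed, together with the normalization of P(X|Q), to a maximum entropy solver, which is outside the scope of this sketch.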

  14. Conclusion • The paper proposes a quantitative analysis of the information disclosure caused by data mining results. • Further thoughts: • Sanitize the original datasets before publishing data mining results. • Disguise the association rules so that privacy is preserved.
