Anonymizing Data with Quasi-Sensitive Attribute Values

Pu Shi¹, Li Xiong¹, Benjamin C. M. Fung²
¹Department of Mathematics and Computer Science, Emory University, Atlanta, GA, USA
²CIISE, Concordia University, Montreal, QC, Canada


Presentation Transcript


Definitions

The external knowledge table $E$ has rows $(L_i, S_i)$, $i = 1, 2, \dots, |E|$, where $L_i$ is a sensitive label and $S_i$ is the corresponding set of QS values. For a QI group $G$ with quasi-identifying (QI) vector $q$ and $d$ tuples $tp_1, \dots, tp_d$, the sensitive label set of $G$ is $\bigcup_{i=1}^{d} K(tp_i)$, that is, all sensitive labels that can be linked to the $d$ tuples in $G$. The attacker's prior belief $\alpha(q, L)$ and posterior belief $\beta(q, L)$ are the probabilities that a target $tp$ with QI vector $q$ is linked to a label $L$ before and after the data release, respectively.

Definition (QS (c,l)-diversity). A group $G$ satisfies QS (c,l)-diversity if and only if $p_1 \le c\,(p_l + p_{l+1} + \dots + p_m)$, where $m = \left|\bigcup_{i=1}^{d} K(tp_i)\right|$ and $p_1, p_2, \dots, p_m$ are the values of $\beta(q, L_i)$ in decreasing order. A table $D^*$ satisfies QS (c,l)-diversity if every group satisfies QS (c,l)-diversity.

Definition (QS t-closeness). A group $G$ satisfies QS t-closeness if and only if the distance between $\alpha(q, L)$ and $\beta(q, L)$ is no more than a threshold $t$. A table $D^*$ satisfies QS t-closeness if every group satisfies QS t-closeness.

Problem Statement

We study the problem of anonymizing microdata with quasi-sensitive (QS) attributes, which are not sensitive by themselves but can be linked to external knowledge to reveal indirect sensitive information about an individual.

Figure 1. Anonymizing data with QS attributes: (a) original microdata with the quasi-sensitive attribute symptoms; (b) external knowledge that maps symptoms to disease; (c) a generalized table that cannot prevent indirect disclosure of disease through symptoms.

Figure 2. Disclosure risks with QS attributes.

Preliminary Results

With the Mondrian generalization and our suppression algorithm implemented in C++, we conducted experiments with: 1) a dataset of 3000 tuples augmented from the Adult dataset, with 8 QI attributes and 9 synthesized QS terms per tuple, and 2) an external table with 3000 knowledge labels linked to random QS terms with a Poisson distribution.

Figure 4. QS suppression for QS (c,l)-diversity, showing that adaptive QS suppression significantly outperforms the baseline DFS search.

Figure 5. Two-phase algorithm for QS t-closeness, showing the trade-off between better privacy and smaller removal cost, and the benefit of the two-phase algorithm compared to a generalization-only approach.

Algorithm

Phase 1 (QI generalization). Given $D$, an intermediate dataset $D_g$ is obtained that satisfies k-anonymity.

Phase 2 (QS suppression). Given $D_g$, a suppression algorithm removes suitable QS values (items) until every QI group satisfies QS (c,l)-diversity or QS t-closeness.

Figure 3. QS suppression search tree and algorithm features.

Contributions

• Greedy search heuristics with dynamic reordering of the tail sets, which contain the candidate values to be removed in the next step, so that a result is returned quickly.
• Dynamic updates whenever a lower-cost solution is found, so that the result keeps improving within a bounded time period.
• Formal notions of QS l-diversity and QS t-closeness that extend l-diversity and t-closeness to prevent indirect attribute disclosure due to QS attribute values.
• A two-phase algorithm that combines generalization and value suppression to achieve QS l-diversity and QS t-closeness.
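The QS (c,l)-diversity condition stated in the definitions can be illustrated with a short Python sketch. The poster does not spell out how a tuple is linked to a label or how the posterior belief is estimated, so two assumptions are made here: a tuple is linked to label L_i when S_i is a subset of the tuple's QS values, and β(q, L) is taken to be the fraction of tuples in the group linked to L. The function names and the toy labels and QS values are hypothetical and are not taken from the authors' C++ implementation.

```python
from collections import Counter


def linked_labels(qs_values, external):
    """Labels L_i whose QS set S_i is contained in the tuple's QS values.

    Assumed linking rule: the poster does not define K(tp) explicitly.
    """
    return {label for label, s in external if s <= qs_values}


def posterior_beliefs(group, external):
    """Assumed estimate of beta(q, L): share of tuples in the group linked to L."""
    counts = Counter()
    for qs_values in group:
        counts.update(linked_labels(qs_values, external))
    return {label: n / len(group) for label, n in counts.items()}


def satisfies_qs_cl_diversity(group, external, c, l):
    """Check p_1 <= c * (p_l + p_{l+1} + ... + p_m) for one QI group."""
    p = sorted(posterior_beliefs(group, external).values(), reverse=True)
    if not p:
        return True  # no sensitive label can be linked to this group
    return p[0] <= c * sum(p[l - 1:])


# Hypothetical external table E: (label, set of QS values).
external = [
    ("L1", frozenset({"a", "b"})),
    ("L2", frozenset({"c"})),
    ("L3", frozenset({"d"})),
]

# One QI group: the QS value set of each tuple.
group = [{"a", "b", "c"}, {"a", "b"}, {"a", "b", "d"}, {"c"}]

# Same group after suppressing QS value "a" in the second tuple.
suppressed = [{"a", "b", "c"}, {"b"}, {"a", "b", "d"}, {"c"}]

print(satisfies_qs_cl_diversity(group, external, c=2, l=3))
print(satisfies_qs_cl_diversity(suppressed, external, c=2, l=3))
```

In this toy group the most frequent label L1 is linked to three of the four tuples, so the check fails for c = 2, l = 3. Suppressing a single QS value breaks one of those links and the check passes, which is the effect Phase 2 (QS suppression) relies on.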
