Anonymizing Data with Quasi-Sensitive Attribute Values

Pu Shi (1), Li Xiong (1), Benjamin C. M. Fung (2)
(1) Department of Mathematics and Computer Science, Emory University, Atlanta, GA, USA
(2) CIISE, Concordia University, Montreal, QC, Canada

Problem Statement

We study the problem of anonymizing microdata with quasi-sensitive (QS) attributes, which are not sensitive by themselves but can be linked to external knowledge to reveal indirect sensitive information about an individual.

Figure 1. Anonymizing data with QS attributes: (a) original microdata with the quasi-sensitive attribute symptoms; (b) external knowledge that maps symptoms to diseases; (c) a generalized table that cannot prevent indirect disclosure of disease through symptoms.

Figure 2. Disclosure risks with QS attributes.

Definitions

The external knowledge table E has one pair $(L_i, S_i)$ per row, $i = 1, 2, \ldots, |E|$, where $L_i$ is a sensitive label and $S_i$ is the corresponding set of QS values. The set of all sensitive labels that can be linked to the d tuples in a QI group G with quasi-identifying (QI) vector q is $\bigcup_{i=1}^{d} K(tp_i)$, the sensitive label set of G. The attacker's prior belief $\alpha(q, L)$ and posterior belief $\beta(q, L)$ are the probabilities that a target tp with QI vector q is linked to a label L before and after the data release, respectively.

Definition (QS (c,l)-diversity). A group G satisfies QS (c,l)-diversity if and only if

$p_1 \le c \, (p_l + p_{l+1} + \cdots + p_m)$, with $m = \left| \bigcup_{i=1}^{d} K(tp_i) \right|$,

where $p_1, p_2, \ldots, p_m$ are the values of $\beta(q, L_i)$ in decreasing order. A table D* satisfies QS (c,l)-diversity if every group satisfies QS (c,l)-diversity.

Definition (QS t-closeness). A group G satisfies QS t-closeness if and only if the distance between $\alpha(q, L)$ and $\beta(q, L)$ is no more than a threshold t. A table D* satisfies QS t-closeness if every group satisfies QS t-closeness.

Algorithm

Phase 1 (QI generalization). Given D, an intermediate dataset Dg that satisfies k-anonymity is obtained.
Phase 2 (QS suppression). Given Dg, a suppression algorithm removes suitable QS values (items) until every QI group satisfies QS (c,l)-diversity or QS t-closeness.

Figure 3. QS suppression search tree and algorithm features.

Preliminary Results

With the Mondrian generalization and our suppression algorithm implemented in C++, we conducted experiments with: 1) a dataset of 3000 tuples augmented from the Adult dataset, with 8 QI attributes and 9 synthesized QS terms per tuple, and 2) an external table of 3000 pieces of knowledge, whose labels are linked to random QS terms following a Poisson distribution.

Figure 4. QS suppression for QS (c,l)-diversity: adaptive QS suppression significantly outperforms the baseline DFS search.

Figure 5. Two-phase algorithm for QS t-closeness: the trade-off between better privacy and smaller removal cost, and the benefit of the two-phase algorithm compared to a generalization-only approach.

Contributions

• Formal notions of QS l-diversity and QS t-closeness that extend l-diversity and t-closeness to prevent indirect attribute disclosure due to QS attribute values.
• A two-phase algorithm that combines generalization and value suppression to achieve QS l-diversity and QS t-closeness.
• Greedy search heuristics with dynamic reordering of the tail sets, which contain the candidate values to be removed in the next step, so that a result is returned quickly.
• Dynamic updates whenever a lower-cost solution is found, so that the result keeps improving within a bounded time period.
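To make the QS (c,l)-diversity test and the idea of Phase 2 suppression concrete, the following is a minimal Python sketch on toy data. It rests on several assumptions that the poster does not state: a label L is taken to be linkable to a tuple when every QS value in its set S appears in the tuple, the posterior belief is approximated as the fraction of tuples in the group linkable to L (so the values need not sum to 1), a group with no linkable label is treated as satisfying the condition, and every set S is non-empty. The function names and the one-value-at-a-time greedy rule are hypothetical; the greedy loop is a simple stand-in and not the authors' search with reordered tail sets or their C++ implementation.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# External knowledge table E: sensitive label -> set of QS values (toy data).
External = Dict[str, FrozenSet[str]]


def linked_labels(qs_values: Set[str], external: External) -> Set[str]:
    """K(tp), ASSUMING a label is linkable when all of its QS values are present."""
    return {label for label, s in external.items() if s <= qs_values}


def posterior(group: List[Set[str]], external: External) -> Dict[str, float]:
    """ASSUMED beta(q, L): fraction of tuples in the group linkable to L."""
    counts: Dict[str, int] = {}
    for qs in group:
        for label in linked_labels(qs, external):
            counts[label] = counts.get(label, 0) + 1
    return {label: n / len(group) for label, n in counts.items()}


def satisfies_qs_cl_diversity(group: List[Set[str]], external: External,
                              c: float, l: int) -> bool:
    """Check p1 <= c * (p_l + ... + p_m) on the sorted posterior values."""
    p = sorted(posterior(group, external).values(), reverse=True)
    if not p:
        return True  # assumption: nothing linkable means nothing to disclose
    return p[0] <= c * sum(p[l - 1:])  # empty tail sums to 0, so the check fails


def greedy_suppress(group: List[Set[str]], external: External,
                    c: float, l: int) -> Tuple[List[Set[str]], List[str]]:
    """Illustrative greedy stand-in for Phase 2 on a single QI group.

    Repeatedly removes, from every tuple in the group, one QS value that
    supports the currently most probable label, until the check passes.
    """
    work = [set(qs) for qs in group]  # copy so the input is left untouched
    removed: List[str] = []
    while not satisfies_qs_cl_diversity(work, external, c, l):
        beta = posterior(work, external)
        top = min(beta, key=lambda lab: (-beta[lab], lab))  # most probable label
        present = set().union(*work)
        victim = sorted(external[top] & present)[0]  # deterministic choice
        for qs in work:
            qs.discard(victim)
        removed.append(victim)
    return work, removed


external = {
    "flu": frozenset({"fever", "cough"}),
    "cold": frozenset({"cough"}),
    "asthma": frozenset({"wheeze"}),
}
group = [{"fever", "cough"}, {"fever", "cough"}, {"cough", "wheeze"}, {"cough"}]

print(satisfies_qs_cl_diversity(group, external, c=2, l=2))
suppressed, removed = greedy_suppress(group, external, c=1, l=2)
print(removed, suppressed)
```

In this toy group every tuple is linkable to "cold", so a strict setting such as c = 1, l = 2 forces the sketch to suppress values until no dominant label remains. This shows the removal cost side of the privacy trade-off that the poster reports for Phase 2.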
