Balancing Privacy and Utility: Comparing Randomization and K-Degree Anonymization in Social Network Publishing

Comparisons of Randomization and K-degree Anonymization Schemes for Privacy Preserving Social Network Publishing Xiaowei Ying, Kai Pan, Xintao Wu, Ling Guo Univ. of North Carolina at Charlotte SNA-KDD June 28, 2009, Paris, France

Motivation • Privacy Preserving Social Network Publishing • node-anonymization • cannot guarantee identity/link privacy due to subgraph queries. • Backstrom et al. WWW07, Hay et al. UMass TR07 • edge randomization • Random Add/Del, Random Switch • K-anonymity generalization • Hay et al. VLDB08, K-degree Liu&Terzi SIGMOD08, Zhou&Pei ICDE08 • Utility preserving randomization • Spectral feature preserving Ying&Wu SDM08 • Real space feature preserving based on Markov Chain Ying&Wu SDM09, Hanhijarvi et al. SDM09

Motivation • Attacks based on Background Knowledge • Attributes of vertices • Vertex degrees • Specific link relationships between target individuals • Neighborhoods of target individuals • Embedded subgraphs • Graph metric

Focus • We quantify identity disclosure and link disclosure under vertex-degree attacks for Rand Add/Del. • Identity disclosure is measured as the probability of correctly linking a target individual to an anonymized node, given the degree of the target individual. • Link disclosure is measured as the probability that a sensitive link exists between two individuals, given their known degrees (details skipped). • We compare Rand Add/Del with K-degree generalization in terms of utility preservation under the same privacy disclosure threshold, i.e., 1/K.

Political books network • Network of US political books (105 nodes, 441 edges): books about US politics sold by Amazon.com. • Edges represent frequent co-purchasing of books by the same buyers. • Nodes have been given colors of blue, white, or red to indicate whether they are "liberal", "neutral", or "conservative". • http://www-personal.umich.edu/~mejn/netdata/

Degree variation due to randomization

Re-identification risks • Applying Bayes' theorem • The attacker does not know the original degree distribution.

Estimate original degree sequence • Figure: original degree sequence, degree sequence after randomization, and estimated degree sequence (add and delete 10% of edges)
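The Rand Add/Del perturbation behind these degree changes can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names `rand_add_del` and `degree_map`, the toy 6-node graph, and the choice to draw added edges only from pairs that were absent in the original graph are all assumptions made for the example.

```python
import random
from itertools import combinations


def rand_add_del(nodes, edges, k, seed=None):
    """Delete k existing edges and add k absent edges, chosen uniformly at random.

    Assumption for this sketch: added edges are drawn from pairs that were
    not connected in the ORIGINAL graph, so the edge count stays the same.
    """
    rng = random.Random(seed)
    original = {frozenset(e) for e in edges}
    ordered = sorted(original, key=sorted)  # fixed order so a seed is reproducible
    non_edges = [frozenset(p) for p in combinations(nodes, 2)
                 if frozenset(p) not in original]
    removed = set(rng.sample(ordered, k))
    added = set(rng.sample(non_edges, k))
    return (original - removed) | added


def degree_map(nodes, edge_set):
    """Return {node: degree} for an undirected edge set."""
    deg = {v: 0 for v in nodes}
    for e in edge_set:
        for v in e:
            deg[v] += 1
    return deg


# Toy graph (hypothetical data): 6 nodes, 10 edges.
nodes = list(range(6))
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0),
         (0, 3), (1, 4), (2, 5), (0, 2)]

k = max(1, round(0.10 * len(edges)))  # add and delete 10% of the edges
perturbed = rand_add_del(nodes, edges, k, seed=0)

before = degree_map(nodes, {frozenset(e) for e in edges})
after = degree_map(nodes, perturbed)
changed = {v: (before[v], after[v]) for v in nodes if before[v] != after[v]}
print("nodes whose degree changed (before, after):", changed)
```

Each deleted or added edge touches two nodes, so at most 4k nodes change degree. An attacker who knows a target's original degree therefore sees a noisy version of it in the published graph.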

Node re-identification risks • Nodes’ prior and posterior risks Given an individual α with degree dα and a randomized graph • Prior risk: • Posterior risks

Re-identification risks • Re-identification risk decreases as the perturbation magnitude k increases. • The Add/Del strategy can efficiently reduce the risk.

Protection vs. randomization k • Node’s absolute and relative protection measures • Absolute measure • Relative measure

Comparison • K-degree generalization (Liu&Terzi SIGMOD08) • Construct a K-degree anonymous graph, in which every node has the same degree as at least K-1 other nodes. • Random Add/Del • Determine the perturbation magnitude k that satisfies identity disclosure < 1/K, and then perturb the graph using k.

Utility features • Largest eigenvalue of the adjacency matrix: λ1 • Second-smallest eigenvalue of the Laplacian matrix: μ2 • Harmonic mean of shortest distance • Modularity (community structure) • Transitivity (clustering coefficient) • Subgraph centrality
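Three of these features can be computed with the standard library alone, as in the rough sketch below. The slide's own formulas did not survive, so the definitions here are assumptions: transitivity as closed triples over connected triples, the harmonic mean of shortest distance as n(n-1) divided by the sum of 1/d(i,j) over ordered pairs (unreachable pairs contribute 0), and λ1 estimated by power iteration on A + I. All function names are hypothetical.

```python
from collections import deque


def adjacency(nodes, edges):
    """Build {node: set(neighbours)} for an undirected graph."""
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def transitivity(adj):
    """Closed triples / connected triples (equals 3 * triangles / triples)."""
    closed = triples = 0
    for v, nbrs in adj.items():
        d = len(nbrs)
        triples += d * (d - 1) // 2
        nb = sorted(nbrs)
        for i, a in enumerate(nb):
            for b in nb[i + 1:]:
                if b in adj[a]:
                    closed += 1  # each triangle is counted once per centre node
    return closed / triples if triples else 0.0


def harmonic_mean_distance(adj):
    """n(n-1) / sum over ordered pairs of 1/d(i, j), with d found by BFS."""
    n = len(adj)
    inv_sum = 0.0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        inv_sum += sum(1.0 / d for t, d in dist.items() if t != s)
    return n * (n - 1) / inv_sum if inv_sum else float("inf")


def largest_adjacency_eigenvalue(adj, iters=500):
    """Estimate lambda_1 by power iteration on A + I.

    The +I shift avoids oscillation on bipartite graphs; it is removed at the end.
    """
    x = {v: 1.0 for v in adj}
    norm = 1.0
    for _ in range(iters):
        y = {v: x[v] + sum(x[w] for w in adj[v]) for v in adj}
        norm = max(y.values())
        x = {v: val / norm for v, val in y.items()}
    return norm - 1.0


# Two small hypothetical graphs: a triangle and a 3-node path.
triangle = adjacency([0, 1, 2], [(0, 1), (1, 2), (0, 2)])
path = adjacency([0, 1, 2], [(0, 1), (1, 2)])

for name, g in (("triangle", triangle), ("path", path)):
    print(name, transitivity(g), harmonic_mean_distance(g),
          largest_adjacency_eigenvalue(g))
```

Running the same functions on a graph before and after perturbation gives a simple per-feature measure of utility loss. For real networks an eigenvalue solver from a numerical library would be the more usual choice than hand-written power iteration.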

Observation • Both Rand Add/Del and K-degree generalization degrade structural properties. • K-degree generalization generally preserves structural features better. • K-degree modifies edges only for the subset of nodes that violate K-degree anonymity, while Rand Add/Del treats all nodes and edges equally during randomization. • We can improve Rand Add/Del by dividing the graph into blocks and applying randomization within each block (next slide). • We expect Rand Add/Del to be more robust to other attacks (ongoing work). • We expect that reconstruction methods can be designed on the purely randomized graph to recover features accurately (ongoing work).

Block Add/Del

Conclusion • Quantify how well Rand Add/Del can protect node identity and link privacy under the vertex degree background knowledge attack • Compare with K-degree generalization scheme in terms of utility preservation Future Work • Other background knowledge attacks • Other randomization schemes • Reconstruction methods on the randomized graph

Thank You! Questions? Acknowledgments This work was supported in part by U.S. National Science Foundation IIS-0546027 and CNS-0831204.
