A Tutorial of Privacy-Preservation of Graphs and Social Networks

Xintao Wu, Xiaowei Ying

University of North Carolina at Charlotte

National Laws
  • USA
    • HIPAA for health care
      • Passed August 21, 96
      • sets the lowest bar; the States are welcome to enact more stringent rules
        • California State Bill 1386
    • Gramm-Leach-Bliley Act of 1999 for financial institutions
    • COPPA for children’s online privacy
    • etc.
  • Canada
    • PIPEDA 2000
      • Personal Information Protection and Electronic Documents Act
      • Effective from Jan 2004
  • European Union (Directive 95/46/EC)
    • Passed by the European Parliament Oct 95 and effective from Oct 98.
    • Provides guidelines for member state legislation
    • Forbids sharing data with states that do not protect privacy
Privacy Breach
  • AOL's publication of the search histories of more than 650,000 of its users has yielded more than just one of the year's bigger privacy scandals. (Aug 6, 2006)

That database does not include names or user identities. Instead, it lists only a unique ID number for each user. AOL user 710794

    • an overweight golfer, owner of a 1986 Porsche 944 and 1998 Cadillac SLS, and a fan of the University of Tennessee Volunteers Men's Basketball team.
    • interested in the Cherokee County School District in Canton, Ga., and has looked up the Suwanee Sports Academy in Suwanee, Ga., which caters to local youth, and the Youth Basketball of America's Georgia affiliate.
    • regularly searches for "lolitas," a term commonly used to describe photographs and videos of minors who are nude or engaged in sexual acts.

Source: AOL's disturbing glimpse into users' lives, by Declan McCullagh, CNET News.com,

August 7, 2006, 8:05 PM PDT

Privacy Preserving Data Mining
  • Data mining
    • The goal of data mining is summary results (e.g., classification, cluster, association rules etc.) from the data (distribution)
  • Individual Privacy
    • Individual values in the database must not be disclosed; at a minimum, attackers must not be able to closely estimate them
    • Contractual limitations: privacy policies, corporate agreements
  • Privacy Preserving Data Mining
    • How to transform data such that
      • we can build a good data mining model (data utility)
      • while preserving privacy at the record level (privacy)?
PPDM on Tabular Data

69% unique on zip and birth date

87% with zip, birth date and gender

Generalization (k-anonymity, L-diversity, t-closeness etc.) and Randomization

Refer to a survey book [Aggarwal, 08]

PPDM Tutorials on Tabular Data
  • Privacy in data systems, Rakesh Agrawal, PODS03
  • Privacy preserving data mining, Chris Clifton, PKDD02, KDD03
  • Models and methods for privacy preserving data publishing and analysis, Johannes Gehrke, ICDM05, ICDE06, KDD06
  • Cryptographic techniques in privacy preserving data mining, Helger Lipmaa, PKDD06
  • Randomization based privacy preserving data mining, Xintao Wu, PKDD06
  • Privacy in data publishing, Johannes Gehrke & Ashwin Machanavajjhala, S&P09
  • Anonymized data: generation, models, usage, Graham Cormode & Divesh Srivastava, SIGMOD09
Social Network

Network of US political books

(105 nodes, 441 edges)

Books about US politics sold by Amazon.com. Edges represent frequent co-purchasing of books by the same buyers. Nodes have been given colors of blue, white, or red to indicate whether they are "liberal", "neutral", or "conservative".

Social Network
  • Network of the political blogs on the 2004 U.S. election (polblogs, 1,222 nodes and 16,714 edges)
Social Network
  • Collaboration network of scientists [Newman, PRE06]
More Social Network Data
  • Newman’s collection
    • http://www-personal.umich.edu/~mejn/netdata/
  • Enron data
    • http://www.cs.cmu.edu/~enron/
  • Stanford large network dataset collection
    • http://snap.stanford.edu/data/index.html
Graph Mining
  • A very hot research area
    • Graph properties such as degree distribution
    • Motif analysis
    • Community partition and outlier detection
    • Information spreading
    • Resiliency/robustness, e.g., against virus propagation
    • Spectral analysis
  • Research development
    • “Managing and mining graph data” by Aggarwal and Wang, Springer 2010.
    • “Large graph-mining: power tools and a practitioner’s guide” by Faloutsos et al. KDD09
Network Science and Privacy

Source: Jeannette Wing, Computing research: a view from DC, SNOWBIRD, 2008

Outline
  • Attacks on Naively Anonymized Graph
  • Privacy Preserving Social Network Publishing
    • K-anonymity
    • Generalization
    • Randomization
    • Other Works
  • Output Perturbation
    • Background on differential privacy
    • Accurate analysis of private network data
Social Network Data Publishing

The data owner releases the (anonymized) network data to the data miner.
Threat of Re-identification
The attacker attacks the released data, and Ada’s sensitive information is disclosed.

  • Privacy breaches
    • Identity disclosure
    • Link disclosure
    • Attribute disclosure
Deriving Personal Identifying Information [Gross WPES05]
  • User profiles (e.g., photo, birth date, residence, interests, friend links) can be used to estimate personal identifying information such as SSN.
  • SSN format ###-##-####: the first three digits (the area number) are determined by the zip code, the middle two are the group number, and the last four are a sequential serial number.
  • Users should pay attention to (default) privacy preference settings of online social networks.

https://secure.ssa.gov/apps10/poms.nsf/lnx/0100201030

Active and Passive Attacks [Backstrom WWW07]
  • Active attack outline
    • Join the network by creating some new user accounts;
    • Establish a highly distinguishable subgraph H among the attacking nodes;
    • Send links to targeted individuals from the attacking nodes;
    • In the released graph, identify the subgraph H among the attacking nodes;
    • The targeted individuals and their links are then identified.
Active and Passive Attacks [Backstrom WWW07]
  • Active attacks & subgraph H

The active attack is based on the subgraph H among the attackers:

    • No other subgraph of G is isomorphic to H;
    • H has no non-trivial automorphism;
    • H can be identified efficiently regardless of G.
Active and Passive Attacks [Backstrom WWW07]
  • Passive attacks outline
    • Observation: most nodes in the network already form a uniquely identifiable subgraph.
    • One adversary recruits k-1 of his neighbors to form the subgraph H of size k.
    • Work similarly to active attacks.

Drawback: Uniqueness of H is not guaranteed.

Attacks by Structural Queries [Hay VLDB08]
  • Structural queries:

A structural query Q represents complete or partial structural information about a targeted individual that may be available to adversaries.

  • Structural queries and identity privacy:
Attacks by Structural Queries [Hay VLDB08]
  • Degree sequence refinement queries
Attacks by Structural Queries [Hay VLDB08]
  • Subgraph queries

The adversary is capable of gathering a fixed number of edges around the targeted individual.

  • Hub fingerprint queries

A hub is a central node in a network. A hub fingerprint of node v is the node's connections to a set of designated hubs within a certain distance.

Attacks by Combining Multiple Graphs [Narayanan ISSP09]
  • Attack outline:
    • The attacker has two types of auxiliary information:
      • Aggregate: an auxiliary graph whose members overlap with the anonymized target graph
      • Individual: the detailed information on a very small number of individuals (called seeds) in both the auxiliary graph and the target graph.
    • Identify seeds in the target graph.
    • Identify more nodes by comparing the neighborhoods of the de-anonymized nodes in the auxiliary graph and the target graph (propagation).
Deriving Link Structure of Entire Network [Korolova ICDE08]
  • A different threat in which
    • An adversary subverts user accounts to get local neighborhoods and pieces them together to build the entire network.
    • No underlying network is released.
  • A registered user often can see all the links and nodes incident to him within distance d from him.
    • d=0 if a user can see who he links to.
    • d=1 if a user can also see who links to all his friends.
  • Analysis showed that the number of local neighborhoods needed to cover a fraction of the entire network drops exponentially as the lookahead parameter d increases.
Outline
  • Attacks on Naively Anonymized Graph
  • Privacy Preserving Social Network Publishing
    • K-anonymity
    • Generalization
    • Randomization
    • Other Works
  • Output Perturbation
    • Background on differential privacy
    • Accurate analysis of private network data
Privacy Preserving Social Network Publishing
  • Naïve anonymization is not sufficient to prevent privacy breaches, mainly due to link structure based attacks.
  • Graph topology has to be modified via
    • Adding/deleting edges/nodes
    • Grouping nodes/edges into super-nodes and super-edges
  • How to quantify utility loss and privacy preservation in the perturbed (and anonymized) graph?
Graph Utility
  • Utility heavily depends on mining tasks.
  • It is challenging to quantify the information loss in the perturbed graph data.
    • Unlike tabular data, we cannot use the sum of the information loss of each individual record.
    • We cannot use histograms to approximate the distribution of graph topology.
  • It is more challenging when considering both structure change and node attribute change.
Graph Utility
  • Topological features:
    • Structural characteristics of the graph.
    • Various measures from different perspectives.
    • Commonly used.
  • Spectral features:
    • Defined as eigenvalues of the graph's adjacency matrix or other derived matrices.
    • Closely related to many topological features.
    • Can provide global graph measures.
  • Aggregate queries:
    • Calculate the aggregate on some paths or subgraphs satisfying the query condition.
    • E.g.: the average distance from a medical doctor vertex to a teacher vertex in a network.
Topological Features
  • Topological features of networks
      • Harmonic mean of shortest distance
      • Transitivity (clustering coefficient)
      • Subgraph centrality
      • Modularity (community structure)
      • And many others (refer to: F. Costa et al., Characterization of Complex Networks: A Survey of measurements, 2006)
Graph and Matrix
  • Adjacency matrix
    • For an undirected graph, A is symmetric;
    • No self-links: the diagonal entries of A are all 0;
    • For an unweighted graph, A is a 0-1 matrix.
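A minimal sketch of these properties (NumPy assumed; the 4-cycle is illustrative):

```python
import numpy as np

def adjacency_matrix(n, edges):
    """Adjacency matrix of a simple undirected, unweighted graph:
    a symmetric 0-1 matrix with an all-zero diagonal (no self-links)."""
    A = np.zeros((n, n), dtype=int)
    for i, j in edges:
        A[i, j] = A[j, i] = 1   # undirected: set both entries
    return A

# A 4-cycle: 0-1-2-3-0
A = adjacency_matrix(4, [(0, 1), (1, 2), (2, 3), (3, 0)])
```

The row sums of A give the node degrees, which is why so many topological features can be read off this matrix.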
Spectral Features
  • Spectral features of networks
    • Adjacency spectrum: the eigenvalues λ_1 ≥ λ_2 ≥ … ≥ λ_n of the adjacency matrix A.
    • Laplacian spectrum: the eigenvalues 0 = μ_1 ≤ μ_2 ≤ … ≤ μ_n of L = D − A, where D is the diagonal degree matrix.
Topological vs. Spectral Features
  • Adjacency and Laplacian spectrum:
    • The maximum degree, chromatic number, clique number, etc. are related to λ_1, the largest adjacency eigenvalue;
    • The epidemic threshold for virus propagation in the network is related to 1/λ_1;
    • The Laplacian spectrum indicates the community structure:
      • k disconnected communities: the k smallest Laplacian eigenvalues equal 0;
      • k loosely connected communities: the k smallest Laplacian eigenvalues are close to 0.
Topological vs. Spectral Features
  • Laplacian spectrum & communities
    • Disconnected communities: k connected components give exactly k zero Laplacian eigenvalues.
    • Loosely connected communities: k communities give k Laplacian eigenvalues close to 0.

Topological vs. Spectral Features
  • Eigenspace [Ying SDM09]
Topological vs. Spectral Features
  • Topological & spectral features are related
    • No. of triangles: equal to (1/6)·Σ_i λ_i³ = trace(A³)/6;
    • Subgraph centrality: SC = (1/n)·Σ_i e^{λ_i};
    • Graph diameter: for a connected graph, the diameter is less than the number of distinct adjacency eigenvalues.
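A quick numerical check of the triangle count via the adjacency spectrum, on a hypothetical 4-node graph:

```python
import numpy as np

# Hypothetical graph: one triangle (0,1,2) plus a pendant edge (2,3).
A = np.zeros((4, 4))
for i, j in [(0, 1), (0, 2), (1, 2), (2, 3)]:
    A[i, j] = A[j, i] = 1.0

lam = np.linalg.eigvalsh(A)                  # adjacency spectrum
triangles_spectral = (lam ** 3).sum() / 6    # (1/6) * sum of lambda_i^3
triangles_trace = np.trace(A @ A @ A) / 6    # trace(A^3)/6 counts triangles
```

Both quantities agree because trace(A³) equals the sum of the cubed eigenvalues, and each triangle contributes six closed walks of length 3.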
Outline
  • Attacks on Naively Anonymized Graph
  • Privacy Preserving Social Network Publishing
    • K-anonymity
    • Generalization
    • Randomization
    • Other Works
  • Output Perturbation
    • Background on differential privacy
    • Accurate analysis of private network data
K-anonymity Privacy Preservation
  • K-anonymity (Sweeney)
    • Each individual is identical with at least K-1 other individuals
  • A general definition for network data [Hay, VLDB08]
K-anonymity
  • Each node is identical with at least K-1 other nodes under topology-based attacks.
  • The adversary is assumed to have some knowledge of the target user:
    • node degree (K-degree)
    • (immediate) neighborhood (K-neighborhood)
    • arbitrary subgraph (K-automorphism etc.)
  • K-anonymity approach guarantees that no node in the released graph can be linked to a target individual with success prob. greater than 1/K.
K-degree Anonymity [Liu SIGMOD08]
  • Attacking model:

The attackers know the degree of the targeted individual

K-degree Anonymity [Liu SIGMOD08]
  • K-degree anonymous: every node has the same degree as at least K-1 other nodes.
  • Optimize utility: minimize the number of added edges.
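The degree-sequence step can be sketched with a simplified greedy grouping (Liu and Terzi use an optimal dynamic program; the grouping below is a plausible approximation, not their algorithm):

```python
def anonymize_degree_sequence(degrees, k):
    """Return a k-anonymous degree sequence obtained by only increasing
    degrees: sort descending, cut into consecutive groups of size >= k,
    and raise every degree in a group to the group's maximum."""
    d = sorted(degrees, reverse=True)
    n, out, i = len(d), [], 0
    while i < n:
        # Take k degrees; absorb the tail if fewer than k would remain.
        j = i + k if n - (i + k) >= k else n
        out.extend([d[i]] * (j - i))   # d[i] is the group maximum
        i = j
    return out
```

For [5, 4, 3, 2, 2, 1] and k = 2 this yields [5, 5, 3, 3, 2, 2], in which every degree value occurs at least twice; the total degree increase lower-bounds the number of edge additions needed.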
K-neighborhood Anonymity [Zhou ICDE08]
  • Attacking model

The attackers know the immediate neighborhood of a targeted individual.

K-neighborhood Anonymity [Zhou ICDE08]
  • Algorithm outline
    • Extract the neighborhoods of all vertices in the network.
    • Compare and test all neighborhoods by neighborhood component coding
    • Organize vertices into groups and anonymize the neighborhoods of vertices in the same group until the graph satisfies K-neighborhood anonymity.

(Figure: the 1-neighborhood of Ada, the naively anonymized graph, and its K-neighborhood anonymous version.)

K-neighborhood Anonymity [Zhou ICDE08]
  • Graph utility:
      • The nodes have hierarchical label information.
      • Two ways to anonymize the neighborhoods: generalizing labels and adding edges.
      • Answer aggregate network queries as accurately as possible.
K-automorphism Anonymity [Zou VLDB09]
  • Attacking model:

The attackers can know any subgraph that contains the targeted individual.

    • Graph automorphism
K-automorphism Anonymity [Zou VLDB09]
  • Algorithm outline
    • Partition graph G into several groups of subgraphs; each group contains at least K subgraphs, and no two subgraphs share a node.
    • Block Alignment: make subgraphs within each group isomorphic to each other.
    • Edge Copy: copy the edges across the subgraphs properly.
K-symmetry Model [Wu EDBT10]
  • Attacking model:

The attackers can know any subgraph that contains the targeted individual.

    • K-symmetry approach:
    • A concept similar to K-automorphism (equivalent?)
    • Make the graph K-symmetric by adding fake nodes
K-isomorphism Model [Cheng SIGMOD10]
  • Attacking model:

The attackers can know any subgraph that contains the targeted individual.

  • Insufficient protection on link privacy by K-automorphism approach

Example: the adversary cannot identify Alice or Bob, but there must be a link between them.

K-anonymity

Privacy

protection

K-security

K-automorphism

K-symmetry

K-neighborhood

K-degree

Utility preservation

K-obfuscation [Bonchi ICDE11]

Both cases respect 2-candidate anonymity.

  • K-candidate aims at guaranteeing a lower bound on the amount of uncertainty.
  • K-obfuscation measures the uncertainty.
  • The obfuscation level quantified by entropy is always at least the level computed from a-posteriori belief probabilities.
Outline
  • Attacks on Naively Anonymized Graph
  • Privacy Preserving Social Network Publishing
    • K-anonymity
    • Generalization
    • Randomization
    • Other Works
  • Output Perturbation
    • Background on differential privacy
    • Accurate analysis of private network data
Generalization Approach [Hay VLDB08]
  • Generalize nodes into super nodes and edges into super edges
Generalization Approach [Hay VLDB08]
  • The size of possible graph world:
  • Maximize the graph likelihood function

Simulated annealing algorithm

    • Start with a single partition containing all nodes;
    • Update the state by splitting/merging partitions or moving a node to another partition.
Anonymizing Rich Graphs [Bhagat VLDB09]
  • Hyper-graph

G(V,I,E) represents multiple types of interactions between entities

  • Attacking model

The attackers know part of the links and nodes in the graph

Anonymizing Rich Graphs [Bhagat VLDB09]
  • Algorithm outline
    • Sort the nodes according to the attributes;
    • Group nodes into super nodes satisfying the class safety property, with each super node of size at least K.

Class safety: each node cannot have interactions with two or more nodes from the same group

    • Replace the node identifiers by the label list.
Outline
  • Attacks on Naively Anonymized Graph
  • Privacy Preserving Social Network Publishing
    • K-anonymity
    • Generalization
    • Randomization
    • Other Works
  • Output Perturbation
    • Background on differential privacy
    • Accurate analysis of private network data
Basic Graph Randomization Operations
  • Rand Add/Del: randomly add k false edges and delete k true edges (no. of edges unchanged)
  • Rand Switch: randomly switch a pair of edges, and repeat it for k times (nodes’ degree unchanged)
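Minimal sketches of the two operations on an undirected edge set (the function names are hypothetical):

```python
import random

def rand_add_del(edges, n, k, seed=None):
    """Rand Add/Del: delete k true edges, add k false edges; |E| unchanged."""
    rng = random.Random(seed)
    original = {tuple(sorted(e)) for e in edges}
    kept = set(rng.sample(sorted(original), len(original) - k))  # delete k
    while len(kept) < len(original):                             # add k false edges
        i, j = rng.sample(range(n), 2)
        e = (min(i, j), max(i, j))
        if e not in original and e not in kept:
            kept.add(e)
    return kept

def rand_switch(edges, k, seed=None):
    """Rand Switch: repeat k times -- pick edges (t,w),(u,v) and rewire to
    (t,v),(u,w) when both are absent; every node's degree is preserved."""
    rng = random.Random(seed)
    es = {tuple(sorted(e)) for e in edges}
    for _ in range(k):
        (t, w), (u, v) = rng.sample(sorted(es), 2)
        e1, e2 = tuple(sorted((t, v))), tuple(sorted((u, w)))
        if len({t, w, u, v}) == 4 and e1 not in es and e2 not in es:
            es -= {(t, w), (u, v)}
            es |= {e1, e2}
    return es
```

Rand Switch skips a proposed switch whenever it would create a duplicate edge or a self-link, which is exactly what keeps the degree sequence intact.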
Randomization
  • Randomized response model [Warner 1965]
    • A: the set of respondents who cheated in the exam; Ā: those who didn’t.
    • Purpose: estimate the proportion π of population members who cheated in the exam.
  • Procedure:
    • A randomization device asks “Do you belong to A?” with probability p and “Do you belong to Ā?” with probability 1-p; only the respondent sees which question was chosen, and answers “Yes” or “No” truthfully.
    • As: P(Yes) = p·π + (1-p)·(1-π)
    • Unbiased estimate: π̂ = (P̂(Yes) - (1-p)) / (2p-1), for p ≠ 1/2.
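The device and estimator can be simulated as follows (the population proportion π = 0.3 and p = 0.75 are hypothetical):

```python
import random

def randomized_response(is_member, p, rng):
    """Warner's device: with prob. p ask 'Do you belong to A?', with
    prob. 1-p ask 'Do you belong to the complement of A?'; only the
    respondent sees which question was chosen. Returns True for 'Yes'."""
    asked_direct = rng.random() < p
    return is_member if asked_direct else not is_member

def estimate_pi(answers, p):
    """P(Yes) = p*pi + (1-p)*(1-pi), so the unbiased estimate is
    pi_hat = (P_hat(Yes) - (1-p)) / (2p - 1), valid for p != 1/2."""
    p_yes = sum(answers) / len(answers)
    return (p_yes - (1 - p)) / (2 * p - 1)

rng = random.Random(0)
true_pi, p, n = 0.3, 0.75, 20000
answers = [randomized_response(rng.random() < true_pi, p, rng)
           for _ in range(n)]
pi_hat = estimate_pi(answers, p)  # close to 0.3 for large n
```

No individual answer reveals the respondent's true status, yet the aggregate estimator is unbiased.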

Randomization [Agrawal SIGMOD00]

  • Each user randomizes sensitive values locally before release, e.g., by adding a random number to Age: Alice’s age 30 becomes 65 (30+35), so “30 | 70K | ...” is released as “65 | 20K | ...” and “50 | 40K | ...” as “25 | 60K | ...”.
  • The miner reconstructs the distribution of Age, the distribution of Salary, and so on from the randomized values, and feeds the reconstructed distributions to a classification algorithm to build a model.

Reconstruction

[Agrawal SIGMOD00]

  • Given
    • x1+y1, x2+y2, ..., xn+yn, where the xi are the original values
    • the probability distribution of the noise Y
  • Estimate the probability distribution of X.
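A discretized sketch of this iterative Bayesian reconstruction (the age data and the uniform noise below are hypothetical; the update rule follows Agrawal and Srikant's formula):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: ages clustered at 25/45/65, noise uniform on [-30, 30].
n = 5000
x = rng.choice([25, 45, 65], size=n, p=[0.5, 0.3, 0.2]) + rng.normal(0, 3, n)
half_width = 30.0
w = x + rng.uniform(-half_width, half_width, n)   # released values x + y

# Discretize the support of X into unit-width bins on [0, 100].
bins = np.linspace(0, 100, 101)
mids = (bins[:-1] + bins[1:]) / 2
width = bins[1] - bins[0]

def f_noise(z):
    """Density of the uniform noise Y on [-30, 30]."""
    return np.where(np.abs(z) <= half_width, 1 / (2 * half_width), 0.0)

# Iterative update: f^{j+1}(a) = (1/n) * sum_i f_Y(w_i - a) f^j(a)
#                                / sum_z f_Y(w_i - z) f^j(z) dz
fx = np.full(len(mids), 1.0 / (len(mids) * width))   # uniform starting prior
K = f_noise(w[:, None] - mids[None, :])              # n x bins kernel matrix
for _ in range(30):
    denom = (K * fx * width).sum(axis=1, keepdims=True)  # per-record normalizer
    denom[denom == 0] = 1e-12
    fx = (K / denom).mean(axis=0) * fx
    fx /= (fx * width).sum()                             # renormalize density

# fx now approximates the density of X on the bin centers `mids`.
```

Despite the heavy uniform noise, the reconstructed density recovers the cluster structure of X, which is what makes aggregate mining on randomized data possible.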
Randomization on Graph
  • Link privacy
    • the probability that a link (i,j) exists, given the perturbed graph
  • Feature preservation randomization
    • Spectrum preserving randomization
    • Markov chain based feature preserving randomization
  • Reconstruction from randomized graph
Link Privacy: Posterior Beliefs [Ying PAKDD09]
  • Prior probability:
  • Posterior probabilities
Link Privacy: Posterior Beliefs [Ying PAKDD09]
  • Posterior probability and similarity measures

Similarity and proportion of existing edges – before randomization

Similarity and proportion of existing edges – after randomization

Link Privacy: Posterior Beliefs [Ying PAKDD09]
  • Posterior probability and similarity measures
  • How to calculate posterior probability for general cases?

(Figure: the prior probability and the two posterior probabilities.)

Link Privacy: Graph Space [Ying SDM09]
  • Exploit graph space to breach link privacy
Link Privacy: Graph Space [Ying SDM09]
  • Sample the graph space when the space is large
        • Start with the randomized graph, construct a Markov chain, and uniformly sample the graph space.
        • Generate N uniform graph samples

Empirical evaluations show that node pairs with highest probabilities have serious link disclosure risk (as high as 90%).

Graph Features Under Pure Randomization
  • Topological and spectral features change significantly along the randomization.

Can we better preserve the network structure?

(Network of US political books, 105 nodes and 441 edges)

Spectrum Preserving Randomization [Ying SDM08]
  • Graph spectrum is related to many real graph features.
  • Preserve graph features by preserving some eigenvalues.
Spectrum Preserving Randomization [Ying SDM08]
  • Spectral Switch (apply to adjacency matrix):

Up-switch to increase the eigenvalue:

Down-switch to decrease the eigenvalue:

Spectrum Preserving Randomization [Ying SDM08]
  • Spectral Switch (apply to Laplacian matrix):

Down-switch to decrease the eigenvalue:

Up-switch to increase the eigenvalue:

Markov Chain Based Feature Preserving Randomization [Ying SDM09]
  • Preserve any graph feature S(G) within a small range
  • Feature range constraint specified by the user
  • Markov chain with feature range constraint

(uniformity on accessible graphs)

Markov Chain Based Feature Preserving Randomization [Ying SDM09]
  • Feature constraint can be used to breach link privacy

(Figure: the original graph and the released graph.)

Reconstruction from Randomized Graph [Wu SDM10]
  • Motivation

From the randomized graph, can we reconstruct a graph whose features are closer to the true features?

Reconstruction from Randomized Graph [Wu SDM10]
  • Low rank approximation approach
    • Best rank r approximation by eigen-decomposition:
    • Discretize the low rank matrix
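A sketch of both steps, with one plausible discretization rule (keep the m largest reconstructed entries, m = original edge count; the paper's exact rule may differ):

```python
import numpy as np

def low_rank_reconstruct(A, r):
    """Rank-r eigen-approximation of adjacency matrix A, discretized back
    to a symmetric 0-1 matrix by keeping the m largest reconstructed
    upper-triangle entries, where m is the original edge count."""
    vals, vecs = np.linalg.eigh(A)
    idx = np.argsort(np.abs(vals))[::-1][:r]        # r largest |eigenvalues|
    Ar = (vecs[:, idx] * vals[idx]) @ vecs[:, idx].T
    m = int(A.sum()) // 2                           # original number of edges
    iu = np.triu_indices_from(A, k=1)
    top = np.argsort(Ar[iu])[::-1][:m]              # top-m candidate edges
    B = np.zeros_like(A)
    B[iu[0][top], iu[1][top]] = 1
    return B + B.T

# Two disjoint triangles: a rank-2 structure, recovered exactly at r = 2.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    A[i, j] = A[j, i] = 1.0
B = low_rank_reconstruct(A, 2)
```

Keeping eigenvalues largest in magnitude (not just the positive ones) is what allows significant negative eigenvalues to contribute, as discussed on the following slides.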
Reconstruction from Randomized Graph [Wu SDM10]
  • Effect of including significant negative eigenvalues

(Figure: the original graph and its reconstructions for r = 1, 2, 4.)

Reconstruction from Randomized Graph [Wu SDM10]
  • Feature value of the original graph, randomized graph, and the reconstructed graph


Reconstruction from Randomized Graph [Wu SDM10]
  • Reconstructed graphs do not jeopardize link privacy for real-world networks.

-- Privacy measured by the proportion of different edges:


Reconstruction from Randomized Graph [Wu SDM10]
  • Graphs with low rank may have privacy breached by reconstruction:
  • Reconstruction on synthetic low rank graphs


Reconstructing Randomized Social Networks & Features [Vuokko SDM10]
  • Graph and feature data
    • Original Graph
    • Binary feature matrix
    • Two individuals with higher similarity in features are more likely to be connected in the graph.
  • Reconstruction problem
    • The maximum likelihood estimation is adopted to reconstruct the original graph and features.


Random Sparsification [Bonchi ICDE11]
  • Only removes edges from the graph, without adding new edges.
  • Outperforms Rand Add/Del in terms of utility preservation, partly due to the small-world phenomenon:
    • adding random long-haul edges brings nodes close together,
    • while removing an edge does not push nodes much farther apart, since alternative paths exist.
  • Studies the utility vs. privacy trade-off for various randomization strategies.
Outline
  • Attacks on Naively Anonymized Graph
  • Privacy Preserving Social Network Publishing
    • K-anonymity
    • Generalization
    • Randomization
    • Other Works
  • Output Perturbation
    • Background on differential privacy
    • Accurate analysis of private network data
Edge-weighted Graph
  • Edge weights could be sensitive, e.g., trustworthiness of user A according to user B, or transaction amount between two accounts.
Anonymizing Edge-weighted Graph [Das TR09]
  • Some properties of edge weights in terms of some functions are preserved.
    • Relative distances between nodes for shortest paths or kNN queries.
  • A framework for edge weight anonymization of graph data that preserves linear properties.
    • A linear property can be expressed by a specific set of linear inequalities of edge weights.
    • Finding new weights for each edge is a linear programming problem.
Gaussian Randomization [Liu SDM09]
  • Perturb edge weights while preserving global and local utilities. Graph structure is unchanged.
  • Gaussian randomization multiplication
    • The original weight of each edge is multiplied by a random Gaussian noise with mean 1 and some variance.
    • In the original graph, if the shortest distance d(A,B) is much smaller than d(C,D), the order is highly likely to be preserved.
  • Greedy perturbation
    • Preserve a set of shortest distances.
Anonymizing Multi-graphs [Li SDM11]
  • How to generate an anonymized collection of graphs where each graph corresponds to an individual’s behavior.
    • XML representation of attributes about an individual
    • Click-graph in a user-session
    • Route for a given individual in a time-period
  • Condensation based approach
    • create constrained clusters of size at least K
    • construct a super-template to represent properties of the group
    • generate anonymized graphs from super-template
Computing Privacy Scores [Liu ICDM09]
  • The privacy score measures the user’s potential privacy risk due to her online information sharing behaviors. It increases with
    • sensitivity of the information being shared
    • visibility of the revealed information in the network
Computing Privacy Scores [Liu ICDM09]
  • Item Response Theory based model
    • Used to measure the abilities of examinees, the difficulty of the questions, and the prob. of an examinee to answer a question correctly.
    • Each examinee is mapped to a user, and each question is mapped to a profile item. The difficulty parameter is to quantify the sensitivity of a profile item.
    • The true visibility is estimated from observed profiles.
Outline
  • Attacks on Naively Anonymized Graph
  • Privacy Preserving Social Network Publishing
    • K-anonymity
    • Generalization
    • Randomization
    • Other Works
  • Output Perturbation
    • Background on differential privacy
    • Accurate analysis of private network data
Output Perturbation

The data miner submits a query f to the data owner, who returns the query result plus noise.

Differential Guarantee [Dwork, TCC06]

f = count(#cancer)

On database x the curator K releases f(x) + noise = 3 + noise; on database x' it releases f(x') + noise = 2 + noise. The two databases (x, x') differ in only one row.

Differential Guarantee
  • Require that the prob. distribution is essentially the same independent of whether any individual opts in to, or opts out of the database.
  • Anything that can be learned about a respondent from a statistical database should be learnable without access to the database.
  • Independent of adversary knowledge.
  • Different from prior work on comparing an adversary's prior and posterior views of an individual.
Outline
  • Attacks on Naively Anonymized Graph
  • Privacy Preserving Social Network Publishing
    • K-anonymity
    • Generalization
    • Randomization
    • Other Works
  • Output Perturbation
    • Background on differential privacy
    • Accurate analysis of private network data
Differential Privacy

[Dwork TCC06 & Dwork CACM11]

  • ε-differential privacy: a randomized mechanism K gives ε-differential privacy if, for all neighboring datasets x and x' and all output sets S, Pr[K(x) ∈ S] ≤ e^ε · Pr[K(x') ∈ S].

ε is a privacy parameter: smaller ε = stronger privacy

  • Two neighboring datasets are defined in terms of
    • Hamming distance: x' is obtained from x by modifying one row;
    • Symmetric difference: |(x − x') ∪ (x' − x)| = 1, i.e., one row is added or removed.
Calibrating Noise
  • Laplace distribution: density p(z) ∝ exp(−|z|/b) with scale parameter b.
  • Sensitivity of a function f:
    • global sensitivity: the maximum of ||f(x) − f(x')||_1 over all neighboring datasets x, x'
    • local sensitivity: the same maximum with x fixed at the actual input
  • Releasing f(x) + Lap(Δf/ε), with Δf the global sensitivity, gives ε-differential privacy; for multiple queries the privacy parameters add up (sequential composition).
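A minimal sketch of the resulting Laplace mechanism (the count value 3 and the ε are illustrative):

```python
import numpy as np

def laplace_mechanism(true_answer, sensitivity, epsilon, rng):
    """Release f(x) + Lap(sensitivity/epsilon); this satisfies
    epsilon-differential privacy when `sensitivity` is the query's
    global sensitivity."""
    return true_answer + rng.laplace(0.0, sensitivity / epsilon)

rng = np.random.default_rng(0)
# A count query has global sensitivity 1: one row changes the count by <= 1.
noisy_count = laplace_mechanism(3, sensitivity=1, epsilon=0.5, rng=rng)
```

The noise is unbiased, so repeated releases of the same query average back to the true count; this is why ε must be budgeted across all queries combined.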
Sensitivity
  • L-1 distance for vector output
  • Complex functions or data mining tasks can be decomposed to a sequence of simple functions.
Differential Guarantee

f = count(#cancer)

The curator K releases f(x) + Lap(1/ε). (Figure: the difference in output probabilities for ε = 2 vs. ε = 1.)
Neighboring Datasets
  • Two neighboring datasets are defined in terms of
    • Hamming distance: one row is modified;
    • Symmetric difference: |(x − x') ∪ (x' − x)| = 1, one row added or removed.
  • How about two datasets differing by k rows?
Histogram Query

SELECT count(*)
FROM table
GROUP BY disease

For the groups (cancer, heart, flu) the true answer is [3, 2, 1]; the released answer is [3 + Lap(1/ε), 2 + Lap(1/ε), 1 + Lap(1/ε)].

Recent Development
  • [Xiao ICDE10] Dependencies among the queries can be exploited to improve the accuracy of responses.
  • [Li PODS10] Matrix mechanism for answering a workload of predicate counting queries
  • [Kifer SIGMOD11] Misconceptions of differential privacy.
Outline
  • Attacks on Naively Anonymized Graph
  • Privacy Preserving Social Network Publishing
    • K-anonymity
    • Generalization
    • Randomization
    • Other Works
  • Output Perturbation
    • Background on differential privacy
    • Accurate analysis of private network data
Private Query Answering on Networks [Hay ICDM09]
  • Two neighboring graphs can be defined to differ by a single edge, K edges, or a single node.
  • edge -differential privacy
    • query output is indistinguishable whether any single edge is present or absent.
  • K-edge -differential privacy
    • query output is indistinguishable whether any set of k edges is present or absent.
  • node -differential privacy
    • query output is indistinguishable whether any single node (and all its edges) is present or absent.

Lap(f/)

Lap(f K/)

Degree Sequence
  • The list of degrees of each node in a graph
  • The degree sequence of a network may be sensitive: combined with other graph statistics, it can be used to determine the graph structure
Two Equivalent Queries

Degree sequence D(G)=[1,1,3,3,3,3,2]

D(G’)=[1,1,3,3,2,2,2]

ΔD = 2; add Lap(2/ε) to each component

F: number of nodes with degree i, for i = 0, …, n−1

F(G)=[0,2,1,4,0,0,0]

F(G’)=[0,2,3,2,0,0,0]

ΔF = 4; add Lap(4/ε) to each component
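Both sensitivities can be verified directly on the slide's example, where G' is G with one edge removed (a sketch; the brute-force helpers below are mine):

```python
from collections import Counter

def degree_histogram(degrees, n):
    """F[i] = number of nodes with degree i, for i = 0, ..., n-1."""
    c = Counter(degrees)
    return [c.get(i, 0) for i in range(n)]

def l1(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

D_G  = [1, 1, 3, 3, 3, 3, 2]   # degree sequence of G
D_Gp = [1, 1, 3, 3, 2, 2, 2]   # G': one edge removed, two degrees drop by 1

F_G, F_Gp = degree_histogram(D_G, 7), degree_histogram(D_Gp, 7)

print(l1(D_G, D_Gp))   # 2: one edge touches exactly two endpoints
print(F_G, F_Gp)       # [0, 2, 1, 4, 0, 0, 0] vs [0, 2, 3, 2, 0, 0, 0]
print(l1(F_G, F_Gp))   # 4: each endpoint moves between two histogram buckets
```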

Boosting Accuracy [Hay ICDM09]

  • Rewrite query D to get a query S with constraint set Cs
  • Submit S to the private mechanism K
  • Receive the perturbed answer A(S)
  • Perform inference on A(S) with constraints Cs to derive a better estimation

Formulating Query D

Degree sequence D(G)=[1,1,3,3,3,3,2]

D(G’)=[1,1,3,3,2,2,2]

ΔD = 2; add Lap(2/ε) to each component

S: return the i-th smallest degree (the degree sequence in sorted order)

S(G)=[1,1,2,3,3,3,3]

Perturbed answer could be [3, 2, …], i.e., each component plus Lap(2/ε) noise

A new (and more accurate) sequence can be derived by computing the closest non-decreasing sequence.
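The closest non-decreasing sequence (in least squares) can be computed with the pool-adjacent-violators algorithm. A sketch; the noisy input values below are made up for illustration:

```python
def closest_nondecreasing(seq):
    """Pool-adjacent-violators: L2-closest non-decreasing sequence."""
    blocks = []  # each block is (sum of values, count); block means must be sorted
    for v in seq:
        total, count = float(v), 1
        # Merge backwards while the previous block's mean exceeds ours.
        while blocks and blocks[-1][0] / blocks[-1][1] > total / count:
            t, c = blocks.pop()
            total += t
            count += c
        blocks.append((total, count))
    out = []
    for total, count in blocks:
        out.extend([total / count] * count)
    return out

# Noisy answer to the sorted-degree query S (true S(G) = [1,1,2,3,3,3,3]):
noisy = [1.4, 0.6, 2.3, 3.5, 2.8, 3.1, 3.3]
print(closest_nondecreasing(noisy))
```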

Accurate Motif Analysis
  • Measures the frequency of occurrence of small subgraphs in a network, e.g., # of triangles

[Figure] Two graphs that differ in a single edge: one has n−2 triangles, the other has 0. One edge can change the triangle count by n−2: high sensitivity!
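This worst case can be checked by brute force. The two-hub construction below is my own assumption matching the slide's numbers: nodes 0 and 1 are each connected to all n−2 other nodes, and the single edge (0, 1) decides whether the graph has 0 or n−2 triangles.

```python
from itertools import combinations

def triangles(n, edges):
    """Count triangles by brute force over all node triples."""
    adj = {i: set() for i in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return sum(1 for a, b, c in combinations(range(n), 3)
               if b in adj[a] and c in adj[a] and c in adj[b])

n = 8
# Hubs 0 and 1 are each linked to every node 2..n-1, but not to each other.
base = [(0, k) for k in range(2, n)] + [(1, k) for k in range(2, n)]

print(triangles(n, base))             # 0: no triple is fully connected
print(triangles(n, base + [(0, 1)]))  # 6 = n-2: one edge creates n-2 triangles
```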

Weakening Privacy
  • Statistics such as transitivity, clustering coefficient, centrality, and path-lengths have high sensitivity values.
  • Possible techniques
    • [Nissim STOC07] Smooth sensitivity, adopting local sensitivity, i.e., the max. change between Q(I) and Q(I’) for any I’ in neighbor(I).
    • [Rastogi PODS09] Adversarial privacy, limiting assumptions about the prior knowledge of the adversary
  • More exploration is needed on robust statistics and differential privacy.
Model based Data Publishing

Pipeline from data owner to data miner:

  • Build models (e.g., contingency table, power-law graph)
  • Release differentially private model parameters (through the mechanism K)
  • Generate synthetic data using the models with perturbed parameters
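A minimal end-to-end sketch of this pipeline, assuming the model is a one-attribute contingency table (the disease counts from the earlier histogram slide) and synthetic rows are sampled from the perturbed, renormalized counts:

```python
import math
import random

def laplace_noise(scale):
    """Sample from Laplace(0, scale) via inverse-CDF sampling."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def private_model_release(counts, epsilon):
    # Step 2: release differentially private model parameters.
    # Histogram counts have sensitivity 1 under add/remove-one-row neighbors.
    noisy = [max(c + laplace_noise(1.0 / epsilon), 0.0) for c in counts]
    total = sum(noisy)
    if total == 0.0:                       # degenerate case: fall back to uniform
        return [1.0 / len(counts)] * len(counts)
    return [x / total for x in noisy]      # renormalize into a distribution

def synthesize(probs, n_rows, categories):
    # Step 3: generate synthetic rows from the perturbed model.
    return random.choices(categories, weights=probs, k=n_rows)

# Step 1 (data owner): fit the model, e.g. disease counts [cancer, heart, flu].
probs = private_model_release([3, 2, 1], epsilon=1.0)
synthetic = synthesize(probs, n_rows=100, categories=["cancer", "heart", "flu"])
print(probs)
```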

Outline
  • Attacks on Naively Anonymized Graph
  • Privacy Preserving Social Network Publishing
    • K-anonymity
    • Generalization
    • Randomization
    • Other Works
  • Output Perturbation
    • Background on differential privacy
    • Accurate analysis of private network data
References
  • [Agrawal, SIGMOD00] R. Agrawal and R. Srikant. Privacy-preserving data mining. SIGMOD, 2000.
  • [Aggarwal, 08] C. C. Aggarwal and P. S. Yu. Privacy-preserving data mining: models and algorithms. Springer, 2008.
  • [Backstrom, WWW07] L. Backstrom, C. Dwork, and J. Kleinberg. Wherefore art thou R3579X? Anonymized social networks, hidden patterns and structural steganography. WWW, 2007.
  • [Bhagat, VLDB09] S. Bhagat, G. Cormode, B. Krishnamurthy, and D. Srivastava. Class-based graph anonymization for social network data. VLDB, 2009.
  • [Bonchi, ICDE11] F. Bonchi, A. Gionis, and T. Tassa. Identity Obfuscation in Graphs Through the Information Theoretic Lens. ICDE, 2011.
  • [Campan, PinKDD08] A. Campan and T. M. Truta. A clustering approach for data and structural anonymity in social network data. PinKDD, 2008.
  • [Cheng, SIGMOD10] J. Cheng, A. Fu, and J. Liu. K-isomorphism: privacy preserving network publication against structural attacks. SIGMOD, 2010.
  • [Cormode, VLDB08] G. Cormode, D. Srivastava, T. Yu, and Q. Zhang. Anonymizing bipartite graph data using safe groupings. VLDB, 2008.
References
  • [Das, TR09] S. Das, O. Egecioglu, and A. El Abbadi. Anonymizing edge-weighted social network graphs. Technical report, 2009.
  • [Dwork, CACM11] C. Dwork. A firm foundation for private data analysis. CACM, 2011.
  • [Dwork, TCC06] C. Dwork, F. McSherry, K. Nissim, and A. Smith. Calibrating noise to sensitivity in private data analysis. TCC, 2006.
  • [Gross, WPES05] R. Gross and A. Acquisti. Information revelation and privacy in online social networks (the Facebook case). WPES, 2005.
  • [Hanhijarvi, SDM09] S. Hanhijarvi, G. C. Garriga, and K. Puolamaki. Randomization techniques for graphs. SDM, 2009.
  • [Hay, VLDB08] M. Hay, G. Miklau, D. Jensen, D. Towsley, and P. Weis. Resisting structural re-identification in anonymized social networks. VLDB, 2008.
  • [Hay, 07] M. Hay, G. Miklau, D. Jensen, P. Weis, and S. Srivastava. Anonymizing social networks. 2007.
References
  • [Hay, ICDM09] M. Hay, C. Li, G. Miklau, and D. Jensen. Accurate estimation of the degree distribution of private networks. ICDM, 2009.
  • [Kifer, SIGMOD11] D. Kifer and A. Machanavajjhala. No Free Lunch in Data Privacy. SIGMOD, 2011.
  • [Korolova, ICDE08] A. Korolova, R. Motwani, S. Nabar, and Y. Xu. Link privacy in social networks. ICDE, 2008.
  • [Li, PODS10] C. Li, M. Hay, V. Rastogi, G. Miklau, A. McGregor. Optimizing linear counting queries under differential privacy. PODS, 2010.
  • [Liu, SIGMOD08] K. Liu and E. Terzi. Towards identity anonymization on graphs. SIGMOD, 2008.
  • [Liu, ICDM09] K. Liu and E. Terzi. A framework for computing the privacy scores of users in online social networks. ICDM, 2009.
  • [Liu, SDM09] L. Liu, J. Wang, J. Liu, and J. Zhang. Privacy preserving in social networks against sensitive edge disclosure. SDM, 2009.
  • [McSherry, FOCS07] F. McSherry and K. Talwar. Mechanism design via differential privacy. FOCS, 2007.
References
  • [Narayanan, 09] A. Narayanan and V. Shmatikov. De-anonymizing social networks. IEEE Symposium on Security and Privacy, 2009.
  • [Newman, PRE06] M. Newman. Finding community structure in networks using the eigenvectors of matrices. Physical Review E, 2006.
  • [Vuokko, SDM10] N. Vuokko and E. Terzi. Reconstructing randomized social networks. SDM, 2010.
  • [Wu, SDM10] L. Wu, X. Ying, and X. Wu. Reconstruction of randomized graph via low rank approximation. SDM, 2010.
  • [Wu, EDBT11] W. Wu, Y. Xiao, W. Wang, Z. He, and Z. Wang. k-symmetry model for identity anonymization in social networks. EDBT, 2011.
  • [Wu, 09] X. Wu, X. Ying, K. Liu, and L. Chen. A Survey of Algorithms for Privacy-Preservation of Graphs and Social Networks. 2009.
  • [Ying, SDM08] X. Ying and X. Wu. Randomizing social networks: a spectrum preserving approach. SDM, 2008.
  • [Ying, SDM09] X. Ying and X. Wu. Graph generation with prescribed feature constraints. SDM, 2009.
References
  • [Ying, SDM09-2] X. Ying and X. Wu. On randomness measures for social networks. SDM, 2009.
  • [Ying, PAKDD09] X. Ying and X. Wu. On link privacy in randomizing social networks. PAKDD, 2009.
  • [Xiao, ICDE10] X. Xiao, G. Wang, and J. Gehrke. Differential privacy via wavelet transformation. ICDE, 2010.
  • [Zheleva, PinKDD07] E. Zheleva and L. Getoor. Preserving the privacy of sensitive relationships in graph data. PinKDD, 2007.
  • [Zhou, ICDE08] B. Zhou and J. Pei. Preserving privacy in social networks against neighborhood attacks. ICDE, 2008.
  • [Zou, VLDB09] L. Zou, L. Chen, and M. T. Ozsu. K-automorphism: A general framework for privacy preserving network publication. VLDB, 2009.

Thank You!

Questions?

Acknowledgments

This work was supported in part by U.S. National Science Foundation grants IIS-0546027, CNS-0831204, and CCF-1047621.

Update version: http://dpl.sis.uncc.edu/ppsn-tut.PDF