Graph Generation with Prescribed Feature Constraints

Xiaowei Ying and Xintao Wu, Univ. of North Carolina at Charlotte. 2009 SIAM Conference on Data Mining, May 1, Sparks, Nevada.


Presentation Transcript


  1. Graph Generation with Prescribed Feature Constraints. Xiaowei Ying, Xintao Wu. Univ. of North Carolina at Charlotte. 2009 SIAM Conference on Data Mining, May 1, Sparks, Nevada

  2. Motivation. Publishing social networks: Privacy vs. Utility • Privacy issue: anonymization is not enough. Active/passive attacks [Backstrom et al., WWW07]; subgraph attacks [M. Hay et al., VLDB08] • K-anonymity in social networks [B. Zhou et al., ICDE08] [K. Liu et al., SIGMOD08] • Randomization approach: local topology is changed, which reduces re-identification risk; links are randomized, so link privacy is protected

  3. Motivation. Publishing social networks: Privacy vs. Utility • Randomization approach -- Pure randomization cannot preserve many topological features [Ying SDM08], such as -- the largest eigenvalue of the adjacency matrix -- the second smallest eigenvalue of the Laplacian matrix -- the harmonic mean of the shortest distance -- transitivity. How can we generate graphs that preserve data utility?
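To make these features concrete, here is a minimal, dependency-free Python sketch (graphs as dicts of adjacency sets; all function names are our own, not from the talk). It computes three of the four features: transitivity, the harmonic mean of the shortest distance (assumed here to be the inverse of the mean of 1/d(i, j) over all ordered pairs of distinct nodes, with 1/∞ = 0), and the largest adjacency eigenvalue via power iteration. The second smallest Laplacian eigenvalue is left out because it is best computed with a linear-algebra library.

```python
import math
from collections import deque
from itertools import combinations


def transitivity(adj):
    """3 * (#triangles) / (#connected triples) for an undirected simple graph."""
    closed = 0   # summed over apexes, this counts every triangle 3 times
    triples = 0  # number of paths of length 2 (connected triples)
    for v, nbrs in adj.items():
        d = len(nbrs)
        triples += d * (d - 1) // 2
        closed += sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return closed / triples if triples else 0.0


def harmonic_mean_distance(adj):
    """Assumed definition: inverse of the mean of 1/d(i, j) over ordered pairs i != j."""
    n = len(adj)
    total = 0.0
    for s in adj:
        dist = {s: 0}  # BFS distances from s; unreachable nodes contribute 1/inf = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(1.0 / d for v, d in dist.items() if v != s)
    return n * (n - 1) / total if total else float("inf")


def largest_adjacency_eigenvalue(adj, iters=500):
    """Power iteration on A + I; the +I shift avoids the sign oscillation that
    plain power iteration can show on bipartite graphs. Returns lambda_1(A)."""
    nodes = list(adj)
    x = {v: 1.0 / math.sqrt(len(nodes)) for v in nodes}  # unit-norm start vector
    rayleigh = 1.0
    for _ in range(iters):
        y = {v: x[v] + sum(x[w] for w in adj[v]) for v in nodes}  # y = (A + I) x
        rayleigh = sum(x[v] * y[v] for v in nodes)  # x^T (A + I) x with ||x|| = 1
        norm = math.sqrt(sum(val * val for val in y.values()))
        x = {v: val / norm for v, val in y.items()}
    return rayleigh - 1.0


# Example: the complete graph K4 has transitivity 1 and lambda_1 = 3.
K4 = {i: {j for j in range(4) if j != i} for i in range(4)}
print(transitivity(K4), harmonic_mean_distance(K4), largest_adjacency_eigenvalue(K4))
```

Shifting by the identity does not change the eigenvectors, so subtracting 1 from the converged Rayleigh quotient recovers the largest eigenvalue of A itself.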

  4. Motivation • Generate graphs for testing data mining results -- Generate a set of graph samples such that a feature of the samples follows a specified distribution.

  5. Switch and Uniform Graph Generator. Uniform switch procedure [Taylor, 1981] -- Preserves the degree sequence/distribution • Accessibility: can reach every graph with the given degree sequence • Uniformity: all such graphs are generated with the same probability • Application: empirically learning the properties of graph features given the degree sequence
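A minimal Python sketch of one common form of the switch (double edge swap) step, assuming a simple undirected graph stored as a list of sorted edge tuples: pick two distinct edges (a, b) and (c, d) at random, randomly orient the second, and rewire them to (a, d) and (c, b), rejecting the move if it would create a self-loop or a duplicate edge. Every node keeps its degree. The helper names are illustrative, and this is not necessarily the exact formulation in [Taylor, 1981].

```python
import random


def norm(u, v):
    """Store each undirected edge as a sorted tuple."""
    return (u, v) if u < v else (v, u)


def switch_step(edges, edge_set, rng):
    """One step of the switch chain. `edges` is a list, `edge_set` mirrors it.

    Rejected proposals leave the graph unchanged; counting them as steps keeps
    the chain symmetric, which is what makes its stationary distribution uniform.
    """
    i, j = rng.sample(range(len(edges)), 2)
    a, b = edges[i]
    c, d = edges[j]
    if rng.random() < 0.5:
        c, d = d, c  # choose one of the two possible rewirings
    if a == d or c == b:
        return False  # would create a self-loop
    e1, e2 = norm(a, d), norm(c, b)
    if e1 in edge_set or e2 in edge_set:
        return False  # would create a multi-edge
    edge_set -= {edges[i], edges[j]}
    edge_set |= {e1, e2}
    edges[i], edges[j] = e1, e2
    return True


def degree_sequence(edges, n):
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def uniform_switch(edge_iterable, steps, seed=0):
    """Run `steps` switch proposals starting from the given graph."""
    rng = random.Random(seed)
    edges = [norm(u, v) for u, v in edge_iterable]
    edge_set = set(edges)
    for _ in range(steps):
        switch_step(edges, edge_set, rng)
    return edges


start = [(i, (i + 1) % 8) for i in range(8)] + [(0, 4), (2, 6)]
out = uniform_switch(start, 1000, seed=1)
print(degree_sequence(start, 8) == degree_sequence(out, 8))  # True
```

Running many switches and recording a feature of each visited graph is the "Application" bullet: an empirical picture of how the feature behaves given only the degree sequence.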

  6. Graph Generator with FRC. How to generate a graph: • with the given degree sequence • satisfying the feature range constraint (FRC) • uniformly over the accessible graphs
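One straightforward way to realize a feature range constraint, sketched here under our own assumptions rather than as the authors' implementation: run the switch chain, but treat any switch whose result has feature S outside [lo, hi] as rejected. If the start graph satisfies the FRC, every visited graph does too, and because rejected moves keep the chain symmetric, its stationary distribution is uniform over the graphs reachable inside the range (the "accessible graphs" above). The feature is a parameter; the example uses transitivity, and `frc_switch` is a hypothetical name.

```python
import random
from itertools import combinations


def norm(u, v):
    return (u, v) if u < v else (v, u)


def transitivity(edge_set):
    """3 * (#triangles) / (#connected triples) of an undirected edge collection."""
    adj = {}
    for u, v in edge_set:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    closed = triples = 0
    for v, nbrs in adj.items():
        d = len(nbrs)
        triples += d * (d - 1) // 2
        closed += sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return closed / triples if triples else 0.0


def frc_switch(edge_iterable, feature, lo, hi, steps, seed=0):
    """Switch chain restricted to graphs G with lo <= feature(G) <= hi."""
    rng = random.Random(seed)
    edges = [norm(u, v) for u, v in edge_iterable]
    edge_set = set(edges)
    if not lo <= feature(edge_set) <= hi:
        raise ValueError("starting graph violates the feature range constraint")
    for _ in range(steps):
        i, j = rng.sample(range(len(edges)), 2)
        a, b = edges[i]
        c, d = edges[j]
        if rng.random() < 0.5:
            c, d = d, c
        if a == d or c == b:
            continue  # self-loop: stay at the current graph
        e1, e2 = norm(a, d), norm(c, b)
        if e1 in edge_set or e2 in edge_set:
            continue  # multi-edge: stay at the current graph
        candidate = (edge_set - {edges[i], edges[j]}) | {e1, e2}
        if not lo <= feature(candidate) <= hi:
            continue  # FRC violated: stay at the current graph
        edge_set = candidate
        edges[i], edges[j] = e1, e2
    return edges


start = [(i, (i + 1) % 8) for i in range(8)] + [(0, 2), (4, 6), (1, 5)]
t0 = transitivity(set(start))
out = frc_switch(start, transitivity, t0 - 0.1, t0 + 0.1, steps=500, seed=3)
print(round(t0, 3), round(transitivity(set(out)), 3))
```

Recomputing the feature from scratch on every proposal keeps the sketch short; for larger graphs one would update S incrementally.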

  7. Graph Generator and Privacy Issues. Privacy risks introduced by FRC. Attackers know: • the released graph preserves the true degree sequence • the true graph has its feature S within range R. What can attackers do? With the released graph, attackers can explore the graph space.

  8. Graph Generator and Privacy Issues • Graph space: {G : G has the given degree sequence and S(G) ∈ R} • Uniformly sample the graph space: attacker's confidence on link (i,j)
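A simple way to estimate the attacker's confidence on a link, assuming graphs G_1, ..., G_N have already been drawn uniformly from the graph space (for instance with an FRC-restricted switch chain): take P(i,j) ≈ (1/N) · #{k : (i,j) ∈ E(G_k)}, the fraction of samples containing the edge. The function name below is hypothetical.

```python
from collections import Counter


def link_confidence(samples):
    """Estimate P((i, j) in E(G)) as the fraction of sampled graphs containing (i, j).

    samples: list of graphs, each an iterable of undirected edges (u, v).
    """
    counts = Counter()
    for edges in samples:
        # Normalize edge orientation and count each edge at most once per graph
        counts.update({(min(u, v), max(u, v)) for u, v in edges})
    return {e: c / len(samples) for e, c in counts.items()}


samples = [[(0, 1), (1, 2)], [(1, 0), (2, 3)], [(0, 1)], [(2, 1)]]
print(link_confidence(samples))
```

Pairs that never appear in a sample are simply absent from the result, which corresponds to an estimated probability of 0.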

  9. FRC Can Jeopardize Privacy -- A real network example. Network of US political books (105 nodes, 441 edges): books about US politics sold by Amazon.com. Edges represent frequent co-purchasing of books by the same buyers. Nodes are colored blue, white, or red to indicate whether they are "liberal", "neutral", or "conservative". http://www-personal.umich.edu/~mejn/netdata/

  10. FRC Can Jeopardize Privacy -- A real network example. Polbook network (105 nodes, 441 edges). The attacker simply takes the t node pairs with the highest probabilities as candidate links. Top candidates can seriously jeopardize privacy! Some features jeopardize privacy, while others do not.
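The attack on this slide can be sketched in a few lines: rank node pairs by estimated link probability, keep the top t, and (as an evaluation measure we assume here) check what fraction of them are true edges. Both function names are hypothetical, and candidate pairs are assumed to be stored as sorted tuples.

```python
def top_t_candidates(confidence, t):
    """Return the t node pairs with the highest estimated link probability.

    Ties are broken by the pair itself so the result is deterministic.
    """
    return sorted(confidence, key=lambda e: (-confidence[e], e))[:t]


def precision(candidates, true_edges):
    """Fraction of candidate pairs that are real edges of the true graph."""
    truth = {(min(u, v), max(u, v)) for u, v in true_edges}
    hits = sum(1 for e in candidates if e in truth)
    return hits / len(candidates) if candidates else 0.0


confidence = {(0, 1): 0.9, (2, 3): 0.7, (1, 2): 0.5, (0, 3): 0.1}
top = top_t_candidates(confidence, 2)
print(top, precision(top, [(1, 0), (1, 2)]))  # [(0, 1), (2, 3)] 0.5
```

A precision well above the edge density of the graph is what "top candidates jeopardize privacy" means in practice: the attacker guesses real links far better than chance.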

  11. FRC Can Jeopardize Privacy -- More real network examples. Polbook network (105 nodes, 441 edges); Enron email network (151 nodes, 869 edges)

  12. FRC Can Jeopardize Privacy -- A theoretical result

  13. FRC Can Jeopardize Privacy -- A theoretical result Conclusion: If the FRC specifies a sub-space close to the true graph, privacy is seriously breached

  14. Graph Generator with FDC. Feature Distribution Constraint (FDC) • Uniform generator: gives the natural distribution f(x) of feature S, which is highly skewed in the range • How can we generate graphs that • have the given degree sequence • have feature values following a target distribution g(x)?

  15. Graph Generator with FDC • Based on the Metropolis-Hastings method • The acceptance ratio depends on the target distribution g(x) and the natural distribution f(x)
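One way to see why the acceptance ratio involves both g and f: under the uniform switch chain, S follows f. If the chain instead targets weights π(G) ∝ g(S(G))/f(S(G)), the feature's marginal becomes proportional to g, and with a symmetric proposal the Metropolis-Hastings acceptance probability is min{1, [g(S(G'))/f(S(G'))] / [g(S(G))/f(S(G))]}. This is our reading of the slide, not the authors' code. To keep the sketch short and checkable, it uses a toy state space in place of graphs: k-bit strings with single-bit-flip proposals (symmetric, uniform at stationarity, like the switch), with S = number of ones, whose natural distribution is binomial. The target g is uniform over 0..k.

```python
import random
from collections import Counter
from math import comb


def mh_feature_sampler(k, natural, target, steps, seed=0):
    """Metropolis-Hastings over k-bit states (a stand-in for graphs).

    Proposal: flip one uniformly chosen bit (symmetric, like a switch).
    Stationary weight of a state s: target(S) / natural(S), with S = popcount(s),
    so the feature S ends up distributed according to `target`.
    Returns the empirical distribution of S over the run.
    """
    rng = random.Random(seed)

    def weight(x):
        return target(x) / natural(x)

    state, feature = 0, 0  # start at the all-zero state, S = 0
    hist = Counter()
    for _ in range(steps):
        bit = rng.randrange(k)
        new_feature = feature - 1 if (state >> bit) & 1 else feature + 1
        # Acceptance probability min{1, [g(x')/f(x')] / [g(x)/f(x)]}
        if rng.random() < min(1.0, weight(new_feature) / weight(feature)):
            state ^= 1 << bit
            feature = new_feature
        hist[feature] += 1
    return {x: hist[x] / steps for x in range(k + 1)}


K = 4


def natural(x):
    """Distribution of S under the uniform chain: Binomial(K, 1/2)."""
    return comb(K, x) / 2 ** K


def target(x):
    """Desired distribution g: uniform over 0..K."""
    return 1 / (K + 1)


dist = mh_feature_sampler(K, natural, target, steps=200_000)
print({x: round(p, 3) for x, p in dist.items()})  # each value roughly 0.2
```

In the graph setting the state would be a graph, the proposal a switch, and f(x) is generally unknown, so it would have to be estimated first (e.g. a histogram of S over a long uniform switch run).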

  16. Graph Generator with FDC: Evaluation [Figure: natural distribution vs. target distribution]

  17. Summary • Graph generator with feature range constraint: attackers can sample the graph space near the true graph and breach privacy. • Graph generator with feature distribution constraint: generates a set of graph samples for statistical testing.

  18. Thank You! Questions? Acknowledgments: This work was supported in part by U.S. National Science Foundation IIS-0546027 and CNS-0831204.

  19. Graph Generator and Privacy Issues • Example: graphs with degree sequence {3,2,2,2,3}. • Are nodes 1 and 5 connected? [Figure: published graph and true graph]
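For a graph this small the whole graph space can be enumerated. Assuming node i has the i-th degree in {3, 2, 2, 2, 3}, the Python sketch below lists every simple graph on nodes 1 to 5 with that degree sequence and counts how many contain edge (1, 5); the ratio is what an attacker sampling the space uniformly would estimate as the link probability. The function name is our own.

```python
from itertools import combinations


def graphs_with_degrees(degrees):
    """All simple graphs on nodes 1..n in which node i has degree degrees[i-1]."""
    n = len(degrees)
    pairs = list(combinations(range(1, n + 1), 2))
    m = sum(degrees) // 2  # number of edges is fixed by the degree sum
    result = []
    for edges in combinations(pairs, m):
        deg = [0] * (n + 1)
        for u, v in edges:
            deg[u] += 1
            deg[v] += 1
        if deg[1:] == list(degrees):
            result.append(set(edges))
    return result


space = graphs_with_degrees([3, 2, 2, 2, 3])
with_15 = sum(1 for g in space if (1, 5) in g)
print(len(space), with_15)  # 7 6
```

So knowing only the degree sequence, a uniformly sampling attacker would put confidence 6/7 ≈ 0.86 on the link (1, 5). Brute force over all C(10, 6) = 210 edge subsets is only feasible for toy graphs; real networks require sampling.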

  20. Graph Generator with FDC. Problem with the FRC generator: the uniform generator gives the natural distribution of feature S, which is highly skewed within the range, so it generates biased feature values. [Figure: real-world graph value and range]
