
Turning Privacy Leaks into Floods: Surreptitious Discovery of Social Network Friendships




Presentation Transcript


  1. Turning Privacy Leaks into Floods: Surreptitious Discovery of Social Network Friendships • Michael T. Goodrich, Univ. of California, Irvine (joint work with Arthur U. Asuncion)

  2. Problem Definition • Discover the friendships

  3. Problem Definition • Discover the friendships

  4. Leveraging Information Leaks • Leak: Friendship list can be viewed by friends-of-friends. This allows: • Given two people, X and Y, we can tell whether X and Y have a friend in common. • Leverage: We use this to discover the friends list for members of the network
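As a rough illustration of the leak on slide 4, the sketch below models a network as a dictionary of friend sets and answers the single yes/no question the attacker is allowed to ask. The names `friends` and `has_common_friend` and the toy network itself are assumptions made for this example; they do not come from the presentation.

```python
def has_common_friend(friends, x, y):
    """Return True if members x and y share at least one friend."""
    return not friends[x].isdisjoint(friends[y])


# Hypothetical toy network (symmetric friendships).
friends = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice"},
    "dave": {"bob"},
}

# alice and dave are not friends, but both know bob.
alice_dave = has_common_friend(friends, "alice", "dave")
# carol only knows alice; dave only knows bob.
carol_dave = has_common_friend(friends, "carol", "dave")
```

Each such answer is one bit; the rest of the talk is about combining many of these bits to reconstruct whole friend lists.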

  5. Abstracting the Problem • Viewed abstractly, we are trying to learn binary attribute vectors.

  6. Group Testing • Input: n items, numbered 0, 1, …, n-1, at most d of which are defective. • Output: the indices of the defective items. • Items can be grouped into subsets, each of which can be tested to see whether it contains a defective item. • Goal: minimize the total number of tests. • Original problem: testing blood samples.

  7. Testing Schemes • Non-adaptive: All tests must be done in parallel • Adaptive: Tests can be done sequentially • Adaptive is easier, but our framework requires a non-adaptive approach
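For contrast with the non-adaptive setting the talk requires, here is a minimal sketch of a textbook adaptive strategy (repeated halving), where each test is chosen after seeing earlier results. It is not the presentation's method; `group_test` and `adaptive_find` are hypothetical names, and the set `defective` is only consulted through the test oracle.

```python
def group_test(group, defective):
    """Oracle: does the group contain at least one defective item?"""
    return any(item in defective for item in group)


def adaptive_find(items, defective):
    """Find all defectives by halving positive groups; return (found, tests)."""
    tests = 0
    found = []
    stack = [list(items)]
    while stack:
        group = stack.pop()
        if not group:
            continue
        tests += 1
        if not group_test(group, defective):
            continue  # the whole group is clean, discard it
        if len(group) == 1:
            found.append(group[0])
        else:
            mid = len(group) // 2
            stack.append(group[:mid])
            stack.append(group[mid:])
    return sorted(found), tests


found, tests_used = adaptive_find(range(64), {5, 40})
```

The halving only works because later tests depend on earlier outcomes, which is exactly what a non-adaptive scheme cannot do.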

  8. Facebook Application • Each member has a “vector” of friendships. • For any member M, the system returns a bit for whether M has a friend in common with the attacker, even if M restricts this information to friends-of-friends. • We can use a non-adaptive scheme to learn friendship relationships in any sub-community in Facebook.

  9. DNA Application • DNA sequences are stored in a database, D. • For any sequence Q, the database returns a score for how close Q is to each sequence in D. • We form a binary vector w.r.t. places where mutations happen relative to a reference string R. • We can use a non-adaptive scheme to learn DNA strings in D.
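One simple way to picture the binary vector from slide 9 is to mark every position where a sequence differs from the reference R. The sketch below assumes the sequences are already aligned, have the same length, and differ only by substitutions; the slide does not say how positions are actually chosen, so treat `mutation_vector` and the sample strings as hypothetical.

```python
def mutation_vector(sequence, reference):
    """Return 1 at each position where sequence differs from reference."""
    if len(sequence) != len(reference):
        raise ValueError("this sketch assumes aligned, equal-length strings")
    return [int(a != b) for a, b in zip(sequence, reference)]


R = "ACGTACGT"          # hypothetical reference string
Q = "ACGTTCGA"          # hypothetical database entry
vec = mutation_vector(Q, R)
```

A sequence close to R yields a sparse vector, which is the regime in which group testing with a small d is effective.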

  10. Netflix Application • Movie ratings vectors are stored in a database, D. • For any vector V, the database returns a score for how close V is to each vector in the database. • We can form a binary attribute vector for movies. • We can use a non-adaptive scheme to learn ratings vectors in D.

  11. Matrix View of Testing • A non-adaptive testing regimen can be viewed as a t x n binary matrix M: • M[i,j] = 1 if and only if test i includes item j • M is d-disjunct if the Boolean sum of any d columns does not contain any other column. • An item is defective iff all its tests are positive • M is d-separable if the Boolean sums of each set of at most d columns are distinct (harder analysis algorithm)
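The matrix view above can be sketched in a few lines. The example below uses a small hand-built matrix whose columns are all the 2-element subsets of 4 rows; no column contains another, so it is 1-disjunct. The function names and this particular matrix are assumptions chosen for illustration, not taken from the slides.

```python
from itertools import combinations


def run_tests(M, defective):
    """Test i is positive iff it includes at least one defective item."""
    return [any(row[j] for j in defective) for row in M]


def decode(M, outcomes):
    """Declare item j defective iff every test that includes j is positive."""
    n = len(M[0])
    return [
        j for j in range(n)
        if all(outcomes[i] for i in range(len(M)) if M[i][j])
    ]


t = 4
supports = list(combinations(range(t), 2))          # 6 columns, 2 ones each
M = [[1 if i in s else 0 for s in supports] for i in range(t)]

outcomes = run_tests(M, {2})      # all t tests are fixed in advance
recovered = decode(M, outcomes)
```

Here 6 items are covered by 4 tests, and a single defective item is recovered exactly; with more than one defective this particular matrix could report false positives.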

  12. Randomized Approach • Use a randomized approach motivated by Bloom filtering. • Construct a matrix M, but relax requirements • Given a set D of d columns in M and a column j, say j is distinguishable from D if there is a row i such that M[i,j]=1 but M[i,j’]=0 for each j’ in D. • M is D-distinguishable if, for a particular collection D of subsets, the matrix M will find them distinguishable.
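The distinguishability condition on slide 12 translates directly into a check over the rows of M. The sketch below is one reading of that definition; `is_distinguishable` and the 3 x 3 example matrix are hypothetical.

```python
def is_distinguishable(M, j, D):
    """True if some row i has M[i][j] == 1 and M[i][k] == 0 for every k in D."""
    return any(
        row[j] == 1 and all(row[k] == 0 for k in D)
        for row in M
    )


# Hypothetical 3 x 3 test matrix (rows are tests, columns are items).
M = [
    [1, 0, 1],
    [0, 1, 1],
    [1, 1, 0],
]

alone = is_distinguishable(M, 0, [1])       # row 0 separates item 0 from item 1
masked = is_distinguishable(M, 0, [1, 2])   # items 1 and 2 together cover item 0
```

Intuitively, a distinguishing row is a test that stays negative when only the items in D are present, which is what clears item j of suspicion.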

  13. Constructing the Matrix • Given t (set in the analysis), let M be a 2t x n matrix defined randomly: • For each column j, choose t/d rows of M at random and set these entries to 1. • That is, we “inject” j into those t/d tests.
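A minimal sketch of the random construction on slide 13 follows. It assumes that t is a multiple of d and that the t/d rows for each column are drawn without replacement; the slide does not state either detail, and `random_test_matrix` is a hypothetical name.

```python
import random


def random_test_matrix(n, t, d, rng):
    """Build a 2t x n binary matrix with exactly t/d ones in each column."""
    if t % d != 0:
        raise ValueError("this sketch assumes t is a multiple of d")
    rows = 2 * t
    M = [[0] * n for _ in range(rows)]
    for j in range(n):
        # "Inject" item j into t/d distinct tests chosen at random.
        for i in rng.sample(range(rows), t // d):
            M[i][j] = 1
    return M


M = random_test_matrix(n=20, t=6, d=3, rng=random.Random(0))
column_weights = [sum(row[j] for row in M) for j in range(20)]
```

Passing an explicit `random.Random` instance keeps the construction reproducible, which is convenient when comparing different choices of t and d.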

  14. Technique for Social Networks • Insert a small set of fictional network members • Form connections with random network members • Test the common-friends condition for the fictional members • Image from http://www.politicsforum.org/images/flame_warriors/flame_53.php
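The three steps above can be simulated on a toy network. This is a sketch under several assumptions: the attacker's connections are always accepted, each candidate is attached to a fixed number of randomly chosen fictional members, and the target is not among the candidates. All names (`discover_friends`, `fake0`, the toy network) are hypothetical.

```python
import random


def discover_friends(friends, target, candidates, num_fake, picks, rng):
    """Guess which candidates are friends of target using only one-bit leaks."""
    net = {member: set(fs) for member, fs in friends.items()}  # work on a copy
    fakes = [f"fake{i}" for i in range(num_fake)]
    for f in fakes:
        net[f] = set()
    # Step 1 and 2: insert fictional members and connect them at random.
    for c in candidates:
        for f in rng.sample(fakes, picks):
            net[f].add(c)
            net[c].add(f)
    # Step 3: the leak, one bit per fictional member.
    bits = {f: not net[target].isdisjoint(net[f]) for f in fakes}
    # A candidate stays a suspect only if all of its fictional friends tested positive.
    return {c for c in candidates if all(bits[f] for f in net[c] if f in bits)}


friends = {
    "target": {"a", "c"},
    "a": {"target"}, "b": set(), "c": {"target"}, "d": set(), "e": set(),
}
candidates = ["a", "b", "c", "d", "e"]
guessed = discover_friends(friends, "target", candidates,
                           num_fake=12, picks=3, rng=random.Random(1))
```

In this sketch the guess always contains every true friend among the candidates; it may also contain false positives when too few fictional members are used.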

  15. Exploiting Sparse Data Sets • Figure: histogram of differences from R • Table: sizes, lengths, and differences from R

  16. Number of Tests Needed in Theory • 1st column: to clone the entire database with high probability • 2nd column: to clone the sparsest 50% of the database with high probability • 3rd column: to clone the entire database with probability 1

  17. Different Choices for “d” • Tradeoff: • The smaller the “d”, the faster we can recover sparse vectors • With very small “d”, it can take a long time to recover the vectors that are not so sparse. • But most vectors are sparse, so we generally want a pretty small “d” • Figure: attack on a Netflix user who has rated 98 movies. With smaller “d”, the rate of convergence is faster.

  18. Different Choices for “d” • Here we vary “d” on the x-axis and plot the mean and median number of tests required across the vectors in the database.

  19. Distance from R • More tests are needed for vectors that are farther from the reference R (but note that most vectors are close to R). • We also see the tradeoff between various choices of “d”.

  20. Thresholding Behavior • There are critical values of our estimated value for d:

  21. Conclusion and Future Work • We have presented a way to turn privacy leaks into floods, with a number of applications: • Social networks • DNA databases • Ratings vectors • Future work: extend our approach to non-binary vectors (e.g., friends and foes)
