
Privacy in a Mobile-Social World


Presentation Transcript


  1. Privacy in a Mobile-Social World CompSci 590.03 Instructor: Ashwin Machanavajjhala

  2. Administrivia http://www.cs.duke.edu/courses/fall12/compsci590.3/ • Wed/Fri 1:25 – 2:40 PM • “Reading Course + Project” • No exams! • Every class based on 1 (or 2) assigned papers that students must read. • Projects: (60% of grade) • Individual or groups of size 2-3 • Class Participation (other 40%) • Office hours: by appointment

  3. Administrivia • Projects: (60% of grade) • Theory/algorithms for privacy • Implement/adapt existing work to new domains • Participate in WSDM Data Challenge: De-anonymization • Goals: • Literature review • Some original research/implementation • Timeline (details will be posted on the website soon) • By Sep 28: Choose project (ideas will be posted … new ideas welcome) • Oct 12: Project proposal (1-4 pages describing the project) • Nov 16: Mid-project review (2-3 page report on progress) • Dec 5&7: Final presentations and submission (6-10 page conference-style paper + 10-15 minute talk)

  4. Why should you take this course? • Privacy is (one of) the most important grand challenges in managing today’s data! • “What Next? A Half-Dozen Data Management Research Goals for Big Data and Cloud”, Surajit Chaudhuri, Microsoft Research • “Big data: The next frontier for innovation, competition, and productivity”, McKinsey Global Institute Report, 2011

  5. Why should you take this course? • Privacy is (one of) the most important grand challenges in managing today’s data! • Very active field with tons of interesting research. We will read papers in: • Data Management (SIGMOD, VLDB, ICDE) • Theory (STOC, FOCS) • Cryptography/Security (TCC, SSP, NDSS) • Machine Learning (KDD, NIPS) • Statistics (JASA)

  6. Why should you take this course? • Privacy is (one of) the most important grand challenges in managing today’s data! • Very active field with tons of interesting research. • Intro to research by working on a cool project • Read scientific papers about an exciting data application • Formulate a problem • Perform a scientific evaluation

  7. Today • Bird’s-eye view introduction to big-data and privacy • Privacy attacks in the real world • (In)formal problem statement • Course overview • (If there is time) A privacy preserving algorithm

  8. Introduction

  9. Data Explosion: Internet Estimated User Data Generated per day [Ramakrishnan 2007] • 8-10 GB public content • ~4 TB private content

  10. Data Explosion: Social Networks • 91% of online users … • 25% of all time spent online … • 200 million tweets a day … • millions of posts a day … • 6 billion photos a month …

  11. Data Explosion: Mobile • ~5 billion mobile phones in use!

  12. Big-Data impacts all aspects of our lives

  13. The value in Big-Data … Personalized news (interests, top searches, recommended links): +43% clicks vs. editor-selected, +79% clicks vs. randomly selected, +250% clicks vs. an editorial one-size-fits-all page

  14. The value in Big-Data … “If US healthcare were to use big data creatively and effectively to drive efficiency and quality, the sector could create more than $300 billion in value every year.” McKinsey Global Institute Report

  15. Personal Big-Data [Diagram: Persons 1…N contribute records r1…rN to data collectors (Google, Census, Hospital); the resulting databases (DB) are used by information-retrieval researchers, recommendation algorithms, medical researchers, doctors, and economists.]

  16. Sometimes users can control and know who sees their information …

  17. … but not always!!

  18. The Massachusetts Governor Privacy Breach [Sweeney IJUFKS 2002] • Name • SSN • Visit Date • Diagnosis • Procedure • Medication • Total Charge • Zip • Birth date • Sex Medical Data

  19. The Massachusetts Governor Privacy Breach [Sweeney IJUFKS 2002] • Name • SSN • Visit Date • Diagnosis • Procedure • Medication • Total Charge • Name • Address • Date Registered • Party affiliation • Date last voted • Zip • Birth date • Sex Medical Data Voter List

  20. The Massachusetts Governor Privacy Breach [Sweeney IJUFKS 2002] • Governor of MA uniquely identified using ZipCode, Birth Date, and Sex. • Name linked to Diagnosis • Name • SSN • Visit Date • Diagnosis • Procedure • Medication • Total Charge • Name • Address • Date Registered • Party affiliation • Date last voted • Zip • Birth date • Sex Medical Data Voter List

  21. The Massachusetts Governor Privacy Breach [Sweeney IJUFKS 2002] • Governor of MA uniquely identified using Zip Code, Birth Date, and Sex, a quasi-identifier whose combination is unique for 87% of the US population. • Name • SSN • Visit Date • Diagnosis • Procedure • Medication • Total Charge • Name • Address • Date Registered • Party affiliation • Date last voted • Zip • Birth date • Sex Medical Data Voter List
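  To make the linkage concrete, here is a minimal sketch of the attack as a join on the quasi-identifier. The records, names, and column labels below are fabricated for illustration; they are not from the actual datasets.

      # Hypothetical linkage attack: join "anonymized" medical records with a
      # public voter list on the quasi-identifier (zip, birth date, sex).
      medical = [
          {"zip": "02138", "dob": "1945-07-31", "sex": "M", "diagnosis": "flu"},
          {"zip": "02139", "dob": "1962-01-15", "sex": "F", "diagnosis": "asthma"},
      ]
      voters = [
          {"name": "Alice Smith", "zip": "02139", "dob": "1962-01-15", "sex": "F"},
      ]

      QUASI_ID = ("zip", "dob", "sex")

      def link(medical, voters):
          """Re-identify medical records whose quasi-identifier matches a voter."""
          index = {tuple(v[a] for a in QUASI_ID): v["name"] for v in voters}
          for rec in medical:
              key = tuple(rec[a] for a in QUASI_ID)
              if key in index:
                  yield index[key], rec["diagnosis"]

      print(list(link(medical, voters)))  # [('Alice Smith', 'asthma')]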

  22. AOL data publishing fiasco … “… Last week AOL did another stupid thing … … but, at least it was in the name of science…” Alternet, August 2006

  23. AOL data publishing fiasco … AOL “anonymously” released a list of 21 million web search queries.
      Ashwin222: Uefa cup
      Ashwin222: Uefa champions league
      Ashwin222: Champions league final
      Ashwin222: Champions league final 2007
      Pankaj156: exchangeability
      Pankaj156: Proof of de Finetti’s theorem
      Cox12345: Zombie games
      Cox12345: Warcraft
      Cox12345: Beatles anthology
      Cox12345: Ubuntu breeze
      Ashwin222: Grammy 2008 nominees
      Ashwin222: Amy Winehouse rehab

  24. AOL data publishing fiasco … AOL “anonymously” released a list of 21 million web search queries. UserIDs were replaced by random numbers …
      865712345: Uefa cup
      865712345: Uefa champions league
      865712345: Champions league final
      865712345: Champions league final 2007
      236712909: exchangeability
      236712909: Proof of de Finetti’s theorem
      112765410: Zombie games
      112765410: Warcraft
      112765410: Beatles anthology
      112765410: Ubuntu breeze
      865712345: Grammy 2008 nominees
      865712345: Amy Winehouse rehab
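  A small sketch of why this pseudonymization fails: replacing a user ID with a random number is a consistent one-to-one mapping, so each user’s full query history stays grouped together. The log entries below are invented.

      import random

      log = [("Ashwin222", "Uefa cup"), ("Pankaj156", "exchangeability"),
             ("Ashwin222", "Grammy 2008 nominees")]

      pseudonyms = {}

      def pseudonymize(user):
          """Assign each user one random number, reused for all their queries."""
          if user not in pseudonyms:
              pseudonyms[user] = random.randrange(100_000_000, 1_000_000_000)
          return pseudonyms[user]

      released = [(pseudonymize(u), q) for u, q in log]
      # Both of Ashwin222's queries share one pseudonym, so a single
      # identifying query (e.g., a vanity search) exposes the whole history.
      print(released)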

  25. Privacy Breach [NYTimes 2006]

  26. Privacy breaches on the rise…

  27. Privacy Breach: Informal Definition A data sharing mechanism M causes a privacy breach if it allows an unauthorized party to learn sensitive information about an individual that the party could not have learnt without access to M.

  28. Statistical Privacy (Trusted Collector) Problem [Diagram: Individuals 1…N send records r1…rN to a trusted server that stores them in DB. Utility: the server releases useful information about the data. Privacy: no breach about any individual.]

  29. Statistical Privacy (Untrusted Collector) Problem [Diagram: Individuals 1…N apply a randomizing function f(·) to their records r1…rN before sending them to the server, which stores the perturbed data in DB.]

  30. Statistical Privacy in real-world applications • Trusted Data Collectors

  31. Statistical Privacy in real-world applications • Untrusted Data Collectors

  32. Statistical Privacy: Key Problems What is the right definition of privacy? How do we develop mechanisms that trade off privacy for utility?

  33. What is Privacy? • “… the ability to determine for ourselves when, how, and to what extent information about us is communicated to others …” Westin, 1967 • Privacy intrusion occurs when new information about an individual is released. Parent, 1983

  34. Anonymity • The property that an individual’s record is indistinguishable from many other individuals’ records. • k-Anonymity: a popular definition where “many” = k−1 • Used for • Social network anonymization • Location privacy • Anonymous routing
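  As a concrete sketch (my illustration, not from the lecture), one way to check k-anonymity of a table over a chosen quasi-identifier:

      from collections import Counter

      def is_k_anonymous(records, quasi_id, k):
          """True iff every quasi-identifier combination occurs in >= k records."""
          counts = Counter(tuple(r[a] for a in quasi_id) for r in records)
          return all(c >= k for c in counts.values())

      rows = [{"zip": "021**", "age": "40-50", "disease": "flu"},
              {"zip": "021**", "age": "40-50", "disease": "cancer"}]
      print(is_k_anonymous(rows, ("zip", "age"), k=2))  # True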

  35. Privacy is not Anonymity • Bob’s record is indistinguishable from the records of other Cancer patients • We can still infer Bob has Cancer! • “New Information” principle • Privacy is breached if releasing D (or f(D)) allows an adversary to learn sufficient new information. • New Information = distance(adversary’s prior belief, adversary’s posterior belief after seeing D) • New Information can’t be 0 if the output D (or f(D)) is to be useful.
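  A minimal sketch of the “New Information” principle, assuming a binary secret, Bayesian updating, and pointwise absolute distance as the measure (the next slide lists alternatives):

      def bayes_posterior(prior, p_obs_given_true, p_obs_given_false):
          """Adversary's belief in the secret after seeing the released output."""
          num = p_obs_given_true * prior
          return num / (num + p_obs_given_false * (1 - prior))

      def new_information(prior, posterior):
          """Pointwise absolute distance between prior and posterior belief."""
          return abs(posterior - prior)

      # Example: an output 3x likelier if Bob has cancer moves a 0.1 prior
      # to a 0.25 posterior, i.e., 0.15 of new information.
      post = bayes_posterior(0.1, 0.3, 0.1)
      print(post, new_information(0.1, post))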

  36. Privacy Definitions • Many privacy definitions • ℓ-Diversity, t-Closeness, m-Invariance, ε-Differential Privacy, E-Privacy, … • Definitions differ in • What information is considered sensitive • A specific attribute (disease) vs. all possible properties of an individual • What the adversary’s prior is • All values are equally likely vs. the adversary knows everything about all but one individual • How new information is measured • Information-theoretic measures • Pointwise absolute distance • Pointwise relative distance

  37. No Free Lunch • Why can’t we have a single definition of privacy? • Ideally, for every adversarial prior and every property of an individual, the new information would be bounded by some constant. • No Free Lunch Theorem: for every algorithm that outputs a D with even a sliver of utility, there is some adversary with a prior for which privacy is not guaranteed.

  38. Algorithms for Privacy • Basic Building Blocks • Generalization or coarsening of attributes • Suppression of outliers • Perturbation • Adding noise • Sampling
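  Hedged one-liners for each building block (a sketch assuming 5-digit ZIP codes and Laplace noise as the perturbation; all names are illustrative):

      import random

      def generalize_zip(zipcode, digits_kept=3):
          """Coarsen an attribute: '02138' -> '021**'."""
          return zipcode[:digits_kept] + "*" * (len(zipcode) - digits_kept)

      def suppress_outliers(values, lo, hi):
          """Drop records outside [lo, hi]."""
          return [v for v in values if lo <= v <= hi]

      def perturb(value, scale=1.0):
          """Add Laplace noise (a difference of two exponentials is Laplace)."""
          return value + random.expovariate(1 / scale) - random.expovariate(1 / scale)

      def subsample(records, rate=0.1):
          """Release each record independently with probability `rate`."""
          return [r for r in records if random.random() < rate]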

  39. Algorithms for Privacy • Build complex algorithms by piecing together building blocks. • But each building block leads to some information disclosure, and disclosures may not add up linearly. • If A1 releases the fact that Bob’s salary is <= 50,000, while A2 releases the fact that Bob’s salary is >= 50,000, then we know Bob’s salary is exactly 50,000. • Composition of Privacy • Algorithms may be reverse-engineered. • If an algorithm perturbs x by adding 1, then x can be reconstructed. • Simulatability of Algorithms
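  The salary example, viewed as intersecting the sets of worlds consistent with each release (a toy sketch):

      # Each release narrows the interval of salaries consistent with it.
      a1 = (float("-inf"), 50_000)  # A1: Bob's salary <= 50,000
      a2 = (50_000, float("inf"))   # A2: Bob's salary >= 50,000

      combined = (max(a1[0], a2[0]), min(a1[1], a2[1]))
      print(combined)  # (50000, 50000): the two releases pin the salary exactly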

  40. Algorithms for Privacy • Anonymous/Private Data Publishing • Medical/Census Data, Search Logs, Social Networks, Location GPS traces • Answering Statistical Counting Queries • Number of students enrolled in this class categorized by gender, nationality • Data Cubes (database), Marginals (statistics) • Social Network Analysis • Measures of centrality (what is the degree distribution? How many triangles?) • Streaming Algorithms • Continuously monitor number of cars crossing a toll booth. • Location Privacy, Health …

  41. Algorithms for Privacy • Game Theory • Can I participate in an auction without the output of the auction revealing my private utility function? • Modern advertising is based on auction design. • Auctions and Mechanism Design • Machine Learning • Regress disease on gender/location/age • Inside tip: big open area. Much theory – doesn’t work in practice • Recommendations • Think Netflix, Amazon… • Advertising

  42. Course Outline http://www.cs.duke.edu/courses/fall12/compsci590.3/ Theory/Algorithms (Lectures 1-18) Applications (Lectures 19-25) Project Presentations (Lectures 26, 27)

  43. Randomized Response

  44. Case Study: Census Data Collection • N respondents asked a sensitive “yes/no” question. • Surveyor wants to compute fraction π who answer “yes”. • Respondents don’t trust the surveyor. • What should the respondents do?

  45. Randomized Response • Flip a coin • heads with probability p, and • tails with probability 1-p (p > ½) • Answer according to the coin: on heads, answer the question truthfully; on tails, give the opposite answer.
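  A minimal sketch of the respondent’s side of the protocol:

      import random

      def randomized_response(true_answer: bool, p: float) -> bool:
          """Answer truthfully on heads (prob. p > 1/2), lie on tails."""
          heads = random.random() < p
          return true_answer if heads else not true_answer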

  46. Utility Analysis • π: True fraction of respondents answering “yes” • p: Probability coin falls heads • Yi = 1 if the ith respondent says “yes”; Yi = 0 if the ith respondent says “no” P(Yi = 1) = P(true answer = yes AND coin = heads) + P(true answer = no AND coin = tails) = πp + (1-π)(1-p) = p_yes P(Yi = 0) = π(1-p) + (1-π)p = p_no

  47. Utility Analysis • Suppose n1 out of the n respondents replied “yes”, and the rest said “no” • What is the best estimate for π? • Likelihood: L = C(n, n1) · p_yes^n1 · p_no^(n-n1) • Most likely value of π (by setting dL/dπ = 0): π̂ = (n1/n − (1−p)) / (2p − 1)
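  The estimator as code (a sketch; `answers` is the list of reported booleans):

      def estimate_pi(answers, p):
          """MLE of the true 'yes' fraction: (n1/n - (1-p)) / (2p - 1)."""
          n, n1 = len(answers), sum(answers)
          return (n1 / n - (1 - p)) / (2 * p - 1)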

  48. Privacy • Adversary’s prior belief: P(Bob’s true answer is “yes”) = θ • Suppose Bob answers “yes”. P(Bob’s true answer is “yes” | Bob says “yes”) = P(Bob says “yes” AND Bob’s true answer is “yes”) / P(Bob says “yes”) = P(Bob says “yes” | true “yes”)·P(true “yes”) / [P(Bob says “yes” | true “yes”)·P(true “yes”) + P(Bob says “yes” | true “no”)·P(true “no”)] = pθ / (pθ + (1−p)(1−θ)) ≤ (p/(1−p))·θ

  49. Privacy • Adversary’s prior belief: P(Bob’s true answer is “yes”) = θ • Suppose Bob answers “yes”. Adversary’s posterior belief: P(Bob’s true answer is “yes” | Bob says “yes”) ≤ (p/(1−p))·θ The adversary’s posterior belief is always bounded by p/(1−p) times the adversary’s prior belief (irrespective of what the prior is).
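  A numeric check of the bound, following the computation above:

      def posterior_yes(theta, p):
          """P(true 'yes' | says 'yes') = p*theta / (p*theta + (1-p)*(1-theta))."""
          return p * theta / (p * theta + (1 - p) * (1 - theta))

      p = 0.75
      for theta in (0.01, 0.1, 0.5, 0.9):
          post = posterior_yes(theta, p)
          assert post <= p / (1 - p) * theta + 1e-12  # holds for any prior
          print(theta, round(post, 4))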

  50. Privacy vs Utility tradeoff • When p = 1 (return the truthful answer) • p/(1−p) = ∞: no privacy • π̂ = n1/n = the true answer • When p = ½ (return a random answer) • p/(1−p) = 1: perfect privacy • We cannot estimate π, since the answers are independent of the input: p_yes = πp + (1−π)(1−p) = ½(π + 1 − π) = ½ = p_no
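  A self-contained sweep over p illustrating the tradeoff (a sketch with the true π fixed at 0.3):

      import random

      def simulate_pi_hat(pi, p, n=200_000):
          """Estimate pi from n simulated randomized-response answers."""
          n1 = 0
          for _ in range(n):
              truth = random.random() < pi
              n1 += truth if random.random() < p else not truth
          return (n1 / n - (1 - p)) / (2 * p - 1)

      for p in (0.55, 0.75, 0.95):
          print(f"p={p}: privacy ratio p/(1-p)={p/(1-p):.1f}, "
                f"pi_hat={simulate_pi_hat(0.3, p):.3f}")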
