
Privacy in a Mobile-Social World

CompSci 590.03
Instructor: Ashwin Machanavajjhala

Administrivia

http://www.cs.duke.edu/courses/fall12/compsci590.3/

  • Wed/Fri 1:25 – 2:40 PM
  • “Reading Course + Project”
    • No exams!
    • Every class based on 1 (or 2) assigned papers that students must read.
  • Projects: (60% of grade)
    • Individual or groups of size 2-3
  • Class Participation (other 40%)
  • Office hours: by appointment

Administrivia
  • Projects: (60% of grade)
    • Theory/algorithms for privacy
    • Implement/adapt existing work to new domains
    • Participate in WSDM Data Challenge: De-anonymization
  • Goals:
    • Literature review
    • Some original research/implementation
  • Timeline (details will be posted on the website soon)
    • By Sep 28: Choose project (ideas will be posted … new ideas welcome)
    • Oct 12: Project proposal (1-4 pages describing the project)
    • Nov 16: Mid-project review (2-3 page report on progress)
    • Dec 5 & 7: Final presentations and submission (6-10 page conference-style paper + 10-15 minute talk)

Why you should take this course
  • Privacy is (one of) the most important grand challenges in managing today’s data!
    • “What Next? A Half-Dozen Data Management Research Goals for Big Data and the Cloud”, Surajit Chaudhuri, Microsoft Research
    • “Big data: The next frontier for innovation, competition, and productivity”, McKinsey Global Institute Report, 2011

Why you should take this course
  • Privacy is (one of) the most important grand challenges in managing today’s data!
  • Very active field and tons of interesting research. We will read papers in:
    • Data Management (SIGMOD, VLDB, ICDE)
    • Theory (STOC, FOCS)
    • Cryptography/Security (TCC, SSP, NDSS)
    • Machine Learning (KDD, NIPS)
    • Statistics (JASA)

Why you should take this course
  • Privacy is (one of) the most important grand challenges in managing today’s data!
  • Very active field and tons of interesting research.
  • Intro to research by working on a cool project
    • Read scientific papers about an exciting data application
    • Formulate a problem
    • Perform a scientific evaluation

Today
  • A bird’s-eye introduction to big data and privacy
  • Privacy attacks in the real world
  • (In)formal problem statement
  • Course overview
  • (If there is time) A privacy preserving algorithm

Introduction

Data Explosion: Internet

Estimated User Data Generated per day [Ramakrishnan 2007]

  • 8-10 GB public content
  • ~4 TB private content

Data Explosion: Social Networks
  • 91% of online users …
  • 25% of all time spent online …
  • 200 million tweets a day …
  • millions of posts a day …
  • 6 billion photos a month …

Data Explosion: Mobile
  • ~5 billion mobile phones in use!

The value in Big-Data …

[Screenshot: a personalized news page with “Top Searches” and “Recommended links” modules. Personalizing to users’ news interests increased clicks:]

  • +43% clicks vs. editor-selected content
  • +79% clicks vs. randomly selected content
  • +250% clicks vs. an editorial “one size fits all” page

The value in Big-Data …

“If US healthcare were to use big data creatively and effectively to drive efficiency and quality, the sector could create more than $300 billion in value every year.”

(McKinsey Global Institute Report)

Personal Big-Data

[Diagram: individuals 1 … N contribute records r1, …, rN to collectors such as Google, the Census, and hospitals; their databases (DB) in turn serve information-retrieval researchers, recommendation algorithms, medical researchers, doctors, and economists.]

Sometimes users can control and know who sees their information …

… but not always!

The Massachusetts Governor Privacy Breach [Sweeney IJUFKS 2002]
Medical Data: Name, SSN, Visit Date, Diagnosis, Procedure, Medication, Total Charge, Zip, Birth date, Sex

The Massachusetts Governor Privacy Breach [Sweeney IJUFKS 2002]
Medical Data: Name, SSN, Visit Date, Diagnosis, Procedure, Medication, Total Charge
Voter List: Name, Address, Date Registered, Party affiliation, Date last voted
Shared attributes: Zip, Birth date, Sex

The Massachusetts Governor Privacy Breach [Sweeney IJUFKS 2002]
  • Governor of MA uniquely identified using Zip Code, Birth Date, and Sex.
  • Name linked to Diagnosis.

Medical Data: Name, SSN, Visit Date, Diagnosis, Procedure, Medication, Total Charge
Voter List: Name, Address, Date Registered, Party affiliation, Date last voted
Shared attributes: Zip, Birth date, Sex

The Massachusetts Governor Privacy Breach [Sweeney IJUFKS 2002]
  • Governor of MA uniquely identified using Zip Code, Birth Date, and Sex.
  • 87% of the US population is uniquely identified by the combination (Zip, Birth date, Sex).
  • These shared attributes form the quasi-identifier; a linkage sketch follows.

Medical Data: Name, SSN, Visit Date, Diagnosis, Procedure, Medication, Total Charge
Voter List: Name, Address, Date Registered, Party affiliation, Date last voted
Quasi-identifier (shared): Zip, Birth date, Sex
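
To make the linkage concrete, here is a minimal sketch in Python (the records and attribute values below are made up for illustration; the actual study joined a hospital data release against the Cambridge, MA voter rolls):

    # Linkage attack: join "anonymized" medical records to an identified
    # voter list on the quasi-identifier (zip, birth date, sex).
    medical = [  # names and SSNs removed, quasi-identifiers kept
        {"zip": "02138", "dob": "1945-07-31", "sex": "M", "diagnosis": "..."},
    ]
    voters = [  # public and identified
        {"name": "W. Weld", "zip": "02138", "dob": "1945-07-31", "sex": "M"},
    ]

    QID = ("zip", "dob", "sex")
    by_qid = {tuple(v[a] for a in QID): v["name"] for v in voters}
    for record in medical:
        name = by_qid.get(tuple(record[a] for a in QID))
        if name is not None:  # a unique match re-identifies the record
            print(name, "->", record["diagnosis"])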

AOL data publishing fiasco …

“… Last week AOL did another stupid thing … but, at least it was in the name of science …”

(Alternet, August 2006)

AOL data publishing fiasco …

AOL “anonymously” released a list of 21 million web search queries.

UserID      Query
Ashwin222   Uefa cup
Ashwin222   Uefa champions league
Ashwin222   Champions league final
Ashwin222   Champions league final 2007
Pankaj156   exchangeability
Pankaj156   Proof of de Finetti’s theorem
Cox12345    Zombie games
Cox12345    Warcraft
Cox12345    Beatles anthology
Cox12345    Ubuntu breeze
Ashwin222   Grammy 2008 nominees
Ashwin222   Amy Winehouse rehab

AOL data publishing fiasco …

AOL “anonymously” released a list of 21 million web search queries.

UserIDs were replaced by random numbers …

UserID      Query
865712345   Uefa cup
865712345   Uefa champions league
865712345   Champions league final
865712345   Champions league final 2007
236712909   exchangeability
236712909   Proof of de Finetti’s theorem
112765410   Zombie games
112765410   Warcraft
112765410   Beatles anthology
112765410   Ubuntu breeze
865712345   Grammy 2008 nominees
865712345   Amy Winehouse rehab

Privacy Breach

[NYTimes 2006]

Privacy breaches on the rise…

Privacy Breach: Informal Definition

A data-sharing mechanism M breaches privacy if it allows an unauthorized party to learn sensitive information about an individual that the party could not have learned without access to M.

Statistical Privacy (Trusted Collector) Problem

[Diagram: individuals 1 … N send their records r1, …, rN to a trusted server, which collects them in a database DB.]

Utility: the server’s outputs over DB should remain useful.
Privacy: no breach about any individual.

Statistical Privacy (Untrusted Collector) Problem

[Diagram: individuals 1 … N apply a perturbation f(·) to their records r1, …, rN before sending them to an untrusted server, which only ever sees the perturbed database DB.]

Statistical Privacy in real-world applications
  • Trusted Data Collectors

Statistical Privacy in real-world applications
  • Untrusted Data Collectors

Statistical Privacy: Key Problems

What is the right definition of privacy?

How do we develop mechanisms that trade off privacy for utility?

What is Privacy?
  • “… the ability to determine for ourselves when, how, and to what extent information about us is communicated to others …” Westin, 1967
  • Privacy intrusion occurs when new information about an individual is released. Parent, 1983

Anonymity
  • The property that an individual’s record is indistinguishable from many other individuals’ records.
  • k-Anonymity: a popular definition where “many” = k − 1.
  • Used for
    • Social network anonymization
    • Location privacy
    • Anonymous routing
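
As a quick operational view of the definition, here is a minimal sketch (a hypothetical helper, not code from the lecture) that checks whether a table is k-anonymous with respect to a chosen set of quasi-identifier attributes:

    from collections import Counter

    def is_k_anonymous(records, qid_attrs, k):
        # Group records by their quasi-identifier values; k-anonymity holds
        # iff every group contains at least k records.
        groups = Counter(tuple(r[a] for a in qid_attrs) for r in records)
        return all(count >= k for count in groups.values())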

Privacy is not Anonymity
  • Bob’s record is indistinguishable from the records of other cancer patients …
    • … yet we can still infer that Bob has cancer!
  • “New Information” principle
    • Privacy is breached if releasing D (or f(D)) allows an adversary to learn a sufficient amount of new information.
    • New information = distance(adversary’s prior belief, adversary’s posterior belief after seeing D)
    • New information cannot be 0 if the output D (or f(D)) is to be useful.

Privacy Definitions
  • Many privacy definitions:
    • ℓ-diversity, t-closeness, m-invariance, ε-differential privacy, e-privacy, …
  • Definitions differ in:
    • What information is considered sensitive
      • A specific attribute (disease) vs. all possible properties of an individual
    • What the adversary’s prior is
      • All values equally likely vs. an adversary who knows everything about all but one individual
    • How new information is measured
      • Information-theoretic measures
      • Pointwise absolute distance
      • Pointwise relative distance

No Free Lunch
  • Why can’t we have a single definition of privacy, i.e., one guaranteeing that for every adversarial prior and every property of an individual, the new information disclosed is bounded by some constant?
  • No Free Lunch Theorem: for every algorithm that outputs a D with even a sliver of utility, there is some adversary with a prior against which privacy is not guaranteed.

Algorithms for Privacy
  • Basic Building Blocks
    • Generalization or coarsening of attributes
    • Suppression of outliers
    • Perturbation
    • Adding noise
    • Sampling
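
Rough sketches of what three of these building blocks can look like on individual values (illustrative only; real mechanisms calibrate each step against a formal privacy definition):

    import random

    def generalize_zip(zipcode, keep=3):
        # Generalization/coarsening: keep a prefix, mask the remaining digits.
        return zipcode[:keep] + "*" * (len(zipcode) - keep)

    def perturb_age(age, scale=5.0):
        # Perturbation: add random noise to a numeric attribute.
        return age + random.gauss(0.0, scale)

    def subsample(records, rate=0.1):
        # Sampling: keep each record independently with probability `rate`.
        return [r for r in records if random.random() < rate]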

Algorithms for Privacy
  • Build complex algorithms by piecing together the building blocks.
  • But each building block leads to some information disclosure, and disclosures may not add up linearly (spelled out in the sketch after this list).
    • If A1 releases the fact that Bob’s salary is ≤ 50,000, while A2 releases the fact that Bob’s salary is ≥ 50,000, then together they reveal that Bob’s salary is exactly 50,000.
    • Composition of Privacy
  • Algorithms may be reverse-engineered.
    • If an algorithm perturbs x by deterministically adding 1, then x can be reconstructed.
    • Simulatability of Algorithms
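
The salary example, spelled out (a toy sketch):

    # Each release looks coarse on its own, but their conjunction is exact.
    salary = 50_000
    release_1 = (salary <= 50_000)  # published by A1: True
    release_2 = (salary >= 50_000)  # published by A2: True
    if release_1 and release_2:
        print("Bob's salary is exactly 50,000")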

Algorithms for Privacy
  • Anonymous/Private Data Publishing
    • Medical/Census Data, Search Logs, Social Networks, Location GPS traces
  • Answering Statistical Counting Queries (a noisy-count sketch follows this list)
    • Number of students enrolled in this class categorized by gender, nationality
    • Data Cubes (database), Marginals (statistics)
  • Social Network Analysis
    • Measures of centrality (what is the degree distribution? How many triangles?)
  • Streaming Algorithms
    • Continuously monitor number of cars crossing a toll booth.
    • Location Privacy, Health …
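
One standard way to answer such counting queries privately is to add noise calibrated to the query’s sensitivity. The sketch below uses the Laplace mechanism of ε-differential privacy (my choice of illustration, with hypothetical names; not necessarily the mechanism used in any particular system above):

    import random

    def noisy_count(records, predicate, epsilon):
        # A counting query has sensitivity 1: adding or removing one person
        # changes the count by at most 1, so Laplace noise with scale
        # 1/epsilon gives epsilon-differential privacy. The difference of
        # two independent Exponential(epsilon) draws is Laplace(0, 1/epsilon).
        true_count = sum(1 for r in records if predicate(r))
        noise = random.expovariate(epsilon) - random.expovariate(epsilon)
        return true_count + noise

    # e.g. noisy_count(students, lambda s: s["gender"] == "F", epsilon=0.1)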

Algorithms for Privacy
  • Game Theory
    • Can I participate in an auction without the auction’s outcome revealing my private utility function?
    • Modern advertising is based on auction design.
    • Auctions and Mechanism Design
  • Machine Learning
    • Regress disease on gender/location/age
    • Inside tip: a big open area; there is much theory, but it often doesn’t work in practice
  • Recommendations
    • Think Netflix, Amazon, …
  • Advertising

Course Outline

http://www.cs.duke.edu/courses/fall12/compsci590.3/

Theory/Algorithms (Lectures 1-18)

Applications (Lectures 19-25)

Project Presentations (Lectures 26, 27)

Randomized Response

Case Study: Census Data Collection
  • N respondents are asked a sensitive “yes/no” question.
  • Surveyor wants to compute fraction π who answer “yes”.
  • Respondents don’t trust the surveyor.
  • What should the respondents do?

Randomized Response
  • Flip a coin:
    • heads with probability p, and
    • tails with probability 1 − p (p > ½)
  • Answer the question based on the coin flip (a sketch follows):
    • heads: give your true answer
    • tails: give the opposite answer
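
A respondent-side sketch of this protocol (the function name and default p are mine):

    import random

    def randomized_response(true_answer, p=0.75):
        # Heads (probability p): answer truthfully.
        # Tails (probability 1 - p): give the opposite answer.
        heads = random.random() < p
        return true_answer if heads else (not true_answer)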

Utility Analysis
  • π: true fraction of respondents whose honest answer is “yes”
  • p: probability that the coin falls heads
  • Yi = 1 if the i-th respondent says “yes”; Yi = 0 if the i-th respondent says “no”

P(Yi = 1) = P(true answer = “yes” AND coin = heads) + P(true answer = “no” AND coin = tails)
= πp + (1 − π)(1 − p) = p_yes

P(Yi = 0) = π(1 − p) + (1 − π)p = p_no

Utility Analysis
  • Suppose n1 out of the N respondents replied “yes”, and the rest said “no”.
  • What is the best estimate for π?
  • Likelihood: L = C(N, n1) · p_yes^n1 · p_no^(N − n1)
  • Most likely value of π (obtained by setting dL/dπ = 0):

πhat = (n1/N − (1 − p)) / (2p − 1)
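
A simulation sketch tying the mechanism to this estimator (it reuses randomized_response from the earlier sketch; the parameter values are arbitrary):

    import random

    def estimate_pi(responses, p):
        # responses: booleans produced by randomized_response
        n1, N = sum(responses), len(responses)
        return (n1 / N - (1 - p)) / (2 * p - 1)

    pi_true, p, N = 0.3, 0.75, 100_000
    truths = [random.random() < pi_true for _ in range(N)]
    responses = [randomized_response(t, p) for t in truths]
    print(estimate_pi(responses, p))  # close to 0.3, up to sampling noise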

Privacy
  • Adversary’s prior belief: P(Bob’s true answer is “yes”) = θ
  • Suppose Bob answers “yes”.

P(Bob’s true answer is “yes” | Bob says “yes”)

= P(Bob says “yes” AND Bob’s true answer is “yes”) / P(Bob says “yes”)

= P(Bob says “yes” | true = “yes”) · P(true = “yes”) / [P(Bob says “yes” | true = “yes”) · P(true = “yes”) + P(Bob says “yes” | true = “no”) · P(true = “no”)]

= pθ / (pθ + (1 − p)(1 − θ)) ≤ (p / (1 − p)) · θ

Privacy
  • Adversary’s prior belief: P(Bob’s true answer is “yes”) = θ
  • Suppose Bob answers “yes”. Adversary’s posterior belief:

P(Bob’s true answer is “yes” | Bob says “yes”) ≤ (p / (1 − p)) · θ

The adversary’s posterior belief is always bounded by p/(1 − p) times the adversary’s prior belief, irrespective of what the prior is (a numeric check follows).
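
A quick numeric check of this bound (a sketch; p and the priors θ are arbitrary):

    p = 0.75
    for theta in (0.01, 0.1, 0.5):
        posterior = p * theta / (p * theta + (1 - p) * (1 - theta))
        bound = (p / (1 - p)) * theta
        print(theta, round(posterior, 4), round(bound, 4), posterior <= bound)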

Privacy vs Utility tradeoff
  • When p = 1 (return the truthful answer):
    • p/(1 − p) = ∞ : no privacy
    • πhat = n1/N = the true answer
  • When p = ½ (return a random answer):
    • p/(1 − p) = 1 : perfect privacy
    • We cannot estimate π, since the answers are independent of the input:
    • p_yes = πp + (1 − π)(1 − p) = ½(π + 1 − π) = ½ = p_no

Next Class
  • Attacks on naively anonymized data
    • Netflix recommendations
    • Social networks
