
Do You Trust Your Recommender? An Exploration of Privacy and Trust in Recommender Systems

Dan Frankowski, Dan Cosley, Shilad Sen, Tony Lam, Loren Terveen, John Riedl

University of Minnesota

Story: Finding “Subversives”

“…few things tell you as much about a person as the books he chooses to read.”

– Tom Owad, applefritter.com

Session Outline
  • Exposure: undesired access to a person’s information
    • Privacy Risks
    • Preserving Privacy
  • Bias and Sabotage: manipulating a trusted system to manipulate users of that system
Why Do I Care?
  • As a businessperson
    • The nearest competitor is one click away
    • Lose your customers’ trust and they will leave
    • Lose your credibility and they will ignore you
  • As a person
    • Let’s not build Big Brother
Risk of Exposure in One Slide

Private Dataset (YOU) + Public Dataset (YOU) + algorithms = Your private data linked!

Seems bad. How can privacy be preserved?

movielens.org

  • Started ~1995
  • Users rate movies ½ to 5 stars
  • Users get recommendations
  • Private: no one outside GroupLens can see users’ ratings

Anonymized Dataset

  • Released 2003
  • Ratings, some demographic data, but no identifiers
  • Intended for research
  • Public: anyone can download

movielens.org Forums

  • Started June 2005
  • Users talk about movies
  • Public: on the web, no login to read
  • Can forum users be identified in our anonymized dataset?
Research Questions
  • RQ1: RISKS OF DATASET RELEASE: What are the risks to user privacy when releasing a dataset?
  • RQ2: ALTERING THE DATASET: How can dataset owners alter the dataset they release to preserve user privacy?
  • RQ3: SELF DEFENSE: How can users protect their own privacy?
Motivation: Privacy Loss
  • MovieLens forum users did not agree to reveal ratings
  • Anonymized ratings + public forum data = privacy violation?
  • More generally: dataset 1 + dataset 2 = privacy risk?
    • What kind of datasets?
    • What kinds of risks?
Vulnerable Datasets
  • We talk about datasets from a sparse relation space, which:
    • Relates people to items
    • Is sparse (each person has few relations out of the many possible)
    • Has a large space of items (see the toy sketch below)
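
To make the definition concrete, here is a minimal sketch (not from the talk) of a sparse relation space held as a Python dict of sets; all names and values are invented for illustration, and a real space would be far sparser than this toy.

```python
# Hypothetical toy sparse relation space: people -> items they relate to.
ratings = {
    "alice": {"Brazil (1985)", "Primer (2004)"},
    "bob":   {"Brazil (1985)"},
    "carol": {"Primer (2004)", "Memento (2000)", "Alien (1979)"},
}

items = set().union(*ratings.values())
n_possible = len(ratings) * len(items)
n_actual = sum(len(r) for r in ratings.values())

# Sparsity: the fraction of possible person-item relations that are absent.
print(f"{n_actual} of {n_possible} possible relations "
      f"({1 - n_actual / n_possible:.0%} empty)")
```
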
Example Sparse Relation Spaces
  • Examples
    • Customer purchase data from Target
    • Songs played from iTunes
    • Articles edited in Wikipedia
    • Books/Albums/Beers… mentioned by bloggers or on forums
    • Research papers cited in a paper (or review)
    • Groceries bought at Safeway
  • We look at movie ratings and forum mentions, but there are many sparse relation spaces
Risks of Re-identification
  • Re-identification is matching a user in two datasets by using some linking information (e.g., name and address, or movie mentions)
  • Re-identifying to an identified dataset (e.g., with name and address, or social security number) can result in severe privacy loss
Story: Finding Medical Records (Sweeney 2002)

  • 87% of people in the 1990 U.S. census were identifiable by ZIP code, birth date, and gender
  • Those three fields linked “anonymized” medical data to the former Governor of Massachusetts

The Rebus Form

“anonymized” medical data + public voter list (ZIP + birth date + gender) = Governor’s medical records!
Related Work
  • Anonymizing datasets: k-anonymity
    • Sweeney 2002
  • Privacy-preserving data mining
    • Verykios et al 2004, Agrawal et al 2000, …
  • Privacy-preserving recommender systems
    • Polat et al 2003, Berkovsky et al 2005, Ramakrishnan et al 2001
  • Text mining of user comments and opinions
    • Drenner et al 2006, Dave et al 2003, Pang et al 2002
RQ1: Risks of Dataset Release
  • RQ1: What are the risks to user privacy when releasing a dataset?
  • RESULT: 1-identification rate of 31% (the algorithm’s top-ranked candidate is the correct forum user)
  • Ignores rating values entirely!
  • Could do even better if text analysis recovered rating values
  • Rarely-rated items were more identifying
Glorious Linking Assumption
  • People mostly talk about things they know => people tend to have rated what they mentioned
  • Measured P(u rated m | u mentioned m), averaged over all forum users: 0.82 (estimation sketch below)
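
A small sketch, assuming `ratings` and `mentions` each map a user to a set of items (hypothetical inputs, not the talk’s code), of how that linking probability could be measured:

```python
def linking_probability(ratings, mentions):
    """Mean over forum users of P(u rated m | u mentioned m)."""
    fractions = []
    for user, mentioned in mentions.items():
        if not mentioned:
            continue  # skip users with no mentions
        rated = ratings.get(user, set())
        fractions.append(len(mentioned & rated) / len(mentioned))
    return sum(fractions) / len(fractions)

# The talk reports this average as 0.82 for MovieLens forum users.
```
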
Algorithm Idea

[Venn diagram: All Users; within them, users who rated a rarely-rated item; users who rated a popular item; and their overlap, users who rated both]

More mentions => better re-identification

  • With >=16 mentions, we can often 1-identify a user
RQ2: ALTERING THE DATASET
  • How can dataset owners alter the dataset they release to preserve user privacy?
  • Perturbation: change rating values
    • Oops: our scoring algorithm doesn’t use rating values
  • Generalization: group items (e.g., genre)
    • Dataset becomes less useful
  • Suppression: hide data
    • IDEA: release a ratings dataset that suppresses all “rarely-rated” items (see the sketch below)
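
A minimal sketch of owner-side suppression, assuming the released dataset is a list of (user, item, value) tuples with at most one rating per user-item pair; the threshold is an invented parameter, not the paper’s:

```python
from collections import Counter

def suppress_rare_items(ratings, min_raters=5):
    """Drop every rating of an item with fewer than min_raters raters."""
    raters_per_item = Counter(item for _user, item, _value in ratings)
    return [(u, i, v) for (u, i, v) in ratings
            if raters_per_item[i] >= min_raters]
```

As the next slide shows, protecting current users this way required suppressing a very large share of the catalog.
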

Drop 88% of items to protect current users against 1-identification

  • Those 88% of items account for 28% of the ratings
RQ3: SELF DEFENSE
  • RQ3: How can users protect their own privacy?
  • Similar to RQ2, but now per-user
  • Users can change ratings or mentions; we focus on mentions
  • Users can perturb, generalize, or suppress; as before, we study suppression (sketched below)
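
A sketch of self-defense by suppression, assuming the user knows or can estimate `raters_per_item`; dropping the rarest mentions first follows the earlier observation that rarely-rated items are the most identifying. The function name and the 20% default are illustrative only.

```python
def suppress_own_mentions(mentions, raters_per_item, fraction=0.2):
    """Hide the given fraction of the user's mentions, rarest first."""
    by_rarity = sorted(mentions, key=lambda m: raters_per_item.get(m, 0))
    n_drop = int(len(by_rarity) * fraction)
    return set(by_rarity[n_drop:])
```
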

Suppressing 20% of mentions reduced 1-identification somewhat, but did not eliminate it

  • Suppressing more than 20% is not reasonable to ask of a user
Another Strategy: Misdirection
  • What if users mention items they did NOT rate? This might misdirect a re-identification algorithm
  • Create a misdirection list of items; each user takes an unrated item from the list and mentions it, repeating until they are not identified (see the sketch below)
  • What are good misdirection lists?
    • Remember: rarely-rated items are identifying
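
A minimal sketch of the misdirection loop, reusing the hypothetical `rank_candidates` from the Algorithm Idea sketch; the stopping test is simplified (top candidate only, rather than a full 1-identification check) and all names are assumptions.

```python
import random

def misdirect(user, ratings, mentions, misdirection_list):
    """Add unrated items from the list until the user is no longer
    the top-ranked candidate (a simplified 'not 1-identified' test)."""
    mentions = set(mentions)
    unrated = [m for m in misdirection_list
               if m not in ratings.get(user, set())]
    random.shuffle(unrated)
    for item in unrated:
        if rank_candidates(ratings, mentions)[0] != user:
            break  # misdirection succeeded
        mentions.add(item)
    return mentions
```
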

Rarely-rated items don’t misdirect!

  • Popular items do better, though 1-identification isn’t driven to zero
  • It is better to misdirect into a large crowd
  • Rarely-rated items are identifying; popular items are misdirecting
Exposure: What Have We Learned?
  • REAL RISK
    • Re-identification can lead to loss of privacy
    • We found substantial risk of re-identification in our sparse relation space
    • There are a lot of sparse relation spaces
    • More and more of them are available electronically, and we probably appear in many
  • HARD TO PRESERVE PRIVACY
    • The dataset owner had to suppress much of the dataset to protect privacy
    • Users had to suppress many mentions to protect their privacy
    • Users could misdirect somewhat with popular items
Advice: Keep Customers’ Trust
  • Share data rarely
    • Remember the governor: (zip + birthdate + gender) is not anonymous
  • Reduce exposure
    • Example: Google will anonymize search data older than 24 months
AOL: 650K Users, 20M Queries

  • NY Times: user 4417749 searched for “dog that urinates on everything”
  • Data wants to be free: government subpoenas, research, commerce
  • People do not know the risks
  • AOL’s data was text; ours is items
Discussion #1: Exposure
  • Examples of sparse relation spaces?
  • Examples of re-identification risks?
  • How to preserve privacy?