Exploring linkability of user reviews


Exploring Linkability of User Reviews. Mishari Almishari and Gene Tsudik, University of California, Irvine. Roadmap: Introduction; Data Set & Problem Settings; Linkability Results & Improvements; Discussion; Future Work & Conclusion. Motivation: Increasing Popularity of Reviewing Sites.


Exploring Linkability of User Reviews

Mishari Almishari and Gene Tsudik

University of California, Irvine


Roadmap

Introduction

Data Set & Problem Settings

Linkability Results & Improvements

Discussion

Future Work & Conclusion


Motivation

Increasing Popularity of Reviewing Sites

Yelp, more than 39M visitors and 15M reviews in 2010


Example

[Figure: example review annotated with its category and rating]


Motivation

Rising awareness of privacy


Motivation

How is it applied?

Traceability/Linkability

Linkability of Ad hoc Reviews

Linkability of Several Accounts


Goal

Assess the linkability in user reviews


Data Set

  • 1 Million Reviews

  • 2000 Users

  • Each user has more than 300 reviews


Problem Settings

Problem Formulation

IR: Identified Record

AR: Anonymous Record

[Figure: diagram of identified records (IR) and anonymous records (AR)]


Problem Settings

TOP-X Linkability

X: 1 and 10

[Figure: an Anonymous Record (AR) and the Identified Records (IR’s) feed a Matching Model; figure label: 1, 5, 10, 20,…60]
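The slides do not spell out how Top-X linkability is computed. A minimal sketch, assuming the Linkability Ratio is the fraction of anonymous records whose true author appears among the first X entries of the ranked list returned by the matching model (the function name `top_x_linkability` and the `rankings` layout are hypothetical):

```python
def top_x_linkability(rankings, x):
    """Fraction of ARs whose true author is within the top-x ranked IRs.

    rankings maps each AR's true author to the ranked list of candidate
    authors produced by some matching model (assumed layout).
    """
    hits = sum(1 for true_author, ranked in rankings.items()
               if true_author in ranked[:x])
    return hits / len(rankings)


# Toy rankings for three anonymous records.
rankings = {
    "u1": ["u1", "u2", "u3"],
    "u2": ["u3", "u2", "u1"],
    "u3": ["u1", "u2", "u3"],
}
top1 = top_x_linkability(rankings, 1)
top2 = top_x_linkability(rankings, 2)
```

With X = 1 only an exact first-place match counts; a larger X also credits near misses.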


Methodologies

(1) Naïve Bayesian Model

Decreasing Sorted List of IRs

(2) Kullback-Leibler Divergence (KLD)

Increasing Sorted List of IRs

Maximum-Likelihood Estimation
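The two rankings can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: token probabilities are estimated by maximum likelihood with additive smoothing (the constant `alpha` is an assumption), the Naïve Bayes score is the log-likelihood of the AR's tokens under each IR, and the divergence is taken from the AR distribution to the IR distribution (the direction is also an assumption). All function names are hypothetical.

```python
import math
from collections import Counter


def distribution(counts, vocab, alpha=1.0):
    """Smoothed maximum-likelihood estimate of token probabilities.
    alpha is an assumed additive-smoothing constant."""
    total = sum(counts[t] for t in vocab) + alpha * len(vocab)
    return {t: (counts[t] + alpha) / total for t in vocab}


def nb_log_likelihood(ar_counts, ir_dist):
    """Log-probability of the AR's tokens under one IR's distribution."""
    return sum(c * math.log(ir_dist[t])
               for t, c in ar_counts.items() if t in ir_dist)


def kld(p, q):
    """Kullback-Leibler divergence D(p || q) over a shared vocabulary."""
    return sum(p[t] * math.log(p[t] / q[t]) for t in p if p[t] > 0)


def rank_nb(ar_counts, ir_counts_by_user, vocab):
    """IRs sorted by decreasing Naive Bayes score (most likely first)."""
    scores = {u: nb_log_likelihood(ar_counts, distribution(c, vocab))
              for u, c in ir_counts_by_user.items()}
    return sorted(scores, key=scores.get, reverse=True)


def rank_kld(ar_counts, ir_counts_by_user, vocab):
    """IRs sorted by increasing divergence from the AR (closest first)."""
    ar_dist = distribution(ar_counts, vocab)
    scores = {u: kld(ar_dist, distribution(c, vocab))
              for u, c in ir_counts_by_user.items()}
    return sorted(scores, key=scores.get)


# Toy data: a three-letter vocabulary and two identified records.
vocab = "abc"
irs = {"alice": Counter("aaab"), "bob": Counter("bccc")}
ar = Counter("aab")
nb_ranking = rank_nb(ar, irs, vocab)
kld_ranking = rank_kld(ar, irs, vocab)
```

Both rankings put the most plausible author first; they differ only in whether a high score (NB) or a low score (KLD) indicates a match.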


Tokens

  • Unigram:

    • “privacy”: “p”, “r”, “i”, “v”, “a”, “c”, “y”

    • 26 values

  • Digram

    • “privacy”: “pr”, “ri”, “iv”, “va”, “ac”, “cy”

    • 676 values

  • Rating

    • 5 values

  • Category

    • 28 values
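The lexical tokens above can be extracted with a few lines of code. A minimal sketch, assuming text is lower-cased, non-letters act as word separators, and digrams do not span word boundaries (none of these details are stated on the slide; `letter_tokens` is a hypothetical name):

```python
import string
from collections import Counter


def letter_tokens(text, n):
    """Count letter n-grams inside each word (n=1 unigram, n=2 digram)."""
    counts = Counter()
    # Lower-case the text and turn every non-letter into a separator.
    cleaned = "".join(ch if ch in string.ascii_lowercase else " "
                      for ch in text.lower())
    for word in cleaned.split():
        for i in range(len(word) - n + 1):
            counts[word[i:i + n]] += 1
    return counts


unigrams = letter_tokens("privacy", 1)
digrams = letter_tokens("privacy", 2)

# Token-space sizes quoted on the slide: 26 letters, 26 * 26 letter pairs.
unigram_space = len(string.ascii_lowercase)
digram_space = unigram_space ** 2
```

The resulting counters can serve as the per-record token counts that a matching model consumes.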


Unigram Results (NB - Unigram)

[Figure: Linkability Ratio (LR) vs. Anonymous Record Size]

  • Size 60: LR 83% (Top-1)

  • Size 60: LR 96% (Top-10)


Digram Results (NB - Digram)

[Figure: Linkability Ratio (LR) vs. Anonymous Record Size]

  • Size 20: LR 97% (Top-1)

  • Size 10: LR 88% (Top-1)


Improvement (1): Combining Lexical and Non-Lexical Tokens

NB Model

[Figure: Linkability Ratio vs. Anonymous Record Size]

  • Gain, up to 20%

  • Size 30: 60% to 80%

  • Size 60: 83% to 96%


What about Restricting Identified Record (IR) Size?

[Figure: two panels, NB Model and KLD Model, each plotting Linkability Ratio vs. Anonymous Record Size]

  • Performed better for smaller IR

  • Affected by IR size

  • Size 20 or less, improved


Improvement (2): Matching All IR’s At Once

[Figure: diagram with nodes v1 through v16]
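The transcript does not say which algorithm performs the joint matching. One plausible reading is that anonymous records are assigned to identified records together, so that each IR is used at most once. The sketch below illustrates that idea with a simple greedy one-to-one assignment; the greedy strategy, the function name `greedy_match`, and the score layout are all assumptions made for illustration, not the authors' method.

```python
def greedy_match(scores):
    """Greedy one-to-one assignment of ARs to IRs.

    scores maps (ar, ir) pairs to a similarity value, higher is better
    (assumed layout). Pairs are taken in decreasing score order and a
    pair is accepted only if neither side is already matched.
    """
    assignment = {}
    used_irs = set()
    ordered = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    for (ar, ir), _score in ordered:
        if ar not in assignment and ir not in used_irs:
            assignment[ar] = ir
            used_irs.add(ir)
    return assignment


# Matched independently, both ARs would pick "u1"; matched jointly,
# "a2" falls back to "u2" because "u1" is already taken by "a1".
scores = {
    ("a1", "u1"): 0.9,
    ("a1", "u2"): 0.8,
    ("a2", "u1"): 0.85,
    ("a2", "u2"): 0.1,
}
matching = greedy_match(scores)
```

A greedy pass is not guaranteed to find the assignment with the best total score; it is used here only because it keeps the example short.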


Matching All Results

[Figure: two panels, Restricted IR and Full IR, each plotting Linkability Ratio vs. Anonymous Record Size]

  • Gain, up to 16% (Size 30: from 74% to 90%)

  • Gain, up to 23% (Size 20: from 35% to 55%)


Improvement (3): For Small IR Size

  • Changing it to: 0.5

  • + Review Length

  • Gain up to 5%

  • Size 10: 89% to 92%

  • Size 7: 79% to 84%

[Figure: Linkability Ratio vs. Anonymous Record Size]


Discussion

  • Unigram and Scalability

    • 26 VS 676

    • 59 VS 676

    • Less than 10%

  • Prolific Users

    • In the long run, users will become prolific

  • Anonymous Record Size

    • A set of 60 reviews, less than 20% of minimum contribution

  • Detecting Spam Reviews


Future Work

  • Further Improvements for Small AR’s

    • Other Probabilistic Models

    • Using Stylometry

  • Review Anonymization

  • Exploring Linkability in other Preference Databases


Conclusion

  • Extensive Study to Assess Linkability of User Reviews

    • For a large set of users

    • Using very simple features

  • Users are very exposed, even with simple features and a large number of authors

Takeaway Point:

Reviews can be accurately de-anonymized using alphabetical letter distributions


Questions?

