Exploring linkability of user reviews


Exploring Linkability of User Reviews. Mishari Almishari and Gene Tsudik, University of California, Irvine. Roadmap: Introduction; Data Set & Problem Settings; Linkability Results & Improvements; Discussion; Future Work & Conclusion. Motivation: Increasing Popularity of Reviewing Sites.

Presentation Transcript
Exploring Linkability of User Reviews

Mishari Almishari and Gene Tsudik

University of California, Irvine


Roadmap

Introduction

Data Set & Problem Settings

Linkability Results & Improvements

Discussion

Future Work & Conclusion


Motivation

Increasing Popularity of Reviewing Sites

Yelp, more than 39M visitors and 15M reviews in 2010


Example

[Figure: example review, with its category and rating labeled]


Motivation

Rising awareness of privacy


Motivation

How is it applied?

Traceability/Linkability

Linkability of Ad hoc Reviews

Linkability of Several Accounts


Goal

Assess the linkability in user reviews



Data Set

  • 1 Million Reviews

  • 2000 Users

  • Each user has more than 300 reviews


Problem Settings

IR: Identified Record

AR: Anonymous Record

Problem Formulation

[Figure: a matching model takes an Anonymous Record (AR) and the Identified Records (IR’s) as input]

TOP-X Linkability

X: 1 and 10

Anonymous Record (AR) size: 1, 5, 10, 20, …, 60
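
As a rough illustration of Top-X linkability, the sketch below counts how often an AR's true author appears among the first X entries of the ranked IR list returned by a matching model. The function name, the toy rankings, and the user names are hypothetical and not taken from the slides.

```python
def top_x_linkability(rankings, true_owner, x):
    """Fraction of ARs whose true author is among the first x ranked IRs.

    rankings:   dict mapping an AR id to a list of IR owners, best match first
    true_owner: dict mapping an AR id to the user who actually wrote it
    """
    hits = sum(1 for ar, ranked in rankings.items() if true_owner[ar] in ranked[:x])
    return hits / len(rankings)


# Toy data (hypothetical): four anonymous records ranked against three users.
rankings = {
    "ar1": ["alice", "bob", "carol"],
    "ar2": ["carol", "bob", "alice"],
    "ar3": ["bob", "carol", "alice"],
    "ar4": ["alice", "carol", "bob"],
}
true_owner = {"ar1": "alice", "ar2": "bob", "ar3": "carol", "ar4": "bob"}

print(top_x_linkability(rankings, true_owner, 1))  # 0.25
print(top_x_linkability(rankings, true_owner, 2))  # 0.75
```

With X = 1 only an exact first-place match counts, so the ratio can only grow as X increases.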


Methodologies

(1) Naïve Bayesian Model

Decreasing Sorted List of IRs

(2) Kullback-Leibler Divergence (KLD)

Increasing Sorted List of IRs

Maximum-Likelihood Estimation
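
The two rankings above can be sketched as follows, using letter (unigram) tokens only. This is a minimal illustration and not the authors' implementation: the function names are hypothetical, and the add-one smoothing (`alpha`) is an assumption introduced here to avoid taking the logarithm of zero, where the slide mentions only maximum-likelihood estimation.

```python
import math
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def letter_counts(text):
    """Count the letters a-z in a text, ignoring case and other characters."""
    return Counter(c for c in text.lower() if c in ALPHABET)


def smoothed_dist(counts, alpha=1.0):
    """Relative letter frequencies; alpha is an assumed smoothing constant."""
    total = sum(counts.values()) + alpha * len(ALPHABET)
    return {t: (counts[t] + alpha) / total for t in ALPHABET}


def nb_rank(ar_text, irs):
    """Naive Bayes: rank IRs by log-likelihood of the AR, highest first."""
    ar = letter_counts(ar_text)
    scores = {}
    for user, ir_text in irs.items():
        p = smoothed_dist(letter_counts(ir_text))
        scores[user] = sum(n * math.log(p[t]) for t, n in ar.items())
    return sorted(scores, key=scores.get, reverse=True)


def kld_rank(ar_text, irs):
    """KLD: rank IRs by divergence D(AR || IR), lowest first."""
    q = smoothed_dist(letter_counts(ar_text))
    scores = {}
    for user, ir_text in irs.items():
        p = smoothed_dist(letter_counts(ir_text))
        scores[user] = sum(q[t] * math.log(q[t] / p[t]) for t in ALPHABET)
    return sorted(scores, key=scores.get)


# Toy identified records (hypothetical) with deliberately extreme letter usage.
irs = {"u1": "zzzzzzzzzz", "u2": "aaaaaaaaaa"}
print(nb_rank("zzz", irs))   # ['u1', 'u2']
print(kld_rank("zzz", irs))  # ['u1', 'u2']
```

The Naive Bayes list is sorted in decreasing order because a higher likelihood means a better match, while the KLD list is sorted in increasing order because a smaller divergence means closer distributions.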


Tokens

  • Unigram:

    • “privacy”: “p”, “r”, “i”, “v”, “a”, “c”, “y”

    • 26 values

  • Digram

    • “privacy”: “pr”, “ri”, “iv”, “va”, “ac”, “cy”

    • 676 values

  • Rating

    • 5 values

  • Category

    • 28 values
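
The lexical tokens above can be extracted with a few lines of code. The sketch below assumes that text is lowercased, that non-letter characters are dropped, and that digrams do not span word boundaries; these choices are consistent with the 26 and 676 value counts but are not stated on the slide, and the function names are illustrative.

```python
import re


def unigrams(text):
    """Letter tokens: one token per alphabetic character (26 possible values)."""
    return [c for c in text.lower() if "a" <= c <= "z"]


def digrams(text):
    """Letter-pair tokens: adjacent letters within a word (26 * 26 = 676 values)."""
    tokens = []
    for word in re.findall(r"[a-z]+", text.lower()):
        tokens.extend(word[i:i + 2] for i in range(len(word) - 1))
    return tokens


print(unigrams("privacy"))  # ['p', 'r', 'i', 'v', 'a', 'c', 'y']
print(digrams("privacy"))   # ['pr', 'ri', 'iv', 'va', 'ac', 'cy']
```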



NB - Unigram

Unigram Results

[Plot: Linkability Ratio vs. Anonymous Record Size]

Size 60: LR 83% (Top-1), LR 96% (Top-10)


Digram Results

NB - Digram

[Plot: Linkability Ratio vs. Anonymous Record Size]

Size 20: LR 97% (Top-1)

Size 10: LR 88% (Top-1)


Improvement (1): Combining Lexical and Non-Lexical Tokens

NB Model

[Plot: Linkability Ratio vs. Anonymous Record Size]

Gain, up to 20%

Size 30: 60% to 80%

Size 60: 83% to 96%
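
One plausible way to combine lexical and non-lexical tokens in a Naive Bayes model is to add the log-likelihoods of each token type, since the model treats tokens as independent. The sketch below is an assumption about how such a combination could look, not the authors' code; the review dictionaries, the smoothing constant `alpha`, and the two-category list are hypothetical.

```python
import math
from collections import Counter

LETTERS = "abcdefghijklmnopqrstuvwxyz"
RATINGS = [1, 2, 3, 4, 5]


def log_likelihood(ar_counts, ir_counts, vocab, alpha=1.0):
    """Log-likelihood of AR tokens under the IR's smoothed token distribution."""
    total = sum(ir_counts[t] for t in vocab) + alpha * len(vocab)
    return sum(n * math.log((ir_counts[t] + alpha) / total)
               for t, n in ar_counts.items() if t in vocab)


def features(reviews):
    """Per-type token counts for a record: letters, ratings, categories."""
    letters = Counter(c for r in reviews for c in r["text"].lower() if c in LETTERS)
    ratings = Counter(r["rating"] for r in reviews)
    categories = Counter(r["category"] for r in reviews)
    return letters, ratings, categories


def combined_score(ar_reviews, ir_reviews, category_vocab):
    """Sum the per-type log-likelihoods (tokens are treated as independent)."""
    vocabs = (LETTERS, RATINGS, category_vocab)
    pairs = zip(features(ar_reviews), features(ir_reviews), vocabs)
    return sum(log_likelihood(a, i, v) for (a, i, v) in pairs)


# Toy records (hypothetical): identical text, different rating and category habits.
cats = ["restaurants", "hotels"]
u1 = [{"text": "good food", "rating": 5, "category": "restaurants"}]
u2 = [{"text": "good food", "rating": 1, "category": "hotels"}]
ar = [{"text": "good", "rating": 5, "category": "restaurants"}]

print(combined_score(ar, u1, cats) > combined_score(ar, u2, cats))  # True
```

In this toy case the letter distributions of the two users are identical, so only the rating and category tokens separate them.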


What about Restricting Identified Record (IR) Size?

[Plots: Linkability Ratio vs. Anonymous Record Size, one panel for the NB model and one for the KLD model]

Performed better for smaller IR

Affected by IR size

Size 20 or less, improved


Improvement (2): Matching All IR’s At Once

[Figure: graph with nodes v1 through v16]
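
The slide does not preserve the matching procedure itself, so the following is only a sketch of the general idea of matching jointly instead of ranking each AR on its own: if every AR belongs to a different user, an IR claimed by one AR can be ruled out for the others. The greedy one-to-one assignment, the function name, and the toy scores are assumptions made for illustration and should not be read as the authors' algorithm.

```python
def match_all_at_once(scores):
    """Greedy one-to-one assignment of ARs to IRs.

    scores: dict mapping an AR id to a dict of {IR owner: match score},
            where a higher score means a better match.
    """
    pairs = sorted(
        ((s, ar, ir) for ar, row in scores.items() for ir, s in row.items()),
        key=lambda p: p[0],
        reverse=True,
    )
    assigned, used = {}, set()
    for _, ar, ir in pairs:
        # Accept the best remaining pair whose AR and IR are both still free.
        if ar not in assigned and ir not in used:
            assigned[ar] = ir
            used.add(ir)
    return assigned


# Toy scores (hypothetical): matched independently, both ARs would pick u1.
scores = {
    "ar1": {"u1": -10.0, "u2": -11.0},
    "ar2": {"u1": -9.0, "u2": -20.0},
}
print(match_all_at_once(scores))  # {'ar2': 'u1', 'ar1': 'u2'}
```

Here ar2 has the strongest claim on u1, so ar1 falls back to its second choice rather than colliding with it.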


Matching All Results

[Plots: Linkability Ratio vs. Anonymous Record Size, one panel for restricted IR and one for full IR]

Gain, up to 16%

Gain, up to 23%

Size 30: from 74% to 90%

Size 20: from 35% to 55%


Improvement (3): For Small IR Size

Changing it to: 0.5

+ Review Length

[Plot: Linkability Ratio vs. Anonymous Record Size]

Gain up to 5%

Size 10: 89% to 92%

Size 7: 79% to 84%



Discussion

  • Unigram and Scalability

    • 26 VS 676

    • 59 VS 676

    • Less than 10%

  • Prolific Users

    • In the long run, users will be prolific

  • Anonymous Record Size

    • A set of 60 reviews, less than 20% of minimum contribution

  • Detecting Spam Reviews



Future Work

  • Further Improvement for Small AR’s

    • Other Probabilistic Models

    • Using Stylometry

  • Review Anonymization

  • Exploring Linkability in other Preference Databases


Conclusion

  • Extensive Study to Assess Linkability of User Reviews

    • For a large set of users

    • Using very simple features

  • Users are very exposed even with simple features and a large number of authors

Takeaway Point:

Reviews can be accurately de-anonymized using alphabetical letter distributions


