
SpotRank: A Robust Voting System for Social News Websites

Thomas Largillier, Guillaume Peyronnet, Sylvain Peyronnet

Univ Paris-Sud LRI, Nalrem Medias, Univ Paris-Sud LRI

WICOW’10

January 26 2011

Presented by Somin Kim

Outline
  • Introduction
  • Related Work
  • SpotRank Algorithm
  • Experiments
  • Conclusion
Introduction
  • On a social news website, users share content they found on the web and vote for the news items they like most
    • A vote for a news item is treated as a recommendation
    • News items with enough recommendations are displayed on the front page
Introduction
  • Users are tempted to use malicious techniques to gain visibility for their own websites
    • Reaching the front page of a website such as Digg is very attractive: it can bring thousands of unique visitors within one day
  • The top users act together to get the websites they support displayed on the front page
    • Using daily mailing lists
    • Posting hundreds of links
    • Voting for themselves
Outline
  • Introduction
  • Related Work
  • SpotRank Algorithm
  • Experiments
  • Conclusion
Related Work
  • Spam countermeasures for social websites
    • Identification-based methods: detection of spam and spammers
    • Rank-based methods: demotion of spam
    • Limit-based methods: preventing spam by making spam content difficult to publish
  • Related fields of research
    • Machine-learning-based ranking frameworks for social media
    • Detection of click fraud in pay-per-click advertising
    • Giving users a good selection of news
  • We focus on techniques that demote votes that are malicious or cast by users known to be malicious
Outline
  • Introduction
  • Related Work
  • SpotRank Algorithm
    • Framework and principle
    • Proposing a spot
    • Voting for a spot
    • Detecting cabals
  • Experiments
  • Conclusion
SpotRank Algorithm: Framework and principle
  • U: the community of users who use the voting system
  • S: the set of spots
    • Spot: a news item or other content proposed by a user
  • V: the set of all votes
    • A vote is a triple (u, s, v) where u, v ∈ U and s ∈ S
  • Some notations :
SpotRank Algorithm: Framework and principle
  • Two votes do not necessarily have the same value
    • Each vote is assigned a score that depends on many factors
    • The higher the score of a spot, the closer the spot is to first place
  • Pertinence
    • The pertinence of a user depends on the pertinence of the spots he voted for, and vice versa
SpotRank Algorithm: Proposing a spot
  • When a user proposes a spot, the score of the spot must be initialized
      • n: the number of spots proposed by the user in the last 24 hours
      • m: the number of spots posted from the user’s IP in the last 20 minutes
  • With this formula, we prevent effective “spot bombing” by spammers
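
The initialization formula itself is not reproduced in this transcript, so the sketch below only shows how its two inputs, n and m, could be counted. The log layout (a list of `(user, ip, timestamp)` tuples) and the function name `recent_counts` are assumptions made for illustration.

```python
from datetime import datetime, timedelta

USER_WINDOW = timedelta(hours=24)   # window for n, as stated on the slide
IP_WINDOW = timedelta(minutes=20)   # window for m, as stated on the slide

def recent_counts(proposals, user, ip, now):
    """Return (n, m) for a new proposal by `user` coming from address `ip`.

    `proposals` is a hypothetical log of (user, ip, timestamp) tuples
    describing previously proposed spots.
    """
    n = sum(1 for u, _, t in proposals if u == user and now - t <= USER_WINDOW)
    m = sum(1 for _, a, t in proposals if a == ip and now - t <= IP_WINDOW)
    return n, m

now = datetime(2009, 10, 1, 12, 0)
log = [
    ("alice", "10.0.0.1", now - timedelta(hours=30)),    # too old for n and m
    ("alice", "10.0.0.1", now - timedelta(hours=2)),     # counts for n only
    ("bob",   "10.0.0.1", now - timedelta(minutes=5)),   # counts for m only
    ("alice", "10.0.0.1", now - timedelta(minutes=10)),  # counts for n and m
]
print(recent_counts(log, "alice", "10.0.0.1", now))  # (2, 2)
```

Counting m per IP address rather than per account is what lets the initial score react to one machine posting under several accounts.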
SpotRank Algorithm: Voting for a spot
  • Once a spot has been proposed, it can be “pushed” to the front page according to its score
    • The base score of a vote is the pertinence of the voter
    • This value is then modified according to several criteria to give the final score of the vote
  • Voting is the most important part, and the one spammers will concentrate on
    • We propose a set of filters that aim to counter all the attacks a spammer could think of
SpotRank Algorithm: Voting for a spot
  • Base value of a vote: pertinence
    • Pert(u) is the mean pertinence of the spots u voted for
    • Pert(s) is the score of s divided by the number of votes it received
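
The two definitions above can be sketched as follows. The function names and the values returned for a spot without votes or a user without votes are assumptions; the slide does not define those cases.

```python
def spot_pertinence(score, vote_count):
    """Pert(s): score of the spot divided by the number of votes it received."""
    if vote_count == 0:
        return 0.0  # assumption: the zero-vote case is not defined on the slide
    return score / vote_count

def user_pertinence(voted_spots, default=0.0):
    """Pert(u): mean pertinence of the spots u voted for.

    `voted_spots` is a list of (score, vote_count) pairs, one per spot;
    `default` is an assumed value for users who have not voted yet.
    """
    if not voted_spots:
        return default
    total = sum(spot_pertinence(s, c) for s, c in voted_spots)
    return total / len(voted_spots)

print(spot_pertinence(120.0, 4))                 # 30.0
print(user_pertinence([(120.0, 4), (50.0, 5)]))  # 20.0
```

Because the base value of a vote is the pertinence of the voter, and spot scores are sums of vote scores, the two quantities depend on each other and have to be recomputed as votes arrive.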
SpotRank Algorithm: Voting for a spot
  • High-frequency voting
    • A typical spammer votes for many spots in a short amount of time
      • α4 is the time interval considered reasonable between two votes
SpotRank Algorithm: Voting for a spot
  • Abusive one-way voting
    • A typical spammer uses several accounts
      • One clean account to propose spots
      • Several disposable accounts to vote for the spots proposed by the clean account
    • Users who vote only for one specific user will see their votes become useless
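
A minimal sketch of the one-way voting filter, assuming a hard cut-off: the weight drops to 0 when all of a voter's votes went to spots from a single proposer. The `min_votes` threshold is a hypothetical addition so that a newcomer's first votes are not zeroed immediately; the actual filter may be more gradual.

```python
def one_way_weight(voter, votes, proposer_of, min_votes=3):
    """Return 0.0 if `voter` only ever voted for one proposer, else 1.0.

    votes:       list of (voter, spot) pairs
    proposer_of: mapping spot -> user who proposed it
    min_votes:   hypothetical minimum history before the filter applies
    """
    spots = [s for u, s in votes if u == voter]
    if len(spots) < min_votes:
        return 1.0
    targets = {proposer_of[s] for s in spots}
    return 0.0 if len(targets) == 1 else 1.0

proposer_of = {"s1": "clean", "s2": "clean", "s3": "clean", "s4": "dave"}
votes = [("sock", "s1"), ("sock", "s2"), ("sock", "s3"),
         ("erin", "s1"), ("erin", "s4"), ("erin", "s2")]
print(one_way_weight("sock", votes, proposer_of))  # 0.0
print(one_way_weight("erin", votes, proposer_of))  # 1.0
```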
SpotRank Algorithm: Voting for a spot
  • Quick voting
    • A spammer typically proposes a spot and quickly votes for it
      • A spammer will not stay long on any given website
    • To avoid quick voting, we block any vote during the first minute after the spot s appears on the site; after that we use a stair function time(s)
      • t: current time
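
The one-minute block comes from the slide; the steps of the stair function are not given in this transcript. In the sketch below the step boundaries, the multipliers, and the choice to let the factor grow with the age of the spot are all hypothetical.

```python
BLOCK_SECONDS = 60  # from the slide: no vote during the first minute

# Hypothetical stair steps: (age in seconds from which the step applies, multiplier).
STEPS = [(60, 0.25), (300, 0.5), (900, 1.0)]

def time_factor(proposed_at, t):
    """Return None if the vote is blocked, else a stair-function multiplier.

    proposed_at and t are timestamps in seconds; t is the current time.
    """
    age = t - proposed_at
    if age < BLOCK_SECONDS:
        return None
    factor = STEPS[0][1]
    for start, value in STEPS:
        if age >= start:
            factor = value
    return factor

print(time_factor(0, 30))    # None
print(time_factor(0, 400))   # 0.5
print(time_factor(0, 5000))  # 1.0
```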
SpotRank Algorithm: Voting for a spot
  • Multiple avatars and physical community
    • SpotRank demotes votes for a given spot if they come from the same IP address
      • A typical spammer has many accounts and sometimes also uses automatic voting mechanisms
      • These voting bots are often hosted on only a few servers, so they share the same IP address (or only a few IP addresses)
        • n: number of previous votes from this IP address
SpotRank Algorithm: Voting for a spot
  • Avoiding the voting list effect
    • A group of people can unite their efforts to promote their own spots
      • This is classically done through daily mailing lists
    • If a user u votes for a user u’ and both users are in the same cluster, the value of the vote is weighted by the inverse of the size of this cluster
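
The cluster rule can be sketched as a simple weight lookup. The data layout (`cluster_of` and `clusters`) and the weight of 1.0 for votes outside a shared cluster are assumptions for illustration.

```python
def cluster_weight(voter, proposer, cluster_of, clusters):
    """Weight applied to a vote from `voter` for a spot proposed by `proposer`.

    cluster_of: mapping user -> cluster id (users in no cluster are absent)
    clusters:   mapping cluster id -> set of member users
    """
    c = cluster_of.get(voter)
    if c is not None and c == cluster_of.get(proposer):
        return 1.0 / len(clusters[c])  # inverse of the cluster size
    return 1.0

clusters = {"k1": {"a", "b", "c", "d"}}
cluster_of = {u: "k1" for u in clusters["k1"]}
print(cluster_weight("a", "b", cluster_of, clusters))  # 0.25
print(cluster_weight("a", "z", cluster_of, clusters))  # 1.0
```

With this weighting, all members of a cluster of size k voting for one of their own contribute at most about one full vote in total, so growing the cluster does not help it.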
SpotRank Algorithm: Voting for a spot
  • Summary: Computation of the actual score of a vote
SpotRank Algorithm: Voting for a spot
  • Computation of the score of a spot
    • The score of a spot is simply the sum of the scores of all votes for this spot plus the initial score of the spot
      • The score of a spot s is updated each time a user votes for it, and also periodically, since the time decay varies over time
    • Time decay is used to promote new spots over old, strong spots
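
A minimal sketch of the score bookkeeping described above. The class and method names are assumptions, and time decay is left out because its formula does not appear in this transcript.

```python
class Spot:
    """Keeps the initial score and the scores of the votes received."""

    def __init__(self, initial_score):
        self.initial_score = initial_score
        self.vote_scores = []

    def add_vote(self, vote_score):
        """Record a vote and return the updated score of the spot."""
        self.vote_scores.append(vote_score)
        return self.score()

    def score(self):
        """Initial score plus the sum of all vote scores (no time decay)."""
        return self.initial_score + sum(self.vote_scores)

spot = Spot(initial_score=10.0)
print(spot.add_vote(5.0))  # 15.0
print(spot.add_vote(2.5))  # 17.5
```

A periodic job calling `score()` with a decay term added would cover the second update path mentioned on the slide.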
SpotRank Algorithm: Detecting cabals
  • We propose to group together people who massively vote for one another
    • The following algorithm should be run regularly to identify new cabals and update the existing ones
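
The algorithm from the slide is not reproduced in this transcript. As a hypothetical stand-in, and not the authors' method, the sketch below groups users linked by heavy mutual voting using a union-find structure; the `threshold` parameter and the requirement that votes go in both directions are assumptions.

```python
from collections import Counter

def find_cabals(votes, threshold=3):
    """Group users connected by at least `threshold` votes in each direction.

    votes: iterable of (voter, proposer) pairs, one per vote cast.
    Returns a list of sets of users.
    """
    counts = Counter((v, p) for v, p in votes if v != p)
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (v, p), c in counts.items():
        if c >= threshold and counts.get((p, v), 0) >= threshold:
            parent[find(v)] = find(p)  # merge the two groups

    groups = {}
    for user in list(parent):
        groups.setdefault(find(user), set()).add(user)
    return [g for g in groups.values() if len(g) > 1]

votes = ([("a", "b")] * 3 + [("b", "a")] * 3 +
         [("b", "c")] * 3 + [("c", "b")] * 3 +
         [("d", "a")] * 5)  # d votes for a, but a never votes back
print(find_cabals(votes))
```

Running such a grouping regularly would yield the clusters used to weight votes between members of the same group.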
Outline
  • Introduction
  • Related Work
  • SpotRank Algorithm
  • Experiments
    • Log analysis of spotrank.fr
    • Human evaluation
  • Conclusion
Experiments
  • To collect data about the behavior of SpotRank, spotrank.fr was launched
    • The data were collected from 09/07/2009 to 10/26/2009
    • 15,600 visits, 43,000 page views
    • Average time spent by a visitor on the website: 2:37 minutes
  • We estimated that at least 10 to 15% of accounts belong to spammers
Experiments: Log analysis of spotrank.fr
  • % of users with regard to pertinence
    • As time passes and the number of users grows, the pertinence of the users tends to spread out more

(Figure panels: 2009/07/23, 2009/09/08, 2009/10/26)

    • Two categories of users
      • Non-relevant users: pertinence(u) < 10
        • This category contains mainly spammers
      • Relevant users: pertinence(u) > 50 (except newcomers)
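
The two thresholds above translate into a small classifier. The function name, the `is_newcomer` flag, and the label "other" for users who fall in neither category are assumptions for illustration.

```python
def classify(pertinence, is_newcomer=False):
    """Assign a user to one of the two categories from the slide, or to 'other'."""
    if pertinence < 10:
        return "non-relevant"
    if pertinence > 50 and not is_newcomer:  # newcomers are excluded from this category
        return "relevant"
    return "other"

print(classify(5))         # non-relevant
print(classify(80))        # relevant
print(classify(80, True))  # other
print(classify(30))        # other
```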
Experiments: Log analysis of spotrank.fr
  • % of low- and high-pertinence users over time (3 months)
    • The percentage of non-relevant users, including spammers, decreases while the percentage of relevant users increases
Experiments: Log analysis of spotrank.fr
  • # users versus # proposed spots
    • The majority of users propose only a few spots (fewer than 3)
    • A few people have an oddly high number of proposed spots
      • Most of them are spammers
Experiments: Log analysis of spotrank.fr
  • % users with regard to # votes
    • Most users do not vote much
    • The people who vote the most are clearly the ones we suspect to be spammers
Experiments: Log analysis of spotrank.fr
  • # votes versus their scores
    • Most votes have a very low score
    • Most legitimate users seem to have votes with scores between 5 and 50
Experiments: Human evaluation
  • We compared the top “stories” of spotrank.fr and of two other major social news websites in France
  • Survey protocol
    • Periodically collect the first five spots on each website
    • Generate a webpage containing a shuffled list of the 15 news items
    • Each webpage is sent to a volunteer, who has to answer for each news item:
      • Yes: it is relevant for the news item to appear on the front page of a social news website
      • No: it is not relevant for the news item to appear on the front page of a social news website
      • DnK: he is not able to determine whether the news item deserves to be on the front page
      • Err: the news item was not accessible when he tried
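
The survey protocol above can be sketched as two steps: building a shuffled page and tallying the answers per website. The site names, the data layout, and the function names are hypothetical.

```python
import random
from collections import Counter

ANSWERS = ("Yes", "No", "DnK", "Err")

def build_page(top_by_site, rng):
    """Take the first five spots of each site and return them shuffled.

    Each entry is a (site, spot) pair so answers can be attributed later.
    """
    items = [(site, spot)
             for site, spots in top_by_site.items()
             for spot in spots[:5]]
    rng.shuffle(items)
    return items

def tally(page, answers):
    """Count answers per site; answers[i] is the volunteer's answer for page[i]."""
    counts = {}
    for (site, _), answer in zip(page, answers):
        if answer not in ANSWERS:
            raise ValueError(f"unknown answer: {answer}")
        counts.setdefault(site, Counter())[answer] += 1
    return counts

top = {site: [f"{site}-{i}" for i in range(1, 8)]
       for site in ("spotrank", "siteA", "siteB")}
page = build_page(top, random.Random(0))
print(len(page))  # 15
result = tally(page, ["Yes"] * 15)
print({site: dict(c) for site, c in result.items()})
```

Shuffling the 15 items hides which website each one came from, so the volunteer judges the news item rather than its source.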
Experiments: Human evaluation
  • # answers of each type
    • The ranking given by SpotRank is of higher quality than that of the two other websites
    • The filtering of SpotRank gives clearer results
Experiments: Human evaluation
  • Rank with regard to the number of Yes, No, DnK
    • The user satisfaction survey clearly shows that the filtering of SpotRank is perceived to be of high quality

(Figure panels: Yes, No, DnK)

Outline
  • Introduction
  • Related Work
  • SpotRank Algorithm
  • Experiments
  • Conclusion
Conclusion
  • We presented a robust voting system for social news websites
    • It demotes the effect of manipulation
  • SpotRank clearly outperforms real competitors in a real-life web ecosystem