P2P recommender systems : a (small) survey

P2P recommender systems: a (small) survey. Giulio Rossetti

Talk Outline

What are Recommender Systems?
• RSs are a class of information filtering systems that seek to predict the rating or preference that a user would give to an item (such as music, books, or movies) or a social element (e.g. people or groups) they have not yet considered.
• The prediction uses a model built from the characteristics of:
  • items (content-based approaches), or
  • the user's social environment (collaborative filtering approaches)

Why recommender systems?
Nowadays the amount of information we retrieve has become enormous (Big Data). What we really need is a technology that can help us find resources of interest among the overwhelming amount of available data.
“[…] a personalized information filtering used to either predict whether a particular user will like a particular item (prediction problem) or to identify a set of N items that will be of interest to a certain user.”

Well-known families of approaches

Centralized Approaches
Two main families of methodologies have been studied in recent years:
• User-based CF: CF algorithms that work on the assumption that each user belongs to a group of similarly behaving users. Recommendations are built from the items liked by the users of that group: users who are similar (have similar attributes) are expected to be interested in the same items.
• Item-based CF: CF algorithms that look at the similarity between items to make a prediction. The idea is that users are most likely to purchase items similar to the ones they already bought in the past, so by analyzing the purchasing information we can get an idea of what a user may want in the future.
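The two families can be contrasted on a toy example. The sketch below is only illustrative: the `ratings` data, the function names, and the choice of cosine similarity with a similarity-weighted average are assumptions made here, not something the slides prescribe.

```python
from math import sqrt

# Hypothetical toy data: user -> {item: rating}
ratings = {
    "alice": {"a": 5, "b": 3, "c": 4},
    "bob":   {"a": 4, "b": 2, "c": 5, "d": 4},
    "carol": {"a": 1, "b": 5, "d": 2},
}

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[k] * v[k] for k in common)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)

def predict_user_based(ratings, user, item):
    """Average the ratings other users gave to `item`, weighted by user similarity."""
    num = den = 0.0
    for other, theirs in ratings.items():
        if other == user or item not in theirs:
            continue
        sim = cosine(ratings[user], theirs)
        num += sim * theirs[item]
        den += abs(sim)
    return num / den if den else None

def item_vectors(ratings):
    """Transpose the data: item -> {user: rating}."""
    items = {}
    for user, rated in ratings.items():
        for item, r in rated.items():
            items.setdefault(item, {})[user] = r
    return items

def predict_item_based(ratings, user, item):
    """Average the user's own ratings, weighted by item-to-item similarity."""
    items = item_vectors(ratings)
    if item not in items:
        return None
    num = den = 0.0
    for other_item, r in ratings[user].items():
        if other_item == item:
            continue
        sim = cosine(items[item], items[other_item])
        num += sim * r
        den += abs(sim)
    return num / den if den else None

print(predict_user_based(ratings, "alice", "d"))
print(predict_item_based(ratings, "alice", "d"))
```

The user-based variant compares rows of the rating matrix (users), the item-based variant compares columns (items); everything else is the same weighted average.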

P2P: Motivations
The need for efficient decentralized recommender systems has been appreciated for some time, both for the intrinsic advantages of decentralization and for the necessity of integrating recommender systems into P2P applications. The two main advantages are:
• the predictions can be distributed among all users, removing the need for a costly central server and enhancing scalability
• a decentralized recommender improves the privacy of the users, since there is no central entity storing and owning the private information of the users.

P2P Recommender systems: a small survey

User-Based Collaborative Filtering
R. Ormándi, I. Hegedűs and M. Jelasity
• Node balancing issue: overlay topologies defined by node similarity often have highly unbalanced degree distributions (e.g. power-law).
• Overlay management: how can the best possible overlay for computing recommendation scores be built and maintained, while taking care of bandwidth usage at the nodes?
• Desiderata: a minimal, uniform load from overlay management even when the in-degree distribution of the expected overlay graph is unbalanced.
• Approaches: BuddyCast, kNN (Random Sampling & T-MAN)

BuddyCast
• Each node's local view contains a full descriptor of the node's neighbors (i.e. their ratings). Computing recommendations does not load the network (local information approach).
• Load balancing:
  • Block list: if a node communicates with another peer, that peer is put on the block list for a few hours.
  • Candidate list: contains close peers for potential communication.
  • Random list: contains random samples from the network.
• For overlay maintenance, each node connects to the best node from the candidate list with probability α, and to a node from the random list with probability 1−α, and exchanges its buddy list with the selected peer.
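The peer-selection rule above can be sketched as follows. This is a minimal illustration, not the BuddyCast implementation: `select_peer`, its parameters, the dict-based block list, and the 4-hour `BLOCK_SECONDS` value are all assumptions (the slide only says "a few hours").

```python
import random

BLOCK_SECONDS = 4 * 3600  # assumed value; the slide only says "a few hours"

def select_peer(candidates, random_peers, similarity, alpha, block_list, now, rng=random):
    """Return the peer to contact next, or None if every known peer is blocked.

    candidates, random_peers: lists of peer ids (candidate list, random list)
    similarity: function peer_id -> float, higher means closer
    block_list: dict peer_id -> time until which the peer stays blocked (mutated)
    """
    def is_free(peer):
        return block_list.get(peer, 0) <= now

    free_candidates = [p for p in candidates if is_free(p)]
    free_random = [p for p in random_peers if is_free(p)]

    chosen = None
    # With probability alpha exploit the best candidate, otherwise explore a
    # random peer; fall back to whichever list still has a free peer.
    if free_candidates and (rng.random() < alpha or not free_random):
        chosen = max(free_candidates, key=similarity)
    elif free_random:
        chosen = rng.choice(free_random)

    if chosen is not None:
        block_list[chosen] = now + BLOCK_SECONDS  # do not contact again too soon
    return chosen
```

A larger α makes the overlay converge towards similar peers faster, while a smaller α spreads the communication load more evenly over the network.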

kNN: Random Samples
• Every node has a local view of size k that contains node descriptors.
• Each node is initialized with k random samples from the network; the views then iteratively approximate the kNN graph.
• The convergence is based on an iterative random sampling process.
• Random nodes are inserted into the view (which is implemented as a bounded priority queue).
• The queue's priority is based on the similarity function provided by the recommender module.
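A bounded priority queue of this kind can be sketched with a min-heap. The class name `BoundedView`, the integer node ids, and the toy similarity function are assumptions for illustration; in a real deployment the random samples would come from a peer sampling service rather than from a local list.

```python
import heapq
import random

class BoundedView:
    """Keeps the k most similar node descriptors seen so far."""

    def __init__(self, k, similarity):
        self.k = k
        self.similarity = similarity  # function node -> float, higher is closer
        self._heap = []               # min-heap of (similarity, node); worst entry on top
        self._members = set()

    def insert(self, node):
        if node in self._members:
            return
        entry = (self.similarity(node), node)
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, entry)
            self._members.add(node)
        elif entry > self._heap[0]:
            # The new node beats the current worst one: swap them.
            _, evicted = heapq.heapreplace(self._heap, entry)
            self._members.discard(evicted)
            self._members.add(node)

    def nodes(self):
        """View content, most similar first."""
        return [node for _, node in sorted(self._heap, reverse=True)]

# Toy run: node ids are integers and similarity is closeness to id 42.
rng = random.Random(0)
view = BoundedView(k=3, similarity=lambda n: -abs(n - 42))
for sample in rng.sample(range(100), 100):  # stand-in for random peer samples
    view.insert(sample)
print(view.nodes())
```

Each insertion costs O(log k), and the view can only improve over time because an entry is evicted only when a strictly better one arrives.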

kNN: T-MAN sampling
• The overlay is managed with the T-MAN algorithm.
• T-MAN periodically updates the node's view (of size k) by:
  • selecting a peer node to communicate with
  • exchanging its view with the peer
  • merging the two views and keeping the closest k descriptors
• Peer (communication) selection methods:
  • Global: selects the node from the whole network at random
  • View: selects the node from the view uniformly at random
  • Proportional: selects a node from the view, but with a different probability distribution
  • Best: selects the most similar node without any restriction
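The select / exchange / merge cycle can be sketched as below. This is a simplified, centralized simulation under stated assumptions: `merge_views` and `tman_round` are hypothetical names, `distance` is any user-supplied function (lower means closer), and only the "Global" selection method is shown.

```python
import random

def merge_views(own_id, view_a, view_b, k, distance):
    """Merge two views and keep the k descriptors closest to own_id."""
    pool = (set(view_a) | set(view_b)) - {own_id}
    return sorted(pool, key=lambda n: (distance(own_id, n), n))[:k]

def tman_round(views, k, distance, rng):
    """One round: every node exchanges views with one randomly chosen peer.

    views: dict node_id -> list of node ids (mutated in place)
    """
    nodes = list(views)
    for node in nodes:
        # "Global" selection: any other node in the network, uniformly at random.
        peer = rng.choice([n for n in nodes if n != node])
        # Each side sends its current view plus its own descriptor.
        sent_by_node = views[node] + [node]
        sent_by_peer = views[peer] + [peer]
        views[node] = merge_views(node, views[node], sent_by_peer, k, distance)
        views[peer] = merge_views(peer, views[peer], sent_by_node, k, distance)

# Toy run on 20 nodes placed on a ring, starting from random views of size 3.
n = 20
ring = lambda a, b: min(abs(a - b), n - abs(a - b))
rng = random.Random(1)
views = {i: rng.sample([j for j in range(n) if j != i], 3) for i in range(n)}
for _ in range(10):
    tman_round(views, 3, ring, rng)
```

The other selection methods only change how `peer` is picked (from the view instead of the whole network); the merge step stays the same, so a view never gets worse from one exchange to the next.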

User-based CF: Observations
• With unbalanced distributions it is not optimal to use the kNN (T-MAN Best) view: a more relaxed one can give better recommendation performance.
• Overlay construction converges reasonably fast, both with random updates and with T-MAN.
• T-MAN with Global selection is a good choice: it has a fully uniform load distribution combined with an acceptable convergence speed, which is better than that of the random view update.

P2Prec: a social-based P2P recommender system
F. Draidi and E. Pacitti
The idea: recommend high quality documents related to the query topics and held by friends (or FOAF) who are experts on the topics related to the query.
Assumptions:
• each node represents a peer labelled with the contents it stores and its topics of interest;
• expertise is deduced from the contents stored by a user;
• the topics each peer is interested in are calculated by analyzing the documents the peer holds;
• information about experts is disseminated with a semantic-based gossip algorithm that provides scalability, robustness and load balancing.

How P2Prec works
• Latent Dirichlet Allocation (LDA) is used to automatically model the topics in the system:
  • Training, at the global level: identification of the complete set of topics
  • Inference, at the local (node) level: extraction of the topics of interest for the user
• Dissemination of local information by a gossip algorithm:
  • FOAF descriptor: topics of interest, trust level
  • At each gossip exchange, each user u checks its local view for relevant similar peers with respect to topics of interest and friendship networks. If one is found, a friendship request is issued.
• Querying:
  • A keyword query q is associated with a TTL and is routed recursively in a P2P top-k manner

Social Graph Embedding
A. Kermarrec, V. Leroy and G. Trédan
A proximity metric between users makes it possible to predict potentially relevant future relationships (Link Prediction).
SoCS (Social Coordinate System):
• Fully distributed algorithm that embeds a social graph in a Euclidean space
• Nodes are assigned coordinates w.r.t. their social position
• Community structure is preserved
Force-based embedding (FBE): edges represent springs and nodes represent equally charged electrical particles. Edges (springs) attract the vertices they link, whereas vertices (particles) repel each other. The embedding is achieved once the system reaches an equilibrium.
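The spring and charge analogy can be made concrete with a small centralized sketch. The force laws used here (a spring force proportional to the distance and a repulsion proportional to 1/d²), the constants, and the function names are assumptions chosen for simplicity; SoCS itself is distributed and its exact force model is not given on the slide.

```python
from math import hypot

def net_force(node, pos, neighbors, c_spring=1.0, c_rep=1.0):
    """Sum of the forces acting on `node`, as an (fx, fy) pair.

    pos: dict node -> (x, y); neighbors: dict node -> set of adjacent nodes.
    """
    x, y = pos[node]
    fx = fy = 0.0
    for other, (ox, oy) in pos.items():
        if other == node:
            continue
        dx, dy = ox - x, oy - y
        d = max(hypot(dx, dy), 1e-9)      # avoid division by zero
        ux, uy = dx / d, dy / d           # unit vector pointing towards `other`
        magnitude = -c_rep / (d * d)      # every pair of particles repels
        if other in neighbors[node]:
            magnitude += c_spring * d     # linked vertices also attract
        fx += magnitude * ux
        fy += magnitude * uy
    return fx, fy

def embed(pos, neighbors, steps=200, step_size=0.05):
    """Move every node along its net force until (approximately) at rest."""
    pos = dict(pos)
    for _ in range(steps):
        forces = {n: net_force(n, pos, neighbors) for n in pos}
        pos = {n: (pos[n][0] + step_size * forces[n][0],
                   pos[n][1] + step_size * forces[n][1]) for n in pos}
    return pos

# Two linked nodes that start too far apart are pulled towards equilibrium.
neighbors = {"a": {"b"}, "b": {"a"}}
start = {"a": (0.0, 0.0), "b": (2.0, 0.0)}
final = embed(start, neighbors)
print(hypot(final["b"][0] - final["a"][0], final["b"][1] - final["a"][1]))
```

With these constants a linked pair is at rest when attraction equals repulsion, i.e. when d = 1/d², so at distance 1.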

SoCS Algorithm
Social neighbors: nodes that have close social positions. The graph neighbors and the social neighbors of a node are not necessarily the same.
Each node regularly updates its position in the social space:
• it first gathers the positions of its graph and social neighbors
• using these positions, it computes the forces that are applied to it and derives its updated social position
• a gossip protocol provides the node with a list of its new social neighbors
• this list is then used to compute new positions
Similarity metrics: SoCS will recommend to a node its closest social neighbors that are not already graph neighbors.
• Common Neighbors, Jaccard, Adamic/Adar, Path Length, Katz…

SoCS Algorithm (2)
SoCS relies on gossip to discover the social neighbors. Each node runs a clustering algorithm (Neighbors Peer Sampling, NPS) in order to maintain and update its social neighbors list.
Gossip protocols have been shown to be cheap, robust against churn, and quick to converge.
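Three of the listed link-prediction metrics have short, standard definitions, sketched below on a hypothetical toy graph. The adjacency-set representation and the `recommend` helper are assumptions made for illustration; like SoCS, the helper only proposes nodes that are not already graph neighbors.

```python
from math import log

# Hypothetical undirected graph: node -> set of graph neighbors
graph = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c", "e"},
    "e": {"d"},
}

def common_neighbors(g, u, v):
    """Number of neighbors shared by u and v."""
    return len(g[u] & g[v])

def jaccard(g, u, v):
    """Shared neighbors divided by the size of the combined neighborhood."""
    union = g[u] | g[v]
    return len(g[u] & g[v]) / len(union) if union else 0.0

def adamic_adar(g, u, v):
    """Shared neighbors, each weighted by 1 / log(degree): rare ones count more."""
    return sum(1.0 / log(len(g[z])) for z in g[u] & g[v] if len(g[z]) > 1)

def recommend(g, u, score, top_n=3):
    """Rank the nodes that are not yet neighbors of u by the given metric."""
    candidates = [v for v in g if v != u and v not in g[u]]
    candidates.sort(key=lambda v: (-score(g, u, v), v))
    return candidates[:top_n]

print(recommend(graph, "a", common_neighbors))
```

Path Length and Katz additionally need paths longer than two hops, which is why they are more expensive to evaluate with only local knowledge.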

Decentralized Random Walks
A. Kermarrec, V. Leroy, A. Moin and C. Thraves
The application of random walks to decentralized environments differs from the centralized version:
• Centralized RS: random walks are used as a clustering mechanism (e.g. community discovery)
• Decentralized RS: community discovery is infeasible, because the knowledge each peer has about the P2P network is limited to its neighborhood.
Proposed approach:
• Each peer is provided with a neighborhood composed of a small set of similar peers by means of an epidemic (gossip) protocol;
• Ratings for unknown items are estimated by a random walk on the neighborhood;
• Once peers have stabilized their neighborhood, they can calculate recommendations independently;
• Similarity measures: Pearson Correlation, Jaccard

Random Walks: observed properties
The users in the neighborhood are modeled as the vertices of a Markov chain graph, and a random walk is applied on this graph.
• A Markov chain can be represented by a directed graph where vertices are the states of the chain and edges represent the transition probabilities from one state to another.
Results:
• The random walk works well when the data is so sparse that classic similarity measures fail to detect meaningful relations between users;
• Accuracy increases with the neighborhood size;
• Decentralized user-based approaches perform better (low complexity, high precision) than their item-based counterparts in P2P recommender applications;
• Cosine similarity performed better in decentralized item-based algorithms, while Pearson correlation worked better for decentralized user-based algorithms
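One way to picture a random walk over a similarity-weighted neighborhood is sketched below. It is an assumption-laden illustration, not the formulation of the cited paper: the transition probabilities are taken as row-normalized similarities, the walk restarts at the active user with a hypothetical probability `restart`, and the visiting probabilities are then used as weights for the neighbors' ratings.

```python
def transition_matrix(sim):
    """Row-normalize similarities into Markov chain transition probabilities.

    sim: dict user -> {other_user: similarity >= 0}
    """
    chain = {}
    for u, row in sim.items():
        total = sum(row.values())
        chain[u] = {v: s / total for v, s in row.items()} if total else {}
    return chain

def walk_scores(chain, start, restart=0.15, steps=50):
    """Visiting probabilities of a random walk that restarts at `start`."""
    scores = {u: 0.0 for u in chain}
    scores[start] = 1.0
    for _ in range(steps):
        nxt = {u: 0.0 for u in chain}
        nxt[start] += restart
        for u, mass in scores.items():
            if not chain[u]:                      # no outgoing edge: jump back
                nxt[start] += (1 - restart) * mass
                continue
            for v, p in chain[u].items():
                nxt[v] += (1 - restart) * mass * p
        scores = nxt
    return scores

def predict(scores, ratings, user, item):
    """Average of the neighbors' ratings for `item`, weighted by walk scores."""
    num = den = 0.0
    for v, weight in scores.items():
        if v != user and item in ratings.get(v, {}):
            num += weight * ratings[v][item]
            den += weight
    return num / den if den else None

# Hypothetical neighborhood of user "u" with two neighbors.
sim = {"u": {"v": 0.9, "w": 0.1}, "v": {"u": 0.9, "w": 0.2}, "w": {"u": 0.1, "v": 0.2}}
ratings = {"v": {"x": 5}, "w": {"x": 1}}
scores = walk_scores(transition_matrix(sim), "u")
print(predict(scores, ratings, "u", "x"))
```

Because the walk can reach users through intermediate neighbors, a peer with no directly overlapping ratings can still contribute to the estimate, which is the intuition behind the good behavior on sparse data.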

Conclusions
• P2P recommender systems are needed in order to overcome scalability and privacy issues
• Several approaches were analyzed
• Each one relies (to some extent) on a gossip algorithm in order to maintain and update the overlay network
• Almost all the discussed approaches tackle the problem with a user-based similarity strategy exploiting classical network theory approaches:
  • Unsupervised Link Prediction
  • Community Discovery
  • Force-directed embedding

Bibliography
• D. Almazro and G. Shahatah. A survey paper on recommender systems. (2010)
• F. Draidi and E. Pacitti. Demo of P2Prec: a Social-based P2P Recommendation System. (2011)
• A. Kermarrec, V. Leroy, A. Moin and C. Thraves. Application of random walks to decentralized recommender systems. (2010)
• A. Kermarrec, V. Leroy and G. Trédan. Distributed social graph embedding. (2011)
• R. Ormándi, I. Hegedűs and M. Jelasity. Overlay management for fully distributed user-based collaborative filtering. (2010)

…questions?
