P2P recommender systems: a (small) survey
Giulio Rossetti
What are Recommender Systems?
• RSs are a class of information filtering systems that seek to predict the rating or preference that a user would give to
  • an item (such as music, books, or movies) or
  • a social element (e.g. people or groups)
  that they had not yet considered, using a model built from the characteristics of
  • items (content-based approaches) or
  • the user's social environment (collaborative filtering approaches)
Why recommender systems?
Nowadays the amount of information we retrieve has become enormous (Big Data).
What we really need is a technology that can assist us in finding resources of interest among the overwhelming amount of data available.
“[…] a personalized information filtering used to either predict whether a particular user will like a particular item (prediction problem) or to identify a set of N items that will be of interest to a certain user.”
Centralized Approaches
Two main families of methodologies have been studied in recent years:
• User-based CF
  • CF algorithms that work on the assumption that each user belongs to a group of similarly behaving users. The basis for the recommendation is the set of items liked by the users of that group: items are recommended based on users' tastes. The algorithm assumes that users who are similar (have similar attributes) will be interested in the same items.
• Item-based CF
  • CF algorithms that look at the similarity between items to make a prediction. The idea is that users are most likely to purchase items that are similar to the ones they already bought in the past; so by analyzing the purchasing information we can get an idea of what a user may want in the future.
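A minimal sketch of user-based CF prediction follows. The slide does not fix a similarity function or an aggregation rule, so the cosine similarity, the similarity-weighted average, and the names `cosine` and `predict` are assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two users' rating dicts (item -> rating)."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)

def predict(ratings, user, item, k=2):
    """Predict `user`'s rating for `item` as the similarity-weighted
    average over the k most similar users who rated the item."""
    neighbors = [
        (cosine(ratings[user], r), r[item])
        for other, r in ratings.items()
        if other != user and item in r
    ]
    top = sorted(neighbors, reverse=True)[:k]
    total = sum(s for s, _ in top)
    if total == 0:
        return None  # nobody similar has rated the item
    return sum(s * r for s, r in top) / total

ratings = {
    "ann": {"m1": 5, "m2": 4},
    "bob": {"m1": 5, "m2": 4, "m3": 5},
    "cat": {"m1": 1, "m4": 5, "m3": 1},
}
print(predict(ratings, "ann", "m3"))
```

Here "ann" is much closer to "bob" than to "cat", so the prediction for "m3" leans towards bob's rating. An item-based variant would apply the same kind of similarity to the columns (items) instead of the rows (users).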
P2P: Motivations
The need for efficient decentralized recommender systems has been appreciated for some time, both for the intrinsic advantages of decentralization and for the necessity of integrating recommender systems into P2P applications.
The two main advantages are:
• the predictions can be distributed among all users, removing the need for a costly central server and enhancing scalability
• a decentralized recommender improves the privacy of the users, since there is no central entity storing and owning the private information of the users.
User-Based Collaborative Filtering
R. Ormándi, I. Hegedűs and M. Jelasity
Node balancing issue: overlay topologies defined by node similarity often have highly unbalanced degree distributions (e.g. power-law).
Overlay management: how can the best possible overlay for computing recommendation scores be built and maintained, while taking care of bandwidth usage at the nodes?
Desiderata: a minimal, uniform load from overlay management, even when the in-degree distribution of the expected overlay graph is unbalanced.
Approaches: BuddyCast, kNN (Random Sampling & T-MAN)
BuddyCast
• Each node's local view contains a full descriptor of the node's neighbors (i.e. their ratings). Computing recommendations does not load the network (local information approach).
• Load balancing:
  • Block list: if a node communicates with another peer, that peer is put on the block list for a few hours.
  • Candidate list: contains close peers for potential communication.
  • Random list: contains random samples from the network.
• For overlay maintenance, each node connects to the best node from the candidate list with probability α, and to a node from the random list with probability 1−α, and exchanges its buddy list with the selected peer.
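The peer-selection rule above can be sketched as follows. This is an illustration under assumptions, not the BuddyCast implementation: the function name `select_peer`, the data layout (a dict of candidate similarities, a list of random peers, a set of blocked peers) and the fallback when one list is empty are all hypothetical, and block-list expiry is left out.

```python
import random

def select_peer(candidates, random_peers, blocked, alpha, rng=random):
    """Pick the next peer to contact.

    candidates:   dict peer -> similarity (the candidate list)
    random_peers: list of peers sampled from the network (the random list)
    blocked:      set of recently contacted peers (the block list)
    alpha:        probability of exploiting the best candidate
    """
    # Assumption: blocked peers are skipped in both lists.
    open_candidates = {p: s for p, s in candidates.items() if p not in blocked}
    open_random = [p for p in random_peers if p not in blocked]

    # With probability alpha take the most similar candidate
    # (or always, if no random peer is available).
    if open_candidates and (rng.random() < alpha or not open_random):
        return max(open_candidates, key=open_candidates.get)
    # Otherwise explore with a uniformly chosen random peer.
    if open_random:
        return rng.choice(open_random)
    return None

rng = random.Random(0)
candidates = {"p1": 0.9, "p2": 0.7}
print(select_peer(candidates, ["p8", "p9"], {"p1"}, alpha=1.0, rng=rng))
```

After the exchange, the caller would add the selected peer to the block list so that it is not contacted again for a while.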
kNN: Random Samples
• Every node has a local view of size k that contains node descriptors.
• Each node is initialized with k random samples from the network, which iteratively approximate the kNN graph.
• The convergence is based on an iterative random sampling process.
• Random nodes are inserted into the view (which is implemented as a bounded priority queue).
• The queue's priority is based on the similarity function provided by the recommender module.
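A bounded priority queue of this kind can be sketched with Python's `heapq`. The class name `BoundedView`, the integer node identifiers and the toy similarity function are assumptions for illustration; in the real system the similarity would come from the recommender module and the samples from a peer sampling service.

```python
import heapq
import random

class BoundedView:
    """Keeps the k most similar nodes seen so far."""

    def __init__(self, k, similarity):
        self.k = k
        self.similarity = similarity
        self._heap = []        # min-heap of (similarity, node_id)
        self._members = set()  # node ids currently in the view

    def insert(self, node_id):
        if node_id in self._members:
            return
        entry = (self.similarity(node_id), node_id)
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, entry)
            self._members.add(node_id)
        elif entry > self._heap[0]:
            # Replace the least similar node in the view.
            _, evicted = heapq.heappushpop(self._heap, entry)
            self._members.discard(evicted)
            self._members.add(node_id)

    def nodes(self):
        """Node ids ordered from most to least similar."""
        return [n for _, n in sorted(self._heap, reverse=True)]

# Toy setting: node 50 looks for the nodes with the closest ids.
rng = random.Random(42)
view = BoundedView(k=3, similarity=lambda n: -abs(n - 50))
samples = list(range(100))
rng.shuffle(samples)
for node in samples:          # stands in for repeated random sampling
    view.insert(node)
print(view.nodes())
```

Because every insertion either leaves the view unchanged or replaces its least similar entry, the view can only improve as more random samples arrive.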
kNN: T-MAN sampling
• Overlay managed with the T-MAN algorithm:
  • T-MAN periodically updates the node's view (of size k) by:
    • selecting a peer node to communicate with
    • exchanging its view with the peer
    • merging the two views and keeping the closest k descriptors
• Peer (communication) selection methods:
  • Global: selects the node from the whole network at random
  • View: selects the node from the view uniformly at random
  • Proportional: selects a node from the view, but with a different probability distribution
  • Best: selects the most similar node without any restriction
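The merge step and three of the selection methods can be sketched as below. The function names, the integer node ids and the tie-breaking rule are hypothetical; "Proportional" is omitted because the slide does not specify its distribution, and "Best" is assumed here to pick from the current view.

```python
import random

def merge_views(own_view, peer_view, k, similarity, self_id):
    """Merge two views and keep the k descriptors closest to this node."""
    candidates = (set(own_view) | set(peer_view)) - {self_id}
    # Most similar first; ties broken by node id to keep the result stable.
    return sorted(candidates, key=lambda n: (-similarity(n), n))[:k]

def choose_gossip_partner(mode, view, network, similarity, rng=random):
    """Select the peer to exchange views with."""
    if mode == "global":
        return rng.choice(network)        # any node in the network
    if mode == "view":
        return rng.choice(view)           # uniform over the current view
    if mode == "best":
        return max(view, key=similarity)  # most similar node in the view
    raise ValueError(f"unknown selection mode: {mode}")

# Toy setting: node 10 prefers nodes with nearby ids.
similarity = lambda n: -abs(n - 10)
own, peer = [1, 9, 20], [11, 12, 30]
print(merge_views(own, peer, k=3, similarity=similarity, self_id=10))
```

In a full gossip round both sides would run `merge_views` on the exchanged views, each with its own similarity function.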
User-based CF: Observations
• With unbalanced distributions it is not optimal to use the kNN (T-MAN Best) view (a more relaxed one can give better recommendation performance)
• Overlay construction converges reasonably fast, even in the case of random updates or with T-MAN
• T-MAN with Global selection is a good choice:
  • it has a fully uniform load distribution combined with an acceptable convergence speed, which is better than that of the random view update
P2Prec: a social-based P2P recommender system
F. Draidi and E. Pacitti
The idea: recommend high-quality documents related to the query topics and to contents held by friends (or FOAF) who are experts on the topics related to the query.
Assumptions:
• each node represents a peer labelled with the contents it stores and its topics of interest;
• expertise is deduced from the contents stored by a user;
• the topics each peer is interested in are calculated by analyzing the documents it holds;
• to disseminate information about experts, a semantic-based gossip algorithm is adopted that provides scalability, robustness and load balancing.
How P2Prec works
• Latent Dirichlet Allocation (LDA) is used to automatically model the topics in the system
  • Training - global level: identification of the complete set of topics
  • Inference - local (node) level: extraction of the topics of interest for the user
• Dissemination of local information by a gossip algorithm
  • FOAF descriptor: topics of interest, trust level
  • At each gossip exchange, each user u checks its local view for relevant similar peers with respect to topics of interest and friendship networks:
    • If one is found, a friendship request is launched.
• Querying
  • A keyword query q is associated with a TTL and is routed recursively in a P2P top-k manner
Social Graph Embedding
A. Kermarrec, V. Leroy and G. Trédan
A proximity metric between users makes it possible to predict potentially relevant future relationships (Link Prediction).
SoCS (Social Coordinate System)
• Fully distributed algorithm that embeds a social graph in a Euclidean space
• Nodes get assigned coordinates w.r.t. their social position
• Community structure is preserved
Force-based embedding (FBE): edges represent springs and nodes represent equally charged electrical particles. Edges (springs) attract the vertices they link, whereas vertices (particles) repulse each other. The embedding is achieved once the system reaches an equilibrium.
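To make the force-based embedding concrete, here is a small centralized sketch in two dimensions. It is not the SoCS algorithm itself, which is fully distributed: the force laws (a zero-rest-length spring and an inverse-square repulsion), the constants `spring`, `charge` and `max_move`, and the function names are all assumptions chosen for illustration.

```python
import math

def fbe_step(pos, edges, spring=0.1, charge=1.0, max_move=0.5):
    """One iteration: compute the net force on every node and move it."""
    forces = {v: [0.0, 0.0] for v in pos}
    nodes = list(pos)
    # Repulsion between every pair of nodes (equally charged particles).
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
            d = math.hypot(dx, dy) or 1e-9
            f = charge / (d * d)
            forces[u][0] += f * dx / d
            forces[u][1] += f * dy / d
            forces[v][0] -= f * dx / d
            forces[v][1] -= f * dy / d
    # Attraction along every edge (springs).
    for u, v in edges:
        dx, dy = pos[v][0] - pos[u][0], pos[v][1] - pos[u][1]
        forces[u][0] += spring * dx
        forces[u][1] += spring * dy
        forces[v][0] -= spring * dx
        forces[v][1] -= spring * dy
    # Move each node along its net force, capping the displacement.
    new_pos = {}
    for v, (fx, fy) in forces.items():
        mag = math.hypot(fx, fy)
        scale = min(1.0, max_move / mag) if mag > 0 else 0.0
        new_pos[v] = (pos[v][0] + fx * scale, pos[v][1] + fy * scale)
    return new_pos

def embed(pos, edges, steps=200):
    for _ in range(steps):
        pos = fbe_step(pos, edges)
    return pos

# Nodes 0 and 1 are linked; node 2 has no edge.
final = embed({0: (0.0, 0.0), 1: (3.0, 0.0), 2: (0.0, 1.0)}, [(0, 1)])
print(math.dist(final[0], final[1]), math.dist(final[0], final[2]))
```

The linked pair settles at the distance where spring attraction and repulsion balance, while the unlinked node is pushed away, which is the behaviour that lets distances in the embedding reflect graph structure.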
SoCS Algorithm
Social neighbors: nodes that have close social positions. The graph neighbors and the social neighbors of a node are not necessarily the same.
Each node regularly updates its position in the social space:
• it first gathers the positions of its graph and social neighbors
• using these positions, it computes the forces that are applied to it and derives its updated social position
• a gossip protocol provides the node with a list of its new social neighbors
• this list is then used to compute new positions
Similarity metrics: SoCS will recommend to a node its closest social neighbors that are not already graph neighbors.
• Common Neighbors, Jaccard, Adamic/Adar, Path Length, Katz…
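The recommendation rule and two of the listed graph metrics can be sketched as follows. The function names, the 2D coordinates and the adjacency-set representation are assumptions; the coordinates are taken as given rather than computed by SoCS.

```python
import math

def socs_recommend(node, positions, graph, n=1):
    """Recommend the n closest nodes in the social space that are not
    already graph neighbors of `node`."""
    candidates = [v for v in positions if v != node and v not in graph[node]]
    candidates.sort(key=lambda v: math.dist(positions[node], positions[v]))
    return candidates[:n]

def common_neighbors(graph, u, v):
    """Number of neighbors shared by u and v."""
    return len(graph[u] & graph[v])

def adamic_adar(graph, u, v):
    """Sum of 1 / log(degree) over the shared neighbors of u and v."""
    return sum(
        1.0 / math.log(len(graph[z]))
        for z in graph[u] & graph[v]
        if len(graph[z]) > 1  # guards against log(1) = 0
    )

# Undirected toy graph stored as adjacency sets.
graph = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}, "d": set()}
positions = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (0.5, 0.2), "d": (5.0, 5.0)}
print(socs_recommend("a", positions, graph))
print(common_neighbors(graph, "a", "c"), adamic_adar(graph, "a", "c"))
```

Both views agree on this toy example: "c" is the nearest non-neighbor of "a" in the social space, and it is also the only non-neighbor sharing a graph neighbor with "a".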
SoCS Algorithm (2)
SoCS relies on gossip to discover the social neighbors.
Each node runs a clustering algorithm (Neighbors Peer Sampling, NPS) in order to maintain and update its list of social neighbors.
Gossip protocols have been shown to be cheap, robust against churn, and quick to converge.
Decentralized Random Walks
A. Kermarrec, V. Leroy, A. Moin and C. Thraves
The application of random walks to decentralized environments differs from the centralized version.
• Centralized RS: random walks are used as a clustering mechanism (e.g. community discovery)
• Decentralized RS: community discovery is infeasible, because the knowledge each peer has about the P2P network is limited to its neighborhood.
Proposed approach
• Each peer is provided with a neighborhood composed of a small set of similar peers by means of an epidemic (gossip) protocol;
• Ratings for unknown items are estimated by a random walk on the neighborhood.
• Once peers have stabilized their neighborhoods, they can calculate recommendations independently
• Similarity measures: Pearson correlation, Jaccard
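For reference, the two similarity measures named above can be computed as in this sketch. Restricting Pearson correlation to co-rated items and returning 0.0 in degenerate cases are conventions assumed here, not details taken from the slide.

```python
import math

def pearson(u, v):
    """Pearson correlation over the items rated by both users."""
    common = sorted(set(u) & set(v))
    n = len(common)
    if n < 2:
        return 0.0
    mean_u = sum(u[i] for i in common) / n
    mean_v = sum(v[i] for i in common) / n
    cov = sum((u[i] - mean_u) * (v[i] - mean_v) for i in common)
    dev_u = math.sqrt(sum((u[i] - mean_u) ** 2 for i in common))
    dev_v = math.sqrt(sum((v[i] - mean_v) ** 2 for i in common))
    if dev_u == 0 or dev_v == 0:
        return 0.0  # a constant rater carries no correlation signal
    return cov / (dev_u * dev_v)

def jaccard(u, v):
    """Overlap of the sets of rated items, ignoring the rating values."""
    a, b = set(u), set(v)
    return len(a & b) / len(a | b) if a | b else 0.0

u = {"a": 1, "b": 2, "c": 3}
v = {"a": 2, "b": 4, "c": 6}
w = {"a": 3, "b": 2, "d": 5}
print(pearson(u, v), pearson(u, w), jaccard(u, w))
```

Pearson compares how two users rate the items they share, while Jaccard only looks at which items they have both rated.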
Random Walks: observed properties
The users in the neighborhood are modeled as the vertices of a Markov chain graph, and a random walk is applied on this graph.
• A Markov chain can be represented by a directed graph where vertices are the states of the chain and edges represent the transition probabilities from one state to another.
Results:
• Random walks work well when the data is so sparse that classic similarity measures fail to detect meaningful relations between users;
• Increasing the neighborhood size increases the accuracy;
• Decentralized user-based approaches perform better (low complexity, high precision) than their item-based counterparts in P2P recommender applications;
• Cosine similarity performed better in decentralized item-based algorithms, while Pearson correlation worked better for decentralized user-based algorithms
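The following sketch shows one way a random walk over a similarity-weighted neighborhood can yield a rating estimate. It is an illustration under assumptions and not the authors' exact method: the restart probability, the fixed number of steps, and the names `transition_matrix`, `walk_distribution` and `estimate_rating` are hypothetical.

```python
def transition_matrix(sim):
    """Turn pairwise similarities into Markov transition probabilities:
    each row is normalized so that it sums to 1."""
    P = {}
    for u, row in sim.items():
        total = sum(row.values())
        P[u] = {v: s / total for v, s in row.items()} if total > 0 else {u: 1.0}
    return P

def walk_distribution(P, start, steps=50, restart=0.15):
    """Probability of finding the walker at each user, for a walk that
    jumps back to `start` with probability `restart` at every step."""
    dist = {u: 0.0 for u in P}
    dist[start] = 1.0
    for _ in range(steps):
        nxt = {u: 0.0 for u in P}
        nxt[start] += restart
        for u, mass in dist.items():
            for v, p in P[u].items():
                nxt[v] += (1.0 - restart) * mass * p
        dist = nxt
    return dist

def estimate_rating(dist, ratings, item, user):
    """Average of the neighbors' ratings, weighted by visit probability."""
    raters = [v for v in dist if v != user and item in ratings[v]]
    weight = sum(dist[v] for v in raters)
    if weight == 0:
        return None
    return sum(dist[v] * ratings[v][item] for v in raters) / weight

sim = {
    "alice": {"bob": 0.9, "carol": 0.1},
    "bob": {"alice": 0.9, "carol": 0.3},
    "carol": {"alice": 0.1, "bob": 0.3},
}
ratings = {"alice": {}, "bob": {"x": 5}, "carol": {"x": 1}}
dist = walk_distribution(transition_matrix(sim), "alice")
print(estimate_rating(dist, ratings, "x", "alice"))
```

Because the walk can reach users through intermediate neighbors, it can still assign weight to peers whose direct similarity with the active user is weak or undefined, which is the situation the sparse-data result above refers to.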
Conclusions
• P2P recommender systems are needed in order to overcome scalability and privacy issues
• Several approaches were analyzed
  • Each one relies (to some extent) on a gossip algorithm in order to maintain and update the overlay network
• Almost all the discussed approaches tackle the problem with a user-based similarity strategy, exploiting classical network theory approaches:
  • Unsupervised Link Prediction
  • Community Discovery
  • Force-directed embedding
Bibliography
• D. Almazro and G. Shahatah. A survey paper on recommender systems. (2010)
• F. Draidi and E. Pacitti. Demo of P2Prec: a Social-based P2P Recommendation System. (2011)
• A. Kermarrec, V. Leroy, A. Moin and C. Thraves. Application of random walks to decentralized recommender systems. (2010)
• A. Kermarrec, V. Leroy and G. Trédan. Distributed social graph embedding. (2011)
• R. Ormándi, I. Hegedűs and M. Jelasity. Overlay management for fully distributed user-based collaborative filtering. (2010)
…questions?