
Ranking Systems: Manipulability and Efficiency

Eric Friedman*, ORIE, Cornell University (Currently visiting: Dept of CS, U.C. Berkeley, 2005-6). *Work supported by NSF ITR-0325453.


Presentation Transcript


  1. Ranking Systems: Manipulability and Efficiency Eric Friedman*, ORIE, Cornell University (Currently visiting: Dept of CS, U.C. Berkeley, 2005-6) *Work supported by NSF ITR-0325453

  2. Ranking and Reputations • Reputations are important • Webpage ranking: links are “recommendations” • High ranks lead to more “clicks” • P2P: choosing partners • eBay: reputations are crucial (and quite valuable). • Higher reputations lead to higher prices • PGP: web of trust. • Spam and DDoS protections

  3. Problems with Reputation Systems • Gaming reputation systems is becoming a serious problem. • P2P: SETI@home, Kazaa-lite • Webpage ranking: link spamming • Note: most (all?) current reputation systems are ad hoc • No formal requirements etc.

  4. A research agenda: Understanding the tradeoffs between manipulability and efficiency • Quantify the manipulability of ranking systems. • Quantify the efficiency of ranking systems. • Find the ranking systems that are on the efficient frontier and maximize various objectives.

  5. Today’s talk (some first steps) • A framework for manipulability (w/Alice Cheng) • Characterization of manipulability of ranking systems. • Empirical analysis of PageRank on the WWW (w/Alice Cheng) • Evaluating the Efficiency of ranking mechanisms (work in progress)

  6. Part I: Goals and Approach • Our goal: create a formalism for analyzing and designing reputation systems that are robust to attacks. • Here we focus on sybils; although this is important in itself, our goals are much broader. • Note: the definitions were harder than the proofs. • Approach: Game theory, mechanism design (i.e., Arrow’s Theorem)

  7. Trust Graphs • Most reputation systems use trust graphs: • G=(V,E) • If e=(i,j) then T(e) = i’s (direct) trust of j. • Higher T(e) is better • Reputation function: f(G)_i = reputation of i. • Rank: i outranks j if f(G)_i > f(G)_j • Note: we focus on rank • Why use a trust graph? • Many (most?) interactions are 1st time interactions • (i,j) ∉ E [Figure: example trust graph with nodes 1, 2, 3]

  8. Some Representative Reputation Systems • PageRank and related systems (Brin and Page 98, Kleinberg 98, Guha et al. 04) • Start at an arbitrary node and then take a random walk on the graph. • Flow methods (e.g., Flake et al. 02, Chuang and Stoica 02) • Compute the max flow from i to j. • Shortest path method. • Let c(e)=1/T(e) then find the shortest path from i to j in terms of c’s.

  9. PageRank = Random Walk on Graph

  10. Maxflow = compute flow from a chosen source s to a node t [Figure: flow network with source s and node t]
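A minimal sketch of the max-flow reputation on this slide, assuming trust values T(e) act as edge capacities and that the reputation of t is the maximum flow from the chosen source s to t. The function name `max_flow` and the dictionary representation of T are illustrative choices, not taken from the talk; the algorithm is the standard Edmonds-Karp method.

```python
from collections import deque


def max_flow(trust, s, t):
    """Edmonds-Karp max flow from s to t; trust maps (i, j) -> T(i, j) >= 0."""
    if s == t:
        return float("inf")  # convention assumed here: the source fully trusts itself
    # Residual capacities, including zero-capacity reverse edges.
    residual = {}
    for (i, j), cap in trust.items():
        residual.setdefault(i, {})
        residual.setdefault(j, {})
        residual[i][j] = residual[i].get(j, 0.0) + cap
        residual[j].setdefault(i, 0.0)
    if s not in residual or t not in residual:
        return 0.0
    total = 0.0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return total  # no augmenting path left
        # Recover the path, find its bottleneck, and push that much flow.
        path = []
        v = t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= push
            residual[v][u] += push
        total += push
```

Calling `max_flow(trust, "s", "t")` on a small trust dictionary returns the flow value used as t's reputation; ranking by these values is what the later slides call not rank robust.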

  11. Shortest Path [Figure: shortest path from s to t]
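A minimal sketch of the shortest-path method, using the cost c(e)=1/T(e) from slide 8 and Dijkstra's algorithm from a root s. The function name and the convention that a smaller distance means a better rank are assumptions made for illustration.

```python
import heapq


def shortest_path_reputation(trust, s):
    """Dijkstra from s with edge cost c(e) = 1 / T(e); smaller distance = better rank."""
    graph = {}
    for (i, j), t in trust.items():
        if t > 0:  # zero trust is treated as "no usable edge"
            graph.setdefault(i, []).append((j, 1.0 / t))
    dist = {s: 0.0}
    heap = [(0.0, s)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, c in graph.get(u, []):
            nd = d + c
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


# Hypothetical example: a is reached directly, b is closer through a than directly.
example = {("s", "a"): 1.0, ("a", "b"): 1.0, ("s", "b"): 0.25}
distances = shortest_path_reputation(example, "s")
```

Nodes that are unreachable from s do not appear in the returned dictionary, which can be read as the worst possible rank.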

  12. Sybils • A single “agent” can replicate itself under a variety of pseudonyms.

  13. Sybil Attacks • Sybils are essentially unavoidable (Douceur 02) • Sybil clouds can forge trust among each other. • Using strong cryptography to prevent them is expensive and awkward.

  14. Sybils in Practice • Web ranking: create a large number of dummy websites and have them all link to each other. • P2P: create a large number of peers and have them give each other high ratings • eBay: fake transactions with yourself. • Amazon shopping: post high evaluations of your own products.

  15. Robustness Against Sybils • PageRank: not robust. • Empirically, can increase PageRank values dramatically with a few sybils. (more later) • Max-flow: value robust but not rank robust. • Shortest path: robust.

  16. Robustness: PageRank • PageRank: not robust.

  17. Robustness: PageRank • PageRank: not robust. • Create a “flower”
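A rough illustration of why PageRank is not robust. The slide's figure is not available, so the "flower" here is an assumed construction: the attacking page (the center) and each new sybil page (a petal) link to each other, while all of the center's original links are kept. The PageRank routine is the usual power iteration with uniform teleportation; the damping factor 0.85 and all function names are assumptions.

```python
def pagerank(edges, nodes, damping=0.85, iters=200):
    """Power iteration for PageRank with uniform teleportation."""
    n = len(nodes)
    out = {u: [] for u in nodes}
    for u, v in edges:
        out[u].append(v)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        # Mass held by dangling nodes (no out-links) is spread uniformly.
        dangling = sum(rank[u] for u in nodes if not out[u])
        new = {u: (1.0 - damping) / n + damping * dangling / n for u in nodes}
        for u in nodes:
            for v in out[u]:
                new[v] += damping * rank[u] / len(out[u])
        rank = new
    return rank


def add_flower(edges, nodes, center, petals):
    """Return a copy of the graph in which `center` and each new sybil link to each other."""
    sybils = [f"sybil{k}" for k in range(petals)]
    new_edges = list(edges)
    for s in sybils:
        new_edges.append((center, s))
        new_edges.append((s, center))
    return new_edges, list(nodes) + sybils


# Hypothetical example: a directed 4-cycle in which every page starts out tied.
cycle_nodes = [0, 1, 2, 3]
cycle_edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
before = pagerank(cycle_edges, cycle_nodes)
flower_edges, flower_nodes = add_flower(cycle_edges, cycle_nodes, center=0, petals=3)
after = pagerank(flower_edges, flower_nodes)
```

In this toy graph, page 0 moves from a four-way tie to the top position after adding three petals, even though no honest page changed its links.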

  18. Robustness: Maxflow • Max-flow: Designed for value robustness • Flow into and out of sybil cloud cannot be changed! [Figure: source s, a min cut, and a sybil cloud]

  19. Robustness: Maxflow • Max-flow: not rank robust • b is higher ranked than a [Figure: min cut marked; edge labels 1, 0.7, 0.5; bracketed node labels a [1], b [1.2]]

  20. Robustness: Maxflow • Max-flow: not rank robust • a is higher ranked than b [Figure: edge labels 1, 0, 0.5; bracketed node labels a [1], b [0.5]]

  21. Robustness: Shortest Path • Shortest path: robust • a is higher ranked than b [Figure: edge costs c=1, c=1, c=3; bracketed node labels a [1], b [2]]

  22. Robustness: Shortest Path • Shortest path: robust • a is higher ranked than b • a can harm b, but a is already higher ranked than b • b cannot hurt a, since it is not on the shortest path to a [Figure: edge costs c=1, c=3, c=3; bracketed node labels a [1], b [3]]

  23. Sybilproofness • Def: A sybil strategy for node i in G=(V,E) is a graph G’=(V’,E’) and a subset U’ ⊆ V’ such that collapsing U’ into a single node yields G. (T’s are added together.) • Def: f is k-sybilproof if there does not exist any pair of nodes i, j and a sybil strategy for i such that f(G)_i < f(G)_j and f(G’)_r > f(G)_j for r ∈ U’ and |U’| ≤ k+1. • Def: f is sybilproof if it is k-sybilproof for all k>0. • Key: sybils can only forge recommendations among each other.
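A small sketch of the "collapsing" step in the definition of a sybil strategy, assuming a trust graph is stored as a dictionary from directed edges to T values, that the original graph has no self-loops, and that edges internal to the sybil set disappear when it is collapsed. The function names are hypothetical.

```python
def collapse(trust, sybils, i):
    """Merge every node in `sybils` into node i, adding T's on parallel edges."""
    merged = {}
    for (u, v), t in trust.items():
        u2 = i if u in sybils else u
        v2 = i if v in sybils else v
        if u2 == v2:
            continue  # edges inside the sybil set vanish when it is collapsed
        merged[(u2, v2)] = merged.get((u2, v2), 0.0) + t
    return merged


def is_sybil_strategy(G, G_prime, sybils, i, tol=1e-9):
    """Check that collapsing `sybils` (which must contain i) in G_prime gives back G."""
    if i not in sybils:
        return False
    collapsed = collapse(G_prime, sybils, i)
    if set(collapsed) != set(G):
        return False
    return all(abs(collapsed[e] - G[e]) <= tol for e in G)


# Hypothetical example: i splits its outgoing trust and forges trust inside the sybil set.
G = {("a", "i"): 1.0, ("i", "b"): 0.6}
G_ok = {("a", "i"): 1.0, ("i", "b"): 0.2, ("i2", "b"): 0.4, ("i", "i2"): 5.0, ("i2", "i"): 5.0}
G_bad = dict(G_ok)
G_bad[("a", "i2")] = 1.0  # forging trust from an honest node is not allowed
```

Here `is_sybil_strategy(G, G_ok, {"i", "i2"}, "i")` holds, while `G_bad` fails the check because the extra incoming edge from the honest node a changes the collapsed graph. This mirrors the key point that sybils can only forge recommendations among each other.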

  24. Results: Symmetric Reputations • Def: A reputation function is symmetric if it is covariant under graph isomorphism. • Theorem: There is no nontrivial symmetric sybilproof mechanism. • In fact, for any G, any node (except the top one) can improve their ranking via sybils • Theorem: There is no nontrivial symmetric k-sybilproof mechanism, for any k ≥ 1. • (How often this occurs for small k is open.)

  25. Proof (via the butterfly) [Figure: graph G with nodes j, s, i and sybil set U’] • Sybilproofness: by symmetry, f(G’)_j = f(G’)_s • k-sybilproofness: build G’ one sybil at a time

  26. Results: Non-Symmetric • Theorem: There exist sybilproof reputation functions. (e.g., shortest path) • Def: Given a root node s ∈ V, let 𝒫 be the set of all collections of edge disjoint paths* from s to i. Let g be a function from paths to reals and ⊕ be an (addition-like) operator on the reals.

  27. Results: Non-Symmetric • Let f(G)_i = max_{P ∈ 𝒫} ⊕_{p ∈ P} g(p) • Max flow: g(p)=min{T(e) | e ∈ p}, ⊕=+ • Shortest path: g(p)=min{T(e) | e ∈ p}, ⊕=min • Other generalizations • Leaky pipes etc.
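A brute-force sketch of the general form on this slide: enumerate the simple paths from the root s to i, consider every collection of pairwise edge-disjoint paths, combine the path values g(p) with the chosen operator, and keep the maximum. It is exponential and only meant for tiny graphs; it assumes s and i are distinct, and all function names are hypothetical.

```python
import operator
from functools import reduce
from itertools import combinations


def simple_paths(trust, s, i):
    """All simple directed paths from s to i, each as a tuple of edges."""
    adj = {}
    for (u, v) in trust:
        adj.setdefault(u, []).append(v)
    paths = []

    def walk(node, visited, edges):
        if node == i:
            paths.append(tuple(edges))
            return
        for nxt in adj.get(node, []):
            if nxt not in visited:
                walk(nxt, visited | {nxt}, edges + [(node, nxt)])

    walk(s, {s}, [])
    return paths


def bottleneck(path, trust):
    """g(p) = min{T(e) | e in p}, the path value used on the slide."""
    return min(trust[e] for e in path)


def reputation(trust, s, i, g, combine):
    """Max over collections P of edge-disjoint s-i paths of the combine-fold of g(p)."""
    paths = simple_paths(trust, s, i)
    best = None  # stays None when i is unreachable from s
    for r in range(1, len(paths) + 1):
        for P in combinations(paths, r):
            used = [e for p in P for e in p]
            if len(used) != len(set(used)):
                continue  # some edge is shared, so P is not edge-disjoint
            value = reduce(combine, (g(p, trust) for p in P))
            if best is None or value > best:
                best = value
    return best


# Hypothetical example with two edge-disjoint routes from s to t.
T = {("s", "a"): 1.0, ("a", "t"): 0.5, ("s", "t"): 0.25}
flow_like = reputation(T, "s", "t", bottleneck, operator.add)  # bottlenecks summed
path_like = reputation(T, "s", "t", bottleneck, max)           # best single path
```

With `operator.add` the two routes contribute 0.5 and 0.25; with `max` (or `min`) only the best single path matters, which is the "depends on the maximal path" behavior discussed on the following slides.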

  28. Results: Non-Symmetric • Theorem: f as defined above is value sybilproof assuming: • If p’ is an extension of p, then g(p’)<g(p). • ⊕ is nondecreasing and g is nondecreasing with respect to T. • If p=p’+p’’ then g(p)=g(p’) ⊕ g(p’’)

  29. Results: Non-Symmetric • Theorem: f as defined above is rank sybilproof iff ⊕=max, assuming: • For any p there exists an extension p’ such that g(p)=g(p’). • I.e., f depends on the maximal path.

  30. Summary (Part I) • A framework for the analysis of the manipulability of ranking systems. • Key distinction: rank vs. value • Result 1: all symmetric ranking systems are manipulable. • Result 2: “flow based” ranking systems are not value manipulable but are rank manipulable. • Result 3: “path based” ranking systems are not manipulable.

  31. Part II: Empirical Analysis of PageRank • (Joint with Alice Cheng) • (Inspired by Zhang et al. on collusion) • Stanford web matrix -- ~280k pages. • Question: How often are a small number of sybils helpful? • Answer: Surprisingly often!

  32. Value Magnification: 1 sybil

  33. Value Magnification – by # of sybils

  34. Rank as a function of old Rank -- 1-Sybil

  35. Effect of e on values

  36. e on ranks

  37. Summary of Empirical • Analytic approximations for these. • PageRank is quite manipulable • Especially for low ranked pages • (but that’s where automated methods are supposed to work!)

  38. Part III: Quantifying the Efficiency of Ranking Mechanisms • Work in progress – some preliminary results. • Is FlowRank or PageRank better than PathRank?

  39. Model • Random graph model (descriptive, not constructive) • Follow the intuition behind PageRank • Pages link more to “better pages” • Better pages are more selective. • Pr(link)=f(q_i,q_j) • Increasing in q_j • FOSD in q_i • Average outdegree = k (n→∞) • (Many results have k→∞, and miss important aspects of ranking.)

  40. Finding “Baddies” • 2 layer example: • ½ nodes are H and ½ L • L’s link uniformly at random • H’s link to H with (relative) probability (1+a) and to L’s with (1-a). • a=0, random graph • a=1, two tiered graph
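A sketch of one way to sample the two-layer example. The slide fixes only the relative link probabilities, so several details are assumptions: each node makes exactly k link attempts, the relative weights (1+a) and (1-a) are normalized to probabilities (1+a)/2 and (1-a)/2 for choosing the H or L layer, self-links are dropped, and duplicate links are allowed. The function name is hypothetical.

```python
import random


def two_layer_graph(n, k, a, seed=0):
    """Sample a directed graph; nodes 0..n//2-1 are H, the remaining nodes are L."""
    rng = random.Random(seed)
    half = n // 2
    high = list(range(half))
    low = list(range(half, n))
    edges = []
    for u in range(n):
        for _ in range(k):
            if u < half:
                # H nodes pick the H layer with probability (1 + a) / 2.
                layer = high if rng.random() < (1 + a) / 2 else low
            else:
                # L nodes pick a layer uniformly at random.
                layer = high if rng.random() < 0.5 else low
            v = rng.choice(layer)
            if v != u:  # drop self-links
                edges.append((u, v))
    return edges, high, low


# InRank on a sample: count in-links per node.
edges, high, low = two_layer_graph(n=200, k=5, a=0.5, seed=1)
indegree = {u: 0 for u in high + low}
for _, v in edges:
    indegree[v] += 1
```

With a=0 the H and L nodes behave identically (a random graph); with a=1 no H node ever links to an L node, giving the two-tiered graph.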

  41. Statistical Inference • Now, ranking is a problem of statistical inference • G is a random variable • r is a statistical estimate of true qualities • Note: unlike most inference problems we only have a single sample

  42. 3 methods • PageRank • InRank: rank by indegree • MLRank: compute a maximum likelihood estimate.

  43. Results • Pr(error)=Pr(r_i>r_j | q_i<q_j) • InRank: difference of Poissons • PageRank: two stage calculation • First by quality then statistical manipulations of PageRank equations. • MLRank: find a subgraph with the maximal number of edges. • NP complete • Implemented a greedy algorithm
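A worked sketch of the "difference of Poissons" calculation for InRank in the two-layer example, under assumptions that go beyond the slides: every node has average outdegree k, and in the large-n limit indegrees are independent Poisson variables. An H node then receives on average k/2 links from L nodes and k(1+a)/2 from H nodes, for a mean of k(1+a/2); an L node receives k/2 + k(1-a)/2 = k(1-a/2). The error probability is the chance that an L node's indegree strictly exceeds an H node's. Function names are hypothetical.

```python
import math


def poisson_pmf(lam, x):
    """Pr(X = x) for X ~ Poisson(lam), computed in log space for stability."""
    return math.exp(-lam + x * math.log(lam) - math.lgamma(x + 1))


def inrank_error(k, a, cutoff=None):
    """Pr(indegree of an L node > indegree of an H node) under Poisson indegrees."""
    lam_h = k * (1 + a / 2)  # mean indegree of an H node (assumed model)
    lam_l = k * (1 - a / 2)  # mean indegree of an L node (assumed model)
    if cutoff is None:
        # Truncate the sums far out in the tail of the larger distribution.
        cutoff = int(lam_h + 12 * math.sqrt(lam_h) + 20)
    pmf_h = [poisson_pmf(lam_h, x) for x in range(cutoff)]
    pmf_l = [poisson_pmf(lam_l, x) for x in range(cutoff)]
    error = 0.0
    below = 0.0  # Pr(d_H < y), updated as y increases
    for y in range(cutoff):
        error += pmf_l[y] * below
        below += pmf_h[y]
    return error
```

Under these assumptions, `inrank_error(k, 0)` is just under one half (the two layers are indistinguishable, and ties do not count as errors), and the error shrinks as a grows.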

  44. Results [Figure: Pr(error) as a function of a for PageRank, InRank, and MLRank]

  45. Results • InRank better than PageRank when graph is close to random and vice versa. (General Theorem) • Differences can be significant! • MLRank is significantly better.

  46. Some Intuition • Case a=0 (Sketch -- ignoring special cases) • PageRank • r_j’s are iid (in limit) • InRank • Theorem: PageRank is more random. • (But, also need to consider expected values)
