Ranking Systems: Manipulability and Efficiency

Eric Friedman*, ORIE, Cornell University
(Currently visiting: Dept of CS, U.C. Berkeley, 2005-6)

*Work supported by NSF ITR-0325453

Ranking and Reputations
  • Reputations are important
    • Webpage ranking: links are “recommendations”
      • High ranks lead to more “clicks”
    • P2P: choosing partners
    • eBay: reputations are crucial (and quite valuable).
      • Higher reputations lead to higher prices
    • PGP: web of trust.
    • Spam and DDoS protections
Problems with Reputation Systems
  • Gaming reputation systems is becoming a serious problem.
    • P2P: SETI@home, Kazaa Lite
    • Webpage ranking: link spamming
  • Note: most (all?) current reputation systems are ad-hoc
    • No formal requirements etc.
A research agenda: Understanding the tradeoffs between manipulability and efficiency
  • Quantify the manipulability of ranking systems.
  • Quantify the efficiency of ranking systems.
  • Find the ranking systems that are on the efficient frontier and maximize various objectives.
Today’s talk (some first steps)
  • A framework for manipulability (w/Alice Cheng)
    • Characterization of manipulability of ranking systems.
  • Empirical analysis of PageRank on the WWW (w/Alice Cheng)
  • Evaluating the Efficiency of ranking mechanisms (work in progress)
Part I: Goals and Approach
  • Our goal: create a formalism for analyzing and designing reputation systems that are robust to attacks.
    • Here we focus on sybils; this is important in itself, but our goals are much broader.
  • Note: the definitions were harder than the proofs.
  • Approach: game theory and mechanism design (e.g., Arrow’s Theorem)
Trust Graphs


  • Most reputation systems use trust graphs:
    • $G=(V,E)$
    • If $e=(i,j)$, then $T(e)$ is i’s (direct) trust in j.
    • Higher $T(e)$ is better.
  • Reputation function: $f(G)_i$ = reputation of i.
  • Rank: i outranks j if $f(G)_i > f(G)_j$.
    • Note: we focus on rank
  • Why use a trust graph?
    • Many (most?) interactions are 1st time interactions
      • $(i,j) \notin E$
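
The objects on this slide can be sketched in a few lines. This is only an illustration: the edge values are made up, and the weighted in-degree used as the reputation function f is a placeholder chosen for brevity, not one of the systems from the talk. The names `trust`, `indegree_reputation`, and `outranks` are hypothetical.

```python
# Trust graph G=(V,E) stored as {(i, j): T(e)}; the values here are made up.
trust = {
    (1, 2): 1.0,  # node 1 directly trusts node 2
    (1, 3): 0.5,
    (2, 3): 2.0,
}

def nodes(trust):
    """Collect the vertex set V from the edge dictionary."""
    return sorted({v for edge in trust for v in edge})

def indegree_reputation(trust):
    """Placeholder f(G): total direct trust received by each node."""
    rep = {v: 0.0 for v in nodes(trust)}
    for (_, j), t in trust.items():
        rep[j] += t
    return rep

def outranks(rep, i, j):
    """i outranks j if f(G)_i > f(G)_j."""
    return rep[i] > rep[j]

rep = indegree_reputation(trust)
print(rep)                  # reputation values f(G)_i
print(outranks(rep, 3, 2))  # rank comparison between nodes 3 and 2
print((3, 1) in trust)      # a first-time interaction has no direct edge
```

The last line shows why a graph is needed at all: node 3 has never rated node 1, so any opinion it forms must come from the rest of the graph.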

Some Representative Reputation Systems
  • PageRank and related systems (Brin and Page 98, Kleinberg 98, Guha et al. 04)
    • Start at an arbitrary node and then take a random walk on the graph.
  • Flow methods (e.g., Flake et al. 02, Chuang and Stoica 02)
    • Compute the max flow from i to j.
  • Shortest path method
    • Let c(e)=1/T(e), then find the shortest path from i to j in terms of the costs c.
Sybils
  • A single “agent” can replicate itself under a variety of pseudonyms.
Sybil Attacks
  • Sybils are essentially unavoidable (Douceur 02)
  • Sybils in a cloud can forge trust among one another.
    • Using strong cryptography to prevent them is expensive and awkward.
Sybils in Practice
  • Web ranking: create a large number of dummy websites that all link to each other.
  • P2P: create a large number of peers that give each other high ratings.
  • eBay: fake transactions with yourself.
  • Amazon shopping: post high evaluations of your own products.
Robustness Against Sybils
  • PageRank: not robust.
    • Empirically, a few sybils can increase PageRank dramatically (more later).
  • Max-flow: value robust but not rank robust.
  • Shortest path: robust.
Robustness: PageRank
  • PageRank: not robust.
    • Create a “flower”
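
A rough sketch of the flower attack, under stated assumptions: the four-page web below is invented, PageRank is computed by plain power iteration with a damping factor of 0.85 (a common default, not a value given in the talk), and the attacker "t" keeps its original out-link while adding five sybil petals that link only back to it. All names are hypothetical.

```python
def pagerank(links, damping=0.85, iters=200):
    """Power iteration on {node: [out-neighbors]}; returns {node: score}."""
    nodes = sorted(set(links) | {v for outs in links.values() for v in outs})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1.0 - damping) / n for v in nodes}
        for u in nodes:
            outs = links.get(u) or nodes  # a dangling node jumps uniformly
            share = damping * rank[u] / len(outs)
            for v in outs:
                new[v] += share
        rank = new
    return rank

def position(rank, node, among):
    """1-based position of `node` when `among` is sorted by descending score."""
    ordered = sorted(among, key=lambda v: rank[v], reverse=True)
    return ordered.index(node) + 1

# Hypothetical four-page web; page "t" has no in-links.
base = {"a": ["b"], "b": ["c"], "c": ["a", "b"], "t": ["a"]}
originals = list(base)

# Flower: "t" links to each sybil petal, and each petal links back to "t".
petals = ["sybil%d" % k for k in range(5)]
attacked = dict(base)
attacked["t"] = ["a"] + petals
for s in petals:
    attacked[s] = ["t"]

before = pagerank(base)
after = pagerank(attacked)
print("position of t before:", position(before, "t", originals))
print("position of t after: ", position(after, "t", originals))
```

Collapsing the petals back into "t" recovers the original graph, so this is a sybil strategy in the sense used later in the talk.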
Robustness: Maxflow
  • Max-flow: Designed for value robustness
    • Flow into and out of sybil cloud cannot be changed!

[Figure: a min cut separating the source s from the sybil cloud]

Robustness: Maxflow
  • Max-flow: not rank robust
    • b is higher ranked than a

[Figure: max-flow example with the min cut marked; edge capacities 1, 0.7, and 0.5; bracketed values [1] at a and [1.2] at b]

Robustness: Maxflow
  • Max-flow: not rank robust
    • a is higher ranked than b

[Figure: max-flow example with edge capacities 1, 0, and 0.5; bracketed values [1] at a and [0.5] at b]
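
The rank reversal on the two max-flow slides can be reproduced with a small sketch. The edge layout is an assumption inferred from the surviving figure labels: s→a with capacity 1, a→b with capacity 0.7, and s→b with capacity 0.5. The solver is a generic Edmonds-Karp implementation written for this example, and every name in it is hypothetical.

```python
from collections import deque

def max_flow(capacity, source, sink):
    """Edmonds-Karp max flow on a dict {(u, v): capacity}."""
    residual, adj = {}, {}
    for (u, v), c in capacity.items():
        residual[(u, v)] = residual.get((u, v), 0.0) + c
        residual.setdefault((v, u), 0.0)
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    total, eps = 0.0, 1e-12
    while True:
        # Breadth-first search for an augmenting path in the residual graph.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v in adj.get(u, ()):
                if v not in parent and residual[(u, v)] > eps:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total
        # Find the bottleneck along the path, then push that much flow.
        bottleneck, v = float("inf"), sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[(parent[v], v)])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[(u, v)] -= bottleneck
            residual[(v, u)] += bottleneck
            v = u
        total += bottleneck

# Assumed layout of the figure.
capacity = {("s", "a"): 1.0, ("a", "b"): 0.7, ("s", "b"): 0.5}
before = {v: max_flow(capacity, "s", v) for v in ("a", "b")}

# a withdraws its trust in b (the 0.7 edge becomes 0).
manipulated = dict(capacity)
manipulated[("a", "b")] = 0.0
after = {v: max_flow(manipulated, "s", v) for v in ("a", "b")}

print(before)  # b receives more flow than a
print(after)   # a's value is unchanged, but a now outranks b
```

The value of a never changes, which is the sense in which max flow is value robust; only the comparison between a and b flips.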

Robustness: Shortest Path
  • Shortest path: robust
    • a is higher ranked than b

[Figure: shortest-path example with edge costs c=1, c=1, and c=3; bracketed values [1] at a and [2] at b]

Robustness: Shortest Path
  • Shortest path: robust
    • a is higher ranked than b
    • a can harm b, but a is already higher ranked than b
    • b cannot hurt a, since it is not on the shortest path to a

[Figure: shortest-path example with edge costs c=1, c=3, and c=3; bracketed values [1] at a and [3] at b]
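
A minimal sketch of the shortest-path argument, assuming the figure's edges are s→a (c=1), a→b (c=1, later raised to 3), and s→b (c=3); that layout is inferred from the surviving labels. Distances are computed with a textbook Dijkstra, a lower distance means a higher rank, and the extra edge b→a in the last step is invented to show b trying to interfere.

```python
import heapq

def shortest_distances(cost, source):
    """Dijkstra on {(u, v): c(e)} with nonnegative costs."""
    adj = {}
    for (u, v), c in cost.items():
        adj.setdefault(u, []).append((v, c))
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, c in adj.get(u, []):
            nd = d + c
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

# Assumed layout of the figure.
cost = {("s", "a"): 1.0, ("a", "b"): 1.0, ("s", "b"): 3.0}
before = shortest_distances(cost, "s")

# a can harm b by raising the cost of its edge to b ...
harmed = dict(cost)
harmed[("a", "b")] = 3.0
after = shortest_distances(harmed, "s")

# ... but b cannot affect a: no shortest path to a passes through b.
meddling = dict(cost)
meddling[("b", "a")] = 0.1
unchanged = shortest_distances(meddling, "s")

print(before["a"], before["b"])
print(after["a"], after["b"])
print(unchanged["a"])
```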

Sybilproofness
  • Def: A sybil strategy for node i in $G=(V,E)$ is a graph $G'=(V',E')$ and a subset $U' \subseteq V'$ such that collapsing $U'$ into a single node yields $G$. (T’s are added together.)
  • Def: f is k-sybilproof if there do not exist a pair of nodes i, j and a sybil strategy for i such that $f(G)_i < f(G)_j$ and $f(G')_r > f(G)_j$ for $r \in U'$ and $|U'| \le k+1$.
  • Def: f is sybilproof if it is k-sybilproof for all $k>0$.
  • Key: sybils can only forge recommendations among each other.
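
The collapsing step in the definition can be made concrete. This is a sketch under assumptions: graphs are dicts mapping (u, v) to T(e), the example graphs are invented, and `collapse` is a hypothetical helper that merges every node of U' into i, sums parallel trust values, and drops edges that end up inside the merged node.

```python
def collapse(trust_prime, sybils, i):
    """Merge every node in U' into node i, summing parallel trust values."""
    merged = {}
    for (u, v), t in trust_prime.items():
        u2 = i if u in sybils else u
        v2 = i if v in sybils else v
        if u2 == v2:
            continue  # trust forged inside the sybil cloud disappears
        merged[(u2, v2)] = merged.get((u2, v2), 0.0) + t
    return merged

# Original graph G: j trusts i, and i trusts j.
G = {("j", "i"): 1.0, ("i", "j"): 2.0}

# Candidate sybil strategy for i with U' = {i, i1, i2}.
U_prime = {"i", "i1", "i2"}
G_prime = {
    ("j", "i"): 1.0,    # outside trust is untouched
    ("i", "j"): 1.0,    # i splits its trust in j ...
    ("i1", "j"): 1.0,   # ... between itself and a sybil
    ("i", "i1"): 5.0,   # forged trust inside the cloud
    ("i1", "i2"): 5.0,
    ("i2", "i"): 5.0,
}

# (G', U') is a valid sybil strategy exactly when collapsing U' gives back G.
print(collapse(G_prime, U_prime, "i") == G)
```

Here |U'| = 3, so this strategy is relevant to k-sybilproofness for k ≥ 2.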
Results: Symmetric Reputations
  • Def: A reputation function is symmetric if it is covariant under graph isomorphism.
  • Theorem: There is no nontrivial symmetric sybilproof mechanism.
    • In fact, for any G, any node (except the top one) can improve its ranking via sybils.
  • Theorem: There is no nontrivial symmetric k-sybilproof mechanism, for any $k \ge 1$.
    • (How often this occurs for small k is open.)
Proof (via the butterfly)

[Figure: the butterfly construction, showing G, the sybil set U’, and nodes i, j, and s]

  • Sybilproofness: by symmetry, $f(G')_j = f(G')_s$.
  • k-sybilproofness: build $G'$ one sybil at a time.
Results: Non-Symmetric
  • Theorem: There exist sybilproof reputation functions. (e.g., shortest path)
  • Def: Given a root node $s \in V$, let $\mathcal{P}$ be the set of all collections of edge disjoint paths* from s to i. Let $g$ be a function from paths to reals and $\oplus$ be an (addition-like) operator on the reals.
  • Let $f(G)_i = \max_{P \in \mathcal{P}} \bigoplus_{p \in P} g(p)$.
  • Max flow: $g(p) = \min\{T(e) \mid e \in p\}$, $\oplus = +$
  • Shortest path: $g(p) = \min\{T(e) \mid e \in p\}$, $\oplus = \min$
  • Other generalizations
    • Leaky pipes etc.
Results: Non-Symmetric
  • Theorem: f as defined above is value sybilproof, assuming:
    • If $p'$ is an extension of $p$, then $g(p') < g(p)$.
    • $\oplus$ is nondecreasing and $g$ is nondecreasing with respect to T.
    • If $p = p' + p''$, then $g(p) = g(p') \oplus g(p'')$.
Results: Non-Symmetric
  • Theorem: f as defined above is rank sybilproof iff $\oplus = \max$, assuming:
    • For any $p$ there exists an extension $p'$ such that $g(p) = g(p')$.
  • I.e., f depends on the maximal path.
Summary (Part I)
  • A framework for the analysis of the manipulability of ranking systems.
  • Key distinction: rank vs. value
  • Result 1: all symmetric ranking systems are manipulable.
  • Result 2: “flow based” ranking systems are not value manipulable but are rank manipulable.
  • Result 3: “path based” ranking systems are not manipulable.
Part II: Empirical Analysis of PageRank
  • (Joint with Alice Cheng)
  • (Inspired by Zhang et al. on collusion)
  • Stanford web matrix -- ~280k pages.
  • Question: How often are a small number of sybils helpful?
  • Answer: Surprisingly often!
Summary of Empirical
  • Analytic approximations for these.
  • PageRank is quite manipulable
    • Especially for low ranked pages
      • (but that’s where automated methods are supposed to work!)
Part III: Quantifying the Efficiency of Ranking Mechanisms
  • Work in progress – some preliminary results.
  • Is FlowRank or PageRank better than PathRank?
Model
  • Random graph model (descriptive, not constructive)
  • Follow the intuition behind pagerank
    • Pages link more to “better pages”
    • Better pages are more selective.
    • $\Pr(\text{link}) = f(q_i, q_j)$
      • Increasing in $q_j$
      • FOSD (first-order stochastic dominance) in $q_i$
    • Average outdegree = k ($n \to \infty$)
    • (Many results take $k \to \infty$ and miss important aspects of ranking.)
Finding “Baddies”
  • 2 layer example:
    • ½ of the nodes are H and ½ are L.
    • L’s link uniformly at random.
    • H’s link to H’s with (relative) probability (1+a) and to L’s with (1-a).
    • a=0: random graph
    • a=1: two-tiered graph
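
One way to sample from the two-layer example is sketched below. Several choices here are assumptions made for illustration: each node draws exactly k link targets (rather than a random outdegree with mean k), duplicate links and self-links are discarded, and because the two classes have equal size the relative weights (1+a) and (1-a) are normalized to class probabilities (1+a)/2 and (1-a)/2. The function names are hypothetical.

```python
import random

def two_layer_graph(n, k, a, seed=0):
    """Sample a directed graph with n/2 H nodes and n/2 L nodes."""
    rng = random.Random(seed)
    half = n // 2
    quality = {v: ("H" if v < half else "L") for v in range(n)}
    high = [v for v in range(n) if quality[v] == "H"]
    low = [v for v in range(n) if quality[v] == "L"]
    everyone = list(range(n))
    edges = set()
    for u in range(n):
        for _ in range(k):
            if quality[u] == "H":
                # H nodes prefer H targets with relative weight (1+a):(1-a).
                pool = high if rng.random() < (1 + a) / 2 else low
            else:
                pool = everyone  # L nodes link uniformly at random
            v = rng.choice(pool)
            if v != u:
                edges.add((u, v))
    return quality, edges

def mean_indegree(quality, edges, label):
    """Average number of in-links received by nodes of the given class."""
    members = [v for v in quality if quality[v] == label]
    hits = sum(1 for _, v in edges if quality[v] == label)
    return hits / len(members)

quality, edges = two_layer_graph(n=200, k=5, a=1.0)
print(mean_indegree(quality, edges, "H"), mean_indegree(quality, edges, "L"))
```

With a=1 the H nodes never link to L nodes, which is the two-tiered case; with a=0 both classes link uniformly and the class labels leave no trace in the graph.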
Statistical Inference
  • Now, ranking is a problem of statistical inference
    • G is a random variable
    • r is a statistical estimate of true qualities
    • Note: unlike most inference problems, we only have a single sample.
3 methods
  • PageRank
  • InRank: rank by indegree
  • MLRank: compute a maximum likelihood estimate.
Results
  • $\Pr(\text{error}) = \Pr(r_i > r_j \mid q_i < q_j)$
  • InRank: difference of Poissons
  • PageRank: two stage calculation
    • First by quality then statistical manipulations of PageRank equations.
  • MLRank: find a subgraph with the maximal number of edges.
    • NP-complete
    • Implemented a greedy algorithm
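
The error measure above can be estimated empirically on any single graph. The sketch below does this for InRank on a tiny hand-made example; the graph, the numeric qualities (1 for H, 0 for L), and the helper names are all assumptions for illustration, and the estimate is simply the fraction of ordered pairs with q_i < q_j that the ranking gets wrong.

```python
from itertools import permutations

def inrank(nodes, edges):
    """InRank score: the indegree of each node."""
    r = {v: 0 for v in nodes}
    for _, v in edges:
        r[v] += 1
    return r

def error_rate(r, q):
    """Fraction of pairs (i, j) with q_i < q_j for which r_i > r_j."""
    pairs = [(i, j) for i, j in permutations(q, 2) if q[i] < q[j]]
    if not pairs:
        return 0.0
    wrong = sum(1 for i, j in pairs if r[i] > r[j])
    return wrong / len(pairs)

# Nodes 0 and 1 are high quality, nodes 2 and 3 are low quality.
q = {0: 1, 1: 1, 2: 0, 3: 0}
edges = [(0, 1), (1, 0), (2, 0), (3, 1), (2, 3), (0, 3), (1, 3)]

r = inrank(q, edges)
print(r)                 # indegrees
print(error_rate(r, q))  # low-quality node 3 outranks both high-quality nodes
```

The same `error_rate` function applies unchanged to scores from PageRank or any other method, since it only compares a score dictionary with the true qualities.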
Results

PageRank

PageRank

InRank

Pr(error)

InRank

MLRank

MLRank

a

Results
  • InRank is better than PageRank when the graph is close to random, and PageRank is better otherwise. (General Theorem)
  • Differences can be significant!
  • MLRank is significantly better.
Some Intuition
  • Case a=0 (Sketch -- ignoring special cases)
  • PageRank
    • The $r_j$’s are i.i.d. (in the limit)
  • InRank
  • Theorem: PageRank is more random.
  • (But, also need to consider expected values)
Concluding Comments
  • Reputation systems should be designed from requirements and subject to formal validation.
    • Ex: What problem does PageRank solve? How well does it do it?
    • Ex: Why is FlowRank better than PathRank? Is it? When and why?
  • Aside: fighting link spam
    • Results show that most of the proposed methods can be defeated!
    • Perhaps they work so well because they are not being used and spammers haven’t tried to defeat them. Endogeneity is important!
Concluding Comments
  • Reputation systems are important and deserve formal, careful study!
    • Axiomatic analyses.
    • Econometric analyses.
  • Lots of challenging open problems!