## Ranking Systems: Manipulability and Efficiency

### Ranking Systems: Manipulability and Efficiency

Eric Friedman*, ORIE

Cornell University

(Currently visiting: Dept. of CS, U.C. Berkeley, 2005-6)

*Work supported by NSF ITR-0325453.

Ranking and Reputations

- Reputations are important
- Webpage ranking: links are “recommendations”
- High ranks lead to more “clicks”
- P2P: choosing partners
- eBay: reputations are crucial (and quite valuable).
- Higher reputations lead to higher prices
- PGP: web of trust.
- Spam and DDoS protections

Problems with Reputation Systems

- Gaming reputation systems is becoming a serious problem.
- P2P: SETI@home, Kazaa Lite
- Webpage ranking: link spamming
- Note: most (all?) current reputation systems are ad-hoc
- No formal requirements etc.

A research agenda: Understanding the tradeoffs between manipulability and efficiency

- Quantify the manipulability of ranking systems.
- Quantify the efficiency of ranking systems.
- Find the ranking systems that are on the efficient frontier and maximize various objectives.

Today’s talk (some first steps)

- A framework for manipulability (w/Alice Cheng)
- Characterization of manipulability of ranking systems.
- Empirical analysis of PageRank on the WWW (w/Alice Cheng)
- Evaluating the Efficiency of ranking mechanisms (work in progress)

Part I: Goals and Approach

- Our goal: create a formalism for analyzing and designing reputation systems that are robust to attacks.
- Here we focus on sybils. Although this is important in itself, our goals are much broader.
- Note: the definitions were harder than the proofs.
- Approach: Game theory, mechanism design (e.g., Arrow's Theorem)

Trust Graphs

[Figure: example trust graph on nodes 1, 2, and 3]

- Most reputation systems use trust graphs:
- $G=(V,E)$
- If $e=(i,j)$, then $T(e)$ is $i$'s (direct) trust of $j$.
- Higher $T(e)$ is better.
- Reputation function: $f(G)_i$ = reputation of $i$.
- Rank: $i$ outranks $j$ if $f(G)_i > f(G)_j$
- Note: we focus on rank
- Why use a trust graph?
- Many (most?) interactions are 1st time interactions
- i.e., $(i,j) \notin E$

Some Representative Reputation Systems

- PageRank and related systems (Brin and Page 98, Kleinberg 98, Guha et al. 04)
- Start at an arbitrary node and then take a random walk on the graph.
- Flow methods (e.g., Flake et al. 02, Chuang and Stoica 02)
- Compute the max flow from i to j.
- Shortest path method.
- Let c(e)=1/T(e) then find the shortest path from i to j in terms of c’s.
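The shortest path method can be sketched in a few lines. This is a minimal illustration, assuming the trust graph is stored as a dict mapping edges `(i, j)` to trust values `T(e)`; the function name `shortest_path_costs` and the example graph are hypothetical, not taken from the talk. A lower cost from the root means a higher rank.

```python
import heapq

def shortest_path_costs(trust, root):
    """Dijkstra over edge costs c(e) = 1 / T(e); returns {node: cost from root}."""
    adj = {}
    for (i, j), t in trust.items():
        if t > 0:  # edges with zero trust are treated as absent (an assumption)
            adj.setdefault(i, []).append((j, 1.0 / t))
    dist = {root: 0.0}
    heap = [(0.0, root)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, c in adj.get(u, []):
            nd = d + c
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

# Hypothetical trust graph: s trusts a strongly, b only weakly.
trust = {("s", "a"): 1.0, ("a", "b"): 0.5, ("s", "b"): 0.25}
costs = shortest_path_costs(trust, "s")
```

Here the route to `b` through `a` (cost 1 + 2) beats the direct edge (cost 4), so `a` outranks `b`.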

Sybils

- A single “agent” can replicate itself under a variety of pseudonyms.

Sybil Attacks

- Sybils are essentially unavoidable (Douceur 02)
- Sybil clouds can forge trust among each other.
- Using strong cryptography to prevent them is expensive and awkward.

Sybils in Practice

- Web ranking: create a large number of dummy websites that all link to each other.
- P2P: create a large number of peers that give each other high ratings.
- eBay: fake transactions with yourself.
- Amazon shopping: post high evaluations of your own products.

Robustness Against Sybils

- PageRank: not robust.
- Empirically, a few sybils can increase PageRank scores dramatically. (more later)
- Max-flow: value robust but not rank robust.
- Shortest path: robust.

Robustness: PageRank

- PageRank: not robust.
- Create a “flower”
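The flower attack can be illustrated with a small power-iteration sketch. Everything here is an assumption made for illustration: the damping factor 0.85, the toy four-page graph, the five sybils, and the helper names `pagerank` and `position`. The attacker (page 3) links to each sybil and each sybil links back, forming the petals.

```python
def pagerank(edges, nodes, damping=0.85, iters=200):
    """Plain power iteration; dangling mass is spread uniformly."""
    n = len(nodes)
    out = {v: [] for v in nodes}
    for i, j in edges:
        out[i].append(j)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        dangling = sum(rank[v] for v in nodes if not out[v])
        nxt = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for w in out[v]:
                nxt[w] += damping * rank[v] / len(out[v])
        rank = nxt
    return rank

def position(rank, node):
    """0 means top ranked."""
    ordered = sorted(rank, key=rank.get, reverse=True)
    return ordered.index(node)

# Honest graph: pages 0, 1, 2 form a cycle; page 3 has no in-links.
base_nodes = [0, 1, 2, 3]
base_edges = [(0, 1), (1, 2), (2, 0), (3, 0)]
before = position(pagerank(base_edges, base_nodes), 3)

# Flower: page 3 creates five sybils and exchanges links with each of them.
sybils = [f"sybil{n}" for n in range(5)]
flower_edges = base_edges + [(3, s) for s in sybils] + [(s, 3) for s in sybils]
after = position(pagerank(flower_edges, base_nodes + sybils), 3)
```

In this toy graph the attacker starts out ranked last, and `after < before` once the petals are added.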

Robustness: Maxflow

- Max-flow: Designed for value robustness
- Flow into and out of sybil cloud cannot be changed!
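A small max-flow sketch shows why forged trust inside the cloud does not raise any sybil's value: the flow from the root is still bounded by the cut between the honest nodes and the attacker. The Edmonds-Karp implementation, the function name `max_flow`, and the example capacities are illustrative assumptions, and the function assumes the source and sink are distinct.

```python
from collections import deque

def max_flow(capacity, source, sink):
    """Edmonds-Karp on a dict {(u, v): capacity}; assumes source != sink."""
    residual = {source: {}, sink: {}}
    for (u, v), c in capacity.items():
        residual.setdefault(u, {})
        residual.setdefault(v, {})
        residual[u][v] = residual[u].get(v, 0) + c
        residual[v].setdefault(u, 0)  # reverse edge for the residual graph
    flow = 0
    while True:
        # Breadth-first search for an augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow
        # Find the bottleneck along the path, then push flow along it.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck

# Honest graph: total trust flowing from root s into attacker i is 2.
honest = {("s", "a"): 2, ("a", "i"): 1, ("s", "i"): 1}
value_before = max_flow(honest, "s", "i")

# Attacker adds sybils u1, u2 with large forged trust among themselves.
forged = {("i", "u1"): 100, ("i", "u2"): 100, ("u1", "u2"): 100, ("u2", "u1"): 100}
attacked = {**honest, **forged}
best_sybil_value = max(max_flow(attacked, "s", u) for u in ("u1", "u2"))
```

However large the forged capacities are, `best_sybil_value` cannot exceed `value_before`, because every unit of flow must still cross the two edges into `i`.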

[Figure: root s, the min cut, and the sybil cloud]

Robustness: Shortest Path

- Shortest path: robust
- a is higher ranked than b
- a can harm b, but a is already higher ranked than b
- b cannot hurt a, since it is not on the shortest path to a

[Figure: nodes a and b with edge costs c=1, c=3, c=3 and labels [1], [3]]

Sybilproofness

- Def: A sybil strategy for node $i$ in $G=(V,E)$ is a graph $G'=(V',E')$ and a subset $U' \subseteq V'$ such that collapsing $U'$ into a single node yields $G$. (The $T$'s are added together.)
- Def: $f$ is $k$-sybilproof if there do not exist a pair of nodes $i, j$ and a sybil strategy for $i$ such that $f(G)_i < f(G)_j$ and $f(G')_r > f(G)_j$ for some $r \in U'$, with $|U'| \le k+1$.
- Def: $f$ is sybilproof if it is $k$-sybilproof for all $k>0$.
- Key: sybils can only forge recommendations among each other.
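The collapsing step in the definition of a sybil strategy can be made concrete. This is a sketch under assumptions: graphs are dicts from edges to trust values, and the function name `collapse` and the example graphs are hypothetical. Edges that end up inside the collapsed node (the forged recommendations) vanish, and parallel edges have their trust values added.

```python
def collapse(trust, sybil_set, original):
    """Merge every node in sybil_set into `original`, summing trust on merged edges."""
    collapsed = {}
    for (u, v), t in trust.items():
        u2 = original if u in sybil_set else u
        v2 = original if v in sybil_set else v
        if u2 == v2:
            continue  # trust forged inside the sybil cloud disappears
        collapsed[(u2, v2)] = collapsed.get((u2, v2), 0) + t
    return collapsed

# Original graph G.
G = {("s", "i"): 2, ("i", "j"): 1}

# Candidate sybil strategy: node i splits into {i, u1} and forges trust inside.
G_prime = {
    ("s", "i"): 1, ("s", "u1"): 1,      # s's trust of i is split across the sybils
    ("i", "u1"): 50, ("u1", "i"): 50,   # forged recommendations among sybils
    ("u1", "j"): 1,
}
U_prime = {"i", "u1"}
is_valid_strategy = collapse(G_prime, U_prime, "i") == G
```

A pair `(G_prime, U_prime)` is a legitimate sybil strategy for `i` in this sense only when the collapsed graph equals `G`.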

Results: Symmetric Reputations

- Def: A reputation function is symmetric if it is covariant under graph isomorphism.
- Theorem: There is no nontrivial symmetric sybilproof mechanism.
- In fact, for any G, any node (except the top one) can improve their ranking via sybils
- Theorem: There is no nontrivial symmetric $k$-sybilproof mechanism, for any $k \ge 1$.
- (How often this occurs for small k is open.)

Proof (via the butterfly)

[Figure: the butterfly construction, showing nodes j, s, i, the graph G, and the sybil set U′]

- Sybilproofness: by symmetry, $f(G')_j = f(G')_s$
- $k$-sybilproofness: build $G'$ one sybil at a time

Results: Non-Symmetric

- Theorem: There exist sybilproof reputation functions. (e.g., shortest path)
- Def: Given a root node $s \in V$, let $\mathcal{P}$ be the set of all collections of edge-disjoint paths* from $s$ to $i$. Let $g$ be a function from paths to reals and $\oplus$ be an (addition-like) operator on the reals.

Results: Non-Symmetric

- Let $f(G)_i = \max_{P \in \mathcal{P}} \bigoplus_{p \in P} g(p)$
- Max flow: $g(p) = \min\{T(e) \mid e \in p\}$, $\oplus = +$
- Shortest path: $g(p) = \min\{T(e) \mid e \in p\}$, $\oplus = \min$
- Other generalizations
- Leaky pipes etc.

Results: Non-Symmetric

- Theorem: $f$ as defined above is value sybilproof assuming
- If $p'$ is an extension of $p$, then $g(p') < g(p)$.
- $\oplus$ is nondecreasing and $g$ is nondecreasing with respect to $T$.
- If $p = p' + p''$ then $g(p) = g(p') \oplus g(p'')$

Results: Non-Symmetric

- Theorem: $f$ as defined above is rank sybilproof iff $\oplus = \max$, assuming:
- For any $p$ there exists an extension $p'$ such that $g(p) = g(p')$.
- I.e., f depends on the maximal path.

Summary (Part I)

- A framework for the analysis of the manipulability of ranking systems.
- Key distinction: rank vs. value
- Result 1: all symmetric ranking systems are manipulable.
- Result 2: “flow based” ranking systems are not value manipulable but are rank manipulable.
- Result 3: “path based” ranking systems are not manipulable.

Part II: Empirical Analysis of PageRank

- (Joint with Alice Cheng)
- (Inspired by Zhang et al. on collusion)
- Stanford web matrix -- ~280k pages.
- Question: How often are a small number of sybils helpful?
- Answer: Surprisingly often!

Summary of Empirical

- Analytic approximations for these.
- PageRank is quite manipulable
- Especially for low ranked pages
- (but that’s where automated methods are supposed to work!)

Part III: Quantifying the Efficiency of Ranking Mechanisms

- Work in progress – some preliminary results.
- Is FlowRank or PageRank better than PathRank?

Model

- Random graph model (descriptive, not constructive)
- Follow the intuition behind PageRank
- Pages link more to “better pages”
- Better pages are more selective.
- $\Pr(\text{link}) = f(q_i, q_j)$
- Increasing in $q_j$
- FOSD (first-order stochastic dominance) in $q_i$
- Average outdegree $= k$ ($n \to \infty$)
- (Many results have $k \to \infty$, and miss important aspects of ranking.)

Finding “Baddies”

- 2 layer example:
- Half of the nodes are type H and half are type L
- L's link uniformly at random
- H's link to H's with (relative) probability $(1+a)$ and to L's with $(1-a)$.
- a=0, random graph
- a=1, two tiered graph
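One way to simulate the two-layer example is sketched below. The talk does not specify a sampling procedure, so this is an assumption: each possible edge is included independently, with probabilities proportional to the relative weights and scaled so that every node has expected outdegree `k`. The function name `two_tier_graph` and its parameters are hypothetical.

```python
import random

def two_tier_graph(n, k, a, seed=0):
    """Nodes 0..n/2-1 are type H, the rest type L; returns (set of H nodes, edge list)."""
    rng = random.Random(seed)
    high = set(range(n // 2))
    edges = []
    for i in range(n):
        weights = {}
        for j in range(n):
            if j == i:
                continue  # no self-links
            if i in high:
                # H's favour H targets by (1 + a) and disfavour L targets by (1 - a).
                weights[j] = 1.0 + a if j in high else 1.0 - a
            else:
                weights[j] = 1.0  # L's link uniformly at random
        total = sum(weights.values())
        for j, w in weights.items():
            # Scaled so the expected outdegree of i is k (assumes k * w / total <= 1).
            if rng.random() < k * w / total:
                edges.append((i, j))
    return high, edges

high, edges = two_tier_graph(n=200, k=5, a=1.0)
h_to_l = [(i, j) for i, j in edges if i in high and j not in high]
```

With `a=1.0` the weight on H-to-L links is zero, so `h_to_l` is empty and the graph is two-tiered; with `a=0.0` every node links uniformly at random.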

Statistical Inference

- Now, ranking is a problem of statistical inference
- G is a random variable
- r is a statistical estimate of true qualities
- Note: unlike most inference problems we only have a single sample

3 methods

- PageRank
- InRank: rank by indegree
- MLRank: compute a maximum likelihood estimate.
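Of the three methods, InRank is simple enough to sketch directly. The function name `in_rank` and the example graph are illustrative assumptions; ties keep their input order because Python's sort is stable.

```python
from collections import Counter

def in_rank(nodes, edges):
    """Order nodes by indegree, highest first."""
    indegree = Counter(j for _, j in edges)
    return sorted(nodes, key=lambda v: indegree[v], reverse=True)

nodes = ["a", "b", "c"]
edges = [("a", "c"), ("b", "c"), ("c", "a")]
ranking = in_rank(nodes, edges)
```

Here `c` has two in-links, `a` has one, and `b` has none.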

Results

- $\Pr(\text{error}) = \Pr(r_i > r_j \mid q_i < q_j)$
- InRank: difference of Poissons
- PageRank: two stage calculation
- First by quality then statistical manipulations of PageRank equations.
- MLRank: find a subgraph with the maximal number of edges.
- NP complete
- Implemented a greedy algorithm
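The error probability above can be estimated empirically on a single sampled graph. This sketch assumes true qualities and computed scores are both available as dicts keyed by node; the function name `pairwise_error_rate` and the example values are hypothetical. It counts the fraction of ordered pairs with q_i < q_j for which the ranking nevertheless puts i above j.

```python
from itertools import permutations

def pairwise_error_rate(quality, score):
    """Empirical Pr(r_i > r_j | q_i < q_j) over all ordered pairs of nodes."""
    relevant = 0
    errors = 0
    for i, j in permutations(quality, 2):
        if quality[i] < quality[j]:
            relevant += 1
            if score[i] > score[j]:
                errors += 1
    return errors / relevant if relevant else 0.0

quality = {"a": 1, "b": 2, "c": 3}       # true qualities q
score = {"a": 5.0, "b": 1.0, "c": 3.0}   # scores r from some ranking method
rate = pairwise_error_rate(quality, score)
```

In this example node `a` is wrongly placed above both `b` and `c`, so two of the three relevant pairs are errors.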

Results

- InRank is better than PageRank when the graph is close to random; PageRank is better otherwise. (General Theorem)
- Differences can be significant!
- MLRank is significantly better.

Some Intuition

- Case a=0 (Sketch -- ignoring special cases)
- PageRank
- $r_j$'s are i.i.d. (in the limit)
- InRank
- Theorem: PageRank is more random.
- (But, also need to consider expected values)

Concluding Comments

- Reputation systems should be designed from requirements and subject to formal validation.
- Ex: What problem does PageRank solve? How well does it do it?
- Ex: Why is FlowRank better than PathRank? Is it? When and why?
- Aside: fighting link spam
- Results show that most of the proposed methods can be defeated!
- Perhaps they work so well because they are not being used and spammers haven’t tried to defeat them. Endogeneity is important!

Concluding Comments

- Reputation systems are important and deserve formal, careful, study!
- Axiomatic analyses.
- Econometric analyses.
- Lots of challenging open problems!
