Some Maths of use in the Computer Systems World


Milan Vojnović, Microsoft Research, Cambridge, UK. CCA Industrial Seminar, University of Cambridge, October 25, 2012. Outline: online contests, distributed systems, algorithms for big data.


### Some Maths of use in the Computer Systems World

Milan Vojnović

Microsoft Research, Cambridge, UK

CCA Industrial Seminar, University of Cambridge, October 25, 2012

Outline
• Online contests
• Distributed systems
• Algorithms for big data
Some Questions
• How do we best design an online contest?
• How to allocate prizes?
• How to infer skills of players?
• There is, in fact, quite a lot of maths behind it!
Standard All-Pay Contest

[Figure: prize allocation]

Game Model
• Payoff: $s_i(b) = v_i \, x_i(b) - c_i(b_i)$
• $v_i$ = valuation (or ability, or skill)
• $x_i(b)$ = winning probability
• $c_i(b_i)$ = production cost

Game Model (cont’d)
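• For a linear production cost $c_i b_i$, the game is strategically equivalent to one with payoff $\frac{v_i}{c_i} \, x_i(b) - b_i$, where $c_i$ = marginal production cost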
Types of Games
• Complete information: the valuations $v_1, \ldots, v_n$ are assumed to be common knowledge
• Incomplete information (or “private values”): the valuations are assumed to be a sample from a distribution $F$, where $F$ is common knowledge
• Special case: i.i.d. valuations
Nash Equilibrium
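• A vector of efforts $b = (b_1, \ldots, b_n)$ is said to be a pure-strategy Nash equilibrium if for every player $i$ and every alternative effort $b_i'$: $s_i(b_i, b_{-i}) \ge s_i(b_i', b_{-i})$, where $s_i$ denotes the payoff of player $i$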
• Mixed-strategy: players use randomized strategies
Complete Information Game
• There exists no pure-strategy Nash equilibrium
• There exists a mixed-strategy Nash equilibrium, not necessarily unique
• Fully characterized by Baye, Kovenock and de Vries (1996)
Example of Two Players
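• Equilibrium bid distributions, for valuations $v_1 \ge v_2 > 0$: $B_1(x) = \frac{x}{v_2}$ and $B_2(x) = 1 - \frac{v_2}{v_1} + \frac{x}{v_1}$, for $x \in [0, v_2]$
Example of Two Players (cont’d)
• Expected total effort: $\frac{v_2}{2}\left(1 + \frac{v_2}{v_1}\right)$
• Expected maximum individual effort: $\frac{v_2}{2}\left(1 + \frac{v_2}{3 v_1}\right)$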
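
The two-player example can be checked numerically. The sketch below is a Monte Carlo estimate under stated assumptions: valuations $v_1 \ge v_2 > 0$, a unit prize share per win, and the textbook mixed equilibrium in which player 1 bids uniformly on $[0, v_2]$ while player 2 bids 0 with probability $1 - v_2/v_1$ and otherwise uniformly on $[0, v_2]$. The function names are illustrative only.

```python
import random

def simulate_two_player_all_pay(v1, v2, rounds=200_000, seed=0):
    """Monte Carlo estimate of expected total and maximum effort.

    Assumes v1 >= v2 > 0 and the mixed equilibrium described above.
    """
    rng = random.Random(seed)
    total = 0.0
    maximum = 0.0
    for _ in range(rounds):
        b1 = rng.uniform(0.0, v2)
        # Player 2 stays out (bids 0) with probability 1 - v2/v1.
        b2 = rng.uniform(0.0, v2) if rng.random() < v2 / v1 else 0.0
        total += b1 + b2
        maximum += max(b1, b2)
    return total / rounds, maximum / rounds

def closed_form(v1, v2):
    """Closed-form expected total and maximum effort for v1 >= v2 > 0."""
    total = 0.5 * v2 * (1.0 + v2 / v1)
    maximum = 0.5 * v2 * (1.0 + v2 / (3.0 * v1))
    return total, maximum

est_total, est_max = simulate_two_player_all_pay(2.0, 1.0)
exact_total, exact_max = closed_form(2.0, 1.0)
```

The estimates should agree with the closed forms up to sampling noise; both quantities shrink as the weaker player's valuation $v_2$ falls relative to $v_1$.
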
Incomplete Information Game
• Assume that valuations are i.i.d. according to distribution F on [0,1]
• There exists a symmetric pure-strategy Nash equilibrium
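• Total expected effort: $\mathbb{E}[v_{(n-1:n)}]$, the expected second-highest valuation among the $n$ players (for a unit prize awarded to the highest effort)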
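
As a concrete instance, here is a small sketch for the special case of valuations that are i.i.d. uniform on $[0, 1]$ with a unit winner-take-all prize. In that case the symmetric equilibrium effort is $b(v) = \int_0^v x \, dF^{n-1}(x) = \frac{n-1}{n} v^n$. The uniform assumption and the function names are choices made for this illustration.

```python
import random

def equilibrium_bid(v, n):
    """Symmetric equilibrium effort for n players, unit prize,
    valuations i.i.d. uniform on [0, 1]: b(v) = (n - 1) / n * v**n."""
    return (n - 1) / n * v ** n

def expected_total_effort(n, rounds=100_000, seed=1):
    """Monte Carlo estimate of the expected sum of equilibrium efforts."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(rounds):
        acc += sum(equilibrium_bid(rng.random(), n) for _ in range(n))
    return acc / rounds

def expected_second_highest(n):
    """Mean of the second-largest of n uniform[0, 1] draws."""
    return (n - 1) / (n + 1)
```

For the uniform case the simulated total effort should match the expected second-highest valuation, which is the revenue-equivalence benchmark.
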
Optimality of the Winner-Take-All
• Suppose a contest owner wants to maximize the expected total effort, or maximum individual effort
• Theorem: For a contest among players with i.i.d. valuations, it is optimal to allocate the whole prize to the best-performing player [Moldovanu and Sela, 2001; Chawla, Hartline and Sivan, 2012]
Optimal Contest
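• For a contest among $n$ players with i.i.d. valuations according to distribution $F$, the expected maximum individual effort is maximized by the allocation $x$ that maximizes $\int_0^1 x(v) \, \psi_n(v) \, dF(v)^n$, where $\psi_n(v) = v - \frac{1 - F(v)^n}{n F(v)^{n-1} f(v)}$ is the virtual valuation with respect to $F^n$ and $f$ is the density of $F$ [Chawla, Hartline and Sivan, 2012]
• The result follows from the celebrated revenue equivalence theorem [Myerson, 1981]
• Replacing $F^n$ with $F$ corresponds to maximizing the expected total effort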

Optimal Contest (cont’d)
• Suppose the virtual valuation $\psi_n$ is a monotone non-decreasing function for every positive integer $n$
• The optimal all-pay contest uses the winner-take-all prize allocation rule with minimum valuation $\psi_n^{-1}(0)$
• Ex. uniform distribution on $[0, 1]$: the minimum required effort is $\frac{1}{n+1}$
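
The minimum valuation can be located numerically. The following sketch assumes valuations uniform on $[0, 1]$ and takes the virtual valuation with respect to the distribution $v^n$ of the highest valuation, that is $\psi_n(v) = v - \frac{1 - v^n}{n v^{n-1}}$. Both the choice of distribution and the function names are assumptions of this illustration, not taken from the slides.

```python
def virtual_valuation(v, n):
    """Virtual valuation with respect to G(v) = v**n, the distribution of
    the highest of n uniform[0, 1] valuations (an assumption of this sketch)."""
    return v - (1.0 - v ** n) / (n * v ** (n - 1))

def minimum_valuation(n, tol=1e-12):
    """Bisection for the root of the virtual valuation.

    For the uniform case the root is (n + 1) ** (-1 / n), which always
    lies in [1/2, 1), so the bracket [0.25, 1] is safe.
    """
    lo, hi = 0.25, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if virtual_valuation(mid, n) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def minimum_effort(n):
    """Effort of the marginal type: its valuation times its winning
    probability v**(n - 1) under winner-take-all."""
    v_star = minimum_valuation(n)
    return v_star * v_star ** (n - 1)
```

Under these assumptions the minimum effort comes out as $1/(n+1)$, so the entry threshold in effort terms falls as the contest grows.
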
Parallel Contests

[DiPalantino and V., 2009]

Parallel Contests
• There exists a symmetric Bayes-Nash equilibrium
• Segmentation into skill levels
Reward vs. Participation
• In the many-auctions limit, the expected number of participants per contest:
• = expected number per auction
• = fraction of auctions of class
• Diminishing returns of participation with reward
Does this Make Any Practical Sense?
• Rewards vs. participation observed at Taskcn

[Figure: reward vs. participation at Taskcn, with curves labelled: any rate, once a month, every fourth day, every second day]

Statistical Inference of Skill
• Probabilistic model: skill to performance
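• Ex. $P(i \text{ beats } j) = F(s_i - s_j)$, where $s_i$ represents the skill of player $i$
• Ex 1. probit: $F = \Phi$, the standard normal distribution function
• Ex 2. logit: $F(x) = \frac{1}{1 + e^{-x}}$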
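
A minimal sketch of the two link functions, assuming the win probability depends only on the skill difference $s_i - s_j$ with unit scale. The function names are illustrative.

```python
import math

def std_normal_cdf(x):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def win_prob_probit(s_i, s_j):
    """P(i beats j) under the probit model with unit scale."""
    return std_normal_cdf(s_i - s_j)

def win_prob_logit(s_i, s_j):
    """P(i beats j) under the logit model with unit scale."""
    return 1.0 / (1.0 + math.exp(-(s_i - s_j)))
```

Both models give probability 1/2 for equal skills and satisfy $P(i \text{ beats } j) + P(j \text{ beats } i) = 1$; they differ mainly in how quickly the probability saturates for large skill gaps.
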
Maximum Likelihood Approach
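• Ex. Bradley-Terry Model [Zermelo, 1929]: $P(i \text{ beats } j) = \frac{\theta_i}{\theta_i + \theta_j}$
• Iterative solver: given an irreducible matrix of outcomes $W = (w_{ij})$, where $w_{ij}$ is the number of wins of $i$ over $j$, and an initial value $\theta^{(0)}$: $\theta_i^{(t+1)} = \frac{\sum_{j \ne i} w_{ij}}{\sum_{j \ne i} (w_{ij} + w_{ji}) / (\theta_i^{(t)} + \theta_j^{(t)})}$
• Guaranteed convergence to a unique limit point (up to a multiplicative constant) [Ford, 1957]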
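
The iterative solver can be sketched as follows. This is a minimal illustration of the classical fixed-point iteration for the Bradley-Terry maximum likelihood estimate, assuming `wins[i][j]` holds the number of times player `i` beat player `j` and that the outcome matrix is irreducible. Normalising the strengths to sum to 1 is a choice made here to pin down the multiplicative constant.

```python
def bradley_terry(wins, iterations=1000, tol=1e-12):
    """Estimate Bradley-Terry strengths from a matrix of pairwise wins.

    wins[i][j] is the number of wins of i over j. Returns strengths
    normalised to sum to 1.
    """
    n = len(wins)
    theta = [1.0 / n] * n
    for _ in range(iterations):
        new = []
        for i in range(n):
            total_wins = sum(wins[i][j] for j in range(n) if j != i)
            denom = sum((wins[i][j] + wins[j][i]) / (theta[i] + theta[j])
                        for j in range(n) if j != i)
            new.append(total_wins / denom)
        scale = sum(new)
        new = [x / scale for x in new]
        converged = max(abs(a - b) for a, b in zip(new, theta)) < tol
        theta = new
        if converged:
            break
    return theta

# Player 0 beat player 1 three times and lost once.
strengths = bradley_terry([[0, 3], [1, 0]])
```

With two players the estimate reduces to the empirical win fractions, which is a quick sanity check before using the solver on larger outcome matrices.
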
Online Algorithms
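• Elo Rating System: $r_i \leftarrow r_i + K \, (y_i - p_i)$, where $y_i \in \{0, \tfrac{1}{2}, 1\}$ is the match outcome of player $i$ and $p_i = \frac{1}{1 + 10^{(r_j - r_i)/400}}$ is the predicted outcome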
• More complicated variant
• Arbitrary number of participants in a contest
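
For the basic two-player case, an online Elo update can be sketched as below. The logistic curve with base 10 and scale 400 is the conventional chess parameterisation, and the step size `k=32` is a commonly used value; both are assumptions of this sketch rather than values given in the talk.

```python
def elo_expected(r_a, r_b):
    """Predicted score of player A against player B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def elo_update(r_a, r_b, score_a, k=32.0):
    """Return the updated ratings after one match.

    score_a is 1 for a win of A, 0.5 for a draw, 0 for a loss.
    """
    delta = k * (score_a - elo_expected(r_a, r_b))
    return r_a + delta, r_b - delta

new_a, new_b = elo_update(1500.0, 1500.0, 1.0)
```

The update is zero-sum: whatever one player gains the other loses, and an upset moves the ratings more than an expected result.
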
Bayesian Approach
• Skill assumed to be a random variable
• Ex. Gaussian prior: $s_i \sim \mathcal{N}(\mu_i, \sigma_i^2)$
• Posterior distribution adjusted based on observed match outcomes
• Ex. Graphical model
• Examples:
• Glicko rating system
• TrueSkill™ (Xbox Halo game)
Outline
• Online contests
• Distributed systems
• Algorithms for big data
A Catalogue of Problems
• Ranking of information items
• Information diffusion
• Opinion formation
• Distributed hypothesis testing
Distributed Mode Selection
• Majority selection: given two alternatives, find the majority winner (a.k.a. consensus, quantile)
• Plurality selection: given $m$ alternatives, find a plurality winner
• Distributed preference data across nodes in a system
• How do simple algorithms with bounded memory and communication perform?

[Figure: network of nodes, each holding an initial state 0 or 1]

• Each node is to correctly decide whether 0 or 1 is preferred by the majority of nodes
Questions
• Design an efficient algorithm
• Probability of correctness
• Convergence speed
Related Work: Classical Voter Model
• A node copies the state of the contacted node
• Binary memory and communication
• Probability of incorrect outcome: on a complete graph, equal to the initial fraction of nodes holding the minority state [More general result: Hassin and Peleg, 2001]
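
A small simulation illustrates the voter model's error probability. The sketch assumes a complete graph in which, at each step, a uniformly chosen node contacts another node chosen uniformly at random and copies its state. The function names and the random-scheduling details are assumptions of this illustration.

```python
import random

def voter_model_run(initial, rng):
    """Run the voter model on a complete graph until consensus.

    initial is a list of 0/1 states; returns the consensus state.
    """
    states = list(initial)
    n = len(states)
    ones = sum(states)
    while 0 < ones < n:
        i = rng.randrange(n)
        j = rng.randrange(n - 1)
        if j >= i:
            j += 1  # contacted node, uniform over the other nodes
        ones += states[j] - states[i]
        states[i] = states[j]
    return states[0]

def voter_error_rate(n_zero, n_one, trials, seed=0):
    """Fraction of runs that end in the initial minority state."""
    rng = random.Random(seed)
    majority = 1 if n_one > n_zero else 0
    initial = [0] * n_zero + [1] * n_one
    errors = sum(voter_model_run(initial, rng) != majority
                 for _ in range(trials))
    return errors / trials
```

Because the number of nodes in state 1 is a martingale under this dynamics, the empirical error rate should be close to the initial minority fraction, which does not vanish as the system grows.
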


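Related Work: m-ary Hypothesis Testing
• Q: How many states does S need in order to decide the correct hypothesis with probability going to 1 as the number of observations grows?
• m+1 states are necessary and sufficient (Koplowitz, 1975)

[Figure: an i.i.d. binary stream 000110111110100011, whose mean is determined by the hypothesis $H_i$, observed by the finite-state machine S]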

Ternary Algorithm
• Three states: 0, e, 1

[Figure: example contacts between nodes in states 0, e and 1, and the resulting state updates]

[Perron, Vasudevan and V., 2009]

Ternary Dynamics
• Assume complete graph
• Markov process
• U = number of nodes in state 0
• V = number of nodes in state 1
• n = total number of nodes
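
The ternary dynamics on a complete graph can be simulated directly. The update rule used below is an assumption of this sketch, stated here because the transition figure did not survive: a decided node (0 or 1) that contacts a node holding the opposite decided state becomes undecided (e), and an undecided node adopts the state of the contacted node.

```python
import random

def ternary_step(own, seen):
    """New state of a node in state `own` after contacting a node in `seen`."""
    if own == "e":
        return seen  # adopt 0 or 1; remain e if the contact is also e
    if seen in ("0", "1") and seen != own:
        return "e"   # conflicting opinion: become undecided
    return own

def ternary_consensus(n_zero, n_one, rng):
    """Simulate on a complete graph until all nodes are 0 or all are 1."""
    states = ["0"] * n_zero + ["1"] * n_one
    n = len(states)
    counts = {"0": n_zero, "e": 0, "1": n_one}
    while counts["0"] < n and counts["1"] < n:
        i = rng.randrange(n)
        j = rng.randrange(n - 1)
        if j >= i:
            j += 1  # contacted node, uniform over the other nodes
        new = ternary_step(states[i], states[j])
        counts[states[i]] -= 1
        counts[new] += 1
        states[i] = new
    return "0" if counts["0"] == n else "1"
```

Running `ternary_consensus(30, 10, random.Random(7))` repeatedly should return the initial majority state in the vast majority of runs, in contrast with the voter model, whose error probability stays at the minority fraction.
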
Probability of Error
• Theorem: Assume ,
Probability of Error (Cont’d)
• Lemma: is a solution ofwith boundary conditions for , and , for
• Equivalence to the probability of hitting
Ballot Theorem Argument

• Count the number of paths between two given points that do not intersect a given line

The Error Exponent
• Theorem: For , , large
• Observation: the probability of error decays exponentially for large $n$
Plurality Selection
• m alternatives
• 2m states: a weak and a strong state for each alternative

[Figure: states $s$ and $s'$ for alternatives 1, 2, ..., m, with an observer node]

[Jung, Kim and V., 2012]

Convergence Time
• Suppose
• Given is -convergence time:
Quaternary Algorithm
• Four states: 0, e0, e1, 1
• Update rules: swap and annihilate
• (0, e0) → (e0, 0)
• (0, e1) → (e0, 0)
• (0, 1) → (e1, e0)
• (e0, e1) → (e1, e0)
• (e0, 1) → (1, e1)
• (e1, 1) → (1, e1)

Probability of Correctness
• For any connected graph, convergence to the correct state w.p. 1 [Benezit et al., 2010]
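
The quaternary algorithm can be simulated on an arbitrary connected graph. The sketch below assumes the pairwise swap and annihilate rules listed in `RULES` (a reading of the transition table, with 0 and 1 the strong states and e0, e1 the weak ones) and activates a uniformly random edge at each step. The function names, the edge-list input format and the `max_steps` guard are choices made for this illustration.

```python
import random

RULES = {
    ("0", "e0"): ("e0", "0"),    # swap
    ("0", "e1"): ("e0", "0"),    # strong 0 moves and converts e1
    ("0", "1"): ("e1", "e0"),    # annihilate
    ("e0", "e1"): ("e1", "e0"),  # swap
    ("e0", "1"): ("1", "e1"),    # strong 1 moves and converts e0
    ("e1", "1"): ("1", "e1"),    # swap
}

def interact(a, b):
    """States of two adjacent nodes after their edge is activated."""
    if (a, b) in RULES:
        return RULES[(a, b)]
    if (b, a) in RULES:
        new_b, new_a = RULES[(b, a)]
        return new_a, new_b
    return a, b  # identical states: nothing changes

def quaternary_consensus(initial, edges, rng, max_steps=1_000_000):
    """Run until no node holds a minority state; return the winner."""
    states = list(initial)
    for _ in range(max_steps):
        if all(s in ("0", "e0") for s in states):
            return "0"
        if all(s in ("1", "e1") for s in states):
            return "1"
        i, j = rng.choice(edges)
        states[i], states[j] = interact(states[i], states[j])
    return None  # not converged within max_steps (e.g. an exact tie)

# Path graph on 9 nodes with 5 zeros and 4 ones.
path_edges = [(i, i + 1) for i in range(8)]
start = ["0", "1", "0", "1", "0", "1", "0", "1", "0"]
winner = quaternary_consensus(start, path_edges, random.Random(0))
```

The difference between the numbers of strong 0 and strong 1 states is preserved by every rule, which is why the initial majority always survives; the price is a convergence time that depends on the graph.
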
Convergence Time
• Each edge $(i, j)$ is activated at the instants of a Poisson process with rate $q_{i,j}$
• Define the family of matrices , for every non-empty subset of nodes :
Convergence Time (cont’d)
• Lemma: For any finite graph G, there exists such that if is an eigenvalue of matrix then
Convergence Time (cont’d)
• Theorem: Let be the smallest time at which none of the nodes is in the minority state or 1. Then, it holds

[Draief and V., 2012]

Example: Complete Graph
• Lemma:
• Corollary: where
• Compare with the ternary algorithm
Outline
• Online contests
• Distributed systems
• Algorithms for big data
Example Types of Problems
• Database queries
• Ex. Sum, histogram, quantiles, property testing
• Machine learning
• Ex. Classification, distributed regression, multi-armed bandits
• Combinatorial optimization
• Ex. graph partition
Systems
• Batch processing: Cosmos
• Real-time analytics: Storm, Impala
• Graph processing: Pregel, Apache Giraph
• Machine learning: TLC, SIGMA, Vowpal Wabbit

Example: Count Tracking
• Problem: continuously track a partial sum of values within relative accuracy $\epsilon$

Communication cost by type of input stream:
• General case
• Monotonic partial sum
• Random permutation

[Huang, Yi and Zhang, 2012]

Upper Bound
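• Push-based scheme:
• Each site reports to the coordinator upon receiving the $i$-th update $X_i$ with a prescribed probability (indicated in the figure by $M_i = 1$)

[Figure: sites holding partial sums $S_1, \ldots, S_k$ and a coordinator holding $S = S_1 + \ldots + S_k$; a reporting site and the coordinator exchange $S$ and the site's partial sum]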

Lower Bound
• A key query problem:
• Necessary and sufficient: messages to answer the query correctly with constant probability of error

If , is it or ?

[Figure: sites 1, 2, ..., k, each receiving i.i.d. inputs]

Some Figures of Scale

Currently, about 1 billion users with average degree 130, and 1.13 trillion "likes" per month

More than 1.5 billion social relations

The Web graph had over a trillion edges in 2011

Graph Partitioning Problem
• Given a graph $G$ and an integer $k$, partition the vertices into $k$ partitions such that the number of vertices per partition is balanced and the number of cut-edges is minimized
• Graph presented as a stream of vertices
• Applications: parallel computations, community detection
• Streaming algorithm for graph partitioning with provable approximation guarantees?
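
To make the streaming setting concrete, here is a sketch of one simple one-pass heuristic, the linear deterministic greedy rule described by Stanton and Kliot (2012): each arriving vertex goes to the partition that already holds most of its neighbours, discounted by how full that partition is. It is shown only as an illustration of the problem; it is a heuristic without approximation guarantees, and it is not claimed to be the algorithm the talk has in mind. The input format and function names are assumptions.

```python
def linear_deterministic_greedy(stream, k, capacity):
    """One-pass greedy partitioning of a vertex stream.

    stream yields (vertex, neighbours) pairs in arrival order.
    Requires k * capacity to be at least the number of vertices.
    Returns a dict mapping each vertex to a partition index.
    """
    assignment = {}
    sizes = [0] * k
    for vertex, neighbours in stream:
        best, best_key = None, None
        for p in range(k):
            if sizes[p] >= capacity:
                continue  # partition is full
            placed = sum(1 for u in neighbours if assignment.get(u) == p)
            score = placed * (1.0 - sizes[p] / capacity)
            key = (score, -sizes[p])  # tie-break towards the emptier one
            if best_key is None or key > best_key:
                best, best_key = p, key
        assignment[vertex] = best
        sizes[best] += 1
    return assignment

def cut_edges(edges, assignment):
    """Number of edges whose endpoints lie in different partitions."""
    return sum(1 for u, v in edges if assignment[u] != assignment[v])

# Two triangles joined by the single edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
adjacency = {v: [] for v in range(6)}
for u, v in edges:
    adjacency[u].append(v)
    adjacency[v].append(u)
parts = linear_deterministic_greedy(
    ((v, adjacency[v]) for v in range(6)), k=2, capacity=3)
```

On this toy input the heuristic recovers the two triangles; on adversarial arrival orders it can do much worse, which is the motivation for asking about provable guarantees.
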
End of Talk
• "I must meditate further on this." Joseph Louis Lagrange