Efficient algorithms for online decision problems

Efficient algorithms for online decision problems • Adam Kalai, Santosh Vempala • Seminar on Experts and Bandits, Fall 17/18 • Ran Hochshtet

Contents • Online decision problem • N Experts • Online shortest paths • Tree update problem

Introduction • Online decision problem • No knowledge of the future • Each period we pick one choice • Pay the cost of that choice • Goal: minimize the regret (our total cost minus the cost of the best single choice in hindsight)

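Linear generalization • Series of decisions $d_1, d_2, \ldots$ from a possibly infinite set $\mathcal{D} \subset \mathbb{R}^n$ • Period $t$ leads to state $s_t \in \mathcal{S} \subset \mathbb{R}^n$ • Making decision $d$ in state $s$ costs $d \cdot s$ • Total cost = $\sum_t d_t \cdot s_t$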

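Linear generalization • $s_t \in \mathcal{S} \subset \mathbb{R}^n$ - state • $d_t \in \mathcal{D} \subset \mathbb{R}^n$ - decision • $M(s) = \arg\min_{d \in \mathcal{D}} d \cdot s$, so $M$ applied to $s_{1:T} = s_1 + \cdots + s_T$ computes the best single decision in hindsight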

Predicting from expert advice • $n$ experts • Each period we pick one expert • Pay the cost of that expert • Goal: minimize the regret

Predicting from expert advice • $n$ experts problem • $s_t$ - the costs vector

Motivation • Consider an example with two experts • The costs are: • Follow the leader always incurs a cost of 1 • The total cost is • Using perturbations we can achieve
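A small simulation can make the two-expert example concrete. The cost sequence used here, (0, 0.5) followed by alternating (1, 0) and (0, 1), is an assumption (a standard instance of this kind of example, since the slide's own numbers were lost), and the function names are hypothetical.

```python
def alternating_costs(num_periods):
    """Assumed cost sequence: (0, 0.5), then (1, 0), (0, 1), (1, 0), ..."""
    costs = [(0.0, 0.5)]
    for t in range(1, num_periods):
        costs.append((1.0, 0.0) if t % 2 == 1 else (0.0, 1.0))
    return costs


def follow_the_leader(costs):
    """Always pick the expert with the smallest total cost so far.

    Returns (cost paid by follow-the-leader, cost of the best expert in hindsight).
    """
    totals = [0.0, 0.0]
    paid = 0.0
    for cost in costs:
        leader = 0 if totals[0] <= totals[1] else 1
        paid += cost[leader]
        totals[0] += cost[0]
        totals[1] += cost[1]
    return paid, min(totals)


if __name__ == "__main__":
    paid, best = follow_the_leader(alternating_costs(101))
    print(f"follow the leader paid {paid}, best expert paid {best}")
```

After the first period the leader is always the expert about to be charged, so follow the leader pays 1 every period while the best expert pays about half as much.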

FPL($\epsilon$) • On each period $t$: • Choose $p_t$ uniformly at random from the cube $[0, 1/\epsilon]^n$ • Use $M(s_{1:t-1} + p_t)$

FPL*($\epsilon$) • On each period $t$: • Choose $p_t$ at random according to the exponential distribution with density $d\mu(x) \propto e^{-\epsilon |x|_1}$ • Use $M(s_{1:t-1} + p_t)$
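The two perturbation schemes can be sketched for the experts setting, where the best decision is simply the expert with the smallest perturbed total cost. This is a minimal sketch under stated assumptions: the uniform variant draws each coordinate from [0, 1/epsilon], the exponential variant draws a rate-epsilon exponential with a random sign, and `fpl_experts` and its parameters are hypothetical names rather than anything from the slides.

```python
import random


def fpl_experts(costs, epsilon, rng, exponential=False):
    """Follow the perturbed leader over n experts.

    costs: list of per-period cost vectors, one entry per expert.
    exponential=False: uniform perturbation from [0, 1/epsilon]^n.
    exponential=True: two-sided exponential perturbation (assumed form).
    Returns the total cost paid.
    """
    n = len(costs[0])
    totals = [0.0] * n
    paid = 0.0
    for cost in costs:
        if exponential:
            perturbation = [rng.expovariate(epsilon) * rng.choice((-1.0, 1.0))
                            for _ in range(n)]
        else:
            perturbation = [rng.uniform(0.0, 1.0 / epsilon) for _ in range(n)]
        # The oracle for experts: smallest perturbed total cost so far.
        choice = min(range(n), key=lambda i: totals[i] + perturbation[i])
        paid += cost[choice]
        for i in range(n):
            totals[i] += cost[i]
    return paid


if __name__ == "__main__":
    alternating = [(1.0, 0.0), (0.0, 1.0)] * 50
    print(fpl_experts(alternating, 0.1, random.Random(0)))
```

A fresh perturbation is drawn every period here, which matches the "on each period" wording; only the past totals, never the current cost, enter the choice.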

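Notations • $|x|_1 = \sum_i |x_i|$ for $x \in \mathbb{R}^n$ • $D \ge |d - d'|_1$ for all $d, d' \in \mathcal{D}$ • $R \ge |d \cdot s|$ for all $d \in \mathcal{D}, s \in \mathcal{S}$ • $A \ge |s|_1$ for all $s \in \mathcal{S}$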

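Theorem 1.1 • Let $s_1, s_2, \ldots, s_T \in \mathcal{S}$ be a state sequence • (a) Running FPL with $\epsilon \le 1$ gives: $E[\text{cost of FPL}(\epsilon)] \le \text{min-cost}_T + \epsilon R A T + D/\epsilon$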

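Theorem 1.1 • Let $s_1, s_2, \ldots, s_T \in \mathcal{S}$ be a state sequence • (b) For nonnegative $\mathcal{D}, \mathcal{S} \subseteq \mathbb{R}^n_+$, FPL* gives: $E[\text{cost of FPL*}(\epsilon/2A)] \le (1 + \epsilon)\,\text{min-cost}_T + 4AD(1 + \ln n)/\epsilon$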

Theorem 1.1 • If $T$ or $\text{min-cost}_T$ is known: • For FPL: $\epsilon = \sqrt{D/(RAT)}$ gives regret at most $2\sqrt{DRAT}$

Experts problem • It seems that , • In our algorithm the worst case is when each period only one expert incurs cost • Min-cost is b • After we choose b, there is a chance we choose c

Experts problem • , • By Theorem 1.1 (b):

Online shortest paths • Input: • Directed graph • Pair of nodes: a source and a target • Each period, pick a path from the source to the target • Then the times on all edges are revealed • The cost is the sum of the times on the chosen path

Online shortest paths • $m$ is the number of edges • $s_t \in \mathbb{R}^m$ - the times vector

Online shortest paths • Use FPL* • On each period $t$: • For each edge $e$, pick $p_t[e]$ from the exponential distribution (same as FPL*) • $s_{1:t-1}[e]$ = the total time on edge $e$ so far • Use the shortest path in the graph with weights $s_{1:t-1}[e] + p_t[e]$
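The per-edge perturbation scheme can be sketched with Dijkstra's algorithm as the shortest-path oracle. This is a minimal sketch under stated assumptions: the perturbations are nonnegative rate-epsilon exponentials (so that Dijkstra remains valid), nodes are integers, edges are (tail, head) pairs, and the names `shortest_path` and `fpl_shortest_paths` are hypothetical.

```python
import heapq
import random


def shortest_path(num_nodes, edges, weights, source, target):
    """Dijkstra on a directed graph; returns the path as a list of edge indices."""
    adjacency = [[] for _ in range(num_nodes)]
    for index, (tail, head) in enumerate(edges):
        adjacency[tail].append((head, index))
    dist = [float("inf")] * num_nodes
    prev_edge = [None] * num_nodes
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry
        for head, index in adjacency[node]:
            candidate = d + weights[index]
            if candidate < dist[head]:
                dist[head] = candidate
                prev_edge[head] = index
                heapq.heappush(heap, (candidate, head))
    path = []
    node = target
    while node != source:
        index = prev_edge[node]
        if index is None:
            raise ValueError("target is not reachable from source")
        path.append(index)
        node = edges[index][0]
    return path[::-1]


def fpl_shortest_paths(num_nodes, edges, times_per_period, source, target,
                       epsilon, rng):
    """Each period: perturb the total edge times, then take the shortest path."""
    totals = [0.0] * len(edges)
    paid = 0.0
    for times in times_per_period:
        weights = [total + rng.expovariate(epsilon) for total in totals]
        path = shortest_path(num_nodes, edges, weights, source, target)
        paid += sum(times[index] for index in path)
        totals = [total + time for total, time in zip(totals, times)]
    return paid


if __name__ == "__main__":
    edges = [(0, 1), (1, 2), (0, 2)]
    history = [[0.2, 0.2, 1.0]] * 20
    print(fpl_shortest_paths(3, edges, history, 0, 2, 1.0, random.Random(0)))
```

The point of the construction is that the algorithm never enumerates paths: one perturbed shortest-path computation per period is enough.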

Online shortest paths • , • By Theorem 1.1:

Proof of Theorem 1.1 (a) • "Be the leader": use $M(s_{1:t})$ instead of $M(s_{1:t-1})$ • "Be the leader" has no regret • Prove by induction

Proof of Theorem 1.1 (a) • We show that: $\sum_{t=1}^{T} M(s_{1:t}) \cdot s_t \le M(s_{1:T}) \cdot s_{1:T}$ • For $T = 1$: trivial • Induction step from $T$ to $T + 1$:
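The induction step can be written out as a short derivation. The notation is assumed from the standard statement of this argument: $M(s)$ is a decision minimizing $d \cdot s$ over all decisions $d$, and $s_{1:t} = s_1 + \cdots + s_t$.

```latex
\begin{align*}
\sum_{t=1}^{T+1} M(s_{1:t}) \cdot s_t
  &\le M(s_{1:T}) \cdot s_{1:T} + M(s_{1:T+1}) \cdot s_{T+1}
     && \text{(induction hypothesis)} \\
  &\le M(s_{1:T+1}) \cdot s_{1:T} + M(s_{1:T+1}) \cdot s_{T+1}
     && \text{($M(s_{1:T})$ minimizes $d \cdot s_{1:T}$)} \\
  &= M(s_{1:T+1}) \cdot s_{1:T+1}.
\end{align*}
```

The second line is the only place where optimality of $M$ is used: any other decision, in particular $M(s_{1:T+1})$, costs at least as much on $s_{1:T}$.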

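Lemma • We want to show that perturbations do not hurt too much • Still the "be the leader" algorithm • For any state sequence $s_1, s_2, \ldots$, any $T > 0$ and any vectors $p_0 = 0, p_1, \ldots, p_T \in \mathbb{R}^n$: $\sum_{t=1}^{T} M(s_{1:t} + p_t) \cdot s_t \le M(s_{1:T}) \cdot s_{1:T} + D \sum_{t=1}^{T} |p_t - p_{t-1}|_\infty$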

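Proof of Lemma • Pretend that the state on period $t$ was actually $s_t + p_t - p_{t-1}$, so that the cumulative state is $s_{1:t} + p_t$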

Proof of Lemma

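Proof of Theorem 1.1 (a) • Use $p_t = p_1$ for all $t$ • No need to choose a new $p_t$ each period • Applying the lemma: $\sum_{t=1}^{T} M(s_{1:t} + p_1) \cdot s_t \le M(s_{1:T}) \cdot s_{1:T} + D |p_1|_\infty \le \text{min-cost}_T + D/\epsilon$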

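Proof of Theorem 1.1 (a) • Now we return to using $M(s_{1:t-1} + p_t)$ instead of $M(s_{1:t} + p_t)$ • We need to show that: $E[M(s_{1:t-1} + p_t) \cdot s_t] \le E[M(s_{1:t} + p_t) \cdot s_t] + \epsilon R A$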

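Proof of Theorem 1.1 (a) • Key idea: the distributions over $s_{1:t-1} + p_t$ and $s_{1:t} + p_t$ are similar • If the cubes are identical, i.e. $s_t = 0$, then the two expected costs are equal • If they overlap on a fraction $f$ of their volume: $E[M(s_{1:t-1} + p_t) \cdot s_t] \le E[M(s_{1:t} + p_t) \cdot s_t] + (1 - f) R$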

Lemma 3.2 • For any $v \in \mathbb{R}^n$, the cubes $[0, 1/\epsilon]^n$ and $v + [0, 1/\epsilon]^n$ overlap in at least a $(1 - \epsilon |v|_1)$ fraction

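Proof of Lemma 3.2 • Take a random point $x \in [0, 1/\epsilon]^n$; if $x \notin v + [0, 1/\epsilon]^n$, then for some $i$, $x_i \notin v_i + [0, 1/\epsilon]$, which happens with probability at most $\epsilon |v_i|$ • With the union bound we get: $\Pr[x \notin v + [0, 1/\epsilon]^n] \le \sum_i \epsilon |v_i| = \epsilon |v|_1$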
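The overlap bound can be checked numerically, because the overlap of two axis-aligned cubes has a closed form. This is a minimal sketch, assuming the cubes are $[0, 1/\epsilon]^n$ and its translate by a vector $v$; the function names are hypothetical.

```python
def overlap_fraction(v, epsilon):
    """Exact fraction of [0, 1/eps]^n that also lies in v + [0, 1/eps]^n.

    Along coordinate i the two intervals share a 1 - eps*|v_i| fraction
    (or nothing when |v_i| >= 1/eps), and the coordinates are independent.
    """
    fraction = 1.0
    for component in v:
        fraction *= max(0.0, 1.0 - epsilon * abs(component))
    return fraction


def union_bound_lower_bound(v, epsilon):
    """Lower bound from the union bound: 1 - eps * |v|_1."""
    return 1.0 - epsilon * sum(abs(component) for component in v)


if __name__ == "__main__":
    shift, eps = [1.0, 2.0], 0.1
    print(overlap_fraction(shift, eps), union_bound_lower_bound(shift, eps))
```

The product of the per-coordinate fractions is never smaller than one minus the sum of the per-coordinate losses, which is the union-bound argument in closed form.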

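Proof of Theorem 1.1 (a) • By Lemma 3.2: the cubes overlap in at least a $1 - \epsilon |s_t|_1 \ge 1 - \epsilon A$ fraction • Each period the difference between using $M(s_{1:t-1} + p_t)$ and $M(s_{1:t} + p_t)$ is at most $\epsilon R A$ • We get: $E[\text{cost of FPL}(\epsilon)] \le \text{min-cost}_T + \epsilon R A T + D/\epsilon$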

The tree update problem • Maintain a binary search tree over $n$ items • There is an unknown sequence of accesses • The cost of an access is the number of comparisons • This equals the depth of the accessed item in the tree

The tree update problem • We can solve the problem with FPL • Each period we find the best tree so far, and use it • The problem: for each access we do an expensive computation

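The tree update problem • Follow the lazy leading tree ($N$): • For $1 \le i \le n$, let $v_i := 0$ and choose $c_i$ randomly from $\{1, 2, \ldots, N\}$ • Start with the best tree as if there were $c_i$ accesses to each node $i$ • After each access to item $i$: (a) $v_i := v_i + 1$ (b) if $v_i \ge c_i$ then: i. $c_i := c_i + N$ ii. Compute the best tree as if there were $c_j$ accesses to each node $j$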
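The counter scheme can be sketched as follows. This is a minimal sketch under stated assumptions: each item i has an access counter v[i] starting at 0 and a threshold c[i] drawn from {1, ..., N}; when v[i] reaches c[i] the threshold is raised by N and the tree is recomputed from the thresholds. The `optimal_bst` helper is a plain cubic dynamic program standing in for the best-tree oracle, and all names are hypothetical.

```python
import random
from functools import lru_cache


def optimal_bst(counts):
    """Best static BST for the given access counts, as nested (left, key, right)."""
    prefix = [0]
    for count in counts:
        prefix.append(prefix[-1] + count)

    @lru_cache(maxsize=None)
    def solve(lo, hi):
        # Best (cost, tree) over keys lo..hi-1; the root sits at depth 1.
        if lo >= hi:
            return (0, None)
        weight = prefix[hi] - prefix[lo]
        best = None
        for root in range(lo, hi):
            left_cost, left_tree = solve(lo, root)
            right_cost, right_tree = solve(root + 1, hi)
            cost = left_cost + right_cost + weight
            if best is None or cost < best[0]:
                best = (cost, (left_tree, root, right_tree))
        return best

    return solve(0, len(counts))[1]


def depth(tree, key):
    """Number of comparisons needed to find key (the root costs 1)."""
    comparisons = 1
    while tree is not None:
        left, root, right = tree
        if key == root:
            return comparisons
        tree = left if key < root else right
        comparisons += 1
    raise KeyError(key)


def follow_lazy_leading_tree(accesses, num_items, big_n, rng):
    """Returns (total comparisons paid, number of tree recomputations)."""
    v = [0] * num_items
    c = [rng.randint(1, big_n) for _ in range(num_items)]
    tree = optimal_bst(c)
    paid = 0
    updates = 0
    for item in accesses:
        paid += depth(tree, item)
        v[item] += 1
        if v[item] >= c[item]:
            c[item] += big_n
            tree = optimal_bst(c)
            updates += 1
    return paid, updates


if __name__ == "__main__":
    rng = random.Random(0)
    accesses = [rng.randrange(8) for _ in range(500)]
    print(follow_lazy_leading_tree(accesses, 8, 25, rng))
```

With a larger N each item crosses its threshold less often, so the expensive recomputation runs rarely while the tree still tracks the access frequencies.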

Calling the oracle $M$ can be computationally expensive • We want to minimize the number of times we use $M$ • Trick: use as often as possible

Follow the lazy leader (FLL) • FLL is equivalent to FPL in terms of expected cost • FLL rarely calls the oracle • FLL rarely changes its decision from one period to the next

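FLL($\epsilon$) • Once, choose $p \in [0, 1/\epsilon]^n$ uniformly • Determine a grid: $G = \{p + z/\epsilon : z \in \mathbb{Z}^n\}$ • On period $t$, use $M(g_{t-1})$, where $g_{t-1}$ is the unique point in $G \cap (s_{1:t-1} + [0, 1/\epsilon)^n)$ • If $g_t = g_{t-1}$, there is no need to re-evaluate $M$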
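The grid construction can be sketched for the experts setting. This is a minimal sketch under stated assumptions: the grid is the set of points p + z/epsilon for integer vectors z, the grid point used is the unique one inside the half-open cube of side 1/epsilon whose corner is the cumulative cost vector, and the oracle is called only when that grid point changes. The names `grid_index` and `fll_experts` are hypothetical.

```python
import math
import random


def grid_index(totals, offset, epsilon):
    """Integer vector z such that offset + z/eps lies in totals + [0, 1/eps)^n."""
    side = 1.0 / epsilon
    return tuple(math.ceil((total - shift) / side)
                 for total, shift in zip(totals, offset))


def fll_experts(costs, epsilon, rng):
    """Follow the lazy leader over n experts.

    Returns (total cost paid, number of oracle calls).
    """
    n = len(costs[0])
    side = 1.0 / epsilon
    offset = [rng.uniform(0.0, side) for _ in range(n)]  # chosen once
    totals = [0.0] * n
    current_index = None
    choice = None
    paid = 0.0
    oracle_calls = 0
    for cost in costs:
        index = grid_index(totals, offset, epsilon)
        if index != current_index:
            # The grid point moved, so the decision has to be re-evaluated.
            current_index = index
            point = [shift + side * z for shift, z in zip(offset, index)]
            choice = min(range(n), key=point.__getitem__)
            oracle_calls += 1
        paid += cost[choice]
        totals = [total + c for total, c in zip(totals, cost)]
    return paid, oracle_calls


if __name__ == "__main__":
    alternating = [(1.0, 0.0), (0.0, 1.0)] * 50
    print(fll_experts(alternating, 0.1, random.Random(0)))
```

Comparing the integer indices rather than the floating-point grid points keeps the "did the grid point change" test exact.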

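Lemma • For any fixed sequence of states $s_1, s_2, \ldots$, FPL($\epsilon$) and FLL($\epsilon$) (also FPL*($\epsilon$) and FLL*($\epsilon$)) have identical expectations on each period $t$ • The probability of FLL($\epsilon$) (or FLL*($\epsilon$)) performing an update on period $t$ is at most $\epsilon |s_t|_1$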

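Proof of Lemma • FLL($\epsilon$) chooses a uniformly random grid of spacing $1/\epsilon$ • There is exactly one grid point inside $s_{1:t-1} + [0, 1/\epsilon)^n$ • By symmetry this grid point is uniformly distributed over $s_{1:t-1} + [0, 1/\epsilon)^n$ • Same as FPL($\epsilon$): $s_{1:t-1} + p_t$ is uniform over $s_{1:t-1} + [0, 1/\epsilon]^n$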

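Lemma 3.2: For any $v \in \mathbb{R}^n$, the cubes $[0, 1/\epsilon]^n$ and $v + [0, 1/\epsilon]^n$ overlap in at least a $(1 - \epsilon |v|_1)$ fraction • Proof of Lemma (continued) • In each update: the grid point $g_{t-1}$ is not in the cube $s_{1:t} + [0, 1/\epsilon)^n$ • By Lemma 3.2: this happens with probability at most $\epsilon |s_t|_1$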

Summary • Online decision problem • N Experts • Online shortest paths • Tree update problem

THANKS! ANY QUESTIONS?
