
The Forgetron: A kernel-based Perceptron on a fixed budget

Ofer Dekel, Shai Shalev-Shwartz, Yoram Singer. The Forgetron: A kernel-based Perceptron on a fixed budget. The Hebrew University, Jerusalem, Israel. Outline: Problem Setting, The Forgetron, Proof Technique, Mistake Bound.


Presentation Transcript


Ofer Dekel, Shai Shalev-Shwartz, Yoram Singer
The Forgetron: A kernel-based Perceptron on a fixed budget
The Hebrew University, Jerusalem, Israel

Outline: Problem Setting, The Forgetron, Proof Technique, Mistake Bound

Problem Setting

• Online learning:
  • Choose an initial hypothesis f_0
  • For t = 1, 2, …
    • Receive an instance x_t and predict sign(f_t(x_t)), then receive the label y_t
    • If y_t f_t(x_t) <= 0 (a prediction mistake), update the hypothesis
• Goal: minimize the number of prediction mistakes

• Kernel-based hypotheses: f(x) = Σ_{i∈I} α_i y_i K(x_i, x), where I is the set of active examples
• Example: the dual Perceptron (a runnable sketch follows the transcript)
  • α_i is always 1
  • Initial hypothesis: f_0 = 0
  • Update rule on a mistake: f_{t+1} = f_t + y_t K(x_t, ·)

• Competitive analysis: compete against any fixed hypothesis g, charging g the hinge loss ℓ_t(g) = max{0, 1 - y_t g(x_t)} on each round
• Goal: a provably correct algorithm on a budget: |I_t| <= B

The Forgetron

• Initialize: I_0 = ∅, f_0 = 0, M_0 = 0
• For t = 1, 2, …
  • Receive an instance x_t, predict sign(f_t(x_t)), and then receive y_t
  • If y_t f_t(x_t) <= 0, set M_t = M_{t-1} + 1 and update in three steps (sketched in code after the transcript):
    • Step (1) - Perceptron: activate the new example with weight 1, I'_t = I_{t-1} ∪ {t}
    • If |I'_t| <= B, skip the next two steps
    • Step (2) - Shrinking: scale the hypothesis by a coefficient φ_t ∈ (0, 1]
    • Step (3) - Removal: define r_t = min I'_t and deactivate that oldest example, so |I_t| <= B is maintained

[Figure: a timeline of rounds 1, 2, 3, …, t-1, t; on each over-budget mistake the oldest active example is dropped.]

Proof Technique

• Progress measure: Δ_t = ||f_t - g||² - ||f_{t+1} - g||², which can be rewritten as a telescoping sum over rounds
• The Perceptron update leads to positive progress (reconstructed after the transcript)
• The shrinking and removal steps might lead to negative progress. Can we quantify the deviation they might cause?
• Deviation due to shrinking:
  • Case I: the shrinking is a projection onto the ball of radius U = ||g||
  • Case II: aggressive shrinking, scaling the hypothesis below that radius
• Deviation due to removal: if example (x, y) is activated on round r and deactivated on round t, then the deviation its removal causes can be bounded in terms of how much its weight has been shrunk between rounds r and t

Mistake Bound

• For any sequence {(x_1, y_1), …, (x_T, y_T)} such that K(x_t, x_t) <= 1,
• any budget B >= 84,
• and any competitor g whose norm is at most a threshold that grows on the order of sqrt(B / log B),
• the number of prediction mistakes of the Forgetron is bounded in terms of the cumulative hinge loss Σ_t ℓ_t(g).

Hardness Result

• For any kernel-based algorithm that satisfies |I_t| <= B we can find g and an arbitrarily long sequence such that the algorithm errs on every single round whereas g suffers no loss
• Key step: for each f_t based on B examples there exists x in X such that f_t(x) = 0
• The norm of the competitor in this construction is sqrt(B+1)
• Hence we cannot compete with hypotheses whose norm is larger than the order of sqrt(B) (the construction is spelled out after the transcript)

Experiments

• Compare to Crammer, Kandola, Singer (NIPS'03)
• Display online error vs. budget constraint
• Datasets: MNIST, USPS, synthetic (10% label noise), census-income (adult)

[Figures: online error as a function of the budget on each of the four datasets.]
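
The dual Perceptron outlined above is simple enough to state as running code. The following is a minimal sketch, not taken from the slides: the class name DualPerceptron, the Gaussian kernel, and its gamma parameter are illustrative choices, and any kernel with K(x, x) <= 1 matches the slides' assumption.

    import numpy as np

    def rbf_kernel(x1, x2, gamma=1.0):
        # Gaussian kernel: K(x, x) = 1, so the assumption K(x_t, x_t) <= 1 holds.
        x1, x2 = np.asarray(x1, dtype=float), np.asarray(x2, dtype=float)
        return float(np.exp(-gamma * np.sum((x1 - x2) ** 2)))

    class DualPerceptron:
        """Dual (kernel) Perceptron: f_t(x) = sum_{i in I_t} y_i K(x_i, x)."""

        def __init__(self, kernel=rbf_kernel):
            self.kernel = kernel
            self.active = []  # the active set I_t, stored as (x_i, y_i) pairs

        def score(self, x):
            # f_t(x); every active example has coefficient alpha_i = 1.
            return sum(y_i * self.kernel(x_i, x) for x_i, y_i in self.active)

        def step(self, x, y):
            # One online round: predict sign(f_t(x)), receive y, update on a mistake.
            mistake = y * self.score(x) <= 0
            if mistake:
                self.active.append((x, y))  # f_{t+1} = f_t + y_t K(x_t, .)
            return mistake

Feeding step a stream of (x, y) pairs with y in {-1, +1} reproduces the online protocol above; note that the active set can grow without bound, which is exactly what the budget constraint |I_t| <= B rules out.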
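
The Forgetron's three-step update can be sketched in the same style. One caveat, stated loudly: the fixed shrinking coefficient phi below is a stand-in of mine; the actual algorithm chooses a round-dependent phi_t so that the cumulative deviation stays within the allowance used in the proof, and that tuning is not recoverable from this transcript.

    from collections import deque

    class Forgetron:
        """Structural sketch of the Forgetron update on a fixed budget B."""

        def __init__(self, budget, kernel, phi=0.9):
            self.B = budget          # the budget: at most B active examples
            self.kernel = kernel     # e.g. rbf_kernel from the previous sketch
            self.phi = phi           # stand-in for the round-dependent phi_t
            self.active = deque()    # (x_i, y_i, alpha_i), oldest example first

        def score(self, x):
            return sum(a * y_i * self.kernel(x_i, x) for x_i, y_i, a in self.active)

        def step(self, x, y):
            if y * self.score(x) > 0:
                return False         # correct prediction: no update at all
            # Step (1) - Perceptron: activate the new example with weight 1.
            self.active.append((x, y, 1.0))
            if len(self.active) > self.B:  # if |I'_t| <= B, steps (2)-(3) are skipped
                # Step (2) - Shrinking: scale the whole hypothesis by phi in (0, 1].
                self.active = deque((xi, yi, self.phi * a) for xi, yi, a in self.active)
                # Step (3) - Removal: deactivate the oldest example, r_t = min I'_t.
                self.active.popleft()
            return True

Because the active set never exceeds B examples, each round costs O(B) kernel evaluations and O(B) memory, which is the whole point of the budget.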
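
The "positive progress" claim for the Perceptron step is the standard relative-progress argument; the following reconstruction uses only the slides' assumptions (K(x_t, x_t) <= 1 and the hinge loss defined above), not the slides' own image formulas.

    % Progress measure; telescoping over rounds with f_1 = 0 gives
    \Delta_t = \|f_t - g\|^2 - \|f_{t+1} - g\|^2 ,
    \qquad
    \sum_t \Delta_t = \|g\|^2 - \|f_{T+1} - g\|^2 \le \|g\|^2 .

    % On a mistake round (y_t f_t(x_t) <= 0) the Perceptron step
    % f_{t+1} = f_t + y_t K(x_t, \cdot) expands to
    \Delta_t = -2 y_t f_t(x_t) + 2 y_t g(x_t) - K(x_t, x_t)
             \ge 2\bigl(1 - \ell_t(g)\bigr) - 1
             = 1 - 2\,\ell_t(g) ,

    % using y_t f_t(x_t) \le 0,  y_t g(x_t) \ge 1 - \ell_t(g),  K(x_t, x_t) \le 1.
    % Summing over the M mistake rounds gives the familiar Perceptron bound
    M \le \|g\|^2 + 2 \sum_t \ell_t(g) .

The Forgetron analysis starts from this inequality and subtracts the (possibly negative) progress contributed by the shrinking and removal steps; bounding that deviation is the heart of the proof.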
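
The hardness construction can also be made explicit. The choice of instances and competitor below is the standard one and is my reconstruction, since the slide's own formulas are images.

    % Take X = {x_1, ..., x_{B+1}} orthonormal in feature space,
    % i.e. K(x_i, x_j) = 1 if i = j and 0 otherwise. Any budgeted hypothesis
    f_t = \sum_{i \in I_t} \alpha_i\, y_i\, K(x_i, \cdot), \qquad |I_t| \le B ,
    % vanishes on some x_j with j \notin I_t, so presenting (x_j, y_j) forces
    % y_j f_t(x_j) = 0 \le 0 : a mistake on every single round. Meanwhile
    g = \sum_{i=1}^{B+1} y_i\, K(x_i, \cdot)
    \;\Longrightarrow\;
    y_j\, g(x_j) = 1 \ \text{for all } j, \qquad \|g\| = \sqrt{B+1} ,
    % so g attains margin 1 (zero hinge loss) while the algorithm always errs.

This is why no mistake bound for a budget-B algorithm can allow competitors of norm much beyond sqrt(B); the norm threshold in the mistake bound above is therefore optimal up to a logarithmic factor.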
