Presentation Transcript
batch online learning

Adam Kalai and Sham Kakade
Toyota Technological Institute (TTI)

[Title-slide diagram relating the i.i.d. (batch) setting and the online setting of [Littlestone89], with the transductive setting in between]

Batch learning vs. online learning

Agnostic model [Kearns, Schapire, Sellie 94]:
  • Family of functions F (e.g. halfspaces)
  • Arbitrary dist. over X × {–,+}; sample (x₁,y₁),…,(xₙ,yₙ) ∈ X × {–,+} drawn i.i.d.
  • Algorithm H takes (x₁,y₁),…,(xₙ,yₙ) and outputs h ∈ F; ERM = "best on data"
  • Def. H learns F if, ∀ dist.: E[err(h)] ≤ min_{f∈F} err(f) + n^{-c}, and H runs in time poly(n)

Online learning: examples arrive one at a time; before seeing its label, the learner commits to h₁ ∈ F and predicts h₁(x₁).

[Slide figure: + and – labeled points in X with hypothesis h₁ and query point x₁; a code sketch of ERM on such data follows this slide]
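To make the batch side concrete, here is a minimal sketch of err and ERM over a small finite family, with 1-D thresholds standing in for halfspaces; the family, the noisy-threshold distribution, and all names are illustrative, not from the talk.

```python
import random

# Illustrative stand-in for F: 1-D threshold functions x >= t.
F = [lambda x, t=t: '+' if x >= t else '-' for t in (0.0, 0.25, 0.5, 0.75, 1.0)]

def err(f, data):
    """Empirical error: fraction of examples (x, y) with f(x) != y."""
    return sum(f(x) != y for x, y in data) / len(data)

def erm(F, data):
    """ERM = 'best on data': an f in F with minimal empirical error."""
    return min(F, key=lambda f: err(f, data))

# An i.i.d. sample from one arbitrary dist. over X x {-,+}:
# uniform x in [0, 1], true threshold 0.5, labels flipped with prob. 0.1.
random.seed(0)
data = [(x, ('+' if x >= 0.5 else '-') if random.random() > 0.1 else
            ('-' if x >= 0.5 else '+'))
        for x in (random.random() for _ in range(100))]

h = erm(F, data)
print('empirical error of ERM hypothesis:', err(h, data))
```

With 10% label noise no f ∈ F has zero error; the agnostic guarantee only asks ERM to get close to min_{f∈F} err(f).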

Batch learning vs. online learning (cont.)

Same setup. Having seen (x₁,y₁), the online learner commits to h₂ (ERM = "best on data" over the family F, e.g. halfspaces) and predicts h₂(x₂).

[Slide figure: the same labeled points with hypothesis h₂ and query points x₁, x₂]

Batch learning vs. online learning (cont.)

Having seen (x₁,y₁) and (x₂,y₂), the learner commits to h₃ and predicts h₃(x₃), and so on.

Goal: err(alg) ≤ min_{f∈F} err(f) + n^{-c}

[Slide figure: the same labeled points with hypothesis h₃ and query points x₁, x₂, x₃]

Batch learning vs. transductive online learning

Transductive online learning [Ben-David, Kushilevitz, Mansour 95]. Analogous definition:
  • The unlabeled set {x₁,x₂,…,xₙ} is given to the algorithm up front
  • At round i, algorithm H sees (x₁,y₁),…,(x_{i-1},y_{i-1}) and outputs h_i ∈ F ("proper" learning: each output lies in F)
  • H learns F if, ∀ (x₁,y₁),…,(xₙ,yₙ): E[err(H)] ≤ min_{f∈F} err(f) + n^{-c}, and H runs in time poly(n)
  • ERM = "best on data"; family of functions F (e.g. halfspaces)
  • Goal: err(alg) ≤ min_{f∈F} err(f) + n^{-c}

[Slide figure: the batch side shows + and – points drawn i.i.d. from an arbitrary dist. over X × {–,+}; the transductive side shows the same points as unlabeled dots, marked "equivalent", with hypotheses h, h₁ and query points x₁, x₂, x₃]
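The transductive online protocol itself is easy to state in code. A minimal sketch, assuming a `learner` callback in the spirit of ERM and an arbitrary (possibly adversarial) `label_of`; both names are illustrative.

```python
def transductive_online(F, xs, label_of, learner):
    """Transductive online protocol: the unlabeled set xs is known up front.
    At round i the learner sees (x_1,y_1),...,(x_{i-1},y_{i-1}), commits to
    some h_i in F ('proper' learning), and is charged when h_i(x_i) != y_i."""
    history, mistakes = [], 0
    for i, x in enumerate(xs):
        h_i = learner(F, xs, history)   # may exploit the unlabeled set xs
        y = label_of(i, x)              # labels may be chosen adversarially
        mistakes += (h_i(x) != y)
        history.append((x, y))
    return mistakes / len(xs)           # err(H) over the sequence
```

The only difference from the batch protocol is that xs is revealed before any labels, which is exactly what HERM will exploit.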

Our results

HERM = Hallucination + ERM

Theorem 1. In the online transductive setting, err(HERM) ≤ min_{f∈F} err(f) + O(n^{-1/4}), and HERM requires one ERM computation per sample.

Theorem 2. These are equivalent for proper learning:
  • F is agnostically learnable
  • ERM agnostically learns F
  (ERM can be done efficiently and VC(F) is finite)
  • F is online transductively learnable
  • HERM online transductively learns F
Online ERM algorithm (sucks)

Choose h_i ∈ F with minimal errors on (x₁,y₁),…,(x_{i-1},y_{i-1}):

h_i = argmin_{f∈F} |{ j < i : f(x_j) ≠ y_j }|

Bad example: F = {–,+}^X, X = {(0,0)}, with alternating labels:

x₁ = (0,0), y₁ = –
x₂ = (0,0), y₂ = +
x₃ = (0,0), y₃ = –
x₄ = (0,0), y₄ = +

ERM flip-flops: h₁(x) = +, h₂(x) = –, h₃(x) = +, h₄(x) = –, erring on every round, while the best fixed f ∈ F errs on only half the rounds.
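A sketch of this failure, assuming ties are broken in favor of +, as the transcript's h₁ = + suggests:

```python
# F = {-,+}^X on X = {(0,0)}: the two constant hypotheses.
F = [('+', lambda x: '+'), ('-', lambda x: '-')]

def online_erm(F, ys):
    """Plain online ERM: before each round, pick the hypothesis with the
    fewest errors so far (min is stable, so ties go to '+', listed first)."""
    history, mistakes = [], 0
    for y in ys:
        name, h = min(F, key=lambda nf: sum(nf[1](x) != yy for x, yy in history))
        mistakes += (h((0, 0)) != y)
        history.append(((0, 0), y))
        print(f'predict {name}, truth {y}')
    return mistakes

# Alternating labels make ERM err on all 4 rounds (it predicts +,-,+,-),
# while either fixed hypothesis errs on only 2 of them.
print('mistakes:', online_erm(F, ['-', '+', '-', '+']))
```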

Online ERM algorithm

Choose h_i ∈ F with minimal errors on (x₁,y₁),…,(x_{i-1},y_{i-1}):

h_i = argmin_{f∈F} |{ j < i : f(x_j) ≠ y_j }|

Online "stability" lemma [Kalai, Vempala 01]:

err(ERM) ≤ min_{f∈F} err(f) + Pr_{i∈{1,…,n}}[h_i ≠ h_{i+1}]

(the last term is the fraction of rounds on which the ERM hypothesis switches)

Proof by induction on n = #examples. Easy!
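The induction is the standard follow-the-leader decomposition; a sketch (the notation is mine, with h_{i+1} denoting ERM on the first i examples):

```latex
% Step 1 (induction on n): the "be-the-leader" sequence h_2, ..., h_{n+1}
% is no worse on the data than the best fixed f:
\sum_{i=1}^{n} \mathbf{1}\left[h_{i+1}(x_i) \neq y_i\right]
  \le \min_{f \in F} \sum_{i=1}^{n} \mathbf{1}\left[f(x_i) \neq y_i\right]
% Step 2: h_i can err on x_i when h_{i+1} does not
% only on rounds where the two hypotheses differ:
\sum_{i=1}^{n} \mathbf{1}\left[h_i(x_i) \neq y_i\right]
  \le \sum_{i=1}^{n} \mathbf{1}\left[h_{i+1}(x_i) \neq y_i\right]
    + \sum_{i=1}^{n} \mathbf{1}\left[h_i \neq h_{i+1}\right]
% Dividing by n gives err(ERM) <= min_f err(f) + Pr_i[h_i != h_{i+1}].
```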

Online HERM algorithm

Inputs: unlabeled set {x₁,x₂,…,xₙ}, int R
  • For each x in the set, hallucinate r_x⁺ copies of (x,+) and r_x⁻ copies of (x,–), each count drawn uniformly at random from {1,2,…,R}
  • Choose h_i ∈ F that minimizes errors on the hallucinated data + (x₁,y₁),…,(x_{i-1},y_{i-1})

Stability: ∀i, Pr_{x_i, r_{x_i}}[h_i ≠ h_{i+1}] ≤ R^{-1}

The trick goes back to James Hannan: the hallucinated stack (x_i,+),(x_i,+),…,(x_i,+) of random height r_{x_i} perturbs the error counts, so adding the single real example (x_i,y_i) rarely changes the ERM choice (see the sketch below).

[Slide animation: + and – regions with the hallucinated copies stacked at x_i]
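A minimal sketch of HERM under the reading above; that the two counts r_x⁺ and r_x⁻ are independent uniform draws is my interpretation of the garbled slide, so treat that detail as an assumption.

```python
import random

def herm(F, xs, label_of, R, seed=0):
    """Hallucination + ERM: draw, once up front, random numbers of fake
    (x,+) and (x,-) examples for every x in the known unlabeled set, then
    run online ERM on hallucinated + real data (one ERM per round)."""
    rng = random.Random(seed)
    fake = []
    for x in xs:
        fake += [(x, '+')] * rng.randint(1, R)   # r_x^+ copies of (x,+)
        fake += [(x, '-')] * rng.randint(1, R)   # r_x^- copies of (x,-)

    history, mistakes = [], 0
    for i, x in enumerate(xs):
        # ERM over hallucinated data + the real examples seen so far.
        h = min(F, key=lambda f: sum(f(a) != b for a, b in fake + history))
        y = label_of(i, x)
        mistakes += (h(x) != y)
        history.append((x, y))
    return mistakes / len(xs)
```

The random counts play the role of Hannan-style perturbations: most ties are settled in advance, so a single new real example flips the argmin only with small probability, which is the R^{-1} stability charged above.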

Online HERM algorithm (analysis)

Inputs: unlabeled set {x₁,x₂,…,xₙ}, int R
  • For each x in the set, hallucinate r_x⁺ copies of (x,+) and r_x⁻ copies of (x,–), each count drawn uniformly at random from {1,2,…,R}
  • Choose h_i ∈ F that minimizes errors on the hallucinated data + (x₁,y₁),…,(x_{i-1},y_{i-1})

Stability: ∀i, Pr_{x_i, r_{x_i}}[h_i ≠ h_{i+1}] ≤ R^{-1}

Theorem 1. For R = n^{1/4}: err(HERM) ≤ min_{f∈F} err(f) + O(n^{-1/4}), and it requires one ERM computation per example.

Proof: combine the online "stability" lemma with the hallucination cost.
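A back-of-the-envelope version of the balancing behind R = n^{1/4}; the O(R/√n) scaling of the hallucination term is my assumption about how the two costs trade off, not a bound read off the slides:

```latex
% Assumed scaling: hallucination cost O(R/\sqrt{n}), stability cost O(1/R).
\mathrm{err}(\mathrm{HERM}) - \min_{f \in F} \mathrm{err}(f)
  \le O\left(\frac{R}{\sqrt{n}}\right) + O\left(\frac{1}{R}\right),
\qquad
\frac{R}{\sqrt{n}} = \frac{1}{R} \implies R = n^{1/4},
\quad \text{regret} = O\left(n^{-1/4}\right)
```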

Being more adaptive (shifting bounds)

(x₁,y₁),…,(x_i,y_i),…,(x_{i+W},y_{i+W}),…,(xₙ,yₙ)

Compete with the best f ∈ F on every window (x_i,y_i),…,(x_{i+W},y_{i+W}) of W consecutive examples, not just on the whole sequence.
Related work
  • Inequivalence of batch and online learning in the noiseless setting [Blum90, Balcan06]
    • The ERM black box is noiseless
    • For computational reasons!
  • Inefficient alg. for online trans. learning [Ben-David, Kushilevitz, Mansour 95], [Littlestone, Warmuth 92]:
    • List all ≤ (n+1)^{VC(F)} labelings (Sauer's lemma)
    • Run weighted majority (see the sketch below)
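For contrast with HERM, a sketch of that inefficient baseline: one expert per candidate labeling of the known set (at most (n+1)^{VC(F)} of them by Sauer's lemma), with randomized weighted-majority updates; the learning rate and names are illustrative.

```python
import random

def weighted_majority(labelings, xs, label_of, beta=0.5, seed=0):
    """Each candidate labeling of xs is one 'expert'. Predict with an
    expert sampled proportionally to weight; multiply the weight of every
    expert that errs by beta. Inefficient: the expert list can be huge."""
    rng = random.Random(seed)
    w = [1.0] * len(labelings)
    mistakes = 0
    for i, x in enumerate(xs):
        expert = rng.choices(range(len(labelings)), weights=w)[0]
        y = label_of(i, x)
        mistakes += (labelings[expert][i] != y)
        w = [wj * (beta if lab[i] != y else 1.0)
             for wj, lab in zip(w, labelings)]
    return mistakes
```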

Conclusions
  • An algorithm for removing the i.i.d. assumption, efficiently, using unlabeled data
  • An interesting way to use unlabeled data online, reminiscent of bootstrap/bagging
  • Adaptive version: can do well on every window
  • Find the "right" algorithm/analysis