On Agnostic Boosting and Parity Learning

Adam Tauman Kalai, Georgia Tech.

Yishay Mansour, Google and Tel-Aviv

Elad Verbin, Tsinghua


Defs

  • Agnostic Learning = learning with adversarial noise

  • Boosting = turn weak learner into strong learner

  • Parities = parities of subsets of the bits

    • f:{0,1}^n → {0,1}, e.g. f(x) = x1 ⊕ x3 ⊕ x7 (a small code sketch follows this list)

  • Agnostic Boosting

    • Turning a weak agnostic learner to a strong agnostic learner

  • 2^{O(n/log n)}-time algorithm for agnostically learning parities over any distribution
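A minimal sketch of the parity example above; the subset {1, 3, 7} and the 1-based indexing are taken from the slide, everything else is illustrative:

```python
# Parity of a subset S of the bits of x, e.g. f(x) = x1 xor x3 xor x7.
# Indices are 1-based to match the slide's notation.
def parity(x, subset=(1, 3, 7)):
    """x is a list/tuple of bits in {0,1}; returns the XOR of the bits indexed by `subset`."""
    result = 0
    for i in subset:
        result ^= x[i - 1]
    return result

# Example: for x = 1,0,1,1,0,0,0,1 we get x1 ^ x3 ^ x7 = 1 ^ 1 ^ 0 = 0.
print(parity([1, 0, 1, 1, 0, 0, 0, 1]))  # -> 0
```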

Outline


Agnostic boosting

[Diagram: an Agnostic Booster turns a weak learner into a strong learner.]

Weak learner: for any noise rate < ½, produces a better-than-trivial hypothesis.

Strong learner: produces an almost-optimal hypothesis.

The booster runs the weak learner as a black box.


Learning with Noise

It’s, like, a really hard model!!!

* up to well-studied open problems (i.e. we know where we’re stuck)



Agnostic Learning: some known results

Due to hardness, or lack of tools???

Agnostic boosting is a strong tool that makes it easier to design algorithms.


Why care about agnostic learning?

  • More relevant in practice

  • Impossibility results might be useful for building cryptosystems


Noisy learning

f:{0,1}^n → {0,1} from class F.

The algorithm gets samples <x, f(x)> where x is drawn from a distribution D.

  • No noise: the learner should approximate f up to error ε.

  • Random noise: each label is flipped with probability η (η% noise), turning f into a noisy g; the learner should still approximate f up to error ε.

  • Adversarial (≈ agnostic) noise: the adversary is allowed to corrupt an η-fraction of the labels, turning f into g; the learner should approximate g up to error η + ε (see the sketch below).
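A minimal sampling sketch of the three noise models, assuming a concrete parity target f, the uniform distribution as D, and an arbitrary corrupted labeling g for the agnostic case; the function names are illustrative, not from the paper:

```python
import random

def parity(x, subset=(1, 3, 7)):
    return sum(x[i - 1] for i in subset) % 2

def draw_x(n=10):
    # D: for this sketch, just the uniform distribution over {0,1}^n
    return [random.randint(0, 1) for _ in range(n)]

def draw_no_noise(f=parity):
    x = draw_x()
    return x, f(x)                        # label is exactly f(x)

def draw_random_noise(eta=0.1, f=parity):
    x = draw_x()
    y = f(x)
    if random.random() < eta:             # each label flipped independently with probability eta
        y ^= 1
    return x, y

def draw_agnostic(g):
    # g may disagree with every parity on up to an eta-fraction of D (chosen adversarially);
    # the learner only ever sees <x, g(x)>.
    x = draw_x()
    return x, g(x)
```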


Agnostic learning (geometric view)

[Figure: the class F drawn as a region; g lies at distance opt from its nearest element f ∈ F; a ball of radius opt + ε is drawn around g.]

PROPER LEARNING

Parameters: F, a metric

Input: oracle for g

Goal: return some element of F inside the ball of radius opt + ε around g


Agnostic boosting: definition

[Diagram: the weak learner, given the distribution D and samples from g, w.h.p. outputs a hypothesis h with errD(g,h) ≤ ½ - ε^100, provided opt ≤ ½ - ε.]


Agnostic boosting

[Diagram: the Agnostic Booster, given samples from g, w.h.p. outputs h' with errD(g,h') ≤ opt + ε. It runs the weak learner as a black box, poly(1/ε^100) times; the weak learner, given D and samples from g, w.h.p. outputs h with errD(g,h) ≤ ½ - ε^100 whenever opt ≤ ½ - ε.]


Agnostic boosting

[Diagram: the Agnostic Booster, given samples from g, w.h.p. outputs h' with errD(g,h') ≤ opt + α + ε. It runs the (α,γ)-weak learner poly(1/γ, 1/ε) times; given D and samples from g, the weak learner w.h.p. outputs h with errD(g,h) ≤ ½ - γ whenever opt ≤ ½ - α.]
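A minimal sketch of the quantities in this definition over a finite sample of pairs (x, g(x)); errD is estimated empirically, and WeakLearner is only an interface standing in for whatever (α,γ)-weak agnostic learner is available:

```python
from typing import Callable, List, Tuple

Example = Tuple[tuple, int]              # (x, g(x))
Hypothesis = Callable[[tuple], int]

def empirical_err(h: Hypothesis, sample: List[Example]) -> float:
    """Estimate errD(g, h) = Pr_{x ~ D}[h(x) != g(x)] from a finite sample."""
    return sum(1 for x, y in sample if h(x) != y) / len(sample)

class WeakLearner:
    """(alpha, gamma)-weak agnostic learner: whenever opt <= 1/2 - alpha on the
    distribution it is run on, it must (w.h.p.) return h with errD(g, h) <= 1/2 - gamma."""
    def __init__(self, alpha: float, gamma: float):
        self.alpha, self.gamma = alpha, gamma

    def learn(self, sample: List[Example]) -> Hypothesis:
        raise NotImplementedError        # supplied by the application, e.g. the parity learner below
```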


Agnostic boosting

[Diagram: the Agnostic Booster again, as a box between the two notions.]

Weak learner: for any noise rate < ½, produces a better-than-trivial hypothesis.

Strong learner: produces an almost-optimal hypothesis.


Analogy

["Approximation Booster" diagram: a poly-time MAX-3-SAT algorithm that, when opt = 7/8 + ε, produces a solution of value 7/8 + ε^100, is turned into an algorithm for MAX-3-SAT that produces a solution of value ≥ opt - ε, with running time poly(n, 1/ε).]


Gap

[Figure: the error axis from 0 to 1 with ½ marked. No hardness gap close to ½, combined with the booster, gives no gap anywhere, i.e. an additive PTAS.]


Agnostic boosting

  • New analysis for the Mansour-McAllester booster.

    • Uses branching programs; the nodes are weak hypotheses.

  • Previous agnostic boosting:

    • Ben-David, Long, and Mansour, and separately Gavinsky, defined agnostic boosting differently.

    • Their results cannot be used for our application.


Booster

[Figure: the initial branching program: a single node labeled h1. On input x, the branch h1(x)=0 goes to one leaf and h1(x)=1 to the other; the leaves are labeled 0 and 1.]


Booster: Split step

[Figure: each leaf of the current program receives examples from a different conditional distribution. The weak learner is run on each of these distributions, producing candidate nodes (h2' for the h1(x)=0 branch, h2 for the h1(x)=1 branch) that could replace the corresponding leaf; each candidate's branches end in leaves labeled 0 and 1. The booster chooses the "better" option.]
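A minimal sketch of one split step; the slide does not spell out the criterion for "better", so here a candidate split is kept only if the weak hypothesis beats the constant majority label on the examples reaching that leaf (an illustrative choice, not the paper's potential-based one), reusing the WeakLearner interface sketched above:

```python
def split_leaf(leaf_samples, weak_learner):
    """Decide whether to replace a leaf with a new weak-hypothesis node.

    leaf_samples: the examples that reach this leaf, i.e. its (different)
    conditional distribution.  Run the weak learner on them and keep the split
    only if the new node beats the constant majority label on those examples."""
    ones = sum(y for _, y in leaf_samples)
    err_keep = min(ones, len(leaf_samples) - ones) / len(leaf_samples)

    h_new = weak_learner.learn(leaf_samples)          # weak learner on this leaf's distribution
    err_split = sum(1 for x, y in leaf_samples if h_new(x) != y) / len(leaf_samples)

    return h_new if err_split < err_keep else None    # None = keep the leaf as it is
```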


Booster: Split step

[Figure: the program after the split: nodes h1, h2, h3 along a path, with the remaining branches ending in leaves labeled 0 and 1.]


Booster: Split step

[Figure: another split adds a node h4 on the other branch of h1; the program now contains h1, h2, h3, h4, with all remaining branches ending in leaves labeled 0 and 1.]


Booster: Merge step

[Figure: the same program (h1, h2, h3, h4); two branches that behave similarly are marked "Merge if similar".]
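A minimal sketch of a merge criterion in the spirit of the Mansour-McAllester decision-graph booster: leaves whose conditional label statistics are close are treated as "similar" and merged; the tolerance and helper names are illustrative assumptions:

```python
def merge_similar_leaves(leaves, tol=0.05):
    """Greedily merge leaves with 'similar' behavior.

    Each leaf is given as the list of examples reaching it.  Two leaves are
    considered similar if their empirical fractions of 1-labels differ by at
    most `tol`; merged leaves then share one node, keeping the branching
    program narrow."""
    def frac_ones(samples):
        return sum(y for _, y in samples) / max(1, len(samples))

    groups = []                                       # each group is one merged leaf
    for leaf in sorted(leaves, key=frac_ones):
        if groups and abs(frac_ones(groups[-1]) - frac_ones(leaf)) <= tol:
            groups[-1] = groups[-1] + list(leaf)      # merge into the previous group
        else:
            groups.append(list(leaf))
    return groups
```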


Booster: Merge step

[Figure: the program after the merge: h1, h2, h3, h4 remain, but the similar branches now lead into a shared node, so the width of the branching program stays bounded.]


Booster: Another split step

[Figure: a further split adds a node h5; split and merge steps alternate as the branching program grows.]


Booster: final result

[Figure: the final hypothesis is a branching program whose internal nodes are weak hypotheses (h1, h2, ...) and whose two sinks are labeled 0 and 1; to classify x, start at the root and follow the branch chosen by each node's evaluation on x.]
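A minimal sketch of how such a branching-program hypothesis classifies a point; the Node class and field names are illustrative:

```python
class Node:
    """A node of the final branching program: either a leaf with a fixed label,
    or an internal node holding a weak hypothesis h and two children."""
    def __init__(self, label=None, h=None, child0=None, child1=None):
        self.label, self.h, self.child0, self.child1 = label, h, child0, child1

def classify(root, x):
    """Follow the path chosen by the weak hypotheses until a leaf is reached."""
    node = root
    while node.label is None:
        node = node.child1 if node.h(x) == 1 else node.child0
    return node.label

# Tiny example: root h1, with an extra node h2 on the h1(x)=1 branch.
h1 = lambda x: x[0]
h2 = lambda x: x[1] ^ x[2]
root = Node(h=h1, child0=Node(label=0),
            child1=Node(h=h2, child0=Node(label=1), child1=Node(label=0)))
print(classify(root, (1, 0, 1)))  # h1 -> 1, h2 -> 1, so the leaf labeled 0 is reached
```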



Application: Parity with Noise

  • Theorem: for every ε, there is a weak learner that, for noise rate ½ - ε, produces a hypothesis which is wrong on a ½ - (2ε)^{n^0.001}/2 fraction of the space.* Running time: 2^{O(n/log n)}.

* Non-proper learner: the hypothesis is a circuit with 2^{O(n/log n)} gates.

Feldman et al. give a black-box reduction to the random-noise case; we give a direct result.


Corollary: Learners for many classes (without noise)

  • Can learn, without noise, any class with a "guaranteed correlated parity", in time 2^{O(n/log n)}

    • e.g. DNF; any others?

  • A weak parity learner that runs in 2^{O(n^0.32)} time would beat the best algorithm known for learning DNF

    • Good evidence that parity with noise is hard ⇒ efficient cryptosystems [Hopper-Blum, Blum-Furst et al., and many others]


Idea of weak agnostic parity learner

Main Idea:

1. Take a learner that resists random noise (BKW).

2. Add randomness to its behavior until you get a weak agnostic learner.

"Between two evils, I pick the one I haven't tried before" - Mae West

"Between two evils, I pick uniformly at random" - CS folklore


Summary

Problem: It is difficult but perhaps possible to design agnostic learning algorithms.

Proposed Solution: Agnostic Boosting.

Contributions:

  • Right(er) definition for weak agnostic learner

  • Agnostic boosting

  • Learning parity with noise in the hardest noise model

  • Entertaining STOC ’08 participants


Open Problems

  • Find other applications for Agnostic Boosting

  • Improve PwN algorithms.

    • Get proper learner for parity with noise

    • Reduce PwN with agnostic noise to PwN with random noise

  • Get evidence that PwN is hard

    • Prove that if parity with noise is easy then FACTORING is easy. $128 reward!



May the parity be with you!

The end.



Weak parity learner

  • Sample labeled points from the distribution, and sample an unlabeled x; let's guess f(x)

  • Bucket according to the last 2n/log n bits

[Figure: within each bucket, pairs of vectors are added (XORed), zeroing out those bits; the sums go on to the next round.]
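A minimal sketch of one reduction round in the spirit of BKW, assuming samples are (vector, label) pairs over GF(2); block_bits and the helper name are illustrative:

```python
from collections import defaultdict

def bkw_round(samples, block_bits):
    """Bucket samples by their last `block_bits` coordinates, then XOR each
    sample in a bucket with a fixed representative so those coordinates become 0;
    the resulting samples go on to the next round.
    samples: list of (x, y) with x a tuple of bits and y the (noisy) label."""
    buckets = defaultdict(list)
    for x, y in samples:
        buckets[x[-block_bits:]].append((x, y))

    next_round = []
    for group in buckets.values():
        pivot_x, pivot_y = group[0]                            # bucket representative
        for x, y in group[1:]:
            xored = tuple(a ^ b for a, b in zip(x, pivot_x))   # last block_bits now all 0
            next_round.append((xored, y ^ pivot_y))            # labels XOR too; noise accumulates
    return next_round
```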


Weak parity learner

LAST ROUND:

  • √n vectors with sum = 0 give a guess for f(x)

[Figure: groups of vectors XORing to 0; each such group yields one guess for f(x).]


Weak parity learner

LAST ROUND:

  • √n vectors with sum = 0 give a guess for f(x)

  • By symmetry, the probability of a mistake equals the fraction of mistakes

  • Claim: %mistakes ≤ ½ - (2ε)^{n^0.001}/2 (by Cauchy-Schwarz)

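A minimal sketch of the last-round guess, reading "sum = 0" as: the unlabeled point x was included among the vectors, so a zero-sum group means the labeled vectors XOR to x (this reading, and the helper name, are assumptions for illustration):

```python
def guess_from_group(labels):
    """`labels` are the labels of ~sqrt(n) labeled vectors that, together with the
    unlabeled point x, XOR to 0.  The XOR of these labels is one guess for f(x);
    with noise rate 1/2 - eps each label carries a small bias, which is what the
    claim above bounds."""
    guess = 0
    for y in labels:
        guess ^= y
    return guess
```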




Intuition behind Boosting

[Figure: points the current hypothesis classifies correctly have their weight decreased; points it misclassifies have their weight increased.]


Intuition behind Boosting

  • Run, reweight, run, reweight, ... . Take a majority vote of the hypotheses.

  • Algorithmic & efficient Yao-von Neumann Minimax Principle

[Figure: the reweighted sample: weights decreased on correctly classified points, increased on mistakes.]
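A minimal sketch of this run/reweight loop in the style of classical boosting; the reweighting factors and helper names are illustrative, and the paper's agnostic booster builds a branching program rather than taking a plain majority vote:

```python
def boost_by_reweighting(sample, weak_learn, rounds=10):
    """Run the weak learner, up-weight its mistakes and down-weight the points
    it got right, repeat, and output the majority vote of all hypotheses."""
    weights = [1.0] * len(sample)
    hyps = []
    for _ in range(rounds):
        h = weak_learn(sample, weights)              # weak learner sees the reweighted sample
        hyps.append(h)
        for i, (x, y) in enumerate(sample):
            weights[i] *= 2.0 if h(x) != y else 0.5  # illustrative factors
        total = sum(weights)
        weights = [w / total for w in weights]       # renormalize

    def majority(x):
        votes = sum(1 if h(x) == 1 else -1 for h in hyps)
        return 1 if votes > 0 else 0

    return majority
```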

