Presentation Transcript



On Agnostic Boosting and Parity Learning

Adam Tauman Kalai, Georgia Tech.

Yishay Mansour, Google and Tel-Aviv

Elad Verbin, Tsinghua



Defs

  • Agnostic Learning = learning with adversarial noise

  • Boosting = turn weak learner into strong learner

  • Parities = parities of subsets of the bits

    • f : {0,1}^n → {0,1}, e.g. f(x) = x1 ⊕ x3 ⊕ x7

  • Agnostic Boosting

    • Turning a weak agnostic learner into a strong agnostic learner

  • A 2^{O(n/log n)}-time algorithm for agnostically learning parities over any distribution (a small code sketch of these definitions follows this list)
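
A minimal, purely illustrative sketch of these definitions (the names `parity`, `err`, and the brute-force computation of `opt` are mine, not the talk's): a parity over a subset S of the bits, the error of a hypothesis against possibly corrupted labels, and `opt`, the error of the best parity, which a strong agnostic learner must approach to within ε.

```python
import itertools
import random

def parity(S, x):
    """Parity of the bits of x indexed by S, e.g. S = (0, 2, 6) is x1 xor x3 xor x7."""
    return sum(x[i] for i in S) % 2

def err(h, samples):
    """Fraction of labeled examples (x, y) on which hypothesis h disagrees with y."""
    return sum(h(x) != y for x, y in samples) / len(samples)

# Agnostic setting: the labels may be adversarially corrupted, so no parity fits
# them exactly.  opt is the error of the best parity; a strong agnostic learner
# must output a hypothesis with error at most opt + eps.
n = 6
xs = [tuple(random.randint(0, 1) for _ in range(n)) for _ in range(500)]
samples = [(x, random.randint(0, 1)) for x in xs]        # worst case: unrelated labels
opt = min(err(lambda x, S=S: parity(S, x), samples)
          for r in range(n + 1) for S in itertools.combinations(range(n), r))
print("opt =", opt)
```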

Outline


Agnostic boosting

[Diagram: an Agnostic Booster turns a weak learner into a strong learner.]

Weak learner. For any noise rate < ½ produces a better-than-trivial hypothesis

Strong Learner. Produces almost-optimal hypothesis

Runs weak learner as black box



Learning with Noise

It’s, like, a really hard model!!!

* up to well-studied open problems (i.e. we know where we’re stuck)



Agnostic Learning: some known results



Agnostic Learning: some known results

Due to hardness, or lack of tools???

Agnostic boosting: a strong tool that makes it easier to design algorithms.



Why care about agnostic learning?

  • More relevant in practice

  • Impossibility results might be useful for building cryptosystems


Noisy learning

f : {0,1}^n → {0,1} is from a class F. The algorithm gets samples ⟨x, f(x)⟩, where x is drawn from a distribution D.

  • No noise: the learner should approximate f up to error ε.

  • Random noise: labels are flipped at random (noise rate η), turning f into g; the learner sees the noisy samples but should still approximate f up to error ε.

  • Adversarial (≈ agnostic) noise: an adversary is allowed to corrupt an η-fraction of the labels, turning f into g; the learner should approximate g up to error η + ε.

(A toy illustration of these three sample oracles follows.)
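
A small illustration of how the three sample oracles differ; the code and the choice of D as the uniform distribution are my own, for concreteness, not part of the talk.

```python
import random

def draw_x(n):
    """One draw from D; for concreteness D is uniform over {0,1}^n here."""
    return tuple(random.randint(0, 1) for _ in range(n))

def noiseless_sample(f, n):
    x = draw_x(n)
    return x, f(x)                          # label is always f(x)

def random_noise_sample(f, n, eta):
    x = draw_x(n)
    y = f(x)
    if random.random() < eta:               # each label flipped independently w.p. eta
        y ^= 1
    return x, y

def agnostic_sample(g, n):
    # Adversarial (agnostic) noise: the learner simply sees an arbitrary labeled
    # source g; the only promise is that some f in the class agrees with g on all
    # but an eta-fraction of D, and the goal becomes error <= eta + eps w.r.t. g.
    x = draw_x(n)
    return x, g(x)
```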


Agnostic learning (geometric view)

[Figure: the concept class F, the target g, the nearest f ∈ F at distance opt from g, and a ball of radius opt + ε around g.]

Parameters: F, a metric.

Input: an oracle for g.

Goal: return some element of the ball of radius opt + ε around g (PROPER LEARNING: return such an element that also lies in F).


Agnostic boosting: definition

[Diagram: the weak learner is given the distribution D and samples from g, and w.h.p. outputs a hypothesis h.]

Weak learner guarantee: whenever opt ≤ ½ − ε, it outputs h with errD(g,h) ≤ ½ − ε^100.


Agnostic boosting

[Diagram: the Agnostic Booster draws samples from g, runs the weak learner as a black box on distributions D of its choosing, and w.h.p. outputs h′ with errD(g,h′) ≤ opt + ε.]

Weak learner guarantee (as before): whenever opt ≤ ½ − ε, it outputs h with errD(g,h) ≤ ½ − ε^100.

Runs the weak learner poly(1/ε^100) times.


Agnostic boosting

[Diagram: the Agnostic Booster draws samples from g, runs an (α,γ)-weak learner as a black box, and w.h.p. outputs h′ with errD(g,h′) ≤ opt + α + ε.]

(α,γ)-weak learner guarantee: whenever opt ≤ ½ − α, it outputs h with errD(g,h) ≤ ½ − γ.

Runs the weak learner poly(1/γ, 1/ε) times. (A sketch of this black-box interface follows.)
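
To make the interface concrete, here is my own rendering of the booster's contract in type signatures; the names `Example`, `WeakLearner`, and `agnostic_boost` are illustrative and not from the paper, and the body is deliberately left unimplemented.

```python
from typing import Callable, List, Tuple

Example = Tuple[Tuple[int, ...], int]                 # (x, label) with x in {0,1}^n
Hypothesis = Callable[[Tuple[int, ...]], int]

# An (alpha, gamma)-weak agnostic learner: given weighted examples describing
# some distribution D over labeled points, it returns h with
# err_D(g, h) <= 1/2 - gamma, under the promise that opt <= 1/2 - alpha.
WeakLearner = Callable[[List[Tuple[Example, float]]], Hypothesis]

def agnostic_boost(weak: WeakLearner,
                   sample: Callable[[int], List[Example]],
                   eps: float) -> Hypothesis:
    """Contract of the agnostic booster: it may only (a) draw samples from g,
    (b) reweight / filter them, and (c) call `weak` as a black box
    poly(1/gamma, 1/eps) times; it must return h' with
    err_D(g, h') <= opt + alpha + eps w.h.p.  One concrete way to realize this
    contract (a branching program built by split and merge steps) is sketched
    after the split/merge slides below."""
    raise NotImplementedError
```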


Agnostic boosting

[Diagram: the Agnostic Booster, as before.]

Weak learner. For any noise rate < ½ produces a better-than-trivial hypothesis

Strong Learner. Produces almost-optimal hypothesis


Analogy

[Diagram: an “Approximation Booster”.]

A poly-time MAX-3-SAT algorithm that, whenever opt = 7/8 + ε, produces a solution with value 7/8 + ε^100, is boosted into an algorithm for MAX-3-SAT that produces a solution with value at least opt − ε, with running time poly(n, 1/ε).


Gap

[Figure: the error axis from 0 to 1, with ½ marked.]

No hardness gap close to ½ → (booster) → no gap anywhere (additive PTAS).



Agnostic boosting

  • New analysis of the Mansour-McAllester booster.

    • uses branching programs; nodes are weak hypotheses

  • Previous Agnostic Boosting:

    • Ben-David, Long, and Mansour, and separately Gavinsky, defined agnostic boosting differently.

    • Their result cannot be used for our application


Booster

[Diagram: a one-node branching program: query h1 on x; the two branches h1(x)=0 and h1(x)=1 lead to the sinks 0 and 1.]

Booster: Split step

[Diagram: the examples reaching each branch of h1 form a different distribution; a candidate weak hypothesis (h2 or h2′) is trained on each branch, and we choose the “better” option of the two splits.]


Booster: Split step

[Diagram: the growing program: h1 at the root, with h2 and then h3 added by further split steps; the remaining edges lead to the sinks 0 and 1.]


Booster: Split step

[Diagram: another split adds h4 on the other branch of h1; the program now queries h1, then h4 or h2, then h3.]


Booster: Merge step

[Diagram: the same program; two nodes whose induced distributions look alike are candidates to “merge if similar”.]


Booster: Merge step

[Diagram: the program after the merge; the similar nodes now share a single node, so the width of the program stays small.]


Booster: Another split step

[Diagram: a node of the merged program is split again with a new weak hypothesis h5.]


Booster: final result

[Diagram: the final branching program: a layered program whose internal nodes each query a weak hypothesis and whose two sinks output 0 and 1.]

(A schematic code sketch of this split/merge booster follows.)
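
The sketch below is only a loose schematic of the split/merge idea pictured above, NOT the exact KMV / Mansour-McAllester construction: their splitting criterion, merge rule, and potential-function analysis are more careful, and the names `bp_boost`, `toy_weak`, `depth`, and `merge_tol` are mine. It builds a layered branching program whose nodes hold weak hypotheses, splitting each node on the distribution of examples that reach it and merging children with similar label statistics.

```python
import random
from collections import defaultdict

def toy_weak(exs):
    """Toy weak learner: the single-bit dictator (or anti-dictator) with the
    smallest empirical error on the examples it is handed."""
    n = len(exs[0][0])
    cands = [lambda x, i=i: x[i] for i in range(n)] + \
            [lambda x, i=i: 1 - x[i] for i in range(n)]
    return min(cands, key=lambda h: sum(h(x) != y for x, y in exs))

def bp_boost(weak_learner, samples, depth=5, merge_tol=0.1):
    """Schematic branching-program booster in the spirit of the split/merge
    pictures above.

    `level` maps node id -> examples routed there (each node induces its own
    distribution); `plan[t][node] = (h, children)` records the weak hypothesis
    queried at that node and where each of its predictions routes next."""
    level = {0: list(samples)}
    plan = []
    for _ in range(depth):
        routed, split = defaultdict(list), {}
        for nid, node_samples in level.items():
            h = weak_learner(node_samples)        # SPLIT: one weak-learner call per node
            split[nid] = (h, {})
            for x, y in node_samples:
                routed[(nid, h(x))].append((x, y))
        # MERGE: children with similar empirical Pr[label = 1] share one node,
        # keeping the width of the program small.
        next_level, key_to_id = {}, {}
        for (nid, b), exs in routed.items():
            p = sum(y for _, y in exs) / len(exs)
            cid = key_to_id.setdefault(round(p / merge_tol), len(key_to_id))
            next_level.setdefault(cid, []).extend(exs)
            split[nid][1][b] = cid
        plan.append(split)
        level = next_level
    # Sinks predict the majority label of the training examples reaching them.
    sink = {nid: int(2 * sum(y for _, y in exs) >= len(exs))
            for nid, exs in level.items()}

    def predict(x):
        nid = 0
        for layer in plan:
            h, children = layer[nid]
            if h(x) not in children:              # branch never seen in training
                return 0
            nid = children[h(x)]
        return sink[nid]
    return predict

# Usage sketch: hprime = bp_boost(toy_weak, train_samples); hprime(x) -> 0 or 1
```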



Agnostically learning parities


Application: Parity with Noise

  • Theorem: ∀ε, we have a weak learner that, for noise rate ½ − ε, produces a hypothesis which is wrong on a ½ − (2ε)^{n^0.001}/2 fraction of the space. Running time: 2^{O(n/log n)}.*

* non-proper learner: the hypothesis is a circuit with 2^{O(n/log n)} gates.

Feldman et al. give a black-box reduction to the random-noise case; we give a direct result.



Corollary: Learners for many classes (without noise)

  • Can learn, without noise, any class with a “guaranteed correlated parity”, in time 2^{O(n/log n)}

    • e.g. DNF, any others?

  • A weak parity learner that runs in 2^{O(n^0.32)} time would beat the best algorithm known for learning DNF

    • Good evidence that parity with noise is hard; such hardness yields efficient cryptosystems [Hopper-Blum, Blum-Furst et al., and many others]


Idea of weak agnostic parity learner

Main idea:

1. Take a learner which resists random noise (BKW).

2. Add randomness to its behavior, until you get a weak agnostic learner.

“Between two evils, I pick the one I haven’t tried before” – Mae West

“Between two evils, I pick uniformly at random” – CS folklore



Summary

Problem: It is difficult but perhaps possible to design agnostic learning algorithms.

Proposed Solution: Agnostic Boosting.

Contributions:

  • Right(er) definition for weak agnostic learner

  • Agnostic boosting

  • Learning parity with noise in the hardest noise model

  • Entertaining STOC ’08 participants



Open Problems

  • Find other applications for Agnostic Boosting

  • Improve PwN algorithms.

    • Get proper learner for parity with noise

    • Reduce PwN with agnostic noise to PwN with random noise

  • Get evidence that PwN is hard

    • Prove that if parity with noise is easy then FACTORING is easy. $128 reward!



May the parity be with you!

The end.



Sketch of weak parity learner


Weak parity learner

  • Sample labeled points from the distribution; sample an unlabeled x; let’s guess f(x).

Bucket according to the last 2n/log n bits.

[Diagram: vectors in the same bucket are XORed in pairs and passed to the next round.]


Weak parity learner

LAST ROUND:

  • √n vectors with sum = 0 gives a guess for f(x).

  • By symmetry, the probability of a mistake equals the fraction of mistakes.

  • Claim: %mistakes ≤ … (by Cauchy–Schwarz).

[Diagram: three “+ ⋯ = 0” combinations of vectors.]

(A rough code sketch of this bucketing idea follows.)
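
The sketch below is a loose illustration of the BKW-style bucketing behind the weak parity learner, NOT the exact KMV algorithm, its noise handling, or its parameters; `weak_parity_guess`, `rounds`, and `block` are illustrative names and defaults of my own.

```python
import random
from collections import defaultdict

def weak_parity_guess(labeled, x, rounds=2, block=None):
    """Throw the unlabeled query x into the pool and repeatedly (i) bucket the
    current vectors by one block of coordinates and (ii) XOR random pairs inside
    each bucket, which zeroes out that block.  Every surviving item is the XOR of
    a few original vectors; if an item ends up being the all-zero vector and its
    chain used x exactly once, the remaining constituents XOR to x, so for a
    parity f the XOR of their labels is a guess for f(x).  Label noise makes the
    guess only weakly better than a coin flip."""
    n = len(x)
    if block is None:
        block = max(1, 2 * n // max(1, n.bit_length()))   # ~ 2n / log n, as on the slide
    # item = (current vector, XOR of the real labels involved, chain-contains-x?)
    items = [(v, y, False) for v, y in labeled] + [(x, 0, True)]
    hi = n
    for _ in range(rounds):
        lo = max(0, hi - block)
        buckets = defaultdict(list)
        for it in items:
            buckets[it[0][lo:hi]].append(it)
        items = []
        for group in buckets.values():
            random.shuffle(group)                          # random pairing inside a bucket
            for (v1, y1, f1), (v2, y2, f2) in zip(group[0::2], group[1::2]):
                items.append((tuple(a ^ b for a, b in zip(v1, v2)), y1 ^ y2, f1 ^ f2))
        hi = lo
    guesses = [y for v, y, used_x in items if used_x and not any(v)]
    return guesses[0] if guesses else random.randint(0, 1)  # no combination found: coin flip
```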



Intuition behind two main parts


Intuition behind Boosting

[Diagram: a set of weighted points; after running the weak learner, decrease the weight of points it gets right and increase the weight of points it gets wrong.]


Intuition behind Boosting

  • Run, reweight, run, reweight, … . Take a majority of the hypotheses.

  • An algorithmic & efficient Yao–von Neumann Minimax Principle.

[Diagram: the reweighting picture again.]

(A schematic reweighting loop follows.)
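
A schematic "run, reweight, take a weighted majority" loop in the spirit of classical AdaBoost, shown only to illustrate the intuition on this slide; it is not the branching-program booster the paper actually uses, and the weak-learner interface here is my own simplification.

```python
import math

def reweight_boost(weak_learner, samples, rounds=20):
    """Run the weak learner, reweight the examples, repeat, and finish with a
    weighted majority vote over all the hypotheses collected."""
    m = len(samples)
    w = [1.0 / m] * m                                   # start from the uniform distribution
    hyps, alphas = [], []
    for _ in range(rounds):
        h = weak_learner(list(zip(samples, w)))         # weak learner sees weighted examples
        mistake = sum(wi for (x, y), wi in zip(samples, w) if h(x) != y)
        mistake = min(max(mistake, 1e-9), 1 - 1e-9)
        a = 0.5 * math.log((1 - mistake) / mistake)     # confidence of this round's hypothesis
        for i, (x, y) in enumerate(samples):
            # decrease the weight of points h got right, increase the rest
            w[i] *= math.exp(-a if h(x) == y else a)
        total = sum(w)
        w = [wi / total for wi in w]
        hyps.append(h)
        alphas.append(a)

    def predict(x):                                     # weighted majority vote
        s = sum(a if h(x) == 1 else -a for h, a in zip(hyps, alphas))
        return int(s >= 0)
    return predict
```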

