

On Agnostic Boosting and Parity Learning

Adam Tauman Kalai, Georgia Tech.

Yishay Mansour, Google and Tel-Aviv

Elad Verbin, Tsinghua


Defs

  • Agnostic Learning = learning with adversarial noise

  • Boosting = turn weak learner into strong learner

  • Parities = parities of subsets of the bits

    • f : {0,1}^n → {0,1}, e.g. f(x) = x1 ⊕ x3 ⊕ x7 (see the sketch after this list)

  • Agnostic Boosting

    • Turning a weak agnostic learner to a strong agnostic learner

  • 2^{O(n/log n)}-time algorithm for agnostically learning parities over any distribution
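
A minimal illustration (mine, not from the slides) of a parity of a subset of the bits, as referenced in the definitions above:

```python
# Minimal sketch: the parity of a fixed subset of the input bits,
# e.g. f(x) = x1 XOR x3 XOR x7 (written 0-indexed below as bits 0, 2, 6).
def parity(x, subset=(0, 2, 6)):
    """x is a tuple of n bits in {0,1}; returns the XOR of the bits in `subset`."""
    return sum(x[i] for i in subset) % 2

print(parity((1, 0, 1, 0, 0, 0, 1, 0)))   # 1 XOR 1 XOR 1 = 1
```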

Outline


[Diagram: "Agnostic boosting": an Agnostic Booster box turns a weak learner into a strong learner, running the weak learner as a black box.]

  • Weak learner: for any noise rate < ½, produces a better-than-trivial hypothesis.

  • Strong learner: produces an almost-optimal hypothesis.


Learning with Noise

It’s, like, a really hard model!!!

* up to well-studied open problems (i.e. we know where we’re stuck)


Agnostic Learning: some known results


Agnostic Learning: some known results

Due to hardness, or lack of tools???

Agnostic boosting: a strong tool that makes it easier to design algorithms.


Why care about agnostic learning?

  • More relevant in practice

  • Impossibility results might be useful for building cryptosystems


Noisy learning

f : {0,1}^n → {0,1} from class F.

The algorithm gets samples ⟨x, f(x)⟩ where x is drawn from distribution D.

  • No noise: the learner should approximate f up to error ε.

  • Random noise: labels are flipped at random at some noise rate; the learner should still approximate f up to error ε.

  • Adversarial (≈ agnostic) noise: an adversary may corrupt an η-fraction of the labels, turning f into some g; the learner should approximate g up to error η + ε.
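
A hedged sketch (my own framing, not code from the talk) of what the sample oracle looks like in each of the three models; `draw_x`, `f`, `g`, and `eta` are illustrative names:

```python
# Sketch of the three sample models; draw_x samples x from D, f is the target in F.
import random

def sample_noiseless(f, draw_x):
    x = draw_x()
    return x, f(x)                       # <x, f(x)> exactly

def sample_random_noise(f, draw_x, eta):
    x = draw_x()
    y = f(x)
    if random.random() < eta:            # each label independently flipped w.p. eta
        y ^= 1
    return x, y

def sample_agnostic(g, draw_x):
    # g is arbitrary (an adversary may have corrupted an eta-fraction of f);
    # the learner must compete with opt = min over f in F of Pr_D[f(x) != g(x)].
    x = draw_x()
    return x, g(x)
```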


Agnostic learning (geometric view)

[Figure: the class F drawn as a region; g lies outside it, f is the point of F closest to g, at distance opt, and a ball of radius opt + ε is drawn around g. The part of that ball inside F is marked PROPER LEARNING.]

  • Parameters: the class F and a metric (disagreement under D).

  • Input: an oracle for g.

  • Goal: return some element of the ball of radius opt + ε around g (a proper learner must return an element of F).
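
In formulas (a standard restatement of the agnostic objective, assuming the usual definitions rather than quoting the slide):

```latex
\[
\operatorname{err}_D(h) = \Pr_{x \sim D}\!\bigl[h(x) \neq g(x)\bigr],
\qquad
\mathrm{opt} = \min_{f \in F} \operatorname{err}_D(f),
\]
\[
\text{goal: output } h \text{ with } \operatorname{err}_D(h) \le \mathrm{opt} + \epsilon
\quad (\text{proper learning additionally requires } h \in F).
\]
```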


Agnostic boosting

definition

[Diagram: samples of g, with x drawn from distribution D, go into the weak learner, which w.h.p. outputs a hypothesis h.]

  • Weak learner guarantee: whenever opt ≤ ½ − γ, the output h satisfies errD(g, h) ≤ ½ − γ^100 (w.h.p.).


Agnostic boosting

[Diagram: the Agnostic Booster receives samples from g (over D) and black-box access to the weak learner; w.h.p. it outputs a hypothesis h'.]

  • Weak learner (as above): whenever opt ≤ ½ − γ, it outputs h with errD(g, h) ≤ ½ − γ^100 (w.h.p.).

  • Booster guarantee: errD(g, h') ≤ opt + ε (w.h.p.).

  • Runs the weak learner poly(1/γ^100) times.


Agnostic boosting

[Diagram: the same picture with a general (α,γ)-weak learner.]

  • (α,γ)-weak learner: whenever opt ≤ ½ − γ, it outputs h with errD(g, h) ≤ ½ − α (w.h.p.).

  • Booster guarantee: errD(g, h') ≤ opt + γ + ε (w.h.p.).

  • Runs the weak learner poly(1/α, 1/ε) times.
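
The same contract written out as formulas (a restatement under the notation above, not a verbatim quote of the slides):

```latex
\[
\text{weak learner: } \quad
\mathrm{opt} \le \tfrac{1}{2} - \gamma
\;\Longrightarrow\;
\operatorname{err}_D(g, h) \le \tfrac{1}{2} - \alpha \quad \text{(w.h.p.)}
\]
\[
\text{booster, after } \mathrm{poly}(1/\alpha, 1/\epsilon) \text{ black-box calls: } \quad
\operatorname{err}_D(g, h') \le \mathrm{opt} + \gamma + \epsilon \quad \text{(w.h.p.)}
\]
```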


Agnostic boosting (recap)

[Diagram: the Agnostic Booster turns a weak learner into a strong learner.]

  • Weak learner: for any noise rate < ½, produces a better-than-trivial hypothesis.

  • Strong learner: produces an almost-optimal hypothesis.


Analogy: an “Approximation Booster”

  • Input: a poly-time MAX-3-SAT algorithm that, whenever opt = 7/8 + ε, produces a solution of value 7/8 + ε^100.

  • Output: an algorithm for MAX-3-SAT that produces a solution of value ≥ opt − ε, with running time poly(n, 1/ε).


Gap

[Figure: a number line from 0 to 1 with ½ marked. The weak guarantee only rules out a hardness gap close to ½; applying the booster yields no gap anywhere (an additive PTAS).]


Agnostic boosting

  • New Analysis for Mansour-McAllester booster.

    • uses branching programs; nodes are weak hypotheses

  • Previous Agnostic Boosting:

    • Ben-David+Long+Mansour, and Gavinsky, defined agnostic boosting differently.

    • Their result cannot be used for our application


Booster

[Figure: the branching program starts as a single node h1 reading input x; its edges h1(x)=0 and h1(x)=1 lead to the output leaves 0 and 1.]


Booster: Split step

[Figure: two candidate ways to grow the program. In each, the root h1 routes x by h1(x), and one of its output leaves is replaced by a new weak hypothesis (h2 or h2') trained on the different distribution of examples reaching that leaf, with edges to the leaves 0 and 1. Choose the “better” of the two options.]
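
A rough sketch of what “choose the better option” can mean at a single leaf (my own toy rendering; the names split_step and errors_as_leaf are illustrative and this is not the paper's actual procedure or its analysis):

```python
# Toy sketch of a split step: train a weak hypothesis on the examples that
# reach a leaf, and keep the split only if it reduces the number of mistakes.
from typing import Callable, List, Tuple

X = Tuple[int, ...]                       # an input point
Example = Tuple[X, int]                   # (x, label g(x))
Hyp = Callable[[X], int]

def majority_label(examples: List[Example]) -> int:
    return int(2 * sum(y for _, y in examples) >= len(examples))

def errors_as_leaf(examples: List[Example]) -> int:
    lab = majority_label(examples)
    return sum(y != lab for _, y in examples)

def split_step(examples: List[Example],
               weak_learner: Callable[[List[Example]], Hyp]):
    """Return (h, side0, side1) if splitting the leaf on h is the better
    option, or (None, examples, []) if keeping the leaf is better."""
    h = weak_learner(examples)            # trained on this leaf's distribution
    side0 = [(x, y) for x, y in examples if h(x) == 0]
    side1 = [(x, y) for x, y in examples if h(x) == 1]
    if errors_as_leaf(side0) + errors_as_leaf(side1) < errors_as_leaf(examples):
        return h, side0, side1
    return None, examples, []
```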


Booster: Split step

[Figure: the program after splitting. The root h1 routes x to a node h2 on one branch and a node h3 on the other; the edges h2(x)=0/1 and h3(x)=0/1 lead to the output leaves 0 and 1.]


Booster: Split step

[Figure: another split adds a node h4; the program now has nodes h1, h2, h3, h4, each routing by hi(x), with the remaining edges ending in the output leaves 0 and 1.]


Booster: Merge step

[Figure: the same program (nodes h1, h2, h3, h4); two nodes whose behaviour is “similar” are marked to be merged.]
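
Again a toy sketch only (an assumed criterion, not the paper's): two nodes are merge candidates when the examples reaching them have similar conditional label statistics, so routing both through one shared node loses little.

```python
# Toy sketch of a merge criterion: merge two nodes of the branching program
# when the label statistics of the examples reaching them are close.
from typing import List, Tuple

Example = Tuple[Tuple[int, ...], int]     # (x, label g(x))

def label_bias(examples: List[Example]) -> float:
    """Fraction of examples reaching a node that are labelled 1."""
    return sum(y for _, y in examples) / max(1, len(examples))

def similar(node_a: List[Example], node_b: List[Example], tol: float = 0.05) -> bool:
    """Merge candidates if their conditional label distributions are within tol."""
    return abs(label_bias(node_a) - label_bias(node_b)) <= tol
```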


Booster: Merge step

[Figure: the program after the merge; the two similar nodes have been collapsed into one, so several edges now point to the same node.]


Booster: Another split step

[Figure: growth continues; a new weak hypothesis h5 is added by splitting another leaf.]


Booster: final result

[Figure: the final hypothesis is a layered branching program on input x, with many weak-hypothesis nodes per layer and two output leaves, 0 and 1.]


Agnostically learning parities


Application: Parity with Noise

* non-proper learner: the hypothesis is a circuit with 2^{O(n/log n)} gates

Feldman et al. give a black-box reduction to the random-noise case; we give a direct result.

  • Theorem: ∀ε, there is a weak learner that, for noise rate ½ − ε, produces a hypothesis which is wrong on a ½ − (2ε)^{n^{0.001}}/2 fraction of the space. Running time: 2^{O(n/log n)}.


Corollary: Learners for many classes (without noise)

  • Can learn without noise any class with “guaranteed correlated parity”, in time 2^{O(n/log n)}

    • e.g. DNF, any others?

  • A weak parity learner that runs in 2^{O(n^{0.32})} time would beat the best algorithm known for learning DNF

    • Good evidence that parity with noise is hard ⇒ efficient cryptosystems [Hopper-Blum, Blum-Furst et al., and many others]


Idea of weak agnostic parity learner

Main idea:

1. Take a learner that resists random noise (BKW).

2. Add randomness to its behavior until you get a weak agnostic learner.

“Between two evils, I pick the one I haven’t tried before” – Mae West

“Between two evils, I pick uniformly at random” – CS folklore


Summary

Problem: It is difficult but perhaps possible to design agnostic learning algorithms.

Proposed Solution: Agnostic Boosting.

Contributions:

  • Right(er) definition for weak agnostic learner

  • Agnostic boosting

  • Learning parity with noise in the hardest noise model

  • Entertaining STOC ’08 participants


Open Problems

  • Find other applications for Agnostic Boosting

  • Improve PwN algorithms.

    • Get proper learner for parity with noise

    • Reduce PwN with agnostic noise to PwN with random noise

  • Get evidence that PwN is hard

    • Prove that if parity with noise is easy then FACTORING is easy. $128 reward!


May the parity be with you!

The end.


Sketch of weak parity learner


Weak parity learner

  • Sample labeled points from the distribution, and one unlabeled point x; let’s guess f(x)

Bucket the samples according to their last 2n/log n bits; within each bucket, add (XOR) vectors so that those bits cancel, and pass the resulting sums to the next round.


Weak parity learner

LAST ROUND:

  • √n vectors with sum = 0: this gives a guess for f(x).

[Figure: several groups of sample vectors, each summing to 0.]


Weak parity learner

LAST ROUND:

  • √n vectors with sum = 0: this gives a guess for f(x).

  • By symmetry, the probability of a mistake equals the fraction of mistakes.

  • Claim: the fraction of mistakes can be bounded as needed (by Cauchy-Schwarz).

[Figure: several groups of sample vectors, each summing to 0.]
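
For concreteness, a highly simplified toy version of one bucketing round (my own illustrative code in the spirit of BKW, not the algorithm from the talk, and with none of its added randomization or its analysis):

```python
# Toy sketch of one bucketing round in a BKW-style parity learner:
# samples are (vector, label) pairs over GF(2); vectors in the same bucket
# (same value on a trailing block of bits) are XORed with a representative
# so that the block cancels, and the combined samples move to the next round.
from collections import defaultdict
from typing import Dict, List, Tuple

Sample = Tuple[Tuple[int, ...], int]      # (x in {0,1}^n, noisy label)

def xor(u: Tuple[int, ...], v: Tuple[int, ...]) -> Tuple[int, ...]:
    return tuple(a ^ b for a, b in zip(u, v))

def bucketing_round(samples: List[Sample], block: int) -> List[Sample]:
    """Cancel the last `block` coordinates by combining samples bucket by bucket."""
    buckets: Dict[Tuple[int, ...], List[Sample]] = defaultdict(list)
    for x, y in samples:
        buckets[x[-block:]].append((x, y))
    out: List[Sample] = []
    for group in buckets.values():
        rep_x, rep_y = group[0]            # combine everything with one representative
        for x, y in group[1:]:
            out.append((xor(x, rep_x), y ^ rep_y))   # last `block` bits are now all 0
    return out
```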


Intuition behind two main parts




Intuition behind Boosting

[Figure: reweighting the examples; points the current hypothesis classifies correctly get their weight decreased, points it gets wrong get their weight increased.]


Intuition behind Boosting

  • Run, reweight, run, reweight, … . Take the majority of the hypotheses.

  • An algorithmic and efficient Yao / von Neumann minimax principle.

[Figure: the reweighting picture again, with example weights (0, 1, 2) adjusted after each round.]
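
A minimal sketch of the run/reweight/majority loop described above (generic boosting intuition in my own code, with a multiplicative reweighting rule assumed for illustration; this is not the branching-program booster of this talk):

```python
# Generic run-reweight-majority loop (illustration of the intuition only).
from typing import Callable, List, Tuple

X = Tuple[int, ...]
Example = Tuple[X, int]
Hyp = Callable[[X], int]

def boost_by_reweighting(examples: List[Example],
                         weak_learner: Callable[[List[Example], List[float]], Hyp],
                         rounds: int,
                         beta: float = 0.5) -> Hyp:
    weights = [1.0] * len(examples)
    hyps: List[Hyp] = []
    for _ in range(rounds):
        h = weak_learner(examples, weights)        # weak learner sees the reweighted sample
        hyps.append(h)
        for i, (x, y) in enumerate(examples):
            if h(x) == y:
                weights[i] *= beta                 # decrease weight of correctly labelled points
            else:
                weights[i] /= beta                 # increase weight of mistakes
    def majority(x: X) -> int:                     # final hypothesis: majority vote
        return int(2 * sum(h(x) for h in hyps) >= len(hyps))
    return majority
```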

