

On Agnostic Boosting and Parity Learning

Adam Tauman Kalai, Georgia Tech.

Yishay Mansour, Google and Tel-Aviv

Elad Verbin, Tsinghua

- Agnostic Learning = learning with adversarial noise
- Boosting = turning a weak learner into a strong learner
- Parities = parities of subsets of the input bits
- f : {0,1}^n → {0,1}, e.g. f(x) = x1 ⊕ x3 ⊕ x7 (sketched in code below)
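For concreteness, a minimal sketch (my own illustration, not from the slides) of a parity function over a subset of the bits:

```python
# Illustration only: a parity of a subset S of the input bits, f(x) = XOR_{i in S} x_i.
def parity(x, subset):
    """x: sequence of bits in {0,1}; subset: the indices whose XOR defines f."""
    total = 0
    for i in subset:
        total ^= x[i]
    return total

# e.g. f(x) = x1 XOR x3 XOR x7 (using 0-based indices 0, 2, 6 here):
print(parity([1, 0, 1, 1, 0, 0, 1, 0], [0, 2, 6]))  # 1 ^ 1 ^ 1 = 1
```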

- Agnostic Boosting
- Turning a weak agnostic learner into a strong agnostic learner

- 2^{O(n/log n)}-time algorithm for agnostically learning parities over any distribution

Outline

Agnostic Booster: turns a weak learner into a strong learner, running the weak learner as a black box.

- Weak learner: for any noise rate < ½, produces a better-than-trivial hypothesis.
- Strong learner: produces an almost-optimal hypothesis.

Agnostic learning: it's, like, a really hard model!!!
(* up to well-studied open problems, i.e. we know where we're stuck)

Is that due to hardness, or to a lack of tools???

Agnostic boosting: a strong tool that makes it easier to design algorithms.

- More relevant in practice
- Impossibility results might be useful for building cryptosystems

f : {0,1}^n → {0,1} from class F. The algorithm gets samples ⟨x, f(x)⟩ where x is drawn from distribution D.

- No noise
- Random noise
- Adversarial (≈agnostic) noise

In each model the learner must output a hypothesis; the three sampling processes are also sketched in code below.

- No noise: given samples ⟨x, f(x)⟩, approximate f up to error ε.
- Random noise: each label is flipped independently with probability equal to the noise rate; still approximate f up to error ε.
- Adversarial (agnostic) noise: the adversary may corrupt an η-fraction of the examples, turning f into some g; given samples from g, approximate g up to error opt + ε, where opt is the distance from g to the closest function in F.
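A minimal sketch (my illustration, not from the slides) of how a single example is generated in each model, assuming a target f, an adversarially fixed g, and a sampler draw_x for D:

```python
import random

# Illustration only (not from the slides): one labeled example per noise model.

def example_no_noise(f, draw_x):
    x = draw_x()
    return x, f(x)                      # clean label

def example_random_noise(f, draw_x, eta):
    x = draw_x()
    y = f(x)
    if random.random() < eta:           # each label flipped independently w.p. eta
        y ^= 1
    return x, y

def example_agnostic(g, draw_x):
    # Adversarial (agnostic) noise: an adversary has already fixed a function g
    # that disagrees with the best f in the class on an eta-fraction of D;
    # the learner only ever sees g.
    x = draw_x()
    return x, g(x)
```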

Definition. Parameters: the class F and a metric. Input: an oracle for g. Goal: return some element of the "blue ball", i.e. a hypothesis within distance opt + ε of g (for PROPER LEARNING, the hypothesis must itself belong to F).

Weak learner: given samples ⟨x, g(x)⟩ with x drawn from D, if opt ≤ ½ − γ, then w.h.p. it outputs h with err_D(g, h) ≤ ½ − γ^100.

Agnostic Booster: from samples from g, w.h.p. outputs h' with err_D(g, h') ≤ opt + ε.

As above, with the cost spelled out: the booster runs the weak learner poly(1/γ^100) times and w.h.p. outputs h' with err_D(g, h') ≤ opt + γ + ε.

In general, an (α, γ)-weak learner: given samples ⟨x, g(x)⟩ with x drawn from D, if opt ≤ ½ − γ, then w.h.p. it outputs h with err_D(g, h) ≤ ½ − α. The Agnostic Booster runs such a weak learner poly(1/α, 1/ε) times as a black box.

Weak learner: for any noise rate < ½, produces a better-than-trivial hypothesis. Strong learner: produces an almost-optimal hypothesis.
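Putting the pieces in one place (the notation is reconstructed from the garbled slides, so the exact constants may differ from the original):

```latex
% err_D(g,h) = Pr_{x ~ D}[ g(x) != h(x) ],   opt = min_{f in F} err_D(g, f)
\[
(\alpha,\gamma)\text{-weak agnostic learner:}\quad
\mathrm{opt} \le \tfrac12 - \gamma
\;\Longrightarrow\;
\mathrm{err}_D(g,h) \le \tfrac12 - \alpha \ \ \text{(w.h.p.)}
\]
\[
\text{Agnostic booster:}\quad
\mathrm{err}_D(g,h') \le \mathrm{opt} + \gamma + \varepsilon \ \ \text{(w.h.p.)},
\quad\text{using } \mathrm{poly}(1/\alpha,\,1/\varepsilon)\ \text{black-box calls to the weak learner.}
\]
```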

"Approximation Booster" (an analogy): suppose you had a poly-time MAX-3-SAT algorithm that, whenever opt = 7/8 + ε, produces a solution with value 7/8 + ε^100. A booster for it would give an algorithm for MAX-3-SAT that produces a solution with value opt − ε, with running time poly(n, 1/ε).

[Picture: the interval from 0 to 1 with ½ marked.] No hardness gap close to ½; a booster then gives no gap anywhere (an additive PTAS).

- New analysis of the Mansour-McAllester booster.
- It uses branching programs; the nodes are weak hypotheses.

- Previous agnostic boosting: Ben-David+Long+Mansour and Gavinsky defined agnostic boosting differently.
- Their result cannot be used for our application.

[Figures: building the branching program. The root node queries h1, with edges for h1(x)=0 and h1(x)=1 leading either to leaves labeled 0/1 or to further nodes (h2, h2', h3, h4, h5, …). Each unlabeled leaf induces a different distribution on examples, and the weak learner is run on that distribution to obtain the hypothesis placed there. Leaves are labeled 0 or 1 by choosing the "better" option. Nodes are merged if they are "similar", keeping the branching program small.]
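As a minimal illustration (my sketch, not the paper's construction), the final hypothesis output by such a booster can be represented as a branching program whose internal nodes query weak hypotheses:

```python
# Illustration only: the boosted hypothesis as a branching program whose internal
# nodes query weak hypotheses and whose leaves carry the final 0/1 label.
class Leaf:
    def __init__(self, label):
        self.label = label            # 0 or 1

class Node:
    def __init__(self, h, child0, child1):
        self.h = h                    # weak hypothesis: x -> {0, 1}
        self.child0 = child0          # followed when h(x) = 0
        self.child1 = child1          # followed when h(x) = 1

def evaluate(program, x):
    """Walk from the root to a leaf and return the leaf's label."""
    node = program
    while isinstance(node, Node):
        node = node.child1 if node.h(x) == 1 else node.child0
    return node.label

# Toy usage with hypothetical single-bit weak hypotheses h1, h2.
h1 = lambda x: x[0]
h2 = lambda x: x[1]
program = Node(h1, Leaf(0), Node(h2, Leaf(1), Leaf(0)))
print(evaluate(program, [1, 0]))      # h1(x)=1, then h2(x)=0, so the leaf labelled 1
```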

Agnostically learning parities

* non-proper learner: the hypothesis is a circuit with 2^{O(n/log n)} gates

Feldman et al. give a black-box reduction to the random-noise case; we give a direct result.

- Theorem: for every ε, there is a weak learner that, for noise rate ½ − ε, produces a hypothesis that is wrong on a ½ − (2ε)^{n^{0.001}}/2 fraction of the space. Running time: 2^{O(n/log n)}.

- Can learn, without noise, any class with a "guaranteed correlated parity" in time 2^{O(n/log n)}
- e.g. DNF; any others?

- A weak parity learner that runs in 2^{O(n^{0.32})} time would beat the best algorithm known for learning DNF
- Good evidence that parity with noise is hard
- Hardness of parity with noise ⇒ efficient cryptosystems [Hopper-Blum, Blum-Furst et al., and many others]

Good evidence that parity with noise is hard?

Main Idea:

1. Take a learner that resists random noise (BKW).

2. Add randomness to its behavior until you get a weak agnostic learner.

Idea of weak agnostic parity learner

“Between two evils, I pick the one I haven’t tried before”– Mae West

“Between two evils, I pick uniformly at random”

– CS folklore

Problem: It is difficult but perhaps possible to design agnostic learning algorithms.

Proposed Solution: Agnostic Boosting.

Contributions:

- Right(er) definition for weak agnostic learner
- Agnostic boosting
- Learning parity with noise in the hardest noise model
- Entertaining STOC ’08 participants

- Find other applications for Agnostic Boosting
- Improve PwN algorithms.
- Get proper learner for parity with noise
- Reduce PwN with agnostic noise to PwN with random noise

- Get evidence that PwN is hard
- Prove that if parity with noise is easy then FACTORING is easy. $128 reward!

May the parity be with you!

The end.

Sketch of weak parity learner

- Sample labeled points from the distribution, and sample an unlabeled x; the goal is to guess f(x)

Bucket according to the last 2n/log n bits

[Figure: vectors in the same bucket are added (XORed), cancelling those bits; the resulting sums go to the next round.]

LAST ROUND:

- √n vectors with sum = 0: this gives a guess for f(x)
- By symmetry, the probability of a mistake equals the fraction of mistakes
- Claim (via Cauchy-Schwarz): the fraction of mistakes is at most the bound stated in the theorem above

(A toy code sketch of the bucketing-and-cancellation idea follows.)
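To make the bucketing step concrete, here is a toy noiseless sketch (my illustration; the paper's actual weak learner handles noise and uses different block sizes and parameters):

```python
import random

# Toy sketch of the bucketing idea (illustration only: noiseless, simplified).
# Vectors are n-bit integers; f is a parity, so labels combine linearly under XOR.

def weak_parity_guess(samples, x, n, block_size):
    """Guess f(x) from labeled samples (vec, label) by cancelling blocks of bits
    round by round; returns a bit, or None if it gets stuck."""
    target, guess = x, 0                      # invariant: f(x) = guess XOR f(target)
    for shift in range(0, n, block_size):
        mask = ((1 << block_size) - 1) << shift
        buckets = {}
        for vec, label in samples:
            buckets.setdefault(vec & mask, []).append((vec, label))
        # Cancel the current block of the target with a sample that matches it.
        if target & mask:
            match = buckets.get(target & mask)
            if not match:
                return None                   # no sample matches this block
            vec, label = random.choice(match)
            target, guess = target ^ vec, guess ^ label
        # Pair up samples inside each bucket; their XORs have this block zeroed.
        samples = [(v1 ^ v2, l1 ^ l2)
                   for group in buckets.values()
                   for (v1, l1), (v2, l2) in zip(group[::2], group[1::2])]
    return guess if target == 0 else None

# Tiny noiseless demo: f = parity of the bits selected by `secret`.
n, secret = 6, 0b000101
f = lambda v: bin(v & secret).count("1") % 2
samples = [(v, f(v)) for v in (random.randrange(1 << n) for _ in range(300))]
x = random.randrange(1 << n)
print(weak_parity_guess(samples, x, n, block_size=2), "vs true", f(x))
```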

Intuition behind two main parts

[Figure: reweighting — the weight of some examples is decreased and of others increased, depending on how the hypotheses found so far classify them.]

- Run, reweight, run, reweight, …; take the majority of the hypotheses (a generic sketch of this loop follows).
- Algorithmic & efficient Yao-von Neumann minimax principle.
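A generic sketch of that loop (my illustration; the paper's booster builds a branching program rather than a plain majority and uses a different reweighting rule):

```python
import random

# Generic "run, reweight, take a majority" loop (illustration only).

def boost(weak_learner, samples, rounds):
    """samples: list of (x, y); weak_learner(examples) -> hypothesis h: x -> {0, 1}."""
    weights = [1.0] * len(samples)
    hypotheses = []
    for _ in range(rounds):
        # Feed the weak learner examples drawn according to the current weights.
        reweighted = random.choices(samples, weights=weights, k=len(samples))
        h = weak_learner(reweighted)
        hypotheses.append(h)
        # Multiplicative reweighting: emphasize the examples h got wrong.
        for i, (x, y) in enumerate(samples):
            weights[i] *= 2.0 if h(x) != y else 0.5
    def majority(x):
        votes = sum(h(x) for h in hypotheses)
        return 1 if 2 * votes > len(hypotheses) else 0
    return majority
```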
