
Agnostic Learning of Conjunctions by Halfspaces is Hard





Presentation Transcript


  1. Agnostic Learning of Conjunctions by Halfspaces is Hard
Yi Wu (CMU)
Joint work with Vitaly Feldman (IBM), Venkat Guruswami (CMU), Prasad Raghavendra (MSR)

  2. Introduction

  3. Conjunctions (Monomials)
The Spam Problem: SPAM if "10 Million = yes" AND "Lottery = yes" AND "Pharmacy = yes".

  4. Decision Lists
The Spam Problem:
If "10 Million = No" then Not Spam
Else if "Lottery = No" then Not Spam
Else if "Pharmacy = No" then Not Spam
Else SPAM

  5. Halfspaces
The Spam Problem: SPAM if "10 Million = YES" + 2·"Lottery = YES" + "Pharmacy = YES" ≥ 4.

  6. Relationship
Conjunctions ⊆ Decision Lists ⊆ Halfspaces: every conjunction can be written as a decision list, and every decision list as a halfspace. The spam rule in all three forms is sketched below.
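As a concrete aside (not from the slides), here is the spam rule written in all three forms, with hypothetical Python feature names; a quick exhaustive check confirms the three representations agree on every input:

```python
# The same spam rule as a conjunction, a decision list, and a halfspace.
# Feature names and coefficients follow the slides; the code is illustrative.
def as_conjunction(million, lottery, pharmacy):
    return bool(million and lottery and pharmacy)

def as_decision_list(million, lottery, pharmacy):
    if not million:   return False   # "If 10 Million = No then Not Spam"
    if not lottery:   return False
    if not pharmacy:  return False
    return True                      # "Else SPAM"

def as_halfspace(million, lottery, pharmacy):
    return 1 * million + 2 * lottery + 1 * pharmacy >= 4

# All three agree on every 0/1 input:
for m in (0, 1):
    for l in (0, 1):
        for p in (0, 1):
            assert as_conjunction(m, l, p) == as_decision_list(m, l, p) \
                   == as_halfspace(m, l, p)
```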

  7. PAC Learning Model
Unknown distribution D over R^n; examples are labeled by an unknown function f.
After receiving examples, the algorithm does its computation and outputs a hypothesis h.
[Figure: points labeled + and −, with the target f and the hypothesis h drawn as separators.]
Accuracy of the hypothesis is Pr_{x~D}[h(x) = f(x)].

  8. Learning Conjunctions from random examples is easy!
Unknown distribution D over {0,1}^n; examples are labeled by an unknown conjunction.
Since a conjunction is a special halfspace, we can use polynomial-time linear programming to find a halfspace hypothesis consistent with all the examples (see the sketch below).
Well-known theory (VC dimension): for any D, a random sample of poly(n, 1/ε) examples yields a hypothesis of accuracy 1 − ε w.h.p.
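A minimal sketch of that LP step, assuming SciPy is available; the margin-1 feasibility formulation and the function name are my choices:

```python
import numpy as np
from scipy.optimize import linprog

def consistent_halfspace(X, y):
    """X: (m, n) numpy array of 0/1 examples; y: numpy labels in {-1, +1}.
    Returns (w, theta) with y_i * (w . x_i - theta) >= 1 for every example,
    or None if no such halfspace exists."""
    m, n = X.shape
    # Variables are [w_1, ..., w_n, theta].  Feasibility LP: minimize 0
    # subject to  -y_i * (w . x_i - theta) <= -1  for every example i.
    A_ub = -y[:, None] * np.hstack([X, -np.ones((m, 1))])
    b_ub = -np.ones(m)
    res = linprog(c=np.zeros(n + 1), A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * (n + 1), method="highs")
    if not res.success:
        return None              # infeasible: no consistent halfspace
    return res.x[:n], res.x[n]
```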

  9. Learning Conjunctions from perfectly labeled random examples is easy! …but not very realistic…
Real-world data probably doesn't come with a guarantee that examples are labeled perfectly according to a conjunction.
Linear programming is brittle: noisy examples can easily result in no consistent hypothesis (demonstrated below).
This motivates the study of noisy variants of PAC learning for conjunctions.
[Figure: labeled points as before, now with a few mislabeled ones.]
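Continuing the sketch above (reusing the hypothetical consistent_halfspace), a single contradictory example is already enough to make the LP infeasible:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 200, 10
X = rng.integers(0, 2, size=(m, n))
y = np.where(X[:, 0] & X[:, 1], 1, -1)   # labels from the conjunction x1 AND x2

# Clean data: some halfspace (e.g. 2*x1 + 2*x2 >= 3) is consistent.
assert consistent_halfspace(X, y) is not None

# One noisy example: the same point with the opposite label.
X_bad = np.vstack([X, X[:1]])
y_bad = np.append(y, -y[0])
assert consistent_halfspace(X_bad, y_bad) is None   # LP is now infeasible
```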

  10. This Talk: Learning Conjunctions with Agnostic Noise
Unknown distribution D over {0,1}^n; examples are labeled by an unknown conjunction f.
Among the random examples given to the learner:
• a 1 − ε fraction are perfectly labeled, i.e. x ~ D, y = f(x);
• an ε fraction are mislabeled.
Goal: find a hypothesis with good accuracy (close to 1 − ε? or just better than 50%?).
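A hedged sketch of this example oracle; uniform D and random label flips are simplifying assumptions of mine (in the agnostic model, the mislabeled ε fraction may be adversarial):

```python
import numpy as np

def agnostic_examples(m, n, f, eps, rng):
    """Draw m examples x ~ D (here: uniform over {0,1}^n) labeled by f
    with 0/1 labels, then mislabel roughly an eps fraction of them."""
    X = rng.integers(0, 2, size=(m, n))
    y = np.array([f(x) for x in X])
    flip = rng.random(m) < eps       # ~eps fraction gets mislabeled
    y[flip] ^= 1                     # flip those 0/1 labels
    return X, y

# Usage: 1% noise on labels from the conjunction x1 AND x2.
X, y = agnostic_examples(1000, 20, lambda x: int(x[0] & x[1]),
                         eps=0.01, rng=np.random.default_rng(0))
```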

  11. Related Work (Positive)
• No noise [Val84, Lit88, Hau88]: PAC learnable.
• Random noise [Kea98]: PAC learnable under the random classification noise model.

  12. Related Work (Negative)
[FGKP06]: For any ε, δ > 0, it is NP-hard to tell whether
• some conjunction is consistent with a 1 − ε fraction of the data, or
• no conjunction is (½ + δ)-consistent with the data.
In other words, it is NP-hard to find even a 51%-accurate conjunction, even knowing that some conjunction is consistent with 99% of the data.

  13. Proper vs. Non-Proper Learning
• Proper: given that f is in a function class C (e.g. conjunctions), the learner outputs a function in C.
• Non-proper: given that f is in a class C (e.g. conjunctions), the learner may output a function in a larger class D (e.g. halfspaces).

  14. Weakness of the Previous Result
• We might still be able to learn conjunctions by outputting a larger class of functions (say, by linear programming?).
• E.g. [Lit88] uses the Winnow algorithm, which outputs a halfspace.

  15. Other Related Work
[FGKP, GR]: For any ε, δ > 0, it is NP-hard to tell whether
• some halfspace is consistent with a 1 − ε fraction of the data, or
• no halfspace is (½ + δ)-consistent with the data.
In other words, it is NP-hard to find a 51%-accurate halfspace, even knowing that some halfspace is consistent with 99% of the data.

  16. Ideally, we want to show:
For any ε, δ > 0, it is NP-hard to tell whether
• some conjunction is consistent with a 1 − ε fraction of the data, or
• no function in any hypothesis class is (½ + δ)-consistent with the data.

  17. A Negative Result on Negative Results
• [ABX08]: showing NP-hardness of improper learning with an unrestricted hypothesis class via black-box reductions is itself hard:
• it would otherwise break long-standing cryptographic assumptions (yielding a transformation from any average-case-hard problem in NP into a one-way function).

  18. Our Results

  19. Main Result
For any ε, δ > 0, it is NP-hard to tell whether
• some conjunction is consistent with a 1 − ε fraction of the data, or
• no halfspace is (½ + δ)-consistent with the data.
In other words, it is NP-hard to find a 51%-accurate halfspace, even knowing that some conjunction is consistent with 99% of the data.

  20. Why Halfspaces?
In practice, halfspaces are at the heart of many learning algorithms:
• Perceptron
• Winnow
• SVM
• Logistic Regression
• Linear Discriminant Analysis
Computational learning theory: we cannot agnostically learn conjunctions using any of the above algorithms!

  21. Corollary
Weakly agnostic learning of Conjunctions/Decision Lists/Halfspaces by halfspaces is hard!
(Recall Conjunctions ⊆ Decision Lists ⊆ Halfspaces.)

  22. Proof

  23. Proof: Reduction from Label Cover

  24. Dictator Test
• "Dictator": a halfspace depending on very few variables, e.g. f(x) = sgn(x_1).
• "Majority": a halfspace in which no variable has too much weight, e.g. f(x) = sgn(x_1 + x_2 + x_3 + … + x_n).
(Both extremes are sketched in code below.)
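A minimal sketch of the two extremes, written as ±1-valued halfspaces for convenience (the function names are mine):

```python
import numpy as np

def dictator(x, i=0):
    return np.sign(x[i])     # depends only on the single coordinate i

def majority(x):
    return np.sign(x.sum())  # every coordinate gets equal, tiny weight
```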

  25. Dictator Testing for Halfspaces
The tester chooses x ∈ {0,1}^n and b ∈ {0,1} from some distribution, queries f(x) for a halfspace f : {0,1}^n → {0,1}, and accepts if f(x) = b.
Completeness ≥ c ⟺ every dictator (monomial) f(x) = x_i is accepted with probability ≥ c.
Soundness ≤ s ⟺ every "majority-like" function is accepted with probability ≤ s.
With such a test, we can show it is NP-hard to tell whether (i) some monomial satisfies a c fraction of the data, or (ii) no halfspace satisfies more than an s fraction of the data.

  26. How to generate (x, b)
• Generate z by setting each z_i independently to a random bit.
• Generate y by resetting each z_i to 0 with probability 0.99.
• Generate a random bit b and set x_i = y_i + b/(2n).
• Output (x, b); accept if f(x) = sgn(b).
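A sketch of this sampler; since the slides leave the bit convention ambiguous, I assume ±1-valued bits, so that "resetting to 0" erases a coordinate, and b ∈ {−1, +1}:

```python
import numpy as np

def sample_test_point(n, rng):
    z = rng.choice([-1, 1], size=n)            # each z_i an independent random bit
    y = np.where(rng.random(n) < 0.99, 0, z)   # reset each z_i to 0 w.p. 0.99
    b = rng.choice([-1, 1])                    # random label bit
    x = y + b / (2 * n)                        # tiny shift toward b
    return x, b                                # the test accepts if f(x) == b
```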

  27. How to generate (x, b)
[Diagram: after the resetting step y = (0, 0, 0, 0), so for a random bit b the output is x = (b/2n, b/2n, b/2n, b/2n).]

  28. Analysis of the Test
• Dictator f(x) = x_i: Pr[f(x) = b] ≥ Pr[y_i = 0] = 0.99.
• Majority f(x) = sgn(x_1 + … + x_n): Pr[f(x) = b] = Pr[sgn(N(0, 0.1) + b/2n) = b] < 0.51.
(A Monte Carlo sanity check of both estimates follows.)
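A quick Monte Carlo check of these two estimates, reusing the hypothetical sample_test_point, dictator, and majority sketches above (and their ±1 bit convention):

```python
import numpy as np

rng = np.random.default_rng(1)
n, trials = 10_000, 2_000
ok_dict = ok_maj = 0
for _ in range(trials):
    x, b = sample_test_point(n, rng)
    ok_dict += dictator(x) == b      # accepted when the test bit survives
    ok_maj  += majority(x) == b      # sign is dominated by the surviving y_i
print(ok_dict / trials, ok_maj / trials)  # roughly 0.99 vs. only slightly above 1/2
```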

  29. Conclusion
• We prove that even weakly agnostic learning of Conjunctions by Halfspaces is NP-hard.
• To obtain an efficient halfspace-based learning algorithm for conjunctions/decision lists/halfspaces, we must model either the distribution of the examples or the noise.

  30. Future Work
• Prove: for any ε, δ > 0, given a set of training examples such that some conjunction is consistent with a 1 − ε fraction of the data, it is NP-hard to find a degree-d polynomial threshold function (PTF) that is (½ + δ)-consistent with the data.
Why low-degree PTFs? Because such a hypothesis class can agnostically learn conjunctions/halfspaces under the uniform distribution.
