
Hardness of Learning Halfspaces with Noise




Presentation Transcript


  1. Hardness of Learning Halfspaces with Noise. Prasad Raghavendra (Advisor: Venkatesan Guruswami)

  2. Spam Problem. A perceptron for spam filtering: each feature of a message gets a weight, here (2, 3, 3, 1, 7). For a message with feature vector (1, 1, 0, 1, 0), the weighted sum is 2·1 + 3·1 + 3·0 + 1·1 + 7·0 = 6. Since 6 > 3, the threshold, the perceptron outputs SPAM.
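A minimal sketch of the slide's perceptron in Python (the weights, features, and threshold are the ones pictured; the function name is illustrative):

```python
# Perceptron spam filter from the slide: weights (2, 3, 3, 1, 7), threshold 3.
def perceptron_classify(weights, threshold, features):
    """Output SPAM iff the weighted sum of the features meets the threshold."""
    total = sum(w * x for w, x in zip(weights, features))
    return "SPAM" if total >= threshold else "NOT SPAM"

weights = [2, 3, 3, 1, 7]
features = [1, 1, 0, 1, 0]   # which spam indicators fire in this message
print(perceptron_classify(weights, 3, features))  # 2+3+0+1+0 = 6 > 3 -> SPAM
```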

  3. Halfspace Learning Problem. Input: training samples. Vectors: W1, W2, …, Wm ∈ {-1,1}^n. Labels: l1, l2, …, lm ∈ {-1,1} (SPAM / NOT SPAM). Output: a separating halfspace (A, θ), where θ is the threshold, such that A·Wi < θ if li = -1 and A·Wi ≥ θ if li = 1. [Figure: labeled points in the plane, SPAM (+) above a separating line, NOT SPAM (-) below.]

  4. Perspective. • Perceptron classifiers are the simplest neural networks and are widely used for classification. • Perceptron learning algorithms can learn if the data is perfectly separable. [Figure: labeled + and - examples for SPAM vs. NOT SPAM.]

  5. Inseparability. Who said halfspaces can classify SPAM vs. NOT SPAM? • The data may be inherently inseparable: this is the setting of agnostic learning. • Even if the data is separable, what about noise, which is inherent in many forms of data? This is the setting of PAC learning with noise.

  6. In Presence of Noise. Agreement: the fraction of the examples classified correctly. A halfspace that classifies 16 of the 20 examples correctly has agreement 0.8, or 80%. Halfspace Maximum Agreement (HSMA) Problem: 'Find the hyperplane that maximizes the agreement with the training examples.' [Figure: a line through noisy labeled points, misclassifying a few + and - examples.]
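For concreteness, a small sketch of the agreement computation defined above (names are illustrative):

```python
# Agreement of a halfspace (A, theta) with labeled samples: the fraction of
# examples the halfspace classifies correctly, per the definition above.
def agreement(A, theta, samples, labels):
    correct = 0
    for w, l in zip(samples, labels):
        dot = sum(a * x for a, x in zip(A, w))
        predicted = 1 if dot >= theta else -1
        correct += int(predicted == l)
    return correct / len(samples)

# A halfspace classifying 16 of 20 examples correctly has agreement 0.8 (80%).
```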

  7. Related Work: Positive Results. Random Classification Noise (each label flipped with probability less than 1/2): • [Blum-Frieze-Kannan-Vempala 96]: a PAC learning algorithm that outputs a decision list of halfspaces. • [Cohen 97]: a proper learning algorithm (one that outputs a halfspace) for learning halfspaces. Assumptions on the distribution of examples: • [Kalai-Klivans-Mansour-Servedio 05]: an algorithm that finds a close-to-optimal halfspace when the examples come from the uniform or any log-concave distribution.

  8. Related Work: Negative Results. • [Amaldi-Kann 98, Ben-David-Eiron-Long 92]: HSMA is NP-hard to approximate within some constant factor (261/262, 415/418). • [Bshouty-Burroughs 02]: HSMA is NP-hard to approximate better than 84/85. • [Arora-Babai-Stern-Sweedyk 97, Amaldi-Kann 98]: NP-hard to minimize disagreements within a factor of 2^{O(log n)}.

  9. Open Problem. Given that 99.9% of the examples are correct: no algorithm was known that finds a halfspace with agreement 51%, and no hardness result ruled out getting an agreement of 99%. • Closing this gap was stated as an open problem by [Blum-Frieze-Kannan-Vempala 96]. • Highlighted in recent work by [Feldman 06] on the tight (1-ε, 1/2+δ) hardness of learning monomials.

  10. Our Result. For any ε, δ > 0, given a set of training examples, it is NP-hard to distinguish between the following two cases: • There is a halfspace with agreement 1-ε. • No halfspace has agreement greater than 1/2 + δ. Even with 99.9% of the examples non-noisy, the best we can do is output a random/trivial halfspace!

  11. Remarks. • [Feldman-Gopalan-Khot-Ponnuswami 06] independently showed a similar result. • Our hardness result holds even for boolean examples in {-1,1}^n (their result holds for R^n). • [Feldman et al.]'s hardness result gives stronger hardness in the sub-constant regime. • We also show: given a system of linear equations over the integers that is (1-ε)-satisfiable, it is NP-hard to find an assignment that satisfies more than a δ fraction of the equations.

  12. Linear Inequalities. Unknowns: A = (a1, a2, a3, a4) and θ. Let the halfspace be a1x1 + a2x2 + … + anxn ≥ θ. Suppose W1 = (-1, 1, -1, 1) and l1 = 1. Constraint: a1(-1) + a2(1) + a3(-1) + a4(1) ≥ θ. Each training example yields one such inequality, giving a system like:
  a1 + a2 + a3 + a4 ≥ θ
  a1 + a2 + a3 - a4 < θ
  a1 + a2 - a3 + a4 < θ
  a1 + a2 - a3 + a4 ≥ θ
  a1 - a2 + a3 - a4 ≥ θ
  a1 - a2 + a3 + a4 < θ
  a1 + a2 - a3 - a4 < θ
  Learning a halfspace = solving a system of linear inequalities.
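One way to see this equivalence concretely is to hand the system to an LP solver. A hedged sketch, assuming numpy/scipy are available; replacing the strict inequality with a small margin is an illustration choice, not part of the talk:

```python
# Noiseless halfspace learning as LP feasibility over variables (a_1..a_n, theta).
import numpy as np
from scipy.optimize import linprog

def find_halfspace(W, labels, margin=1e-6):
    n = W.shape[1]
    A_ub, b_ub = [], []
    for w, l in zip(W, labels):
        if l == 1:    # a.w >= theta   <=>   -a.w + theta <= 0
            A_ub.append(np.append(-w, 1.0)); b_ub.append(0.0)
        else:         # a.w <  theta   ~>     a.w - theta <= -margin
            A_ub.append(np.append(w, -1.0)); b_ub.append(-margin)
    c = np.zeros(n + 1)                      # pure feasibility: zero objective
    res = linprog(c, A_ub=np.array(A_ub), b_ub=b_ub,
                  bounds=[(None, None)] * (n + 1))
    return (res.x[:n], res.x[n]) if res.success else None
```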

  13. Label Cover Problem. U, V: sets of vertices. E: set of edges. {1, 2, …, R}: set of labels. π_e: a constraint on edge e. An assignment A satisfies an edge e = (u, v) ∈ E if π_e(A(u)) = A(v). [Figure: a bipartite graph between U and V, each vertex with label set {1, …, R}; an edge e = (u, v) with π_e(3) = 2.] Goal: find an assignment A that satisfies the maximum number of edges.
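A toy sketch of the objective (the instance below is hypothetical, chosen to match the figure's π_e(3) = 2):

```python
# Fraction of Label Cover edges satisfied by an assignment. Edge e = (u, v)
# carries a projection pi_e; it is satisfied iff pi_e(A(u)) == A(v).
def label_cover_value(edges, projections, assignment):
    satisfied = sum(1 for e, (u, v) in enumerate(edges)
                    if projections[e].get(assignment[u]) == assignment[v])
    return satisfied / len(edges)

edges = [("u", "v")]
projections = [{3: 2}]     # pi_e as a dict from u-labels to v-labels
print(label_cover_value(edges, projections, {"u": 3, "v": 2}))  # 1.0
```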

  14. Hardness of Label Cover [Raz 98]. There exists γ > 0 such that, given a Label Cover instance Γ = (U, V, E, R, π), it is NP-hard to distinguish between: • Γ is completely satisfiable. • No assignment satisfies more than a 1/R^γ fraction of the edges.

  15. Aim. Reduce Label Cover to a system of homogeneous linear inequalities with +1, -1 coefficients (over variables a1, a2, a3, a4, θ, with inequalities such as a1 + a2 + a3 + a4 ≥ θ and a1 + a2 - a3 + a4 < θ), so that the SATISFIABLE vs. 1/R^γ-SATISFIABLE gap of Label Cover carries over to the system.

  16. Variables. For each vertex u, introduce R variables u1, u2, …, uR. Intended solution: if u is assigned label k, then uk = 1 and uj = 0 for all j ≠ k.

  17. Equation Tuples. An EQUATION TUPLE consists of:
  • For all u: u1 + u2 + … + uR = 1, and for all u, v: (u1 + u2 + … + uR) - (v1 + v2 + … + vR) = 0. (All vertices are assigned exactly one label.)
  • For each constraint π_e and all 1 ≤ k ≤ R: Σ ui = vk, summing over all i with π_e(i) = k. (For example: u1 - v1 = 0, u2 + u3 - v2 = 0.)
  • Pick t variables ui at random and require ui = 0. (Over all random choices: most of the variables are zero.)
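A loose sketch of the equations one tuple contributes for a single edge (u, v); exactly which equations are bundled into one tuple is simplified here, and the ({variable: coefficient}, right-hand side) representation is an illustration choice:

```python
import random

def tuple_equations(pi, R, t):
    """Equations for one edge (u, v) with projection pi: {1..R} -> {1..R}."""
    eqs = []
    # u gets total label-mass 1:  u_1 + ... + u_R = 1
    eqs.append(({("u", i): 1 for i in range(1, R + 1)}, 1))
    # consistency:  (u_1 + ... + u_R) - (v_1 + ... + v_R) = 0
    eq = {("u", i): 1 for i in range(1, R + 1)}
    eq.update({("v", j): -1 for j in range(1, R + 1)})
    eqs.append((eq, 0))
    # projection constraints: for each k, sum over {i : pi(i) = k} of u_i = v_k
    for k in range(1, R + 1):
        eq = {("u", i): 1 for i in range(1, R + 1) if pi[i] == k}
        eq[("v", k)] = -1
        eqs.append((eq, 0))
    # sparsity checks: t randomly chosen u-variables are forced to 0
    for i in random.sample(range(1, R + 1), t):
        eqs.append(({("u", i): 1}, 0))
    return eqs
```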

  18. Equation Tuples. SATISFIABLE case: there is an assignment that satisfies most of the equation tuples. 1/R^γ-SATISFIABLE case: equations are violated by a margin measured against the Scaling Factor u1 + u2 + … + uR; e.g., if u2 + u3 - v2 = 0 is an equation, then |u2 + u3 - v2| > ε(u1 + u2 + … + uR).

  19. Next Step. Goal: each variable appears exactly once in a tuple, with coefficient +1 or -1. In the tuple
  u1 - v1 = 0
  u2 + u3 - v2 = 0
  u1 + u2 + u3 - v1 - v2 - v3 = 0
  u1 = 0
  u3 + v1 - v2 = 0
  variables repeat across equations. Fix: • introduce several copies of the variables, and • add consistency checks between the different copies of the same variable. This also amplifies one unsatisfied equation into, for most tuples, C equations that are not even approximately satisfied.

  20. Recap. Each variable appears exactly once in a tuple, with coefficient +1 or -1. SATISFIABLE case: most tuples are completely satisfied. 1/R^γ-SATISFIABLE case: most tuples have C equations that are not even approximately satisfied. Remaining task: using linear inequalities, distinguish between a tuple that is • completely satisfied, and one where • at least C of its equations are not even approximately satisfied.

  21. Observation. For B > 0: |A| < B ⟺ (A - B < 0) and (A + B ≥ 0). Pick one of the equation tuples at random:
  u1 - v1 = 0
  u4 + u5 - v2 = 0
  u6 + u2 + u7 - v4 - v5 - v6 = 0
  u3 = 0
  u8 + v3 - v7 = 0
  Multiply the equations by random signs (here +1, +1, -1, +1, -1) and add; since each variable appears exactly once, the sum has ±1 coefficients:
  u1 - u2 + u3 + u4 + u5 - u6 - u7 - u8 - v1 - v2 - v3 + v4 + v5 + v6 + v7.
  With Scaling Factor u1 + u2 + … + uR, output the two inequalities:
  u1 - u2 + u3 + u4 + u5 - u6 - u7 - u8 - v1 - v2 - v3 + v4 + v5 + v6 + v7 - (u1 + u2 + … + uR) < 0
  u1 - u2 + u3 + u4 + u5 - u6 - u7 - u8 - v1 - v2 - v3 + v4 + v5 + v6 + v7 + (u1 + u2 + … + uR) ≥ 0
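A sketch of this step (the tuple's left-hand sides are evaluated under some assignment; names are illustrative):

```python
import random

def two_inequalities(lhs_values, scale):
    """For B = scale > 0, |A| < B  <=>  A - B < 0 and A + B >= 0, where A is
    a random +/-1 combination of the tuple's left-hand sides."""
    signs = [random.choice((-1, 1)) for _ in lhs_values]
    A = sum(s * val for s, val in zip(signs, lhs_values))
    return (A - scale < 0, A + scale >= 0)

# If every equation holds (all left-hand sides are 0) and scale = 1, both
# checks pass; this is the Good Case of the next slide.
```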

  22. Good Case. With high probability over the choice of tuples, the assignment satisfies every equation in the tuple: u1 - v1 = 0, u2 + u3 - v2 = 0, u1 + u2 + u3 - v1 - v2 - v3 = 0, u1 = 0, u3 + v1 - v2 = 0, so the random ±1 combination also sums to zero:
  u1 - u2 + u3 + u4 + u5 - u6 - u7 - u8 - v1 - v2 - v3 + v4 + v5 + v6 + v7 = 0.
  The assignment also satisfies u1 + u2 + … + uR = 1. Hence
  u1 - u2 + u3 + u4 + u5 - u6 - u7 - u8 - v1 - v2 - v3 + v4 + v5 + v6 + v7 - (u1 + u2 + … + uR) < 0 and
  u1 - u2 + u3 + u4 + u5 - u6 - u7 - u8 - v1 - v2 - v3 + v4 + v5 + v6 + v7 + (u1 + u2 + … + uR) ≥ 0:
  BOTH INEQUALITIES SATISFIED.

  23. Bad Case. With high probability over the choice of equation tuple, many equations are violated by a margin:
  |u1 - v1| > ε(u1 + u2 + … + uR)
  |u2 + u3 - v2| > ε(u1 + u2 + … + uR)
  |u1 + u2 + u3 - v1 - v2 - v3| > ε(u1 + u2 + … + uR)
  For large enough C, with high probability over the choice of the +1, -1 combination,
  |u1 - u2 + u3 + u4 + u5 - u6 - u7 - u8 - v1 - v2 - v3 + v4 + v5 + v6 + v7| > (u1 + u2 + … + uR),
  so AT MOST ONE OF THE INEQUALITIES is SATISFIED.

  24. Interesting Set of Vectors. The set of all possible {-1,1} combinations is exponentially large. Instead, construct a polynomial-size subset S of {-1,1}^n such that for any vector v = (v1, v2, …, vn) with sufficiently many large coordinates (> ε), at least a 1-δ fraction of the vectors u ∈ S satisfy |u·v| > 1. The construction uses a 4-wise independent family and a random grouping of coordinates (see the sketch below).
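A hedged sketch of one standard way to get a 4-wise independent ±1 family (the talk's exact construction may differ): evaluate a random degree-3 polynomial over Z_p at the n coordinates and map the "high half" of Z_p to +1. The values of a random degree-3 polynomial at any 4 distinct points are uniform and independent; the ±1 mapping is unbiased only up to O(1/p), so this is an approximation:

```python
import random

def four_wise_sign_vector(n, p=10007):
    """One member of a ~4-wise independent +/-1 family; p a prime >= n."""
    a = [random.randrange(p) for _ in range(4)]          # random degree-3 poly
    h = lambda x: (a[0] + a[1]*x + a[2]*x*x + a[3]*x**3) % p
    return [1 if h(i) < p // 2 else -1 for i in range(n)]

# The whole family has p^4 members -- polynomial in n when p = O(n) -- versus
# the 2^n sign patterns it replaces.
```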

  25. Construction. [Figure: a ±1 sign vector dotted with v = (V1, …, V7), several of whose coordinates exceed ε, giving -V1 + V2 - V3 + V4 - V5 + V6 + V7 > 1.] With a four-wise independent family, |u·v| > 1 happens with some constant probability; over all 2^n combinations, it happens with probability close to 1.

  26. 4-wise Independent Set Construction. [Figure: the coordinates V1, …, V103 are randomly partitioned into groups; each group is filled with a ±1 pattern S1, S2, …, S10 drawn from a 4-wise independent family.] By the independence of the grouping, each group succeeds with constant probability; by Chernoff bounds over the groups, the concatenated vector achieves |u·v| > 1 with probability close to 1.

  27. Conclusion. • Either an assumption on the distribution of examples or on the noise is necessary for efficient halfspace learning algorithms. • [Raghavendra-Venkatesan]: a similar hardness result for learning support vector machines in the presence of adversarial noise.

  28. THANK YOU

  29. Details. • All possible {-1,1} combinations form an exponentially large set: handled by the construction using a 4-wise independent family and a random grouping of coordinates. • No variable should occur more than once in an equation tuple, to ensure that the final inequalities all have coefficients in {-1,1}: handled by using different copies of the variables for different equations, and a careful choice of consistency checks.

  30. Interesting Set of Vectors. All possible {-1,1} combinations form an exponentially large set. Construct a polynomial-size subset S of {-1,1}^n such that for any vector v = (v1, v2, …, vn) with sufficiently many large coordinates (> ε), at most a δ fraction of the vectors u ∈ S satisfy |u·v| < 1. Construction using a 4-wise independent family and a random grouping of coordinates.

  31. Equation Tuple ε-Satisfaction. An assignment A is said to ε-satisfy an equation tuple E if it approximately satisfies every equation in the tuple. For the tuple u1 - v1 = 0, u2 + u3 - v2 = 0, u1 + u2 + u3 - v1 - v2 - v3 = 0, u1 = 0, u3 + v1 - v2 = 0, the requirement for the second equation is |u2 + u3 - v2| < ε(u1 + u2 + … + uR).
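A sketch of this check, reusing the ({variable: coefficient}, rhs) representation from the equation-tuple sketch above (the form of the scaling factor follows slide 18):

```python
def eps_satisfies(equations, assignment, scale_vars, eps):
    """True iff every equation's left-hand side is below eps times the
    scaling factor u_1 + ... + u_R (the variables listed in scale_vars)."""
    scale = sum(assignment[v] for v in scale_vars)
    for coeffs, rhs in equations:
        lhs = sum(c * assignment[v] for v, c in coeffs.items()) - rhs
        if abs(lhs) >= eps * scale:
            return False
    return True
```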
