
Attribute-Efficient Learning of Monomials over Highly-Correlated Variables

This presentation develops algorithms for learning sparse monomial functions over highly correlated variables with low sample complexity and efficient runtime. It situates the problem relative to sparse linear regression (compressed sensing) and sparse polynomial learning, and presents an algorithm that achieves attribute-efficient learning: sample efficiency and runtime efficiency simultaneously.


Presentation Transcript


  1. Attribute-Efficient Learning of Monomials over Highly-Correlated Variables. Alexandr Andoni, Rishabh Dudeja, Daniel Hsu, Kiran Vodrahalli (Columbia University). Yahoo Research, Aug. 2019

  2. General Learning Problem. Given: samples $(x_1, y_1), \ldots, (x_n, y_n)$ with $y_i = f(x_i)$, drawn i.i.d. Assumption 1: $f$ is from a low-complexity class. Assumption 2: $x \sim D$ for some reasonable distribution $D$. Goal: Recover $f$ exactly.

  3.–5. A Natural Class? Given: $(x_i, y_i)$ drawn i.i.d. Assumption 1: $f$ depends on only $k$ of the $d$ features. Goal 1: Learn with low sample complexity. Goal 2: Learn computationally efficiently. If $f$ is linear, this is classical compressed sensing.

  6. Linear Functions: Compressed Sensing. Given: $(x_i, y_i)$ drawn i.i.d. with $y = \langle w, x \rangle$ and $w$ $k$-sparse. Goal 1: Learn with $O(k \log d)$ samples. Goal 2: Learn in $\mathrm{poly}(d)$ runtime.
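To make the compressed-sensing baseline concrete, here is a minimal sketch (not from the slides; the constants, the Lasso penalty, and the recovery threshold are illustrative assumptions) that recovers a $k$-sparse linear function from roughly $k \log d$ i.i.d. Gaussian samples via $\ell_1$-regularized regression:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
d, k = 1000, 5
n = 10 * k * int(np.log(d))              # ~ k log d samples (illustrative constant)

w_true = np.zeros(d)
support = rng.choice(d, size=k, replace=False)
w_true[support] = rng.normal(size=k)

X = rng.normal(size=(n, d))              # uncorrelated Gaussian features
y = X @ w_true                           # noiseless k-sparse linear labels

# l1-regularized regression (a proxy for basis pursuit) is sample- and runtime-efficient
w_hat = Lasso(alpha=1e-3, fit_intercept=False, max_iter=50_000).fit(X, y).coef_
S_hat = np.flatnonzero(np.abs(w_hat) > 1e-2)
print(np.array_equal(np.sort(S_hat), np.sort(support)))
```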

  7. A Natural Class? What if $f$ is nonlinear? Perhaps a polynomial… Goal 1: Learn with low sample complexity. Goal 2: Learn computationally efficiently.

  8. Sparse Polynomial Functions. Given: $(x_i, y_i)$ drawn i.i.d. Assumption 1: $f$ depends on only $k$ of the $d$ features. Goal 1: Learn with $\mathrm{poly}(k, \log d)$ samples. Goal 2: Learn in $\mathrm{poly}(d)$ runtime.

  9. Simplest Case: Sparse Monomials, a simple nonlinear function class. In 3 dimensions: e.g. $f(x) = x_1 x_3$. In $d$ dimensions: $f(x) = \prod_{j \in S} x_j^{\alpha_j}$ with $|S| = k$, i.e. the monomial is $k$-sparse.

  10. The Learning Problem. Given: $(x_i, y_i)$ with $y_i = f(x_i)$, drawn i.i.d. Assumption 1: $f$ is a $k$-sparse monomial function. Assumption 2: $x \sim \mathcal{N}(0, \Sigma)$, where the coordinates may be highly correlated. Goal: Recover $f$ exactly.
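A small sketch of this data-generating process (illustrative, not from the slides: the equicorrelated covariance and the multilinear monomial are assumptions made for the example):

```python
import numpy as np

rng = np.random.default_rng(1)
d, k, n, rho = 200, 3, 500, 0.9

# Highly correlated Gaussian features: equicorrelated covariance (assumed example)
Sigma = (1 - rho) * np.eye(d) + rho * np.ones((d, d))
X = rng.multivariate_normal(np.zeros(d), Sigma, size=n)

# k-sparse (multilinear) monomial label: y = prod_{j in S} x_j
S = rng.choice(d, size=k, replace=False)
y = np.prod(X[:, S], axis=1)
```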

  11. Attribute-Efficient Learning • Sample efficiency: $\mathrm{poly}(k, \log d)$ samples • Runtime efficiency: $\mathrm{poly}(d)$ ops • Goal: achieve both!

  12.–23. Motivation (built up over several slides)
  • Monomials generalize parity functions, for which no attribute-efficient algorithms are known! [Helmbold+ '92, Blum '98, Klivans & Servedio '06, Kalai+ '09, Kocaoglu+ '14, …]
  • This holds even in the noiseless setting.
  • Brute force over candidate supports is sample-efficient but its runtime is exponential in $k$; the runtime can be improved somewhat, and an improper learner trades extra samples for a better runtime, but even under average-case assumptions the known sample/runtime combinations fall short of attribute efficiency.
  • Sparse linear regression [Candes+ '04, Donoho+ '04, Bickel+ '09, …] is attribute-efficient, but only for linear $f$.
  • Sparse sums of monomials [Andoni+ '14]: the sample complexity and runtime depend on the maximum degree and the number of monomials; works for Gaussian and uniform distributions… BUT they must be product distributions, i.e. uncorrelated features ($\Sigma = I$)! Whitening blows up the complexity.
  • Question: what if the features are correlated ($\Sigma \neq I$)?

  24. Potential Degeneracy of $\Sigma$. Ex: a singular covariance matrix; $\Sigma$ can be low-rank!
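A concrete toy instance of this degeneracy (my own illustrative numbers): if $x_3 = x_1 + x_2$ exactly, then $\Sigma$ is singular and the 3-sparse vector $(1, 1, -1)$ lies in its null space.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000
x12 = rng.normal(size=(n, 2))
x3 = x12.sum(axis=1, keepdims=True)      # x3 = x1 + x2: perfectly correlated
X = np.hstack([x12, x3])

Sigma_hat = np.cov(X, rowvar=False)
print(np.linalg.eigvalsh(Sigma_hat))     # smallest eigenvalue ~ 0: Sigma is (numerically) rank 2
v = np.array([1.0, 1.0, -1.0])           # a 3-sparse null direction
print(v @ Sigma_hat @ v)                 # ~ 0
```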

  25. Rest of the Talk • Algorithm • Intuition • Analysis • Conclusion

  26. 1. Algorithm

  27. The Algorithm. Step 1: log-transform the Gaussian data, replacing each feature and label by the log of its absolute value, $(x_i, y_i) \mapsto (\log|x_i|, \log|y_i|)$ (coordinatewise). Step 2: run sparse linear regression on the log-transformed data (ex: basis pursuit); the recovered support identifies the monomial's features.
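A minimal end-to-end sketch of this two-step procedure, under my own assumptions (multilinear monomial, Lasso with a tiny penalty standing in for basis pursuit, an ad hoc 0.5 threshold for reading off integer exponents); the paper's actual procedure and parameter choices may differ:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(3)
d, k, n, rho = 200, 3, 400, 0.9

# Correlated Gaussian features and a k-sparse multilinear monomial label
Sigma = (1 - rho) * np.eye(d) + rho * np.ones((d, d))
X = rng.multivariate_normal(np.zeros(d), Sigma, size=n)
S = rng.choice(d, size=k, replace=False)
y = np.prod(X[:, S], axis=1)

# Step 1: log-transform turns the product into a sum: log|y| = sum_{j in S} log|x_j|
X_log = np.log(np.abs(X))
y_log = np.log(np.abs(y))

# Step 2: sparse linear regression on the transformed data
# (Lasso with a tiny penalty as a stand-in for basis pursuit)
w_hat = Lasso(alpha=1e-4, fit_intercept=False, max_iter=100_000).fit(X_log, y_log).coef_

S_hat = np.flatnonzero(np.abs(w_hat) > 0.5)   # exponents are integers, so threshold at 1/2
print(np.array_equal(np.sort(S_hat), np.sort(S)))
```

In practice, the Lasso call would be replaced by an actual basis-pursuit solver ($\ell_1$ minimization subject to the equality constraint $X_{\log} w = y_{\log}$), with the recovered coefficients rounded to integer exponents.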

  28. 2. Intuition

  29. Why is our Algorithm Attribute-Efficient? • Runtime: basis pursuit is a linear program, so it is efficient • Sample complexity? • Sparse linear regression usually needs only about $k \log d$ samples… • But: the standard sparse-recovery properties may not hold here…
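For reference, basis pursuit on the transformed data is the convex program below; its reformulation as a linear program (a standard fact, stated in my notation) is what makes the runtime polynomial.
$$ \min_{w \in \mathbb{R}^d} \|w\|_1 \ \ \text{s.t.}\ \ X_{\log} w = y_{\log} \qquad\Longleftrightarrow\qquad \min_{w,\,u \in \mathbb{R}^d} \sum_{j=1}^d u_j \ \ \text{s.t.}\ \ X_{\log} w = y_{\log},\ \ -u \le w \le u . $$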

  30. Degenerate High Correlation. Recall the example: zero-eigenvectors of $\Sigma$ can themselves be sparse (e.g., 3-sparse), so the standard sparse-recovery conditions are false!

  31. Summary of Challenges • Highly correlated features • Nonlinearity of $f$ • Need a recovery condition…

  32. The Log-Transform Affects the Data Covariance. Spectral view: the transform 'inflates the balloon', destroying the degenerate correlation structure.
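One way to see this numerically (an illustrative experiment of mine, not from the slides): compare the smallest eigenvalue of the sample covariance before and after the log-transform for a nearly degenerate equicorrelated Gaussian design.

```python
import numpy as np

rng = np.random.default_rng(4)
d, n, rho = 50, 20_000, 0.99

# Nearly degenerate covariance: smallest population eigenvalue is 1 - rho = 0.01
Sigma = (1 - rho) * np.eye(d) + rho * np.ones((d, d))
X = rng.multivariate_normal(np.zeros(d), Sigma, size=n)

lam_raw = np.linalg.eigvalsh(np.cov(X, rowvar=False))[0]
lam_log = np.linalg.eigvalsh(np.cov(np.log(np.abs(X)), rowvar=False))[0]
print(lam_raw, lam_log)   # the log-transformed covariance is typically far better conditioned
```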

  33. 3. Analysis

  34.–35. Restricted Eigenvalue Condition [Bickel, Ritov, & Tsybakov '09]. The restricted eigenvalue is the minimum of the quadratic form $v^\top \hat{\Sigma} v / \|v\|_2^2$ over a cone of approximately sparse vectors (the 'cone restriction'); it is a form of restricted strong convexity. Note: a positive restricted eigenvalue for the transformed design is sufficient to prove exact recovery for basis pursuit!
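One standard way to write the condition (notation mine; the slides' exact cone constant may differ): for a support $S$ with $|S| \le k$ and design matrix $X$,
$$ \kappa(S) \;=\; \min_{\substack{v \neq 0 \\ \|v_{S^c}\|_1 \le \|v_S\|_1}} \frac{v^\top \big(\tfrac{1}{n} X^\top X\big) v}{\|v\|_2^2} \;>\; 0 . $$
The cone $\{\|v_{S^c}\|_1 \le \|v_S\|_1\}$ contains every direction in which a basis-pursuit error can lie, so $\kappa(S) > 0$ forces the error to be zero in the noiseless setting.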

  36.–37. Sample Complexity Analysis, in three steps. (1) Lower-bound the population restricted eigenvalue of the log-transformed covariance. (2) Show that the empirical restricted eigenvalue concentrates around the population value with high probability. (3) Conclude exact recovery for basis pursuit with high probability, which yields the sample complexity bound.
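A schematic version of the chain (my notation: $\Sigma_{\log}$ is the population covariance of the log-transformed features, $\hat{\Sigma}_{\log}$ its empirical counterpart, and $\gamma, \varepsilon$ are placeholder quantities):
$$ \lambda_{\mathrm{RE}}(\Sigma_{\log}) \ge \gamma \quad\text{and}\quad \max_{j,l} \big|\hat{\Sigma}_{\log,jl} - \Sigma_{\log,jl}\big| \le \varepsilon \;\;\Longrightarrow\;\; v^\top \hat{\Sigma}_{\log} v \;\ge\; \gamma \|v\|_2^2 - \varepsilon \|v\|_1^2 \;\ge\; (\gamma - 4k\varepsilon)\, \|v\|_2^2 $$
for every $v$ in the cone (where $\|v\|_1 \le 2\|v_S\|_1 \le 2\sqrt{k}\,\|v\|_2$), so once $\varepsilon < \gamma / (4k)$ the empirical restricted eigenvalue is positive and basis pursuit recovers exactly.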

  38.–46. Population Minimum Eigenvalue (built up over several slides)
  • Hermite-expand the covariance of the log-transformed features; the off-diagonal entries decay fast!
  • The Hermite coefficients are computed via integration by parts, recursive properties of the Hermite polynomials, and Stirling's approximation; the odd-order coefficients vanish.
  • The expansion expresses the transformed covariance through elementwise (Hadamard) matrix products; the relevant minimum-eigenvalue functional of such a product is superadditive when the other factor is symmetric PSD with ones on the diagonal [Bapat & Sunder '85].
  • Truncate the expansion at a suitably chosen order and apply the Gershgorin circle theorem to obtain the full (admittedly ugly) lower bound on the population minimum eigenvalue.
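For reference, the Gershgorin step uses the standard bound (a textbook fact, stated in my notation) that for a symmetric matrix $A$,
$$ \lambda_{\min}(A) \;\ge\; \min_i \Big( A_{ii} - \sum_{j \ne i} |A_{ij}| \Big), $$
so a diagonal that dominates the (fast-decaying) off-diagonal row sums yields a positive lower bound on the minimum eigenvalue.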

  47. Concentration of the Restricted Eigenvalue • Follows from Hölder's inequality and the restricted-cone condition • The log-transformed variables are sub-exponential • So the elementwise error of the empirical covariance concentrates [Kuchibhotla & Chakrabortty '18]

  48. Concentration of the Elementwise Error. With high probability, the maximum entrywise deviation of the empirical log-transformed covariance from its population counterpart is small (a sub-exponential tail bound) [Kuchibhotla & Chakrabortty '18].

  49. 4. Conclusion

  50. Recap • An attribute-efficient algorithm for learning monomials • Prior (nonlinear) work required uncorrelated features • This work allows highly correlated features • Works beyond multilinear monomials • A blessing of nonlinearity
