
Smooth ε -Insensitive Regression by Loss Symmetrization


Presentation Transcript


  1. Smooth ε-Insensitive Regression by Loss Symmetrization Ofer Dekel, Shai Shalev-Shwartz, Yoram Singer School of Computer Science and Engineering The Hebrew University {oferd,shais,singer}@cs.huji.ac.il COLT 2003: The Sixteenth Annual Conference on Learning Theory

  2. Before We Begin … Linear Regression: given a training set {(x_i, y_i)}, i = 1,…,m, with x_i ∈ R^n and y_i ∈ R, find w ∈ R^n such that w·x_i ≈ y_i. Least Squares: minimize Σ_i (w·x_i − y_i)². Support Vector Regression: minimize ||w||² s.t. |w·x_i − y_i| ≤ ε for all i.
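Written out for reference (the slide showed these as images; the formulations below are the standard ones, using the notation of the slide):

```latex
% Least Squares: minimize the sum of squared residuals
\min_{w \in \mathbb{R}^n} \; \sum_{i=1}^{m} (w \cdot x_i - y_i)^2

% Support Vector Regression (hard epsilon-tube form; the soft-margin
% variant adds slack variables to each constraint)
\min_{w \in \mathbb{R}^n} \; \tfrac{1}{2}\|w\|^2
\quad \text{s.t.} \quad |w \cdot x_i - y_i| \le \epsilon, \qquad i = 1,\dots,m
```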

  3. Loss Symmetrization Loss functions used in classification boosting: the exp-loss and the log-loss. Symmetric versions of these losses can be used for regression, by applying the loss to both the positive and the negative discrepancy between the prediction and the target (see the reconstruction below).
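A reconstruction of the losses the slide refers to. The classification losses are standard; the placement of ε in the symmetric log-loss is my reading of the paper's ε-insensitive construction, with δ = w·x − y denoting the discrepancy:

```latex
% Margin-based classification losses on an example (x, y), y \in \{-1,+1\}:
\mathrm{ExpLoss}(w; x, y) = e^{-y (w \cdot x)}
\qquad
\mathrm{LogLoss}(w; x, y) = \log\bigl(1 + e^{-y (w \cdot x)}\bigr)

% Symmetrized versions for regression, on the discrepancy \delta = w \cdot x - y:
\mathrm{SymExpLoss}(\delta) = e^{\delta} + e^{-\delta}
\qquad
\mathrm{SymLogLoss}_{\epsilon}(\delta) =
  \log\bigl(1 + e^{\delta - \epsilon}\bigr) + \log\bigl(1 + e^{-\delta - \epsilon}\bigr)
```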

  4. A General Reduction • Begin with a regression training set {(x_i, y_i)} where x_i ∈ R^n, y_i ∈ R • Generate 2m classification training examples of dimension n+1 • Learn an (n+1)-dimensional weight vector, while keeping its last coordinate fixed, by minimizing a margin-based classification loss (a sketch of one such construction follows)
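A minimal sketch of one way to realize this reduction, assuming the symmetric ε-insensitive log-loss above. The exact signs and the way ε is folded into the augmented examples may differ from the paper's construction, and the helper name make_classification_set is mine:

```python
import numpy as np

def make_classification_set(X, y, eps):
    """Turn m regression examples (x_i, y_i), x_i in R^n, into 2m
    classification examples of dimension n+1 (a sketch; the signs and the
    placement of eps are one plausible convention, not the paper's verbatim).

    With the augmented weight vector omega = (w, 1), whose last coordinate
    is held fixed at 1, the classification log-loss on the generated
    examples equals the symmetric eps-insensitive log-loss on the
    original regression data:
        label -1 on (x_i, -(y_i + eps))  ->  log(1 + exp(w.x_i - y_i - eps))
        label +1 on (x_i, -(y_i - eps))  ->  log(1 + exp(-(w.x_i - y_i) - eps))
    """
    m, n = X.shape
    Z_neg = np.hstack([X, -(y + eps).reshape(-1, 1)])   # labeled -1
    Z_pos = np.hstack([X, -(y - eps).reshape(-1, 1)])   # labeled +1
    Z = np.vstack([Z_neg, Z_pos])                        # shape (2m, n+1)
    labels = np.concatenate([-np.ones(m), np.ones(m)])
    return Z, labels
```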

  5. A Batch Algorithm An illustration of a single batch iteration. Simplifying assumptions (just for the demo): • instances lie in a simple, low-dimensional space • ε is fixed to a small constant • the symmetric log-loss is used

  6. A Batch Algorithm Calculate discrepancies and weights for every example. (Figure: illustration of the per-example discrepancies and the corresponding weights.)

  7. A Batch Algorithm Cumulative weights: (Figure: illustration of the cumulative weights.)

  8. Two Batch Algorithms Update the regressor, using either the Log-Additive update or the Additive update; a sketch of a full iteration follows. (Figure: the regressor after the update.)
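A sketch of one batch iteration in the spirit of slides 5-8, written as a plain gradient-style additive update on the symmetric ε-insensitive log-loss. This is an assumption-laden simplification: the paper's Additive and Log-Additive updates aggregate per-feature cumulative weights and use boosting-style step sizes that this sketch does not reproduce.

```python
import numpy as np

def batch_iteration(w, X, y, eps, eta=0.1):
    """One additive (gradient-style) update on the symmetric
    eps-insensitive log-loss  sum_i log(1+e^{d_i-eps}) + log(1+e^{-d_i-eps}),
    where d_i = w.x_i - y_i.  The step size eta and the plain gradient step
    are illustrative only."""
    d = X @ w - y                               # discrepancies, one per example
    q_plus = 1.0 / (1.0 + np.exp(eps - d))      # sigma(d - eps): pulls w.x_i down
    q_minus = 1.0 / (1.0 + np.exp(eps + d))     # sigma(-d - eps): pulls w.x_i up
    grad = X.T @ (q_plus - q_minus)             # gradient of the loss w.r.t. w
    return w - eta * grad                       # additive update
```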

  9. Progress Bounds Theorem (Log-Additive update): a lower bound on the decrease in loss achieved by one update. Theorem (Additive update): an analogous lower bound for the additive update. Lemma: both bounds are non-negative and equal zero only at the optimum.

  10. Boosting Regularization A new form of regularization for regression and classification boosting.* It can be implemented by adding pseudo-examples to the training set. (* Communicated by Rob Schapire)

  11. Regularization Contd. • Regularization ⇒ compactness of the feasible set for the weight vector • Regularization ⇒ a unique attainable optimizer of the loss function. Proof of Convergence: progress + compactness + uniqueness = asymptotic convergence to the optimum

  12. Exp-loss vs. Log-loss • Experiments on two synthetic datasets. (Figures: results with the log-loss and with the exp-loss on each dataset.)

  13. Extensions • Parallel vs. sequential updates (a schematic contrast is sketched below) • Parallel – update all elements of the weight vector in parallel • Sequential – update the weight of a single weak regressor on each round (like classic boosting) • Another loss function – the “Combined Loss”. (Figure: the log-loss, exp-loss and combined-loss curves.)
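To make the parallel/sequential distinction concrete, here is a schematic sketch (mine, not the paper's pseudo-code), reusing the gradient-style update from the batch sketch above; the greedy choice of which single coordinate to update in the sequential variant is an assumption:

```python
import numpy as np

def parallel_round(w, X, y, eps, eta=0.1):
    """Parallel variant: update every coordinate of w on each round."""
    d = X @ w - y
    grad = X.T @ (1/(1 + np.exp(eps - d)) - 1/(1 + np.exp(eps + d)))
    return w - eta * grad

def sequential_round(w, X, y, eps, eta=0.1):
    """Sequential, boosting-like variant: update a single coordinate
    (one 'weak regressor') per round, chosen greedily by gradient magnitude."""
    d = X @ w - y
    grad = X.T @ (1/(1 + np.exp(eps - d)) - 1/(1 + np.exp(eps + d)))
    j = int(np.argmax(np.abs(grad)))      # the weak regressor to update
    w = w.copy()
    w[j] -= eta * grad[j]
    return w
```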

  14. On-line Algorithms • GD and EG online algorithms for the log-loss (schematic updates are sketched below) • Relative loss bounds. Future Directions • Regression tree learning • Solving one-class and various ranking problems using similar constructions • Regression generalization bounds based on natural regularization
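A schematic sketch of what GD and EG online updates for the symmetric log-loss could look like. The learning rate, the EG normalization, and the requirement that the EG weight vector stay positive are assumptions of this sketch, and the relative loss bounds mentioned on the slide are not reproduced here:

```python
import numpy as np

def grad_single(w, x, y, eps):
    """Gradient of the symmetric eps-insensitive log-loss on one example."""
    d = w @ x - y
    return (1/(1 + np.exp(eps - d)) - 1/(1 + np.exp(eps + d))) * x

def gd_update(w, x, y, eps, eta=0.1):
    """Online gradient descent (GD): additive step on one example."""
    return w - eta * grad_single(w, x, y, eps)

def eg_update(w, x, y, eps, eta=0.1, U=1.0):
    """Online exponentiated gradient (EG): multiplicative step on a
    positive weight vector, renormalized to total mass U (the positivity
    and the normalization are assumptions of this sketch)."""
    g = grad_single(w, x, y, eps)
    w_new = w * np.exp(-eta * g)
    return U * w_new / w_new.sum()
```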
