
Robust Nonparametric Regression by Controlling Sparsity




Presentation Transcript


  1. Robust Nonparametric Regression by Controlling Sparsity
  Gonzalo Mateos and Georgios B. Giannakis
  ECE Department, University of Minnesota
  Acknowledgments: NSF grants no. CCF-0830480, 1016605, EECS-0824007, 1002180
  May 24, 2011

  2. Nonparametric regression
  • Given $x$, a function estimate $f$ allows predicting $y = f(x)$
  • Estimate the unknown $f$ from a training data set $T := \{(x_i, y_i)\}_{i=1}^N$
  • If one trusts data more than any parametric model, go nonparametric: $f$ lives in a (possibly infinite-dimensional) space $\mathcal{H}$ of "smooth" functions
  • Ill-posed problem; workaround: regularization [Tikhonov'77], [Wahba'90]
  • $\mathcal{H}$ an RKHS with reproducing kernel $K(\cdot,\cdot)$ and norm $\|\cdot\|_{\mathcal{H}}$; estimate $f$ via $\min_{f \in \mathcal{H}} \sum_{i=1}^N [y_i - f(x_i)]^2 + \mu \|f\|_{\mathcal{H}}^2$ (see the sketch after this slide)
  • Our focus
    • Nonparametric regression robust against outliers
    • Robustness by controlling sparsity
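For concreteness, a minimal NumPy sketch of the nonrobust RKHS estimator that the representer theorem yields; the Gaussian kernel and its bandwidth are illustrative choices, not taken from the slides:

```python
import numpy as np

def gaussian_kernel(A, B, h=0.5):
    """Gaussian reproducing kernel; the bandwidth h is an illustrative choice."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * h ** 2))

def kernel_ridge(X, y, mu, kernel=gaussian_kernel):
    """Nonrobust RKHS estimate: by the representer theorem,
    f(x) = sum_i beta_i K(x, x_i), with (K + mu*I) beta = y."""
    K = kernel(X, X)                                   # N x N Gram matrix
    beta = np.linalg.solve(K + mu * np.eye(len(y)), y)
    return lambda Xnew: kernel(Xnew, X) @ beta         # predictor f_hat
```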

  3. Our work in context
  • Noteworthy applications
    • Load curve data cleansing [Chen et al'10]
    • Spline-based PSD cartography [Bazerque et al'09]
  • Robust nonparametric regression
    • Huber's function [Zhu et al'08]; no systematic way to select thresholds
  • Robustness and sparsity in linear (parametric) regression
    • Huber's M-type estimator as Lasso [Fuchs'99]; contamination model
    • Bayesian framework [Jin-Rao'10], [Mitra et al'10]; rigid choice of the regularization parameter

  4. Variational LTS
  • Least-trimmed squares (LTS) regression [Rousseeuw'87]; its variational (V)LTS counterpart here is
    (VLTS) $\min_{f \in \mathcal{H}} \sum_{i=1}^{s} r_{[i]}^2(f) + \mu \|f\|_{\mathcal{H}}^2$
  • $r_{[i]}^2(f)$ is the $i$-th order statistic among the squared residuals $r_1^2(f), \ldots, r_N^2(f)$, where $r_i(f) := y_i - f(x_i)$
  • The $N - s$ largest residuals are discarded
  • Q: How should we go about minimizing (VLTS)? It is nonconvex; existence of minimizer(s)?
  • A: Try all subsamples of size $s$, solve, and pick the best (see the sketch below)
  • Simple but intractable beyond small problems
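A brute-force sketch of the subsample strategy just described, reusing the gaussian_kernel helper from the sketch above; it makes plain why this is intractable beyond small problems, since it performs one kernel ridge fit per size-s subsample, C(N, s) fits in all:

```python
from itertools import combinations
import numpy as np

def vlts_brute_force(X, y, mu, s, kernel=gaussian_kernel):
    """Exhaustive (V)LTS sketch: fit kernel ridge on every size-s subsample
    and keep the fit minimizing the trimmed cost, i.e., the sum of the s
    smallest squared residuals plus mu * ||f||_H^2. Small N only."""
    N = len(y)
    best_cost, best = np.inf, None
    for idx in map(list, combinations(range(N), s)):
        Ks = kernel(X[idx], X[idx])
        beta = np.linalg.solve(Ks + mu * np.eye(s), y[idx])
        resid2 = (y - kernel(X, X[idx]) @ beta) ** 2
        cost = np.sort(resid2)[:s].sum() + mu * beta @ Ks @ beta  # ||f||_H^2 = b'Kb
        if cost < best_cost:
            best_cost, best = cost, (idx, beta)
    return best
```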

  5. Modeling outliers
  • Outlier variables $o_i$ s.t. $o_i \neq 0$ if sample $i$ is an outlier, $o_i = 0$ otherwise
  • Regression model: $y_i = f(x_i) + o_i + \varepsilon_i$, $i = 1, \ldots, N$
  • Nominal data obey $y_i = f(x_i) + \varepsilon_i$; outliers something else
  • Remarks
    • Both $f$ and $\mathbf{o} := [o_1, \ldots, o_N]'$ are unknown
    • If outliers are sporadic, then the vector $\mathbf{o}$ is sparse!
  • Natural (but intractable) nonconvex estimator: penalize $\|\mathbf{o}\|_0$, the number of nonzero entries of $\mathbf{o}$

  6. VLTS as sparse regression
  • Lagrangian form (P0), displayed below
  • Proposition 1: If $\{\hat{f}, \hat{\mathbf{o}}\}$ solves (P0) with $\lambda_0$ chosen s.t. $\|\hat{\mathbf{o}}\|_0 = N - s$, then $\hat{f}$ solves (VLTS) too.
  • Tuning parameter $\lambda_0$ controls sparsity in $\hat{\mathbf{o}}$, i.e., the number of outliers
  • The equivalence
    • Formally justifies the regression model and its estimator (P0)
    • Ties sparse regression with robust estimation
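For reference, the Lagrangian form (P0) the slide alludes to, reconstructed in the paper's standard notation:

```latex
\[
\text{(P0)}\qquad
\min_{f \in \mathcal{H},\ \mathbf{o} \in \mathbb{R}^N}\
\sum_{i=1}^{N} \big[\, y_i - f(x_i) - o_i \,\big]^2
\;+\; \mu \|f\|_{\mathcal{H}}^2
\;+\; \lambda_0 \|\mathbf{o}\|_0
\]
```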

  7. Just relax!
  • (P0) is NP-hard; relax $\|\mathbf{o}\|_0$ to the convex $\ell_1$-norm:
    (P1) $\min_{f \in \mathcal{H},\, \mathbf{o}} \sum_{i=1}^{N} [y_i - f(x_i) - o_i]^2 + \mu \|f\|_{\mathcal{H}}^2 + \lambda \|\mathbf{o}\|_1$
  • (P1) is convex, and thus efficiently solved
  • Role of the sparsity-controlling parameter $\lambda$ is central
  • Q: Does (P1) yield robust estimates? A: Yes! Huber's M-type estimator emerges as a special case (see the check below)
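A one-line check of the Huber connection (a standard computation, not verbatim from the slides): minimizing (P1) over a single $o_i$ for fixed $f$, with residual $r := y_i - f(x_i)$, gives

```latex
\[
\min_{o \in \mathbb{R}} \big\{ (r - o)^2 + \lambda |o| \big\}
= \begin{cases}
r^2, & |r| \le \lambda/2, \\
\lambda |r| - \lambda^2/4, & |r| > \lambda/2,
\end{cases}
\]
```

which is exactly Huber's loss with threshold $\lambda/2$: quadratic for small residuals, linear for large ones.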

  8. Alternating minimization
  • (P1) is jointly convex in $(f, \mathbf{o})$, so an alternating-minimization (AM) solver converges (see the sketch below)
  • Remarks
    • Single Cholesky factorization of $\mathbf{K} + \mu \mathbf{I}_N$, reused across iterations
    • The $\mathbf{o}$-update is soft-thresholding of the residuals
  • Reveals the intertwining between
    • Outlier identification
    • Function estimation with outlier-compensated data $y_i - o_i$
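A sketch of the AM iteration under the setup above ($K$ is the kernel Gram matrix; variable names are mine): the $f$-update is kernel ridge on outlier-compensated data, and the $o$-update soft-thresholds the residuals at $\lambda/2$.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def robust_rkhs_am(K, y, mu, lam, n_iter=100):
    """Alternating minimization for (P1)."""
    N = len(y)
    cho = cho_factor(K + mu * np.eye(N))   # single Cholesky, reused throughout
    o = np.zeros(N)
    for _ in range(n_iter):
        beta = cho_solve(cho, y - o)       # f-update: (K + mu*I) beta = y - o
        r = y - K @ beta                   # residuals w.r.t. current f
        o = np.sign(r) * np.maximum(np.abs(r) - lam / 2, 0.0)  # o-update
    return beta, o
```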

  9. Lassoing outliers
  • Proposition 2: Minimizers $\hat{\mathbf{o}}$ of (P1) are fully determined by a Lasso, $\hat{\mathbf{o}} = \arg\min_{\mathbf{o}} \|\tilde{\mathbf{y}} - \tilde{\mathbf{X}}\mathbf{o}\|_2^2 + \lambda \|\mathbf{o}\|_1$, with $\tilde{\mathbf{y}}$ and $\tilde{\mathbf{X}}$ obtained by eliminating $f$ from (P1)
  • Enables effective methods to select $\lambda$
  • Lasso solvers return the entire robustification path (RP)
  • Alternative to AM: solve the Lasso [Tibshirani'94]
  • Cross-validation (CV) fails with multiple outliers [Hampel'86]

  10. Robustification paths
  [Figure: Lasso coefficient paths of the $\hat{o}_i$ versus $\lambda$]
  • Leverage these solvers: consider a 2-D grid of $G_\mu$ values of $\mu$ and, for each $\mu$, $G_\lambda$ values of $\lambda$ (see the path-computation sketch below)
  • The Lasso path of solutions is piecewise linear in $\lambda$
    • LARS returns the whole RP [Efron'03], at the same order of cost as a single LS fit
  • The Lasso is simple in the scalar case
    • Coordinate descent is fast! [Friedman'07]; exploits warm starts, sparsity
    • Other solvers: SpaRSA [Wright et al'09], SPAMS [Mairal et al'10]
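A sketch of computing one robustification path: eliminating $f$ from (P1) for fixed $\mu$ leaves the Lasso of Proposition 2 with $\tilde{\mathbf{X}} = \sqrt{\mu}\,\mathbf{L}^{-1}$ and $\tilde{\mathbf{y}} = \tilde{\mathbf{X}}\mathbf{y}$, where $\mathbf{K} + \mu\mathbf{I} = \mathbf{L}\mathbf{L}'$. It is handed here to scikit-learn's coordinate-descent path solver as a stand-in for the SpaRSA/SPAMS solvers the slide cites; note sklearn's objective carries a $1/(2N)$ factor, hence the alpha rescaling.

```python
import numpy as np
from scipy.linalg import solve_triangular
from sklearn.linear_model import lasso_path

def robustification_path(K, y, mu, lambdas):
    """RP for fixed mu: sweep lambda on the Lasso left after eliminating f."""
    N = len(y)
    L = np.linalg.cholesky(K + mu * np.eye(N))             # K + mu*I = L L'
    Xt = np.sqrt(mu) * solve_triangular(L, np.eye(N), lower=True)
    alphas = np.sort(np.asarray(lambdas))[::-1] / (2 * N)  # decreasing, rescaled
    alphas, O, _ = lasso_path(Xt, Xt @ y, alphas=alphas)
    return 2 * N * alphas, O     # lambda grid and the N x n_lambda outlier paths
```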

  11. Selecting $\lambda$ and $\mu$
  • Variance of the nominal noise $\sigma^2$ known: from the RP, for each $(\mu, \lambda)$ on the grid, obtain an entry $\hat{\sigma}^2_{\mu,\lambda}$ of the sample variance matrix from the residuals; the best $(\mu^*, \lambda^*)$ are s.t. $\hat{\sigma}^2_{\mu^*,\lambda^*}$ is closest to $\sigma^2$ (a grid-search sketch follows)
  • Variance of the nominal noise unknown: replace $\sigma^2$ above with a robust estimate $\hat{\sigma}^2$, e.g., from the median absolute deviation (MAD)
    • Relies on the RP and knowledge of the data model
  • Number of outliers known: from the RP, obtain the range of $\lambda$ s.t. $\|\hat{\mathbf{o}}\|_0$ equals the known outlier count; discard the (known) outliers, and use CV to determine $\mu$
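A grid-search sketch of the first rule (known $\sigma^2$), reusing the robustification_path sketch above; the slides' exact variance estimator did not survive the transcript, so the plain sample variance of the residuals stands in for it:

```python
import numpy as np

def select_tuning(K, y, mus, lambdas, sigma2):
    """Pick the (mu, lambda) pair whose residual sample variance
    best matches the known nominal noise variance sigma2."""
    N = len(y)
    best, gap = None, np.inf
    for mu in mus:
        lams, O = robustification_path(K, y, mu, lambdas)
        for j, lam in enumerate(lams):
            o = O[:, j]
            beta = np.linalg.solve(K + mu * np.eye(N), y - o)
            var_hat = np.var(y - K @ beta - o)   # stand-in variance estimate
            if abs(var_hat - sigma2) < gap:
                best, gap = (mu, lam), abs(var_hat - sigma2)
    return best
```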

  12. Nonconvex regularization
  • Nonconvex penalty terms approximate the $\ell_0$-(pseudo)norm in (P0) better than $\|\mathbf{o}\|_1$ does
    • Options: SCAD [Fan-Li'01], or sum-of-logs [Candes et al'08]
  • Iterative linearization-minimization of the penalty around the previous iterate $\mathbf{o}^{(k)}$ yields weighted $\ell_1$ subproblems (see the sketch below)
  • Remarks
    • Initialize with the (P1) solution; use $\mu^*$ and $\lambda^*$
    • Bias reduction (cf. adaptive Lasso [Zou'06])
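A sketch of the refinement loop: iteratively reweighted $\ell_1$ for the sum-of-logs surrogate, with the AM iteration from slide 8 as the inner solver. The weight update $w_i = 1/(|o_i| + \delta)$ follows [Candes et al'08]; $\delta$ is an assumed smoothing constant, and the initialization $w = 1$ reproduces the (P1) solution on the first round.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def refine_nonconvex(K, y, mu, lam, delta=1e-3, rounds=5, inner=50):
    """Sum-of-logs refinement: each round solves a weighted (P1) by AM."""
    N = len(y)
    cho = cho_factor(K + mu * np.eye(N))   # single factorization, reused
    o, w = np.zeros(N), np.ones(N)
    for _ in range(rounds):
        for _ in range(inner):             # inner AM on the weighted problem
            beta = cho_solve(cho, y - o)
            r = y - K @ beta
            o = np.sign(r) * np.maximum(np.abs(r) - lam * w / 2, 0.0)
        w = 1.0 / (np.abs(o) + delta)      # relinearize the log penalty
    return beta, o
```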

  13. Robust thin-plate splines
  • Specialize to thin-plate splines [Duchon'77], [Wahba'80]
  • Smoothing penalty is only a seminorm in this case: its nullspace goes unpenalized
  • Solution: $\hat{f}(\mathbf{x}) = \sum_{i=1}^{N} \beta_i K(\|\mathbf{x} - \mathbf{x}_i\|) + \boldsymbol{\alpha}'\mathbf{x} + \alpha_0$
    • Radial basis function $K(r) = r^2 \log r$ (helper sketch below)
    • Augmented w/ a member of the nullspace of the penalty
  • Given $\mathbf{o}$, the unknowns are found in closed form
  • Still, Proposition 2 holds for appropriate $\tilde{\mathbf{y}}$ and $\tilde{\mathbf{X}}$
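The thin-plate radial basis as a small helper consistent with the sketches above; $r^2 \log r$ is the standard 2-D choice, written as $\tfrac{1}{2} r^2 \log r^2$ and patched to its limit 0 at $r = 0$:

```python
import numpy as np

def tps_kernel(A, B):
    """Thin-plate radial basis K(x, x') = r^2 log r, r = ||x - x'||, in 2-D."""
    r2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    with np.errstate(divide="ignore", invalid="ignore"):
        out = 0.5 * r2 * np.log(r2)
    return np.nan_to_num(out)              # lim_{r -> 0} r^2 log r = 0
```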

  14. Simulation setup
  • Training set $T$: noisy samples of a Gaussian mixture (a data-generation sketch follows)
  • $N$ examples, with the $x_i$ drawn i.i.d.
  • Outliers: $y_i$ drawn i.i.d. from a contaminating distribution, for a sporadic subset of the indices
  • Nominal: $y_i = f(x_i) + \varepsilon_i$ w/ $\varepsilon_i$ i.i.d. Gaussian ($\sigma^2$ known)
  [Figure: training data (left) and true function (right)]
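A data-generation sketch matching the setup's structure; all numerical values (mixture centers and widths, $N$, outlier fraction, noise level, outlier range) are illustrative stand-ins, since the slides' exact parameters did not survive the transcript:

```python
import numpy as np

rng = np.random.default_rng(0)

def true_f(X):
    """A Gaussian-mixture surface with illustrative centers and widths."""
    return (np.exp(-((X - 0.25) ** 2).sum(-1) / 0.02)
            - np.exp(-((X - 0.75) ** 2).sum(-1) / 0.02))

N, p_out, sigma = 200, 0.1, 0.1                    # illustrative values
X = rng.uniform(size=(N, 2))                       # x_i i.i.d. uniform
y = true_f(X) + sigma * rng.standard_normal(N)     # nominal model
is_out = rng.uniform(size=N) < p_out               # sporadic outliers
y[is_out] = rng.uniform(-2, 2, size=is_out.sum())  # gross errors replace them
```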

  15. Robustification paths
  • Grid parameters: a 2-D grid of $\mu$ and $\lambda$ values (cf. slide 10)
  [Figure: robustification paths, with outlier coefficients ("Outlier") separating from inlier coefficients ("Inlier")]
  • Paths obtained using SpaRSA [Wright et al'09]

  16. Results
  [Figure: true function alongside nonrobust, robust, and refined predictions]
  • Effectiveness in rejecting outliers is apparent

  17. Generalization capability
  • Figures of merit
    • Training error: average squared error over the training set
    • Test error: average squared error over a held-out test set
  • Nonconvex refinement leads to consistently lower test error
  • In all cases, 100% outlier identification success rate

  18. Load curve data cleansing
  [Figure: Uruguay's aggregate power consumption (MW)]
  • Load curve: electric power consumption recorded periodically
  • Reliable data: key to realize the smart grid vision
  • B-splines for load curve prediction and denoising [Chen et al'10]
  • Deviations from nominal models (outliers):
    • Faulty meters, communication errors
    • Unscheduled maintenance, strikes, sporting events

  19. Real data tests
  [Figure: nonrobust, robust, and refined predictions on the load curve data]

  20. Concluding summary
  • Robust nonparametric regression: VLTS as $\ell_0$-(pseudo)norm regularized regression (NP-hard)
  • Convex relaxation → variational M-type estimator → Lasso
  • Controlling sparsity amounts to controlling the number of outliers
    • Sparsity-controlling role of $\lambda$ is central
  • Selection of $\lambda$ using the Lasso robustification paths
    • Different options dictated by available knowledge of the data model
  • Refinement via nonconvex penalty terms
    • Bias reduction and improved generalization capability
  • Real data tests for load curve cleansing
