Robust Nonparametric Regression by Controlling Sparsity

Gonzalo Mateos and Georgios B. Giannakis
ECE Department, University of Minnesota
Acknowledgments: NSF grants no. CCF-0830480, 1016605; EECS-0824007, 1002180
May 24, 2011

Nonparametric regression
• Given an input, function estimation allows predicting the corresponding output
• Estimate the unknown function from a training data set
• If one trusts data more than any parametric model
  • Then go with nonparametric regression
  • The function lives in a (possibly infinite-dimensional) space of “smooth” functions
• Ill-posed problem
  • Workaround: regularization [Tikhonov’77], [Wahba’90]
  • RKHS with reproducing kernel and norm
• Our focus
  • Nonparametric regression robust against outliers
  • Robustness by controlling sparsity

Our work in context
• Noteworthy applications
  • Load curve data cleansing [Chen et al’10]
  • Spline-based PSD cartography [Bazerque et al’09]
• Robust nonparametric regression
  • Huber’s function [Zhu et al’08]
  • No systematic way to select thresholds
• Robustness and sparsity in linear (parametric) regression
  • Huber’s M-type estimator as Lasso [Fuchs’99]; contamination model
  • Bayesian framework [Jin-Rao’10], [Mitra et al’10]; rigid choice of

Variational LTS
• Least-trimmed squares (LTS) regression [Rousseeuw’87]
• Variational (V)LTS counterpart: (VLTS)
  • The cost sums the smallest squared residuals (order statistics)
  • The largest residuals are discarded
• Q: How should we go about minimizing the (VLTS) cost?
  • (VLTS) is nonconvex; existence of minimizer(s)?
• A: Try all subsamples of the retained size, solve, and pick the best
  • Simple but intractable beyond small problems
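The exhaustive search in the answer above can be illustrated on a deliberately simplified problem. This is only a sketch: it assumes a constant (location) model with no smoothness penalty, so each subsample fit is just a mean. The function name `lts_location` and the parameter `s` (number of residuals kept) are illustrative and do not come from the slides.

```python
from itertools import combinations

def lts_location(y, s):
    """Brute-force least-trimmed squares fit of a constant to y.

    Assumption: constant model, no smoothness penalty (a simplification).
    Tries every subsample of size s, fits by least squares (the mean),
    and keeps the subsample with the smallest sum of squared residuals.
    """
    best_cost, best_fit, best_subset = float("inf"), None, None
    for subset in combinations(range(len(y)), s):
        fit = sum(y[i] for i in subset) / s
        cost = sum((y[i] - fit) ** 2 for i in subset)
        if cost < best_cost:
            best_cost, best_fit, best_subset = cost, fit, subset
    return best_fit, best_subset

y = [1.0, 1.1, 0.9, 1.05, 10.0]   # last sample is a gross outlier
fit, kept = lts_location(y, s=4)
print(fit, kept)
```

The number of subsamples grows combinatorially with the sample size, which is why this approach is intractable beyond small problems.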

Modeling outliers
• Outlier variables: nonzero if the datum is an outlier, zero otherwise
• Nominal data obey the regression model; outliers something else
• Remarks
  • Both the function and the outlier vector are unknown
  • If outliers are sporadic, then the outlier vector is sparse!
• Natural (but intractable) nonconvex estimator

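VLTS as sparse regression
• Lagrangian form: (P0)
  • Tuning parameter controls sparsity, i.e., the number of outliers
Proposition 1: If a function estimate and outlier vector solve (P0), with the tuning parameter chosen s.t. the number of nonzero outlier variables equals the number of residuals discarded in (VLTS), then the function estimate solves (VLTS) too.
• The equivalence
  • Formally justifies the regression model and its estimator (P0)
  • Ties sparse regression with robust estimation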

Just relax!
• (P0) is NP-hard; relax to (P1)
• (P1) is convex, and thus efficiently solved
• Role of the sparsity-controlling parameter is central
• Q: Does (P1) yield robust estimates?
• A: Yes! The Huber estimator is a special case
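The link to Huber's estimator can be checked numerically for a single residual. The sketch below assumes that (P1) penalizes each squared residual `(r - o)^2` plus `lam * |o|`; this scaling and the symbols `r`, `o`, `lam` are assumptions, since the formulas did not survive in the slides. Under that assumption, minimizing over the outlier variable `o` in closed form leaves a Huber function of `r` with its transition at `lam / 2`.

```python
def soft_threshold(r, tau):
    """Shrink r toward zero by tau; return 0 if |r| <= tau."""
    if r > tau:
        return r - tau
    if r < -tau:
        return r + tau
    return 0.0

def huber(r, lam):
    """Huber function, quadratic for |r| <= lam / 2 and linear beyond."""
    if abs(r) <= lam / 2:
        return r * r
    return lam * abs(r) - lam * lam / 4

def profiled_cost(r, lam):
    """min over o of (r - o)^2 + lam * |o| (assumed scaling of the cost)."""
    o = soft_threshold(r, lam / 2)   # closed-form minimizer
    return (r - o) ** 2 + lam * abs(o)

lam = 1.0
for r in [-3.0, -0.4, 0.0, 0.2, 0.5, 2.5]:
    assert abs(profiled_cost(r, lam) - huber(r, lam)) < 1e-12
```

Large residuals are therefore penalized linearly rather than quadratically, which is the source of robustness.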

Alternating minimization
• (P1) is jointly convex in the function and the outlier vector: use an AM solver
• Remarks
  • Single Cholesky factorization of the system matrix
  • Soft-thresholding
• Reveals the intertwining between
  • Outlier identification
  • Function estimation with outlier-compensated data
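A minimal sketch of such an alternating-minimization loop follows. It is not the authors' implementation. It assumes the function estimate is a kernel expansion with coefficients `beta`, that the cost is `||y - K beta - o||^2 + mu * beta' K beta + lam * ||o||_1`, and that a Gaussian kernel is used; the names `am_robust_fit`, `mu`, `lam`, and `width`, as well as the toy data, are illustrative.

```python
import numpy as np

def gaussian_kernel(x, z, width):
    """Gaussian kernel matrix between 1-D sample vectors x and z."""
    diff = x[:, None] - z[None, :]
    return np.exp(-0.5 * (diff / width) ** 2)

def soft_threshold(r, tau):
    return np.sign(r) * np.maximum(np.abs(r) - tau, 0.0)

def am_robust_fit(K, y, mu, lam, n_iter=200):
    """Alternate between kernel coefficients beta and outlier vector o.

    Assumed cost: ||y - K beta - o||^2 + mu * beta' K beta + lam * ||o||_1.
    """
    n = len(y)
    L = np.linalg.cholesky(K + mu * np.eye(n))   # factor once, reuse below
    o = np.zeros(n)
    for _ in range(n_iter):
        # Function estimation on outlier-compensated data y - o.
        beta = np.linalg.solve(L.T, np.linalg.solve(L, y - o))
        # Outlier identification: soft-threshold the residuals.
        o = soft_threshold(y - K @ beta, lam / 2)
    # Final refit so that beta matches the returned o.
    beta = np.linalg.solve(L.T, np.linalg.solve(L, y - o))
    return beta, o

# Toy data (hypothetical): a clean sinusoid with two planted outliers.
x = np.linspace(0.0, 1.0, 30)
f_true = np.sin(2 * np.pi * x)
y = f_true.copy()
y[5] += 5.0
y[20] -= 5.0

K = gaussian_kernel(x, x, width=0.2)
beta, o = am_robust_fit(K, y, mu=0.1, lam=1.0)
beta_plain, _ = am_robust_fit(K, y, mu=0.1, lam=1e9)   # huge lam keeps o at 0

err_robust = np.sqrt(np.mean((K @ beta - f_true) ** 2))
err_plain = np.sqrt(np.mean((K @ beta_plain - f_true) ** 2))
print(np.flatnonzero(o), err_robust, err_plain)
```

Here `np.linalg.solve` is applied to the Cholesky factor for simplicity; SciPy's `scipy.linalg.cho_factor` and `cho_solve` would exploit the triangular structure.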

Lassoing outliers
• Alternative to AM: solve a Lasso [Tibshirani’94]
Proposition 2: Minimizers of (P1) are fully determined by the outlier vector estimate, which solves a Lasso problem.
• Enables effective methods to select the tuning parameters
  • Lasso solvers return the entire robustification path (RP)
  • Cross-validation (CV) fails with multiple outliers [Hampel’86]
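One way to see why a Lasso appears is to eliminate the function estimate for a fixed outlier vector. The sketch below assumes the kernel-expansion form of (P1) with cost `||y - K beta - o||^2 + mu * beta' K beta + lam * ||o||_1`; under that assumption the smooth part reduces to `||X y - X o||^2` for any `X` with `X' X = mu * (K + mu I)^{-1}`. The symbols and the random stand-in for the kernel matrix are illustrative, and the exact statement of Proposition 2 in the slides may be parameterized differently.

```python
import numpy as np

rng = np.random.default_rng(0)
n, mu = 8, 0.5

# Assumption: any symmetric positive semidefinite matrix can stand in
# for the kernel matrix in this numerical check.
A = rng.standard_normal((n, n))
K = A @ A.T
y = rng.standard_normal(n)
o = rng.standard_normal(n)

# Optimal kernel coefficients for a fixed outlier vector o.
beta = np.linalg.solve(K + mu * np.eye(n), y - o)
smooth_cost = np.sum((y - o - K @ beta) ** 2) + mu * beta @ K @ beta

# Lasso design: X with X.T @ X = mu * (K + mu I)^{-1}.
G = mu * np.linalg.inv(K + mu * np.eye(n))
X = np.linalg.cholesky(G).T          # G = L L^T, so X = L^T gives X^T X = G
lasso_cost = np.sum((X @ y - X @ o) ** 2)

print(smooth_cost, lasso_cost)       # the two values agree up to round-off
```

Adding `lam * ||o||_1` to `lasso_cost` then gives a standard Lasso in the outlier vector alone.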

Robustification paths
• Lasso path of solutions is piecewise linear
  • LARS returns the whole RP [Efron’03]
  • Same cost as a single LS fit
• Lasso is simple in the scalar case
  • Coordinate descent is fast! [Friedman’07]
  • Exploits warm starts, sparsity
  • Other solvers: SpaRSA [Wright et al’09], SPAMS [Mairal et al’10]
• Leverage these solvers: consider a 2-D grid of tuning-parameter values
  • A set of values of one parameter and, for each of them, a set of values of the other
[Figure: robustification paths; vertical axis: Coeffs.]
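The scalar-case simplicity and the warm starts mentioned above can be sketched with a plain coordinate-descent Lasso solver. This is a generic illustration, not one of the cited solvers; it assumes the cost `||b - X o||^2 + lam * ||o||_1`, and the names `lasso_cd` and `lasso_path` are hypothetical.

```python
import numpy as np

def lasso_cd(X, b, lam, o_init=None, n_sweeps=200):
    """Coordinate descent for min_o ||b - X o||^2 + lam * ||o||_1."""
    n_features = X.shape[1]
    o = np.zeros(n_features) if o_init is None else o_init.copy()
    col_sq = np.sum(X ** 2, axis=0)
    resid = b - X @ o
    for _ in range(n_sweeps):
        for j in range(n_features):
            if col_sq[j] == 0.0:
                continue
            resid += X[:, j] * o[j]              # remove coordinate j
            rho = X[:, j] @ resid
            # Scalar Lasso: soft-threshold at lam / 2, then rescale.
            o[j] = np.sign(rho) * max(abs(rho) - lam / 2, 0.0) / col_sq[j]
            resid -= X[:, j] * o[j]              # put coordinate j back
    return o

def lasso_path(X, b, lams):
    """Solve for decreasing lam values, warm-starting each from the last."""
    path, o = [], None
    for lam in sorted(lams, reverse=True):
        o = lasso_cd(X, b, lam, o_init=o)
        path.append((lam, o.copy()))
    return path

# Toy problem (hypothetical): identity design, so each solution is a
# soft-thresholded copy of b.
X = np.eye(3)
b = np.array([3.0, -0.2, 0.1])
for lam, o in lasso_path(X, b, [6.0, 1.0, 0.1]):
    print(lam, o)
```

Starting from the largest `lam`, where the solution is all zeros, each solve begins close to its answer, which is what makes sweeping a whole grid cheap.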

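Selecting the tuning parameters
• Relies on RP and knowledge on the data model
• Number of outliers known: from RP, obtain the range of the sparsity-controlling parameter s.t. the number of nonzero outlier estimates matches the known count. Discard the (now known) outliers, and use CV to determine the remaining parameter
• Variance of the nominal noise known: from RP, for each parameter pair on the grid, obtain an entry of the sample variance matrix. The best pair is s.t. its sample variance is closest to the known variance
• Variance of the nominal noise unknown: replace the variance above with a robust estimate, e.g., median absolute deviation (MAD)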
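The variance-based rule can be sketched as follows. Several details are assumptions rather than content recovered from the slides: the sample variance is taken over residuals of samples not flagged as outliers, the best grid point is the one whose variance is closest to the target, and the MAD is scaled by 1.4826 (the usual consistency factor for Gaussian noise). The names `mad_sigma` and `select_by_variance` are hypothetical.

```python
from statistics import median

def mad_sigma(residuals):
    """Robust scale estimate: 1.4826 * median absolute deviation."""
    center = median(residuals)
    return 1.4826 * median(abs(r - center) for r in residuals)

def select_by_variance(candidates, sigma2):
    """Pick the grid point whose inlier residual variance is closest to sigma2.

    candidates maps (mu, lam) to (residuals, outliers). Assumption: samples
    whose outlier estimate is nonzero are left out of the variance, and at
    least one sample per grid point is unflagged.
    """
    def inlier_variance(residuals, outliers):
        kept = [r for r, o in zip(residuals, outliers) if o == 0.0]
        return sum(r * r for r in kept) / len(kept)

    return min(candidates,
               key=lambda k: abs(inlier_variance(*candidates[k]) - sigma2))

# Toy grid (hypothetical values): two parameter pairs and their fits.
candidates = {
    (0.1, 1.0): ([1.0, -1.0, 1.0, -1.0], [0.0, 0.0, 0.0, 0.0]),
    (0.1, 0.5): ([0.1, -0.1, 0.1, 4.0], [0.0, 0.0, 0.0, 3.5]),
}
print(select_by_variance(candidates, sigma2=0.01))
print(mad_sigma([1, 2, 3, 4, 100]))
```

When the noise variance is unknown, `mad_sigma(...) ** 2` computed from a preliminary fit could stand in for `sigma2`.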

Nonconvex regularization
• Nonconvex penalty terms approximate the (pseudo)norm in (P0) better
  • Options: SCAD [Fan-Li’01], or sum-of-logs [Candes et al’08]
• Iterative linearization-minimization of the nonconvex penalty around the current iterate
• Remarks
  • Initialize with , use and
  • Bias reduction (cf. adaptive Lasso [Zou’06])
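The linearization-minimization idea can be sketched for the sum-of-logs penalty in a simplified setting. The sketch assumes the residuals are held fixed (no refit of the function between passes), a cost of the form `(r_i - o_i)^2 + lam * log(|o_i| + delta)` per sample, and an initialization at the convex soft-thresholding solution; `delta`, `lam`, and the function name are illustrative choices.

```python
def soft_threshold(r, tau):
    if r > tau:
        return r - tau
    if r < -tau:
        return r + tau
    return 0.0

def reweighted_outliers(residuals, lam, delta=0.1, n_iter=10):
    """Sum-of-logs refinement of the outlier estimates for fixed residuals.

    Each pass linearizes log(|o_i| + delta) around the current o_i, which
    turns the problem into a weighted l1 one solved by soft-thresholding.
    Assumption: residuals stay fixed across passes (a simplification).
    """
    o = [soft_threshold(r, lam / 2) for r in residuals]   # convex (l1) start
    for _ in range(n_iter):
        weights = [1.0 / (abs(oi) + delta) for oi in o]
        o = [soft_threshold(r, lam * w / 2) for r, w in zip(residuals, weights)]
    return o

residuals = [5.0, 0.3, -4.0]
lam = 1.0
o_l1 = [soft_threshold(r, lam / 2) for r in residuals]
o_refined = reweighted_outliers(residuals, lam)
print(o_l1, o_refined)
```

Large outlier estimates receive small weights and are shrunk less than under the convex penalty, which is the bias reduction referred to above; small residuals receive large weights and stay at zero.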

Robust thin-plate splines
• Specialize to thin-plate splines [Duchon’77], [Wahba’80]
  • Smoothing penalty only a seminorm in
• Solution:
  • Radial basis function
  • Augment with member of the nullspace of
• Given , unknowns found in closed form
• Still, Proposition 2 holds for appropriate
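For concreteness, the two ingredients of the thin-plate spline solution can be built as below. The sketch assumes the common 2-D case with a second-order derivative penalty, for which the radial basis is `r^2 log r` and the nullspace of the penalty consists of affine functions; the slides' own formulas were lost, so the dimension and the function names are assumptions.

```python
import numpy as np

def tps_basis(r):
    """Thin-plate radial basis r^2 log r, with the limit value 0 at r = 0."""
    r = np.asarray(r, dtype=float)
    out = np.zeros_like(r)
    pos = r > 0
    out[pos] = r[pos] ** 2 * np.log(r[pos])
    return out

def tps_matrices(points):
    """Radial-basis matrix and nullspace (affine) matrix for 2-D points.

    Assumption: 2-D inputs and a second-order smoothing penalty.
    """
    points = np.asarray(points, dtype=float)
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    K = tps_basis(dists)
    T = np.hstack([np.ones((len(points), 1)), points])   # columns: 1, x1, x2
    return K, T

pts = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
K, T = tps_matrices(pts)
print(K.shape, T.shape)
```

The spline is then a combination of the columns of `K` plus an affine term from `T`, with coefficients obtained from a linear system once the outlier vector is fixed.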

Simulation setup
• Training set : noisy samples of Gaussian mixture
  • examples, i.i.d.
  • Nominal: with i.i.d. ( known)
  • Outliers: i.i.d. for
[Figure panels: Data; True function]

Robustification paths
• Grid parameters: grid: grid:
• Paths obtained using SpaRSA [Wright et al’09]
[Figure legend: Outlier; Inlier]

Results
[Figure panels: True function; Nonrobust predictions; Robust predictions; Refined predictions]
• Effectiveness in rejecting outliers is apparent

Generalization capability
• Figures of merit
  • Training error:
  • Test error:
• Nonconvex refinement leads to consistently lower test error
• In all cases, 100% outlier identification success rate

Load curve data cleansing
• Load curve: electric power consumption recorded periodically
  • Reliable data: key to realize smart grid vision
  • B-splines for load curve prediction and denoising [Chen et al’10]
• Deviation from nominal models (outliers)
  • Faulty meters, communication errors
  • Unscheduled maintenance, strikes, sporting events
[Figure: Uruguay’s aggregate power consumption (MW)]

Real data tests
[Figure panels: Nonrobust predictions; Robust predictions; Refined predictions]

Concluding summary
• Robust nonparametric regression
  • VLTS as ℓ0-(pseudo)norm regularized regression (NP-hard)
  • Convex relaxation: variational M-type estimator; Lasso
• Controlling sparsity amounts to controlling the number of outliers
  • Sparsity-controlling role of the tuning parameter is central
• Selection of the tuning parameters using the Lasso robustification paths
  • Different options dictated by available knowledge on the data model
• Refinement via nonconvex penalty terms
  • Bias reduction and improved generalization capability
• Real data tests for load curve cleansing
