
Dimensionality reduction



  1. Dimensionality reduction Alexis Boukouvalas Work in collaboration with D. M. Maniyar and D. Cornford Managing Uncertainty in Complex Models, Aston University

  2. Goals • Develop methods for dimensionality reduction of either the input and/or output space of models. • To gain an understanding, initially use a toy dataset to compare existing methods. • Later on, utilize the methods on real-world models. • The goal is to extend the methods to work with large numbers of variables (~10^5).

  3. Methods • Feature selection • Also known as screening in the statistical literature. • Select the p most relevant of the original k variables. • The meaning of the variables is preserved, so the results are interpretable. • Projective methods • The variables are transformed: X' = F(X). • Transformations can be linear or non-linear. • Interpretation is non-trivial, especially for non-linear mappings.

  4. Toy data set (1) • Generate N base vectors x of dimensionality d by sampling a Latin hypercube. Normalize the data. • Evaluate the generative model g(·). • Corrupt the model output with independent, identically distributed Gaussian noise. Initially the noise variance is set to 0.1 × the signal variance. • [Screening] Augment with extra noise dimensions: e = Bx + input noise. The input noise is always N(0, I); the B matrix is described on the next slide. • [Projection] Project to a higher-dimensional space using x' = W·F(x). The pipeline is sketched below.
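A minimal Python sketch of this generation pipeline. The generative model g(x) = sin(Σx) and the dimension sizes are illustrative assumptions, not taken from the slides:

```python
import numpy as np
from scipy.stats import qmc

rng = np.random.default_rng(0)
N, d, n_noise = 200, 3, 3          # observations, model dims, extra noise dims (illustrative)

# 1. Sample N base vectors from a Latin hypercube and normalize.
X = qmc.LatinHypercube(d=d, seed=0).random(N)
X = (X - X.mean(axis=0)) / X.std(axis=0)

# 2. Evaluate the generative model g(.) -- a toy choice, assumed here.
def g(X):
    return np.sin(X.sum(axis=1))

y = g(X)

# 3. Corrupt the output with iid Gaussian noise, variance 0.1 * signal variance.
y = y + rng.normal(scale=np.sqrt(0.1 * y.var()), size=N)

# 4. [Screening] Augment with extra noise dimensions e = Bx + input noise,
#    where the input noise is N(0, I). B = 0 gives uncorrelated noise variables.
B = np.zeros((n_noise, d))
E = X @ B.T + rng.standard_normal((N, n_noise))
X_aug = np.hstack([X, E])          # N x (d + n_noise) design matrix
```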

  5. Toy data set (2) • [Screening] The B matrix determines the correlation between the noise and model variables. • B = 0 constructs noise variables that are uncorrelated with the model variables. • k randomly selected rows have a single non-zero entry, corresponding to the noise variable being linearly correlated with a single model variable. Currently k = 0.5 × the number of noise variables and the coefficient is set to 0.5. • Same as the previous case, but two elements of each of the k rows are non-zero, with k = 0.8 × the number of noise variables and coefficients drawn at random from the set {-0.2, -0.5, +0.5, +0.7}. The three settings are sketched below.
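A sketch of the three B-matrix settings; the exact row/column sampling scheme is an assumption, and make_B is a hypothetical helper name:

```python
import numpy as np

def make_B(n_noise, d, setting, rng):
    """Hypothetical constructor for the three correlation settings above."""
    B = np.zeros((n_noise, d))
    if setting == "uncorrelated":           # B = 0: noise uncorrelated with model vars
        return B
    if setting == "one-var":                # k = 0.5 * #noise vars, coefficient 0.5
        rows = rng.choice(n_noise, size=n_noise // 2, replace=False)
        B[rows, rng.integers(d, size=rows.size)] = 0.5
    elif setting == "two-var":              # k = 0.8 * #noise vars, two entries per row
        rows = rng.choice(n_noise, size=int(0.8 * n_noise), replace=False)
        coeffs = np.array([-0.2, -0.5, 0.5, 0.7])
        for r in rows:                      # two distinct model variables per noise row
            cols = rng.choice(d, size=2, replace=False)
            B[r, cols] = rng.choice(coeffs, size=2)
    return B

B = make_B(n_noise=6, d=3, setting="two-var", rng=np.random.default_rng(1))
```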

  6. Toy data set (3) • [Projection] Project into a higher-dimensional space of dimension q: x' = W·F(x). • W is a q × d weight matrix and F(·) is a set of basis functions responsible for the projection mapping. A typical choice of projection mapping is Radial Basis Functions (RBF); a sketch follows.
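A sketch of the RBF projection x' = W·F(x). The centre placement and width are assumptions; with m = d RBF centres, W is q × d as stated on the slide:

```python
import numpy as np

rng = np.random.default_rng(2)
N, d, q = 200, 3, 10               # illustrative sizes
m = d                              # number of RBF centres (m = d keeps W q-by-d)

X = rng.standard_normal((N, d))
centres = X[rng.choice(N, size=m, replace=False)]   # centres drawn from the data
width = 1.0                        # assumed common RBF width

def F(X):
    # N x m matrix of RBF activations exp(-||x - c||^2 / (2 * width^2))
    sq_dists = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq_dists / (2 * width ** 2))

W = rng.standard_normal((q, m))    # q x m weight matrix
X_high = F(X) @ W.T                # N x q projected data
```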

  7. Toy data set - extensions • Different noise models • Correlated • Multiplicative • Non-linear interactions of noise variables with model variables • Mix screening and projection

  8. Feature Selection • Variable selection methods have been broadly categorised into three classes: • Variable ranking. Input variables are ranked according to the prediction accuracy of each input, calculated against the model output. • Wrapper methods. The emulator is used to assess the predictive power of subsets of variables. • Embedded methods. For both variable ranking and wrapper methods, the emulator is treated as a black box; in embedded methods, the variable selection is done as part of the training of the emulator.

  9. Wrapper Methods • Forward selection, where variables are progressively incorporated into larger and larger subsets (see the sketch below). • Backward elimination proceeds in the opposite direction. • Efroymson's algorithm, aka stepwise selection. Proceed as in forward selection, but after each variable is added, check whether any of the selected variables can be deleted without significantly affecting the RSS. • Exhaustive search, where all possible subsets are considered. • Branch and bound. Eliminate subset choices as early as possible: e.g. with variables A-Z, if the subset {A, B} achieves RSS 100, the branch containing only variables from C-Z need not be followed if every subset of C-Z has RSS > 100.
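A minimal forward-selection sketch. With the default linear model it corresponds to LinFS (slide 11); passing a GP regressor instead would give GPFS. The cross-validated RMSE scoring is an assumption:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

def forward_select(X, y, n_select, model=None):
    """Greedy wrapper: grow the subset by the variable that best improves fit."""
    model = model or LinearRegression()
    selected, remaining = [], list(range(X.shape[1]))
    for _ in range(n_select):
        # Try adding each remaining variable; keep the best-scoring candidate.
        scores = {j: cross_val_score(model, X[:, selected + [j]], y,
                                     scoring="neg_root_mean_squared_error").mean()
                  for j in remaining}
        best = max(scores, key=scores.get)
        selected.append(best)
        remaining.remove(best)
    return selected
```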

  10. Embedded methods • An embedded method commonly employed in the context of Gaussian Processes is Automatic Relevance Determination (ARD), where the characteristic length scales l determine the input relevance: a large fitted length scale means the corresponding input has little effect on the output. A sketch follows.
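A minimal ARD sketch using scikit-learn (the slides do not name a library): an anisotropic RBF kernel learns one length scale per input, and the fitted length scales rank the inputs. The toy data mirrors the screening setup above:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(3)
X = rng.standard_normal((200, 6))           # 3 relevant + 3 noise dimensions (assumed)
y = np.sin(X[:, :3].sum(axis=1)) + 0.1 * rng.standard_normal(200)

# One length scale per input dimension => ARD.
kernel = RBF(length_scale=np.ones(6)) + WhiteKernel()
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)

# Rank inputs by fitted length scale: small = relevant, large = pruned by ARD.
print(np.argsort(gp.kernel_.k1.length_scale))
```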

  11. Preliminary experiments • The following algorithms were used in the experiments: • BaseRelevant: baseline run using the relevant dimensions only. The RMSE was obtained by training a GP on the relevant dimensions. This value can be interpreted as the optimal RMSE. • BaseAll: baseline run using all the dimensions, i.e. relevant + extra. Again the RMSE was obtained by training a GP on this set. The difference BaseAll - BaseRelevant is a measure of the effect of the extra variables on the predictive accuracy of the GP. • CorrCoef: Pearson correlation coefficient. A variable ranking is performed using the correlation criterion (see below), and the top 3 variables are selected and used to train a GP. • LinFS: employ a forward-selection subset strategy using a multivariate linear regression model. The RMSE is obtained by evaluating the selected subset with a multiple linear regression model. • GPFS: again employ forward selection to generate subsets, but use a GP rather than a linear model. • ARD: employ the ARD method to rank the input variables and select the top 3 to train a GP model.
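The numbered formula cited on this slide is not reproduced in the transcript; presumably it is the standard Pearson ranking criterion (cf. Guyon and Elisseeff 2003):

```latex
R(i) = \frac{\operatorname{cov}(X_i, Y)}{\sqrt{\operatorname{var}(X_i)\,\operatorname{var}(Y)}}
```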

  12. Experiment 1: No correlation • 200 observations, 3 model dimensions, 6 total. [Results table not preserved in the transcript.]

  13. Experiment 2: Two-variable correlation • 200 observations, 3 model dimensions, 6 total. [Results table not preserved in the transcript.]

  14. Experiment 3: ARD • Initial results for high-dimensional input with two-variable correlation: 100 model inputs, 500 noise dimensions, 500 observations. Fitted length scales and the corresponding input numbers, ranked:

      Length    Input number
      31.8373   361
      18.7081   501
      14.2097   296
      12.7581   51
      12.3160   456
      11.8689   496
      11.3176   166
      10.2424   310
      10.2220   420
      9.6192    325
      9.0732    363
      8.6898    53
      8.5453    347
      7.9338    419
      7.8201    294
      7.8017    188
      7.4327    103
      7.3760    13
      7.1526    572
      7.0997    478
      6.9481    393
      6.6417    187

  15. Summary of Experiments • The best-performing methods are GPFS and ARD, which usually find the optimal subset. However, GPFS is on average more than three times slower than ARD. • The CorrCoef and LinFS methods are computationally inexpensive but give unsatisfactory results. • Even for simple mapping functions (sin x), on underdetermined systems where the number of observations < the number of dimensions, ARD breaks down.

  16. Research Directions • Batch hierarchical screening • Explore the potential of partitioning the input space into groups of inputs, applying screening methods to the groups, and combining the important inputs. • Some work has already been done for linear models (Gabriel and Pan 1979). • Group the variables such that if two variables Xi, Xj are in different groups, their regression sums of squares (RSS) are additive: if Si is the reduction in RSS from including Xi, and Sj that from including Xj, then including both gives Si,j = Si + Sj, as formalised below.
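In symbols, with RSS(S) denoting the residual sum of squares of the linear model fitted on variable subset S (notation assumed, not taken from the slides):

```latex
S_i = \mathrm{RSS}(\emptyset) - \mathrm{RSS}(\{X_i\}), \qquad
S_j = \mathrm{RSS}(\emptyset) - \mathrm{RSS}(\{X_j\}), \qquad
S_{i,j} = \mathrm{RSS}(\emptyset) - \mathrm{RSS}(\{X_i, X_j\})
```

and the grouping requirement is S_{i,j} = S_i + S_j whenever X_i and X_j fall in different groups.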

  17. Research directions (2) • Coupled emulation: separate emulators for different outputs, linked with some model for the covariance. • Connections to sequential methods for handling large datasets. Linked to sequential sparse GPs? • Projective methods in conjunction with feature selection.

  18. Projective methods [From van der Maaten et al. 2007]

  19. But… • [van der Maaten et al. 2007] compared the non-linear methods to linear ones and found them no better. The reasons they propose relate to the curse of dimensionality, overfitting of local models, and others.

  20. References • L.J.P. van der Maaten, E.O. Postma, and H.J. van den Herik. Dimensionality Reduction: A Comparative Review. 2007. • Isabelle Guyon and André Elisseeff. An Introduction to Variable and Feature Selection. Journal of Machine Learning Research, 3:1157-1182, 2003.
