
Regression Analysis



Presentation Transcript


  1. Summer Course: Data Mining. Regression Analysis. Presenter: Georgi Nalbantov. August 2009

  2. Structure • Regression analysis: definition and examples • Classical Linear Regression • LASSO and Ridge Regression (linear and nonlinear) • Nonparametric (local) regression estimation: kNN for regression, Decision trees, Smoothers • Support Vector Regression (linear and nonlinear) • Variable/feature selection (AIC, BIC, R^2-adjusted)

  3. Feature Selection, Dimensionality Reduction, and Clustering in the KDD Process. U. M. Fayyad, G. Piatetsky-Shapiro and P. Smyth (1995)

  4. Common Data Mining tasks [Figure: three scatter plots over X1 and X2 illustrating Clustering, Classification and Regression] • Clustering: k-th Nearest Neighbour, Parzen Window, Unfolding, Conjoint Analysis, Cat-PCA • Classification: Linear Discriminant Analysis, QDA, Logistic Regression (Logit), Decision Trees, LSSVM, NN, VS • Regression: Classical Linear Regression, Ridge Regression, NN, CART

  5. Linear regression analysis: examples

  6. Linear regression analysis: examples

  7. The Regression task • Given data on n explanatory variables and 1 explained variable, where the explained variable can take real values, find a function that gives the “best” fit: • Given: (x1, y1), … , (xm, ym) ∈ R^n × R • Find: f : R^n → R • “Best function” = the expected error on unseen data (xm+1, ym+1), … , (xm+k, ym+k) is minimal
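
A minimal Python sketch of this task: fit a function on the given pairs (x1, y1), …, (xm, ym) and approximate the expected error on unseen data by the mean squared error on a held-out sample. The simulated data and the straight-line model are illustrative assumptions, not part of the slides.

import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))              # explanatory variable
y = 2.0 + 0.5 * X[:, 0] + rng.normal(0, 1, 200)    # explained variable (real-valued)

# "Seen" data used for fitting, "unseen" data used only for evaluation.
X_train, y_train = X[:150], y[:150]
X_test, y_test = X[150:], y[150:]

# Fit a simple linear function f(x) = b0 + b1*x by least squares.
b1, b0 = np.polyfit(X_train[:, 0], y_train, deg=1)

# Approximate the expected error on unseen data by the test-set mean squared error.
mse_unseen = np.mean((y_test - (b0 + b1 * X_test[:, 0])) ** 2)
print(f"test MSE: {mse_unseen:.3f}")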

  8. Classical Linear Regression (OLS) • Explanatory and response variables are numeric • Relationship between the mean of the response variable and the level of the explanatory variable is assumed to be approximately linear (straight line) • Model: y = b0 + b1x + e • b1 > 0 ⇒ positive association • b1 < 0 ⇒ negative association • b1 = 0 ⇒ no association

  9. Classical Linear Regression (OLS) • b0: mean response when x = 0 (y-intercept) • b1: change in mean response when x increases by 1 unit (slope) • b0, b1 are unknown population parameters (like μ) • b0 + b1x: mean response when the explanatory variable takes on the value x • Task: minimize the sum of squared errors: min over b0, b1 of Σ (yi − b0 − b1xi)^2

  10. Classical Linear Regression (OLS) • Parameter: slope in the population model (b1) • Estimator: least squares estimate b1-hat = Σ (xi − x̄)(yi − ȳ) / Σ (xi − x̄)^2, with b0-hat = ȳ − b1-hat · x̄ • Estimated standard error: SE(b1-hat) = s / sqrt(Σ (xi − x̄)^2), where s is the residual standard error • Methods of making inference regarding the population: hypothesis tests (2-sided or 1-sided), confidence intervals
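
A minimal Python sketch of these closed-form estimates and of the standard error of the slope; the toy data are an assumption used only for illustration.

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.1, 1.9, 3.2, 3.8, 5.1, 5.9])
n = len(x)

# Least squares estimates of the slope b1 and the intercept b0.
b1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0_hat = y.mean() - b1_hat * x.mean()

# Residual standard error and the estimated standard error of the slope.
residuals = y - (b0_hat + b1_hat * x)
s = np.sqrt(np.sum(residuals ** 2) / (n - 2))
se_b1 = s / np.sqrt(np.sum((x - x.mean()) ** 2))

print(f"b1_hat = {b1_hat:.3f}, b0_hat = {b0_hat:.3f}, SE(b1_hat) = {se_b1:.3f}")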

  11. Classical Linear Regression (OLS)

  12. Classical Linear Regression (OLS)

  13. Classical Linear Regression (OLS) • Coefficient of determination (r^2): proportion of variation in y “explained” by the regression on x: r^2 = 1 − SSE/SST, where SSE = Σ (yi − ŷi)^2 (residual variation) and SST = Σ (yi − ȳ)^2 (total variation in y)
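
A short Python sketch of this quantity for a simple linear fit (toy data assumed, numpy only):

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.1, 1.9, 3.2, 3.8, 5.1, 5.9])

b1, b0 = np.polyfit(x, y, deg=1)    # least squares slope and intercept
y_hat = b0 + b1 * x                 # fitted values

sse = np.sum((y - y_hat) ** 2)      # variation left unexplained by the fit
sst = np.sum((y - y.mean()) ** 2)   # total variation in y
r_squared = 1 - sse / sst
print(f"r^2 = {r_squared:.3f}")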

  14. Classical Linear Regression (OLS): Multiple regression • Numeric response variable (y) • p numeric predictor variables • Model: Y = b0 + b1x1 + … + bpxp + e • Partial regression coefficients: bi = effect (on the mean response) of increasing the i-th predictor variable by 1 unit, holding all other predictors constant

  15. Classical Linear Regression (OLS): Ordinary Least Squares estimation • Population model for the mean response: E(y) = b0 + b1x1 + … + bpxp • Least squares fitted (predicted) equation, minimizing SSE = Σ (yi − ŷi)^2: ŷ = b0-hat + b1-hat x1 + … + bp-hat xp
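
A sketch of this fit in Python: the least squares coefficients minimize SSE, which a linear least squares routine solves directly. The simulated data and the true coefficients are assumptions for illustration.

import numpy as np

rng = np.random.default_rng(1)
n, p = 100, 3
X = rng.normal(size=(n, p))                            # p numeric predictors
y = 1.0 + X @ np.array([0.5, -2.0, 0.0]) + rng.normal(0, 0.5, n)

X_design = np.column_stack([np.ones(n), X])            # prepend a column of ones for b0
b_hat, *_ = np.linalg.lstsq(X_design, y, rcond=None)   # minimizes SSE = sum((y - X_design @ b)^2)

y_fitted = X_design @ b_hat
sse = np.sum((y - y_fitted) ** 2)
print("estimated coefficients (b0, b1, ..., bp):", np.round(b_hat, 3))
print(f"SSE = {sse:.3f}")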

  16. Classical Linear Regression (OLS): Ordinary Least Squares estimation • Model: y = b0 + b1x1 + … + bpxp + e • OLS estimation: minimize SSE = Σ (yi − ŷi)^2 • LASSO estimation: minimize SSE subject to Σ |bj| ≤ t (equivalently, minimize SSE + λ Σ |bj|) • Ridge regression estimation: minimize SSE subject to Σ bj^2 ≤ t (equivalently, minimize SSE + λ Σ bj^2)
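
A hedged sketch of the three estimators using scikit-learn's LinearRegression, Lasso and Ridge; the penalty weight alpha plays the role of the constraint/penalty on Σ|bj| or Σbj^2, and the data are simulated for illustration.

import numpy as np
from sklearn.linear_model import LinearRegression, Lasso, Ridge

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.5, 0.0, -2.0, 0.0, 0.5]) + rng.normal(0, 0.5, 100)

ols = LinearRegression().fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)   # L1 penalty: shrinks coefficients and can set some exactly to zero
ridge = Ridge(alpha=1.0).fit(X, y)   # L2 penalty: shrinks coefficients toward zero

print("OLS:  ", np.round(ols.coef_, 3))
print("LASSO:", np.round(lasso.coef_, 3))
print("Ridge:", np.round(ridge.coef_, 3))

A larger alpha corresponds to a tighter constraint t on the coefficients, which is what produces the coefficient paths on the next slide.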

  17. LASSO and Ridge estimation of model coefficients [Figure: coefficient paths plotted against sum(|beta|)]

  18. Nonparametric (local) regression estimation: k-NN, Decision trees, smoothers

  19. Nonparametric (local) regression estimation: k-NN, Decision trees, smoothers

  20. Nonparametric (local) regression estimation: k-NN, Decision trees, smoothers

  21. How to Choose k or h? Nonparametric (local) regression estimation: k-NN, Decision trees, smoothers • When k or h is small, single instances matter; bias is small, variance is large (undersmoothing): high complexity • As k or h increases, we average over more instances and variance decreases but bias increases (oversmoothing): low complexity • Cross-validation is used to fine-tune k or h.
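
A sketch of this cross-validation recipe for k-NN regression with scikit-learn; the simulated data and the candidate values of k are illustrative assumptions.

import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.2, 200)

scores = {}
for k in [1, 3, 5, 10, 20, 50]:
    knn = KNeighborsRegressor(n_neighbors=k)
    # 5-fold cross-validated MSE; small k = low bias / high variance, large k = the reverse
    mse = -cross_val_score(knn, X, y, cv=5, scoring="neg_mean_squared_error").mean()
    scores[k] = mse

best_k = min(scores, key=scores.get)
print("cross-validated MSE by k:", {k: round(v, 3) for k, v in scores.items()})
print("chosen k:", best_k)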

  22. Linear Support Vector Regression [Figure: three fits of Expenditures against Age with ε-tubes of different widths; the points lying on the tube boundary are the “support vectors”] • Small tube area: “suspiciously smart case” (overfitting) • Middle-sized tube area: “compromise case”, SVR (good generalisation) • Biggest tube area: “lazy case” (underfitting) • The thinner the “tube”, the more complex the model

  23. Nonlinear Support Vector Regression [Figure: Expenditures against Age] • Map the data into a higher-dimensional space:

  24. Nonlinear Support Vector Regression [Figure: Expenditures against Age] • Map the data into a higher-dimensional space:

  25. Nonlinear Support Vector Regression: Technicalities • The SVR function: f(x) = Σ (αi − αi*) K(xi, x) + b • To find the unknown parameters of the SVR function, solve: minimize (1/2)||w||^2 + C Σ (ξi + ξi*), subject to: yi − f(xi) ≤ ε + ξi, f(xi) − yi ≤ ε + ξi*, ξi, ξi* ≥ 0 • How to choose C, ε and the kernel K? • K = RBF kernel: K(xi, xj) = exp(−γ ||xi − xj||^2) • Find C, ε and γ from a cross-validation procedure

  26. SVR Technicalities: Model Selection • Do 5-fold cross-validation to find C and γ for several fixed values of ε.
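
A sketch of this recipe with scikit-learn's SVR and GridSearchCV: an RBF-kernel SVR whose C and gamma are chosen by 5-fold cross-validation for one fixed value of epsilon. The data and the parameter grids are illustrative assumptions.

import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(4)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sinc(X[:, 0]) + rng.normal(0, 0.1, 200)

param_grid = {"C": [0.1, 1, 10, 100], "gamma": [0.01, 0.1, 1, 10]}
search = GridSearchCV(SVR(kernel="rbf", epsilon=0.1),   # epsilon fixed: half-width of the tube
                      param_grid, cv=5, scoring="neg_mean_squared_error")
search.fit(X, y)

print("best C and gamma:", search.best_params_)
print("cross-validated MSE:", -search.best_score_)

Repeating the grid search for a few fixed values of epsilon and keeping the combination with the lowest cross-validated MSE matches the procedure described on the slide.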

  27. SVR Study: Model Training, Selection and Prediction [Figures: cross-validated MSE, CVMSE(IR*, HR*, CR*); true returns (red) and raw predictions (blue)]

  28. SVR: Individual Effects

  29. SVR Technicalities: SVR vs. OLS • Performance on the test set: SVR MSE = 0.04, OLS MSE = 0.23

  30. Technical Note: Number of Training Errors vs. Model Complexity [Figure: training errors and test errors plotted against model complexity, with functions ordered in increasing complexity and the best trade-off marked] MATLAB video here…

  31. Variable selection for regression • Akaike Information Criterion (AIC). Final prediction error: AIC = −2 ln L + 2k, where L is the maximized likelihood and k is the number of estimated parameters.

  32. Variable selection for regression • Bayesian Information Criterion (BIC), also known as the Schwarz criterion. Final prediction error: BIC = −2 ln L + k ln n, where n is the number of observations. • BIC tends to choose simpler models than AIC.

  33. Variable selection for regression • R^2-adjusted: R^2-adj = 1 − (1 − R^2)(n − 1) / (n − p − 1), where p is the number of predictors.
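
A sketch of how these three criteria (slides 31-33) can be compared across candidate models using statsmodels; the simulated data and the two candidate variable sets are assumptions for illustration.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 200
X = rng.normal(size=(n, 3))
y = 1.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(0, 1, n)   # the third variable is irrelevant

for cols, label in [([0, 1], "x1 + x2"), ([0, 1, 2], "x1 + x2 + x3")]:
    design = sm.add_constant(X[:, cols])     # add the intercept column
    fit = sm.OLS(y, design).fit()
    print(f"{label:12s} AIC={fit.aic:8.2f}  BIC={fit.bic:8.2f}  adj. R^2={fit.rsquared_adj:.3f}")

Lower AIC/BIC and higher adjusted R^2 favour a model; here all three criteria should typically prefer the smaller model, with BIC penalising the extra variable more heavily than AIC.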

  34. Conclusion / Summary / References • Classical Linear Regression: any introductory statistical/econometric book • LASSO and Ridge Regression (linear and nonlinear): http://www-stat.stanford.edu/~tibs/lasso.html ; Bishop, 2006 • Nonparametric (local) regression estimation (kNN for regression, Decision trees, Smoothers): Alpaydin, 2004; Hastie et al., 2001 • Support Vector Regression (linear and nonlinear): Smola and Schoelkopf, 2003 • Variable/feature selection (AIC, BIC, R^2-adjusted): Hastie et al., 2001; any statistical/econometric book
