
Regression. AMCS/CS 340: Data Mining



Presentation Transcript


  1. Regression AMCS/CS 340: Data Mining Xiangliang Zhang King Abdullah University of Science and Technology

  2. Outline • What is regression? • Minimizing sum-of-squares error • Linear regression • Nonlinear regression • Statistical models for regression • Overfitting and Cross validation for regression

  3. Classification (reminder) X → Y • X can be anything: continuous (ℝ, ℝ^d, …), discrete ({0,1}, {1,…,k}, …), structured (tree, string, …) • Y is discrete: {0,1} binary; {1,…,k} multi-class; tree etc. structured

  4. Regression X → Y • X can be anything: continuous (ℝ, ℝ^d, …), discrete ({0,1}, {1,…,k}, …), structured (tree, string, …) • Y is continuous: ℝ, ℝ^d

  5. Examples • Data → Predicting or Forecasting: • Processes, memory → Power consumption • Protein structure → Energy • Heart-beat rate, age, speed, duration → Fat • Oil supply, consumption, etc. → Oil price • ……

  6. Linear regression [Figure: 3-D scatter of temperature against two input variables] Given examples (x_i, y_i), i = 1, …, N; given a new point x, predict the corresponding y.

  7. Linear regression [Figure: plane fitted to the temperature data, with predictions read off the plane at new input points]

  8. Outline • What is regression? • Minimizing sum-of-squares error • Linear regression • Nonlinear regression • Statistical models for regression • Overfitting and Cross validation for regression

  9. Sum-of-Squares Error Function The error or “residual” is the difference between an observation t_n and the prediction y(x_n, w). Fit the model by minimizing the sum squared error E(w) = ½ Σ_n (y(x_n, w) − t_n)².

  10. Sum-of-Squares Error Function Minimize E(w) to find w*. E(w) is a quadratic function of w, so the derivative of E(w) w.r.t. w is linear in w, and setting it to zero gives a unique solution w* that minimizes E(w).

  11. Least Squares [Figure: fitted line through the data; the vertical gaps between observations and predictions are the errors or “residuals”] Estimate the unknown parameter w by minimizing the sum squared error.

  12. Minimize the sum squared error Set the derivative of the sum squared error to zero, ∂E(w)/∂w = 0, and solve for w*; then use w* to predict y for a new x. Demo: http://www.lri.fr/~xlzhang/KAUST/CS340_slides/linear_regression_demo1.m
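The linked demo is a MATLAB script; below is a minimal NumPy sketch of the same closed-form fit, with made-up toy data (names and values are illustrative assumptions, not the course demo).

```python
import numpy as np

# Toy data (illustrative assumption): t = 2x + 1 plus noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 20, size=50)
t = 2.0 * x + 1.0 + rng.normal(scale=1.0, size=50)

# Design matrix with a constant column so the first weight acts as a bias
X = np.column_stack([np.ones_like(x), x])

# Setting dE/dw = 0 yields the normal equations X^T X w = X^T t;
# lstsq solves them in a numerically stable way
w, *_ = np.linalg.lstsq(X, t, rcond=None)

x_new = 12.0
print("w* =", w, " prediction at x_new:", w[0] + w[1] * x_new)
```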

  13. Linear regression models Generally y(x, w) = Σ_{j=0}^{M−1} w_j Φ_j(x) = wᵀΦ(x), where the Φ_j(x) are known as basis functions. Typically, Φ_0(x) = 1, so that w_0 acts as a bias. E.g. Φ(x) = [1, x, x², x³].

  14. Linear regression models Example: polynomial curve fitting, y(x, w) = w_0 + w_1 x + w_2 x² + … + w_M x^M.
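As a concrete sketch (data, degree, and noise level are assumptions for illustration), polynomial curve fitting is a one-liner with NumPy:

```python
import numpy as np

# Noisy samples of sin(2*pi*x), a common curve-fitting toy problem
rng = np.random.default_rng(1)
x = np.linspace(0, 1, 10)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.1, size=x.size)

# Fit t ~ w0 + w1*x + w2*x^2 + w3*x^3; polyfit returns highest power first
coeffs = np.polyfit(x, t, deg=3)
print("prediction at x = 0.5:", np.polyval(coeffs, 0.5))
```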

  15. Basis Functions Polynomial basis functions: Φ_j(x) = x^j. These are global; a small change in x affects all basis functions. Gaussian basis functions: Φ_j(x) = exp(−(x − μ_j)² / 2s²). Sigmoidal basis functions: Φ_j(x) = σ((x − μ_j)/s), where σ(a) = 1/(1 + e^(−a)). Gaussian and sigmoidal basis functions are local; a small change in x only affects nearby basis functions. μ_j and s control location and scale (width or slope).
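A sketch of the three basis families in code (the centers μ_j and scale s below are arbitrary choices), plus the kind of design matrix a linear model would then be fit on:

```python
import numpy as np

def polynomial_basis(x, j):
    # Global: a small change in x moves every phi_j(x)
    return x ** j

def gaussian_basis(x, mu, s):
    # Local: bump centered at mu with width s
    return np.exp(-((x - mu) ** 2) / (2 * s ** 2))

def sigmoidal_basis(x, mu, s):
    # Local: smooth step located at mu, slope controlled by s
    return 1.0 / (1.0 + np.exp(-(x - mu) / s))

# Design matrix Phi: constant column (bias w0) plus 5 Gaussian bases
x = np.linspace(0, 1, 25)
centers = np.linspace(0, 1, 5)
Phi = np.column_stack([np.ones_like(x)] +
                      [gaussian_basis(x, mu, s=0.2) for mu in centers])
print(Phi.shape)  # (25, 6)
```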

  16. Minimize the sum squared error Set ∂E(w)/∂w = 0 and solve for w*; then predict y = w*ᵀΦ(x) for a new x. Demo: http://www.lri.fr/~xlzhang/KAUST/CS340_slides/linear_regression_demo2.m

  17. Batch gradient descent algorithm (1) If the derivative of E(w) w.r.t. w is not linear in w, there is no closed-form solution; instead, minimize E(w) by batch gradient descent.

  18. Batch gradient descent algorithm (2) Batch gradient descent example: update the value of w according to the gradient, w ← w − η ∂E(w)/∂w, where η is the learning rate.

  19. Batch gradient descent algorithm (3) Batch gradient descent example: the updates iteratively approach the optimum of the error function.
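A minimal sketch of batch gradient descent on the sum-of-squares error (the learning rate, iteration count, and per-sample averaging are assumptions; the slides' worked example is not reproduced here):

```python
import numpy as np

def batch_gradient_descent(X, t, lr=0.1, n_iters=1000):
    """Minimize E(w) = 0.5 * ||X w - t||^2 by full-batch gradient steps."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iters):
        grad = X.T @ (X @ w - t) / n  # dE/dw averaged over the whole batch
        w -= lr * grad                # step against the gradient
    return w

# Quick check on a known line t = 3x (toy data, illustrative only)
x = np.linspace(0, 1, 50)
X = np.column_stack([np.ones_like(x), x])
print(batch_gradient_descent(X, 3 * x, lr=0.5, n_iters=5000))
```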

  20. Statistical models for regression • Machine Learning paradigms • Regression Tree (CART) • Neural Networks • Support Vector Machine • Strength • flexibility within a nonparametric, assumption-free framework that accommodates big data • Weakness • difficulty in interpreting the effects of each covariate

  21. Regression Tree [Figure: regression tree whose internal nodes split on feature values and whose leaves predict a target value] http://chem-eng.utoronto.ca/~datamining/dmc/decision_tree_reg.htm
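A sketch using scikit-learn's CART-style regression tree (the dataset and max_depth here are arbitrary assumptions for illustration):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy data: noisy sine curve
rng = np.random.default_rng(2)
X = rng.uniform(0, 5, size=(100, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=100)

# CART recursively splits the input space; each leaf predicts the
# mean target value of the training points that fall into it
tree = DecisionTreeRegressor(max_depth=3).fit(X, y)
print(tree.predict([[2.5]]))
```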

  22. SVM Regression Given training data (x_1, y_1), …, (x_N, y_N), find f(x) = wᵀx + b such that f optimally describes the data: minimize ½‖w‖² + C Σ_i (ξ_i + ξ_i*) (a regularizer plus the sum of errors), subject to y_i − f(x_i) ≤ ε + ξ_i, f(x_i) − y_i ≤ ε + ξ_i*, and ξ_i, ξ_i* ≥ 0. The points lying on or outside the ε-tube are the “support vectors”.
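scikit-learn's SVR solves this kind of ε-insensitive problem; a sketch with made-up data and arbitrary hyperparameters:

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(3)
X = rng.uniform(0, 5, size=(100, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=100)

# C weights the sum of errors xi_i; epsilon is the half-width of the
# tube inside which errors cost nothing
svr = SVR(kernel="rbf", C=10.0, epsilon=0.1).fit(X, y)
print("number of support vectors:", len(svr.support_))
```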

  23. NN Regression Train the network weights w to minimize the sum-of-squares error E(w) = ½ Σ_n (y(x_n, w) − t_n)².
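A sketch with scikit-learn's MLPRegressor, which fits the network weights by minimizing squared error (the architecture and data are assumptions):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(4)
X = rng.uniform(0, 5, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

# One hidden layer of 20 units; training minimizes squared error
# over the weights by gradient-based optimization
nn = MLPRegressor(hidden_layer_sizes=(20,), max_iter=5000,
                  random_state=0).fit(X, y)
print(nn.predict([[2.5]]))
```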

  24. Outline • What is regression? • Minimizing sum-of-squares error • Overfitting and Cross validation for regression

  25. Regression overfitting [Figure: several curves of increasing complexity fitted to the same data] • Which one is the best? • The one with the best fit to the data? • How well is it going to predict future data drawn from the same distribution?

  26. Outline • What is regression? • Minimizing sum-of-squares error • Overfitting and Cross validation for regression • Test set method • Leave-one-out • K-fold cross validation

  27. The test set method Randomly choose part of the data (e.g. 30%) to be in a test set; the remainder is a training set.

  28. The test set method Perform your regression on the training set.

  29. The test set method Estimate your future performance with the error on the test set.

  30. The test set method • Pros • very simple • Can then simply choose the method with the best test-set score • Cons • Wastes data: we get an estimate of the best method to apply to 30% less data • If we don’t have much data, our test-set might just be lucky or unlucky (“test-set estimator of performance has high variance”)
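A sketch of the test set method with scikit-learn (the 30% split mirrors the slide; the data are made up):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(5)
X = rng.uniform(0, 5, size=(100, 1))
y = 2 * X.ravel() + rng.normal(scale=0.5, size=100)

# Hold out 30% of the data as a test set, fit on the rest
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          random_state=0)
model = LinearRegression().fit(X_tr, y_tr)
print("test-set MSE:", mean_squared_error(y_te, model.predict(X_te)))
```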

  31. Outline • What is regression? • Minimizing sum-of-squares error • Overfitting and Cross validation for regression • Test set method • Leave-one-out • K-fold cross validation

  32–35. LOOCV (Leave-one-out Cross Validation) [Figure sequence: each point held out in turn] For each data point: temporarily remove it, train on the remaining N−1 points, and note the error on the held-out point.

  36. LOOCV (Leave-one-out Cross Validation) When you have done all points, report the mean error.

  37. LOOCV for Linear Regression For k = 1 to N: 1. Let (x_k, y_k) be the k-th record. 2. Temporarily remove (x_k, y_k) from the dataset. 3. Train on the remaining N−1 data points. 4. Note your error on (x_k, y_k). When you’ve done all points, report the mean error. MSE_LOOCV = 2.12
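The loop above translates directly into code; a minimal sketch (the model is swappable, and the slides' dataset and MSE value are not reproduced here):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def loocv_mse(X, y, make_model=LinearRegression):
    errors = []
    for k in range(len(y)):
        mask = np.arange(len(y)) != k                # temporarily remove (x_k, y_k)
        model = make_model().fit(X[mask], y[mask])   # train on the other N-1 points
        err = model.predict(X[k:k + 1])[0] - y[k]    # error on the held-out point
        errors.append(err ** 2)
    return np.mean(errors)                           # mean error over all N points
```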

  38. LOOCV for Quadratic Regression For k = 1 to N: 1. Let (x_k, y_k) be the k-th record. 2. Temporarily remove (x_k, y_k) from the dataset. 3. Train on the remaining N−1 data points. 4. Note your error on (x_k, y_k). When you’ve done all points, report the mean error. MSE_LOOCV = 0.962

  39. LOOCV for Join the Dots For k = 1 to N: 1. Let (x_k, y_k) be the k-th record. 2. Temporarily remove (x_k, y_k) from the dataset. 3. Train on the remaining N−1 data points. 4. Note your error on (x_k, y_k). When you’ve done all points, report the mean error. MSE_LOOCV = 3.33

  40. Which validation method should we use? k-fold Cross Validation gets the best of both worlds.

  41. Outline • What is regression? • Minimizing sum-of-squares error • Overfitting and Cross validation for regression • Test set method • Leave-one-out • K-fold cross validation

  42. k-fold Cross Validation Randomly break the dataset into k partitions (in our example we’ll have k = 3 partitions colored Red, Green, and Blue).

  43. k-fold Cross Validation For the red partition: train on all the points not in the red partition, then find the test-set sum of errors on the red points.

  44. k-fold Cross Validation For the green partition: train on all the points not in the green partition, then find the test-set sum of errors on the green points.

  45. k-fold Cross Validation For the blue partition: train on all the points not in the blue partition, then find the test-set sum of errors on the blue points.

  46. k-fold Cross Validation Then report the mean error. Linear Regression: MSE_3FOLD = 2.05

  47. k-fold Cross Validation Quadratic Regression: MSE_3FOLD = 1.11

  48. k-fold Cross Validation Join the Dots: MSE_3FOLD = 2.93
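A sketch of the 3-fold procedure (the shuffling seed and model are assumptions; the slides' MSE values come from their own dataset):

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import LinearRegression

def kfold_mse(X, y, k=3):
    total = 0.0
    # Randomly break the dataset into k partitions
    for train_idx, test_idx in KFold(n_splits=k, shuffle=True,
                                     random_state=0).split(X):
        # Train on all the points not in this partition...
        model = LinearRegression().fit(X[train_idx], y[train_idx])
        # ...and accumulate the test-set sum of errors on the partition
        resid = model.predict(X[test_idx]) - y[test_idx]
        total += np.sum(resid ** 2)
    return total / len(y)  # then report the mean error
```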

  49. CV-based Regression Algorithm Choice • Choosing which regression algorithm to use • Step 1: Compute the 10-fold-CV error for six different models. • Step 2: Whichever algorithm gave the best CV score: train it with all the data, and that’s the predictive model you’ll use.
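A sketch of this two-step recipe with scikit-learn (three candidate models instead of six, on made-up data):

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.svm import SVR

rng = np.random.default_rng(6)
X = rng.uniform(0, 5, size=(120, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.2, size=120)

candidates = {
    "linear": LinearRegression(),
    "tree": DecisionTreeRegressor(max_depth=3),
    "svr": SVR(),
}

# Step 1: 10-fold-CV error for every candidate model
cv_mse = {name: -cross_val_score(m, X, y, cv=10,
                                 scoring="neg_mean_squared_error").mean()
          for name, m in candidates.items()}

# Step 2: retrain the best-scoring model on ALL the data
best = min(cv_mse, key=cv_mse.get)
final_model = candidates[best].fit(X, y)
print(best, cv_mse)
```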
