Linear Regression


Presentation Transcript


  1. Linear Regression Fall 2014 The University of Iowa Tianbao Yang

  2. Content • Linear Regression with one variable • Probability Interpretation • Linear Basis Function Models • Optimization • Multiple Outputs • Regularization and Lasso • Bias and Variance Tradeoff • Model Selection

  3. Linear Regression with One Variable • Example: predict house price • Training Data: a set of examples \((x_i, y_i),\ i = 1, \dots, N\) • input (feature): size of house • output (target): house price • first-order linear model: \(f(x) = w_0 + w_1 x\) [Figure: price vs. size with a fitted line]

  4. Linear Regression with One Variable • How to estimate the model parameters \(w_0, w_1\)? [Figure: price vs. size]

  6. Linear Regression with One Variable • How to estimate the model parameters • Criterion: minimize the error on training data • the loss function \(\ell(w_0, w_1) = (y - w_0 - w_1 x)^2\) measures the error on one example (a function of the parameters) [Figure: price vs. size with residuals]

  8. Linear Regression with One Variable • To estimate the model parameters • Criterion: minimize the error on training data • the loss function measures the error • minimize the sum of all losses: \(\min_{w_0, w_1} \sum_{i=1}^{N} (y_i - w_0 - w_1 x_i)^2\) • using the square loss in this way is Least Square Regression; a sketch follows
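
A minimal Python sketch of this one-variable least-squares fit; the toy size/price numbers are invented for illustration, not taken from the slides:

```python
import numpy as np

# Toy training data: house size (feature) and price (target); values are illustrative.
x = np.array([50.0, 80.0, 100.0, 120.0, 150.0])    # size
y = np.array([150.0, 220.0, 280.0, 310.0, 400.0])  # price

# Closed-form minimizer of sum_i (y_i - w0 - w1*x_i)^2:
#   w1 = cov(x, y) / var(x),  w0 = mean(y) - w1 * mean(x)
w1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
w0 = y.mean() - w1 * x.mean()
print(f"fit: price = {w0:.2f} + {w1:.2f} * size")
```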

  9. Content • Linear Regression with one variable • Probability Interpretation • Linear Basis Function Models • Optimization • Multiple Outputs • Regularization and Lasso • Bias and Variance Tradeoff • Bayesian Regression

  10. Supervised Learning • Training examples: \(\{(\mathbf x_i, y_i)\}_{i=1}^{N}\) • independent and identically distributed (i.i.d.) assumption: examples are drawn independently from the same underlying distribution • A critical assumption for machine learning theory

  11. Probability Interpretation • Training Data: a set of examples • input (feature): size of house • output (target): house price • input and output are random variables related by \(y = w_0 + w_1 x + \epsilon\), with Gaussian noise \(\epsilon \sim \mathcal N(0, \sigma^2)\) [Figure: price vs. size]

  12. Data Likelihood • Training Data: a set of examples \((x_i, y_i)\) • under the noise model, \(p(y \mid x) = \mathcal N(y;\ w_0 + w_1 x,\ \sigma^2)\), with mean \(w_0 + w_1 x\) and variance \(\sigma^2\) • Data Likelihood under the i.i.d. assumption: \(p(y_1, \dots, y_N \mid x_1, \dots, x_N) = \prod_{i=1}^{N} p(y_i \mid x_i)\)

  13. Maximum Likelihood Estimation (MLE) • Estimate the model parameters by maximizing the data likelihood: \(\max_{w_0, w_1} \prod_{i=1}^{N} p(y_i \mid x_i)\), or equivalently the log-likelihood \(\max_{w_0, w_1} \sum_{i=1}^{N} \log p(y_i \mid x_i)\)

  14. MLE is Equivalent to Least Square Regression • Maximum Likelihood Estimation: the log-likelihood is \(\sum_{i=1}^{N} \log p(y_i \mid x_i) = -\frac{N}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{N} (y_i - w_0 - w_1 x_i)^2\) • the first term and the factor \(1/\sigma^2\) do not affect the maximizing \(w_0, w_1\)

  15. MLE is Equivalent to Least Square Regression • Least Square Regression IS Maximum Likelihood Estimation under Gaussian noise: \(\arg\max_{w_0, w_1} \sum_{i} \log p(y_i \mid x_i) = \arg\min_{w_0, w_1} \sum_{i} (y_i - w_0 - w_1 x_i)^2\)
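
A small numerical check of this equivalence, assuming NumPy and SciPy are available; the synthetic data and noise level are illustrative:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 2.0 + 0.5 * x + rng.normal(0, 1.0, size=100)   # y = w0 + w1*x + Gaussian noise

def neg_log_likelihood(w, sigma2=1.0):
    resid = y - (w[0] + w[1] * x)
    # negated sum over i of log N(y_i; w0 + w1*x_i, sigma2)
    return 0.5 * len(y) * np.log(2 * np.pi * sigma2) + 0.5 * np.sum(resid ** 2) / sigma2

w_mle = minimize(neg_log_likelihood, x0=np.zeros(2)).x
w_ls = np.polyfit(x, y, deg=1)[::-1]               # least-squares fit, reordered to [w0, w1]
print(w_mle, w_ls)                                 # the two agree up to optimizer tolerance
```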

  16. Probability Interpretation of Linear Regression • Linear Regression with one variable • Probability Interpretation • Linear Basis Function Models • Optimization • Multiple Outputs • Regularization and Lasso • Bias and Variance Tradeoff • Model Selection

  17. Linear Basis Function Models • Example: Polynomial Curve Fitting

  18. 0th order Polynomial

  19. 1st order Polynomial

  20. 3rd order Polynomial

  21. 9th order Polynomial
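
The behavior across these orders can be reproduced with a short sketch; the noisy \(\sin(2\pi x)\) data-generating curve is an assumption, following the classic curve-fitting example:

```python
import numpy as np

# Noisy samples of sin(2*pi*x); the target curve and noise level are illustrative.
rng = np.random.default_rng(1)
x = np.linspace(0, 1, 10)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, size=x.size)

for M in (0, 1, 3, 9):
    coeffs = np.polyfit(x, y, deg=M)                        # least-squares polynomial fit
    train_mse = np.mean((np.polyval(coeffs, x) - y) ** 2)   # error on the training points
    print(f"M={M}: training MSE = {train_mse:.4f}")
# Training error shrinks as M grows; M=9 passes through all 10 points
# (near-zero error) but oscillates wildly between them -- over-fitting.
```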

  22. Linear Basis Function Models • generally \(y(\mathbf x, \mathbf w) = \sum_{j=0}^{M-1} w_j \phi_j(\mathbf x) = \mathbf w^\top \boldsymbol\phi(\mathbf x)\) • where \(\phi_j(\mathbf x)\) are known as basis functions. • typically \(\phi_0(\mathbf x) = 1\), so that \(w_0\) acts as a bias.

  23. Linear Basis Function Models • Polynomial basis functions: \(\phi_j(x) = x^j\) • These are global; a small change in x affects all basis functions.

  24. Linear Basis Function Models • Gaussian basis functions: \(\phi_j(x) = \exp\left(-\frac{(x - \mu_j)^2}{2s^2}\right)\) • These are local; a small change in x only affects nearby basis functions.

  25. Linear Basis Function Models • Sigmoidal basis functions: \(\phi_j(x) = \sigma\left(\frac{x - \mu_j}{s}\right)\), where \(\sigma(a) = \frac{1}{1 + e^{-a}}\) • These are local; a small change in x only affects nearby basis functions. A sketch of all three families follows.
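
A sketch of the three basis-function families as Python functions; the centers \(\mu_j\) and width \(s\) are illustrative choices, not values from the slides:

```python
import numpy as np

def polynomial_basis(x, M):
    """phi_j(x) = x^j for j = 0..M-1 (global basis functions)."""
    return np.stack([x ** j for j in range(M)], axis=1)

def gaussian_basis(x, centers, s=0.2):
    """phi_j(x) = exp(-(x - mu_j)^2 / (2 s^2)) (local basis functions)."""
    return np.exp(-((x[:, None] - centers[None, :]) ** 2) / (2 * s ** 2))

def sigmoidal_basis(x, centers, s=0.2):
    """phi_j(x) = sigma((x - mu_j) / s) with sigma(a) = 1 / (1 + e^-a)."""
    return 1.0 / (1.0 + np.exp(-(x[:, None] - centers[None, :]) / s))

x = np.linspace(0, 1, 50)
mu = np.linspace(0, 1, 5)          # illustrative basis centers
Phi = gaussian_basis(x, mu)        # design matrix, shape (50, 5)
print(Phi.shape)
```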

  26. Linear Regression with Multi-Variables • Example: predict house price • Training Data: a set of examples • input (features): size of house, year of house, etc., collected as \(\mathbf x \in \mathbb R^d\) • output (target): house price • model: \(f(\mathbf x) = \mathbf w^\top \mathbf x\)

  27. Least Square Regression • Minimize the Sum of Square Loss: \(\min_{\mathbf w} \sum_{i=1}^{N} (y_i - \mathbf w^\top \mathbf x_i)^2 = \min_{\mathbf w} \|\mathbf y - X\mathbf w\|_2^2\)

  28. Content • Linear Regression with one variable • Probability Interpretation • Linear Basis Function Models • Optimization • Multiple Outputs • Regularization and Lasso • Bias and Variance Tradeoff • Bayesian Regression

  29. Procedures of Machine Learning • A three-step view of machine learning • data collection (and pre-processing) • model building (and analysis) • optimization [Diagram: Data → Model → Optimization]

  30. Optimization • Minimize the Sum of Square Loss \(L(\mathbf w) = \|\mathbf y - X\mathbf w\|_2^2\) • Unconstrained Convex Optimization 1. compute the gradient with respect to (w.r.t.) \(\mathbf w\): \(\nabla L(\mathbf w) = 2X^\top(X\mathbf w - \mathbf y)\)

  31. Optimization • Unconstrained Convex Optimization 2. set the gradient to zero: \(X^\top X \mathbf w = X^\top \mathbf y\), the normal equations, giving \(\mathbf w^\star = (X^\top X)^{-1} X^\top \mathbf y\); a sketch follows
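
A minimal NumPy sketch of solving the normal equations on synthetic data; solving the linear system (or using lstsq) is preferable to forming the explicit inverse:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 3))                      # synthetic design matrix
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + rng.normal(0, 0.1, size=100)

w_normal = np.linalg.solve(X.T @ X, X.T @ y)       # solve (X^T X) w = X^T y
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)    # numerically safer route
print(w_normal, w_lstsq)                           # both recover w_true approximately
```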

  32. Geometry of Least Square • Minimize the Sum of Square Loss • the columns of \(X\) span a subspace • \(\hat{\mathbf y} = X\mathbf w^\star\) minimizes the distance between \(\mathbf y\) and this subspace, i.e., \(\hat{\mathbf y}\) is the orthogonal projection of \(\mathbf y\) onto it

  33. Large-scale Regression • expensive computation: solving the normal equations costs \(O(Nd^2 + d^3)\), which is prohibitive when the number of training data points \(N\) and the dimensionality \(d\) are both large • too many features, too many data points

  34. Gradient Descent • Gradient Descent: repeat \(\mathbf w \leftarrow \mathbf w - \eta \nabla L(\mathbf w)\), where \(\eta\) is the step size and the gradient is computed over all \(N\) examples

  35. Stochastic Gradient Descent • Stochastic Gradient Descent: repeat, for a randomly chosen example \(i\), \(\mathbf w \leftarrow \mathbf w - \eta_t \nabla \ell_i(\mathbf w)\), where \(\eta_t\) is the step-size

  36. Stochastic Gradient Descent vs. Gradient Descent • SGD takes cheap, noisy steps using one example at a time; GD takes expensive, stable steps using all \(N\) examples • a sketch comparing the two follows
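
A rough sketch of the two update rules on a synthetic least-squares problem; the step sizes and epoch counts are illustrative, not tuned values from the lecture:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(500, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=500)

def gd(X, y, eta=0.001, epochs=100):
    """Full-batch gradient descent: each step uses all N examples."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        w -= eta * 2 * X.T @ (X @ w - y)
    return w

def sgd(X, y, eta=0.01, epochs=20):
    """Stochastic gradient descent: each step uses one random example."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            w -= eta * 2 * (X[i] @ w - y[i]) * X[i]
    return w

print(gd(X, y), sgd(X, y))   # both approach the least-squares solution
```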

  37. Content • Linear Regression with one variable • Probability Interpretation • Linear Basis Function Models • Optimization • Multiple Outputs • Regularization and Lasso • Bias and Variance Tradeoff • Bayesian Regression

  38. Multi-task Learning • Predict multiple outputs • Example: predict the current house price and the house price after two years

  39. Multi-task Learning • predict multiple outputs from the same features: stack the \(K\) targets as columns of \(Y\) and solve \(\min_{W} \|Y - XW\|_F^2\), giving \(W = (X^\top X)^{-1} X^\top Y\); see the sketch below
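
A minimal sketch, assuming the \(K\) outputs share one design matrix \(X\); np.linalg.lstsq handles a matrix right-hand side directly, solving each output column through the same normal equations:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 4))                      # shared input features
W_true = rng.normal(size=(4, 2))                   # two tasks: price now, price in 2 years
Y = X @ W_true + 0.1 * rng.normal(size=(200, 2))

W_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)      # solves all outputs at once
print(np.allclose(W_hat, W_true, atol=0.1))
```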

  40. Content • Linear Regression with one variable • Probability Interpretation • Linear Basis Function Models • Optimization • Multiple Outputs • Regularization and Lasso and more • Bias and Variance Tradeoff

  41. 9th Order Polynomial

  42. Over-fitting • Root-Mean-Square (RMS) Error: \(E_{\mathrm{RMS}} = \sqrt{2E(\mathbf w^\star)/N}\) [Figure: training and test \(E_{\mathrm{RMS}}\) versus the polynomial order M]
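
Assuming \(E(\mathbf w)\) is the usual half sum-of-squares, \(E_{\mathrm{RMS}}\) reduces to the root of the mean squared residual; a sketch evaluating it on training and held-out data (the data-generating curve is again the illustrative noisy sine):

```python
import numpy as np

rng = np.random.default_rng(5)

def sample(n):
    x = rng.uniform(0, 1, n)
    return x, np.sin(2 * np.pi * x) + rng.normal(0, 0.2, n)

x_tr, y_tr = sample(10)     # small training set
x_te, y_te = sample(100)    # held-out test set

for M in (0, 1, 3, 9):
    c = np.polyfit(x_tr, y_tr, deg=M)
    rms = lambda x, y: np.sqrt(np.mean((np.polyval(c, x) - y) ** 2))
    print(f"M={M}: train RMS {rms(x_tr, y_tr):.3f}, test RMS {rms(x_te, y_te):.3f}")
# Test RMS typically worsens at M=9 even as training RMS approaches zero.
```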

  43. Polynomial Coefficients

  44. Avoid Over-fitting: Regularization • Consider the error function: loss term + regularization term, with regularization parameter \(\lambda\) • With the sum-of-squares error function and a quadratic regularizer, we get \(\tilde E(\mathbf w) = \frac{1}{2}\sum_{i=1}^{N} (y_i - \mathbf w^\top \mathbf x_i)^2 + \frac{\lambda}{2}\|\mathbf w\|_2^2\) • which is minimized by \(\mathbf w = (\lambda I + X^\top X)^{-1} X^\top \mathbf y\) • this is Ridge Regression; a sketch follows
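
A minimal sketch of the ridge closed form; the \(\lambda\) values and synthetic data are illustrative:

```python
import numpy as np

def ridge(X, y, lam):
    """Closed-form ridge solution w = (lam*I + X^T X)^{-1} X^T y."""
    d = X.shape[1]
    return np.linalg.solve(lam * np.eye(d) + X.T @ X, X.T @ y)

rng = np.random.default_rng(6)
X = rng.normal(size=(30, 10))
y = X @ rng.normal(size=10) + 0.1 * rng.normal(size=30)
for lam in (0.0, 0.1, 10.0):
    print(f"lambda={lam}: ||w|| = {np.linalg.norm(ridge(X, y, lam)):.3f}")
# The norm of the solution shrinks as lambda grows.
```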

  45. Avoid Over-fitting: Regularization

  46. Geometric Explanation

  47. Analytical Explanation • See homework

  48. Probability Interpretation • Maximum a posteriori (MAP) Estimation • Bayes' Theorem: \(p(\mathbf w \mid D) = \frac{p(D \mid \mathbf w)\, p(\mathbf w)}{p(D)}\), i.e., posterior of model \(\propto\) data likelihood \(\times\) prior of model • Ridge regression is to maximize a posterior distribution

  49. Probability Interpretation • Maximum a posteriori Estimation: since \(p(D)\) does not depend on \(\mathbf w\), maximize \(\log p(D \mid \mathbf w) + \log p(\mathbf w)\) • Ridge regression is to maximize a posterior distribution

  50. Probability Interpretation • Maximum a posteriori Estimation • the prior distribution is a Gaussian distribution: \(p(\mathbf w) = \mathcal N(\mathbf w;\ \mathbf 0,\ \alpha^{-1} I)\) • with the Gaussian likelihood and prior, the MAP estimate is exactly ridge regression with \(\lambda = \alpha\sigma^2\); a check follows
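
A numerical check of this correspondence; the noise variance \(\sigma^2\) and prior precision \(\alpha\) are illustrative choices, and SciPy is assumed available:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(7)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -1.0, 2.0]) + 0.3 * rng.normal(size=50)
sigma2, alpha = 0.09, 2.0          # noise variance and prior precision (illustrative)

def neg_log_posterior(w):
    # -log likelihood - log prior, dropping terms that do not depend on w
    return 0.5 * np.sum((y - X @ w) ** 2) / sigma2 + 0.5 * alpha * w @ w

w_map = minimize(neg_log_posterior, np.zeros(3)).x
lam = alpha * sigma2               # the equivalent ridge parameter
w_ridge = np.linalg.solve(lam * np.eye(3) + X.T @ X, X.T @ y)
print(np.allclose(w_map, w_ridge, atol=1e-4))      # the two estimates coincide
```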
