
Presentation Transcript


  1. Overfitting and Regularization (Chapters 11 and 12 on amlbook.com)

  2. Overfitting is easy to recognize in 1D: fit a 4th-order hypothesis to 5 data points from a parabolic target function and Ein = 0, since a 4th-order polynomial (5 free coefficients) interpolates the 5 points exactly.
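A minimal sketch of this setup (the noise level and random seed are assumptions; numpy.polyfit does the least-squares fit):

```python
import numpy as np

rng = np.random.default_rng(0)

# Parabolic target with a little noise, sampled at 5 points in [-1, 1]
x = rng.uniform(-1, 1, size=5)
y = x**2 + 0.1 * rng.standard_normal(5)

# A 4th-order polynomial has 5 coefficients, so it interpolates 5 points exactly
w = np.polyfit(x, y, deg=4)
E_in = np.mean((np.polyval(w, x) - y) ** 2)
print(E_in)  # ~0 up to floating-point error: perfect in-sample fit, poor generalization
```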

  3. The origin of overfitting can be analyzed in 1D: the bias/variance dilemma.

  4. Overfitting is easy to avoid in 1D: results from HW2. [Plot: sum of squared deviations (Ein and Eval) vs. degree of polynomial]

  5. Using Eval to avoid overfitting works in all dimensions, but the computation grows rapidly for large d. [Plot for d = 2: Ein, Ecv-1, and Eval as the terms of F5(x) are added successively.] The validation set needs to be large. Does this compromise training?
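A sketch of this validation procedure in 1D, borrowing the target from slide 9 for concreteness (the sample size and train/validation split are assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)

# Noisy quadratic target, split into training and validation sets
x = rng.uniform(-1, 1, size=40)
y = 1 + 9 * x**2 + rng.standard_normal(40)
x_tr, y_tr = x[:25], y[:25]
x_val, y_val = x[25:], y[25:]

for deg in range(1, 8):
    w = np.polyfit(x_tr, y_tr, deg)
    E_in = np.mean((np.polyval(w, x_tr) - y_tr) ** 2)
    E_val = np.mean((np.polyval(w, x_val) - y_val) ** 2)
    print(deg, E_in, E_val)  # E_in keeps falling; E_val bottoms out near the true degree
```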

  6. What if we want to add higher-order terms to a linear model but don't have enough data for a validation set? Solution: augment the error function used to optimize the weights, e.g. Eaug(w) = Ein(w) + (λ/N) wᵀw. The added term penalizes choices with large |w|, and is called "weight decay".

  7. Normal equations with weight decay essentially unchanged: (ZᵀZ + λI) w_reg = Zᵀy
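A minimal sketch of this solve, assuming Z is the polynomial feature matrix (function name and example data are illustrative):

```python
import numpy as np

def fit_weight_decay(Z, y, lam):
    """Solve (Z^T Z + lambda*I) w_reg = Z^T y for w_reg."""
    d = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + lam * np.eye(d), Z.T @ y)

# Example: 4th-degree polynomial features for 1D inputs x
x = np.linspace(-1, 1, 5)
y = 1 + 9 * x**2
Z = np.vander(x, N=5, increasing=True)   # columns: 1, x, x^2, x^3, x^4
w_reg = fit_weight_decay(Z, y, lam=1e-4)
print(w_reg)
```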

  8. The best value of λ is subjective. In this case λ = 0.0001 is large enough to suppress the swings while the data remain important in determining the optimum weights.

  9. Assignment 8 (due 11-13-14): Generate an in silico dataset y(x) = 1 + 9x² + N(0,1) with 5 randomly selected values of x between -1 and +1. Fit a 4th-degree polynomial to the data with and without regularization by choosing λ = 0, 0.0001, 0.001, 0.01, 1.0, and 10. Display the results as in slide 8 of the lecture on regularization.
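A starter sketch for the assignment, reusing the weight-decay solve from slide 7 (random seed and plotting layout are assumptions):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)

# In silico dataset: y(x) = 1 + 9x^2 + N(0,1) at 5 random points in [-1, 1]
x = rng.uniform(-1, 1, size=5)
y = 1 + 9 * x**2 + rng.standard_normal(5)

Z = np.vander(x, N=5, increasing=True)       # 4th-degree polynomial features
xs = np.linspace(-1, 1, 200)
Zs = np.vander(xs, N=5, increasing=True)     # dense grid for plotting the fits

for lam in [0, 0.0001, 0.001, 0.01, 1.0, 10]:
    w = np.linalg.solve(Z.T @ Z + lam * np.eye(5), Z.T @ y)
    plt.plot(xs, Zs @ w, label=f"lambda = {lam}")

plt.scatter(x, y, color="k", zorder=3, label="data")
plt.plot(xs, 1 + 9 * xs**2, "k--", label="target")
plt.legend()
plt.show()
```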
