  1. INF 5860 Machine learning for image classification Lecture 8: Generalization Tollef Jahren March 7, 2018

  2. Is learning feasible? Model complexity Overfitting Evaluating performance Learning from small datasets Rethinking generalization Capacity of dense neural networks Outline

  3. Part 1: Learning theory Part 2: Practical aspects of learning About today

  4. Learning theory (caltech course): • https://work.caltech.edu/lectures.html • Lecture (Videos): 2,5,6,7,8,11 • Read: CS231n: section “Dropouts” • http://cs231n.github.io/neural-networks-2/ • Optional: • Read: The Curse of Dimensionality in classification • http://www.visiondummy.com/2014/04/curse-dimensionality-affect-classification/ • Read: Rethinking generalization • https://arxiv.org/pdf/1611.03530.pdf Readings

  5. Is learning feasible? Model complexity Overfitting Evaluating performance Learning from small datasets Rethinking generalization Capacity of dense neural networks Progress

  6. Classification is about finding the decision boundary. But is it learning? Is learning feasible?

  7. Formalization of supervised learning: • Input: x • Output: y • Target function: f : X → Y • Data: (x₁, y₁), …, (x_N, y_N) • Hypothesis: g : X → Y • Example: • Hypothesis set: H • A hypothesis: h ∈ H Notation

  8. In-sample (colored): training data available to find your solution. Out-of-sample (gray): data from the real world that the hypothesis will be used for. Final hypothesis: g. Target hypothesis: f. Generalization: the difference between the in-sample error E_in and the out-of-sample error E_out. More notation

  9. Learning diagram • The Hypothesis Set • The Learning Algorithm (e.g., gradient descent) • The hypothesis set and the learning algorithm together are referred to as the Learning model

  10. Learning puzzle

  11. We cannot know what we have not seen! • What can save us? • Answer: Probability The target function is UNKNOWN

  12. Requirement: • The in-sample and out-of-sample data must be drawn from the same distribution (process) Drawing from the same distribution

  13. For a randomly selected hypothesis, the closest approximation of the out-of-sample error is the in-sample error. What is the expected out-of-sample error?

  14. A general view of training: • Training is a search through the possible hypotheses • Use the in-sample data to find the best hypothesis What is training?

  15. A smaller in-sample error increases the probability that the result is a coincidence. The expected out-of-sample error is greater than or equal to the in-sample error. What is the effect of choosing the best hypothesis?

  16. In the extreme case, you search through all possibilities. Then you are guaranteed a 0% in-sample error rate, but you get no information about the out-of-sample error. Searching through all possibilities
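To make this concrete, here is a small simulation (my own sketch, not from the slides): each "hypothesis" simply guesses the binary labels at random, yet the best of many such hypotheses looks good in-sample while saying nothing about out-of-sample performance.

```python
# Sketch: random guessing "hypotheses". Individually each is a coin flip,
# but selecting the best of many gives a deceptively low in-sample error.
import numpy as np

rng = np.random.default_rng(0)
N, n_hypotheses = 20, 10_000
y_in = rng.integers(0, 2, N)                        # in-sample labels

guesses = rng.integers(0, 2, (n_hypotheses, N))     # each row is one "hypothesis"
E_in = (guesses != y_in).mean(axis=1)               # in-sample error of each hypothesis
print(E_in.min())   # typically around 0.10-0.15, far below the expected 0.5
# Out-of-sample, every one of these hypotheses is still a 50% coin flip.
```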

  17. Is learning feasible? Model complexity Overfitting Evaluating performance Learning from small datasets Rethinking generalization Capacity of dense neural networks Progress

  18. The model restricts the number of hypotheses you can find. Model capacity refers to how many possible hypotheses you have. A linear model has the set of all linear functions as its hypothesis set. Capacity of the model (hypothesis set)

  19. Vapnik-Chervonenkis (VC) dimension • Denoted: d_VC(H) • Definition: the maximum number of points that can be arranged such that H can shatter them. Measuring capacity

  20.–34. Example VC dimension • (2D) Linear model • Slides 20–25 show configurations of three points and the separating lines for their labelings; slides 26–34 show configurations with more points, ending with a labeling for which there is no separating line.

  35. Definition • The maximum number of points that can be arranged such that H can shatter them. • The VC dimension of a linear model in dimension d is d_VC = d + 1. • Capacity increases with the number of effective parameters. VC dimension
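A brute-force check of this definition fits in a few lines. The sketch below is my own illustration (not from the lecture): a labeling is treated as realizable if the linear program y_i(w·x_i + b) ≥ 1 is feasible, and a point set is shattered if every labeling is realizable.

```python
# Sketch: brute-force shattering test for a 2D linear model.
import itertools
import numpy as np
from scipy.optimize import linprog

def separable(points, labels):
    """True if some line w.x + b separates the labeled points."""
    X = np.asarray(points, dtype=float)
    y = np.asarray(labels, dtype=float)
    # Feasibility LP: -y_i * (w . x_i + b) <= -1, variables (w1, w2, b).
    A_ub = -y[:, None] * np.hstack([X, np.ones((len(X), 1))])
    res = linprog(c=np.zeros(3), A_ub=A_ub, b_ub=-np.ones(len(X)),
                  bounds=[(None, None)] * 3, method="highs")
    return res.success

def shattered(points):
    """True if every +/-1 labeling of the points is linearly separable."""
    return all(separable(points, labels)
               for labels in itertools.product([-1, 1], repeat=len(points)))

print(shattered([(0, 0), (1, 0), (0, 1)]))           # True: 3 points in general position
print(shattered([(0, 0), (1, 0), (0, 1), (1, 1)]))   # False: the XOR labeling fails
```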

  36. The growth function m_H(N) is a measure of the capacity of the hypothesis set. Given a set of N samples and an unrestricted hypothesis set, the value of the growth function is m_H(N) = 2^N. For a restricted (limited) hypothesis set, the growth function is bounded by m_H(N) ≤ Σ_{i=0}^{d_VC} C(N, i), a polynomial in N whose maximum power is N^{d_VC}. Growth function
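As a quick numeric illustration (my own sketch): the restricted bound grows only polynomially in N, while the unrestricted 2^N grows exponentially.

```python
# Sketch: compare 2^N with the polynomial growth-function bound for d_VC = 3.
from math import comb

def growth_bound(N, d_vc):
    """Upper bound on m_H(N): sum_{i=0}^{d_VC} C(N, i)."""
    return sum(comb(N, i) for i in range(d_vc + 1))

for N in (5, 10, 20, 50):
    print(N, 2 ** N, growth_bound(N, d_vc=3))
# The second column explodes exponentially; the third grows roughly like N^3.
```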

  37. Growth function for linear model

  38. Error measure for binary classification: • In-sample error: E_in(g) = (1/N) Σ_{n=1}^{N} [g(x_n) ≠ y_n] • Out-of-sample error: E_out(g) = P[g(x) ≠ y] • Generalization error: |E_in(g) − E_out(g)| Generalization error
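In code, the in-sample error is just the misclassification rate on the training sample (a minimal sketch in my own notation; E_out is the same quantity taken in expectation over the data distribution):

```python
# Sketch: in-sample error for binary classification.
import numpy as np

def error_rate(predictions, targets):
    """Fraction of samples where the hypothesis disagrees with the label."""
    return np.mean(predictions != targets)

y_true = np.array([1, 1, 0, 0, 1])
y_pred = np.array([1, 0, 0, 0, 1])
print(error_rate(y_pred, y_true))   # 0.2 on this toy in-sample set
```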

  39. The Vapnik-Chervonenkis inequality (upper generalization bound): P[ |E_in(g) − E_out(g)| > ε ] ≤ 4 · m_H(2N) · e^{−ε²N/8}, where N is the number of in-sample samples, ε is the generalization threshold, and m_H is the growth function, bounded by a polynomial whose maximum power is N^{d_VC}. The Vapnik-Chervonenkis Inequality
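The bound is easy to evaluate numerically. A minimal sketch (my own code, using the polynomial bound on the growth function from above):

```python
# Sketch: evaluate P[|E_in - E_out| > eps] <= 4 * m_H(2N) * exp(-eps^2 * N / 8).
from math import comb, exp

def vc_bound(N, eps, d_vc):
    m_H_2N = sum(comb(2 * N, i) for i in range(d_vc + 1))   # bound on m_H(2N)
    return 4 * m_H_2N * exp(-(eps ** 2) * N / 8)

for N in (100, 1_000, 10_000, 100_000):
    print(N, vc_bound(N, eps=0.1, d_vc=3))
# The bound is vacuous (>1) for small N but shrinks rapidly once N is large enough.
```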

  40. Restricting the capacity of the hypothesis set! • But are we satisfied? • No! • The overall goal is to have a small E_out. What makes learning feasible?

  41. With probability 1 − δ: E_out(g) ≤ E_in(g) + √( (8/N) · ln( 4 · m_H(2N) / δ ) ) (the VC generalization bound).

  42. A model with the wrong hypothesis set will never be correct. (Figure: the space of all hypotheses versus the model's hypothesis space; a correct hypothesis that lies outside the hypothesis space cannot be found.)

  43. Is learning feasible? Model complexity Overfitting Evaluating performance Learning from small datasets Rethinking generalization Capacity of dense neural networks Progress

  44. The in-sample data will contain noise. • Origins of noise: • Measurement (sensor) noise • The in-sample data may not include all parameters • Our hypothesis set does not have the capacity to fit the target function Noise

  45. We want to fit our hypothesis to the target function, not the noise. • Example: • Target function: second-order polynomial • Noisy in-sample data • Hypothesis: fourth-order polynomial • Result: the fit follows the noise in the in-sample data rather than the underlying target. The role of noise
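The example can be reproduced in a few lines of numpy (my own sketch with made-up constants, not the lecture's exact figures):

```python
# Sketch: fit a 4th-order polynomial to noisy samples of a 2nd-order target.
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: 1.0 + 2.0 * x - 3.0 * x ** 2                 # target: 2nd-order polynomial

x_train = rng.uniform(-1, 1, 10)
y_train = f(x_train) + rng.normal(0, 0.3, x_train.shape)   # noisy in-sample data
coeffs = np.polyfit(x_train, y_train, deg=4)               # hypothesis: 4th-order polynomial

x_test = np.linspace(-1, 1, 1000)                          # out-of-sample grid
E_in = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
E_out = np.mean((np.polyval(coeffs, x_test) - f(x_test)) ** 2)
print(E_in, E_out)   # typically E_in is small while E_out is noticeably larger
```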

  46. Overfitting – training too hard • Initially, the hypothesis is not selected from the data, and E_in and E_out are similar. • While training, we are exploring more of the hypothesis space. • The effective VC dimension grows at the beginning and is defined by the number of free parameters at the end.

  47. Regularization • With a tiny weight penalty, we can reduce the effect of noise significantly.
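A minimal sketch of such a weight penalty (my own illustration using the closed-form ridge solution, not code from the lecture):

```python
# Sketch: polynomial least squares with an L2 weight penalty ("ridge"):
#   w = argmin ||X w - y||^2 + lam * ||w||^2 = (X^T X + lam I)^{-1} X^T y
import numpy as np

def ridge_polyfit(x, y, deg, lam):
    """Polynomial fit of given degree with an L2 penalty of strength lam."""
    X = np.vander(np.asarray(x, dtype=float), deg + 1)      # design matrix, highest power first
    A = X.T @ X + lam * np.eye(deg + 1)
    return np.linalg.solve(A, X.T @ np.asarray(y, dtype=float))

# Usage with the noisy data from the previous sketch:
#   w = ridge_polyfit(x_train, y_train, deg=4, lam=1e-2)
#   y_hat = np.polyval(w, x_test)
# Even a tiny lam damps the high-order coefficients that would otherwise chase the noise.
```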

  48. Is learning feasible? Model complexity Overfitting Evaluating performance Learning from small datasets Rethinking generalization Capacity of dense neural networks Progress

  49. Training set (60%) • Used to train our model • Validation set (20%) • Used to select the best hypothesis • Test set (20%) • Used to get a representative out-of-sample error Splitting of data
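A minimal sketch of such a split (my own code; percentages as on the slide):

```python
# Sketch: shuffle and split a dataset 60% / 20% / 20%.
import numpy as np

def split_dataset(X, y, seed=0):
    """Return (train, validation, test) splits of 60%, 20% and 20%."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_train, n_val = int(0.6 * len(X)), int(0.2 * len(X))
    train, val, test = np.split(idx, [n_train, n_train + n_val])
    return (X[train], y[train]), (X[val], y[val]), (X[test], y[test])
```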

  50. Keep a dataset that you don't look at until evaluation (the test set). The test set should be as different from your training set as you expect the real world to be. Important! No peeking
