
Questions on Homework 1?



Presentation Transcript


  1. Questions on Homework 1?

  2. Review of Terminology
  • Hypothesis or Model: A particular classifier, e.g., a decision tree or a neural network.
  • Hypothesis or Model Space: All possible hypotheses of a particular type (e.g., decision trees; polynomial functions; neural networks).
  • Learning algorithm: A method for choosing or constructing a hypothesis (or model) from a given hypothesis (or model) space.
  • Hypothesis or Model Parameters: E.g., size of the decision tree; degree of the polynomial; number of weights in the neural network. [These constrain the hypothesis space.]
  • Learning algorithm parameters: E.g., "information gain" vs. "gain ratio"; or the value of the learning rate for perceptron learning.

  3. Cross-Validation
  Two uses:
  • To obtain a better estimate of a model's accuracy when data is limited.
  • For model selection.

  4. k-fold Cross-Validation for Estimating Accuracy
  • Each example is used both as a training instance and as a test instance.
  • Split the data into k disjoint parts: S1, S2, ..., Sk.
  • For i = 1 to k: select Si to be the test set, train on the remaining data, and test on Si to obtain accuracy Ai.
  • Report (1/k) Σi Ai as the final accuracy of the learning algorithm.
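The procedure above can be sketched in Python. This is a minimal illustration, not from the slides: `train_fn` and `accuracy_fn` are hypothetical placeholders for whatever learner and accuracy measure you are evaluating.

```python
import random

def kfold_accuracy(data, k, train_fn, accuracy_fn, seed=0):
    """Estimate a learning algorithm's accuracy by k-fold cross-validation.

    train_fn(train_set) returns a trained model;
    accuracy_fn(model, test_set) returns its accuracy on held-out data.
    Both are placeholders for the learner under study.
    """
    data = list(data)
    random.Random(seed).shuffle(data)        # shuffle before splitting
    folds = [data[i::k] for i in range(k)]   # k disjoint parts S1..Sk
    accuracies = []
    for i in range(k):
        test_set = folds[i]
        train_set = [x for j, f in enumerate(folds) if j != i for x in f]
        model = train_fn(train_set)
        accuracies.append(accuracy_fn(model, test_set))
    return sum(accuracies) / k               # (1/k) * sum of the A_i
```

Note that each example lands in exactly one test fold, so every example is used for both training and testing, as the slide requires.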

  5. k-fold Cross-Validation for Model Selection
  • For each candidate parameter value, run k-fold cross-validation to produce k models.
  • Compute the average test accuracy of these k models.
  • Choose the parameter value with the best average test accuracy.
  • Use all the training data to learn a model with this parameter value.
  • Test the resulting model on separate, unseen test data.
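A sketch of the selection loop, under the same assumptions as before (`train_fn` and `accuracy_fn` are hypothetical placeholders; the data is assumed to be pre-shuffled):

```python
def select_parameter(data, k, param_values, train_fn, accuracy_fn):
    """Pick the parameter value with the best average k-fold test
    accuracy, then retrain on all of `data` with that value.

    train_fn(train_set, param) returns a model;
    accuracy_fn(model, test_set) returns held-out accuracy.
    """
    folds = [data[i::k] for i in range(k)]   # k disjoint parts

    def avg_cv_accuracy(param):
        accs = []
        for i in range(k):
            train = [x for j, f in enumerate(folds) if j != i for x in f]
            model = train_fn(train, param)
            accs.append(accuracy_fn(model, folds[i]))
        return sum(accs) / k

    best = max(param_values, key=avg_cv_accuracy)
    # Final model: all training data, best parameter value.
    return best, train_fn(data, best)
```

As the slide stresses, the final model must still be evaluated on a separate test set that played no role in the selection.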

  6. Evaluating Hypotheses, Continued
  • Precision: Fraction of true positives out of all predicted positives: TP / (TP + FP)
  • Recall: Fraction of true positives out of all actual positives: TP / (TP + FN)
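The two definitions translate directly to code (TP = true positives, FP = false positives, FN = false negatives):

```python
def precision(tp, fp):
    # Fraction of predicted positives that are actually positive.
    return tp / (tp + fp)

def recall(tp, fn):
    # Fraction of actual positives that were predicted positive.
    return tp / (tp + fn)
```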

  7. [Confusion matrix not reproduced in this transcript; rows = actual class, columns = predicted class.]
  What is Precision(9)? 75% of instances classified as "9" actually are "9".
  What is Recall(9)? 86% of all "9"s were classified as "9".

  8. [Confusion matrix not reproduced in this transcript; rows = actual class, columns = predicted class.]
  What is Precision(8)? What is Recall(8)?
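Since the slides' confusion matrices did not survive extraction, here is how per-class precision and recall are read off any such matrix with the slides' convention (rows = actual, columns = predicted). The example matrix is hypothetical, not the one from the slides.

```python
def class_precision(conf, c):
    """Precision for class c: correct predictions of c divided by
    all predictions of c (the column sum)."""
    predicted_c = sum(conf[a][c] for a in range(len(conf)))
    return conf[c][c] / predicted_c

def class_recall(conf, c):
    """Recall for class c: correct predictions of c divided by
    all actual instances of c (the row sum)."""
    return conf[c][c] / sum(conf[c])

# Hypothetical 2-class confusion matrix, rows = actual, cols = predicted:
conf = [[9, 2],
        [3, 6]]
```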

  9. Error vs. Loss
  • Error rate: Fraction of incorrect answers given by a classifier h.
  • Loss(y, ŷ): Amount of utility lost by predicting ŷ when the correct answer is y.
  • Note that L(y, y) = 0.
  • Loss depends on the user and the task. E.g., for one user, we might have: L(spam, nospam) = 1, L(nospam, spam) = 10.
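The slide's asymmetric spam loss can be written out directly; this is just the two numbers from the slide wrapped in a function for illustration:

```python
def spam_loss(y, y_hat):
    """Asymmetric loss from the slide: discarding real mail
    (predicting spam when the truth is nospam) costs 10x more
    than letting a spam message through."""
    if y == y_hat:
        return 0                              # L(y, y) = 0
    if (y, y_hat) == ("spam", "nospam"):
        return 1                              # missed a spam message
    if (y, y_hat) == ("nospam", "spam"):
        return 10                             # lost a real message
```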

  10. Goal of Machine Learning
  • Minimize expected loss over all input-output pairs (x, y) in the data space.
  • Need to define a prior probability distribution P(X, Y) over input-output pairs.
  • Let ξ be the set of all possible input-output pairs. Then the expected generalization loss for hypothesis h with respect to loss function L is:
  GenLoss_L(h) = Σ_{(x,y) in ξ} L(y, h(x)) P(x, y)
  • The best hypothesis, h*, is: h* = argmin_{h in H} GenLoss_L(h)

  11. Commonly used Loss functions
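The slide's list did not survive extraction; the loss functions typically covered at this point (absolute-value loss L1, squared-error loss L2, and 0/1 loss) can be sketched as:

```python
def l1_loss(y, y_hat):
    return abs(y - y_hat)              # absolute-value loss L1

def l2_loss(y, y_hat):
    return (y - y_hat) ** 2            # squared-error loss L2

def zero_one_loss(y, y_hat):
    return 0 if y == y_hat else 1      # 0/1 loss (error rate's loss)
```

The 0/1 loss connects the two notions on slide 9: a classifier's error rate is exactly its average 0/1 loss.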

  12. Empirical Loss
  • Typically P(X, Y) is not known. The learning method can only estimate GenLoss by observing the empirical loss on a set of examples E, where N = |E|:
  EmpLoss_{L,E}(h) = (1/N) Σ_{(x,y) in E} L(y, h(x))
  • The best hypothesis, ĥ*, is: ĥ* = argmin_{h in H} EmpLoss_{L,E}(h)
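Empirical loss and the argmin over a (finite, enumerable) hypothesis set are a few lines of Python. This is an illustrative sketch; in practice the hypothesis space is searched by a learning algorithm, not enumerated.

```python
def empirical_loss(loss_fn, examples, h):
    """EmpLoss_{L,E}(h): average loss of hypothesis h over examples E,
    where each example is an (x, y) pair."""
    return sum(loss_fn(y, h(x)) for x, y in examples) / len(examples)

def best_hypothesis(loss_fn, examples, hypotheses):
    """The empirical-loss minimizer over a finite set of hypotheses."""
    return min(hypotheses, key=lambda h: empirical_loss(loss_fn, examples, h))
```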

  13. Sources of Loss
  What are the possible reasons why ĥ* would differ from the target function f?
  • Unrealizability: f is not in the hypothesis space H.
  • Variance: Different training sets return different h's, especially when training sets are small.
  • Noise: f is nondeterministic: it returns different values of f(x) for the same x. (Sometimes this is a result of not having all necessary attributes in x.)
  • Computational complexity: It may be intractable to search H.

  14. Regularization for Model Selection
  • Instead of doing cross-validation for model selection, put a penalty (or, more generally, "regularization") term directly into the "Cost" function to be minimized:
  Cost(h) = EmpLoss_{L,E}(h) + λ Complexity(h), with ĥ* = argmin_{h in H} Cost(h)
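The regularized objective can be sketched as below. The complexity measure and the weight λ are left abstract here (`complexity` and `lam` are placeholders), since the slide does not commit to a particular choice:

```python
def regularized_cost(loss_fn, examples, h, complexity, lam):
    """Cost(h) = EmpLoss(h) + lam * Complexity(h): empirical loss plus
    a penalty term that grows with the hypothesis's complexity
    (e.g., tree size or polynomial degree)."""
    emp = sum(loss_fn(y, h(x)) for x, y in examples) / len(examples)
    return emp + lam * complexity(h)
```

Minimizing this cost trades training fit against complexity in one pass over the training data, which is the appeal over cross-validation: no folds are needed, at the price of choosing λ and a complexity measure up front.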
