
Machine Learning CSE 681


Presentation Transcript


  1. Machine Learning CSE 681 CH2 - Supervised Learning: Model Assessment

  2. Model Assessment • We can measure the generalization ability (performance) of a hypothesis, namely, the quality of its inductive bias, if we have access to data outside the training set. • We simulate this by dividing the training set we have into two parts. We use one part for training (i.e., to fit a hypothesis); the remaining part is called the validation set (V) and is used to test the generalization ability. • Given models (hypotheses) h1, ..., hk induced from the training set X, we can use E(hi | V) to select the model hi with the smallest generalization error (a code sketch follows below)
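A minimal sketch of this selection step, assuming scikit-learn is available; the synthetic dataset, the k-NN hypothesis family, and the 70/30 split are illustrative assumptions, not part of the slides.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

# Candidate hypotheses h1, ..., hk: here, k-NN classifiers with different neighbor counts.
candidates = {k: KNeighborsClassifier(n_neighbors=k) for k in (1, 3, 5, 7)}
val_errors = {}
for k, h in candidates.items():
    h.fit(X_train, y_train)                      # induce h_i from the training part of X
    val_errors[k] = 1.0 - h.score(X_val, y_val)  # E(h_i | V): error on the validation set V

best_k = min(val_errors, key=val_errors.get)     # model with the smallest validation error
print(val_errors, "-> chosen k =", best_k)
```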

  3. Model Assessment • If we need to report the error to give an idea about the expected error (performance) of our best model, we should not use the validation error. • We have used the validation set to choose the best model, so it has effectively become part of the training set. We need a third set, a test set, sometimes also called the publication set, containing examples not used in training or validation.

  4. Model Assessment • To estimate generalization error (performance), we need data unseen during training. • Standard setup: we split the data into • Training set (50-80%) • Validation set (10-25%) • Test (publication) set (10-25%) • Note: • Validation data can be added to the training set before testing • Resampling methods can be used if data is limited
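As an illustration of the ratios above, a sketch using two successive scikit-learn train_test_split calls; the 60/20/20 split is one assumed choice within the quoted ranges, and the dataset is synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)

# Carve off the test (publication) set first, then split the rest into train/validation.
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.20, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)

print(len(X_train), len(X_val), len(X_test))  # 600 200 200
```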

  5. Model Assessment • In practice, if we are using a machine learning tool, almost invariably, all classifiers have one or more parameters • The number of neighbors in kNN classification • The network size, learning parameters and weights in neural networks (MLP) • How do we select the “optimal” parameter(s) or model for a given classification problem? • Methods: • Three-way data splits • Two-way data splits

  6. Model Assessment with three-way data splits • If model selection and true error estimates are to be computed simultaneously, the data needs to be divided into three disjoint sets • Training set: a set of examples used for learning, i.e., to fit the parameters of the classifier • In the MLP case, we would use the training set to find the “optimal” weights with the back-prop rule. • Validation set: a set of examples used to tune the parameters of a classifier • In the MLP case, we would use the validation set to find the “optimal” number of hidden units or determine a stopping point for the back-propagation algorithm • Test set: a set of examples used only to assess the performance of a fully trained classifier • In the MLP case, we would use the test set to estimate the error rate after we have chosen the final model (MLP size and actual weights) • After assessing the final model with the test set, YOU MUST NOT further tune the model (a sketch of this protocol follows below) Source: Ricardo Gutierrez-Osuna
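A hedged sketch of the three-way protocol with scikit-learn's MLPClassifier; the candidate hidden-layer sizes, the dataset, and the split ratios are assumptions for illustration, and here the validation set tunes only the number of hidden units.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=1000, random_state=0)
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)

best_h, best_err = None, float("inf")
for h in (5, 10, 20):                                 # candidate numbers of hidden units (assumed)
    mlp = MLPClassifier(hidden_layer_sizes=(h,), max_iter=500, random_state=0)
    mlp.fit(X_train, y_train)                         # training set: fit the weights
    err = 1.0 - mlp.score(X_val, y_val)               # validation set: tune the architecture
    if err < best_err:
        best_h, best_err = h, err

final = MLPClassifier(hidden_layer_sizes=(best_h,), max_iter=500, random_state=0)
final.fit(X_train, y_train)
print("test error:", 1.0 - final.score(X_test, y_test))  # test set: used once, no further tuning
```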

  7. Model Assessment with three-way data splits Source: Ricardo Gutierrez-Osuna

  8. Model Assessment with two-way data splits • We may prefer the two-way data split approach if we are using a machine learning tool with standard parameters. • Two-way data split methods: • Holdout method • Cross-validation • Random subsampling • K-fold cross-validation • Leave-one-out cross-validation • Bootstrap • In two-way data splits, we split the data into two disjoint subsets: the training set and the test set.

  9. The holdout method • Split the dataset into two groups (70%, 30%) • Training set: used to train the classifier • Test set: used to estimate the error rate of the trained classifier • A typical application of the holdout method is determining a stopping point for the back-propagation error (early stopping); a minimal split sketch follows below
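A minimal sketch of the 70/30 holdout split itself, in plain NumPy with the classifier left out; the helper name holdout_split is hypothetical, not from the slides.

```python
import numpy as np

def holdout_split(n_samples, train_frac=0.7, seed=0):
    """Return shuffled train/test index arrays for a single holdout split."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)     # shuffle once
    n_train = int(train_frac * n_samples)
    return idx[:n_train], idx[n_train:]  # training indices, test indices

train_idx, test_idx = holdout_split(100)
print(len(train_idx), len(test_idx))     # 70 30
```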

  10. The holdout method • The holdout method has two basic drawbacks • In problems where we have a sparse dataset, we may not be able to afford the “luxury” of setting aside a portion of the dataset for testing • Since it is a single train-and-test experiment, the holdout estimate of the error rate will be misleading if we happen to get an “unfortunate” split • The limitations of the holdout method can be overcome with a family of resampling methods, at the expense of more computation • Cross-validation • Random subsampling • K-fold cross-validation • Leave-one-out cross-validation • Bootstrap Source: Ricardo Gutierrez-Osuna

  11. Random subsampling • Random subsampling performs K data splits of the entire dataset • Each data split randomly selects a (fixed) number of examples without replacement • For each data split we retrain the classifier from scratch with the training examples and then estimate 𝐸𝑖 with the test examples • The true error estimate is obtained as the average of the separate estimates 𝐸𝑖 • This estimate is significantly better than the holdout estimate Source: Ricardo Gutierrez-Osuna
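A sketch of random subsampling using scikit-learn's ShuffleSplit; K=5 splits, a 30% test fraction, and the k-NN classifier are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import ShuffleSplit
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=300, random_state=0)
splitter = ShuffleSplit(n_splits=5, test_size=0.3, random_state=0)  # K = 5 random data splits

errors = []
for train_idx, test_idx in splitter.split(X):
    clf = KNeighborsClassifier().fit(X[train_idx], y[train_idx])    # retrain from scratch
    errors.append(1.0 - clf.score(X[test_idx], y[test_idx]))        # E_i on this split's test part

print("estimated true error:", np.mean(errors))                     # average of the separate E_i
```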

  12. K-Fold Cross-Validation • Divide the dataset into K partitions (folds) • For each of K experiments, use K-1 folds for training and the remaining one for testing (K=4 in the following example) • The advantage of K-fold cross-validation is that all the examples in the dataset are eventually used for both training and testing • As before, the true error is estimated as the average error rate
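A sketch of K-fold cross-validation with K=4, matching the example mentioned above; the synthetic dataset and the k-NN classifier are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=200, random_state=0)

errors = []
for train_idx, test_idx in KFold(n_splits=4, shuffle=True, random_state=0).split(X):
    clf = KNeighborsClassifier().fit(X[train_idx], y[train_idx])  # train on K-1 folds
    errors.append(1.0 - clf.score(X[test_idx], y[test_idx]))      # test on the held-out fold

print("4-fold CV error estimate:", np.mean(errors))
```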

  13. Leave-one-out Cross Validation • Leave-one-out is the degenerate case of K-fold cross-validation, where K is chosen as the total number of examples • For a dataset with N examples, perform N experiments • For each experiment, use N-1 examples for training and the remaining example for testing • As usual, the true error is estimated as the average error rate on test examples
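The same loop with scikit-learn's LeaveOneOut, i.e., K-fold with K = N; the small synthetic dataset and the k-NN classifier are assumptions to keep the N experiments cheap to run.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import LeaveOneOut
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=50, random_state=0)  # N = 50 -> 50 experiments

errors = []
for train_idx, test_idx in LeaveOneOut().split(X):
    clf = KNeighborsClassifier().fit(X[train_idx], y[train_idx])  # train on N-1 examples
    errors.append(1.0 - clf.score(X[test_idx], y[test_idx]))      # 0/1 error on the left-out example

print("leave-one-out error estimate:", np.mean(errors))
```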

  14. How many folds are needed? • With a large number of folds • + The bias of the true error rate estimator will be small (the estimator will be very accurate) • – The variance of the true error rate estimator will be large • – The computational time will be very large as well (many experiments) • With a small number of folds • + The number of experiments and, therefore, computation time are reduced • + The variance of the estimator will be small • – The bias of the estimator will be large (conservative, i.e., larger than the true error rate) • In practice, the choice of K depends on the size of the dataset • For large datasets, even 3-fold cross-validation will be quite accurate • For very sparse datasets, we may have to use leave-one-out in order to train on as many examples as possible • A common choice is K=10

  15. The bootstrap • The bootstrap is a resampling technique with replacement • From a dataset with N examples • Randomly select (with replacement) N examples and use this set for training • The remaining examples that were not selected for training are used for testing • The number of such test examples is likely to change from fold to fold • Repeat this process for a specified number of folds (K) • As before, the true error is estimated as the average error rate on test data
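A sketch of one such bootstrap estimate: each of K folds draws N training indices with replacement and tests on the examples that were never drawn (the out-of-bag set); the dataset, K, and the classifier are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=300, random_state=0)
N, K = len(X), 10
rng = np.random.default_rng(0)

errors = []
for _ in range(K):
    boot = rng.integers(0, N, size=N)               # N draws with replacement -> training indices
    oob = np.setdiff1d(np.arange(N), boot)          # never-drawn examples -> test set (size varies per fold)
    clf = KNeighborsClassifier().fit(X[boot], y[boot])
    errors.append(1.0 - clf.score(X[oob], y[oob]))  # error on the out-of-bag examples

print("bootstrap error estimate:", np.mean(errors))
```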

  16. The bootstrap • By generating new training sets of size N from the dataset by random sampling with replacement, the probability that we do not pick a given instance after N draws is (1 − 1/N)^N ≈ e^−1 = 0.368 • that is, about 36.8% of the instances are never picked and remain as new (out-of-bag) test examples • Compared to basic CV, the bootstrap increases the variance that can occur in each fold [Efron and Tibshirani, 1993] • This is a desirable property, since it is a more realistic simulation of the real-life experiment from which our dataset was obtained • Consider a classification problem with C classes, a total of N examples and Ni examples for each class ωi • The a priori probability of choosing an example from class ωi is Ni/N • Once we choose an example from class ωi, if we do not replace it for the next selection, then the a priori probabilities will have changed, since the probability of choosing an example from class ωi will now be (Ni−1)/(N−1) • Thus, sampling with replacement preserves the a priori probabilities of the classes throughout the random selection process • An additional benefit is that the bootstrap can provide accurate measures of BOTH the bias and variance of the true error estimate
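A quick numeric check of the 36.8% figure above: (1 − 1/N)^N approaches e^−1 as N grows.

```python
import math

for N in (10, 100, 1000, 100000):
    print(N, (1 - 1 / N) ** N)   # probability a given instance is never drawn in N draws
print("e^-1 =", math.exp(-1))    # ≈ 0.3679
```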
