1. Measuring Classification Performance © D. Pokrajac 2003
2. Main Issues What data to use?
How to measure performance?
How do we trust our measured results?
3. What data to use? Typically we learn our model on some data set, known as the training set
Subsequently, we evaluate the model on a data set known as the test set
Main issue: given the available data, how to generate the training and test sets
4. Major techniques to choose the test set Re-substitution
Hold-out
Leave-one-out
K-fold cross validation
Bootstrap
5. Re-substitution Use all available data for the training set
Test set = training set = available data set
Problems:
If the available data set is small, our results on unseen data will be poor
The estimate of actual performance can be poor, since the model is evaluated on the same data it was trained on
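A minimal sketch of why re-substitution can mislead. The 1-nearest-neighbour classifier and the toy data below are illustrative assumptions, not part of the original slides. Because every test point is also a training point, each point is its own nearest neighbour, so the measured accuracy is perfect even though the labels here are deliberately arbitrary.

```python
def predict_1nn(train, x):
    """Return the label of the training point closest to x (1-nearest neighbour)."""
    nearest = min(train, key=lambda pair: abs(pair[0] - x))
    return nearest[1]


def accuracy(train, test):
    """Fraction of test points whose predicted label matches the true label."""
    correct = sum(1 for x, y in test if predict_1nn(train, x) == y)
    return correct / len(test)


# Hypothetical toy data: (feature, label) pairs with arbitrary, alternating labels.
data = [(0.1, "a"), (0.4, "b"), (0.5, "a"), (0.9, "b"), (1.2, "a"), (1.3, "b")]

# Re-substitution: test set = training set = available data set.
resub_accuracy = accuracy(data, data)
print(resub_accuracy)  # 1.0
```

The score of 1.0 reflects memorisation of the training points and says nothing about how the classifier would behave on unseen data.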
6. Hold-Out Split the available data set into two halves
Use one half for the training set and the other for the test set
Problem: poor use of data (half of the data is discarded from training!)
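The hold-out split can be sketched in a few lines. The function name `hold_out_split`, the `seed` parameter, and the shuffling step are assumptions added for illustration; the slide only says to split the data into two halves. Shuffling first is a common precaution so that any ordering in the data does not end up in one half.

```python
import random


def hold_out_split(data, seed=0):
    """Shuffle a copy of the data and split it into two halves.

    Returns (training_set, test_set). With an odd number of items,
    the test set receives the extra one.
    """
    items = list(data)                  # copy, so the caller's data is untouched
    random.Random(seed).shuffle(items)  # seeded shuffle for reproducibility
    half = len(items) // 2
    return items[:half], items[half:]


# Hypothetical data set of ten examples, represented by their indices.
data = list(range(10))
train, test = hold_out_split(data)
print(len(train), len(test))  # 5 5
```

Only the five training examples are ever seen by the model, which is exactly the poor use of data the slide points out.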