Evaluation in Machine Learning
Pádraig Cunningham
Outline
Student’s t-test
Test for paired data
Cross Validation
McNemar’s Test
ROC Analysis
Other Statistical Tests for Evaluation
William Sealy Gosset
The t-statistic was introduced by William Sealy Gosset as a cheap way to monitor the quality of beer brews. "Student" was his pen name. Gosset was a statistician at the Guinness brewery in Dublin, Ireland, hired under Claude Guinness's innovative policy of recruiting the best graduates from Oxford and Cambridge to apply biochemistry and statistics to Guinness's industrial processes. Gosset published the t-test in Biometrika in 1908, but was forced to use a pen name by his employer, who regarded its use of statistics as a trade secret. In fact, Gosset's identity was unknown not only to fellow statisticians but to his employer: the company insisted on the pseudonym so that it could turn a blind eye to the breach of its rules.
Problems with ‘Hold-out’ Validation
Comparing Two Classifiers
MNS score for C2 vs. C1 = 1/2
MNS score for C2 vs. C1 = 1/6
For the test to be applicable, (n01 + n10) > 10
A McNemar statistic > 3.84 is required for statistical significance at the 95% level (the chi-squared critical value with one degree of freedom)
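The two conditions above can be sketched in a few lines of Python. This is a minimal illustration, not the slides' own code: it assumes n01 counts the test examples misclassified by C1 but not by C2, n10 the reverse, and it uses the continuity-corrected form of the statistic as given in Dietterich (1998). The function and constant names are hypothetical.

```python
CHI2_CRITICAL_95 = 3.84  # chi-squared critical value, 1 degree of freedom, 95% level
MIN_DISCORDANT = 10      # applicability threshold quoted on the slide


def mcnemar_statistic(n01: int, n10: int) -> float:
    """Continuity-corrected McNemar statistic (assumed form, see lead-in).

    n01: examples misclassified by C1 but not by C2 (assumed convention)
    n10: examples misclassified by C2 but not by C1 (assumed convention)
    """
    discordant = n01 + n10
    if discordant == 0:
        raise ValueError("no discordant examples: the statistic is undefined")
    return (abs(n01 - n10) - 1) ** 2 / discordant


def mcnemar_significant(n01: int, n10: int) -> bool:
    """True if the two classifiers differ significantly at the 95% level."""
    if n01 + n10 <= MIN_DISCORDANT:
        raise ValueError("test not applicable: need (n01 + n10) > 10")
    return mcnemar_statistic(n01, n10) > CHI2_CRITICAL_95
```

For example, with n01 = 5 and n10 = 20 the statistic is (15 - 1)^2 / 25 = 7.84, which exceeds 3.84, so the difference would be judged significant.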
Dietterich’s 5x2cv paired t-test (Dietterich, 1998)
+ flexible on choice of loss function
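As a rough sketch of how the 5x2cv statistic is computed, following the formula in Dietterich (1998): run 2-fold cross-validation five times, record the difference in loss between the two classifiers on each fold, and divide the first difference by the pooled standard deviation. The function name and input layout below are assumptions made for illustration; the differences can come from any loss function, which is the flexibility noted above.

```python
import math

T_CRITICAL_5DOF_95 = 2.571  # two-sided 95% critical value, t distribution, 5 d.o.f.


def five_by_two_cv_t(diffs):
    """5x2cv paired t statistic.

    diffs: five (p1, p2) pairs, one per replication of 2-fold CV, where
    p1 and p2 are the loss differences (classifier A minus classifier B)
    on the two folds of that replication.
    """
    if len(diffs) != 5 or any(len(pair) != 2 for pair in diffs):
        raise ValueError("expected five (fold 1, fold 2) pairs")
    variance_sum = 0.0
    for p1, p2 in diffs:
        mean = (p1 + p2) / 2
        # Per-replication variance estimate s_i^2
        variance_sum += (p1 - mean) ** 2 + (p2 - mean) ** 2
    denominator = math.sqrt(variance_sum / 5)
    if denominator == 0:
        raise ValueError("zero variance: the statistic is undefined")
    # Numerator is the difference from fold 1 of replication 1
    return diffs[0][0] / denominator
```

The difference is judged significant at the 95% level when the absolute value of the returned statistic exceeds 2.571.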
Demšar’s comparisons over multiple datasets (Demšar, 2006)
Counts of wins, losses and ties
This methodology could become the standard
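One simple way to use counts of wins, losses and ties is a sign test: if the two classifiers were equivalent, each would be expected to win on about half of the datasets. The sketch below is an illustration only. It assumes higher scores are better, and it drops ties before testing, which is a simplification (Demšar (2006) discusses other ways of handling ties). All names are hypothetical.

```python
from math import comb


def win_loss_tie(scores_a, scores_b):
    """Count datasets where A beats B, B beats A, and where they tie.

    Assumes one score per dataset for each classifier, higher is better.
    """
    wins = sum(a > b for a, b in zip(scores_a, scores_b))
    losses = sum(a < b for a, b in zip(scores_a, scores_b))
    ties = sum(a == b for a, b in zip(scores_a, scores_b))
    return wins, losses, ties


def sign_test_p_value(wins: int, losses: int) -> float:
    """Exact two-sided binomial sign test p-value, ties already dropped."""
    n = wins + losses
    if n == 0:
        raise ValueError("no wins or losses to compare")
    k = min(wins, losses)
    # Probability of a split at least this lopsided under Binomial(n, 0.5)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

For example, 9 wins and 1 loss over 10 datasets gives a p-value of 22/1024, roughly 0.021.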
How you keep the score…
Salzberg, S., (1997) On Comparing Classifiers: Pitfalls to Avoid and a Recommended Approach, Data Mining and Knowledge Discovery, 1, 317–327.
Dietterich, T.G., (1998) Approximate statistical tests for comparing supervised classification learning algorithms, Neural Computation, 10:1895–1924.
Demšar, J., (2006) Statistical Comparisons of Classifiers over Multiple Data Sets, Journal of Machine Learning Research, 7(Jan):1–30.