
Tutorial 2



Presentation Transcript


  1. Tutorial 2, LIU Tengfei, 2/19/2009

  2. Contents • Introduction • TP, FP, ROC • Precision, recall • Confusion matrix • Other performance measures • Resources

  3. Classifier output of Weka (1)

  4. Classifier output of Weka (2)

  5. TP rate, FP rate (1) Consider a diagnostic test: • A false positive (FP): the person tests positive but does not actually have the disease. • A false negative (FN): the person tests negative, suggesting they are healthy, but they actually do have the disease. Note: true positives and true negatives are defined analogously (the test result matches the person's actual condition).

  6. TP rate, FP rate (2) • TP rate = true positive rate • FP rate = false positive rate

  7. TP rate, FP rate (3) Definition: TP rate = TP / (TP + FN), FP rate = FP / (FP + TN). Both rates are taken from the actual-value point of view: each denominator counts the instances that are actually positive (TP + FN) or actually negative (FP + TN).
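A minimal sketch of these two definitions in Java (the `actual` and `predicted` arrays are hypothetical example data, not from the slides):

```java
public class Rates {
    public static void main(String[] args) {
        // Hypothetical example data: actual class and classifier prediction.
        boolean[] actual    = { true, true, false, false, true, false };
        boolean[] predicted = { true, false, true, false, true, false };

        int tp = 0, fp = 0, tn = 0, fn = 0;
        for (int i = 0; i < actual.length; i++) {
            if (actual[i] && predicted[i])        tp++; // true positive
            else if (!actual[i] && predicted[i])  fp++; // false positive
            else if (!actual[i] && !predicted[i]) tn++; // true negative
            else                                  fn++; // false negative
        }

        // Each denominator counts actual positives or actual negatives.
        double tpRate = (double) tp / (tp + fn);
        double fpRate = (double) fp / (fp + tn);
        System.out.printf("TP rate = %.3f, FP rate = %.3f%n", tpRate, fpRate);
    }
}
```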

  8. ROC curve (1) • ROC = receiver operating characteristic • Y-axis: TP rate, X-axis: FP rate

  9. ROC curve (2) Which method (A or B) is better? Compute the ROC area (the area under each ROC curve); the method with the larger area performs better overall.
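One way to compute the ROC area without drawing the curve is pair counting: the area under the ROC curve equals the probability that a randomly chosen positive instance is scored higher than a randomly chosen negative one. A sketch of that equivalence, using hypothetical scores and labels (not from the slides):

```java
public class RocArea {
    // AUC via pair counting: the fraction of (positive, negative) pairs
    // where the positive instance gets the higher score; ties count 0.5.
    static double auc(double[] scores, boolean[] positive) {
        double wins = 0;
        long pairs = 0;
        for (int i = 0; i < scores.length; i++) {
            if (!positive[i]) continue;
            for (int j = 0; j < scores.length; j++) {
                if (positive[j]) continue;
                pairs++;
                if (scores[i] > scores[j])       wins += 1.0;
                else if (scores[i] == scores[j]) wins += 0.5;
            }
        }
        return wins / pairs;
    }

    public static void main(String[] args) {
        double[] scores    = { 0.9, 0.8, 0.7, 0.4, 0.3 };   // hypothetical
        boolean[] positive = { true, true, false, true, false };
        System.out.printf("AUC = %.3f%n", auc(scores, positive)); // 0.833
    }
}
```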

  10. Precision, Recall (1) • Precision = TP / (TP + FP) • Recall = TP / (TP + FN) Precision is the probability that a retrieved document is relevant; recall is the probability that a relevant document is retrieved by the search.

  11. Precision, Recall (2) • F-measure = 2 * (precision * recall) / (precision + recall), the harmonic mean of precision and recall. • Precision, recall, and the F-measure come from the information retrieval domain.
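A minimal sketch of all three formulas, starting from hypothetical confusion-matrix counts:

```java
public class PrecisionRecall {
    public static void main(String[] args) {
        // Hypothetical counts from a confusion matrix.
        int tp = 40, fp = 10, fn = 20;

        double precision = (double) tp / (tp + fp);   // 40/50 = 0.800
        double recall    = (double) tp / (tp + fn);   // 40/60 = 0.667
        // F-measure: the harmonic mean of precision and recall.
        double f = 2 * precision * recall / (precision + recall);

        System.out.printf("precision = %.3f, recall = %.3f, F = %.3f%n",
                          precision, recall, f);
    }
}
```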

  12. Confusion matrix • Example: running the J48 decision tree learner on iris.arff
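The transcript omits the screenshot of this slide, but a confusion matrix like the one shown can be produced programmatically. This is a sketch against the standard Weka 3 Java API (Evaluation, J48, DataSource), offered as a reconstruction rather than code from the slides:

```java
import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class ConfusionDemo {
    public static void main(String[] args) throws Exception {
        // Load the iris data set; the last attribute is the class.
        Instances data = DataSource.read("iris.arff");
        data.setClassIndex(data.numAttributes() - 1);

        // 10-fold cross-validation of a J48 decision tree.
        J48 tree = new J48();
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(tree, data, 10, new Random(1));

        // Print the summary and confusion matrix, as in Weka's GUI output.
        System.out.println(eval.toSummaryString());
        System.out.println(eval.toMatrixString());
    }
}
```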

  13. Other performance measures • In the formulas on this slide, p denotes the predicted values and a the actual values.
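The slide's formulas are not reproduced in this transcript. Assuming they are the standard numeric-prediction error measures from the cited textbook chapter, such as the mean absolute error and root mean squared error over predictions p and actual values a, a sketch would look like this:

```java
public class ErrorMeasures {
    public static void main(String[] args) {
        // Hypothetical predicted (p) and actual (a) values.
        double[] p = { 2.5, 0.0, 2.0, 8.0 };
        double[] a = { 3.0, -0.5, 2.0, 7.0 };

        double absSum = 0, sqSum = 0;
        for (int i = 0; i < p.length; i++) {
            absSum += Math.abs(p[i] - a[i]);
            sqSum  += (p[i] - a[i]) * (p[i] - a[i]);
        }
        // Mean absolute error and root mean squared error.
        double mae  = absSum / p.length;
        double rmse = Math.sqrt(sqSum / p.length);
        System.out.printf("MAE = %.3f, RMSE = %.3f%n", mae, rmse);
    }
}
```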

  14. Resources 1. Wikipedia page for TP, FP, and ROC 2. Wikipedia page for precision and recall 3. Ian H. Witten and Eibe Frank, Data Mining: Practical Machine Learning Tools and Techniques (Second Edition), Chapter 5

  15. Thank you!
