Considering Cost Asymmetry in Learning Classifiers

Considering Cost Asymmetry in Learning Classifiers

by Bach, Heckerman and Horvitz

Presented by Chunping Wang

Machine Learning Group, Duke University

May 21, 2007


Outline

  • Introduction

  • SVM with Asymmetric Cost

  • SVM Regularization Path (Hastie et al., 2005)

  • Path with Cost Asymmetry

  • Results

  • Conclusions


Introduction (1)

Binary classification: real-valued predictors $x \in \mathbb{R}^p$ and a binary response $y \in \{-1, +1\}$.

A classifier can be defined as $\hat{y}(x) = \mathrm{sign}(f(x))$, based on a linear decision function $f(x) = w^\top x + b$.

Parameters: $w \in \mathbb{R}^p$ and the intercept $b \in \mathbb{R}$.


Introduction (2)

  • Two types of misclassification:

  • false negative ($y = +1$ predicted as $-1$): cost $C_+$

  • false positive ($y = -1$ predicted as $+1$): cost $C_-$

Expected cost: $R(f) = C_+\,P\big(y = +1,\, f(x) < 0\big) + C_-\,P\big(y = -1,\, f(x) > 0\big)$.

In terms of the 0-1 loss function $\phi_{0\text{-}1}(u) = \mathbb{1}[u \le 0]$, this is $R(f) = \mathbb{E}\big[C_y\,\phi_{0\text{-}1}(y f(x))\big]$.

The 0-1 loss is the real loss of interest, but it is non-convex and non-differentiable, so minimizing it directly is hard.
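As a concrete illustration (not on the original slide), here is a minimal sketch of the empirical version of this cost; the names `c_fn` and `c_fp` for the two misclassification costs are my own:

```python
import numpy as np

def asymmetric_01_cost(y_true, scores, c_fn, c_fp):
    """Empirical asymmetric 0-1 cost of sign(scores) for labels in {-1, +1}.

    c_fn: cost of a false negative (y = +1 predicted as -1)
    c_fp: cost of a false positive (y = -1 predicted as +1)
    """
    y_pred = np.where(scores >= 0, 1, -1)
    fn_rate = np.mean((y_true == 1) & (y_pred == -1))
    fp_rate = np.mean((y_true == -1) & (y_pred == 1))
    return c_fn * fn_rate + c_fp * fp_rate
```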


Introduction (3)

Convex loss functions serve as surrogates for the 0-1 loss function (for training purposes). Common choices, each evaluated at the margin $u = y f(x)$, include the hinge loss $(1-u)_+$, the squared hinge loss, the logistic loss $\log(1+e^{-u})$, and the exponential loss $e^{-u}$. A sketch of these surrogates follows.
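The original slide showed these surrogates as a plot; as a stand-in, here is a minimal sketch of the usual candidates:

```python
import numpy as np

# Common convex surrogates phi(u) for the 0-1 loss, evaluated at the
# margin u = y * f(x); each upper-bounds (a scaled version of) 1[u <= 0].
def hinge(u):         return np.maximum(0.0, 1.0 - u)       # SVM
def squared_hinge(u): return np.maximum(0.0, 1.0 - u) ** 2
def logistic(u):      return np.log1p(np.exp(-u))           # logistic regression
def exponential(u):   return np.exp(-u)                     # AdaBoost
```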


Introduction (4)

Empirical cost given $n$ labeled data points $(x_i, y_i)$: $\hat{R}(f) = \frac{1}{n}\sum_{i=1}^{n} C_{y_i}\,\mathbb{1}[y_i f(x_i) \le 0]$.

Objective function (one common parameterization, with $\gamma \in (0,1)$ controlling the asymmetry and $\lambda$ the amount of regularization):

$$\min_{w,b}\;\; \gamma \sum_{i \in I_+} \phi\big(y_i f(x_i)\big) + (1-\gamma) \sum_{i \in I_-} \phi\big(y_i f(x_i)\big) + \frac{\lambda}{2}\,\|w\|^2,$$

where $I_\pm = \{i : y_i = \pm 1\}$ and $\phi$ is a convex surrogate.

Since convex surrogates of the 0-1 loss function are used for training, the cost asymmetries for training and testing are mismatched: the best training asymmetry need not equal the asymmetry used at test time.

Motivation: efficiently examine many training asymmetries, even when the testing asymmetry is given.
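A minimal sketch of this objective with the hinge surrogate, assuming the parameterization above in which $\gamma$ weights the positive class (names are my own):

```python
import numpy as np

def svm_objective(w, b, X, y, gamma, lam):
    """Regularized, asymmetry-weighted empirical hinge objective."""
    margins = y * (X @ w + b)
    weights = np.where(y == 1, gamma, 1.0 - gamma)   # gamma vs. 1 - gamma
    return np.sum(weights * np.maximum(0.0, 1.0 - margins)) + 0.5 * lam * (w @ w)
```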


SVM with Asymmetric Cost (1)

With the hinge loss $\phi(u) = \max(0,\, 1-u)$, the SVM with asymmetric cost is

$$\min_{w,b,\xi}\;\; \frac{1}{2}\|w\|^2 + C_+\sum_{i \in I_+}\xi_i + C_-\sum_{i \in I_-}\xi_i \qquad \text{s.t.}\;\; y_i(w^\top x_i + b) \ge 1 - \xi_i,\;\; \xi_i \ge 0,
$$

where $C_+$ and $C_-$ are the per-class costs (e.g., $C_+ = C\gamma$ and $C_- = C(1-\gamma)$).
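For intuition (not part of the slides), a hedged sketch of fitting such an asymmetric-cost SVM with scikit-learn, whose `class_weight` argument scales `C` per class; the asymmetry value 0.7 is an arbitrary example:

```python
from sklearn.svm import SVC

C, gamma_asym = 1.0, 0.7   # hypothetical cost asymmetry in (0, 1)
clf = SVC(kernel="linear", C=C,
          class_weight={1: gamma_asym, -1: 1.0 - gamma_asym})
# clf.fit(X_train, y_train) with labels in {-1, +1};
# effective per-class costs are C * class_weight[label].
```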


SVM with Asymmetric Cost (2)

The Lagrangian, with dual variables $\alpha_i \ge 0$ and $\mu_i \ge 0$:

$$L = \frac{1}{2}\|w\|^2 + \sum_i C_{y_i}\xi_i - \sum_i \alpha_i\big[y_i(w^\top x_i + b) - 1 + \xi_i\big] - \sum_i \mu_i \xi_i.$$

Karush-Kuhn-Tucker (KKT) conditions: stationarity gives $w = \sum_i \alpha_i y_i x_i$, $\sum_i \alpha_i y_i = 0$, and $\alpha_i + \mu_i = C_{y_i}$ (hence $0 \le \alpha_i \le C_{y_i}$); complementary slackness gives $\alpha_i = 0 \Rightarrow y_i f(x_i) \ge 1$, $\;0 < \alpha_i < C_{y_i} \Rightarrow y_i f(x_i) = 1$, and $\alpha_i = C_{y_i} \Rightarrow y_i f(x_i) \le 1$.


SVM with Asymmetric Cost (3)

The dual problem:

$$\max_{\alpha}\;\; \sum_i \alpha_i - \frac{1}{2}\sum_{i,j}\alpha_i\alpha_j\, y_i y_j\, x_i^\top x_j \qquad \text{s.t.}\;\; 0 \le \alpha_i \le C_{y_i},\;\; \sum_i \alpha_i y_i = 0,$$

where $C_{y_i}$ equals $C_+$ for positive examples and $C_-$ for negative ones.

This is a quadratic optimization problem for a single, fixed cost structure; re-solving it over the whole space of cost settings would be computationally intractable.

Following the SVM regularization path algorithm (Hastie et al., 2005), the authors work with conditions (1)-(3) and the KKT conditions instead of solving the dual problem repeatedly.
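For reference, a sketch of solving this dual for one fixed cost pair with cvxpy; this is the brute-force baseline, not the path algorithm the slides develop:

```python
import numpy as np
import cvxpy as cp

def solve_dual(X, y, c_pos, c_neg):
    """Solve the dual QP above for a single cost pair (c_pos, c_neg)."""
    n = len(y)
    G = y[:, None] * X                       # row i is y_i * x_i
    upper = np.where(y == 1, c_pos, c_neg)   # per-point box bound C_{y_i}
    alpha = cp.Variable(n)
    # alpha^T Q alpha = ||G^T alpha||^2, with Q_ij = y_i y_j x_i^T x_j
    objective = cp.Maximize(cp.sum(alpha) - 0.5 * cp.sum_squares(G.T @ alpha))
    constraints = [alpha >= 0, alpha <= upper, y @ alpha == 0]
    cp.Problem(objective, constraints).solve()
    return alpha.value
```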


SVM Regularization Path (1)

  • Define active sets of data points by the margin value $y_i f(x_i)$:

  • Margin: $M = \{i : y_i f(x_i) = 1\}$

  • Left of margin: $L = \{i : y_i f(x_i) < 1\}$

  • Right of margin: $R = \{i : y_i f(x_i) > 1\}$

The KKT conditions then pin down the dual variables off the margin: $\alpha_i = C_{y_i}$ for $i \in L$, $\alpha_i = 0$ for $i \in R$, and $\alpha_i \in [0, C_{y_i}]$ for $i \in M$.

In the SVM regularization path of Hastie et al., the cost is symmetric ($C_+ = C_-$), so the search proceeds along a single axis: the regularization parameter $\lambda$ (equivalently $C$).


SVM Regularization Path (2)

Initialization ($n_+ = n_-$, following Hastie et al.):

For $\lambda$ sufficiently large ($C$ very small), all the points are in $L$, with every $\alpha_i$ at its upper bound.

Decrease $\lambda$: the sets and the $\alpha_i$ remain unchanged until one or more positive and negative examples hit the margin simultaneously (the equality constraint $\sum_i \alpha_i y_i = 0$ forces the first event to involve both classes).


SVM Regularization Path (3)

Initialization ($n_+ = n_-$), continued:

Define $g_i = \sum_j y_j\, x_j^\top x_i$, the decision values (up to scaling) when all points are at the upper bound.

The critical condition for the first two points hitting the margin, one positive and one negative, gives the starting value $\lambda_0 = \tfrac{1}{2}\big(\max_{i \in I_+} g_i - \min_{j \in I_-} g_j\big)$, with the intercept chosen so that both points have margin exactly 1.

For $n_+ \ne n_-$, this initial condition stays the same except for the definition of the initial $\alpha_i$: a small optimization determines which points of the larger class start below the upper bound.
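A minimal sketch of this initialization for the linear kernel, under the $n_+ = n_-$ reconstruction above (my own reading of Hastie et al., so treat as an assumption):

```python
import numpy as np

def initial_lambda(X, y):
    """Starting breakpoint lambda_0 for the symmetric path when n+ = n-."""
    g = X @ (X.T @ y)   # g_i = sum_j y_j x_j^T x_i
    return 0.5 * (g[y == 1].max() - g[y == -1].min())
```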


SVM Regularization Path (4)

  • The path: as $\lambda$ decreases, $\alpha_i$ changes only for $i \in M$, until one of the following events happens (a sketch of the per-step linear system follows this slide):

  • A point from $L$ or $R$ enters $M$;

  • A point in $M$ leaves the set to join either $R$ or $L$.

Between events, consider only the points on the margin: the conditions $y_i f(x_i) = 1$ for $i \in M$ form a linear system, and the decision function evolves as

$$f(x) = \frac{\lambda_l}{\lambda}\big[f_l(x) - h_l(x)\big] + h_l(x),$$

where $\lambda_l$ is the value of $\lambda$ at the previous event, $f_l$ the decision function there, and $h_l$ is some function determined by the margin points alone.

Therefore, the $\alpha_i$ for points on the margin proceed linearly in $\lambda$; the function $f$ changes in a piecewise-inverse manner in $\lambda$.
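A hedged sketch of the linear system behind this step, in my own notation (`K` the full Gram matrix, `M` the current margin indices); this follows my reading of Hastie et al., not code from the paper:

```python
import numpy as np

def margin_direction(K, y, M):
    """Per-unit-lambda movement of the margin-set alphas and intercept,
    from the conditions y_i f(x_i) = 1 for i in M."""
    yM = np.asarray(y)[M]
    m = len(M)
    A = np.zeros((m + 1, m + 1))
    A[:m, :m] = K[np.ix_(M, M)] * np.outer(yM, yM)  # y_i y_j K(x_i, x_j)
    A[:m, m] = yM      # intercept column
    A[m, :m] = yM      # equality constraint sum_i y_i alpha_i = 0
    rhs = np.concatenate([np.ones(m), [0.0]])
    return np.linalg.solve(A, rhs)   # first m entries: d alpha; last: d b
```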



SVM Regularization Path (5)

  • Update regularization: among all candidate events, move $\lambda$ to the largest candidate value below the current one.

  • Update active sets and solutions at that event.

  • Stopping condition:

  • In the separable case, we terminate when $L$ becomes empty;

  • In the non-separable case, we terminate when the candidate $\lambda \le 0$ for all the possible events.


Path with Cost Asymmetry (1)

Exploration in the 2-d space of cost settings $(C_+, C_-)$.

Path initialization: start at situations where all points are in $L$.

Follow the updating procedure of the 1-d case along a line of fixed asymmetry (e.g., $C_-/C_+$ held constant): the regularization changes while the cost asymmetry stays fixed.

Among all the classifiers obtained along these paths, find the best one given the user's cost function.

[Figure: example paths in the 2-d cost plane, one line per training asymmetry, starting from the all-points-in-$L$ initialization.]


Path with Cost Asymmetry (2)

Producing ROC curves: collecting the $R$ lines in the direction of varying asymmetry, we can build three ROC curves (a brute-force analogue is sketched below).
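As a brute-force stand-in for the path (one refit per asymmetry, whereas the paper's algorithm obtains all operating points from a single path computation), a hedged sketch of sweeping training asymmetries into ROC points:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix

def roc_over_asymmetries(X_tr, y_tr, X_te, y_te, n_gammas=20):
    """Collect (FPR, TPR) points, one per training cost asymmetry."""
    points = []
    for g in np.linspace(0.05, 0.95, n_gammas):
        clf = SVC(kernel="linear", C=1.0, class_weight={1: g, -1: 1.0 - g})
        clf.fit(X_tr, y_tr)
        tn, fp, fn, tp = confusion_matrix(
            y_te, clf.predict(X_te), labels=[-1, 1]).ravel()
        points.append((fp / (fp + tn), tp / (tp + fn)))
    return sorted(points)
```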


Results (1)

  • For 1000 testing asymmetries, three methods are compared:

  • “one” – take the testing asymmetry itself as the training cost asymmetry;

  • “int” – vary the intercept of the “one” classifier to build an ROC curve, then select the optimal classifier;

  • “all” – select the optimal classifier from the ROC curve obtained by varying both the training asymmetry and the intercept.

  • Use nested cross-validation (a sketch follows):

  • The outer cross-validation produces overall accuracy estimates for the classifier;

  • The inner cross-validation selects the optimal classifier parameters (training asymmetry and/or intercept).
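A hedged sketch of this nested protocol with scikit-learn, treating per-class weights as the training-asymmetry parameter (the grid values are arbitrary; a cost-sensitive scorer could replace the default accuracy):

```python
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

param_grid = {"class_weight": [{1: g, -1: 1.0 - g}
                               for g in (0.1, 0.3, 0.5, 0.7, 0.9)]}
inner = GridSearchCV(SVC(kernel="linear", C=1.0), param_grid, cv=5)
# Outer loop: unbiased accuracy estimates for the tuned classifier.
# scores = cross_val_score(inner, X, y, cv=5)
```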



Conclusions

  • An efficient algorithm is presented for building ROC curves by varying the training cost asymmetries for SVMs.

  • The main contribution is generalizing the SVM regularization path (Hastie et al., 2005) from a 1-d axis to a 2-d plane.

  • Because a convex surrogate of the 0-1 loss is used, training with the testing asymmetry leads to a non-optimal classifier.

  • The results show the advantage of considering many training asymmetries.

