Considering Cost Asymmetry in Learning Classifiers


by Bach, Heckerman and Horvitz

Presented by Chunping Wang

Machine Learning Group, Duke University

May 21, 2007

Outline
• Introduction
• SVM with Asymmetric Cost
• SVM Regularization Path (Hastie et al., 2005)
• Path with Cost Asymmetry
• Results
• Conclusions
Introduction (1)

Binary classification
• real-valued predictors $x \in \mathbb{R}^d$
• binary response $y \in \{-1, +1\}$

A classifier can be defined as $\hat{y} = \mathrm{sign}(f(x))$, based on a linear decision function $f(x) = w^\top x + b$.

Parameters: $w \in \mathbb{R}^d$, $b \in \mathbb{R}$.

Introduction (2)
• Two types of misclassification:
• false negative (a true positive classified as negative): cost $C_+$
• false positive (a true negative classified as positive): cost $C_-$

Expected cost:
$R(f) = C_+ \, P\big(y = +1,\; f(x) \le 0\big) + C_- \, P\big(y = -1,\; f(x) > 0\big)$

In terms of the 0-1 loss function $\phi_{0/1}(u) = 1_{\{u \le 0\}}$ applied to the margin $u = y f(x)$:
$R(f) = E\big[\, C_y \, \phi_{0/1}(y f(x)) \,\big]$

This is the real loss function of interest, but it is non-convex and non-differentiable, hence hard to minimize directly.

Introduction (3)

Convex loss functions serve as surrogates for the 0-1 loss function (for training purposes), e.g. the hinge loss $(1-u)_+$, the logistic loss $\log(1 + e^{-u})$, and the exponential loss $e^{-u}$, all as functions of the margin $u = y f(x)$.
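As a quick numerical illustration of the surrogate idea, here is a minimal sketch (the function names are ours) computing the 0-1 loss and two standard convex surrogates on the margin $u = y f(x)$:

```python
import numpy as np

def zero_one(u):
    """0-1 loss on the margin u = y * f(x): non-convex, non-differentiable."""
    return (u <= 0).astype(float)

def hinge(u):
    """Hinge loss (1 - u)_+, the surrogate used by the SVM."""
    return np.maximum(0.0, 1.0 - u)

def logistic(u):
    """Logistic loss log(1 + e^{-u}), another common convex surrogate."""
    return np.log1p(np.exp(-u))

u = np.array([-1.0, 0.0, 0.5, 2.0])
print(zero_one(u))  # [1. 1. 0. 0.]
print(hinge(u))
print(logistic(u))
```

Both surrogates are convex and differentiable almost everywhere, which is what makes the training problem tractable.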

Introduction (4)

Empirical cost, given $n$ labeled data points $(x_i, y_i)_{i=1}^n$:
$\hat{R}(f) = \frac{1}{n} \sum_{i=1}^{n} C_{y_i} \, 1_{\{y_i f(x_i) \le 0\}}$

Objective function (0-1 loss replaced by a convex surrogate $\phi$):
$\min_{w,b} \;\; \gamma \sum_{i \in I_+} \phi(y_i f(x_i)) \;+\; (1-\gamma) \sum_{i \in I_-} \phi(y_i f(x_i)) \;+\; \frac{\lambda}{2} \|w\|^2$

where $\gamma \in (0,1)$ controls the cost asymmetry, $\lambda$ the amount of regularization, and $I_\pm = \{i : y_i = \pm 1\}$.
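The empirical asymmetric cost can be evaluated directly; a minimal sketch (function and variable names are ours), charging $C_+$ per false negative and $C_-$ per false positive:

```python
import numpy as np

def empirical_cost(y, f_x, c_pos, c_neg):
    """Average asymmetric 0-1 cost: c_pos per false negative, c_neg per false positive."""
    false_neg = (y == 1) & (f_x <= 0)   # positives predicted negative
    false_pos = (y == -1) & (f_x > 0)   # negatives predicted positive
    return (c_pos * false_neg.sum() + c_neg * false_pos.sum()) / len(y)

y = np.array([1, 1, -1, -1])
f_x = np.array([0.5, -0.2, 0.3, -1.0])  # one false negative, one false positive
print(empirical_cost(y, f_x, c_pos=2.0, c_neg=1.0))  # (2.0 + 1.0) / 4 = 0.75
```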

Since convex surrogates of the 0-1 loss function are used for training, the optimal training asymmetry generally differs from the testing asymmetry: the two are mismatched.

Motivation: efficiently examine many training asymmetries even when the testing asymmetry is given.

SVM with Asymmetric Cost (1)

hinge loss: $\phi(u) = (1-u)_+ = \max(0,\, 1-u)$

SVM with asymmetric cost:
$\min_{w,b,\xi} \;\; \frac{1}{2}\|w\|^2 + C\gamma \sum_{i \in I_+} \xi_i + C(1-\gamma) \sum_{i \in I_-} \xi_i$
subject to $y_i(w^\top x_i + b) \ge 1 - \xi_i$ and $\xi_i \ge 0$,

where $I_+ = \{i : y_i = +1\}$, $I_- = \{i : y_i = -1\}$, and the per-class slack penalties are $C_+ = C\gamma$ and $C_- = C(1-\gamma)$.

SVM with Asymmetric Cost (2)

The Lagrangian, with dual variables $\alpha_i \ge 0$ and $\mu_i \ge 0$:
$\mathcal{L} = \frac{1}{2}\|w\|^2 + \sum_i C_{y_i} \xi_i - \sum_i \alpha_i \big[ y_i(w^\top x_i + b) - 1 + \xi_i \big] - \sum_i \mu_i \xi_i$

Karush-Kuhn-Tucker (KKT) conditions:
$w = \sum_i \alpha_i y_i x_i$, $\quad \sum_i \alpha_i y_i = 0$, $\quad \alpha_i + \mu_i = C_{y_i}$,
with complementary slackness $\alpha_i \big[ y_i f(x_i) - 1 + \xi_i \big] = 0$ and $\mu_i \xi_i = 0$.

SVM with Asymmetric Cost (3)

The dual problem:
$\max_{\alpha} \;\; \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j x_i^\top x_j \quad \text{s.t.} \quad 0 \le \alpha_i \le C_{y_i}, \;\; \sum_i \alpha_i y_i = 0$

where $C_{y_i} = C\gamma$ for $y_i = +1$ and $C_{y_i} = C(1-\gamma)$ for $y_i = -1$.

This is a quadratic optimization problem for a given cost structure $(\gamma, C)$; re-solving it over the whole $(\gamma, C)$ space would be computationally intractable.

Following the SVM regularization path algorithm (Hastie et al., 2005), the authors deal with (1)-(3) and KKT conditions instead of the dual problem.
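For a single fixed cost structure, the dual above is a small box-constrained QP and can be solved directly; a sketch using SciPy's SLSQP on toy data (the $C\gamma$ / $C(1-\gamma)$ bounds follow the parameterization assumed above):

```python
import numpy as np
from scipy.optimize import minimize

# Toy linearly separable data: two positives, two negatives.
X = np.array([[1.0, 1.0], [2.0, 0.5], [-1.0, -1.0], [-0.5, -2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

C, g = 1.0, 0.7                                  # regularization and asymmetry
upper = np.where(y > 0, C * g, C * (1.0 - g))    # per-class box constraints on alpha

Q = (y[:, None] * X) @ (y[:, None] * X).T        # Q_ij = y_i y_j x_i . x_j

def dual_obj(a):
    # Negated dual: minimize 1/2 a'Qa - sum(a)  <=>  maximize the dual.
    return 0.5 * a @ Q @ a - a.sum()

res = minimize(dual_obj, x0=np.zeros(len(y)), method="SLSQP",
               bounds=[(0.0, u) for u in upper],
               constraints={"type": "eq", "fun": lambda a: a @ y})

w = ((res.x * y)[:, None] * X).sum(axis=0)       # KKT: w = sum_i alpha_i y_i x_i
print(res.x, w)
```

The path algorithm exists precisely to avoid re-running such a solver at every $(\gamma, C)$ of interest.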

SVM Regularization Path (1)
• Define active sets of data points:
• Margin: $M = \{\, i : y_i f(x_i) = 1 \,\}$
• Left of margin: $L = \{\, i : y_i f(x_i) < 1 \,\}$
• Right of margin: $R = \{\, i : y_i f(x_i) > 1 \,\}$

The KKT conditions imply $\alpha_i = C_{y_i}$ for $i \in L$, $\alpha_i = 0$ for $i \in R$, and $0 \le \alpha_i \le C_{y_i}$ for $i \in M$.

In the SVM regularization path of Hastie et al. (2005), the cost is symmetric, so the search is along a single axis: the regularization parameter.
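The active sets $M$, $L$, $R$ can be read off directly from the margins $y_i f(x_i)$; a minimal sketch (the tolerance `tol` for the equality test is our assumption):

```python
import numpy as np

def active_sets(y, f_x, tol=1e-8):
    """Partition indices by margin y_i * f(x_i): M (= 1), L (< 1), R (> 1)."""
    m = y * f_x
    M = np.where(np.abs(m - 1.0) <= tol)[0]
    L = np.where(m < 1.0 - tol)[0]
    R = np.where(m > 1.0 + tol)[0]
    return M, L, R

y = np.array([1.0, 1.0, -1.0, -1.0])
f_x = np.array([1.0, 0.2, -2.0, 0.5])   # margins: 1.0, 0.2, 2.0, -0.5
M, L, R = active_sets(y, f_x)
print(M, L, R)  # [0] [1 3] [2]
```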

SVM Regularization Path (2)

Initialization (regularization sufficiently large, i.e. C very small)

For sufficiently large regularization, all the points are in L, with every $\alpha_i$ at its upper bound $C_{y_i}$.

Decrease the regularization: the active sets remain unchanged until one or more positive and negative examples hit the margin simultaneously.

SVM Regularization Path (3)

Initialization (continued)

Define

The critical condition determines when the first two points (one positive and one negative) hit the margin.

For the asymmetric case, this initial condition remains the same except for the definition of the corresponding quantity.

SVM Regularization Path (4)
• The path: as the regularization decreases, $\alpha_i$ changes only for $i \in M$, until one of the following events happens:
• A point from L or R enters M;
• A point in M leaves the set to join either R or L.

Between events, consider only the points on the margin: solving $y_i f(x_i) = 1$ for $i \in M$, together with $\sum_i \alpha_i y_i = 0$, gives a linear system in which each $\alpha_i$ is an affine function of the regularization parameter.

Therefore, the $\alpha_i$ for points on the margin proceed linearly in the regularization parameter, and the function $f$ changes in a piecewise-inverse manner in it.


SVM Regularization Path (5)
• At each event, update the regularization value
• Update the active sets and the solution
• Stopping condition:
• In the separable case, we terminate when L becomes empty;
• In the non-separable case, we terminate when no further admissible event exists among all the possible events.

Path with Cost Asymmetry (1)

Exploration in the 2-d (asymmetry, regularization) space

Path initialization: start at situations where all points are in L.

Follow the updating procedure of the 1-d case along a line in the plane: the regularization changes while the cost asymmetry is fixed.

Among all the classifiers on these paths, find the best one given the user's cost function.

Paths are started from multiple initial asymmetries.

Path with Cost Asymmetry (2)

Producing ROC curves

Collecting the R lines in the direction of varying asymmetry, we can build three ROC curves.
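Each classifier along a path contributes one (false-positive rate, true-positive rate) operating point; sweeping the intercept over one classifier's decision values is just thresholding its scores, which is what `sklearn.metrics.roc_curve` computes. A sketch on synthetic scores:

```python
import numpy as np
from sklearn.metrics import roc_curve, auc

# Decision values f(x_i) from one trained classifier (synthetic here);
# sweeping the intercept b is equivalent to thresholding these scores.
y_true = np.array([1, 1, 1, -1, -1, -1])
scores = np.array([2.1, 0.8, -0.3, 0.4, -1.2, -2.0])

fpr, tpr, thresholds = roc_curve(y_true, scores)
print(auc(fpr, tpr))
```

Collecting such points across both the intercept and the training asymmetry gives the richer ROC used by the "all" method below.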

Results (1)
• For 1000 testing asymmetries, three methods are compared:
• “one” – take the testing asymmetry as the training cost asymmetry;
• “int” – vary the intercept of “one” and build an ROC, then select the optimal classifier;
• “all” – select the optimal classifier from the ROC obtained by varying both the training asymmetry and the intercept.
• Use a nested cross-validation:
• The outer cross-validation produces overall accuracy estimates for the classifier;
• The inner cross-validation selects the optimal classifier parameters (training asymmetry and/or intercept).
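The nested scheme can be sketched with scikit-learn: `GridSearchCV` plays the inner loop (here selecting a training asymmetry via class weights, our stand-in parameterization) and `cross_val_score` the outer loop; the data is synthetic:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV, cross_val_score

rng = np.random.RandomState(1)
X = np.vstack([rng.randn(60, 2) + 1.0, rng.randn(60, 2) - 1.0])
y = np.hstack([np.ones(60), -np.ones(60)])

# Inner CV: pick the training asymmetry (class weight) on each training fold.
grid = {"class_weight": [{1: g, -1: 1.0 - g} for g in (0.2, 0.5, 0.8)]}
inner = GridSearchCV(SVC(kernel="linear", C=1.0), grid, cv=3)

# Outer CV: unbiased accuracy estimate for the whole selection procedure.
scores = cross_val_score(inner, X, y, cv=5)
print(scores.mean())
```

Running the selection inside the outer folds is what keeps the accuracy estimate honest: the test fold never influences the chosen asymmetry.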
Conclusions
• An efficient algorithm is presented to build ROC curves by varying the training cost asymmetries for SVMs.
• The main contribution is generalizing the SVM regularization path (Hastie et al., 2005) from a 1-d axis to a 2-d plane.
• Because a convex surrogate of the 0-1 loss is used for training, taking the testing asymmetry as the training asymmetry leads to a non-optimal classifier.
• Results show advantages of considering more training asymmetries.