Considering Cost Asymmetry in Learning Classifiers

by Bach, Heckerman and Horvitz

Presented by Chunping Wang

Machine Learning Group, Duke University

May 21, 2007

Outline
  • Introduction
  • SVM with Asymmetric Cost
  • SVM Regularization Path (Hastie et al., 2005)
  • Path with Cost Asymmetry
  • Results
  • Conclusions
Introduction (1)

Binary classification:

  • real-valued predictors x ∈ ℝ^p
  • binary response y ∈ {−1, +1}

A classifier can be defined as ŷ = sign(f(x)), based on a linear decision function f(x) = wᵀx + b.

Parameters: (w, b), with w ∈ ℝ^p and b ∈ ℝ.

Introduction (2)
  • Two types of misclassification:
  • false negative (a positive example classified as negative): cost C₊
  • false positive (a negative example classified as positive): cost C₋

Expected cost: R(f) = C₊ P(y = +1, f(x) < 0) + C₋ P(y = −1, f(x) ≥ 0)

In terms of the 0-1 loss function φ₀₁(u) = 1{u ≤ 0}:

R(f) = E[ C₊ 1{y = +1} φ₀₁(f(x)) + C₋ 1{y = −1} φ₀₁(−f(x)) ]

This is the real loss of interest, but it is non-convex and non-differentiable.
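To make the cost concrete, a minimal sketch (ours, not the paper's) of the empirical asymmetric 0-1 cost; `c_fn` and `c_fp` are our names for the false-negative and false-positive costs:

```python
import numpy as np

def asymmetric_01_cost(y, scores, c_fn, c_fp):
    """Empirical asymmetric 0-1 cost.

    y      : labels in {-1, +1}
    scores : decision values f(x)
    c_fn   : cost of a false negative (y = +1, f(x) < 0)
    c_fp   : cost of a false positive (y = -1, f(x) >= 0)
    """
    y = np.asarray(y)
    scores = np.asarray(scores)
    fn = np.sum((y == 1) & (scores < 0))    # positives classified negative
    fp = np.sum((y == -1) & (scores >= 0))  # negatives classified positive
    return (c_fn * fn + c_fp * fp) / len(y)
```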

Introduction (3)

Convex loss functions are used as surrogates for the 0-1 loss function (for training purposes).
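The slide's plot of surrogate losses is not reproduced here; as a stand-in, a small sketch of common convex surrogates evaluated at the margin value u = y·f(x) (the hinge loss is the one used in the sequel):

```python
import numpy as np

def zero_one(u):     # the true (non-convex) loss
    return (u <= 0).astype(float)

def hinge(u):        # SVM surrogate, used below
    return np.maximum(0.0, 1.0 - u)

def logistic(u):     # logistic-regression surrogate
    return np.log1p(np.exp(-u))

def exponential(u):  # boosting surrogate
    return np.exp(-u)

u = np.linspace(-2, 2, 5)
for loss in (zero_one, hinge, logistic, exponential):
    print(loss.__name__, loss(u))
```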

Introduction (4)

Empirical cost given n labeled data points (I₊ and I₋ denote the positive and negative index sets):

R_n = C₊ Σ_{i∈I₊} 1{f(xᵢ) < 0} + C₋ Σ_{i∈I₋} 1{f(xᵢ) ≥ 0}

Objective function, with a convex surrogate φ:

min_{w,b}  C [ γ Σ_{i∈I₊} φ(f(xᵢ)) + (1 − γ) Σ_{i∈I₋} φ(−f(xᵢ)) ] + ½‖w‖²

where γ ∈ [0, 1] is the cost asymmetry and C > 0 sets the amount of regularization.

Since convex surrogates of the 0-1 loss function are used for training, the cost asymmetries for training and testing are mismatched: the best training asymmetry for a given testing asymmetry is generally not the testing asymmetry itself.

Motivation: efficiently explore many training asymmetries even when the testing asymmetry is given.

SVM with Asymmetric Cost (1)

Hinge loss: φ(u) = (1 − u)₊ = max(0, 1 − u)

SVM with asymmetric cost:

min_{w,b,ξ}  ½‖w‖² + C₊ Σ_{i∈I₊} ξᵢ + C₋ Σ_{i∈I₋} ξᵢ
subject to  yᵢ(wᵀxᵢ + b) ≥ 1 − ξᵢ,  ξᵢ ≥ 0

where C₊ = Cγ and C₋ = C(1 − γ).
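For comparison, the same formulation is available off the shelf; a sketch using scikit-learn's SVC, where `class_weight` rescales C per class so the effective costs are C₊ = Cγ and C₋ = C(1 − γ) (variable names and data are ours):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 1, (50, 2)), rng.normal(1, 1, (50, 2))])
y = np.array([-1] * 50 + [1] * 50)

C, gamma_asym = 1.0, 0.8  # total regularization and cost asymmetry
clf = SVC(kernel="linear", C=C,
          class_weight={1: gamma_asym, -1: 1.0 - gamma_asym})
clf.fit(X, y)
print(clf.decision_function(X[:5]))  # values of f(x) = w.x + b
```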

SVM with Asymmetric Cost (2)

The Lagrangian, with dual variables αᵢ ≥ 0 (margin constraints) and μᵢ ≥ 0 (slack constraints), yields the Karush-Kuhn-Tucker (KKT) conditions relating (w, b, ξ) to (α, μ).
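The slide's equations did not survive extraction; below is a standard reconstruction of the soft-margin Lagrangian and KKT conditions with per-class costs C_{yᵢ}, which is what the following slides rely on:

```latex
% Soft-margin Lagrangian with per-class costs (standard derivation)
\mathcal{L} = \tfrac{1}{2}\|w\|^2 + \sum_i C_{y_i}\xi_i
  - \sum_i \alpha_i\bigl[y_i(w^\top x_i + b) - 1 + \xi_i\bigr]
  - \sum_i \mu_i \xi_i

% Stationarity:
\partial_w \mathcal{L} = 0 \;\Rightarrow\; w = \sum_i \alpha_i y_i x_i, \qquad
\partial_b \mathcal{L} = 0 \;\Rightarrow\; \sum_i \alpha_i y_i = 0, \qquad
\partial_{\xi_i} \mathcal{L} = 0 \;\Rightarrow\; \alpha_i + \mu_i = C_{y_i}

% Complementary slackness:
\alpha_i\bigl[y_i(w^\top x_i + b) - 1 + \xi_i\bigr] = 0, \qquad \mu_i \xi_i = 0
```

Together these give the three regimes used later: αᵢ = 0 when yᵢ f(xᵢ) > 1, αᵢ = C_{yᵢ} when yᵢ f(xᵢ) < 1, and αᵢ ∈ [0, C_{yᵢ}] on the margin.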

SVM with Asymmetric Cost (3)

The dual problem:

max_α  Σᵢ αᵢ − ½ Σ_{i,j} αᵢ αⱼ yᵢ yⱼ K(xᵢ, xⱼ)
subject to  0 ≤ αᵢ ≤ C_{yᵢ},  Σᵢ αᵢ yᵢ = 0

where C_{yᵢ} = C₊ for i ∈ I₊ and C_{yᵢ} = C₋ for i ∈ I₋.

For a fixed cost structure (C₊, C₋) this is a quadratic optimization problem, but solving it separately over the whole (C₊, C₋) space would be intractable.

Following the SVM regularization path algorithm (Hastie et al., 2005), the authors work with conditions (1)-(3) and the KKT conditions instead of solving the dual problem repeatedly.
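For one fixed cost pair, the dual can be solved directly with a generic QP solver; a sketch using cvxpy (our illustration, not the paper's path algorithm):

```python
import numpy as np
import cvxpy as cp

def solve_asymmetric_svm_dual(X, y, c_pos, c_neg):
    """Solve the dual QP once, for one fixed cost pair (C+, C-)."""
    n = len(y)
    K = X @ X.T + 1e-8 * np.eye(n)       # linear kernel; tiny ridge keeps K PSD
    ub = np.where(y == 1, c_pos, c_neg)  # per-point upper bound C_{y_i}
    alpha = cp.Variable(n)
    objective = cp.Maximize(cp.sum(alpha)
                            - 0.5 * cp.quad_form(cp.multiply(y, alpha), K))
    constraints = [alpha >= 0, alpha <= ub, y @ alpha == 0]
    cp.Problem(objective, constraints).solve()
    return alpha.value
```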

SVM Regularization Path (1)
  • Define active sets of data points:
  • Margin: M = {i : yᵢ f(xᵢ) = 1}
  • Left of margin: L = {i : yᵢ f(xᵢ) < 1}
  • Right of margin: R = {i : yᵢ f(xᵢ) > 1}

KKT conditions (in Hastie et al.'s notation, f(x) = (1/λ)(β₀ + Σⱼ αⱼ yⱼ K(x, xⱼ)) with 0 ≤ αᵢ ≤ 1): αᵢ = 1 on L, αᵢ = 0 on R, αᵢ ∈ [0, 1] on M.

In the SVM regularization path of Hastie et al. the cost is symmetric, and thus the search is along a single axis (the regularization parameter λ).
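A sketch of the active-set bookkeeping, assuming precomputed decision values f(xᵢ) and a numerical tolerance (names are ours):

```python
import numpy as np

def active_sets(y, f, tol=1e-8):
    """Partition points by the margin value y_i * f(x_i)."""
    yf = y * f
    M = np.where(np.abs(yf - 1.0) <= tol)[0]  # on the margin
    L = np.where(yf < 1.0 - tol)[0]           # left of the margin
    R = np.where(yf > 1.0 + tol)[0]           # right of the margin
    return M, L, R
```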

SVM Regularization Path (2)

Initialization (n₊ = n₋)

Consider λ sufficiently large (C is very small): all the points are in L, with αᵢ = 1 for all i.

Decrease λ: all αᵢ remain equal to 1 until one or more positive and negative examples hit the margin simultaneously.

SVM Regularization Path (3)

Initialization (n₊ = n₋), continued

Define g(xᵢ) = Σⱼ yⱼ K(xᵢ, xⱼ) (all αⱼ = 1).

The critical condition for the first two points hitting the margin:

λ₀ = max_{i∈I₊, j∈I₋} (g(xᵢ) − g(xⱼ)) / 2,  with β₀ = −(g(xᵢ) + g(xⱼ)) / 2 at the maximizing pair.

For n₊ ≠ n₋, this initial condition stays the same except for the definition of the initial αᵢ.
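A sketch of this initialization as we read Hastie et al. (2005), for the balanced case n₊ = n₋ with a precomputed kernel matrix (function name is ours):

```python
import numpy as np

def path_init_balanced(K, y):
    """Initial (lambda0, beta0) for the SVM path, n+ == n- case.

    K : (n, n) kernel matrix; y : labels in {-1, +1}.
    With all alpha_i = 1, g_i = sum_j y_j K_ij; the first margin hits come
    from the pair maximizing (g_i - g_j)/2 over positives i, negatives j.
    """
    g = K @ y
    i = np.argmax(np.where(y == 1, g, -np.inf))   # positive with largest g
    j = np.argmin(np.where(y == -1, g, np.inf))   # negative with smallest g
    lam0 = (g[i] - g[j]) / 2.0
    beta0 = -(g[i] + g[j]) / 2.0
    return lam0, beta0
```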

SVM Regularization Path (4)
  • The path: decrease λ; αᵢ changes only for i ∈ M, except when one of the following events happens:
  • A point from L or R has entered M;
  • A point in M has left the set to join either R or L.

Between events, consider only the points on the margin: yᵢ f(xᵢ) = 1 for i ∈ M gives a linear system in (α_M, β₀) whose right-hand side is linear in λ, so

αⱼ(λ) = αⱼ(λ₀) + (λ − λ₀) bⱼ,  j ∈ M,

where bⱼ is some function of the kernel matrix and the current active sets.

Therefore, the αⱼ for points on the margin proceed linearly in λ; the function f changes in a piecewise-inverse manner in λ (each piece is affine in 1/λ).
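A sketch of that linear system (our code, Hastie et al.'s notation); it returns only the direction d(α_M, β₀)/dλ, since the contribution of the fixed sets L and R is constant between events:

```python
import numpy as np

def margin_direction(K, y, M):
    """Direction d(alpha_M, beta0)/d(lambda) between two events.

    The margin equations y_i (beta0 + sum_j alpha_j y_j K_ij) = lambda,
    i in M, plus the constraint sum_j alpha_j y_j = 0, are linear in
    (alpha_M, beta0) with a right-hand side linear in lambda, so
    differentiating w.r.t. lambda gives A d = (1, ..., 1, 0).
    """
    m = len(M)
    A = np.zeros((m + 1, m + 1))
    A[:m, :m] = y[M, None] * y[None, M] * K[np.ix_(M, M)]
    A[:m, m] = y[M]          # coefficient of beta0 in each margin equation
    A[m, :m] = y[M]          # equality constraint row
    rhs = np.zeros(m + 1)
    rhs[:m] = 1.0            # d(lambda)-coefficient of each margin equation
    return np.linalg.solve(A, rhs)   # last entry: d(beta0)/d(lambda)
```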

SVM Regularization Path (5)
  • Update regularization: among all the possible events, find the next (largest) λ at which one occurs (see the sketch below for the bound-hitting case);
  • Update active sets and solutions;
  • Stopping condition:
  • in the separable case, we terminate when L becomes empty;
  • in the non-separable case, we terminate when λ becomes sufficiently small.
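A minimal sketch of the bound-hitting events in this update, assuming the direction `d_alpha` from the previous sketch; events where points of L or R reach the margin are located analogously from the λ-dependence of f:

```python
import numpy as np

def next_bound_event(alpha_M, d_alpha, lam):
    """Largest lambda' < lam at which some margin alpha hits 0 or 1.

    Each coordinate moves as alpha + (lambda' - lam) * d, so it reaches
    a bound at a candidate lambda'; events are met as lambda decreases.
    """
    candidates = []
    for a, d in zip(alpha_M, d_alpha):
        if d > 0:                        # alpha decreases as lambda decreases
            candidates.append(lam + (0.0 - a) / d)
        elif d < 0:                      # alpha increases as lambda decreases
            candidates.append(lam + (1.0 - a) / d)
    candidates = [c for c in candidates if c < lam]
    return max(candidates) if candidates else 0.0
```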

Path with Cost Asymmetry (1)

Exploration in the 2-d space of cost pairs (C₊, C₋)

Path initialization: start at situations where all points are in L.

Follow the updating procedure of the 1-d case along a line through the origin: along each such line, the regularization is changing while the cost asymmetry is fixed.

Among all the classifiers visited, find the best one given the user's cost function.

[Figure: paths starting from different initial cost asymmetries in the (C₊, C₋) plane.]
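The paper follows these lines exactly; as a brute-force stand-in (ours, not the authors' algorithm), one can grid the (C, γ) plane with scikit-learn and keep the classifier with the lowest user cost, e.g. the `asymmetric_01_cost` sketch above evaluated on held-out data:

```python
import numpy as np
from sklearn.svm import SVC

def best_classifier(X_tr, y_tr, X_val, y_val, user_cost, Cs, gammas):
    """Grid over (C, gamma): one line per asymmetry, several C per line."""
    best, best_cost = None, np.inf
    for g in gammas:                   # fixed cost asymmetry = one line
        for C in Cs:                   # regularization along the line
            clf = SVC(kernel="linear", C=C,
                      class_weight={1: g, -1: 1.0 - g}).fit(X_tr, y_tr)
            cost = user_cost(y_val, clf.decision_function(X_val))
            if cost < best_cost:
                best, best_cost = clf, cost
    return best, best_cost
```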

Path with Cost Asymmetry (2)

Producing ROC curves

Collecting R lines (R directions of fixed cost asymmetry) in the (C₊, C₋) plane, we can build three ROC curves.
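A sketch (ours) of turning a pool of classifiers into ROC points; the ROC curve is the upper-left envelope of the collected (FPR, TPR) pairs:

```python
import numpy as np

def roc_points(y, score_list):
    """(FPR, TPR) for each classifier's decision values in score_list."""
    n_pos = np.sum(y == 1)
    n_neg = np.sum(y == -1)
    pts = []
    for s in score_list:
        tpr = np.sum((y == 1) & (s >= 0)) / n_pos   # true positive rate
        fpr = np.sum((y == -1) & (s >= 0)) / n_neg  # false positive rate
        pts.append((fpr, tpr))
    return sorted(pts)
```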

Results (1)
  • For 1000 testing asymmetries, three methods are compared:
  • "one" – take the testing asymmetry as the training cost asymmetry;
  • "int" – vary the intercept of the "one" classifier and build an ROC curve, then select the optimal classifier;
  • "all" – select the optimal classifier from the ROC obtained by varying both the training asymmetry and the intercept.
  • Use a nested cross-validation (sketched below):
  • the outer cross-validation produces overall accuracy estimates for the classifier;
  • the inner cross-validation selects the optimal classifier parameters (training asymmetry and/or intercept).
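A sketch of the nested scheme with scikit-learn (grid values and data are placeholders): the inner loop selects the training asymmetry, the outer loop estimates accuracy:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 1, (100, 2)), rng.normal(1, 1, (100, 2))])
y = np.array([-1] * 100 + [1] * 100)

# Inner CV: select the training asymmetry; outer CV: estimate accuracy.
grid = {"class_weight": [{1: g, -1: 1 - g} for g in np.linspace(0.1, 0.9, 9)]}
inner = GridSearchCV(SVC(kernel="linear", C=1.0), grid, cv=3)
print(cross_val_score(inner, X, y, cv=5).mean())
```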
Conclusions
  • An efficient algorithm is presented to build ROC curves by varying the training cost asymmetries for SVMs.
  • The main contribution is generalizing the SVM regularization path (Hastie et al., 2005) from a 1-d axis to a 2-d plane.
  • Because a convex surrogate is used in training, using the testing asymmetry as the training asymmetry leads to a non-optimal classifier.
  • Results show the advantage of considering many training asymmetries.