Support Vector Machines: Classification Algorithms and Applications

Olvi L. Mangasarian
Department of Mathematics, UCSD

with G. M. Fung, Y.-J. Lee, J. W. Shavlik, W. H. Wolberg
University of Wisconsin – Madison

and collaborators at ExonHit – Paris

What is a Support Vector Machine?
  • An optimally defined surface
  • Linear or nonlinear in the input space
  • Linear in a higher dimensional feature space
  • Implicitly defined by a kernel function
  • K(A,B)  C
What are Support Vector Machines Used For?
  • Classification
  • Regression & Data Fitting
  • Supervised & Unsupervised Learning
Principal Topics
  • Proximal support vector machine classification
    • Classify by proximity to planes instead of halfspaces
  • Massive incremental classification
    • Classify by retiring old data & adding new data
  • Knowledge-based classification
    • Incorporate expert knowledge into a classifier
  • Fast Newton method classifier
    • Finitely terminating fast algorithm for classification
  • RSVM: Reduced Support Vector Machines
    • Kernel size reduction (up to 99%) by random projection
  • Breast cancer prognosis & chemotherapy
    • Classify patients based on distinct survival curves
    • Isolate a class of patients that may benefit from chemotherapy
Principal Topics
  • Proximal support vector machine classification
Standard Support Vector Machine: Algebra of 2-Category Linearly Separable Case

Given m points in n-dimensional space:

  • Represented by an m-by-n matrix A
  • Membership of each point $A_i$ in class +1 or –1 specified by:
  • An m-by-m diagonal matrix D with +1 & –1 entries
  • Separate by two bounding planes, $x'w = \gamma \pm 1$:

$$A_i w \ge \gamma + 1 \ \text{ for } D_{ii} = +1, \qquad A_i w \le \gamma - 1 \ \text{ for } D_{ii} = -1$$

  • More succinctly:

$$D(Aw - e\gamma) \ge e,$$

where e is a vector of ones.
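Standard Support Vector Machine Formulation

Solve the quadratic program for some $\nu > 0$:

$$\min_{w,\gamma,y} \ \frac{\nu}{2}\|y\|_2^2 + \frac{1}{2}\|(w,\gamma)\|_2^2 \quad \text{s. t.} \quad D(Aw - e\gamma) + y \ge e \qquad \text{(QP)}$$

where $D_{ii} = \pm 1$ denotes A+ or A− membership.

  • Margin is maximized by minimizing $\frac{1}{2}\|(w,\gamma)\|_2^2$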
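Proximal SVM Formulation (PSVM)

Standard SVM formulation, with the inequality constraint replaced by an equality:

$$\min_{w,\gamma,y} \ \frac{\nu}{2}\|y\|_2^2 + \frac{1}{2}\|(w,\gamma)\|_2^2 \quad \text{s. t.} \quad D(Aw - e\gamma) + y = e$$

This simple but critical modification changes the nature of the optimization problem tremendously!

Solving for $y$ in terms of $w$ and $\gamma$ gives:

$$\min_{w,\gamma} \ \frac{\nu}{2}\|e - D(Aw - e\gamma)\|_2^2 + \frac{1}{2}\|(w,\gamma)\|_2^2$$

(Regularized Least Squares or Ridge Regression)
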
Advantages of New Formulation
  • Objective function remains strongly convex.
  • An explicit exact solution can be written in terms of the problem data.
  • PSVM classifier is obtained by solving a single system of linear equations in the usually small dimensional input space.
  • Exact leave-one-out-correctness can be obtained in terms of problem data.
Linear PSVM

We want to solve:

$$\min_{w,\gamma} \ \frac{\nu}{2}\|e - D(Aw - e\gamma)\|_2^2 + \frac{1}{2}\|(w,\gamma)\|_2^2$$

  • Setting the gradient equal to zero gives a nonsingular system of linear equations.
  • Solution of the system gives the desired PSVM classifier.
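Linear PSVM Solution

$$\begin{bmatrix} w \\ \gamma \end{bmatrix} = \left(\frac{I}{\nu} + H'H\right)^{-1} H'De, \qquad H = [A \ \ {-e}]$$

  • The linear system to solve depends on $H'H$, which is of size $(n+1) \times (n+1)$.
  • $n+1$ is usually much smaller than $m$.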
Linear & Nonlinear PSVM MATLAB Code

function [w, gamma] = psvm(A,d,nu)
% PSVM: linear and nonlinear classification
% INPUT: A, d=diag(D), nu. OUTPUT: w, gamma
% [w, gamma] = psvm(A,d,nu);
[m,n]=size(A); e=ones(m,1); H=[A -e];
v=(d'*H)';                   % v=H'*D*e
r=(speye(n+1)/nu+H'*H)\v;    % solve (I/nu+H'*H)r=v
w=r(1:n); gamma=r(n+1);      % getting w,gamma from r
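
For readers without MATLAB, the same linear solve can be sketched in plain Python. This is only an illustrative port, not the authors' code: the helper names `solve_linear_system` and `classify`, the Gaussian-elimination solver, and the sign rule used for classification are assumptions made here for the example.

```python
def solve_linear_system(M, v):
    """Solve M r = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    # Augmented matrix [M | v], copied so the inputs are not modified.
    aug = [[float(x) for x in M[i]] + [float(v[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    r = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(aug[i][k] * r[k] for k in range(i + 1, n))
        r[i] = (aug[i][n] - s) / aug[i][i]
    return r


def psvm(A, d, nu):
    """Linear PSVM: solve (I/nu + H'H) r = H'De with H = [A -e]."""
    n = len(A[0])
    H = [list(row) + [-1.0] for row in A]            # H = [A -e]
    # v = H'De: columns of H summed with the labels d as weights.
    v = [sum(d[i] * H[i][j] for i in range(len(H))) for j in range(n + 1)]
    # M = I/nu + H'H, an (n+1)-by-(n+1) matrix.
    M = [[sum(h[j] * h[k] for h in H) + (1.0 / nu if j == k else 0.0)
          for k in range(n + 1)] for j in range(n + 1)]
    r = solve_linear_system(M, v)
    return r[:n], r[n]                               # w, gamma


def classify(x, w, gamma):
    """Assumed decision rule: +1 if x'w - gamma >= 0, otherwise -1."""
    return 1 if sum(xi * wi for xi, wi in zip(x, w)) - gamma >= 0 else -1
```

The system has only n+1 unknowns regardless of the number of points m, which is why a simple dense solver is adequate in this sketch.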

Numerical Experiments: One-Billion Two-Class Dataset
  • Synthetic dataset consisting of 1 billion points in 10-dimensional input space
  • Generated by NDC (Normally Distributed Clustered) dataset generator
  • Dataset divided into 500 blocks of 2 million points each.
  • Solution obtained in less than 2 hours and 26 minutes on a 400 MHz machine
  • About 30% of the time was spent reading data from disk.
  • Testing set correctness: 90.79%
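
Block-wise processing of this kind is possible because the quantities in the PSVM linear system, H'H and H'De with H = [A -e], are sums over data points, so they can be accumulated one block at a time, and an old block can be retired by subtracting its contribution. The following is a minimal sketch under that observation; the function names and the tiny two-block dataset are made up for illustration. For 10-dimensional data the accumulated matrix would be only 11-by-11.

```python
def empty_accumulators(n):
    """Zero (n+1)-by-(n+1) matrix for H'H and zero (n+1)-vector for H'De."""
    return [[0.0] * (n + 1) for _ in range(n + 1)], [0.0] * (n + 1)


def update_block(HtH, Htd, A_block, d_block, sign=1.0):
    """Add (sign=+1) or retire (sign=-1) one block's share of H'H and H'De."""
    for row, label in zip(A_block, d_block):
        h = list(row) + [-1.0]                 # one row of H = [A -e]
        for j, hj in enumerate(h):
            Htd[j] += sign * label * hj
            for k, hk in enumerate(h):
                HtH[j][k] += sign * hj * hk


# Hypothetical data: two small blocks of (points, labels).
blocks = [
    ([[2.0, 2.0], [-2.0, -2.0]], [1, -1]),
    ([[3.0, 3.0], [-3.0, -3.0]], [1, -1]),
]
HtH, Htd = empty_accumulators(2)
for A_block, d_block in blocks:
    update_block(HtH, Htd, A_block, d_block)   # only one block in memory at a time
```

After the last block, solving (I/nu + HtH) r = Htd gives the classifier for all the data seen so far.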
Principal Topics
  • Knowledge-based classification (NIPS*2002)
Incorporating Knowledge Sets Into an SVM Classifier

  • Suppose that the knowledge set $\{x \mid Bx \le b\}$ belongs to the class A+. Hence it must lie in the halfspace $\{x \mid x'w \ge \gamma + 1\}$.
  • We therefore have the implication:

$$Bx \le b \ \Rightarrow \ x'w \ge \gamma + 1$$

  • This implication is equivalent to a set of constraints that can be imposed on the classification problem.
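
One standard way to obtain such constraints is linear programming duality. As a sketch, assume the knowledge set is a nonempty polyhedron written as $\{x \mid Bx \le b\}$ and the halfspace is $\{x \mid x'w \ge \gamma + 1\}$; the symbols $B$, $b$ and the multiplier $u$ are notation assumed here.

```latex
\begin{aligned}
Bx \le b \ \Rightarrow \ x'w \ge \gamma + 1
\quad &\Longleftrightarrow \quad \min_x \{\, x'w : Bx \le b \,\} \ \ge \ \gamma + 1 \\
&\Longleftrightarrow \quad \exists\, u \ge 0 : \ B'u + w = 0, \quad b'u + \gamma + 1 \le 0 .
\end{aligned}
```

The second step uses the dual of the linear program on the left, $\max \{ -b'u : B'u + w = 0,\ u \ge 0 \}$. The resulting conditions are linear in $(w, \gamma, u)$, so they can be added to the classification problem as ordinary linear constraints.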
Numerical Testing: The Promoter Recognition Dataset
  • Promoter: Short DNA sequence that precedes a gene sequence.
  • A promoter consists of 57 consecutive DNA nucleotides belonging to {A,G,C,T}.
  • Important to distinguish between promoters and nonpromoters.
  • This distinction identifies starting locations of genes in long uncharacterized DNA sequences.
The Promoter Recognition Dataset: Numerical Representation
  • Simple “1 of N” mapping scheme for converting nominal attributes into a real valued representation.
  • Not most economical representation, but commonly used.

The Promoter Recognition Dataset: Numerical Representation
  • Feature space mapped from 57-dimensional categorical space to a real valued 57 x 4 = 228 dimensional space.
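
The “1 of N” mapping can be sketched in a few lines. The function name and the ordering of the four nucleotides within each group of four values are assumptions made for this illustration; only the idea of one indicator value per possible symbol comes from the slide.

```python
NUCLEOTIDES = "AGCT"  # assumed ordering, chosen for illustration


def one_of_n(sequence, alphabet=NUCLEOTIDES):
    """Map each symbol to len(alphabet) real values: 1.0 at its index, 0.0 elsewhere."""
    encoded = []
    for symbol in sequence:
        position = alphabet.index(symbol)   # raises ValueError for an unknown symbol
        encoded.extend(1.0 if i == position else 0.0 for i in range(len(alphabet)))
    return encoded


# A 57-nucleotide sequence becomes a vector of 57 x 4 = 228 real values.
example = one_of_n("ACGT" * 14 + "A")
```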

Promoter Recognition Dataset: Prior Knowledge Rules
  • Prior knowledge consists of the following 64 rules:
Promoter Recognition Dataset: Sample Rules

where … denotes the position of a nucleotide with respect to a meaningful reference point, starting at position … and ending at position …
The Promoter Recognition Dataset: Comparative Algorithms
  • KBANN: Knowledge-based artificial neural network [Shavlik et al.]
  • BP: Standard back propagation for neural networks [Rumelhart et al.]
  • O’Neill’s method: Empirical method suggested by biologist O’Neill [O’Neill]
  • NN: Nearest neighbor with k=3 [Cost et al.]
  • ID3: Quinlan’s decision tree builder [Quinlan]
  • SVM1: Standard 1-norm SVM [Bradley et al.]
The Promoter Recognition Dataset: Comparative Test Results

Note: Only KSVM and SVM1 utilize a simple linear classifier.

Wisconsin Breast Cancer Prognosis Dataset: Description of the Data
  • 110 instances corresponding to 41 patients whose cancer had recurred and 69 patients whose cancer had not recurred
  • 32 numerical features
  • The domain theory: two simple rules used by doctors:
Wisconsin Breast Cancer Prognosis Dataset: Numerical Testing Results
  • Doctor’s rules applicable to only 32 out of 110 patients.
  • Only 22 of these 32 patients are classified correctly by the rules (22 of 110 is 20% correctness).
  • KSVM linear classifier applicable to all patients with correctness of 66.4%.
  • Correctness comparable to best available results using conventional SVMs.
  • KSVM can get classifiers based on knowledge without using any data.
Principal Topics
  • Fast Newton method classifier
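Fast Newton Algorithm for Classification

Standard quadratic programming (QP) formulation of SVM:

$$\min_{w,\gamma,y} \ \frac{\nu}{2}\|y\|_2^2 + \frac{1}{2}\|(w,\gamma)\|_2^2 \quad \text{s. t.} \quad D(Aw - e\gamma) + y \ge e$$

Eliminating $y$ gives an equivalent unconstrained problem, where $(\cdot)_+$ replaces negative components by zeros:

$$\min_{w,\gamma} \ \frac{\nu}{2}\|(e - D(Aw - e\gamma))_+\|_2^2 + \frac{1}{2}\|(w,\gamma)\|_2^2$$

The objective is once, but not twice, differentiable. However, a generalized Hessian exists!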

Newton Algorithm
  • Newton algorithm terminates in a finite number of steps
  • Termination at global minimum
  • Error rate decreases linearly
  • Can generate complex nonlinear classifiers
  • By using nonlinear kernels: K(x,y)
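
A generalized Newton iteration of this kind can be sketched for the linear case. The sketch assumes the objective ν/2·‖(e − D(Aw − eγ))₊‖² + ½‖(w,γ)‖², uses I + ν·Σ gᵢgᵢ' over points with positive slack as the generalized Hessian, and adds an Armijo backtracking step; all names, tolerances and the line-search constant are illustrative choices, not the authors' implementation.

```python
def solve(M, v):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(v)
    aug = [list(M[i]) + [v[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (aug[i][n] - sum(aug[i][k] * x[k] for k in range(i + 1, n))) / aug[i][i]
    return x


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def newton_svm(A, d, nu, tol=1e-8, max_iter=50):
    """Minimize nu/2*||(e - D(Aw - e*gamma))_+||^2 + 1/2*||(w, gamma)||^2."""
    n = len(A[0])
    # Rows g_i of D*H with H = [A -e], so that e - D(Aw - e*gamma) = e - G z.
    G = [[di * a for a in row] + [-float(di)] for row, di in zip(A, d)]

    def objective(z):
        return 0.5 * nu * sum(max(1.0 - dot(g, z), 0.0) ** 2 for g in G) + 0.5 * dot(z, z)

    z = [0.0] * (n + 1)
    for _ in range(max_iter):
        slack = [1.0 - dot(g, z) for g in G]
        active = [(g, s) for g, s in zip(G, slack) if s > 0]
        grad = [z[j] - nu * sum(s * g[j] for g, s in active) for j in range(n + 1)]
        if dot(grad, grad) ** 0.5 <= tol:
            break
        # Generalized Hessian: I + nu * sum of g g' over points with positive slack.
        hess = [[(1.0 if j == k else 0.0) + nu * sum(g[j] * g[k] for g, _ in active)
                 for k in range(n + 1)] for j in range(n + 1)]
        step = solve(hess, [-gj for gj in grad])
        # Armijo backtracking along the Newton direction.
        lam, f0, slope = 1.0, objective(z), dot(grad, step)
        while lam > 1e-12 and objective(
                [zi + lam * si for zi, si in zip(z, step)]) > f0 + 0.25 * lam * slope:
            lam *= 0.5
        z = [zi + lam * si for zi, si in zip(z, step)]
    return z[:n], z[n]
```

Each iteration solves a system with only n+1 unknowns, so the cost per step is dominated by the pass over the data that forms the gradient and generalized Hessian.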
Principal Topics
  • RSVM: Reduced Support Vector Machines
Difficulties with Nonlinear SVM for Large Problems
  • The nonlinear kernel $K(A, A') \in R^{m \times m}$ is fully dense
  • Long CPU time to compute $m \times m$ numbers
  • Runs out of memory while storing the $m \times m$ kernel matrix
  • Computational complexity of nonlinear SSVM depends on $m$
  • Separating surface depends on almost entire dataset
  • Need to store the entire dataset after solving the problem
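Overcoming Computational & Storage Difficulties: Use a Rectangular Kernel
  • Choose a small random sample $\bar{A} \in R^{\bar{m} \times n}$ of $A$
  • The small random sample $\bar{A}$ is a representative sample of the entire dataset
  • Typically $\bar{A}$ is 1% to 10% of the rows of $A$
  • Replace $K(A, A')$ by the rectangular kernel $K(A, \bar{A}') \in R^{m \times \bar{m}}$, with corresponding $\bar{D} \subset D$, in nonlinear SSVM
  • Only need to compute and store $m \times \bar{m}$ numbers for the rectangular kernel
  • Computational complexity reduces to that of a problem in $\bar{m} + 1$ variables
  • The nonlinear separator only depends on $\bar{A}$
  • Using only the small square kernel $K(\bar{A}, \bar{A}')$ gives lousy results!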
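
The storage saving of a rectangular kernel can be illustrated with a short sketch. The Gaussian kernel, its parameter `mu`, the sampling fraction and the function names are all assumptions made for this example; the slides leave the kernel generic.

```python
import math
import random


def gaussian_kernel(x, y, mu):
    """Gaussian kernel exp(-mu * ||x - y||^2); the kernel choice is an assumption."""
    return math.exp(-mu * sum((a - b) ** 2 for a, b in zip(x, y)))


def rectangular_kernel(A, fraction=0.1, mu=1.0, seed=0):
    """Return (K, rows) where K[i][j] = k(A[i], A[rows[j]]) for a random row sample."""
    m = len(A)
    m_bar = max(1, round(fraction * m))
    rows = random.Random(seed).sample(range(m), m_bar)   # indices of the reduced set
    K = [[gaussian_kernel(A[i], A[j], mu) for j in rows] for i in range(m)]
    return K, rows


# 50 hypothetical points in the plane: the kernel is 50-by-5 instead of 50-by-50.
points = [[0.1 * i, 0.05 * (i % 7)] for i in range(50)]
K, rows = rectangular_kernel(points, fraction=0.1)
```

Only the sampled rows need to be kept to evaluate the separating surface at a new point, since each kernel column refers to one sampled row.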
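Reduced Support Vector Machine Algorithm: Nonlinear Separating Surface

(i) Choose a random subset matrix $\bar{A} \in R^{\bar{m} \times n}$ of the entire data matrix $A \in R^{m \times n}$.

(ii) Solve the nonlinear SSVM problem by the Newton method, using the rectangular kernel $K(A, \bar{A}')$ in place of $K(A, A')$, with corresponding $\bar{D} \subset D$.

(iii) The separating surface is defined by the optimal solution $(\bar{u}, \gamma)$ in step (ii):

$$K(x', \bar{A}')\bar{D}\bar{u} = \gamma$$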
How to Choose $\bar{A}$ in RSVM?
  • $\bar{A}$ is a representative sample of the entire dataset
  • Need not be a subset of $A$
  • A good selection of $\bar{A}$ may generate a classifier using very small $\bar{m}$
  • Possible ways to choose $\bar{A}$:
    • Choose $\bar{m}$ random rows from the entire dataset $A$
    • Choose $\bar{A}$ such that the distance between its rows exceeds a certain tolerance
    • Use k cluster centers of A+ and A− as $\bar{A}$
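
The distance-tolerance criterion admits a simple greedy reading, sketched below: scan the rows once and keep a row only if it is farther than the tolerance from every row already kept. The greedy scan order, the Euclidean distance and the function name are assumptions for this illustration.

```python
def select_by_tolerance(A, tol):
    """Greedily keep rows whose Euclidean distance to every kept row exceeds tol."""
    kept = []
    for row in A:
        if all(sum((a - b) ** 2 for a, b in zip(row, k)) > tol ** 2 for k in kept):
            kept.append(row)
    return kept


rows = [[0.0, 0.0], [0.1, 0.0], [1.0, 0.0], [1.0, 0.05], [3.0, 3.0]]
reduced = select_by_tolerance(rows, tol=0.5)
```

A larger tolerance yields a smaller reduced set, so the tolerance plays a role similar to the sampling fraction in random selection.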
A Nonlinear Kernel Application
Checkerboard Training Set: 1000 Points in $R^2$
Separate 486 Asterisks from 514 Dots
Principal Topics
  • Breast cancer prognosis & chemotherapy

Breast Cancer Prognosis & Chemotherapy: Good, Intermediate & Poor Patient Groupings
(6 Input Features: 5 Cytological, 1 Histological)
(Grouping: Utilizes 2 Histological Features & Chemotherapy)

Kaplan-Meier Survival Curves for Good, Intermediate & Poor Patients
82.7% Classifier Correctness via 3 SVMs

  • New methods for classification
  • All based on rigorous mathematical foundation
  • Fast computational algorithms capable of classifying massive datasets
  • Classifiers based on both abstract prior knowledge as well as conventional datasets
  • Identification of breast cancer patients that can benefit from chemotherapy
Future Work
  • Extend proposed methods to broader optimization problems
    • Linear & quadratic programming
    • Preliminary results beat state-of-the-art software
  • Incorporate abstract concepts into optimization problems as constraints
  • Develop fast online algorithms for intrusion and fraud detection
  • Classify the effectiveness of new drug cocktails in combating various forms of cancer
    • Encouraging preliminary results for breast cancer
Breast Cancer Treatment Response: Joint with ExonHit (French BioTech)
  • 35 patients treated by a drug cocktail
  • 9 partial responders; 26 nonresponders
  • 25 gene expression measurements made on each patient
  • 1-Norm SVM classifier selected: 12 out of 25 genes
  • Combinatorially selected 6 genes out of 12
  • Separating plane obtained:

2.7915 T11 + 0.13436 S24 -1.0269 U23 -2.8108 Z23 -1.8668 A19 -1.5177 X05 +2899.1 = 0.

  • Leave-one-out error: 1 out of 35 (97.1% correctness)

Detection of Alternative RNA Isoforms via DATAS
(Levels of mRNA that Correlate with Sensitivity to Chemotherapy)

DATAS: Differential Analysis of Transcripts with Alternative Splicing

[Figure: Alternative RNA splicing]

Talk Available