Support Vector Machines: Classification Algorithms and Applications


### Support Vector Machines: Classification Algorithms and Applications

Olvi L. Mangasarian

Department of Mathematics – UCSD

with

G. M. Fung, Y.-J. Lee, J.W. Shavlik, W. H. Wolberg

University of Wisconsin – Madison

and

Collaborators at ExonHit – Paris

What is a Support Vector Machine?

- An optimally defined surface
- Linear or nonlinear in the input space
- Linear in a higher dimensional feature space
- Implicitly defined by a kernel function K(A, B) that maps a pair of matrices (an m × n and an n × l matrix) to an m × l matrix C
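To make the kernel notation concrete: the Gaussian kernel below is one common choice (the transcript does not fix a particular kernel), written as a Python/NumPy sketch rather than the talk's MATLAB. Here B is given with points as columns, matching the K(A, B) convention above:

```python
import numpy as np

def gaussian_kernel(A, B, mu=0.5):
    # A: m x n (rows are points), B: n x l (columns are points).
    # Returns the m x l matrix with entries exp(-mu * ||A_i - B_:,j||^2).
    sq = (np.sum(A**2, axis=1)[:, None]
          + np.sum(B**2, axis=0)[None, :]
          - 2.0 * A @ B)
    return np.exp(-mu * sq)

A = np.array([[0.0, 0.0], [1.0, 1.0]])  # two points in R^2
B = A.T                                  # same points, as columns
K = gaussian_kernel(A, B)
print(K)   # diagonal entries are exp(0) = 1
```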

What are Support Vector Machines Used For?

- Classification
- Regression & Data Fitting
- Supervised & Unsupervised Learning

Principal Topics

- Proximal support vector machine classification
  - Classify by proximity to planes instead of halfspaces
- Massive incremental classification
  - Classify by retiring old data & adding new data
- Knowledge-based classification
  - Incorporate expert knowledge into a classifier
- Fast Newton method classifier
  - Finitely terminating fast algorithm for classification
- RSVM: Reduced Support Vector Machines
  - Kernel size reduction (up to 99%) by random projection
- Breast cancer prognosis & chemotherapy
  - Classify patients based on distinct survival curves
  - Isolate a class of patients that may benefit from chemotherapy

Principal Topics

- Proximal support vector machine classification

Given m points in n-dimensional space:

- Represented by an m × n matrix A
- Membership of each point in class +1 or −1 specified by an m × m diagonal matrix D with +1 and −1 entries
- Separate by two bounding planes, x'w = γ + 1 and x'w = γ − 1, so that:

  A_i w ≥ γ + 1, for D_ii = +1
  A_i w ≤ γ − 1, for D_ii = −1

- More succinctly: D(Aw − eγ) ≥ e, where e is a vector of ones.

Standard Support Vector Machine: Algebra of the 2-Category Linearly Separable Case

Solve the quadratic program for some ν > 0:

(QP)  min_{w, γ, y}  ν e'y + (1/2) w'w
      s.t.  D(Aw − eγ) + y ≥ e,  y ≥ 0

where D_ii = +1 or −1 denotes A+ or A− membership.

- The margin between the bounding planes, 2/‖w‖, is maximized by minimizing (1/2) w'w.

Proximal SVM Formulation (PSVM)

Compared with the standard SVM formulation, PSVM replaces the inequality constraint by an equality, penalizes the 2-norm of y, and appends γ² to the objective:

(QP)  min_{w, γ, y}  (ν/2) ‖y‖² + (1/2) (w'w + γ²)
      s.t.  D(Aw − eγ) + y = e

This simple but critical modification changes the nature of the optimization problem tremendously!

Solving for y in terms of w and γ gives the equivalent unconstrained problem:

min_{w, γ}  (ν/2) ‖e − D(Aw − eγ)‖² + (1/2) (w'w + γ²)

(Regularized Least Squares or Ridge Regression)

Advantages of New Formulation

- Objective function remains strongly convex.
- An explicit exact solution can be written in terms of the problem data.
- PSVM classifier is obtained by solving a single system of linear equations in the usually small dimensional input space.
- Exact leave-one-out-correctness can be obtained in terms of problem data.

Linear PSVM

- Setting the gradient with respect to (w, γ) equal to zero gives a nonsingular system of linear equations.
- Solution of this system gives the desired PSVM classifier.
- The linear system to solve depends on H'H, where H = [A  −e], which is of size (n + 1) × (n + 1); n + 1 is usually much smaller than m.

Linear PSVM Solution

[w; γ] = (I/ν + H'H)⁻¹ H'De

Linear & Nonlinear PSVM MATLAB Code

```matlab
function [w, gamma] = psvm(A, d, nu)
% PSVM: linear and nonlinear classification
% INPUT: A, d = diag(D), nu.  OUTPUT: w, gamma
% [w, gamma] = psvm(A, d, nu);
[m, n] = size(A); e = ones(m, 1); H = [A -e];
v = (d'*H)';                    % v = H'*D*e
r = (speye(n+1)/nu + H'*H)\v;   % solve (I/nu + H'*H) r = v
w = r(1:n); gamma = r(n+1);     % extract w, gamma from r
```
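For readers without MATLAB, the same linear PSVM computation can be sketched in Python/NumPy (variable names mirror the MATLAB code; the two-cluster test data are made up for illustration):

```python
import numpy as np

def psvm(A, d, nu):
    # Linear proximal SVM: solve (I/nu + H'H) r = H'De, with H = [A -e].
    m, n = A.shape
    H = np.hstack([A, -np.ones((m, 1))])   # m x (n+1)
    v = H.T @ d                             # H'De, since De = d
    r = np.linalg.solve(np.eye(n + 1) / nu + H.T @ H, v)
    return r[:n], r[n]                      # w, gamma

# Two well-separated Gaussian clusters in R^2
rng = np.random.default_rng(0)
A = np.vstack([rng.normal(2.0, 1.0, (50, 2)),
               rng.normal(-2.0, 1.0, (50, 2))])
d = np.hstack([np.ones(50), -np.ones(50)])
w, gamma = psvm(A, d, 1.0)
pred = np.sign(A @ w - gamma)
print((pred == d).mean())   # near-perfect on this easy problem
```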

Numerical Experiments: One-Billion-Point Two-Class Dataset

- Synthetic dataset consisting of 1 billion points in 10-dimensional input space
- Generated by the NDC (Normally Distributed Clustered) dataset generator
- Dataset divided into 500 blocks of 2 million points each
- Solution obtained in less than 2 hours and 26 minutes on a 400 MHz machine
- About 30% of the time was spent reading data from disk
- Testing set correctness: 90.79%
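The PSVM linear system makes such block processing natural: H'H and H'De are sums over row blocks, so each block can be read, accumulated, and discarded, keeping memory use at (n+1)² regardless of m. A Python/NumPy sketch of this idea (not the authors' actual code):

```python
import numpy as np

def incremental_psvm(blocks, nu):
    # Accumulate H'H and H'De block by block, with H = [A -e].
    # Each block is a pair (A_i, d_i); d_i = diag(D_i).
    HtH, v = None, None
    for A_i, d_i in blocks:
        m_i, n = A_i.shape
        H_i = np.hstack([A_i, -np.ones((m_i, 1))])
        if HtH is None:
            HtH = np.zeros((n + 1, n + 1))
            v = np.zeros(n + 1)
        HtH += H_i.T @ H_i
        v += H_i.T @ d_i
    n1 = HtH.shape[0]
    r = np.linalg.solve(np.eye(n1) / nu + HtH, v)
    return r[:-1], r[-1]

# Sanity check: block-wise processing matches processing all data at once
rng = np.random.default_rng(1)
A = rng.normal(size=(1000, 10))
d = np.sign(A[:, 0] + 0.1)
w_all, g_all = incremental_psvm([(A, d)], nu=1.0)
w_blk, g_blk = incremental_psvm(
    [(A[i:i+200], d[i:i+200]) for i in range(0, 1000, 200)], nu=1.0)
print(np.allclose(w_all, w_blk))   # True
```

Retiring old data amounts to subtracting a block's contribution from the accumulated sums instead of adding it.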

Principal Topics

- Knowledge-based classification (NIPS*2002)

Suppose that the knowledge set {x | Bx ≤ b} belongs to the class A+. Hence it must lie in the halfspace {x | x'w ≥ γ + 1}:

- We therefore have the implication: Bx ≤ b ⟹ x'w ≥ γ + 1.
- This implication is equivalent to a set of linear constraints that can be imposed on the classification problem.
- Adding one such set of constraints for each knowledge set to the 1-norm SVM LP gives the knowledge-based SVM (KSVM) formulation.
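The implication can be verified numerically for a given polyhedral knowledge set: {x | Bx ≤ b} lies in the halfspace {x | x'w ≥ γ + 1} exactly when the minimum of x'w over the set is at least γ + 1. A Python/SciPy sketch with made-up data (a box knowledge set, not from the talk):

```python
import numpy as np
from scipy.optimize import linprog

def knowledge_set_in_halfspace(B, b, w, gamma):
    # min x'w  s.t.  Bx <= b; the implication holds iff the minimum >= gamma + 1
    res = linprog(c=w, A_ub=B, b_ub=b, bounds=[(None, None)] * len(w))
    assert res.success
    return res.fun >= gamma + 1

# Knowledge set: the box 2 <= x1 <= 3, 2 <= x2 <= 3
B = np.array([[1, 0], [-1, 0], [0, 1], [0, -1]], float)
b = np.array([3, -2, 3, -2], float)
w = np.array([1.0, 1.0])   # candidate classifier: x1 + x2 >= gamma + 1
print(knowledge_set_in_halfspace(B, b, w, gamma=2.0))   # True: min x'w = 4 >= 3
print(knowledge_set_in_halfspace(B, b, w, gamma=4.0))   # False: 4 < 5
```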

Knowledge-Based SVM Classification: Numerical Testing

The Promoter Recognition Dataset

- Promoter: a short DNA sequence that precedes a gene sequence.
- A promoter consists of 57 consecutive DNA nucleotides belonging to {A, G, C, T}.
- It is important to distinguish between promoters and nonpromoters.
- This distinction identifies starting locations of genes in long uncharacterized DNA sequences.

The Promoter Recognition Dataset: Numerical Representation

- Simple "1 of N" mapping scheme for converting nominal attributes into a real-valued representation: each of the four nucleotides maps to a distinct 4-dimensional unit vector.
- Not the most economical representation, but commonly used.
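A minimal Python sketch of the "1 of N" mapping (the particular ordering of the A, G, C, T unit vectors is an assumption; the transcript specifies only that 57 nucleotides become 228 real values):

```python
import numpy as np

# Each nucleotide becomes a 4-dimensional unit vector (ordering assumed)
CODE = {'A': [1, 0, 0, 0], 'G': [0, 1, 0, 0],
        'C': [0, 0, 1, 0], 'T': [0, 0, 0, 1]}

def encode(seq):
    # Map a 57-nucleotide string to a 57 * 4 = 228-dimensional real vector
    assert len(seq) == 57 and set(seq) <= set('AGCT')
    return np.array([bit for ch in seq for bit in CODE[ch]], float)

x = encode('A' * 20 + 'G' * 17 + 'C' * 10 + 'T' * 10)
print(x.shape)        # (228,)
print(int(x.sum()))   # 57: exactly one 1 per position
```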

- The feature space is thus mapped from the 57-dimensional categorical space to a real-valued 57 × 4 = 228-dimensional space.

Promoter Recognition Dataset: Prior Knowledge Rules

- The prior knowledge consists of 64 rules, each requiring specific nucleotides at specific positions, where the position of a nucleotide is measured with respect to a meaningful reference point in the sequence.

Promoter Recognition Dataset: Sample Rules

The Promoter Recognition Dataset: Comparative Algorithms

- KBANN: knowledge-based artificial neural network [Shavlik et al.]
- BP: standard backpropagation for neural networks [Rumelhart et al.]
- O'Neill's method: empirical method suggested by biologist O'Neill [O'Neill]
- NN: nearest neighbor with k = 3 [Cost et al.]
- ID3: Quinlan's decision tree builder [Quinlan]
- SVM1: standard 1-norm SVM [Bradley et al.]

The Promoter Recognition Dataset: Comparative Test Results

Note: only KSVM and SVM1 utilize a simple linear classifier.

Wisconsin Breast Cancer Prognosis Dataset: Description of the Data

- 110 instances, corresponding to 41 patients whose cancer had recurred and 69 patients whose cancer had not recurred
- 32 numerical features
- The domain theory: two simple rules used by doctors

Wisconsin Breast Cancer Prognosis Dataset: Numerical Testing Results

- The doctors' rules are applicable to only 32 out of 110 patients.
- Only 22 of those 32 patients are classified correctly by the rules (68.75% correctness).
- The KSVM linear classifier is applicable to all patients, with correctness of 66.4%.
- Correctness is comparable to the best available results using conventional SVMs.
- KSVM can produce classifiers based on knowledge alone, without using any data.

Principal Topics

- Fast Newton method classifier

Fast Newton Algorithm for Classification

The standard quadratic programming (QP) formulation of the SVM can be written as the equivalent unconstrained minimization of a strongly convex, piecewise-quadratic function:

min_{w, γ}  (ν/2) ‖(e − D(Aw − eγ))₊‖² + (1/2) (w'w + γ²)

This objective is once, but not twice, differentiable. However, its generalized Hessian exists!

Newton Algorithm

- The Newton algorithm terminates in a finite number of steps
- Termination is at the global minimum
- The error rate decreases linearly
- Can generate complex nonlinear classifiers by using nonlinear kernels K(x, y)
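The Newton idea can be sketched as follows: minimize the once- but not twice-differentiable plus-function objective using the generalized Hessian determined by the currently active constraints. This Python/NumPy sketch omits the Armijo stepsize safeguard of the full algorithm, and the test data are synthetic:

```python
import numpy as np

def newton_svm(A, d, nu, iters=50, tol=1e-8):
    # min_z (nu/2)||(e - D H z)_+||^2 + (1/2) z'z,  H = [A -e], z = (w, gamma)
    m, n = A.shape
    H = np.hstack([A, -np.ones((m, 1))])
    z = np.zeros(n + 1)
    for _ in range(iters):
        r = np.maximum(1.0 - d * (H @ z), 0.0)   # plus-function residual
        grad = z - nu * (H.T @ (d * r))
        if np.linalg.norm(grad) < tol:
            break
        s = (r > 0).astype(float)                # active set
        GH = np.eye(n + 1) + nu * (H.T @ (s[:, None] * H))   # generalized Hessian
        z = z - np.linalg.solve(GH, grad)
    return z[:n], z[n]

rng = np.random.default_rng(2)
A = np.vstack([rng.normal(2, 1, (40, 2)), rng.normal(-2, 1, (40, 2))])
d = np.hstack([np.ones(40), -np.ones(40)])
w, gamma = newton_svm(A, d, nu=1.0)
acc = (np.sign(A @ w - gamma) == d).mean()
print(acc)   # near-perfect on this separable problem
```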

Principal Topics

- RSVM: Reduced Support Vector Machines

- The nonlinear kernel K(A, A') is fully dense:
  - May run out of memory while storing the m × m kernel matrix (m² numbers)
  - Long CPU time to compute the m² kernel entries
- Computational complexity of the nonlinear SSVM depends on m
- The separating surface depends on almost the entire dataset
  - Need to store the entire dataset even after solving the problem

The remedy: choose a small random sample Ā of A

- The small random sample Ā is a representative sample of the entire dataset
- Typically Ā is 1% to 10% of the rows of A (an m̄ × n matrix)
- Replace K(A, A') with the rectangular kernel K(A, Ā'), and D with the corresponding D̄, in the nonlinear SSVM
  - Only need to compute and store m × m̄ numbers for the rectangular kernel
  - Computational complexity reduces to depend on m̄ instead of m
  - The nonlinear separator only depends on Ā
- Using K(Ā, Ā') instead gives lousy results!

Overcoming Computational & Storage Difficulties: Use a Rectangular Kernel

(i) Choose a random subset matrix Ā ∈ R^{m̄ × n} of the entire data matrix A ∈ R^{m × n}

(ii) Solve the following problem by the Newton method, with D̄ = diag(d̄) corresponding to Ā:

min_{ū, γ}  (ν/2) ‖(e − D(K(A, Ā') D̄ ū − eγ))₊‖² + (1/2) (ū'ū + γ²)

(iii) The separating surface is defined by the optimal solution (ū, γ) of step (ii):

K(x', Ā') D̄ ū = γ

Reduced Support Vector Machine Algorithm: Nonlinear Separating Surface

Ā is a representative sample of the entire dataset:

- Ā need not be a subset of A
- A good selection of Ā may generate a classifier using a very small m̄
- Possible ways to choose Ā:
  - Choose m̄ random rows from the entire dataset A
  - Choose Ā such that the distance between its rows exceeds a certain tolerance
  - Use k cluster centers of A+ and A− as Ā
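A minimal RSVM-flavored sketch in Python/NumPy: draw a random row subset Ā, form the rectangular Gaussian kernel K(A, Ā'), and fit in the reduced space. For brevity, a proximal (regularized least-squares) solve stands in for the Newton-based SSVM solve of the slides, and the checkerboard-like data are synthetic:

```python
import numpy as np

def rbf(A, B, mu=2.0):
    # Gaussian kernel between rows of A and rows of B
    sq = (np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T)
    return np.exp(-mu * sq)

def rsvm(A, d, nu=10.0, frac=0.1, seed=0):
    # (i) choose a small random subset Abar of the rows of A
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(A), size=max(1, int(frac * len(A))), replace=False)
    Abar = A[idx]
    # (ii) proximal solve with the rectangular kernel K(A, Abar'): m x mbar
    K = rbf(A, Abar)
    H = np.hstack([K, -np.ones((len(A), 1))])
    r = np.linalg.solve(np.eye(H.shape[1]) / nu + H.T @ H, H.T @ d)
    # (iii) separating surface: K(x', Abar') u = gamma
    return Abar, r[:-1], r[-1]

# Nonlinear 2D test: label by quadrant parity (XOR-like pattern)
rng = np.random.default_rng(3)
A = rng.uniform(-1, 1, (500, 2))
d = np.sign(A[:, 0] * A[:, 1])
Abar, u, gamma = rsvm(A, d)
acc = (np.sign(rbf(A, Abar) @ u - gamma) == d).mean()
print(acc)   # high accuracy despite using only 10% of the rows as centers
```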

A Nonlinear Kernel Application

Checkerboard Training Set: 1,000 Points in R²; Separate 486 Asterisks from 514 Dots

Conventional SVM Result on Checkerboard Using 50 Randomly Selected Points Out of 1,000

Principal Topics

- Breast cancer prognosis & chemotherapy

Breast Cancer Prognosis & Chemotherapy: Good, Intermediate & Poor Patient Groupings

(6 input features: 5 cytological, 1 histological. Grouping utilizes 2 histological features & chemotherapy.)

Kaplan-Meier Survival Curves for Good, Intermediate & Poor Patients: 82.7% Classifier Correctness via 3 SVMs

Kaplan-Meier Survival Curves for the Intermediate Group: Note the Reversed Role of Chemotherapy

Conclusion

- New methods for classification
- All based on rigorous mathematical foundation
- Fast computational algorithms capable of classifying massive datasets
- Classifiers based on both abstract prior knowledge as well as conventional datasets
- Identification of breast cancer patients that can benefit from chemotherapy

Future Work

- Extend proposed methods to broader optimization problems
  - Linear & quadratic programming
  - Preliminary results beat state-of-the-art software
- Incorporate abstract concepts into optimization problems as constraints
- Develop fast online algorithms for intrusion and fraud detection
- Classify the effectiveness of new drug cocktails in combating various forms of cancer
  - Encouraging preliminary results for breast cancer

Breast Cancer Treatment Response: Joint Work with ExonHit (French BioTech)

- 35 patients treated by a drug cocktail
- 9 partial responders; 26 nonresponders
- 25 gene expression measurements made on each patient
- A 1-norm SVM classifier selected 12 out of the 25 genes
- 6 genes were then selected combinatorially out of the 12
- Separating plane obtained:

2.7915 T11 + 0.13436 S24 − 1.0269 U23 − 2.8108 Z23 − 1.8668 A19 − 1.5177 X05 + 2899.1 = 0

- Leave-one-out error: 1 out of 35 (97.1% correctness)
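As a sanity check of the plane's form, one can evaluate w'x + 2899.1 for a vector of six expression values and classify by sign. The expression values below are purely illustrative (not patient data), and the transcript does not state which sign corresponds to responders:

```python
import numpy as np

# Coefficients of the separating plane from the slide
# (genes T11, S24, U23, Z23, A19, X05, in that order)
w = np.array([2.7915, 0.13436, -1.0269, -2.8108, -1.8668, -1.5177])
intercept = 2899.1

def plane_score(x):
    # The sign of w'x + intercept determines the predicted class
    return float(np.dot(w, x) + intercept)

# Hypothetical expression vector, for illustration only
x = np.array([120.0, 500.0, 310.0, 980.0, 210.0, 450.0])
s = plane_score(x)
print(s)
```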

[Figure: DNA is transcribed into pre-mRNA consisting of exons E1–E5 interleaved with introns I1–I4 (5' to 3'). Alternative RNA splicing then yields different mRNA isoforms, e.g. one containing E1 E2 E3 E4 E5 and one skipping E3, each with a poly(A) tail; translation produces the corresponding proteins. DATAS contrasts the isoforms of chemo-sensitive and chemo-resistant samples.]

DATAS: Differential Analysis of Transcripts with Alternative Splicing

Detection of Alternative RNA Isoforms via DATAS

(Levels of mRNA that Correlate with Sensitivity to Chemotherapy)

Talk Available

www.cs.wisc.edu/~olvi
