
The Disputed Federalist Papers: SVM Feature Selection via Concave Minimization. Glenn Fung and Olvi L. Mangasarian. CSNA 2002, June 13-16, 2002, Madison, Wisconsin.

The Disputed Federalist Papers: SVM Feature Selection via Concave Minimization

Glenn Fung and Olvi L. Mangasarian

CSNA 2002

June 13-16, 2002

Madison, Wisconsin


Outline of Talk

  • Support Vector Machines (SVM) Introduction

  • Standard Quadratic Programming Formulation

  • 1-norm Linear SVMs

  • SVM Feature Selection

  • Successive Linearization Algorithm (SLA)

  • The Disputed Federalist Papers

  • Description of the Classification Problem

  • Description of Previous Work

  • Results

  • Separating Hyperplane in Three Dimensions Only

  • Classification Agrees with Previous Results


What is a Support Vector Machine?

  • An optimally defined surface

  • Typically nonlinear in the input space

  • Linear in a higher dimensional space

  • Implicitly defined by a kernel function


What are Support Vector Machines Used For?

  • Classification

  • Regression & Data Fitting

  • Supervised & Unsupervised Learning

(Will concentrate on classification)


Geometry of the Classification Problem: 2-Category Linearly Separable Case

[Figure: two linearly separable point sets, labeled A+ and A-]


Algebra of the Classification Problem: 2-Category Linearly Separable Case

  • Given m points in n dimensional space

  • Represented by an m-by-n matrix A

  • Membership of each point $A_i$ in class +1 or -1 specified by an m-by-m diagonal matrix D with +1 & -1 entries

  • Separate by two bounding planes, $x'w = \gamma \pm 1$:

    $A_i w \ge \gamma + 1$ for $D_{ii} = +1$, and $A_i w \le \gamma - 1$ for $D_{ii} = -1$

  • More succinctly: $D(Aw - e\gamma) \ge e$, where e is a vector of ones.
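
As a small illustration of the algebra on this slide, the sketch below checks the bounding-plane condition on toy data. It assumes the succinct condition is $D(Aw - e\gamma) \ge e$; the data points, the normal `w`, the offset `gamma`, and the helper name `bounding_plane_margins` are all made up for illustration and do not come from the talk.

```python
# Toy check of the bounding-plane condition D(Aw - e*gamma) >= e.
# All numbers below are illustrative, not taken from the presentation.

def bounding_plane_margins(A, d, w, gamma):
    """Return d_i * (A_i . w - gamma) per point; d holds the diagonal of D."""
    return [d_i * (sum(a * w_j for a, w_j in zip(row, w)) - gamma)
            for row, d_i in zip(A, d)]

A = [[3.0, 3.0], [4.0, 2.0],   # points in class +1
     [0.0, 0.0], [1.0, -1.0]]  # points in class -1
d = [1, 1, -1, -1]             # diagonal entries of D
w = [0.5, 0.5]                 # candidate normal to the planes
gamma = 1.5                    # candidate offset

margins = bounding_plane_margins(A, d, w, gamma)
# The two bounding planes separate the classes when every entry is >= 1.
separated = all(m >= 1.0 for m in margins)
print(margins, separated)
```

Storing only the diagonal of D as a list keeps the check to a single pass over the points; a full m-by-m matrix is not needed for this test.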


Support Vector Machines: Maximizing the Margin between Bounding Planes

[Figure: point sets A+ and A- with the bounding planes; the support vectors are labeled]


Support Vector Machines: Quadratic Programming Formulation

  • Solve the following quadratic program:

    $\min_{w,\gamma,y}\ \nu e'y + \tfrac{1}{2}\|w\|_2^2$

    s.t. $D(Aw - e\gamma) + y \ge e,\ y \ge 0$

    where $\nu > 0$ is the weight of the training error

  • Maximize the margin $2/\|w\|_2$ by minimizing $\tfrac{1}{2}\|w\|_2^2$
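
The link between the margin and the norm of the normal vector can be worked out in a few lines. The derivation below is a sketch that assumes the bounding planes are written as $x'w = \gamma + 1$ and $x'w = \gamma - 1$; the points $x_1$, $x_2$ and the step length $t$ are auxiliary symbols introduced only for this argument.

```latex
\begin{aligned}
x_1'w &= \gamma + 1, \qquad x_2 = x_1 - t\,w, \qquad x_2'w = \gamma - 1 \\
\Rightarrow\ (\gamma + 1) - t\,w'w &= \gamma - 1
\ \Rightarrow\ t = \frac{2}{w'w} \\
\text{margin} &= \|x_1 - x_2\|_2 = t\,\|w\|_2 = \frac{2}{\|w\|_2}
\end{aligned}
```

Under that assumption, making $\|w\|_2$ smaller pushes the two planes further apart, which is why minimizing the squared norm maximizes the margin.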


Support Vector Machines: Linear Programming Formulation

  • Use the 1-norm instead of the 2-norm:

    $\min_{w,\gamma,y}\ \nu e'y + \|w\|_1$

    s.t. $D(Aw - e\gamma) + y \ge e,\ y \ge 0$

  • This is equivalent to the following linear program:

    $\min_{w,\gamma,y,v}\ \nu e'y + e'v$

    s.t. $D(Aw - e\gamma) + y \ge e,\ -v \le w \le v,\ y \ge 0$


Feature Selection and SVMs

  • Use the step function to suppress components of the normal to the separating hyperplane:

    [Equations not preserved in the transcript: a "min ... s.t. ..." problem followed by a "Where:" definition]



SVM Formulation with Feature Selection

  • For [symbol not preserved], we use the approximation of the step vector by the concave exponential: [equation not preserved]

  • Here [symbol not preserved] is the base of natural logarithms. This leads to:

    [Equation not preserved in the transcript: a "min ... s.t. ..." problem]
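
To get a feel for how a concave exponential can stand in for the step function, here is a minimal Python sketch. It assumes the surrogate `1 - exp(-alpha * v)` applied componentwise to a nonnegative vector `v`, a form commonly used in concave-minimization feature selection; the function names and the value `alpha = 5.0` are illustrative choices, not taken from the slide.

```python
import math

def step(v):
    """Step function applied componentwise: 1 where v_j > 0, else 0."""
    return [1.0 if v_j > 0 else 0.0 for v_j in v]

def concave_exp(v, alpha=5.0):
    """Assumed smooth concave surrogate: 1 - exp(-alpha * v_j) for v_j >= 0."""
    return [1.0 - math.exp(-alpha * v_j) for v_j in v]

v = [0.0, 0.1, 0.5, 2.0]
print(step(v))
print([round(x, 4) for x in concave_exp(v)])
```

The surrogate is exactly 0 at `v_j = 0` and approaches 1 as `v_j` grows, so summing it over the components roughly counts the nonzero ones; a larger `alpha` tightens the approximation.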


Successive Linearization Algorithm (SLA) for Feature Selection

  • Choose [parameters not preserved]. Start with some [initial point not preserved]. Having [current iterate not preserved], determine the next iterate by solving the LP:

    [Equation not preserved in the transcript: a "min ... s.t. ..." problem]

  • Stop when: [condition not preserved]

  • Proposition: Algorithm terminates in a finite number of steps (typically 5 to 7) at a stationary point.
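
Why each step is a linear program can be seen from a first-order expansion of the concave term. The sketch below assumes that term has the form $f(v) = e'(e - \varepsilon^{-\alpha v})$, with $\varepsilon$ the base of natural logarithms, $\alpha > 0$, and the exponential taken componentwise; that form and the iterate notation $v^i$ are assumptions made for illustration, not read off the slide.

```latex
\begin{aligned}
f(v) &= e'\left(e - \varepsilon^{-\alpha v}\right), \qquad
\nabla f(v) = \alpha\,\varepsilon^{-\alpha v} \\
f(v) &\le f(v^i) + \alpha\left(\varepsilon^{-\alpha v^i}\right)'(v - v^i)
\quad \text{for all } v \ge 0 \quad \text{(concavity)}
\end{aligned}
```

The right-hand side is linear in $v$, so minimizing it over linear constraints is an LP. Because $v^i$ is itself feasible for that LP, the minimizer $v^{i+1}$ gives a linearized value no larger than $f(v^i)$, and by the inequality $f(v^{i+1})$ is no larger either, so under these assumptions the objective never increases from one iterate to the next.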


The Federalist Papers

  • Written in 1787-1788 by Alexander Hamilton, John Jay and James Madison to persuade the citizens of New York to ratify the Constitution.

  • Papers consisted of short essays, 900 to 3500 words in length.

  • Authorship of 12 of those papers has been in dispute (Madison or Hamilton). These papers are referred to as the disputed Federalist papers.


Previous Work

  • Mosteller and Wallace (1964)

    • Using statistical inference, determined the authorship of the 12 disputed papers.

  • Bosch and Smith (1998)

    • Using linear programming techniques and the evaluation of every possible combination of one, two and three features, obtained a separating hyperplane using only three words.


Description of the Data

  • For every paper:

    • Machine-readable text was created using a scanner.

    • Relative frequencies were computed for 70 words that Mosteller and Wallace identified as good candidates for author-attribution studies.

    • Each document is represented as a vector containing the 70 real numbers corresponding to the 70 word frequencies.

  • The dataset consists of 118 papers:

    • 50 Madison papers

    • 56 Hamilton papers

    • 12 disputed papers
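
The word-frequency representation described above can be sketched in a few lines of Python. This is only an illustration: the three-word list, the sample sentence, the simple lowercase tokenizer, and the choice to scale frequencies per 1000 words are all assumptions, and the actual 70-word list is not reproduced here.

```python
import re
from collections import Counter

def word_frequencies(text, function_words, per=1000):
    """Occurrences of each listed word per `per` words of text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    total = len(tokens)
    if total == 0:
        return [0.0] * len(function_words)
    return [per * counts[w] / total for w in function_words]

# Hypothetical word list and sample text, for illustration only.
words = ["to", "upon", "would"]
sample = "It would be wise to rely upon reason and to act upon it."
vector = word_frequencies(sample, words)
print(vector)
```

Each paper then becomes one row of the data matrix, with one column per word in the list.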



SLA Feature Selection for Classifying the Disputed Federalist Papers

  • Apply the successive linearization algorithm to:

    • Train on the 106 Federalist papers with known authors

    • Find a classification hyperplane that uses as few words as possible

  • Use the hyperplane to classify the 12 disputed papers


Hyperplane Classifier Using 3 Words

  • A hyperplane depending on three words was found:

    $0.5368\,\text{to} + 24.6634\,\text{upon} + 2.9532\,\text{would} = 66.6159$

  • All disputed papers ended up on the Madison side of the plane
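
Applying the three-word hyperplane to a paper amounts to evaluating one linear expression. The sketch below uses the coefficients and threshold from the slide; the input frequencies are hypothetical, their units are assumed to match those used in training, and because the slide does not say which sign corresponds to which author, the code reports only the side of the plane.

```python
# Coefficients and threshold as given on the slide.
COEFFS = {"to": 0.5368, "upon": 24.6634, "would": 2.9532}
THRESHOLD = 66.6159

def plane_score(freqs):
    """Signed value of the plane equation for one paper's word frequencies."""
    return sum(COEFFS[w] * freqs[w] for w in COEFFS) - THRESHOLD

def side(freqs):
    """Which side of the hyperplane the paper falls on."""
    return "positive" if plane_score(freqs) > 0 else "negative"

# Hypothetical frequencies, for illustration only.
paper_a = {"to": 40.0, "upon": 0.2, "would": 8.0}
paper_b = {"to": 40.0, "upon": 3.0, "would": 8.0}
print(side(paper_a), side(paper_b))
```

The coefficient on "upon" is much larger than the other two, so in this sketch a modest change in that single frequency is enough to move a paper across the plane.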



Comparison with Previous Work & Conclusion

  • Bosch and Smith (1998) calculated all the possible sets of one, two and three words to find a separating hyperplane. They solved 118,895 linear programs.

  • Our SLA algorithm for feature selection required the solution of only 6 linear programs.

  • Our classification of the disputed Federalist papers agrees with that of Mosteller-Wallace and Bosch-Smith.


More on SVMs:

  • My web page:

    www.cs.wisc.edu/~gfung

  • Olvi Mangasarian's web page:

    www.cs.wisc.edu/~olvi

