
The Disputed Federalist Papers: SVM Feature Selection via Concave Minimization

Glenn Fung and Olvi L. Mangasarian

CSNA 2002

June 13-16, 2002

Madison, Wisconsin

Outline of Talk

  • Support Vector Machines (SVM) Introduction

    • Standard Quadratic Programming Formulation

    • 1-norm Linear SVMs

  • SVM Feature Selection

    • Successive Linearization Algorithm (SLA)

  • The Disputed Federalist Papers

    • Description of the Classification Problem

    • Description of Previous Work

  • Results

    • Separating Hyperplane in Three Dimensions Only

    • Classification Agrees with Previous Results

What is a Support Vector Machine?

  • An optimally defined surface

  • Typically nonlinear in the input space

  • Linear in a higher dimensional space

  • Implicitly defined by a kernel function

What are Support Vector Machines Used For?

  • Classification

  • Regression & Data Fitting

  • Supervised & Unsupervised Learning

(Will concentrate on classification)

Geometry of the Classification Problem: 2-Category Linearly Separable Case



Algebra of the Classification Problem: 2-Category Linearly Separable Case

  • Given m points in n-dimensional space

  • Represented by an m-by-n matrix A

  • Membership of each point A_i in class +1 or -1 specified by an m-by-m diagonal matrix D with +1 and -1 entries

  • Separate by two bounding planes, x'w = γ + 1 and x'w = γ - 1:

      A_i w ≥ γ + 1   for rows with D_ii = +1
      A_i w ≤ γ - 1   for rows with D_ii = -1

  • More succinctly: D(Aw - eγ) ≥ e, where e is a vector of ones.
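The succinct condition D(Aw - eγ) ≥ e is easy to check numerically. A minimal sketch, with toy points and a hand-chosen plane (illustrative, not from the talk):

```python
import numpy as np

# A numerical check of the succinct condition D(Aw - e*gamma) >= e.
# The four points and the plane below are illustrative, not from the talk.
A = np.array([[2.0, 2.0],    # class +1
              [3.0, 1.0],    # class +1
              [-1.0, -2.0],  # class -1
              [-2.0, -1.0]]) # class -1
D = np.diag([1.0, 1.0, -1.0, -1.0])
e = np.ones(4)

w = np.array([1.0, 1.0])      # hand-chosen normal to the separating plane
gamma = 0.0

lhs = D @ (A @ w - e * gamma)
print(lhs)                    # every entry is >= 1, so both classes
print(bool((lhs >= e).all())) # clear their bounding planes
```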

Support Vector Machines: Maximizing the Margin between Bounding Planes

  • The margin (distance) between the bounding planes x'w = γ + 1 and x'w = γ - 1 is 2/‖w‖.

Support Vector Machines: Quadratic Programming Formulation

  • Solve the following quadratic program:

      min over (w, γ, y):   ν e'y + (1/2) w'w
      subject to:           D(Aw - eγ) + y ≥ e,   y ≥ 0

  • Here ν > 0 is the weight of the training error e'y.

  • Maximize the margin by minimizing (1/2) w'w, since the margin between the bounding planes is 2/‖w‖.
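This quadratic program has an equivalent unconstrained "hinge loss" form, min (1/2)‖w‖² + ν Σ_i max(0, 1 - D_ii(A_i w - γ)), which a few lines of (sub)gradient descent can minimize. A sketch on illustrative toy data (the data, ν and learning rate are assumptions, not from the talk):

```python
import numpy as np

# The QP above is equivalent to the unconstrained "hinge loss" problem
#   min_(w,gamma)  (1/2)||w||^2 + nu * sum_i max(0, 1 - D_ii (A_i w - gamma)),
# sketched here with plain (sub)gradient descent on illustrative toy data.
A = np.array([[2.0, 2.0], [3.0, 1.0], [-1.0, -2.0], [-2.0, -1.0]])
d = np.array([1.0, 1.0, -1.0, -1.0])   # diagonal of D
nu, lr = 10.0, 0.002

w, gamma = np.zeros(2), 0.0
for _ in range(5000):
    margins = d * (A @ w - gamma)
    active = margins < 1.0                       # margin violators
    grad_w = w - nu * (d[active, None] * A[active]).sum(axis=0)
    grad_g = nu * d[active].sum()
    w, gamma = w - lr * grad_w, gamma - lr * grad_g

print(np.sign(A @ w - gamma))   # recovers the labels d
```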

Support Vector Machines: Linear Programming Formulation

  • Use the 1-norm instead of the 2-norm:

      min over (w, γ, y):   ν e'y + ‖w‖₁
      subject to:           D(Aw - eγ) + y ≥ e,   y ≥ 0

  • This is equivalent to the following linear program:

      min over (w, γ, y, s):   ν e'y + e's
      subject to:              D(Aw - eγ) + y ≥ e,   -s ≤ w ≤ s,   y ≥ 0
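The linear program can be handed to an off-the-shelf LP solver. A sketch using scipy.optimize.linprog on toy data (the data and ν are illustrative assumptions):

```python
import numpy as np
from scipy.optimize import linprog

# A sketch of the 1-norm SVM as the LP on the slide:
#   min  nu*e'y + e's   s.t.  D(Aw - e*gamma) + y >= e, -s <= w <= s, y >= 0,
# with variables stacked as x = [w (n), gamma (1), y (m), s (n)].
A = np.array([[2.0, 2.0], [3.0, 1.0], [-1.0, -2.0], [-2.0, -1.0]])
d = np.array([1.0, 1.0, -1.0, -1.0])
m, n = A.shape
nu = 10.0

c = np.concatenate([np.zeros(n), [0.0], nu * np.ones(m), np.ones(n)])

# D(Aw - e*gamma) + y >= e   ->   -d_i*A_i w + d_i*gamma - y_i <= -1
row1 = np.hstack([-d[:, None] * A, d[:, None], -np.eye(m), np.zeros((m, n))])
#  w - s <= 0  and  -w - s <= 0  together encode  |w_i| <= s_i
row2 = np.hstack([np.eye(n), np.zeros((n, 1 + m)), -np.eye(n)])
row3 = np.hstack([-np.eye(n), np.zeros((n, 1 + m)), -np.eye(n)])
A_ub = np.vstack([row1, row2, row3])
b_ub = np.concatenate([-np.ones(m), np.zeros(2 * n)])

bounds = [(None, None)] * (n + 1) + [(0, None)] * (m + n)
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
w, gamma = res.x[:n], res.x[n]
print(np.sign(A @ w - gamma))   # reproduces the labels d
```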

Feature Selection and SVMs

  • Use the step function (·)* to suppress components of the normal w to the separating hyperplane:

      min over (w, γ, y, s):   ν e'y + e'(s)*
      subject to:              D(Aw - eγ) + y ≥ e,   -s ≤ w ≤ s,   y ≥ 0

  • Here (s)* is the step vector, with (s_i)* = 1 if s_i > 0 and 0 otherwise, so e'(s)* counts the nonzero components of w.

SVM Formulation with Feature Selection

  • For the step vector (s)*, we use the approximation by the concave exponential:

      (s)* ≈ e - ε^(-αs),   α > 0

  • Here ε is the base of natural logarithms. This leads to:

      min over (w, γ, y, s):   ν e'y + e'(e - ε^(-αs))
      subject to:              D(Aw - eγ) + y ≥ e,   -s ≤ w ≤ s,   y ≥ 0
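A quick numerical look at the approximation: as α grows, 1 - ε^(-αt) approaches the step function (0 at t = 0, roughly 1 for t > 0):

```python
import numpy as np

# As alpha grows, the concave exponential 1 - exp(-alpha*t) approaches the
# step function: exactly 0 at t = 0 and close to 1 for t > 0.
t = np.array([0.0, 0.01, 0.1, 0.5, 1.0])
for alpha in (1.0, 5.0, 25.0):
    print(f"alpha = {alpha:>4}:", np.round(1.0 - np.exp(-alpha * t), 3))
```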

Successive Linearization Algorithm (SLA) for Feature Selection

  • Choose ν, α > 0. Start with some (w^0, γ^0, y^0, s^0). Having (w^i, γ^i, y^i, s^i), determine the next iterate by solving the LP obtained by linearizing the concave exponential around s^i:

      min over (w, γ, y, s):   ν e'y + α (ε^(-αs^i))' s
      subject to:              D(Aw - eγ) + y ≥ e,   -s ≤ w ≤ s,   y ≥ 0

  • Stop when the decrease in the linearized objective between successive iterates falls below a chosen tolerance.

  • Proposition: The algorithm terminates in a finite number of steps (typically 5 to 7) at a stationary point.
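The SLA loop can be sketched by reusing the LP above with the linearized exponential as the cost on s. The toy data (one informative feature, one pure-noise feature), ν, α, and the simple stopping rule below are illustrative assumptions, not the authors' code:

```python
import numpy as np
from scipy.optimize import linprog

# A sketch of the SLA: each step solves the LP above with the concave
# exponential replaced by its linearization at the current s, i.e. with
# objective  nu*e'y + alpha * exp(-alpha*s^i)' s.  Feature selection should
# drive the weight on the pure-noise feature to zero.
rng = np.random.default_rng(0)
m, n = 40, 2
labels = np.repeat([1.0, -1.0], m // 2)
A = np.column_stack([labels * 2.0 + rng.normal(0, 0.1, m),  # informative
                     rng.normal(0, 1.0, m)])                # pure noise
nu, alpha, tol = 10.0, 5.0, 1e-6

def solve_lp(cost_s):
    # Variables x = [w (n), gamma, y (m), s (n)]; constraints as on the LP slide.
    c = np.concatenate([np.zeros(n + 1), nu * np.ones(m), cost_s])
    row1 = np.hstack([-labels[:, None] * A, labels[:, None],
                      -np.eye(m), np.zeros((m, n))])
    row2 = np.hstack([np.eye(n), np.zeros((n, 1 + m)), -np.eye(n)])
    row3 = np.hstack([-np.eye(n), np.zeros((n, 1 + m)), -np.eye(n)])
    bounds = [(None, None)] * (n + 1) + [(0, None)] * (m + n)
    res = linprog(c, A_ub=np.vstack([row1, row2, row3]),
                  b_ub=np.concatenate([-np.ones(m), np.zeros(2 * n)]),
                  bounds=bounds)
    return res.x

s = np.ones(n)
for it in range(50):
    grad = alpha * np.exp(-alpha * s)       # gradient of e'(e - exp(-alpha*s))
    x = solve_lp(grad)
    w, gamma, s_new = x[:n], x[n], x[n + 1 + m:]
    if np.max(np.abs(s_new - s)) < tol:     # iterates have stopped moving
        break
    s = s_new

print("LPs solved:", it + 1, " w =", np.round(w, 3))
```

On this toy problem the loop settles after a handful of LPs, consistent with the 5-to-7 figure quoted above, and the noise feature's weight is suppressed to zero.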

The Federalist Papers

  • Written in 1787-1788 by Alexander Hamilton, John Jay and James Madison to persuade the citizens of New York to ratify the Constitution.

  • The papers are short essays, 900 to 3,500 words in length.

  • Authorship of 12 of the papers has been in dispute (Madison or Hamilton). These are referred to as the disputed Federalist papers.

Previous Work

  • Mosteller and Wallace (1964)

    • Using statistical inference, determined the authorship of the 12 disputed papers.

  • Bosch and Smith (1998)

    • Using linear programming techniques and evaluating every possible combination of one, two and three features, obtained a separating hyperplane that uses only three words.

Description of the Data

  • For every paper:

    • Machine-readable text was created using a scanner.

    • Relative frequencies were computed for 70 words that Mosteller and Wallace identified as good candidates for author-attribution studies.

    • Each document is represented as a vector of 70 real numbers, the 70 relative word frequencies.

  • The dataset consists of 118 papers:

    • 50 Madison papers

    • 56 Hamilton papers

    • 12 disputed papers
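The vector representation described above can be sketched in a few lines. This is illustrative code, not the authors'; the two marker words stand in for the 70 Mosteller-Wallace candidates:

```python
from collections import Counter

# A sketch of the document representation: relative frequencies of a fixed
# list of marker words. Two placeholder words stand in for the 70 candidates.
MARKER_WORDS = ["upon", "whilst"]

def to_feature_vector(text, markers=MARKER_WORDS):
    words = text.lower().split()
    counts = Counter(words)
    total = len(words)
    # Relative frequency of each marker word in the document.
    return [counts[w] / total for w in markers]

doc = "upon reflection the convention relied upon the states whilst debating"
print(to_feature_vector(doc))   # -> [0.2, 0.1]
```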

SLA Feature Selection for Classifying the Disputed Federalist Papers

  • Apply the successive linearization algorithm to:

    • Train on the 106 Federalist papers with known authors

    • Find a classification hyperplane that uses as few words as possible

  • Use the hyperplane to classify the 12 disputed papers

Hyperplane Classifier Using 3 Words

  • A separating hyperplane depending on only three words was found.

  • All disputed papers ended up on the Madison side of the plane.
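The resulting decision rule has the form sign(w'x - γ) on the three word frequencies. A sketch with hypothetical weights, threshold, and side assignment (the actual values are in the paper and are not reproduced here):

```python
import numpy as np

# A sketch of the final decision rule. The weights, threshold, and which side
# is which author are HYPOTHETICAL placeholders; only the form
# sign(w.x - gamma) comes from the talk.
w = np.array([0.5, 2.0, 1.5])   # hypothetical weights for 3 word frequencies
gamma = 6.0                     # hypothetical threshold

def classify(freqs):
    # One author on the positive side of the plane, the other on the negative.
    return "Hamilton" if np.dot(w, freqs) - gamma > 0 else "Madison"

print(classify([2.0, 3.0, 1.0]))   # -> Hamilton  (0.5*2 + 2*3 + 1.5*1 = 8.5 > 6)
print(classify([1.0, 0.5, 1.0]))   # -> Madison   (0.5 + 1.0 + 1.5 = 3.0 < 6)
```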

Comparison with Previous Work & Conclusion

  • Bosch and Smith (1998) evaluated all possible sets of one, two and three words to find a separating hyperplane. They solved 118,895 linear programs.

  • Our SLA algorithm for feature selection required the solution of only 6 linear programs.

  • Our classification of the disputed Federalist papers agrees with that of Mosteller-Wallace and Bosch-Smith.

More on SVMs

  • My web page:

  • Olvi Mangasarian's web page: