
Text Categorization With Support Vector Machines: Learning With Many Relevant Features

By Thorsten Joachims

Presented By Meghneel Gore

Goal of Text Categorization

- Classify documents into a number of predefined categories
- Documents can belong to multiple categories
- Documents can belong to none of the categories

Applications of Text Categorization

- Categorization of news stories for online retrieval
- Finding interesting information from the WWW
- Guiding a user's search through hypertext

Representation of Text

- Removal of stop words
- Reduction of each word to its stem
- Construction of the feature vector
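The first two steps can be sketched as follows. This is a minimal illustration, not the presentation's pipeline: the stop-word list is a tiny hypothetical one, and the crude suffix-stripping `stem` function is a stand-in assumed here in place of a real stemmer such as Porter's.

```python
import re

# Hypothetical, deliberately tiny stop-word list (real lists are much longer).
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "on", "with"}

# Crude suffix stripping: a stand-in for a real stemmer such as Porter's.
SUFFIXES = ("ing", "ers", "er", "ed", "es", "s")


def stem(word: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    if word.endswith("ss"):  # avoid mangling words like "process"
        return word
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text: str) -> list[str]:
    """Lower-case, split into words, drop stop words, and stem what remains."""
    words = re.findall(r"[a-z]+", text.lower())
    return [stem(w) for w in words if w not in STOP_WORDS]


tokens = preprocess("The buyers process computing memory; a computer buys memory and memory.")
print(tokens)  # ['buy', 'process', 'comput', 'memory', 'comput', 'buy', 'memory', 'memory']
```

Different surface forms ("buyers"/"buys", "computing"/"computer") collapse onto one stem, which is what lets them share a single feature.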

Representation of Text

Figure: This is a document vector (number of occurrences of each word stem: 2 comput, 1 process, 2 buy, 3 memory, ...)
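A document vector like the one in the figure can be stored sparsely, keeping only the stems that actually occur. A minimal sketch, assuming a hypothetical `vocabulary` that assigns one feature index per distinct word stem; stems outside it are ignored.

```python
from collections import Counter


def document_vector(stems: list[str], vocabulary: dict[str, int]) -> dict[int, int]:
    """Map stemmed tokens to a sparse vector {feature index: term count}.

    Stems missing from the vocabulary are ignored, and features with a
    count of zero are simply not stored.
    """
    counts = Counter(s for s in stems if s in vocabulary)
    return {vocabulary[s]: n for s, n in counts.items()}


# Hypothetical vocabulary: one feature index per distinct word stem.
vocabulary = {"buy": 0, "comput": 1, "memory": 2, "process": 3, "baseball": 4, "hockey": 5}

stems = ["comput", "process", "buy", "memory", "comput", "buy", "memory", "memory"]
vec = document_vector(stems, vocabulary)
print(sorted(vec.items()))  # [(0, 2), (1, 2), (2, 3), (3, 1)]
```

Features 4 and 5 never appear in this document, so they take no space; with a realistic vocabulary of tens of thousands of stems, almost every entry of a document vector is zero.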

What's Next...

- Appropriateness of support vector machines for this application
- Support vector machine theory
- Conventional learning methods
- Experiments
- Results
- Conclusions

Why SVMs?

- High-dimensional input space
- Few irrelevant features
- Sparse document vectors
- Most text categorization problems are linearly separable
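Linear separability means a single hyperplane, i.e. the decision rule sign(w · x + b), can split a category's documents from the rest, and sparsity makes evaluating that rule cheap. A minimal sketch with hypothetical, hand-picked weights (not learned by an SVM) over sparse `{feature index: value}` vectors:

```python
def sparse_dot(w: dict[int, float], x: dict[int, float]) -> float:
    """Dot product that only visits the non-zero entries of x."""
    return sum(value * w.get(index, 0.0) for index, value in x.items())


def classify(w: dict[int, float], b: float, x: dict[int, float]) -> int:
    """Linear decision rule sign(w . x + b); a score of exactly 0 maps to -1."""
    return 1 if sparse_dot(w, x) + b > 0 else -1


# Hypothetical weights for one category; indices refer to an assumed vocabulary.
w = {1: 0.8, 2: 0.5, 4: -1.0}
b = -1.0

doc = {0: 2, 1: 2, 2: 3, 3: 1}  # sparse term counts of one document
print(classify(w, b, doc))       # 1
print(classify(w, b, {4: 3}))    # -1
```

The cost of `sparse_dot` grows with the number of distinct words in the document, not with the size of the vocabulary.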

Support Vector Machines

Figure: Visualization of a support vector machine

Support Vector Machines

- Structural risk minimization

Support Vector Machines

- We define a structure of nested hypothesis spaces $H_i$ such that their respective VC dimensions $d_i$ increase: $H_1 \subset H_2 \subset \cdots$ with $d_1 \le d_2 \le \cdots$
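Structural risk minimization chooses among these nested spaces using a bound on the true error. For reference, a commonly quoted form of Vapnik's bound is shown below. The symbols $\mathrm{Err}(h)$ (expected error), $\mathrm{Err}_{\mathrm{train}}(h)$ (training error), $\ell$ (number of training examples) and $\eta$ (confidence parameter) are introduced here and do not come from the slides; constants differ slightly between textbook statements. With probability at least $1 - \eta$, for any hypothesis $h$ from a space of VC dimension $d$:

$$\mathrm{Err}(h) \le \mathrm{Err}_{\mathrm{train}}(h) + \sqrt{\frac{d\left(\ln\frac{2\ell}{d} + 1\right) - \ln\frac{\eta}{4}}{\ell}}$$

Moving to larger $H_i$ tends to lower the training error but raises the capacity term through $d_i$; SRM picks the $H_i$ that minimizes the sum.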

Support Vector Machines

- Lemma [Vapnik, 1982]
Consider hyperplanes $h(\vec{d}) = \operatorname{sign}(\vec{w} \cdot \vec{d} + b)$ as hypotheses. If all example vectors $\vec{d}_i$ are contained in a hypersphere of radius $R$, and it is required that $|\vec{w} \cdot \vec{d}_i + b| \ge 1$ for all examples, with $\|\vec{w}\| = A$, then this set of hyperplanes has a VC dimension $d$ bounded by
$$d \le \min\left(\lceil R^2 A^2 \rceil, n\right) + 1$$
where $n$ is the dimensionality of the input space.
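The lemma's bound can be evaluated for a concrete hyperplane. A minimal sketch, assuming the hypothetical helper `vc_dimension_bound` below: it rescales (w, b) so that the closest example has |w · d + b| = 1, which satisfies the lemma's requirement for every example, and it takes R from a hypersphere centred at the origin. That is a valid but possibly loose choice, since the smallest enclosing hypersphere may be smaller.

```python
import math


def _dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def vc_dimension_bound(points, w, b):
    """Evaluate min(ceil(R^2 A^2), n) + 1 for the hyperplane (w, b).

    (w, b) is rescaled so that min_i |w . d_i + b| = 1; afterwards
    |w . d_i + b| >= 1 holds for every example and A = ||w||.
    """
    n = len(w)  # dimensionality of the input space
    scale = min(abs(_dot(w, d) + b) for d in points)
    if scale == 0:
        raise ValueError("an example lies on the hyperplane")
    w = [wi / scale for wi in w]  # only w is needed after rescaling
    A = math.sqrt(_dot(w, w))
    # Radius of an origin-centred hypersphere that contains every example.
    R = max(math.sqrt(_dot(d, d)) for d in points)
    return min(math.ceil(R * R * A * A), n) + 1


points = [(2, 0, 0, 0, 0), (-2, 0, 0, 0, 0), (2, 1, 0, 0, 0)]
print(vc_dimension_bound(points, w=(1, 0, 0, 0, 0), b=0))  # 3
```

Unrestricted hyperplanes in a 5-dimensional space have VC dimension 6; the margin requirement brings the bound down to 3 here, independent of the input dimensionality as long as R²A² stays small.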

Conventional Learning Methods

- Naïve Bayes classifier
- Rocchio algorithm
- k-nearest neighbors
- Decision tree classifier

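Naïve Bayes Classifier

- Consider a document vector with attributes $a_1, a_2, \ldots, a_n$ and a target value $v$ drawn from a set of classes $V$
- Bayesian approach: choose the most probable class given the attributes,
$$v_{MAP} = \arg\max_{v_j \in V} P(v_j \mid a_1, a_2, \ldots, a_n)$$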

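Naïve Bayes Classifier

- Using Bayes' theorem, we can rewrite this as
$$v_{MAP} = \arg\max_{v_j \in V} \frac{P(a_1, a_2, \ldots, a_n \mid v_j)\, P(v_j)}{P(a_1, a_2, \ldots, a_n)} = \arg\max_{v_j \in V} P(a_1, a_2, \ldots, a_n \mid v_j)\, P(v_j)$$
since the denominator does not depend on $v_j$.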

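Naïve Bayes Classifier

- The naïve Bayes method assumes that the attributes are conditionally independent given the target value, so that $P(a_1, a_2, \ldots, a_n \mid v_j) = \prod_i P(a_i \mid v_j)$ and
$$v_{NB} = \arg\max_{v_j \in V} P(v_j) \prod_i P(a_i \mid v_j)$$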
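The slides do not say how $P(v_j)$ and $P(a_i \mid v_j)$ are estimated. A minimal sketch, assuming a multinomial model in which each attribute $a_i$ is the word at position $i$ and the probabilities are estimated from counts with Laplace (add-one) smoothing; both choices are assumptions, not taken from the presentation. The function names are hypothetical. Log-probabilities are summed instead of multiplying probabilities to avoid numeric underflow, which does not change the arg max.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, labels):
    """docs: list of token lists; labels: one class label per document."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # class -> occurrences of each word
    vocabulary = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    priors = {v: n / len(labels) for v, n in class_counts.items()}  # P(v_j)
    return priors, word_counts, vocabulary


def classify_naive_bayes(tokens, priors, word_counts, vocabulary):
    """Return argmax_v [log P(v) + sum_i log P(a_i | v)] with add-one smoothing."""
    best_label, best_score = None, -math.inf
    for v, prior in priors.items():
        total = sum(word_counts[v].values())
        score = math.log(prior)
        for word in tokens:
            if word not in vocabulary:
                continue  # words never seen in training are skipped
            score += math.log((word_counts[v][word] + 1) / (total + len(vocabulary)))
        if score > best_score:
            best_label, best_score = v, score
    return best_label


docs = [["comput", "memory", "process"], ["comput", "memory"],
        ["hockey", "goal"], ["hockey", "team", "goal"]]
labels = ["comp", "comp", "sport", "sport"]
model = train_naive_bayes(docs, labels)
print(classify_naive_bayes(["memory", "comput", "goal"], *model))  # comp
```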

Experiments

- Datasets
- Performance measures
- Results

Datasets

- Reuters-21578 dataset
  - 9603 training documents
  - 3299 test documents

- Ohsumed corpus
  - 10000 training documents
  - 10000 test documents

Performance Measures

- Precision
  - Probability that a document predicted to be in class ‘x’ truly belongs to that class

- Recall
  - Probability that a document belonging to class ‘x’ is classified into that class

- Precision/recall breakeven point
  - The value at which precision equals recall as the classifier's decision threshold is varied
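Both measures, and one way to obtain the breakeven point from a ranking of classifier scores, can be sketched as follows. This is a sketch under stated assumptions, with hypothetical function names: it relies on the fact that if exactly the top k = |actual| documents are predicted positive, precision and recall both equal TP / k. It assumes no tied scores and at least one positive document.

```python
def precision_recall(predicted: set, actual: set) -> tuple[float, float]:
    """predicted: documents assigned to class x; actual: documents truly in x."""
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


def breakeven_point(scores: dict, actual: set) -> float:
    """Precision (= recall) when the top |actual| documents by score are predicted."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    precision, _recall = precision_recall(set(ranked[: len(actual)]), actual)
    return precision


scores = {"d1": 0.9, "d2": 0.8, "d3": 0.4, "d4": 0.3, "d5": 0.1}
actual = {"d1", "d3", "d4"}
print(precision_recall({"d1", "d2"}, actual))  # precision 0.5, recall 1/3
print(breakeven_point(scores, actual))         # 2 of the top 3 are correct: 2/3
```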

Results

Precision/recall breakeven point on the Ohsumed dataset

Results

Precision/recall breakeven point on the Reuters dataset

Conclusions

- Introduces SVMs for text categorization
- Provides theoretical and empirical evidence that SVMs are well suited for text categorization
- Consistent improvement in accuracy over the conventional methods tested
