Geometrical Complexity of Classification Problems

Tin Kam Ho

Computing Sciences Research Center

Bell Labs, Lucent Technologies

With contributions from Mitra Basu,

Ester Bernado, Martin Law

Automatic Pattern Recognition

A fascinating theme.

What will the world be like if machines can do it?

- Robots can see, read, hear, and smell.
- We can get rid of all spam email, and find exactly what we need from the web.
- We can protect everything with our signatures, fingerprints, iris patterns, or just our face.
- We can track anybody by look, gesture, and gait.
- We can be warned of disease outbreaks, terrorist threats.
- Feed in our vital data and we will have perfect diagnosis.
- We will know whom we can trust to lend our money.

Automatic Pattern Recognition

And more …

- We can tell the true emotions of all our encounters!
- We can predict stock prices!
- We can identify all potential criminals!
- Our weapons will never lose their targets!
- We will be alerted about any extraordinary events going on in the heavens, on or inside the Earth!
- We will have machines discovering all knowledge!

… But how far are we from these?

Automatic Pattern Recognition

[Diagram: samples pass through feature extraction to give feature vectors. Supervised learning: classifier training, classifier, classification, class membership. Unsupervised learning: clustering, cluster validation, cluster interpretation. Inset plot axes: feature 1, feature 2.]

Supervised Learning: Many automatically trainable classifiers

- Statistical Classifiers (for numerical vectors)
  - Bayesian classifiers,
  - polynomial discriminators,
  - nearest-neighbors,
  - decision trees & forests,
  - neural networks,
  - support vector machines,
  - ensemble methods, …
Also:

- Syntactic Classifiers (for symbol strings)
- Structural Classifiers (for graphs)

[Figure: feature space plot; axes: feature 1, feature 2]

We have had some successes. But …

- Why are we still far from perfect?
- What is still missing in our techniques?

Large Variations in Accuracies of Different Classifiers

[Figure: accuracies shown by classifier and application]

Will I have any luck for my problem?

Many new classifiers are in close rivalry with each other. Why?

- Do they represent the limit of our technology?
- What have they added to the body of methods?
- Is there still value in the older methods?
- What is still missing in our techniques?
When I face a new problem …

- How much can automatic classifiers do?
- How should I choose a classifier?

Sources of Difficulty in Classification

- Class ambiguity
- Boundary complexity
- Sample size and dimensionality

Class Ambiguity

- Is the concept intrinsically ambiguous?
- Are the classes well defined?
- What information do the features carry?
- Are the features sufficient for discrimination?

Boundary Complexity

- Kolmogorov complexity
- Length can be exponential in dimensionality
- Trivial description: list all points & class labels
- Is there a shorter description?

Classification Boundaries As Decided by Different Classifiers

[Figure: training samples for a 2D classification problem; axes: feature 1, feature 2]

Classification Boundaries

- XCS (a genetic algorithm): hyperrectangles

[Figure: classification boundaries in feature space; axes: feature 1, feature 2]

Other Classifiers: Other Boundaries

- Nearest neighbor: Voronoi cells

[Figure: classification boundaries in feature space; axes: feature 1, feature 2]

Other Classifiers: Other Boundaries

- Linear classifier: hyperplanes

[Figure: classification boundaries in feature space; axes: feature 1, feature 2]

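Match between Classifiers and Problems

[Figure: two example problems, Problem A and Problem B, each classified by XCS and NN, with one classifier marked "Better!" on each problem. Error labels in slide order: 0.6%, 0.06%, 0.7%, 1.9%. Classifier labels in slide order: XCS, NN, NN, XCS.]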

Sampling Density

A problem may appear deceptively simple or complex with small samples

[Figure: panels showing samples of 2, 10, 100, 500, and 1000 points]

Measures of Geometrical Complexity of Classification Problems

Our goal: tools and languages for studying

- Characteristics of geometry & topology of high-dim data sets
- How they change with feature transformations and sampling
- How they interact with classifier geometry
We want to know:

- What are real-world problems like?
- What is my problem like?
- What can be expected of a method on a specific problem?

Building up the Language

- Identify key features of data geometry that are relevant for classification
- Develop algorithms to extract such features from a dataset
- Describe patterns of classifier behavior in terms of geometrical features
- … pattern recognition in pattern recognition problems
- … classification of classification problems

Geometry of Datasets and Classifiers

- Data sets:
  - length of class boundary
  - fragmentation of classes / existence of subclasses
  - global or local linear separability
  - convexity and smoothness of boundaries
  - intrinsic / extrinsic dimensionality
  - stability of these characteristics as sampling rate changes
- Classifier models:
  - polygons, hyper-spheres, Gaussian kernels, axis-parallel hyper-planes, piece-wise linear surfaces, polynomial surfaces, their unions or intersections, …

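Fisher’s Discriminant Ratio

- Defined for one feature:

$$f = \frac{(\mu_1 - \mu_2)^2}{\sigma_1^2 + \sigma_2^2}$$

where $\mu_1, \mu_2$ and $\sigma_1^2, \sigma_2^2$ are the means and variances of classes 1, 2

- One good feature makes a problem easy
- Take maximum over all features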
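
A minimal sketch of this measure, assuming the usual form of the ratio, (mean1 - mean2)^2 / (var1 + var2), computed per feature with population variances and then maximized over features. The function names and the toy data are illustrative, not taken from the slides.

```python
from statistics import mean, pvariance


def fisher_ratio(values_class1, values_class2):
    """Fisher's discriminant ratio for one feature (two classes)."""
    m1, m2 = mean(values_class1), mean(values_class2)
    denom = pvariance(values_class1) + pvariance(values_class2)
    if denom == 0:
        # Both classes are constant on this feature (assumed convention).
        return float("inf") if m1 != m2 else 0.0
    return (m1 - m2) ** 2 / denom


def max_fisher_ratio(class1, class2):
    """Maximum per-feature ratio; each class is a list of feature vectors."""
    n_features = len(class1[0])
    return max(
        fisher_ratio([x[j] for x in class1], [x[j] for x in class2])
        for j in range(n_features)
    )


# Toy data: feature 0 separates the classes well, feature 1 does not.
class1 = [(0.0, 1.0), (1.0, 2.0), (2.0, 3.0)]
class2 = [(10.0, 2.0), (11.0, 1.0), (12.0, 3.0)]
best = max_fisher_ratio(class1, class2)  # about 75, driven by feature 0
```

Here the single informative feature dominates the maximum, which is the sense in which one good feature makes a problem easy.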

Linear Separability

- Studies from early literature:
  - Perceptrons, Perceptron Cycling Theorem, 1962
  - Fractional Correction Rule, 1954
  - Widrow-Hoff Delta Rule, 1960
  - Ho-Kashyap algorithm, 1965
- Many algorithms stop only with positive conclusions: they halt once a separating hyperplane is found, but give no definite answer otherwise.
- But this one works:
  - Linear programming, 1968
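
To illustrate what "stopping only with positive conclusions" means, here is a small sketch of a fixed-increment perceptron with an epoch cap. It is not the linear programming test from the slide; the cap `max_epochs` and the toy data are assumptions made for the example. A returned hyperplane proves separability, while `None` only says that none was found within the cap.

```python
def perceptron(points, labels, max_epochs=1000):
    """Fixed-increment perceptron; labels must be +1 or -1.

    Returns (weights, bias) if a full pass makes no mistakes,
    otherwise None (inconclusive, not a proof of nonseparability).
    """
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(points, labels):
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * activation <= 0:
                # Misclassified (or on the boundary): move toward the point.
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:
            return w, b
    return None


# Linearly separable toy problem.
separable_points = [(0.0, 0.0), (1.0, 0.0), (3.0, 3.0), (4.0, 3.0)]
separable_labels = [-1, -1, 1, 1]
separable_result = perceptron(separable_points, separable_labels)

# XOR-style problem: no separating hyperplane exists, so the loop runs out.
xor_points = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0), (1.0, 0.0)]
xor_labels = [1, 1, -1, -1]
xor_result = perceptron(xor_points, xor_labels)
```

A linear programming formulation avoids this asymmetry because infeasibility of the program is itself a definite negative answer.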

Length of Class Boundary

- Friedman & Rafsky 1979
- Find MST (minimum spanning tree) connecting all points regardless of class
- Count edges joining opposite classes
- Sensitive to separability and clustering effects
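
The two steps above can be sketched as follows, assuming Euclidean distance and a complete graph over the points. Prim's algorithm is used here for brevity; the slides do not say which MST algorithm was used, and the function names are illustrative.

```python
import math


def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph; returns n-1 edges."""
    n = len(points)
    in_tree = [False] * n
    best_dist = [math.inf] * n
    best_from = [-1] * n
    best_dist[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the closest point not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]),
                key=lambda i: best_dist[i])
        in_tree[u] = True
        if best_from[u] >= 0:
            edges.append((best_from[u], u))
        # Update the cheapest known connection for the remaining points.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best_dist[v]:
                    best_dist[v] = d
                    best_from[v] = u
    return edges


def cross_class_edges(points, labels):
    """Number of MST edges whose endpoints carry different class labels."""
    return sum(1 for i, j in mst_edges(points) if labels[i] != labels[j])


# Two compact, well-separated clusters: a single edge bridges the classes.
clusters = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
few = cross_class_edges(clusters, [0, 0, 0, 1, 1, 1])

# Interleaved classes on a line: every MST edge crosses the boundary.
line = [(0, 0), (1, 0), (2, 0), (3, 0)]
many = cross_class_edges(line, [0, 1, 0, 1])
```

Dividing the count by the number of points or edges gives a value that can be compared across data sets of different sizes.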

Shapes of Class Manifolds

- Lebourgeois & Emptoz 1996
- Packing of same-class points in hyper-spheres
- Thick and spherical, or thin and elongated manifolds

Data Sets: UCI

- UC-Irvine machine learning archive
- 14 datasets (no missing values, > 500 pts)
- 844 two-class problems
  - 452 linearly separable
  - 392 linearly nonseparable
- 2 - 4648 points each
- 8 - 480 dimensional feature spaces

Data Sets: Random Noise

- Randomly located and labeled points
- 100 artificial problems
- 1 to 100 dimensional feature spaces
- 2 classes, 1000 points per class

Patterns in Complexity Measure Space

[Figure legend: linearly separable (lin.sep), linearly nonseparable (lin.nonsep), random]

Observations on Distributions of Problems

- Noise sets and linearly separable sets occupy opposite ends in many dimensions
- In-between positions tell relative difficulty
- Fan-like structure in most plots
- At least 2 independent factors, joint effects
- Noise sets are far from real data
- Ranges of complexity value for the noise sets show effects of apparent complexity under the influence of sampling density

Continuum of Difficulty Spanned by Problems of the Same Family

[Figure: plot annotated with k=1 and k=99]

- Study specific application domains, alternative problem formulations
- Guide feature transformations
- Compare classifiers’ domain of competence
- Help in choosing methods
- Set expectations on recognition accuracy

Domains of Competence of Classifiers

- Given a classification problem, determine which classifier is the best for it

[Figure: axes complexity measure 1 and complexity measure 2; labels: Random labels, ?, LC, XCS, Decision Forest, NN; caption: Here is my problem!]

Domain of Competence Experiment

- Use a set of 9 complexity measures: Boundary, Pretop, IntraInter, NonLinNN, NonLinLP, Fisher, MaxEff, VolumeOverlap, Npts/Ndim
- Characterize 392 two-class problems from UCI data, all shown to be linearly non-separable
- Evaluate 6 classifiers
  - NN (1-nearest neighbor)
  - LP (linear classifier by linear programming)
  - Odt (oblique decision tree)
  - Pdfc (random subspace decision forest), an ensemble method
  - Bdfc (bagging based decision forest), an ensemble method
  - XCS (a genetic-algorithm based classifier), an ensemble method

Best Classifier (Seen in Projection to Boundary-NonLinNN)

[Figure legend: Bdfc, + LP, X NN, O XCS, ⃟ Odt, . Several]

Best Classifier (Seen in Projection to IntraInter-Pretop)

[Figure legend: Bdfc, + LP, X NN, O XCS, ⃟ Odt, . Several]

Best Classifier (Seen in Projection to MaxEff-VolumeOverlap)

[Figure legend: Bdfc, + LP, X NN, O XCS, ⃟ Odt, . Several]

Worst Classifier

[Figure: projections to Boundary-NonLinNN, IntraInter-Pretop, and MaxEff-VolumeOverlap. Legend: Bdfc, + LP, X NN, O XCS, ⃟ Odt, . Several]

Best Classifier Being nn, lp, odt vs. an Ensemble Technique

[Figure: projections to Boundary-NonLinNN, IntraInter-Pretop, and MaxEff-VolumeOverlap. Legend: ensemble, + nn,lp,odt]

Summary of Observations

- 270 / 392 problems have a significantly best (dominant) classifier
- 157 / 392 have a significantly worst classifier
- 4 classifiers (NN, LP, Bdfc, XCS) are dominant for some problems
- Almost all problems with a dominant classifier are best solved by either NN or LP
- Simple methods (NN, LP) have extreme behavior: when conditions are favorable, they are optimal for the problem
- Single decision trees (by Odt) have no domain of dominant competence: significantly worst methods are almost always single decision trees (Odt), followed by LP and NN
- Ensemble methods are in close rivalry for many problems; they are safe choices when boundary complexity is unknown

Extension to Multiple Classes

- Fisher’s discriminant score → multiple discriminant scores
- Boundary point in an MST: a point is a boundary point as long as it is next to a point from other classes in the MST
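
The boundary-point rule for multiple classes can be sketched as below, assuming Euclidean distance. Kruskal's algorithm with a small union-find is used to build the MST; that choice, the function names, and the three-class toy data are illustrative and not taken from the slides.

```python
import math
from itertools import combinations


def mst_edges_kruskal(points):
    """Kruskal's algorithm on the complete Euclidean graph."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    candidates = sorted(
        combinations(range(len(points)), 2),
        key=lambda e: math.dist(points[e[0]], points[e[1]]),
    )
    edges = []
    for i, j in candidates:
        ri, rj = find(i), find(j)
        if ri != rj:  # the edge joins two components, so keep it
            parent[ri] = rj
            edges.append((i, j))
    return edges


def boundary_points(points, labels):
    """Indices of points adjacent in the MST to a point of another class."""
    boundary = set()
    for i, j in mst_edges_kruskal(points):
        if labels[i] != labels[j]:
            boundary.update((i, j))
    return boundary


# Three classes laid out along a line.
points = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (20, 0), (21, 0)]
labels = ["a", "a", "a", "b", "b", "c", "c"]
boundary = boundary_points(points, labels)
fraction = len(boundary) / len(points)
```

Because the rule only compares labels for equality, it needs no change when the number of classes grows.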

Intrinsic Ambiguity

- The complexity measures can be severely affected when there exists intrinsic class ambiguity (or data mislabeling)
  - Example: FeatureOverlap (in 1D only)
- Cannot distinguish between intrinsic ambiguity and a complex class decision boundary

Tackling Intrinsic Ambiguity

- Compute the complexity measure at different error levels
- f(D): a complexity measure on the data set D
- D*: a “perturbed” version of D, so that some points are relabeled
- h(D, D*): a distance measure between D and D* (error level)
- The new complexity measure is defined as a curve:
- The curve can be summarized by, say, area under curve

- Minimization by greedy procedures
- Discard erroneous points that decrease complexity by the most
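
A sketch of one possible greedy procedure, under several assumptions that are not in the slides: two classes labeled 0 and 1, the leave-one-out 1-NN error rate standing in for the complexity measure f(D), the fraction of relabeled points standing in for the distance h(D, D*), and suspect points being relabeled rather than discarded. At each step the single relabeling that lowers f the most is kept, giving one point of the curve per error level.

```python
import math


def loo_1nn_error(points, labels):
    """Leave-one-out 1-NN error rate, a stand-in for f(D)."""
    errors = 0
    for i, p in enumerate(points):
        nearest = min((j for j in range(len(points)) if j != i),
                      key=lambda j: math.dist(p, points[j]))
        if labels[nearest] != labels[i]:
            errors += 1
    return errors / len(points)


def greedy_complexity_curve(points, labels, f, max_flips):
    """Return [(error_level, complexity)] for 0..max_flips relabeled points."""
    current = list(labels)
    flipped = set()
    curve = [(0.0, f(points, current))]
    for step in range(1, min(max_flips, len(points)) + 1):
        best_i, best_val = None, math.inf
        for i in range(len(points)):
            if i in flipped:
                continue
            trial = list(current)
            trial[i] = 1 - trial[i]  # relabel one point (two-class case)
            val = f(points, trial)
            if val < best_val:
                best_i, best_val = i, val
        current[best_i] = 1 - current[best_i]
        flipped.add(best_i)
        curve.append((step / len(points), best_val))
    return curve


def area_under_curve(curve):
    """Trapezoidal summary of the complexity curve."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(curve, curve[1:]))


# Two clusters on a line; the third point carries a suspicious label.
points = [(0.0,), (1.0,), (2.1,), (3.3,), (10.0,), (11.0,), (12.1,), (13.3,)]
labels = [0, 0, 1, 0, 1, 1, 1, 1]
curve = greedy_complexity_curve(points, labels, loo_1nn_error, max_flips=1)
area = area_under_curve(curve)
```

In this toy case a single relabeling removes all of the measured complexity, which suggests the difficulty came from one ambiguous label rather than from a complex boundary.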

Global vs. Local Properties

- Boundaries can be simple locally but complex globally
  - These types of problems are relatively simple, but are characterized as complex by the measures
- Solution: complexity measure at different scales
  - This can be combined with different error levels
- Let $N_{i,k}$ be the k neighbors of the i-th point defined by, say, Euclidean distance. The complexity measure for data set D at a given error level, evaluated at scale k, is defined over these neighborhoods.

Real Problems Have a Mixture of Difficulties

E.g. Sparse Sample & Complex Geometry cause ill-posedness

Can’t tell which is better!

Requires further hypotheses on data geometry

To conclude:

- We have some early success in using geometrical measures to characterize classification complexity, pointing to a potentially fruitful research direction.
To continue:

- More, better measures;
- Detailed studies of their utilities, interactions, and estimation uncertainty;
- Deeper understanding of constraints on these measures from point set geometry and topology;
- Apply these to understand practical problems;
- Apply these to make better pattern recognition algorithms.

Ongoing Applications

- Simplifying class boundaries in MR spectra via feature reduction, with Richard Baumgartner, Ray Somorjai et al.
- Risk assessment of biometrical authentication, with Anil Jain et al.

“Data Complexity In Pattern Recognition”,

M.Basu, T.K. Ho (eds.),

Springer-Verlag, to appear, 2005

A collection of 15 papers about this subject
