
Geometrical Complexity of Classification Problems

Tin Kam Ho

Bell Labs, Lucent Technologies

With contributions from

Mitra Basu, Ester Bernado, Martin Law

Automatic Pattern Recognition

A fascinating theme.

What will the world be like if machines can do it?

- Robots can see, read, hear, and smell.
- We can get rid of all spam email, and find exactly what we need from the web.
- We can protect everything with our signatures, fingerprints, iris patterns, or just our face.
- We can track anybody by look, gesture, and gait.
- We can be warned of disease outbreaks, terrorist threats.
- Feed in our vital data and we will have perfect diagnosis.
- We will know whom we can trust to lend our money.

Automatic Pattern Recognition

And more …

- We can tell the true emotions of all our encounters!
- We can predict stock prices!
- We can identify all potential criminals!
- Our weapons will never lose their targets!
- We will be alerted about any extraordinary events going on in the heavens, on or inside the Earth!
- We will have machines discovering all knowledge!

… But how far are we from these?

[Diagram: samples, described by features, are fed into a statistical classifier]

- Statistical Classifiers
- Bayesian classifiers
- polynomial discriminators
- nearest-neighbor methods
- decision trees & forests
- neural networks
- genetic algorithms
- support vector machines
- ensembles and classifier combination

- Why are machines still far from perfect?
- What is still missing in our techniques?

Large Variations in Accuracies of Different Classifiers

[Matrix plot: recognition accuracy varies widely by classifier and by application]

Will I have any luck for my problem?

Many classifiers are in close rivalry with each other. Why?

- Do they represent the limit of our technology?
- What do the new classifiers add to the methodology?
- Is there still value in the older methods?
- Have they used up all information contained in a data set?

When I face a new recognition task …

- How much can automatic classifiers do?
- How should I choose a classifier?
- Can I make the problem easier for a specific classifier?

Sources of Difficulty in Classification

- Class ambiguity
- Boundary complexity
- Sample size and dimensionality

Class Ambiguity

- Is the concept intrinsically ambiguous?
- Are the classes well defined?
- What information do the features carry?
- Are the features sufficient for discrimination?

Bayes error
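
For reference, the standard definition of the Bayes error (a textbook fact, not a formula from the slides): the lowest error rate any classifier can reach, set entirely by how much the classes overlap.

```latex
% Bayes error: the minimum achievable error rate, determined
% entirely by the overlap of the class-conditional distributions
E^{*} = \int \Bigl( 1 - \max_{c}\, P(c \mid \mathbf{x}) \Bigr)\, p(\mathbf{x})\, d\mathbf{x}
```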

Boundary Complexity

- Kolmogorov complexity: the length of the shortest description of the boundary
- This length can be exponential in the dimensionality
- Trivial description: list all points & their class labels
- Is there a shorter description?
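
To see why a shorter description matters, a back-of-envelope bit count (my own illustrative numbers, not from the slides): for N points in d dimensions with b-bit coordinates and C classes, the trivial description costs far more than, say, one separating hyperplane.

```latex
% Trivial description vs. one hyperplane (order-of-magnitude comparison only)
\underbrace{N \bigl( d\,b + \lceil \log_2 C \rceil \bigr)}_{\text{list every labeled point}}
\quad \text{vs.} \quad
\underbrace{(d+1)\,b}_{\text{one separating hyperplane}}
```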

Classification Boundaries As Decided by Different Classifiers

Training samples for a 2D classification problem

[Scatter plot of the training samples; axes: feature 1 and feature 2]

Classification Boundaries Inferred by Different Classifiers

- XCS: a genetic algorithm

- Nearest neighbor classifier

- Linear classifier

Match between Classifiers and Problems

[Figure: inferred boundaries on two problems. On Problem A, XCS (error = 0.6%) outperforms NN (0.7%); on Problem B, NN (error = 0.06%) outperforms XCS (1.9%).]

Measures of Geometrical Complexity of Classification Problems

Our approach: develop mathematical language and algorithmic tools for studying

- Characteristics of geometry & topology of high-dim data
- How they change with feature transformations, noise conditions, and sampling strategies
- How they interact with classifier geometry

Focus on descriptors computable from real data and relevant to classifier geometry

Geometry of Datasets and Classifiers

- Data sets:
- length of class boundary
- fragmentation of classes / existence of subclasses
- global or local linear separability
- convexity and smoothness of boundaries
- intrinsic / extrinsic dimensionality
- stability of these characteristics as sampling rate changes

- Classifier models:
- polygons, hyper-spheres, Gaussian kernels, axis-parallel hyper-planes, piece-wise linear surfaces, polynomial surfaces, their unions or intersections, …

Measures of Geometric Complexity

Degree of Linear Separability

- Find a separating hyper-plane by linear programming
- Error counts and distances to the plane measure separability

Fisher’s Discriminant Ratio

- Classical measure of class separability: f = (μ₁ − μ₂)² / (σ₁² + σ₂²)
- Maximize over all features to find the most discriminating
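
Below is a minimal sketch of both measures, assuming NumPy/SciPy, a two-class sample X of shape (n, d), and labels y in {−1, +1}; the function names are illustrative, not from any published implementation.

```python
import numpy as np
from scipy.optimize import linprog

def fisher_ratio(X, y):
    """Fisher's ratio (m1 - m2)^2 / (v1 + v2), maximized over features."""
    A, B = X[y == 1], X[y == -1]
    num = (A.mean(axis=0) - B.mean(axis=0)) ** 2
    den = A.var(axis=0) + B.var(axis=0)
    return float(np.max(num / (den + 1e-12)))  # guard against zero variance

def linear_separability(X, y):
    """Minimum total slack of y_i (w.x_i + b) >= 1 - s_i, s_i >= 0.

    Zero total slack (up to tolerance) means linearly separable; the
    slack magnitudes measure how far the sample is from separable.
    """
    n, d = X.shape
    c = np.concatenate([np.zeros(d + 1), np.ones(n)])    # minimize sum(s)
    A_ub = np.hstack([-y[:, None] * X, -y[:, None], -np.eye(n)])
    b_ub = -np.ones(n)                                   # -y(w.x+b) - s <= -1
    bounds = [(None, None)] * (d + 1) + [(0, None)] * n  # w, b free; s >= 0
    return linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds).fun
```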

Shapes of Class Manifolds

- Cover same-class points with maximal balls
- Ball counts describe the shape of the class manifold

Length of Class Boundary

- Compute a minimum spanning tree over all points
- Count the class-crossing edges
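
A sketch of the boundary-length measure, assuming SciPy and distinct points: build a Euclidean minimum spanning tree over the whole sample and count the edges that join opposite classes.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

def boundary_fraction(X, y):
    """Fraction of MST edges that connect points of different classes."""
    D = squareform(pdist(X))                # dense pairwise Euclidean distances
    mst = minimum_spanning_tree(D).tocoo()  # the n-1 edges of the MST
    crossing = np.sum(y[mst.row] != y[mst.col])
    return crossing / len(mst.data)
```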

Experiments with Controlled Data Sets

- Real-World Data Sets: benchmarking data from the UC-Irvine archive
  - 844 two-class problems
  - 452 linearly separable, 392 nonseparable
- Synthetic Data Sets: random labeling of randomly located points
  - 100 problems in 1–100 dimensions

Patterns in Complexity Measure Space

[Scatter plots of problems in the measure space; legend: lin. sep, lin. nonsep, random]

Problem Distribution in 1st & 2nd Principal Components of Complexity Space

Interpretation of the First 4 Principal Components

- 50% of variance: Linearity of boundary and proximity of opposite class neighbors
- 12% of variance: Balance between within-class scatter and between-class distance
- 11% of variance: Concentration & orientation of intrusion into opposite class
- 9% of variance: Within-class scatter

Problem Distribution in 1st & 2nd Principal Components of Complexity Space

- Continuous distribution
- Known easy & difficult problems occupy opposite ends
- Few outliers
- Empty regions

[Annotated plot: linearly separable problems at one end of the distribution, randomly labeled problems at the other]

Questions for the Theoretician

- Is the distribution necessarily continuous?
- What caused the outliers?
- Will the empty regions ever be filled?
- How are the complexity measures related?
- What is the intrinsic dimensionality of the distribution?

Questions for the Practitioner

- Where does a particular problem fit in this continuum?
- Can I use this to guide feature selection & transformation?
- How do I set expectations on recognition accuracy?
- Can I use this to help choose classifiers?

[Diagram: domains of competence of classifiers (LC, NN, XCS, Decision Forest) sketched in the plane of two complexity measures]

Domains of Competence of Classifiers

- Given a classification problem, determine which classifier is best for it

Here is my problem!


Domain of Competence Experiment

- Use a set of 9 complexity measures:
  Boundary, Pretop, IntraInter, NonLinNN, NonLinLP, Fisher, MaxEff, VolumeOverlap, Npts/Ndim
- Characterize 392 two-class problems from UCI data, all shown to be linearly nonseparable
- Evaluate 6 classifiers (selection loop sketched below):
  - NN (1-nearest neighbor)
  - LP (linear classifier trained by linear programming)
  - Odt (oblique decision tree)
  - Pdfc (random-subspace decision forest)
  - Bdfc (bagging-based decision forest)
  - XCS (a genetic-algorithm-based classifier)
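
A sketch of the experiment's outer loop, assuming scikit-learn classifiers as stand-ins for the six used in the study (XCS and the oblique tree have no scikit-learn counterpart, so the substitutions below are illustrative only):

```python
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Stand-ins: lp is approximated by a linear model, pdfc by a random forest.
CLASSIFIERS = {
    "nn":   KNeighborsClassifier(n_neighbors=1),
    "lp":   LogisticRegression(max_iter=1000),
    "odt":  DecisionTreeClassifier(),
    "pdfc": RandomForestClassifier(),
    "bdfc": BaggingClassifier(DecisionTreeClassifier()),
}

def best_classifier(X, y):
    """Name of the classifier with the highest cross-validated accuracy."""
    scores = {name: cross_val_score(clf, X, y).mean()
              for name, clf in CLASSIFIERS.items()}
    return max(scores, key=scores.get)
```

Each problem then becomes a point in the 9-measure space, tagged with its best classifier; the plots below show where each classifier's wins concentrate.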

Classifier Domains of Competence

Best Classifier for Benchmarking Data

[Scatter plots: for each problem, whether the best classifier is nn, lp, or odt versus an ensemble technique, shown in pairs of complexity measures: Boundary vs. NonLinNN, IntraInter vs. Pretop, MaxEff vs. VolumeOverlap]

Other Studies on Data Complexity

- Global vs. Local Properties
- Multi-Class Measures
- Intrinsic Ambiguity & Mislabeling
- Task Trajectory with Changing Sampling & Noise Conditions

[Plot: trajectory of a task in complexity space as k varies from 1 to 99]

Extension to Multiple Classes

- Fisher’s discriminant score → multiple discriminant scores
- Boundary points in an MST: a point is a boundary point if it is adjacent, in the MST, to a point from another class

Global vs. Local Properties

- Boundaries can be simple locally but complex globally
- Such problems are relatively simple, but the (global) measures characterize them as complex
- Solution: compute the complexity measures at different scales
- This can be combined with different error levels

- Let N_{i,k} be the k neighbors of the i-th point, defined by, say, Euclidean distance. The complexity measure for data set D at error level ε, evaluated at scale k, aggregates the measure over these local neighborhoods, e.g. f(D, ε, k) = (1/n) Σᵢ f(N_{i,k}, ε); one such local measure is sketched below.
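
One concrete local measure, as a sketch assuming scikit-learn: the average fraction of opposite-class points among each point's k nearest neighbors (an illustrative choice of f over the neighborhoods N_{i,k}, not the only one).

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def local_complexity(X, y, k):
    """Mean fraction of opposite-class points among each point's k neighbors."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nn.kneighbors(X)        # each point's own index comes first
    neighbor_labels = y[idx[:, 1:]]  # drop self, keep the k neighbors
    return float((neighbor_labels != y[:, None]).mean())
```

Sweeping k from small to large traces how the problem's apparent complexity changes from local to global scales.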

Intrinsic Ambiguity

- The complexity measures can be severely affected when there exists intrinsic class ambiguity (or data mislabeling)
- Example: FeatureOverlap (in 1D only)

- Cannot distinguish between intrinsic ambiguity and a genuinely complex class boundary

Tackling Intrinsic Ambiguity

- Compute the complexity measure at different error levels
- f(D): a complexity measure on the data set D
- D*: a “perturbed” version of D, in which some points are relabeled
- h(D, D*): a distance measure between D and D* (the error level)
- The new complexity measure is the curve c(ε) = min { f(D*) : h(D, D*) ≤ ε }
- The curve can be summarized by, say, the area under it

- Minimize by a greedy procedure, sketched below
- Discard (or relabel) the erroneous points whose removal decreases the complexity the most
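
A greedy sketch of the curve, under the assumption that f is any of the measures above (boundary_fraction, say) and labels are in {−1, +1}; each step flips the single label that lowers f the most.

```python
import numpy as np

def complexity_curve(X, y, f, max_flips):
    """f's value after 0, 1, ..., max_flips greedy relabelings (the curve)."""
    y = y.copy()
    curve = [f(X, y)]
    for _ in range(max_flips):
        best_i, best_val = None, curve[-1]
        for i in range(len(y)):      # trial-flip every label
            y[i] = -y[i]
            val = f(X, y)
            if val < best_val:
                best_i, best_val = i, val
            y[i] = -y[i]             # undo the trial flip
        if best_i is None:           # no flip helps; the curve flattens
            curve.append(curve[-1])
        else:
            y[best_i] = -y[best_i]   # commit the most helpful flip
            curve.append(best_val)
    return curve                     # summarize with, e.g., np.trapz(curve)
```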

Sampling Density

A problem may appear deceptively simple or complex with small samples

[Panels: the same problem sampled with 2, 10, 100, 500, and 1000 points]

Real Problems Have a Mixture of Difficulties

Sparse Sample & Complex Geometry cause ill-posedness

Can’t tell which is better!

Requires further hypotheses on data geometry

- We have some early success in using geometrical measures to characterize classification complexity, pointing to a potentially fruitful research area.

Future Directions:

- More, better measures;
- Detailed studies of their utilities, interactions, and estimation uncertainty;
- Deeper understanding of constraints on these measures from point set geometry and topology;
- Apply these to understand practical recognition tasks;
- Apply these to find transformations that simplify boundaries;
- Apply these to make better pattern recognition algorithms.

M. Basu, T. K. Ho (eds.), “Data Complexity in Pattern Recognition”, Springer-Verlag, in press.
