Geometrical Complexity of
1 / 37

Geometrical Complexity of Classification Problems - PowerPoint PPT Presentation

  • Uploaded on

Geometrical Complexity of Classification Problems. Tin Kam Ho Bell Labs, Lucent Technologies With contributions from Mitra Basu, Ester Bernado, Martin Law. What Is the Story in this Image?. Automatic Pattern Recognition. A fascinating theme.

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
Download Presentation

PowerPoint Slideshow about ' Geometrical Complexity of Classification Problems' - marina

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript

Geometrical Complexity of

Classification Problems

Tin Kam Ho

Bell Labs, Lucent Technologies

With contributions from

Mitra Basu, Ester Bernado, Martin Law

Automatic pattern recognition
Automatic Pattern Recognition

A fascinating theme.

What will the world be like if machines can do it?

  • Robots can see, read, hear, and smell.

  • We can get rid of all spam email, and findexactly what we need from the web.

  • We can protect everything with our signatures, fingerprints, iris patterns, or just our face.

  • We can track anybody by look, gesture, and gait.

  • We can be warned of disease outbreaks, terrorist threats.

  • Feed in our vital data and we will have perfect diagnosis.

  • We will know whom we can trustto lend our money.

Automatic pattern recognition1
Automatic Pattern Recognition

And more …

  • We can tell the true emotions of all our encounters!

  • We can predict stock prices!

  • We can identify all potential criminals!

  • Our weapons will never lose their targets!

  • We will be alerted about any extraordinary events going on in the heavens, on or inside the Earth!

  • We will have machines discovering all knowledge!

… But how far are we from these?

Automatic Pattern Recognition


  • Statistical Classifiers

    • Bayesian classifiers

    • polynomial discriminators

    • nearest-neighbor methods

    • decision trees & forests

    • neural networks

    • genetic algorithms

    • support vector machines

    • ensembles and classifier combination

  • Why are machines still far from perfect?

  • What is still missing in our techniques?


Large Variations in Accuracies of Different Classifiers



Will I have any luck for my problem?

Many classifiers are in close rivalry with each other why
Many classifiers are in close rivalry with each other. Why?

  • Do they represent the limit of our technology?

  • What do the new classifiers add to the methodology?

  • Is there still value in the older methods?

  • Have they used up all information contained in a data set?

    When I face a new recognition task …

  • How much can automatic classifiers do?

  • How should I choose a classifier?

  • Can I make the problem easier for a specific classifier?

Sources of difficulty in classification
Sources of Difficultyin Classification

  • Class ambiguity

  • Boundary complexity

  • Sample size and dimensionality

Class ambiguity
Class Ambiguity

  • Is the concept intrinsically ambiguous?

  • Are the classes well defined?

  • What information do the features carry?

  • Are the features sufficient for discrimination?

Bayes error

Boundary complexity
Boundary Complexity

  • Kolmogorov complexity

  • Length can be exponential in dimensionality

  • Trivial description: list all points & class labels

  • Is there a shorter description?

Classification boundaries as decided by different classfiers
Classification Boundaries As Decided by Different Classfiers

Training samples for a 2D classification problem

feature 2

feature 1

Classification Boundaries Inferred by Different Classfiers

  • XCS: a genetic algorithm

  • Nearest neighbor classifier

  • Linear classifier

Match between classifiers and problems
Match between Classifiers and Problems

Problem A

Problem B











Measures of geometrical complexity of classification problems
Measures of Geometrical Complexity of Classification Problems

Our approach: develop mathematical language and algorithmic tools for studying

  • Characteristics of geometry & topology of high-dim data

  • How they change with feature transformations, noise conditions, and sampling strategies

  • How they interact with classifier geometry

    Focus on descriptors computable from real data and relevant to classifier geometry

Geometry of datasets and classifiers
Geometry of Datasets and Classifiers

  • Data sets:

    • length of class boundary

    • fragmentation of classes / existence of subclasses

    • global or local linear separability

    • convexity and smoothness of boundaries

    • intrinsic / extrinsic dimensionality

    • stability of these characteristics as sampling rate changes

  • Classifier models:

    • polygons, hyper-spheres, Gaussian kernels, axis-parallel hyper-planes, piece-wise linear surfaces, polynomial surfaces, their unions or intersections, …

Measures of geometric complexity
Measures of Geometric Complexity

Degree of Linear Separability

Fisher’s Discriminant Ratio

  • Find separating hyper-plane by linear programming

  • Error counts and distances to plane measure separability

  • Classical measure of class separability

  • Maximize over all features to find the most discriminating

Shapes of Class Manifolds

Length of Class Boundary

  • Cover same-class pts with maximal balls

  • Ball counts describe shape of class manifold

  • Compute minimum spanning tree

  • Count class-crossing edges

Experiments with controlled data sets
Experiments with Controlled Data Sets

  • Real-World Data Sets:

    Benchmarking data from UC-Irvine archive

    844 two-class problems

    452 are linearly separable, 392 nonseparable

  • Synthetic Data Sets:

    Random labeling of randomly located points

    100 problems in 1-100 dimensions

Patterns in complexity measure space lin sep lin nonsep random
Patterns in Complexity Measure Spacelin.sep lin.nonsep random

Problem Distribution in 1st & 2nd Principal Components of Complexity Space

Interpretation of the first 4 prin comp
Interpretation of the First 4 Prin. Comp.

  • 50% of variance: Linearity of boundary and proximity of opposite class neighbors

  • 12% of variance: Balance between within-class scatter and between-class distance

  • 11% of variance: Concentration & orientation of intrusion into opposite class

  • 9% of variance: Within-class scatter

Problem Distribution in 1st & 2nd Principal Components of Complexity Space

  • Continuous distribution

  • Known easy & difficult problems occupy opposite ends

  • Few outliers

  • Empty regions

Linearly separable

Random labels

Questions for the Theoretician

  • Is the distribution necessarily continuous?

  • What caused the outliers?

  • Will the empty regions ever be filled?

  • How are the complexity measures related?

  • What is the intrinsic dimensionality of the distribution?

Questions for the Practitioner

  • Where does a particular problem fit in this continuum?

  • Can I use this to guide feature selection & transformation?

  • How do I set expectations on recognition accuracy?

  • Can I use this to help choosing classifiers?

Domains of competence of classifiers



Complexity measure 2





Complexity measure 1

Domains of Competence of Classifiers

  • Given a classification problem,

    determine which classifier is the best for it

Here is my problem !

Domain of competence experiment

ensemble methods

ensemble methods

ensemble methods

Domain of Competence Experiment

  • Use a set of 9 complexity measures

    Boundary, Pretop, IntraInter, NonLinNN, NonLinLP,

    Fisher, MaxEff, VolumeOverlap, Npts/Ndim

  • Characterize 392 two-class problems from UCI data

    All shown to be linearly non-separable

  • Evaluate 6 classifiers

    NN (1-nearest neighbor)

    LP (linear classifier by linear programming)

    Odt (oblique decision tree)

    Pdfc (random subspace decision forest)

    Bdfc (bagging based decision forest)

    XCS (a genetic-algorithm based classifier)

Classifier domains of competence
Classifier Domains of Competence

Best Classifier for Benchmarking Data

Best classifier being nn lp odt vs an ensemble technique
Best Classifier Being nn,lp,odt vs an ensemble technique




 ensemble

+ nn,lp,odt

Other studies on data complexity
Other Studies on Data Complexity

Global vs. Local Properties

Multi-Class Measures

Intrinsic Ambiguity & Mislabeling

Task Trajectory with Changing Sampling & Noise Conditions



Extension to multiple classes
Extension to Multiple Classes

  • Fisher’s discriminant score  Mulitple discriminant scores

  • Boundary point in a MST: a point is a boundary point as long as it is next to a point from other classes in the MST

Global vs local properties
Global vs. Local Properties

  • Boundaries can be simple locally but complex globally

    • These types of problems are relatively

      simple, but are characterized as

      complex by the measures

  • Solution: complexity measure at different scales

    • This can be combined with different error levels

  • Let Ni,k be the k neighbors of the i-th point defined by, say, Euclidean distance. The complexity measure for data set D, error level , evaluated at scale k is

Intrinsic ambiguity
Intrinsic Ambiguity

  • The complexity measures can be severely affected when there exists intrinsic class ambiguity (or data mislabeling)

    • Example: FeatureOverlap (in 1D only)

  • Cannot distinguish between intrinsic ambiguity or complex class decision boundary

Tackling intrinsic ambiguity
Tackling Intrinsic Ambiguity

  • Compute the complexity measure at different error levels

    • f(D): a complexity measure on the data set D

    • D*: a “perturbed” version of D, so that some points are relabeled

    • h(D, D*): a distance measure between D and D* (error level)

    • The new complexity measure is defined as a curve:

    • The curve can be summarized by, say, area under curve

  • Minimization by greedy procedures

    • Discard erroneous points that decrease complexity by the most

Sampling density
Sampling Density

Problem may appear deceptively simple or complex with small samples

2 points

10 points

100 points

500 points

1000 points

Real problems have a mixture of difficulties
Real Problems Have a Mixture of Difficulties

Sparse Sample & Complex Geometry cause ill-posedness

Can’t tell which is better!

Requires further hypotheses on data geometry

To Conclude:

  • We have some early success in using geometrical measures to characterize classification complexity, pointing to a potentially fruitful research area.

    Future Directions:

  • More, better measures;

  • Detailed studies of their utilities, interactions, and estimation uncertainty;

  • Deeper understanding of constraints on these measures from point set geometry and topology;

  • Apply these to understand practical recognition tasks;

  • Apply these to find transformations that simplify boundaries;

  • Apply these to make better pattern recognition algorithms.

“Data Complexity In Pattern Recognition”,

M.Basu, T.K. Ho (eds.),

Springer-Verlag, in press.