Geometrical Complexity of Classification Problems




Tin Kam Ho

Computing Sciences Research Center, Bell Labs, Lucent Technologies

With contributions from Mitra Basu, Ester Bernado, Martin Law

Automatic Pattern Recognition

A fascinating theme.

What will the world be like if machines can do it?

  • Robots can see, read, hear, and smell.
  • We can get rid of all spam email, and find exactly what we need from the web.
  • We can protect everything with our signatures, fingerprints, iris patterns, or just our face.
  • We can track anybody by look, gesture, and gait.
  • We can be warned of disease outbreaks, terrorist threats.
  • Feed in our vital data and we will have perfect diagnosis.
  • We will know whom we can trust to lend our money.
Automatic Pattern Recognition

And more …

  • We can tell the true emotions of all our encounters!
  • We can predict stock prices!
  • We can identify all potential criminals!
  • Our weapons will never lose their targets!
  • We will be alerted about any extraordinary events going on in the heavens, on or inside the Earth!
  • We will have machines discovering all knowledge!

… But how far are we from these?

Automatic Pattern Recognition

[Figure: block diagram with the labels Feature Extraction, Feature vectors, Supervised learning, Classifier Training, class membership, Unsupervised learning, Cluster Validation, Cluster Interpretation; scatter plot with axes feature 1 and feature 2.]

Supervised Learning: Many automatically trainable classifiers
  • Statistical Classifiers (for numerical vectors)
    • Bayesian classifiers,
    • polynomial discriminators,
    • nearest-neighbors,
    • decision trees & forests,
    • neural networks,
    • support vector machines,
    • ensemble methods, …


  • Syntactic Classifiers (for symbol strings)
  • Structural Classifiers (for graphs)

[Figure: scatter plot with axes Feature 1 and Feature 2.]

We have had some successes. But …
  • Why are we still far from perfect?
  • What is still missing in our techniques?

Large Variations in Accuracies of Different Classifiers



Will I have any luck for my problem?

Many new classifiers are in close rivalry with each other. Why?
  • Do they represent the limit of our technology?
  • What have they added to the body of methods?
  • Is there still value in the older methods?
  • What is still missing in our techniques?

When I face a new problem …

  • How much can automatic classifiers do?
  • How should I choose a classifier?
Sources of Difficulty in Classification
  • Class ambiguity
  • Boundary complexity
  • Sample size and dimensionality
Class Ambiguity
  • Is the concept intrinsically ambiguous?
  • Are the classes well defined?
  • What information do the features carry?
  • Are the features sufficient for discrimination?
Boundary Complexity
  • Kolmogorov complexity: the length of the shortest description of the class boundary
  • The description length can be exponential in dimensionality
  • Trivial description: list all points & class labels
  • Is there a shorter description?
Classification Boundaries As Decided by Different Classifiers

[Figure: training samples for a 2D classification problem; axes feature 1 and feature 2.]

Classification Boundaries
  • XCS (a genetic algorithm): hyperrectangles

[Figure: classification boundaries in feature space; axes feature 1 and feature 2.]


Other Classifiers: Other Boundaries
  • Nearest neighbor: Voronoi cells

[Figure: classification boundaries in feature space; axes feature 1 and feature 2.]


Other Classifiers: Other Boundaries
  • Linear classifier: hyperplanes

[Figure: classification boundaries in feature space; axes feature 1 and feature 2.]

Match between Classifiers and Problems

[Figure panels: Problem A, Problem B.]

Sampling Density

A problem may appear deceptively simple or complex with small samples

[Figure panels: 2 points, 10 points, 100 points, 500 points, 1000 points.]

Measures of Geometrical Complexity of Classification Problems

Our goal: tools and languages for studying

  • Characteristics of geometry & topology of high-dim data sets
  • How they change with feature transformations and sampling
  • How they interact with classifier geometry

We want to know:

  • What are real-world problems like?
  • What is my problem like?
  • What can be expected of a method on a specific problem?
Building up the Language
  • Identify key features of data geometry that are relevant for classification
  • Develop algorithms to extract such features from a dataset
  • Describe patterns of classifier behavior in terms of geometrical features
  • … pattern recognition in pattern recognition problems
  • … classification of classification problems
Geometry of Datasets and Classifiers
  • Data sets:
    • length of class boundary
    • fragmentation of classes / existence of subclasses
    • global or local linear separability
    • convexity and smoothness of boundaries
    • intrinsic / extrinsic dimensionality
    • stability of these characteristics as sampling rate changes
  • Classifier models:
    • polygons, hyper-spheres, Gaussian kernels, axis-parallel hyper-planes, piece-wise linear surfaces, polynomial surfaces, their unions or intersections, …
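Fisher’s Discriminant Ratio
  • Defined for one feature: $f = \frac{(\mu_1 - \mu_2)^2}{\sigma_1^2 + \sigma_2^2}$, where $\mu_1, \mu_2$ and $\sigma_1^2, \sigma_2^2$ are the means and variances of classes 1 and 2 for that feature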

  • One good feature makes a problem easy
  • Take maximum over all features
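
The per-feature ratio and its maximum over features can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names are hypothetical, and the use of the population variance (rather than the sample variance) is an assumption the slide does not settle.

```python
from statistics import fmean, pvariance


def fisher_ratio(values_1, values_2):
    """Fisher's discriminant ratio for one feature:
    (mu1 - mu2)^2 / (var1 + var2).
    Assumption: population variance is used."""
    mu1, mu2 = fmean(values_1), fmean(values_2)
    denom = pvariance(values_1) + pvariance(values_2)
    if denom == 0:
        # Both classes are constant on this feature.
        return float("inf") if mu1 != mu2 else 0.0
    return (mu1 - mu2) ** 2 / denom


def max_fisher_ratio(class_1, class_2):
    """Maximum of the per-feature ratio over all features.
    class_1 and class_2 are lists of equal-length feature vectors."""
    n_features = len(class_1[0])
    return max(
        fisher_ratio([x[j] for x in class_1], [x[j] for x in class_2])
        for j in range(n_features)
    )


# Feature 0 separates the classes well; feature 1 carries no information.
class_1 = [[0, 0], [2, 1]]
class_2 = [[10, 0], [12, 1]]
best = max_fisher_ratio(class_1, class_2)
```

Taking the maximum reflects the point that one good feature is enough to make a problem easy, however uninformative the other features are.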
Linear Separability
  • Studies from early literature:
    • Perceptrons, Perceptron Cycling Theorem, 1962
    • Fractional Correction Rule, 1954
    • Widrow-Hoff Delta Rule, 1960
    • Ho-Kashyap algorithm, 1965
  • Many algorithms stop only with a positive conclusion: they terminate when the classes are separable but cannot confirm that they are not.
  • But this one works:
    • Linear programming, 1968
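
One common way to pose the separability test as a linear program is sketched below. It is an illustrative formulation and not necessarily the 1968 one cited on the slide. The symbols are assumptions introduced here: $x_i$ are the feature vectors, $y_i \in \{-1, +1\}$ the class labels, $w$ and $b$ the hyperplane parameters, and $t_i$ slack variables.

```latex
\begin{aligned}
\min_{w,\,b,\,t}\quad & \sum_{i=1}^{n} t_i \\
\text{subject to}\quad & y_i\,(w^\top x_i + b) + t_i \ge 1, \qquad i = 1,\dots,n, \\
& t_i \ge 0, \qquad i = 1,\dots,n.
\end{aligned}
```

The program is always feasible (take $w = 0$, $b = 0$, $t_i = 1$) and bounded below by 0, so a solver always returns an answer: the two classes are linearly separable exactly when the optimal objective value is 0, and a positive optimum certifies nonseparability.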
Length of Class Boundary
  • Friedman & Rafsky 1979
    • Find MST (minimum spanning tree) connecting all points regardless of class
    • Count edges joining opposite classes
    • Sensitive to separability and clustering effects
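
The MST-based count can be sketched as follows. This is a minimal illustration under stated assumptions: Euclidean distance, a simple O(n²) version of Prim's algorithm, and hypothetical function names; it is not the implementation used in the study.

```python
import math


def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph.
    Returns the n - 1 tree edges as (i, j) index pairs."""
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best_dist = [math.inf] * n   # distance from each point to the tree
    best_from = [-1] * n         # tree vertex realizing that distance
    best_dist[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the point outside the tree that is closest to it.
        u = min((i for i in range(n) if not in_tree[i]),
                key=lambda i: best_dist[i])
        in_tree[u] = True
        if best_from[u] >= 0:
            edges.append((best_from[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best_dist[v]:
                    best_dist[v] = d
                    best_from[v] = u
    return edges


def boundary_stats(points, labels):
    """Count MST edges joining different classes, and the fraction of
    points that touch at least one such edge."""
    cross = [(i, j) for i, j in mst_edges(points) if labels[i] != labels[j]]
    boundary_points = {i for edge in cross for i in edge}
    return len(cross), len(boundary_points) / len(points)


points = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0)]
separated = boundary_stats(points, [0, 0, 0, 1, 1, 1])
interleaved = boundary_stats(points, [0, 1, 0, 1, 0, 1])
```

Two well-separated clusters are joined by a single cross-class edge, while interleaved labels make every tree edge a boundary edge. Because the test is only label inequality, the same count applies unchanged to more than two classes.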

Shapes of Class Manifolds
  • Lebourgeois & Emptoz 1996
  • Packing of same-class points in hyper-spheres
  • Thick and spherical, or thin and elongated manifolds


Data Sets: UCI
  • UC-Irvine machine learning archive
  • 14 datasets (no missing values, > 500 pts)
  • 844 two-class problems
  • 452 linearly separable
  • 392 linearly nonseparable
  • 2 to 4648 points each
  • 8- to 480-dimensional feature spaces
Data Sets: Random Noise
  • Randomly located and labeled points
  • 100 artificial problems
  • 1- to 100-dimensional feature spaces
  • 2 classes, 1000 points per class
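
A noise problem of this kind could be generated roughly as follows. The slide does not say how the points were located, so sampling uniformly from the unit hypercube is an assumption, as are the function name and the `seed` parameter.

```python
import random


def random_noise_problem(n_dims, points_per_class=1000, seed=None):
    """Two-class problem with randomly located, randomly labeled points.
    Assumption: locations are uniform in the unit hypercube."""
    rng = random.Random(seed)
    n_points = 2 * points_per_class
    points = [[rng.random() for _ in range(n_dims)] for _ in range(n_points)]
    # Balanced labels, shuffled so they carry no information about location.
    labels = [0] * points_per_class + [1] * points_per_class
    rng.shuffle(labels)
    return points, labels


# One problem per dimensionality from 1 to 100 gives 100 artificial problems.
# A small class size is used here to keep the example fast.
problems = [random_noise_problem(d, points_per_class=10, seed=d)
            for d in range(1, 101)]
```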

Space of Complexity Measures

[Figure labels: Random noise, Linearly separable.]

Patterns in Complexity Measure Space

[Figure legend: lin. sep., lin. nonsep., random.]
Observations on Distributions of Problems
  • Noise sets and linearly separable sets occupy opposite ends in many dimensions
  • In-between positions tell relative difficulty
  • Fan-like structure in most plots
  • At least 2 independent factors, joint effects
  • Noise sets are far from real data
  • The range of complexity values across the noise sets shows how sampling density affects apparent complexity



Continuum of Difficulty Spanned by Problems of the Same Family

  • Study specific application domains, alternative problem formulations
  • Guide feature transformations
  • Compare classifiers’ domain of competence
  • Help in choosing methods
  • Set expectations on recognition accuracy

[Figure label: Random labels]

Domains of Competence of Classifiers
  • Given a classification problem, determine which classifier is the best for it

[Figure: a problem ("Here is my problem!") placed in a plane with axes Complexity measure 1 and Complexity measure 2.]

[Figure labels: ensemble methods]

Domain of Competence Experiment
  • Use a set of 9 complexity measures:
    Boundary, Pretop, IntraInter, NonLinNN, NonLinLP, Fisher, MaxEff, VolumeOverlap, Npts/Ndim
  • Characterize 392 two-class problems from UCI data, all shown to be linearly nonseparable
  • Evaluate 6 classifiers:
    • NN (1-nearest neighbor)
    • LP (linear classifier by linear programming)
    • Odt (oblique decision tree)
    • Pdfc (random subspace decision forest)
    • Bdfc (bagging based decision forest)
    • XCS (a genetic-algorithm based classifier)

Worst Classifier

[Figure legend: Bdfc; + LP; ◇ Odt; . Several]

Best Classifier Being NN, LP, or Odt vs. an Ensemble Technique

[Figure legend: ensemble; + NN, LP, Odt]

Summary of Observations
  • 270 / 392 problems have a significantly best (dominant) classifier
  • 157 / 392 have a significantly worst classifier
  • 4 classifiers (NN, LP, Bdfc, XCS) are dominant for some problems
  • Almost all problems with a dominant classifier are best solved by either NN or LP
  • Simple methods (NN, LP) have extreme behavior: when conditions are favorable, they are optimal for the problem
  • Single decision trees (by Odt) have no domain of dominant competence: the significantly worst method is almost always a single decision tree (Odt), followed by LP and NN
  • Ensemble methods are in close rivalry for many problems; they are safe choices when boundary complexity is unknown

Extension to Multiple Classes
  • Fisher’s discriminant score → multiple discriminant scores
  • Boundary point in an MST: a point is a boundary point as long as it is next to a point from another class in the MST
Intrinsic Ambiguity
  • The complexity measures can be severely affected when there exists intrinsic class ambiguity (or data mislabeling)
    • Example: FeatureOverlap (in 1D only)
  • The measures cannot distinguish between intrinsic ambiguity and a complex class decision boundary
Tackling Intrinsic Ambiguity
  • Compute the complexity measure at different error levels
    • f(D): a complexity measure on the data set D
    • D*: a “perturbed” version of D, in which some points are relabeled
    • h(D, D*): a distance measure between D and D* (the error level)
    • The new complexity measure is defined as a curve: [equation missing from the transcript]
    • The curve can be summarized by, say, the area under the curve
  • Minimization by greedy procedures
    • Discard the erroneous points whose removal decreases complexity the most
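
A greedy procedure of this kind might look like the sketch below. Several choices are assumptions made only for illustration: the stand-in measure `nn_disagreement` plays the role of f(D), points are discarded rather than relabeled, the error level is taken to be the fraction of points discarded, and the curve is summarized with the trapezoid rule. All function names are hypothetical.

```python
import math


def nn_disagreement(points, labels):
    """Stand-in for f(D): fraction of points whose nearest neighbor
    carries a different label."""
    n = len(points)
    if n < 2:
        return 0.0
    wrong = 0
    for i in range(n):
        j = min((k for k in range(n) if k != i),
                key=lambda k: math.dist(points[i], points[k]))
        wrong += labels[i] != labels[j]
    return wrong / n


def greedy_complexity_curve(points, labels, f, max_discard):
    """Greedily discard the point whose removal lowers f the most and
    record (fraction discarded, complexity) after every step."""
    pts, labs = list(points), list(labels)
    n = len(pts)
    curve = [(0.0, f(pts, labs))]
    for step in range(1, max_discard + 1):
        best_i = min(
            range(len(pts)),
            key=lambda i: f(pts[:i] + pts[i + 1:], labs[:i] + labs[i + 1:]),
        )
        del pts[best_i], labs[best_i]
        curve.append((step / n, f(pts, labs)))
    return curve


def area_under_curve(curve):
    """Trapezoid-rule summary of the complexity curve."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(curve, curve[1:]))


# Two clean clusters plus one point of class 1 sitting inside class 0.
points = [(0, 0), (1, 0), (2.5, 0), (10, 0), (11, 0), (12.5, 0), (1.2, 0)]
labels = [0, 0, 0, 1, 1, 1, 1]
curve = greedy_complexity_curve(points, labels, nn_disagreement, max_discard=1)
auc = area_under_curve(curve)
```

In this toy data a single discarded point takes the measure from 3/7 to 0, the signature of a mislabeled or ambiguous point rather than a complex boundary. A genuinely complex boundary would be expected to give a curve that falls more slowly.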
Global vs. Local Properties
  • Boundaries can be simple locally but complex globally
    • These types of problems are relatively simple, but are characterized as complex by the measures
  • Solution: complexity measure at different scales
    • This can be combined with different error levels
  • Let $N_{i,k}$ be the k neighbors of the i-th point defined by, say, Euclidean distance. The complexity measure for data set D, at a given error level and evaluated at scale k, is: [equation missing from the transcript]
Real Problems Have a Mixture of Difficulties

E.g., sparse samples and complex geometry cause ill-posedness

Can’t tell which is better!

Requires further hypotheses on data geometry

To conclude:
  • We have some early success in using geometrical measures to characterize classification complexity, pointing to a potentially fruitful research direction.

To continue:

  • More, better measures;
  • Detailed studies of their utilities, interactions, and estimation uncertainty;
  • Deeper understanding of constraints on these measures from point set geometry and topology;
  • Apply these to understand practical problems;
  • Apply these to make better pattern recognition algorithms.
Ongoing Applications
  • Simplifying class boundaries in MR spectra via feature reduction, with Richard Baumgartner, Ray Somorjai et al.
  • Risk assessment of biometrical authentication, with Anil Jain et al.

“Data Complexity In Pattern Recognition”, M. Basu, T.K. Ho (eds.), Springer-Verlag, to appear, 2005. A collection of 15 papers about this subject.