Geometrical Complexity of Classification Problems - PowerPoint PPT Presentation

Geometrical Complexity of
1 / 49

  • Uploaded on
  • Presentation posted in: General

Geometrical Complexity of Classification Problems. Tin Kam Ho Computing Sciences Research Center Bell Labs, Lucent Technologies With contributions from Mitra Basu, Ester Bernado, Martin Law. What Is the Story in this Image?. Automatic Pattern Recognition. A fascinating theme.

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.

Download Presentationdownload

Geometrical Complexity of Classification Problems

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript

Geometrical complexity of classification problems

Geometrical Complexity of

Classification Problems

Tin Kam Ho

Computing Sciences Research Center

Bell Labs, Lucent Technologies

With contributions from Mitra Basu,

Ester Bernado, Martin Law

Geometrical complexity of classification problems

What Is the Story in this Image?

Automatic pattern recognition

Automatic Pattern Recognition

A fascinating theme.

What will the world be like if machines can do it?

  • Robots can see, read, hear, and smell.

  • We can get rid of all spam email, and find exactly what we need from the web.

  • We can protect everything with our signatures, fingerprints, iris patterns, or just our face.

  • We can track anybody by look, gesture, and gait.

  • We can be warned of disease outbreaks, terrorist threats.

  • Feed in our vital data and we will have perfect diagnosis.

  • We will know whom we can trust to lend our money.

Automatic pattern recognition1

Automatic Pattern Recognition

And more …

  • We can tell the true emotions of all our encounters!

  • We can predict stock prices!

  • We can identify all potential criminals!

  • Our weapons will never lose their targets!

  • We will be alerted about any extraordinary events going on in the heavens, on or inside the Earth!

  • We will have machines discovering all knowledge!

… But how far are we from these?

Automatic pattern recognition2

Automatic Pattern Recognition


Supervised learning

Unsupervised learning

Feature vectors

Feature vectors

Feature Extraction

Classifier Training



feature 2

Cluster Validation


Cluster Interpretation

class membership

feature 1

Supervised learning many automatically trainable classifiers

Supervised Learning:Many automatically trainable classifiers

  • Statistical Classifiers (for numerical vectors)

    • Bayesian classifiers,

    • polynomial discriminators,

    • nearest-neighbors,

    • decision trees & forests,

    • neural networks,

    • support vector machines,

    • ensemble methods, …


  • Syntactic Classifiers (for symbol strings)

  • Structural Classifiers (for graphs)

Feature 2

Feature 1

We have had some successes but

We have had some successes. But …

  • Why are we still far from perfect?

  • What is still missing in our techniques?

Geometrical complexity of classification problems

Large Variations in Accuracies of Different Classifiers



Will I have any luck for my problem?

Many new classifiers are in close rivalry with each other why

Many new classifiers are in close rivalry with each other. Why?

  • Do they represent the limit of our technology?

  • What have they added to the body of methods?

  • Is there still value in the older methods?

  • What is still missing in our techniques?

    When I face a new problem …

  • How much can automatic classifiers do?

  • How should I choose a classifier?

Sources of difficulty in classification

Sources of Difficultyin Classification

  • Class ambiguity

  • Boundary complexity

  • Sample size and dimensionality

Class ambiguity

Class Ambiguity

  • Is the concept intrinsically ambiguous?

  • Are the classes well defined?

  • What information do the features carry?

  • Are the features sufficient for discrimination?

Boundary complexity

Boundary Complexity

  • Kolmogorov complexity

  • Length can be exponential in dimensionality

  • Trivial description: list all points & class labels

  • Is there a shorter description?

Classification boundaries as decided by different classfiers

Classification Boundaries As Decided by Different Classfiers

Training samples for a 2D classification problem

feature 2

feature 1

Classification boundaries

Classification Boundaries

  • XCS (a genetic algorithm): hyperrectangles

Classification boundaries in feature space

feature 2

feature 1

Classification boundaries1

Classification Boundaries

  • XCS : hyperrectangles

Classification boundaries in feature space

feature 2

feature 1

Other classifiers other boundaries

Other Classifiers: Other Boundaries

  • Nearest neighbor : Voronoi cells

Classification boundaries in feature space

feature 2

feature 1

Other classifiers other boundaries1

Other Classifiers: Other Boundaries

  • Nearest neighbor : Voronoi cells

Classification boundaries in feature space

feature 2

feature 1

Other classifiers other boundaries2

Other Classifiers: Other Boundaries

  • Linear Classifier: hyperplanes

Classification boundaries in feature space

feature 2

feature 1

Match between classifiers and problems

Match between Classifiers and Problems

Problem A

Problem B











Sampling density

Sampling Density

Problem may appear deceptively simple or complex with small samples

2 points

10 points

100 points

500 points

1000 points

Measures of geometrical complexity of classification problems

Measures of Geometrical Complexity of Classification Problems

Our goal: tools and languages for studying

  • Characteristics of geometry & topology of high-dim data sets

  • How they change with feature transformations and sampling

  • How they interact with classifier geometry

    We want to know:

  • What are real-world problems like?

  • What is my problem like?

  • What can be expected of a method on a specific problem?

Building up the language

Building up the Language

  • Identify key features of data geometry that are relevant for classification

  • Develop algorithms to extract such features from a dataset

  • Describe patterns of classifier behavior in terms of geometrical features

  • … pattern recognition in pattern recognition problems

  • … classification of classification problems

Geometry of datasets and classifiers

Geometry of Datasets and Classifiers

  • Data sets:

    • length of class boundary

    • fragmentation of classes / existence of subclasses

    • global or local linear separability

    • convexity and smoothness of boundaries

    • intrinsic / extrinsic dimensionality

    • stability of these characteristics as sampling rate changes

  • Classifier models:

    • polygons, hyper-spheres, Gaussian kernels, axis-parallel hyper-planes, piece-wise linear surfaces, polynomial surfaces, their unions or intersections, …

Fisher s discriminant ratio

Fisher’s Discriminant Ratio

  • Defined for one feature:

means, variances of classes 1,2

  • One good feature makes a problem easy

  • Take maximum over all features

Linear separability

Linear Separability

  • Studies from early literature:

    • Perceptrons, Perceptron Cycling Theorem, 1962

    • Fractional Correction Rule, 1954

    • Widrow-Hoff Delta Rule, 1960

    • Ho-Kashyap algorithm, 1965

  • Many algorithms stop only with positive conclusions.

  • But this one works:

    • Linear programming, 1968

Length of class boundary

Length of Class Boundary

  • Friedman & Rafsky 1979

    • Find MST (minimum spanning tree) connecting all points regardless of class

    • Count edges joining opposite classes

    • Sensitive to separability

      and clustering effects

Shapes of class manifolds

Shapes of Class Manifolds

  • Lebourgeois & Emptoz 1996

  • Packing of same-class

    points in hyper-spheres

  • Thick and spherical,

    or thin and elongated manifolds

Measures of geometrical complexity

Measures of Geometrical Complexity

Data sets uci

Data Sets: UCI

  • UC-Irvine machine learning archive

  • 14 datasets (no missing values, > 500 pts)

  • 844 two-class problems

  • 452 linearly separable

  • 392 linearly nonseparable

  • 2 - 4648 points each

  • 8 - 480 dimensional feature spaces

Data sets random noise

Data Sets: Random Noise

  • Randomly located and labeled points

  • 100 artificial problems

  • 1 to 100 dimensional feature spaces

  • 2 classes, 1000 points per class

Geometrical complexity of classification problems

Space of Complexity Measures

Random noise

Linearly separable

Patterns in complexity measure space lin sep lin nonsep random

Patterns in Complexity Measure Spacelin.seplin.nonseprandom

Observations on distributions of problems

Observations on Distributions of Problems

  • Noise sets and linearly separable sets occupy opposite ends in many dimensions

  • In-between positions tell relative difficulty

  • Fan-like structure in most plots

  • At least 2 independent factors, joint effects

  • Noise sets are far from real data

  • Ranges of complexity value for the noise sets show effects of apparent complexity under the influence of sampling density

Geometrical complexity of classification problems



Continuum of Difficulty Spanned by Problems of the Same Family

  • Study specific application domains, alternative problem formulations

  • Guide feature transformations

  • Compare classifiers’ domain of competence

  • Help in choosing methods

  • Set expectations on recognition accuracy

Random labels

Domains of competence of classifiers



Complexity measure 2





Complexity measure 1

Domains of Competence of Classifiers

  • Given a classification problem,

    determine which classifier is the best for it

Here is my problem !

Domain of competence experiment

ensemble methods

ensemble methods

ensemble methods

Domain of Competence Experiment

  • Use a set of 9 complexity measures

    Boundary, Pretop, IntraInter, NonLinNN, NonLinLP,

    Fisher, MaxEff, VolumeOverlap, Npts/Ndim

  • Characterize 392 two-class problems from UCI data

    All shown to be linearly non-separable

  • Evaluate 6 classifiers

    NN (1-nearest neighbor)

    LP (linear classifier by linear programming)

    Odt (oblique decision tree)

    Pdfc (random subspace decision forest)

    Bdfc (bagging based decision forest)

    XCS (a genetic-algorithm based classifier)

Best classifier seen in projection to boundary nonlinnn

Best Classifier (Seen in Projection to Boundary-NonLinNN)

 Bdfc

+ LP



⃟ Odt

. Several

Best classifier seen in projection to intrainter pretop

Best Classifier (Seen in Projection to IntraInter-Pretop)

 Bdfc

+ LP



⃟ Odt

. Several

Best classifier seen in projection to maxeff volumeoverlap

Best Classifier (Seen in Projection to MaxEff-VolumeOverlap)

 Bdfc

+ LP



⃟ Odt

. Several

Worst classifier

Worst Classifier




 Bdfc

+ LP



⃟ Odt

. Several

Best classifier being nn lp odt vs an ensemble technique

Best Classifier Being nn,lp,odt vs an ensemble technique




 ensemble

+ nn,lp,odt

Summary of observations

Summary of Observations

  • 270 / 392 problems have a significantly best (dominant) classifier

  • 157 / 392 have a significant worst

  • 4 classifiers (NN, LP, Bdfc, XCS) are dominant for some problems

  • Almost all problems with a dominant classifier are best solved by either NN or LP

  • Simple methods (NN, LP) have extreme behavior:

    When conditions are favorable, they are optimal for the problem

  • Single decision trees (by Odt) have no domain of dominant competence: Significantly worst methods are almost always single decision trees (Odt), followed by LP and NN

  • Ensemble methods are in close rivalry for many problems;

    they are safe choices when boundary complexity is unknown

Extension to multiple classes

Extension to Multiple Classes

  • Fisher’s discriminant score  Mulitple discriminant scores

  • Boundary point in a MST: a point is a boundary point as long as it is next to a point from other classes in the MST

Intrinsic ambiguity

Intrinsic Ambiguity

  • The complexity measures can be severely affected when there exists intrinsic class ambiguity (or data mislabeling)

    • Example: FeatureOverlap (in 1D only)

  • Cannot distinguish between intrinsic ambiguity or complex class decision boundary

Tackling intrinsic ambiguity

Tackling Intrinsic Ambiguity

  • Compute the complexity measure at different error levels

    • f(D): a complexity measure on the data set D

    • D*: a “perturbed” version of D, so that some points are relabeled

    • h(D, D*): a distance measure between D and D* (error level)

    • The new complexity measure is defined as a curve:

    • The curve can be summarized by, say, area under curve

  • Minimization by greedy procedures

    • Discard erroneous points that decrease complexity by the most

Global vs local properties

Global vs. Local Properties

  • Boundaries can be simple locally but complex globally

    • These types of problems are relatively

      simple, but are characterized as

      complex by the measures

  • Solution: complexity measure at different scales

    • This can be combined with different error levels

  • Let Ni,k be the k neighbors of the i-th point defined by, say, Euclidean distance. The complexity measure for data set D, error level , evaluated at scale k is

Real problems have a mixture of difficulties

Real Problems Have a Mixture of Difficulties

E.g. Sparse Sample & Complex Geometry cause ill-posedness

Can’t tell which is better!

Requires further hypotheses on data geometry

Geometrical complexity of classification problems

To conclude:

  • We have some early success in using geometrical measures to characterize classification complexity, pointing to a potentially fruitful research direction.

    To continue:

  • More, better measures;

  • Detailed studies of their utilities, interactions, and estimation uncertainty;

  • Deeper understanding of constraints on these measures from point set geometry and topology;

  • Apply these to understand practical problems;

  • Apply these to make better pattern recognition algorithms.

Ongoing applications

Ongoing Applications

  • Simplifying class boundaries in MR spectra

    via feature reduction

    with Richard Baumgartner, Ray Somorjai et al.

  • Risk assessment of biometrical authentication

    with Anil Jain et al.

“Data Complexity In Pattern Recognition”,

M.Basu, T.K. Ho (eds.),

Springer-Verlag, to appear, 2005

A collection of 15 papers about this subject

  • Login