
Geometrical Complexity of

Classification Problems

Tin Kam Ho

Computing Sciences Research Center

Bell Labs, Lucent Technologies

With contributions from Mitra Basu,

Ester Bernado, Martin Law



Automatic Pattern Recognition

A fascinating theme.

What will the world be like if machines can do it?

  • Robots can see, read, hear, and smell.

  • We can get rid of all spam email, and find exactly what we need from the web.

  • We can protect everything with our signatures, fingerprints, iris patterns, or just our face.

  • We can track anybody by look, gesture, and gait.

  • We can be warned of disease outbreaks, terrorist threats.

  • Feed in our vital data and we will have perfect diagnosis.

  • We will know to whom we can safely lend our money.


Automatic Pattern Recognition

And more …

  • We can tell the true emotions of all our encounters!

  • We can predict stock prices!

  • We can identify all potential criminals!

  • Our weapons will never lose their targets!

  • We will be alerted about any extraordinary events going on in the heavens, on or inside the Earth!

  • We will have machines discovering all knowledge!

… But how far are we from these?


Automatic Pattern Recognition

[Workflow diagram] Samples → Feature Extraction → feature vectors, which feed either path:
Supervised learning: Classifier Training → classifier → Classification → class membership
Unsupervised learning: Clustering → Cluster Validation → Cluster Interpretation
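To make the supervised path concrete, here is a minimal sketch; scikit-learn and the toy numbers are assumptions for illustration, not something the slide prescribes.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Assume feature extraction has already produced a feature matrix X
# (n_samples x n_features) and class labels y for the training samples.
X_train = np.array([[0.2, 1.1], [0.4, 0.9], [2.1, 3.0], [2.3, 2.8]])
y_train = np.array([0, 0, 1, 1])

clf = KNeighborsClassifier(n_neighbors=1)   # classifier training
clf.fit(X_train, y_train)

X_new = np.array([[0.3, 1.0], [2.2, 2.9]])  # new feature vectors
print(clf.predict(X_new))                   # classification -> class membership
```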


Supervised Learning: Many Automatically Trainable Classifiers

  • Statistical Classifiers (for numerical vectors)

    • Bayesian classifiers,

    • polynomial discriminators,

    • nearest-neighbors,

    • decision trees & forests,

    • neural networks,

    • support vector machines,

    • ensemble methods, …

      Also:

  • Syntactic Classifiers (for symbol strings)

  • Structural Classifiers (for graphs)



We have had some successes. But …

  • Why are we still far from perfect?

  • What is still missing in our techniques?


Large Variations in Accuracies of Different Classifiers

[Chart: recognition accuracy for each classifier on each application]

Will I have any luck for my problem?


Many new classifiers are in close rivalry with each other. Why?

  • Do they represent the limit of our technology?

  • What have they added to the body of methods?

  • Is there still value in the older methods?

  • What is still missing in our techniques?

    When I face a new problem …

  • How much can automatic classifiers do?

  • How should I choose a classifier?


Sources of Difficulty in Classification

  • Class ambiguity

  • Boundary complexity

  • Sample size and dimensionality


Class Ambiguity

  • Is the concept intrinsically ambiguous?

  • Are the classes well defined?

  • What information do the features carry?

  • Are the features sufficient for discrimination?


Boundary Complexity

  • Kolmogorov complexity

  • Description length can be exponential in dimensionality

  • Trivial description: list all points & class labels

  • Is there a shorter description?
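As a rough, assumption-laden illustration (not from the slide): with d binary features, the trivial description enumerates all n labeled points, while an arbitrary boundary may need one class label for every possible point.

```latex
% Illustrative description lengths, assuming d binary features and n samples
L_{\text{trivial}} \approx n\,(d + 1) \ \text{values (each point plus its label)},
\qquad
L_{\text{arbitrary boundary}} = O(2^{d}) \ \text{bits (one label per cell of the cube)}
```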


Classification Boundaries as Decided by Different Classifiers

Training samples for a 2D classification problem

[Scatter plot of the training samples in the feature-1 × feature-2 plane]

Classification Boundaries

  • XCS (a genetic algorithm): hyperrectangles

[Plot: classification boundaries in the feature space (feature 1 vs. feature 2)]


Other Classifiers: Other Boundaries

  • Nearest neighbor: Voronoi cells

[Plot: classification boundaries in the feature space (feature 1 vs. feature 2)]


Other Classifiers: Other Boundaries

  • Linear classifier: hyperplanes

[Plot: classification boundaries in the feature space (feature 1 vs. feature 2)]
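To see such boundary geometries on your own 2D data, here is a small sketch using scikit-learn stand-ins (1-nearest-neighbor and a linear model; XCS is not a standard library component), so treat it as illustrative only.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

def boundary_map(clf, X, y, steps=200):
    """Fit a classifier on 2D data and return its predicted label over a grid,
    which exposes the geometry of its decision boundary."""
    clf.fit(X, y)
    x1 = np.linspace(X[:, 0].min(), X[:, 0].max(), steps)
    x2 = np.linspace(X[:, 1].min(), X[:, 1].max(), steps)
    g1, g2 = np.meshgrid(x1, x2)
    labels = clf.predict(np.c_[g1.ravel(), g2.ravel()])
    return labels.reshape(g1.shape)

# 1-NN yields piecewise-linear Voronoi-cell boundaries;
# a linear model yields a single hyperplane (a line in 2D):
# nn_map  = boundary_map(KNeighborsClassifier(n_neighbors=1), X, y)
# lin_map = boundary_map(LogisticRegression(), X, y)
```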


Match between Classifiers and Problems

[Figure: Problem A and Problem B, each classified by XCS and by NN (reported errors: 0.6%, 0.06%, 0.7%, 1.9%); a different classifier is the better one on each problem]


Sampling Density

A problem may appear deceptively simple or complex when the sample is small

[Figure: the same problem sampled with 2, 10, 100, 500, and 1000 points]


Measures of Geometrical Complexity of Classification Problems

Our goal: tools and languages for studying

  • Characteristics of geometry & topology of high-dim data sets

  • How they change with feature transformations and sampling

  • How they interact with classifier geometry

    We want to know:

  • What are real-world problems like?

  • What is my problem like?

  • What can be expected of a method on a specific problem?


Building up the Language

  • Identify key features of data geometry that are relevant for classification

  • Develop algorithms to extract such features from a dataset

  • Describe patterns of classifier behavior in terms of geometrical features

  • … pattern recognition in pattern recognition problems

  • … classification of classification problems


Geometry of Datasets and Classifiers

  • Data sets:

    • length of class boundary

    • fragmentation of classes / existence of subclasses

    • global or local linear separability

    • convexity and smoothness of boundaries

    • intrinsic / extrinsic dimensionality

    • stability of these characteristics as sampling rate changes

  • Classifier models:

    • polygons, hyper-spheres, Gaussian kernels, axis-parallel hyper-planes, piece-wise linear surfaces, polynomial surfaces, their unions or intersections, …


Fisher’s Discriminant Ratio

  • Defined for one feature:

    f = (μ1 − μ2)² / (σ1² + σ2²), where μ1, μ2 and σ1², σ2² are the means and variances of classes 1 and 2

  • One good feature makes a problem easy

  • Take maximum over all features
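A minimal sketch of this computation for a two-class numeric data set (function and variable names are illustrative):

```python
import numpy as np

def max_fisher_ratio(X, y):
    """Maximum Fisher discriminant ratio over all features.
    X: (n_samples, n_features) array; y: labels of a two-class problem."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    c0, c1 = np.unique(y)                    # exactly two classes assumed
    X0, X1 = X[y == c0], X[y == c1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    v0, v1 = X0.var(axis=0), X1.var(axis=0)
    f = (m0 - m1) ** 2 / (v0 + v1 + 1e-12)   # per-feature ratio; guard /0
    return float(f.max())                    # one good feature -> easy problem
```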


Linear Separability

  • Studies from early literature:

    • Perceptrons, Perceptron Cycling Theorem, 1962

    • Fractional Correction Rule, 1954

    • Widrow-Hoff Delta Rule, 1960

    • Ho-Kashyap algorithm, 1965

  • Many of these algorithms terminate only with a positive conclusion (i.e., when the classes are separable).

  • But this one works:

    • Linear programming, 1968
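A feasibility-LP sketch of the idea, assuming SciPy and labels coded as +1/−1; it is illustrative, not necessarily the exact 1968 formulation:

```python
import numpy as np
from scipy.optimize import linprog

def is_linearly_separable(X, y):
    """Look for a hyperplane (w, b) with y_i * (w . x_i + b) >= 1 for all i.
    X: (n, d) array of features; y: labels coded as +1 / -1.
    Returns True if such a hyperplane exists (classes are linearly separable)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, d = X.shape
    # Each sample gives the constraint  -y_i * (w . x_i + b) <= -1,
    # with decision variables [w_1 ... w_d, b] and a zero objective.
    A_ub = -y[:, None] * np.hstack([X, np.ones((n, 1))])
    b_ub = -np.ones(n)
    res = linprog(c=np.zeros(d + 1), A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * (d + 1), method="highs")
    return bool(res.success)
```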


Length of Class Boundary

  • Friedman & Rafsky 1979

    • Find MST (minimum spanning tree) connecting all points regardless of class

    • Count edges joining opposite classes

    • Sensitive to separability and clustering effects
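A compact sketch of this measure, assuming NumPy/SciPy and a modest data set size (the brute-force distance matrix is O(n²)):

```python
import numpy as np
from scipy.spatial import distance_matrix
from scipy.sparse.csgraph import minimum_spanning_tree

def boundary_edge_fraction(X, y):
    """Fraction of MST edges joining points of opposite classes
    (higher values indicate a longer, more tangled class boundary).
    Assumes no duplicate points: SciPy treats zero distances as missing edges."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    D = distance_matrix(X, X)                 # all pairwise distances
    mst = minimum_spanning_tree(D).tocoo()    # n - 1 edges, labels ignored
    crossing = np.sum(y[mst.row] != y[mst.col])
    return crossing / len(mst.row)
```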


Shapes of Class Manifolds

  • Lebourgeois & Emptoz 1996

  • Packing of same-class points in hyper-spheres

  • Thick and spherical, or thin and elongated manifolds
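The slide only names the idea; the following is a loose sketch of one way such a packing could be computed (my own reading, not the procedure from Lebourgeois & Emptoz 1996), assuming NumPy/SciPy and a numeric data set with at least two classes.

```python
import numpy as np
from scipy.spatial import distance_matrix

def packing_summary(X, y):
    """Around each point, take the largest ball whose radius is the distance
    to the nearest point of another class; then greedily keep balls (largest
    first) until every point is covered by a same-class ball.  Few large balls
    suggest thick, compact class manifolds; many small balls suggest thin,
    elongated ones."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    D = distance_matrix(X, X)
    n = len(y)
    radii = np.array([D[i, y != y[i]].min() for i in range(n)])
    kept, uncovered = [], set(range(n))
    while uncovered:
        i = max(uncovered, key=lambda j: radii[j])   # largest remaining ball
        kept.append(i)
        uncovered -= {j for j in uncovered
                      if y[j] == y[i] and D[i, j] <= radii[i]}
    return len(kept), float(radii[kept].mean())
```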



Data Sets: UCI

  • UC-Irvine machine learning archive

  • 14 datasets (no missing values, > 500 pts)

  • 844 two-class problems

  • 452 linearly separable

  • 392 linearly nonseparable

  • 2 to 4648 points each

  • 8- to 480-dimensional feature spaces


Data Sets: Random Noise

  • Randomly located and labeled points

  • 100 artificial problems

  • 1 to 100 dimensional feature spaces

  • 2 classes, 1000 points per class


Space of Complexity Measures

[Scatter plots of problems in complexity-measure space; labels: random noise, linearly separable]


Patterns in Complexity Measure Space

[Matrix of pairwise scatter plots; legend: linearly separable, linearly nonseparable, random]


Observations on Distributions of Problems

  • Noise sets and linearly separable sets occupy opposite ends in many dimensions

  • In-between positions tell relative difficulty

  • Fan-like structure in most plots

  • At least 2 independent factors, joint effects

  • Noise sets are far from real data

  • The range of complexity values for the noise sets shows how apparent complexity is affected by sampling density


Continuum of Difficulty Spanned by Problems of the Same Family

[Figure: a family of related problems annotated from k=1 to k=99, plus randomly labeled data, spanning a continuum of difficulty]

  • Study specific application domains, alternative problem formulations

  • Guide feature transformations

  • Compare classifiers’ domain of competence

  • Help in choosing methods

  • Set expectations on recognition accuracy



Domains of Competence of Classifiers

  • Given a classification problem, determine which classifier is the best for it

[Schematic: the space of complexity measures (complexity measure 1 vs. complexity measure 2) partitioned into regions where LC, XCS, Decision Forest, or NN is best; a new problem ("Here is my problem!") lands at an unknown point, marked "?"]


Domain of Competence Experiment

  • Use a set of 9 complexity measures

    Boundary, Pretop, IntraInter, NonLinNN, NonLinLP,

    Fisher, MaxEff, VolumeOverlap, Npts/Ndim

  • Characterize 392 two-class problems from UCI data

    All shown to be linearly non-separable

  • Evaluate 6 classifiers

    NN (1-nearest neighbor)

    LP (linear classifier by linear programming)

    Odt (oblique decision tree)

    Pdfc (random subspace decision forest)

    Bdfc (bagging based decision forest)

    XCS (a genetic-algorithm based classifier)
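For flavor, a hedged sketch of the per-problem comparison; the models are common scikit-learn stand-ins rather than the study's NN, LP, Odt, Pdfc, Bdfc, and XCS implementations, and the significance testing used to declare a dominant classifier is omitted.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

def best_classifier(X, y, cv=5):
    """Score a few stand-in classifiers on one problem by cross-validation
    and report the most accurate one."""
    candidates = {
        "NN":     KNeighborsClassifier(n_neighbors=1),       # 1-nearest neighbor
        "linear": LogisticRegression(max_iter=1000),          # linear stand-in for LP
        "forest": RandomForestClassifier(n_estimators=100),   # decision-forest stand-in
    }
    scores = {name: cross_val_score(clf, X, y, cv=cv).mean()
              for name, clf in candidates.items()}
    return max(scores, key=scores.get), scores
```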


Best Classifier (Seen in Projection to Boundary-NonLinNN)

[Scatter plot of problems; marker legend: Bdfc, + LP, X NN, O XCS, ◇ Odt, . several (no single best)]


Best Classifier (Seen in Projection to IntraInter-Pretop)

[Scatter plot of problems; marker legend: Bdfc, + LP, X NN, O XCS, ◇ Odt, . several (no single best)]


Best Classifier (Seen in Projection to MaxEff-VolumeOverlap)

[Scatter plot of problems; marker legend: Bdfc, + LP, X NN, O XCS, ◇ Odt, . several (no single best)]


Worst Classifier

[Three scatter plots in the projections Boundary-NonLinNN, IntraInter-Pretop, and MaxEff-VolumeOverlap; marker legend: Bdfc, + LP, X NN, O XCS, ◇ Odt, . several]


Best Classifier: NN, LP, Odt vs. an Ensemble Technique

[Three scatter plots in the projections Boundary-NonLinNN, IntraInter-Pretop, and MaxEff-VolumeOverlap; marker legend: ensemble, + nn,lp,odt]


Summary of Observations

  • 270 / 392 problems have a significantly best (dominant) classifier

  • 157 / 392 have a significantly worst classifier

  • 4 classifiers (NN, LP, Bdfc, XCS) are dominant for some problems

  • Almost all problems with a dominant classifier are best solved by either NN or LP

  • Simple methods (NN, LP) have extreme behavior:

    When conditions are favorable, they are optimal for the problem

  • Single decision trees (Odt) have no domain of dominant competence; the significantly worst method is almost always a single decision tree, followed by LP and NN

  • Ensemble methods are in close rivalry for many problems;

    they are safe choices when boundary complexity is unknown


Extension to Multiple Classes

  • Fisher’s discriminant score → multiple discriminant scores

  • Boundary point in an MST: a point is a boundary point if it is adjacent to a point from another class in the MST


Intrinsic Ambiguity

  • The complexity measures can be severely affected when there exists intrinsic class ambiguity (or data mislabeling)

    • Example: FeatureOverlap (in 1D only)

  • The measures cannot distinguish between intrinsic ambiguity and a genuinely complex class boundary


Tackling Intrinsic Ambiguity

  • Compute the complexity measure at different error levels

    • f(D): a complexity measure on the data set D

    • D*: a “perturbed” version of D, so that some points are relabeled

    • h(D, D*): a distance measure between D and D* (error level)

    • The new complexity measure is defined as a curve:

    • The curve can be summarized by, say, area under curve

  • Minimization by greedy procedures

    • At each step, discard (or relabel) the suspect point whose change decreases the complexity the most
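The curve itself is not spelled out in the transcript; one plausible reading is c(ε) = min { f(D*) : h(D, D*) ≤ ε }, i.e., the lowest complexity reachable by relabeling at most an ε fraction of points. The greedy sketch below is a reconstruction of that idea (assuming binary 0/1 labels and a user-supplied complexity function f(X, y)), not the slide's exact algorithm.

```python
def complexity_curve(X, y, f, max_flips):
    """Greedy approximation: repeatedly flip the single 0/1 label whose change
    lowers the complexity measure f(X, y) the most, recording f after each flip."""
    y = list(y)                          # work on a copy of the labels
    curve = [f(X, y)]                    # error level 0: original labels
    flipped = set()
    for _ in range(max_flips):
        best_val, best_i = curve[-1], None
        for i in range(len(y)):
            if i in flipped:
                continue
            y[i] ^= 1                    # tentatively flip label i
            val = f(X, y)
            y[i] ^= 1                    # undo the flip
            if val < best_val:
                best_val, best_i = val, i
        if best_i is None:               # no single flip lowers the measure
            break
        y[best_i] ^= 1                   # commit the best flip
        flipped.add(best_i)
        curve.append(best_val)
    return curve                         # summarize by, e.g., area under curve
```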


Global vs. Local Properties

  • Boundaries can be simple locally but complex globally

    • These types of problems are relatively simple, but are characterized as complex by the measures

  • Solution: complexity measure at different scales

    • This can be combined with different error levels

  • Let N_{i,k} be the set of k nearest neighbors of the i-th point, defined by, say, Euclidean distance. The complexity measure for data set D, error level ε, evaluated at scale k is:
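The formula itself did not survive the transcription; a plausible form (an assumption, not verbatim from the slide) averages the error-level-ε measure of the previous slide over the k-nearest-neighbor patches:

```latex
% Plausible reconstruction: average the error-level-\epsilon complexity
% over the local neighborhoods N_{i,k}.
f_{k,\epsilon}(D) \;=\; \frac{1}{n} \sum_{i=1}^{n} f_{\epsilon}\bigl(N_{i,k}\bigr),
\qquad
f_{\epsilon}(D) \;=\; \min_{D^{*}:\, h(D, D^{*}) \le \epsilon} f(D^{*})
```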


Real Problems Have a Mixture of Difficulties

E.g., sparse samples combined with complex geometry cause ill-posedness

Can’t tell which is better!

Requires further hypotheses on data geometry


To conclude:

  • We have some early success in using geometrical measures to characterize classification complexity, pointing to a potentially fruitful research direction.

    To continue:

  • More, better measures;

  • Detailed studies of their utilities, interactions, and estimation uncertainty;

  • Deeper understanding of constraints on these measures from point set geometry and topology;

  • Apply these to understand practical problems;

  • Apply these to make better pattern recognition algorithms.


Ongoing Applications

  • Simplifying class boundaries in MR spectra via feature reduction (with Richard Baumgartner, Ray Somorjai et al.)

  • Risk assessment of biometric authentication (with Anil Jain et al.)

“Data Complexity in Pattern Recognition”, M. Basu and T.K. Ho (eds.), Springer-Verlag, to appear, 2005: a collection of 15 papers on this subject.

