
Geometrical Complexity of Classification Problems


Presentation Transcript


  1. Geometrical Complexity of Classification Problems Tin Kam Ho Bell Labs, Lucent Technologies With contributions from Mitra Basu, Ester Bernado, Martin Law

  2. What Is the Story in this Image?

  3. Automatic Pattern Recognition A fascinating theme. What will the world be like if machines can do it?
  • Robots can see, read, hear, and smell.
  • We can get rid of all spam email, and find exactly what we need from the web.
  • We can protect everything with our signatures, fingerprints, iris patterns, or just our faces.
  • We can track anybody by look, gesture, and gait.
  • We can be warned of disease outbreaks and terrorist threats.
  • Feed in our vital data and we will have a perfect diagnosis.
  • We will know whom we can trust to lend our money.

  4. Automatic Pattern Recognition And more …
  • We can tell the true emotions of all our encounters!
  • We can predict stock prices!
  • We can identify all potential criminals!
  • Our weapons will never lose their targets!
  • We will be alerted to any extraordinary events in the heavens, on or inside the Earth!
  • We will have machines discovering all knowledge!
  … But how far are we from all this?

  5. Automatic Pattern Recognition [Diagram: samples → features → classifier]
  • Statistical classifiers: Bayesian classifiers, polynomial discriminators, nearest-neighbor methods, decision trees & forests, neural networks, genetic algorithms, support vector machines, ensembles and classifier combination
  • Why are machines still far from perfect?
  • What is still missing in our techniques?

  6. Large Variations in Accuracies of Different Classifiers [Table: accuracy by classifier × application] Will I have any luck with my problem?

  7. Many classifiers are in close rivalry with each other. Why?
  • Do they represent the limit of our technology?
  • What do the new classifiers add to the methodology?
  • Is there still value in the older methods?
  • Have they used up all the information contained in a data set?
  When I face a new recognition task …
  • How much can automatic classifiers do?
  • How should I choose a classifier?
  • Can I make the problem easier for a specific classifier?

  8. Sources of Difficulty in Classification • Class ambiguity • Boundary complexity • Sample size and dimensionality

  9. Class Ambiguity • Is the concept intrinsically ambiguous? • Are the classes well defined? • What information do the features carry? • Are the features sufficient for discrimination? (This irreducible ambiguity is the Bayes error.)

  10. Boundary Complexity • Kolmogorov complexity of the boundary description • Trivial description: list all points & class labels • Such a description can be exponential in the dimensionality • Is there a shorter description?

  11. Classification Boundaries As Decided by Different Classifiers [Scatter plot: training samples for a 2D classification problem; axes: feature 1, feature 2]

  12. Classification Boundaries Inferred by Different Classifiers • XCS: a genetic-algorithm-based classifier • Nearest-neighbor classifier • Linear classifier

  13. Match between Classifiers and Problems [Figure: XCS and NN boundaries on two problems; each classifier attains the lower error on one of the two problems (errors 0.6% and 0.06% on Problem A; 0.7% and 1.9% on Problem B)]

  14. Measures of Geometrical Complexity of Classification Problems Our approach: develop mathematical language and algorithmic tools for studying
  • characteristics of the geometry & topology of high-dimensional data
  • how they change with feature transformations, noise conditions, and sampling strategies
  • how they interact with classifier geometry
  Focus on descriptors computable from real data and relevant to classifier geometry.

  15. Geometry of Datasets and Classifiers
  • Data sets: length of class boundary; fragmentation of classes / existence of subclasses; global or local linear separability; convexity and smoothness of boundaries; intrinsic / extrinsic dimensionality; stability of these characteristics as the sampling rate changes
  • Classifier models: polygons, hyper-spheres, Gaussian kernels, axis-parallel hyper-planes, piece-wise linear surfaces, polynomial surfaces, their unions or intersections, …

  16. Measures of Geometric Complexity
  • Degree of Linear Separability: find a separating hyper-plane by linear programming; error counts and distances to the plane measure separability
  • Fisher's Discriminant Ratio: a classical measure of class separability; maximize over all features to find the most discriminating one
  • Shapes of Class Manifolds: cover same-class points with maximal balls; ball counts describe the shape of the class manifold
  • Length of Class Boundary: compute a minimum spanning tree; count the class-crossing edges
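
Two of these measures are simple enough to sketch in a few lines. Below is a minimal Python sketch, not the authors' code: the function names and the use of NumPy/SciPy are my own, and the linear-programming separability test is omitted for brevity.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

def boundary_fraction(X, y):
    """Length of class boundary: fraction of MST edges joining opposite classes."""
    d = squareform(pdist(X))                # dense pairwise Euclidean distances
    mst = minimum_spanning_tree(d).tocoo()  # MST of the complete graph on the samples
    crossing = y[mst.row] != y[mst.col]     # edges whose endpoints carry different labels
    return float(crossing.mean())

def fisher_ratio(X, y):
    """Fisher's discriminant ratio, maximized over individual features (two classes 0/1)."""
    a, b = X[y == 0], X[y == 1]
    num = (a.mean(axis=0) - b.mean(axis=0)) ** 2
    den = a.var(axis=0) + b.var(axis=0)
    return float(np.max(num / den))
```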

  17. Measures of Geometrical Complexity

  18. Experiments with Controlled Data Sets
  • Real-world data sets: benchmarking data from the UC Irvine archive; 844 two-class problems; 452 are linearly separable, 392 nonseparable
  • Synthetic data sets: random labeling of randomly located points; 100 problems in 1-100 dimensions
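
The synthetic setup is straightforward to reproduce. A hedged sketch follows; the per-problem sample size is not stated on the slide, so 200 points is my assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_label_problem(n_points, dim):
    X = rng.uniform(size=(n_points, dim))   # randomly located points in the unit cube
    y = rng.integers(0, 2, size=n_points)   # random two-class labeling
    return X, y

# 100 problems, one per dimensionality from 1 to 100
problems = [random_label_problem(200, d) for d in range(1, 101)]
```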

  19. Patterns in Complexity Measure Space [Scatter plots; legend: lin. sep, lin. nonsep, random]

  20. Problem Distribution in 1st & 2nd Principal Components of Complexity Space
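
The projection behind this slide and the next is ordinary PCA on the problems × measures matrix. A minimal sketch, assuming each row is one problem and each column one complexity measure; the rows of `Vt` are the loadings examined on the next slide.

```python
import numpy as np

def pca_project(M, k=2):
    """Project a (problems x measures) matrix onto its first k principal components."""
    Z = (M - M.mean(axis=0)) / M.std(axis=0)        # standardize each measure
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ Vt[:k].T                             # problem coordinates in the first k components
```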

  21. Loadings of the First 6 Principal Components

  22. Interpretation of the First 4 Principal Components
  • 50% of variance: linearity of the boundary and proximity of opposite-class neighbors
  • 12% of variance: balance between within-class scatter and between-class distance
  • 11% of variance: concentration & orientation of intrusion into the opposite class
  • 9% of variance: within-class scatter

  23. Problem Distribution in 1st & 2nd Principal Components of Complexity Space
  • Continuous distribution
  • Known easy & difficult problems (linearly separable vs. randomly labeled) occupy opposite ends
  • Few outliers
  • Empty regions

  24. Questions for the Theoretician • Is the distribution necessarily continuous? • What caused the outliers? • Will the empty regions ever be filled? • How are the complexity measures related? • What is the intrinsic dimensionality of the distribution?

  25. Questions for the Practitioner • Where does a particular problem fit in this continuum? • Can I use this to guide feature selection & transformation? • How do I set expectations on recognition accuracy? • Can I use this to help choose classifiers?

  26. Domains of Competence of Classifiers • Given a classification problem, determine which classifier is best for it [Figure: regions labeled LC, NN, XCS, Decision Forest in the plane of complexity measure 1 vs. complexity measure 2, with a query point marked "Here is my problem!"]

  27. Domain of Competence Experiment
  • Use a set of 9 complexity measures: Boundary, Pretop, IntraInter, NonLinNN, NonLinLP, Fisher, MaxEff, VolumeOverlap, Npts/Ndim
  • Characterize 392 two-class problems from UCI data, all shown to be linearly non-separable
  • Evaluate 6 classifiers: NN (1-nearest neighbor), LP (linear classifier by linear programming), Odt (oblique decision tree), Pdfc (random subspace decision forest), Bdfc (bagging-based decision forest), XCS (a genetic-algorithm-based classifier) — a skeleton of this protocol is sketched below
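
A sketch of the experiment's skeleton using scikit-learn stand-ins. This is an approximation of the protocol, not a reproduction: XCS has no standard Python implementation and is omitted, and the oblique tree and random-subspace forest are replaced by the nearest sklearn equivalents.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

CLASSIFIERS = {
    "nn":   KNeighborsClassifier(n_neighbors=1),
    "lp":   LogisticRegression(max_iter=1000),        # stand-in for the LP linear classifier
    "odt":  DecisionTreeClassifier(),                 # axis-parallel stand-in for the oblique tree
    "bdfc": BaggingClassifier(DecisionTreeClassifier()),
    "pdfc": RandomForestClassifier(),                 # stand-in for the random-subspace forest
    # "xcs": omitted -- no standard Python implementation
}

def best_classifier(X, y):
    """Return the name of the classifier with the lowest cross-validated error."""
    errors = {name: 1 - cross_val_score(clf, X, y, cv=5).mean()
              for name, clf in CLASSIFIERS.items()}
    return min(errors, key=errors.get)
```

Pairing each problem's complexity-measure vector with its winning classifier is then what maps out the domains of competence.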

  28. Classifier Domains of Competence [Figure: best classifier for the benchmarking data]

  29. Best Classifier: nn, lp, odt vs. an ensemble technique [Scatter plots over the measure pairs Boundary–NonLinNN, IntraInter–Pretop, MaxEff–VolumeOverlap; legend: ensemble, nn/lp/odt]

  30. Other Studies on Data Complexity • Global vs. local properties • Multi-class measures • Intrinsic ambiguity & mislabeling • Task trajectory with changing sampling & noise conditions [Figure: boundaries at neighborhood sizes k = 1 and k = 99]

  31. Extension to Multiple Classes • Fisher's discriminant score → multiple discriminant scores • Boundary point in an MST: a point is a boundary point as long as it is adjacent, in the MST, to a point from another class (sketched below)
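
A short sketch of the multi-class boundary-point rule, reusing the MST construction from the earlier sketch; it works for any number of classes.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

def mst_boundary_points(X, y):
    """Flag each point that is adjacent, in the MST, to a point of another class."""
    mst = minimum_spanning_tree(squareform(pdist(X))).tocoo()
    cross = y[mst.row] != y[mst.col]     # class-crossing MST edges
    boundary = np.zeros(len(y), dtype=bool)
    boundary[mst.row[cross]] = True      # both endpoints of a crossing edge
    boundary[mst.col[cross]] = True      # are boundary points
    return boundary
```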

  32. Global vs. Local Properties
  • Boundaries can be simple locally but complex globally
  • Such problems are relatively simple, yet the measures characterize them as complex
  • Solution: compute the complexity measure at different scales, optionally combined with different error levels
  • Let N_{i,k} be the k neighbors of the i-th point under, say, Euclidean distance; the complexity measure for data set D at error level ε, evaluated at scale k, aggregates a base measure over these neighborhoods (one plausible implementation is sketched below)
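
One plausible reading of the scale-k construction; the slide's formula itself is lost, so the exact aggregation (a plain average here) is an assumption. Evaluate a base measure f, e.g. boundary_fraction from the earlier sketch, on each neighborhood N_{i,k}:

```python
import numpy as np
from scipy.spatial.distance import cdist

def local_complexity(X, y, f, k):
    """Average a base complexity measure f over each point's k-neighborhood."""
    d = cdist(X, X)
    idx = np.argsort(d, axis=1)[:, 1:k + 1]   # each point's k nearest neighbors (self excluded)
    vals = []
    for i in range(len(X)):
        nbr = np.append(idx[i], i)             # neighborhood N_{i,k}, plus the point itself
        # a single-class neighborhood is locally simple: score it 0 (an assumption)
        vals.append(f(X[nbr], y[nbr]) if len(set(y[nbr])) > 1 else 0.0)
    return float(np.mean(vals))
```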

  33. Intrinsic Ambiguity • The complexity measures can be severely affected when there is intrinsic class ambiguity (or data mislabeling) • Example: FeatureOverlap (in 1D only; see the sketch below) • It cannot distinguish intrinsic ambiguity from a genuinely complex class boundary
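
A sketch of a 1D feature-overlap measure of the kind named above; the slide gives no formula, so this formulation (overlap of the two classes' value ranges, normalized by their joint span) is my own.

```python
def feature_overlap_1d(x, y):
    """Normalized overlap of the two classes' ranges on a single feature."""
    a, b = x[y == 0], x[y == 1]
    lo = max(a.min(), b.min())                        # start of the overlapping interval
    hi = min(a.max(), b.max())                        # end of the overlapping interval
    span = max(a.max(), b.max()) - min(a.min(), b.min())
    return max(0.0, hi - lo) / span
```

A single mislabeled point can stretch a class's range across the other's, which is exactly why this measure confuses mislabeling with a complex boundary.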

  34. Tackling Intrinsic Ambiguity
  • Compute the complexity measure at different error levels
  • f(D): a complexity measure on the data set D
  • D*: a "perturbed" version of D in which some points are relabeled
  • h(D, D*): a distance measure between D and D* (the error level)
  • The new complexity measure is the curve ε ↦ min { f(D*) : h(D, D*) ≤ ε }
  • The curve can be summarized by, say, the area under it
  • Minimization by greedy procedures: discard (relabel) the erroneous points that decrease complexity the most (sketched below)
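
A hedged sketch of the greedy procedure: spend an error budget (h(D, D*) taken here as the fraction of relabeled points, an assumption) on the label flips that reduce f the most. This costs O(flips × n) evaluations of f, so it only suits cheap measures.

```python
import numpy as np

def greedy_relabel(X, y, f, budget):
    """Greedily flip labels (two classes 0/1) to minimize f within the error budget."""
    y_star = y.copy()
    for _ in range(int(budget * len(y))):
        base = f(X, y_star)
        gains = []
        for i in range(len(y_star)):
            y_try = y_star.copy()
            y_try[i] = 1 - y_try[i]          # tentatively flip point i
            gains.append(base - f(X, y_try))
        i_best = int(np.argmax(gains))
        if gains[i_best] <= 0:               # no flip reduces complexity further
            break
        y_star[i_best] = 1 - y_star[i_best]
    return y_star
```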

  35. Sampling Density A problem may appear deceptively simple or complex with small samples [Figure: the same problem sampled with 2, 10, 100, 500, and 1000 points]
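
One can see this directly by tracking a measure such as boundary_fraction (sketched after slide 16) on nested subsamples of one problem; with 2 or 10 points the estimate is dominated by sampling noise. Assuming X, y come from random_label_problem above:

```python
# how the boundary-length estimate drifts as the sample grows
for n in (2, 10, 100, 500, 1000):
    print(n, boundary_fraction(X[:n], y[:n]))
```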

  36. Real Problems Have a Mixture of Difficulties Sparse samples & complex geometry cause ill-posedness: we can't tell which hypothesis is better without further assumptions on the data geometry

  37. To Conclude: We have had some early success in using geometrical measures to characterize classification complexity, pointing to a potentially fruitful research area.
  Future directions:
  • more and better measures;
  • detailed studies of their utilities, interactions, and estimation uncertainty;
  • deeper understanding of the constraints on these measures from point-set geometry and topology;
  • applying these to understand practical recognition tasks;
  • applying these to find transformations that simplify boundaries;
  • applying these to make better pattern recognition algorithms.
  Reference: "Data Complexity in Pattern Recognition", M. Basu, T. K. Ho (eds.), Springer-Verlag, in press.
