
Efficient Optimal Linear Boosting of a Pair of Classifiers


  1. Efficient Optimal Linear Boosting of a Pair of Classifiers Victor Boyarshinov and Malik Magdon-Ismail, IEEE Transactions on Neural Networks, Vol. 18, No. 2, 2007, pp. 317-328. Presenter: Wei-Shen Tai Advisor: Professor Chung-Chian Hsu 2007/5/16

  2. Outline • Introduction • Optimal fat separators in two dimensions • Mirrored-radial coordinates • Maximizing the margin subject to minimum weight • Leave-one-out error • Discussions • Comments

  3. Motivation • Model aggregation • Enhancing the statistical performance of a set of weak classifiers to obtain a stronger classifier. • For example, boosting and bagging. • The combinatorial problem of boosting a pair of classifiers • What is the optimal linear combination of this pair of classifiers?

  4. Objective • Optimal linear separation (l) • Express the optimal linearly boosted classifier for classifiers g1 and g2 (for whatever notion of optimality) in terms of its corresponding optimal linear classifier in the 2-D feature space.

  5. A (linearly) boosted classification function • A (linearly) boosted classification function g(z) • g(z) = w0 + w1g1(z) + w2g2(z) • The corresponding classifier is • sign(g(z)) = sign(w0 + w1g1(z) + w2g2(z)) • Each labeled data point (zi, yi), zi ∈ ℝd, can be mapped onto this 2-D feature space by zi → (g1(zi), g2(zi)) • A linearly boosted classifier corresponds exactly to a linear separator in this 2-D space.
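To make the mapping concrete, here is a minimal Python sketch (my illustration, not the paper's code; the threshold classifiers g1, g2 and the weights w0, w1, w2 are hypothetical placeholders) of the 2-D feature-space mapping and the boosted decision rule:

```python
import numpy as np

def to_feature_space(Z, g1, g2):
    """Map each d-dimensional point z_i to the 2-D point (g1(z_i), g2(z_i))."""
    return np.array([[g1(z), g2(z)] for z in Z])

def boosted_sign(z, g1, g2, w0, w1, w2):
    """The boosted classifier: sign(w0 + w1*g1(z) + w2*g2(z))."""
    return np.sign(w0 + w1 * g1(z) + w2 * g2(z))

# Two toy threshold classifiers on 1-D inputs (purely illustrative):
g1 = lambda z: np.sign(z[0] - 0.5)
g2 = lambda z: np.sign(0.8 - z[0])
Z = np.array([[0.1], [0.6], [0.9]])
print(to_feature_space(Z, g1, g2))                # points in the 2-D feature space
print(boosted_sign(Z[1], g1, g2, 0.5, 1.0, 1.0))  # boosted prediction for one point
```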

  6. Optimal linear separation in 2 • Efficient algorithms for exact linear separation • Applicable to the case where the data points are not linearly separable. Definition 1.2: Two sets A and B are linearly separable iff v, v0 such that vTx + v0 > 0, xAand vTx + v0 < 0, xB . The pair (v, v0) defines an (oriented) separating hyperplane. • If two sets and are linearly separable, the margin of a separating hyperplane is the minimum distance of a data point to the hyperplane. • The maximum margin separating hyperplane is a separating hyperplane with maximum possible margin. • The weight (or error) for is the total weight summed over the points misclassified by • A separator l* is optimal if it has minimum weight

  7. Optimal fat separator • Definition 1.3 (Optimal fat separator) • A hyperplane l = (v, v0) is an optimal fat separator for A and B if it is optimal and is also a maximum margin separator for A'(l) and B'(l). • One can analogously define the optimal fat separator with respect to the new separable set obtained by flipping the classes of the misclassified points instead of removing them; all the results apply there as well. • Definition 2.1: A separator set Q ⊆ A ∪ B • A set with the following property: if the points in Q are deleted, the remaining points are linearly separable. • Definition 2.4: For a hyperplane l, the positive separator set Q+(l) • Contains all misclassified points except the positive points (in A) that lie on l.

  8. The best positive separator hyperplane • Lemma 2.6: The optimal positive separator set can be found over all hyperplanes passing through a candidate central point. • Mirrored-radial coordinate • Define s(x) as the projection of x onto the upper hemisphere of the unit circle centered at the candidate point a+. • The mirrored-radial coordinate θ(x) is then the angle of s(x). • Find l with minimum weight (a sketch follows below).
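A rough Python sketch of the idea (my reconstruction, not the paper's algorithm: a brute-force O(n²) sweep over candidate angles, whereas the paper sorts the mirrored-radial coordinates to achieve this more efficiently):

```python
import numpy as np

def mirrored_radial(X, a):
    """Mirrored-radial coordinates relative to the candidate point a:
    mirror each point below the horizontal axis through a, then take
    the angle of its projection on the upper half of the unit circle."""
    D = X - a
    D[D[:, 1] < 0] *= -1                    # mirror into the upper half-plane
    return np.arctan2(D[:, 1], D[:, 0])     # angles in [0, pi]

def best_line_through(X, y, w, a):
    """Among lines through a whose angle equals some point's mirrored-radial
    coordinate, return the minimum weight; both orientations are tried.
    Note: points exactly on the line count as misclassified in this sketch."""
    best_err, best_theta = np.inf, None
    for theta in mirrored_radial(X, a):
        v = np.array([np.sin(theta), -np.cos(theta)])   # a normal of the line
        signs = np.sign((X - a) @ v)
        err = min(w[signs != y].sum(), w[signs != -y].sum())
        if err < best_err:
            best_err, best_theta = err, theta
    return best_err, best_theta
```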

  9. Maximizing the margin subject to minimum weight • Definition 2.12: For the set A ∪ B, the margin of a separator set Q, mar(Q), is the margin of the optimal fat separator for A ∪ B \ Q. • Theorem 2.13: An optimal separator set Q* satisfies • (1) W(Q*) ≤ W(Q) for any other separator set Q, • (2) and if W(Q*) = W(Q), then mar(Q*) ≥ mar(Q). • Convex hull
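A sketch of computing mar(Q) (my illustration, not the paper's exact construction: the maximum margin separator is approximated here with a large-C linear SVM from scikit-learn):

```python
import numpy as np
from sklearn.svm import SVC

def margin_of_separator_set(X, y, Q):
    """mar(Q): remove the points indexed by Q (a separator set, so the
    remaining points are linearly separable), fit a near-hard-margin
    linear SVM, and return the geometric margin 1 / ||w||."""
    keep = np.setdiff1d(np.arange(len(y)), Q)
    svm = SVC(kernel="linear", C=1e6).fit(X[keep], y[keep])
    return 1.0 / np.linalg.norm(svm.coef_)
```

With this in hand, Theorem 2.13 amounts to a lexicographic choice: first minimize W(Q), then break ties by the larger mar(Q).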

  10. Leave-one-out error • Estimation of the accuracy of learning algorithms • Let ei denote the error of C(i) applied to the input point xi • Three types of xi • Type I: xi is classified correctly by all distinct optimal fat separators constructed for X(i). Such an xi makes no contribution to the leave-one-out error. • Type II: xi is misclassified by all optimal fat separators constructed for X(i). Such an xi contributes wi to the leave-one-out error. • Type III: There are distinct optimal separator sets for X(i). Let Nc of these result in fat separators that classify xi correctly and Ne of them misclassify xi.
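A generic weighted leave-one-out sketch (the fit/predict hooks are hypothetical stand-ins for the paper's optimal-fat-separator construction C(i); Type III ties, which contribute fractionally, are ignored here):

```python
import numpy as np

def leave_one_out_error(X, y, w, fit, predict):
    """Weighted LOO error: point i contributes w_i when the classifier
    trained without it (the paper's C(i)) misclassifies x_i.
    Type III points, where distinct optimal fat separators disagree,
    would contribute a fraction Ne / (Nc + Ne) of w_i; this sketch
    treats every point as Type I or Type II."""
    n, total = len(y), 0.0
    for i in range(n):
        keep = np.arange(n) != i
        model = fit(X[keep], y[keep], w[keep])
        if predict(model, X[i]) != y[i]:
            total += w[i]
    return total / w.sum()
```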

  11. Discussions • Optimal linear boosting of a pair of classification functions • A significant improvement over the brute-force exponential algorithm. • A smarter way of enumerating candidate lines yields a speedup by a factor close to n. • The approach extends to maximizing the margin over all optimal separator sets. • Limitations and future work • The algorithm only applies to the 2-D case, a significant limitation for general linear separation. • Extend the approach further to obtain an optimal boosting of an arbitrary number of classification functions.

  12. Comments • Advantages • A model aggregation method for combining different classifiers. • Mirrored-radial coordinates reduce the search space for finding the optimal separator hyperplane. • Drawbacks • The paper lacks diagrams to explain its lemmas and theorems in detail. • The paper includes no experiments comparing the performance of this method against related methods. • Applications • Model aggregation for classification-related applications.
