
Efficient Optimal Linear Boosting of a Pair of Classifiers


  1. Efficient Optimal Linear Boosting of a Pair of Classifiers Victor Boyarshinov and Malik Magdon-Ismail, IEEE Transactions on Neural Networks, Vol. 18, No. 2, 2007, pp. 317-328. Presenter: Wei-Shen Tai Advisor: Professor Chung-Chian Hsu 2007/5/16

  2. Outline • Introduction • Optimal fat separators in two dimensions • Mirrored-radial coordinates • Maximizing the margin subject to minimum weight • Leave-one-out error • Discussions • Comments

  3. Motivation • Model aggregation • Enhancing the statistical performance of a set of weak classifiers to obtain a stronger classifier. • For example, boosting and bagging. • The combinatorial problem of boosting a pair of classifiers • What is the optimal linear combination of this pair of classifiers?

  4. Objective • Optimal linear separation (l) • Express the optimal linearly boosted classifier for classifiers g1 and g2 (for whatever notion of optimality) in terms of its corresponding optimal linear classifier in the 2-D feature space.

  5. A (linearly) boosted classification function • A (linearly) boosted classification function g(z) • g(z) = w0 + w1g1(z) + w2g2(z) • The corresponding classifier is • sign(g(z)) = sign(w0 + w1g1(z) + w2g2(z)) • Each labeled data point (zi, yi), zi ∈ ℝd, can be mapped onto this 2-D feature space by zi → (g1(zi), g2(zi)) • A linearly boosted classifier corresponds exactly to a linear separator in this 2-D space.
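To make the mapping concrete, here is a minimal Python sketch (my illustration, not the paper's code; the threshold classifiers g1, g2 and the weights w0, w1, w2 are hypothetical placeholders) of the 2-D feature-space mapping and the boosted decision rule:

```python
import numpy as np

def to_feature_space(Z, g1, g2):
    """Map each d-dimensional point z_i to the 2-D point (g1(z_i), g2(z_i))."""
    return np.array([[g1(z), g2(z)] for z in Z])

def boosted_sign(z, g1, g2, w0, w1, w2):
    """The boosted classifier: sign(w0 + w1*g1(z) + w2*g2(z))."""
    return np.sign(w0 + w1 * g1(z) + w2 * g2(z))

# Two toy threshold classifiers on 1-D inputs (purely illustrative):
g1 = lambda z: np.sign(z[0] - 0.5)
g2 = lambda z: np.sign(0.8 - z[0])
Z = np.array([[0.1], [0.6], [0.9]])
print(to_feature_space(Z, g1, g2))                # points in the 2-D feature space
print(boosted_sign(Z[1], g1, g2, 0.5, 1.0, 1.0))  # boosted prediction for one point
```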

  6. Optimal linear separation in 2 • Efficient algorithms for exact linear separation • Applicable to the case where the data points are not linearly separable. Definition 1.2: Two sets A and B are linearly separable iff v, v0 such that vTx + v0 > 0, xAand vTx + v0 < 0, xB . The pair (v, v0) defines an (oriented) separating hyperplane. • If two sets and are linearly separable, the margin of a separating hyperplane is the minimum distance of a data point to the hyperplane. • The maximum margin separating hyperplane is a separating hyperplane with maximum possible margin. • The weight (or error) for is the total weight summed over the points misclassified by • A separator l* is optimal if it has minimum weight

  7. Optimal fat separator • Definition 1.3 (Optimal fat separator) • A hyperplane l = (v, v0) is an optimal fat separator for A and B if it is optimal and is also a maximum margin separator for A'(l) and B'(l). • One can analogously define the optimal fat separator with respect to the new separable set obtained by flipping the classes of the misclassified points instead of removing them; all the results apply there as well. • Definition 2.1: A separator set Q ⊆ A ∪ B • A set with the following property: if the points in Q are deleted, the remaining points are linearly separable. • Definition 2.4: For a hyperplane l, the positive separator set Q+(l) • Contains all misclassified points except the positive points (in A) that lie on l.

  8. The best positive separator hyperplane • Lemma 2.6: The optimal positive separator set can be found over all hyperplanes passing through a candidate central point. • Mirrored-radial coordinate • Define s(x) as the projection of x onto the upper hemisphere of the unit circle centered at the candidate point a+. • The mirrored-radial coordinate θ(x) is then the angle of s(x). • Find l with minimum weight (a sketch follows below).
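A rough Python sketch of the idea (my reconstruction, not the paper's algorithm: a brute-force O(n²) sweep over candidate angles, whereas the paper sorts the mirrored-radial coordinates to achieve this more efficiently):

```python
import numpy as np

def mirrored_radial(X, a):
    """Mirrored-radial coordinates relative to the candidate point a:
    mirror each point below the horizontal axis through a, then take
    the angle of its projection on the upper half of the unit circle."""
    D = X - a
    D[D[:, 1] < 0] *= -1                    # mirror into the upper half-plane
    return np.arctan2(D[:, 1], D[:, 0])     # angles in [0, pi]

def best_line_through(X, y, w, a):
    """Among lines through a whose angle equals some point's mirrored-radial
    coordinate, return the minimum weight; both orientations are tried.
    Note: points exactly on the line count as misclassified in this sketch."""
    best_err, best_theta = np.inf, None
    for theta in mirrored_radial(X, a):
        v = np.array([np.sin(theta), -np.cos(theta)])   # a normal of the line
        signs = np.sign((X - a) @ v)
        err = min(w[signs != y].sum(), w[signs != -y].sum())
        if err < best_err:
            best_err, best_theta = err, theta
    return best_err, best_theta
```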

  9. Maximizing the margin subject to minimum weight • Definition 2.12: For the set A ∪ B, the margin of a separator set Q, mar(Q), is the margin of the optimal fat separator for A ∪ B \ Q. • Theorem 2.13: An optimal separator set Q* satisfies • (1) W(Q*) ≤ W(Q) for any other separator set Q, • (2) and if W(Q*) = W(Q), then mar(Q*) ≥ mar(Q). • Convex hull
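A sketch of computing mar(Q) (my illustration, not the paper's exact construction: the maximum margin separator is approximated here with a large-C linear SVM from scikit-learn):

```python
import numpy as np
from sklearn.svm import SVC

def margin_of_separator_set(X, y, Q):
    """mar(Q): remove the points indexed by Q (a separator set, so the
    remaining points are linearly separable), fit a near-hard-margin
    linear SVM, and return the geometric margin 1 / ||w||."""
    keep = np.setdiff1d(np.arange(len(y)), Q)
    svm = SVC(kernel="linear", C=1e6).fit(X[keep], y[keep])
    return 1.0 / np.linalg.norm(svm.coef_)
```

With this in hand, Theorem 2.13 amounts to a lexicographic choice: first minimize W(Q), then break ties by the larger mar(Q).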

  10. Leave-one-out error • Estimation of the accuracy of learning algorithms • Let ei denote the error of C(i) applied to the input point xi • Three types of xi • Type I: xi is classified correctly by all distinct optimal fat separators constructed for X(i). Such an xi makes no contribution to the leave-one-out error. • Type II: xi is misclassified by all optimal fat separators constructed for X(i). Such an xi contributes wi to the leave-one-out error. • Type III: There are distinct optimal separator sets for X(i). Let Nc of these result in fat separators that classify xi correctly and Ne of them misclassify xi.
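A generic weighted leave-one-out sketch (the fit/predict hooks are hypothetical stand-ins for the paper's optimal-fat-separator construction C(i); Type III ties, which contribute fractionally, are ignored here):

```python
import numpy as np

def leave_one_out_error(X, y, w, fit, predict):
    """Weighted LOO error: point i contributes w_i when the classifier
    trained without it (the paper's C(i)) misclassifies x_i.
    Type III points, where distinct optimal fat separators disagree,
    would contribute a fraction Ne / (Nc + Ne) of w_i; this sketch
    treats every point as Type I or Type II."""
    n, total = len(y), 0.0
    for i in range(n):
        keep = np.arange(n) != i
        model = fit(X[keep], y[keep], w[keep])
        if predict(model, X[i]) != y[i]:
            total += w[i]
    return total / w.sum()
```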

  11. Discussions • Optimal linear boosting of a pair of classification functions • A significant improvement over the brute-force exponential algorithm. • A smarter way of enumerating candidate lines yields a speedup by a factor close to n. • The approach extends to maximizing the margin over all optimal separator sets. • Limitations and future work • The algorithm only applies to the 2-D case, a significant limitation for general linear separation. • Extend the approach further to obtain an optimal boosting of an arbitrary number of classification functions.

  12. Comments • Advantages • A model aggregation method for combining different classifiers. • Mirrored-radial coordinates reduce the search space for finding the optimal separator hyperplane. • Drawbacks • The paper lacks diagrams to explain its lemmas and theorems in detail. • The paper includes no experiments comparing the performance of this method against related methods. • Applications • Model aggregation for classification-related applications.
