
# Margin Trees for High-dimensional Classification



### Presentation Transcript

1. Margin Trees for High-dimensional Classification Tibshirani and Hastie

2. Errata (confirmed by Tibshirani)
   - Section 2(a), on the property of "single linkage": M should be M0.
   - Section 2.1, near the last line of the second paragraph: "at least" should be "at most".
   - The statements about complete/single linkage are misleading. In fact, the authors use the standard definitions of complete and single linkage, except that the distance metric is replaced by the margin between pairs of classes. (I traced their code to confirm this.)

3. Targeted Problem
   - Multi-class: #classes >> 2
   - High-dimensional, few samples: #features >> #data, so the data are linearly separable
   - Accuracy is already good; what is needed is an interpretable model
   - Example: micro-array data
     - Features: gene expression measurements
     - Classes: types of cancer
     - Instances: patients

4. Decision function: sign(β^T x + β0). Learn a highly interpretable structure for domain experts: check certain genes, and help create the link from gene to cancer.

5. Higher Interpretability
   - Multi-class problems reduce to binary ones:
     - 1-vs-1 voting → not meaningful
     - tree representation → interpretable
   - Non-linearly-separable data:
     - a single non-linear classifier, or
     - organized teams of linear classifiers
   - Solution: Margin Tree = hierarchical tree (interpretation) + max-margin classifier (minimize risk) + feature selection (limited #features per split)

6. Using a Margin Tree
   - Training: construct the tree structure, then train a max-margin classifier at each splitter.
   - Testing: start from the root node and go down, following the predictions of the classifiers at the splitting points. E.g. right, right → class 3, with the first split {1} vs {2,3} and the second split {2} vs {3}.
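The testing procedure above can be sketched as a descent through the tree. This is a minimal illustration, not the authors' code: the dict-based tree layout and the `predict` helper are hypothetical, and the toy splitters reproduce the {1} vs {2,3}, then {2} vs {3} example from the slide.

```python
import numpy as np

def predict(node, x):
    # Internal nodes store a linear splitter (beta, beta0);
    # leaves store the predicted class label.
    while "label" not in node:
        side = np.sign(node["beta"] @ x + node["beta0"])
        node = node["pos"] if side > 0 else node["neg"]
    return node["label"]

# Toy margin tree: the root separates {1} from {2,3},
# the right child separates {2} from {3}.
tree = {
    "beta": np.array([1.0, 0.0]), "beta0": 0.0,   # x1 > 0 -> go to {2,3}
    "neg": {"label": 1},
    "pos": {
        "beta": np.array([0.0, 1.0]), "beta0": 0.0,  # x2 > 0 -> class 3
        "neg": {"label": 2},
        "pos": {"label": 3},
    },
}

print(predict(tree, np.array([2.0, 3.0])))  # right, right -> 3
```

Each test point only visits one root-to-leaf path, so the domain expert can inspect exactly which splitters (and which genes) drove the prediction.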

7. Tree Structure (1/2)
   - Top-down construction
   - Greedy

8. Greedy (1/3)
   - Start from the root with all classes {1,2,3}.
   - Find the maximum margin among all partitions: {1} vs {2,3}; {2} vs {1,3}; {3} vs {1,2}.
   - For n classes there are 2^(n-1) - 1 partitions!
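The partition count can be checked by enumerating the candidate splits. A small sketch, where `binary_partitions` is an illustrative helper (not from the paper); in the full algorithm each partition would then be scored by the margin of a max-margin classifier trained on it.

```python
from itertools import combinations

def binary_partitions(classes):
    # Enumerate the 2**(n-1) - 1 ways to split a set of n classes into
    # two non-empty groups. Fixing the first class on the left side
    # counts each unordered split exactly once.
    classes = sorted(classes)
    first, rest = classes[0], classes[1:]
    for r in range(len(rest) + 1):
        for combo in combinations(rest, r):
            left = {first, *combo}
            right = set(classes) - left
            if right:
                yield left, right

parts = list(binary_partitions({1, 2, 3}))
print(len(parts))  # 2**(3-1) - 1 = 3
```

The count grows exponentially in the number of classes, which is what motivates the pruning via a complete-linkage tree later in the talk.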

9. Greedy (2/3) [tree diagram: root {1,2,3} with child node {2,3}]
   - Repeat in the child nodes.

10. Greedy (3/3) [tree diagram: root {1,2,3} with child node {2,3}]
    - Done!
    - Warning: greedy construction does not necessarily lead to the global optimum, i.e. it may not find the globally maximal margin.

11. Tree Structure (2/2)
    - Bottom-up tree: iteratively merge the closest groups.
    - Single linkage: distance = nearest pair.
    - Complete linkage: distance = farthest pair.
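The bottom-up construction with complete linkage can be sketched as standard agglomerative clustering, with the distance metric replaced by the pairwise class margin (as the errata slide notes). This is an illustrative sketch, not the authors' implementation; `pair_margin` holds precomputed margins between class pairs.

```python
import itertools

def complete_linkage_tree(pair_margin, classes):
    # pair_margin[(a, b)] is the margin between classes a and b (a < b).
    # Complete linkage: the distance between two groups is the margin of
    # the FARTHEST pair (single linkage would take the nearest pair).
    # Repeatedly merge the two closest groups under that distance.
    groups = [frozenset([c]) for c in classes]
    merges = []

    def dist(g1, g2):
        return max(pair_margin[tuple(sorted((a, b)))] for a in g1 for b in g2)

    while len(groups) > 1:
        g1, g2 = min(itertools.combinations(groups, 2),
                     key=lambda pair: dist(*pair))
        groups.remove(g1)
        groups.remove(g2)
        merges.append((set(g1), set(g2), dist(g1, g2)))
        groups.append(g1 | g2)
    return merges

# Toy margins between three classes:
pm = {(1, 2): 1.0, (1, 3): 4.0, (2, 3): 2.0}
merges = complete_linkage_tree(pm, [1, 2, 3])
print(merges)  # {1} and {2} merge first, then {3} joins them
```

Each merge height is the complete-linkage distance at which the merge happened; those heights are exactly what the bound on the next slides uses.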

12. Complete Tree

13. Complete Tree
    Height(subtree) = distance(the farthest pair of classes in the subtree) ≥ margin(any split cutting through the subtree).
    So when looking for a margin > Height(subtree), never break the classes in the subtree apart.

14. Efficient Greedy Tree Construction
    - Construct a complete-linkage tree T.
    - Estimate a current lower bound on the maximal margin: M0 = max over classes of margin(individual class vs. rest).
    - To find a margin ≥ M0, we only need to consider partitions between the groups {5,4,6}, {1}, {2,3}.
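The pruning step can be sketched as collapsing every complete-linkage subtree whose height falls below the bound M0, using the inequality from slide 13. This is a hypothetical sketch: the dict-based tree and the helper names are illustrative, not the authors' code.

```python
def collect_classes(node):
    # Gather all class labels under a subtree.
    if "label" in node:
        return {node["label"]}
    return collect_classes(node["left"]) | collect_classes(node["right"])

def groups_for_search(node, M0):
    # Any cut through a subtree has margin <= the subtree's height
    # (the farthest pairwise class distance), so a subtree with
    # height < M0 can be kept whole while searching for a margin >= M0.
    if "label" in node:
        return [{node["label"]}]
    if node["height"] < M0:
        return [collect_classes(node)]
    return (groups_for_search(node["left"], M0)
            + groups_for_search(node["right"], M0))

# Toy complete-linkage tree over classes {1, 2, 3}:
tree = {
    "height": 5.0,
    "left": {"label": 1},
    "right": {"height": 1.0, "left": {"label": 2}, "right": {"label": 3}},
}
print(groups_for_search(tree, M0=2.0))  # [{1}, {2, 3}]
```

Only partitions between these collapsed groups need to be scored, which is what shrinks the exponential search from slide 8 down to something tractable.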

15. Comparable testing performance (also vs. 1-vs-1 voting). The complete-linkage tree is more balanced → more interpretable.

16. Recall the cutting plane: D(x) = β^T x + β0 = Σᵢ βᵢ xᵢ + β0 = 0, with decision sign(β^T x + β0). β is the weight of the features in the decision function.

17. Feature Selection
    - Hard-thresholding at each split: discard the n features with the lowest abs(βᵢ) by setting βᵢ = 0.
    - n is proportional to the margin: n = α|Margin|, with α chosen by cross-validation error.
    - β is unavailable when using a non-linear kernel.
    - Alternative method: L1-norm SVM → forces βᵢ to zero.
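The hard-thresholding step can be sketched as follows. `hard_threshold` is an illustrative helper (not from the paper); the rule n = α|margin| follows the slide, with α assumed to come from cross-validation.

```python
import numpy as np

def hard_threshold(beta, margin, alpha):
    # Discard the n coefficients with the smallest |beta_i| by setting
    # them to zero, where n is proportional to the split's margin.
    n = min(int(alpha * abs(margin)), len(beta))
    order = np.argsort(np.abs(beta))   # indices, smallest magnitude first
    out = beta.copy()
    out[order[:n]] = 0.0
    return out

beta = np.array([0.05, -2.0, 0.1, 1.5, -0.01])
# margin 3.0 and alpha 1.0 -> drop the 3 weakest features
print(hard_threshold(beta, margin=3.0, alpha=1.0))
```

Larger margins tolerate more aggressive pruning, so splits that are easy (wide margin) end up described by very few genes, which is the interpretability payoff.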

18. The same cutting plane D(x) = Σᵢ βᵢ xᵢ + β0 = 0, now with the discarded coefficients set to βᵢ = 0.

19. Feature Selection Result

20. Discussion
    - Good for multi-class, high-dimensional data.
    - Bad for non-linearly-separable data: each node will contain impure data → impure β.
    - Testing performance comparable to traditional multi-class max-margin classifiers (SVMs).
