Create Presentation
Download Presentation

Download Presentation
## Margin Trees for High-dimensional Classification

- - - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - - -

**Margin Trees for High-dimensional Classification**Tibshirani and Hastie**Errata (confirmed by Tibshirani)**• Section 2 (a) about the property of 'single linkage‘. M should be M0 • Section 2.1 close to the last line of second paragraph. “at least” should be “at most” • The statements about complete/single linkage are misleading. In fact, they use standard definition of complete/single linkage except the distance metric is replaced with margin between pairwise classes. (I traced their code to confirm this).**Targeted Problem**• Multi-class • #class >> 2 • High-dimensional, few samples • #features >> #data linear separable • already good accuracy, need interpretable model • Ex. micro-array data • feature : gene expression measurement • class: type of cancer • Instances: patients**T**( ) ¯ ¯ S i + x g n x 0 Learn a Highly Interpretable Structure for Domain Experts Check certain genes Help create the link of gene to cancer**Higher Interpretability**• Multi-class problems reduce to binary • 1vs1 voting not meaningful • tree representation • Non-linear-separable data • single non-linear classifier • organized teams of linear classifiers • Solution: • Margintree =Hierarchical Tree + max-margin classifier + Feature Selection (interpretation) (minimize risk) (limited #feature/split)**Training**Construct tree structure Train max-margin classifier at each splitter Testing Start from root node Going down following the prediction of classifiers at splitting points ex. Right, Right class: 3 Using margin-Tree {1} vs{2,3} {2} vs {3}**Tree Structure(1/2)**• Top-down Construction • Greedy**Greedy (1/3)**1,2,3 • Starting from root with all classes {1,2,3} • find maximum margin among all partitions {1} vs {2,3}; {2} vs {1,3}; {3}vs{1,2} 2n-1partitions!**Greedy (2/3)**1,2,3 2,3 • Repeat in child nodes.**Greedy (2/3)**1,2,3 2,3 • Done! • Warning: Greedy not necessary lead to global optimum • i.e. find out the global maximal margin**Tree Structure(2/2)**• Bottom-up Treeiteratively merge closest groups. • Single linkage: distance = nearest pair. • Complete linkage: distance = farthest pair.**Complete Tree**Height(subtree) = distance(the farthest pair of classes)≥ Margin(cutting through the subtree) When looking for a Margin > Height(substree), never break classes in the subtree**Efficient Greedy Tree Construction**• Construct a complete linkage tree T • Estimate current lower bound of maximal margin M0= max Margin(individual class, rest) • To find a margin ≥ M0We only needto consider partition between{5,4,6}, {1}, {2,3} M0**Comparable testing performances (also 1vs1 voting)**• Complete linkage tree more balance more interpretable**T**T ( ) ¯ ¯ ¯ ¯ D S i i i 0 + + e c x s o n g n x = = 0 0 Recall the cutting plane βis the weight of features in decision function**Feature Selection**• Hard-thresholding at each split • Discard n features with low abs(βi) by setting βi=0 • Proportional to margin: n = α|Margin| • α chosen by cross-validation error • βunavailable using non-linear kernel • Alternative methods • L1-norm SVM force βi to zero**T**T ( ) ¯ ¯ ¯ ¯ ¯ D S i i i 0 + + e c x s o n g n x = = 0 0 Setting βi=0**Discussion**• Good for multi-class, high-dimensional data • Bad for non-linear separable data. • Each node will contain impure dataimpure β • Testing performance comparable to traditional multi-class max-margin classifiers (SVMs).