
# Margin Trees for High-dimensional Classification

Margin Trees for High-dimensional Classification. Tibshirani and Hastie. Errata (confirmed by Tibshirani): Section 2 (a), on the property of 'single linkage': M should be M0. Section 2.1, close to the last line of the second paragraph: “at least” should be “at most”.


### Margin Trees for High-dimensional Classification

Tibshirani and Hastie

• Section 2 (a), on the property of 'single linkage': M should be M0

• Section 2.1, close to the last line of the second paragraph: “at least” should be “at most”

• The statements about complete/single linkage are misleading. In fact, they use the standard definition of complete/single linkage, except that the distance metric is replaced with the margin between each pair of classes. (I traced their code to confirm this.)

• Multi-class

• #class >> 2

• High-dimensional, few samples

• #features >> #data → linearly separable

• already good accuracy, need interpretable model

• Ex. micro-array data

• feature : gene expression measurement

• class: type of cancer

• Instances: patients

Linear classifier at each split: $\operatorname{sign}(x^\top \beta + \beta_0)$

Learn a Highly Interpretable Structure for Domain Experts

Check certain genes

Help create the link of gene to cancer

• Multi-class problems → reduce to binary

• 1-vs-1 voting → not meaningful

• tree representation

• Non-linear-separable data

• single non-linear classifier

• organized teams of linear classifiers

• Solution:

• Margin tree = hierarchical tree (interpretation) + max-margin classifier (minimize risk) + feature selection (limited #features per split)

Construct tree structure

Train max-margin classifier at each splitter

Testing

Start from root node

Go down the tree, following the predictions of the classifiers at the splitting points

Ex. Right, Right → class 3

Using margin-Tree

{1} vs {2,3}

{2} vs {3}
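The test-time walk described above can be sketched as follows. This is a minimal illustration, not the authors' code: the `Leaf` and `Split` classes, the `predict` function, and the weights in the toy tree are all hypothetical, chosen only so that the root separates {1} from {2,3} and its right child separates {2} from {3}.

```python
from dataclasses import dataclass
from typing import Sequence, Union


@dataclass
class Leaf:
    label: int  # class predicted when the walk ends here


@dataclass
class Split:
    beta: Sequence[float]  # feature weights of the linear classifier at this node
    beta0: float           # intercept
    left: "Node"           # followed when the decision value is negative
    right: "Node"          # followed otherwise


Node = Union[Leaf, Split]


def predict(node, x):
    """Walk from the root to a leaf, following sign(x . beta + beta0) at each split."""
    while isinstance(node, Split):
        score = sum(b * xi for b, xi in zip(node.beta, x)) + node.beta0
        node = node.right if score >= 0 else node.left
    return node.label


# Toy tree with made-up weights: root is {1} vs {2,3}, right child is {2} vs {3}.
tree = Split(
    beta=[1.0, 0.0], beta0=-1.0,
    left=Leaf(1),
    right=Split(beta=[0.0, 1.0], beta0=-1.0, left=Leaf(2), right=Leaf(3)),
)

print(predict(tree, [2.0, 2.0]))  # Right, Right
```

Each prediction evaluates at most one linear classifier per level of the tree, which is what makes the path easy to read off for a domain expert.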

• Top-down Construction

• Greedy

• Starting from the root with all classes {1,2,3}

• Find the maximum margin among all partitions: {1} vs {2,3}; {2} vs {1,3}; {3} vs {1,2}. With $n$ classes there are $2^{n-1} - 1$ partitions!

• Repeat in the child nodes, e.g. {2,3}.

• Done!

• Warning: greedy does not necessarily lead to the global optimum

• i.e. it may not find the global maximal margin
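A small sketch of the greedy top-down construction may help. It enumerates every two-way partition of the classes at a node, keeps the one with the largest margin, and recurses. Everything here is an assumption for illustration: `toy_margin` is a one-dimensional stand-in (the gap between two groups of points, or 0 if they interleave) rather than a trained max-margin classifier, and the data are invented.

```python
from itertools import combinations


def bipartitions(classes):
    """Yield every split of `classes` into two non-empty groups, each exactly once."""
    classes = sorted(classes)
    first, rest = classes[0], classes[1:]
    # Pin the first class to the left side so mirror-image splits are not repeated.
    for k in range(len(rest) + 1):
        for extra in combinations(rest, k):
            left = {first, *extra}
            right = set(classes) - left
            if right:
                yield left, right


def toy_margin(left, right, data):
    """1-D stand-in for an SVM margin: the gap between the groups, 0 if they interleave."""
    lo = [v for c in left for v in data[c]]
    hi = [v for c in right for v in data[c]]
    return max(min(hi) - max(lo), min(lo) - max(hi), 0.0)


def greedy_tree(classes, data):
    """Pick the max-margin split at this node, then recurse into both children."""
    if len(classes) == 1:
        return next(iter(classes))
    left, right = max(bipartitions(classes), key=lambda p: toy_margin(*p, data))
    return (greedy_tree(left, data), greedy_tree(right, data))


# Invented 1-D samples for three classes.
data = {1: [0.0, 1.0], 2: [5.0, 6.0], 3: [7.5, 8.0]}
print(greedy_tree({1, 2, 3}, data))
print(len(list(bipartitions(range(5)))))  # number of candidate splits for 5 classes
```

The number of candidate splits doubles with each extra class, which is why exhaustive search at every node quickly becomes impractical.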

• Bottom-up tree: iteratively merge the closest groups.

• Single linkage: distance = nearest pair.

• Complete linkage: distance = farthest pair.
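The two linkage rules differ only in how the distance between two groups is computed from the pairwise class distances. The sketch below is a plain agglomerative loop written for illustration; the function name `agglomerate` and the margin values are hypothetical, and the pairwise margins are simply given rather than computed by a classifier.

```python
def agglomerate(dist, linkage):
    """Merge groups bottom-up.

    `dist` maps frozenset({a, b}) to the distance between classes a and b.
    `linkage` is `min` for single linkage (nearest pair) or `max` for
    complete linkage (farthest pair). Returns the merges, in order, as
    (group_a, group_b, height) tuples.
    """
    groups = [frozenset([c]) for c in sorted({c for pair in dist for c in pair})]
    merges = []
    while len(groups) > 1:
        best = None
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                d = linkage(dist[frozenset([a, b])] for a in groups[i] for b in groups[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((set(groups[i]), set(groups[j]), d))
        merged = groups[i] | groups[j]
        groups = [g for k, g in enumerate(groups) if k not in (i, j)] + [merged]
    return merges


# Hypothetical pairwise margins between three classes.
margins = {frozenset([1, 2]): 4.0, frozenset([1, 3]): 6.5, frozenset([2, 3]): 1.5}

print(agglomerate(margins, min))  # single linkage
print(agglomerate(margins, max))  # complete linkage
```

Both rules merge classes 2 and 3 first here; they differ in the height assigned to the final merge, which uses the nearest pair under single linkage and the farthest pair under complete linkage.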

Height(subtree) = distance(the farthest pair of classes) ≥ Margin(cutting through the subtree)

When looking for a Margin > Height(subtree), never break up the classes in the subtree

• Construct a complete linkage tree T

• Estimate the current lower bound on the maximal margin: M0 = max Margin(individual class, rest)

• To find a margin ≥ M0, we only need to consider partitions between {5,4,6}, {1}, {2,3}
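The pruning step above can be sketched as cutting the complete-linkage tree at height M0: any subtree that merged below M0 is kept together, and only splits between the resulting blocks need to be searched. The merge heights and the one-vs-rest margins below are invented so that the blocks match the slide's example; `collapse_below` is a hypothetical helper, not a function from the authors' code.

```python
def collapse_below(merges, classes, m0):
    """Group classes whose complete-linkage merge height is below m0."""
    parent = {c: c for c in classes}

    def find(c):
        # Follow parent links up to the representative of c's block.
        while parent[c] != c:
            c = parent[c]
        return c

    for group_a, group_b, height in merges:
        if height < m0:
            parent[find(next(iter(group_a)))] = find(next(iter(group_b)))

    blocks = {}
    for c in classes:
        blocks.setdefault(find(c), set()).add(c)
    return sorted(blocks.values(), key=min)


# Invented complete-linkage merges as (group_a, group_b, height).
merges = [
    ({2}, {3}, 1.0),
    ({4}, {5}, 1.2),
    ({4, 5}, {6}, 2.0),
    ({1}, {2, 3}, 3.5),
    ({1, 2, 3}, {4, 5, 6}, 5.0),
]

# Invented one-vs-rest margins; M0 is the best of them.
one_vs_rest = {1: 3.0, 2: 0.8, 3: 0.9, 4: 1.1, 5: 1.0, 6: 1.9}
m0 = max(one_vs_rest.values())

blocks = collapse_below(merges, range(1, 7), m0)
print(blocks)
print(2 ** (len(blocks) - 1) - 1)  # candidate splits left, versus 31 for six classes
```

With three blocks left, only three two-way splits remain to be checked against M0.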

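(Figure: complete-linkage tree $T$ with the threshold $M_0$ marked.)

Recall the cutting plane

$$x^\top \beta + \beta_0 = 0, \qquad \text{Decision} = \operatorname{sign}(x^\top \beta + \beta_0)$$

β is the weight of the features in the decision function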

• Hard-thresholding at each split

• Discard the n features with the smallest |βi| by setting βi = 0

• Proportional to margin: n = α|Margin|

• α chosen by cross-validation error

• β is unavailable when using a non-linear kernel

• Alternative methods

• L1-norm SVM → forces βi to zero
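Hard thresholding at a split can be sketched in a few lines. The function name `hard_threshold`, the coefficient values, and the values of `alpha` and `margin` are all hypothetical; rounding n down to an integer is also an assumption, since the slides only state the proportionality.

```python
def hard_threshold(beta, n_drop):
    """Return a copy of beta with the n_drop smallest-magnitude coefficients set to 0."""
    n_drop = max(0, min(n_drop, len(beta)))
    # Indices ordered from smallest to largest |beta_i|.
    order = sorted(range(len(beta)), key=lambda i: abs(beta[i]))
    dropped = set(order[:n_drop])
    return [0.0 if i in dropped else b for i, b in enumerate(beta)]


beta = [0.9, -0.05, 0.3, -1.2, 0.01]
alpha, margin = 1.0, 2.4           # made-up values; alpha would come from cross-validation
n_drop = int(alpha * abs(margin))  # n = alpha * |Margin|, rounded down (assumption)

sparse_beta = hard_threshold(beta, n_drop)
print(sparse_beta)
```

The surviving non-zero entries identify the features, such as genes, that a domain expert would inspect at that split.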

$$\text{Decision} = \operatorname{sign}(x^\top \beta + \beta_0), \qquad x^\top \beta + \beta_0 = 0$$

Setting βi = 0 removes feature i from the decision function

• Good for multi-class, high-dimensional data

• Bad for non-linearly separable data.

• Each node will contain impure data → impure β

• Testing performance comparable to traditional multi-class max-margin classifiers (SVMs).