Margin Trees for High-dimensional Classification

Paper by Tibshirani and Hastie.

Errata (confirmed by Tibshirani)

  • Section 2(a), on the property of 'single linkage': M should be M0

  • Section 2.1, near the end of the second paragraph: "at least" should be "at most"

  • The statements about complete/single linkage are misleading. In fact, the authors use the standard definitions of complete/single linkage, except that the distance is replaced by the margin between each pair of classes. (I traced their code to confirm this.)


Targeted Problem

  • Multi-class

    • #classes >> 2

  • High-dimensional, few samples

    • #features >> #data → linearly separable

    • Accuracy is already good; an interpretable model is needed

  • Ex. microarray data

    • Features: gene expression measurements

    • Classes: types of cancer

    • Instances: patients
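A quick numerical check of "#features >> #data → linearly separable": with n = 10 random Gaussian samples in p = 200 dimensions and arbitrary ±1 labels, solving Xβ = y exactly gives a hyperplane through the origin that puts every sample on its labelled side. This is a minimal pure-Python sketch under those assumptions (random data, no intercept; the helper names are hypothetical). It finds *a* separating hyperplane, not the max-margin one.

```python
import random

def solve(a, b):
    """Solve the square linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def interpolating_weights(X, y):
    """Minimum-norm beta with X beta = y, via beta = X^T (X X^T)^{-1} y.
    Works whenever the n rows of X are linearly independent, which is typical when p >> n."""
    n, p = len(X), len(X[0])
    gram = [[sum(u * v for u, v in zip(X[i], X[j])) for j in range(n)] for i in range(n)]
    a = solve(gram, y)
    return [sum(a[i] * X[i][k] for i in range(n)) for k in range(p)]

random.seed(0)
n, p = 10, 200                                      # few samples, many features
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [random.choice([-1.0, 1.0]) for _ in range(n)]  # arbitrary labels
beta = interpolating_weights(X, y)
scores = [sum(b * v for b, v in zip(beta, row)) for row in X]
print(all(s * t > 0 for s, t in zip(scores, y)))    # True: every sample on its labelled side
```

Because the labels were drawn at random, separability here says nothing about the data; it only reflects p >> n. That is why the slides stress interpretability over raw training accuracy.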


Learn a Highly Interpretable Structure for Domain Experts

Linear rule at each split: $\mathrm{Sign}(x^T\beta + \beta_0)$

Check which genes each split relies on

Help link genes to cancer types
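To make the interpretability point concrete, here is a minimal sketch of one split, assuming it is the linear rule Sign(xᵀβ + β0) (score ≥ 0 read as +1) and that feature selection has already made β sparse. The gene names, weights and patient profile are made up for illustration.

```python
def split_decision(x, beta, beta0):
    """Linear split rule Sign(beta0 + x^T beta); a score of exactly 0 is treated as +1 here."""
    score = beta0 + sum(b * xi for b, xi in zip(beta, x))
    return 1 if score >= 0 else -1

def genes_used(beta, gene_names):
    """Genes with nonzero weight: the ones a domain expert would inspect at this split."""
    return [name for name, b in zip(gene_names, beta) if b != 0.0]

gene_names = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]  # hypothetical gene labels
beta = [0.0, 1.5, 0.0, -0.7]                           # hypothetical sparse weights
beta0 = -0.2
patient = [3.1, 0.9, -1.2, 0.4]                        # hypothetical expression profile

print(split_decision(patient, beta, beta0))  # score ≈ 0.87, so prints 1
print(genes_used(beta, gene_names))          # ['GENE_B', 'GENE_D']
```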


Higher Interpretability

  • Multi-class problems → reduce to binary

    • 1-vs-1 voting → not meaningful

    • tree representation

  • Non-linearly-separable data

    • a single non-linear classifier, or

    • organized teams of linear classifiers

  • Solution:

    • Margin tree = hierarchical tree (interpretation) + max-margin classifier (minimize risk) + feature selection (limited #features per split)


Using Margin Tree

  • Training

    • Construct the tree structure

    • Train a max-margin classifier at each split

  • Testing

    • Start from the root node

    • Go down, following the prediction of the classifier at each split

    • Ex. Right, Right → class 3

[Figure: example tree with root split {1} vs {2,3} and child split {2} vs {3}]
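The testing procedure can be sketched as a loop that starts at the root and follows each split's prediction down to a leaf. This is a minimal sketch, assuming each internal node stores a linear rule Sign(xᵀβ + β0) and goes "Right" on a nonnegative score. The `Node` class, the one-feature weights and the thresholds are hypothetical, chosen only to reproduce the slide's {1} vs {2,3}, {2} vs {3} tree.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

@dataclass
class Node:
    """Internal node: linear split (beta, beta0) with two children. Leaf: `label` is set."""
    beta: Sequence[float] = ()
    beta0: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    label: Optional[int] = None

def predict(node: Node, x: Sequence[float]) -> int:
    # Start at the root and follow each split's prediction until a leaf is reached.
    while node.label is None:
        score = node.beta0 + sum(b * xi for b, xi in zip(node.beta, x))
        node = node.right if score >= 0 else node.left
    return node.label

# Hypothetical 1-feature tree: root is {1} vs {2,3}, its right child is {2} vs {3}.
tree = Node(beta=[1.0], beta0=-1.0,                # x < 1 -> Left -> class 1
            left=Node(label=1),
            right=Node(beta=[1.0], beta0=-2.0,     # then x < 2 -> class 2, else class 3
                       left=Node(label=2),
                       right=Node(label=3)))

print(predict(tree, [0.5]))  # 1 (Left)
print(predict(tree, [1.5]))  # 2 (Right, Left)
print(predict(tree, [2.5]))  # 3 (Right, Right)
```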


Tree Structure (1/2)

  • Top-down Construction

    • Greedy


Greedy (1/3)

[Figure: root node {1,2,3}]

  • Start from the root with all classes {1,2,3}

  • Find the maximum margin among all partitions {1} vs {2,3}; {2} vs {1,3}; {3} vs {1,2}: $2^{n-1}-1$ partitions for n classes!
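The root step can be sketched by enumerating every split of the classes into two nonempty groups and keeping the one with the largest margin. To stay self-contained and exact, the sketch assumes a toy setting where each class is a few points on a line, so the maximal margin between two groups is half the gap between them (0 if they overlap). `margin_1d`, `bipartitions`, `best_root_split` and the data are hypothetical; in the slides' high-dimensional setting the margin would come from a linear max-margin classifier instead.

```python
from itertools import combinations

def margin_1d(a, b):
    """Exact max margin between two point sets on a line: half the gap, or 0.0 if they overlap."""
    if max(a) < min(b):
        return (min(b) - max(a)) / 2
    if max(b) < min(a):
        return (min(a) - max(b)) / 2
    return 0.0

def bipartitions(classes):
    """Each split of `classes` into two nonempty groups, listed once (2**(n-1) - 1 in total)."""
    classes = sorted(classes)
    first, rest = classes[0], classes[1:]
    for k in range(len(rest)):  # the group holding `first` leaves out at least one class
        for extra in combinations(rest, k):
            left = {first, *extra}
            yield left, set(classes) - left

def best_root_split(data):
    """Greedy root step: try every bipartition of the classes, keep the largest margin."""
    def pts(group):
        return [v for c in group for v in data[c]]
    return max(bipartitions(data), key=lambda ab: margin_1d(pts(ab[0]), pts(ab[1])))

data = {1: [0.0, 0.5], 2: [3.0, 3.4], 3: [4.0, 4.2]}  # hypothetical 1-D samples per class
print(len(list(bipartitions(data))))  # 3 == 2**(3-1) - 1
print(best_root_split(data))          # ({1}, {2, 3})
```

Fixing the smallest class on one side avoids listing each partition twice; even so, the count grows exponentially with the number of classes.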


Greedy (2/3)

[Figure: tree with root {1,2,3} and child node {2,3}]

  • Repeat in child nodes.


Greedy (3/3)

[Figure: tree with root {1,2,3} and child node {2,3}]

  • Done!

  • Warning: the greedy approach does not necessarily lead to the global optimum

  • i.e., it may not find the globally maximal margin
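The full top-down greedy construction (split by the best bipartition, then repeat in each child) can be sketched recursively. It assumes the same kind of toy data as a stand-in for real SVM margins: each class is a few points on a line and the exact maximal margin is half the gap between groups. The function names, the nested-tuple tree representation and the data are hypothetical.

```python
from itertools import combinations

def margin_1d(a, b):
    """Exact max margin between two point sets on a line: half the gap, or 0.0 if they overlap."""
    if max(a) < min(b):
        return (min(b) - max(a)) / 2
    if max(b) < min(a):
        return (min(a) - max(b)) / 2
    return 0.0

def bipartitions(classes):
    """Each split of `classes` into two nonempty groups, listed once (2**(n-1) - 1 in total)."""
    classes = sorted(classes)
    first, rest = classes[0], classes[1:]
    for k in range(len(rest)):
        for extra in combinations(rest, k):
            left = {first, *extra}
            yield left, set(classes) - left

def build_greedy(data, classes=None):
    """Top-down greedy margin tree: split by the max-margin bipartition, then recurse.
    Leaves are class labels; internal nodes are (left_subtree, right_subtree) tuples."""
    classes = set(data) if classes is None else classes
    if len(classes) == 1:
        return next(iter(classes))
    def pts(group):
        return [v for c in group for v in data[c]]
    left, right = max(bipartitions(classes), key=lambda ab: margin_1d(pts(ab[0]), pts(ab[1])))
    return (build_greedy(data, left), build_greedy(data, right))

data = {1: [0.0, 0.5], 2: [3.0, 3.4], 3: [4.0, 4.2]}  # hypothetical 1-D samples per class
print(build_greedy(data))  # (1, (2, 3))
```

Each split is chosen without looking ahead, which is exactly why the warning above applies to the tree as a whole.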


Tree Structure (2/2)

  • Bottom-up tree: iteratively merge the closest groups.

    • Single linkage: distance = nearest pair.

    • Complete linkage: distance = farthest pair.
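Following the errata's reading (standard single/complete linkage, with the pairwise margin between classes in place of the distance), the bottom-up construction can be sketched as below. It assumes toy data where each class is a few points on a line, so the exact margin between two classes is half the gap between them (0 if they overlap). `margin_1d`, `margin_linkage` and the data are hypothetical.

```python
def margin_1d(a, b):
    """Exact max margin between two point sets on a line: half the gap, or 0.0 if they overlap."""
    if max(a) < min(b):
        return (min(b) - max(a)) / 2
    if max(b) < min(a):
        return (min(a) - max(b)) / 2
    return 0.0

def margin_linkage(data, linkage="complete"):
    """Agglomerative tree over classes, using the pairwise class margin as the distance.
    'complete' = farthest pair (max), 'single' = nearest pair (min).
    Returns the merges as (group_a, group_b, height) in the order they happen."""
    agg = max if linkage == "complete" else min
    groups = [frozenset([c]) for c in sorted(data)]
    merges = []
    while len(groups) > 1:
        # Find the two closest groups under the chosen linkage.
        d, x, y = min(
            (agg(margin_1d(data[i], data[j]) for i in groups[x] for j in groups[y]), x, y)
            for x in range(len(groups)) for y in range(x + 1, len(groups)))
        merges.append((sorted(groups[x]), sorted(groups[y]), d))
        groups = [g for k, g in enumerate(groups) if k not in (x, y)] + [groups[x] | groups[y]]
    return merges

data = {1: [0.0, 0.5], 2: [3.0, 3.4], 3: [4.0, 4.2]}  # hypothetical 1-D samples per class
for kind in ("complete", "single"):
    for a, b, h in margin_linkage(data, kind):
        print(kind, a, b, round(h, 2))
# complete [2] [3] 0.3
# complete [1] [2, 3] 1.75
# single [2] [3] 0.3
# single [1] [2, 3] 1.25
```

Both linkages give the same tree shape here; only the merge heights differ, and the complete-linkage heights are the ones the next slide uses as bounds.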



Complete Tree

Height(subtree) = distance(farthest pair of classes in the subtree) ≥ Margin(any partition cutting through the subtree)

When looking for a Margin > Height(subtree), never break up the classes in the subtree
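The inequality on this slide follows in one step. Write M(A,B) for the maximal margin between class groups A and B, and H(S) for the height of subtree S. Because complete-linkage merge heights never decrease, H(S) is the largest pairwise margin inside S. If a partition (A,B) cuts through S, some class i ∈ S lies in A and some class j ∈ S lies in B, and any hyperplane separating A from B also separates i from j, so:

```latex
M(A,B) \;\le\; M(\{i\},\{j\}) \;\le\; \max_{k,l \in S} M(\{k\},\{l\}) \;=\; H(S)
```

Hence a partition with margin greater than H(S) can never split S.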


Efficient Greedy Tree Construction

  • Construct a complete linkage tree T

  • Estimate the current lower bound of the maximal margin: M0 = max Margin(individual class, rest)

  • To find a margin ≥ M0, we only need to consider partitions among the groups {5,4,6}, {1}, {2,3}

[Figure: complete linkage tree T with the level M0 marked]
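The efficient construction can be sketched end to end for the root split. The sketch builds complete linkage on pairwise margins, takes M0 as the best single-class-versus-rest margin, keeps every subtree of height below M0 intact, and searches only bipartitions of those groups. It assumes toy data (each class is a few points on a line, exact margin = half the gap); all names and numbers are hypothetical and unrelated to the slide's six-class figure. With five classes, only 3 of the 15 possible bipartitions are examined, and the best one ({1,2,3} vs {4,5}, margin 1.6) beats M0 = 1.4.

```python
from itertools import combinations

def margin_1d(a, b):
    """Exact max margin between two point sets on a line: half the gap, or 0.0 if they overlap."""
    if max(a) < min(b):
        return (min(b) - max(a)) / 2
    if max(b) < min(a):
        return (min(a) - max(b)) / 2
    return 0.0

def pts(data, group):
    return [v for c in group for v in data[c]]

def groups_below(data, cut):
    """Complete linkage on pairwise class margins, stopped before the first merge of height >= cut.
    Each returned group is a subtree of height < cut, so no partition with margin >= cut splits it."""
    groups = [frozenset([c]) for c in sorted(data)]
    while len(groups) > 1:
        d, x, y = min(
            (max(margin_1d(data[i], data[j]) for i in groups[x] for j in groups[y]), x, y)
            for x in range(len(groups)) for y in range(x + 1, len(groups)))
        if d >= cut:
            break  # complete-linkage heights never decrease, so every later merge is >= cut too
        groups = [g for k, g in enumerate(groups) if k not in (x, y)] + [groups[x] | groups[y]]
    return groups

def efficient_root_split(data):
    classes = frozenset(data)
    # Lower bound M0: the best "one class vs the rest" margin.
    m0 = max(margin_1d(data[c], pts(data, classes - {c})) for c in classes)
    groups = groups_below(data, m0)
    # Search only bipartitions that keep every group intact (2**(g-1) - 1 for g groups).
    first, rest = groups[0], groups[1:]
    best = None
    for k in range(len(rest)):
        for extra in combinations(rest, k):
            left = first.union(*extra)
            m = margin_1d(pts(data, left), pts(data, classes - left))
            if best is None or m > best[0]:
                best = (m, sorted(left), sorted(classes - left))
    return m0, groups, best

data = {1: [0.0, 0.2], 2: [3.0, 3.2], 3: [3.6, 3.8], 4: [7.0, 7.2], 5: [7.6, 7.8]}
m0, groups, best = efficient_root_split(data)
print(round(m0, 2))                       # 1.4
print(sorted(sorted(g) for g in groups))  # [[1], [2, 3], [4, 5]]
print(best[1], best[2], round(best[0], 2))  # [1, 2, 3] [4, 5] 1.6
```

Stopping at the first merge with height ≥ M0 keeps every group strictly below M0, so no split through a group can reach M0. The class attaining M0 always stays a singleton group (any subtree joining it to another class has height ≥ M0), so the "{class} vs rest" candidate is never lost.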



Recall the Cutting Plane

Cutting plane: $x^T\beta + \beta_0 = 0$

Decision = $\mathrm{Sign}(x^T\beta + \beta_0)$

β is the vector of feature weights in the decision function
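The cutting plane also fixes the geometry behind "margin". A point x lies at signed distance (xᵀβ + β0)/‖β‖ from the plane, and the margin of the plane on labelled data is the smallest label-weighted distance. The sketch below only evaluates a given, hypothetical plane on four made-up 2-D points; a max-margin classifier would instead choose β and β0 to maximize this quantity.

```python
import math

def signed_distance(x, beta, beta0):
    """Signed distance from x to the cutting plane x^T beta + beta0 = 0."""
    return (beta0 + sum(b * xi for b, xi in zip(beta, x))) / math.sqrt(sum(b * b for b in beta))

def plane_margin(X, y, beta, beta0):
    """Margin of the plane on labelled data (labels +1/-1): the smallest y * distance.
    It is positive only if the plane separates the two classes."""
    return min(yi * signed_distance(x, beta, beta0) for x, yi in zip(X, y))

X = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
y = [-1, -1, 1, 1]
beta, beta0 = (1.0, 1.0), -3.0  # hypothetical plane x1 + x2 = 3
print(round(plane_margin(X, y, beta, beta0), 4))  # 0.7071, i.e. 1/sqrt(2)
```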


Feature Selection

  • Hard-thresholding at each split

    • Discard n features with low |βi| by setting βi = 0

    • n is proportional to the margin: n = α|Margin|

    • α is chosen by cross-validation error

  • β is unavailable when using a non-linear kernel

  • Alternative methods

    • L1-norm SVM → forces some βi to exactly zero
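Hard-thresholding at a split can be sketched as zeroing the n weights with the smallest |βi|. This assumes n comes from the slide's rule n = α|Margin|, rounded to an integer and capped at the number of features (the rounding and capping are assumptions of this sketch). The weights, α and margin are made-up values, and choosing α by cross-validation is not shown.

```python
def hard_threshold(beta, n_discard):
    """Zero out the n_discard weights with the smallest |beta_i|; keep the rest unchanged."""
    n_discard = max(0, min(n_discard, len(beta)))
    order = sorted(range(len(beta)), key=lambda i: abs(beta[i]))  # ascending |beta_i|
    dropped = set(order[:n_discard])
    return [0.0 if i in dropped else b for i, b in enumerate(beta)]

def n_to_discard(alpha, margin, p):
    """n = alpha * |Margin|, rounded and capped at p features (hypothetical helper)."""
    return min(p, int(round(alpha * abs(margin))))

beta = [0.9, -0.05, 0.3, -1.2, 0.01]                 # hypothetical split weights
n = n_to_discard(alpha=2.0, margin=1.5, p=len(beta))  # n = 3
print(hard_threshold(beta, n))                        # [0.9, 0.0, 0.0, -1.2, 0.0]
```

Only the surviving nonzero weights (features 0 and 3 here) would be reported to a domain expert for this split.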


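Setting βi = 0

Cutting plane: $x^T\beta + \beta_0 = 0$

Decision = $\mathrm{Sign}(x^T\beta + \beta_0)$

With βi = 0, feature xi drops out of both: it no longer affects the cutting plane or the decision.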



Discussion

  • Good for multi-class, high-dimensional data

  • Bad for data that are not linearly separable.

    • Each node will contain impure data → impure β

  • Test performance is comparable to traditional multi-class max-margin classifiers (SVMs).

