
Decision Tree Pruning



  1. Decision Tree Pruning

  2. Problem Statement • We would like to output a small decision tree • Model selection • Tree building continues until the training error is zero • Option 1: Stop early • Stop when the decrease in the index function is small • Cons: may miss structure • Option 2: Prune after building.

  3. Pruning • Input: tree T • Sample: S • Output: tree T’ • Basic pruning: T’ is a sub-tree of T • Can only replace inner nodes by leaves • More advanced: • Replace an inner node by one of its children

  4. Reduced Error Pruning • Split the sample into two parts, S1 and S2 • Use S1 to build a tree • Use S2 to decide whether to prune • Process every inner node v after all of its children have been processed • Compute the observed error of T_v and of leaf(v) • If leaf(v) has no more errors, replace T_v by leaf(v) • A sketch of this pass follows.
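
A minimal sketch of this pass, assuming binary features and the illustrative Node structure below (the class and helper names are not from the slides):

from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    label: int                      # majority class at this node
    feature: Optional[int] = None   # split feature; None means leaf
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def is_leaf(self) -> bool:
        return self.feature is None

def predict(node: Node, x) -> int:
    while not node.is_leaf():
        node = node.right if x[node.feature] else node.left
    return node.label

def subtree_errors(node: Node, S) -> int:
    # observed errors of T_v on the sample reaching v
    return sum(predict(node, x) != y for x, y in S)

def leaf_errors(node: Node, S) -> int:
    # observed errors if v were replaced by leaf(v), i.e. its majority label
    return sum(node.label != y for _, y in S)

def reduced_error_prune(v: Node, S2) -> Node:
    # Bottom-up pass on the pruning sample S2: every inner node is
    # processed after its children.
    if v.is_leaf():
        return v
    left_S = [(x, y) for x, y in S2 if not x[v.feature]]
    right_S = [(x, y) for x, y in S2 if x[v.feature]]
    v.left = reduced_error_prune(v.left, left_S)
    v.right = reduced_error_prune(v.right, right_S)
    if leaf_errors(v, S2) <= subtree_errors(v, S2):  # ties favour the smaller tree
        return Node(label=v.label)                   # replace T_v by leaf(v)
    return v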

  5. Reduced Error Pruning: Example

  6. Pruning: CV & SRM • For each pruning size, compute the minimal-error pruning • This yields at most m different sub-trees • Select between the prunings using: • Cross-validation • Structural Risk Minimization • Any other index method

  7. Finding the Minimum Pruning • Procedure Compute • Inputs: • k: the number of errors allowed • T: tree • S: sample • Outputs: • P: pruned tree • size: size of P

  8. Procedure Compute • IF IsLeaf(T) • IF Errors(T) ≤ k THEN size = 1 ELSE size = ∞ • P = T; return • IF Errors(root(T)) ≤ k • size = 1; P = root(T); return

  9. Procedure Compute • FOR i = 0 TO k DO • Call Compute(i, T[0], S0, size_{i,0}, P_{i,0}) • Call Compute(k-i, T[1], S1, size_{i,1}, P_{i,1}) • size = min_i { size_{i,0} + size_{i,1} + 1 } • I = argmin_i { size_{i,0} + size_{i,1} + 1 } • P = MakeTree(root(T), P_{I,0}, P_{I,1}) • What is the time complexity? • A sketch of the full procedure follows.
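
A sketch of Compute covering both the base cases (slide 8) and the recursion (slide 9), reusing Node, subtree_errors, and leaf_errors from the earlier sketch; treating S0 and S1 as the parts of S routed to the two children is an assumption about the slides' notation:

import math

def compute(k: int, T: Node, S):
    # Returns (P, size): a smallest pruning of T with at most k observed
    # errors on S; size is math.inf when no such pruning exists.
    if T.is_leaf():
        if subtree_errors(T, S) <= k:
            return T, 1
        return T, math.inf
    # Pruning T to the single leaf root(T) is the smallest candidate.
    if leaf_errors(T, S) <= k:
        return Node(label=T.label), 1
    S0 = [(x, y) for x, y in S if not x[T.feature]]
    S1 = [(x, y) for x, y in S if x[T.feature]]
    best_P, best_size = T, math.inf
    for i in range(k + 1):   # split the error budget k between the children
        P0, size0 = compute(i, T.left, S0)
        P1, size1 = compute(k - i, T.right, S1)
        if size0 + size1 + 1 < best_size:
            best_size = size0 + size1 + 1
            best_P = Node(label=T.label, feature=T.feature, left=P0, right=P1)
    return best_P, best_size

On the slide's closing question, one standard answer: memoising on (node, budget) pairs and capping the budget at a node by m_v makes the full table of minimal prunings computable in quadratic time, the O(m²) figure revisited on slide 12.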

  10. Cross Validation • Split the sample into S1 and S2 • Build a tree using S1 • Compute the candidate prunings • Select using S2 • Output the pruning with the smallest error on S2 (see the sketch below)
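
A sketch of the selection step, reusing compute and subtree_errors from the earlier sketches; the candidate set is one minimal pruning per error budget on S1, as on slide 6:

import math

def cv_select(T: Node, S1, S2) -> Node:
    # T was built from S1; pick the candidate pruning with the smallest
    # observed error on the held-out sample S2.
    best, best_err = T, subtree_errors(T, S2)
    for k in range(len(S1) + 1):
        P, size = compute(k, T, S1)
        if size == math.inf:
            continue
        err = subtree_errors(P, S2)
        if err < best_err:
            best, best_err = P, err
    return best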

  11. SRM • Build a tree T using S • Compute the candidate prunings • k_d: the size of the minimal pruning with d errors • Select using the SRM formula • A sketch of the selection step follows.
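
A sketch of the SRM selection, again reusing compute; the transcript omits the exact SRM formula, so the penalty below borrows the O(sqrt(r/m)) shape from slide 19 with its constant taken as 1, which is an assumption rather than the slides' formula:

import math

def srm_select(T: Node, S, log_H: float, delta: float) -> Node:
    m = len(S)                        # assumes a non-empty sample
    best, best_score = T, math.inf
    for d in range(m + 1):            # d = number of training errors allowed
        P, k_d = compute(d, T, S)     # k_d = size of the minimal pruning with d errors
        if k_d == math.inf:
            continue
        # assumed penalty: sqrt((size * log|H| + log(m/delta)) / m)
        score = d / m + math.sqrt((k_d * log_H + math.log(m / delta)) / m)
        if score < best_score:
            best, best_score = P, score
    return best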

  12. Drawbacks • Running time: since |T| = O(m), the running time is O(m²) • Many passes over the data • A significant drawback for large data sets

  13. Linear-Time Pruning • Single bottom-up pass • Linear running time • Uses an SRM-like formula • Local soundness • Competitive with any pruning

  14. Algorithm • Process a node after processing its children • Local parameters: • T_v: the current sub-tree at v, of size size_v • S_v: the sample reaching v, of size m_v • l_v: the length of the path leading to v • Local test: obs(T_v, S_v) + a(m_v, size_v, l_v, δ) > obs(root(T_v), S_v) • When the test holds, T_v is replaced by the leaf root(T_v); a sketch follows.
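
A sketch of the bottom-up pass, reusing the earlier Node and error helpers; the concrete a() below instantiates slide 19's O(sqrt(r_v/m_v)) shape with constant 1, an assumption about the omitted formula:

import math

def tree_size(v: Node) -> int:
    return 1 if v.is_leaf() else 1 + tree_size(v.left) + tree_size(v.right)

def bottom_up_prune(v: Node, Sv, lv: int, log_H: float, delta: float) -> Node:
    # Process a node after processing its children.
    if v.is_leaf():
        return v
    S0 = [(x, y) for x, y in Sv if not x[v.feature]]
    S1 = [(x, y) for x, y in Sv if x[v.feature]]
    v.left = bottom_up_prune(v.left, S0, lv + 1, log_H, delta)
    v.right = bottom_up_prune(v.right, S1, lv + 1, log_H, delta)
    mv = len(Sv)
    if mv == 0:                       # no sample reaches v: fall back to a leaf
        return Node(label=v.label)
    size_v = tree_size(v)
    r_v = (lv + size_v) * log_H + math.log(mv / delta)
    a = math.sqrt(r_v / mv)           # assumed form of a(m_v, size_v, l_v, delta)
    # Local test: obs(T_v, S_v) + a > obs(root(T_v), S_v)  =>  prune T_v to leaf(v)
    if subtree_errors(v, Sv) / mv + a > leaf_errors(v, Sv) / mv:
        return Node(label=v.label)
    return v

This sketch re-partitions S_v at every node for clarity; a genuinely linear-time implementation would gather m_v, the two observed error counts, and size_v in the same single pass.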

  15. The function a() • Parameters: • paths(H, l): the set of paths of length l over H • trees(H, s): the set of trees of size s over H • Formula: (shown as an image in the original; not recoverable from the transcript)

  16. The function a() • Finite class H: • |paths(H, l)| < |H|^l • |trees(H, s)| < (4|H|)^s • Formula: (image omitted) • Infinite classes: use the VC dimension
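
The formula itself appears only as an image in the transcript; plugging the two counting bounds into a union-bound argument suggests a shape consistent with r_v as defined on slide 19, offered as a reconstruction rather than the slides' exact formula:

\[
a(m_v, \mathrm{size}_v, l_v, \delta)
  = O\left( \sqrt{ \frac{ \mathrm{size}_v \log(4|H|) + l_v \log|H| + \log(m_v/\delta) }{ m_v } } \right)
\]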

  17. Example • [Figure omitted: an example tree with path length l_v = 3, annotated with m, m_v, size_v, and a(m_v, size_v, l_v, δ)]

  18. Local Uniform Convergence • Sample S • S_c = { x ∈ S | c(x) = 1 }, m_c = |S_c| • Finite classes C and H • e(h|c) = Pr[ h(x) ≠ f(x) | c(x) = 1 ] • obs(h|c): its observed analogue on S_c • Lemma: with probability 1 - δ: (bound shown as an image; omitted)
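
The lemma's bound is likewise an image in the transcript; for finite classes C and H, a standard Hoeffding-plus-union-bound statement of local uniform convergence has the following form, offered here as an assumed reconstruction:

\[
\Pr\left[ \exists\, h \in H,\ c \in C : \left| e(h \mid c) - \mathrm{obs}(h \mid c) \right| > \sqrt{ \frac{ \ln( 2|H||C| / \delta ) }{ 2 m_c } } \right] \le \delta
\]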

  19. Global Analysis • Notation: • T: the original tree (depends on S) • T*: the pruned tree • T_opt: the optimal pruning • r_v = (l_v + size_v) log|H| + log(m_v/δ) • a(m_v, size_v, l_v, δ) = O( sqrt( r_v / m_v ) )

  20. Sub-Tree Property • Lemma: with probability 1 - δ, T* is a sub-tree of T_opt • Proof: • Assume that all the local lemmas hold • Each pruning step reduces the error • Suppose T* had a subtree outside T_opt • Adding that subtree to T_opt would improve it, a contradiction!

  21. Comparing T* and T_opt • Additional pruned nodes: V = { v_1, … , v_t } • Additional error: • e(T*) - e(T_opt) = Σ_i ( e(v_i) - e(T*_{v_i}) ) Pr[v_i] • Claim: with high probability: (bound shown as an image; omitted)

  22. Analysis • Lemma: with probability 1 - δ: • IF Pr[v_i] > 12 ( l_opt log|H| + log(1/δ) ) / m = b • THEN Pr[v_i] > 2 obs(v_i) • Proof: • Relative Chernoff bound • Union bound over the |H|^l paths • Define V’ = { v_i ∈ V | Pr[v_i] > b }

  23. Analysis of Σ • The sum over V - V’ is bounded by size_opt · b

  24. Analysis of Σ • Σ m_v < l_opt · size_opt • Σ r_v < size_opt ( l_opt log|H| + log(m/δ) ) • Putting it all together: (final bound shown as an image; omitted)
