
Near-Minimax Optimal Learning with Decision Trees


Presentation Transcript


  1. Near-Minimax Optimal Learning with Decision Trees. Rob Nowak and Clay Scott, University of Wisconsin-Madison and Rice University. nowak@engr.wisc.edu. Supported by the NSF and the ONR.

  2. Basic Problem. Classification: build a decision rule based on labeled training data. Given n training points, how well can we do?

  3. Smooth Decision Boundaries. Suppose that the Bayes decision boundary behaves locally like a Lipschitz function (Mammen & Tsybakov '99).

  4. Dyadic Thinking about Classification Trees. Recursive dyadic partition (RDP).

  5. Dyadic Thinking about Classification Trees. Pruned dyadic partition and pruned dyadic tree. The hierarchical structure facilitates optimization.

  6. The Classification Problem.
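
  The slide's formal problem statement did not survive extraction; the standard setup, consistent with the rest of the deck, is the following (a reconstruction, not the slide's own formula):

      (X, Y) \in [0,1]^d \times \{0,1\} \sim P \ \text{(unknown)}, \qquad
      \{(X_i, Y_i)\}_{i=1}^n \ \text{i.i.d.} \sim P

      \text{Goal: find } f : [0,1]^d \to \{0,1\} \ \text{minimizing the risk} \
      R(f) = \mathbb{P}(f(X) \neq Y).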

  7. Classifiers. The Bayes classifier and the minimum empirical risk classifier; standard definitions below.
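
  The slide's own equations were lost in extraction; both definitions are standard:

      \eta(x) = \mathbb{P}(Y = 1 \mid X = x), \qquad
      f^*(x) = \mathbf{1}\{\eta(x) \ge 1/2\}

      \hat{R}_n(f) = \frac{1}{n}\sum_{i=1}^n \mathbf{1}\{f(X_i) \neq Y_i\}, \qquad
      \hat{f}_n = \arg\min_{f \in \mathcal{F}} \hat{R}_n(f)

  The Bayes classifier f^* minimizes the true risk R(f); its risk R(f^*) = R^* is the Bayes error.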

  8. Generalization Error Bounds

  9. Generalization Error Bounds

  10. Generalization Error Bounds
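
  Slides 8-10 carried the bound itself, which did not survive extraction. A standard bound of this type, presumably close to what the slides showed, is the Occam/prefix-code bound: if each classifier f in a countable class is assigned a prefix codelength c(f), then with probability at least 1 - \delta over the n training samples, simultaneously for all f,

      R(f) \le \hat{R}_n(f) + \sqrt{\frac{c(f)\ln 2 + \ln(1/\delta)}{2n}}

  This follows from Hoeffding's inequality and a union bound with weights 2^{-c(f)}, which sum to at most one by the Kraft inequality.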

  11. Selecting a good h
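
  Here h presumably denotes the candidate classifier being selected; given the bound above, the natural rule is penalized empirical risk minimization (a sketch, assuming the deck's prefix-code penalty):

      \hat{h} = \arg\min_{h} \left[ \hat{R}_n(h) + \sqrt{\frac{c(h)\ln 2 + \ln(1/\delta)}{2n}} \right]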

  12. Convergence to Bayes Error

  13. Ex. Dyadic Classification Trees. [Figure: Bayes decision boundary and labeled training data; complete RDP; pruned RDP; the resulting dyadic classification tree.]

  14. Codes for DCTs. Code-lengths: one bit per node (0 = internal split, 1 = leaf), plus one bit per leaf label. Ex: code 0001001111 + 6 bits for leaf labels.
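
  A minimal sketch of this encoding in Python, assuming the bit convention just described (preorder traversal, 0 for an internal split, 1 for a leaf, label bits appended at the end; this matches the example's six 1-bits and six leaf labels). The class and function names are illustrative, not from the slides:

      from dataclasses import dataclass
      from typing import Optional

      @dataclass
      class Node:
          """A node of a full binary tree; leaves carry a 0/1 class label."""
          label: int = 0                     # used only at leaves
          left: Optional["Node"] = None
          right: Optional["Node"] = None

      def encode(tree: Node) -> str:
          """Prefix code: one structure bit per node, then one bit per leaf label."""
          structure, labels = [], []
          def visit(node: Node) -> None:
              if node.left is None and node.right is None:   # leaf
                  structure.append("1")
                  labels.append(str(node.label))
              else:                                          # internal split
                  structure.append("0")
                  visit(node.left)
                  visit(node.right)
          visit(tree)
          return "".join(structure + labels)

      # Example: a stump with two leaves -> structure "011", labels "01".
      stump = Node(left=Node(label=0), right=Node(label=1))
      assert encode(stump) == "01101"

  Since a full binary tree with k leaves has 2k - 1 nodes, the codelength is (2k - 1) + k bits.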

  15. Error Bounds for DCTs. Comparison with CART.

  16. Rate of Convergence. Suppose that the Bayes decision boundary behaves locally like a Lipschitz function (Mammen & Tsybakov '99; C. Scott & RN '02).
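
  The rate expressions on this slide were lost. From the cited papers, the comparison is presumably between the minimax rate for this class and the rate attained by DCTs under the global bound; roughly, up to constants and log factors (a reconstruction):

      \text{minimax, over distributions with Lipschitz Bayes boundary:}\quad
      \inf_{\hat{f}_n} \sup_P \; \mathbb{E}[R(\hat{f}_n)] - R^* \asymp n^{-1/d}

      \text{DCTs with the global bound:}\quad
      \mathbb{E}[R(\hat{f}_n)] - R^* = O\!\big(n^{-1/(d+1)}\big)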

  17. Why too slow? Because the Bayes boundary is a (d-1)-dimensional manifold, "good" trees are unbalanced, yet the global bound favors all |T|-leaf trees equally.

  18. Local Error Bounds in Classification. Spatial Error Decomposition (Mansour & McAllester '00):
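
  The decomposition itself did not survive extraction; in standard form (a reconstruction), the error of a tree T splits across the cells A of the partition \pi(T) induced by its leaves:

      R(T) = \sum_{A \in \pi(T)} \mathbb{P}\big(T(X) \neq Y,\; X \in A\big)

  so the deviation R(T) - \hat{R}_n(T) can be bounded cell by cell rather than globally.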

  19. Relative Chernoff Bound

  20. Relative Chernoff Bound
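
  Slides 19-20 presumably stated the bound and its consequence. The relative (multiplicative) Chernoff bound for the empirical frequency \hat{p} of an event with probability p, estimated from n i.i.d. samples, reads

      \mathbb{P}\big(\hat{p} \le (1-\epsilon)\,p\big) \le e^{-n p \epsilon^2 / 2}

  or equivalently, with probability at least 1 - \delta,

      p \le \hat{p} + \sqrt{\frac{2\,p \ln(1/\delta)}{n}}

  The deviation scales with \sqrt{p}, so cells carrying little probability mass receive proportionally tighter bounds; this is what makes the local analysis pay off.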

  21. Local Error Bounds in Classification

  22. Bounded Densities
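
  The content of this slide was lost; the role of the assumption is presumably the standard one: if the density of X is bounded above by some constant c, then a dyadic cell A at depth j satisfies

      \mathbb{P}(X \in A) \le c \cdot \mathrm{vol}(A) = c\, 2^{-j}

  which supplies the small-volume factor used below.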

  23. Global vs. Local. Key: local complexity is offset by small volumes!

  24. Local Bounds for DCTs

  25. Unbalanced Tree: J leaves, depth J - 1. Global bound vs. local bound (sketched below).
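
  The two bounds were equations on the slide; a back-of-the-envelope version (assuming a bounded density, so a depth-j cell carries mass at most ~2^{-j}, and writing only the complexity terms): the global bound charges the whole tree

      \sqrt{\frac{c(T)\ln 2 + \ln(1/\delta)}{2n}} \sim \sqrt{\frac{J}{n}}

  while the local bound charges each leaf in proportion to its mass,

      \sum_{j=1}^{J-1} \sqrt{\frac{2^{-j}\,\big(j + \ln(1/\delta)\big)}{n}} = O\!\left(\frac{1}{\sqrt{n}}\right)

  since \sum_j \sqrt{j\, 2^{-j}} converges. The deep, small-volume leaves of the unbalanced tree are nearly free under the local bound.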

  26. Convergence to Bayes Error (Mammen & Tsybakov '99; C. Scott & RN '03).
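
  The final rate was lost from the slide; per the NIPS 2003 work this deck accompanies, the local analysis brings DCTs to within a logarithmic factor of the minimax rate, presumably of the form

      \mathbb{E}[R(\hat{f}_n)] - R^* = O\!\left(\left(\frac{\log n}{n}\right)^{1/d}\right)

  matching n^{-1/d} up to the log factor, hence "near-minimax optimal."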

  27. Concluding Remarks. Data-dependent bound. Neural Information Processing Systems 2002, 2003. nowak@engr.wisc.edu
