
Oblique Decision Trees Using Householder Reflection



Presentation Transcript


  1. Oblique Decision Trees Using Householder Reflection ChitrakaWickramarachchi Dr. Blair Robertson Dr. Marco Reale Dr. Chris Price Prof. Jennifer Brown

  2. Outline of the Presentation • Introduction • Literature Review • Methodology • Results and Discussion

  3. Introduction Example: a bank wants to predict the status (default or not) of a new credit card customer. For its existing customers, the bank has the following data. A possible approach is a generalized linear model with binomial errors, but such a model becomes complex if the structure of the data is complex.
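
For concreteness, a minimal sketch of that GLM baseline using scikit-learn's LogisticRegression (a GLM with binomial errors and logit link). The features and data here are made up for illustration, not the bank data from the slide:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical training data: three customer features;
# label 1 = default, 0 = no default.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 1] - X[:, 0] + rng.normal(scale=0.5, size=200) > 0).astype(int)

# Logistic regression: a GLM with binomial errors and logit link.
model = LogisticRegression().fit(X, y)
print(model.predict(X[:5]))        # predicted default status
print(model.predict_proba(X[:5]))  # estimated default probabilities
```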

  4. Decision Tree (DT) A decision tree is a tree-structured classifier. [Figure: an example tree. The root node tests Salary <= s; non-terminal nodes apply further feature-based tests such as TCA < tc and TLA < tl; terminal nodes carry the class labels D and ND.]

  5. Partitions Recursively partition the feature space into disjoint sub-regions until each sub-region is homogeneous with respect to a particular class. [Figure: example partition of the (X1, X2) feature space.]

  6. Choosing the best split [Figure: a worked example with candidate splits such as X2 <= 0.6819, X1 <= 0.4026 and X1 <= 0.5713, each annotated with a numeric split score (e.g. 0.0345, 0.1586); the split with the best score is chosen.]
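
The slides do not name the split criterion behind these numbers; the sketch below assumes the Gini index used by CART, and all function names are mine. It grows an axis-parallel tree in the spirit of slides 5-6: evaluate every candidate split, keep the one with the lowest weighted child impurity, and recurse until each sub-region is homogeneous.

```python
import numpy as np

def gini(y):
    """Gini impurity of a label vector y."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_axis_parallel_split(X, y):
    """Return (feature index, threshold) minimising weighted child impurity."""
    best_j, best_t, best_score = None, None, np.inf
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:      # candidate thresholds
            left = X[:, j] <= t
            score = (left.mean() * gini(y[left])
                     + (1 - left.mean()) * gini(y[~left]))
            if score < best_score:
                best_j, best_t, best_score = j, t, score
    return best_j, best_t

def grow(X, y):
    """Recursively partition until each sub-region is homogeneous."""
    if gini(y) == 0.0:                         # pure node: make it terminal
        return {"label": y[0]}
    j, t = best_axis_parallel_split(X, y)
    left = X[:, j] <= t
    return {"feature": j, "threshold": t,
            "left": grow(X[left], y[left]),
            "right": grow(X[~left], y[~left])}
```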

  7. Types of DTs Decision trees divide into univariate DTs, which use axis-parallel splits, and multivariate DTs; multivariate DTs in turn divide into linear DTs, which use oblique splits, and non-linear DTs.

  8. Axis-parallel splits Advantages • Easy to implement • Computational complexity is low • Easy to interpret Disadvantage • When the true class boundaries are not axis-parallel, the tree produces a complicated boundary structure

  9. Axis-parallel boundaries [Figure: axis-parallel decision boundaries in the (X1, X2) plane.]

  10. Oblique splits Advantage: simple boundary structure. [Figure: a single oblique boundary in the (X1, X2) plane.]

  11. Oblique splits Disadvantages • Implementation is challenging • Computational complexity is high A computationally less expensive oblique tree induction method would therefore be desirable.

  12. Literature Review Oblique methods search for splits of the form a₁x₁ + a₂x₂ + ⋯ + aₚxₚ ≤ c. Breiman et al. (1984) • CART-LC • Starts with the best axis-parallel split • Perturbs each coefficient until the best split is found Limitations • Can get trapped in local minima • No upper bound on the time spent at any node
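
To make the perturbation idea concrete, here is a heavily simplified sketch, not Breiman et al.'s exact procedure (which uses a specific line search per coefficient): starting from the best axis-parallel split written as a·x <= c, cycle through the coefficients and keep any perturbation that improves the split. The impurity argument can be a function such as the Gini index sketched above.

```python
import numpy as np

def perturb_search(X, y, a, c, impurity, deltas=(-0.1, 0.1), sweeps=10):
    """Greedily perturb one coefficient of the split a.x <= c at a time,
    keeping any change that lowers the weighted impurity of the split."""
    def score(a_vec, c_val):
        left = X @ a_vec <= c_val
        if left.all() or not left.any():       # degenerate split
            return np.inf
        return (left.mean() * impurity(y[left])
                + (1 - left.mean()) * impurity(y[~left]))

    best = score(a, c)
    for _ in range(sweeps):                    # fixed sweep count for the sketch
        for i in range(len(a)):
            for d in deltas:
                trial = a.copy()
                trial[i] += d
                s = score(trial, c)
                if s < best:                   # may still stop at a local minimum
                    a, best = trial, s
    return a, c, best
```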

  13. Literature Review Heath et al. (1993) • Simulated Annealing Decision Trees (SADT) • First places a hyperplane in a canonical location • Perturbs each coefficient randomly Through randomization, it tries to escape from local minima Limitation • The algorithm runs much slower than CART-LC

  14. Literature Review Murthy et al. (1994) • Oblique Classifier 1 (OC1) • Starts with the best axis-parallel split • Perturbs each coefficient • At a local minimum, perturbs the hyperplane randomly Since 1994, many oblique DT induction methods have been developed based on evolutionary algorithms and neural network concepts

  15. Proposed Methodology Our approach is to • Transform the data set so that it is aligned with one of the feature axes • Apply axis-parallel splits • Back-transform the splits into the original space The transformation is done using a Householder reflection (see the sketches after slides 16 and 17).

  16. Householder Reflection Let X and Y be vectors with the same norm. Then there exists an orthogonal, symmetric matrix P such that PX = Y, where P = I − 2uuᵀ and u = (X − Y) / ‖X − Y‖.
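
A minimal NumPy check of this identity, with notation as in the slide:

```python
import numpy as np

def householder(x, y):
    """Orthogonal, symmetric P with P @ x = y, given ||x|| == ||y||."""
    u = (x - y) / np.linalg.norm(x - y)
    return np.eye(len(x)) - 2.0 * np.outer(u, u)

x = np.array([3.0, 4.0])
y = np.array([5.0, 0.0])                 # same norm as x
P = householder(x, y)
print(np.allclose(P @ x, y))             # True: P reflects x onto y
print(np.allclose(P, P.T))               # symmetric
print(np.allclose(P @ P, np.eye(2)))     # orthogonal; P is its own inverse
```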

  17. Householder Reflection The orientation of a cluster can be represented by the dominant eigenvector of its variance-covariance matrix. [Figure: a cluster in the (X1, X2) plane with its dominant eigenvector.]
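
Combining the two ideas, a minimal sketch of the proposed transformation (function and variable names are mine, not from the paper): compute the dominant eigenvector of the covariance matrix, build the Householder matrix that reflects it onto the first feature axis, and reflect the data. Because P is orthogonal and symmetric, it is its own inverse, so the same P back-transforms axis-parallel splits found in the reflected space into oblique splits in the original space.

```python
import numpy as np

def align_with_axis(X):
    """Reflect X so the dominant eigenvector of its covariance
    matrix lies along the first feature axis."""
    cov = np.cov(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    d = eigvecs[:, -1]                   # dominant eigenvector (largest eigenvalue)
    e1 = np.zeros(len(d)); e1[0] = 1.0
    if np.allclose(d, e1):               # already axis-aligned: nothing to do
        return np.eye(len(d)), X
    u = (d - e1) / np.linalg.norm(d - e1)
    P = np.eye(len(d)) - 2.0 * np.outer(u, u)   # Householder reflection
    return P, X @ P.T                    # reflected data (P is symmetric)

P, X_reflected = align_with_axis(np.random.default_rng(1).normal(size=(100, 2)))
# Axis-parallel splits found in X_reflected correspond to oblique
# splits in the original space; back-transform with P itself.
```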

  18. Householder Reflection

  19. Householder Reflection [Figure: the reflected data in the (X1, X2) plane.]

  20. Cost-complexity pruning To avoid over-fitting. [Figure: accuracy versus number of terminal nodes.]
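
For reference, the cost-complexity criterion of Breiman et al. (1984), which this pruning step presumably minimises; R(T) is the resubstitution error of tree T, |T̃| its number of terminal nodes, and α ≥ 0 the complexity parameter:

```latex
R_\alpha(T) = R(T) + \alpha \, |\tilde{T}|
```

Minimising R_α over a range of α yields a nested sequence of pruned subtrees, over which curves like the one in the figure are evaluated.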

  21. Results and Discussion Data sets: UCI Machine Learning Repository. Accuracy estimates were obtained from ten 5-fold cross-validation experiments.
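
A sketch of this evaluation protocol in scikit-learn; the stock axis-parallel DecisionTreeClassifier stands in here for the proposed method (whose implementation is not shown in the slides), and Iris serves as a representative UCI data set:

```python
import numpy as np
from sklearn.datasets import load_iris                 # a UCI data set
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
scores = []
for repeat in range(10):                               # ten 5-fold CV experiments
    cv = KFold(n_splits=5, shuffle=True, random_state=repeat)
    scores.extend(cross_val_score(DecisionTreeClassifier(random_state=0),
                                  X, y, cv=cv))
print(f"accuracy: {np.mean(scores):.3f} +/- {np.std(scores):.3f}")
```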

  22. Results and Discussion • Results • High accuracy • Computationally inexpensive

  23. THANK YOU
