
Additive Models, Trees, and Related Methods

Additive Models, Trees, and Related Methods. 2006. 02. 17. Partly based on Prof. Prem Goel's Slides. 9.1 Generalized Additive Models. Mean function; fj: unspecified smooth (nonparametric) functions. Relate the conditional mean of Y to an additive function of the X's via a link function g.

Presentation Transcript


  1. Additive Models, Trees, and Related Methods 2006. 02. 17. Partly based on Prof. Prem Goel’s Slides

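  2. 9.1 Generalized Additive Models • Mean function: $E(Y \mid X_1, X_2, \ldots, X_p) = \alpha + f_1(X_1) + f_2(X_2) + \cdots + f_p(X_p)$ • fj: unspecified smooth (nonparametric) functions • Relate the conditional mean $\mu(X)$ of Y to an additive function of the X's via a link function g: $g[\mu(X)] = \alpha + f_1(X_1) + \cdots + f_p(X_p)$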

  3. Standard Link Functions

  4. Advanced Link Functions

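  5. Fitting Additive Models • Fit each fj using a scatterplot smoother, and estimate all p functions simultaneously • For example, use the cubic smoothing spline as the smoother • Criterion: penalized sum of squares (9.7), $\mathrm{PRSS}(\alpha, f_1, \ldots, f_p) = \sum_{i=1}^{N}\Big(y_i - \alpha - \sum_{j=1}^{p} f_j(x_{ij})\Big)^2 + \sum_{j=1}^{p} \lambda_j \int f_j''(t_j)^2 \, dt_j$ • An additive cubic spline model minimizes this criterion • Each fj is a cubic spline in the component Xj • Knots at each of the unique values xij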

  6. The backfitting algorithm • Can accommodate other fitting methods in the same way, by specifying the appropriate smoothing operator Sj • For a large class of linear smoothers, backfitting is equivalent to a Gauss-Seidel algorithm for solving a certain linear system of equations
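The backfitting steps themselves are not in the transcript. Below is a minimal sketch, assuming a Gaussian-kernel (Nadaraya-Watson) smoother as a hypothetical stand-in for the cubic smoothing spline Sj: each sweep smooths the partial residuals against Xj and re-centers fj so that α stays identifiable. The function names, the bandwidth 0.3, and the tolerance are illustrative choices, not part of the slides.

```python
import numpy as np


def kernel_smoother(x, r, bandwidth=0.3):
    """Hypothetical stand-in for S_j: Gaussian-kernel (Nadaraya-Watson) smoother.

    Returns the smoothed values of r evaluated at each observed x.
    """
    d = (x[:, None] - x[None, :]) / bandwidth
    k = np.exp(-0.5 * d ** 2)
    return k @ r / k.sum(axis=1)


def backfit(X, y, smoother=kernel_smoother, tol=1e-6, max_iter=100):
    """Backfitting for the additive model y ~ alpha + sum_j f_j(X_j)."""
    n, p = X.shape
    alpha = y.mean()                 # initialize: alpha = mean(y), f_j = 0
    f = np.zeros((n, p))             # f[:, j] holds fitted f_j at x_ij
    for _ in range(max_iter):
        f_old = f.copy()
        for j in range(p):
            # partial residuals: remove alpha and every f_k with k != j
            partial = y - alpha - f.sum(axis=1) + f[:, j]
            f[:, j] = smoother(X[:, j], partial)
            f[:, j] -= f[:, j].mean()  # center f_j so alpha is identifiable
        if np.max(np.abs(f - f_old)) < tol:
            break
    return alpha, f


# Usage on simulated data
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(200, 2))
y = np.sin(2 * X[:, 0]) + X[:, 1] ** 2 + rng.normal(0, 0.1, 200)
alpha, f = backfit(X, y)
fitted = alpha + f.sum(axis=1)
```

Swapping in a different linear smoother only requires passing another `smoother` function, which mirrors the role of the operator Sj on the slide.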

  7. Additive Logistic Regression • For the logistic regression model and other generalized additive models, the appropriate criterion is a penalized log-likelihood. • To maximize it, the backfitting procedure is used in conjunction with a likelihood maximizer.

  8. Local Scoring Algorithm for the Additive Logistic Regression
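The local scoring iteration can be sketched as follows, again assuming a hypothetical Gaussian-kernel smoother in place of a smoothing spline. Each outer step forms the working response $z_i = \hat\eta_i + (y_i - \hat p_i)/[\hat p_i(1 - \hat p_i)]$ and weights $w_i = \hat p_i(1 - \hat p_i)$, then fits a weighted additive model to z by backfitting. The helper names, bandwidth, and fixed iteration counts are illustrative assumptions.

```python
import numpy as np


def weighted_smoother(x, r, w, bandwidth=0.3):
    """Hypothetical weighted Gaussian-kernel smoother (stand-in for a weighted spline)."""
    d = (x[:, None] - x[None, :]) / bandwidth
    k = np.exp(-0.5 * d ** 2) * w[None, :]
    return k @ r / k.sum(axis=1)


def weighted_backfit(X, z, w, n_sweeps=20):
    """Weighted backfitting of z ~ alpha + sum_j f_j(X_j)."""
    n, p = X.shape
    alpha = np.average(z, weights=w)
    f = np.zeros((n, p))
    for _ in range(n_sweeps):
        for j in range(p):
            partial = z - alpha - f.sum(axis=1) + f[:, j]
            f[:, j] = weighted_smoother(X[:, j], partial, w)
            f[:, j] -= np.average(f[:, j], weights=w)  # weighted centering
    return alpha, f


def local_scoring(X, y, n_outer=10):
    """Additive logistic regression: IRLS outer loop around weighted backfitting."""
    ybar = y.mean()
    alpha = np.log(ybar / (1 - ybar))   # start: logit of the mean, f_j = 0
    f = np.zeros(X.shape)
    for _ in range(n_outer):
        eta = np.clip(alpha + f.sum(axis=1), -30, 30)
        prob = 1.0 / (1.0 + np.exp(-eta))
        w = np.maximum(prob * (1 - prob), 1e-6)   # IRLS weights
        z = eta + (y - prob) / w                  # working response
        alpha, f = weighted_backfit(X, z, w)
    return alpha, f


# Usage on simulated binary data
rng = np.random.default_rng(1)
X = rng.uniform(-2, 2, size=(300, 2))
eta_true = 2 * np.sin(2 * X[:, 0]) + X[:, 1]
y = (rng.uniform(size=300) < 1 / (1 + np.exp(-eta_true))).astype(float)
alpha, f = local_scoring(X, y)
p_hat = 1 / (1 + np.exp(-(alpha + f.sum(axis=1))))
```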

  9. 9.2 Tree-Based Methods • Partition the feature space into a set of rectangles and fit a simple model in each one • CART and C4.5

  10. Regression Tree • Assume a recursive binary partition • In each region, Y is modeled by a different constant • For each split, choose the splitting variable and split point that minimize the sum of squares • Repeat with each resulting subset until a minimum node size is reached
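A minimal sketch of the greedy split search, assuming numeric inputs: for each variable, sort once and use cumulative sums so the RSS of every split point is evaluated in a single pass. The split point is taken as the largest value sent left. `grow` and `predict` are illustrative helper names, and `min_node_size=5` is an assumed default.

```python
import numpy as np


def best_split(X, y):
    """Return (j, s, rss) for the split X[:, j] <= s that minimizes total RSS."""
    n, p = X.shape
    best = (None, None, np.inf)
    for j in range(p):
        order = np.argsort(X[:, j])
        xs, ys = X[order, j], y[order]
        csum, csum2 = np.cumsum(ys), np.cumsum(ys ** 2)
        for i in range(1, n):            # left node = first i sorted points
            if xs[i] == xs[i - 1]:
                continue                 # no split point between tied values
            sl, sl2 = csum[i - 1], csum2[i - 1]
            sr, sr2 = csum[-1] - sl, csum2[-1] - sl2
            rss = (sl2 - sl ** 2 / i) + (sr2 - sr ** 2 / (n - i))
            if rss < best[2]:
                best = (j, xs[i - 1], rss)  # split at the largest left value
    return best


def grow(X, y, min_node_size=5):
    """Recursive binary partitioning until nodes reach min_node_size."""
    if len(y) <= min_node_size:
        return {"value": y.mean()}
    j, s, _ = best_split(X, y)
    if j is None:                        # all inputs tied: cannot split
        return {"value": y.mean()}
    left = X[:, j] <= s
    return {"var": j, "split": s,
            "left": grow(X[left], y[left], min_node_size),
            "right": grow(X[~left], y[~left], min_node_size)}


def predict(tree, x):
    """Follow the splits down to a terminal node and return its constant."""
    while "value" not in tree:
        tree = tree["left"] if x[tree["var"]] <= tree["split"] else tree["right"]
    return tree["value"]
```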

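  11. Regression Tree • How large should we grow the tree? • Cost-complexity pruning: grow a large tree $T_0$, then find the subtree $T \subseteq T_0$ that minimizes $C_\alpha(T) = \sum_{m=1}^{|T|} N_m Q_m(T) + \alpha |T|$, where $N_m Q_m(T)$ is the residual sum of squares in terminal node m and $|T|$ is the number of terminal nodes • Choose α adaptively by weakest-link pruning • Successively collapse the internal node that produces the smallest per-node increase in RSS, until we reach the single-node (root) tree • This sequence of subtrees contains the tree that minimizes the cost-complexity criterion for each α • Estimate α by cross-validation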
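Weakest-link pruning can be sketched on a hand-built tree. Each node here is a dict holding `rss`, the RSS the node would have as a leaf (its $N_m Q_m$), plus optional `left`/`right` children; this representation is an assumption for illustration. The internal node t with the smallest per-node increase $(R(t) - R(T_t))/(|T_t| - 1)$ is collapsed first, and $C_\alpha$ then selects one subtree from the resulting sequence.

```python
import copy


def n_leaves(node):
    """Number of terminal nodes |T| below (and including) node."""
    if "left" not in node:
        return 1
    return n_leaves(node["left"]) + n_leaves(node["right"])


def tree_rss(node):
    """RSS of the subtree rooted at node: sum of its leaves' RSS."""
    if "left" not in node:
        return node["rss"]
    return tree_rss(node["left"]) + tree_rss(node["right"])


def internal_nodes(node):
    if "left" in node:
        yield node
        yield from internal_nodes(node["left"])
        yield from internal_nodes(node["right"])


def weakest_link_sequence(tree):
    """Collapse weakest links until only the root remains; return (RSS, |T|) pairs."""
    tree = copy.deepcopy(tree)           # leave the caller's tree untouched
    seq = [(tree_rss(tree), n_leaves(tree))]
    while "left" in tree:
        # per-node increase in RSS if the subtree below t is collapsed
        weakest = min(internal_nodes(tree),
                      key=lambda t: (t["rss"] - tree_rss(t)) / (n_leaves(t) - 1))
        del weakest["left"], weakest["right"]      # t becomes a leaf
        seq.append((tree_rss(tree), n_leaves(tree)))
    return seq


def best_subtree(seq, alpha):
    """Index of the subtree minimizing C_alpha(T) = RSS(T) + alpha * |T|."""
    costs = [rss + alpha * size for rss, size in seq]
    return costs.index(min(costs))


tree = {"rss": 100.0,
        "left": {"rss": 30.0, "left": {"rss": 10.0}, "right": {"rss": 12.0}},
        "right": {"rss": 20.0}}
seq = weakest_link_sequence(tree)   # [(42.0, 3), (50.0, 2), (100.0, 1)]
```

Larger α favors smaller trees: with α = 10 the two-leaf subtree wins (cost 70 versus 72 and 110), and in practice α would be chosen by cross-validation.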

  12. Classification Trees • The only changes are in the criteria for splitting nodes and for pruning the tree.

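  13. Node Impurity Measures • With $\hat p_{mk}$ the proportion of class-k observations in node m and $k(m) = \arg\max_k \hat p_{mk}$: misclassification error $1 - \hat p_{mk(m)}$, Gini index $\sum_k \hat p_{mk}(1 - \hat p_{mk})$, cross-entropy $-\sum_k \hat p_{mk} \log \hat p_{mk}$ • Cross-entropy and the Gini index are more sensitive to changes in the node probabilities than the misclassification rate • Either cross-entropy or the Gini index should be used when growing the tree • When pruning, any of the three can be used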
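The sketch below computes the three impurity measures and compares two splits of a node with 400 observations in each of two classes. Both splits have misclassification rate 0.25, yet the Gini index and cross-entropy prefer the split that produces a pure node, which illustrates the sensitivity claim above. Natural logarithms are assumed for the entropy; the function names are illustrative.

```python
import numpy as np


def impurities(counts):
    """Misclassification error, Gini index, and cross-entropy for one node."""
    p = np.asarray(counts, dtype=float) / np.sum(counts)
    misclass = 1.0 - p.max()
    gini = np.sum(p * (1.0 - p))
    nz = p[p > 0]                        # treat 0 * log 0 as 0
    entropy = -np.sum(nz * np.log(nz))
    return np.array([misclass, gini, entropy])


def split_impurity(children):
    """Size-weighted average impurity of the child nodes of a split."""
    sizes = np.array([np.sum(c) for c in children], dtype=float)
    weights = sizes / sizes.sum()
    return sum(w * impurities(c) for w, c in zip(weights, children))


split_a = [(300, 100), (100, 300)]
split_b = [(200, 400), (200, 0)]
print(split_impurity(split_a))   # misclass 0.25, Gini 0.375, entropy about 0.562
print(split_impurity(split_b))   # misclass 0.25, Gini about 0.333, entropy about 0.477
```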

  14. Other Issues • Instability • Hierarchical process: an error in an upper split is propagated down to all the splits below it • Remedy: bagging • Lack of smoothness in the prediction surface • This can degrade performance in regression • Remedy: MARS • ROC curves • By varying the relative sizes of the losses L01 and L10 in the loss matrix, we can increase or decrease the sensitivity and specificity
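Assuming the convention that $L_{kk'}$ is the loss for classifying a class-k observation as class k', the Bayes rule predicts class 1 when $L_{10}\,\hat p_1 > L_{01}(1 - \hat p_1)$. The sketch sweeps L10 with L01 fixed at 1 and records (sensitivity, specificity) pairs, the points that trace out an ROC curve. The class-1 probabilities `p1` would come from a fitted tree; here they are simply given, and all names are illustrative.

```python
import numpy as np


def classify_with_losses(p1, L01, L10):
    """Bayes rule under a 2x2 loss matrix.

    L01: loss for classifying a class-0 observation as class 1.
    L10: loss for classifying a class-1 observation as class 0.
    Predict 1 when the expected loss of predicting 0 is larger.
    """
    return (L10 * p1 > L01 * (1 - p1)).astype(int)


def sensitivity_specificity(y, yhat):
    sens = np.mean(yhat[y == 1] == 1)   # true positive rate
    spec = np.mean(yhat[y == 0] == 0)   # true negative rate
    return float(sens), float(spec)


def roc_points(y, p1, L10_values, L01=1.0):
    return [sensitivity_specificity(y, classify_with_losses(p1, L01, L10))
            for L10 in L10_values]


y = np.array([0, 0, 1, 1])
p1 = np.array([0.1, 0.4, 0.35, 0.8])
print(roc_points(y, p1, [0.1, 1.0, 3.0]))  # [(0.0, 1.0), (0.5, 1.0), (1.0, 0.5)]
```

Raising L10 makes missing a class-1 observation more costly, so sensitivity rises while specificity falls.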

  15. 9.3 PRIM: Bump Hunting • Patient Rule Induction Method • Seeks boxes in which the response average is high • Boxes are not found by binary splits • The collection of rules is harder to interpret • Each individual rule is often simpler • Patient • Does not fragment the data as quickly as binary partitioning • This can help the top-down greedy algorithm find a better solution

  16. PRIM

  17. PRIM
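The two PRIM slides carry no text in this transcript. Below is a minimal sketch of the top-down peeling phase only, assuming numeric inputs: at each step, peel off roughly a fraction α (here 0.1) of the observations in the box from either the low or the high end of one input, keep the peel that leaves the highest response mean, and stop when about `min_count` observations remain. Bottom-up pasting and the cross-validated choice of box are omitted; the function name and defaults are illustrative.

```python
import numpy as np


def prim_peel(X, y, alpha=0.1, min_count=10):
    """Peeling phase of PRIM: returns a list of (lower, upper, box_mean)."""
    n, p = X.shape
    inside = np.ones(n, dtype=bool)
    lo, hi = X.min(axis=0).astype(float), X.max(axis=0).astype(float)
    boxes = [(lo.copy(), hi.copy(), y.mean())]
    while inside.sum() > min_count:
        best = None                      # (mean, j, side, keep)
        for j in range(p):
            xj = X[inside, j]
            peels = (("low", X[:, j] > np.quantile(xj, alpha)),
                     ("high", X[:, j] < np.quantile(xj, 1 - alpha)))
            for side, cond in peels:
                keep = inside & cond
                if keep.sum() < min_count:
                    continue             # peel would leave too few points
                m = y[keep].mean()
                if best is None or m > best[0]:
                    best = (m, j, side, keep)
        if best is None:
            break
        m, j, side, inside = best
        if side == "low":
            lo[j] = X[inside, j].min()   # compress the lower face
        else:
            hi[j] = X[inside, j].max()   # compress the upper face
        boxes.append((lo.copy(), hi.copy(), m))
    return boxes


# Usage: the response is 1 only in the corner x1 > 0.7, x2 > 0.7
rng = np.random.default_rng(2)
X = rng.uniform(0, 1, size=(500, 2))
y = ((X[:, 0] > 0.7) & (X[:, 1] > 0.7)).astype(float)
boxes = prim_peel(X, y)
lo, hi, box_mean = boxes[-1]
```

Because each peel removes only a small fraction of the data, the box shrinks gradually toward the high-response corner rather than being cut in half at each step.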

  18. 9.4 MARS: Multivariate Adaptive Regression Splines • Basic element: a reflected pair of piecewise linear basis functions, $(x - t)_+$ and $(t - x)_+$ • Form a reflected pair for each input Xj with knots at each observed value xij of that input • In total 2Np basis functions (if all input values are distinct)
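The collection of reflected pairs can be built directly. This sketch returns an N × 2Np matrix whose columns come in $(x - t)_+$, $(t - x)_+$ pairs, one pair per observed knot; the labels use 0-based input indices, and the function name is illustrative.

```python
import numpy as np


def reflected_pairs(X):
    """Basis collection: (X_j - t)_+ and (t - X_j)_+ for every observed t = x_ij."""
    n, p = X.shape
    columns, labels = [], []
    for j in range(p):                   # j is 0-based here
        for t in X[:, j]:
            columns.append(np.maximum(X[:, j] - t, 0.0))
            columns.append(np.maximum(t - X[:, j], 0.0))
            labels += [f"(X{j} - {t:g})+", f"({t:g} - X{j})+"]
    return np.column_stack(columns), labels


rng = np.random.default_rng(3)
X = rng.normal(size=(5, 3))
B, labels = reflected_pairs(X)
print(B.shape)                           # (5, 30): N rows, 2Np columns
```

Within each pair, $(x - t)_+ - (t - x)_+ = x - t$, so together the two functions span a linear function with a kink at t.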

  19. Model Building

  20. Forward Selection
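Forward selection can be sketched as a greedy search: starting from the constant $h_0(X) = 1$, every product of a term already in the model with a reflected pair is tried, and the pair giving the smallest least-squares RSS is added. The `max_terms` cap (which counts the constant) and the rule that an input appears at most once in any product are assumptions of this sketch; a practical implementation would update the fit incrementally instead of calling `lstsq` for every candidate.

```python
import numpy as np


def rss_of(B, y):
    """Least-squares RSS of regressing y on the columns of B."""
    coef, *_ = np.linalg.lstsq(B, y, rcond=None)
    r = y - B @ coef
    return float(r @ r)


def mars_forward(X, y, max_terms=5):
    """Greedy forward pass: add the product pair that most reduces the RSS."""
    n, p = X.shape
    B = np.ones((n, 1))                  # h_0(X) = 1
    used = [frozenset()]                 # inputs appearing in each term
    while B.shape[1] + 2 <= max_terms:
        best = None
        for m in range(B.shape[1]):      # parent term h_m already in the model
            for j in range(p):
                if j in used[m]:
                    continue             # assumed: each input at most once per product
                for t in np.unique(X[:, j]):
                    pos = B[:, m] * np.maximum(X[:, j] - t, 0.0)
                    neg = B[:, m] * np.maximum(t - X[:, j], 0.0)
                    rss = rss_of(np.column_stack([B, pos, neg]), y)
                    if best is None or rss < best[0]:
                        best = (rss, pos, neg, used[m] | {j})
        if best is None:
            break
        _, pos, neg, inputs = best
        B = np.column_stack([B, pos, neg])
        used += [inputs, inputs]
    return B, used


# Usage: the target is a single hinge in X0 with knot 0.2
rng = np.random.default_rng(4)
X = rng.uniform(-1, 1, size=(100, 2))
X[0, 0] = 0.2                            # make t = 0.2 an observed knot
y = np.maximum(X[:, 0] - 0.2, 0.0)
B, used = mars_forward(X, y, max_terms=3)
```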

  21. General Basis Selection Rule

  22. Backward Deletion

  23. Effective # of Parameters
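Backward deletion compares submodels with generalized cross-validation, $\mathrm{GCV}(\lambda) = \sum_{i=1}^{N}(y_i - \hat f_\lambda(x_i))^2 / (1 - M(\lambda)/N)^2$, where the effective number of parameters is $M(\lambda) = r + cK$: r linearly independent basis functions, K knots selected in the forward pass, and c = 3 (c = 2 when the model is restricted to be additive). The helper below is a minimal sketch of that criterion; its name and arguments are hypothetical.

```python
import numpy as np


def mars_gcv(y, y_hat, n_basis, n_knots, c=3.0):
    """GCV = RSS / (1 - M/N)^2 with effective parameters M = r + c*K."""
    n = len(y)
    m_eff = n_basis + c * n_knots
    if m_eff >= n:
        raise ValueError("effective number of parameters must be below N")
    rss = np.sum((np.asarray(y) - np.asarray(y_hat)) ** 2)
    return rss / (1.0 - m_eff / n) ** 2


y = np.zeros(10)
y_hat = np.zeros(10)
y_hat[0] = 1.0
print(mars_gcv(y, y_hat, n_basis=3, n_knots=1))  # about 6.25: RSS 1, M = 6, N = 10
```

Charging c parameters per knot, rather than one, accounts for the fact that knot locations were chosen adaptively.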

  24. Other Issues • MARS for classification • Two classes: code the response as 0/1 and treat it as a regression problem • More than two classes: optimal scoring (Section 12.5) • MARS vs. CART • Piecewise linear basis functions vs. step functions • Multiplication of basis functions vs. splitting of nodes • MARS does not necessarily produce a binary-tree structure

  25. 9.5 Hierarchical Mixtures of Experts • A tree whose splits are soft (probabilistic) decisions made by gating networks, with an expert network at each terminal node

  26. Hierarchical Mixtures of Experts
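A two-level HME with linear-regression experts predicts a gate-weighted average of the expert outputs, $\sum_j g_j(x) \sum_\ell g_{\ell \mid j}(x)\, \beta_{j\ell}^T x$, where each gate is a softmax of linear functions of x. The sketch assumes X already contains a column of ones for intercepts; the argument names are illustrative, and parameter estimation by EM is not shown.

```python
import numpy as np


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)    # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def hme_mean(X, gamma_top, gamma_low, beta):
    """Mean prediction of a two-level HME with linear-regression experts.

    X:         (N, p) inputs, assumed to include a column of ones
    gamma_top: (J, p) top-level gating parameters
    gamma_low: (J, L, p) second-level gating parameters
    beta:      (J, L, p) expert regression coefficients
    """
    g_top = softmax(X @ gamma_top.T)                          # g_j(x): (N, J)
    g_low = softmax(np.einsum("np,jlp->njl", X, gamma_low))   # g_{l|j}(x): (N, J, L)
    experts = np.einsum("np,jlp->njl", X, beta)               # beta_jl^T x
    return np.einsum("nj,njl,njl->n", g_top, g_low, experts)


# Usage: with all gating parameters zero, every gate is uniform,
# so the prediction is the plain average of the four experts.
rng = np.random.default_rng(5)
X = np.column_stack([np.ones(4), rng.normal(size=4)])
beta = rng.normal(size=(2, 2, 2))
pred = hme_mean(X, np.zeros((2, 2)), np.zeros((2, 2, 2)), beta)
```

Large gating coefficients push the gates toward 0/1 and recover hard, CART-like routing; moderate ones blend neighboring experts smoothly.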

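  27. Hierarchical Mixtures of Experts • Estimation of parameters • EM algorithm • E-step: compute the expectations of the latent branching indicators (the posterior gating probabilities) given the current parameter values • M-step: use these expectations as observation weights to estimate the parameters of the expert networks; the gating-network parameters are estimated by a version of multiple logistic regression • HME vs. CART • Similar to CART with linear-combination splits • Soft splits can better model a gradual transition in the response • There is no known method for finding a good tree topology for an HME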

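  28. 9.6 Missing Data • Key question: whether the missing-data mechanism has distorted the observed data • Let R indicate which values are missing and Z be the complete data • Missing at random (MAR): the missing-data mechanism may depend on the observed data but not on the unobserved values, $\Pr(\mathbf{R} \mid \mathbf{Z}, \theta) = \Pr(\mathbf{R} \mid \mathbf{Z}_{obs}, \theta)$ • Missing completely at random (MCAR): the missing-data mechanism is independent of the data, $\Pr(\mathbf{R} \mid \mathbf{Z}, \theta) = \Pr(\mathbf{R} \mid \theta)$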

  29. Missing Data • Assuming MCAR, there are three approaches: • Discard observations with any missing values • Rely on the learning algorithm to deal with missing values in its training phase • Impute all missing values before training
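Two of these strategies are easy to sketch with NumPy, assuming missing entries are encoded as NaN: complete-case analysis drops any row with a missing value, and the simplest imputation replaces each missing entry with the mean of the observed values of that feature. Mean imputation is only one choice (the median, or a model-based prediction, are common alternatives); the function names are illustrative.

```python
import numpy as np


def complete_cases(X, y):
    """Discard observations (rows) with any missing value."""
    keep = ~np.isnan(X).any(axis=1)
    return X[keep], y[keep]


def mean_impute(X):
    """Replace each NaN with the mean of the observed values in its column."""
    X = np.array(X, dtype=float)         # work on a copy
    col_means = np.nanmean(X, axis=0)
    rows, cols = np.nonzero(np.isnan(X))
    X[rows, cols] = col_means[cols]
    return X


X = np.array([[1.0, np.nan], [3.0, 4.0], [np.nan, 8.0]])
y = np.array([0.0, 1.0, 1.0])
X_imp = mean_impute(X)           # rows become [1, 6], [3, 4], [2, 8]
X_cc, y_cc = complete_cases(X, y)  # keeps only the second row
```

Under MCAR, complete-case analysis stays unbiased but discards data; imputation keeps every row at the cost of understating the variability of the imputed features.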

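  30. 9.7 Computational Considerations • Additive model fitting: O(mpN + pN log N), where m is the number of backfitting iterations • Trees: O(pN log N) for the initial sorting, plus typically another O(pN log N) for the split computations • MARS: O(NM³ + pM²N) to build an M-term model • HME: O(Np²) for the regressions and O(Np²K²) for a K-class multiple logistic regression at each M-step; the EM algorithm can take a long time to converge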
