
Boosting (2)


Presentation Transcript


  1. Boosting (2) Understanding boosting as an additive model. Boosted trees.

  2. Boosting Construct a sequence of weak classifiers and combine them into a strong classifier by a weighted majority vote. "Weak" means better than random coin-tossing. Some properties: flexible; able to do feature selection; good generalization; but it can fit noise.
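A minimal sketch (not from the slides) of the weighted majority vote: each weak classifier G_m votes in {-1, +1} and the votes are combined with weights β_m. The function name and example data are illustrative only.

```python
import numpy as np

def weighted_majority_vote(weak_preds, betas):
    """weak_preds: (M, n) array of +/-1 votes; betas: (M,) weights."""
    scores = betas @ weak_preds        # sum_m beta_m * G_m(x_i) for every point i
    return np.sign(scores)             # the strong classifier's vote

# Example: three weak classifiers voting on two points.
votes = np.array([[ 1, -1],
                  [ 1,  1],
                  [-1, -1]])
print(weighted_majority_vote(votes, np.array([0.5, 0.6, 0.3])))  # -> [ 1. -1.]
```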

  3. Boosting

  4. Boosting

  5. Boosting Example with 10 predictors. The weak classifier is a stump: a two-level tree with a single split (two terminal nodes).
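A minimal sketch of the stump weak learner this slide assumes: a single split on one feature, chosen by brute-force search to minimize the weighted misclassification rate. The function names and the exhaustive threshold search are my own illustration, not the slides' code.

```python
import numpy as np

def fit_stump(X, y, w):
    """X: (n, p); y in {-1, +1}; w: nonnegative sample weights.
    Returns (j, t, s): split feature j, threshold t, sign s,
    predicting s where X[:, j] > t and -s otherwise."""
    best = (0, 0.0, 1, np.inf)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for s in (-1, 1):
                pred = np.where(X[:, j] > t, s, -s)
                err = np.sum(w * (pred != y))     # weighted error of this stump
                if err < best[3]:
                    best = (j, t, s, err)
    return best[:3]

def predict_stump(stump, X):
    j, t, s = stump
    return np.where(X[:, j] > t, s, -s)
```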

  6. Boosting Boosting can be seen as fitting an additive model of the general form f(x) = Σ_m β_m b(x; γ_m), where the β_m are expansion coefficients and the basis functions b(x; γ) are simple functions of the feature vector x with parameters γ. Examples of b(x; γ): the sigmoidal function in neural networks; a split in a tree model.

  7. Additive Model (regression)

  8. Boosting In general, such models are fit by minimizing a loss function over all coefficients and parameters jointly, min over {β_m, γ_m} of Σ_i L(y_i, Σ_m β_m b(x_i; γ_m)). This can be computationally intensive. An alternative is to proceed stepwise, at each step fitting the sub-problem of a single basis function: min over β, γ of Σ_i L(y_i, β b(x_i; γ)).

  9. Boosting Forward stagewise additive modeling: add new basis functions one at a time, without adjusting the parameters and coefficients of those already added. At step m, solve (β_m, γ_m) = argmin over β, γ of Σ_i L(y_i, f_{m-1}(x_i) + β b(x_i; γ)) and set f_m(x) = f_{m-1}(x) + β_m b(x; γ_m). Example: with squared loss, L(y_i, f_{m-1}(x_i) + β b(x_i; γ)) = (y_i - f_{m-1}(x_i) - β b(x_i; γ))², so each step simply fits the current residuals (see the sketch below). *The squared loss function is not good for classification.
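A minimal sketch of forward stagewise additive modeling with squared loss, using regression stumps (DecisionTreeRegressor with max_depth=1) as the basis functions b(x; γ). The synthetic data and the choice of M = 50 steps are illustrative assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=200)

f = np.zeros_like(y)              # f_0(x) = 0
basis = []
for m in range(50):               # add M basis functions, one per step
    r = y - f                     # squared loss: the sub-problem fits the residuals
    stump = DecisionTreeRegressor(max_depth=1).fit(X, r)
    basis.append(stump)
    f += stump.predict(X)         # f_m = f_{m-1} + b(x; gamma_m); earlier terms stay fixed
```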

  10. Boosting The version of AdaBoost we discussed uses the exponential loss function L(y, f(x)) = exp(-y f(x)). The basis functions are the individual weak classifiers G_m(x) ∈ {-1, +1}.

  11. Boosting Margin: y·f(x). A positive margin (y·f(x) > 0) means a correct classification; a negative margin means an incorrect one. The goal of classification is to produce margins that are as positive as possible, so negative margins should be penalized more. The exponential loss does exactly this, penalizing negative margins more heavily than positive ones.
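A small illustration (my own, not from the slides) of how different losses treat the margin y·f(x): misclassification loss is flat on each side of zero, squared loss also penalizes large positive margins, and exponential loss penalizes increasingly negative margins increasingly heavily.

```python
import numpy as np

margin = np.linspace(-2, 2, 9)                 # m = y * f(x)
misclass = (margin < 0).astype(float)          # 0/1 misclassification loss
exp_loss = np.exp(-margin)                     # AdaBoost's exponential loss
sq_loss = (1 - margin) ** 2                    # (y - f(x))^2 rewritten via the margin, y in {-1,+1}

for m, a, b, c in zip(margin, misclass, exp_loss, sq_loss):
    print(f"margin={m:+.1f}  0/1={a:.0f}  exp={b:6.2f}  squared={c:5.2f}")
```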

  12. Boosting With the exponential loss and a weak classifier G as the basis function, the problem to be solved at step m is (β_m, G_m) = argmin over β, G of Σ_i exp(-y_i (f_{m-1}(x_i) + β G(x_i))) = argmin over β, G of Σ_i w_i^(m) exp(-β y_i G(x_i)), where w_i^(m) = exp(-y_i f_{m-1}(x_i)). The weights w_i^(m) depend only on earlier iterations, so they are all fixed, independent of β and G.

  13. Boosting Each observation is either correctly or incorrectly classified, so the target function to be minimized can be written as exp(-β) Σ_{y_i = G(x_i)} w_i^(m) + exp(β) Σ_{y_i ≠ G(x_i)} w_i^(m) = (exp(β) - exp(-β)) Σ_i w_i^(m) I(y_i ≠ G(x_i)) + exp(-β) Σ_i w_i^(m). For any β > 0, G_m has to satisfy G_m = argmin over G of Σ_i w_i^(m) I(y_i ≠ G(x_i)), i.e. G_m is the classifier that minimizes the weighted error rate.

  14. Boosting Solving for G_m gives the weighted error rate err_m = Σ_i w_i^(m) I(y_i ≠ G_m(x_i)) / Σ_i w_i^(m). Plugging it back and minimizing over β gives β_m = (1/2) log((1 - err_m) / err_m). Update the overall classifier by plugging these in: f_m(x) = f_{m-1}(x) + β_m G_m(x).

  15. Boosting The weight for the next iteration becomes w_i^(m+1) = w_i^(m) exp(-β_m y_i G_m(x_i)). Using -y_i G_m(x_i) = 2 I(y_i ≠ G_m(x_i)) - 1, this equals w_i^(m) exp(2β_m I(y_i ≠ G_m(x_i))) exp(-β_m). The factor exp(-β_m) is independent of i and can be ignored, recovering the AdaBoost weight update with α_m = 2β_m.
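A minimal sketch of AdaBoost as derived on slides 12-15, using a depth-1 DecisionTreeClassifier fit with sample weights as the weak classifier G_m. The data interface, M = 20 rounds, and the small constant guarding against err_m = 0 are my own assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost(X, y, M=20):
    """y must be coded as -1/+1."""
    n = len(y)
    w = np.full(n, 1.0 / n)                             # initial weights w_i = 1/n
    stumps, betas = [], []
    for m in range(M):
        stump = DecisionTreeClassifier(max_depth=1)     # G_m: weighted stump
        stump.fit(X, y, sample_weight=w)
        miss = (stump.predict(X) != y).astype(float)
        err = np.sum(w * miss) / np.sum(w)              # weighted error rate err_m
        beta = 0.5 * np.log((1 - err) / (err + 1e-12))  # beta_m = 1/2 log((1-err_m)/err_m)
        w = w * np.exp(2 * beta * miss)                 # up-weight misclassified points;
        w /= np.sum(w)                                  # the exp(-beta_m) factor (same for
        stumps.append(stump)                            # all i) vanishes in the normalization
        betas.append(beta)
    return stumps, np.array(betas)

def adaboost_predict(stumps, betas, X):
    votes = np.array([s.predict(X) for s in stumps])
    return np.sign(betas @ votes)                       # weighted majority vote
```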

  16. Boosting trees Trees partition the feature space into disjoint regions R_j, j = 1, 2, ..., J, represented by the terminal nodes of the tree. A tree is expressed as T(x; Θ) = Σ_j γ_j I(x ∈ R_j), with parameters Θ = {R_j, γ_j}. A boosted tree model is a sum of trees, f_M(x) = Σ_m T(x; Θ_m). In each step of the boosting procedure, we need to find Θ_m = argmin over Θ of Σ_i L(y_i, f_{m-1}(x_i) + T(x_i; Θ)).

  17. Boosting trees Given the regions R_jm, finding the constants γ_jm is easy: γ_jm = argmin over γ of Σ_{x_i ∈ R_jm} L(y_i, f_{m-1}(x_i) + γ) (see the sketch below). Finding the regions themselves is difficult, so approximate solutions are used. The AdaBoost solution uses the exponential loss: find the tree that minimizes the weighted error rate. Gradient boosting is a generalization of AdaBoost to other loss functions.
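A small sketch of "finding γ is easy given the regions": for each terminal region the optimal constant minimizes the loss of f_{m-1} + γ over the points in that region, which is the mean residual for squared loss and the median residual for absolute loss. The region-index array leaf_of is assumed to come from an already-grown tree; the function name is illustrative.

```python
import numpy as np

def optimal_gammas(y, f_prev, leaf_of, loss="squared"):
    """y, f_prev: (n,) arrays; leaf_of: (n,) terminal-region index of each point."""
    gammas = {}
    for j in np.unique(leaf_of):
        r = y[leaf_of == j] - f_prev[leaf_of == j]   # residuals of the points in R_j
        # argmin_gamma sum L(y_i, f_prev_i + gamma): mean for squared loss,
        # median for absolute loss.
        gammas[j] = np.mean(r) if loss == "squared" else np.median(r)
    return gammas
```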

  18. Boosting trees Gradient boosting. Consider the boosting procedure as a stepwise numerical optimization. If the loss function is differentiable, we can use its gradient for the optimization. The loss function is L(f) = Σ_i L(y_i, f(x_i)), where f(x) is constrained to be a sum of trees. The gradient at step m is g_im = ∂L(y_i, f(x_i)) / ∂f(x_i), evaluated at f = f_{m-1}.

  19. Induce a tree T(x; Θ_m) whose predictions t_m are as close as possible to the negative gradient: Θ_m = argmin over Θ of Σ_i (-g_im - T(x_i; Θ))².

  20. Boosting trees For a regression tree with squared loss, -g_im = y_i - f_{m-1}(x_i), so going along the negative gradient means fitting the residuals with a tree (see the sketch below). For a classification tree with deviance loss, the logistic model is used as the link between f(x) and the class probability, and trees are induced to predict the corresponding current residuals on the probability scale.
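A minimal gradient-boosting sketch for regression with squared loss, where the negative gradient is exactly the residual, so each step fits a small regression tree to the current residuals. The tree depth, M = 100 steps, and the shrinkage factor nu are illustrative choices not taken from the slides.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost(X, y, M=100, nu=0.1, max_depth=2):
    f = np.full(len(y), y.mean())            # f_0: best constant fit
    trees = []
    for m in range(M):
        neg_grad = y - f                     # -g_im for squared loss = current residuals
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, neg_grad)
        f += nu * tree.predict(X)            # step along the fitted negative gradient
        trees.append(tree)
    return y.mean(), trees, nu

def gb_predict(f0, trees, nu, X):
    return f0 + nu * sum(t.predict(X) for t in trees)
```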

  21.–37. Boosting trees (figure-only slides; no transcript text)

  38. Boosting trees Failure of bagging with a single-level tree (stump).

  39. Boosted trees and Random Forest Example comparing RF to boosted trees.

  40. Boosted trees and Random Forest Example comparing RF to boosted trees.

  41. Boosted trees and Random Forest (figure: probability that a relevant variable will be selected.) However, when the number of relevant variables increases, the performance of random forests is robust to an increase in the number of noise variables.

  42. Boosted trees and Random Forest The same idea was applied to gradient boosting. Stochastic gradient boosting: subsample rows before creating each tree; subsample columns before creating each tree; subsample columns before considering every split.
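A minimal sketch of the first two stochastic variants listed on this slide, added to a squared-loss gradient-boosting loop: rows and columns are subsampled before each tree is grown. Per-split column subsampling happens inside the tree grower (e.g. a max_features-style option) and is only noted in a comment; the subsampling fractions and other settings are illustrative assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def stochastic_gradient_boost(X, y, M=100, nu=0.1, row_frac=0.5, col_frac=0.5, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    f = np.full(n, y.mean())                                   # f_0: best constant fit
    trees = []
    for m in range(M):
        rows = rng.choice(n, size=int(row_frac * n), replace=False)          # rows per tree
        cols = rng.choice(p, size=max(1, int(col_frac * p)), replace=False)  # columns per tree
        neg_grad = y - f                                       # residuals (squared loss)
        tree = DecisionTreeRegressor(max_depth=2).fit(X[np.ix_(rows, cols)], neg_grad[rows])
        # Subsampling columns before *every split* would go inside the tree-growing
        # routine itself (a max_features-style option) rather than out here.
        f += nu * tree.predict(X[:, cols])
        trees.append((tree, cols))
    return y.mean(), trees, nu
```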
