Additive Models and Trees
Lecture Notes for CMPUT 466/551
Nilanjan Ray
Principal Source: Department of Statistics, CMU

Topics to cover:
GAM: Generalized Additive Models
CART: Classification and Regression Trees
MARS: Multivariate Adaptive Regression Splines
What is GAM?
Model: E(Y | X_1, ..., X_p) = alpha + f_1(X_1) + f_2(X_2) + ... + f_p(X_p)

The functions f_j are smooth functions in general, such as splines, kernel
functions, linear functions, and so on. Each function can be different,
e.g., f_1 can be linear, f_2 a natural spline, etc.
Compare GAM with Linear Basis Expansions (Ch. 5 of [HTF])
Similarities? Dissimilarities?
Any similarity (in principle) with Naïve Bayes model?
Backfitting algorithm
1. Initialize: alpha = (1/N) sum_i y_i; f_j = 0 for all j.
2. Cycle over j = 1, 2, ..., p, 1, 2, ..., p, ...:
   f_j <- S_j[ { y_i - alpha - sum_{k != j} f_k(x_ik) }, i = 1, ..., N ]   (smooth the partial residuals)
   f_j <- f_j - (1/N) sum_i f_j(x_ij)   (re-center each function)
Until the functions f_j change less than a prespecified threshold
Computational Advantage?
Convergence?
How to choose fitting functions?
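The backfitting loop above can be sketched in a few lines of numpy. The function names and the cubic-polynomial stand-in smoother below are illustrative assumptions, not the lecture's code; a real GAM would use a spline or kernel smoother for S_j.

```python
import numpy as np

def backfit(X, y, smoother, n_iter=50, tol=1e-6):
    """Backfitting for an additive model y ~ alpha + sum_j f_j(X[:, j])."""
    n, p = X.shape
    alpha = y.mean()
    F = np.zeros((n, p))               # F[:, j] = f_j evaluated at the data
    for _ in range(n_iter):
        F_old = F.copy()
        for j in range(p):
            # partial residual: remove every fitted function except f_j
            r = y - alpha - F.sum(axis=1) + F[:, j]
            F[:, j] = smoother(X[:, j], r)
            F[:, j] -= F[:, j].mean()  # re-center so alpha stays identifiable
        if np.abs(F - F_old).max() < tol:   # change below threshold -> stop
            break
    return alpha, F

def poly_smooth(x, r, deg=3):
    """Stand-in smoother (assumption): cubic polynomial least-squares fit."""
    return np.polyval(np.polyfit(x, r, deg), x)

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 2))
y = np.sin(3 * X[:, 0]) + X[:, 1] ** 2 + 0.1 * rng.normal(size=200)
alpha, F = backfit(X, y, poly_smooth)
resid = y - alpha - F.sum(axis=1)
```

Each pass smooths the partial residuals against one coordinate at a time, exactly as in step 2 of the algorithm.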
Model: log[ Pr(Y = 1 | X) / Pr(Y = 0 | X) ] = alpha + f_1(X_1) + ... + f_p(X_p)
Additive Logistic Regression: Backfitting

Fitting logistic regression (P99):
1. Initialize beta = 0.
2. Iterate:
   a. p_i = exp(beta^T x_i) / (1 + exp(beta^T x_i))
   b. z_i = beta^T x_i + (y_i - p_i) / w_i, with weights w_i = p_i (1 - p_i)
   c. Use weighted least squares to fit a linear model to z_i with weights w_i, giving new estimates of beta.
3. Continue step 2 until convergence.

Fitting additive logistic regression (P262):
1. Initialize alpha = log[ ybar / (1 - ybar) ], where ybar = ave(y_i); set f_j = 0 for all j.
2. Iterate:
   a. eta_i = alpha + sum_j f_j(x_ij), p_i = 1 / (1 + exp(-eta_i))
   b. z_i = eta_i + (y_i - p_i) / w_i, with weights w_i = p_i (1 - p_i)
   c. Use the weighted backfitting algorithm to fit an additive model to z_i with weights w_i, giving new estimates of alpha and the f_j.
3. Continue step 2 until convergence.
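The logistic-regression column (Newton-Raphson via iteratively reweighted least squares) can be sketched as follows; `irls_logistic` and the simulated data are my own illustrative choices, not the lecture's code.

```python
import numpy as np

def irls_logistic(X, y, n_iter=25, tol=1e-8):
    """Logistic regression by iteratively reweighted least squares.
    X should include an intercept column; y is coded 0/1."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta                              # linear predictor
        p = 1.0 / (1.0 + np.exp(-eta))              # a. fitted probabilities
        w = np.clip(p * (1.0 - p), 1e-10, None)     # b. IRLS weights w_i
        z = eta + (y - p) / w                       #    working response z_i
        # c. weighted least squares: solve (X^T W X) beta = X^T W z
        beta_new = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (w * z))
        if np.max(np.abs(beta_new - beta)) < tol:   # 3. until convergence
            return beta_new
        beta = beta_new
    return beta

rng = np.random.default_rng(1)
n = 1000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
true_beta = np.array([-0.5, 2.0])
y = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-(X @ true_beta)))).astype(float)
beta_hat = irls_logistic(X, y)
```

Replacing step c's weighted least squares by a weighted backfitting pass turns this into the additive (local scoring) version in the right column.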
Sensitivity: probability of predicting spam given the true state is spam = TP / (TP + FN)
Specificity: probability of predicting email given the true state is email = TN / (TN + FP)
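In confusion-matrix terms (spam coded as the positive class), the two quantities can be computed as below; the helper name is illustrative.

```python
import numpy as np

def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP/(TP+FN), specificity = TN/(TN+FP).
    Positive class (spam) is coded 1, negative class (email) 0."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    return tp / (tp + fn), tn / (tn + fp)

sens, spec = sensitivity_specificity([1, 1, 1, 0, 0, 0, 0, 1],
                                     [1, 1, 0, 0, 0, 1, 0, 1])
# 3 of 4 spam caught, 3 of 4 emails kept -> sensitivity 0.75, specificity 0.75
```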
Note that this is still an additive model
Cost-complexity pruning: C_alpha(T) = sum_{m=1}^{|T|} sum_{x_i in R_m} (y_i - c_m)^2 + alpha |T|
Cost: the sum of squared errors over the terminal nodes; penalty: alpha |T|, on the complexity/size of the tree (|T| = number of terminal nodes)
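A toy illustration of the trade-off, with made-up per-leaf RSS values: at alpha = 0 the larger tree wins on fit alone, while a larger alpha makes the size penalty reverse the ranking.

```python
def cost_complexity(leaf_rss, alpha):
    """C_alpha(T) = sum of squared errors over leaves + alpha * |T|."""
    return sum(leaf_rss) + alpha * len(leaf_rss)

big_tree = [0.5, 0.4, 0.3, 0.2]   # 4 leaves, better fit (made-up RSS values)
small_tree = [1.1, 0.9]           # 2 leaves, worse fit
for alpha in (0.0, 0.5):
    print(alpha, cost_complexity(big_tree, alpha),
          cost_complexity(small_tree, alpha))
```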
Node impurity measures versus class proportion p for the 2-class problem:
Misclassification error: 1 - max(p, 1 - p); Gini index: 2p(1 - p); Cross-entropy: -p log p - (1 - p) log(1 - p)
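The three standard 2-class impurity measures can be written directly from their definitions (a minimal sketch; the small clip in the entropy avoids log 0 at the endpoints).

```python
import numpy as np

def misclassification(p):
    """Misclassification error: 1 - max(p, 1 - p)."""
    return 1.0 - np.maximum(p, 1.0 - p)

def gini(p):
    """Gini index for two classes: 2 p (1 - p)."""
    return 2.0 * p * (1.0 - p)

def cross_entropy(p):
    """Cross-entropy / deviance: -p log p - (1 - p) log(1 - p)."""
    p = np.clip(p, 1e-12, 1.0 - 1e-12)   # avoid log(0) at p = 0 or 1
    return -(p * np.log(p) + (1.0 - p) * np.log(1.0 - p))

# All three vanish at a pure node (p = 0 or 1) and peak at p = 0.5;
# Gini and cross-entropy are differentiable, which favors them for growing.
```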
Bagging: construct B trees from B bootstrap samples -- "bootstrap trees". Each fhat*b(x) is computed from the bth bootstrap sample, in this case a tree. The bagged estimate is fhat_bag(x) = (1/B) sum_{b=1}^{B} fhat*b(x).
Bagging reduces the variance of the original tree by aggregation.
Classification: majority vote across the B trees. Regression: average of the B predictions.
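A dependency-free sketch of bagging, using a hand-rolled decision stump as the base learner; the stump, names, and toy data are my own illustrative choices (real bagging would grow full trees).

```python
import numpy as np

def fit_stump(X, y):
    """Base learner: one axis-aligned split minimizing training error."""
    best = (np.inf, 0, 0.0, 0)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            pred = (X[:, j] > t).astype(int)
            for flip in (0, 1):                 # allow either side to be class 1
                err = np.mean((pred ^ flip) != y)
                if err < best[0]:
                    best = (err, j, t, flip)
    _, j, t, flip = best
    return lambda Z, j=j, t=t, flip=flip: (Z[:, j] > t).astype(int) ^ flip

def bagged_predictor(X, y, B=51, seed=0):
    """Bagging: fit B stumps on B bootstrap samples, predict by majority vote."""
    rng = np.random.default_rng(seed)
    n = len(y)
    stumps = [fit_stump(X[idx], y[idx])
              for idx in (rng.integers(0, n, size=n) for _ in range(B))]
    def predict(Z):
        votes = np.mean([s(Z) for s in stumps], axis=0)
        return (votes > 0.5).astype(int)        # majority vote
    return predict

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)         # diagonal boundary
predict = bagged_predictor(X, y)
acc = float(np.mean(predict(X) == y))
```

For regression one would average the B predictions instead of voting.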
MARS builds from the candidate set C of reflected hinge pairs (X_j - t)_+ and (t - X_j)_+, with knots t at every observed value x_ij, so |C| = 2 * N * p.
Forward step: M(new) = M(old) plus the best products of a term already in the model M with a basis function from C.
The final model M typically overfits the data
=> Need to reduce the model size (# of terms)
Backward deletion procedure: at each step, delete the term whose removal causes the smallest increase in residual squared error.
Choose the model size lambda with minimum GCV:
GCV(lambda) = sum_{i=1}^{N} (y_i - fhat_lambda(x_i))^2 / (1 - M(lambda)/N)^2,
where M(lambda) is the effective number of parameters in the model.
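A minimal sketch of GCV-based model-size selection. As an assumption for illustration, polynomial degree stands in for MARS model size, with M = degree + 1 as the effective number of parameters (MARS itself charges extra for each knot).

```python
import numpy as np

def gcv(y, y_hat, M):
    """GCV = RSS / (1 - M/N)^2, with M the effective number of parameters."""
    N = len(y)
    rss = float(np.sum((y - y_hat) ** 2))
    return rss / (1.0 - M / N) ** 2

rng = np.random.default_rng(3)
x = np.linspace(0.0, 1.0, 60)
y = np.sin(2 * np.pi * x) + 0.2 * rng.normal(size=60)

scores = []
for deg in range(1, 10):                      # candidate model sizes
    y_hat = np.polyval(np.polyfit(x, y, deg), x)
    scores.append(gcv(y, y_hat, M=deg + 1))
best_deg = 1 + int(np.argmin(scores))         # size with minimum GCV
```

The denominator inflates the training RSS of larger models, so GCV penalizes size without needing a held-out set.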
[Figure: piecewise linear basis functions (X - t)_+ and (t - X)_+ with knots at the data points X[i-1], X[i], X[i+1], X[i+2]]
Each input may appear at most once in a product, so repeated factors of the same hinge, e.g. (Xj - t1)_+ * (Xj - t1)_+, are not considered.
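The reflected pair of hinge basis functions is easy to compute directly (`hinge_pair` is an illustrative name, not the lecture's notation).

```python
import numpy as np

def hinge_pair(x, t):
    """MARS reflected pair at knot t: (x - t)_+ and (t - x)_+."""
    return np.maximum(x - t, 0.0), np.maximum(t - x, 0.0)

x = np.array([1.0, 2.0, 3.0, 4.0])
pos, neg = hinge_pair(x, 2.0)
# pos = [0, 0, 1, 2], neg = [1, 0, 0, 0]
```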
Relation between MARS and CART:
IF we replace the piecewise linear basis functions by step functions I(x - t > 0) and I(x - t <= 0), and require that once a model term is involved in a multiplication it is replaced by the interaction (so it is no longer available for further splits),
THEN the MARS forward procedure = the CART tree-growing procedure.
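The step-function substitution can be written analogously to the hinge pair (`step_pair` is an illustrative name); multiplying a model term by one of these indicators splits its region in two, which is exactly a CART split.

```python
import numpy as np

def step_pair(x, t):
    """Step-function pair I(x - t > 0) and I(x - t <= 0): the substitution
    under which MARS's forward pass reduces to CART tree growing."""
    return (x > t).astype(float), (x <= t).astype(float)

x = np.array([1.0, 2.0, 3.0])
right, left = step_pair(x, 2.0)
# right = [0, 0, 1], left = [1, 1, 0]
```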