This presentation by Lindsay Stetson covers the paper "Intelligible Models for Classification and Regression" by Yin Lou, Rich Caruana, and Johannes Gehrke (2012 ACM Conference on Knowledge Discovery and Data Mining), which explores how to build accurate yet interpretable models for classification and regression. It emphasizes why intelligibility matters in applied fields such as biology, physics, and medicine, and presents Generalized Additive Models (GAMs) as a solution. The session covers background and motivation, an overview of the experimental design, results from several modeling techniques, and conclusions that underscore how bagged and boosted trees achieve high accuracy while remaining interpretable.
Intelligible Models for Classification and Regression • Yin Lou, Rich Caruana, Johannes Gehrke • 2012 ACM Conference on Knowledge Discovery and Data Mining • Presented by Lindsay Stetson
Outline • Background and Motivation • Generalized Additive Models • Experimental Overview • Results • Conclusion
Background • Linear Model • Regression: y = β0 + β1x1 + … + βnxn • Classification: logit(y) = β0 + β1x1 + … + βnxn • Easy to interpret, intelligible, but less accurate • Complex Model (SVM, Random Forest, Neural Networks) • y = f(x1, …, xn) • More accurate, but usually unintelligible
Goals of Work “…construct accurate models that are interpretable.” Intelligibility is important! In applied fields like biology, physics, and medicine we need to understand the individual contributions of the features in the model.
Generalized Additive Model • Regression: y = f1(x1) + … + fn(xn) • Classification: logit(y) = f1(x1) + … + fn(xn) • Each feature xi gets shaped by its own function fi • Goal: accurate and intelligible
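The additive form above can be sketched in a few lines of Python. This is only an illustration: the three shape functions are hypothetical and not taken from the paper, and the classification case is read as the additive sum living on the logit scale, so a probability is obtained by applying the logistic function to it.

```python
import math

# Hypothetical shape functions, one per feature (illustrative assumptions only).
shape_functions = [
    lambda x1: 0.5 * x1,                 # f1: linear effect
    lambda x2: math.sin(x2),             # f2: smooth non-linear effect
    lambda x3: 1.0 if x3 > 2 else 0.0,   # f3: step effect, like a single tree split
]


def contributions(x):
    """Per-feature contributions f_i(x_i); these are what make the model readable."""
    return [f(xi) for f, xi in zip(shape_functions, x)]


def predict_regression(x):
    """Regression: y = f1(x1) + ... + fn(xn)."""
    return sum(contributions(x))


def predict_probability(x):
    """Classification: the additive sum is a logit, so map it through the logistic function."""
    return 1.0 / (1.0 + math.exp(-predict_regression(x)))


example = [2.0, 0.0, 3.0]
print(contributions(example))
print(predict_regression(example))
print(predict_probability(example))
```

Because each feature enters through its own function, the output of `contributions` can be inspected or plotted one feature at a time, which is the sense in which the model stays intelligible.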
Fitting Generalized Additive Models • Splines (SP) • Single Tree (TR) • Bagged Trees (bagTR) • Boosted Trees (bstTR) • Boosted Bagged Trees (bbTR)
Learning Methods • Penalized Least Squares (P-LS/P-IRLS) • Backfitting (BF) • Gradient Boosting (BST)
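To make the gradient boosting option concrete, here is a minimal sketch, assuming squared-error regression and single-split trees (stumps) as the per-feature learners. The function names (`fit_stump`, `fit_gam_boosting`, `shape_value`, `predict`) and the defaults for `n_rounds` and `learning_rate` are illustrative assumptions, not the authors' implementation. Each round cycles through the features and fits one small tree on that feature alone to the current residuals, so the model stays additive.

```python
def fit_stump(xs, residuals):
    """Least-squares regression stump on one feature: returns (threshold, left_mean, right_mean)."""
    values = sorted(set(xs))
    best = None
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2.0
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm = sum(left) / len(left)
        rm = sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:
        # Constant feature: no split possible, predict the mean residual everywhere.
        m = sum(residuals) / len(residuals)
        return (float("inf"), m, m)
    return best[1:]


def stump_predict(stump, x):
    t, lm, rm = stump
    return lm if x <= t else rm


def fit_gam_boosting(X, y, n_rounds=200, learning_rate=0.1):
    """Cyclic gradient boosting: each shape function f_j is a sum of stumps on feature j."""
    n_features = len(X[0])
    intercept = sum(y) / len(y)
    shapes = [[] for _ in range(n_features)]
    pred = [intercept] * len(y)
    for _ in range(n_rounds):
        for j in range(n_features):
            col = [row[j] for row in X]
            # For squared error, the negative gradient is the plain residual.
            residuals = [yi - pi for yi, pi in zip(y, pred)]
            stump = fit_stump(col, residuals)
            shapes[j].append(stump)
            pred = [p + learning_rate * stump_predict(stump, x) for p, x in zip(pred, col)]
    return intercept, shapes, learning_rate


def shape_value(model, j, x):
    """Value of the learned shape function f_j at x."""
    _, shapes, lr = model
    return lr * sum(stump_predict(s, x) for s in shapes[j])


def predict(model, row):
    return model[0] + sum(shape_value(model, j, x) for j, x in enumerate(row))


# Toy usage on a made-up additive target: y = x1^2 + 2*x2.
X = [[a, b] for a in (-2, -1, 0, 1, 2) for b in (0, 1, 2, 3)]
y = [a * a + 2 * b for a, b in X]
model = fit_gam_boosting(X, y)
print([round(shape_value(model, 0, v), 2) for v in (-2, -1, 0, 1, 2)])
```

Backfitting differs mainly in the update step: instead of adding a new small learner to f_j each round, it refits f_j from scratch against the residuals of all the other shape functions.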
Conclusion • Generalized additive models are accurate and intelligible • Trees have low bias but high variance • Bagging reduces the variance, making the tree-based methods top performers • Gradient-boosted bagged trees with a small number of leaves are the most accurate