
Random Forest






Presentation Transcript


  1. Random Forest. Boosting (1). Outline: Ensemble learning; Reminder – Bagging of Trees; Random Forest; Adaboost

  2. Ensemble learning Aggregating a group of classifiers (“base classifiers”) into an ensemble committee and making the prediction by consensus. Weak learner ensembles (each base learner has high expected prediction error, EPE, but is easy to train). Current Bioinformatics, 5(4):296-308, 2010.

  3. Ensemble learning Strong learner ensembles (“Stacking” and beyond). Current Bioinformatics, 5(4):296-308, 2010.

  4. Ensemble learning Why? (1) Statistical: a learning algorithm searches a space of hypotheses for the best fit to the data. With insufficient data (which is almost always the case), the algorithm can find many equally good solutions; averaging over them reduces the risk of picking the wrong one. Thomas G. Dietterich, “Ensemble Methods in Machine Learning”

  5. Ensemble learning Why? (2) Computational: modern learning algorithms pose complicated optimization problems, and often the search cannot guarantee a global optimum. An ensemble can be seen as running the search from many different starting points. Thomas G. Dietterich, “Ensemble Methods in Machine Learning”

  6. Ensemble learning Why? (3) Representational: the true function may not be representable by any single hypothesis in the space being searched. An ensemble expands the space of representable functions. Thomas G. Dietterich, “Ensemble Methods in Machine Learning”

  7. Reminder - Bootstrapping • Directly assess uncertainty from the training data. Basic idea: assuming the observed data approximate the true underlying density, re-sampling from them (with replacement) gives an idea of the uncertainty caused by sampling.
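
The slide states the idea only in words; as an illustration (not from the slides), here is a minimal Python sketch of the bootstrap, estimating the standard error of a sample median by re-sampling with replacement. The data and the choice of statistic are arbitrary.

```python
# Minimal bootstrap sketch (illustrative; not from the slides).
# Re-sample the data with replacement and look at the spread of the statistic.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=100)      # a hypothetical training sample

def bootstrap_se(data, statistic, n_boot=1000):
    """Bootstrap estimate of the standard error of `statistic`."""
    n = len(data)
    reps = [statistic(data[rng.integers(0, n, size=n)]) for _ in range(n_boot)]
    return np.std(reps, ddof=1)

print("bootstrap SE of the sample median:", bootstrap_se(x, np.median))
```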

  8. Bagging “Bootstrap aggregation.” Resample the training dataset, build a prediction model on each resampled dataset, and average the predictions, f_bag(x) = (1/B) Σ_b f*_b(x). This is a Monte Carlo estimate of E_P̂[ f*(x) ], where f*_b is the model fit to the b-th bootstrap sample and P̂ is the empirical distribution putting equal probability 1/N on each of the data points. Bagging only differs from the original estimate when f() is a non-linear or adaptive function of the data! When f() is a linear function of the data, the bagged estimate agrees with the original one as B grows. Trees are a perfect candidate for bagging – each bootstrap tree will differ in structure.
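
A hedged code sketch of the recipe above (my own illustration with scikit-learn, not taken from the slides): fit one tree per bootstrap resample and average the predicted class probabilities.

```python
# Bagging trees, spelled out (illustrative): bootstrap resample -> fit tree -> average.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
X, y = make_classification(n_samples=300, n_features=10, random_state=1)

B = 50                                               # number of bootstrap trees
trees = []
for b in range(B):
    idx = rng.integers(0, len(X), size=len(X))       # bootstrap resample of the rows
    trees.append(DecisionTreeClassifier(random_state=b).fit(X[idx], y[idx]))

# "Aggregation": average the class probabilities over the ensemble.
avg_proba = np.mean([t.predict_proba(X) for t in trees], axis=0)
print("training accuracy of the bagged trees:", (avg_proba.argmax(axis=1) == y).mean())
```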

  9. Bagging trees Bagged trees differ in structure from one bootstrap sample to the next.

  10. Random Forest Bagging can be seen as a method to reduce the variance of an estimated prediction function; it mostly helps high-variance, low-bias classifiers such as trees. Comparatively, boosting builds weak classifiers one by one, allowing the collection to evolve in the right direction. Random forest is a substantial modification of bagging – it builds a large collection of de-correlated trees. - Similar performance to boosting - Simpler to train and tune than boosting

  11. Random Forest The intuition – the average of random variables. For B i.i.d. random variables, each with variance σ², the mean has variance σ²/B. For B identically distributed (i.d.) random variables, each with variance σ² and with pairwise correlation ρ, the mean has variance ρσ² + ((1−ρ)/B)σ². ------------------------------------------------------------------------------------- Bagged trees behave like i.d. (but correlated) samples. Random forest aims at reducing the correlation ρ in order to reduce the variance, and this is achieved by random selection of variables at each split.
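
For completeness, the one-line derivation of the second variance formula (standard algebra; presumably shown on the slide):

```latex
\operatorname{Var}\!\Big(\frac{1}{B}\sum_{i=1}^{B} X_i\Big)
  = \frac{1}{B^{2}}\Big(B\,\sigma^{2} + B(B-1)\,\rho\sigma^{2}\Big)
  = \rho\sigma^{2} + \frac{1-\rho}{B}\,\sigma^{2}.
```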

  12. Random Forest
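
Slide 12 presumably showed the random forest algorithm itself: for b = 1,…,B draw a bootstrap sample and grow a tree, but at each split consider only m randomly chosen variables (typically m ≈ √p for classification). A hedged scikit-learn sketch of that recipe, with simulated data standing in for whatever the slide used:

```python
# Random forest sketch (illustrative): bagging + a random subset of predictors per split.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=10, n_informative=4, random_state=2)

rf = RandomForestClassifier(
    n_estimators=500,       # B bootstrapped trees
    max_features="sqrt",    # m ~ sqrt(p) candidate variables per split -> de-correlated trees
    bootstrap=True,         # each tree is grown on a bootstrap resample
    random_state=2,
).fit(X, y)

print("training accuracy:", rf.score(X, y))
```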

  13. Random Forest Benefit of RF – the out-of-bag (OOB) samples give a built-in cross-validation error. For observation i, compute its RF prediction using only the trees built on bootstrap samples in which observation i did not appear. The resulting OOB error rate is close to the N-fold (leave-one-out) cross-validation error rate. Unlike many other nonlinear estimators, RF can be fit in a single sequence: stop growing the forest when the OOB error stabilizes.
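
A hedged sketch of the OOB idea (illustrative; simulated data): scikit-learn's `oob_score=True` scores each observation using only the trees whose bootstrap sample did not contain it, and the result is compared against a cross-validation estimate.

```python
# OOB error vs. cross-validation (illustrative sketch).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=10, n_informative=4, random_state=3)

rf = RandomForestClassifier(n_estimators=500, oob_score=True, random_state=3).fit(X, y)
print("OOB error:", 1 - rf.oob_score_)   # each sample predicted only by trees that never saw it

cv_acc = cross_val_score(RandomForestClassifier(n_estimators=500, random_state=3),
                         X, y, cv=10).mean()
print("10-fold CV error:", 1 - cv_acc)   # typically close to the OOB error
```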

  14. Random Forest Variable importance – finding the most relevant predictors. At every split of every tree, the chosen variable contributes an improvement in the impurity measure. Accumulating the reduction in impurity i(N) over all splits and all trees for each variable gives a measure of the relative importance of the variables: predictors that appear most often at split points and lead to the largest reductions in impurity are the important ones. ------------------ Another method – permute the values of a predictor in the OOB samples of every tree; the resulting decrease in prediction accuracy, accumulated over all trees, is also a measure of importance.
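
Both importance measures described above are available in scikit-learn; a hedged sketch follows (note that `permutation_importance` permutes a predictor on a supplied dataset for the whole forest, a close variant of the per-tree OOB permutation the slide describes).

```python
# Variable importance sketch (illustrative): impurity-based vs. permutation importance.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

X, y = make_classification(n_samples=500, n_features=10, n_informative=3, random_state=4)
rf = RandomForestClassifier(n_estimators=300, random_state=4).fit(X, y)

# Accumulated impurity decrease at split points, summed over all trees.
print("impurity-based importance:", rf.feature_importances_.round(3))

# Permute one predictor at a time and measure the drop in accuracy.
perm = permutation_importance(rf, X, y, n_repeats=10, random_state=4)
print("permutation importance:   ", perm.importances_mean.round(3))
```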

  15. Random Forest

  16. Random Forest Finding interactions between variables? Y = sin(2·V2) + V5² + V2·V5 + V8·V9 + |V9|

  17. Boosting Construct a sequence of weak classifiers, and combine them into a strong classifier by a weighted majority vote. “Weak”: better than random coin-tossing. Some properties: flexible; able to do feature selection; good generalization; but it can fit noise.

  18. Boosting Adaboost:
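
Slide 18 presumably showed the AdaBoost algorithm itself; for reference, a sketch of the standard AdaBoost.M1 updates (following Freund & Schapire and Hastie et al.; the notation G_m, err_m, alpha_m, w_i is chosen here, not taken from the slide):

```latex
% AdaBoost.M1 (standard form), with y_i in {-1, +1}.
% Initialize w_i = 1/N, i = 1, ..., N; then for m = 1, ..., M fit G_m(x) with weights w_i and set
\mathrm{err}_m = \frac{\sum_{i=1}^{N} w_i\, I\big(y_i \neq G_m(x_i)\big)}{\sum_{i=1}^{N} w_i},
\qquad
\alpha_m = \log\frac{1-\mathrm{err}_m}{\mathrm{err}_m},
\qquad
w_i \leftarrow w_i \exp\big\{\alpha_m\, I\big(y_i \neq G_m(x_i)\big)\big\}.
% Final classifier:
G(x) = \operatorname{sign}\Big(\sum_{m=1}^{M} \alpha_m\, G_m(x)\Big).
```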

  19. Boosting “A Tutorial on Boosting”, Yoav Freund and Rob Schapire

  20. Boosting “A Tutorial on Boosting”, Yoav Freund and Rob Schapire

  21. Boosting “A Tutorial on Boosting”, Yoav Freund and Rob Schapire

  22. Boosting “A Tutorial on Boosting”, Yoav Freund and Rob Schapire

  23. Boosting

  24. Boosting The first quantity (α_m in the sketch above) is the weight of the current weak classifier in the final model. The second (the observation weight w_i) applies to individual observations; notice it is updated cumulatively from step 1: if an observation is correctly classified at this step its weight doesn’t change, and if it is misclassified its weight increases.
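
To make the two kinds of weights concrete, a minimal AdaBoost sketch in Python (illustrative: simulated data, stumps as weak learners, and the update in which only misclassified observation weights grow):

```python
# AdaBoost weight updates spelled out (illustrative sketch).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y01 = make_classification(n_samples=400, n_features=10, random_state=5)
y = 2 * y01 - 1                                   # recode labels to {-1, +1}
n, M = len(X), 100

w = np.full(n, 1.0 / n)                           # observation weights, uniform at step 1
alphas, stumps = [], []
for m in range(M):
    stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
    miss = stump.predict(X) != y
    err = np.sum(w * miss) / np.sum(w)            # weighted training error
    alpha = np.log((1 - err) / err)               # weight of this weak classifier
    w *= np.exp(alpha * miss)                     # only misclassified observations grow
    w /= w.sum()
    alphas.append(alpha)
    stumps.append(stump)

# Final model: weighted majority vote of the weak classifiers.
F = np.sum([a * s.predict(X) for a, s in zip(alphas, stumps)], axis=0)
print("training error:", np.mean(np.sign(F) != y))
```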

  25. Boosting

  26. Boosting

  27. Boosting

  28. Boosting Example with 10 predictors. The weak classifier is a stump: a single-split tree with two terminal nodes.
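
The slide refers to an experiment with 10 predictors and stumps, very likely the simulated nested-spheres example of Hastie et al., so the data-generating step below is an assumption. A hedged scikit-learn sketch (the default weak learner in AdaBoostClassifier is a depth-1 stump):

```python
# Boosted stumps on 10 simulated predictors (assumed ESL-style setup; illustrative).
import numpy as np
from scipy.stats import chi2
from sklearn.ensemble import AdaBoostClassifier

rng = np.random.default_rng(6)

def make_data(n):
    X = rng.normal(size=(n, 10))                                   # 10 standard-normal predictors
    y = (np.sum(X**2, axis=1) > chi2.ppf(0.5, df=10)).astype(int)  # split at the chi^2_10 median
    return X, y

X_train, y_train = make_data(2000)
X_test, y_test = make_data(10000)

ada = AdaBoostClassifier(n_estimators=400)     # default weak learner: a single-split stump
ada.fit(X_train, y_train)
print("test error of 400 boosted stumps:", 1 - ada.score(X_test, y_test))
```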
