An experimental study on a new ensemble method using robust and order statistics
Faisal Zaman and Hideo Hirose
Kyushu Institute of Technology, Fukuoka, Japan
Outline: Ensemble Learning; Popular Ensemble Methods; Properties; Base Classifiers; Combination Rules; Diversity
Ensemble learning trains several predictors on the same problem,
then combines their decisions to predict unseen cases.
An ensemble is preferred over a single classifier for its
FIGURE: General Ensemble Architecture (Original Training Set ==> Create Multiple Datasets ==> Build Multiple Classifiers (Ensemble) ==> Combine the Decisions of the Classifiers)
The following ensemble methods are popular and standard among ensembles of classifiers:
Bagging (Breiman, 1994).
Adaboost (Freund and Schapire 1995).
Random Forest (Breiman, 2001).
Rotation Forest (Rodriguez et al., 2006).
Error of an ensemble can be decomposed into three parts
Error = Intrinsic error + Bias + Variance
Ensemble methods usually reduce variance and sometimes the bias also.
In ensemble methods, the following base classifiers are used:
Classification and Regression Tree (CART)
Neural Network (NN)
Decision Stump (DS): CART with 2-3 nodes.
Majority Vote:
This will select “Class1”, as two of the three classifiers voted for Class1.
Weighted Majority Vote:
Weighted majority will select “Class2”, as it gets the highest
Average:
This will select “Class2”, as the average class posteriori probability (CPP) for Class2
is higher than for Class1.
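The majority vote and average rules can be sketched as follows. The posterior values are hypothetical: the table from the slide did not survive, so they were picked only to reproduce the outcomes stated above. The function names are illustrative too. Weighted majority vote is left out because the classifier weights are not given.

```python
# Hypothetical class posteriori probabilities (CPP): one row per classifier,
# columns are (Class1, Class2). These values are assumptions, not source data.
cpp = [
    (0.1, 0.9),
    (0.6, 0.4),
    (0.7, 0.3),
]
classes = ("Class1", "Class2")


def majority_vote(cpp):
    """Each classifier votes for its most probable class; most votes wins."""
    votes = [0] * len(cpp[0])
    for row in cpp:
        votes[row.index(max(row))] += 1
    return votes.index(max(votes))


def average_rule(cpp):
    """Average the CPP of each class over classifiers; highest average wins."""
    n = len(cpp)
    means = [sum(col) / n for col in zip(*cpp)]
    return means.index(max(means))


print(classes[majority_vote(cpp)])  # Class1
print(classes[average_rule(cpp)])   # Class2
```

With these numbers the two rules disagree, which is the point of the example: a single very confident classifier can outweigh two moderately confident ones under averaging.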
FIGURE: Kappa-Error Diagram of popular ensemble methods,
general view (only cloud centers are marked)
The kappa-error diagram is an efficient way to check the diversity of the base
classifiers of an ensemble. In this diagram, the average error of a pair of base
classifiers from each ensemble is plotted against the kappa value of the pair.
A lower value of kappa indicates higher diversity.
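One point of such a diagram could be computed as in the sketch below. It assumes kappa is Cohen's kappa between the label outputs of the two classifiers, which is the usual choice for kappa-error diagrams but is not spelled out in the slides. The function names and the toy label vectors are illustrative.

```python
def pairwise_kappa(pred_a, pred_b):
    """Cohen's kappa between the label outputs of two classifiers."""
    n = len(pred_a)
    labels = sorted(set(pred_a) | set(pred_b))
    # Observed agreement: fraction of cases where both give the same label.
    p_obs = sum(a == b for a, b in zip(pred_a, pred_b)) / n
    # Chance agreement from each classifier's marginal label frequencies.
    p_chance = sum(
        (pred_a.count(c) / n) * (pred_b.count(c) / n) for c in labels
    )
    if p_chance == 1.0:
        return 1.0  # both classifiers always output the same single label
    return (p_obs - p_chance) / (1.0 - p_chance)


def pair_point(pred_a, pred_b, truth):
    """One point of the diagram: (kappa, average error of the pair)."""
    n = len(truth)
    err_a = sum(p != t for p, t in zip(pred_a, truth)) / n
    err_b = sum(p != t for p, t in zip(pred_b, truth)) / n
    return pairwise_kappa(pred_a, pred_b), (err_a + err_b) / 2


# Toy labels, made up for illustration only.
truth = [0, 0, 1, 1, 0, 1, 0, 1]
pred_a = [0, 0, 1, 1, 0, 1, 1, 0]
pred_b = [0, 1, 1, 1, 0, 0, 0, 1]
kappa, avg_err = pair_point(pred_a, pred_b, truth)
print(kappa, avg_err)
```

Plotting one such point for every pair of base classifiers gives the "cloud" of an ensemble; the cloud center is the mean of those points.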
To design an ensemble of classifiers, one needs to ensure:
Higher accuracy of the individual base classifiers. This refers to bias and
variance of the prediction error of each base classifier.
Diversity among the decisions of the base classifiers.
Good combination rule, to combine the decisions of the base classifiers.
Better accuracy can be achieved by training the base classifiers on the whole feature space. To do that, we need to construct the base classifiers on larger training sets.
To make the base classifiers disagree with each other, each of them should be constructed on an independent training set.
Subsampling with a high subsample rate, instead of bootstrapping, helps with both of these criteria.
It is advisable to use a combination rule that is not susceptible to imbalanced
decisions of any base classifier.
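The difference between the two resampling schemes can be illustrated with a short sketch. A bootstrap sample of size n contains on average about 63.2% (1 - 1/e) distinct observations, while a subsample drawn without replacement at rate 0.75 contains exactly 75%. The function names and the sample size below are illustrative.

```python
import random


def bootstrap_indices(n, rng):
    """Draw n indices with replacement (bootstrap sample)."""
    return [rng.randrange(n) for _ in range(n)]


def subsample_indices(n, rate, rng):
    """Draw round(rate * n) distinct indices (subsample without replacement)."""
    return rng.sample(range(n), round(rate * n))


rng = random.Random(0)  # fixed seed only to make the sketch repeatable
n = 1000
boot = bootstrap_indices(n, rng)
sub = subsample_indices(n, 0.75, rng)

print(len(set(boot)) / n)  # close to 1 - 1/e, about 0.632
print(len(set(sub)) / n)   # exactly 0.75
```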
Trimmed Mean combination rule:
1. Sort the class posteriori probabilities (CPP) for each class.
2. Trim a portion of them, and average the remaining CPP.
3. Select the class with the highest average.
In this example, if we sort the CPP for each class, trim the lowest ones, and
average the remaining CPP, the result is 0.65 for both classes. So we can select
either “Class1” or “Class2”.
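A minimal sketch of this rule follows. The CPP values are hypothetical, chosen only so that the result matches the 0.65 tie quoted above. Trimming only the lowest values follows the worked example; a classical trimmed mean would trim both ends. The function name and the `n_trim` parameter are illustrative.

```python
def trimmed_mean_rule(cpp, n_trim=1):
    """Per class: sort the CPP, drop the n_trim lowest, average the rest."""
    scores = []
    for col in zip(*cpp):  # one column of CPP per class
        kept = sorted(col)[n_trim:]
        scores.append(sum(kept) / len(kept))
    return scores


# Hypothetical CPP, rows = classifiers, columns = (Class1, Class2).
cpp = [(0.1, 0.9), (0.6, 0.4), (0.7, 0.3)]
scores = trimmed_mean_rule(cpp)
print([round(s, 2) for s in scores])  # [0.65, 0.65]
```

Because the two scores tie, a naive arg-max would be decided by floating-point rounding; an implementation would need an explicit tie-breaking policy.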
Spread combination rule:
1. Sort the CPP for each class.
2. Take the MAX CPP and MIN CPP for each class.
3. Compute the average of MAX CPP and MIN CPP.
4. Select the class with the highest average.
In this example, the MIN CPP and MAX CPP for Class1 are 0.1 and 0.7, so their
average is 0.4; similarly, for Class2 this value is 0.6. So we select “Class2”.
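The steps above can be sketched in a few lines. The statistic the slides call "Spread" is the average of the minimum and maximum, also known as the midrange. The CPP values are hypothetical, chosen to agree with the 0.4 and 0.6 quoted above, and the function name is illustrative.

```python
def spread_rule(cpp):
    """Per class: average of the smallest and largest CPP (the midrange)."""
    return [(min(col) + max(col)) / 2 for col in zip(*cpp)]


classes = ("Class1", "Class2")
# Hypothetical CPP, rows = classifiers, columns = (Class1, Class2).
cpp = [(0.1, 0.9), (0.6, 0.4), (0.7, 0.3)]
scores = spread_rule(cpp)
winner = classes[scores.index(max(scores))]
print([round(s, 2) for s in scores], winner)  # [0.4, 0.6] Class2
```

Explicit sorting is not needed in code, since `min` and `max` give the same two values.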
The new ensemble method is constructed as follows:
Use subsampling to generate training sets to train the base classifiers.
Select classifiers on the basis of a rejection-threshold value.
Use a robust or order statistic to combine the decisions of the base classifiers.
We generated the training set for each base classifier with a subsample
rate of 0.75, i.e., taking 75% of the observations from the original training set
for each subsample. This ensures that each base classifier is trained on a larger subsample.
The rejection-threshold is the Leave-One-Out Cross-Validation (LOOCV) error
of the base classifier, computed on the original training set. We select
only those base classifiers whose generalization error is less than or equal
to this value.
We used one robust statistic, the “Trimmed Mean”, and one
order statistic, the “Spread”, as the combination rules.
1. Compute the LOOCV error of the base classifier C(x) on the training set X; denote it ε_LOOCV.
2. Repeat for l = 1, 2, …, L:
   a. Generate a subsample X_l from the training set X, with 75% of the observations from X.
   b. Construct a base classifier C_l on X_l.
   c. Compute the error ε_l* of C_l on an independent bootstrap sample X_l* from X.
   d. Select classifier C_l if ε_l* ≤ ε_LOOCV; otherwise, train a weaker version of C_l. Denote these classifiers C_l^VALID.
3. Combine the selected classifiers C_l^VALID, l = 1, 2, …, L, using the Trimmean or Spread combination rule.
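The construction loop might look roughly like the sketch below. It covers everything up to, but not including, the final combination step. Several things here are assumptions rather than the authors' implementation: all function names are illustrative; the toy one-dimensional nearest-class-mean learner stands in for FLD, LogLC or CART; and, because the slides do not say how the "weaker version" is built, it is left as a caller-supplied `fit_weaker` function, with a purely hypothetical stand-in for the demo. The selection step is read as "keep the classifier if it passes, otherwise substitute the weaker one".

```python
import random


def error_rate(predict, X, y):
    """Fraction of misclassified observations."""
    return sum(predict(x) != t for x, t in zip(X, y)) / len(y)


def loocv_error(fit, X, y):
    """Leave-one-out cross-validation error of the base learner `fit`."""
    wrong = 0
    for i in range(len(y)):
        predict = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        wrong += predict(X[i]) != y[i]
    return wrong / len(y)


def build_ensemble(fit, fit_weaker, X, y, L=10, rate=0.75, seed=0):
    """Subsample, train, validate on a bootstrap sample, keep or weaken."""
    rng = random.Random(seed)
    n = len(y)
    eps_loocv = loocv_error(fit, X, y)  # rejection threshold
    ensemble = []
    for _ in range(L):
        idx = rng.sample(range(n), round(rate * n))  # subsample X_l
        Xl, yl = [X[i] for i in idx], [y[i] for i in idx]
        clf = fit(Xl, yl)  # base classifier C_l
        boot = [rng.randrange(n) for _ in range(n)]  # bootstrap sample X_l*
        eps = error_rate(clf, [X[i] for i in boot], [y[i] for i in boot])
        if eps > eps_loocv:
            clf = fit_weaker(Xl, yl)  # assumption: caller defines "weaker"
        ensemble.append(clf)
    return ensemble


def fit_centroid(X, y):
    """Toy 1-D nearest-class-mean learner (stand-in for a real base learner)."""
    means = {}
    for c in set(y):
        vals = [x for x, t in zip(X, y) if t == c]
        means[c] = sum(vals) / len(vals)
    return lambda x: min(means, key=lambda c: abs(x - means[c]))


def fit_weaker_centroid(X, y):
    """Hypothetical 'weaker' learner: the same model fitted on half the data."""
    half = len(y) // 2
    return fit_centroid(X[:half], y[:half])


# Synthetic two-class data, made up for illustration only.
data_rng = random.Random(1)
X = [data_rng.gauss(0.0, 1.0) for _ in range(20)]
X += [data_rng.gauss(2.0, 1.0) for _ in range(20)]
y = [0] * 20 + [1] * 20
ensemble = build_ensemble(fit_centroid, fit_weaker_centroid, X, y, L=10)
print(len(ensemble))  # 10
```

The returned classifiers would then be combined with the Trimmean or Spread rule applied to their CPP outputs; the toy learner here returns labels only, so that step is omitted.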
1. Firstly, we wanted to check the performance of the new ensemble method with
linear base classifiers.
a. We used two linear classifiers: the Fisher Linear Discriminant (FLD)
classifier and the Logistic Linear Classifier (LogLC).
b. We compared the performance of the new ensemble method with Bagging
c. We also checked whether the ensemble of these linear classifiers achieved
a lower error rate than a single linear classifier.
2. Secondly, we compared the performance of the new ensemble method with
Bagging, Adaboost, Random Forest and Rotation Forest, with Classification
And Regression Tree (CART) as the base classifier.
3. Thirdly, we checked the diversity of the proposed ensemble method using the κ-error
diagram, with CART as the base classifier. We also checked the relation between the
two combination rules and several diversity measures.
We used 15 datasets from the UCI Machine Learning Repository.
We used the mean of 10 repetitions of 10-fold cross-validation (CV) to compute
the error of all methods.
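The evaluation protocol can be sketched as below. The slides do not say whether the folds were stratified, so plain random folds are assumed here. The function names, the toy majority-class learner and the toy data are illustrative only.

```python
import random


def repeated_kfold_error(fit, X, y, k=10, repeats=10, seed=0):
    """Mean test error over `repeats` runs of k-fold cross-validation."""
    rng = random.Random(seed)
    n = len(y)  # assumes n >= k so that no fold is empty
    fold_errors = []
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)  # new random partition for each repetition
        folds = [idx[f::k] for f in range(k)]  # k roughly equal folds
        for test_idx in folds:
            held_out = set(test_idx)
            train_idx = [i for i in idx if i not in held_out]
            predict = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
            wrong = sum(predict(X[i]) != y[i] for i in test_idx)
            fold_errors.append(wrong / len(test_idx))
    return sum(fold_errors) / len(fold_errors)


def fit_majority(X, y):
    """Toy learner that always predicts the most frequent training label."""
    label = max(set(y), key=y.count)
    return lambda x: label


# Toy data: 30 observations of class 0 and 20 of class 1.
X = list(range(50))
y = [0] * 30 + [1] * 20
err = repeated_kfold_error(fit_majority, X, y)
print(round(err, 3))  # 0.4
```

The toy learner always predicts class 0 here, so its error equals the share of class 1 (20 of 50), which makes the result easy to check by hand.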
Linear classifiers usually have high bias and low variance, so ensembles of
linear classifiers usually do not improve much on the error rate of a
single linear classifier.
In the new ensemble method:
Due to the selection phase, the base linear classifiers generated are more
accurate (which implies that they have lower bias than the single
linear classifier).
Also, because a weaker version of the linear classifier is substituted when one is rejected,
the variance of the base classifiers increases; bagging-type ensembles are known
to reduce the variance of the base classifiers.
TABLE: Misclassification Error of single FLD and Ensembles of FLD
TABLE: Misclassification Error of single LogLC and Ensembles of LogLC
TABLE: Misclassification Error of Ensembles of CART
TABLE: Correlation between Trimmean and Spread Combination rule
with several diversity measures.
It is apparent from the table that there is no substantial amount of correlation
between the combination rules and the diversity measures.
FIGURE: Kappa-Error Diagram for the new ensemble of CART with the
Trimmean combination rule.
In the figure, only the cloud centers for the Sonar and Wisconsin datasets are marked.
The proposed ensemble is able to reduce the error of linear classifiers
It performed better with FLD than with LogLC.
Both combination rules performed similarly with the linear classifiers.
The new ensemble method produced lower error rate than Bagging and
With CART, the new ensemble method performed similarly to
Bagging and Adaboost, but worse than Rotation Forest.
The Trimmean and Spread rules have low correlation with several diversity
measures, from which nothing significant can be concluded.
In the κ-error diagram, the new ensemble with Trimmean shows either low diversity
with high accuracy or high diversity with low accuracy.