An experimental study on a new ensemble method using robust and order statistics

Presentation Transcript

### An experimental study on a new ensemble method using robust and order statistics

Faisal Zaman and Hideo Hirose

Kyushu Institute of Technology

Fukuoka, Japan

- Ensemble Learning
- Popular Ensemble Methods
- Properties
- Base Classifiers
- Combination Rules
- Diversity
- New Ensemble Method
- Design of an ensemble
- Trimmean and Spread combination rule
- New ensemble method-overview
- New ensemble method-algorithm
- Experiments and Discussion of Results
- Aim and set up of the experiments
- Experiment with linear classifiers-discussion
- Experiment with linear classifiers-results fld
- Experiment with linear classifiers-results loglc
- Experiment with cart-error report
- Experiment with cart-diversity
- Conclusion

18/06/2009

Ensemble learning trains several predictors on the same problem and then combines their decisions to predict unseen cases.

An ensemble is preferred over a single classifier for its

- Accuracy: a more reliable mapping can be obtained by combining the output of multiple “experts”.
- Efficiency: a complex problem can be decomposed into multiple sub-problems that are easier to understand and solve.

General architecture of ensemble of classifiers

FIGURE: General ensemble architecture.

- Step 1: Create multiple datasets T1, T2, …, TB-1, TB from the original training set T.
- Step 2: Build multiple classifiers (the ensemble) C1, C2, …, CB-1, CB.
- Step 3: Combine the decisions of the classifiers into CCOM.

The following ensemble methods are popular and standard:

- Bagging (Breiman, 1994).
- Adaboost (Freund and Schapire, 1995).
- Random Forest (Breiman, 2001).
- Rotation Forest (Rodriguez et al., 2006).

The error of an ensemble can be decomposed into three parts:

Error = Intrinsic error + Bias + Variance

Ensemble methods usually reduce the variance and sometimes the bias as well.

Review of ensemble methods-properties

Review of ensemble methods-base classifiers

The following base classifiers are used in ensemble methods:

- Classification and Regression Tree (CART)
- Neural Network (NN)
- Decision Stump (DS): a CART with 2-3 nodes.

Review of ensemble methods-combination rules

Majority Vote: selects “Class1”, because two of the three classifiers output Class1.

Weighted Majority Vote: selects “Class2”, because it receives the highest weight.

Average: selects “Class2”, because the average class posterior probability for Class2 (0.53) is higher than that for Class1 (0.47).
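The three rules can be sketched in a few lines of Python. The slide's example table did not survive in this transcript, so the posterior values below are an assumption chosen to be consistent with the results quoted (two of three votes for Class1, averages of about 0.47 and 0.53). The classifier weights are purely hypothetical, as are all function names.

```python
from collections import defaultdict

# Assumed class posterior probabilities from three classifiers (not from the slides).
posteriors = [
    {"Class1": 0.1, "Class2": 0.9},
    {"Class1": 0.6, "Class2": 0.4},
    {"Class1": 0.7, "Class2": 0.3},
]
weights = [0.6, 0.2, 0.2]  # hypothetical classifier weights


def hard_label(p):
    """Label a single classifier votes for: its most probable class."""
    return max(p, key=p.get)


def weighted_majority_vote(posteriors, weights):
    """Sum the weight of every classifier behind the label it votes for."""
    score = defaultdict(float)
    for p, w in zip(posteriors, weights):
        score[hard_label(p)] += w
    return max(score, key=score.get)


def majority_vote(posteriors):
    """Plain majority vote is the weighted vote with equal weights."""
    return weighted_majority_vote(posteriors, [1.0] * len(posteriors))


def average_rule(posteriors):
    """Average the posteriors per class and pick the largest average."""
    avg = {c: sum(p[c] for p in posteriors) / len(posteriors)
           for c in posteriors[0]}
    return max(avg, key=avg.get), avg


print(majority_vote(posteriors))                    # Class1
print(weighted_majority_vote(posteriors, weights))  # Class2
print(average_rule(posteriors)[0])                  # Class2
```

With these assumed inputs the first classifier is confident enough to outvote the other two under the weighted and average rules, which is the kind of disagreement between rules the slide illustrates.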

Review of ensemble methods-diversity

FIGURE: Kappa-error diagram of popular ensemble methods, from a general point of view (only the cloud centers are shown).

The kappa-error diagram is an efficient way to check the diversity of the base classifiers of an ensemble. The average error of each pair of base classifiers in an ensemble is plotted against the kappa value of that pair; a lower kappa value indicates higher diversity.
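A single point of such a diagram can be computed as in the sketch below. The slides do not define kappa, so this assumes the usual pairwise Cohen's kappa on the predicted labels of two classifiers; the function names and the toy label lists are illustrative only.

```python
from collections import Counter


def pairwise_kappa(pred_a, pred_b):
    """Cohen's kappa between the label predictions of two classifiers."""
    n = len(pred_a)
    observed = sum(a == b for a, b in zip(pred_a, pred_b)) / n
    count_a, count_b = Counter(pred_a), Counter(pred_b)
    # Agreement expected by chance, from the two marginal label frequencies.
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:  # both classifiers always output the same single label
        return 1.0
    return (observed - expected) / (1.0 - expected)


def kappa_error_point(pred_a, pred_b, truth):
    """(kappa, average error) for one pair of base classifiers."""
    err_a = sum(a != t for a, t in zip(pred_a, truth)) / len(truth)
    err_b = sum(b != t for b, t in zip(pred_b, truth)) / len(truth)
    return pairwise_kappa(pred_a, pred_b), (err_a + err_b) / 2


truth = [0, 0, 1, 1]   # toy labels, not from the slides
pred_a = [0, 0, 1, 1]
pred_b = [0, 1, 0, 1]
print(kappa_error_point(pred_a, pred_b, truth))
```

Repeating this for every pair of base classifiers in an ensemble gives the cloud of points whose center is reported in the figure.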

To design an ensemble of classifiers one needs to ensure:

- High accuracy of the individual base classifiers. This depends on the bias and variance of the prediction error of each base classifier.
- Diversity among the decisions of the base classifiers.
- A good combination rule to combine the decisions of the base classifiers.

Better accuracy can be achieved by training the base classifiers on the whole feature space. To do that we need to construct the base classifiers on larger training sets.

To make the base classifiers disagree with each other, each of them should be constructed on an independent training set.

Subsampling with a high subsample rate, instead of bootstrapping, helps satisfy both of these criteria.

It is advisable to use a combination rule that is not susceptible to unbalanced decisions of any single base classifier.
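As a rough illustration of the training-set-size argument: a bootstrap sample of size n contains on average only a fraction 1 - (1 - 1/n)^n of distinct observations (about 63% for large n), whereas a subsample drawn at rate 0.75 contains 75% distinct observations. The sketch below assumes the subsample is drawn without replacement; the function names are illustrative and do not come from the slides.

```python
import random


def subsample_indices(n, rate, rng):
    """Draw round(rate * n) distinct indices (sampling without replacement)."""
    return rng.sample(range(n), round(rate * n))


def bootstrap_indices(n, rng):
    """Draw n indices with replacement, so duplicates are expected."""
    return [rng.randrange(n) for _ in range(n)]


def expected_unique_fraction_bootstrap(n):
    """Expected share of distinct observations in a bootstrap sample of size n."""
    return 1.0 - (1.0 - 1.0 / n) ** n


rng = random.Random(0)  # fixed seed only so the sketch is repeatable
n = 100
sub = subsample_indices(n, 0.75, rng)
boot = bootstrap_indices(n, rng)

print(len(set(sub)) / n)   # 0.75 by construction
print(len(set(boot)) / n)  # varies from draw to draw
print(expected_unique_fraction_bootstrap(n))
```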

Trimmean and Spread combination rule

TRIMMEAN:

1. Sort the class posterior probabilities (CPP) for each class.
2. Trim a portion of them, and average the remaining probabilities.
3. Select the class with the highest average.

In this example, if we sort the CPP for each class, trim the lowest ones, and average the remaining CPP, the result is 0.65 for both classes. So we can select either “Class1” or “Class2”.

SPREAD:

1. Sort the CPP for each class.
2. Take the MAX CPP and MIN CPP for each class.
3. Compute the average of the MAX CPP and MIN CPP.
4. Select the class with the highest average.

In this example the MIN CPP and MAX CPP for Class1 are 0.1 and 0.7, so their average is 0.4; similarly, for Class2 this value is 0.6. So we select “Class2”.
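Both rules can be sketched as follows. The example's input table is not preserved in this transcript, so the CPP values are an assumption picked to reproduce the quoted results (0.65 for both classes under Trimmean, 0.4 and 0.6 under Spread). How many values to trim, and from which end, is not stated beyond "trim the lowest ones", so `trim_low` and `trim_high` are hypothetical parameters.

```python
def trimmean(cpp, trim_low=1, trim_high=0):
    """Average the CPP left after dropping the lowest/highest sorted values."""
    ordered = sorted(cpp)
    kept = ordered[trim_low:len(ordered) - trim_high]
    if not kept:
        raise ValueError("trimming removed every value")
    return sum(kept) / len(kept)


def spread(cpp):
    """Average of the smallest and largest CPP."""
    return (min(cpp) + max(cpp)) / 2


def combine(cpp_by_class, rule):
    """Score each class with `rule`; return all top-scoring classes and the scores."""
    scores = {c: rule(values) for c, values in cpp_by_class.items()}
    best = max(scores.values())
    # A small tolerance keeps floating-point near-ties as ties.
    winners = [c for c, s in scores.items() if abs(s - best) < 1e-9]
    return winners, scores


# Assumed CPP from three classifiers (not from the slides).
cpp_by_class = {
    "Class1": [0.1, 0.6, 0.7],
    "Class2": [0.9, 0.4, 0.3],
}

print(combine(cpp_by_class, trimmean)[0])  # ['Class1', 'Class2'] (a tie)
print(combine(cpp_by_class, spread)[0])    # ['Class2']
```

Returning every top-scoring class makes the Trimmean tie explicit; a real implementation would need some tie-breaking convention, which the slides leave open.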

The new ensemble method-overview

The new ensemble method is constructed as follows:

1. Use subsampling to generate training sets to train the base classifiers.
2. Select classifiers on the basis of a rejection-threshold value.
3. Use a robust or order statistic to combine the decisions of the base classifiers.

We generated the training set for each base classifier with a subsample rate of 0.75, i.e., taking 75% of the observations from the original training set for each subsample. This ensures that each base classifier is trained on a larger subsample.

The rejection threshold is the Leave-One-Out Cross-Validation (LOOCV) error of the base classifier, computed on the original training set. We select only those base classifiers whose generalization error is less than or equal to this value.

We used one robust statistic, the “Trimmed Mean”, and one order statistic, the “Spread”, as the combination rules.

The new ensemble method-algorithm

INITIAL PHASE:

0. Compute the LOOCV error of the base classifier C(x) on the training set X; denote it εLOOCV.

TRAINING PHASE:

Repeat for l = 1, 2, …, L:

1. Generate a subsample Xl from the training set X, with 75% of the observations from X.
2. Construct a base classifier Cl on Xl.
3. Compute the error εl* of Cl on an independent bootstrap sample Xl* from X.

SELECTION PHASE:

4. Select classifier Cl if εl* ≤ εLOOCV; otherwise, train a weaker version of Cl. Denote these classifiers as ClVALID.

COMBINATION PHASE:

5. Combine the selected classifiers ClVALID, l = 1, 2, …, L, using the Trimmean or Spread combination rule.
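The phases can be put together in a minimal sketch. Several pieces are assumptions rather than the authors' implementation: the base classifier is a toy soft nearest-centroid rule on one feature, the "weaker version" is a hypothetical stand-in that only returns class frequencies (the slides do not say how it is built), subsampling is done without replacement, and Spread is used for the combination phase.

```python
import math
import random


def train_centroid(X, y):
    """Stand-in base classifier: soft nearest-centroid on a single feature."""
    means = {c: sum(x for x, t in zip(X, y) if t == c) / y.count(c) for c in set(y)}

    def posterior(x):
        w = {c: math.exp(-abs(x - m)) for c, m in means.items()}
        z = sum(w.values())
        return {c: v / z for c, v in w.items()}
    return posterior


def train_prior(X, y):
    """Hypothetical 'weaker version': ignores x, returns class frequencies."""
    freq = {c: y.count(c) / len(y) for c in set(y)}
    return lambda x: dict(freq)


def label(clf, x):
    p = clf(x)
    return max(p, key=p.get)


def error(clf, X, y):
    return sum(label(clf, x) != t for x, t in zip(X, y)) / len(y)


def loocv_error(train, X, y):
    wrong = 0
    for i in range(len(y)):
        clf = train(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        wrong += label(clf, X[i]) != y[i]
    return wrong / len(y)


def build_ensemble(X, y, train, train_weak, L=10, rate=0.75, seed=0):
    rng = random.Random(seed)
    n = len(y)
    eps_loocv = loocv_error(train, X, y)             # step 0: rejection threshold
    members = []
    for _ in range(L):
        idx = rng.sample(range(n), round(rate * n))  # step 1: 75% subsample
        Xl, yl = [X[i] for i in idx], [y[i] for i in idx]
        clf = train(Xl, yl)                          # step 2: base classifier
        b = [rng.randrange(n) for _ in range(n)]     # step 3: bootstrap sample
        eps = error(clf, [X[i] for i in b], [y[i] for i in b])
        if eps > eps_loocv:                          # step 4: selection
            clf = train_weak(Xl, yl)
        members.append(clf)
    return members


def predict_spread(members, x):
    """Step 5: combine member posteriors with the Spread rule."""
    ps = [m(x) for m in members]
    score = {c: (min(p.get(c, 0.0) for p in ps) + max(p.get(c, 0.0) for p in ps)) / 2
             for c in ps[0]}
    return max(score, key=score.get)


# Toy, well-separated data (illustrative only).
X = [i / 10 for i in range(10)] + [5 + i / 10 for i in range(10)]
y = [0] * 10 + [1] * 10
members = build_ensemble(X, y, train_centroid, train_prior, L=5)
print(predict_spread(members, 0.2), predict_spread(members, 5.3))
```

The training functions are passed in as arguments so that a real base classifier such as FLD or CART, and a real weakened variant of it, could be substituted without changing the loop.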

Aim and Set up of the experiments

1. Firstly, we wanted to check the performance of the new ensemble method with linear base classifiers.

   a. We used two linear classifiers: the Fisher Linear Discriminant (FLD) classifier and the Logistic Linear Classifier (LogLC).

   b. We compared the performance of the new ensemble method with Bagging and Adaboost.

   c. We also checked whether the ensembles of these linear classifiers achieved a lower error rate than a single linear classifier.

2. Secondly, we compared the performance of the new ensemble method with Bagging, Adaboost, Random Forest and Rotation Forest, with Classification and Regression Tree (CART) as the base classifier.

3. Thirdly, we checked the diversity of the proposed ensemble method using the κ-error diagram, with CART as the base classifier. We also checked the relation between the two combination rules and several diversity measures.

We used 15 datasets from the UCI Machine Learning Repository.

We used the mean of 10 repetitions of 10-fold cross-validation (CV) to compute the error of all methods.
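The error estimate described above, the mean over 10 repetitions of 10-fold CV, can be sketched as follows. Reshuffling the data before each repetition is an assumption, as are the function names; `fit_and_score` is a hypothetical callback that trains on the given training indices and returns the error on the test indices.

```python
import random


def repeated_kfold(n, k=10, repeats=10, seed=0):
    """Yield (repetition, fold, train_indices, test_indices) tuples."""
    rng = random.Random(seed)
    for r in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)                       # new random partition per repetition
        folds = [idx[i::k] for i in range(k)]  # k nearly equal-sized folds
        for f in range(k):
            test = folds[f]
            train = [i for g in range(k) if g != f for i in folds[g]]
            yield r, f, train, test


def repeated_cv_error(fit_and_score, n, k=10, repeats=10, seed=0):
    """Mean test error over all repeats * k train/test splits."""
    errors = [fit_and_score(train, test)
              for _, _, train, test in repeated_kfold(n, k, repeats, seed)]
    return sum(errors) / len(errors)


# Usage with a dummy scorer that always reports an error of 0.5.
print(repeated_cv_error(lambda train, test: 0.5, n=50))  # 0.5
```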

Experiments with linear classifiers-discussion

Linear classifiers usually have high bias and low variance, so ensembles of linear classifiers usually do not improve much on the error rate of a single linear classifier.

In the new ensemble method:

- Due to the selection phase, the base linear classifiers generated are more accurate (which implies that they have lower bias than the single classifier).
- Because a weaker version of the linear classifier is trained whenever a classifier is rejected, the variance of the base classifiers increases; bagging-type ensembles are known to reduce the variance of the base classifiers.

Experiments with linear classifiers-results FLD

TABLE: Misclassification Error of single FLD and Ensembles of FLD

Experiments with linear classifiers-results LOGLC

TABLE: Misclassification Error of single LogLC and Ensembles of LogLC

Experiment with cart-error report

Experiment with cart-diversity

TABLE: Correlation between the Trimmean and Spread combination rules and several diversity measures.

It is apparent from the table that there is no substantial correlation between the combination rules and the diversity measures.

Experiment with cart-diversity

FIGURE: Kappa-error diagram for the new ensemble of CART with the Trimmean combination rule. Only the cloud centers for the Sonar and Wisconsin datasets are shown.

Conclusion

- The proposed ensemble is able to reduce the error of linear classifiers.
- It performed better with FLD than with LogLC.
- Both combination rules performed similarly with the linear classifiers.
- The new ensemble method produced a lower error rate than Bagging and Adaboost.
- With CART, the new ensemble method performed similarly to Bagging and Adaboost, but worse than Rotation Forest.
- The Trimmean and Spread rules have low correlation with several diversity measures, from which nothing significant can be concluded.
- In the κ-error diagram, the new ensemble with Trimmean shows either low diversity with high accuracy or high diversity with low accuracy.
