
Ensemble techniques for Parallel Genetic Programming based Classifiers



  1. Ensemble techniques for Parallel Genetic Programming based Classifiers Gianluigi Folino, Clara Pizzuti, Giandomenico Spezzano folino, pizzuti, spezzano@icar.cnr.it

  2. Outline • Introduction • Classification • Bagging and ensemble techniques • CGPC and ensemble techniques • The algorithm CGPC • BagCGPC • Experimental results • Conclusions and Future Work • More experiments • Boosting techniques

  3. Classification task Supervised learning technique that identifies common characteristics in a set of objects and categorizes them into different groups.

  4. Decision tree and GP • Nodes (attributes) → functions • Arcs (attribute values) → arity of the functions • Leaves (classes) → terminals

  5. Ensemble techniques • Bagging and Boosting aggregate multiple hypotheses generated by the same learning algorithm invoked over different distributions of training data [Breiman 1996, Freund & Schapire 1996]. • Bagging and Boosting generate a classifier with a smaller error on the training data, since they combine multiple hypotheses which individually have a larger error.

  6. Bagging [Diagram: the training sets Training set1 … Training setT each train a classifier C1(x), …, CT(x); their majority vote gives cBA(x)] • From the overall training set S, randomly sample (with replacement) T different training sets S1,...,ST of size N. • For each sample set St obtain a hypothesis Ct. • To an unseen instance x assign the majority classification cBA(x) among the classifications Ct(x) of the hypotheses Ct.
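To make the bagging procedure above concrete, here is a minimal sketch in Python. It is not part of CGPC/BagCGPC: base_learner is a hypothetical stand-in for any (unstable) learner that returns a callable classifier.

```python
import random
from collections import Counter

def bagging_train(train_set, base_learner, T, N):
    """Train T hypotheses C1..CT, each on a bootstrap sample St of size N."""
    hypotheses = []
    for _ in range(T):
        # sample with replacement from the overall training set S
        sample = [random.choice(train_set) for _ in range(N)]
        hypotheses.append(base_learner(sample))
    return hypotheses

def bagging_classify(hypotheses, x):
    """Assign to the unseen instance x the majority class cBA(x)."""
    votes = Counter(h(x) for h in hypotheses)
    return votes.most_common(1)[0][0]
```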

  7. Bagging • Bagging requires "unstable" classifiers, such as decision trees or concept learners. • "The vital element is the instability of the prediction method. If perturbing the learning set can cause significant changes in the predictor constructed, then bagging can improve accuracy." (Breiman 1996)

  8. Boosting • Boosting maintains a weight wi for each instance <xi, ci> in the training set. • The higher the weight wi, the more the instance xi influences the next hypothesis learned. • At each trial, the weights are adjusted to reflect the performance of the previously learned hypothesis, with the result that the weight of correctly classified instances is decreased and the weight of incorrectly classified instances is increased.

  9. Boosting • Construct a hypothesis Ct from the current distribution of instances described by the weights wt. • Adjust the weights according to the classification error et of classifier Ct. • The strength at of a hypothesis depends on its training error et: at = ½ ln((1 − et)/et). [Diagram: the set of instances xi with weights wit is used to learn hypothesis Ct with strength at; the weights are then adjusted.]
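A minimal sketch of this weight-update trial, assuming a generic AdaBoost-style scheme with a hypothetical learn function and weights that sum to 1 (a sketch of the standard boosting step, not the authors' implementation):

```python
import math

def boosting_round(instances, labels, weights, learn):
    """One boosting trial: learn Ct on the current weighted distribution, then reweight."""
    C_t = learn(instances, weights)                    # hypothesis from the current distribution
    predictions = [C_t(x) for x in instances]
    # weighted training error e_t of C_t (weights assumed to sum to 1)
    e_t = sum(w for w, y, p in zip(weights, labels, predictions) if p != y)
    a_t = 0.5 * math.log((1 - e_t) / e_t)              # strength a_t of the hypothesis
    # decrease the weight of correctly classified instances, increase the others
    new_weights = [w * math.exp(-a_t if p == y else a_t)
                   for w, y, p in zip(weights, labels, predictions)]
    total = sum(new_weights)
    return C_t, a_t, [w / total for w in new_weights]  # renormalised distribution
```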

  10. Using small bags • In bagging and boosting the size of each training set is the same as that of the original dataset. • This is impractical for real datasets. • Breiman uses small samples of the training set; if the dataset is sufficiently complex, the results are comparable with bagging and boosting.

  11. CGPC (What it is) • CGPC (Cellular Genetic Programming for Data Classification) is our parallel tool for data classification that evolves decision trees. • It handles large populations and reduces the execution time. • The cellular model improves the accuracy of the classifier.

  12. CGPC (How it works) • The population is arranged in a two-dimensional grid, where each point represents a program tree. • CAGE uses a one-dimensional domain decomposition along the x direction. • For each element in the grid: • Mutation and the unary operators are applied to the current tree. • Crossover chooses as second parent the best tree among the neighbours (Moore neighbourhood). • A replacement policy is applied: the chosen individual is put in the new population in the same position as the old one.
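A schematic, sequential sketch of one generation of this cellular scheme. The functions fitness, crossover and mutate are hypothetical placeholders, the replacement policy shown is only one possible choice, and the real CGPC evaluates the grid in parallel across the processing elements:

```python
def moore_neighbours(grid, i, j):
    """The 8 trees surrounding position (i, j), with toroidal wrap-around."""
    rows, cols = len(grid), len(grid[0])
    return [grid[(i + di) % rows][(j + dj) % cols]
            for di in (-1, 0, 1) for dj in (-1, 0, 1) if (di, dj) != (0, 0)]

def cellular_step(grid, fitness, crossover, mutate):
    """One generation: each cell mates with its best neighbour; the offspring is
    placed in the new population in the same grid position."""
    new_grid = [row[:] for row in grid]
    for i, row in enumerate(grid):
        for j, tree in enumerate(row):
            mate = max(moore_neighbours(grid, i, j), key=fitness)  # best neighbour as second parent
            offspring = mutate(crossover(tree, mate))
            # replacement policy (one possible choice): keep the fitter of parent and offspring
            new_grid[i][j] = max((tree, offspring), key=fitness)
    return new_grid
```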

  13. CGPC (Software Architecture) • Each processing element (PE) handles a slice of the population. • A parallel file system is used to partition the dataset across different disks. • All the PEs need the entire dataset.

  14. BagCGPC • Extension of CGPC to generate an ensemble of classifiers (as in bagging). • Each PE hosts a subpopulation that generates a classifier. • Each subpopulation works on a sample of the entire training set. • K subpopulations generate K classifiers using different samples.

  15. BagCGPC • All the classifiers vote, and each tuple is assigned to the most voted class (as in bagging). • The subpopulations do not evolve independently. • They exchange their outermost individuals with neighbouring subpopulations, as in the cellular model. • Our experiments will show the positive effect of these exchanges.
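The exchange of outermost individuals might be sketched as follows, assuming each subpopulation is a 2D grid (list of rows) arranged in a ring of neighbours; this is a simplified, sequential stand-in, since the slide does not detail the actual migration policy used between processors:

```python
def exchange_boundaries(subpops):
    """Copy each subpopulation's outermost column of individuals into the border
    cells of its ring neighbours (sequential stand-in for the parallel exchange)."""
    # snapshot the borders first, so every exchange uses the pre-exchange individuals
    left_cols = [[row[0] for row in grid] for grid in subpops]
    right_cols = [[row[-1] for row in grid] for grid in subpops]
    k = len(subpops)
    for p, grid in enumerate(subpops):
        for i, row in enumerate(grid):
            row[0] = right_cols[(p - 1) % k][i]   # rightmost individuals of the left neighbour
            row[-1] = left_cols[(p + 1) % k][i]   # leftmost individuals of the right neighbour
```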

  16. Datasets (one large)

  17. Accuracy and execution time • Comparison of the execution times and accuracy of CGPC and BagCGPC (with and without communication), using 5 classifiers and 1/5 of the training set on 5 processors.

  18. Cens dataset (accuracy) A population of 100 trees is used to generate each classifier (so, for ten classifiers, a population of 1000 elements is used for CGPC and for BagCGPC).

  19. Cens dataset (execution time) For example, using five classifiers, a population of 500 elements and 50000 tuples, CGPC requires 6053 secs, BagCGPC 1117 secs (4081 without communications).

  20. Conclusions • An extension of CGPC to induce an ensemble of predictors was presented. • It copes with large datasets that do not fit in memory. • By choosing a suitable sample size, high accuracy can be obtained with a small number of classifiers.

  21. Conclusions • Sharing of information in the population produces trees of smaller size. • The method is fault tolerant. Future work • More experiments on large datasets. • New experiments using boosting.
