Explore the transformation of accurate opaque models into comprehensible ones using genetic programming with clustering in soft computing. This work balances accuracy and comprehensibility by evolving fuzzy decision trees. Experiments on classification datasets such as IRIS and WINE compare Fuzzy GP with standard GP. The discussion covers handling outliers and the use of categorical variables. Future work includes alternative membership functions and fuzzy regression for predictive improvement.
Evolving Fuzzy Rules with Genetic Programming and Clustering (Soft Computing for Control)
G-REX (Previous work) • The transformation of a highly accurate opaque model into a comprehensible model • Genetic programming • Black box • Arbitrary representation and fitness function • Balances accuracy and comprehensibility • Example tree: IF Age > 25 THEN (IF Salary > 5000 THEN Accept ELSE Reject) ELSE Reject
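One plausible reading of the slide's example tree can be sketched in Python; the tuple representation and names here are illustrative, not G-REX's actual encoding:

```python
def evaluate(tree, instance):
    """Walk a nested (attribute, threshold, if_true, if_false) tree to a leaf label."""
    if isinstance(tree, str):            # leaf: a class label
        return tree
    attribute, threshold, if_true, if_false = tree
    branch = if_true if instance[attribute] > threshold else if_false
    return evaluate(branch, instance)

# One reading of the example: IF Age > 25 THEN (IF Salary > 5000
# THEN Accept ELSE Reject) ELSE Reject
rule = ("Age", 25, ("Salary", 5000, "Accept", "Reject"), "Reject")

print(evaluate(rule, {"Age": 30, "Salary": 6000}))  # Accept
print(evaluate(rule, {"Age": 20, "Salary": 9000}))  # Reject
```

Because the tree is an ordinary nested data structure, GP crossover and mutation can operate on subtrees directly.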
Background • Evolving Fuzzy Decision Trees with Genetic Programming and Clustering • J. Eggermont (2001) • Automatic fuzzification using K-Means • Genetic programming • Fuzzy representation
Membership functions • Three types of membership function • Distances do not need to be equal • Based on medoids/centroids
K-means • Most frequently used clustering method • Fast, deterministic and easy to implement • J. B. MacQueen (1967) • K stands for the number of clusters • Each cluster is represented by one membership function • A cluster is represented by a centroid: the mean value of its members • An instance belongs to the closest centroid
1 • Euclidean distance
2 • The new centroid is the mean of its members
3 • Reassign instances to their closest centroids • Repeat until no change
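The three steps above can be sketched in plain NumPy (a minimal illustration; the initial centroids are supplied by the caller, since K-Means is sensitive to initialization, and clusters are assumed to stay non-empty):

```python
import numpy as np

def kmeans(X, k, centroids, n_iter=100):
    """Plain K-means: assign each instance to its closest centroid
    (Euclidean distance), move each centroid to the mean of its
    members, repeat until assignments stop changing."""
    labels = None
    for _ in range(n_iter):
        # Step 1: Euclidean distance from every instance to every centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(labels, new_labels):
            break  # Step 3: stop when no assignment changes
        labels = new_labels
        # Step 2: the new centroid is the mean of its members
        centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return centroids, labels

X = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.8]])
centroids, labels = kmeans(X, 2, centroids=X[[0, 2]].copy())
print(centroids)  # [[1.1 0.9], [5.1 4.9]]
```

With two well-separated groups the algorithm converges after a single reassignment pass.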
Kaufman's Initialization • K-Means is sensitive to the initialization method • Peña J. M., Lozano J. A. and Larrañaga P. (1999) • An Empirical Comparison of Four Initialization Methods for the K-Means Algorithm • Step 1: Choose the instance closest to the mean value • Steps 2-3: Choose an instance far away from the other medoids with many instances close by
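A hedged sketch of the idea, assuming the usual "gain" formulation (each later seed is the instance that would pull the most points away from the seeds chosen so far); the exact details in the cited paper may differ:

```python
import numpy as np

def kaufman_init(X, k):
    """Seed 1 is the instance closest to the overall mean; each later
    seed maximizes the total distance reduction it would offer to
    instances currently served by an already-chosen seed."""
    seeds = [int(np.linalg.norm(X - X.mean(axis=0), axis=1).argmin())]  # step 1
    pair = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # pairwise distances
    while len(seeds) < k:
        d_near = pair[:, seeds].min(axis=1)              # distance to nearest chosen seed
        gains = np.maximum(d_near[:, None] - pair, 0.0).sum(axis=0)
        gains[seeds] = -1.0                              # never reselect a seed
        seeds.append(int(gains.argmax()))                # steps 2-3
    return seeds

X = np.array([[1.0, 1.0], [1.2, 0.8], [1.1, 1.1], [5.0, 5.0], [5.2, 4.8]])
print(kaufman_init(X, 2))  # first seed is the most central instance (index 2)
```

Unlike random seeding, this procedure is deterministic, which matches the "fast, deterministic" framing of the K-means slides.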
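With centroids in hand, one membership-function shape can be illustrated as a triangle spanning neighbouring centroids; this triangular form is an assumption for illustration, and the spacing need not be equal:

```python
def triangular_membership(x, left, center, right):
    """Triangular membership function peaking at `center`; `left` and
    `right` are the neighbouring centroids where membership reaches 0."""
    if x <= left or x >= right:
        return 0.0
    if x <= center:
        return (x - left) / (center - left)
    return (right - x) / (right - center)

# Hypothetical centroids for one continuous variable; note the
# distances between them are not equal.
centroids = [4.3, 5.8, 7.1]
x = 5.0
mu_low = triangular_membership(x, centroids[0] - 1.5, centroids[0], centroids[1])
mu_mid = triangular_membership(x, centroids[0], centroids[1], centroids[2])
print(round(mu_low, 3), round(mu_mid, 3))  # 0.533 0.467
```

An instance thus belongs to several fuzzy sets at once, with degrees that sum to 1 between adjacent centroids.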
GP Representation • All variables with fewer than k unique values are treated as crisp sets.
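The rule can be sketched as follows (an illustrative helper, not the paper's code): a low-cardinality variable gets 0/1 indicator memberships instead of fuzzy ones.

```python
def crisp_sets(values, k=3):
    """A variable with fewer than k unique values is treated as a crisp
    set: membership is 0 or 1 per value. Variables with k or more unique
    values would instead be fuzzified via clustering."""
    unique = sorted(set(values))
    if len(unique) >= k:
        raise ValueError("continuous-like variable: fuzzify with clustering")
    # one indicator membership vector per unique value
    return {v: [1.0 if u == v else 0.0 for u in unique] for v in unique}

print(crisp_sets(["yes", "no", "yes"]))  # {'no': [1.0, 0.0], 'yes': [0.0, 1.0]}
```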
Fitness function • Accuracy alone is not precise enough • The reward is equal to the membership value for a correctly predicted instance • 1 - the MSE of each membership function
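The membership-weighted reward can be sketched as below (illustrative data and names; the paper's full fitness also involves the 1 - MSE term, which is omitted here):

```python
def fitness(predictions, targets, memberships):
    """Reward each correctly classified instance by its membership
    value; misclassified instances contribute nothing. This makes the
    fitness more precise than plain accuracy: confident correct
    predictions score higher than borderline ones."""
    rewards = [mu if pred == true else 0.0
               for pred, true, mu in zip(predictions, targets, memberships)]
    return sum(rewards) / len(rewards)

preds   = ["setosa", "setosa", "virginica"]
targets = ["setosa", "versicolor", "virginica"]
mus     = [0.9, 0.6, 0.7]   # membership of each instance in its predicted set
print(fitness(preds, targets, mus))  # (0.9 + 0 + 0.7) / 3
```

Two rule sets with equal accuracy can thus receive different fitness, depending on how strongly their correct predictions are supported.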
Experiments • 5 classification datasets • Only continuous variables: IRIS, WINE • Categorical and continuous: COLIC, CLEVELAND, PIMA • 10-fold cross-validation with stratification • Fuzzy GP vs standard GP (IF-rules) • Evaluated on • Accuracy (ACC) • Area under the ROC curve (AUC) • Brier score (BRI)
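The three evaluation measures can be computed for a toy binary problem as follows (a self-contained sketch, not the paper's evaluation code):

```python
def acc_auc_brier(y_true, p_pos):
    """Toy versions of the three measures: accuracy at a 0.5 threshold,
    AUC as the probability that a random positive is ranked above a
    random negative, and the Brier score as the mean squared error of
    the predicted probabilities."""
    n = len(y_true)
    acc = sum((p >= 0.5) == bool(y) for y, p in zip(y_true, p_pos)) / n
    pos = [p for y, p in zip(y_true, p_pos) if y == 1]
    neg = [p for y, p in zip(y_true, p_pos) if y == 0]
    auc = sum((pp > pn) + 0.5 * (pp == pn)
              for pp in pos for pn in neg) / (len(pos) * len(neg))
    brier = sum((p - y) ** 2 for y, p in zip(y_true, p_pos)) / n
    return acc, auc, brier

y = [1, 0, 1, 0]
p = [0.9, 0.2, 0.6, 0.4]
print(acc_auc_brier(y, p))  # accuracy and AUC are both 1.0 here
```

ACC only looks at thresholded predictions, AUC at their ranking, and Brier at the calibration of the probabilities, so a model can do well on one measure and poorly on another.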
Discussion • The current membership function removes information from the variable • A way to handle outliers • Some extremely simple IF-rules are better for some datasets • Categorical variables • Should not be used as the only method • Easy-to-remember rules, but how accurate will they be as decision support? • Gives a comprehensible explanation that could add trust and thereby improve predictions
Future work • Alternative membership functions • Fuzzy regression?