
Towards a Mapping of Modern AIS and Learning Classifier Systems


Presentation Transcript


  1. Towards a Mapping of Modern AIS and Learning Classifier Systems Larry Bull Department of Computer Science & Creative Technologies University of the West of England, U.K.

  2. Background • For 25 years correlations between aspects of AIS and Learning Classifier Systems (LCS) have been highlighted. • Neither field appears to have benefitted. • More recently, an LCS has been presented for unsupervised learning which, with hindsight, may be viewed as a form of AIS. • Purpose is to bring this LCS to the attention of the AIS community with the aim of serving as a catalyst for sharing ideas and mechanisms.

  3. LCS in a Nutshell • Invented by John Holland circa 1976. • Consist of an “ecology” of rules. • IF <states> AND <action> THEN Reward • Traditionally use reinforcement learning techniques to approximate rule utility. • Use evolutionary computing techniques to discover new rules. • Often incorporate other heuristics.
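As a concrete illustration of the rule structure above, here is a minimal sketch (hypothetical names, not the author's code) of a ternary-condition rule with an action and a running reward prediction:

```python
from dataclasses import dataclass

@dataclass
class Rule:
    condition: str            # e.g. "10#0", where '#' is a wildcard ("don't care")
    action: int
    prediction: float = 10.0  # running estimate of the expected reward

    def matches(self, state: str) -> bool:
        # A rule matches when every non-'#' bit of its condition equals the state bit.
        return all(c in ('#', s) for c, s in zip(self.condition, state))

rule = Rule(condition="10#0", action=1)
print(rule.matches("1010"), rule.matches("1110"))  # True False
```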

  4. [figure: LCS schematic - the Environment supplies a state and reward; the population [P] holds rules such as 10#0:11; matching rules form the match set [M]; a prediction array (e.g. 0, 10, 2, 9) drives action selection into the action set [A]; Q-learning-style updates are applied to [A] and the previous action set [A]-1; an EA operates on the rules]

  5. Learning Classifier Systems Family Tree 1978-2008 [figure: CS-1 (Holland & Reitman '78); LCS (Holland '80); Gofer (Booker '82); Animat (Wilson '85); Boole (Wilson '87); CFCS2 (Riolo '90); New Boole (Bonelli et al. '90); ZCS (Wilson '94); XCS (Wilson '95); ACS (Stolzmann '98); XCSF (Wilson '00); ACS2 (Butz et al. '02); UCS (Bernado-Mansilla & Garrell '03); XCSC (Tammee et al. '08); branches span supervised, unsupervised, reinforcement, regression and model-building approaches]

  6. From LCS to AIS • Recently presented a novel variant of XCS for data clustering. • The approach exploits the mechanisms inherent to XCS but for unsupervised learning. • Aim is to develop an approach to learning rules which accurately describe clusters - without prior assumptions as to their number within a given dataset. • With hindsight, the approach is a form of clonal selection AIS.

  7. YCSC Schematic [figure: data points are matched against the rule population [P] to form the match set [M]; error updates are applied to [M]; an EA operates on [P]; the rules act as cluster descriptors]

  8. Rule Representation: Bounded Affinity • A condition consists of intervals: { {c1, s1}, …, {cd, sd} } • c is the interval's range centre, in [0.0, 1.0] • s is the "spread" from that centre (truncated). • d is the number of dimensions. • Each interval predicate's lower and upper bounds are calculated as [ci - si, ci + si].
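A minimal sketch of this interval matching, assuming the bounds above (illustrative only, not the author's implementation):

```python
from typing import List, Tuple

def matches(condition: List[Tuple[float, float]], x: List[float]) -> bool:
    """condition is a list of (centre, spread) pairs, one per dimension d."""
    return all(c - s <= xi <= c + s for (c, s), xi in zip(condition, x))

cond = [(0.5, 0.1), (0.2, 0.05)]    # d = 2 dimensions
print(matches(cond, [0.55, 0.22]))  # True: inside both intervals
print(matches(cond, [0.70, 0.22]))  # False: first dimension out of range
```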

  9. Fitness • Each rule maintains a running estimate of its matching error and niche size. • The error e is derived from the Euclidean distance between the input x and the centres c in the condition of each member of [M]:
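A plausible form of this update, assumed here by analogy with the Widrow-Hoff niche-size update on the next slide (this form is an assumption, not taken verbatim from the original), is: ej ← ej + b(||x - cj|| - ej), where b is the learning rate and cj is the vector of interval centres of rule j.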

  10. Niches • Niche size estimates (s) are based on match sets, i.e., number of concurrently active rules: sj ← sj + b(|[M]| - sj) • A time-triggered Genetic Algorithm is run in the match sets.
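A short sketch of the two running estimates, assuming the error-update form given above (illustration only; b as on the experimental-detail slide):

```python
import math

BETA = 0.2  # learning rate b

def update_rule(rule, x, match_set_size):
    # Error estimate moves toward the Euclidean distance from the rule's
    # interval centres to the input (assumed Widrow-Hoff form).
    dist = math.sqrt(sum((xi - c) ** 2 for (c, _s), xi in zip(rule["condition"], x)))
    rule["error"] += BETA * (dist - rule["error"])
    # Niche-size estimate moves toward the current match-set size |[M]|.
    rule["niche_size"] += BETA * (match_set_size - rule["niche_size"])

rule = {"condition": [(0.5, 0.1), (0.2, 0.05)], "error": 0.0, "niche_size": 1.0}
update_rule(rule, [0.55, 0.22], match_set_size=12)
```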

  11. Selection • All rules maintain a time-stamp of the cycle on which they were last in an [M] where the GA was used. • If qGA cycles or more have passed on average for all rules in the current [M], the GA is triggered. • The GA uses roulette-wheel selection with a scaled fitness function: Fitness = 1 / (e^v + 1) • Time-stamps are reset for all members of [M].
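A sketch of the GA trigger and fitness scaling just described (the rule fields and overall structure are assumptions for illustration):

```python
V = 5          # fitness scaling exponent v
THETA_GA = 12  # GA trigger threshold qGA

def fitness(error: float) -> float:
    # Low matching error gives fitness close to 1; high error drives it toward 0.
    return 1.0 / (error ** V + 1.0)

def ga_should_run(match_set, t: int) -> bool:
    # Trigger the GA when, on average, at least THETA_GA cycles have passed
    # since the rules in this [M] last took part in a GA.
    avg_since_ga = sum(t - r["timestamp"] for r in match_set) / len(match_set)
    return avg_since_ga >= THETA_GA
```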

  12. Search • Offspring are produced via mutation (probability m), where an allele is mutated by adding an amount + or - rand(m0). • Crossover (probability c, two-point) can occur between any two alleles, i.e., within an interval predicate as well as between predicates. • If no rules match on a given time step, a covering operator creates a rule with its condition centred on the input value and each spread drawn from rand(s0); the new rule replaces an existing member of the rulebase.
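A sketch of the mutation and covering operators on this slide (random-number conventions are assumptions, not the author's exact code):

```python
import random

MU = 0.04   # per-allele mutation probability m
M0 = 0.006  # mutation step range m0
S0 = 0.03   # covering spread range s0

def mutate(condition):
    # Each allele (centre or spread) is perturbed by +/- rand(M0) with probability MU.
    return [(c + random.uniform(-M0, M0) if random.random() < MU else c,
             s + random.uniform(-M0, M0) if random.random() < MU else s)
            for c, s in condition]

def cover(x):
    # Create a rule centred on the input, with each spread drawn from rand(S0).
    return [(xi, random.uniform(0.0, S0)) for xi in x]
```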

  13. Replacement • Rule replacement is population wide and proportional to niche occupancy. • Each rule maintains an estimate of the size of the [M] in which it occurs. • Roulette-wheel selection. • Encourages all niches to contain the same number of rules, so rule resources are balanced across niches.
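A sketch of this niche-proportionate replacement (illustration only): deletion probability is proportional to each rule's niche-size estimate, so over-full niches lose rules first.

```python
import random

def select_for_deletion(population):
    # Roulette-wheel deletion weighted by the estimated size of each rule's [M].
    weights = [r["niche_size"] for r in population]
    return random.choices(population, weights=weights, k=1)[0]
```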

  14. Learning Process [figure: fitness (1/error) within a niche, on a 0-1 scale, plotted against generalization up to the maximally general rule]

  15. Experiments • Clustering is an important unsupervised classification technique in which a set of data is grouped into clusters. • This is done in such a way that data in the same cluster are similar in some sense and data in different clusters are dissimilar in the same sense.

  16. Some Data • Used randomly generated synthetic datasets. • The first dataset is well-separated and has k = 25 true clusters arranged in a 5x5 grid in d = 2 dimensions. • Each cluster is generated from 400 data points using a Gaussian distribution with a standard deviation of 0.02, for a total of n = 10,000 data points. • The second dataset is not well-separated; it is generated in the same way as the first except that the clusters are not centred within their cells of the grid.
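A sketch of how the well-separated dataset might be generated (the grid spacing and random seed are assumptions; k = 25, 400 points per cluster and sigma = 0.02 are from the slide):

```python
import numpy as np

rng = np.random.default_rng(0)
# 5x5 grid of cluster centres in the unit square (spacing assumed).
centres = [(0.1 + 0.2 * i, 0.1 + 0.2 * j) for i in range(5) for j in range(5)]
# 400 Gaussian points per cluster, standard deviation 0.02.
data = np.vstack([rng.normal(loc=c, scale=0.02, size=(400, 2)) for c in centres])
print(data.shape)  # (10000, 2): n = 10,000 data points
```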

  17. Examples

  18. Experimental Detail • The parameters used were: N = 800, b = 0.2, v = 5, c = 0.8, m = 0.04, qGA = 12, s0 = 0.03, m0 = 0.006. • All results presented are the average of ten runs. • Learning trials consisted of 200,000 presentations of a randomly sampled data point.

  19. Example Initial Results

  20. Compaction • Many overlapping rules are seen around each true cluster. • Developed a four-step rule compaction algorithm to remove overlaps (sketched below): • Delete useless rules (very low coverage) • Sort on numerosity • Sort on error • Extract the rules with the largest [M]
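A rough sketch of the four steps (the coverage threshold and the greedy extraction of largest-[M] rules are assumptions for illustration):

```python
def compact(population, n_data, min_coverage=0.01):
    # 1. Delete useless rules: those matching a very low fraction of the data.
    kept = [r for r in population if len(r["matched"]) / n_data >= min_coverage]
    # 2.-3. Sort on numerosity, then on error (most numerous, most accurate first).
    kept.sort(key=lambda r: (-r["numerosity"], r["error"]))
    # 4. Greedily extract rules until every matched data point is covered
    #    by some retained rule, one rule per remaining niche.
    covered, final = set(), []
    for r in kept:
        if not set(r["matched"]) <= covered:
            final.append(r)
            covered |= set(r["matched"])
    return final
```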

  21. Example Result after Compaction

  22. Comparative Performance • As the measure of quality of each clustering solution we use the total of the k-means objective function. • On the well-separated dataset the quality of the LCS was 8.12 +/- 0.54 and the number of clusters found 25.0 +/- 0. • The average quality on the not well-separated dataset was 24.50 +/- 0.56 and the number of clusters 14.0 +/- 0. • The k-means algorithm (k = 25), averaged over 10 runs, gives a quality of 32.42 +/- 9.49 and 21.07 +/- 5.25 on the well-separated and less-separated datasets respectively.
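The quality figure is the k-means objective, i.e. the total squared distance from each data point to its nearest cluster centre; a minimal computation (assumed form) is:

```python
import numpy as np

def kmeans_objective(data: np.ndarray, centres: np.ndarray) -> float:
    # Squared distance from every point to every centre, then take the
    # nearest centre per point and sum over all points.
    d2 = ((data[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return float(d2.min(axis=1).sum())
```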

  23. Comparative Performance II • To estimate the number of clusters, we ran k-means 10 times for each k from 2 to 30, with different random initializations. • To select the best clustering across the different numbers of clusters, the Davies-Bouldin validity index was used. • For the well-separated dataset the index indicates 23 clusters; for the less-separated dataset it indicates 14 clusters. • Thus the LCS does better on the well-separated data, correctly identifying all 25 clusters.
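A sketch of this model-selection procedure using scikit-learn (the original work may have used its own implementations; lower Davies-Bouldin values indicate better clusterings):

```python
from sklearn.cluster import KMeans
from sklearn.metrics import davies_bouldin_score

def best_k(data, k_range=range(2, 31), runs=10):
    best_scores = {}
    for k in k_range:
        # Keep the best (lowest) index over several random initializations.
        best_scores[k] = min(
            davies_bouldin_score(
                data,
                KMeans(n_clusters=k, n_init=1, random_state=r).fit_predict(data))
            for r in range(runs))
    return min(best_scores, key=best_scores.get)
```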

  24. A Network-like Extension • One of the missing parts of XCS is a niche fitness sharing mechanism. • Here rules adjust their fitnesses based on the fitnesses of the other co-active rules. • Termed relative accuracy (f'): f' = f / Σf, where Σf is the sum of fitnesses of the co-active rules.
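A minimal sketch of this fitness sharing (illustration only): each rule's fitness is normalised by the total fitness of the rules co-active with it in [M].

```python
def share_fitness(match_set):
    total = sum(r["fitness"] for r in match_set)
    for r in match_set:
        # Relative accuracy f' = f / sum of f over the co-active rules.
        r["relative_fitness"] = r["fitness"] / total
```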

  25. Gives Improved Performance

  26. Conclusions • Similarities (and differences) between AIS and LCS have long been noted. • Views taken from many different perspectives: dynamical systems, networks, complex adaptive systems, etc. • A recently presented LCS as a clustering technique is essentially a clonal selection AIS. • Can mechanisms from both fields now be consolidated to mutual benefit?

  27. Some Possibilities • Theory and mechanisms for generalization. • Adaptive rates of search. • Theory from ensembles/mixture-of-experts. • Representation schemes. • Memory. • N.B. A new theory of neuronal replicators implies innate and adaptive components in learning.
