Swarm Intelligence Research in Bioinformatics at University of Kent

PSO for Bioinformatics
Alex Freitas and Colin Johnson, University of Kent

People involved in swarm intelligence research at Kent (1)
• XPS Project
  • Alex Freitas (Lecturer)
  • Colin Johnson (Lecturer)
  • Elon Correa (RA, will start soon)
  • Mudassar Iqbal (PhD student, started Nov. 2004)
    • Initially investigating a dynamic neighborhood topology
    • Interested in bioinformatics; problem still to be defined

People involved in swarm intelligence research at Kent (2)
• Other research students
  • Terry Arnold (3rd-year PhD student)
    • Supervised by Colin
    • Doing research on force-based PSO
  • Nick Holden (MRes student)
    • Supervised by Alex
    • Doing research on “A hybrid PSO/ACO algorithm for hierarchical classification of biological data (enzymes)”
  • Allen Chan (MRes student)
    • Supervised by Alex
    • Doing research on an ACO algorithm for classification of biological data (a multi-label classification problem)

Introduction to Classification
• Each record (example) belongs to a predefined class
• Each example consists of two parts:
  • < predictor attributes, class attribute >, e.g.:
    • < Gender = M, Age = 25, Salary = 35,000, credit = good >
    • < attributes_describing_protein, function = transport >
• Goal: to predict the class of an example, based on the values of the predictor attributes for that example
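
To make the record format concrete, here is a minimal Python sketch of rule-based classification, the style of classifier used later in these slides. The rules and thresholds are hypothetical, chosen only for illustration; the first rule that covers a record supplies its predicted class.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Example = Dict[str, Any]
# A rule: a list of (attribute, test) conditions plus the class it predicts.
Rule = Tuple[List[Tuple[str, Callable[[Any], bool]]], str]


def covers(rule: Rule, example: Example) -> bool:
    """True if the example satisfies every condition of the rule."""
    conditions, _ = rule
    return all(test(example[attr]) for attr, test in conditions)


def predict(rules: List[Rule], example: Example, default: Optional[str] = None) -> Optional[str]:
    """Return the class of the first rule that covers the example, else `default`."""
    for rule in rules:
        if covers(rule, example):
            return rule[1]
    return default


# Hypothetical rules for the credit example (thresholds are made up).
rules: List[Rule] = [
    ([("Age", lambda a: a >= 21), ("Salary", lambda s: s >= 30000)], "good"),
    ([("Salary", lambda s: s < 30000)], "bad"),
]

# Predictor attributes only; the class attribute (credit) is what we predict.
record = {"Gender": "M", "Age": 25, "Salary": 35000}
print(predict(rules, record))  # prints: good
```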

Hierarchical classification (1)
• Hierarchical classes: Enzyme Commission (EC) codes have 4 levels, e.g. EC.1.1.1.1
[Figure: EC class tree, from “root” through the most general classes (1, 2) down towards the most specific classes (1.1, 1.2, 1.3 under 1; 2.1, 2.2 under 2)]
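
The EC hierarchy can be handled as a simple tree keyed by code prefixes. A minimal sketch (Python 3.9+), assuming codes are written as on the slide ("EC.1.1.1.1"); the helper names `ec_levels` and `build_hierarchy` are hypothetical:

```python
def ec_levels(code: str) -> list[str]:
    """Ancestor classes of an EC code, most general first."""
    digits = code.removeprefix("EC.").split(".")
    return [".".join(digits[: i + 1]) for i in range(len(digits))]


def build_hierarchy(codes: list[str]) -> dict[str, set[str]]:
    """Map each class (plus a virtual 'root') to the set of its child classes."""
    children: dict[str, set[str]] = {}
    for code in codes:
        path = ["root"] + ec_levels(code)
        for parent, child in zip(path, path[1:]):
            children.setdefault(parent, set()).add(child)
    return children


print(ec_levels("EC.1.1.1.1"))  # ['1', '1.1', '1.1.1', '1.1.1.1']
```

Leaf classes (full 4-digit codes) never appear as keys, since they have no children.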

Hierarchical classification (2)
• Challenges
  • Several predictions must be made for each example: one predicted class at each level of the hierarchy
  • As we go down the hierarchy, there are fewer examples (records) per class (“data fragmentation”)
• Opportunities
  • Information about “class similarities” in the hierarchy
  • Top-down approach: first predict the top-level class, then predict the second-level class among the children of the predicted top-level class, etc., until a leaf class is predicted
  • The cost of misclassifying 1.1 as 1.2 is smaller than the cost of misclassifying 1.1 as 2.1
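
The top-down approach and the idea of hierarchy-dependent costs can be sketched as follows. The per-node classifiers are hypothetical placeholders, and the cost function (number of levels below the deepest shared ancestor) is just one possible way to make misclassifying 1.1 as 1.2 cheaper than misclassifying 1.1 as 2.1; the slides do not specify a cost scheme.

```python
from typing import Any, Callable, Dict, List


def top_down_predict(example: Any, classifiers: Dict[str, Callable[[Any], str]],
                     root: str = "root") -> List[str]:
    """Greedy top-down prediction: at each node a (hypothetical) local classifier
    picks one child; a node with no classifier is treated as a leaf.
    Once a wrong child is chosen, the lower levels cannot recover from it."""
    path, node = [], root
    while node in classifiers:
        node = classifiers[node](example)
        path.append(node)
    return path


def misclassification_cost(true_class: str, predicted: str) -> int:
    """Illustrative cost: levels below the deepest ancestor shared by both classes."""
    t, p = true_class.split("."), predicted.split(".")
    shared = 0
    for a, b in zip(t, p):
        if a != b:
            break
        shared += 1
    return max(len(t), len(p)) - shared


print(misclassification_cost("1.1", "1.2"), misclassification_cost("1.1", "2.1"))  # 1 2
```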

A hybrid PSO/ACO algorithm: basic ideas
• Each particle represents a candidate classification rule
• Continuous (real-valued) attributes: standard PSO
• Categorical (nominal) attributes: special treatment, e.g. Gender: “F” or “M” (unordered values)
  • Each categorical attribute is represented by a “pheromone vector”, with one element for each attribute value plus one element for “not used in rule”, e.g. for Gender:
        Value:       F      M      “off” (not used in rule)
        Pheromone:   0.6    0.1    0.3
• General motivation: ACO algorithms, using pheromone, cope well with discrete data
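
A minimal sketch of the two kinds of update, assuming the common inertia-weight form of PSO for continuous attributes and roulette-wheel (proportional) selection from a categorical attribute's pheromone vector. The coefficient values `w`, `c1` and `c2` are typical illustrative settings, not the authors' parameters.

```python
import random


def pso_step(x, v, pbest, lbest, w=0.7, c1=1.5, c2=1.5, rng=random):
    """One inertia-weight PSO update for a particle's continuous attributes.

    v <- w*v + c1*r1*(pbest - x) + c2*r2*(lbest - x);  x <- x + v
    (r1, r2 drawn uniformly from [0, 1) for each dimension.)
    """
    new_v = [
        w * vi + c1 * rng.random() * (pi - xi) + c2 * rng.random() * (li - xi)
        for xi, vi, pi, li in zip(x, v, pbest, lbest)
    ]
    new_x = [xi + vi for xi, vi in zip(x, new_v)]
    return new_x, new_v


def sample_value(pheromone, rng=random):
    """Choose a categorical value with probability proportional to its pheromone."""
    values = list(pheromone)
    return rng.choices(values, weights=[pheromone[val] for val in values], k=1)[0]


gender = {"F": 0.6, "M": 0.1, "off": 0.3}  # the pheromone vector from the slide
print(sample_value(gender))  # F, M or off (F with probability 0.6)
```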

A hybrid PSO/ACO algorithm for predicting hierarchical enzyme classes (1)
• Class attribute: 4-digit EC code (4 levels of classes)
• Predictor attributes: Prosite patterns (motifs)
• A particle represents a classification rule:
        Attribute:   pattern_1            ...   pattern_n            class
        Value:       yes    no     off          yes    no     off
        Pheromone:   0.3    0.1    0.6          0.8    0.1    0.1    EC.1.5.2.1
• The particle is “decoded” into a rule by choosing a value (“yes”, “no” or “off”) for each attribute, with probability given by its pheromone vector
• Pheromone values are updated based on rule quality
• The particle also moves towards its previous best and the local best
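
The decoding step can be sketched as below: each Prosite pattern gets a pheromone vector over “yes”, “no” and “off”, and one value is sampled from each. The `update` function is a hypothetical stand-in: the slides say only that pheromone depends on rule quality and that the particle moves towards its previous best and the local best, so the reinforcement scheme, the `step` size, and treating `quality` as a number in [0, 1] computed elsewhere are all assumptions.

```python
import random

VALUES = ("yes", "no", "off")  # pattern present / pattern absent / not used in rule


def decode(particle, rng=random):
    """Sample 'yes', 'no' or 'off' for each pattern from its pheromone vector."""
    return [rng.choices(VALUES, weights=[ph[v] for v in VALUES], k=1)[0] for ph in particle]


def covers(rule, has_pattern):
    """A protein is covered if every 'yes' pattern occurs in it and every 'no' pattern does not."""
    return all(choice == "off" or (choice == "yes") == present
               for choice, present in zip(rule, has_pattern))


def update(particle, rule, quality, pbest_rule, lbest_rule, step=0.1):
    """Hypothetical update: reinforce the decoded values by rule quality, pull each
    vector towards the previous-best and local-best choices, then renormalize."""
    for ph, mine, pb, lb in zip(particle, rule, pbest_rule, lbest_rule):
        ph[mine] += quality
        ph[pb] += step
        ph[lb] += step
        total = sum(ph.values())
        for v in VALUES:
            ph[v] /= total


# The two pheromone vectors shown on the slide (pattern_1 and pattern_n);
# the predicted class (e.g. EC.1.5.2.1) stays fixed for the particle.
particle = [{"yes": 0.3, "no": 0.1, "off": 0.6}, {"yes": 0.8, "no": 0.1, "off": 0.1}]
rule = decode(particle)
print(rule, covers(rule, [False, True]))
```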

A hybrid PSO/ACO algorithm for predicting hierarchical enzyme classes (2)
• The algorithm follows a top-down (greedy) approach:
  • first discover rules predicting the 1st-level class, then discover rules predicting the 2nd-level class, etc.
  • this sequential procedure is used in both training and testing
• Preliminary results (varying some parameters)
  • Predictive accuracy at level 1 (6 classes): 94.9-96.7%
  • Predictive accuracy at level 2 (51 classes): 72.3-90.3%
• Current/future work
  • Prediction of levels 3 and 4 of the EC code; other data sets
  • Consider different misclassification costs
  • Develop a less greedy method for top-down classification (allowing recovery from errors at higher levels)
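
One common way to report per-level accuracy, assumed here since the slides do not define it, is to count a prediction as correct at level k when its first k EC digits all match the true code. Under that definition a top-down error at level 1 is also an error at every lower level, which is the greediness the future work aims to relax. A minimal sketch with made-up codes:

```python
def level_accuracy(true_codes, predicted_codes, level):
    """Fraction of examples whose first `level` EC digits are all predicted correctly."""
    hits = sum(
        t.split(".")[:level] == p.split(".")[:level]
        for t, p in zip(true_codes, predicted_codes)
    )
    return hits / len(true_codes)


# Made-up codes, for illustration only.
actual = ["1.1.1.1", "1.2.3.4", "2.1.1.1", "3.4.1.1"]
predicted = ["1.1.2.1", "1.3.3.4", "2.1.1.1", "4.4.1.1"]
print(level_accuracy(actual, predicted, 1), level_accuracy(actual, predicted, 2))  # 0.75 0.5
```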

Force-based particle swarms
• Drawing inspiration from physics
  • In particular, ways of simulating fluid flow
• The idea is to control the flow of particles by assigning forces between particle types, then letting the process run to completion.
• We can use different force types:
  • Electromagnetic forces
  • Gravitational forces
  • Linear distance-based forces
  • Lennard-Jones potential
  • ...
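
The force types listed above can be written as force-versus-distance profiles. A minimal sketch, assuming unit constants and the sign convention “positive = repulsive”; the “linear distance-based” profile is a hypothetical spring-like choice, since the slides do not define it. The Lennard-Jones force is -dV/dr for V(r) = 4ε[(σ/r)^12 - (σ/r)^6], so it repels at short range, attracts at longer range, and is zero at r = 2^(1/6)σ.

```python
# Force profiles F(r) between two particles a distance r > 0 apart.
# Sign convention (assumed for this sketch): F > 0 repels, F < 0 attracts.


def gravitational(r, m1=1.0, m2=1.0, G=1.0):
    """Always attractive, inverse-square in distance."""
    return -G * m1 * m2 / r**2


def electromagnetic(r, q1=1.0, q2=1.0, k=1.0):
    """Coulomb's law: like charges repel, unlike charges attract."""
    return k * q1 * q2 / r**2


def linear(r, k=1.0, rest=1.0):
    """Hypothetical spring-like profile: repels inside `rest`, attracts beyond it."""
    return -k * (r - rest)


def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """F(r) = -dV/dr for V(r) = 4*epsilon*((sigma/r)**12 - (sigma/r)**6)."""
    sr6 = (sigma / r) ** 6
    return 24 * epsilon / r * (2 * sr6**2 - sr6)


for r in (0.9, 2 ** (1 / 6), 1.5):
    print(f"r={r:.3f}  Lennard-Jones force={lennard_jones(r):+.3f}")
```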

Force-based programming language
• One idea is to create a force-based programming language.
• We express the problem by specifying the forces through which pairs of particle types interact.
• Example: clustering
  • Create fixed particles for the data
  • Create k classes of particles for the cluster-markers
  • Rules:
    • All cluster-markers repel at close range
    • Cluster-markers of different types always repel
    • Cluster-markers are attracted to data
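
The clustering example can be simulated directly. This is a minimal 2-D sketch, not the authors' system: the specific force profiles (Gaussian-weighted attraction to data, inverse-square repulsion between marker types, linear repulsion inside `close_range`) and all parameter values are assumptions chosen so the three rules are easy to see.

```python
import math


def simulate(data, markers, steps=500, dt=0.1, attract=1.0, width=1.0,
             repel_other=1.0, repel_close=1.0, close_range=0.5):
    """Move cluster-marker particles under the three rules; data particles stay fixed.

    markers: list of (type_id, [x, y]); positions are updated in place and returned.
    """
    for _ in range(steps):
        forces = []
        for i, (ti, mi) in enumerate(markers):
            fx = fy = 0.0
            # Rule 3: markers are attracted to data (Gaussian-weighted, so nearby data dominates).
            for px, py in data:
                dx, dy = px - mi[0], py - mi[1]
                w = attract * math.exp(-(dx * dx + dy * dy) / (2 * width**2))
                fx += w * dx
                fy += w * dy
            for j, (tj, mj) in enumerate(markers):
                if i == j:
                    continue
                dx, dy = mi[0] - mj[0], mi[1] - mj[1]  # points away from the other marker
                d = math.hypot(dx, dy) + 1e-9
                mag = 0.0
                if tj != ti:
                    # Rule 2: markers of different types always repel (inverse-square).
                    mag += repel_other / (d * d)
                if d < close_range:
                    # Rule 1: all markers repel at close range.
                    mag += repel_close * (close_range - d)
                fx += mag * dx / d
                fy += mag * dy / d
            forces.append((fx, fy))
        # Apply all forces simultaneously (simple explicit Euler step).
        for (_, m), (fx, fy) in zip(markers, forces):
            m[0] += dt * fx
            m[1] += dt * fy
    return markers


data = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2), (5.0, 5.0), (5.2, 5.0), (5.0, 5.2)]
markers = simulate(data, [(0, [1.0, 1.0]), (1, [4.0, 4.0])])
for t, (x, y) in markers:
    print(f"marker type {t}: ({x:.2f}, {y:.2f})")
```

In this small test, with two well-separated groups of data and one marker of each type, each marker settles near the centre of the group it starts closest to; the attraction `width` controls how far away a marker “sees” data.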

Demonstrations and videos

Applications
• Currently applying this to classification algorithms in bioinformatics.
• Data points will be fixed in the space.
• Particle attraction/repulsion will be learned using a GA/GP-type strategy, to learn:
  • The forces that apply between particle types
  • The shapes of the possible force profiles
