Explore research in swarm intelligence for bioinformatics at the University of Kent. Learn about hierarchical classification and hybrid PSO/ACO algorithms for predicting enzyme classes. Discover force-based particle swarms inspired by physics, and potential applications.
PSO for Bioinformatics • Alex Freitas and Colin Johnson, University of Kent
People involved in swarm intelligence research at Kent (1) • XPS Project • Alex Freitas (Lecturer) • Colin Johnson (Lecturer) • Elon Correa (RA – will start soon) • Mudassar Iqbal (PhD student – started Nov. 2004) • Initially investigating a dynamic neighborhood topology • Interested in bioinformatics – problem to be defined
People involved in swarm intelligence research at Kent (2) • Other research students • Terry Arnold (3rd-year PhD student) • Supervised by Colin • Doing research on force-based PSO • Nick Holden (MRes student) • Supervised by Alex • Doing research on “A hybrid PSO/ACO algorithm for hierarchical classification of biological data (enzymes)” • Allen Chan (MRes student) • Supervised by Alex • Doing research on an ACO algorithm for classification of biological data – multi-label classification problem
Introduction to Classification • Each record (example) belongs to a predefined class • Each example consists of two parts: • < predictor attributes, class attribute >, e.g.: • < Gender = M, Age = 25, Salary = 35,000, credit = good > • < attributes_describing_protein, function = transport > • Goal: to predict the class of an example, based on the values of the predictor attributes for that example
Hierarchical classification (1) • Hierarchical classes: Enzyme Commission (EC) codes have 4 levels, e.g. EC.1.1.1.1 • Class tree (figure): root → 1, 2 (most general classes) → 1.1, 1.2, 1.3, 2.1, 2.2 (most specific classes)
Hierarchical classification (2) • Challenges • Several predictions must be made for each example – one predicted class at each level of the hierarchy • As we go down the hierarchy, there are fewer examples (records) per class – “data fragmentation” • Opportunities • Information about “class similarities” in the hierarchy • Top-down approach: first predict the top-level class, then predict the second-level class among the children of the predicted top-level class, etc., until a leaf class is predicted • The cost of misclassifying 1.1 as 1.2 is smaller than the cost of misclassifying 1.1 as 2.1
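One way to make the last point concrete is to measure misclassification cost as the number of tree edges between the true and predicted classes. The slides do not say which cost measure is intended, so the edge-count measure and the function name `hierarchical_cost` below are assumptions made only for illustration.

```python
def hierarchical_cost(true_class: str, predicted_class: str) -> int:
    """Number of tree edges between two dotted class codes (assumed cost measure)."""
    true_path = true_class.split(".")
    pred_path = predicted_class.split(".")
    # Count how many leading levels the two codes share.
    shared = 0
    for a, b in zip(true_path, pred_path):
        if a != b:
            break
        shared += 1
    # Edges up from the true class to the common ancestor, then down to the prediction.
    return (len(true_path) - shared) + (len(pred_path) - shared)


print(hierarchical_cost("1.1", "1.2"))  # 2: siblings under class 1
print(hierarchical_cost("1.1", "2.1"))  # 4: path goes through the root
```

Under this measure a sibling confusion costs less than a confusion across top-level classes, which matches the ordering stated on the slide.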
A hybrid PSO/ACO algorithm – basic ideas • Each particle represents a candidate classification rule • Continuous (real-valued) attributes – standard PSO • Categorical (nominal) attributes – special treatment; e.g. Gender: “F” or “M” (unordered values) • Each categorical attribute is represented by a “pheromone vector”, with one element for each attribute value plus one element for “not used in rule” • Example pheromone vector for Gender: F = 0.6, M = 0.1, “off” (not used in rule) = 0.3 • General motivation: ACO algorithms, using pheromone, cope well with discrete data
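A minimal sketch of how a value might be drawn from such a pheromone vector, using the Gender example from the slide. Treating the pheromone entries directly as sampling weights is an assumption here, and `sample_value` is a hypothetical helper, not a name from the authors' implementation.

```python
import random
from typing import Dict


def sample_value(pheromone: Dict[str, float], rng: random.Random) -> str:
    """Pick one attribute value (or "off") with probability proportional to its pheromone."""
    values = list(pheromone)
    weights = [pheromone[v] for v in values]
    return rng.choices(values, weights=weights, k=1)[0]


# Pheromone vector for the categorical attribute Gender, as on the slide.
gender_pheromone = {"F": 0.6, "M": 0.1, "off": 0.3}

rng = random.Random(0)  # fixed seed so the run is repeatable
counts = {value: 0 for value in gender_pheromone}
for _ in range(10_000):
    counts[sample_value(gender_pheromone, rng)] += 1

print(counts)  # "F" is drawn most often, "M" least often
```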
A hybrid PSO/ACO algorithm for predicting hierarchical enzyme classes (1) • Class attribute: 4-digit EC code (4 levels of classes) • Predictor attributes: Prosite patterns (motifs) • A particle represents a classification rule, e.g.: pattern1: yes = 0.3, no = 0.1, off = 0.6; . . . ; patternn: yes = 0.8, no = 0.1, off = 0.1; class = EC.1.5.2.1 • The particle is “decoded” into a rule by choosing a value (“yes”, “no”, “off”) for each attribute, with probability given by its pheromone vector • Pheromone values are updated based on rule quality • Particle also moves towards previous best and local best
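The decoding step can be sketched as follows, using the pheromone values from the example particle. The slide does not give the update formula, so `reinforce` (add the rule quality to each chosen entry, then renormalise) is a hypothetical stand-in, and the function names and the quality value 0.5 are likewise assumptions.

```python
import random
from typing import Dict

Particle = Dict[str, Dict[str, float]]  # attribute name -> pheromone vector


def decode(particle: Particle, rng: random.Random) -> Dict[str, str]:
    """Sample "yes"/"no"/"off" per attribute; attributes decoded as "off" are left out of the rule."""
    rule = {}
    for attribute, pheromone in particle.items():
        values = list(pheromone)
        weights = [pheromone[v] for v in values]
        choice = rng.choices(values, weights=weights, k=1)[0]
        if choice != "off":
            rule[attribute] = choice
    return rule


def reinforce(particle: Particle, rule: Dict[str, str], quality: float) -> None:
    """Hypothetical update: add `quality` to each chosen entry, then renormalise to sum to 1."""
    for attribute, pheromone in particle.items():
        chosen = rule.get(attribute, "off")
        pheromone[chosen] += quality
        total = sum(pheromone.values())
        for value in pheromone:
            pheromone[value] /= total


particle = {
    "pattern1": {"yes": 0.3, "no": 0.1, "off": 0.6},
    "patternn": {"yes": 0.8, "no": 0.1, "off": 0.1},
}
rng = random.Random(1)
rule = decode(particle, rng)
reinforce(particle, rule, quality=0.5)  # 0.5 is a made-up rule quality
print(rule, particle)
```

The move towards the previous best and local best is not shown here; this sketch covers only the pheromone side of the hybrid.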
A hybrid PSO/ACO algorithm for predicting hierarchical enzyme classes (2) • Algorithm follows top-down (greedy) approach: • first discover rules predicting the 1st-level class, then discover rules predicting the 2nd-level class, etc. • this sequential procedure is used in both training and testing • Preliminary results (varying some parameters) • Predictive accuracy at level 1 (6 classes): 94.9-96.7% • Predictive accuracy at level 2 (51 classes): 72.3-90.3% • Current/Future work • Prediction of levels 3 and 4 of EC code; other data sets • Consider different misclassification costs • Develop a less greedy method for top-down classification (allowing recovery from errors made at higher levels)
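The greedy top-down procedure at test time can be sketched as a walk down the class tree, where each visited node has its own classifier that picks one of its children. The classifiers below are toy lambdas over made-up attributes (`motif_a`, `motif_b`), not the rule sets discovered by the PSO/ACO algorithm, and `predict_top_down` is a hypothetical name.

```python
from typing import Callable, Dict

Classifier = Callable[[dict], str]  # maps an example to a child class code


def predict_top_down(example: dict, classifiers: Dict[str, Classifier], depth: int) -> str:
    """Greedy descent: the class predicted at one level fixes the candidates at the next."""
    node = ""  # the empty string stands for the root of the hierarchy
    for _ in range(depth):
        if node not in classifiers:
            break  # no classifier below this node, so stop here
        node = classifiers[node](example)
    return node


# Toy per-node classifiers over hypothetical binary motif attributes.
classifiers = {
    "": lambda ex: "1" if ex["motif_a"] else "2",
    "1": lambda ex: "1.1" if ex["motif_b"] else "1.2",
    "2": lambda ex: "2.1",
}

print(predict_top_down({"motif_a": True, "motif_b": False}, classifiers, depth=2))  # 1.2
```

Because the second-level classifier is chosen by the first-level prediction, an error at level 1 cannot be corrected later, which is the greediness the future-work item refers to.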
Force-based particle swarms • Drawing inspiration from physics • In particular, ways of simulating fluid flow • The idea is to control the flow of particles by assigning forces between particle types, then letting the process run to completion. • We can use different force types: • Electromagnetic forces • Gravitational forces • Linear distance-based forces • Lennard-Jones potential • ...
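As a rough illustration of what such force profiles look like, the sketch below gives the radial force between two particles at distance r for three of the listed types. The sign convention (positive = repulsive), the parameter names, and the default values are assumptions; the Lennard-Jones force is derived from the standard 12-6 potential V(r) = 4ε[(σ/r)^12 − (σ/r)^6].

```python
def lennard_jones_force(r: float, epsilon: float = 1.0, sigma: float = 1.0) -> float:
    """Radial force -dV/dr for the 12-6 Lennard-Jones potential (positive = repulsive)."""
    sr6 = (sigma / r) ** 6
    return 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r


def inverse_square_force(r: float, strength: float = 1.0) -> float:
    """Gravity-like attraction: magnitude falls off as 1/r^2 (negative = attractive)."""
    return -strength / (r * r)


def linear_force(r: float, k: float = 1.0) -> float:
    """Linear distance-based attraction: magnitude grows in proportion to distance."""
    return -k * r


for r in (0.9, 1.5, 3.0):
    print(r, lennard_jones_force(r), inverse_square_force(r), linear_force(r))
```

The Lennard-Jones profile is repulsive at short range and attractive at longer range, changing sign at r = 2^(1/6)·σ, which makes it a natural candidate when particles should approach each other without collapsing onto the same point.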
Force-based programming language • One idea is to create a force-based programming language. • We express the problem by saying how forces between pairs of particle types interact. • Example: clustering • Create fixed particles for the data • Create k classes of particles for the cluster-markers • Rules: • All cluster-markers repel at close range • Cluster-markers of different types always repel • Cluster-markers are attracted to data.
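The clustering example can be sketched as a small simulation in plain Python rather than in the proposed language. Everything specific here is an assumption made for illustration: the Gaussian-weighted attraction to data, the inverse-square repulsion, the parameter values, the toy data, and the use of one marker per type (so the same-type close-range rule is coded but never triggered).

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def step_markers(markers: List[Point], marker_types: List[int], data: List[Point],
                 bandwidth: float = 3.0, repulsion: float = 1.0,
                 close_range: float = 1.0, step: float = 0.1) -> List[Point]:
    """Move every cluster-marker one step along the net force acting on it."""
    new_positions = []
    for i, (mx, my) in enumerate(markers):
        fx = fy = 0.0
        # Rule: cluster-markers are attracted to (fixed) data particles.
        for px, py in data:
            dx, dy = px - mx, py - my
            weight = math.exp(-(dx * dx + dy * dy) / (2.0 * bandwidth ** 2))
            fx += weight * dx
            fy += weight * dy
        for j, (ox, oy) in enumerate(markers):
            if i == j:
                continue
            dx, dy = mx - ox, my - oy
            dist = math.hypot(dx, dy) + 1e-9
            # Rules: different types always repel; same types repel only at close range.
            if marker_types[i] != marker_types[j] or dist < close_range:
                fx += repulsion * dx / dist ** 3  # inverse-square push away from the other marker
                fy += repulsion * dy / dist ** 3
        new_positions.append((mx + step * fx, my + step * fy))
    return new_positions


# Two well-separated toy blobs of data and one marker per cluster type (k = 2).
data = [(0, 0), (1, 0), (0, 1), (1, 1), (10, 10), (11, 10), (10, 11), (11, 11)]
markers = [(3.0, 2.0), (8.0, 9.0)]
marker_types = [0, 1]
for _ in range(200):
    markers = step_markers(markers, marker_types, data)
print(markers)
```

In this toy run each marker settles near the centre of the blob it started closest to; the mutual repulsion only shifts the resting positions slightly because the markers end up far apart.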
Applications • Currently applying this to classification algorithms in bioinformatics. • Data points will be fixed in the space. • A GA/GP-type strategy will be used to learn particle attraction/repulsion: • The forces that apply between particle types • The shapes of the possible force profiles.