
Estimation Of Distribution Algorithm based on Markov Random Fields


Presentation Transcript


  1. Estimation Of Distribution Algorithm based on Markov Random Fields. Siddhartha Shakya, School of Computing, The Robert Gordon University

  2. Outline • From GAs to EDAs • Probabilistic Graphical Models in EDAs • Bayesian networks • Markov Random Fields • Fitness modelling approach to estimating and sampling MRF in EDA • Gibbs distribution, energy function and modelling the fitness • Estimating parameters (Fitness modelling approach) • Sampling MRF (several different approaches) • Conclusion

  3. Genetic Algorithms (GAs) • Population-based optimisation technique • Based on Darwin's theory of evolution • A solution is encoded as a set of symbols known as a chromosome • A population of solutions is generated • Genetic operators are then applied to the population to produce the next generation, which replaces the parent population
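
A minimal sketch of the GA loop just described (selection, crossover, mutation, replacement); the operator choices, parameter values and function names are illustrative assumptions, not the presentation's implementation:

    # Illustrative simple-GA loop: truncation selection, one-point crossover,
    # bit-flip mutation, full replacement of the parent population.
    import numpy as np

    def simple_ga(fitness, n, pop_size=20, p_mut=0.05, generations=50, seed=0):
        rng = np.random.default_rng(seed)
        pop = rng.integers(0, 2, size=(pop_size, n))
        for _ in range(generations):
            f = np.array([fitness(ind) for ind in pop])
            parents = pop[np.argsort(f)[-pop_size // 2:]]      # keep the fitter half
            children = []
            while len(children) < pop_size:
                a, b = parents[rng.integers(len(parents), size=2)]
                cut = rng.integers(1, n)                        # one-point crossover
                child = np.concatenate([a[:cut], b[cut:]])
                flip = rng.random(n) < p_mut                    # bit-flip mutation
                child[flip] = 1 - child[flip]
                children.append(child)
            pop = np.array(children)                            # next generation replaces parents
        f = np.array([fitness(ind) for ind in pop])
        return pop[np.argmax(f)], f.max()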

  4. Simple GA simulation

  5. GA to EDA

  6. Simple EDA simulation [Figure: a population of 5-bit solutions; the selected solutions (e.g. 00111 and 11101) give the univariate marginal probability vector (0.5, 0.5, 1.0, 0.5, 1.0), which is sampled to produce the new population (e.g. 10101, 01111)]
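
A minimal univariate EDA (UMDA-style) sketch of the simulation above; function names and parameter values are assumptions:

    # Estimate a univariate marginal probability vector from the selected solutions,
    # then sample it to create the next population (as in the 5-bit example above).
    import numpy as np

    def univariate_eda(fitness, n, pop_size=20, n_select=10, generations=30, seed=0):
        rng = np.random.default_rng(seed)
        pop = rng.integers(0, 2, size=(pop_size, n))
        for _ in range(generations):
            f = np.array([fitness(ind) for ind in pop])
            selected = pop[np.argsort(f)[-n_select:]]            # truncation selection
            p = selected.mean(axis=0)                            # marginals, e.g. (0.5, 0.5, 1.0, 0.5, 1.0)
            pop = (rng.random((pop_size, n)) < p).astype(int)    # sample the new population
        f = np.array([fitness(ind) for ind in pop])
        return pop[np.argmax(f)], f.max()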

  7. Joint Probability Distribution (JPD) • Solution as a set of random variables • Joint Probability Distribution (JPD) • Exponential in the number of variables, therefore not feasible to calculate in most cases • Needs simplification!
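
The slide's formula is not preserved in the transcript; the JPD it refers to is the full joint over the solution variables:

    p(\mathbf{x}) = p(x_1, x_2, \ldots, x_n)

For n binary variables a full table of this distribution needs 2^n - 1 independent probabilities, hence "exponential in the number of variables".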

  8. Factorisation of JPD • Univariate model: No interaction: Simplest model • Bivariate model: Pair-wise interaction • Multivariate model: interaction of more than two variables
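
The three factorisations can be written as follows (standard forms; the slide's own equations are not preserved):

    Univariate:    p(\mathbf{x}) = \prod_{i=1}^{n} p(x_i)
    Bivariate:     p(\mathbf{x}) = \prod_{i=1}^{n} p(x_i \mid x_{j(i)})    (each variable depends on at most one other)
    Multivariate:  p(\mathbf{x}) = \prod_{i=1}^{n} p(x_i \mid \Pi_i)       (\Pi_i a set of interacting variables)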

  9. Typical estimation and sampling of JPD in EDAs • Learn the interaction between variables in the solution • Learn the probabilities associated with interacting variables • This specifies the JPD: p(x) • Sample the JPD (i.e. learned probabilities)

  10. Probabilistic Graphical Models • Efficient tool to represent the factorisation of the JPD • Marriage between probability theory and graph theory • Consist of two components • Structure • Parameters • Two types of PGM • Directed PGM (Bayesian Networks) • Undirected PGM (Markov Random Fields)

  11. Directed PGM (Bayesian networks) [Figure: example DAG over X1–X5] • Structure: Directed Acyclic Graph (DAG) • Independence relationship: a variable is conditionally independent of the rest of the variables given its parents • Parameters: conditional probabilities

  12. Bayesian networks • The factorisation of the JPD is encoded in terms of conditional probabilities • JPD for a BN (standard form shown below)
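
The standard BN factorisation the slide refers to (the slide's own equation is not preserved):

    p(\mathbf{x}) = \prod_{i=1}^{n} p(x_i \mid pa(x_i))

where pa(x_i) are the values of the parents of X_i in the DAG. For example, a five-variable network with edges X1→X3, X2→X3, X3→X4, X3→X5 (a hypothetical structure, since the figure is not preserved) would give p(x) = p(x_1) p(x_2) p(x_3 | x_1, x_2) p(x_4 | x_3) p(x_5 | x_3).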

  13. Estimating a Bayesian network • Estimate structure • Estimate parameters • This completely specifies the JPD • The JPD can then be sampled

  14. BN based EDAs • Initialise parent solutions • Select a set from the parent solutions • Estimate a BN from the selected set • Estimate structure • Estimate parameters • Sample the BN to generate a new population • Replace the parents with the new set and go to 2 until the termination criterion is satisfied

  15. How to estimate and sample a BN in EDAs • Estimating structure • Score + search techniques • Conditional independence tests • Estimating parameters • Trivial in EDAs: the dataset is complete • Estimate probabilities of parents before children • Sampling • Probabilistic Logic Sampling (sample parents before children)
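
A small sketch of Probabilistic Logic (ancestral) Sampling on a toy network; the structure and probabilities below are made-up illustrations, not the slide's example:

    # Sample each variable after its parents, from p(x_i = 1 | parent values).
    import random

    parents = {"X1": [], "X2": [], "X3": ["X1", "X2"], "X4": ["X3"], "X5": ["X3"]}   # hypothetical DAG
    cpt = {  # p(variable = 1 | parent values), keyed by the tuple of parent values
        "X1": {(): 0.6},
        "X2": {(): 0.3},
        "X3": {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 0.9},
        "X4": {(0,): 0.2, (1,): 0.8},
        "X5": {(0,): 0.7, (1,): 0.4},
    }

    def sample_bn(order=("X1", "X2", "X3", "X4", "X5")):
        x = {}
        for v in order:                                  # topological order: parents before children
            key = tuple(x[p] for p in parents[v])
            x[v] = int(random.random() < cpt[v][key])
        return x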

  16. BN based EDAs • Well established approach in EDAs: BOA, EBNA, LFDA, MIMIC, COMIT, BMDA • References • Larrañaga and Lozano 2002 • Pelikan 2002

  17. Markov Random Fields (MRF) • Structure: undirected graph • Local independence: a variable is conditionally independent of the rest of the variables given its neighbours • Global independence: two sets of variables are conditionally independent of each other if there is a third set that separates them • Parameters: potential functions defined on the cliques [Figure: example undirected graph over X1–X6]

  18. Markov Random Field • The factorisation of the JPD encoded in terms of potential functions over the maximal cliques • JPD for an MRF (standard form shown below)
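
The standard MRF factorisation the slide refers to (slide equation not preserved):

    p(\mathbf{x}) = \frac{1}{Z} \prod_{c \in C} \psi_c(\mathbf{x}_c), \qquad
    Z = \sum_{\mathbf{y}} \prod_{c \in C} \psi_c(\mathbf{y}_c)

where C is the set of maximal cliques of the undirected graph, \psi_c is the potential function on clique c, and Z is the normalising partition function.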

  19. Estimating a Markov Random Field • Estimate structure from data • Estimate parameters • Requires the potential functions to be numerically defined • This completely specifies the JPD • The JPD can then be sampled • There is no specific order (not a DAG), so sampling is a bit problematic

  20. MRF in EDA • Has recently been proposed as an estimation of distribution technique in EDA • Shakya et al. 2004, 2005 • Santana et al. 2003, 2005

  21. MRF based EDA • Initialise parent solutions • Select a set from the parent solutions • Estimate an MRF from the selected set • Estimate structure • Estimate parameters • Sample the MRF to generate a new population • Replace the parents with the new solutions and go to 2 until the termination criterion is satisfied

  22. How to estimate and sample an MRF in EDA • Learning structure • Conditional independence tests (MN-EDA, MN-FDA) • Linkage detection algorithm (LDFA) • Learning parameters • Junction tree approach (FDA) • Junction graph approach (MN-FDA) • Kikuchi approximation approach (MN-EDA) • Fitness modelling approach (DEUM) • Sampling • Probabilistic Logic Sampling (FDA, MN-FDA) • Probability vector approach (DEUMpv) • Direct sampling of Gibbs distribution (DEUMd) • Metropolis sampler (Is-DEUMm) • Gibbs sampler (Is-DEUMg, MN-EDA)

  23. Fitness modelling approach • Hammersley-Clifford theorem: the JPD for any MRF follows a Gibbs distribution • Energy of the Gibbs distribution in terms of potential functions over the cliques • Assuming the probability of a solution is proportional to its fitness • From (a) and (b) a model of the fitness function, the MRF fitness model (MFM), is derived
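
The equations referred to as (a) and (b) are not preserved in the transcript; their standard forms are approximately:

    (a)  p(\mathbf{x}) = \frac{e^{-U(\mathbf{x})/T}}{Z}, \qquad
         U(\mathbf{x}) = \sum_{c \in C} u_c(\mathbf{x}_c)

    (b)  p(\mathbf{x}) = \frac{f(\mathbf{x})}{\sum_{\mathbf{y}} f(\mathbf{y})}

Equating (a) and (b) and taking logarithms gives the MRF fitness model: up to the temperature scale and an additive constant, -\ln f(\mathbf{x}) = U(\mathbf{x}).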

  24. MRF Fitness Model (MFM) • Properties: • Completely specifies the JPD for the MRF • Negative relationship between fitness and energy, i.e. minimising energy = maximising fitness • Task: • Need to find the structure of the MRF • Need to numerically define the clique potential functions

  25. MRF Fitness Model (MFM) • Let us start with the simplest model: the univariate model – this eliminates structure learning :) • For the univariate model there will be n singleton cliques • For each singleton clique assign a potential function • Corresponding MFM • In terms of the Gibbs distribution (both shown below)
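
A common way to write the univariate MFM and its Gibbs distribution (the exact encoding is an assumption here; the slide's equations are not preserved):

    U(\mathbf{x}) = \sum_{i=1}^{n} \alpha_i x_i, \qquad x_i \in \{-1, +1\}

    p(\mathbf{x}) = \frac{e^{-U(\mathbf{x})/T}}{Z} = \frac{1}{Z}\, e^{-\sum_i \alpha_i x_i / T}

so each bit contributes one singleton-clique potential \alpha_i x_i, and fitting -\ln f(\mathbf{x}) = U(\mathbf{x}) to selected solutions gives one linear equation per chromosome (next slide).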

  26. Estimating MRF parameters using the MFM • Each chromosome gives us a linear equation • Applying it to a set of selected solutions gives us a system of linear equations • Solving it gives us an approximation to the MRF parameters • Knowing the MRF parameters completely specifies the JPD • The next step is to sample the JPD
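
A minimal sketch of this estimation step (an assumed formulation: a constant term alpha_0 is included and bits are mapped to ±1; numpy's least-squares solver, which uses an SVD internally, stands in for the SVD step mentioned later):

    import numpy as np

    def estimate_mfm_parameters(population, fitnesses):
        """population: (N, n) array of 0/1 bits; fitnesses: length-N array, all > 0."""
        s = 2.0 * np.asarray(population, dtype=float) - 1.0      # bits {0,1} -> spins {-1,+1}
        A = np.hstack([np.ones((s.shape[0], 1)), s])             # one column per alpha, plus alpha_0
        b = -np.log(np.asarray(fitnesses, dtype=float))          # each row encodes -ln f(x) = U(x)
        alpha, *_ = np.linalg.lstsq(A, b, rcond=None)            # SVD-based least-squares solve
        return alpha[0], alpha[1:]                               # (alpha_0, per-variable parameters)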

  27. General DEUM framework: Distribution Estimation Using MRF algorithm (DEUM) • Initialise parent population P • Select set D from P (can use D = P!) • Build an MFM and fit it to D to estimate the MRF parameters • Sample the MRF to generate a new population • Replace P with the new population and go to 2 until the termination criterion is satisfied

  28. How to sample MRF • Probability vector approach • Direct Sampling of Gibbs Distribution • Metropolis sampling • Gibbs sampling

  29. Probability vector approach to sample MRF • Minimise U(x) to maximise f(x) • To minimise U(x), each α_i x_i term should be minimal • This suggests: if α_i is negative then the corresponding x_i should be positive • We could get an optimum chromosome for the current population just by looking at α • However, the current population does not always contain enough information to generate the optimum • Instead, we look at the sign of each α_i to update a vector of probabilities

  30. DEUM with probability vector (DEUMpv)

  31. Updating Rule • Uses the sign of an MRF parameter to direct the search towards the value of the respective variable that minimises the energy U(x) • A learning rate controls convergence (an illustrative sketch follows)
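
A minimal sketch of such a sign-based update; the additive form and learning-rate value are assumptions chosen to match the simulation on the next slide (0.5 moving to 0.4 or 0.6), not necessarily the published DEUMpv rule:

    def update_probability_vector(p, alpha, lam=0.1):
        """Nudge each p_i towards the bit value that lowers the energy U(x)."""
        for i, a in enumerate(alpha):
            if a < 0:                          # negative alpha_i favours bit i = 1
                p[i] = min(1.0, p[i] + lam)
            else:                              # positive alpha_i favours bit i = 0
                p[i] = max(0.0, p[i] - lam)
        return p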

  32. Simulation of DEUMpv [Figure: a 5-bit population 01111 (fitness 4), 10101 (3), 00101 (2), 01000 (1) starts from the probability vector (0.5, 0.5, 0.5, 0.5, 0.5); the estimated MRF parameters α ≈ (0.05, -0.05, -0.625, -0.05, -0.625) update it to (0.4, 0.6, 0.6, 0.6, 0.6)]

  33. Results • OneMax Problem

  34. Results • F6 function optimisation

  35. Results • Trap 5 function • Deceptive problem • No solution found

  36. Sampling MRF • Probability vector approach • Direct sampling of Gibbs distribution • Metropolis sampling • Gibbs sampling

  37. Direct Sampling of Gibbs distribution • In the probability vector approach, only the sign of the MRF parameters has been used • However, one could directly sample from the Gibbs distribution and make use of the values of the MRF parameters • One could also use the temperature coefficient to manipulate the probabilities

  38. Direct Sampling of Gibbs distribution
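
The slide's equations are not preserved; for the univariate MFM the Gibbs distribution factorises over the variables, so each bit can be sampled independently. Assuming the {-1, +1} encoding used earlier, the marginal is:

    p(x_i = +1) = \frac{e^{-\alpha_i/T}}{e^{-\alpha_i/T} + e^{+\alpha_i/T}} = \frac{1}{1 + e^{2\alpha_i/T}}

so a negative \alpha_i pushes the probability of bit i being 1 above 0.5, and lowering T pushes each probability towards 0 or 1 (next slide).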

  39. Direct Sampling of Gibbs distribution • The temperature coefficient has an important role • Decreasing T will cool each probability towards either 1 or 0, depending on the sign and value of the corresponding α • This forms the basis for the DEUM based on direct sampling of the Gibbs distribution (DEUMd)

  40. DEUM with direct sampling (DEUMd) 1. Generate initial population, P, of size M 2. Select the N fittest solutions, N ≤ M 3. Calculate the MRF parameters 4. Generate M new solutions by sampling the resulting univariate distribution 5. Replace P with the new population and go to 2 until the termination criterion is satisfied
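
A compact sketch of these five steps (illustrative only: it reuses estimate_mfm_parameters() from the earlier sketch, keeps T fixed rather than cooled, and shifts the OneMax fitness by 1 so that -ln f stays finite):

    import numpy as np

    def deumd(fitness, n, pop_size=100, n_select=50, T=0.1, generations=50, seed=0):
        rng = np.random.default_rng(seed)
        pop = rng.integers(0, 2, size=(pop_size, n))                # 1. initial population P
        for _ in range(generations):
            f = np.array([fitness(ind) for ind in pop])
            sel = np.argsort(f)[-n_select:]                         # 2. the N fittest solutions
            _, alpha = estimate_mfm_parameters(pop[sel], f[sel])    # 3. MRF parameters
            p_one = 1.0 / (1.0 + np.exp(2.0 * alpha / T))           # 4. univariate marginals at temperature T
            pop = (rng.random((pop_size, n)) < p_one).astype(int)   # 5. new population replaces P
        f = np.array([fitness(ind) for ind in pop])
        return pop[np.argmax(f)], f.max()

    best, best_f = deumd(lambda x: x.sum() + 1, n=20)               # OneMax(+1) example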

  41. DEUMd simulation [Figure: parent population 01111 (fitness 4), 10101 (3), 00101 (2), 01000 (1); estimated MRF parameters α ≈ (0.05, -0.05, -0.625, -0.05, -0.625) and a probability vector (0.4, 0.6, 0.6, 0.6, 0.6); the sampled new population is 01111 (4), 10111 (4), 01101 (3), 01010 (2)]

  42. Experimental results • OneMax Problem

  43. F6 function optimisation

  44. Plateau Problem (n=180)

  45. Checker Board Problem (n=100)

  46. Trap function of order 5 (n=60)

  47. Experimental results

  48. Analysis of Results • For univariate problems (OneMax), given a population size of 1.5n, P = D and T → 0, the solution was found in a single generation • For problems with low-order dependency between variables (Plateau and CheckerBoard), performance was significantly better than that of other univariate EDAs • For deceptive problems with higher-order dependency (Trap function and Six Peaks), DEUMd was deceived, but by slowing the cooling rate it was able to find the solution for Trap of order 5 • For problems where the optimum was not known, performance was comparable to that of GAs and other EDAs, and was better in some cases

  49. Cost-Benefit Analysis (the cost) • Polynomial cost of estimating the distribution, compared to the linear cost of other univariate EDAs • Cost to compute the univariate marginal frequencies • Cost to compute the SVD
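
As a rough, assumed restatement of the two costs (the slide's own expressions are not preserved):

    O(Nn) \ \text{(marginal frequencies)} \qquad \text{vs.} \qquad O(Nn^2) \ \text{(SVD of the } N \times (n{+}1) \text{ system, } N \ge n\text{)}

for N selected solutions of length n; the latter is the polynomial overhead referred to above.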

  50. Cost-Benefit Analysis (the benefit) • DEUMd can significantly reduce the number of fitness evaluations • The quality of solutions was better for DEUMd than for the other compared EDAs • DEUMd should be tried on problems where the increased solution quality outweighs the computational cost
