Linkage Problem, Distribution Estimation, and Bayesian Networks
Linkage Problem, Distribution Estimation, and Bayesian Networks • Evolutionary Computation 8(3) • Martin Pelikan, David E. Goldberg, and Erick Cantú-Paz
Linkage problem • The problem of building block disruption • Due to crossover • Solutions • Changing the representation of solutions • Evolving the recombination operators • Extracting some information from the entire set of promising solutions in order to generate new solutions
Evolving Representation or Operators • The representation of solutions is adapted so that the interacting components of partial solutions are less likely to be broken by recombination. • Various reordering and mapping operators. • Too slow, not sufficiently powerful. • Premature convergence. • Messy Genetic Algorithm • Linkage Learning Genetic Algorithm
Probabilistic Modeling • Estimation of Distribution Algorithms • No crossover. • New solutions are generated by using the information extracted from the entire set of promising solutions. • How to extract the information?
No Interaction • Population Based Incremental Learning (PBIL) (1994) • Compact Genetic Algorithm (cGA) (1998) • Univariate Marginal Distribution Algorithm (UMDA) (1997)
Pairwise Interaction • Dependency tree (1997) • Mutual-Information-Maximization Input Clustering (MIMIC) (1997) • Bivariate Marginal Distribution Algorithm (BMDA) (1999)
Multivariate Interactions • Factorized Distribution Algorithm (FDA) (1998) • Extended Compact Genetic Algorithm (ECGA) (1999) • Bayesian Optimization Algorithm (BOA) (1999)
Multivariate Interactions (continued) • Iterative Density Estimation Evolutionary Algorithm (IDEA) (2000) • Bayesian Network (1999) • Gaussian Network (1999) • Bayesian Evolutionary Optimization (Helmholtz Machine) (2000) • Probabilistic Principal Component Analysis (PPCA) (2001)
Capabilities & Difficulties • No interactions • Efficient on linear problems. • Difficulty: higher-order building blocks (BBs). • Pairwise • Efficient with BBs of order 2. • Difficulty: higher-order BBs.
Capabilities & Difficulties (continued) • FDA • Efficient on decomposable problems. • Difficulty: prior information is essential. • ECGA • Efficient on separable problems. • Difficulty: highly overlapping BBs. • BOA • General.
The Bayesian Optimization Algorithm (BOA) • BOA uses the same class of distributions as the FDA. • It does not require a valid distribution factorization as input. • It is able to learn the distribution on the fly without using any problem-specific information. • Prior information can be incorporated.
BOA • 1. Set t ← 0; randomly generate the initial population P(0). • 2. Select a set of promising strings S(t) from P(t). • 3. Construct the network B using a chosen metric and constraints. • 4. Generate a set of new strings O(t) according to the joint distribution encoded by B. • 5. Create a new population P(t+1) by replacing some strings from P(t) with O(t). Set t ← t+1. • 6. If the termination criteria are not met, go to 2.
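The loop on this slide can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the names `eda_loop`, `learn_model`, and `sample_model` are hypothetical, truncation selection and the convergence check are assumptions, and the model plugged in at the bottom is an edgeless network (independent marginals) standing in for the Bayesian network that BOA would actually learn in step 3.

```python
import random

def eda_loop(fitness, n_bits, learn_model, sample_model,
             pop_size=100, n_select=50, n_offspring=50, max_gens=50, seed=0):
    """Generic model-building loop following steps 1-6 of the slide."""
    rng = random.Random(seed)
    # Step 1: t <- 0, random initial population P(0).
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for t in range(max_gens):
        # Step 2: select promising strings S(t) (truncation selection, an assumption).
        pop.sort(key=fitness, reverse=True)
        selected = pop[:n_select]
        # Step 3: construct the model B from S(t).
        model = learn_model(selected)
        # Step 4: generate new strings O(t) from the distribution encoded by B.
        offspring = [sample_model(model, rng) for _ in range(n_offspring)]
        # Step 5: P(t+1) is P(t) with its worst strings replaced by O(t).
        pop = pop[:pop_size - n_offspring] + offspring
        # Step 6: stop early once every string in the population is identical.
        if all(s == pop[0] for s in pop):
            break
    return max(pop, key=fitness)

# Placeholder model (assumption): an edgeless network, i.e. one marginal per bit.
def learn_marginals(selected):
    n = len(selected)
    return [sum(s[i] for s in selected) / n for i in range(len(selected[0]))]

def sample_marginals(p, rng):
    return [1 if rng.random() < p_i else 0 for p_i in p]

# Usage: maximize the number of ones in a 20-bit string.
best = eda_loop(sum, 20, learn_marginals, sample_marginals)
```

Passing the model as two callables keeps the loop unchanged when the placeholder is swapped for a network learner and a network sampler.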
Bayesian Networks • The Bayesian Dirichlet metric (BDe) • Parametric learning • Greedy algorithms • Structure learning
Greedy algorithm for network construction • 1. Initialize the network B. • 2. Choose all simple graph operations that can be performed on the network without violating the constraints. • 3. Pick the operation that increases the score of the network the most. • 4. Perform the operation picked in the previous step. • 5. If the network can no longer be improved under the given constraints on its complexity, or a maximal number of iterations has been reached, finish. • 6. Go to 2.
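A minimal Python sketch of this greedy procedure is given below, under several stated assumptions: variables are binary, the only graph operation considered is edge addition, the complexity constraint is a cap on the number of parents per node, and the score is the logarithm of the K2 metric (the Bayesian Dirichlet metric with an uninformative prior). The function names are hypothetical.

```python
import math
from collections import Counter

def node_score(data, i, parents):
    """Log K2 score of binary variable i given its parent list."""
    counts = Counter((tuple(row[p] for p in parents), row[i]) for row in data)
    score = 0.0
    # Parent configurations that never occur contribute 0, so only observed ones are visited.
    for cfg in {cfg for cfg, _ in counts}:
        n0, n1 = counts[(cfg, 0)], counts[(cfg, 1)]
        score += math.lgamma(2) - math.lgamma(n0 + n1 + 2)
        score += math.lgamma(n0 + 1) + math.lgamma(n1 + 1)
    return score

def is_ancestor(parents, a, b):
    """True if a is b itself or an ancestor of b (walks backwards through parents)."""
    stack, seen = [b], set()
    while stack:
        x = stack.pop()
        if x == a:
            return True
        if x not in seen:
            seen.add(x)
            stack.extend(parents[x])
    return False

def greedy_network(data, n_vars, max_parents=2, max_iters=100):
    # Step 1: start from the empty network.
    parents = {i: [] for i in range(n_vars)}
    scores = {i: node_score(data, i, []) for i in range(n_vars)}
    for _ in range(max_iters):
        best_gain, best_edge = 0.0, None
        # Step 2: enumerate every edge addition u -> v that respects the constraints.
        for u in range(n_vars):
            for v in range(n_vars):
                if u == v or u in parents[v] or len(parents[v]) >= max_parents:
                    continue
                if is_ancestor(parents, v, u):  # u -> v would close a cycle
                    continue
                gain = node_score(data, v, parents[v] + [u]) - scores[v]
                # Step 3: remember the addition with the largest score gain.
                if gain > best_gain + 1e-12:
                    best_gain, best_edge = gain, (u, v)
        # Step 5: finish when no legal addition improves the score.
        if best_edge is None:
            break
        # Step 4: perform the chosen addition.
        u, v = best_edge
        parents[v].append(u)
        scores[v] += best_gain
    return parents
```

Because the score decomposes into one term per node, adding an edge u -> v only requires rescoring v, which is why the sketch caches per-node scores.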
Generation of a new instance • 1. Mark all variables as unprocessed. • 2. Pick an unprocessed variable X_i whose parents have all been processed already. • 3. Set X_i to x_i with probability p(X_i = x_i | Π_Xi = π_xi), where Π_Xi denotes the parents of X_i in the network and π_xi the values already assigned to them. • 4. Mark X_i as processed. • 5. If there are unprocessed variables left, go to 2.
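The sampling steps on this slide can be sketched in Python as below. The data layout is an assumption made for illustration: `parents` maps each binary variable to its list of parents, and `cpt[x][cfg]` holds the probability that variable `x` equals 1 given the parent values `cfg`. The network is assumed to be acyclic, otherwise step 2 would find no eligible variable.

```python
import random

def sample_instance(parents, cpt, rng):
    """Draw one string from the distribution encoded by the network."""
    values = {}
    # Step 1: mark all variables as unprocessed.
    unprocessed = set(parents)
    # Step 5: repeat while unprocessed variables remain.
    while unprocessed:
        # Step 2: pick an unprocessed variable whose parents are all processed.
        x = next(v for v in sorted(unprocessed)
                 if all(p in values for p in parents[v]))
        # Step 3: sample x conditionally on the values of its parents.
        cfg = tuple(values[p] for p in parents[x])
        values[x] = 1 if rng.random() < cpt[x][cfg] else 0
        # Step 4: mark x as processed.
        unprocessed.remove(x)
    return [values[v] for v in sorted(parents)]

# Usage: X1 copies X0, so both bits of every sample agree.
parents = {0: [], 1: [0]}
cpt = {0: {(): 0.5}, 1: {(0,): 0.0, (1,): 1.0}}
rng = random.Random(0)
samples = [sample_instance(parents, cpt, rng) for _ in range(20)]
```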
Additively Decomposable Functions • Additively decomposable functions (ADF) • Can be decomposed into smaller subproblems. • Order-k decomposable function • There exists a set of l functions f_i over subsets of variables S_i for i = 0, …, l-1, each of size at most k, such that $f(X) = \sum_{i=0}^{l-1} f_i(S_i)$.
ADF, the Interactions • ADFs that can be decomposed by using only nonoverlapping sets. • Subfunctions are independent. • Overlapping sets.
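As a small illustration of the two cases, the Python sketch below evaluates an additively decomposable function as a sum of subfunctions over index subsets, once with nonoverlapping sets and once with overlapping sets. The helper `adf` and the order-3 subfunction `trap3` are hypothetical examples chosen for this sketch, not functions defined in the slides.

```python
def adf(x, subsets, subfunctions):
    """Sum of f_i applied to the bits of x selected by S_i."""
    return sum(f(tuple(x[j] for j in s)) for s, f in zip(subsets, subfunctions))

def trap3(bits):
    """Illustrative order-3 subfunction: best at all ones, otherwise rewards zeros."""
    u = sum(bits)
    return 3 if u == 3 else 2 - u

x = [1, 1, 1, 0, 0, 0]

# Nonoverlapping sets: each subfunction sees its own bits, so they are independent.
nonoverlapping = [(0, 1, 2), (3, 4, 5)]
f_separable = adf(x, nonoverlapping, [trap3, trap3])   # 3 + 2 = 5

# Overlapping sets: bit 2 is shared, so the two subfunctions interact through it.
overlapping = [(0, 1, 2), (2, 3, 4)]
f_overlapping = adf(x, overlapping, [trap3, trap3])    # 3 + 1 = 4
```

With nonoverlapping sets each block can be optimized on its own; with the shared bit, setting bit 2 to suit the first subfunction changes the value the second one returns.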
Future Work • Bayesian Optimization Algorithm, Population Sizing, and Time to Convergence • Hierarchical Problem Solving by the Bayesian Optimization Algorithm • Genetic Algorithms, Clustering, and Breaking of Symmetry (PPSN 2000) • Bayesian Optimization Algorithm, Decision Graphs, and Occam’s Razor