Function optimization
Function optimization Stelios Papadakis spap@staff.teicrete.gr
The problem of optimization • Consider that you are standing somewhere on a mountain surrounded by hills, with your eyes hermetically closed. • Your mission is to reach the highest/lowest point of the mountain on foot, using only an altimeter to evaluate your position. • A straightforward strategy for accomplishing your mission: • Step-1: memorize your current position • Step-2: make a step in a random direction and measure the altitude of the new position • Step-2.1: if the new altitude is higher than the previous one, then make a step forward in the same direction • Step-2.2: if the new altitude is lower than the previous one, then make a step backwards in the opposite direction • Step-2.3: if the altitude of the new point is the same as the previous one, then you are probably on a plateau.
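The strategy above can be sketched in code. This is a minimal one-dimensional hill-climbing sketch; the objective function, step size, and iteration count are illustrative assumptions, not part of the slides.

```python
import random

def hill_climb(altitude, start, step=0.1, iters=1000, seed=0):
    """Follow the slide's strategy: probe a random direction and keep the
    move only if the altitude improves (1-D sketch for clarity)."""
    rng = random.Random(seed)
    x = start
    for _ in range(iters):
        direction = rng.choice([-1.0, 1.0])   # Step-2: random direction
        new_x = x + direction * step
        if altitude(new_x) > altitude(x):     # Step-2.1: higher -> keep it
            x = new_x
        # Step-2.2: lower -> stay put (equivalent to stepping back)
    return x

# A single smooth hill peaking at x = 3 (an assumed test landscape)
peak = hill_climb(lambda x: -(x - 3.0) ** 2, start=0.0)
```

Because the landscape is unimodal, the walker ends up at the peak; on a multimodal landscape this same strategy would stop at whichever local optimum it climbed first.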
Geometrically (minimization) [Figure: a one-dimensional multimodal landscape, annotated with a local optimum, its basin of attraction (into which derivative-based methods descend), and the global optimum]
Analyzing the strategy • There exists a mechanism for changing your position (i.e. for generating new candidate solutions) • There exists a method for evaluating your position (objective function) • The objective function is a mapping from the n-dimensional space R^n to the set of real numbers R, usually written f: R^n → R • The objective function is the common concept for all optimization methods. • The target is to find the value of x, usually noted as x*, that maximizes/minimizes the value of the objective function. • Mathematically: x* = argmin f(x), i.e. f(x*) ≤ f(x) for all x, for minimization; x* = argmax f(x), i.e. f(x*) ≥ f(x) for all x, for maximization
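The argmin notation can be made concrete with a tiny brute-force search: evaluate the objective on every candidate and keep the best. The quadratic objective and grid below are illustrative assumptions.

```python
# Illustrating x* = argmin_x f(x) by brute force over a coarse grid.
def f(x):
    return (x - 2.0) ** 2 + 1.0               # unimodal, minimum at x = 2

candidates = [i / 10.0 for i in range(-50, 51)]   # grid over [-5, 5]
x_star = min(candidates, key=f)                   # argmin over the grid
```

Real optimizers exist precisely because this exhaustive evaluation becomes impossible as the dimension of R^n grows.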
Analyzing the strategy • A mechanism for generating new solutions • A function for evaluating potential solutions • Methods based on the derivatives of the objective function • Newton, Backpropagation, Steepest ascent • Derivatives are required (problems, limitations) • Local optima trapping (momentum, etc.) • One starting point (multiple starting points) • If the landscape is a plateau, the derivatives are zero • Population based methods • Derivative free • A population of points, evaluated in parallel • The mechanism for generating new solutions differs • The evaluation of solutions is the same.
Population based algorithms (minimization) • Population of solutions, simultaneously evaluated • No derivatives are required • By design, capacity for escaping from local optima • For the next time step, a new population is derived from the previous one • The objective function is called the fitness function in GAs • Premature convergence • Slow convergence (hill climbing operators) • Exploration capacity • Exploitation capacity • The 'no free lunch' theorem
Single and many optima • Multimodal objective functions • Unimodal objective functions
The real picture after some random points • We know • We know the objective function • And the domain of the parameters • The picture of the previous slide would require the evaluation of infinitely many points • We know if a solution is better than another one • We have to produce a better solution based on these observations • We don't know • We don't know anything regarding the global optimum • We don't know if a solution is a global or a local optimum • Actually, there is no optimization method that guarantees locating the global optimum.
Some special function landscapes • The landscape has wide flat areas (plateaus) • The basin of attraction is extremely narrow, surrounded by a plateau (needle in a haystack, e.g. password detection) • The landscape has convex paths to a local optimum while the global optimum is located in the opposite direction (deceptive problems) • Narrow paths to the global optimum (ridges of the search space), where the fitness function is worse for every point outside the path • These landscapes usually appear in non-convex objective functions • Very difficult for population based algorithms • Infeasible for derivative based methods
Constraint optimization (minimization) • General formulation: minimize f(x) subject to g_i(x) ≤ 0, i = 1, …, m. A common trick is to convert the constrained optimization problem to an unconstrained one by using penalty factors/functions. So, optimize F(x) = f(x) + Σ_i λ_i · max(0, g_i(x)) instead of f(x). • λ_i: arbitrarily large positive real numbers. • Penalty factors deteriorate the solutions that violate the constraints. • The penalty factors may take constant values, or increasing values as the optimization proceeds
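The penalty trick can be sketched as follows. The objective f, constraint g, penalty factor, and the brute-force grid are illustrative assumptions; a quadratic penalty is used here, which is one common variant of the idea on the slide.

```python
# Minimize f(x) = x^2 subject to g(x) = 1 - x <= 0 (i.e. x >= 1).
# The unconstrained optimum is x = 0; the penalized optimum should land
# at the constraint boundary x = 1.
def f(x):
    return x * x

def g(x):
    return 1.0 - x                      # feasible when g(x) <= 0

def F(x, lam=1000.0):                   # lam: large penalty factor
    return f(x) + lam * max(0.0, g(x)) ** 2

grid = [i / 100.0 for i in range(-200, 201)]
x_star = min(grid, key=F)               # brute-force minimization of F
```

Infeasible points (x < 1) are heavily penalized, so the minimizer of F respects the constraint even though F itself is unconstrained.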
Evolutionary algorithms' family tree • Evolutionary algorithms (EA) are optimization methods • EA are not modelling tools • Modelling tools are neural networks and fuzzy systems • An evolutionary algorithm could be used to optimize the parameters/structure of a model, as could any other optimization method
A brief history There is no absolute agreement between historians, but it does not matter for engineers
Genetic Algorithms (GAs) • GAs are evolutionary algorithms • GAs simulate the Darwinian theory of natural evolution/natural selection • GAs are population based optimization algorithms • SGA: Simple Genetic Algorithm (Holland/Goldberg)
Anatomy of an SGA • A population of individuals • An individual is an entity that has at least two properties • A chromosome • A fitness value • The chromosome is a string of symbols, usually a bit string • The fitness value is a real number that corresponds to the chromosome of the individual • Practically, given an individual, its fitness value is computed from its chromosome according to the objective function being optimized
The chromosome • A string of symbols from a selected alphabet • If the alphabet is {0, 1}, then the chromosome is a binary string of bits • Gene: a single symbol of the string • Genotype: a part of the chromosome, provided that this part expresses something with a specific meaning (e.g. a variable) • Phenotype: a usable number (the genotype expressed). The integer number that a genotype substring expresses could be the phenotype (here, the substring 10101010 expresses 170) • Genotype space • Phenotype space
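The genotype-to-phenotype step on this slide is simply reading a bit substring as an unsigned integer:

```python
# The genotype substring "10101010", read as an unsigned binary integer,
# expresses the phenotype 170, matching the slide's example.
genotype = "10101010"
phenotype = int(genotype, 2)   # binary string -> integer
```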
Encoding of one variable as a binary string: alphabet {0, 1}, gene, genotype, allele, phenotype • Decision: the number of bits n • The number of bits defines the resolution of the variable • Gene: a single bit • Allele: the value that a gene takes (0 or 1) • Genotype: a part of the chromosome that corresponds to a value with specific meaning • Phenotype: a genotype which is expressed (e.g. the value of a variable that participates in the computation of fitness) • Genotype space, phenotype space
Decoding of one variable from the chromosome • Decision: the domain [a, b] of the variable is required • The specific integer value d ∈ [0, 2^n − 1] expressed by the n-bit genotype should be linearly mapped into the interval [a, b] • The integer value is linearly mapped to the real value: x = a + d · (b − a) / (2^n − 1) • This follows from the equation of the line through the points (0, a) and (2^n − 1, b)
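The linear mapping above translates directly to a few lines of code; the function name `decode` is an assumption for illustration:

```python
def decode(bits, low, high):
    """Map an n-bit genotype linearly onto the real interval [low, high]:
    the integer d in [0, 2^n - 1] becomes low + (high - low) * d / (2^n - 1)."""
    n = len(bits)
    d = int(bits, 2)                    # integer value of the bit string
    return low + (high - low) * d / (2 ** n - 1)
```

The all-zeros string decodes to the lower bound and the all-ones string to the upper bound, with 2^n evenly spaced values in between (the resolution decided by n).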
Formulating an optimization problem to be solved by EAs • Define the objective function to be optimized • Define the tunable parameters of the objective function and formulate them as a vector x = (x_1, x_2, …, x_k) • Encode the tunable variables into the chromosome • Define and memorize the number of bits n_i for each tunable parameter. Consider each parameter as a tuple (n_i, a_i, b_i) • The total length of the chromosome is L = n_1 + n_2 + … + n_k bits.
Decoding • The range [a_i, b_i] of each tunable parameter is required • The part of the chromosome that encodes parameter i starts from bit n_1 + n_2 + … + n_{i−1} + 1 • and ends at bit n_1 + n_2 + … + n_i
Decoding • d_i: the integer phenotype value of the binary sub-string • x_i: the real phenotype value in the actual range [a_i, b_i] • The mapping is linear: x_i = a_i + d_i · (b_i − a_i) / (2^{n_i} − 1) • Example: with n_i = 8 bits and range [0, 255], the sub-string 10000000 (d_i = 128) decodes to x_i = 128
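Putting the two slides together, a whole chromosome can be decoded parameter by parameter. The (n_i, a_i, b_i) tuple representation follows the slides; the function name and string-based chromosome are illustrative assumptions.

```python
def decode_chromosome(chrom, specs):
    """Decode a concatenated bit string into real parameter values.
    specs is a list of (n_bits, low, high) tuples, one per parameter;
    parameter i occupies the bits after the first n_1 + ... + n_{i-1} bits."""
    values, pos = [], 0
    for n_bits, low, high in specs:
        sub = chrom[pos:pos + n_bits]
        d = int(sub, 2)                                  # integer phenotype d_i
        values.append(low + (high - low) * d / (2 ** n_bits - 1))
        pos += n_bits
    return values

# Two 4-bit parameters, both in [0, 15]: "1111" -> 15.0, "0000" -> 0.0
vals = decode_chromosome("11110000", [(4, 0.0, 15.0), (4, 0.0, 15.0)])
```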
Evaluation • After decoding all variables, the vector of parameters x is known • The respective fitness value is the value of the objective function at these specific parameter values • A single individual is evaluated
Initialization of an SGA • Compute the chromosome length according to the number of parameters and the number of bits per parameter • Select the number of individuals to craft the population • Randomly initialize the chromosome of each individual • Evaluate each individual according to the objective function • The first population is created at t = 0, i.e. generation 0 • From the initial population we need to produce a new generation, with the expectation that the children will be better than their parents (at least on average) • This is the reproduction procedure
Reproduction • Selection • Roulette wheel selection • Tournament selection • Proportional selection • Crossover • One-point, multipoint, uniform • Mutation • One-point, multipoint, inversion • Elitism (not an SGA operator, but necessary for practical applications) • Problem specific operators (e.g. hill climbing), usually applied to the elite individual
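The two most commonly listed selection schemes can be sketched as follows; both sketches assume maximization with non-negative fitness values.

```python
import random

def roulette_select(population, fitnesses, rng):
    """Roulette-wheel (fitness-proportional) selection: an individual is
    picked with probability fitness_i / sum(fitnesses)."""
    total = sum(fitnesses)
    r = rng.uniform(0.0, total)          # spin the wheel
    acc = 0.0
    for ind, fit in zip(population, fitnesses):
        acc += fit                       # slice of the wheel for this individual
        if r <= acc:
            return ind
    return population[-1]                # guard against rounding at the end

def tournament_select(population, fitnesses, rng, k=2):
    """Tournament selection: sample k individuals, return the fittest."""
    picks = rng.sample(range(len(population)), k)
    best = max(picks, key=lambda i: fitnesses[i])
    return population[best]
```

Roulette selection needs the fitnesses themselves (hence the scaling/ranking issues discussed later), while tournament selection only compares them, which makes it less sensitive to the fitness scale.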
Specific crossover and mutation operators • Floating point crossover and mutation
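The slide does not say which floating-point operators are meant; a common pair is arithmetic (blend) crossover and Gaussian mutation, sketched here as an assumed example. Individuals are vectors of real numbers rather than bit strings.

```python
import random

def arithmetic_crossover(p1, p2, rng):
    """One common floating-point crossover: the children are complementary
    convex combinations of the two parent vectors (assumed variant)."""
    a = rng.random()                      # blend coefficient in [0, 1]
    c1 = [a * x + (1 - a) * y for x, y in zip(p1, p2)]
    c2 = [(1 - a) * x + a * y for x, y in zip(p1, p2)]
    return c1, c2

def gaussian_mutation(ind, rng, sigma=0.1, rate=0.2):
    """Perturb each gene with probability `rate` by Gaussian noise."""
    return [x + rng.gauss(0.0, sigma) if rng.random() < rate else x
            for x in ind]
```

Note that arithmetic crossover can only interpolate between parents, so mutation is what keeps the search able to reach values outside the population's current bounding box.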
Mutation Many other mutation operators exist (e.g. insertion mutation, scramble mutation, etc.)
Example Many other crossover operators exist (e.g. PMX crossover, cycle crossover, etc.)
Fitness ranking • Premature convergence (conditions?) • Fitness scaling • Fitness ranking
The new generation (reproduction) • Selection: from the initial population, select 2 parents (parent 1 and parent 2) • Crossover: with probability p_c, cross the parents to produce child 1 and child 2 • Mutation: mutate each child with probability p_m • Place the new children into the new population • Repeat n/2 times to produce the new population • Optionally apply elitism (not an SGA operator, but useful in practice) • How to make a function return 1 with probability p: • Generate a uniform random number in [0, 1] • If that number is less than p, return 1 • Else return 0
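The probability-p recipe at the end of the slide is a one-liner:

```python
import random

def flip(p, rng=random):
    """Return 1 with probability p, else 0, exactly as the slide describes:
    draw a uniform random number in [0, 1] and compare it with p."""
    return 1 if rng.random() < p else 0
```

This is the primitive used to decide, per pair of parents, whether crossover fires (p = p_c) and, per bit, whether mutation fires (p = p_m).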
The process of SGA evolution: initialize the population → reproduction produces a new population → is the stopping criterion satisfied? If no, repeat reproduction; if yes, end of evolution
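The whole loop can be assembled into one sketch. This is a minimal SGA with tournament selection, one-point crossover, bit-flip mutation, and elitism; all parameter values, the fixed-generation stopping criterion, and the list-of-bits chromosome representation are illustrative assumptions.

```python
import random

def sga(fitness, n_bits=8, pop_size=20, generations=60,
        pc=0.9, pm=0.02, seed=0):
    """Minimal SGA sketch: initialize, then repeat reproduction
    (selection, crossover, mutation) until the stopping criterion
    (a fixed generation count here) is satisfied. Maximizes `fitness`
    over the integers encoded by n_bits-long chromosomes."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]

    def evaluate(ind):
        return fitness(int("".join(map(str, ind)), 2))

    for _ in range(generations):
        fits = [evaluate(ind) for ind in pop]
        elite = pop[max(range(pop_size), key=lambda i: fits[i])]
        new_pop = [elite[:]]                              # elitism
        while len(new_pop) < pop_size:
            # tournament selection of two parents
            p1 = pop[max(rng.sample(range(pop_size), 2), key=lambda i: fits[i])]
            p2 = pop[max(rng.sample(range(pop_size), 2), key=lambda i: fits[i])]
            c1, c2 = p1[:], p2[:]
            if rng.random() < pc:                         # one-point crossover
                cut = rng.randrange(1, n_bits)
                c1, c2 = c1[:cut] + c2[cut:], c2[:cut] + c1[cut:]
            for child in (c1, c2):
                for j in range(n_bits):
                    if rng.random() < pm:                 # bit-flip mutation
                        child[j] ^= 1
                new_pop.append(child)
        pop = new_pop[:pop_size]
    best = max(pop, key=evaluate)
    return int("".join(map(str, best)), 2)

# Assumed test objective: a quadratic peaking at x = 200 over [0, 255]
best = sga(lambda x: -(x - 200) ** 2)
```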
GA Example Problem Optimize the value of f(x) = sin(πx/256) over [0, 255], where x is restricted to integers. The maximum value of 1 occurs at x = 128. Note: function space is identical to fitness space.
GA Example Problem • One variable • Use the binary alphabet • Therefore, represent each individual as an 8-bit binary string • Set the number of population members to 8 (artificially low) • Randomize the population • Calculate the fitness of each member of the (initialized) population
GA Example Problem - Reproduction Reproduction is used to form a new population of n individuals. Select members of the current population using a stochastic process based on their fitnesses. First, compute a normalized fitness value for each individual by dividing its fitness by the sum of all fitnesses (f_i / 5.083 in the example case). Generate a random number between 0 and 1, n times, to form the new population (rather like spinning a roulette wheel).
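The normalization step looks like this in code. The individual fitness values below are hypothetical, chosen only so that they sum to the slide's 5.083; the slides do not list the actual population.

```python
# Normalized fitnesses for the roulette wheel: each individual's selection
# probability is f_i / sum(f). Values are illustrative (sum = 5.083 as on
# the slide), NOT the slide's actual population.
fitnesses = [0.9, 0.8, 0.7, 0.65, 0.6, 0.55, 0.5, 0.383]
total = sum(fitnesses)
probs = [f / total for f in fitnesses]   # selection probabilities, sum to 1
```

Spinning the wheel then means drawing a uniform number in [0, 1] and walking through the cumulative probabilities until it is exceeded, n times per generation.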