
# MAE 552 Heuristic Optimization - PowerPoint PPT Presentation

MAE 552 Heuristic Optimization. Instructor: John Eddy. Lecture #12, 2/18/02. Intro to Evolutionary Algorithms. So far: up until now, all the algorithms discussed operated on a single “current” design. Simulated Annealing – randomized perturbations of a single point.


### MAE 552 Heuristic Optimization

Instructor: John Eddy

Lecture #12

2/18/02

Intro to Evolutionary Algorithms

• Up until now, all the algorithms discussed operated on a single “current” design.

• Simulated Annealing – randomized perturbations of a single point.

• Greedy Algorithms – maximized local improvement.

• Consider operating on and maintaining an entire “population” of points simultaneously.

• So what? It would be easier to just run my single point algorithm many times or maybe on multiple processors to save wall clock time.

• We can now simulate the processes of natural selection and competition within our population. We can have our candidate designs fight for places in the population of future generations (iterations).

• Evolutionary algorithms date back to the 1950s.

• Many researchers independently developed different versions.

• Examples are Genetic Algorithms, Evolution Strategies, and Evolutionary Programming.

Most of the terminology is borrowed from biology.

• Phenotype: The "outward, physical manifestation" of an organism. The physical parts, the sum of the atoms, molecules, macromolecules, cells, structures, metabolism, energy utilization, tissues, organs, reflexes and behaviors; anything that is part of the observable structure, function or behavior of a living organism.

• Genotype: This is the "internally coded, heritable information" carried by all living organisms. This stored information is used as a "blueprint" or set of instructions for building and maintaining a living creature.

• Gene: The fundamental physical and functional unit of heredity. A gene is an ordered sequence of nucleotides located in a particular position on a particular chromosome that encodes a specific functional product.

• Chromosome: The self-replicating genetic structures of cells each containing the entire genome of an organism.

• Alleles: Alternative forms of a genetic locus.

• Crossing Over: The breaking during meiosis of one maternal and one paternal chromosome, the exchange of corresponding sections of DNA, and the rejoining of the chromosomes. This process can result in an exchange of alleles between chromosomes.

• Mutation: A heritable change in the genetic makeup of an organism.

We are not constrained by any of the rules of biological systems.

For example, we can have as many parents as we wish contribute to the makeup of our offspring, and we can have members that live forever (don’t age).

What is important to note here is that we are using nature as a model for our mathematical algorithms.

• An encoded representation of solutions to the problem.

• Ex. binary encoding, real number encoding, integer encoding, data structure encoding.

• A means of generating an initial population.

• Ex. random initialization, patterned initialization.

• A means of evaluating design fitness.

• Need a consistent means of determining which designs are “better” than others.

• Operators for producing and selecting new designs.

• Ex. selection, crossover, mutation.

• Values for the parameters of the algorithm.

• Ex. how much crossover and mutation to apply, how big the population is.

• General equation describing most evolutionary algorithms is:

x[t+1] = s( v( x[t] ) )

Where:

x[t] is the population at time t

v(*) is/are the variation operator(s)

s(*) is the selection operator.
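The roles of x[t], v(*) and s(*) can be illustrated with a minimal sketch, assuming the usual form in which variation is applied to the current population and selection then picks the next one. The toy problem (maximize the number of 1 bits), the one-bit mutation and the truncation selection are illustrative assumptions, not operators taken from the lecture.

```python
import random

def evolve(population, vary, select, generations):
    """Repeat x[t+1] = s(v(x[t])) for a fixed number of generations."""
    for _ in range(generations):
        population = select(vary(population))
    return population

def fitness(design):
    # Hypothetical toy fitness: the number of 1 bits in the design.
    return sum(design)

def vary(population, rng=random.Random(0)):
    # Variation v: keep the parents and add one mutated copy of each
    # (a single randomly chosen bit is flipped).
    children = []
    for parent in population:
        child = list(parent)
        i = rng.randrange(len(child))
        child[i] = 1 - child[i]
        children.append(child)
    return population + children

def select(candidates, size=6):
    # Selection s: truncation, keep the `size` fittest candidates.
    return sorted(candidates, key=fitness, reverse=True)[:size]

start = [[0] * 8 for _ in range(6)]
final = evolve(start, vary, select, generations=30)
print(max(fitness(d) for d in final))
```

Because the parents compete with their children here, the best fitness in the population can never decrease from one generation to the next.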

• An encoded representation of solutions to the problem.

• Vectors of integers.

• Useful for the TSP and other integer problems.

Possible Trips:

[ 1 8 6 5 2 3 4 7 ]

[ 8 2 5 6 3 1 7 4 ]

[ 2 4 6 3 7 5 1 8 ]

(where return home is implied).

[Figure: TSP map with eight numbered cities]
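A short sketch of how such an integer vector can be checked and evaluated as a trip. The city coordinates are hypothetical (the slide's map is not available), and the function names are illustrative.

```python
import math

def is_valid_tour(tour, n_cities):
    """A valid trip visits every city 1..n_cities exactly once."""
    return sorted(tour) == list(range(1, n_cities + 1))

def tour_length(tour, coords):
    """Total length of the trip, including the implied return home."""
    total = 0.0
    for i, city in enumerate(tour):
        nxt = tour[(i + 1) % len(tour)]  # wraps around to the first city
        total += math.dist(coords[city], coords[nxt])
    return total

# Hypothetical coordinates: four cities on the corners of a unit square.
square = {1: (0, 0), 2: (1, 0), 3: (1, 1), 4: (0, 1)}
print(tour_length([1, 2, 3, 4], square))
print(is_valid_tour([1, 8, 6, 5, 2, 3, 4, 7], 8))
```

The validity check matters because generic variation operators can easily produce vectors that repeat or skip a city.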

• Vectors of real numbers.

• Useful for continuous problems.

Possible Design Configurations:

[ 13.65, -1.25, 30.98 ]

[ 0.67, 14.81, 67.15 ]

[ 53.74, 12.54, -21.32 ]

Min: f(x) = x[1]^2 + x[2] – x[3]^3 – 50

s.t. g(x) ≤ 0

h(x) = 0

x_l ≤ x ≤ x_u

• Vectors of binary bits.

• Useful for packing and shipping problems.

What’s in the bag:

[ 0 1 1 0 ]

[ 1 0 1 0 ]

[ 1 1 1 1 ]

[Figure: a bag and four numbered items]

• Combination of the previous types.

• Useful for variable-length lists (perhaps a list of continuous numbers where an integer indicates the size of the list).

Example:

What do we do:

Form: [ N, a1, a2, … aN ]

Possible Solutions:

[ 3, 2.65, 4.25, 3.14 ]

[ 2, 5.32, 2.81 ]

[ 4, 3.21, 4.25, 9.65, 7.28 ]

Encoding

• Symbolic expressions.

• Useful for mapping problems (control problems, etc.). Typically requires a parser; the data structure is typically a tree.

[Figure: plot of y versus t]

Combine: *, +, -, /, % (mod), sin, cos, tan, etc.

[Figure: operands to combine with, and example solution expressions]

It is common to use binary encoding for problems involving integer, real, and binary type variables.

Previously we saw that vectors of bits may be useful for problems involving binary state variables (T, F), such as "is item 1 in the bag?"

We will see how in a minute. First, why?

Primarily because of flexibility (handles many types of variables) and because we can take advantage of the way that computers work.

Also, this method of encoding lends itself to a number of common variation operators as we will see.

A note on flexibility:

Flexibility usually comes at the expense of optimality. In our case, this method may not work best for many of the problems we do, but it should work fairly well for many of them.

Specialization on a problem by problem basis will usually improve performance.

Each of our design variable values may be represented as a vector of 1’s and 0’s. For example:

| Binary | Decimal |
|---|---|
| 00000000 | 0 |
| 00101101 | 45 |
| 11111111 | 255 |
| 1101.11 | 13.75 |

Therefore, since our design is defined by the collection of its variables, the design can be written as a long string of bits.

Example:

For a design [ 4, 6, 2 ]

We can equivalently write

[ 0100, 0110, 0010 ]
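The mapping between a design and its bit string can be sketched as follows, assuming non-negative integer variables and a fixed word length of 4 bits per variable (the function names are illustrative).

```python
def encode(design, bits):
    """Write each non-negative integer variable as a fixed-width bit string."""
    return [format(value, f"0{bits}b") for value in design]

def decode(words):
    """Recover the integer variables from their bit strings."""
    return [int(word, 2) for word in words]

words = encode([4, 6, 2], bits=4)
print(words)            # the per-variable bit strings
print("".join(words))   # the whole design as one long string of bits
print(decode(words))
```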

Back to why.

We will not likely be doing this by hand. We will probably use a computer.

All numbers in a computer are represented as a string of bits ( 1’s and 0’s).

We can take advantage of this.

Because of this, it is not necessary to explicitly create vectors of bits to represent our design variables.

Advantages (assuming we have decided on binary encoding, BE):

• Memory efficient

• A 32-bit integer value requires 4 bytes, instead of the minimum of 32 bytes needed to store each bit as a separate vector element.

Advantages (vs. explicit vectors of bits):

• Code simplification

• No explicit conversion from binary to decimal is necessary for use of the design variables.

• Most languages support operation directly on the bits of integers.

• A complication: most bitwise operators work only on integral types, and most of our variables are probably real.

• Solution: Specify a precision with which to keep each design variable and convert to a long integer before any bit manipulation.
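Operating directly on the bits of an integer, as the list above describes, might look like the sketch below. The helper names are illustrative; bit 0 is taken to be the least significant bit.

```python
def get_bit(value, i):
    """Return bit i of a non-negative integer (bit 0 is least significant)."""
    return (value >> i) & 1

def flip_bit(value, i):
    """Return the integer with bit i inverted."""
    return value ^ (1 << i)

x = 45                      # 00101101 in binary
print(get_bit(x, 0), get_bit(x, 1))
print(format(flip_bit(x, 1), "08b"))
```

No explicit vector of bits is created; the design variable stays an ordinary integer throughout.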

Example of conversion using precision.

Given X1 = 12.6345 and a desired precision of 3 decimal places:

X1-int = (int)[(12.6345)(10^3)] = 12634

To improve accuracy, perhaps round X1 prior to conversion. To get the value back (to the chosen precision), simply divide X1-int by 10^3, which gives 12.634.

In general:

Xi-int = Xi * 10^prec (truncated).

Xi = Xi-int / 10^prec
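A minimal sketch of this conversion, assuming truncation toward zero as in the example (the function names are illustrative).

```python
def to_fixed(x, prec):
    """Scale a real value by 10**prec and truncate to an integer."""
    return int(x * 10 ** prec)   # int() truncates toward zero

def from_fixed(x_int, prec):
    """Map the integer back to a real value with `prec` decimal places."""
    return x_int / 10 ** prec

x1_int = to_fixed(12.6345, 3)
print(x1_int)
print(from_fixed(x1_int, 3))
```

The digits beyond the chosen precision are lost in the round trip, which is the price paid for being able to use integer bit operations.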

• A means of generating an initial population.

• Quite simply, the population must be initialized, and this may be done in any way you wish.

• Some Possibilities:

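[Figures: two plots in the (x1, x2) plane, one showing a random initial population and one showing a patterned initial population]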
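Both possibilities can be sketched for real-valued designs. Uniform sampling within bounds and a regular grid are assumptions chosen for illustration; other random distributions or patterns would serve equally well.

```python
import itertools
import random

def random_init(n, bounds, rng):
    """n designs, each variable drawn uniformly between its bounds."""
    return [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n)]

def grid_init(points_per_axis, bounds):
    """Patterned start: a regular grid (needs points_per_axis >= 2)."""
    axes = [
        [lo + (hi - lo) * i / (points_per_axis - 1) for i in range(points_per_axis)]
        for lo, hi in bounds
    ]
    return [list(point) for point in itertools.product(*axes)]

bounds = [(0.0, 1.0), (0.0, 2.0)]   # hypothetical limits on x1 and x2
print(random_init(3, bounds, random.Random(1)))
print(grid_init(3, bounds))
```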

• A means of evaluating design fitness.

• It is necessary to provide a consistent means of evaluating the fitness of a design.

A ≥ B ≥ C implies A ≥ C

(those who have studied utility theory and preference ranking know that this does not always hold)

• The closer to optimal a point is, the better its fitness should be (provides direction).

Consider the case of functions of binary bits x_i ∈ {0, 1}

Maximize:

[Equations: two objective functions of the bits x_i]

What will be the result of these two functions (what is the difference)?

• How can we avoid this problem?

• In this particular case, perhaps we can make our fitness value a count of the number of design variables (DVs) with a value of 1.

• This is a very problem-specific question and will usually require knowledge about the problem.

• Important concept is that fitness is not limited to the objective function value and commonly is not.

• Creative measures of fitness can greatly improve the performance of the algorithm and may have a strong dependency on the choice of encoding and use of operators.

• Operators for producing and selecting new designs.

• The variation operators provide means of generating new designs.

• They should be set up to leverage information discovered in previous design evaluations.

• Choice of variation operators is tightly coupled with choice of encoding (as we will see as we progress).

• Problems are most efficiently solved when the proper operators are chosen and tailored to the problem at hand.

• Crossover is the inclusion of or combination of “genetic” material from one or more designs to create new designs.

(recall biological definition).

• Appropriate choice of a crossover strategy is highly dependent on choice of encoding and evaluation function.

• Consider a plain integer problem (not TSP) and two possible designs:

• Could probabilistically choose values from the vectors:

• X1 = [ 10, 15, 9, 7, 19 ]

• X2 = [ 17, 2, 14, 31, 3 ]

• C1 = [ 10, 2, 14, 7, 19 ]

• C2 = [ 17, 15, 9, 31, 3 ]
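Choosing values probabilistically from the two parents can be sketched as below. The swap probability `p_swap` and the function name are assumptions for illustration.

```python
import random

def uniform_crossover(p1, p2, rng, p_swap=0.5):
    """At each position, swap the parents' values with probability p_swap."""
    c1, c2 = list(p1), list(p2)
    for i in range(len(c1)):
        if rng.random() < p_swap:
            c1[i], c2[i] = c2[i], c1[i]
    return c1, c2

x1 = [10, 15, 9, 7, 19]
x2 = [17, 2, 14, 31, 3]
print(uniform_crossover(x1, x2, random.Random(0)))
```

Whatever the random draws are, every position of the two children holds the same pair of values as the corresponding position of the parents, just as in C1 and C2 above.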

• Could choose 1 or more random crossover point(s):

• X1 = [ 10, 15, 9, 7, 19 ]

• X2 = [ 17, 2, 14, 31, 3 ]

• C1 = [ 10, 15, 14, 31, 3 ]

• C2 = [ 17, 2, 9, 7, 19 ]
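A single random crossover point can be sketched as follows. Passing the point explicitly reproduces the children listed above (the split falls after the second value); the function names are illustrative.

```python
import random

def one_point_crossover(p1, p2, point):
    """Exchange everything from index `point` onward between the parents."""
    return p1[:point] + p2[point:], p2[:point] + p1[point:]

def random_one_point_crossover(p1, p2, rng):
    # Pick a point strictly inside the vector so both parents contribute.
    return one_point_crossover(p1, p2, rng.randrange(1, len(p1)))

x1 = [10, 15, 9, 7, 19]
x2 = [17, 2, 14, 31, 3]
print(one_point_crossover(x1, x2, 2))
print(random_one_point_crossover(x1, x2, random.Random(0)))
```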

• Could increment or decrement each value according to a Gaussian distribution with a mean of zero; the standard deviation then governs how likely large changes are.
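A sketch of this Gaussian strategy for integer designs. Rounding the sampled step to the nearest integer is an assumption made here so that the result stays integral.

```python
import random

def gaussian_step(design, sigma, rng):
    """Add a zero-mean Gaussian step, rounded to an integer, to each value."""
    return [value + round(rng.gauss(0.0, sigma)) for value in design]

x1 = [10, 15, 9, 7, 19]
rng = random.Random(0)
print(gaussian_step(x1, 1.0, rng))   # mostly small changes
print(gaussian_step(x1, 5.0, rng))   # larger changes become more likely
```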

• For real-valued encodings, could use the same strategies listed for integers.

• Another approach is to use arithmetic crossover (taken from convex set theory).

• Basic Equations:

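• C1 = λ1 X1 + λ2 X2

• C2 = λ1 X2 + λ2 X1

• This is a weighted average approach. The conditions placed on the multipliers determine the type of combination:

Convex Combination: λ1 + λ2 = 1 and λ1, λ2 > 0

Affine Combination: λ1 + λ2 = 1

Linear Combination: λ1, λ2 are any real numbers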
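A minimal sketch of arithmetic crossover for the case λ1 + λ2 = 1, so that only λ1 needs to be supplied (the function name is illustrative). With 0 < λ1 < 1 this is the convex case; other values of λ1 give an affine combination.

```python
def arithmetic_crossover(x1, x2, lam1):
    """Children C1 = lam1*X1 + lam2*X2 and C2 = lam1*X2 + lam2*X1."""
    lam2 = 1.0 - lam1
    c1 = [lam1 * a + lam2 * b for a, b in zip(x1, x2)]
    c2 = [lam1 * b + lam2 * a for a, b in zip(x1, x2)]
    return c1, c2

print(arithmetic_crossover([0.0, 4.0], [2.0, 8.0], 0.25))
```

In the convex case each child variable lies between the corresponding parent values, so the children never leave the box spanned by the parents.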

First, define the Hamming distance:

Given 2 expressions of equal length, the Hamming distance D_H is the number of characters that must be changed to make the expressions equivalent.

So D_H for 01101 and 10100 is 3

and D_H for 01111 and 10000 is 5

We will consider this later.
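The definition translates directly into a few lines of code; the sketch below assumes the two expressions are strings of equal length.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("expressions must have the same length")
    return sum(ch_a != ch_b for ch_a, ch_b in zip(a, b))

print(hamming_distance("01101", "10100"))
print(hamming_distance("01111", "10000"))
```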

• For binary encodings, could use the first two strategies listed for integers (the third strategy would not make sense).

• Could iterate through the vector and, according to a given probability, change the bits.

• Binary encoding lends itself well to Parameterized Crossover (we have already seen a parameterized approach).

• Because each variable (parameter) is in itself a string of bits, we can operate on each variable separately.

• Example Single Point Parameterized:

• Given two designs

[ 0100, 0110, 0010 ] = [ 4, 6, 2 ]

[ 1010, 0111, 1111 ] = [ 10, 7, 15 ]

• Choose a crossover point for each variable (shown here as a space)

[ 0 100, 01 10, 001 0 ] = [ 4, 6, 2 ]

[ 1 010, 01 11, 111 1 ] = [ 10, 7, 15 ]

• Then perform crossover as before, exchanging the bits to the right of each crossover point.

• Results for this case:

Parents:

[ 0 100, 01 10, 001 0 ] = [ 4, 6, 2 ]

[ 1 010, 01 11, 111 1 ] = [ 10, 7, 15 ]

Children:

[ 0010, 0111, 0011 ] = [ 2, 7, 3 ]

[ 1100, 0110, 1110 ] = [ 12, 6, 14 ]

• Notice that one child tends to be like one parent and the other tends to be like the other parent.
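The per-variable crossover can be sketched with bit masks on ordinary integers, so no explicit bit vectors are needed. The second parent is taken from its bit patterns 1010, 0111, 1111, which are 10, 7 and 15 in decimal; the function names and the convention that `point` counts the bits kept from the left are assumptions for illustration.

```python
def cross_variable(a, b, bits, point):
    """Single-point crossover on two `bits`-wide integers.

    The leftmost `point` bits stay with each parent; the remaining
    low-order bits are exchanged.
    """
    low_mask = (1 << (bits - point)) - 1
    high_mask = ((1 << bits) - 1) ^ low_mask
    c1 = (a & high_mask) | (b & low_mask)
    c2 = (b & high_mask) | (a & low_mask)
    return c1, c2

def parameterized_crossover(p1, p2, bits, points):
    """Apply cross_variable to each design variable with its own point."""
    pairs = [cross_variable(a, b, bits, pt) for a, b, pt in zip(p1, p2, points)]
    return [c1 for c1, _ in pairs], [c2 for _, c2 in pairs]

children = parameterized_crossover([4, 6, 2], [10, 7, 15], bits=4, points=[1, 2, 3])
print(children)
```

With crossover points after the first, second and third bit respectively, this gives the children [ 2, 7, 3 ] and [ 12, 6, 14 ].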

A few of the strategies we mentioned can be used as is (e.g. random picking).

However, maximum efficiency will likely require a combination of the aforementioned crossover strategies, and probably highly specialized operators as well (very problem dependent).

• Assume we are using tree structures.

• Now, crossover can occur by combining the branches of one tree with another.
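One way to combine branches is to swap a subtree of one expression with a subtree of the other. The sketch below assumes trees stored as nested tuples of the form (operator, child, child, ...), with a branch addressed by a path of child indices; both conventions are illustrative choices, not something fixed by the lecture.

```python
def get_subtree(tree, path):
    """Follow a path of child indices down to a branch."""
    for index in path:
        tree = tree[index]
    return tree

def replace_subtree(tree, path, new_branch):
    """Return a copy of `tree` with the branch at `path` replaced."""
    if not path:
        return new_branch
    i = path[0]
    return tree[:i] + (replace_subtree(tree[i], path[1:], new_branch),) + tree[i + 1:]

def subtree_crossover(t1, path1, t2, path2):
    """Swap the branch at path1 in t1 with the branch at path2 in t2."""
    branch1 = get_subtree(t1, path1)
    branch2 = get_subtree(t2, path2)
    return replace_subtree(t1, path1, branch2), replace_subtree(t2, path2, branch1)

# Hypothetical parents: t + sin(t) and cos(t) * 2.0
parent1 = ("+", "t", ("sin", "t"))
parent2 = ("*", ("cos", "t"), 2.0)
print(subtree_crossover(parent1, (2,), parent2, (1,)))
```

In a full algorithm the two paths would be chosen at random, and the children may differ in size from their parents.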