
“Conservation of Information in Evolutionary Search Algorithms: Measuring the Cost of Success”


Presentation Transcript


  1. “Conservation of Information in Evolutionary Search Algorithms: Measuring the Cost of Success” Robert J. Marks II

  2. Abstract Conservation of information theorems indicate that any search algorithm performs on average as well as random search without replacement unless it takes advantage of problem-specific information about the search target or the search-space structure. Combinatorics shows that even a moderately sized search requires problem-specific information to be successful. Three measures to characterize the information required for successful search are (1) endogenous information, which measures the difficulty of finding a target using random search; (2) exogenous information, which measures the difficulty that remains in finding a target once a search takes advantage of problem-specific information; and (3) active information, which, as the difference between endogenous and exogenous information, measures the contribution of problem-specific information for successfully finding a target. A methodology is developed based on these information measures to gauge the effectiveness with which problem-specific information facilitates successful search. This methodology is then applied to various search tools widely used in evolutionary search.

  3. Information • “The [computing] machine does not create any new information, but it performs a very valuable transformation of known information.” --Leon Brillouin, Science and Information Theory (Academic Press, New York, 1956).

  4. What is Evolutionary Computation? • Simulation of evolution on a computer: start with a set of possible solutions, use a computer model to ask how good each solution is, keep a set of the best solutions (survival of the fittest), then duplicate, mutate and crossover them to form the next generation, and repeat.
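
A minimal Python sketch of this loop (the fitness, random-solution, mutate and crossover callables are placeholders to be supplied by the user; population size, generation count and survivor count are arbitrary illustrative defaults, not from the slides):

```python
import random

def evolve(fitness, random_solution, mutate, crossover,
           pop_size=20, generations=100, keep=5):
    """Generic evolutionary loop: evaluate, keep the best, then refill the
    population by duplicating, mutating and crossing over the survivors."""
    population = [random_solution() for _ in range(pop_size)]
    for _ in range(generations):
        # How good is each solution?  Survival of the fittest.
        population.sort(key=fitness, reverse=True)
        survivors = population[:keep]
        # Next generation: duplicate, mutate & crossover the survivors.
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = random.sample(survivors, 2)
            children.append(mutate(crossover(a, b)))
        population = survivors + children
    return max(population, key=fitness)
```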

  5. Search in Engineering Design • Engineers: create a parameterized model, establish a measure of a design’s fitness, and search the N-D parameter space. • Can we do better? Example: the Yagi-Uda antenna (1954) versus an antenna designed by evolutionary search at NASA, http://ic.arc.nasa.gov/projects/esg/research/antenna.htm • [Figure: the space Ω of all parameters, with target T = the parameters that give results better than the Yagi-Uda.]

  6. Random vs. Assisted Search: information is given to you... • Target info: “Warmer!”, interval halving. • Search-space info: steepest descent, conjugate gradient descent.

  7. Blind Search (clip from the movie UHF)

  8. Search Space Assumption... • Monkeys at a typewriter with 27 keys. Apply Bernoulli’s principle of insufficient reason: “in the absence of any prior knowledge, we must assume that the events have equal probability.” Jakob Bernoulli, Ars Conjectandi (The Art of Conjecturing), 1713. • Information-theoretic equivalent: maximum entropy (a good optimization assumption).

  9. How Does Moore’s Law Help? A computer today searches for a target of B = 10,000 bits in a year. Double the speed: the faster computer searches for a target of B + 1 = 10,001 bits in a year.
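
The arithmetic behind the single extra bit, as a quick check (R below stands for the number of candidates the machine can test per year, a symbol introduced here, not on the slide): exhausting a B-bit target space takes up to 2^B trials, so

$$ R = 2^{B} \;\Rightarrow\; B = \log_2 R, \qquad 2R = 2^{B+1} \;\Rightarrow\; B + 1 \text{ bits.} $$

Doubling the speed doubles R but adds only one bit to the size of target that can be searched exhaustively.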

  10. Defining “Impossible”

  11. Defining “Impossible”

  12. Converting Mass to Computing Power • Minimum energy for an irreversible bit: von Neumann-Landauer limit = ln(2) k T = 1.15 × 10^-23 joules (assuming background radiation of 2.76 K). • Mass of the universe ~ 10^53 kg. Converting all the mass in the universe to energy (E = mc²), we could generate 7.83 × 10^92 bits.
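
A quick back-of-the-envelope check of the slide’s figure in Python, using only the numbers quoted above:

```python
# Reproduce the slide's estimate: convert the mass of the universe to energy
# and spend it at the quoted von Neumann-Landauer cost per bit.
mass_universe_kg = 1e53        # slide's estimate of the mass of the universe
c = 2.998e8                    # speed of light, m/s
energy_per_bit_J = 1.15e-23    # per-bit energy quoted on the slide

energy_J = mass_universe_kg * c ** 2               # E = m c^2
print(f"{energy_J / energy_per_bit_J:.2e} bits")   # ~7.8e+92 bits
```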

  13. How Long a Phrase? • Target: IN THE BEGINNING ... EARTH • Random queries: JFD SDKA ASS SA ... KSLLS, KASFSDA SASSF A ... JDASF, J ASDFASD ASDFD ... ASFDG, JASKLF SADFAS D ... ASSDF, ... until IN THE BEGINNING ... EARTH appears. • Expected number of queries = N^L for a phrase of L characters from an alphabet of N = 27 characters.

  14. How Long a Phrase from the Universe? • Probability of success of a single random query: p = N^-L. • Number of bits expected for a random search: N^L log2(N^L). • Setting N^L log2(N^L) = 7.83 × 10^92 bits with N = 27 gives L = 63 characters.
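
A sketch of the arithmetic: find the largest phrase length L whose expected random-search cost N^L log2(N^L) fits within the 7.83 × 10^92-bit budget (variable names are mine):

```python
from math import log2

budget_bits = 7.83e92    # bits obtainable by converting the universe to energy
N = 27                   # alphabet size

L = 1
while N ** (L + 1) * log2(N ** (L + 1)) <= budget_bits:
    L += 1
print(L)    # 63 characters
```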

  15. How Long a Phrase from the Multiverse?

  16. Does Quantum Computing Help? Quantum computing reduces the search time to roughly the square root of the classical search time. L. K. Grover, “A fast quantum mechanical algorithm for database search,” Proc. 28th ACM Symposium on Theory of Computing, 1996, pp. 212-219.

  17. Pr[tT ] =  Probability Search Space Pr()=1 Target T t Active Information in Search

  18. Fitness • Each point in the parameter space has a fitness. The target T is the set of acceptable solutions, and the problem of the search is to find a point whose fitness is good enough.

  19. Search Algorithms • Exhaustive • Steepest Ascent • Newton-Raphson • Levenberg-Marquardt • Tabu Search • Simulated Annealing • Particle Swarm Search • Evolutionary Approaches • Problem: in order to work better than average, each algorithm implicitly assumes something about the search space and/or the location of the target.

  20. No Free Lunch Theorem • With no knowledge of where the target is and no knowledge of the fitness surface, one search performs, on average, as well as any other.

  21. The NFLT is obvious... • The chance of opening a lock in 5 tries is independent of the algorithm used.

  22. Quotes on the need for added information for targeted search... • “...unless you can make prior assumptions about the ... [problems] you are working on, then no search strategy, no matter how sophisticated, can be expected to perform better than any other.” Yu-Chi Ho and D. L. Pepyne, “Simple Explanation of the No Free Lunch Theorem,” Proceedings of the 40th IEEE Conference on Decision and Control, Orlando, Florida, 2001. • No free lunch theorems “indicate the importance of incorporating problem-specific knowledge into the behavior of the [optimization or search] algorithm.” David Wolpert and William G. Macready, “No Free Lunch Theorems for Optimization,” IEEE Trans. Evolutionary Computation 1(1): 67-82, 1997.

  23. Therefore... • Nothing works better, on the average, than random search. • For a search algorithm like evolutionary search to work, we require active information.

  24. Can a computer program generate more information than it is given? • If a search algorithm does not obey the NFL theorem, it “is like a perpetual motion machine - conservation of generalization performance precludes it.” Cullen Schaffer (1994), anticipating the NFLT. • Cullen Schaffer, “A conservation law for generalization performance,” in Proc. Eleventh International Conference on Machine Learning, W. W. Cohen and H. Hirsh (eds.), San Francisco: Morgan Kaufmann, 1994, pp. 259-265.

  25. Targeted Search • [Figure: search space Ω with target T; the probability of hitting T under Bernoulli’s principle of insufficient reason = maximum entropy assumption.]

  26. Endogenous Information • I_Ω = -log2 p, where p is the probability that blind (maximum-entropy) search finds the target T. This is all of the information we can get from the search; we can get no more.

  27. Probability of Success • Choose a search algorithm... Let q be the probability of success of an evolutionary search. If there is no added information, q = p; if information has been added, q > p.

  28. Active Information • I_+ = log2(q/p) = I_Ω - I_S, the endogenous information minus the exogenous information I_S = -log2 q, measured with blind search as the reference. Checks: 1. For a “perfect search” (q = 1), I_+ = I_Ω = all of the available information.

  29. Active Information Checks: 2. For a “blind query” (q = p), I_+ = log2(q/p) = 0: no active information.
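
A minimal helper capturing these three measures (the function names and the example probability p are mine, chosen for illustration; p and q follow the definitions above):

```python
from math import log2

def endogenous_info(p):
    """I_Omega = -log2(p): difficulty of the blind (maximum-entropy) search."""
    return -log2(p)

def exogenous_info(q):
    """I_S = -log2(q): difficulty remaining for the assisted search."""
    return -log2(q)

def active_info(p, q):
    """I_+ = I_Omega - I_S = log2(q/p): contribution of the added information."""
    return log2(q / p)

p = 27.0 ** -28                  # e.g. blind search for a 28-character phrase
print(active_info(p, q=1.0))     # perfect search: equals endogenous_info(p), ~133 bits
print(active_info(p, q=p))       # blind query: 0 bits of active information
```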

  30. EXAMPLES of ACTIVE INFORMATION • Random Search • Partitioned Search • FOO Search in Alphabet & Nucleotides • Structured Information (ev) • Stepping Stone Search (Avida)

  31. 1. Active Information in Random Searches... • Q = number of queries (trials), p = probability of success of a single trial, p_S = chance of one or more successes in Q trials. • For random search, p_S = 1 - (1 - p)^Q ≈ Qp for very small p, so I_+ = log2(p_S / p) ≈ log2 Q.

  32. Active Information in Random Searches... • The active information is not a function of the size of the space or the probability of success, but only of the number of queries. • There is a diminishing return: two queries give one bit of added information, four queries give two bits, sixteen queries give four bits, 256 give 8 bits, etc.
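
A quick numerical check of this diminishing return (the per-trial probability p below is an arbitrary small value chosen for illustration):

```python
from math import log2

p = 1e-12                               # per-trial success probability (illustrative)
for Q in (2, 4, 16, 256):
    p_S = 1 - (1 - p) ** Q              # chance of one or more successes in Q queries
    print(Q, round(log2(p_S / p), 2))   # active information ~= log2(Q) bits
```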

  33. 2. Active Information in Partitioned Search... • Target: METHINKS*IT*IS*LIKE*A*WEASEL • Typical queries: XEHDASDSDDTTWSW*QITE*RIPOCFL, XERXPLEE*ETSXSR*IZAW**LPAEWL, MEQWASKL*RTPLSWKIRDOU*VPASRL, ... (yada yada yada) ... until METHINKS*IT*IS*LIKE*A*WEASEL is reached.

  34. 2. Active Information in Partitioned Search... • Target: METHINKS*IT*IS*LIKE*A*WEASEL • For random search, I_+ ≈ log2 Q; for partitioned search, I_+ ≈ L log2 Q. Hints amplify the added information by a factor of L.
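
A sketch of partitioned search, as I read the procedure on these slides: letters that already match the target are latched and only the remaining positions are re-queried. The alphabet and target phrase are from the slide; the loop structure and query counting are my assumptions:

```python
import random

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ*"         # 27 characters, '*' for the space
TARGET = "METHINKS*IT*IS*LIKE*A*WEASEL"

def partitioned_search(target=TARGET, rng=random.Random(0)):
    """Letters that match the target are latched; only the rest are re-queried."""
    phrase = [rng.choice(ALPHABET) for _ in target]
    queries = 1
    while any(c != t for c, t in zip(phrase, target)):
        for i, t in enumerate(target):
            if phrase[i] != t:                    # the hint: correct letters never change
                phrase[i] = rng.choice(ALPHABET)
        queries += 1
    return queries

print(partitioned_search())   # on the order of 100 queries for L = 28, N = 27
```

For comparison, unassisted random search over the whole phrase would be expected to need on the order of 27^28 ≈ 10^40 queries.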

  35. 2. Active Information in Partitioned Search... • Comparison for METHINKS*IT*IS*LIKE*A*WEASEL (L = 28 characters, 27 in the alphabet): partitioned search reaches the target in vastly fewer queries than random search. Reality: there is a lot of active information!

  36. 2. Domain knowledge can be applied differently, resulting in varying degrees of active information. • The knowledge used in partitioned search can be used to find all the letters and spaces in an arbitrarily large library using only 26 queries.

  37. 3. Single Agent Mutation (MacKay) • 1. Specify a target bit string of length L (WLOG, assume the target is all ones). • 2. Initiate a string of random bits. • 3. Form two children with mutation (bit-flip) probability μ. • 4. Keep the fitter of the two children; kill the parent and the weaker child. If there is a tie between the children, flip a coin. • 5. Go to step 3 and repeat.
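
A direct transcription of these steps into Python (a sketch: fitness is simply the number of ones, per the WLOG note, and the symbol mu for the mutation rate is introduced here):

```python
import random

def single_agent_search(L=128, mu=0.00005, rng=random.Random(1), max_queries=10**6):
    """One parent per generation; two mutated children; the fitter child survives."""
    fitness = lambda s: sum(s)                        # target is all ones (WLOG)
    parent = [rng.randint(0, 1) for _ in range(L)]    # step 2: random start
    queries = 0
    while fitness(parent) < L and queries < max_queries:
        # Step 3: two children, each bit flipped independently with probability mu.
        c1 = [b ^ (rng.random() < mu) for b in parent]
        c2 = [b ^ (rng.random() < mu) for b in parent]
        queries += 2
        # Step 4: the parent and the weaker child die; ties broken by a coin flip.
        if fitness(c1) != fitness(c2):
            parent = max(c1, c2, key=fitness)
        else:
            parent = rng.choice([c1, c2])
    return queries

print(single_agent_search())   # takes on the order of 10^5 queries with these settings
```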

  38. 3. Single Agent Mutation (MacKay) • With k ones in the current string, the probability of gaining a one in a generation is about 2μ(L - k), and the probability of staying at k is about 1 - 2μ(L - k). If μ << 1, this is a Markov birth process.

  39. • For μ = 0.00005 and L = 128 bits: perfect-search information = 128 bits; active information I_+(Q) = 126.7516 bits; about 0.0022 bits per query.

  40. 4. Active FOO Information • FOO = frequency of occurrence, taken from the Concise Oxford Dictionary (9th edition, 1995). • Information of the nth letter: -log2 p_n. Average information = entropy: H = -Σ_n p_n log2 p_n.

  41. English Alphabet Entropy • [Table: entropies of the English alphabet under the uniform and FOO distributions; the active information per character is the Kullback-Leibler distance between FOO and maximum entropy (uniform).]
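
In code, the per-character active information supplied by a frequency-of-occurrence table is the Kullback-Leibler distance to the uniform (maximum-entropy) distribution, i.e. log2 N minus the FOO entropy. The three-letter table below is purely illustrative, not the Concise Oxford data:

```python
from math import log2

def foo_active_info(foo):
    """Active information per character: KL distance from the FOO distribution
    to the uniform distribution, i.e. log2(N) minus the FOO entropy."""
    N = len(foo)
    entropy = -sum(p * log2(p) for p in foo.values() if p > 0)
    return log2(N) - entropy

toy_foo = {"A": 0.5, "B": 0.3, "C": 0.2}   # illustrative frequencies only
print(foo_active_info(toy_foo))            # ~0.10 bits per character
```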

  42. Asymptotic Equipartition Theorem • A FOO structuring of a long message restricts the search to a subspace of Ω within which the search is uniform. • For a message of L characters from an alphabet of N letters, this FOO subspace (the typical set) contains roughly 2^(LH) of the N^L possible messages, where H is the FOO entropy. [Figure: the target T inside the FOO subspace of Ω.]

  43. Asymptotic Equipartition Theorem • For the King James Bible using FOO, the active information is I_+ = 6.169 MB; the endogenous information is I_Ω = 16.717 MB. • Can we add MORE information? Digraphs, trigraphs.

  44. 5. Stepping Stone Search • Target phrase: _TENSE_TEEN_TOOTS_ONE_TONE_TEST_SET_ (36 characters). Endogenous information = 36 log2 27 = 171 bits. • STONE_ establishes the sub-alphabet; TEN_TOES_ establishes the FOO; then search for the phrase. • SSS active information: I_+ = 29 bits.

  45. Examples of Active Information • The NFL theorem has been useful to address the “sometimes outrageous claims that had been made of specific optimization algorithms.” S. Christensen and F. Oppacher, “What can we learn from No Free Lunch? A First Attempt to Characterize the Concept of a Searchable Function,” Proceedings of the Genetic and Evolutionary Computation Conference (2001).

  46. "Torture numbers, and they'll confess to anything." Gregg Easterbrook

  47. Example of Active Information

  48. Schneider’s EV

  49. • Setup: a string of 131 nucleotides; 131 fixed binding-site locations, 16 of them binding sites; a recognizer with 24 weights on [-511, 512], a bias on [-511, 512], and an error count. • Equivalent to inverting a perceptron.

  50. The Function... The Results...
