No Free Lunch (NFL) Theorem


Presentation Transcript


  1. No Free Lunch (NFL) Theorem Presentation by Kristian Nolde. Many slides are based on a presentation by Y.C. Ho

  2. General notes Goal: • Give an intuitive feeling for the NFL • Present some mathematical background To keep in mind: • NFL is an impossibility theorem, such as • Gödel's proof in mathematics (roughly: some facts cannot be proved or disproved in any mathematical system) • Arrow's theorem in economics (in principle, perfect democracy is not realizable) • Thus, practical use is limited?!?

  3. The No Free Lunch Theorem • Without specific structural assumptions, no optimization scheme can perform better than blind search on average • But blind search is very inefficient! • Prob(at least one out of N samples is in the top-n of a search space of size |Q|) ≈ nN/|Q|; e.g. Prob ≈ 0.001 for |Q| = 10^9, n = 1000, N = 1000
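A minimal Python sketch of the blind-search estimate on slide 3, assuming uniform sampling with replacement; it compares the exact probability 1 − (1 − n/|Q|)^N against the slide's approximation nN/|Q| using the slide's example values.

```python
# Probability that blind (uniform, with replacement) search lands at least
# one of N samples in the top-n of a search space of size |Q|.
def prob_hit_top_n(Q, n, N):
    exact = 1.0 - (1.0 - n / Q) ** N   # exact for sampling with replacement
    approx = n * N / Q                 # the slide's approximation
    return exact, approx

exact, approx = prob_hit_top_n(Q=10**9, n=1000, N=1000)
print(f"exact  = {exact:.6f}")   # ~0.001
print(f"approx = {approx:.6f}")  # 0.001
```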

  4. Assume a finite world Finite # of input symbols (x's) and finite # of output symbols (y's) => finite # of possible mappings from input to output (f's), namely |Y|^|X| of them
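A tiny sketch of the counting argument on slide 4, using hypothetical example sets X and Y: enumerating every mapping f: X → Y shows there are exactly |Y|^|X| of them.

```python
from itertools import product

# Hypothetical tiny world: |X| = 3 inputs, |Y| = 2 outputs.
X = ["x1", "x2", "x3"]
Y = [0, 1]

# Every mapping f: X -> Y is one choice of output per input.
all_f = [dict(zip(X, outputs)) for outputs in product(Y, repeat=len(X))]

assert len(all_f) == len(Y) ** len(X)   # 2**3 = 8 possible functions
print(len(all_f))
```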

  5. The Fundamental Matrix F [Matrix F: rows are the inputs x1 … x|X|, columns are all possible mappings f1 … f|F|, and each entry is the output value f(x) ∈ {0, 1}.] In each row, each value of Y appears |Y|^(|X|-1) times! FACT: equal number of 0's and 1's in each row! Averaged over all f, the value is independent of x!
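A short sketch, reusing the toy X and Y from above, that builds the columns of the fundamental matrix F and checks the claim on slide 5 that each output value appears |Y|^(|X|−1) times in every row, so the row average is the same for every x.

```python
from itertools import product

X = ["x1", "x2", "x3"]   # inputs -> rows of F
Y = [0, 1]               # outputs

# Columns of F: every possible f, encoded as the tuple (f(x1), ..., f(x|X|)).
columns = list(product(Y, repeat=len(X)))        # |Y|**|X| columns

for i, x in enumerate(X):
    row = [f[i] for f in columns]                # the row of F for input x
    for y in Y:
        # Each output value appears |Y|**(|X|-1) times in every row.
        assert row.count(y) == len(Y) ** (len(X) - 1)
    print(x, "row average:", sum(row) / len(row))  # identical for every x
```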

  6. Compare Algorithms • Think of two algorithms a1 and a2, e.g. a1 always selects from x1 to x_{0.5|X|}, a2 always selects from x_{0.5|X|} to x_{|X|} • For a specific f, a1 or a2 may be better. However, if f is not known, the average performance of both is equal: Σ_f P(d_y | f, m, a1) = Σ_f P(d_y | f, m, a2), where d is a sample of m points and d_y is the cost value associated with d.
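An illustrative sketch of slide 6 under assumed toy sizes: two algorithms that only ever search disjoint halves of X are run on every possible f, and their best-found cost values average out to exactly the same number.

```python
from itertools import product

X = list(range(6))    # |X| = 6 inputs
Y = [0, 1]            # cost values
half = len(X) // 2
m = 2                 # sampling budget (distinct points) per algorithm

def best_cost(points, f):
    """Lowest cost value found among the first m points an algorithm samples."""
    return min(f[x] for x in points[:m])

all_f = list(product(Y, repeat=len(X)))   # every possible mapping f
a1_avg = sum(best_cost(X[:half], f) for f in all_f) / len(all_f)   # a1: first half of X
a2_avg = sum(best_cost(X[half:], f) for f in all_f) / len(all_f)   # a2: second half of X

print(a1_avg, a2_avg)   # equal: averaged over all f, neither region is better
```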

  7. Comparing Algorithms Continued • Case 1: Algorithms can be more specific, e.g. a1 assumes a certain realization fk • Case 2: Or they can be more general, e.g. a2 assumes a more uniform distribution over the possible f • Then the performance of a1 will be excellent for fk but catastrophic for all other cases (great performance, no robustness) • In contrast, a2 performs mediocrely in all cases but doesn't fail (poor performance, high robustness) Common sense says: Robustness * Efficiency = Constant, or Generality * Depth = Constant
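A toy illustration of the robustness/efficiency trade-off on slide 7; the cost table below is invented for illustration, not taken from the slides.

```python
# Rows are choices x, columns are possible problems f; entries are costs (invented).
costs = {
    "x1": {"f1": 0, "f2": 9, "f3": 9},   # tailored to f1
    "x2": {"f1": 4, "f2": 4, "f3": 4},   # hedged, mediocre everywhere
    "x3": {"f1": 9, "f2": 0, "f3": 9},   # tailored to f2
}

specialist = "x1"   # a1: assumes the realization is f1
generalist = "x2"   # a2: assumes a uniform distribution over f1, f2, f3

for f in ("f1", "f2", "f3"):
    print(f, "specialist:", costs[specialist][f], " generalist:", costs[generalist][f])
# Specialist: excellent on f1 (cost 0) but catastrophic on f2/f3 (cost 9).
# Generalist: never excellent, never catastrophic (cost 4 everywhere).
```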

  8. Implications 1 • Let x be the optimization variable, f the performance function, and y the performance, i.e., y = f(x) • Then, averaged over all possible optimization problems, the result is choice-independent • If you don't know the structure of f (which column you are dealing with), blind choice is as good as any!

  9. Implications 2 • Let X be the space of all possible representations (as in genetic algorithms), or the space of all possible algorithms to apply to a class of problems • Without understanding of the problem, blind choice is as good as any • "Understanding" means you know which column of the F matrix you are dealing with

  10. Implications 3 • If you know which column or group of columns you are dealing with => you can specialize the choice of rows • But you must accept that you will suffer LOSSES should other columns occur due to uncertainties or disturbances

  11. The Fundamental Matrix F [Same matrix F as on slide 5: rows x1 … x|X|, columns f1 … f|F|, entries in {0, 1}.] Assume a distribution over the columns, then pick a row that results in minimal expected losses or maximal performance. This is stochastic optimization
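A sketch of the stochastic-optimization step on slide 11, with an invented loss table and an assumed distribution over the columns: pick the row with minimal expected loss. The last two lines also hint at slide 12's warning about misestimating that distribution.

```python
# loss[x][f]: loss of choosing row x when column f is realized (invented numbers).
loss = {
    "x1": {"f1": 0, "f2": 5, "f3": 5},
    "x2": {"f1": 3, "f2": 1, "f3": 4},
    "x3": {"f1": 5, "f2": 5, "f3": 0},
}
p = {"f1": 0.5, "f2": 0.3, "f3": 0.2}    # assumed distribution over the columns

def expected_loss(x):
    return sum(p[f] * loss[x][f] for f in p)

best = min(loss, key=expected_loss)
print(best, expected_loss(best))          # row with minimal expected loss under p

# If the assumed p is wrong (say the true distribution puts most mass on f3),
# the "optimal" row can perform badly, as slide 12 warns.
p_true = {"f1": 0.1, "f2": 0.1, "f3": 0.8}
print(sum(p_true[f] * loss[best][f] for f in p_true))   # realized expected loss
```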

  12. Implications 5 • Worse, if you estimate the probabilities incorrectly, then your stochastically optimized solution may suffer catastrophically bad outcomes more frequently than you would like • Reason: you have already used up more of the good outcomes in your "optimal" choice. What is left are bad ones that are not supposed to occur! (HOT design & power laws - Doyle)

  13. Implications 6 • Generality for generality's sake is not very fruitful • Working on a specific problem can be rewarding • Because: • the insight can be generalized • the problem is practically important • the 80-20 effect
