
Digression: Symbolic Regression





  1. Digression: Symbolic Regression
     • Suppose you are a criminologist, and you have some data about recidivism.

       Years in Prison   Holds Ph.D.   IQ    Injects Heroin in Eyeballs   Recidivist
       10                0             87    1                            1
       4                 1             86    0                            0
       22                1             186   1                            1
       6                 0             108   0                            1
       8                 0             143   0                            0
       :                 :             :     :                            :

  2. Criminology 101
     • You want a formula that predicts if someone will go back to jail after being released.
     • The formula will be based on the data collected, so the “independent variables” are
         • x1 = number of years in jail
         • x2 = holds Ph.D.
         • x3 = IQ
         • etc.
     • This is usually done with “regression”. Here is a simpler example, with one independent variable.

  3. Symbolic Regression
     • A simple data set with one independent variable, called x. What’s the relationship between x and y?

       x    y
       1    2.1
       2    3.3
       4    3.1
       5    1.8
       7    3.2
       :    :

       [Plot: the data points, y against x]

  4. Symbolic Regression
     • You might try “linear regression:”

       y = mx + b

       [Plot: y against x]
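As a concrete illustration of fitting the linear form, here is a minimal Java sketch that applies the standard closed-form least-squares formulas to the five (x, y) points from the previous slide. The class name `LinearFit` and the method `fit` are hypothetical, chosen only for this example.

```java
// Sketch only: LinearFit and fit are hypothetical names for illustration.
final class LinearFit {
    /** Returns {m, b} for the line y = m*x + b that minimizes the sum of squared errors. */
    static double[] fit(double[] x, double[] y) {
        int n = x.length;
        double sx = 0, sy = 0, sxy = 0, sxx = 0;
        for (int i = 0; i < n; i++) {
            sx += x[i];
            sy += y[i];
            sxy += x[i] * y[i];
            sxx += x[i] * x[i];
        }
        // Closed-form solution of the two normal equations.
        double m = (n * sxy - sx * sy) / (n * sxx - sx * sx);
        double b = (sy - m * sx) / n;
        return new double[] {m, b};
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 4, 5, 7};
        double[] y = {2.1, 3.3, 3.1, 1.8, 3.2};
        double[] mb = fit(x, y);
        System.out.println("m = " + mb[0] + ", b = " + mb[1]);
    }
}
```

For this data the formulas give m = 1/19 (about 0.053) and b = 2.5, so the best straight line is nearly flat. Note that the form of the equation is fixed in advance here; only the coefficients are found.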

  5. Symbolic Regression
     • You might try “quadratic regression:”

       y = ax^2 + bx + c

       [Plot: y against x]

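  6. Symbolic Regression
     • You might try “exponential regression:”

       y = ax^b + c

       (As written this is a power of x; a true exponential would put x in the exponent.)

       [Plot: y against x]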

  7. Symbolic Regression
     • How would you choose?
     • Maybe there is some underlying “mechanism” that produced the data.
     • But you may not know…
     • “Symbolic regression” finds the form of the equation, and the coefficients, simultaneously.

  8. How To Do Symbolic Regression?
     • One way: genetic programming.
     • “The evolution of computer programs through natural selection.”
     • The brainchild of John Koza, extending work by John Holland.
     • A very bizarre idea that actually works!
     • We will do this.

  9. Regression via Genetic Programming
     • We know how to produce “algebraic expression trees.”
     • We can even form them randomly.
     • Koza says “Make a generation of random trees, evaluate their fitnesses, then let the more fit have sex to produce children.”
     • Maybe the children will be more fit?
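Forming a tree randomly can be sketched as a short recursive method. The version below is an assumption about how one might do it, not the course's implementation: the names `RandomTreeSketch`, `Node`, and `randomTree`, the operator set (+, -, *), the constant range, and the one-in-three chance of stopping early are all choices made for illustration.

```java
import java.util.Random;

// Sketch only: RandomTreeSketch, Node, and randomTree are hypothetical names.
final class RandomTreeSketch {
    /** One tree node: an operator with two children, or a leaf (variable or constant). */
    static final class Node {
        final char op;          // '+', '-', '*', 'x' (variable), or 'c' (constant)
        final double value;     // used only when op == 'c'
        final Node left, right; // both null for leaves

        Node(char op, double value, Node left, Node right) {
            this.op = op; this.value = value; this.left = left; this.right = right;
        }

        double eval(double x) {
            switch (op) {
                case 'x': return x;
                case 'c': return value;
                case '+': return left.eval(x) + right.eval(x);
                case '-': return left.eval(x) - right.eval(x);
                default:  return left.eval(x) * right.eval(x);
            }
        }

        int depth() {
            return left == null ? 0 : 1 + Math.max(left.depth(), right.depth());
        }

        @Override public String toString() {
            if (op == 'x') return "x";
            if (op == 'c') return String.valueOf(value);
            return "(" + left + " " + op + " " + right + ")";
        }
    }

    /** Grows a random tree no deeper than maxDepth (an assumed growth rule). */
    static Node randomTree(Random rng, int maxDepth) {
        // Stop at the depth limit, or early about one time in three, and emit a leaf.
        if (maxDepth == 0 || rng.nextInt(3) == 0) {
            return rng.nextBoolean()
                ? new Node('x', 0, null, null)
                : new Node('c', rng.nextInt(11) / 2.0, null, null); // constants 0.0 .. 5.0
        }
        char op = "+-*".charAt(rng.nextInt(3));
        return new Node(op, 0, randomTree(rng, maxDepth - 1), randomTree(rng, maxDepth - 1));
    }

    public static void main(String[] args) {
        Random rng = new Random();
        for (int i = 1; i <= 4; i++) {
            System.out.println("Tree " + i + ": y = " + randomTree(rng, 3));
        }
    }
}
```

The depth limit is what guarantees the recursion ends; without it a random tree could in principle grow forever.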

  10. Expression Trees Again
     • A one-variable tree is a regression equation:

                 +
               /   \
              -     *
             / \   / \
            +   x 2   x
           / \
          x   .5

       y = (((x + 0.5) - x) + (2 * x))
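One way to represent such a tree in Java is sketched below, assuming a small interface with one class per kind of node. The names `ExprTreeSketch`, `Node`, `Const`, `Var`, and `BinOp` are hypothetical; the course's own classes may look quite different.

```java
// Sketch only: all type and method names here are hypothetical.
final class ExprTreeSketch {
    interface Node {
        double eval(double x);
    }

    static final class Const implements Node {
        final double value;
        Const(double value) { this.value = value; }
        @Override public double eval(double x) { return value; }
        @Override public String toString() { return String.valueOf(value); }
    }

    static final class Var implements Node {
        @Override public double eval(double x) { return x; }
        @Override public String toString() { return "x"; }
    }

    static final class BinOp implements Node {
        final char op;
        final Node left, right;
        BinOp(char op, Node left, Node right) {
            this.op = op; this.left = left; this.right = right;
        }
        @Override public double eval(double x) {
            double a = left.eval(x), b = right.eval(x);
            switch (op) {
                case '+': return a + b;
                case '-': return a - b;
                case '*': return a * b;
                default:  throw new IllegalStateException("unknown operator " + op);
            }
        }
        @Override public String toString() {
            return "(" + left + " " + op + " " + right + ")";
        }
    }

    /** Builds the tree for y = (((x + 0.5) - x) + (2 * x)). */
    static Node slideTree() {
        return new BinOp('+',
            new BinOp('-', new BinOp('+', new Var(), new Const(0.5)), new Var()),
            new BinOp('*', new Const(2), new Var()));
    }

    public static void main(String[] args) {
        Node tree = slideTree();
        System.out.println("y = " + tree);
        System.out.println(tree.eval(1.0)); // prints 2.5
    }
}
```

Evaluating the tree is a recursive walk: each operator node evaluates its two children and combines the results, and each leaf returns either x or its constant.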

  11. Evaluating Expression Trees

       yp = (((x + 0.5) - x) + (2 * x))

       Superscripts: “o” for “observed”, “p” for “predicted”

       x    yo    yp     |yo - yp|^2
       1    2.1   2.5      0.16
       2    3.3   4.5      1.44
       4    3.1   8.5     29.16
       5    1.8   10.5    75.69
       7    3.2   14.5   127.69
                  Sum:   234.14 = “fitness”

       (This “fitness” is a sum of squared errors, so a lower value means a better tree.)
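The fitness calculation in the table can be checked with a few lines of Java. This sketch stands in for the tree with a plain function so that it stays short; `FitnessSketch` and `sumSquaredError` are hypothetical names.

```java
import java.util.function.DoubleUnaryOperator;

// Sketch only: FitnessSketch and sumSquaredError are hypothetical names.
final class FitnessSketch {
    /** Adds up |yo - yp|^2 over all data points; lower means a better fit. */
    static double sumSquaredError(DoubleUnaryOperator predict, double[] x, double[] yObserved) {
        double total = 0.0;
        for (int i = 0; i < x.length; i++) {
            double diff = yObserved[i] - predict.applyAsDouble(x[i]);
            total += diff * diff;
        }
        return total;
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 4, 5, 7};
        double[] y = {2.1, 3.3, 3.1, 1.8, 3.2};
        // The slide's tree, written as a function of x.
        DoubleUnaryOperator tree = v -> ((v + 0.5) - v) + (2 * v);
        System.out.println("fitness = " + sumSquaredError(tree, x, y));
    }
}
```

Up to floating-point round-off, the result is the 234.14 shown in the table.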

  12. A Generation of Random Trees

       [Figure: Tree 1, Tree 2, Tree 3, Tree 4, …]

       Tree   Fitness
       1      335
       2      1530
       3      950
       4      1462
       :      :

       (most of these are really rotten!)

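  13. Choosing Parents

       [Figure: Generation 1, showing Tree 1, Tree 2, Tree 3, Tree 4, …]

       Tree   Fitness
       1      335
       2      1530
       3      950
       4      1462
       :      :

       Choose these two (the trees marked in the figure), randomly, “proportional to their fitness.”
       Because fitness here is a sum of squared errors, a tree with a lower value is fitter and should be chosen more often.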
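Choosing parents at random but with a bias toward fitter trees is often done with a "roulette wheel." The sketch below is one possible version, not necessarily the one used in the course. Since the fitness values on the slide are errors, it first converts each error into a weight with 1 / (1 + error); that conversion is an assumption made for this example, as are the names `SelectionSketch`, `pick`, and `weightsFromErrors`.

```java
import java.util.Random;

// Sketch only: SelectionSketch, pick, and weightsFromErrors are hypothetical names.
final class SelectionSketch {
    /** Roulette-wheel pick: index i is returned with probability weights[i] / sum(weights). */
    static int pick(double[] weights, Random rng) {
        double total = 0.0;
        for (double w : weights) total += w;
        double r = rng.nextDouble() * total;
        for (int i = 0; i < weights.length; i++) {
            r -= weights[i];
            if (r < 0) return i;
        }
        return weights.length - 1; // guard against floating-point round-off
    }

    /** ASSUMPTION: turns an error (lower is better) into a weight (higher is better). */
    static double[] weightsFromErrors(double[] errors) {
        double[] w = new double[errors.length];
        for (int i = 0; i < errors.length; i++) w[i] = 1.0 / (1.0 + errors[i]);
        return w;
    }

    public static void main(String[] args) {
        double[] errors = {335, 1530, 950, 1462}; // the four trees on the slide
        double[] w = weightsFromErrors(errors);
        Random rng = new Random();
        int[] counts = new int[errors.length];
        for (int i = 0; i < 10000; i++) counts[pick(w, rng)]++;
        for (int i = 0; i < counts.length; i++) {
            System.out.println("Tree " + (i + 1) + " chosen " + counts[i] + " times");
        }
    }
}
```

With these weights Tree 1, which has the smallest error, is picked a little more than half the time, while the worse trees still get an occasional chance.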

  14. “Sexual Reproduction”

       Choose “crossover points”, at random.
       [Figure: two parent trees in Generation 1]

       Then, swap the subtrees to make two new child trees.
       [Figure: two child trees in Generation 2]
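Subtree crossover can be sketched as follows. This version copies both parents first so they survive unchanged, picks one node in each copy uniformly at random, and swaps the contents of the two nodes, which has the effect of swapping the subtrees rooted there. All names (`CrossoverSketch`, `Node`, `collect`, `crossover`, and the small builder helpers) are hypothetical, and picking the points uniformly is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Sketch only: all names here are hypothetical, not taken from the course code.
final class CrossoverSketch {
    static final class Node {
        char op;          // '+', '-', '*', 'x' (variable), or 'c' (constant)
        double value;     // used only when op == 'c'
        Node left, right; // both null for leaves

        Node(char op, double value, Node left, Node right) {
            this.op = op; this.value = value; this.left = left; this.right = right;
        }

        Node copy() {
            return new Node(op, value,
                left == null ? null : left.copy(),
                right == null ? null : right.copy());
        }

        int size() { return left == null ? 1 : 1 + left.size() + right.size(); }

        @Override public String toString() {
            if (op == 'x') return "x";
            if (op == 'c') return String.valueOf(value);
            return "(" + left + " " + op + " " + right + ")";
        }
    }

    // Small helpers for building trees by hand.
    static Node x() { return new Node('x', 0, null, null); }
    static Node c(double v) { return new Node('c', v, null, null); }
    static Node op(char o, Node l, Node r) { return new Node(o, 0, l, r); }

    /** Lists every node of the tree; each one is a possible crossover point. */
    static void collect(Node n, List<Node> out) {
        out.add(n);
        if (n.left != null) { collect(n.left, out); collect(n.right, out); }
    }

    /** Returns two children; the parents are left untouched. */
    static Node[] crossover(Node mom, Node dad, Random rng) {
        Node a = mom.copy();
        Node b = dad.copy();
        List<Node> pointsA = new ArrayList<>();
        List<Node> pointsB = new ArrayList<>();
        collect(a, pointsA);
        collect(b, pointsB);
        Node pa = pointsA.get(rng.nextInt(pointsA.size()));
        Node pb = pointsB.get(rng.nextInt(pointsB.size()));
        // Swapping the two nodes' contents swaps the subtrees rooted at them.
        Node tmp = new Node(pa.op, pa.value, pa.left, pa.right);
        pa.op = pb.op; pa.value = pb.value; pa.left = pb.left; pa.right = pb.right;
        pb.op = tmp.op; pb.value = tmp.value; pb.left = tmp.left; pb.right = tmp.right;
        return new Node[] {a, b};
    }

    public static void main(String[] args) {
        Node mom = op('+', op('-', op('+', x(), c(0.5)), x()), op('*', c(2), x()));
        Node dad = op('*', x(), op('+', x(), c(1)));
        Node[] kids = crossover(mom, dad, new Random());
        System.out.println("parents:  " + mom + "   and   " + dad);
        System.out.println("children: " + kids[0] + "   and   " + kids[1]);
    }
}
```

Because subtrees are only exchanged, the two children together always contain exactly as many nodes as the two parents.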

  15. The Steps
     • Create Generation 1 by randomly generating 500 trees.
     • Find the fitness of each tree.
     • Choose pairs of parent trees, proportional to their fitness.
     • Crossover to make two child trees, adding them to Generation 2.
     • Continue until there are 500 child trees in Generation 2.
     • Repeat for 50 generations, keeping the best (most fit) tree over all generations.
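Put together, the steps above might look like the compact sketch below. It is one possible arrangement under several assumptions, not the implementation this course builds: the class `GpSketch` and all its members are hypothetical names, trees use only +, -, and *, the five-point data set from the earlier slide is hard-coded, errors are turned into selection weights with 1 / (1 + error), and a size cap (`MAX_NODES`) is added so trees cannot grow without limit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

final class GpSketch { // sketch only: every name in this class is hypothetical
    static final double[] X = {1, 2, 4, 5, 7};
    static final double[] Y = {2.1, 3.3, 3.1, 1.8, 3.2};
    static final int MAX_NODES = 200; // ASSUMPTION: size cap so trees cannot grow without bound

    static final class Node {
        char op; double value; Node left, right; // op: '+', '-', '*', 'x' (variable), 'c' (constant)
        Node(char op, double value, Node left, Node right) {
            this.op = op; this.value = value; this.left = left; this.right = right;
        }
        double eval(double x) {
            if (left == null) return op == 'x' ? x : value; // leaf
            double a = left.eval(x), b = right.eval(x);
            return op == '+' ? a + b : op == '-' ? a - b : a * b;
        }
        Node copy() {
            return new Node(op, value, left == null ? null : left.copy(), right == null ? null : right.copy());
        }
        int size() { return left == null ? 1 : 1 + left.size() + right.size(); }
        @Override public String toString() {
            return left == null ? (op == 'x' ? "x" : String.valueOf(value)) : "(" + left + " " + op + " " + right + ")";
        }
    }

    static Node randomTree(Random rng, int maxDepth) {
        if (maxDepth == 0 || rng.nextInt(3) == 0) { // leaf: the variable, or a constant 0.0 .. 5.0
            return rng.nextBoolean() ? new Node('x', 0, null, null)
                                     : new Node('c', rng.nextInt(11) / 2.0, null, null);
        }
        return new Node("+-*".charAt(rng.nextInt(3)), 0,
                        randomTree(rng, maxDepth - 1), randomTree(rng, maxDepth - 1));
    }

    /** Sum of squared errors on the data set; lower is better. */
    static double error(Node t) {
        double sum = 0;
        for (int i = 0; i < X.length; i++) { double d = Y[i] - t.eval(X[i]); sum += d * d; }
        return Double.isFinite(sum) ? sum : Double.MAX_VALUE; // overflow counts as very unfit
    }

    static void collect(Node n, List<Node> out) {
        out.add(n);
        if (n.left != null) { collect(n.left, out); collect(n.right, out); }
    }
    static Node randomNode(Node root, Random rng) {
        List<Node> all = new ArrayList<>();
        collect(root, all);
        return all.get(rng.nextInt(all.size()));
    }

    /** Copies both parents, then swaps one randomly chosen subtree between the copies. */
    static Node[] crossover(Node mom, Node dad, Random rng) {
        Node a = mom.copy(), b = dad.copy();
        Node pa = randomNode(a, rng), pb = randomNode(b, rng);
        Node tmp = new Node(pa.op, pa.value, pa.left, pa.right);
        pa.op = pb.op; pa.value = pb.value; pa.left = pb.left; pa.right = pb.right;
        pb.op = tmp.op; pb.value = tmp.value; pb.left = tmp.left; pb.right = tmp.right;
        return new Node[] {a.size() <= MAX_NODES ? a : mom.copy(),  // oversized child: keep parent
                           b.size() <= MAX_NODES ? b : dad.copy()};
    }

    /** Roulette-wheel pick: index i is returned with probability w[i] / total. */
    static int pick(double[] w, double total, Random rng) {
        double r = rng.nextDouble() * total;
        for (int i = 0; i < w.length; i++) if ((r -= w[i]) < 0) return i;
        return w.length - 1; // guard against round-off
    }

    static Node evolve(int popSize, int generations, Random rng) {
        List<Node> pop = new ArrayList<>();
        for (int i = 0; i < popSize; i++) pop.add(randomTree(rng, 4));
        Node best = null;
        double bestErr = Double.POSITIVE_INFINITY;
        for (int g = 1; g <= generations; g++) {
            double[] w = new double[popSize];
            double total = 0;
            for (int i = 0; i < popSize; i++) {
                double e = error(pop.get(i));
                if (e < bestErr) { bestErr = e; best = pop.get(i); }
                w[i] = 1.0 / (1.0 + e); // ASSUMPTION: low error maps to high selection weight
                total += w[i];
            }
            if (g == generations) break; // the last generation is only evaluated
            List<Node> next = new ArrayList<>();
            while (next.size() < popSize) {
                Node[] kids = crossover(pop.get(pick(w, total, rng)), pop.get(pick(w, total, rng)), rng);
                next.add(kids[0]);
                if (next.size() < popSize) next.add(kids[1]);
            }
            pop = next;
        }
        return best;
    }
}
```

A run with the numbers from the steps would be `GpSketch.Node best = GpSketch.evolve(500, 50, new Random());`, after which `best` prints as a formula and `GpSketch.error(best)` gives its error. The sketch has no mutation step, matching the list above, so every formula it can reach is a recombination of pieces from Generation 1.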

  16. How Could This Possibly Work?
     • No one seems to be able to say…
     • John Holland proved something called the “schema theorem,” but it really doesn’t explain much.
     • It’s a highly “parallel” process that recombines “good” building blocks.
     • It really does work very well for a huge variety of hard problems!

  17. Why This, in a Java Course?
     • Because we’re going to implement it!
     • Because writing code to implement this isn’t too hard.
     • Because it illustrates a large number of O-O and Java ideas.
     • Because it’s fun!
     • Here is what my implementation looks like:
