Digression: Symbolic Regression



• Suppose you are a criminologist, and you have some data about recidivism.

Years in Prison   Holds Ph.D.   IQ    Injects Heroin in Eyeballs   Recidivist
10                0             87    1                            1
4                 1             86    0                            0
22                1             186   1                            1
6                 0             108   0                            1
8                 0             143   0                            0
:                 :             :     :                            :

Criminology 101
• You want a formula that predicts if someone will go back to jail after being released.
• The formula will be based on the data collected, so the “independent variables” are
• x1 = number of years in jail
• x2 = holds Ph.D.
• x3 = IQ
• etc.
• This is usually done with “regression”. Here is a simpler example, with one independent variable.
Symbolic Regression
• A simple data set with one independent variable, called x. What’s the relationship between x and y?

x    y
1    2.1
2    3.3
4    3.1
5    1.8
7    3.2
:    :

[Plot: the data points, y versus x]

Symbolic Regression
• You might try “linear regression”:

y = mx + b

[Plot: y versus x, with the fitted line]
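For the five sample points shown earlier, the line can be fitted with the standard least-squares formulas. The sketch below is only an illustration: the class name `LinearFit` and the method `fit` are made up here and are not part of the course code.

```java
/** Least-squares fit of y = m*x + b (illustrative names, not from the slides). */
class LinearFit {
    /** Returns {m, b} that minimize the sum of squared vertical errors. */
    static double[] fit(double[] x, double[] y) {
        int n = x.length;
        double sx = 0, sy = 0, sxy = 0, sxx = 0;
        for (int i = 0; i < n; i++) {
            sx += x[i];
            sy += y[i];
            sxy += x[i] * y[i];
            sxx += x[i] * x[i];
        }
        // Closed-form solution of the two normal equations
        double m = (n * sxy - sx * sy) / (n * sxx - sx * sx);
        double b = (sy - m * sx) / n;
        return new double[] {m, b};
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 4, 5, 7};
        double[] y = {2.1, 3.3, 3.1, 1.8, 3.2};
        double[] mb = fit(x, y);
        System.out.printf("y = %.4f x + %.4f%n", mb[0], mb[1]);
    }
}
```

On this data the slope comes out close to 0.05 and the intercept close to 2.5, so a straight line says very little about these points.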

Symbolic Regression
• You might try “quadratic regression”:

y = ax^2 + bx + c

[Plot: y versus x, with the fitted curve]

Symbolic Regression
• You might try “exponential regression”:

y = ax^b + c

[Plot: y versus x, with the fitted curve]

Symbolic Regression
• How would you choose?
• Maybe there is some underlying “mechanism” that produced the data.
• But you may not know…
• “Symbolic regression” finds the form of the equation, and the coefficients, simultaneously.
How To Do Symbolic Regression?
• One way: genetic programming.
• “The evolution of computer programs through natural selection.”
• The brainchild of John Koza, extending work by John Holland.
• A very bizarre idea that actually works!
• We will do this.
Regression via Genetic Programming
• We know how to produce “algebraic expression trees.”
• We can even form them randomly.
• Koza says “Make a generation of random trees, evaluate their fitnesses, then let the more fit have sex to produce children.”
• Maybe the children will be more fit?
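Forming a tree randomly could look like the sketch below. Everything in it is an assumption made for illustration: the `Node` class, the operator set (+, -, *), the choice of small integer constants, and the one-in-three chance of stopping early are not taken from the slides.

```java
import java.util.Random;

class RandomTrees {
    /** Hypothetical node: an operator with two children, or a leaf ("x" or a constant). */
    static final class Node {
        final String label;
        final Node left, right;   // both null for a leaf

        Node(String label, Node left, Node right) {
            this.label = label;
            this.left = left;
            this.right = right;
        }

        boolean isLeaf() { return left == null; }

        /** A leaf has depth 0; an operator is one deeper than its deepest child. */
        int depth() { return isLeaf() ? 0 : 1 + Math.max(left.depth(), right.depth()); }

        @Override public String toString() {
            return isLeaf() ? label : "(" + left + " " + label + " " + right + ")";
        }
    }

    static final String[] OPERATORS = {"+", "-", "*"};

    /** Grows a random tree that is never deeper than maxDepth. */
    static Node randomTree(Random rng, int maxDepth) {
        // Stop at the depth limit, and also stop early about one time in three.
        boolean makeLeaf = maxDepth == 0 || rng.nextInt(3) == 0;
        if (makeLeaf) {
            String label = rng.nextBoolean() ? "x" : String.valueOf(rng.nextInt(10));
            return new Node(label, null, null);
        }
        String op = OPERATORS[rng.nextInt(OPERATORS.length)];
        return new Node(op, randomTree(rng, maxDepth - 1), randomTree(rng, maxDepth - 1));
    }

    public static void main(String[] args) {
        Random rng = new Random();
        for (int i = 0; i < 5; i++) {
            System.out.println(randomTree(rng, 3));   // a different expression each run
        }
    }
}
```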
Expression Trees Again
• A one-variable tree is a regression equation:

          +
        /   \
       -     *
      / \   / \
     +   x 2   x
    / \
   x   0.5

y = (((x + 0.5) - x) + (2 * x))
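One possible way to represent this tree in Java is a small class hierarchy in which every node knows how to evaluate itself for a given x. The names `Node`, `Const`, `Var`, `Binop`, and `slideTree` are hypothetical and chosen only for this sketch.

```java
class ExpressionTree {
    /** Anything that can produce a value once x is known. */
    interface Node {
        double eval(double x);
    }

    /** Leaf holding a constant. */
    static final class Const implements Node {
        final double value;
        Const(double value) { this.value = value; }
        public double eval(double x) { return value; }
        @Override public String toString() { return String.valueOf(value); }
    }

    /** Leaf standing for the independent variable x. */
    static final class Var implements Node {
        public double eval(double x) { return x; }
        @Override public String toString() { return "x"; }
    }

    /** Interior node that applies +, - or * to its two subtrees. */
    static final class Binop implements Node {
        final char op;
        final Node left, right;

        Binop(char op, Node left, Node right) {
            this.op = op;
            this.left = left;
            this.right = right;
        }

        public double eval(double x) {
            double l = left.eval(x), r = right.eval(x);
            switch (op) {
                case '+': return l + r;
                case '-': return l - r;
                case '*': return l * r;
                default: throw new IllegalStateException("unknown operator " + op);
            }
        }

        @Override public String toString() {
            return "(" + left + " " + op + " " + right + ")";
        }
    }

    /** Builds the tree for ((x + 0.5) - x) + (2 * x). */
    static Node slideTree() {
        Node x = new Var();
        Node sum = new Binop('+', x, new Const(0.5));
        Node diff = new Binop('-', sum, x);
        Node prod = new Binop('*', new Const(2), x);
        return new Binop('+', diff, prod);
    }

    public static void main(String[] args) {
        Node tree = slideTree();
        System.out.println("y = " + tree);
        System.out.println("at x = 1: " + tree.eval(1.0));
    }
}
```

Evaluating is a recursive walk: each operator asks its two children for their values and combines them.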

Evaluating Expression Trees

y^p = (((x + 0.5) - x) + (2 * x))

Superscripts: “o” for “observed”, “p” for “predicted”

x    y^o    y^p     |y^o - y^p|^2
1    2.1    2.5     0.16
2    3.3    4.5     1.44
4    3.1    8.5     29.16
5    1.8    10.5    75.69
7    3.2    14.5    127.69

Sum: 234.14 = “fitness”
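The table can be reproduced in a few lines. In this sketch the candidate formula is passed in as a plain function rather than as a tree, purely to keep the example short; `Fitness` and `sumSquaredError` are illustrative names.

```java
import java.util.function.DoubleUnaryOperator;

class Fitness {
    /** Adds up |observed - predicted|^2 over all data points. */
    static double sumSquaredError(DoubleUnaryOperator predict, double[] x, double[] yObserved) {
        double total = 0;
        for (int i = 0; i < x.length; i++) {
            double diff = yObserved[i] - predict.applyAsDouble(x[i]);
            total += diff * diff;
        }
        return total;
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 4, 5, 7};
        double[] yObserved = {2.1, 3.3, 3.1, 1.8, 3.2};
        // The formula read off the example tree
        DoubleUnaryOperator formula = v -> ((v + 0.5) - v) + (2 * v);
        System.out.printf("fitness = %.2f%n", sumSquaredError(formula, x, yObserved));
        // prints "fitness = 234.14"
    }
}
```

Note that with this measure a smaller number means a better fit.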

A Generation of Random Trees

[Figure: four randomly generated trees, labeled Tree 1 through Tree 4]

Tree    Fitness
1       335
2       1530
3       950
4       1462
:       :

(most of these are really rotten!)

Choosing Parents

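Generation 1

[Figure: Tree 1 through Tree 4]

Tree    Fitness
1       335
2       1530
3       950
4       1462
:       :

Choose two of these at random, “proportional to their fitness.” Because the fitness number here is a sum of squared errors, a smaller number means a fitter tree, so the trees with the lowest numbers should be the most likely picks.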
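A common way to pick parents “proportional to their fitness” is roulette-wheel selection. The slides do not say how an error score is turned into a selection weight, so the sketch below assumes weight = 1 / (1 + error), which favors low-error trees; that formula and the names `ParentSelection`, `weights`, and `pick` are assumptions made for illustration.

```java
import java.util.Random;

class ParentSelection {
    /** Assumed conversion: a smaller error gives a larger selection weight. */
    static double[] weights(double[] errors) {
        double[] w = new double[errors.length];
        for (int i = 0; i < errors.length; i++) {
            w[i] = 1.0 / (1.0 + errors[i]);
        }
        return w;
    }

    /** Roulette wheel: index i is returned with probability w[i] / sum(w). */
    static int pick(double[] w, Random rng) {
        double total = 0;
        for (double v : w) total += v;
        double r = rng.nextDouble() * total;
        for (int i = 0; i < w.length; i++) {
            r -= w[i];
            if (r < 0) return i;
        }
        return w.length - 1;   // guard against floating-point rounding
    }

    public static void main(String[] args) {
        double[] errors = {335, 1530, 950, 1462};   // the four trees in the table
        double[] w = weights(errors);
        Random rng = new Random();
        int[] counts = new int[errors.length];
        for (int i = 0; i < 10000; i++) counts[pick(w, rng)]++;
        for (int i = 0; i < counts.length; i++) {
            System.out.println("Tree " + (i + 1) + " chosen " + counts[i] + " times");
        }
    }
}
```

With these weights Tree 1 should be chosen a little over half the time, but the worse trees still get an occasional chance to be parents.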

“Sexual Reproduction”

Choose “crossover points” at random in the two parent trees:

[Figure: two parent trees from Generation 1]

Then swap the subtrees to make two new child trees:

[Figure: two child trees in Generation 2]
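Swapping subtrees can be sketched as follows. This version assumes immutable trees whose nodes are numbered in preorder (root = 0), and it builds new child trees rather than modifying the parents; the class and method names are hypothetical.

```java
import java.util.Random;

class Crossover {
    /** Hypothetical node: an operator with two children, or a leaf. */
    static final class Node {
        final String label;
        final Node left, right;   // both null for a leaf

        Node(String label, Node left, Node right) {
            this.label = label;
            this.left = left;
            this.right = right;
        }

        static Node leaf(String label) { return new Node(label, null, null); }

        boolean isLeaf() { return left == null; }

        int size() { return isLeaf() ? 1 : 1 + left.size() + right.size(); }

        @Override public String toString() {
            return isLeaf() ? label : "(" + left + " " + label + " " + right + ")";
        }
    }

    /** Returns the subtree rooted at preorder position index (0 = the root). */
    static Node subtreeAt(Node n, int index) {
        if (index == 0) return n;
        int leftSize = n.left.size();
        return index <= leftSize
                ? subtreeAt(n.left, index - 1)
                : subtreeAt(n.right, index - 1 - leftSize);
    }

    /** Returns a copy of n with the subtree at position index replaced. */
    static Node replaceAt(Node n, int index, Node replacement) {
        if (index == 0) return replacement;
        int leftSize = n.left.size();
        return index <= leftSize
                ? new Node(n.label, replaceAt(n.left, index - 1, replacement), n.right)
                : new Node(n.label, n.left, replaceAt(n.right, index - 1 - leftSize, replacement));
    }

    /** Swaps the subtree at position i of a with the subtree at position j of b. */
    static Node[] crossoverAt(Node a, int i, Node b, int j) {
        Node childA = replaceAt(a, i, subtreeAt(b, j));
        Node childB = replaceAt(b, j, subtreeAt(a, i));
        return new Node[] {childA, childB};
    }

    /** Picks both crossover points uniformly at random. */
    static Node[] crossover(Node a, Node b, Random rng) {
        return crossoverAt(a, rng.nextInt(a.size()), b, rng.nextInt(b.size()));
    }

    public static void main(String[] args) {
        Node a = new Node("+", Node.leaf("x"), Node.leaf("1"));
        Node b = new Node("*", Node.leaf("2"), Node.leaf("x"));
        Node[] kids = crossoverAt(a, 2, b, 0);   // swap the leaf "1" with all of b
        System.out.println(kids[0]);   // prints "(x + (2 * x))"
        System.out.println(kids[1]);   // prints "1"
    }
}
```

Because the two subtrees simply trade places, the children together contain exactly as many nodes as the parents did.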

The Steps
• Create Generation 1 by randomly generating 500 trees.
• Find the fitness of each tree.
• Choose pairs of parent trees, proportional to their fitness.
• Crossover to make two child trees, adding them to Generation 2.
• Continue until there are 500 child trees in Generation 2.
• Repeat for 50 generations, keeping the best (most fit) tree over all generations.
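The steps above can be sketched as one generic loop. To keep the example short it is written for any kind of individual and is tried out on a toy problem (evolving a number close to 3), not on expression trees. The selection weight 1 / (1 + error) is an assumption, as are all names; errors are assumed to be finite and non-negative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;
import java.util.function.BiFunction;
import java.util.function.Supplier;
import java.util.function.ToDoubleFunction;

class Evolution {
    /** Runs the generational loop and returns the lowest-error individual ever seen. */
    static <T> T evolve(int populationSize, int generations,
                        Supplier<T> randomIndividual,
                        ToDoubleFunction<T> error,
                        BiFunction<T, T, List<T>> crossover,
                        Random rng) {
        // Step 1: a first generation of random individuals
        List<T> population = new ArrayList<>();
        for (int i = 0; i < populationSize; i++) population.add(randomIndividual.get());

        T best = null;
        double bestError = Double.POSITIVE_INFINITY;
        for (int g = 1; g <= generations; g++) {
            // Step 2: score everyone, remembering the best over all generations
            double[] weights = new double[populationSize];
            double total = 0;
            for (int i = 0; i < populationSize; i++) {
                double e = error.applyAsDouble(population.get(i));
                if (e < bestError) {
                    bestError = e;
                    best = population.get(i);
                }
                weights[i] = 1.0 / (1.0 + e);   // assumed: lower error, higher weight
                total += weights[i];
            }
            if (g == generations) break;   // the last generation is scored, not bred

            // Steps 3 to 5: pick parents and cross them until the next generation is full
            List<T> next = new ArrayList<>();
            while (next.size() < populationSize) {
                T mom = population.get(pick(weights, total, rng));
                T dad = population.get(pick(weights, total, rng));
                next.addAll(crossover.apply(mom, dad));
            }
            if (next.size() > populationSize) {
                next.subList(populationSize, next.size()).clear();   // drop any extra child
            }
            population = next;
        }
        return best;
    }

    /** Roulette wheel: index i is returned with probability weights[i] / total. */
    static int pick(double[] weights, double total, Random rng) {
        double r = rng.nextDouble() * total;
        for (int i = 0; i < weights.length; i++) {
            r -= weights[i];
            if (r < 0) return i;
        }
        return weights.length - 1;   // guard against floating-point rounding
    }

    public static void main(String[] args) {
        Random rng = new Random();
        // Toy problem: individuals are numbers, the error is the squared distance from 3,
        // and "crossover" blends the two parents.
        Double best = Evolution.<Double>evolve(
                500, 50,
                () -> rng.nextDouble() * 10,
                v -> (v - 3) * (v - 3),
                (a, b) -> Arrays.asList(0.75 * a + 0.25 * b, 0.25 * a + 0.75 * b),
                rng);
        System.out.println("best guess: " + best);
    }
}
```

For symbolic regression the individual type would be an expression tree, the error would be the sum of squared errors over the data, and the crossover function would swap random subtrees.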
How Could This Possibly Work?
• No one seems to be able to say…
• John Holland proved something called the “schema theorem,” but it really doesn’t explain much.
• It’s a highly “parallel” process that recombines “good” building blocks.
• It really does work very well for a huge variety of hard problems!
Why This, in a Java Course?
• Because we’re going to implement it!
• Because writing code to implement this isn’t too hard.
• Because it illustrates a large number of O-O and Java ideas.
• Because it’s fun!
• Here is what my implementation looks like: