Genetic Programming With Boosting for Ambiguities in Regression Problem

Grégory Paris

Laboratoire d’informatique du Littoral

Université du Littoral-Côte d’Opale

62228 Calais Cedex, France

Paris@lil.univ-littoral.fr

For a given x, several values are possible for f(x).

- Boosting to get several values
  - Boosting in a few words
  - GPboost: our algorithm for regression problems
  - Boosting deals with ambiguities, clusters the data
- Dealing with several values: Dendrograms
  - Presentation
  - Application
- Results and conclusion

- Introduced by Freund and Schapire in the 1990s
- Improves machine learning methods
- Applies to weak learners (methods that perform better than random guessing)
- The error on the learning set is guaranteed to decrease
- Builds several hypotheses on different distributions
- Makes them vote to obtain a final hypothesis

- Iba’s version in 1999
  - Distributions are used to build the fitness set
- Our version in 2001
  - The distribution is included in the fitness function
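One way to picture a distribution included in the fitness function is a weighted sum of absolute errors. The following is a minimal sketch, not the authors' exact fitness: the name `weighted_fitness`, the absolute-error measure, and the toy data are assumptions made for illustration.

```python
def weighted_fitness(candidate, examples, weights):
    """Weighted sum of absolute errors (lower is better).

    Assumption: absolute error is the per-example error measure.
    """
    return sum(w * abs(candidate(x) - y) for (x, y), w in zip(examples, weights))


# Toy fitness set with an ambiguity at x = 0: both y = 1 and y = -1 occur.
examples = [(0.0, 1.0), (0.0, -1.0)]
weights = [0.75, 0.25]  # the distribution puts more weight on the first example

upper = weighted_fitness(lambda x: 1.0, examples, weights)   # matches the first example
lower = weighted_fitness(lambda x: -1.0, examples, weights)  # matches the second example
```

With these weights, the candidate that matches the heavily weighted example gets the better (lower) fitness, so changing the distribution changes which branch of the ambiguity the GP run is pushed toward.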

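Fitness set: m examples (x_i, y_i)

Distribution: D_t, used at round t

Each example i has a weight D_t(i)

The initial weight is D_1(i) = 1/m for each example

« Weak Learner »: a GP algorithm including the distribution in its fitness

The weak learner will be run T times (T rounds of boosting) with different distributions

Fitness function: the error on each example is weighted by D_t(i)

For t = 1, ..., T do

Run the weak learner using D_t

The best-of-run is denoted f_t

β_t is the confidence given to function f_t

ε_t: error of f_t on the fitness set under D_t

Z_t: normalization factor

Update the distribution for the next round: D_{t+1} is obtained by reweighting D_t and dividing by Z_t

End For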
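The update step is what shifts attention toward the points the last function did not match. The sketch below uses the AdaBoost.R2 update with a linear loss (Drucker, 1997). That choice is an assumption: the exact formulas are not preserved in this transcript, and the function name `update_distribution` and the sample numbers are illustrative only.

```python
def update_distribution(weights, abs_errors):
    """Return (new_weights, beta) after one boosting round.

    Assumption: AdaBoost.R2-style update with a linear loss.
    """
    max_error = max(abs_errors)
    if max_error == 0:
        raise ValueError("every example is matched; nothing to reweight")
    # Scale each absolute error into [0, 1].
    losses = [e / max_error for e in abs_errors]
    # Average loss under the current distribution.
    avg_loss = sum(w * l for w, l in zip(weights, losses))
    if not 0.0 < avg_loss < 1.0:
        raise ValueError("average loss must lie strictly between 0 and 1")
    # beta is below 1 when the average loss is below 0.5.
    beta = avg_loss / (1.0 - avg_loss)
    # Well-matched examples (loss near 0) are multiplied by beta;
    # badly matched examples (loss near 1) keep their weight.
    unnormalized = [w * beta ** (1.0 - l) for w, l in zip(weights, losses)]
    z = sum(unnormalized)  # normalization factor
    return [u / z for u in unnormalized], beta


# Five examples: the current function matches three and misses two.
weights = [0.2] * 5
abs_errors = [0.0, 0.0, 0.0, 2.0, 2.0]
new_weights, beta = update_distribution(weights, abs_errors)
```

In this toy case the three matched examples drop to a weight of about 1/6 each and the two missed examples rise to 1/4 each, so the next round concentrates on the branch that was missed.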

Each function gives a value for x

A median weighted by confidence values is computed

Other medians provide similar results
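A median weighted by confidence values can be sketched as follows. This uses one common definition, the smallest value at which the cumulative confidence reaches half of the total, which is assumed here for illustration; the function name `weighted_median` and the sample numbers are hypothetical.

```python
def weighted_median(values, confidences):
    """Smallest value whose cumulative confidence reaches half the total."""
    pairs = sorted(zip(values, confidences))  # sort by value
    half = sum(confidences) / 2.0
    cumulative = 0.0
    for value, confidence in pairs:
        cumulative += confidence
        if cumulative >= half:
            return value
    raise ValueError("values must not be empty")


# Three functions give three values for the same x.
equal_vote = weighted_median([1.0, 2.0, 10.0], [1.0, 1.0, 1.0])
trusted_outlier = weighted_median([1.0, 2.0, 10.0], [1.0, 1.0, 5.0])
```

With equal confidences the result is the ordinary median; when one function carries most of the confidence, its value wins the vote.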

- The principle of boosting is to focus on points that were not matched in the previous round
- With ambiguities, not all points can be matched by a single function
- The weights are used to focus alternately on each of the ambiguous values

[Figure: plot comparing the "Target" points with the "rms" curve]

- We are seeking a fitness function which will focus on extrema rather than average points

- We run GPboost on this ambiguity problem
- We use our fitness function
- We set T=6, the number of rounds

- We obtain 6 functions
- For a given x, we can provide 6 values
- We have to find a way to pick 2 values among the 6
- We propose dendrograms to solve this problem

- For a given x, we have T values
- Cluster the set of values and take the median of each cluster
- To cluster the values, we build a dendrogram
  - Start with T clusters
  - At each step, group the two nearest clusters

S={-1.1; -1; 0; 0.15; 1; 1.05}

- The dendrogram must be cut off at a height corresponding to the number of values we want.
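The dendrogram procedure can be sketched on the example set S. This is a minimal illustration under stated assumptions: the distance between clusters is taken as the gap between their closest members (single linkage), which the slides do not specify, and the function name `cluster_values` is hypothetical. Cutting the dendrogram is expressed here as stopping the merges once the requested number of clusters is reached.

```python
from statistics import median


def cluster_values(values, n_clusters):
    """Agglomerative clustering of 1-D values, stopped at n_clusters clusters.

    Assumption: single linkage (distance between the closest members).
    """
    clusters = [[v] for v in sorted(values)]  # start with one cluster per value
    while len(clusters) > n_clusters:
        # Values are sorted, so the two nearest clusters are always adjacent:
        # find the adjacent pair separated by the smallest gap.
        gaps = [clusters[i + 1][0] - clusters[i][-1]
                for i in range(len(clusters) - 1)]
        i = gaps.index(min(gaps))
        clusters[i:i + 2] = [clusters[i] + clusters[i + 1]]  # merge the pair
    return clusters


S = [-1.1, -1, 0, 0.15, 1, 1.05]
clusters = cluster_values(S, 3)
medians = [median(c) for c in clusters]
```

Asking for 3 clusters groups S into {-1.1, -1}, {0, 0.15} and {1, 1.05}, whose medians are approximately -1.05, 0.075 and 1.025. Asking for 2 clusters merges the middle group with the right-hand one under this linkage.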

- A fixed cut-off value gives better results but needs a priori knowledge of the problem
- Dynamic cut-off value
  - The number of values is computed so as to reduce the error made on the fitness set for each ambiguity

Computing the cut-off value

With dynamic cut-off value

With static cut-off value

- Inverting

- Inverting

- Good results on classical and simple problems

To do

- Improving the cut-off value
- Applying to real problems