
Fuzzy Inference System Learning By Reinforcement



  1. Fuzzy Inference System Learning By Reinforcement Presented by Alp Sardağ

  2. A Comparison of Fuzzy & Classical Controllers • Fuzzy Controller: an expert system based on if-then rules whose premises and conclusions are expressed by means of linguistic terms. • Rules are close to natural language • Encodes a priori knowledge • Classical Controller: needs an analytical model of the task.

  3. Design Problem of FC • Extracting a priori knowledge is not easy: • experts may disagree • a great number of variables may be necessary to solve the control task

  4. Self-Tuning FIS • A direct teacher: based on an input-output set of training data. • A distal teacher: does not give the correct actions, but the desired effect on the process. • A performance measure: evolutionary algorithms (EA). • A critic: gives rewards and punishments with respect to the state reached by the learner. RL methods.

  5. Goal • To overcome the limitations of classical reinforcement learning methods: discrete state perception and discrete actions. NOTE: in this paper a MISO FIS is used.

  6. A MIMO FIS A FIS is made of N rules of the following form: R_i: if S_1 is L_i1 and … and S_n is L_in then Y_1 is O_i1 and … and Y_NO is O_iNO, where R_i is the i-th rule of the rule base, the S_j are the input variables, L_ij is the linguistic term of input variable S_j (with membership function μ_Lij), Y_1 … Y_NO are the output variables, and O_ij is the linguistic term of the j-th output variable in rule R_i.

  7. Rule Preconditions • Membership functions are triangles and trapezoids (although not differentiable): • because they are simple • sufficient in a number of applications • A strong fuzzy partition is used: • every value activates at least one fuzzy set (the input universe is completely covered) and no more than two fuzzy sets are activated for any input value.

  8. Strong Fuzzy Partition Example
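
A minimal sketch of such a partition in code, assuming triangular membership functions over evenly spaced, sorted centres; the function name `triangular_partition` and the five-label example are illustrative, not taken from the slide:

```python
import numpy as np

def triangular_partition(centers):
    """Build a strong fuzzy partition from sorted centres: each input value
    activates at most two adjacent fuzzy sets and the degrees sum to 1."""
    def memberships(x):
        degrees = np.zeros(len(centers))
        if x <= centers[0]:
            degrees[0] = 1.0
        elif x >= centers[-1]:
            degrees[-1] = 1.0
        else:
            j = np.searchsorted(centers, x) - 1       # index of the left neighbour
            span = centers[j + 1] - centers[j]
            degrees[j + 1] = (x - centers[j]) / span  # rising edge of the right set
            degrees[j] = 1.0 - degrees[j + 1]         # falling edge of the left set
        return degrees
    return memberships

# Hypothetical partition of one input variable into 5 linguistic labels
mu = triangular_partition(np.array([-2.0, -1.0, 0.0, 1.0, 2.0]))
print(mu(0.3))  # roughly [0, 0, 0.7, 0.3, 0] -- the degrees sum to 1
```

Because adjacent triangles overlap by construction, any input value activates at least one and at most two fuzzy sets, which is exactly the property the rule preconditions rely on.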

  9. Rule Conclusions • Each rule R_i has N_O corresponding conclusions O_i1 … O_iNO. • For each rule, the truth value with respect to the input vector S is computed with a T-norm over the premise membership degrees, where the T-norm is implemented by a product: α_i(S) = ∏_j μ_Lij(S_j). • The FIS outputs are the truth-value-weighted averages of the rule conclusions: Y_n(S) = Σ_i α_i(S) · O_in / Σ_i α_i(S) (the denominator equals 1 when the partitions are strong and the rule base is complete).
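
A sketch of this inference step, assuming each rule premise is a tuple of label indices (one per input variable) and each input has a partition function returning its vector of membership degrees; the names `rule_truth_values` and `fis_output` are illustrative:

```python
import numpy as np

def rule_truth_values(inputs, partitions, premises):
    """Truth value of each rule: product T-norm over the membership degrees
    of the linguistic terms appearing in the rule premise."""
    alphas = []
    for premise in premises:               # premise = one label index per input
        degrees = [partitions[j](inputs[j])[label]
                   for j, label in enumerate(premise)]
        alphas.append(np.prod(degrees))
    return np.array(alphas)

def fis_output(alphas, conclusions):
    """Weighted average of the rule conclusions (one conclusion per rule and
    output); with strong partitions and a complete rule base the truth values
    already sum to 1, so this reduces to a weighted sum."""
    return np.dot(alphas, conclusions) / np.sum(alphas)
```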

  10. Learning • The number and positions of the input fuzzy labels are set using a priori knowledge. • Structural learning consists in tuning the number of rules. • FACL and FQL are reinforcement learning methods that deal only with the conclusion part.

  11. Reinforcement Learning NOTE: the state is assumed to be fully observable.

  12. Markovian Decision Problem • S: a finite discrete state space • U: a finite discrete action space • R: primary reinforcements, R: S × U → ℝ • P: transition probabilities, P: S × U × S → [0, 1] • State evaluation function:
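
In the usual discounted formulation, with discount factor γ, the evaluation of a state s under a policy π is:

```latex
V^{\pi}(s) \;=\; \mathbb{E}_{\pi}\!\left[\, \sum_{t=0}^{\infty} \gamma^{t}\, r_{t+1} \;\middle|\; s_{0}=s \right],
\qquad 0 \le \gamma < 1
```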

  13. The Curse of Dimensionality • Some form of generalization must be incorporated in the state representation. Various function approximators are used: • CMAC • Neural networks • FIS: the state is encoded by the vector of rule truth values computed for the current input.

  14. Adaptive Heuristic Critic • AHC is made of two components: • Adaptive Critic Element: a critic developed adaptively from primary reinforcements; it represents an evaluation function (the V(S) values) that is more informative than the one given by the environment through rewards and punishments. • Associative Search Element: selects the actions that lead to better critic values.

  15. FACL Scheme

  16. The Critic • At time step t, the critic value is computed with the conclusion vector v: V_t(S_t) = Σ_i α_i(S_t) · v_i. • The TD error is given by: ε_{t+1} = r_{t+1} + γ V_t(S_{t+1}) − V_t(S_t). • TD-learning update rule: v_{t+1} = v_t + β · ε_{t+1} · Φ_t, where Φ_t is the vector of rule truth values and β is the critic learning rate.
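
A minimal sketch of the critic step, assuming the critic value is the truth-value-weighted sum of the conclusion vector v; the names (`alphas`, `beta`) and the numeric defaults are illustrative:

```python
import numpy as np

def critic_value(alphas, v):
    """V(S) = sum_i alpha_i(S) * v_i: the FIS output with conclusion vector v."""
    return np.dot(alphas, v)

def td_step(v, alphas_t, alphas_tp1, reward, gamma=0.95, beta=0.1):
    """One TD(0) step on the critic's conclusion vector."""
    td_error = reward + gamma * critic_value(alphas_tp1, v) - critic_value(alphas_t, v)
    v = v + beta * td_error * alphas_t   # credit each rule by its activation
    return v, td_error
```

Crediting each rule in proportion to its activation is the linear-function-approximation form of TD(0), with the rule truth values playing the role of features.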

  17. The Actor • When a rule R_i is activated, one of its local actions is elected to participate in the global action, based on its quality. • The global action triggered is the truth-value-weighted combination of the elected local actions: U_t(S_t) = Σ_i α_i(S_t) · ε-greedy(q_i), where ε-greedy is a function implementing a mixed exploration-exploitation strategy over the local-action qualities q_i of rule R_i.
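
A sketch of the action election, assuming each rule keeps one quality value per discrete local action and the exploration-exploitation mix is realised as ε-greedy; `q` (one row of qualities per rule) and `actions` are illustrative names:

```python
import numpy as np

def epsilon_greedy(qualities, epsilon=0.1):
    """Elect the best local action most of the time, a random one otherwise."""
    if np.random.random() < epsilon:
        return np.random.randint(len(qualities))
    return int(np.argmax(qualities))

def global_action(alphas, q, actions, epsilon=0.1):
    """Each activated rule elects a local action by its quality; the global
    action is the truth-value-weighted combination of the elected actions."""
    elected_idx = [epsilon_greedy(q[i], epsilon) for i in range(len(alphas))]
    elected = np.array([actions[k] for k in elected_idx])
    return float(np.dot(alphas, elected)), elected_idx
```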

  18. Tuning vector w • The TD error is used as the improvement measure: except at the beginning of learning, the critic is a good approximator of the optimal evaluation function. • The actor learning rule reinforces each rule's elected local action in proportion to the TD error and the rule's activation (a sketch follows).
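
A sketch of one plausible form of this rule, assuming the quality of each rule's elected local action is increased by the TD error weighted by the rule's activation; this is an illustrative update, not necessarily the exact formula of the paper:

```python
import numpy as np

def actor_update(w, elected_idx, alphas_t, td_error, eta=0.05):
    """Reinforce the elected local action of each rule in proportion to the
    TD error and the rule's activation."""
    for i, a in enumerate(elected_idx):
        w[i, a] += eta * td_error * alphas_t[i]
    return w
```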

  19. Meta Learning Rule • Update strategy for the learning rates: • every parameter should have its own learning rate (one per parameter, i = 1…n) • every learning rate should be allowed to vary over time (in order for the V values to converge) • when the derivative of a parameter keeps the same sign for several consecutive time steps, its learning rate should be increased • when the sign of the derivative alternates for several consecutive time steps, its learning rate should be decreased. Delta-Bar-Delta rule:
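
A minimal sketch of Jacobs' Delta-Bar-Delta rule, assuming an additive increase κ when the current derivative agrees in sign with its exponential trace and a multiplicative decrease φ when it does not; the constants κ, φ and θ are illustrative:

```python
import numpy as np

def delta_bar_delta(rates, delta, delta_bar, kappa=0.01, phi=0.5, theta=0.7):
    """Per-parameter learning-rate adaptation.

    rates     : current learning rate of each parameter
    delta     : current derivative for each parameter
    delta_bar : exponential average of past derivatives
    """
    agree = delta * delta_bar
    rates = np.where(agree > 0, rates + kappa,        # sign persists: increase
            np.where(agree < 0, rates * phi, rates))  # sign alternates: decrease
    delta_bar = (1.0 - theta) * delta + theta * delta_bar   # update the trace
    return rates, delta_bar
```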

  20. Execution Procedure • Estimation of the evaluation function corresponding to the current state. • Computation of the TD error. • Tuning of the parameter vectors v and w. • Estimation of the new evaluation function for the current state with the new conclusion vector v_{t+1}. • Learning-rate update with the Delta-Bar-Delta rule. • For each activated rule, election of a local action; computation and triggering of the global action U_{t+1}.

  21. Example

  22. Example Cont. • The number of rules is twenty-five. • For the sake of simplicity, the discrete actions available are the same for all rules. • The discrete action set: • The reinforcement function:

  23. Results • Performance measure for distance: • Results:
