
Fuzzy Inference System Learning By Reinforcement



  1. Fuzzy Inference System Learning By Reinforcement Presented by Alp Sardağ

  2. A Comparison of Fuzzy & Classical Controllers • Fuzzy Controller: an expert system based on if-then rules whose premises and conclusions are expressed by means of linguistic terms. • Rules are close to natural language • Encodes a priori knowledge • Classical Controller: needs an analytical model of the task.

  3. Design Problem of FC • Extracting a priori knowledge is not easy: • experts may disagree • a great number of variables may be necessary to solve the control task

  4. Self-Tuning FIS • A direct teacher: based on an input-output set of training data. • A distal teacher: does not give the correct actions, but the desired effect on the process. • A performance measure: evolutionary algorithms (EA). • A critic: gives rewards and punishments with respect to the state reached by the learner. RL methods.

  5. Goal • To overcome the limitations of classical reinforcement learning methods: discrete state perception and discrete actions. NOTE: in this paper a MISO FIS is used.

  6. A MIMO FIS A FIS is made of N rules of the following form: R_i: if S_1 is L_i1 and … and S_n is L_in then Y_1 is O_i1 and … and Y_NO is O_iNO, where R_i is the i-th rule of the rule base, the S_j are the input variables, L_ij is the linguistic term of input variable S_j (with membership function μ_Lij), Y_1 … Y_NO are the output variables, and O_ij is the linguistic term of the j-th output variable in rule R_i.

  7. Rule Preconditions • Membership functions are triangles and trapezoids (although not differentiable): • because they are simple • sufficient in a number of applications • A strong fuzzy partition is used: • every value activates at least one fuzzy set (the input universe is completely covered) and no more than two fuzzy sets are activated for any input value.

  8. Strong Fuzzy Partition Example
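
A minimal sketch of such a partition in code, assuming triangular membership functions over evenly spaced, sorted centres; the function name `triangular_partition` and the five-label example are illustrative, not taken from the slide:

```python
import numpy as np

def triangular_partition(centers):
    """Build a strong fuzzy partition from sorted centres: each input value
    activates at most two adjacent fuzzy sets and the degrees sum to 1."""
    def memberships(x):
        degrees = np.zeros(len(centers))
        if x <= centers[0]:
            degrees[0] = 1.0
        elif x >= centers[-1]:
            degrees[-1] = 1.0
        else:
            j = np.searchsorted(centers, x) - 1       # index of the left neighbour
            span = centers[j + 1] - centers[j]
            degrees[j + 1] = (x - centers[j]) / span  # rising edge of the right set
            degrees[j] = 1.0 - degrees[j + 1]         # falling edge of the left set
        return degrees
    return memberships

# Hypothetical partition of one input variable into 5 linguistic labels
mu = triangular_partition(np.array([-2.0, -1.0, 0.0, 1.0, 2.0]))
print(mu(0.3))  # roughly [0, 0, 0.7, 0.3, 0] -- the degrees sum to 1
```

Because adjacent triangles overlap by construction, any input value activates at least one and at most two fuzzy sets, which is exactly the property the rule preconditions rely on.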

  9. Rule Conclusions • Each rule R_i has N_O corresponding conclusions O_i1 … O_iNO. • For each rule, the truth value with respect to the input vector S is computed with a T-norm over the premise membership degrees, where the T-norm is implemented by a product: α_i(S) = ∏_j μ_Lij(S_j). • The FIS outputs are the truth-value-weighted averages of the rule conclusions: Y_n(S) = Σ_i α_i(S) · O_in / Σ_i α_i(S) (the denominator equals 1 when the partitions are strong and the rule base is complete).
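
A sketch of this inference step, assuming each rule premise is a tuple of label indices (one per input variable) and each input has a partition function returning its vector of membership degrees; the names `rule_truth_values` and `fis_output` are illustrative:

```python
import numpy as np

def rule_truth_values(inputs, partitions, premises):
    """Truth value of each rule: product T-norm over the membership degrees
    of the linguistic terms appearing in the rule premise."""
    alphas = []
    for premise in premises:               # premise = one label index per input
        degrees = [partitions[j](inputs[j])[label]
                   for j, label in enumerate(premise)]
        alphas.append(np.prod(degrees))
    return np.array(alphas)

def fis_output(alphas, conclusions):
    """Weighted average of the rule conclusions (one conclusion per rule and
    output); with strong partitions and a complete rule base the truth values
    already sum to 1, so this reduces to a weighted sum."""
    return np.dot(alphas, conclusions) / np.sum(alphas)
```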

  10. Learning • The number and positions of the input fuzzy labels are set using a priori knowledge. • Structural learning consists in tuning the number of rules. • FACL and FQL are reinforcement learning methods that deal only with the conclusion part.

  11. Reinforcement Learning NOTE: the state is assumed to be fully observable.

  12. Markovian Decision Problem • S: a finite discrete state space • U: a finite discrete action space • R: primary reinforcements, R: S × U → ℝ • P: transition probabilities, P: S × U × S → [0, 1] • State evaluation function:
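
In the usual discounted formulation, with discount factor γ, the evaluation of a state s under a policy π is:

```latex
V^{\pi}(s) \;=\; \mathbb{E}_{\pi}\!\left[\, \sum_{t=0}^{\infty} \gamma^{t}\, r_{t+1} \;\middle|\; s_{0}=s \right],
\qquad 0 \le \gamma < 1
```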

  13. The Curse of Dimensionality • Some form of generalization must be incorporated in the state representation. Various function approximators are used: • CMAC • Neural networks • FIS: the state is encoded by the vector of rule truth values computed for the current input.

  14. Adaptive Heuristic Critic • AHC is made of two components: • Adaptive Critic Element: a critic developed adaptively from primary reinforcements; it represents an evaluation function (the V(S) values) that is more informative than the one given by the environment through rewards and punishments. • Associative Search Element: selects the actions that lead to better critic values.

  15. FACL Scheme

  16. The Critic • At time step t, the critic value is computed with the conclusion vector v: V_t(S_t) = Σ_i α_i(S_t) · v_i. • The TD error is given by: ε_{t+1} = r_{t+1} + γ V_t(S_{t+1}) − V_t(S_t). • TD-learning update rule: v_{t+1} = v_t + β · ε_{t+1} · Φ_t, where Φ_t is the vector of rule truth values and β is the critic learning rate.
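
A minimal sketch of the critic step, assuming the critic value is the truth-value-weighted sum of the conclusion vector v; the names (`alphas`, `beta`) and the numeric defaults are illustrative:

```python
import numpy as np

def critic_value(alphas, v):
    """V(S) = sum_i alpha_i(S) * v_i: the FIS output with conclusion vector v."""
    return np.dot(alphas, v)

def td_step(v, alphas_t, alphas_tp1, reward, gamma=0.95, beta=0.1):
    """One TD(0) step on the critic's conclusion vector."""
    td_error = reward + gamma * critic_value(alphas_tp1, v) - critic_value(alphas_t, v)
    v = v + beta * td_error * alphas_t   # credit each rule by its activation
    return v, td_error
```

Crediting each rule in proportion to its activation is the linear-function-approximation form of TD(0), with the rule truth values playing the role of features.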

  17. The Actor • When a rule R_i is activated, one of its local actions is elected to participate in the global action, based on its quality. • The global action triggered is the truth-value-weighted combination of the elected local actions: U_t(S_t) = Σ_i α_i(S_t) · ε-greedy(q_i), where ε-greedy is a function implementing a mixed exploration-exploitation strategy over the local-action qualities q_i of rule R_i.
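
A sketch of the action election, assuming each rule keeps one quality value per discrete local action and the exploration-exploitation mix is realised as ε-greedy; `q` (one row of qualities per rule) and `actions` are illustrative names:

```python
import numpy as np

def epsilon_greedy(qualities, epsilon=0.1):
    """Elect the best local action most of the time, a random one otherwise."""
    if np.random.random() < epsilon:
        return np.random.randint(len(qualities))
    return int(np.argmax(qualities))

def global_action(alphas, q, actions, epsilon=0.1):
    """Each activated rule elects a local action by its quality; the global
    action is the truth-value-weighted combination of the elected actions."""
    elected_idx = [epsilon_greedy(q[i], epsilon) for i in range(len(alphas))]
    elected = np.array([actions[k] for k in elected_idx])
    return float(np.dot(alphas, elected)), elected_idx
```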

  18. Tuning vector w • The TD error is used as the improvement measure: except at the beginning of learning, the critic is a good approximator of the optimal evaluation function. • The actor learning rule reinforces each rule's elected local action in proportion to the TD error and the rule's activation (a sketch follows).
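
A sketch of one plausible form of this rule, assuming the quality of each rule's elected local action is increased by the TD error weighted by the rule's activation; this is an illustrative update, not necessarily the exact formula of the paper:

```python
import numpy as np

def actor_update(w, elected_idx, alphas_t, td_error, eta=0.05):
    """Reinforce the elected local action of each rule in proportion to the
    TD error and the rule's activation."""
    for i, a in enumerate(elected_idx):
        w[i, a] += eta * td_error * alphas_t[i]
    return w
```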

  19. Meta Learning Rule • Update strategy for the learning rates: • every parameter should have its own learning rate (one per parameter, i = 1…n) • every learning rate should be allowed to vary over time (in order for the V values to converge) • when the derivative of a parameter keeps the same sign for several consecutive time steps, its learning rate should be increased • when the sign of the derivative alternates for several consecutive time steps, its learning rate should be decreased. Delta-Bar-Delta rule:
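
A minimal sketch of Jacobs' Delta-Bar-Delta rule, assuming an additive increase κ when the current derivative agrees in sign with its exponential trace and a multiplicative decrease φ when it does not; the constants κ, φ and θ are illustrative:

```python
import numpy as np

def delta_bar_delta(rates, delta, delta_bar, kappa=0.01, phi=0.5, theta=0.7):
    """Per-parameter learning-rate adaptation.

    rates     : current learning rate of each parameter
    delta     : current derivative for each parameter
    delta_bar : exponential average of past derivatives
    """
    agree = delta * delta_bar
    rates = np.where(agree > 0, rates + kappa,        # sign persists: increase
            np.where(agree < 0, rates * phi, rates))  # sign alternates: decrease
    delta_bar = (1.0 - theta) * delta + theta * delta_bar   # update the trace
    return rates, delta_bar
```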

  20. Execution Procedure • Estimation of the evaluation function corresponding to the current state. • Computation of the TD error. • Tuning of the parameter vectors v and w. • Estimation of the new evaluation function for the current state with the new conclusion vector v_{t+1}. • Learning-rate update with the Delta-Bar-Delta rule. • For each activated rule, election of a local action; computation and triggering of the global action U_{t+1}.

  21. Example

  22. Example Cont. • The number of rules is twenty-five. • For the sake of simplicity, the discrete actions available are the same for all rules. • The discrete action set: • The reinforcement function:

  23. Results • Performance measure for distance: • Results:
