
Parameter Learning from Stochastic Teachers and Stochastic Compulsive Liars


Presentation Transcript


  1. B. John Oommen, SCS, Carleton University. Plenary Talk: PRIS’04 (Porto). Parameter Learning from Stochastic Teachers and Stochastic Compulsive Liars. Joint work with G. Raghunath and B. Kuipers.

  2. Problem Statement. We are learning from: a Stochastic Teacher or a Stochastic Liar. Do I go Left or Right…? One gate leads to LIFE, the other to DEATH.

  3. Problem Statement. Teacher / Liar: Identity Unknown. One Tells Lies, the other Tells the Truth. There is a Life Gate and a Death Gate. • We may ask only ONE question, • then must Go through the Gate to LIFE.

  4. General Version of this Problem. We have: • A Stochastic Teacher or a Liar • The Answer λ* ∈ [0,1]: it may be Any point in the Interval.

  5. General Version of this Problem. Question: Shall we go Left or Right? Teacher: Go Right with prob. p, Go Left with prob. 1 − p, where p > 0.5. Liar: Does the same with p < 0.5.

  6. Stochastic Teacher. We are on a Line, Searching for a Point. We don't know how far we are from the point. We interact with a Deterministic Teacher, who tells us on which side of the point we are. If the Points are Integers: the Problem can be solved in O(N) steps. Question: What shall we do if the Teacher is Stochastic?

  7. Model of Computation. We are searching for a point λ* ∈ [0,1]. We interact with a Stochastic Teacher. β(n): the Response from the Environment • Move Left / Right • Stochastic, i.e., possibly Erroneous.

  8. Model of Computation. Suppose λ* = 0.8725. If the Current choice for λ is 0.6345, we get a response: Go Right with prob. p; Go Left with prob. 1 − p. Fortunately, the Environment is Informative, i.e., p > 0.5. Question: Can we still learn the Best parameter? Numerous applications: NONLINEAR OPTIMIZATION.
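The stochastic response model on this slide can be sketched in a few lines of Python. This is a hedged illustration: `make_environment` and all its names are ours, not from the talk.

```python
import random

def make_environment(lam_star, p):
    """Hypothetical helper: a stochastic teacher (p > 0.5) or liar
    (p < 0.5) for the unknown point lam_star.  Given the current guess
    lam, it returns +1 ("Go Right") or -1 ("Go Left"); the advice is
    correct with probability p and wrong with probability 1 - p."""
    def respond(lam):
        correct = +1 if lam < lam_star else -1
        return correct if random.random() < p else -correct
    return respond

# With lam* = 0.8725 and an informative teacher (p = 0.8), the response
# at lam = 0.6345 is "Go Right" roughly 80% of the time.
env = make_environment(0.8725, 0.8)
```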

  9. Application to Optimization. Aim in Optimization: Minimize (maximize) a criterion function. Generally speaking: the algorithm works from a "current" solution towards the Optimal (???) solution, based on the information it currently has. Crucial Issue: What is the Parameter the Optimization Algorithm should use?

  10. Application to Optimization. If the parameter is Too Small, the convergence is Sluggish. If it is Too Large: Erroneous Convergence or Oscillations. In many cases the parameter is related to the second derivative, analogous to a "Newton's" method.

  11. Application to Optimization. First, Relax the Assumptions on λ. Normalize if the bounds on the parameter are known: λ = (μ − μmin)/(μmax − μmin). Thus λ* ∈ [0,1]. In the case of Neural Networks the range of the parameter varies from 10^−3 to 10^3: use a monotonic one-to-one mapping λ := A·log_b μ for some b.

  12. Application to Optimization. Since λ Converges Arbitrarily Close to λ*, μ converges Arbitrarily Close to μ*. We can thus Learn the Best Parameter in NNs...

  13. Proposed Solution. Work in a Discretized space: Discretize λ to be an element of a finite set. Make steps from one value of λ to the next, Based on the Response from the Environment.

  14. Advantages of Discretizing in Learning. (i) Practical considerations: a Random Number Generator typically has finite accuracy, so an action probability cannot be an arbitrary real number. (ii) The Probability Changes in Jumps, not continuously: Convergence in "finite" time is possible. (iii) The Proofs of ε-optimality are different: via a Discrete-State Markov Chain.

  15. Advantages of Discretizing in Learning. (iv) Rate of convergence: Faster than continuous schemes; can Increase a probability to unity directly rather than asymptotically. (v) Reduces the time per iteration: Addition is quicker than Multiplication; we don't need floating-point numbers. Generally: Discrete algorithms are superior in terms of both time and space.

  16. The Learning Algorithm. Assume the Current Value for λ is λ(n). Then, in an Internal State (i.e., 0 < λ(n) < 1): If E suggests Increasing λ: λ(n+1) := λ(n) + 1/N. Else {If E suggests Decreasing λ}: λ(n+1) := λ(n) − 1/N.

  17. The Learning Algorithm. At the End States: If λ(n) = 1: If E suggests Increasing λ: λ(n+1) := λ(n). Else {If E suggests Decreasing λ}: λ(n+1) := λ(n) − 1/N. If λ(n) = 0: If E suggests Decreasing λ: λ(n+1) := λ(n). Else {If E suggests Increasing λ}: λ(n+1) := λ(n) + 1/N.
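The internal- and end-state rules amount to a saturating random walk on {0, 1/N, …, 1}. A minimal Python sketch, with illustrative names and parameter values of our choosing:

```python
import random

def learn_lambda(lam_star, p, N, steps, seed=0):
    """Minimal sketch of the discretized scheme (names and defaults are
    ours).  The estimate lambda(n) = i/N takes one 1/N step in the
    direction the environment suggests, saturating at the end states
    0 and 1; the time-average approximates E[lambda(infinity)]."""
    rng = random.Random(seed)
    i = N // 2                                    # start at the mid-point
    total = 0.0
    for _ in range(steps):
        correct = +1 if i / N < lam_star else -1  # true direction
        advice = correct if rng.random() < p else -correct
        if advice > 0 and i < N:
            i += 1                                # lambda(n+1) = lambda(n) + 1/N
        elif advice < 0 and i > 0:
            i -= 1                                # lambda(n+1) = lambda(n) - 1/N
        total += i / N
    return total / steps

est = learn_lambda(lam_star=0.8725, p=0.8, N=100, steps=200_000)
# est lands within a few hundredths of lam* = 0.8725
```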

  18. The Learning Algorithm. Note: the Rules are Deterministic, but the State Transitions are stochastic, because the “Environment” is stochastic. Properties of this Scheme. Theorem I: The learning algorithm is ε-optimal. Sketch of Proof: We can prove that, as N is increased:

  19. The Learning Algorithm. The States of the Markov Chain are the Integers {0, 1, 2, ..., N}; State i refers to the value i/N. Good Advice arrives w.p. p; Wrong advice w.p. q = 1 − p.

  20. The Learning Algorithm. Let Z be the index for which Z/N < λ* < (Z+1)/N. Then, with q = 1 − p, the Transition Matrix T is: • T(i, i+1) = p if 0 ≤ i ≤ Z; = q if Z < i ≤ N−1. • T(i, i−1) = q if 1 ≤ i ≤ Z; = p if Z < i ≤ N. For the self-loops: T(0, 0) = q; T(N, N) = q.

  21. The Markov Matrix. By a lengthy induction it can be proved that the stationary probabilities satisfy: π(i) = e·π(i−1) whenever i ≤ Z; π(i) = π(i−1)/e whenever i > Z+1; and π(Z+1) = π(Z), where e = p/q > 1.

  22. The Learning Algorithm. To Conclude the proof, compute E[λ(∞)] as N → ∞. The stationary probabilities INCREASE / DECREASE GEOMETRICALLY...

  23. The Learning Algorithm. As N → ∞, the Prob. Mass looks like this: an Increasing Geometric series from 0 up to Z, and a Decreasing Geometric series from Z+1 up to N, with Most of the mass near Z, i.e., near λ*. Indeed, E[λ(∞)] is arbitrarily close to λ*.

  24. Example. Partition the interval [0,1] into eight sub-intervals: {0, 1/8, 2/8, …, 7/8, 1}. Suppose λ* is 0.65. (i) All transitions for {0, 1/8, 2/8, 3/8, 4/8, 5/8}: Increased with prob. p; Decreased with prob. q. (ii) All transitions for {6/8, 7/8, 1}: Decreased with prob. p; Increased with prob. q.

  25. Example. In this case, if e := p/q: π0 ≈ K; π1 ≈ K·e; π2 ≈ K·e^2; π3 ≈ K·e^3; π4 ≈ K·e^4; π5 ≈ K·e^5; π6 ≈ K·e^5; π7 ≈ K·e^4; π8 ≈ K·e^3. GEOMETRIC.
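The geometric pattern can be checked numerically by building the stationary distribution directly from the detailed-balance relations of the chain. A sketch; the function name and arguments are ours:

```python
def stationary(N, Z, p):
    """Stationary probabilities pi_0..pi_N of the birth-death chain
    above, built from detailed balance (function name is ours):
    pi_i = e * pi_{i-1} for i <= Z, pi_{Z+1} = pi_Z, and
    pi_i = pi_{i-1} / e for i > Z + 1, where e = p / (1 - p)."""
    e = p / (1.0 - p)
    pi = [1.0]
    for i in range(1, N + 1):
        if i <= Z:
            pi.append(pi[-1] * e)
        elif i == Z + 1:
            pi.append(pi[-1])
        else:
            pi.append(pi[-1] / e)
    s = sum(pi)
    return [x / s for x in pi]

# lam* = 0.65 lies in (5/8, 6/8), so Z = 5: the mass peaks at states 5 and 6,
# and the mean state divided by N falls close to lam*.
pi = stationary(N=8, Z=5, p=0.75)
```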

  26. Experimental Results. Table I: True value of E[λ(∞)] for various p and Various Resolutions N; λ* is 0.9123.

  27. Experimental Results. Figure I: Plot of E[λ(∞)] with N.

  28. Experimental Results. Figure II: Plot of E[λ(∞)] with p.

  29. Continuous Solution: the LRI Scheme. Suppose [p1(n), p2(n), p3(n), p4(n)] = [0.4, 0.3, 0.1, 0.2]. If action 2 is Chosen & Rewarded: p2 is increased; p1, p3, p4 are decreased linearly: [p1(n+1), p2(n+1), p3(n+1), p4(n+1)] = [0.36, 1 − 0.36 − 0.09 − 0.18, 0.09, 0.18] = [0.36, 0.37, 0.09, 0.18]. If 1 is the best action: [p1(∞), p2(∞), p3(∞), p4(∞)] = [1, 0, 0, 0].
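One linear reward-inaction update step, reproducing the worked numbers on the slide. A sketch: the function name is ours, and the factor `theta = 0.9` is inferred from the slide's 0.4 → 0.36 step.

```python
def lri_update(probs, chosen, rewarded, theta=0.9):
    """One step of the linear reward-inaction (LRI) scheme: on reward,
    shrink every other action probability by the factor theta and give
    the chosen action the remainder; on penalty (inaction), change
    nothing.  Names and theta are illustrative, not from the talk."""
    if not rewarded:
        return list(probs)                      # inaction on penalty
    new = [theta * p for p in probs]            # decrease the others linearly
    new[chosen] = 1.0 - sum(new[j] for j in range(len(new)) if j != chosen)
    return new

# The slide's example: action 2 (index 1) chosen and rewarded.
p = lri_update([0.4, 0.3, 0.1, 0.2], chosen=1, rewarded=True)
# p is [0.36, 0.37, 0.09, 0.18], matching the worked example.
```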

  30. Continuous Solution: Systematically Explore the Given Interval. • Use the mid-point as the Initial Guess. • Partition the interval into 3 sub-intervals Δ1, Δ2, Δ3. • Use ε-optimal learning in each Sub-interval, asking of each: Is λ* Here?

  31. Continuous Solution. Is λ* to the Left of, Inside, or to the Right of each sub-interval? • Intelligently Eliminate sub-intervals. • Recursively Search the remaining sub-interval(s) until the width is small enough. • The mid-point of this final interval is the result.

  32. Adaptive Tertiary Search. Δ: the Current interval containing λ*. It is Partitioned into sub-intervals Δj, j = 1, 2, 3; λ* is in exactly one of Δ1, Δ2, Δ3. An ε-optimal Automaton decides the Relative Position of each Δj with respect to λ*. Since points on the real interval are Monotonically Increasing, Δ1 < Δ2 < Δ3.

  33. Adaptive Tertiary Search. The Pruned Interval Δ′ ⊆ Δ is chosen so that: λ* is in Δ′, and Δ′ is one of {Δ1, Δ2, Δ3, Δ1 ∪ Δ2, Δ2 ∪ Δ3}.

  34. Adaptive Tertiary Search. Because of the monotonicity of the intervals, and because of the ε-optimality of the schemes, the above two constraints imply that the Search converges, and converges monotonically.
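The recursion can be sketched as follows, under the idealizing assumption that each sub-interval's automaton delivers the correct "Is λ* here?" verdict (which ε-optimality guarantees only w.p. → 1); here that verdict is simulated by a direct comparison, and all names are ours.

```python
def tertiary_search(lam_star, eps=1e-4):
    """Idealized sketch of the adaptive tertiary search: partition the
    current interval into thirds, keep the third whose (here: perfect)
    verdict is 'lambda* is Inside', and recurse until the width is
    below eps.  The mid-point of the final interval is the answer."""
    lo, hi = 0.0, 1.0                     # Delta = [0, 1)
    while hi - lo > eps:
        w = (hi - lo) / 3
        for j in range(3):                # sub-intervals Delta_1..Delta_3
            a = lo + j * w
            b = hi if j == 2 else lo + (j + 1) * w
            if a <= lam_star < b:         # simulated "Is lambda* Here?" verdict
                lo, hi = a, b             # prune to that sub-interval
                break
    return (lo + hi) / 2                  # mid-point of the final interval
```

In the actual scheme the verdict may also be a union of two adjacent sub-intervals when λ* sits near a cut point; this sketch keeps only the single-interval case.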

  35. Learning Automata for Δj. Each Automaton has Two possible actions: Left Half / Right Half. It Uses the Input from the Teacher in an LRI Scheme, and Converges after One Epoch to: Left End: [1, 0]^T; Right End: [0, 1]^T; Cannot Decide (Inside): e.g., [0.78, 0.22]^T.

  36. j j Lj Rj Lj Rj * * (b) (a) j Lj Rj * (e) Location ofjConvergence of LRI

  37. Results on Convergence. For a Stochastic Teacher & the LRI: If (λ* is Left of Δj) Then Pr(Δj = Left) → 1. If (λ* is Right of Δj) Then Pr(Δj = Right) → 1. If (λ* is Inside Δj) Then Pr(Δj ∈ {Left, Right, Inside}) → 1.

  38. Results on Convergence. Conversely: If (Δj = Left) Then Pr(λ* is Left of Δj) → 1. If (Δj = Right) Then Pr(λ* is Right of Δj) → 1. If (Δj = Inside) Then Pr(λ* is Inside Δj) → 1.

  39. Decision Table to Prune Δj.

  40. Convergence: Stochastic Teachers. In the Previous Decision Table, only 7 out of the 27 combinations are shown; the Rest are Impossible. The Decision Table used to prune is thus Complete, and Pr[λ* is in the Pruned Interval] → 1.

  41. Consequences of Convergence. Consequence: the Dual Problem. If E is an Environment with probability p, then its dual E′ has p′ = 1 − p. The Dual of a Stochastic Teacher (p > 0.5) is a Stochastic Liar (p′ < 0.5).

  42. Decision Table for the Stochastic Liar. How will the Stochastic Liar Teach? • The Left Machine Converges to the Left End. • The Right Machine Converges to the Right End.

  43. Learning from a Stochastic Liar. Start with an Extended Search Interval Δ′ = [−1, 2), whereas Δ = [0, 1); the sub-intervals are [−1, 0), [0, 1], (1, 2]. • After ONE Epoch, the Liar Will Force [−1, 0) or (1, 2] with Prob. → 1. • GOTCHA, RED-HANDED!!!!! • We now Know the Environment is Deceptive.

  44. Learning from a Stochastic Liar. Once we KNOW the Environment is Deceptive: Use the Original interval Δ = [0, 1), and Treat Go Left as Go Right, and Go Right as Go Left!!!!
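The two-phase idea (catch the liar on the extended interval, then invert its advice) can be sketched as follows. Epoch lengths, the resolution N, and all names are illustrative, not from the talk.

```python
import random

def learn_from_unknown(lam_star, p, N=300, epoch=20_000, seed=0):
    """Sketch of the liar-detection idea.  Epoch 1 runs the discretized
    walk on the Extended interval [-1, 2): since lambda* is known to lie
    in [0, 1), an estimate that settles outside [0, 1] exposes a
    deceptive environment (p < 0.5).  Epoch 2 reruns on [0, 1),
    inverting the advice if the environment was caught lying."""
    rng = random.Random(seed)

    def advice(lam):                           # stochastic teacher / liar
        correct = +1 if lam < lam_star else -1
        return correct if rng.random() < p else -correct

    # Epoch 1: walk on [-1, 2), i.e. lam = -1 + i/N for i in 0..3N.
    i = 3 * N // 2                             # mid-point, lam = 0.5
    for _ in range(epoch):
        i = min(3 * N, max(0, i + advice(-1 + i / N)))
    settled = -1 + i / N
    deceptive = settled < 0 or settled > 1     # caught red-handed!

    # Epoch 2: walk on [0, 1), flipping Left/Right advice if deceptive.
    flip = -1 if deceptive else +1
    i, total = N // 2, 0.0
    for _ in range(epoch):
        i = min(N, max(0, i + flip * advice(i / N)))
        total += i / N
    return total / epoch, deceptive
```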

  45. Experimental Results

  46. Convergence of CPL-ATS

  47. Observations on Results. Convergence for p = 0.1 & p = 0.8 is almost identical, even though the former is a highly deceptive environment. Even in the first epoch the error is ONLY 9%; in two more epochs the error is within 1.5%. For p nearer 0.5 the convergence is Sluggish.

  48. Conclusions. We can use a combination of LRI and Pruning. The Scheme is ε-optimal. It can be applied to Stochastic Teachers and Liars; it can detect the nature of the Unknown Environment; and it can learn the parameter correctly w.p. → 1. • THANK YOU • VERY MUCH

  49. How to Simulate the Environment. Our idea is analogous to the RPROP network: Δij(t) = Δij(t−1)·η⁺ if the product of the two successive derivatives is > 0; = Δij(t−1)·η⁻ if the product is < 0; = Δij(t−1) Otherwise; where η⁺ and η⁻ are parameters of the scheme. The Increments are influenced by the sign of two succeeding derivatives.
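A sketch of this RPROP-style step-size rule; the η defaults below are the common RPROP choices, not values from the talk.

```python
def rprop_step_size(delta_prev, grad_prev, grad_now,
                    eta_plus=1.2, eta_minus=0.5):
    """RPROP-style step-size adaptation (eta defaults are the usual
    RPROP choices, not from the talk): grow the step while two
    successive derivatives agree in sign, shrink it when they disagree,
    and leave it unchanged when either derivative is zero."""
    s = grad_prev * grad_now
    if s > 0:
        return delta_prev * eta_plus      # same sign: accelerate
    if s < 0:
        return delta_prev * eta_minus     # sign flip (overshoot): back off
    return delta_prev
```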

  50. Experimental Results. Figure I (repeated): Plot of E[λ(∞)] with N.
