
Learning to Trade via Direct Reinforcement



Presentation Transcript


  1. Learning to Trade via Direct Reinforcement John Moody International Computer Science Institute, Berkeley & J E Moody & Company LLC, Portland Moody@ICSI.Berkeley.Edu John@JEMoody.Com Global Derivatives Trading & Risk Management Paris, May 2008

  2. What is Reinforcement Learning? RL Considers: • A Goal-Directed “Learning” Agent • interacting with an Uncertain Environment • that attempts to maximize Reward / Utility RL is an Active Paradigm: • Agent “Learns” by “Trial & Error” Discovery • Actions result in Reinforcement RL Paradigms: • Value Function Learning (Dynamic Programming) • Direct Reinforcement (Adaptive Control) Global Derivatives Trading & Risk Management – May 2008

  3. I. Why Direct Reinforcement? Direct Reinforcement Learning: • Finds Predictive Structure in Financial Data • Integrates Forecasting w/ Decision Making • Balances Risk vs. Reward • Incorporates Transaction Costs • Discovers Trading Strategies! Global Derivatives Trading & Risk Management – May 2008

  4. Optimizing Trades based on Forecasts • Indirect Approach: • Two sets of parameters • Forecast error is not Utility • Forecaster ignores transaction costs • Information bottleneck Global Derivatives Trading & Risk Management – May 2008

  5. Learning to Trade via Direct Reinforcement • Trader Properties: • One set of parameters • A single utility function • U includes transaction costs • Direct mapping from inputs to actions Global Derivatives Trading & Risk Management – May 2008

  6. Direct RL Trader (USD/GBP): Return_A = 15%, SR_A = 2.3, DDR_A = 3.3 Global Derivatives Trading & Risk Management – May 2008

  7. II. Direct Reinforcement: Algorithms & Illustrations • Algorithms: • Recurrent Reinforcement Learning (RRL) • Stochastic Direct Reinforcement (SDR) • Illustrations: • Sensitivity to Transaction Costs • Risk-Averse Reinforcement Global Derivatives Trading & Risk Management – May 2008

  8. Learning to Trade via Direct Reinforcement DR Trader: • Recurrent policy (Trading signals, Portfolio weights) • Takes action, Receives reward (Trading Return w/ Transaction Costs) • Causal performance function (Generally path-dependent) • Learn policy by varying its parameters GOAL: Maximize performance or marginal performance (see the sketch below) Global Derivatives Trading & Risk Management – May 2008
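A minimal notational sketch of the DR trader described on slide 8, added for clarity. The equations referenced on the slide are not reproduced in this transcript, so the symbols below (θ for the policy parameters, F_t for the action/trading signal, I_t for current information, R_t for the per-period return, U_T for performance) are assumptions in the style of Moody & Saffell [4], not the slide's own notation.

```latex
% Recurrent policy: the action (trading signal / portfolio weights) depends
% on current information I_t and on the previous action F_{t-1}.
F_t = F(\theta;\; F_{t-1},\, I_t)

% Causal, generally path-dependent performance of the reward sequence
% (trading returns net of transaction costs):
U_T = U(R_1, R_2, \ldots, R_T;\; \theta)

% Learn the policy by varying \theta: gradient ascent on performance
% (batch) or on marginal performance (on-line).
\Delta\theta \propto \frac{dU_T(\theta)}{d\theta}
```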

  9. Recurrent Reinforcement Learning (RRL) (Moody & Wu 1997) • Deterministic gradient (batch), with recursion • Stochastic gradient (on-line), with stochastic recursion • Stochastic parameter update (on-line) • Constant learning rate: adaptive learning. Declining learning rate: stochastic approximation. (Equations reconstructed below.) Global Derivatives Trading & Risk Management – May 2008
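The gradient expressions on slide 9 are not reproduced in this transcript. The reconstruction below follows the RRL gradient as given in Moody & Wu [1] and Moody & Saffell [4]; ρ denotes the learning rate.

```latex
% Deterministic (batch) gradient of performance U_T w.r.t. parameters \theta:
\frac{dU_T(\theta)}{d\theta}
  = \sum_{t=1}^{T} \frac{dU_T}{dR_t}
    \left\{ \frac{dR_t}{dF_t}\frac{dF_t}{d\theta}
          + \frac{dR_t}{dF_{t-1}}\frac{dF_{t-1}}{d\theta} \right\}

% Recursion: the policy is recurrent, so total derivatives propagate in time:
\frac{dF_t}{d\theta}
  = \frac{\partial F_t}{\partial \theta}
  + \frac{\partial F_t}{\partial F_{t-1}} \frac{dF_{t-1}}{d\theta}

% Stochastic (on-line) update with learning rate \rho, using the marginal
% performance of the most recent period:
\Delta\theta_t = \rho\, \frac{dU_t}{d\theta_t}
% Constant \rho: adaptive learning.  Declining \rho: stochastic approximation.
```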

  10. Structure of Traders • Single Asset - Price series - Return series • Traders - Discrete position size - Recurrent policy • Observations: Full system State is not known • Simple Trading Returns and Profit (see the sketch below) • Transaction Costs: a proportional cost rate charged on each change of position Global Derivatives Trading & Risk Management – May 2008
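The return and profit formulas on slide 10 are likewise not reproduced here. Below is a small, hypothetical Python sketch of the usual single-asset formulation in the RRL papers: discrete positions F_t, per-period return R_t = μ(F_{t−1} r_t − δ|F_t − F_{t−1}|) with trade size μ and proportional cost rate δ, and additive profit as the sum of the R_t. The function names and default values are illustrative, not from the slides.

```python
import numpy as np

def trading_returns(positions, asset_returns, mu=1.0, delta=0.002):
    """Per-period trading returns net of proportional transaction costs.

    positions     : discrete positions F_t in {-1, 0, +1}
    asset_returns : asset returns r_t for the same periods
    mu            : fixed trade size (shares / contracts)
    delta         : proportional transaction-cost rate
    """
    F = np.asarray(positions, dtype=float)
    r = np.asarray(asset_returns, dtype=float)
    F_prev = np.concatenate(([0.0], F[:-1]))       # flat before trading starts
    # R_t = mu * (F_{t-1} * r_t - delta * |F_t - F_{t-1}|)
    return mu * (F_prev * r - delta * np.abs(F - F_prev))

def cumulative_profit(positions, asset_returns, **kw):
    """Additive profit: the sum of the per-period trading returns."""
    return trading_returns(positions, asset_returns, **kw).sum()
```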

  11. Risk-Averse Reinforcement: Financial Performance Measures Performance Functions: • Path independent (Standard Utility Functions) • Path dependent Performance Ratios: • Sharpe Ratio • Downside Deviation Ratio For Learning: • Per-Period Returns • Marginal Performance, e.g. the Differential Sharpe Ratio (definitions sketched below) Global Derivatives Trading & Risk Management – May 2008
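The ratio definitions on slide 11 are not shown in the transcript; the versions below match the definitions used in Moody & Saffell [4] and are supplied as a reconstruction (R_t are per-period trading returns over T periods).

```latex
% Sharpe Ratio: mean return over its standard deviation.
SR_T = \frac{\operatorname{Average}(R_t)}{\operatorname{StdDev}(R_t)}

% Downside Deviation Ratio: mean return over the downside deviation,
% which penalizes only negative returns.
DDR_T = \frac{\operatorname{Average}(R_t)}{DD_T},
\qquad
DD_T = \sqrt{\frac{1}{T}\sum_{t=1}^{T} \min(R_t, 0)^2}
```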

  12. Long / Short Trader Simulation: Sensitivity to Transaction Costs • Learns from scratch and on-line • Moving average Sharpe Ratio with adaptation rate η = 0.01 Global Derivatives Trading & Risk Management – May 2008

  13. Trader Simulation: Transaction Costs vs. Performance. 100 Runs; Costs = 0.2%, 0.5%, and 1.0%. [Figure panels: Sharpe Ratio; Trading Frequency] Global Derivatives Trading & Risk Management – May 2008

  14. Minimizing Downside Risk: Artificial Price Series w/ Heavy Tails Global Derivatives Trading & Risk Management – May 2008

  15. Comparison of Risk-Averse Traders: Underwater Curves Global Derivatives Trading & Risk Management – May 2008

  16. Comparison of Risk-Averse Traders: Draw-Downs Global Derivatives Trading & Risk Management – May 2008

  17. III. Direct Reinforcement vs. Dynamic Programming • Algorithms: • Value Function Method (Q-Learning) • Direct Reinforcement Learning (RRL) • Illustration: • Asset Allocation: S&P 500 & T-Bills • RRL vs. Q-Learning Global Derivatives Trading & Risk Management – May 2008

  18. RL Paradigms Compared • Value Function Learning - Origins: Dynamic Programming - Learn “optimal” Q-Function, Q: state × action → value - Solve Bellman’s Equation - Action: “Indirect” • Direct Reinforcement - Origins: Adaptive Control - Learn “good” Policy P, P: observations → p(action) - Optimize “Policy Gradient” - Action: “Direct” Global Derivatives Trading & Risk Management – May 2008

  19. S&P-500 / T-Bill Asset Allocation: Maximizing the Differential Sharpe Ratio Global Derivatives Trading & Risk Management – May 2008

  20. S&P-500: Opening Up the Black Box 85 series: Learned relationships are nonstationary over time Global Derivatives Trading & Risk Management – May 2008

  21. Closing Remarks • Direct Reinforcement Learning: • Discovers Trading Opportunities in Markets • Integrates Forecasting w/ Trading • Maximizes Risk-Adjusted Returns • Optimizes Trading w/ Transaction Costs • Direct Reinforcement Offers Advantages Over: • Trading based on Forecasts (Supervised Learning) • Dynamic Programming RL (Value Function Methods) • Illustrations: • Controlled Simulations • FX Currency Trader • Asset Allocation: S&P 500 vs. Cash Moody@ICSI.Berkeley.Edu & John@JEMoody.Com Global Derivatives Trading & Risk Management – May 2008

  22. Selected References: [1] John Moody and Lizhong Wu. Optimization of trading systems and portfolios. Decision Technologies for Financial Engineering, 1997. [2] John Moody, Lizhong Wu, Yuansong Liao, and Matthew Saffell. Performance functions and reinforcement learning for trading systems and portfolios. Journal of Forecasting, 17:441-470, 1998. [3] Jonathan Baxter and Peter L. Bartlett. Direct gradient-based reinforcement learning: Gradient estimation algorithms. 2001. [4] John Moody and Matthew Saffell. Learning to trade via direct reinforcement. IEEE Transactions on Neural Networks, 12(4):875-889, July 2001. [5] Carl Gold. FX Trading via Recurrent Reinforcement Learning. Proceedings of IEEE CIFEr Conference, Hong Kong, 2003. [6] John Moody, Y. Liu, M. Saffell and K.J. Youn. Stochastic Direct Reinforcement: Application to Simple Games with Recurrence. In Artificial Multiagent Learning, Sean Luke et al. eds, AAAI Press, 2004. Global Derivatives Trading & Risk Management – May 2008

  23. Supplemental Slides • Differential Sharpe Ratio • Portfolio Optimization • Stochastic Direct Reinforcement (SDR) Global Derivatives Trading & Risk Management – May 2008

  24. Maximizing the Sharpe Ratio • Sharpe Ratio • Exponential Moving Average Sharpe Ratio, with a characteristic time scale set by the adaptation rate Motivation: • EMA Sharpe ratio emphasizes recent patterns; • can be updated incrementally. (Update equations reconstructed below.) Global Derivatives Trading & Risk Management – May 2008
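The update equations on slide 24 are not shown in the transcript; the reconstruction below follows the exponential-moving-average Sharpe ratio of Moody & Saffell [2, 4], with adaptation rate η (characteristic time scale on the order of 1/η).

```latex
% Exponential moving averages of the first and second moments of returns:
A_t = A_{t-1} + \eta\,(R_t - A_{t-1})
B_t = B_{t-1} + \eta\,(R_t^2 - B_{t-1})

% EMA Sharpe ratio on time scale ~ 1/\eta (up to an annualization constant):
S_t = \frac{A_t}{\sqrt{B_t - A_t^2}}
```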

  25. Differential Sharpe Ratio for Adaptive Optimization • Expand the EMA Sharpe Ratio to first order in the adaptation rate η • Define the Differential Sharpe Ratio as the first-order term (reconstruction below) Global Derivatives Trading & Risk Management – May 2008
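Again a reconstruction of equations not reproduced in the transcript, following the Differential Sharpe Ratio as defined in [2] and [4].

```latex
% First-order expansion of the EMA Sharpe ratio in the adaptation rate \eta:
S_t \approx S_{t-1} + \eta \left.\frac{dS_t}{d\eta}\right|_{\eta=0} + O(\eta^2)

% Differential Sharpe Ratio: the first-order term, with
% \Delta A_t = R_t - A_{t-1} and \Delta B_t = R_t^2 - B_{t-1}:
D_t \equiv \frac{dS_t}{d\eta}
    = \frac{B_{t-1}\,\Delta A_t - \tfrac{1}{2} A_{t-1}\,\Delta B_t}
           {\left(B_{t-1} - A_{t-1}^2\right)^{3/2}}
```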

  26. Learning with the Differential SR • Evaluate the “Marginal Utility” Gradient (see the sketch below) Motivation for DSR: • isolates the contribution of the latest return to the Sharpe ratio (“marginal utility”); • provides interpretability; • adapts to changing market conditions; • facilitates efficient on-line learning (stochastic optimization). Global Derivatives Trading & Risk Management – May 2008
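As a concrete illustration (not from the slides), here is a hypothetical on-line training loop that combines the Differential Sharpe Ratio gradient with the recurrent policy-gradient recursion of slide 9. The simple tanh trader, variable names, and default hyperparameters are assumptions made for this sketch; the systems discussed in the talk use richer inputs and discrete positions.

```python
import numpy as np

def online_rrl_dsr(returns, n_lags=8, rho=0.01, eta=0.01, mu=1.0, delta=0.002):
    """On-line RRL trader trained on the Differential Sharpe Ratio (sketch).

    Trader: F_t = tanh(w . x_t + u * F_{t-1} + b), a position in [-1, 1].
    """
    r = np.asarray(returns, dtype=float)
    w, u, b = np.zeros(n_lags), 0.0, 0.0            # policy parameters theta
    dF_dtheta = np.zeros(n_lags + 2)                # d F_{t-1} / d theta (recursion)
    F_prev, A, B = 0.0, 0.0, 0.0                    # previous position, EMA moments
    positions = []

    for t in range(n_lags, len(r)):
        x = r[t - n_lags:t]                         # recent returns as inputs
        F = np.tanh(w @ x + u * F_prev + b)

        dpos = F - F_prev
        R = mu * (F_prev * r[t] - delta * abs(dpos))    # return net of costs

        # Total derivative of F_t w.r.t. theta (recurrent policy recursion).
        dF_dz = 1.0 - F**2
        dF_dtheta_new = dF_dz * (np.concatenate([x, [F_prev, 1.0]]) + u * dF_dtheta)

        var = B - A**2
        if var > 1e-8:                              # skip the initial warm-up steps
            dD_dR = (B - A * R) / var**1.5          # d(DSR)/dR_t, "marginal utility"
            dR_dtheta = (-mu * delta * np.sign(dpos)) * dF_dtheta_new \
                        + mu * (r[t] + delta * np.sign(dpos)) * dF_dtheta
            step = rho * dD_dR * dR_dtheta          # theta += rho * dD_t/dtheta
            w += step[:n_lags]; u += step[n_lags]; b += step[n_lags + 1]

        A += eta * (R - A); B += eta * (R**2 - B)   # update EMA moments of returns
        dF_dtheta, F_prev = dF_dtheta_new, F
        positions.append(F)

    return np.array(positions)
```

Called on a return series, e.g. online_rrl_dsr(np.diff(np.log(prices))), the sketch returns the sequence of positions taken while the trader adapts on-line.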

  27. Trader Simulation: Transaction Costs vs. Performance. 100 runs; Costs = 0.2%, 0.5%, and 1.0%. [Figure panels: Trading Frequency; Cumulative Profit; Sharpe Ratio] Global Derivatives Trading & Risk Management – May 2008

  28. Portfolio Optimization (3 Securities) Global Derivatives Trading & Risk Management – May 2008

  29. Stochastic Direct Reinforcement: Probabilistic Policies Global Derivatives Trading & Risk Management – May 2008

  30. Learning to Trade • Single Asset - Price series - Return series • Trader - Discrete position size - Recurrent policy • Observations: Full system State is not known • Simple Trading Returns and Profit • Transaction cost rate (proportional) Global Derivatives Trading & Risk Management – May 2008

  31. Why does Reinforcement need Recurrence? Consider a learning agent with a stochastic policy function whose inputs include recent observations o and actions a. Why should past actions (recurrence) be included? Examples: • Games (observations o are the opponent’s actions) • Trading financial markets In general: • Model the opponent’s responses o to previous actions a • Minimize transaction costs, market impact Recurrence enables discovery of better policies that capture an agent’s impact on the world !! Global Derivatives Trading & Risk Management – May 2008

  32. Stochastic Direct Reinforcement (SDR): Maximize Performance • Expected total performance of a sequence of T actions • Maximize performance via direct gradient ascent • Must evaluate the total policy gradient for a policy represented by its parameters Global Derivatives Trading & Risk Management – May 2008

  33. Stochastic Direct Reinforcement (SDR): Maximize Performance The goal of SDR is to maximize the expected total performance of a sequence of T actions via direct gradient ascent. Must evaluate the total policy gradient for a policy represented by its parameters. Notation: the complete history of observations and actions, and partial histories of length (n, m). (Objective reconstructed below.) Global Derivatives Trading & Risk Management – May 2008
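The objective expressions on slides 32 and 33 are not reproduced in the transcript. Below is a hedged reconstruction of the stated goal, using J_T for expected total performance, θ for the policy parameters, ρ for the learning rate, and U for the performance function; the slides' own notation, including the history symbols, is not preserved here.

```latex
% Expected total performance of a sequence of T actions drawn from the
% stochastic policy p_\theta:
J_T(\theta) = \mathbb{E}_{a_1,\ldots,a_T \sim p_\theta}
              \big[\, U(R_1, \ldots, R_T) \,\big]

% Maximize performance via direct gradient ascent on the parameters:
\Delta\theta = \rho\, \frac{dJ_T(\theta)}{d\theta}
```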

  34. Stochastic Direct Reinforcement: First Order Recurrent Policy Gradient • For first order recurrence (m=1), the conditional action probability is given by the policy • The probabilities of current actions depend upon the probabilities of prior actions • The total (recurrent) policy gradient is computed from the partial (naïve) policy gradient (recursion sketched below) Global Derivatives Trading & Risk Management – May 2008
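A reconstruction of the first-order (m = 1) recursion described on slide 34, in the spirit of the SDR formulation in [6]; since the slide's exact symbols are not preserved, p denotes the policy, o_t observations, a_t actions, and θ the parameters.

```latex
% Conditional action probability given by the recurrent policy (m = 1):
p_\theta(a_t \mid o_t, a_{t-1})

% Probabilities of current actions depend on probabilities of prior actions:
\Pr(a_t) = \sum_{a_{t-1}} p_\theta(a_t \mid o_t, a_{t-1})\, \Pr(a_{t-1})

% Total (recurrent) policy gradient, built from the partial ("naive")
% policy gradient \nabla_\theta p_\theta:
\frac{d\Pr(a_t)}{d\theta}
  = \sum_{a_{t-1}} \Big[ \nabla_\theta p_\theta(a_t \mid o_t, a_{t-1})\, \Pr(a_{t-1})
    + p_\theta(a_t \mid o_t, a_{t-1})\, \frac{d\Pr(a_{t-1})}{d\theta} \Big]
```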

  35. SDR Trader Simulation w/ Transaction Costs Global Derivatives Trading & Risk Management – May 2008

  36. Trading Frequency vs. Transaction Costs [Figure: Recurrent SDR vs. Non-Recurrent trader] Global Derivatives Trading & Risk Management – May 2008

  37. Sharpe Ratio vs. Transaction Costs [Figure: Recurrent SDR vs. Non-Recurrent trader] Global Derivatives Trading & Risk Management – May 2008
