
Presenter: Alexander Kaganov Supervisor: Paul Chow

Hardware Acceleration of Monte-Carlo Structural Financial Instrument Pricing Using a Gaussian Copula Model


Presentation Transcript


  1. Hardware Acceleration of Monte-Carlo Structural Financial Instrument Pricing Using a Gaussian Copula Model Presenter: Alexander Kaganov Supervisor: Paul Chow

  2. Increasing Computational Requirements (1/3) In recent years the financial industry has seen: 1. Increasing contract/model complexity • Every year new models are developed • Unavailability of closed-form solution • Necessitate Monte-Carlo pricing

  3. Increasing Computational Requirements (2/3) 2. Increasing portfolio sizes • Increase in simple-to-price instruments • Increase in complex derivative securities • CDO issuance has increased from $157 billion in 2004 to $507 billion in 2007 (>3x)¹ • If N instruments take Y time, 3xN instruments take 3xY time (at least) ¹ SIFMA

  4. Increasing Computational Requirements (3/3) 3. Ever-present need to make real-time decisions • Market trends can change quickly • Instruments traded electronically • "1 ms in Latency is Worth $100 M in Stock Trading Business Value" (AMD Analyst Day, 26 July 2007)

  5. Trends in Financial Monte-Carlo Algorithms 1. Computationally intensive • Time ∝ Accuracy² (the Monte-Carlo error falls as 1/√N in the number of paths N) 2. Highly repetitive • A large portion of the calculation time is spent in a small portion of the code (~90% of the time is spent in ~10% of the code) 3. High degree of coarse- and fine-grain parallelism (figure: a typical MC financial simulation with its coarse-grain and fine-grain levels marked)

  6. Collateralized Debt Obligation (CDO)

  7. CDO Problem: • Banks typically hold portfolios with highly volatile assets. Solution: • Sell the assets to an outside entity (SPV), which combines them into one collateral pool • Repackage the pool as CDO tranches • Sell the tranches to investors as a form of protection in return for premium payments

  8. CDO Structure/Pricing (1/2) • Each tranche has attachment and detachment points • Losses below the attachment point → the tranche is unaffected • Losses above the detachment point → the tranche becomes inactive • Investor premium is paid based on the tranche width minus tranche losses • Tranches: Equity 0%-3%, Mezzanine 3%-6%, Senior 6%-12%, Super Senior 12%-100% (diagram: the borrowers' bonds, loans and credit default swaps (CDS) form the collateral pool; the sponsor bank passes it to the SPV, which issues the CDO tranches to investors)

  9. CDO Structure/Pricing (2/2) Mezzanine Tranche: Attachment (3%), Detachment (6%) • Pool losses below 3%: investor premium payments are based on the full investment • Pool losses of 4%: the tranche loses 1/3 of the principal investment, and the premium is paid based on 2/3 of the original investment • CDO Tranche Value = Premium Leg - Default Leg, where S = tranche thickness, si = premium, di = discount factor, Li = tranche losses at time interval i
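
The tranche arithmetic on slides 8 and 9 can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, and the premium-leg and default-leg sums use a commonly quoted textbook form (premium leg = Σ si·di·(S - Li), default leg = Σ di·(Li - Li-1)) that is an assumption here, since the slide's own formulas did not survive in the transcript.

```python
def tranche_loss(pool_loss, attach, detach):
    """Part of the pool loss absorbed by a tranche (fraction of pool notional)."""
    return min(max(pool_loss - attach, 0.0), detach - attach)


def tranche_value(premiums, discounts, tranche_losses, thickness):
    """Premium leg minus default leg over time intervals i = 0..n-1.

    Assumed form (not taken from the slide):
      premium leg = sum_i s_i * d_i * (S - L_i)
      default leg = sum_i d_i * (L_i - L_{i-1}), with the loss before i = 0 being 0
    """
    premium_leg = 0.0
    default_leg = 0.0
    previous_loss = 0.0
    for s_i, d_i, l_i in zip(premiums, discounts, tranche_losses):
        premium_leg += s_i * d_i * (thickness - l_i)
        default_leg += d_i * (l_i - previous_loss)
        previous_loss = l_i
    return premium_leg - default_leg


# Mezzanine example from the slide: 3% attachment, 6% detachment, 4% pool loss.
mezz_loss = tranche_loss(0.04, 0.03, 0.06)
fraction_lost = mezz_loss / (0.06 - 0.03)  # about 1/3 of the tranche principal
```

With a 4% pool loss the mezzanine tranche absorbs one percentage point of its three-point width, which matches the "1/3 of the principal investment" figure on the slide.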

  10. Li's Gaussian Copula Model • Monte-Carlo (MC) pricing model • For each path: 1. Generate: Yi (One-Factor or Multi-Factor) 2. Compare: Yi < Φ⁻¹[P(τi < t)] 3. Record losses
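
The generate / compare / record loop for one Monte-Carlo path might look like the sketch below. It assumes the standard one-factor form Yi = αi·X + √(1 - αi²)·εi with X and εi standard normal, and a loss on default of notional × (1 - recovery rate); neither formula is legible in the transcript, so both are assumptions, and `simulate_path` and its arguments are hypothetical names.

```python
import math
import random
from statistics import NormalDist


def simulate_path(alphas, default_probs, notionals, recoveries, rng):
    """One Monte-Carlo path of a one-factor Gaussian copula (assumed form).

    default_probs[i][t] is P(tau_i < t) for asset i at time step t and
    must lie strictly between 0 and 1 (inv_cdf rejects 0 and 1).
    Returns the cumulative pool loss at each time step.
    """
    std_normal = NormalDist()
    n_steps = len(default_probs[0])
    losses = [0.0] * n_steps
    x = rng.gauss(0.0, 1.0)  # systemic factor shared by all assets
    for alpha, probs, notional, recovery in zip(
            alphas, default_probs, notionals, recoveries):
        eps = rng.gauss(0.0, 1.0)  # idiosyncratic factor of this asset
        y = alpha * x + math.sqrt(1.0 - alpha * alpha) * eps  # 1. generate
        for t, p in enumerate(probs):
            if y < std_normal.inv_cdf(p):  # 2. compare
                losses[t] += notional * (1.0 - recovery)  # 3. record
    return losses


# Toy pool of three assets over three time steps (made-up numbers).
rng = random.Random(1)
alphas = [0.3, 0.5, 0.7]
default_probs = [[0.05, 0.10, 0.20]] * 3
notionals = [100.0, 200.0, 300.0]
recoveries = [0.4, 0.4, 0.4]
path_losses = simulate_path(alphas, default_probs, notionals, recoveries, rng)
```

Averaging such path losses over many paths, after converting them to tranche losses, gives the Li values used in the pricing formula.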

  11. Implementation

  12. Multi-Core Architecture • Three portions: Distributor, Pricing cores, and Collector • Coarse-grain parallelism: MC paths are divided among the OFGC (one-factor Gaussian copula) cores • All cores have the same input data except for the random generator seeds • Data transfer occurs in parallel with the calculations (double buffering) • Minimal required data transfer rate with an external processor: 11.3 MBytes/sec • 1-lane PCI Express provides 250 MBytes/sec, so the data transfer latency can be hidden
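
As a rough software analogue of the Distributor, the sketch below splits a path budget among pricing cores and gives each core its own seed. The function name and the seeding scheme (base seed plus core index) are placeholders chosen for illustration, not the hardware's actual scheme. The last line simply relates the two transfer rates quoted on the slide.

```python
def split_paths(total_paths, n_cores, base_seed=0):
    """Divide Monte-Carlo paths as evenly as possible among pricing cores.

    Returns one (paths, seed) pair per core. Every core would receive the
    same input data; only the seed differs. The seeding scheme used here
    is a placeholder, not the one used in the hardware.
    """
    base, extra = divmod(total_paths, n_cores)
    return [(base + (1 if core < extra else 0), base_seed + core)
            for core in range(n_cores)]


# Six cores is an arbitrary choice for the example.
work = split_paths(100_000, 6)

# Share of a 1-lane PCI Express link (250 MBytes/sec) taken by the
# minimal required transfer rate of 11.3 MBytes/sec.
link_utilization = 11.3 / 250.0
```

The required rate is under 5% of the link capacity, which is consistent with the slide's point that transfers can overlap with computation.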

  13. Pricing Core Design Phase 1: Generate Yi Phase 2: Compare Yi < Φ⁻¹[P(τi < t)]. Record partial losses Phase 3: Combine the partial sums, L(ti)'s Phase 4: Convert collateral pool losses to tranche losses Phase 5: Accumulate tranche losses

  14. Phase 1 • Four factors accumulated per clock cycle • Accumulation is performed in parallel with Phase 2 execution (figures: one-factor and multi-factor Yi generation)

  15. Pricing Core Design Phase 1: Generate Yi Phase 2: Compare Yi < Φ⁻¹[P(τi < t)]. Record partial losses Phase 3: Combine the partial sums, L(ti)'s Phase 4: Convert collateral pool losses to tranche losses Phase 5: Accumulate tranche losses

  16. Phase 2 • Compare Yi < Φ⁻¹[P(τi < t)]. Record losses • Fine-grain parallelism: parallelize over time • 8 replicas • More replicas → higher speedup (potentially) • However, large portions of the hardware become underutilized • Pipelined adder latency creates multiple partial sums
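
The last bullet, on pipelined adder latency, can be illustrated with a small software model. It assumes a round-robin assignment of operands to partial sums, which is one common way such accumulators behave; the slides do not say how the hardware interleaves them, and the function names and the latency of 4 are hypothetical.

```python
def accumulate_interleaved(values, adder_latency):
    """Model a pipelined accumulator with `adder_latency` stages.

    A new operand can enter every cycle, but each addition only finishes
    `adder_latency` cycles later, so operand k is added to partial sum
    k % adder_latency rather than to a single running total.
    """
    partial_sums = [0.0] * adder_latency
    for k, value in enumerate(values):
        partial_sums[k % adder_latency] += value
    return partial_sums


def combine(partial_sums):
    """Reduce the partial sums to one loss value (the job of Phase 3)."""
    return sum(partial_sums)


# Ten unit losses through an adder with an assumed latency of 4 cycles.
partials = accumulate_interleaved([1.0] * 10, 4)
total = combine(partials)
```

Here the ten operands end up spread over four partial sums that still add up to the full loss, which is why a separate combining phase is needed.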

  17. Pricing Core Design Phase 1: Generate Yi Phase 2: Compare Yi < Φ⁻¹[P(τi < t)]. Record partial losses Phase 3: Combine the partial sums, L(ti)'s Phase 4: Convert collateral pool losses to tranche losses Phase 5: Accumulate tranche losses

  18. Results

  19. One-Factor Utilization* *Utilization for Virtex 5 SX50T chip

  20. Performance: Benchmarks • Credit rating and number of instruments are based on Dow Jones CDX • Notionals obtained from Moody's, ranging from $600,000 to $6.6 billion • α: uniformly distributed in [0, 1] • Recovery rate: normally distributed, N(0.4, 0.15) • # of Time Steps: normally distributed, N(20, 10) • # of Systemic Factors: varied from 1 to 16

  21. Processor vs. FPGA setup • Processor: 3.4 GHz Intel Xeon, 3 GB RAM, C++ program, 100,000 Monte-Carlo paths • FPGA: Virtex 5 SX50T speed grade -3, connected to the host through PCI Express, 100,000 Monte-Carlo paths

  22. Performance: One-Factor Single Core Results (1/3)

  23. Performance: One-Factor Single Core Results (2/3)

  24. Performance: One-Factor Single Core Results (3/3) Single Core Average Acceleration: • Double Precision: 10.8x • Single Precision: 14.0x • Single/Double Hybrid: 13.8x • Integer: 15.8x

  25. Performance: One-Factor Multi-Core Results • Because Monte-Carlo paths are independent, speedup grows linearly as more pricing cores are incorporated.

  26. Performance: Multi-Factor Results (1/3) • Producer: Phase 1, Ratep = 1/SFRF • Consumer: Phase 2, Ratec = 1/TRF • SFRF is the number of cycles before Phase 1 generates a new Yi • TRF is the number of cycles before Phase 2 requires a new Yi

  27. Performance: Multi-Factor Results (2/3) • Two cases: Ratec ≤ Ratep and Ratec > Ratep
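
The producer/consumer relationship between Phase 1 and Phase 2 can be captured in a toy model. It assumes ideal buffering between the two phases and no other stalls, so it describes only the steady state; the function names are hypothetical and the cycle counts in the example are made up.

```python
def cycles_per_y(sfrf, trf):
    """Steady-state cycles spent per Yi when Phase 1 feeds Phase 2.

    Phase 1 produces a Yi every `sfrf` cycles (Ratep = 1/SFRF) and
    Phase 2 needs one every `trf` cycles (Ratec = 1/TRF); the slower
    side sets the pace.
    """
    return max(sfrf, trf)


def phase2_stalls(sfrf, trf):
    """True when Ratec > Ratep, i.e. Phase 2 has to wait for Phase 1."""
    return trf < sfrf


# Ratec <= Ratep: Phase 1 keeps up, so Phase 2 runs at its own pace.
balanced = cycles_per_y(sfrf=4, trf=8)
# Ratec > Ratep: Phase 1 is the bottleneck and Phase 2 idles.
starved = cycles_per_y(sfrf=8, trf=4)
```

In the second case adding systemic factors raises SFRF and slows the whole core, while in the first case the extra factors are hidden behind Phase 2.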

  28. Performance: Multi-Factor Results (3/3)

  29. Summary • Presented a hardware architecture for pricing Collateralized Debt Obligations using Li's models • Demonstrated the flexibility of an FPGA in exploiting both coarse- and fine-grain parallelism • Established that either a single/double hybrid or an "integer" representation could be used to balance resource utilization and accuracy • Achieved a maximal acceleration of over 64-fold and 71-fold for the one-factor and multi-factor models, respectively

  30. Future Work 1. Expand to a multi-FPGA system 2. Attempt the algorithm on different accelerator architectures • GPU 3. Incorporate into a financial software suite

  31. Thank You (Questions?)
