
Emulated Digital CNN-UM Implementation of a 3-dimensional Ocean Model on FPGAs



Presentation Transcript


  1. Emulated Digital CNN-UM Implementation of a 3-dimensional Ocean Model on FPGAs
     Zoltán Nagy, Péter Szolgay

  2. Introduction
     • Cellular Neural/Nonlinear Networks Universal Machine (CNN-UM)
     • Ocean modeling
     • Results
     • Conclusions

  3. Cellular Neural/Nonlinear Networks (CNN)
     • 2- or N-dimensional grid
     • Locally connected
     • Analog processing elements
     • State value is continuous in time

  4. Structure of a CNN cell
     • u_ij: input
     • x_ij: state
     • y_ij: output
     • z_ij: constant bias
     • A_ij,kl: feedback template
     • B_ij,kl: feed-forward template
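The cell equation on this slide did not survive in the transcript. As a hedged illustration, the sketch below assumes the standard Chua-Yang cell model, dx_ij/dt = -x_ij + Σ A_ij,kl y_kl + Σ B_ij,kl u_kl + z, with the piecewise-linear output y = 0.5(|x + 1| - |x - 1|). It also assumes 3x3 templates, a space-invariant bias, zero values outside the grid, and forward-Euler time stepping. The function names are illustrative and are not taken from the presentation.

```python
def output(x):
    """Piecewise-linear CNN output: saturates the state to [-1, 1]."""
    return 0.5 * (abs(x + 1.0) - abs(x - 1.0))


def template_sum(grid, template, i, j):
    """Apply a 3x3 template around cell (i, j); cells outside the grid count as 0."""
    rows, cols = len(grid), len(grid[0])
    total = 0.0
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            k, l = i + di, j + dj
            if 0 <= k < rows and 0 <= l < cols:
                total += template[di + 1][dj + 1] * grid[k][l]
    return total


def euler_step(x, u, A, B, z, dt):
    """One forward-Euler update of every cell state (assumed discretization)."""
    y = [[output(v) for v in row] for row in x]  # outputs from current states
    new_x = []
    for i in range(len(x)):
        new_row = []
        for j in range(len(x[0])):
            dx = (-x[i][j]
                  + template_sum(y, A, i, j)   # feedback term
                  + template_sum(u, B, i, j)   # feed-forward term
                  + z)                         # constant bias
            new_row.append(x[i][j] + dt * dx)
        new_x.append(new_row)
    return new_x
```

An emulated digital implementation evaluates essentially this kind of update in hardware, one cell at a time, which is why the template size determines the amount of arithmetic per cell.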

  5. CNN-UM implementations
     • Software simulation
        • Easy to implement
        • Slow, even when using processor-specific instructions
     • Emulated digital VLSI
        • Specialized digital architecture
        • Selectable computing precision (Castle architecture: 1, 6, 12 bit)
        • Orders of magnitude faster than software simulation
        • Long design time
     • Analog VLSI
        • Huge computing power (~TeraOP/s)
        • Low accuracy (7-8 bit)
        • Sensitive to noise and temperature

  6. Structure of the Falcon emulated digital CNN-UM
     • Mixer
        • Contains cell values for the next updates
     • Memory unit
        • Contains a belt of the cell array
     • Template memory
     • Arithmetic unit
     • Processors can be connected on a grid
        • Linear speedup

  7. Structure of the arithmetic unit
     • Cells are updated in row-wise order
     • Cycle time depends on template size
     • Fully pipelined

  8. Configurable parameters
     • State, template and constant width from 2 to 64 bits
     • Number of templates
     • Size of the templates
     • Width of the cell array slice
     • Number of layers
     • Number and arrangement of the processor cores

  9. Example: Solution of a simple PDE on CNN
     • The wave equation
     • Spatial discretization
     • 2-layer CNN
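The equations on this slide were not captured in the transcript. One common way to map the wave equation onto two layers, shown here purely as an assumption, is to rewrite u_tt = c²∇²u as the first-order pair du/dt = v, dv/dt = c²∇²u and to replace ∇² with the 5-point Laplacian template [[0, 1, 0], [1, -4, 1], [0, 1, 0]] / h². The sketch uses a semi-implicit (symplectic) Euler step and holds the boundary cells fixed. Both are illustrative choices, not necessarily those of the authors.

```python
def laplacian(u, i, j, h):
    """5-point discrete Laplacian at interior cell (i, j) with grid spacing h."""
    return (u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1]
            - 4.0 * u[i][j]) / (h * h)


def wave_step(u, v, c, h, dt):
    """Advance the two layers (displacement u, velocity v) by one time step.

    The velocity layer is updated from the old displacement, then the
    displacement layer is updated from the new velocity. Boundary cells
    are left unchanged (assumed fixed boundary).
    """
    rows, cols = len(u), len(u[0])
    new_u = [row[:] for row in u]
    new_v = [row[:] for row in v]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            new_v[i][j] = v[i][j] + dt * c * c * laplacian(u, i, j, h)
            new_u[i][j] = u[i][j] + dt * new_v[i][j]
    return new_u, new_v
```

In CNN terms, the Laplacian stencil plays the role of an inter-layer template from the u layer to the v layer, and the v layer feeds the u layer through a single-element template.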

  10. Ocean models
     • Barotropic model
     • Baroclinic models
        • z-coordinate model
        • σ-coordinate model
        • Isopycnal model
     • Fine resolution models
        • Real-time forecast
        • Fishing industry
        • Search and rescue
     • Coarse resolution models
        • Long-term predictions
        • Climate modeling

  11. The Princeton Ocean Model (POM)
     • Sigma coordinate model
        • Vertical coordinate is scaled on the water column depth
     • Second moment turbulence closure sub-model
        • Provides vertical mixing coefficients
     • Solution technique: mode splitting
        • Internal mode (3D)
           • Vertical structure equations
           • Implicit solution
        • External mode (2D)
           • Vertically integrated equations
           • Explicit solution (leapfrog method)
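The leapfrog method named for the external mode is a three-level explicit scheme: y(n+1) = y(n-1) + 2·Δt·f(y(n)). The sketch below shows the scheme on a generic system of ODEs and is not POM code. The function name `leapfrog`, the list-based state, and the forward-Euler start-up step are assumptions made for illustration. Ocean codes commonly add a time filter to damp the scheme's computational mode, which is omitted here.

```python
def leapfrog(f, y0, dt, steps):
    """Integrate dy/dt = f(y) for `steps` time steps with the leapfrog scheme.

    f     : function mapping a state (list of floats) to its time derivative
    y0    : initial state
    dt    : time step
    steps : total number of steps (the first one is a forward-Euler start)
    """
    prev = list(y0)
    if steps == 0:
        return prev
    # Leapfrog needs two time levels, so bootstrap with one Euler step.
    curr = [p + dt * d for p, d in zip(prev, f(prev))]
    for _ in range(steps - 1):
        deriv = f(curr)
        # Centered difference in time: jump from level n-1 to n+1 over level n.
        nxt = [p + 2.0 * dt * d for p, d in zip(prev, deriv)]
        prev, curr = curr, nxt
    return curr


# Usage: harmonic oscillator x' = v, v' = -x, starting from x = 1, v = 0.
final_state = leapfrog(lambda y: [y[1], -y[0]], [1.0, 0.0], 0.01, 1000)
```

Each step needs only one evaluation of the right-hand side and no linear solve, which is what makes an explicit scheme of this kind a natural fit for a pipelined arithmetic unit.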

  12. Governing equations of the external (2D) mode
     • u_x, u_y: mass transport
     • η: free surface elevation
     • Ω: angular rotation of the Earth
     • Θ: latitude
     • H: depth of the ocean
     • g: gravitational acceleration
     • τ_w, τ_b: wind and bottom stress
     • A: lateral viscosity

  13. Solution on CNN
     • Spatial discretization on a uniform grid
     • 3-layer CNN structure
     • Non-linear template required for the advection term
     • Cannot be solved on analog VLSI CNN chips
     • Solvable on the modified Falcon architecture
        • Support of non-linearity
        • Specialized cell model

  14. The modified arithmetic unit of the Falcon architecture

  15. Implementation on FPGA
     • Complicated arithmetic unit
     • Fixed-point number representation
     • Configurable precision
     • High-level hardware description language required (e.g. Handel-C)
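To make the fixed-point, configurable-precision idea concrete, here is a minimal software sketch of signed fixed-point arithmetic with a selectable word width. It is not the authors' Handel-C design. The helper names, the saturation on overflow, and the truncating multiply are assumptions chosen for illustration. The 24-bit width in the usage lines matches the precision quoted in the conclusions, while the 16 fractional bits are an arbitrary choice.

```python
def saturate(raw, total_bits):
    """Clamp an integer to the range of a signed word of `total_bits` bits."""
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    return max(lo, min(hi, raw))


def to_fixed(value, frac_bits, total_bits):
    """Quantize a real number to a signed fixed-point integer."""
    return saturate(int(round(value * (1 << frac_bits))), total_bits)


def to_float(raw, frac_bits):
    """Convert a fixed-point integer back to a real number."""
    return raw / (1 << frac_bits)


def fixed_mul(a, b, frac_bits, total_bits):
    """Multiply two fixed-point numbers.

    The full-width product carries 2 * frac_bits fractional bits, so it is
    shifted right by frac_bits (rounding toward negative infinity) and then
    saturated back to the word width.
    """
    return saturate((a * b) >> frac_bits, total_bits)


# Usage: 24-bit words with 16 fractional bits (hypothetical split).
a = to_fixed(0.75, 16, 24)
b = to_fixed(-0.5, 16, 24)
product = to_float(fixed_mul(a, b, 16, 24), 16)
```

Changing `total_bits` and `frac_bits` shows how the representable range and the rounding error trade off, which is the decision a configurable-precision architecture exposes to the designer.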

  16. Performance

  17. The Seamount problem

  18. Results after 72 hours
     • Circulation pattern
     • Elevation

  19. Error of the solution

  20. Error of the solution

  21. Memory requirements of the internal (3D) equations
     • Extended memory hierarchy
        • New level stores 3 cross-sectional slices from the 3D array
     • Large memory required (e.g. 512x512x64 grid: 3x512x64 elements per state variable)
        • Cannot be stored on-chip
        • Off-chip storage requires huge I/O bandwidth
     • Processor array should be used
        • The 3D array is divided between the processors
        • Optimal data set for on-chip storage: 2048 elements per cross-sectional slice (512x32x64 grid per processor)
        • Each processor located on a separate FPGA
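The figures on this slide can be checked with a few lines of arithmetic. The sketch below assumes that a cross-sectional slice spans one horizontal dimension and the full depth, and that the grid is split between processors along that horizontal dimension only. The 24-bit word width is borrowed from the conclusions and may not be the width used for the 3D mode. The function names are illustrative.

```python
def slice_elements(width, depth):
    """Number of cells in one cross-sectional slice (width x depth)."""
    return width * depth


def buffer_elements(width, depth, slices=3):
    """Cells held per state variable when `slices` slices are buffered."""
    return slices * slice_elements(width, depth)


# Full 512x512x64 grid: each buffered slice is 512 x 64 cells.
full_buffer = buffer_elements(512, 64)         # 3 x 512 x 64 = 98304 elements
full_buffer_bits = full_buffer * 24            # assuming 24-bit words

# Per-processor 512x32x64 sub-grid: each slice is 32 x 64 cells.
per_processor_slice = slice_elements(32, 64)   # 2048 elements
processors_needed = 512 // 32                  # 16 under the assumed split
```

Under these assumptions the full-grid buffer is 98304 elements per state variable, against 3 x 2048 = 6144 per state variable once the grid is divided into sixteen 32-cell-wide strips.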

  22. Solution of the internal (3D) equations
     • Implicit solution
        • Fixed-point solution
           • Requires large precision to avoid rounding errors
           • Seems to be impractical
        • Floating-point solution
           • Requires large area (especially add/sub)
     • Explicit solution
        • Smaller timestep
        • Simpler arithmetic unit
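The slide does not say which implicit method is meant. Implicit treatment of vertical structure equations typically leads to one tridiagonal linear system per water column, and the sketch below assumes that case and solves it with the Thomas algorithm. It is included only to show why an implicit solver is demanding in hardware: each unknown requires divisions, and rounding errors propagate through both sweeps. The function name and argument layout are hypothetical.

```python
def thomas(a, b, c, d):
    """Solve a tridiagonal system with the Thomas algorithm.

    a : sub-diagonal   (a[0] is unused)
    b : main diagonal
    c : super-diagonal (c[-1] is unused)
    d : right-hand side
    Assumes the system is well conditioned (e.g. diagonally dominant),
    since no pivoting is performed.
    """
    n = len(d)
    cp = [0.0] * n
    dp = [0.0] * n
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    # Forward sweep: eliminate the sub-diagonal.
    for i in range(1, n):
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom
    # Back substitution.
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


# Usage: a small 3x3 system whose exact solution is [1, 1, 1].
solution = thomas([0.0, 1.0, 1.0], [2.0, 2.0, 2.0], [1.0, 1.0, 0.0], [3.0, 4.0, 3.0])
```

An explicit update, by contrast, needs only multiply-accumulate operations per cell, at the cost of the smaller timestep noted on the slide.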

  23. Conclusions
     • Ocean modeling using emulated digital CNN is very promising
     • Moderate precision is required in 2D mode
        • 1% accuracy using 24 bits
     • Expected speedup (compared to a 2 GHz Athlon64 microprocessor)
        • 80 times on our RC200 prototyping board
        • 3700 times on the largest available FPGA
