
Mapping of scalable RDMA protocols to ASIC/FPGA platforms



Presentation Transcript


  1. Mapping of scalable RDMA protocols to ASIC/FPGA platforms. Yosef Gavriel Tirat-Gefen, PhD, Senior Member IEEE, Chief Scientist, Castel Systems Inc. & Dept. of Physics and Astronomy, George Mason University, Fairfax, VA. yosefgavriel@computer.org

  2. Presentation Overview • Motivation • TCP off-loading • Zero-copying • RDMA protocol • RDMA protocol stack • Structure of an RDMA card • Results • Conclusion

  3. Motivation: enabling high-bandwidth WAN applications [Diagram. Labels: two "Supercomputer or Server farm" sites, two "Terabyte storage" units, and a workstation, joined by a WAN.]

  4. Applications • Distributed command and control • Signal processing (e.g. RADAR) • Real-time sharing of intelligence data • Distributed large-scale computation and simulation of aerospace problems • Extension of storage area networks over a wide area network (WAN) • Enabling technology for modern supercomputing installations

  5. Traditional TCP/IP Networking [Diagram. Labels: two end hosts, each with the stack Application/O.S., TCP, Layer 3 (IP), Layer 2 (MAC), Layer 1 (PHY); a router with Layer 3, Layer 2, and Layer 1 on each side.]

  6. Standard Data Flow on TCP/IP [Diagram. Labels: Application A memory space and Application B memory space, each with a TCP buffer/stack memory space and layers L3, L2, L1, connected over a WAN/LAN.]

  7. Standard Data Flow on TCP/IP • Traditional TCP/IP copies data from the application to a TCP memory buffer • This buffer copying wastes CPU cycles • The CPU is overwhelmed at rates above 2.5 Gbps • TCP/IP off-loading helps, but it does not solve the problem on the receiver side
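
A back-of-the-envelope sketch of why buffer copying hurts as the line rate grows. It assumes a simple model that is not taken from the slides: the NIC writes each received byte to memory once, and every additional CPU copy reads and writes that byte once more. The function name and the model itself are illustrative only.

```python
def memory_traffic_gbps(line_rate_gbps: float, cpu_copies: int) -> float:
    """Estimate memory traffic generated by a receive stream.

    Assumed model (hypothetical, for illustration):
      - 1 pass for the NIC writing the data into memory
      - 2 passes (one read, one write) for each CPU copy
    """
    if cpu_copies < 0:
        raise ValueError("cpu_copies must be non-negative")
    passes = 1 + 2 * cpu_copies
    return line_rate_gbps * passes


# One copy from the TCP buffer to application memory at 2.5 Gbps
with_copy = memory_traffic_gbps(2.5, cpu_copies=1)
# Direct placement into application memory, no CPU copy
without_copy = memory_traffic_gbps(2.5, cpu_copies=0)
print(with_copy, without_copy)
```

Under this model a single receive-side copy triples the memory traffic relative to the line rate, which is the cost that off-loading the protocol processing alone does not remove.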

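  8. TCP/IP off-load processing [Diagram. Labels: a conventional stack (Application/O.S., TCP, Layer 3 (IP), Layer 2 (MAC), Layer 1 (PHY)) beside an off-loaded stack (Application/O.S., TCP/IP offload processor (TOE)), with the off-loaded part marked "Mapped to hardware".]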

  9. Zero-copying and TCP off-load processing [Diagram. Labels: Host CPU, Host CPU cache memory, Host main memory, Receive buffer, TOE/NIC card with TCP off-load processor and network buffer, WAN/LAN.]

  10. Zero-copying and TCP off-load processing • Zero-copy is still not achieved, because the receive buffer is still copied back to the application memory space • TCP/IP off-loading is not scalable • RDMA protocols provide a solution
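
A minimal sketch of the direct-placement idea behind RDMA, assuming a tagged-buffer model in which each incoming segment names a registered application buffer and an offset within it. The class and method names are hypothetical, and the real protocol work (registration with the NIC, validation, DMA) is reduced to a slice assignment.

```python
class RegisteredBuffers:
    """Toy table of application buffers registered for direct placement."""

    def __init__(self) -> None:
        self._buffers: dict[int, bytearray] = {}

    def register(self, tag: int, size: int) -> bytearray:
        """Register an application buffer under a tag and return it."""
        buf = bytearray(size)
        self._buffers[tag] = buf
        return buf

    def place(self, tag: int, offset: int, payload: bytes) -> None:
        """Write a segment straight into the tagged application buffer.

        No intermediate receive buffer is involved, and segments may
        arrive in any order because each one carries its own offset.
        """
        buf = self._buffers[tag]
        end = offset + len(payload)
        if offset < 0 or end > len(buf):
            raise ValueError("segment falls outside the registered buffer")
        memoryview(buf)[offset:end] = payload


table = RegisteredBuffers()
app_buffer = table.register(tag=7, size=10)
table.place(7, 5, b"WORLD")   # second half arrives first
table.place(7, 0, b"HELLO")   # first half arrives later
print(bytes(app_buffer))
```

Because every segment is self-describing, the receiver never has to stage data in a separate buffer and copy it back to the application afterwards.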

  11. RDMA data flow for WAN applications [Diagram. Labels: Host CPU A and Host CPU B, each with host memory containing an application memory space and an RDMA NIC card; the two RDMA NIC cards are connected over the WAN.]

  12. Scalable WAN-RDMA for bandwidths above 10 Gbps [Diagram of the RDMA NIC card for WAN. Labels: Host, DMA channel, RDMA engine, Tx buffer, Rx buffer, MAC, PHY, 10 Gbps links, > 10 Gbps WAN.]

  13. The RDMA protocol layers and our prototype [Diagram. Protocol stack, top to bottom as listed: ULP (e.g. iSCSI, NFS); RDMA; DDP; MPA; SCTP; TCP; Layer 3 (e.g. IP); Layer 2 (MAC); Layer 1 (PHY). Annotations: "Running on Host CPU", "FPGA implementation", "FPGA and off-the-shelf MAC/PHY chips".]

  14. Overall Hardware/Firmware Organization of the WAN RDMA card [Diagram. Labels: PCI-Express/Hyper-transport interface; IP/firmware module; RDMA protocol engine; Rx memory controller; Tx memory controller; SCTP protocol engine; Rx memory bank (listed twice); Layer 3 (IP) processor; data stream split/join unit; four SAR blocks; four 10GE/OC-192 framers; four PHYs.]
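
The slide names a data stream split/join unit feeding four framer/PHY lanes but does not say how the unit distributes data. The sketch below assumes the simplest possible policy, round-robin striping of fixed-size chunks, purely to illustrate how one stream can be spread over several 10 Gbps links and reassembled in order. The function names and the chunk size are hypothetical.

```python
def split_stream(data: bytes, lanes: int, chunk: int) -> list[list[bytes]]:
    """Stripe a byte stream across `lanes` queues in round-robin order."""
    if lanes < 1 or chunk < 1:
        raise ValueError("lanes and chunk must be positive")
    queues: list[list[bytes]] = [[] for _ in range(lanes)]
    for index, start in enumerate(range(0, len(data), chunk)):
        queues[index % lanes].append(data[start:start + chunk])
    return queues


def join_stream(queues: list[list[bytes]]) -> bytes:
    """Reassemble the stream by visiting the lanes in the same order."""
    out = bytearray()
    rounds = max((len(q) for q in queues), default=0)
    for round_index in range(rounds):
        for queue in queues:
            if round_index < len(queue):
                out += queue[round_index]
    return bytes(out)


stream = bytes(range(10))
lanes = split_stream(stream, lanes=4, chunk=2)
assert join_stream(lanes) == stream
```

A real unit would also have to cope with lanes that deliver at different times, which this sketch ignores by treating each lane as an ordered queue.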

  15. Present Results • Our cores currently target Xilinx Virtex-II and Virtex-II Pro devices • Data indicate that most of the key cores will fit in one FPGA device (Virtex-II) • The aggregate of all cores spans several FPGAs • Communication between devices is an issue, so the PCB design needs care • We are currently trying to accommodate most of the cores in one FPGA • Most of the cores will be made available free of charge to researchers in non-profit or government organizations

  16. Conclusion • The advent of Hyper-transport/PCI-Express and VITA (embedded computing) standards will enable local I/O bandwidths above 10 Gbps • Extending the RDMA protocol enables large bandwidths over wide area networks • The proposed cores will meet the natural growth of bandwidth requirements in commercial, defense, and aerospace applications
