
A High-Speed Inter-Process Communication Architecture for FPGA-based Hardware Acceleration of Molecular Dynamics

Presented by: Chris Comis. September 23, 2005. Supervisor: Professor Paul Chow.





Presentation Transcript


  1. A High-Speed Inter-Process Communication Architecture for FPGA-based Hardware Acceleration of Molecular Dynamics • Presented by: Chris Comis • September 23, 2005 • Supervisor: Professor Paul Chow

  2. Outline • Motivation • System-Level Overview • Protocol Development • Results • Integration into a Programming Model • Conclusions/Questions

  3. What is Molecular Dynamics? • A method of calculating the time-evolution of molecular configurations • Useful in the analysis of protein folding • Many applications in rational drug design

  4. MD is Computationally Challenging • Forces (i.e. F=ma) are calculated between an atom and all other atoms in the system • An O(n²) problem across 10,000+ atoms • Force calculations are performed at femtosecond timesteps • Interesting results may take several μs of simulation (10⁹+ timesteps required) • MD simulations are typically run on supercomputers
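The scale on this slide can be checked with a short sketch. It assumes each atom pair is evaluated once (n(n-1)/2 unique pairs, which the slide does not state) and a 1 fs timestep; the function names are illustrative and are not taken from the presentation.

```c
#include <stdint.h>

/* Number of unique atom pairs in an all-pairs force evaluation: n(n-1)/2. */
uint64_t pair_count(uint64_t n) {
    return n * (n - 1) / 2;
}

/* Same count, obtained by walking the double loop a naive force kernel would use. */
uint64_t pair_count_by_loop(uint64_t n) {
    uint64_t pairs = 0;
    for (uint64_t i = 0; i < n; i++) {
        for (uint64_t j = i + 1; j < n; j++) {
            pairs++;  /* one force evaluation between atom i and atom j */
        }
    }
    return pairs;
}

/* Timesteps needed to cover sim_time_fs femtoseconds at step_fs per step. */
uint64_t timesteps(uint64_t sim_time_fs, uint64_t step_fs) {
    return sim_time_fs / step_fs;
}
```

For 10,000 atoms, `pair_count` gives 49,995,000 pair evaluations per timestep. One microsecond of simulated time at a 1 fs step is 10⁹ timesteps, which matches the order of magnitude quoted on the slide.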

  5. An FPGA-based MD Accelerator • An ongoing collaborative project involves the development of an FPGA-based MD Accelerator • Advantages to an FPGA-based approach: • Massive parallel computation • Forces can be parallelized • Force computations can be accelerated ~88x • High-speed Serial I/O (SERDES) may be leveraged

  6. Area of Focus • Develop communication protocol using high-speed SERDES links • Requirements: • Reliability • Light-weight • Minimal trip-time for small packets • Must be abstracted at the hardware and software levels

  7. Outline • Motivation • System-Level Overview • Protocol Development • Results • Integration into a Programming Model • Conclusions/Questions

  8. A Partial MD Simulator • Computation blocks can be hardware or software executed on MicroBlaze soft processors • Software must be written using a programming model • Blocks → computation • Arrows → communication

  9. System-Level Overview • The MD simulator is simplified to a Producer/Consumer model

  10. System-Level Overview • The MD simulator is simplified to a Producer/Consumer model • The model is then adapted for SERDES development

  11. System-Level Overview • The MD simulator is simplified to a Producer/Consumer model • The model is then adapted for SERDES development • Producer and consumer hardware blocks are implemented

  12. System-Level Overview • The MD simulator is simplified to a Producer/Consumer model • The model is then adapted for SERDES development • Producer and consumer hardware blocks are implemented • An FSL (FIFO) is used as an abstracted method of data transport with SERDES logic

  13. System-Level Overview • The MD simulator is simplified to a Producer/Consumer model • The model is then adapted for SERDES development • Producer and consumer hardware blocks are implemented • An FSL is used as an abstracted method of data transport with SERDES logic • An OPB bus interface is added for register access of components

  14. System-Level Overview • The MD simulator is simplified to a Producer/Consumer model • The model is then adapted for SERDES development • Producer and consumer hardware blocks are implemented • An FSL is used as an abstracted method of data transport with SERDES logic • An OPB bus interface is added for register access of components • Deep FIFOs are added for logging high-speed data

  15. Outline • Motivation • System-Level Overview • Protocol Development • Results • Integration into a Programming Model • Conclusions/Questions

  16. Protocol Overview • A synchronous acknowledgement-based protocol was chosen • Simple and predictable • An inherent delay in waiting for acknowledgements • To mask this delay: • Multiple producers are connected to the SERDES interface • The link is time-multiplexed across multiple producers
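A rough way to see why time-multiplexing masks the acknowledgement delay is the standard stop-and-wait utilization model. The symbols are assumptions introduced here, not taken from the slides: T_tx is the time to transmit one packet, T_wait is the idle time before its acknowledgement arrives, N is the number of producers sharing the link, and all packets are assumed to be the same size.

```latex
U_1 = \frac{T_{tx}}{T_{tx} + T_{wait}}, \qquad
U_N = \min\!\left(1,\ \frac{N\,T_{tx}}{T_{tx} + T_{wait}}\right)
```

Under these assumptions the link stays busy once N ≥ 1 + T_wait / T_tx, because another producer's packet fills the time a single producer would spend waiting.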

  17. Protocol Overview • All data has a word width of 4 bytes • Data packets: • Variable size (between 32 and 2016 bytes) • A 32-bit CRC is appended • Acknowledgements: • 8 bytes in size • Can interrupt transmission of data packets
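The packet constraints above can be sketched in a few lines of C. Two things here are assumptions: the slides do not name the CRC polynomial, so the common IEEE 802.3 CRC-32 is used as a stand-in, and they do not say whether the 32 to 2016 byte range includes the appended CRC. All identifiers are illustrative.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum {
    WORD_BYTES = 4,          /* word width from the slide */
    MIN_PACKET_BYTES = 32,   /* smallest data packet */
    MAX_PACKET_BYTES = 2016, /* largest data packet */
    ACK_BYTES = 8            /* fixed acknowledgement size */
};

/* A data packet length is legal if it is word-aligned and within range. */
bool is_valid_packet_length(size_t bytes) {
    return bytes >= MIN_PACKET_BYTES && bytes <= MAX_PACKET_BYTES
        && bytes % WORD_BYTES == 0;
}

/* Bitwise CRC-32 with the reflected IEEE 802.3 polynomial.
   ASSUMPTION: the presentation does not say which 32-bit CRC is used. */
uint32_t crc32_ieee(const uint8_t *data, size_t len) {
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            /* Shift right; XOR in the polynomial when the dropped bit was 1. */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}
```

A hardware implementation would more likely process a full 32-bit word per clock cycle; the byte-serial loop is used here only because it is easy to check.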

  18. Transmit Logic • Transmitter consists mainly of two components • Dual-port buffers: • The start address of the packet is kept in case a resend is necessary • Scheduler: • Schedules ready packets in a round-robin fashion From Producer via FSL To Scheduler of SERDES Link
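Round-robin scheduling among producers can be sketched as follows. This is a software illustration of the general technique, not the presenter's hardware design; the bitmask representation and the name `rr_next` are assumptions.

```c
#include <stdint.h>

/* Pick the next ready producer after `last`, scanning in round-robin order.
   ready_mask: bit i set means producer i has a complete packet buffered.
   num_producers must be between 1 and 32; pass last = -1 before any grant.
   Returns the chosen index, or -1 if no producer is ready. */
int rr_next(uint32_t ready_mask, int last, int num_producers) {
    for (int step = 1; step <= num_producers; step++) {
        int candidate = (last + step) % num_producers;
        if (ready_mask & (1u << candidate)) {
            return candidate;
        }
    }
    return -1;
}
```

Starting the scan one position past the last grant means a producer that always has data cannot starve the others, and the same producer is chosen again only when no other producer is ready.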

  19. Receive Logic • Receiver consists mainly of two components: • Dual-port buffers: • The start address of the packet is kept in case errors occur • Three-stage Dataflow Pipeline: Stage 1: Determine if incoming data is properly formatted Stage 2: Evaluate incoming data against all possible errors Stage 3: Pass results to acknowledgement handler From SERDES Link To Consumer via FSL

  20. Design Effort • Majority of design effort was in error handling: • Transmitter: • Determine which packet combinations corrupt the system • Establish a priority among conflicting packet types • Receiver: • Handle all possible combinations of transmission errors

  21. Outline • Motivation • System-Level Overview • Protocol Development • Results • Integration into a Programming Model • Conclusions/Questions

  22. Test Environment • All SERDES tests were performed across Xilinx Virtex-II Pro XC2VP7 and XC2VP30 FPGAs • Ribbon cables were used to transfer serial data between non-impedance-controlled connectors

  23. Reliability and Sustainability • Verification test environment: • Send data concurrently from three producers to three respective consumers • Pseudo-random packet length • Consumers read from FSL at variable rates • Reliability: • Run this test under extremely poor line conditions • Sustainability: • Run this test under normal line conditions for a long period of time

  24. Reliability • Reliability: 128-second Test Results

  25. Sustainability • Sustainability: 8-hour Test Results

  26. Comparison Against Other Communication Mechanisms • Two configurations are used • Configuration A: Saturate the channel with packets • Configuration B: Loop-back test • Compare against: • Simple FPGA-based 100BaseT Ethernet • TCP/IP FPGA-based 100BaseT Ethernet • TCP/IP Cluster-based Gigabit Ethernet

  27. Throughput Results

  28. One-way Trip Time Results

  29. Area Consumption • Each SERDES Interface takes approximately 8% of a Xilinx XC2VP30 • Debug logic substantially increases area consumption: • FF usage increases 68% • LUT usage increases 43%

  30. Outline • Motivation • System-Level Overview • Protocol Development • Results • Integration into a Programming Model • Conclusions/Questions

  31. Integration into a Programming Model • Hardware abstraction: FSL • Software abstraction: An MPI-based Programming Model • Modified MPI_Send and MPI_Recv function calls while (1) { MPI_Send(data_outgoing, 64, MPI_INT, 0, 0, MPI_COMM_WORLD); MPI_Recv(data_incoming, 64, MPI_INT, 0, 0, MPI_COMM_WORLD, &status); }
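The idea of layering send and receive calls on a word FIFO can be illustrated with a host-side sketch. Everything here is hypothetical: `word_fifo` is a software stand-in for the hardware FSL, and `sketch_send` / `sketch_recv` with a one-word length header are not the presenter's modified `MPI_Send` / `MPI_Recv`, whose internals the slides do not describe.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FIFO_DEPTH 128  /* hypothetical depth, chosen only for this sketch */

/* Software stand-in for a hardware FIFO of 32-bit words. */
typedef struct {
    uint32_t words[FIFO_DEPTH];
    size_t head;   /* next slot to read */
    size_t count;  /* words currently stored */
} word_fifo;

bool fifo_put(word_fifo *f, uint32_t w) {
    if (f->count == FIFO_DEPTH) return false;  /* full */
    f->words[(f->head + f->count) % FIFO_DEPTH] = w;
    f->count++;
    return true;
}

bool fifo_get(word_fifo *f, uint32_t *w) {
    if (f->count == 0) return false;  /* empty */
    *w = f->words[f->head];
    f->head = (f->head + 1) % FIFO_DEPTH;
    f->count--;
    return true;
}

/* Hypothetical send: one length word followed by the payload words.
   Returns false without writing anything if the message would not fit. */
bool sketch_send(word_fifo *f, const uint32_t *buf, uint32_t count) {
    if (count >= FIFO_DEPTH - f->count) return false;
    fifo_put(f, count);
    for (uint32_t i = 0; i < count; i++) fifo_put(f, buf[i]);
    return true;
}

/* Hypothetical receive: returns the payload word count, or -1 if the FIFO is
   empty or the pending message exceeds max_count (nothing is consumed). */
int sketch_recv(word_fifo *f, uint32_t *buf, uint32_t max_count) {
    if (f->count == 0) return -1;
    uint32_t count = f->words[f->head];  /* peek at the length header */
    if (count > max_count) return -1;
    uint32_t header;
    fifo_get(f, &header);
    for (uint32_t i = 0; i < count; i++) fifo_get(f, &buf[i]);
    return (int)count;
}
```

The application code only sees a buffer and a count, much like the 64-element calls on the slide, while the framing and word-level transport stay hidden behind the two functions.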

  32. Integration into a Programming Model • Replaced producers and consumers with a MicroBlaze processor • Several communication scenarios were tested

  33. Outline • Motivation • System-Level Overview • Protocol Development • Results • Integration into a Programming Model • Conclusions/Questions

  34. Conclusions • Final Results: • Reliable and sustainable • Abstracted at the software and hardware level • 2074 FFs and 2244 LUTs required for SERDES logic only • Given a channel rate of 2.5 Gbps, maximum bidirectional throughput of 1.928 Gbps • Minimum packet trip-time of 1.23 μs
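To put the throughput figure in context: the backup slides mention 8B/10B control characters, and 8B/10B encoding carries 8 payload bits in every 10 line bits. Assuming the 2.5 Gbps channel rate is the raw line rate and the 1.928 Gbps figure is measured per direction (the slides do not say), the arithmetic is:

```latex
R_{\text{payload}} = 2.5\ \text{Gbps} \times \frac{8}{10} = 2.0\ \text{Gbps}, \qquad
\frac{1.928\ \text{Gbps}}{2.0\ \text{Gbps}} = 0.964
```

Under those assumptions the protocol would be delivering roughly 96% of the bandwidth left after line coding, with the remainder going to framing, CRC words, acknowledgements, and clock correction.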

  35. Acknowledgements • Professor Régis Pomès, Chris Madill • Professor Paul Chow, Professor C.Y. Chen, Lesley Shannon, Arun Patel, Manuel Saldaña, David Chui, Sam Lee, Andrew House, Nathalie Chan, Lorne Applebaum, Patrick Akl • References: Y. Gu, T. VanCourt, M. C. Herbordt, "FPGA Acceleration of Molecular Dynamics Computations," to appear in Proceedings of Field Programmable Logic and Applications, August 2005.

  36. Transmitter Packet Collision Handling • Packets are enclosed by 8B/10B control characters (K-characters) • The type of packet is distinguished by the K-characters used • Certain combinations of control characters cannot be nested • Clock correction has priority over acknowledgement • Acknowledgement cannot interrupt the end of a data packet • Clock correction must avoid the beginning and end of a data packet
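One possible reading of these priority rules, written as a small arbiter. The phase names, the enum values, and the decision to let an acknowledgement through while clock correction is blocked at the start of a packet are all assumptions; the slides give the rules but not the implementation.

```c
#include <stdbool.h>

/* Where the transmitter currently is within a data packet (hypothetical). */
typedef enum { PHASE_IDLE, PHASE_START, PHASE_BODY, PHASE_END } data_phase;

/* What the transmitter puts on the link next (hypothetical). */
typedef enum { SEND_DATA_OR_IDLE, SEND_ACK, SEND_CLOCK_CORRECTION } tx_choice;

tx_choice arbitrate(bool cc_pending, bool ack_pending, data_phase phase) {
    /* Clock correction must avoid the beginning and end of a data packet. */
    bool cc_allowed = (phase != PHASE_START && phase != PHASE_END);
    /* An acknowledgement cannot interrupt the end of a data packet. */
    bool ack_allowed = (phase != PHASE_END);

    /* Clock correction has priority over acknowledgement. */
    if (cc_pending && cc_allowed) return SEND_CLOCK_CORRECTION;
    if (ack_pending && ack_allowed) return SEND_ACK;
    return SEND_DATA_OR_IDLE;
}
```

Blocked requests simply stay pending and are reconsidered on the next call, so a clock correction deferred at a packet boundary is granted as soon as the phase allows it.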

  37. Receiver Error Handling • All combinations of errors at the receiver are handled correctly • Data errors (CRC errors) • Disparity errors or invalid characters (soft errors) • Errors in framing (frame errors) • Channel failures (hard errors) • Lost acknowledgements/repeat packets • Receiver buffers full

  38. Test Configuration A • Send data concurrently from three producers to three respective consumers • Producers write to FSL as fast as possible • Consumers read from FSL as fast as possible • Analyze best-case throughput results

  39. Test Configuration B • Send data from a producer to a consumer • Delay a packet write from a producer until a packet has been completely received by the consumer on the same FPGA • A communication loop results that determines round-trip time (and therefore one-way trip time)
