
RAMP BLUE: Double-Floating Point Coprocessor




Presentation Transcript


  1. RAMP BLUE: Double-Floating Point Coprocessor
     Mitch Harwell, David Tylman

  2. What is RAMP?
     • Research Accelerator for Multiple Processors
     • With multiple FPGAs on multiple BEE2 boards in a single chassis, RAMP is building a massive, parallel multiprocessor system.

  3. Why RAMP?
     • We have hit a "power wall": power has become increasingly troublesome, as has dissipating heat through the air. Power has become expensive, while transistors are essentially free.
     • We have reached an "ILP wall": under the law of diminishing returns, ever more hardware is needed to squeeze the last ILP out of a design.
     • Along with power, we have hit a "memory wall": memory latencies have become restrictive (200 clock cycles to reach DRAM, versus 4 clocks for a multiply).
     • Power Wall + ILP Wall + Memory Wall = Brick Wall
     • Because traditional uniprocessors will cease to exhibit the performance gains of the last three decades, it is necessary to investigate other means of speeding up computation, but the computer architecture community lacks the basic infrastructure tools required to carry out this research.
     • RAMP will accelerate research across all the fields that touch multiple processors: operating systems, compilers, debuggers, programming languages, scientific libraries, and so on.

  4. Design Decisions
     • The interface was chosen to minimize the time spent transferring data over the FSL bus.
     • No acknowledgements or synchronization structures are used.
     • The control information for the FPU is sent over the FSL_Control lines instead of in a fifth data word.
     • This works under the assumption that the interface always expects four input words and two output words.
     • The hardware unit was designed to be as simple as possible.
     • None of the units are pipelined, and only one functional unit (add/sub, mult, div, sqrt, comp, fx->fl, fl->fx) runs at a time.
     • New values are not processed until the calculation on the old values has completed.

  5. Software Shenanigans
     • gcc translates floating-point math operations into function calls. The operands are broken into four 32-bit words and sent one at a time over the FSL bus.
     • For each data word, we also transmit a control bit to specify which operation to perform.
     • We stall the processor until the answer appears on the FSL bus.
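
The round trip described on this slide can be tried out on a host machine. The following is a minimal C sketch, not the authors' code: the FSL link and the coprocessor are replaced by mock functions, and the names `dfpu_call`, `mock_fsl_put`, and `mock_fsl_get`, the opcode values, the high-word-first ordering, and the idea of spreading the opcode across the four control bits are all assumptions made for illustration. The slides say only that one control bit accompanies each data word.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(double) == sizeof(uint64_t), "sketch assumes 64-bit doubles");

/* Hypothetical opcodes: the slides list the operations, not their encoding. */
enum dfpu_op { DFPU_ADD = 0, DFPU_SUB = 1, DFPU_MUL = 2, DFPU_DIV = 3 };

/* Host-side stand-in for the FSL link: four operand words, one control bit each. */
static uint32_t fsl_word[4];
static unsigned fsl_ctrl[4];
static int      fsl_count = 0;
static uint32_t fsl_result[2];
static int      fsl_result_pos = 0;

/* Reassemble a double from two 32-bit words (high word first: an assumption). */
static double words_to_double(uint32_t hi, uint32_t lo) {
    uint64_t bits = ((uint64_t)hi << 32) | lo;
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}

/* Split a double into two 32-bit words. */
static void double_to_words(double d, uint32_t *hi, uint32_t *lo) {
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    *hi = (uint32_t)(bits >> 32);
    *lo = (uint32_t)bits;
}

/* Mock coprocessor: decodes the opcode from the control bits and computes. */
static void mock_dfpu_crunch(void) {
    unsigned op = 0;
    for (int i = 0; i < 4; i++)
        op |= (fsl_ctrl[i] & 1u) << i;
    double a = words_to_double(fsl_word[0], fsl_word[1]);
    double b = words_to_double(fsl_word[2], fsl_word[3]);
    double r;
    switch (op) {
    case DFPU_ADD: r = a + b; break;
    case DFPU_SUB: r = a - b; break;
    case DFPU_MUL: r = a * b; break;
    case DFPU_DIV: r = a / b; break;
    default:       r = 0.0;   break;  /* operations not modeled in this sketch */
    }
    double_to_words(r, &fsl_result[0], &fsl_result[1]);
    fsl_count = 0;
    fsl_result_pos = 0;
}

/* Mock of a write to the FSL: one data word plus one control bit. */
static void mock_fsl_put(uint32_t word, unsigned ctrl) {
    assert(fsl_count < 4);
    fsl_word[fsl_count] = word;
    fsl_ctrl[fsl_count] = ctrl;
    if (++fsl_count == 4)
        mock_dfpu_crunch();   /* the fourth word starts the calculation */
}

/* Mock of a read from the FSL; real hardware would block here. */
static uint32_t mock_fsl_get(void) {
    assert(fsl_result_pos < 2);
    return fsl_result[fsl_result_pos++];
}

/* Send two doubles as four words, then read back the two-word result. */
double dfpu_call(enum dfpu_op op, double a, double b) {
    uint32_t w[4];
    double_to_words(a, &w[0], &w[1]);
    double_to_words(b, &w[2], &w[3]);
    for (int i = 0; i < 4; i++)
        mock_fsl_put(w[i], ((unsigned)op >> i) & 1u);
    uint32_t hi = mock_fsl_get();
    uint32_t lo = mock_fsl_get();
    return words_to_double(hi, lo);
}
```

In this mock, `dfpu_call(DFPU_ADD, 1.5, 2.25)` returns 3.75. On the real system the two reads would stall the processor until the coprocessor places its answer on the FSL bus.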

  6. Hardware High-jinks

  7. The Current Design
     [Figure not recovered. Surviving labels: idle, write, read, crunch, Microblaze, FSL]
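
The four state names on this slide (idle, write, read, crunch) suggest a small controller inside the coprocessor. The sketch below is one plausible reading, assuming the order idle, read, crunch, write as seen from the coprocessor; the transition conditions, signal names, and struct layout are hypothetical and are not taken from the slides. The four-word input and two-word output counts, and the rule that new values wait until the old calculation completes, come from the Design Decisions slide.

```c
#include <assert.h>
#include <stddef.h>

/* State names from the slide; their ordering here is an assumption. */
enum dfpu_state { ST_IDLE, ST_READ, ST_CRUNCH, ST_WRITE };

/* Hypothetical controller bookkeeping. */
struct dfpu_fsm {
    enum dfpu_state state;
    int words_in;    /* operand words captured so far (0..4) */
    int words_out;   /* result words sent so far (0..2) */
};

/* Hypothetical per-clock inputs to the controller. */
struct dfpu_inputs {
    int word_valid;  /* an operand word is present on the input FSL */
    int unit_done;   /* the active functional unit has finished */
    int out_ready;   /* the output FSL can accept a word */
};

void dfpu_fsm_init(struct dfpu_fsm *f) {
    assert(f != NULL);
    f->state = ST_IDLE;
    f->words_in = 0;
    f->words_out = 0;
}

/* Advance the controller by one clock and return the new state. */
enum dfpu_state dfpu_fsm_step(struct dfpu_fsm *f, struct dfpu_inputs in) {
    switch (f->state) {
    case ST_IDLE:
        if (in.word_valid) {            /* first operand word arrives */
            f->words_in = 1;
            f->state = ST_READ;
        }
        break;
    case ST_READ:
        if (in.word_valid && ++f->words_in == 4)
            f->state = ST_CRUNCH;       /* all four words captured */
        break;
    case ST_CRUNCH:
        /* Incoming words are ignored until the calculation completes. */
        if (in.unit_done) {
            f->words_out = 0;
            f->state = ST_WRITE;
        }
        break;
    case ST_WRITE:
        if (in.out_ready && ++f->words_out == 2) {
            f->words_in = 0;
            f->state = ST_IDLE;         /* both result words sent */
        }
        break;
    }
    return f->state;
}
```

Stepping this model with four valid words, a done pulse, and two ready pulses walks it through read, crunch, and write and back to idle. A word offered during crunch leaves the state unchanged, which mirrors the unpipelined, one-operation-at-a-time design.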

  8. What has been accomplished
     • The software talks to the hardware as expected.
     • The hardware captures the operands, performs the correct operations, and returns correct results as expected.
     • The software returns the hardware results as expected.

  9. Benchmarks
     • We ran an FFT benchmark twice.
     • Once on our DFPU hardware (6 minutes 17 seconds)
     • Once with software routines (56 minutes 31 seconds)
     • That is 377 seconds versus 3391 seconds, a speedup of roughly 9x.

  10. What remains
     • Fully-compliant IEEE 754 math units
     • Multiple processors sharing one DFPU
     • Pipelined design
