RAMP BLUE: Double-Floating Point Coprocessor

RAMP BLUE: Double-Floating Point Coprocessor
Mitch Harwell, David Tylman

What is RAMP?
• Research Accelerator for Multiple Processors
• With multiple FPGAs on multiple BEE2 boards in a single chassis, RAMP is building a massively parallel multiprocessor system.

Why RAMP?
• We have hit a “Power wall”: power has become expensive and increasingly troublesome, as has dissipating the resulting heat through the air, while transistors are essentially free.
• We have reached an “ILP wall”: because of diminishing returns, squeezing out the last bits of instruction-level parallelism (ILP) requires ever more hardware.
• Along with power, we have hit a “Memory wall”: memory latency has become restrictive (200 clock cycles to reach DRAM, versus 4 clock cycles for a multiply).
• Power Wall + ILP Wall + Memory Wall = Brick Wall
• Because traditional uniprocessors will no longer deliver the performance gains of the last three decades, other ways of speeding up computation must be investigated, but the computer architecture community lacks the basic infrastructure needed to carry out this research.
• RAMP will accelerate research across all the fields that touch multiple processors: operating systems, compilers, debuggers, programming languages, scientific libraries, and so on.

Design Decisions
• The interface was chosen to minimize the time spent transferring data over the Fast Simplex Link (FSL) bus.
• No acknowledgements or synchronization structures were used.
• We sent the control information for the FPU over the FSL_Control lines instead of sending a fifth data word.
• This relies on the assumption that the interface always expects four input words and two output words.
• The hardware unit was designed to be as simple as possible.
• None of the units are pipelined, and only one functional unit (add/sub, mult, div, sqrt, comp, fx->fl, fl->fx) runs at a time.
• New operands are not processed until the current calculation has completed.

Software Shenanigans
• Without hardware support for doubles, gcc translates floating-point math operations into function calls. Our routines break the operands into four 32-bit words and send them one at a time over the FSL bus.
• With each data word we also transmit a control bit, and these control bits specify which operation to perform.
• We stall the processor until the answer appears on the FSL bus.
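The slides do not include the driver code, so here is a minimal Python model of how one request could be framed, under stated assumptions: each double is sent high word first, and the four control bits (one per data word, most significant bit first) form a hypothetical 4-bit opcode. The opcode values and the names `OPCODES`, `split_double`, `join_double`, `encode_request`, and `decode_request` are illustrative, not taken from the project. On the MicroBlaze this would be C code issuing FSL put/get operations; here Python's `struct` module stands in for reinterpreting a double's bits.

```python
import struct

# Hypothetical 4-bit opcodes: the real DFPU encoding is not given in the slides.
OPCODES = {"add": 0b0001, "sub": 0b0010, "mul": 0b0011, "div": 0b0100}


def split_double(x):
    """Return the (high, low) 32-bit words of an IEEE 754 double."""
    return struct.unpack(">II", struct.pack(">d", x))


def join_double(hi, lo):
    """Rebuild a double from its (high, low) 32-bit words."""
    return struct.unpack(">d", struct.pack(">II", hi, lo))[0]


def encode_request(op, a, b):
    """Build the four (data_word, control_bit) FSL transfers for one operation.

    Bit i of the 4-bit opcode (most significant bit first) rides on the control
    bit of data word i, so no fifth word is needed to name the operation.
    """
    code = OPCODES[op]
    words = [*split_double(a), *split_double(b)]
    return [(w, (code >> (3 - i)) & 1) for i, w in enumerate(words)]


def decode_request(transfers):
    """Recover (opcode, a, b) from four transfers, as the hardware side would."""
    code = 0
    for _, ctrl in transfers:
        code = (code << 1) | ctrl
    w = [word for word, _ in transfers]
    return code, join_double(w[0], w[1]), join_double(w[2], w[3])


transfers = encode_request("mul", 1.5, -2.25)
print(decode_request(transfers))  # (3, 1.5, -2.25)
```

The same `join_double` applies to the reply: the result is one double returned as two 32-bit words, which the stalled processor reads back and reassembles.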

Hardware High-jinks

[Figure: hardware state diagram with states idle, write, read, and crunch]

The Current Design
[Figure: block diagram of the MicroBlaze and the FSL link]
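To make the non-pipelined behavior concrete, here is a behavioral Python sketch of a controller using the four state names from the hardware slide. The transition order (idle → read → crunch → write → idle), one transition per `step()` call, the opcode table, and the names `DfpuModel`, `UNITS`, `to_words`, and `to_double` are all assumptions for illustration; only add, sub, mult, and div are modeled, and the real unit is hardware, not software.

```python
import operator
import struct
from collections import deque

# Hypothetical 4-bit opcodes; the slides do not give the real encoding.
UNITS = {
    0b0001: operator.add,
    0b0010: operator.sub,
    0b0011: operator.mul,
    0b0100: operator.truediv,
}


def to_words(x):
    """Split a double into its (high, low) 32-bit words."""
    return struct.unpack(">II", struct.pack(">d", x))


def to_double(hi, lo):
    """Rebuild a double from its (high, low) 32-bit words."""
    return struct.unpack(">d", struct.pack(">II", hi, lo))[0]


class DfpuModel:
    """Behavioral model of a non-pipelined idle/read/crunch/write controller."""

    def __init__(self):
        self.state = "idle"
        self.inbox = deque()   # FSL from the processor: (data_word, control_bit)
        self.outbox = deque()  # FSL back to the processor: result words
        self.words = []
        self.code = 0
        self.result = None

    def step(self):
        """Perform one state transition."""
        if self.state == "idle":
            if self.inbox:  # operands have started to arrive
                self.words, self.code = [], 0
                self.state = "read"
        elif self.state == "read":
            if self.inbox:
                word, ctrl = self.inbox.popleft()
                self.words.append(word)
                self.code = (self.code << 1) | ctrl  # opcode arrives MSB first
                if len(self.words) == 4:
                    self.state = "crunch"
        elif self.state == "crunch":
            a = to_double(*self.words[:2])
            b = to_double(*self.words[2:])
            self.result = UNITS[self.code](a, b)  # one functional unit at a time
            self.state = "write"
        elif self.state == "write":
            self.outbox.extend(to_words(self.result))  # two result words
            self.state = "idle"


# Usage: divide 3.0 by 0.5 (opcode 0b0100 spread over the four control bits).
dfpu = DfpuModel()
a_hi, a_lo = to_words(3.0)
b_hi, b_lo = to_words(0.5)
dfpu.inbox.extend([(a_hi, 0), (a_lo, 1), (b_hi, 0), (b_lo, 0)])
while len(dfpu.outbox) < 2:
    dfpu.step()
print(to_double(*dfpu.outbox))  # 6.0
```

Because the model only returns to idle after the previous result has been written, it does not start on new operands mid-calculation. The model is not IEEE 754 compliant either: Python raises `ZeroDivisionError` for a zero divisor instead of returning an infinity.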

What has been accomplished
• The software talks to the hardware as expected.
• The hardware captures the operands, performs the correct operations, and returns correct results.
• The software returns the hardware's results as expected.

Benchmarks
• We ran an FFT benchmark twice.
• Once on our DFPU hardware: 6 minutes 17 seconds
• Once with software floating-point routines: 56 minutes 31 seconds
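Converting both run times to seconds gives the speedup of the DFPU over the software routines on this benchmark, assuming, as the slide implies, that both runs used the same FFT workload:

```latex
\text{speedup} = \frac{56\,\text{min}\;31\,\text{s}}{6\,\text{min}\;17\,\text{s}}
               = \frac{3391\,\text{s}}{377\,\text{s}} \approx 9.0
```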

What remains
• Fully IEEE 754-compliant math units
• Multiple processors sharing one DFPU
• Pipelined design
