Joseph Schneider February 23, 2010

Bridge Floating-Point Fused Multiply-Add Design
By Eric Quinnell, Earl E. Swartzlander Jr., and Carl Lemonds

Fused Multiply-Add
A fused multiply-add (FMA) unit performs (A × B) + C as a single instruction. This is faster and more precise than using two consecutive instructions on a standard multiplier and adder. The FMA can also perform standard addition and multiplication when given appropriate constants (for example, B = 1 for addition or C = 0 for multiplication).
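To make the precision claim concrete, here is a small Python sketch (not from the slides) that models an FMA in software. The hypothetical helper `fused_multiply_add` computes the product and sum exactly with `fractions.Fraction` and rounds once on conversion back to `float`. The hypothetical `multiply_then_add` mimics two separate instructions, rounding after the multiply and again after the add. (Python 3.13 and later also provide `math.fma` directly.)

```python
from fractions import Fraction


def fused_multiply_add(a: float, b: float, c: float) -> float:
    """Software model of an FMA: compute (a * b) + c exactly, then round once.

    Fraction arithmetic is exact; converting the exact result back to float
    performs the single rounding (round-to-nearest-even).
    """
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


def multiply_then_add(a: float, b: float, c: float) -> float:
    """Two separate instructions: the product is rounded, then the sum is rounded."""
    product = a * b      # first rounding (standalone multiplier)
    return product + c   # second rounding (standalone adder)


# 0.1 is not exactly representable, so 0.1 * 10 is slightly above 1.
# Rounding the product first hides that error; the FMA keeps it.
print(multiply_then_add(0.1, 10.0, -1.0))   # 0.0
print(fused_multiply_add(0.1, 10.0, -1.0))  # 5.551115123125783e-17 (exactly 2**-54)

# Appropriate constants turn the FMA into a plain add or multiply.
print(fused_multiply_add(3.0, 1.0, 0.5))    # addition (B = 1): 3.5
print(fused_multiply_add(3.0, 0.5, 0.0))    # multiplication (C = 0): 1.5
```

The example values are chosen so the difference is visible: the exact product 0.1 × 10 exceeds 1 by 2^-54, which is less than half an ulp of 1.0, so the separately rounded product becomes exactly 1.0.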

Fused Multiply-Add
Standard addition and multiplication performed on an FMA suffer greater latency than on a dedicated adder or multiplier. Also, when an FMA is used instead of separate units, an addition and a multiplication cannot be performed in parallel.

Fused Multiply-Add
Goal: design an architecture that bridges the FADD and FMUL units, reusing their components to minimize area and power consumption while allowing both the standard operations and the FMA functionality.

Format
All floating-point units assume the IEEE-754 standard double-precision (64-bit) format.
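For reference, the 64-bit double format splits into a 1-bit sign, an 11-bit biased exponent (bias 1023), and a 52-bit fraction with a hidden leading 1 for normal numbers. The Python sketch below (an illustration, not part of the presented design) unpacks those fields; the function names are hypothetical. The 53-bit significands extracted this way are what a multiplier array would multiply, producing a 106-bit product.

```python
import struct


def decode_double(x: float) -> tuple[int, int, int]:
    """Split an IEEE-754 double into (sign, biased exponent, fraction).

    Layout: 1 sign bit | 11 exponent bits (bias 1023) | 52 fraction bits.
    """
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF
    fraction = bits & ((1 << 52) - 1)
    return sign, exponent, fraction


def significand(exponent: int, fraction: int) -> int:
    """53-bit significand: the hidden 1 is made explicit for normal numbers.

    Subnormals (biased exponent 0) have no hidden bit.
    """
    return (1 << 52) | fraction if exponent != 0 else fraction


print(decode_double(1.0))   # (0, 1023, 0)
print(decode_double(-2.5))  # (1, 1024, 1125899906842624), since -2.5 = -1.25 * 2**1
```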

Basis of Comparison
The standalone adder, standalone multiplier, standalone FMA, and bridge FMA are compared on the basis of latency, area, and power.

FMA Architecture
To compute (A × B) + C, A and B are multiplied while C is aligned based on the exponent difference. A carry-save adder is implemented to combine the aligned C with the product. The result is rounded only once, as opposed to the two roundings necessary when the expression is computed as two separate operations.
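The carry-save step can be illustrated at the bit level. The Python sketch below is a generic model (not the authors' Verilog) on non-negative integers used as bit vectors. A 3:2 carry-save adder reduces three operands to a sum vector and a carry vector without propagating carries. The hypothetical `reduce_operands` applies it repeatedly to the partial products of A × B plus an already-aligned C, leaving a single carry-propagate addition at the end. Alignment and rounding are omitted here by assumption.

```python
def carry_save_add(x: int, y: int, z: int) -> tuple[int, int]:
    """3:2 carry-save addition on non-negative integers (bit vectors).

    Each bit position is a full adder: the sum bit is x ^ y ^ z and the carry
    bit is the majority of x, y, z. No carry ripples between positions, so
    the delay is one full-adder level regardless of operand width.
    The carry vector is returned already shifted to its weight.
    """
    partial_sum = x ^ y ^ z
    carry = ((x & y) | (x & z) | (y & z)) << 1
    return partial_sum, carry


def reduce_operands(operands: list[int]) -> int:
    """Reduce many operands with repeated 3:2 steps, then do one final add."""
    ops = list(operands)
    while len(ops) > 2:
        s, c = carry_save_add(ops.pop(), ops.pop(), ops.pop())
        ops += [s, c]    # three operands in, two out
    return sum(ops)      # the single carry-propagate addition


# Partial products of a * b (one shifted copy of a per set bit of b),
# with the addend c merged into the same reduction.
a, b, c = 13, 11, 7
partials = [a << i for i in range(b.bit_length()) if (b >> i) & 1]
print(reduce_operands(partials + [c]))  # 150 == 13 * 11 + 7
```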

Bridge FMA
The bridge follows the same architecture as the FMA but reuses parts of the FADD and FMUL units where appropriate: it takes the multiplier array from the FMUL and the rounding unit from the FADD. With this method, the FADD and FMUL can be used individually or in parallel, while the FMA is used only when needed. Clock gating ensures that the bridge is powered only when needed.

FMUL
The FMUL is the same as a standard unit, except that the multiplier array has additional outputs leading to the FMA bridge. The FMUL's rounding element is shut down via clock gating when an FMA operation is performed.

FADD
The FADD uses the Farmwald dual-path design, in which one of two paths is chosen based on the exponent difference of the inputs. The multiplexer that selects between the paths for the rounding unit now includes an option for the FMA input. In this manner, the FMA uses the FADD's rounding unit.
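A rough behavioral sketch of the path choice and the extended rounding-unit multiplexer follows. The slides only say the path depends on the exponent difference. The selection rule used here (near path for an effective subtraction with an exponent difference of at most 1, far path otherwise) is the common textbook formulation of a dual-path adder and is an assumption. The function names and string labels are hypothetical.

```python
def select_fadd_path(exp_a: int, exp_b: int, sign_a: int, sign_b: int) -> str:
    """Choose the dual-path adder path (assumed textbook rule).

    Near path: effective subtraction with |exp_a - exp_b| <= 1. Alignment needs
    at most a 1-bit shift, but cancellation may need a long normalization shift.
    Far path: everything else. Alignment may need a long shift, but the result
    needs at most a 1-bit normalization.
    """
    effective_subtraction = sign_a != sign_b
    if effective_subtraction and abs(exp_a - exp_b) <= 1:
        return "near"
    return "far"


def rounding_unit_input(op: str, path: str, near_result: int,
                        far_result: int, fma_result: int) -> int:
    """Multiplexer in front of the FADD rounding unit, extended with an FMA input."""
    if op == "fma":
        return fma_result  # bridge FMA borrows the FADD's rounder
    return near_result if path == "near" else far_result


print(select_fadd_path(1023, 1023, 0, 1))  # near: close exponents, subtraction
print(select_fadd_path(1023, 1030, 0, 1))  # far: large exponent difference
print(select_fadd_path(1023, 1023, 0, 0))  # far: effective addition
```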

Bridge FMA
The end result is that the bridge FMA hardware is essentially the original FMA hardware without the multiplier array and rounding unit.
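The resource sharing described in the FMUL, FADD, and bridge slides can be summarized with a toy model. Each operation maps to the hardware blocks it uses. Two operations can issue together only if their block sets do not overlap. The block names below are hypothetical labels, and treating any block not listed as idle is a simplifying assumption. The model is not cycle-accurate.

```python
from enum import Enum


class Op(Enum):
    FADD = "fadd"
    FMUL = "fmul"
    FMA = "fma"


def active_blocks(op: Op) -> set[str]:
    """Hardware blocks used by one operation in a bridge-FMA design (illustrative)."""
    if op is Op.FADD:
        return {"fadd_datapath", "fadd_round"}
    if op is Op.FMUL:
        return {"fmul_array", "fmul_round"}
    # FMA: the FMUL's multiplier array feeds the bridge, and the FADD's rounding
    # unit finishes the result. The FMUL rounding element is clock-gated off.
    return {"fmul_array", "bridge", "fadd_round"}


def can_issue_together(op1: Op, op2: Op) -> bool:
    """Two operations can run in parallel only if they share no block."""
    return active_blocks(op1).isdisjoint(active_blocks(op2))


print(can_issue_together(Op.FADD, Op.FMUL))  # True: separate units
print(can_issue_together(Op.FMA, Op.FMUL))   # False: shared multiplier array
print(can_issue_together(Op.FMA, Op.FADD))   # False: shared rounding unit
```

In this model, the bridge appears only in the FMA's block set, which mirrors the clock gating that keeps it unpowered during plain FADD and FMUL operations.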

Results
The FMUL, FADD, FMA, and bridge FMA were all implemented in Verilog using an AMD 65-nm silicon-on-insulator design set.

Results
The bridge architecture is 30% to 70% faster than the FMA architecture when performing FADD or FMUL instructions, with significant savings in power consumption. It also allows an FADD and an FMUL instruction to execute in parallel, further improving speed. Executing an FMA instruction gives a 12% performance gain over consecutive operations on the individual FADD and FMUL units.

Results
Including the bridge FMA with the FADD and FMUL units takes 40% more area. In worst-case conditions, an FMA instruction uses 60% more power than consecutive FADD and FMUL instructions. The bridge also has increased latency and power compared with a standalone FMA unit.
