
CPRE 583 Reconfigurable Computing Lecture 14: Fri 10/12/2011 (Floating Point)


**CPRE 583 Reconfigurable Computing**
Lecture 14: Fri 10/12/2011 (Floating Point)
Instructor: Dr. Phillip Jones (phjones@iastate.edu)
Reconfigurable Computing Laboratory, Iowa State University, Ames, Iowa, USA
http://class.ee.iastate.edu/cpre583/

**Announcements/Reminders**
• Project teams: form by Monday 10/10
• MP2 due Friday 10/14
• Project proposal due Friday 10/14 (midnight)
  • High-level topic and a high-level plan for execution
  • I'll give feedback
• Project proposal class presentation on Wed 10/19
  • 5-10 PowerPoint slides
• I plan to have exams back to you this Friday

**Project Grading Breakdown**
• 50% final project demo
• 30% final project report
  • 20% of your project report grade will come from your 5-6 project updates, due Fridays at midnight
• 20% final project presentation

**Projects Ideas: Relevant Conferences**
• Micro • Super Computing • HPCA • IPDPS • FPL • FPT • FCCM • FPGA • DAC • ICCAD • Reconfig • RTSS • RTAS • ISCA

**Projects: Target Timeline**
• Teams formed and topic: Mon 10/10
  • Project idea in PowerPoint, 3-5 slides
  • Motivation (why is this interesting/useful)
  • What the end result will be
  • High-level picture of the final product
  • Project team list: name, responsibility
• High-level plan/proposal: Fri 10/14
  • PowerPoint, 5-10 slides (presented to the class Wed 10/19)
  • System block diagrams
  • High-level algorithms (if any)
  • Concerns: implementation, conceptual
  • Related research papers (if any)
• Work on projects: 10/19 - 12/9
  • Weekly update reports (more information on updates will be given)
• Presentations: finals week
  • Present/demo what is done at this point
  • 15-20 minutes (depends on the number of projects)
• Final write-up and software/hardware turned in: day of the final (TBD)

**Initial Project Proposal Slides (5-10 slides)**
• Project team list: name, responsibility (who is the project leader)
  • Team size: 3-4 (5 case-by-case)
• Project idea
  • Motivation (why is this interesting/useful)
  • What the end result will be
  • High-level picture of the final product
• High-level plan
  • Break the project into milestones
  • Provide an initial schedule: I would initially schedule aggressively to have the project complete by Thanksgiving, since issues will pop up that cause the schedule to slip
  • System block diagrams
  • High-level algorithms (if any)
  • Concerns: implementation, conceptual
  • Research papers related to your project idea

**Weekly Project Updates**
• The current state of your project write-up
  • Even in the early stages of the project you should be able to write a rough draft of the Introduction and Motivation sections
• The current state of your final presentation
  • Your initial project proposal presentation (due Wed 10/19) should make a starting point for your final presentation
• What things are working and not working
• What roadblocks you are running into

**Overview**
• Floating point on FPGAs (Chapters 21.4 and 31)
  • Why is it viewed as difficult?
  • Options for mitigating the issues

**Floating Point Format (IEEE-754)**
Single precision: S (1 sign bit), exp (8 exponent bits), Mantissa (23 mantissa bits)

Mantissa = b-1 b-2 b-3 … b-23, where 0.Mantissa = sum over i = 1..23 of b-i * 2^-i

Floating point value = (-1)^S * 2^(exp - 127) * (1.Mantissa)

Example: 0 x"80" 110 x"00000"
= (-1)^0 * 2^(128 - 127) * 1.(1/2 + 1/4) = 2^1 * 1.75 = 3.5

Double precision: S (1 bit), exp (11 bits), Mantissa (52 bits)
Floating point value = (-1)^S * 2^(exp - 1023) * (1.Mantissa)

**Fixed Point**
A W.F format has W whole bits (bW-1 … b1 b0) and F fractional bits (b-1 b-2 … b-F).
Example formats (W.F): 5.5, 10.12, 3.7

Example in the 5.5 format: 01010 01100 = 10 + 1/4 + 1/8 = 10.375

Compare floating point and fixed point:
Floating point: 0 x"80" "110" x"00000" = 3.5
10-bit fixed point (3.7 format) for 3.5 = ?
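The decoding rule above can be checked in software. Below is a minimal Python sketch (the function name and helper layout are my own, and it handles normalized values only, not denormals, infinities, or NaN) that splits a 32-bit pattern into its IEEE-754 fields and reconstructs the slide's 3.5 example:

```python
import struct

def decode_f32(bits):
    """Split a 32-bit pattern into IEEE-754 single-precision fields and
    reconstruct the value as (-1)^S * 2^(exp - 127) * (1.Mantissa)."""
    s = (bits >> 31) & 0x1       # 1 sign bit
    exp = (bits >> 23) & 0xFF    # 8 exponent bits, biased by 127
    mant = bits & 0x7FFFFF       # 23 mantissa bits
    value = (-1) ** s * 2.0 ** (exp - 127) * (1 + mant / 2 ** 23)
    return s, exp, mant, value

# Slide example: S = 0, exp = x"80", mantissa = 110 x"00000"
bits = (0 << 31) | (0x80 << 23) | (0b110 << 20)
print(decode_f32(bits))  # sign 0, exp 128, value 3.5

# Cross-check against the machine's own IEEE-754 decoding
assert struct.unpack(">f", bits.to_bytes(4, "big"))[0] == 3.5
```

The `struct.unpack` line confirms the hand reconstruction matches how the hardware interprets the same 32 bits.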
Answer: 011 1000000 (011 = 3, 1000000 = 0.5)

**Fixed Point (Addition)**
Add the whole and fractional fields together as one integer:
(Operand 1: whole | fractional) + (Operand 2: whole | fractional) = (sum: whole | fractional)

Example in an 11-bit 4.7 format:

  0011 1110000   Operand 1 = 3.875
+ 0001 1010000   Operand 2 = 1.625
= 0101 1000000   Sum = 5.5

You can use a standard ripple-carry adder!

**Floating Point (Addition)**

  0 x"80" 111 x"80000"   Operand 1 = 3.875
+ 0 x"7F" 101 x"00000"   Operand 2 = 1.625

Step 1: Find a common exponent (i.e., align the binary points). Make x"80" -> x"7F", or vice versa?
• Make x"7F" -> x"80", losing the least significant bits of Operand 2:
  • Add the difference x"80" - x"7F" = 1 to x"7F"
  • Shift the mantissa of Operand 2 right by that difference
  • Remember the "implicit" 1 of the original mantissa

  0 x"80" 111 x"80000"   Operand 1 = 3.875
+ 0 x"80" 110 x"80000"   Operand 2 = 1.625 (aligned)

Step 2: Add the mantissas:

  1 110 x"00000"   Overflow!
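Because a W.F fixed-point value is just an integer scaled by 2^-F, the addition above really is plain integer addition. A small Python sketch of the slide's 4.7 example (the helper names are illustrative):

```python
F = 7  # fractional bits in the 4.7 format

def to_fixed(x):
    """Encode a real number as an integer scaled by 2^F."""
    return round(x * (1 << F))

def to_real(n):
    """Decode a scaled integer back to a real number."""
    return n / (1 << F)

a = to_fixed(3.875)   # 0b0011_1110000
b = to_fixed(1.625)   # 0b0001_1010000
s = a + b             # ordinary integer (ripple-carry) addition
print(to_real(s))     # 5.5
```

This is why a standard ripple-carry adder suffices: the binary points of both operands are already aligned by the format itself.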
The mantissa sum 1 110 x"00000" has overflowed:
• You can't just overflow the mantissa into the exponent field.
• You are actually overflowing the implicit "1" of Operand 1, so you effectively have an implicit "2" (i.e., "10").

Step 3: Deal with the mantissa overflow by normalizing:
• Shift the mantissa right by 1 (shifting in a "0", because of the implicit "2")
• Increment the exponent by 1

  0 x"81" 011 x"00000" = 5.5

**Floating Point (Addition): Other concerns**
(Slide shows the single-precision field layout again: S, exp, Mantissa = 1, 8, 23 bits.)

**Floating Point (Addition): High-level Hardware**
(Figure: the exponent difference/comparison selects which operand to swap and the right-shift value for aligning the smaller mantissa; after the add/subtract stage, a priority encoder determines the left-shift value for normalization, followed by rounding, a denormal check, and an exponent adjust to produce the final E and M fields.)

**Floating Point**
• Both Xilinx and Altera supply floating-point soft cores (which I believe are IEEE-754 compliant), so don't get too afraid if you need floating point in your class project.
• There should also be floating-point open cores freely available.

**Fixed Point vs. Floating Point**
• Floating-point advantages:
  • The application designer does not have to think "much" about the math
  • The floating-point format supports a wide range of numbers (+/- 3x10^38 to +/- 1x10^-38, single precision)
  • If IEEE-754 compliant, it is easier to accelerate existing floating-point-based applications
• Floating-point disadvantages:
  • Ease of use comes at great hardware expense:
    • 32-bit fixed-point add: ~32 DFFs + 32 LUTs
    • 32-bit single-precision floating-point add: ~250 DFFs + 250 LUTs, about 10x more resources, thus 1/10 the possible best-case parallelism
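The full align / add / normalize walkthrough above can be modeled compactly. A simplified Python sketch (positive, normalized operands only; no rounding, sign logic, or denormal handling, and the function name is my own):

```python
def fp_add(e1, m1, e2, m2, mant_bits=23):
    """Add two positive floats given as (biased exponent, mantissa field)."""
    # Make the implicit leading 1 explicit
    f1 = (1 << mant_bits) | m1
    f2 = (1 << mant_bits) | m2
    # Align: put the larger exponent in operand 1, then shift the other
    # significand right by the exponent difference
    if e1 < e2:
        e1, f1, e2, f2 = e2, f2, e1, f1
    f2 >>= e1 - e2
    # Add the aligned significands
    f, e = f1 + f2, e1
    # Normalize: if the implicit-1 position overflowed, shift right
    # (a "0" comes in) and increment the exponent
    while f >> (mant_bits + 1):
        f >>= 1
        e += 1
    return e, f & ((1 << mant_bits) - 1)

# Slide example: 3.875 (exp x"80", mant 111 x"80000")
#              + 1.625 (exp x"7F", mant 101 x"00000")
e, m = fp_add(0x80, 0x780000, 0x7F, 0x500000)
print(hex(e), hex(m))  # 0x81 0x300000, i.e. 2^2 * 1.375 = 5.5
```

Each phase of the function maps directly onto one block of the hardware datapath: the swap/shift, the adder, and the normalization shift with exponent adjust.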
  • Floating point typically needs a deep pipeline to achieve high clock rates (i.e., high throughput)
  • There are no hard resources (such as the carry chain) to take advantage of

**Fixed Point vs. Floating Point**
• Range example (using decimal for clarity):
  • Assume we can use only 3 digits
  • Fixed point: all 3 digits used for the whole part (3.0 format)
  • Floating point: 2 digits for the mantissa, 1 digit for the exponent
  • What is the largest number you can represent in each?
• Precision example (using decimal for clarity):
  • For the same formats as above, represent 125

**Mitigating Floating Point Disadvantages**
• Only support a subset of the IEEE-754 standard
  • Software could be used to off-load the special cases
• Modify the floating-point format to support a smaller data type (e.g., 18-bit instead of 32-bit)
  • Link to a Cornell class: http://instruct1.cit.cornell.edu/courses/ece576/FloatingPoint/index.html
• Add hardware support in the FPGA for floating point
  • Hardcore multipliers: added by companies in the early 2000s
  • Altera: hard shared paths for floating point (Stratix-V, 2011)
  • "How to get 1-TFLOP throughput on FPGAs" article:
    http://www.eetimes.com/design/programmable-logic/4207687/How-to-achieve-1-trillion-floating-point-operations-per-second-in-an-FPGA
    http://www.altera.com/education/webcasts/all/wc-2011-floating-point-dsp-fpga.html

**Mitigating Fixed Point Disadvantages (21.4)**
• Block floating point (mitigates the range issue):
  • All data in a block share one exponent
  • Periodic rescaling
  • Makes use of fixed-point hardware
  • Useful in applications where data is processed in stages and a limited value range can be placed on each stage (e.g., FFT)

**Next Lecture**
• Review mid-term
• EDK (XPS) tool overview

**Questions/Comments/Concerns**
Write down:
• The main point of the lecture
• One thing that's still not quite clear
• OR, if everything is clear, an example of how to apply something from the lecture

**Lecture Notes**
• Altera app notes on computing FLOPS for Stratix-III or Stratix-IV
• Altera's old app notes on floating-point add/multiply
• Link to a floating-point single-precision calculator
• Block (fixed) floating point: build-slide explanation example
• Numbers comparing CPU/FPGA/GPU floating-point throughput
• Pre-make slides showing examples of the fixed-point advantage for:
  • Representing the precision of a number
  • Precision when adding a convoluted type of number, e.g., 1M.0001

**Lecture Notes**
• Points: the original 286/386 had no floating-point hardware; next came a floating-point coprocessor (on a separate chip); then floating point on the same chip
• Why ripple carry is used over more advanced "high-performing" generate-propagate adders (0.1 for 4 LUTs vs. 0.4 ns for 1 LUT)
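The block floating point idea from the Mitigating Fixed Point Disadvantages slide can be sketched in a few lines: every element of a block is stored as a fixed-point mantissa, and one exponent is kept for the whole block. (The mantissa width and helper names below are illustrative choices, not from any particular library.)

```python
import math

MANT_BITS = 7  # hypothetical per-element mantissa width

def to_block_fp(values):
    """Scale a block so the largest magnitude falls below 1.0, then store
    each element as a fixed-point mantissa plus one shared exponent."""
    peak = max(abs(v) for v in values)
    exp = math.floor(math.log2(peak)) + 1 if peak > 0 else 0
    mants = [round(v / 2 ** exp * (1 << MANT_BITS)) for v in values]
    return exp, mants

def from_block_fp(exp, mants):
    """Reconstruct the real values using the shared exponent."""
    return [m / (1 << MANT_BITS) * 2 ** exp for m in mants]

exp, mants = to_block_fp([3.5, -1.25, 0.5])
print(exp, mants)                 # 2 [112, -40, 16]
print(from_block_fp(exp, mants))  # [3.5, -1.25, 0.5]
```

Between stages of a pipeline such as an FFT, only the single exponent needs rescaling; all per-element arithmetic stays in fixed-point hardware.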