
CPRE 583 Reconfigurable Computing Lecture 14: Fri 10/12/2011 (Floating Point)


**CPRE 583 Reconfigurable Computing**
Lecture 14: Fri 10/12/2011 (Floating Point)
Instructor: Dr. Phillip Jones (phjones@iastate.edu)
Reconfigurable Computing Laboratory, Iowa State University, Ames, Iowa, USA
http://class.ee.iastate.edu/cpre583/

**Announcements/Reminders**
• Project teams: form by Monday 10/10
• MP2 due Friday 10/14
• Project proposal due Friday 10/14 (midnight)
  • High-level topic and a high-level plan for execution
  • I'll give feedback
• Project proposal class presentation on Wed 10/19
  • 5-10 PowerPoint slides
• I plan to have exams back to you this Friday

**Project Grading Breakdown**
• 50% final project demo
• 30% final project report
  • 20% of your project report grade will come from your 5-6 project updates, due Fridays at midnight
• 20% final project presentation

**Projects Ideas: Relevant Conferences**
• Micro • Super Computing • HPCA • IPDPS • FPL • FPT • FCCM • FPGA • DAC • ICCAD • Reconfig • RTSS • RTAS • ISCA

**Projects: Target Timeline**
• Teams formed and topic: Mon 10/10
  • Project idea in PowerPoint, 3-5 slides
  • Motivation (why is this interesting/useful)
  • What the end result will be
  • High-level picture of the final product
  • Project team list: name, responsibility
• High-level plan/proposal: Fri 10/14
  • PowerPoint, 5-10 slides (presented to the class Wed 10/19)
  • System block diagrams
  • High-level algorithms (if any)
  • Concerns: implementation, conceptual
  • Related research papers (if any)
• Work on projects: 10/19 - 12/9
  • Weekly update reports (more information on updates will be given)
• Presentations: finals week
  • Present/demo what is done at this point
  • 15-20 minutes (depends on the number of projects)
• Final write-up and software/hardware turned in: day of the final (TBD)

**Initial Project Proposal Slides (5-10 slides)**
• Project team list: name, responsibility (who is the project leader)
  • Team size: 3-4 (5 case-by-case)
• Project idea
  • Motivation (why is this interesting/useful)
  • What the end result will be
  • High-level picture of the final product
• High-level plan
  • Break the project into milestones
  • Provide an initial schedule: I would initially schedule aggressively to have the project complete by Thanksgiving, since issues will pop up that cause the schedule to slip
  • System block diagrams
  • High-level algorithms (if any)
  • Concerns: implementation, conceptual
  • Research papers related to your project idea

**Weekly Project Updates**
• The current state of your project write-up
  • Even in the early stages of the project you should be able to write a rough draft of the Introduction and Motivation sections
• The current state of your final presentation
  • Your initial project proposal presentation (due Wed 10/19) should make a starting point for your final presentation
• What things are working and not working
• What roadblocks you are running into

**Overview**
• Floating point on FPGAs (Chapters 21.4 and 31)
  • Why is it viewed as difficult?
  • Options for mitigating the issues

**Floating Point Format (IEEE-754)**
Single precision: S (1 sign bit), exp (8 exponent bits), Mantissa (23 mantissa bits)

Mantissa = b-1 b-2 b-3 … b-23, where 0.Mantissa = sum over i = 1..23 of b-i * 2^-i

Floating point value = (-1)^S * 2^(exp - 127) * (1.Mantissa)

Example: 0 x"80" 110 x"00000"
= (-1)^0 * 2^(128 - 127) * 1.(1/2 + 1/4) = 2^1 * 1.75 = 3.5

Double precision: S (1 bit), exp (11 bits), Mantissa (52 bits)
Floating point value = (-1)^S * 2^(exp - 1023) * (1.Mantissa)

**Fixed Point**
A W.F format has W whole bits (bW-1 … b1 b0) and F fractional bits (b-1 b-2 … b-F).
Example formats (W.F): 5.5, 10.12, 3.7

Example in the 5.5 format: 01010 01100 = 10 + 1/4 + 1/8 = 10.375

Compare floating point and fixed point:
Floating point: 0 x"80" "110" x"00000" = 3.5
10-bit fixed point (3.7 format) for 3.5 = ?
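The decoding rule above can be checked in software. Below is a minimal Python sketch (the function name and helper layout are my own, and it handles normalized values only, not denormals, infinities, or NaN) that splits a 32-bit pattern into its IEEE-754 fields and reconstructs the slide's 3.5 example:

```python
import struct

def decode_f32(bits):
    """Split a 32-bit pattern into IEEE-754 single-precision fields and
    reconstruct the value as (-1)^S * 2^(exp - 127) * (1.Mantissa)."""
    s = (bits >> 31) & 0x1       # 1 sign bit
    exp = (bits >> 23) & 0xFF    # 8 exponent bits, biased by 127
    mant = bits & 0x7FFFFF       # 23 mantissa bits
    value = (-1) ** s * 2.0 ** (exp - 127) * (1 + mant / 2 ** 23)
    return s, exp, mant, value

# Slide example: S = 0, exp = x"80", mantissa = 110 x"00000"
bits = (0 << 31) | (0x80 << 23) | (0b110 << 20)
print(decode_f32(bits))  # sign 0, exp 128, value 3.5

# Cross-check against the machine's own IEEE-754 decoding
assert struct.unpack(">f", bits.to_bytes(4, "big"))[0] == 3.5
```

The `struct.unpack` line confirms the hand reconstruction matches how the hardware interprets the same 32 bits.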
Answer: 011 1000000 (011 = 3, 1000000 = 0.5)

**Fixed Point (Addition)**
Add the whole and fractional fields together as one integer:
(Operand 1: whole | fractional) + (Operand 2: whole | fractional) = (sum: whole | fractional)

Example in an 11-bit 4.7 format:

  0011 1110000   Operand 1 = 3.875
+ 0001 1010000   Operand 2 = 1.625
= 0101 1000000   Sum = 5.5

You can use a standard ripple-carry adder!

**Floating Point (Addition)**

  0 x"80" 111 x"80000"   Operand 1 = 3.875
+ 0 x"7F" 101 x"00000"   Operand 2 = 1.625

Step 1: Find a common exponent (i.e., align the binary points). Make x"80" -> x"7F", or vice versa?
• Make x"7F" -> x"80", losing the least significant bits of Operand 2:
  • Add the difference x"80" - x"7F" = 1 to x"7F"
  • Shift the mantissa of Operand 2 right by that difference
  • Remember the "implicit" 1 of the original mantissa

  0 x"80" 111 x"80000"   Operand 1 = 3.875
+ 0 x"80" 110 x"80000"   Operand 2 = 1.625 (aligned)

Step 2: Add the mantissas:

  1 110 x"00000"   Overflow!
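Because a W.F fixed-point value is just an integer scaled by 2^-F, the addition above really is plain integer addition. A small Python sketch of the slide's 4.7 example (the helper names are illustrative):

```python
F = 7  # fractional bits in the 4.7 format

def to_fixed(x):
    """Encode a real number as an integer scaled by 2^F."""
    return round(x * (1 << F))

def to_real(n):
    """Decode a scaled integer back to a real number."""
    return n / (1 << F)

a = to_fixed(3.875)   # 0b0011_1110000
b = to_fixed(1.625)   # 0b0001_1010000
s = a + b             # ordinary integer (ripple-carry) addition
print(to_real(s))     # 5.5
```

This is why a standard ripple-carry adder suffices: the binary points of both operands are already aligned by the format itself.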
The mantissa sum 1 110 x"00000" has overflowed:
• You can't just overflow the mantissa into the exponent field.
• You are actually overflowing the implicit "1" of Operand 1, so you effectively have an implicit "2" (i.e., "10").

Step 3: Deal with the mantissa overflow by normalizing:
• Shift the mantissa right by 1 (shifting in a "0", because of the implicit "2")
• Increment the exponent by 1

  0 x"81" 011 x"00000" = 5.5

**Floating Point (Addition): Other concerns**
(Slide shows the single-precision field layout again: S, exp, Mantissa = 1, 8, 23 bits.)

**Floating Point (Addition): High-level Hardware**
(Figure: the exponent difference/comparison selects which operand to swap and the right-shift value for aligning the smaller mantissa; after the add/subtract stage, a priority encoder determines the left-shift value for normalization, followed by rounding, a denormal check, and an exponent adjust to produce the final E and M fields.)

**Floating Point**
• Both Xilinx and Altera supply floating-point soft cores (which I believe are IEEE-754 compliant), so don't get too afraid if you need floating point in your class project.
• There should also be floating-point open cores freely available.

**Fixed Point vs. Floating Point**
• Floating-point advantages:
  • The application designer does not have to think "much" about the math
  • The floating-point format supports a wide range of numbers (+/- 3x10^38 to +/- 1x10^-38, single precision)
  • If IEEE-754 compliant, it is easier to accelerate existing floating-point-based applications
• Floating-point disadvantages:
  • Ease of use comes at great hardware expense:
    • 32-bit fixed-point add: ~32 DFFs + 32 LUTs
    • 32-bit single-precision floating-point add: ~250 DFFs + 250 LUTs, about 10x more resources, thus 1/10 the possible best-case parallelism
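The full align / add / normalize walkthrough above can be modeled compactly. A simplified Python sketch (positive, normalized operands only; no rounding, sign logic, or denormal handling, and the function name is my own):

```python
def fp_add(e1, m1, e2, m2, mant_bits=23):
    """Add two positive floats given as (biased exponent, mantissa field)."""
    # Make the implicit leading 1 explicit
    f1 = (1 << mant_bits) | m1
    f2 = (1 << mant_bits) | m2
    # Align: put the larger exponent in operand 1, then shift the other
    # significand right by the exponent difference
    if e1 < e2:
        e1, f1, e2, f2 = e2, f2, e1, f1
    f2 >>= e1 - e2
    # Add the aligned significands
    f, e = f1 + f2, e1
    # Normalize: if the implicit-1 position overflowed, shift right
    # (a "0" comes in) and increment the exponent
    while f >> (mant_bits + 1):
        f >>= 1
        e += 1
    return e, f & ((1 << mant_bits) - 1)

# Slide example: 3.875 (exp x"80", mant 111 x"80000")
#              + 1.625 (exp x"7F", mant 101 x"00000")
e, m = fp_add(0x80, 0x780000, 0x7F, 0x500000)
print(hex(e), hex(m))  # 0x81 0x300000, i.e. 2^2 * 1.375 = 5.5
```

Each phase of the function maps directly onto one block of the hardware datapath: the swap/shift, the adder, and the normalization shift with exponent adjust.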
  • Floating point typically needs a deep pipeline to achieve high clock rates (i.e., high throughput)
  • There are no hard resources (such as the carry chain) to take advantage of

**Fixed Point vs. Floating Point**
• Range example (using decimal for clarity):
  • Assume we can use only 3 digits
  • Fixed point: all 3 digits used for the whole part (3.0 format)
  • Floating point: 2 digits for the mantissa, 1 digit for the exponent
  • What is the largest number you can represent in each?
• Precision example (using decimal for clarity):
  • For the same formats as above, represent 125

**Mitigating Floating Point Disadvantages**
• Only support a subset of the IEEE-754 standard
  • Software could be used to off-load the special cases
• Modify the floating-point format to support a smaller data type (e.g., 18-bit instead of 32-bit)
  • Link to a Cornell class: http://instruct1.cit.cornell.edu/courses/ece576/FloatingPoint/index.html
• Add hardware support in the FPGA for floating point
  • Hardcore multipliers: added by companies in the early 2000s
  • Altera: hard shared paths for floating point (Stratix-V, 2011)
  • "How to get 1-TFLOP throughput on FPGAs" article:
    http://www.eetimes.com/design/programmable-logic/4207687/How-to-achieve-1-trillion-floating-point-operations-per-second-in-an-FPGA
    http://www.altera.com/education/webcasts/all/wc-2011-floating-point-dsp-fpga.html

**Mitigating Fixed Point Disadvantages (21.4)**
• Block floating point (mitigates the range issue):
  • All data in a block share one exponent
  • Periodic rescaling
  • Makes use of fixed-point hardware
  • Useful in applications where data is processed in stages and a limited value range can be placed on each stage (e.g., FFT)

**Next Lecture**
• Review mid-term
• EDK (XPS) tool overview

**Questions/Comments/Concerns**
Write down:
• The main point of the lecture
• One thing that's still not quite clear
• OR, if everything is clear, an example of how to apply something from the lecture

**Lecture Notes**
• Altera app notes on computing FLOPS for Stratix-III or Stratix-IV
• Altera's old app notes on floating-point add/multiply
• Link to a floating-point single-precision calculator
• Block (fixed) floating point: build-slide explanation example
• Numbers comparing CPU/FPGA/GPU floating-point throughput
• Pre-make slides showing examples of the fixed-point advantage for:
  • Representing the precision of a number
  • Precision when adding a convoluted type of number, e.g., 1M.0001

**Lecture Notes**
• Points: the original 286/386 had no floating-point hardware; next came a floating-point coprocessor (on a separate chip); then floating point on the same chip
• Why ripple carry is used over more advanced "high-performing" generate-propagate adders (0.1 for 4 LUTs vs. 0.4 ns for 1 LUT)
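The block floating point idea from the Mitigating Fixed Point Disadvantages slide can be sketched in a few lines: every element of a block is stored as a fixed-point mantissa, and one exponent is kept for the whole block. (The mantissa width and helper names below are illustrative choices, not from any particular library.)

```python
import math

MANT_BITS = 7  # hypothetical per-element mantissa width

def to_block_fp(values):
    """Scale a block so the largest magnitude falls below 1.0, then store
    each element as a fixed-point mantissa plus one shared exponent."""
    peak = max(abs(v) for v in values)
    exp = math.floor(math.log2(peak)) + 1 if peak > 0 else 0
    mants = [round(v / 2 ** exp * (1 << MANT_BITS)) for v in values]
    return exp, mants

def from_block_fp(exp, mants):
    """Reconstruct the real values using the shared exponent."""
    return [m / (1 << MANT_BITS) * 2 ** exp for m in mants]

exp, mants = to_block_fp([3.5, -1.25, 0.5])
print(exp, mants)                 # 2 [112, -40, 16]
print(from_block_fp(exp, mants))  # [3.5, -1.25, 0.5]
```

Between stages of a pipeline such as an FFT, only the single exponent needs rescaling; all per-element arithmetic stays in fixed-point hardware.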