
## Decimal Floating-Point Arithmetic


**Decimal Floating-Point Arithmetic**
Dongdong Chen
EE800, U of S

**Objectives**
• IEEE 754-2008 standard for Decimal Floating-Point (DFP) arithmetic (Lecture 1)
  • DFP number formats
  • DFP number encoding
  • DFP arithmetic operations
  • DFP rounding modes
  • DFP exception handling

**Objectives (Con.)**
• Algorithm, architecture and VLSI circuit design for DFP arithmetic (Lecture 2)
  • DFP adder/subtracter
  • DFP multiplier
  • DFP divider
  • DFP transcendental function computation

**Background**
“Decimal computer arithmetic went out of style 25 to 30 years ago; no one uses it now.” Is that true?

**Introduction**
• Decimal is still essential for specific applications
  • Numbers in commercial databases are decimal
  • Extensive use of decimal in commercial applications
• Survey of commercial databases report
  • Decimal fixed-point or floating-point numbers
• How to process decimal computation
  • Software computation
  • Convert back to decimal representation
  • Problems

**Introduction (Con.)**
• Errors from decimal and binary conversion
  • Example 1: represent 0.1 in DFP or BFP
    • Decimal representation (BCD code): 0.0001, which is exact
    • Binary representation: 0.00011…, a repeating fraction; truncated, it is only 0.09…
  • Example 2: telephone billing
    • Cost: 0.70; Tax: 5%
    • BFP arithmetic: 0.6999…8 × 1.05 = 0.734999…, which rounds to 0.73
    • DFP arithmetic: 0.70 × 1.05 = 0.735, which rounds to 0.74
• Decimal integer, fixed-point or floating-point?
• Decimal hardware or software solutions?

**Current Research**
• DFP arithmetic is defined in IEEE 754-2008
• IBM computing systems include DFP hardware
  • IBM Power6, z9, z10
• Intel includes a DFP software solution in its systems
  • Intel DFP software computation library
• DFP arithmetic IP blocks:
  • Basic DFP arithmetic IPs: DFP adder/subtracter, multiplier, divider, square root etc.
  • Transcendental DFP arithmetic IPs: DFP CORDIC, logarithm, antilogarithm, reciprocal etc.
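The telephone billing example from the Introduction can be reproduced in software. The sketch below is only an illustration, assuming Python's standard `decimal` module as the decimal arithmetic and round-half-up to the nearest cent as the billing rule; neither choice comes from the slides.

```python
from decimal import Decimal, ROUND_HALF_UP

COST = "0.70"   # call cost from the slide
RATE = "1.05"   # cost plus 5% tax
CENT = Decimal("0.01")

# Binary floating point: 0.70 and 1.05 are not exactly representable,
# so the product lands just below 0.735.
binary_total = float(COST) * float(RATE)

# Decimal floating point: both operands and the product are exact.
decimal_total = Decimal(COST) * Decimal(RATE)

# Round each total to whole cents (round-half-up is an assumed billing rule).
binary_rounded = Decimal(binary_total).quantize(CENT, rounding=ROUND_HALF_UP)
decimal_rounded = decimal_total.quantize(CENT, rounding=ROUND_HALF_UP)

print("binary :", binary_rounded)
print("decimal:", decimal_rounded)
```

`Decimal(binary_total)` converts the binary result exactly, so the rounding step sees the same value the binary hardware produced.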
**DFP Arithmetic in IEEE 754-2008**
• Review BFP arithmetic in IEEE 754-2008
• How to define new DFP in IEEE 754-2008

**BFP Floating-Point Representation**
• Representation:
  • sign, exponent, significand (or mantissa): $(-1)^{sign} \times significand \times 2^{exponent}$
  • more bits for the significand give more accuracy
  • more bits for the exponent increase the range
• IEEE 754 floating-point standard:
  • single precision: 8-bit exponent, 23-bit significand
  • double precision: 11-bit exponent, 52-bit significand

**BFP Floating-Point Number**
• The leading “1” bit of the significand is implicit
• Example: if the stored significand is 011010110…0, the actual significand is 1.011010110…0
• This is called a normalized number; there is exactly one non-zero digit to the left of the point.
• Unique representation of a number
• We get a little more precision: there are 24 bits in the significand, but only 23 of them are stored.

**Exponent**
• Exponent is “biased” to make sorting easier
  • all 0s is the smallest exponent, all 1s is the largest
• The actual exponent is $e - 127$ for single precision and $e - 1023$ for double precision
  • Bias of 127 for single precision and 1023 for double precision
• By biasing the exponent and storing it before the significand, we can compare magnitudes as if they were unsigned integers.
• If $e$ = 1000 0011 ($131_{10}$), the actual exponent is $131 - 127 = 4$
• If $e$ = 0101 1101 ($93_{10}$), the actual exponent is $93 - 127 = -34$

**BFP Floating-Point Formats**

**BFP Floating-Point Formats (Con.)**
• Positive and negative zero: biased exponent 00000000, fraction 00000000000000000000000
• Positive and negative infinity: biased exponent 11111111, fraction 00000000000000000000000
• If exponent = 128 and fraction ≠ 0, the value is called “not a number” or NaN
• [Figure: number line running from negative overflow through expressible negative numbers, negative underflow, 0, positive underflow and expressible positive numbers to positive overflow]

**Example**
• Summary: FP representation $(-1)^{sign} \times (1 + significand) \times 2^{exponent - bias}$
• Example:
  • decimal: $-0.75 = -3/4 = -3/2^{2}$
  • binary: $-0.11 = -1.1 \times 2^{-1}$
  • floating point: exponent = 126 = 01111110
  • IEEE single precision: 1 01111110 10000000000000000000000

**DFP Number Representation**
• Representation:
  • sign, exponent, significand (or mantissa): $(-1)^{sign} \times significand \times 10^{exponent}$
  • more digits for the significand give more accuracy
  • more bits for the exponent increase the range
• DFP formats:
  • decimal32: DFP storage format encoded in 32 bits
  • decimal64: DFP computational format encoded in 64 bits
  • decimal128: DFP computational format encoded in 128 bits

**DFP Number Format**
• 1-bit sign (S), defined the same way as in the BFP format
• (w+5)-bit combination field (G), split into two subfields:
  • 5 bits ($G_0 \ldots G_4$) encode the 2 MSBs of the exponent, the most significant digit (MSD) of the significand, Not-a-Number (NaN) and infinity (Inf)
  • w bits ($G_5 \ldots G_{w+4}$) follow the 2 MSBs derived from $G_0 \ldots G_4$; together they form the (w+2)-bit nonnegative biased exponent
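How the 5-bit subfield $G_0 \ldots G_4$ packs the exponent MSBs, the leading digit and the special values can be sketched as a small decoder. This is a minimal illustration for the densely packed decimal encoding; the function name and the returned tuple layout are hypothetical, not taken from the slides.

```python
def decode_combination(g):
    """Decode the 5-bit field G0..G4 (G0 is the most significant bit).

    Returns (kind, exponent_msbs, leading_digit); the last two are None
    for the special values. Name and return layout are illustrative only.
    """
    if not 0 <= g <= 0b11111:
        raise ValueError("G0..G4 must be a 5-bit value")
    if g == 0b11111:
        return ("NaN", None, None)
    if g == 0b11110:
        return ("Inf", None, None)
    if (g >> 3) != 0b11:
        # G0 G1 are the exponent MSBs, G2 G3 G4 give a leading digit 0..7.
        return ("finite", g >> 3, g & 0b111)
    # G0 G1 = 11: G2 G3 are the exponent MSBs, leading digit is 8 or 9.
    return ("finite", (g >> 1) & 0b11, 0b1000 | (g & 1))


for pattern in (0b01000, 0b11011, 0b11110, 0b11111):
    print(format(pattern, "05b"), decode_combination(pattern))
```

Because the exponent MSBs can only be 00, 01 or 10, the patterns starting with 1111 are free to mark infinity and NaN.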
**DFP Exponent**
• Exponent is “biased” to make sorting easier
• Binary format (not decimal)
• The actual exponent is $e - 101$ for decimal32, $e - 398$ for decimal64 and $e - 6176$ for decimal128
• Range of exponent is $(emin - q + 1) \le e \le (emax - q + 1)$, where $q$ here denotes the number of significand digits

**DFP Number Format (Con.)**
• J×10-bit trailing significand (T) field:
  • Densely packed decimal (DPD) encoding: each 3-digit decimal number is encoded as a 10-bit binary number; DPD is converted to binary coded decimal (BCD) for computation
  • Binary integer decimal (BID) encoding: the decimal number is encoded as a binary integer
• Non-normalized decimal significand: $(-1)^{0} \times 0.00900 \times 10^{2}$ and $(-1)^{0} \times 0.09000 \times 10^{1}$ represent the same value
• DFP number's cohort

**Parameters in DFP Format**

**Example**
• Summary: DFP representation
  • $(-1)^{sign} \times significand \times 10^{exponent - bias}$
• Convert $-835 \times 10^{-2}$ (that is, $-8.35$, with the significand taken as the integer 835) to decimal64
  • Sign bit: “1” negative, “0” positive (sign 1)
  • Exponent: $-2 + 398 = 396$ (10-bit “0110001100”)
  • Significand: 835 (50-bit DPD coding “0…00 02 3D”)
  • Encoding of the 5 MSBs ($G_0 \ldots G_4$) of the combination field: “01000”
  • decimal64: “10100010001100…00…1000111101” in binary, “A2 30 00 00 00 00 02 3D” in hex

**DFP Special Values**
• Not-a-Number: $G_0 \ldots G_4$ = “11111”
• Infinity: $G_0 \ldots G_4$ = “11110”; the sign of Inf is given by the sign bit
• Overflow: occurs if the absolute value of a DFP number is larger than the largest DFP number, $|v_{max}| = (10^{q} - 1) \times 10^{emax - q + 1}$.
• Underflow: occurs if the absolute value of a DFP number is less than the smallest DFP number, $|v_{min}| = 10^{emin - q + 1}$. A number whose absolute value is less than $10^{emin}$ and no smaller than $10^{emin - q + 1}$ is subnormal.
• Normal number: the remaining exponent values and significands represent normal numbers.
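The decimal64 encoding example above (sign 1, significand 835, biased exponent 396) can be checked with a short encoder. The sketch below is an assumption-laden illustration: the function names are hypothetical, only finite numbers in the DPD encoding are handled, and the declet mapping follows the standard densely packed decimal table rather than anything spelled out in the slides.

```python
BIAS = 398  # decimal64 exponent bias


def dpd_encode_declet(d2, d1, d0):
    """Pack three decimal digits (d2 most significant) into a 10-bit declet."""
    a, b, c, d = (d2 >> 3) & 1, (d2 >> 2) & 1, (d2 >> 1) & 1, d2 & 1
    e, f, g, h = (d1 >> 3) & 1, (d1 >> 2) & 1, (d1 >> 1) & 1, d1 & 1
    i, j, k, m = (d0 >> 3) & 1, (d0 >> 2) & 1, (d0 >> 1) & 1, d0 & 1
    # Rows of the DPD table, selected by which digits are "large" (8 or 9).
    table = {
        (0, 0, 0): (b, c, d, f, g, h, 0, j, k, m),
        (0, 0, 1): (b, c, d, f, g, h, 1, 0, 0, m),
        (0, 1, 0): (b, c, d, j, k, h, 1, 0, 1, m),
        (0, 1, 1): (b, c, d, 1, 0, h, 1, 1, 1, m),
        (1, 0, 0): (j, k, d, f, g, h, 1, 1, 0, m),
        (1, 0, 1): (f, g, d, 0, 1, h, 1, 1, 1, m),
        (1, 1, 0): (j, k, d, 0, 0, h, 1, 1, 1, m),
        (1, 1, 1): (0, 0, d, 1, 1, h, 1, 1, 1, m),
    }
    value = 0
    for bit in table[(a, e, i)]:
        value = (value << 1) | bit
    return value


def encode_decimal64_dpd(sign, coefficient, q):
    """Encode (-1)^sign * coefficient * 10^q as a 64-bit integer (finite only)."""
    biased = q + BIAS
    if sign not in (0, 1) or not 0 <= coefficient < 10**16 or not 0 <= biased <= 767:
        raise ValueError("operand outside the decimal64 range")
    digits = [int(ch) for ch in f"{coefficient:016d}"]
    msd, e_hi, e_lo = digits[0], biased >> 8, biased & 0xFF
    if msd <= 7:
        combination = (e_hi << 3) | msd                  # G0G1 = exponent MSBs
    else:
        combination = 0b11000 | (e_hi << 1) | (msd & 1)  # leading digit 8 or 9
    word = (((sign << 5) | combination) << 8) | e_lo
    for pos in range(1, 16, 3):                          # five declets
        word = (word << 10) | dpd_encode_declet(*digits[pos:pos + 3])
    return word


print(f"{encode_decimal64_dpd(1, 835, -2):016X}")
```

Passing `q = -2` corresponds to the biased exponent 396 used in the example, since 396 − 398 = −2.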
**DFP Arithmetic Operations**
• Basic DFP arithmetic operations
• Two decimal-specific DFP operations
  • SameQuantum(DFP1, DFP2)
  • Quantize(DFP1, DFP2)
• DFP comparison operations
  • do not distinguish between redundant representations of the same number
• DFP conversion operations
  • DFP to BFP conversion (correctly rounded)
  • DFP to integer conversion
• Recommended DFP operations

**DFP Number's Cohort**
• Non-normalized decimal significand
• DFP number's cohort
• The standard defines the preferred (required) exponent (quantum)
  • Exact operation results: the cohort member is selected based on the preferred exponent (quantum) for a DFP result of that operation
  • Inexact operation results: the cohort member with the least possible exponent is used, to get the maximum number of significant digits

**DFP Rounding Modes**
• Five rounding modes
  • roundTiesToEven
  • roundTiesToAway
  • roundTowardPositive
  • roundTowardNegative
  • roundTowardZero
• Correct rounding and faithful rounding
  • IEEE 754-2008 requires correctly rounded results for all DFP arithmetic operations
  • DFP operations should support all rounding modes

**DFP Exception Handling**
• Invalid operation: an operand is NaN; 0 × Inf; square root of a negative operand. The default result is NaN.
• Division by zero: the dividend is a finite non-zero number and the divisor is zero. The default result is +Inf or −Inf.
• Overflow operation: the magnitude of a result exceeds the largest finite number representable in the format of the operation.
• Underflow operation: the magnitude of a result is below $10^{emin}$.
• Inexact: the correctly rounded result of an operation differs from the infinite-precision result.

**DFP Addition/Subtraction**

**DFP Add/Sub Data Flow**

**DFP Addition**
• Step 1: equalize the exponents
  • add the mantissas only when the exponents are the same
  • the operand with the smaller exponent has its point shifted to the left, and the operand with the larger exponent has its point shifted to the right
  • rewriting the operand with the smaller exponent can lose its least significant digits
  • keep a guard digit, a round digit and a sticky digit for the operand with the smaller exponent

**DFP Addition**
• Step 2: add the mantissas
  • operands: $0099999 \times 10^{1} + 0016234 \times 10^{-3}$
  • aligned: $0999990 \times 10^{0} + 0000016(234) \times 10^{0}$
  • sum: $1000006(234) \times 10^{0}$
• Step 3: normalize the result if necessary

**DFP Addition**
• Step 4: round the number if needed
  • $1000006(234) \times 10^{0}$ rounds to $1000006 \times 10^{0}$
• Step 5: repeat step 3 if the result is no longer normalized
• The final result is 1000006
• The exact sum is 1000006.234

**Guard Digits**
• To help minimize rounding problems, intermediate steps of an operation keep guard digits: additional internal digits that increase the precision of the operation.
• Previous example: add one extra digit.
• Implementations of IEEE 754-2008 typically keep one guard digit, one round digit and one sticky digit so that the result can be rounded correctly.

**DFP Add/Sub**

**General Description: Addition**

**Example: Addition**

**Example: Addition (Con.)**

**DFU: IBM POWER6 and z10**

**High-Performance Implementation**
[12] A. Vázquez and E. Antelo, “A High-performance Significand BCD Adder with IEEE 754-2008 Decimal Rounding,” ARITH19, Portland, June 8-10, 2009

**Evaluation Results and Comparison**
[Proposed]: A. Vázquez and E.
Antelo, “A High-performance Significand BCD Adder with IEEE 754-2008 Decimal Rounding,” ARITH19, Portland, June 8-10, 2009

**DFP Multiplication**

**Scheme of Decimal Multiplier**
x = 1963, y = 8145; each partial product is formed from two multiples of x:

| Partial product | Multiplier digit | First multiple | Second multiple |
|---|---|---|---|
| xy0 | 5 | 5x = 9815 | 0 = 00000 |
| xy1 | 4 | 5x = 9815 | −x = −1963 |
| xy2 | 1 | x = 1963 | 0 = 00000 |
| xy3 | 8 | 10x = 19630 | −2x = −3926 |

Result: x × y = 15988635

**Partial Product Generation**
• Generate $X Y_i$ for $Y_i \in \{1, 2, 3, \ldots, 7, 8, 9\}$
• $X Y_i$ is in carry-save format

**Partial Product Generation**
• Solid circles: BCD sum (digit)
• Hollow circles: carry (bit)
• n-digit radix-10 CSA
• m-digit radix-10 counter

**Carry Save Adder Tree**
• CSA tree to generate the multiplication result

**Exception Detection & Handling**
• Invalid operation
  • sNaN (pass significand of sNaN)
  • 0 × ∞ (produce qNaN with significand 0)
• Overflow (and Inexact)
  • IEIP − SLA > Emax
  • Increase SLA until all LZs removed
• Underflow (and possibly Inexact)
  • IEIP − SLA < Emin
  • Decrease SLA until 0, then shift right
• Inexact

**Implementation Highlights**
• Leverage operands' LZCs
  • SC, SLA, and IESIP
• Handle NaNs with minimal overhead
  • No dataflow modification
  • Coerce multiplicand or multiplier to 1
• Support gradual underflow
  • No dataflow modification
  • Simply extend number of iterations
• Simple, control-based rounding scheme
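Returning to the scheme of the decimal multiplier, the way each partial product is built from two easy multiples of x can be sketched in a few lines. This is only an illustration: the `RECODE` table and the function name are hypothetical, the entries for digits 1, 4, 5 and 8 follow the example (x = 1963, y = 8145), and the entries for the other digits are assumptions chosen so that the two multiples sum to the digit.

```python
# Multiplier digit -> (first multiple, second multiple) of x.
# Digits 1, 4, 5 and 8 match the slide's example; the rest are assumed.
RECODE = {
    0: (0, 0), 1: (1, 0), 2: (0, 2), 3: (5, -2), 4: (5, -1),
    5: (5, 0), 6: (5, 1), 7: (5, 2), 8: (10, -2), 9: (10, -1),
}


def multiply_by_recoded_digits(x, y):
    """Multiply non-negative integers by summing two multiples of x per digit of y."""
    product = 0
    weight = 1                      # 10^i for multiplier digit y_i
    while y:
        y, digit = divmod(y, 10)    # take digits from least significant
        first, second = RECODE[digit]
        # Hardware would select precomputed multiples (x, 2x, 5x, 10x);
        # here they are simply computed on the fly.
        product += (first * x + second * x) * weight
        weight *= 10
    return product


print(multiply_by_recoded_digits(1963, 8145))
```

In hardware the two multiples per digit would feed a carry-save adder tree instead of being accumulated one at a time as in this loop.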
