Decimal Floating-Point Arithmetic

Dongdong Chen, EE800, U of S

Objectives
• IEEE 754-2008 standard for Decimal Floating-Point (DFP) arithmetic (Lecture 1)
  • DFP number formats
  • DFP number encoding
  • DFP arithmetic operations
  • DFP rounding modes
  • DFP exception handling

Objectives (Con.)
• Algorithm, architecture, and VLSI circuit design for DFP arithmetic (Lecture 2)
  • DFP adder/subtracter
  • DFP multiplier
  • DFP divider
  • DFP transcendental function computation

Background
• "Decimal computer arithmetic went out of style 25 to 30 years ago; no one uses it now." Is that true?

Introduction
• Decimal is still essential for specific applications
  • Numbers in commercial databases are decimal
  • Decimal is used extensively in commercial applications
  • Surveys of commercial databases report decimal fixed-point or floating-point numbers
• How is decimal computation processed?
  • In software
  • By converting to binary, computing, and converting back to the decimal representation
  • Both approaches have problems

Introduction (Con.)
• Errors from decimal-binary conversion
  • Example 1: represent 0.1 in DFP or BFP
    • Decimal representation (BCD code): 0.0001
    • Binary representation: 0.00011… (non-terminating), so a finite binary fraction only approximates it, e.g. 0.09… when truncated
  • Example 2: telephone billing, cost 0.70, tax 5%
    • BFP arithmetic: 0.6999…8 × 1.05 = 0.734999…, which rounds to 0.73
    • DFP arithmetic: 0.70 × 1.05 = 0.735, which rounds to 0.74
• Decimal integer, fixed-point, or floating-point?
• Decimal hardware or software solutions?
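Both examples can be reproduced with Python's standard `decimal` module, used here as a stand-in for DFP arithmetic (an assumption: the module follows the General Decimal Arithmetic specification rather than modelling decimal64 hardware). Half-up rounding to cents is assumed as the billing rule.

```python
from decimal import Decimal, ROUND_HALF_UP

# Example 1: 0.1 is exact in decimal but not in binary.
exact_binary_value = Decimal(0.1)   # exact value of the nearest binary double
decimal_value = Decimal("0.1")      # exact decimal value
print(exact_binary_value)           # 0.1000000000000000055511151231257827021181583404541015625
print(decimal_value)                # 0.1

# Example 2: cost 0.70 plus 5% tax, rounded to cents.
bfp_total = round(0.70 * 1.05, 2)   # binary double: product lies slightly below 0.735
dfp_total = (Decimal("0.70") * Decimal("1.05")).quantize(
    Decimal("0.01"), rounding=ROUND_HALF_UP)  # exact decimal product 0.7350
print(bfp_total, dfp_total)         # 0.73 0.74
```

The one-cent difference comes entirely from the binary approximation of 0.70 and 1.05, not from the rounding rule.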

Current Research
• DFP arithmetic is defined in IEEE 754-2008
• IBM computing systems include DFP hardware
  • IBM POWER6, z9, z10
• Intel provides a DFP software solution
  • Intel DFP software computation library
• DFP arithmetic IP blocks:
  • Basic DFP arithmetic IPs: DFP adder/subtracter, multiplier, divider, square root, etc.
  • Transcendental DFP arithmetic IPs: DFP CORDIC, logarithm, antilogarithm, reciprocal, etc.

DFP Arithmetic in IEEE 754-2008
• Review of BFP arithmetic in IEEE 754-2008
• How DFP is defined in IEEE 754-2008

BFP Floating-Point Representation
• Representation:
  • sign, exponent, significand (or mantissa): $(-1)^{sign} \times significand \times 2^{exponent}$
  • more bits for the significand give more accuracy
  • more bits for the exponent increase the range
• IEEE 754 floating-point standard:
  • single precision: 8-bit exponent, 23-bit significand (fraction)
  • double precision: 11-bit exponent, 52-bit significand (fraction)

BFP Floating-Point Number
• The leading "1" bit of the significand is implicit
  • Example: if the stored significand is 011010110…0, the actual significand is 1.011010110…0
• This is called a normalized number: there is exactly one non-zero digit to the left of the point
  • It gives a unique representation of a number
  • We get a little more precision: there are 24 bits in the significand, but only 23 of them are stored

Exponent
• The exponent is "biased" to make sorting easier
  • all 0s is the smallest exponent and all 1s the largest (both patterns are reserved for special values)
  • The actual exponent is e − 127 for single precision and e − 1023 for double precision
  • Bias of 127 for single precision and 1023 for double precision
• By biasing the exponent and storing it before the significand, we can compare magnitudes as if they were unsigned integers
• If e = 1000 0011 (131 in decimal), the actual exponent is 131 − 127 = 4
• If e = 0101 1101 (93 in decimal), the actual exponent is 93 − 127 = −34
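The bias arithmetic and the unsigned-comparison property can be checked in a few lines, assuming Python's standard `struct` module to read the IEEE single-precision bit pattern; the helper names are hypothetical.

```python
import struct

def float32_bits(x: float) -> int:
    """IEEE 754 single-precision bit pattern of x, as an unsigned 32-bit int."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

def actual_exponent(biased: int, bias: int = 127) -> int:
    """Remove the bias from a stored exponent field."""
    return biased - bias

print(actual_exponent(0b10000011))  # 4
print(actual_exponent(0b01011101))  # -34

# For non-negative numbers a larger value has a larger bit pattern,
# so magnitudes can be compared as unsigned integers.
values = [0.0, 1e-30, 0.5, 1.0, 3.75, 1e20]
patterns = [float32_bits(v) for v in values]
print(patterns == sorted(patterns))  # True
```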

BFP Floating-Point Formats

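BFP Floating-Point Formats (Con.)
• Positive and negative zero: biased exponent all 0s, fraction all 0s (the sign bit gives +0 or −0)
• Positive and negative infinity: biased exponent all 1s (1111 1111), fraction all 0s
• Biased exponent all 1s (actual exponent 128) with fraction ≠ 0 is called "not a number" (NaN)
• Range of normalized single-precision numbers:
  • negative overflow: below $-(2 - 2^{-23}) \times 2^{127}$
  • expressible negative numbers: from $-(2 - 2^{-23}) \times 2^{127}$ to $-2^{-126}$
  • negative and positive underflow: between $-2^{-126}$ and $2^{-126}$, excluding 0 (filled in part by subnormal numbers)
  • expressible positive numbers: from $2^{-126}$ to $(2 - 2^{-23}) \times 2^{127}$
  • positive overflow: above $(2 - 2^{-23}) \times 2^{127}$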

Example
• Summary: FP representation $(-1)^{sign} \times (1 + fraction) \times 2^{exponent - bias}$
• Example:
  • decimal: $-0.75 = -3/4 = -3/2^{2}$
  • binary: $-0.11_2 = -1.1_2 \times 2^{-1}$
  • floating point: biased exponent = −1 + 127 = 126 = 01111110
  • IEEE single precision: 1 01111110 10000000000000000000000
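The same fields can be pulled out of the actual bit pattern of −0.75. This is a sketch assuming Python's `struct` module for the single-precision encoding; `decode_float32` is a hypothetical helper.

```python
import struct

def decode_float32(x: float) -> tuple[int, int, int]:
    """Split the single-precision pattern of x into (sign, biased exponent, fraction)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

sign, biased_exp, fraction = decode_float32(-0.75)
print(sign, f"{biased_exp:08b}", f"{fraction:023b}")  # 1 01111110 10000000000000000000000

# Rebuild the value with (-1)^sign x (1 + fraction) x 2^(exponent - bias)
value = (-1) ** sign * (1 + fraction / 2**23) * 2.0 ** (biased_exp - 127)
print(value)  # -0.75
```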

DFP Number Representation
• Representation:
  • sign, exponent, significand (or mantissa): $(-1)^{sign} \times significand \times 10^{exponent}$
  • more digits for the significand give more accuracy
  • more bits for the exponent increase the range
• DFP formats:
  • decimal32: DFP storage format encoded in 32 bits
  • decimal64: DFP computational format encoded in 64 bits
  • decimal128: DFP computational format encoded in 128 bits

DFP Number Format
• 1-bit sign (S), defined the same as in the BFP format
• (w+5)-bit combination field (G), split into two subfields:
  • 5 bits (G0…G4) encode the 2 MSBs of the exponent and the most significant digit (MSD) of the significand, or Not-a-Number (NaN) and infinity (Inf)
  • w bits (G5…G(w+4)) form the exponent continuation: appended after the 2 exponent MSBs derived from G0…G4, they give the (w+2)-bit nonnegative biased exponent

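DFP Exponent
• The exponent is biased so that it is stored as a nonnegative integer
• It is stored in binary (not decimal)
• The actual exponent is E − 101 for decimal32, E − 398 for decimal64, and E − 6176 for decimal128, where E is the stored biased exponent
• Range of the exponent: $e_{min} - q + 1 \le e \le e_{max} - q + 1$, where q is the precision in digits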
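The biases follow from the exponent range. A small sketch, assuming the standard parameters q = 7, 16, 34 digits and emax = 96, 384, 6144 for decimal32/64/128 (emin = 1 − emax), with e taken as the exponent applied to the integer significand; the function name is hypothetical.

```python
def dfp_exponent_params(q: int, emax: int) -> dict:
    """Exponent parameters of a DFP format with precision q digits and maximum exponent emax."""
    emin = 1 - emax
    e_low = emin - q + 1          # smallest exponent of the integer significand
    e_high = emax - q + 1         # largest exponent of the integer significand
    bias = -e_low                 # equals emax + q - 2, so biased exponents start at 0
    return {"bias": bias, "e_range": (e_low, e_high), "max_biased": e_high + bias}

for name, q, emax in [("decimal32", 7, 96), ("decimal64", 16, 384), ("decimal128", 34, 6144)]:
    print(name, dfp_exponent_params(q, emax))
```

The maximum biased exponents 191, 767, and 12287 are each $3 \times 2^{w} - 1$ (w = 6, 8, 12), so the two exponent MSBs carried in G0…G4 only take the values 00, 01, and 10.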

DFP Number Format (Con.)
• J×10-bit trailing significand (T) field, in one of two encodings:
  • Densely packed decimal (DPD) encoding: each 3-digit decimal number is encoded as a 10-bit binary number; DPD converts easily to and from binary-coded decimal (BCD)
  • Binary integer decimal (BID) encoding: the decimal significand is encoded as a binary integer
• Non-normalized decimal significand: $(-1)^{0} \times 0.00900 \times 10^{2}$ and $(-1)^{0} \times 0.09000 \times 10^{1}$ both represent 0.9
• Such equivalent representations form the DFP number's cohort
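The DPD step can be sketched directly from the IEEE 754-2008 encoding table: the BCD digits d1 = abcd, d2 = efgh, d3 = ijkm are packed into 10 bits, and the three "large digit" indicators a, e, i (digit ≥ 8) select the row. The function name is hypothetical, and only canonical declets are produced (the standard also accepts some non-canonical declets on input).

```python
def dpd_encode(n: int) -> int:
    """Encode a 3-digit decimal number (0..999) as a 10-bit DPD declet."""
    if not 0 <= n <= 999:
        raise ValueError("n must be in 0..999")
    d1, d2, d3 = n // 100, n // 10 % 10, n % 10
    # BCD bits of each digit; a, e, i are the most significant bits.
    a, b, c, d = (d1 >> 3) & 1, (d1 >> 2) & 1, (d1 >> 1) & 1, d1 & 1
    e, f, g, h = (d2 >> 3) & 1, (d2 >> 2) & 1, (d2 >> 1) & 1, d2 & 1
    i, j, k, m = (d3 >> 3) & 1, (d3 >> 2) & 1, (d3 >> 1) & 1, d3 & 1
    rows = {  # (a, e, i) -> the 10 output bits, most significant first
        (0, 0, 0): (b, c, d, f, g, h, 0, j, k, m),
        (0, 0, 1): (b, c, d, f, g, h, 1, 0, 0, m),
        (0, 1, 0): (b, c, d, j, k, h, 1, 0, 1, m),
        (1, 0, 0): (j, k, d, f, g, h, 1, 1, 0, m),
        (1, 1, 0): (j, k, d, 0, 0, h, 1, 1, 1, m),
        (1, 0, 1): (f, g, d, 0, 1, h, 1, 1, 1, m),
        (0, 1, 1): (b, c, d, 1, 0, h, 1, 1, 1, m),
        (1, 1, 1): (0, 0, d, 1, 1, h, 1, 1, 1, m),
    }
    declet = 0
    for bit in rows[(a, e, i)]:
        declet = (declet << 1) | bit
    return declet

print(f"{dpd_encode(835):03X}")  # 23D
```

When all three digits are small (0 to 7), the declet is just the three 3-bit digit values with a 0 in the middle, which is why DPD-to-BCD conversion needs only a few gates.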

Parameters in DFP Format

Example
• Summary: DFP representation $(-1)^{sign} \times significand \times 10^{exponent - bias}$, with an integer significand
• Convert $-835 \times 10^{-2}$ (that is, −8.35) to decimal64
  • Sign bit: "1" negative, "0" positive (here 1)
  • Exponent: −2 + 398 = 396 (10-bit "0110001100")
  • Significand: 835 (50-bit DPD trailing significand "0…00 02 3D")
  • 5 MSBs (G0…G4) of the combination field: exponent MSBs "01" followed by leading digit 0 ("000"), giving "01000"
  • decimal64: "1 01000 10001100 0…0 1000111101" (binary) = "A2 30 00 00 00 00 02 3D" (hex)
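The encoding steps can be checked by packing the fields in software. This sketch assumes the decimal64 parameters (16 digits, bias 398, 8-bit exponent continuation), handles finite numbers only, and repeats a compact DPD helper so it runs on its own; all names are hypothetical. It reproduces the slide's hex pattern from sign 1, significand 835, and exponent −2.

```python
def dpd_encode(n: int) -> int:
    """Encode 0..999 as a 10-bit DPD declet (IEEE 754-2008 table)."""
    d1, d2, d3 = n // 100, n // 10 % 10, n % 10
    a, b, c, d = (d1 >> 3) & 1, (d1 >> 2) & 1, (d1 >> 1) & 1, d1 & 1
    e, f, g, h = (d2 >> 3) & 1, (d2 >> 2) & 1, (d2 >> 1) & 1, d2 & 1
    i, j, k, m = (d3 >> 3) & 1, (d3 >> 2) & 1, (d3 >> 1) & 1, d3 & 1
    rows = {
        (0, 0, 0): (b, c, d, f, g, h, 0, j, k, m), (0, 0, 1): (b, c, d, f, g, h, 1, 0, 0, m),
        (0, 1, 0): (b, c, d, j, k, h, 1, 0, 1, m), (1, 0, 0): (j, k, d, f, g, h, 1, 1, 0, m),
        (1, 1, 0): (j, k, d, 0, 0, h, 1, 1, 1, m), (1, 0, 1): (f, g, d, 0, 1, h, 1, 1, 1, m),
        (0, 1, 1): (b, c, d, 1, 0, h, 1, 1, 1, m), (1, 1, 1): (0, 0, d, 1, 1, h, 1, 1, 1, m),
    }
    declet = 0
    for bit in rows[(a, e, i)]:
        declet = (declet << 1) | bit
    return declet

BIAS = 398  # decimal64 exponent bias

def encode_decimal64(sign: int, coeff: int, exp: int) -> int:
    """Pack sign, integer significand (at most 16 digits) and exponent into decimal64 (DPD)."""
    biased = exp + BIAS
    if not (0 <= coeff < 10**16 and 0 <= biased <= 767):
        raise ValueError("value not representable in decimal64")
    digits = f"{coeff:016d}"
    msd = int(digits[0])                           # leading digit goes into G0..G4
    e_msbs, e_cont = biased >> 8, biased & 0xFF    # 2 exponent MSBs, 8-bit continuation
    if msd <= 7:
        g0_4 = (e_msbs << 3) | msd                 # exponent MSBs, then 3-bit digit
    else:
        g0_4 = 0b11000 | (e_msbs << 1) | (msd & 1)  # "11", exponent MSBs, digit LSB (8 or 9)
    combination = (g0_4 << 8) | e_cont             # 13-bit combination field
    trailing = 0                                   # 50-bit trailing significand
    for pos in range(1, 16, 3):                    # five 3-digit declets
        trailing = (trailing << 10) | dpd_encode(int(digits[pos:pos + 3]))
    return (sign << 63) | (combination << 50) | trailing

print(f"{encode_decimal64(1, 835, -2):016X}")  # A23000000000023D
```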

DFP Special Values
• Not-a-Number (NaN): G0…G4 = "11111"
• Infinity: G0…G4 = "11110"; the sign of Inf follows the sign bit
• Overflow: occurs if the absolute value of a result is larger than the largest DFP number, $|v_{max}| = (10^{q} - 1) \times 10^{e_{max} - q + 1}$
• Underflow: occurs if the absolute value of a result is less than the smallest DFP number, $|v_{min}| = 10^{e_{min} - q + 1}$. A number whose absolute value is at least $10^{e_{min} - q + 1}$ but less than $10^{e_{min}}$ is subnormal.
• Normal numbers: the remaining exponent values and significands represent normal numbers
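The bounds can be evaluated for decimal64 with Python's `decimal` module, assuming q = 16 and emax = 384 (emin = −383). `Context.Etiny()` returns emin − q + 1 and `Context.Etop()` returns emax − q + 1; the variable names are illustrative.

```python
from decimal import Context, Decimal

# Assumed decimal64 parameters: q = 16 digits, emax = 384, emin = 1 - emax
q, emax = 16, 384
emin = 1 - emax
ctx = Context(prec=q, Emax=emax, Emin=emin, clamp=1)

v_max = Decimal((0, (9,) * q, emax - q + 1))   # (10^q - 1) x 10^(emax - q + 1)
v_min = Decimal((0, (1,), emin - q + 1))       # 10^(emin - q + 1): smallest subnormal
n_min = Decimal((0, (1,), emin))               # 10^emin: smallest normal magnitude

print(v_max, v_min, n_min)            # 9.999999999999999E+384 1E-398 1E-383
print(ctx.Etiny(), ctx.Etop())        # -398 369
for x in (v_min, Decimal("1E-390"), n_min):
    print(x, x.is_subnormal(ctx))     # True, True, False respectively
```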

DFP Arithmetic Operations
• Basic DFP arithmetic operations
• Two decimal-specific DFP operations:
  • SameQuantum(DFP1, DFP2)
  • Quantize(DFP1, DFP2)
• DFP comparison operations
  • do not distinguish between redundant representations of the same number
• DFP conversion operations
  • DFP to BFP conversion (correctly rounded)
  • DFP to integer conversion
• Recommended DFP operations
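Python's `decimal` module has methods corresponding to these operations; treating `same_quantum` and `quantize` as stand-ins for SameQuantum and Quantize is an assumption of this sketch, not a claim about any DFP hardware interface.

```python
from decimal import Decimal, ROUND_HALF_EVEN

a, b = Decimal("1.0"), Decimal("1.00")   # two members of the same cohort
print(a == b)                            # True: comparison ignores the exponent
print(a.same_quantum(b))                 # False: exponents -1 and -2 differ
print(Decimal("2.50").same_quantum(Decimal("7.25")))   # True: both have exponent -2

# Quantize: give the first operand the exponent of the second, rounding if needed
print(Decimal("1.23456").quantize(Decimal("0.01"), rounding=ROUND_HALF_EVEN))  # 1.23
print(Decimal("7").quantize(Decimal("0.001")))          # 7.000

# Conversions
print(float(Decimal("0.1")) == 0.1)      # True: decimal -> binary, correctly rounded
print(int(Decimal("-7.9")))              # -7: decimal -> integer, truncating toward zero
```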

DFP Number's Cohort
• Non-normalized decimal significand
• DFP number's cohort: the set of representations of the same value with different exponents
• The standard defines the preferred exponent (quantum) for each operation
  • Exact operation results: the cohort member is selected based on the preferred exponent (quantum) for a DFP result of that operation
  • Inexact operation results: the cohort member with the least possible exponent is used, to get the maximum number of significant digits
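Python's `decimal` module follows the same preferred-exponent rules, so it can illustrate cohort selection. The sketch assumes a 16-digit working precision (as in decimal64); the exponent rules named in the comments are the module's, used here as an illustration.

```python
from decimal import Decimal, getcontext

getcontext().prec = 16   # assumed 16-digit working precision, as in decimal64

# Three members of the cohort of 0.9: same value, different exponents
cohort = [Decimal((0, (9,) + (0,) * k, -1 - k)) for k in range(3)]
print(cohort)                                   # [Decimal('0.9'), Decimal('0.90'), Decimal('0.900')]
print(all(m == cohort[0] for m in cohort))      # True

# Exact results take the preferred exponent of the operation
print(Decimal("1.0") + Decimal("2.00"))   # 3.00: minimum of the operand exponents
print(Decimal("1.5") * Decimal("2.0"))    # 3.00: sum of the operand exponents
print(Decimal("1.00") / Decimal("2"))     # 0.50: dividend exponent minus divisor exponent

# Inexact results use the least exponent, keeping all 16 digits
print(Decimal(1) / Decimal(3))            # 0.3333333333333333
```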

DFP Rounding Modes
• Five rounding-direction attributes:
  • roundTiesToEven
  • roundTiesToAway
  • roundTowardPositive
  • roundTowardNegative
  • roundTowardZero
• Correct rounding and faithful rounding
  • IEEE 754-2008 requires correctly rounded results for all basic DFP arithmetic operations
  • DFP operations should support all five rounding modes
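Python's `decimal` module provides all five directions under different names. The mapping below is an assumption worth checking against the module documentation; note in particular that `ROUND_HALF_UP` rounds ties away from zero.

```python
from decimal import (Decimal, ROUND_CEILING, ROUND_DOWN, ROUND_FLOOR,
                     ROUND_HALF_EVEN, ROUND_HALF_UP)

# Assumed mapping from IEEE 754-2008 rounding-direction names to decimal constants
MODES = {
    "roundTiesToEven": ROUND_HALF_EVEN,
    "roundTiesToAway": ROUND_HALF_UP,       # ties go away from zero
    "roundTowardPositive": ROUND_CEILING,
    "roundTowardNegative": ROUND_FLOOR,
    "roundTowardZero": ROUND_DOWN,
}

def round_to_integer(x: str) -> dict:
    """Round x to an integer under each of the five rounding directions."""
    return {name: Decimal(x).quantize(Decimal("1"), rounding=mode)
            for name, mode in MODES.items()}

for x in ("2.5", "-2.5", "2.7"):
    print(x, {name: str(r) for name, r in round_to_integer(x).items()})
```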

DFP Exception Handling
• Invalid operation: an operand is a signaling NaN; 0 × Inf; square root of a negative operand; the default result is a quiet NaN
• Division by zero: the dividend is a finite non-zero number and the divisor is zero; the default result is +Inf or −Inf
• Overflow: the magnitude of a result exceeds the largest finite number representable in the format of the operation
• Underflow: the magnitude of a non-zero result is below $10^{e_{min}}$
• Inexact: the correctly rounded result of an operation differs from the infinitely precise result
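The five exceptions can be observed as status flags in Python's `decimal` module. This sketch assumes a decimal64-like context with every trap disabled, so each exception only sets a flag and returns the default result; the `run` helper is hypothetical.

```python
from decimal import (Context, Decimal, DivisionByZero, Inexact,
                     InvalidOperation, Overflow, Underflow)

# decimal64-like context with every trap disabled, so exceptions only set flags
ctx = Context(prec=16, Emax=384, Emin=-383, traps=[])
SIGNALS = (InvalidOperation, DivisionByZero, Overflow, Underflow, Inexact)

def run(op, *args):
    """Apply one context operation; return its result and the names of raised flags."""
    ctx.clear_flags()
    result = op(*args)
    return result, {sig.__name__ for sig in SIGNALS if ctx.flags[sig]}

cases = {
    "0 x Inf": (ctx.multiply, Decimal(0), Decimal("Infinity")),
    "sqrt(-4)": (ctx.sqrt, Decimal(-4)),
    "sNaN + 1": (ctx.add, Decimal("sNaN"), Decimal(1)),
    "qNaN + 1": (ctx.add, Decimal("NaN"), Decimal(1)),        # quiet NaN propagates, no flag
    "5 / 0": (ctx.divide, Decimal(5), Decimal(0)),
    "vmax x 10": (ctx.multiply, Decimal("9.999999999999999E+384"), Decimal(10)),
    "1E-383 x 1E-20": (ctx.multiply, Decimal("1E-383"), Decimal("1E-20")),
    "1 / 3": (ctx.divide, Decimal(1), Decimal(3)),
}
for label, (op, *args) in cases.items():
    print(label, *run(op, *args))
```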

DFP Addition/Subtraction

DFP Add/Sub Data flow

DFP Addition
• Step 1: equalize the exponents
  • add the significands only when the exponents are the same
  • the significand of the number with the larger exponent is shifted left (its point moves right), and the significand of the number with the smaller exponent is shifted right (its point moves left)
  • rewriting the operand with the smaller exponent can lose its least significant digits
  • keep a guard digit, a round digit, and a sticky digit for the operand with the smaller exponent

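DFP Addition (Con.)
• Step 2: add the significands

$$
\begin{aligned}
0099999 \times 10^{1} &\rightarrow 0999990 \times 10^{0} \\
0016234 \times 10^{-3} &\rightarrow 0000016(234) \times 10^{0} \\
\text{sum} &= 1000006(234) \times 10^{0}
\end{aligned}
$$

• Step 3: normalize the result if necessary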

DFP Addition (Con.)
• Step 4: round the number if needed: $1000006(234) \times 10^{0}$ rounds to $1000006 \times 10^{0}$ (guard digit 2 < 5)
• Step 5: repeat Step 3 if the result is no longer normalized
• The final (7-digit) result is 1000006; the exact sum is 1000006.234
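The five steps can be traced in a short sketch. It assumes non-negative operands given as an integer significand and an exponent, a 7-digit precision `P`, and roundTiesToEven. Unlike the slide, it shifts the larger-exponent operand left as far as its leading zeros allow, which gives the same result here. Subtraction, signs, and special values are left out, and `dfp_add` is a hypothetical name.

```python
P = 7  # working precision in digits (assumed, matches the 7-digit example)

def dfp_add(c1: int, e1: int, c2: int, e2: int) -> tuple[int, int]:
    """Add two non-negative DFP operands c1 x 10^e1 + c2 x 10^e2.

    Returns (coefficient, exponent) rounded to P digits with roundTiesToEven."""
    # Step 1: equalize the exponents; operand 1 gets the larger exponent.
    if e1 < e2:
        c1, e1, c2, e2 = c2, e2, c1, e1
    # Shift the larger-exponent significand left while it has leading zeros.
    while e1 > e2 and c1 * 10 < 10**P:
        c1, e1 = c1 * 10, e1 - 1
    # Shift the smaller-exponent significand right, keeping two extra digits
    # (guard and round) and a sticky flag for anything shifted out beyond them.
    small, lost = divmod(c2 * 100, 10 ** (e1 - e2))
    sticky = lost != 0
    # Step 2: add the aligned significands (both carry two extra low digits).
    total = c1 * 100 + small
    # Step 3: normalize if the sum carried into digit P + 1.
    if total >= 10**P * 100:
        total, dropped = divmod(total, 10)
        sticky = sticky or dropped != 0
        e1 += 1
    # Step 4: round to nearest, ties to even, using guard/round/sticky.
    coeff, extra = divmod(total, 100)
    if extra > 50 or (extra == 50 and (sticky or coeff % 2 == 1)):
        coeff += 1
    # Step 5: rounding can carry into a new digit; normalize again.
    if coeff == 10**P:
        coeff, e1 = coeff // 10, e1 + 1
    return coeff, e1

print(dfp_add(99999, 1, 16234, -3))   # (1000006, 0), as on the slide
```

The two extra digits and the sticky flag are what let Step 4 distinguish "below half", "exactly half", and "above half" of the last retained digit.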

Guard Digits
• To help minimize rounding problems, intermediate steps of an operation keep guard digits: additional internal digits that increase the precision of the operation
• In the previous example, the digits 234 shifted out of the smaller operand are such extra digits
• IEEE 754-2008 requires correctly rounded results; keeping one guard digit, one round digit, and one sticky digit is a common way to achieve this

DFP add/sub

General Description: Addition

Example: Addition

Example: Addition (Con.)

DFU (Decimal Floating-Point Unit): IBM POWER6 and z10

High-Performance Implementation

High-Performance Implementation
[12] A. Vázquez and E. Antelo, "A High-Performance Significand BCD Adder with IEEE 754-2008 Decimal Rounding," ARITH 19, Portland, June 8-10, 2009.

Evaluation Results and Comparison
[Proposed]: A. Vázquez and E. Antelo, "A High-Performance Significand BCD Adder with IEEE 754-2008 Decimal Rounding," ARITH 19, Portland, June 8-10, 2009.

DFP Multiplication

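Scheme of Decimal Multiplier
• Example: $x = 1963$, $y = 8145$; each partial product is built from the easy multiples $x$, $2x$, $5x$, $10x$:
  • $xy_0$ ($y_0 = 5$): $5x = 9815$
  • $xy_1$ ($y_1 = 4$): $5x - x = 9815 - 1963$
  • $xy_2$ ($y_2 = 1$): $x = 1963$
  • $xy_3$ ($y_3 = 8$): $10x - 2x = 19630 - 3926$
• Adding the partial products, each shifted by its digit position: $x \times y = 15988635$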
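The digit recoding in the example (4 = 5 − 1, 8 = 10 − 2) can be expressed as $y_i = 5u + l$ with $u \in \{0, 1, 2\}$ and $l \in \{-2, \ldots, 2\}$. The sketch below assumes that split (consistent with the four digits shown, but not necessarily the exact recoding of any particular design) and uses plain integers in place of BCD hardware; the function names are hypothetical.

```python
def recode_digit(d: int) -> tuple[int, int]:
    """Split a decimal digit d (0..9) as d = 5*u + l, u in {0, 1, 2}, l in {-2..2}."""
    u = (d + 2) // 5
    return u, d - 5 * u

def partial_products(x: int, y: int) -> list[int]:
    """Weighted partial products x * y_i * 10^i, each built only from the
    easy multiples x, 2x, 5x, 10x (and negatives of x, 2x)."""
    easy = {k: k * x for k in (-2, -1, 0, 1, 2)}   # -2x .. 2x
    upper = {0: 0, 1: 5 * x, 2: 10 * x}            # 0, 5x, 10x
    products = []
    for i, ch in enumerate(reversed(str(y))):      # least significant digit first
        u, l = recode_digit(int(ch))
        products.append((upper[u] + easy[l]) * 10**i)
    return products

pp = partial_products(1963, 8145)
print(pp, sum(pp))   # [9815, 78520, 196300, 15704000] 15988635
```

Only $2x$ and $5x$ need to be precomputed, since $x$ is free and $10x$ is a one-digit shift; this is what keeps the multiple-generation step small in hardware.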

Partial Product Generation
• Generate $xy_i$ for each multiplier digit $y_i \in \{1, 2, 3, \ldots, 7, 8, 9\}$
• Each $xy_i$ is in carry-save format

Partial Product Generation (Con.)
• Solid circles: BCD sum (digit); hollow circles: carry (bit)
• n-digit radix-10 CSA
• m-digit radix-10 counter
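The sum-digit/carry-bit representation can be illustrated with a simplified software model: each position holds a BCD sum digit and a carry bit, and adding another BCD operand never propagates carries across positions. This is an illustration under that assumption, not the CSA or counter circuit of the cited designs; all names are hypothetical.

```python
def to_digits(n: int, width: int) -> list[int]:
    """BCD digits of n, least significant first."""
    return [(n // 10**i) % 10 for i in range(width)]

def csa_step(sum_digits, carry_bits, operand):
    """Add one BCD operand to a (sum digit, carry bit) pair without carry propagation.

    At each position s + c + a <= 9 + 1 + 9 = 19, so the result splits into a
    new BCD sum digit and a single carry bit for the next position."""
    new_sum, new_carry = [], [0]
    for s, c, a in zip(sum_digits, carry_bits, operand):
        t = s + c + a
        new_sum.append(t % 10)
        new_carry.append(t // 10)
    if new_carry.pop():                   # carry out of the top digit
        raise OverflowError("width too small")
    return new_sum, new_carry

def carry_save_value(sum_digits, carry_bits):
    """Final carry-propagate step: collapse the redundant pair to an integer."""
    return sum((s + c) * 10**i for i, (s, c) in enumerate(zip(sum_digits, carry_bits)))

width = 10
partials = [9815, 78520, 196300, 15704000]   # weighted partial products of 1963 x 8145
s, c = [0] * width, [0] * width
for p in partials:
    s, c = csa_step(s, c, to_digits(p, width))
print(carry_save_value(s, c))                # 15988635
```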

Carry-Save Adder Tree
• CSA tree to generate the multiplication result

Flowchart of DFP Multiplier

Architecture of DFP Multiplier

Exception Detection & Handling
• Invalid operation
  • sNaN (pass the significand of the sNaN)
  • 0 × ∞ (produce a qNaN with significand 0)
• Overflow (and Inexact)
  • IEIP − SLA > Emax
  • Increase SLA until all LZs are removed
• Underflow (and possibly Inexact)
  • IEIP − SLA < Emin
  • Decrease SLA until 0, then shift right
• Inexact

Implementation Highlights
• Leverage operands' LZCs
  • SC, SLA, and IESIP
• Handle NaNs with minimal overhead
  • No dataflow modification
  • Coerce multiplicand or multiplier to 1
• Support gradual underflow
  • No dataflow modification
  • Simply extend the number of iterations
• Simple, control-based rounding scheme
