Floating Point Numbers

Muddsar JamilCS 147 Floating Point Numbers

Introduction & Representation • Provides the ability to represent very large numbers, as well as very small numbers.Example: 1 Trillion = 40 bits to left of radix pt. • Retaining as much precision as needed increase calculation efficiency. • A great deal of extra hardware is required in order to store/manipulate numbers with 80 bits or more.

Computer Representation of Floating Point Numbers _ _ _ _ . _ _ _ _ _ _ _ _ _ _ _ _ This format makes for easier comparison: =, =/=, <=, ≥ Example: Convert (358)10 in to the above format to be used as a floating point number.Java: Float x = new Float(358.0f)‏ Sign Bit0 = +1 = - Three base 16 digits 3-bitexponent

Example Continued • First step is to convert 358 from base 10 to 16. • Using Horner's method: • 358/16 = 22 --- R 6 • 22/16 = 1 --- R 6 • 35810 = 16616 • Next, convert to floating-point and Normalize • (166)10 = (166.)16 x 160 • Normalize: ( .166 )16 x 163 • The exponent is 3, but we represent it in excess 4: • 0 1 1 (+3)10 • Excess 4 + 1 0 0 (+4)10 • = 1 1 1 • 0 1 1 1 . 0 0 0 1 0 1 1 0 0 1 1 0 • + 3 1 6 6 Sign Expon. Fraction

Fractional -> Fixed Point Conversion • Convert (XYZ.375)10 to Binary • First, convert XYZ using Horner's method. • Next, Convert the .37510 as following: • .375 x 2 = 0.75 • .75 x 2 = 1.5 • .5 x 2 = 1.0 • So (.375)10 = (.011)2 Most Significant BitLeast Significant Bit

IEEE 754 Floating Point Standard • Created in 1985 to ensure standard representation among different systems. • Most new architectures support IEEE 754. • Two Formats: • Single Precision 1 8 23 • Double Precision =32 bits total Sign Expon. Fraction =64 bits 1 11 52 Sign Expon. Fraction

IEEE 754 Representations • Can Represent (among others)‏ • Non-zero, normalized numbers • Clean zero • All 0s in exponent and fraction • Sign bit can be 0, or 1, to represent +0 or -0 • Infinity / Overflow / NaN • Exponent contains all 1s, Fraction is all 0s • Sign bit can be 0, or 1 • 0 / 0; • Sqrt(-1);

Floating Point Numbers