This document provides an overview of floating point numbers, emphasizing their ability to represent both extremely large and extremely small values. It discusses how retaining only as much precision as needed improves computational efficiency, and the significant extra hardware required to handle numbers of 80 bits or more. It describes a sign/exponent/fraction representation that simplifies comparison operations (=, ≠, <, ≥) and works through examples of converting decimal values into floating point format using Horner's method. It also covers the IEEE 754 standard, outlining the single and double precision formats and their significance in modern computing systems.
Muddsar Jamil, CS 147: Floating Point Numbers
Introduction & Representation
• Floating point provides the ability to represent very large numbers as well as very small numbers. Example: 1 trillion needs 40 bits to the left of the radix point.
• Retaining only as much precision as needed increases calculation efficiency.
• A great deal of extra hardware is required to store and manipulate numbers of 80 bits or more.
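As a quick check of the 40-bit figure, here is a minimal Java sketch. The class and method names (`BitsNeeded`, `bitsNeeded`) are illustrative and do not come from the slides.

```java
class BitsNeeded {
    // Number of binary digits needed for the integer part of a non-negative value.
    static int bitsNeeded(long value) {
        return value == 0 ? 1 : 64 - Long.numberOfLeadingZeros(value);
    }

    public static void main(String[] args) {
        long trillion = 1_000_000_000_000L;
        // 2^39 < 10^12 < 2^40, so 40 bits are needed
        System.out.println(bitsNeeded(trillion));
    }
}
```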
Computer Representation of Floating Point Numbers
Format (16 bits): _ _ _ _ . _ _ _ _ _ _ _ _ _ _ _ _
• Sign bit: 0 = +, 1 = −
• 3-bit exponent
• Three base-16 digits (12 bits) for the fraction
This format makes comparison easier: =, ≠, ≤, ≥
Example: Convert (358)₁₀ into the above format to be used as a floating point number.
Java: Float x = Float.valueOf(358.0f);
Example Continued
• First step is to convert 358 from base 10 to base 16.
• Using Horner's method (repeated division by 16, reading the remainders from last to first):
  • 358 / 16 = 22, remainder 6
  • 22 / 16 = 1, remainder 6
  • 1 / 16 = 0, remainder 1
  • (358)₁₀ = (166)₁₆
• Next, convert to floating point and normalize:
  • (166)₁₆ = (166.)₁₆ × 16⁰
  • Normalize: (.166)₁₆ × 16³
• The exponent is 3, but we represent it in excess 4:
      0 1 1   (+3)₁₀
    + 1 0 0   (+4)₁₀  excess 4
    = 1 1 1
• Result:
    0      1 1 1  .  0 0 0 1   0 1 1 0   0 1 1 0
    Sign   Expon.    Fraction
    +      3         1         6         6
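The worked example above can be sketched in Java as follows. This is only an illustration under stated assumptions: the names `ToyFloat`, `toHexDigits`, and `encode` are hypothetical, and the encoder handles only positive integers below 16³, which is all the slide's example needs.

```java
class ToyFloat {
    // Repeated division by 16; remainders come out least significant digit first.
    static String toHexDigits(int n) {
        if (n == 0) return "0";
        StringBuilder sb = new StringBuilder();
        while (n > 0) {
            sb.append(Character.forDigit(n % 16, 16));
            n /= 16;
        }
        return sb.reverse().toString();
    }

    // Pack a positive integer below 16^3 into the 16-bit layout of the slides:
    // 1 sign bit, 3-bit excess-4 exponent, three base-16 fraction digits.
    static int encode(int n) {
        if (n <= 0 || n >= 4096) {
            throw new IllegalArgumentException("need 0 < n < 4096");
        }
        int digits = toHexDigits(n).length();    // exponent after normalizing to .ddd
        int fraction = n << (4 * (3 - digits));  // left-align the digits in 12 bits
        int exponent = digits + 4;               // excess-4 bias
        int sign = 0;                            // positive values only in this sketch
        return (sign << 15) | (exponent << 12) | fraction;
    }

    public static void main(String[] args) {
        System.out.println(toHexDigits(358));                  // 166
        System.out.println(Integer.toHexString(encode(358)));  // 7166 = 0 111 0001 0110 0110
    }
}
```

Because the exponent field sits above the fraction and is stored with a bias, two positive values in this layout can be compared as plain unsigned integers, which is the comparison benefit the slides mention.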
Fractional -> Fixed Point Conversion
• Convert (XYZ.375)₁₀ to binary.
• First, convert XYZ using Horner's method.
• Next, convert (.375)₁₀ as follows:
  • .375 × 2 = 0.75 → 0 (most significant bit)
  • .75 × 2 = 1.5 → 1
  • .5 × 2 = 1.0 → 1 (least significant bit)
• So (.375)₁₀ = (.011)₂
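The repeated multiplication by 2 shown above can be sketched in Java as below. The names `FractionToBinary` and `fractionToBinary`, and the `maxBits` cutoff for fractions that do not terminate in binary, are assumptions added for illustration.

```java
class FractionToBinary {
    // Repeated multiplication by 2: the integer part produced at each step
    // is the next bit, most significant bit first.
    static String fractionToBinary(double fraction, int maxBits) {
        StringBuilder bits = new StringBuilder(".");
        for (int i = 0; i < maxBits && fraction > 0; i++) {
            fraction *= 2;
            if (fraction >= 1) {
                bits.append('1');
                fraction -= 1;
            } else {
                bits.append('0');
            }
        }
        return bits.toString();
    }

    public static void main(String[] args) {
        System.out.println(fractionToBinary(0.375, 16)); // .011
    }
}
```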
IEEE 754 Floating Point Standard
• Created in 1985 to ensure standard representation among different systems.
• Most new architectures support IEEE 754.
• Two formats:
  • Single precision: Sign (1 bit), Expon. (8 bits), Fraction (23 bits) = 32 bits total
  • Double precision: Sign (1 bit), Expon. (11 bits), Fraction (52 bits) = 64 bits total
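To see the single precision fields in practice, the sketch below splits a Java `float` into sign, exponent, and fraction using the standard `Float.floatToIntBits` call. The class and method names are illustrative. The exponent bias of 127 is part of IEEE 754 single precision but is not stated on the slides.

```java
class Ieee754Fields {
    // Top bit: sign (0 = +, 1 = -).
    static int sign(float f) {
        return Float.floatToIntBits(f) >>> 31;
    }

    // Next 8 bits: exponent, stored with a bias of 127.
    static int biasedExponent(float f) {
        return (Float.floatToIntBits(f) >>> 23) & 0xFF;
    }

    // Low 23 bits: fraction (the leading 1 of a normalized value is not stored).
    static int fraction(float f) {
        return Float.floatToIntBits(f) & 0x7FFFFF;
    }

    public static void main(String[] args) {
        float x = 358.0f; // 358 = 1.0110011 (binary) x 2^8
        System.out.println(sign(x));                          // 0
        System.out.println(biasedExponent(x) - 127);          // 8
        System.out.println(Integer.toHexString(fraction(x))); // 330000
    }
}
```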
IEEE 754 Representations
• Can represent (among others):
  • Non-zero, normalized numbers
  • Clean zero
    • All 0s in exponent and fraction
    • Sign bit can be 0 or 1, to represent +0 or −0
  • Infinity / overflow
    • Exponent contains all 1s, fraction is all 0s
    • Sign bit can be 0 or 1
  • NaN (not a number)
    • Exponent contains all 1s, fraction is non-zero
    • Examples: 0 / 0; Sqrt(−1)
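These special values can be inspected directly in Java. The sketch below prints the single precision bit patterns as hexadecimal. The class name `SpecialValues` and the helper `bits` are illustrative, and the NaN pattern shown is the canonical one that `Float.floatToIntBits` returns, not the only possible NaN encoding.

```java
class SpecialValues {
    // Bit pattern of a float as 8 hex digits.
    static String bits(float f) {
        return String.format("%08x", Float.floatToIntBits(f));
    }

    public static void main(String[] args) {
        System.out.println(bits(0.0f));                    // 00000000: all 0s
        System.out.println(bits(-0.0f));                   // 80000000: only the sign bit set
        System.out.println(0.0f == -0.0f);                 // true: +0 and -0 compare equal
        System.out.println(bits(Float.POSITIVE_INFINITY)); // 7f800000: exponent all 1s, fraction 0
        System.out.println(bits(Float.NEGATIVE_INFINITY)); // ff800000

        float nan = 0.0f / 0.0f;                           // 0 / 0 produces NaN
        System.out.println(Float.isNaN(nan));              // true
        System.out.println(Double.isNaN(Math.sqrt(-1)));   // true
        System.out.println(bits(nan));                     // 7fc00000: exponent all 1s, fraction non-zero
    }
}
```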