Floating Point CS 147, Peter Budiono
Agenda • History • Basic Terms • General representation of floating point • Constructing a simple floating point representation • Floating Point Arithmetic • The IEEE-754 Floating-Point Standard • Range, Precision, and Accuracy
History • The first floating-point representation was used in the “V1” machine (1945). It had a 7-bit exponent, a 16-bit mantissa, and a sign bit. • In 1954, IBM adopted floating-point representation for its modern computing systems. • In 1962, the UNIVAC 1100/2200 series was introduced; it supported both single precision and double precision.
Basic Terms • Scientific notation: A notation that renders numbers with a single digit to the left of the decimal point. • Normalized: A number in floating-point notation that has no leading 0s. • Floating point: Computer arithmetic that represents numbers in which the binary point is not fixed. • Fraction: The value, between 0 and 1, placed in the fraction field of the floating-point representation. • Exponent: In the numerical representation system of floating-point arithmetic, the value placed in the exponent field.
Constructing a simple floating point representation • We will use a 14-bit model: 1 sign bit, a 5-bit exponent, and an 8-bit significand. • For example, consider storing the decimal number 17 in this model. • In decimal we can write 17 = 0.17 x 10^2. • But to construct a floating-point representation, we first have to convert the number into binary.
17 (decimal) = 10001 (binary) • 10001 = 0.10001 x 2^5 • Now we can construct its representation. The sign field is 0 for a positive value and 1 for a negative value, so here the sign bit is 0, the exponent field holds 5, and the significand field holds 10001000.
What if we want to store a negative exponent value? • The previous scheme can’t handle this, so we fix it by using a biased exponent. • For example, to store 0.25 we have 0.1 x 2^-1. • With excess-16 representation we add 16 to every exponent before storing it, so the exponent -1 is stored as -1 + 16 = 15.
Another problem with this method: we don’t have a unique representation for each number. For example, 17 could be stored as 0.10001 x 2^5, 0.010001 x 2^6, or 0.0010001 x 2^7.
Remedy • This problem can be fixed by normalization. • Normalization is the convention that the leftmost bit of the significand must always be 1. With it, we have only one representation for the decimal value 17: 0.10001 x 2^5.
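The whole procedure (normalize so the leading significand bit is 1, add the excess-16 bias, pack the three fields) can be sketched in Python. This is a minimal illustration of the slides' 14-bit model; the function name `encode14` and the space-separated output format are my own, not part of the slides:

```python
def encode14(value):
    """Pack a nonzero number into the 14-bit model:
    1 sign bit, a 5-bit excess-16 exponent, and an 8-bit significand
    normalized so its leftmost bit is 1 (0.1xxxxxxx x 2^e)."""
    if value == 0:
        raise ValueError("zero has no normalized representation")
    sign = 0 if value > 0 else 1
    frac, e = abs(value), 0
    while frac >= 1:        # shift right until frac < 1
        frac /= 2
        e += 1
    while frac < 0.5:       # shift left until the leading bit is 1
        frac *= 2
        e -= 1
    exponent = e + 16            # excess-16: add the bias before storing
    significand = int(frac * 256)  # keep 8 bits; any extra bits truncate
    return f"{sign:01b} {exponent:05b} {significand:08b}"

print(encode14(17))    # 0 10101 10001000  (17 = 0.10001 x 2^5, 5 + 16 = 21)
print(encode14(0.25))  # 0 01111 10000000  (0.25 = 0.1 x 2^-1, -1 + 16 = 15)
```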
Floating Point Arithmetic • Addition: align the radix points, then add the significands.
  11.00100000
+  0.10011010
-------------
  11.10111010
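The worked sum can be checked with exact integer arithmetic by zero-padding both operands to the same 8 fraction bits and dropping the radix point (a verification sketch, not how a hardware adder works):

```python
# 11.00100000 and 0.10011010, points aligned to 8 fraction bits,
# then read as plain binary integers and added.
a = int('1100100000', 2)   # 11.00100000
b = int('0010011010', 2)   #  0.10011010
s = a + b
print(f'{s:b}')            # 1110111010, i.e. 11.10111010
```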
Multiplication: multiply the significands and add the exponents. • 0.11001000 x 2^2 • 0.10011010 x 2^0 • Significands: 0.11001000 x 0.10011010 = 0.0111100001010000 • Exponents: 2^2 x 2^0 = 2^(2+0) = 2^2 • Result: 0.0111100001010000 x 2^2
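The multiplication can likewise be verified exactly with Python's `fractions` module; the helper `bin_frac` is an illustrative parser I am assuming, not something from the slides:

```python
from fractions import Fraction

def bin_frac(s):
    """Parse a binary string such as '0.11001' into an exact Fraction."""
    whole, _, frac = s.partition('.')
    value = Fraction(int(whole or '0', 2))
    for i, bit in enumerate(frac, start=1):
        if bit == '1':
            value += Fraction(1, 2 ** i)
    return value

a = bin_frac('0.11001000') * 2 ** 2   # 0.11001000 x 2^2
b = bin_frac('0.10011010')            # 0.10011010 x 2^0
print(a * b == bin_frac('0.0111100001010000') * 2 ** 2)  # True
```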
Some other problems in floating point arithmetic • Division by zero. • Overflow: the result is too large in magnitude to fit in the given storage. • Underflow: the result is too small in magnitude (but nonzero) to be represented in the given storage.
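Overflow and underflow are easy to provoke with Python's double-precision floats (a quick illustration, not part of the slides):

```python
import math

# Overflow: a result too large for a double saturates to infinity.
big = 1e308 * 10
print(big, math.isinf(big))   # inf True

# Underflow: a nonzero result far below the smallest positive
# double is flushed to zero.
tiny = 1e-308 / 1e100
print(tiny)                   # 0.0
```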
The IEEE-754 Floating-Point Standard • The standard was first published in 1985. • It defines two basic binary formats: single precision (32 bits) and double precision (64 bits).
The standard defines: • arithmetic formats: sets of binary and decimal floating-point data, which consist of finite numbers (including negative zero and subnormal numbers), infinities, and special “not a number” (NaN) values • interchange formats: encodings (bit strings) that may be used to exchange floating-point data in an efficient and compact form • rounding algorithms: methods to be used for rounding numbers during arithmetic and conversions • operations: arithmetic and other operations on arithmetic formats • exception handling: indications of exceptional conditions (such as division by zero, overflow, and underflow)
Single Precision IEEE-754 • This representation uses an excess-127 exponent. • It assumes an implied 1 to the left of the radix point; for example, 1 = 1.0 x 2^0 is stored with exponent field 0 + 127 = 127.
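The three single-precision fields can be inspected directly by packing a value with Python's `struct` module. The helper name `float_bits` is illustrative, not from the slides:

```python
import struct

def float_bits(x):
    """Split a value packed as IEEE-754 single precision into its
    sign, excess-127 exponent, and 23-bit fraction fields."""
    (bits,) = struct.unpack('>I', struct.pack('>f', x))
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

print(float_bits(1.0))   # (0, 127, 0): true exponent 0 stored as 0 + 127
print(float_bits(17.0))  # exponent field 131: 17 = 1.0001 x 2^4, 4 + 127 = 131
```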
Double Precision IEEE-754 • This representation uses an excess-1023 exponent. • Like single precision, it assumes an implied 1 to the left of the radix point; for example, 1 = 1.0 x 2^0 is stored with exponent field 0 + 1023 = 1023.
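The excess-1023 bias can be confirmed the same way by packing 1.0 as a 64-bit double and extracting its 11-bit exponent field (a quick sketch, not part of the slides):

```python
import struct

# Pack 1.0 as an IEEE-754 double and pull out the 11-bit exponent field.
(bits,) = struct.unpack('>Q', struct.pack('>d', 1.0))
exponent = (bits >> 52) & 0x7FF
print(exponent)   # 1023: the true exponent 0 plus the excess-1023 bias
```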
Range, Precision, and Accuracy • Range: in double precision, for example, expressible negative numbers run from about -1.0 x 10^308 to -1.0 x 10^-308, and expressible positive numbers from about 1.0 x 10^-308 to 1.0 x 10^308. Results larger in magnitude than 1.0 x 10^308 overflow; nonzero results smaller in magnitude than 1.0 x 10^-308 underflow. [Figure: number line showing negative overflow, expressible negative numbers, negative underflow, 0, positive underflow, expressible positive numbers, positive overflow]
Accuracy: how close a number is to its true value. For example, we can’t represent 0.1 exactly in binary floating point, but we can still find a representable number relatively close to 0.1. • Precision: how much information we have about a value, i.e., the amount of information used to represent it. For example, 1.666 has 4 decimal digits of precision and 1.6660 has 5. The second number is more precise, but precision alone does not make a number more accurate.
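The 0.1 example is easy to see in Python, where floats are IEEE-754 doubles; `Decimal` can display the exact value actually stored (an illustration, not part of the slides):

```python
from decimal import Decimal

# 0.1 has no exact binary representation; Python stores the nearest
# double, whose exact value Decimal can display.
print(Decimal(0.1))
# 0.1000000000000000055511151231257827021181583404541015625
print(0.1 + 0.2 == 0.3)  # False: each side rounds to a different double
```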
References • Wikipedia: http://en.wikipedia.org/wiki/IEEE_754 • Computer Organization and Design, by David A. Patterson • Computer Organization and Architecture, by Linda Null