
Floating Point Numbers


Presentation Transcript


  1. Floating Point Numbers Material on Data Representation can be found in Chapter 2 of Computer Architecture (Nicholas Carter)

  2. Fractions • Binary fractions work like the decimal fractions we’re used to: digits to the right of the point stand for negative powers of the base (1/2, 1/4, 1/8, … instead of 1/10, 1/100, …)

  3. Converting decimal to binary II • 98.61 • Integer part • 98 / 2 = 49 remainder 0 • 49 / 2 = 24 remainder 1 • 24 / 2 = 12 remainder 0 • 12 / 2 = 6 remainder 0 • 6 / 2 = 3 remainder 0 • 3 / 2 = 1 remainder 1 • 1 / 2 = 0 remainder 1 • 1100010
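The repeated-division procedure above can be sketched in a few lines of Python; the function name int_to_binary is my own, not from the slides.

```python
# Convert a non-negative integer to binary by repeated division by 2,
# collecting the remainders (which come out least-significant first).
def int_to_binary(n):
    if n == 0:
        return "0"
    bits = []
    while n > 0:
        n, r = divmod(n, 2)   # quotient and remainder
        bits.append(str(r))
    return "".join(reversed(bits))

print(int_to_binary(98))   # 1100010, matching the slide
```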

  4. Converting decimal to binary III • 98.61 • Fractional part • 0.61 × 2 = 1.22 • 0.22 × 2 = 0.44 • 0.44 × 2 = 0.88 • 0.88 × 2 = 1.76 • 0.76 × 2 = 1.52 • 0.52 × 2 = 1.04 • .100111
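The repeated-doubling steps for the fractional part can be sketched the same way; frac_to_binary is a hypothetical helper name.

```python
# Convert a fraction in [0, 1) to `digits` binary places by repeated
# doubling: the whole-number part of each product is the next bit.
def frac_to_binary(frac, digits):
    bits = []
    for _ in range(digits):
        frac *= 2
        bit = int(frac)       # 0 or 1
        bits.append(str(bit))
        frac -= bit           # keep only the fractional part
    return "." + "".join(bits)

print(frac_to_binary(0.61, 6))   # .100111, matching the slide
```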

  5. Another Example (Whole number part) • 123.456 • Integer part • 123 / 2 = 61 remainder 1 • 61 / 2 = 30 remainder 1 • 30 / 2 = 15 remainder 0 • 15 / 2 = 7 remainder 1 • 7 / 2 = 3 remainder 1 • 3 / 2 = 1 remainder 1 • 1 / 2 = 0 remainder 1 • 1111011

  6. Checking: Find Calculator on menu

  7. Put the calculator in Programmer view

  8. Enter the number (in decimal), read off the binary, or put the calculator into binary mode if you want to use copy/paste

  9. Another Example (fractional part) • 123.456 • Fractional part • 0.456 × 2 = 0.912 • 0.912 × 2 = 1.824 • 0.824 × 2 = 1.648 • 0.648 × 2 = 1.296 • 0.296 × 2 = 0.592 • 0.592 × 2 = 1.184 • 0.184 × 2 = 0.368 • … • .0111010…

  10. Convert to decimal mode, then

  11. Ctrl-C to copy the displayed number. Switch to Scientific View. Ctrl-V to paste

  12. Divide by 2 raised to the number of digits (in this case 7, including the leading zero)

  13. Divide by 2 raised to the number of digits (in this case 7, including the leading zero)

  14. Finally, hit the equals sign. In most cases the result will not be exact

  15. Other way around • Multiply the fraction by 2 raised to the desired number of digits in the fractional part. For example • .456 × 2^7 = 58.368 • Throw away the fractional part and represent the whole number • 58 → 111010 • But note that we specified 7 digits and the result above uses only 6. Therefore we need to put in the leading 0 • 0111010
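This one-step shortcut, including the left zero-padding, can be sketched as follows (frac_bits is my own name for it):

```python
# Multiply the fraction by 2**digits, truncate, and zero-pad on the left
# so the result always has exactly `digits` bits.
def frac_bits(frac, digits):
    whole = int(frac * 2**digits)              # e.g. int(0.456 * 128) = 58
    return format(whole, "0%db" % digits)      # pad with leading zeros

print(frac_bits(0.456, 7))   # 0111010, matching the slide
```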

  16. Fixed point • If one has a set number of bits reserved for representing the whole number part and another set number of bits reserved for representing the fractional part of a number, then one is said to be using fixed point representation. • The point dividing whole number from fraction has an unchanging (fixed) place in the number.

  17. Limits of the fixed point approach • Suppose you use 4 bits for the whole number part and 4 bits for the fractional part (ignoring sign for now). • The largest number would be 1111.1111 = 15.9375 • The smallest, non-zero number would be 0000.0001 = .0625
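The two limits quoted above can be checked directly; dividing the 8-bit pattern by 16 moves the point four places left.

```python
# 4.4 fixed point: 4 whole-number bits, 4 fractional bits.
largest = int("11111111", 2) / 16    # 1111.1111
smallest = int("00000001", 2) / 16   # 0000.0001

print(largest)    # 15.9375
print(smallest)   # 0.0625
```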

  18. Floating point representation • Floating point representation allows one to represent a wider range of numbers using the same number of bits. • It is like scientific notation.

  19. Scientific notation • Used to represent very large and very small numbers. • Ex. Avogadro’s number • ≈ 6.0221367 × 10^23 particles • = 602213670000000000000000 • Ex. Fundamental charge e • ≈ 1.60217733 × 10^-19 C • = 0.000000000000000000160217733 C

  20. Scientific notation: all of these are the same number • 12345.6789 = 12345.6789 × 10^0 • 1234.56789 × 10 = 1234.56789 × 10^1 • 123.456789 × 100 = 123.456789 × 10^2 • 12.3456789 × 10^3 • 1.23456789 × 10^4 • Rule: Shift the point to the left and increment the power of ten.

  21. Small numbers • 0.000001234 • 0.00001234 × 10^-1 • 0.0001234 × 10^-2 • 0.001234 × 10^-3 • 0.01234 × 10^-4 • 0.1234 × 10^-5 • 1.234 × 10^-6 • Rule: shift the point to the right and decrement the power.

  22. IEEE 754 standards • The standards for floating point numbers are known as IEEE 754. • Starting with the fixed point binary representation, shift the point and increase the power (of 2, now that we’re in binary). • Like scientific notation, shift so that the number has one non-zero whole-number digit (in binary, not 0, hence 1) and the remainder are fractional bits.

  23. Floats (98.61) • SHIFT the expression so it is between 1 and 2 and keep track of the number of shifts • 1100010.10011100001010001 • 1.10001010011100001010001 × 2^6 • Express the number of shifts in binary • 1.10001010011100001010001 × 2^00000110 • We’re not done yet, so this exponent will change.

  24. Mantissa and Exponent and Sign • 1.10001010011100001010001 × 2^00000110 • Mantissa (a.k.a. significand): 1.10001010011100001010001 • Exponent: 00000110 • The number may be negative, so there is a bit (the sign bit) reserved to indicate whether the number is positive or negative

  25. Small numbers • 0.000010101110 • 1.0101110 × 2^-5 • The power (a.k.a. the exponent) could be negative, so we have to be able to deal with that. • Floating point numbers use a procedure known as biasing to handle the negative exponent problem.

  26. Biasing • Actually, the exponent is not represented as shown previously. • There were 8 bits used to represent the exponent on the previous slide, which means there are 256 numbers that could be represented. • Since the exponent could be negative (to represent numbers less than 1), we choose roughly half of the range to be positive and half to be negative.

  27. Biasing (Cont.) • In biasing, one does not use 2’s complement or a sign bit. • Instead one adds a bias (equal to the magnitude of the most negative number) to the exponent and represents the result of that addition.

  28. Biasing (Cont.) • The exponent of all 1’s is reserved for special purposes – as is the exponent of all 0’s. • Thus with 8 bits, the bias is 127 (= 2^7 − 1, that is, 2 raised to one less than the number of exponent bits, minus 1). • In our previous example, we had to shift 6 times to the left, corresponding to an exponent of +6. • We add that shift to the bias: 127 + 6 = 133. • That is the number we put in the exponent portion: 133 → 10000101.
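The bias arithmetic for the 98.61 example is plain integer work:

```python
# 8-bit exponent field: bias is 2**(8-1) - 1 = 127.
BIAS = 127
shift = 6              # 98.61 was shifted 6 places to the left
stored = BIAS + shift  # the value actually written into the exponent field

print(stored)                  # 133
print(format(stored, "08b"))   # 10000101
```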

  29. Big floats – a quick comparison • Assume we use 8 bits, 4 for the mantissa and 4 for the exponent (neglecting sign). What is the largest float? • Mantissa: 1111 Exponent: 1111 • 0.9375 × 2^7 • = 120 • (Compare this to the largest fixed-point number using the same amount of space, 15.9375)

  30. Small floats – a quick comparison • Assume we use 8 bits, 4 for the mantissa and 4 for the exponent (neglecting sign). What is the smallest float? • Mantissa: 1000 Exponent: 0000 • 0.5 × 2^-8 • = 0.001953125 • (Compare this to the smallest fixed-point number using the same amount of space, .0625)

  31. Mantissa Storage • 1.10001010011100001010001 × 2^00000110 • Mantissa (a.k.a. significand) • Our rules have us starting with 1.something (there are a few exceptions). • The standards come from a time when storage was “expensive” – so why store a digit that is always 1? So the standard does not store the 1 – it is implied.

  32. The pieces • One bit for the sign • Eight bits for the exponent – biased by 127 • Twenty-three bits for the mantissa – which does not include the implied 1 • +98.61 • Sign: 0 • Exponent: 1000 0101 • Mantissa: 1000 1010 0111 0000 1010 001
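One way to check these fields is to pack 98.61 with Python’s struct module and slice the 32 bits apart. Note the stored mantissa comes out one bit higher than the hand result above: IEEE 754 rounds to nearest, while the slides truncate the binary expansion.

```python
import struct

# Pack 98.61 as a 32-bit float and pull the three fields back out.
bits = int.from_bytes(struct.pack(">f", 98.61), "big")
sign = bits >> 31
exponent = (bits >> 23) & 0xFF
mantissa = bits & 0x7FFFFF

print(sign)                      # 0
print(format(exponent, "08b"))   # 10000101
# One ulp above the truncated ...001, because pack() rounds to nearest:
print(format(mantissa, "023b"))  # 10001010011100001010010
```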

  33. https://www.h-schmidt.net/FloatConverter/IEEE754.html

  34. Adding Floats • Consider adding the following numbers expressed in scientific notation: 3.456789 × 10^3 and 1.212121 × 10^-2 • The first step is to re-express the number with the smaller magnitude so that it has the same exponent as the other number.

  35. Adding Floats (Cont.) • 1.212121 × 10^-2 • 0.1212121 × 10^-1 • 0.01212121 × 10^0 • 0.001212121 × 10^1 • 0.0001212121 × 10^2 • 0.00001212121 × 10^3 • The number was shifted 5 times (3 − (−2)).

  36. Adding Floats (Cont.) • When the exponents are equal the mantissas can be added. 3.456789 × 10^3 + 0.00001212121 × 10^3 • = 3.45680112121 × 10^3
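The alignment step can be sketched with the decimal module so the digits match the slides exactly; the variable names are my own.

```python
from decimal import Decimal

# Represent each number as (mantissa, exponent), align the exponents,
# then add the mantissas.
a_mant, a_exp = Decimal("3.456789"), 3
b_mant, b_exp = Decimal("1.212121"), -2

shift = a_exp - b_exp                   # 3 - (-2) = 5 shifts
b_mant = b_mant / Decimal(10) ** shift  # 0.00001212121

total = a_mant + b_mant                 # mantissas add once exponents match
print(f"{total} x 10^{a_exp}")          # 3.45680112121 x 10^3
```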

  37. Rounding • In a computer there are a finite number of bits used to represent a number. • When the smaller floating-point number is shifted to make the exponents equal, some of the less significant bits are lost. • This loss of information (precision) is known as rounding.

  38. One more fine point about floating-point representation • As discussed so far, the mantissa (significand) always starts with a 1. • When storage was expensive, designers opted not to represent this bit, since it is always 1. • It had to be inserted for various operations on the number (adding, multiplying, etc.), but it did not have to be stored.

  39. Still another fine point • When we assume that the mantissa must start with a 1, we lose 0. • Zero is too important a number to lose, so we interpret a mantissa of all zeros with an exponent of all zeros as zero – even though ordinarily we would assume an unstored leading one.

  40. Yet another fine point • In the IEEE 754 format for floats, you bias by one less (127) and reserve the exponents 00000000 and 11111111 for special purposes. • One of these special purposes is “Not a Number” (NaN). • Another is “Infinity”, which is the floating point version of overflow.
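The reserved exponents are easy to see with struct; the helper name exponent_field is my own.

```python
import struct

# Return the 8-bit exponent field of a value packed as a 32-bit float.
def exponent_field(value):
    bits = int.from_bytes(struct.pack(">f", value), "big")
    return format((bits >> 23) & 0xFF, "08b")

print(exponent_field(float("inf")))   # 11111111
print(exponent_field(float("nan")))   # 11111111
print(exponent_field(0.0))            # 00000000 (the other reserved exponent)
```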

  41. An example • Represent -9087.8735 as a float using 23 bits for the mantissa, 8 for the exponent and one for the sign. • The float stores 23 bits but there is an implied bit, so we will talk about 24. • Convert the whole number magnitude 9087 to binary: 10 0011 0111 1111 • That uses up 14 of the 24 bits for the mantissa (23 stored), leaving 10 for the fractional part.

  42. An example (Cont.) • Multiply the fractional part by 2^10, convert the whole-number part of that to binary, and make sure it uses 10 bits (add leading 0’s if it doesn’t). • .8735 × 2^10 = 894.464 • 894 → 1101111110

  43. An example (Cont.) • 10001101111111.1101111110 • 1.00011011111111101111110 × 2^13 • Mantissa: (1)00011011111111101111110 • Exponent: 13 + 127 = 140 → 10001100 • Sign bit: 1 (because the number was negative) • The actual order is sign–exponent–mantissa
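This hand conversion can also be checked against the standard library; here the truncated bits happen to match what round-to-nearest produces, since the next fraction bit is 0.

```python
import struct

# Pack -9087.8735 as a 32-bit float and inspect the three fields.
bits = int.from_bytes(struct.pack(">f", -9087.8735), "big")

print(bits >> 31)                          # 1 (sign)
print(format((bits >> 23) & 0xFF, "08b"))  # 10001100 (140 = 13 + 127)
print(format(bits & 0x7FFFFF, "023b"))     # 00011011111111101111110
```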

  44. Check

  45. Example 2 • 0.0076534 • No whole number part. Begin by using all 24 (sic) mantissa bits for the fractional part. • 0.0076534 × 2^24 = 128402.7449344 • 128402 → 11111010110010010 • This only uses 17 places, which means the number so far starts with 7 zeros. But float mantissas are supposed to start with 1. • .000000011111010110010010 • 1.1111010110010010 × 2^-8 • (But we need more digits for our mantissa)

  46. Example 2 (Cont.) • Shift by 24 + 7 = 31 bits instead: • 0.0076534 × 2^31 = 16435551.3516032 • 16435551 → 1111 1010 1100 1001 0101 1111 • The above is the mantissa • Exponent: 127 − 8 = 119 → 01110111 • Sign bit: 0 (positive number)

  47. Check

  48. Reverse • 10000111101010001100100111110101 • Split into fields: 1 | 00001111 | 01010001100100111110101 • Sign bit is 1, so the number is negative • Exponent: 00001111 = 15 → 15 − 127 (unbias) = −112 • Mantissa: 1.01010001100100111110101

  49. Reverse (Cont.) • 1.01010001100100111110101 × 2^23 / 2^23 • 101010001100100111110101 / 2^23 • 11061749 / 2^23

  50. Reverse (Cont.) • −1.31866323947906494140625 × 2^-112
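The whole decoding can be verified by unpacking the bit pattern with struct and rebuilding the value from the three fields.

```python
import struct

# Decode the 32-bit pattern from the Reverse slides back into a float.
pattern = int("10000111101010001100100111110101", 2)
value = struct.unpack(">f", pattern.to_bytes(4, "big"))[0]

sign = -1 if pattern >> 31 else 1
exponent = ((pattern >> 23) & 0xFF) - 127        # 15 - 127 = -112
mantissa = 1 + (pattern & 0x7FFFFF) / 2**23      # restore the implied 1

print(mantissa)                                  # 11061749 / 2**23
print(value == sign * mantissa * 2**exponent)    # True
```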
