
Floating Point


olathe


  1. Floating Point Representation and Arithmetic (see Patterson Chapter 4)

  2. Outline • Review of floating point scientific notation • Floating point binary • IEEE Floating Point Standard • Addition in Floating Point • Remarks about multiplication

  3. Floating Point Notation • Decimal • 12.4568_{ten} (decimal notation) means • 1*10 + 2 + 4/10 + 5/100 + 6/1000 + 8/10000 • In scientific notation • 12.4568 = • 124568 * 10^{-4} = 1245680 * 10^{-5} = • 12456.8 * 10^{-3} = 1245.68 * 10^{-2} = • 124.568 * 10^{-1} = 12.4568 * 10^{0} = • 1.24568 * 10^{1} • 1.24568 * 10^{1} is an example of normalised scientific notation.

  4. Floating Point in Binary • Binary • 0.010011_{two} = (0/2) + (1/2^{2}) + (0/2^{3}) + (0/2^{4}) + (1/2^{5}) + (1/2^{6}) • = 0 + 1/4 + 0 + 0 + 1/32 + 1/64 • = (0.25 + 0.03125 + 0.015625)_{ten} • = 0.296875_{ten} • In scientific notation • 10011 * 2^{-6} = 1001.1 * 2^{-5} = • 100.11 * 2^{-4} = • 1.0011 * 2^{-2} (normalised)
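
The positional sum on this slide can be checked mechanically. The following is a minimal sketch, not part of the original slides; the helper name `binary_fraction_value` is made up for illustration, and exact fractions are used so that no rounding enters the check.

```python
from fractions import Fraction

def binary_fraction_value(bits: str) -> Fraction:
    """Evaluate a binary string such as '0.010011' exactly.

    Hypothetical helper for illustration: the digit at position k after
    the binary point contributes 1 / 2**k when it is 1.
    """
    whole, _, frac = bits.partition(".")
    value = Fraction(int(whole, 2)) if whole else Fraction(0)
    for position, bit in enumerate(frac, start=1):
        if bit == "1":
            value += Fraction(1, 2 ** position)
    return value

value = binary_fraction_value("0.010011")
print(value, float(value))  # 19/64 0.296875

# The normalised form 1.0011 * 2^-2 denotes the same number.
normalised = binary_fraction_value("1.0011") * Fraction(1, 2 ** 2)
print(normalised == value)  # True
```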

  5. Normalised Notation • In normalised binary scientific notation • unless the number is 0 • always have 1.sssssss...sss * 2^{E} • sss...sss is the significand • E is the exponent • The significand s_{1}s_{2}...s_{n} represents the fraction s_{1}/2 + s_{2}/2^{2} + ... + s_{n}/2^{n}

  6. Representation • Note that not every decimal number can be represented exactly in this way (e.g. 0.3, whose binary expansion never terminates) • Problem of representation of floating point numbers in fixed word length • need to represent • sign • significand • exponent • in one word (32 bits).
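
A quick way to see the 0.3 problem is to ask for the exact value that actually gets stored. This sketch is an illustration only and makes one assumption worth stating: Python's `float` is a 64-bit double rather than the 32-bit word discussed in these slides, but the effect is the same in kind.

```python
from fractions import Fraction

# Fraction(x) recovers the exact rational value held by the float x.
stored = Fraction(0.3)

# The stored value is close to 3/10 but not equal to it.
print(stored == Fraction(3, 10))  # False

# Every binary float is an integer divided by a power of two, and
# 3/10 cannot be written with a power-of-two denominator.
denominator = stored.denominator
print(denominator & (denominator - 1) == 0)  # True: a power of two

# By contrast 0.3125 = 5/16 has a power-of-two denominator and is exact.
print(Fraction(0.3125) == Fraction(5, 16))  # True
```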

  7. Representation • Layout of the 32-bit word: • bit 31: sign bit S (1 bit) • bits 30 to 23: exponent E (8 bits) • bits 22 to 0: significand F (23 bits) • Represents floating point number: • (-1)^{S} * (1.0 + F) * 2^{E} • S is 1 bit (if S=1 then negative) • F is 23 bits • E is 8 bits
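
As a rough sketch of how three fields share one 32-bit word, the shifts and masks below follow the bit positions on this slide. The function names `pack_fields` and `unpack_fields` are hypothetical, and the fields are treated as raw unsigned integers with no interpretation attached yet.

```python
def pack_fields(sign: int, exponent: int, fraction: int) -> int:
    """Place S (bit 31), E (bits 30..23) and F (bits 22..0) in one word."""
    if sign not in (0, 1):
        raise ValueError("sign must be 0 or 1")
    if not 0 <= exponent < 2 ** 8:
        raise ValueError("exponent field must fit in 8 bits")
    if not 0 <= fraction < 2 ** 23:
        raise ValueError("significand field must fit in 23 bits")
    return (sign << 31) | (exponent << 23) | fraction

def unpack_fields(word: int):
    """Split a 32-bit word back into its (S, E, F) fields."""
    sign = (word >> 31) & 0x1
    exponent = (word >> 23) & 0xFF
    fraction = word & 0x7FFFFF
    return sign, exponent, fraction

word = pack_fields(1, 0b10000001, 0b0110 << 19)
print(format(word, "032b"))
print(unpack_fields(word) == (1, 0b10000001, 0b0110 << 19))  # True
```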

  8. Squeezing out More from the Bits • Since every non-zero binary f.p. number (normalised) is of the form: • 1.sss...sss * 2^{E} • We do not have to represent explicitly the 1 in the word, and can therefore interpret the bit-pattern as: • (-1)^{S} * (1 + significand) * 2^{E} • thus ‘reclaiming’ an extra bit! • E = 0000 0000 is reserved for zero.

  9. Requirements • As far as possible the ALU should be able to reuse integer machinery in implementation of f.p. • Eg, comparison with zero • easy because of sign bit • fp numbers can be easily classified as negative, zero or positive without additional hardware. • Comparison of two fp numbers x<y not so straightforward - • how are negative exponents to be formed?

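  10. Bad Example: (1/2) > 2 ??? • Suppose the exponent field held a two’s complement number • Representation of 1/2 is • 0.1_{two} = 1.0 * 2^{-1} (normalised) • S = 0, E = 1111 1111, significand = 0000...0000 • Representation of 2 is • 10_{two} = 1.0 * 2^{1} (normalised) • S = 0, E = 0000 0001, significand = 0000...0000 • Compared as integers, the bit pattern for 1/2 is larger than the bit pattern for 2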

  11. Representation of Exponent • Inappropriate to use two’s complement for the exponent • Ideally want 0000 0000 to represent most negative number, 1111 1111 most positive. • Number range: • 1111 1111, 1111 1110, ... : positive exponents • 0111 1111 = 127_{ten}: use this for 2^{0} • 0111 1110, ..., 0000 0000: negative exponents

  12. Biased Representation (IEEE FP Standard) • The ‘bias’ 127 represents 0 • 128 to 255 represent positive exponents (in the full IEEE 754 standard, 255 is reserved for infinities and NaNs) • 1 to 126 represent negative exponents • (remember 0 is reserved for the entire number being zero). • The actual exponent is therefore: • E - bias • (-1)^{S} * (1 + significand) * 2^{E-bias}
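
The reason for the bias can be illustrated by comparing whole words as plain integers. The sketch below is illustrative only: the helper `make_word` is hypothetical, and it builds the words for 1/2 and 2 twice, once with a two's complement exponent field and once with the biased field.

```python
BIAS = 127

def make_word(sign: int, exponent_field: int, fraction: int = 0) -> int:
    """Assemble S | E | F, keeping only the low 8 bits of the exponent."""
    return (sign << 31) | ((exponent_field & 0xFF) << 23) | fraction

# Two's complement exponent: -1 is stored as 1111 1111, +1 as 0000 0001.
half_twos = make_word(0, -1)
two_twos = make_word(0, +1)
print(half_twos > two_twos)  # True: integer comparison says 1/2 > 2

# Biased exponent: -1 is stored as 126, +1 as 128.
half_biased = make_word(0, -1 + BIAS)
two_biased = make_word(0, +1 + BIAS)
print(half_biased < two_biased)  # True: integer order matches real order
```

With the biased field, positive floating point numbers sort in the same order as their bit patterns, which is what lets the integer comparison machinery be reused.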

  13. Example 1 • Represent 0.3125_{ten} = 5/16 • 5/16 = 1/4 + 1/16 = 0.0101_{two} = 1.01 * 2^{-2} • S = 0 • E = ??? • -2 = E - bias = E - 127 • E = 125_{ten} = 0111 1101_{two} • Significand = 0100...000 • 0 0111 1101 010000...000
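
The steps of this example can be replayed in code and compared with the machine's own encoding. This is a sketch under stated assumptions: `encode_by_hand` is a hypothetical helper that handles only non-zero values exactly representable in single precision (it truncates instead of rounding), and `struct` with the `">f"` format is used to obtain the platform's 32-bit encoding for comparison.

```python
import math
import struct

BIAS = 127

def encode_by_hand(x: float) -> int:
    """Build the 32-bit word for a non-zero, exactly representable x."""
    sign = 1 if x < 0 else 0
    # frexp gives abs(x) = mantissa * 2**exp with 0.5 <= mantissa < 1.
    mantissa, exp = math.frexp(abs(x))
    significand = mantissa * 2   # rescale to the 1.sss... form
    exponent = exp - 1
    fraction = int((significand - 1) * 2 ** 23)  # drop the hidden 1
    return (sign << 31) | ((exponent + BIAS) << 23) | fraction

def grouped(word: int) -> str:
    """Format a word as 'S EEEE EEEE FFF...F' for reading."""
    bits = format(word, "032b")
    return f"{bits[0]} {bits[1:5]} {bits[5:9]} {bits[9:]}"

by_hand = encode_by_hand(0.3125)
(by_machine,) = struct.unpack(">I", struct.pack(">f", 0.3125))

print(grouped(by_hand))       # 0 0111 1101 01000000000000000000000
print(by_hand == by_machine)  # True
```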

  14. Example 2 • What does • 0 0111 1101 010000...000 • represent? • S = 0 • E = 0111 1101 = 125_{ten} • Exponent = E - bias = 125 - 127 = -2 • Significand = 1/4 • (-1)^{S} * (1 + sig.) * 2^{E-bias} = (1 + 1/4) * (1/4) = 5/16
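
Going the other way, the decoding formula can be applied directly to a bit string. The sketch below is a simplification that follows the slides: `decode_single` is a hypothetical name, E = 0 is treated as zero as the slides say, and the special cases of the full IEEE 754 standard (subnormals, infinities, NaNs) are deliberately not handled.

```python
BIAS = 127

def decode_single(bits: str) -> float:
    """Apply (-1)^S * (1 + significand) * 2^(E - bias) to 32 bits."""
    bits = bits.replace(" ", "")
    if len(bits) != 32 or set(bits) - {"0", "1"}:
        raise ValueError("expected exactly 32 binary digits")
    sign = int(bits[0])
    exponent_field = int(bits[1:9], 2)
    significand = int(bits[9:], 2) / 2 ** 23   # value of 0.sss...s
    if exponent_field == 0:
        return 0.0   # slide convention: E = 0 is reserved for zero
    return (-1) ** sign * (1 + significand) * 2.0 ** (exponent_field - BIAS)

pattern = "0 0111 1101 01" + "0" * 21
print(decode_single(pattern))  # 0.3125
```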

  15. Addition of FP Numbers • Given two numbers: • normalise them both • shift the significand of the smaller number right until its exponent matches that of the larger one • add the significands • renormalise the sum • check for underflow/overflow of the exponent • if so, stop and signal an exception • round the significand to the required number of bits • might need renormalising again (e.g. 11111 rounded to 4 bits).

  16. Addition Example • 0.5 + 2.75 = 3.25 • 0.1_{two} + 10.11_{two} • 1.0 * 2^{-1} + 1.011 * 2^{1} • 0.010 * 2^{1} + 1.011 * 2^{1} • 1.101 * 2^{1} (already normalised) • (1 + (1/2) + (1/8)) * 2 • 3.25
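
The align, add, renormalise sequence can be sketched on a toy format. Everything here is an assumption made for illustration: numbers are pairs `(sig, exp)` with the leading 1 stored explicitly, `FRACTION_BITS = 3` matches the 1.sss width of this example, only positive operands are handled, and bits shifted out are truncated instead of rounded.

```python
from fractions import Fraction

FRACTION_BITS = 3          # toy width, matching the 1.sss form above
EXP_MIN, EXP_MAX = -126, 127

def to_value(number) -> Fraction:
    """Exact value of (sig, exp): sig / 2**FRACTION_BITS * 2**exp."""
    sig, exp = number
    return Fraction(sig, 2 ** FRACTION_BITS) * Fraction(2) ** exp

def fp_add(a, b):
    """Add two positive, normalised toy numbers."""
    (sig_a, exp_a), (sig_b, exp_b) = a, b
    if exp_a < exp_b:      # make a the operand with the larger exponent
        (sig_a, exp_a), (sig_b, exp_b) = (sig_b, exp_b), (sig_a, exp_a)
    # Align: shift the smaller number right; shifted-out bits are lost.
    sig_b >>= exp_a - exp_b
    total, exp = sig_a + sig_b, exp_a
    # Renormalise: a sum of 2.0 or more needs one more right shift.
    while total >= (2 << FRACTION_BITS):
        total >>= 1
        exp += 1
    if not EXP_MIN <= exp <= EXP_MAX:
        raise OverflowError("exponent out of range")
    return total, exp

half = (0b1000, -1)                 # 1.000 * 2^-1 = 0.5
two_and_three_quarters = (0b1011, 1)  # 1.011 * 2^1 = 2.75
result = fp_add(half, two_and_three_quarters)
print(format(result[0], "04b"), result[1])  # 1101 1
print(to_value(result))                     # 13/4
```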

  17. Remarks • The IEEE FP standard represents single precision floats in 32 bits; higher precision values (doubles) are spread across two words (64 bits). • Multiplication is relatively easy, since the exponents add and the significands can be multiplied using integer multiplication. • There can be huge pitfalls in reliably transferring floating point code to different hardware!
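
The remark about multiplication can be illustrated on a toy format. The assumptions are labelled here and in the comments: numbers are pairs `(sig, exp)` with the leading 1 stored explicitly and an unbiased exponent, `FRACTION_BITS = 3` is an arbitrary small width, signs and range checks are omitted, and extra product bits are truncated instead of rounded.

```python
from fractions import Fraction

FRACTION_BITS = 3   # toy width chosen for illustration

def to_value(number) -> Fraction:
    """Exact value of (sig, exp): sig / 2**FRACTION_BITS * 2**exp."""
    sig, exp = number
    return Fraction(sig, 2 ** FRACTION_BITS) * Fraction(2) ** exp

def fp_multiply(a, b):
    """Multiply two positive, normalised toy numbers."""
    (sig_a, exp_a), (sig_b, exp_b) = a, b
    # Integer multiply: the product carries 2 * FRACTION_BITS fraction
    # bits, so shift back down (truncating the extra bits).
    product = (sig_a * sig_b) >> FRACTION_BITS
    # Exponents simply add. With biased fields E, the stored result
    # would be E_a + E_b - bias, so the bias is not counted twice.
    exp = exp_a + exp_b
    # The product of two values in [1, 2) lies in [1, 4): at most one
    # renormalising shift is needed.
    if product >= (2 << FRACTION_BITS):
        product >>= 1
        exp += 1
    return product, exp

result = fp_multiply((0b1100, 0), (0b1100, 1))   # 1.5 * 3.0
print(format(result[0], "04b"), result[1])       # 1001 2
print(to_value(result))                          # 9/2
```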

  18. Summary • FP scientific notation • normalised representation in binary • Bias to represent -ve to +ve range in exponent • Addition • Notice how a 32-bit binary string can represent many different entities in memory. • Memory architectures NEXT.
