COM181 Computer Hardware Lecture 1b: Floating Point

COM181 Computer Hardware Lecture 1b: Floating Point. Ian McCrum (see pages 242-258, Chapter 3 in textbook). Fixed Point numbers using Binary.

Presentation Transcript


  1. COM181 Computer Hardware, Lecture 1b: Floating Point. Ian McCrum (see pages 242-258, Chapter 3 in textbook)

  2. Fixed Point numbers using Binary
     • Could just use weights of ½, ¼, 1/8, 1/16, e.g. in an 8-bit word have the 4 bits on the left carry weights of 8, 4, 2, 1 and the 4 bits on the right carry weights of 2^-1, 2^-2, 2^-3 and 2^-4.
     • Better to use scientific notation; hence we need to store both a mantissa and an exponent, i.e. a form such as 3.1428 x 10^42 in decimal, and +/- 1.01101101 x 2^(+/-1010101) in binary.
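
The fixed-point weighting on this slide can be sketched in a few lines of Python. This is only an illustration, assuming an unsigned 8-bit word with 4 integer bits and 4 fraction bits; the function name `fixed_4_4_to_float` is hypothetical and not from the lecture.

```python
def fixed_4_4_to_float(byte: int) -> float:
    """Interpret an 8-bit word as unsigned fixed point: 4 integer bits, 4 fraction bits."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected an 8-bit value")
    # The right-hand 4 bits carry weights 1/2, 1/4, 1/8 and 1/16,
    # so dividing the raw integer by 2**4 puts the binary point in place.
    return byte / 16


# 0101.1100 = 4 + 1 + 1/2 + 1/4
print(fixed_4_4_to_float(0b0101_1100))  # 5.75
```

The largest value this layout can hold is 15.9375 and the smallest step is 1/16, which is the limitation that motivates storing a separate exponent.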

  3. IEEE 754 standard for 32 bit floating point.
     • All numbers must be normalised. A normalised binary number begins with a '1'.
     • You need not store the leading '1'.
     • To accommodate negative numbers you use sign-and-magnitude for the mantissa (not two's complement; this makes comparisons easier)
     • and a "biased scheme" for the exponent.
     • This allows circuitry to compare IEEE 754 floating point numbers easily (though it makes it complicated for humans!)
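
The comparison claim can be checked with a small experiment. The sketch below uses Python's standard `struct` module to get the 32-bit pattern of a value, and it assumes non-negative inputs only (for negative values the sign bit reverses the ordering, so hardware handles the sign separately). The helper name `float32_bits` is made up for this example.

```python
import struct


def float32_bits(x: float) -> int:
    """Return the IEEE 754 single-precision bit pattern of x as an unsigned integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


values = [1000.0, 0.5, 5.0, 0.75, 1.25, 1.0]

# For non-negative floats, ordering by raw bit pattern matches ordering by value,
# because the biased exponent sits above the mantissa bits.
by_bits = sorted(values, key=float32_bits)
print(by_bits == sorted(values))  # True
```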

  4. IEEE 754 format
     • Bit 31 is a sign bit; a '1' means a negative number.
     • Bits 30-23 carry an 8-bit exponent.
     • Bits 22-0 carry 23 bits of mantissa, which is effectively 24 bits since we know the MSB is '1'.
     • There is a 64-bit version (see textbook).
     • There are a number of binary patterns that do not represent ordinary numbers; these are useful, e.g. to represent infinity or an invalid answer (NaN).
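
As a rough illustration of this layout, the three fields can be pulled out of a 32-bit pattern with shifts and masks. The sketch relies on Python's standard `struct` module to obtain the bits; the function name `float32_fields` is an assumption of this example.

```python
import struct


def float32_fields(x: float) -> tuple[int, int, int]:
    """Split x, stored as an IEEE 754 single, into (sign, stored exponent, stored mantissa)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31                # bit 31
    exponent = (bits >> 23) & 0xFF   # bits 30-23, still biased
    mantissa = bits & 0x7FFFFF       # bits 22-0, without the implied leading 1
    return sign, exponent, mantissa


print(float32_fields(-0.75))         # (1, 126, 4194304)
print(float32_fields(float("inf")))  # (0, 255, 0)
```

Infinity shows one of the special patterns: an all-ones exponent field with a zero mantissa.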

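  5. Biased Exponent
     • By having the 8-bit exponent be 0x00 for the most negative value, comparisons are easy.
     • Hence you add a bias of 127 to the true exponent and store the result as an unsigned 8-bit value in the range 0x00 to 0xFF.
     • e.g. -1 is stored as -1+127 = 126 = 0111 1110
     • e.g. +1 is stored as 1+127 = 128 = 1000 0000
     • The stored values 0x00 and 0xFF are reserved (for zero and denormalised numbers, and for infinity and NaN), so normalised numbers have true exponents from -126 (stored as 1 = 0000 0001) to +127 (stored as 254 = 1111 1110).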
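
The bias arithmetic is simple enough to sketch directly. The names `BIAS`, `encode_exponent` and `decode_exponent` below are illustrative only; the check merely confirms that the stored value fits in 8 bits.

```python
BIAS = 127  # bias used by the 32-bit format


def encode_exponent(true_exponent: int) -> int:
    """Add the bias to get the unsigned value held in bits 30-23."""
    stored = true_exponent + BIAS
    if not 0 <= stored <= 0xFF:
        raise ValueError("exponent does not fit in 8 bits")
    return stored


def decode_exponent(stored: int) -> int:
    """Subtract the bias to recover the true exponent."""
    return stored - BIAS


print(format(encode_exponent(-1), "08b"))  # 01111110
print(format(encode_exponent(1), "08b"))   # 10000000
print(decode_exponent(0b1000_0001))        # 2
```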

  6. Examples (from textbook)
     -0.75 (base ten) = -3/4 = -3/2^2 = -3 x 2^-2 = -11 (binary) x 2^-2 = -1.1 (binary) x 2^-1
     The sign bit is '1' since the number is negative. The exponent is -1 so we store -1+127 = 126 = 0111 1110. The stored mantissa is a '1' followed by 22 zeros, since we discard the leading '1' of 1.1.
     Bit 31 (sign, 1 bit): 1
     Bits 30-23 (exponent, 8 bits): 0111 1110
     Bits 22-0 (mantissa, 23 bits): 100 0000 0000 0000 0000 0000
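
The encoding of -0.75 can be assembled by hand and cross-checked against the bit pattern Python's `struct` module produces. This is a sketch for normalised numbers only; `assemble_float32` is a hypothetical helper, not part of any library.

```python
import struct


def assemble_float32(sign: int, true_exponent: int, mantissa: int) -> int:
    """Pack sign, true exponent and the 23 stored mantissa bits into one 32-bit word."""
    stored_exponent = true_exponent + 127  # apply the bias
    return (sign << 31) | (stored_exponent << 23) | mantissa


# -0.75 = -1.1 (binary) x 2^-1: sign 1, exponent -1, stored mantissa 100...0
by_hand = assemble_float32(1, -1, 0b1 << 22)
from_struct = struct.unpack(">I", struct.pack(">f", -0.75))[0]

print(format(by_hand, "032b"))  # 10111111010000000000000000000000
print(by_hand == from_struct)   # True
```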

  7. Examples
     Bit 31 (sign): 1
     Bits 30-23 (exponent): 1000 0001
     Bits 22-0 (mantissa): 010 0000 0000 0000 0000 0000
     The sign bit is '1', so the number is negative. The exponent is 1000 0001, which is 129; we subtract the 127 bias to get +2. The stored mantissa is 01000...; we add the implied '1' to get 1.01 (binary) = 1 + 0 x ½ + 1 x ¼ = 1.25. Hence -1.25 x 2^+2 = -5.0 (base ten).
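
The decoding steps on this slide can be written out as a short function. It is a minimal sketch that handles normalised numbers only and rejects the reserved exponent patterns; the name `decode_float32` is an assumption of this example.

```python
def decode_float32(bits: int) -> float:
    """Decode a normalised IEEE 754 single-precision bit pattern by hand."""
    sign = bits >> 31
    stored_exponent = (bits >> 23) & 0xFF
    stored_mantissa = bits & 0x7FFFFF
    if stored_exponent in (0x00, 0xFF):
        raise ValueError("reserved exponent: not a normalised number")
    # Restore the implied leading 1, then weight the 23 stored bits as a fraction.
    mantissa = 1 + stored_mantissa / 2**23
    value = mantissa * 2.0 ** (stored_exponent - 127)
    return -value if sign else value


# 1 | 1000 0001 | 010 0000 0000 0000 0000 0000
print(decode_float32(0b1_10000001_01000000000000000000000))  # -5.0
```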
