

ITEC 1000 “Introduction to Information Technology”. Lecture 5: Floating Point Numbers. www.governmentauctions.org. Prof. Peter Khaiter. Lecture Template: Floating Point Numbers, Exponential Notation, Excess-50 Notation, Overflow and Underflow, Floating Point Calculations

## Lecture 5


**Lecture Template:**
• Floating Point Numbers
• Exponential Notation
• Excess-50 Notation
• Overflow and Underflow
• Floating Point Calculations
• Normalization in Floating Point
• IEEE 754 Standard
• Packed Decimal Format
• Programming Considerations

**Floating Point Numbers**
• Real numbers
• Used in the computer when the number
  • is outside the integer range of the computer (too large or too small), or
  • contains a decimal fraction
• The range of r in PCs: … or more

**Exponential Notation**
The following are equivalent representations of 1,234:
• $123{,}400.0 \times 10^{-2}$
• $12{,}340.0 \times 10^{-1}$
• $1{,}234.0 \times 10^{0}$
• $123.4 \times 10^{1}$
• $12.34 \times 10^{2}$
• $1.234 \times 10^{3}$
• $0.1234 \times 10^{4}$

The representations differ only in where the decimal point sits. The point “floats” to the left or right, and the exponent is adjusted to compensate.

**Exponential Notation**
• Also called scientific notation
• 4 specifications are required for a number:
  • Sign (“+” in the example)
  • Magnitude, or mantissa (12345)
  • Sign of the exponent (“+” in $10^{5}$)
  • Magnitude of the exponent (5)
• Plus:
  • Base of the exponent (10)
  • Location of the decimal point (or, in other bases, the radix point)

**Parts of a Floating Point Number**
In $-0.9876 \times 10^{-3}$:
• Sign of mantissa: “-”
• Location of decimal point: before the first mantissa digit
• Mantissa: 9876
• Base: 10
• Sign of exponent: “-”
• Exponent: 3

**Floating Point Format Specification**
• Integer format (8-digit word): 7 decimal digits and a sign
  • Range: $-9{,}999{,}999 \le I \le +9{,}999{,}999$
• Floating point format (8-digit word): sign, two exponent digits, five mantissa digits

**Format**
• Mantissa: stored in sign-magnitude format
• The decimal point is assumed to be located at the beginning of the mantissa
• Exponent: stored in excess-N notation (a complementary notation)
  • Pick the middle value of the range as the offset N: for the range 0..99, this gives excess-50

**Excess-50 Notation**
• Excess-N representation: $R = N + EE$
• Example 1: $N = 50$, $EE = 38$, $R = 88$
• Example 2: $N = 50$, $EE = -38$, $R = 12$
• Excess-50: stored values 00 to 99 represent exponents -50 to +49

**Overflow and Underflow**
• A number may be too large (overflow) or too small (underflow) to represent
• Example: the smallest nonzero magnitude in this format is $0.00001 \times 10^{-50} = 10^{-55}$

**Floating Point Format: Excess-50**
• The first digit represents the sign of the mantissa:
  • 0 is used as a “+” sign
  • 5 is used as a “-” sign (an arbitrary choice)
• The next two digits represent the exponent in excess-50
• The last five digits represent the mantissa, with a fixed decimal point at its beginning

**Normalization**
• Shift the mantissa left, decreasing the exponent, until leading zeros are eliminated
• Converting a decimal number into standard format:
  • Provide the number with an exponent (0 if not yet specified)
  • Increase or decrease the exponent to shift the decimal point to the proper position
  • Decrease the exponent to eliminate leading zeros on the mantissa
  • Correct the precision by adding 0s or discarding/rounding the least significant digits

**Example 1: 246.8035**
• $246.8035 = 246.8035 \times 10^{0} = 0.2468035 \times 10^{3}$
• Sign: 0 (positive)
• Excess-50 exponent: $3 + 50 = 53$
• Mantissa (five digits): 24680
• Result: 0 53 24680

**Floating Point Calculations**
• Addition and subtraction:
  • Exponent and mantissa are treated separately
  • The exponents of the numbers must agree, so the decimal points are aligned
  • Least significant digits may be lost
  • Mantissa overflow requires shifting the mantissa right and increasing the exponent

**Example**
[Figure: addition example; precision lost]

**Multiplication and Division**
• Mantissas: multiplied or divided
• Exponents: added or subtracted
• Normalization is necessary to
  • restore the location of the decimal point
  • maintain the precision of the result
• Adjust the excess value, since it is added twice:
  • Example: 2 numbers with exponent 53 represented in excess-50 notation (actual exponent 3 each)
  • $53 + 53 = 106$
  • Since 50 was added twice, subtract it once: $106 - 50 = 56$ (actual exponent $3 + 3 = 6$)
• Maintaining precision:
  • [Figure: normalizing and rounding multiplication]

**Floating Point in the Computer**
• Replace digits with “0” and “1” bits
• Typical floating point format:
  • 32 bits provide a range of about $10^{-38}$ to $10^{+38}$
  • 8-bit exponent = 256 levels
  • Excess-128 notation
  • 23 bits of mantissa: approximately 7 decimal digits of precision

**IEEE 754 Standard**
• The most common standard for representing floating point numbers
• Single precision: 32 bits, consisting of:
  • Sign bit (1 bit)
  • Exponent (8 bits)
  • Mantissa (23 bits)
• Double precision: 64 bits, consisting of:
  • Sign bit (1 bit)
  • Exponent (11 bits)
  • Mantissa (52 bits)

**Single Precision Format (32 bits)**
Sign of mantissa (1 bit) | Exponent (8 bits) | Mantissa (23 bits)

**Double Precision Format (64 bits)**
Sign of mantissa (1 bit) | Exponent (11 bits) | Mantissa (52 bits)

**IEEE 754 Standard**
[Figure: 32-bit floating point value definition]

**Normalization in Floating Point**
• Mantissa:
  • Must always start with “1”
  • The leading bit is not stored
  • It is implied to be located to the left of the binary point
  • Normalized form: 1.MMMMMMM…
  • E.g.: the stored mantissa 10100000000000000000000 has the actual value $1.101_2 = 1.625_{10}$
• Exponent:
  • Formatted using excess-127 notation
  • Base 2 is implied
  • Range: $2^{-126}$ to $2^{127}$

**Excess Notation: Example**
Represent an exponent of $14_{10}$ in excess-127 form:

$$
\begin{aligned}
127_{10} &= +01111111_2\\
14_{10} &= +00001110_2\\
\text{Representation} &= 10001101_2 = 141_{10}
\end{aligned}
$$

**Excess Notation: Example**
Represent an exponent of $-8_{10}$ in excess-127 form:

$$
\begin{aligned}
127_{10} &= +01111111_2\\
-8_{10} &= -00001000_2\\
\text{Representation} &= 01110111_2 = 119_{10}
\end{aligned}
$$

**Single Precision: Example**
0 10000010 11000000000000000000000
• Sign bit 0: positive mantissa
• Exponent: $10000010_2 = 130$, and $130 - 127 = 3$
• Mantissa: $1.11_2 = 1.75_{10}$
• Value: $+1.75 \times 2^{3} = 14.0$, or equivalently $+1.11_2 \times 2^{3} = +1110.0_2 = 14$

**Single Precision: Exercise**
• What decimal value is represented by the following 32-bit floating point number?
  1 10000010 11110110000000000000000
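The decoding steps from the worked example can be sketched in Python: read the sign bit, remove the excess-127 bias, and add the implied leading 1 to the mantissa. This is a minimal illustration, not a complete IEEE 754 implementation, and it makes a few assumptions:

* The function names and the input convention are hypothetical. The pattern is a string of 0/1 characters, with spaces allowed.
* Only normalized numbers are handled (stored exponent 1 to 254). Zero, denormals, infinities and NaN are out of scope.
* The second function is only a cross-check. It uses the standard-library `struct` module to reinterpret the same 32 bits as a big-endian binary32 float.

```python
import struct


def decode_single(bits: str) -> float:
    """Decode a normalized IEEE 754 single-precision pattern by hand.

    Hypothetical input convention: 32 characters of '0'/'1', optionally
    separated by spaces, e.g. "0 10000010 11000000000000000000000".
    """
    bits = bits.replace(" ", "")
    if len(bits) != 32 or set(bits) - {"0", "1"}:
        raise ValueError("expected exactly 32 binary digits")

    sign = -1.0 if bits[0] == "1" else 1.0      # 1 sign bit
    stored_exponent = int(bits[1:9], 2)         # 8 exponent bits
    if stored_exponent in (0, 255):
        raise ValueError("zero, denormals, infinities and NaN are not handled")
    exponent = stored_exponent - 127            # undo the excess-127 bias

    mantissa = 1.0                              # implied leading "1."
    for position, bit in enumerate(bits[9:], start=1):  # 23 stored mantissa bits
        if bit == "1":
            mantissa += 2.0 ** -position        # adds 0.5, 0.25, 0.125, ...

    return sign * mantissa * 2.0 ** exponent


def decode_with_struct(bits: str) -> float:
    """Cross-check: let the standard library interpret the same 32 bits."""
    raw = int(bits.replace(" ", ""), 2).to_bytes(4, "big")
    return struct.unpack(">f", raw)[0]


if __name__ == "__main__":
    example = "0 10000010 11000000000000000000000"
    print(decode_single(example))       # 14.0
    print(decode_with_struct(example))  # 14.0
```

The hand-written loop mirrors the slides' sum of powers of two (1 + .5 + .25 + …). `decode_with_struct` confirms the result but does not show the steps.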
**Answer:** -15.6875 (1 10000010 11110110000000000000000)

**Step by Step Solution**
1 10000010 11110110000000000000000 to decimal form:
• Exponent: $130 - 127 = 3$
• Mantissa with the implied leading 1: $1.11110110000000000000000_2$
• $1 + 0.5 + 0.25 + 0.125 + 0.0625 + 0 + 0.015625 + 0.0078125 = 1.9609375$
• $1.9609375 \times 2^{3} = 15.6875$
• Sign bit 1: -15.6875 (negative)

**Step by Step Solution: Alternative Method**
1 10000010 11110110000000000000000 to decimal form:
• Exponent: $130 - 127 = 3$
• Mantissa with the implied leading 1: $1.11110110000000000000000_2$
• Shift the point 3 places right: $1111.10110000000000000000_2 = 15.6875$
• Sign bit 1: -15.6875 (negative)

**Exercise: Floating Point Conversion**
• Express 3.14 as a 32-bit floating point number (Note: use only 10 significant bits for the mantissa)
• Answer: 0 10000000 10010001111000000000000

**Detail Solution: 3.14 to IEEE Single Precision**
• 3.14 to binary (approx.): $11.001000111101_2$
• Normalize: $1.1001000111101_2 \times 2^{1}$
• Delete the implied leftmost “1”: mantissa bits 1001000111101 (prove it!)
• Exponent: 127 + 1 (the number of places the point moved during normalization) = $128 = 10000000_2$
• The value is positive: sign bit = 0
• Result: 0 10000000 10010001111010000000000
• For comparison, keeping all 23 mantissa bits (rounded to nearest) gives 0 10000000 10010001111010111000011

**Packed Decimal Format**
• Limited use: e.g., where precision is particularly important, as in accounting and business functions
• Similar to BCD: each digit uses a four-bit representation, as in BCD
• Stores two digits per byte
• Supported by business-oriented languages such as COBOL
• Implemented in IBM System 370/390 and Compaq Alpha

**Packed Decimal Format**
• Each decimal digit is stored in BCD, two digits per byte
• The most significant digit is stored first, in the high-order bits of the first byte
• Up to 31 digits can be stored in 16 bytes
• The sign is stored in the low-order bits of the last byte:
  • Binary 1100 represents “+”
  • Binary 1101 represents “-”
  • Binary 1111 represents an unsigned number
• The decimal point is not stored: it must be maintained by application software

**Packed Decimal Format: Example 1**
Decimal value: 10357, unsigned

| Byte 1 | Byte 2 | Byte 3 |
|---|---|---|
| 0001 0000 | 0011 0101 | 0111 1111 |

**Packed Decimal Format: Example 2**
Decimal value: -90413

| Byte 1 | Byte 2 | Byte 3 |
|---|---|---|
| 1001 0000 | 0100 0001 | 0011 1101 |

**Integer vs. Floating Point: Programming Considerations**
• Integer advantages:
  • Easier for the computer to perform
  • Potential for higher precision
  • Faster to execute
  • Fewer storage locations, saving time and space
• Most high-level languages provide 2 or more different integer word sizes/formats:
  • Short integer (16 bits)
  • Long integer (64 bits)

**Integer vs. Floating Point: Programming Considerations**
• Use real numbers if:
  • the variable or constant has a fractional part, or
  • the numbers take on very large or very small values outside the integer range
• A program should use the least precision sufficient for the task
• Higher precision formats require more storage
• Packed decimal is an attractive alternative for business applications

**Thank you!**
Reading: Lecture slides and notes, Chapter 5
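As a companion to the packed decimal slides, here is a minimal Python sketch of the encoding and decoding. The function names are hypothetical, and the sketch makes these assumptions:

* It uses the sign codes listed on the slides: 1100 for “+”, 1101 for “-” and 1111 for unsigned.
* When the digit count is even, it pads with a leading zero digit so the sign lands in the low-order bits of the last byte.
* Like the format itself, it does not store the decimal point, so the caller must track the scale.

```python
from typing import Optional

# Sign nibbles as listed on the Packed Decimal Format slide.
SIGN_NIBBLES = {"+": 0b1100, "-": 0b1101, None: 0b1111}


def to_packed_decimal(digits: str, sign: Optional[str] = None) -> bytes:
    """Pack decimal digits two per byte, most significant first, sign nibble last.

    sign is "+", "-", or None (unsigned). The decimal point is not stored.
    """
    if not digits or not set(digits) <= set("0123456789") or len(digits) > 31:
        raise ValueError("expected 1 to 31 decimal digits")
    nibbles = [int(d) for d in digits] + [SIGN_NIBBLES[sign]]
    if len(nibbles) % 2:                  # assumption: pad with a leading 0 digit
        nibbles.insert(0, 0)
    return bytes((hi << 4) | lo for hi, lo in zip(nibbles[::2], nibbles[1::2]))


def from_packed_decimal(data: bytes) -> int:
    """Unpack to an integer; a last nibble of 1101 means negative."""
    nibbles = [n for b in data for n in (b >> 4, b & 0x0F)]
    *digit_nibbles, sign_nibble = nibbles
    if any(n > 9 for n in digit_nibbles):
        raise ValueError("invalid BCD digit")
    value = int("".join(str(n) for n in digit_nibbles))
    return -value if sign_nibble == 0b1101 else value


def as_bits(data: bytes) -> str:
    """Show each byte as two 4-bit groups, as on the slides."""
    return " ".join(f"{b >> 4:04b} {b & 0x0F:04b}" for b in data)


if __name__ == "__main__":
    print(as_bits(to_packed_decimal("10357")))       # 0001 0000 0011 0101 0111 1111
    print(as_bits(to_packed_decimal("90413", "-")))  # 1001 0000 0100 0001 0011 1101
```

The two printed lines reproduce Examples 1 and 2. The 31-digit limit follows from the slide: 31 digits plus one sign nibble fill exactly 16 bytes.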
