Processor support devices Part 3: Memory management, floating point

dr.ir. A.C. Verschueren, Eindhoven University of Technology, Section of Digital Information Systems


Presentation Transcript


  1. Processor support devices, Part 3: Memory management, floating point • dr.ir. A.C. Verschueren, Eindhoven University of Technology, Section of Digital Information Systems

  2. Separate memory management devices • [Figure: 16032 CPU, 16082 memory management device and main memory] • Without memory management: 4 clocks • With memory management: 5 clocks!

  3. Memory management timing problem • [Timing diagram: clock, address, control and data signals for a read without memory management (CPU > memory) and with memory management (CPU > memory management with the virtual address, then memory management > memory with the physical address)] • Only 25% slower is not bad for a separate Memory Management Unit (MMU)!
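The 25% figure follows directly from the clock counts on the previous slide. A minimal sketch of the arithmetic (the function name is only illustrative):

```python
def slowdown_percent(clocks_without: int, clocks_with: int) -> float:
    """Relative increase in memory cycle length, in percent."""
    return 100.0 * (clocks_with - clocks_without) / clocks_without


# 4 clocks without the MMU, 5 clocks with it (figures from the slides).
print(slowdown_percent(4, 5))
```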

  4. Solving the timing problem (1) • MMU must know virtual address before the actual memory cycle starts • Use last clock cycle of memory cycle to transfer virtual address for next memory cycle • Extra hardware to keep address to memory stable • Impossible with single clock (cache) memory cycles • Separate buses for virtual and physical address • Lot of pins on the memory management device !

  5. Solving the timing problem (2) • The best solution is to integrate the MMU with the CPU in a single device • Only physical addresses on the external bus • Easier control of the MMU from the CPU • Memory management can be placed before on-chip caches allowing much higher system speeds!

  6. Memory management: the simple way (1) • The majority of processors are 8‑bit machines: it doesn't make sense to control a VCR with a Pentium! • The processing speed and I/O are adequate • But the memory space has become too small • Marketing wants a lot of functions to be present... • Consumers want an easy to use appliance... • They want 'On-Screen-Display' and 'full menu control' • In a lot of different languages: you simply cannot sell an English-speaking TV set in Germany or France...

  7. Memory management: the simple way (2) • The standard address size for 8-bit CPUs is 16 bits • This gives 65536 bytes of memory • A moderately complex TV set control program needs on the order of 1 million bytes, mostly constant data • Simple techniques can be used to achieve this • No protection is necessary (one fixed program runs) • Switching memory spaces can be software controlled • Only the Read-Only Memory needs to be this large!

  8. Windowing to extend the memory space • [Figure: a page register, loaded by a page write, supplies the page address to the Read-Only Memory 'core' alongside the ROM address; a read returns the data] • Windowing logic can be built inside memory chips • Standard stuff for all kinds of (Flash) ROMs • Can also save a lot of address pins!
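The windowing scheme can be modelled in a few lines. This is a minimal sketch under assumptions: the class name `WindowedRom`, its methods and the window size used in the example are hypothetical and do not describe any particular memory chip.

```python
class WindowedRom:
    """Hypothetical model of a ROM with a built-in page register."""

    def __init__(self, data: bytes, window_size: int):
        if window_size <= 0 or len(data) % window_size != 0:
            raise ValueError("ROM size must be a whole number of windows")
        self.data = data
        self.window_size = window_size
        self.page = 0  # page register, selects the visible window

    def write_page(self, page: int) -> None:
        """Software-controlled switch to another window."""
        if not 0 <= page < len(self.data) // self.window_size:
            raise ValueError("page out of range")
        self.page = page

    def read(self, address: int) -> int:
        """Read one byte; the CPU only supplies the address inside the window."""
        if not 0 <= address < self.window_size:
            raise ValueError("address outside the window")
        return self.data[self.page * self.window_size + address]


# Four windows of 256 bytes, each filled with its own page number.
rom = WindowedRom(b"".join(bytes([p]) * 256 for p in range(4)), window_size=256)
rom.write_page(2)
print(rom.read(10))
```

Because the CPU only drives the address within the window, the chip needs fewer address pins than its total size would suggest.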

  9. ‘Memory mapper’ address extension • [Figure: the upper 4 bits of the 16-bit CPU address select one of 16 entries of 12 bits; the selected entry and the lower 12 bits together form the 24-bit memory address] • The 74LS610 provides 16 windows of 4096 bytes, each of which can be mapped onto any of 4096 pages of 4096 bytes in physical memory (16 million bytes in total!)
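The address translation described on this slide can be sketched as follows. The class and method names are hypothetical; only the bit widths (4 + 12 bits in, 16 entries of 12 bits, 24 bits out) come from the slide.

```python
class MemoryMapper:
    """Illustrative model of a 16-entry memory mapper (names are assumptions)."""

    def __init__(self):
        # 16 entries of 12 bits each; start with an identity mapping.
        self.entries = list(range(16))

    def set_entry(self, window: int, page: int) -> None:
        """Point one of the 16 CPU windows at one of 4096 physical pages."""
        if not 0 <= window < 16:
            raise ValueError("window must be 0..15")
        if not 0 <= page < 4096:
            raise ValueError("page must fit in 12 bits")
        self.entries[window] = page

    def translate(self, cpu_address: int) -> int:
        """Turn a 16-bit CPU address into a 24-bit memory address."""
        if not 0 <= cpu_address <= 0xFFFF:
            raise ValueError("CPU address must fit in 16 bits")
        window = cpu_address >> 12      # upper 4 bits select the entry
        offset = cpu_address & 0x0FFF   # lower 12 bits pass through unchanged
        return (self.entries[window] << 12) | offset


mapper = MemoryMapper()
mapper.set_entry(3, 0xABC)
print(hex(mapper.translate(0x3456)))
```

The total reachable memory is 4096 pages of 4096 bytes, which is 2^24 = 16777216 bytes.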

  10. Floating point hardware • With minicomputers, floating point operations required a complete cabinet full of hardware • Is it worth this amount of trouble? • Software implementation of the basic floating point operations takes 20..400 instructions • Special instructions like 'normalise' needed for high speed • Some programs have parts which execute one floating point operation every 5 to 10 instructions • 'Benchmark' programs are not representative!
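To see why a dedicated 'normalise' instruction helps, consider what the step looks like in software. This is a minimal sketch, assuming a value is held as an integer mantissa and exponent pair (value = mantissa × 2^exponent) with a 24-bit mantissa; both choices are illustrative assumptions and not taken from the slides.

```python
from typing import Tuple

MANTISSA_BITS = 24  # assumed mantissa width, for illustration only


def normalise(mantissa: int, exponent: int) -> Tuple[int, int]:
    """Shift a positive mantissa until its top bit is set, adjusting the exponent.

    Right shifts truncate; a real implementation would also round.
    """
    if mantissa == 0:
        return 0, 0
    top_bit = 1 << (MANTISSA_BITS - 1)
    while mantissa >= (top_bit << 1):  # too large: shift right
        mantissa >>= 1
        exponent += 1
    while mantissa < top_bit:          # too small: shift left
        mantissa <<= 1
        exponent -= 1
    return mantissa, exponent


print(normalise(1, 0))
```

Each loop iteration costs several instructions on a processor without hardware support, which is where a single 'normalise' instruction saves time.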

  11. The first single chip FP co-processors (1) • Based upon a 16 bit single-chip microcomputer • AMD 9511A for 32 bit basic and transcendental (sine, log) operations • AMD 9512 for the basic 32 and 64 bit operations • Attached to the main processor as I/O device • Command/status transfer by writing/reading ports • Data transfer could be done with DMA

  12. The first single chip FP co-processors (2) • These devices took 1 millisecond for a 64-bit add! • The main processors at that time were 8 bits wide; these co-processors performed floating point operations 2..10 times faster • Their speed is comparable to a low-end minicomputer floating point unit • These devices do NOT use the IEEE standard, simply because it did not yet exist at that time

  13. The IEEE standard: Intel 8087 and cousins • The 8087 (at least) influenced the IEEE standard • It was an instruction set extension co-processor: 20 microseconds for an 80-bit floating point add • The 80387 was much quicker (first with 32 bit bus): 'only' 2 microseconds for the same 80-bit addition • The 80486 integrated FP with the main CPU • With the Pentium, FP became ‘pipelined’: one FP operation per clock, at > 100 MHz!

  14. The ‘Weitek’ approach FP co-processors • These use the address bus to control operations • Address 0..7: Write/read FP data registers 0..7 • Address 8..15: Add written data to FP registers 0..7 • Address 16..23: Ditto, but subtract from registers • Address 24..31: Ditto, but multiply with registers • Etcetera (just an example, of course) • Speed limit: main processor read/write speed • These can interface with different main processors
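The address-decoded control scheme can be illustrated with a small model. This is a sketch of the example address map given on the slide only; the class name and methods are hypothetical and do not describe any real Weitek device.

```python
import operator


class AddressMappedFpu:
    """Toy co-processor: the address selects both the operation and the register."""

    # Address group (address // 8) -> operation, following the slide's example map.
    OPERATIONS = {1: operator.add, 2: operator.sub, 3: operator.mul}

    def __init__(self):
        self.registers = [0.0] * 8

    def write(self, address: int, value: float) -> None:
        group, reg = divmod(address, 8)
        if group == 0:
            # Addresses 0..7: plain write to a data register.
            self.registers[reg] = value
        elif group in self.OPERATIONS:
            # Addresses 8..31: combine the written data with the register.
            op = self.OPERATIONS[group]
            self.registers[reg] = op(self.registers[reg], value)
        else:
            raise ValueError("address not decoded in this sketch")

    def read(self, address: int) -> float:
        if not 0 <= address < 8:
            raise ValueError("only addresses 0..7 are readable in this sketch")
        return self.registers[address]


fpu = AddressMappedFpu()
fpu.write(2, 1.5)    # register 2 = 1.5
fpu.write(10, 2.0)   # address 8 + 2: add 2.0      -> 3.5
fpu.write(18, 0.5)   # address 16 + 2: subtract 0.5 -> 3.0
fpu.write(26, 4.0)   # address 24 + 2: multiply by 4.0 -> 12.0
print(fpu.read(2))
```

Every operation is an ordinary memory write, so the main processor needs no special co-processor instructions; this is also why its read/write speed sets the limit.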

  15. Adhering to the IEEE standard • A given set of basic FP operations must be provided • Includes division, square root and FP ↔ decimal conversion • The precision of operation results is precisely defined • Cutting corners to speed things up is not allowed • Four different rounding modes must be provided • It must be possible to calculate results with less than the maximum precision • All these rules have been broken!
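The four rounding modes are round to nearest (ties to even), toward zero, toward plus infinity and toward minus infinity. As a hedged illustration, the sketch below uses Python's `decimal` module, whose rounding constants behave the same way; it rounds decimal values to integers, whereas FP hardware rounds binary significands, so this only demonstrates the modes themselves. The dictionary and function names are illustrative.

```python
from decimal import (Decimal, ROUND_CEILING, ROUND_DOWN, ROUND_FLOOR,
                     ROUND_HALF_EVEN)

# Mode names are illustrative labels, mapped onto decimal's rounding constants.
ROUNDING_MODES = {
    "nearest_even": ROUND_HALF_EVEN,
    "toward_zero": ROUND_DOWN,
    "toward_plus_inf": ROUND_CEILING,
    "toward_minus_inf": ROUND_FLOOR,
}


def round_all_modes(text: str) -> dict:
    """Round a decimal string to an integer under each of the four modes."""
    value = Decimal(text)
    return {
        name: int(value.quantize(Decimal("1"), rounding=mode))
        for name, mode in ROUNDING_MODES.items()
    }


print(round_all_modes("2.5"))
print(round_all_modes("-2.5"))
```

A tie such as 2.5 is a useful test value because it is the case where round to nearest has to apply its ties-to-even rule.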

  16. Intel’s operating system co-processor • Almost the same pins as their FP co-processor • Contains timers to generate timeouts • An interrupt controller attached to the timers and 'event' input pins • A 16 kilobyte ROM with the iRMX operating system to be run by the main processor • Fraud: not a co-processor at all!
