Processor support devices Part 3: Memory management, floating point

dr.ir. A.C. Verschueren
Eindhoven University of Technology
Section of Digital Information Systems

Separate memory management devices
• [Diagram: 16032 CPU, 16082 memory management device, main memory]
• Without memory management: 4 clocks
• With memory management: 5 clocks!

Memory management timing problem
• [Timing diagram, without memory management: clock, address, control and data; the read goes straight from CPU to memory]
• [Timing diagram, with memory management: the virtual address goes from CPU to memory management, the physical address from memory management to memory, then the data is read]
• Being only 25% slower is not bad for a separate Memory Management Unit (MMU)!

Solving the timing problem (1)
• The MMU must know the virtual address before the actual memory cycle starts
• Use the last clock cycle of a memory cycle to transfer the virtual address for the next memory cycle
• Extra hardware is needed to keep the address to memory stable
• Impossible with single-clock (cache) memory cycles
• Separate buses for virtual and physical address
• Lot of pins on the memory management device!

Solving the timing problem (2)
• The best solution is to integrate the MMU with the CPU in a single device
• Only physical addresses on the external bus
• Easier control of the MMU from the CPU
• Memory management can be placed before on-chip caches, allowing much higher system speeds!

Memory management: the simple way (1)
• The majority of processors are 8-bit machines: it doesn't make sense to control a VCR with a Pentium!
• The processing speed and I/O are adequate
• But the memory space has become too small
• Marketing wants a lot of functions to be present...
• Consumers want an easy-to-use appliance...
• They want 'On-Screen-Display' and 'full menu control', in a lot of different languages: you simply cannot sell an English-speaking TV set in Germany or France...

Memory management: the simple way (2)
• The standard address size for 8-bit CPUs is 16 bits
• This gives 65536 bytes of memory
• A moderately complex TV set control program needs on the order of 1 million bytes, mostly constant data
• Simple techniques can be used to achieve this
• No protection is necessary (one fixed program runs)
• Switching memory spaces can be software controlled
• Only the Read-Only Memory needs to be this large!

Windowing to extend the memory space
• [Diagram: a page register, written through the data input, supplies the page address; combined with the read address it forms the ROM address into the 'core' Read-Only Memory]
• Windowing logic can be built inside memory chips
• Standard stuff for all kinds of (Flash) ROMs
• Can also save a lot of address pins!
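The windowing idea can be sketched in a few lines of Python. This is only an illustrative model, not the logic of any particular ROM chip: the class name `WindowedRom`, the 16 KiB window size and the 64-page image are all assumptions made for the example.

```python
WINDOW_BITS = 14  # hypothetical window size: 2**14 = 16384 bytes


class WindowedRom:
    """Toy model of a ROM with a built-in, software-written page register."""

    def __init__(self, image):
        self.image = bytes(image)  # the large 'core' ROM contents
        self.page = 0              # page register, selects the visible window

    def write_page(self, value):
        """The CPU writes the page register to switch windows."""
        self.page = value

    def read(self, address):
        """Only the low WINDOW_BITS of the CPU address reach the chip."""
        offset = address & ((1 << WINDOW_BITS) - 1)
        return self.image[(self.page << WINDOW_BITS) | offset]


# Build a 1 MiB image in which every byte holds its own page number.
image = b"".join(bytes([page]) * (1 << WINDOW_BITS) for page in range(64))
rom = WindowedRom(image)

rom.write_page(5)
first = rom.read(0x0123)   # byte taken from page 5

rom.write_page(63)
last = rom.read(0x3FFF)    # last byte of the last page
```

In this model the chip needs only 14 address lines to reach a full megabyte, because the upper address bits come from the page register instead of from pins.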

‘Memory mapper’ address extension
• [Diagram: 4 bits of the 16-bit CPU address select one of 16 entries of 12 bits; that entry plus the remaining 12 address bits form a 24-bit memory address]
• The 74LS610 provides 16 windows of 4096 bytes; each window can select any one of 4096 pages of 4096 bytes in physical memory (16 million bytes in total!)
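The address arithmetic of such a mapper can be sketched as follows. This models only the translation described on the slide (16 entries of 12 bits, 4096-byte windows); the class and method names are hypothetical, the power-up identity mapping is an assumption, and using the top 4 address bits as the entry index is likewise an assumption. Nothing here reflects the 74LS610's actual pins or control modes.

```python
PAGE_BITS = 12                # low 12 bits of the CPU address pass through
NUM_ENTRIES = 16              # selected by the top 4 bits of the CPU address
ENTRY_MASK = (1 << 12) - 1    # each entry holds a 12-bit physical page number


class MemoryMapper:
    """Toy model of a 16-entry, 12-bit-wide memory mapper."""

    def __init__(self):
        # Assumed initial state: windows 0..15 show physical pages 0..15.
        self.entries = list(range(NUM_ENTRIES))

    def write_entry(self, index, physical_page):
        """Software-controlled update of one window."""
        self.entries[index % NUM_ENTRIES] = physical_page & ENTRY_MASK

    def translate(self, cpu_address):
        """Map a 16-bit CPU address to a 24-bit memory address."""
        cpu_address &= 0xFFFF
        window = cpu_address >> PAGE_BITS                 # top 4 bits
        offset = cpu_address & ((1 << PAGE_BITS) - 1)     # low 12 bits
        return (self.entries[window] << PAGE_BITS) | offset


mapper = MemoryMapper()
mapper.write_entry(3, 0xABC)          # window 3 now shows physical page 0xABC
physical = mapper.translate(0x3456)   # window 3, offset 0x456
```

With 4096 possible pages of 4096 bytes each, the reachable physical space is 4096 × 4096 = 16,777,216 bytes, which is the "16 million bytes" mentioned above.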

Floating point hardware
• With minicomputers, floating point operations required a complete cabinet full of hardware. Is it worth this amount of trouble?
• Software implementation of the basic floating point operations takes 20..400 instructions
• Special instructions like 'normalise' are needed for high speed
• Some programs have parts which execute one floating point operation every 5 to 10 instructions
• 'Benchmark' programs are not representative!
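To see why a dedicated 'normalise' instruction helps, consider what the step looks like when done in software. The sketch below is a simplified illustration with an assumed 24-bit mantissa and an unbiased integer exponent; the function name and the returned shift count are hypothetical and do not correspond to any specific processor's format.

```python
MANTISSA_BITS = 24  # hypothetical mantissa width, chosen for illustration


def normalise(mantissa, exponent):
    """Shift a mantissa left until its top bit is set.

    Assumes 0 <= mantissa < 2**MANTISSA_BITS.
    Returns (mantissa, exponent, shifts); the value mantissa * 2**exponent
    is unchanged by the operation.
    """
    if mantissa == 0:
        return 0, 0, 0            # zero has no normalised form
    top_bit = 1 << (MANTISSA_BITS - 1)
    shifts = 0
    while not mantissa & top_bit:
        mantissa <<= 1            # one shift ...
        exponent -= 1             # ... one exponent adjustment ...
        shifts += 1               # ... and one loop iteration per bit
    return mantissa, exponent, shifts


m, e, n = normalise(0x00C000, 10)
```

Each loop iteration costs several instructions on a processor without hardware support, so a result with many leading zero bits adds a noticeable share of the per-operation instruction count.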

The first single chip FP co-processors (1)
• Based upon a 16-bit single-chip microcomputer
• AMD 9511A for 32-bit basic and transcendental (sine, log) operations
• AMD 9512 for the basic 32- and 64-bit operations
• Attached to the main processor as an I/O device
• Command/status transfer by writing/reading ports
• Data transfer could be done with DMA

The first single chip FP co-processors (2)
• These devices took 1 millisecond for a 64-bit add!
• The main processors at that time were 8 bits wide; these co-processors performed floating point operations 2..10 times faster
• Their speed is comparable to a low-end minicomputer floating point unit
• These devices do NOT use the IEEE standard, simply because it did not yet exist at that time

The IEEE standard: Intel 8087 and cousins
• The 8087 (at least) influenced the IEEE standard
• It was an instruction set extension co-processor: 20 microseconds for an 80-bit floating point add
• The 80387 was much quicker (first with a 32-bit bus): 'only' 2 microseconds for the same 80-bit addition
• The 80486 integrated FP with the main CPU
• With the Pentium, FP became ‘pipelined’: one FP operation per clock, at > 100 MHz!

The ‘Weitek’ approach: FP co-processors
• These use the address bus to control operations
• Address 0..7: Write/read FP data registers 0..7
• Address 8..15: Add written data to FP registers 0..7
• Address 16..23: Ditto, but subtract from registers
• Address 24..31: Ditto, but multiply with registers
• Etcetera (just an example, of course)
• Speed limit: main processor read/write speed
• These can interface with different main processors
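The example address map above can be modelled directly. This is a sketch of the slide's illustrative map only, not of a real Weitek device: the class name is hypothetical, and Python floats stand in for the co-processor's registers.

```python
class AddressMappedFpu:
    """Toy model: the address selects both the operation and the register."""

    def __init__(self):
        self.regs = [0.0] * 8      # FP data registers 0..7

    def write(self, address, value):
        """A bus write; the address alone tells the device what to do."""
        op, reg = divmod(address, 8)
        if op == 0:                # addresses 0..7: load register
            self.regs[reg] = value
        elif op == 1:              # addresses 8..15: add to register
            self.regs[reg] += value
        elif op == 2:              # addresses 16..23: subtract from register
            self.regs[reg] -= value
        elif op == 3:              # addresses 24..31: multiply with register
            self.regs[reg] *= value
        else:
            raise ValueError("address outside the example map")

    def read(self, address):
        """A bus read of addresses 0..7 returns a register."""
        if not 0 <= address < 8:
            raise ValueError("only addresses 0..7 are readable in this sketch")
        return self.regs[address]


fpu = AddressMappedFpu()
fpu.write(0 + 2, 1.5)     # register 2 = 1.5
fpu.write(8 + 2, 2.25)    # register 2 += 2.25
fpu.write(24 + 2, 2.0)    # register 2 *= 2.0
fpu.write(16 + 2, 0.5)    # register 2 -= 0.5
result = fpu.read(2)
```

Because every operation is an ordinary memory write, no special co-processor instructions are needed, which is why such a device can sit next to different main processors and why its throughput is bounded by the main processor's read/write speed.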

Adhering to the IEEE standard
• A given set of basic FP operations must be provided
• Includes division, square root and conversion between FP and decimal
• The precision of operation results is precisely defined
• Cutting corners to speed things up is not allowed
• Four different rounding modes must be provided
• It must be possible to calculate results with less than the maximum precision
• All these rules have been broken!
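The slide does not name the four rounding modes. As commonly described for IEEE 754, they are: round to nearest (ties to even), toward zero, toward +infinity and toward -infinity. The sketch below shows how the same value rounds differently under each. It uses Python's `decimal` module as a stand-in, rounding to whole numbers rather than to binary mantissa bits, so it illustrates only the rounding directions and not the behaviour of any FP hardware.

```python
from decimal import (Decimal, ROUND_HALF_EVEN, ROUND_DOWN,
                     ROUND_CEILING, ROUND_FLOOR)

# Assumed correspondence between rounding directions and decimal constants.
MODES = {
    "to nearest (ties to even)": ROUND_HALF_EVEN,
    "toward zero": ROUND_DOWN,
    "toward +infinity": ROUND_CEILING,
    "toward -infinity": ROUND_FLOOR,
}


def round_all_modes(text):
    """Round a decimal string to an integer under each of the four modes."""
    value = Decimal(text)
    return {name: int(value.to_integral_value(rounding=mode))
            for name, mode in MODES.items()}


positive = round_all_modes("2.5")
negative = round_all_modes("-2.5")
```

The halfway cases 2.5 and -2.5 are chosen because they are where the modes disagree most visibly.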

Intel’s operating system co-processor
• Almost the same pins as their FP co-processor
• Contains timers to generate timeouts
• An interrupt controller attached to the timers and 'event' input pins
• A 16 KiloByte ROM with the iRMX operating system, to be run by the main processor
• Fraud: not a co-processor at all!
