Memory Management

Memory Management • Basic memory management • Swapping • Virtual memory • Page replacement algorithms • Modeling page replacement algorithms • Design issues for paging systems • Implementation issues • Segmentation

Memory Management
• Ideally, programmers want memory that is large, fast, and nonvolatile.
• Memory hierarchy:
  • a small amount of fast, expensive memory: cache (32 KB to a few MB)
  • some medium-speed, medium-priced main memory (128 MB to 1 GB)
  • gigabytes of slow, cheap disk storage (40 GB to 160 GB)
• The memory manager handles the memory hierarchy.
[Figure: memory hierarchy, faster toward the top and larger toward the bottom; labels: Intel: 8-, 16-, 32-bit; MIPS: 32-bit]

Basic Memory Management

Basic Memory Management
Memory management systems fall into two classes: (1) those that use swapping and paging, and (2) those that do not.
Monoprogramming without Swapping or Paging
• Model (a) was used on mainframes and minicomputers and is rarely used any more.
• Model (b) is used on some palmtop computers and embedded systems.
• Model (c) was used by the early personal computers. The portion of the system in ROM is called the BIOS (Basic Input Output System).
• Except on simple embedded systems, monoprogramming is hardly used any more.

Multiprogramming with Fixed Partitions (a) separate input queues for each partition (b) Single input queue

Disadvantages of multiple input queues:
• The queue for a large partition may be empty while the queue for a small partition is full.
• Since the partitions are fixed, any space in a partition not used by a job is lost.
Single input queue:
• Whenever a partition becomes free, the job closest to the front of the queue that fits in it can be loaded into the empty partition and run.
• A different strategy: since it is undesirable to waste a large partition on a small job, search the whole input queue whenever a partition becomes free and pick the largest job that fits the partition.

Swapping

Swapping
Two general approaches to memory management:
• Swapping: copying a process's memory contents to secondary storage, removing the process from memory, and allocating the freed memory to another process. Each process is brought in, run for a while, and then put back on disk.
• Virtual memory: a capability of operating systems that enables programs to address more memory locations than are actually provided in main memory. Virtual memory systems remove much of the burden of memory management from programmers, freeing them to concentrate on application development (see Sec. 4.3).

Memory allocation time • Swapping system: • The number of processes in memory varies dynamically. • Locations of processes in memory vary dynamically. • Size of the partitions varies dynamically. • Memory Compaction: When swapping creates multiple holes in memory, it is possible to combine them all into one big one by moving all the processes downward as far as possible. • Usually not done because it requires a lot of CPU time.

How much memory should be allocated for a process when it is created or swapped in?
• If processes are created with a fixed size that never changes, allocation is simple: the OS allocates exactly what is needed, no more and no less.
• If processes' data segments can grow, a problem occurs whenever a process tries to grow.

Allocating space for growing data
(a) Allocating space for a growing data segment: if the hole between processes A and B runs out, A or B will have to be moved to a hole with enough space, swapped out of memory until a large enough hole can be created, or killed.
(b) Allocating space for a growing stack and a growing data segment: if the hole between the stack segment and the data segment runs out, the process will have to be moved to a hole with enough space, swapped out of memory until a large enough hole can be created, or killed.

Two ways to keep track of memory usage • bit maps • lists

Memory Management with Bit Maps
• Memory is divided into allocation units; a unit may be as small as a few words or as large as several kilobytes.
• Each allocation unit has one bit in the bitmap: 0 if the unit is free, 1 if it is occupied.
[Figure: part of memory with 5 processes and 3 holes; tick marks show allocation units, shaded regions are free]

Trade-off:
• The smaller the allocation unit, the larger the bitmap.
• If the allocation unit is large, the bitmap is smaller, but memory may be wasted in the last unit of a process if the process size is not an exact multiple of the allocation unit.
Main problem:
• When it has been decided to bring a k-unit process into memory, the memory manager must search the bitmap for a run of k consecutive 0 bits. Searching a bitmap for a run of a given length is a slow operation.
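The bitmap search can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `find_free_run` and the list-of-bits representation are invented for the example, with 0 marking a free allocation unit and 1 an occupied one.

```python
def find_free_run(bitmap, k):
    """Return the index of the first run of k consecutive 0 bits, or -1.

    bitmap is a list of 0/1 values, one per allocation unit (assumed
    representation); k must be at least 1.
    """
    run_start = 0
    run_length = 0
    for i, bit in enumerate(bitmap):
        if bit == 0:
            if run_length == 0:
                run_start = i          # a new run of free units begins here
            run_length += 1
            if run_length == k:
                return run_start
        else:
            run_length = 0             # an occupied unit breaks the run
    return -1


bitmap = [1, 1, 0, 0, 1, 0, 0, 0, 1]
print(find_free_run(bitmap, 3))        # first run of three free units
```

The scan is linear in the size of the bitmap, which is why the slide calls the search slow for large memories and small allocation units.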

Memory Management with Linked Lists
Linked list of allocated and free memory segments:
• The segment list is kept sorted by address. Sorting this way has the advantage that when a process terminates or is swapped out, updating the list is straightforward.

Four neighbor combinations for a terminating process (P = process entry, H = hole entry):
(a) Updating the list requires replacing a P with an H.
(b) Two entries are coalesced into one, and the list becomes one entry shorter.
(c) The same as (b).
(d) Three entries are merged, and two items are removed from the list.

Algorithms to allocate memory for a newly created process
Assume that the memory manager knows how much memory to allocate.
• First fit: The memory manager scans along the list of segments until it finds a hole that is big enough. The hole is then broken up into two pieces, one for the process and one for the unused memory.
  • It is a fast algorithm because it searches as little as possible.
• Next fit: It works the same way as first fit, except that it keeps track of where it is whenever it finds a suitable hole. The next time it is called to find a hole, it starts searching the list from the place where it left off last time.
  • Simulations (Bays, 1977) show that it gives slightly worse performance than first fit.
• Best fit: It searches the entire list and takes the smallest hole that is adequate.
  • It is slower than first fit.

• Worst fit: To get around the problem of breaking up nearly exact matches into a process and a tiny hole, it always takes the largest available hole, so that the hole broken off will be big enough to be useful.
  • Simulation has shown that worst fit is not a very good idea either.
• Quick fit: It maintains separate lists for some of the more common sizes requested.
  • e.g. a table with n entries, in which the first entry is a pointer to the head of a list of 4-KB holes, the second entry is a pointer to a list of 8-KB holes, and the third entry is a pointer to a list of 12-KB holes.
  • Finding a hole of the required size is fast.
  • It has the same disadvantage as all schemes that sort by hole size: when a process terminates or is swapped out, finding its neighbors to see if a merge is possible is expensive.
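The hole-selection rules of first fit, best fit, and worst fit can be compared side by side in a short Python sketch. Everything here is an assumption made for illustration: holes are kept as `(start, length)` tuples sorted by address, and the function names are hypothetical, not taken from any real allocator.

```python
def first_fit(holes, size):
    """Index of the first hole that is big enough, or None."""
    for i, (_, length) in enumerate(holes):
        if length >= size:
            return i
    return None


def best_fit(holes, size):
    """Index of the smallest adequate hole, or None (scans the whole list)."""
    fits = [i for i, (_, length) in enumerate(holes) if length >= size]
    return min(fits, key=lambda i: holes[i][1], default=None)


def worst_fit(holes, size):
    """Index of the largest hole, provided it is adequate, or None."""
    fits = [i for i, (_, length) in enumerate(holes) if length >= size]
    return max(fits, key=lambda i: holes[i][1], default=None)


def allocate(holes, index, size):
    """Split the chosen hole: the front goes to the process, the rest stays a hole."""
    start, length = holes[index]
    if length == size:
        del holes[index]                       # exact match, the hole disappears
    else:
        holes[index] = (start + size, length - size)
    return start


holes = [(0, 5), (8, 2), (14, 4), (20, 10)]    # (start, length) in allocation units
print(first_fit(holes, 3), best_fit(holes, 3), worst_fit(holes, 3))
```

For a request of 3 units the three policies pick three different holes here, which makes the trade-off visible: best fit leaves a 1-unit sliver, worst fit leaves a large remainder.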

Virtual Memory

Virtual Memory
• Virtual memory: a capability of operating systems that enables programs to address more memory locations than are actually provided in main memory. Virtual memory systems remove much of the burden of memory management from programmers, freeing them to concentrate on application development. (Devised by Fotheringham, 1961.)
• Basic idea: the combined size of a program, its data, and its stack may exceed the amount of physical memory available for it. The OS keeps the parts of the program currently in use in main memory and the rest on disk.
• e.g. a 16-MB program can run on a 4-MB machine by carefully choosing which 4 MB to keep in memory at each instant, with pieces of the program being swapped between disk and memory as needed.

Paging
• Paging: a virtual memory organization technique that divides an address space into fixed-size blocks of contiguous addresses. When applied to a process's virtual address space, the blocks are called pages, which store process data and instructions. When applied to main memory, the blocks are called page frames.

• Virtual address: a program-generated address (using indexing, base registers, segment registers, and other ways).
• Virtual address space: formed by all the virtual addresses. e.g. Pentium II pro with 36-bit addresses: 2^36 bytes = 64 GB
• Memory management unit (MMU): a chip or collection of chips that maps virtual addresses onto physical memory addresses.

Example of how the mapping works:
• Virtual addresses: 16 bits (0 to 64 KB)
• Physical memory: 32 KB (8 page frames of 4 KB each)
• A user program can be up to 64 KB, but it cannot be loaded into memory entirely and run.
• The virtual address space is divided into units called pages.
• The corresponding units in physical memory are called page frames.
• Pages and page frames are always the same size: 4 KB here (512 bytes to 64 KB in real systems).
• 8 page frames, 16 virtual pages
• e.g. MOV REG, 0 is transformed by the MMU into MOV REG, 8192

[Figure: page frame address ranges (0-4095), (4096-8191), (8192-12287), (24576-28671)]
• e.g. MOV REG, 8192 is transformed into MOV REG, 24576
• In the actual hardware, a present/absent bit keeps track of which pages are physically present in memory.

Page fault: a trap that occurs when a process attempts to access a nonresident page, in which case the OS can load the page from disk.
• e.g. MOV REG, 32780 (byte 12 within virtual page 8)
• The MMU notices that the page is unmapped and causes the CPU to trap to the OS.
• The OS picks a little-used page frame and writes its contents back to the disk.
• Then it fetches the page just referenced into the page frame just freed.
• Finally it changes the map and restarts the trapped instruction.

Page Tables
• Page table: a table that stores entries that map page numbers to page frames. A page table contains an entry for each of a process's virtual pages.
• e.g. 16-bit address: the high-order 4 bits are the virtual page number and the low-order 12 bits are the offset. 8196 is transformed into 24580 by the MMU.
[Figure: internal operation of the MMU with 16 4-KB pages]
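The translation step can be written out as a small Python sketch. It assumes the 16-bit layout from the slide (4-bit page number, 12-bit offset). The page table below contains only the two mappings that the worked examples imply (page 0 to frame 2, page 2 to frame 6); every other page is treated as absent, and the names `translate` and `PageFault` are hypothetical.

```python
OFFSET_BITS = 12
PAGE_SIZE = 1 << OFFSET_BITS          # 4 KB pages


class PageFault(Exception):
    """Raised when the referenced virtual page has no page frame (absent)."""


def translate(page_table, virtual_address):
    """Map a 16-bit virtual address to a physical address."""
    page = virtual_address >> OFFSET_BITS            # high-order 4 bits
    offset = virtual_address & (PAGE_SIZE - 1)       # low-order 12 bits
    frame = page_table.get(page)                     # None means "absent"
    if frame is None:
        raise PageFault(f"virtual page {page} is not in memory")
    return (frame << OFFSET_BITS) | offset


# Only the mappings implied by the slide's examples; the rest is assumed absent.
page_table = {0: 2, 2: 6}

print(translate(page_table, 0))       # MOV REG, 0
print(translate(page_table, 8196))    # offset 4 within virtual page 2
```

Referencing address 32780 with this table raises `PageFault`, mirroring the trap to the OS for the unmapped virtual page 8.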

The purpose of the page table is to map virtual pages onto page frames. Two major issues must be faced:
(1) The page table can be extremely large.
  • e.g. a computer with 32-bit virtual addresses and a 4-KB page size has 2^32 / 2^12 = 2^20 (about 1 million) pages.
  • Remember that each process needs its own page table because it has its own virtual address space.
(2) The mapping must be fast.
  • The virtual-to-physical mapping must be done on every memory reference.
  • A typical instruction has an instruction word, and often a memory operand as well. Consequently, it is necessary to make 1, 2, or sometimes more page table references per instruction.

Hardware solutions: • Simplest design: one page table consisting of an array of fast hardware registers, with one entry for each virtual page, indexed by virtual page number. • Advantage: straightforward, and requires no memory reference. • Disadvantage: expensive (if the page table is large) • Page table entirely in main memory, and one hardware register that points to the start of the page table • Advantage: allows the memory map to be changed at a context switch by reloading one register. • Disadvantage: requires one or more memory references to read page table entries during the execution of each instruction. • Variations of the two approaches

Multilevel Page Tables
Multilevel page tables get around the problem of having to store huge page tables in memory all the time.
• 32-bit virtual address: PT1: 10 bits, PT2: 10 bits, offset: 12 bits (page size: 4 KB)
• Number of pages: 2^20
[Figure: a top-level page table whose entries point to second-level page tables]
• The secret of the multilevel page table method is to avoid keeping all the page tables in memory all the time.
• e.g. a process needs 12 MB: the bottom 4 MB for text, the next 4 MB for data, and the top 4 MB for the stack. Only 4 page tables are actually needed: the top-level table and the second-level tables for 0 to 4 MB, 4 MB to 8 MB, and the top 4 MB.
• e.g. virtual address = 0x00402004, so PT1 = 1, PT2 = 2, offset = 4
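The field split for the two-level scheme is just shifting and masking. The sketch below assumes the 10/10/12-bit layout given on the slide; the function name `split_virtual_address` is made up for the example.

```python
def split_virtual_address(va):
    """Split a 32-bit virtual address into (PT1, PT2, offset).

    Assumes the layout from the slide: 10-bit PT1, 10-bit PT2, 12-bit offset.
    """
    pt1 = (va >> 22) & 0x3FF      # index into the top-level page table
    pt2 = (va >> 12) & 0x3FF      # index into the selected second-level table
    offset = va & 0xFFF           # byte within the 4 KB page
    return pt1, pt2, offset


print(split_virtual_address(0x00402004))
```

PT1 = 1 selects the second-level table for the 4 MB to 8 MB region, which is why the address falls in the data area of the 12-MB example process.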

Structure of a Page Table Entry
• The exact layout of an entry is highly machine dependent, but the kind of information present is roughly the same.
• The size varies from computer to computer, but 32 bits is a common size.
• Page frame number: the goal of the page mapping is to locate this value.
• Present/absent bit: if this bit is 1, the entry is valid and can be used. If it is 0, the virtual page to which the entry belongs is not currently in memory.
• Modified and referenced bits: keep track of page usage. When a page is written to, the hardware automatically sets the modified bit. When the OS reclaims a page frame, a page that has been modified must be written back to the disk. The modified bit is sometimes called the dirty bit. The referenced bit is set whenever a page is referenced.
• Caching disabled bit: allows caching to be disabled for the page.

TLBs – Translation Lookaside Buffers • All paging schemes keep the page tables in memory => performance problems! • Most programs tend to make a large number of references to a small number of pages, and not the other way around • Solution: equip computers with a small hardware device for mapping virtual addresses to physical addresses without going through the page table • This device is called associative memory (AM) or translation lookaside buffer. It is usually inside the MMU and consists of a small number of entries (normally 32)

A TLB to speed up paging
• When a virtual address is presented to the MMU for translation, the hardware first checks whether its virtual page number is present in the TLB by comparing it to all the entries simultaneously. If a valid match is found and the access does not violate the protection bits, the page frame is taken directly from the TLB, without going to the page table.
• Hit ratio: the fraction of memory references that can be satisfied from the TLB. The higher the hit ratio, the better the performance.
• When the virtual page number is not in the TLB, the MMU detects the miss and does an ordinary page table lookup.
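A software model can make the hit/miss flow and the hit ratio concrete. The sketch below is an assumption-laden stand-in for the hardware: a dictionary lookup replaces the parallel comparison, protection bits are ignored, and the replacement policy (evict the entry that was loaded earliest) is chosen only for simplicity, since the slide does not specify one. The class name `TLB` and its methods are hypothetical.

```python
from collections import OrderedDict


class TLB:
    """Toy translation lookaside buffer caching page -> frame mappings."""

    def __init__(self, capacity=32):
        self.capacity = capacity
        self.entries = OrderedDict()   # insertion order = load order
        self.hits = 0
        self.lookups = 0

    def lookup(self, page, page_table):
        self.lookups += 1
        if page in self.entries:       # TLB hit: no page table access needed
            self.hits += 1
            return self.entries[page]
        frame = page_table[page]       # TLB miss: ordinary page table lookup
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)   # assumed policy: drop oldest entry
        self.entries[page] = frame
        return frame

    def hit_ratio(self):
        return self.hits / self.lookups if self.lookups else 0.0


tlb = TLB(capacity=2)
page_table = {0: 2, 1: 1, 2: 6}
for page in [0, 0, 1, 0, 2, 0]:
    tlb.lookup(page, page_table)
print(tlb.hits, tlb.lookups)
```

With a capacity of only two entries, the reference to page 2 pushes page 0 out, so the final reference to page 0 misses again.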

Software TLB Management
• Hardware TLB management: the MMU hardware knows the page table format, and TLB management and TLB fault handling are done entirely by the MMU hardware.
• Software TLB management: many modern RISC machines do nearly all of this page management in software, e.g. SPARC, MIPS, Alpha, and HP PA.
• On these machines, TLB entries are explicitly loaded by the OS. When a TLB miss occurs, the MMU just generates a TLB fault and tosses the problem to the OS. The OS must find the page, remove an entry from the TLB, enter the new one, and restart the instruction that faulted. All of this must be done in a handful of instructions, because TLB misses occur much more frequently than page faults.
• If the TLB is reasonably large, so that the miss rate is low, software management of the TLB turns out to be acceptably efficient (Uhlig, 1994).
• Main gain: a simpler MMU, which leaves more area on the CPU chip for cache and other features.

Inverted Page Tables
• Today: with a 32-bit virtual address space and a 4-KB page size, each process needs 2^20 entries in its page table (PT). At 4 bytes per entry that is 4 MB per process; the PT is large but manageable (multilevel paging schemes).
• RISC chips with a 64-bit virtual address space:
  • 64-bit virtual address space >> physical memory
  • 64-bit address space = 2^64 bytes, roughly 18 million terabytes
  • 4-KB page size => 2^52 (about 4 quadrillion) PT entries => requires rethinking
• Solution: the virtual address space is immense, but the number of physical page frames is still manageable => inverted page table. In this design, there is one entry per page frame in real memory, rather than one entry per page of virtual address space.
• e.g. with 64-bit virtual addresses, a 4-KB page, and 256 MB of RAM, an inverted page table requires only 65,536 entries. Each entry keeps track of which (process, virtual page) is located in the page frame.

• The inverted page table is searched through a hash table hashed on the virtual page: all virtual pages currently in memory that have the same hash value are chained together.
[Figure: comparison of a traditional page table with an inverted page table]
• IBM and HP workstations use inverted page tables. They will become more common as 64-bit machines become widespread.
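A hash-chained inverted page table can be sketched as follows. This is a minimal model under assumptions, not the layout of any IBM or HP machine: the class name, the use of Python's built-in `hash`, and the bucket dictionary are all choices made for the example, and `map_page` assumes the frame it is given is currently free.

```python
PAGE_SIZE = 4 * 1024
RAM_BYTES = 256 * 1024 * 1024
NUM_FRAMES = RAM_BYTES // PAGE_SIZE        # one table entry per page frame


class InvertedPageTable:
    def __init__(self, num_frames):
        # frames[f] holds the (process id, virtual page) stored in frame f.
        self.frames = [None] * num_frames
        # buckets[h] chains the frames whose (pid, vpage) hashes to h.
        self.buckets = {}

    def _hash(self, pid, vpage):
        return hash((pid, vpage)) % len(self.frames)

    def map_page(self, frame, pid, vpage):
        """Record that (pid, vpage) lives in frame (frame assumed free)."""
        self.frames[frame] = (pid, vpage)
        self.buckets.setdefault(self._hash(pid, vpage), []).append(frame)

    def lookup(self, pid, vpage):
        """Walk the hash chain; return the frame, or None if not resident."""
        for frame in self.buckets.get(self._hash(pid, vpage), []):
            if self.frames[frame] == (pid, vpage):
                return frame
        return None


ipt = InvertedPageTable(NUM_FRAMES)
ipt.map_page(frame=7, pid=1, vpage=0x12345)
print(NUM_FRAMES, ipt.lookup(1, 0x12345), ipt.lookup(2, 0x12345))
```

The table size depends only on the amount of RAM, not on the width of the virtual address, which is the whole point of inverting the mapping.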

Page Replacement Algorithms

Page Replacement Algorithms • Page fault => OS has to select a page for replacement • Modified page => write back to disk • Not modified page => just overwrite with new page • How to decide which page should be replaced? • random • many algorithms take into account • usage • age • ...

Optimal Page Replacement Algorithm
• What is the optimal page replacement algorithm? An unrealizable page replacement strategy that replaces the page that will not be used until furthest in the future.
• It is easy to describe but impossible to implement, because the OS cannot look into the future.
• It is useful as a yardstick for evaluating other page replacement algorithms.
• How it works:
  • When a page fault occurs, a set of pages is in memory.
  • Label each page with the number of instructions that will be executed before that page is next referenced.
  • Replace the page with the highest label.
• It cannot be used in a practical system.
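Although the optimal algorithm cannot run inside a real OS, it is easy to simulate offline when the whole reference string is known in advance, which is how it serves as a benchmark. The sketch below assumes the future references are given as a list of page numbers and measures distance in references rather than instructions; the function name `optimal_victim` is hypothetical.

```python
def optimal_victim(frames, future_refs):
    """Return the resident page whose next use lies furthest in the future.

    frames: pages currently in memory.
    future_refs: the upcoming reference string (known only in simulation).
    A page that is never referenced again counts as infinitely far away.
    """
    def next_use(page):
        try:
            return future_refs.index(page)
        except ValueError:
            return float("inf")

    return max(frames, key=next_use)


print(optimal_victim([0, 1, 2], [1, 0, 1, 3, 0, 2]))
```

Page 2 is chosen in the example because pages 0 and 1 are needed almost immediately, while page 2 is not touched until the end of the string.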

NRU (Not Recently Used) Page Replacement Algorithm
• What is the NRU page replacement algorithm? A page replacement strategy that uses the referenced and modified bits to choose the page to replace.
• Status bits associated with each page:
  • R: page referenced (read or written)
  • M: page modified (written); also called the dirty bit (dirty page)
• Four classes, checked in this order:
  • class 0: not referenced, not modified
  • class 1: not referenced, modified
  • class 2: referenced, not modified
  • class 3: referenced, modified
• NRU removes a page at random from the lowest-numbered nonempty class.
• Low overhead
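The class numbering is simply the two status bits read as a binary number, with R as the high bit and M as the low bit (classes 0 through 3). A minimal Python sketch, with the dictionary representation and function names assumed for illustration:

```python
import random


def nru_class(referenced, modified):
    """Class number = 2*R + M, giving classes 0 (best victim) to 3."""
    return 2 * int(referenced) + int(modified)


def nru_victim(pages, rng=random):
    """Pick a random page from the lowest-numbered nonempty class.

    pages maps a page name to its (R, M) bits (assumed representation).
    """
    lowest = min(nru_class(r, m) for r, m in pages.values())
    candidates = [p for p, (r, m) in pages.items() if nru_class(r, m) == lowest]
    return rng.choice(candidates)


pages = {"A": (1, 1), "B": (0, 1), "C": (1, 0)}
print(nru_victim(pages))
```

In the example, class 0 is empty, so the victim comes from class 1 (not referenced, modified), even though evicting it costs a write to disk.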

FIFO Page Replacement Algorithm
[Figure: pages A to H in load order with load times 0, 3, 7, 8, 12, 14, 15, 18; A is the page loaded first, H the most recently loaded page]
• What is the FIFO page replacement algorithm? A page replacement strategy that replaces the page that has been in memory longest.
• The OS maintains a list of all pages currently in memory.
• Pages are stored in the list by age.
• On a page fault, FIFO replaces the oldest page.
• It incurs low overhead, but does not predict future page usage accurately.
• FIFO is rarely used in its pure form.

Second Chance Page Replacement Algorithm
[Figure: (a) pages A to H in FIFO order with load times 0, 3, 7, 8, 12, 14, 15, 18; (b) the list B to H followed by A, with times 3, 7, 8, 12, 14, 15, 18, 20, after A is treated like a newly loaded page]
• What is the second chance page replacement algorithm? A variation of FIFO page replacement that uses the referenced bit (R) and the FIFO queue to determine which page to replace. If the oldest page's referenced bit is off, it replaces that page. Otherwise it turns off the referenced bit of the oldest page, moves the page to the tail of the FIFO queue, and examines the next page or pages until it finds one whose referenced bit is off.
• Second chance is a reasonable algorithm.
• However, it is inefficient because it is constantly moving pages around on its list.
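The queue manipulation can be sketched with a double-ended queue. The representation is an assumption made for the example: the queue holds page names with the oldest at the left, and a separate dictionary holds the R bits; the function name `second_chance_victim` is hypothetical.

```python
from collections import deque


def second_chance_victim(queue, r_bits):
    """Remove and return the page to evict, giving referenced pages a second chance.

    queue: deque of pages, oldest at the left (modified in place).
    r_bits: dict mapping page -> R bit (modified in place).
    """
    while True:
        page = queue.popleft()
        if r_bits[page]:
            r_bits[page] = 0        # clear R and treat the page as newly loaded
            queue.append(page)
        else:
            return page             # oldest page that was not referenced


queue = deque("ABCDEFGH")
r_bits = {page: 0 for page in queue}
r_bits["A"] = 1                     # A was referenced since it was loaded
print(second_chance_victim(queue, r_bits), "".join(queue))
```

The loop always terminates: if every page has R = 1, one full pass clears all the bits and the algorithm degenerates into pure FIFO.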

The Clock Page Replacement Algorithm
[Figure: pages arranged in a circle with a clock hand. When a page fault occurs, the page the hand is pointing to is inspected. The action taken depends on the R bit: R = 0: evict the page; R = 1: clear R and advance the hand]
• What is clock page replacement? A variation of the second chance page replacement strategy that arranges the pages in a circular list instead of a linear list.
• The hand points to the oldest page.
• R = 0: the page was not referenced in the last round => replace it.
• R = 1: the page was referenced in the last round => set R to 0 and advance until the first page with R = 0 is found.
• In both cases the hand is advanced to the next entry.
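Because the pages stay in place and only the hand moves, the clock algorithm needs nothing more than an index into a circular array. In the sketch below the list of R bits and the function name `clock_victim` are assumptions for illustration.

```python
def clock_victim(r_bits, hand):
    """Advance the clock hand until a page with R = 0 is found.

    r_bits: list of R bits, one per slot of the circular list (modified in place).
    hand: index of the slot the hand currently points to.
    Returns (victim_index, new_hand).
    """
    n = len(r_bits)
    while r_bits[hand]:
        r_bits[hand] = 0            # referenced in the last round: clear and skip
        hand = (hand + 1) % n
    victim = hand
    return victim, (hand + 1) % n   # the hand moves past the replaced page


r_bits = [1, 1, 0, 1]
print(clock_victim(r_bits, 0), r_bits)
```

Compared with second chance, no list nodes are moved; the only state that changes is the R bits and the position of the hand.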

Least Recently Used (LRU) Page Replacement Algorithm
[Figure: LRU using a 4 x 4 bit matrix, shown after each reference when pages are referenced in the order 0 1 2 3 2 1 0 3 2 3]
• What is the LRU page replacement algorithm? A page replacement strategy that replaces the page that has not been referenced for the longest time. LRU generally predicts future page usage well but incurs significant overhead.
• Linked list: expensive, because maintaining the list on every memory reference is a time-consuming operation.
• Special hardware with a counter: each page table entry must also have a field large enough to contain the counter.
• Other special hardware that holds a matrix of n x n bits, initially all 0. When page frame k is referenced, the hardware sets all bits of row k to 1 and then sets all bits of column k to 0. At any instant, the row whose binary value is lowest belongs to the least recently used page.
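The matrix method can be simulated in software to see why the lowest row identifies the least recently used page. The update rule used here is the standard one for this scheme: on a reference to page frame k, set row k to all 1s, then clear column k. The list-of-lists representation and the function names are assumptions for the example.

```python
def make_matrix(n):
    """n x n bit matrix, initially all 0."""
    return [[0] * n for _ in range(n)]


def reference(matrix, k):
    """Record a reference to page frame k: row k to all 1s, then column k to 0."""
    n = len(matrix)
    matrix[k] = [1] * n
    for row in matrix:
        row[k] = 0


def lru_frame(matrix):
    """Frame whose row, read as a binary number, has the lowest value."""
    def row_value(i):
        return int("".join(str(bit) for bit in matrix[i]), 2)

    return min(range(len(matrix)), key=row_value)


matrix = make_matrix(4)
for page in [0, 1, 2, 3, 2, 1, 0, 3, 2, 3]:   # reference string from the slide
    reference(matrix, page)
print(lru_frame(matrix), matrix)
```

After the full reference string, frame 1 has the lowest row value, which matches the string: page 1 was last used at the sixth reference, earlier than the last use of any other page.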

Simulating LRU in Software
• The previous LRU algorithms are realizable in principle if the machine has the special hardware. They are of no use to an OS designer who is building a system for a machine that lacks this hardware.
• Solution: the NFU (Not Frequently Used) algorithm. It requires a software counter associated with each page, initially zero. At each clock interrupt, the OS scans all pages in memory. For each page, the R bit (0 or 1) is added to the counter.
• Main problem of NFU: it never forgets anything.
• Aging: modifies NFU as follows, which makes it able to simulate LRU quite well.
  (1) The counters are each shifted right 1 bit before the R bit is added in.
  (2) The R bit is added to the leftmost bit, rather than the rightmost.

The aging algorithm simulates LRU in software
[Figure: the aging algorithm for 6 pages over 5 clock ticks, panels (a) to (e)]
• In practice, an 8-bit counter is enough if a clock tick is around 20 msec.
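One aging step is a shift followed by inserting the R bit at the left end of the counter. The sketch below assumes 8-bit counters and that the caller supplies the R bits collected during each tick; the dictionary representation and the names `age_tick` and `aging_victim` are hypothetical.

```python
COUNTER_BITS = 8


def age_tick(counters, r_bits):
    """Apply one clock tick: shift each counter right, add R at the leftmost bit.

    counters: dict page -> counter value (modified in place).
    r_bits: dict page -> R bit observed during this tick.
    """
    for page in counters:
        counters[page] = (counters[page] >> 1) | (r_bits[page] << (COUNTER_BITS - 1))


def aging_victim(counters):
    """The page with the lowest counter is the eviction candidate."""
    return min(counters, key=counters.get)


counters = {"A": 0, "B": 0}
for r_bits in [{"A": 1, "B": 0}, {"A": 0, "B": 1}, {"A": 0, "B": 1}]:
    age_tick(counters, r_bits)
print({page: format(value, "08b") for page, value in counters.items()})
print(aging_victim(counters))
```

Page A was referenced once, but long ago, so its single 1 bit has drifted to the right and its counter is smaller than that of the recently used page B. Under plain NFU the two pages would be compared only by their reference counts.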

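The Working Set Page Replacement Algorithm
• Working set: the set of pages that a process is currently using.
• k: the number of most recent memory references considered
• t: time
• w(k,t): the size of the working set at time t, that is, the number of distinct pages used by the k most recent memory references
[Figure: w(k,t) plotted as a function of k]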
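The definition of w(k,t) translates directly into a few lines of Python. In this sketch, time t is measured as the number of references made so far, which is an assumption made to keep the example simple; the function names are hypothetical.

```python
def working_set(reference_string, t, k):
    """Pages touched by the k most recent references, as of time t.

    t counts references made so far (assumed notion of time), so the
    window is reference_string[t-k : t], clipped at the start.
    """
    return set(reference_string[max(0, t - k):t])


def w(reference_string, t, k):
    """Size of the working set, w(k, t)."""
    return len(working_set(reference_string, t, k))


refs = [0, 1, 2, 0, 1, 3, 0]
print([w(refs, len(refs), k) for k in range(1, 8)])
```

The printed sizes never decrease as k grows and level off once the window covers every page the process has touched.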

page span = current virtual time – time of last use; τ: predetermined page span
The working set page replacement algorithm:
• The hardware is assumed to set the R and M bits.
• A periodic clock interrupt is assumed to cause software to run that clears the R bit on every clock tick.
• On every page fault, the page table is scanned to look for a suitable page to evict.
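The scan performed on a page fault can be sketched as below. The details go beyond what the slide states and follow the usual textbook formulation, so treat them as assumptions: a page with R = 1 is considered part of the working set and has its time of last use refreshed; a page with R = 0 whose span exceeds the threshold `tau` is evicted; otherwise the unreferenced page with the oldest time of last use is the fallback. This simplified version returns as soon as it finds a victim instead of finishing the scan, and the names are hypothetical.

```python
def working_set_victim(pages, current_time, tau):
    """Scan the page table entries and pick a page to evict.

    pages: dict page -> {"R": bit, "last_use": virtual time} (modified in place).
    Returns the victim, or None if every page was referenced this tick
    (the caller must then fall back to some other choice).
    """
    oldest = None
    for page, entry in pages.items():
        if entry["R"]:
            entry["last_use"] = current_time      # in use: part of the working set
            continue
        span = current_time - entry["last_use"]
        if span > tau:
            return page                           # outside the working set: evict
        if oldest is None or entry["last_use"] < pages[oldest]["last_use"]:
            oldest = page                         # remember the best fallback
    return oldest


pages = {
    "A": {"R": 1, "last_use": 100},
    "B": {"R": 0, "last_use": 50},
    "C": {"R": 0, "last_use": 90},
}
print(working_set_victim(pages, current_time=120, tau=40))
```

Page B is chosen because it was not referenced during the current tick and its span of 70 exceeds the threshold of 40.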

The WSClock Page Replacement Algorithm (Sec. 4.4.9)
• An improved algorithm that is based on the clock algorithm but also uses the working set information.
• page span = current virtual time – time of last use; τ: predetermined page span

Review of Page Replacement Algorithms

Segmentation
