
Chapter 8: Main Memory



  1. Chapter 8: Main Memory

  2. Chapter 8: Memory Management • Background • Swapping • Contiguous Memory Allocation • Paging • Structure of the Page Table • Segmentation • Example: The Intel Pentium

  3. Objectives • To provide a detailed description of various ways of organizing memory hardware • To discuss various memory-management techniques, including paging and segmentation • To provide a detailed description of the Intel Pentium, which supports both pure segmentation and segmentation with paging

  4. Background • Program must be brought from disk into main memory and placed within a process for it to be run • Main memory and registers are the only storage the CPU can access directly • Registers are internal to the CPU, accessed in one CPU cycle • Main memory is accessed through the memory bus and can take many CPU cycles • Causes memory stalls • Cache sits between main memory and CPU registers • Faster access compensates for the speed difference between CPU and memory • Accurate memory management and hardware-enforced memory protection are essential to safe multitasking

  5. Background • Each process has its own memory space • Defined by a base register and a limit register • Hardware-enforced protection: the CPU checks every memory access the process makes to ensure it is in the process's space • Generates an interrupt on illegal memory access • Only kernel-level privilege can supersede this, for: • Operations to load/change the base & limit register addresses • Accessing another process's memory space
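The base/limit check described above can be sketched in a few lines (the function name and the address values are illustrative, not from the slide):

```python
def legal_access(addr, base, limit):
    """Hardware-style protection check: an access is legal only if it
    falls in [base, base + limit). An illegal access traps to the kernel."""
    return base <= addr < base + limit

# A process occupying 120900 bytes starting at base 300040 (made-up values)
assert legal_access(300040, 300040, 120900)       # first byte: legal
assert not legal_access(420940, 300040, 120900)   # one past the end: trap
```

In real hardware this comparison happens on every access with no software involvement; only the kernel may load the base and limit registers.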

  6. Address Binding • Memory starts at 0x0, but program can start from any base address • Program code uses symbolic addresses (like function and variable names) • Compiler binds them to relocatable addresses (fixed addresses relative to the start of the program’s memory space) • Linker or loader binds relocatable addresses to fixed physical addresses in memory

  7. Address Binding • Compile time • If the memory location is known a priori, absolute code can be generated • Must recompile code if the starting location changes • Load time • Compiler generates relocatable code if the memory location is not known at compile time • Can be loaded at any starting location without recompiling, but must be reloaded if the starting location changes during execution • Execution time • Binding delayed until run time if the process can be moved during its execution from one memory segment to another • Approach used by most OSes

  8. Logical vs. Physical Address Space • We need to differentiate two different kinds of memory addresses • A memory address in a program, as seen by the CPU, is a logical address, also called a virtual address • The corresponding address in physical memory is a physical address • Logical and physical addresses are the same in compile-time and load-time address-binding schemes; logical (virtual) and physical addresses differ in the execution-time address-binding scheme

  9. Memory-Management Unit (MMU) • Run-time mapping from virtual to physical memory addresses is done by special hardware: the MMU • Can be seen as interface hardware between the CPU and memory • In reality, it is part of the CPU (has been part of Intel CPUs since the 80286) • The MMU adds the value in a relocation register (i.e. the base address) to every address generated by a user process at the time it is sent to memory • The user program deals only with logical addresses; it never sees the real physical addresses
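A minimal sketch of the relocation-register mapping above (the register value 14000 is an illustrative choice, not from the slide):

```python
RELOCATION_REGISTER = 14000   # base address of the process (illustrative)

def mmu_map(logical_addr):
    # The MMU adds the relocation register to every logical address
    # generated by the process, transparently to the program.
    return logical_addr + RELOCATION_REGISTER

assert mmu_map(0) == 14000    # logical address 0 is the start of the process
assert mmu_map(346) == 14346  # the program itself never sees 14346
```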

  10. Dynamic Loading • A routine is not loaded until it is called • The calling routine checks if the called routine is in memory, and loads it if not • Until then, the routine is kept on disk in relocatable code format • Better memory-space utilization; an unused routine is never loaded • Useful when large amounts of code are needed to handle infrequently occurring cases • The operating system provides a dynamic loading library • It is the user program's responsibility to take advantage of dynamic loading

  11. Dynamic Linking • Linking can be postponed until execution time • Dynamically linked libraries (*.DLL in Windows) • At compile time, stub code is included in the binary instead of the routine; it is used to locate the appropriate library routine • At run time, the stub replaces itself with the address of the routine in the DLL, and executes the routine • Very advantageous • Only one copy of common routines • Saves disk/memory space, makes updates easier • Since the routine can be loaded in another process's protected memory space, kernel support is needed to check if the routine is in memory

  12. Swapping • A process can be swapped temporarily out of memory to a backing store (typically, hard drive), and then brought back into memory for continued execution • Roll out, roll in: swapping variant used for priority-based scheduling algorithms; lower-priority process is swapped out so higher-priority process can be loaded and executed • Major part of swap time is transfer time; total transfer time is directly proportional to the amount of memory swapped • System maintains a special ready queue of runnable processes which have memory images on disk

  13. Memory Protection • Several user processes loaded into memory at once • Registers used to protect user processes from each other, and from changes in operating-system code and data (transient code) • Limit register contains the range of logical addresses; each logical address must be less than the limit register or an interrupt is generated • Relocation register contains the value of the base address in memory • MMU maps logical addresses to physical addresses dynamically

  14. Contiguous Allocation • Multiple Partition Method • Memory divided into fixed-size partitions; each partition contains exactly one process • When a process terminates, its partition is freed and allocated to a new process • One of the earliest memory allocation techniques, used in IBM OS/360 (circa 1964) • Clearly limited • Limits multitasking to the number of partitions • Limits the size of processes to the size of partitions

  15. Contiguous Allocation • Variable Partition Scheme • OS keeps track of which parts of memory are occupied/free • Initially memory is empty; as processes are added and then removed, holes of available memory of various sizes form • When a process is added to memory, it is allocated memory from a hole large enough to accommodate it • If there isn't one, the OS can wait (reducing multitasking) or allocate memory to a smaller process instead (risking starvation) • This is dynamic allocation

  16. Contiguous Allocation • Dynamic allocation problem: Given several possible holes of different sizes, how to pick the one to allocate the process into? • First-fit: Allocate the first hole that is big enough • Fastest method • Best-fit: Allocate the smallest hole that is big enough • Must search the entire list (unless ordered by size) • Produces the smallest leftover hole (least memory wasted) • Worst-fit: Allocate the largest hole • Must search the entire list (unless ordered by size) • Produces the largest leftover hole (which can then be allocated to another process) • Good idea, but typically the worst of the three methods in both speed and memory use
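The three strategies can be sketched over a free list of (start, length) holes (the hole list and function names are illustrative):

```python
def first_fit(holes, size):
    # Return the start of the first hole that is big enough, else None
    for start, length in holes:
        if length >= size:
            return start
    return None

def best_fit(holes, size):
    # Smallest hole that is still big enough (full scan unless sorted)
    fits = [(length, start) for start, length in holes if length >= size]
    return min(fits)[1] if fits else None

def worst_fit(holes, size):
    # Largest available hole (also a full scan unless sorted)
    fits = [(length, start) for start, length in holes if length >= size]
    return max(fits)[1] if fits else None

holes = [(0, 100), (200, 500), (900, 200), (1500, 300)]
assert first_fit(holes, 150) == 200   # first adequate hole
assert best_fit(holes, 150) == 900    # smallest adequate hole (200 bytes)
assert worst_fit(holes, 150) == 200   # largest hole (500 bytes)
```

A real allocator would also split the chosen hole and coalesce adjacent free holes on release; that bookkeeping is omitted here.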

  17. Fragmentation • External Fragmentation • Total memory space exists to satisfy a request, but it is not contiguous • On average, a third of memory is lost to fragmentation • Internal Fragmentation • Typically, memory is divided into blocks and allocated by block, rather than by byte • Allocated blocks may be larger than the requested memory (i.e. the last block is not entirely needed); the extra memory in the block is lost • Unused memory internal to a partition • Reduce external fragmentation by compaction • Shuffle memory contents to place all free memory together in one large hole • Done at execution time; only possible if dynamic relocation is allowed • Can be a very expensive operation

  18. Paging • Non-contiguous allocation eliminates the problems of external fragmentation and compaction; it also makes swapping easier • Paging • Supported by hardware • Physical memory divided into fixed-size blocks called frames (size is a power of 2, between 512 bytes and 16 MB) • Logical memory divided into blocks of the same size called pages • OS keeps track of all free frames • To run a program of size n pages, need to find n free frames and load the program • Page table used to translate logical to physical addresses • Internal fragmentation is still a problem though

  19. Paging

  20. Paging Address Translation • Address generated by CPU is divided into: • Page number (p) used as an index into the page table, which contains the base address of each frame (f) in physical memory • Page offset (d) combined with the base address to define the physical memory address that is sent to the memory unit • For a logical address space of size 2^m and page size 2^n, a logical address splits into a page number p of m − n bits and a page offset d of n bits

  21. Paging Address Translation • 16 bytes of logical memory and 4-byte pages/frames: m = 4 and n = 2 • Logical address 1001 = 9 (contains j) splits into page number 10 and page offset 01 • Page number 10 (2) directs to frame 1, which begins at 1 × 4 = 4 bytes into physical memory • Offset 01 (1) means +1 byte • Total: 1 × 4 + 1 = 5 bytes into physical memory (contains j)
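The split-and-lookup on slides 20–21 can be replayed in code. The page-table contents below match the standard textbook figure for this example (page 2 maps to frame 1); treat the other mappings as illustrative:

```python
PAGE_BITS = 2                            # n = 2: 4-byte pages
page_table = {0: 5, 1: 6, 2: 1, 3: 2}   # page number -> frame number

def translate(logical, page_table, n=PAGE_BITS):
    page = logical >> n               # high m - n bits index the page table
    offset = logical & ((1 << n) - 1)  # low n bits are the offset
    frame = page_table[page]
    return (frame << n) | offset       # frame base + offset

# Logical 9 = 0b1001: page 0b10 = 2 -> frame 1; offset 0b01 = 1 -> physical 5
assert translate(9, page_table) == 5
```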

  22. Paging Hardware

  23. Paging • No external fragmentation • Any free frame can be allocated • Internal fragmentation capped at almost 1 frame per process • Worst case: a process needs n frames + 1 byte, meaning the OS allocates n + 1 frames and wastes almost the entire last one • So we'd like frames as small as possible • Overhead of a page table for each process, which is kept in main memory • A page table entry is 4 bytes (32 bits) large • Size of the page table is proportional to the number of pages it contains • So we'd like pages as large as possible • Typically, pages are 4 KB in size • Can be larger – Solaris pages are 8 KB to 4 MB in size
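The worst-case internal fragmentation claim above is easy to verify numerically (helper names are mine):

```python
def frames_needed(proc_size, page_size):
    return -(-proc_size // page_size)   # ceiling division

def internal_frag(proc_size, page_size):
    # Bytes wasted in the last, partially used frame
    return frames_needed(proc_size, page_size) * page_size - proc_size

page = 4096
assert internal_frag(3 * page + 1, page) == page - 1  # worst case: n frames + 1 byte
assert internal_frag(4 * page, page) == 0             # exact multiple: no waste
```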

  24. Frames • Process sees its memory space as a contiguous set of pages • In reality, OS loads it page by page in whichever frames are available • OS needs to maintain a frame table • How many frames there are • Which frames are allocated • Which frames are free

  25. Frames (figures: before and after allocation)

  26. Implementation of Page Table • Hardware implementation • Page table stored in CPU in dedicated registers • Fast address translation • Modified and updated only by the kernel’s dispatcher • Changing page table requires changing all registers • Number of registers limits number of page table entries; typically less than 256 • Not practical for modern computers, with millions of page table entries • Software implementation • Page table stored in main memory • CPU only stores the page table base register (PTBR) • Only one register to modify to change the page table • Every data/instruction access requires two memory accesses, one for the page table and one for the data/instruction • Slower translation

  27. Translation Look-aside Buffer • The two-memory-access problem can be solved by the use of a special fast-lookup hardware cache called the translation look-aside buffer (TLB) • TLB entry: <page number, frame number> • Special hardware searches all keys simultaneously • Very fast • Very expensive, so limited to 64 – 1024 entries • If the page is found (TLB hit), this takes less than 10% longer than direct memory access! • If not found (TLB miss) • Look for it in the page table • Add it to the TLB, clearing another entry if the TLB is full • Some TLBs include an address-space identifier (ASID) with each entry, identifying the process to which it belongs • TLB can safely store pages from multiple processes • Without this info, the TLB needs to be flushed each time a new page table is loaded
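The hit/miss behaviour above can be modelled with a toy TLB. Real TLBs search all entries in parallel in hardware; the FIFO replacement policy and capacity here are illustrative choices, not from the slide:

```python
from collections import OrderedDict

class TLB:
    """Toy TLB: page number -> frame number, with FIFO eviction."""
    def __init__(self, capacity=64):
        self.capacity = capacity
        self.entries = OrderedDict()

    def lookup(self, page, page_table):
        if page in self.entries:
            return self.entries[page], True        # TLB hit
        frame = page_table[page]                   # miss: walk the page table
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)       # evict the oldest entry
        self.entries[page] = frame                 # cache the translation
        return frame, False

tlb = TLB(capacity=2)
pt = {0: 7, 1: 3, 2: 9}
assert tlb.lookup(0, pt) == (7, False)   # cold miss
assert tlb.lookup(0, pt) == (7, True)    # now cached: hit
```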

  28. Translation Look-aside Buffer

  29. Translation Look-aside Buffer • The Effective Access Time (EAT) is the average time to access an address in physical memory • For example • It takes 100 ns to access data in physical memory once the address is known • It takes 20 ns to look up an address in the TLB • 90% TLB hit rate • In the 10% of TLB misses, we then need to access the page table in memory (an extra 100 ns) • EAT for accessing memory directly (as in Real Mode) = 100 ns • EAT for paging = 0.9 × (20 + 100) + 0.1 × (20 + 100 + 100) = 130 ns
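The slide's EAT computation generalizes to a one-line formula (the function name is mine):

```python
def eat(mem_ns, tlb_ns, hit_rate):
    # Hit: TLB lookup + one memory access.
    # Miss: TLB lookup + page-table access + memory access.
    return hit_rate * (tlb_ns + mem_ns) + (1 - hit_rate) * (tlb_ns + 2 * mem_ns)

# The slide's numbers: 100 ns memory, 20 ns TLB, 90% hit rate -> 130 ns
assert abs(eat(100, 20, 0.90) - 130.0) < 1e-9
```

Note how the miss penalty dominates: raising the hit rate from 90% to 99% brings the EAT down to 121 ns, close to the 120 ns floor set by the TLB lookup itself.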

  30. Memory Protection • Memory protection implemented by associating protection bits with each frame • Example: read / write / execute bits • Illegal operations caught by CPU and cause interrupt • Valid-invalid bit attached to each entry in the page table • “valid” indicates that the associated page is in the process’ logical address space, and is thus a legal page • “invalid” indicates that the page is not in the process’ logical address space

  31. Valid or Invalid Bit (figure: internal fragmentation; page-table length register)

  32. Shared Pages • Pages give us a new option for shared memory: shared pages • If reentrant code (execute-only code that cannot be modified) is used by several processes, we can put this code in shared pages that are part of all the processes • The code is not modifiable, so there is no race condition / critical section problem • Shared code must appear in the same location in the logical address space of all processes

  33. Structure of the Page Table • Hierarchical Paging • Tables of page tables • Hashed Page Tables • Page table sorted by hash index • Inverted Page Tables • Table entries represent frames, maps to pages

  34. Hierarchical Page Tables • Given a modern computer with a large (32-bit) logical address space and small (4 KB) pages, each process may have a million pages • Each page is represented by a 4-byte entry in the page table, giving a 4 MB page table per process • That would be a lot to allocate in a contiguous block in main memory • Break up the logical address space into multiple page tables • A simple technique is a two-level page table • Outer page table points to an (inner) page table • Page table points to the page, as before

  35. Two-Level Page-Table Scheme

  36. Two-Level Paging Translation • A logical address (on a 32-bit machine with 4 KB page size) is divided into: • a page number consisting of 20 bits • a page offset (d) consisting of 12 bits • Since the page table is paged, the page number is further divided into: • a 10-bit page number (p1) • a 10-bit page offset (p2) • Thus, a forward-mapped logical address translates as follows: p1 indexes the outer page table, p2 indexes the inner page table it selects, and d is the offset within the page
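The 10/10/12 split described above is pure bit manipulation (the sample address is an arbitrary illustration):

```python
def split_two_level(addr):
    # 32-bit logical address: 10-bit p1 | 10-bit p2 | 12-bit offset
    p1 = addr >> 22              # top 10 bits index the outer page table
    p2 = (addr >> 12) & 0x3FF    # next 10 bits index the inner page table
    d  = addr & 0xFFF            # low 12 bits: offset within the 4 KB page
    return p1, p2, d

# 0x00401ABC -> outer entry 1, inner entry 1, offset 0xABC
assert split_two_level(0x00401ABC) == (1, 1, 0xABC)
```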

  37. Three-Level Paging Scheme • Given a 64-bit processor and logical address space, two-level paging may be insufficient • To keep the inner page table size constant, the outer page table grows considerably • The alternative is three-level paging • A second outer page table maps to the outer page table, which maps to the inner page table, which maps to memory • The second outer page table is still big; we could split it again • And again, and again… • But for each split, we need one more memory access to translate the address, which slows down the system

  38. Hashed Page Tables • A hash function converts large & complex data into “unique” & uniformly-distributed integer values • Unique in theory, but in practice hash collisions occur, where different data map to the same hash value • Hashed page table • Apply the hash function to the page number • Each entry in the hashed page table is a linked list of <page#, frame#> entries, to account for collisions • Find the entry that matches the page number, and retrieve the frame number
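A sketch of the scheme using Python lists as the per-bucket chains (bucket count, page numbers, and frame numbers are all illustrative):

```python
NUM_BUCKETS = 16   # illustrative table size

def bucket(page):
    return hash(page) % NUM_BUCKETS   # hash of the page number picks a bucket

hashed_table = [[] for _ in range(NUM_BUCKETS)]   # bucket -> [(page, frame), ...]

def insert(page, frame):
    hashed_table[bucket(page)].append((page, frame))

def lookup(page):
    for p, f in hashed_table[bucket(page)]:   # walk the chain for collisions
        if p == page:
            return f
    raise KeyError(page)   # would be a page fault in a real system

insert(0x12345, 7)
insert(0x12345 + NUM_BUCKETS, 9)   # lands in the same bucket: a collision
assert lookup(0x12345) == 7
assert lookup(0x12345 + NUM_BUCKETS) == 9
```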

  39. Inverted Page Table • A page table has one entry for each page in logical memory, consisting of the address of the frame storing it • An inverted page table has one entry for each frame in physical memory • Each entry consists of the address of the page stored in that memory location • With information about the process that owns that page • No need for one page table per process – only one table for the system • Use a hash function to improve the search
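The key inversion — the frame index is the answer, not the key — can be shown in a linear-search sketch (table contents are illustrative; a hashed index would replace the loop in practice, as the slide notes):

```python
def inverted_lookup(inverted_table, pid, page):
    # One entry per physical frame: (owning process id, page number)
    for frame, entry in enumerate(inverted_table):
        if entry == (pid, page):
            return frame            # the frame index is the translation
    return None                     # not in memory: page fault

table = [(1, 0), (2, 5), (1, 3)]    # frames 0, 1, 2 (made-up contents)
assert inverted_lookup(table, 1, 3) == 2
assert inverted_lookup(table, 2, 0) is None
```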

  40. Segmentation • Segmentation is a memory-management scheme similar to a user's view of memory • Memory is separated into segments such as “the code”, “the data”, “the subroutine library”, “the stack” • No real order or relation between them • Memory access is done by specifying two pieces of information: segment number and offset

  41. Segmentation (figure: user view vs. physical memory, segments 1–4)

  42. Segmentation Architecture • A logical address consists of <segment-number, offset> • The segment table maps it to a physical address • Segment base: starting physical address • Segment limit: length of the segment
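Base/limit translation per segment can be sketched as follows (the segment-table values follow the common textbook figure and are illustrative):

```python
def seg_translate(segment, offset, seg_table):
    base, limit = seg_table[segment]
    if offset >= limit:
        raise MemoryError("trap: offset beyond segment limit")
    return base + offset

# segment table: segment number -> (base, limit), illustrative values
seg_table = {0: (1400, 1000), 1: (6300, 400)}
assert seg_translate(0, 53, seg_table) == 1453
assert seg_translate(1, 399, seg_table) == 6699   # last legal byte of segment 1
```

Unlike paging, the hardware must compare against a per-segment limit before adding the base, since segments have variable lengths.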

  43. Segmentation Architecture

  44. Example: The Intel Pentium • Part of the Intel IA-32 architecture • Pentium combines segmentation and paging • CPU generates a logical address • Given to the segmentation unit, which produces a linear address • Given to the paging unit, which generates the physical address in main memory • Segmentation unit + paging unit = MMU

  45. Pentium Segmentation • Logical address space of a process divided in two partitions • Segments private to the process; information kept in the local descriptor table (LDT) • Segments shared among all processes; information stored in the global descriptor table (GDT) • Logical address is <selector, offset> • Selector is <segment, global/local, privilege level> • Converted to a linear address • CPU • Six segment registers (CS, DS, ES, SS, FS, GS) • Six 8-byte cache registers to hold the corresponding descriptors; no need to fetch them from memory each time

  46. Pentium Paging • Two-level paging scheme • 32-bit linear address • Page directory (outer page table) entry indexed by bits 22–31, points to an inner page table • Inner page table entry indexed by bits 12–21, points to a 4 KB page • Offset inside the page given by bits 0–11 • Gives the physical address • CPU • Page directory of the current process pointed to by the Page Directory Base Register, contained in the CR3 register

  47. Example: Linux on Pentium • Only uses six segments • Kernel code; kernel data; user code; user data; task state segment (TSS); default LDT segment • All processes share user code/data segments, but have their own TSS and may create a custom LDT segment • Linux only recognises two privilege levels, kernel (00) and user (11)

  48. Example: Linux on Pentium • Three-level paging scheme • Logical address is <global directory, middle directory, page table, offset> • But Pentium only has a two-level paging scheme! • On Pentium, Linux middle directory is 0 bits (i.e. non-existent)

  49. Review • What is the difference between external and internal fragmentation? • What is the difference between logical and physical memory? • Compare the memory allocation schemes of contiguous variable-size memory allocation, contiguous segmentation, and paging, in terms of external fragmentation, internal fragmentation, and memory sharing.

  50. Exercises • Read everything • If you have the “with Java” textbook, skip the Java sections and subtract 1 from the following section numbers • 8.1 • 8.2 • 8.3 • 8.5 • 8.6 • 8.9 • 8.11 • 8.13 • 8.14 • 8.15 • 8.17 • 8.23 • 8.25 • 8.26 • 8.27 • 8.28 • 8.29 • 8.32
