Virtual Memory: Motivation Historically, there were two major motivations for virtual memory: to allow efficient and safe sharing of memory among multiple programs, and to remove the programming burden of a small, limited amount of main memory. [Patt&Henn 04] "…a system has been devised to make the core drum combination appear to the programmer as a single level store, the requisite transfers taking place automatically" (Kilburn et al.)
[Figure: main processor, memory management unit, high-speed cache, main memory, and backing store, connected by data, control, logical address, and physical address paths]
So: the purpose of VM • provide sharing • automatically manage the M hierarchy (as "one-level") • simplify loading (for relocation)
Structure of Virtual Memory from Processor to Memory
[Figure: a virtual address enters the address translator, which yields either a physical address or a page fault; page faults are handled in software by an elaborate page fault handling algorithm]
A Paging System
[Figure, panels (a) and (b): a 64K virtual address space and a 32K main memory, each divided into 4K units, labeled by virtual address and main memory address respectively]
Page Table
[Figure: the page table maps each virtual page to a page frame in main memory. Present bit: 1 = present in main memory, 0 = not present in main memory]
[Figure: the virtual page number indexes the page table, whose entries point to physical memory or to disk storage]
The page table maps each page in virtual memory to either a page in physical memory or a page stored on disk, which is the next level in the hierarchy.
Technology
Technology      Access time         $ per GB in 2004
SRAM            0.5 - 5 ns          $4,000 - 10,000
DRAM            50 - 70 ns          $100 - 200
Magnetic disk   5 - 20 x 10^6 ns    $0.5 - 2
Typical ranges of parameters for virtual memory. These figures, contrasted with the values for caches, represent increases of 10 to 100,000 times.
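Virtual Address Mapping
[Figure: the virtual address is split into a page number and a displacement (address within page). The page number selects a page map entry holding the base address of the page; the displacement locates the word within the page in memory]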
Terminology • Page • Page fault • Virtual address • Physical address • Memory mapping or address translation
VM Simplifies Loading • VM provides the relocation function • Address mapping allows programs to be loaded at any location in physical M • Under VM, relocation does not need special OS + hardware support as in the past
Address Translation Consideration • Direct mapping using register sets • Indirect mapping using tables • Associative mapping of frequently used pages
The Page Table (PT) must have one entry for each page in virtual memory! How many Pages? How large is PT?
4 Key Design Decisions in VM Design • Pages should be large enough to amortize the high access time. (Sizes from 4 KB to 16 KB are typical, and some designers are considering sizes as large as 64 KB.) • Organizations that reduce the page fault rate are attractive. The primary technique used here is to allow flexible placement of pages. (e.g. fully associative)
4 Key Design Decisions in VM Design (cont'd) • Page faults (misses) in a virtual memory system can be handled in software, because the overhead will be small compared to the access time to disk. Furthermore, the software can afford to use clever algorithms for choosing how to place pages, because even small reductions in the miss rate will pay for the cost of such algorithms. • Using write-through to manage writes in virtual memory will not work since writes take too long. Instead, we need a scheme that reduces the number of disk writes.
What happens on a write? • Write-through to secondary storage is impractical for VM • Write-back is used: • advantages (reduces the number of writes to disk, amortizes the cost) • a dirty bit marks pages that have been modified
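The write-back policy can be illustrated with a minimal sketch. The class and method names (`Frame`, `WriteBackMemory`, `load`, `write`, `evict`) are hypothetical and chosen only for illustration; the point is that many stores to a page cost at most one disk write, and a clean page costs none.

```python
class Frame:
    """One resident page; `dirty` records whether it was written since load."""
    def __init__(self, vpn):
        self.vpn = vpn
        self.dirty = False


class WriteBackMemory:
    """Toy model of main memory acting as a write-back cache for disk."""
    def __init__(self):
        self.frames = {}       # virtual page number -> Frame
        self.disk_writes = 0   # pages copied back to disk so far

    def load(self, vpn):
        self.frames[vpn] = Frame(vpn)

    def write(self, vpn):
        # A store touches main memory only and sets the dirty bit.
        self.frames[vpn].dirty = True

    def evict(self, vpn):
        # Only a dirty page has to be copied back to disk on replacement.
        frame = self.frames.pop(vpn)
        if frame.dirty:
            self.disk_writes += 1


mem = WriteBackMemory()
mem.load(0)
mem.load(1)
for _ in range(1000):
    mem.write(0)       # 1000 stores to page 0, none to page 1
mem.evict(0)           # dirty: one disk write
mem.evict(1)           # clean: no disk write
print(mem.disk_writes)
```

With write-through, the same 1000 stores would each have gone to disk.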
Page Size Selection Constraints • Efficiency of secondary memory device • Page table size • Fragmentation (internal) (last part of last page) • Program logic structure (logic block size: < 1K ~ 4K) • Table fragmentation [Kai, P68] (PT occupies some space)
Page Size Selection • PT size • Miss ratio • Efficiency of PT transfer from disk to memory • Internal fragmentation: text, heap, and stack each waste half a page on average, so 3 x 0.5 = 1.5 times a page size per process! • Start-up time of a process - the smaller the faster!
An Example
Case 1
VM page size: 512
VM address space: 64K
Total virtual pages = 64K / 512 = 128 pages
Case 2
VM page size: 512 = 2^9
VM address space: 4G = 2^32
Total virtual pages = 4G / 512 = 8M pages
If each PTE has 32 bits, the total PT size is 8M x 4 = 32M bytes.
Note: assuming main memory holds a working set of about 4M bytes, that is 4M / 512 = 2^22 / 2^9 = 2^13 = 8192 frames.
How about a VM address space of 2^52 (R-6000) (4 petabytes) with a page size of 4K bytes?
Total number of virtual pages: 2^52 / 2^12 = 2^40 pages!
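The arithmetic in these cases can be checked with a short sketch. The function name `page_table_size` is hypothetical, and the 4-byte PTE is stated only for Case 2 in the slides; applying it to the other two cases is an assumption.

```python
def page_table_size(address_bits, page_bytes, pte_bytes=4):
    """Return (number of virtual pages, flat page table size in bytes).

    Assumes one PTE per virtual page and a power-of-two page size.
    """
    pages = 2 ** address_bits // page_bytes
    return pages, pages * pte_bytes


# Case 1: 64K address space, 512-byte pages
print(page_table_size(16, 512))    # (128, 512)

# Case 2: 4G address space, 512-byte pages -> 8M pages, 32M bytes of PT
print(page_table_size(32, 512))    # (8388608, 33554432)

# 2^52 address space, 4K pages -> 2^40 pages
pages, pt_bytes = page_table_size(52, 4096)
print(pages == 2 ** 40, pt_bytes == 2 ** 42)   # True True
```

A flat table for the 2^52 case would itself need 2^42 bytes under the 4-byte assumption, which is why the size-reduction techniques matter.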
Techniques for Reducing PT Size • Set a lower limit, and permit dynamic growth • Permit growth from both directions • Inverted page table (a hash table) • Multi-level page table (segments and pages) • PT itself can be paged, i.e., put the PT itself in virtual address space (Note: some small portion of pages should be in main memory and never paged out)
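Two-level Address Mapping
[Figure: the virtual address is split into an 11-bit segment number, an 11-bit page number, and a 10-bit displacement (address within page). The base of the segment table plus the segment number selects one of segment table entries 0 to 2047, which holds the base address of a page table. The page number selects one of page table entries 0 to 2047, which holds the base address of the page. The displacement selects one of locations Base + 0 to Base + 1023 in the page in memory]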
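The two-level mapping can be sketched as follows, using the 11/11/10-bit split from the figure. The table representation (nested dictionaries) and the names `split` and `translate` are assumptions made for illustration; real hardware indexes arrays in memory.

```python
SEG_BITS, PAGE_BITS, DISP_BITS = 11, 11, 10   # field widths from the figure


def split(va):
    """Split a 32-bit virtual address into (segment, page, displacement)."""
    disp = va & ((1 << DISP_BITS) - 1)
    page = (va >> DISP_BITS) & ((1 << PAGE_BITS) - 1)
    seg = (va >> (DISP_BITS + PAGE_BITS)) & ((1 << SEG_BITS) - 1)
    return seg, page, disp


def translate(va, segment_table):
    """Walk segment table -> page table -> page base, then add displacement."""
    seg, page, disp = split(va)
    page_table = segment_table[seg]   # level 1: base of the page table
    page_base = page_table[page]      # level 2: base address of the page
    return page_base + disp


# Hypothetical mapping: segment 3, page 5 lives at physical base 0x4000.
segment_table = {3: {5: 0x4000}}
va = (3 << 21) | (5 << 10) | 0x2A
print(split(va))                          # (3, 5, 42)
print(hex(translate(va, segment_table)))  # 0x402a
```

Only page tables for segments actually in use need to exist, which is how this scheme reduces PT size.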
Placement: OS designers always pick lower miss rates over a simpler placement algorithm • So, "full associativity": VM pages can go anywhere in main M (compare with sector cache) • Question: why not use associative hardware? (# of PT entries too big!)
VM: Implementation Issues • Page fault handling • Translation lookaside buffer (TLB) • Protection issues
Fast Address Translation • With the PT in M, every memory reference involves at least two accesses of M: one to fetch the PTE and one for the data itself. Improvement: • Store PT in fast registers (Example: Xerox: 256 R ?) • TLB; for multiprogramming, the pid should be stored as part of the tags in the TLB.
Page Fault Handling • When a virtual page number is not in the TLB, the PT in M is accessed (through the PTBR) to find the PTE • If the PTE indicates that the page is missing, a page fault occurs • Context switch!
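The lookup order described above (TLB first, then the page table, then a page fault) can be sketched as below. The dictionary-based TLB and page table, the `PageFault` exception, and the `lookup` function are hypothetical stand-ins; in a real system the fault is raised by hardware and serviced by the OS.

```python
class PageFault(Exception):
    """Raised when the PTE says the page is not in main memory."""


def lookup(vpn, tlb, page_table):
    """Return the physical page number for vpn, refilling the TLB on a miss."""
    if vpn in tlb:
        return tlb[vpn]                 # TLB hit: no page table access
    pte = page_table.get(vpn)           # TLB miss: read the PTE from memory
    if pte is None or not pte["valid"]:
        raise PageFault(vpn)            # page missing: the OS takes over
    tlb[vpn] = pte["ppn"]               # refill the TLB for next time
    return pte["ppn"]


tlb = {}
page_table = {
    1: {"valid": True, "ppn": 7},       # resident page
    2: {"valid": False, "ppn": None},   # page currently on disk
}
print(lookup(1, tlb, page_table))       # 7 (TLB miss, then refill)
print(lookup(1, tlb, page_table))       # 7 (TLB hit)
try:
    lookup(2, tlb, page_table)
except PageFault as fault:
    print("page fault on virtual page", fault.args[0])
```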
[Figure: virtual page number, TLB, page table, physical memory, and disk storage]
The TLB acts as a cache on the page table for the entries that map to physical pages only
Some typical values for a TLB might be: • Miss penalty: may sometimes be as high as 100 cycles • TLB size: can be as small as 16 entries
TLB Design • Placement policy: • Small TLBs: full associativity can be used • Large TLBs: full associativity may be too slow • Replacement policy: sometimes even a random policy is used for speed/simplicity
Processing a read or a write through the DECStation 3100 TLB and cache
[Flowchart: virtual address -> TLB access -> TLB hit? No: TLB miss exception. Yes: write? No (read): try to read data from cache -> cache hit? No: cache miss stall. Yes (write): check protection; if the write is allowed, write data into cache, update the dirty bit, and put the data and the address into the write buffer]
Virtual to real address translation using page map (PME = page map entry)
[Figure: translation flow from the virtual address, requested access type, and pid through the TLB and page map to the physical address, with a page fault exit (replacement policy) and an access fault exit (operation validation). Fields shown: ip, iw, S/U, RWX, pid, M, C, P, page frame address in memory (PFA), PFA in S.M.]
If s/u = 1: supervisor mode
PME(x) * C = 1: page PFA modified
PME(x) * P = 1: page is private to process
PME(x) * pid: process identification number
PME(x) * PFA: page frame address
Translation Lookaside Buffer • TLB miss rate is low (Clark-Emer data: 3~4 times smaller than the usual cache miss ratio) • On a TLB miss, the penalty is relatively low (a TLB miss usually results in a cache fetch)
cont'd • TLB miss implies higher miss rate for the main cache • TLB translation is process-dependent • Strategies for context switching: 1. tagging by context 2. flushing: complete purge by context (shared) • No absolute answer
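The two context-switch strategies can be contrasted with a small sketch. Both classes and their method names are hypothetical; each TLB is modeled as an unbounded dictionary, so capacity and replacement are ignored.

```python
class TaggedTLB:
    """Strategy 1: entries are tagged by (pid, vpn); nothing is purged."""
    def __init__(self):
        self.entries = {}

    def insert(self, pid, vpn, ppn):
        self.entries[(pid, vpn)] = ppn

    def lookup(self, pid, vpn):
        return self.entries.get((pid, vpn))   # None means TLB miss

    def context_switch(self, new_pid):
        pass                                  # old entries stay usable


class FlushingTLB:
    """Strategy 2: entries carry no pid, so every switch purges the TLB."""
    def __init__(self):
        self.entries = {}

    def insert(self, pid, vpn, ppn):
        self.entries[vpn] = ppn

    def lookup(self, pid, vpn):
        return self.entries.get(vpn)

    def context_switch(self, new_pid):
        self.entries.clear()                  # complete purge


for tlb in (TaggedTLB(), FlushingTLB()):
    tlb.insert(1, 0x10, 0x99)    # process 1 maps page 0x10
    tlb.context_switch(2)        # process 2 runs for a while
    tlb.context_switch(1)        # back to process 1
    print(type(tlb).__name__, tlb.lookup(1, 0x10))
```

Tagging keeps process 1's translation across the switch at the cost of wider tags; flushing is simpler but forces process 1 to miss again, which is one reason there is no absolute answer.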
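A Case Study: DECStation 3100
[Figure: the 32-bit virtual address (bits 31 to 0) is split into a 20-bit virtual page number and a 12-bit page offset. The TLB (fields: valid, dirty, tag, physical page number) compares the virtual page number against its tags to signal a TLB hit and supply a 20-bit physical page number. The physical address is split into a 16-bit tag, a 14-bit index, and a 2-bit byte offset for the cache (fields: valid, tag, data), which signals a cache hit and returns 32 bits of data]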
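The field widths in this case study can be made concrete with a short sketch. The function names and the example physical page number `0xABCDE` are hypothetical; only the bit widths (20 + 12 for the virtual address, 16 + 14 + 2 for the physical address) come from the figure.

```python
def split_virtual(va):
    """32-bit VA -> (20-bit virtual page number, 12-bit page offset)."""
    return va >> 12, va & 0xFFF


def split_physical(pa):
    """32-bit PA -> (16-bit cache tag, 14-bit index, 2-bit byte offset)."""
    byte_offset = pa & 0x3
    index = (pa >> 2) & 0x3FFF
    tag = (pa >> 16) & 0xFFFF
    return tag, index, byte_offset


va = 0x12345678
vpn, offset = split_virtual(va)
print(hex(vpn), hex(offset))            # 0x12345 0x678

ppn = 0xABCDE                           # assumed TLB result for this VPN
pa = (ppn << 12) | offset               # the page offset is not translated
tag, index, byte_offset = split_physical(pa)
print(hex(pa), hex(tag), hex(index), byte_offset)   # 0xabcde678 0xabcd 0x399e 0
```

Note that the 14-bit index plus 2-bit byte offset reach above the 12-bit page offset, so in this design the cache index depends on translated bits.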
Review: The Memory Hierarchy • Take advantage of the principle of locality to present the user with as much memory as is available in the cheapest technology at the speed offered by the fastest technology
[Figure: Processor, L1$, L2$, Main Memory, Secondary Memory, in order of increasing distance from the processor in access time and increasing (relative) size of the memory at each level. Transfer units, listed in the same order: 4-8 bytes (word), 8-32 bytes (block), 1 to 4 blocks, 1,024+ bytes (disk sector = page)]
Inclusive: what is in L1$ is a subset of what is in L2$, which is a subset of what is in MM, which is a subset of what is in SM
Virtual Memory • Use main memory as a “cache” for secondary memory • Allows efficient and safe sharing of memory among multiple programs • Provides the ability to easily run programs larger than the size of physical memory • Simplifies loading a program for execution by providing for code relocation (i.e., the code can be loaded anywhere in main memory) • What makes it work? – again the Principle of Locality • A program is likely to access a relatively small portion of its address space during any period of time • Each program is compiled into its own address space – a “virtual” address space • During run-time each virtual address must be translated to a physical address (an address in main memory)
Two Programs Sharing Physical Memory • A program's address space is divided into pages (all one fixed size) or segments (variable sizes) • The starting location of each page (either in main memory or in secondary memory) is contained in the program's page table
[Figure: Program 1 virtual address space and Program 2 virtual address space, both mapped into main memory]
Address Translation • A virtual address is translated to a physical address by a combination of hardware and software • So each memory request first requires an address translation from the virtual space to the physical space • A virtual memory miss (i.e., when the page is not in physical memory) is called a page fault
[Figure: Virtual Address (VA): bits 31-12 virtual page number, bits 11-0 page offset. Translation maps the virtual page number to a physical page number. Physical Address (PA): bits 29-12 physical page number, bits 11-0 page offset]
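Address Translation Mechanisms
[Figure: the virtual page # indexes the page table (in main memory); each entry holds a valid bit (V) and a physical page base address. Entries with V = 1 point into main memory and supply the physical page #, which is joined with the unchanged offset; entries with V = 0 refer to disk storage]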
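A single-level version of this mechanism can be sketched as below, assuming 4 KB pages (a 12-bit offset). The page table is modeled as a dictionary of `(valid, ppn)` pairs, and `translate` and `PageFault` are hypothetical names.

```python
PAGE_OFFSET_BITS = 12          # assumes 4 KB pages


class PageFault(Exception):
    """The valid bit is 0: the page is on disk, not in main memory."""


def translate(va, page_table):
    """Replace the virtual page # with the physical page #, keep the offset."""
    vpn = va >> PAGE_OFFSET_BITS
    offset = va & ((1 << PAGE_OFFSET_BITS) - 1)
    valid, ppn = page_table[vpn]
    if not valid:
        raise PageFault(vpn)
    return (ppn << PAGE_OFFSET_BITS) | offset


page_table = {
    0x00000: (1, 0x00042),     # virtual page 0 -> physical page 0x42
    0x00001: (0, None),        # virtual page 1 is on disk
}
print(hex(translate(0x00000ABC, page_table)))   # 0x42abc
```

Only the page number changes; the offset within the page passes through untranslated.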
Virtual Addressing with a Cache
[Figure: CPU -> (VA) translation -> (PA) cache; on a hit, data returns to the CPU; on a miss, main memory is accessed]
• Thus it takes an extra memory access to translate a VA to a PA • This makes memory (cache) accesses very expensive (if every access was really two accesses) • The hardware fix is to use a Translation Lookaside Buffer (TLB): a small cache that keeps track of recently used address mappings to avoid having to do a page table lookup
Making Address Translation Fast
[Figure: the virtual page # is first looked up in the TLB (fields: V, tag, physical page base addr). The page table (in physical memory; fields: V, physical page base addr) backs the TLB, with entries pointing to main memory or to disk storage]
Translation Lookaside Buffers (TLBs) • Just like any other cache, the TLB can be organized as fully associative, set associative, or direct mapped
[Figure: TLB entry fields: V, Virtual Page #, Physical Page #, Dirty, Ref, Access]
• TLB access time is typically smaller than cache access time (because TLBs are much smaller than caches) • TLBs are typically not more than 128 to 256 entries even on high end machines
A TLB in the Memory Hierarchy
[Figure: CPU -> (VA) TLB lookup -> on a hit, (PA) cache -> on a miss, main memory; a TLB miss goes through translation first. Timing labels in the figure: ¾ t and ¼ t]
• A TLB miss: is it a page fault or merely a TLB miss? • If the page is loaded into main memory, then the TLB miss can be handled (in hardware or software) by loading the translation information from the page table into the TLB • Takes 10's of cycles to find and load the translation info into the TLB • If the page is not in main memory, then it's a true page fault • Takes 1,000,000's of cycles to service a page fault • TLB misses are much more frequent than true page faults
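The relative weight of the two kinds of miss can be estimated with a back-of-the-envelope sketch. The function name, the default penalties (30 and 1,000,000 cycles), and the example rates are all assumptions chosen to match the orders of magnitude on the slide, not measured values.

```python
def average_overhead(tlb_miss_rate, page_fault_rate,
                     tlb_miss_cycles=30, page_fault_cycles=1_000_000):
    """Expected extra cycles per memory reference spent on translation misses.

    Rates are per memory reference; penalties are assumed values in the
    'tens of cycles' and 'millions of cycles' ranges.
    """
    return (tlb_miss_rate * tlb_miss_cycles
            + page_fault_rate * page_fault_cycles)


# Assumed rates: 1 TLB miss per 100 references, 1 page fault per million.
tlb_part = average_overhead(0.01, 0.0)
fault_part = average_overhead(0.0, 0.000001)
print(tlb_part, fault_part)
```

Even though TLB misses are 10,000 times more frequent in this example, page faults still contribute the larger share of the overhead.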
TLB Event Combinations
TLB    Page table   Cache      Possible? Under what circumstances?
Hit    Hit          Hit        Yes: what we want!
Hit    Hit          Miss       Yes: although the page table is not checked if the TLB hits
Miss   Hit          Hit        Yes: TLB miss, PA in page table
Miss   Hit          Miss       Yes: TLB miss, PA in page table, but data not in cache
Miss   Miss         Miss       Yes: page fault
Hit    Miss         Miss/Hit   Impossible: TLB translation not possible if page is not present in memory
Miss   Miss         Hit        Impossible: data not allowed in cache if page is not in memory
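The two "impossible" rules can be turned into a small check that enumerates every combination. This is a sketch under the assumptions stated on the slide (a TLB entry and cached data both require the page to be in memory); the function name `is_possible` is hypothetical, and a page table "hit" is modeled as `page_in_memory`.

```python
from itertools import product


def is_possible(tlb_hit, page_in_memory, cache_hit):
    """Apply the two rules that rule out a TLB/page table/cache combination."""
    if tlb_hit and not page_in_memory:
        return False    # no TLB translation for a page that is not resident
    if cache_hit and not page_in_memory:
        return False    # data not allowed in cache if page is not in memory
    return True


def label(hit):
    return "hit" if hit else "miss"


for tlb_hit, page_in_memory, cache_hit in product([True, False], repeat=3):
    verdict = "possible" if is_possible(tlb_hit, page_in_memory, cache_hit) else "impossible"
    print(f"TLB {label(tlb_hit):4}  PT {label(page_in_memory):4}  "
          f"cache {label(cache_hit):4}  {verdict}")
```

Of the eight combinations, five are possible and three are impossible.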
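Reducing Translation Time • Can overlap the cache access with the TLB access • Works when the high order bits of the VA are used to access the TLB while the low order bits are used as index into cache
[Figure: 2-way associative cache accessed in parallel with the TLB. The VA tag goes to the TLB, which returns the PA tag on a TLB hit; the index and block offset select a set; the two stored tags are compared (=) with the PA tag to signal a cache hit and select the desired word]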
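Whether the overlap works for a given cache can be checked with a short sketch: the index and block offset bits must all come from the untranslated page offset. The function names are hypothetical, and all sizes are assumed to be powers of two.

```python
def untranslated_bits(page_bytes):
    """Number of page offset bits, which are identical in the VA and the PA."""
    return page_bytes.bit_length() - 1        # log2 for powers of two


def cache_lookup_bits(cache_bytes, associativity):
    """Index bits plus block offset bits needed to select a set and a byte."""
    return (cache_bytes // associativity).bit_length() - 1


def can_overlap(cache_bytes, associativity, page_bytes):
    """True if the cache can be indexed before the TLB result is available."""
    return cache_lookup_bits(cache_bytes, associativity) <= untranslated_bits(page_bytes)


print(can_overlap(8 * 1024, 2, 4096))    # True: 12 lookup bits, 12 offset bits
print(can_overlap(64 * 1024, 1, 4096))   # False: 16 lookup bits, 12 offset bits
print(can_overlap(32 * 1024, 8, 4096))   # True: higher associativity helps
```

The design consequence is that a larger overlapped cache needs either more associativity or larger pages.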
Why Not a Virtually Addressed Cache?
[Figure: CPU -> (VA) cache; on a hit, data returns to the CPU; on a miss, translation -> (PA) main memory]
• A virtually addressed cache would only require address translation on cache misses, but • Two different virtual addresses can map to the same physical address (when processes are sharing data), i.e., two different cache entries hold data for the same physical address; these are called synonyms • Must update all cache entries with the same physical address or the memory becomes inconsistent