Lecture 9-1 Virtual Memory

Original notes by Prof. Mike Schulte. Presented by Pradondet Nilagupta, Spring 2001.

Virtual Memory • Virtual memory (VM) allows main memory (DRAM) to act like a cache for secondary storage (magnetic disk). • VM address translation provides a mapping from the virtual address of the processor to the physical address in main memory or on disk. • VM provides the following benefits: • Allows multiple programs to share the same physical memory • Allows programmers to write code as though they have a very large amount of main memory • Automatically handles bringing in data from disk • Cache terms vs. VM terms • Cache block => page or segment • Cache miss => page fault or address fault

Virtual Memory Basics • Programs reference “virtual” addresses in a non-existent memory • These are then translated into real “physical” addresses • Virtual address space may be bigger than physical address space • Divide physical memory into blocks, called pages • Page sizes range anywhere from 512 bytes to 16 MB (4 KB typical) • Virtual-to-physical translation by indexed table lookup • Add another cache for recent translations (the TLB) • Invisible to the programmer • Looks to your application like you have a lot of memory! • Anyone remember overlays?

VM: Page Mapping [Figure: pages from Process 1’s virtual address space and Process 2’s virtual address space map to page frames in physical memory or to disk.]

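VM: Address Translation [Figure: the virtual address is split into a 20-bit virtual page number and a 12-bit page offset (log2 of the page size). The page table base and the virtual page number select an entry in the per-process page table; each entry holds a valid bit, protection bits, a dirty bit, a reference bit, and the physical page number. The physical page number and the unchanged page offset together form the address sent to physical memory.]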
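The translation in the figure can be sketched in a few lines. This is a minimal illustration, assuming the 20-bit/12-bit split shown on the slide; the dict-based page table, the `PageFault` exception, and the function name `translate` are hypothetical and not taken from the lecture.

```python
PAGE_OFFSET_BITS = 12              # log2 of a 4 KB page, as on the slide
PAGE_SIZE = 1 << PAGE_OFFSET_BITS


class PageFault(Exception):
    """Raised when the referenced virtual page is not in physical memory."""


def translate(vaddr, page_table):
    """Translate a 32-bit virtual address to a physical address.

    page_table maps virtual page number -> (valid, physical page number).
    This layout is an assumption made for illustration only.
    """
    vpn = vaddr >> PAGE_OFFSET_BITS        # upper 20 bits
    offset = vaddr & (PAGE_SIZE - 1)       # lower 12 bits, passed through
    valid, ppn = page_table.get(vpn, (False, None))
    if not valid:
        raise PageFault(f"virtual page {vpn:#x} is not resident")
    return (ppn << PAGE_OFFSET_BITS) | offset


# Example: virtual page 0x12 lives in physical page 0xABC.
example_table = {0x12: (True, 0xABC)}
example_paddr = translate(0x00012345, example_table)
```

Only the page number is looked up; the offset is copied unchanged, which is why pages can be placed anywhere in physical memory.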

Typical Page Parameters • It’s a lot like what happens in a cache • But everything (except miss rate) is a LOT worse

Paging vs. Segmentation • Pages are fixed-size blocks • Segments vary from 1 byte to 2^32 bytes (for 32-bit addresses)

Cache and VM Parameters • How is virtual memory different from caches? • Software controls replacement - why? • Size of virtual memory determined by size of processor address • Disk is also used to store the file system - nonvolatile

Paged and Segmented VM (Figure 5.38, pg. 442) • Virtual memories can be categorized into two main classes • Paged memory: fixed-size blocks • Segmented memory: variable-size blocks

Paged vs. Segmented VM • Paged memory • Fixed sized blocks (4 KB to 64 KB) • One word per address (page number + page offset) • Easy to replace pages (all same size) • Internal fragmentation (not all of page is used) • Efficient disk traffic (optimize for page size) • Segmented memory • Variable sized blocks (up to 64 KB or 4GB) • Two words per address (segment + offset) • Difficult to replace segments (find where segment fits) • External fragmentation (unused portions of memory) • Inefficient disk traffic (may have small or large transfers) • Hybrid approaches • Paged segments: segments are a multiple of a page size • Multiple page sizes: (e.g., 8 KB, 64 KB, 512 KB, 4096 KB)

Pages are Cached in a Virtual Memory System • We can ask the same four questions we asked about caches • Q1: Block Placement • choice: lower miss rates and complex placement or vice versa • miss penalty is huge • so choose low miss rate ==> place page anywhere in physical memory • similar to fully associative cache model • Q2: Block Addressing - use additional data structure • fixed size pages - use a page table • virtual page number ==> physical page number and concatenate offset • tag bit to indicate presence in main memory

Normal Page Tables • Size is number of virtual pages • Purpose is to hold the translation of VPN to PPN • Permits ease of page relocation • Make sure to keep tags to indicate page is mapped • Potential problem: • Consider a 32-bit virtual address and 4 KB pages • 4 GB / 4 KB = 1M entries (1 MW) required just for the page table! • Might have to page in the page table… • Consider how the problem gets worse on 64-bit machines with even larger virtual address spaces! • Alpha has a 43-bit virtual address with 8 KB pages… • Might have multi-level page tables
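The size problem above is simple arithmetic, sketched here for the two cases the slide names. The entry sizes (4 bytes for the 32-bit case, 8 bytes for the Alpha case) are assumptions for illustration, as is the helper name `page_table_entries`.

```python
def page_table_entries(virtual_address_bits, page_size_bytes):
    """Number of entries in a flat (single-level) page table."""
    offset_bits = page_size_bytes.bit_length() - 1   # log2 of a power of two
    return 1 << (virtual_address_bits - offset_bits)


# 32-bit virtual address, 4 KB pages: 2**20 entries.
entries_32 = page_table_entries(32, 4 * 1024)
bytes_32 = entries_32 * 4          # assumed 4-byte entries -> 4 MB

# Alpha: 43-bit virtual address, 8 KB pages: 2**30 entries.
entries_alpha = page_table_entries(43, 8 * 1024)
bytes_alpha = entries_alpha * 8    # assumed 8-byte entries -> 8 GB

print(entries_32, bytes_32 // 2**20, "MB")
print(entries_alpha, bytes_alpha // 2**30, "GB")
```

A flat table of that size per process is what motivates paging the page table, multi-level tables, or inverted tables.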

Inverted Page Tables Similar to a set-associative mechanism • Make the page table reflect the # of physical pages (not virtual) • Use a hash mechanism • virtual page number ==> HPN index into inverted page table • Compare virtual page number with the tag to make sure it is the one you want • if yes • check to see that it is in memory - OK if yes - if not page fault • If not - miss • go to full page table on disk to get new entry • implies 2 disk accesses in the worst case • trades increased worst case penalty for decrease in capacity induced miss rate since there is now more room for real pages with smaller page table

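Inverted Page Table • Only store entries for pages in physical memory [Figure: the page number is hashed to index the inverted page table; if the selected entry is valid (V) and its page tag matches, the frame number is combined with the page offset.]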
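A rough sketch of an inverted page table lookup follows. The slides only say that a hash mechanism is used; the particular hash function, the linear probing on collisions, and the class and method names here are all assumptions chosen to keep the example short.

```python
class InvertedPageTable:
    """One entry per physical frame, tagged with (pid, vpn)."""

    def __init__(self, num_frames):
        self.num_frames = num_frames
        self.frames = [None] * num_frames   # frames[i] = (pid, vpn) or None

    def _hash(self, pid, vpn):
        # Hypothetical hash; real designs differ.
        return (vpn * 31 + pid) % self.num_frames

    def insert(self, pid, vpn):
        """Place a page in the first free frame at or after its hash slot."""
        start = self._hash(pid, vpn)
        for step in range(self.num_frames):
            frame = (start + step) % self.num_frames
            if self.frames[frame] is None:
                self.frames[frame] = (pid, vpn)
                return frame
        raise MemoryError("no free frame; a page must be replaced first")

    def lookup(self, pid, vpn):
        """Return the frame holding (pid, vpn), or None (go to the full table)."""
        start = self._hash(pid, vpn)
        for step in range(self.num_frames):
            frame = (start + step) % self.num_frames
            entry = self.frames[frame]
            if entry is None:               # empty slot ends the probe chain
                return None
            if entry == (pid, vpn):         # tag compare, as on the slide
                return frame
        return None
```

The table size depends on the number of physical frames, not on the size of the virtual address space. This sketch never removes entries; supporting eviction would need tombstones or chaining so that probe chains stay intact.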

Address Translation Reality • The translation process using page tables takes too long! • Use a cache to hold recent translations • Translation Lookaside Buffer • Typically 8-1024 entries • Block size same as a page table entry (1 or 2 words) • Only holds translations for pages in memory • 1 cycle hit time • Highly or fully associative • Miss rate < 1% • Miss goes to main memory (where the whole page table lives) • Must be purged on a process switch
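The TLB behaviour listed above can be sketched as a small fully associative cache in front of the page table. This is an illustration only: LRU replacement inside the TLB, the dict-based page table, and the names `TLB`, `lookup`, and `flush` are assumptions, not details from the lecture.

```python
from collections import OrderedDict


class TLB:
    """Fully associative translation cache with LRU replacement (assumed)."""

    def __init__(self, capacity=32):
        self.capacity = capacity
        self.entries = OrderedDict()    # vpn -> ppn, least recently used first
        self.hits = 0
        self.misses = 0

    def lookup(self, vpn, page_table):
        if vpn in self.entries:
            self.hits += 1
            self.entries.move_to_end(vpn)       # mark as most recently used
            return self.entries[vpn]
        self.misses += 1
        ppn = page_table[vpn]                   # slow path: page table in memory
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)    # evict least recently used
        self.entries[vpn] = ppn
        return ppn

    def flush(self):
        """Purge all translations, e.g. on a process switch."""
        self.entries.clear()


demo_tlb = TLB(capacity=2)
demo_page_table = {0: 10, 1: 11, 2: 12}
for vpn in (0, 1, 0, 2, 1):
    demo_tlb.lookup(vpn, demo_page_table)
```

With a capacity of 2, the access sequence 0, 1, 0, 2, 1 gives one hit and four misses; real TLBs see miss rates below 1% because programs reuse a small set of pages.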

Back to the 4 Questions • Q3: Block Replacement (pages in physical memory) • LRU is best • So use it to minimize the horrible miss penalty • However, real LRU is expensive • Page table contains a use tag • On access the use tag is set • OS checks them every so often, records what it sees, and resets them all • On a miss, the OS decides who has been used the least • Basic strategy: Miss penalty is so huge, you can spend a few OS cycles to help reduce the miss rate
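One common way to realize "check the use tags every so often, record what is seen, and reset them" is the aging approximation of LRU, sketched below. The slides do not name a specific algorithm, so treat this as one possible reading; the 8-bit history width and all names are assumptions.

```python
HISTORY_BITS = 8    # assumed width of the per-page usage history


class Page:
    def __init__(self, vpn):
        self.vpn = vpn
        self.use = False    # use tag, set on access
        self.history = 0    # recent samples, newest in the top bit


def touch(page):
    """Model a memory access: the use tag is set."""
    page.use = True


def sample(pages):
    """Periodic OS scan: record each use tag, then reset it."""
    for p in pages:
        p.history = (p.history >> 1) | (int(p.use) << (HISTORY_BITS - 1))
        p.use = False


def choose_victim(pages):
    """On a page fault, evict the page with the smallest usage history."""
    return min(pages, key=lambda p: p.history)
```

A page touched in recent sampling intervals has high-order history bits set and so compares larger than a page touched only long ago, which approximates LRU at the cost of a periodic scan.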

Last Question • Q4: Write Policy • Always write-back • Due to the access time of the disk • So, you need to keep tags to show when pages are dirty and need to be written back to disk when they’re swapped out. • Anything else is pretty silly • Remember – the disk is SLOW!

Page Sizes: An architectural choice • Large pages are good: • reduces page table size • amortizes the long disk access • if spatial locality is good then hit rate will improve • Large pages are bad: • more internal fragmentation • if everything is random, each structure’s last page is only half full • Half of bigger is still bigger • if there are 3 structures per process: text, heap, and control stack • then 1.5 pages are wasted for each process • process start-up time takes longer • since at least 1 page of each type is required prior to start • transfer time penalty aspect is higher
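The "1.5 pages wasted per process" estimate can be checked with a short calculation. The function names and the example sizes below are hypothetical; the half-page-per-structure figure is the slide's own assumption of random structure sizes.

```python
def expected_waste_bytes(page_size_bytes, structures_per_process=3):
    """Average internal fragmentation: half a page per structure."""
    return structures_per_process * page_size_bytes / 2


def actual_waste_bytes(structure_size_bytes, page_size_bytes):
    """Unused bytes in the last page of one structure."""
    return (-structure_size_bytes) % page_size_bytes


for page_size in (4 * 1024, 64 * 1024):
    print(page_size, expected_waste_bytes(page_size))

# A 10,000-byte structure in 4 KB pages needs 3 pages and wastes 2,288 bytes.
print(actual_waste_bytes(10_000, 4 * 1024))
```

Going from 4 KB to 64 KB pages raises the expected waste per process from 6 KB to 96 KB: half of bigger is still bigger.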

More on TLBs • The TLB must be on chip • otherwise it is worthless • small TLB’s are worthless anyway • large TLB’s are expensive • high associativity is likely • ==> Price of CPU’s is going up! • OK as long as performance goes up faster

Address Translation with Page Table (Figure 5.40, pg. 444) • A page table translates a virtual page number into a physical page number • The page offset remains unchanged • Page tables are large • 32-bit virtual address • 4 KB page size • 2^20 4-byte table entries = 4 MB • Page tables are stored in main memory => slow • Cache table entries in a translation buffer

Fast Address Translation with Translation Buffer (TB) (Figure 5.41, pg. 446) • Cache translated addresses in TB • Alpha 21064 data TB • 32 entries • fully associative • 30-bit tag • 21-bit physical address • Valid and read/write bits • Separate TB for instr. • Steps in translation • compare page no. to tags • check for memory access violation • send physical page no. of matching tag • combine physical page no. and page offset

Selecting a Page Size • Reasons for larger page size • Page table size is inversely proportional to the page size; therefore memory is saved • Fast cache hit time is easy when cache size < page size (VA caches); a bigger page makes this feasible as cache size grows • Transferring larger pages to or from secondary storage, possibly over a network, is more efficient • The number of TLB entries is restricted by clock cycle time, so a larger page size maps more memory, thereby reducing TLB misses • Reasons for a smaller page size • Want to avoid internal fragmentation: don’t waste storage; data must be contiguous within page • Quicker process start for small processes - don’t need to bring in more memory than needed

Memory Protection • With multiprogramming, a computer is shared by several programs or processes running concurrently • Need to provide protection • Need to allow sharing • Mechanisms for providing protection • Provide base and bound registers: Base ≤ Address ≤ Bound • Provide both user and supervisor (operating system) modes • Provide CPU state that the user can read, but cannot write • Base and bound registers, user/supervisor bit, exception bits • Provide method to go from user to supervisor mode and vice versa • system call: user to supervisor • system return: supervisor to user • Provide permissions for each page or segment in memory
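The base and bound check is a single range comparison, sketched below. This is a simplified model: the exception name, the function name, and the rule that only supervisor mode may change the registers are written out here as illustrative assumptions consistent with the bullets above.

```python
class ProtectionFault(Exception):
    """Raised when an access or privileged operation is not permitted."""


def check_access(address, base, bound):
    """Allow the access only if base <= address <= bound."""
    if not (base <= address <= bound):
        raise ProtectionFault(f"address {address:#x} outside [{base:#x}, {bound:#x}]")
    return address


def set_bounds(registers, new_base, new_bound, supervisor_mode):
    """Users may read the registers, but only supervisor mode may write them."""
    if not supervisor_mode:
        raise ProtectionFault("base/bound registers are read-only in user mode")
    registers["base"], registers["bound"] = new_base, new_bound


regs = {"base": 0x4000, "bound": 0x7FFF}
check_access(0x5000, regs["base"], regs["bound"])    # inside the range: allowed
```

Keeping the registers writable only in supervisor mode is what prevents a user program from simply widening its own range.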

Alpha VM Mapping (Figure 5.43, pg. 451) • “64-bit” address divided into 3 segments • seg0 (bit 63 = 0): user code • seg1 (bit 63 = 1, bit 62 = 1): user stack • kseg (bit 63 = 1, bit 62 = 0): kernel segment for OS • Three-level page table, each level one page in size • Reduces page table size • Increases translation time • PTE bits: valid, kernel and user read and write enable
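A three-level table in which each level fits in one page implies a fixed split of the virtual address. The sketch below assumes 8 KB pages and a 43-bit virtual address (both stated elsewhere in this lecture) and 8-byte page table entries (an assumption), which gives 1024 entries per level and therefore 10 index bits per level. The function name is hypothetical.

```python
PAGE_OFFSET_BITS = 13                   # log2 of an 8 KB page
PTE_BYTES = 8                           # assumed entry size
ENTRIES_PER_LEVEL = (1 << PAGE_OFFSET_BITS) // PTE_BYTES    # 1024
LEVEL_BITS = ENTRIES_PER_LEVEL.bit_length() - 1             # 10


def split_virtual_address(vaddr):
    """Return (level1, level2, level3, offset) indices for a 43-bit address."""
    mask = (1 << LEVEL_BITS) - 1
    offset = vaddr & ((1 << PAGE_OFFSET_BITS) - 1)
    vpn = vaddr >> PAGE_OFFSET_BITS
    level3 = vpn & mask
    level2 = (vpn >> LEVEL_BITS) & mask
    level1 = (vpn >> (2 * LEVEL_BITS)) & mask
    return level1, level2, level3, offset


# 3 levels x 10 bits + 13 offset bits = 43 virtual address bits.
total_bits = 3 * LEVEL_BITS + PAGE_OFFSET_BITS
```

Each level costs one extra memory reference on a TB miss, which is the "increases translation time" trade-off noted above.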

Alpha 21064 Memory Hierarchy • The Alpha 21064 memory hierarchy includes • A 32-entry, fully associative data TB • A 12-entry, fully associative instruction TB • An 8 KB direct-mapped physically addressed data cache • An 8 KB direct-mapped physically addressed instruction cache • A 4-entry by 64-bit instruction prefetch stream buffer • A 4-entry by 256-bit write buffer • A 2 MB direct-mapped second-level unified cache • The virtual memory • Maps a 43-bit virtual address to a 34-bit physical address • Has a page size of 8 KB

Alpha Memory Performance: Miss Rates [Figure: miss rates for the 8 KB instruction cache, the 8 KB data cache, and the 2 MB second-level cache.]

Alpha CPI Components • Largest increase in CPI due to • I stall: Instruction stalls from branch mispredictions • Other: data hazards, structural hazards

Pitfall: Address space too small • One of the biggest mistakes that can be made when designing an architecture is to devote too few bits to the address • address size limits the size of virtual memory • difficult to change since many components depend on it (e.g., PC, registers, effective-address calculations) • As program size increases, larger and larger address sizes are needed • 8 bit: Intel 8080 (1975) • 16 bit: Intel 8086 (1978) • 24 bit: Intel 80286 (1982) • 32 bit: Intel 80386 (1985) • 64 bit: Intel Merced (1998)

Pitfall: Predicting Cache Performance of One Program from Another Program • 4 KB data cache miss rate: 8%, 12%, or 28%? • 1 KB instruction cache miss rate: 0%, 3%, or 10%? • Alpha vs. MIPS for 8 KB data cache: 17% vs. 10%

Pitfall: Simulating Too Small an Address Trace

Virtual Memory Summary • Virtual memory (VM) allows main memory (DRAM) to act like a cache for secondary storage (magnetic disk). • The large miss penalty of virtual memory leads to different strategies from caches • Fully associative, TB + PT, LRU, write-back • Designed as • paged: fixed-size blocks • segmented: variable-size blocks • hybrid: segmented paging or multiple page sizes • Avoid small address size

Summary 2: Typical Choices
