Virtual Memory
Announcements • TA Swap • Tyler Steele is a 415 TA • Rohit Doshi is a 414 TA • My office hours today: 3:25-4:25
Review: Memory Hierarchy of a Modern Computer System
[Figure: hierarchy from the processor (datapath, registers, on-chip cache) through the second-level cache (SRAM) and main memory (DRAM) down to secondary storage (disk) and tertiary storage (tape)]
• Registers: ~1 ns, 100s of bytes
• Caches (on-chip and second-level SRAM): 10s-100s ns, KBs-MBs
• Main memory (DRAM): 100s ns, MBs
• Secondary storage (disk): 10s ms, GBs
• Tertiary storage (tape): 10s sec, TBs
• Take advantage of the principle of locality to:
  • Present as much memory as in the cheapest technology
  • Provide access at the speed offered by the fastest technology
Review: A Summary of Sources of Cache Misses
• Compulsory (cold start): first reference to a block
  • "Cold" fact of life: not a whole lot you can do about it
  • Note: when running billions of instructions, compulsory misses are insignificant
• Capacity:
  • Cache cannot contain all blocks accessed by the program
  • Solution: increase cache size
• Conflict (collision):
  • Multiple memory locations mapped to the same cache location
  • Solutions: increase cache size, or increase associativity
• Two others:
  • Coherence (invalidation): another process (e.g., I/O) updates memory
  • Policy: due to a non-optimal replacement policy
Review: Where Does a Block Get Placed in a Cache?
• Example: block 12 from a 32-block address space placed in an 8-block cache
• Direct mapped: block 12 can go only into block 4 (12 mod 8)
• Set associative (2-way, 4 sets): block 12 can go anywhere in set 0 (12 mod 4)
• Fully associative: block 12 can go anywhere
[Figure: the three cache organizations, blocks numbered 0-7; the set-associative cache is divided into sets 0-3]
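The placement rules above reduce to simple modular arithmetic. A minimal sketch (not from the slides; the function name and parameters are illustrative, with defaults matching the slide's example):

```python
# Where a block may be placed under each cache organization.
def placement(block, num_blocks=8, num_sets=4):
    return {
        "direct_mapped": block % num_blocks,   # exactly one slot
        "set_associative": block % num_sets,   # any way within this set
        "fully_associative": "any slot",       # no restriction
    }

print(placement(12))
# {'direct_mapped': 4, 'set_associative': 0, 'fully_associative': 'any slot'}
```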
Review: Other Caching Questions
• What line gets replaced on a cache miss?
  • Easy for direct mapped: only one possibility
  • Set associative or fully associative:
    • Random
    • LRU (Least Recently Used)
• What happens on a write?
  • Write through: the information is written to both the cache and the block in the lower-level memory
  • Write back: the information is written only to the block in the cache
    • A modified cache block is written to main memory only when it is replaced
    • Question: is the block clean or dirty?
Caching Applied to Address Translation
[Figure: the CPU issues a virtual address; if it is cached in the TLB, the physical address goes straight to physical memory for the data read or write (untranslated); otherwise the MMU translates, the result is saved in the TLB, and the access proceeds]
• Question is one of page locality: does it exist?
  • Instruction accesses spend a lot of time on the same page (since accesses are sequential)
  • Stack accesses have definite locality of reference
  • Data accesses have less page locality, but still some...
• Can we have a TLB hierarchy?
  • Sure: multiple levels at different sizes/speeds
What Actually Happens on a TLB Miss?
• Hardware-traversed page tables:
  • On a TLB miss, hardware in the MMU looks at the current page table to fill the TLB (may walk multiple levels)
  • If the PTE is valid, the hardware fills the TLB and the processor never knows
  • If the PTE is marked invalid, it causes a page fault, and the kernel decides what to do
• Software-traversed page tables (like MIPS):
  • On a TLB miss, the processor receives a TLB fault
  • The kernel traverses the page table to find the PTE
  • If the PTE is valid, it fills the TLB and returns from the fault
  • If the PTE is marked invalid, it internally calls the page fault handler
• Most chipsets provide hardware traversal
• Modern operating systems tend to have more TLB faults since they use translation for many things
  • Examples: shared segments, user-level portions of an operating system
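A minimal sketch of the software-managed path, assuming a single-level page table; PTE, tlb, page_table, and on_tlb_miss are illustrative names, not a real kernel API:

```python
# Sketch of a software TLB miss handler over a single-level page table.
# On a valid PTE, fill the TLB and return; on an invalid one, fall
# through to the page fault handler.
class PTE:
    def __init__(self, frame, valid):
        self.frame, self.valid = frame, valid

tlb = {}                                              # vpn -> frame
page_table = {0: PTE(32, True), 1: PTE(4183, False)}  # example entries

def on_tlb_miss(vpn):
    pte = page_table.get(vpn)
    if pte is not None and pte.valid:
        tlb[vpn] = pte.frame                 # fill TLB, return from fault
        return pte.frame
    raise RuntimeError(f"page fault on page {vpn}")  # kernel must fetch page
```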
Goals for Today
• Virtual memory
  • How does it work?
  • Page faults
  • Resuming after page faults
• When to fetch?
• What to replace?
  • Page replacement algorithms
  • FIFO, OPT, LRU (Clock)
• Page buffering
• Allocating pages to processes
What is virtual memory?
[Figure: virtual pages 0 through N mapped through a page table either to frames in physical memory or out to disk]
• Each process has the illusion of a large address space
  • 2^32 bytes for 32-bit addressing
• However, physical memory is much smaller
• How do we give this illusion to multiple processes?
  • Virtual memory: some addresses reside on disk
Virtual Memory • Separates users logical memory from physical memory. • Only part of the program needs to be in memory for execution • Logical address space can therefore be much larger than physical address space • Allows address spaces to be shared by several processes • Allows for more efficient process creation
Virtual Memory • Load entire process in memory (swapping), run it, exit • Is slow (for big processes) • Wasteful (might not require everything) • Solutions: partial residency • Paging: only bring in pages, not all pages of process • Demand paging: bring only pages that are required • Where to fetch page from? • Have a contiguous space in disk: swap file (pagefile.sys)
How does VM work?
• Extend page table entries with another bit ("valid")
  • If the page is in memory, valid = 1; else valid = 0
• If the page is in memory, translation works as before
• If the page is not in memory, translation causes a page fault
[Figure: a four-entry page table; entries 0 (frame 32, V=1) and 2 (frame 177, V=1) point into memory, while entries 1 (4183, V=0) and 3 (5721, V=0) are on disk]
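A minimal sketch of translation with a valid bit, using the slide's example page table; PAGE_SIZE and the fault behavior are illustrative assumptions:

```python
# vpn -> (frame, valid); values taken from the slide's example table.
page_table = {0: (32, 1), 1: (4183, 0), 2: (177, 1), 3: (5721, 0)}
PAGE_SIZE = 4096  # assumed page size

def translate(vaddr):
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    frame, valid = page_table[vpn]
    if not valid:
        raise RuntimeError(f"page fault on page {vpn}")  # OS fetches from disk
    return frame * PAGE_SIZE + offset

print(hex(translate(0x2123)))  # page 2 is valid: frame 177, offset 0x123
```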
Page Faults
• On a page fault:
  • The OS finds a free frame, or evicts one from memory (which one? want knowledge of the future?)
  • Issues a disk request to fetch the data for the page (what to fetch? just the requested page, or more?)
  • Blocks the current process and context switches to a new process (how? the process might be mid-instruction)
  • When the disk completes: set the valid bit to 1, and put the faulting process on the ready queue
Resuming after a page fault
• Should be able to restart the instruction
• For RISC processors this is simple:
  • Instructions are idempotent until their memory references are done
• More complicated for CISC:
  • E.g., move 256 bytes from one location to another
• Possible solution:
  • Ensure the pages are in memory before the instruction executes
Page Fault (Cont.)
• Restarting an instruction is tricky for:
  • block move instructions
  • auto increment/decrement of a location
When to fetch?
• Just before the page is used!
  • Need to know the future
• Demand paging:
  • Fetch a page when it faults
• Prepaging:
  • Get the page on fault + some of its neighbors, or
  • Get all pages in use the last time the process was swapped out
Performance of Demand Paging
• Page fault rate p, with 0 ≤ p ≤ 1.0
  • if p = 0, no page faults
  • if p = 1, every reference is a fault
• Effective Access Time (EAT):
  EAT = (1 - p) x memory access time
        + p x (page fault overhead + swap page out + swap page in + restart overhead)
Demand Paging Example
• Memory access time = 200 nanoseconds
• Average page-fault service time = 8 milliseconds
• EAT = (1 - p) x 200 + p x 8,000,000 = 200 + p x 7,999,800 (in ns)
• If one access out of 1,000 causes a page fault (p = 0.001), EAT = 8,199.8 ns ≈ 8.2 microseconds
  • A slowdown by a factor of 40!!
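The same arithmetic, checked in a few lines (times in nanoseconds; the constant names are mine):

```python
MEM_ACCESS_NS = 200            # memory access time
FAULT_SERVICE_NS = 8_000_000   # 8 ms page-fault service time

def eat(p):
    """Effective access time under demand paging, in nanoseconds."""
    return (1 - p) * MEM_ACCESS_NS + p * FAULT_SERVICE_NS

print(eat(0.001))                  # 8199.8 ns, i.e. ~8.2 microseconds
print(eat(0.001) / MEM_ACCESS_NS)  # ~41x: a slowdown by a factor of ~40
```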
What to replace?
• What happens if there is no free frame?
  • Find some page in memory that is not really in use, and swap it out
• Page replacement
  • When a process has used up all the frames it is allowed to use
  • The OS must select a page to eject from memory to make room for the new page
  • The page to eject is selected by the page replacement algorithm
• Goal: select the page that minimizes future page faults
Page Replacement • Prevent over-allocation of memory by modifying page-fault service routine to include page replacement • Use modify (dirty) bitto reduce overhead of page transfers – only modified pages are written to disk • Page replacement completes separation between logical memory and physical memory – large virtual memory can be provided on a smaller physical memory
Page Replacement Algorithms
• Random: pick any page to eject at random
  • Used mainly for comparison
• FIFO: the page brought in earliest is evicted
  • Ignores usage
  • Suffers from Belady's Anomaly: the fault rate can increase when the number of frames increases
  • E.g. reference string 0 1 2 3 0 1 4 0 1 2 3 4 with 3 vs. 4 frames
• OPT (Belady's algorithm): evict the page that will not be used for the longest time in the future
• LRU: evict the page that hasn't been used for the longest time
  • The past can be a good predictor of the future
First-In-First-Out (FIFO) Algorithm
• Reference string: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5
• 3 frames (3 pages can be in memory at a time per process): 9 page faults
• 4 frames: 10 page faults
• Belady's Anomaly: more frames can mean more page faults
[Figure: frame contents after each reference for the 3-frame and 4-frame runs]
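A minimal FIFO simulator (the function name is mine) that reproduces the slide's Belady's Anomaly counts:

```python
from collections import deque

def fifo_faults(refs, num_frames):
    frames, order, faults = set(), deque(), 0
    for page in refs:
        if page in frames:
            continue                        # hit: FIFO order is unaffected
        faults += 1
        if len(frames) == num_frames:
            frames.remove(order.popleft())  # evict the page brought in earliest
        frames.add(page)
        order.append(page)
    return faults

refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(fifo_faults(refs, 3))  # 9
print(fifo_faults(refs, 4))  # 10: more frames, more faults
```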
Optimal Algorithm (OPT)
• Replace the page that will not be used for the longest period of time
• 4-frame example, reference string 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5: 6 page faults
• How do you know this? You can't, without knowing the future
• Used for measuring how well your algorithm performs
[Figure: frame contents after each reference]
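A minimal OPT simulator under the same assumptions (quadratic in the trace length, which is fine for a trace this short):

```python
def opt_faults(refs, num_frames):
    frames, faults = set(), 0
    for i, page in enumerate(refs):
        if page in frames:
            continue
        faults += 1
        if len(frames) == num_frames:
            # Evict the resident page whose next use is farthest away
            # (infinitely far if it is never used again).
            def next_use(p):
                rest = refs[i + 1:]
                return rest.index(p) if p in rest else float("inf")
            frames.remove(max(frames, key=next_use))
        frames.add(page)
    return faults

print(opt_faults([1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5], 4))  # 6
```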
Least Recently Used (LRU) Algorithm
• Reference string: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5
• Evict the page whose last use is farthest in the past: 8 page faults with 4 frames
[Figure: frame contents after each reference]
Implementing Perfect LRU
• On reference: time-stamp each page
• On eviction: scan for the oldest frame
• Problems:
  • Large page lists
  • Timestamps are costly
• Approximate LRU
  • LRU is already an approximation!
[Figure: instructions at 0xffdcd (add r1,r2,r3) and 0xffdd0 (ld r1, 0(sp)) referencing pages stamped t=4, t=5, and t=14]
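A minimal sketch of perfect LRU with timestamps (all names are mine), showing exactly the cost the slide complains about: a stamp on every reference and a full scan on every eviction:

```python
import itertools

clock = itertools.count()  # global logical time, bumped on every reference
timestamps = {}            # page -> time of last use

def reference(page):
    timestamps[page] = next(clock)   # stamp on every single reference

def evict():
    victim = min(timestamps, key=timestamps.get)  # full scan for oldest stamp
    del timestamps[victim]
    return victim

for p in [1, 2, 3, 1]:
    reference(p)
print(evict())  # 2: least recently used
```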
LRU: Clock Algorithm
• Each page has a reference bit
  • Set on use, reset periodically by the OS
• Algorithm:
  • FIFO + reference bit (keep pages in a circular list)
  • Scan: if the ref bit is 1, set it to 0 and proceed; if the ref bit is 0, stop and evict
• Problem:
  • Low accuracy for large memory
[Figure: circular list of pages with reference bits R=0 or R=1; the clock hand sweeps around the list]
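A minimal clock sketch, assuming a fixed array of frames and a hand index (the class and method names are mine):

```python
class Clock:
    def __init__(self, num_frames):
        self.frames = [None] * num_frames  # each entry: [page, ref_bit] or None
        self.hand = 0

    def _advance(self):
        self.hand = (self.hand + 1) % len(self.frames)

    def reference(self, page):
        for entry in self.frames:
            if entry and entry[0] == page:
                entry[1] = 1               # hit: set the reference bit
                return
        while True:                        # miss: sweep for a slot or victim
            entry = self.frames[self.hand]
            if entry is None or entry[1] == 0:
                self.frames[self.hand] = [page, 1]  # evict/fill here
                self._advance()
                return
            entry[1] = 0                   # ref bit was 1: clear and proceed
            self._advance()
```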
LRU with large memory
[Figure: the same circular list of pages, now swept by two hands]
• Solution: add another hand
  • Leading edge clears ref bits
  • Trailing edge evicts pages with ref bit 0
• What if the angle between the hands is small? Pages get little time to be re-referenced, so the policy approaches FIFO
• What if the angle is big? The policy approaches the original one-handed clock
Clock Algorithm: Discussion
• Sensitive to the sweeping interval
  • Fast: lose usage information
  • Slow: all pages look used
• Extending clock: could use (ref bit, modified bit) as an ordered pair
  • Might have to scan all pages
• LFU: remove the page with the lowest use count
  • Does not track when the page was referenced
  • Refinement: use multiple bits and shift right by 1 at regular intervals (see the sketch below)
• MFU: remove the most frequently used page
• LFU and MFU do not approximate OPT well
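A minimal sketch of that shift-register aging scheme (the names and the 8-bit width are assumptions):

```python
AGE_BITS = 8

counters = {}  # page -> aging counter (recently used pages have high values)
ref_bits = {}  # page -> referenced since the last interval?

def tick():
    """Run at regular intervals: shift counters right, OR in the ref bit."""
    for page in counters:
        counters[page] = (counters[page] >> 1) | (
            ref_bits.get(page, 0) << (AGE_BITS - 1))
        ref_bits[page] = 0

def victim():
    return min(counters, key=counters.get)  # smallest counter = coldest page
```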
Page Buffering
• Cute simple trick (XP, 2K, Mach, VMS):
  • Keep a list of free pages
  • Track which page each free frame corresponds to
  • Periodically write modified pages and reset the modified bit
[Figure: used frames are evicted onto a free list if unmodified, or onto a modified list otherwise; the modified list is written back in batches (batch writes = speed) and its frames are then added to the free list]
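A minimal sketch of the two lists (write_to_disk is a hypothetical stand-in for the real I/O path):

```python
from collections import deque

free_list = deque()      # clean frames, ready for reuse
modified_list = deque()  # dirty frames, awaiting write-back

def evict(frame, dirty):
    (modified_list if dirty else free_list).append(frame)

def flush_modified(write_to_disk):
    """Periodically write dirty frames in one batch (batch writes = speed)."""
    while modified_list:
        frame = modified_list.popleft()
        write_to_disk(frame)       # after write-back the frame is clean
        free_list.append(frame)
```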
Allocating Pages to Processes
• Global replacement
  • Single memory pool for the entire system
  • On a page fault, evict the oldest page in the system
  • Problem: protection (one process can steal frames from another)
• Local (per-process) replacement
  • Separate pool of pages for each process
  • A page fault in one process can only replace that process's own pages
  • Problem: might leave resources idle
Allocation of Frames
• Each process needs a minimum number of pages
• Example: IBM 370 needs 6 pages to handle the SS MOVE instruction:
  • The instruction is 6 bytes and might span 2 pages
  • 2 pages to handle the from operand
  • 2 pages to handle the to operand
• Two major allocation schemes:
  • fixed allocation
  • priority allocation
Summary • Demand Paging: • Treat memory as cache on disk • Cache miss get page from disk • Transparent Level of Indirection • User program is unaware of activities of OS behind scenes • Data can be moved without affecting application correctness • Replacement policies • FIFO: Place pages on queue, replace page at end • OPT: replace page that will be used farthest in future • LRU: Replace page that hasn’t be used for the longest time • Clock Algorithm: Approximation to LRU • Arrange all pages in circular list • Sweep through them, marking as not “in use” • If page not “in use” for one pass, than can replace