Virtual Memory
Announcements • TA Swap • Tyler Steele is a 415 TA • Rohit Doshi is a 414 TA • My office hours today: 3:25-4:25
Review: Memory Hierarchy of a Modern Computer System
[Figure: hierarchy from the processor (datapath, registers, on-chip cache) through the second-level cache (SRAM) and main memory (DRAM) down to secondary storage (disk) and tertiary storage (tape)]
• Registers: ~1 ns, 100s of bytes
• Caches (on-chip and second-level SRAM): 10s-100s ns, KBs-MBs
• Main memory (DRAM): 100s ns, MBs
• Secondary storage (disk): 10s ms, GBs
• Tertiary storage (tape): 10s sec, TBs
• Take advantage of the principle of locality to:
  • Present as much memory as in the cheapest technology
  • Provide access at the speed offered by the fastest technology
Review: A Summary of Sources of Cache Misses
• Compulsory (cold start): first reference to a block
  • "Cold" fact of life: not a whole lot you can do about it
  • Note: when running billions of instructions, compulsory misses are insignificant
• Capacity:
  • Cache cannot contain all blocks accessed by the program
  • Solution: increase cache size
• Conflict (collision):
  • Multiple memory locations mapped to the same cache location
  • Solutions: increase cache size, or increase associativity
• Two others:
  • Coherence (invalidation): another process (e.g., I/O) updates memory
  • Policy: due to a non-optimal replacement policy
Review: Where Does a Block Get Placed in a Cache?
• Example: block 12 from a 32-block address space placed in an 8-block cache
• Direct mapped: block 12 can go only into block 4 (12 mod 8)
• Set associative (2-way, 4 sets): block 12 can go anywhere in set 0 (12 mod 4)
• Fully associative: block 12 can go anywhere
[Figure: the three cache organizations, blocks numbered 0-7; the set-associative cache is divided into sets 0-3]
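The placement rules above reduce to simple modular arithmetic. A minimal sketch (not from the slides; the function name and parameters are illustrative, with defaults matching the slide's example):

```python
# Where a block may be placed under each cache organization.
def placement(block, num_blocks=8, num_sets=4):
    return {
        "direct_mapped": block % num_blocks,   # exactly one slot
        "set_associative": block % num_sets,   # any way within this set
        "fully_associative": "any slot",       # no restriction
    }

print(placement(12))
# {'direct_mapped': 4, 'set_associative': 0, 'fully_associative': 'any slot'}
```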
Review: Other Caching Questions
• What line gets replaced on a cache miss?
  • Easy for direct mapped: only one possibility
  • Set associative or fully associative:
    • Random
    • LRU (Least Recently Used)
• What happens on a write?
  • Write through: the information is written to both the cache and the block in the lower-level memory
  • Write back: the information is written only to the block in the cache
    • A modified cache block is written to main memory only when it is replaced
    • Question: is the block clean or dirty?
Caching Applied to Address Translation
[Figure: the CPU issues a virtual address; if it is cached in the TLB, the physical address goes straight to physical memory for the data read or write (untranslated); otherwise the MMU translates, the result is saved in the TLB, and the access proceeds]
• Question is one of page locality: does it exist?
  • Instruction accesses spend a lot of time on the same page (since accesses are sequential)
  • Stack accesses have definite locality of reference
  • Data accesses have less page locality, but still some...
• Can we have a TLB hierarchy?
  • Sure: multiple levels at different sizes/speeds
What Actually Happens on a TLB Miss?
• Hardware-traversed page tables:
  • On a TLB miss, hardware in the MMU looks at the current page table to fill the TLB (may walk multiple levels)
  • If the PTE is valid, the hardware fills the TLB and the processor never knows
  • If the PTE is marked invalid, it causes a page fault, and the kernel decides what to do
• Software-traversed page tables (like MIPS):
  • On a TLB miss, the processor receives a TLB fault
  • The kernel traverses the page table to find the PTE
  • If the PTE is valid, it fills the TLB and returns from the fault
  • If the PTE is marked invalid, it internally calls the page fault handler
• Most chipsets provide hardware traversal
• Modern operating systems tend to have more TLB faults since they use translation for many things
  • Examples: shared segments, user-level portions of an operating system
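A minimal sketch of the software-managed path, assuming a single-level page table; PTE, tlb, page_table, and on_tlb_miss are illustrative names, not a real kernel API:

```python
# Sketch of a software TLB miss handler over a single-level page table.
# On a valid PTE, fill the TLB and return; on an invalid one, fall
# through to the page fault handler.
class PTE:
    def __init__(self, frame, valid):
        self.frame, self.valid = frame, valid

tlb = {}                                              # vpn -> frame
page_table = {0: PTE(32, True), 1: PTE(4183, False)}  # example entries

def on_tlb_miss(vpn):
    pte = page_table.get(vpn)
    if pte is not None and pte.valid:
        tlb[vpn] = pte.frame                 # fill TLB, return from fault
        return pte.frame
    raise RuntimeError(f"page fault on page {vpn}")  # kernel must fetch page
```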
Goals for Today
• Virtual memory
  • How does it work?
  • Page faults
  • Resuming after page faults
• When to fetch?
• What to replace?
  • Page replacement algorithms
  • FIFO, OPT, LRU (Clock)
• Page buffering
• Allocating pages to processes
What is virtual memory?
[Figure: virtual pages 0 through N mapped through a page table either to frames in physical memory or out to disk]
• Each process has the illusion of a large address space
  • 2^32 bytes for 32-bit addressing
• However, physical memory is much smaller
• How do we give this illusion to multiple processes?
  • Virtual memory: some addresses reside on disk
Virtual Memory • Separates users logical memory from physical memory. • Only part of the program needs to be in memory for execution • Logical address space can therefore be much larger than physical address space • Allows address spaces to be shared by several processes • Allows for more efficient process creation
Virtual Memory • Load entire process in memory (swapping), run it, exit • Is slow (for big processes) • Wasteful (might not require everything) • Solutions: partial residency • Paging: only bring in pages, not all pages of process • Demand paging: bring only pages that are required • Where to fetch page from? • Have a contiguous space in disk: swap file (pagefile.sys)
How does VM work?
• Extend page table entries with another bit ("valid")
  • If the page is in memory, valid = 1; else valid = 0
• If the page is in memory, translation works as before
• If the page is not in memory, translation causes a page fault
[Figure: a four-entry page table; entries 0 (frame 32, V=1) and 2 (frame 177, V=1) point into memory, while entries 1 (4183, V=0) and 3 (5721, V=0) are on disk]
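A minimal sketch of translation with a valid bit, using the slide's example page table; PAGE_SIZE and the fault behavior are illustrative assumptions:

```python
# vpn -> (frame, valid); values taken from the slide's example table.
page_table = {0: (32, 1), 1: (4183, 0), 2: (177, 1), 3: (5721, 0)}
PAGE_SIZE = 4096  # assumed page size

def translate(vaddr):
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    frame, valid = page_table[vpn]
    if not valid:
        raise RuntimeError(f"page fault on page {vpn}")  # OS fetches from disk
    return frame * PAGE_SIZE + offset

print(hex(translate(0x2123)))  # page 2 is valid: frame 177, offset 0x123
```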
Page Faults
• On a page fault:
  • The OS finds a free frame, or evicts one from memory (which one? want knowledge of the future?)
  • Issues a disk request to fetch the data for the page (what to fetch? just the requested page, or more?)
  • Blocks the current process and context switches to a new process (how? the process might be mid-instruction)
  • When the disk completes: set the valid bit to 1, and put the faulting process on the ready queue
Resuming after a page fault
• Should be able to restart the instruction
• For RISC processors this is simple:
  • Instructions are idempotent until their memory references are done
• More complicated for CISC:
  • E.g., move 256 bytes from one location to another
• Possible solution:
  • Ensure the pages are in memory before the instruction executes
Page Fault (Cont.)
• Restarting an instruction is tricky for:
  • block move instructions
  • auto increment/decrement of a location
When to fetch?
• Just before the page is used!
  • Need to know the future
• Demand paging:
  • Fetch a page when it faults
• Prepaging:
  • Get the page on fault + some of its neighbors, or
  • Get all pages in use the last time the process was swapped out
Performance of Demand Paging
• Page fault rate p, with 0 ≤ p ≤ 1.0
  • if p = 0, no page faults
  • if p = 1, every reference is a fault
• Effective Access Time (EAT):
  EAT = (1 - p) x memory access time
        + p x (page fault overhead + swap page out + swap page in + restart overhead)
Demand Paging Example
• Memory access time = 200 nanoseconds
• Average page-fault service time = 8 milliseconds
• EAT = (1 - p) x 200 + p x 8,000,000 = 200 + p x 7,999,800 (in ns)
• If one access out of 1,000 causes a page fault (p = 0.001), EAT = 8,199.8 ns ≈ 8.2 microseconds
  • A slowdown by a factor of 40!!
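The same arithmetic, checked in a few lines (times in nanoseconds; the constant names are mine):

```python
MEM_ACCESS_NS = 200            # memory access time
FAULT_SERVICE_NS = 8_000_000   # 8 ms page-fault service time

def eat(p):
    """Effective access time under demand paging, in nanoseconds."""
    return (1 - p) * MEM_ACCESS_NS + p * FAULT_SERVICE_NS

print(eat(0.001))                  # 8199.8 ns, i.e. ~8.2 microseconds
print(eat(0.001) / MEM_ACCESS_NS)  # ~41x: a slowdown by a factor of ~40
```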
What to replace?
• What happens if there is no free frame?
  • Find some page in memory that is not really in use, and swap it out
• Page replacement
  • When a process has used up all the frames it is allowed to use
  • The OS must select a page to eject from memory to make room for the new page
  • The page to eject is selected by the page replacement algorithm
• Goal: select the page that minimizes future page faults
Page Replacement • Prevent over-allocation of memory by modifying page-fault service routine to include page replacement • Use modify (dirty) bitto reduce overhead of page transfers – only modified pages are written to disk • Page replacement completes separation between logical memory and physical memory – large virtual memory can be provided on a smaller physical memory
Page Replacement Algorithms
• Random: pick any page to eject at random
  • Used mainly for comparison
• FIFO: the page brought in earliest is evicted
  • Ignores usage
  • Suffers from Belady's Anomaly: the fault rate can increase when the number of frames increases
  • E.g. reference string 0 1 2 3 0 1 4 0 1 2 3 4 with 3 vs. 4 frames
• OPT (Belady's algorithm): evict the page that will not be used for the longest time in the future
• LRU: evict the page that hasn't been used for the longest time
  • The past can be a good predictor of the future
First-In-First-Out (FIFO) Algorithm
• Reference string: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5
• 3 frames (3 pages can be in memory at a time per process): 9 page faults
• 4 frames: 10 page faults
• Belady's Anomaly: more frames can mean more page faults
[Figure: frame contents after each reference for the 3-frame and 4-frame runs]
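A minimal FIFO simulator (the function name is mine) that reproduces the slide's Belady's Anomaly counts:

```python
from collections import deque

def fifo_faults(refs, num_frames):
    frames, order, faults = set(), deque(), 0
    for page in refs:
        if page in frames:
            continue                        # hit: FIFO order is unaffected
        faults += 1
        if len(frames) == num_frames:
            frames.remove(order.popleft())  # evict the page brought in earliest
        frames.add(page)
        order.append(page)
    return faults

refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(fifo_faults(refs, 3))  # 9
print(fifo_faults(refs, 4))  # 10: more frames, more faults
```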
Optimal Algorithm (OPT)
• Replace the page that will not be used for the longest period of time
• 4-frame example, reference string 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5: 6 page faults
• How do you know this? You can't, without knowing the future
• Used for measuring how well your algorithm performs
[Figure: frame contents after each reference]
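A minimal OPT simulator under the same assumptions (quadratic in the trace length, which is fine for a trace this short):

```python
def opt_faults(refs, num_frames):
    frames, faults = set(), 0
    for i, page in enumerate(refs):
        if page in frames:
            continue
        faults += 1
        if len(frames) == num_frames:
            # Evict the resident page whose next use is farthest away
            # (infinitely far if it is never used again).
            def next_use(p):
                rest = refs[i + 1:]
                return rest.index(p) if p in rest else float("inf")
            frames.remove(max(frames, key=next_use))
        frames.add(page)
    return faults

print(opt_faults([1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5], 4))  # 6
```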
Least Recently Used (LRU) Algorithm
• Reference string: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5
• Evict the page whose last use is farthest in the past: 8 page faults with 4 frames
[Figure: frame contents after each reference]
Implementing Perfect LRU
• On reference: time-stamp each page
• On eviction: scan for the oldest frame
• Problems:
  • Large page lists
  • Timestamps are costly
• Approximate LRU
  • LRU is already an approximation!
[Figure: instructions at 0xffdcd (add r1,r2,r3) and 0xffdd0 (ld r1, 0(sp)) referencing pages stamped t=4, t=5, and t=14]
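A minimal sketch of perfect LRU with timestamps (all names are mine), showing exactly the cost the slide complains about: a stamp on every reference and a full scan on every eviction:

```python
import itertools

clock = itertools.count()  # global logical time, bumped on every reference
timestamps = {}            # page -> time of last use

def reference(page):
    timestamps[page] = next(clock)   # stamp on every single reference

def evict():
    victim = min(timestamps, key=timestamps.get)  # full scan for oldest stamp
    del timestamps[victim]
    return victim

for p in [1, 2, 3, 1]:
    reference(p)
print(evict())  # 2: least recently used
```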
LRU: Clock Algorithm
• Each page has a reference bit
  • Set on use, reset periodically by the OS
• Algorithm:
  • FIFO + reference bit (keep pages in a circular list)
  • Scan: if the ref bit is 1, set it to 0 and proceed; if the ref bit is 0, stop and evict
• Problem:
  • Low accuracy for large memory
[Figure: circular list of pages with reference bits R=0 or R=1; the clock hand sweeps around the list]
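A minimal clock sketch, assuming a fixed array of frames and a hand index (the class and method names are mine):

```python
class Clock:
    def __init__(self, num_frames):
        self.frames = [None] * num_frames  # each entry: [page, ref_bit] or None
        self.hand = 0

    def _advance(self):
        self.hand = (self.hand + 1) % len(self.frames)

    def reference(self, page):
        for entry in self.frames:
            if entry and entry[0] == page:
                entry[1] = 1               # hit: set the reference bit
                return
        while True:                        # miss: sweep for a slot or victim
            entry = self.frames[self.hand]
            if entry is None or entry[1] == 0:
                self.frames[self.hand] = [page, 1]  # evict/fill here
                self._advance()
                return
            entry[1] = 0                   # ref bit was 1: clear and proceed
            self._advance()
```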
LRU with large memory
[Figure: the same circular list of pages, now swept by two hands]
• Solution: add another hand
  • Leading edge clears ref bits
  • Trailing edge evicts pages with ref bit 0
• What if the angle between the hands is small? Pages get little time to be re-referenced, so the policy approaches FIFO
• What if the angle is big? The policy approaches the original one-handed clock
Clock Algorithm: Discussion
• Sensitive to the sweeping interval
  • Fast: lose usage information
  • Slow: all pages look used
• Extending clock: could use (ref bit, modified bit) as an ordered pair
  • Might have to scan all pages
• LFU: remove the page with the lowest use count
  • Does not track when the page was referenced
  • Refinement: use multiple bits and shift right by 1 at regular intervals (see the sketch below)
• MFU: remove the most frequently used page
• LFU and MFU do not approximate OPT well
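A minimal sketch of that shift-register aging scheme (the names and the 8-bit width are assumptions):

```python
AGE_BITS = 8

counters = {}  # page -> aging counter (recently used pages have high values)
ref_bits = {}  # page -> referenced since the last interval?

def tick():
    """Run at regular intervals: shift counters right, OR in the ref bit."""
    for page in counters:
        counters[page] = (counters[page] >> 1) | (
            ref_bits.get(page, 0) << (AGE_BITS - 1))
        ref_bits[page] = 0

def victim():
    return min(counters, key=counters.get)  # smallest counter = coldest page
```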
Page Buffering
• Cute simple trick (XP, 2K, Mach, VMS):
  • Keep a list of free pages
  • Track which page each free frame corresponds to
  • Periodically write modified pages and reset the modified bit
[Figure: used frames are evicted onto a free list if unmodified, or onto a modified list otherwise; the modified list is written back in batches (batch writes = speed) and its frames are then added to the free list]
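A minimal sketch of the two lists (write_to_disk is a hypothetical stand-in for the real I/O path):

```python
from collections import deque

free_list = deque()      # clean frames, ready for reuse
modified_list = deque()  # dirty frames, awaiting write-back

def evict(frame, dirty):
    (modified_list if dirty else free_list).append(frame)

def flush_modified(write_to_disk):
    """Periodically write dirty frames in one batch (batch writes = speed)."""
    while modified_list:
        frame = modified_list.popleft()
        write_to_disk(frame)       # after write-back the frame is clean
        free_list.append(frame)
```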
Allocating Pages to Processes
• Global replacement
  • Single memory pool for the entire system
  • On a page fault, evict the oldest page in the system
  • Problem: protection (one process can steal frames from another)
• Local (per-process) replacement
  • Separate pool of pages for each process
  • A page fault in one process can only replace that process's own pages
  • Problem: might leave resources idle
Allocation of Frames
• Each process needs a minimum number of pages
• Example: IBM 370 needs 6 pages to handle the SS MOVE instruction:
  • The instruction is 6 bytes and might span 2 pages
  • 2 pages to handle the from operand
  • 2 pages to handle the to operand
• Two major allocation schemes:
  • fixed allocation
  • priority allocation
Summary • Demand Paging: • Treat memory as cache on disk • Cache miss get page from disk • Transparent Level of Indirection • User program is unaware of activities of OS behind scenes • Data can be moved without affecting application correctness • Replacement policies • FIFO: Place pages on queue, replace page at end • OPT: replace page that will be used farthest in future • LRU: Replace page that hasn’t be used for the longest time • Clock Algorithm: Approximation to LRU • Arrange all pages in circular list • Sweep through them, marking as not “in use” • If page not “in use” for one pass, than can replace