
Operating Systems & Memory Systems: Address Translation

Operating Systems & Memory Systems: Address Translation. CPS 220 Professor Alvin R. Lebeck Fall 2001. Outline. Address Translation basics 64-bit Address Space Managing memory OS Performance Throughout Review Computer Architecture Interaction with Architectural Decisions.


Presentation Transcript


  1. Operating Systems & Memory Systems: Address Translation CPS 220 Professor Alvin R. Lebeck Fall 2001

  2. Outline • Address Translation • basics • 64-bit Address Space • Managing memory • OS Performance Throughout • Review Computer Architecture • Interaction with Architectural Decisions

  3. System Organization [Figure: block diagram; labels: interrupts, Processor, Cache, Core Chip Set, I/O Bus, Main Memory, Disk Controller, Graphics Controller, Network Interface, Graphics, Disk, Disk, Network]

  4. Computer Architecture • Interface Between Hardware and Software [Figure: layer diagram; labels: Applications, Software, Operating System, Compiler, "This is IT", CPU, Memory, I/O, Hardware, Multiprocessor, Networks]

  5. Memory Hierarchy 101 • Processor (P): very fast, 1 ns clock, multiple instructions per cycle • Cache ($): SRAM; fast, small, expensive • Memory (called physical or main): DRAM; slow, big, cheap • Magnetic storage: really slow, really big, really cheap => Cost-Effective Memory System (Price/Performance)

  6. Virtual Memory: Motivation • Process = Address Space + thread(s) of control • Address space = PA • programmer controls movement from disk • protection? • relocation? • Linear Address space • larger than physical address space • 32 or 64 bits vs. 28-bit physical (256 MB) • Automatic management [Figure labels: Virtual, Physical]

  7. Virtual Memory • Process = virtual address space + thread(s) of control • Translation • VA -> PA • What physical address does virtual address A map to? • Is VA in physical memory? • Protection (access control) • Do you have permission to access it?

  8. Virtual Memory: Questions • How is data found if it is in physical memory? • Where can data be placed in physical memory? Fully Associative, Set Associative, Direct Mapped • What data should be replaced on a miss? (Take CPS210 …)

  9. Segmented Virtual Memory • Virtual address (2^32, 2^64) to Physical Address mapping (2^30) • Variable size, base + offset, contiguous in both VA and PA [Figure: Virtual and Physical address columns; address labels: 0x1000, 0x0000, 0x1000, 0x6000, 0x2000, 0x9000, 0x11000]

  10. Intel Pentium Segmentation [Figure: Logical Address (Seg Selector, Offset); the selector picks a Segment Descriptor in the Global Descriptor Table (GDT); Segment Base Address plus Offset locates the access in the Physical Address Space]

  11. Pentium Segmentation (Continued) • Segment Descriptors • Local and Global • base, limit, access rights • Can define many • Segment Registers • contain segment descriptors (faster than load from mem) • Only 6 • Must load segment register with a valid entry before segment can be accessed • generally managed by compiler, linker, not programmer

  12. Paged Virtual Memory • Virtual address (2^32, 2^64) to Physical Address mapping (2^28) • virtual page to physical page frame • Fixed Size units for access control & translation [Figure: virtual address split into Virtual page number and Offset; Virtual and Physical address columns; address labels: 0x1000, 0x0000, 0x1000, 0x6000, 0x2000, 0x9000, 0x11000]
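
A minimal sketch of the paged mapping on slide 12, in Python. The 8 KB page size, the dictionary standing in for the page table, and the names `translate` and `table` are assumptions made for illustration; the slide does not fix any of them.

```python
PAGE_SHIFT = 13                  # assumed 8 KB pages (not stated on this slide)
PAGE_SIZE = 1 << PAGE_SHIFT
OFFSET_MASK = PAGE_SIZE - 1


def translate(va, page_table):
    """Map a virtual address to a physical address via a {vpn: pfn} table."""
    vpn = va >> PAGE_SHIFT       # virtual page number selects the mapping
    offset = va & OFFSET_MASK    # offset is unchanged by translation
    if vpn not in page_table:
        raise LookupError(f"page fault at VA {va:#x}")
    return (page_table[vpn] << PAGE_SHIFT) | offset


# Hypothetical mapping: virtual page 1 lives in physical page frame 3.
table = {1: 3}
print(hex(translate(0x2010, table)))  # 0x6010
```

Only the page number is translated; the fixed-size unit is what lets access control and translation share one table entry per page.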

  13. Page Table • Kernel data structure (per process) • Page Table Entry (PTE) • VA -> PA translations (if none, page fault) • access rights (Read, Write, Execute, User/Kernel, cached/uncached) • reference, dirty bits • Many designs • Linear, Forward mapped, Inverted, Hashed, Clustered • Design Issues • support for aliasing (multiple VA to single PA) • large virtual address space • time to obtain translation

  14. Alpha VM Mapping (Forward Mapped) • “64-bit” address divided into 3 segments • seg0 (bit 63 = 0) user code/heap • seg1 (bit 63 = 1, 62 = 1) user stack • kseg (bit 63 = 1, 62 = 0) kernel segment for OS • Three level page table, each one page • Alpha 21064 only 43 unique bits of VA • (future min page size up to 64KB => 55 bits of VA) • PTE bits: valid, kernel & user read & write enable (No reference, use, or dirty bit) • What do you do for replacement? [Figure: virtual address fields seg 0/1 (21 bits), L1 (10), L2 (10), L3 (10), PO (13); a base is added to each level's index in turn, ending in the phys page frame number]
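
The field widths on slide 14 (10 + 10 + 10 index bits and a 13-bit page offset, 43 bits in all) can be sketched as a three-level walk. This is an illustrative model only: nested dictionaries stand in for the three page-table levels, and `split_va` and `walk` are hypothetical names, not Alpha or Digital Unix interfaces.

```python
LEVEL_BITS = 10     # index bits per level (from the slide)
OFFSET_BITS = 13    # page offset bits, i.e. 8 KB pages (from the slide)


def split_va(va):
    """Return (l1, l2, l3, offset) for the low 43 bits of a virtual address."""
    mask = (1 << LEVEL_BITS) - 1
    offset = va & ((1 << OFFSET_BITS) - 1)
    l3 = (va >> OFFSET_BITS) & mask
    l2 = (va >> (OFFSET_BITS + LEVEL_BITS)) & mask
    l1 = (va >> (OFFSET_BITS + 2 * LEVEL_BITS)) & mask
    return l1, l2, l3, offset


def walk(root, va):
    """Walk nested dicts standing in for the L1, L2, and L3 tables."""
    l1, l2, l3, offset = split_va(va)
    try:
        pfn = root[l1][l2][l3]      # one memory reference per level
    except KeyError:
        raise LookupError(f"page fault at VA {va:#x}") from None
    return (pfn << OFFSET_BITS) | offset
```

With 64 KB pages the same arithmetic gives 16 offset bits and 13 index bits per level, which is where the slide's 55 bits (3 x 13 + 16) comes from.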

  15. Inverted Page Table (HP, IBM) • One PTE per page frame • only one VA per physical frame • Must search for virtual address • More difficult to support aliasing • Force all sharing to use the same VA [Figure: virtual address split into Virtual page number and Offset; the page number is hashed (Hash) into the Hash Anchor Table (HAT), which leads to Inverted Page Table (IPT) entries holding VA and PA,ST]
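
A sketch of the lookup path on slide 15, under several assumptions: the hash is a plain modulo, collisions are chained through the table entries, and the entry tag is the virtual page number alone (a real design would also tag the address space). The class and method names are hypothetical.

```python
class InvertedPageTable:
    """One entry per physical page frame, found through a hash anchor table."""

    def __init__(self, num_frames, hat_size):
        self.entries = [None] * num_frames   # frame -> (vpn, next_frame)
        self.hat = [None] * hat_size         # bucket -> first frame in chain

    def _bucket(self, vpn):
        return vpn % len(self.hat)           # placeholder hash function

    def map(self, vpn, frame):
        bucket = self._bucket(vpn)
        # Push the frame on the front of its bucket's chain.
        self.entries[frame] = (vpn, self.hat[bucket])
        self.hat[bucket] = frame

    def lookup(self, vpn):
        frame = self.hat[self._bucket(vpn)]
        while frame is not None:             # search the chain for the VA
            tag, next_frame = self.entries[frame]
            if tag == vpn:
                return frame                 # the table index is the frame number
            frame = next_frame
        return None                          # not resident: page fault
```

Because each frame has exactly one entry with one virtual tag, two different virtual pages cannot name the same frame in this structure, which is the aliasing difficulty the slide points out.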

  16. Intel Pentium Segmentation + Paging [Figure: Logical Address (Seg Selector, Offset); Segment Descriptor in the Global Descriptor Table (GDT); Segment Base Address; Linear Address Space with linear address fields Dir, Table, Offset; Page Dir; Page Table; Physical Address Space]

  17. The Memory Management Unit (MMU) • Input • virtual address • Output • physical address • access violation (exception, interrupts the processor) • Access Violations • not present • user vs. kernel • write • read • execute
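
The MMU's input/output contract on slide 17 can be modeled as a function that returns a physical address or raises an access violation. This is a sketch only: the PTE layout (a dictionary with `pfn`, `present`, `kernel_only`, and `perms`), the 8 KB page size, and all names are assumptions for illustration.

```python
class AccessViolation(Exception):
    """Stands in for the exception the MMU raises to the processor."""


def mmu_translate(va, access, user_mode, page_table, page_shift=13):
    """Return the physical address for va, or raise AccessViolation."""
    vpn = va >> page_shift
    offset = va & ((1 << page_shift) - 1)
    pte = page_table.get(vpn)
    if pte is None or not pte["present"]:
        raise AccessViolation("not present")
    if user_mode and pte["kernel_only"]:
        raise AccessViolation("user vs. kernel")
    if access not in pte["perms"]:          # "read", "write", or "execute"
        raise AccessViolation(access)
    return (pte["pfn"] << page_shift) | offset
```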

  18. Translation Lookaside Buffers (TLB) • Need to perform address translation on every memory reference • 30% of instructions are memory references • 4-way superscalar processor • at least one memory reference per cycle • Make Common Case Fast, others correct • Throw HW at the problem • Cache PTEs

  19. Fast Translation: Translation Buffer • Cache of translated addresses • Alpha 21164 TLB: 48 entry fully associative [Figure: address split into Page Number and Page offset; TLB entries 1 to 48 with fields v, r, w, tag, phys frame; a 48:1 mux selects the matching entry]

  20. TLB Design • Must be fast, not increase critical path • Must achieve high hit ratio • Generally small highly associative • Mapping change • page removed from physical memory • processor must invalidate the TLB entry • PTE is per process entity • Multiple processes with same virtual addresses • Context Switches? • Flush TLB • Add ASID (PID) • part of processor state, must be set on context switch
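
The two context-switch options on slide 20 (flush, or tag entries with an ASID) can be sketched with a small software model of a fully associative TLB. The 48-entry default follows slide 19; the LRU replacement policy and every name below are assumptions, since the slides do not specify them.

```python
from collections import OrderedDict


class TLB:
    """Fully associative TLB model, tagged by (asid, vpn), LRU replacement."""

    def __init__(self, entries=48):
        self.capacity = entries
        self.map = OrderedDict()             # (asid, vpn) -> pfn, oldest first

    def lookup(self, asid, vpn):
        key = (asid, vpn)
        if key in self.map:
            self.map.move_to_end(key)        # mark as most recently used
            return self.map[key]
        return None                          # TLB miss

    def insert(self, asid, vpn, pfn):
        key = (asid, vpn)
        self.map[key] = pfn
        self.map.move_to_end(key)
        if len(self.map) > self.capacity:
            self.map.popitem(last=False)     # evict the least recently used

    def invalidate(self, asid, vpn):
        self.map.pop((asid, vpn), None)      # mapping changed or page removed

    def flush(self):
        self.map.clear()                     # context switch without ASIDs
```

With the ASID in the tag, two processes can hold entries for the same virtual page at once, so a context switch only has to update the current ASID rather than flush.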

  21. Hardware Managed TLBs • Hardware Handles TLB miss • Dictates page table organization • Complicated state machine to “walk page table” • Multiple levels for forward mapped • Linked list for inverted • Exception only if access violation [Figure labels: CPU, TLB, Control, Memory]

  22. Software Managed TLBs • Software Handles TLB miss • Flexible page table organization • Simple Hardware to detect Hit or Miss • Exception if TLB miss or access violation • Should you check for access violation on TLB miss? [Figure labels: CPU, TLB, Control, Memory]

  23. Mapping the Kernel • Digital Unix Kseg • kseg (bit 63 = 1, 62 = 0) • Kernel has direct access to physical memory • One VA->PA mapping for entire Kernel • Lock (pin) TLB entry • or special HW detection [Figure: virtual address space from 0 to 2^64-1 with regions User Code/Data, Kernel, User Stack; Kernel also shown in Physical Memory]

  24. Considerations for Address Translation • Large virtual address space • Can map more things • files • frame buffers • network interfaces • memory from another workstation • Sparse use of address space • Page Table Design • space • less locality => TLB misses • OS structure • microkernel => more TLB misses

  25. Address Translation for Large Address Spaces • Forward Mapped Page Table • grows with virtual address space • worst case 100% overhead not likely • TLB miss time: memory reference for each level • Inverted Page Table • grows with physical address space • independent of virtual address space usage • TLB miss time: memory reference to HAT, IPT, list search

  26. Hashed Page Table (HP) • Combine Hash Table and IPT [Huck96] • can have more entries than physical page frames • Must search for virtual address • Easier to support aliasing than IPT • Space • grows with physical space • TLB miss • one less memory ref than IPT [Figure: virtual address split into Virtual page number and Offset; the page number is hashed (Hash) into the Hashed Page Table (HPT), whose entries hold VA and PA,ST]

  27. Clustered Page Table (SUN) • Combine benefits of HPT and Linear [Talluri95] • Store one base VPN (TAG) and several PPN values • virtual page block number (VPBN) • block offset [Figure: virtual address split into VPBN, Boff, Offset; the VPBN is hashed (Hash) to a chain of entries, each holding VPBN, next, and PA0 to PA3 with attrib]

  28. Reducing TLB Miss Handling Time • Problem • must walk Page Table on TLB miss • usually incur cache misses • big problem for IPC in microkernels • Solution • build a small second-level cache in SW • on TLB miss, first check SW cache • use simple shift and mask index to hash table
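
The software second-level cache from slide 28 can be sketched as a direct-mapped table indexed by shift and mask. The page size, table size, and class name are assumptions for illustration; the slide gives only the indexing idea.

```python
PAGE_SHIFT = 13          # assumed 8 KB pages
CACHE_ENTRIES = 1024     # assumed size; must be a power of two for the mask
INDEX_MASK = CACHE_ENTRIES - 1


class SoftwareTLBCache:
    """Direct-mapped software cache of translations, probed on a TLB miss."""

    def __init__(self):
        self.slots = [None] * CACHE_ENTRIES      # each slot: (vpn, pfn)

    def index(self, va):
        return (va >> PAGE_SHIFT) & INDEX_MASK   # shift, then mask

    def probe(self, va):
        slot = self.slots[self.index(va)]
        if slot is not None and slot[0] == va >> PAGE_SHIFT:
            return slot[1]
        return None                # fall back to the full page-table walk

    def fill(self, va, pfn):
        self.slots[self.index(va)] = (va >> PAGE_SHIFT, pfn)
```

A hit costs one table reference instead of a multi-level walk; a conflict simply overwrites the slot, which keeps the miss handler short.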

  29. Next Time • More TLB issues • Virtual Memory & Caches • Multiprocessor Issues

  30. Operating Systems & Memory Systems: Managing the Memory System CPS 220 Professor Alvin R. Lebeck

  31. Review: Address Translation • Map from virtual address to physical address • Page Tables, PTE • va->pa, attributes • forward mapped, inverted, hashed, clustered • Translation Lookaside Buffer • hardware cache of most recent va->pa translation • misses handled in hardware or software • Implications of larger address space • page table size • possibly more TLB misses • OS Structure • microkernels -> lots of IPC -> more TLB misses

  32. Cache Memory 102 • Block 7 placed in 4 block cache: • Fully associative, direct mapped, 2-way set associative • S.A. Mapping = Block Number Modulo Number Sets • DM = 1-way Set Assoc • Cache Frame • location in cache • Bit-selection [Figure: Main Memory block 7 placed in a 4-frame cache (frames 0 to 3) three ways: FA, DM (7 mod 4), SA (7 mod 2, with Set 0 and Set 1)]
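
The placement rule on slide 32 (set = block number modulo number of sets) can be checked with a few lines of Python. The function name is hypothetical, and the sketch assumes frames are numbered so that each set occupies consecutive frames.

```python
def candidate_frames(block, num_frames, ways):
    """Cache frames in which a given memory block may be placed."""
    num_sets = num_frames // ways
    chosen_set = block % num_sets            # bit-selection mapping
    first = chosen_set * ways
    return list(range(first, first + ways))


print(candidate_frames(7, 4, 1))  # direct mapped: [3]
print(candidate_frames(7, 4, 2))  # 2-way set associative: [2, 3]
print(candidate_frames(7, 4, 4))  # fully associative: [0, 1, 2, 3]
```

Direct mapped is the one-way case and fully associative is the one-set case of the same formula.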

  33. Cache Indexing • Tag on each block • No need to check index or block offset • Increasing associativity shrinks index, expands tag • Fully Associative: No index • Direct-Mapped: Large index [Figure: address split into Block Address (TAG, Index) and Block offset]
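
A short sketch of the tag / index / block-offset split on slide 33, assuming the block size and the number of sets are both powers of two. `split_address` is a hypothetical helper, not part of any library.

```python
def split_address(addr, block_bytes, num_sets):
    """Return (tag, index, block_offset); sizes must be powers of two."""
    offset_bits = block_bytes.bit_length() - 1
    index_bits = num_sets.bit_length() - 1
    offset = addr & (block_bytes - 1)
    index = (addr >> offset_bits) & (num_sets - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


# 32-byte blocks, 256 sets (for example a direct-mapped 8 KB cache).
print(split_address(0x12345, 32, 256))  # (9, 26, 5)
# Same cache made fully associative: one set, no index bits, larger tag.
print(split_address(0x12345, 32, 1))    # (2330, 0, 5)
```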

  34. Address Translation and Caches • Where is the TLB wrt the cache? • What are the consequences? • Most of today’s systems have more than 1 cache • Digital 21164 has 3 levels • 2 levels on chip (8KB-data,8KB-inst,96KB-unified) • one level off chip (2-4MB) • Does the OS need to worry about this? Definition: page coloring = careful selection of va->pa mapping

  35. TLBs and Caches • Conventional Organization • Virtually Addressed Cache • Translate only on miss • Alias (Synonym) Problem • Overlap $ access with VA translation: requires $ index to remain invariant across translation [Figure: three CPU-to-MEM datapaths, one per organization; labels: CPU, VA, TLB, PA, $, VA Tags, PA Tags, L2 $, MEM]

  36. Virtual Caches • Send virtual address to cache. Called Virtually Addressed Cache or just Virtual Cache, vs. Physical Cache or Real Cache • Avoid address translation before accessing cache • faster hit time to cache • Context Switches? • Just like the TLB (flush or pid) • Cost is time to flush + “compulsory” misses from empty cache • Add process identifier tag that identifies process as well as address within process: can’t get a hit if wrong process • I/O must interact with cache

  37. I/O and Virtual Caches • I/O is accomplished with physical addresses • DMA • flush pages from cache • need pa->va reverse translation • coherent DMA [Figure: system block diagram; labels: interrupts, Processor, Virtual Cache, Physical Addresses, Memory Bus, I/O Bridge, I/O Bus, Main Memory, Disk Controller, Graphics Controller, Network Interface, Graphics, Disk, Disk, Network]

  38. Aliases and Virtual Caches • aliases (sometimes called synonyms): two different virtual addresses map to same physical address • But, but... the virtual address is used to index the cache • Could have data in two different locations in the cache [Figure: virtual address space from 0 to 2^64-1 with regions User Code/Data, Kernel, User Stack; Kernel also shown in Physical Memory]

  39. Index with Physical Portion of Address • If index is physical part of address, can start tag access in parallel with translation so that can compare to physical tag • Limits cache to page size: what if want bigger caches and use same trick? • Higher associativity • Page coloring [Figure: address shown two ways: Page Address and Page Offset; Address Tag, Index, and Block Offset]
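
The size limit on slide 39 follows from requiring the index and block-offset bits to fit inside the page offset, that is, cache size / associativity <= page size. The two helper names below are hypothetical, and the sketch ignores the usual requirement that associativity be a power of two.

```python
def index_fits_in_page_offset(cache_bytes, ways, page_bytes):
    """True if the cache can be indexed with untranslated (offset) bits only."""
    return cache_bytes // ways <= page_bytes


def min_ways(cache_bytes, page_bytes):
    """Smallest associativity that keeps the index inside the page offset."""
    return max(1, -(-cache_bytes // page_bytes))   # ceiling division


KB = 1024
print(index_fits_in_page_offset(8 * KB, 1, 8 * KB))    # True
print(index_fits_in_page_offset(16 * KB, 1, 8 * KB))   # False
print(min_ways(64 * KB, 8 * KB))                       # 8
```

Higher associativity is the hardware route to a bigger cache with the same trick; page coloring is the route that asks the OS to keep the extra index bits equal across translation.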

  40. Page Coloring for Aliases • HW that guarantees that every cache frame holds unique physical address • OS guarantee: lower n bits of virtual & physical page numbers must have same value; if direct-mapped, then aliases map to same cache frame • one form of page coloring [Figure: address shown two ways: Page Address and Page Offset; Address Tag, Index, and Block Offset]

  41. Virtual Memory and Physically Indexed Caches • Notion of bin • region of cache that may contain cache blocks from a page • Random vs careful mapping • Selection of physical page frame dictates cache index • Overall goal is to minimize cache misses [Figure labels: Cache, Page frames]

  42. Careful Page Mapping [Kessler92, Bershad94] • Select a page frame such that cache conflict misses are reduced • only choose from available pages (no replacement induced) • static • “smart” selection of page frame at page fault time • dynamic • move pages around

  43. Page Coloring • Make physical index match virtual index • Behaves like virtual index cache • no conflicts for sequential pages • Possibly many conflicts between processes • address spaces all have same structure (stack, code, heap) • modify to xor PID with address (MIPS used variant of this) • Simple implementation • Pick arbitrary page if necessary
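
A sketch of the page-coloring policy on slide 43: pick a free frame whose color (its cache bin) matches the color of the virtual page, and fall back to an arbitrary frame when none is free. The function names and the flat free list are assumptions; the PID xor variant mentioned on the slide is not modeled.

```python
def num_colors(cache_bytes, ways, page_bytes):
    """Number of page-sized bins in a physically indexed cache."""
    return max(1, cache_bytes // (ways * page_bytes))


def color_of(page_number, colors):
    return page_number % colors          # low bits of the page number


def pick_frame(vpn, free_frames, colors):
    """Prefer a free frame whose color matches the virtual page's color."""
    want = color_of(vpn, colors)
    for pfn in free_frames:
        if color_of(pfn, colors) == want:
            free_frames.remove(pfn)
            return pfn
    # No matching color available: pick an arbitrary free frame, if any.
    return free_frames.pop(0) if free_frames else None
```

For a 2 MB direct-mapped cache and 8 KB pages, `num_colors` gives 256 bins, so sequential virtual pages land in sequential bins and do not conflict with each other.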

  44. Bin Hopping • Allocate sequentially mapped pages (time) to sequential bins (space) • Can exploit temporal locality • pages mapped close in time will be accessed close in time • Search from last allocated bin until bin with available page frame • Separate search list per process • Simple implementation
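
Bin hopping as described on slide 44 can be sketched as a round-robin search that resumes from the last bin used. The sketch assumes one `BinHopper` per process (its `last` pointer is the per-process search position) and a list of free-frame lists indexed by bin; both are illustrative choices, and starting the search at the bin after the last one used is an interpretation of the slide.

```python
class BinHopper:
    """Allocates page frames from successive cache bins."""

    def __init__(self, free_by_bin):
        self.free = free_by_bin     # free[b] is the list of free frames in bin b
        self.last = -1              # last bin this process allocated from

    def allocate(self):
        """Return (bin, frame) from the next bin that has a free frame."""
        n = len(self.free)
        for step in range(1, n + 1):
            b = (self.last + step) % n
            if self.free[b]:
                self.last = b
                return b, self.free[b].pop()
        return None                 # no free frames in any bin
```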

  45. Best Bin • Keep track of two counters per bin • used: # of pages allocated to this bin for this address space • free: # of available pages in the system for this bin • Bin selection is based on low values of used and high values of free • Low used value • reduce conflicts within the address space • High free value • reduce conflicts between address spaces
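
Slide 45 says selection favors low `used` and high `free` but does not say how the two are combined. The sketch below assumes one possible rule: lowest `used` first, ties broken by highest `free`, skipping bins with no free frames. The function name and the tie-breaking order are assumptions.

```python
def best_bin(used, free):
    """Pick a bin index given per-bin used and free counters.

    used[b]: pages this address space already has in bin b.
    free[b]: page frames currently available in bin b system-wide.
    """
    candidates = [b for b in range(len(free)) if free[b] > 0]
    if not candidates:
        return None
    # Lowest used first; among equals, the bin with the most free frames.
    return min(candidates, key=lambda b: (used[b], -free[b]))


print(best_bin([2, 0, 0, 1], [5, 1, 4, 9]))  # 2
```

This scan is linear in the number of bins.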

  46. Hierarchical • Best bin could be linear in # of bins • Build a tree • internal nodes contain sum of child <used,free> values • Independent of cache size • simply stop at a particular level in the tree

  47. Benefit of Static Page Coloring • Reduces cache misses by 10% to 20% • Multiprogramming • want to distribute mapping to avoid inter-address space conflicts

  48. Dynamic Page Coloring • Cache Miss Lookaside (CML) buffer [Bershad94] • proposed hardware device • Monitor # of misses per page • If # of misses >> # of cache blocks in page • must be conflict misses • interrupt processor • move a page (recolor) • Cost of moving page << benefit
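
The trigger condition on slide 48 (misses much greater than the number of cache blocks in a page) can be modeled in software as a per-page counter with a threshold. This is not the CML hardware design: the multiplier `factor` standing in for ">>", the reset after a trigger, and the class name are all assumptions.

```python
from collections import Counter


class MissMonitor:
    """Counts cache misses per page and flags pages worth recoloring."""

    def __init__(self, blocks_per_page, factor=4):
        # "misses >> blocks in page" approximated as misses > factor * blocks.
        self.threshold = factor * blocks_per_page
        self.misses = Counter()

    def record_miss(self, pfn):
        """Count one miss; return True when the page should be recolored."""
        self.misses[pfn] += 1
        if self.misses[pfn] > self.threshold:
            self.misses[pfn] = 0     # start over once the page is moved
            return True
        return False
```

A page cannot take more compulsory misses than it has blocks, so a count well beyond that bound points to conflict misses.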

  49. Outline • Page Coloring • Page Size

  50. A Case for Large Pages • Page table size is inversely proportional to the page size • memory saved • Fast cache hit time easy when cache <= page size (VA caches) • bigger page makes it feasible as cache size grows • Transferring larger pages to or from secondary storage, possibly over a network, is more efficient • Number of TLB entries is restricted by clock cycle time • larger page size maps more memory • reduces TLB misses
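
Two of the claims on slide 50 are simple arithmetic and can be checked directly. The sketch assumes a single-level (linear) page table covering the whole virtual address space and a fixed PTE size; the function names and the example sizes are illustrative, apart from the 48-entry TLB taken from slide 19.

```python
def linear_page_table_bytes(va_bits, page_bytes, pte_bytes):
    """Size of a linear page table mapping the full virtual address space."""
    return (2 ** va_bits // page_bytes) * pte_bytes


def tlb_reach_bytes(entries, page_bytes):
    """Amount of memory the TLB can map at once."""
    return entries * page_bytes


KB, MB = 1024, 1024 * 1024
# Doubling the page size halves the table: 4 MB, then 2 MB.
print(linear_page_table_bytes(32, 4 * KB, 4) // MB)   # 4
print(linear_page_table_bytes(32, 8 * KB, 4) // MB)   # 2
# A 48-entry TLB reaches 384 KB with 8 KB pages, 3 MB with 64 KB pages.
print(tlb_reach_bytes(48, 8 * KB) // KB)              # 384
print(tlb_reach_bytes(48, 64 * KB) // MB)             # 3
```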
