
実践システムソフトウェア Practical System Software

Presentation Transcript


  1. 実践システムソフトウェア Practical System Software. 石川 裕 (Yutaka Ishikawa), Department of Computer Science, Graduate School of Information Science and Technology. Part 4

  2. Schedule • Oct. 6 Introduction • Oct. 13 Linux and Windows Basic Architecture • Oct. 20 Lab Experiment 1 • Setting up and introducing the WRK environment • Oct. 27 Scheduler • Nov. 10 Lab Experiment 2 • Adding a new system call for counting kernel events (context switches, page faults, process creation/deletion, etc.) • Assignment 1 • Nov. 17 No class • Nov. 24 Synchronization • Dec. 1 Lab Experiment 3 • User-space synchronization and priority inversion • Assignment 2 • Dec. 8 Virtual Memory • Dec. 15 Lab Experiment 4 • Experiment on the working set • Assignment 3 • Dec. 22 I/O & File System • Jan. 12 Lab Experiment 5 • Final Assignment • Jan. 19 Lab Experiment 6 • Jan. 26 Lab Experiment 7 • Grade: based on attendance and the evaluation of the four reports

  3. Memory Management in Windows NT & Linux. Advanced operating systems course, The University of Tokyo

  4. Outline • Important notions • Physical memory management • Virtual memory management • Windows NT • Linux 2.6

  5. Important notions • (Page) Frame: unit of the physical memory (usually 4 KB) • Page: unit of the virtual memory (the same size as a page frame) • PTE: page table entry, maps a virtual page to a physical frame • Virtual address space: a set of address intervals (described by PTEs) comprising the valid memory addresses of a process • MMU: memory management unit, a piece of hardware performing the address translation based on PTE entries • TLB: translation look-aside buffer, a fast PTE cache

  6. Physical memory management Advanced operating systems course, The University of Tokyo

  7. Kernel responsibilities • Keep track of allocated and free physical memory • Granularity is the page frame • Memory allocation and release are frequent • Need to be done quickly • Avoid memory fragmentation • Utilize memory efficiently • Swap memory pages in and out on demand • Note: physical memory management details will be discussed in the Win/Linux sections below!
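To make the first two bullets above concrete, here is a minimal, hypothetical C sketch (not taken from Windows or Linux) of how a kernel could track allocated and free physical memory at page-frame granularity with a bitmap; real kernels use richer structures, such as the PFN database discussed later.

    /* Hypothetical sketch: one bit per 4 KB physical frame, set = allocated. */
    #include <stdint.h>

    #define FRAME_SIZE 4096u
    #define NUM_FRAMES (1u << 20)                    /* 4 GB of physical memory */

    static uint32_t frame_bitmap[NUM_FRAMES / 32];

    /* Allocate one free frame; returns its frame number, or -1 if none left. */
    long alloc_frame(void)
    {
        for (uint32_t i = 0; i < NUM_FRAMES / 32; i++) {
            if (frame_bitmap[i] != 0xFFFFFFFFu) {    /* at least one free bit */
                for (int b = 0; b < 32; b++) {
                    if (!(frame_bitmap[i] & (1u << b))) {
                        frame_bitmap[i] |= (1u << b);
                        return (long)i * 32 + b;
                    }
                }
            }
        }
        return -1;                                   /* out of physical memory */
    }

    /* Release a previously allocated frame. */
    void free_frame(long pfn)
    {
        frame_bitmap[pfn / 32] &= ~(1u << (pfn % 32));
    }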

  8. Virtual Memory: Concept and Techniques Advanced operating systems course, The University of Tokyo

  9. The Concept of Virtual Memory • An application always references “virtual addresses” • Hardware and software translate, or map, virtual addresses to physical addresses • Not all of an application’s virtual address space is in physical memory at one time... • ...but hardware and software fool the application into thinking that it is • The rest is kept on disk, and is brought into physical memory automatically as needed

  10. Address Translation Steps (on Intel architecture) • Intel x86 provides two levels of address translation • Segmentation (mandatory, since 8086) • Paging (optional, since 80386) • Segmentation: first level of address translation • Intel: logical address (selector:offset) to linear address (32 bits) • Windows/Linux virtual address is Intel linear address (32 bits) • Paging: second level of address translation • Virtual address (32 bits) to physical address • Physical address: 32 bits (4 GB)

  11. Segmentation • The programmer's view of memory: • Collection of variable-sized segments (text, data, stack, subroutines, ...) • No necessary ordering among segments • Logical address: <segment-number, offset> • Hardware: • Segment table containing a base address and limit for each segment [Figure: segment-table lookup: the CPU issues <s, d>; s indexes the segment table; if d < limit, base + d is the physical memory address, otherwise a trap (addressing error) is raised]

  12. Drawback of Segmentation: Memory fragmentation • External Fragmentation – total memory space exists to satisfy a request, but it is not contiguous. • Internal Fragmentation – allocated memory may be slightly larger than requested memory; this size difference is memory internal to a partition, but not being used. • Reduce external fragmentation by compaction • Shuffle memory contents to place all free memory together in one large block. • Compaction is possible only if relocation is dynamic, and is done at execution time. • I/O problem • Latch job in memory while it is involved in I/O. • Do I/O only into OS buffers.

  13. Solution: Paging • Dynamic storage allocation algorithms for varying-sized chunks of memory may lead to fragmentation • Solutions: • Compaction – dynamic relocation of processes • Noncontiguous allocation of process memory in equally sized pages (this avoids the memory fitting problem) • Paging breaks physical memory into fixed-sized blocks (called frames) • Logical (virtual) memory is broken into pages (of the same size)

  14. Supported page sizes on different architectures • On x86, the default page size is 4 KB • x86 supports 4 KB or 4 MB • In PAE mode, large pages are 2 MB • On x64, the default page size is 4 KB • large pages are 4 MB • On Itanium, the default page size is 8 KB • Itanium supports 4 KB, 8 KB, 16 KB, 64 KB, 256 KB, 1 MB, 4 MB, 16 MB, 64 MB, or 256 MB – large pages are 16 MB
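Page size should not be hard-coded when it matters to user code. A small C example, assuming a POSIX/Linux system, queries it at run time with sysconf(); on Windows the same information comes from GetSystemInfo() (the dwPageSize field of SYSTEM_INFO).

    /* Query the page size at run time instead of assuming 4 KB. */
    #include <stdio.h>
    #include <unistd.h>      /* sysconf (POSIX/Linux) */

    int main(void)
    {
        long page = sysconf(_SC_PAGESIZE);   /* 4096 on typical x86/x64 Linux */
        printf("page size: %ld bytes\n", page);
        return 0;
    }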

  15. Paging: Basic Method • When a process is executed, its pages are loaded into any available frames from backing store (disk) • Hardware support for paging consists of page tables • Logical addresses consist of page number and offset [Figure: the logical address (page number p, offset d) is translated through the page table to (frame f, offset d) in physical memory; page frames are typically 2-4 KB]

  16. Interpreting an x86 Virtual Address • x86, 32-bit • Bits 31-22: page table selector (10 bits) • Bits 21-12: page table entry selector (10 bits) • Bits 11-0: byte within page (12 bits) • Two-level page table: • Page table selector • Page table entry
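A small C illustration of this two-level split, using an arbitrary example address; the shift and mask values follow directly from the 10/10/12-bit layout above.

    /* Split a 32-bit x86 virtual address into its translation indexes. */
    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint32_t va = 0xC0801234u;                  /* arbitrary example address */

        unsigned pde_index = (va >> 22) & 0x3FF;    /* bits 31..22: page table selector */
        unsigned pte_index = (va >> 12) & 0x3FF;    /* bits 21..12: page table entry    */
        unsigned offset    =  va        & 0xFFF;    /* bits 11..0:  byte within page    */

        printf("PDE %u, PTE %u, offset 0x%03X\n", pde_index, pte_index, offset);
        return 0;
    }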

  17. x86 Virtual Address Translation [Figure: CR3 points to the page directory (one per process, 1024 entries); the page table selector indexes the page directory, the page table entry selector indexes one of the page tables (up to 512 per process, plus up to 512 system-wide) to obtain the physical page number (“page frame number” or “PFN”), and the PFN combined with the byte-within-page offset yields the physical address; physical pages: up to 2^20]

  18. Interpreting an x64 Virtual Address • x64, 64-bit (48 bits in today’s processors) • Bits 47-39: page map level 4 selector (9 bits) • Bits 38-30: page directory pointer selector (9 bits) • Bits 29-21: page directory selector (9 bits) • Bits 20-12: page table entry selector (9 bits) • Bits 11-0: byte within page (12 bits) • Four-level page table: • Page map level 4 • Page directory pointers • Page directories • Page tables
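The same decomposition for the 48-bit x64 case, again with an arbitrary example address; each of the four table indexes is 9 bits wide.

    /* Split a 48-bit x64 virtual address into its four table indexes. */
    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint64_t va = 0x00007FFDE3A01234ull;        /* arbitrary example address */

        unsigned pml4 = (va >> 39) & 0x1FF;         /* bits 47..39 */
        unsigned pdpt = (va >> 30) & 0x1FF;         /* bits 38..30 */
        unsigned pd   = (va >> 21) & 0x1FF;         /* bits 29..21 */
        unsigned pt   = (va >> 12) & 0x1FF;         /* bits 20..12 */
        unsigned off  =  va        & 0xFFF;         /* bits 11..0  */

        printf("PML4 %u, PDPT %u, PD %u, PT %u, offset 0x%03X\n",
               pml4, pdpt, pd, pt, off);
        return 0;
    }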

  19. x64 Virtual Address Translation [Figure: CR3 points to the page map level 4 table; the four selectors index in turn the page map level 4, the page directory pointers, the page directories, and the page tables, yielding the PFN; the PFN plus the byte-within-page offset forms the physical address; physical pages: up to 2^40]

  20. Page Table Entries (PTEs) • Page tables are arrays of Page Table Entries (PTEs) • Valid PTEs have two fields: • Page Frame Number (PFN), bits 31-12 • Flags describing the state and protection of the page, bits 11-0 (high to low): U = reserved (writable on MP systems), P = reserved, Cw = reserved, Gl = Global, L = reserved (large page if PDE), D = Dirty, A = Accessed, Cd = Cache disabled, Wt = Write through, O = Owner, W = Write (writable on MP systems), V = Valid • Reserved bits are used only when the PTE is not valid
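As a hedged C sketch, the architecturally defined low bits of an x86 PTE can be tested with simple masks; only the hardware-defined positions are shown, since the reserved/software bits are interpreted differently by Windows and Linux.

    /* Architecturally defined x86 PTE bits (software-defined bits omitted). */
    #include <stdbool.h>
    #include <stdint.h>

    #define PTE_VALID      (1u << 0)   /* V: present                       */
    #define PTE_WRITE      (1u << 1)   /* W: writable                      */
    #define PTE_OWNER      (1u << 2)   /* O: user (vs. kernel) accessible  */
    #define PTE_WRITETHRU  (1u << 3)   /* Wt                               */
    #define PTE_CACHE_DIS  (1u << 4)   /* Cd                               */
    #define PTE_ACCESSED   (1u << 5)   /* A                                */
    #define PTE_DIRTY      (1u << 6)   /* D                                */
    #define PTE_LARGE      (1u << 7)   /* L: large page (valid in a PDE)   */
    #define PTE_GLOBAL     (1u << 8)   /* Gl                               */

    static inline uint32_t pte_pfn(uint32_t pte)   { return pte >> 12; }
    static inline bool     pte_valid(uint32_t pte) { return pte & PTE_VALID; }
    static inline bool     pte_dirty(uint32_t pte) { return pte & PTE_DIRTY; }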

  21. PTE Status and Protection Bits (Intel x86)

  22. Paging needs Hardware Support • Every memory access requires access to page table • Page table should be implemented in hardware • Page tables exist on a per-user process basis • Small page tables can be just a set of registers • Problem: size of physical memory, # of processes • Page tables should be kept in memory • Only base address of page table is kept in a special register • Problem: speed of memory accesses • Translation look-aside buffers (TLBs) • Associative registers store recently used page table entries • TLBs are fast, expensive, small: 8..2048 entries • TLB must be flushed on process context switches

  23. Translation Look-Aside Buffer (TLB) • Address translation requires two lookups: • Find the right table in the page directory • Find the right entry in the page table • Most CPUs cache address translations • Array of associative memory: translation look-aside buffer (TLB) • TLB: virtual-to-physical page mappings of the most recently used pages [Figure: a lookup of virtual page 17 is compared simultaneously against all TLB entries (5 → frame 290, 64 → invalid, 17 → frame 1004, 7 → invalid, 65 → frame 801) and hits, returning page frame 1004]
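A toy software model of the associative lookup (real TLBs do the comparison in parallel in hardware); the entry format and size below are illustrative only.

    /* Toy model of an associative TLB lookup. */
    #include <stdbool.h>
    #include <stdint.h>

    #define TLB_ENTRIES 64

    struct tlb_entry {
        bool     valid;
        uint32_t vpn;   /* virtual page number   */
        uint32_t pfn;   /* physical frame number */
    };

    static struct tlb_entry tlb[TLB_ENTRIES];

    /* Returns true and fills *pfn on a TLB hit; a miss falls back to the
     * page-table walk shown on the previous slides. */
    bool tlb_lookup(uint32_t vpn, uint32_t *pfn)
    {
        for (int i = 0; i < TLB_ENTRIES; i++) {
            if (tlb[i].valid && tlb[i].vpn == vpn) {   /* hardware compares all entries at once */
                *pfn = tlb[i].pfn;
                return true;
            }
        }
        return false;                                  /* TLB miss */
    }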

  24. Paging Hardware With TLB [Figure: the page number p of the logical address is first looked up in the TLB (page # / frame # pairs); on a TLB hit the frame number f comes directly from the TLB, on a TLB miss the page table is consulted; f combined with the offset d forms the physical address]

  25. Page Fault Handling • A reference to an invalid page is called a page fault • The kernel trap handler dispatches • The memory manager fault handler is called • MmAccessFault() in Windows NT • do_page_fault() in Linux • Runs in the context of the thread that incurred the fault • Attempts to resolve the fault or raises an exception • Page faults can be caused by a variety of conditions: • Write protection • Invalid address • Accessing a kernel page from user mode, etc.
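The kernel entry points named above are not shown here, but the resolve-and-retry pattern they implement can be demonstrated from user space on Linux: the sketch below maps a page with no access rights, catches the resulting SIGSEGV, grants access with mprotect(), and lets the faulting instruction restart. This is only an analogy for what MmAccessFault()/do_page_fault() do inside the kernel.

    /* User-space "page fault" demo (Linux): resolve the fault, then retry. */
    #define _GNU_SOURCE
    #include <signal.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    static char  *page;
    static size_t page_size;

    static void on_segv(int sig, siginfo_t *si, void *ctx)
    {
        (void)sig; (void)ctx;
        char *addr = (char *)si->si_addr;
        if (addr >= page && addr < page + page_size) {
            mprotect(page, page_size, PROT_READ | PROT_WRITE);  /* "resolve" the fault */
            return;                                             /* retry the access    */
        }
        _exit(1);                                               /* real access violation */
    }

    int main(void)
    {
        page_size = (size_t)sysconf(_SC_PAGESIZE);
        page = mmap(NULL, page_size, PROT_NONE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (page == MAP_FAILED)
            return 1;

        struct sigaction sa;
        sa.sa_sigaction = on_segv;
        sa.sa_flags = SA_SIGINFO;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGSEGV, &sa, NULL);

        page[0] = 42;                        /* faults once, then succeeds */
        printf("page[0] = %d\n", page[0]);
        return 0;
    }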

  26. Copy-On-Write Pages • Used for sharing between process address spaces • Pages are originally set up as shared, read-only, faulted from the common file • Access violation on write attempt alerts pager • Pager makes a copy of the page and allocates it privately to the process doing the write, backed to the paging file • So, only need unique copies for the pages in the shared region that are actually written (“lazy evaluation”) • Original values of data are still shared • Used intensively in both Windows and Linux!
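A short Linux/POSIX demonstration of copy-on-write behaviour: after fork(), parent and child share data pages read-only; the child's first write triggers a COW fault and it receives a private copy, so the parent's value is unchanged.

    /* Copy-on-write in action with fork() (Linux/POSIX). */
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        int *value = malloc(sizeof *value);
        *value = 1;

        pid_t pid = fork();
        if (pid == 0) {                   /* child: write faults, page is copied */
            *value = 2;
            printf("child  sees %d\n", *value);   /* 2 */
            exit(0);
        }
        waitpid(pid, NULL, 0);
        printf("parent sees %d\n", *value);       /* still 1: the copy was private */
        return 0;
    }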

  27. How Copy-On-Write Works: Before [Figure: two process address spaces map the same physical pages; page 1 and page 2 hold the original data, plus page 3]

  28. How Copy-On-Write Works: After [Figure: after one process writes to page 2, it maps a private copy of page 2 holding the modified data; the other process still maps the original page 2; pages 1 and 3 remain shared]

  29. Swapping • In a multiprogramming environment, processes can temporarily be swapped out of memory to a backing store in order to allow other processes to execute • On the basis of physical addresses: processes must then be swapped back in to the same memory space they occupied previously • On the basis of logical addresses: what current OSes call swapping is rather paging out whole processes; processes can then be swapped in at arbitrary physical addresses [Figure: main memory holds the operating system and user space; process P1 is swapped out to the backing store while process P2 is swapped in]

  30. Memory Mapped Files • A way to take part of a file and map it to a range of virtual addresses (the address space is 2 GB, but files can be much larger) • Called: • “file mapping objects” in the Windows API • file-backed mappings in Linux • Bytes in the file then correspond one-for-one with bytes in the region of virtual address space • Reads from the “memory” fetch data from the file • Pages are kept in physical memory as needed • Changes to the memory are eventually written back to the file (an explicit flush can be requested) • Initial mapped files in a process include: • The executable image • One or more Dynamically Linked Libraries (DLLs) • Processes can map additional files as desired (data files or DLLs)

  31. Memory Mapped Files • No need to perform direct file I/O (read/write) • Data structures will be saved – be careful with pointers • Convenient & efficient in-memory algorithms: • Can process data much larger than physical memory • Improved performance for file processing • No need to manage buffers and file data • OS does the hard work: efficient & reliable • Multiple processes can share memory • No need to consume space in paging file
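For reference, a minimal POSIX mapping of a file on Linux (the Windows API equivalent uses CreateFileMapping()/MapViewOfFile() on a file mapping object); the file name data.bin is just an assumed example.

    /* Map a file into memory, modify it in place, and flush it back. */
    #include <fcntl.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("data.bin", O_RDWR);          /* hypothetical existing file */
        if (fd < 0)
            return 1;

        struct stat st;
        fstat(fd, &st);

        char *p = mmap(NULL, (size_t)st.st_size, PROT_READ | PROT_WRITE,
                       MAP_SHARED, fd, 0);          /* shared: changes reach the file */
        if (p == MAP_FAILED)
            return 1;

        p[0] ^= 0xFF;                               /* file bytes accessed as memory   */
        msync(p, (size_t)st.st_size, MS_SYNC);      /* explicit flush back to the file */

        munmap(p, (size_t)st.st_size);
        close(fd);
        return 0;
    }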

  32. Windows NT Advanced operating systems course, The University of Tokyo

  33. Page Frame Number (PFN) Database • One entry (24 bytes) for each physical page • Describes state of each page in physical memory • Entries for active/valid and transition pages contain: • Original PTE value (to restore when paged out) • Original PTE virtual address and container PFN • Working set index hint (for the first process...) • Entries for other pages are linked in: • Free, standby, modified, zeroed, bad lists (parity error will kill kernel) • Share count (active/valid pages): • Number of PTEs which refer to that page; 1->0: candidate for free list • Reference count: • Locking for I/O: INC when share count 1->0; DEC when unlocked • Share count = 0 & reference count = 1 is possible • Reference count 1->0: page is inserted in free, standby or modified lists
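A simplified C sketch of what a PFN database entry carries; this is illustrative only and does not match the exact layout of the corresponding structure in the WRK.

    /* Illustrative (non-WRK) view of a PFN database entry. */
    #include <stdint.h>

    enum pfn_state { PFN_ACTIVE, PFN_STANDBY, PFN_MODIFIED,
                     PFN_FREE, PFN_ZEROED, PFN_BAD };

    struct pfn_entry {
        uint32_t original_pte;      /* value to restore when the page is paged out */
        void    *pte_address;       /* virtual address of the referencing PTE      */
        uint32_t container_pfn;     /* PFN of the page table page holding that PTE */
        uint16_t share_count;       /* number of PTEs referring to this page       */
        uint16_t reference_count;   /* extra references, e.g. page locked for I/O  */
        uint16_t working_set_index; /* hint for the owning working set             */
        uint8_t  state;             /* one of enum pfn_state                       */
        uint32_t flink, blink;      /* list links when on free/standby/... lists   */
    };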

  34. Page Frame Number (PFN) Database – states of PFN entries

  35. PFN Entry details in different states • A PFN entry is a C union and holds different information depending on its state • PFN entry for a page in a working set: working set index, PTE address, share count, flags, type, reference count, original PTE contents, PFN of the page containing the PTE • PFN entry for a page on the standby or modified list: forward link, PTE address, backward link, flags, type, reference count, original PTE contents, PFN of the page containing the PTE

  36. PFN Entry details in different states (cont.) • PFN entry for a page on the free or zero list: forward link, PTE address, color chain PFN number, flags, type, reference count, original PTE contents, PFN of the page containing the PTE • PFN entry for a page with I/O in progress: event address, PTE address, share count, flags, type, reference count, original PTE contents, PFN of the page containing the PTE

  37. Managing Physical Memory • System keeps PFN entries on one of several lists: • Free page list • Modified page list • Standby page list • Zero page list • Bad page list - pages that failed memory test at system startup • Lists are implemented by forward/backward index numbers of the PFN entries • Maintained as FIFO lists or queues
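Because these lists are threaded through the PFN database itself, list manipulation only touches the flink/blink indexes of the entries. The following C sketch (with illustrative names, not WRK code) shows FIFO insertion at the tail and removal from the head of such a list.

    /* PFN lists as index chains threaded through the PFN database (sketch). */
    #include <stdint.h>

    #define NUM_FRAMES (1u << 20)
    #define NO_PFN     0xFFFFFFFFu

    struct pfn_entry { uint32_t flink, blink; /* ...other PFN fields... */ };
    struct pfn_list  { uint32_t head, tail; };

    static struct pfn_entry pfn_db[NUM_FRAMES];             /* one entry per frame */
    static struct pfn_list  standby_list = { NO_PFN, NO_PFN };

    /* Append a frame to the tail of a list (e.g. the standby list). */
    void pfn_list_push_tail(struct pfn_list *l, uint32_t pfn)
    {
        pfn_db[pfn].flink = NO_PFN;
        pfn_db[pfn].blink = l->tail;
        if (l->tail != NO_PFN)
            pfn_db[l->tail].flink = pfn;
        else
            l->head = pfn;
        l->tail = pfn;
    }

    /* Remove and return the frame at the head (FIFO order), or NO_PFN if empty. */
    uint32_t pfn_list_pop_head(struct pfn_list *l)
    {
        uint32_t pfn = l->head;
        if (pfn == NO_PFN)
            return NO_PFN;
        l->head = pfn_db[pfn].flink;
        if (l->head != NO_PFN)
            pfn_db[l->head].blink = NO_PFN;
        else
            l->tail = NO_PFN;
        return pfn;
    }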

  38. 32-bit x86 Address Space • 32 bits = 4 GB • Default layout: 2 GB user process space, 2 GB system space • With the 3 GB option: 3 GB user process space, 1 GB system space

  39. 64-bit Address Spaces • 64 bits = 17,179,869,184 GB • x64 today supports 48 virtual address bits = 262,144 GB • IA-64 today supports 50 virtual address bits = 1,048,576 GB • x64: 8192 GB (8 TB) user process space, 6657 GB system space • Itanium: 7152 GB (7 TB) user process space, 6144 GB system space

  40. Virtual Address Space Descriptors (VADs) • VADs describe the layout of the virtual address space • Not the page mappings (those are in the PTEs) • Used by the memory manager to • Interpret access faults • Assist in “lazy evaluation” • Find available memory areas [Figure: the process object points to an AVL tree of VADs, the Virtual Address Space Descriptors]

  41. Relation of PTEs and PFN database [Figure: the page tables of processes 1, 2 and 3 contain valid PTEs, invalid PTEs holding disk addresses, invalid PTEs in transition, and a prototype PTE; valid PTEs reference Active entries in the PFN database, transition PTEs reference entries on the Standby, Modified or Modified-no-write lists, and other PFN entries are Zeroed, Free or Bad]

  42. Shared Memory & Mapped Files • Shared memory + copy-on-write by default • Executables are mapped as read-only • The memory manager uses section objects to implement shared memory (file mapping objects in the Windows API) [Figure: the physical pages of a compiler image are mapped into the virtual memory of both process 1 and process 2]

  43. Prototype PTEs • Software structure to manage potentially shared pages • An array of prototype PTEs is created as part of the section object (part of the segment structure) • On the first access to a page mapped to a view of a section object, the memory manager uses the prototype PTE to fill in the real PTE used for address translation • Reference counts for shared pages are kept in the PFN database • Shared page valid: • process & prototype PTE point to the physical page • Page invalidated: • process PTE points to the prototype PTE • A prototype PTE describes 5 states for a shared page: • Active/valid, Transition, Demand zero, Page file, Mapped file • A layer between the page table and the page frame database

  44. Prototype PTEs for shared pages – the bigger picture • Two virtual pages in a mapped view • The first page is valid; the 2nd page is invalid and in the page file • The prototype PTE contains its exact location • The process PTE points to the prototype PTE [Figure: the process page directory and page table, the prototype page table in the segment structure, the PFN entry (PTE address, share count = 1) and the valid page PFN n in physical memory]

  45. Process Working Set • Working set: • All the physical pages “owned” by a process • Essentially, all the pages the process can reference without incurring a page fault • Working set limit: • The maximum pages the process can own • When limit is reached, a page must be released for every page that’s brought in (“working set replacement”) • Default upper limit on size for each process • System-wide maximum calculated & stored in MmMaximumWorkingSetSize • Approximately RAM minus 512 pages (2 MB on x86) minus min size of system working set (1.5 MB on x86) • Interesting to view (gives you an idea how much memory you’ve “lost” to the OS) • True upper limit: 2 GB minus 64 MB for 32-bit Windows
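The per-process limits can be inspected from user mode with the documented Win32 call GetProcessWorkingSetSize(), as in this small example (compile with a Windows toolchain).

    /* Query the current process's working-set minimum and maximum. */
    #include <stdio.h>
    #include <windows.h>

    int main(void)
    {
        SIZE_T min_ws = 0, max_ws = 0;

        if (GetProcessWorkingSetSize(GetCurrentProcess(), &min_ws, &max_ws)) {
            printf("working set minimum: %zu bytes\n", (size_t)min_ws);
            printf("working set maximum: %zu bytes\n", (size_t)max_ws);
        }
        return 0;
    }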

  46. System Working Set • Just as processes have working sets, Windows’ pageable system-space code and data lives in the “system working set” • Made up of 4 components: • Paged pool • Pageable code and data in the executive • Pageable code and data in kernel-mode drivers, Win32K.Sys, graphics drivers, etc. • Global file system data cache

  47. Working Set List • A process always starts with an empty working set • It then incurs page faults when referencing a page that isn’t in its working set • Many page faults may be resolved from memory (to be described later) [Figure: the working set list ordered from older to newer pages; its size is the PerfMon counter Process: “Working Set”]

  48. Working Set Replacement • When the working set maximum is reached (or a working set trim occurs), the process must give up pages to make room for new pages • Local page replacement policy (most Unix systems implement global replacement) • Means that a single process cannot take over all of physical memory unless other processes aren’t using it • The page replacement algorithm is least recently accessed (pages are aged) [Figure: a page evicted from the working set (PerfMon Process: “Working Set”) goes to the standby or modified page list]
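A toy C sketch of the aging idea behind least-recently-accessed replacement: pages referenced since the last scan become "young" again, unreferenced pages age, and the oldest page is the eviction candidate; the data structures here are purely illustrative.

    /* Age-based (least recently accessed) victim selection, illustrative only. */
    #include <stddef.h>
    #include <stdint.h>

    #define WS_PAGES 128

    struct ws_page {
        uint32_t vpn;        /* virtual page number                   */
        int      referenced; /* copy of the hardware Accessed bit     */
        unsigned age;        /* scans since the page was last touched */
    };

    static struct ws_page ws[WS_PAGES];

    /* One aging pass; returns the index of the oldest page as the victim. */
    size_t pick_victim(void)
    {
        size_t victim = 0;
        for (size_t i = 0; i < WS_PAGES; i++) {
            if (ws[i].referenced) {       /* touched since last scan: young again */
                ws[i].age = 0;
                ws[i].referenced = 0;     /* clear for the next interval          */
            } else {
                ws[i].age++;
            }
            if (ws[i].age > ws[victim].age)
                victim = i;
        }
        return victim;
    }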

  49. Paging Dynamics • New pages are allocated to working sets from the top of the free or zero page list • Pages released from the working set due to working set replacement go to the bottom of: • The modified page list (if they were modified while in the working set) • The standby page list (if not modified) • Decision made based on “D” (dirty = modified) bit in page table entry • Association between the process and the physical page is still maintained while the page is on either of these lists

  50. Standby and Modified Page Lists • Modified pages go to modified (dirty) list • Avoids writing pages back to disk too soon • Unmodified pages go to standby (clean) list • They form a system-wide cache of “pages likely to be needed again” • Pages can be faulted back into a process from the standby and modified page list • These are counted as page faults, but not page reads
