
Presentation Transcript


  1. Today • Lab 5 out • Puts a *lot* together: pointers, debugging, C, etc.

  2. Address Translation: Page Hit
     1) Processor sends virtual address to MMU (memory management unit)
     2-3) MMU fetches PTE from page table in cache/memory
     4) MMU sends physical address to cache/memory
     5) Cache/memory sends data word to processor
     [Figure: CPU chip containing CPU and MMU, connected to cache/memory; arrows labeled VA (1), PTEA (2), PTE (3), PA (4), Data (5)]

  3. Address Translation: Page Fault
     1) Processor sends virtual address to MMU
     2-3) MMU fetches PTE from page table in cache/memory
     4) Valid bit is zero, so MMU triggers page fault exception
     5) Handler identifies victim page (and, if dirty, pages it out to disk)
     6) Handler pages in new page and updates PTE in memory
     7) Handler returns to original process, restarting faulting instruction
     [Figure: exception path from MMU to page fault handler; victim page out / new page in between cache/memory and disk]
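To make the translation steps concrete, here is a minimal software sketch of what the MMU does in hardware. All names here (PAGE_SIZE, NUM_PAGES, pte_t, page_table, translate) are illustrative assumptions, not a real API; real page tables are multi-level and the walk happens in hardware.

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define PAGE_SIZE 4096   /* 4 KiB pages: low 12 bits are the page offset */
    #define NUM_PAGES 16     /* toy single-level page table */

    typedef struct {
        unsigned valid : 1;  /* is the page resident in physical memory? */
        unsigned ppn   : 20; /* physical page number */
    } pte_t;

    static pte_t page_table[NUM_PAGES];

    /* Steps 1-5 of the page-hit case, plus the valid-bit check from
       step 4 of the fault case. */
    uint64_t translate(uint64_t va) {
        uint64_t vpn    = va / PAGE_SIZE;  /* virtual page number */
        uint64_t offset = va % PAGE_SIZE;  /* unchanged by translation */

        if (vpn >= NUM_PAGES) {
            fprintf(stderr, "VA out of range\n");
            exit(1);
        }
        pte_t pte = page_table[vpn];       /* steps 2-3: fetch the PTE */
        if (!pte.valid) {
            /* Page fault: the OS handler would page in the data, update
               the PTE, and restart the instruction; here we just report. */
            fprintf(stderr, "page fault at VA 0x%llx\n",
                    (unsigned long long)va);
            exit(1);
        }
        return (uint64_t)pte.ppn * PAGE_SIZE + offset;  /* step 4: the PA */
    }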

  4. Hmm… Translation Sounds Slow! • The MMU accesses memory twice: once to get the PTE for translation, and then again for the actual memory request from the CPU • The PTEs may be cached in L1 like any other memory word • But they may be evicted by other data references • And a hit in the L1 cache still requires 1-3 cycles • What can we do to make this faster?

  5. Speeding up Translation with a TLB • Solution: add another cache! • Translation Lookaside Buffer (TLB): • Small hardware cache in MMU • Maps virtual page numbers to physical page numbers • Contains complete page table entries for small number of pages • Modern Intel processors: 128 or 256 entries in TLB

  6. TLB Hit
     1) CPU sends virtual address (VA) to MMU
     2) MMU sends VPN to TLB
     3) TLB returns PTE to MMU
     4) MMU sends physical address (PA) to cache/memory
     5) Cache/memory sends data word to CPU
     A TLB hit eliminates a memory access
     [Figure: CPU chip with TLB next to MMU; arrows labeled VA (1), VPN (2), PTE (3), PA (4), Data (5)]

  7. TLB Miss
     1) CPU sends virtual address (VA) to MMU
     2) MMU sends VPN to TLB, which misses
     3) MMU sends PTE address (PTEA) to cache/memory
     4) Cache/memory returns PTE to MMU (which also fills the TLB)
     5) MMU sends physical address (PA) to cache/memory
     6) Cache/memory sends data word to CPU
     A TLB miss incurs an additional memory access (the PTE). Fortunately, TLB misses are rare.
     [Figure: same layout as the TLB hit, with an extra PTEA/PTE round trip to cache/memory]
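Continuing the sketch above (reusing pte_t and page_table from it), a TLB is just a tiny cache in front of the page table, indexed by VPN. A direct-mapped toy version; real TLBs are set-associative, with sizes like the 128/256 entries mentioned above:

    #define TLB_ENTRIES 16

    typedef struct {
        int      valid;
        uint64_t vpn;  /* tag: which virtual page this entry maps */
        pte_t    pte;  /* cached page table entry */
    } tlb_entry_t;

    static tlb_entry_t tlb[TLB_ENTRIES];

    /* Look up the PTE for a VPN, consulting the TLB first. */
    pte_t lookup_pte(uint64_t vpn) {
        tlb_entry_t *e = &tlb[vpn % TLB_ENTRIES];
        if (e->valid && e->vpn == vpn)
            return e->pte;            /* TLB hit: no memory access for the PTE */

        pte_t pte = page_table[vpn];  /* TLB miss: one extra memory access */
        e->valid = 1;                 /* cache it for next time */
        e->vpn   = vpn;
        e->pte   = pte;
        return pte;
    }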

  8. Virtual Memory Summary • Programmer’s view of virtual memory • Each process has its own private linear address space • Cannot be corrupted by other processes, why? • System view of virtual memory • Uses memory efficiently by caching virtual memory pages • Efficient only because of locality • Simplifies memory management and sharing • Simplifies protection by providing a convenient interpositioning point to check permissions

  9. Roadmap
     Data & addressing • Integers & floats • Machine code & C • x86 assembly programming • Procedures & stacks • Arrays & structs • Memory & caches • Processes • Virtual memory • Memory allocation • Java vs. C
     C:
       car *c = malloc(sizeof(car));
       c->miles = 100;
       c->gals = 17;
       float mpg = get_mpg(c);
       free(c);
     Java:
       Car c = new Car();
       c.setMiles(100);
       c.setGals(17);
       float mpg = c.getMPG();
     Assembly language:
       get_mpg:
         pushq %rbp
         movq  %rsp, %rbp
         ...
         popq  %rbp
         ret
     Machine code: 0111010000011000 100011010000010000000010 1000100111000010 110000011111101000011111
     [Figure: the course's layered view, from C/Java source through assembly and machine code down to the OS and computer system]

  10. Memory Allocation Topics • Dynamic memory allocation • Size/number of data structures may only be known at run time • Need to allocate space on the heap • Need to de-allocate (free) unused memory so it can be re-allocated • Implementation • Implicit free lists • Explicit free lists – subject of next programming assignment • Segregated free lists • Garbage collection • Common memory-related bugs in C programs

  11. Dynamic Memory Allocation • Programmers use dynamic memory allocators (such as malloc) to acquire VM at run time • For data structures whose size is only known at runtime • Dynamic memory allocators manage an area of process virtual memory known as the heap [Figure: the application sits above the dynamic memory allocator, which manages the heap; address space from 0 upward: program text (.text), initialized data (.data), uninitialized data (.bss), heap (via malloc) growing up to the top-of-heap brk ptr, then the user stack]
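The layout in the figure can be observed from a running process: the addresses of code, globals, heap, and stack fall into the regions shown. A small demonstration (exact addresses vary with platform and ASLR):

    #include <stdio.h>
    #include <stdlib.h>

    int initialized = 42;  /* .data */
    int uninitialized;     /* .bss  */

    int main(void) {
        int  local;                       /* stack */
        int *heap = malloc(sizeof(int));  /* heap, via malloc */

        printf("text  (main):        %p\n", (void *)main);
        printf(".data (initialized): %p\n", (void *)&initialized);
        printf(".bss  (uninit):      %p\n", (void *)&uninitialized);
        printf("heap  (malloc'd):    %p\n", (void *)heap);
        printf("stack (local):       %p\n", (void *)&local);

        free(heap);
        return 0;
    }

Typically this prints text, .data, .bss, and heap at increasing addresses, with the stack far above them, matching the diagram.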

  12. Dynamic Memory Allocation • Allocator maintains heap as collection of variable-sized blocks, which are either allocated or free • Allocator requests space in heap region; VM hardware and kernel allocate these pages to the process • Application objects are typically smaller than pages, so the allocator manages blocks within pages • Types of allocators • Explicit allocator: application allocates and frees space • E.g. malloc and free in C • Implicit allocator: application allocates, but does not free space • E.g. garbage collection in Java, ML, and Lisp

  13. The malloc Package
     #include <stdlib.h>
     void *malloc(size_t size)
     • Successful:
       • Returns a pointer to a memory block of at least size bytes, (typically) aligned to an 8-byte boundary
       • If size == 0, returns NULL
     • Unsuccessful: returns NULL and sets errno
     void free(void *p)
     • Returns the block pointed at by p to the pool of available memory
     • p must come from a previous call to malloc or realloc
     Other functions
     • calloc: version of malloc that initializes the allocated block to zero
     • realloc: changes the size of a previously allocated block
     • sbrk: used internally by allocators to grow or shrink the heap

  14. Malloc Example
     void foo(int n, int m) {
         int i, *p;

         /* allocate a block of n ints */
         p = (int *)malloc(n * sizeof(int));
         if (p == NULL) {
             perror("malloc");
             exit(1);
         }
         for (i = 0; i < n; i++)
             p[i] = i;

         /* add space for m ints to end of p block */
         if ((p = (int *)realloc(p, (n + m) * sizeof(int))) == NULL) {
             perror("realloc");
             exit(1);
         }
         for (i = n; i < n + m; i++)
             p[i] = i;

         /* print new array */
         for (i = 0; i < n + m; i++)
             printf("%d\n", p[i]);

         free(p);  /* return p to available memory pool */
     }

  15. Assumptions Made in This Lecture • Memory is word addressed (each word can hold a pointer) • Block size is a multiple of words [Figure legend: an allocated block of 4 words, a free block of 3 words; shaded squares are allocated words, unshaded squares are free words]

  16-20. Allocation Example (one step per slide)
     p1 = malloc(4)
     p2 = malloc(5)
     p3 = malloc(6)
     free(p2)
     p4 = malloc(2)
     [Figures: heap state after each call, animated one step per slide]
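The per-step figures did not survive the transcript; the trace below is a plausible reconstruction, assuming a 16-word heap, first-fit placement, and the word-addressed model from slide 15 (digits mark the words of an allocated block, dots mark free words):

              start:   . . . . . . . . . . . . . . . .
     p1 = malloc(4):   1 1 1 1 . . . . . . . . . . . .
     p2 = malloc(5):   1 1 1 1 2 2 2 2 2 . . . . . . .
     p3 = malloc(6):   1 1 1 1 2 2 2 2 2 3 3 3 3 3 3 .
           free(p2):   1 1 1 1 . . . . . 3 3 3 3 3 3 .
     p4 = malloc(2):   1 1 1 1 4 4 . . . 3 3 3 3 3 3 .

Note how free(p2) leaves a 5-word hole that the later malloc(2) can reuse.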

  21. How are we going to implement that?!? • What information does the allocator need to keep track of?

  22. Constraints • Applications • Can issue arbitrary sequence of malloc() and free() requests • free() requests must be made only for a previously malloc()’d block • Allocators • Can’t control number or size of allocated blocks • Must respond immediately to malloc() requests • i.e., can’t reorder or buffer requests • Must allocate blocks from free memory • i.e., blocks can’t overlap, why not? • Must align blocks so they satisfy all alignment requirements • 8-byte alignment for GNU malloc (libc malloc) on Linux boxes • Can’t move the allocated blocks once they are malloc()’d • i.e., compaction is not allowed. Why not?
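One common way to track the needed bookkeeping (and the basis of the implicit free list from slide 10's topic list) is a small header at the start of every block recording its size and an allocated bit. A minimal sketch, assuming 8-byte alignment so the low 3 bits of the size are free to hold the flag; the names below are illustrative:

    #include <stddef.h>
    #include <stdint.h>

    typedef uint64_t header_t;  /* one header word per block */

    #define ALLOC_BIT ((header_t)1)

    static inline header_t pack(size_t size, int allocated) {
        return (header_t)size | (allocated ? ALLOC_BIT : 0);
    }
    static inline size_t block_size(header_t h) {
        return (size_t)(h & ~(header_t)7);  /* mask off the low flag bits */
    }
    static inline int is_allocated(header_t h) {
        return (int)(h & ALLOC_BIT);
    }

    /* Walking the implicit list: each block begins with its header,
       and the next block starts block_size bytes later. */
    static inline header_t *next_block(header_t *bp) {
        return (header_t *)((char *)bp + block_size(*bp));
    }

Because blocks cannot be moved or overlapped, headers like these are the allocator's only record of where blocks begin and end.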

  23. Performance Goal: Throughput • Given some sequence of malloc and free requests: R_0, R_1, ..., R_k, ..., R_{n-1} • Goals: maximize throughput and peak memory utilization • These goals are often conflicting • Throughput: • Number of completed requests per unit time • Example: • 5,000 malloc() calls and 5,000 free() calls in 10 seconds • Throughput is 1,000 operations/second

  24. Performance Goal: Peak Memory Utilization • Given some sequence of malloc and free requests: R_0, R_1, ..., R_k, ..., R_{n-1} • Def: Aggregate payload P_k • malloc(p) results in a block with a payload of p bytes • After request R_k has completed, the aggregate payload P_k is the sum of currently allocated payloads • Def: Current heap size H_k • Assume H_k is monotonically nondecreasing • Allocator can increase size of heap using sbrk() • Def: Peak memory utilization after k requests: U_k = ( max_{i<k} P_i ) / H_k • Goal: maximize utilization for a sequence of requests • Why is this hard? And what happens to throughput?
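These definitions are easy to compute over a request trace. A small sketch, with a made-up trace of payload deltas (+p for malloc(p), -p for the matching free) and, for simplicity, a fixed heap size H rather than one grown by sbrk():

    #include <stdio.h>

    int main(void) {
        long deltas[] = { +4096, +8192, -4096, +2048, -8192 };
        int  n = sizeof deltas / sizeof deltas[0];

        long P    = 0;      /* aggregate payload P_k */
        long peak = 0;      /* running max of P_i    */
        long H    = 16384;  /* heap size H_k (fixed in this toy trace) */

        for (int k = 0; k < n; k++) {
            P += deltas[k];
            if (P > peak) peak = P;
            printf("after R_%d: P=%ld peak=%ld U=%.3f\n",
                   k, P, peak, (double)peak / H);
        }
        return 0;
    }

The tension with throughput: squeezing the peak payload into the smallest possible heap means more searching and bookkeeping per request.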

  25. Fragmentation • Poor memory utilization is caused by fragmentation • internal fragmentation • external fragmentation

  26. Internal Fragmentation • For a given block, internal fragmentation occurs if the payload is smaller than the block size • Caused by: • overhead of maintaining heap data structures (inside block, outside payload) • padding for alignment purposes • explicit policy decisions (e.g., to return a big block to satisfy a small request; why would anyone do that?) [Figure: a block whose payload is surrounded by internal fragmentation on both sides]
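A concrete instance, assuming an 8-byte header per block and 8-byte alignment (both assumptions, though typical): malloc(13) needs 13 + 8 = 21 bytes, which rounds up to a 24-byte block, so 11 of the 24 bytes are internal fragmentation.

    #include <stdio.h>

    #define HEADER    8  /* assumed per-block overhead */
    #define ALIGNMENT 8

    /* Round payload + header up to the next alignment boundary. */
    static size_t block_for(size_t payload) {
        return (payload + HEADER + ALIGNMENT - 1) & ~(size_t)(ALIGNMENT - 1);
    }

    int main(void) {
        size_t payload = 13;
        size_t block   = block_for(payload);  /* 24 */
        printf("payload=%zu block=%zu internal frag=%zu bytes\n",
               payload, block, block - payload);
        return 0;
    }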

  27. External Fragmentation • Occurs when there is enough aggregate heap memory, but no single free block is large enough • Depends on the pattern of future requests • Thus, difficult to measure
     p1 = malloc(4)
     p2 = malloc(5)
     p3 = malloc(6)
     free(p2)
     p4 = malloc(6)  Oops! (what would happen now?)
