Memory Operation and Performance Caches in Virtual Memory

Lecture 10 – Memory Operation and Performance
• Caches – repeat some concepts
• Virtual Memory (VM)

Example of a matrix
int data[M][N];
for (i = 0; i < M; i++) {
    for (j = 0; j < N; j++) {
        sum += data[i][j];
    }
}
This is an M×N matrix (M rows, N columns).

Row-major and Column-major – note the sequence
[Figure: the sequence in which data is accessed, row major versus column major]

Accessing in column-major order
As the figure shows, each access does not move to the adjacent value. It jumps a whole row (N elements) ahead in memory.

Accessing row data is faster
• Once the program accesses [0,0], the cache line is also loaded with the neighbouring values [0,1], [0,2], … (up to [1,3] in the figure), so the following accesses hit the cache.
• Row-major access is therefore faster than column-major access.

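Changing the order of the iterations is not always better. Below is an example.
int original[M][N];
int transposed[N][M];
for (i = 0; i < N; i++) {
    for (j = 0; j < M; j++) {
        transposed[i][j] = original[j][i];
    }
}
Here transposed is written in row-major order, but original is read in column-major order. Swapping the loops only swaps which array suffers.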

Effect of rotating shape
[Figure: the shape rotated by 90 degrees]

Insufficient Temporal Locality
// the solution is to work on small square blocks that fit in the cache
int original[M][N];
int transposed[N][M];
for (k = 0; k < N / m; k++) {
    for (l = 0; l < M / n; l++) {
        for (i = k*m; i < (k+1)*m; i++) {
            for (j = l*n; j < (l+1)*n; j++) {
                transposed[i][j] = original[j][i];
            }
        }
    }
}
Within each block, both the column-major and the row-major accesses stay inside a few cache lines.

Blocked transpose gets around cache misses
• The block should be square (m = n), and its size is determined by the cache line size, say 32 bytes (for example, 8 four-byte ints per line).
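The blocked transpose can be sketched as a small C function. This is a minimal sketch, not a tuned implementation: the sizes M, N and BLOCK are hypothetical, and BLOCK is assumed to divide both dimensions exactly so that no edge handling is needed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sizes chosen for illustration only. */
enum { M = 8, N = 12, BLOCK = 4 };

static_assert(M % BLOCK == 0 && N % BLOCK == 0,
              "this sketch assumes BLOCK divides both dimensions");

/* Transpose original (M x N) into transposed (N x M), one
 * BLOCK x BLOCK tile at a time, so that the rows written and the
 * columns read both stay within a small, cache-friendly region. */
void blocked_transpose(int original[M][N], int transposed[N][M])
{
    for (size_t k = 0; k < N / BLOCK; k++) {        /* tile row of transposed */
        for (size_t l = 0; l < M / BLOCK; l++) {    /* tile column of transposed */
            for (size_t i = k * BLOCK; i < (k + 1) * BLOCK; i++) {
                for (size_t j = l * BLOCK; j < (l + 1) * BLOCK; j++) {
                    transposed[i][j] = original[j][i];
                }
            }
        }
    }
}
```

The result is identical to the plain double loop. Only the order of the accesses changes, which is why the block size can be tuned freely against the cache.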

Virtual memory – Glossary
• thrashing (n.) a phenomenon of virtual memory systems that occurs when the program, by the manner in which it is referencing its data and instructions, regularly causes the next memory locations referenced to be overwritten by recent or current instructions. The result is that the performance is slow.
• thread (n.) a lightweight or small granularity process.
• tiling (n.) a regular division of a mesh into patches, or tiles. Tiling is the most common way to do geometric decomposition.

Virtual Memory
• virtual memory (n.) a system that stores portions of an address space that are not being actively used on secondary storage, such as disk.
• When a reference is made to a value not presently in main memory, the virtual memory manager must swap some values in main memory for the values required.
• Virtual memory is used by almost all uniprocessors and multiprocessors, but not array processors and multicomputers.
• Multicomputers still employ real memory storage only on each node.

Virtual Memory (VM)
• The term virtual memory refers to a combination of hardware and operating system software that solves several computing problems.
• It receives a single name because it is a single mechanism, but it meets several goals:
  • To simplify memory management and program loading by providing virtual addresses.
  • To allow multiple large programs to be run without the need for large amounts of RAM, by providing virtual storage.

Virtual Addresses
• Segmentation – groups pages together into regions of different sizes
• Memory Protection – because more than ONE process is supported, the memory of each process must be protected from being corrupted by the others
• Paging – uses chunks of the same size on disk and in memory, and loads them from disk into memory or writes them from memory to disk
But computers hold several programs in memory at the same time.

Page and Segmentation
[Figure: 16K pages grouped into segment 1 and segment 2]

Memory Protection
If there are two or more processes (programs in memory), each program must be protected from being modified by the others.
[Figure: Program 1 and Program 2 in memory, with access from one to the other blocked (X)]

Contradictory facts about VM
• The compiler determines the address at which a program will execute, by hard-wiring a lot of addresses of variables and instructions into the machine code it generates.
• The location of the program is not determined until the program is executed and may be anywhere in main memory.
[Figure: Program 1 and Program 2 placed in memory]

Solution to contradictory facts
• Code Relocation: Have the compiler generate addresses relative to a base address, and change the base address when the program is executed. The drawback is that the address of each reference must be calculated explicitly by adding the relative address to the base address.
• Address Translation: At run time, provide programs the illusion that there are no other programs in memory. Compilers can then generate any absolute address they wish. Two programs may contain references to the same address without interference.

Virtual and Physical Addresses
• The addresses issued by the compiler are called virtual addresses.
• The addresses that result from the translation are called physical addresses, because they refer to an actual memory chip.

Multiple programs without relocation

Relocatable code can share memory: a reference goes to B+X (base address plus relative address), not X

Segment
A segment is a region of the address space of varying length. In the next figure, there are two segments, one used to store program A and the other, program B. Each segment can be mapped to a region of physical memory independently, as shown, but the whole segment has to be translated as one contiguous (continuous) chunk.

Segment address translation
[Figure: segments translated between memory and disk]
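Segment translation can be pictured as a base address plus a bounds check. The sketch below is illustrative only: the segment_t descriptor and segment_translate function are hypothetical names, and real hardware performs this check in the memory management unit rather than in C code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical segment descriptor: where the segment starts in
 * physical memory and how many bytes it covers. */
typedef struct {
    uint32_t base;
    uint32_t limit;
} segment_t;

/* Translate an address relative to the segment into a physical
 * address. Returns false when the reference falls outside the
 * segment, which is how one program is kept out of another's memory. */
bool segment_translate(const segment_t *seg, uint32_t offset,
                       uint32_t *physical)
{
    assert(seg != NULL && physical != NULL);
    if (offset >= seg->limit) {
        return false;                 /* protection violation */
    }
    *physical = seg->base + offset;   /* contiguous mapping: B + X */
    return true;
}
```

Two programs can use the same relative address without interfering, because each one is translated through its own base.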

Memory Protection
• Its purpose is to protect the memory of a program from being modified by others.
• This is important not only to prevent malicious attacks or eavesdropping but also to contain unintended catastrophic errors.
• If a computer has ever frozen or crashed on you, you have probably experienced a bug in one program careening out of control and trampling over the memory of other programs as well as that of the operating system. Address translation is the foremost tool in preventing such behavior.
[Figure: memory shared by 6 programs, one colour per program]

Paging
• The allocation of memory in chunks of varying size causes external fragmentation.
• To solve this problem we can change the nature of the address translation so that, instead of mapping virtual to physical addresses in big chunks of varying size, it maps them in small chunks of constant size (pages).
[Figure: fixed-size pages in memory and on disk]
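With constant-size chunks, translation reduces to splitting the virtual address into a page number and an offset. The following is a minimal sketch under stated assumptions: the 16K page size follows the earlier page figure, while NUM_PAGES, pte_t and page_translate are hypothetical names, and a real page table is maintained by the operating system and walked by hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 16384u   /* 16K pages (assumed) */
#define NUM_PAGES 8u       /* hypothetical size of the virtual address space */

/* Hypothetical page table entry. */
typedef struct {
    bool     present;      /* is the page currently in main memory? */
    uint32_t frame;        /* physical frame number, valid if present */
} pte_t;

/* Translate a virtual address. Returns false when the page is not in
 * main memory, which is the situation the OS handles as a page fault. */
bool page_translate(const pte_t table[NUM_PAGES], uint32_t vaddr,
                    uint32_t *paddr)
{
    uint32_t page   = vaddr / PAGE_SIZE;
    uint32_t offset = vaddr % PAGE_SIZE;

    assert(table != NULL && paddr != NULL);
    if (page >= NUM_PAGES || !table[page].present) {
        return false;
    }
    *paddr = table[page].frame * PAGE_SIZE + offset;
    return true;
}
```

Because every page has the same size, any free frame can hold any page, which is what removes external fragmentation.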

An example of Paging

Page fault
• The page needed is not in main memory.
• The operating system has to load it from the disk (the virtual memory backing store).
• Loading from disk takes time, so performance drops.
• Performance is measured in terms of the number of page faults.
• A program with 10 page faults performs better than a program with 20 page faults.
[Figure: pages 1, 2, 3 and 4 distributed between memory and disk]

Page fault – the page is not in main memory and has to be loaded from disk

Working Sets
• The working set of a program is the set of memory pages that the program is currently using actively.
• The principle of locality suggests that the working set of a program will be, at any given time, much smaller than the memory used by the program over its lifetime.
• The working set will change as the program executes.
• It will change both in the exact pages that are members of it and in the number of pages.
• The working set of a program will expand and contract as the program's locality becomes more or less constrained.
• It is the size of the working set that is important in choosing a victim program.
[Figure: memory holding pages 1 and 2. In this example, keep 2 in memory.]

Thrashing
• When the available memory is smaller than the working set, the operating system has to keep re-loading pages into the same memory locations. Performance suffers because this causes many page faults.
• The computer will be doing a lot of work moving pages back and forth between memory and disk, but no useful work will get done. This situation is often referred to as thrashing.
• The CPU is busy but not productive, as it keeps loading data without executing the program.
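A small simulation makes the effect visible. This is a sketch under assumptions, not a model of any particular operating system: the slides do not name a replacement policy, so least-recently-used (LRU) is assumed here, and count_page_faults and MAX_FRAMES are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FRAMES 16   /* hypothetical upper bound for this sketch */

/* Count page faults for a sequence of page references when only
 * nframes pages fit in memory, assuming LRU replacement. */
size_t count_page_faults(const int *refs, size_t nrefs, size_t nframes)
{
    int    frames[MAX_FRAMES];      /* page held by each frame */
    size_t last_used[MAX_FRAMES];   /* time of last reference per frame */
    size_t used = 0, faults = 0;

    assert(refs != NULL && nframes > 0 && nframes <= MAX_FRAMES);

    for (size_t t = 0; t < nrefs; t++) {
        size_t slot = used;                       /* "not found" marker */
        for (size_t f = 0; f < used; f++) {
            if (frames[f] == refs[t]) { slot = f; break; }
        }
        if (slot == used) {                       /* page fault */
            faults++;
            if (used < nframes) {
                slot = used++;                    /* a free frame is left */
            } else {
                slot = 0;                         /* evict the LRU page */
                for (size_t f = 1; f < used; f++) {
                    if (last_used[f] < last_used[slot]) slot = f;
                }
            }
            frames[slot] = refs[t];
        }
        last_used[slot] = t;
    }
    return faults;
}
```

For a program that cycles through pages 1, 2, 3, 4 three times, four frames give 4 faults (one per page), while three frames give 12 faults: every single reference misses, which is thrashing in miniature.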

Thrashing – here, the program has insufficient memory to execute, so it keeps loading pages from disk, swapping them in and out

Relationship between working set and page fault
[Figure annotations: "keep a large number in memory"; "better to keep a small number"]

Impact of VM on Performance
int data[M][N];
for (i = 0; i < N; i++) {
    for (j = 0; j < M; j++) {
        sum += data[j][i];
    }
}
// column major – more page faults

Impact of VM on Performance
int data[M][N];
for (j = 0; j < M; j++) {
    for (i = 0; i < N; i++) {
        sum += data[j][i];
    }
}
// row major – fewer page faults
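The difference between the two loop orders can be estimated without timing anything, by counting how often consecutive accesses land on a different page. The sketch below assumes 4096-byte pages, four-byte elements and a 1024×1024 array starting on a page boundary. All of these values, and the function names, are illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { M = 1024, N = 1024 };   /* hypothetical matrix size */
#define PAGE_BYTES 4096u       /* assumed page size */

static_assert(PAGE_BYTES % sizeof(int32_t) == 0,
              "elements must not straddle a page boundary");

/* Page index of data[row][col] for a row-major int32_t data[M][N]
 * that starts on a page boundary. */
static size_t page_of(size_t row, size_t col)
{
    return ((row * N + col) * sizeof(int32_t)) / PAGE_BYTES;
}

/* Row-major traversal: inner loop walks along a row. */
size_t page_switches_row_major(void)
{
    size_t switches = 0, prev = page_of(0, 0);
    for (size_t j = 0; j < M; j++) {
        for (size_t i = 0; i < N; i++) {
            size_t p = page_of(j, i);
            if (p != prev) switches++;
            prev = p;
        }
    }
    return switches;
}

/* Column-major traversal: inner loop walks down a column. */
size_t page_switches_column_major(void)
{
    size_t switches = 0, prev = page_of(0, 0);
    for (size_t i = 0; i < N; i++) {
        for (size_t j = 0; j < M; j++) {
            size_t p = page_of(j, i);
            if (p != prev) switches++;
            prev = p;
        }
    }
    return switches;
}
```

With these sizes each row fills exactly one page, so the row-major order changes page 1023 times while the column-major order changes page on every one of its 1048575 steps. A page switch is not necessarily a page fault, but when memory is short each switch is a chance for one.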

Summary
• Make use of the cache line size: each cache miss loads a whole line, typically 32 or 64 bytes, into the cache.
• Understand row-major versus column-major access to gain performance.
• Try to reduce page faults (a page fault means that the page is not in main memory, so the operating system has to load it from disk).

Operating System Interaction • Dynamic Linking • Time-Sharing • Threads

Dynamic Linking • Libraries • Dynamic-Link Libraries (DLLs) • Example of DLL

Libraries
• Almost all programs are composed from many separately compiled units. When you write a single-file program, it is compiled to a representation of machine instructions called an object file.
• For example, Visual C++ creates an .obj file from your C++ source code. The .obj file may seem to be a complete program, but there is much more code required to make it complete.
[Figure: your code combined with the library]

Reasons for using a library (don't memorise)
• 1. Many functions, such as memory allocation, do not require special privileges to perform, and they do not take much CPU time. If these functions were invoked using a time-consuming system call, it would have a dramatic impact on performance. It is much faster to implement them as simple functions.
• 2. These functions are language specific. OSs are language independent, and it would greatly complicate the OS to provide run-time support for all languages, even if that were possible.
• 3. Even when system calls are required, some additional "glue" code is needed to translate between the standard language interface, such as printf() or operator <<, and the calling convention that is needed to set up parameters and invoke a trap instruction.

Library in Visual C++

Example of linking

Explanation – static
• In the above diagram, the application object file has to be linked with library code such as malloc and the code that calls main() to form an executable (exe) file.
[Figure: your code and the run-time library combined at compilation]

Dynamic-Link Libraries (DLLs)
• Dynamic linking means that linking is performed on demand at run time.
• An advantage of dynamic linking is that executable files can be much smaller than statically linked executables.
• Of course, the executable is not complete without all of the associated library files, but if many executables share a set of libraries, there can be significant overall savings.
[Figure: your code is compiled on its own; the library is joined at run time]

Advantage of DLL (1) (don't memorise)
• In most systems, the space savings extend to memory.
• When libraries are dynamically linked, the operating system can arrange to let applications share the library code so that only one copy of the library is loaded into memory.
• With static linking, each executable is a monolithic binary program. If several programs are using the same libraries, there will be several copies of the code in memory.
[Figure: your code is compiled on its own; several libraries are joined at run time]

Advantage of DLL (2)
• Another potential memory savings comes from the fact that dynamically linked libraries do not necessarily need to be loaded. For example, an image editor may support input and output of dozens of file formats. It could be expensive (and unnecessary) to link conversion routines for all of these formats.
• With dynamic linking, the program can link code as it becomes useful, saving time and memory. This can be especially useful in programs with ever growing lists of features.

Disadvantage of DLL
• First, there are version problems. Like all software, libraries tend to evolve. New libraries may be incompatible with old libraries in subtle ways. If you update the libraries used by a program, it may have good, bad, or no effects on the program's behavior. In contrast, a statically linked program will never change its behavior unless the entire program is relinked and installed.

Summary
• Dynamic linking combines the library with the program at run time.
• It reduces program size, but can cause version problems.
