
Performance Analysis of HPC with Lmbench

Performance Analysis of HPC with Lmbench. Didem Unat. Supervisor: Nahil Sobh. July 22nd, 2005. netfiles.uiuc.edu/dunat2/www. Simple, portable benchmarks that compare the performance of different Unix systems by measuring latency and bandwidth.


Presentation Transcript


  1. Performance Analysis of HPC with Lmbench. Didem Unat. Supervisor: Nahil Sobh. July 22nd, 2005. netfiles.uiuc.edu/dunat2/www

  2. Lmbench: Micro-Benchmark Suite • Simple, portable benchmarks • Compares the performance of different Unix systems • Measures latency and bandwidth • Analyzes only the performance of the processor, memory, network, file system and disk • Free software

  3. Compiler & optimization issues • The GNU C compiler was used on all resources except copper • The IBM xlc compiler was used on copper • All benchmarks were compiled with optimization (-O), except those that calculate the clock speed and the context-switch times

  4. Metrics in the Benchmark • Bandwidth: Pipe/TCP, Cached file read, Memory copy, Memory read/write • Latency: System call, Signal handling, Process creation, Basic CPU operations, Context switching, Inter process communication, File and VM system, Memory read latencies

  6. Interprocess Communication Bandwidth (MB/sec) • Transfers 64 MB of data in 64 KB chunks through: • Unix pipes • Unix sockets • TCP/IP sockets
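A minimal C sketch of the pipe case, assuming a POSIX system: a forked child writes 64 MB into a Unix pipe in 64 KB chunks while the parent reads and times the transfer. This is not lmbench's own bw_pipe code; the function name `pipe_bandwidth_mb_s` and the decision to time only the reading side are assumptions made for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

#define TOTAL_BYTES (64L * 1024 * 1024) /* 64 MB, as on the slide */
#define CHUNK_BYTES (64L * 1024)        /* 64 KB per transfer */

static double now_sec(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + ts.tv_nsec / 1e9;
}

/* Pipe bandwidth in MB/s: a child writes TOTAL_BYTES in CHUNK_BYTES pieces,
   the parent reads and times them. Returns -1.0 on any failure. */
double pipe_bandwidth_mb_s(void) {
    int fds[2];
    if (pipe(fds) != 0) return -1.0;

    pid_t pid = fork();
    if (pid < 0) return -1.0;
    if (pid == 0) {                       /* child: the writer */
        char *buf = calloc(1, CHUNK_BYTES);
        close(fds[0]);
        if (!buf) _exit(1);
        for (long sent = 0; sent < TOTAL_BYTES;) {
            long left = TOTAL_BYTES - sent; /* never write past TOTAL_BYTES */
            ssize_t n = write(fds[1], buf, left < CHUNK_BYTES ? left : CHUNK_BYTES);
            if (n <= 0) _exit(1);
            sent += n;
        }
        _exit(0);
    }

    close(fds[1]);                        /* parent: the reader */
    char *buf = malloc(CHUNK_BYTES);
    long received = 0;
    ssize_t n;
    double start = now_sec();
    while (buf && received < TOTAL_BYTES && (n = read(fds[0], buf, CHUNK_BYTES)) > 0)
        received += n;
    double elapsed = now_sec() - start;
    free(buf);
    close(fds[0]);                        /* a stalled writer now gets SIGPIPE */
    waitpid(pid, NULL, 0);
    if (received != TOTAL_BYTES || elapsed <= 0.0) return -1.0;
    return (TOTAL_BYTES / (1024.0 * 1024.0)) / elapsed;
}
```

Calling `pipe_bandwidth_mb_s()` from a `main()` of your own and printing the result gives one MB/sec figure per run. Because the code reads from `fds[0]` and writes to `fds[1]`, swapping `pipe(fds)` for `socketpair(AF_UNIX, SOCK_STREAM, 0, fds)` (from `<sys/socket.h>`) would give a rough Unix-socket counterpart.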

  9. Cached file read • A reread benchmark, intended to be used on a file that is already in memory • File reread: copies data from the kernel's file system page cache into the process's buffer • Mmap reread: maps the entire file (8 MB) into the process's address space
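The two reread variants can be sketched in C as below. Assumptions: an unlinked temporary file under /tmp stands in for the benchmark's 8 MB file, read() uses 64 KB chunks, one untimed pass warms the page cache first, and the mmap variant sums the mapped data as ints so that every page is actually touched. The helper and function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

#define FILE_BYTES (8L * 1024 * 1024) /* 8 MB file, as on the slide */
#define CHUNK_BYTES (64L * 1024)      /* read() size: an assumption */

static volatile unsigned long sink;   /* keeps the summing loop from being optimized away */

static double now_sec(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + ts.tv_nsec / 1e9;
}

/* Creates an unlinked 8 MB temporary file; returns its fd or -1. */
int make_test_file(void) {
    char path[] = "/tmp/lmb_rereadXXXXXX";
    int fd = mkstemp(path);
    if (fd < 0) return -1;
    unlink(path);                     /* the data lives until fd is closed */
    char *buf = malloc(CHUNK_BYTES);
    if (!buf) { close(fd); return -1; }
    memset(buf, 1, CHUNK_BYTES);
    for (long off = 0; off < FILE_BYTES; off += CHUNK_BYTES) {
        if (write(fd, buf, CHUNK_BYTES) != CHUNK_BYTES) { free(buf); close(fd); return -1; }
    }
    free(buf);
    return fd;
}

/* Reads the whole file from offset 0 with read(); returns bytes read or -1. */
static long read_whole(int fd, char *buf) {
    long total = 0;
    ssize_t n;
    if (lseek(fd, 0, SEEK_SET) != 0) return -1;
    while ((n = read(fd, buf, CHUNK_BYTES)) > 0) total += n;
    return n < 0 ? -1 : total;
}

/* File reread: copy cached file data into the process's buffer; MB/s. */
double file_reread_mb_s(int fd) {
    char *buf = malloc(CHUNK_BYTES);
    if (!buf) return -1.0;
    read_whole(fd, buf);              /* untimed pass: pull the file into the page cache */
    double start = now_sec();
    long got = read_whole(fd, buf);   /* timed reread */
    double elapsed = now_sec() - start;
    free(buf);
    if (got != FILE_BYTES || elapsed <= 0.0) return -1.0;
    return FILE_BYTES / (1024.0 * 1024.0) / elapsed;
}

/* Mmap reread: map the whole file and sum it as ints; MB/s. */
double mmap_reread_mb_s(int fd) {
    double start = now_sec();
    void *map = mmap(NULL, FILE_BYTES, PROT_READ, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) return -1.0;
    const int *p = map;
    unsigned long sum = 0;
    for (long i = 0; i < FILE_BYTES / (long)sizeof(int); i++) sum += (unsigned)p[i];
    munmap(map, FILE_BYTES);
    double elapsed = now_sec() - start;
    sink = sum;
    return elapsed > 0.0 ? FILE_BYTES / (1024.0 * 1024.0) / elapsed : -1.0;
}
```

Run `file_reread_mb_s` before `mmap_reread_mb_s` on the same descriptor so that the mapped pages are already cached. In this sketch the mmap timing includes the mmap() and munmap() calls themselves.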

  11. Memory copy • Measures how fast the system can bcopy data • bcopy copies n bytes from a source buffer to a destination buffer • An 8 MB to 8 MB copy, which does not fit in the cache • Kernel bcopy and C library bcopy; the C library bcopy results are shown in the next slide
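A hedged C sketch of the C library side of this measurement: it copies an 8 MB buffer into another 8 MB buffer several times and reports MB/sec, counting each copied byte once (that counting convention is an assumption). It calls `memmove`, which has the same overlap-safe behaviour as `bcopy` with the source and destination arguments swapped; `bcopy` itself was removed from POSIX.1-2008. The function name is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define COPY_BYTES (8L * 1024 * 1024) /* 8 MB source and destination, as on the slide */

static double now_sec(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + ts.tv_nsec / 1e9;
}

/* Library copy bandwidth in MB/s over `reps` copies; -1.0 on failure.
   bcopy(src, dst, n) behaves like memmove(dst, src, n). */
double copy_bandwidth_mb_s(int reps) {
    char *src = malloc(COPY_BYTES);
    char *dst = malloc(COPY_BYTES);
    double result = -1.0;
    if (src && dst && reps > 0) {
        memset(src, 'x', COPY_BYTES);
        memset(dst, 0, COPY_BYTES);   /* touch dst so page faults are not timed */
        double start = now_sec();
        for (int r = 0; r < reps; r++)
            memmove(dst, src, COPY_BYTES);
        double elapsed = now_sec() - start;
        /* memcmp both verifies the copy and keeps dst live for the optimizer */
        if (elapsed > 0.0 && memcmp(dst, src, COPY_BYTES) == 0)
            result = reps * (COPY_BYTES / (1024.0 * 1024.0)) / elapsed;
    }
    free(src);
    free(dst);
    return result;
}
```

Since both buffers are 8 MB, the copy cannot be satisfied from cache on the machines of that era, which is the point of the slide's "does not fit in the cache" remark.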

  13. Memory read/write • Read: measures the time to read data into the processor, using an unrolled loop that sums a series of integers • Write: measures the time to write data to memory, using an unrolled loop that stores a value into an integer
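The read and write loops described above might look like the following C sketch. The 4-way unrolling, the 8 MB buffer size and the function names are assumptions; the slide only states that the loops are unrolled.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <time.h>

#define BUF_BYTES (8L * 1024 * 1024)            /* assumption: larger than the caches */
#define N_INTS (BUF_BYTES / (long)sizeof(int))

static double now_sec(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + ts.tv_nsec / 1e9;
}

/* Read: unrolled loop that sums a series of integers (4-way here). */
long mem_read_sum(const int *a, long n) {
    long s0 = 0, s1 = 0, s2 = 0, s3 = 0, i;
    for (i = 0; i + 4 <= n; i += 4) {
        s0 += a[i]; s1 += a[i + 1]; s2 += a[i + 2]; s3 += a[i + 3];
    }
    for (; i < n; i++) s0 += a[i];              /* leftover elements */
    return s0 + s1 + s2 + s3;
}

/* Write: unrolled loop that stores a value into each integer (4-way here). */
void mem_write_fill(int *a, long n, int value) {
    long i;
    for (i = 0; i + 4 <= n; i += 4) {
        a[i] = value; a[i + 1] = value; a[i + 2] = value; a[i + 3] = value;
    }
    for (; i < n; i++) a[i] = value;
}

/* Times one pass of each loop over BUF_BYTES; MB/s via out-params. 0 on success. */
int mem_rw_bandwidth(double *read_mb_s, double *write_mb_s) {
    int *a = malloc(BUF_BYTES);
    if (!a) return -1;
    mem_write_fill(a, N_INTS, 0);               /* untimed: fault the pages in */
    double t0 = now_sec();
    mem_write_fill(a, N_INTS, 1);
    double t1 = now_sec();
    volatile long sum = mem_read_sum(a, N_INTS);
    double t2 = now_sec();
    free(a);
    if (sum != N_INTS || t1 <= t0 || t2 <= t1) return -1;
    *write_mb_s = BUF_BYTES / (1024.0 * 1024.0) / (t1 - t0);
    *read_mb_s = BUF_BYTES / (1024.0 * 1024.0) / (t2 - t1);
    return 0;
}
```

The untimed first fill makes the operating system fault in the buffer's pages, so the timed write pass measures stores rather than page allocation. Using several independent accumulators in the read loop keeps the additions from forming one long dependency chain.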

  16. Operating System Entry / Signal Handling / Process Creation Costs • Process-related latencies • System call: null call, null I/O, stat, open/close • Signal handling: signal installation, signal handling • Process creation: fork + exit, fork + execve, fork + /bin/sh -c
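Three of these costs can be approximated in C as below, assuming a POSIX system. `getppid()` stands in for the null system call, a SIGUSR1 handler that the process sends to itself stands in for signal handling, and fork + exit is timed as fork followed by waitpid; these choices and the function names are assumptions, not lmbench's exact procedure.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

static volatile sig_atomic_t caught;

static double now_usec(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1e6 + ts.tv_nsec / 1e3;
}

static void on_usr1(int sig) {
    (void)sig;
    caught = caught + 1;
}

/* Null call: a trivial system call repeated iters times. */
double null_call_usec(int iters) {
    if (iters <= 0) return -1.0;
    double start = now_usec();
    for (int i = 0; i < iters; i++) (void)getppid();
    return (now_usec() - start) / iters;
}

/* Signal handling: install the handler once, then time sending SIGUSR1 to
   ourselves and running the handler. */
double signal_handle_usec(int iters) {
    struct sigaction sa;
    if (iters <= 0) return -1.0;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0) return -1.0;
    caught = 0;
    pid_t self = getpid();
    double start = now_usec();
    for (int i = 0; i < iters; i++) kill(self, SIGUSR1);
    double elapsed = now_usec() - start;
    return caught == iters ? elapsed / iters : -1.0;
}

/* Process creation: fork a child that exits immediately, then reap it. */
double fork_exit_usec(int iters) {
    if (iters <= 0) return -1.0;
    double start = now_usec();
    for (int i = 0; i < iters; i++) {
        pid_t pid = fork();
        if (pid < 0) return -1.0;
        if (pid == 0) _exit(0);
        if (waitpid(pid, NULL, 0) != pid) return -1.0;
    }
    return (now_usec() - start) / iters;
}
```

POSIX requires an unblocked signal that a single-threaded process sends to itself to be delivered before kill() returns, which is why the handler count can be checked right after the loop.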

  19. Context Switching • The time to save the state of one process and restore the state of another • The processes are connected in a ring of Unix pipes • A token is passed from process to process • Each process allocates an array and sums it • The reported context-switch time excludes the overhead of doing this work • Two parameters: the number of processes and the process size
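A C sketch of the token ring, under stated assumptions: the parent is process 0, process i reads from pipe i-1 and writes to pipe i, each process sums its own private array of `array_bytes` bytes per token, and the cost of that summing, measured cache-hot in a single process, is subtracted afterwards. Names such as `ctx_switch_usec` and `MAX_PROCS` are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

#define MAX_PROCS 64

static volatile long sink; /* keeps the summing work from being optimized away */

static double now_usec(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1e6 + ts.tv_nsec / 1e3;
}

/* Allocates and writes n longs so the pages are private to the calling process. */
static long *make_array(size_t n) {
    long *a = malloc((n ? n : 1) * sizeof(long));
    for (size_t i = 0; a && i < n; i++) a[i] = (long)i;
    return a;
}

/* The per-token "work": sum this process's array. */
static long sum_array(const long *a, size_t n) {
    long s = 0;
    for (size_t i = 0; i < n; i++) s += a[i];
    return s;
}

/* Average microseconds per token hop with the summing work subtracted.
   Returns -1.0 on failure. */
double ctx_switch_usec(int nprocs, size_t array_bytes, int rounds) {
    int fds[MAX_PROCS][2];
    pid_t kids[MAX_PROCS];
    if (nprocs < 2 || nprocs > MAX_PROCS || rounds < 1) return -1.0;
    size_t n = array_bytes / sizeof(long);
    long *a = make_array(n);
    if (!a) return -1.0;

    /* Cost of the work alone, cache-hot, for every hop the ring will make. */
    double w0 = now_usec();
    for (long i = 0; i < (long)rounds * nprocs; i++) sink += sum_array(a, n);
    double work = now_usec() - w0;

    int npipes = 0;
    while (npipes < nprocs && pipe(fds[npipes]) == 0) npipes++;
    int ok = npipes == nprocs, forked = 1;
    for (; ok && forked < nprocs; forked++) {
        pid_t pid = fork();
        if (pid < 0) { ok = 0; break; }
        if (pid == 0) {                 /* child `forked`: read pipe i-1, write pipe i */
            long *mine = make_array(n);
            char tok;
            if (!mine) _exit(1);
            for (int r = 0; r < rounds; r++) {
                if (read(fds[forked - 1][0], &tok, 1) != 1) _exit(1);
                sink += sum_array(mine, n);
                if (write(fds[forked][1], &tok, 1) != 1) _exit(1);
            }
            _exit(0);
        }
        kids[forked] = pid;
    }

    char tok = 't';
    double elapsed = 0.0;
    if (ok) {                           /* parent is process 0 of the ring */
        double start = now_usec();
        for (int r = 0; r < rounds && ok; r++) {
            sink += sum_array(a, n);
            ok = write(fds[0][1], &tok, 1) == 1 && read(fds[nprocs - 1][0], &tok, 1) == 1;
        }
        elapsed = now_usec() - start;
    }
    for (int i = 1; i < forked; i++) {
        if (!ok) kill(kids[i], SIGKILL); /* don't leave blocked children behind */
        waitpid(kids[i], NULL, 0);
    }
    for (int i = 0; i < npipes; i++) { close(fds[i][0]); close(fds[i][1]); }
    free(a);
    return ok ? (elapsed - work) / ((double)rounds * nprocs) : -1.0;
}
```

On a multiprocessor the processes may run on different CPUs, in which case a hop is not a true context switch; this sketch does not pin the processes to one CPU.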

  21. Interprocess Communication Latencies • A small message is passed back and forth between two processes • The reported time is one round trip • Message size: one byte or one word • Metrics: pipe, Unix socket, UDP, TCP, RPC/UDP, RPC/TCP, TCP connection latency
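The round-trip measurement for the pipe and Unix-socket cases can be sketched in C as follows, assuming a one-byte message and a child process that echoes every byte back. The function name and the `use_socket` switch are hypothetical; UDP, TCP and RPC variants would need network setup and are left out.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

static double now_usec(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1e6 + ts.tv_nsec / 1e3;
}

/* Closes a read/write pair of ends, once if both are the same socket. */
static void close_ends(int rd, int wr) {
    close(rd);
    if (wr != rd) close(wr);
}

/* Average round trip in microseconds for a 1-byte message, over two pipes
   (use_socket == 0) or a Unix-domain socketpair. Returns -1.0 on failure. */
double round_trip_usec(int use_socket, int iters) {
    int a[2], b[2], p_rd, p_wr, c_rd, c_wr;
    if (iters <= 0) return -1.0;
    if (use_socket) {
        if (socketpair(AF_UNIX, SOCK_STREAM, 0, a) != 0) return -1.0;
        p_rd = p_wr = a[0];
        c_rd = c_wr = a[1];
    } else {
        if (pipe(a) != 0) return -1.0;
        if (pipe(b) != 0) { close(a[0]); close(a[1]); return -1.0; }
        p_wr = a[1]; c_rd = a[0];       /* parent -> child */
        c_wr = b[1]; p_rd = b[0];       /* child -> parent */
    }

    pid_t pid = fork();
    if (pid < 0) { close_ends(p_rd, p_wr); close_ends(c_rd, c_wr); return -1.0; }
    if (pid == 0) {                     /* child: echo every byte until EOF */
        char c;
        close_ends(p_rd, p_wr);
        while (read(c_rd, &c, 1) == 1)
            if (write(c_wr, &c, 1) != 1) break;
        _exit(0);
    }
    close_ends(c_rd, c_wr);

    char msg = 'x';
    int ok = 1;
    double start = now_usec();
    for (int i = 0; i < iters && ok; i++)
        ok = write(p_wr, &msg, 1) == 1 && read(p_rd, &msg, 1) == 1;
    double elapsed = now_usec() - start;

    close_ends(p_rd, p_wr);             /* child sees EOF and exits */
    waitpid(pid, NULL, 0);
    return ok ? elapsed / iters : -1.0;
}
```

Each side closes the ends it does not use; otherwise the child would never see EOF and the final waitpid() would hang.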

  23. File & VM System • File create/delete: creates a number of small files in the current working directory and then removes them • Mmap latency: the cost of mmapping and unmapping files of varying sizes • Prot fault: the time to catch a protection fault • Page fault: the cost of faulting in pages from a file • 100 fd select: the time to do a select on n file descriptors (here n = 100)
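Two of these tests, file create/delete and select on n descriptors, are sketched in C below. Assumptions: the files are created in the current directory under hypothetical names, at most 10 KB each, and the n descriptors are n opens of /dev/null, which select() reports as ready. The function names are also hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/select.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

static double now_usec(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1e6 + ts.tv_nsec / 1e3;
}

/* Creates `count` files of `file_bytes` bytes in the current directory, then
   deletes them; stores average microseconds per create and per delete. */
int file_create_delete_usec(int count, size_t file_bytes,
                            double *create_us, double *delete_us) {
    static const char data[10 * 1024];
    char name[64];
    if (count <= 0 || file_bytes > sizeof data) return -1;

    double t0 = now_usec();
    for (int i = 0; i < count; i++) {
        snprintf(name, sizeof name, "lmb_tmp_%ld_%d", (long)getpid(), i);
        int fd = open(name, O_CREAT | O_WRONLY | O_TRUNC, 0600);
        if (fd < 0) return -1;
        ssize_t w = write(fd, data, file_bytes);
        close(fd);
        if (w != (ssize_t)file_bytes) return -1;
    }
    double t1 = now_usec();
    for (int i = 0; i < count; i++) {
        snprintf(name, sizeof name, "lmb_tmp_%ld_%d", (long)getpid(), i);
        if (unlink(name) != 0) return -1;
    }
    double t2 = now_usec();
    *create_us = (t1 - t0) / count;
    *delete_us = (t2 - t1) / count;
    return 0;
}

/* Average microseconds for one select() over nfds ready descriptors. */
double select_usec(int nfds, int iters) {
    int fds[FD_SETSIZE];
    int opened = 0, maxfd = -1, ok = nfds > 0 && nfds <= FD_SETSIZE && iters > 0;
    double result = -1.0;
    while (ok && opened < nfds) {
        int fd = open("/dev/null", O_RDONLY);
        if (fd < 0) { ok = 0; break; }
        fds[opened++] = fd;
        if (fd >= FD_SETSIZE) ok = 0;   /* cannot be placed in an fd_set */
        else if (fd > maxfd) maxfd = fd;
    }
    if (ok) {
        double start = now_usec();
        for (int it = 0; it < iters && ok; it++) {
            fd_set rd;
            struct timeval zero = {0, 0};   /* poll, don't block */
            FD_ZERO(&rd);
            for (int i = 0; i < opened; i++) FD_SET(fds[i], &rd);
            ok = select(maxfd + 1, &rd, NULL, NULL, &zero) == opened;
        }
        if (ok) result = (now_usec() - start) / iters;
    }
    for (int i = 0; i < opened; i++) close(fds[i]);
    return result;
}
```

Calling `select_usec(100, 1000)` corresponds to the 100 fd select case; the timing also covers rebuilding the fd_set on every iteration, since select() overwrites it.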

  25. Memory Latencies • Measures memory read latency for varying memory sizes and strides • The array size starts at 512 bytes • The stride varies from 16 to 1024 bytes • The reported latency does not include instruction execution time
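One common way to keep instruction overhead small relative to load latency is a pointer chase, in which each load's address comes from the previous load. The C sketch below is one such chase, under assumptions: every stride-th slot of a `size`-byte buffer holds the address of the next slot, the chain wraps around, and the timed loop is unrolled eight times. The function name is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <time.h>

static void *volatile chase_sink; /* keeps the pointer chase from being optimized away */

static double now_nsec(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1e9 + ts.tv_nsec;
}

/* One dependent load: the next address is whatever the current slot holds. */
#define HOP p = (char **)*p;

/* Average nanoseconds per dependent load when walking `size` bytes at
   `stride` bytes. Returns -1.0 on bad arguments or allocation failure. */
double mem_read_latency_ns(size_t size, size_t stride, long loads) {
    if (stride < sizeof(char *) || stride % sizeof(char *) != 0 || size < stride || loads <= 0)
        return -1.0;
    char *buf = malloc(size);
    if (!buf) return -1.0;
    size_t n = size / stride;               /* slots in the cyclic chain */
    for (size_t i = 0; i < n; i++)
        *(char **)(buf + i * stride) = buf + ((i + 1) % n) * stride;

    char **p = (char **)buf;
    for (size_t i = 0; i < n; i++) HOP      /* untimed warm-up pass over the chain */
    long rounds = (loads + 7) / 8;
    double start = now_nsec();
    for (long r = 0; r < rounds; r++) {     /* unrolled so loop overhead stays small */
        HOP HOP HOP HOP HOP HOP HOP HOP
    }
    double elapsed = now_nsec() - start;
    chase_sink = p;
    free(buf);
    return elapsed / (rounds * 8.0);
}
```

Sweeping `size` upward from 512 bytes at a fixed stride should show steps where the array stops fitting in each cache level. With small, fixed strides, hardware prefetchers on later processors may hide part of the latency.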

  26. Conclusion • the best has problems

  27. THANK YOU! Have a nice weekend!

  28. References • “Lmbench – Tools for Performance Analysis”, http://www.bitmover.com/lmbench/ • Larry McVoy and Carl Staelin, “Lmbench: Portable tools for performance analysis”, http://www.usenix.org/publications/library/proceedings/sd96/full_papers/mcvoy.pdf • Carl Staelin, “Lmbench: an extensible micro-benchmark suite”, http://www.hpl.hp.com/techreports/2004/HPL-2004-213.html
