
The Effect of Multi-core on HPC Applications in Virtualized Systems

Jaeung Han¹, Jeongseob Ahn¹, Changdae Kim¹, Youngjin Kwon¹, Young-ri Choi², and Jaehyuk Huh¹
¹ KAIST (Korea Advanced Institute of Science and Technology), ² KISTI (Korea Institute of Science and Technology Information)


Presentation Transcript


  1. The Effect of Multi-core on HPC Applications in Virtualized Systems. Jaeung Han¹, Jeongseob Ahn¹, Changdae Kim¹, Youngjin Kwon¹, Young-ri Choi², and Jaehyuk Huh¹. ¹ KAIST (Korea Advanced Institute of Science and Technology), ² KISTI (Korea Institute of Science and Technology Information)

  2. Outline • Virtualization for HPC • Virtualization on Multi-core • Virtualization for HPC on Multi-core • Methodology • PARSEC – shared memory model • NPB – MPI model • Conclusion

  3. Outline • Virtualization for HPC • Virtualization on Multi-core • Virtualization for HPC on Multi-core • Methodology • PARSEC – shared memory model • NPB – MPI model • Conclusion

  4. Benefits of Virtualization • Improve system utilization by consolidation [Figure: VMs running on a virtual machine monitor on top of the hardware]

  5. Benefits of Virtualization • Improve system utilization by consolidation • Support for multiple types of OSes on a system [Figure: Windows, Linux, and Solaris VMs on one virtual machine monitor]

  6. Benefits of Virtualization • Improve system utilization by consolidation • Support for multiple types of OSes on a system • Fault isolation

  7. Benefits of Virtualization • Improve system utilization by consolidation • Support for multiple types of OSes on a system • Fault isolation • Flexible resource management [Figure: VMs spread across two virtual machine monitors on two physical machines]

  8. Benefits of Virtualization • Improve system utilization by consolidation • Support for multiple types of OSes on a system • Fault isolation • Flexible resource management

  9. Benefits of Virtualization • Improve system utilization by consolidation • Support for multiple types of OSes on a system • Fault isolation • Flexible resource management • Cloud computing

  10. Virtualization for HPC • Benefits of virtualization • Improve system utilization by consolidation • Support for multiple types of OSes on a system • Fault isolation • Flexible resource management • Cloud computing • HPC is performance-sensitive and resource-sensitive • Virtualization can help HPC workloads

  11. Outline • Virtualization for HPC • Virtualization on Multi-core • Virtualization for HPC on Multi-core • Methodology • PARSEC – shared memory model • NPB – MPI model • Conclusion

  12. Virtualization on Multi-core • More VMs on a physical machine • More complex memory hierarchy (NUCA, NUMA) [Figure: many VMs over multi-core processors with shared caches and memories]

  13. Challenges • VM management cost: scheduling, memory, communication, I/O multiplexing in the VMM • Semantic gaps: vCPU scheduling, NUMA [Figure: a native OS on one multi-core machine vs. VMs on a virtual machine monitor spanning shared caches and memories]

  14. Outline • Virtualization for HPC • Virtualization on Multi-core • Virtualization for HPC on Multi-core • Methodology • PARSEC – shared memory model • NPB – MPI model • Conclusion

  15. Virtualization for HPC on Multi-core • Virtualization may help HPC • Virtualization on multi-core may have some overheads • For servers, improving system utilization is a key factor • For HPC, performance is the key factor • How much overhead is there, and where does it come from?

  16. Outline • Virtualization for HPC • Virtualization on Multi-core • Virtualization for HPC on Multi-core • Methodology • PARSEC – shared memory model • NPB – MPI model • Conclusion

  17. Machines • Dual socket system: 2x 4-core Intel processors, non-uniform memory access latency, two 8MB L3 caches each shared by 4 cores • Single socket system: 12-core AMD processor, uniform memory access latency, two 6MB L3 caches each shared by 6 cores [Figure: per-core L2 caches backed by shared L3 caches and memory on each machine]
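The slides do not include any code for inspecting this topology. As a hedged illustration only, a minimal C sketch using Linux's libnuma (link with -lnuma) that prints the number of NUMA nodes and the node each online CPU belongs to might look like the following; the exact output depends on the hardware and on what the guest OS is allowed to see.

    /* numa_topo.c: print the NUMA node count and the node of each online CPU.
     * Illustration only; not part of the original study. Build: gcc numa_topo.c -lnuma */
    #include <stdio.h>
    #include <unistd.h>
    #include <numa.h>

    int main(void)
    {
        if (numa_available() < 0) {
            printf("libnuma: NUMA is not available on this system\n");
            return 1;
        }
        int nodes = numa_num_configured_nodes();    /* e.g. 2 on a dual-socket machine */
        long cpus = sysconf(_SC_NPROCESSORS_ONLN);  /* online cores visible to this OS/VM */
        printf("%d NUMA node(s), %ld online CPU(s)\n", nodes, cpus);

        for (long c = 0; c < cpus; c++)
            printf("cpu %ld -> node %d\n", c, numa_node_of_cpu((int)c));
        return 0;
    }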

  18. Workloads • PARSEC: shared memory model, native input, run on one machine (single and dual socket); fixed: one VM; varied: 1, 4, or 8 vCPUs • NAS Parallel Benchmark (NPB): MPI model, class C input, run on two dual-socket machines connected by a 1Gb Ethernet switch; fixed: 16 vCPUs in total; varied: 2 to 16 VMs [Figure: the two experimental setups, annotated with "semantic gaps" and "VM management cost"]

  19. Outline • Virtualization for HPC • Virtualization on Multi-core • Virtualization for HPC on Multi-core • Methodology • PARSEC – shared memory model • NPB – MPI model • Conclusion

  20. PARSEC – Single Socket • Single socket • No NUMA effect • Very low virtualization overheads: 2~4% • Execution times normalized to native runs

  21. PARSEC – Single Socket • Single socket + pin each vCPU to a pCPU • Reduce semantic gaps by preventing vCPU migration • vCPU migration has a negligible effect: similar to unpinned • Execution times normalized to native runs
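In these experiments the pinning is done at the hypervisor level (Xen can bind a vCPU to a physical CPU), and that mechanism is not shown on the slides. As a rough guest-level analogy only, the C sketch below binds the calling process to a single CPU with Linux's sched_setaffinity, the same idea of removing migration applied one layer up.

    /* affinity.c: bind the calling process to one CPU. This is only an analogy to
     * vCPU-to-pCPU pinning, which the study performs at the VMM level, not in the guest. */
    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>

    static int pin_to_cpu(int cpu)
    {
        cpu_set_t mask;
        CPU_ZERO(&mask);
        CPU_SET(cpu, &mask);
        return sched_setaffinity(0, sizeof(mask), &mask);  /* pid 0 = calling process */
    }

    int main(void)
    {
        if (pin_to_cpu(0) != 0) {
            perror("sched_setaffinity");
            return 1;
        }
        printf("now restricted to CPU %d\n", sched_getcpu());
        return 0;
    }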

  22. PARSEC – Dual Socket • Dual socket, unpinned vCPUs • NUMA effect → semantic gap • Significant increase in overheads: 16~37% • Execution times normalized to native runs

  23. PARSEC – Dual Socket • Dual socket, pinned vCPUs • Pinning may also reduce the NUMA effect • Reduced overheads with 1 and 4 vCPUs • Execution times normalized to native runs

  24. Xen and NUMA Machine • Memory allocation policy: allocate up to a 4GB chunk on one socket • Scheduling policy: pin to the socket where the memory was allocated, nothing more • Pinning 1~4 vCPUs on the socket where the memory is allocated is possible • Impossible with 8 vCPUs (more vCPUs than cores on one socket) [Figure: four VMs spread over two sockets with per-socket caches and memory]
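Xen's per-socket memory placement described above happens inside the hypervisor and is not shown as code. Purely as a guest-level analogy of putting memory on one node, a short libnuma sketch (again an illustration, not the authors' mechanism) could allocate and touch a buffer on a chosen node like this.

    /* onnode_alloc.c: allocate a buffer on one NUMA node with libnuma.
     * Illustration of node-local placement only; Xen's own 4GB-chunk policy is
     * internal to the hypervisor. Build: gcc onnode_alloc.c -lnuma */
    #include <stdio.h>
    #include <string.h>
    #include <numa.h>

    int main(void)
    {
        if (numa_available() < 0)
            return 1;

        size_t size = 64UL << 20;                  /* 64 MB */
        int node = 0;                              /* target NUMA node */
        void *buf = numa_alloc_onnode(size, node);
        if (buf == NULL) {
            fprintf(stderr, "allocation on node %d failed\n", node);
            return 1;
        }
        memset(buf, 0, size);                      /* touch the pages so they are actually placed */
        printf("allocated and touched %zu bytes on node %d\n", size, node);
        numa_free(buf, size);
        return 0;
    }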

  25. Mitigating NUMA Effects • Range pinning • Pin the vCPUs of a VM to one socket • Works only if # of vCPUs < # of cores on a socket • Range-pinned (best): memory of the VM on the same socket • Range-pinned (worst): memory of the VM on the other socket • NUMA-first scheduler • If there is an idle core on the socket where the memory was allocated, pick it • If not, pick any core in the machine • Not all vCPUs are active all the time (synchronization or I/O)
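The NUMA-first rule above fits in a few lines. The sketch below is a deliberately simplified, hypothetical rendering of that decision in C (the core_idle array and pick_core helper are stand-ins; the real logic would live inside the VMM's scheduler), assuming two sockets of four cores as on the dual-socket machine.

    /* numa_first.c: simplified sketch of the "NUMA-first" vCPU placement rule
     * described on the slide. The helpers are hypothetical stand-ins. */
    #include <stdio.h>

    #define CORES_PER_NODE 4
    #define NUM_NODES      2

    /* 1 = idle, 0 = busy; index = node * CORES_PER_NODE + core */
    static int core_idle[NUM_NODES * CORES_PER_NODE] = {0, 0, 0, 1, 1, 1, 1, 1};

    /* NUMA-first: prefer an idle core on the node holding the VM's memory;
     * otherwise fall back to any idle core in the machine (-1 if none). */
    static int pick_core(int home_node)
    {
        for (int c = 0; c < CORES_PER_NODE; c++)
            if (core_idle[home_node * CORES_PER_NODE + c])
                return home_node * CORES_PER_NODE + c;

        for (int i = 0; i < NUM_NODES * CORES_PER_NODE; i++)
            if (core_idle[i])
                return i;
        return -1;
    }

    int main(void)
    {
        printf("vCPU of a VM with memory on node 0 -> core %d\n", pick_core(0));
        printf("vCPU of a VM with memory on node 1 -> core %d\n", pick_core(1));
        return 0;
    }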

  26. Range Pinning • For the 4-vCPU case • Range-pinned (best) ≈ Pinned • Execution times normalized to native runs

  27. NUMA-first Scheduler • For the 8-vCPU case • Significant improvement from the NUMA-first scheduler • Execution times normalized to native runs

  28. Outline • Virtualization for HPC • Virtualization on Multi-core • Virtualization for HPC on Multi-core • Methodology • PARSEC – shared memory model • NPB – MPI model • Conclusion

  29. VM Granularity for MPI Model • Fine-grained VMs: few processes in a VM; small VMs (vCPUs, memory); fault isolation among processes in different VMs; many VMs on a machine; MPI communication mostly through the VMM • Coarse-grained VMs: many processes in a VM; large VMs (vCPUs, memory); a single failure point for the processes in a VM; few VMs on a machine; MPI communication mostly within a VM [Figure: many small VMs vs. a few large VMs, each set running on a VMM over its hardware]
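Whether an MPI message stays inside a VM or crosses the VMM and the 1Gb network is what separates the two granularities. The slides do not include a microbenchmark for this, but a standard MPI ping-pong in C, sketched below, is one way to expose the latency difference; where the two ranks land (same VM, different VMs on one host, or different hosts) is decided by how the job is launched, not by the code.

    /* pingpong.c: minimal MPI ping-pong between ranks 0 and 1.
     * Build: mpicc pingpong.c; run: mpirun -np 2 ./a.out. Whether the two ranks
     * share a VM, a machine, or the network depends on the launch configuration. */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        const int iters = 1000;
        char msg[1024] = {0};

        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();
        for (int i = 0; i < iters && size >= 2; i++) {
            if (rank == 0) {
                MPI_Send(msg, sizeof(msg), MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(msg, sizeof(msg), MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Recv(msg, sizeof(msg), MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                MPI_Send(msg, sizeof(msg), MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        double t1 = MPI_Wtime();

        if (rank == 0 && size >= 2)
            printf("average round trip: %.2f us\n", (t1 - t0) / iters * 1e6);

        MPI_Finalize();
        return 0;
    }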

  30. NPB - VM Granularity • The work to do is the same for every granularity • 2 VMs: each VM has 8 vCPUs and 8 MPI processes • 16 VMs: each VM has 1 vCPU and 1 MPI process • Overheads: 11~54% • Execution times normalized to native runs

  31. NPB - VM Granularity • Fine-grained VMs → significant overheads (avg. 54%) • MPI communication goes mostly through the VMM • Worst in CG, which has a high communication ratio • Small memory per VM • VM management costs in the VMM • Coarse-grained VMs → much lower overheads (avg. 11%) • Still dual socket, but lower overheads than the shared memory model → the bottleneck moves to communication • MPI communication stays largely within a VM • Large memory per VM

  32. Outline • Virtualization for HPC • Virtualization on Multi-core • Virtualization for HPC on Multi-core • Methodology • PARSEC – shared memory model • NPB – MPI model • Conclusion

  33. Conclusion • Questions on virtualization for HPC on multi-core systems • How much overhead is there? • Where does it come from? • For the shared memory model • Without NUMA → little overhead • With NUMA → large overheads from semantic gaps • For the MPI model • Less NUMA effect → communication is important • Fine-grained VMs have large overheads • Communication goes mostly through the VMM • Small memory per VM / VM management cost • Future work • NUMA-aware VMM scheduler • Optimize communication among VMs within a machine

  34. Thank you!

  35. Backup slides

  36. PARSEC CPU Usage • Environment: native Linux with only 8 cores enabled (8-thread mode) • Sample CPU usage every second, then average the samples • For all workloads, usage is less than 800% (the fully parallel maximum) → NUMA-first can work
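The measurement scripts themselves are not included in the slides. A hedged C sketch of the same idea, sampling aggregate CPU usage once per second from Linux's /proc/stat and reporting it on the slide's scale (100% per core, so 8 cores max out at 800%), might look like this.

    /* cpu_usage.c: sample aggregate CPU usage from /proc/stat once per second.
     * A sketch of the measurement idea only; the original scripts are not shown. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    static void read_cpu(unsigned long long *busy, unsigned long long *total)
    {
        unsigned long long user = 0, nice_v = 0, sys = 0, idle = 0,
                           iowait = 0, irq = 0, softirq = 0;
        FILE *f = fopen("/proc/stat", "r");
        if (f == NULL) {
            perror("/proc/stat");
            exit(1);
        }
        fscanf(f, "cpu %llu %llu %llu %llu %llu %llu %llu",
               &user, &nice_v, &sys, &idle, &iowait, &irq, &softirq);
        fclose(f);
        *busy  = user + nice_v + sys + irq + softirq;
        *total = *busy + idle + iowait;
    }

    int main(void)
    {
        long ncpu = sysconf(_SC_NPROCESSORS_ONLN);
        unsigned long long b0, t0, b1, t1;

        for (int i = 0; i < 10; i++) {             /* ten one-second samples */
            read_cpu(&b0, &t0);
            sleep(1);
            read_cpu(&b1, &t1);
            double pct = 100.0 * ncpu * (double)(b1 - b0) / (double)(t1 - t0);
            printf("sample %d: %.0f%% (max %ld%%)\n", i, pct, ncpu * 100);
        }
        return 0;
    }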
