The Effect of Multi-core on HPC Applications in Virtualized Systems

Jaeung Han¹, Jeongseob Ahn¹, Changdae Kim¹, Youngjin Kwon¹, Young-ri Choi², and Jaehyuk Huh¹
¹ KAIST (Korea Advanced Institute of Science and Technology)
² KISTI (Korea Institute of Science and Technology Information)

Outline
  • Virtualization for HPC
  • Virtualization on Multi-core
  • Virtualization for HPC on Multi-core
  • Methodology
  • PARSEC – shared memory model
  • NPB – MPI model
  • Conclusion
Benefits of Virtualization

[Diagram: VMs running different OSes (Windows, Linux, Solaris) on a Virtual Machine Monitor over the hardware; later builds add a second VMM/hardware stack and a cloud]

  • Improve system utilization by consolidation
  • Support for multiple types of OSes on a system
  • Fault isolation
  • Flexible resource management
  • Cloud computing

Virtualization for HPC
  • Benefits of virtualization
    • Improve system utilization by consolidation
    • Support for multiple types of OSes on a system
    • Fault isolation
    • Flexible resource management
    • Cloud computing
  • HPC is performance-sensitive and resource-sensitive
  • Virtualization can help HPC workloads

Virtualization on Multi-core
  • More VMs on a physical machine
  • More complex memory hierarchy (NUCA, NUMA)

[Diagram: sixteen VMs consolidated onto an eight-core machine with two shared caches and two memory nodes]

Challenges
  • VM management cost
  • Semantic gaps
    • vCPU scheduling, NUMA

[Diagram: a native OS managing four cores and memory directly vs. a VMM multiplexing many VMs onto cores, shared caches, and memory nodes; the VMM handles scheduling, memory, communication, and I/O multiplexing]

Virtualization for HPC on Multi-core
  • Virtualization may help HPC
  • Virtualization on multi-core may add overheads
  • For servers, improving system utilization is a key factor
  • For HPC, performance is the key factor

How much overhead is there?

Where does it come from?

Machines
  • Dual-socket system
    • 2x 4-core Intel processors
    • Non-uniform memory access latency
    • Two 8MB L3 caches, each shared by 4 cores
  • Single-socket system
    • One 12-core AMD processor
    • Uniform memory access latency
    • Two 6MB L3 caches, each shared by 6 cores
  • (The topology can be checked with the sysfs sketch after the diagram)

[Diagram: dual-socket machine (2x 4-core CPUs) and single-socket machine (one 12-core CPU); each core has a private L2 cache, and groups of cores share an L3 cache in front of memory]

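The memory hierarchy above can be checked directly on a Linux host or guest; a minimal sketch, assuming the standard sysfs NUMA interface (/sys/devices/system/node) is exposed:

```python
# Minimal sketch: list NUMA nodes and the CPUs attached to each one on Linux.
# On the dual-socket machine described above this should report two memory
# nodes with four cores each.
import glob
import os

for node in sorted(glob.glob("/sys/devices/system/node/node[0-9]*")):
    with open(os.path.join(node, "cpulist")) as f:
        cpus = f.read().strip()
    print(f"{os.path.basename(node)}: CPUs {cpus}")
```
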

Workloads
  • PARSEC
    • Shared memory model
    • Input: native
    • On one machine (single and dual socket)
    • Fixed: one VM
    • Varied: 1, 4, 8 vCPUs
  • NAS Parallel Benchmark (NPB)
    • MPI model
    • Input: class C
    • On two machines (dual socket), connected by a 1Gb Ethernet switch
    • Fixed: 16 vCPUs in total
    • Varied: 2–16 VMs
  • (Example invocations are sketched after the diagram below)

[Diagram: the same hardware managed natively by one OS vs. carved into VMs by the VMM, annotated with the two sources of overhead: semantic gaps and VM management cost]

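To make the methodology concrete, the two benchmark suites can be driven roughly as follows. This is a minimal sketch, assuming PARSEC's parsecmgmt driver and an MPI build of NPB class C are installed in the guests; the benchmark names, binary paths, and hostfile are illustrative, not the exact scripts used for these experiments.

```python
# Minimal sketch of how the two workload sets could be launched inside the guests.
# Assumes PARSEC (with its parsecmgmt driver) and an MPI build of NPB class C are
# installed; benchmark names, paths, and the hostfile are illustrative.
import subprocess

def run_parsec(benchmark: str, threads: int) -> None:
    """Run one PARSEC benchmark with the 'native' input and a given thread count."""
    subprocess.run(
        ["parsecmgmt", "-a", "run", "-p", benchmark, "-i", "native", "-n", str(threads)],
        check=True,
    )

def run_npb(kernel: str, nprocs: int, hostfile: str) -> None:
    """Run one NPB class C kernel over MPI across the VMs listed in the hostfile."""
    binary = f"./NPB3.3-MPI/bin/{kernel}.C.{nprocs}"   # e.g. cg.C.16
    subprocess.run(
        ["mpirun", "-np", str(nprocs), "-hostfile", hostfile, binary],
        check=True,
    )

if __name__ == "__main__":
    run_parsec("streamcluster", 8)        # shared-memory run with 8 vCPUs
    run_npb("cg", 16, "vm_hosts.txt")     # 16 MPI processes spread over the VMs
```
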

PARSEC – Single Socket
  • Single socket
  • No NUMA effect
  • Very low virtualization overheads (2–4%)
  • Execution times normalized to native runs

PARSEC – Single Socket
  • Single socket + pin each vCPU to a pCPU (see the pinning sketch below)
  • Reduces semantic gaps by preventing vCPU migration
  • vCPU migration has a negligible effect: results are similar to the unpinned case
  • Execution times normalized to native runs

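Pinning here fixes each vCPU of the guest to one physical CPU so the Xen scheduler cannot migrate it. A minimal sketch using Xen's xl vcpu-pin command (older toolstacks expose the same operation as xm vcpu-pin); the domain name and the 1:1 mapping are illustrative:

```python
# Minimal sketch: pin vCPU i of a Xen guest to physical CPU i, preventing vCPU
# migration. The domain name "hpc-vm" and the 1:1 mapping are illustrative;
# the underlying command is `xl vcpu-pin <domain> <vcpu> <pcpu>`.
import subprocess

def pin_one_to_one(domain: str, num_vcpus: int) -> None:
    for vcpu in range(num_vcpus):
        subprocess.run(["xl", "vcpu-pin", domain, str(vcpu), str(vcpu)], check=True)

pin_one_to_one("hpc-vm", 8)   # e.g. an 8-vCPU guest on the 8-core dual-socket machine
```
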
PARSEC – Dual Socket
  • Dual socket, unpinned vCPUs
  • NUMA effect → semantic gap
  • Significant increase in overheads (16–37%)
  • Execution times normalized to native runs

PARSEC – Dual Socket
  • Dual socket, pinned vCPUs
  • Pinning may also reduce the NUMA effect
  • Reduced overheads with 1 and 4 vCPUs
  • Execution times normalized to native runs

Xen and NUMA Machine
  • Memory allocation policy
    • Allocate up to a 4GB chunk on one socket
  • Scheduling policy
    • Pinning to the allocated socket
    • Nothing more
  • Pinning 1–4 vCPUs to the socket where the memory is allocated is possible
  • Not possible with 8 vCPUs (more vCPUs than cores on a socket)

[Diagram: four VMs on the dual-socket machine; each VM's memory resides on one socket, but its vCPUs may be scheduled on cores of either socket]

Mitigating NUMA Effects
  • Range pinning
    • Pin the vCPUs of a VM to one socket
    • Works only if the number of vCPUs does not exceed the number of cores on a socket
    • Range-pinned (best): the VM's memory is on the same socket
    • Range-pinned (worst): the VM's memory is on the other socket
  • NUMA-first scheduler
    • If there is an idle core on the socket where the VM's memory is allocated, pick it
    • If not, pick any core in the machine
    • Not all vCPUs are active all the time (synchronization or I/O), so home-socket cores are often free
    • (A sketch of this decision logic follows the list)

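The NUMA-first policy described above is simple enough to sketch: prefer an idle core on the socket that holds the VM's memory, and fall back to any idle core rather than leaving the vCPU waiting (range pinning corresponds to never falling back). The following is illustrative decision logic only, not the actual Xen scheduler implementation:

```python
# Minimal sketch of the NUMA-first placement decision: prefer an idle core on the
# vCPU's home socket (where the VM's memory lives), otherwise take any idle core
# in the machine. Illustrative logic, not the real Xen scheduler code.
from typing import Dict, Optional, Set

def numa_first_pick(home_socket: int,
                    idle_cores: Set[int],
                    socket_cores: Dict[int, Set[int]]) -> Optional[int]:
    # 1. Idle core on the home socket?
    local = idle_cores & socket_cores[home_socket]
    if local:
        return min(local)
    # 2. Otherwise, any idle core in the machine (unlike range pinning,
    #    which would keep waiting for a home-socket core).
    if idle_cores:
        return min(idle_cores)
    return None  # no idle core; leave the vCPU in the run queue

# Example: dual-socket machine, cores 0-3 on socket 0 and 4-7 on socket 1.
sockets = {0: {0, 1, 2, 3}, 1: {4, 5, 6, 7}}
print(numa_first_pick(home_socket=0, idle_cores={2, 5}, socket_cores=sockets))  # -> 2
print(numa_first_pick(home_socket=0, idle_cores={5, 6}, socket_cores=sockets))  # -> 5
```
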
Range Pinning
  • For the 4-vCPU case
  • Range-pinned (best) ≈ pinned
  • Execution times normalized to native runs

NUMA-first Scheduler
  • For the 8-vCPU case
  • Significant improvement with the NUMA-first scheduler
  • Execution times normalized to native runs

VM Granularity for the MPI Model
  • Fine-grained VMs
    • Few processes in a VM
    • Small VM: few vCPUs, little memory
    • Fault isolation among processes in different VMs
    • Many VMs on a machine
    • MPI communication mostly goes through the VMM
  • Coarse-grained VMs
    • Many processes in a VM
    • Large VM: many vCPUs, much memory
    • A single point of failure for all processes in a VM
    • Few VMs on a machine
    • MPI communication mostly stays within a VM
  • (The two layouts are sketched after the diagram below)

[Diagram: fine-grained configuration (many small VMs per VMM and machine) vs. coarse-grained configuration (few large VMs per VMM and machine)]
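The two granularities differ mainly in how the 16 MPI processes are laid out over VMs. A minimal sketch that generates Open MPI-style hostfiles for both configurations; the VM hostnames are hypothetical:

```python
# Minimal sketch: lay out the 16 MPI processes over VMs at the two granularities.
# Coarse-grained: 2 VMs x 8 processes each; fine-grained: 16 VMs x 1 process each.
# VM hostnames are hypothetical; "host slots=N" is the Open MPI hostfile form.
def write_hostfile(path: str, num_vms: int, procs_per_vm: int) -> None:
    with open(path, "w") as f:
        for i in range(num_vms):
            f.write(f"vm{i:02d} slots={procs_per_vm}\n")

write_hostfile("hosts_coarse.txt", num_vms=2, procs_per_vm=8)    # 2 large VMs
write_hostfile("hosts_fine.txt", num_vms=16, procs_per_vm=1)     # 16 small VMs
# Then, e.g.:  mpirun -np 16 -hostfile hosts_coarse.txt ./NPB3.3-MPI/bin/cg.C.16
```
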

NPB – VM Granularity
  • The total work is the same at every granularity
  • 2 VMs: each VM has 8 vCPUs and runs 8 MPI processes
  • 16 VMs: each VM has 1 vCPU and runs 1 MPI process
  • Overheads of 11–54%
  • Execution times normalized to native runs

NPB – VM Granularity
  • Fine-grained VMs → significant overheads (avg. 54%)
    • MPI communication mostly goes through the VMM
      • Worst in CG, which has a high communication ratio
    • Small memory per VM
    • VM management costs in the VMM
  • Coarse-grained VMs → much lower overheads (avg. 11%)
    • Still dual socket, but less overhead than the shared memory model → the bottleneck moves to communication
    • MPI communication largely stays within a VM
    • Large memory per VM

Conclusion
  • Questions on virtualization for HPC on multi-core systems
    • How much overhead is there?
    • Where does it come from?
  • For the shared memory model
    • Without NUMA → little overhead
    • With NUMA → large overheads from semantic gaps
  • For the MPI model
    • Less NUMA effect → communication is the important factor
    • Fine-grained VMs have large overheads
      • Communication mostly goes through the VMM
      • Small memory per VM / VM management cost
  • Future work
    • NUMA-aware VMM scheduler
    • Optimize communication among VMs within a machine

PARSEC CPU Usage
  • Environment: native Linux with only 8 cores enabled (8-thread runs)
  • CPU usage sampled every second, then averaged (see the sketch below)
  • For all workloads the average is below 800%, even in fully parallel runs → the NUMA-first scheduler can find idle cores

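The measurement itself is straightforward; a minimal sketch that samples the aggregate CPU line of /proc/stat once per second and reports the average utilization as a percentage of one core (so 800% would mean all eight cores busy):

```python
# Minimal sketch of the CPU-usage measurement: sample /proc/stat once per second
# and average utilization, expressed in "percent of one core" (800% = 8 cores busy).
import time

def read_cpu_times():
    with open("/proc/stat") as f:
        fields = [int(x) for x in f.readline().split()[1:]]
    idle = fields[3] + fields[4]          # idle + iowait
    return sum(fields), idle

def average_cpu_usage(seconds: int, num_cores: int = 8) -> float:
    samples = []
    total_prev, idle_prev = read_cpu_times()
    for _ in range(seconds):
        time.sleep(1)
        total, idle = read_cpu_times()
        busy = (total - total_prev) - (idle - idle_prev)
        samples.append(100.0 * num_cores * busy / (total - total_prev))
        total_prev, idle_prev = total, idle
    return sum(samples) / len(samples)

if __name__ == "__main__":
    print(f"average CPU usage: {average_cpu_usage(30):.0f}%")
```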