Multi stack system software
1 / 25

Multi-stack System Software - PowerPoint PPT Presentation

  • Uploaded on

Multi-stack System Software. Jack Lange Assistant Professor University of Pittsburgh. Summary. Commodity and HPC systems have been converging Commodity off the shelf components Linux based HPC systems Cloud computing Problem: Real HPC applications need HPC environments

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
Download Presentation

PowerPoint Slideshow about 'Multi-stack System Software' - vartan

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
Multi stack system software

Multi-stack System Software

Jack Lange

Assistant Professor

University of Pittsburgh


  • Commodity and HPC systems have been converging

    • Commodity off the shelf components

    • Linux based HPC systems

    • Cloud computing

  • Problem: Real HPC applications need HPC environments

    • Tightly coupled, massively parallel, and synchronized

    • Current services must provide dedicated HPC systems

  • Can we co-host HPC applications on commodity systems?

  • Dual Stack Approach

    • Provision the underlying software stack along with application

    • Commodity stack should handle commodity applications

    • HPC stack can provide HPC environment

User space partitioning
User Space Partitioning

  • Current systems do support this, but…

  • Interference still exists inside the system software

    • Inherent feature of commodity systems

Socket 2

Socket 1













Commodity Partition

HPC Partition

Hpc vs commodity systems
HPC vs. Commodity Systems

  • Commodity systems have fundamentally different focus than HPC systems

    • Amdahl’s vs. Gustafson’s laws

    • Commodity: Optimized for common case

  • HPC: Common case is not good enough

    • At large (tightly coupled) scales, percentiles lose meaning

    • Collective operations must wait for slowest node

    • 1% of nodes can make 99% suffer

    • HPC systems must optimize outliers (worst case)

Dual stack approach
Dual Stack Approach

  • Partition

    • Segment the underlying hardware resources

    • Assign them to exclusively to specific workloads

  • Isolate

    • Prevent interference from other workloads

    • Hardware: partitions must be course grained

    • Software: eliminate shared state

  • Implementation

    • Independent system software running on isolated resources

Hpc in the cloud
HPC in the cloud

  • Clouds are starting to look like supercomputers…

    • Are we seeing a convergence?

  • Not yet

    • Noise issues

    • Poor isolation

    • Resource contention

    • Lack of control over topology

  • Very bad for tightly coupled parallel apps

    • Require specialized environments that solve these problems

  • Approaching convergence

    • Vision: Dynamically partition cloud resources into HPC and commodity zones

    • This talk: partitioning compute nodes with performance isolation

Commodity vmms
Commodity VMMs

  • Virtualization is considered an “enterprise” technology

    • Designed for commodity environments

    • Fundamentally different, but not wrong!

  • Example: KVM architecture issues

    • Userspace handlers

    • Fairly complex memory management

    • Locking and periodic optimizations

    • Presence of system noise

Palacios vmm
Palacios VMM

  • OS-independent embeddable virtual machine monitor

    • Established compatibility with Linux, Kitten, and Minix

  • Specifically targets HPC applications and environments

    • Consistent performance with very low variance

  • Deployable on supercomputers, clusters (Infiniband/Ethernet), and servers

    • 0-3% overhead at large scales (thousands of nodes)

      • VEE 2011, IPDPS 2010, ROSS 2011

Open source and freely available


Palacios linux

  • Palacios/Linux provides lightweight and high performance virtualized environments

    • Internally manages dedicated resources

      • Memory and CPU scheduling

    • Does not bother with “enterprise features”

      • Page sharing/merging, swapping, overcommitting resources

  • Palacios enables scalable HPC performance on commodity platforms

Vmm comparison
VMM Comparison

  • Primary difference: Consistency

    • Requirement for tightly coupled performance at large scale

  • Example: KVM nested paging architecture

    • Maintains free page caches to optimize performance

      • Requires cache management

    • Shares page tables to optimize memory usage

      • Requires synchronization

Dual stack architecture
Dual Stack Architecture

  • Partitioning at the OS level

HPC Application

Commodity Application(s)

HPC Linux

Commodity Linux


Palacios VMM

Palacios Resource


Linux Kernel

Linux Module Interface


  • Enable cloud to host both commodity and HPC apps

    • Each zone optimized for the target applications


  • Goal: Measure VM isolation properties

  • Partitioned a single node into HPC and commodity zones

    • Commodity Zone: Parallel Kernel compilation

    • HPC Zone: Set of standard HPC benchmarks

    • System:

      • Dual 6-core AMD Opteron with NUMA topology

      • Linux guest environments (HPC and commodity)

  • Important: Local node only

    • Does not promise good performance at scale

    • But, poor performance will magnify at large scales


Commodity VMMs degrade with contention

Palacios delivers

consistent performance

MiniFE: Unstructured implicit finite element solver

Mantevo Project --


  • A dual stack approach can provide HPC environments on commodity clouds

    • HPC and commodity workloads can dynamically share resources

    • HPC requirements can be met without fully dedicated resources

  • Networking is still an open issue

    • Need mechanisms for isolation and partitioning

    • Need high performance networking architectures

      • 1Gbit is not good enough

      • 10Gbit is good, Infiniband is better

    • Need control over placement and topologies

Multi stack operating systems
Multi-stack Operating Systems

  • Future Exascale Systems are moving towards in situ organization

    • Applications traditionally have utilized their own platforms

      • Visualization, storage, analysis, etc

  • Everything must now collapse onto a single platform

What this means for the os
What this means for the OS

  • At Petascale we could optimize each environment separately

    • Each had their own OS and hardware

  • At Exascale workloads will be co-located

    • Can a single OS handle all workloads effectively?

  • Probably Not

    • Each has different resource requirements and behaviors

    • Exascale will need to support multiple OS environments on the same hardware

Beyond virtualization
Beyond Virtualization

  • Virtualization imposes overhead

    • Power: requires transistors

    • Performance: small, but present

    • Interference: Still some dependencies on host OS

  • Might not be available on exascale hardware

  • Can we provide native partitioning?

    • We think so

    • Linux provides the ability to dynamically remove resources (CPUs, memory, etc)

    • These can be taken over by a second OS

Para nativ e architecture
Para-native Architecture

Commodity Application(s)

HPC Application


Palacio VMM



  • Provide LWK environment on a commodity system

    • Each zone optimized for the target applications


  • OS partition created via offlined resources

    • CPUs, memory, PCI devices

  • Secondary OS “booted” on offline resources

  • Issues:

    • OS initialization

      • Boot process

      • Resource discovery

    • Coordination and communication

    • Security and safety

Dual stack memory
Dual Stack Memory

  • Maybe we don’t need to provide an entirely separate OS

    • Instead selectively manage some resources

  • Dual stack memory

    • Provide a separate memory management layer to Linux

  • Features

    • Selectively manage heap per application

    • Provide applications with direct control over memory layout

    • Transparently back memory using large pages

    • Without overhead added by Linux

Dual stack architecture1
Dual Stack Architecture

HPC Application

Commodity Application(s)






  • Provide LWK memory manager on a commodity OS

Commodity linux
Commodity Linux

Initial on-demand Page faults

(500,000 – 600,000 cycles)

Performance comparison
Performance Comparison

Occasional Outliers

(Large page coalescing)

Linux Memory Management

Lightweight Memory Management

Lowlevel noise


  • Commodity systems are not designed to support HPC workloads

    • Different requirements and behaviors than commodity applications

  • A multi stack approach can provide HPC environments in commodity systems

    • HPC requirements can be met without separate physical systems

    • HPC and commodity workloads can dynamically share resources

    • Isolated system software environments

Thank you
Thank you

Jack Lange

Assistant Professor

University of Pittsburgh