multi stack system software
Skip this Video
Download Presentation
Multi-stack System Software

Loading in 2 Seconds...

play fullscreen
1 / 25

Multi-stack System Software - PowerPoint PPT Presentation

  • Uploaded on

Multi-stack System Software. Jack Lange Assistant Professor University of Pittsburgh. Summary. Commodity and HPC systems have been converging Commodity off the shelf components Linux based HPC systems Cloud computing Problem: Real HPC applications need HPC environments

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
Download Presentation

PowerPoint Slideshow about ' Multi-stack System Software' - vartan

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
multi stack system software

Multi-stack System Software

Jack Lange

Assistant Professor

University of Pittsburgh

  • Commodity and HPC systems have been converging
    • Commodity off the shelf components
    • Linux based HPC systems
    • Cloud computing
  • Problem: Real HPC applications need HPC environments
    • Tightly coupled, massively parallel, and synchronized
    • Current services must provide dedicated HPC systems
  • Can we co-host HPC applications on commodity systems?
  • Dual Stack Approach
    • Provision the underlying software stack along with application
    • Commodity stack should handle commodity applications
    • HPC stack can provide HPC environment
user space partitioning
User Space Partitioning
  • Current systems do support this, but…
  • Interference still exists inside the system software
    • Inherent feature of commodity systems

Socket 2

Socket 1













Commodity Partition

HPC Partition

hpc vs commodity systems
HPC vs. Commodity Systems
  • Commodity systems have fundamentally different focus than HPC systems
    • Amdahl’s vs. Gustafson’s laws
    • Commodity: Optimized for common case
  • HPC: Common case is not good enough
    • At large (tightly coupled) scales, percentiles lose meaning
    • Collective operations must wait for slowest node
    • 1% of nodes can make 99% suffer
    • HPC systems must optimize outliers (worst case)
dual stack approach
Dual Stack Approach
  • Partition
    • Segment the underlying hardware resources
    • Assign them to exclusively to specific workloads
  • Isolate
    • Prevent interference from other workloads
    • Hardware: partitions must be course grained
    • Software: eliminate shared state
  • Implementation
    • Independent system software running on isolated resources
hpc in the cloud
HPC in the cloud
  • Clouds are starting to look like supercomputers…
    • Are we seeing a convergence?
  • Not yet
    • Noise issues
    • Poor isolation
    • Resource contention
    • Lack of control over topology
  • Very bad for tightly coupled parallel apps
    • Require specialized environments that solve these problems
  • Approaching convergence
    • Vision: Dynamically partition cloud resources into HPC and commodity zones
    • This talk: partitioning compute nodes with performance isolation
commodity vmms
Commodity VMMs
  • Virtualization is considered an “enterprise” technology
    • Designed for commodity environments
    • Fundamentally different, but not wrong!
  • Example: KVM architecture issues
    • Userspace handlers
    • Fairly complex memory management
    • Locking and periodic optimizations
    • Presence of system noise
palacios vmm
Palacios VMM
  • OS-independent embeddable virtual machine monitor
    • Established compatibility with Linux, Kitten, and Minix
  • Specifically targets HPC applications and environments
    • Consistent performance with very low variance
  • Deployable on supercomputers, clusters (Infiniband/Ethernet), and servers
    • 0-3% overhead at large scales (thousands of nodes)
      • VEE 2011, IPDPS 2010, ROSS 2011

Open source and freely available

palacios linux
  • Palacios/Linux provides lightweight and high performance virtualized environments
    • Internally manages dedicated resources
      • Memory and CPU scheduling
    • Does not bother with “enterprise features”
      • Page sharing/merging, swapping, overcommitting resources
  • Palacios enables scalable HPC performance on commodity platforms
vmm comparison
VMM Comparison
  • Primary difference: Consistency
    • Requirement for tightly coupled performance at large scale
  • Example: KVM nested paging architecture
    • Maintains free page caches to optimize performance
      • Requires cache management
    • Shares page tables to optimize memory usage
      • Requires synchronization
dual stack architecture
Dual Stack Architecture
  • Partitioning at the OS level

HPC Application

Commodity Application(s)

HPC Linux

Commodity Linux


Palacios VMM

Palacios Resource


Linux Kernel

Linux Module Interface


  • Enable cloud to host both commodity and HPC apps
    • Each zone optimized for the target applications
  • Goal: Measure VM isolation properties
  • Partitioned a single node into HPC and commodity zones
    • Commodity Zone: Parallel Kernel compilation
    • HPC Zone: Set of standard HPC benchmarks
    • System:
      • Dual 6-core AMD Opteron with NUMA topology
      • Linux guest environments (HPC and commodity)
  • Important: Local node only
    • Does not promise good performance at scale
    • But, poor performance will magnify at large scales

Commodity VMMs degrade with contention

Palacios delivers

consistent performance

MiniFE: Unstructured implicit finite element solver

Mantevo Project --

  • A dual stack approach can provide HPC environments on commodity clouds
    • HPC and commodity workloads can dynamically share resources
    • HPC requirements can be met without fully dedicated resources
  • Networking is still an open issue
    • Need mechanisms for isolation and partitioning
    • Need high performance networking architectures
      • 1Gbit is not good enough
      • 10Gbit is good, Infiniband is better
    • Need control over placement and topologies
multi stack operating systems
Multi-stack Operating Systems
  • Future Exascale Systems are moving towards in situ organization
    • Applications traditionally have utilized their own platforms
      • Visualization, storage, analysis, etc
  • Everything must now collapse onto a single platform
what this means for the os
What this means for the OS
  • At Petascale we could optimize each environment separately
    • Each had their own OS and hardware
  • At Exascale workloads will be co-located
    • Can a single OS handle all workloads effectively?
  • Probably Not
    • Each has different resource requirements and behaviors
    • Exascale will need to support multiple OS environments on the same hardware
beyond virtualization
Beyond Virtualization
  • Virtualization imposes overhead
    • Power: requires transistors
    • Performance: small, but present
    • Interference: Still some dependencies on host OS
  • Might not be available on exascale hardware
  • Can we provide native partitioning?
    • We think so
    • Linux provides the ability to dynamically remove resources (CPUs, memory, etc)
    • These can be taken over by a second OS
para nativ e architecture
Para-native Architecture

Commodity Application(s)

HPC Application


Palacio VMM



  • Provide LWK environment on a commodity system
    • Each zone optimized for the target applications
  • OS partition created via offlined resources
    • CPUs, memory, PCI devices
  • Secondary OS “booted” on offline resources
  • Issues:
    • OS initialization
      • Boot process
      • Resource discovery
    • Coordination and communication
    • Security and safety
dual stack memory
Dual Stack Memory
  • Maybe we don’t need to provide an entirely separate OS
    • Instead selectively manage some resources
  • Dual stack memory
    • Provide a separate memory management layer to Linux
  • Features
    • Selectively manage heap per application
    • Provide applications with direct control over memory layout
    • Transparently back memory using large pages
    • Without overhead added by Linux
dual stack architecture1
Dual Stack Architecture

HPC Application

Commodity Application(s)






  • Provide LWK memory manager on a commodity OS
commodity linux
Commodity Linux

Initial on-demand Page faults

(500,000 – 600,000 cycles)

performance comparison
Performance Comparison

Occasional Outliers

(Large page coalescing)

Linux Memory Management

Lightweight Memory Management

Lowlevel noise

  • Commodity systems are not designed to support HPC workloads
    • Different requirements and behaviors than commodity applications
  • A multi stack approach can provide HPC environments in commodity systems
    • HPC requirements can be met without separate physical systems
    • HPC and commodity workloads can dynamically share resources
    • Isolated system software environments
thank you
Thank you

Jack Lange

Assistant Professor

University of Pittsburgh