Multi stack system software
This presentation is the property of its rightful owner.
Sponsored Links
1 / 25

Multi-stack System Software PowerPoint PPT Presentation


  • 86 Views
  • Uploaded on
  • Presentation posted in: General

Multi-stack System Software. Jack Lange Assistant Professor University of Pittsburgh. Summary. Commodity and HPC systems have been converging Commodity off the shelf components Linux based HPC systems Cloud computing Problem: Real HPC applications need HPC environments

Download Presentation

Multi-stack System Software

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript


Multi stack system software

Multi-stack System Software

Jack Lange

Assistant Professor

University of Pittsburgh


Summary

Summary

  • Commodity and HPC systems have been converging

    • Commodity off the shelf components

    • Linux based HPC systems

    • Cloud computing

  • Problem: Real HPC applications need HPC environments

    • Tightly coupled, massively parallel, and synchronized

    • Current services must provide dedicated HPC systems

  • Can we co-host HPC applications on commodity systems?

  • Dual Stack Approach

    • Provision the underlying software stack along with application

    • Commodity stack should handle commodity applications

    • HPC stack can provide HPC environment


User space partitioning

User Space Partitioning

  • Current systems do support this, but…

  • Interference still exists inside the system software

    • Inherent feature of commodity systems

Socket 2

Socket 1

Memory

Cores

Cores

Memory

6

2

5

1

8

7

4

3

Commodity Partition

HPC Partition


Hpc vs commodity systems

HPC vs. Commodity Systems

  • Commodity systems have fundamentally different focus than HPC systems

    • Amdahl’s vs. Gustafson’s laws

    • Commodity: Optimized for common case

  • HPC: Common case is not good enough

    • At large (tightly coupled) scales, percentiles lose meaning

    • Collective operations must wait for slowest node

    • 1% of nodes can make 99% suffer

    • HPC systems must optimize outliers (worst case)


Dual stack approach

Dual Stack Approach

  • Partition

    • Segment the underlying hardware resources

    • Assign them to exclusively to specific workloads

  • Isolate

    • Prevent interference from other workloads

    • Hardware: partitions must be course grained

    • Software: eliminate shared state

  • Implementation

    • Independent system software running on isolated resources


Hpc in the cloud

HPC in the cloud

  • Clouds are starting to look like supercomputers…

    • Are we seeing a convergence?

  • Not yet

    • Noise issues

    • Poor isolation

    • Resource contention

    • Lack of control over topology

  • Very bad for tightly coupled parallel apps

    • Require specialized environments that solve these problems

  • Approaching convergence

    • Vision: Dynamically partition cloud resources into HPC and commodity zones

    • This talk: partitioning compute nodes with performance isolation


Commodity vmms

Commodity VMMs

  • Virtualization is considered an “enterprise” technology

    • Designed for commodity environments

    • Fundamentally different, but not wrong!

  • Example: KVM architecture issues

    • Userspace handlers

    • Fairly complex memory management

    • Locking and periodic optimizations

    • Presence of system noise


Palacios vmm

Palacios VMM

  • OS-independent embeddable virtual machine monitor

    • Established compatibility with Linux, Kitten, and Minix

  • Specifically targets HPC applications and environments

    • Consistent performance with very low variance

  • Deployable on supercomputers, clusters (Infiniband/Ethernet), and servers

    • 0-3% overhead at large scales (thousands of nodes)

      • VEE 2011, IPDPS 2010, ROSS 2011

Open source and freely available

  • http://www.v3vee.org/palacios


Palacios linux

Palacios/Linux

  • Palacios/Linux provides lightweight and high performance virtualized environments

    • Internally manages dedicated resources

      • Memory and CPU scheduling

    • Does not bother with “enterprise features”

      • Page sharing/merging, swapping, overcommitting resources

  • Palacios enables scalable HPC performance on commodity platforms


Vmm comparison

VMM Comparison

  • Primary difference: Consistency

    • Requirement for tightly coupled performance at large scale

  • Example: KVM nested paging architecture

    • Maintains free page caches to optimize performance

      • Requires cache management

    • Shares page tables to optimize memory usage

      • Requires synchronization


Dual stack architecture

Dual Stack Architecture

  • Partitioning at the OS level

HPC Application

Commodity Application(s)

HPC Linux

Commodity Linux

KVM

Palacios VMM

Palacios Resource

Managers

Linux Kernel

Linux Module Interface

Hardware

  • Enable cloud to host both commodity and HPC apps

    • Each zone optimized for the target applications


Evaluation

Evaluation

  • Goal: Measure VM isolation properties

  • Partitioned a single node into HPC and commodity zones

    • Commodity Zone: Parallel Kernel compilation

    • HPC Zone: Set of standard HPC benchmarks

    • System:

      • Dual 6-core AMD Opteron with NUMA topology

      • Linux guest environments (HPC and commodity)

  • Important: Local node only

    • Does not promise good performance at scale

    • But, poor performance will magnify at large scales


Results

Results

Commodity VMMs degrade with contention

Palacios delivers

consistent performance

MiniFE: Unstructured implicit finite element solver

Mantevo Project -- https://software.sandia.gov/mantevo/index.html


Discussion

Discussion

  • A dual stack approach can provide HPC environments on commodity clouds

    • HPC and commodity workloads can dynamically share resources

    • HPC requirements can be met without fully dedicated resources

  • Networking is still an open issue

    • Need mechanisms for isolation and partitioning

    • Need high performance networking architectures

      • 1Gbit is not good enough

      • 10Gbit is good, Infiniband is better

    • Need control over placement and topologies


Multi stack operating systems

Multi-stack Operating Systems

  • Future Exascale Systems are moving towards in situ organization

    • Applications traditionally have utilized their own platforms

      • Visualization, storage, analysis, etc

  • Everything must now collapse onto a single platform


What this means for the os

What this means for the OS

  • At Petascale we could optimize each environment separately

    • Each had their own OS and hardware

  • At Exascale workloads will be co-located

    • Can a single OS handle all workloads effectively?

  • Probably Not

    • Each has different resource requirements and behaviors

    • Exascale will need to support multiple OS environments on the same hardware


Beyond virtualization

Beyond Virtualization

  • Virtualization imposes overhead

    • Power: requires transistors

    • Performance: small, but present

    • Interference: Still some dependencies on host OS

  • Might not be available on exascale hardware

  • Can we provide native partitioning?

    • We think so

    • Linux provides the ability to dynamically remove resources (CPUs, memory, etc)

    • These can be taken over by a second OS


Para nativ e architecture

Para-native Architecture

Commodity Application(s)

HPC Application

Kitten

Palacio VMM

Linux

Hardware

  • Provide LWK environment on a commodity system

    • Each zone optimized for the target applications


Approach

Approach

  • OS partition created via offlined resources

    • CPUs, memory, PCI devices

  • Secondary OS “booted” on offline resources

  • Issues:

    • OS initialization

      • Boot process

      • Resource discovery

    • Coordination and communication

    • Security and safety


Dual stack memory

Dual Stack Memory

  • Maybe we don’t need to provide an entirely separate OS

    • Instead selectively manage some resources

  • Dual stack memory

    • Provide a separate memory management layer to Linux

  • Features

    • Selectively manage heap per application

    • Provide applications with direct control over memory layout

    • Transparently back memory using large pages

    • Without overhead added by Linux


Dual stack architecture1

Dual Stack Architecture

HPC Application

Commodity Application(s)

Lightweight

Memory

Management

Linux

Hardware

  • Provide LWK memory manager on a commodity OS


Commodity linux

Commodity Linux

Initial on-demand Page faults

(500,000 – 600,000 cycles)


Performance comparison

Performance Comparison

Occasional Outliers

(Large page coalescing)

Linux Memory Management

Lightweight Memory Management

Lowlevel noise


Conclusion

Conclusion

  • Commodity systems are not designed to support HPC workloads

    • Different requirements and behaviors than commodity applications

  • A multi stack approach can provide HPC environments in commodity systems

    • HPC requirements can be met without separate physical systems

    • HPC and commodity workloads can dynamically share resources

    • Isolated system software environments


Thank you

Thank you

Jack Lange

Assistant Professor

University of Pittsburgh

  • [email protected]

  • http://www.cs.pitt.edu/~jacklange


  • Login