Achieving Isolation in Consolidated Environments


Presentation Transcript


  1. Achieving Isolation in Consolidated Environments. Jack Lange, Assistant Professor, University of Pittsburgh

  2. Consolidated HPC Environments • The future is consolidation of commodity and HPC workloads • HPC users are moving onto cloud platforms • Dedicated HPC systems are moving towards in-situ organization • Consolidated with visualization and analytics workloads • Can commodity OS/Rs effectively support HPC consolidation? • Commodity Design Goals • Maximized resource utilization • Fairness • Graceful degradation under load

  3. Hardware Partitioning • Current approaches emphasize hardware space sharing • Current systems do support this, but… • Interference still exists inside the system software • Inherent feature of commodity systems • [Diagram: a two-socket node (cores 1–8, per-socket memory) space-shared into a Commodity Partition and an HPC Partition]

  4. HPC vs. Commodity Systems • Commodity systems have a fundamentally different focus than HPC systems • Amdahl’s vs. Gustafson’s laws • Commodity: Optimized for the common case • HPC: The common case is not good enough • At large (tightly coupled) scales, percentiles lose meaning • Collective operations must wait for the slowest node • 1% of nodes can make 99% suffer • HPC systems must optimize for outliers (the worst case)
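For reference, the two laws contrasted above can be written as follows, with p the parallel fraction of the work and N the number of processors:

```latex
% Amdahl's law: speedup for a fixed problem size.
S_{\mathrm{Amdahl}}(N) = \frac{1}{(1 - p) + p/N}

% Gustafson's law: speedup when the problem size scales with the machine.
S_{\mathrm{Gustafson}}(N) = (1 - p) + p\,N
```

Under Amdahl's view the serial fraction caps speedup no matter how many nodes are added, while under Gustafson's view useful work grows with the machine, which is the regime tightly coupled HPC codes are built for.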

  5. Multi-stack Approach • Dynamic Resource Partitions • Runtime segmentation of underlying hardware resources • Assigned to specific workloads • Dynamic Software Isolation • Prevent interference from other workloads • Execute on separate system software stacks • Remove cross stack dependencies • Implementation • Independent system software running on isolated resources

  6. Least Isolatable Units • Independently managed sets of isolated HW resources • Our Approach: Decompose the system into sets of isolatable components • Independent resources that do not interfere with other components • Workloads execute on dedicated collections of LIUs • Units of allocation • CPU, memory, devices • Each is managed by an independent system software stack
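A minimal sketch of what an LIU descriptor might hold; the type and field names below are illustrative assumptions, not the actual implementation:

```c
/* Illustrative sketch of a Least Isolatable Unit (LIU) descriptor.
 * The struct and field names are hypothetical: an LIU bundles CPU
 * cores, a contiguous physical memory range, and devices that can be
 * managed by one software stack without interfering with other LIUs. */
#include <stddef.h>
#include <stdint.h>

struct liu {
    uint64_t     cpu_mask;      /* cores dedicated to this unit            */
    uint64_t     mem_base;      /* start of the unit's physical memory     */
    uint64_t     mem_len;       /* length of that physical memory region   */
    const char **pci_devs;      /* PCI devices (e.g. "0000:81:00.0")       */
    size_t       num_pci_devs;
    void        *owner_stack;   /* system software stack managing the unit */
};
```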

  7. Linux Memory Management • Demand Paging • Primary goal is to optimize memory utilization – not performance • Reduce overhead of common application behavior (fork/exec) • Support many concurrent processes • Large Pages • Integrated with the overall demand paging architecture • Implications for HPC • Insufficient resource isolation • System noise • Linux large page solutions contribute to these problems • [Brian Kocoloski and Jack Lange, “HPMMAP: Lightweight Memory Management for Commodity Operating Systems,” IPDPS 2014]
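To make the demand-paging point concrete, the sketch below (sizes illustrative) counts the minor page faults taken while touching a freshly mapped anonymous region: the mapping itself is nearly free, and physical pages are only assigned on first touch, a common-case optimization that favors utilization over predictable per-access latency:

```c
/* Demonstrate Linux demand paging: anonymous mappings are not backed
 * by physical memory until each page is first touched, and every first
 * touch takes a minor page fault. The 256 MB size is illustrative. */
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/resource.h>

static long minor_faults(void)
{
    struct rusage ru;
    getrusage(RUSAGE_SELF, &ru);
    return ru.ru_minflt;
}

int main(void)
{
    size_t len = 256UL << 20;
    char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return 1;

    long before = minor_faults();
    memset(buf, 1, len);               /* first touch faults pages in */
    long after = minor_faults();

    printf("minor faults while touching 256 MB: %ld\n", after - before);
    munmap(buf, len);
    return 0;
}
```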

  8. Transparent Huge Pages (THP) • Fully automatic large page mechanism – no system administration or application cooperation required • (1) The page fault handler uses large pages when possible • (2) khugepaged, a background kernel thread • Periodically allocates a large page • “Merges” the large page into the address space of any process requesting THP support • Requires a global page table lock • Driven by OS heuristics – no knowledge of the application workload
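For reference, an application can opt a specific region into THP through madvise(2); a minimal sketch, with the mapping size chosen arbitrarily:

```c
/* Request THP backing for one anonymous region via MADV_HUGEPAGE.
 * With THP in "madvise" mode only advised regions are eligible;
 * khugepaged may still merge small pages into huge pages here later. */
#define _GNU_SOURCE
#include <string.h>
#include <sys/mman.h>

int main(void)
{
    size_t len = 64UL << 20;           /* 64 MB, illustrative */
    void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return 1;

    /* Fails only if THP support is compiled out of the kernel. */
    if (madvise(buf, len, MADV_HUGEPAGE) != 0)
        return 1;

    memset(buf, 0, len);               /* fault the region in */
    munmap(buf, len);
    return 0;
}
```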

  9. Transparent Huge Pages • [Plot: large page faults shown in green, small faults delayed by merges shown in blue] • Generally periodic, but not synchronized • Variability increases dramatically under additional load

  10. HugeTLBfs • RAM-based filesystem supporting large page allocation • Requires pre-allocated memory pools reserved by the system administrator • Access is generally managed through libhugetlbfs • Limitations • Cannot back process stacks and other special regions • VMA permission/alignment constraints • Highly susceptible to overhead from system load
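For comparison, explicit huge pages come from the administrator-reserved pool; the sketch below maps an anonymous MAP_HUGETLB region (hugetlbfs-backed), assuming 2 MB huge pages and a pool reserved beforehand, e.g. via /proc/sys/vm/nr_hugepages:

```c
/* Map an explicit huge-page region with MAP_HUGETLB. This fails unless
 * the system administrator has reserved enough huge pages in the pool
 * (e.g. echo 64 > /proc/sys/vm/nr_hugepages). Assumes 2 MB huge pages. */
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#define HPAGE_SIZE (2UL << 20)

int main(void)
{
    size_t len = 32 * HPAGE_SIZE;      /* 64 MB, illustrative */
    void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (buf == MAP_FAILED) {
        perror("mmap(MAP_HUGETLB)");   /* pool exhausted or not reserved */
        return 1;
    }
    memset(buf, 0, len);
    munmap(buf, len);
    return 0;
}
```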

  11. HugeTLBfs • Overhead of small page faults increases substantially • Due to memory exhaustion • HugeTLBfs memory is removed from the pools available to the small page fault handler

  12. HPMMAP Overview • High Performance Memory Mapping and Allocation Platform • Lightweight memory management for unmodified Linux applications • HPMMAP borrows from the Kitten LWK to impose isolated virtual and physical memory management layers • Provide lightweight versions of memory management system calls • Utilize Linux memory offlining to completely manage large contiguous regions • Memory is made available in regions of no less than 128 MB
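The offlining step builds on Linux's stock memory-hotplug interface, which manages RAM in fixed-size blocks (128 MB on typical x86-64 configurations, matching the granularity above). A minimal sketch of offlining one block through sysfs; the block number is an illustrative assumption, root is required, and this is not HPMMAP's actual code:

```c
/* Offline one Linux memory block via the memory-hotplug sysfs files so
 * its physical range can be managed outside the Linux page allocator.
 * The block number is illustrative; the block size can be read from
 * /sys/devices/system/memory/block_size_bytes (commonly 128 MB). */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static int set_memory_block_state(int block, const char *state)
{
    char path[128];
    snprintf(path, sizeof(path),
             "/sys/devices/system/memory/memory%d/state", block);

    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;

    ssize_t n = write(fd, state, strlen(state));
    close(fd);
    return (n < 0) ? -1 : 0;
}

int main(void)
{
    /* Block 32 is an arbitrary example; an unused, removable block is needed. */
    if (set_memory_block_state(32, "offline") != 0) {
        perror("offline memory block");
        return 1;
    }
    return 0;
}
```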

  13. HPMMAP Application Integration

  14. Results

  15. Evaluation – Multi-Node Scaling • Sandia cluster (8 nodes, 1 Gb Ethernet) • One co-located 4-core parallel kernel build per node • No over-committed cores • 32-rank improvement: 12% for HPCCG, 9% for miniFE, 2% for LAMMPS • miniFE: network overhead past 4 cores • Single-node variability translates into worse scaling (3% improvement in the single-node experiment)

  16. HPC in the cloud • Clouds are starting to look like supercomputers… • But we’re not there yet • Noise issues • Poor isolation • Resource contention • Lack of control over topology • Very bad for tightly coupled parallel apps • Such apps require specialized environments that solve these problems • Approaching convergence • Vision: Dynamically partition cloud resources into HPC and commodity zones

  17. Multi-stack Clouds • Virtualization overhead is not due to hardware costs • It results from the underlying host OS/VMM architectures and policies • Susceptible to performance overhead and interference • Goal: provide isolated HPC VMs on commodity systems • Each zone optimized for its target applications • [Diagram: Commodity VM(s) on KVM over Linux alongside an isolated VM on the Palacios VMM over the Kitten lightweight kernel, sharing the same hardware] • With Jiannan Ouyang and Brian Kocoloski

  18. Multi-OS Architecture • Goals: • Fully isolated and independent operation • OS-bypass communication • No cross-kernel dependencies • Needed Modifications: • Boot process that initializes a subset of offline resources • Dynamic resource (re)assignment to the Kitten LWK • Cross-stack shared memory communication • Block driver interface
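Stock Linux exposes CPU hotplug through sysfs, which is one way a core can be taken offline before being handed to the Kitten LWK; a minimal sketch, not the project's actual management tooling, with the core number chosen arbitrarily (root required):

```c
/* Take one CPU core offline through the Linux CPU-hotplug sysfs file so
 * it can be reassigned to a second system software stack. A sketch:
 * the core number is illustrative and root privileges are required. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

static int set_cpu_online(int cpu, int online)
{
    char path[64];
    snprintf(path, sizeof(path),
             "/sys/devices/system/cpu/cpu%d/online", cpu);

    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;

    ssize_t n = write(fd, online ? "1" : "0", 1);
    close(fd);
    return (n == 1) ? 0 : -1;
}

int main(void)
{
    if (set_cpu_online(7, 0) != 0) {   /* offline core 7 (example) */
        perror("offline cpu7");
        return 1;
    }
    return 0;
}
```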

  19. Isolatable Hardware • We view system resources as a collection of Isolatable Units • In terms of both performance and management • Some hardware makes this easy • PCI (w/ MSI, MSI-X) • APIC • Some hardware makes this difficult • SATA • IO-APIC • IOMMU • Some hardware makes this impossible • Legacy IDE • PCI (w/ legacy PCI INTx IRQs) • Some hardware cannot be completely isolated • SR-IOV PCI devices • Hyper-Threaded CPU cores

  20–27. [Diagram sequence: a two-socket node (cores 1–8, per-socket memory, PCI with an Infiniband NIC and SATA), with resources labeled Linux, Offline, or Kitten and moving between those partitions from frame to frame]

  28. Multi-stack Architecture • Allow multiple dynamically created enclaves • Based on runtime isolation requirements • Provides the flexibility of fully independent OS/Rs • Isolated performance and resource management • [Diagram: commodity application(s) and VM(s) hosted by Linux/KVM alongside HPC applications and an HPC VM hosted by Kitten LWK enclaves running the Palacios VMM, all on the same hardware]

  29. Performance Evaluation • 8-node Infiniband cluster • Space shared between commodity and HPC workloads • Commodity: Hadoop • HPC: HPCCG • Infiniband passthrough for the HPC VM • 1 Gb Ethernet passthrough for the commodity VM • Compared multi-stack (Kitten + Palacios) vs. a full Linux environment (KVM) • 10 experiment runs for each configuration • CAVEAT: VM disks were all accessed from the commodity partition • This suffers significant interference (current work)

  30. Conclusion • Commodity systems are not designed to support HPC workloads • HPC workloads have different requirements and behaviors than commodity applications • A multi-stack approach can provide HPC environments in commodity systems • HPC requirements can be met without separate physical systems • HPC and commodity workloads can dynamically share resources • Isolated system software environments are necessary

  31. Thank you. Jack Lange, Assistant Professor, University of Pittsburgh • jacklange@cs.pitt.edu • http://www.cs.pitt.edu/~jacklange

  32. Multi-stack Operating Systems • Future exascale systems are moving towards an in-situ organization • Applications have traditionally utilized their own platforms • Visualization, storage, analysis, etc. • Everything must now collapse onto a single platform

  33. Performance Comparison • [Plot comparing Linux memory management and lightweight memory management, annotated “occasional outliers (large page coalescing)” for Linux and “low-level noise” for the lightweight manager]
