Operating System Attributes for High Performance Computing

Ken Rozendal, Distinguished Engineer, IBM Linux Technology Center

Presentation Transcript


  1. Operating System Attributes for High Performance Computing
     Ken Rozendal, Distinguished Engineer, IBM Linux Technology Center

  2. Operating System Attributes for HPC
     • Reducing NUMA Effects
     • Exploiting larger page sizes
     • Reducing operating system “jitter”
     • Avoiding planned and unplanned downtime
     • Other attributes

  3. Reducing NUMA Effects
     • Most systems have NUMA attributes because of their memory bus and cache designs.
     • The degree of NUMA behavior differs substantially between systems.
     • The default OS behavior in placing new memory pages makes a critical difference.
     • Applications must either be coded to the default NUMA behavior or place their memory explicitly.
     • The OS needs to provide APIs for discovering the NUMA topology and for setting placement policies.
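
As one illustration of topology discovery and placement on Linux, the sketch below reads the node layout that the kernel exposes under /sys/devices/system/node (one "nodeN" directory per node, each with a "cpulist" file) and pins the process to one node's CPUs. It relies on the Linux default local-allocation policy, which normally places a page on the node of the CPU that first touches it. The helper names are hypothetical; this is a minimal sketch, not a replacement for libnuma or the mbind/set_mempolicy system calls, which set placement policy explicitly.

```python
import os
import re

# Standard Linux sysfs directory with one "nodeN" subdirectory per NUMA node.
NODE_ROOT = "/sys/devices/system/node"


def parse_cpulist(text):
    """Expand a kernel cpulist string such as '0-3,8-11' into a sorted list of CPU ids."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue  # memory-only nodes report an empty cpulist
        if "-" in part:
            first, last = part.split("-")
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return sorted(cpus)


def numa_topology(root=NODE_ROOT):
    """Map node id -> list of CPU ids; returns {} where sysfs NUMA info is absent."""
    topology = {}
    if not os.path.isdir(root):
        return topology
    for name in os.listdir(root):
        match = re.fullmatch(r"node(\d+)", name)  # skip files like "online", "possible"
        if match:
            with open(os.path.join(root, name, "cpulist")) as f:
                topology[int(match.group(1))] = parse_cpulist(f.read())
    return topology


def pin_to_node(node, topology):
    """Linux only: restrict this process to one node's CPUs. Under the default
    local-allocation policy, pages it touches first are then normally placed
    in that node's memory."""
    os.sched_setaffinity(0, topology[node])
```

Typical use is to call pin_to_node(0, numa_topology()) before allocating and initializing large arrays, so that first-touch placement puts the data next to the CPUs that will use it. Tools such as numactl (for example numactl --membind) apply explicit policies without code changes.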

  4. Exploiting Larger Page Sizes
     • Larger page sizes reduce TLB reloads.
     • Most of the benefit comes from the first few doublings of the page size.
     • Using both small and large page sizes requires very flexible allocation policies.
     • The OS needs to adjust quickly to changing requirements for large pages.
     • It must be possible to place large pages without changing application source code.
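
The "first few doublings" point follows from simple arithmetic. Assuming a streaming pass over a working set much larger than the TLB reach, the number of TLB fills is roughly the number of distinct pages touched, and each doubling of the page size halves it; so the first doubling removes 50% of the base-page fills, three doublings remove 87.5%, and every later doubling saves less in absolute terms. The sketch below computes this, plus TLB reach for a hypothetical TLB size. It also parses the Linux transparent huge page setting (the bracketed word in /sys/kernel/mm/transparent_hugepage/enabled), one existing mechanism for using large pages without source changes. Function names are illustrative assumptions.

```python
import re


def tlb_reach(entries, page_size):
    """Bytes of address space a TLB with `entries` entries can map at once."""
    return entries * page_size


def pages_touched(working_set, page_size):
    """Distinct pages, and so roughly the TLB fills, for one pass over
    `working_set` bytes when it is much larger than the TLB reach."""
    return -(-working_set // page_size)  # ceiling division


def fills_saved_by_doubling(working_set, base_page, doublings):
    """Fraction of base-page TLB fills eliminated after 1..doublings doublings."""
    base = pages_touched(working_set, base_page)
    return [1 - pages_touched(working_set, base_page << k) / base
            for k in range(1, doublings + 1)]


def thp_mode(text):
    """Return the active (bracketed) choice from a THP sysfs file,
    e.g. 'always [madvise] never' -> 'madvise'."""
    match = re.search(r"\[([^\]]+)\]", text)
    return match.group(1) if match else None
```

For a 1 GiB working set and 4 KiB base pages, fills_saved_by_doubling(1 << 30, 4096, 4) gives [0.5, 0.75, 0.875, 0.9375]. When thp_mode reports "always", the kernel may back anonymous memory with huge pages transparently; with "madvise", only regions the application marks with madvise(MADV_HUGEPAGE) are eligible.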

  5. Reducing Operating System “Jitter”
     • OS “jitter”: interruptions to execution on one node, amplified across a cluster
     • Types of interruption: hardware and software interrupts, daemons
     • Approaches:
       – Eliminate types of interrupts (e.g. timer ticks)
       – Simplify: eliminate unused subsystems
       – Daemon squashing
       – Synchronize interruptions across the CPUs on a node and across the nodes in a cluster
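
The amplification and the benefit of synchronizing interruptions can be shown with a simple model. The sketch assumes a bulk-synchronous application in which every step ends at a barrier, each node is independently interrupted during a step with probability p, and an interruption adds a fixed delay; these are modeling assumptions, not figures from the talk. Because the barrier waits for the slowest node, a single interrupted node delays the whole step. If all nodes take their interruptions in the same window, the cluster pays the delay with probability p, regardless of node count.

```python
def p_step_delayed(p, nodes):
    """Probability that at least one of `nodes` is interrupted in a step,
    assuming each node is hit independently with probability p."""
    return 1.0 - (1.0 - p) ** nodes


def expected_step_time(compute, delay, p, nodes, synchronized=False):
    """Expected wall time of one bulk-synchronous step.

    Unsynchronized: the step is delayed whenever any node is interrupted.
    Synchronized: all nodes are interrupted together, so the step is
    delayed with probability p no matter how many nodes there are.
    """
    hit = p if synchronized else p_step_delayed(p, nodes)
    return compute + delay * hit
```

With p = 0.001 and 1024 nodes, p_step_delayed gives about 0.64: an interruption that touches one step in a thousand on any single node delays roughly two steps in three across the cluster, while synchronized interruptions keep the rate at one in a thousand.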

  6. Avoiding Planned and Unplanned Downtime
     • Avoid downtime caused by hardware failures:
       – CPUs, memory, I/O
     • Avoid downtime due to software updates:
       – Concurrent update of operating system components
     • Avoid downtime due to hardware updates:
       – OS migration between systems
       – Application migration
     • Recover from unplanned downtime:
       – Checkpoint/restart
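
Checkpoint/restart can be provided at several levels (the OS, a library, or the application itself). The following is a minimal application-level sketch, assuming the entire program state fits in one picklable dict; the state layout, file name, and the stop_after parameter used to simulate a failure are hypothetical. Each checkpoint is written to a temporary file in the same directory and then renamed over the previous one with os.replace, so a crash in the middle of a write leaves the last good checkpoint intact.

```python
import os
import pickle
import tempfile


def save_checkpoint(path, state):
    """Atomically replace the checkpoint at `path` with a pickle of `state`."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)  # same filesystem as target
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(state, f)
            f.flush()
            os.fsync(f.fileno())  # make the data durable before the rename
        os.replace(tmp_path, path)  # atomic rename on POSIX
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def load_checkpoint(path, default):
    """Return the saved state, or `default` if no checkpoint exists yet."""
    try:
        with open(path, "rb") as f:
            return pickle.load(f)
    except FileNotFoundError:
        return default


def run(path, total_steps, stop_after=None, every=10):
    """Toy computation (sum of step indices) that resumes from its checkpoint.
    `stop_after` simulates a crash after that many steps in this run."""
    state = load_checkpoint(path, {"step": 0, "acc": 0})
    steps_this_run = 0
    while state["step"] < total_steps:
        if stop_after is not None and steps_this_run == stop_after:
            return state  # simulated failure: exit without a final checkpoint
        state["acc"] += state["step"]
        state["step"] += 1
        steps_this_run += 1
        if state["step"] % every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state
```

If the first run is stopped after 25 steps, the last checkpoint holds step 20; a second call to run resumes from there and finishes with the same result as an uninterrupted run, redoing only the five lost steps.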

  7. Other Operating System Attributes for HPC
     • Support for standard programming models
     • Support for high performance interconnects
     • Parallel file systems
     • Performance analysis and tuning tools
     • Parallel application debugging tools
     • Cluster system management tools
