A Framework for Adaptable Operating and Runtime Systems

Presentation Transcript


  1. A Framework for Adaptable Operating and Runtime Systems Ron Brightwell Scalable Computing Systems Sandia National Laboratories Albuquerque, New Mexico, USA FAST-OS PI Meeting June 9-10, 2005 Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000.

  2. Project Details • Sandia National Laboratories • Neil Pundit (Project Director) • Ron Brightwell (Coordinating PI) • Rolf Riesen, Trammell Hudson, Zaid Adubayeh • University of New Mexico • Barney Maccabe (PI) • Patrick Bridges • California Institute of Technology • Thomas Sterling (PI)

  3. What’s wrong with current operating systems?

  4. Cluster Network Hardware • DMA between NIC and host memory • Physical addresses on NIC • Can be user- or kernel-space • Memory descriptors on NIC • Benefits associated with offloading • Reduced overhead • Increased bandwidth • Reduced latency
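
To make the descriptor idea above concrete, here is a minimal C sketch of the kind of memory descriptor a process might hand to the NIC so incoming data can be DMAed straight into host memory. The struct layout, field names, and the nic_post_descriptor stub are illustrative assumptions, not an actual NIC or Portals interface.

```c
/* Hypothetical memory descriptor that a process could post to the NIC so the
 * NIC can DMA incoming data directly into host memory without kernel help.
 * All names and fields here are illustrative, not an actual NIC interface. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct mem_descriptor {
    uint64_t base;        /* (pinned) buffer address the NIC will DMA into */
    uint64_t length;      /* buffer length in bytes */
    uint32_t match_bits;  /* tag the NIC uses to select this descriptor */
    uint32_t options;     /* e.g., allow put, allow get, unlink when full */
};

/* Stand-in for the real NIC library call; here it only records the request. */
static int nic_post_descriptor(const struct mem_descriptor *md)
{
    printf("posted descriptor: base=0x%llx len=%llu match=0x%x\n",
           (unsigned long long)md->base, (unsigned long long)md->length,
           md->match_bits);
    return 0;
}

int main(void)
{
    static char recv_buf[4096];
    struct mem_descriptor md;

    memset(&md, 0, sizeof md);
    md.base = (uint64_t)(uintptr_t)recv_buf;  /* real code would use a physical/IO address */
    md.length = sizeof recv_buf;
    md.match_bits = 0x1234;

    /* Once posted, matching and data movement happen on the NIC: the host CPU
     * is not involved per message (lower overhead, lower latency, higher bandwidth). */
    return nic_post_descriptor(&md);
}
```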

  5. OS Bypass Versus Splintering • Cluster architecture (diagram) • Splintering: distributing little bits of the OS

  6. Other Issues • General-purpose operating systems • Generality comes at the cost of performance for all applications • Assume a generic architectural model • Difficult to expose novel features • Lightweight operating systems • Limited functionality • Difficult to add new features • Designed to be used in the context of a specific usage model • Operating system is an impediment to new architectures and programming models

  7. Factors Impacting OS Design

  8. LWK Influences • Lightweight OS • Small collection of apps • Single programming model • Single architecture • Single usage model • Small set of shared services • No history • Puma/Cougar • MPI • Distributed memory • Space-shared • Parallel file system • Batch scheduler

  9. Programming Models

  10. Usage Models

  11. Current and Future System Demands • Architecture • Modern ultrascale machines have widely varying system-level and node-level architectures • Future systems will have further hardware advances (e.g., multi-core chips, PIMs) • Programming model • MPI, Thread, OpenMP, PGAS, … • External services • Parallel file systems, dynamic libraries, checkpoint/restart, … • Usage model • Single, large, long-running simulation • Parameter studies with thousands of single-processor, short-running jobs

  12. Project Goals • Realize a new generation of scalable, efficient, reliable, easy-to-use operating systems for a broad range of future ultrascale high-end computing systems, based on both conventional and advanced hardware architectures and in support of diverse current and emerging parallel programming models. • Devise and implement a prototype system that provides a framework for automatically configuring and building lightweight operating and runtime systems based on the requirements presented by an application, system usage model, system architecture, and the combined needs for shared services.

  13. Approach • Define and build a collection of micro-services • Small components with well-defined interfaces • Implement an indivisible portion of service semantics • Fundamental elements of composition and re-use • Combine micro-services specifically for an application and a target platform • Develop tools to facilitate the synthesis of required micro-services
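
As a rough illustration of a micro-service with a well-defined interface, the C sketch below describes a hypothetical physical-memory allocator that declares what it provides and what it needs. The struct, field names, and interface strings are assumptions for illustration, not the project's actual micro-service API.

```c
/* Illustrative sketch of a micro-service descriptor: a small component with a
 * well-defined interface that names what it provides and what it requires.
 * The types and names are hypothetical, not the project's actual API. */
#include <stdio.h>

struct micro_service {
    const char  *name;          /* e.g., "phys-mem-alloc" */
    const char **provides;      /* NULL-terminated list of exported interfaces */
    const char **needs;         /* NULL-terminated list of required interfaces */
    int        (*init)(void);   /* called once when composed into a kernel image */
};

/* A trivial example: a physical-memory allocator micro-service that depends on
 * a boot-time memory-map service. */
static int pmem_init(void) { puts("pmem: initialized"); return 0; }

static const char *pmem_provides[] = { "pmem.alloc", "pmem.free", NULL };
static const char *pmem_needs[]    = { "boot.memmap", NULL };

static const struct micro_service pmem_service = {
    .name     = "phys-mem-alloc",
    .provides = pmem_provides,
    .needs    = pmem_needs,
    .init     = pmem_init,
};

int main(void)
{
    /* A composition tool would read descriptors like this one and select a set
     * of micro-services tailored to an application, platform, and usage model. */
    printf("service %s needs %s\n", pmem_service.name, pmem_service.needs[0]);
    return pmem_service.init();
}
```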

  14. Tools for Combining Micro-Services • Need to ensure that required micro-services are available (see the sketch below) • Need to ensure that applications are isolated from one another within the context of a given usage model • Verifying that a set of constraints is met • Further work will allow for reasoning about additional system properties, such as performance based on feedback from previous runs
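
To illustrate the kind of availability check such a tool might perform, the sketch below verifies that every interface needed by a selected set of micro-services is provided by some service in the same set. The data layout and names are hypothetical, not the project's actual tools.

```c
/* Illustrative availability check for a selected set of micro-services:
 * every interface a service needs must be provided by some service in the set. */
#include <stdio.h>
#include <string.h>

struct svc {
    const char  *name;
    const char **provides;   /* NULL-terminated */
    const char **needs;      /* NULL-terminated */
};

static int set_provides(const struct svc *set, int n, const char *iface)
{
    for (int i = 0; i < n; i++)
        for (const char **p = set[i].provides; *p; p++)
            if (strcmp(*p, iface) == 0)
                return 1;
    return 0;
}

/* Returns 0 if every requirement is satisfied, -1 if anything is missing. */
static int check_composition(const struct svc *set, int n)
{
    int rc = 0;
    for (int i = 0; i < n; i++)
        for (const char **r = set[i].needs; *r; r++)
            if (!set_provides(set, n, *r)) {
                fprintf(stderr, "%s needs missing interface %s\n", set[i].name, *r);
                rc = -1;
            }
    return rc;
}

int main(void)
{
    static const char *none[]   = { NULL };
    static const char *memmap[] = { "boot.memmap", NULL };
    static const char *alloc[]  = { "pmem.alloc", NULL };

    const struct svc selected[] = {
        { "boot",           memmap, none   },
        { "phys-mem-alloc", alloc,  memmap },
    };
    return check_composition(selected, 2) == 0 ? 0 : 1;
}
```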

  15. Building Custom Operating/Runtime Systems

  16. Timetable • 12 months • Define basic framework and micro-service APIs • Define initial micro-services for supporting a lightweight kernel equivalent • Identify applications and related metrics for evaluating resulting systems • 24 months • Demonstrate configuration and linking tools with multiple lightweight kernel configurations • Define application-specific micro-services for optimizing application performance • Define shared-service micro-services for common application services (e.g. TCP/IP)

  17. Timetable (cont’d) • 36 months • Demonstrate instance of framework for PIM-based system on base-level PIM architecture simulator • Demonstrate application/kernel configurability using application-specific and shared-service micro-services • Release complete software package as open source • Provide detailed report summarizing completed and future work

  18. Related Work • Microkernels • K42, L4, Pebble, Mach, … • Exokernel • Extensible operating systems • Spin, Vino, sandboxing, … • Modules • Configurable OS/Runtime • Scout, Think, Flux OSKit, eCos, TinyOS • STREAMS, x-kernel, CORDS

  19. More Info • “Highly Configurable Operating Systems for Ultrascale Systems,” Maccabe et al. In Proceedings of the First International Workshop on Operating Systems, Programming Environments and Management Tools for High-Performance Computing on Clusters (COSET-1), June 2004. (http://coset.irisa.fr/)

  20. Current Thinking

  21. HPC Architecture Framework • Next-generation operating systems must serve today’s conventional architectures and future architectures moving toward nanoscale technology • Today: commodity clusters, MPPs, vector, multicore • Tomorrow: Heterogeneous, Stream, PIM, Reconfigurable

  22. Some Basic Assumptions • Different programming and execution models will demand different optimizations in OS services for best performance • Micro-services may require more than a single node resource to accomplish function • Micro-services may perform management function on more than one node resource at a time

  23. Seven-Dimensional Space of Architectures • Degree of Coupling • Relative bandwidth between external channel and local memory • Latency and latency hiding strategy support (e.g. multithreading) • Local state capacity • Overhead of context instantiation, switching, and synchronization • Namespace semantics and management towards single system image and virtualization • Resource allocation for memory objects (static versus dynamic) • Intrinsic model of execution • What to do when the application flatlines – exceptions and booting
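
One hypothetical way a composition tool could consume this space is as a per-platform profile whose fields mirror the dimensions above. The C struct below is purely illustrative; its field names, enums, and sample values are assumptions rather than anything defined by the project.

```c
/* Hypothetical per-platform profile encoding the dimensions listed above;
 * a composition tool could key micro-service selection off a record like this.
 * Field names, enums, and values are illustrative only. */
#include <stdio.h>

enum alloc_model { ALLOC_STATIC, ALLOC_DYNAMIC };
enum exec_model  { EXEC_SPMD, EXEC_MULTITHREADED, EXEC_DATAFLOW };

struct arch_profile {
    double coupling;               /* degree of coupling between nodes */
    double chan_to_mem_bandwidth;  /* external channel bandwidth / local memory bandwidth */
    int    hw_thread_contexts;     /* latency-hiding support, e.g., multithreading depth */
    long long local_state_bytes;   /* local state capacity */
    long   context_switch_cycles;  /* overhead of context instantiation/switching */
    int    single_system_image;    /* namespace managed toward an SSI / virtualization */
    enum alloc_model memory_allocation;   /* static versus dynamic memory objects */
    enum exec_model  execution_model;     /* intrinsic model of execution */
};

int main(void)
{
    /* An illustrative profile for a distributed-memory MPP-style machine. */
    struct arch_profile mpp = {
        .coupling = 0.1,
        .chan_to_mem_bandwidth = 0.05,
        .hw_thread_contexts = 1,
        .local_state_bytes = 2LL << 30,
        .context_switch_cycles = 10000,
        .single_system_image = 0,
        .memory_allocation = ALLOC_STATIC,
        .execution_model = EXEC_SPMD,
    };
    printf("channel/memory bandwidth ratio: %.2f\n", mpp.chan_to_mem_bandwidth);
    return 0;
}
```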

  24. Programming Models • Distributed namespace • Shared nothing • Shared namespace • Shared addresses, heavyweight threads • Highly dynamic and fine-grained • Lightweight threads • In-memory synchronization • Need to keep an eye on new language developments (Chpl, Fortress, X10)

  25. Additional Considerations • OS services may consume parallel resources • Different applications may vary dramatically in terms of balance of resource usage and OS service request functionality • Architecture support may vary dramatically for efficient OS service functionality • “local” means all references and resources for an action are immediately available within a single cycle

  26. Initial Focus • Programming models • Distributed namespace (MPI) • Shared namespace (UPC) • Dynamic, fine-grained (ParalleX) • Architectures • Distributed memory (Red Storm) • Shared memory (Columbia) • PIM (MIND simulator)

  27. Currently Brainstorming on Micro-services • Definition of a micro-service • Programming model for micro-services • What does a micro-service need? • What does a micro-service provide? • How does a micro-service behave? • How to support multiple implementations of the same micro-service? (one possibility is sketched below) • What’s the smallest unit of activity that can be distributed? • Some micro-services may not be useful until combined with other micro-services • Can dynamic compilation serve as a possible implementation strategy for composing services?
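
On the question of supporting multiple implementations of the same micro-service, one familiar pattern is to bind an interface of function pointers to whichever implementation the composition (or boot-time configuration) selects. The sketch below shows that pattern for a hypothetical memory-allocation service; every name in it is made up for illustration and is not a design decision from the project.

```c
/* Hypothetical illustration of multiple implementations behind one
 * micro-service interface, selected at composition or boot time. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* The interface for a memory-allocation micro-service. */
struct alloc_ops {
    const char *impl_name;
    void *(*alloc)(unsigned long bytes);
    void  (*release)(void *p);
};

/* Implementation 1: a trivial bump allocator (think static, space-shared jobs). */
static char bump_pool[1 << 16];
static unsigned long bump_off;
static void *bump_alloc(unsigned long n) { void *p = bump_pool + bump_off; bump_off += n; return p; }
static void  bump_release(void *p) { (void)p; /* a bump allocator never frees */ }
static const struct alloc_ops bump_impl = { "bump", bump_alloc, bump_release };

/* Implementation 2: delegate to the C library (think a richer usage model). */
static void *libc_alloc(unsigned long n) { return malloc(n); }
static void  libc_release(void *p) { free(p); }
static const struct alloc_ops libc_impl = { "libc", libc_alloc, libc_release };

/* A composition tool would make this choice; a string argument stands in here. */
static const struct alloc_ops *select_impl(const char *which)
{
    return strcmp(which, "bump") == 0 ? &bump_impl : &libc_impl;
}

int main(void)
{
    const struct alloc_ops *mem = select_impl("bump");
    void *p = mem->alloc(128);
    printf("using the %s implementation, got %p\n", mem->impl_name, p);
    mem->release(p);
    return 0;
}
```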

  28. Composability Framework • Model for composability of micro-services • Need more brainstorming to design the framework • Leverage OpenCatamount’s implementation details when the framework is defined

  29. OpenCatamount • Open source version of Catamount • All Cray, Intel, and nCUBE proprietary code removed • Export control restriction lifted • Boots on a Dell box • Working on configuration and build environment • Likely will be released under Mozilla-style license • Should be up on Sandia download site real soon® • Port to a virtualization environment?

  30. Timeline • Programming model for micro-services (Q1FY06) • Initial draft of the framework design and definition of micro-services (Q2FY06) • Description of tools to do composition (Q3FY06)

  31. Most Pressing Issue • What is the name of this project?  • Potential candidates: • ConfigOS, ANOS, ACRONYM, ASCENT, OPUS….

  32. Barney “Working”

  33. FAST-OS Funding Comes Through
