
High Performance Parametric Modeling with Nimrod/G: A Killer Application for the Global Grid ?


Presentation Transcript


  1. High Performance Parametric Modeling with Nimrod/G: A Killer Application for the Global Grid ? David Abramson, Jon Giddy and Lew Kotler Presentation By: Abhijeet Karnik

  2. Outline • Introduction • Parametric Modeling with Nimrod • Nimrod/G Description • Architecture • Working • Comparison with Nimrod • Globus Toolkit and Grid Issues • Scheduling on the Grid • Cost • Scheduling Algorithms • Case Study: An evaluation of Nimrod/G • Conclusion • References

  3. Introduction • We examine the role of parametric modeling as an application for the global computing grid and explore heuristics that allow soft real-time deadlines to be specified for larger computational experiments. Nimrod is a specialized parametric modeling system: • It uses a simple declarative parametric modeling language to express an experiment • Provides machinery that automates the task of formulating, running, monitoring and collating the results of multiple individual experiments • Incorporates a scheduling component that manages the scheduling of individual experiments onto idle computers.

  4. Parametric Modeling With Nimrod Nimrod: In a Nutshell • Is a tool that manages the execution of parametric studies across distributed computers. • Takes responsibility for the overall experiment as well as the low-level issues of distributing files to remote systems. • Performs the remote computations and gathers the results. • A user describes an experiment to Nimrod by developing a declarative plan file, which describes the parameters, their default values and the commands necessary to perform the work. • A plan file consists of two main sections: the parameter section and the task section. • The machine that invokes Nimrod is known as the Root Machine: it controls the experiment. • The dispatcher executes code on remote platforms, each of which is known as a computational node.

  5. Parametric Modeling (Contd.) The plan file is processed by a tool called the generator. The Generator: • Takes the parameter descriptions and gives the user the choice of actual values. • Builds a run-file which contains a description of each job. • This run-file is then processed by another tool called the dispatcher. The Dispatcher: • Implements file-transfer commands. • Is responsible for the execution of the model on the remote nodes and for managing the computation across the nodes. • Allocates work to machines without any attempt to schedule their execution.
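As a rough illustration of what the generator does, the sketch below expands a parameter space into a run-file of job descriptions. It is a hedged Python analogue only: the dictionary layout, the parameter names and the make_run_file helper are invented for illustration and are not Nimrod's actual plan-file format or code.

```python
from itertools import product

# Hypothetical parameter section: each parameter name maps to the actual
# values the user chose when the generator was run.
parameters = {
    "wall_thickness_mm": [0.5, 1.0, 1.5, 2.0],
    "seed": [1, 2],
}

def make_run_file(parameters, command_template):
    """Expand the parameter space into one job description per combination,
    analogous to the run-file the generator hands to the dispatcher."""
    names = list(parameters)
    jobs = []
    for values in product(*(parameters[n] for n in names)):
        binding = dict(zip(names, values))
        jobs.append({
            "parameters": binding,
            # Task-section analogue: the command the dispatcher would run remotely.
            "command": command_template.format(**binding),
        })
    return jobs

run_file = make_run_file(parameters, "./model --wall {wall_thickness_mm} --seed {seed}")
print(len(run_file), "jobs")  # 4 values x 2 seeds = 8 jobs
```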

  6. Plan-file Processing [Diagram: the Plan File (default parameter values & commands) is processed by the Generator, which fixes the actual values and builds a Run File (a description of each job); the Run File is in turn processed by the Dispatcher, which manages the computation, file-transfer commands and execution.]

  7. Phases of a Nimrod Computation • Experiment Pre-Processing: Data is set up for the experiment. • Execution Pre-Processing: Data is prepared for a particular execution. • Execution: The program is executed for a given set of parameter values. • Execution Post-Processing: Data from a particular execution is reduced. • Experiment Post-Processing: Results are processed using tools. Phases 1 and 5 are performed once per experiment, while phases 2, 3 and 4 are run for each distinct set of parameters.
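The once-per-experiment versus once-per-parameter-set split can be pictured as a simple control flow. The Python stubs below are purely illustrative; in the real system each phase runs user-supplied commands, not these print statements.

```python
# Illustrative stubs only; in Nimrod each phase invokes user-supplied commands.
def experiment_preprocess():        print("phase 1: set up experiment data")
def execution_preprocess(job):      print("phase 2: prepare data for", job)
def execute(job):                   print("phase 3: run the model for", job)
def execution_postprocess(job):     print("phase 4: reduce output of", job)
def experiment_postprocess():       print("phase 5: process collected results")

def run_experiment(jobs):
    experiment_preprocess()                  # phase 1: once per experiment
    for job in jobs:                         # phases 2-4: once per distinct parameter set
        execution_preprocess(job)
        execute(job)
        execution_postprocess(job)
    experiment_postprocess()                 # phase 5: once per experiment

run_experiment([{"x": 1}, {"x": 2}])
```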

  8. Nimrod: Limitations • Nimrod, though successful, suffers from a few limitations when considered in the context of a global computational grid. • It uses a static set of resources and does not discover new ones dynamically. • It has no notion of user deadlines; in a dynamic global grid environment this is not acceptable. • Nimrod relies on UNIX-level security, whereas in the global grid, owners of expensive supercomputing resources require a more elaborate security mechanism. • Nimrod supports only a limited range of access mechanisms.

  9. Nimrod/G: Description • Nimrod/G extends the basic Nimrod model to provide soft performance guarantees in a dynamic and heterogeneous environment. • An effective scheduling component in Nimrod/G seeks to meet such constraints; it provides a dynamic and iterative process of resource discovery, resource acquisition and resource monitoring. • Nimrod/G is a “Grid-Aware” application: it exploits its understanding of the problem domain and the nature of the Computational Grid. • Features: • High-level interface to the user. • Transparent access to the computational resources. • User-level scheduling.

  10. Nimrod/G Architecture • Nimrod/G is designed to operate in an environment that comprises a set of sites. • Sites provide access to a set of computers under their own administrative control. • Access to resources is mediated by GRAMs. • Information about the physical characteristics and availability of resources is available from the MDS.

  11. Nimrod/G Architecture [Architecture diagram: Nimrod/G Clients; Parametric Engine; Schedule Advisor; Resource Discovery; Persistent Info; Dispatcher; Grid Directory Services; Grid Middleware Services; GUSTO Test Bed.]

  12. Nimrod/G Working • A user initiates a parametric study at a local site. • Nimrod/G then organizes the mapping of individual computations to appropriate remote sites using scheduling heuristics. • On the local site, the Origin Process operates as the master for the whole system; it exists for the entire duration of the experiment and is responsible for execution within the specified time and cost constraints. • The client and the origin are distinct because a client may be tied to a particular environment. • It is possible for multiple clients to monitor the same experiment by connecting to the same Origin process.

  13. Nimrod/G Working • Each remote site consists of a cluster of computational nodes. • A cluster may be a single multiprocessor machine, a cluster of workstations, or even a single processor. • A defining characteristic of a cluster is that access to all nodes is through a set of resource managers provided by the Globus infrastructure. • The Origin process uses the Globus process-creation services to start a Nimrod Resource Broker (NRB) on the cluster. • The NRB provides capabilities for file staging, job creation and process control beyond those provided by the GRAM.
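A minimal sketch of this control flow is shown below; the NimrodResourceBroker and Origin classes are hypothetical stand-ins for the NRB and Origin process, and in the real system the NRB is started remotely through the Globus process-creation service rather than instantiated locally.

```python
class NimrodResourceBroker:
    """Hypothetical stand-in for the NRB started on each remote cluster."""
    def __init__(self, cluster):
        self.cluster = cluster

    def stage_files(self, files):
        print(f"staging {files} to {self.cluster}")

    def run_job(self, job):
        print(f"running {job} on {self.cluster}")

class Origin:
    """Hypothetical stand-in for the Origin process on the local site."""
    def __init__(self, clusters):
        # In Nimrod/G the brokers would be launched via Globus; here we
        # simply construct them to show the shape of the interaction.
        self.brokers = [NimrodResourceBroker(c) for c in clusters]

    def run(self, jobs, files):
        for broker in self.brokers:
            broker.stage_files(files)
        for i, job in enumerate(jobs):
            # Naive round-robin placement; the real scheduler is deadline- and cost-aware.
            self.brokers[i % len(self.brokers)].run_job(job)

Origin(["clusterA", "clusterB"]).run([{"x": v} for v in range(4)], ["model.exe"])
```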

  14. Nimrod/G Versus Nimrod • Nimrod/G: Attempts to schedule otherwise unrelated tasks so that a user-specified deadline is met. Computational resources are allocated dynamically so as to meet the specified deadlines and constraints. Scheduling complexity is increased by the introduction of parameters such as computational economics, deadlines and the use of scattered, remote resources. • Nimrod: Since there is no communication between tasks once they have started, scheduling reduces to finding suitable resources and executing the application. Scheduling is restricted to allocating resources statically so that the application can complete; remoteness of resources, deadline and cost constraints and other such complexities are not considered.

  15. Globus Toolkit and Grid Issues • The Globus Toolkit is a collection of software components designed to support the development of applications for high-performance distributed computing environments. • It implements a bag-of-services architecture. • Globus components provide basic services such as resource allocation, authentication, information, communication, remote data access and fault detection, among others. • Applications and tools combine these services in different ways to construct ‘grid-enabled’ systems. Nimrod/G uses: 1. The Globus Resource Allocation Manager (GRAM) for starting and managing computations on a resource. 2. The Metacomputing Directory Service (MDS), which provides an API for discovering the structure and state of resources available for computation. 3. The Globus Security Infrastructure (GSI), which provides single sign-on, run-anywhere capabilities for computations. 4. Global Access to Secondary Storage (GASS), which provides uniform access mechanisms for files on various storage systems.

  16. Cost in a Global Grid • Unless restrictions are placed on access to the various resources of a global grid, it is likely to become congested with too much work. • A fiscal model has been implemented to control the amount of work requested, wherein users pay for access. • This scheme allows resource providers to set pricing rates for the various machines; rates can vary between classes of machines, times of day, resource demand and classes of users. Nimrod/G: The Cost Matrix
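One way to picture the cost matrix is as a table of rates keyed by machine and user class, from which a job's cost follows directly. The numbers, class names and the job_cost helper below are illustrative assumptions, not figures from the paper.

```python
# Hypothetical rates in cost units (CU) per CPU-hour, per machine and user class.
cost_matrix = {
    "workstation-cluster": {"student": 5,  "commercial": 10},
    "supercomputer":       {"student": 20, "commercial": 50},
}

def job_cost(machine, user_class, cpu_hours):
    """Cost of a single job under the fiscal model: rate times resource usage."""
    return cost_matrix[machine][user_class] * cpu_hours

print(job_cost("supercomputer", "student", 1.5))  # 30.0 cost units
```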

  17. Scheduling Algorithm • The Nimrod/G scheduler is responsible for discovering and allocating the resources required to complete an experiment, subject to execution time and budget constraints. Scheduling heuristics: • Discovery: Determine the number and then the identity of the lowest-cost set of resources able to meet the deadline. A cost matrix is used for this, and the output of this phase is a set of resources to which jobs should be submitted. • Allocation: Unscheduled jobs are allocated to the candidate resources identified in the discovery phase. • Monitoring: The completion time of submitted jobs is monitored, establishing an execution rate for each resource. • Refinement: Rate information is used to update estimates of typical execution times on the different resources, and hence the expected completion time of each job. This may lead back to steps 1 and 2 in order to discover new resources or drop existing ones from the candidate set.
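The four steps can be read as a loop. The skeleton below is a hedged sketch in which discover, allocate, monitor and refine are hypothetical callables standing in for the phases described above; the real scheduler consults the MDS and the cost matrix rather than these placeholders.

```python
import time

def schedule(jobs, deadline, budget, discover, allocate, monitor, refine):
    """Skeleton of the discover/allocate/monitor/refine loop. `deadline` is an
    absolute time (seconds since the epoch) and `budget` a cost-unit limit."""
    resources = discover(deadline, budget)            # 1. cheapest set expected to meet the deadline
    pending = list(jobs)
    spent = 0.0
    while pending and time.time() < deadline and spent <= budget:
        allocate(pending, resources)                  # 2. submit unscheduled jobs to candidate resources
        finished, rates, cost = monitor(resources)    # 3. observe completion times -> execution rates
        pending = [j for j in pending if j not in finished]
        spent += cost
        if refine(rates, pending, deadline):          # 4. estimates say the deadline will be missed,
            resources = discover(deadline, budget)    #    so revisit steps 1-2 and revise the resource set
    return pending, spent
```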

  18. Scheduling Algorithm • This scheme continues until the deadline is met or the cost budget is exceeded; in the latter case the user is advised and the deadline can be modified accordingly. • A consequence of using this cost-based implementation is that the cost of an experiment will vary depending on the load and the profile of the users at the time. • This reflects a supply-and-demand mechanism: lower demand allows the experiment to be performed on cheaper resources. The aim is to allow the user to specify an absolute (soft) deadline so as to express the timeliness of the computation.

  19. Case Study: An Experiment An experiment was conducted to test the effectiveness of the Nimrod/G architecture and scheduling heuristics on a real application. Resources were provided by GUSTO (the Globus Ubiquitous Supercomputing Testbed Organization). They are diverse in terms of their size, availability, architecture, processing capability, power, performance, scheduling mechanism and location.

  20. Ionization Chamber • An ionization chamber essentially isolates a certain volume of air and measures the ionization within that volume. • This process, however, modifies the original photon and electron spectrum entering the volume. • If the ionization chamber is to act as a primary standard for calibration purposes, it is necessary to correct the measured ionization. • Experiments were performed, and the calculations reported here concern the simulation of the chamber response as a function of the front wall thickness. • Nimrod/G performs this parametric variation.

  21. Computational Results • The ionization chamber study involved 400 tasks; the execution time of the model varied from 45 minutes to 140 minutes per parameter set, depending on the platform used. • Three separate experiments were performed, with deadlines of 20 hours, 15 hours and 10 hours respectively. • This allows an evaluation of Nimrod/G's ability to meet soft real-time deadlines. • The graphs obtained for the different deadlines depict the manner in which Nimrod/G allocates additional resources for more stringent deadlines.

  22. Computational Results The number of processors allocated is dependent on the deadline.

  23. Results: 20 Hour Deadline 10 CU machines are introduced when the scheduler calculates that it cannot meet the deadline with the 5 CU machines

  24. Results: 15 Hour Deadline Higher CU machines are introduced when the scheduler calculates that it cannot meet the deadline with lower CU machines

  25. Results: 10 Hour Deadline 50 CU machines are introduced 2 hours later; these were not needed in the 15 and 20 hour deadline experiments.

  26. Computational Cost • Quantifies the impact on cost of the different node selections made for the different deadlines: the 10 hour deadline costs three times as much as the 20 hour deadline. • In a dynamic environment it is not possible to show that Nimrod/G is making optimal selections; it is, however, effective in selecting more expensive nodes only when the system requires them to meet the deadline.

  27. Conclusion • We have discussed the evolution of a scheduling tool, Nimrod, from a local computing environment to a global computing grid. • The Nimrod/G architecture offers a scalable model for resource management and scheduling on computational grids. • The algorithm used is simple and adapts to changes; it incorporates user as well as system requirements. However, future work needs to address issues such as: • Using the concept of advance resource reservation to offer a facility whereby the user can say “I am willing to pay $…, can you complete my job by this time…” • Taking into account the ability of Globus to reserve resources and incorporating it into the scheduling mechanism. • Implementing a notion of priority in addition to the cost-based implementation.

  28. References D. Abramson, J. Giddy and L. Kotler, “High Performance Parametric Modeling with Nimrod/G: Killer Application for Global Grids?”, International Parallel and Distributed Processing Symposium (IPDPS), May 2000. Web sites: www.globus.org
