DS-RT 2008 Tutorial – Distributed Simulation on the Grid

Stephen John Turner, Ke Pan, Wentong Cai, Zengxiang Li
Parallel & Distributed Computing Centre, Nanyang Technological University, Singapore

Outline • Part 1: Concepts and Challenges • Background and Motivation • Taxonomy of Grid-based Simulation • Decoupled Architecture • Research Challenges • Part 2: A Service Oriented HLA RTI (SOHR) • Service Oriented HLA RTI (SOHR) Framework • Using SOHR • Demonstration of SOHR • Conclusions

1. Background and Motivation

Background and Motivation • Background • Distributed Simulation • High Level Architecture (HLA) • Web and Grid Services • Service Oriented Architecture • Motivation • HLA and the Grid • Vision

Distributed Simulation • Distributed Simulation • Provides a way of linking simulation components (federates) of various types at possibly different locations to create a common virtual environment (federation)

Introduction • Distributed Simulation • Aims to promote the interoperability and reusability of simulation applications • Allows geographically distributed simulation components to be linked together • High Level Architecture (HLA) • DMSO 1.3 standard (DMSO 1998) • IEEE 1516 standard (IEEE 2000) • Defines the Rules, Interface Specification, Object Model Template (OMT) and Federation Development & Execution Process (FEDEP) • A Run-Time Infrastructure (RTI) implements the HLA standard

High Level Architecture

[Figure: High Level Architecture. Federates (simulations, simulation surrogates, passive viewers) with their SOMs form a federation described by a FOM (FED); HLA rules apply to federations and to federates; federates connect through the interface to the Run-Time Infrastructure (RTI), which provides Federation Management, Declaration Management, Object Management, Ownership Management, Time Management and Data Distribution Management]
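To make the interplay between federates and the RTI concrete, the following is a minimal single-process sketch in Java of three of the six service groups (federation, declaration and object management), with the RTI calling back into the federate to deliver updates. All names (`ToyRti`, `Callbacks`, `createFederation`, `join`, `publish`, `subscribe`, `update`, `reflect`) are hypothetical simplifications chosen for illustration; they are not the DMSO 1.3 or IEEE 1516 `RTIambassador`/`FederateAmbassador` API, and ownership, time and data distribution management are omitted.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class HlaDemo {
    public static void main(String[] args) {
        ToyRti rti = new ToyRti();
        rti.createFederation("TrafficSim");                     // federation management
        List<String> seen = new ArrayList<>();
        rti.join("TrafficSim", "CarFederate", (cls, vals) -> { });
        rti.join("TrafficSim", "Viewer", (cls, vals) -> seen.add(cls + " at " + vals.get("position")));
        rti.publish("TrafficSim", "CarFederate", "Car");        // declaration management
        rti.subscribe("TrafficSim", "Viewer", "Car");
        rti.update("TrafficSim", "CarFederate", "Car", Map.of("position", "42")); // object management
        System.out.println(seen); // [Car at 42]
    }
}

// Hypothetical callback interface: the RTI calls back into the federate,
// roughly the role a federate ambassador plays in the HLA.
interface Callbacks {
    void reflect(String objectClass, Map<String, String> values);
}

// Hypothetical single-process stand-in for an RTI; not a real RTI API.
class ToyRti {
    // federation -> joined federate name -> its callbacks
    private final Map<String, Map<String, Callbacks>> members = new HashMap<>();
    // federation -> object class -> federate names
    private final Map<String, Map<String, Set<String>>> publishers = new HashMap<>();
    private final Map<String, Map<String, Set<String>>> subscribers = new HashMap<>();

    void createFederation(String fed) {
        if (members.containsKey(fed)) {
            throw new IllegalStateException("Federation already exists: " + fed);
        }
        members.put(fed, new HashMap<>());
        publishers.put(fed, new HashMap<>());
        subscribers.put(fed, new HashMap<>());
    }

    void join(String fed, String federate, Callbacks cb) {
        federation(fed).put(federate, cb);
    }

    void publish(String fed, String federate, String objectClass) {
        requireMember(fed, federate);
        publishers.get(fed).computeIfAbsent(objectClass, k -> new HashSet<>()).add(federate);
    }

    void subscribe(String fed, String federate, String objectClass) {
        requireMember(fed, federate);
        subscribers.get(fed).computeIfAbsent(objectClass, k -> new HashSet<>()).add(federate);
    }

    // Delivers an attribute update to every subscriber except the sender;
    // returns the number of federates that received it.
    int update(String fed, String sender, String objectClass, Map<String, String> values) {
        requireMember(fed, sender);
        if (!publishers.get(fed).getOrDefault(objectClass, Set.of()).contains(sender)) {
            throw new IllegalStateException(sender + " has not published " + objectClass);
        }
        int delivered = 0;
        for (String s : subscribers.get(fed).getOrDefault(objectClass, Set.of())) {
            if (!s.equals(sender)) {
                members.get(fed).get(s).reflect(objectClass, values);
                delivered++;
            }
        }
        return delivered;
    }

    private Map<String, Callbacks> federation(String fed) {
        Map<String, Callbacks> m = members.get(fed);
        if (m == null) {
            throw new IllegalStateException("No such federation: " + fed);
        }
        return m;
    }

    private void requireMember(String fed, String federate) {
        if (!federation(fed).containsKey(federate)) {
            throw new IllegalStateException(federate + " has not joined " + fed);
        }
    }
}
```

In this sketch the sender never receives its own update, and an update from a federate that has not published the class is rejected, mirroring the publish/subscribe contract that declaration management establishes.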

Limitations of the HLA • In a traditional HLA-based distributed simulation: • Vendor-specific RTI software is required • Federates using different RTI versions cannot cooperate with each other • Software and hardware resources and security settings must be arranged beforehand • Because of these inflexibilities, it is not easy to run HLA-based distributed simulations across administrative domains • Yet there is a demand for conducting HLA-based distributed simulations across a WAN, e.g.: • A specific federate that can only be executed at a fixed site due to a specific resource requirement or security issues • A large-scale application that needs numerous WAN resources

Web Services • Web services support XML-based distributed computing using the following three industry standards: • Web Services Description Language (WSDL) • Universal Description, Discovery and Integration (UDDI) • Simple Object Access Protocol (SOAP)

Web Services [Figure: a Web service provider registers the service in a UDDI repository (service URL, WSDL, provider, …); the WSDL specifies the service through (1) data type definitions, (2) abstract operations and (3) service bindings; XML Schema describes (1) data input and (2) data output] • Web Service Consumer: • Search the UDDI repository for an applicable web service • Get and analyze the WSDL • Prepare input data (XML) • Send data via SOAP / invoke the web service • Receive output data (XML)
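The consumer steps "prepare input data (XML)" and "send data via SOAP" can be illustrated with a small Java sketch that builds a SOAP 1.1 envelope and lets the provider side locate the requested operation in the SOAP Body using the JDK's built-in DOM parser. The operation name, parameter and service namespace (`urn:example:simulation`) are hypothetical; in practice they would be taken from the service's WSDL, and the HTTP POST to the service endpoint is omitted so that the sketch runs offline.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class SoapDemo {
    // Standard SOAP 1.1 envelope namespace.
    static final String SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/";
    // Hypothetical service namespace; a real one comes from the service's WSDL.
    static final String SERVICE_NS = "urn:example:simulation";

    public static void main(String[] args) throws Exception {
        String request = buildRequest("getModel", "modelName", "Traffic & Lights");
        Element op = operationElement(request);
        String value = op.getElementsByTagNameNS(SERVICE_NS, "modelName").item(0).getTextContent();
        System.out.println(op.getLocalName() + ": " + value); // getModel: Traffic & Lights
    }

    // Consumer side: wrap one input parameter in a SOAP 1.1 envelope.
    // Assumes operation and parameter names are valid XML names.
    static String buildRequest(String operation, String paramName, String paramValue) {
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
            + "<soap:Envelope xmlns:soap=\"" + SOAP_ENV + "\"><soap:Body>"
            + "<svc:" + operation + " xmlns:svc=\"" + SERVICE_NS + "\">"
            + "<svc:" + paramName + ">" + escape(paramValue) + "</svc:" + paramName + ">"
            + "</svc:" + operation + ">"
            + "</soap:Body></soap:Envelope>";
    }

    // Minimal escaping for XML element content.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Provider side: parse the envelope and return the first element in the Body,
    // which names the requested operation.
    static Element operationElement(String envelope) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
            .parse(new ByteArrayInputStream(envelope.getBytes(StandardCharsets.UTF_8)));
        Node body = doc.getElementsByTagNameNS(SOAP_ENV, "Body").item(0);
        if (body == null) {
            throw new IllegalArgumentException("Not a SOAP 1.1 envelope");
        }
        for (Node n = body.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                return (Element) n;
            }
        }
        throw new IllegalArgumentException("Empty SOAP Body");
    }
}
```

Real toolkits generate this marshalling code from the WSDL; writing it by hand here only serves to show what travels over the wire.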

Grid Computing • The Grid enables flexible, secure, coordinated resource sharing among dynamic collections of individuals and institutions, referred to as “virtual organizations” … (Ian Foster, Foster 2001) • Communities can share geographically distributed resources for their common purpose

Grid Services • The Open Grid Services Architecture (OGSA) defines a set of standard core capabilities and behaviors that address key concerns in Grid systems (Foster 2002) • The Globus Toolkit (GT) is the de facto standard middleware for Grid computing; its newest version, GT4 (GT4 2007), provides an OGSA implementation based on the WS-Resource Framework (WSRF) (WSRF 2007)

WSRF and GT4 Grid Services [Figure: GT4 Grid services implement WSRF, which provides stateful Web services] • Extension of Web Services • Stateful Web Services • Convergence of Web and Grid Standards
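WSRF makes a Web service stateful by keeping the service itself free of per-client state and attaching the state to separate resources, each identified by a key carried in an endpoint reference. The following Java sketch shows that pattern in a single process: a factory operation creates a resource and returns its key, and later calls name the resource they act on. `CounterService`, `CounterResource` and the plain string key are hypothetical stand-ins; this is not the GT4 Java API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

class WsResourceDemo {
    public static void main(String[] args) {
        CounterService service = new CounterService();
        String key = service.create();   // factory call returns the resource key
        service.add(key, 5);
        service.add(key, 3);
        System.out.println(service.getValue(key)); // 8
        service.destroy(key);
    }
}

// The stateful resource: the state lives here, not in the service.
class CounterResource {
    int value;
}

// Hypothetical service in the WS-Resource style. Apart from its table of resources
// it holds no per-client fields; every call names the resource it acts on
// (in WSRF, via the endpoint reference). Single-threaded sketch: no synchronization.
class CounterService {
    private final Map<String, CounterResource> resources = new HashMap<>();

    String create() {
        String key = UUID.randomUUID().toString();
        resources.put(key, new CounterResource());
        return key;
    }

    void add(String key, int delta) {
        lookup(key).value += delta;
    }

    int getValue(String key) {
        return lookup(key).value;
    }

    void destroy(String key) {
        resources.remove(key);
    }

    private CounterResource lookup(String key) {
        CounterResource r = resources.get(key);
        if (r == null) {
            throw new IllegalArgumentException("Unknown or destroyed resource: " + key);
        }
        return r;
    }
}
```

Destroying a resource, as WS-ResourceLifetime allows, makes its key invalid, while other resources created through the same service are unaffected.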

Service Oriented Architecture • Services are discrete web- or grid-based applications that interact dynamically with other services • Necessary service functions: • Functional self-description of the service • Publishing of service descriptions • Locating the service with the required functionality • Requesting the required data to initiate the service • Establishing the data exchange • Delivering the results • Advantages • Loose coupling between services • Interoperability • Platform- and language-neutrality
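The service functions listed above (self-description, publishing, locating, requesting and delivering results) can be sketched with an in-memory registry in Java. `ServiceRegistry`, `ServiceDescription` and the capability strings are hypothetical; a real SOA would publish WSDL descriptions to a registry such as UDDI and invoke services remotely via SOAP rather than through local function objects.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class RegistryDemo {
    public static void main(String[] args) {
        ServiceRegistry registry = new ServiceRegistry();
        // Provider: publish a self-description together with an implementation.
        registry.publish(new ServiceDescription("TerrainModel", "model:terrain"),
                input -> "terrain for " + input.get("region"));
        // Consumer: locate a service by required functionality, then request it.
        List<ServiceDescription> found = registry.locate("model:terrain");
        String result = registry.invoke(found.get(0).name, Map.of("region", "north"));
        System.out.println(result); // terrain for north
    }
}

// Hypothetical functional self-description of a service.
final class ServiceDescription {
    final String name;        // unique service name
    final String capability;  // hypothetical functionality label, e.g. "model:terrain"

    ServiceDescription(String name, String capability) {
        this.name = name;
        this.capability = capability;
    }
}

// Hypothetical in-memory registry standing in for, e.g., a UDDI repository.
class ServiceRegistry {
    private final Map<String, ServiceDescription> descriptions = new HashMap<>();
    private final Map<String, Function<Map<String, String>, String>> implementations = new HashMap<>();

    // Publishing of a service description (re-publishing a name replaces it).
    void publish(ServiceDescription d, Function<Map<String, String>, String> impl) {
        descriptions.put(d.name, d);
        implementations.put(d.name, impl);
    }

    // Locating services with the required functionality.
    List<ServiceDescription> locate(String capability) {
        List<ServiceDescription> matches = new ArrayList<>();
        for (ServiceDescription d : descriptions.values()) {
            if (d.capability.equals(capability)) {
                matches.add(d);
            }
        }
        return matches;
    }

    // Requesting the service with input data and delivering the result.
    String invoke(String name, Map<String, String> input) {
        Function<Map<String, String>, String> impl = implementations.get(name);
        if (impl == null) {
            throw new IllegalArgumentException("Unknown service: " + name);
        }
        return impl.apply(input);
    }
}
```

Because the consumer finds the service by its described functionality rather than by a hard-wired reference, the provider can be replaced without changing the consumer, which is the loose coupling listed among the advantages.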


Motivation • The development of complex simulation applications usually requires collaborative effort from researchers with different domain knowledge and expertise, possibly at different locations • These simulation systems often require huge computing resources and the data sets required by the simulation may also be geographically distributed • A large-scale distributed simulation can be constructed using HLA – however: • HLA does not provide support for collaborative development of simulation applications • HLA does not provide any mechanism for managing the resources where the simulation is being executed

HLA and the Grid • The High Level Architecture (HLA) enables the construction of large-scale distributed simulations by linking existing and possibly distributed simulation components • Grid technologies enable collaboration and the use of distributed computing resources, while also facilitating access to geographically distributed data sets • The Grid offers exciting new opportunities: • Enables collaboration • Enables the use of distributed computing resources • Allows access to geographically distributed data sets • Supports Service Oriented Architectures

Vision • A Grid “plug-and-play” distributed collaborative simulation environment, where researchers with different domain knowledge and expertise, at different locations, develop, modify, assemble and execute distributed simulation components over the Grid • DS-Grid: one of only four “Sister Projects” funded by the UK e-Science Core Programme

Vision [Figure: federates, RTIs and model factories distributed over the Grid] • Discovery of Models • Discovery of Resources • Management of Simulation Execution

2. Taxonomy of Grid-based Simulation

Taxonomy of Grid-based Simulation • HLA-based Approaches • Grid Facilitated Approach • Grid Enabled Approach • Grid Oriented Approach • Non-HLA Approaches • Our Projects

Grid Facilitated Approach • Grid services facilitate HLA-based distributed simulations (e.g. resource management) while simulation communication is through a vendor-specific RTI • G-HLAM (Rycerz 2006) • Efficient execution of HLA-based distributed simulations on the Grid • Grid services cooperate to obtain an optimized configuration of federation execution • Simulation communications through a vendor-specific RTI • Aegis data Grid for large scale distributed simulation (Wu 2004) • Data resource management services and computing services for HLA-based simulation • Simulation execution still uses a vendor-specific RTI

Grid Facilitated Approach • RTI-G (Choi 2005) • An RTI execution environment based on OGSA • Utilizes MDS, GRAM and GridFTP for dynamic resource allocation and automatic simulation execution • Simulation Grid (Li 2006) • Dynamic and secure resource sharing • Optimized resource utilization • Collaborative activities and fault tolerance for distributed simulations • Registration services for dynamic model resource discovery • Scheduling services for proper simulation deployment • Drawbacks of the Grid-facilitated approach • Still relies on a vendor-specific RTI for simulation communication • Requires cross-domain trust and a prior security setup before simulations can be conducted across administrative domains, which is very cumbersome

Grid Enabled Approach • Grid service interfaces are provided to enable HLA-based distributed simulations in a Grid environment • One form: a client federate communicates with a federate server using Grid services, and the federate server, representing the client, joins an HLA-based distributed simulation using a vendor-specific RTI • Another form: different federations are executed using a vendor-specific RTI at local sites, and Grid service interfaces are defined to link the federations to form a larger federation community • XMSF (Pullen 2005) • Integrates simulations with other applications • Web service interfaces are provided for simulation applications • Simulation communications are through a vendor-specific RTI • Web service APIs for the HLA Evolved standard (Moller 2006)

Grid Enabled Approach • GDSA – Grid-based Distributed Simulation Architecture (Zhu 2006) • Grid service interfaces are defined for a federation • Multiple federations can be integrated through the defined Grid service interfaces • Simulation Grid (Chai 2006) • RTI Grid service components provide non-real-time communication services between federations • SSB – Grid-based Simulation Service Bus (Xu 2006) • A remote federate can join a local federation in a LAN • Multiple federations are able to exchange data over the Grid • Drawback of the Grid-enabled approach • Vendor-specific RTI execution environments and communication servers have to be set up beforehand, which lacks flexibility

Grid Oriented Approach • The RTI is implemented using Grid services according to the HLA specification • The six HLA service groups should be mapped to different Grid services in order to create a Service Oriented Architecture • This approach was raised in Fox’s keynote at DS-RT 2005 (Fox 2005)
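One way to picture this approach is a router that sends each HLA service call to the Grid service responsible for its service group. The Java sketch below classifies a few representative IEEE 1516 service names (it is not exhaustive) and maps each of the six groups to a separate endpoint; the URL scheme and the `ServiceGroupRouter` class are hypothetical and do not describe the interface of any particular Grid oriented RTI.

```java
import java.util.EnumMap;
import java.util.Locale;
import java.util.Map;

class ServiceGroupDemo {
    public static void main(String[] args) {
        ServiceGroupRouter router = new ServiceGroupRouter("https://grid.example.org/rti");
        System.out.println(router.endpointFor("timeAdvanceRequest"));
        // https://grid.example.org/rti/time-management
    }
}

// The six HLA service groups.
enum ServiceGroup {
    FEDERATION_MANAGEMENT, DECLARATION_MANAGEMENT, OBJECT_MANAGEMENT,
    OWNERSHIP_MANAGEMENT, TIME_MANAGEMENT, DATA_DISTRIBUTION_MANAGEMENT
}

// Hypothetical router: one Grid service endpoint per service group.
class ServiceGroupRouter {
    private final Map<ServiceGroup, String> endpoints = new EnumMap<>(ServiceGroup.class);

    ServiceGroupRouter(String baseUrl) {
        for (ServiceGroup g : ServiceGroup.values()) {
            // Hypothetical URL scheme, e.g. .../time-management
            endpoints.put(g, baseUrl + "/" + g.name().toLowerCase(Locale.ROOT).replace('_', '-'));
        }
    }

    // Classifies a few representative IEEE 1516 service names; not exhaustive.
    static ServiceGroup groupOf(String service) {
        switch (service) {
            case "createFederationExecution":
            case "joinFederationExecution":
            case "resignFederationExecution":
                return ServiceGroup.FEDERATION_MANAGEMENT;
            case "publishObjectClassAttributes":
            case "subscribeObjectClassAttributes":
                return ServiceGroup.DECLARATION_MANAGEMENT;
            case "registerObjectInstance":
            case "updateAttributeValues":
            case "sendInteraction":
                return ServiceGroup.OBJECT_MANAGEMENT;
            case "attributeOwnershipAcquisition":
            case "unconditionalAttributeOwnershipDivestiture":
                return ServiceGroup.OWNERSHIP_MANAGEMENT;
            case "enableTimeRegulation":
            case "enableTimeConstrained":
            case "timeAdvanceRequest":
                return ServiceGroup.TIME_MANAGEMENT;
            case "createRegion":
            case "deleteRegion":
                return ServiceGroup.DATA_DISTRIBUTION_MANAGEMENT;
            default:
                throw new IllegalArgumentException("Service not classified in this sketch: " + service);
        }
    }

    String endpointFor(String service) {
        return endpoints.get(groupOf(service));
    }
}
```

Splitting the RTI along service-group lines lets each group be deployed, scaled or replaced as an independent Grid service, which is the point of mapping the groups onto separate services.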

Non-HLA Approaches • Distributed simulations based on GT3 Core (Zhang 2003) • Separation of simulation resources and simulation applications with a server responsible for the organization of resources • IDSim – Interoperable Distributed Simulation (Fitzgibbons 2004) • A framework that builds upon OGSA to provide distributed simulation services to federated simulators • Grid Aware Time Warp (Iskra 2004) • A framework for executing optimistic protocols in a cluster or Grid environment • Drawback of non-HLA approaches • In most of these systems, there is no standard high level communication protocol defined

Our Projects • A Load Management System for Running HLA-based Distributed Simulations over the Grid (Cai 2002, Cai 2005) Grid Facilitated Approach

Our Projects • Grid Services and Service Discovery for HLA-based Distributed Simulation (Zong 2004) Grid Facilitated Approach

Our Projects • Provisioning for HLA-based Distributed Simulation on the Grid (Xie 2005) Grid Enabled Approach This framework was integrated with HLA RePast to give an HLA Grid RePast platform (Chen 2008)

Our Projects • SOAr-DSGrid: Service-Oriented Architecture for Distributed Simulation on the Grid (Chen 2006) Non-HLA Approach • A component-based distributed simulation framework with two different views of a component • User-level view: • Component-based M&S • Execution view: SOA • Mapping between the two views

Our Projects • A Service Oriented HLA RTI on the Grid (Pan 2007) Grid Oriented Approach • This project creates an SOHR framework, which will be discussed in detail in the second half of this tutorial

3. Decoupled Federate Architectures

Decoupled Federate Architectures • Traditional Federate Architectures • Decoupled Federate Architectures • Design • Implementation • Benefits of Decoupled Federate Architectures • Federate migration • Federate replication • Fault tolerance of federate and RTI

Traditional Federate Architectures • The CRC (Central RTI Component) manages the whole federation • The federates are the simulation components • The LRC (Local RTI Component) is used by each federate to participate and communicate in the federation • The RTI interface connects the federate and its LRC

Traditional Federate Architectures • Role of the federate: • Simulates the behavior of the simulation component • Handles the local computation • Keeps the local state of the simulation component • Role of the LRC: • Handles the local federate’s information exchange and requests for time advancement • Handles the distributed computation • Keeps the part of the RTI state that concerns the local federate • Role of the RTI interface: • Delivers RTI services and callbacks
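The handling of requests for time advancement can be illustrated with a simplified conservative scheme: a federate that requests an advance to time T is granted it only when no other federate can still send it an event with a timestamp of T or earlier, judged from each other federate's current (or requested) time plus its lookahead. The Java sketch below centralizes this check in one hypothetical `ToyTimeManager` object. It is not the exact IEEE 1516 time management algorithm (message delivery, zero lookahead, the time-regulating and time-constrained switches and next-event requests are all omitted), and real RTIs divide this work differently between the CRC and the LRCs.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class TimeDemo {
    public static void main(String[] args) {
        ToyTimeManager tm = new ToyTimeManager();
        tm.join("A", 1.0);
        tm.join("B", 1.0);
        tm.timeAdvanceRequest("A", 1.0);
        System.out.println(tm.isGranted("A"));  // false: B could still send an event at 1.0
        tm.timeAdvanceRequest("B", 1.0);
        System.out.println(tm.isGranted("A") + " " + tm.isGranted("B"));  // true true
    }
}

// Hypothetical, centralized sketch of conservative time advancement.
class ToyTimeManager {
    private static final class State {
        final double lookahead;
        double time;        // current (granted) logical time
        Double requested;   // pending time advance request, or null if none
        State(double lookahead) { this.lookahead = lookahead; }
    }

    private final Map<String, State> federates = new LinkedHashMap<>();

    void join(String name, double lookahead) {
        federates.put(name, new State(lookahead));
    }

    void timeAdvanceRequest(String name, double t) {
        State s = federates.get(name);
        if (s.requested != null || t < s.time) {
            throw new IllegalStateException("Invalid time advance request");
        }
        s.requested = t;
        grantWhatIsSafe();
    }

    boolean isGranted(String name) {
        return federates.get(name).requested == null;
    }

    double currentTime(String name) {
        return federates.get(name).time;
    }

    // Simplified promise: federate f sends nothing with a timestamp below this value.
    private static double guarantee(State f) {
        return (f.requested != null ? f.requested : f.time) + f.lookahead;
    }

    // Grants every pending request that no other federate can still undercut,
    // repeating until no further grant is possible.
    private void grantWhatIsSafe() {
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Map.Entry<String, State> e : federates.entrySet()) {
                State s = e.getValue();
                if (s.requested == null) continue;
                double bound = Double.POSITIVE_INFINITY;
                for (Map.Entry<String, State> o : federates.entrySet()) {
                    if (!o.getKey().equals(e.getKey())) {
                        bound = Math.min(bound, guarantee(o.getValue()));
                    }
                }
                if (s.requested < bound) {
                    s.time = s.requested;
                    s.requested = null;
                    changed = true;
                }
            }
        }
    }
}
```

The comparison is strict because another federate whose guarantee equals T could still send an event stamped exactly T, which must be delivered before the advance to T is granted.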

Traditional Federate Architectures • Tightly coupled architecture: • The RTI interface is provided as an API in a traditional programming language (e.g., C++ or Java) • The LRC is provided by RTI developers as a library • The federate is compiled and linked with the LRC • Each federate has one and only one tightly coupled LRC • The federate and its LRC are executed in the same process, sharing the same memory space


Design of Decoupled Federate Architectures • The DRC (Decoupled RTI Component) plays the role of the LRC in the traditional federate architecture • The federate and its DRC are executed in different processes • The federate and its DRC are connected via external communication: IPC, shared memory, sockets, or Web/Grid services

RTI-Independent Implementation • RTI-independent decoupled federate architecture • Suitable for different RTI implementations • A middleware layer is added between the federate and the DRC • The federate manager acts as a client • The RTI manager acts as a proxy for the federate • The federate manager and RTI manager communicate via external communication
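The client/proxy split between the federate manager and the RTI manager can be sketched in Java, with two in-process queues standing in for the external communication channel (a socket or Web/Grid service in a real deployment). The federate manager marshals each RTI call into a message and waits for the reply; the RTI manager, running in its own thread to represent the separate DRC process, unmarshals the message and invokes the RTI on the federate's behalf. `FederateManager`, `RtiManager`, `LocalRti` and the `operation|argument` message format are hypothetical.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class DecoupledDemo {
    public static void main(String[] args) throws InterruptedException {
        // Stand-ins for the external channel between the two processes.
        BlockingQueue<String> requests = new LinkedBlockingQueue<>();
        BlockingQueue<String> replies = new LinkedBlockingQueue<>();
        // The "DRC side": an RTI manager wrapping a hypothetical local RTI.
        Thread drc = new Thread(new RtiManager(requests, replies,
                (op, arg) -> "ok:" + op + "(" + arg + ")"));
        drc.start();
        FederateManager fm = new FederateManager(requests, replies);
        System.out.println(fm.call("joinFederation", "TrafficSim")); // ok:joinFederation(TrafficSim)
        fm.close();
        drc.join();
    }
}

// Hypothetical stand-in for the RTI component driven by the RTI manager.
interface LocalRti {
    String invoke(String operation, String argument);
}

// Federate side (client): marshals each call into a message and blocks for the reply.
class FederateManager {
    private final BlockingQueue<String> requests;
    private final BlockingQueue<String> replies;

    FederateManager(BlockingQueue<String> requests, BlockingQueue<String> replies) {
        this.requests = requests;
        this.replies = replies;
    }

    String call(String operation, String argument) throws InterruptedException {
        requests.put(operation + "|" + argument);   // operation must not contain '|'
        return replies.take();
    }

    void close() throws InterruptedException {
        requests.put(RtiManager.SHUTDOWN);
    }
}

// RTI side (proxy for the federate): unmarshals messages and invokes the RTI.
class RtiManager implements Runnable {
    static final String SHUTDOWN = "__shutdown__";   // hypothetical control message
    private final BlockingQueue<String> requests;
    private final BlockingQueue<String> replies;
    private final LocalRti rti;

    RtiManager(BlockingQueue<String> requests, BlockingQueue<String> replies, LocalRti rti) {
        this.requests = requests;
        this.replies = replies;
        this.rti = rti;
    }

    @Override
    public void run() {
        try {
            while (true) {
                String msg = requests.take();
                if (msg.equals(SHUTDOWN)) {
                    return;
                }
                int sep = msg.indexOf('|');
                replies.put(rti.invoke(msg.substring(0, sep), msg.substring(sep + 1)));
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Because the federate only sees the federate manager, the RTI implementation behind the RTI manager can be changed without relinking the federate, which is what makes this variant suitable for different RTI implementations.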

RTI-Dependent Implementation (a) • RTI-dependent decoupled federate architecture • Suitable for a particular RTI implementation • The Web/Grid services federate connects to the RTI directly • The DRC provides a novel RTI interface via a Web/Grid Services HLA API

RTI-Dependent Implementation (b) • RTI-dependent decoupled federate architecture • Suitable for a particular RTI implementation • A federate manager is used to support legacy federates that use a traditional RTI interface • The DRC provides a novel RTI interface via a Web/Grid Services HLA API • The federate manager translates between the traditional and novel RTI interfaces
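The translation performed by the federate manager can be sketched under one assumption: that the Web/Grid service interface is request/response only, so the DRC queues callbacks until the client asks for them. The hypothetical `LegacyFederateManager` below turns that pull interface back into the push-style callbacks a legacy federate expects, in the spirit of the `tick()` call that DMSO 1.3 federates use to let the RTI deliver callbacks. All class names and callback strings are illustrative.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

class AdapterDemo {
    public static void main(String[] args) {
        ServiceSideDrc drc = new ServiceSideDrc();
        List<String> received = new ArrayList<>();
        LegacyFederateManager manager = new LegacyFederateManager(drc, received::add);
        drc.enqueueCallback("reflectAttributeValues(Car, position=42)");
        drc.enqueueCallback("timeAdvanceGrant(10.0)");
        int delivered = manager.tick();   // the legacy federate asks for its callbacks
        System.out.println(delivered + " " + received);
    }
}

// Hypothetical push-style callback interface expected by a legacy federate.
interface LegacyCallbacks {
    void callback(String description);
}

// Hypothetical DRC behind a request/response service interface: it cannot push,
// so callbacks wait in a queue until the client polls for them.
class ServiceSideDrc {
    private final Queue<String> pending = new ArrayDeque<>();

    void enqueueCallback(String callback) {
        pending.add(callback);
    }

    // One service operation: return and clear all pending callbacks, oldest first.
    List<String> pollCallbacks() {
        List<String> batch = new ArrayList<>(pending);
        pending.clear();
        return batch;
    }
}

// Hypothetical federate manager translating the pull interface into push callbacks.
class LegacyFederateManager {
    private final ServiceSideDrc drc;
    private final LegacyCallbacks federate;

    LegacyFederateManager(ServiceSideDrc drc, LegacyCallbacks federate) {
        this.drc = drc;
        this.federate = federate;
    }

    // Polls the DRC once and invokes the federate's callback for each result, in order.
    int tick() {
        List<String> batch = drc.pollCallbacks();
        for (String cb : batch) {
            federate.callback(cb);
        }
        return batch.size();
    }
}
```

Batching all pending callbacks into one poll keeps the number of remote calls low, which matters when each poll is a Web/Grid service invocation over a WAN.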


Benefits of Decoupled Federate Architectures • Additional functions can be embedded in decoupled federate architectures for purposes such as: • Federate migration • Fault tolerance of the federate and RTI • etc.

Federate Migration in Traditional Federate Architectures • Problems: • The federate and its LRC need to be migrated together • Both the federate state and the RTI state in the LRC need to be transferred • The RTI and other federates should be informed about the migration • Federation-wide Save and Restore: • Federation-wide synchronization is needed • The complete RTI internal state is saved and restored • “Freeze-free” migration: • A complex protocol is needed • Risk of message duplication and loss

Federate Migration in Decoupled Federate Architectures • The migrating and restarting federates connect to the same RTI manager and DRC • The DRC keeps its existing connection to the federation • The migrating federate is terminated when the restarting one has caught up
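The catch-up step can be sketched as follows, assuming the RTI manager logs every callback delivered to the federate since its last checkpoint and that the federate processes callbacks deterministically. While the restarting federate replays the log, the migrating federate keeps running and new callbacks are appended to the log; once both have processed the same number of callbacks, delivery switches to the restarting federate, and the DRC's connection to the federation is never touched. `MigratingRtiManager`, `CounterFederate` and the integer callbacks are hypothetical simplifications, not the protocol of any specific system.

```java
import java.util.ArrayList;
import java.util.List;

class MigrationDemo {
    public static void main(String[] args) {
        MigratingRtiManager mgr = new MigratingRtiManager(new CounterFederate());
        mgr.deliver(5);
        mgr.checkpoint();          // saved state: total 5
        mgr.deliver(3);
        mgr.deliver(4);
        mgr.startMigration();      // restarting copy restored from the checkpoint
        System.out.println(mgr.catchUp(1));   // false: still one callback behind
        mgr.deliver(1);            // the migrating federate keeps running meanwhile
        System.out.println(mgr.catchUp(10) + " " + mgr.active().total);  // true 13
    }
}

// Hypothetical deterministic federate: its whole state is a running total.
class CounterFederate {
    long total;      // simulation state
    int processed;   // number of callbacks handled so far

    void onCallback(int value) {
        total += value;
        processed++;
    }

    CounterFederate copy() {
        CounterFederate c = new CounterFederate();
        c.total = total;
        c.processed = processed;
        return c;
    }
}

// Hypothetical RTI manager supporting migration; the DRC behind it keeps its
// federation connection throughout, so only this layer changes.
class MigratingRtiManager {
    private CounterFederate active;       // federate currently receiving callbacks
    private CounterFederate restarting;   // copy being brought up on the new host
    private CounterFederate saved;        // last checkpoint of the active federate
    private final List<Integer> log = new ArrayList<>();  // callbacks since checkpoint

    MigratingRtiManager(CounterFederate federate) {
        active = federate;
        saved = federate.copy();
    }

    // A callback arriving from the DRC: logged, then delivered to the active federate.
    void deliver(int value) {
        log.add(value);
        active.onCallback(value);
    }

    void checkpoint() {
        if (restarting != null) {
            throw new IllegalStateException("Migration in progress");
        }
        saved = active.copy();
        log.clear();
    }

    void startMigration() {
        restarting = saved.copy();
    }

    // Replays up to 'steps' logged callbacks to the restarting federate. When it has
    // caught up, it takes over and the migrating federate can be terminated.
    boolean catchUp(int steps) {
        if (restarting == null) {
            throw new IllegalStateException("No migration in progress");
        }
        int base = saved.processed;
        for (int i = 0; i < steps && restarting.processed < active.processed; i++) {
            restarting.onCallback(log.get(restarting.processed - base));
        }
        if (restarting.processed < active.processed) {
            return false;
        }
        active = restarting;
        restarting = null;
        saved = active.copy();
        log.clear();
        return true;
    }

    CounterFederate active() {
        return active;
    }
}
```

Restoring from a checkpoint and replaying the log, rather than freezing the federate to copy its live state, lets the migrating federate continue executing while the new copy is brought up.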

Fault Tolerance in Traditional Federate Architectures • Problems: • It is difficult to locate the failure source • The federate and its LRC crash at the same time • The states of both the federate and the LRC must be recovered • The RTI and other federates should be informed so that they connect to the recovered federate
