resc.rdg.ac.uk

Data streaming, collaborative visualization and computational steering using Styx Grid Services

Jon Blower (1), Keith Haines (1), Ed Llewellin (2)
(1) Reading e-Science Centre, ESSC, University of Reading, RG6 6AL
(2) BP Institute for Multiphase Flow, Dept. of Earth Sciences, University of Cambridge, CB3 0EZ
http://www.resc.rdg.ac.uk
http://jstyx.sourceforge.net

Summary
We present the Styx Grid Service (SGS), a system that allows existing binary executables to be wrapped and exposed as remote services. A major advantage of the SGS architecture is that data can be streamed directly between service instances without the need for encoding in XML and without the data passing through a workflow enactor. Additionally, clients can monitor service data such as progress and status asynchronously, without requiring any incoming ports to be open through the firewall. As we shall show, the SGS architecture can be used in collaborative visualization and computational steering. This poster is a summary of Blower et al., 2005 [1]. Please see this paper and the project website (http://jstyx.sf.net) for more details.

Key principles
In the development of the SGS we were guided by several key principles:
• The software should be easy to install and use, relying on as few dependencies as possible.
• In creating Styx Grid Services, one should not have to modify existing executables.
• The system should be able to interoperate with other service types.
• The software should be lightweight and responsive.
• The system should be platform-independent as far as possible.
Following previous experience with the Inferno operating system [2] we decided that the Styx protocol for distributed systems was an appropriate base for the software. We decided to implement the Styx protocol in Java and use this as the foundation for the SGS.

Figure 1: The “namespace” (virtual file system) of a Styx Grid Service server. [Labels in the figure: root of the server; available Styx Grid Services; instances of the “sgs2” service.]
The SGS appears as a hierarchy of files on a file server:
• To create a new service instance, read from the “clone” file: this returns the ID of the new service instance.
• Set the parameters of the service by writing to the files in the “params/” directory.
• Run the service by writing “start” to the “ctl” file.
• Read the output of the service from the files in the “io/out” directory.
• Steer the service by writing to the files in the “steering/” directory.

Using the JStyx software to create Styx Grid Services

Creating an SGS server
Running an SGS server is simply a matter of constructing an XML configuration file (Figure 2, left) containing details such as the locations of the executables to wrap, the input files required and the parameters taken. A Java program then reads this configuration file and generates the server program.

Creating client programs
The SGSExplorer program (Figure 2, right) is a “universal client” for Styx Grid Services. It reads information from the server about the required inputs and outputs for a service and automatically generates a simple GUI. Alternatively, the API functions provided in the JStyx software allow custom client programs to be written with little knowledge of the underlying mechanisms.

The Styx protocol: everything is a file!
The Styx protocol [3] is a well-established protocol for building distributed systems. Styx is a key component of the Inferno and Plan 9 operating systems. In every Styx-based system, each resource is represented as a file or a set of files. However, in a Styx system the “files” are not always literal files on a hard disk. They can represent a block of RAM, the interface to a program, a physical device, a data stream or indeed anything. Styx can therefore be used as a uniform interface to access diverse resource types. Styx files are organized in a hierarchical filesystem known as a namespace.
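As a rough illustration of the “everything is a file” idea, the following Java sketch models the namespace of a single service in memory. It is not the JStyx API: the class ToySgsNamespace, its read and write methods, the “steps” parameter and the placement of the “clone” file at the root of the service are all assumptions made for illustration. Only the file names (clone, params/, ctl, io/out) are taken from the description of Figure 1.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy in-memory model of one service's namespace. Hypothetical; not the JStyx API. */
class ToySgsNamespace {

    /** State behind the "files" of a single service instance. */
    private static final class Instance {
        final Map<String, String> params = new LinkedHashMap<>();
        final List<String> output = new ArrayList<>();
    }

    private final Map<Integer, Instance> instances = new LinkedHashMap<>();
    private int nextId = 1;

    /** Reads a "file". Reading "clone" has a side effect: it creates an instance. */
    String read(String path) {
        if (path.equals("clone")) {
            int id = nextId++;
            instances.put(id, new Instance());
            return Integer.toString(id);
        }
        String[] parts = split(path);
        if (parts[2].equals("io/out")) {
            return String.join("\n", lookup(parts[1]).output);
        }
        throw new IllegalArgumentException("cannot read: " + path);
    }

    /** Writes to a "file": sets a parameter or sends a control message. */
    void write(String path, String data) {
        String[] parts = split(path);
        Instance inst = lookup(parts[1]);
        if (parts[2].startsWith("params/")) {
            inst.params.put(parts[2].substring("params/".length()), data);
        } else if (parts[2].equals("ctl") && data.equals("start")) {
            // A real server would launch the wrapped executable here;
            // this line merely stands in for its standard output.
            inst.output.add("started with " + inst.params);
        } else {
            throw new IllegalArgumentException("cannot write: " + path);
        }
    }

    /** Splits "instances/<id>/<file>" into exactly three parts. */
    private static String[] split(String path) {
        String[] parts = path.split("/", 3);
        if (parts.length != 3 || !parts[0].equals("instances")) {
            throw new IllegalArgumentException("no such file: " + path);
        }
        return parts;
    }

    /** Finds an instance by its ID, as returned from the "clone" file. */
    private Instance lookup(String id) {
        Instance inst = instances.get(Integer.valueOf(id));
        if (inst == null) {
            throw new IllegalArgumentException("no such instance: " + id);
        }
        return inst;
    }

    public static void main(String[] args) {
        ToySgsNamespace ns = new ToySgsNamespace();
        String id = ns.read("clone");                          // create an instance
        ns.write("instances/" + id + "/params/steps", "100");  // set a parameter
        ns.write("instances/" + id + "/ctl", "start");         // run the service
        System.out.println(ns.read("instances/" + id + "/io/out"));
    }
}
```

The point of the sketch is that the client only ever reads and writes named files. Whether a file is backed by a parameter table, a control channel or a program's output is hidden behind the same two operations.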
Since the Styx protocol only has to operate on files, it is very lightweight, containing only 13 commands, such as “open”, “read”, “write” etc.

Each resource in a Styx system can be represented very naturally as a URL, e.g. styx://myserver:9876/sensors/temperature might represent a file on a remote server that can be read to provide temperature data from a sensor. These URLs are effectively pointers to Styx resources.

Styx systems typically use persistent connections. The client initiates the connection and then messages can pass freely between the client and server until the connection is closed. This use of persistent connections means that clients can receive asynchronous messages without requiring incoming ports to be open through the firewall. This is the basis of the monitoring of service data (progress, status etc.) in the SGS architecture.

<gridservice name="lbflow" command="/path/to/lbflow -i input.sim" description="Lattice Boltzmann sim.">
  <streams>
    <outstream name="stdout"/>
    <outstream name="stderr"/>
  </streams>
  <serviceData>
    <serviceDataElement name="status"/>
    <serviceDataElement name="exitCode"/>
  </serviceData>
  <inputfiles>
    <inputfile path="input.sim"/>
  </inputfiles>
</gridservice>

Figure 2: (left) Excerpt from the XML configuration file for an SGS server; (right) screenshot of the SGSExplorer application.

Streaming data between services in workflows
The original motivation behind the development of the Styx Grid Service was the need to handle large binary datasets in workflows. We wanted to be able to pipe data directly between services, analogous to using the pipe operator on a Unix command line, except that the data will be transferred across the Internet. The methodology is as follows:
• We create a binary executable that writes data to its standard output (and maybe standard error too).
• In the SGS namespace we create a virtual file that represents the standard output (Fig. 1). When clients read from this file over the network they get the data.
• This file can be uniquely identified by a URL, e.g. styx://server:port/sgs1/instances/1/io/out. This URL is a pointer (or reference) to the stream and can be passed around by workflow engines.
• Downstream services can download data from this URL. They do not know the difference between downloading from a live data stream and downloading from a static file. They can then pass the data to the standard input of another executable.

Collaborative visualization
The SGS architecture allows several users to view live output from a running program simultaneously. Any number of clients can download from the same stream at the same time. This makes the construction of collaborative visualization applications straightforward.

Computational steering
Some programs allow parameters to be changed while the executable is running, allowing the user to see the effects of changing the parameters in a live setting. This is known as “steering”. One way to achieve this is to have the executable read the values of these parameters from local files, continuously polling the files for updates as the executable runs. In the SGS framework, these files can be manipulated via the Styx interface (see Fig. 1), allowing the computation to be steered remotely, usually while the results are being visualized.

Figure 3: Computational steering of a Lattice Boltzmann simulation. The slider in the top right is used to dynamically vary the pressure gradient driving the flow.

Wrapping Styx Grid Services as Web Services
In order to interoperate with other remote service types, Styx Grid Services can be wrapped, for example as a Web Service. The inputs to the Web Service would specify the values of all the input parameters. The Web Service would then create a new SGS instance, start it running and return the URL to the root of the new service instance (e.g. styx://server:port/sgs1/instances/2/) as its output. This URL can then be passed to other services in a workflow.
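The hand-over of URLs between services can be sketched with the standard java.net.URI class, which parses hierarchical URIs of any scheme. The class SgsUrls and its method names are hypothetical helpers rather than part of JStyx, and the host name “server” and port 9876 are placeholders.

```java
import java.net.URI;

/** Illustrative helpers that treat styx:// URLs as pointers to SGS resources. */
class SgsUrls {

    /** Builds the root URL of a service instance, ending in a slash. */
    static URI instanceRoot(String host, int port, String service, int instanceId) {
        return URI.create("styx://" + host + ":" + port + "/"
                + service + "/instances/" + instanceId + "/");
    }

    /** Derives the output stream URL ("io/out") from an instance root URL. */
    static URI outputStream(URI instanceRoot) {
        // URI.resolve replaces the last path segment unless the base ends
        // in "/", so a missing trailing slash is added first.
        String s = instanceRoot.toString();
        URI base = s.endsWith("/") ? instanceRoot : URI.create(s + "/");
        return base.resolve("io/out");
    }

    public static void main(String[] args) {
        // What a wrapping Web Service might return to the workflow engine.
        URI root = instanceRoot("server", 9876, "sgs1", 2);
        // What a downstream service would then read from.
        System.out.println(outputStream(root));
    }
}
```

Passing the root URL rather than the stream URL leaves it to each downstream service to derive whichever file of the instance it needs.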
Downstream services can get the output from the SGS instance from styx://server:port/sgs1/instances/2/io/out. SGSs could similarly be wrapped as WS-Resources: this is currently being investigated by Andrew Harrison of Cardiff University.

References
[1] J. Blower, K. Haines and E. Llewellin, Data streaming, workflow and firewall-friendly Grid Services with Styx, Proceedings of the UK e-Science All Hands Meeting, September 2005.
[2] J. Blower, K. Haines and A. Santokhee, Composing workflows in the environmental sciences using Inferno, Proceedings of the UK e-Science All Hands Meeting, September 2004.
[3] R. Pike and D. M. Ritchie, The Styx architecture for distributed systems, http://www.vitanuova.com/inferno/papers/styx.html
