Hpsearch and naradabrokering workflow scripting and stream management
This presentation is the property of its rightful owner.
Sponsored Links
1 / 31

HPSearch and NaradaBrokering: Workflow Scripting and Stream Management PowerPoint PPT Presentation


  • 55 Views
  • Uploaded on
  • Presentation posted in: General

Edinburgh December 3 2003. HPSearch and NaradaBrokering: Workflow Scripting and Stream Management. PTLIU Laboratory for Community Grids Geoffrey Fox, Harshawardhan S. Gadgil, Shrideep Pallickara Indiana University, Bloomington IN 47404 http://www.hpsearch.org http://www.naradabrokering.org.

Download Presentation

HPSearch and NaradaBrokering: Workflow Scripting and Stream Management

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript


Hpsearch and naradabrokering workflow scripting and stream management

Edinburgh December 3 2003

HPSearch and NaradaBrokering: Workflow Scripting and Stream Management

PTLIU Laboratory for Community Grids

Geoffrey Fox, Harshawardhan S. Gadgil, Shrideep Pallickara

Indiana University, Bloomington IN 47404

http://www.hpsearch.org

http://www.naradabrokering.org

[email protected]


Backdrop

Backdrop

  • Workflow systems have several components

  • Development Environment – Graphical User Interface

  • Specification Language such as BPEL4WS

  • Some interface between specification and runtime (compiler?)

  • Run-time managing linkage of services, error handling and notification

  • This project contributes to

    • Procedural specification of workflow

    • Stream management part of run-time

  • Workflow is sufficiently complex that we ought to agree on a general architecture so we can each build parts and link together


Comments on standards

Comments on Standards

  • In this talk, workflow is synonymous with “Programming the Grid/Internet”

  • We never agreed on a programming model for the simpler case of “Programming a CPU” so not very likely we will agree on standards for workflow

  • We did roughly agree on standards within a particular language Fortran v. Java v C++ v C# v Lisp

  • We also had a little more agreement on common run-time than on languages but not complete


What to remember about this talk

What to remember about this talk

  • All streams (flow between ports of a web service) are handled by publish-subscribe messaging infrastructure

    • Allows robust data transfer with adaptive routing e.g. allows use of GridFTP

    • Supports full concurrency inter and intra streams

  • Data, Streams, Files, Web Services are manipulated by a scripting language analogous to Shell and Perl in UNIX

  • http://www.hpsearch.org has the details

  • Software will be included in open-source release of http://wwww.naradabrokering.org

    • NB Version 0.93 today; 1.0 February 04; 2.0 for SC04 includes HPSearch


Scripting environment i

Scripting Environment I

  • HPSearch is designed as a scripting interface to the Internet (Grid) using currently the Rhino implementation of Javascript

    • Could use Python, Perl

    • Called HPSearch because could access variables either by URI or by search interface (To Google Web Service) but this is not relevant here

  • x = ‘wsdl:http://156.56.104.155:8080/axis/services/Calculator.jws/add(5, 6)‘;

  • Or x = ‘wsdl:WSDL for WS/function(arguments)‘;

  • Returns x=11 if function adds its arguments

  • Could follow by y=x+1 setting y=12 etc.

  • Can access any data in this fashion and support normal capabilities supported in most languages (set x=data and use I/O)

    • Perhaps prefer all I/O to go through Web Services


Scripting environment ii

WS1

WS3

WS4

WS6

WS2

WS5

Script1

Script2

Script3

Scripting Environment II

  • So scripting environment can manipulate its own variables and methods as usual but can also invoke any web service with the wsdl: primitive

  • xpath: primitive evaluates an XPath query against a local variable defined (say by return from a Web service) as an XML instance

  • Can have multiple communicating scripting engines

This is scripting control; workflow is between web services


Naradabrokering

NaradaBrokering

Audio/Video

Conferencing Client

Computer

Modem

Server

Peers

NaradaBrokering Broker

Network

Minicomputer

Firewall

Laptop computer

Workstation

Peers

Audio/Video

Conferencing Client

PDA

Web Service B

Queues

Stream

Web Service A


Grid messaging substrate

Service

Consumer

SOAP+HTTPGridFTPRTP ….

Messaging Substrate

Any Protocol satisfying QoS

Grid Messaging Substrate

SOAP+HTTPGridFTPRTP ….

Standard client-server

style communication.

Consumer

Service

Substrate mediated

communication removes

transport protocol dependence.

Protocols have become overloaded e.g. MUST use UDP for A/V latency requirements but MUSTn’t use UDP as firewall will not support ………


Naradabrokering1

NaradaBrokering

  • Based on a network of cooperating broker nodes

    • Cluster based architecture allows system to scale in size

  • Originally designed to provide uniform software multicast to support real-time collaboration linked to publish-subscribe for asynchronous systems.

  • Now has several core functions

    • Reliable order-preserving “Optimized” Message transport (based on performance measurement) in heterogeneous multi-link fashion with TCP, UDP, SSL, HTTP, and will add GridFTP

    • General publish-subscribe including JMS & JXTA and support for RTP-based audio/video conferencing

    • Distributed XML event selection using XPATH metaphor

    • QoS, Security profiles for sent and received messages

    • Interface with reliable storage for persistent events


Laudable features of naradabrokering

Laudable Features of NaradaBrokering

  • Is open sourcehttp://www.naradabrokering.org

  • Has end-point “plug-in” as well as standalone brokers

  • Will have a discovery service to find nearest brokers and manage topics

  • Does tunnel through many firewalls without requiring ports to be opened

  • Supports JXTA, JMS (Java Message Service) and more powerful native mode

  • Transit time < 1 millisecond per broker

  • Initial version of setup and broker network administration module

    • Currently expect to use HPSearch scripts to specify setup


Naradabrokering naturally supports

NaradaBrokering Naturally Supports

  • Filtering of events to support different client requirements (e.g,. PDA versus desktop, slow lines, different A/V codecs)

  • Virtualization of addressing, routing, interfaces

  • Federation and Mediation of multiple instances of Grid services as illustrated by

    • Composition of Gridlets into full Grids (Gridlets are single computers in P2P case)

    • JXTA with peer-group forming a Gridlet

  • Monitoring of messages for Service management and general autonomic functions

  • Fault tolerant data transport

  • Virtual Private Grid with fine-grain Security model


Naradabrokering communication

NaradaBrokering Communication

  • Applications interface to NaradaBrokering through UserChannels which NB constructs as a set of links between NB Brokers acting as “waystations” which may need to be dynamically instantiated

  • UserChannels have publish/subscribe semantics with XML topics

  • Links implement a single conventional “data” protocol.

    • Interface to add new transport protocols within the Framework

    • Administrative channel negotiates the best available communication protocol for each link

  • Different links can have different underlying transport implementations

    • Implementations in the current release include support for TCP,UDP, Multicast, SSL, RTP and HTTP.

    • GridFTP most interesting new protocol

    • Supports communication through proxies and firewalls such as iPlanet, Netscape, Apache, Microsoft ISA and Checkpoint.


Manipulating streams

Manipulating Streams

  • flow: primitive manages streams between Web services

  • There is service-oriented workflow where streams are typically implicit. Here HPSearch supports UNIX style pipe and tee and we have trivial examples

  • For stream-oriented, the streams are explicit. We have built a sophisticated system GlobalMMCS but it is currently not supported in HPSearch

  • HPSearch will become control engine for NaradaBrokering when streams are “just” message flows on the Grid. Here one would use NB discovery services – find streams – and monitor

    • In this view a client talking to a Web Service is workflow


Hpsearch flow example

y1

y2

z1

z2

x

HPSearch Flow Example

  • // The input file        

  • x = "file:///u/hgadgil/datafile.txt";

  •   // Reverses every line in the i/p e.g. abcd becomes dcba   

  • y1 = "156.56.104.155:5050";

  • // Computes the length of each line minus the last (\n or \r)

  • y2 = "156.56.104.155:6060";

  • // And finally the outputs...        

  • z1 = "file:///u/hgadgil/reversed.txt";        

  • z2 = "file:///u/hgadgil/length.txt";     

  • `flow: x &> (y1 | z1), (y2 | z2)`;

NaradaBrokering Queue

T Pipe Pipe


Another example

q

storage1

y2

z2

Another Example

  • `flow: x &> (y1|z1 &> p,(q|storage1)), (y2|z2|storage2)`;

  • Note this approach allows for example all workflow streams to use RMI, GridFTP, RTP – your or rather NaradaBrokering’s choice

y1

z1

p

x

storage2

NaradaBrokering Topic (Queue)


Stream oriented workflow

Stream–oriented Workflow

  • As in audio-video conferencing and multimedia file delivery where it’s the media streams that are the “point”

  • Services generate and transform streams but one thinks of streams going through services rather than services generating streams

  • Multi-cast streams where video from one client sent to all other participants in a collaborative session common

  • One thinks of a stream being published and participants subscribing to it.

Subscribe

Pub/Sub Queue

Publish


Xgsp web service mcu architecture

Session Server

XGSP-based Control

Media Servers

Filters

NaradaBrokering

All Messaging

Admire

SIP

H323

Access Grid

Native XGSP

XGSP Web Service MCU Architecture

Use Multiple Media servers to scale to many codecs and many

versions of audio/video mixing; should allow all e-Scientists to be connected

WebServices

NB Scales asdistributed

High Performance (RTP)and XML/SOAP and ..

Gateways convert to uniform XGSP Messaging

NaradaBrokering


Service oriented workflow i

Elastic Dislocation Inversion

Viscoelastic FEM

Viscoelastic Layered BEM

Elastic Dislocation

Pattern Recognizers

Fault Model BEM

Service-oriented Workflow I

  • As in follow of data between different simulation programs where one has a program (which becomes a Web service) view and data flow between programs often not explicitly interesting


Service oriented workflow ii

Service-oriented Workflow II

  • Initial input and output files identified with perhaps a visualization as output

  • In many implementations such as ours in earthquake example one writes and reads files for stream interface

    • Sometimes one wants the intermediate output files

  • AVS and such visualization and image processing systems have such a model using streams

  • Multicast not important per-se; use a publish/subscribe mechanism as it is fault-tolerant and higher performance and not because of multi-cast support


Streams and data

Streams and Data

  • Scripting engine can either define topics or find them out from NaradaBrokering discovery service

  • Run-time ensures that all I/O goes through NaradaBrokering

    • Note one either uses a proxy or builds NaradaBrokering interface into Web service

    • Proxy should be near Web Service as only NaradaBrokering “guarantees” firewall penetration, fault-tolerance, performance

    • NaradaBrokering needs improved discovery system

  • NaradaBrokering and Scripts are distributed so no central bottlenecks


Naradabrokering in practice

NB BrokerNetwork

Client

WS

NBEndpt

NBEndpt

NB Streams

NB BrokerNetwork

WS

Client

Proxy

Proxy

NBEndpt

NBEndpt

NB Streams

NaradaBrokering in practice

  • One can “best” insert NaradaBrokering end-point interface into each client or web service

  • But proxy model easiest for existing applications

“Native Communication” – cannot use added value of NBincluding fault tolerance. Current GridFTP Implementation


Entities in hpsearch

Entities in HPSearch

  • Each Script is a Web Service

  • Each Web Service, File, Web Page has a URI and can be accessed by a Script

    • HPSearch at its heart was URI’s bound to Javascript

  • Publish/Subscribe system defines topics which are the URI of streams. Note syntax is often

    • topic://Session URI/stream1 with classic hierarchical labeling

  • Scripts need discovery system to keep track of URI’s and in particular the session URI (which plays role of context) -- currently this is same as NaradaBrokering Discovery System

    • Pub/Sub Streams typically support conversations with related streams topic://Session URI/stream1/WS-A and topic://Session URI/stream1/WS-B to allow Web services A and B to interact


Publish subscribe topics

Publish/Subscribe Topics

  • One has “data” which has perhaps an intrinsic URI

  • For files and web pages, we have as well the location URL

  • I think Publish/Subscribe topic is like the URI for streams and it is instantiated as a particular queue (or set of queues) in NB

  • In NB Topics are integers (for performance), URI style or general XML instances

  • Note that session topic can be thought of as “context” for messages sent to topic as it provides intrinsic information as to meaning of stream (cf. OGSI; WS-Addressing WS-Context WS-Reliable Messaging and WS-Routing)

  • Topics for streams and sessions virtualize destination, routing and context


Role of pub sub queues

Role of Pub/Sub Queues

  • One can think of N/B as providing an operating service to transmit streams between end-points with various value-added capabilities

  • Messages are the units of a stream

  • Events are messages with time-stamps (which could be absent); so events are messages and vice versa

  • Streams are ordered collections of messages

    • NB manipulates streams and collections of streams

    • Delivery is guaranteed order preserving

  • NB provides a virtual stream desktop which you can use to manipulate streams in same way you manipulate files in conventional O/S


Multiple input and output ports

Multiple Input and Output Ports

  • We can deal with Web Services with multiple input and output using an array notation but the &> Tee and | Pipe notation get clumsy

  • So can use explicit notation such as

    • x.port[0].publish = NBTopicA;

    • y1.port[0].subscribe = NBTopicA;

    • y2.port[0].subscribe = NBTopicA;

  • This would also be natural way of implementing stream-oriented workflow

  • Errors and notifications would be easy in this syntax

    • notifyTOPIC = SessionTOPIC + ‘/notify’;

    • x.notify.publish = notifyTOPIC;

    • scriptasaWS.port[1].subscribe = notifyTOPIC;


Hpsearch administrative interface to nb

HPSearch Administrative Interface to NB

  • One can build administrative policies and procedures by flowing administrative and monitoring data to appropriate scripting engines

    • performanceTOPIC = SessionTOPIC + ‘/performance’;

    • nbws = NBDiscover(“aggregateperformancews”)

    • nbws.performancedata.publish = performanceTOPIC;

    • scriptasaWS.port[2].subscribe = performanceTOPIC;

    • Niftyperformanceanalyser(scriptasaWS.port[2]);

    • …….

  • This example pipes performance data from NaradaBrokering and spawns some analysis

  • NB provides for each link (broker to broker, broker to end-point) available bandwidth, used bandwidth, latency etc.


Other nb features to be added to hpsearch

Other NB Features to be added to HPSearch

  • Full details of available Brokers and Stable storage

  • Pending queue sizes

  • Message statistics – size, number per second, time since since last message – at brokers and end-points

  • Current stream sequence number at different parts of pipeline from source to destination

  • Heartbeat Information

  • Active Topics and list of publishers and subscribers (subject to security restrictions)

  • Fault tolerance statistics including those subscribed end-points which are “down”


Hpsearch and naradabrokering workflow scripting and stream management

Mean transit delay for message samples in

NaradaBrokering: Different communication hops

9

hop-2

hop-3

8

hop-5

7

hop-7

6

5

Transit Delay (Milliseconds)

4

3

2

1

0

100

1000

Message Payload Size (Bytes)

Pentium-3, 1GHz,

256 MB RAM

100 Mbps LAN

JRE 1.3 Linux


Hpsearch and naradabrokering workflow scripting and stream management

Average delays per packet for 50 video-clients

NaradaBrokering Avg=2.23 ms, JMF Avg=3.08 ms

60

NaradaBrokering-RTP

JMF-RTP

50

40

30

Delay (Milliseconds)

20

10

0

0

200

400

600

800

1000

1200

1400

1600

1800

2000

Packet Number


Hpsearch and naradabrokering workflow scripting and stream management

Average jitter (std. dev) for 50 video clients.

NaradaBrokering Avg=0.95 ms, JMF Avg=1.10 ms

8

NaradaBrokering-RTP

JMF-RTP

7

6

5

4

Jitter (Milliseconds)

3

2

1

0

0

200

400

600

800

1000

1200

1400

1600

1800

2000

Packet Number


  • Login