
Towards a loosely coupled and scalable component set for scheduling bulk data copying across different storage resources as fault tolerant batch jobs.

David Meredith (1), Stephen Crouch (2), Peter Turner (3), Gerson Galang (4), Ming Jiang (5), Hung Nguyen (6)

(1) NGS, Science and Technology Facilities Council, Daresbury Labs, UK

(2) OMII-UK, School of Electronics and Comp Sci, University of Southampton, UK

(3) University of Sydney, Sydney, Australia

(4) Victorian eResearch Strategic Initiative (VeRSI), Victoria, Australia

(5) NGS, Science and Technology Facilities Council, Daresbury Labs, UK

(6) University of Sydney, Sydney, Australia

Australia (DataMINX)

United Kingdom:

Overview / Aims

  • An open-source project developing a set of loosely coupled components for efficiently brokering data copies between a wide range of (potentially incompatible) storage resources as schedulable, fault-tolerant batch jobs (ftp, gridftp, srb, irods, sftp, file, webdav, srm?).

  • To scale from small embedded deployments to large distributed deployments through an expandable ‘worker-node pool’ controlled through message-oriented middleware (MOM, JMS).

  • To maximize data access and transfer efficiency through the strategic placement and subscription of worker-nodes at or between particular data sources/sinks.

  • To be inherently asynchronous and side-step the bandwidth, concurrency and scalability concerns for clients in networks with limited capability relative to the direct connectivity between the source and sink.

  • Aims to address geographical-topological deployment concerns by allowing service hosting to be either centralized (as part of a shared service), or confined to a single institution or domain.

  • Adoption of established design patterns and open source components which are coupled with a proposal for an open standards based messaging protocol.

  • Employs a single port-type document-centric model, with service semantics defined solely by the message model.

DTS Features / Intentions 1

  • 1. Encourage a common messaging model

  • We are engaging with OGF in the definition of an open standard describing a bulk data copy activity with subsequent control and event messages. The aim is to provide a key foundation in addressing the challenges of data management. Ideally the model will be standards based: we are engaging with the OGF DMI and JSDL groups and are also communicating with the Globus, Unicore and GridSAM developers (a longer-term perspective).

  • Platform independence

  • Includes the worker agent that manages a bulk data copy activity, the message broker, the message channel adapters that enable the different transports and protocols, and Commons VFS.

  • Adopts well recognized Enterprise Integration Patterns

  • Described in Hohpe and Woolf (2003): Competing Consumers, Service Activator, Selective Consumer, Polling Consumer, Message Driven Consumer, Transport Channel Adapter, Header Based Router.

DTS Features / Intentions 2

  • Value in the correct framework choice – deploy out of the box features in remoting, scaling, batching:

    • Spring Batch; one of the only open source batch processing frameworks currently available (purportedly the only?). It provides many functions that are essential in batch processing.

    • Spring Integration; supports the EAI patterns identified by Hohpe and Woolf. Importantly, it provides a set of inbound and outbound message-channel adapters for different integration options, both polling and message-driven (e.g. JMS subscription, file/directory polling, RMI, WS, email).

    • Message broker (e.g. Apache ActiveMQ or any JMS 1.1 message-channel MOM broker).

Buffering data via an intermediary when copying between incompatible resources / protocols

Client provides a single interface to different (potentially incompatible) storage resources, e.g. SRB, GsiFTP, FTP, SFTP, iRODS, file, WebDAV.

Client brokers between storage resources when third-party transfer is not available.

File operations (list, upload, download, delete, rename)

Client e.g. Portal/Hermes

Get and Put, or Mem buffer Bit pipe

Authentication tokens (un/pw, x509?)



Client-Side Intermediary


Auth tokens only in memory on one computer.

Self contained and interactive.

Extensible for new and emerging resources/protocols.


Software is required that is capable of enacting a data copying activity between a variety of sources and sinks (bit pipe via byte streams or combined get/put).

The client must be constantly available throughout the duration of the transfer.

Buffering of large quantities of data introduces bandwidth and concurrency concerns for clients residing on networks with limited capability (e.g. wireless connectivity) relative to the direct connectivity between the source and sink.
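The two copy styles mentioned above (a bit pipe via byte streams, or a combined get/put) can be sketched as follows. This is only an illustration in Python using in-memory streams as stand-ins for real source and sink connections; the function names and the chunk size are assumptions, not part of DTS.

```python
import io

CHUNK_SIZE = 64 * 1024  # assumed buffer size; tune for the network in use


def bit_pipe(source, sink, chunk_size=CHUNK_SIZE):
    """Stream bytes from source to sink, holding at most one chunk in memory."""
    copied = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # an empty read signals end of stream
            break
        sink.write(chunk)
        copied += len(chunk)
    return copied


def get_then_put(source, sink):
    """Combined get/put: fetch the whole object first, then upload it."""
    data = source.read()  # the intermediary buffers everything
    sink.write(data)
    return len(data)


# In-memory streams stand in for real source and sink connections.
payload = b"x" * 200_000
piped_sink = io.BytesIO()
bytes_piped = bit_pipe(io.BytesIO(payload), piped_sink)
```

Either way, every byte passes through the intermediary, which is why a client on a slow network becomes the bottleneck.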

DTS – Remotely Placed Worker Agents

  • Aim: Strategically place intermediary software agent(s) (e.g. at different institutions, within a network, at a local source/sink) and remotely invoke an appropriate agent using a message router with a ‘Bulk Data Copy Activity’ executed as a fault tolerant batch process. Best practice: process data as close to where it resides as possible.

  • 3 Core DTS Components:

  • Batch/Worker Agent. Software that will manage a bulk data copy activity. This is a batch operation: automated processing of large volumes of information that is most efficiently processed without user interaction (fire + forget).

  • Common Message format that describes a data copying activity with subsequent control and event messages.

    • Lists data sources and sinks.

    • Transfer requirements.

    • User credentials (so that the recipient worker can access the data on behalf of the user).

  • Message Broker/Router for routing of messages to appropriate workers and scaling via the Competing Consumer pattern.

DTS Architecture (Simplified)

Broker between remote sources and sinks


Meta-data system or data catalogue (ICAT) that provides list of data URLs and credentials. OR lightweight file operations directly interacting with source/sink (list, delete, rename)

Queue Channel

Data copy activity message.

Data copy: Get/Put or Bit pipe

Authentication tokens (un/pw, myproxy details)

DTS workers



DTS Architecture (Simplified)

Broker between local source and remote sink (and vice-versa)


Message Bus is a combination of a messaging infrastructure, a common data model and a command set that allows different systems to communicate through a shared set of interfaces (our message channels).

Facility Queue

Facility / Department Y


Home Lab

Facility / Department X


Deployment Strategies

Small – Local or embedded worker agent

Med – Single worker pool

Large – Multiple worker pools and message router

Source

Client (Service Activator)




1) Lightweight local worker deployment. The worker agent is invoked by a script or is integrated into an existing application. S = Submit message (bulk copy activity document), C = Control message, e = Event message.

Worker pool

















2) Distributed deployment with a single worker pool.

Worker pool A




JobQ Router


Worker pool B













ControlQ Router





3) Distributed deployment with multiple worker pools.

Core Component: Message Router / Broker

Schedule and route messages to strategically placed worker agents.

Scale with multiple agents using competing consumer pattern.


Scaling

  • How can the architecture scale for increasing loads?

  • Scale Out: Competing Consumer Pattern

  • To scale horizontally (or scale out) means to add more nodes to a system.

  • Scale Up: Multi-process Service Activator

  • To scale vertically (or scale up) means to add resources and/or processes to a single node in a system.

Scale Out – Competing Consumer Pattern

  • Only requirement is that the JMS client and consumer must be able to access the broker.

  • This provides location independence which enables scaling and clustering of services since multiple workers can be configured to pull messages from the same queue.

  • If the service becomes overburdened and falls behind in its processing, all that is needed is to start a few more worker instances to listen to the queue.

  • Consumers do not have to coordinate with each other which improves resilience, since workers can be added and removed without affecting each other.

Queue depth ok

JMS client


Broker (Queue)

Worker (Consumer)

Basic architecture is repeatable – use multiple brokers and queues as required (e.g. broker clusters, master/slave brokers etc.).
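The Competing Consumer pattern described above can be illustrated with a small in-process sketch. It uses Python's standard `queue` and `threading` modules in place of a JMS broker, so it shows only the pattern, not the DTS implementation; the function name and job labels are hypothetical.

```python
import queue
import threading


def run_worker_pool(jobs, worker_count):
    """Drain a shared queue with `worker_count` competing consumers."""
    job_queue = queue.Queue()
    for job in jobs:
        job_queue.put(job)

    handled = {}  # job -> name of the worker that processed it
    lock = threading.Lock()

    def worker(name):
        while True:
            try:
                job = job_queue.get_nowait()  # each job goes to exactly one worker
            except queue.Empty:
                return  # queue drained; this worker stops
            with lock:
                handled[job] = name
            job_queue.task_done()

    threads = [threading.Thread(target=worker, args=(f"worker-{i}",))
               for i in range(worker_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return handled


results = run_worker_pool([f"copy-job-{n}" for n in range(20)], worker_count=4)
```

The workers never talk to each other; raising `worker_count` is the only change needed to add capacity, which mirrors the point about adding worker instances to a busy queue.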

Message Routing

  • How can the appropriate remote worker(s) be invoked:

  • How to invoke a worker(s) that resides at the data source and/or sink?

  • How to invoke a worker(s) that is installed at my institution or within a specific network?

  • How to target a specific worker?

  • Multiple Destinations

  • Message Selectors

  • Hybrid Approach

Message Routing: Multiple Destinations

Multiple static/administered queues can be configured on one broker in order to partition workers into different groupings.

Main Advantages: Queue depth is directly related to load. Therefore load balancing can be performed effectively, since queues are not polluted with messages intended for other worker groupings. DTS should add new queues for different groupings (e.g. project queues, separate queues for different facilities).

Main Disadvantages: Changes are required on the broker to cater for new worker groupings (configuration of new administered queues). This does not provide a high level of decoupling between message producer and consumer since changes are required to the broker.

Worker groups

In DTS, multiple destinations are used to partition static queue consumer cluster groups, e.g. Request Q per facility, beam-line, project, institution etc.
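A minimal sketch of the multiple-destination approach, assuming one statically configured queue per worker group. The `StaticBroker` class and the queue names are illustrative stand-ins for administered JMS queues, not a real broker API.

```python
from collections import deque


class StaticBroker:
    """Broker with a fixed set of administered queues, one per worker group."""

    def __init__(self, queue_names):
        self._queues = {name: deque() for name in queue_names}

    def send(self, queue_name, message):
        if queue_name not in self._queues:
            # New groupings need a broker-side configuration change.
            raise KeyError(f"no administered queue named {queue_name!r}")
        self._queues[queue_name].append(message)

    def receive(self, queue_name):
        pending = self._queues[queue_name]
        return pending.popleft() if pending else None

    def depth(self, queue_name):
        """Queue depth reflects the load on that group only."""
        return len(self._queues[queue_name])


static_broker = StaticBroker(["RequestQa", "RequestQb", "RequestQc"])
static_broker.send("RequestQa", "copy job 1 for Facility A")
static_broker.send("RequestQa", "copy job 2 for Facility A")
static_broker.send("RequestQb", "copy job for Project B")
```

The `KeyError` branch is the disadvantage noted above in miniature: a producer cannot address a new group until the broker has been reconfigured.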

Request Qa

JMS clients

Group A (Facility A)

Request Qb

Group B

(Project B)

Request Qc

Group C

(Institution C)


Message Routing: Message Selectors

  • Message Selectors - workers can be ‘Selective Consumers’ and clients can be ‘Specifying Producers’. A message selector is an expression based on SQL92 conditional syntax, e.g.

  • Facility='FacilityX' AND BeamLine='ProteinMX' AND WorkerAccessKey='abcdefadsf_guuid'

  • Filtering is performed by the broker – it delivers only those messages that match the selective consumer’s criteria.

  • Importantly, workers can therefore decide which messages to process depending on their own selector statements.

  • Main benefit is that this approach is extensible: provides for a higher level of decoupling between message producer and receiver since clients and workers can be easily added without change to the broker.

  • Selectors are optional; this pattern can also be combined with the multiple destination approach to route messages as required (hybrid approach).

  • Selectors can be used to perform fine-grained routing and route messages however you require, e.g.

  • Route to first available worker in a particular group that specifies a common/shared selector value, e.g. a common ‘groupID’ AND/OR ‘networkID’ AND/OR ‘facilityGroup’ AND/OR ‘domain’ AND/OR ‘GB limit’ etc. (SQL).

  • Can route to a specific worker using a unique and opaque client identifier/access key, e.g. GUUID (this is ok since the broker performs filtering, so different workers don’t see each other's selectors). The specifying producer would need to persist this value between server restarts and across sessions.
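The selector-based routing described above can be sketched as a single queue whose broker filters on message headers. The sketch below uses plain Python predicates rather than parsing SQL92 selector strings, and the `SelectorBroker` class, the `header_equals` helper and the header values for the second message are all hypothetical.

```python
class SelectorBroker:
    """Single request queue; the broker filters per consumer selector."""

    def __init__(self):
        self._messages = []  # list of (headers, body) in arrival order

    def send(self, headers, body):
        self._messages.append((dict(headers), body))

    def receive(self, selector):
        """Return the oldest message whose headers satisfy `selector`."""
        for index, (headers, body) in enumerate(self._messages):
            if selector(headers):
                del self._messages[index]
                return body
        return None  # nothing matches this consumer's criteria


def header_equals(**expected):
    """Build a selector equivalent to key='value' AND key='value' ..."""
    def selector(headers):
        return all(headers.get(key) == value for key, value in expected.items())
    return selector


selector_broker = SelectorBroker()
selector_broker.send({"Facility": "FacilityX", "BeamLine": "ProteinMX"}, "job-1")
selector_broker.send({"Facility": "FacilityY", "BeamLine": "Powder"}, "job-2")

facility_y_worker = header_equals(Facility="FacilityY")
first_for_y = selector_broker.receive(facility_y_worker)
```

A new worker group only needs a new selector on the consumer side; nothing changes in the broker, which is the decoupling benefit the bullets describe.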


Request Q

Selective Consumers

Specifying Producers



Messages with selection values

Message Routing: Hybrid Approach

Best approach is to use a combination of the message filtering approach and the multi-destination approach to suit your service instance requirements.

The two approaches are not mutually exclusive and can be used together provided both patterns are catered for in your system.

Request Qa

Request Qb

Request Response (Client Worker Conversation)

ReplyTo header

Application ID exchange with message filtering

Temporary queues

Request Response (Conversation)

Request message contains a Return Address that indicates where to send the reply.

Return Address is added to the message header.

Consumer does not need to know where to send the reply; it can simply take the Return Address from the request.

Reply Channel 1

Reply Channel 2

Request Channel

Specifying Producers


Selective Consumer


Reply Channel 1

Reply Channel 2

Variations of this pattern depending on client requirements:

Further expand the Message Filtering Approach to exchange client and worker Application IDs. Client can also selectively consume response messages with its own client ID added to the request header.

Temporary queue created by the client (lasts only for the duration of the client session).

Request Response (Conversation) using Filtering

DTS Clients

DTS Workers

Q Consumer Cluster ‘facilityA’

JMS Message Headers

MessageID = guuidA

WorkerGroupID = facilityA

ClientID = DTSClient1

MDP Selective Consumer Pool

on WorkerGroupID = facilityA

NGS Portal (an app bound to facilityA)

MDP Producer Pool

Connected to InvokeClientQ


MDP Selective Consumer Pool

on WorkerID = workerA

DTS Client1


MDP Producer Pool

Connected to JobSumitQ

JMS Message Headers

CorrelationID = guuidA

WorkerID = workerA

ClientID = DTSClient1


MDP Selective Consumer Pool on ClientID = DTSClient1


MDP Producer Pool

Connected to InvokeWorkerQ


Q Consumer Cluster ‘facilityB’

GridSAM (an app bound to facilityB)

JMS Message Headers

CorrelationID = guuidA

WorkerID = workerA

ClientID = DTSClient1

(Exchange of client and worker Application IDs so that recipient worker and client can converse)


Request Response (Conversation) using Filtering

  • Each JMS client (worker and client) has a unique instance/application ID (clientID, workerID).

    • A client sends a job request and adds its own clientID to the headers (in conjunction with the other headers used in message selection, e.g. MessageID and WorkerGroupID).

    • Worker picks up a message and responds to an administered response queue (not a dynamic queue) named in the ReplyTo header, adding its own WorkerID and forwarding the given ClientID in the message headers.

    • Client receives messages from the response queue and filters on ClientID.

    • Client can now converse with the recipient worker since both the client and worker have their respective IDs and can correlate messages on the original message ID using CorrelationID.

  • Using this approach only requires a limited number of administered queues: e.g. JobSumitQ, InvokeClientQ, InvokeWorkerQ.

  • Main benefit is that this approach is extensible: provides for a higher level of decoupling between message producer and receiver since clients and workers can be easily added without change to the broker.

  • Can also combine this approach with multiple channels as required (hybrid approach).
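The ID exchange above can be traced with a small in-memory sketch. Plain Python lists stand in for the administered job submission and response queues, and dictionaries stand in for JMS messages with headers; the function names are hypothetical, and a real deployment would use broker-side selectors rather than the loops shown here.

```python
import uuid

job_submit_q = []     # stands in for the administered job submission queue
invoke_client_q = []  # stands in for the administered response queue


def client_submit(client_id, worker_group_id, body):
    """Client adds its own ClientID alongside the routing headers."""
    message_id = str(uuid.uuid4())
    job_submit_q.append({
        "MessageID": message_id,
        "WorkerGroupID": worker_group_id,
        "ClientID": client_id,
        "body": body,
    })
    return message_id


def worker_poll(worker_id, worker_group_id):
    """Worker takes the first request for its group and replies with its WorkerID."""
    for index, request in enumerate(job_submit_q):
        if request["WorkerGroupID"] == worker_group_id:
            del job_submit_q[index]
            invoke_client_q.append({
                "CorrelationID": request["MessageID"],  # ties reply to request
                "WorkerID": worker_id,
                "ClientID": request["ClientID"],        # forwarded unchanged
                "body": "accepted",
            })
            return True
    return False


def client_receive(client_id):
    """Client filters the shared response queue on its own ClientID."""
    for index, reply in enumerate(invoke_client_q):
        if reply["ClientID"] == client_id:
            return invoke_client_q.pop(index)
    return None


sent_id = client_submit("DTSClient1", "facilityA", "bulk copy activity")
worker_poll("workerA", "facilityA")
reply = client_receive("DTSClient1")
```

After this exchange the client knows `WorkerID` and the worker knows `ClientID`, so later control and event messages can be addressed directly and matched through `CorrelationID`.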

Core Component: Batch / Worker Agent

Enacts the Bulk Data Copy Activity as a fault tolerant batch job for copying between sources and sinks.

Scopes, checkpoints and restarts.

Batch / Worker Agent

  • Role is to enact the data copy activity according to the activity document, report status events and respond to control messages.

  • Copy activity is a batch processing task (automated processing of large volumes of information, most efficiently carried out without user interaction).

  • DTS worker based on Spring Batch and Commons VFS (contract driven approach facilitates different implementations e.g. scripts / shelling out to command line client).

  • Spring Batch provides framework for functions that are essential in batch processing e.g. split/monitor/merge, logging/tracing, tx management, processing statistics, job pause and restart, skip, retry, check-pointing.

A Spring Batch implementation deals with breaking apart the business logic and sharing it efficiently between parallel processes or processors as step-jobs.
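To make the retry, check-pointing and restart functions listed above concrete, here is a language-neutral sketch in Python. It does not use or imitate the Spring Batch API; the `run_copy_job` function, the dictionary-based checkpoint and the flaky copy function are all assumptions made for illustration.

```python
def run_copy_job(items, copy_one, checkpoint, max_retries=2):
    """Copy items in order, retrying failures and recording progress.

    `checkpoint` is a dict with key "next" holding the index of the next
    item to process; passing the same dict again resumes a stopped job.
    """
    index = checkpoint.get("next", 0)
    while index < len(items):
        attempts = 0
        while True:
            try:
                copy_one(items[index])
                break
            except IOError:
                attempts += 1
                if attempts > max_retries:
                    checkpoint["next"] = index  # restart point for a later run
                    return False
        index += 1
        checkpoint["next"] = index  # commit progress after each item
    return True


# A flaky copy function: "b" fails on its first three attempts, then succeeds.
failures_left = {"b": 3}
copied = []


def flaky_copy(item):
    if failures_left.get(item, 0) > 0:
        failures_left[item] -= 1
        raise IOError(f"transient failure copying {item}")
    copied.append(item)


state = {}
first_run_ok = run_copy_job(["a", "b", "c"], flaky_copy, state)   # stops at "b"
second_run_ok = run_copy_job(["a", "b", "c"], flaky_copy, state)  # resumes at "b"
```

The first run exhausts its retries on "b" and stops; the second run picks up from the checkpoint, so "a" is not copied twice.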

Core Component: Message Model

Bulk Data Copy Activity Document.

Control Messages (stop, start, cancel)

Event Messages (faults, status, instance attributes)

Message Model Requirements

  • Document Message

  • Bulk Data Copy Activity description

  • Captures all information required to connect to each source and sink URI and subsequently enact the activity.

  • Transfer requirements e.g. URI Properties, file selectors (reg-expression), scheduling (batch-window), retry count, source/sink alternatives, checksums?, sequential ordering? DAG?

  • Serialized user credentials.

  • Probably adopt/extend the Data End Point Reference (DEPR) construct from DMI. A specialized form of WS-Address element which does not mandate any particular URL/transport scheme, multiple <DataLocations/>

  • Control Messages

  • Interact with a state/lifecycle model (e.g. stop, resume, cancel)

  • Event Messages

  • Standard fault types and status updates

  • Information Model

  • To advertise the service capabilities / properties / supported protocols
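One way to picture the three message kinds above is as plain data types. The sketch below uses Python dataclasses; every class name, field name and example URI is hypothetical and is not taken from DMI, JSDL or the proposed schema.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DataLocation:
    uri: str
    credential: Optional[str] = None  # serialized credential, opaque here


@dataclass
class CopyItem:
    sources: List[DataLocation]  # alternatives, in order of preference
    sinks: List[DataLocation]
    file_selector: Optional[str] = None  # regular expression


@dataclass
class BulkDataCopyActivity:
    """Document message: everything a worker needs to enact the copy."""
    activity_id: str
    items: List[CopyItem]
    retry_count: int = 0
    batch_window: Optional[str] = None


@dataclass
class ControlMessage:
    activity_id: str
    command: str  # "stop", "resume" or "cancel"

    def __post_init__(self):
        if self.command not in {"stop", "resume", "cancel"}:
            raise ValueError(f"unknown control command: {self.command}")


@dataclass
class EventMessage:
    activity_id: str
    kind: str  # "status" or "fault"
    detail: str = ""


activity = BulkDataCopyActivity(
    activity_id="activity-1",
    items=[CopyItem(
        sources=[DataLocation("sftp://source.example.org/data/")],
        sinks=[DataLocation("ftp://sink.example.org/archive/")],
        file_selector=r".*\.dat",
    )],
    retry_count=3,
)
```

Control and event messages carry only the activity identifier, so they stay small and can be routed independently of the (potentially large) activity document.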

Existing/In-Scope Specifications

Related Specifications

  • Job Submission Description Language (JSDL)

    • An activity description language for generic compute applications.

  • OGSA Data Movement Interface (DMI)

    • Low level schema for defining the transfer of bytes between a single source and sink.

  • JSDL HPC File Staging Profile (HPCFS)

    • Designed to address file staging not bulk copying.

  • OGSA Basic Execution Service (BES)

    • Defines a basic framework for defining and interacting with generic compute activities: JSDL + extensible state and information models.

  • Neither fully captures our requirements (this is not a criticism of these specs, they are designed to address their existing use-cases which only partially overlap with the requirements for a bulk data copy activity).


  • Condor Stork - based on Condor Class-Ads

  • Glite JDL (again based on Class-Ads)

  • Not sure if Globus has/intends a similar definition in its new developments (e.g. SaaS) – anyone?

JSDL Data Staging 1 and the HPC File Staging Profile











<Credentials> … </Credentials>


Both the source and the target can be defined within the same <DataStaging/> element, which is permitted in JSDL.

However, the HPC File Staging Profile (Wasson et al. 2008), which is an extension to JSDL, limits the use of credentials to a single credential definition within a data staging element. Often, different credentials will be required for the source and the target.

JSDL Data Staging 2









<Credentials> … </Credentials>









<Credentials> … </Credentials>


Coupled staging elements: a source data staging element for fileA and a corresponding target element for staging out of the same file. By specifying that the input file is deleted after the job has executed, this example simulates the effect of a data copy from one location to another through the staging host.

No multiple data locations (alternative sources and sinks).

More elements required (e.g. transfer requirements, file selectors, uri properties).

Intended for compute and data staging, not really bulk data copying.

OGSA DMI

The OGSA Data Movement Interface (DMI) (Antonioletti et al. 2008) defines a number of XML constructs for describing and interacting with a data transfer activity.

The data source and destination are each described separately with a Data End Point Reference (DEPR), which is a specialized form of WS-Address element (Box et al. 2004).

In contrast to the JSDL data staging model, a DEPR facilitates the definition of one or more <Data/> elements within a <DataLocations/> element. This is used to define alternative locations for the data source and/or sink. In doing this, an implementation is then free to select between its supported protocols and retry different source/sink combinations from the available list. This improves resilience and the likelihood of performing a successful data transfer by matching protocols supported by the service.
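The selection behaviour described above (matching supported protocols, then retrying different source/sink combinations) might look like the following sketch. It is not the DMI API: locations are reduced to hypothetical `(protocol, uri)` tuples, and `attempt_transfer` is a caller-supplied stand-in for the real copy.

```python
from itertools import product


def candidate_pairs(source_locations, sink_locations, supported_protocols):
    """Return (source, sink) pairs whose protocols the service supports.

    Each location is a (protocol, uri) tuple, listed in order of preference.
    """
    usable_sources = [s for s in source_locations if s[0] in supported_protocols]
    usable_sinks = [k for k in sink_locations if k[0] in supported_protocols]
    return list(product(usable_sources, usable_sinks))


def transfer_with_fallback(source_locations, sink_locations,
                           supported_protocols, attempt_transfer):
    """Try each supported source/sink combination until one succeeds."""
    for source, sink in candidate_pairs(source_locations, sink_locations,
                                        supported_protocols):
        if attempt_transfer(source, sink):
            return source, sink
    return None  # every combination failed or none was supported


sources = [("srm", "srm://a.example.org/f"), ("gridftp", "gsiftp://b.example.org/f")]
sinks = [("sftp", "sftp://c.example.org/f"), ("ftp", "ftp://d.example.org/f")]

# Hypothetical outcome: only the FTP sink is reachable.
chosen = transfer_with_fallback(
    sources, sinks, {"gridftp", "sftp", "ftp"},
    attempt_transfer=lambda source, sink: sink[0] == "ftp",
)
```

Here the SRM source is skipped because the service does not support it, the SFTP sink is tried and fails, and the transfer falls back to the GridFTP source with the FTP sink.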

DEPR Example





<dmi:Data ProtocolUri=""



<other stuff/>


<dmi:Data ProtocolUri="urn:my-project:srm"



<other stuff/>






. . . Similar to above but for the sink . . .


Defines alternative locations for the data source and/or sink.

DMI cont..

There are some limitations:

DMI is intended to describe only a single data transfer operation between one source and one sink. To do several transfers, multiple invocations of a DMI service factory would be required to create multiple DMI service instances.

We require a single (atomic) message packet that wraps multiple transfers and whose delivery can be transacted, e.g. through a message router.

Some of the existing constructs require extension / slight modification.

Therefore: DMI v2 strawman proposal at OGF to canvass some new extensions and to propose a new bulk-copy doc that builds on DMI.

Bulk Data Copy Doc and JSDL Integration?



<jsdl:JobIdentification ... />


<!-- Option a) Embed BulkDataCopy document -->

<other:BulkDataCopy ... />

<!-- If Basic Profile compliance is important -->








<!-- Option b) Stage-in BulkDataCopy document -->








Possible? options for integrating the proposed <BulkDataCopy/> document within JSDL; a) nesting within the <jsdl:Application/> element or b) staging-in of a <BulkDataCopy/> document as input for the named executable - why not ?