
Bulk Data Copy Generalization

Some DMI/JSDL overlap

(this indeed might be out of scope of JSDL)

Extensibility options, and possibly some new requirements, for recursive file/directory copying between multiple sources and sinks?


In-Scope

  • Job Submission Description Language (JSDL)

    • An activity description language for generic compute applications.

  • OGSA Data Movement Interface (DMI)

    • Low-level schema for defining the transfer of bytes between a single source and sink.

  • JSDL HPC File Staging Profile (HPCFS)

    • Designed to address file staging not bulk copying.

  • OGSA Basic Execution Service (BES)

    • Defines a basic framework for defining and interacting with generic compute activities: JSDL + extensible state and information models.

  • Others that I am sure I have missed! (…ByteIO)

  • None of these fully captures our requirements (not a criticism: each is designed to address its own use cases, which only partially overlap with the requirements for our bulk data copy activity).

Other

  • Condor Stork, based on Condor ClassAds

  • Not sure whether Globus has, or intends, a similar definition in its new developments (e.g. SaaS). Anyone? I believe Ravi was originally supportive of a DMI for data transfers between multiple sources/sinks.


Stork – Condor Class Ads

Example of a Stork job request:

[
  dest_url = "gsiftp://eric1.loni.org/scratch/user/";
  arguments = "-p 4 dbg -vb";
  src_url = "file:///home/user/test/";
  dap_type = "transfer";
  verify_checksum = true;
  verify_filesize = true;
  set_permission = "755";
  recursive_copy = true;
  network_check = true;
  checkpoint_transfer = true;
  output = "user.out";
  err = "user.err";
  log = "userjob.log";
]

  • Purportedly the first batch scheduler for data placement and data movement in a heterogeneous environment. Developed to work alongside Condor.

  • Uses Condor's ClassAd job description language and is designed to understand the semantics and characteristics of data placement tasks.

  • Recently received NSF funding for development as a production service.
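To make the shape of such a request concrete, here is a minimal sketch that renders a Python dictionary as a bracketed, ClassAd-style record like the one above. The helper name `to_classad` and its quoting rule (backslash-escaping embedded quotes) are assumptions made for illustration; this is not part of Stork or Condor.

```python
def to_classad(attrs):
    """Render a dict as a bracketed ClassAd-style record, one attribute per line.

    Hypothetical helper for illustration only; not a Stork or Condor API.
    """
    def render(value):
        if isinstance(value, bool):          # test bool first: bool is a subclass of int
            return "true" if value else "false"
        if isinstance(value, (int, float)):
            return str(value)
        # Assumption: strings are double-quoted with backslash escapes.
        escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'

    lines = [f"  {name} = {render(value)};" for name, value in attrs.items()]
    return "[\n" + "\n".join(lines) + "\n]"


# Attribute names and values are taken from the Stork example above.
request = {
    "dap_type": "transfer",
    "src_url": "file:///home/user/test/",
    "dest_url": "gsiftp://eric1.loni.org/scratch/user/",
    "recursive_copy": True,
    "verify_checksum": True,
    "set_permission": "755",
}

print(to_classad(request))
```

Keeping the request as plain data until the last moment makes it easy to generate many similar transfer requests from one template.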


JSDL Data Staging 1 and the HPC File Staging Profile

<jsdl:DataStaging>
  <jsdl:FileName>fileA</jsdl:FileName>
  <jsdl:CreationFlag>overwrite</jsdl:CreationFlag>
  <jsdl:DeleteOnTermination>true</jsdl:DeleteOnTermination>
  <jsdl:Source>
    <jsdl:URI>gsiftp://griddata1.dl.ac.uk:2811/myhome/fileA</jsdl:URI>
  </jsdl:Source>
  <jsdl:Target>
    <jsdl:URI>ftp://ngs.oerc.ox.ac.uk:2811/myhome/fileA</jsdl:URI>
  </jsdl:Target>
  <Credentials> … </Credentials>
</jsdl:DataStaging>

  • Both the source and the target are defined within the same <DataStaging/> element, which JSDL permits.

  • The HPC File Staging Profile (Wasson et al. 2008) limits the use of credentials to a single credential definition within a data staging element, whereas the source and the target will require different credentials.

  • Maybe profile the use of credentials within the JSDL Source and Target elements?


<jsdl:DataStaging>
  <jsdl:FileName>fileA</jsdl:FileName>
  <jsdl:FilesystemName>MY_SCRATCH_DIR</jsdl:FilesystemName>
  <jsdl:CreationFlag>overwrite</jsdl:CreationFlag>
  <jsdl:DeleteOnTermination>true</jsdl:DeleteOnTermination>
  <jsdl:Source>
    <jsdl:URI>gsiftp://griddata1.dl.ac.uk:2811/myhome/fileA</jsdl:URI>
  </jsdl:Source>
  <Credentials> e.g. MyProxyToken </Credentials>
</jsdl:DataStaging>

<jsdl:DataStaging>
  <jsdl:FileName>fileA</jsdl:FileName>
  <jsdl:FilesystemName>MY_SCRATCH_DIR</jsdl:FilesystemName>
  <jsdl:CreationFlag>overwrite</jsdl:CreationFlag>
  <jsdl:Target>
    <jsdl:URI>ftp://ngs.oerc.ox.ac.uk:2811/myhome/fileA</jsdl:URI>
  </jsdl:Target>
  <Credentials> e.g. wsa:Username/password token </Credentials>
</jsdl:DataStaging>

Staging 2

  • Coupled staging elements: a source data staging element for fileA and a corresponding target element for staging out the same file. By specifying that the input file is deleted after the job has executed, this example simulates the effect of a data copy from one location to another through the staging host.

  • No support for multiple data locations (alternative sources and sinks), which we think would be useful.

  • Some additional (proprietary?) elements are required (e.g. DMI transfer requirements, file selectors, URI connection properties).
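The coupled-staging pattern above can be generated mechanically. The following is a rough sketch using Python's standard `xml.etree.ElementTree`; the helper names `staging` and `coupled_copy` are hypothetical, and the credential elements are deliberately left out because their schema is not settled in these slides.

```python
import xml.etree.ElementTree as ET

# JSDL 1.0 namespace; the "jsdl" prefix matches the examples above.
JSDL_NS = "http://schemas.ggf.org/jsdl/2005/11/jsdl"
ET.register_namespace("jsdl", JSDL_NS)


def _el(parent, tag, text=None):
    """Append a namespaced JSDL child element and return it."""
    child = ET.SubElement(parent, f"{{{JSDL_NS}}}{tag}")
    child.text = text
    return child


def staging(file_name, filesystem, direction, uri, delete_after=None):
    """Build one <jsdl:DataStaging>; direction is 'Source' or 'Target'."""
    ds = ET.Element(f"{{{JSDL_NS}}}DataStaging")
    _el(ds, "FileName", file_name)
    _el(ds, "FilesystemName", filesystem)
    _el(ds, "CreationFlag", "overwrite")
    if delete_after is not None:
        _el(ds, "DeleteOnTermination", "true" if delete_after else "false")
    _el(_el(ds, direction), "URI", uri)
    return ds


def coupled_copy(file_name, filesystem, source_uri, target_uri):
    """Stage-in plus stage-out of the same file, copying via the staging host."""
    stage_in = staging(file_name, filesystem, "Source", source_uri, delete_after=True)
    stage_out = staging(file_name, filesystem, "Target", target_uri)
    return [stage_in, stage_out]


pair = coupled_copy(
    "fileA",
    "MY_SCRATCH_DIR",
    "gsiftp://griddata1.dl.ac.uk:2811/myhome/fileA",
    "ftp://ngs.oerc.ox.ac.uk:2811/myhome/fileA",
)
for element in pair:
    print(ET.tostring(element, encoding="unicode"))
```

Note that each call still describes exactly one source and one target, which is the limitation the bullets above point out.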


OGSA DMI

  • The OGSA Data Movement Interface (DMI) (Antonioletti et al. 2008) defines a number of elements for describing and interacting with a data transfer activity.

  • The data source and destination are each described separately with a Data End Point Reference (DEPR), which is a specialized form of the WS-Addressing endpoint reference element (Box et al. 2004).

  • In contrast to the JSDL data staging model, a DEPR allows one or more <Data/> elements to be defined within a <DataLocations/> element. This is used to define alternative locations for the data source and/or sink.

  • An implementation can select between its supported protocols and retry different source/sink combinations from the available list (improves resilience and the likelihood of performing a successful copy).

  • There are some limitations:

    • DMI is intended to describe only a single data copy operation between one source and one sink. To do several transfers, multiple invocations of a DMI service factory would be required to create multiple DMI service instances. We require a single (atomic) message packet that wraps multiple transfers (e.g. for routing through a message broker).

    • Some of the existing constructs require extension or slight modification.

  • Therefore: a DMI/JSDL discussion at OGF to canvass some possible new extensions. Maybe build on DMI, and/or integrate more closely with JSDL data staging, to describe a bulk copy activity.
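The retry behaviour described above, where an implementation works through alternative source/sink combinations, might look roughly like the sketch below. The function name `copy_with_alternatives`, the `transfer` callback, the `max_attempts` default and the sink host `sink.example.org` are all assumptions for illustration; DMI itself does not prescribe this algorithm.

```python
from itertools import product


def copy_with_alternatives(sources, sinks, transfer, max_attempts=4):
    """Try source/sink pairs in order until one succeeds or attempts run out.

    `transfer(source, sink)` is a caller-supplied callable returning True on
    success; it stands in for a real protocol client.
    Returns (chosen_pair_or_None, list_of_pairs_tried).
    """
    attempts = []
    for source, sink in product(sources, sinks):
        if len(attempts) >= max_attempts:
            break
        attempts.append((source, sink))
        try:
            if transfer(source, sink):
                return (source, sink), attempts
        except OSError:
            pass  # treat an I/O error as a failed attempt and try the next pair
    return None, attempts


sources = [
    "gsiftp://example.org/name/of/the/dir/",
    "srm://example.org/name/of/the/dir/",
]
sinks = ["gsiftp://sink.example.org/dir/"]  # hypothetical sink


def fake_transfer(source, sink):
    """Pretend that only the SRM source is reachable."""
    return source.startswith("srm://")


chosen, tried = copy_with_alternatives(sources, sinks, fake_transfer)
print(chosen, len(tried))
```

Returning the list of attempted pairs alongside the result gives a caller enough information to report why a copy failed.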


<other:BulkDataCopy>
  <other:DataCopy id="transfer1"> +
    <dmi:SourceDataEPR>
      <wsa:Address>http://www.ogf.org/ogsa/2007/08/addressing/none</wsa:Address>
      <wsa:Metadata>
        <dmi:DataLocations>
          <dmi:Data ProtocolUri="http://www.ogf.org/ogsadmi/2006/03/im/protocol/gridftp-v20"
                    DataUrl="gsiftp://example.org/name/of/the/dir/">
            <dmi:Credentials><other:MyProxyToken/></dmi:Credentials>
            <other:stuff/>
          </dmi:Data>
          <dmi:Data ProtocolUri="urn:my-project:srm"
                    DataUrl="srm://example.org/name/of/the/dir/">
            <dmi:Credentials><wsse:UsernameToken/></dmi:Credentials>
            <other:stuff/>
          </dmi:Data>
        </dmi:DataLocations>
      </wsa:Metadata>
    </dmi:SourceDataEPR>
    <dmi:SinkDataEPR> . . . Sink Details . . . </dmi:SinkDataEPR>
  </other:DataCopy>
  <dmi:TransferRequirements>
    <dmi:StartNotBefore/> ?
    <dmi:EndNoLaterThan/> ?
    <dmi:StayAliveTime/> ?
    <dmi:MaxAttempts/> ?
  </dmi:TransferRequirements>
</other:BulkDataCopy>

Bulk DMI Draft

  • A pseudo-example.

  • Some overlap with JSDL data staging.

  • Source: wsa:EndpointReference type.

  • Sink: wsa:EndpointReference type.

  • A DEPR defines alternative locations for the data source and/or sink, and each <Data/> nests its own credentials.

  • Transfer requirements (needs extending).
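One way to reason about the draft document is as a small in-memory data model. The sketch below mirrors the pseudo-example's elements with Python dataclasses; every class and field name is a hypothetical rendering of that pseudo-example, the validation rules are assumptions, and the sink URL is invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DataLocation:
    """One <Data/> alternative: a protocol, a URL and its own credential."""
    protocol_uri: str
    data_url: str
    credential: Optional[str] = None  # placeholder for a serialized token


@dataclass
class DataCopy:
    """One transfer, with alternative locations for source and sink."""
    id: str
    source_locations: List[DataLocation]
    sink_locations: List[DataLocation]


@dataclass
class TransferRequirements:
    start_not_before: Optional[str] = None
    end_no_later_than: Optional[str] = None
    stay_alive_time: Optional[str] = None
    max_attempts: Optional[int] = None


@dataclass
class BulkDataCopy:
    """A single document wrapping one or more transfers."""
    copies: List[DataCopy] = field(default_factory=list)
    requirements: TransferRequirements = field(default_factory=TransferRequirements)

    def validate(self):
        # Assumed rules: at least one copy, unique ids, non-empty endpoints.
        if not self.copies:
            raise ValueError("at least one DataCopy is required")
        ids = [copy.id for copy in self.copies]
        if len(ids) != len(set(ids)):
            raise ValueError("DataCopy ids must be unique")
        for copy in self.copies:
            if not copy.source_locations or not copy.sink_locations:
                raise ValueError(f"{copy.id}: needs a source and a sink location")


doc = BulkDataCopy(
    copies=[
        DataCopy(
            id="transfer1",
            source_locations=[
                DataLocation(
                    "http://www.ogf.org/ogsadmi/2006/03/im/protocol/gridftp-v20",
                    "gsiftp://example.org/name/of/the/dir/",
                    credential="MyProxyToken",
                ),
                DataLocation(
                    "urn:my-project:srm",
                    "srm://example.org/name/of/the/dir/",
                    credential="UsernameToken",
                ),
            ],
            # Hypothetical sink; the pseudo-example elides the sink details.
            sink_locations=[
                DataLocation("urn:my-project:srm", "srm://sink.example.org/dir/")
            ],
        )
    ],
    requirements=TransferRequirements(max_attempts=3),
)
doc.validate()
print(len(doc.copies), "transfer(s) described")
```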


Bulk Data Copy and JSDL Integration?

<jsdl:JobDefinition>
  <jsdl:JobDescription>
    <jsdl:JobIdentification ... />
    <jsdl:Application>
      <!-- Possibility? a) Embed BulkDataCopy document -->
      <other:BulkDataCopy ... />
      <!-- Possibility? b) Stage BulkDataCopy doc and name copy agent -->
      <jsdl-hpcpa:HPCProfileApplication>
        <jsdl-hpcpa:Executable>/usr/bin/datacopyagent.sh</jsdl-hpcpa:Executable>
        <jsdl-hpcpa:Argument>myBulkDataCopyDoc.xml</jsdl-hpcpa:Argument> . . .
      </jsdl-hpcpa:HPCProfileApplication>
    </jsdl:Application>
    <jsdl:Resources ... />
    <jsdl:DataStaging>
      <jsdl:FileName>myBulkDataCopyDoc.xml</jsdl:FileName> . . .
    </jsdl:DataStaging>
  </jsdl:JobDescription>
</jsdl:JobDefinition>

Some (sketchy) integration options?

Possible options for integrating the proposed <BulkDataCopy/> document within JSDL: a) nesting it within the <jsdl:Application/> element, or b) staging in a <BulkDataCopy/> document as input for the named executable (ideas, advice…).


Or New Staging Requirements?

JSDL is intended to be a generic compute activity description language.

Rather than use a separate document to describe a bulk data copy activity, is it better to suggest some JSDL extensions to cater for bulk copying? (ideas, advice…)

Potentially a better route to more widespread adoption (e.g. by existing BES implementations).

Other thoughts: orchestration of copy activities as a DAG?
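As a rough illustration of the DAG idea, copy activities could declare which other transfers they depend on, and a scheduler could release them in dependency order. The sketch below uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later); the transfer ids and their dependencies are invented for the example.

```python
from graphlib import TopologicalSorter

# Hypothetical dependencies: each key may start only after its listed transfers.
dependencies = {
    "transfer1": set(),
    "transfer2": {"transfer1"},
    "transfer3": {"transfer1"},
    "transfer4": {"transfer2", "transfer3"},
}


def batches(deps):
    """Group transfers into batches whose members can run concurrently."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    result = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        result.append(ready)
        sorter.done(*ready)
    return result


for number, batch in enumerate(batches(dependencies), start=1):
    print(number, batch)
```

Plain sequential ordering is the special case where every transfer depends on the one before it.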


BES and DMI sub-state specialisations?

  • Profile the OGSA BES state model for DMI sub-state specializations.

  • Adds optional DMI sub-state specializations. A client/service may recognize only the main BES states if necessary.

  • Suspend, resume, cancel.

  • Add DMI fault types?

[Figure: state diagram. State labels: Pending; Running:Transferring; Running:Suspended; Finished; Cancelled; Failed:Clean; Failed:Unclean; Failed:Unknown. Transition labels: Suspend() request, Resume() request, Cancel(). Legend: BES states, DMI based sub-states.]
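The idea that a client may recognize only the main states while a service tracks sub-states can be sketched with "Main:Sub" state names. The state names below come from the slide; the transition table, the event names `start`, `finish` and `fail`, and the choice of `Failed:Unknown` as the failure target are assumptions for illustration rather than anything defined by BES or DMI.

```python
# Assumed transition table keyed by (current state, event).
TRANSITIONS = {
    ("Pending", "start"): "Running:Transferring",
    ("Pending", "cancel"): "Cancelled",
    ("Running:Transferring", "suspend"): "Running:Suspended",
    ("Running:Transferring", "finish"): "Finished",
    ("Running:Transferring", "fail"): "Failed:Unknown",
    ("Running:Transferring", "cancel"): "Cancelled",
    ("Running:Suspended", "resume"): "Running:Transferring",
    ("Running:Suspended", "cancel"): "Cancelled",
}


def main_state(state):
    """Drop any sub-state so that a basic client sees only the main state."""
    return state.split(":", 1)[0]


def apply(state, event):
    """Return the next state, or raise ValueError for a disallowed event."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} is not allowed in state {state!r}") from None


state = "Pending"
for event in ("start", "suspend"):
    state = apply(state, event)
print(state, "is seen by a basic client as", main_state(state))
```

Because the sub-state is only a suffix, a client that ignores it still sees a valid main state at every step.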


Message Model Requirements

  • Document Message

    • Bulk Data Copy Activity description

      • Capture all information required to connect to each source URI and sink URI and subsequently enact the data copy activity.

      • Transfer requirements, e.g. additional URI properties, file selectors (regular expressions), scheduling parameters to define a batch window, retry count, source/sink alternatives, checksums?, sequential ordering? DAG?

      • Serialized user credential definitions for each source and sink.

  • Control Messages

    • Interact with a state/lifecycle model (e.g. stop, resume, cancel).

  • Event Messages

    • Standard fault types and status updates.

  • Information Model

    • To advertise the service capabilities / properties / supported protocols.
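Two of the transfer requirements listed above, regular-expression file selectors and a scheduling batch window, are simple enough to sketch. The function names, the example paths and the window times below are hypothetical; the window bounds loosely correspond to the StartNotBefore and EndNoLaterThan requirements in the draft.

```python
import re
from datetime import datetime


def select_files(paths, pattern):
    """Keep only the paths that match the regular-expression selector."""
    regex = re.compile(pattern)
    return [path for path in paths if regex.search(path)]


def in_batch_window(now, start_not_before=None, end_no_later_than=None):
    """True if `now` lies inside the window; a missing bound is unbounded."""
    if start_not_before is not None and now < start_not_before:
        return False
    if end_no_later_than is not None and now > end_no_later_than:
        return False
    return True


paths = ["run1/a.dat", "run1/a.log", "run2/b.dat", "README"]
window_start = datetime(2010, 1, 1, 22, 0)
window_end = datetime(2010, 1, 2, 6, 0)

print(select_files(paths, r"\.dat$"))
print(in_batch_window(datetime(2010, 1, 1, 23, 30), window_start, window_end))
```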

