Data Management

Kelly Clynes

Caitlin Minteer


Agenda

  • Globus Toolkit

  • Basic Data Management Systems

  • Overview of Data Management

  • Data Movement

  • GridFTP

  • Reliable File Transfer

  • Data Replication

  • Replica Location Service

  • Logical File

  • Data Replication Service

  • OGSA


Globus Toolkit

  • Fundamental enabling technology for the "Grid"

  • Does not provide interactive client

  • Globus Toolkit provides:

    • Server implementation

    • Scriptable command line client

    • Set of development libraries


Basic Data Management Systems

  • GridFTP - A uniform, secure, high-performance interface to file-based storage systems on the Grid

  • OGSA-DAI - An OGSA interface for accessing XML and relational data stores


Overview of Data Management

  • Two Basic Categories of Data Management

    • Data Movement

    • Data Replication


Data Movement

  • Globus Reliable File Transfer (RFT) Service

  • Globus GridFTP Tools


GridFTP

  • What is GridFTP?

    • A protocol defined by the Global Grid Forum

    • Specified in GFD.020, RFC 959, RFC 2228, RFC 2389, etc.

    • Based upon the Internet FTP protocol

  • Why use GridFTP?

    • Secure

    • Robust

    • Fast

    • Efficient


GridFTP

  • What does GridFTP Provide?

    • multiple data channels for parallel transfers

    • partial file transfers

    • third-party (direct server-to-server) transfers

    • reusable data channels

    • command pipelining.
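Parallel data channels and partial file transfers can be pictured together: the file is cut into byte ranges, each range is fetched on its own channel, and the blocks are reassembled by offset. The following is a minimal Python sketch of that idea only, using an in-memory byte string in place of a real server. The function names `split_ranges`, `read_block`, and `parallel_copy` are hypothetical and are not part of GridFTP or the Globus Toolkit.

```python
from concurrent.futures import ThreadPoolExecutor


def split_ranges(size: int, streams: int) -> list[tuple[int, int]]:
    """Divide [0, size) into contiguous (offset, length) blocks, one per stream."""
    base, extra = divmod(size, streams)
    ranges = []
    offset = 0
    for i in range(streams):
        # The first `extra` blocks carry one additional byte each.
        length = base + (1 if i < extra else 0)
        if length:
            ranges.append((offset, length))
        offset += length
    return ranges


def read_block(data: bytes, offset: int, length: int) -> tuple[int, bytes]:
    """Stand-in for one data channel fetching a partial range of the file."""
    return offset, data[offset:offset + length]


def parallel_copy(data: bytes, streams: int = 4) -> bytes:
    """Fetch every block concurrently and reassemble the file by offset."""
    out = bytearray(len(data))
    with ThreadPoolExecutor(max_workers=streams) as pool:
        futures = [pool.submit(read_block, data, off, length)
                   for off, length in split_ranges(len(data), streams)]
        for future in futures:
            off, block = future.result()
            out[off:off + len(block)] = block
    return bytes(out)
```

Because each block carries its own offset, the blocks may arrive in any order, which is what lets several channels work on one file at once.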


GridFTP

  • To make data available to others:

    • Install a server on a host that can access the data

    • Make sure that there is an appropriate Data Storage Interface (DSI) for the storage system

  • To access data that others have made available, you need a GridFTP client

  • To add access to files stored behind GridFTP servers to your own software, you need a custom client


GridFTP in GT 4.0

  • Globus Code

  • Backwards Compatible

  • Striping support

  • IPv6 support included

  • Modular


Reliable File Transfer

  • Third-party multi-file transfers

  • Exponential back-off on failure

  • Optional use of parallel streams and TCP buffer size tuning

  • Recursive directory transfer
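Exponential back-off means that the wait before each retry grows by a constant factor, so a struggling server is not hammered with immediate retries. The Python sketch below illustrates the general technique only. The defaults (1 second base, factor 2, 5 retries) and the names `backoff_delays` and `transfer_with_backoff` are assumptions for illustration, not RFT's actual settings or API.

```python
import time
from typing import Callable


def backoff_delays(base: float, factor: float, attempts: int) -> list[float]:
    """Delay before each retry: base, base*factor, base*factor**2, ..."""
    return [base * factor ** i for i in range(attempts)]


def transfer_with_backoff(transfer: Callable[[], None],
                          retries: int = 5,
                          base: float = 1.0,
                          factor: float = 2.0,
                          sleep: Callable[[float], None] = time.sleep) -> int:
    """Call transfer() until it succeeds; return the number of attempts used.

    Re-raises the last OSError if the first try and every retry fail.
    """
    delays = backoff_delays(base, factor, retries)
    for attempt in range(retries + 1):
        try:
            transfer()
            return attempt + 1
        except OSError:
            if attempt == retries:
                raise
            # Wait longer after each consecutive failure.
            sleep(delays[attempt])
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting.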


Reliable File Transfer Service

  • GridFTP on its own is not a web service protocol

  • GridFTP requires that the client maintain an open socket connection to the server throughout the transfer

  • RFT is a service interface based on web services protocols that persists the transfer state in reliable storage


Reliable File Transfer

  • Compliant with the Web Services Resource Framework (WSRF)

  • Provides “job scheduler”-like functionality for data movement

  • You simply:

    • Provide a list of source and destination URLs

      • The service writes your job description to a database and then moves the files on your behalf

    • Service methods are provided for querying the transfer status
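The pattern on this slide (persist the request first, move files afterwards, let the client ask for status at any time) can be sketched with a small database-backed queue. This is an illustrative Python sketch using an in-memory SQLite database. The table layout, the status names `Pending`/`Finished`/`Failed`, and the functions `open_store`, `submit`, `run`, and `status` are all hypothetical and do not reflect RFT's real schema or service methods.

```python
import sqlite3
from typing import Callable


def open_store() -> sqlite3.Connection:
    """Create the (hypothetical) table that holds persisted transfer state."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE transfers ("
               "job INTEGER, source TEXT, dest TEXT, status TEXT)")
    return db


def submit(db: sqlite3.Connection, job: int,
           pairs: list[tuple[str, str]]) -> None:
    """Write the whole request to the database before any data moves."""
    db.executemany("INSERT INTO transfers VALUES (?, ?, ?, 'Pending')",
                   [(job, source, dest) for source, dest in pairs])
    db.commit()


def run(db: sqlite3.Connection, job: int,
        mover: Callable[[str, str], None]) -> None:
    """Move each pending file and record the outcome after every step."""
    rows = db.execute("SELECT rowid, source, dest FROM transfers "
                      "WHERE job = ? AND status = 'Pending'", (job,)).fetchall()
    for rowid, source, dest in rows:
        try:
            mover(source, dest)
            outcome = "Finished"
        except OSError:
            outcome = "Failed"
        db.execute("UPDATE transfers SET status = ? WHERE rowid = ?",
                   (outcome, rowid))
        db.commit()


def status(db: sqlite3.Connection, job: int) -> dict[str, int]:
    """Answer a status query: how many files of the job are in each state."""
    rows = db.execute("SELECT status, COUNT(*) FROM transfers "
                      "WHERE job = ? GROUP BY status", (job,))
    return dict(rows.fetchall())
```

Since state is committed after every file, a restarted service could resume from the rows still marked pending, and the client does not need to stay connected while the transfer runs.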


RFT in Action

[Diagram built up over several slides: a Grid Service Container holding a Registry, an RFT Factory, and an RFT Service Instance, with a Client outside the container. Steps listed for the RFT Service Instance:]

  • Start the instance

  • Deserialize XML to Java

  • Write request via JDBC

  • Persist service state


RFT in Action

  • Service is OGSI compliant

  • Uses existing GridFTP protocols and tools to execute 3rd Party Transfer for the user

  • Provides extensive state transition notification

[Diagram: an RFT Service Instance and two GridFTP servers]


Data Replication

  • Replica Location Service (RLS)

    • Provides the ability to keep track of one or more copies, or replicas, of files

    • Helpful for users or applications that need to find where existing files are located on the Grid

    • Services register files in RLS when files are created


Replica Location Service

  • Provides a framework for tracking the physical locations of data that has been replicated.

  • RLS maps logical names to physical names.

  • Replication of data can:

    • reduce access latency

    • improve data locality

    • increase robustness, scalability and performance for distributed applications.

  • An RLS typically does not operate in isolation, but functions as one component of a data grid architecture.


Replica Location Service

  • RLS may consist of multiple servers at different sites

  • This increases the overall scale of the system and lets it store more mappings than a single, centralized catalog could


Replica Location Service

  • Logical File – unique identifier for the contents of a file

  • Physical File – location of a copy of the file on a storage system

  • A user can provide a logical file name to an RLS server and ask for the physical locations of its registered replicas

  • A user can also query an RLS server to find the logical file name associated with a particular physical file location
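The two lookups described here, logical name to physical locations and physical location back to logical name, amount to a two-way mapping. A minimal Python sketch under that assumption follows. `ReplicaCatalog` and its methods are hypothetical names for illustration, not the RLS client API.

```python
from collections import defaultdict
from typing import Optional


class ReplicaCatalog:
    """Toy two-way index between logical names and physical locations."""

    def __init__(self) -> None:
        self._by_logical: defaultdict[str, set[str]] = defaultdict(set)
        self._by_physical: dict[str, str] = {}

    def register(self, logical: str, physical: str) -> None:
        """Record that `physical` holds a copy of the file named `logical`."""
        self._by_logical[logical].add(physical)
        self._by_physical[physical] = logical

    def replicas(self, logical: str) -> list[str]:
        """All registered physical locations for a logical name (sorted)."""
        return sorted(self._by_logical.get(logical, ()))

    def logical_name(self, physical: str) -> Optional[str]:
        """The logical name for a physical location, or None if unknown."""
        return self._by_physical.get(physical)
```

One logical name can map to many physical locations, while each physical location maps back to exactly one logical name.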


Logical File

Associations between a logical file name and three replicas on different storage sites


RLS Example

Laser Interferometer Gravitational Wave Observatory (LIGO)

  • Detect the existence of gravitational waves

    • Produces millions of data files

    • Eight other sites need to access files

      • Ten physical locations

  • RLS servers at each site


RLS Example

  • A user at a LIGO site requests a file from LIGO’s data management system

    • Lightweight Data Replicator (LDR)

  • LDR queries the Replica Location Service for a local copy

    • If no local copy is found, RLS returns where the file is in the Grid

    • LDR issues a request to copy the file to local storage

    • LDR registers the new copy with the local RLS
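The request flow above (check the local catalog, fall back to other sites, copy, then register the new copy locally) can be sketched in a few lines. This Python sketch assumes one plain dictionary per site standing in for that site's RLS server. The `fetch` function, the `Catalogs` layout, and the `copy` callback are hypothetical and are not LDR's real interface.

```python
from typing import Callable

# Hypothetical layout: site name -> {logical name -> set of physical locations}.
Catalogs = dict[str, dict[str, set[str]]]


def fetch(name: str, site: str, catalogs: Catalogs,
          copy: Callable[[str, str], str]) -> str:
    """Return a local physical location for `name`, replicating it if needed."""
    local = catalogs.setdefault(site, {})
    if local.get(name):
        # A local copy is already registered: nothing to transfer.
        return sorted(local[name])[0]

    # No local copy: look for a replica registered at any other site.
    source = None
    for other in sorted(catalogs):
        if other != site and catalogs[other].get(name):
            source = sorted(catalogs[other][name])[0]
            break
    if source is None:
        raise LookupError(f"no replica of {name!r} registered at any site")

    # Copy the file to local storage, then register the new replica locally.
    new_location = copy(source, site)
    local.setdefault(name, set()).add(new_location)
    return new_location
```

A second request for the same file at the same site is served from the local registration and triggers no further copy.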


Higher Level Data Services

  • Combines two existing data management components

    • RFT

    • RLS


Data Replication Service (DRS)

  • Provides a pull-based replication capability

  • Built on top of two GT data management components

    • Reliable File Transfer

    • Replica Location Service


Data Replication Service

  • Function

    • To ensure that a specified set of files exists on a storage site

  • Begins by querying RLS to discover where the desired files exist

  • After files are located, creates a transfer request that is executed by RFT

  • DRS then registers the new replicas with RLS


Data Replication Service

  • Implemented as a web service that complies with the Web Services Resource Framework (WSRF)

  • When a request is received:

    • Creates a WS-Resource

      • Used to maintain state about each file being replicated

      • Including which operations on the file have failed
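The DRS steps (discover replicas through RLS, transfer through RFT, register the new replica) together with the per-file state kept in the WS-Resource can be sketched as a loop that records which operation failed for each file. This Python sketch is illustrative only. `FileState`, `replicate`, the operation labels `discover` and `transfer`, and the dictionary standing in for RLS are assumptions, not the DRS interface.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class FileState:
    """Per-file replication state, including any operations that failed."""
    logical_name: str
    status: str = "pending"
    failed_operations: list[str] = field(default_factory=list)


def replicate(names: list[str], dest_site: str,
              rls: dict[str, set[str]],
              transfer: Callable[[str, str], str]) -> dict[str, FileState]:
    """Ensure each named file exists at dest_site; return state per file.

    `rls` maps logical names to physical locations (stand-in for RLS).
    `transfer(source, dest_site)` stands in for RFT: it returns the new
    physical location or raises OSError.
    """
    states: dict[str, FileState] = {}
    for name in names:
        state = states[name] = FileState(name)

        # Step 1: discover where the file currently exists.
        sources = sorted(rls.get(name, ()))
        if not sources:
            state.status = "failed"
            state.failed_operations.append("discover")
            continue

        # Step 2: hand the copy to the transfer service.
        try:
            new_location = transfer(sources[0], dest_site)
        except OSError:
            state.status = "failed"
            state.failed_operations.append("transfer")
            continue

        # Step 3: register the new replica so later queries can find it.
        rls[name].add(new_location)
        state.status = "replicated"
    return states
```

Keeping the failed operation per file lets a client retry only the files, and only the steps, that did not complete.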


OGSA

  • Open Grid Services Architecture (OGSA)

  • Product of the Grid community

  • Service oriented

  • OGSA-DAI provides a pure Java data service framework for accessing and integrating data resources


OGSA

  • Defines a set of core capabilities and behaviors:

    • defines uniform exposed service semantics (the Grid service);

    • defines standard mechanisms for creating, naming, and discovering transient Grid service instances;

    • provides location transparency and multiple protocol bindings for service instances;

    • supports integration with underlying native platform facilities.


OGSA and GT4

  • Allows multiple data resources to be accessed through a single service.

  • The listResources() operation returns the identifiers of the data service resources that a service exposes

  • The data service resource identifiers can then be used by a client to obtain metadata and other information

  • Access to data service resource metadata is provided by an implementation of the WS-ResourceProperties specification.

  • The getVersion() operation lets a client query the version of the service.

  • A WSRF version of the OGSA-DAI GridDataTransport portType supports asynchronous data delivery between data services.


Summary

  • GridFTP

  • RLS

  • RFT

  • DRS

  • OGSA


Resources

  • http://www.globus.org/alliance/publications/papers/ogsa.pdf

  • http://www.globus.org/toolkit/data/

  • http://www.globus.org/toolkit/data/gridftp/

  • http://www.globus.org/toolkit/data/rft/

  • http://www.globus.org/toolkit/data/rls/

