Data management in grid. Comparative analysis of storage systems in WLCG.



Really Two Data Problems

  • The amount of data

    • High-performance tools needed to manage the huge raw volume of data

      • Store it

      • Move it

    • Measure in terabytes, petabytes, and ???

  • The number of data files

    • High-performance tools needed to manage the huge number of filenames

      • 10^12 filenames are expected soon

      • A collection of 10^12 of anything is a lot to handle efficiently
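To give a feel for the second problem, here is a rough back-of-the-envelope sketch. The 100 bytes per catalogue entry and the scan rate of one million entries per second are assumptions chosen for illustration only; they do not come from the slides.

```python
# Rough scale estimate for a catalogue of 10^12 filenames.
# ASSUMPTIONS (not from the slides): ~100 bytes per catalogue entry,
# and a scan rate of 1,000,000 entries per second.

FILE_COUNT = 10**12
BYTES_PER_ENTRY = 100          # hypothetical average entry size
ENTRIES_PER_SECOND = 10**6     # hypothetical scan rate


def catalogue_size_tb(file_count: int, bytes_per_entry: int) -> float:
    """Size of the bare catalogue in terabytes (1 TB = 10^12 bytes)."""
    return file_count * bytes_per_entry / 1e12


def full_scan_days(file_count: int, entries_per_second: int) -> float:
    """Days needed to visit every entry once at the given rate."""
    return file_count / entries_per_second / 86_400


if __name__ == "__main__":
    print(catalogue_size_tb(FILE_COUNT, BYTES_PER_ENTRY), "TB of names alone")
    print(round(full_scan_days(FILE_COUNT, ENTRIES_PER_SECOND), 1), "days per full scan")
```

Under these assumptions the names alone occupy about 100 TB and a single linear scan takes more than a week, which is why the number of files is a problem in its own right.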


Data Questions on the Grid

  • Questions you want Grid tools to address

  • Where are the files I want?

  • How to move data/files to where I want?


Data intensive applications

Medical and biomedical:

Image processing (digital X-ray image analysis)

Simulation for radiation therapy

Climate studies

Physics:

High Energy and other accelerator physics

Theoretical physics, lattice calculations of all sorts

Material sciences



LHC as a data source

500 MB/sec

15 PB/year

15 years
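The three figures above can be related with simple arithmetic. The sketch below assumes they denote a recording rate, a yearly data volume, and an operating lifetime, and it uses decimal units (1 MB = 10^6 bytes, 1 PB = 10^15 bytes); both the reading and the units are assumptions, not statements from the slide.

```python
# Relating the slide's numbers, under the assumptions stated above.

RATE_BYTES_PER_SEC = 500 * 10**6      # 500 MB/sec
VOLUME_BYTES_PER_YEAR = 15 * 10**15   # 15 PB/year
LIFETIME_YEARS = 15


def seconds_to_accumulate(volume_bytes: int, rate_bytes_per_sec: int) -> int:
    """Seconds of sustained recording needed to reach the given volume."""
    return volume_bytes // rate_bytes_per_sec


def lifetime_volume_pb(volume_bytes_per_year: int, years: int) -> int:
    """Total volume over the whole lifetime, in petabytes."""
    return volume_bytes_per_year * years // 10**15


if __name__ == "__main__":
    secs = seconds_to_accumulate(VOLUME_BYTES_PER_YEAR, RATE_BYTES_PER_SEC)
    print(secs, "seconds of recording per year")
    print(lifetime_volume_pb(VOLUME_BYTES_PER_YEAR, LIFETIME_YEARS), "PB over the lifetime")
```

At a sustained 500 MB/sec, 15 PB corresponds to 3 × 10^7 seconds of recording, and 15 years at 15 PB/year add up to 225 PB.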


A Model Architecture for Data Grids

[Figure: architecture diagram. Labels: Application/Data Management System; Attribute Specification; Metadata Catalog; Logical Collection and Logical File Name; Replica Catalog; Multiple Locations; Replica Selection; MDS; Performance Information and Predictions; Selected Replica; SRM commands; Replica Location 1, Replica Location 2, Replica Location 3; Disk Cache, Disk Array, Tape Library.]


SRM: Main concepts

Space reservations

Dynamic space management

Pinning files in spaces

Support abstract concept of a file name: Site URL

Temporary assignment of file names for transfer: Transfer URL

Directory management and authorization

Transfer protocol negotiation

Support for peer-to-peer requests

Support for asynchronous multi-file requests

Support abort, suspend, and resume operations

Non-interference with local policies


Storage properties

  • Access Latency (ONLINE, NEARLINE, OFFLINE)

  • Retention Policy (REPLICA, OUTPUT, CUSTODIAL)
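The two properties combine into a storage class for a file or space. The sketch below models them as Python enums; the class and method names (`AccessLatency`, `RetentionPolicy`, `StorageClass`, `label`) are hypothetical and chosen for illustration, and the comments on each value reflect the usual reading of the SRM v2.2 terms rather than anything stated on the slide.

```python
from dataclasses import dataclass
from enum import Enum


class AccessLatency(Enum):
    ONLINE = "ONLINE"      # data can be read immediately (typically on disk)
    NEARLINE = "NEARLINE"  # data must be staged first (typically from tape)
    OFFLINE = "OFFLINE"    # data needs manual intervention to become available


class RetentionPolicy(Enum):
    REPLICA = "REPLICA"      # lowest retention quality; copies exist elsewhere
    OUTPUT = "OUTPUT"        # intermediate retention quality
    CUSTODIAL = "CUSTODIAL"  # highest retention quality


@dataclass(frozen=True)
class StorageClass:
    """Hypothetical pairing of the two properties."""
    latency: AccessLatency
    retention: RetentionPolicy

    def label(self) -> str:
        return f"{self.retention.value}-{self.latency.value}"


if __name__ == "__main__":
    tape_archive = StorageClass(AccessLatency.NEARLINE, RetentionPolicy.CUSTODIAL)
    print(tape_archive.label())
```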


Use cases

Access Latency (ONLINE, NEARLINE, OFFLINE)

Retention Policy (REPLICA, OUTPUT, CUSTODIAL)


Logical File Name (LFN)

Also called a User Alias.

When the LCG File Catalog is used, LFNs are organized in a hierarchical, directory-like structure and have the following format: lfn:/grid/<MyVO>/<MyDirs>/<MyFile>
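The LFN format can be illustrated with a small parser. This is a minimal sketch: the function name `parse_lfn`, the `LogicalFileName` type, and the example path are hypothetical, and the sketch assumes that the directory part may be empty.

```python
from typing import NamedTuple, Tuple


class LogicalFileName(NamedTuple):
    vo: str                       # <MyVO>
    directories: Tuple[str, ...]  # <MyDirs>, possibly empty (assumption)
    filename: str                 # <MyFile>


def parse_lfn(lfn: str) -> LogicalFileName:
    """Split an LFN of the form lfn:/grid/<MyVO>/<MyDirs>/<MyFile>."""
    prefix = "lfn:/grid/"
    if not lfn.startswith(prefix):
        raise ValueError(f"not an LFN in the /grid namespace: {lfn!r}")
    parts = [p for p in lfn[len(prefix):].split("/") if p]
    if len(parts) < 2:
        raise ValueError(f"LFN needs at least a VO and a file name: {lfn!r}")
    return LogicalFileName(parts[0], tuple(parts[1:-1]), parts[-1])


if __name__ == "__main__":
    # Hypothetical example path, for illustration only.
    print(parse_lfn("lfn:/grid/dteam/tests/run1/test.10193"))
```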


Site URL and Transfer URL

Provide: Site URL (SURL)

URL known externally – e.g. in Replica Catalogs

e.g. srm://ibm.cnaf.infn.it:8444/dteam/test.10193

Get back: Transfer URL (TURL)

Path can be different from SURL – SRM internal mapping

Protocol chosen by SRM based on request protocol preference

e.g. gsiftp://ibm139.cnaf.infn.it:2811//gpfs/sto1/dteam/test.10193

One SURL can have many TURLs

Files can be replicated in multiple storage components

Files may be in near-line and/or on-line storage

In a light-weight SRM (a single file system on disk) SURL may be the same as TURL except protocol
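The difference between the two example URLs above can be made explicit by splitting them into components. The sketch also shows one possible way to pick a transfer protocol from a client preference list; `choose_protocol` and the protocol lists are assumptions for illustration and do not reproduce the SRM negotiation itself.

```python
from typing import Iterable, Optional, Set
from urllib.parse import urlsplit

# Example URLs taken from the slide.
SURL = "srm://ibm.cnaf.infn.it:8444/dteam/test.10193"
TURL = "gsiftp://ibm139.cnaf.infn.it:2811//gpfs/sto1/dteam/test.10193"


def describe(url: str) -> dict:
    """Break a URL into protocol, host, port and path."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,
        "path": parts.path,
    }


def choose_protocol(client_preference: Iterable[str],
                    server_supported: Set[str]) -> Optional[str]:
    """Hypothetical selection: first client-preferred protocol the server offers."""
    for protocol in client_preference:
        if protocol in server_supported:
            return protocol
    return None


if __name__ == "__main__":
    print(describe(SURL))
    print(describe(TURL))
    print(choose_protocol(["xroot", "gsiftp"], {"gsiftp", "rfio"}))
```

In this example the SURL and TURL differ in protocol, host, port, and path prefix, while the trailing `/dteam/test.10193` is shared.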


Third party transfer

  • Controller can be separate from src/dest

[Figure: diagram. Labels: Client; Control channels; Server (Site A); Server (Site B); Data channel.]

Lecture 4: Grid Data Management


Going fast – parallel streams

  • Use several data channels

[Figure: diagram. Labels: Site A; Site B; Server; Control channel; Data channels.]
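One way to picture several data channels is to divide a file into contiguous byte ranges, one per stream. The sketch below shows only that partitioning idea; the function name `split_into_ranges` is hypothetical, and this is not how any particular transfer tool frames data on the wire.

```python
from typing import List, Tuple


def split_into_ranges(file_size: int, streams: int) -> List[Tuple[int, int]]:
    """Split [0, file_size) into up to `streams` contiguous (start, end) ranges.

    `end` is exclusive. Ranges differ in length by at most one byte, and
    empty ranges are dropped when there are more streams than bytes.
    """
    if file_size < 0 or streams < 1:
        raise ValueError("file_size must be >= 0 and streams >= 1")
    base, extra = divmod(file_size, streams)
    ranges = []
    offset = 0
    for i in range(streams):
        length = base + (1 if i < extra else 0)
        if length == 0:
            continue
        ranges.append((offset, offset + length))
        offset += length
    return ranges


if __name__ == "__main__":
    # Each range could be handed to a separate data channel.
    for start, end in split_into_ranges(10, 4):
        print(f"stream carries bytes {start}..{end - 1}")
```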


Interoperability in SRM v2.2

[Figure: diagram. Labels: Client (User/application); CASTOR; dCache; DPM; BeStMan; xrootd; Disk; SRB(iRODS); sites BNL, SLAC, LBNL, SDSC, SINICA, EGEE.]


Total Online Space Share


Popularity


CASTOR Architecture

[Figure: architecture diagram. Labels: RFIO Client; STAGER; NAME server; VDQM server; VOLUME manager; CUPV; MSGD; RTCPD (TAPE MOVER); RFIOD (DISK MOVER); TPDAEMON (PVR); DISK POOL.]


Basic dCache Design


DPM

  • DPM config

  • All requests (SRM, transfers…)

  • Namespace

  • Authorization

  • Replicas

Very important to back up!

Standard Storage Interface

Store physical files

Can all be installed on a single machine


EOS: What is it ...

  • Easy to use standalone disk-only storage for user and group data with in-memory namespace

    • Few ms read/write open latency

    • Focusing on end-user analysis with chaotic access

    • Based on XROOT server plugin architecture

    • Adopting ideas implemented in Hadoop, XROOT, Lustre et al.

    • Running on low cost hardware

      • no high-end storage

    • At CERN: Complementary to CASTOR


EOS: Access Protocol

  • EOS uses XROOT as primary file access protocol

    • The XROOT framework allows flexibility for enhancements

  • Protocol choice is not the key to performance as long as it implements the required operations

    • Client caching matters most

  • Actively developed, towards full integration in ROOT (rewrite of XRootD client at CERN)

  • SRM and GridFTP provided as well

    • BeStMan, GridFTP-to-XROOT gateway


Thank you

Grid, Storage and SRM. OSG

Managed Data Storage and Data Access Services for Data Grids. M. Ernst, P. Fuhrmann, T. Mkrtchyan DESY

J. Bakken, I. Fisk, T. Perelmutov, D. Petravick Fermilab

dCache. Dmitry Litvintsev, Fermilab. OSG Storage Forum, September 21, 2010

GridFTP: File Transfer Protocol in Grid Computing Networks. Caitlin Minteer

Light weight Disk Pool Manager status and plans. Jean-Philippe Baud, IT-GD, CERN

Storage and Data Management in EGEE. Graeme A Stewart, David Cameron, Greig A Cowan and Gavin McCance

and many others

