Data management in grid. Comparative analysis of storage systems in WLCG.

Really Two Data Problems
  • The amount of data
    • High-performance tools needed to manage the huge raw volume of data
      • Store it
      • Move it
    • Measure in terabytes, petabytes, and ???
  • The number of data files
    • High-performance tools needed to manage the huge number of filenames
      • 10^12 filenames are expected soon
      • A collection of 10^12 of anything is a lot to handle efficiently
Data Questions on the Grid
  • Questions you want Grid tools to address
  • Where are the files I want?
  • How to move data/files to where I want?
Data intensive applications
  • Medical and biomedical:
    • Image processing (digital X-ray image analysis)
    • Simulation for radiation therapy
  • Climate studies
  • Physics:
    • High Energy and other accelerator physics
    • Theoretical physics, lattice calculations of all sorts
  • Material sciences
LHC as a data source
  • 500 MB/sec
  • 15 PB/year
  • 15 years
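
The three LHC figures can be cross-checked with a little arithmetic. The sketch below assumes decimal units (1 MB = 10^6 bytes, 1 PB = 10^15 bytes) and a rate sustained around the clock for a 365-day year; neither assumption is stated on the slide.

```python
# Back-of-the-envelope check of the LHC figures quoted on the slide.
# Assumptions (not from the slide): decimal units, continuous running.
MB = 10**6
PB = 10**15
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000 s

rate_bytes_per_s = 500 * MB
per_year_pb = rate_bytes_per_s * SECONDS_PER_YEAR / PB

# 15 PB/year kept up for 15 years, both numbers as quoted on the slide.
lifetime_pb = 15 * 15

print(f"{per_year_pb:.1f} PB/year at a sustained 500 MB/s")
print(f"{lifetime_pb} PB accumulated over 15 years at 15 PB/year")
```

Under these assumptions, a sustained 500 MB/sec comes to roughly 15.8 PB in a year, which is in line with the quoted 15 PB/year.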

A Model Architecture for Data Grids

[Figure: model architecture for data grids. Labels: Application/Data Management System; Attribute Specification; Metadata Catalog; Logical Collection and Logical File Name; Replica Catalog; Multiple Locations; Replica Selection; Selected Replica; MDS; Performance Information and Predictions; SRM commands; Disk Cache; Tape Library; Disk Array; Disk Cache; Replica Location 1; Replica Location 2; Replica Location 3]
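
The flow suggested by the model architecture (a logical file name is looked up in a replica catalog, then one replica is selected using performance predictions) could look roughly like the sketch below. The catalog contents, host names, and bandwidth numbers are hypothetical, and choosing the highest predicted bandwidth is only one possible selection policy.

```python
from urllib.parse import urlsplit

# Hypothetical replica catalog: logical file name -> replica locations.
REPLICA_CATALOG = {
    "lfn:/grid/myvo/data/run1.root": [
        "srm://se1.example.org/myvo/data/run1.root",
        "srm://se2.example.org/myvo/data/run1.root",
        "srm://se3.example.org/myvo/data/run1.root",
    ],
}

# Hypothetical performance predictions (MB/s) per storage host, standing in
# for what an information service might publish.
PREDICTED_MB_PER_S = {
    "se1.example.org": 40.0,
    "se2.example.org": 95.0,
    "se3.example.org": 10.0,
}


def select_replica(lfn, catalog, predictions):
    """Return the replica whose host has the best predicted bandwidth."""
    replicas = catalog.get(lfn, [])
    if not replicas:
        raise LookupError(f"no replicas registered for {lfn}")
    # Hosts without a prediction are treated as the slowest option.
    return max(replicas,
               key=lambda url: predictions.get(urlsplit(url).hostname, 0.0))


best = select_replica("lfn:/grid/myvo/data/run1.root",
                      REPLICA_CATALOG, PREDICTED_MB_PER_S)
print(best)
```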

SRM: Main concepts
  • Space reservations
  • Dynamic space management
  • Pinning files in spaces
  • Support abstract concept of a file name: Site URL
  • Temporary assignment of file names for transfer: Transfer URL
  • Directory management and authorization
  • Transfer protocol negotiation
  • Support for peer-to-peer requests
  • Support for asynchronous multi-file requests
  • Support abort, suspend, and resume operations
  • Non-interference with local policies

Storage properties
  • Access Latency (ONLINE, NEARLINE, OFFLINE)
  • Retention Policy (REPLICA, OUTPUT, CUSTODIAL)
Use cases
  • Access Latency (ONLINE, NEARLINE, OFFLINE)
  • Retention Policy (REPLICA, OUTPUT, CUSTODIAL)
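
The two storage properties can be modelled as a pair of enumerations. In the sketch below, the TxDy labels (x copies on tape, y copies on disk) are the shorthand commonly used in WLCG for three of the combinations; that mapping is background knowledge and an assumption here, since the slides list only the property values.

```python
from enum import Enum


class AccessLatency(Enum):
    ONLINE = "ONLINE"
    NEARLINE = "NEARLINE"
    OFFLINE = "OFFLINE"


class RetentionPolicy(Enum):
    REPLICA = "REPLICA"
    OUTPUT = "OUTPUT"
    CUSTODIAL = "CUSTODIAL"


# Assumption (not on the slides): common WLCG shorthand for the
# combinations in widespread use.
STORAGE_CLASSES = {
    (RetentionPolicy.CUSTODIAL, AccessLatency.NEARLINE): "T1D0",
    (RetentionPolicy.CUSTODIAL, AccessLatency.ONLINE): "T1D1",
    (RetentionPolicy.REPLICA, AccessLatency.ONLINE): "T0D1",
}


def storage_class(retention, latency):
    """Return the shorthand label, or None for combinations without one."""
    return STORAGE_CLASSES.get((retention, latency))


print(storage_class(RetentionPolicy.CUSTODIAL, AccessLatency.NEARLINE))
```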

Logical File Name (LFN)
  • Also called a User Alias
  • When the LCG File Catalog is used, LFNs are organized in a hierarchical, directory-like structure and have the following format: lfn:/grid/<MyVO>/<MyDirs>/<MyFile>
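
The LFN format above can be built and taken apart with plain string handling. This is a minimal sketch: the helper names `build_lfn` and `parse_lfn` and the example VO and directory names are hypothetical, and no catalog is contacted.

```python
from typing import NamedTuple, Tuple

LFN_PREFIX = "lfn:/grid/"


class LogicalFileName(NamedTuple):
    vo: str
    dirs: Tuple[str, ...]
    filename: str


def build_lfn(vo, dirs, filename):
    """Assemble lfn:/grid/<MyVO>/<MyDirs>/<MyFile> from its components."""
    return LFN_PREFIX + "/".join([vo, *dirs, filename])


def parse_lfn(lfn):
    """Split an LFN into VO, directory components, and file name."""
    if not lfn.startswith(LFN_PREFIX):
        raise ValueError(f"not an LFN in the expected format: {lfn!r}")
    parts = lfn[len(LFN_PREFIX):].split("/")
    # At least a VO and a file name are required; empty components are invalid.
    if len(parts) < 2 or any(part == "" for part in parts):
        raise ValueError(f"malformed LFN: {lfn!r}")
    return LogicalFileName(parts[0], tuple(parts[1:-1]), parts[-1])


lfn = build_lfn("dteam", ["user", "run1"], "test.10193")
print(lfn)
print(parse_lfn(lfn))
```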

Site URL and Transfer URL
  • Provide: Site URL (SURL)
    • URL known externally – e.g. in Replica Catalogs
    • e.g. srm://ibm.cnaf.infn.it:8444/dteam/test.10193
  • Get back: Transfer URL (TURL)
    • Path can be different from SURL – SRM internal mapping
    • Protocol chosen by SRM based on request protocol preference
    • e.g. gsiftp://ibm139.cnaf.infn.it:2811//gpfs/sto1/dteam/test.10193
  • One SURL can have many TURLs
    • Files can be replicated in multiple storage components
    • Files may be in near-line and/or on-line storage
    • In a light-weight SRM (a single file system on disk) the SURL may be the same as the TURL except for the protocol
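
The difference between the two example URLs is easy to see once they are split into components. The sketch below uses only Python's standard `urllib.parse`; `lightweight_turl` is a hypothetical helper that illustrates the light-weight case (same host and path, different protocol) and is not how a real SRM computes a TURL.

```python
from urllib.parse import urlsplit


def describe(url):
    """Break a SURL or TURL into protocol, host, port, and path."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,
        "path": parts.path,
    }


def lightweight_turl(surl, protocol):
    """Hypothetical light-weight mapping: swap only the protocol."""
    return urlsplit(surl)._replace(scheme=protocol).geturl()


surl = "srm://ibm.cnaf.infn.it:8444/dteam/test.10193"
turl = "gsiftp://ibm139.cnaf.infn.it:2811//gpfs/sto1/dteam/test.10193"

print(describe(surl))
print(describe(turl))
print(lightweight_turl(surl, "gsiftp"))
```

In the slide's example, the host, port, and path all differ between SURL and TURL, which is the general case where the SRM applies its own internal mapping.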

Third party transfer
  • Controller can be separate from src/dest

[Figure labels: Client; Control channels; Server (Site A); Server (Site B); Data channel]

Lecture 4: Grid Data Management

Going fast – parallel streams
  • Use several data channels

[Figure labels: Site A; Site B; Server; Control channel; Data channels]
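
The idea of using several data channels can be illustrated by cutting a file into byte ranges and moving each range on its own worker. The sketch below is an in-memory stand-in, not GridFTP: `split_ranges` and `parallel_copy` are hypothetical helpers, and threads copying slices of a byte string take the place of network streams.

```python
from concurrent.futures import ThreadPoolExecutor


def split_ranges(size, streams):
    """Split [0, size) into up to `streams` contiguous (offset, length) blocks."""
    if streams < 1:
        raise ValueError("need at least one stream")
    base, extra = divmod(size, streams)
    ranges, offset = [], 0
    for i in range(streams):
        # The first `extra` blocks carry one additional byte each.
        length = base + (1 if i < extra else 0)
        if length:
            ranges.append((offset, length))
        offset += length
    return ranges


def parallel_copy(src, streams):
    """Copy `src` using one worker per byte range (stand-in for data channels)."""
    dst = bytearray(len(src))

    def send(block):
        offset, length = block
        # Each worker writes only its own range, so writes never overlap.
        dst[offset:offset + length] = src[offset:offset + length]

    with ThreadPoolExecutor(max_workers=streams) as pool:
        list(pool.map(send, split_ranges(len(src), streams)))
    return bytes(dst)


data = bytes(range(256)) * 100
print(split_ranges(10, 4))
print(parallel_copy(data, 4) == data)
```

Because every block carries its own offset, blocks can arrive in any order and still be written to the right place, which is what allows the streams to run independently.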

Interoperability in SRM v2.2

[Figure labels: Client; User/application; CASTOR; dCache; DPM; BeStMan; Disk; xrootd; SRB(iRODS); BNL; SLAC; LBNL; SDSC; SINICA; LBNL; EGEE]

CASTOR Architecture

[Figure labels: RFIO Client; NAME server; VDQM server; CUPV; STAGER; RTCPD (TAPE MOVER); RFIOD (DISK MOVER); TPDAEMON (PVR); VOLUME manager; MSGD; DISK POOL]

DPM
  • Standard Storage Interface
  • Very important to back up!
    • DPM config
    • All requests (SRM, transfers…)
    • Namespace
    • Authorization
    • Replicas
  • Store physical files
  • Can all be installed on a single machine

EOS: What is it ...
  • Easy to use standalone disk-only storage for user and group data with in-memory namespace
    • Few ms read/write open latency
    • Focusing on end-user analysis with chaotic access
    • Based on XROOT server plugin architecture
    • Adopting ideas implemented in Hadoop, XROOT, Lustre et al.
    • Running on low cost hardware
      • no high-end storage
    • At CERN: Complementary to CASTOR
EOS: Access Protocol
  • EOS uses XROOT as primary file access protocol
    • The XROOT framework allows flexibility for enhancements
  • Protocol choice is not the key to performance as long as it implements the required operations
    • Client caching matters most
  • Actively developed, towards full integration in ROOT (rewrite of XRootD client at CERN)
  • SRM and GridFTP provided as well
    • BeStMan, GridFTP-to-XROOT gateway
Thank you

Grid, Storage and SRM. OSG

Managed Data Storage and Data Access Services for Data Grids. M. Ernst, P. Fuhrmann, T. Mkrtchyan (DESY); J. Bakken, I. Fisk, T. Perelmutov, D. Petravick (Fermilab)

dCache. Dmitry Litvintsev, Fermilab. OSG Storage Forum, September 21, 2010

GridFTP: File Transfer Protocol in Grid Computing Networks. Caitlin Minteer

Light weight Disk Pool Manager status and plans. Jean-Philippe Baud, IT-GD, CERN

Storage and Data Management in EGEE. Graeme A Stewart, David Cameron, Greig A Cowan and Gavin McCance

and many others
