
Data management in grid. Comparative analysis of storage systems in WLCG.


Presentation Transcript


  1. Data management in grid. Comparative analysis of storage systems in WLCG.

  2. Really Two Data Problems • The amount of data • High-performance tools needed to manage the huge raw volume of data • Store it • Move it • Measure in terabytes, petabytes, and ??? • The number of data files • High-performance tools needed to manage the huge number of filenames • 10^12 filenames are expected soon • A collection of 10^12 of anything is a lot to handle efficiently

  3. Data Questions on the Grid • Questions you want Grid tools to address • Where are the files I want? • How do I move data/files to where I want them?

  4. Data intensive applications • Medical and biomedical: image processing (digital X-ray image analysis), simulation for radiation therapy • Climate studies • Physics: high energy and other accelerator physics; theoretical physics, lattice calculations of all sorts • Material sciences

  5. LHC as a data source • 500 MB/sec • 15 PB/year • 15 years
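The three figures on this slide can be cross-checked with a little arithmetic. The sketch below assumes a constant rate all year round and decimal units (1 MB = 10^6 bytes, 1 PB = 10^15 bytes); neither assumption is stated on the slide, and the function name is made up for illustration.

```python
# Assumption: the accelerator writes data continuously, all year.
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # about 3.16e7 seconds


def yearly_volume_pb(rate_mb_per_s: float) -> float:
    """Petabytes accumulated in one year at a constant rate (decimal units)."""
    bytes_per_year = rate_mb_per_s * 1e6 * SECONDS_PER_YEAR
    return bytes_per_year / 1e15


per_year = yearly_volume_pb(500)
print(f"{per_year:.1f} PB/year")                 # close to the 15 PB/year on the slide
print(f"{per_year * 15:.0f} PB over 15 years")
```

Under these assumptions 500 MB/sec works out to a little under 16 PB/year, which is in line with the 15 PB/year quoted on the slide.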

  6. A Model Architecture for Data Grids [diagram] • Application / Data Management System • Metadata Catalog (attribute specification; logical collection and logical file name) • Replica Catalog (multiple locations) • Replica Selection (MDS performance information and predictions; selected replica) • SRM commands • Replica Locations 1, 2, 3 (disk caches, disk array, tape library)

  7. SRM: Main concepts • Space reservations • Dynamic space management • Pinning files in spaces • Support for the abstract concept of a file name: Site URL • Temporary assignment of file names for transfer: Transfer URL • Directory management and authorization • Transfer protocol negotiation • Support for peer-to-peer requests • Support for asynchronous multi-file requests • Support for abort, suspend, and resume operations • Non-interference with local policies
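Space reservation and pinning can be illustrated with a toy in-memory model. This is only a sketch of the idea, not the SRM interface: the class `ToySpace`, its methods, and the eviction order are all hypothetical. A reserved space has a fixed capacity, unpinned files may be evicted to make room, and a pinned file stays until its pin expires or is released.

```python
class ToySpace:
    """Hypothetical model of a reserved space holding pinned and unpinned files."""

    def __init__(self, capacity_bytes: int):
        self.capacity = capacity_bytes   # size of the space reservation
        self.files = {}                  # file name -> size in bytes
        self.pins = {}                   # file name -> pin expiry time (seconds)

    def used(self) -> int:
        return sum(self.files.values())

    def is_pinned(self, name: str, now: float) -> bool:
        return self.pins.get(name, float("-inf")) > now

    def pin(self, name: str, lifetime_s: float, now: float) -> None:
        if name not in self.files:
            raise KeyError(name)
        self.pins[name] = now + lifetime_s

    def release(self, name: str) -> None:
        self.pins.pop(name, None)

    def put(self, name: str, size: int, now: float) -> None:
        """Store a file, evicting unpinned files (oldest first) if needed."""
        self.files.pop(name, None)       # replacing an existing copy
        for victim in list(self.files):
            if self.used() + size <= self.capacity:
                break
            if not self.is_pinned(victim, now):
                del self.files[victim]
                self.pins.pop(victim, None)
        if self.used() + size > self.capacity:
            raise RuntimeError("space full: remaining files are pinned")
        self.files[name] = size


space = ToySpace(capacity_bytes=100)
space.put("a", 60, now=0)
space.pin("a", lifetime_s=10, now=0)
# While "a" is pinned, a 60-byte file does not fit; after the pin expires it does.
space.put("b", 60, now=20)
print(sorted(space.files))
```

Time is passed in explicitly as `now` so the behaviour is easy to follow; a real service would track pin lifetimes itself.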

  8. Storage properties • Access Latency (ONLINE, NEARLINE, OFFLINE) • Retention Policy (REPLICA, OUTPUT, CUSTODIAL)
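The two storage properties are small closed sets of values, so they map naturally onto enumerations. The sketch below uses the value names from the slide; the class and function names are invented for illustration and do not come from any SRM client library.

```python
from dataclasses import dataclass
from enum import Enum


class AccessLatency(Enum):
    ONLINE = "ONLINE"
    NEARLINE = "NEARLINE"
    OFFLINE = "OFFLINE"


class RetentionPolicy(Enum):
    REPLICA = "REPLICA"
    OUTPUT = "OUTPUT"
    CUSTODIAL = "CUSTODIAL"


@dataclass(frozen=True)
class StorageProperties:
    """One access latency paired with one retention policy."""
    latency: AccessLatency
    retention: RetentionPolicy


def parse_properties(latency: str, retention: str) -> StorageProperties:
    """Build a StorageProperties from two strings; raises KeyError on unknown names."""
    return StorageProperties(
        AccessLatency[latency.upper()],
        RetentionPolicy[retention.upper()],
    )


print(parse_properties("nearline", "custodial"))
```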

  9. Use cases • Access Latency (ONLINE, NEARLINE, OFFLINE) • Retention Policy (REPLICA, OUTPUT, CUSTODIAL)

  10. Logical File Name (LFN) • Also called a User Alias • When the LCG File Catalog is used, LFNs are organized in a hierarchical, directory-like structure and have the following format: lfn:/grid/<MyVO>/<MyDirs>/<MyFile>
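The LFN format above can be built and taken apart with plain string handling. This is a minimal sketch: the helper names `build_lfn` and `parse_lfn` are hypothetical, and the VO, directory, and file names in the example are placeholders.

```python
from typing import NamedTuple, Sequence, Tuple

LFN_PREFIX = "lfn:/grid/"


class LFN(NamedTuple):
    vo: str
    dirs: Tuple[str, ...]
    filename: str


def build_lfn(vo: str, dirs: Sequence[str], filename: str) -> str:
    """Assemble lfn:/grid/<MyVO>/<MyDirs>/<MyFile> from its parts."""
    return LFN_PREFIX + "/".join([vo, *dirs, filename])


def parse_lfn(lfn: str) -> LFN:
    """Split an LFN into VO, directory components, and file name."""
    if not lfn.startswith(LFN_PREFIX):
        raise ValueError(f"not an LFN: {lfn!r}")
    parts = lfn[len(LFN_PREFIX):].split("/")
    if len(parts) < 2 or not all(parts):
        raise ValueError(f"malformed LFN: {lfn!r}")
    return LFN(parts[0], tuple(parts[1:-1]), parts[-1])


lfn = build_lfn("dteam", ["user", "run1"], "test.10193")
print(lfn)
print(parse_lfn(lfn))
```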

  11. Site URL and Transfer URL • Provide: Site URL (SURL) – URL known externally, e.g. in Replica Catalogs – e.g. srm://ibm.cnaf.infn.it:8444/dteam/test.10193 • Get back: Transfer URL (TURL) – Path can be different from SURL (SRM internal mapping) – Protocol chosen by SRM based on request protocol preference – e.g. gsiftp://ibm139.cnaf.infn.it:2811//gpfs/sto1/dteam/test.10193 • One SURL can have many TURLs – Files can be replicated in multiple storage components – Files may be in near-line and/or on-line storage – In a light-weight SRM (a single file system on disk) the SURL may be the same as the TURL except for the protocol
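The SURL-to-TURL step can be sketched as a lookup driven by the client's protocol preference. The two URLs are the examples from the slide; the mapping table `TOY_TURLS` and the function `resolve_turl` are hypothetical stand-ins for the SRM's internal mapping, not a real client API.

```python
from urllib.parse import urlsplit

SURL = "srm://ibm.cnaf.infn.it:8444/dteam/test.10193"

# Hypothetical internal mapping: SURL -> {protocol: TURL}.
TOY_TURLS = {
    SURL: {
        "gsiftp": "gsiftp://ibm139.cnaf.infn.it:2811//gpfs/sto1/dteam/test.10193",
    },
}


def resolve_turl(surl: str, preferred_protocols: list) -> str:
    """Return the TURL for the first preferred protocol the toy SRM offers."""
    if urlsplit(surl).scheme != "srm":
        raise ValueError(f"not a SURL: {surl!r}")
    available = TOY_TURLS[surl]
    for protocol in preferred_protocols:
        if protocol in available:
            return available[protocol]
    raise LookupError("no common transfer protocol")


# The client prefers rfio but also accepts gsiftp; only gsiftp is offered here.
turl = resolve_turl(SURL, ["rfio", "gsiftp"])
s, t = urlsplit(SURL), urlsplit(turl)
print(s.hostname, s.port, s.path)
print(t.scheme, t.hostname, t.port, t.path)
```

Note that host, port, and path all differ between the SURL and the TURL, which is the point of keeping the externally known name separate from the transfer endpoint.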

  12. Third party transfer • Controller can be separate from src/dest • [Diagram: Site A, Site B, client, two servers, control channels, data channel] Lecture 4: Grid Data Management

  13. Going fast – parallel streams • Use several data channels • [Diagram: Site A, Site B, server, control channel, data channels]
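The idea of several data channels can be sketched by splitting a file into contiguous byte ranges and sending each range on its own stream. The code below only illustrates the partitioning, using threads and an in-memory buffer in place of network channels; it is not the GridFTP wire protocol, and both function names are made up for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def split_ranges(total_size: int, streams: int) -> list:
    """Split [0, total_size) into at most `streams` contiguous (start, end) ranges."""
    if streams < 1:
        raise ValueError("need at least one stream")
    base, extra = divmod(total_size, streams)
    ranges, start = [], 0
    for i in range(streams):
        length = base + (1 if i < extra else 0)
        if length == 0:
            continue
        ranges.append((start, start + length))
        start += length
    return ranges


def parallel_copy(source: bytes, streams: int) -> bytes:
    """Copy `source` using one worker per byte range (stand-in for data channels)."""
    dest = bytearray(len(source))

    def send(byte_range):
        start, end = byte_range
        dest[start:end] = source[start:end]  # each worker writes a disjoint slice

    with ThreadPoolExecutor(max_workers=streams) as pool:
        list(pool.map(send, split_ranges(len(source), streams)))
    return bytes(dest)


print(split_ranges(10, 4))
print(parallel_copy(b"grid data management", 4))
```

Because the ranges are disjoint, the workers never write to the same bytes, so the pieces can arrive in any order.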

  14. Interoperability in SRM v2.2 [diagram] • Client: user/application • Storage systems: CASTOR, dCache, DPM, BeStMan, xrootd, SRB (iRODS), disk • Sites and projects: BNL, SLAC, LBNL, SDSC, SINICA, EGEE

  15. Total Online Space Share

  16. Popularity

  17. CASTOR Architecture [diagram] • RFIO Client • NAME server • STAGER • CUPV • VDQM server • VOLUME manager • MSGD • RTCPD (TAPE MOVER) • TPDAEMON (PVR) • RFIOD (DISK MOVER) • DISK POOL

  18. Basic dCache Design

  19. DPM • DPM config • All requests (SRM, transfers…) • Namespace • Authorization • Replicas • Very important to back up! • Standard Storage Interface • Store physical files • Can all be installed on a single machine

  20. EOS: What is it ... • Easy to use standalone disk-only storage for user and group data with in-memory namespace – Few ms read/write open latency – Focusing on end-user analysis with chaotic access – Based on XROOT server plugin architecture – Adopting ideas implemented in Hadoop, XROOT, Lustre et al. – Running on low cost hardware (no high-end storage) – At CERN: complementary to CASTOR

  21. EOS: Access Protocol • EOS uses XROOT as primary file access protocol – The XROOT framework allows flexibility for enhancements • Protocol choice is not the key to performance as long as it implements the required operations – Client caching matters most • Actively developed, towards full integration in ROOT (rewrite of XRootD client at CERN) • SRM and GridFTP provided as well – BeStMan, GridFTP-to-XROOT gateway

  22. Thank you • Grid, Storage and SRM. OSG • Managed Data Storage and Data Access Services for Data Grids. M. Ernst, P. Fuhrmann, T. Mkrtchyan (DESY); J. Bakken, I. Fisk, T. Perelmutov, D. Petravick (Fermilab) • dCache. Dmitry Litvintsev, Fermilab. OSG Storage Forum, September 21, 2010 • GridFTP: File Transfer Protocol in Grid Computing Networks. Caitlin Minteer • Light weight Disk Pool Manager status and plans. Jean-Philippe Baud, IT-GD, CERN • Storage and Data Management in EGEE. Graeme A Stewart, David Cameron, Greig A Cowan and Gavin McCance • and many others
