Presentation Transcript

Large scale, Grid-enabled, dCache-based Storage System and Service Challenge Practice at BNL

Zhenping (Jane) Liu

RHIC/ATLAS Computing Facility, Physics Department

Brookhaven National Lab

Nov. 12-18 2005, SUPERCOMPUTING 2005, Seattle


Outline

  • Background

  • Large scale, Grid-enabled, dCache-based Disk Storage System at BNL

    • Features

    • System architecture

    • Usage of the system

    • Long-term plan

  • Service Challenge Practice


Background

  • BNL RHIC/ATLAS Computing Facility

    • The Tier-1 computing center for USATLAS

    • Goal

      • Operate a persistent, production-quality Grid capable of marshalling computing and data resources for the USATLAS project.

    • The challenge in providing storage services

      • Local and grid-based access to very large datasets in a reliable, cost-efficient and high-performance manner.


Solution for a Grid-enabled Storage Element at BNL

  • Software

    • dCache (a product of DESY and FNAL)

      • Free

  • Hybrid hardware solution: cost-efficient

    • Most dCache servers share resources with the large number of worker nodes on the Linux farm.

      • Utilize idle disks on worker nodes

      • Each worker node acts as both a storage node and a compute node.

    • Dedicated, higher-quality servers host the small number of more critical services.


USATLAS dCache system at BNL

  • A large, production-quality, grid-enabled storage element

    • Very large disk Cache (dCache) system:

      • 336 servers, 150 TB of disk space.

    • Provides services to store and access very large datasets for all local and Grid ATLAS users.

      • In production service since November 2004.

    • Operates in a reliable, cost-efficient, and high-performance manner

    • Grid-enabled (SRM, GSIFTP) Storage Element in the context of OSG and LCG
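A quick back-of-the-envelope check on the system size quoted above (336 servers, 150 TB) shows how modest the per-node disk contribution is under the hybrid model; the arithmetic below is illustrative, not a figure from the slides:

```python
# Illustrative arithmetic (not from the slides): average pool space per
# server if 150 TB of disk is spread across 336 dCache servers.
servers = 336
total_tb = 150
avg_gb = total_tb * 1000 / servers  # decimal GB per server
print(f"~{avg_gb:.0f} GB of pool space per server on average")
```

A few hundred GB per node is consistent with worker nodes contributing their otherwise idle local disk.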


USATLAS dCache system at BNL (cont.)

  • Features

    • Distributed disk caching system as a front-end for Mass Storage System

    • High performance

    • Reliability

    • Support of various access protocols

    • Cost efficient solution

    • Scalability

    • Flexible system tuning


Distributed disk caching system

  • Distributed disk caching system as a front-end for Mass Storage System (BNL HPSS).

    • Simulating an “infinite” space with tape as backend

    • Allows transparent access to large numbers of data files distributed across disk pools or stored on tape.

      • Provides the users with one unique name-space for all the data files.

      • A selection mechanism determines whether a file is already stored on one or more disk pools or only on tape.
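The lookup-then-stage behaviour described above can be sketched as follows; this is a hypothetical illustration (the pool names, paths, and staging helper are invented, and real dCache chooses among cached copies by cost, not by name):

```python
# Hypothetical sketch (not dCache's actual code) of the namespace lookup
# the slide describes: each file has one namespace entry, and is either
# cached on one or more disk pools or available only on tape (HPSS).

# Maps namespace path -> set of pools currently holding a cached copy.
cache_locations = {
    "/pnfs/usatlas/data/file1": {"pool_a", "pool_c"},
    "/pnfs/usatlas/data/file2": set(),  # on tape only
}

def stage_from_tape(pfn: str) -> str:
    """Placeholder for the transparent HPSS restore; picks an invented pool."""
    return "pool_b"

def open_file(pfn: str) -> str:
    """Return a pool that can serve the file, staging from tape if needed."""
    pools = cache_locations.get(pfn)
    if pools is None:
        raise FileNotFoundError(pfn)
    if pools:                        # already cached on disk
        return sorted(pools)[0]      # real dCache selects by cost, not name
    staged = stage_from_tape(pfn)    # transparent restore from the tape backend
    pools.add(staged)
    return staged

print(open_file("/pnfs/usatlas/data/file1"))  # served from the disk cache
print(open_file("/pnfs/usatlas/data/file2"))  # staged from tape first
```

Either way the caller sees one namespace and never needs to know whether the file was on disk or tape.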


High performance

  • High performance data I/O throughput.

    • Direct client-to-disk (pool) and disk (pool)-to-tape (HPSS) connections.

    • High aggregated data I/O

  • Significantly improves the efficiency of the connected tape storage system through caching (gather and flush) and scheduled staging techniques.

  • Optimized backend tape prestage batch system.


Reliability

  • Load balanced and fault tolerant

    • Automatic load balancing using cost metric and inter pool transfers.

    • Dynamically replicate files upon detection of hot spot.

    • Allows multiple distributed servers of each type

      • e.g., read pools, write pools, access points (doors -- DCAP doors, SRM doors, GridFTP doors).
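The hot-spot replication idea can be illustrated with a small sketch; the threshold, pool names, and load metric below are invented for illustration and are not dCache's actual policy:

```python
# Illustrative sketch (assumed logic, not dCache internals) of hot-spot
# handling: when concurrent reads of a file on its single pool exceed a
# threshold, replicate the file to the least-loaded pool to spread reads.
HOT_THRESHOLD = 3                                # invented threshold

pool_load = {"pool_a": 5, "pool_b": 1}           # active movers per pool
replicas = {"hot_file": ["pool_a"]}              # file -> pools holding it
reads_in_flight = {"hot_file": 4}                # current concurrent reads

def maybe_replicate(name: str) -> list[str]:
    """Replicate a single-copy file to another pool if it is a hot spot."""
    if reads_in_flight.get(name, 0) > HOT_THRESHOLD and len(replicas[name]) == 1:
        target = min(pool_load, key=pool_load.get)   # least-loaded pool
        if target not in replicas[name]:
            replicas[name].append(target)
    return replicas[name]

print(maybe_replicate("hot_file"))  # ['pool_a', 'pool_b']
```

Once a second copy exists, subsequent reads can be balanced across both pools by the same cost metric used for pool selection.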


Support of various access protocols

  • Local access protocol: DCAP (POSIX-like)

  • GSIFTP data transfer protocol

    • Secure wide-area data transfer protocol

  • Storage Resource Manager (SRM) protocol: provides an SRM-based storage element


Cost efficient

  • Free software

  • Hybrid hardware model

    • Most dCache servers share resources with the large number of worker nodes on the Linux farm.

      • Utilizing low-cost, locally mounted disk space on the computing farm

    • Dedicated, higher-quality servers host the small number of more critical services.


Scalability

  • High Scalability

    • Distributed Movers and Access Points (Doors)

    • Highly distributed Storage Pools

    • Direct client-to-disk (pool) and disk (pool)-to-tape (HPSS) connections.


Flexible system tuning

  • The system determines the source or destination storage pool based on:

    • storage group

    • network mask of clients

    • I/O direction

    • “CPU” load

    • disk space

    • configuration
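One way to picture this pool selection is as a cost function over the factors listed above; the weights and formula below are invented for illustration and are not dCache's actual cost module:

```python
# Hedged sketch of cost-based pool selection using two of the tuning
# inputs the slide lists (CPU load and free disk space). The weights and
# the linear formula are invented, not dCache's real cost calculation.
def pool_cost(cpu_load: float, free_fraction: float,
              w_cpu: float = 1.0, w_space: float = 1.0) -> float:
    # Lower is better: busy CPUs and full disks both raise the cost.
    return w_cpu * cpu_load + w_space * (1.0 - free_fraction)

pools = {
    "write_pool_1": pool_cost(cpu_load=0.9, free_fraction=0.2),
    "write_pool_2": pool_cost(cpu_load=0.3, free_fraction=0.6),
}
best = min(pools, key=pools.get)  # pool with the lowest cost wins
print(best)  # write_pool_2
```

Storage group, client network mask, and I/O direction would act as filters that narrow the candidate set before the cost comparison.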


System architecture

  • see the next slide


Slide 15: System architecture (diagram)

[The diagram shows DCap, GridFTP, and SRM clients reaching the dCache system through DCap doors, GridFTP doors, and an SRM door; control channels run through the doors while data channels connect clients directly to the write pools and read pools; the Pnfs Manager and Pool Manager coordinate the system, HPSS provides the tape backend, and an Oak Ridge batch system drives staging.]


Usage of the system

  • Total data volume (production data only)

    • 110 TB of production data stored (as of Nov. 3, 2005)

  • Exhibited high performance during a series of Service Challenges and US ATLAS production runs.


Users and use pattern

  • Clients from BNL on-site

    • Local analysis application

    • Grid production jobs submitted to BNL

    • Other on-site users

  • Off-site grid users

    • GridFTP clients

      • Grid production jobs submitted to remote sites

      • Other grid users

    • SRM clients


Long-term plan

  • To build a petabyte-scale, grid-enabled storage element

    • Use petabyte-scale disk space on thousands of farm nodes to hold the most recently used data on disk.

      • The ATLAS experiment will generate data volumes on the petabyte scale each year.

    • HPSS as tape backup for all data.


Long-term plan (cont.)

  • dCache as grid-enabled distributed storage element solution.

  • Issues to be investigated

    • Is dCache scalable to very large clusters (thousands of nodes)?

    • Will network I/O be a bottleneck for a very large cluster?

    • Monitoring and administration of a petabyte-scale disk storage system.


Service challenge

  • Service Challenge

    • To test the readiness of the overall computing system to provide the necessary computational and storage resources to exploit the scientific potential of the LHC machine.

  • SC2

    • Disk-to-disk transfer from CERN to BNL

  • SC3 throughput phase

    • Disk-to-disk transfer from CERN to BNL

    • Disk-to-tape transfer from CERN to BNL

    • Disk-to-disk transfer from BNL to Tier-2 centers


SC2 at BNL

  • Testbed dCache

    • Four dCache pool servers with a 1-gigabit WAN network connection.

  • Met the throughput targets: disk-to-disk transfer rate of 70-80 MB/s from CERN to BNL.
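For scale, the 70-80 MB/s disk-to-disk rate quoted above corresponds to roughly 6-7 TB per day; a quick illustrative check:

```python
# Illustrative arithmetic: daily volume implied by the SC2 disk-to-disk
# rate of 70-80 MB/s sustained from CERN to BNL.
seconds_per_day = 86_400
for mb_s in (70, 80):
    tb_per_day = mb_s * seconds_per_day / 1e6  # decimal TB per day
    print(f"{mb_s} MB/s is about {tb_per_day:.1f} TB/day")
```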


One day of data transfer during SC2 [throughput chart]


SC3 throughput phase

  • Steering: FTS; control: SRM; transfer protocol: GridFTP

  • The production dCache system was used, with a network upgrade to 10 Gbps between the USATLAS storage system and the BNL BGP router

  • Disk-to-disk transfer from CERN to BNL

    • Achieved rate of 100-120 MB/s, with a peak of 150 MB/s (sustained for one week)

  • Disk-to-tape transfer from CERN to BNL HPSS

    • Achieved rate of 60 MB/s (sustained for one week)

  • Disk-to-disk transfer testing from BNL to Tier-2 centers

    • Tier-2 centers: BU, UC, IU, UTA

    • Aggregate transfer rate of 30-40 MB/s
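The sustained SC3 rates above imply the following approximate weekly volumes (illustrative arithmetic only, not figures from the slides):

```python
# Illustrative arithmetic: weekly volumes implied by the sustained SC3
# transfer rates quoted above.
seconds_per_week = 7 * 86_400  # 604,800 s
for label, mb_s in [("disk-to-disk at 100 MB/s", 100),
                    ("disk-to-disk at 120 MB/s", 120),
                    ("disk-to-tape at 60 MB/s", 60)]:
    tb = mb_s * seconds_per_week / 1e6  # decimal TB per week
    print(f"{label}: about {tb:.1f} TB over one week")
```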


SC3 [transfer-rate chart]


Top daily averages for dCache sites [chart]


Links

  • BNL dCache user guide website

    • http://www.atlasgrid.bnl.gov/dcache/manuals/

  • USATLAS Tier-1 & Tier-2 dCache systems

    • http://www.atlasgrid.bnl.gov/dcache_admin/

  • USATLAS dCache workshop

    • http://agenda.cern.ch/fullAgenda.php?ida=a055146