
INFN-T1 site report

This report provides an overview of the INFN-T1 site, covering its facility infrastructure, network, farming, and storage. It also covers the computing resources, resource usage per VO, and new developments such as GPU computing and storage upgrades.



Presentation Transcript


  1. INFN-T1 site report. Andrea Chierici, on behalf of the INFN-T1 staff. 28th October 2009

  2. Overview: Infrastructure, Network, Farming, Storage

  3. Infrastructure

  4. Tier1 2005 vs Tier1 2009

  5. Electrical infrastructure (diagram): UPS capacity up to 3.8 MW (1.4 MW + 1.4 MW + 1 MW), 15000 V power feed, 1.2 MW for chillers

  6. Mechanical and electrical surveillance

  7. Network

  8. INFN CNAF Tier1 network (diagram)
  • WAN connectivity via GARR at 10 Gb/s
    • LHC-OPN dedicated 10 Gb/s link for T0-T1 (CERN) and T1-T1 (PIC, RAL, TRIUMF) traffic
    • T1-T1 traffic to BNL, FNAL, TW-ASGC and NDGF, and T1-T2 traffic, over the CNAF general purpose network
    • 10 Gb/s T0-T1 backup over the LHC-OPN CNAF-KIT, CNAF-IN2P3 and CNAF-SARA links
  • Core: 7600 router, Cisco Nexus 7000 (2x10 Gb/s), Extreme BD8810 (4x10 Gb/s) and Extreme BD10808
  • Worker nodes on Extreme Summit 400/450 switches with 2x1 Gb/s connections and 4x1 Gb/s uplinks
    • In case of network congestion: uplink upgrade from 4x1 Gb/s to 10 Gb/s or 2x10 Gb/s
  • Storage servers, disk servers and Castor stagers connected to the storage devices over a Fibre Channel SAN

  9. Farming

  10. New tender
  • 1U Twin solution with these specs:
    • 2x Intel Nehalem E5520 @ 2.26 GHz
    • 24 GB RAM
    • 2x 320 GB SATA HDD @ 7200 rpm
    • 2x 1 Gbps Ethernet
  • 118 twins, reaching 20500 HEP-SPEC, measured on SLC4
  • Delivery and installation foreseen within 2009
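A quick back-of-the-envelope check of the tender figures. This is only a sketch of the arithmetic; it assumes each 1U twin hosts two dual-socket nodes, which the slide does not state explicitly.

```python
# Back-of-the-envelope check of the tender figures (illustrative only).
twins = 118
total_hepspec = 20500           # measured for the whole tender, on SLC4
nodes = twins * 2               # assumption: one 1U twin = two nodes
cores_per_node = 8              # 2x quad-core E5520, HyperThreading ignored

print(total_hepspec / twins)                      # ~173.7 HEP-SPEC per twin
print(total_hepspec / nodes)                      # ~86.9 HEP-SPEC per node
print(total_hepspec / (nodes * cores_per_node))   # ~10.9 HEP-SPEC per core
```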

  11. Computing resources
  • Including the machines from the new tender, INFN-T1 computing power will reach 42000 HEP-SPEC within 2009
  • A further increase within January 2010 will bring us to 46000 HEP-SPEC
  • Within May 2010 we will reach 68000 HEP-SPEC (as pledged to WLCG)
  • This will roughly triple the current computing power

  12. Resource usage per VO

  13. KSI2K pledged vs used

  14. New accounting system
  • Grid, local and overall job visualization
  • Tier1/Tier2 separation
  • Several parameters monitored
    • avg and max RSS, avg and max Vmem added in the latest release
  • KSI2K/HEP-SPEC accounting
  • WNoD accounting
  • Available at: http://tier1.cnaf.infn.it/monitor
  • Feedback welcome to: farming@cnaf.infn.it
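The slide does not describe how the KSI2K/HEP-SPEC normalisation is performed. The following is only a minimal sketch, with hypothetical job records and a parametrised KSI2K-to-HEP-SPEC conversion factor (neither taken from the slide), of how raw CPU time can be weighted by a per-node benchmark for accounting purposes.

```python
# Minimal, hypothetical sketch of benchmark-normalised accounting.
# Record layout and conversion factor are assumptions, not the real system.
from collections import defaultdict

HEPSPEC_PER_KSI2K = 4.0   # assumed conversion factor; adjust to site policy

jobs = [
    # (vo, cpu_seconds, HEP-SPEC per core of the node that ran the job)
    ("cms",   36000, 10.9),
    ("atlas", 72000,  8.0),
    ("cms",   18000, 10.9),
]

usage = defaultdict(float)
for vo, cpu_s, hepspec_per_core in jobs:
    # normalised CPU time in HEP-SPEC * hours
    usage[vo] += cpu_s / 3600.0 * hepspec_per_core

for vo, hepspec_hours in usage.items():
    print(vo,
          round(hepspec_hours, 1), "HEP-SPEC h,",
          round(hepspec_hours / HEPSPEC_PER_KSI2K, 1), "KSI2K h")
```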

  15. New accounting: sample picture

  16. GPU Computing (1)
  • We are investigating GPU computing
  • NVIDIA Tesla C1060, used for porting software and performing comparison tests
  • https://agenda.cnaf.infn.it/conferenceDisplay.py?confId=266, meeting with Bill Dally (chief scientist and vice president of NVIDIA)

  17. GPU Computing (2)
  • Applications currently tested:
    • Bioinformatics: CUDA-based paralog filtering in Expressed Sequence Tag clusters
    • Physics: implementing a second-order electromagnetic particle-in-cell code on the CUDA architecture
    • Physics: spin-glass Monte Carlo simulations
  • The first two applications showed more than a 10x increase in performance!
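None of the ported applications are shown in the transcript. As a minimal illustration of the offload pattern behind such CUDA ports, the sketch below runs an element-wise kernel on the GPU; it assumes a CUDA-capable card and the numba package, neither of which is mentioned on the slide, and plain CUDA C or other bindings work along the same lines.

```python
# Minimal GPU-offload sketch (illustrative; not one of the listed applications).
import numpy as np
from numba import cuda

@cuda.jit
def saxpy(a, x, y, out):
    i = cuda.grid(1)          # global thread index
    if i < x.size:            # guard threads beyond the array end
        out[i] = a * x[i] + y[i]

n = 1 << 20
x = np.random.rand(n).astype(np.float32)
y = np.random.rand(n).astype(np.float32)
out = np.empty_like(x)

threads_per_block = 256
blocks = (n + threads_per_block - 1) // threads_per_block
# Host arrays are copied to the GPU and back automatically by numba.
saxpy[blocks, threads_per_block](np.float32(2.0), x, y, out)

assert np.allclose(out, 2.0 * x + y)
```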

  18. GPU Computing (3)
  • We plan to buy 2 more workstations in 2010, with 2 GPUs each
  • We are waiting for the Fermi architecture, foreseen for spring 2010
  • We will continue the activities currently ongoing and will probably test some Monte Carlo simulations for SuperB
  • We plan to test selection and shared usage of GPUs via grid

  19. Storage

  20. 2009-2010 tenders
  • Disk tender requested
    • Baseline: 3.3 PB raw (~2.7 PB net)
    • 1st option: 2.35 PB raw (~1.9 PB net)
    • 2nd option: 2 PB raw (~1.6 PB net)
    • Options to be requested during Q2 and Q3 2010
    • New disk in production ~end of Q1 2010
  • 4000 tapes (~4 PB) acquired with the library tender
    • 4.9 PB needed at the beginning of 2010
    • 7.7 PB probably needed by mid-2010

  21. Castor@INFN-T1
  • To be upgraded to 2.1.7-27
  • SRM v2.2 end-points available
  • Supported protocols: rfio, gridftp
  • Still cumbersome to manage
    • Requires frequent interventions on the Oracle DB
    • Lack of management tools
  • CMS migrated to StoRM for D0T1

  22. WLCG Storage Classes at INFN-T1 today
  • Storage Classes offer different levels of storage quality (e.g. copy on disk and/or on tape)
    • DnTm = n copies on disk and m copies on tape
  • Implementation of 3 Storage Classes needed for WLCG (but usable also by non-LHC experiments)
    • Disk0-Tape1 (D0T1), "custodial nearline"
      • Data migrated to tape and deleted from disk when the staging area is full
      • Space managed by the system
      • Disk is only a temporary buffer
    • Disk1-Tape0 (D1T0), "replica online"
      • Data kept on disk: no tape copy
      • Space managed by the VO
    • Disk1-Tape1 (D1T1), "custodial online"
      • Data kept on disk AND one copy kept on tape
      • Space managed by the VO (i.e. if the disk is full, the copy fails)
  • Current implementations: CASTOR, and GPFS/TSM + StoRM
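As a purely conceptual illustration of the DnTm notation above (hypothetical names and fields; this is not how StoRM, CASTOR or GPFS/TSM actually model space), the semantics of the three classes could be written down like this:

```python
# Conceptual model of the WLCG storage classes (hypothetical, for illustration).
from dataclasses import dataclass

@dataclass
class StorageClass:
    name: str
    disk_copies: int        # n in DnTm
    tape_copies: int        # m in DnTm
    space_managed_by: str   # "system" or "VO"

D0T1 = StorageClass("custodial nearline", 0, 1, "system")  # disk is a temporary buffer
D1T0 = StorageClass("replica online",     1, 0, "VO")      # disk only, no tape copy
D1T1 = StorageClass("custodial online",   1, 1, "VO")      # disk copy plus tape copy

def disk_copy_may_be_dropped(sc: StorageClass, on_tape: bool) -> bool:
    """The disk copy can be garbage-collected only when the class keeps no
    permanent disk copy and the custodial tape copy already exists."""
    return sc.disk_copies == 0 and sc.tape_copies > 0 and on_tape

print(disk_copy_may_be_dropped(D0T1, on_tape=True))    # True: staging buffer
print(disk_copy_may_be_dropped(D1T0, on_tape=False))   # False: data lives on disk
```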

  23. YAMSS: present status
  • Yet Another Mass Storage System
    • Scripting and configuration layer to interface GPFS & TSM
    • Can work driven by StoRM or stand-alone
    • Experiments not using the SRM model can work with it
  • GPFS-TSM (no StoRM) interface ready
    • Full support for migrations and tape-ordered recalls
  • StoRM
    • StoRM in production at INFN-T1 and in other centres around the world for "pure" disk access (i.e. no tape)
    • Integration with YAMSS for migrations and tape-ordered recalls ongoing (almost completed)
  • Bulk migrations and recalls tested with a typical use case (stand-alone YAMSS, without StoRM)
    • Weekly production workflow of the CMS experiment
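The slides do not show how tape-ordered recalls are scheduled. A minimal sketch of the general idea (hypothetical metadata layout, not the actual YAMSS scripts) is to group pending recall requests by cartridge and sort them by position on tape, so that each tape is mounted once and read front-to-back:

```python
# Sketch of tape-ordered recalls (conceptual; not the YAMSS implementation).
from collections import defaultdict

# Pending recall requests: (file path, tape label, position on tape).
# The paths, labels and positions here are made-up examples.
requests = [
    ("/gpfs/cms/a.root", "T10K0042", 1730),
    ("/gpfs/cms/b.root", "T10K0007",   12),
    ("/gpfs/cms/c.root", "T10K0042",   88),
    ("/gpfs/cms/d.root", "T10K0007",  950),
]

by_tape = defaultdict(list)
for path, tape, pos in requests:
    by_tape[tape].append((pos, path))

for tape, items in by_tape.items():
    items.sort()                      # read each cartridge sequentially
    ordered_files = [path for _, path in items]
    print(f"mount {tape}: recall {ordered_files}")
    # A real driver would now invoke the tape backend (e.g. the TSM client)
    # on this ordered file list instead of printing it.
```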

  24. Why GPFS & TSM
  • Tivoli Storage Manager (developed by IBM) is a tape-oriented storage manager that is widely used (also in the HEP world, e.g. at FZK)
  • Built-in functionality present in both products to implement backup and archiving from GPFS
  • The development of an HSM solution is based on combining features of GPFS (since v3.2) and TSM (since v5.5)
    • Since GPFS v3.2, the new concept of "external storage pool" extends policy-driven Information Lifecycle Management (ILM) to tape storage
    • External pools are real interfaces to external storage managers, e.g. HPSS or TSM
    • HPSS is very complex (no benefits in this sense compared to CASTOR)

  25. YAMSS: hardware set-up (diagram)
  • 4 GridFTP servers (4x2 Gbps) and 6 NSD servers (6x2 Gbps) on the LAN
  • ~500 TB for GPFS on a CX4-960, attached through a 4 Gbps FC SAN (20x4 Gbps and 8x4 Gbps links)
  • 3 HSM nodes (HSM STA), 3x4 Gbps on the SAN and 3x4 Gbps on the TAN
  • 8 T10KB tape drives (8x4 Gbps) on the TAN: 1 TB per tape, 1 Gbps per drive
  • TSM server with its DB

  26. YAMSS: validation tests
  • Concurrent read/write access to the MSS for transfers and from the farm
    • StoRM not used in these tests
  • 3 HSM nodes serving 8 T10KB drives
    • 6 drives (at maximum) used for recalls
    • 2 drives (at maximum) used for migrations
  • On the order of 1 GB/s of aggregated traffic
    • ~550 MB/s from tape to disk
    • ~100 MB/s from disk to tape
    • ~400 MB/s from disk to the computing nodes (not shown in this graph)

  27. Questions?
