
INFN-T1 Infrastructure, Network, and Storage Report

This report provides an overview of the current status of INFN-T1's infrastructure, network, data management and storage, farming, and projects and activities.


Presentation Transcript


  1. INFN-T1 site report
  Andrea Chierici, on behalf of INFN-T1 staff

  2. Outline
  • Infrastructure
  • Network
  • Data management & Storage
  • Farming
  • Projects and activities

  3. Infrastructure

  4. Current status
  • Working with 2 power lines, but still only 1 is under UPS
  • Fix of the second line has been delayed
  • Tender open for a rainwater storage tank
  • Rerouted the water pipes that run below the Tier 1 floor
  • Installed a surveillance system with 12 cameras and environment monitoring
  • Set up access control to the data center
  • No turnstile

  5. Network

  6. Current Status
  • LHCOPN+LHCONE shared physical link: now upgraded to 2x 100 Gb/s
  • Upgraded LHCOPN: the dedicated link to CERN is now 2x 100 Gb/s
  • A Next-Generation Firewall (Palo Alto PA-5250) has been installed on the General IP link

  7. Network diagram
  [Diagram: two Nexus 9516 core switches joined by a VPC link; an older Nexus 7018, uplinked at 3x 40 Gb/s to each core (6x 40 Gb/s in total), carries the old "single homed" resources; the General IP link sits behind the PA-5250 Next-Generation Firewall, alongside the LHC OPN/ONE link and a Cisco 7600 in front of the desk resources. Storage disk servers are connected at 2x 100 Gb/s, the most recent computing resources at 4x 40 Gb/s, and every disk server or farming switch is connected to both core switches.]

  8. Data Management & Storage

  9. Storage resources
  • Disks:
    • 2019 pledge: 39 PB
    • Currently used: 30 PB
  • Tapes:
    • 2019 pledge: 89 PB
    • Currently used: 60.6 PB
    • Currently installed: 69 PB
    • Extendible up to 84 PB
  • Tender for the new tape library has been published
  • The library is foreseen to be up and running by Fall 2019

  10. 2018 Storage installation
  • Installation of the second part of the 2017-2018 tender completed in Feb. 2019
  • 3 OceanStor 18000v5 systems, ~11.52 PB of usable space
    • 804x 6 TB NL-SAS disks per system
    • 12x 900 GB SSDs per system
    • 4 controllers per system
    • 4x FDR IB (2x 56 Gb/s) per controller
  • 12 servers (2x FDR IB, 2x 100 GbE)

  11. Storage in Production
  • 2 DDN SFA 12K (FDR IB) - 10240 TB
  • 4 DELL MD3860f (FC16) - 2304 TB
  • 2 DELL MD3820f (FC16) - SSD for metadata
  • Huawei OS6800v5 (FC16) - 5521 TB
  • Huawei OS18000v5 (FDR IB) - 19320 TB
  • Total (on-line): 37385 TB; Pledge 2019: 38721 TB; delta: -1336 TB
  • A kind of "thin provisioning" via quotas for the ~40 collaborations living on the shared file system (see the sketch below)
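
As a rough illustration of the quota-based thin provisioning mentioned above: per-collaboration quotas on the shared file system may add up to more than the installed capacity, as long as actual usage stays below it. In the sketch below only the installed capacity comes from the slide; all per-collaboration figures are invented.

```python
"""Rough illustration of quota-based thin provisioning on a shared file system.
Per-collaboration numbers are invented; only the installed capacity is real."""

INSTALLED_TB = 37385                                                       # on-line capacity (slide 11)
quota_tb = {"exp-A": 9000, "exp-B": 9000, "exp-C": 6000, "others": 18000}  # hypothetical quotas
used_tb = {"exp-A": 7200, "exp-B": 7900, "exp-C": 4100, "others": 9000}    # hypothetical usage

total_quota, total_used = sum(quota_tb.values()), sum(used_tb.values())
print(f"quota total: {total_quota} TB "
      f"(overcommit x{total_quota / INSTALLED_TB:.2f}), used: {total_used} TB")

# The point of thin provisioning: no collaboration exceeds its own quota,
# but operators must watch the aggregate usage against the real capacity.
if total_used > 0.9 * INSTALLED_TB:
    print("WARNING: shared file system close to full, review quotas")
```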

  12. Farming

  13. Running job trend
  [Plot of the number of running jobs over time]

  14. Computing resources
  • Farm power: approx. 410 kHS06
  • 2017 tender power consumption: 22 kW vs. an average of 13 kW
  • CINECA partition has too many cores, hard to use at 100%: job slots reduced to 62
  • 2019 tender not out yet: 30 kHS06
  • Migration to CentOS 7 complete
  • If a VO does not support CentOS 7, Singularity can be used to run on SL6 (see the sketch below)
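
As a purely illustrative sketch of the SL6-via-Singularity fallback: the container image path and the payload command below are hypothetical placeholders, not the actual INFN-T1 configuration.

```python
#!/usr/bin/env python3
"""Minimal sketch: run a VO payload inside an SL6 Singularity container
on a CentOS 7 worker node. Image path and payload are placeholders."""
import subprocess

SL6_IMAGE = "/cvmfs/images/sl6.sif"       # hypothetical image location
PAYLOAD = ["cat", "/etc/redhat-release"]  # hypothetical payload command

def run_in_sl6(payload, image=SL6_IMAGE):
    # 'singularity exec' runs the command in the SL6 user space while keeping
    # the host (CentOS 7) kernel; /cvmfs is bind-mounted for experiment software.
    cmd = ["singularity", "exec", "--bind", "/cvmfs", image] + payload
    return subprocess.run(cmd, check=True)

if __name__ == "__main__":
    run_in_sl6(PAYLOAD)
```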

  15. Deployment of Condor pilot
  • Small test instance running
  • Used to prepare Puppet configuration classes
  • New hardware purchased at the end of 2018 will be used to install the production CE and Condor manager
  • Final solution will be a mix of real and virtual machines
  • This activity is a priority of the farming group
  • LSF licenses expired on 31 Dec 2018
    • Still usable, but no updates or patches can be applied after that date
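
For illustration only, a job submission through the HTCondor Python bindings (the classic 8.x-era API) could look like the sketch below; the job description is a trivial placeholder, not an INFN-T1 pilot or CE configuration.

```python
"""Minimal sketch of a job submission via the HTCondor Python bindings."""
import htcondor

sub = htcondor.Submit({
    "executable": "/bin/sleep",    # placeholder payload
    "arguments": "300",
    "request_cpus": "1",
    "request_memory": "2GB",
    "output": "job.$(ClusterId).out",
    "error": "job.$(ClusterId).err",
    "log": "job.$(ClusterId).log",
})

schedd = htcondor.Schedd()               # the local schedd, e.g. on the CE
with schedd.transaction() as txn:        # classic bindings API (HTCondor 8.x era)
    cluster_id = sub.queue(txn)
print("submitted cluster", cluster_id)
```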

  16. Projects and activities

  17. Puppet status
  • INFN-T1 is running with Puppet v5
  • All local modules are compatible
  • Next step is to update the common modules (coming from Puppet Labs); a compatibility check is sketched below
  • Once all modules are updated, Foreman will be updated to the latest version
  • Tests to migrate to Puppet v6 will begin immediately after
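
One possible way to survey module compatibility ahead of the v6 migration is sketched below. The module path is the stock Puppet default and the version check is deliberately naive; this is an illustration, not the procedure actually used at INFN-T1.

```python
"""Sketch: flag Puppet modules whose metadata.json does not declare support
for a target Puppet major version. Path and check are illustrative only."""
import json
from pathlib import Path

MODULE_DIR = Path("/etc/puppetlabs/code/environments/production/modules")
TARGET_MAJOR = "6"  # version we would like to migrate to

for metadata in sorted(MODULE_DIR.glob("*/metadata.json")):
    data = json.loads(metadata.read_text())
    reqs = {r.get("name"): r.get("version_requirement", "")
            for r in data.get("requirements", [])}
    puppet_req = reqs.get("puppet", "")
    # Naive check: an empty constraint or one mentioning "6." is treated as OK.
    ok = puppet_req == "" or f"{TARGET_MAJOR}." in puppet_req
    print(f"{data.get('name', metadata.parent.name):40s} "
          f"{puppet_req or '(no constraint)':25s} {'ok' if ok else 'REVIEW'}")
```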

  18. Cloud@CNAF
  • Extending the existing cloud instance (SDDS) to T1 resources
  • Since the SDDS area is separated, most of the services are duplicated: 2 logical regions
  • Testbed is working with basic services
  • All the configurations are tested and automated through Puppet
  • Now moving to production with a small fraction of resources, using basic services

  19. HPC Farm
  • 2 clusters
    • Older: 25 nodes with dual Intel v3 CPUs, InfiniBand interconnect, GPFS shared FS, 4 Nvidia K40 and 4 Nvidia V100 GPUs
    • Newer: 20 nodes with dual Intel v4 CPUs, Omni-Path interconnect, GPFS shared FS, 4 Nvidia V100 GPUs
  • Used mainly by LHC VOs, the CERN accelerator physics group and local INFN users

  20. CDF LTDP
  • CNAF provides the maintenance of the CDF Run 2 dataset (4 PB), collected during 2001-2011 and stored on tape
  • The 140 TB of CDF data lost during the 2017 flood have been successfully re-transferred from FNAL to CNAF via the GridFTP protocol
  • The "Sequential Access via Metadata" (SAM) data-handling tool, developed at FNAL, has been installed on a dedicated SL6 server for CDF data management
  • The SAM station performs a real-time validation against the checksum stored in an Oracle database (a simplified sketch follows below)
  • The Oracle CDF database also stores information about specific dataset locations and metadata
  • Recent tests showed that analysis jobs using software installed on CVMFS and requesting delivery of files stored on CNAF tapes work properly
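
A simplified sketch of that validation step is below: recompute a file's checksum and compare it with a stored reference. The checksum algorithm (Adler-32) and the in-memory lookup standing in for the Oracle query are assumptions made for illustration.

```python
"""Simplified sketch of a checksum validation step. Adler-32 and the
in-memory reference table are assumptions, not the actual SAM internals."""
import zlib

EXPECTED = {"cdf_run2_example.root": 0x1A2B3C4D}  # hypothetical stored checksums

def adler32_of(path, chunk_size=1 << 20):
    """Adler-32 of a file, computed in 1 MiB chunks to avoid loading it whole."""
    value = 1  # Adler-32 initial value
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            value = zlib.adler32(chunk, value)
    return value & 0xFFFFFFFF

def validate(path, name):
    """Compare the recomputed checksum with the stored reference value."""
    return adler32_of(path) == EXPECTED.get(name)

if __name__ == "__main__":
    name = "cdf_run2_example.root"  # hypothetical file staged back from tape
    print("checksum OK" if validate("/tmp/" + name, name) else "checksum MISMATCH")
```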

  21. ISMS Area
  • CNAF obtained ISO 27001 certification in 2017
  • Systematic approach to managing sensitive information so that it remains secure (in the sense of confidentiality, integrity and availability); it covers people, processes and IT systems, applying a risk-management process
  • 2 racks currently implement an ISMS
  • Several scientific collaborations are interested:
    • Harmony (big data in hematology)
    • AAC (the largest Italian organization for cancer research)
    • IRST Meldola (cancer research)

  22. Questions?
