Summary of the HEPiX Spring 2013 Meeting



  1. Summary of the HEPiX Spring 2013 Meeting
     Arne Wiebalck, Luca Mascetti, Luis Fernandez Alvarez
     CERN ITTF, May 17, 2013

  2. HEPiX – www.hepix.org
     • Global organization of service managers and support staff providing computing facilities for the HEP community
     • Participating sites include BNL, CERN, DESY, FNAL, IN2P3, NIKHEF, RAL, SLAC, TRIUMF, …
     • Meetings are held twice per year
       • Spring: Europe, Autumn: U.S./Asia
     • Exchange of experiences, reports on recent work, work in progress & future plans
     • Usually no showing-off

  3. Outline
     • Miscellaneous, Site reports, Storage (Arne)
     • IT infrastructure, Computing (Luca)
     • Virtualization, Networking & Security (Luis)

  4. HEPiX Spring 2013
     • April 15-19 at CNAF, INFN, Bologna (IT)
     • Very well organized, pretty rich program
     • Network access: eduroam (thanks to CS for last-minute support!)
     • 83 registered participants
       • Administrative hurdles (& illnesses) prevented better participation
       • Europe: 69, U.S./Canada: 8, Asia: 5, Australia: 1 (CERN: 15)
     • ~70 presentations from 40 institutes
     • 3 BoF sessions (OpenAFS/IPv6, CMDBuild, Energy efficiency)
     • Many offline discussions
     • Sponsors: WD, DDN, IBM, E4, and Univa

  5. Next HEPiX Meetings
     • Autumn 2013
       • U Michigan, Ann Arbor, MI, U.S.
       • Oct 28 – Nov 1, 2013
     • Spring 2014
       • LAPP, Annecy, France
       • May 19 – May 23, 2014
     • Autumn 2014
       • several options, not yet decided

  6. Updates from the WGs
     • IPv6
       • IPv4 address shortage becoming a serious issue soon
       • distributed testbed has been set up, more and more sites joining, constant testing (file transfers; see the sketch after this slide)
       • Tools & Software Survey, “problematic” applications identified
       • http://indico.cern.ch/contributionDisplay.py?contribId=35&sessionId=2&confId=220443
     • Storage
       • WG terminated
       • Summary report at the Ann Arbor meeting
     • Benchmarking
       • No new SPEC benchmark
       • Application/benchmark discrepancies becoming worrying (used for purchases)
     • Configuration Management
       • New WG led by Ben Jones (CERN) and Yves Kemp (DESY)
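To illustrate the kind of continuous testing the IPv6 testbed runs, below is a minimal sketch that probes whether a set of endpoints is reachable over IPv6. The host names and port are placeholders, and the WG's real tests exercise actual transfer tools (e.g. GridFTP) rather than a bare TCP connect.

# Minimal IPv6 reachability probe -- an illustration only, not the
# HEPiX IPv6 WG's test suite. Host names and port are placeholders.
import socket

ENDPOINTS = [("gridftp.site-a.example.org", 2811),
             ("gridftp.site-b.example.org", 2811)]

def reachable_over_ipv6(host, port, timeout=5.0):
    """Return True if a TCP connection over IPv6 succeeds."""
    try:
        infos = socket.getaddrinfo(host, port, socket.AF_INET6,
                                   socket.SOCK_STREAM)
    except socket.gaierror:
        return False  # no AAAA record or name resolution failure
    for family, socktype, proto, _, sockaddr in infos:
        try:
            with socket.socket(family, socktype, proto) as s:
                s.settimeout(timeout)
                s.connect(sockaddr)
                return True
        except OSError:
            continue
    return False

if __name__ == "__main__":
    for host, port in ENDPOINTS:
        status = "OK" if reachable_over_ipv6(host, port) else "FAIL"
        print(host, port, "IPv6", status)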

  7. Some Trends
     • Batch system reviews everywhere
       • BNL, CERN, GridKA, NERSC, …
       • Univa GridEngine seems to take the lead
       • WNs with HT
     • Broad use of cloud services & virtualization
       • Private clouds almost everywhere (mostly OpenStack)
       • Idle VM detection (FNAL), EC2 spot pricing (BNL); a toy idle-VM sketch follows this slide
     • Puppet taking the lead for configuration mgmt
       • But: no monoculture expected
     • Interest in Ceph for VM storage
       • ASGC, BNL, CERN, RAL, …
       • At an early stage everywhere
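FNAL's idle-VM detection was only mentioned, not described; the following toy sketch shows one plausible heuristic (flag a VM whose sampled CPU utilization stays below a threshold over a full window). All names, thresholds and the sampling source are invented for illustration and do not reflect FNAL's implementation.

# Toy idle-VM heuristic -- not FNAL's actual implementation.
# Thresholds, window length and the data source are invented here.
from collections import deque

CPU_IDLE_THRESHOLD = 0.05   # 5% average CPU utilization
WINDOW_SAMPLES     = 144    # e.g. 24h of samples taken every 10 minutes

class IdleDetector:
    """Keep a sliding window of CPU samples per VM and flag idle ones."""

    def __init__(self):
        self.samples = {}   # vm_name -> deque of recent utilization values

    def record(self, vm_name, cpu_utilization):
        window = self.samples.setdefault(vm_name,
                                         deque(maxlen=WINDOW_SAMPLES))
        window.append(cpu_utilization)

    def idle_vms(self):
        """Return VMs whose window is full and entirely below the threshold."""
        return [vm for vm, window in self.samples.items()
                if len(window) == WINDOW_SAMPLES
                and max(window) < CPU_IDLE_THRESHOLD]

# Usage: feed record() from whatever monitoring source is available
# (libvirt, the cloud API, a metrics store, ...), then act on idle_vms().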

  8. Site Reports (1)
     • Storage/File Systems
       • Lustre sites happy, GSI: 8 PB, home-made access control
       • NFS on BlueArc (BNL: almost 1 PB of disk space, home + scratch)
       • GlusterFS mentioned once
     • Tape
       • Mostly Sun SL8500s, some IBM, but also Spectra T-Finity (UiO)
       • FNAL encountered excessive write errors on new tapes: contaminated with debris during manufacturing; solution: f/w upgrade and a change of the manufacturing process
       • Tape access optimization: BNL developed a tape scheduler in HPSS (toy sketch after this slide)
     • Authentication
       • FNAL looking into consolidation of their authentication setup: MIT Kerberos + CA, two separate AD domains; plan to be presented at the next HEPiX
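BNL's HPSS tape scheduler was not shown in detail; as a toy illustration of the general idea behind tape access optimization (group recalls per cartridge and read them in tape order, so each tape is mounted once per batch), here is a sketch with an invented request format. It is not BNL's scheduler.

# Toy tape-recall ordering -- an illustration of the general idea only,
# not BNL's HPSS scheduler. The request format is invented.
from collections import defaultdict

def order_recalls(requests):
    """
    requests: iterable of (file_id, tape_label, position_on_tape).
    Returns the requests grouped by tape and sorted by position, so that
    each cartridge is mounted once per batch and read mostly sequentially.
    """
    by_tape = defaultdict(list)
    for file_id, tape, position in requests:
        by_tape[tape].append((position, file_id))

    ordered = []
    # Serve the most-requested tapes first so large batches free drives sooner.
    for tape in sorted(by_tape, key=lambda t: len(by_tape[t]), reverse=True):
        for position, file_id in sorted(by_tape[tape]):
            ordered.append((tape, file_id))
    return ordered

if __name__ == "__main__":
    demo = [("f1", "T0001", 530), ("f2", "T0002", 10),
            ("f3", "T0001", 12),  ("f4", "T0001", 300)]
    for tape, file_id in order_recalls(demo):
        print(tape, file_id)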

  9. Site Reports (2)
     • Software
       • SL5 still the most widely used OS for compute clusters (move to SL6 planned)
       • BNL has successfully used Oracle Ksplice (rebootless kernel patching) on their production clusters for about two years
     • Hardware
       • Dell systems dominate (PowerEdge R410, R510, R720, C6220, MD3260, …), not only in the U.S.
     • Infrastructure
       • NERSC computing facilities will be relocated from Oakland to the new CRT building in Berkeley; first systems will move in 1Q2015, the last will stay until 4Q2016
     • Networking
       • Jumbo frames on the LAN are being tried at several sites (see the sketch after this slide)
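Jumbo frames only pay off if every hop on the path actually passes 9000-byte frames; a common sanity check is to ping with the don't-fragment bit set and a payload sized for a 9000-byte MTU. The sketch below wraps that check; it assumes Linux's ping options (-M do, -s) and uses placeholder host names.

# Quick jumbo-frame path check -- a sketch, assuming Linux's ping with
# "-M do" (don't fragment) and "-s" (payload size). Hosts are placeholders.
import subprocess

# 9000-byte MTU minus 20 bytes IPv4 header and 8 bytes ICMP header.
JUMBO_PAYLOAD = 9000 - 28

def path_supports_jumbo(host, count=3):
    """Return True if `count` unfragmented 9000-byte pings succeed."""
    result = subprocess.run(
        ["ping", "-c", str(count), "-M", "do", "-s", str(JUMBO_PAYLOAD), host],
        stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0

if __name__ == "__main__":
    for host in ["wn001.example.org", "storage01.example.org"]:
        print(host, "jumbo OK" if path_supports_jumbo(host) else "no jumbo path")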

  10. Storage (1)
      • Track dominated by CERN presentations (7/11)
        • Mostly already reported in previous ITTF presentations (AFS, CASTOR/EOS, RAID optimizations) or in future ones (Ceph)
      • DPHEP initiative and its impact on HEPiX
        • Long-term data management a site responsibility
        • Techniques and policies need cross-site coordination
      • BoF Session on “OpenAFS & IPv6”
        • Many sites regard AFS as one of their core services, value its robustness and plan to continue using it in the future
        • Various options to deal with the IPv6 situation were discussed, but the lack of support is not regarded as a burning issue (at least right now)
        • The need to gather more information was identified (use cases, traffic maps, prices for an implementation, …) in order to make an informed decision (before or at the next HEPiX)
        • Peter van der Reest (DESY) and Arne Wiebalck (CERN) to follow up

  11. Storage (2)
      • Storage Architecture at the CNAF Tier1
        • 11 PB on disk; 16 PB on tape
        • >10k processes at 20 GB/s (LAN)
        • Few but big, dedicated, replicated storage systems
        • GPFS + TSM
        • Whole stack (DDN storage backend nodes, I/O servers, metadata servers, gridFTP servers, StoRM servers, HSM servers) replicated for each experiment
      • Manageability problems
        • Huge building blocks (compared to yearly growth)
        • Small config changes (can) affect performance
        • Storage re-balancing takes effort and (can) affect performance
        • “Slow disk” problem: faulty disks (can) affect performance (toy sketch after this slide)
      • Evaluating alternatives
        • Multiport SAS arrays with s/w RAID?
        • RAIN (simple EOS-like replication regarded as too expensive)?
        • EMC Isilon (NAS w/ IB interconnect) under investigation
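The “slow disk” problem is typically caught by comparing per-device read latency against its peers; below is a toy sketch of that idea (time a fixed sequential read from each device and flag outliers). Device paths and the threshold are invented, and this is not a description of CNAF's actual monitoring.

# Toy "slow disk" detector -- flags devices whose read latency is far above
# the median of their peers. Device list and threshold are invented; this is
# not CNAF's monitoring, just an illustration of the idea. Needs read access
# to the block devices (typically root).
import os
import statistics
import time

DEVICES   = ["/dev/sda", "/dev/sdb", "/dev/sdc"]   # placeholders
READ_SIZE = 64 * 1024 * 1024                        # 64 MiB per probe
FACTOR    = 3.0                                     # "slow" = 3x the median

def read_latency(device, size=READ_SIZE):
    """Time a sequential read of `size` bytes from the start of `device`."""
    start = time.monotonic()
    fd = os.open(device, os.O_RDONLY)
    try:
        remaining = size
        while remaining > 0:
            chunk = os.read(fd, min(remaining, 4 * 1024 * 1024))
            if not chunk:
                break
            remaining -= len(chunk)
    finally:
        os.close(fd)
    return time.monotonic() - start

if __name__ == "__main__":
    latencies = {dev: read_latency(dev) for dev in DEVICES}
    median = statistics.median(latencies.values())
    for dev, latency in latencies.items():
        if latency > FACTOR * median:
            print(f"{dev}: {latency:.2f}s (median {median:.2f}s) -- SLOW?")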

  12. Questions?
