
HEPiX Report

Helge Meinhard, Edoardo Martelli, Giuseppe Lo Presti / CERN-IT

Technical Forum / Computing Seminar, 11 November 2011


Outline

Outline

  • Meeting organisation; site reports (Helge Meinhard)

  • Networking and security; computing; cloud, grid, virtualisation (Edoardo Martelli)

  • Storage; IT infrastructure (Giuseppe Lo Presti)

  • 20 years of HEPiX (Helge Meinhard)


HEPiX

  • Global organisation of service managers and support staff providing computing facilities for HEP

  • Covering all platforms of interest (Unix/Linux, Windows, Grid, …)

  • Aim: Present recent work and future plans, share experience, advise managers

  • Meetings ~2 per year (spring in Europe, autumn typically in North America)


HEPiX Autumn 2011 (1)

  • Held 24 – 28 October at Simon Fraser University, Vancouver, BC, Canada

    • Hosted jointly by TRIUMF, SFU, University of Victoria

    • Excellent local organisation

      • Steven McDonald and his team proved up to expectations for the 20th-anniversary meeting

    • Nice auditorium

    • Vancouver: a very vibrant city with all kinds and classes of restaurants; nice parks, mountains within easy reach…

      • Banquet at 1’100 m altitude in the snow, grizzly bears not far

  • Special session on the occasion of HEPiX’s 20th anniversary

  • Sponsored by a number of companies


HEPiX Autumn 2011 (2)

  • Format: Pre-defined tracks with conveners and invited speakers per track

    • Extremely rich, interesting and packed agenda

    • Judging by the number of submitted abstracts, no real hot spot: 8 infrastructure, 8 grid/clouds/virtualisation, 7 network and security, 6 storage, 4 computing… plus 17 site reports

    • Special track on the 20th anniversary with 5 contributions

    • Some abstracts were submitted late (Thu/Fri before the meeting!), which made planning difficult

  • Full details and slides: http://indico.cern.ch/conferenceDisplay.py?confId=138424

  • A trip report by Alan Silverman is available too: http://cdsweb.cern.ch/record/1397885


HEPiX Autumn 2011 (3)

  • 98 registered participants, of which 10/11 from CERN

    • Cass, Lefebure, Lo Presti, Martelli, Meinhard, Rodrigues Moreira, Salter, Schröder, (Silverman), Toebbicke, Wartel

    • Many sites represented for the first time: Canadian T2s, Melbourne, Ghent, Trieste, Wisconsin, Frascati, …

    • Vendor representation: AMD, Dell, RedHat

    • Compare with GSI (spring 2011): 84 participants, of which 14 from CERN; Cornell U (autumn 2010): 47 participants, of which 11 from CERN

    • Record attendance for a North American meeting!


HEPiX Autumn 2011 (4)

  • 55 talks, of which 15 from CERN

    • Compare with GSI: 54 talks, of which 13 from CERN

    • Compare with Cornell U: 62 talks, of which 19 from CERN

  • Next meetings:

    • Spring 2012: Prague (April 23 to 27)

    • Autumn 2012: Beijing (hosted by IHEP; date to be decided, probably 2nd half of October)


Site reports (1): Hardware

  • CPU servers: same trends

    • 12–48 core boxes, AMD and Intel mentioned equally frequently, 2–4 GB/core. Some nodes with 128 GB, even 512 GB

    • Quite a number of problems reported with A-brand suppliers and their products

  • Disk servers

    • Still a number of problems in the interplay of RAID controllers with disk drives – controllers throwing out perfectly healthy drives

    • Severity of the disk drive supply shortage not yet known at the time of HEPiX

  • Tapes

    • A number of sites mentioned T10kC in production (preferred over LTO at major sites such as FNAL)

    • LTO very popular, many sites investigating (or moving to) LTO5


Site reports (2): Software

  • OS

    • Quite a few sites mentioned migrating to RHEL 6 / SL 6

      • FNAL hired a replacement for Troy Dawson

      • Triggers a bug in Nehalem sleep states

    • Windows 7 is in production at many sites

    • Exotics: Tru64, Solaris; CentOS

  • Storage

    • Lustre: in use at at least 7 sites

    • CVMFS mentioned in at least 6 site reports (of 17)

    • EOS at CMS T1 at FNAL – they are quite happy

    • NFS: GSI getting out; BNL reported bad results with NFS 4.1 tests using NetApp and BlueArc


Site reports (3): Software (cont’d)

  • Batch schedulers

    • Grid Engine rather popular; all but IN2P3 going for the Univa version. In fact, not much mention of Oracle this time at all…

    • Some (scalability?) problems with PBSpro / Torque-MAUI, negative comments about PBSpro support

    • Condor, SLURM mentioned – mostly positively

  • Virtualisation

    • Many sites experimenting with KVM; Xen on its way out (often linked with the SL5 to SL6 migration)

    • Some very aggressive use of virtualisation (gatekeepers, AFS servers, Condor and ROCKS masters, Lustre MGS, …)

  • Service management

    • FNAL, PDSF migrating from Remedy to Service-now


Site reports (4): Infrastructure

  • Infrastructure

    • Cube prototype for FAIR: 2 storeys, 96 racks, PUE 1.07 (see the note on PUE after this list)

    • LBNL data centre construction hindered by lawsuits

  • Configuration management

    • Puppet mentioned a number of times

    • Chef, cfengine2/3 used as well
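For reference (my addition, not from the talks): PUE, power usage effectiveness, is the ratio of total facility power to the power delivered to the IT equipment, so the quoted 1.07 means only about 7% overhead for cooling and power distribution:

```latex
\mathrm{PUE} \;=\; \frac{P_{\text{total facility}}}{P_{\text{IT equipment}}},
\qquad
\mathrm{PUE} = 1.07 \;\Rightarrow\; \text{cooling + distribution overhead} \approx 7\% \text{ of the IT load}
```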


Site reports (5): Miscellaneous

  • Tendency is towards multidisciplinary labs

    • More focus on HPC and GPUs than in HEP

  • IP telephony / VoIP mentioned at least twice

  • Business continuity is a hot topic for major sites

    • Dedicated track at next meeting


Report from HEPiX 2011:

Computing, Networking, Security, Clouds, Virtualization

Geneva – 11th November 2011

[email protected]



Computing




AMD Interlagos

New 16-core AMD processor: Interlagos

Interlagos uses the Bulldozer design: two parallel threads per module, extended instruction set, power efficiency (unused cores are switched off), best value per unit.

Better to add cores than clock speed: 50% more performance requires about 3 times the power (a rough scaling argument is sketched after the list below).

Evolution:

2005: 2 cores, 1.8-3.2 GHz, 7-13 Gflops, 95 W

2007: 3 cores, 1.9-2.5 GHz, 20-30 Gflops, 95 W

2008: 4 cores, 2.5-2.9 GHz, 40-46 Gflops, 95 W

2009: 6 cores, 1.8-2.8 GHz, 43-67 Gflops, 95 W

2010: 8-12 cores, 1.8-2.6 GHz, 58-120 Gflops, 105 W
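A rough justification of the cores-vs-clock claim (my sketch, not from the slides): dynamic CPU power scales roughly as frequency times voltage squared, and voltage has to rise roughly linearly with frequency, so power grows roughly with the cube of the clock speed:

```latex
P_{\text{dyn}} \propto f\,V^2, \qquad V \propto f
\;\;\Rightarrow\;\; P_{\text{dyn}} \propto f^3,
\qquad 1.5^3 \approx 3.4
```

So ~50% more clock speed costs roughly 3 times the power, while an extra core adds performance almost linearly in power.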


Intel Sandy Bridge/Dell Stampede

Dell is building Stampede, which will be among the top ten supercomputers.

Commissioned by TACC (Texas Advanced Computing Center); 27.5M USD from the NSF.

10 petaflops peak; 12,800 Intel Sandy Bridge processors; 272 TB of memory; 14 PB of storage with a 150 GB/s Lustre file system.

Intel Sandy Bridge can execute one floating-point instruction per clock cycle; it will be available in 2012Q1.

Intel MIC architecture: many cores with many threads per core.

HEPspec: AMD Interlagos has lower single-core speed, but the total processor throughput is higher (16 cores vs. 8 for Sandy Bridge).


CPU benchmarking at GridKa

Presented benchmarks of the new generation of chips: AMD Interlagos (16 cores) and Intel Sandy Bridge (8 cores).

Benchmarking for tenders is difficult because performance varies depending on the version of the software used and on the OS type (32- or 64-bit).


Observations

While the aggregate computing capacity of processors is increasing, single cores are getting slower; thus single-threaded applications will run slower than before. To take advantage of new processors, applications have to be rewritten to support multiple threads (a minimal sketch follows below).

CPU power is more abundant than disk space and network bandwidth.
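A minimal illustration of the point (my sketch; the per-event work function is made up for illustration): a CPU-bound job only benefits from the extra cores if it is explicitly split across them, e.g. with Python's multiprocessing:

```python
# Sketch: spread a CPU-bound workload over all available cores.
from multiprocessing import Pool
import os

def process_event(seed):
    # Stand-in for real per-event computation (hypothetical workload).
    x = seed
    for _ in range(100_000):
        x = (x * 1103515245 + 12345) % 2**31
    return x

if __name__ == "__main__":
    with Pool(os.cpu_count()) as pool:      # one worker per core
        results = pool.map(process_event, range(1000))
    print(len(results), "events processed")
```

Run single-threaded, the same loop occupies one (slow) core; with the pool, throughput scales with the core count.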


Networking and Security


LHCONE

The WLCG computing model is changing, moving towards a full-mesh interconnection of the sites.

LHCONE is the dedicated network that will interconnect the Tier1s and the major Tier2s and Tier3s.

LHCONE is a network built on top of Open Exchange Points interconnected by long distance links provided by R&E Network Operators.

A work in progress


IPv6 at CERN and FZU

IPv6 deployment has started at CERN and FZU

IPv6 is still lacking some functionality, but it will be necessary.

Changes to management tools will require time and money

It's not only a matter for the network department: developers, sysadmins and operations will have to act (an example of the kind of change needed follows below).
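One concrete example of the kind of change developers face (my illustration, not from the talk): code that hard-wires AF_INET sockets has to become address-family agnostic, typically via getaddrinfo:

```python
# Sketch: address-family-agnostic TCP connect (works over IPv4 and IPv6).
import socket

def connect(host, port):
    # getaddrinfo yields one candidate per address family the host
    # resolves to (A and AAAA records); try them in order.
    for family, socktype, proto, _, addr in socket.getaddrinfo(
            host, port, type=socket.SOCK_STREAM):
        try:
            s = socket.socket(family, socktype, proto)
            s.connect(addr)
            return s
        except OSError:
            continue
    raise OSError(f"could not connect to {host}:{port}")
```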


HEPiX IPv6 WG

16 groups from Europe and the US, plus one experiment (CMS), have joined the WG.

Testbed activity: an IPv6 VO hosted by INFN has been created, with five connected sites. Tests of grid data transfers will start next month; if OK, CMS will run data transfer tests from December.

Gap analysis activity: the WG will perform a gap analysis of the readiness of grid applications. A survey is being prepared.

Collaboration with EGI on a source code checker (a sketch of the idea follows below).
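The talk gave no details of the planned checker; as a sketch of the idea (the patterns below are my assumptions, not the actual EGI rules), one could scan sources for IPv4-only idioms:

```python
# Sketch of an IPv6-readiness checker: flag IPv4-only constructs.
import re
import sys

IPV4_ONLY = re.compile(
    r"gethostbyname|gethostbyaddr|inet_aton|inet_addr"
    r"|AF_INET\b|sockaddr_in\b")

for path in sys.argv[1:]:
    with open(path, errors="replace") as src:
        for lineno, line in enumerate(src, 1):
            if IPV4_ONLY.search(line):
                print(f"{path}:{lineno}: {line.strip()}")
```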


Computer Security

Attackers are becoming professional, motivated by profit.

Trust is being compromised:

- Certification Authorities compromised

- Social networks used to drive users to malicious sites

- Popular web sites used to spread infections

- Governments using spying software

Smartphones are easier to compromise than personal computers.

HEP is also a target: CPU power is needed to mine bitcoins.

Primary infection vector: stolen accounts.




IPv6 Security

IPv6 has many security weaknesses:

- by design: it was designed when many IPv4 weaknesses had not yet been exploited.

- by implementation: many stacks are still only partially implemented; specs and RFCs are often inconsistent.

- by configuration: with dual stack, running two protocols at the same time may help attackers evade packet inspection.

The huge address space is more difficult to control or block.

Everything will have to be verified.


Observations

Jefferson Lab was hacked: undetected for 6 weeks, offline for 2 weeks, and it took a long time to get back to full speed.

Lots of interest in LHCONE.

Most (all) new servers come with 10G NICs; as a result, many sites are buying “cheap”, high-density 10G switches.

No mention of 40G or 100G.

Not many sites planning for IPv6, although there is a lot of interest.


Grids, Clouds and Virtualization


Clouds and Virtualization

Several tools for cloud management were presented:

- Cloudman

- OpenNebula

- Eucalyptus

- OpenStack

Lxcloud: several tools and hypervisors evaluated (OpenNebula, OpenStack, LSF, Amazon EC2).

Clouds and virtualization at RAL: Hyper-V was chosen; now evaluating OpenStack and StratusLab.

Virtualization WG: working on policy and tools for image distribution.


Observations

No clear best/preferred tool

Many activities ongoing


Thank you


HEPiX Fall 2011 Highlights

IT infrastructure

Storage

Giuseppe Lo Presti / IT-DSS

CERN, November 11th, 2011



IT Infrastructure

  • 8 Talks

    • CERN Computing Facilities

    • Deska, a Fabric Management Tool

    • Scientific Linux

    • SINDES: Secure file storage and transfer

    • Use of OCS for hw/sw inventory

    • Configuration Management at GSI

    • Hardware failures at CERN

    • TSM Monitoring at CERN


CERN CC

  • An overview of the current status and plans of the CC

    • Cooling issues (as at most sites): addressed for now by increasing the room temperature and using outside fresh air

      • Estimated gain: ~GWh per year!

    • Civil engineering works well advanced, to finish by December 2011

    • Some water leaks…

      • Luckily without any serious consequence to equipment

    • Large scale hosting off-site: call for tender is out

  • And an overview of the most common failures in the CC

    • Largely dominating: hard drive failures

      • MTTF measured at 320 khours; the specs say 1.2 Mhours (a back-of-the-envelope comparison follows below)

    • A rather long debate after the talk…
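To put the two MTTF figures side by side (my arithmetic; the fleet size is a made-up example, not a CERN figure): with N drives of MTTF m hours, the expected failure rate is N/m per hour:

```python
# Expected drive failures per week for a hypothetical 50,000-drive fleet.
drives = 50_000                      # assumption for illustration only
hours_per_week = 24 * 7

for mttf in (320_000, 1_200_000):    # measured vs. spec MTTF, in hours
    per_week = drives / mttf * hours_per_week
    print(f"MTTF {mttf:>9} h -> ~{per_week:.0f} failures/week")
```

The measured MTTF thus implies roughly four times as many weekly failures as the spec sheet suggests.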


Fabric Management

  • Different solutions in different centres…

    • DESKA at FZU, Prague

    • Chef at GSI, Darmstadt

    • OCS for hardware inventory at CERN

      • (I know, not exactly fitting the same scope)

  • Same issue everywhere: no one has a clean solution to be happy with

    • Complaints range from missing features to scalability issues

    • What follows is an overview of some of the software used at different centres


Fabric Management

  • DESKA: a language to describe hw configuration

    • Based on PostgreSQL + PgPython + git for version control

    • CLI, Python binding

    • Not yet deployed, concerns about being ‘too’ flexible

      • You can describe pretty much anything; what is the real effort in describing a CC?

  • Chef at GSI

    • A ‘buzzword bingo’

    • Based on the Ruby language

      • Sysadmins were trained

    • Tried in real life on a brand-new batch cluster

  • OCS: an external tool being adopted at CERN to do inventory of computing resources


Scientific Linux

  • A “standard” update on SL releases and usage

  • Starting with a quote from Linux Format

    • “If it’s good enough for CERN, it’s good enough for us”

      • Well, kind of…

  • People

    • Troy Dawson left Fermilab to join RedHat

    • Two new members have joined the team

  • SL 6.1 released in July

  • Overall world-wide usage greatly increasing

    • Mostly SL5, SL6 ramping up, SL3(!) still used


Secure file storage and transfer

  • With SINDES, the Secure INformation DElivery System

  • New version 2

    • To overcome shortcomings of the current version

      • E.g. lack of flexibility in authorizations

  • A number of new features

    • E.g. plug-ins for authentication and authorization, versioning

  • To be deployed at CERN during 2012


Storage and File Systems

  • 6 Talks

    • Storage at TRIUMF

    • EMI, the 2nd year

    • Storage WG update

    • Migrating from dCache to Hadoop

    • CASTOR and EOS at CERN

    • CVMFS update


Storage at TRIUMF

  • Disk: 2.1 PB Usable (ATLAS)

  • Tape: 5.5 PB on LTO4 & LTO5 cartridges

    • Using IBM high-density library

    • Quite painful experience during 2010: issues with tape inventory were only fixed after IBM released new firmware in October 2010

  • Optimizing tape read performance

    • Tapeguy, in-house development

    • Reorders staging requests to minimize mounts

      • Provided the batches are large enough, which is not always the case…
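The talk did not show Tapeguy's internals; the core reordering idea can be sketched like this (the request format is my assumption):

```python
# Sketch: batch recall requests by tape, then read each tape's files
# in on-tape order, so every cartridge is mounted only once.
from collections import defaultdict

def order_recalls(requests):
    """requests: iterable of (tape_id, position_on_tape, filename)."""
    by_tape = defaultdict(list)
    for tape, pos, name in requests:
        by_tape[tape].append((pos, name))
    return [(tape, [name for _, name in sorted(files)])
            for tape, files in by_tape.items()]

print(order_recalls([("T01", 9, "a"), ("T02", 1, "b"), ("T01", 2, "c")]))
# -> [('T01', ['c', 'a']), ('T02', ['b'])]
```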


EMI Status

  • A (partially political) update on EMI by P. Fuhrmann

  • Goal: bringing together the different existing grid/data-management middlewares and ensuring long-term operations

    • However, long term planning still not clear

  • First release just out

  • Highlights on Data Management

    • pNFS is a ‘done deal’

    • WebDAV frontend for LFC and SEs with http redirects

      • Completely ignoring the SRM semantics

    • dCache labs (preliminary): a data access abstraction layer to plug in any storage

      • Working on a proof of concept with Hadoop


Storage WG Update

  • Goal: compare storage solutions adopted by HEP

  • Report about recent (October 2011) tests at FZK

    • AFS, NFS, xroot, Lustre, GPFS

    • Use cases: taken from ATLAS and CMS

    • Disclaimer: moving target!

  • Andrei provides details on the setup for each FS

  • Results: quite a number of plots

    • Xroot closing the (previous) performance gap

    • CMS use case is CPU bound client-side

  • Next candidate to test: Swift (OpenStack), probably HDFS, …


Migrating from dCache to HDFS

  • Report about the experience at a Tier2

    • UW Madison, part of US CMS

    • 1.1 PB usable storage

  • Very happy with dCache, but still willing to migrate to Hadoop

    • A technical opportunity came in Spring 2011: could they migrate to Hadoop in less time than converting dCache to Chimera?

    • Many constraints: rollback-capable, idempotent, online to the maximum extent…

    • Exploiting the Hadoop FUSE plugin (a sketch of the idea follows below)

  • Took 2 months, one day downtime

    • Now ‘happy’, and able to leverage experience in cloud computing when hiring
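No implementation details were given; an idempotent, restartable copy loop over the two FUSE mount points might look like this (the mount points and the size-based completeness check are my assumptions):

```python
# Sketch: idempotent dCache -> HDFS migration over FUSE mounts.
# Re-running only copies files that are missing or incomplete, so the
# migration can run online and be safely restarted (rollback amounts to
# keeping the dCache copy as master until the very end).
import os
import shutil

SRC, DST = "/pnfs/cms", "/mnt/hadoop/cms"    # hypothetical mount points

for root, _, files in os.walk(SRC):
    for name in files:
        src = os.path.join(root, name)
        dst = os.path.join(DST, os.path.relpath(src, SRC))
        if os.path.exists(dst) and os.path.getsize(dst) == os.path.getsize(src):
            continue                          # already migrated: skip
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        shutil.copy2(src, dst)
```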


CASTOR and EOS at CERN

  • Recap on strategy: CASTOR for the Tier0, EOS for end-user analysis

  • Recent improvements in CASTOR

    • Transfer Manager for disk scheduling

    • Buffered Tape Marks for improving tape migration

  • EOS is being moved into a production service

    • A review of the basic design principles

    • Ramping up installed capacity, migrating CASTOR pools

  • A few comments on EOS

    • J. Gordon: “It seems you like doing many things from scratch”…

    • Support for SRM/BeStMan


To conclude…

Downtown Vancouver viewed from Grouse Mountain (HEPiX Banquet, October 27th, 2011)


20th anniversary (1)

  • Banquet on Thursday night

    • Warm thanks to Alan for 20 years in a pivotal role for HEPiX (“HEPiX elder statesman”)

  • 5 talks on Friday morning, with quite a few early HEPiX attendees present

    • Alan Silverman: HEPiX from the beginning

    • Les Cottrell: Networking

    • Thomas Finnern: HEPi-X-perience

    • Rainer Toebbicke: 20 years of AFS at CERN

    • Corrie Kost: A personal overview of computing


20th anniversary (2)

  • HEPiX from the beginning – Alan Silverman

    • Learning from previous experience of HEP-wide collaboration on VM and VMS

    • Parallel meetings in Europe and North America until 1995

    • Windows (HEPNT) joined in 1997

    • HEPiX working groups: HEPiX scripts; AFS; large cluster SIG; mail; X11 scripts; security; benchmarking; storage; virtualisation; IPv6

    • Another success story: adoption of Scientific Linux HEP-wide

    • Alan’s personal rankings:

      • Most western meeting(s): Vancouver (not much more so than SLAC)

      • Most eastern meeting: Taipei

      • Most northern meeting: Umeå

      • Most southern meeting: Rio (most dangerous as well…)

      • Most secure meeting: BNL

      • Most exotic meeting: Taipei


20th anniversary (3)

  • Alan’s conclusion: HEPiX gives value to the labs for the money they spend

  • Michel Jouvin, current European co-chair: “HEPiX is healthy after 20 years with plenty of topics to discuss for the next 20!”
