Comparative Evaluation of HEP Worker Node Computing Models and Performance Metrics

Hepmark project: Evaluation of HEP worker nodes. Michele Michelotto at pd.infn.it

Computing model • [Figure: tiered grid diagram. CERN Tier 0; Tier-1 centres at CERN, FNAL and BNL (USA), UK, France, Italy, Germany, Japan; Tier-2 centres at labs and universities (Lab a, b, c, m; Uni a, b, n, x, y); Tier 3 physics departments and desktops; grids for a regional group and for a physics study group] michele michelotto - INFN PD

Computing Needs • Tape storage: • Very easy: events → Terabytes • Disk storage: • Easy again: events → Terabytes • But which Terabyte: 1000x1000 or 1024x1024? • RAID-protected or raw size? • Computing power: • Tricky: events/sec? Simulation or reconstruction? • MIPS, CERN Units, MHz, SPEC, SI2K…
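A minimal sketch of the storage arithmetic that makes the two unit questions explicit: decimal (10^12 bytes) versus binary (2^40 bytes) terabytes, and usable versus raw capacity once RAID parity is included. The event count, event size and RAID layout in the example are hypothetical values chosen for illustration, not figures from these slides.

```python
"""Turn an event count into a disk request, making the unit choices explicit.

Hypothetical inputs: event size and RAID layout are illustrative only.
"""

TB_DECIMAL = 1000 ** 4  # 10^12 bytes (SI terabyte)
TB_BINARY = 1024 ** 4   # 2^40 bytes (tebibyte, often also called "TB")


def dataset_bytes(n_events: int, bytes_per_event: int) -> int:
    """Raw size of a dataset, in bytes."""
    return n_events * bytes_per_event


def raid_raw_bytes(usable_bytes: float, disks: int, parity_disks: int) -> float:
    """Raw capacity needed so that `usable_bytes` remain after parity.

    RAID 5 uses one parity disk per array, RAID 6 uses two.
    """
    data_disks = disks - parity_disks
    if data_disks <= 0:
        raise ValueError("need more disks than parity disks")
    return usable_bytes * disks / data_disks


if __name__ == "__main__":
    # Hypothetical dataset: 10^9 events of 1 MB each.
    size = dataset_bytes(n_events=10**9, bytes_per_event=1_000_000)
    print(f"{size / TB_DECIMAL:.1f} TB (decimal) vs {size / TB_BINARY:.1f} TiB (binary)")
    # Hypothetical RAID 6 array of 8 disks.
    raw = raid_raw_bytes(size, disks=8, parity_disks=2)
    print(f"raw capacity with RAID 6 (8 disks): {raw / TB_DECIMAL:.1f} TB")
```

The unit choice alone moves the answer by almost 10%, and RAID parity adds a further overhead that depends on the array layout.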

T1 + T2 CPU budget - LHC

FZK Measurement • In 2001, SPEC results measured with gcc were 80% of the average published results • By 2006 the gap was much wider

The SI2K inflation • The main problem with SI2000 in our community: it is no longer proportional to the performance of HEP codes (as it used to be) • You can buy processors with a huge SI2K number but only a small increase in real performance

Nominal SI vs real SI • SI2K results for the latest generation of processors are affected by inflation • So CERN (and FZK) started to use a new currency: SI2K measured with gcc, the GNU C compiler, using two flavours of optimization • High tuning: gcc -O3 -funroll-loops -march=$ARCH • Low tuning: gcc -O2 -fPIC -pthread
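The two tunings can be kept as data, so that every benchmark build uses exactly the same flags. This is a minimal sketch that only assembles gcc command lines; how the flags are passed into the SPEC harness (normally through its own configuration file) is not shown, and the helper name, the placeholder source file and the default `-march=native` value are assumptions, not part of the CERN/FZK recipe.

```python
import shlex

# Compiler flags for the two CERN/FZK tunings; {arch} stands for $ARCH.
TUNINGS = {
    "high": ["-O3", "-funroll-loops", "-march={arch}"],
    "low": ["-O2", "-fPIC", "-pthread"],
}


def gcc_command(tuning: str, source: str, output: str, arch: str = "native") -> list[str]:
    """Build the gcc argument list for one tuning (hypothetical helper)."""
    flags = [flag.format(arch=arch) for flag in TUNINGS[tuning]]
    return ["gcc", *flags, source, "-o", output]


if __name__ == "__main__":
    for name in TUNINGS:
        # bench.c is a placeholder source file name.
        print(name, shlex.join(gcc_command(name, "bench.c", "bench", arch="core2")))
```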

Nominal SI vs real SI • CERN proposal: use as site rating the “real SI”, i.e. SI2K measured with gcc-low and increased by 50% • Actually this makes sense only for a short period of time and for the latest generation of processors • Run n copies in parallel, where n is the number of cores in the worker node • This takes into account the drop in performance of a multicore machine when fully loaded

Too many SI2K • Take as an example a worker node with two Intel Woodcrest dual-core 5160 at 3.06 GHz • SI2K nominal: 2929 - 3089 (min - max) • SI2K sum on 4 cores: 11716 - 12356 • SI2K gcc-low: 5523 • SI2K gcc-high: 7034 • SI2K gcc-low + 50%: 8284
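The derived figures above follow from two simple rules: nominal per-core SI2K scaled to all cores, and the CERN gcc-low rating plus 50%. The sketch below recomputes them, assuming (as the CERN proposal prescribes) that each gcc-low value is already the sum over four parallel copies; the 64-bit gcc-low value comes from the backup slide, and the function names are illustrative.

```python
"""Recompute the derived SI2K ratings for a node with two dual-core Xeon 5160.

Inputs are the per-core nominal SI2K range and the gcc-low results quoted on
the slides; gcc-low values are assumed to be sums over 4 parallel copies.
"""

CORES = 4  # 2 sockets x 2 cores


def nominal_node_rating(per_core_si2k: float, cores: int = CORES) -> float:
    """Nominal SI2K scaled linearly to every core (ignores multicore slowdown)."""
    return per_core_si2k * cores


def cern_rating(gcc_low_sum: float, uplift: float = 0.5) -> int:
    """CERN proposal: gcc-low result of n parallel copies plus 50%.

    Truncated to an integer, which matches the figures quoted on the slides.
    """
    return int(gcc_low_sum * (1.0 + uplift))


if __name__ == "__main__":
    low, high = nominal_node_rating(2929), nominal_node_rating(3089)
    print(f"nominal, {CORES} cores: {low:.0f} - {high:.0f}")
    for build, gcc_low in (("i386", 5523), ("x86_64", 6021)):
        print(f"{build}: gcc-low {gcc_low} -> CERN rating {cern_rating(gcc_low)}")
```

Set against the nominal four-core sum, even the uplifted gcc-low rating is roughly 30% lower, which is the "too many SI2K" problem in numbers.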

Which is better? • I started to measure the performance of HEP codes on several machines • The goal was to find a commercially maintained benchmark to replace SI2K • I compared HEP code with: • published SI2K results • SI2K measured with gcc and the “CERN” tuning • published SI2006 and SI2006 rate results • SI2006 and SI2006 rate measured with gcc4 (32 and 64 bit)

CMS sw SIM and Pythia • CMS Monte Carlo simulation (32 bit) and Pythia (64 bit) show the same performance once normalized • Both published SPECint 2006 and SPECint 2006 with gcc show the same behaviour • Published SI2K does not match HEP software • SI2K-CERN is better, but not as good as SI2006
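One way to make "the same behaviour once normalized" concrete: express both the HEP throughput and the benchmark score relative to a reference machine, then see how much their ratio varies across machines. This is a sketch of that idea, not necessarily the exact procedure behind these plots; using the coefficient of variation as the figure of merit is an assumption, and machine names and values are supplied by the caller.

```python
"""Check how well a benchmark tracks an HEP application across machines.

Both metrics are taken relative to a reference machine (reference = 1.0).
A good proxy keeps the ratio HEP/benchmark close to 1 on every machine.
"""

from statistics import mean, pstdev


def relative(values: dict[str, float], reference: str) -> dict[str, float]:
    """Express each value as a fraction of the reference machine's value."""
    ref = values[reference]
    return {name: v / ref for name, v in values.items()}


def tracking_ratios(hep: dict[str, float], bench: dict[str, float],
                    reference: str) -> dict[str, float]:
    """Per-machine ratio of relative HEP throughput to relative benchmark score."""
    hep_rel = relative(hep, reference)
    bench_rel = relative(bench, reference)
    return {name: hep_rel[name] / bench_rel[name] for name in hep_rel}


def spread(ratios: dict[str, float]) -> float:
    """Coefficient of variation of the ratios: 0 means perfect proportionality."""
    vals = list(ratios.values())
    return pstdev(vals) / mean(vals)
```

A benchmark that overestimates new processors, as published SI2K does according to these measurements, shows up as ratios well below 1 on the new machines and a large spread.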

BaBar Tier-A Results • If you normalize by core and clock, all new processors have the same performance • Double that of the older CPU generation • SI2006 matches this pattern (the ratio between published and gcc results is constant) • SI2000-CERN is better than nominal SI2K • SI2000 clearly doesn’t work
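Normalizing "by core and clock" just divides node throughput by cores × GHz. A minimal sketch, assuming throughput is measured with all cores loaded (one job per core); the `Node` type and its fields are hypothetical, and no machine data from the slides is built in.

```python
"""Normalize HEP throughput by core count and clock frequency."""

from dataclasses import dataclass


@dataclass
class Node:
    name: str
    cores: int
    clock_ghz: float
    events_per_sec: float  # whole-node throughput, all cores loaded


def per_core_per_ghz(node: Node) -> float:
    """Throughput per core per GHz, comparable across CPU generations."""
    return node.events_per_sec / (node.cores * node.clock_ghz)


def generation_gain(new: Node, old: Node) -> float:
    """Work per core per GHz of `new` relative to `old` (2.0 = doubled)."""
    return per_core_per_ghz(new) / per_core_per_ghz(old)
```

On this scale, "doubling the older generation" means `generation_gain` close to 2.0 for every new processor against an old one.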

Many gaps • Published SPEC results are easy to find • But only for new machines • Measuring ourselves is difficult: • Not easy to get machines on loan from server resellers or producers • Not easy to borrow machines from colleagues • Always for short periods of time • A SPEC run can last 15-20 hours • We need a set of dedicated worker nodes for SPEC and HEP application measurements

Cache • In the ’80s memory latency was 3-10 clock cycles • Now it is thousands of clock cycles • Hence the importance of the cache architecture • 1st, 2nd and 3rd level • Cache latency • Cache bandwidth • Shared or exclusive?

4-core processors

Intel 54xx

AMD 4-core

Load • Transactional performance doesn’t drop in the new 4-core processor • Clovertown drops with respect to Harpertown • A dual-core processor keeps up only to Load 3

Perf/watt • AMD Barcelona at 65 nm: performance per watt similar to Intel Xeon at 45 nm

Cache behaviour • The 54xx has lower latency even with a bigger cache • The 3 processors behave very differently between 4MB and 64MB • If your (HEP) application works in this range, you will see a big change in performance when changing processor

Memory: Intel vs AMD • Access time very similar • At 1GB (the typical footprint of an HEP application) the new AMD behaves better • But the new Xeon 54xx are much better than the 53xx

Memory: Intel vs AMD • Who is faster? • It depends on the block size • In the red zones Intel is better • In the green zones AMD is better

Cache behaviour • We need to study the behaviour of typical HEP applications • Simulation, event generation, reconstruction, analysis • To understand how to write more efficient applications

Power issues • Power consumption changes from one processor to another • Clock, high-k dielectric, active power management, clock throttling

Power consumption

An HEP data center • We need to measure power usage for HEP applications • Example: a big Tier-2 with 500 boxes needs 100 kW • Like the whole CED (data centre) of INFN Padova • About 800 MWh in one year • Energy cost 0.12 Euro per kWh → energy bills of 100 kEuro/year • A 10% improvement in power efficiency means 10 kEuro/year savings • Plus savings on the infrastructure (power distribution, UPS, cooling)
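The energy figures above are a short chain of multiplications. A minimal sketch that reproduces them, assuming (not stated on the slide) that the farm draws its full 100 kW around the clock, every day of the year; function names are illustrative.

```python
"""Back-of-the-envelope energy budget for an HEP data centre.

Assumption: the farm draws its rated power 24 h a day, 365 days a year;
real duty cycles will be lower.
"""

HOURS_PER_YEAR = 365 * 24  # 8760 h of continuous operation


def watts_per_box(total_kw: float, boxes: int) -> float:
    """Average power drawn by one worker node, in watts."""
    return total_kw * 1000.0 / boxes


def annual_energy_mwh(total_kw: float, hours: float = HOURS_PER_YEAR) -> float:
    """Energy consumed in one year, in MWh."""
    return total_kw * hours / 1000.0


def annual_cost_eur(energy_mwh: float, eur_per_kwh: float) -> float:
    """Electricity bill for the given energy, in euro."""
    return energy_mwh * 1000.0 * eur_per_kwh


def savings_eur(cost_eur: float, efficiency_gain: float) -> float:
    """Money saved if consumption drops by the given fraction (0.10 = 10%)."""
    return cost_eur * efficiency_gain


if __name__ == "__main__":
    energy = annual_energy_mwh(100.0)
    cost = annual_cost_eur(energy, 0.12)
    print(f"{watts_per_box(100.0, 500):.0f} W per box")
    print(f"{energy:.0f} MWh/year, {cost / 1000:.1f} kEuro/year")
    print(f"10% saving: {savings_eur(cost, 0.10) / 1000:.1f} kEuro/year")
```

Continuous full load gives 200 W per box, 876 MWh/year and about 105 kEuro/year, consistent with the rounded figures on the slide.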

Power meter • We need a device to measure voltage and current • With logging capabilities • E.g. Fluke 1735

Financial request • We need to buy a new worker node each time a new processor is released in the dual-processor market segment • Only if significantly new features are present • One or two each for Intel and AMD per year • 4 kEuro each (dual processor, 2GB/core, 1 disk) • 2 boxes to start with

Manpower • Padova: • Michele Michelotto (Primo Tecnologo): 70% • Alberto Crescente (CTER): 30% • Roberto Ferrari (CTER): 30% • Ferrara: • Alberto Gianoli (Primo Tecnologo): 20% • Bologna: • Franco Brasolin (CTER): 20%

Milestones • 2009 • Understand SPEC 2006 and propose a new benchmark to replace SI2K • Measure the performance of current architectures for Monte Carlo SIM (events/sec vs SPEC) • 2009/2010 • Power performance • 2010 • Cache profiling

Questions?

Backup slides

SI2K frozen • SI2K is the benchmark used until now to measure the computing power of all the HEP experiments • Computing power requested by experiments • Computing power provided by a Tier-[0,1,2] • SI2K is the nickname for the SPEC CPU Int 2000 benchmark • It came after SPEC89, SPECint92 and SPECint95 • Declared obsolete by SPEC in 2006 • Replaced by SPEC CPU Int 2006

Transition problem • Impossible to find published SPEC Int 2000 results for the new processors (e.g. the not-so-new Clovertown 4-core) • Impossible to find published SPEC Int 2006 results for old processors (before 2006) • E.g. old P4 Xeon, P4, AMD 2xx • You can’t convert from SI2000 to SI2006, but the ratio for the x86 architecture is in the 137-172 range
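Since no conversion exists, the observed ratio range only brackets one score given the other. A minimal sketch, assuming the ratio is defined as SI2000 score divided by SI2006 score (SI2000 values are in the thousands, SI2006 values in the tens); the example input is the nominal per-core SI2K of the Xeon 5160 quoted earlier.

```python
"""Bracket an SI2006 value from an SI2000 one (or vice versa) using the
observed x86 ratio range SI2000 / SI2006 = 137-172.

This is a range, not a conversion: the true ratio depends on the processor.
"""

RATIO_MIN, RATIO_MAX = 137.0, 172.0  # assumed: SI2000 score / SI2006 score


def si2006_range(si2000: float) -> tuple[float, float]:
    """Lowest and highest SI2006 consistent with a given SI2000 score."""
    return si2000 / RATIO_MAX, si2000 / RATIO_MIN


def si2000_range(si2006: float) -> tuple[float, float]:
    """Lowest and highest SI2000 consistent with a given SI2006 score."""
    return si2006 * RATIO_MIN, si2006 * RATIO_MAX


if __name__ == "__main__":
    low, high = si2006_range(3089)  # nominal per-core SI2K of the Xeon 5160
    print(f"SI2006 between {low:.1f} and {high:.1f} ({high / low - 1:.0%} wide)")
```

The interval is always about 26% wide (172/137), which is why the ratio is useful only as a sanity check, not as an exchange rate.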

Even more • Actually, all the gcc results on the previous slide are on i386 (32 bit) • If you want to know how your code runs on a 64-bit machine, you can measure SPECint 2000 with gcc on x86_64 • So, for the worker node with two Intel Woodcrest dual-core 5160 at 3.06 GHz: • SI2K nominal: 2929 - 3089 (min - max) • SI2K sum on 4 cores: 11716 - 12356 • SI2K gcc-low: 6021 • SI2K gcc-high: 6409 • SI2K gcc-low + 50%: 9031

ATLAS • Here 100% is the Xeon 5160 • Few results for SI2006+gcc, but no difference from CMS and BaBar • Few published SI2006 results as well, because of several old architectures • SI2K+gcc not bad • Published SI2K heavily overestimates the new Xeon • Normalized, ATLAS simulation performs the same on the new Intel “Core” or AMD “Opteron” (like CMS and BaBar)
