
Status of reference workloads, packaging and benchmarking



Presentation Transcript


  1. HOW Workshop, 20-03-2019. Status of reference workloads, packaging and benchmarking. Domenico Giordano, Michele Michelotto, Andrea P. Sciabà

  2. Outline • Reference workloads • Containerization of workloads • Benchmarking applications

  3. Reference workloads • All LHC experiments provide and maintain instructions to manually submit jobs representative of their main workloads • Simulation, digitisation, reconstruction, … • Analysis is still missing • Instructions are currently in Google Docs • Also in GitLab for some of them • The initial goal was to have fixed references to be used for performance studies • They became natural candidates for a HEP benchmark

  4. Description • ALICE (link) (GitLab) • A simple p-p simulation job using Geant3 • ATLAS (link) • A ttbar simulation job using Geant4 (GitLab) • A digitization + reconstruction job • A DxAOD derivation job • CMS (link) • Generation + simulation of ttbar events (GitLab) • Digitization and trigger simulation with premixed pile-up • Reconstruction job producing AODSIM and MINIAODSIM • LHCb (link) • Generation + simulation using Geant4 of D*(→π(D0→Kπ))πππ events (GitLab)

  5. Summary of performance metrics • https://cernbox.cern.ch/index.php/apps/files/?dir=/__myprojects/wlcg-cost-model/Workloads/sciaba& • Measurements on an Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz (the metrics table is shown on the slide and is not reproduced in this transcript)

  6. Status of benchmarking in WLCG • HS06 has been in use for a decade, even though its scaling with "real" LHC applications is poor • First experience with SPEC CPU 2017 shows that it is no better in this respect • DB12 was proposed at some point, but found to be inadequate • Dominated by front-end calls and branch prediction units in the Python interpreter code (M. Alef, KIT)

  7. Quantitative comparison • Used Trident (see next talk) to quantitatively compare how the CPU is used by LHC applications vs. synthetic benchmarks • The percentages of time spent in the front-end, the back-end and bad speculation are used as coordinates • The L1 distance between applications is measured in this 3D space (see the sketch below) • Memory transactions and bandwidth usage are also compared
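To make the comparison concrete, the following is a minimal Python sketch of how an L1 distance between two (front-end, back-end, bad speculation) usage profiles can be computed; it is not the Trident tooling itself, and the profile values are illustrative placeholders, not measurements.

    # Sketch: L1 distance between CPU micro-architecture usage profiles.
    # Each profile is (front-end %, back-end %, bad speculation %).
    # The values below are illustrative placeholders, not real measurements.

    def l1_distance(profile_a, profile_b):
        """Sum of absolute coordinate differences in the 3D profile space."""
        return sum(abs(a - b) for a, b in zip(profile_a, profile_b))

    profiles = {
        "lhc_workload": (30.0, 45.0, 10.0),        # placeholder values
        "synthetic_benchmark": (15.0, 60.0, 5.0),  # placeholder values
    }

    print(l1_distance(profiles["lhc_workload"], profiles["synthetic_benchmark"]))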

  8. Hierarchical clustering of benchmarks • All LHC applications cluster together (apart from ALICE gensim) • Rather far from the clusters consisting of HS06 or SPEC CPU 2017 benchmarks • A strong argument in favour of building an LHC benchmark out of LHC applications
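A minimal sketch, assuming SciPy is available, of how such a hierarchical clustering can be performed with the L1 (cityblock) metric; the workload names and profile values below are illustrative placeholders, not the measured data behind the slide.

    # Sketch: hierarchical clustering of workloads by their usage profiles,
    # using the L1 (cityblock) metric. All values are illustrative placeholders.
    from scipy.cluster.hierarchy import fcluster, linkage
    from scipy.spatial.distance import pdist

    names = ["lhc_sim_a", "lhc_sim_b", "lhc_reco_c", "synthetic_d", "synthetic_e"]
    profiles = [
        (32.0, 44.0, 11.0),
        (30.0, 46.0, 10.0),
        (31.0, 43.0, 12.0),
        (12.0, 65.0, 4.0),
        (14.0, 62.0, 5.0),
    ]

    distances = pdist(profiles, metric="cityblock")     # pairwise L1 distances
    tree = linkage(distances, method="average")         # agglomerative clustering
    labels = fcluster(tree, t=2, criterion="maxclust")  # cut the tree into 2 clusters

    for name, label in zip(names, labels):
        print(name, "-> cluster", label)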

  9. Benchmark requirements • Requirements • No need for network connectivity • A reasonably small package • Easy to use • Reproducible results • Ingredients • SW repository (CVMFS) • Input data (event and conditions data) • An orchestrator script per workload (see the sketch below) • Sets the environment • Runs the application • Parses the output to generate scores • Packaging • The container also includes the needed SW from CVMFS and the input data
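As an illustration only, here is a hypothetical Python sketch of what such an orchestrator could do; the actual orchestrators are maintained per workload in the hep-workloads repository, and the command, log pattern, paths and score definition below are assumptions.

    # Hypothetical orchestrator sketch: set the environment, run the workload,
    # parse its output and write a JSON report. The command, the log pattern,
    # the paths and the score definition are illustrative assumptions.
    import json
    import os
    import re
    import subprocess
    import time

    def run_workload(command, results_dir="/results"):
        env = dict(os.environ, WORKLOAD_HOME="/bmk")   # assumed environment setup
        start = time.time()
        proc = subprocess.run(command, shell=True, env=env,
                              capture_output=True, text=True)
        elapsed = time.time() - start

        # Assume the application reports the number of processed events in its log.
        match = re.search(r"processed\s+(\d+)\s+events", proc.stdout)
        events = int(match.group(1)) if match else 0

        report = {
            "events": events,
            "wall_time_s": round(elapsed, 2),
            "throughput_score": round(events / elapsed, 4) if elapsed else 0.0,
        }
        os.makedirs(results_dir, exist_ok=True)
        with open(os.path.join(results_dir, "report.json"), "w") as f:
            json.dump(report, f, indent=2)
        return report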

  10. Repository and Continuous Integration (CI) • A suite of HEP benchmarks requires stable procedures to build and distribute the benchmarking workloads. We have realized an effective and user-friendly infrastructure, leveraging • CVMFS Trace and Export [5] utilities to export the workloads' software from CVMFS to a local directory • GitLab CI/CD for fully automated continuous integration • GitLab Registry for container distribution • Experts from the experiments focus on providing the HEP workloads: software, data, result parser • Experts on benchmarking focus on running the containers and profiling the compute resources • Two container solutions offered: Docker and Singularity • Simple instructions to run the benchmark and produce the results in JSON format: • $ docker run -v /some_path:/results $IMAGE • $ singularity run -B /some_path:/results docker://$IMAGE
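For completeness, a small Python sketch of how one of the containers could be driven programmatically and its JSON results collected; the image tag and the report file layout are assumptions, only the bind-mounted /results convention comes from the instructions above.

    # Sketch: run a benchmark container and collect the JSON it writes to /results.
    # The image tag and the report file layout are assumptions.
    import json
    import pathlib
    import subprocess

    IMAGE = "gitlab-registry.cern.ch/hep-benchmarks/hep-workloads/some-workload:latest"  # hypothetical tag
    results = pathlib.Path("/tmp/bmk_results")
    results.mkdir(parents=True, exist_ok=True)

    subprocess.run(["docker", "run", "--rm",
                    "-v", f"{results}:/results", IMAGE], check=True)

    for report in results.glob("*.json"):  # pick up whatever JSON the workload wrote
        print(report.name, json.loads(report.read_text()))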

  11. Build Procedure • Starting from the GitLab repo of the orchestrator and CI scripts • (1) Build an interim HEP WL image • (2) Generate CVMFS traces by running the HEP WL in the container, with CVMFS mounted • (3) Export the accessed CVMFS files to a local folder in the container (standalone) • (4) Validate the standalone container and publish the image (Docker and Singularity)

  12. Current status of benchmark • All (GEN-)SIM workloads available as containers • https://gitlab.cern.ch/hep-benchmarks/hep-workloads • https://twiki.cern.ch/twiki/bin/view/HEPIX/HEP-Workloads • DIGI and RECO are work in progress • Will also include a ROOT analysis job • Useful to select hardware for analysis workloads • Support from experiment experts • On containerizing the workloads • On how to best extract a score • Example of a result in JSON format:

    {"copies": 10, "threads_x_copy": 4, "events_x_thread": 100,
     "througput_score": {"score": 0.9910, "avg": 0.0991, "median": 0.0990, "min": 0.0985, "max": 0.1000},
     "CPU_score": {"score": 0.2497, "avg": 0.0250, "median": 0.0249, "min": 0.0248, "max": 0.0253},
     "app": "CMSSW_10_2_9 GEN-SIM TTBAR"}
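A short sketch of how such a result could be read back in Python; the file name is an assumption and the field names follow the example above (including the "througput_score" spelling as it appears there).

    # Sketch: read a benchmark result like the one above and print its scores.
    # The file name is an assumption; the field names follow the example above.
    import json

    with open("report.json") as f:
        result = json.load(f)

    print("application:", result["app"])
    print("throughput score:", result["througput_score"]["score"])  # key spelled as in the example
    print("CPU score:", result["CPU_score"]["score"])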

  13. Open Data files • The experiments must provide data files to be included in the containers • E.g. a snapshot of the conditions database to run GEN and SIM • Minimum-bias events for the DIGI benchmark • These files should be OPEN ACCESS if we want to distribute the benchmark outside HEPiX/WLCG • E.g. to vendors that want to participate in a tender for the procurement of future worker nodes • This is a policy issue, not a technical one

  14. Conclusions • A common effort between the performance modelling and cost model working group and the HEPiX benchmarking working group • From simple performance studies to a working and already usable LHC benchmarking suite • Next steps • Add the remaining workloads • Finalize the score calculations • Start using it for real-world benchmarking
