Converting ASGARD into a MC Farm for Particle Physics
This document outlines the conversion of the ASGARD system into a Monte Carlo (MC) Farm tailored for particle physics research. It details the three main components of the Beowulf cluster architecture—CPU nodes, network configuration, and file servers—and discusses the allocation of budget to optimize performance for different job types. Additionally, it examines the challenges faced with file server speed, data management, and the need for high bandwidth and capacity. The upgrade strategies are elaborated to enhance parallel processing capabilities and ensure efficient data handling.
Converting ASGARD into a MC Farm for Particle Physics
E N D
Presentation Transcript
Converting ASGARD into a MC-Farm for Particle Physics Beowulf-Day 17.01.05 A.Biland IPP/ETHZ
Beowulf Concept Three Main Components: Adrian Biland, IPP/ETHZ
Beowulf Concept Three Main Components: CPU Nodes Adrian Biland, IPP/ETHZ
Beowulf Concept Three Main Components: CPU Nodes Network Adrian Biland, IPP/ETHZ
Beowulf Concept Three Main Components: CPU Nodes Network Fileserver Adrian Biland, IPP/ETHZ
Beowulf Concept Three Main Components: CPU Nodes Network Fileserver $$$$$$$$$ ? $$$$ ? $$$ ? How much of the (limited) money to spend for what ?? Adrian Biland, IPP/ETHZ
Beowulf Concept Intended (main) usage : “Eierlegende Woll-Milch-Sau” (one size fits everything) Put ~equal amount of money into each component ==> ok for (almost) any possible use, but waste of money for most applications Adrian Biland, IPP/ETHZ
Beowulf Concept Intended (main) usage : CPU-bound jobs with limited I/O and interCPU communication ~80% ~10% ~10% [ ASGARD , HREIDAR-I ] Adrian Biland, IPP/ETHZ
Beowulf Concept Intended (main) usage : Jobs with high interCPU communication needs: (Parallel Proc.) ~50% ~40% ~10% [ HREIDAR-II ] Adrian Biland, IPP/ETHZ
Beowulf Concept Intended (main) usage : Jobs with high I/O needs or large datasets: (Data Analysis) ~50% ~10% ~40% Adrian Biland, IPP/ETHZ
Fileserver Problems: a) Speed (parallel access) Inexpensive Fileservers reach disk-I/O ~50 MB/s 500 single-CPU jobs ==> 50 MB/s /500 jobs = 100kB/s /job (as an upper limit; typical values reached much smaller) Using several Fileservers in parallel: -- difficult data management (where is which file ?) [ use parallel filesystems ? ] -- hot spots (all jobs want to access same dataset ) [ data replication ==> $$$ ] Adrian Biland, IPP/ETHZ
Fileserver Problems: a) Speed (parallel access) How (not) to read/write the data: Bad: NFS (constant transfer of small chunks of data) ==> always disk repositioning ==> disk-I/O --> 0 (somewhat improved with large cache (>>100MB) in memory if write-cache full: long time to flush to disk ==> server blocks) ~ok: rcp (transfer of large blocks from/to local /scratch ) /scratch rather small on ASGARD if many jobs want to transfer at same time ??? Best: fileserver initiates rpc transfers on request user discipline, not very transparent, … Adrian Biland, IPP/ETHZ
Fileserver Problems: b) Capacity …. 500 jobs producing data Each writes 100kB/s ==> 50 MB/s to Fileserver ==> 4.2 TB / day ! Adrian Biland, IPP/ETHZ
Particle Physics MC Need huge amount of statistically independent events #events >> #CPUs ==> ‘embarassingly parallel’ problem ==> 5x500 MIPS as good as 1x2500 MIPS Usually two sets of programs: Simulation: produce huge, very detailed MC-files (adapted standard packages [GEANT, CORSKA, …] b) Reconstruction: read MC-files, write smaller reco-files selected events, physics data (special SW developed by each experiment) Mass-Production: only reco-files needed: ==> combine both tasks in one job, use /scratch Adrian Biland, IPP/ETHZ
ASGARD Status: 10 frames /home 24 nodes/ frame /work /arch Local disk per node: 1GB / 1GB swap 4GB /scratch Adrian Biland, IPP/ETHZ
ASGARD Status: 10 frames /home 24 nodes/ frame /work /arch Local disk per node: 1GB / 1GB swap 4GB /scratch Needed: Fileserver: ++bandwidth, ++capacity /scratch: guaranteed space/job Adrian Biland, IPP/ETHZ
ASGARD Upgrade: 4x 400GB SATA, RAID-10 (800GB usable) / frame 10 frames /home 24 nodes/ frame /work /arch Local disk per node: 0.2GB / 0.3GB swap 2.5GB /scr1 2.5GB /scr2 Adrian Biland, IPP/ETHZ
ASGARD Upgrade: 4x 400GB SATA, RAID-10 (800GB usable) / frame 10 frames /home 24 nodes/ frame /work /arch Local disk per node: 0.2GB / 0.3GB swap 2.5GB /scr1 2.5GB /scr2 Adding 10 Fileservers (~65kFr), ASGARD can serve ~2years as MC Farm and GRID Testbed … Adrian Biland, IPP/ETHZ