LHC experimental data: From today’s Data Challenges to the promise of tomorrow


  1. LHC experimental data: From today’s Data Challenges to the promise of tomorrow B. Panzer – CERN/IT, F. Rademakers – CERN/EP, P. Vande Vyvre - CERN/EP Academic Training CERN

  2. Computing Infrastructure and Technology Day 2 Academic Training CERN 12-16 May 2003 Bernd Panzer-Steindel CERN-IT

  3. Outline • tasks, requirements, boundary conditions • component technologies • building farms and the fabric • into the future Bernd Panzer-Steindel CERN-IT

  4. Questions • Before building a computing infrastructure, some questions need to be answered: • what are the tasks? • what is the dataflow? • what are the requirements? • what are the boundary conditions? Bernd Panzer-Steindel CERN-IT

  5. Experiment dataflow (diagram): Data Acquisition, High Level Trigger (selection, reconstruction), Raw Data, Event reconstruction, Event Summary Data, Processed Data, Event Simulation, Physics analysis and Interactive physics analysis. Bernd Panzer-Steindel CERN-IT

  6. Tasks • Detector channel digitization • Level 1 and Level 2 Trigger • Event building • High Level Trigger • Simulated data production (Monte Carlo) • Online data processing • Data storage • Offline data reprocessing • Offline data analysis • Interactive data analysis and visualization • Data calibration • Physics result Bernd Panzer-Steindel CERN-IT

  7. Dataflow examples (diagram): DAQ scenario for 2008, with CPU servers, disk servers and tape servers; the data rates between online filtering, online processing, central data recording, MC production + pile-up, re-processing and analysis range from 1 GB/s to 100 GB/s (100, 50, 5, 2 and 1 GB/s), with the CPU-intensive steps running on the CPU servers. Bernd Panzer-Steindel CERN-IT
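
The exact assignment of those rates to the individual paths is not recoverable from the flattened diagram; purely as an illustration, the sketch below converts sustained rates of that order into daily volumes (the stage labels are assumptions, not taken from the original figure).

```python
# Illustrative only: convert sustained data rates of the order quoted on the
# slide into daily volumes. The pairing of rates with stages is an assumption.

RATES_GB_PER_S = {
    "event building / online filtering": 100,
    "high level trigger output":          50,
    "online processing":                   5,
    "central data recording":              2,
    "re-processing / analysis":            1,
}

SECONDS_PER_DAY = 24 * 3600

for stage, rate in RATES_GB_PER_S.items():
    volume_tb = rate * SECONDS_PER_DAY / 1000.0  # GB -> TB
    print(f"{stage:35s} {rate:5.0f} GB/s  ~{volume_tb:9.0f} TB/day")
```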

  8. Requirements and Boundaries (I) • The HEP applications mainly require integer processor performance, much less floating-point performance → choice of processor type and benchmark reference • A large amount of processing and storage is needed, but optimization is for aggregate performance, not for the single tasks • In addition, the events are independent units → many components, moderate demands on the single components, coarse-grain parallelism (see the sketch below) Bernd Panzer-Steindel CERN-IT
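
A minimal sketch of what coarse-grain parallelism over independent events means in practice: events can be farmed out to many worker processes with no communication between them. The process_event function here is a hypothetical stand-in, not real reconstruction code.

```python
# Events are independent units of work, so they can be distributed to worker
# processes that never need to talk to each other.
from multiprocessing import Pool

def process_event(event_id):
    # stand-in for per-event reconstruction / selection work
    checksum = sum((event_id * i) % 97 for i in range(1000))
    return event_id, checksum

if __name__ == "__main__":
    events = range(10_000)            # independent units of work
    with Pool(processes=8) as pool:   # "many components, moderate demands"
        results = pool.map(process_event, events, chunksize=100)
    print(f"processed {len(results)} events")
```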

  9. Requirements and Boundaries (II) • the major boundary condition is cost: staying within the budget envelope while obtaining the maximum amount of resources → commodity equipment with the best price/performance values, which is not the same as the cheapest • reliability, functionality and performance have to be taken into account together == total cost of ownership (see the sketch below) • basic infrastructure and environment: availability of space, cooling and electricity Bernd Panzer-Steindel CERN-IT
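
Purely as an illustration of the total-cost-of-ownership point, the sketch below compares two hypothetical offers; all prices, ratings and failure rates are invented for the example, not CERN figures.

```python
# Invented numbers, purely to illustrate the total-cost-of-ownership argument:
# the cheapest purchase price is not automatically the best price/performance
# once failure rates and the cost of interventions are included.

def cost_per_si2000(purchase_chf, si2000, annual_failure_rate,
                    cost_per_failure_chf, years=3):
    """Total cost over the service life divided by the delivered SI2000."""
    total = purchase_chf + years * annual_failure_rate * cost_per_failure_chf
    return total / si2000

cheap   = cost_per_si2000(2000, si2000=1000, annual_failure_rate=0.30,
                          cost_per_failure_chf=2000)
quality = cost_per_si2000(2300, si2000=1000, annual_failure_rate=0.05,
                          cost_per_failure_chf=2000)

print(f"cheapest box : {cheap:.2f} CHF per SI2000 over 3 years")
print(f"quality box  : {quality:.2f} CHF per SI2000 over 3 years")
```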

  10. Component technologies • processor • disk • tape • network and packaging issues Bernd Panzer-Steindel CERN-IT

  11. Coupling of building blocks: physical and logical coupling of hardware and software, with increasing level of complexity • CPU + disk → PC: coupled in hardware by motherboard, backplane, bus and integrating devices (memory, power supply, controller, ...), and in software by the operating system and drivers • PC → cluster: coupled in hardware by storage trays, NAS servers, SAN elements and the network (Ethernet, fibre channel, Myrinet, ...) with hubs, switches and routers, and in software by the batch system, load balancing, control software and Hierarchical Storage Systems • cluster → world wide cluster: coupled in hardware by the wide area network and in software by Grid middleware Bernd Panzer-Steindel CERN-IT

  12. Processors • focus on integer price/performance (SI2000) • PC mass market: INTEL and AMD • the price/performance optimum changes frequently between the two • weak point of AMD: heat protection, heat production • current CERN strategy is to use INTEL processors Bernd Panzer-Steindel CERN-IT

  13. Price/performance evolution Bernd Panzer-Steindel CERN-IT

  14. Industry now tries to fulfill Moore’s Law Bernd Panzer-Steindel CERN-IT

  15. Processor packaging (pictures: 1U rack-mounted case, desk side case) • the best price/performance per node comes today with dual processors and desk side cases • processors are only 25-30% of the box costs → mainboard, memory, power supply, case, disk • thin (1U) units can be up to 30% more expensive → cooling and space • today a typical configuration is: 2 x 2.4 GHz PIV processors, 1 GB memory, 80 GB disk, Fast Ethernet → about two ‘versions’ behind == 2.8 GHz and 3 GHz are available but don’t give a good price/performance value • one has to add 10% of the box costs for infrastructure (racks, cabling, network, control system); a rough cost example is sketched below Bernd Panzer-Steindel CERN-IT
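
A back-of-the-envelope version of that cost argument. The box price and the SI2000 rating below are assumed example values; only the ~25-30% CPU share and the 10% infrastructure surcharge come from the slide.

```python
# Back-of-the-envelope price/performance for a dual-CPU node. The box price
# and the SI2000 rating are assumed; the CPU share and the 10% infrastructure
# surcharge are the figures quoted on the slide.

box_price_chf  = 2500        # assumed street price of the dual 2.4 GHz box
cpu_share      = 0.28        # processors are ~25-30% of the box cost
infrastructure = 0.10        # racks, cabling, network, control system
si2000_per_box = 2 * 900     # assumed SI2000 rating of the two processors

total_cost = box_price_chf * (1 + infrastructure)
print(f"cost of the two CPUs alone : {box_price_chf * cpu_share:6.0f} CHF")
print(f"box + 10% infrastructure   : {total_cost:6.0f} CHF")
print(f"price/performance          : {total_cost / si2000_per_box:6.2f} CHF per SI2000")
```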

  16. Space (photos: computer center, experiment control room) Bernd Panzer-Steindel CERN-IT

  17. Problems • we are seeing effects of market saturation for desktops and a move towards laptops; we currently use “desktop+” machines → it would be more expensive to use server CPUs • Moore’s Second Law: the cost of a fabrication facility increases at an even greater rate than the transistor density (which doubles every 18 months) • current fabrication plants cost ~2.5 billion $ (INTEL profit in 2002: 3.2 billion $) • heat dissipation: currently heat production increases linearly with performance → terahertz transistors (2005 onwards) to reduce leakage currents → power-saving processors, BUT be careful to compare effective performance; measures for mobile computing do not help in the case of 100% CPU utilization in 24*7 operation Bernd Panzer-Steindel CERN-IT

  18. Processor performance (SpecInt2000) per Watt (chart: SpecInt2000/Watt versus clock frequency, comparing PIII 0.25 and 0.18 micron, PIV 0.18 and 0.13 micron, PIV Xeon 0.13 micron and Itanium 2 0.18 micron parts): processor power consumption, heat production Bernd Panzer-Steindel CERN-IT

  19. Basic infrastructure • electricity and cooling: large investments necessary, long planning and implementation period • we use today about 700 kW in the center; the upgrade to 2.5 MW has started, i.e. 2.5 MW for electricity + 2.5 MW for cooling • extra buildings are needed; this will take several years and cost up to 8 million SFr • this infrastructure does not evolve linearly but in larger steps • much more complicated for the experimental areas with their space and access limitations Bernd Panzer-Steindel CERN-IT
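
To put the 2.5 MW figure in perspective, a rough annual energy estimate; the electricity tariff is an assumed value, and counting the cooling plant as drawing the same power is a deliberate upper bound.

```python
# Rough annual energy figure for a 2.5 MW computing load plus cooling treated
# as an equal load (upper bound). The tariff is an assumed example value.

load_mw            = 2.5 + 2.5        # computing + cooling
hours_per_year     = 24 * 365
tariff_chf_per_kwh = 0.10             # assumed electricity price

energy_gwh = load_mw * hours_per_year / 1000.0          # MWh -> GWh
cost_mchf  = load_mw * 1000 * hours_per_year * tariff_chf_per_kwh / 1e6

print(f"annual energy : {energy_gwh:5.1f} GWh")
print(f"annual cost   : {cost_mchf:5.1f} MCHF at {tariff_chf_per_kwh} CHF/kWh")
```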

  20. Disk storage • density improves every year (doubling every ~14 months) • single-stream speed (sequential I/O) is increasing considerably (up to 100 MB/s) • transactions per second (random I/O, access time) improve very little (factor 2 in 4 years, from 8 ms to 4 ms) • data rates drop considerably when moving from sequential to random I/O (see the sketch below); online/offline processing works with sequential streams • analysis uses random access patterns, and multiple parallel sequential streams =~ random access • disks come in different ‘flavours’ depending on the connection type to the host: the same hardware with different electronics for SCSI, IDE, fibre channel • different quality selection criteria → MTBF (Mean Time Between Failures); mass market == lower values Bernd Panzer-Steindel CERN-IT
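
A simple model of why random I/O is so much slower: each access pays the positioning time before it can transfer a chunk. The 50 MB/s transfer rate and the chunk sizes are assumptions; the 8 ms and 4 ms access times are the values quoted above.

```python
# Effective throughput when every access pays a positioning penalty before
# transferring one chunk of data.

def effective_rate_mb_s(seq_rate_mb_s, access_time_ms, chunk_kb):
    chunk_mb = chunk_kb / 1024.0
    transfer_s = chunk_mb / seq_rate_mb_s
    seek_s = access_time_ms / 1000.0
    return chunk_mb / (seek_s + transfer_s)

for access_ms in (8.0, 4.0):
    for chunk_kb in (64, 1024):
        rate = effective_rate_mb_s(50.0, access_ms, chunk_kb)
        print(f"access {access_ms:3.0f} ms, chunk {chunk_kb:5d} kB -> ~{rate:6.1f} MB/s")
```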

  21. Disk performance Bernd Panzer-Steindel CERN-IT

  22. Price/performance evolution Bernd Panzer-Steindel CERN-IT

  23. Storage density evolution Bernd Panzer-Steindel CERN-IT

  24. Storage packaging • 10-12 IDE disks are attached to a RAID controller inside a modified PC with a larger housing, connected to the network with Gigabit Ethernet → NAS (Network Attached Storage); good experience with this approach, current practice • alternatives: SAN (Storage Area Network), based on disks directly attached to a fibre channel network; iSCSI, SCSI commands via IP, with disk trays whose iSCSI controller is attached to Ethernet • R&D and evaluations → are there advantages of SAN versus NAS which would justify the higher cost (factor 2-4)? Not only the ‘pure’ cost per GB of storage → throughput, reliability, manageability, redundancy Bernd Panzer-Steindel CERN-IT

  25. (diagram) PCI 120-500 MB/s, PCI-X 1-8 GB/s; for disk servers the coupling of disks, processor, memory and network, plus LINUX, defines the performance Bernd Panzer-Steindel CERN-IT

  26. Tape storage • not a mass market, aimed at backup (write once, read never); we need high throughput and reliability under constant read/write stress • automated, reliable access to a large amount of data is needed → large robotic installations → the major players are IBM and StorageTek (STK) • improvements are slow, not comparable with processor or disk trends; current generation: 30 MB/s tape drives with 200 GB cartridges (see the sketch below) • two types of read/write technology: helical scan → “video recorder”, complicated mechanics; linear scan → “audio recorder”, simpler, lower density • linear is preferred; we had some bad experience with helical scan • disk and tape storage prices are getting closer, a factor 2-3 difference Bernd Panzer-Steindel CERN-IT
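
Two quick numbers that follow from the quoted drive and cartridge figures; the 1 GB/s aggregate recording rate is an assumed example, not a figure from the slide.

```python
# Quick numbers from 30 MB/s drives and 200 GB cartridges; the aggregate
# recording rate is an assumed example.

drive_rate_mb_s  = 30
cartridge_gb     = 200
target_rate_gb_s = 1.0   # assumed aggregate rate to tape

fill_time_h   = cartridge_gb * 1000 / drive_rate_mb_s / 3600
drives_needed = target_rate_gb_s * 1000 / drive_rate_mb_s

print(f"time to fill one cartridge  : {fill_time_h:.1f} h")
print(f"drives for {target_rate_gb_s:.0f} GB/s sustained : {drives_needed:.0f} "
      "(ignoring mounts and repositioning)")
```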

  27. Network • commodity Ethernet at 10 / 100 / 1000 / 10000 Mbit/s is sufficient in the offline world and even partly in the online world (HLT); Level 1 triggers need lower latency • special networks, cluster interconnects: Myrinet 1, 2, 10 Gbit/s; GSN 6.4 Gbit/s; InfiniBand 2.5 Gbit/s * 4 (12) • storage network: fibre channel at 1 Gbit/s and 2 Gbit/s • very high performance with low latency and a small processor ‘footprint’, but a small market and expensive Bernd Panzer-Steindel CERN-IT

  28. “Exotic” technology trends • nanotechnology (carbon nanotubes) • molecular computing (kilohertz plastic processors, single-molecule switches) • biological computing (DNA computing) • quantum computing (quantum dots, ion traps, a few qubits only) • very interesting and fast progress in the last years, but far away from any commodity production • less fancy → game machines (X-Box, GameCube, Playstation 2); advantage: large market (>10 billion $), cheap high-power nodes; disadvantage: little memory, weak networking capabilities • graphics cards have several times the raw power of normal CPUs but are not easy to use in our environment Bernd Panzer-Steindel CERN-IT

  29. Technology evolution: exponential growth rates everywhere Bernd Panzer-Steindel CERN-IT

  30. Building farms and the fabric Bernd Panzer-Steindel CERN-IT

  31. Building the Farm • Processors → “desktop+” node == CPU server • CPU server + larger case + 6*2 disks == Disk server • CPU server + Fibre Channel interface + tape drive == Tape server Bernd Panzer-Steindel CERN-IT

  32. Software ‘glue’ • management of the basic hardware and software: installation, configuration and monitoring system (from the European Data Grid project) • management of the processor computing resources: batch system (LSF from Platform Computing) • management of the storage (disk and tape): CASTOR (the CERN-developed Hierarchical Storage Management system) Bernd Panzer-Steindel CERN-IT

  33. Generic model of a Fabric (diagram): application servers, disk servers and tape servers coupled by the network, with a connection to the external network Bernd Panzer-Steindel CERN-IT

  34. Today’s schematic network topology (diagram): WAN attached via Gigabit Ethernet (1000 Mbit/s); backbone of multiple Gigabit Ethernet links (20 * 1000 Mbit/s); disk servers and tape servers on Gigabit Ethernet (1000 Mbit/s); CPU servers on Fast Ethernet (100 Mbit/s) Bernd Panzer-Steindel CERN-IT

  35. LCG Testbed Structure (diagram): 100 CPU servers on Gigabit Ethernet (GE), 300 CPU servers on Fast Ethernet (FE), 100 disk servers on GE (~50 TB), 20 tape servers on GE; groups of nodes (100 GE CPU servers, 200 + 100 FE CPU servers, 64 + 36 disk servers, 20 tape servers) are connected to the backbone routers via 1 GB, 3 GB and 8 GB lines Bernd Panzer-Steindel CERN-IT

  36. Computer center today • Main fabric cluster (Lxbatch/Lxplus resources) → physics production for all experiments; requests are made in units of SI2000 → 1000 CPU servers, 160 disk servers, ~950000 SI2000, ~100 TB → 50 tape drives (30 MB/s, 200 GB cartridges), 10 silos with 6000 slots each == 12 PB capacity (see the cross-check below) • Benchmark, performance and testbed clusters (LCG prototype resources) → computing data challenges, technology challenges, online tests, EDG testbeds, preparations for the LCG-1 production system, complexity tests → 500 CPU servers, 100 disk servers, ~390000 SI2000, ~50 TB Bernd Panzer-Steindel CERN-IT
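
A quick cross-check of those figures, using nothing beyond the numbers quoted above.

```python
# Cross-check of the capacity figures quoted on the slide.

silos, slots_per_silo, cartridge_gb = 10, 6000, 200
tape_capacity_pb = silos * slots_per_silo * cartridge_gb / 1e6
print(f"tape capacity                 : {tape_capacity_pb:.0f} PB")   # 12 PB as quoted

tape_drives, drive_rate_mb_s = 50, 30
print(f"aggregate tape drive bandwidth: {tape_drives * drive_rate_mb_s / 1000:.1f} GB/s")

cpu_servers, total_si2000 = 1000, 950_000
print(f"average rating per CPU server : {total_si2000 / cpu_servers:.0f} SI2000")
```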

  37. General Fabric Layout (diagram) • clusters: R&D cluster (new architecture and hardware), development cluster and GRID testbeds, certification cluster (the main cluster ‘en miniature’), benchmark and performance cluster (current architecture and hardware), and the main fabric cluster, fed by new software and new hardware (purchases) • service control and management (e.g. stager, HSM, LSF master, repositories, GRID services, CA, etc.) • the main fabric cluster spans old, current and new equipment: 2-3 hardware generations, 2-3 OS/software versions, 4 experiment environments Bernd Panzer-Steindel CERN-IT

  38. View of different Fabric areas • installation • configuration + monitoring • fault tolerance • automation, operation, control • infrastructure: electricity, cooling, space • batch system (LSF, CPU servers) • storage system (AFS, CASTOR, disk servers) • network • benchmarks, R&D, architecture • GRID services !? • prototype, testbeds • purchase, hardware selection, resource planning • coupling of components through hardware and software Bernd Panzer-Steindel CERN-IT

  39. Into the future Bernd Panzer-Steindel CERN-IT

  40. Considerations • the current state of performance, functionality and reliability is good, and technology developments still look promising • more of the same for the future !?!? How can we be sure that we are following the right path? How do we adapt to changes? Bernd Panzer-Steindel CERN-IT

  41. Strategy • continue and expand the current system, BUT do in parallel: • R&D activities: SAN versus NAS, iSCSI, IA64 processors, ... • technology evaluations: InfiniBand clusters, new filesystem technologies, ... • Data Challenges to test scalability on larger scales: “bring the system to its limit and beyond”; we are already very successful with this approach, especially with the “beyond” part → Friday’s talk • watch the market trends carefully Bernd Panzer-Steindel CERN-IT

  42. CERN computer center 2008 • hierarchical Ethernet network, tree topology (280 GB/s) • ~8000 mirrored disks (4 PB) • ~3000 dual-CPU nodes (20 million SI2000) • ~170 tape drives (4 GB/s) • ~25 PB tape storage → all numbers hold only IF the exponential growth rates continue (see the sketch below)! The CMS High Level Trigger will consist of about 1000 nodes with 10 million SI2000 !! Bernd Panzer-Steindel CERN-IT
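
What "if the exponential growth rate continues" implies in numbers, using the ~950000 SI2000 quoted for today (slide 36) and the 20 million SI2000 quoted here for 2008.

```python
# Annual growth factor and doubling time implied by going from ~0.95 million
# SI2000 in 2003 to 20 million SI2000 in 2008.
import math

si2000_2003, si2000_2008, years = 0.95e6, 20e6, 5
growth_per_year = (si2000_2008 / si2000_2003) ** (1 / years)
doubling_time_months = 12 * math.log(2) / math.log(growth_per_year)

print(f"required growth factor : {growth_per_year:.2f}x per year")
print(f"implied doubling time  : {doubling_time_months:.0f} months")
```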

  43. Tomorrow’s schematic network topology (diagram): WAN attached via 10 Gigabit Ethernet (10000 Mbit/s); backbone of multiple 10 Gigabit Ethernet links (200 * 10000 Mbit/s); disk servers and tape servers on 10 Gigabit Ethernet (10000 Mbit/s); CPU servers on Gigabit Ethernet (1000 Mbit/s) Bernd Panzer-Steindel CERN-IT

  44. Summary • quite confident in the technological evolution • quite confident in the current architecture • LHC computing is not a question of pure technology → efficient coupling of components, hardware + software → commodity is a must for cost efficiency; boundary conditions are important; market developments can have large effects Bernd Panzer-Steindel CERN-IT

  45. Tomorrow • Day 1 (Pierre Vande Vyvre): outline, main concepts; requirements of the LHC experiments; Data Challenges • Day 2 (Bernd Panzer): computing infrastructure; technology trends • Day 3 (Pierre Vande Vyvre): data acquisition • Day 4 (Fons Rademakers): simulation, reconstruction and analysis • Day 5 (Bernd Panzer): computing Data Challenges; physics Data Challenges; evolution Bernd Panzer-Steindel CERN-IT
