Computing Plans in CMS

Presentation Transcript


  1. Computing Plans in CMS – Ian Willers, CERN

  2. The Problem and Introduction • Data Challenge – DC04 • Computing Fabric – Technologies evolution • Conclusions

  3. The Problem [data-flow diagram: the detector feeds the event filter (selection & reconstruction), which writes raw data and event summary data; event reprocessing and batch physics analysis produce processed data and analysis objects (extracted by physics topic); event simulation feeds the same chain, and interactive physics analysis works from the analysis objects]

  4. Regional Centres – a Multi-Tier Model [diagram: CERN as Tier 0, with a 2.5 Gbps backbone, linked over 622 Mbps lines to Tier 1 regional centres such as IN2P3, RAL and FNAL; Tier 2 centres (Uni n, Lab a, Uni b, Lab c) connect at 155–622 Mbps; below them sit department and desktop resources]

  5. Computing TDR Strategy • Inputs: technologies evaluation and evolution; estimated available resources (no cost book for computing) • Physics Model: data model, calibration, reconstruction, selection streams, simulation, analysis, policy/priorities… • Computing Model: architecture (grid, OO, …), Tier 0/1/2 centres, networks, data handling, system/grid software, applications, tools, policy/priorities… • Iterations / scenarios over these yield the required resources • Validation of the model: DC04 data challenge (copes with 25 Hz at 2×10^33 for 1 month) and simulations (model systems & usage patterns) • Output – C-TDR: computing model (& scenarios), specific plan for initial systems, (non-contractual) resource planning

  6. The Problem and Introduction • Data Challenge – DC04 • Proposed Computing Fabric • Conclusions

  7. Data Challenge DC04 [diagram: Pre-Challenge Production (HLT filter, 50M events, 75 TByte in disk cache / archive storage on the CERN tape archive) feeds a fake DAQ at CERN (T0); the DC04 T0 challenge streams 25 Hz × 1.5 MB/evt = 40 MByte/s (3.2 TB/day) into a ~40 TByte CERN disk pool (~20 days of data), where 1st-pass reconstruction produces event streams (Higgs DST, SUSY background DST) and TAG/AOD (20 kB/evt), with raw data (25 Hz, 1 MB/evt) and reconstructed data (25 Hz, 0.5 MB) archived to the CERN tape archive; the DC04 calibration challenge runs calibration jobs on a calibration sample at T1/T2 against replica conditions DBs fed from the MASTER conditions DB; the DC04 analysis challenge ships DST and TAG/AOD replicas to T1/T2, e.g. a Higgs background study at T2 requesting new events from an event server; calibration jobs are starting now, the “true” DC04 runs in Feb 2004]

  8. [the same DC04 diagram with alternative parameters: 25 Hz × 2 MB/evt = 50 MByte/s (4 TByte/day); CERN disk pool ~40 TByte (~10 days of data); TAG/AOD at 10–100 kB/evt; otherwise as slide 7, with the T0, calibration and analysis challenges, the 50M-event / 75 TByte Pre-Challenge Production (PCP), the MASTER and replica conditions DBs and the CERN tape archive]
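
The raw-data rates on slides 7 and 8 follow directly from the trigger rate and the event size. A quick Python check of that arithmetic, using only the numbers quoted on the slides:

```python
# Back-of-the-envelope check of the T0 rates quoted on slides 7 and 8.
RATE_HZ = 25                    # events per second out of the fake DAQ
SECONDS_PER_DAY = 24 * 3600

for event_size_mb in (1.5, 2.0):            # MB per event, slide 7 vs. slide 8
    mb_per_s = RATE_HZ * event_size_mb      # sustained throughput in MB/s
    tb_per_day = mb_per_s * SECONDS_PER_DAY / 1e6    # 1 TB = 1e6 MB here
    print(f"{event_size_mb} MB/evt -> {mb_per_s:.1f} MB/s, {tb_per_day:.2f} TB/day")

# 1.5 MB/evt gives 37.5 MB/s and ~3.2 TB/day; 2.0 MB/evt gives 50 MB/s and
# ~4.3 TB/day, matching the ~40 MByte/s / 3.2 TB/day and 50 MByte/s / 4 TB/day
# figures quoted on the slides.
```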

  9. Pre-Challenge Production with/without GRID [workflow diagram: a Physics Group asks for an official dataset, recorded in RefDB; the Production Manager defines assignments; a Site Manager starts an assignment, or a User starts a private production from the user's site (or grid UI); MCRunJob then generates shell scripts for a Local Batch Manager, JDL for the EDG Scheduler on CMS/LCG-0, or a DAG handled by DAGMan or the Chimera VDL Virtual Data Catalogue and Planner, and the jobs run on a computer farm]
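
The right-hand branch of slide 9 hands production jobs to DAGMan as a directed acyclic graph. As an illustration only, here is a minimal Python sketch that writes a DAGMan-style DAG file for a two-step chain; the job names and submit files are invented and do not reflect the actual MCRunJob or Chimera output.

```python
# Hypothetical sketch: write a DAGMan-style DAG file for a two-step production
# chain (generation, then simulation). Job names and submit files are invented;
# the real MCRunJob / Chimera output is more involved.
from pathlib import Path

steps = [
    ("generation", "generation.sub"),   # event generation job (hypothetical submit file)
    ("simulation", "simulation.sub"),   # detector simulation job, runs after generation
]

lines = [f"JOB {name} {submit}" for name, submit in steps]
for (parent, _), (child, _) in zip(steps, steps[1:]):
    lines.append(f"PARENT {parent} CHILD {child}")   # encode the dependency

Path("production.dag").write_text("\n".join(lines) + "\n")
print(Path("production.dag").read_text())
```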

  10. The Problem and Introduction • Data Challenge – DC04 • Proposed Computing Fabric • Conclusions

  11. HEP Computing • High Throughput Computing • throughput rather than performance • resilience rather than ultimate reliability • long experience in exploiting inexpensive mass market components • management of very large scale clusters is a problem

  12. CPU Servers

  13. CPU capacity – Industry • OpenLab study of 64-bit architecture • Earth Simulator • number 1 computer in the Top 500 • made in Japan by NEC • peak speed of 40 Tflops • leads the Top 500 list by almost a factor of 5 • performance of the Earth Simulator equals the sum of the next 12 computers • the Earth Simulator runs at 90% efficiency (vs. 10-60% for PC farms) • Gordon Bell warned “Off-the-shelf supercomputing is a dead end”

  14. Earth Simulator

  15. Earth Simulator

  16. Cited problems with farms used as supercomputers • Lack of memory bandwidth • Interconnect latency • Lack of interconnect bandwidth • Lack of high performance (parallel) I/O • High cost of ownership for large scale systems • For CMS - does this matter?

  17. LCG Testbed Structure [used 100 CPU servers on GE, 300 on FE, 100 disk servers on GE (~50 TB), 20 tape servers on GE; diagram: backbone routers connect 64 + 36 disk servers, 200 + 100 FE CPU servers, 100 GE CPU servers and 20 tape servers over 1 GB, 3 GB and 8 GB lines]

  18. HEP Computing • Mass Storage model • data resides on tape – cached on disk • light-weight private software for scalability, reliability, performance • petabyte scale object persistency database products
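
The “data resides on tape – cached on disk” model of slide 18 is in essence a cache lookup in front of a slow tier. Below is a minimal sketch of the idea, with an invented disk-pool path and a plain file copy standing in for a real tape recall (no CASTOR-style API is implied).

```python
# Minimal sketch of the slide-18 mass-storage model: files live on tape, a disk
# pool acts as a cache in front of it. Paths and helpers here are invented.
import shutil
from pathlib import Path

DISK_POOL = Path("/pool")        # fast disk cache (hypothetical mount point)
TAPE_ARCHIVE = Path("/tape")     # stand-in for the tape system

def open_cached(filename: str):
    """Return a file handle, staging from 'tape' to disk on a cache miss."""
    cached = DISK_POOL / filename
    if not cached.exists():
        # cache miss: a real system would queue a tape recall; here we just copy
        shutil.copy(TAPE_ARCHIVE / filename, cached)
    return cached.open("rb")     # cache hit, or the freshly staged copy
```

A real disk pool also has to evict cold files when it fills, which is part of what the light-weight software mentioned on the slide has to handle.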

  19. Mass Storage

  20. Mass Storage - Industry • OpenLab – StorageTek 9940B drives driven by CERN at 1.1 GB/s • Tape only for backup • Main data stored on disks • Google example

  21. Disk Storage

  22. Disks – Commercial trends • Jobs access files over the GRID • until now the GRID copied files into the job's sandbox • a new proposal allows file access directly from the GRID • OpenLab – IBM 28 TB TotalStorage using iSCSI disks • iSCSI: SCSI over TCP/IP • OSD: Object Storage Device = object-based SCSI • Replication gives security and performance

  23. File Access via Grid • Access now takes place in steps: • find the site where the file resides using the replica catalogue • check whether the file is on tape or on disk; if it is only on tape, move it to disk • if you cannot open the file remotely, copy it to the worker node and use local I/O • open the file
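
The four bullets on slide 23 amount to a small decision procedure. Here is a sketch of that flow in Python, with the replica catalogue and storage services left as hypothetical stubs (no real EDG or grid API is implied):

```python
# Sketch of the file-access steps on slide 23. The replica_catalogue and
# storage objects are hypothetical stand-ins, not a real EDG/grid API.
def open_grid_file(logical_name, replica_catalogue, storage):
    # step 1: find the site where a replica of the file resides
    site, physical_name = replica_catalogue.lookup(logical_name)

    # step 2: if the replica is only on tape, have it moved to disk first
    if storage.on_tape_only(site, physical_name):
        storage.stage_to_disk(site, physical_name)

    # step 4: open the file remotely if possible ...
    try:
        return storage.open_remote(site, physical_name)
    except IOError:
        # step 3: ... otherwise copy it to the worker node and use local I/O
        local_copy = storage.copy_to_worker_node(site, physical_name)
        return open(local_copy, "rb")
```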

  24. Object Storage Device

  25. Big disk, slow I/O – tricks [diagram: sequential access is faster than random, so always read from start to finish; the disk is split into hot data and cold data regions]
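
The point of slide 25 can be seen on any large file by timing the same volume of data read sequentially versus at random offsets. A rough sketch follows; the file path and read sizes are placeholders, and the comparison is only meaningful against a cold cache or a file larger than RAM.

```python
# Rough timing sketch for slide 25: read the same volume of data sequentially
# and at random offsets. Path and sizes are placeholders, not from the talk.
import os, random, time

PATH = "big_file.bin"          # any large file on a spinning disk
BLOCK = 1024 * 1024            # 1 MB per read
N_BLOCKS = 256                 # 256 MB in total

def timed_read(offsets):
    start = time.perf_counter()
    with open(PATH, "rb") as f:
        for off in offsets:
            f.seek(off)
            f.read(BLOCK)
    return time.perf_counter() - start

size = os.path.getsize(PATH)
sequential = [i * BLOCK for i in range(N_BLOCKS)]
scattered = [random.randrange(0, size - BLOCK) for _ in range(N_BLOCKS)]

print("sequential:", timed_read(sequential), "s")
print("random:    ", timed_read(scattered), "s")   # typically far slower on a spinning disk
```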

  26. Network trends • OpenLab: 755 MB/s over 10 Gbps Ethernet • CERN/Caltech are the land speed record holders (in the Guinness Book of Records) • CERN to Chicago: IPv6, single stream, 983 Mbps • Sunnyvale to Geneva: IPv4, multiple streams, 2.38 Gbps • Network Address Translation, NAT • IPv6: addresses IP address depletion; efficient packet handling, authentication, security, etc.

  27. Port Address Translation • PAT – a form of dynamic NAT that maps multiple unregistered IP addresses to a single registered IP address by using different ports • Avoids the IPv4 problem of limited addresses • Mapping can be done dynamically, so adding nodes is easier • Therefore easier management of the farm fabric?
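
The mapping slide 27 describes is essentially a table from (private address, private port) to a port on the single public address. A minimal sketch of such a table; the addresses and the port range are invented for illustration.

```python
# Minimal sketch of Port Address Translation (slide 27): many private hosts
# share one public IP, distinguished by the translated source port.
# Addresses and the port range are invented for illustration.
import itertools

PUBLIC_IP = "192.0.2.1"                  # the single registered address
_next_port = itertools.count(30000)      # pool of public-side ports
_table = {}                              # (private_ip, private_port) -> public port

def translate_outgoing(private_ip, private_port):
    """Map an internal (ip, port) pair onto the shared public address."""
    key = (private_ip, private_port)
    if key not in _table:                # dynamic mapping: allocate on first use
        _table[key] = next(_next_port)
    return PUBLIC_IP, _table[key]

def translate_incoming(public_port):
    """Reverse lookup for replies arriving on the public address."""
    for (ip, port), mapped in _table.items():
        if mapped == public_port:
            return ip, port
    raise KeyError(f"no mapping for port {public_port}")

# e.g. two farm nodes behind one public IP:
print(translate_outgoing("10.0.0.5", 51000))   # ('192.0.2.1', 30000)
print(translate_outgoing("10.0.0.6", 51000))   # ('192.0.2.1', 30001)
```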

  28. IPv6 • IPv4: 32-bit address space, already assigned: • 67% to the USA • 6% to Japan • 2% to China • 0.14% to India • IPv6: 128-bit address space • No longer any need for Network Address Translation, NAT?
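
The scale difference behind that last bullet is simple arithmetic:

```python
# Address-space arithmetic behind slide 28.
ipv4_total = 2 ** 32       # about 4.3 billion addresses in all of IPv4
ipv6_total = 2 ** 128      # about 3.4e38 addresses in IPv6

print(f"IPv4: {float(ipv4_total):.2e} addresses")
print(f"IPv6: {float(ipv6_total):.2e} addresses,")
print(f"      i.e. {float(ipv6_total // ipv4_total):.2e} IPv4-sized address spaces")
```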

  29. The Problem and Introduction • Data Challenge – DC04 • Proposed Computing Fabric • Conclusions

  30. Conclusions • CMS faces an enormous challenge in computing • short-term data challenges • long-term developments within the commercial and scientific world • The year 2007 is still four years away • enough time for a completely new generation of computing technologies to appear • New inventions may revolutionise computing • CMS depends on this progress to make our computing possible and affordable
