
CMS Distributed Data Analysis Challenges


Presentation Transcript


1. CMS Distributed Data Analysis Challenges
Claudio Grandi, on behalf of the CMS Collaboration
ACAT'03, KEK

2. Outline
• CMS Computing Environment
• CMS Computing Milestones
• OCTOPUS: CMS Production System
• 2002 data productions
• 2003 Pre-Challenge Production (PCP03)
• PCP03 on grid
• 2004 Data Challenge (DC04)
• Summary

3. CMS Computing Environment

4. CMS computing context
• LHC will produce 40 million bunch crossings per second in the CMS detector (~1000 TB/s)
• The on-line system will reduce the rate to 100 events per second (100 MB/s of raw data)
  • Level-1 trigger: hardware
  • High-level trigger: on-line farm
• Raw data (1 MB/evt) will be:
  • archived on persistent storage (~1 PB/year; see the check below)
  • reconstructed to DST (~0.5 MB/evt) and AOD (~20 kB/evt)
• Reconstructed data (and part of the raw data) will be:
  • distributed to computing centers of collaborating institutes
  • analyzed by physicists at their own institutes
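
The quoted throughput and yearly volume follow directly from the trigger rate and the event size. A minimal check in Python; the ~10^7 s of effective LHC running per year is a standard assumption, not a number taken from the slide:

```python
# Back-of-the-envelope check of the rates quoted above.
RATE_HZ = 100            # events/s after the high-level trigger
RAW_EVENT_MB = 1.0       # ~1 MB per raw event
SECONDS_PER_YEAR = 1e7   # assumed effective LHC running time per year

throughput_mb_s = RATE_HZ * RAW_EVENT_MB                    # -> 100 MB/s
raw_pb_per_year = throughput_mb_s * SECONDS_PER_YEAR / 1e9  # 1e9 MB = 1 PB

print(f"raw data rate:   {throughput_mb_s:.0f} MB/s")
print(f"raw data volume: ~{raw_pb_per_year:.0f} PB/year")
```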

5. CMS Data Production at LHC
40 MHz (1000 TB/s)
  → Level-1 Trigger → 75 kHz (50 GB/s)
  → High-Level Trigger → 100 Hz (100 MB/s)
  → Data Recording & Offline Analysis

6. CMS Distributed Computing Model
[Diagram: the Online System at the experiment feeds the CERN Center (Tier 0+1; PBs of disk, tape robot) at ~100-1500 MB/s from a ~PB/s detector output. Tier-1 centers (FNAL, IN2P3, INFN, RAL) connect to CERN at ~2.5-10 Gbps; Tier-2 centers connect to Tier-1s at 2.5-10 Gbps; Tier-3 institutes sit below the Tier-2s, with physics data caches at 0.1 to 10 Gbps serving Tier-4 workstations.]

7. CMS software for Data Simulation
• Event generation
  • Pythia and other generators
  • generally Fortran programs; produce N-tuple files (HEPEVT format)
• Detector simulation
  • CMSIM (uses GEANT-3): Fortran program; produces formatted Zebra (FZ) files from N-tuples
  • OSCAR (uses GEANT-4 and the CMS COBRA framework): C++ program; produces POOL files (hits) from N-tuples
• Digitization (DAQ simulation)
  • ORCA (uses the CMS COBRA framework): C++ program; produces POOL files (digis) from hits POOL files or FZ files
• Trigger simulation
  • ORCA: reads digis POOL files; normally run as part of the reconstruction phase

8. CMS software for Data Analysis
• Reconstruction
  • ORCA: produces POOL files (DST and AOD) from hits or digis POOL files
• Analysis
  • ORCA: reads POOL files in hits, digis, DST and AOD formats
  • IGUANA (uses ORCA and OSCAR as back-ends): visualization program (event display, statistical analysis)

9. CMS software: ORCA & Co.
[Dataflow diagram. Data simulation: Pythia and other generators write HEPEVT N-tuples; CMSIM (GEANT3) turns them into Zebra files with hits, which a hit formatter loads into the hits database (POOL), while OSCAR/COBRA (GEANT4) writes hits to POOL directly; ORCA/COBRA digitization merges signal and pile-up into the digis database (POOL). Data analysis: ORCA reconstruction or user analysis reads the POOL databases and produces N-tuples or ROOT files; IGUANA provides interactive analysis.]

10. CMS Computing Milestones

11. CMS computing milestones
• DAQ TDR (Technical Design Report)
  • Spring-2002 data production
  • software baselining
• Computing & Core Software TDR
  • 2003 data production (PCP04)
  • 2004 Data Challenge (DC04)
• Physics TDR
  • 2004/05 data production (DC05)
  • data analysis for the Physics TDR
• "Readiness Review"
  • 2005 data production (PCP06)
  • 2006 Data Challenge (DC06): commissioning

12. Size of CMS Data Challenges
[Chart: challenge size vs. year, growing with an average slope of x2.5/year through the DAQ TDR, DC04/Physics TDR, DC05/LCG TDR and DC06/Readiness milestones, towards LHC running at 2E33 and then 1E34 cm^-2 s^-1.]
• 1999: 1 TB, 1 month, 1 person
• 2000-2001: 27 TB, 12 months, 30 persons
• 2002: 20 TB, 2 months, 30 persons
• 2003: 175 TB, 6 months, <30 persons

13. World-wide Distributed Productions
[World map marking the CMS production regional centers and the CMS distributed-production regional centers.]

14. CMS Computing Challenges
CMS computing challenges include:
• production of simulated data for studies on:
  • detector design
  • trigger and DAQ design and validation
  • physics system setup
• definition and set-up of the analysis infrastructure
• definition of the computing infrastructure
• validation of the computing model:
  • a distributed system
  • increasing in size and complexity
  • tightly coupled to other CMS activities
• provision of computing support for all CMS activities

15. OCTOPUS: CMS Production System

16. OCTOPUS Data Production System
[Workflow diagram: a physics group asks for a new dataset; the Production Manager defines assignments in RefDB; a Site Manager starts an assignment. McRunjob + plug-ins (with CMSProd) then submit jobs: via shell scripts to a local batch manager and computer farm, via DAG to DAGMan (MOP) on DPE, via JDL to the Grid (LCG) scheduler, or via Chimera VDL derivations in the Virtual Data Catalogue with a planner. Dataset metadata sit in RefDB and RLS/POOL (data-level queries); job metadata sit in the BOSS DB (job-level queries). Arrows distinguish pushing data or info from pulling info.]

17. Remote connections to databases
• Metadata DBs are RLS/POOL, RefDB and the BOSS DB
[Diagram: on the Worker Node, a job wrapper (job instrumentation) runs the user job, passing job input through and collecting job output. A journal writer appends updates to a journal catalog; a remote updater makes a direct connection from the WN to the metadata DB, while an asynchronous updater on the User Interface replays the journal catalog later (this pattern is sketched below).]
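
To make the journal pattern concrete, here is a minimal Python sketch of the two update paths; all names (write_journal, flush_journal, the JSON journal format) are invented for illustration and are not the actual OCTOPUS code:

```python
import json
import time
import urllib.request

JOURNAL = "journal.log"   # local journal file, invented name

def write_journal(event: dict) -> None:
    """Journal writer: append one metadata update to a local file.
    Appending locally never blocks on an unreachable metadata DB,
    so the running job is unaffected by network problems."""
    event["time"] = time.time()
    with open(JOURNAL, "a") as f:
        f.write(json.dumps(event) + "\n")

def flush_journal(db_url: str) -> None:
    """Asynchronous updater: replay journaled updates to the metadata
    DB later, e.g. from the User Interface host. (A direct connection
    from the worker node is the other path shown on the slide.)"""
    with open(JOURNAL) as f:
        for line in f:
            req = urllib.request.Request(
                db_url, data=line.encode(),
                headers={"Content-Type": "application/json"})
            urllib.request.urlopen(req)   # real code would retry on failure

# The job wrapper reports status as the user job runs:
write_journal({"job": 42, "state": "started"})
write_journal({"job": 42, "events_done": 500})
```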

18. Job production: McRunjob and CMSProd
• Modular: plug-ins exist for:
  • reading from RefDB
  • reading from a simple GUI
  • submitting to a local resource manager
  • submitting to DAGMan/Condor-G (MOP)
  • submitting to the EDG/LCG scheduler
  • producing derivations in the Chimera Virtual Data Catalogue
  (a plug-in registry in this style is sketched below)
• Runs on the user's host (e.g. the site manager's)
• Also defines the sandboxes needed by the job
• If needed, the specific submission plug-in takes care of:
  • preparing the XML POOL catalogue with input-file information
  • moving the sandbox files to the worker nodes
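
The plug-in structure can be pictured as a registry mapping back-end names to submitter classes. A minimal Python sketch under that assumption (the class and function names are illustrative, not the real McRunjob API):

```python
from abc import ABC, abstractmethod

class SubmitterPlugin(ABC):
    """One plug-in per submission back-end (local batch, MOP, LCG, ...)."""
    @abstractmethod
    def submit(self, job_spec: dict) -> str:
        """Submit one job and return a back-end job identifier."""

_REGISTRY: dict = {}

def register(name: str):
    """Class decorator: make a plug-in selectable by name."""
    def deco(cls):
        _REGISTRY[name] = cls
        return cls
    return deco

@register("lcg")
class LCGSubmitter(SubmitterPlugin):
    def submit(self, job_spec: dict) -> str:
        # A real plug-in would write JDL and hand it to the EDG/LCG
        # scheduler; here we only show the shape of the interface.
        return f"lcg-{job_spec['id']}"

@register("local")
class LocalBatchSubmitter(SubmitterPlugin):
    def submit(self, job_spec: dict) -> str:
        return f"local-{job_spec['id']}"

# The framework selects the back-end by name at run time:
backend = _REGISTRY["lcg"]()
print(backend.submit({"id": 1, "executable": "cmsim.sh"}))
```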

19. Job Metadata management
Job parameters that represent the job's running status are stored in a dedicated database:
• when did the job start?
• is it finished?
but also:
• how many events did it produce so far?
BOSS is a CMS-developed system that does this, extracting the information from the job's standard input/output/error streams (sketched below).
• The remote updater is based on MySQL
• Remote updaters are now being developed based on:
  • R-GMA (still has scalability problems)
  • Clarens (just started)
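
A minimal Python sketch of this kind of job instrumentation, assuming the monitored program prints progress lines such as "processed N events"; the patterns and the update() stand-in are invented, not the real BOSS filters:

```python
import re
import subprocess

# Invented progress patterns; real BOSS uses user-supplied filters.
EVENTS_RE = re.compile(r"processed\s+(\d+)\s+events")
END_RE = re.compile(r"End of job")

def update(job_id: int, key: str, value) -> None:
    """Stand-in for writing to the BOSS MySQL database."""
    print(f"BOSS DB <- job {job_id}: {key} = {value}")

def run_instrumented(job_id: int, cmd: list) -> None:
    """Run the user job, scanning its stdout for status information."""
    update(job_id, "state", "started")
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
    for line in proc.stdout:
        m = EVENTS_RE.search(line)
        if m:
            update(job_id, "events_done", int(m.group(1)))
        if END_RE.search(line):
            update(job_id, "state", "finished")
    update(job_id, "exit_code", proc.wait())

run_instrumented(42, ["sh", "-c", "echo processed 500 events; echo End of job"])
```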

20. Dataset Metadata management
Dataset metadata are stored in the RefDB:
• of what (logical) files is the dataset made?
but also:
• what input parameters were given to the simulation program?
• how many events have been produced so far?
Information may be updated in the RefDB in several ways:
• manual Site Manager operation
• automatic e-mail from the job
• remote updaters based on R-GMA and Clarens (similar to those developed for BOSS) will be developed
The mapping of logical names to physical file names will be done on the grid by RLS/POOL (a lookup in this style is sketched below).
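
A minimal Python sketch of what RLS-style logical-to-physical resolution looks like from a job's point of view; the in-memory dict stands in for the real catalog service, and the file names are invented examples:

```python
CATALOG = {
    # one logical file name (LFN) -> physical replicas (PFNs) at sites
    "lfn:/cms/pcp/higgs/hits_0001.root": [
        "srb://castor.cern.ch/cms/pcp/higgs/hits_0001.root",
        "gsiftp://se.fnal.gov/cms/pcp/higgs/hits_0001.root",
    ],
}

def resolve(lfn: str, preferred_domain: str = "") -> str:
    """Return one physical replica for a logical name, preferring a
    replica at the local site when one exists."""
    replicas = CATALOG[lfn]
    for pfn in replicas:
        if preferred_domain and preferred_domain in pfn:
            return pfn
    return replicas[0]

# A job at FNAL resolves the same LFN to its local replica:
print(resolve("lfn:/cms/pcp/higgs/hits_0001.root", preferred_domain="fnal.gov"))
```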

21. 2002 Data Productions

22. 2002 production statistics
• Used Objectivity/DB for persistency
• 11 regional centers, more than 20 sites, about 30 site managers
• Spring 2002 data production
  • generation and detector simulation: 6 million events in 150 physics channels
  • digitization: >13 million events with different configurations (luminosity)
  • about 200 KSI2000 months
  • more than 20 TB of digitized data
• Fall 2002 data production
  • 10 million events, full chain (small output)
  • about 300 KSI2000 months
• Also productions on the grid!

23. Spring 2002 production history
[Plot of cumulative CMSIM production over time: about 1.5 million events per month.]

24. Fall 2002 CMS grid productions
• CMS/EDG Stress Test on the EDG testbed and CMS sites
  • top-down approach: more functionality, but less robust; large manpower needed
  • 260,000 events in 3 weeks
• USCMS IGT production in the US
  • bottom-up approach: less functionality, but more stable; little manpower needed
  • 1.2 million events in 2 months

25. 2003 Pre-Challenge Production

26. PCP04 production statistics
• Started in July; supposed to end by Christmas
• Generation and simulation:
  • 48 million events with CMSIM
    • 50-150 KSI2K s/event, 2000 KSI2K months (cross-checked below)
    • ~1 MB/event, 50 TB
    • hit-formatting in progress; the POOL format reduces the size by a factor of 2!
  • 6 million events with OSCAR
    • 100-200 KSI2K s/event, 350 KSI2K months (in progress)
• Digitization just starting
  • need to digitize ~70 million events; not all in time for DC04! Estimated:
    • ~30-40 KSI2K s/event, ~950 KSI2K months
    • ~1.5 MB/event, 100 TB
• Data movement to CERN
  • ~1 TB/day for 2 months
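
The KSI2K-month totals are simply events times per-event cost. A quick cross-check in Python, assuming a 30-day month and taking representative values from the quoted per-event cost ranges:

```python
SECONDS_PER_MONTH = 30 * 86400   # assuming a 30-day month

def ksi2k_months(n_events: float, ksi2k_s_per_event: float) -> float:
    return n_events * ksi2k_s_per_event / SECONDS_PER_MONTH

# Representative values from the quoted per-event cost ranges:
print(f"CMSIM: {ksi2k_months(48e6, 100):.0f} KSI2K months")  # ~1850, quoted ~2000
print(f"OSCAR: {ksi2k_months(6e6, 150):.0f} KSI2K months")   # ~350, as quoted
print(f"digis: {ksi2k_months(70e6, 35):.0f} KSI2K months")   # ~950, as quoted
```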

27. PCP 2003 production history
[Plot of cumulative CMSIM production over time, reaching 13 million events per month.]

28. PCP04 on grid

29. US DPE Production on Grid2003 (MOP System)
• Running on Grid2003: ~2000 CPUs
• Based on VDT 1.1.11
• EDG VOMS for authentication
• GLUE Schema for MDS Information Providers
• MonALISA for monitoring
• MOP for production control
  • DAGMan and Condor-G for specification and submission
  • a Condor-based match-making process selects resources
[Diagram: McRunjob and mop_submitter run at the master site; DAGMan/Condor-G dispatch jobs to batch queues at remote sites 1..N, with GridFTP moving data between the sites.]

30. Performance of US DPE
USMOP Regional Center:
• 7.7 Mevts Pythia: ~30,000 jobs of ~1.5 min each, ~0.7 KSI2000 months
• 2.3 Mevts CMSIM: ~9,000 jobs of ~10 hours each, ~90 KSI2000 months
• ~3.5 TB of data
[Plot of cumulative CMSIM production over time.]
Now running OSCAR productions

31. CMS/LCG-0 testbed
CMS/LCG-0 is a CMS-wide testbed based on the LCG pilot distribution (LCG-0), owned by CMS:
• joint CMS, DataTAG-WP4 and LCG-EIS effort
• started in June 2003
• components from VDT 1.1.6 and EDG 1.4.X (LCG pilot)
• components from DataTAG (GLUE schemas and info providers)
• Virtual Organization Management: VOMS
• RLS in place of the replica catalogue (uses rlscms by CERN/IT)
• monitoring: GridICE by DataTAG
• tests with R-GMA (as a BOSS transport layer for specific tests)
• no direct MSS access (bridge to SRB at CERN)
About 170 CPUs, 4 TB of disk: Bari, Bologna, Bristol, Brunel, CERN, CNAF, Ecole Polytechnique, Imperial College, ISLAMABAD-NCP, Legnaro, Milano, NCU-Taiwan, Padova, U.Iowa
Allowed CMS software integration to go ahead while LCG-1 was not yet out

32. CMS/LCG-0 Production system
[Diagram: OCTOPUS (RefDB, McRunjob + ImpalaLite, BOSS DB) is installed on the User Interface; jobs described in JDL are handed to the Grid (LCG) Scheduler, which uses the Grid Information System (MDS) to select Computing Elements. CMS software is installed on the Computing Elements as RPMs; Storage Elements hold the data; dataset metadata sit in RefDB and RLS, job metadata in the BOSS DB. Arrows distinguish pushing data or info from pulling info.]

33. CMS/LCG-0 performance
CMS-LCG Regional Center, based on CMS/LCG-0:
• 0.5 Mevts of "heavy" Pythia: ~2,000 jobs of ~8 hours each, ~10 KSI2000 months
• 1.5 Mevts of CMSIM: ~6,000 jobs of ~10 hours each, ~55 KSI2000 months
• ~2.5 TB of data
[Plot of cumulative Pythia + CMSIM production over time.]
Inefficiency estimation:
• 5% to 10% due to site misconfiguration and local failures
• 0% to 20% due to RLS unavailability
• a few errors in the execution of the job wrapper
• overall inefficiency: 5% to 30%
Now used as a play-ground for CMS grid-tools development

34. Data Challenge 2004 (DC04)

35. 2004 Data Challenge
• Test the CMS computing system at a scale corresponding to 5% of full LHC luminosity
  • corresponds to 25% of the LHC startup luminosity
  • for one month (February or March 2004)
  • 25 Hz data-taking rate at a luminosity of 0.2 x 10^34 cm^-2 s^-1
  • 50 million events (completely simulated up to digis during PCP03) used as input
• Main tasks
  • reconstruction at the Tier-0 (CERN) at 25 Hz (~40 MB/s; checked below)
  • distribution of the DST to Tier-1 centers (~5 sites)
  • re-calibration at selected Tier-1 centers
  • physics-group analysis at the Tier-1 centers
  • user analysis from the Tier-2 centers
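
The ~40 MB/s figure (and the 3.2 TB/day on the next slide) follows from the 25 Hz rate and the 1.5 MB/event input size quoted there. A one-line check in Python:

```python
RATE_HZ = 25       # DC04 data-taking rate
EVENT_MB = 1.5     # input event size quoted on the next slide

mb_per_s = RATE_HZ * EVENT_MB        # 37.5 MB/s, rounded up to ~40 MB/s
tb_per_day = mb_per_s * 86400 / 1e6  # 1e6 MB = 1 TB

print(f"{mb_per_s:.1f} MB/s, {tb_per_day:.1f} TB/day")  # 37.5 MB/s, 3.2 TB/day
```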

36. [Diagram of the DC04 data flow, combining the three DC04 challenges. T0 challenge: a fake DAQ at CERN replays the 50M PCP events (75 TB, read from the CERN tape archive at 1 TB/day for 2 months) through an HLT filter and disk cache into a ~40 TB CERN disk pool (~20 days of data); first-pass reconstruction runs at 25 Hz x 1.5 MB/evt (40 MB/s, 3.2 TB/day), archiving 25 Hz x 1 MB/evt raw and 25 Hz x 0.5 MB/evt reconstructed DST back to tape, and producing event streams (Higgs DST, SUSY background DST) plus TAG/AOD at 20 kB/evt. Calibration challenge: calibration samples and calibration jobs at T1/T2 run against replica conditions DBs synchronized with the master conditions DB at the T0. Analysis challenge: DST and TAG/AOD replicas flow to the T1 and T2 centers; e.g. a Higgs background study at a T2 requests new events from a T1 event server.]

37. Tier-0 challenge
• Data-serving pool to serve digitized events at 25 Hz to the computing farm, operating 20 hours out of 24
  • 40 MB/s
  • adequate buffer space (at least 1/4 of the digi sample in the disk buffer)
  • pre-staging software; file locking while in use, buffer cleaning and restocking as files are processed (sketched below)
• Computing farm: approximately 400 CPUs
  • jobs running 20 of 24 hours; 500 events/job, 3 hours/job
  • files in the buffer locked until successful job completion
  • no dead-time can be introduced to the DAQ; latencies must be no more than of order 6-8 hours
• CERN MSS: ~50 MB/s archiving rate
  • archive ~1.5 MB x 25 Hz of raw data (digis)
  • archive ~0.5 MB x 25 Hz of reconstructed events (DST)
• File catalog: POOL/RLS
  • secure and complete catalog of all data inputs/products
  • accessible and/or replicable by the other computing centers
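
A minimal Python sketch of the lock-while-processing buffer discipline described above; the directory layout and helper names are invented for illustration and are not the DC04 software:

```python
from pathlib import Path

BUFFER = Path("/data/digi_buffer")   # hypothetical disk-buffer directory

def run_reconstruction(f: Path) -> None:
    """Stand-in for the real job: ~500 events, ~3 hours."""

def lock(f: Path) -> None:
    """Mark a file as in use so buffer cleaning skips it."""
    f.with_suffix(f.suffix + ".lock").touch()

def unlock(f: Path) -> None:
    f.with_suffix(f.suffix + ".lock").unlink(missing_ok=True)

def process(f: Path, done: set) -> None:
    """Keep the input file locked until the job completes successfully."""
    lock(f)
    try:
        run_reconstruction(f)
        done.add(f.name)          # only a processed file may be evicted
    finally:
        unlock(f)

def clean_buffer(done: set) -> None:
    """Evict processed, unlocked files so the pre-stager can restock
    the buffer from tape without introducing DAQ dead-time."""
    for f in BUFFER.glob("*.root"):
        if f.name in done and not f.with_suffix(".root.lock").exists():
            f.unlink()
```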

38. Data distribution challenge
• Replication of the DST and part of the raw data to one or more Tier-1 centers
  • possibly using the LCG replication tools (the replicate-and-register pattern is sketched below)
  • some event duplication is foreseen
• At CERN, ~3 GB/s of traffic in the absence of inefficiencies (about 1/5 of that at a Tier-1)
• Tier-0 catalog accessible by all sites
• Replication of the calibration samples (DST/raw) to selected Tier-1s
• Transparent access of jobs at the Tier-1 sites to the local data, whether in MSS or on a disk buffer
• Replication of any physics-group (PG) data produced at the Tier-1 sites to the other Tier-1 sites and to interested Tier-2 sites
• Monitoring of data-transfer activities
  • e.g. with MonALISA
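
The basic operation the replication tools provide is replicate-and-register: copy a file to a remote storage element, then record the new replica in the catalog so jobs can find it. A minimal Python sketch with invented names (the transfer() stand-in is not a real API):

```python
CATALOG = {   # stand-in for the replica catalog (LFN -> list of PFNs)
    "lfn:/cms/dc04/dst/run0001.root":
        ["castor://cern.ch/cms/dc04/dst/run0001.root"],
}

def transfer(src_pfn: str, dst_pfn: str) -> None:
    """Stand-in for the actual wide-area copy (e.g. GridFTP)."""
    print(f"copy {src_pfn} -> {dst_pfn}")

def replicate(lfn: str, dst_site: str) -> None:
    """Copy one logical file to dst_site, then register the new
    replica so later jobs can resolve the LFN to the nearby copy."""
    src = CATALOG[lfn][0]
    path = lfn[len("lfn:"):]
    dst = f"gsiftp://{dst_site}{path}"
    transfer(src, dst)
    CATALOG[lfn].append(dst)

for t1 in ["se.fnal.gov", "se.cnaf.infn.it", "se.gridpp.rl.ac.uk"]:
    replicate("lfn:/cms/dc04/dst/run0001.root", t1)
```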

39. Calibration challenge
• Selected sites will run calibration procedures
• Rapid distribution of the calibration samples (within hours at most) to the Tier-1 site, with automatically scheduled jobs processing the data as it arrives
• Publication of the results in an appropriate form that can be returned to the Tier-0 for incorporation in the calibration "database"
• Ability to switch the calibration "database" at the Tier-0 on the fly, and to track from the metadata which calibration table has been used

40. Tier-1 analysis challenge
• All data distributed from the Tier-0 safely inserted into local storage
• Management and publication of a local catalog indicating the status of locally resident data
  • define tools and procedures to synchronize a variety of catalogs with the CERN RLS catalog (EDG-RLS, Globus-RLS, SRB-MCat, ...)
  • Tier-1 catalog accessible to at least the "associated" Tier-2 centers
• Operation of the physics-group (PG) productions on the imported data
  • a "production-like" activity
• Local computing facilities made available to Tier-2 users
  • possibly via the LCG job-submission system
• Export of the PG data to requesting sites (Tier-0, -1 or -2)
• Registration of locally produced data in the Tier-0 catalog, to make it available to at least selected sites
  • possibly via the LCG replication tools

41. Tier-2 analysis challenge
• The physicists' point of access to computing resources
• Pulling of data from peered Tier-1 sites, as defined by the local Tier-2 activities
• Analysis on the local PG data produces plots and/or summary tables
• Analysis on distributed PG data or DST available at least at the reference Tier-1 and "associated" Tier-2 centers
  • results are made available to selected remote users, possibly via the LCG data-replication tools
• Private analysis on distributed PG data or DST is outside the DC04 scope but will be kept as a low-priority milestone:
  • use of a Resource Broker and the Replica Location Service to gain access to appropriate resources without knowing where the input data are
  • distribution of user code to the executing machines
  • a user-friendly interface to prepare, submit and monitor jobs and to retrieve results

42. Summary of DC04 scale
Tier-0
• Reconstruction and DST production at CERN
  • 75 TB of input data
  • 180 KSI2K months = 400 CPUs at 24-hour operation (at 500 SI2K/CPU; see the check below)
  • 25 TB of output data
• 1-2 TB/day of data distribution from CERN to the sum of the T1 centers
Tier-1
• Assume all "CMS" Tier-1s (except CERN) participate
  • CNAF, FNAL, Lyon, Karlsruhe, RAL
• Share the T0 output DST between them (~5-10 TB each)
  • 200 GB/day transferred from CERN (per T1)
• Perform scheduled analysis-group "production"
  • ~100 KSI2K months in total = ~50 CPUs per T1 (24 hrs/day for 30 days)
Tier-2
• Assume about 5-8 T2s
  • may be more...
• Store some of the PG data at each T2 (500 GB-1 TB)
• Estimate 20 CPUs at each center for 1 month
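
The CPU counts follow from the KSI2K-month totals and the assumed 500 SI2K per CPU. A quick cross-check in Python:

```python
CPU_KSI2K = 0.5        # 500 SI2K per CPU, as quoted on the slide
DURATION_MONTHS = 1    # the challenge runs for about one month
N_TIER1 = 5            # CNAF, FNAL, Lyon, Karlsruhe, RAL

def cpus_needed(ksi2k_months: float) -> float:
    return ksi2k_months / (CPU_KSI2K * DURATION_MONTHS)

print(f"Tier-0:     {cpus_needed(180):.0f} CPUs")            # 360, quoted as ~400
print(f"per Tier-1: {cpus_needed(100) / N_TIER1:.0f} CPUs")  # 40, quoted as ~50
```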

43. Summary
• Computing is a CMS-wide activity
  • 18 regional centers, ~50 sites
• Committed to supporting other CMS activities
  • analysis support for DAQ, trigger and physics studies
• Increasing in size and complexity
  • 1 TB in 1 month at 1 site in 1999
  • 170 TB in 6 months at 50 sites today
  • ready for full LHC scale in 2007
• Exploiting new technologies
  • the grid paradigm has been adopted by CMS
  • close collaboration with LCG and with EU and US grid projects
  • grid tools are assuming more and more importance in CMS
