CMS test of EDG Testbed

Presentation Transcript


  1. CMS test of EDG Testbed
     C. Charlot / LLR-École Polytechnique
     • CMS MC production
     • Objectives
     • Results
     • Conclusions and outlook
     DataGrid France meeting, Lyon, Feb. 2003

  2. CMS jobs description
     • CMKIN: MC generation of the proton-proton interaction for a physics channel (dataset)
     • CMSIM: detailed simulation of the CMS detector, processing the data produced during the CMKIN step
     [Diagram: the CMKIN job writes its output data to a Grid Storage Element; the CMSIM job reads that data from the Grid Storage Element and writes its own output data to a Grid Storage Element.]
     * PIII 1GHz 512MB, 46.8 SI95
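
The two-step chain on slide 2 can be illustrated with a toy model. This is only a sketch of the data flow: the function names, the dataset name, the event count, and the dictionary standing in for a Grid Storage Element are all hypothetical, and nothing here calls real CMS or EDG software.

```python
# Toy model of the CMKIN -> CMSIM chain. The dict `storage` stands in for a
# Grid Storage Element; all names and numbers are illustrative assumptions.

def cmkin_job(dataset, n_events, storage):
    """Generation step: write one (fake) generated-event record to storage."""
    name = f"{dataset}.cmkin"
    storage[name] = {"dataset": dataset, "events": n_events, "step": "CMKIN"}
    return name


def cmsim_job(cmkin_name, storage):
    """Simulation step: read the CMKIN output from storage, write CMSIM output."""
    generated = storage[cmkin_name]  # read from the storage element
    name = f"{generated['dataset']}.cmsim"
    storage[name] = {
        "dataset": generated["dataset"],
        "events": generated["events"],
        "step": "CMSIM",
    }
    return name


storage_element = {}
kin = cmkin_job("example_dataset", 125, storage_element)
sim = cmsim_job(kin, storage_element)
print(sorted(storage_element))
```

The point of the sketch is the dependency: the CMSIM step cannot start until the CMKIN output is present in storage, which is why both steps rely on the storage element being reachable.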

  3. CMS production components interfaced to EDG middleware
     [Diagram, labels only: CMS; EDG; RefDB; BOSS DB; UI (IMPALA/BOSS); Workload Management System; Replica Manager; CEs with CMS software; WN; SEs; parameters; JDL; input data location; data registration; job output filtering; runtime monitoring. Legend: "Push data or info", "Pull info".]

  4. Accessing information
     [Diagram, labels only. Components: CMS UI workstation; BOSS DB; Workload Management System; Logging & Bookkeeping; Replica Manager; Replica Catalog; Information System (MDS). Commands shown: boss SQL, dg-job-status, edg-replicamanager-xxx.]

  5. Main Objectives
     • Verify the portability of the CMS production environment to a Grid environment
     • Assess the robustness of the EDG middleware in a production environment
     • Produce data for physics studies
       • As part of a production involving non-grid sites
       • Target was 1M events (250k initially)

  6. Technical objectives and choices
     • Job submission
       • Use 4 UIs and 4 RBs, 1 RB per UI
       • Proved useful (512-job limitation)
       • Easy to switch to another RB
       • Some UIs close to their RB, others far
       • Resubmission disabled
     • Data management
       • Some jobs writing data to a dedicated SE, others writing data to the close SE
       • Dedicated UI configurations (JDL creation in IMPALA)
       • Replication of CMSIM output to CERN (offline)
       • Also monitored by BOSS
       • Two sites with MSS
       • Two MSS systems
       • One site with a direct MSS interface, the other with an additional MSS-enabled SE

  7. Technical objectives and choices
     • Data management (cont'd)
       • Two sites with MSS
       • One UI sending jobs using the MSS interface
       • Dedicated rc.conf, dedicated stage command
       • Thus replication between
         • Disk => MSS
         • MSS => MSS (although not direct)
       • Only one RC
       • One logical collection per UI
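
The job-submission choices on slide 6 (one RB per UI, a 512-job limitation, and the option to switch to another RB) suggest a simple selection rule, sketched below. This is an illustration, not the logic used in the test: the UI and RB names are hypothetical, and treating the 512-job limitation as a cap per RB is an assumption.

```python
MAX_JOBS_PER_RB = 512  # assumption: the "512 jobs limitation" is a per-RB cap

# Hypothetical names: one primary Resource Broker (RB) per User Interface (UI).
PRIMARY_RB = {"ui1": "rb1", "ui2": "rb2", "ui3": "rb3", "ui4": "rb4"}


def choose_rb(ui, active_jobs):
    """Return the RB to submit to from `ui`, or None if every RB is full.

    `active_jobs` maps an RB name to the number of jobs it currently holds.
    """
    primary = PRIMARY_RB[ui]
    if active_jobs.get(primary, 0) < MAX_JOBS_PER_RB:
        return primary
    # Primary RB is full: switch to the least-loaded RB that still has room.
    candidates = [
        rb for rb in PRIMARY_RB.values()
        if active_jobs.get(rb, 0) < MAX_JOBS_PER_RB
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda rb: active_jobs.get(rb, 0))


load = {"rb1": 512, "rb2": 100, "rb3": 40, "rb4": 511}
print(choose_rb("ui1", load))  # rb1 is full, so the least-loaded RB is chosen
print(choose_rb("ui4", load))  # rb4 still has room, so it is kept
```

Picking the least-loaded alternative is one possible policy; since the slide notes that some UIs were close to their RB and others far, a real choice might also weigh network distance.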

  8. Sites and resources
     • Global services highly distributed
       • CERN: top MDS, two RBs
       • CNAF: RC, RB
       • CC-IN2P3: EDG software repository, CMS software repository
       • Marseille: user authorization
       • NIKHEF: CMS VO
       • IC: RB
     • Submission sites distributed as well
       • Bologna, LLR, Padova, IC
     • Core sites
       • CERN, CC-IN2P3, CNAF, NIKHEF, RAL
     • CMS added sites
       • Legnaro, Padova, Imperial College, LLR (wow!)
       • Added on the fly during the test

  9. Phenomenology of problems
     • Problems related to the information system
       • Highly unstable
       • An important source of aborted jobs
       • Low submission rate adopted, in particular for CMSIM
       • Change to dbII (1.4.0) during the test
         • Much better
         • Still problems when one GRIS hangs
     • Replica management and catalog limitations
       • RC slowing down and getting stuck under too many accesses (short jobs)
       • Limitation in the number of (lengthy) entries
         • Split into several collections
       • High failure rate of edg-rm commands
     • Network problems
       • Timeouts in InputSandbox transfer from UI to RB
     • Problems related to job submission
       • See next slide
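
Slide 9 mentions that the limit on the number of catalog entries was worked around by splitting into several collections. A minimal sketch of that idea follows. The file names and the cap of 4 entries are hypothetical (the slide does not give the actual limit), and the code does not talk to a real Replica Catalog.

```python
def split_into_collections(logical_file_names, max_entries):
    """Split a list of logical file names into collections of bounded size."""
    if max_entries < 1:
        raise ValueError("max_entries must be at least 1")
    return [
        logical_file_names[i:i + max_entries]
        for i in range(0, len(logical_file_names), max_entries)
    ]


# Ten made-up logical file names, split with an illustrative cap of 4 entries.
lfns = [f"example_dataset_run{n:04d}" for n in range(10)]
collections = split_into_collections(lfns, max_entries=4)
print([len(c) for c in collections])  # [4, 4, 2]
```

Slide 7 notes that the test used one logical collection per UI, which is a split by submitter rather than by size; the size-based split above is only the simplest way to show the principle.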

  10. EDG reasons for failure (categories)
      Preliminary analysis of the pre-Xmas period (1.4.0)

  11. CMS/EDG Summary of Stress Test: Preliminary Analysis
      [Chart labels: Short jobs; Long jobs; After Stress Test – Jan 03]

  12. CMS use of the system (Statistics)
      • Part of the CMS official production
      • Production testbed
      • (3-week period)
      [Plot: number of events vs. time; additional labels: SEs, CEs]

  13. Main results and observations from CMS work
      • RESULTS
        • Could distribute and run CMS s/w in EDG environment
        • Could increase resources by adding CMS sites on the fly
        • Generated ~250K events for physics with ~10,000 jobs in a 3-week period
      • OBSERVATIONS
        • Were able to quickly add new sites to provide extra resources
        • Fast turnaround in bug fixing and installing new software
        • Test was labour intensive (since software was still under development and the overall system was fragile)
        • WP1: at the start there were serious problems with long jobs; recently improved
        • WP2: replication tools were difficult to use and not reliable, and the performance of the Replica Catalogue was unsatisfactory
        • WP3: the Information System based on MDS performed poorly with increasing query rate
        • The system is sensitive to hardware faults and site/system misconfiguration
        • The user tools for fault diagnosis are limited
        • EDG 2.0 should fix the major problems (see talks by R Jones and E Laure), providing a system suitable for full integration in distributed production
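
The headline figures on slide 13 can be turned into rough average rates. The inputs are the approximate values quoted on the slide, so the results are order-of-magnitude only. Note also that the job count presumably covers both short (CMKIN) and long (CMSIM) jobs, so the events-per-job figure is an average over all jobs, not a per-job setting.

```python
events = 250_000  # "~250K events" (approximate, from the slide)
jobs = 10_000     # "~10,000 jobs" (approximate, from the slide)
days = 3 * 7      # "3 week period"

events_per_job = events / jobs
jobs_per_day = jobs / days
events_per_day = events / days

print(f"average events per job: {events_per_job:.0f}")
print(f"average jobs per day:   {jobs_per_day:.0f}")
print(f"average events per day: {events_per_day:.0f}")
```

This works out to roughly 25 events per job, about 476 jobs per day, and about 11,900 events per day averaged over the three weeks.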
