
Problem of Application Job Monitoring in GRID Systems



Presentation Transcript


  1. Problem of Application Job Monitoring in GRID Systems
     V. Kalyaev (kalyaev@theory.sinp.msu.ru), A. Kryukov (kryukov@theory.sinp.msu.ru)
     SINP MSU, Moscow
     A. Kryukov, NEC-2003, Varna, 15-20 September

  2. Outline
     • Introduction
     • Impala/McRunJob solution
     • GRID and Application Job Monitoring
     • Conclusion

  3. Introduction: Job Monitoring in GRID
     The GRID provides some monitoring facilities. However, these facilities record only the general status of a job:
     • Scheduled
     • Running
     • Canceled
     • Finished
     This is completely insufficient for complex applications.

  4. What is Application Job Monitoring?
     Consider a very simple example: CMSIM. The summary information of the program is the number of generated events. The user can use this number to diagnose the event generation process. It is therefore very important to supply the user with application-specific information in real time.
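To illustrate what "application-specific information" could look like in practice, here is a minimal Python sketch that extracts the number of generated events from a job's output. The progress-line format ("completed 20 from 200 events") is borrowed from the IO buffering slide of this talk; the function names are hypothetical, and real CMSIM output may look different.

```python
import re

# Assumed progress-line format; real application output may differ.
PROGRESS_RE = re.compile(r"completed\s+(\d+)\s+from\s+(\d+)\s+events")


def parse_progress(line):
    """Return (done, total) if the line is a progress message, else None."""
    match = PROGRESS_RE.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def latest_progress(lines):
    """Scan job output and return the most recent (done, total) pair, or None."""
    latest = None
    for line in lines:
        progress = parse_progress(line)
        if progress is not None:
            latest = progress
    return latest
```

A monitor could call `latest_progress` on the lines received so far and show the user how far event generation has advanced.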

  5. MC Event Simulation for LHC (on CMS example)
     • Simulation of physical events
       • Pythia
     • Detector simulation
       • GEANT-3/4
     • Digitization (overlap, noise)
       • ORCA
     • Reconstruction
       • ORCA

  6. Impala/McRunJob scheme
     [Diagram labels: MySQL server; JOB with MySQL client (twice)]
     • Insecure.
     • The user has to know where the information is.
     • The type of monitoring information is predefined.

  7. MC event generation with GRID
     [Diagram labels: GRID middleware; RB; PC farm (twice)]

  8. MC event generation with GRID
     [Diagram labels: MySQL server; GRID middleware; RB; PC farm (twice)]

  9. Application Job Monitoring Scheme
     [Diagram labels, nodes: UI, WN, RB, CE]
     [Diagram labels, components: atm-user-register, atm-job-wrapper, atm-job-register, Original job, atm-jdl-parser, edg-job-submit, monitor, atm-job-register-c, atm-register-s, atm-user-register-c, atm-user-register-s, atm-job-monitor-s]
     [Diagram labels, databases: ATM DB, Allowed user DB, Allowed job DB, Job status DB]

  10. Application Job Monitoring: Web Interface
      [Diagram labels: Web Client; Web Server; Authentication; Job status DB]

  11. JDL Example
      Executable = "atm-wrapper";
      StdOutput = "aliroot.out";
      StdError = "aliroot.err";
      InputSandbox = {"atm-wrapper","start_aliroot2.sh",".rootrc","grun2.C","Config.C"};
      OutputSandbox = {"aliroot.err","aliroot.out","galice.root"};
      RetryCount = 10;
      Arguments = "-id=123 -password=567 -site=test.domain /bin/sh start_aliroot.sh 3.02.04 3.07.01";
      Requirements = Member(other.RunTimeEnvironment,"ALICE-3.07.01");
      The old JDL file is converted to the new one automatically.
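The automatic conversion of an old JDL file into a wrapped one can be sketched as follows. This is only an illustration under stated assumptions, not the actual atm-jdl-parser: it handles only simple one-statement-per-line JDL, the function names `parse_jdl` and `wrap_jdl` are hypothetical, and the `-id`, `-password`, and `-site` argument names are copied from the example slide.

```python
import re

# Matches simple one-line JDL statements of the form: Name = value;
STATEMENT_RE = re.compile(r"^\s*(\w+)\s*=\s*(.*?);\s*$")


def parse_jdl(text):
    """Parse one-statement-per-line JDL into a dict of raw value strings."""
    attrs = {}
    for line in text.splitlines():
        match = STATEMENT_RE.match(line)
        if match:
            attrs[match.group(1)] = match.group(2)
    return attrs


def wrap_jdl(text, job_id, password, site, wrapper="atm-wrapper"):
    """Return JDL text in which the original job runs under the wrapper.

    Hypothetical sketch: the original executable and its arguments are
    appended to the wrapper's own arguments, and the wrapper is added to
    the input sandbox so that it is shipped with the job.
    """
    attrs = parse_jdl(text)
    original_exe = attrs["Executable"].strip('"')
    original_args = attrs.get("Arguments", '""').strip('"')

    attrs["Executable"] = f'"{wrapper}"'
    new_args = (
        f"-id={job_id} -password={password} -site={site} "
        f"{original_exe} {original_args}"
    ).strip()
    attrs["Arguments"] = f'"{new_args}"'

    # Prepend the wrapper to whatever files the sandbox already lists.
    inner = attrs.get("InputSandbox", "{}").strip("{}").strip()
    items = f'"{wrapper}",{inner}' if inner else f'"{wrapper}"'
    attrs["InputSandbox"] = "{" + items + "}"

    return "\n".join(f"{name} = {value};" for name, value in attrs.items())
```

For example, a job whose `Executable` is `"/bin/sh"` with `Arguments = "start_aliroot.sh 3.02.04 3.07.01"` comes out with `Executable = "atm-wrapper"` and the original command line moved to the end of the wrapper's arguments, which matches the shape of the slide's example.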

  12. Problems of IO Buffering
      • If a program writes something like "completed 20 from 200 events" to standard output, the output buffer may be flushed only after 20 hours of work. Possible remedies:
        • modify the code to flush the IO buffer explicitly;
        • forbid the use of IO buffering.
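Both remedies can be illustrated with a short Python sketch. The monitored applications in this talk are not Python programs, so this only demonstrates the idea; the function names are hypothetical, and `reconfigure` requires Python 3.7 or later.

```python
import sys


def report_progress(done, total, stream=None):
    """Remedy 1: write a progress line and flush it explicitly."""
    if stream is None:
        stream = sys.stdout
    stream.write(f"completed {done} from {total} events\n")
    # Push the line out now instead of waiting for the buffer to fill.
    stream.flush()


def enable_line_buffering(stream):
    """Remedy 2: make the stream flush after every newline.

    Only text streams that support reconfigure() are changed;
    other stream types are left untouched.
    """
    if hasattr(stream, "reconfigure"):
        stream.reconfigure(line_buffering=True)
```

For a Python job, running the interpreter with the `-u` option is another way to get unbuffered standard output without touching the code.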

  13. Conclusions
      • Security
        • GSI
        • A user can monitor only his own jobs.
      • Monitoring information
        • In the current implementation: standard output.
      • There is a Web interface for authorized access to application job status.
      • We plan to re-implement the scheme using OGSA/Globus3.
