Evolution of BOSS, a tool for job submission and tracking

W. Bacchi, G. Codispoti, C. Grandi (INFN Bologna); D. Colling, B. MacEvoy, S. Wakefield, Y. Zhang (Imperial College London). Introduction to BOSS. Previous features and usage. New functionality. Re-engineering of the design.

Presentation Transcript


  1. Evolution of BOSS, a tool for job submission and tracking. W. Bacchi, G. Codispoti, C. Grandi (INFN Bologna); D. Colling, B. MacEvoy, S. Wakefield, Y. Zhang (Imperial College London).

  2. Outline: Introduction to BOSS. Previous features and usage. New functionality. Re-engineering of the design. Current status and plans.

  3. Introduction. BOSS, the Batch Object Submission System (see the previous talk at CHEP03, monitoring track, THET001), is a tool for batch job submission, real-time monitoring and bookkeeping. It is interfaced to many schedulers, both local and Grid, and uses a relational database for persistency, in which full logging and bookkeeping information is stored. Job commands: submit, kill, query and output retrieval. Users can define custom job types, which specify monitoring unique to the submitted application.
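The bookkeeping role can be illustrated with a minimal Python sketch of a relational job table supporting submit, kill and query. Everything here is an assumption made for illustration: the `jobs` table, its columns, the status letters and the `Bookkeeping` class are hypothetical and are not the BOSS database schema or API. SQLite is used only because it ships with Python.

```python
import sqlite3
import time

# Hypothetical schema: one row per job. This is NOT the real BOSS DB layout.
SCHEMA = """
CREATE TABLE IF NOT EXISTS jobs (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    executable  TEXT NOT NULL,
    job_type    TEXT NOT NULL,
    scheduler   TEXT NOT NULL,
    status      TEXT NOT NULL,   -- hypothetical codes: 'W' waiting, 'K' killed
    submit_time REAL NOT NULL
)
"""


class Bookkeeping:
    """Minimal job bookkeeping backed by a relational database (sketch)."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(SCHEMA)

    def submit(self, executable, job_type, scheduler):
        """Record a new job and return its ID (the real scheduler call is omitted)."""
        with self.conn:  # commits the transaction on success
            cur = self.conn.execute(
                "INSERT INTO jobs (executable, job_type, scheduler, status, submit_time) "
                "VALUES (?, ?, ?, 'W', ?)",
                (executable, job_type, scheduler, time.time()),
            )
        return cur.lastrowid

    def kill(self, job_id):
        """Mark a job as killed; returns True if a job with that ID existed."""
        with self.conn:
            cur = self.conn.execute(
                "UPDATE jobs SET status = 'K' WHERE id = ?", (job_id,)
            )
        return cur.rowcount == 1

    def query(self, job_type=None):
        """Return (id, executable, status) tuples, optionally filtered by job type."""
        sql = "SELECT id, executable, status FROM jobs"
        params = ()
        if job_type is not None:
            sql += " WHERE job_type = ?"
            params = (job_type,)
        return self.conn.execute(sql + " ORDER BY id", params).fetchall()


if __name__ == "__main__":
    db = Bookkeeping()
    first = db.submit("test.pl", "test", "lsf")
    db.submit("test.pl", "test", "lsf")
    db.kill(first)
    print(db.query(job_type="test"))  # [(1, 'test.pl', 'K'), (2, 'test.pl', 'W')]
```

Keeping the state in a database rather than in memory is what lets separate `submit`, `query` and `kill` invocations share the same view of each job.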

  4. BOSS in CMS computing. BOSS has been used in CMS Monte Carlo (MC) production for 4 years. A prototype CMS distributed analysis system (GROSS) was based on BOSS, and the later, new analysis system also uses BOSS. Last year it was decided that the BOSS architecture needed to be redesigned to meet the changing requirements of CMS computing. [Diagram: production/analysis tool; BOSS; logging & bookkeeping; monitoring.]

  5. V3.x workflow I. The user specifies the job parameters, including: the executable name; the executable type, which turns on customized monitoring; and the output files to retrieve (for sites without a shared file system, and for the Grid). The user then tells BOSS to submit the jobs, specifying the scheduler, e.g. PBS, LSF, SGE, Condor, LCG, gLite, etc. The job consists of the job wrapper, the real-time monitoring service and the user's executable. [Diagram: BOSS and the BOSS DB; boss submit, boss query, boss kill; scheduler; wrapper on farm nodes.]
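The submission step can be sketched as a job description handed to a pluggable scheduler interface. This is a hypothetical Python illustration: the `JobSpec` fields mirror the parameters listed in the slide, but the class names, the `boss_wrapper` command line and the `LocalEchoScheduler` stand-in are assumptions. A real scheduler plug-in would call the batch system's own submit and kill commands instead of recording the request.

```python
import itertools
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class JobSpec:
    """Parameters the user supplies (hypothetical field names)."""
    executable: str
    job_type: str = "plain"                             # selects customised monitoring
    output_files: list = field(default_factory=list)   # retrieved when there is no shared FS


class Scheduler(ABC):
    """Common interface each scheduler plug-in would implement (sketch)."""

    @abstractmethod
    def submit(self, wrapper_cmd: list) -> str: ...

    @abstractmethod
    def kill(self, sched_id: str) -> None: ...


class LocalEchoScheduler(Scheduler):
    """Toy stand-in that only records submissions; it runs nothing."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.submitted = {}

    def submit(self, wrapper_cmd):
        sched_id = f"local.{next(self._ids)}"
        self.submitted[sched_id] = wrapper_cmd
        return sched_id

    def kill(self, sched_id):
        self.submitted.pop(sched_id, None)


def build_wrapper_command(spec: JobSpec) -> list:
    """What is sent to the scheduler is the wrapper, which later starts the
    monitoring service and the user's executable ('boss_wrapper' is hypothetical)."""
    cmd = ["boss_wrapper", "--type", spec.job_type, "--exec", spec.executable]
    for f in spec.output_files:
        cmd += ["--output", f]
    return cmd


# PBS, LSF, SGE, Condor, LCG, gLite plug-ins would be registered here by name.
SCHEDULERS = {"local": LocalEchoScheduler()}


def submit(spec: JobSpec, scheduler_name: str) -> str:
    return SCHEDULERS[scheduler_name].submit(build_wrapper_command(spec))


if __name__ == "__main__":
    spec = JobSpec("test.pl", job_type="test", output_files=["out.txt"])
    print(submit(spec, "local"))
```

The point of the design is that the user-facing commands stay the same whichever scheduler is chosen; only the plug-in behind the name changes.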

  6. V3.x workflow II.
     User job (example executable):
       #!/usr/bin/perl
       use strict;
       use warnings;
       $| = 1;    # unbuffered output, so each line reaches the monitor immediately
       my $i = 0;
       while ($i < 3) {
           sleep(1);
           $i++;
           print "counter $i\n";
       }
     Filter for the "test" job type:
       #!/usr/bin/perl
       use strict;
       use warnings;
       $| = 1;
       # Turn lines such as "counter 2" into "COUNTER=2" for the monitoring service
       while (my $line = <STDIN>) {
           if ($line =~ /counter\s+(\d+)/) {
               print "COUNTER=$1\n";
           }
       }
     [Diagram: user job output "counter 1..3" → BOSS jobExecutor → filter → "COUNTER=1..3" → journal and BOSS dbUpdator → BOSS DB (JOBID, COUNTER).]
     • Once running, the wrapper starts the real-time monitoring services and the user's executable.
     • It writes all logging information (start time, finish time, exit code, etc.) to a local journal file.
     • The monitoring services parse the job output, looking for the regular expressions specified by the job type.
     • Monitoring information is saved to the journal file and returned to the user via a database connection to the BOSS DB, or via R-GMA where possible.
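For readers more familiar with Python, the filter-and-journal flow of this slide can be sketched as follows. This is an illustrative re-implementation, not BOSS code: the journal line format (`job_id job_type KEY=value`), the function names and the dictionary standing in for the BOSS DB are assumptions; only the regular expression is taken from the Perl filter.

```python
import re
from typing import Dict, Iterable, List

# Regular expression from the "test" job type's Perl filter.
COUNTER_RE = re.compile(r"counter\s+(\d+)")


def run_filter(lines: Iterable[str]) -> List[str]:
    """Python equivalent of the Perl filter: 'counter N' -> 'COUNTER=N'."""
    out = []
    for line in lines:
        m = COUNTER_RE.search(line)
        if m:
            out.append(f"COUNTER={m.group(1)}")
    return out


def journal_entries(job_id: int, job_type: str, filtered: Iterable[str]) -> List[str]:
    """Prefix each KEY=value pair with the job identity (hypothetical journal format)."""
    return [f"{job_id} {job_type} {kv}" for kv in filtered]


def db_update(table: Dict[int, Dict[str, str]], entry: str) -> None:
    """Apply one journal entry to an in-memory stand-in for the BOSS DB table.
    Later values overwrite earlier ones, so the table holds the latest state."""
    job_id, _job_type, kv = entry.split(" ", 2)
    key, value = kv.split("=", 1)
    table.setdefault(int(job_id), {})[key] = value


if __name__ == "__main__":
    user_output = ["counter 1", "counter 2", "counter 3"]
    table: Dict[int, Dict[str, str]] = {}
    for entry in journal_entries(1234, "test", run_filter(user_output)):
        db_update(table, entry)
    print(table)  # {1234: {'COUNTER': '3'}}
```

Writing every entry to the journal before (or as well as) the database is what makes the later fallback possible: if the database connection fails, the same entries can be replayed.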

  7. V3.x workflow III. Using BOSS, the user can get the status of jobs, pulling in information from the BOSS DB, the scheduler and the real-time monitoring DB. When a job finishes, its output is automatically stored at the final destination where possible (e.g. a shared file system on a local cluster); where it is not (e.g. LCG), the output must be fetched with a separate BOSS command. If real-time monitoring is not available (e.g. because of a firewall), the BOSS DB can be updated from the journal file. Example query:
       % boss q -all -specific -type test
       ID  S_USR   EXECUTABLE   ST  EXE_HOST    START TIME      STOP TIME       counter
       1   grandi  test.pl 15   E   pccms10.bo  14:30:00 06/06  14:30:16 06/06  3
       2   grandi  test.pl 15   R   pccms10.bo  14:30:02 06/06  --------------  2
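The status query can be thought of as a merge of three sources, with the journal as a fallback. The sketch below is a hypothetical Python illustration: the column names `ST` and `counter` come from the query output, but the precedence rules (scheduler state overrides the stored state, real-time values override stored values) and the `KEY=value` journal format are assumptions, not documented BOSS behaviour.

```python
from typing import Dict, Iterable, Optional


def job_status(db_row: Dict[str, str],
               scheduler_state: Optional[str],
               rt_values: Optional[Dict[str, str]]) -> Dict[str, str]:
    """Combine the sources a status query draws on (hypothetical precedence).

    db_row          -- bookkeeping record from the BOSS DB (always available)
    scheduler_state -- current state reported by the batch/Grid scheduler, if any
    rt_values       -- latest real-time monitoring values, or None when the job
                       could not reach the monitoring service (e.g. a firewall)
    """
    status = dict(db_row)                  # never modify the caller's record
    if scheduler_state is not None:
        status["ST"] = scheduler_state     # the scheduler knows the live state
    if rt_values:
        status.update(rt_values)           # fresher than the DB while the job runs
    return status


def replay_journal(db_row: Dict[str, str], journal_lines: Iterable[str]) -> Dict[str, str]:
    """Fallback when real-time monitoring was unavailable: apply the KEY=value
    lines of the retrieved journal file to the DB record (format assumed)."""
    updated = dict(db_row)
    for line in journal_lines:
        if "=" in line:
            key, value = line.strip().split("=", 1)
            updated[key] = value
    return updated


if __name__ == "__main__":
    row = {"ID": "2", "EXECUTABLE": "test.pl", "ST": "W"}
    print(job_status(row, "R", {"counter": "2"}))
    # {'ID': '2', 'EXECUTABLE': 'test.pl', 'ST': 'R', 'counter': '2'}
```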

  8. Proposed changes. Following the experience from the CMS MC production and distributed analysis systems, it was decided to re-engineer BOSS:
     • Provide a C++ API and a Python API (the latter via SWIG) so that higher-level tools can steer BOSS.
     • Introduce tasks, chains and programs. A program is the user's executable; a chain is an arbitrarily complex set of different programs run on the same worker node; a task is a group of homogeneous jobs that may be executed in parallel.
     • Move to XML task descriptions to describe the new task hierarchy.
     • Separate bookkeeping from real-time monitoring.
     • Improve real-time monitoring, but leave it optional, and allow multiple real-time monitoring mechanisms.
     • Allow pluggable chaining tools, e.g. ShReek (CHEP06 id 276).
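The task, chain and program hierarchy maps naturally onto nested data classes. This is a minimal sketch under stated assumptions: the class and field names and the example executables `gen.sh` and `reco.sh` are hypothetical and do not reproduce the BOSS C++/Python API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Program:
    """The user's executable plus its arguments."""
    executable: str
    args: List[str] = field(default_factory=list)
    program_type: str = "plain"    # selects program-specific monitoring


@dataclass
class Chain:
    """Programs run one after another on the same worker node (one job)."""
    programs: List[Program]
    scheduler: str = "local"


@dataclass
class Task:
    """A group of homogeneous chains that may be executed in parallel."""
    name: str
    chains: List[Chain] = field(default_factory=list)

    def total_programs(self) -> int:
        """Number of executables the whole task will run."""
        return sum(len(c.programs) for c in self.chains)


if __name__ == "__main__":
    # Three identical chains, each running two different (hypothetical) programs.
    task = Task("demo", [Chain([Program("gen.sh"), Program("reco.sh", ["--fast"])])
                         for _ in range(3)])
    print(len(task.chains), task.total_programs())  # 3 6
```

Because each chain is the unit sent to a worker node, a task's parallelism is simply the number of chains it contains.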

  9. Logging and monitoring.
     • Separate the user's logging DB from the (optional) monitoring DB.
     • Allow access to the logging DB only via BOSS tools, i.e. remove all server requirements (this allows a personal DB implemented in SQLite on local disk).
     • BOSS tools fill the logging DB from the information in the monitoring DB and from the journal file retrieved at the end of the job.
     • The real-time server is updated by an updater on the worker node; the transport mechanism may use a proxy server.
     • Possible implementations of the real-time update mechanism: R-GMA, MonaLisa, etc. Different RT mechanisms can be used for each job.
     • Information in the monitoring DB expires.
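The split between a short-lived monitoring store and a persistent logging DB can be sketched as below. This is a hypothetical Python illustration: the time-to-live (TTL) expiry, the `MonitoringStore` and `fill_logging` names, and the rule that journal values win over monitoring values are assumptions chosen to illustrate the slide, not the BOSS implementation.

```python
import time
from typing import Callable, Dict, Tuple


class MonitoringStore:
    """Short-lived real-time values keyed by (job_id, variable).

    Entries older than `ttl` seconds are treated as expired, mirroring the
    idea that information in the monitoring DB does not persist."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._data: Dict[Tuple[int, str], Tuple[str, float]] = {}

    def update(self, job_id: int, var: str, value: str) -> None:
        """Called (indirectly) by the updater running on the worker node."""
        self._data[(job_id, var)] = (value, self.clock())

    def snapshot(self, job_id: int) -> Dict[str, str]:
        """All unexpired values for one job."""
        now = self.clock()
        return {var: value
                for (jid, var), (value, stamp) in self._data.items()
                if jid == job_id and now - stamp <= self.ttl}


def fill_logging(logging_db: Dict[int, Dict[str, str]], job_id: int,
                 monitor: MonitoringStore, journal: Dict[str, str]) -> None:
    """Run by BOSS tools on the user's side: merge unexpired monitoring values
    with the journal retrieved at the end of the job (journal values win)."""
    record = logging_db.setdefault(job_id, {})
    record.update(monitor.snapshot(job_id))
    record.update(journal)
```

Injecting the clock makes the expiry testable; with the default `time.monotonic`, values simply disappear from snapshots `ttl` seconds after their last update.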

  10. New data flow

  11. New job wrapper. The job wrapper will start the chainer and the monitoring modules. The job chainer will launch each executable separately, within its own environment. The job wrapper will provide two levels of monitoring: job level and executable level. Job-level monitoring covers overall variables such as total time, total memory usage, etc.; executable-level monitoring will follow each executable's progress and record it in the journal. Future plans include allowing action to be taken when certain conditions are met, e.g. running out of memory or detecting infinite loops.
       [Diagram: Task, Chain and Program; JobExecuter (wrapper) with Job Chaining, TaskExecutor and ProgramExecuter; JobMonitor, Journal and real-time updater; each user exec with stdin, pre-filter, runtime-filter, post-filter, stdout and stderr.]
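A linear chainer with the two monitoring levels might look like the following Python sketch. It is an assumption-laden illustration, not the BOSS wrapper: `ProgramSpec`, `ProgramResult` and `run_chain_linear` are hypothetical names, the pre-, runtime- and post-filters are omitted, and job-level monitoring is reduced to total wall time and the number of programs run (memory accounting is left out).

```python
import os
import subprocess
import sys
import time
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ProgramSpec:
    argv: List[str]
    env: Dict[str, str] = field(default_factory=dict)  # extra variables for this program only


@dataclass
class ProgramResult:
    argv: List[str]
    exit_code: int
    stdout: str
    wall_time: float


def run_chain_linear(programs: List[ProgramSpec], stop_on_error: bool = True):
    """Toy linear chainer: run each program in turn, each in its own copy of
    the environment, recording executable-level results and a job-level summary."""
    results = []
    job_start = time.monotonic()
    for prog in programs:
        env = dict(os.environ)
        env.update(prog.env)              # changes do not leak into later programs
        start = time.monotonic()
        proc = subprocess.run(prog.argv, env=env, capture_output=True, text=True)
        results.append(ProgramResult(prog.argv, proc.returncode, proc.stdout,
                                     time.monotonic() - start))
        if stop_on_error and proc.returncode != 0:
            break                         # linear chain: a failure ends the job
    job_summary = {"total_time": time.monotonic() - job_start,
                   "programs_run": len(results)}
    return results, job_summary


if __name__ == "__main__":
    py = sys.executable
    chain = [
        ProgramSpec([py, "-c", "import os; print(os.environ['STEP'])"], {"STEP": "gen"}),
        ProgramSpec([py, "-c", "print('second program')"]),
    ]
    results, summary = run_chain_linear(chain)
    for r in results:
        print(r.exit_code, r.stdout.strip())
    print(summary["programs_run"])
```

Copying the parent environment per program is the simplest way to give each executable its own environment; a chainer plug-in could replace this loop with a non-linear scheduling strategy.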

  12. Sample task specification.
      <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
      <task>
        <iterator name="ITR" start="0" end="100" step="1">
          <chain scheduler="glite" rtupdater="mysql" ch_tool_name="jobExecutor">
            <program exec="test.pl" args="ITR" stderr="err_ITR" program_type="test"
                     stdin="in" stdout="out_ITR"
                     infiles="Examples/test.pl,Examples/in"
                     outfiles="out_ITR,err_ITR" outtopdir="" />
          </chain>
        </iterator>
      </task>
      • Example of a task containing 100 chains, each consisting of one program.
      • Program-specific monitoring is activated; the results are returned via a MySQL connection.
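To show how the iterator turns one `<chain>` template into many concrete chains, here is a minimal Python sketch using the standard-library `xml.etree.ElementTree`. It assumes that `end` is exclusive (which matches the "100 chains" on the slide) and that the iterator name is substituted by plain substring replacement in every program attribute; the real BOSS semantics may differ, and `expand_task` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

SAMPLE = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<task>
  <iterator name="ITR" start="0" end="100" step="1">
    <chain scheduler="glite" rtupdater="mysql" ch_tool_name="jobExecutor">
      <program exec="test.pl" args="ITR" stderr="err_ITR" program_type="test"
               stdin="in" stdout="out_ITR"
               infiles="Examples/test.pl,Examples/in"
               outfiles="out_ITR,err_ITR" outtopdir="" />
    </chain>
  </iterator>
</task>
"""


def expand_task(xml_bytes: bytes) -> List[Dict]:
    """Expand every <iterator> into concrete chains (end assumed exclusive)."""
    root = ET.fromstring(xml_bytes)
    chains = []
    for it in root.iter("iterator"):
        name = it.get("name")
        start, end, step = (int(it.get(k)) for k in ("start", "end", "step"))
        for i in range(start, end, step):
            for chain in it.findall("chain"):
                # Naive substitution of the iterator name in each attribute value.
                programs = [
                    {k: v.replace(name, str(i)) for k, v in prog.attrib.items()}
                    for prog in chain.findall("program")
                ]
                chains.append({"attrs": dict(chain.attrib), "programs": programs})
    return chains


if __name__ == "__main__":
    chains = expand_task(SAMPLE.encode("utf-8"))
    print(len(chains))                          # 100
    print(chains[7]["programs"][0]["stdout"])   # out_7
```

The string is encoded to bytes before parsing so that the `encoding="UTF-8"` declaration is honoured unambiguously.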

  13. Status and plans. Significant new functionality has been identified and is being actively integrated into BOSS. The latest release, v3.6, includes much of it:
     • Tasks, jobs and executables.
     • XML task description.
     • C++ and Python APIs.
     • Basic executable chaining (currently only the default chainer, with linear chaining).
     • Separate logging and monitoring DBs, implemented in either MySQL or SQLite (more to come).
     • Optional RT monitoring with multiple implementations, currently only MonaLisa and direct MySQL connections (the latter to be deprecated).
     Still to be done:
     • Allow chainer plug-ins.
     • Implement more RT monitoring solutions, e.g. R-GMA.
     • Finalize the API.
     • Look at writing the wrapper in a scripting language, e.g. Perl or Python.
