First ideas for a Resource Management Architecture for Productions


  1. First ideas for a Resource Management Architecture for Productions
     Massimo Sgaravatto, INFN Padova

  2. First step
     [Diagram. Labels: Submit jobs (using globusrun); GRAM (×3); CONDOR, LSF, PBS; Site1, Site2, Site3]

  3. Overview
     • GRAM as a uniform interface to different resource management systems
     • Job submission from a single location
     • Users must explicitly specify on which Globus resources (Condor pool, LSF cluster, …) the jobs must be executed
     • Usage of Globus tools (globusrun, globus-job-status, …) to “manage” the jobs
     • Are these “robust” tools with all the required capabilities?
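The "explicitly specify" point can be made concrete with a minimal sketch. It assumes that a Globus resource is addressed by a contact string of the form `host/jobmanager-<system>`, a pattern inferred from the two usage examples in this presentation; the `jobmanager-pbs` name is an extrapolation, and `gram_contact` and `split_contact` are hypothetical helpers, not Globus tools.

```python
from typing import Tuple

# Assumption: the contact-string pattern is inferred from the two examples in
# this presentation ("jobmanager-lsf", "jobmanager-condor"); "pbs" is added
# by analogy and has not been checked against a Globus installation.
SUPPORTED_LRMS = ("condor", "lsf", "pbs")
PREFIX = "jobmanager-"


def gram_contact(host: str, lrms: str) -> str:
    """Return a contact string such as 'lxpd.pd.infn.it/jobmanager-lsf'."""
    lrms = lrms.lower()
    if lrms not in SUPPORTED_LRMS:
        raise ValueError(f"unknown resource management system: {lrms!r}")
    return f"{host}/{PREFIX}{lrms}"


def split_contact(contact: str) -> Tuple[str, str]:
    """Split a contact string back into (host, resource management system)."""
    host, _, service = contact.partition("/")
    if not host or not service.startswith(PREFIX):
        raise ValueError(f"not a jobmanager contact: {contact!r}")
    return host, service[len(PREFIX):]


if __name__ == "__main__":
    # The user, not the system, picks the target resource at this stage.
    for host, lrms in [("lxpd.pd.infn.it", "lsf"), ("lxbo.bo.infn.it", "condor")]:
        print(gram_contact(host, lrms))
```

The point of the sketch is only that the choice of site and local system is encoded by hand in every submission.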

  4. Usage examples
     % globusrun -b -r lxpd.pd.infn.it/jobmanager-lsf -f file.rsl
     file.rsl:
       & (executable=$(CMS)/startcmsim.sh)
         (stdin=$(CMS)/Pythia/run.1)
         (stdout=$(CMS)/Cmsim/log.1)
         (count=1)
         (queue=cmsprod)

     % globusrun -b -r lxbo.bo.infn.it/jobmanager-condor -f file.rsl
     file.rsl:
       & (executable=$(CMS)/startcmsim.sh)
         (stdin=$(CMS)/Pythia/run.1)
         (stdout=$(CMS)/Cmsim/log.1)
         (count=1)
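Since the two RSL files differ only in the `queue` attribute, a production would probably generate them rather than write them by hand. The following is a minimal sketch of that idea: `to_rsl` and `globusrun_argv` are hypothetical helpers, the attribute names and command-line options are copied from the examples on this slide, and nothing is actually submitted.

```python
from typing import Dict, List


def to_rsl(attributes: Dict[str, object]) -> str:
    """Render attributes as one conjunctive RSL string: &(name=value)(name=value)..."""
    return "&" + "".join(f"({name}={value})" for name, value in attributes.items())


def globusrun_argv(contact: str, rsl_file: str) -> List[str]:
    """Build the command line shown on the slide; the caller decides whether to run it."""
    return ["globusrun", "-b", "-r", contact, "-f", rsl_file]


# Attributes shared by both examples on the slide.
job = {
    "executable": "$(CMS)/startcmsim.sh",
    "stdin": "$(CMS)/Pythia/run.1",
    "stdout": "$(CMS)/Cmsim/log.1",
    "count": 1,
}
# Only the LSF example names a queue.
lsf_job = {**job, "queue": "cmsprod"}

if __name__ == "__main__":
    print(to_rsl(lsf_job))
    print(" ".join(globusrun_argv("lxpd.pd.infn.it/jobmanager-lsf", "file.rsl")))
    print(to_rsl(job))
    print(" ".join(globusrun_argv("lxbo.bo.infn.it/jobmanager-condor", "file.rsl")))
```

The sketch does no quoting or escaping of values, so it is only suitable for simple attribute values like the ones above.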

  5. What has been tested so far
     http://www.pd.infn.it/~sgaravat/INFN-GRID/Globus/gram-report.pdf
     • Tests only with simple programs (just to evaluate capabilities and functionality)
     • No tests with “real” applications
     • No “stress tests” (to evaluate reliability, robustness, …)
     • GRAM – LSF: tested
        • Seems to work

  6. What has been tested so far
     • GRAM – Condor: tested
        • GRAM assumes that the underlying environment is a “uniform” Condor pool (in particular for Vanilla jobs)
        • It is difficult to treat the INFN WAN Condor pool as a Globus resource
        • Use local “uniform” Condor pools instead?
     • GRAM – PBS: not tested

  7. Second step
     [Diagram. Labels: Submit jobs (using condor_submit and Globus Universe); Personal Condor; globusrun; GRAM (×3); CONDOR, LSF, PBS; Site1, Site2, Site3]

  8. Overview
     • Personal Condor is able to provide robustness and reliability
     • Job submission from a single location
     • Users must still explicitly specify on which Globus resources the jobs must be executed
     • Usage of the Condor interface and tools (condor_submit, condor_q, …) to “manage” the jobs
     • “Robust” tools with all the required capabilities (monitoring, logging, …)

  9. Usage examples
     % condor_submit file.cnd
     file.cnd:
       Universe        = globus
       executable      = $(CMS)/startcmsim.sh
       input           = $(CMS)/Pythia/run.1
       output          = $(CMS)/Cmsim/log.1
       GlobusScheduler = lxpd.pd.infn.it/jobmanager-lsf
       queue 1

     % condor_submit file.cnd
     file.cnd:
       Universe        = globus
       executable      = $(CMS)/startcmsim.sh
       input           = $(CMS)/Pythia/run.1
       output          = $(CMS)/Cmsim/log.1
       GlobusScheduler = lxbo.bo.infn.it/jobmanager-condor
       queue 1
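The two submit files differ only in the `GlobusScheduler` line, so the target resource is effectively a single parameter. A minimal sketch of generating such a file is given below; `globus_universe_submit` is a hypothetical helper, the keywords are exactly those used on this slide, and the sketch only produces text (it does not call condor_submit).

```python
def globus_universe_submit(executable: str, stdin: str, stdout: str,
                           scheduler: str, copies: int = 1) -> str:
    """Return the text of a submit description like the ones on the slide."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    lines = [
        "Universe = globus",
        f"executable = {executable}",
        f"input = {stdin}",
        f"output = {stdout}",
        # The only line that changes between the two examples.
        f"GlobusScheduler = {scheduler}",
        f"queue {copies}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    for scheduler in ("lxpd.pd.infn.it/jobmanager-lsf",
                      "lxbo.bo.infn.it/jobmanager-condor"):
        print(globus_universe_submit(
            executable="$(CMS)/startcmsim.sh",
            stdin="$(CMS)/Pythia/run.1",
            stdout="$(CMS)/Cmsim/log.1",
            scheduler=scheduler,
        ))
```

Keeping the scheduler as an argument is what would later let an automatic component fill it in instead of the user.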

  10. Second step (option 2)
      [Diagram. Labels: Submit jobs (using condor_submit and Globus Universe); Personal Condor; Condor Flocking; globusrun; condor_submit; GRAM (×2); CONDOR, LSF, PBS; Site1, Site2, Site3]

  11. Second step (option 3)
      [Diagram. Labels: Submit jobs (using condor_submit and Globus Universe); Personal Condor; globusrun; condor_submit; GRAM (×2); CONDOR, LSF, PBS; Site1, Site2, Site3; Single Condor Pool]

  12. Problems
      • The Globus Universe architecture is only a prototype
      • Only best-effort support from the Condor team
      • Tests not completed
         • Tests are ongoing (with the fork system call as the underlying resource management system)
         • Tests with the Globus Universe and LSF or Condor as the underlying resource management system have not yet been performed
      • PBS
         • Is it supported by the Globus Universe mechanisms?
         • Do we need it?

  13. Third step
      [Diagram. Labels: Resource Discovery; Master; GIS; Submit jobs; condor_submit (Globus Universe); Information on characteristics and status of local resources; Personal Condor; globusrun; GRAM (×3); CONDOR, LSF, PBS; Site1, Site2, Site3]

  14. Overview
      • The Master is smart enough to decide to which Globus resources the jobs must be submitted
      • The Master uses the information on the characteristics and status of resources published in the GIS
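To illustrate the kind of decision the Master would take, here is a minimal sketch under strong assumptions. The record fields (`free_cpus`, `queued_jobs`), the selection rule, and the sample numbers are all invented for illustration; the presentation does not define what the GIS publishes or how the Master ranks resources.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class ResourceInfo:
    """Hypothetical view of what the GIS might publish for one Globus resource."""
    contact: str      # contact string, e.g. "lxpd.pd.infn.it/jobmanager-lsf"
    free_cpus: int    # assumed attribute, not taken from any real schema
    queued_jobs: int  # assumed attribute, not taken from any real schema


def choose_resource(resources: Iterable[ResourceInfo],
                    cpus_needed: int = 1) -> Optional[ResourceInfo]:
    """Pick the resource with the shortest queue among those with enough free CPUs.

    Ties on queue length are broken in favour of more free CPUs.
    Returns None when no resource qualifies.
    """
    candidates = [r for r in resources if r.free_cpus >= cpus_needed]
    if not candidates:
        return None
    return min(candidates, key=lambda r: (r.queued_jobs, -r.free_cpus))


# Made-up status values, for illustration only.
snapshot = [
    ResourceInfo("lxpd.pd.infn.it/jobmanager-lsf", free_cpus=4, queued_jobs=10),
    ResourceInfo("lxbo.bo.infn.it/jobmanager-condor", free_cpus=2, queued_jobs=3),
]

if __name__ == "__main__":
    best = choose_resource(snapshot)
    print(best.contact if best else "no suitable resource")
```

The selected contact string is the value a user currently has to supply by hand, for example as the `GlobusScheduler` entry of a submit file.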

  15. Problems and work needed
      • The Master doesn’t exist: we have to implement it
      • It is necessary to define the GIS architecture
      • The local GRAMs do not provide the GIS with enough information: the default schema must be extended

  16. GRAM & Condor & GIS

  17. GRAM & LSF & GIS

  18. Fourth step
      [Diagram. Labels: Data Catalog; Data Mover; Data Discovery; Resource Discovery; Master; GIS; Submit jobs; condor_submit (Globus Universe); Personal Condor; globusrun; GRAM (×3); CONDOR, LSF, PBS; Site1, Site2, Site3; Information on characteristics and status of local resources]
