
GRID Workload Management System



Presentation Transcript


  1. GRID Workload Management System
     Massimo Sgaravatto, INFN Padova

  2. What do we want to implement (simplified design)
     [Architecture diagram; labels listed from the user down to the sites]
     • Submit jobs (using Class-Ads) to the Master
     • Master: Resource Discovery through the Grid Information Service (GIS); the Master chooses in which Globus resources the jobs must be submitted
     • Grid Information Service (GIS): information on characteristics and status of local resources
     • condor_submit (Globus Universe) to Condor-G
     • Condor-G: able to provide reliability; use of Condor tools for job monitoring, logging, …
     • globusrun to Globus GRAM
     • Globus GRAM (one per site): uniform interface to different local resource management systems
     • Local Resource Management Systems: CONDOR, LSF, …
     • Farms at Site1, Site2, Site3
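The Master's choice of a Globus resource can be pictured as a small matchmaking step. The following is only a sketch under stated assumptions: the resource records, their field names (`free_cpus`, `lrms`, `os`) and the `requirements`/`rank` callables are hypothetical stand-ins for what the GIS and ClassAds would provide, not the actual Master design.

```python
# Hypothetical resource descriptions, standing in for what the GIS might publish.
RESOURCES = [
    {"name": "site1-farm", "lrms": "lsf", "free_cpus": 4, "os": "linux"},
    {"name": "site2-farm", "lrms": "condor", "free_cpus": 12, "os": "linux"},
    {"name": "site3-farm", "lrms": "lsf", "free_cpus": 0, "os": "linux"},
]


def choose_resource(job, resources):
    """Return the name of the best matching resource, or None if none matches.

    job["requirements"] is a predicate over a resource record;
    job["rank"] scores a matching resource (higher is better).
    """
    candidates = [r for r in resources if job["requirements"](r)]
    if not candidates:
        return None
    return max(candidates, key=job["rank"])["name"]


# A job that needs a Linux farm with at least one free CPU and
# prefers the farm with the most free CPUs.
job = {
    "requirements": lambda r: r["os"] == "linux" and r["free_cpus"] > 0,
    "rank": lambda r: r["free_cpus"],
}

print(choose_resource(job, RESOURCES))
```

In release 0 this step does not exist: the user plays the role of `choose_resource` and names the farm explicitly.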

  3. What can be implemented now (GWMS release 0)
     [Architecture diagram; same layout as the previous slide, without the Master]
     • Annotation: not very useful in this model
     • Submit jobs: condor_submit (Globus Universe) to Condor-G
     • Grid Information Service (GIS): information on characteristics and status of local resources
     • Condor-G: able to provide reliability; use of Condor tools for job monitoring, logging, …
     • globusrun to Globus GRAM
     • Globus GRAM (one per site): uniform interface to different local resource management systems
     • Local Resource Management Systems: CONDOR, LSF, …
     • Farms at Site1, Site2, Site3

  4. Overview
     • Job management (submission, monitoring) from a single machine using Condor tools
     • The user must explicitly define to which Globus resource (which farm) the jobs are submitted
     • The applications and the input files must be stored in the file system of the executing machine
     • The output files will be created in the file system of the executing machine
     • We can try to have just the standard input and/or output and/or error files (useful to check the “status” of the production) on the submitting machine, using Bypass and/or Globus GASS
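To make the release 0 workflow concrete, here is a hedged sketch that generates a Condor-G submit description in which the user names the target farm explicitly. The keywords follow the Globus-universe submit syntax of that Condor generation (`universe = globus`, `globusscheduler`), which may differ in other versions; the gatekeeper host name, the executable path and the helper `make_submit_description` are hypothetical.

```python
def make_submit_description(executable, gatekeeper, jobmanager, tag="job"):
    """Build the text of a Condor-G submit description for one job.

    executable  path of the application on the *executing* machine
    gatekeeper  host name of the farm's Globus gatekeeper (hypothetical here)
    jobmanager  local resource management system behind GRAM, e.g. "lsf"
    tag         prefix for the stdout/stderr/log files on the submitting machine
    """
    lines = [
        "universe = globus",
        # The user, not a Master, picks the Globus resource.
        f"globusscheduler = {gatekeeper}/jobmanager-{jobmanager}",
        f"executable = {executable}",
        # The application is already installed on the remote farm.
        "transfer_executable = false",
        f"output = {tag}.out",
        f"error = {tag}.err",
        f"log = {tag}.log",
        "queue",
    ]
    return "\n".join(lines) + "\n"


print(make_submit_description("/opt/cms/bin/pythia", "gatekeeper.site1.example.org", "lsf"))
```

The resulting text would be saved to a file and passed to `condor_submit`; the job can then be followed with `condor_q` from the same machine.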

  5. Bypass vs. GASS
     • Bypass
        • Written by Douglas Thain (Condor team)
        • Redirection of standard input/output/error of a program to a remote machine while the program is running
        • Can be used for dynamically linked programs
        • Successfully tested with Pythia
        • Use of Globus Security Infrastructure
     • Globus GASS
        • Possibility to copy the input file to the remote machine before the execution, and have the output file back after the execution (otherwise it is necessary to modify the source code)

  6. Status of GWMS release 0
     • Tests on basic capabilities and functionalities have been performed
     • Some tests with real applications (Pythia, CMSIM) performed
     • No “stress” tests performed to evaluate scalability, reliability, …
     • Problems with scalability and fault tolerance found (Globus jobmanager)

  7. What is necessary for GWMS rel. 0
     • Local farms with a shared file system between the various nodes
     • Installation of the proper experiment environment and applications on these farms
     • Local resource management system to manage the local farm
        • Fork
           • Strongly discouraged (even for a single machine)
           • Necessary to install Globus on each machine
           • Job queuing is left to the production manager
        • LSF
        • Local Condor pool
        • PBS
           • Tests on Globus-PBS interaction must be completed (i.e. farm environment)
           • Tests on Condor-G – Globus – PBS not performed yet
     • Globus
        • One installation per farm (on a “visible” node)
        • Installation using INFNGRID distribution

  8. INFNGRID distribution
     • Done by INFN GRID release team (F. Donno, A. Sciabà, Z. Xie)
     • Version 1.1 released!
     • Precompiled version for Linux Red Hat 6.1
     • Scripts that make installation and deployment simpler and more “automatic”
     • Supported local resource management systems: LSF, Condor
     • Possibility to implement INFN customizations
        • Certificates
        • “Test” GIS Architecture
     • Installation instructions (http://www.pi.infn.it/GRID/GRID_INST_1.1.html)

  9. Certificates
     • Use of personal certificates and host certificates signed by INFN CA
     • User certificates signed by Globus CA are accepted as well
     • By default it is not possible to “use” Globus resources outside INFN using personal certificates signed by INFN CA. Is this a problem?
        • Workaround 1: Users also have personal certificates signed by Globus CA
        • Workaround 2: “Small” modification in the Globus configuration of these resources outside INFN in order to accept “our” certificates too
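The certificate issue comes down to which certification authorities each resource trusts. The toy model below is only an illustration: the CA names are plain labels rather than real certificate subject names, and `is_accepted` is a hypothetical helper, not part of Globus.

```python
def is_accepted(cert_issuer, trusted_cas):
    """A resource accepts a certificate only if it trusts the signing CA."""
    return cert_issuer in trusted_cas


# INFN resources trust both CAs; a resource outside INFN is assumed
# here to trust only the Globus CA by default.
infn_resource = {"INFN CA", "Globus CA"}
outside_resource = {"Globus CA"}

print(is_accepted("INFN CA", infn_resource))       # accepted inside INFN
print(is_accepted("INFN CA", outside_resource))    # rejected outside INFN

# Workaround 1: the user also holds a certificate signed by the Globus CA.
print(is_accepted("Globus CA", outside_resource))

# Workaround 2: the outside resource is reconfigured to trust the INFN CA too.
outside_resource.add("INFN CA")
print(is_accepted("INFN CA", outside_resource))
```

Workaround 1 changes nothing on the remote side but doubles the certificates each user manages; workaround 2 needs the cooperation of every outside site.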

  10. GIS Architecture (test phase)
     [Hierarchy diagram; labels as they appear in the transcript]
     • Legend: Implemented / Implemented using INFNGRID distribution / To be implemented
     • Top Level INFN GIIS: Dc=infn,dc=it, o=grid
     • INFN ATLAS GIIS: Exp=atlas, o=grid
     • GIIS Bologna: Dc=bo, Dc=infn, dc=it,o=grid
     • GIIS Milano: Dc=mi,Dc=infn, dc=it,o=grid
     • GRIS
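The GIIS hierarchy follows from the distinguished names themselves: an entry sits below another when the parent's name is a proper suffix of its own. A minimal sketch of that check, assuming a simple comma split (it does not handle escaped commas) and case-insensitive comparison; `parse_dn` and `is_below` are illustrative helpers, not part of any LDAP library.

```python
def parse_dn(dn):
    """Split a distinguished name into normalized components, most specific first."""
    return [part.strip().lower() for part in dn.split(",")]


def is_below(child_dn, parent_dn):
    """True if child_dn lies strictly below parent_dn in the directory tree."""
    child, parent = parse_dn(child_dn), parse_dn(parent_dn)
    return len(child) > len(parent) and child[-len(parent):] == parent


TOP_LEVEL = "Dc=infn,dc=it, o=grid"

for dn in ("Dc=bo, Dc=infn, dc=it,o=grid",
           "Dc=mi,Dc=infn, dc=it,o=grid",
           "Exp=atlas, o=grid"):
    print(dn, "->", is_below(dn, TOP_LEVEL))
```

Under this reading the Bologna and Milano names fall below the top-level INFN name, while the ATLAS name lives in a separate branch under `o=grid`.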

  11. INFNGRID distribution
     • Next release
        • Solaris 2.6
        • Support of PBS as local resource management system
        • GDMP
        • Other work, changes, bug fixes “triggered” by users/administrators
     • Necessary to define relationship with DataGrid!

  12. What is necessary
     • Condor-G
        • Used by the production manager to submit jobs
     • Scripts to run productions using this GRID environment
     • Tools to “monitor” production
        • condor_q
        • Condor Job Viewer
           • Java GUI

  13. (Some) next steps
     • Tests with real applications and real environments
        • CMS fall production
     • Fix the problems
        • Globus jobmanager
           • Who, how, relations with Globus team, relations with Condor team?
        • …
     • GIS – ClassAds converter
        • Globus team?
     • Master implementation
        • Who, how, …?
     • The default GIS schema must be integrated with other information (the information on characteristics and status of local resources and on jobs is not enough)
        • We need to identify what other information is necessary
        • This will become much clearer during the Master design
     • Packaging?
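The GIS – ClassAds converter listed above would turn the attributes published for a resource into an ad that the Master can match against. The sketch below shows the idea only: the input attribute names are hypothetical (the real GIS schema is exactly what still has to be defined), and the output uses the simple `Attribute = value` form of ClassAds with strings in double quotes.

```python
import re


def to_classad(entry):
    """Render a dict of GIS-style attributes as ClassAd 'Name = value' lines."""
    lines = []
    for key in sorted(entry):
        # ClassAd attribute names are identifiers: replace anything else with "_".
        name = re.sub(r"[^A-Za-z0-9_]", "_", key)
        value = entry[key]
        if isinstance(value, bool):
            rendered = "true" if value else "false"
        elif isinstance(value, (int, float)):
            rendered = str(value)
        else:
            # Quote strings, escaping backslashes and double quotes.
            escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
            rendered = '"' + escaped + '"'
        lines.append(name + " = " + rendered)
    return "\n".join(lines)


# Hypothetical attributes for one farm.
entry = {"hostname": "farm.site1.example.org", "free-cpus": 4, "lrms-type": "lsf"}
print(to_classad(entry))
```

Any extra information the Master turns out to need would appear here as additional attributes, which is why the schema question and the Master design are linked.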
