
Basic Grid Job Submission

Alessandra Forti, 28 March 2006. Outline: grid components, job submission, documentation.




  1. Basic Grid Job Submission. Alessandra Forti, 28 March 2006

  2. Outline
     • Grid components
     • Job submission
     • Documentation

  3. Grid Components: RB and LB
     • RB = Resource Broker: the equivalent of a batch server. Jobs land on an RB, which decides where to send each job according to the user's specifications.
     • LB = Logging and Bookkeeping: handles the information about the jobs submitted through a specific RB. Normally an RB and an LB reside on the same machine, but this is not required: a user can submit jobs to an RB and log them on an LB at another site.

  4. Grid Components: UI and CE
     • UI = User Interface: the front end where the grid clients accessible to users reside. It has login access and can be located anywhere; a laptop with the UI software installed can access grid resources.
     • CE = Computing Element: the gateway to a local batch system. It handles the final authentication and authorization needed to access the local batch system. It can be on the same machine as the batch server, but this is not required.

  5. Grid Components: SE
     • SE = Storage Element: the gateway to the data. In its simplest form it is basically a GridFTP server, but this type is now considered obsolete. It handles authentication and authorization for access to the local data.
     • SRM = Storage Resource Manager: a protocol designed to hide the backend implementation from the user. Backends are now more sophisticated: they have storage management tools and will support access policies based on the DN of the certificate rather than on simple Unix IDs.
     • An SRM-based service is the current form of an SE.

  6. Grid Components: IS
     • IS = Information System
     • Each site publishes information about its available resources in the IS.
     • The IS has a hierarchical structure (top-level BDII, site BDII, service GIIS). The user sees only the top-level BDII, and so do the RB, WN and UI.
     • Generic top-level BDIIs all contain the same information (they are replicas). However, a VO might want to run its own BDII containing only information about the resources open to it. This lets the VO select good sites in a way that is transparent to the user.

  7. Work Load Management

  8. Job Submission
     • The first step is to log in to a UI: a desktop in the department, your laptop, lxplus at CERN… If correctly configured, they should be equivalent.
     • You need your certificate and key in $HOME/.globus. If they have the wrong permissions the software will complain. This is to remind users to protect their certificates.
     • Create a proxy with grid-proxy-init:

         grid-proxy-init
         Your identity: /C=UK/O=eScience/OU=Manchester/L=HEP/CN=alessandra forti
         Enter GRID pass phrase for this identity
         Creating proxy ................................................................ Done
         Your proxy is valid until: Wed Mar 29 02:00:20 2006
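The permission requirement on slide 8 can be sketched as a small helper. This is only an illustration: the file names `usercert.pem` and `userkey.pem` and the modes 644 and 400 are assumptions based on common Globus conventions, not something stated on the slide, and the function name `secure_globus_files` is hypothetical.

```shell
# Sketch: tighten permissions on the certificate and key before
# running grid-proxy-init.
# Assumptions (not from the slides): the files are named usercert.pem
# and userkey.pem, and modes 644/400 are acceptable to the grid tools.
secure_globus_files() {
    dir=${1:-"$HOME/.globus"}
    # The certificate is public information, so it may stay world-readable.
    chmod 644 "$dir/usercert.pem" &&
    # The private key must be readable by its owner only.
    chmod 400 "$dir/userkey.pem"
}

# Usage (then create the proxy as shown on the slide):
#   secure_globus_files
#   grid-proxy-init
```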

  9. Job submission: JDL language
     • JDL = Job Description Language
     • To submit a job you need to write a JDL file, in which you specify the type of resources the job needs and the files the job needs to find.
     • Unfortunately not everything can be specified: only what can be retrieved from the IS and the catalogs, and sometimes not even that if the RB can't handle it.
     • The simplest JDL is:

         cat testJob.jdl
         Executable = "test.sh";
         StdOutput = "testJob.out";
         StdError = "testJob.err";
         InputSandbox = {"./test.sh"};
         OutputSandbox = {"testJob.out","testJob.err"};
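To try the JDL from slide 9, both `testJob.jdl` and the script it names have to exist. The sketch below writes the two files. The JDL attributes are the ones shown on the slide; the contents of `test.sh` are not given in the talk, so the payload here (printing the host name and date) and the function name `write_test_job` are purely hypothetical.

```shell
# Sketch: create the minimal job files in a directory (default: current).
write_test_job() {
    dir=${1:-.}

    # Hypothetical payload; the slides do not show what test.sh contains.
    cat > "$dir/test.sh" <<'EOF'
#!/bin/sh
hostname
date
EOF
    chmod +x "$dir/test.sh"

    # JDL exactly as on the slide. The "./test.sh" path in InputSandbox is
    # relative to the directory the job is submitted from.
    cat > "$dir/testJob.jdl" <<'EOF'
Executable = "test.sh";
StdOutput = "testJob.out";
StdError = "testJob.err";
InputSandbox = {"./test.sh"};
OutputSandbox = {"testJob.out","testJob.err"};
EOF
}

# Usage:
#   write_test_job && cat testJob.jdl
```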

  10. How to list resources for a job

        edg-job-list-match --vo dteam testJob.jdl
        Selected Virtual Organisation name (from --vo option): dteam
        Connecting to host lcgrb01.gridpp.rl.ac.uk, port 7772
        ********************************************************************
        COMPUTING ELEMENT IDs LIST
        The following CE(s) matching your job requirements have been found:
        *CEId*
        ce01.tier2.hep.manchester.ac.uk:2119/jobmanager-pbs-dteam
        ********************************************************************
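When many CEs match, it can help to reduce the listing on slide 10 to the CE identifiers alone. The filter below is a sketch that assumes every identifier has the `host:port/jobmanager-...` shape seen on the slide; the function name `list_matching_ces` is hypothetical.

```shell
# Sketch: read edg-job-list-match output on stdin and print one CE ID
# per line.
# Assumption: CE IDs look like host:port/jobmanager-<name>, as on the slide.
list_matching_ces() {
    grep -Eo '[A-Za-z0-9.-]+:[0-9]+/jobmanager-[A-Za-z0-9_-]+'
}

# Usage:
#   edg-job-list-match --vo dteam testJob.jdl | list_matching_ces
```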

  11. Submit a Job

        edg-job-submit --vo dteam testJob.jdl
        Selected Virtual Organisation name (from --vo option): dteam
        Connecting to host lcgrb01.gridpp.rl.ac.uk, port 7772
        Logging to host lcgrb01.gridpp.rl.ac.uk, port 9002
        ******************************************************************************
        JOB SUBMIT OUTCOME
        The job has been successfully submitted to the Network Server.
        Use edg-job-status command to check job current status.
        Your job identifier (edg_jobId) is:
        - https://lcgrb01.gridpp.rl.ac.uk:9000/rxNaz6lEWskgJ5cxw9wnHQ
        ******************************************************************************
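The job identifier printed on slide 11 is needed by every later command, so it is convenient to capture it. The sketch below pulls the first `https://host:port/id` string out of the submit output; it assumes the identifier always has that shape, as in the examples in this talk, and the function name `extract_job_id` is hypothetical.

```shell
# Sketch: read edg-job-submit output on stdin and print the job identifier.
# Assumption: the identifier is the first https://host:port/<id> string,
# where <id> uses only letters, digits, "-" and "_".
extract_job_id() {
    grep -Eo 'https://[^[:space:]]+:[0-9]+/[A-Za-z0-9_-]+' | head -n 1
}

# Usage:
#   job_id=$(edg-job-submit --vo dteam testJob.jdl | extract_job_id)
#   edg-job-status "$job_id"
```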

  12. Check the job status

        edg-job-status https://lcgrb01.gridpp.rl.ac.uk:9000/rxNaz6lEWskgJ5cxw9wnHQ
        *************************************************************
        BOOKKEEPING INFORMATION:
        Status info for the Job : https://lcgrb01.gridpp.rl.ac.uk:9000/rxNaz6lEWskgJ5cxw9wnHQ
        Current Status: Scheduled
        Status Reason: Job successfully submitted to Globus
        Destination: lcg-ce.ecm.ub.es:2119/jobmanager-pbs-dteam
        reached on: Tue Mar 28 13:16:12 2006
        *************************************************************
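In a script, usually only the "Current Status" field of the report on slide 12 matters. A minimal sketch of extracting it follows; it assumes the field appears on its own line in the form shown on the slide, and the function name `job_state` is hypothetical.

```shell
# Sketch: read edg-job-status output on stdin and print the value of the
# first "Current Status:" field, with surrounding whitespace removed.
job_state() {
    sed -n 's/^[[:space:]]*Current Status:[[:space:]]*\(.*[^[:space:]]\)[[:space:]]*$/\1/p' |
        head -n 1
}

# Usage:
#   edg-job-status "$job_id" | job_state
```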

  13. Cancel a Job

        edg-job-cancel https://lcgrb01.gridpp.rl.ac.uk:9000/rxNaz6lEWskgJ5cxw9wnHQ
        Are you sure you want to remove specified job(s)? [y/n]n :y
        =============== edg-job-cancel Success ================
        The cancellation request has been successfully submitted for the following job(s):
        - https://lcgrb01.gridpp.rl.ac.uk:9000/rxNaz6lEWskgJ5cxw9wnHQ
        ===============================================

      • This command works only for jobs that have already been scheduled or are already running.

  14. Check status of all jobs

        edg-job-status --all --vo dteam
        Selected Virtual Organisation name (from --vo option): dteam
        Retrieving Information from LB server lcgrb01.gridpp.rl.ac.uk:9000
        Please wait: this operation could take some seconds.
        *************************************************************
        BOOKKEEPING INFORMATION:
        Status info for the Job : https://lcgrb01.gridpp.rl.ac.uk:9000/8WQJwag-9YMC4Lk2SHye5g
        Current Status: Scheduled
        Status Reason: Job successfully submitted to Globus
        Destination: ce01.tier2.hep.manchester.ac.uk:2119/jobmanager-pbs-dteam
        reached on: Tue Mar 28 13:32:44 2006
        *************************************************************
        *************************************************************
        BOOKKEEPING INFORMATION:
        Status info for the Job : https://lcgrb01.gridpp.rl.ac.uk:9000/pCsb-7yWvzSGHfr_jCqTqQ
        Current Status: Running
        Status Reason: Job successfully submitted to Globus
        Destination: grid-ce.physik.uni-wuppertal.de:2119/jobmanager-lcgpbs-large
        reached on: Tue Mar 28 13:44:49 2006
        *************************************************************
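With many jobs, the listing on slide 14 gets long. The sketch below condenses it into a count per status. It assumes each job contributes one "Current Status:" line in the form shown on the slide; the function name `count_states` is hypothetical.

```shell
# Sketch: read "edg-job-status --all" output on stdin and print
# "<status> <count>" for each status, sorted by status name.
count_states() {
    awk -F': *' '
        /Current Status:/ {
            gsub(/^[ \t]+|[ \t]+$/, "", $2)   # trim the status value
            n[$2]++
        }
        END { for (s in n) print s, n[s] }
    ' | sort
}

# Usage:
#   edg-job-status --all --vo dteam | count_states
```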

  15. Retrieve the output

        edg-job-get-output https://lcgrb01.gridpp.rl.ac.uk:9000/pCsb-7yWvzSGHfr_jCqTqQ
        Retrieving files from host: lcgrb01.gridpp.rl.ac.uk ( for https://lcgrb01.gridpp.rl.ac.uk:9000/pCsb-7yWvzSGHfr_jCqTqQ )
        ********************************************************************
        JOB GET OUTPUT OUTCOME
        Output sandbox files for the job:
        - https://lcgrb01.gridpp.rl.ac.uk:9000/pCsb-7yWvzSGHfr_jCqTqQ
        have been successfully retrieved and stored in the directory:
        /tmp/jobOutput/aforti_pCsb-7yWvzSGHfr_jCqTqQ
        *******************************************************************
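The directory reported on slide 15 is where the output sandbox files end up, so a script may want to pick it out of the message. The sketch below prints the first absolute path that follows the phrase "stored in the directory:"; it assumes the message wording shown on the slide, and the function name `output_dir` is hypothetical.

```shell
# Sketch: read edg-job-get-output output on stdin and print the directory
# holding the retrieved sandbox files.
# Assumption: the path is the first word starting with "/" at or after
# the line containing "stored in the directory:".
output_dir() {
    awk '
        /stored in the directory:/ { seen = 1 }
        seen {
            for (i = 1; i <= NF; i++)
                if ($i ~ /^\//) { print $i; exit }
        }
    '
}

# Usage:
#   dir=$(edg-job-get-output "$job_id" | output_dir)
#   ls -la "$dir"
```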

  16. The Output

        less /tmp/jobOutput/aforti_pCsb-7yWvzSGHfr_jCqTqQ
        total 32
        drwxrwxr-x 2 aforti aforti 4096 Mar 28 14:49 ./
        -rw-rw-r-- 1 aforti aforti 0 Mar 28 14:49 testJob.err
        -rw-rw-r-- 1 aforti aforti 22052 Mar 28 14:49 testJob.out

  17. Documentation
      • LCG main page: http://lcg.web.cern.ch/LCG/
      • Latest version of the LCG User Manual: https://edms.cern.ch/file/454439//LCG-2-UserGuide.html
