
Pilot Factory using Schedd Glidein

Barnett Chiu, BNL, 10.04.07

Presentation Transcript


  1. Pilot Factory using Schedd Glidein (Barnett Chiu, BNL, 10.04.07)

  2. Problem to solve (1)
     • Pilot
       • Probes the resource (http, environment, interpreter, other executables, etc.)
       • Pulls jobs from a remote server (e.g., the Panda server)
     • Matchmaking
       • Groups jobs into different categories, e.g., production jobs, analysis jobs (CHARMM …), test jobs …
       • Other criteria: number of CPUs, RAM, etc.
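The matchmaking criteria above (job category plus CPU and RAM requirements) can be illustrated with a small first-fit sketch. The `Job` and `PilotSlot` records, their fields, and the category names are hypothetical; Condor's own matchmaking and real pilots use richer ClassAd-based logic.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Job:
    job_id: str
    category: str  # e.g. "production", "analysis", "test"
    cpus: int
    ram_mb: int


@dataclass(frozen=True)
class PilotSlot:
    # What a pilot found when probing its worker node (hypothetical fields).
    categories: frozenset
    cpus: int
    ram_mb: int


def matches(job: Job, slot: PilotSlot) -> bool:
    """A job fits if the slot serves its category and has enough CPUs and RAM."""
    return (
        job.category in slot.categories
        and job.cpus <= slot.cpus
        and job.ram_mb <= slot.ram_mb
    )


def pick_job(queue: Iterable[Job], slot: PilotSlot) -> Optional[Job]:
    """First-fit matchmaking: return the first queued job the slot can run."""
    for job in queue:
        if matches(job, slot):
            return job
    return None
```

For example, a pilot on a 2-CPU, 4 GB node that serves production and test jobs would skip a 4-CPU analysis job and pick a 1-CPU production job. First-fit is chosen only for brevity; a real pilot could rank candidates instead.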

  3. Problem to solve (2)
     • Current approach to pilot submission
       • Local pool: vanilla universe
       • Remote pool: Condor-G
     • A large number of user jobs (production + analysis) means a large number of Condor-G pilot jobs, which in turn means computational overhead on the gatekeepers (e.g., high memory consumption)

  4. Solution
     • Is there any way to bypass GRAM when submitting jobs to remote machines?
     • Submit locally, but how?
     • We need something that continuously submits local pilot jobs on the gatekeeper
     • Solution: Pilot Factory

  5. Pilot Factory Overview
     • Pilot Factory is an application that combines the following ideas:
       • schedd glidein
       • pilot submission program (or pilot generator)
     • What is a glidein?
       • A mini Condor pool on a remote machine
       • A complete Condor pool has at least five components: master, startd, schedd, collector, negotiator
       • Glidein types: {master, startd}, {master, schedd}, etc.
       • Properly configured Condor daemons submitted as a batch job
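In Condor, `DAEMON_LIST` is the configuration setting that tells `condor_master` which daemons to start, so each glidein type above boils down to a short `DAEMON_LIST`. A minimal sketch, assuming only the two glidein types listed on the slide; a real glidein configuration also needs settings such as the central manager's address (`CONDOR_HOST`), which are omitted here.

```python
# Daemons a complete Condor pool needs (from the slide).
FULL_POOL = ("MASTER", "COLLECTOR", "NEGOTIATOR", "SCHEDD", "STARTD")

# Glidein types: the subset of daemons each kind of glidein starts.
GLIDEIN_TYPES = {
    "startd": ("MASTER", "STARTD"),
    "schedd": ("MASTER", "SCHEDD"),
}


def daemon_list_line(glidein_type: str) -> str:
    """Return the DAEMON_LIST setting for a glidein of the given type."""
    return "DAEMON_LIST = " + ", ".join(GLIDEIN_TYPES[glidein_type])


def provided_elsewhere(glidein_type: str) -> list:
    """Daemons of a full pool that this glidein does not start itself."""
    started = set(GLIDEIN_TYPES[glidein_type])
    return [d for d in FULL_POOL if d not in started]
```

For a schedd glidein this yields `DAEMON_LIST = MASTER, SCHEDD`, while the collector and negotiator keep running on the central manager back at the submit site.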

  6. Glidein (1)
     • Two major steps
       • Condor-G job #1, installation: glidein setup script, Condor configuration file, glidein startup script; downloads the Condor binaries (http, gsiftp, etc.)
       • Condor-G job #2, execution: executes the glidein startup script, which starts condor_master
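The two Condor-G steps can be sketched as two grid-universe submit descriptions aimed at the same gatekeeper. This is a minimal sketch assuming a GT2 gatekeeper; the script names `glidein_setup.sh` and `glidein_startup.sh` are hypothetical placeholders, not the names `condor_glidein` actually uses.

```python
def grid_submit_description(gatekeeper: str, executable: str, tag: str) -> str:
    """Condor-G (grid universe, GT2) submit description for one glidein step."""
    lines = [
        "universe = grid",
        f"grid_resource = gt2 {gatekeeper}",
        f"executable = {executable}",
        f"output = {tag}.out",
        f"error = {tag}.err",
        f"log = {tag}.log",
        "queue",
    ]
    return "\n".join(lines) + "\n"


def two_step_glidein(gatekeeper: str) -> tuple:
    """Step 1 installs the glidein; step 2 starts it (condor_master)."""
    # Script names are hypothetical placeholders.
    setup = grid_submit_description(gatekeeper, "glidein_setup.sh", "glidein_setup")
    run = grid_submit_description(gatekeeper, "glidein_startup.sh", "glidein_run")
    return setup, run
```

Each string would be written to a file and passed to `condor_submit`; the execution job should only be submitted after the installation job has finished.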

  7. Glidein (2)
     [Diagram: a submit host (~/Condor_glidein with the startup script and glidein config) and a tarball server feed glideins to the execute hosts; glidein types such as {master, startd} and {master, schedd …} run there and report to the central manager's collector.]

  8. Schedd Glidein
     • Logic based on the startd glidein (two Condor-G jobs to set up)
     • Usage: with a glidein schedd running on the gatekeeper, the schedd serves as a gateway between the submit host and grid sites
     • A mini Condor pool with schedd functionality:
       • Acts as a submit host
       • Maintains a persistent job queue
       • Communicates with the native batch system (Condor, PBS, LSF, etc.) and forwards user jobs
       • Job queues are manipulated with the following commands: condor_submit, condor_rm, condor_q, condor_hold, condor_release, condor_prio
     • Security features (GSI)
       • Who is authorized to set up a Pilot Factory?
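Because the glidein schedd is just another schedd, the usual queue commands can be pointed at it by name. A minimal sketch that builds argument lists using the `-name` option these tools accept; the full schedd name `agrd0926@gridgk01.racf.bnl.gov` is an assumption (the slide's `condor_status` output truncates it), and `condor_prio` is left out of this helper.

```python
import shlex

# Queue commands from the slide that accept "-name <schedd>" to target a specific schedd.
QUEUE_COMMANDS = ("condor_submit", "condor_q", "condor_rm", "condor_hold", "condor_release")


def schedd_command(command: str, schedd_name: str, *args: str) -> list:
    """Build an argv that runs `command` against the named (glidein) schedd."""
    if command not in QUEUE_COMMANDS:
        raise ValueError(f"unsupported command: {command}")
    return [command, "-name", schedd_name, *args]


def as_shell(argv: list) -> str:
    """Render an argv as a copy-pasteable shell command line."""
    return shlex.join(argv)
```

An argv such as `schedd_command("condor_rm", "agrd0926@gridgk01.racf.bnl.gov", "123.0")` could then be run with `subprocess.run(argv, check=True)`; building argument lists rather than shell strings avoids quoting problems.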

  9. Schedd Glidein Example (1)
     • Schedd glidein #1:
       condor_glidein -count 1 -arch 6.8.1-i686-pc-Linux-2.4 -setup_jobmanager=jobmanager-fork gridgk01.racf.bnl.gov/jobmanager-fork -type schedd -forcesetup
     • Schedd glidein #2:
       condor_glidein -count 1 -arch 6.8.1-i686-pc-Linux-2.4 -setup_jobmanager=jobmanager-fork gridgk02.racf.bnl.gov/jobmanager-fork -type schedd -forcesetup
     • Schedd glideins #3, #4, #5:
       condor_glidein -count 3 -arch 6.8.1-i686-pc-Linux-2.4 -setup_jobmanager=jobmanager-fork nostos.cs.wisc.edu/jobmanager-fork -type schedd -forcesetup
     • Use jobmanager-fork, since we want the schedd to run on the gatekeeper itself.
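When many gatekeepers are involved, the commands above can be generated from a list of requests. A minimal sketch; the flags are copied as-is from the slide (including `-type schedd`, which here selects the schedd glidein type), and the defaults simply mirror the slide's values.

```python
def schedd_glidein_command(gatekeeper: str, count: int = 1,
                           arch: str = "6.8.1-i686-pc-Linux-2.4",
                           jobmanager: str = "jobmanager-fork") -> list:
    """argv for one condor_glidein call, mirroring the slide's examples."""
    return [
        "condor_glidein",
        "-count", str(count),
        "-arch", arch,
        f"-setup_jobmanager={jobmanager}",
        # Fork jobmanager: the glidein runs on the gatekeeper itself.
        f"{gatekeeper}/{jobmanager}",
        "-type", "schedd",
        "-forcesetup",
    ]


# The three requests from the slide (five schedd glideins in total).
REQUESTS = [
    ("gridgk01.racf.bnl.gov", 1),
    ("gridgk02.racf.bnl.gov", 1),
    ("nostos.cs.wisc.edu", 3),
]

commands = [schedd_glidein_command(host, count) for host, count in REQUESTS]
```

Each entry of `commands` can be passed to `subprocess.run` unchanged.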

  10. Schedd Glidein Example (2)
      Command: condor_status -schedd
      Name                  Machine     TotalRunningJobs  TotalIdleJobs  TotalHeldJobs
      agrd0926@gridgk01.ra  gridgk01.r  0                 0              0
      agrd0926@gridgk02.ra  gridgk02.r  0                 0              0
      pleiades@gridui01.us  gridui01.u  0                 0              0
      pleiades@ribera.cs.w  ribera.cs.  0                 0              0
      pleiades@ron.cs.wisc  ron.cs.wis  0                 0              0
      pleiades@vail.cs.wis  vail.cs.wi  0                 0              0
                                        TotalRunningJobs  TotalIdleJobs  TotalHeldJobs
      Total                             0                 0              0
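A factory that monitors its glidein schedds needs to read this output. A minimal sketch of a parser for the text layout shown above; the column layout is an assumption that can vary between Condor versions, so machine-readable output (e.g. `condor_status -schedd -long`) is the sturdier choice in practice.

```python
def parse_schedd_status(text: str) -> list:
    """Parse per-schedd rows from `condor_status -schedd` text output."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Per-schedd rows have exactly 5 columns; skip headers and the summary lines.
        if len(fields) != 5 or fields[0] == "Name":
            continue
        name, machine, *counts = fields
        try:
            running, idle, held = (int(c) for c in counts)
        except ValueError:
            continue
        rows.append({"name": name, "machine": machine,
                     "running": running, "idle": idle, "held": held})
    return rows


def totals(rows: list) -> dict:
    """Recompute the 'Total' line from the parsed rows."""
    return {key: sum(r[key] for r in rows) for key in ("running", "idle", "held")}
```

Applied to the output above, this yields six schedd rows, all with zero running, idle, and held jobs.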

  11. Pilot Submission Program (Generator)
      • Communicates with a DB server that maintains information about pilot jobs
        • e.g., pilot_type, pilot_queue
      • Pulls the desired pilot script from an external server
      • Periodically submits pilot jobs (with the pilot script as the executable)
        • condor_submit
        • qsub? No, not necessary, since …
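The generator's loop (read pilot settings from a DB, fetch the pilot script, submit pilots) can be sketched as follows. This is a hypothetical illustration, not the Pilot Factory code: SQLite stands in for the DB server, the `pilots` table schema and the `--type`/`--queue` pilot options are invented, `fetch_script` stands in for pulling the script from the external server, and the vanilla universe assumes the gatekeeper's native batch system is Condor.

```python
import os
import sqlite3
import subprocess
import tempfile
import time
from typing import Callable


def load_pilot_configs(conn: sqlite3.Connection) -> list:
    """Read what to submit from a (hypothetical) `pilots` table."""
    cur = conn.execute(
        "SELECT pilot_type, pilot_queue, n_pilots FROM pilots ORDER BY pilot_type"
    )
    return [
        {"pilot_type": t, "pilot_queue": q, "n_pilots": n} for t, q, n in cur.fetchall()
    ]


def pilot_submit_description(script_path: str, pilot_type: str,
                             pilot_queue: str, n_pilots: int) -> str:
    """Submit description with the pilot script as the executable."""
    return "\n".join([
        "universe = vanilla",  # assumes the native batch system is Condor
        f"executable = {script_path}",
        f"arguments = --type {pilot_type} --queue {pilot_queue}",  # hypothetical pilot options
        "output = pilot_$(Cluster).$(Process).out",
        "error = pilot_$(Cluster).$(Process).err",
        "log = pilots.log",
        f"queue {n_pilots}",
    ]) + "\n"


def condor_submit(description: str) -> None:
    """Write the description to a temporary file and hand it to condor_submit."""
    with tempfile.NamedTemporaryFile("w", suffix=".sub", delete=False) as f:
        f.write(description)
        path = f.name
    try:
        subprocess.run(["condor_submit", path], check=True)
    finally:
        os.unlink(path)


def run_cycle(conn: sqlite3.Connection, fetch_script: Callable[[str], str],
              submit: Callable[[str], None] = condor_submit) -> int:
    """One generator pass: fetch each pilot script, submit it, return the pilot count."""
    submitted = 0
    for cfg in load_pilot_configs(conn):
        script_path = fetch_script(cfg["pilot_type"])
        submit(pilot_submit_description(script_path, **cfg))
        submitted += cfg["n_pilots"]
    return submitted


def run_forever(conn, fetch_script, period_s: float = 300.0) -> None:
    """Periodic submission loop (the period is an arbitrary example value)."""
    while True:
        run_cycle(conn, fetch_script)
        time.sleep(period_s)
```

Passing `submit` as a parameter keeps a single pass testable without a Condor installation, and lets the same loop target a different submission command if the native batch system requires it.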

  12. Build Pilot Factory with Glidein
      [Diagram: a grid resource whose gatekeeper runs the glidein master and schedd, with the schedd forwarding jobs to LSF and PBS; JobManager ~ pilot generator.]
      • Schedd glidein installed and executed on the gatekeeper
      • User submits a Condor-C job with the pilot generator as the executable
      • Generator runs on the gatekeeper as a local universe job supervised by the glidein schedd
      • Generator submits pilots
        • Types and frequency adjustable by users
        • Depending on the native batch system, pilots can be submitted as grid universe jobs
        • Along with GAHP and related binaries, the schedd can communicate with different batch systems
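The Condor-C step above (shipping the pilot generator to the glidein schedd, where it runs in the local universe) can be sketched as a submit description. This follows the Condor-C form `grid_resource = condor <schedd> <pool>` and the `+remote_jobuniverse` attribute, with 12 being Condor's numeric code for the local universe; the schedd name, the pool host `submit.example.org`, and `pilot_generator.py` are hypothetical, so check the details against your Condor version.

```python
# Numeric code of Condor's local universe, used for +remote_jobuniverse.
LOCAL_UNIVERSE = 12


def condor_c_description(remote_schedd: str, remote_pool: str,
                         executable: str, arguments: str = "") -> str:
    """Condor-C submit description that ships the pilot generator to the glidein schedd."""
    lines = [
        "universe = grid",
        # Condor-C: hand the job to another schedd, located via the given pool's collector.
        f"grid_resource = condor {remote_schedd} {remote_pool}",
        f"executable = {executable}",
    ]
    if arguments:
        lines.append(f"arguments = {arguments}")
    lines += [
        # On the gatekeeper, run the generator in the local universe (no batch system).
        f"+remote_jobuniverse = {LOCAL_UNIVERSE}",
        "output = generator.out",
        "error = generator.err",
        "log = generator.log",
        "queue",
    ]
    return "\n".join(lines) + "\n"
```

The local universe keeps the generator on the gatekeeper next to the glidein schedd instead of sending it through the native batch system, matching the setup described on the slide.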

  13. [Diagram: Pilot Factory architecture. The submit node (collector, master, negotiator, schedd) sends a glidein request to a gatekeeper with {Globus, Condor|PBS|…}; the glidein master and schedd there form the Pilot Factory, which connects back to the collector and submits pilots to the cluster worker nodes.]

  14. Future Work
      • Integrate the pilot with the Condor startd to implement a startd-based pilot
        • The startd-based pilot retrieves the payload of a user job in the same way as the generic pilot, but it also inherits the functionality of the Condor startd
        • The original intention was to run PFs with the startd pilots on worker nodes (too greedy, unacceptable?)
        • Using the Condor startd makes it easier to integrate with gLExec
      • Transform the Generic PF (GPF) into a Startd PF (SPF)

