Low Latency Invocation in Condor Project Overview

A common usage scenario for Condor is running a large batch of similar jobs that differ only in their arguments or input files (for example, the Blast application or physical simulations). Condor simply runs each job independently of the others.

dante

Presentation Transcript


  1. Low Latency Invocation in Condor Project Overview

  2. A common usage scenario for Condor is running a large batch of similar jobs that differ only in their arguments or input files (for example, the Blast application or physical simulations).
     • Condor simply runs each job independently of the others.
     • The overhead of the invocation process in Condor is high: matching, claiming, and executable/input transfer.
     • We avoid repeating these actions by performing them only once on each execution machine.
     • The general idea is to submit an "Agent" program to Condor instead of the real jobs. The agent is responsible for running the actual jobs on the execution machine, with the main emphasis on improving performance.
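The saving described on this slide can be illustrated with a rough cost model. This is only a sketch under simplifying assumptions that are not in the slides: jobs are spread evenly over the machines, every job takes the same time, and the whole invocation overhead (matching, claiming, transfer) is folded into one number. The function name `batch_time` and its parameters are hypothetical.

```python
def batch_time(n_jobs, n_machines, run_s, overhead_s, per_job_overhead):
    """Rough wall-clock estimate (seconds) for a batch spread evenly over machines.

    per_job_overhead=True  models plain submission: overhead is paid for every job.
    per_job_overhead=False models the agent approach: overhead is paid once per machine.
    """
    jobs_per_machine = -(-n_jobs // n_machines)  # ceiling division
    if per_job_overhead:
        return jobs_per_machine * (overhead_s + run_s)
    return overhead_s + jobs_per_machine * run_s


def estimated_speedup(n_jobs, n_machines, run_s, overhead_s):
    """Ratio of the two estimates above; larger means the agent approach helps more."""
    plain = batch_time(n_jobs, n_machines, run_s, overhead_s, True)
    agent = batch_time(n_jobs, n_machines, run_s, overhead_s, False)
    return plain / agent
```

In this model the benefit grows as jobs get shorter relative to the overhead, which matches the slide's focus on short jobs. Any input values are illustrative, not measurements from the project.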

  3. Master-Agents invocation algorithm: general flow
     • The user submits jobs to a special daemon, the "Master".
     • The Master submits agents to Condor.
     • When an agent starts running on an execution machine, it connects to its master and starts receiving jobs to run.
     • The executable and common input files are transferred only once.
     • While an agent executes one job, an additional thread transfers the results of the previous job back to the master and immediately receives the next job.
     • The Master is responsible for pushing jobs to agents in a greedy way.
     • A special recovery algorithm was developed for the case where an agent fails (its machine crashes).
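The flow above can be sketched as a small in-process simulation. It is not the project's implementation: agents are plain threads instead of Condor jobs, a shared queue stands in for the master pushing work to whichever agent is free, and recovery is reduced to returning a lost job to the queue. The overlap of result transfer with execution is omitted. All names (`run_master`, `crashing_agents`, `STOP`) are hypothetical.

```python
import queue
import threading

STOP = object()  # sentinel telling an agent thread to exit


def run_master(jobs, n_agents, run_job, crashing_agents=frozenset()):
    """Greedily hand jobs to free agents and return results in submission order.

    Agents listed in crashing_agents "crash" on the first job they receive.
    run_job is assumed not to raise.
    """
    jobs = list(jobs)
    if all(i in crashing_agents for i in range(n_agents)):
        raise ValueError("at least one healthy agent is required")

    pending = queue.Queue()
    results = {}
    lock = threading.Lock()

    def agent(agent_id):
        while True:
            item = pending.get()
            if item is STOP:
                return
            if agent_id in crashing_agents:
                # Simulated crash: recovery puts the job back for another agent.
                # Requeue before task_done so the job is never counted as finished.
                pending.put(item)
                pending.task_done()
                return
            index, job = item
            output = run_job(job)
            with lock:
                results[index] = output
            pending.task_done()

    for index, job in enumerate(jobs):
        pending.put((index, job))
    threads = [threading.Thread(target=agent, args=(i,)) for i in range(n_agents)]
    for t in threads:
        t.start()
    pending.join()            # wait until every job has a result
    for _ in threads:
        pending.put(STOP)     # release agents still waiting for work
    for t in threads:
        t.join()
    return [results[i] for i in range(len(jobs))]
```

For example, `run_master(range(20), 4, lambda x: x * x, crashing_agents={1})` still returns all twenty results, because any job held by the crashed agent is picked up by one of the remaining agents.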

  4. [Diagram: invocation flow between the SM and EM machines, with numbered steps connecting Collector/Negotiator, Condor_submit, Schedd, Shadow, Startd, Starter, Master, Master_Submitter, Chef, invocator.sub, invocator.exe, agent.exe, Jobs.sub (Queue 1000) and Job.exe.] For short jobs of up to 20 seconds with large common input files, our algorithm achieves a speedup of up to 8 times.
