
Juggling Jobs


brede


  1. Juggling Jobs: Jug is a Python-based job management system that borrows ideas from DAGMan, MOP, BOSS, Hawk, and probably others.

  2. Filling the Jug Database
     • MCRunjob “configurator”: inserts a batch of job entries into Jug from a general workflow description. May be driven by RefDB, the CERN assignment database.
     • Or native Jug syntax for stand-alone use:

         Batch    #child batch
           name = “edde.cmkin”
           seed_low = 120000
           seed_high = seed_low + 400
           software = “/cms/sw/cmkin_edde”
           environment =
             EVENTS_PER_JOB = 250

         Batch    #parent batch
           name = “edde.oscar”
           parent name = “edde.cmkin”
           input_files = “*.ntpl”
           software = “/cms/sw/oscar_3_3_2” “/cms/pool”
           environment =
             DATASET = “edde”
             OWNER = “edde_oscar332”
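To make the idea of "inserting a batch of job entries" concrete, here is a minimal sketch in Python using the standard `sqlite3` module. It is not Jug's actual schema or API: the `batch` and `job` tables, the `fill_batch` helper, and the reading of `seed_low`/`seed_high` as a half-open range with one job per seed are all assumptions made for illustration.

```python
import sqlite3


def fill_batch(conn, name, seed_low, seed_high, parent=None):
    """Insert one batch row plus one job row per seed (hypothetical schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS batch (name TEXT PRIMARY KEY, parent TEXT)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS job "
        "(batch TEXT, seed INTEGER, state TEXT DEFAULT 'pending')"
    )
    conn.execute("INSERT INTO batch (name, parent) VALUES (?, ?)", (name, parent))
    # Assumption: seeds form the half-open range [seed_low, seed_high).
    seeds = range(seed_low, seed_high)
    conn.executemany(
        "INSERT INTO job (batch, seed) VALUES (?, ?)",
        [(name, seed) for seed in seeds],
    )
    conn.commit()
    return len(seeds)


conn = sqlite3.connect(":memory:")
seed_low = 120000
n = fill_batch(conn, "edde.cmkin", seed_low, seed_low + 400)
# A second batch that names the first one as its parent, as in the slide.
m = fill_batch(conn, "edde.oscar", seed_low, seed_low + 400, parent="edde.cmkin")
```

The parent link is stored as a plain column, so the dependency graph lives entirely in the database and can be queried like any other data.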

  3. Batch Management The “DAG in a database” may be monitored and extended at any time. User may drill into aggregate view to inspect details.
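An aggregate view with drill-down maps naturally onto two SQL queries. The sketch below assumes a hypothetical `job` table with `batch`, `seed`, and `state` columns and made-up state names; Jug's real tables and states are not described in the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and state names, for illustration only.
conn.execute("CREATE TABLE job (batch TEXT, seed INTEGER, state TEXT)")
conn.executemany(
    "INSERT INTO job VALUES (?, ?, ?)",
    [
        ("edde.cmkin", 120000, "done"),
        ("edde.cmkin", 120001, "running"),
        ("edde.cmkin", 120002, "failed"),
        ("edde.oscar", 120000, "pending"),
    ],
)


def aggregate(conn):
    """Aggregate view: number of jobs per (batch, state)."""
    return conn.execute(
        "SELECT batch, state, COUNT(*) FROM job "
        "GROUP BY batch, state ORDER BY batch, state"
    ).fetchall()


def drill_down(conn, batch, state):
    """Detail view: the individual seeds behind one aggregate cell."""
    return conn.execute(
        "SELECT seed FROM job WHERE batch = ? AND state = ? ORDER BY seed",
        (batch, state),
    ).fetchall()


summary = aggregate(conn)
failed = drill_down(conn, "edde.cmkin", "failed")

# Extending the batch later is just another insert.
conn.execute("INSERT INTO job VALUES (?, ?, ?)", ("edde.cmkin", 120003, "pending"))
```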

  4. Drill-Down Run Analysis

  5. Lazy Scheduling
     • Schedule by “competing pull”
     • Load balancing without prediction.
     • Nodes may race on same job.
     • Storage pulls output and provides two-phase commit of job.
     • Submission of workers to batch queue or grid may be balanced across multiple machines, including remote submission points.
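One way to read "competing pull" with a two-phase commit is sketched below: workers pull without reserving a job, so two of them may run the same job, and the store accepts only the first commit. The `JobStore` class and its `pull`, `stage`, and `commit` methods are hypothetical names for illustration; the slides do not show how Jug implements this.

```python
import threading


class JobStore:
    """In-memory stand-in for the job database and output storage (hypothetical)."""

    def __init__(self, job_ids):
        self._lock = threading.Lock()
        self._state = {job: "pending" for job in job_ids}
        self._staged = {}    # (job, worker) -> output held in phase one
        self.committed = {}  # job -> (worker, output) accepted in phase two

    def pull(self):
        """Hand out any uncommitted job; nothing is reserved, so workers may race."""
        with self._lock:
            for job, state in self._state.items():
                if state != "committed":
                    return job
            return None

    def stage(self, job, worker, output):
        """Phase one: storage takes a copy of the worker's output."""
        with self._lock:
            self._staged[(job, worker)] = output

    def commit(self, job, worker):
        """Phase two: the first staged output for a job wins; later ones are dropped."""
        with self._lock:
            output = self._staged.pop((job, worker))
            if self._state[job] == "committed":
                return False
            self._state[job] = "committed"
            self.committed[job] = (worker, output)
            return True


store = JobStore(["job-1"])
a = store.pull()  # worker w1 pulls job-1
b = store.pull()  # worker w2 pulls the same job: a race
store.stage(a, "w1", "out-1")
store.stage(b, "w2", "out-2")
first = store.commit(a, "w1")   # accepted
second = store.commit(b, "w2")  # rejected, job already committed
```

Because a losing commit is simply discarded, no scheduler has to predict which node will finish first; faster nodes naturally end up doing more of the work.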

  6. Autonomous Execution
     • Work-loop is a pipelined queue
       • Output may be queued
       • Pre-staged input may be queued
       • So jobs can keep running in the face of network and service outages.
     • Handling loss of contact with workers
       • Write them off but welcome them back.
       • Two-phase commit of output prevents race conditions.
       • Optimistic approach maximizes throughput.
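The pipelined work-loop can be pictured as two queues around the compute step: pre-staged inputs on one side, finished outputs waiting for delivery on the other. The following is a minimal sketch using Python's `queue` and `threading` modules; the function names, the `None` end marker, and the simulated outage in `flaky_send` are assumptions for illustration, not Jug's code.

```python
import queue
import threading
import time


def work_loop(staged_inputs, outputs, run):
    """Compute stage: consume pre-staged inputs, queue results for later delivery."""
    while True:
        item = staged_inputs.get()
        if item is None:       # end-of-work marker (an assumption of this sketch)
            outputs.put(None)
            return
        outputs.put(run(item))


def drain(outputs, send, delivered):
    """Delivery stage: retry each queued output until send() reports success."""
    while True:
        item = outputs.get()
        if item is None:
            return
        while not send(item):  # service outage: the output simply waits
            time.sleep(0.01)
        delivered.append(item)


calls = {"n": 0}


def flaky_send(item):
    """Simulated storage service that is unreachable for the first three calls."""
    calls["n"] += 1
    return calls["n"] > 3


staged_inputs, outputs, delivered = queue.Queue(), queue.Queue(), []
for x in [1, 2, 3, 4, 5]:
    staged_inputs.put(x)       # inputs are staged before work starts
staged_inputs.put(None)

worker = threading.Thread(target=work_loop, args=(staged_inputs, outputs, lambda x: x * x))
shipper = threading.Thread(target=drain, args=(outputs, flaky_send, delivered))
worker.start()
shipper.start()
worker.join()
shipper.join()
```

The compute thread never waits on the delivery service: while `flaky_send` is failing, results accumulate in the output queue and are shipped once the service answers again.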
