The RunJob Project


Presentation Transcript


  1. The RunJob Project: A Proposal. Greg Graham, FNAL CD

  2. What is RunJob?
     • Automatic job creation and submission
     • Metadata description of job steps
     • Produces jobs for a variety of environments
     • Easy to extend to new applications or new environments
     • Metadata model extends to catalogs or services to do production control or tracking
     • Links jobs together in tree-like dataflow arrangements

  3. Who Uses RunJob?
     • DZero
       • Monte Carlo Challenges (CHEP 2000, CHEP 2001)
       • User Monte Carlo production
       • SAMGrid production (under construction)
       • Data reprocessing
     • CMS
       • Monte Carlo Challenges (CHEP 2003)
       • USCMS Grid, LCG-based Grid production
       • User Monte Carlo production (under construction)
       • Data reprocessing

  4. The RunJob Pilot Project
     • Begun in early Spring 2003 to “merge” the then-divergent DZero and CMS versions
     • ShahKar package created and developed during Summer 2003 with input from DZero and CMS representatives
     • ShahKar merged with the CMS variant MCRunjob in Fall 2003
       • but not propagated
     • DZero integration pushed back to April 2004

  5. Proposal for a Full RunJob Project
     • Increase manpower to accomplish:
       • better integration with the experiments’ planning processes: CDF, CMS, DZero, others?
       • integration of codebases with the ShahKar code base from the pilot project
       • feature and core development within the RunJob project to satisfy the experiments’ needs and schedules
       • rigorous testing and debugging support, documentation, and release management

  6. Requirements and Features
     • The RunJob Project Plan that was distributed comes with some very generally stated “requirements” and a lot of very specific work items
       • This reflects the need to begin talking to the experiments to tighten up the requirements and map them to specific work items
     • The work items reflect developments ongoing within the RunJob pilot project
       • 12 man-years of experience building production processing systems for HEP applications in many different environments

  7. Requirements and Features
     • “It automatically generates jobs to run my application(s) in a variety of environments”
       • The scriptObject design is a way to better abstract the job descriptions away from the jobs themselves and therefore away from the environments. These are like internal “sandboxes”. (Critical, needed by all)
       • Development work will include modules tailored for specific environments such as LSF, FBS, PBS, Condor, etc. (Critical, needed by all)
       • Development work will also include Grid environments and Web Services design work. (TBD; who needs this and when?)
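
The separation described on this slide, one environment-independent job description plus small environment-specific modules, might look roughly like the sketch below. It is not RunJob or ShahKar code: `JobDescription`, `render_script`, `SUBMITTERS`, `submit_command`, and the `simulate` executable are hypothetical names invented for illustration. Only the general shape of the `qsub` and `bsub` command lines is taken from PBS and LSF.

```python
from dataclasses import dataclass, field


@dataclass
class JobDescription:
    """Environment-independent description of one job step (hypothetical)."""
    name: str
    executable: str
    arguments: list = field(default_factory=list)


def render_script(job):
    """Render the shell script body that is the same in every environment."""
    command = " ".join([job.executable] + list(job.arguments))
    return "#!/bin/sh\n" + command + "\n"


# One small entry per environment: each turns a script path into a submit command.
SUBMITTERS = {
    "pbs": lambda script: "qsub " + script,
    "lsf": lambda script: "bsub < " + script,
    "local": lambda script: "sh " + script,
}


def submit_command(job, environment):
    """Return the command line that would submit the job in one environment."""
    if environment not in SUBMITTERS:
        raise ValueError("no module for environment: " + environment)
    return SUBMITTERS[environment](job.name + ".sh")


job = JobDescription("gen_step", "simulate", ["--events", "100"])
for env in ("pbs", "lsf", "local"):
    print(env, "->", submit_command(job, env))
```

In this sketch, supporting a new environment means adding one entry to `SUBMITTERS`; the job description and the rendered script do not change.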

  8. Requirements and Features
     • “Later, I can go back and determine how the job was configured.”
       • Physics parameters and defaults should not come from RunJob itself. (Critical)
       • Contexts are documents that can record suitable defaults for various applications, groups of applications, or environments. (Critical)
       • Contexts can currently be combined in a rudimentary fashion; better (algebraic) combination rules lead to more expressiveness and better control in complex environments. (TBD; who needs it and when?)
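
As an illustration of what a rudimentary combination rule could be, the sketch below treats each context as a plain dictionary of defaults and lets later contexts override earlier ones. This is an assumption made for illustration, not the rule RunJob actually uses; `combine_contexts` and all parameter names are hypothetical.

```python
def combine_contexts(*contexts):
    """Merge contexts left to right; a later context overrides an earlier one."""
    combined = {}
    for context in contexts:
        combined.update(context)
    return combined


# Hypothetical defaults recorded outside the job-building tool itself.
environment_defaults = {"scratch_dir": "/tmp", "max_events": 1000}
application_defaults = {"max_events": 250, "random_seed": 12345}

job_config = combine_contexts(environment_defaults, application_defaults)
print(job_config)
```

Because the order of the arguments decides the result, "last one wins" is not commutative. That is one way to see why richer, algebraic combination rules would give more control in complex environments.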

  9. Requirements and Features
     • “I need to build jobs across datasets listed in a catalog using parameters in a control DB.”
       • Observation: everyone comes around to doing this eventually ;-)
       • Uniform interfaces to catalogs and control databases potentially decrease maintenance costs for all experiments and increase adaptability to new systems. (TBD; who needs it and when?)
       • Interfaces to specific catalogs and control DBs are an integration task. (Critical)

  10. Requirements and Features
     • “I need to resubmit jobs when they fail”
       • Specification of RunJob state just before job creation/submission; this is the “XML” milestone. (Critical)
       • Storage of RunJob state specifications in an XML database or filesystem. (Critical)
       • Interface to specific job tracking systems designed by the experiments to do this. (TBD; who needs this and when?)
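
A minimal sketch of the idea behind the “XML” milestone: capture the state just before submission as XML, so that a failed job can later be rebuilt from the stored document. The element names (`runjob_state`, `parameter`) and the functions are hypothetical and are not the actual RunJob schema; only Python's standard `xml.etree.ElementTree` module is used.

```python
import xml.etree.ElementTree as ET


def state_to_xml(job_name, parameters):
    """Serialize a job name and its parameters to an XML string."""
    root = ET.Element("runjob_state", {"job": job_name})
    for key in sorted(parameters):
        param = ET.SubElement(root, "parameter", {"name": key})
        param.text = str(parameters[key])
    return ET.tostring(root, encoding="unicode")


def xml_to_state(xml_text):
    """Recover the job name and parameters (as strings) from the XML."""
    root = ET.fromstring(xml_text)
    parameters = {p.get("name"): p.text for p in root.findall("parameter")}
    return root.get("job"), parameters


saved = state_to_xml("gen_step", {"max_events": 250, "random_seed": 12345})
name, parameters = xml_to_state(saved)
print(name, parameters)
```

Values come back as strings in this sketch; a real state specification would also need to record types, and the saved text could equally be written to a file or an XML database.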

  11. Requirements and Features
     • “I need feature X working by my experiment’s milestone Y.”
       • These need to be worked out during the negotiation phase this Spring.
       • The specific work items listed in the plan probably cover the foreseeable requirements that will emerge during the negotiation phase
       • … so on to the manpower estimates ;-)

  12. Manpower Estimates
     • My favorite quote: “The plan is OK except possibly for the schedule and the manpower.”
     • For each listed milestone/deliverable/feature, a SWAG estimate was produced. The SWAGs were then summed, and the result was inflated by 25%.
     • 40 man-months total effort, not including management or testing.
     • The Level of Effort (LOE) was used
       • essentially equal to the average number of “warm bodies” active for the duration of the schedule
       • Total FTE = LOE * project duration
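
The arithmetic on this slide can be sketched as follows. The per-item estimates are made-up placeholders, not figures from the project plan, and the function names are hypothetical. Only the 25% inflation and the relation Total FTE = LOE * project duration come from the slide.

```python
def inflated_total(swags_man_months, contingency=0.25):
    """Sum the per-item SWAG estimates and inflate the sum by the contingency."""
    return sum(swags_man_months) * (1.0 + contingency)


def total_fte(loe, duration):
    """Total effort: average level of effort times project duration."""
    return loe * duration


# Placeholder estimates in man-months for three imaginary work items.
example_swags = [4, 6, 2]
print(inflated_total(example_swags))       # 12 man-months inflated by 25%
print(total_fte(loe=2.0, duration=1.5))    # 2 people on average for 1.5 time units
```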

  13. Manpower Estimates

  14. Schedule Changes
     • Deferment cost estimates
       • Project management and essential functions LOE remain constant
       • Development-driven functions scale against schedule length
       • Adjusted average LOE = 1.6 + 4.2/(length)
     • Risk: Can we satisfy the experiments’ milestones?
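
A short sketch of the deferment formula. The slide gives no units for the schedule length, so this sketch assumes years; the function names are hypothetical. One possible reading, not stated on the slide, is that the 4.2 is the development effort in man-years (40 man-months inflated by 25% is about 4.2 man-years) and the 1.6 is the constant management and essential-functions LOE.

```python
def adjusted_average_loe(length, fixed_loe=1.6, development_effort=4.2):
    """Adjusted average LOE = fixed_loe + development_effort / length."""
    if length <= 0:
        raise ValueError("schedule length must be positive")
    return fixed_loe + development_effort / length


def total_effort(length):
    """Total FTE = LOE * project duration, using the adjusted LOE."""
    return adjusted_average_loe(length) * length


for length in (1.0, 1.5, 2.0):
    print(length, round(adjusted_average_loe(length), 2), round(total_effort(length), 2))
```

Multiplying out gives total effort = 1.6 * length + 4.2. Under this formula, stretching the schedule lowers the average LOE but raises the total effort by 1.6 for every added unit of length. That increase is the cost of deferment.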

  15. Schedule Changes
     • Cutting work items
       • Analysis cannot really be done without the experiments’ input
     • Cutting project roles (e.g., dedicated testing)
       • Analysis cannot really be done without the experiments’ input
       • There are probably some savings here: development could be pushed further up the integration food chain and into the experiments’ variant codebases themselves.
       • We recommend against this because it dilutes the benefits of cooperation.

  16. Conclusion
     • The RunJob project is an exciting opportunity for the RunII experiments and CMS to collaborate on software.
     • DZero and CMS already use fairly closely related variants.
     • The RunJob project can build upon
       • the experience of many people who have been working on it already for years
       • a successful pilot project that minimally satisfies many requirements already
     • We are eager to work with the experiments to gather their requirements and milestones and address them coherently across the experiments.
