
SUMS ( STAR Unified Meta Scheduler )

SUMS ( STAR Unified Meta Scheduler ). SUMS is a highly modular meta-scheduler currently in use by STAR at their large data processing sites (e.g. RCF / PDSF). It is also used by other organizations such as Stony Brook University, and as a back end to some PHENIX GUI applications.


Presentation Transcript


  1. SUMS (STAR Unified Meta Scheduler) • SUMS is a highly modular meta-scheduler currently in use by STAR at their large data processing sites (e.g. RCF and PDSF). It is also used by other organizations such as Stony Brook University, and as a back end to some PHENIX GUI applications. • STAR has been using SUMS for 3.5 years on both production and simulation jobs but, more importantly, as a tool for user submission of requests (jobs). • The functions of SUMS: • Run processes on large datasets (many files) that may be distributed across many nodes, clusters, sites, and batch systems. • Resolve the user's abstract requests into an actual set of jobs that can be run on one or more farms. • Resolve requests for datasets. This is done using a catalog plug-in, which resolves the user's request for a dataset into logical or physical file names (LFNs or PFNs). • Write scripts and submit them to the batch system(s). • Embed resource-handling information for the batch system to use. • Group and split work in the most efficient way possible.

  2. Who contributes to SUMS research, development and administration • PPDG – funding • Jerome Lauret and Levente Hajdu – development and administration of SUMS at BNL • Lidia Didenko – testing for grid readiness • David Alexander and Paul Hamill (Tech-X Corp.) – RDL deployment and prototype client and web service • Eric Hjort, Iwona Sakrejda, Doug Olson – administration of SUMS at PDSF • Valeri Fine – job tracking • Andrey Y. Shevel – administration of SUMS at Stony Brook University • And others • Gabriele Carcassi – development and administration of SUMS • Efstratios Efstathiadis – queue monitoring, research

  3. Benefits of using SUMS over submitting directly • No knowledge of scripting is required for splitting and submitting jobs. • No knowledge of how to use the batch system is needed. • Datasets are resolved and chopped for the user. • The user is completely shielded from the complications of using the distributed file system. • Safety measures are in place to prevent users from bringing down the batch system by overusing resources.

  4. [Figure: RCF diagram with labels STARDATA24, STARDATA05, STARDATA02, QUEUE, NODES, JOBS]

  5. [Figure: RCF diagram with labels STARDATA24, STARDATA05, STARDATA02, QUEUE, NODES, JOBS]

  6. [Figure: STARDATA24, STARDATA02, STARDATA05] • Queued jobs: SD05 = 800 SD24 = 50 SD05 = 750 SD05 = 700 SD02 = 450 • Running jobs: SD05 = 500 SD24 = 10 SD05 = 1000 SD24 = 2 SD05 = 30 SD24 = 3 SD05 = 30 SD02 = 600 SD05 = 500 SD24 = 50 • Virtual resources: SD05 (2040 units total), SD24 (102 units total), SD02 (800 units total)

  7. Variables generated on the fly for users • $JOBID – a unique ID given to every job that SUMS ever runs. • Example: 62338C856E6B2B0ABF0344116F94CEA3_0 • $PROCESSID – the number of that job within the request, numbered 0, 1, 2, …, n. • $SCRATCH – an area on the local system that users can use for temporary files (temp space). • Example: /tmp/$USER/$JOBID • $FILELIST – the location of the subset of data that SUMS has chopped from the dataset for processing by a given job. • Others
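The substitution of these variables into a user's command can be sketched as follows. This is only an illustration in Python, not SUMS code: the function names, the user name, and the file-list path are hypothetical, and building `$JOBID` as the request ID plus `_` plus the process number is an assumption read off the example ID above.

```python
from string import Template


def job_variables(request_id, process_id, user, filelist_path):
    """Build the per-job variables (hypothetical helper, not the SUMS API)."""
    # Assumption: the job ID is "<request id>_<process number>".
    job_id = f"{request_id}_{process_id}"
    return {
        "JOBID": job_id,
        "PROCESSID": str(process_id),
        "SCRATCH": f"/tmp/{user}/{job_id}",
        "FILELIST": filelist_path,
    }


def expand(command, variables):
    """Replace $NAME placeholders; unknown names are left untouched."""
    return Template(command).safe_substitute(variables)


variables = job_variables(
    "62338C856E6B2B0ABF0344116F94CEA3", 0, "someuser", "/tmp/someuser/example.list"
)
print(expand("cat $FILELIST > $SCRATCH/out.$PROCESSID.txt", variables))
```

Using `safe_substitute` means that shell variables SUMS does not define (for example `$HOME`) pass through unchanged for the shell to resolve later.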

  8. JDL job XSD tree View

  9. Job Parameters • Required: Command (the command(s) to be run on the files), stdout • Optional: Name, stderr, maxFilesPerProcess (max files per job), minFilesPerProcess (min files per job), minMemory, maxMemory, simulateSubmission, filesPerHour, minWallTime, maxWallTime, fileListSyntax, fromScratch

  10. Sample job
      <?xml version="1.0" encoding="utf-8" ?>
      <job maxFilesPerProcess="10" fileListSyntax="rootd" minMemory="15">
        <command> root4star -q -b /star/macro/runMuHeavyMaker.C\(\"$SCRATCH/heavy.MuDst.root\",\"$FILELIST\"\) </command>
        <stdout URL="file:/star/u/lbhajdu/temp/heavy.$JOBID.out" />
        <input URL="catalog:star.bnl.gov?production=P04ik,trgsetupname=proHigh,filetype=daq_reco_MuDst,tpc=1,ftpc=1,sanity=1" nFiles="all" />
        <output fromScratch="*.MuDst.root" toURL="file:/star/data02/heavy.$JOBID.root" />
      </job>
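To make the structure of such a job description concrete, the sketch below reads a small, well-formed job file with Python's standard `xml.etree.ElementTree` and checks the two required parts (the command and stdout). It is an illustration only: `parse_job`, the shortened XML, and the `/tmp` paths are hypothetical, and SUMS itself does not necessarily parse requests this way.

```python
import xml.etree.ElementTree as ET

# A shortened, hypothetical job description in the same shape as the sample.
JOB_XML = """<job maxFilesPerProcess="10" fileListSyntax="rootd" minMemory="15">
  <command>echo $FILELIST</command>
  <stdout URL="file:/tmp/example.$JOBID.out" />
  <input URL="catalog:star.bnl.gov?production=P04ik,filetype=daq_reco_MuDst" nFiles="all" />
  <output fromScratch="*.MuDst.root" toURL="file:/tmp/example.$JOBID.root" />
</job>"""


def parse_job(xml_text):
    """Return the job attributes and child elements as plain Python data."""
    root = ET.fromstring(xml_text)
    if root.tag != "job":
        raise ValueError("root element must be <job>")
    command = root.findtext("command")
    stdout = root.find("stdout")
    if command is None or stdout is None:
        raise ValueError("<command> and <stdout> are required")
    return {
        "attributes": dict(root.attrib),
        "command": command.strip(),
        "stdout": stdout.get("URL"),
        "inputs": [dict(e.attrib) for e in root.findall("input")],
        "outputs": [dict(e.attrib) for e in root.findall("output")],
    }


job = parse_job(JOB_XML)
print(job["attributes"], job["command"])
```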

  11. Configuring SUMS • SUMS uses Java's standard XML de-serialization for its configuration. Over the years we have found this to be the ideal balance between ease of use and the power to define complex systems abstractly. • Pre-initialized scheduler objects are defined by the administrator. • One configuration file can hold many different instances of the same object. • By default the user is given the default objects, but they can specify other objects that have been customized for the special needs of their jobs. • Objects include: • JobInitializer • Policy • Queue • Dispatcher • Application • Statistics recorder • Others

  12. JobInitializer • The job initializer is the module through which the user submits a job. • JobInitializers currently available: • Local command line • Command line (web service): tested, still in beta • GUI (web service): tested, still in beta

  13. Dispatchers • A scheduler plug-in module that implements the dispatcher interface and converts job objects into “real” jobs that are actually submitted to the batch system. • Currently available dispatchers: • Boss • Condor • CondorG • Local (new) • LSF • PBS • SGE (new, but heavily tested by PDSF)

  14. Virtual Queues • Defines a “place” (queue, pool, meta queue, service, etc.) that a job can be submitted to. • Defines the properties of that place. • Each Virtual Queue points to one dispatcher object.

  15. Virtual Queues

  16. Virtual Queues A typical queue configuration:
      <object id="NSFlocalQueueObj" class="gov.bnl.star.offline.scheduler.Queue">
        <void method="setID"> <string>localQueue</string> </void>
        <void method="setName"> <string>star_cas_dd</string> </void>
        <void method="setAssociatedDispatcher"> <object idref="RCASDispatcher"/> </void>
        <void method="setCluster"> <string>rcas.rcf.bnl.gov</string> </void>
        <void method="setTimeLimit"> <int>90</int> </void>
        <void method="setMaxMemory"> <int>440</int> </void>
        <void method="setSearchOrderPriority"> <int>1</int> </void>
        <void method="setType"> <string>LSF</string> </void>
        <void method="setImplementation"> <string>local</string> </void>
      </object>
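Each `<void method="...">` element in this layout names a setter and carries its single argument, which is how the administrator pre-initializes an object. The Python sketch below reads such a block into a dictionary, purely to show how the file maps onto setter calls. SUMS loads these files with Java's XML de-serialization, not with this code; `read_setters` and the object id `exampleQueueObj` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened queue definition in the same layout as above.
QUEUE_XML = """
<object id="exampleQueueObj" class="gov.bnl.star.offline.scheduler.Queue">
  <void method="setID"> <string>localQueue</string> </void>
  <void method="setTimeLimit"> <int>90</int> </void>
  <void method="setSearchOrderPriority"> <int>1</int> </void>
  <void method="setAssociatedDispatcher"> <object idref="RCASDispatcher"/> </void>
</object>
"""


def read_setters(xml_text):
    """Map each setter name to its argument for one <object> definition."""
    root = ET.fromstring(xml_text)
    setters = {}
    for call in root.findall("void"):
        argument = call[0]  # each setter here takes exactly one argument
        if argument.tag == "int":
            value = int(argument.text)
        elif argument.tag == "string":
            value = argument.text
        elif argument.tag == "object":
            # A reference to another pre-initialized object in the same file.
            value = {"idref": argument.get("idref")}
        else:
            raise ValueError(f"unsupported argument type: {argument.tag}")
        setters[call.get("method")] = value
    return root.get("id"), setters


queue_id, setters = read_setters(QUEUE_XML)
print(queue_id, setters)
```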

  17. Policies • Resolves requests for datasets. • Chops the dataset and creates jobs to work on each piece. • Tries to split in the most optimal way. • Groups files based on where they have to be processed, in the case of files on distributed disk. • The size of each sub-dataset is based on the user's minimum and maximum dataset size requirements and on the time requirements of the queue, calculated from files per hour if the user supplies this parameter. • Breaks the request into jobs. • Assigns job objects to queue objects using an algorithm unique to each policy class.
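The chopping step described above can be sketched roughly as follows. This is a simplified illustration, not the SUMS algorithm: it groups files by the node that holds them, then cuts each group into pieces no larger than the user's maximum and no larger than what fits in the queue's time limit at the stated files per hour. The function name, the node names, and the choice of minutes as the time-limit unit are assumptions, and the minimum files per job is ignored for brevity.

```python
import math
from collections import defaultdict


def split_into_jobs(files, max_files, files_per_hour=None, time_limit_minutes=None):
    """Split (node, path) pairs into per-node jobs; returns (node, [paths]) tuples."""
    limit = max_files
    if files_per_hour is not None and time_limit_minutes is not None:
        # Number of files that can be processed before the queue's time limit.
        fit = math.floor(files_per_hour * time_limit_minutes / 60)
        limit = max(1, min(max_files, fit))

    # Group files by the node where they have to be processed.
    by_node = defaultdict(list)
    for node, path in files:
        by_node[node].append(path)

    jobs = []
    for node in sorted(by_node):
        paths = by_node[node]
        for start in range(0, len(paths), limit):
            jobs.append((node, paths[start:start + limit]))
    return jobs


files = [("nodeA", f"/data/a{i}.root") for i in range(5)]
files += [("nodeB", f"/data/b{i}.root") for i in range(2)]
for node, paths in split_into_jobs(files, max_files=10, files_per_hour=2, time_limit_minutes=90):
    print(node, len(paths))
```

With 2 files per hour and a 90-minute limit, at most 3 files fit in one job, so the 5 files on `nodeA` become two jobs even though the user allowed up to 10 files per job.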

  18. Policies

  19. Policies Example of a custom policy used by the STAR resonance group. The algorithm for deciding where jobs go is “PassivePolicy”; the queues used are NSFlocalQueueObj, NFSQueueObj, and HBT_group_Queue.
      <object class="gov.bnl.star.offline.scheduler.policy.PassivePolicy">
        <void method="addQueue"> <object idref="NSFlocalQueueObj"/> </void>
        <void method="addQueue"> <object idref="NFSQueueObj"/> </void>
        <void method="addQueue"> <object idref="HBT_group_Queue"/> </void>
      </object>

  20. Policies • PassivePolicy – A simplistic policy that allows the administrator to set the order in which queues will be tried. The order is set by a property of the queue called “search order priority”. If two or more queues have the same search order priority, they are tried in round-robin fashion. • ClusterAssignmentByMonitorPolicy – The first “monitoring policy” ever tested. It detects the load of each cluster and then uses an equation to determine what percentage of jobs should go to that cluster. • AssignmentByQueueMonitorPolicy – A “monitoring policy” that works at the queue level. Performance is better than ClusterAssignmentByMonitorPolicy. It monitors the waiting time and throughput of each queue, using a plug-in developed for MonALISA, to determine the best (fastest) queue to submit to. Unlike other schedulers that attempt to model every single variable, this policy uses only a handful of variables that reflect the state of possibly hundreds or thousands of factors.
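The queue ordering that PassivePolicy is described as using can be sketched as below. This is an illustration under stated assumptions, not the SUMS implementation: the class and method names are hypothetical, a lower search order priority is assumed to be tried first, and the `accepts` callback stands in for whatever checks decide whether a queue can take a job.

```python
from itertools import groupby


class PassivePolicySketch:
    """Try queues in priority order; rotate among queues that share a priority."""

    def __init__(self, queues):
        # queues: list of (queue name, search order priority)
        ordered = sorted(queues, key=lambda q: q[1])
        self.tiers = [
            [name for name, _ in group]
            for _, group in groupby(ordered, key=lambda q: q[1])
        ]
        self.offsets = [0] * len(self.tiers)  # round-robin position per tier

    def assign(self, accepts):
        """Return the first queue for which accepts(name) is true, else None."""
        for i, tier in enumerate(self.tiers):
            start = self.offsets[i]
            for name in tier[start:] + tier[:start]:
                if accepts(name):
                    # The next job starts with the queue after this one.
                    self.offsets[i] = (tier.index(name) + 1) % len(tier)
                    return name
        return None


policy = PassivePolicySketch([("queueA", 1), ("queueB", 1), ("queueC", 2)])
print([policy.assign(lambda name: True) for _ in range(3)])
```

The third queue is only reached when neither of the two priority-1 queues accepts the job, which is the "order in which queues will be tried" behavior the slide describes.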

  21. Monitoring Policy Passive Policy

  22. Reports, Logs and Statistics • Log and statistics collection is optional; the user's report file is always generated. • Reports • Reports are put in the user's directory and give the user information about the internal workings of SUMS. • They report information about every job that was processed. • The user decides when to delete these. • Logs • Hold information in a central area that is more relevant to the administrator, for diagnosing problems. • The administrator decides when to delete these. • Statistics • General information about how many people are using SUMS and what options they are using.

  23. Job tracking / monitoring /crash recovery • Dispatchers in SUMS currently provide 3 functions: • Submit Job(s) • Get Status of job(s) • Kill Job(s)
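A dispatcher with these three functions could be modelled as the interface below. The sketch is in Python for brevity, while SUMS itself is written in Java; the method names, the state strings, and the in-memory stand-in for a batch system are all hypothetical.

```python
from abc import ABC, abstractmethod


class Dispatcher(ABC):
    """The three operations a dispatcher provides (names are illustrative)."""

    @abstractmethod
    def submit(self, job_ids):
        """Hand the jobs to the batch system."""

    @abstractmethod
    def status(self, job_ids):
        """Return a mapping of job ID to its current state."""

    @abstractmethod
    def kill(self, job_ids):
        """Remove the jobs from the batch system."""


class InMemoryDispatcher(Dispatcher):
    """Stand-in that records job states in a dict instead of calling a batch system."""

    def __init__(self):
        self._state = {}

    def submit(self, job_ids):
        for job_id in job_ids:
            self._state[job_id] = "QUEUED"

    def status(self, job_ids):
        return {job_id: self._state.get(job_id, "UNKNOWN") for job_id in job_ids}

    def kill(self, job_ids):
        for job_id in job_ids:
            if job_id in self._state:
                self._state[job_id] = "KILLED"


dispatcher = InMemoryDispatcher()
dispatcher.submit(["req_0", "req_1"])
dispatcher.kill(["req_1"])
print(dispatcher.status(["req_0", "req_1", "req_2"]))
```

Keeping the interface this small is what lets one policy target LSF, PBS, Condor, or any other batch system through the same calls.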

  24. Job tracking / monitoring / crash recovery • To implement this in the simplest, most carefree way possible, it was decided that no central database should be used to store this information. The information should be given to the users directly. • The benefits are: • No databases need to be set up on sites running SUMS. This automatically eliminates all security and administration considerations. • The user decides when they no longer need this data, because the data now lives in the user's file system as a file generated by SUMS.

  25. RDL Request Definition Language – An XML-based language under development by STAR, in collaboration with other scientific groups and private industry, for describing not only one job but many jobs and the relationships between them. It is geared towards web services with advanced GUI clients.

  26. RDL The terminology for the layers of abstraction is not very clear, and all-inclusive definitions are hard to come by. Note: these are only guidelines. • Abstract / Meta / Composite request – defines a group of requests performing a common task. The order in which they run may be important. The output of one request may be the input to another request in the same meta request. Example: make a new dataset by running a program; when it is done, sum the output and render a histogram. • Request or meta job – defines a group of [0 to many] jobs that have a common function and can be run simultaneously. Example: take a dataset and run an application on it. • Physical job – the unit of work the batch system deals with. Example: take a dataset and run an application on it.
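The three layers can be pictured as nested data structures. The sketch below is only a reading aid in Python, not the RDL schema: every class and field name is hypothetical, and the `depends_on` list is one possible way to express that the output of one request feeds another.

```python
from dataclasses import dataclass, field


@dataclass
class PhysicalJob:
    """The unit of work the batch system deals with."""
    job_id: str


@dataclass
class Request:
    """A group of jobs with a common function that can run simultaneously."""
    name: str
    jobs: list = field(default_factory=list)
    depends_on: list = field(default_factory=list)  # names of requests that must finish first


@dataclass
class CompositeRequest:
    """A group of requests performing a common task, possibly in a fixed order."""
    requests: list

    def run_order(self):
        """Return request names in an order that respects depends_on."""
        done, order = set(), []
        pending = list(self.requests)
        while pending:
            ready = [r for r in pending if set(r.depends_on) <= done]
            if not ready:
                raise ValueError("cyclic or unsatisfied dependency")
            for request in ready:
                order.append(request.name)
                done.add(request.name)
            pending = [r for r in pending if r.name not in done]
        return order


make = Request("make_dataset", [PhysicalJob("make_0"), PhysicalJob("make_1")])
hist = Request("histogram", [PhysicalJob("hist_0")], depends_on=["make_dataset"])
print(CompositeRequest([hist, make]).run_order())
```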

  27. RDL vs. JDL • Features compared (columns: RDL, JDL): • Submitting on a grid landscape • Supports submission of multiple jobs • Supports submission of multiple requests • Separates task and application • Supports work flow • XML format
