The STAR Unified Meta-Scheduler (SUMS)

Presentation Transcript


  1. The STAR Unified Meta-Scheduler (SUMS) A front end around evolving technologies for user analysis and data production. Jérôme Lauret, Gabriele Carcassi, Levente Hajdu, Efstratios Efstathiadis, Lidia Didenko, Valeri Fine, Iwona Sakrejda, Doug Olson

  2. Outline • Project overview • STAR Experiment • Problem statement • Solution • Design and architecture • Basic principles • Building blocks • Add-on (usage tracking) • Usage • Grid experience • Schedulers • Key features • MonALISA policy • Contributions • GUI, dispatchers • Future work & Conclusion Jérôme LAURET, RHIC-STAR/BNL

  3. Project overview Jérôme LAURET, RHIC-STAR/BNL

  4. The STAR Experiment • The Solenoidal Tracker At RHIC • http://www.star.bnl.gov/ is an experiment located at BNL (USA) • A collaboration of 546 people spanning 12 countries • A PByte-scale experiment overall (raw, reconstructed events, simulation) with a large number of files (several million) • Run4 alone (2003-2004) has produced 200 TB of raw data • Rich set of data analysis and simulation problems • Expecting 200 TB of reconstructed data • 40 TB of MuDST (1 pass) • Files copied to Tier1 using SRM tools (see Track 4, 344) Jérôme LAURET, RHIC-STAR/BNL

  5. Problem statement • Ongoing analysis • Past and new sets of data are constantly analyzed • Data spread over many locations • Multiple sites and storage types, some on distributed disk local to each machine and not easily accessible • Evolving technologies • Distributed computing (re)shapes itself as we make progress: Condor-G, portals, Meta-Schedulers, Web Services, Grid Services, … • Batch technologies themselves evolve • Users have to adapt while staying productive within an ever-growing scientific program; this may be fine for a new experiment, not for a running one Jérôme LAURET, RHIC-STAR/BNL

  6. Solution • Allow users to pursue their scientific endeavors without disruption • Make use of current/available resources • Ensure the same productivity (subjective without metrics) • Develop a front end shielding the user from technology details and changes – job concept abstraction • Attract users to migrate to the new framework & Grid => data management, file relocation => Catalog • Design a tool/framework allowing for evolution • Changing the underlying technology should NOT mean a change in the user's daily routine • The framework should allow for testing ideas, plug-in of new components (Dispatchers for Local Resource Managers = LRMS), and moving users to distributed computing without requiring extraneous knowledge Jérôme LAURET, RHIC-STAR/BNL

  7. And so SUMS was born … • Project started in 2002 • Light developer team (on average ~1.0 FTE) • Surrounding activities have enriched the project and spawned activities and collaborations (monitoring, U-JDL, resource-brokering studies, …) • Historically • STAR project; design and prototype responsibility taken by WSU • Project enhanced and brought to the user community (Gabriele Carcassi) • Current development & design (Levente Hajdu) • Entirely written in Java • Portable, modular, class-based design • Project management, auto-documentation, … Jérôme LAURET, RHIC-STAR/BNL

  8. Design / Architecture - Open Jérôme LAURET, RHIC-STAR/BNL

  9. Basic principles • Users do NOT write shell scripts or submit series of tag=value pairs • Instead, they write an XML job description – the U-JDL – following a prescribed schema • It describes their “intent” to work on files, a DataSet, collections, etc. … • They do not have to know where those files are located (LFNs or collections may convert to PFNs) • They do not have to handle the gory details of resource management (bsub –R …) • They do not need to think about where their job will best fit; their input to SUMS is rate or range indications • Submission is then: % star-submit MyJob.xml % star-submit-template –template MyTemplateJob.xml –entities jobname=test,year=2004 Jérôme LAURET, RHIC-STAR/BNL
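
Not in the original slides: a minimal sketch, in Java (the language SUMS is written in), of what the template form of submission conceptually does with the –entities jobname=test,year=2004 option. The &key; placeholder convention, file names and class name are assumptions made for illustration, not the real star-submit-template behaviour.

    import java.nio.file.*;
    import java.util.*;

    public class TemplateSubstitution {
        public static void main(String[] args) throws Exception {
            // Entities as given on the command line; the &key; placeholder
            // syntax is an assumption made for this sketch.
            Map<String, String> entities = new LinkedHashMap<>();
            entities.put("jobname", "test");
            entities.put("year", "2004");

            String xml = Files.readString(Path.of("MyTemplateJob.xml"));
            for (Map.Entry<String, String> e : entities.entrySet()) {
                xml = xml.replace("&" + e.getKey() + ";", e.getValue());
            }
            // The expanded description is what would then be submitted.
            Files.writeString(Path.of("MyJob.xml"), xml);
        }
    }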

  10. Query/Wildcard resolution – what it does: the user's job description (test.xml, below) is processed (User Input … Policy … dispatcher) and resolved into numbered scheduler file lists and scripts (sched1043250413862_0.list / .csh, _1, _2, …), each filled with physical file names such as /star/data09/reco/productionCentral/FullFie... The job description reads:

    <?xml version="1.0" encoding="utf-8" ?>
    <job maxFilesPerProcess="500">
      <command>root4star -q -b rootMacros/numberOfEventsList.C\(\"$FILELIST\"\)</command>
      <stdout URL="file:/star/u/xxx/scheduler/out/$JOBID.out" />
      <input URL="catalog:star.bnl.gov?production=P02gd,filetype=daq_reco_mudst" preferStorage="local" nFiles="all"/>
      <output fromScratch="*.root" toURL="file:/star/u/xxx/scheduler/out/" />
    </job>

Jérôme LAURET, RHIC-STAR/BNL

  11. Architecture / building blocks • Main boxes are Java classes • The framework chooses the blocks to use depending on user options (% … -policy XXX) • Interfaces between blocks are identical • Implementations of the Policy class are the heart of SUMS (decision making, planning, resource brokering, …) – extendable, adaptable Jérôme LAURET, RHIC-STAR/BNL
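
A rough sketch, in Java, of the plug-and-play idea behind these building blocks. The interface and class names below are illustrative assumptions, not the actual SUMS classes: the point is only that every block hides behind a small, identical interface so a Policy or Dispatcher can be swapped from the command line.

    import java.util.List;

    interface JobRequest { /* what the Job Initializer builds from the U-JDL */ }

    record JobAssignment(JobRequest request, String queueName) { }

    interface Policy {
        // Decision making / planning / resource brokering: which queue gets which piece.
        List<JobAssignment> assign(JobRequest request);
    }

    interface Dispatcher {
        // Translates an assignment into an LRMS/DRMS-specific submission.
        void submit(JobAssignment assignment);
    }

    class Submitter {
        void run(JobRequest request, Policy policy, Dispatcher dispatcher) {
            // The framework picks concrete Policy/Dispatcher implementations
            // at run time, e.g. from a "-policy XXX" option.
            for (JobAssignment a : policy.assign(request)) {
                dispatcher.submit(a);
            }
        }
    }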

  12. Job Initializer • The XML is validated and request objects are created … Jérôme LAURET, RHIC-STAR/BNL
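
For illustration only, a minimal sketch of such a validate-then-build step using the standard JAXP APIs. The schema file name (ujdl.xsd) and the fields pulled out of the document are assumptions, not the actual SUMS Job Initializer code.

    import java.io.File;
    import javax.xml.XMLConstants;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.transform.stream.StreamSource;
    import javax.xml.validation.SchemaFactory;
    import org.w3c.dom.Document;
    import org.w3c.dom.Element;

    public class JobInitializerSketch {
        public static void main(String[] args) throws Exception {
            File xml = new File("MyJob.xml");

            // 1. Validate the user's XML against the (hypothetically named) U-JDL schema.
            SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            sf.newSchema(new File("ujdl.xsd")).newValidator().validate(new StreamSource(xml));

            // 2. Build a request object from the validated document.
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(xml);
            Element job = doc.getDocumentElement();
            String command = job.getElementsByTagName("command").item(0).getTextContent();
            int maxFiles = Integer.parseInt(job.getAttribute("maxFilesPerProcess"));
            System.out.println("command=" + command + ", maxFilesPerProcess=" + maxFiles);
        }
    }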

  13. Queues • The queue concept is “open” • A queue can be an LRMS queue (PBS, LSF, SGE, …) • A queue can be a pool or a DRMS (Condor, Condor-G, …) • A Web or Grid Service • … anything for which a dispatcher can be written • The queue object is • Defined by a name (may be logical) • Associated to a dispatcher (has a pointer to a dispatcher object) – the LSFDispatcher uses logical name = queue name • Has resource requirements • CPU-time limits, memory limits, the type of storage it can access, storage limits • Base rule: attributes can be undefined (-1), which Policies must expect Jérôme LAURET, RHIC-STAR/BNL
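
A short illustrative sketch of this queue abstraction in Java; the class and field names are invented for the example, but the -1 = undefined convention and the dispatcher association follow the slide.

    class QueueDescription {
        static final int UNDEFINED = -1;

        String name;             // logical name; for LSF it maps to the real queue name
        String dispatcherClass;  // e.g. "LSFDispatcher", "CondorGDispatcher" (illustrative)
        int cpuTimeLimitMin = UNDEFINED;
        int memoryLimitMB   = UNDEFINED;
        String storageType;      // type of storage the queue can access, null if unspecified

        boolean fitsCpuTime(int requestedMin) {
            // An undefined limit means "no constraint known" -- Policies must expect this.
            return cpuTimeLimitMin == UNDEFINED || requestedMin <= cpuTimeLimitMin;
        }
    }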

  14. Policies • Policies integrate pre-defined queues • Serialized XML as local configuration • A policy can make use of as many queues as necessary • Queues may have • a type (LSF, PBS, Condor, …) • a scope (Local, Distributed, …) • This allows SUMS to decide which one to take depending on the resource-brokering decision • Queues can be given an initial weight (for example, used for ordering if weight = priority) • Queues have a weight increment • Complex policies may order queues as necessary (your choice) – the default orders by weight (priority) Jérôme LAURET, RHIC-STAR/BNL
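
A hedged sketch of the default weight-based ordering described above; the class names and the exact way the weight increment is applied are assumptions made for illustration.

    import java.util.*;

    class WeightedQueue {
        String name;
        String scope;       // "Local", "Distributed", ...
        double weight;      // initial weight ~ priority
        double increment;   // weight increment, applied as the queue is used

        WeightedQueue(String name, String scope, double weight, double increment) {
            this.name = name; this.scope = scope;
            this.weight = weight; this.increment = increment;
        }
    }

    class DefaultOrderingPolicy {
        List<WeightedQueue> order(List<WeightedQueue> queues) {
            List<WeightedQueue> sorted = new ArrayList<>(queues);
            // Default ordering: highest weight (priority) first.
            sorted.sort(Comparator.comparingDouble((WeightedQueue q) -> q.weight).reversed());
            return sorted;
        }

        void markUsed(WeightedQueue q) {
            // Assumption: the increment nudges the weight each time the queue is used,
            // biasing subsequent ordering.
            q.weight += q.increment;
        }
    }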

  15. Policy note – job splitting • The <input> element can take several forms • Transition formats: PFN, PFN (wildcard) • <input URL="file:/star/data15/reco/productionCentral/FullField/P02ge/2001/322/st_physics_2322006_raw_0016.MuDst.root" /> • <input URL="file:/star/data15/reco/productionCentral/FullField/P02ge/2001/*/*.MuDst.root" /> • Locally distributed PFN support • <input URL="file://rcas6078.rcf.bnl.gov/home/starreco/reco/productionCentral/FullField/P02gd/2001/279/st_physics_2279005_raw_0285.MuDst.root" /> • List support • <input URL="filelist:/star/u/user/username/filelists/mylist.list" /> • Dataset, MetaData support • <input URL="catalog:star.bnl.gov?production=P02gd,filetype=daq_reco_mudst,storage=local" nFiles="2000" /> • … LFN support on the way … • Preferred STAR usage: map MetaData/Collections or LFNs to PFNs and dispatch jobs (splitting sketched after this slide) – BUT THERE ARE TWO WAYS – • PFNs converted (the URL syntax does not end up in the final lists; applications work as usual) • Lists are formatted and passed to the application as URLs, and the application needs to sort out the URLs (example: rootd-style URLs are passed as-is) Jérôme LAURET, RHIC-STAR/BNL
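
A minimal sketch of the splitting step this slide implies: once the <input> URL (PFN, wildcard, filelist or catalog query) has been resolved to a list of physical files, the list is chopped into per-job sub-lists of at most maxFilesPerProcess entries, one scheduler job per sub-list. Class and method names are illustrative.

    import java.util.*;

    public class FileListSplitter {
        static List<List<String>> split(List<String> files, int maxFilesPerProcess) {
            List<List<String>> jobs = new ArrayList<>();
            for (int i = 0; i < files.size(); i += maxFilesPerProcess) {
                // One sub-list -> one sched..._N.list / .csh pair -> one dispatched job.
                jobs.add(files.subList(i, Math.min(i + maxFilesPerProcess, files.size())));
            }
            return jobs;
        }

        public static void main(String[] args) {
            List<String> files = new ArrayList<>();
            for (int i = 0; i < 1200; i++) files.add("/star/data09/reco/.../file_" + i + ".MuDst.root");
            // With maxFilesPerProcess="500" this yields 3 jobs: 500 + 500 + 200 files.
            System.out.println(split(files, 500).size() + " jobs");
        }
    }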

  16. Dispatchers • The high-level dispatcher • redirects to • PBS • LSF • SGE • Condor • Condor-G • BOSS • … Jérôme LAURET, RHIC-STAR/BNL
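
An illustrative sketch of what one such dispatcher does: translate one job (command, queue, stdout) into the submission syntax of its back-end, shown here for LSF's bsub. The Job fields are assumptions; a PBS/SGE dispatcher would emit qsub and a Condor dispatcher condor_submit instead.

    import java.util.List;

    class Job {
        String command;     // e.g. root4star -q -b numberOfEventsList.C(\"$FILELIST\")
        String queue;       // target LSF queue
        String stdoutPath;  // where the job's stdout should go
    }

    class LSFDispatcher {
        void submit(Job job) throws Exception {
            // Basic LSF submission: bsub -q <queue> -o <stdout> <command>
            List<String> cmd = List.of("bsub", "-q", job.queue, "-o", job.stdoutPath, job.command);
            new ProcessBuilder(cmd).inheritIO().start().waitFor();
        }
    }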

  17. Add-On – Usage monitoring • Needed usage feedback - monitoring users' usage to • Allow for a better-targeted tool • Focus can be put on the most used/preferred features • CS fantasy trimmed down • Serves the user community better • Eliminates divergence and re-focuses effort • Practicality first, SciFi later … • Ensures equity of usage • Helps refocus tutorials & documentation • JSP-based (Tomcat) with a MySQL back-end • All options and usage are recorded Jérôme LAURET, RHIC-STAR/BNL
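
A minimal sketch, assuming a hypothetical sums_usage table, of the kind of record such a MySQL back-end could store for every submission. Table name, columns and connection details are invented for illustration; only standard JDBC is used.

    import java.sql.*;

    public class UsageRecorder {
        public static void record(String username, String policy, String inputType, int nJobs)
                throws SQLException {
            String url = "jdbc:mysql://dbhost.example.org/sums";  // placeholder host/database
            try (Connection c = DriverManager.getConnection(url, "sums", "secret");
                 PreparedStatement ps = c.prepareStatement(
                     "INSERT INTO sums_usage (username, policy, input_type, n_jobs, submitted_at) "
                     + "VALUES (?, ?, ?, ?, NOW())")) {
                ps.setString(1, username);
                ps.setString(2, policy);
                ps.setString(3, inputType);  // e.g. "catalog", "filelist", wildcard PFN
                ps.setInt(4, nJobs);
                ps.executeUpdate();
            }
        }
    }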

  18. Example of useful information … • Which storage type is most used … may very well be a $$ / accessibility question • Implemented two ways of accessing locally distributed files – are they used? • Added the SGE dispatcher a few weeks ago … Jérôme LAURET, RHIC-STAR/BNL

  19. Example II-a • [graphs: job counts at PDSF and BNL] • ~4500 jobs/day, with peaks at 20k Jérôme LAURET, RHIC-STAR/BNL

  20. Example II-b • The pessimistic graph is an integral count over time • It shows that after first usage, users keep using SUMS … • NB: the drop from the beginning of the summer indicates • Vacation time  • Conference time  • Lack of new data  • (this is not the best period for a SUMS commercial, but informative nonetheless) • See more statistics at http://www.star.bnl.gov/STAR/comp/Grid/scheduler/ Jérôme LAURET, RHIC-STAR/BNL

  21. Physicist usage • As far as we know, 85% of active users use SUMS • Selection of publications confirmed as 100% SUMS-based analyses • J. Gonzales - Nuclear Experiment, abstract nucl-ex/0408016, Pseudorapidity Asymmetry and Centrality Dependence of Charged Hadron Spectra in d+Au Collisions at sqrt(sNN)=200 GeV (submitted to PRC) • L. S. Barnby – QM Proceedings - 2004 J. Phys. G: Nucl. Part. Phys. 30 S1121-S1124 • T. Henry - Full jet reconstruction in d+Au and p+p collisions at RHIC, Journal of Physics G: Nuclear Physics (volume 30, issue 8) S1287 • J.S. Lange - Proceedings 19th Winter Workshop on Nuclear Dynamics (2003), nucl-ex/0306005 - Review of search for heavy flavor (c,b quarks) production in leptonic decay channels in Au+Au collisions at sqrt(sNN)=200 GeV at the STAR Experiment at RHIC • A. Tang - Anisotropy at RHIC: the first and the fourth harmonic • … • http://www.star.bnl.gov/central/publications/ (7 papers / analyses submitted in the past 3 months) Jérôme LAURET, RHIC-STAR/BNL

  22. Grid experience • Use of SUMS for Grid job submission is possible • Modulo RSL extensions • <input> / <output> tags MUST specify paths as relative paths (“bla.root”, “blop/test.dat”, …) • The <output> attributes fromScratch / toURL are designed to bring the files back (globus-url-copy; see the sketch after this slide) • Grid experience has been a challenge • Cryptic messages: we had a problem with globus error 74 and no clue what it was for months; no Grid help-desk, no knowledge-base index. It turned out to be a firewall issue causing bursts of massive job deaths • Nonetheless • ¼ of the Run4 simulation production was made on the Grid • 100,000 events generated, analysis ongoing • Success rate • 85% when all goes well • 60% when lots of jobs are submitted (the above issue) • Planning to run on larger-scale platforms, Grid3+ and/or OSG-0, with (hopefully) better ways to track errors/problems Jérôme LAURET, RHIC-STAR/BNL
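
A hedged sketch of the output bring-back step mentioned above: files matching the fromScratch pattern are copied to toURL with globus-url-copy once the job is done. The helper class, paths and the assumption that toURL ends with a slash are hypothetical; only the basic globus-url-copy <source> <destination> usage is assumed.

    import java.io.File;

    public class OutputRetriever {
        public static void bringBack(File scratchDir, String toURL) throws Exception {
            // e.g. fromScratch="*.root" -> pick up the .root files in the scratch area.
            File[] results = scratchDir.listFiles((dir, name) -> name.endsWith(".root"));
            if (results == null) return;
            for (File f : results) {
                // globus-url-copy <source URL> <destination URL>; toURL assumed to end with "/".
                new ProcessBuilder("globus-url-copy",
                                   "file://" + f.getAbsolutePath(),
                                   toURL + f.getName())
                        .inheritIO().start().waitFor();
            }
        }
    }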

  23. Schedulers Jérôme LAURET, RHIC-STAR/BNL

  24. Schedulers • Can a user front end to other LRMS/DRMS be called a “scheduler”? • Is using local resources within the same paradigm as globally distributed resources? Jérôme LAURET, RHIC-STAR/BNL

  25. Schedulers • Key features for a scheduler • Keeps global accounting • Scheduling decisions may be based on • Resource availability, respect of local policies, fair-share (cluster autonomy) • Advance reservation, best use of resources • Network and data cache, data availability • … • Job migration, moving jobs to/from a trusted cluster • Spanning and workflow • Human-readable messages • … • The scheduling algorithm can be complex • Attempts to predict (Weather Services) have proven difficult • Dedicated global accounting and standard messages are possible • A mix of LRMS and DRMS capabilities (user autonomy) is not common • A complex algorithm has to take many parameters into account … • Empirical approach • Inspect queue behavior, send jobs, see how the queue reacts … re-adjust • Self-sustained system • Adapts to network/resource/load changes? Jérôme LAURET, RHIC-STAR/BNL

  26. Monitoring Policy – Empirical approach (?) [diagram: MonALISA monitoring feeding the Policy, dispatching to LSF] • Information is fed by agents to MonALISA (ML) • Information is recovered by a SUMS module • Scheduling decisions are made based on load and “queue” or “pool” response time • Self-sustained system (no need for percentage-based submission branching) • Hopefully no need for a complex algorithm • Responds as resources, priorities and bandwidth adjust • Results / details in Efstratios Efstathiadis' presentation, Track 4 - 393 Jérôme LAURET, RHIC-STAR/BNL
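
An illustrative sketch of such a monitoring-driven policy: agents publish load and response times into MonALISA, the policy reads the latest values and sends jobs to whichever queue or pool currently responds fastest. The MonitorSource interface stands in for the real MonALISA client and, like the choose-the-fastest rule, is an assumption for illustration.

    import java.util.*;

    interface MonitorSource {
        // Latest measured response time (seconds) for a queue or pool, as published by agents.
        double responseTimeSeconds(String queueName);
    }

    class MonitoredPolicy {
        private final MonitorSource monitor;

        MonitoredPolicy(MonitorSource monitor) { this.monitor = monitor; }

        String chooseQueue(List<String> candidateQueues) {
            // No fixed percentage-based branching: the choice self-adjusts as the
            // monitored response times change with load, priorities and bandwidth.
            return candidateQueues.stream()
                    .min(Comparator.comparingDouble(monitor::responseTimeSeconds))
                    .orElseThrow();
        }
    }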

  27. Contributions • The RHIC/PHENIX collaboration has tested and is using SUMS • Contributions included the addition of dispatchers (PBS, BOSS) – Andrey Shevel • Development includes the creation of a GUI front end for end-users – Mike Reuter • Job tracking and monitoring • SUMS allows for dispatching to ANY queue • BOSS (from CMS) is a possible solution as “a” dispatcher • Implemented / contributed by Andrey Shevel (PHENIX/SUNY-SB) – Track 5, 86 (BODE tracking) Jérôme LAURET, RHIC-STAR/BNL

  28. Future work • High-level user JDL work • Started with a document on RDL (PPDG-39) • Motivation • The current U-JDL is simple enough but has its limitations • Extension to new resource requirements is possible but inelegant • The U-JDL considers most (but not all) data sets • Lacks the concept of tasks and sandboxes • Workflow diagrams: only AND (sequential) is implemented (need OR, conditional branching, etc. …) • SBIR with Tech-X (David Alexander) • Deliverables • An enhanced and complete U-JDL (AJHDL) • A WSDL for creating a Grid Service • Reviewed most available high-level JDLs • Job Submission Description Language (JSDL) (GGF) • Analysis Job Description Language (AJDL) (Atlas) • User Request Description Language (URDL) (PPDG-39 / JLab/STAR) • Job Description Language (JDL) (DataGrid) • Job Description Language (JDL) (JLab) • … Jérôme LAURET, RHIC-STAR/BNL

  29. Future work • We promised our users the U-JDL will not change • As far as they can tell, it won't (XSLT, schema transformation) • But those using AJHDL will have access to more features • We are working on job tracking • We are working on the concept of a Meta-Log (application-level monitoring) • An aspect that seems to be forgotten • Valeri Fine – Poster, 480 Jérôme LAURET, RHIC-STAR/BNL

  30. Conclusions • SUMS is NOT • a batch system • a toy (real needs, real use, real Physics) • SUMS is • A front end to local and distributed RMS, acting like a client to multiple, heterogeneous RMS • A flexible, open-architecture, object-oriented framework with plug-and-play features • A good environment for further developing • Standards (such as a high-level JDL) • Scalability of other components (ML work, immediate use) • Used in STAR for real Physics (usage and publication list) • Used for Distributed / Grid simulation job submission • Used successfully by other experiments • A means to transition active users to distributed computing and recover under-used resources … • … Jérôme LAURET, RHIC-STAR/BNL
