
ALICE job submission system - use and status of the CREAM-CE

ALICE job submission system - use and status of the CREAM-CE. Patricia Méndez Lorenzo (IT/GS) ALICE Offline Week (18th March 2009). Introduction. ALICE is interested in the deployment of the CREAM-CE service at all sites which provide support to the experiment


Presentation Transcript


  1. ALICE job submission system - use and status of the CREAM-CE
     Patricia Méndez Lorenzo (IT/GS)
     ALICE Offline Week (18th March 2009)

  2. Introduction
     • ALICE is interested in the deployment of the CREAM-CE service at all sites supporting the experiment
     • GOAL: deprecate the use of the WMS in favour of direct CREAM-CE submission
       • Submission to the CREAM-CE via the WMS is not required
     • ALICE has been testing the CREAM-CE in the real production environment since the beginning of summer 2008
     • For the time being, ALICE is the only LHC experiment performing stress and real tests of the CREAM-CE
     This talk focuses on the ALICE experience with the CREAM-CE, the expectations, the future plans and the requirements for all sites.
     ALICE Offline Week -- CREAM-CE Use and Status for ALICE

  3. The CREAM-CE
     • CREAM (Computing Resource Execution And Management): a lightweight service for job management operations at the CE level
     • Intended to replace the current LCG-CE
     • Submission procedures supported by CREAM:
       • Submission via the WMS
       • Direct submission via generic clients
     • The submission method depends mainly on the experiment's computing model
       • Pilot-based models normally follow the direct submission approach (4 LHC experiments)
       • Bulk submission of real jobs follows the WMS submission approach (CMS)

  4. Direct Submission to CREAM-CE
     • Extra elements are required for direct submission
       • Proxy renewal mechanism (required by CMS and ATLAS)
         • Automatically renews the user proxy when it is about to expire
         • Recently made available
       • The lack of this element is not a showstopper for ALICE
         • 48 h VOMS extensions are guaranteed by the CERN security team
         • Enough to run production/analysis jobs without any additional extension
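The argument on this slide reduces to a lifetime comparison. As long as the 48 h VOMS extension outlives every job agent, no proxy renewal service is needed. The following is a minimal Python sketch of that check. It assumes a hypothetical one-hour safety margin and illustrative job durations, neither of which comes from the talk.

```python
from datetime import datetime, timedelta

# Lifetime of the VOMS extension quoted on the slide
VOMS_LIFETIME = timedelta(hours=48)


def needs_renewal(proxy_created: datetime, job_start: datetime,
                  job_duration: timedelta,
                  safety_margin: timedelta = timedelta(hours=1)) -> bool:
    """Return True if the proxy expires before the job (plus a margin) has finished.

    The one-hour safety_margin default is a hypothetical choice for illustration.
    """
    proxy_expiry = proxy_created + VOMS_LIFETIME
    job_end = job_start + job_duration + safety_margin
    return job_end > proxy_expiry


if __name__ == "__main__":
    created = datetime(2009, 3, 18, 9, 0)
    # A 24 h agent started 2 h after proxy creation fits in the 48 h window
    print(needs_renewal(created, created + timedelta(hours=2), timedelta(hours=24)))  # False
```

A renewal mechanism only becomes necessary when agents start late in the proxy's life or run longer than the remaining lifetime.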

  5. The 1st test phase
     • Performed in summer 2008 at FZK (T1 site, Germany)
     • Tests operated through a second VOBOX running in parallel to the existing service at the T1 (operating in WMS submission mode)
     • Access to the local CREAM-CE was provided through the PPS infrastructure
       • Initially 30 CPUs
       • Moved to the ALICE production queue within a few weeks (production setup)
     • Intensive functionality and stability tests from July to September 2008
     • Production stopped in order to create an ALICE CREAM module in AliEn and to allow the site to upgrade its system
     • Excellent support from the CREAM-CE developers and the site admins
       • Especially Massimo Sgaravatto (INFN-Padova) and Angela Poschlad (GridKa T1 site)

  6. Results of the 1st test phase
     [Plots: jobs running on PPS nodes; jobs running on the production queue]
     • More than 55000 jobs successfully executed through the CREAM-CE during this period
     • No interventions on the VOBOX required during the testing phase
     • CREAM-CE used to distribute real (standard) ALICE jobs

  7. Implementation in AliEn (I)
     • Creation of a new CREAM module
       • Specific to CREAM-CE submissions
       • Available since AliEn v2-16
       • Runs in parallel with the usual LCG module (restricted to WMS submissions only)
     • Change in the JDL construction
       • The ALICE JDL contained the OutputSandbox field, which specifies the standard outputs of the job agents
       • The CREAM-CE requires a new JDL field declaring the gridftp server to which the standard outputs are sent for retrieval
       • ALICE PROCEDURE: remove the OutputSandbox field from the JDL files created by the CREAM module
         • The field is kept only when submitting in debug mode
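The JDL change can be sketched as follows. This is a minimal Python illustration, not the AliEn CREAM module. The agent executable and output file names are hypothetical. The attribute that declares the gridftp destination is assumed to be `OutputSandboxBaseDestURI`, as described in the CREAM JDL documentation; check it against the CREAM version deployed at the site.

```python
def build_cream_jdl(executable: str, gridftp_base_uri: str, debug: bool = False) -> str:
    """Return a ClassAd-style JDL for one job agent submitted directly to a CREAM-CE."""
    attrs = {"Executable": f'"{executable}"'}
    if debug:
        # Debug mode only: keep the agent's stdout/stderr and tell CREAM which
        # gridftp server to upload them to (attribute name assumed, see lead-in).
        attrs["StdOutput"] = '"agent.out"'
        attrs["StdError"] = '"agent.err"'
        attrs["OutputSandbox"] = '{"agent.out", "agent.err"}'
        attrs["OutputSandboxBaseDestURI"] = f'"{gridftp_base_uri}"'
    # Without debug mode no OutputSandbox is declared, so the gridftp server is not used
    body = "\n".join(f"  {key} = {value};" for key, value in attrs.items())
    return f"[\n{body}\n]"


if __name__ == "__main__":
    # Hypothetical agent script and gridftp URI
    print(build_cream_jdl("agent.sh", "gsiftp://vobox.example.org/alice/outputs", debug=True))
```

Keeping the output sandbox behind a debug flag means the gridftp server and its storage are only touched when someone actually needs the agent outputs.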

  8. Implementation in AliEn (II)
     • A gridftp server is required
       • Needed to retrieve the standard outputs of the job agents
       • Sites are free to decide on its implementation (proposal: VOBOX)
       • 200 GB of space required
       • It will be used ONLY if the submission has been done in debug mode
     • Change in the proxy renewal mechanism
       • Purpose: submission optimization
       • The user proxy is renewed only once per hour
       • In previous AliEn versions this procedure was executed BEFORE each agent submission
       • The procedure has ALSO been implemented in LCG.pm
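Renewing once per hour instead of before every submission amounts to throttling the renewal call. The following is a minimal Python sketch of that idea, not the AliEn implementation. It assumes a hypothetical `renew` callback that performs the actual proxy renewal. The clock is injectable so the behaviour can be tested without waiting an hour.

```python
import time
from typing import Callable, Optional


class ThrottledProxyRenewer:
    """Renew the user proxy at most once per interval instead of before every submission."""

    def __init__(self, renew: Callable[[], None], interval_s: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._renew = renew            # hypothetical callback doing the real renewal
        self._interval_s = interval_s  # one hour, as on the slide
        self._clock = clock            # injectable for testing
        self._last: Optional[float] = None  # time of the last renewal, None = never

    def maybe_renew(self) -> bool:
        """Renew if no renewal happened yet or the interval has elapsed; report whether it ran."""
        now = self._clock()
        if self._last is None or now - self._last >= self._interval_s:
            self._renew()
            self._last = now
            return True
        return False

    def submit_agent(self, submit: Callable[[], None]) -> None:
        """Submit one job agent, renewing the proxy first only when due."""
        self.maybe_renew()
        submit()
```

With a burst of hundreds of agent submissions, the renewal cost is paid once per hour rather than once per agent.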

  9. The 2nd test phase
     • After a debugging phase of the CREAM module in January 2009, the new CREAM module entered production on 19 February (start of the 2nd test phase)
     • Stability and performance are currently the most important test issues at the sites providing a CREAM-CE
     • The deployment of a 2nd VOBOX ensures that production continues in parallel through the WMS
       • A single VOBOX would require dedicated babysitting of the system (not realistic)
     • Feedback on all issues is provided directly to the CREAM developers
     • As of today, 11 sites are providing a CREAM-CE

  10. Site queues
     [Table: per-site status of the queues | 2nd VOBOX | VOBOX with clients | general status]

  11. Status of the sites (I)
     • FZK
       • Minor actions required during the 2nd test phase
         • Deletion of some sandbox directories (the 32K-subdirectory filesystem limit was hit again)
         • Procedure not necessary in future CREAM versions
       • 46530 jobs through the FZK CREAM-CE since 19 February
     • RAL
       • No special service maintenance actions reported by the site
       • 2678 jobs executed using the local CREAM-CE
     • Kolkata
       • Debugging phase performed directly with the developer (Massimo Sgaravatto)
       • In production since 9 March
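The manual clean-up at FZK amounts to deleting the oldest sandbox directories before the per-directory subdirectory limit is reached. The following is a hedged Python sketch of such a purge. It assumes that sandboxes are subdirectories of a single hypothetical root directory and that age is judged by modification time. It does not reproduce how CREAM itself lays out or purges its sandboxes.

```python
import os
import shutil


def purge_oldest_sandboxes(sandbox_root: str, max_subdirs: int) -> list[str]:
    """Remove the oldest subdirectories of sandbox_root until at most max_subdirs remain.

    Returns the removed paths, oldest first.
    """
    with os.scandir(sandbox_root) as it:
        dirs = [entry for entry in it if entry.is_dir(follow_symlinks=False)]
    excess = len(dirs) - max_subdirs
    if excess <= 0:
        return []  # still below the threshold, nothing to do
    # Sort by last modification time so the oldest sandboxes go first
    dirs.sort(key=lambda entry: entry.stat(follow_symlinks=False).st_mtime)
    removed = []
    for entry in dirs[:excess]:
        shutil.rmtree(entry.path)
        removed.append(entry.path)
    return removed
```

The threshold should sit safely below the filesystem limit quoted on the slide (32K subdirectories). Its exact value is a site choice.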

  12. Status of the sites (II)
     • CERN
       • Two CEs were provided to ALICE for testing on 9 March
       • In production since 10 March (voalice03 used for this production)
       • SLC5 WNs behind the CREAM-CE
       • 17247 jobs since 10 March
     • GSI
       • Setup of a 2nd VOBOX still pending
       • The CREAM-CE is performing well
     • CNAF
       • CREAM-CE ready to enter production at the end of February
       • Back in production since 13 March, after some instabilities observed last week (lack of automatic purge)
       • The information provider of the CREAM-CE shows some instabilities

  13. Status of the sites (III)
     • KISTI
       • Instabilities at the VOBOX level prevent the full setup of the local CREAM-CE in production
       • CREAM-CE system performing well
     • ATHENS
       • The CREAM-CE is working, but the site cannot be put in production
       • No CREAM clients on the VOBOX
     • IHEP
       • CREAM-CE not working yet (site admin working on it)
       • Missing infrastructure: no 2nd VOBOX (to be provided next week)

  14. Status of the sites (IV)
     • SARA
       • System tested yesterday evening with a few jobs
       • Still in testing phase
     • Torino
       • System in production since last week
       • Already 744 jobs executed through the local CREAM system
     • Subatech
       • 2nd VOBOX already provided; setup of the CREAM-CE is ongoing

  15. Reminder: How to provide CREAM-CE services for ALICE
     • At the pre-GDB meeting last October it was explicitly stated:
       • Unlikely to be deployable as an lcg-CE replacement on this timescale (downtime period), but we can continue with rollout in parallel.
     • In addition, the November pre-GDB meeting concluded:
       • Replacing the lcg-CE will require WMS submission to be in place and the proxy renewal issue to be resolved (among other points related to service performance)
       • However, deployment of the system in parallel to the LCG-CE was encouraged

  16. Reminder: How to provide CREAM-CE services for ALICE (II)
     • In terms of the ALICE computing model, the parallel LCG-CE/CREAM-CE setup means deploying a 2nd VOBOX
       • Each VOBOX can submit to one specific backend
       • One VOBOX → LCG-CE OR CREAM-CE submission: replacement approach
       • Two VOBOXes → LCG-CE AND CREAM-CE submission: parallel approach
     • This is a temporary solution for the parallel running phase
       • As soon as the replacement is ensured and the LCG-CE is deprecated, ALICE will no longer require a 2nd VOBOX
     • Remarks on the 2nd VOBOX deployment
       • Its setup is not a strict requirement
       • Each case can be studied individually
       • BUT! Sites with significant storage capacity for ALICE should be among the sites providing a 2nd VOBOX

  17. Reminder: How to provide CREAM-CE services for ALICE (III)
     • Setup of the ALICE production queue behind the CREAM-CE
       • This puts the CREAM-CE directly in production
     • GridFTP server
       • Required to retrieve the job (agent) outputs
       • Removed from the VOBOX in January 2008 with the deployment of the gLite 3.1 VOBOX
         • It was no longer required by the 4 LHC experiments at that time
       • No specific preference for where this service is placed
         • It can be provided on the VOBOX, but this is the site's decision

  18. Future Plans
     • Small changes in the CREAM module are still needed
       • The current CLI-based submission to the CREAM-CE allows only a single queue to be declared
       • Sites can provide several queues (especially T0/T1 sites)
       • Submission to several queues must therefore be implemented at the application level
     • PROPOSAL for ALICE (in 3 lines of code):
       • Define a range per queue at the LDAP level
       • Draw a random number before each agent submission
       • Assign the queue whose range contains the random number
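The three-line proposal maps directly onto code: per-queue ranges, one random draw, one range match. The following is a minimal Python sketch. It assumes the ranges are integer intervals [low, high) that tile 0..100, as they might be read from LDAP. The CREAM endpoint and queue names are hypothetical.

```python
import random

# Hypothetical per-queue ranges, as they might be published in LDAP: [low, high) on 0..100
QUEUE_RANGES = {
    "cream-ce.example.org:8443/cream-pbs-alice": (0, 70),
    "cream-ce.example.org:8443/cream-pbs-alicelong": (70, 100),
}


def pick_queue(ranges: dict[str, tuple[int, int]], rng: random.Random) -> str:
    """Pick the queue whose [low, high) range contains a fresh random number."""
    total = max(high for _, high in ranges.values())
    r = rng.randrange(total)  # random integer in [0, total), drawn before each submission
    for queue, (low, high) in ranges.items():
        if low <= r < high:
            return queue
    raise ValueError(f"no queue range covers {r}; ranges must tile [0, {total})")


if __name__ == "__main__":
    rng = random.Random()
    print(pick_queue(QUEUE_RANGES, rng))
```

With these ranges, roughly 70% of agents go to the first queue. The standard library's `random.choices(queues, weights=...)` would give the same distribution. The explicit ranges are kept here because they mirror the LDAP-based proposal.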

  19. Conclusions
     • The ALICE experience with the current CREAM-CE service is very positive
       • Stable (and maintenance-free) operation is achieved quickly after the initial debugging period
       • High performance and scalability (2000+ parallel jobs at FZK served by a single CREAM-CE)
     • Excellent support provided by the developers
       • Special thanks to Massimo Sgaravatto (INFN Padova)
     • ALICE is working with all sites to install a CREAM-CE
       • In full production before the start of data taking
