Site Migration to WMS

Site Migration to WMS. ALICE TF Meeting, 30/10/08. WMS Migration (I). Several security issues were found at sites still using the LCG-RB. The LCG-RB has been a deprecated service for months. Goal: migration to the gLite 3.1 WMS at all sites, scheduled by the middle of November.

Presentation Transcript


  1. Site Migration to WMS. ALICE TF Meeting, 30/10/08

  2. WMS Migration (I)
     • Several security issues were found at sites still using the LCG-RB
     • The LCG-RB has been a deprecated service for months
     • Goal: migration to the gLite 3.1 WMS at all sites
       • Scheduled by the middle of November
       • This is just a medium-term approach while waiting for the CREAM-CE deployment
     • Current situation: all sites have migrated except:
       • French confederation
         • Still waiting for at least 1 WMS for ALICE in the country
       • Birmingham
         • They have confirmed the NIKHEF approach: access to the local VOBOX restricted to 2 persons, Latchezar and me
       • Madrid
         • A WMS will be provided by the 5th of November
       • UNAM
         • They have confirmed a WMS for ALICE use
       • Kolkata
         • No news
     • In addition: NIKHEF has guaranteed a WMS for ALICE in November. In the meantime the site has been configured with the SARA WMS

  3. WMS Migration (II)
     • The WMS migration requires some tuning of the ALICE submission approach
     • We must include a new field in the pilot JDLs
       • This is what we have: RetryCount = 0; (deep resubmission)
       • This is what we still miss: ShallowRetryCount = 0; (shallow resubmission)
       • Difference: the resubmission is deep when the job fails after it has started running on the WN, and shallow otherwise
     • The lack of this field exposed us to a problem at CERN last week:
       • An error in the CE YAIM configuration at CERN mapped all alicesgm users to non-existing accounts
       • By default ShallowRetryCount is set to 10, so pilots were resubmitted up to 10 times before aborting
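As an illustration of the missing field, the following minimal Python sketch renders a pilot JDL in which both resubmission limits are set explicitly. The helper name `pilot_jdl` and the executable name `pilot.sh` are hypothetical; only the `RetryCount` and `ShallowRetryCount` attributes and their value 0 come from the slide.

```python
def pilot_jdl(executable, retry_count=0, shallow_retry_count=0):
    """Render a minimal pilot JDL with both resubmission limits set explicitly.

    retry_count         -> deep resubmission (job failed after starting on the WN)
    shallow_retry_count -> shallow resubmission (job failed before starting)
    """
    lines = [
        "[",
        f'  Executable = "{executable}";',  # hypothetical pilot script name
        f"  RetryCount = {retry_count};",
        f"  ShallowRetryCount = {shallow_retry_count};",
        "]",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(pilot_jdl("pilot.sh"))
```

Setting both values in the generated JDL, rather than relying on the service default, avoids the repeated shallow resubmissions described for the CERN incident.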

  4. WMS Migration (III)
     • The WMS offers jobs a new feature that the RB did not have:
       • If the required queue is not available, the job does not die. It is kept for a configurable time and then resubmitted (case 1)
       • The same happens if the WMS is temporarily overloaded (case 2)
     • Given the ALICE submission approach, this can cause a mess
     • The retention time has been configured at CERN and reduced to 2h
       • It is working fine
     • This configuration is per service, not per VO
       • A concern if ALICE shares the WMS with other VOs that have opposite requirements (!)

  5. WMS Migration (IV)
     • Case 1
       • Workaround: SAM
       • Implementation of a new test related to the WMS sensor
       • A dummy job will be sent every 30 min (1h) to each site
       • 10 min after submission, the test will check the status of this job
       • If it is still «Waiting», the WMS is most probably overloaded
       • The WMS will then be removed from the VOBOX
     • Case 2
       • A «drain flag» definition is foreseen for the WMS
       • In this case, if one WMS is overloaded, the submission will pass automatically to the 2nd WMS defined
       • This holds if the list of WMS follows a load-balancing approach
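The Case 1 test can be sketched in Python as follows. This is only an outline under stated assumptions: `submit` and `get_status` are hypothetical callables standing in for the real submission and status commands, and the function returns the WMS that should stay configured rather than editing the VOBOX itself.

```python
import time


def healthy_wms(wms_list, submit, get_status, wait_seconds=600, sleep=time.sleep):
    """Probe each WMS with a dummy job and keep only those that reacted.

    submit(wms)        -> job id of a dummy job sent through that WMS (hypothetical)
    get_status(job_id) -> status string of the job, e.g. "Waiting" (hypothetical)
    """
    # Send one dummy job through every WMS in the list.
    jobs = {wms: submit(wms) for wms in wms_list}
    # Wait 10 minutes (by default) before looking at the job status.
    sleep(wait_seconds)
    # A job that is still "Waiting" suggests an overloaded WMS: drop that WMS.
    return [wms for wms in wms_list if get_status(jobs[wms]).lower() != "waiting"]
```

Passing `sleep` as a parameter keeps the sketch easy to exercise without actually waiting 10 minutes.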

  6. WMS Migration (V)
     • The current failover mechanism is not sufficient for WMS use:
       • Use RB1. If it fails…
       • Use RB2. If it fails…
       • Use RB3
     • The current mechanism is not able to identify an overloaded WMS
     • In order to exploit the full potential of the drain flag feature we should instead:
       • Use RB1 OR RB2 OR RB3. If all these WMS fail…
       • Use RB4 OR RB5 OR RB6
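A minimal Python sketch of the grouped failover proposed above. The callable `try_submit` is hypothetical and is assumed to return True on a successful submission; picking a random order inside a group is also an assumption, used here as a stand-in for whatever load balancing the WMS list provides.

```python
import random


def submit_with_groups(groups, try_submit, rng=random):
    """Try every WMS of the first group, then fall back to the next group.

    groups     -> list of lists, e.g. [["RB1", "RB2", "RB3"], ["RB4", "RB5", "RB6"]]
    try_submit -> hypothetical callable: try_submit(wms) returns True on success
    Returns the WMS that accepted the job, or None if every group failed.
    """
    for group in groups:
        candidates = list(group)
        rng.shuffle(candidates)  # assumed stand-in for load balancing in a group
        for wms in candidates:
            if try_submit(wms):
                return wms
    return None
```

Compared with the strictly sequential RB1, RB2, RB3 chain, the members of a group are interchangeable, so an overloaded or draining WMS can be skipped without leaving its group.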

  7. WMS Migration (VI)
     • The code defined above is now implemented at CERN and in Torino
     • LDAP configuration
       • wms1;wms2,wms3;wms4 (1st group: wms1;wms2, 2nd group: wms3;wms4)
     • In the VOBOX, this means the following:
       • $HOME/alien-logs/wms103.cern.ch;wms109.cern.ch.vo.conf
     • This file looks like:
         [
           VirtualOrganisation = "alice";
           WMProxyEndpoints = {"https://wms103.cern.ch:7443/glite_wms_wmproxy_server","https://wms109.cern.ch:7443/glite_wms_wmproxy_server"};
           MyProxyServer = "myproxy.cern.ch";
         ]
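The mapping from the LDAP value to the per-group configuration file can be sketched in Python as below. The function names are hypothetical, and the sketch assumes that "," separates groups and ";" separates the WMS inside a group, as the example on the slide suggests. The port, endpoint path and MyProxy server are copied from the file shown above.

```python
def parse_wms_groups(ldap_value):
    """Split 'wms1;wms2,wms3;wms4' into [['wms1', 'wms2'], ['wms3', 'wms4']]."""
    return [
        [host.strip() for host in group.split(";") if host.strip()]
        for group in ldap_value.split(",")
        if group.strip()
    ]


def conf_filename(group):
    """File name used for one group, e.g. 'wms103.cern.ch;wms109.cern.ch.vo.conf'."""
    return ";".join(group) + ".vo.conf"


def conf_body(group, vo="alice", myproxy="myproxy.cern.ch"):
    """Render the configuration listing all WMProxy endpoints of one group."""
    endpoints = ",".join(
        f'"https://{host}:7443/glite_wms_wmproxy_server"' for host in group
    )
    return (
        "[\n"
        f'  VirtualOrganisation = "{vo}";\n'
        f"  WMProxyEndpoints = {{{endpoints}}};\n"
        f'  MyProxyServer = "{myproxy}";\n'
        "]"
    )


if __name__ == "__main__":
    for group in parse_wms_groups("wms103.cern.ch;wms109.cern.ch"):
        print(conf_filename(group))
        print(conf_body(group))
```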
