FTS deployment model for ATLAS

This article discusses the FTS deployment model for ATLAS, including the use of multiple FTS3 instances to ensure service resiliency and automatic failover. It also highlights common issues and the support provided by the FTS3 service managers and developers.

Presentation Transcript


  1. FTS deployment model for ATLAS
     Ale (Alessandro) Di Girolamo

  2. FTS servers: >1 … but not too many
     • FTS == FTS3.
       • FTS2 is dead.
     • Multiple FTS3 instances are useful (needed!) to guarantee service resiliency
       • even if one single instance (properly sized) could handle all the traffic
     • Too many instances (e.g. 10) are problematic in terms of upgrades, support, etc.
     • We decided on 3 instances
       • CERN, RAL and BNL

  3. FTS3 automatic failover
     • Not quite fully there
       • For submission it is “easy” to switch… Many other cases are not so easy: e.g. what to do with the jobs already submitted?
     • Examples from Michail Salichos (FTS3 master)
       • The DNS alias is down, but the FTS3 hosts are still processing transfer jobs, though clients can neither submit nor get the status
       • One of the FTS3 servers is down, so you will notice a small percentage of submission failures when this host is picked
       • Networking problems when connecting to the site
     • Great support from the FTS3 service managers and developers to improve in this area:
       • Dedicated discussions will be needed

  4. ATLAS Sites: which FTS?
     • The FTS3 server is an attribute of the destination site:
       • i.e. transfers to INFN-ROMA1_* are managed by the FTS3 servers defined in AGIS for the site.
       • Rucio, in case of submission problems, automatically fails over to the other servers.
     • Present config:
       • US, CA, DE, IT (312 DDMEndpoints):
         • FTS3: 1) BNL, 2) CERN, 3) RAL
       • CERN, ES, NL, ND, RU, TW (178 DDMEndpoints):
         • FTS3: 1) CERN, 2) RAL, 3) BNL
       • UK, FR (163 DDMEndpoints):
         • FTS3: 1) RAL, 2) BNL, 3) CERN
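
The ordered server preference on slide 4 can be illustrated with a small Python sketch. It is not Rucio's actual implementation: the `FTS3_PREFERENCE` table, the `SubmissionError` class, the `submit_with_failover` function and the `submit` callable are all hypothetical names introduced here, and keying the table on the cloud code (rather than on the per-site AGIS entry the slide describes) is a simplifying assumption. Only the cloud groups and the server orderings are taken from the slide.

```python
from typing import Callable, Dict, Tuple

# Cloud groups and server orderings as listed on the slide.
_GROUPS = [
    (("US", "CA", "DE", "IT"), ("BNL", "CERN", "RAL")),
    (("CERN", "ES", "NL", "ND", "RU", "TW"), ("CERN", "RAL", "BNL")),
    (("UK", "FR"), ("RAL", "BNL", "CERN")),
]

# Hypothetical lookup table: destination cloud -> FTS3 servers in order of preference.
FTS3_PREFERENCE: Dict[str, Tuple[str, ...]] = {
    cloud: servers for clouds, servers in _GROUPS for cloud in clouds
}


class SubmissionError(Exception):
    """Hypothetical error raised when a server does not accept a submission."""


def submit_with_failover(dest_cloud: str, submit: Callable[[str], None]) -> str:
    """Try each server for the destination cloud in order.

    `submit` is a caller-supplied function (an assumption of this sketch) that
    raises SubmissionError on failure. Returns the name of the server that
    accepted the job, or raises SubmissionError if every server failed.
    """
    errors = []
    for server in FTS3_PREFERENCE[dest_cloud]:
        try:
            submit(server)
            return server
        except SubmissionError as exc:
            errors.append(f"{server}: {exc}")
    raise SubmissionError("all FTS3 servers failed: " + "; ".join(errors))


def fake_submit(server: str) -> None:
    # Stand-in for a real submission call; pretends that RAL is unreachable.
    if server == "RAL":
        raise SubmissionError("host unreachable")


print(submit_with_failover("UK", fake_submit))  # BNL
```

This covers only the "easy" submission case. Jobs already accepted by a server that later goes down are not handled, which is the open issue raised on slide 3.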

  5. FTS3 configuration
     • Configuration is managed centrally by a team of FTS and experiment contacts
       • The central team takes care of adjusting the configuration to VO policy if needed
     • Default configuration is auto-configuration: the FTS3 optimizer
       • Very good and quick interaction with the FTS3 developers when strange behaviour is observed
     • Settings for specific endpoints can be applied if needed, e.g. at a site's request
       • Max active from/to SE, max MB/s from/to SE or pair of SEs, stop processing transfer jobs for an SE (downtime)…
       • ATLAS sites should contact atlas-adc-expert@cern.ch if they need such settings
     • Procedures have been agreed with the other experiments in the FTS3 task force:
       • will be re-discussed if needed
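
To make the per-endpoint settings on slide 5 concrete, here is a minimal Python sketch of how such overrides could sit on top of an automatically chosen limit. It does not use the FTS3 configuration API: `SeOverride`, `OVERRIDES`, `may_start_transfer` and the `example.org` storage names are hypothetical, and treating "max active" as a cap on the optimizer's value is an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class SeOverride:
    """Hypothetical per-storage-element override; None leaves the knob to the optimizer."""
    max_active_inbound: Optional[int] = None   # cap on concurrent transfers to the SE
    max_mb_per_s_inbound: Optional[float] = None  # throughput cap, unused in this sketch
    drained: bool = False  # True: stop processing transfer jobs (e.g. downtime)


# Hypothetical overrides, as might be requested by a site.
OVERRIDES: Dict[str, SeOverride] = {
    "srm://se.example.org": SeOverride(max_active_inbound=50),
    "srm://se-down.example.org": SeOverride(drained=True),
}


def may_start_transfer(dest_se: str, active_inbound: int, optimizer_limit: int) -> bool:
    """Decide whether one more transfer to dest_se may start.

    optimizer_limit stands in for the value chosen by auto-configuration;
    an override, when present, can only lower it (assumption of this sketch).
    """
    override = OVERRIDES.get(dest_se, SeOverride())
    if override.drained:
        return False
    limit = optimizer_limit
    if override.max_active_inbound is not None:
        limit = min(limit, override.max_active_inbound)
    return active_inbound < limit
```

An endpoint with no entry in `OVERRIDES` is governed by the optimizer value alone, which mirrors the slide's statement that auto-configuration is the default.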

  6. A few links
     • Main DDM ATLAS monitor
       • http://dashb-atlas-ddm.cern.ch/ddm2
       • failed transfers have links to the FTS3 monitoring with log files to debug errors
     • FTS (a-la-ddm) dashboard
       • http://dashb-fts-transfers.cern.ch/ui/
       • All the experiments
     • FTS3 server monitor:
       • More details about each instance:
         • https://fts3.cern.ch:8449
         • https://fts.usatlas.bnl.gov:8449
         • https://lcgfts3.gridpp.rl.ac.uk:8449
     • FTS3 users guide:
       • http://fts3-service.web.cern.ch/

  7. … we definitely moved forward!
