
Torrent-based Software Distribution in ALICE



Presentation Transcript


  1. Torrent-based Software Distribution in ALICE Costin.Grigoras@cern.ch

  2. Outline
     • Motivation
     • How it works
     • Site requirements
     • History
     • Migration status

  3. Motivation
     • ALICE was using site shared areas for installing the pre-compiled experiment software packages
     • Large sites suffered from AFS/NFS/… scalability issues, and the shared area was a single point of failure
     • Large space needed for the many active versions
     • Old model needed a site-local service to manage the installation, unpacking and deletion of the packages
     • Requirement for strict site configuration to support operation, which excludes use of ‘opportunistic’ resources/centres
     • From the very beginning, the shared SW area and its access from the VO-box were considered a security risk
     • All of the above and more are solved by using the Torrent protocol to distribute the software packages

  4. Torrent terminology
     • package.tar.gz.torrent: metadata of the original file (package.tar.gz)
        • SHA1 of chunks
        • SHA1 of entire file
        • Tracker location
     • [Diagram] Initial seeder (chunks of equal size); Tracker (get file info); Clients: leeches and seeder (advertise hashes of complete chunks, exchange chunks, prefer high-speed peers)
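The per-chunk SHA1 idea on this slide can be illustrated with a minimal Python sketch. It is not the actual .torrent encoding: the function names, the tiny chunk size and the in-memory payload are assumptions chosen for illustration only.

```python
import hashlib
from typing import List


def chunk_hashes(data: bytes, chunk_size: int) -> List[str]:
    """Split data into equal-size chunks (the last one may be shorter)
    and return the SHA1 hex digest of each chunk."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [
        hashlib.sha1(data[offset:offset + chunk_size]).hexdigest()
        for offset in range(0, len(data), chunk_size)
    ]


def verify_chunk(chunk: bytes, expected_hex: str) -> bool:
    """A client accepts a received chunk only if its SHA1 matches the metadata."""
    return hashlib.sha1(chunk).hexdigest() == expected_hex


# Hypothetical stand-in for the bytes of package.tar.gz.
payload = b"A" * 10 + b"B" * 10 + b"C" * 5
hashes = chunk_hashes(payload, chunk_size=10)
```

Because every chunk is checked against the published hash, a client can safely take chunks from any peer and only advertise the ones that verified.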

  5. How it works
     [Diagram] Components shown:
     • Build servers
     • Software repository (one tar.gz / version)
     • Torrent seeder alitorrent.cern.ch:8092
     • Torrent tracker alitorrent.cern.ch:8088
     • AliEn file catalogue: torrent://alitorrent.cern.ch/…
     • Site X: WN 1, WN 2, … WN n
     • Site Y: WN 1, WN 2, … WN n
     • No seeding between sites

  6. How it works (2)
     • Build servers for SLC5 (32b, 64b), SLC6 (32b, 64b), Mac OS X, Ubuntus …
     • Software repository: 150GB in 600 archives
     • Total size of a compressed (4x factor) software ‘set’ per job is ~300MB (this is what is downloaded to the WN)
     • One central tracker and seeder
        • Limited to 50MB/s to the world
     • Fallback to other download methods if torrent download fails for any reason
        • wget, xrdcp
        • But seed them nevertheless
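The fallback order on this slide (torrent first, then wget, then xrdcp) could look roughly like the Python sketch below. It is an illustration under stated assumptions, not the AliEn implementation: the function names and all URLs passed in are hypothetical, and only basic, well-known invocations of aria2c, wget and xrdcp are used.

```python
import subprocess
from typing import Callable, List, Optional, Sequence, Tuple

Command = Sequence[str]


def run_command(cmd: Command) -> bool:
    """Run one download command; True when it exits with status 0."""
    try:
        return subprocess.run(cmd, check=False).returncode == 0
    except OSError:  # e.g. the tool is not installed on this node
        return False


def build_methods(torrent_url: str, http_url: str, xrootd_url: str,
                  dest_dir: str, filename: str) -> List[Tuple[str, List[str]]]:
    """Ordered download attempts; every URL here is a caller-supplied assumption."""
    dest = f"{dest_dir}/{filename}"
    return [
        ("torrent", ["aria2c", f"--dir={dest_dir}", torrent_url]),
        ("wget", ["wget", "-O", dest, http_url]),
        ("xrdcp", ["xrdcp", xrootd_url, dest]),
    ]


def fetch_with_fallback(methods: Sequence[Tuple[str, Command]],
                        run: Callable[[Command], bool] = run_command) -> Optional[str]:
    """Try each method in order; return the name of the first that succeeds."""
    for name, cmd in methods:
        if run(cmd):
            return name
    return None
```

The `run` parameter is injectable so the ordering logic can be exercised without any of the tools installed. Seeding a file that was fetched through a fallback method, as the slide requires, is left out of this sketch.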

  7. How it works (3)
     • Bootstrap
        • Pilot job script fetches and installs on the local node (`pwd`) the latest AliEn build by Torrent (20MB)
     • AliEn JobAgent gets a real job from the central queue and downloads the required software packages
        • Continuing to seed them in background for other local agents to quickly get them by LAN
     • The JA will run more jobs of the same type (user and SW requirements) within the TTL of the job
     • Everything is downloaded in the sandbox of the job, so it is wiped at the end of its execution
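The JobAgent behaviour described here, reusing already downloaded packages for further jobs of the same type while the TTL lasts, can be modelled with a short Python sketch. Everything in it is hypothetical (the `Job` fields, `run_agent`, the `next_job` and `download` callbacks); it only illustrates the control flow, not the real AliEn code.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Optional, Set, Tuple

JobType = Tuple[str, FrozenSet[str]]  # (user, required software packages)


@dataclass(frozen=True)
class Job:
    job_id: int
    user: str
    packages: FrozenSet[str]
    runtime: int  # seconds this job is expected to need


def run_agent(ttl: int,
              next_job: Callable[[int, Optional[JobType]], Optional[Job]],
              download: Callable[[str], None]) -> List[int]:
    """Run jobs until the queue has nothing that fits the remaining TTL.

    next_job(remaining, job_type) stands in for the central queue: it should
    return a job that fits in `remaining` seconds and, once job_type is set,
    matches it; or None when there is no such job.
    """
    remaining = ttl
    job_type: Optional[JobType] = None
    installed: Set[str] = set()
    done: List[int] = []
    while True:
        job = next_job(remaining, job_type)
        if job is None:
            break
        if job_type is None:
            job_type = (job.user, job.packages)  # first job fixes the type
        for pkg in sorted(job.packages - installed):
            download(pkg)  # packages already in the sandbox are not fetched again
            installed.add(pkg)
        remaining -= job.runtime
        done.append(job.job_id)
    return done
```

In this model the packages are downloaded once per agent, which is the point of running several same-type jobs within one TTL.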

  8. Torrent features we use
     • Clients explicitly publish their private IP in the central tracker
        • Allowing the discovery of LAN peers via this common service even behind NAT
     • Local Peer Discovery
        • Multicast to discover peers on same network
     • Peer exchange
        • Peer lists are distributed between the local peers
     • Distributed Hash Tables
        • Decentralized seeder lookup: seeders are trackers
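How publishing the private IP helps peers behind NAT find each other can be shown with a toy tracker in Python. This is a deliberately simplified model and an assumption on our part: grouping peers by the public address the tracker sees is an illustrative shortcut, not a description of the real tracker at alitorrent.cern.ch. The class and method names are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, List, Set, Tuple

Peer = Tuple[str, int]  # (private IP, listening port)


class ToyTracker:
    """Remembers the private address each client announces, keyed by the
    public (NAT) address the announce arrived from."""

    def __init__(self) -> None:
        self._by_public_ip: DefaultDict[str, Set[Peer]] = defaultdict(set)

    def announce(self, public_ip: str, private_ip: str, port: int) -> List[Peer]:
        """Register the caller and return the other peers seen behind the same NAT."""
        me = (private_ip, port)
        others = sorted(self._by_public_ip[public_ip] - {me})
        self._by_public_ip[public_ip].add(me)
        return others
```

Two worker nodes behind the same NAT share a public address, so the second one to announce learns the first one's private address and can fetch chunks over the LAN. A node at another site shares no public address with them and gets no such peers in this model.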

  9. Site requirements
     • How to allow this to happen
        • iptables rules accepting:
           • Outgoing to alitorrent.cern.ch TCP/8088,8092
           • WN-to-WN on
              • TCP, UDP / 6881:6999 (aria2c default listening ports)
              • UDP, IGMP -> 224.0.0.0/4 (local peer discovery)
        • Typically this is already the case; in some cases the ports had to be whitelisted (very smart firewalls)
     • Implicitly sites do not exchange any torrent traffic between them
     • No service to run on the site or on the machines, no shared area any more, no single point of failure (SPF), essentially no local support for this
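The accept rules listed on this slide can be written down as a small Python predicate, which may help when reviewing a site firewall policy against them. It is only a model of the rules as stated, not an iptables configuration and not a connectivity test; the function name and the `local_wn` flag are assumptions.

```python
import ipaddress
from typing import Optional

TRACKER_HOST = "alitorrent.cern.ch"
CENTRAL_TCP_PORTS = {8088, 8092}          # tracker and seeder
PEER_PORTS = range(6881, 7000)            # 6881:6999 inclusive
MULTICAST = ipaddress.ip_network("224.0.0.0/4")


def is_allowed(proto: str, dst: str, port: Optional[int] = None,
               local_wn: bool = False) -> bool:
    """Return True if the traffic matches one of the slide's accept rules.

    local_wn marks a destination that is another worker node at the same
    site; the caller has to know this, the function cannot infer it.
    """
    proto = proto.lower()
    if dst == TRACKER_HOST:
        return proto == "tcp" and port in CENTRAL_TCP_PORTS
    try:
        addr = ipaddress.ip_address(dst)
    except ValueError:
        return False  # any other hostname is outside the listed rules
    if addr in MULTICAST:
        return proto in {"udp", "igmp"}  # local peer discovery
    if local_wn:
        return proto in {"tcp", "udp"} and port in PEER_PORTS
    return False
```

Anything the predicate rejects, such as peer-port traffic to a node that is not a local worker node, is traffic the slide does not ask sites to permit.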

  10. History
     • The deployment has faced only policy difficulties
        • Eventually accepted after understanding the technology
        • There is no evil technology, only evil use…
     • First tests at CERN in 02.2009
     • Site deployments starting 06.2009
        • As the shared areas were proving insufficient
        • First at the large sites, in operation for 2 years
     • Presented in various forums within the collaboration and at CHEPs
     • Large awareness call in 01.2012 at ALICE T1/T2 Workshop in Karlsruhe

  11. Migration status
     • First transitions done in close collaboration with the sites
        • Debugging on the WNs, following up the consequences on the local network, firewalls and such
     • One month ago we asked all sites for permission to enable torrent
        • Most have confirmed that the policy allows the torrent protocol, checked the firewall policies, and now run torrent
        • Working with the rest to solve the (mostly) non-technical issues
        • Some mails went to unread mailboxes …

  12. Migration status
     • T0: in operation for 3 years
     • T1s: 5 / 6 migrated
     • T2s: 36 / 78 migrated
     • Currently covering 2/3 of the resources, so on average more than 20K concurrent jobs are using torrent
     • Rock solid, very efficient technology
        • No incidents reported
     • Aiming for full migration by the time the next AliEn version is deployed, to completely drop the PackManVoBox service and the need for shared SW area and caches

  13. Conclusion
     • Torrents have enabled us to
        • Simplify site operations by removing a VoBox service and the shared SW areas
        • Significantly reduce problems associated with SW deployment, relieving the sites' support staff
        • Have quick software release cycles (both experiment and Grid middleware)
     • The migration process was carefully staged
        • Policy limitation clarified in discussion with security experts
        • Discussions and deployment at T0/T1s and selected T2s (regional coverage)
        • Presently: towards complete site coverage
     • Lifts some of the requirement for a site VoBox, specific configurations and services
     • Forward-looking system: towards opportunistic use of resources and clouds!
