
ALICE & the LCG Service Data Challenge 3


Presentation Transcript


  1. ALICE & the LCG Service Data Challenge 3

  2. LCG SC Workshop - Bari
     • Site managers at INFN did not initially realise that SC3 is not just an “exercise”
     • They now understand this
     • Introduction to SC3 (J. Shiers)
     • Discussion of the experiment plans (ALICE, ATLAS, CMS; LHCb not represented)
     • Presentation of the LCG common tools (LFC, DPM, SRM)

  3. ATLAS & SC3
     • ATLAS is highly interested in participating in SC3
     • ATLAS Data Management system (DonQuijote)
     • Also interested in using COOL for the conditions data (calibration and alignment)

  4. ATLAS & SC3: Summary
     • April-July: preparation phase
        • Test of FTS (“gLite-SRM”)
        • Integration of FTS with DDM
     • July: scalability tests
     • September: test of the new components and preparation for real use of the service
        • Intensive debugging of COOL and DDM
        • Preparation for “scalability” running
     • Mid-October: use of the service
        • Scalability tests of all components (DDM)
        • Production of real data (Monte Carlo, Tier-0, …)
        • Data can be thrown away

  5. CMS Service Challenge Goals
     • An integration test for the next production system
     • Main output for SC3: data transfer and data-serving infrastructure known to work for realistic use
        • Including testing of the workload-management components: the resource broker and computing elements
     • Bulk data-processing mode of operation
     • Success is mandatory as preparation for SC4 and onward

  6. CMS Service Challenge Goals
     • An integration test for the next production system
        • Full experiment software stack – not a middleware test
        • “Stack” = the software required by transfers, data serving and processing jobs
        • Becomes the next production service if/when the tests pass
     • Main output: data transfer and data-serving infrastructure known to work for realistic use cases
        • Using realistic storage systems, files and transfer tools
     • CMS prefers that sites use standard services (SRM, …), but given a choice between a system not reasonably ready for 24/7 deployment and a reliable, more basic system, CMS prefers success with the old system to failure with the new one

  7. Qualitative CMS Goals
     • Concretely:
        • Data produced centrally and distributed to Tier 1 centres (MSS)
        • Strip jobs at Tier 1 produce analysis datasets (probably “fake” jobs)
           • Approximately 1/10th of the original data, also stored in MSS
        • Analysis datasets shipped to Tier 2 sites and published locally
           • May involve access from MSS at Tier 1
        • Tier 2 sites produce MC data and ship it to Tier 1 MSS (probably “fake” jobs)
        • Transfers between Tier 1 sites
           • Analysis datasets, plus a 2nd replica of the raw data for failover simulation
     • Implied: software installation, job submission, harvesting, monitoring

  8. CMS SC3 Schedule
     • July: throughput phase
        • Optional leading site-only tuning phase, which may use middleware only
        • T0/T1/T2 simultaneous import/export using the CMS data placement and transfer system (PhEDEx) to coordinate the transfers
        • Overlaps with the setup phase for other components on the testbed; this will not distract from the transfers – setting up e.g. software installation, job submission, etc.
     • September: service phase 1 – modest throughput
        • Seed transfers to get initial data to the sites
        • Demonstrate bulk data processing and simulation at T1s and T2s
           • Requires software, job submission, output harvesting, monitoring, …
           • Not everything everywhere, but something reasonable at each site
     • November: service phase 2 – modest throughput
        • Phase 1 + continuous data movement
        • Any improvements to the CMS production (as in MC production) system
           • Already in September if available by then

  9. CMS/SC3 Services in Test: Services for all sites (I)
     • Data storage
        • dCache, CASTOR or other (xrootd, GPFS, …)
        • An SRM interface is highly desirable, but not mandatory if unrealistic
     • Data transfer (a transfer sketch follows below)
        • PhEDEx + normally SRM, can be + GridFTP – see Daniele’s presentation
        • CMS will test FTS from November together with the other experiments (ATLAS, LHCb)
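As a concrete illustration of the lowest-level transfer path mentioned in the data transfer item above (plain GridFTP underneath PhEDEx/SRM), here is a minimal sketch that shells out to globus-url-copy. The endpoint URLs and local paths are hypothetical, and a valid grid proxy plus the Globus client on the PATH are assumed; PhEDEx orchestrates and retries many such transfers, whereas this shows only a single copy.

```python
import subprocess

# Hypothetical source/destination URLs; real transfers would use the
# site's actual GridFTP endpoints and a valid grid proxy.
SOURCE = "gsiftp://se.tier1.example.org/castor/cms/sc3/file001.root"
DEST = "file:///data/sc3/import/file001.root"

def gridftp_copy(src: str, dst: str) -> None:
    """Copy one file with the basic globus-url-copy invocation."""
    # globus-url-copy <source-url> <destination-url>
    result = subprocess.run(["globus-url-copy", src, dst])
    if result.returncode != 0:
        raise RuntimeError(f"transfer failed for {src}")

if __name__ == "__main__":
    gridftp_copy(SOURCE, DEST)
```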

  10. CMS/SC3 Services in Test: Services for all sites (II)
     • File catalogue
        • The “safe” choice is the POOL MySQL catalogue
           • Big question: will the catalogue scale for worker-node jobs?
           • Currently using XML catalogues from the worker nodes (an example follows below)
        • LCG favours LFC, but the first step towards CMS validation is happening only now (today!)
           • LFC exists, but no POOL version can use it yet, and thus no CMS software
           • Existing CMS software to date is not able to use LFC
        • US-CMS will test Globus RLS instead of LFC/MySQL at some sites
           • Same caveats as with LFC
        • Not planning to test EGEE Fireman yet
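To make the worker-node XML catalogue concrete, the following minimal sketch writes a single POOL-style catalogue entry (GUID, PFN, LFN) with the Python standard library. The GUID, file names and the ROOT_All file type are illustrative, and the exact document layout expected by a given POOL release may differ.

```python
import xml.etree.ElementTree as ET

# Illustrative values only; a real catalogue entry is written by POOL itself.
GUID = "6BF18C21-0000-0000-0000-000000000000"
PFN = "rfio:/castor/cern.ch/cms/sc3/file001.root"
LFN = "sc3/file001.root"

catalog = ET.Element("POOLFILECATALOG")
entry = ET.SubElement(catalog, "File", {"ID": GUID})
physical = ET.SubElement(entry, "physical")
ET.SubElement(physical, "pfn", {"filetype": "ROOT_All", "name": PFN})
logical = ET.SubElement(entry, "logical")
ET.SubElement(logical, "lfn", {"name": LFN})

# Worker-node jobs read such a local file instead of contacting a central
# MySQL server, which is what the scalability question above is about.
ET.ElementTree(catalog).write("PoolFileCatalog.xml",
                              xml_declaration=True, encoding="UTF-8")
```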

  11. CMS/SC3 Services in Test: Services for all sites (III)
     • Software packaging, installation and publishing into the information system
        • Either central automated installation, or using a local service
        • So far, “central automated” is not really very automated…
     • Computing Element and worker nodes
        • In particular, how the CE obtains jobs (RB, direct submission?)
        • Interoperability between different grid variants
     • Job submission
        • Including a head node / UI for submitting
        • Interoperability between different grid variants
     • Job output harvesting
        • CMS agents, often configured together with PhEDEx

  12. CMS/SC3 Services in Test: Services for some sites
     • PubDB / DLS
        • Backend MySQL database + web-server interface for PubDB
     • Job monitoring and logging
        • BOSS + MySQL database + local agents
     • MC production tools and infrastructure
        • McRunjob, pile-up, etc.
     • File merging
        • We have to get somewhere with this task item
        • Probably agents running at the site producing the data
     • (These will evolve and be replaced as the middleware improves)

  13. CMS Support Servers (I)
     • Server-type systems required at each site:
        • UI / head node for job submission (public login)
        • Storage space for the CMS software installation (a single root for all)
        • “Small databases” server for CMS services (see below; MySQL)
        • File catalogue database server (presumably MySQL at most sites; a connectivity sketch follows below)
        • Gateway-type server for PubDB, PhEDEx and job output harvesting
           • PubDB needs a web server; PhEDEx needs local disk (~20 GB is sufficient)
           • Typically installed as a UI, but without public login (CMS admins only)
           • For SC3, one machine running all the agents is enough
           • For SC3, requires outbound access, plus access to the local resources
           • PubDB requires inbound HTTP access and can be installed under any web server
           • The agents do not require substantial CPU power or network bandwidth; a “typical” recent box with local disk and “typical” local network bandwidth should be enough (the CERN gateway is a dual 2.4 GHz P4 with 2 GB of memory – plenty)
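Since several of the services above depend on a reachable per-site MySQL server, a trivial connectivity check such as the sketch below can save debugging time. The host, account and database names are placeholders, and the MySQL-python (MySQLdb) module is assumed to be installed on the node running the check.

```python
import MySQLdb  # MySQL-python, assumed available on the UI / worker node

# Placeholder connection parameters; each site has its own server and accounts.
HOST, USER, PASSWD, DB = "dbsrv.site.example.org", "cms_reader", "secret", "pool_catalog"

def check_catalogue_db() -> None:
    """Open a connection and run a trivial query to prove the server is reachable."""
    conn = MySQLdb.connect(host=HOST, user=USER, passwd=PASSWD, db=DB)
    try:
        cur = conn.cursor()
        cur.execute("SELECT 1")
        print("catalogue DB reachable:", cur.fetchone())
    finally:
        conn.close()

if __name__ == "__main__":
    check_catalogue_db()
```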

  14. CMS Support Servers (II)
     • Optional gateway services at some sites:
        • BOSS job monitoring and logging
           • Local MySQL / SQLite backend per user on the UI (the MySQL server can be shared)
           • Optional real-time monitoring database – to be discussed
           • BOSS itself does not require a gateway server, only the databases (a sketch follows below)
        • File merging
     • Service + operation of these CMS services by CMS people at the site
        • You may be able to get help from CMS people at your Tier 1 – ask
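To illustrate the per-user SQLite backend mentioned for job monitoring, the sketch below keeps a tiny job-status table in a local SQLite file. The table layout and values are invented for illustration and are not BOSS's actual schema.

```python
import sqlite3
from datetime import datetime, timezone

# Per-user database file on the UI; the schema is illustrative, not BOSS's own.
conn = sqlite3.connect("jobs_monitoring.db")
conn.execute("""CREATE TABLE IF NOT EXISTS jobs (
                    job_id  TEXT PRIMARY KEY,
                    site    TEXT,
                    status  TEXT,
                    updated TEXT)""")

def record_status(job_id: str, site: str, status: str) -> None:
    """Insert or update the latest status of one job."""
    now = datetime.now(timezone.utc).isoformat()
    conn.execute("INSERT OR REPLACE INTO jobs VALUES (?, ?, ?, ?)",
                 (job_id, site, status, now))
    conn.commit()

record_status("sc3-000123", "T1_Example", "Running")
for row in conn.execute("SELECT * FROM jobs"):
    print(row)
conn.close()
```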

  15. ALICE Services, Agents & Daemons
     • We must provide the details on June 13
        • Which “common” services will we use?
        • Which services will we add?
        • What model for the storage?
           • DPM + xrootd, everything as a disk cache – OK for storing data
           • How about retrieving from tape?
           • xrootd + CASTOR?
        • What about file transfer?
        • …

  16. Lightweight Disk Pool Manager: status and plans
      Jean-Philippe Baud, IT-GD, CERN, May 2005

  17. Disk Pool Manager aims
     • Provide a solution for Tier-2s in LCG-2
        • And maybe for the Tier-1s’ cache???
        • This implies a few tens of terabytes in 2005
     • Focus on manageability
        • Easy to install
        • Easy to configure
        • Low effort for ongoing maintenance
        • Easy to add/remove resources
     • Support for multiple physical partitions
        • On one or more disk-server nodes
     • Support for different space types – volatile and permanent
     • Support for multiple replicas of hot files within the disk pools (a sketch of these concepts follows below)
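The vocabulary above (pools spanning several physical partitions, volatile vs. permanent space, replicas of hot files) can be summarised in a small conceptual model. The class and field names below are invented for illustration and are not DPM's real data structures.

```python
from dataclasses import dataclass, field
from enum import Enum

class SpaceType(Enum):
    VOLATILE = "volatile"    # files may be garbage-collected to free space
    PERMANENT = "permanent"  # files stay until explicitly removed

@dataclass
class FileSystem:
    server: str      # disk server node hosting this partition
    mount: str       # physical partition, e.g. "/data01"
    capacity_gb: int

@dataclass
class Pool:
    name: str
    space_type: SpaceType
    filesystems: list[FileSystem] = field(default_factory=list)
    # hot files can be replicated onto several filesystems of the pool
    replicas: dict[str, list[tuple[str, str]]] = field(default_factory=dict)

# A pool spanning two partitions on two disk servers (hypothetical names).
pool = Pool("alice_disk", SpaceType.PERMANENT, [
    FileSystem("dpmdisk01.example.org", "/data01", 2000),
    FileSystem("dpmdisk02.example.org", "/data01", 2000),
])
# Two replicas of one hot file, on different servers.
pool.replicas["/alice/sc3/hot.root"] = [
    ("dpmdisk01.example.org", "/data01"),
    ("dpmdisk02.example.org", "/data01"),
]
print(pool)
```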

  18. Features
     • DPM access via different interfaces
        • Direct socket interface
        • SRM v1
        • SRM v2 Basic
        • Also offers a large part of SRM v2 Advanced
           • Global space reservation (next version)
           • Namespace operations
           • Permissions
           • Copy and remote get/put (next version)
     • Data access
        • GridFTP, RFIO (rootd, xrootd could easily be added)

  19. Security
     • GSI authentication and authorisation
        • The client DN is mapped to a uid/gid pair
        • Authorisation is done in terms of uid/gid (a sketch of the idea follows below)
     • Ownership of files is stored in the DPM catalogue, while the physical files on disk are owned by the DPM
     • Permissions implemented on files and directories
        • Unix (user, group, other) permissions
        • POSIX ACLs (groups and users)
     • Proposal: use SRM as the interface for setting the permissions in the Storage Elements (requires v2.1 minimum, with the Directory and Permission methods)
     • VOMS will be integrated
        • VOMS roles appear as a list of gids
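The sketch below illustrates the idea of mapping a client DN to a uid/gid pair and then checking Unix-style permission bits plus group ACL entries for read access. The mapping table, catalogue metadata and helper function are invented for illustration and do not reflect DPM's internal implementation.

```python
import stat

# Illustrative DN -> (uid, gid) mapping; in DPM this comes from the grid map / VOMS.
DN_MAP = {
    "/DC=ch/DC=cern/OU=Users/CN=alice_user": (10101, 1395),  # made-up uid/gid
}

# Illustrative catalogue entry: owner uid/gid, mode bits, extra group ACLs.
FILE_META = {
    "/dpm/example.org/home/alice/sc3/data.root": {
        "uid": 10101, "gid": 1395,
        "mode": stat.S_IRUSR | stat.S_IWUSR | stat.S_IRGRP,  # rw-r-----
        "acl_groups_read": {2000},                           # extra group granted read
    },
}

def can_read(dn: str, path: str) -> bool:
    """Check read access using owner bits, group bits and group ACL entries."""
    uid, gid = DN_MAP[dn]
    meta = FILE_META[path]
    if uid == meta["uid"] and meta["mode"] & stat.S_IRUSR:
        return True
    if gid == meta["gid"] and meta["mode"] & stat.S_IRGRP:
        return True
    return gid in meta["acl_groups_read"]

print(can_read("/DC=ch/DC=cern/OU=Users/CN=alice_user",
               "/dpm/example.org/home/alice/sc3/data.root"))  # True (owner read bit)
```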

  20. Status
     • DPM will be part of the LCG 2.5.0 release, but is already available now for testing

  21. Deployment
     • Replacement of the “Classic SE”
        • Only metadata operations are needed (the data does not need to be copied)
     • Satisfies the gLite requirement for an SRM interface at Tier-2s

  22. ALICE Offline Week - CERN

  23. ALICE Offline Week - CERN
