ALICE@ARDA Interactive Analysis on a GRID

This presentation discusses the ALICE Physics Data Challenge, summarizes Phases I and II and their results, and outlines the design and execution strategy for Phase III.

Presentation Transcript


  1. “ALICE @ ARDA”: Interactive Analysis on a GRID
     P. Cerello, on behalf of the ALICE Offline Team
     ARDA Workshop, CERN, October 21st, 2004

  2. Outline
     • The ALICE Physics Data Challenge
     • Summary of phase I / II
     • Phase III
       • What?
       • How?
       • When?
     • Summary & Outlook

  3. The ALICE Physics Data Challenge
     • Phase 1: production of underlying events using heavy ion MC generators
       • Status: 100% complete (Mar-May 2004)
       • Basic statistics: ~1.3 million files, 26 TB data volume
     • Phase 2: mixing of signal events in the underlying events
       • Status: 100% complete (Jun-Sep 2004)
     • Phase 3: analysis of signal + underlying events
       • Goal: to test the data analysis model of ALICE
       • Status: to be started

  4. Phase I & II layout: a “Meta-Grid”
     [Diagram labels: Server, Master job queue, Submission; AliEn CE/SE (three sites); AliEn CE, LCG UI, LCG RB; LCG CE/SE (three sites); Catalogue (two instances)]

  5. Interfacing AliEn and LCG
     [Diagram labels: Server, Interface Site, LCG Site; AliEn CE, LCG UI, LCG RB, EDG CE, WN, AliEn; AliEn SE, LCG SE; Job submission, Status report; Data Registration, LFN, PFN, PFN = GUID, GUID; Data Catalogue, Replica Catalogue]

  6. Phase I - Simulation
     [Diagram label: Signal-free event]
     • Small input with interaction conditions
     • Large distributed output (1 GB/event) with simulated detector response
     • Long execution time (10 hours/event)

  7. Phase I results
     • Number of jobs:
       • Central 1 (long, 12 hours): 20 K
       • Peripheral 1 (medium, 6 hours): 20 K
       • Peripheral 2 to 5 (short, 1 to 3 hours): 16 K
     • Number of files:
       • AliEn file catalogue: 3.8 million (no degradation in performance observed)
       • CERN Castor: 1.3 million
       • We did not use the LCG Data Management tools, as they were not mature or not deployed when we started Phase 1
     • File size:
       • Total: 26 TB
     • Resources provided by 27 active production centres, 12 workhorses
       • Total: 285 MSI-2K hours
       • LCG: 67 MSI-2K hours (24%; could reach 50% with full resource exploitation)
       • Some exotic centres as proof-of-concept (e.g. in Pakistan), an Itanium farm in Houston
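The Phase I figures above can be cross-checked with a few lines of arithmetic. This is only a sketch: it assumes that 1 TB = 10^12 bytes and that the 26 TB total corresponds to the 1.3 million files stored in CERN Castor (the slides quote the two numbers together but do not state the relation explicitly). The variable names are illustrative.

```python
# Figures quoted on the Phase I results slide.
jobs_k = {"Central 1": 20, "Peripheral 1": 20, "Peripheral 2 to 5": 16}
cpu_total_msi2k_h = 285   # all centres
cpu_lcg_msi2k_h = 67      # LCG share
castor_files = 1.3e6
data_volume_bytes = 26e12  # assumption: 1 TB = 10**12 bytes

total_jobs_k = sum(jobs_k.values())
lcg_share_pct = 100 * cpu_lcg_msi2k_h / cpu_total_msi2k_h
# Assumption: the 26 TB are spread over the 1.3 million Castor files.
avg_file_size_mb = data_volume_bytes / castor_files / 1e6

print(f"jobs: {total_jobs_k} K")
print(f"LCG share of CPU work: {lcg_share_pct:.1f}%")
print(f"average file size: {avg_file_size_mb:.0f} MB")
```

The LCG share comes out at about 23.5%, consistent with the 24% quoted on the slide.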

  8. Phase II - Merging & Reconstruction
     [Diagram labels: Signal-free event, Mixed signal]
     • Large distributed input (1 GB/event)
     • Fairly large distributed output (100 MB/event, 7 MB files) with reconstructed events

  9. Phase II results
     • Number of jobs:
       • 180 K jobs, 5 M events
       • Jobs running: 430 average, 1150 max.
       • Average duration: 4.5 hours
       • Total CPU work: 780 MSI-2K hours
     • Number of files:
       • AliEn file catalogue: 5.4 M (+3.8 M from Phase I, no degradation in performance observed)
       • About 0.5 M files, generated on LCG, also registered in the LCG RLS
     • File size:
       • Total: 5.5 TB, all on remote SEs
     • Resources provided by 17 AliEn sites + 12 LCG sites
       • Total: 780 MSI-2K hours
       • LCG: 80 MSI-2K hours (10%)
     • Thanks to the LCG team (P. Mendez in particular) for the excellent support
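A rough per-job picture can be derived from the Phase II totals. The sketch below assumes that "MSI-2K hours" means 10^6 SpecInt2000-hours and that the quoted average duration applies uniformly to all jobs; both are interpretations, not statements from the slides.

```python
# Figures quoted on the Phase II results slide.
jobs = 180_000
events = 5_000_000
cpu_total_msi2k_h = 780
cpu_lcg_msi2k_h = 80
avg_duration_h = 4.5

events_per_job = events / jobs
lcg_share_pct = 100 * cpu_lcg_msi2k_h / cpu_total_msi2k_h
# Assumption: 1 MSI-2K hour = 1e6 SI2K hours.
si2k_h_per_job = cpu_total_msi2k_h * 1e6 / jobs
# Implied average CPU rating if every job ran for the average duration.
implied_si2k_per_cpu = si2k_h_per_job / avg_duration_h

print(f"events per job: {events_per_job:.1f}")
print(f"LCG share of CPU work: {lcg_share_pct:.1f}%")
print(f"implied CPU rating: {implied_si2k_per_cpu:.0f} SI2K")
```

Under these assumptions the implied rating is a little under 1 kSI2K per CPU, so the job count, average duration and total CPU work are mutually consistent.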

  10. Phase III - (Interactive) Analysis
      [Diagram labels: Signal-free event, Mixed signal, GRID Server]
      • Large distributed input (100 MB/event)
      • Fairly small merged output

  11. Phase III - Design strategy
      • Distributed input
        • 5.4 M files, on about 30 different SEs
      • Do not move the input
      • Algorithm definition by a user on site S
      • On site S, input query to the Data Catalogue, based on selected metadata
      • Input files typically come from many SEs, so:
        • Split them into N subgroups, each defined by the files stored on a given SE
        • Split the task into N sub-tasks, to be run in parallel on the CEs associated with the SEs containing a fraction of the input files
        • Run the N sub-tasks in parallel
        • Merge the output on the user’s site S
      • How?
        • From a ROOT shell on the user’s site
        • Hopefully interactively with PROOF
        • Refer to Derek’s and Andreas’ presentation yesterday
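The split / run / merge strategy on this slide can be sketched in a few lines of Python. This is a toy illustration, not the ALICE, AliEn or PROOF implementation: the catalogue query result is a hypothetical list of (LFN, SE) pairs, local threads stand in for the CEs next to each SE, and the names `split_by_se` and `run_analysis` are invented for the example.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def split_by_se(catalogue_hits):
    """Group the LFNs returned by the catalogue query by the SE holding them."""
    groups = defaultdict(list)
    for lfn, se in catalogue_hits:
        groups[se].append(lfn)
    return dict(groups)


def run_analysis(catalogue_hits, algorithm, merge):
    """Run one sub-task per SE in parallel, then merge on the user's side."""
    groups = split_by_se(catalogue_hits)

    def sub_task(lfns):
        # In the real system this would run on a CE close to the SE,
        # so that the input files are never moved.
        return [algorithm(lfn) for lfn in lfns]

    with ThreadPoolExecutor(max_workers=len(groups) or 1) as pool:
        partial_outputs = list(pool.map(sub_task, groups.values()))
    return merge(partial_outputs)


# Hypothetical query result: five files on three storage elements.
hits = [("lfn1", "SE_A"), ("lfn2", "SE_A"), ("lfn3", "SE_B"),
        ("lfn4", "SE_C"), ("lfn5", "SE_B")]
groups = split_by_se(hits)
# Toy "algorithm": the length of the file name; toy "merge": a grand total.
total = run_analysis(hits, algorithm=len,
                     merge=lambda parts: sum(sum(p) for p in parts))
print(groups)
print(total)
```

The number of sub-tasks N is not chosen by the user: it falls out of how the selected files are distributed over the SEs, which is why the input never has to be moved.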

  12. Phase III - Original Plan
      [Diagram labels: User query; Metadata; Query LFN’s (lfn 1 to lfn 8); Central servers (Master job submission, Job Optimizer (N sub-jobs), RB, File catalogue, processes monitoring and control, SE…); Job splitter; Sub-jobs; Get PFN’s; AliEn-LCG interface; RB; File catalogue: PFN = AliEn PFN, PFN = (LCG SE:) LCG LFN; CEs; Job processing; Input files; Local SEs; Primary copy]

  13. Phase III - Execution Strategy
      • The original plan is very labour intensive
      • The status of LCG DMS is not brilliant
      • It does not “leverage” the (excellent!) work done in ARDA
      • So… why not do it with gLite?
      • Advantages
        • Uniform configuration: gLite on EGEE/LCG-managed sites & on ALICE-managed sites
        • If we have to go that way, the sooner the better
        • AliEn is anyway “frozen”, as all the developers are working on gLite/ARDA
      • Disadvantages
        • It may introduce a delay with respect to the use of the present (available) AliEn/LCG configuration
        • But we believe it will pay off in the medium term

  14. Phase III - Layout
      [Diagram labels, in order: Server, User Query, Catalog; gLite/A CE/SE; lfn 1, lfn 2, lfn 3; gLite/E CE/SE; lfn 4, lfn 5, lfn 6; gLite/L CE/SE; lfn 7, lfn 8; gLite/A CE/SE]

  15. Phase III - The Plan
      • ALICE is ready to play the guinea-pig for a large scale deployment
        • i.e. on all ALICE resources and on as many existing LCG resources as possible
      • We have experience in deploying AliEn on most centres; we can redo the exercise with gLite
        • Even on most LCG centres we have a parallel AliEn installation
        • Many ALICE site-managers are ready to try it
        • And we would need little help
      • We need a gLite (beta) release as soon as possible, beginning of November
      • Installation and configuration of sites must be as simple as possible
        • i.e. it must not require root access
      • We expect LCG/EGEE to help us configure and maintain the ALICE gLite server, running common services

  16. Phase III - Steps to get started
      • Data Management Services
        • Input data
          • Already registered in the AliEn & LCG Data Catalogues and stored on AliEn & LCG Storage Elements
          • Access the AliEn & LCG Catalogues from gLite
          • … or translate the AliEn & LCG Catalogues to a gLite Catalogue instance
          • Input data is scattered over about 30 sites: we need ALL of them to become gLite sites
          • … or we may do some data movement
        • Output data
          • Must be made available at the user’s site for merging and optional registration in the gLite Data Catalogue

  17. Phase III - Steps to get started
      • Workload Management Services
        • No special requirement, as the job distribution is intrinsically defined by the input location, but:
          • We need the functionality to split a job into sub-jobs according to the input distribution (ARDA has it!)
          • Jobs submitted by users must be registered in a Master Queue, which keeps the record of ALL the ALICE-VO tasks
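The two workload requirements above (input-driven splitting and a single Master Queue recording every task) could look roughly like this. The `MasterQueue` class, its methods and the record fields are all hypothetical, written only to make the bookkeeping concrete; they are not the AliEn or gLite task queue API.

```python
class MasterQueue:
    """Toy central record of every task submitted within the VO."""

    def __init__(self):
        self._next_id = 1
        self.records = {}

    def register(self, user, description, parent=None):
        """Record a task and return its identifier."""
        job_id = self._next_id
        self._next_id += 1
        self.records[job_id] = {
            "user": user,
            "description": description,
            "parent": parent,      # None for a master job
            "status": "QUEUED",
        }
        return job_id

    def submit_split(self, user, description, files_by_se):
        """Register a master job plus one sub-job per SE holding input files."""
        master_id = self.register(user, description)
        sub_ids = [
            self.register(user, f"{description} @ {se}", parent=master_id)
            for se in sorted(files_by_se)
        ]
        return master_id, sub_ids


# Hypothetical input distribution: two SEs hold the selected files.
queue = MasterQueue()
master_id, sub_ids = queue.submit_split(
    "some_user", "analysis", {"SE_A": ["lfn1", "lfn2"], "SE_B": ["lfn3"]}
)
print(master_id, sub_ids, len(queue.records))
```

Keeping the sub-jobs in the same queue as the master job, linked by a parent identifier, is one simple way to let a single record cover all tasks while still showing which sub-jobs belong to which user request.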

  18. Conclusion
      • Phase I and II of our Data Challenge were completed
        • LCG support and resources were very important
        • LCG middleware has limited the usability of the resources
      • Phase III
        • With gLite on the verge of being released, we think it would be absurd not to “bite the bullet” and use it now
        • We will provide an experience “complementary” to the component-by-component strategy of LCG
        • We feel we would gain many months and acquire precious experience with no special additional load on the deployment team
        • This will help bootstrap a process which we feel is much too slow and timid
        • And we have already done it with AliEn
      • ALICE intends to be gLite-only in the shortest possible time
