
WP8 Report




  1. WP8 Report
  F Harris (Oxford/CERN)
  Plenary, Budapest

  2. Outline of presentation
  • Overview of experiment plans for use of Grid facilities/services for tests and data challenges
    • ATLAS
    • ALICE
    • CMS
    • LHCb
    • BaBar
    • D0
  • Status of ATLAS/EDG Task Force work
  • Essential requirements for making 1.2.n usable by the broader physics user community
  • Future activities of WP8 and some questions regarding LCG etc.
  • Summary

  3. ATLAS
  • Currently in the middle of Phase 1 of DC1 (Geant3 simulation, Athena reconstruction, analysis). Many sites in Europe, the US, Australia, Canada, Japan, Taiwan, Israel and Russia are involved.
  • Phase 2 of DC1 will begin Oct-Nov 2002 using the new event model.
  • Plans for use of Grid tools in the DCs:
    • Phase 1: the ATLAS-EDG Task Force is to repeat, with EDG 1.2, ~1% of the simulations already done.
    • Using CERN, CNAF, Nikhef, RAL, Lyon
    • 9 GB input, 100 GB output, 2000 CPU hours
    • Phase 2 will make larger use of Grid tools. Different sites may use different tools, and there will be (many?) more sites. This is to be defined Sep 16-20.
    • ~10**6 CPU hours, 20 TB input to reconstruction, 5 TB output (how much on the testbed? See the scaling sketch after this slide.)
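The Phase 2 totals above invite a simple scaling estimate. The following is a minimal back-of-envelope sketch, not part of the original plan: only the Phase 2 totals come from the slide, while the number of available testbed CPUs and the candidate fractions are assumptions chosen purely for illustration.

```python
# Back-of-envelope estimate (illustrative only): what a given fraction of the
# ATLAS DC1 Phase 2 workload would mean for the EDG testbed. Totals are taken
# from the slide; the candidate fractions and the CPU count are assumptions.

PHASE2_CPU_HOURS = 1e6      # ~10**6 CPU hours in total
PHASE2_INPUT_TB = 20.0      # input to reconstruction
PHASE2_OUTPUT_TB = 5.0      # reconstruction output

def testbed_share(fraction, cpus_available=200):
    """Resources implied by running `fraction` of Phase 2 on the testbed."""
    cpu_hours = PHASE2_CPU_HOURS * fraction
    wall_days = cpu_hours / cpus_available / 24.0   # perfect parallelism assumed
    return {
        "cpu_hours": cpu_hours,
        "input_tb": PHASE2_INPUT_TB * fraction,
        "output_tb": PHASE2_OUTPUT_TB * fraction,
        "wall_clock_days": round(wall_days, 1),
    }

if __name__ == "__main__":
    for f in (0.01, 0.10):   # 1% and 10% are purely illustrative choices
        print(f"{f:.0%}: {testbed_share(f)}")
```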

  4. ALICE
  • ALICE assume that as soon as a stable version of 1.2.n is tested and validated it will be progressively installed on 'all' EDG testbed sites.
  • As new sites come online, an automatic tool will be used for submission of test jobs of increasing output size and duration (see the sketch after this slide).
  • At the moment ALICE do not plan a "data challenge" with EDG. However, they plan a data transfer test, as close as possible to the expected data transfer rate for a real production and analysis.
  • Will concentrate on the AliEn/EDG interface and on the AliRoot/EDG interface, in particular for items concerning Data Management.
  • Will use CERN, CNAF, Nikhef, Lyon, Turin, Catania for the first tests.
  • CPU and storage requirements can be tailored to the availability of facilities in the testbed, but will need some scheduling and priorities.
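The slide does not describe the automatic submission tool itself; the following is only a sketch of how such a ramp-up of test jobs of increasing output size and duration could be driven. The site list comes from the slide, but the submit_job helper, the ramp steps and the pacing are assumptions, not the actual ALICE tool.

```python
# Illustrative sketch (not the actual ALICE tool): drive an automatic ramp-up
# of test jobs with increasing output size and duration at newly added sites.

import time

RAMP = [
    # (output size in MB, run time in minutes) -- illustrative steps only
    (10, 5),
    (100, 30),
    (1000, 120),
    (5000, 480),
]

def submit_job(site, output_mb, minutes):
    """Placeholder: hand one test job to the real submission layer."""
    print(f"submitting to {site}: ~{output_mb} MB output, ~{minutes} min run")
    # ... call the real job submission tool here ...
    return True

def ramp_up_site(site):
    """Submit progressively heavier test jobs; stop at the first failure."""
    for output_mb, minutes in RAMP:
        if not submit_job(site, output_mb, minutes):
            print(f"{site}: stopped at {output_mb} MB / {minutes} min")
            return False
        time.sleep(1)  # pacing between submissions (arbitrary)
    return True

if __name__ == "__main__":
    for s in ["CERN", "CNAF", "Nikhef", "Lyon", "Turin", "Catania"]:
        ramp_up_site(s)
```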

  5. CMS
  • CMS currently running production for the DAQ Technical Design Report (TDR). Requires the full chain of CMS software and production tools, including use of Objectivity (licensing problem in hand).
  • The 5% Data Challenge (DC04) will start Summer 2003 and will last ~7 months. This will produce 5*10**7 events. In the last month all data will be reconstructed and distributed to Tier1/2 centres for analysis.
    • 1000 CPUs for 5 months, 100 TB output
  • Use of Grid tools and facilities:
    • Will not be used for the current production.
    • Plan to use in the DC04 production.
    • EDG 1.2 will be used to make scale and performance tests (proof of concept). Tests on RB, RC and GDMP. Will need Objectivity for the tests.
    • IC, RAL, CNAF/BO, Padova, CERN, Nikhef, IN2P3, Ecole Poly, ITEP
    • Some sites will do EDT + GLUE tests.
    • CPU: ~50 CPUs distributed. Storage: ~200 GB per site.
  • V2 seems the best candidate for DC04 starting summer 2003 (has the functionality required by CMS).

  6. LHCb
  • First intensive Data Challenge starts Oct 2002; currently doing intensive pre-tests at all sites.
  • Participating sites for 2002:
    • CERN, Lyon, Bologna, Nikhef, RAL
    • Bristol, Cambridge, Edinburgh, Imperial, Oxford, ITEP Moscow, Rio de Janeiro
  • Use of EDG Testbed:
    • Install the latest OO environment on testbed sites. Flexible job submission Grid/non-Grid.
    • First tests (now) for MC + reconstruction + analysis with data stored to Mass Store.
    • Large-scale production tests (by October).
    • Production (if tests OK): aim to do a percentage of the production on the Testbed.
    • Total requirement is 500 CPUs for 2 months + ~10 TB (10% should be OK on the testbed?).

  7. BaBar Grid and EDG (talk by G Grodidier)
  • Target: have some production environment ready for all users by the end of this year
    • with attractive interface tools
    • customised to the SLAC site
  • There were three types of issues raised through EDG/Globus (experience with 1.1.4) which were solved by local 'hacks':
    • use of the LSF batch scheduler (uses AFS)
    • AFS file system used for user home directories
    • batch workers located inside the IFZ (security issue)
  • Three parts of the Globus/EDG software were installed at SLAC: CE, WN and UI.
  • The exercise clearly showed that they all run fine together, and also with the RB at IC.
  • Had problems with the 'old' version of the RB. Will now move to the latest version.
  • BaBar now have D Boutigny on WP8/TWG.

  8. D0 (Nikhef)
  • Have already run many events on the testbeds of NIKHEF and SARA.
  • Wish to extend the tests to the whole testbed.
  • D0 RPMs are already in the EDG releases and will be installed on all sites. Will set up a special VO and RC for D0 at NIKHEF on a rather short time scale.
  • Jeff Templon, NIKHEF rep. in WP8, will report on the work.

  9. ATLAS/EDG Task Force (led by O Smirnova): foundation work accomplished since late July
  • ATLAS 3.2.1 RPMs are distributed with the EDG tools to provide the ATLAS runtime environment.
  • Validation of the ATLAS runtime environment by submitting a short (100 input events) DC1 job was done at several sites:
    • CERN
    • NIKHEF
    • RAL
    • CNAF
    • Lyon
    • Karlsruhe (in progress)
  • A very fruitful cooperation between ATLAS users and EDG experts has been ongoing since late July; this type of dialogue will be a principal factor in future developments.

  10. What's almost there
  • Input file replication: for the user this is a multi-step procedure requiring complex GDMP commands (a conceptual sketch follows this slide).
    • Theoretically, it works. However, it is very sensitive to errors at any step of the chain.
    • So far, the recommended procedure has worked for NIKHEF (input partitions 0003 and 0004).
  • As of Sept 3 ATLAS have looked at the use of the interim Replica Manager, which is much simpler for single-file replication.
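To make the contrast concrete, here is a purely conceptual sketch of why a multi-step catalogue chain is fragile compared with a single copy-and-register call. All function names below are placeholders invented for illustration; they are not actual GDMP or EDG Replica Manager commands.

```python
# Conceptual sketch only: multi-step, GDMP-style replication chain versus a
# single replica-manager call. Every identifier here is a placeholder.

def _step(msg):
    # Stand-in for a real command; in the multi-step chain, any of these
    # can fail independently and break the whole procedure.
    print(msg)

def replicate_multi_step(lfn, source_se, dest_se):
    """GDMP-style chain: several commands, each a separate failure point."""
    _step(f"register {lfn} in the local catalogue at {source_se}")
    _step(f"publish the catalogue of {source_se}")
    _step(f"subscribe {dest_se} to {source_se}")
    _step(f"pull new files to {dest_se}")

def replicate_single_call(lfn, source_se, dest_se):
    """Interim Replica Manager style: one copy-and-register operation
    for a single file."""
    _step(f"copy {lfn}: {source_se} -> {dest_se} and register the replica")

if __name__ == "__main__":
    replicate_multi_step("dc1.input.0003", "CERN-SE", "NIKHEF-SE")
    replicate_single_call("dc1.input.0004", "CERN-SE", "NIKHEF-SE")
```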

  11. What has just become reliable
  • Submission of long jobs
    • The provisional fix of the long-known 'gass-cache' problem, which allowed frequent submission of short jobs, turned out to cause long jobs (approx. > 20 minutes) never to reach "finished" status.
    • After the fix was removed in recent days, significant progress has been made. M Schulz had success with 23/24 long jobs; the single failure was probably related to network problems.
  • A temporary solution: the production testbed has an RB which points to "fixed" CEs for frequent submission of short jobs, and an "ATLAS" RB pointing to "unfixed" CEs for long jobs (it sees only the CERN and Karlsruhe sites, to be extended further). A routing sketch follows this slide.
  • ATLAS are running long jobs now, but the gass-cache problem has to be fixed to also allow frequent submission of short jobs.
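The two-broker workaround amounts to routing jobs by expected duration. Below is a minimal sketch of that routing rule; the ~20-minute threshold and the fixed/unfixed split come from the slide, while the broker names and the duration estimates are placeholders.

```python
# Illustrative sketch of the interim two-broker arrangement: short jobs go to
# the RB that sees the "fixed" CEs, long jobs to the "ATLAS" RB that sees the
# "unfixed" CEs. Broker names and durations below are placeholders.

SHORT_JOB_RB = "rb-production"   # points to "fixed" CEs (placeholder name)
LONG_JOB_RB = "rb-atlas"         # points to "unfixed" CEs (placeholder name)
LONG_JOB_THRESHOLD_MIN = 20      # jobs longer than ~20' hit the gass-cache issue

def choose_broker(expected_minutes):
    """Pick the resource broker according to the expected job duration."""
    if expected_minutes > LONG_JOB_THRESHOLD_MIN:
        return LONG_JOB_RB
    return SHORT_JOB_RB

if __name__ == "__main__":
    for minutes in (5, 15, 120, 1200):
        print(f"{minutes:5d} min -> {choose_broker(minutes)}")
```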

  12. Essential requirements for making 1.2.n usable by the broader physics user community
  • Top-level requirements:
    • The production testbed must be stable for weeks, not hours, and allow a spectrum of job submissions.
    • Reasonably easy-to-use basic functions for job submission, replica handling and mass storage utilisation.
    • Good, concise user documentation for all functions.
    • Easy for the user to get certificates and to get into the correct VO working environment.
  • We had very positive discussions this week on our needs in joint meetings with Workpackages 1+2+5+6.
    • The 'gass-cache' problem is the absolute top priority.
    • Can we 'wrap' the data management complexity while waiting for version 2? (GDMP is too complex for the 'average' user.) Maybe use of the interim RM will help.
    • We need to clarify the use of mass storage (Castor, HPSS, RAL store) by multiple VOs, e.g. how the store is partitioned between VOs, and how a non-Grid user accesses the data.

  13. More essential requirements on use of 1.2
  • We must put people and procedures in place for mapping the VO organisation onto testbed sites (e.g. quotas, priorities).
  • We must clarify user support at sites (middleware + applications).
  • Installation of applications software should not be combined with the system installation.
  • Authentication and authorisation: can we streamline this procedure? (40-odd countries to accommodate for ATLAS!)
  • Documentation (+ training: EDG tutorials for the experiments) has to be user-oriented and concise. Much good work is going on here (user guide + examples), about to be released.

  14. Some longer-term requirements
  • Job submission should take into account the availability of space on SEs and the quota assigned to the 'user' (e.g. for macro-jobs, say 500 jobs each generating 1 GB); see the sketch after this slide.
  • Mass Store should be on the Grid in a transparent way (space management, archiving, staging).
  • Need an 'easy to use' replica management system.
  • Comments:
    • Are some of these '1.2.n' rather than '2', i.e. increments in functionality in successive releases?
    • Task Force people should maintain a continuing dialogue with the developers (the dialogue should include the data challenge managers from all VOs).
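As a small illustration of the first requirement, the check below verifies that both the SE free space and the user quota can absorb the expected output of a whole macro-job before anything is submitted. The macro-job size (500 jobs of ~1 GB each) is from the slide; the SE free-space figures and quota numbers are invented for illustration, not real middleware values.

```python
# Illustrative sketch of a space/quota-aware submission check. The free-space
# and quota figures below are hypothetical; only the 500 x 1 GB macro-job
# example comes from the slide.

SE_FREE_GB = {"CERN-SE": 800, "NIKHEF-SE": 300}   # hypothetical free space
USER_QUOTA_GB = {"atlas-user": 600}                # hypothetical per-user quota

def can_submit(user, se, n_jobs, gb_per_job=1.0):
    """True only if both the SE free space and the user quota can hold the
    expected output of the whole macro-job."""
    needed_gb = n_jobs * gb_per_job
    return (SE_FREE_GB.get(se, 0) >= needed_gb and
            USER_QUOTA_GB.get(user, 0) >= needed_gb)

if __name__ == "__main__":
    print(can_submit("atlas-user", "CERN-SE", 500))    # 500 GB fits both limits
    print(can_submit("atlas-user", "NIKHEF-SE", 500))  # exceeds SE free space
```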

  15. Future activities of WP8 and some questions regarding LCG etc.
  • The mandate of WP8 is to facilitate the interfacing of applications to EDG middleware, to participate in the evaluation and to produce the evaluation reports (start writing very soon!).
  • The Loose Cannons have been heavily involved in testing middleware components, and have produced test software and documentation. This should be packaged for use by the Test Group.
  • The LCs will be involved in liaising with the experiments testing their applications. The details of how this relates to the new Testing/Validation procedure have to be worked out.
  • WP8 has been involved in the development of 'application use cases' and participates in current ATF activities. This is continuing.
  • We are interested in the feasibility of a 'common application layer' running over the middleware functions. This issue goes into the domain of current LCG deliberations.
  • More generally, we need to clarify the relationship of WP8 work to the applications work in LCG.

  16. Summary
  • The current WP8 top-priority activity is the ATLAS/EDG Task Force work.
    • This has been very positive. It focuses attention on the real user problems, and as a result we review our requirements, design etc. Remember the eternal cycle! We should not be surprised if we change our ideas. We must maintain flexibility, with a continuing dialogue between users and developers.
  • Will continue Task Force-flavoured activities with the other experiments.
  • Current use of the Testbed is focused on the main sites (CERN, Lyon, Nikhef, CNAF, RAL), mainly for reasons of support.
    • Once stability is achieved (see the ATLAS/EDG work) we will expand to other sites, but we should be careful in the selection of these sites in the first instance. Local support would seem essential.
  • WP8 will maintain a role in architecture discussions, and may be involved in some common application layer developments.
  • THANKS to the members of IT and the middleware WPs for heroic efforts in past months, and to Federico for laying the WP8 foundations.
