
MB Report on SC4 Technical Meeting

A one-day meeting was held at CERN on 21 June to look at technical issues of the current service challenge program. There were 30 registered attendees representing most Tier 1 sites and some UK and German Tier 2 sites; others joined by VRVS.

fgardner

Presentation Transcript


  1. MB Report on SC4 Technical Meeting
     A one-day meeting was held at CERN on 21 June to look at technical issues of the current service challenge program. There were 30 registered attendees representing most Tier 1 sites and some UK and German Tier 2 sites; others joined by VRVS.

  2. Talks/discussions: morning session
     Tier 1 sites were requested to send in advance:
     • Problems seen during the disk-disk and disk-tape transfers and steps taken/planned to address them
     • Problems seen in implementing the agreed services, including a timeline
     • Problems encountered in the gLite 3.0 upgrade
     • Features seen as missing in core services / middleware required for operations
     And these were reported on in the morning session:
     • Understanding Disk - Disk and Disk - Tape Results
     • Problems in setting up basic services
     • Operational Requirements for Core Services
     • Discussion on moving from here to full production services and data rates (based on experiment and DTEAM challenges/tests)

  3. Understanding Disk - Disk and Disk - Tape Results (M. Litmaath)
     • Full April rate of 1.6 GB/sec only reached for one day
     • Very variable ramp-up and stability across the T1 sites
     • FTS log files are vital to discover and analyze problems
        • Remote root login impractical – web access needed
        • srmCopy transfers do not yet have remote logging info sent to FTS logs so are hard to debug
     • Many problems could be detected by sensors at T1 sites
     • Significant failure rates for SRM or gridftp requests
     • Most channels need too many parallel transfers and/or too many streams – not the real experiment use cases - we need to improve the rate per stream
     • Network problems and interventions are not always announced
     • Firewall issues keep coming back
     • GridView needs a publisher for dCache transfer statistics
        • Currently results are only published for CASTOR and DPM sites
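The point about rate per stream can be made concrete with a small calculation. The following is a minimal sketch and not part of the slides: the function name and the channel settings (a 200 MB/sec channel, 40 or 10 parallel files, 10 or 2 streams per file) are hypothetical values chosen only for illustration. It assumes the channel's aggregate rate is shared evenly across all open streams.

```python
def rate_per_stream(aggregate_mb_s, parallel_files, streams_per_file):
    """Average throughput of one stream in MB/sec, assuming an even share."""
    total_streams = parallel_files * streams_per_file
    if total_streams <= 0:
        raise ValueError("need at least one file and one stream")
    return aggregate_mb_s / total_streams


# Hypothetical channel settings, for illustration only (not from the slides).
channel_rate = 200.0  # MB/sec achieved on one T0-T1 channel

# Same aggregate rate reached with many streams versus few streams.
many = rate_per_stream(channel_rate, parallel_files=40, streams_per_file=10)
few = rate_per_stream(channel_rate, parallel_files=10, streams_per_file=2)

print(f"{many:.2f} MB/sec per stream with 400 streams")
print(f"{few:.2f} MB/sec per stream with 20 streams")
```

Under these assumed numbers the same channel rate corresponds to 0.5 MB/sec per stream in the first case and 10 MB/sec per stream in the second, which is the kind of gap the slide refers to when it says the rate per stream needs to improve.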

  4. Problems in setting up basic services (G. McCance)

  5. ATLAS SC plans/requirements
     • Running now till 7 July to demonstrate the complete ATLAS DAQ and first pass processing with distribution of raw and processed data to Tier 1 sites at the full nominal rates. Will also include data flow to some Tier 2 sites and full usage of the ATLAS Distributed Data Management system, DQ2. Raw data to go to tape, processed to disk only. Sites to delete from disk and tape
     • After summer investigate scenarios of recovery from failing Tier 1 sites and deploy cleanup of pools at Tier 0.
     • Later, test distributed production, analysis and reprocessing.
     • DQ2 has a central role with respect to ATLAS Grid tools
        • ATLAS will install local DQ2 catalogues and services at Tier 1 centres
        • Define a region of a Tier 1 and well network connected sites that will depend on the Tier 1 DQ2 catalogue.
        • Expect such (volunteer) Tier 2 to join SC when T0/T1 runs stably
        • ATLAS will delete DQ2 catalogue entries
     • Require VO box per Tier 0 and Tier 1 – done
     • Require LFC server per Tier 1 – done, must be monitored
     • Require FTS server and validated channels per Tier 0 and Tier 1 – close
     • Require ‘durable’ MSS disk area at Tier 1 – few sites have it

  6. ALICE SC Plans
     • Validation of the LCG/gLite workload management services: ongoing
     • Stability of the services is fundamental for the entire duration of the exercise
     • Validation of the data transfer and storage services
     • 2nd phase: July/August T0 to T1 (scratch) at 300 MB/sec
     • The stability and support of the services have to be assured beyond the throughput tests
     • Validation of the ALICE distributed reconstruction and calibration model: Aug/Sep reconstruction at Tier 1
     • Integration of all Grid resources within one single – interfaces to different Grids (LCG, OSG, NDGF)
     • End-user data analysis: September/October

  7. ALICE Requirements/Issues
     • ALICE deploy a VO box at all their T0-T1-T2 sites
        • Installation and maintenance by ALICE
        • Site related problems to be handled by site administrators (have they been notified of these machines?)
     • FTS services required as plugin to AliEn File Transfer Daemon
     • LFC required at all ALICE sites
        • Used as a local catalogue for the site SE
        • ALICE will take care of the LFC updates
     • Require FTS endpoints at the T0 and T1 with SRM enabled storage and automatic data deletion (by the sites) for the 300 MB/sec throughput test (24 to 30 July). Will the SC team set up and test this before handing it over to ALICE?
     • Require site support during the whole tests and beyond:
        • What are the site contacts for the central and distributed support teams, or does everything go through GGUS?
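To give a feel for why automatic deletion matters for the throughput test, the data volume it implies can be estimated. This is a rough sketch and not from the slides: it assumes the 300 MB/sec rate is sustained around the clock, that 24 to 30 July is inclusive (7 days), that the year is 2006 (the slides give no year), and that 1 TB = 10^6 MB. The function name is hypothetical.

```python
from datetime import date


def volume_tb(rate_mb_s, start, end):
    """Volume in TB (1 TB = 10**6 MB) at a sustained rate, start..end inclusive."""
    days = (end - start).days + 1  # count both the first and the last day
    return rate_mb_s * 86_400 * days / 1_000_000


# Year 2006 is an assumption; it does not change the day count.
alice_test = volume_tb(300, date(2006, 7, 24), date(2006, 7, 30))
print(f"{alice_test:.2f} TB over the test window")
```

Under these assumptions the test moves roughly 181 TB out of the T0 in total, shared among the receiving T1 sites, so scratch areas that are not cleaned up would fill quickly.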

  8. CMS SC Plans/Requirements
     • In September/October run CSA06, a 50 million event exercise to test the workflow and dataflow associated with the data handling and data access model of CMS
     • Till end June
        • Continue to try to improve transfer efficiency. Low rates and many errors now.
        • Attempt to hit 25k jobs per day and increase the number and reliability of sites performing at 90% efficiency for job completion
     • In July
        • Demonstrate CMS analysis submitter in bulk mode with the gLite RB
     • In July and August
        • 25M events per month with the production systems
        • Second half of July participate in multi-experiment FTS Tier-0 to Tier-1 transfers at 150 MB/sec out of CERN
        • Continue through August with transfers
     • Requirements:
        • Improve Tier-1 to Tier-2 transfers and the reliability of the FTS channels.
        • We are exercising the channels available to us, but there are still issues with site preparation and reliability
        • The majority of sites are responsive, but there is a lot of work for this summer
        • Require to deploy the LCG-3D infrastructure
        • From late June deploy Frontier for SQUID caches
        • All participating sites should be able to complete the CMS workflow and metrics

  9. LHCb SC Plans/Requirements
     • Will start DC06 challenge at beginning of July using LCG production services and run till end August:
        • Distribution of raw data from CERN to Tier 1s at 23 MB/sec
        • Reconstruction/stripping at Tier 0 and Tier 1
        • DST distribution to CERN and Tier 1s
        • Job prioritisation will be dealt with by LHCb but it is important jobs are not delayed by other VO activities
     • Preproduction for this is ongoing with 125 TB of MC data at CERN
     • Production will go on throughout the year for an LHCb physics book due in 2007
     • Require SRM 1.1 based SEs separated for disk and MSS at all Tier 1 as agreed in Mumbai and FTS channels for all CERN-T1s
     • Data access directly from SE to ROOT/POOL (not just GridFTP/srmcp)
     • Require VO boxes at Tier 1 – so far at CERN, IN2P3, PIC and RAL. Need CNAF, NIKHEF and GridKa
     • Require central LFC catalogue at CERN and read-only copy at certain T1
     • DC06-2 in Oct/Nov requires T1s to run COOL and 3D database services

  10. Experiment Tier 1 Site Requirements

  11. Talks/discussions: afternoon session
     Talks were requested from each experiment to address:
     • What they want to achieve over the next few months with details of the specific tests and production runs.
     • Specific actions, timeline, sites involved.
     • If they have had bad experiences with specific sites then this should be discussed and resolved.
