
LHCb Distributed Computing and the Grid

This presentation provides an overview of LHCb's distributed computing organization, UK facilities and support through GridPP, current use of Globus and EDG middleware, planning for data challenges and the use of the grid, and current LHCb grid applications.


Presentation Transcript


  1. LHCb Distributed Computing and the Grid. Nick Brook, University of Bristol. D. Galli, U. Marconi, V. Vagnoni (INFN Bologna); N. Brook (Bristol); K. Harrison (Cambridge); E. Van Herwijnen, J. Closier, P. Mato (CERN); A. Khan (Edinburgh); A. Tsaregorodtsev (Marseille); H. Bulten, S. Klous (NIKHEF); F. Harris, I. McArthur, A. Soroko (Oxford); G. N. Patrick, G. Kuznetsov (RAL). Nick Brook ACAT' 02

  2. Overview of presentation • Current organisation of LHCb distributed computing • UK facilities and support through GridPP • Current use of Globus and EDG middleware • Planning for data challenges and the use of the Grid • Current LHCb Grid applications R&D • Conclusions Nick Brook ACAT' 02

  3. History of distributed MC production • Distributed System has been running for 3+ years & processed many millions of events for LHCb design. • Main production sites: • CERN, Bologna, Liverpool, Lyon, NIKHEF & RAL • Globus already used for job submission to RAL and Lyon • System interfaced to GRID and demonstrated at EU-DG Review and NeSC/UK Opening. • For 2002 Data Challenges, adding new institutes: • Bristol, Cambridge, Oxford, ScotGrid • In 2003, add • Barcelona, Moscow, Germany, Switzerland & Poland. Nick Brook ACAT' 02

  4. Current Architecture [workflow diagram]. Roles: Physics Coordinator, Production Manager, Physicist. • Job creation/submission via Web: identify outstanding requests, select workflow, create scripts via Java servlets; create the required number of jobs (500 events each), determine configuration, run executable, check data, copy data/logs. • Monitoring via PVSS: see what jobs are running, check configuration, kill jobs, etc. • Bookkeeping Database. Nick Brook ACAT' 02

  5. LOGICAL FLOW [diagram]: submit jobs remotely via Web → execute on farm → data quality check → update bookkeeping database → transfer data to mass store → analysis. Nick Brook ACAT' 02

  6. Monitoring and Control of MC jobs • LHCb has adopted PVSS II as prototype control and monitoring system for MC production. • PVSS is a commercial SCADA (Supervisory Control And Data Acquisition) product developed by ETM. • Adopted as Control framework for LHC Joint Controls Project (JCOP). • Available for Linux and Windows platforms. Nick Brook ACAT' 02

  7. [Figure-only slide.] Nick Brook ACAT' 02

  8. UK Tier 1 - RAL • New computing farm: 4 racks holding 156 dual 1.4 GHz Pentium III CPUs; each box has 1 GB of memory, a 40 GB internal disk and 100 Mb ethernet. • Tape robot upgraded last year; uses 60 GB STK 9940 tapes; 45 TB current capacity, could hold 330 TB. • 50 TByte disk-based mass storage unit after RAID 5 overhead. • PCs are clustered on network switches with up to 8 x 1000 Mb ethernet out of each rack. • 2004 scale: 1000 CPUs, 0.5 PBytes. Nick Brook ACAT' 02

  9. UK Regional Centres [map/diagram]. Local perspective: consolidate research computing; optimisation of number of nodes? Relative size dependent on funding dynamics. Nick Brook ACAT' 02

  10. UK Prototype Tier2 - ScotGrid • ScotGrid processing nodes at Glasgow: • 59 IBM X Series 330, dual 1 GHz Pentium III, 2 GB memory • 2 IBM X Series 340, dual 1 GHz Pentium III, 2 GB memory, dual ethernet • 3 IBM X Series 340, dual 1 GHz Pentium III, 2 GB memory, 100 + 1000 Mbit/s ethernet • 1 TB disk • LTO/Ultrium tape library • Cisco ethernet switches • ScotGrid storage at Edinburgh: • IBM X Series 370 PIII Xeon with 512 MB memory, 32 x 512 MB RAM • 70 x 73.4 GB IBM FC hot-swap HDD • 2004 scale: 300 CPUs, 0.1 PBytes. Nick Brook ACAT' 02

  11. GridPP support • 2 LHCb posts: • to work on Gaudi (software framework) persistency services • to work on MC monitoring and control software • 2 joint ATLAS/LHCb Gaudi/GANGA posts: • interface between the software framework and Grid services Nick Brook ACAT' 02

  12. Current Use of Grid Middleware in development system • Authentication • grid-proxy-init • Job submission to DataGrid • dg-job-submit • Monitoring and control • dg-job-status • dg-job-cancel • dg-job-get-output • Data publication and replication • globus-url-copy, GDMP • Resource scheduling – use of CERN MSS • JDL, sandboxes, storage elements Nick Brook ACAT' 02
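Before any of these commands can be used, the user needs a valid Grid proxy. A minimal sketch of the authentication step, assuming a standard Globus/EDG user-interface machine with the user's certificate already installed:

  # Create a short-lived proxy credential from the user's Grid certificate
  # (prompts for the certificate pass phrase)
  grid-proxy-init

  # Check the proxy that was created (subject, validity remaining)
  grid-proxy-info

The dg-job-* and data-management commands listed above then use this proxy for authentication.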

  13. Example 1: Job Submission

  dg-job-submit /home/evh/sicb/sicb/bbincl1600061.jdl -o /home/evh/logsub/

  bbincl1600061.jdl:

  Executable = "script_prod";
  Arguments = "1600061,v235r4dst,v233r2";
  StdOutput = "file1600061.output";
  StdError = "file1600061.err";
  InputSandbox = {"/home/evhtbed/scripts/x509up_u149", "/home/evhtbed/sicb/mcsend", "/home/evhtbed/sicb/fsize", "/home/evhtbed/sicb/cdispose.class", "/home/evhtbed/v235r4dst.tar.gz", "/home/evhtbed/sicb/sicb/bbincl1600061.sh", "/home/evhtbed/script_prod", "/home/evhtbed/sicb/sicb1600061.dat", "/home/evhtbed/sicb/sicb1600062.dat", "/home/evhtbed/sicb/sicb1600063.dat", "/home/evhtbed/v233r2.tar.gz"};
  OutputSandbox = {"job1600061.txt", "D1600063", "file1600061.output", "file1600061.err", "job1600062.txt", "job1600063.txt"};

  Nick Brook ACAT' 02
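Once dg-job-submit has accepted the job it returns a job identifier, which drives the monitoring and control commands from the previous slide. A minimal sketch of the rest of the job lifecycle, assuming the EDG 1.x tools; <job-identifier> stands for the value returned by dg-job-submit:

  # Query the current state of the job (scheduled, running, done, ...)
  dg-job-status <job-identifier>

  # Once the job is done, retrieve the output sandbox
  # (stdout/stderr plus the files declared in OutputSandbox)
  dg-job-get-output <job-identifier>

  # Cancel the job if it needs to be withdrawn
  dg-job-cancel <job-identifier>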

  14. Example 2: Data Publishing & Replication [diagram]. On the CERN testbed, a job running on a Compute Element writes its data to local disk, copies it to a Storage Element (backed by the MSS) with globus-url-copy, then registers the local file and publishes it to the Replica Catalogue. On the rest of the Grid (here NIKHEF, Amsterdam), a job issues replica-get to pull the data from the Replica Catalogue onto its local Storage Element. Nick Brook ACAT' 02
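The copy step in this picture is an ordinary GridFTP transfer; the registration, publication and replica-get steps appear on the slide only by name, as GDMP/replica-catalogue operations. A hedged sketch of the transfer, with hostname and paths purely illustrative:

  # Copy the job output from local disk on the Compute Element
  # to a Storage Element over GridFTP (hostname and paths are illustrative)
  globus-url-copy file:///data/job1600061/output.dst gsiftp://lhcb-se.cern.ch/flatfiles/lhcb/output.dst

  # The file is then registered locally and published to the Replica Catalogue
  # (the slide labels these steps "register-local-file" and "publish"),
  # and a remote site retrieves it with the "replica-get" operation.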

  15. LHCb Data Challenge 1 (July-September 2002) • Physics Data Challenge (PDC) for detector, physics and trigger evaluations • based on existing MC production system – small amount of Grid tech to start with • Generate ~3 x 10^7 events (signal + specific background + generic b and c + min bias) • Computing Data Challenge (CDC) for checking developing software • will make more extensive use of Grid middleware • Components will be incorporated into PDC once proven in CDC Nick Brook ACAT' 02

  16. LHCb software framework - Gaudi [architecture diagram]. Components: Application Manager; Algorithms; Converters; Transient Event Store, Transient Detector Store and Transient Histogram Store; Event Data Service, Detector Data Service and Histogram Service; Persistency Services reading/writing data files; JobOptions Service; Message Service; Particle Properties Service; other services. Nick Brook ACAT' 02

  17. GANGA: Gaudi ANd Grid Alliance. Joint ATLAS (C. Tull) and LHCb (P. Mato) project, formally supported by GridPP/UK with 2 joint ATLAS/LHCb research posts at Cambridge and Oxford. • An application that lets end-user physicists and production managers use Grid services for running Gaudi/Athena jobs. • A GUI-based application that should help throughout the complete job lifetime: • job preparation and configuration • resource booking • job submission • job monitoring and control. [Diagram: the GANGA GUI sits between the GAUDI program (job options, algorithms) and the collective and resource Grid services, feeding back histograms, monitoring information and results.] Nick Brook ACAT' 02

  18. Required functionality • Before Gaudi/Athena program starts • Security (obtaining certificates and credentials) • Job configuration (algorithm configuration, input data selection, ...) • Resource booking and policy checking (CPU, storage, network) • Installation of required software components • Job preparation and submission • While Gaudi/Athena program is running: • Job monitoring (generic and specific) • Job control (suspend, abort, ...) • After program has finished: • Data management (registration) Nick Brook ACAT' 02

  19. Python Bus Design (a possible model for implementation) [architecture diagram]. A Python software bus connects the GUI to modules such as GaudiPython, PythonROOT, the EDG API, an OS module and a Java module, and through them to the Athena/GAUDI client program and the Grid. A local user works through the GUI; a remote user connects over the Internet via an HTML page. Backend databases: Production DB, Bookkeeping DB, Workspaces DB, Job Configuration DB. Nick Brook ACAT' 02

  20. Conclusions • LHCb already has distributed MC production using Grid facilities for job submission • We are embarking on large-scale data challenges commencing July 2002, and we are developing our analysis model • Grid middleware will be progressively integrated into our production environment as it matures (starting with EDG, and looking forward to GLUE) • R&D projects are in place • for interfacing users (production + analysis) and the Gaudi/Athena software framework to Grid services • for putting the production system into an integrated Grid environment with monitoring and control • All work is being conducted in close participation with the EDG and LCG projects • Ongoing evaluations of EDG middleware with physics jobs • Participation in LCG working groups, e.g. report on 'Common use cases for a HEP Common Application layer' http://cern.ch/fca/HEPCAL.doc Nick Brook ACAT' 02
