
Ying Ying Li

Windows Implementation of LHCb Experiment Workload Management System DIRAC. Ying Ying Li.


Presentation Transcript


  1. Windows Implementation of LHCb Experiment Workload Management System DIRAC

Ying Ying Li

[Figure: the LHC ring, labelled 27 km]

The Experiment

LHCb is one of the four main high energy physics experiments at the Large Hadron Collider (LHC) at CERN, Geneva. LHCb is designed to investigate the matter-antimatter asymmetries seen in the Universe today, concentrating on studies of particles containing a b quark. Once it starts operation in 2007/8, LHCb will need to process data volumes of the order of petabytes per year, requiring tens of thousands of CPUs. To achieve this, a workload management system (DIRAC), implemented in Python, has been developed to allow coordinated use of globally distributed computing resources (the Grid).

DIRAC currently coordinates LHCb jobs running on 6000+ CPUs shared with other experiments, distributed among 80+ sites across 4 continents. DIRAC has demonstrated its capabilities during a series of data challenges held since 2002, with a current record of 10,000+ jobs running simultaneously across the Grid.

Most of the LHCb data-processing applications are tested under both Windows and Linux, but the production system has previously been deployed only on Linux platforms. This project will allow a significant increase in the resources available to LHCb by extending the DIRAC system to also use Windows machines.

Users can create jobs using a Python API or can directly write scripts in DIRAC's Job Definition Language (JDL). In both cases, the user specifies the application to be run, the input data (if required), and any precompiled libraries. Applications developed by LHCb can be combined to form various types of jobs, ranging from production jobs (simulation + digitalisation + reconstruction) to physics analysis.

Jobs are submitted via DISET, the DIRAC security module, which is built from OpenSSL tools and a modified version of pyOpenSSL. Authorisation makes use of certificate-based authentication.
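As a rough illustration of what a user specifies when creating a job, the sketch below collects the application, sandbox files and input data in a small Python class and renders them as JDL-style text. The attribute names (SoftwarePackages, InputSandbox, and so on) are taken from the JDL sample on this poster; the JobSpec class and the to_jdl function are hypothetical helpers written for this example and are not part of the DIRAC API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class JobSpec:
    """Hypothetical container for what a user specifies when creating a job."""
    name: str
    owner: str
    software_packages: List[str]
    input_sandbox: List[str] = field(default_factory=list)
    input_data: List[str] = field(default_factory=list)
    output_sandbox: List[str] = field(default_factory=list)
    job_type: str = "user"


def jdl_list(values: List[str]) -> str:
    """Format a list of strings as a brace-delimited JDL-style list."""
    return "{ " + ", ".join(f'"{value}"' for value in values) + " }"


def to_jdl(spec: JobSpec) -> str:
    """Render the job as 'Attribute = value;' lines, skipping empty lists."""
    lines = [f"SoftwarePackages = {jdl_list(spec.software_packages)};"]
    if spec.input_sandbox:
        lines.append(f"InputSandbox = {jdl_list(spec.input_sandbox)};")
    if spec.input_data:
        lines.append(f"InputData = {jdl_list(spec.input_data)};")
    lines.append(f'JobName = "{spec.name}";')
    lines.append(f'Owner = "{spec.owner}";')
    if spec.output_sandbox:
        lines.append(f"OutputSandbox = {jdl_list(spec.output_sandbox)};")
    lines.append(f'JobType = "{spec.job_type}";')
    return "\n".join(lines)


# Example values reuse the names shown in the poster's JDL sample.
spec = JobSpec(
    name="DaVinci_1",
    owner="yingying",
    software_packages=["DaVinci.v12r15"],
    input_sandbox=["DaVinci.opts"],
    output_sandbox=["std.out", "std.err"],
)
jdl_text = to_jdl(spec)
print(jdl_text)
```

Keeping the description in one object and rendering it at the end mirrors the idea that the Python API and hand-written JDL carry the same information; the real DIRAC client may organise this differently.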
Input files are uploaded to the sandbox service on the DIRAC server.

Create Job

JDL:

    SoftwarePackages = { "DaVinci.v12r15" };
    InputSandbox = { "DaVinci.opts" };
    InputData = { "LFN:/lhcb/production/DC04/v2/00980000/DST/Presel_00980000_00001212.dst" };
    JobName = "DaVinci_1";
    Owner = "yingying";
    StdOutput = "std.out";
    StdError = "std.err";
    OutputSandbox = { "std.out", "std.err", "DaVinci_v12r15.log" };
    JobType = "user";

User API:

    import DIRAC
    from DIRAC.Client.Dirac import *

    dirac = Dirac()
    job = Job()
    # Application name and version to run
    job.setApplication('DaVinci', 'v12r15')
    # Files uploaded with the job
    job.setInputSandbox(['DaVinci.opts'])
    # Grid data, given by logical file name (LFN)
    job.setInputData(['LFN:/lhcb/production/DC04/v2/00980000/DST/Presel_00980000_00001212.dst'])
    # Files returned to the user when the job finishes
    job.setOutputSandbox(['DaVinci_v12r15.log'])
    dirac.submit(job)

Monitoring

Users are able to monitor job progress from the monitoring web page: http://lhcb.pic.es/DIRAC/Monitoring/Analysis

Submit Job

Once a job reaches the DIRAC server it is checked for requirements placed by the owner, and waits for a suitably matched Agent from a free resource. DIRAC Agents act to link the distributed resources together. When a resource is free to process jobs it sends out a locally configured Agent, with the specifications of the resource, to request jobs from the central server. After a suitable job and Agent are matched:

• The Agent retrieves the job's JDL and sandbox, wraps the job in a Python script, and reports back.
• If the resource is not a standalone CPU, the resource backend (LCG, Windows Compute Cluster, Condor, etc.) is checked, and the wrapper is submitted accordingly.
• Any required application is downloaded and installed when necessary.
• Any required Grid data is downloaded using the GridFTP protocol and the LFC (LCG File Catalogue), for example from the CERN Castor system.
• The job is run, and its progress is reported.
• Any requested data transfers are performed.

Applications

DIRAC can be tailored to allow running of any type of application. The important applications for LHCb are based on a C++ framework called Gaudi.
GAUSS – Monte Carlo generator for simulation of particle collisions in the detector.
Boole – Produces detector response to GAUSS 'hits'.
Brunel – Reconstruction of events from Boole/detector.
DaVinci – Physics Analysis (C++).
Bender – Physics Analysis (Python, using bindings to C++).

[Figure: Agents linking resources (LHC Computing Grid clusters, standalone desktops, laptops, ...) to CASTOR storage, with data transfer via GridFTP]

Now + Future

The process of porting DIRAC to Windows has involved work in several areas, which include automated installation of DIRAC, DISET (the security module), automated download, installation and running of LHCb applications, and secure data transfer with .NetGridFTP. The result is a fully operational DIRAC system that is easily deployable in a Windows environment, and that allows an authorised user to submit jobs and to offer the CPU as an available resource to the LHCb experiment alongside Linux resources.

The work described has been developed on a Windows Compute Cluster consisting of four Shuttle SN95G5 boxes, running Windows Server 2003 Compute Cluster Edition software. This has also assisted in the extension of DIRAC's Compute Cluster backend computing element module. Tests have also been made on a Windows XP laptop, which demonstrate the flexibility and ease of deployment. The system has been deployed on a small cluster at Cambridge and Bristol, and on a larger cluster (~100 CPUs) at Oxford.

This project displays the platform independence of DIRAC and its potential. The DIRAC system has been used successfully with a subset of the LHCb applications. Current work focuses on deploying and testing the full set of LHCb applications under Windows to allow the running of production jobs.
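The way Windows machines join the pool alongside Linux ones can be pictured with a small sketch of the Agent pull model: a free resource reports its specifications, and the central server hands back the first waiting job whose requirements fit. Everything below (ResourceSpec, QueuedJob, CentralQueue, the platform strings and site names) is hypothetical and chosen for illustration; it is not DIRAC's actual matching code, which this poster does not describe in detail.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ResourceSpec:
    """What a hypothetical Agent reports about its free resource."""
    site: str
    platform: str  # e.g. "Windows" or "Linux"
    free_cpus: int


@dataclass
class QueuedJob:
    """A waiting job and the (assumed) requirements placed by its owner."""
    job_id: int
    platforms: List[str]  # platforms the job's application can run on
    cpus: int = 1


class CentralQueue:
    """Toy stand-in for the central server's list of waiting jobs."""

    def __init__(self) -> None:
        self._waiting: List[QueuedJob] = []

    def submit(self, job: QueuedJob) -> None:
        self._waiting.append(job)

    def request_job(self, resource: ResourceSpec) -> Optional[QueuedJob]:
        """Return and remove the first waiting job the resource can run."""
        for index, job in enumerate(self._waiting):
            fits_platform = resource.platform in job.platforms
            fits_cpus = job.cpus <= resource.free_cpus
            if fits_platform and fits_cpus:
                return self._waiting.pop(index)
        return None  # nothing suitable; the Agent would ask again later


queue = CentralQueue()
queue.submit(QueuedJob(job_id=1, platforms=["Linux"]))
queue.submit(QueuedJob(job_id=2, platforms=["Windows", "Linux"]))

# A Windows resource skips the Linux-only job and takes the next match.
windows_agent = ResourceSpec(site="example-laptop", platform="Windows", free_cpus=1)
first_match = queue.request_job(windows_agent)
second_match = queue.request_job(windows_agent)

linux_agent = ResourceSpec(site="example-cluster", platform="Linux", free_cpus=4)
linux_match = queue.request_job(linux_agent)
```

Because the resource asks for work rather than being assigned it, adding a new platform mainly means that jobs able to run there are no longer skipped when such an Agent calls in.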
