Distributed Analysis using GANGA

Learn about GANGA, an easy-to-use frontend that allows for job definition and management in distributed analysis. Explore its user interface options, supported applications, and component architecture.

rbarger


Presentation Transcript


  1. Distributed Analysis using GANGA Dietrich Liko

  2. Overview • What is GANGA ? • Why a User Interface ? • How does GANGA work ? • User Interface • Command line • GUI • Scripts • GANGA Usage • ATLAS • LHCb

  3. What is GANGA ? • Ganga is an easy-to-use frontend for job definition and management • Allows simple switching between testing on a local batch system and large-scale data processing on distributed resources (Grid) • Developed in the context of ATLAS and LHCb • For ATLAS, has built-in support for applications based on the Athena framework, for JobTransforms, and for the DQ2 data-management system • For LHCb, has built-in support for applications built on the Gaudi framework and for the DIRAC middleware • Component architecture readily allows extension • Implemented in Python

  4. Who is GANGA ? • Ganga is an ATLAS - LHCb joint project • Support for development work from UK (PPARC/GridPP), Germany (D-Grid) and EU (EGEE/ARDA) • Core team U.Egede (Imperial), K.Harrison (Cambridge), D.Liko (CERN), A.Maier (CERN), J.T.Moscicki (CERN), A.Soroko (Oxford), C.L.Tan (Birmingham), A. Muraru (CERN), J.Elmshäuser (Munich)

  5. GANGA Job • Application: what to run • Backend: where to run • Input Dataset: data read by the application • Output Dataset: data written by the application • Splitter: rule for dividing the job into subjobs • Merger: rule for combining outputs
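
The job structure on this slide can be illustrated with a small, self-contained Python sketch. It does not use the Ganga API: `SketchJob`, `UppercaseApplication`, `InProcessBackend`, `RoundRobinSplitter` and `SortedMerger` are hypothetical stand-ins chosen only to show how the six components might fit together.

```python
from dataclasses import dataclass, field
from typing import List


class UppercaseApplication:
    """What to run (hypothetical): upper-case every input item."""

    def run(self, items: List[str]) -> List[str]:
        return [item.upper() for item in items]


class InProcessBackend:
    """Where to run (hypothetical): execute in the current process."""

    def execute(self, application: UppercaseApplication, items: List[str]) -> List[str]:
        return application.run(items)


@dataclass
class RoundRobinSplitter:
    """Rule for dividing the input dataset into subjobs."""

    n_subjobs: int

    def split(self, items: List[str]) -> List[List[str]]:
        # Item i goes to subjob i % n_subjobs.
        return [items[i::self.n_subjobs] for i in range(self.n_subjobs)]


class SortedMerger:
    """Rule for combining the subjob outputs into one output dataset."""

    def merge(self, outputs: List[List[str]]) -> List[str]:
        return sorted(item for output in outputs for item in output)


@dataclass
class SketchJob:
    """Illustrative job object; not the real Ganga Job class."""

    application: UppercaseApplication
    backend: InProcessBackend
    input_dataset: List[str]
    splitter: RoundRobinSplitter
    merger: SortedMerger
    output_dataset: List[str] = field(default_factory=list)

    def submit(self) -> List[str]:
        # Split the input, run each subjob on the backend, merge the results.
        chunks = self.splitter.split(self.input_dataset)
        outputs = [self.backend.execute(self.application, chunk) for chunk in chunks]
        self.output_dataset = self.merger.merge(outputs)
        return self.output_dataset


job = SketchJob(
    application=UppercaseApplication(),
    backend=InProcessBackend(),
    input_dataset=["a", "b", "c", "d", "e"],
    splitter=RoundRobinSplitter(n_subjobs=2),
    merger=SortedMerger(),
)
print(job.submit())
```

Because the backend is just one attribute of the job, swapping it for a different object would leave the application, splitter and merger untouched, which is the point of the component design.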

  6. GANGA Building Blocks • Ganga has built-in support for ATLAS and LHCb • Component architecture allows customization for other user groups • User interface for job definition and management • Applications: LHCb applications, ATLAS applications, other applications • Tools for data management: metadata catalogues, file catalogues, data storage and retrieval • Ganga job archives: local repository, remote repository • Ganga monitoring loop • Processing systems (backends): local batch systems, distributed (Grid) systems, experiment-specific workload-management systems

  7. Backends and Applications • Applications: AthenaMC (Production); Athena (Simulation/Digitisation/Reconstruction/Analysis); Gauss/Boole/Brunel/DaVinci (Simulation/Digitisation/Reconstruction/Analysis); Executable • Backends: PBS, LSF, OSG, PANDA, LHCb WMS, US-ATLAS WMS • Status legend: Implemented, Coming soon

  8. More than 300 GANGA Users

  9. GANGA Users

  10. GANGA Activities • Main Users • Other activities: Garfield, HARP

  11. About 50 GANGA Domains

  12. Different working styles • Command Line Interface in Python (CLIP) provides interactive job definition and submission from an enhanced Python shell (IPython) • Especially good for trying things out and seeing how the system works • Scripts, which may contain any Python/IPython or CLIP commands • Allow automation of repetitive tasks • Scripts included in the distribution enable the kind of approach traditionally used when submitting jobs to a local batch system • Graphical User Interface (GUI) allows job management based on mouse selections and field completion • Lots of configuration possibilities

  13. Command Line Interface • IPython: comfortable Python shell with many useful extensions (http://ipython.scipy.org/) • CLIP: GANGA command line interface • Jobs are Python objects • How to define a job ? j = Job(); j.application = Executable(); j.application.exe = '/bin/echo'; j.application.args = ['Hello World']; j.backend = LCG(); j.submit()

  14. Scripts Example from ATLAS ganga athena --inDS trig1_misal1_csc11.005033.Jimmy_jetsJ4.recon.AOD.v12000601 --outputdata AnalysisSkeleton.aan.root --split 3 --maxevt 100 --lcg --ce ce102.cern.ch:2119/jobmanager-lcglsf-grid_2nh_atlas AnalysisSkeleton_topOptions.py

  15. GUI

  16. ATLAS Computing Model • Event Filter Farm at CERN • Located near the Experiment, assembles data into a stream to the Tier 0 Center • Tier 0 Center at CERN • Raw data → Mass storage at CERN and to Tier 1 centers • Swift production of Event Summary Data (ESD) and Analysis Object Data (AOD) • Ship ESD, AOD to Tier 1 centers → Mass storage at CERN • Tier 1 Centers distributed worldwide (10 centers) • Re-reconstruction of raw data, producing new ESD, AOD • Scheduled, group access to full ESD and AOD • Tier 2 Centers distributed worldwide (approximately 30 centers) • Monte Carlo Simulation, producing ESD, AOD; ESD, AOD → Tier 1 centers • On demand user physics analysis • CERN Analysis Facility • Analysis • Heightened access to ESD and RAW/calibration data on demand • Tier 3 Centers distributed worldwide • Physics analysis

  17. ATLAS Distributed Analysis • Data is being distributed to the sites • AOD: To all Tier-1 sites, further distribution to the Tier-2 • ESD: To two Tier-1 sites, only small subsets to Tier-2 • Jobs are sent to the data • Tier-2 sites are the main resource for User Analysis • TAG based analysis will reduce the need for IO • Some important event characteristics will be stored in DB • POOL/Root file based (Tier-2) • Database based (Tier-1 ?) • Two main activities • GANGA on LCG • Pathena/PANDA on OSG
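
The idea that jobs go to the data, rather than data to the jobs, can be sketched as a simple site-selection rule. This is a minimal illustration under stated assumptions, not how any ATLAS broker is implemented: the function names, the replica catalogue layout, the site names and the dataset name are all hypothetical.

```python
from typing import Dict, List, Set


def sites_holding(dataset: str, replica_catalog: Dict[str, Set[str]]) -> List[str]:
    """Return the sites that hold a replica of the dataset, in sorted order."""
    return sorted(replica_catalog.get(dataset, set()))


def choose_site(dataset: str,
                replica_catalog: Dict[str, Set[str]],
                free_slots: Dict[str, int]) -> str:
    """Pick a site that already holds the data and has free job slots."""
    candidates = [site for site in sites_holding(dataset, replica_catalog)
                  if free_slots.get(site, 0) > 0]
    if not candidates:
        raise LookupError(f"no site with free slots holds {dataset!r}")
    # Among sites holding the data, prefer the one with the most free slots.
    return max(candidates, key=lambda site: free_slots[site])


# Hypothetical catalogue: which sites hold which dataset.
catalog = {"example.AOD": {"TIER2_A", "TIER2_B"}, "example.ESD": {"TIER1_X"}}
# Hypothetical load information; TIER2_C is idle but does not hold the data.
slots = {"TIER2_A": 5, "TIER2_B": 20, "TIER2_C": 100, "TIER1_X": 0}

print(choose_site("example.AOD", catalog, slots))
```

Note that the idle site without a replica is never considered: data location restricts the candidates first, and load only breaks the tie among them.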

  18. ATLAS infrastructure • ATLAS uses several Grid Infrastructures • EGEE/LCG • OSG • Nordugrid • Workload management • LCG RB • PANDA • ARC middleware • Data management • Don Quijote • Uses FTS and grid-specific file catalogs

  19. Workload management • EGEE - Resource Broker • Push model • New gLite RB not yet available for analysis • New players have entered the field … • PANDA • Pull model • Similar to AliEn and DIRAC • Integrated with DDM • ARC • Push model • Gatekeeper regulates the access to data

  20. Data management with DQ2 • Central dataset catalog • Local file catalogs: LFC, MySQL, RLS • Grids: OSG, EGEE, Nordugrid

  21. Who is ATLAS GANGA ? • GANGA Core: U. Egede, K. Harrison, J. Moscicki, A. Soroko, V. Romanovsky, A. Muraru • GANGA GUI: C.L. Tan • Athena AOD analysis: J. Elmshäuser • Tag Navigator: M. Kenyon, C. Nicholson • User production: F. Brochu • EGEE/LCG: H.-C. Lee, D. Liko • Nordugrid: P. Katarina, B. Hallvard • PANDA: D. Liko + support from PANDA • AMI Integration: F. Fassi, C.L. Tan + support from AMI • MonaLisa Monitoring: B. Gaidioz, J. Yu, T. Reddy

  22. GANGA features • Current situation • GANGA on EGEE/LCG • pathena/PANDA on OSG • In upcoming release GANGA 4.3 • Support for gLite RB • Support for PANDA (OSG ATLAS) • Support for ARC (Nordugrid) • Additional features • AMI Metadata • MonaLisa based application monitoring • Tag navigator for event selection on DB

  23. LHCb Computing Model

  24. Analysis with DIRAC • User sends job to the DIRAC WMS • DIRAC sends a pilot job to an LCG site • Only if the pilot is running well is the job pulled from the task queue • Small files are returned via the sandbox, large files are registered in LFC • User interacts only with DIRAC and is shielded from RB problems
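
The pilot-job pull model described on this slide can be sketched in a few lines of Python. This is an illustration only and does not use any DIRAC code: `TaskQueue`, `run_pilot`, the health check and the site names are hypothetical.

```python
from collections import deque
from typing import Callable, Iterable, Optional


class TaskQueue:
    """Central queue of user jobs waiting to be pulled (hypothetical)."""

    def __init__(self, jobs: Iterable[str]) -> None:
        self._jobs = deque(jobs)

    def pull(self) -> Optional[str]:
        # Hand out the oldest waiting job, or None if the queue is empty.
        return self._jobs.popleft() if self._jobs else None

    def __len__(self) -> int:
        return len(self._jobs)


def run_pilot(site: str,
              queue: TaskQueue,
              site_is_healthy: Callable[[str], bool]) -> Optional[str]:
    """A pilot starts on a site and pulls a user job only if the site works."""
    if not site_is_healthy(site):
        # The pilot exits; the user job stays safely in the queue.
        return None
    return queue.pull()


queue = TaskQueue(["job-1", "job-2"])


def healthy(site: str) -> bool:
    # Hypothetical check standing in for the pilot's environment tests.
    return site != "broken-site"


print(run_pilot("broken-site", queue, healthy), len(queue))
print(run_pilot("good-site", queue, healthy), len(queue))
```

A failed pilot costs nothing from the user's point of view, because no user job was ever bound to the broken site; in a push model the job itself would have been sent there and failed.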

  25. Analysis of 5 million events

  26. Throughput • 90 % of the results within 3 hours • 95 % of the results after 4 hours • 100 % after 10 hours • The delay was caused by a problem accessing the data on a Tier-1

  27. Reliability • DIRAC • Problems related to file registration • Resource Broker • Submission to all sites gave a bad success rate • Submission to a well-working Tier-1 site gave results close to DIRAC submissions

  28. Summary • User Analysis based on GANGA is progressing well • More than 300 people have tried GANGA this year • Up to 50 users on a daily basis • ATLAS and LHCb use the same framework, but different plugins • The flexibility of the framework is an insurance for future middleware developments • The GANGA model has brought together a good collaboration between various development teams • Try it yourself • Experiment-specific tutorials on a regular basis • ATLAS Distributed Analysis Tutorial in Lyon • Interest in including GANGA in the EGEE Dissemination • First tutorial in Taipei
