
Introduction to Distributed Analysis



  1. Introduction to Distributed Analysis Dietrich Liko

  2. Overview • Introduction to Grid Computing • Three grid flavors in ATLAS • EGEE • OSG • Nordugrid • Distributed Analysis Activities • GANGA/LCG • PANDA/OSG • Other tools • How to find your data? • Where is the data stored? • Which data is really available?

  3. Evolution of CERN computing • 1958: Ferranti Mercury – 2 years to build, 3 months to install, 320 kBytes of storage, less computing power than today’s calculators • 1967: CDC 6400 • 1976: IBM 370/168 • 1988: IBM 3090, DEC VAX, Cray X-MP • 2001: PC farm • The scope and complexity of particle-physics experiments has increased in parallel with increases in computing power • Massive upsurge in computing requirements in going from LEP to LHC

  4. Strategy for processing LHC data • Majority of data processing (reconstruction/simulation/analysis) for the LEP experiments was performed at CERN • About 50% of physics analyses ran at collaborating institutes • A similar approach might have been possible for LHC • Increase data-processing capacity at CERN • Take advantage of Moore’s Law increase in CPU power and storage • The LHC Computing Review (CERN/LHCC/2001-004) discouraged the LEP-type approach • It rules out access to funding not available to CERN • It makes poor use of expertise and resources at collaborating institutes • A solution for managing distributed data and CPUs is required: Grid computing → the LHC Computing Grid (LCG) project started in 2002

  5. Grid Computing • Ideas behind Grid computing have been around since the 1970s, but became very fashionable around the turn of the century • “A computational grid is a hardware and software infrastructure that provides dependable, consistent, pervasive, and inexpensive access to high-end computational capabilities.” – Ian Foster and Carl Kesselman, The Grid: Blueprint for a New Computing Infrastructure (1998) • First release of the Globus Toolkit for Grid infrastructures made in 1998 • World Wide Web commercially attractive by the late 1990s • e-Everything suddenly in vogue: e-mail, e-Commerce, e-Science • Dot-com bubble 1998-2002 • Grid proposed as an evolution of the World Wide Web: access to resources as well as to information • Many projects: • EGEE, OSG, Nordugrid • GridPP, INFN Grid, D-Grid

  6. Distributed Analysis • Data Analysis • AOD & ESD analysis • TAG based analysis • pathena/PANDA • GANGA/LCG • User Production • Prodsys • LJSF • GANGA (DQ2 Integration)

  7. EGEE • Job submission via LCG Resource Broker • The new gLite RB is on its way … • LFC File catalog • Also CondorG submission is possible • Requires some expertise and has no support from the service provider • New approach using Condor glideins is under investigation (Cronus)

  8. Resource Broker Model • [Diagram: jobs are submitted through Resource Brokers (RB), which dispatch them to the Computing Elements (CE)]

  9. OSG/Panda • PANDA is an integrated production and distributed analysis system • Pilot job based • Similar to DIRAC & Alien • Simple File Catalogs at sites • Will be supported by GANGA in release 4.3

  10. Three grids … • ATLAS is using three large infrastructures • EGEE • OSG • Nordugrid • The grids have different middleware • Different software to submit jobs • Different catalogs to store the data • We have to aim to hide these differences from the ATLAS user

  11. Panda Model • [Diagram: a central task queue feeds pilot jobs running on the Computing Elements (CE)]
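
The pilot/task-queue model in the diagram can be illustrated with a small conceptual sketch in Python. This is not PANDA code or its API; the queue, the submit helper and the pilot loop are all invented for the illustration. The point is that user jobs are not pushed to a specific site: a central queue holds the payloads, and generic pilots running on the CEs pull work from it.

    import queue
    import subprocess

    task_queue = queue.Queue()            # stands in for the central PANDA task queue

    def submit(payload_cmd):
        # User jobs are not sent to a site; they are added to the central queue.
        task_queue.put(payload_cmd)

    def pilot():
        # A pilot running on a CE pulls the next payload and executes it.
        while not task_queue.empty():
            cmd = task_queue.get()
            subprocess.run(cmd, shell=True, check=False)

    submit("echo analysis job 1")
    submit("echo analysis job 2")
    pilot()                               # in reality many pilots run in parallel at many sites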

  12. Nordugrid • ARC middleware for job submission • Powerful and simple • RLS Filecatalog • Will be supported by GANGA in release 4.3

  13. ARC Model • [Diagram: jobs are submitted directly to the Computing Elements (CE)]

  14. How can we live with that? • A data management layer hides these differences – Don Quixote 2 (DQ2) • Tools that aim to hide the difficulties of submitting jobs • pathena/PANDA on OSG • GANGA on LCG • In the future, better interoperability • At the level of the ATLAS tools • At the level of the middleware

  15. pathena/PANDA • Lightweight client • Integrated into the Athena release • Very nice work • A lot of work has been done to better support user jobs • Short queues, multitasking pilots etc. • A large set of data is available • Available for some time already

  16. GANGA/LCG • Text UI & GUI • A pathena-like interface is available • Multiple backends • LCG/EGEE • LSF – works also with CAT queues • PBS • PANDA & Nordugrid for 4.3 • And others

  17. Dashboard Monitoring • We are setting up a framework to monitor distributed analysis jobs • MonaLisa based (OSG, LCG) • RGMA • Imperial College DB • Production system • http://dashboard.cern.ch/atlas • GANGA has been instrumented to understand its usage

  18. Since September 1st …

  19. Dataset distribution • In principle the data should be everywhere • AOD & ESD during this year: ~30 TB max • Three steps • Not all data can be consolidated • Other grids, Tier-2s • Distribution between the Tier-1s is not yet perfect • Distribution to the Tier-2s can only be the next step

  20. Latest numbers by Alexei – Feb 27

      Site      Files requested   Files copied   Transferred (%)   Waiting(*)   Transferred in 7 days
      ASGC      5604              1883           33.6              53           1883
      BNL       1891              1532           81.0               5             24
      CERN      5587              5489           98.2               1           2581
      CNAF      5610              2801           49.9              12           1111
      FZK       5645              5541           98.2               0           2668
      LYON      5529              5464           98.8               0           2643
      NDGF      4822              3116           64.6              10            893
      NIKHEF    5700              5471           96.0               1           2563
      PIC       5787              2362           40.8              32           2617
      RAL       5763              3903           67.7              12             30
      TRIUMF    5744              3740           65.1              13            843

  The mileage varies between 33.6% and 98.2%.

  21. Monitoring of transfers

  22. Why can I not send the jobs to the data automatically? • I advise you to send jobs to selected sites • This is not the final word; it is just a way to address the current situation • ATLAS is using a dataset concept • Datasets have a content • Datasets have one or more locations • Datasets can be complete or incomplete at a location • Only complete datasets can be used in a dataset-based brokering process (sketched below) • We are currently trying to understand • How much data is available as complete datasets • Can we do file-based brokering for incomplete datasets? • We have made big progress in the last months, but not everything is working yet as we would like
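
A minimal sketch of the dataset-based brokering rule just described: a job may only be sent to sites that hold a complete replica of the requested dataset. The replica dictionary and the helper function are invented for illustration; the real decision is taken inside the DQ2/GANGA/PANDA tooling.

    # Hypothetical replica catalogue: dataset -> {site: replica is complete?}
    replicas = {
        "trig1_misal1_csc11.005033.Jimmy_jetsJ4.recon.AOD.v12000601": {
            "CERN": True,      # complete copy
            "FZK": True,       # complete copy
            "ASGC": False,     # incomplete copy
        },
    }

    def candidate_sites(dataset, require_complete=True):
        """Return the sites a job may be brokered to for this dataset."""
        sites = replicas.get(dataset, {})
        if require_complete:
            # Dataset-based brokering: only complete replicas qualify.
            return [site for site, complete in sites.items() if complete]
        # File-based brokering (under study): incomplete replicas also count.
        return list(sites)

    print(candidate_sites("trig1_misal1_csc11.005033.Jimmy_jetsJ4.recon.AOD.v12000601"))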

  23. How to find out which data exists • AMI Metadata • http://lpsc1168x.in2p3.fr:8080/opencms/opencms/AMI/www/index.html • Prodsys database • http://cern.ch/atlas-php/DbAdmin/Ora/php-4.3.4/proddb/monitor/Datasets.php • Dataset browser • http://panda.atlascomp.org/?overview=dslist

  24. How to access data? • Download with dq2_get and analyze locally • Works (sometimes), but is not scalable • Data is distributed over the sites, and jobs are sent to the sites to analyze the data • DA is promoting this way of working • The process of finding the data will be fully automated in due course

  25. POSIX-like IO • DA wants to read data directly from the SE • Prodsys downloads the data using gridftp • Use rfio, dcap, GFAL, xrootd • We want to use POSIX-like IO (see the sketch below) • The local disk available to the job is limited in size • We do not need the full event • We do not need all events • As of today, ATLAS AOD jobs read data at ~2 MB/s
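
The POSIX-like access mentioned above is what ROOT’s protocol plugins provide: the same open call works whether the file is local or on an SE. A minimal PyROOT sketch, assuming a ROOT build with the corresponding plugins/libraries installed; the server names and file paths are placeholders, not real locations.

    import ROOT   # PyROOT

    # ROOT picks the IO plugin from the URL scheme; the job never copies the file.
    f_rfio  = ROOT.TFile.Open("rfio:///castor/example.org/user/x/example/AOD.pool.root")
    f_dcap  = ROOT.TFile.Open("dcap://dcache-door.example.org/pnfs/example/AOD.pool.root")
    f_xroot = ROOT.TFile.Open("root://xrootd.example.org//data/example/AOD.pool.root")

    for f in (f_rfio, f_dcap, f_xroot):
        if f and not f.IsZombie():
            f.ls()    # read the file's directory structure remotely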

  26. Analysis jobs • Today one job reads 10 to 100 AOD files, 130 MB each • 1 year of LHC running: ~150 TB of AOD according to the ATLAS computing model • With a file size of 10 GB that is still of the order of 10,000 files • Back-navigation • Reduces IO • Increases the load on the SE due to more “open” calls
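
A quick check of the “order of 10,000 files” figure, using only the numbers on the slide:

    aod_per_year_tb = 150                           # AOD per year of LHC running
    file_size_gb = 10                               # assumed file size from the slide
    n_files = aod_per_year_tb * 1000 / file_size_gb
    print(n_files)                                  # 15000.0 -> order of 10^4 files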

  27. Some measurements • 10 files of 130 MB each • Standard analysis example • Local: 14:02 min • DPM using rfio: 16:30 min • Castor-2: 20:29 min • 150 TB: about 1000 days
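
The “about 1000 days” figure follows from the measured single-stream read rate; a rough back-of-the-envelope calculation with the numbers above (one job, no parallelism):

    data_mb = 10 * 130                      # 10 files of 130 MB each
    seconds = 14 * 60 + 2                   # local case: 14:02 min
    rate_mb_s = data_mb / seconds           # ~1.5 MB/s, consistent with the ~2 MB/s quoted earlier
    total_mb = 150e6                        # 150 TB of AOD for one year
    days = total_mb / rate_mb_s / 86400
    print(round(rate_mb_s, 2), round(days)) # ~1.54 MB/s and ~1100 days for a single stream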

  28. DPM in Glasgow

  29. Athena jobs • Athena uses POOL/ROOT • Many issues concerning plugins and current configuration • See Wiki page • https://twiki.cern.ch/twiki/bin/view/Atlas/IssuesWithPosixIO

  30. Highlights • dCache • Wrong dCache library (except BNL) • DPM • Need to provide a symbolic link (libdpm.so -> libshift.so) • Broken RFIO plugin • DPM URLs not supported • Castor • New Castor syntax not supported • No files larger than 2 GB • Some issues will go away with v13 • The RFIO plugin will still be outdated • The new rfio library is not yet released • We need to do systematic tests • Proposed by Stephane

  31. Backporting the ROOT RFIO plugin • Advantages • New syntax à la Castor-2 • Large files > 2 GB • Problems with DPM • A different URL format • Some problems querying the file attributes • Several patches required to make it work • A security context is required, but since last week the Grid UI clashes with Athena due to the Python version • A new RFIO plugin is under development inside ROOT • In general, new ROOT IO plugins should be backported to agreed ROOT versions

  32. Short queues • Distributed Analysis competes with Production • Short queues can be used to speed up the analysis • There is a lot of discussion going on about how useful short queues are • Empirically, I prefer to send jobs to short queues • https://twiki.cern.ch/twiki/bin/view/Atlas/DAGangaFAQ#How_to_find_out_suited_Computing • Selecting the queues is the easy part; selecting the dataset location is the complicated aspect • Fully automatic for complete datasets

  33. Summary • Several tools are available to perform Distributed Analysis • Integrated with DQ2 • Data is being collected and also distributed • Still a lot of work in front of us • We are learning how to access data everywhere • How to find data • How to read data • Not fully automatic yet • But we aim for that • We learn how to handle user jobs • Job Priorities on LCG • Short Queues

  34. Next steps • Increase the number of sites • We have to push to get the data to all Tier-1s; they are the backbone of the ATLAS data distribution • Interoperability • Will for sure be an issue this year • GANGA will send jobs to other sites • PANDA will run on LCG • Cronus wants to bridge all resources

  35. GANGA Introduction

  36. Who is ATLAS GANGA? • GANGA Core: Ulrik Egede, Karl Harrison, Jakub Moscicki, A. Soroko, V. Romanovsky, Adrina Murao • GANGA GUI: Chun Lik Tan • Athena AOD analysis: Johannes Elmsheuser • Tag Navigator: Mike Kenyon, Caitherina Nicholson • User production: Fredric Brochu • EGEE/LCG: Hurng-Chun Lee, Dietrich Liko • Nordugrid: Pajchel Katarina, Bjoern Hallvard • PANDA: Dietrich Liko + support from PANDA • Cronus: Rod Walker • AMI Integration: Farida Fassi, Chun Lik Tan + support from AMI • MonaLisa Monitoring: Benjamin Gaidioz, Jae Yu, Tummalapalli Reddy

  37. What is GANGA? • Ganga is an easy-to-use frontend for job definition and management • Allows simple switching between testing on a local batch system and large-scale data processing on distributed resources (Grid) • Developed in the context of ATLAS and LHCb • For ATLAS • Athena framework • JobTransformations • DQ2 data-management system • EGEE/LCG • For release 4.3 • AMI • PANDA/OSG • Nordugrid • Cronus • The component architecture readily allows extension • Implemented in Python
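
The “simple switching” between a local batch system and the Grid amounts to changing the job’s backend object. A short CLIP sketch, run inside a Ganga session and using only commands that appear elsewhere in this talk (the LSF and LCG backends are both mentioned on other slides; any site-specific settings are omitted):

    j = Job()
    j.application = Executable()
    j.application.exe = '/bin/echo'
    j.application.args = ['Hello World']

    j.backend = LSF()        # first test on the local batch system
    j.submit()

    j2 = j.copy()            # identical job definition ...
    j2.backend = LCG()       # ... now destined for the Grid
    j2.submit()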

  38. Users

  39. Domains

  40. GANGA Job Abstraction • A Job is composed of: • Application – what to run • Backend – where to run • Input Dataset – data read by the application • Output Dataset – data written by the application • Splitter – rule for dividing into subjobs • Merger – rule for combining outputs
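
A hedged CLIP sketch of how these pieces sit on one Job object. Job(), Executable()/Athena and LCG() appear elsewhere in this talk; the dataset, splitter and merger class names below (DQ2Dataset, ATLASOutputDataset, AthenaSplitterJob, AthenaOutputMerger) are illustrative stand-ins for whatever ATLAS plugins your Ganga release actually provides.

    j = Job()

    j.application = Athena()                       # what to run (Athena application plugin)
    j.application.option_file = 'AnalysisSkeleton_topOptions.py'
    j.application.max_events = 100

    j.inputdata = DQ2Dataset()                     # data read by the application (assumed class name)
    j.inputdata.dataset = 'trig1_misal1_csc11.005033.Jimmy_jetsJ4.recon.AOD.v12000601'

    j.outputdata = ATLASOutputDataset()            # data written by the application (assumed class name)
    j.outputdata.outputdata = ['AnalysisSkeleton.aan.root']

    j.splitter = AthenaSplitterJob()               # rule for dividing into subjobs (assumed class name)
    j.merger = AthenaOutputMerger()                # rule for combining outputs (assumed class name)

    j.backend = LCG()                              # where to run
    j.submit()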

  41. Framework for plugins • All components derive from GangaObject • Plugin interfaces: IApplication, ISplitter, IMerger, IDataset, IBackend • Example plugins and schemas: • Athena (application): atlas_release, max_events, options, option_file, user_setupfile, user_area • LCG (backend): CE, requirements, jobtype, middleware (user attributes); id, status, reason, actualCE, exitcode (system attributes)
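
To illustrate the plugin idea (and nothing more): the framework programs against the interfaces, and concrete plugins fill in the behaviour. The toy classes below are not the real GangaObject/IBackend API, which additionally defines schemas, persistency and monitoring hooks.

    from abc import ABC, abstractmethod

    class IBackend(ABC):
        """'Where to run': every backend plugin implements the same interface."""
        @abstractmethod
        def submit(self, job):
            ...

    class LocalBackend(IBackend):
        def submit(self, job):
            print(f"running {job} on the local machine")

    class GridBackend(IBackend):
        def submit(self, job):
            print(f"submitting {job} to the Grid")

    # The core only talks to the interface, so backends are interchangeable.
    for backend in (LocalBackend(), GridBackend()):
        backend.submit("my_analysis_job")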

  42. Backends and Applications • Applications: Executable, Athena (simulation/digitisation/reconstruction/analysis), AthenaMC (production), Gauss/Boole/Brunel/DaVinci (LHCb simulation/digitisation/reconstruction/analysis) • Backends: LSF, PBS, OSG, PANDA, LHCb WMS, US-ATLAS WMS • [Diagram: matrix of applications versus backends, with each combination marked as implemented or coming soon]

  43. Status • Current version: 4.2.11 • AOD analysis • TAG based analysis • MonaLisa based monitoring • LCG/EGEE • Batch handlers • Upcoming version 4.3 • Tag Navigator • AMI Integration • PANDA • Nordugrid • Cronus

  44. How do the elements work together? • [Diagram: the Ganga user interface for job definition and management sits on top of local and remote job repositories, job archives and a monitoring loop; it submits to processing systems (backends) such as local batch systems, distributed (Grid) systems and experiment-specific workload-management systems, and uses tools for data management to reach metadata catalogues, file catalogues and data storage; ATLAS, LHCb and other applications plug in on top] • Ganga has built-in support for ATLAS and LHCb • The component architecture allows customisation for other user groups

  45. Different working styles • The Command Line Interface in Python (CLIP) provides interactive job definition and submission from an enhanced Python shell (IPython) • Especially good for trying things out and seeing how the system works • Scripts, which may contain any Python/IPython or CLIP commands, allow automation of repetitive tasks (see the sketch below) • Scripts included in the distribution enable the kind of approach traditionally used when submitting jobs to a local batch system • The Graphical User Interface (GUI) allows job management based on mouse selections and field completion • Lots of configuration possibilities
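
As an example of the scripting style, a small hedged sketch of a script executed through Ganga (for instance with something like 'ganga submit_scan.py') that automates a repetitive submission. Only commands already shown in this talk are used; the sample names are invented.

    # submit_scan.py -- executed from within Ganga, so Job/Executable/LCG are available
    for sample in ['sample_A', 'sample_B', 'sample_C']:
        j = Job()
        j.application = Executable()
        j.application.exe = '/bin/echo'
        j.application.args = [sample]
        j.backend = LCG()
        j.submit()

    print(jobs)   # the job repository now contains the three new jobs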

  46. Scripts provide a pathena-like interface

      ganga athena --inDS trig1_misal1_csc11.005033.Jimmy_jetsJ4.recon.AOD.v12000601 --outputdata AnalysisSkeleton.aan.root --split 3 --maxevt 100 --lcg --ce ce102.cern.ch:2119/jobmanager-lcglsf-grid_2nh_atlas AnalysisSkeleton_topOptions.py

  The job status can then be monitored, for example using the GUI or the CLI.

  47. IPython and CLIP • IPython is a comfortable Python shell with many useful extensions • http://ipython.scipy.org/ • CLIP is the GANGA command line interface • How to define a job:

      j = Job()
      j.application = Executable()
      j.application.exe = '/bin/echo'
      j.application.args = ['Hello World']
      j.backend = LCG()
      j.submit()

  Other commands: jobs (lists the job repository), jobs[20].kill(), jobs[20].copy()

  48. GUI

  49. Exercises • Subset adapted for today • https://cern.ch/twiki/bin/view/Atlas/GangaTutorialAtCCIN2P3 • Current Tutorial that explains more features • https://cern.ch/twiki/bin/view/Atlas/GangaGUITutorial427 • FAQ • https://cern.ch/twiki/bin/view/Atlas/DAGangaFAQ • User Support using hypernews • https://hypernews.cern.ch/HyperNews/Atlas/get/GANGAUserDeveloper.html
