
Distributed Analysis on Grid: An Efficient Framework for Utilizing Large-Scale and Geographically Distributed Resources

This article discusses the use of distributed analysis tools such as Panda and Ganga for running analysis on grid systems, with a focus on parallelization for faster turnaround time and the challenges of working with unavoidably distributed data.


Presentation Transcript


  1. Distributed Analysis Tools: Panda & Ganga
     Tadashi Maeno (Brookhaven National Laboratory)

  2. Distributed Analysis on Grid
     • Grid: a framework to utilize large-scale and geographically distributed resources
     • Local analysis: limited resources
       • 1~4 CPUs
       • ~1 TB of disk
     [Diagram labels: (script, exe), Grid, Local analysis]

  3. Distributed Analysis on Grid
     [Diagram labels: Grid, Local analysis]

  4. Distributed Analysis on Grid
     • Run own analysis on distributed resources
     • Parallelization for fast turnaround
       • 1 CPU × 800 hours → 800 CPUs × 1 hour
     • Unavoidably distributed data
       • 10 T1 computing centers for ATLAS, but no T1 can host all data
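The turnaround arithmetic on this slide can be written out as a minimal sketch. It assumes the work splits perfectly across CPUs with no scheduling, staging, or merging overhead, which a real grid job would not achieve; the function name is illustrative only.

```python
def ideal_wall_hours(total_cpu_hours: float, n_cpus: int) -> float:
    """Wall-clock hours if the work divides evenly across n_cpus (idealized)."""
    if n_cpus < 1:
        raise ValueError("n_cpus must be at least 1")
    return total_cpu_hours / n_cpus


TOTAL_CPU_HOURS = 800  # figure taken from the slide

local_hours = ideal_wall_hours(TOTAL_CPU_HOURS, 1)    # one local CPU
grid_hours = ideal_wall_hours(TOTAL_CPU_HOURS, 800)   # 800 grid CPUs
print(local_hours, grid_hours)
```

The total CPU time is unchanged; only the wall-clock time the user waits for shrinks.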

  5. Traditional Procedure of DA
     [Diagram labels: Local analysis, Grid]
     • Brokerage
       • Site selection
     • Gate Keeper = Computing Element
       • File upload
       • Authentication
     • CPUs = Worker Nodes
       • Job execution
     • Storage
       • Input/output data

  6. Traditional Procedure of DA
     [Diagram labels: Local analysis, Grid]
     • Brokerage
       • Site selection
     • Gate Keeper = Computing Element
       • File upload
       • Authentication
       • Different among grid flavors = grid-middleware dependent
     • CPUs = Worker Nodes
       • Job execution
       • Local batch system: condor, LSF, PBS, …
     • Storage
       • Input/output data
       • Local storage for EGEE/OSG: dcap, dpm, xrootd, castor
       • Remote storage for NDGF

  7. Three middleware
     • Grid = EGEE backend, OSG backend, NDGF backend
     • e.g., upload a file using a different protocol for each backend
     • Not entirely true! Just for intuitive understanding

  8. Simple Implementation of Common User I/F for Various Backends
         def upload(file, backendType):
             if backendType == EGEE:
                 egeeModule.upload(file)
             elif backendType == OSG:
                 osgModule.upload(file)
             elif backendType == NDGF:
                 ndgfModule.upload(file)
     • Prepare a plug-in module for each backend
     • Implementation of GANGA to support multiple backends
     • Easily extended for other backends
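The "one plug-in module per backend" idea on this slide can also be sketched with a registry in place of the if/elif chain, so that adding a backend does not touch the dispatcher. This is a hypothetical illustration, not GANGA's actual code: the registry, the decorator, and the placeholder upload functions are all assumptions.

```python
from typing import Callable, Dict

# Hypothetical registry mapping a backend name to its upload function.
_UPLOADERS: Dict[str, Callable[[str], str]] = {}


def register_backend(name: str):
    """Decorator that registers an upload function under a backend name."""
    def decorator(func: Callable[[str], str]) -> Callable[[str], str]:
        _UPLOADERS[name] = func
        return func
    return decorator


# Placeholder plug-ins: a real one would speak the backend's own protocol.
@register_backend("EGEE")
def _upload_egee(path: str) -> str:
    return f"EGEE upload of {path}"


@register_backend("OSG")
def _upload_osg(path: str) -> str:
    return f"OSG upload of {path}"


@register_backend("NDGF")
def _upload_ndgf(path: str) -> str:
    return f"NDGF upload of {path}"


def upload(path: str, backend_type: str) -> str:
    """Common user-facing call; dispatches to the registered plug-in."""
    try:
        uploader = _UPLOADERS[backend_type]
    except KeyError:
        raise ValueError(f"unsupported backend: {backend_type!r}") from None
    return uploader(path)


print(upload("user.data.root", "OSG"))
```

Extending the interface means registering one more function; the user still has to know which backend name to pass, which is the limitation the next slide discusses.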

  9. Simple Implementation of Common User I/F for Various Backends
     • Ultimately users have to understand each backend
       • e.g., connection failure → each backend uses a different port → check each port
     • 3 backends → 3 times more expertise and support/development work
       • Limited manpower
     • Capability for easy extension is useful in the R&D phase but redundant in the production phase

  10. Common I/F using pilot (PANDA System)
      [Diagram components: End-user, Panda server, Pilot factory, aCT, pilots at EGEE, OSG, and NDGF; arrow labels: submit, pull, https, arc, job, analysis job]

  11. Common I/F using pilot (PANDA System)
      • Users access a common server using a single protocol
      • Interaction with backends is done centrally
      [Diagram components: End-user, Panda server, Pilot factory, aCT, pilots at EGEE, OSG, and NDGF; arrow labels: submit, pull, https, arc, job, analysis job]

  12. Operation/Service Model of PANDA
      • End-users are insulated from GRID
        • Communicate with the Panda (HTTP) server
        • Lower threshold, especially for physicists
      • Pilot factory sends pilots using GRID middleware
        • Only the operator of the scheduler needs to have enough expertise on GRID
      • Production and analysis run on the same infrastructure
        • Production suffers from the same problems as analysis
        • Once the production team (one shift crew) fixes a problem for official production, analysis is cured automatically → no additional manpower is needed for analysis
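The pull model behind this slide can be illustrated with a toy sketch: users only submit to a central queue, and pilots that are already running at a site ask that queue for work. Everything here is hypothetical (class name, method names, job names); the real Panda server is an HTTP service and is not modeled.

```python
from collections import deque
from typing import Optional


class TaskQueue:
    """Toy stand-in for the central server's single job queue."""

    def __init__(self) -> None:
        self._jobs = deque()

    def submit(self, job_name: str) -> None:
        """Called on behalf of end-users; no grid middleware involved."""
        self._jobs.append(job_name)

    def get_job(self) -> Optional[str]:
        """Called by pilots; returns None when no work is waiting."""
        return self._jobs.popleft() if self._jobs else None


def run_pilot(site: str, server: TaskQueue) -> str:
    """A pilot that already holds a CPU at `site` pulls one job."""
    job = server.get_job()
    if job is None:
        return f"{site}: no job, pilot exits"
    return f"{site}: running {job}"


server = TaskQueue()
server.submit("analysis-1")
server.submit("analysis-2")

results = [run_pilot(site, server) for site in ("EGEE", "OSG", "NDGF")]
for line in results:
    print(line)
```

The user-side code never names a site or a middleware; which pilot picks up which job is decided when the pilot asks.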

  13. Panda
      • PANDA = Production ANd Distributed Analysis system
        • Designed for analysis as well as production
      • Project started Aug 2005, prototype Sep 2005, production Dec 2005
        • The backend for all ATLAS production jobs
        • The primary backend for all ATLAS analysis jobs
      • A single task queue and pilots
        • Apache-based central server
        • Pilots retrieve jobs from the server as soon as a CPU is available → low latency
      • Highly automated, has an integrated monitoring system, and requires low operation manpower
      • Integrated with the ATLAS Distributed Data Management (DDM) system

  14. Panda System Overview
      [Diagram components: End-user, Panda server, DQ2, LFC, ProdDB, bamboo, logger, aCT, autopyfactory, pilots on Worker Nodes at EGEE, OSG, and NDGF; arrow labels: prod job, analysis job, job, submit, pull, https, arc, condor-g]

  15. Ownership Issue
      • GK maps each job to an individual UNIX ID
        • In the traditional model, each user sends jobs to GK → possible to know who runs a process
        • In pilot models, the pilot factory sends pilots to GK → impossible to distinguish processes using UID. Note that each role is mapped to a different UID, so it is possible to distinguish role-ed users from end-users
      • Separation between physical/logical layers is popular
        • Virtualization (e.g., cloud, LVM, …)
        • But it conflicts with a “policy”
      • WLCG is going to introduce glexec, which changes the UID on the WN
        • Each site admin will be able to see who runs a process without peeking into the logical layer
      • File ownership is unrelated to UID since SRM itself sets the owner using the proxy
        • Only proxy delegation is required (glexec requires proxy delegation)

  16. panda-client
      • Tools to submit or manage analysis jobs on Panda
      • Five tools
        • pathena: Athena jobs
        • prun: general jobs (ROOT, python, sh, exe, …)
        • pbook: bookkeeping
        • psequencer: analysis chain (e.g., submit job + download output)
        • puserinfo: access control

  17. pathena (1/2)
      • To submit Athena jobs to Panda
      • A simple command-line tool, but contains advanced capabilities for more complex needs
      • Provides a consistent interface to users who are familiar with Athena
          $ athena jobO.py
          → $ pathena jobO.py --inDS inputDatasetName --outDS outputDatasetName

  18. pathena (2/2)
      • What pathena does
        • Extracts the job configuration by running Athena with a fake application manager
        • Collects source/jobO files in the local working area
        • Assigns the job to a site where
          • the Athena version is available
          • the input dataset is available
          • CPUs are free
        • Prepares one buildJob to compile source files, and one or many runAthena jobs to run Athena
        • Sends them to Panda
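The three site-assignment conditions listed on this slide can be sketched as a simple filter. This is only an illustration under stated assumptions: the `Site` record, the site names, the release and dataset strings, and the tie-break rule (prefer the site with the most free CPUs) are all hypothetical, and the slide does not say how the real brokerage ranks eligible sites.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Site:
    """Hypothetical description of a site as seen by the brokerage."""
    name: str
    athena_versions: FrozenSet[str]
    datasets: FrozenSet[str]
    free_cpus: int


def choose_site(sites: List[Site], athena_version: str,
                input_dataset: str) -> Optional[Site]:
    """Return an eligible site, or None if no site meets all three conditions."""
    candidates = [
        s for s in sites
        if athena_version in s.athena_versions   # Athena version is available
        and input_dataset in s.datasets          # input dataset is available
        and s.free_cpus > 0                      # CPUs are free
    ]
    # Assumed tie-break: the eligible site with the most free CPUs.
    return max(candidates, key=lambda s: s.free_cpus, default=None)


SITES = [
    Site("SITE_A", frozenset({"rel-1"}), frozenset({"ds.input"}), 0),
    Site("SITE_B", frozenset({"rel-1", "rel-2"}), frozenset({"ds.input"}), 40),
    Site("SITE_C", frozenset({"rel-2"}), frozenset({"ds.input"}), 500),
]

chosen = choose_site(SITES, "rel-1", "ds.input")
print(chosen.name if chosen else "no eligible site")
```

SITE_A is rejected because it has no free CPUs and SITE_C because it lacks the requested release, even though it has the most free CPUs.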

  19. What happens when job is submitted (1/2)
      • Single job = buildJob × 1 and runAthena × N
      [Diagram labels: Local (pathena, submit, sources); Remote (buildJob, compile, binaries, Storage, trigger, dq2, download, runAthena, inputs, outputs); input dataset (automatically split); output dataset]

  20. What happens when job is submitted (2/2)
      • Why is buildJob required?
        • Platform (OS, CPU architecture) may be different between local and remote
          • SL5/64bit binaries cannot run on SL4/32bit
        • Athena creates some absolute links in InstallArea, i.e., it is generally not relocatable
        • The total time of (buildJob + N × runAthena) is shorter than N × (buildJob + runAthena)
          • Uses CPUs more efficiently
      • buildJob can be skipped using an option if you know the step is not required
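The comparison between (buildJob + N × runAthena) and N × (buildJob + runAthena) can be checked with a short calculation of total CPU time. The durations below are made-up illustrative numbers, not measurements, and the function names are hypothetical.

```python
def shared_build_cpu_minutes(build: float, run: float, n_jobs: int) -> float:
    """One buildJob whose binaries are reused by all runAthena jobs."""
    return build + n_jobs * run


def per_job_build_cpu_minutes(build: float, run: float, n_jobs: int) -> float:
    """Every sub-job compiles the sources again before running."""
    return n_jobs * (build + run)


BUILD_MIN = 10   # assumed compile time per build, in minutes
RUN_MIN = 60     # assumed run time per runAthena sub-job, in minutes
N_JOBS = 100     # assumed number of sub-jobs

shared = shared_build_cpu_minutes(BUILD_MIN, RUN_MIN, N_JOBS)
repeated = per_job_build_cpu_minutes(BUILD_MIN, RUN_MIN, N_JOBS)
saved = repeated - shared   # always (N_JOBS - 1) * BUILD_MIN
print(shared, repeated, saved)
```

The saving grows linearly with the number of sub-jobs, which is why a single shared build pays off for heavily split jobs.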

  21. prun
      • To submit general jobs to Panda
        • ROOT (ARA), Python, shell script, exe, …
      • Two-staged analysis model of ATLAS
        • Run Athena on AOD/ESD to produce DPD → pathena
        • Run ROOT or something else on DPD to produce final plots → prun
      • In principle you can do anything, but please avoid careless network operations unless you understand how well those operations scale
        • svn co, wget, lcg-cp, …
        • A single job is split into many sub-jobs running in parallel, which can easily break remote servers

  22. pbook
      • Bookkeeping of Panda jobs
        • Browsing
        • Kill
        • Retry
      • Makes a local sqlite3 repository to keep personal job information
        • IMAP-like sync-diff mechanism
        • Not scanning the global Panda repository → quick response
      • Dual user interface
        • Command-line
        • Graphical
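The idea of a local sqlite3 repository that is updated by differences, rather than by rescanning a global repository, can be sketched with Python's standard `sqlite3` module. This is not pbook's actual schema or protocol: the table layout, the function names, and the `remote_states` dictionary (standing in for whatever the server reports about the user's own jobs) are assumptions for illustration.

```python
import sqlite3
from typing import Dict, List


def open_repo(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a hypothetical personal job repository."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        "job_id INTEGER PRIMARY KEY, status TEXT NOT NULL)"
    )
    return conn


def sync(conn: sqlite3.Connection, remote_states: Dict[int, str]) -> List[int]:
    """Write only the jobs whose remote status differs from the local copy."""
    local = dict(conn.execute("SELECT job_id, status FROM jobs"))
    changed = [jid for jid, status in remote_states.items()
               if local.get(jid) != status]
    with conn:  # commit the batch as one transaction
        conn.executemany(
            "INSERT OR REPLACE INTO jobs (job_id, status) VALUES (?, ?)",
            [(jid, remote_states[jid]) for jid in changed],
        )
    return sorted(changed)


def browse(conn: sqlite3.Connection) -> List[tuple]:
    """Browsing reads the local repository only, so it needs no network."""
    return list(conn.execute("SELECT job_id, status FROM jobs ORDER BY job_id"))


repo = open_repo()
first = sync(repo, {1: "running", 2: "finished"})
second = sync(repo, {1: "finished", 2: "finished"})
print(first, second, browse(repo))
```

The second sync touches only the job whose status changed, while browsing never leaves the local file.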

  23. GangaPanda
      • Plug-in to access PANDA
          def upload(file, backendType):
              if backendType == EGEE:
                  egeeModule.upload(file)
              elif backendType == OSG:
                  osgModule.upload(file)
              elif backendType == NDGF:
                  ndgfModule.upload(file)
              elif backendType == PANDA:
                  pandaModule.upload(file)
      • All ATLAS backends will be consolidated to PANDA
      • Other backends are still maintained for some reason

  24. Links
      • User support: hn-atlas-dist-analysis-help@cern.ch
      • Bug report: Savannah
      • Documentation: panda-client package, pathena, prun, pbook, ganga
