
Advanced Grid Technologies in ATLAS Data Management


Presentation Transcript


  1. Advanced Grid Technologies in ATLAS Data Management
  Alexandre Vaniachine, Argonne National Laboratory
  Invited talk at NEC’2003, XIX International Symposium on Nuclear Electronics & Computing, Varna, Bulgaria, 15-20 September 2003

  2. ATLAS Software Overview
  • ATLAS computing challenge
  • Core software domains
  • Data management architecture
  • Grid technologies deployed
  • DC1 production experience

  3. ATLAS Computing Challenge
  • Our event size: 1-1.5 MB
  • After on-line selection, events will be written to permanent storage at a rate of 100-200 Hz
  • Raw data: ~1 PB/year
  • With reconstructed and simulated data the total is ~10 PB/year
  • ATLAS depends on computing as much as it depends on the trigger or the hadron calorimeter
  • These data start coming at the full rate at the end of 2006
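  These numbers are consistent with a quick back-of-the-envelope check. A minimal sketch in Python, assuming the canonical ~1e7 live seconds of data taking per year (an assumption, not stated on the slide):

    # Back-of-the-envelope estimate of the ATLAS raw data volume.
    event_size_bytes = 1.0e6   # lower end of the 1-1.5 MB range
    rate_hz = 100.0            # lower end of the 100-200 Hz range
    live_seconds = 1.0e7       # assumed live time per year

    raw_bytes = event_size_bytes * rate_hz * live_seconds
    print("Raw data per year: %.1f PB" % (raw_bytes / 1e15))  # -> 1.0 PB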

  4. Planetary Computing Model
  • The problem of a larger and more distributed collaboration:
    • >2000 collaborators
    • 151 institutions
    • 34 countries
  • The decision that CERN will supply only a fraction of the computing, with the rest supplied by collaborators
  • The RESULT of the unprecedented data sizes and the distributed nature of physicists and computing is the need for multiple advances in computing tools
  • Computing infrastructure, which was centralized in the past, will now be distributed
  • (For experiments the trend is the reverse)

  5. Software Framework: Athena
  • The backbone of the ATLAS Computing Model data flow
  • Athena features:
    • Common code base with the Gaudi framework (LHCb)
    • Separation of data and algorithms
    • Memory management
    • Transient/persistent data split
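  Athena itself is C++, but the ‘separation of data and algorithms’ can be illustrated with a short Python sketch; the TransientStore and Algorithm names below are hypothetical stand-ins, not the framework’s actual classes:

    # Minimal sketch of the data/algorithm separation in a
    # Gaudi/Athena-style framework; all names are illustrative.
    class TransientStore:
        """Holds transient event data; algorithms never own the data."""
        def __init__(self):
            self._objects = {}
        def record(self, key, obj):
            self._objects[key] = obj
        def retrieve(self, key):
            return self._objects[key]

    class Algorithm:
        """A processing step that reads from and writes to the store."""
        def execute(self, store):
            raise NotImplementedError

    class TrackFinder(Algorithm):
        def execute(self, store):
            hits = store.retrieve("Hits")
            store.record("Tracks", [h for h in hits if h > 0.5])  # toy cut

    store = TransientStore()
    store.record("Hits", [0.2, 0.7, 0.9])
    TrackFinder().execute(store)
    print(store.retrieve("Tracks"))  # -> [0.7, 0.9]

  Because algorithms touch data only through the store, the persistent representation can change without any algorithm code changing, which is the point of the transient/persistent split.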

  6. Core Computing Domains
  • Separation of transient and persistent data in the ATLAS software architecture determines three core computing domains:
    • Scalable solutions for data persistency
    • Software framework for data processing algorithms
    • Grid computing for data processing and analysis
  • My presentation will focus on advances in computing technologies integrating Grid Computing and Data Management – two core software domains providing the foundation for the ATLAS Software Framework

  7. Interfacing Athena to the Grid
  • GANGA: Gaudi/Athena aNd Grid Alliance
  • [Diagram: GANGA connects the Athena/GAUDI application to Grid services, covering job configuration, monitoring and scheduling, resource estimation and booking, and the flow of virtual data, algorithms, histograms, monitoring and results]
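  GANGA is itself Python-based; the following minimal sketch is loosely modeled on that job abstraction, with all class and method names being illustrative assumptions rather than the actual GANGA API:

    # Illustrative GANGA-style job abstraction; names are assumptions.
    class Backend:
        """A compute resource: a local queue, a Grid site, etc."""
        def __init__(self, name):
            self.name = name
        def submit(self, job):
            print("submitting %s job to %s" % (job.application, self.name))

    class Job:
        """Bundles the application configuration with a target backend."""
        def __init__(self, application, options, backend):
            self.application = application
            self.options = options      # e.g. an Athena job options file
            self.backend = backend
        def submit(self):
            self.backend.submit(self)

    # The same job description can be pointed at different resources:
    Job("Athena", "myJobOptions.py", Backend("local")).submit()
    Job("Athena", "myJobOptions.py", Backend("EDG")).submit()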

  8. ATLAS Database Architecture
  • Independent of persistency technology
  • Described in the ATLAS Database Architecture document
  • Ready for Grid integration
  • [Diagram: data flows from the central repository to Sites 1-3 through combinations of steps: just extract; extract & transform; transport, transform & install; transport & install]

  9. Technology Independence
  • Ensuring that the ‘application’ software is independent of the underlying persistency technology is one of the defining characteristics of the ATLAS software architecture (the “transient/persistent” split)
  • Changing the persistency mechanism (e.g., Objectivity -> Root I/O) requires a change of “converter”, but of nothing else
  • The ‘ease’ of the baseline change demonstrates the benefits of decoupling the transient and persistent representations
  • Integrated operation of the framework & data management domains demonstrated the capability of:
    • reading the same data from different frameworks
    • switching between persistency technologies:
      • Objectivity DB & ROOT I/O persistency in ATLAS DC0
      • ATLAS-specific temporary solution (AthenaROOT) in DC1
  • An important milestone towards DC2 has been achieved recently: the LHC-wide hybrid ROOT-based persistency technology POOL for DC2 was delivered in the latest ATLAS software release 7.0.0 (AthenaPOOL)
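  The ‘change of converter, but of nothing else’ is essentially a converter-registry pattern. A minimal sketch under that reading, with all names hypothetical:

    # Converter pattern sketch: application code never names the
    # persistency technology; only the registered converter does.
    class RootConverter:
        def write(self, obj, path):
            print("streaming %r to ROOT file %s" % (obj, path))

    class ObjectivityConverter:
        def write(self, obj, path):
            print("storing %r in Objectivity federation %s" % (obj, path))

    CONVERTERS = {"root": RootConverter(), "objy": ObjectivityConverter()}

    def persist(obj, path, technology="root"):
        """Called by application code; technology is a configuration choice."""
        CONVERTERS[technology].write(obj, path)

    persist({"event": 42}, "events.root")         # ROOT I/O baseline
    persist({"event": 42}, "events.db", "objy")   # the 'switch' is one line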

  10. LHC Common Persistence Infrastructure (POOL)
  • During the past year a new effort emerged: the LHC-wide Computing Grid Project (LCG)
  • The LCG’s Requirements Technical Assessment Group (RTAG) on persistence recommended a common infrastructure:
    • an object streaming layer based upon ROOT
    • a relational database layer for file management and higher-level services
  • Based on the RTAG recommendations, a common development project was launched: POOL
  • ATLAS is committed to this effort and has adopted the POOL technology
  • To be clear: the common project infrastructure that POOL will provide is our baseline event store technology

  11. ATLAS Data Challenges
  • In a recent worldwide collaborative effort - Data Challenge 1 (DC1) - spanning 56 prototype tier centers in 21 countries on four continents, ATLAS produced more than 60 TB of data for physics studies
  • DC1 provided a testbed for integration and testing of advanced Grid computing components in a production environment

  12. DC1 Production on the Grid
  • A significant fraction of DC1 data was produced on:
    • NorduGrid
    • US ATLAS Grid Testbed
  • DC1 jobs were successfully tested on:
    • EDG
    • Grid3 (US ATLAS, US CMS, LIGO, SDSS sites)

  13. Innovative Technologies
  • Several novel Grid technologies were used in ATLAS data production and data management for the first time. My presentation will describe the new Grid technologies introduced into the HEP production environment:
    • the Chimera Virtual Data System, automating data derivation
    • Virtual Data Cookbook services, managing templated production recipes
    • efficient Grid certificate authorization technologies for virtual data access control
    • virtual database services delivery for reconstruction on Grid clusters behind closed firewalls

  14. Centralized Management
  • For efficiency of the large production tasks distributed worldwide, it is essential to establish shared production management tools
  • To complete the data management architecture for distributed production, ATLAS prototyped Virtual Data services
  • The ATLAS Metadata Catalogue AMI and the Replica Catalogue MAGDA exemplify such Grid tools deployed in DC1; a minimal lookup sketch follows below
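  A minimal sketch of the two lookups these catalogues provide, a metadata query (AMI-like, dataset attributes to logical file names) and replica resolution (MAGDA-like, logical file name to physical locations); the dictionaries and names are illustrative, not the actual AMI or MAGDA interfaces:

    # Illustrative catalogue lookups; not the real AMI/MAGDA interfaces.
    METADATA = {   # AMI-like: dataset -> logical file names
        "dc1.002001.lumi10": ["dc1.002001.lumi10._0001.zebra",
                              "dc1.002001.lumi10._0002.zebra"],
    }
    REPLICAS = {   # MAGDA-like: logical file name -> physical replicas
        "dc1.002001.lumi10._0001.zebra":
            ["gsiftp://storage.bnl.example/dc1/file0001",
             "castor://castor.cern.example/dc1/file0001"],
    }

    def locate(dataset):
        """Resolve a dataset to physical replicas via both catalogues."""
        for lfn in METADATA.get(dataset, []):
            for pfn in REPLICAS.get(lfn, ["<no replica registered>"]):
                print(lfn, "->", pfn)

    locate("dc1.002001.lumi10")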

  15. MAGDA Architecture
  • Replica Catalogue MAGDA: MAnager for Grid-based DAta
  • [Architecture diagram]

  16. AMI Architecture
  • Metadata Catalogue AMI: ATLAS Metadata Interface
  • [Architecture diagram]

  17. Introducing Virtual Data
  • The prevailing views in HEP computing have been data-centric: we need to produce the data (ASAP), with the production recipes being just some tools used in the process by the “production gurus”. The value of the production recipes has not been fully appreciated.
  • Preparation of recipes for data production requires significant effort and encapsulates considerable expert knowledge
  • Because the production recipes have to be fully validated, their development is an iterative, time-consuming process similar to fundamental knowledge discovery
  • The GriPhyN project (www.griphyn.org) introduced a different perspective: recipes are as valuable as the data
  • If you have the recipes you may not even need the data: you can reproduce the data ‘on-demand’ (see the sketch below)
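  A minimal sketch of the ‘reproduce on-demand’ idea: once the recipe (a transformation plus its parameters) is recorded, a missing data product can be rematerialized rather than retrieved. The names below are illustrative, not Chimera’s actual interface:

    # Illustrative 'virtual data' materialization; not the Chimera API.
    import os

    RECIPES = {}   # data product -> (transformation, parameters)

    def register(product, func, **params):
        RECIPES[product] = (func, params)

    def materialize(product):
        """Return the product, re-deriving it from its recipe if absent."""
        if not os.path.exists(product):
            func, params = RECIPES[product]
            func(product, **params)    # reproduce the data on demand
        return product

    def generate(path, nevents):
        with open(path, "w") as f:
            f.write("simulated %d events\n" % nevents)

    register("events.dat", generate, nevents=1000)
    materialize("events.dat")   # absent: produced from the recipe
    materialize("events.dat")   # present: nothing is recomputed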

  18. VDC Architecture
  • [Architecture diagram]

  19. Virtual Data in DC1 Production
  • To deliver a scalable data management solution, ATLAS implemented innovative Computing Science concepts in practice: the first use of Virtual Data technologies in DC1 production
  • Two concepts are implemented in the ATLAS Virtual Data System operation:
    • Production workflow became computerized
      • acyclic data dependency tracking using GriPhyN and iVDGL software
      • providing Data Provenance Services
      • first use of the Chimera Virtual Data system in production
    • Production recipes became templated (a minimal templating sketch follows this list)
      • templated recipes repository: Cookbook
      • providing Data Providence* Services
      • about half of the more than two hundred DC1 datasets were serviced
  * prov·i·dence n. 1. Care or preparation in advance; foresight. (The American Heritage Dictionary of the English Language)
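  A minimal sketch of what ‘templated’ recipes mean in practice: one validated template, instantiated per dataset partition by filling in parameters. The template text is an illustrative assumption, not an actual Cookbook recipe:

    # Illustrative templated production recipe; not a real Cookbook entry.
    from string import Template

    RECIPE = Template(
        "athena JobOptions.py "
        "-c 'DataSet=\"$dataset\"; SkipEvents=$skip; EventsToRun=$nevents'"
    )

    def instantiate(dataset, partition, events_per_job=500):
        """Fill the validated template for one partition of a dataset."""
        return RECIPE.substitute(dataset=dataset,
                                 skip=partition * events_per_job,
                                 nevents=events_per_job)

    print(instantiate("dc1.002001.lumi10", partition=3))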

  20. Acyclic Portion of DC1 Workflow
  • The Chimera Virtual Data system eliminates ‘manual’ tracking of the data dependencies between independent production steps & enables multi-step compound data transformations on-demand
  • [Workflow diagram: production steps (Athena Generators, atlsim, atlsim pileup, Athena conversion, Athena recon, Athena Atlfast, Atlfast recon, filtering, Athena QA) linked through their data products (HepMC.root, digis.zebra, digis.root, geometry.zebra, geometry.root, recon.root, Atlfast.root, filtering.ntuple, QA.ntuple)]
  • The feedback loop introduced in ATLAS by physics validation is omitted
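  A minimal sketch of the bookkeeping Chimera automates: record each step’s input dependencies as a directed acyclic graph and derive a valid execution order from it. The step names follow a simplified slice of the diagram above; the code is illustrative, not Chimera itself:

    # Simplified DC1 dependency graph; Chimera tracks this automatically.
    DEPENDS = {
        "Athena Generators": [],
        "HepMC.root": ["Athena Generators"],
        "digis.zebra": ["HepMC.root"],    # atlsim simulation
        "digis.root": ["digis.zebra"],    # Athena conversion
        "recon.root": ["digis.root"],     # Athena reconstruction
    }

    def schedule(target, order=None):
        """Topologically order the steps needed to derive `target`."""
        if order is None:
            order = []
        for dep in DEPENDS.get(target, []):
            schedule(dep, order)
        if target not in order:
            order.append(target)
        return order

    print(schedule("recon.root"))
    # -> ['Athena Generators', 'HepMC.root', 'digis.zebra',
    #     'digis.root', 'recon.root']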

  21. Chimera in DC1 Reconstruction
  • Installed ATLAS releases 6.0.2+ (Pacman cache) on selected US ATLAS testbed sites
  • 2x520 partitions of DataSet 2001 (lumi10) have been reconstructed at the JAZZ cluster (Argonne), LBNL, IU and BU, and BNL (test)
  • 2x520 Chimera derivations, ~200,000 events reconstructed
  • Submit hosts: LBNL; others: Argonne, UC, IU
  • RLS servers at the University of Chicago and BNL
  • Storage host and Magda cache at BNL
  • Group-level Magda registration of output
  • Output transferred to BNL and CERN/Castor

  22. Uncharted OGSA Area
  • Database services on the Grid are an uncharted OGSA area
  • At CHEP’03, MySQL emerged as the most popular database
  • Interest in the X509 authorization capabilities of MySQL was prompted by Doug Olson’s announcement to the PPDG mailing list
  • Numerous e-mail exchanges and discussions with interested PPDG participants on grid-enabling MySQL
  • [Grid example by Kate Keahey]

  23. Database Access on the Grid
  • Different security models:
    • A separate server does the grid authorization:
      • Spitfire (EDG WP2) – SOAP/XML text-only data transport
      • DAI (IBM UK) – Spitfire technologies + XML binary extensions
      • Perl DBI database proxy (ALICE) – SQL data transport
      • Oracle 10g (separate authorization layer)
    • Authorization is integrated in the database server:
      • on a higher level: GSS API (work by Richard Casella, BNL)
      • on a lower level: certificate verification (my current work)

  24. Grid-enabling MySQL
  • Tested MySQL X509 certificate authorization technology (see the sketch below):
    • validated with DOE, CERN and NorduGrid certificates
    • potential problem with host certificates issued at CERN
  • Developed solutions for MySQL security problems
    • adopted in MySQL 4.0.13
  • Increased MySQL AB awareness of grid computing needs
  • Set up a grid-enabled server prototype for ATLAS
    • used in ATLAS Data Challenge 1 production for Chimera-based reconstruction
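  The X509 authorization in question combines MySQL’s GRANT ... REQUIRE SUBJECT/ISSUER clauses (available in the MySQL 4.0 series) with an SSL client connection presenting the grid certificate. A minimal sketch using the classic MySQLdb driver; the hostnames, file paths and distinguished names are placeholders:

    # Sketch of X509-authorized MySQL access; host, paths and DNs are
    # placeholders. Server side, run once by the administrator:
    #
    #   GRANT SELECT ON proddb.* TO 'dc1user'@'%'
    #       REQUIRE SUBJECT '/DC=org/DC=doegrids/OU=People/CN=Some User'
    #       AND ISSUER '/DC=org/DC=DOEGrids/CN=DOEGrids CA 1';
    #
    # Client side: connect over SSL with the grid certificate and key.
    import MySQLdb

    conn = MySQLdb.connect(
        host="dbserver.example.org", db="proddb", user="dc1user",
        ssl={"cert": "/home/user/.globus/usercert.pem",
             "key":  "/home/user/.globus/userkey.pem",
             "ca":   "/etc/grid-security/certificates/ca.pem"})
    cursor = conn.cursor()
    cursor.execute("SELECT VERSION()")
    print(cursor.fetchone())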

  25. Production Experience
  • Collected production experience with the grid security model:
    • need to expand backward compatibility of grid proxy tools
    • need to add the server purpose to grid host certificates
    • need to initiate the grid proxy upon login (similar to an AFS token)
    • need for shared grid certificates
      • similar to privileged accounts traditionally shared in HENP computing for production, librarian, data management and database administration tasks
  • More information was presented at:
    • PPDG (all-hands meeting)
    • Grid3 (production experience reported)

  26. Coherent Approach: Extract-Transport-Install
  • MySQL simplified the delivery of the extract-transport-install components of the ATLAS database architecture to provide the database services needed for DC1 reconstruction on sites with Grid Compute Elements behind closed firewalls (e.g., NorduGrid); a minimal sketch follows below
  • [Diagram: Extract & Transport from the Main Server; Transport & Install onto the Replica Servers]
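  A minimal sketch of the extract-transport-install pattern using standard MySQL tooling (mysqldump, scp, mysql); the host and database names are placeholders, and this illustrates the pattern rather than the actual ATLAS scripts:

    # Illustrative extract-transport-install of a database slice to a
    # replica server behind a firewall boundary; names are placeholders.
    import subprocess

    def extract(db, table, dumpfile):
        """Extract: dump the needed slice from the main server."""
        with open(dumpfile, "w") as out:
            subprocess.check_call(
                ["mysqldump", "--host=main.example.org", db, table],
                stdout=out)

    def transport(dumpfile, site):
        """Transport: copy the dump to the remote site."""
        subprocess.check_call(["scp", dumpfile, site + ":" + dumpfile])

    def install(db, dumpfile, site):
        """Install: load the dump into the site-local replica server."""
        subprocess.check_call(["ssh", site, "mysql %s < %s" % (db, dumpfile)])

    extract("geomdb", "alignment", "slice.sql")
    transport("slice.sql", "replica.example.org")
    install("geomdb", "slice.sql", "replica.example.org")

  Because the replica server runs inside the site, reconstruction jobs on worker nodes behind the closed firewall query it locally instead of reaching an outside database.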

  27. Roadmap to Success
  • ATLAS computing is steadily progressing towards a highly functional software suite, plus a worldwide computing model
  • During the past year, Data Challenges have provided both an impetus and a testbed for bringing coherence to developments in all core software domains
  • Several advanced Grid computing technologies were successfully tested and deployed in the ATLAS Data Challenge 1 production environment
