Enhancing GRID Job Efficiency with Distributed POOL File Access
This talk by Elizabeth Gallas, given at the Offline Database Meeting, describes how ATLAS Grid jobs obtain Conditions data. Inline Conditions data is served from Oracle through Frontier/Squid servers, which reduces the load on Oracle and the latency of connecting to it over the WAN. Referenced Conditions data lives in POOL files, which are distributed to Grid sites and located through a site-local POOL File Catalog (PFC). The talk covers space tokens for site storage, methods for generating PFCs on SRM and non-SRM systems, and how the Grid submission system needs to know each site's configuration for jobs to succeed.
Presentation Transcript
Introduction: Distributed POOL File Access
Elizabeth Gallas - Oxford - September 16, 2009
Offline Database Meeting
Overview
• ATLAS relies on the Grid for processing many types of jobs.
• Jobs need Conditions data from Oracle + referenced POOL files.
• ATLAS has decided to deploy an array of Frontier/Squid servers to
  • negotiate transactions between grid jobs and the Oracle DB.
  • reduce the load on Oracle
  • reduce latency observed connecting to Oracle over the WAN.
• With Frontier:
  • Inline Conditions via Squid cache -> Frontier Server -> Oracle
  • Referenced Conditions data is in POOL files (always < 2GB)
    • which are manageable on all systems.
• FOCUS TODAY on how GRID JOBS find the POOL files.
• All sites accepting jobs on the grid must have:
  • all the POOL files and a
  • PFC (POOL File Catalog) – xml file w/POOL file locations at the site
• Job success on the GRID requires
  • GRID submission system must know how sites are configured.
  • GRID sites configured with site appropriate env and Squid failover*
DB Access Software Components
Where are the POOL files?
• DQ2 (DDM) distributes Event data files and Conditions POOL files.
  • TWiki: StorageSetUp for T0, T1's and T2's
• ADC/DDM maintains ToA sites (Tiers of ATLAS)
  • ToA sites are subscribed to receive DQ2 POOL files
  • ToA sites have "space tokens" (areas for file destinations) such as:
    • "DATADISK" for real event data
    • "MCDISK" area for simulated event data
    • …
    • "HOTDISK" area for holding POOL files needed by many jobs
      • has more robust hardware for more intense access
• Some sites also use Charles Waldman's "pcache":
  • Duplicates files to a scratchdisk accessible to local jobs
    • avoiding network access to "hotdisk".
  • Magic in pcache tells the job to look in the scratchdisk first.
• Are POOL files deployed to all ToA sites 'on the GRID'?
  • Tier-1? Tier-2? bigger Tier-3s?
  • Any other sites that want to use them? Are these sites in ToA?
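The "look in the scratchdisk first" behaviour described for pcache can be pictured with a small sketch. This is not pcache's actual code: the function name `cached_path` and the idea of keying the cache on the bare file name are assumptions made only for illustration.

```python
import shutil
from pathlib import Path


def cached_path(source: Path, scratch_dir: Path) -> Path:
    """Return a scratch-disk copy of `source`, copying it on first use only.

    Hypothetical helper: illustrates a cache-first lookup, not pcache itself.
    """
    scratch_dir.mkdir(parents=True, exist_ok=True)
    local_copy = scratch_dir / source.name
    if not local_copy.exists():
        # First request on this node: copy under a temporary name, then
        # rename, so another job never opens a half-written file.
        partial = local_copy.with_name(local_copy.name + ".part")
        shutil.copy2(source, partial)
        partial.replace(local_copy)
    # Later requests return the local copy without touching the source.
    return local_copy
```

A job would open the returned path instead of the "hotdisk" path, so only the first job on a node pays the network cost. The sketch never refreshes a cached file, which is acceptable only if POOL files are never modified after distribution; that is an assumption here.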
Email from Stephane Jezequel (Sept 15)
• Could you please forward this request to all ATLAS Grid sites which are included in DDM:
• As discussed during the ATLAS software week, sites are requested to implement the space token ATLASHOTDISK.
• More information:
  • https://twiki.cern.ch/twiki/bin/view/Atlas/StorageSetUp#The_ATLASHOTDISK_space_token
• Sites should assign at least 1 TB to this space token (should foresee 5 TB). In case of storage crisis at the site, the 1 TB can be reduced to 0.5 TB. Because of the special usage of these files, sites should decide to assign a specific pool or not.
• When it is done, please report to DDM Ops (Savannah ticket is a good solution) to create the new DDM site.
Where are the PFCs (POOL File catalogs)?
• Mario Lassnig - modified DQ2 client dq2-ls
  • Can ‘on the fly’ create the PFC for the POOL files on a system
  • written to work for "SRM systems" (generally Tier-1s)
• Non-SRM systems (generally Tier-2,3)
  • this PFC file must be modified: replace SRM specific descriptors
• We need to collectively agree on the best method and designate who will follow it up
  • Scriptable way to remove SRM descriptors from PFC for use on non-SRM systems.
  • Cron?
    • Detection of new POOL file arrival
    • Generate updated PFC
    • Run above script preparing file for local use
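As a starting point for the "scriptable way to remove SRM descriptors", here is a minimal sketch. It assumes the PFC stores each physical file name in a `<pfn ... name="..."/>` element and that an SRM descriptor looks like `srm://host[:port][/srm/managervN?SFN=]/path`. Both shapes, the function name `localize_pfc` and the `local_prefix` parameter are assumptions for illustration; the real descriptors written by dq2-ls should be checked before using anything like this.

```python
import re

# Assumed SRM descriptor shape: srm://host[:port][/srm/managervN?SFN=]
_SRM_PREFIX = re.compile(r"^srm://[^/]+(?:/srm/managerv\d+\?SFN=)?")
# Matches the name attribute of <pfn> elements only; <lfn> entries are untouched.
_PFN_NAME = re.compile(r'(<pfn\b[^>]*\bname=")([^"]*)(")')


def localize_pfc(pfc_xml: str, local_prefix: str = "") -> str:
    """Strip the assumed SRM prefix from every pfn name in a PFC document.

    `local_prefix` (hypothetical) is prepended to rewritten paths, for sites
    whose local access protocol needs one.
    """
    def rewrite(match: "re.Match[str]") -> str:
        original = match.group(2)
        stripped = _SRM_PREFIX.sub("", original)
        if stripped != original:
            stripped = local_prefix + stripped
        return match.group(1) + stripped + match.group(3)

    return _PFN_NAME.sub(rewrite, pfc_xml)
```

Working on the text rather than re-serialising the XML leaves the header and everything outside the `pfn` names byte-for-byte unchanged. A cron job along the lines listed above could regenerate the PFC with dq2-ls when new POOL files arrive and pass the result through this function.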
Configuring jobs on the GRID
Item 5 from Dario’s TOB Action items: DB and ADC groups: discuss and implement a way to set the environment on each site so as to point to the nearest Squid and the local POOL file catalogue
• Grid submission system must know which sites have
  • Squid access to Conditions data
    • Site specific? Failover
    • Experience at Michigan with muon calibration: Frontier / Squid access to multiple Squid servers
  • Subscriptions in place to ensure POOL files are in place and PFC location (?)
    • Site specific – continuous updates to local PFC
• Manual setup for now in Ganga/Panda,
  • will move to AGIS with configuration file on each site. Link to AGIS Technical Design Proposal:
  • http://indico.cern.ch/getFile.py/access?sessionId=4&resId=1&materialId=7&confId=50976
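The Squid failover mentioned above amounts to trying an ordered list of Squid servers, nearest first, until one answers. The sketch below shows only that ordering logic. It is not how the Frontier client implements failover, and `fetch_with_failover` and the `fetch` callable are hypothetical names.

```python
from typing import Callable, Iterable, List


def fetch_with_failover(proxies: Iterable[str],
                        fetch: Callable[[str], bytes]) -> bytes:
    """Try each Squid proxy in order and return the first successful result.

    `fetch` is a caller-supplied function (hypothetical) that performs the
    request through the given proxy and raises OSError on failure.
    """
    errors: List[str] = []
    for proxy in proxies:
        try:
            return fetch(proxy)
        except OSError as exc:
            # Remember why this proxy failed, then move on to the next one.
            errors.append(f"{proxy}: {exc}")
    raise ConnectionError("all Squid proxies failed: " + "; ".join(errors))
```

The site-specific part is then just the ordered proxy list, which is the kind of information the slide proposes to keep in a per-site configuration file.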
BACKUP
Features of Athena:
• Previous to Release 15.4:
  • Athena (RH) looks at IP the job is running at,
  • uses dblookup.xml in the release to decide the order of database connections to try to get the Conditions data.
• Release 15.4
  • Athena looks for Frontier environment variable,
  • if found, ignores the dblookup
  • using instead another env
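The selection rule on this slide can be written out as a small decision function. This is an illustration of the rule as stated, not Athena code. The variable name `FRONTIER_SERVER` is an assumption (the slide does not name the variable), and so is the fallback to the dblookup order when the variable is unset in Release 15.4.

```python
from typing import List, Mapping, Sequence, Tuple

FRONTIER_VAR = "FRONTIER_SERVER"  # assumed name; not given on the slide


def connection_order(release: Tuple[int, int],
                     env: Mapping[str, str],
                     dblookup_order: Sequence[str]) -> List[str]:
    """Return the connections to try, following the rule described above.

    `dblookup_order` stands for the order read from dblookup.xml.
    """
    if release >= (15, 4) and env.get(FRONTIER_VAR):
        # From 15.4 on: a Frontier setting overrides dblookup entirely.
        return [env[FRONTIER_VAR]]
    # Earlier releases, or no Frontier setting (assumed): use dblookup order.
    return list(dblookup_order)
```

For example, `connection_order((15, 3), env, order)` always returns the dblookup order, whatever the environment contains.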