
Introduction: Distributed POOL File Access

Elizabeth Gallas - Oxford –

September 16, 2009

Offline Database Meeting


  • ATLAS relies on the Grid for processing many types of jobs.

    • Jobs need Conditions data from Oracle + referenced POOL files.

  • ATLAS has decided to deploy an array of Frontier/Squid servers to

    • negotiate transactions between grid jobs and the Oracle DB.

      • reduce the load on Oracle

      • reduce latency observed connecting to Oracle over the WAN.

  • With Frontier:

  • Inline Conditions via Squid cache -> Frontier Server -> Oracle

    • Referenced Conditions data is in POOL files (always < 2GB)

      • which are manageable on all systems.

  • FOCUS TODAY on how GRID JOBS find the POOL files.

  • All sites accepting jobs on the grid must have:

    • all the POOL files and a

    • PFC (POOL File Catalog) – an XML file with the POOL file locations at the site

  • Job success on the GRID requires:

    • the GRID submission system to know how sites are configured.

    • GRID sites configured with a site-appropriate environment and Squid failover*
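The PFC mentioned above is a small XML catalog mapping each POOL file's GUID and logical name to its physical location at the site. A minimal sketch of one entry (the GUID, dataset name, and hotdisk path are invented for illustration):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE POOLFILECATALOG SYSTEM "InMemory">
<POOLFILECATALOG>
  <!-- one File entry per POOL file at the site; all values here are hypothetical -->
  <File ID="C3A6E4DE-6B0B-DE11-8D8A-000423D94C90">
    <physical>
      <pfn filetype="ROOT_All"
           name="/hotdisk/atlascond/cond09_mc.000019.gen.COND/cond09_mc.000019.gen.COND._0001.pool.root"/>
    </physical>
    <logical>
      <lfn name="cond09_mc.000019.gen.COND._0001.pool.root"/>
    </logical>
  </File>
</POOLFILECATALOG>
```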

Elizabeth Gallas

Where are the POOL files?

  • DQ2(DDM) - distributes Event data files and Conditions POOL files.

  • TWiki: StorageSetUp for T0, T1's and T2's

  • ADC/DDM maintains ToA sites (Tiers of ATLAS)

    • ToA sites are subscribed to receive DQ2 POOL files

    • ToA sites have "space tokens" (areas for file destinations) such as:

      • "DATADISK" for real event data

      • "MCDISK" area for simulated event data

      • "HOTDISK" area for holding POOL files needed by many jobs

        • has more robust hardware for more intense access

  • Some sites also use Charles Waldman's "pcache":

    • Duplicates files to a scratchdisk accessible to local jobs

      • avoiding network access to "hotdisk".

      • Magic in pcache tells the job to look in the scratchdisk first.

  • Are POOL files deployed to all ToA sites 'on the GRID' ?

    • Tier-1 ? Tier-2 ? bigger Tier-3s ?

    • Any other sites that want to use them ? Are these sites in ToA ?
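The pcache behaviour described above (look on the local scratch disk first, fall back to the shared hotdisk area) can be sketched as a small path-resolution function; the directory names are assumptions for illustration, not pcache's actual layout:

```python
import os

def resolve_pool_file(logical_name, scratch_root, hotdisk_root,
                      exists=os.path.exists):
    """Sketch of the pcache idea: a job opens the node-local scratch
    copy when pcache has already duplicated the file there, and only
    falls back to the shared hotdisk area on a cache miss."""
    cached = os.path.join(scratch_root, logical_name)
    if exists(cached):
        return cached  # local copy: no network access to hotdisk needed
    return os.path.join(hotdisk_root, logical_name)
```

The `exists` parameter is injected only so the lookup logic can be exercised without a real filesystem; real pcache also performs the copy into the scratch area on a miss.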


Email from Stephane Jezequel (Sept 15)

  • Could you please forward this request to all ATLAS Grid sites which are included in DDM:

  • As discussed during the ATLAS software week, sites are requested to implement the space token ATLASHOTDISK.

  • More information:


  • Sites should assign at least 1 TB to this space token (and should foresee 5 TB). In case of a storage crisis at the site, the 1 TB can be reduced to 0.5 TB. Because of the special usage of these files, sites may decide whether or not to assign a dedicated storage pool.

  • When it is done, please report to DDM Ops (Savannah ticket is a good solution) to create the new DDM site.


Where are the PFCs (POOL File Catalogs)?

  • Mario Lassnig - modified DQ2 client dq2-ls

    • Can ‘on the fly’ create the PFC for the POOL files on a system

    • written to work for "SRM systems" (generally Tier-1s)

    • Non-SRM systems (generally Tier-2,3)

      • this PFC file must be modified: replace SRM specific descriptors

  • We need to collectively agree on the best method and designate who will follow it up

    • Scriptable way to remove SRM descriptors from PFC for use on non-SRM systems.

    • Cron?

      • Detection of new POOL file arrival

      • Generate updated PFC

      • Run above script preparing file for local use
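The "remove SRM descriptors" step above could be scripted along these lines. This is only a sketch: it assumes the simple `srm://host:port/path` form, whereas real SRM PFNs can carry extra manager/SFN syntax, and the local target path is invented:

```python
import re
import xml.etree.ElementTree as ET

def localize_pfc(pfc_xml, local_root):
    """Rewrite every pfn in a POOL File Catalog so an SRM-style URL
    (srm://host:port/some/path) becomes a plain local path under
    local_root, for use on non-SRM (Tier-2/3) systems."""
    root = ET.fromstring(pfc_xml)
    for pfn in root.iter("pfn"):
        name = pfn.get("name", "")
        m = re.match(r"srm://[^/]+(/.*)$", name)
        if m:
            # keep the path below the storage host, re-rooted locally
            pfn.set("name", local_root + m.group(1))
    return ET.tostring(root, encoding="unicode")
```

A cron job could run this after the dq2-ls step whenever new POOL files are detected, regenerating the site-local PFC.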


Configuring jobs on the GRID

Item 5 from Dario’s TOB Action items:

DB and ADC groups: discuss and implement a way to set the environment on each site so as to point to the nearest Squid and the local POOL file catalogue

  • Grid submission system must know which sites have

    • Squid access to Conditions data

      • Site-specific? Failover?

        • Experience at Michigan with muon calibration: Frontier / Squid access to multiple Squid servers

    • Subscriptions in place to ensure POOL files are present, and the PFC location (?)

      • Site specific – continuous updates to local PFC

  • Manual setup for now in Ganga/Panda,

    • will move to AGIS with configuration file on each site.
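For the per-site Squid pointing and failover discussed above, the Frontier client takes its server and proxy list from an environment variable. A hedged sketch of what a site-local setting might look like (hostnames and the service path are invented):

```
# FRONTIER_SERVER names the Frontier server and the local Squids to try in order;
# the client works through the listed proxies in turn, giving Squid failover.
export FRONTIER_SERVER="(serverurl=http://atlasfrontier.example.org:8000/atlr)(proxyurl=http://squid1.example.org:3128)(proxyurl=http://squid2.example.org:3128)"
```

Under the manual Ganga/Panda setup this would be configured per site; the AGIS configuration file mentioned above would later carry the same information.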

      Link to AGIS Technical Design Proposal:



Features of Athena:

  • Prior to Release 15.4:

    • Athena (RH) looks at the IP address the job is running at,

      • uses dblookup.xml in the release to decide the order of database connections to try to get the Conditions data.

  • Release 15.4

    • Athena looks for a Frontier environment variable,

      • if found, ignores dblookup.xml,

        • using instead another environment variable
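The Release 15.4 selection logic above can be sketched as a small function; the variable name `FRONTIER_SERVER` and the return values are assumptions for illustration, not Athena's actual interface:

```python
import os

def conditions_access(env=os.environ):
    """Sketch of the Release 15.4 behaviour: if the Frontier environment
    variable is set, the job uses it and dblookup.xml is ignored;
    otherwise the dblookup.xml connection ordering applies."""
    frontier = env.get("FRONTIER_SERVER")  # assumed variable name
    if frontier:
        return ("frontier", frontier)
    return ("dblookup", "dblookup.xml")
```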
