Introduction: Distributed POOL File Access

Elizabeth Gallas, Oxford

September 16, 2009

Offline Database Meeting

Overview
  • ATLAS relies on the Grid for processing many types of jobs.
    • Jobs need Conditions data from Oracle plus referenced POOL files.
  • ATLAS has decided to deploy an array of Frontier/Squid servers to negotiate transactions between grid jobs and the Oracle DB, in order to:
    • reduce the load on Oracle
    • reduce the latency observed when connecting to Oracle over the WAN.
  • With Frontier:
    • Inline Conditions flow via the Squid cache -> Frontier server -> Oracle.
    • Referenced Conditions data is in POOL files (always < 2 GB),
      • which are manageable on all systems.
  • FOCUS TODAY on how GRID JOBS find the POOL files.
  • All sites accepting jobs on the grid must have:
    • all the POOL files, and
    • a PFC (POOL File Catalog) – an XML file with the POOL file locations at the site (a minimal sketch follows this list).
  • Job success on the GRID requires that:
    • the GRID submission system knows how each site is configured.
    • GRID sites are configured with the site-appropriate environment and Squid failover.
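To make the PFC concrete: it is a small XML catalog mapping each POOL file's GUID to its physical file name (PFN) and logical file name (LFN) at the site. Below is a minimal sketch in Python that writes a one-entry catalog; the GUID, path, and file name are invented placeholders, not real ATLAS data.

```python
# Sketch: write a minimal one-entry POOL File Catalog (PFC).
# The GUID, physical path, and logical name are hypothetical.
import xml.etree.ElementTree as ET

catalog = ET.Element("POOLFILECATALOG")
entry = ET.SubElement(catalog, "File",
                      ID="E6B1D5B9-0000-0000-0000-000000000000")
physical = ET.SubElement(entry, "physical")
ET.SubElement(physical, "pfn", filetype="ROOT_All",
              name="/storage/hotdisk/cond09.000042.COND.pool.root")
logical = ET.SubElement(entry, "logical")
ET.SubElement(logical, "lfn", name="cond09.000042.COND.pool.root")

ET.ElementTree(catalog).write("PoolFileCatalog.xml",
                              xml_declaration=True, encoding="UTF-8")
```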


Where are the POOL files?
  • DQ2 (DDM) distributes Event data files and Conditions POOL files.
  • TWiki: StorageSetUp for T0, T1s, and T2s
  • ADC/DDM maintains the ToA (Tiers of ATLAS) sites:
    • ToA sites are subscribed to receive DQ2 POOL files.
    • ToA sites have "space tokens" (areas for file destinations) such as:
      • "DATADISK" for real event data
      • "MCDISK" for simulated event data
      • "HOTDISK" for holding POOL files needed by many jobs
        • backed by more robust hardware for more intense access
  • Some sites also use Charles Waldman's "pcache":
    • Duplicates files to a scratch disk accessible to local jobs,
      • avoiding network access to "hotdisk".
      • Magic in pcache tells the job to look in the scratch disk first (a lookup-order sketch follows this list).
  • Are POOL files deployed to all ToA sites 'on the GRID'?
    • Tier-1? Tier-2? The bigger Tier-3s?
    • Any other sites that want to use them? Are those sites in the ToA?


Email from Stephane Jezequel (Sept 15)
  • Could you please forward this request to all ATLAS Grid sites that are included in DDM:
  • As discussed during the ATLAS software week, sites are requested to implement the space token ATLASHOTDISK.
  • More information:
    • https://twiki.cern.ch/twiki/bin/view/Atlas/StorageSetUp#The_ATLASHOTDISK_space_token
  • Sites should assign at least 1 TB to this space token (and should foresee 5 TB). In case of a storage crisis at the site, the 1 TB can be reduced to 0.5 TB. Because of the special usage of these files, sites should decide whether or not to assign a specific pool.
  • When this is done, please report to DDM Ops (a Savannah ticket is a good solution) so that the new DDM site can be created.


Where are the PFCs (POOL File Catalogs)?
  • Mario Lassnig modified the DQ2 client dq2-ls:
    • It can create, on the fly, the PFC for the POOL files on a system.
    • It was written to work for "SRM systems" (generally Tier-1s).
    • On non-SRM systems (generally Tier-2s and Tier-3s),
      • this PFC file must be modified: the SRM-specific descriptors must be replaced.
  • We need to collectively agree on the best method and designate who will follow it up:
    • A scriptable way to remove SRM descriptors from the PFC for use on non-SRM systems (a sketch follows this list).
    • A cron job?
      • Detect the arrival of new POOL files.
      • Generate an updated PFC.
      • Run the above script to prepare the file for local use.
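A hedged sketch of what such a script could look like, reusing the catalog format sketched earlier: it assumes the SRM descriptor is a prefix on each pfn (for example srm://host:port/srm/managerv2?SFN=) that should be rewritten to a plain local path. The prefix pattern and file names are assumptions for illustration, not the agreed method.

```python
# Sketch: strip SRM-specific descriptors from a PFC so the catalog
# can be used on a non-SRM site. The SRM prefix pattern is an
# assumption for illustration; real endpoints vary by site.
import re
import xml.etree.ElementTree as ET

SRM_PREFIX = re.compile(r"^srm://[^/]+(/srm/managerv2\?SFN=)?")

def localize_pfc(src: str, dst: str) -> None:
    """Rewrite every pfn in the catalog to a plain local path."""
    tree = ET.parse(src)
    for pfn in tree.getroot().iter("pfn"):
        pfn.set("name", SRM_PREFIX.sub("", pfn.get("name", "")))
    tree.write(dst, xml_declaration=True, encoding="UTF-8")

localize_pfc("PoolFileCatalog.xml", "PoolFileCatalog.local.xml")
```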


Configuring jobs on the GRID

Item 5 from Dario’s TOB Action items:

DB and ADC groups: discuss and implement a way to set the environment on each site so as to point to the nearest Squid and the local POOL file catalogue

  • The Grid submission system must know which sites have:
    • Squid access to Conditions data
      • Site-specific? Failover?
        • Experience at Michigan with muon calibration: Frontier/Squid access to multiple Squid servers
    • Subscriptions in place to ensure the POOL files are in place, and the PFC location (?)
      • Site-specific – continuous updates to the local PFC
  • Manual setup for now in Ganga/Panda;
    • will move to AGIS, with a configuration file on each site (see the environment sketch below).
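As a hedged illustration of the per-site environment this points toward, the sketch below composes a Frontier client setting with the nearest Squid first and a failover second, using the (serverurl=...)(proxyurl=...) syntax the Frontier client reads from FRONTIER_SERVER. All URLs are invented placeholders; real sites substitute their own endpoints.

```python
# Sketch: build a per-site FRONTIER_SERVER value with a primary
# Squid proxy and a failover. All URLs are hypothetical.
import os

FRONTIER = "http://frontier.example.org:8000/atlr"
SQUIDS = [
    "http://squid1.site.example:3128",  # nearest Squid
    "http://squid2.site.example:3128",  # failover
]

# Proxies are tried in the order listed.
os.environ["FRONTIER_SERVER"] = (
    f"(serverurl={FRONTIER})"
    + "".join(f"(proxyurl={s})" for s in SQUIDS)
)
print(os.environ["FRONTIER_SERVER"])
```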

Link to AGIS Technical Design Proposal:

  • http://indico.cern.ch/getFile.py/access?sessionId=4&resId=1&materialId=7&confId=50976


BACKUP


Features of Athena:
  • Prior to Release 15.4:
    • Athena (RH) looks at the IP address the job is running at,
      • and uses dblookup.xml in the release to decide the order of database connections to try for the Conditions data.
  • Release 15.4:
    • Athena looks for a Frontier environment variable;
      • if found, it ignores dblookup.xml,
        • using another environment variable instead (a sketch of this logic follows).
