
File catalogues




  1. File catalogues
  • Current implementation and usage:
    • Replica tables inside the LHCb bookkeeping DB (~metadata)
    • AliEn (2003 brand)
    • XML-RPC interface identical for both
    • Both are populated in parallel during production
    • Used by Dirac (default AliEn, as better performing for queries) for job placement
    • Also used for ancillary files (log files…)
  • Current investigations
    • Investigating LFC possibilities (implementing the same interface to it)
      - Status: being done
    • Populate LFC with current files (~1 million files)
    • Stress test LFC
  • Note: FC should be used for all files (logfiles, distribution kits…)
  Philippe Charpentier
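The idea of one interface shared by both catalogues, with both populated in parallel and one chosen as the default for queries, can be sketched as follows. This is only an illustration under stated assumptions: the slides do not give the XML-RPC method names, so `add_replica`, `get_replicas`, `register_everywhere` and `candidate_sites` are hypothetical, and an in-memory class stands in for the real bookkeeping, AliEn or LFC back ends.

```python
from typing import Dict, Iterable, List, Protocol


class FileCatalogue(Protocol):
    """Hypothetical common interface; the real method names are not in the slides."""

    def add_replica(self, lfn: str, se: str, pfn: str) -> None: ...
    def get_replicas(self, lfn: str) -> Dict[str, str]: ...


class InMemoryCatalogue:
    """Stand-in back end (assumption): maps LFN -> {storage element: PFN}."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._replicas: Dict[str, Dict[str, str]] = {}

    def add_replica(self, lfn: str, se: str, pfn: str) -> None:
        self._replicas.setdefault(lfn, {})[se] = pfn

    def get_replicas(self, lfn: str) -> Dict[str, str]:
        # Return a copy so callers cannot modify the catalogue by accident.
        return dict(self._replicas.get(lfn, {}))


def register_everywhere(catalogues: Iterable[FileCatalogue],
                        lfn: str, se: str, pfn: str) -> None:
    """Populate every catalogue with the same replica, as done during production."""
    for fc in catalogues:
        fc.add_replica(lfn, se, pfn)


def candidate_sites(catalogue: FileCatalogue, lfns: Iterable[str]) -> List[str]:
    """Storage elements holding all requested LFNs (naive input for job placement)."""
    sites = None
    for lfn in lfns:
        held = set(catalogue.get_replicas(lfn))
        sites = held if sites is None else sites & held
    return sorted(sites or [])


if __name__ == "__main__":
    bookkeeping = InMemoryCatalogue("bookkeeping")
    alien = InMemoryCatalogue("alien")
    # LFNs, storage element names and PFNs below are made up for illustration.
    register_everywhere([bookkeeping, alien], "/lhcb/a.dst", "SE-A", "pfn-a1")
    register_everywhere([bookkeeping, alien], "/lhcb/a.dst", "SE-B", "pfn-a2")
    register_everywhere([bookkeeping, alien], "/lhcb/b.dst", "SE-B", "pfn-b1")
    print(candidate_sites(alien, ["/lhcb/a.dst", "/lhcb/b.dst"]))
```

Because client code depends only on the shared interface, a third back end such as LFC could in principle be added to the list passed to `register_everywhere` without changing the callers.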

  2. Datasets
  • Possibility of identifying data files with a “dataset name”
    • For data management (“make a dataset available on site X”)
    • For analysis (specify only the dataset as parameter of the job)
  • Question: is it the level where the location index should sit?
    • How to deal with partial datasets?
    • What if a dataset cannot fit on a single SE?
    • How does the WMS know about datasets? Job placement, job splitting…
  • Simplest method (is it sufficient?)
    • Use a hierarchical LFN namespace
    • Use directories as datasets + symlinks for grouping files
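The "simplest method" (directories as datasets, symlinks for grouping) could look roughly like the sketch below. It is a toy in-memory namespace, not a real catalogue API: the `Namespace` class, its methods and `split_job` are hypothetical names, all paths are invented, and symlinks that point to other symlinks are deliberately not followed.

```python
import posixpath
from typing import Dict, List, Set


class Namespace:
    """Toy hierarchical LFN namespace (assumption: not any real catalogue API)."""

    def __init__(self) -> None:
        self.files: Set[str] = set()      # LFNs of real files
        self.links: Dict[str, str] = {}   # symlink path -> target LFN

    def add_file(self, lfn: str) -> None:
        self.files.add(posixpath.normpath(lfn))

    def add_link(self, link: str, target: str) -> None:
        self.links[posixpath.normpath(link)] = posixpath.normpath(target)

    def dataset_files(self, directory: str) -> List[str]:
        """Resolve a dataset (a directory) to the real LFNs directly inside it."""
        directory = posixpath.normpath(directory)
        members = {f for f in self.files if posixpath.dirname(f) == directory}
        # Symlinks in the directory contribute the LFN they point to.
        members |= {target for link, target in self.links.items()
                    if posixpath.dirname(link) == directory}
        return sorted(members)


def split_job(lfns: List[str], files_per_job: int) -> List[List[str]]:
    """Naive job splitting: fixed-size chunks of the dataset's file list."""
    return [lfns[i:i + files_per_job] for i in range(0, len(lfns), files_per_job)]


if __name__ == "__main__":
    ns = Namespace()
    ns.add_file("/lhcb/prod/run1/f1.dst")
    ns.add_file("/lhcb/prod/run1/f2.dst")
    ns.add_file("/lhcb/prod/run2/f3.dst")
    # A grouping dataset built only from symlinks to files in other directories.
    ns.add_link("/lhcb/datasets/selection/f1.dst", "/lhcb/prod/run1/f1.dst")
    ns.add_link("/lhcb/datasets/selection/f3.dst", "/lhcb/prod/run2/f3.dst")
    print(split_job(ns.dataset_files("/lhcb/datasets/selection"), 1))
```

With this layout, a job only needs the directory name as its parameter. The open questions on the slide (partial datasets, datasets larger than one SE) are not addressed by the sketch.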

  3. Metadata catalogue
  • No need for metadata in the FC
    • Except regular file metadata (size, date of creation, ACLs if needed…)
  • Metadata for LHCb is more “provenance” (BKDB)
    • Queries on event types and file types
    • Currently - queries on replica location (should use FC?)
    • Possibility to select the origin of the data (program versions, history…)
  • BKDB also contains information at the job level (for each step in the job)
  • Split from the replica tables (that could be removed)
  • Link between BKDB and FC via LFN or GUID
  • Being implemented (very promising) using the ARDA scheme
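The split described above, with provenance queries answered by the BKDB and replica locations by the FC, joined through the LFN or GUID, might be sketched like this. The record fields, function names and sample values are all assumptions made for illustration; the slides do not describe the actual BKDB schema or the ARDA-based interface.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class BookkeepingRecord:
    """Hypothetical provenance record; the real BKDB schema is not in the slides."""
    guid: str
    lfn: str
    event_type: str
    file_type: str
    program_version: str


def select_files(records: Iterable[BookkeepingRecord],
                 **criteria: str) -> List[BookkeepingRecord]:
    """Provenance query: keep records whose fields equal all given criteria."""
    return [r for r in records
            if all(getattr(r, key) == value for key, value in criteria.items())]


def locate(records: Iterable[BookkeepingRecord],
           replicas_by_guid: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Join to the file catalogue via GUID: LFN -> storage elements with a replica."""
    return {r.lfn: sorted(replicas_by_guid.get(r.guid, [])) for r in records}


if __name__ == "__main__":
    # All values below are invented for the example.
    bkdb = [
        BookkeepingRecord("guid-1", "/lhcb/f1.dst", "signal", "DST", "v1"),
        BookkeepingRecord("guid-2", "/lhcb/f2.dst", "background", "DST", "v1"),
        BookkeepingRecord("guid-3", "/lhcb/f3.dst", "signal", "DST", "v2"),
    ]
    fc = {"guid-1": ["SE-B", "SE-A"], "guid-3": ["SE-A"]}
    selected = select_files(bkdb, event_type="signal", file_type="DST")
    print(locate(selected, fc))
```

Keeping the two lookups separate means the replica tables can be dropped from the bookkeeping side without affecting provenance queries, which is the direction the slide suggests.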
