This document details the CDF Run II Data Catalog and associated data access modules designed for efficient data handling in high-energy physics experiments. It covers essential components such as storage management, data access hierarchy, logical data selection, and error recovery mechanisms, with an emphasis on seamless integration and performance enhancement. Key features include a C++ API, distributed access capabilities, and robust file catalog management techniques. This comprehensive framework aims to facilitate data storage, retrieval, and processing, ensuring reliability and efficiency in scientific data management.
The CDF Run II Data Catalog and Data Access Modules P. Calafiura, J. Kowalkowski, S. Lammel, M. Lancaster, F. Ratnikov, E. Sexton-Kennedy, I. Sfiligoi, T. Watts, E. Wicklund
Data Handling Software Components • Storage Management (S. Lammel, C 366) • Data Management
Data Access Hierarchy • Data view: Dataset, Run Section • Storage view: (Tape) Stream, Fileset/Partition, File
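The two views can be pictured as plain data structures. The following is a minimal sketch only: every type and field name (`RunSection`, `Dataset`, `File`, `Fileset`, `Stream`, `countFiles`) is an assumption made for illustration and is not taken from the CDF code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical types illustrating the two views; names and fields are
// assumptions for this sketch, not the CDF class definitions.

// Data view: a dataset is a collection of run sections.
struct RunSection {
    int run;      // run number
    int section;  // section number within the run
};

struct Dataset {
    std::string name;
    std::vector<RunSection> sections;
};

// Storage view: a (tape) stream holds filesets, a fileset holds files.
struct File {
    std::string name;
    RunSection first;  // first run section stored in the file
    RunSection last;   // last run section stored in the file
};

struct Fileset {
    std::string name;
    std::vector<File> files;
};

struct Stream {
    std::string name;
    std::vector<Fileset> filesets;
};

// Count the files reachable from a stream by walking the storage view.
inline std::size_t countFiles(const Stream& s) {
    std::size_t n = 0;
    for (const Fileset& fs : s.filesets) n += fs.files.size();
    return n;
}
```

In this picture a file records the first and last run section it contains, which is one possible way to connect the storage view back to the data view.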
Reading Data • Transparent storage management • Logical data selection
Writing Data • Temporary disk space management • Fileset creation • Log progress
The File Catalog • Locate file(set)s belonging to a dataset from • a time range • a run range • applying quality cuts, … • Log output files and filesets info • Maintain tape management info • Log job progress (error recovery, checkpoint-restart) • C++ API • Command-line and web based tools • Distributed access
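Locating filesets by run range can be illustrated with a small in-memory lookup. This is a sketch under stated assumptions: the slides do not give the catalog schema or the C++ API, so `FilesetRecord`, its fields, and `selectFilesets` are hypothetical, and a single boolean stands in for the quality cuts.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical catalog row; the real DFC schema is not given in the slides.
struct FilesetRecord {
    std::string dataset;
    std::string fileset;
    int firstRun;
    int lastRun;
    bool goodQuality;  // stand-in for "quality cuts"
};

// Return the names of filesets in `dataset` whose run range overlaps
// [runLo, runHi]; optionally keep only records flagged as good quality.
inline std::vector<std::string> selectFilesets(
    const std::vector<FilesetRecord>& catalog,
    const std::string& dataset, int runLo, int runHi, bool requireGood) {
    assert(runLo <= runHi);
    std::vector<std::string> out;
    for (const FilesetRecord& r : catalog) {
        if (r.dataset != dataset) continue;
        if (r.lastRun < runLo || r.firstRun > runHi) continue;  // no overlap
        if (requireGood && !r.goodQuality) continue;
        out.push_back(r.fileset);
    }
    return out;
}
```

A production catalog would presumably push this selection into a database query; the linear scan here only makes the selection rule explicit.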
The File Catalog Clients [Diagram; labels: DFC, DBManager, Data Logger, Offline Farm, L3 Farm, Reader, Writer, Filtered Data, Raw Data, Oracle, MSQL]
The DBManager Package • J. Kowalkowski, C236 Poster • DBMS-independent C++ API (calibration, geometry, DFC) • Type-safe mapping of table rows to transient C++ objects • smart pointers • lazy instantiation • caching • update pointer when new key notified • Pluggable factory to select DBMS at run time • Code generator • provides bindings (Oracle, MSQL, JDBC, text) for predefined queries • Java-based table description language
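The combination of a pluggable factory with lazily instantiated, cached smart pointers can be sketched as follows. This is not the DBManager interface: `CalibRow`, `Backend`, `TextBackend`, `BackendFactory`, and `CalibHandle` are hypothetical names, and the toy backend fabricates its rows so the example stays self-contained.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical row type and backend interface (assumed for this sketch).
struct CalibRow {
    int key;
    double gain;
};

class Backend {
public:
    virtual ~Backend() = default;
    virtual CalibRow fetch(int key) = 0;
};

// Toy backend standing in for a text-file binding; it counts its queries.
class TextBackend : public Backend {
public:
    int queries = 0;
    CalibRow fetch(int key) override {
        ++queries;
        return CalibRow{key, 1.0 + 0.5 * key};
    }
};

// Pluggable factory: backends register a creator under a name, and the
// caller picks one by name at run time.
class BackendFactory {
public:
    using Creator = std::function<std::shared_ptr<Backend>()>;
    void registerBackend(const std::string& name, Creator c) {
        creators_[name] = std::move(c);
    }
    std::shared_ptr<Backend> create(const std::string& name) const {
        auto it = creators_.find(name);
        if (it == creators_.end())
            throw std::runtime_error("unknown backend: " + name);
        return it->second();
    }
private:
    std::map<std::string, Creator> creators_;
};

// Smart-pointer-like handle: the row is read on first access and cached.
// Changing the key drops the cached row so the next access re-reads it.
class CalibHandle {
public:
    CalibHandle(std::shared_ptr<Backend> db, int key)
        : db_(std::move(db)), key_(key) {}
    void setKey(int key) {
        if (key != key_) { key_ = key; cache_.reset(); }
    }
    const CalibRow* operator->() {
        if (!cache_) cache_ = std::make_unique<CalibRow>(db_->fetch(key_));
        return cache_.get();
    }
private:
    std::shared_ptr<Backend> db_;
    int key_;
    std::unique_ptr<CalibRow> cache_;
};
```

Client code only sees `CalibHandle` and the abstract `Backend`, so swapping the database means registering a different creator; repeated dereferences of the handle cost one query until the key changes.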
Data Handling Input Module • Module of the BaBar/CDF AC++ framework • Invisible to users • Select relevant filesets in a logical fashion • Iterate over them • stage ahead • out-of-order • Maintain state of request for error recovery
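One way to picture stage-ahead, out-of-order iteration with restartable state is a small request tracker. The class below is a hypothetical sketch, not the AC++ module: `FilesetRequest`, the fixed-size staging window, and the use of a set of completed fileset names as the saved state are all assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical request tracker; names and policy are assumptions.
class FilesetRequest {
public:
    // `done` carries the state saved by an earlier job (empty on first run).
    FilesetRequest(std::vector<std::string> filesets, std::size_t window,
                   std::set<std::string> done = {})
        : all_(std::move(filesets)), window_(window), done_(std::move(done)) {}

    // Stage ahead: return the filesets to request from storage now,
    // keeping at most `window` requests outstanding at any time.
    std::vector<std::string> requestMore() {
        std::vector<std::string> out;
        for (const std::string& f : all_) {
            if (outstanding_.size() >= window_) break;
            if (done_.count(f) || outstanding_.count(f)) continue;
            outstanding_.insert(f);
            out.push_back(f);
        }
        return out;
    }

    // Record a processed fileset; completion may happen in any order.
    void complete(const std::string& f) {
        if (outstanding_.erase(f) > 0) done_.insert(f);
    }

    // State worth saving for error recovery.
    const std::set<std::string>& done() const { return done_; }
    bool finished() const { return done_.size() == all_.size(); }

private:
    std::vector<std::string> all_;
    std::size_t window_;
    std::set<std::string> done_;
    std::set<std::string> outstanding_;
};
```

A restarted job would rebuild the tracker from the saved `done()` set and skip the filesets already processed.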
Data Handling Output Module • AC++ module • Close files at target size, but aligned to run section boundaries (keep events from a section together) • Log output files info into catalog • Commit blocks of completed files to the DIM
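The file-closing rule can be made concrete with a short sketch. It assumes, hypothetically, that each event carries a run section identifier and a byte size and that events of one section arrive contiguously; `EventInfo` and `splitIntoFiles` are illustrative names, not part of the output module.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical event summary: only what the splitting rule needs.
struct EventInfo {
    int runSection;     // identifier of the run section the event belongs to
    std::size_t bytes;  // serialized size of the event
};

// Split an event stream into files. A file is closed once it has reached
// `targetBytes`, but only when the run section changes, so that all events
// of a section land in the same file. Returns the event count of each file.
// Assumes events of one run section arrive contiguously.
inline std::vector<std::size_t> splitIntoFiles(
    const std::vector<EventInfo>& events, std::size_t targetBytes) {
    std::vector<std::size_t> counts;
    std::size_t fileBytes = 0;
    std::size_t fileEvents = 0;
    for (std::size_t i = 0; i < events.size(); ++i) {
        const bool newSection =
            i > 0 && events[i].runSection != events[i - 1].runSection;
        if (newSection && fileBytes >= targetBytes) {
            counts.push_back(fileEvents);  // close the current file
            fileBytes = 0;
            fileEvents = 0;
        }
        fileBytes += events[i].bytes;
        ++fileEvents;
    }
    if (fileEvents > 0) counts.push_back(fileEvents);  // close the last file
    return counts;
}
```

With this rule a file can exceed the target size by up to one run section, which is the price of never splitting a section across files.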
Status and Outlook • Defined interfaces between all components • All components have at least a prototype implementation • Successful system integration for Mock Data Challenge 1 • T. Watts C 268 (tomorrow) • Improve performance and reliability