
The Sequential Access Model for Run II Data Management and Delivery. Lee Lueking, Frank Nagy, Heidi Schellman, Igor Terekhov, Julie Trumbo, Matt Vranicar, Rich Wellner, Vicky White. CHEP98, Sept. 3, 1998.


The Sequential Access Model for Run II Data Management and Delivery

Lee Lueking, Frank Nagy, Heidi Schellman, Igor Terekhov, Julie Trumbo, Matt Vranicar, Rich Wellner, Vicky White.



Sept. 3, 1998

What is the Sequential Access Model (SAM)?

  • Sequential events: Data is stored in files as sequential events.

  • Data Tiers: Each event is stored in each of several data tiers.

    • The Event Data Unit (EDU) is the unit of data stored in each tier.

    • Physical event size: EDU5 = 5 kB/event, EDU50 = 50 kB/event, etc.

  • Physical streaming (clustering): Data categories based on Trigger or reconstruction information

  • Database catalog: The File, Event and Processing Database holds event-level, file-level and run-level information about the data, plus processing information, both static and dynamic.
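The tier and file layout described above can be sketched as a small data structure. This is only an illustration: the class name `FileRecord`, the stream names and the event counts are assumptions. Only the EDU5 and EDU50 per-event sizes come from the slide.

```python
from dataclasses import dataclass

# Per-event sizes in kB, taken from the tier names on the slide (EDU5, EDU50).
EDU_SIZE_KB = {"EDU5": 5, "EDU50": 50}


@dataclass(frozen=True)
class FileRecord:
    """Hypothetical catalog entry: one file of sequential events in one tier."""
    name: str
    tier: str      # data tier, e.g. "EDU5"
    stream: str    # physical stream (cluster) the file belongs to
    n_events: int

    def size_kb(self) -> int:
        # File size follows from the event count and the tier's per-event size.
        return self.n_events * EDU_SIZE_KB[self.tier]


def tier_volume_kb(files, tier):
    """Total size in kB of all cataloged files in one tier."""
    return sum(f.size_kb() for f in files if f.tier == tier)


# Illustrative numbers only; stream names and event counts are made up.
files = [
    FileRecord("f1", "EDU5", "muon", 1000),
    FileRecord("f2", "EDU50", "muon", 1000),
    FileRecord("f3", "EDU5", "jet", 2000),
]
```

With these made-up counts, the same 1000 events take ten times more space in the EDU50 tier than in the EDU5 tier.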

Data Organization

[Diagram labels: User and physics group (derived) data; File & Event Database; Physical Clustering.]

How Do I Access Data?

  • Pipelines: Data access channels tailored for particular processing and analysis patterns.

  • Pipeline segments: Tapes, drives + Automated Tape Library + Storage Management System, network, group-shared and/or user-private analysis disk.

  • Example access modes:

    • Database: Access to event, trigger & other FEDB info.

    • Thumbnail: Disk-resident sketch of each event.

    • Freight Train: Large data stream file server.

    • Event Picking: Random event selection from any data tier.

    • Small Data-set: One or a few files from any data tier.

Data Access

[Diagram labels: Mass Storage, Freight Train, Pick Event, User File. Legend: group of users, single user, data flow, disk storage, tape storage.]



D0 Specifications

  • Data sizes

  • Further details

    • 10-15 exclusive streams preferred. Based on L3 and/or Reconstruction information.

    • 10% warm (tape or disk) caches of Raw and Medium EDU data.

    • Possible on-demand reconstruction.

Exclusive Streaming

See Talk #182: Heidi Schellman, “Assurance of Data Integrity in a Petabyte Data Sample”
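Exclusive streaming means each event lands in exactly one stream. A minimal sketch follows, assuming a first-match rule over an ordered list of L3 trigger bits. Both the rule and all stream and bit names are hypothetical; the slides do not say how streams are actually chosen.

```python
# Ordered (stream name, required L3 trigger bit) pairs. All names are hypothetical.
STREAM_RULES = [
    ("dimuon", "L3_DIMUON"),
    ("electron", "L3_ELECTRON"),
    ("jet", "L3_JET"),
]
DEFAULT_STREAM = "other"


def assign_stream(l3_bits):
    """Return the single stream for an event, given the set of L3 bits it fired."""
    for stream, bit in STREAM_RULES:
        if bit in l3_bits:
            return stream  # first match wins, so no event is written twice
    return DEFAULT_STREAM  # events matching no rule still get exactly one stream
```

An event that fires both the electron and jet bits goes only to the electron stream, because that rule is listed first.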

Data Handling System

[Diagram; surviving label: Buffer and Cache.]

SAM Design Details

  • Network distributed.

  • Easily scalable.

  • Works for all access modes.

  • Uses CORBA interfaces between modules.

  • Modules being written in JAVA, Python and C++.

  • File, Event and Processing Database uses ORACLE 8.

  • Not tightly coupled to:

    • Tape Mass Storage System.

    • CPU availability or Batch processing facilities on Farm or Analysis machines.

    • The D0 event data model.
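One common way to stay loosely coupled to the tape system is to have the other modules depend only on a small file-level interface. The sketch below assumes that approach. The names `MassStorage`, `InMemoryStorage` and `stage_file` are hypothetical and are not SAM or ENSTORE APIs.

```python
from abc import ABC, abstractmethod


class MassStorage(ABC):
    """Hypothetical file-level interface that other modules would depend on."""

    @abstractmethod
    def get(self, file_name: str) -> bytes:
        """Return the contents of a stored file."""

    @abstractmethod
    def put(self, file_name: str, data: bytes) -> None:
        """Store a file under the given name."""


class InMemoryStorage(MassStorage):
    """Stand-in backend; a tape-backed class could replace it without other changes."""

    def __init__(self):
        self._files = {}

    def get(self, file_name: str) -> bytes:
        return self._files[file_name]

    def put(self, file_name: str, data: bytes) -> None:
        self._files[file_name] = data


def stage_file(source: MassStorage, cache: MassStorage, file_name: str) -> int:
    """Copy one file from mass storage to a cache and return its size in bytes."""
    data = source.get(file_name)
    cache.put(file_name, data)
    return len(data)
```

Because `stage_file` sees only the interface, swapping the storage backend does not touch the calling code.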

Main Components

  • File and Event Database: Info about data location and processing details. (See poster session #127: Vicky White, “Use of ORACLE in Run II for D0”.)

  • Global Optimizer: Optimizes tape access and regulates bandwidth to various stations and activities.

  • Station: Management for a set of processing resources, including buffer and Data I/O.

  • Project Master: Responsible for managing projects which are lists of files to process.

  • Consumer/producer: Actual data processing

  • GUI and API user interfaces: Allow users to access data and administrators to control the system.
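The slides do not say how the Global Optimizer optimizes tape access. One plausible ingredient is grouping pending file requests by tape, so that each tape is mounted once. The sketch below assumes that strategy; the function name and the file-to-tape mapping are hypothetical.

```python
from collections import defaultdict


def plan_tape_reads(requests, file_to_tape):
    """Group requested files by tape so each tape needs only one mount.

    Returns a list of (tape, [files]) pairs, most-requested tape first.
    """
    by_tape = defaultdict(list)
    for name in requests:
        by_tape[file_to_tape[name]].append(name)
    # Sort by descending request count; ties are broken by tape label
    # so the plan is deterministic.
    return sorted(by_tape.items(), key=lambda item: (-len(item[1]), item[0]))
```

A real optimizer would also weigh bandwidth limits per station and activity, which this sketch leaves out.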

Components of SAM

[Diagram labels: User & Admin. (API and GUI); Project Master; "DB and" (label incomplete); Mass Storage; Global Optimizer; Station A through Station F.]

File and Event Database

[Diagram labels: Data Tier, Event Number, Trigger L1, Trigger L2, Trigger L3, Off-line Filter, # Events, Data Stream.]
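The attributes named on this slide suggest a file-level and an event-level table. The sketch below is a guess at such a layout, not the actual schema. It uses SQLite in place of ORACLE 8 so that it runs anywhere, and all table names, column names and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for ORACLE 8 here
conn.executescript("""
CREATE TABLE files (
    file_id     INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    data_tier   TEXT NOT NULL,
    data_stream TEXT NOT NULL,
    n_events    INTEGER NOT NULL
);
CREATE TABLE events (
    event_number   INTEGER NOT NULL,
    file_id        INTEGER NOT NULL REFERENCES files(file_id),
    trigger_l1     INTEGER NOT NULL,
    trigger_l2     INTEGER NOT NULL,
    trigger_l3     INTEGER NOT NULL,
    offline_filter INTEGER NOT NULL
);
""")

# Made-up rows: the same event 101 is stored in two tiers.
conn.execute("INSERT INTO files VALUES (1, 'f1', 'EDU5', 'muon', 2)")
conn.execute("INSERT INTO files VALUES (2, 'f2', 'EDU50', 'muon', 1)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?, ?, ?)",
    [(101, 1, 1, 1, 0, 0), (102, 1, 1, 1, 1, 0), (101, 2, 1, 1, 0, 0)],
)


def files_with_l3(conn, tier):
    """Names of files in one tier that hold at least one event passing L3."""
    rows = conn.execute(
        "SELECT DISTINCT f.name FROM files f "
        "JOIN events e ON e.file_id = f.file_id "
        "WHERE f.data_tier = ? AND e.trigger_l3 = 1 ORDER BY f.name",
        (tier,),
    )
    return [row[0] for row in rows]
```

A query like `files_with_l3(conn, "EDU5")` is the kind of constraint a user might send when defining a project.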

Mass Storage System Needs

  • Provide access to data through file-level semantics.

  • Manage all tape activity within the ATL(S) and to/from shelf.

  • Allow data to be physically clustered in tape groupings or “file families”.

  • A mechanism for sending priorities with file requests to allow control over allocation of resources for various activities.

  • System must optimize the use of resources such as arm time and tape mounts.

  • Retry and fail-over features for failed tape read/write activities.

  • Open tape format to allow removal of tapes and exchange of data with other sites.

  • Reliable and unattended operation.

See ENSTORE presentation #126: Don Patravic, “ENSTORE - An Alternative Data Storage System”
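Two of the requirements above, priorities sent with file requests and retry of failed reads, can be illustrated with a small priority queue. This is a sketch under assumptions: the class `RequestQueue`, its retry limit and the convention that a lower number means higher priority are all hypothetical and are not ENSTORE behavior.

```python
import heapq
import itertools


class RequestQueue:
    """Hypothetical queue of file requests; lower number = higher priority."""

    def __init__(self, max_retries=2):
        self._heap = []
        self._order = itertools.count()  # tie-breaker: FIFO within one priority
        self.max_retries = max_retries

    def submit(self, file_name, priority, attempts=0):
        heapq.heappush(self._heap, (priority, next(self._order), file_name, attempts))

    def serve(self, read_file):
        """Serve every request; read_file(name) must return True on success."""
        done, failed = [], []
        while self._heap:
            priority, _, name, attempts = heapq.heappop(self._heap)
            if read_file(name):
                done.append(name)
            elif attempts < self.max_retries:
                # Re-queue at the same priority with one more recorded attempt.
                self.submit(name, priority, attempts + 1)
            else:
                failed.append(name)
        return done, failed
```

Fail-over to another drive or copy would sit inside `read_file`, which this sketch treats as a black box.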

Access to Data through SAM

  • User or group defines a “project” by sending a list of constraints or file list to the Database Server.

  • DB Server returns a summary of the project (number of files, size and availability).

  • User is provided a list of possible “stations” where the project might run, and chooses one.

  • User registers with the station for a given (new or existing) project and is given a unique “key” to use.

  • User’s client “consumer/producer” sends the “key” to the “project master” on the chosen station, and is given the next available file in the “project”.
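The steps above can be sketched as a toy in-memory exchange. This is an illustration only: the class and method names are hypothetical, station selection is omitted, and the real modules communicate over CORBA rather than by direct calls.

```python
import uuid


class DatabaseServer:
    """Hypothetical catalog front end: turns a constraint into a file list."""

    def __init__(self, catalog):
        self._catalog = catalog  # file name -> (data tier, size in kB)

    def define_project(self, tier):
        files = sorted(n for n, (t, _) in self._catalog.items() if t == tier)
        summary = {
            "n_files": len(files),
            "size_kb": sum(self._catalog[n][1] for n in files),
        }
        return files, summary


class ProjectMaster:
    """Hands out the files of one project, one at a time, to registered users."""

    def __init__(self, files):
        self._pending = list(files)
        self._keys = set()

    def register(self):
        key = uuid.uuid4().hex  # unique key the consumer must present
        self._keys.add(key)
        return key

    def next_file(self, key):
        if key not in self._keys:
            raise PermissionError("unknown key")
        # None signals that the project has no files left.
        return self._pending.pop(0) if self._pending else None


# Made-up catalog; the constraint here is simply "all files in tier EDU5".
db = DatabaseServer({"f1": ("EDU5", 5000), "f2": ("EDU50", 50000), "f3": ("EDU5", 10000)})
files, summary = db.define_project("EDU5")
master = ProjectMaster(files)
key = master.register()

delivered = []
while True:
    name = master.next_file(key)
    if name is None:
        break
    delivered.append(name)
```

Several consumers holding keys for the same project would share the pending list, so each file is handed out once.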

SAM Prototype

  • Status: Being built, ready early October.

  • Goals:

    • Populate and exercise the SAM database.

    • Specify projects - data to be accessed for processing or analysis.

    • Attach to a ‘Station’ which makes files for that Project accessible.

    • Interface to ENSTORE - get/put files - using SAM “Global Optimizer”.

    • Build Analysis programs using D0 framework.

    • Demonstrate multiple Stations, Projects, Analysis consumers.

  • Testing: Further testing in fall with SAM PC test-bed.

  • Beta version: Plan to make MC data available through SAM late '98.

SAM Prototype PC Test-bed: Example Configuration

[Diagram labels: Enstore Warehouse, SAM Station Servers, Main Backbone, To Database Server.]

Summary

  • Dzero plans to use a file-based Sequential Access Model for Run II data access.

  • The design is network distributed with CORBA communication between modules written in JAVA, PYTHON and C++. ORACLE 8 is used for the DB.

  • A SAM prototype is being built now and will be ready in early October.

  • Hardware to construct a SAM test-bed will be assembled this fall to more fully test and understand the system.

  • We plan to employ the system for MC data by the end of '98, and perform large-scale testing with Run II hardware in the first part of next year.