The Sequential Access Model for Run II Data Management and Delivery. Lee Lueking, Frank Nagy, Heidi Schellman, Igor Terekhov, Julie Trumbo, Matt Vranicar, Rich Wellner, Vicky White. URL: www-d0.fnal.gov/~lueking/sam/sequential.html. CHEP98, Sept. 3, 1998.


What is the Sequential Access Model (SAM)?

  • Sequential events: Data is stored in files as sequential events.

  • Data Tiers: Each event is stored in each of several data tiers.

    • The Event Data Unit (EDU) is the unit of data stored in each tier.

    • Physical event size: EDU5 = 5 kB/event, EDU50 = 50 kB/event, etc.

  • Physical streaming (clustering): Data is grouped into categories based on trigger or reconstruction information.

  • Database catalog: The File, Event and Processing Database holds information about the data at the event, file and run levels, along with static and dynamic processing information.
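
The EDU naming above implies a simple size calculation: the volume of a tier is the number of events times the per-event EDU size. The following is a minimal sketch of that arithmetic. The event count, the class name and the decimal definition of kB are assumptions made for illustration, not D0 figures.

```python
from dataclasses import dataclass

KB = 1_000  # bytes per kB; decimal convention assumed for this sketch


@dataclass(frozen=True)
class DataTier:
    name: str
    kb_per_event: int  # size of one Event Data Unit (EDU) in this tier

    def total_bytes(self, n_events: int) -> int:
        """Storage needed to hold n_events in this tier."""
        return n_events * self.kb_per_event * KB


# Tier names follow the slide's EDU<size> convention.
edu5 = DataTier("EDU5", 5)
edu50 = DataTier("EDU50", 50)

n_events = 1_000_000  # hypothetical sample size, not a D0 number
for tier in (edu5, edu50):
    print(f"{tier.name}: {tier.total_bytes(n_events) / 1e9:.1f} GB")
```

Because every event appears in each tier, the total footprint of a sample is the sum of these per-tier volumes.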


Data Organization

(Diagram labels: user and physics group (derived) data; File & Event Database; event information; tiers; warm cache; physical clustering.)


How Do I Access Data?

  • Pipelines: Data access channels tailored for particular processing and analysis patterns.

  • Pipeline segments: Tapes, drives + Automated Tape Library + Storage Management System, network, group-shared and/or user-private analysis disk.

  • Example access modes:

    • Database: Access to event, trigger & other FEDB info.

    • Thumbnail: Disk resident sketch of each event.

    • Freight Train: Large data stream file server.

    • Event Picking: Random event selection from any data tier.

    • Small Data-set: One or a few files from any data tier.
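
As an illustration of the Event Picking mode, the sketch below groups a list of wanted events by the file that holds them, so that each file needs to be fetched only once. The catalog contents, the file names and the function name are hypothetical, and the real SAM lookup is not described in these slides.

```python
from collections import defaultdict

# Hypothetical event-to-file catalog for one data tier:
# (run number, event number) -> file name.
catalog = {
    (1001, 1): "raw_a.dat",
    (1001, 2): "raw_a.dat",
    (1001, 3): "raw_b.dat",
    (1002, 1): "raw_c.dat",
}


def plan_event_pick(wanted, catalog):
    """Group the wanted events by file so each file is requested only once."""
    by_file = defaultdict(list)
    for key in wanted:
        by_file[catalog[key]].append(key)
    return {name: sorted(keys) for name, keys in by_file.items()}


plan = plan_event_pick([(1002, 1), (1001, 2), (1001, 1)], catalog)
for file_name, events in plan.items():
    print(file_name, events)
```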


Data Access

(Diagram columns: Mass Storage, Pipeline, Consumers. Pipelines shown: File & Event DB, Thumbnail, Freight Train, Pick Event, User File. Legend symbols: group of users, single user, data flow, file, event, disk storage, tape storage, pipeline name.)


D0 Specifications

  • Data sizes

  • Further details

    • 10-15 exclusive streams preferred. Based on L3 and/or Reconstruction information.

    • 10% warm (tape or disk) caches of Raw and Medium EDU data.

    • Possible on-demand reconstruction.
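
Exclusive streaming means each event is written to exactly one stream. One way to get that property is an ordered rule list in which the first matching rule wins. The sketch below assumes that approach; the stream names and trigger-bit names are invented for illustration and are not D0 definitions.

```python
# Ordered (stream name, required trigger bit) rules. The first match wins,
# which guarantees that an event lands in exactly one stream.
# All names here are hypothetical.
STREAM_RULES = [
    ("muon", "L3_MU"),
    ("electron", "L3_EM"),
    ("jet", "L3_JET"),
]
DEFAULT_STREAM = "other"


def assign_stream(trigger_bits):
    """Return the single stream an event with these trigger bits belongs to."""
    for stream, bit in STREAM_RULES:
        if bit in trigger_bits:
            return stream
    return DEFAULT_STREAM


print(assign_stream({"L3_EM", "L3_JET"}))
print(assign_stream(set()))
```

An event that fires several triggers still goes to one stream only, so the rule order acts as a priority list.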


Will SAM Scale to Run II?


Exclusive Streaming

See Talk #182: Heidi Schellman, “Assurance of Data Integrity in a Petabyte Data Sample”


Data Handling System

Buffer and Cache


SAM Design Details

  • Network distributed.

  • Easily scalable.

  • Works for all access modes.

  • Uses CORBA interfaces between modules.

  • Modules are being written in Java, Python and C++.

  • File, Event and Processing Database uses ORACLE 8.

  • Not tightly coupled to:

    • Tape Mass Storage System.

    • CPU availability or Batch processing facilities on Farm or Analysis machines.

    • The D0 event data model.


Main Components

  • File and Event Database: Info about data location and processing details. (see poster session #127: Vicky White, “Use of ORACLE in Run II for D0” )

  • Global Optimizer: Optimizes tape access and regulates bandwidth to various stations and activities.

  • Station: Management for a set of processing resources, including buffer and Data I/O.

  • Project Master: Responsible for managing projects which are lists of files to process.

  • Consumer/producer: Performs the actual data processing.

  • GUI and API user interfaces: Allow users to access data and administrators to control the system.


Components of SAM

(Diagram components: Stations A through F with attached Consumer/Producer clients; Project Master; Global Optimizer; DB and Information Servers; Mass Storage System; User & Admin. Interface (API and GUI).)


File and Event Database

(Diagram: database entities and their attributes.)

  • Run

  • Volume

  • Data Tier

  • Events: ID, Event Number, Trigger L1, Trigger L2, Trigger L3, Off-line Filter, Thumbnail

  • Files: ID, Name, Format, Size, # Events

  • Physical Data Stream

  • Trigger Configuration

  • Project

  • Event-File Catalog

  • Processing Info
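
The entities named in the diagram suggest a relational layout in which an event-file catalog links events to the files that contain them. The sketch below is one possible reading, using SQLite from the Python standard library as a stand-in for ORACLE 8. The table and column names, the key structure and the sample rows are all assumptions, not the actual SAM schema.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the ORACLE 8 catalog.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE files (
    id         INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    format     TEXT,
    size_bytes INTEGER,
    n_events   INTEGER
);
CREATE TABLE events (
    id           INTEGER PRIMARY KEY,
    event_number INTEGER NOT NULL,
    trigger_l1   INTEGER,
    trigger_l2   INTEGER,
    trigger_l3   INTEGER
);
-- Assumed link table: which file holds which event.
CREATE TABLE event_file_catalog (
    event_id INTEGER REFERENCES events(id),
    file_id  INTEGER REFERENCES files(id),
    PRIMARY KEY (event_id, file_id)
);
""")

# Hypothetical sample rows.
conn.execute("INSERT INTO files VALUES (1, 'raw_a.dat', 'raw', 50000, 1)")
conn.execute("INSERT INTO events VALUES (1, 42, 1, 1, 0)")
conn.execute("INSERT INTO event_file_catalog VALUES (1, 1)")


def files_for_event(conn, event_number):
    """Return the names of all files that contain the given event number."""
    rows = conn.execute(
        """SELECT f.name FROM files f
           JOIN event_file_catalog c ON c.file_id = f.id
           JOIN events e ON e.id = c.event_id
           WHERE e.event_number = ?
           ORDER BY f.name""",
        (event_number,),
    )
    return [name for (name,) in rows]


print(files_for_event(conn, 42))
```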


(Mass Storage System Needs)

  • Provide access to data through file-level semantics.

  • Manage all tape activity within the ATL(S) and to/from shelf.

  • Allow data to be physically clustered in tape groupings or “file families”.

  • A mechanism for sending priorities with file requests to allow control over allocation of resources for various activities.

  • System must optimize the use of resources such as arm time and tape mounts.

  • Retry and fail-over features for failed tape read/write activities.

  • Open tape format to allow removal of tapes and exchange of data with other sites.

  • Reliable and unattended operation.

See ENSTORE presentation #126: Don Patravic, “ENSTORE - An Alternative Data Storage System”
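
Two of the needs listed above, priorities attached to file requests and economical use of tape mounts, can be illustrated together. The sketch below orders requests by priority and keeps files from the same tape adjacent within a priority level. It is a toy model under stated assumptions: the class, the field names and the "lower number is more urgent" convention are invented here, and this is not how ENSTORE or the SAM Global Optimizer is implemented.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class FileRequest:
    file_name: str
    volume: str    # tape volume holding the file
    priority: int  # lower number = more urgent (assumed convention)


def schedule(requests):
    """Order by priority, keeping same-tape files adjacent within a priority."""
    return sorted(requests, key=lambda r: (r.priority, r.volume, r.file_name))


def count_mounts(ordered):
    """Mounts needed if a tape stays mounted until a different one is requested."""
    return sum(1 for _ in groupby(ordered, key=lambda r: r.volume))


requests = [
    FileRequest("f1", "VOL_A", 1),
    FileRequest("f2", "VOL_B", 1),
    FileRequest("f3", "VOL_A", 1),
    FileRequest("f4", "VOL_B", 0),
]

ordered = schedule(requests)
print([r.file_name for r in ordered])
print(count_mounts(requests), "mounts in arrival order,",
      count_mounts(ordered), "after scheduling")
```

In this example the urgent request is served first, and grouping the remaining requests by volume saves one mount compared with arrival order.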


Access to Data through SAM

  • User or group defines a “project” by sending a list of constraints or file list to the Database Server.

  • DB Server returns a summary of the project (number of files, size and availability).

  • User is provided a list of possible “stations” where the project might run and chooses one.

  • User registers with the station for a given (new or existing) project and is given a unique “key” to use.

  • User’s client “consumer/producer” sends the “key” to the “project master” on the chosen station, and is given the next available file in the “project”.
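
The steps above can be sketched as a small in-process model: a project is defined as a list of files, each consumer registers and receives a key, and each call with that key returns the next undelivered file. This is a simplified illustration only. It folds the Database Server and station selection into a single class, and every class and method name is hypothetical rather than part of the SAM API.

```python
import uuid
from collections import deque


class ProjectMaster:
    """Toy stand-in for a station's project master (not the real SAM interface)."""

    def __init__(self):
        self._queues = {}  # project name -> files still to hand out
        self._keys = {}    # consumer key -> project name

    def define_project(self, name, files):
        """Create a project from a file list and return a short summary."""
        self._queues[name] = deque(files)
        return {"n_files": len(files)}

    def register(self, project):
        """Register a consumer for an existing project and return its unique key."""
        if project not in self._queues:
            raise KeyError(f"unknown project: {project}")
        key = uuid.uuid4().hex
        self._keys[key] = project
        return key

    def next_file(self, key):
        """Hand out the next available file, or None when the project is done."""
        queue = self._queues[self._keys[key]]
        return queue.popleft() if queue else None


pm = ProjectMaster()
summary = pm.define_project("demo", ["file_1", "file_2"])
key_a = pm.register("demo")
key_b = pm.register("demo")

# Two consumers share one project; each file is delivered exactly once.
delivered = [pm.next_file(key_a), pm.next_file(key_b), pm.next_file(key_a)]
print(summary, delivered)
```

Handing out files from a shared queue means several consumers can work through one project in parallel without processing any file twice.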


Consumer- Read from Storage


Producer - Write to Storage


SAM Prototype

  • Status: Being built, ready early October.

  • Goals:

    • Populate and exercise the SAM database.

    • Specify projects - data to be accessed for processing or analysis.

    • Attach to a ‘Station’ which makes files for that Project accessible.

    • Interface to ENSTORE - get/put files - using SAM “Global Optimizer”.

    • Build Analysis programs using D0 framework.

    • Demonstrate multiple Stations, Projects and Analysis consumers.

  • Testing: Further testing in fall with SAM PC test-bed.

  • Beta version: Plan to make MC data available through SAM in late ’98.


SAM Prototype PC test-bed Example configuration

(Diagram components: Enstore warehouse; network hub; SAM station servers; consumers/producers; main backbone; link to the database server.)


Summary

  • D0 plans to use a file-based Sequential Access Model for Run II data access.

  • The design is network distributed, with CORBA communication between modules written in Java, Python and C++. ORACLE 8 is used for the database.

  • A SAM prototype is being built now and will be ready in early October.

  • Hardware to construct a SAM test-bed will be assembled this fall to test and understand the system more fully.

  • We plan to employ the system for MC data by the end of ’98, and to perform large-scale testing with Run II hardware in the first part of next year.

