The Sequential Access Model for Run II Data Management and Delivery. Lee Lueking, Frank Nagy, Heidi Schellman, Igor Terekhov, Julie Trumbo, Matt Vranicar, Rich Wellner, Vicky White. URL: www-d0.fnal.gov/~lueking/sam/sequential.html. CHEP98 Sept. 3, 1998.





What is The Sequential Access Model: SAM?

  • Sequential events: Data is stored in files as sequential events.

  • Data Tiers: Each event is stored in each of several data tiers.

    • The Event Data Unit (EDU) is the unit of data stored in each tier.

    • Physical event size: EDU5 = 5 kB/event, EDU50 = 50 kB/event, etc.

  • Physical streaming (clustering): Data categories based on trigger or reconstruction information.

  • Database catalog: File, Event and Processing Database; Information about the data - event-level, file-level, run-level. Also processing information; static and dynamic.
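The EDU naming convention above lends itself to a quick storage estimate per tier. The following is a minimal sketch, not part of SAM: the event count is a made-up number chosen for illustration, and decimal units (1 kB = 1000 bytes) are assumed. Only the per-event sizes come from the EDU5 and EDU50 examples.

```python
# Illustrative arithmetic only. The per-event sizes mirror the EDU5/EDU50
# examples; the event count used below is hypothetical, not a D0 figure.
KB = 1000  # bytes per kB (decimal convention, an assumption)

EDU_SIZE_KB = {"EDU5": 5, "EDU50": 50}


def tier_volume_bytes(n_events: int, edu_kb: int) -> int:
    """Total bytes needed to store n_events at edu_kb kilobytes per event."""
    return n_events * edu_kb * KB


def format_bytes(n: float) -> str:
    """Render a byte count with a decimal unit suffix."""
    for unit in ("B", "kB", "MB", "GB", "TB", "PB"):
        if n < 1000 or unit == "PB":
            return f"{n:.1f} {unit}"
        n /= 1000


if __name__ == "__main__":
    n_events = 1_000_000  # hypothetical sample size
    for tier, size_kb in EDU_SIZE_KB.items():
        print(tier, format_bytes(tier_volume_bytes(n_events, size_kb)))
```

Because each event is stored once in every tier, the totals for the individual tiers add up to the overall storage requirement.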



Data Organization

[Diagram labels: User and physics group (derived) data; File & Event Database; Event Information; Tiers; Warm Cache; Physical Clustering]



How Do I Access Data?

  • Pipelines: Data access channels tailored for particular processing and analysis patterns.

  • Pipeline segments: Tapes, drives + Automated Tape Library + Storage Management System, network, group-shared and/or user-private analysis disk.

  • Example access modes:

    • Database: Access to event, trigger & other FEDB info.

    • Thumbnail: Disk resident sketch of each event.

    • Freight Train: Large data stream file server.

    • Event Picking: Random event selection from any data tier.

    • Small Data-set: One or a few files from any data tier.



Data Access

[Diagram column headings: Mass Storage; Pipeline; Consumers. Pipelines shown: File&EventDB; Thumbnail; Freight Train; Pick Event; User File. Legend symbols: group of users; single user; data flow; file; event; disk storage; tape storage; named pipeline]



D0 Specifications

  • Data sizes

  • Further details

    • 10-15 exclusive streams preferred. Based on L3 and/or Reconstruction information.

    • 10% warm (tape or disk) caches of Raw and Medium EDU data.

    • Possible on-demand reconstruction.
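Exclusive streaming means that every event is written to exactly one stream. One common way to get that property is an ordered rule list where the first match wins; the sketch below shows that approach. The stream names and trigger names are invented for illustration and are not the D0 stream definitions.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical stream rules: (stream name, L3 trigger names feeding it).
# The list order is the priority order, so an event firing triggers from
# several rules still lands in exactly one stream.
STREAM_RULES = [
    ("dimuon", {"L3_MU_PAIR"}),
    ("electron", {"L3_EM_HIGH", "L3_EM_LOW"}),
    ("jets", {"L3_JET"}),
]
DEFAULT_STREAM = "other"  # catch-all so that no event is dropped


def assign_stream(fired_triggers: Iterable[str]) -> str:
    """Return the single stream for an event, using first-match-wins."""
    fired = set(fired_triggers)
    for name, triggers in STREAM_RULES:
        if fired & triggers:
            return name
    return DEFAULT_STREAM


def split_into_streams(
    events: Iterable[Tuple[int, List[str]]]
) -> Dict[str, List[int]]:
    """Partition (event_id, fired_triggers) pairs into exclusive streams."""
    streams: Dict[str, List[int]] = {}
    for event_id, fired in events:
        streams.setdefault(assign_stream(fired), []).append(event_id)
    return streams
```

With this rule list, an event that fires both `L3_JET` and `L3_EM_LOW` goes to `electron` only, because that rule appears earlier in the list.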



Will SAM Scale to Run II?



Exclusive Streaming

See Talk #182: Heidi Schellman, “Assurance of Data Integrity in a Petabyte Data Sample”



Data Handling System

Buffer and Cache



SAM Design Details

  • Network distributed.

  • Easily scalable.

  • Works for all access modes.

  • Uses CORBA interfaces between modules.

  • Modules are being written in Java, Python and C++.

  • File, Event and Processing Database uses ORACLE 8.

  • Not tightly coupled to:

    • Tape Mass Storage System.

    • CPU availability or Batch processing facilities on Farm or Analysis machines.

    • The D0 event data model.



Main Components

  • File and Event Database: Info about data location and processing details. (see poster session #127: Vicky White, “Use of ORACLE in Run II for D0” )

  • Global Optimizer: Optimizes tape access and regulates bandwidth to various stations and activities.

  • Station: Management for a set of processing resources, including buffer and Data I/O.

  • Project Master: Responsible for managing projects which are lists of files to process.

  • Consumer/producer: Actual data processing.

  • GUI and API user interfaces: Allow users to access data and administrators to control the system.



Components of SAM

[Diagram labels: User & Admin. Interface (API and GUI); Project Master; DB and Information Servers; Mass Storage System; Global Optimizer; Station A; Station B; Station C; Station D; Station E; Station F; Consumer/Producer (several instances)]



File and Event Database

[Diagram labels: Run; Volume; Data Tier; Events (ID, Event Number, Trigger L1, Trigger L2, Trigger L3, Off-line Filter, Thumbnail); Files (ID, Name, Format, Size, # Events); Physical Data Stream; Trigger Configuration; Project; Event-File Catalog; Processing Info]
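A relational catalog with file-level and event-level records, joined by an event-file catalog, could be laid out roughly as follows. This is a sketch under stated assumptions, not the SAM schema: it uses Python's built-in `sqlite3` as a stand-in for ORACLE 8, and every table and column name is a guess based on the diagram labels.

```python
import sqlite3

# Assumed layout: one row per file, one row per event, and a link table
# (the "event-file catalog") recording which files contain which events.
SCHEMA = """
CREATE TABLE files (
    id         INTEGER PRIMARY KEY,
    name       TEXT UNIQUE NOT NULL,
    format     TEXT,
    size_bytes INTEGER,
    n_events   INTEGER,
    data_tier  TEXT,
    stream     TEXT
);
CREATE TABLE events (
    id           INTEGER PRIMARY KEY,
    run          INTEGER NOT NULL,
    event_number INTEGER NOT NULL,
    trigger_l1   INTEGER,
    trigger_l2   INTEGER,
    trigger_l3   INTEGER,
    UNIQUE (run, event_number)
);
CREATE TABLE event_file (
    event_id INTEGER NOT NULL REFERENCES events(id),
    file_id  INTEGER NOT NULL REFERENCES files(id),
    PRIMARY KEY (event_id, file_id)
);
"""


def open_catalog() -> sqlite3.Connection:
    """Create an empty in-memory catalog with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def files_for_event(conn, run: int, event_number: int, data_tier: str):
    """Names of the files in one data tier that contain the given event."""
    rows = conn.execute(
        """
        SELECT f.name
        FROM files f
        JOIN event_file ef ON ef.file_id = f.id
        JOIN events e ON e.id = ef.event_id
        WHERE e.run = ? AND e.event_number = ? AND f.data_tier = ?
        ORDER BY f.name
        """,
        (run, event_number, data_tier),
    ).fetchall()
    return [row[0] for row in rows]
```

A lookup like `files_for_event` is the kind of query an event-picking access mode would need: given a run and event number, find the file to fetch in the requested tier.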



(Mass Storage System Needs)

  • Provide access to data through file-level semantics.

  • Manage all tape activity within the ATL(S) and to/from shelf.

  • Allow data to be physically clustered in tape groupings or “file families”.

  • A mechanism for sending priorities with file requests to allow control over allocation of resources for various activities.

  • System must optimize the use of resources such as arm time and tape mounts.

  • Retry and fail-over features for failed tape read/write activities.

  • Open tape format to allow removal of tapes and exchange of data with other sites.

  • Reliable and unattended operation.

See ENSTORE presentation #126: Don Petravick, “ENSTORE - An Alternative Data Storage System”
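Two of the requirements above, priorities attached to file requests and economical use of tape mounts, pull in different directions. One simple way to reconcile them is to group pending requests by tape volume and mount the volume holding the most urgent request first. The sketch below illustrates that idea only; it is not the ENSTORE or SAM Global Optimizer algorithm, and the `FileRequest` fields and the "larger number is more urgent" convention are assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class FileRequest:
    file_name: str
    volume: str    # tape label holding the file (hypothetical field)
    priority: int  # larger means more urgent (assumed convention)


def plan_mounts(requests: Iterable[FileRequest]) -> List[Tuple[str, List[str]]]:
    """Return [(volume, [file names])] so that each tape is mounted once.

    Volumes are ordered by their most urgent request; within a volume,
    files are read in descending priority order.
    """
    by_volume = defaultdict(list)
    for req in requests:
        by_volume[req.volume].append(req)

    def volume_key(item):
        volume, reqs = item
        return (-max(r.priority for r in reqs), volume)

    plan = []
    for volume, reqs in sorted(by_volume.items(), key=volume_key):
        ordered = sorted(reqs, key=lambda r: (-r.priority, r.file_name))
        plan.append((volume, [r.file_name for r in ordered]))
    return plan
```

Three requests spread over two tapes result in two mounts rather than three, at the cost of serving a low-priority file early when it shares a tape with an urgent one.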



Access to Data through SAM

  • User or group defines a “project” by sending a list of constraints or file list to the Database Server.

  • DB Server returns a summary of the project (number of files, size and availability).

  • User is provided a list of possible “stations” where the project might run, and chooses one.

  • User registers with the station for a given (new or existing) project and is given a unique “key” to use.

  • User’s client “consumer/producer” sends the “key” to the “project master” on the chosen station, and is given the next available file in the “project”.
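The last two steps of this sequence, registering with a station and then exchanging a key for the next file, can be sketched as a small in-process model. The sketch assumes plain Python objects in place of the CORBA interfaces, and the class and method names (`Station.register`, `ProjectMaster.next_file`) are hypothetical rather than the SAM API.

```python
import uuid
from collections import deque
from typing import Dict, Iterable, Optional, Tuple


class ProjectMaster:
    """Hands out each file of a project once across all registered consumers."""

    def __init__(self, files: Iterable[str]):
        self._pending = deque(files)
        self._keys = set()

    def register(self) -> str:
        """Issue a unique key for one consumer/producer client."""
        key = uuid.uuid4().hex
        self._keys.add(key)
        return key

    def next_file(self, key: str) -> Optional[str]:
        """Return the next unprocessed file, or None when the project is done."""
        if key not in self._keys:
            raise PermissionError("unknown consumer key")
        return self._pending.popleft() if self._pending else None


class Station:
    """Holds one project master per project name."""

    def __init__(self, name: str):
        self.name = name
        self._projects: Dict[str, ProjectMaster] = {}

    def register(
        self, project: str, files: Iterable[str] = ()
    ) -> Tuple[ProjectMaster, str]:
        """Join an existing project or start a new one; return (master, key)."""
        master = self._projects.get(project)
        if master is None:
            master = self._projects[project] = ProjectMaster(files)
        return master, master.register()
```

Two consumers that register for the same project share one file queue, so a file delivered to one of them is not delivered to the other.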



Consumer - Read from Storage



Producer - Write to Storage



SAM Prototype

  • Status: Being built, ready early October.

  • Goals:

    • Populate and exercise the SAM database.

    • Specify projects - data to be accessed for processing or analysis.

    • Attach to a ‘Station’ which makes files for that Project accessible.

    • Interface to ENSTORE - get/put files - using SAM “Global Optimizer”.

    • Build Analysis programs using D0 framework.

    • Demonstrate multiple Stations, Projects, Analysis consumers.

  • Testing: Further testing in fall with SAM PC test-bed.

  • Beta version: Plan to make MC data available through SAM in late ’98.



SAM Prototype PC test-bed Example configuration

[Diagram labels: Enstore Warehouse; Network HUB; SAM Station Servers; Consumers/Producers; Main Backbone; To Database Server]



Summary

  • D0 plans to use a file-based Sequential Access Model for Run II data access.

  • The design is network distributed, with CORBA communication between modules written in Java, Python and C++. ORACLE 8 is used for the DB.

  • A SAM prototype is being built now and will be ready in early October.

  • Hardware to construct a SAM test-bed will be assembled this fall to more fully test and understand the system.

  • We plan to employ the system for MC data by the end of ’98, and to perform large-scale testing with Run II hardware in the first part of next year.

