The PHysics Analysis SERver Project (PHASER)

M. Bowen, G. Landsberg, and R. Partridge*

Brown University

CHEP 2000

Padova, Italy

February 7-11, 2000

What is the PHASER project?

  • Effort to substantially increase productivity of physicists analyzing multi-TB summary data sets

  • Our immediate focus is on the DØ experiment

    • 600 million data events/year starting in early 2001

    • Summary data set expected to grow at a rate of 3 TB/year

  • Concentrate on event selection and “ntuple” creation stage

    • Transition in data handling from monolithic reconstruction processing to the much more chaotic processing of summary data by many physicists

    • IO and CPU intensive due to need to apply latest calibration, particle ID, and event selection algorithms to several hundred million events
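As a rough cross-check of the figures above, the two Run 2 rates imply an average summary-data size per event. The sketch below makes three assumptions: decimal units (1 TB = 10^12 bytes), that both rates refer to the same year, and an illustrative sequential read rate of 100 MB/s. The read rate does not come from the slides.

```python
# Back-of-envelope check of the DØ Run 2 figures quoted on this slide.
# Assumptions: decimal units (1 TB = 1e12 bytes) and both rates refer to
# the same year of data taking.
EVENTS_PER_YEAR = 600e6          # data events recorded per year
SUMMARY_BYTES_PER_YEAR = 3e12    # growth of the summary data set (3 TB/year)

bytes_per_event = SUMMARY_BYTES_PER_YEAR / EVENTS_PER_YEAR
print(f"average summary size: {bytes_per_event / 1e3:.1f} kB/event")

# Illustrative assumption, not from the slides: one sequential pass over a
# year of summary data at 100 MB/s, before any calibration or ID work.
ASSUMED_READ_RATE = 100e6        # bytes/s
hours_per_pass = SUMMARY_BYTES_PER_YEAR / ASSUMED_READ_RATE / 3600
print(f"one read pass over a year of data: ~{hours_per_pass:.1f} h")
```

Every physicist who repeats such a pass pays this IO cost again, which is the motivation for the selection stage the slide describes.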

Richard Partridge

PHASER Architecture

  • Physics Object Database (POD) stores meta-data used by most physics analyses for their initial event selection

  • Physics Object and Particle ID tables in POD store calibrated 4-vectors, object quality variables, and results of particle ID algorithms

  • DVD storage of full summary (mDST) data set and useful subsets of larger DST and STA data sets


PHASER is PHast

  • New calibrations and particle ID algorithms can be quickly incorporated

    • Only the changes need to be imported

    • Regenerating the large mDST data set will only be done infrequently

  • Storage of up-to-date calibrations and particle ID results avoids the need to re-apply these algorithms for each event selection pass

  • Particle ID tables are small, making it possible to quickly eliminate events not having the desired set of physics objects

  • Direct access to full mDST sample on DVD allows a mDST subset to be quickly generated for advanced analyses developing new algorithms not yet in the database
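The incremental-update idea in the first bullet can be sketched as follows. This is a minimal illustration, not PHASER's loader. Python's built-in sqlite3 module stands in for the production database, and the `electron_id` table, its columns, and the version labels are hypothetical.

```python
import sqlite3

# Hypothetical particle ID table; sqlite3 stands in for the production RDBMS.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE electron_id (
        run INTEGER, event INTEGER, instance INTEGER,
        passes_id INTEGER,   -- 1 if the electron passes the ID cuts
        algo_version TEXT,   -- version of the (hypothetical) ID algorithm
        PRIMARY KEY (run, event, instance)
    )""")

# Initial load with version "v1" of the algorithm (toy rows).
conn.executemany(
    "INSERT INTO electron_id VALUES (?, ?, ?, ?, ?)",
    [(1001, 1, 0, 1, "v1"), (1001, 1, 1, 0, "v1"), (1001, 2, 0, 1, "v1")],
)

def import_changes(conn, changed_rows):
    """Overwrite only the rows whose ID result changed under a new algorithm.

    INSERT OR REPLACE matches on the composite primary key, so untouched
    rows (and the much larger mDST data set) are never rewritten.
    """
    with conn:  # commit the batch as one transaction
        conn.executemany(
            "INSERT OR REPLACE INTO electron_id VALUES (?, ?, ?, ?, ?)",
            changed_rows,
        )

# Version "v2" flips the decision for a single electron.
import_changes(conn, [(1001, 1, 1, 1, "v2")])
print(conn.execute(
    "SELECT run, event, instance, passes_id, algo_version "
    "FROM electron_id ORDER BY run, event, instance").fetchall())
```

Only one row is written by the update. The other ID results and the underlying event data stay as they were.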

The Physics Object Database (POD)

  • Stores fully calibrated meta-data associated with the various physics objects

    • leptons, photons, jets, missing ET, secondary vertices, triggers, etc.

    • for example, an electron object would have the energy, direction, and various quantities used in the electron ID algorithms stored

  • Each physics object associated with a table in a relational database

  • Primary key uniquely identifies each physics object and provides information needed to correlate physics objects from a single event

    • Currently uses Run, Event, Instance (where appropriate), and the row number from the ntuple used to load the database

    • Alternative: data source index, sequence number, and instance
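The following minimal sketch shows how a (Run, Event, Instance) key lets objects stored in separate tables be correlated within one event. The `ObjectKey` type, the table contents, and the ET values are toy assumptions for illustration.

```python
from collections import defaultdict
from typing import NamedTuple

class ObjectKey(NamedTuple):
    """Hypothetical composite key for one physics object."""
    run: int
    event: int
    instance: int  # distinguishes several objects of one type in an event

# Toy rows from two independent object tables: key -> ET in GeV.
electrons = {ObjectKey(1001, 7, 0): 42.1, ObjectKey(1001, 7, 1): 38.5,
             ObjectKey(1001, 9, 0): 25.0}
jets = {ObjectKey(1001, 7, 0): 55.3, ObjectKey(1001, 8, 0): 31.2}

def by_event(table):
    """Group object rows by the (run, event) part of their key."""
    grouped = defaultdict(list)
    for key, et in table.items():
        grouped[(key.run, key.event)].append(et)
    return grouped

ele_by_event = by_event(electrons)
jet_by_event = by_event(jets)

# Events with at least one electron and one jet: intersect the event keys.
common = sorted(ele_by_event.keys() & jet_by_event.keys())
print(common)
```

Because the event part of the key is shared across tables, no table needs to know about any other. That independence is what the next slide relies on for loading and updating.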

Why use a Relational Database?

  • Physics objects typically have a fixed set of attributes used for event selection and analysis

  • Independence of tables aids loading, updating database

    • Data can be “bulk loaded” as long as primary key is provided in input data stream

  • Several vendors with quite capable products, large commercial market
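Bulk loading works because each input record already carries its primary key, so rows can be appended without lookups or coordination between tables. The sketch below uses sqlite3 as a stand-in, and the `electron` table and CSV layout are hypothetical. On MS SQL Server the same job would normally use the `bcp` utility or a `BULK INSERT` statement.

```python
import csv
import io
import sqlite3

# Hypothetical input stream: every record already carries its primary key
# (run, event, instance), so rows can be inserted without any lookups.
stream = io.StringIO(
    "run,event,instance,et,eta,phi\n"
    "1001,7,0,42.1,0.31,1.20\n"
    "1001,7,1,38.5,-0.88,4.35\n"
    "1001,9,0,25.0,1.02,2.71\n"
)

conn = sqlite3.connect(":memory:")  # stand-in for the production RDBMS
conn.execute("""
    CREATE TABLE electron (
        run INTEGER, event INTEGER, instance INTEGER,
        et REAL, eta REAL, phi REAL,
        PRIMARY KEY (run, event, instance)
    )""")

# Convert CSV strings to typed tuples lazily, one record at a time.
reader = csv.DictReader(stream)
rows = ((int(r["run"]), int(r["event"]), int(r["instance"]),
         float(r["et"]), float(r["eta"]), float(r["phi"])) for r in reader)

with conn:  # load the whole batch in a single transaction
    conn.executemany("INSERT INTO electron VALUES (?, ?, ?, ?, ?, ?)", rows)

print(conn.execute("SELECT COUNT(*) FROM electron").fetchone()[0])
```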

Prototype POD

  • Use DØ Run 1 data (1992 - 1996 running period)

  • 62 million events loaded into the database

  • Entire “All-Stream” data set loaded

    • Data set used by almost all DØ physics analyses

    • Only files with special processing or trigger conditions excluded

  • Column-wise ntuple format used for importing/exporting data

DØ Run 1 POD

  • Including indexes, Run 1 POD occupies ~100 GB

    • 58% physics object data

    • 18% indexes on object ET

    • 12% primary keys

    • 12% database overhead
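The percentages above can be converted into approximate sizes. Dividing the total by the 62 million events loaded gives the cost per event, assuming decimal gigabytes and treating "~100 GB" as exactly 100 GB:

```python
TOTAL_GB = 100   # "~100 GB", so every result below is approximate
EVENTS = 62e6    # events loaded into the prototype POD
shares = {"physics object data": 0.58, "indexes on object ET": 0.18,
          "primary keys": 0.12, "database overhead": 0.12}

# The four categories should account for the whole database.
assert abs(sum(shares.values()) - 1.0) < 1e-9
for name, frac in shares.items():
    print(f"{name:22s} ~{frac * TOTAL_GB:.0f} GB")

bytes_per_event = TOTAL_GB * 1e9 / EVENTS
print(f"~{bytes_per_event:.0f} bytes per event, including indexes")
```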

POD Benchmarks

  • Z → e+e− candidate event selection:

    • 7 seconds to identify ~6k events

  • W → eν candidate event selection:

    • 18 seconds to identify ~86k events

  • Both benchmark times make use of the particle ID tables

    • Event selection times compare very favorably with the ~1000 CPU hours required to generate the ntuples used in this study

    Benchmark Hardware/Software

  • 450 MHz dual-processor Pentium II with 256 MB RAM

  • Database stored on six 36 GB disks in a RAID 0 stripe set

  • MS SQL Server running on Windows NT 4.0
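The two benchmark selections can be expressed as SQL over small particle ID tables. A Z → e+e− candidate requires two identified electrons, and a W → eν candidate requires one identified electron plus large missing ET. The sketch below is a simplified, hypothetical version: the schema is invented, it runs on sqlite3 rather than MS SQL Server, and the 25 GeV thresholds are illustrative because the actual DØ selection cuts are not given in the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for MS SQL Server
conn.executescript("""
    -- Hypothetical, simplified POD tables.
    CREATE TABLE electron (run INT, event INT, instance INT, et REAL,
                           passes_id INT, PRIMARY KEY (run, event, instance));
    CREATE TABLE missing_et (run INT, event INT, met REAL,
                             PRIMARY KEY (run, event));
    CREATE INDEX electron_et ON electron (et);  -- cf. "indexes on object ET"
""")
conn.executemany("INSERT INTO electron VALUES (?, ?, ?, ?, ?)", [
    (1, 1, 0, 45.0, 1), (1, 1, 1, 41.0, 1),   # two identified electrons
    (1, 2, 0, 38.0, 1),                       # one identified electron
    (1, 3, 0, 30.0, 0), (1, 3, 1, 28.0, 1),   # only one passes the ID
])
conn.executemany("INSERT INTO missing_et VALUES (?, ?, ?)",
                 [(1, 1, 5.0), (1, 2, 35.0), (1, 3, 12.0)])

ET_CUT, MET_CUT = 25.0, 25.0  # illustrative thresholds in GeV, not DØ's cuts

# Z->ee: events with at least two identified electrons above threshold.
z_ee = conn.execute("""
    SELECT run, event FROM electron
    WHERE passes_id = 1 AND et > ?
    GROUP BY run, event HAVING COUNT(*) >= 2
    ORDER BY run, event""", (ET_CUT,)).fetchall()

# W->enu: an identified electron plus large missing ET in the same event.
w_enu = conn.execute("""
    SELECT DISTINCT e.run, e.event FROM electron e
    JOIN missing_et m ON m.run = e.run AND m.event = e.event
    WHERE e.passes_id = 1 AND e.et > ? AND m.met > ?
    ORDER BY e.run, e.event""", (ET_CUT, MET_CUT)).fetchall()

print("Z->ee candidates:", z_ee)
print("W->enu candidates:", w_enu)
```

Neither query touches the full event record. Both read only the narrow ID and missing-ET tables, which is why such selections can run in seconds rather than requiring a pass over the summary data.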

DVD Storage

  • Provide access to additional event information not included in POD

  • DVD-RAM has a number of unique capabilities

    • Less expensive than disk storage, doesn’t require backup

    • Access to individual events is much faster than tape storage

  • Current DVD-RAM disk capacity is 2.6 GB; 4.7 GB disks are expected soon

  • Commercial DVD libraries hold up to 600 DVD disks

    • 2.8 TB capacity using 4.7 GB DVD-RAM disks

    • Average disk load time of 4.5 s, <1 hour to cycle through 600 disks

    • Up to 6 DVD-RAM drives give a combined IO rate of ~10 MB/s
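A quick calculation shows that the library figures are mutually consistent. Decimal units are assumed, and the full-read time ignores disk load overhead:

```python
DISKS = 600            # disks in a commercial DVD library
DISK_GB = 4.7          # DVD-RAM capacity expected soon
LOAD_TIME_S = 4.5      # average time to load one disk
DRIVES = 6
AGGREGATE_MB_S = 10.0  # "~10 MB/s" with six drives

capacity_tb = DISKS * DISK_GB / 1000          # total library capacity
cycle_min = DISKS * LOAD_TIME_S / 60          # time to load every disk once
per_drive_mb_s = AGGREGATE_MB_S / DRIVES      # implied rate of one drive
# Streaming the entire library once at the aggregate rate (no load time):
full_read_h = DISKS * DISK_GB * 1000 / AGGREGATE_MB_S / 3600

print(f"capacity       ~{capacity_tb:.2f} TB")
print(f"disk cycling   ~{cycle_min:.0f} min")
print(f"per-drive rate ~{per_drive_mb_s:.1f} MB/s")
print(f"full read      ~{full_read_h:.0f} h")
```

The results reproduce the quoted 2.8 TB and the "<1 hour" cycle time (45 minutes). Reading the whole library would take about three days, so the DVD store suits extracting selected events rather than repeated full passes.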

Web Interface

  • Plan to develop web-based user interface

  • Interface modelled on “3-tier” architecture widely used in commercial applications

  • Physicist will enter event selection requirements using a Java applet

  • Applet communicates request to “Physics Intelligence” middleware running on PHASER system (via CORBA)

    • Translate request to SQL for event selection

    • Verify that request can be accommodated within resource constraints

    • Produce the requested output files
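The middleware's "translate request to SQL" and resource-check steps might look something like the hypothetical sketch below. Several parts are assumptions: the `ObjectRequirement` fields, the table whitelist, and the `MAX_EVENTS` limit are invented, the CORBA transport is omitted, and sqlite3 stands in for the production database. Table names are checked against the whitelist and cut values are passed as bound parameters, so user input never becomes SQL text.

```python
import sqlite3
from dataclasses import dataclass

# Hypothetical whitelist of POD object tables a request may reference.
ALLOWED_TABLES = {"electron", "muon", "jet", "photon"}
MAX_EVENTS = 1_000_000  # hypothetical limit on the size of a request's output

@dataclass
class ObjectRequirement:
    table: str        # physics object type, e.g. "electron"
    min_count: int    # required multiplicity
    min_et: float     # ET threshold in GeV

def to_sql(reqs):
    """Translate requirements into one parameterized SQL query.

    Each requirement becomes a subquery returning the (run, event) pairs
    with at least `min_count` objects above `min_et`; INTERSECT keeps the
    events that satisfy all of them.
    """
    parts, params = [], []
    for r in reqs:
        if r.table not in ALLOWED_TABLES:
            raise ValueError(f"unknown object table: {r.table}")
        parts.append(f"SELECT run, event FROM {r.table} WHERE et > ? "
                     "GROUP BY run, event HAVING COUNT(*) >= ?")
        params += [r.min_et, r.min_count]
    return "\nINTERSECT\n".join(parts), params

def check_resources(n_selected):
    """Reject requests whose output would exceed the (hypothetical) limit."""
    if n_selected > MAX_EVENTS:
        raise RuntimeError(f"{n_selected} events exceeds limit of {MAX_EVENTS}")

# Usage on toy tables: two electrons above 20 GeV and one jet above 25 GeV.
conn = sqlite3.connect(":memory:")
for t in ("electron", "jet"):
    conn.execute(f"CREATE TABLE {t} (run INT, event INT, instance INT, et REAL)")
conn.executemany("INSERT INTO electron VALUES (?, ?, ?, ?)",
                 [(1, 1, 0, 40.0), (1, 1, 1, 30.0), (1, 2, 0, 50.0)])
conn.executemany("INSERT INTO jet VALUES (?, ?, ?, ?)",
                 [(1, 1, 0, 60.0), (1, 2, 0, 15.0)])

sql, params = to_sql([ObjectRequirement("electron", 2, 20.0),
                      ObjectRequirement("jet", 1, 25.0)])
events = conn.execute(sql, params).fetchall()
check_resources(len(events))
print(events)
```

INTERSECT keeps each requirement an independent subquery, mirroring the independent object tables. A production system might use joins or EXISTS clauses instead, depending on what its query optimizer handles best.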


PHASER Output

  • Several output options:

    • List of run and event numbers satisfying the request

    • Ntuple created from POD information

    • mDST stream containing requested events from DVD library

  • Output files will generally be small enough to transfer over the network

  • Larger output files can be written to DVD and physically sent to physicist for further analysis



  • PHASER offers a way for experts, novices, and “dinosaurs” alike to quickly extract information about a particular class of events

  • Feasibility of loading “Run 1” size physics object info into a relational database has been demonstrated

  • Significant improvements in event selection time have been observed for the W/Z benchmarks

  • We expect these results to scale up to the Run 2 data load

  • Database technology is also potentially useful for helping manage complex analyses and storing intermediate results
