The PHysics Analysis SERver Project (PHASER)



The PHysics Analysis SERver Project (PHASER). M. Bowen, G. Landsberg, and R. Partridge*, Brown University. CHEP 2000, Padova, Italy, February 7-11, 2000. What is the PHASER project? An effort to substantially increase the productivity of physicists analyzing multi-TB summary data sets.


The PHysics Analysis SERver Project (PHASER)

M. Bowen, G. Landsberg, and R. Partridge*

Brown University

CHEP 2000

Padova, Italy

February 7-11, 2000

What is the PHASER project?
  • Effort to substantially increase productivity of physicists analyzing multi-TB summary data sets
  • Our immediate focus is on the DØ experiment
    • 600 million data events/year starting in early 2001
    • Summary data set expected to grow at rate of 3TB/year
  • Concentrate on event selection and “ntuple” creation stage
    • transition in data handling from monolithic reconstruction processing to the much more chaotic processing of summary data by many physicists
    • IO and CPU intensive due to need to apply latest calibration, particle ID, and event selection algorithms to several hundred million events
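As a rough cross-check of the numbers above, the quoted event rate and growth rate imply an average summary size per event. This is only back-of-the-envelope arithmetic, assuming decimal units (1 TB = 10^12 bytes); the variable names are illustrative.

```python
# Figures quoted on the slide
EVENTS_PER_YEAR = 600e6          # 600 million data events/year
SUMMARY_BYTES_PER_YEAR = 3e12    # 3 TB/year, assuming 1 TB = 1e12 bytes

# Implied average summary (mDST) size per event
bytes_per_event = SUMMARY_BYTES_PER_YEAR / EVENTS_PER_YEAR
print(f"~{bytes_per_event / 1e3:.0f} kB of summary data per event")
```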

Richard Partridge

PHASER Architecture
  • Physics Object Database (POD) stores meta-data used by most physics analyses for their initial event selection
  • Physics Object and Particle ID tables in POD store calibrated 4-vectors, object quality variables, and results of particle ID algorithms
  • DVD storage of full summary (mDST) data set and useful subsets of larger DST and STA data sets

PHASER is PHast
  • New calibrations and particle ID algorithms can be quickly incorporated
    • Only the changes need to be imported
    • Regenerating the large mDST data set will only be done infrequently
  • Storage of up-to-date calibrations and particle ID algorithms avoids the need to re-apply these algorithms for each event selection pass
  • Particle ID tables are small, making it possible to quickly eliminate events not having the desired set of physics objects
  • Direct access to the full mDST sample on DVD allows an mDST subset to be quickly generated for advanced analyses developing new algorithms not yet in the database

The Physics Object Database (POD)
  • Stores fully calibrated meta-data associated with the various physics objects
    • leptons, photons, jets, missing ET, secondary vertices, triggers, etc.
    • for example, an electron object would have the energy, direction, and various quantities used in the electron ID algorithms stored
  • Each physics object associated with a table in a relational database
  • Primary key uniquely identifies each physics object and provides information needed to correlate physics objects from a single event
    • Currently use Run, Event, Instance (where appropriate) and row number from ntuple used to load database
    • Alternative: data source index, sequence number, and instance
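The keying scheme described above can be sketched as follows. This is a minimal illustration using Python's built-in sqlite3 module, not the database used for the prototype; the table name `electron` and the columns `et`, `eta`, and `phi` are hypothetical, and the ntuple row number mentioned above is left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One table per physics object type; (run, event, instance) identifies each row
# and lets objects from the same event be correlated across tables.
conn.execute("""
    CREATE TABLE electron (
        run      INTEGER NOT NULL,
        event    INTEGER NOT NULL,
        instance INTEGER NOT NULL,  -- which electron within the event
        et       REAL,              -- hypothetical column: transverse energy
        eta      REAL,              -- hypothetical column
        phi      REAL,              -- hypothetical column
        PRIMARY KEY (run, event, instance)
    )
""")

# Made-up rows, for illustration only
rows = [
    (1001, 42, 1, 35.2, 0.4, 1.1),
    (1001, 42, 2, 28.7, -0.9, 4.0),
    (1001, 57, 1, 22.1, 1.3, 2.5),
]
conn.executemany("INSERT INTO electron VALUES (?, ?, ?, ?, ?, ?)", rows)


def electrons_in_event(run, event):
    """Return (instance, et) for every electron stored for one event."""
    cur = conn.execute(
        "SELECT instance, et FROM electron "
        "WHERE run = ? AND event = ? ORDER BY instance",
        (run, event),
    )
    return cur.fetchall()


print(electrons_in_event(1001, 42))
```

Because the key is part of each row, a second row with the same (run, event, instance) is rejected by the database, which is what makes independent loading of each table safe.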

Why use a Relational Database?
  • Physics objects typically have a fixed set of attributes used for event selection and analysis
  • Independence of tables aids loading and updating the database
    • Data can be “bulk loaded” as long as primary key is provided in input data stream
  • Several vendors with quite capable products, large commercial market

Prototype POD
  • Use DØ Run 1 data (1992 - 1996 running period)
  • 62 million events loaded into the database
  • Entire “All-Stream” data set loaded
    • Data set used by almost all DØ physics analyses
    • Only files with special processing or trigger conditions excluded
  • Column-wise ntuple format used for importing/exporting data

DØ Run 1 POD
  • Including indexes, Run 1 POD occupies ~100 GB
    • 58% physics object data
    • 18% indexes on object ET
    • 12% primary keys
    • 12% database overhead
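The breakdown above can be turned into approximate sizes with a few lines of arithmetic. This sketch assumes the total is exactly 100 GB (the slide says ~100 GB), uses decimal gigabytes, and takes the 62 million event count from the Prototype POD slide.

```python
TOTAL_GB = 100       # approximate total quoted above
N_EVENTS = 62e6      # events loaded into the prototype POD

fractions = {
    "physics object data": 0.58,
    "indexes on object ET": 0.18,
    "primary keys": 0.12,
    "database overhead": 0.12,
}

# The quoted percentages should account for the whole database
assert abs(sum(fractions.values()) - 1.0) < 1e-9

sizes_gb = {name: TOTAL_GB * frac for name, frac in fractions.items()}
for name, size in sizes_gb.items():
    print(f"{name:22s} ~{size:.0f} GB")

# Average database footprint per event, indexes and overhead included
bytes_per_event = TOTAL_GB * 1e9 / N_EVENTS
print(f"~{bytes_per_event / 1e3:.1f} kB per event")
```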

POD Benchmarks
  • Z  e+e- candidate event selection:
    • 7 seconds to identify ~6k events
  • W  en candidate event selection:
    • 18 seconds to identify ~86k events
  • Both benchmarks times make use of particle ID tables
  • Event selection times compare very favorably with ~1000 CPU hours required to generate ntuples used in this study
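To give a feel for the kind of query behind a benchmark like the Z → e+e- selection, here is a minimal sketch using Python's sqlite3 module. The actual selection criteria are not given in the slides, so everything here is an assumption: the `electron_id` table, its `passes_id` flag, the 25 GeV threshold, and the sample rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical particle ID table: one row per electron candidate
conn.execute("""
    CREATE TABLE electron_id (
        run INTEGER, event INTEGER, instance INTEGER,
        et REAL,            -- transverse energy in GeV (assumed unit)
        passes_id INTEGER,  -- 1 if the candidate passed the electron ID
        PRIMARY KEY (run, event, instance)
    )
""")
conn.executemany("INSERT INTO electron_id VALUES (?, ?, ?, ?, ?)", [
    (1, 10, 1, 40.0, 1),
    (1, 10, 2, 32.0, 1),
    (1, 11, 1, 45.0, 1),
    (1, 12, 1, 38.0, 1),
    (1, 12, 2, 27.0, 0),
    (1, 13, 1, 26.0, 1),
    (1, 13, 2, 21.0, 1),
])


def select_dielectron_events(et_min=25.0):
    """Return (run, event) for events with at least two identified electrons above et_min."""
    cur = conn.execute(
        """
        SELECT run, event
        FROM electron_id
        WHERE passes_id = 1 AND et >= ?
        GROUP BY run, event
        HAVING COUNT(*) >= 2
        ORDER BY run, event
        """,
        (et_min,),
    )
    return cur.fetchall()


print(select_dielectron_events())
```

The point of the sketch is that the selection touches only a small ID table and returns run and event numbers, which is why it can be so much cheaper than reprocessing every event.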

Benchmark Hardware/Software

  • 450 MHz dual-processor Pentium II with 256 MB RAM
  • Database stored on six 36 GB disks in a RAID 0 stripe set
  • MS SQL Server running on Windows NT 4.0

DVD Storage
  • Provide access to additional event information not included in POD
  • DVD-RAM has a number of unique capabilities
    • Less expensive than disk storage, doesn’t require backup
    • Access to individual events is much faster than tape storage
  • Current disk capacity is 2.6 GB, 4.7 GB expected soon
  • Commercial DVD libraries hold up to 600 DVD disks
    • 2.8 TB capacity using 4.7 GB DVD-RAM disks
    • Average disk load time of 4.5 s, <1 hour to cycle through 600 disks
    • Up to 6 DVD-RAM drives gives ~10 MB/s IO rate
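The library figures above follow from simple arithmetic, reproduced here as a quick check. It assumes decimal units (1 TB = 1000 GB) and that cycling through the library means loading each disk once at the average load time, without reading its contents.

```python
N_DISKS = 600          # disks per commercial DVD library
DISK_GB = 4.7          # expected DVD-RAM capacity per disk
LOAD_SECONDS = 4.5     # average disk load time

capacity_tb = N_DISKS * DISK_GB / 1000
cycle_minutes = N_DISKS * LOAD_SECONDS / 60

print(f"Library capacity: ~{capacity_tb:.2f} TB")
print(f"Time to load every disk once: {cycle_minutes:.0f} minutes")
```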

Web Interface
  • Plan to develop web-based user interface
  • Interface modelled on “3-tier” architecture widely used in commercial applications
  • Physicist will enter event selection requirements using a Java applet
  • Applet communicates request to “Physics Intelligence” middleware running on PHASER system (via CORBA)
    • Translate request to SQL for event selection
    • Verify that request can be accommodated within resource constraints
    • Produce the requested output files
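The "translate request to SQL" step can be illustrated with a small sketch. The slides plan a Java applet talking to the middleware via CORBA; the Python below only shows the translation idea, and the request format, the table names in `ALLOWED_TABLES`, and the function `build_selection_sql` are all hypothetical.

```python
ALLOWED_TABLES = {"electron", "muon", "jet"}   # hypothetical physics object tables
ALLOWED_OPS = {">", ">=", "<", "<=", "="}


def build_selection_sql(request):
    """Turn a selection request into a parameterized SQL query.

    Example request (hypothetical format):
        {"object": "electron", "min_count": 2, "cuts": [("et", ">=", 25.0)]}
    Returns (sql, params); cut values are passed as parameters, not spliced in.
    """
    table = request["object"]
    if table not in ALLOWED_TABLES:
        raise ValueError(f"unknown physics object: {table!r}")

    clauses, params = [], []
    for column, op, value in request.get("cuts", []):
        # Only accept plain identifiers and known operators in the SQL text
        if op not in ALLOWED_OPS or not column.isidentifier():
            raise ValueError(f"invalid cut: {(column, op, value)!r}")
        clauses.append(f"{column} {op} ?")
        params.append(value)

    where = " AND ".join(clauses) if clauses else "1 = 1"
    sql = (
        f"SELECT run, event FROM {table} WHERE {where} "
        "GROUP BY run, event HAVING COUNT(*) >= ?"
    )
    params.append(request.get("min_count", 1))
    return sql, params


sql, params = build_selection_sql(
    {"object": "electron", "min_count": 2, "cuts": [("et", ">=", 25.0)]}
)
print(sql)
print(params)
```

Checking table and column names against a whitelist before building the query is one simple way for middleware to refuse requests it cannot serve; resource checks such as estimated output size would sit alongside it.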

PHASER Output
  • Several output options:
    • List of run and event numbers satisfying the request
    • Ntuple created from POD information
    • mDST stream containing requested events from DVD library
  • Output files will generally be small enough to transfer over the network
  • Larger output files can be written to DVD and physically sent to physicist for further analysis

Conclusions
  • PHASER offers a way for experts, novices, and “dinosaurs” alike to quickly extract information about a particular class of events
  • Feasibility of loading “Run 1” size physics object info into a relational database has been demonstrated
  • Significant improvements in event selection time have been observed for the W/Z benchmarks
  • These results are expected to scale up to the Run 2 data load
  • Database technology is also potentially useful for helping manage complex analyses and storing intermediate results
