DIAL

DIAL: Interactive analysis. PPDG meeting. David Adams, BNL. December 19, 2002.


  1. DIAL: Interactive analysis (PPDG meeting). David Adams, BNL. December 19, 2002

  2. Contents
     • Goals of DIAL
     • What is DIAL?
     • DIAL interactions
     • Dataset properties
     • Application properties
     • DIAL status
     • Future

  3. Goals of DIAL
     • Demonstrate the feasibility of interactive analysis of large datasets
       • Large means too big for interactive analysis on a single CPU
     • Set requirements for GRID middleware
     • Provide ATLAS with a tool to analyze DC1 and DC2 event data
       • More than just ntuples
       • Large samples
       • Distributed data and processing

  4. What is DIAL?
     • Distributed
       • Data and processing
     • Interactive
       • Prompt response (seconds rather than hours)
     • Analysis of
       • Fill histograms, select events, …
     • Large datasets
       • Any event data (not just ntuples or tag)

  5. What is DIAL? (cont)
     • DIAL provides a connection between
       • Interactive analysis framework
         • E.g. ROOT
       • Data processing application
         • Athena for ATLAS
     • User supplies task
       • Defines result
         • E.g. histogram
       • C++ code snippet to fill result

  6. What is DIAL? (cont)
     • Scheduler
       • Accepts dataset, task and application from user
       • Splits dataset along file boundaries
       • Creates and submits a job for each sub-dataset
       • Concatenates results from jobs
       • Makes combined result available to the user
       • Provides status reports
         • Fraction of events processed
         • Estimated time to completion
         • Partial results
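The scheduler flow on this slide can be sketched in a few lines. This is a minimal Python illustration under stated assumptions, not DIAL's actual interface: the names `split_by_file`, `run_job`, `schedule` and `fill_bin`, the representation of a dataset as a mapping from file name to event values, and the use of a `Counter` as a stand-in histogram are all invented for the example. Jobs run one after another here, whereas the slide describes jobs that are created and submitted.

```python
from collections import Counter


def split_by_file(dataset):
    """Split a dataset along file boundaries: one sub-dataset per file."""
    return [{name: events} for name, events in dataset.items()]


def run_job(task, sub_dataset):
    """Apply the user's task to every event of one sub-dataset."""
    result = Counter()  # hypothetical stand-in for a histogram
    for events in sub_dataset.values():
        for event in events:
            task(result, event)
    return result


def schedule(task, dataset, report=None):
    """Split the dataset, run one job per piece, and combine the results."""
    total = sum(len(events) for events in dataset.values())
    combined = Counter()
    done = 0
    for sub_dataset in split_by_file(dataset):
        combined.update(run_job(task, sub_dataset))  # merge the job's result
        done += sum(len(events) for events in sub_dataset.values())
        if report is not None:
            # Status report: fraction of events processed and partial result.
            report(done / total if total else 1.0, combined)
    return combined


def fill_bin(result, event):
    """Example task: count events in bins of width 10."""
    result[int(event // 10)] += 1


# Hypothetical two-file dataset with five events in total.
dataset = {"file1": [3, 12, 18], "file2": [25, 7]}
fractions = []
hist = schedule(fill_bin, dataset, lambda frac, partial: fractions.append(frac))
```

Passing the task as a plain function mirrors the idea that the user supplies only the code that fills the result, while splitting and combining stay with the scheduler.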

  7. DIAL interactions
     [Diagram: the Analyzer (e.g. ROOT) interacts with the Dataset, Application (e.g. ATHENA), Task, Scheduler, Jobs 1 and 2, Datasets 1 and 2, and the Results. Labeled steps: 1. Create or locate; 2. select; 3. Create or select; 4. select; 5. submit(app,tsk,ds); 6. split; 7. create; 8. create(app,tsk,ds1) and create(app,tsk,ds2); 9. fill; 10. gather.]

  8. Dataset properties
     • From this interaction we deduce the following properties for datasets:
       • Dataset is a collection of data objects
       • Dataset has content
       • Dataset has location
       • Dataset has an identity
       • Dataset is portable
     • For details, see following talk
       • http://www.usatlas.bnl.gov/~dladams/dataset/talks/021219_dataset.ppt
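One way to picture these properties is as a small record type. The sketch below is an assumption for illustration only and does not reflect the dataset interface described in the talk linked above: the class and field names are hypothetical, and "portable" is interpreted here simply as "can be serialized to text and rebuilt elsewhere".

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Dataset:
    dataset_id: str  # identity
    content: tuple   # kinds of data objects held (hypothetical labels)
    files: tuple     # location: where the data objects are stored

    def to_json(self):
        """Serialize so the description can be sent to another node."""
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text):
        """Rebuild a dataset description from its serialized form."""
        raw = json.loads(text)
        return cls(raw["dataset_id"], tuple(raw["content"]), tuple(raw["files"]))


# Hypothetical identifier, content labels and file names.
original = Dataset("ds-001", ("tracks", "jets"), ("a.root", "b.root"))
copy = Dataset.from_json(original.to_json())
```

Keeping identity, content and location as separate fields means a sub-dataset produced by splitting could carry its own identity while listing only a subset of the files.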

  9. Application properties
     • Current specification is
       • Name
         • E.g. athena
       • Version
         • E.g. 5.10.01
       • List of shared libraries
         • E.g. libRawData, libInnerDetectorReco

  10. Application properties (cont)
     • Each DIAL compute node provides an application description database
       • Indexed by application name and version
     • Application description includes
       • Location of executable
       • Run time environment (env variables)
         • Including executable and shared library paths
       • Command to build shared library from task source code
     • Can be shared by nodes with the same OS/compiler
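A per-node application description database of this kind could look roughly like the following. This is a hedged Python sketch, not DIAL code: the class names, the `register` and `lookup` methods, and every path and command string are hypothetical; only the application name `athena`, the version `5.10.01`, and the three kinds of information stored come from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ApplicationDescription:
    executable: str     # location of the executable
    environment: dict   # run time environment variables
    build_command: str  # command to build a shared library from task source


class ApplicationDatabase:
    """In-memory stand-in for a node's application description database."""

    def __init__(self):
        self._entries = {}

    def register(self, name, version, description):
        # Entries are indexed by application name and version.
        self._entries[(name, version)] = description

    def lookup(self, name, version):
        try:
            return self._entries[(name, version)]
        except KeyError:
            raise LookupError(f"no description for {name} {version}") from None


db = ApplicationDatabase()
db.register(
    "athena",
    "5.10.01",
    ApplicationDescription(
        executable="/opt/example/athena",                  # hypothetical path
        environment={"LD_LIBRARY_PATH": "/opt/example/lib"},  # hypothetical
        build_command="build-task-library",                # hypothetical command
    ),
)
description = db.lookup("athena", "5.10.01")
```

Because the key is only the name and version, the same set of entries could be reused on any node where the paths and build command remain valid, which matches the remark about sharing between nodes with the same OS and compiler.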

  11. Application properties (cont)
     • Alternative view: Packages
     • Application specifies
       • Software packages (e.g. ROOT or ATLAS)
       • Executable
       • Shared libraries
       • Data files
     • Task build command and run time environment are extracted from the package(s)
       • Requires common package interface
       • No need to distribute application definitions

  12. DIAL status
     • All components in place
       • http://www.usatlas.bnl.gov/~dladams/dial
     • But scheduler is very simple
       • Local (same node)
       • Creates a single processing job

  13. Future
     • Refine application definition
     • Scheduler
       • Remote processing
       • Multiple jobs (splitting input dataset)
       • Multiple sites using GRID
     • GRID integration
       • Identify components
       • Set requirements
       • Incorporate existing products
