
DIAL: Distributed Interactive Analysis of Large Datasets

DOSAR meeting, David Adams (BNL), September 16, 2005



  1. DIAL: Distributed Interactive Analysis of Large Datasets DOSAR meeting David Adams – BNL September 16, 2005

  2. Contents • DIAL project • Implementation • Components • Dataset • Transformation • Job • Scheduler • Catalogs • User interfaces • Status • Interactivity • Current results • Conclusions • More information • Contributors

  3. DIAL project • DIAL = Distributed Interactive Analysis of Large datasets • Goal is to demonstrate the feasibility of doing interactive analysis of large data samples • Analysis means production of analysis objects (e.g. histograms or ntuples) from HEP event data • Interactive means that a user request is processed in a few minutes • Large is whatever is available and useful for physics studies • How large can we go? • Approach is to distribute processing over many nodes using the inherent parallelism of HEP data • Assume each event can be independently processed • Each node runs an independent job to process a subset of the events • Results from each job are merged into the overall result

  4. Implementation • DIAL software provides a generic end-to-end framework for distributed analysis • Generic means few assumptions about the data to be processed or the application that processes it • Experiment and user must provide extensions to the system • Able to support a wide range of processing systems • Local processing using batch systems such as LSF, Condor and PBS • Grid processing using different flavors of CE or WMS • Provides friendly user interfaces • Integration with root • Python binding (from GANGA) • GUI for job submission and monitoring (from GANGA) • Mostly written in C++ • Releases packaged 3-4 times/year

  5. Components • AJDL • Abstract job definition language • Dataset and transformation (application + task) • Job • C++ interface and XML representation for each • Scheduler • Handles job processing • Catalogs • Hold job definition objects and their metadata • Web services • User interfaces • Monitoring and accounting

  6. Dataset • Generic data description includes • Description of the content (type of data) • E.g. reconstructed objects, electrons, jets with cone size 0.7,… • Number of events and their IDs • Data location • Typically a list of logical file names • List of constituent datasets • Model is inherently hierarchical • DIAL provides the following classes • Dataset – defines the common interface • GenericDataset – provides data and XML representation • TextDataset – embedded collection of text files • SingleFileDataset – Data location is a single file • EventMergeDataset – Collection of single file event datasets

  7. Dataset (cont) • Current approach for experiment-specific event datasets • Add a GenericDataset subclass that is able to read experiment files and fill the appropriate data • Use EventMergeDataset to describe large collections • DIAL can carry out processing using the GenericDataset interface and implementation • Alternatively, an experiment may provide its own implementation of the Dataset interface • If GenericDataset is not sufficient • Processing components then require this implementation

  8. Transformation • A transformation acts on a dataset to produce another dataset • Fundamental user interaction with the system • Transformation has two components • Application describes the action to take • Task provides data to configure the application • Task is a collection of named text files • Application holds two scripts • run – Carries out transformation • Input is dataset.xml and output is result.xml • Has access to the transformation • build_task – Creates transformation data from the files in the task • E.g. compiles to build library • Build is platform specific

  9. Job • The Job class holds the following data: • Job definition: application, task, dataset and job preferences • Status of the corresponding job • Initialized, running, done, failed or killed • Other history information • Compute node, native ID, start and stop times, return code, … • Job provides interface to • Start or submit job • Update the status and history information • Kill job • Fetch the result (output dataset)

  10. Job (cont) • Subclasses • Provide connection to underlying processing system • Implement these methods • Currently support fork, LSF and Condor • ScriptedJob • Subclass that implements these methods by calling a script • Scripts may then be written to support different processing systems • Alternative to providing subclass • Added in release 1.20 • Job copies • Jobs may be copied locally or remotely • Copy includes only the base class: • All common data available but job cannot be started, updated or killed • Use scheduler for these interactions

  11. Scheduler • Scheduler carries out processing • The following interface is defined • add_task builds the task • Input: application, task • Output: job ID • submit creates and starts a job • Input: application, task, dataset and job preferences • Output: job ID • job(JobId) returns a reference to the job • Or to a copy if the scheduler is remote • kill(JobId) to kill a job

  12. Scheduler (cont) • Subclasses implement this interface • LocalScheduler • Use a single job type and queue to handle processing, e.g. LSF • MasterScheduler • Splits input dataset • Uses an instance of LocalScheduler to process subjobs • Merges results from subjobs • Analysis service • Web service wrapper for Scheduler • This service may run any scheduler with any job type • WsClientScheduler subclass acts as a client to the service • Same interface for interacting with local fork, local batch or remote analysis service • Typical mode of operation is to use a remote analysis service

  13. Scheduler (cont) Typical structure of a job running on a MasterScheduler

  14. Scheduler (cont) • Scheduler hierarchy • Plan to add a subclass to forward requests to a selected scheduler based on job characteristics • This will make it possible to create a tree (or web) of analysis services • Should allow us to scale to arbitrarily large data samples (limited only by available computing resources)

  15. Scheduler (cont) Example of a possible scheduler tree for ATLAS

  16. Catalogs • DIAL provides support for different types of catalogs • Repositories to hold instances of application, task, dataset and job • Selection catalogs associate names and metadata with instances of these objects • And others • MySQL implementations • Populated with ATLAS transformations and Rome AOD datasets

  17. User interfaces • Root • Almost all DIAL classes are available at the root command line • User can access catalogs, create job definitions, and submit and monitor jobs • Python interface available in PyDIAL • Most DIAL classes are available as wrapped python classes • Work done by GANGA using LCG tools • GUI • There is a GUI that supports job specification, submission and monitoring • Provided by GANGA using PyDIAL

  18. [screenshot slide]

  19. [screenshot slide]

  20. Status • Releases • Most of the preceding is available in the current release (1.20) • Release 1.30 expected next month • Release with service hierarchy in January • Use in ATLAS • Ambition is to use DIAL to define a common interface for distributed analysis with connections to different systems • See figure • ATLAS distributed analysis is under review • Also want to continue with the original goal of providing a system with interactive response for appropriate jobs • Integration with the new USATLAS PANDA project

  21. Possible analysis service model for ATLAS

  22. Status (cont) • Use in other experiments • Some interest but no serious use that I know of • Current releases depend on ATLAS software but I am willing to build a version without that dependency • Not difficult • A generic system like this might be of interest to smaller experiments that would like to have the capabilities but do not have the resources to build a dedicated system

  23. Interactivity • What does it mean for a system to be interactive? • Literal meaning suggests user is able to directly communicate with jobs and subjobs • Most users will be satisfied if their jobs complete quickly and do not need to interact with subjobs or care if some other agent is doing so • Important exception is the capability to kill a job and all its subjobs • Interactivity (responsiveness) imposes requirements on the processing system (batch, WMS, …) • High job rates (> 1 Hz) • Low submit-to-result job latencies (< 1 minute) • High data input rate from SE to farm (> 10 MB/job)

  24. Current results • ATLAS generated data samples for the Rome physics meeting in June • Data includes AOD (Analysis Object Data) that is a summary of the reconstructed event data • AOD size is 100 kB/event • All Rome AOD datasets are available in DIAL • An interactive analysis service is deployed at BNL using a special LSF queue for job submission • Processing times were measured for datasets of various sizes • LSF SUSY (green triangles) is a “real AOD analysis” producing histograms • LSF big produces ntuples and extra time is for ntuple merging (not parallelized) • Largest jobs take 3 hours to run serially and are processed in about 10 minutes with the DIAL interactive service

  25. Conclusions • DIAL analysis service running at BNL • Serves clients from anywhere • Processing done locally • Rome AOD datasets available • Robust and responsive since June • Next steps • Similar service running at UTA (using PBS) • Service to select between these • BNL service to connect to OSG CE (in place of local LSF) • Integration with new ATLAS data management system • Integration with PANDA

  26. More information • DIAL home page: http://www.usatlas.bnl.gov/~dladams/dial • ADA (ATLAS Distributed Analysis) home page: http://www.usatlas.bnl.gov/ADA • Current DIAL release: http://www.usatlas.bnl.gov/~dladams/dial/releases/1.20

  27. Contributors • GANGA • Karl Harrison, Alvin Tan • DIAL • David Adams, Wensheng Deng, Tadashi Maeno, Vinay Sambamurthy, Nagesh Chetan, Chitra Kannan • ARDA • Dietrich Liko • ATPROD • Frederic Brochu, Alessandro De Salvo • AMI • Solveig Albrand, Jerome Fulachier • ADA • (plus those above), Farida Fassi, Christian Haeberli, Hong Ma, Grigori Rybkine, Hyunwoo Kim
