Datasets

Datasets. PPDG meeting, Interactive analysis. David Adams, BNL, December 19, 2002. Contents: DIAL, Dataset properties, Dataset representations, Dataset package status, Future. DIAL is Distributed Interactive Analysis of Large datasets.

Presentation Transcript


  1. Datasets • PPDG meeting • Interactive analysis • David Adams • BNL • December 19, 2002

  2. Contents • DIAL • Dataset properties • Dataset representations • Dataset package status • Future

  3. DIAL • DIAL is Distributed Interactive Analysis of Large datasets • DIAL is described at http://www.usatlas.bnl.gov/~dladams/dial/talks/021219_dial.ppt • Use DIAL to deduce dataset properties

  4. Dataset properties • A dataset is a collection of data objects • Means to iterate over objects • Objects are typically also indexed with labels • Labels are unique within the dataset • For event data: event ID + type + string key • E.g. run 123, event 456, EM jet, cone_0.5 • Allows for random access • Data may be in a persistent store • Each object has a GUID
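
The properties on this slide (iteration, unique labels, random access, a GUID per object) can be illustrated with a minimal in-memory sketch. It is not the DIAL or Dataset package API: the `Label` and `Dataset` classes and all of their field and method names are hypothetical, and GUIDs are generated locally with Python's `uuid` module purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterator, Tuple
import uuid


@dataclass(frozen=True)
class Label:
    """Hypothetical label for event data: event ID + type + string key."""
    run: int
    event: int
    type: str
    key: str


@dataclass
class Dataset:
    """Toy in-memory dataset; real data may live in a persistent store."""
    objects: Dict[Label, Any] = field(default_factory=dict)
    guids: Dict[Label, str] = field(default_factory=dict)

    def add(self, label: Label, obj: Any) -> None:
        # Labels must be unique within the dataset.
        if label in self.objects:
            raise KeyError(f"label already present in dataset: {label}")
        self.objects[label] = obj
        # Each object gets a GUID (generated here only as a stand-in).
        self.guids[label] = str(uuid.uuid4())

    def __iter__(self) -> Iterator[Tuple[Label, Any]]:
        # Means to iterate over the objects, in insertion order.
        return iter(self.objects.items())

    def get(self, label: Label) -> Any:
        # Random access by label.
        return self.objects[label]


# Usage with the example label from the slide.
ds = Dataset()
jet_label = Label(run=123, event=456, type="EM jet", key="cone_0.5")
ds.add(jet_label, {"note": "placeholder object"})
```

Using a frozen dataclass makes the label hashable, so a plain dictionary is enough to enforce uniqueness and provide random access.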

  5. Dataset properties (cont) • Dataset has content • Indicates suitability for a particular analysis or other transformation • Might be expressed in terms of object labels • For ATLAS event data: • Event IDs + type-keys for each (ATLAS) event • (Part of type in GriPhyN VDG)

  6. Dataset properties (cont) • Data in dataset has a location • Persistent store where data may be found • List of files holding the data • File IDs or LFNs • Persistent store locates physical replicas • Or rows in RDB tables… • May be multiple locations for a dataset • Due to different representations • More later

  7. Dataset properties (cont) • Dataset has a history • Transformation used to create the dataset • Executable, version, input parameters • (VDG transformation) • Input datasets • (VDG derivation) • Run-time properties (node, time, …) • Multiple values for distributed processing • (VDG invocation)
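
One way to picture the history described on this slide is as a small record with three parts: the transformation, the input datasets, and one run-time entry per processing node. The sketch below is only an illustration; the class and field names are hypothetical and are not taken from the Dataset package or from the VDG schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Transformation:
    """What was run: executable, version, input parameters."""
    executable: str
    version: str
    parameters: Dict[str, str]


@dataclass(frozen=True)
class Invocation:
    """Run-time properties of one execution (node, time)."""
    node: str
    start_time: str


@dataclass
class History:
    """Hypothetical history record for a dataset."""
    transformation: Transformation
    input_datasets: List[str]  # names of the datasets used as input
    # Distributed processing yields one invocation per node.
    invocations: List[Invocation] = field(default_factory=list)


# Usage with made-up values.
history = History(
    transformation=Transformation("reco", "1.0", {"cone": "0.5"}),
    input_datasets=["raw_run_123"],
)
history.invocations.append(Invocation("node01", "2002-12-19T10:00:00"))
history.invocations.append(Invocation("node02", "2002-12-19T10:00:05"))
```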

  8. Dataset properties (cont) • Dataset has a unique identity (name) • So it can be referenced • Dataset has portable representation • Possible to carry around a description of the content and location of a dataset without reference to any DBs • Dataset package uses XML
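
The slide says only that the Dataset package uses XML; it does not show the schema. The following sketch therefore uses an invented layout (the element and attribute names `dataset`, `content`, `event`, `typekey`, `location`, `file` and `lfn` are all assumptions) to show how a name, content and location could travel in one self-contained document, using Python's standard `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET


def dataset_to_xml(name, event_ids, type_keys, files):
    """Serialize a dataset description; the schema here is hypothetical."""
    root = ET.Element("dataset", {"name": name})
    content = ET.SubElement(root, "content")
    for event_id in event_ids:
        ET.SubElement(content, "event", {"id": str(event_id)})
    for type_, key in type_keys:
        ET.SubElement(content, "typekey", {"type": type_, "key": key})
    location = ET.SubElement(root, "location")
    for lfn in files:
        ET.SubElement(location, "file", {"lfn": lfn})
    return ET.tostring(root, encoding="unicode")


def dataset_from_xml(text):
    """Rebuild the description from XML alone, with no database lookup."""
    root = ET.fromstring(text)
    return {
        "name": root.get("name"),
        "event_ids": [int(e.get("id")) for e in root.iter("event")],
        "type_keys": [(t.get("type"), t.get("key")) for t in root.iter("typekey")],
        "files": [f.get("lfn") for f in root.iter("file")],
    }


# Usage with made-up values: the description survives a round trip.
xml_text = dataset_to_xml(
    "example_dataset", [456, 457], [("EM jet", "cone_0.5")], ["example_file_1"]
)
description = dataset_from_xml(xml_text)
```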

  9. Dataset representations • There are different ways to represent the data in a dataset • Simple datasets: • All data in a single file • Table in an RDB • Indexed list of GUIDs for a persistent store • Commercial ODB such as Objectivity • HES such as LCG POOL

  10. Dataset representations (cont) • Compound datasets • Concatenation of datasets • Concatenation of content • Any overlap between content of constituent datasets must index identical objects • Subset of a dataset • Based on content • Result of an algorithm applied on a dataset • Virtual data
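
The overlap rule for concatenation and the idea of a content-based subset can be sketched as follows. This is an illustration under stated assumptions, not the package's implementation: a dataset is modelled as a plain mapping from label to GUID, two objects count as identical when their GUIDs match, and the function names `concatenate` and `select` are hypothetical.

```python
def concatenate(first, second):
    """Concatenate two label -> GUID mappings.

    Labels present in both inputs must refer to the same object,
    which this sketch checks by comparing GUIDs.
    """
    for label in first.keys() & second.keys():
        if first[label] != second[label]:
            raise ValueError(f"label {label!r} indexes different objects")
    merged = dict(first)
    merged.update(second)
    return merged


def select(dataset, predicate):
    """Subset of a dataset: keep the entries whose label passes the predicate."""
    return {label: guid for label, guid in dataset.items() if predicate(label)}


# Usage with made-up labels (run, event) and GUID strings.
a = {(123, 456): "guid-a", (123, 457): "guid-b"}
b = {(123, 457): "guid-b", (123, 458): "guid-c"}
combined = concatenate(a, b)
late_events = select(combined, lambda label: label[1] >= 457)
```

Rejecting a mismatched overlap up front keeps a compound dataset unambiguous: every label still resolves to exactly one object.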

  11. Dataset package status • Datasets • Generic implementation in place • http://www.usatlas.bnl.gov/~dladams/dataset • Assumes content is event data • Supported representations: • Single file • AthenaRoot format • ATLAS Monte Carlo generator output • Concatenation of events • Selection based on event ID

  12. Future • Support other types of ATLAS event data • Add concatenation and selection based on event content • Add representation for POOL EventCollection • Add non-event data • Relevant conditions data objects • Derived metadata • Provenance and production history
