
Reconstruction and Analysis on Demand: A Success Story


Presentation Transcript


1. Reconstruction and Analysis on Demand: A Success Story
Christopher D. Jones, Cornell University, USA

2. Overview
• Describe the “Standard” processing model
• Describe the “On Demand” processing model
  • Similar to GriPhyN’s “Virtual Data Model”
• What we’ve learned
• User reaction
• Conclusion

3. Standard Processing System
• Designed for reconstruction
• All objects are supposed to be created for each event
• Each processing step is broken into its own module
  • E.g., track finding and track fitting are separate
• The modules are run in a user-specified sequence
• Each module adds its data to the ‘event’ when the module is executed
• Each module can halt the processing of an event
[Diagram: processing chain with Input Module, Track Finder, Track Fitter, Output Module]
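A minimal C++ sketch of this fixed-sequence model follows; the class names (Module, TrackFinder, TrackFitter, Event) and signatures are assumptions made for illustration, not the actual CLEO framework code.

#include <iostream>
#include <memory>
#include <string>
#include <vector>

// Hypothetical sketch of the "standard" model: modules run in a fixed,
// user-chosen order, each adds its products to the event, and any module
// may halt processing of the event.
struct Event {
    std::vector<std::string> products;   // stand-in for real data objects
};

class Module {
public:
    virtual ~Module() = default;
    virtual bool process(Event& event) = 0;   // return false to halt the event
};

class TrackFinder : public Module {
public:
    bool process(Event& event) override {
        event.products.push_back("Tracks");        // adds its data when run
        return true;
    }
};

class TrackFitter : public Module {
public:
    bool process(Event& event) override {
        event.products.push_back("FittedTracks");  // relies on TrackFinder output
        return true;
    }
};

int main() {
    // The user must know that TrackFinder has to be placed before TrackFitter.
    std::vector<std::unique_ptr<Module>> sequence;
    sequence.push_back(std::make_unique<TrackFinder>());
    sequence.push_back(std::make_unique<TrackFitter>());

    Event event;
    for (auto& module : sequence) {
        if (!module->process(event)) break;        // a module halted the event
    }
    std::cout << event.products.size() << " products created\n";
}

Even in this toy, the knowledge of the correct ordering sits with whoever builds the sequence, which is exactly the criticism raised on the next slide.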

4. Critique of Standard Design
• Good
  • Simple mental model: users can feel confident they know how the program works
  • Easy to debug: simple to determine which module had a problem
• Bad
  • User must know inter-module dependencies in order to place the modules in the correct sequence
  • Users often run jobs with many modules they do not need, to avoid missing a module they might need
  • Optimization of the module sequence must be done by hand
  • Reading back from storage is inefficient: all objects must be created from storage even if the job does not use them

5. On-demand System
• Designed for analysis batch processing
  • Not all objects need to be created for each event
• Processing is broken into different types of modules
  • Providers
    • Source: reads data from a persistent store
    • Producer: creates data on demand
  • Requestors
    • Sink: writes data to a persistent store
    • Processor: analyzes and filters ‘events’
• Data providers register what data they can provide
• Processing sequence is set by the order of data requests
• Only Processors can halt the processing of an ‘event’
[Diagram: Source, Processor A, Processor B, Sink]
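The split into module categories might look roughly like the following interface sketch. The class names come from the slide, but every signature here is an assumption for illustration, not the real framework API.

// Hypothetical interface sketch of the four on-demand module categories.
class Record { /* holds data sharing a common lifetime (next slide) */ };

// Providers: hand out data when it is requested
class Source {                                   // reads data from a persistent store
public:
    virtual ~Source() = default;
    virtual bool nextRecord(Record& record) = 0;       // announces a new Record
};

class Producer {                                 // creates data on demand
public:
    virtual ~Producer() = default;
    virtual void registerProxies(Record& record) = 0;  // declares what it can provide
};

// Requestors: pull data out of the Record
class Processor {                                // analyzes and filters 'events'
public:
    virtual ~Processor() = default;
    virtual bool process(const Record& record) = 0;    // only this may halt the event
};

class Sink {                                     // writes data to a persistent store
public:
    virtual ~Sink() = default;
    virtual void write(const Record& record) = 0;
};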

6. Data Model
• A Record holds all data that are related by lifetime
  • E.g., the Event Record holds Raw Data, Tracks, Calorimeter Showers, etc.
• A Stream is a time-ordered sequence of Records
• A Frame is a collection of Records that describe the state of the detector at an instant in time
• All data are accessed via the exact same interface and mechanism
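A minimal sketch of how the Record/Stream/Frame model could be expressed, using hypothetical class and method names (the talk does not show the real interfaces):

#include <iostream>
#include <map>
#include <utility>

// Sketch: a Record groups data with a common lifetime, a Stream is a
// time-ordered sequence of Records, and a Frame bundles the Records
// describing the detector state at one instant.
enum class Stream { kEvent, kBeginRun, kCalibration };

class Record {
public:
    explicit Record(Stream stream) : stream_(stream) {}
    Stream stream() const { return stream_; }
    // a real Record would hold Proxies indexed by a type/usage key (slide 8)
private:
    Stream stream_;
};

class Frame {
public:
    void setRecord(Record record) {
        records_.insert_or_assign(record.stream(), std::move(record));
    }
    const Record& record(Stream stream) const { return records_.at(stream); }
private:
    std::map<Stream, Record> records_;
};

int main() {
    Frame frame;
    frame.setRecord(Record(Stream::kCalibration));  // slowly changing data
    frame.setRecord(Record(Stream::kEvent));        // per-event data

    // Event data and calibration data are reached the exact same way.
    const Record& event = frame.record(Stream::kEvent);
    std::cout << static_cast<int>(event.stream()) << '\n';
}

The point of the sketch is the last bullet above: whether data live in the event Record or a calibration Record, client code asks the Frame in exactly the same way.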

7. Data Flow: Frame as Data Bus
• Data Providers: data returned when requested
  • Sources: data from storage
  • Producers: data from an algorithm
• Data Requestors: run sequentially for each new Record from a Source
  • Processors: analyze and filter data
  • Sinks: store data
[Diagram: the Frame as the central data bus, with providers (Calibration Database, Event Database, TrackFinder, TrackFitter) on one side and requestors (SelectBtoKPi, EventDisplay, Event List) on the other]

8. Callback Mechanism
• A Provider registers a Proxy for each data type it can create
• Proxies are placed in the Record and indexed with a key
  • Type: the object type returned by the Proxy
  • Usage: an optional string describing the use of the object
  • Production: an optional run-time settable string
• Users access data via a type-safe templated function call (based on ideas from BaBar’s Ifd package):
    List<FitPion> pions;
    extract(iFrame.record(kEvent), pions);
• The extract call builds the key and asks the Record for the Proxy
• The Proxy runs its algorithm to deliver the data
• The Proxy caches the data in case of another request
• If a problem occurs, an exception is thrown
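The following is a compact, hypothetical sketch of the callback idea: keyed Proxies in a Record, data created on first request, cached afterwards, and an exception when no Proxy matches. The names (ProxyBase, Record::add, Record::extract, FitPionsProxy) are assumptions for illustration, not the CLEO or BaBar Ifd API.

#include <iostream>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <typeindex>
#include <utility>
#include <vector>

struct ProxyBase {
    virtual ~ProxyBase() = default;
};

template <typename T>
class Proxy : public ProxyBase {
public:
    const T& get() {
        if (!cache_) cache_ = std::make_unique<T>(make());  // run algorithm once
        return *cache_;                                      // cached for later requests
    }
protected:
    virtual T make() = 0;  // may itself request other data, building the call chain
private:
    std::unique_ptr<T> cache_;
};

class Record {
public:
    template <typename T>
    void add(const std::string& usage, std::unique_ptr<Proxy<T>> proxy) {
        proxies_[{std::type_index(typeid(T)), usage}] = std::move(proxy);
    }
    // extract() builds the key, finds the Proxy, and returns its (cached) data.
    template <typename T>
    const T& extract(const std::string& usage = "") const {
        auto it = proxies_.find({std::type_index(typeid(T)), usage});
        if (it == proxies_.end())
            throw std::runtime_error("no Proxy registered for the requested data");
        return static_cast<Proxy<T>&>(*it->second).get();
    }
private:
    using Key = std::pair<std::type_index, std::string>;
    mutable std::map<Key, std::unique_ptr<ProxyBase>> proxies_;  // filled by Providers
};

// Toy Producer-style Proxy, standing in for the real track-fitting step.
struct FitPion { double momentum; };

class FitPionsProxy : public Proxy<std::vector<FitPion>> {
protected:
    std::vector<FitPion> make() override { return {{0.42}, {1.7}}; }
};

int main() {
    Record event;
    event.add<std::vector<FitPion>>("", std::make_unique<FitPionsProxy>());

    // Roughly analogous to: List<FitPion> pions; extract(iFrame.record(kEvent), pions);
    const auto& pions = event.extract<std::vector<FitPion>>();
    std::cout << pions.size() << " fitted pions\n";
}

Because a Proxy only knows how to deliver its own data, a requestor never cares whether the data were produced by an algorithm or read from storage, which is exactly what the next two slides exploit.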

9. Callback Example: Algorithm
[Diagram: the SelectBtoKPi Processor requests fitted tracks; the request is satisfied by Producers — the Track Fitter (FitPionsProxy, FitKaonsProxy, …), the Track Finder (TracksProxy), and the HitCalibrator (CalibratedHitsProxy) — which ultimately draw on Sources: the Calibration DB (PedestalProxy, AlignmentProxy, …) and the Raw Data File (RawDataProxy)]

10. Callback Example: Storage
[Diagram: the SelectBtoKPi Processor gets its data from a single Source, the Event Database, through its proxies (FitPionsProxy, FitKaonsProxy, RawDataProxy, …)]
• In both examples, the same SelectBtoKPi shared object can be used

11. Critique of On-demand System
• Good
  • Can be used for all data access needs
    • Online software trigger, online data quality monitoring, online event display, calibration, reconstruction, MC generation, offline event display, analysis
  • Self-organizes the calling chain
    • Users can add Producers in any order
  • Optimizes access from storage
    • Sources only need to say when a new Record (e.g., event) is available
    • Data for a Record is retrieved/decoded on demand
• Bad
  • Can be harder to debug since there is no explicit call order
    • Use of exceptions is key to simplifying debugging
  • Performance testing is more challenging

12. What We Have Learned
• First release of the system was September 1998
• The callback mechanism can be made fast
  • Proxy lookup takes less than 1 part in 10^7 of CPU time on a simple job that processed 2,000 events/s on a moderate computer
• Cyclical dependencies are easy to find and fix
  • Only happened once and was found immediately on the first test
• Do not need to modify data once it is created
  • Preliminary versions of data are given their own key
• Automatically optimizes performance of reconstruction
  • Trivially added a filter to remove junk events by using FoundTracks
• Optimize analysis by storing many small objects
  • Only need to retrieve and decode the data needed for the current job

13. User Reactions
• In general, user response has been very positive
  • Previously CLEO used a ‘standard system’ written in FORTRAN
• Reconstruction coders like the system
  • We have code skeleton generators for Proxy/Producer/Processor, so they only need to add their specific code
  • Easy for them to test their code
• Analysis coders can still program the ‘old way’
  • All analysis code in the ‘event’ routine
• Some analysis coders are pushing the bounds
  • Place selectors (e.g., cuts for tracks) in Producers
  • Users share selectors via dynamically loaded Producers
  • The Processor is only used to fill histograms/ntuples
  • If the selections are stored, only the Processor needs to be rerun when reprocessing data
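A small sketch of that division of labor, with entirely hypothetical names (Track, selectGoodTracks, fillMomentumHistogram): the cuts live in shareable selector code, and the Processor step does nothing but fill a histogram from the selected list.

#include <iostream>
#include <map>
#include <vector>

struct Track { double momentum; double chi2; };

// Shared selector: in the real framework this logic would sit in a
// dynamically loaded Producer and publish its output under its own key.
std::vector<Track> selectGoodTracks(const std::vector<Track>& all) {
    std::vector<Track> good;
    for (const auto& t : all)
        if (t.chi2 < 3.0 && t.momentum > 0.1) good.push_back(t);
    return good;
}

// Processor role: only fills histograms/ntuples and never re-applies cuts,
// so if the selection is stored only this step needs to be rerun.
void fillMomentumHistogram(const std::vector<Track>& selected,
                           std::map<int, int>& histogram) {
    for (const auto& t : selected)
        ++histogram[static_cast<int>(t.momentum * 10.0)];  // coarse momentum bins
}

int main() {
    std::vector<Track> event = {{0.30, 1.2}, {2.10, 5.0}, {1.00, 0.8}};
    std::map<int, int> hist;
    fillMomentumHistogram(selectGoodTracks(event), hist);
    std::cout << hist.size() << " occupied bins\n";
}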

14. Conclusion
• It is possible to build an ‘on demand’ system that is
  • efficient
  • debuggable
  • capable of dealing with all data (not just data in an event)
  • easy to write components for
  • good for reconstruction
  • acceptable to users
• Some reasons for success
  • Skeleton code generators
    • Users only have to write new code, not infrastructure ‘glue’
  • Users do not need to register what data they may request
  • Data reads occur more frequently than writes
  • Simple rule for when algorithms run
    • If you add a Producer, it takes precedence over a Source
