
TCK & Configuration


  1. TCK & Configuration
  Or "How to configure the HLT (i.e. a Gaudi job) deterministically & repeatably, and provide accountability & introspection (from C++) – while allowing reconfigurations during running without introducing more deadtime than required…"
  This talk is based on an actual working 'proof-of-principle' implementation: what is described here actually works in practice, but the implementation is not necessarily suitable for serious use and needs to be 'cleaned up' (e.g. it might not work on Windows 'as-is', it uses an ad-hoc file format to describe configurations, …). But at least it does work, and it provides a starting point for a proper design…

  2. The General Idea
  • There is a need for a trigger that can change its "Configuration" 'on the fly'
    • e.g. due to decreasing instantaneous luminosity
    • Configurations might change during a 'run', as a full start-stop cycle takes far too long
  • Need to know, reliably, which configuration was used for a given event
    • Otherwise determining trigger efficiencies becomes a lottery, and analysis virtually impossible
  • So there are different "Trigger Configurations"
    • To distinguish between them, each is labelled with a "Key": the "Trigger Configuration Key" (TCK)
  • Given a TCK (and a TCK only!) one should be able to determine which configuration was used
    • i.e. which algorithms (& tools?) were used, in which order, and how they were configured (i.e. which cuts did they apply?)
    • and be able to do this from C++ code
  • To ensure accountability for each event, the TCK will be part of the event data (to be precise: part of the ODIN bank)
  • Given a TCK, the trigger (both L0 and HLT) will have to make sure it uses the matching configuration when processing the event
    • Before processing an event, the trigger has to check whether the 'current' configuration still matches the TCK of the event it is about to process (as sketched below)
      • If yes, continue
      • If not, first reconfigure accordingly, then continue
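  The per-event decision can be summarised in a few lines. A minimal sketch in pseudo-C++; the names tckFromODIN, reconfigure and currentTCK are hypothetical placeholders for whatever the real ODIN accessor and reconfiguration machinery end up being:

      // Per-event flow: compare the TCK carried by the event (in its ODIN bank)
      // with the TCK the job is currently configured for.
      unsigned int eventTCK = tckFromODIN( event );   // hypothetical ODIN-bank accessor
      if ( eventTCK != currentTCK ) {
          reconfigure( eventTCK );                    // push the matching properties, reinitialize
          currentTCK = eventTCK;
      }
      process( event );                               // guaranteed to run with the matching configuration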

  3. Configurations: back to basics
  A Gaudi program is configured by specifying 'Properties'
  • either by their default values, or by job options, or through python (e.g. Configurables, but not restricted to them), or by explicit C++ calls, or …
  The sequencing of algorithms:
  • Which algorithms run, and the order in which they run, is specified by the TopAlg property of the AppMgr (see the sketch below)
  • and, in case one of those algorithms is a Sequencer, by the 'Members' property of the Sequencer (recursively)
    • or, better, by their subalgorithms (provided they are created by calling Algorithm::createSubAlgorithm instead of talking directly to the AppMgr – not the case for the currently released GaudiSequencer)
  • Note: the 'Data-On-Demand' service is ignored here on purpose: the trigger should avoid using it
  The configuration of algorithms:
  • specified by their Properties
  Note: ignoring tools & services for now…
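  The point that sequencing is 'just property data' can be made explicit: TopAlg on the ApplicationMgr (and Members on a sequencer) are ordinary string-list properties and can be read back through the generic IProperty interface. A minimal sketch, assuming the classic Gaudi IProperty / ISvcLocator calls; the example output in the comment is illustrative only:

      #include <iostream>
      #include <string>
      #include "GaudiKernel/IProperty.h"
      #include "GaudiKernel/ISvcLocator.h"

      // The list of top-level algorithms is nothing special: it is the 'TopAlg'
      // property of the ApplicationMgr, readable as a plain string.
      void printTopAlgs( ISvcLocator* svcLoc ) {
          IProperty* appMgr = 0;
          if ( svcLoc->service( "ApplicationMgr", appMgr ).isSuccess() ) {
              std::string topAlg;
              if ( appMgr->getProperty( "TopAlg", topAlg ).isSuccess() ) {
                  std::cout << "TopAlg: " << topAlg << std::endl;  // e.g. "['GaudiSequencer/Hlt', ...]"
              }
          }
      }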

  4. Idea (inspired by the Gaudi HistorySvc)
  Given an (instance of an) Algorithm, provide an object which captures its configuration
  • name, type, version, properties
  • The HistorySvc uses an 'AlgorithmHistory' class
    • It contains not just the 'configuration' but also references to other instances, in order to track 'changes of configuration'
    • That is not needed here, and it mixes two functionalities: capturing state and tracing it as well; it would have been better to split the two
  • Use a similar but simpler class (i.e. there is room to improve the integration here) which specifies a single configuration only (see the sketch below):
    • Algorithm type, name, version, and the list of Properties, represented by strings
    • Note: Properties are quite nice: one can easily convert all of them to and from a text representation!
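  A minimal sketch of such a class, assuming the classic Gaudi IProperty::getProperties() interface and Property::name()/toString() for the text representation; the class name AlgorithmConfig, the use of System::typeinfoName for the type and of the algorithm's version() for the version are illustrative choices, not the final design:

      #include <string>
      #include <utility>
      #include <vector>
      #include "GaudiKernel/Algorithm.h"
      #include "GaudiKernel/Property.h"
      #include "GaudiKernel/System.h"

      // Captures a single algorithm configuration: type, name, version and
      // all properties, each rendered as a (name, value) pair of strings.
      class AlgorithmConfig {
      public:
          explicit AlgorithmConfig( const Algorithm& alg )
              : m_name( alg.name() )
              , m_type( System::typeinfoName( typeid(alg) ) )
              , m_version( alg.version() )
          {
              const std::vector<Property*>& props = alg.getProperties();
              for ( std::vector<Property*>::const_iterator i = props.begin(); i != props.end(); ++i ) {
                  m_properties.push_back( std::make_pair( (*i)->name(), (*i)->toString() ) );
              }
          }
          const std::string& name()    const { return m_name; }
          const std::string& type()    const { return m_type; }
          const std::string& version() const { return m_version; }
          const std::vector<std::pair<std::string,std::string> >& properties() const { return m_properties; }
      private:
          std::string m_name, m_type, m_version;
          std::vector<std::pair<std::string,std::string> > m_properties;
      };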

  5. Example AlgorithmConfiguration
  • Can create it given an Algorithm:
    const Algorithm *myAlgo = …
    AlgorithmConfig cfg(*myAlgo);
  • Can stream it to and from an std::ostream (a sketch of the streaming operator follows after the example output):
    Name: HadSingleTFRZVelo
    Type: HltTrkFilter
    Version: unknown
    Properties: [
      'OutputLevel':2 'Enable':True 'ErrorMax':1 'ErrorCount':0
      'AuditAlgorithms':False 'AuditInitialize':False 'AuditReinitialize':False 'AuditExecute':False 'AuditFinalize':False 'AuditBeginRun':False 'AuditEndRun':False
      'MonitorService':MonitorSvc 'ErrorsPrint':True 'PropertiesPrint':False 'StatPrint':True 'TypePrint':True 'Context': 'RootInTES': 'RootOnTES': 'GlobalTimeOffset':0
      'StatTableHeader': Counter | # | sum | mean/eff^* | rms/e…..
      'RegularRowFormat': %|-15.15s|%|17t||%|10d| |%|11.7g| ….
      'EfficiencyRowFormat':*%|-15.15s|%|17t||%|10d| |%|11.5g| ….
      'UseEfficiencyRowFormat':True 'ContextService':AlgContextSvc 'RegisterForContextService':True
      'HistoProduce':True 'HistoPrint':False 'HistoCheckForNaN':True 'HistoSplitDir':False 'HistoOffSet':0 'HistoTopDir': 'HistoDir':HadSingleTFRZVelo 'FullDetail':False 'MonitorHistograms':True
      'FormatFor1DHistoTable':| %2$-30.30s | %3$=7d |%8$11.5g | %10$-11.5g|%12$11.5g |%14$11.5g |
      'ShortFormatFor1DHistoTable': | %1$-15.15s %2%
      'HeaderFor1DHistoTable':| Title | # | Mean | RMS | Skewness | Kurtosis |
      'PassPeriod':0 'HistogramUpdatePeriod':1 'ConditionsName':
      'HistoDescriptor':[ 'rIP,400,-1.,3.' , 'rIPBest,400,-1.,3.' , 'Calo2DChi2,100.,0.,20.' , 'Calo2DChi2Best,100.,0.,20.' ]
      'DataSummaryLocation':Hlt/Summary 'SelectionName':HadSingleTFRZVelo 'IsTrigger':False
      'PatInputTracksName': 'PatInputTracks2Name': 'PatInputVerticesName': 'InputTracksName':RZVelo 'InputTracks2Name':L0TriggerHadron 'InputVerticesName': 'PrimaryVerticesName':PV2D 'OutputTracksName':HadSingleTFRZVelo 'OutputVerticesName': 'MinCandidates':1
      'Filters':[ 'rIP < 50' , ' Calo2DChi2 < 4' ]
      'Lines':[ 'rIP = bindAbsMin(TrVRIP,_HltVertices)' , "HltRZVeloTCaloMatch = gaudi.toolsvc().create('HltRZVeloTCaloMatch' , interface = gbl.ITrackMatch )" , 'Calo2DChi2 = bindAbsMin(TTrMATCH(HltRZVeloTCaloMatch),_HltTracks)' ]
    ]
    No subalgorithms.
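  The streaming itself can be as simple as writing the captured strings in the fixed layout shown above. A minimal sketch of the output direction, building on the AlgorithmConfig sketched before (the input direction would parse the same layout back; the 'No subalgorithms.' trailer is omitted here):

      #include <ostream>

      // Write an AlgorithmConfig in the human-readable layout shown above.
      std::ostream& operator<<( std::ostream& os, const AlgorithmConfig& cfg ) {
          os << "Name: "    << cfg.name()    << '\n'
             << "Type: "    << cfg.type()    << '\n'
             << "Version: " << cfg.version() << '\n'
             << "Properties: [";
          typedef std::vector<std::pair<std::string,std::string> > Props;
          const Props& props = cfg.properties();
          for ( Props::const_iterator i = props.begin(); i != props.end(); ++i ) {
              os << " '" << i->first << "':" << i->second;
          }
          return os << " ]" << std::endl;
      }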

  6. Overall Flow
  A complete configuration is just a set of AlgorithmConfigurations
  • ignoring tools & services for now
  • more on creating/managing this set later
  Telling Gaudi about them is almost trivial (a sketch of such a service follows below)
  • Wrote a service (without a public interface; nobody talks to it)
  • On initialize:
    • register it with the IncidentSvc for the BeginEvent incident
    • cache a set of (sets of algorithm configurations) corresponding to a specified few TCKs (to avoid I/O later on caused by reading configuration information)
    • update the JobOptionsSvc with the set of properties for a specified TCK
    • At this point, nothing more is needed until a 'reconfigure' is required…
    • Note: at this point, no calls to Algorithm::initialize have been made yet: services are initialized first…
  • Next, algorithms get initialized, pull their properties from the JOS & things proceed as usual
  • When receiving a BeginEvent incident:
    • check whether the TCK used to configure is still valid (right now, just increase the TCK by one every 100 calls)
      • Note: I am assuming I will be able to access the ODIN bank for the upcoming event at this stage, from a service
    • If yes, just return
    • If no, update the JobOptionsSvc with the set of properties corresponding to the new TCK
      • call sysReinitialize for a configured set of algorithms
      • the instances of these algorithms are obtained (by name) from the AppMgr
      • could just call it on all TopAlgs (if not told otherwise), but that might be overkill / not required
      • if sequencers use 'createSubAlgorithm', then this ensures that the call propagates properly to all of their subalgorithms
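  A structural sketch of such a service, assuming the classic Gaudi Service / IIncidentListener / IJobOptionsSvc interfaces; the helpers configure(), reinitialize() and currentEventTCK() as well as the member names are hypothetical, and constructor boilerplate, interface bookkeeping and error handling are omitted:

      #include "GaudiKernel/Service.h"
      #include "GaudiKernel/IIncidentSvc.h"
      #include "GaudiKernel/IIncidentListener.h"
      #include "GaudiKernel/Incident.h"
      #include "GaudiKernel/IJobOptionsSvc.h"

      // Keeps the job configured according to the TCK of the event about to be
      // processed. It has no public interface: it only listens for BeginEvent.
      class ConfigSvc : public Service, virtual public IIncidentListener {
      public:
          StatusCode initialize() {
              StatusCode sc = Service::initialize();
              IIncidentSvc* incSvc = 0;
              service( "IncidentSvc", incSvc );
              incSvc->addListener( this, IncidentType::BeginEvent );
              service( "JobOptionsSvc", m_jos );
              // cache the configurations for the expected TCKs, then push the
              // properties of the initial TCK into the JobOptionsSvc (not shown)
              configure( m_currentTCK );
              return sc;
          }
          void handle( const Incident& /* beginEvent */ ) {
              unsigned int tck = currentEventTCK();  // hypothetical: read from the ODIN bank
              if ( tck == m_currentTCK ) return;     // still valid: nothing to do
              configure( tck );                      // update the JobOptionsSvc from the cached set
              reinitialize( tck );                   // sysReinitialize the affected algorithms
              m_currentTCK = tck;
          }
      private:
          void configure( unsigned int tck );        // push properties for 'tck' into the JOS
          void reinitialize( unsigned int tck );     // obtain algorithms by name, call sysReinitialize
          unsigned int currentEventTCK();            // hypothetical ODIN accessor
          IJobOptionsSvc* m_jos;
          unsigned int    m_currentTCK;
      };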

  7. A few words on sysReinitialize
  Most time-consuming part to implement, mainly because people have taken shortcuts in their code, assuming this does not happen…
  • 'just' have to make sure those algorithms actually running in the HLT do the right thing…
  A lot of (most?) algorithms don't implement reinitialize (properly)
  • If you only use properties directly (i.e. do not derive something from them), you're OK…
  • GaudiSequencer and HltSequencer don't propagate sysReinitialize to their subalgorithms
    • Trivial fix: create subalgorithms by calling Algorithm::createSubAlgorithm instead of talking explicitly to the AppMgr – probably the right thing to do regardless (see the sketch below)
    • Yes, HltSequencer should be discontinued asap
  • For now: "cheat", and only fix the few HLT algorithms that I really need right now; only support some changes (i.e. different cuts) and explicitly disallow others (i.e. changes in flow), as those would require more invasive code changes
    • register callbacks for some properties that check the above rules in one type of HLT algorithm, and (artificially) limit differences between TCKs to the supported ones
  Supporting sysReinitialize properly in all algorithms (and tools) which the HLT uses seems required by the requirements, regardless of any implementation
  • It's work, not insurmountable; it just takes time to get it right
  • Can be verified by first configuring a 'dummy' TCK at initialize, then immediately switching to the correct one before starting the first event, and comparing the results to configuring the right TCK already at initialize…
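  A minimal sketch of the 'trivial fix', assuming a sequencer-like algorithm (here called MySequencer, purely for illustration) whose member list comes from a 'Members'-style string property. Creating the members via Algorithm::createSubAlgorithm registers them as genuine subalgorithms, so the framework's sys* calls – including sysReinitialize – propagate to them:

      #include <string>
      #include <vector>
      #include "GaudiKernel/Algorithm.h"

      class MySequencer : public Algorithm {       // illustrative sequencer
      public:
          StatusCode initialize();
      private:
          std::vector<std::string> m_memberNames;  // filled from a "Members"-like property, "Type/Name"
          std::vector<Algorithm*>  m_members;
      };

      StatusCode MySequencer::initialize() {
          for ( std::vector<std::string>::const_iterator i = m_memberNames.begin();
                i != m_memberNames.end(); ++i ) {
              std::string::size_type slash = i->find('/');
              std::string type = ( slash == std::string::npos ) ? *i : i->substr( 0, slash );
              std::string name = ( slash == std::string::npos ) ? *i : i->substr( slash + 1 );
              Algorithm* sub = 0;
              // registered in subAlgorithms(), hence reached by sysReinitialize
              StatusCode sc = createSubAlgorithm( type, name, sub );
              if ( sc.isFailure() ) return sc;
              m_members.push_back( sub );
          }
          return StatusCode::SUCCESS;
      }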

  8. Current 'ad-hoc' on-disk representation of configurations
  Disclaimer: not even intended as a final solution
  • Needed something that made it easy to implement the idea, and to explore the possibilities
  • Already have some ideas for improvements in structure
  • Will use whatever is acceptable to the online system, provided it matches the requirements of analysis use as well

  9. Current 'ad-hoc' on-disk representation of configurations
  • Remember, a TCK (a complete configuration) is just a collection of individual algorithm (and tool, svc) configurations
  • Create one file for each algorithm (tool, svc) configuration
  • A TCK is then just a directory (aka a collection of files)
    • basically, a directory labelled with the value of the TCK, with as contents a set of algorithm configurations
  • But since different TCKs might share subsets of common configurations, implement this as a lookup table:
    • each TCK is a lookup table from 'algorithm name' to the file containing its configuration
    • can be implemented as 'a directory full of links' (see the sketch below)
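  A minimal sketch of reading such a 'directory full of links' back into the lookup table, using plain POSIX directory calls; the layout (one link per algorithm, named after the algorithm, inside tck/<TCK>/) is the ad-hoc one described here, and the example directory name in the comment is illustrative:

      #include <dirent.h>
      #include <map>
      #include <string>

      // Build the lookup table 'algorithm name' -> 'configuration file'
      // from a TCK directory such as "tck/0x0001".
      std::map<std::string,std::string> readTCK( const std::string& tckDir ) {
          std::map<std::string,std::string> table;
          DIR* dir = opendir( tckDir.c_str() );
          if ( !dir ) return table;
          for ( struct dirent* e = readdir( dir ); e != 0; e = readdir( dir ) ) {
              std::string name( e->d_name );
              if ( name == "." || name == ".." ) continue;
              // the entry is a link named after the algorithm; its path is what
              // gets handed to the code that reads the AlgorithmConfig back in
              table[name] = tckDir + "/" + name;
          }
          closedir( dir );
          return table;
      }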

  10. Current 'ad-hoc' on-disk representation of configurations
  • So we have one file per unique algorithm configuration
    • Expect many (most?) algorithms not to change when updating to a new TCK; avoid duplication of information
  • Separate generating unique (algorithm) configurations from 'composing' complete TCK configurations
    • There is a different 'level of paranoia' between these steps: just adding a new configuration for some algorithm is not quite as drastic as adding a reference to an algorithm configuration from some TCK
    • Having 'pre-cooked' algorithm configurations available allows easier re-use of them when 'composing' a TCK from building blocks
  • As a result, would like to refer to these individual files from a TCK – thus need a way of creating a reference (think POOL ref, but with slightly different requirements)

  11. How to recognize / refer to / find an AlgorithmConfiguration
  • Do as POOL does, but slightly differently:
    • Must be able to do this at any time, anywhere – and, given the same configuration, it should yield the same unique 'reference'
  • Compute a cryptographic 'digest' of the human-readable output (see the sketch below)
    • Basically, run 'md5sum' (or 'sha1sum', or …) on the stream of characters
    • The choice depends on your level of paranoia, and on the amount of space you're willing to spend on a reference
  • The reference to the configuration shown earlier, using MD5, happens to be:
    • 11d24cb1250f2ca664a15cff3f09c164
  • If at another time the same configuration for the same algorithm is made (not unlikely), this can be recognized automatically
    • Name & type are part of the content, and thus of the MD5 checksum
    • This also recognizes the case of a single algorithm instance appearing more than once in a job
  • Can easily answer questions like "which TCKs use these settings for my favorite algorithm?" without inspecting the contents of the files:
    • ls -l tck/*/* | grep 11d24cb1250f2ca664a15cff3f09c164
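  A minimal sketch of turning the streamed text into such a reference, assuming the AlgorithmConfig and operator<< sketched earlier; OpenSSL's MD5() is used here simply because it is widely available – any equivalent digest implementation would do:

      #include <cstdio>
      #include <sstream>
      #include <string>
      #include <openssl/md5.h>

      // Hex MD5 digest of an arbitrary chunk of text.
      std::string md5hex( const std::string& text ) {
          unsigned char digest[MD5_DIGEST_LENGTH];
          MD5( reinterpret_cast<const unsigned char*>( text.data() ), text.size(), digest );
          char hex[2*MD5_DIGEST_LENGTH+1];
          for ( int i = 0; i < MD5_DIGEST_LENGTH; ++i ) std::sprintf( hex + 2*i, "%02x", digest[i] );
          return std::string( hex, 2*MD5_DIGEST_LENGTH );
      }

      // The reference of a configuration is the digest of exactly the same
      // human-readable text that is written to disk for it.
      std::string reference( const AlgorithmConfig& cfg ) {
          std::ostringstream os;
          os << cfg;
          return md5hex( os.str() );
      }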

  12. Idea… (not yet tried)
  In order to make managing TCKs even easier, would like to turn them into hash trees
  • i.e. explicitly put the MD5 sums of the dependents of a given configuration into its configuration file (see the sketch below)
  • Basic idea: if I want to know whether the configuration of a sequencer changed, one needs to check (recursively!) whether the configurations of its dependents have changed -- so why not put an explicit reference to the MD5 sums of the configurations of the (direct) descendants into the configuration
  • No impact on the interface to the Gaudi side: it just gets a set of (algorithm) configurations, and doesn't care how this set was obtained
  • Management is now easier: only need to know the MD5 sum of the 'top' algorithm(s), and can now uniformly navigate and collect the configurations of all dependents
    • No need to interpret the 'Members' property as being special
  • Answering questions such as "which TCKs used the same 'hadron alley' configuration?" is now even easier: just get the (single!) MD5 sum of the 'top' of the hadron alley, and verify which TCKs (eventually) refer to it
    • Implicitly checks all dependents
  • Can improve 'divide and conquer': ask each 'alley' for a (small) set of configurations, and the answer will be a (small) set of MD5 values (and the corresponding configuration files!). Now one just needs to 'compose' them appropriately
    • Changes at a lower level get propagated 'upwards' automatically
  • A TCK now just refers to (a vector of) MD5 values of the TopAlg(s), instead of being a directory
  • Variation on this theme: ditto, but keep the tree structure separate from the actual configuration data, i.e. don't put the MD5 of the dependents in the configuration file directly; just keep a parallel tree structure which refers (by reference, i.e. MD5 sum) to the configurations
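  A minimal sketch of what the hash-tree variant amounts to, reusing the md5hex() helper from the previous sketch; the node layout is illustrative. Because the digest of a node covers its own configuration text plus the digests of its direct descendants, any change below changes the digest of everything above it:

      #include <sstream>
      #include <string>
      #include <vector>

      // One node of the configuration hash tree.
      struct ConfigNode {
          AlgorithmConfig                 config;      // this algorithm's own configuration
          std::vector<const ConfigNode*>  dependents;  // direct descendants (e.g. sequencer members)
      };

      // The reference of a node digests its own textual representation together
      // with the references of its (direct) dependents, so a change in any
      // descendant changes the reference of every node above it.
      std::string treeReference( const ConfigNode& node ) {
          std::ostringstream os;
          os << node.config;
          for ( std::vector<const ConfigNode*>::const_iterator i = node.dependents.begin();
                i != node.dependents.end(); ++i ) {
              os << "Dependent: " << treeReference( **i ) << '\n';
          }
          return md5hex( os.str() );
      }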

  13. Creating & managing sets of configurations
  • In order to generate the individual configurations, the TCKs, and their mapping, wrote an Algorithm (see the sketch below)
    • When its ::execute is called for the first time, it queries the AppMgr for a specified algorithm (e.g. 'Hlt') and dumps its configuration; then, if that algorithm has a property called 'Members', it recursively dumps their configurations as well
    • Skips the dump if the configuration is already present in the file system (algorithms can be in more than one sequence)
    • It also creates the directory corresponding to a specified TCK, and makes links from the dumps in the AlgorithmConfig directory into the TCK directory
    • Now that I've fixed GaudiSequencer (and its HLT twin HltSequencer) to use createSubAlgorithm, the special treatment of the 'Members' property can be dropped; one can just call 'subAlgorithms' to obtain a vector of dependencies
  • Cloning (and modifying) an existing TCK is (in this implementation) a matter of copying an existing TCK directory to a new one
    • Can copy certain configurations to new ones (note: never modify anything in the AlgorithmConfig directory!), modify them, compute their new MD5 sums, put them in the AlgorithmConfig directory, and link them into the new TCK directory
    • Note: the integrity of the AlgorithmConfig directory can always be checked by manually recomputing the MD5 sums of the files – those should match their filenames
  • Once settled on the real implementation, can provide some utilities to automate this type of operation
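  A minimal sketch of the recursive dump, assuming the AlgorithmConfig, operator<< and reference() pieces sketched earlier and using Algorithm::subAlgorithms() to walk the dependencies; the helpers writeIfAbsent and linkIntoTCK are hypothetical and stand for "write the file named after its MD5 sum, if not already there" and "make the link in the TCK directory":

      #include <string>
      #include <vector>
      #include "GaudiKernel/Algorithm.h"

      // Hypothetical helpers: write cfg to AlgorithmConfig/<md5> if not yet present,
      // and create the link tck/<TCK>/<algorithm name> -> AlgorithmConfig/<md5>.
      void writeIfAbsent( const std::string& md5, const AlgorithmConfig& cfg );
      void linkIntoTCK  ( const std::string& tck, const std::string& algName, const std::string& md5 );

      // Dump the configuration of 'alg' and, recursively, of its subalgorithms.
      void dumpConfiguration( const Algorithm& alg, const std::string& tck ) {
          AlgorithmConfig cfg( alg );
          const std::string md5 = reference( cfg );
          writeIfAbsent( md5, cfg );                // one file per unique configuration
          linkIntoTCK( tck, alg.name(), md5 );      // the TCK directory merely refers to it
          const std::vector<Algorithm*>* subs = alg.subAlgorithms();
          for ( std::vector<Algorithm*>::const_iterator i = subs->begin(); i != subs->end(); ++i ) {
              dumpConfiguration( **i, tck );
          }
      }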

  14. Known Issues / Discussion
  • Right now, all properties are part of a configuration
    • Easy to do, but maybe not what is wanted
    • Not all properties actually affect the behaviour of algorithms
    • Adding or (re)moving properties (even if they don't matter, even in a base class) will result in a new configuration
  • Configuration of Tools and Services is completely ignored for now
    • But again it seems 'just' a matter of recording their properties, updating them, and calling reinitialize
  • Configurations are now created by interrogating the AppMgr during execution – but I suspect one could do the same given the Configurables for a job
    • But that requires a migration to Configurables first…
    • Cannot go through the JobOptionsSvc, as it does not know the default values
      • which is a problem when switching from a non-default setting back to the default: the new setting needs to be specified to the JOS, so defaults need to be part of the configuration
  • Size doesn't seem to be a problem:
    • A typical configuration file as plain text is about 1 to 2 KB
    • Assume 10 different configurations each day, 100 days per year -> 1000 configurations / year
    • Assume 100 algorithms per configuration
    • If every algorithm were to change (really, really pessimistic!), this would be 100 -- 200 MB / year
  • The setup could be used outside of the HLT…
    • Grid, …
    • In general, I'd prefer a non-HLT-specific solution that can be / is re-used by others ;-)
  • No thought yet on a 'production' / 'online' setup – would first like to know whether the concept is acceptable, and find out what constraints need to be satisfied
  • Very much open to suggestions & help
