Validation related issues
Auger-Lecce, 10 November 2009
• BuildBot introduction
• BuildBot @ pbsfarm
• Site Wide Installation
• Issues related to Installation/Configuration/Validation
• Updates on ValidationTests
• Conclusions and Outlook
BuildBot – Introduction
BuildBot is the system used in Auger to automate the compile/test cycle that validates code changes. Because the tree is automatically rebuilt and tested each time something changes, build problems are pinpointed quickly. Because the builds run on a variety of platforms, developers who cannot test their changes everywhere before check-in will at least know shortly afterwards whether they have broken the build. The overall goal is to reduce tree breakage and to provide a platform for running tests and code-quality checks. The Validation environment uses BuildBot as its automated testing framework.
BuildBot works with a master/slave daemon scheme. The master receives change notifications from the SVN server and tells the buildslaves to check out, build and test the code. Multiple slaves can run on different platforms. The slaves report their results to the master, which posts them on a waterfall display and sends an email to the appropriate person(s) when problems are found.
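For illustration only: one build on a slave can be thought of as a list of named steps (checkout, compile, test, ...) run in order, where the first failing step stops the build and marks it as broken on the waterfall. The function `run_build` and its NAME/COMMAND argument pairs are hypothetical; real BuildBot builds are defined in the master's Python configuration file (`master.cfg`), not with a script like this.

```bash
# Hypothetical model of a single BuildBot build (not BuildBot's API):
# arguments come in pairs, NAME COMMAND; steps run in order and the
# build halts at the first step whose command fails.
run_build() {
  local name cmd
  while [ "$#" -ge 2 ]; do
    name=$1
    cmd=$2
    shift 2
    if bash -c "$cmd" >/dev/null 2>&1; then
      echo "$name: SUCCESS"
    else
      echo "$name: FAILURE"   # the step that would show up red on the waterfall
      return 1
    fi
  done
  echo "build: SUCCESS"
}
```

For example, `run_build checkout "svn update" compile "make" test "make check"` (placeholder commands) stops at the first step that fails, so later steps never run on a broken tree.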
BuildBot @ pbsfarm
• Setting up BuildBot slaves on our nodes allows us to automatically test the build/test process on our system platforms.
• A system virtual machine provides a complete system platform that supports the execution of a complete Operating System (OS).
• On pbsfarm, 2 system virtual machines have been set up:
  • auger-le64.le.infn.it: Operating System Scientific Linux 4.7, Architecture 64 bit (x86-64)
  • auger-le32.le.infn.it: Operating System Scientific Linux 4.7, Architecture 32 bit (i386)
• They emulate the real pbsfarm nodes used for simulation/reconstruction.
• The idea is to have BuildBot running on them, using a “site-wide” installation.
Site Wide Installation
• Using APE. The installation was done from the virtual machines and is located under nexus06.
• To use it, add to your .bashrc:
  • For the 64 bit architecture:
    export PATH=/nfs/argo/nexus06/gabriella/AugerOffline64Last/ape-0.98/:${PATH}
    export APERC=/nfs/argo/nexus06/gabriella/AugerOffline64Last/ape-0.98/ape.rc
  • For the 32 bit architecture:
    export PATH=/nfs/argo/nexus06/gabriella/AugerOffline32Last/ape-0.98/:${PATH}
    export APERC=/nfs/argo/nexus06/gabriella/AugerOffline32Last/ape-0.98/ape.rc
• At login, to configure the environment run:
    eval `ape sh Externals` (sets only the Externals)
    eval `ape sh Offline` (Offline settings)
• NOTE: it also works for tcsh, using eval `ape csh Externals` and eval `ape csh Offline`, after setting in .tcshrc the equivalent of the export lines (setenv PATH .... setenv APERC).
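If the same .bashrc is shared by 32- and 64-bit machines (an assumption, e.g. a home directory mounted on all nodes), the two variants can be folded into one fragment that picks the installation from `uname -m`. The helper name `ape_root_for_arch` is hypothetical; the paths are the ones listed in the slide, and the `eval` is attempted only when `ape` is actually found in PATH.

```bash
# Hypothetical .bashrc fragment: choose the site-wide APE installation
# matching the machine architecture reported by `uname -m`.
ape_root_for_arch() {
  case "$1" in
    x86_64) echo /nfs/argo/nexus06/gabriella/AugerOffline64Last/ape-0.98 ;;
    i?86)   echo /nfs/argo/nexus06/gabriella/AugerOffline32Last/ape-0.98 ;;
    *)      return 1 ;;  # unknown architecture: no site-wide install for it
  esac
}

if APE_ROOT=$(ape_root_for_arch "$(uname -m)"); then
  export PATH="$APE_ROOT/:$PATH"
  export APERC="$APE_ROOT/ape.rc"
  # Offline settings at login, only if the ape script is reachable.
  if command -v ape >/dev/null 2>&1; then
    eval "$(ape sh Offline)"
  fi
fi
```

The `i?86` pattern covers both i386 and i686, which is what `uname -m` typically reports on 32-bit x86 systems.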
Issues @ Installation/Configuration
• Problems during the Aires build/install (ape-0.98). In ape-0.98/ape.rc:
    ...
    [package Aires]
    fc = g77
    ...
  This should select g77 as the compiler in use, but it does not work: the compilation stops because gfortran (the default compiler) is not found.
• I manually changed the compiler setting directly in build/Aires/2-8-4a/config (FortCompile="g77") and then ran build/Aires/2-8-4a/doinstall 0
• Apparently things were OK, but in auger-offline-config the Aires build introduces a set of libraries from the system area that point to a Boost installed in the system; this conflicts with the Boost in the Externals and causes a crash at run time. Solved by manually changing auger-offline-config. TRAC (#34)
• It is MANDATORY to have $APERC set.
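The manual workaround and the $APERC requirement can be sketched as two small shell helpers. This assumes the Aires config file holds the compiler on a line starting with `FortCompile=` (as the FortCompile="g77" edit suggests); both function names are hypothetical and neither is part of APE.

```bash
# Hypothetical helper: rewrite the FortCompile= line of an Aires config
# file to use the given Fortran compiler; sed keeps a .bak copy.
set_aires_fortran_compiler() {
  local config=$1 compiler=$2
  [ -f "$config" ] || { echo "no such file: $config" >&2; return 1; }
  sed -i.bak "s/^FortCompile=.*/FortCompile=\"$compiler\"/" "$config"
}

# Hypothetical pre-flight check before a manual Aires build:
# APE needs $APERC, and the workaround needs g77 in PATH.
aires_preflight() {
  if [ -z "${APERC:-}" ]; then
    echo "APERC is not set: APE will not find ape.rc" >&2
    return 1
  fi
  if ! command -v g77 >/dev/null 2>&1; then
    echo "g77 not found in PATH" >&2
    return 1
  fi
}
```

A possible sequence is `aires_preflight && set_aires_fortran_compiler build/Aires/2-8-4a/config g77 && build/Aires/2-8-4a/doinstall 0`; the `.bak` file keeps the original config for comparison.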
Issues @ Validation
• After a few rounds of validation on le-32 and le-64 (see the waterfall page @ http://129.10.132.228:8010/waterfall):
• In some cases the StandardApplications are very slow (particularly on le-32), and the buildbot master kills the application, which would otherwise run forever. The problem seems to have become worse in the last few days, apparently unrelated to any code modification. The StandardApplications run involves a full Sd simulation, starting from a Corsika air shower, with the core position randomized on the array. It can sometimes happen that a core lands very close to a tank. In such a case an enormous number of particles is run through Geant (... it is not worth simulating them in such detail, since those stations are in any case “saturated”...).... Only a sequence of bad luck?!... (Notice that the SdSim events are never reproduced.)
• The example and StandardApplications runs show a difference between le-32 and le-64. On le-64 there are several:
  FDTriggerSimulatorOG:MakeMirrorEvent...TAnalysedPixelData::Analyse() – found invalid 0x7f pattern!
• Also seen by Mariangela, and present on other 64 bit build machines (see example in the waterfall). Requests to Tom Paul ... + Ralf Ulrich ... + Michael Unger and Steffen Mueller ... (FDEventLib ... responsibility) ... + HJ Mathes.....
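To check outside BuildBot whether the slowness and the 0x7f messages are reproducible, a run can be wrapped with a wall-clock limit and its log scanned for the FD message. This is a minimal sketch, assuming GNU coreutils `timeout` is available (it may be missing on older distributions) and that the message text appears verbatim in the log; the helper names are hypothetical.

```bash
# Run a command with a wall-clock limit and print OK, FAILED or TIMEOUT.
# GNU `timeout` exits with status 124 when the limit is reached.
run_with_limit() {
  local limit=$1
  shift
  timeout "$limit" "$@"
  case $? in
    0)   echo OK ;;
    124) echo TIMEOUT ;;
    *)   echo FAILED ;;
  esac
}

# Count lines containing the FD "invalid 0x7f pattern" message in a log,
# to compare the le-32 and le-64 runs of the same application.
count_0x7f_warnings() {
  grep -c 'found invalid 0x7f pattern' "$1"
}
```

For example, `run_with_limit 2h <command running the StandardApplication> > run.log` on both le-32 and le-64, then compare `count_0x7f_warnings run.log` between the two machines.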
ValidationTests
Mods for Module Sequences (StandardApplications, data Reconstruction, used as reference). PLEASE CHECK!
SRec:
• EventFileReaderOG
• EventCheckerOG
• SdCalibratorOG
• SdEventSelectorOG
• SdPlaneFitOG
• LDFFinderOG
• SdEventPosteriorSelectorOG
• SdRecPlotterOG
• RecDataLister
• RecDataWriter
• EventFileExporterOG
• SValidStore
FRec:
• EventFileReaderOG
• EventCheckerOG
• FdCalibratorOG
• FdPulseFinderOG
• PixelSelectorOG
• FdSDPFinderOG
• FdAxisFinderOG
• FdApertureLightOG
• FdProfileReconstructorKG
• RecDataLister
• RecDataWriter
• EventFileExporterOG
• SValidStore
With these Module Sequences the code is working.
To do: update the input event before committing.
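Since the modified sequences need checking, the module order in a sequence file can be compared automatically with a reference list such as the SRec or FRec list. This is a minimal sketch, assuming the ModuleSequence XML has one `<module> Name </module>` element per line; the function names and the one-name-per-line reference file are hypothetical.

```bash
# Print the module names of a ModuleSequence XML file in order,
# assuming one "<module> Name </module>" element per line.
list_modules() {
  sed -n 's|.*<module>[[:space:]]*\([^<[:space:]]*\)[[:space:]]*</module>.*|\1|p' "$1"
}

# Compare the modules of a sequence file with a reference list
# (one module name per line); prints a diff and returns non-zero
# if names or order differ.
check_sequence() {
  local xml=$1 reference=$2
  diff <(list_modules "$xml") "$reference"
}
```

For example, `check_sequence ModuleSequence.xml srec_modules.txt` (hypothetical file names) would flag a missing, extra or reordered module before the sequence is committed.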
ValidationTests
• IO work. Main idea:
  • check that new releases of Offline can read files produced with older versions.
• How to approach this:
  • Trigger the BuildBot build on EventIO changes.
  • As input, a list of reference events written with different versions.
  • A script running a read test.
  • A script running the hybrid Simulation + Reconstruction, writing the Event, plus a reco/sim test.
[Diagram: for each tagged release TAG 1 … TAG N-1, the corresponding Code 1 … Code N-1 produces a reference Sim file; the development version (DEV, N) runs the I/O read test on each reference file and also runs Sim and Rec.]
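The read test in this scheme amounts to opening every reference file (one per tagged release) with the current development build and recording which ones fail. This is a minimal sketch, assuming a reader command that exits with non-zero status when it cannot read a file; the reader and the file names are placeholders, not existing Offline tools.

```bash
# Run a reader command on each reference event file and report PASS/FAIL.
# Returns non-zero if at least one file could not be read.
# The reader is a placeholder for a program built from the DEV code.
run_read_tests() {
  local reader=$1 failures=0 file
  shift
  for file in "$@"; do
    if "$reader" "$file" >/dev/null 2>&1; then
      echo "PASS $file"
    else
      echo "FAIL $file"
      failures=$((failures + 1))
    fi
  done
  [ "$failures" -eq 0 ]
}
```

Run as a BuildBot step (e.g. `run_read_tests ./read_event reference/*`, hypothetical names), the non-zero return status makes the step fail, so a reading regression against any older TAG would show up on the waterfall.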
Conclusions and Outlook
• 2 BuildBot slaves have been set up. They allow us to automatically test the build/test process on the system platforms we use.
• Using a site-wide installation, from virtual machines that emulate the node machines and run BuildBot, maximizes our ability to pinpoint problems on our side. (The build is from the trunk with fixed Externals.) An Offline reference is available.
• Possible evolution of virtualization: Worker Node on demand for the GRID(?). A possible conservative approach: check feasibility, then do it @CNAF; if OK, propose it to the collaboration. What is the status of porting Offline to the GRID?
• The first issues from installation, BuildBot setup and validation are under study.
• For the old SREC/FREC validation tests, the ModuleSequence has been modified to bring them up to date. The code is working. Feedback needed!