
CPass0/CPass1 on LHC12f/e/d/c Updated at 10:00 on 28/08



Presentation Transcript


  1. Ever tried. Ever failed. No matter. Try again. Fail again. Fail better. (S. Beckett)
     CPass0/CPass1 on LHC12f/e/d/c • Updated at 10:00 on 28/08 • C. Zampolli

  2. To be followed up
     • LHC12f: being processed smoothly
     • LHC12e: done, waiting for LHC12d
     • LHC12d: being processed with Rev-22; failures in T0 and TRD
     • LHC12c: manual update ongoing
     • pA MC: test on LHC11f2 ongoing (merging will be submitted today)
     • pA data: run 187338 done at CPass0, but no info on MonALISA and not appearing in the CPass1 snapshot
     • pA pilot run: CTP will have an Alias file with everything defined as kCalibBarrel (to my understanding)

  3. To be followed up – II
     • kCalibBarrel trigger numbers published in the logbook (waiting for news, promised by the end of August)
     • Validation codes from the detectors: TPC and MeanVertex still missing; they will help for short runs failing in TRD due to statistics
     • Replicating the OCDB entries: new functionality implemented by Raffaele. How should it be implemented in the calib code?
       • As a member of the AliCDBManager?
       • Otherwise everybody will have to change their code…
     • Reprocessing of old runs: trigger classes defined, Aliases file to be created, then downscaling
     • Better selection of runs? Difficult, in my opinion

  4. Some diagnostics
     • During the last 2 months of CPass0/CPass1 processing, (quite) some manual intervention was needed
       • Fixing steering macros/scripts
       • Restarting CPass0 and/or CPass1
       • Triggering CPass0 and/or CPass1 manually
     • Main reasons (from memory… I might be forgetting something)
       • Wrong AddTaskTPCcalib.C committed to the release by mistake during synchronization
       • Merging of syswatch trees not properly tested and consuming too much memory
       • Wrong TPC OCDB update in the makeOCDB.C macro for CPass1
       • Wrong TPC gain threshold used for validation

  5. Some diagnostics – II
     • Reprocessing of LHC12d due to a bug in the TRD reconstruction
     • Re-reprocessing of LHC12d due to a problem with the TRD code in Rev-23
     • Some LHC12e runs to be reprocessed after a fix in the aliases files, due to "miscommunication" (mis = missing + wrong) between TRD, RC, Trigger, and calibration
     • CPass1 manual triggering for runs that failed in T0 at CPass0 (1 done, 20 to be done)
     • CPass1 manual triggering for a run for which CPass0 was merged manually (Raphaelle)
     • CNAF disk full
     • ALICE::CERN::T0 issue

  6. Two more comments…
     • As already said in July, no modification in AliRoot that may affect the calibration should be requested for porting to the Release unless it has been properly tested in the calibration train on the grid
       • I cannot know whether changes may affect the calibration; the detector experts should
     • Since apparently it is not enough to show updates at the usual Monday Offline, Tuesday RC, Thursday Offline Calibration Readiness and Friday Calibration meetings, I think it would be important that:
       • One person representing all the detectors taking part in CPass0/CPass1 should always be present at the calibration meetings
       • If the person(s) directly responsible is not available, someone representing the corresponding detector should participate anyway, to propagate the information discussed there

  7. How to decide when to process a run
     • Currently, we process runs marked as good (DAQ flag), with duration > 5 min, GRP ok, and with Beam
     • Could this be improved? Hard to say… Not on the offline side, at least…
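The selection described on slide 7 can be written down as a simple predicate. The following is a minimal sketch only: the `RunInfo` record and its field names are hypothetical and are not taken from the logbook or from AliRoot; only the four criteria (DAQ good flag, duration > 5 min, GRP ok, with Beam) come from the slide.

```python
from dataclasses import dataclass

# Threshold quoted on the slide: runs must last more than 5 minutes.
MIN_DURATION_MIN = 5.0


@dataclass
class RunInfo:
    """Hypothetical per-run record; the field names are illustrative only."""
    run_number: int
    daq_good: bool       # run marked as good (DAQ flag)
    duration_min: float  # run duration in minutes
    grp_ok: bool         # GRP entry is ok
    with_beam: bool      # run was taken with beam


def should_process(run: RunInfo) -> bool:
    """Return True if the run passes all four criteria from the slide."""
    return (
        run.daq_good
        and run.duration_min > MIN_DURATION_MIN  # strictly greater than 5 min
        and run.grp_ok
        and run.with_beam
    )
```

Under this reading a run of exactly 5 minutes is rejected, since the slide states "duration > 5 min".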

  8. LHC12f

  9. Summary table – on 28/08 at ~10:00 – LHC12f
     • 69 in logbook
     • Filters used: LHC12f, PHYSICS, Good Run, GRP ok at least one of [SDD, TPC, TRD, TOF, T0], with Beam
     • CPass0:
       • Snapshot: 69
       • Reco+CalibTrain: 69
       • Merging+OCDB: 69, 1 of which running
     • CPass1:
       • Snapshot: 49
       • Reco+CalibTrain: 49
       • Merging+OCDB: 44

  10. Summary table – on 28/08 at ~10:00 – CPass0 – LHC12f
      • COSMICS: 0 (failure expected)
      • EMCAL/PHOS/MUON: 13 (failure expected)
      • No triggers: 0 (failure expected, too short run)
      • EE/EV/Expired: 0 (memory issue during the merging, under investigation)
      • Running: 1
      • Others (detectors): 5 (but all short runs)
      • Successful: 55, but 1 (187338) has no logs in MonALISA
      • 55/(55+5) = 91.7% success rate
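The success rate on slide 10 counts only runs that were expected to succeed: expected failures (COSMICS, EMCAL/PHOS/MUON, no triggers) and running jobs are left out of the denominator. A small sketch of that arithmetic follows; the function name and its parameters are illustrative, not part of any existing tool.

```python
def success_rate(successful: int, detector_failures: int, other_failures: int = 0) -> float:
    """Percentage of successful runs among runs expected to succeed.

    Expected failures and still-running jobs are excluded by the caller,
    matching the way the percentage is quoted on the slide.
    """
    total = successful + detector_failures + other_failures
    if total == 0:
        raise ValueError("no runs to evaluate")
    return 100.0 * successful / total


# LHC12f CPass0: 55 successful, 5 detector failures.
print(f"{success_rate(55, 5):.1f}%")  # prints 91.7%
```

The optional third argument covers cases where a non-detector failure is also counted, as in the LHC12d figure 147/(147+28+1).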

  11. Summary table – on 28/08 at ~10:00 – CPass0 – LHC12f
      • 12 min, 4874 events / 43825 tracks
      • 6 min, 11111 events / 114733 tracks
      • 7 min, 11242 events / 107505 tracks
      • 7 min, 11089 events / 138408 tracks
      • 5 min, 11138 events / 154946 tracks
      All failures are due to too short runs (numbers of events/tracks given in terms of events used by the TRD calibration)

  12. Summary table – on 28/08 at ~10:00 – CPass0 – LHC12f

  13. Summary table – on 28/08 at ~10:00 – CPass1 – LHC12f
      • Of the 55 successful runs:
        • 49 at CPass1 reco+CalibTrain
        • 44 at CPass1 merging+OCDB

  14. LHC12e

  15. Summary table – on 28/08 at ~10:00 – LHC12e
      • 27 in logbook
      • Filters used: LHC12e, PHYSICS, Good Run, GRP ok at least one of [SDD, TPC, TRD, TOF, T0]
      • CPass0, completed:
        • Snapshot: 27
        • Reco+CalibTrain: 27
        • Merging+OCDB: 27, 21 useful, 14 ok
      • CPass1, completed:
        • Snapshot: 15
        • Reco+CalibTrain: 15
        • Merging+OCDB: 15

  16. Summary table – on 28/08 at ~10:00 – CPass0 – LHC12e
      • COSMICS: 0 (failure expected)
      • EMCAL/PHOS/MUON: 6 (failure expected)
      • No triggers: 0 (failure expected, too short run)
      • EE/EV/Expired: 0 (memory issue during the merging, under investigation)
      • Running: 0
      • Others (detectors): 10, of which 3 recovered so far for TRD, 7 remaining
      • Successful: 11, which became 14
      • 11/(11+10) = 52.4% success rate, which became 14/(14+7) = 66.7%

  17. Summary table – on 28/08 at ~10:00 – CPass0 – LHC12e
      [Table: three runs of 14 min each; event counts missing]
      • TRD:
        • (*) suffered from a missing class (CSPI8WU-S-NOPF-ALL) in the configuration during data taking
          • Fixed manually using CINT8WU-S-NOPF-ALL
          • CPass0/1 should be re-run
        • (**) suffered from statistics (186459 has CSPI8WU-S-NOPF-ALL, but with zero triggers)
      • T0 suffers from high background, but limits will be increased
        • Re-running will be ok (but CPass1 should be triggered manually if a Rev < Rev-23 is used)

  18. Summary table – on 28/08 at ~10:00 – CPass0 – LHC12e – REPROCESSING
      Table annotations: Failed (statistics); Ok; CPass1 re-run!; Failing again in CPass1 as expected, but T0 experts already fixed the OCDB

  19. Summary table – on 28/08 at ~10:00 – CPass0 – LHC12e

  20. Summary table – on 28/08 at ~10:00 – CPass1 – LHC12e
      • Of the 14 successful runs, 15 at CPass1 (one more since 186601 was inserted manually!):
        • 15 at the snapshot
        • 15 at CPass1 reco+CalibTrain
        • 15 at CPass1 merging+OCDB

  21. Actions
      • COMPLETED
      • Since the period was too short, the manual update should be done together with LHC12d: waiting for this period to be completed

  22. LHC12d

  23. Summary table – on 28/08 at ~10:00 – LHC12d
      • 224 in logbook
      • Filters used: LHC12d, PHYSICS, Good Run, GRP ok at least one of [SDD, TPC, TRD, TOF, T0]
      • CPass0 completed:
        • Snapshot: 220
        • Reco+CalibTrain: 220
        • Merging+OCDB: 220, 176 needed, 147 ok
      • CPass1 completed:
        • Snapshot: 148 (1 more than CPass0, triggered manually after CPass0)
        • Reco+CalibTrain: 148
        • Merging+OCDB: 148, 148 needed

  24. Difference between logbook and snapshot in MonALISA
      • In the logbook, but not in MonALISA:
        • 184370 (EMCAL), 184645 (EMCAL), 185345 (ACORDE trigger), 185347 (ACORDE trigger), 185467 (still in the migration process, checking with offline)
      • In MonALISA, but not in the logbook:
        • 185190 (short run, the quality flag was changed)
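The comparison on slide 24 amounts to two set differences between the run lists. The sketch below illustrates this with the run numbers quoted on the slide; the function name is hypothetical, and the two "common" runs (184145 and 184673, both LHC12d runs mentioned elsewhere in this presentation) are assumed to appear in both lists purely for illustration.

```python
def compare_run_lists(logbook, monalisa):
    """Return (runs only in the logbook, runs only in MonALISA), both sorted."""
    logbook_set, monalisa_set = set(logbook), set(monalisa)
    only_logbook = sorted(logbook_set - monalisa_set)
    only_monalisa = sorted(monalisa_set - logbook_set)
    return only_logbook, only_monalisa


# Illustrative subset of LHC12d: 184145 and 184673 are assumed to be in both lists.
logbook = [184145, 184370, 184645, 184673, 185345, 185347, 185467]
monalisa = [184145, 184673, 185190]

only_logbook, only_monalisa = compare_run_lists(logbook, monalisa)
print("In logbook, not in MonALISA:", only_logbook)
print("In MonALISA, not in logbook:", only_monalisa)
```

The reasons attached to each run on the slide (EMCAL, ACORDE trigger, migration, changed quality flag) still have to be looked up by hand; the set difference only identifies which runs need that check.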

  25. Summary table – on 28/08 at ~10:00 – CPass0 – LHC12d
      • COSMICS: 9 (failure expected)
      • EMCAL/PHOS/MUON: 33 (failure expected)
      • No triggers: 2 (failure expected, too short run)
      • EE/EV/Expired: 1 (memory issue during the merging, but then merged manually)
      • Running: 0
      • Others (detectors): 28
      • Successful: 147
      • 147/(147+28+1) = 83.5% success rate

  26. Summary table – on 28/08 at ~10:00 – CPass0 – LHC12d
      • Also TRD
      • 16 recovered by rerunning with looser constraints for validation (run 185460 not retried, since it failed in TRD anyway)

  27. Summary table – on 28/08 at ~10:00 – CPass0 – LHC12d
      • Hardware problem, fixed now

  28. Summary table – on 28/08 at ~10:00 – CPass0 – LHC12d

  29. Summary table – on 28/08 at ~10:00 – CPass0 – LHC12d
      • Also TPC
      • Merged manually

  30. Summary table – on 28/08 at ~10:00 – CPass1 – LHC12d
      • Of the 147 successful runs:
        • 148 at CPass1 reco+CalibTrain
          • 1 more than CPass0, since CPass0 was merged manually and the objects were uploaded manually to the OCDB (184673)
        • 148 at CPass1 merging+OCDB…
          • …of which 147 successful (ignore the red TPC color)…
          • …1 failed in TRD (184145)…
      • Different statistics for CPass0 and CPass1
        • 480/480 chunks at CPass0
        • 472/480 chunks at CPass1

  31. TRD issue
      • Due to a problem in the TRD reconstruction, some wrong OCDB entries were produced at CPass0; it is not possible to get the correct ones without re-running CPass0
      • Some manual OCDB update is needed (after LHC12d is fully processed, ongoing for completed runs): DONE
      • Then CPass0/CPass1 should be re-run with a Rev > Rev-18
        • Rev-23 (the latest) was used
        • Changes in the TRD code made the calibration not work properly
        • More tests, new re-running with Rev-22
      • Will the failed runs be recovered? Waiting for the experts' reply: still not known

  32. Actions
      • CPass0 completed
      • 20 runs failed at CPass0 due to T0 hardware problems
        • CPass1 should be triggered manually for these runs
        • To be done after reprocessing, since now it would be useless (they all contain TRD)
      • Re-running with Rev-22… ongoing

  33. Summary table – on 28/08 at ~10:00 – CPass0 – LHC12d – Failures after reprocessing
      • 12 min, 11490 events / 208981 tracks, had not failed before

  34. Summary table – on 28/08 at ~10:00 – CPass0 – LHC12d – Failures after reprocessing
      • Hardware problem, fixed now

  35. LHC12c

  36. Summary table – on 28/08 at ~10:00 – LHC12c
      • 205 in logbook
      • Filters used: LHC12c, PHYSICS, Good Run, GRP ok at least one of [SDD, TPC, TRD, TOF, T0]
        • These do not coincide with those in MonALISA, since runs were queued manually for CPass0
      • CPass0 completed:
        • Snapshot: 208, 1 of which should be ignored (179444)
        • Reco+CalibTrain: 207
        • Merging+OCDB: 207, 109 needed, 93 ok
      • CPass1 completed:
        • Snapshot: 93
        • Reco+CalibTrain: 93
        • Merging+OCDB: 93

  37. Summary table – on 28/08 at ~10:00 – CPass0 – LHC12c
      • COSMICS: 37 (failure expected)
      • EMCAL/PHOS/MUON: 58 (failure expected)
      • No triggers: 3 (failure expected: too short, or not the right trigger configuration)
      • EE/EV/Expired: 0
      • Others (detectors): 16
      • Successful: 93
      • 93/(93+16) = 85.3% success rate

  38. Summary table – on 28/08 at ~10:00 – CPass0 – LHC12c

  39. Summary table – on 28/08 at ~10:00 – CPass0 – LHC12c

  40. Summary table – on 28/08 at ~10:00 – CPass0 – LHC12c

  41. Summary table – on 28/08 at ~10:00 – CPass0 – LHC12c

  42. Summary table – on 28/08 at ~10:00 – CPass0 – LHC12c
      • (*) Low statistics, recoverable
      • (*) Low statistics, not recoverable
      • (**) No SSD/SDD: number of contributors to Vertex Track = 0, TRD calibration failing, TRD fix in place; what about TPC?

  43. Summary table – on 28/08 at ~10:00 – CPass1 – LHC12c
      • Of the 93 successful runs:
        • 93 at CPass1 reco+CalibTrain
        • 93 at CPass1 merging+OCDB…
          • …of which 84 successful in CPass1 (ignore the red TPC color)…
          • …and 9 failed in T0, but these are MUON runs: they should not have gone through (different AliRoot, some changes in T0)
      • As soon as CPass1 is completed, 1 week will be given for the manual update. If that is too little (QM, holidays), we'll increase it. Then Vpass should start

  44. Actions
      • CPass0 completed
        • 9 runs failed in TPC and TRD
          • Not recoverable, no CPass1
        • 7 runs failed in TRD due to low statistics
          • TRD can recover them manually, but no CPass1 would be run after those
          • How will the other detectors mark these runs?
            • TOF, T0 bad
            • Mean Vertex good
            • TRP? TRD?
      • CPass1 completed on the available runs
      • In summary, ready for the manual update window
      • 1 week for the manual update announced: deadline on Friday 31 Aug (so far; possibly extended to Monday)

  45. Further comments

  46. Interdependencies
      • Under discussion: do EMCAL runs need calibration triggers? (PHOS does not)
        • Seems not!

  47. Further issues
      • Some reconstruction jobs fail with bad_alloc: under investigation
        • Grid tests with gdb ongoing: not much information retrievable, the jobs ran successfully
        • Valgrind test ongoing: did not show anything significant
        • Trying with Rev-21 on LHC12c, LHC12e
          • Many errors, but FPE, not bad_alloc
          • Stack trace available
        • I could not reproduce the problem; still investigating

  48. PPass
      • LHC12a and LHC12b Vpass validated: ready for Ppass
      • A patched Rev-16 was created to fix the TRD QA issue, to be used to run Ppass
        • LHC12a completed, QA feedback last week
        • LHC12b completed, QA feedback last week

  49. Calibration of old data
      • GRP/CTP/Aliases entries to be created, after defining the classes to be used for the reconstruction
      • Some downscaling might need to be applied
        • min(max(nevents/10,30000),nevents)/nevents, but we need to define nevents
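To make the downscale expression on slide 49 concrete, here is a minimal sketch that evaluates it for a few values. The slide leaves the definition of nevents open, so here it is simply treated as a positive event count supplied by the caller; the function and parameter names are illustrative.

```python
def downscale_fraction(nevents: int, min_events: int = 30000, divisor: int = 10) -> float:
    """Fraction of events kept: min(max(nevents/10, 30000), nevents) / nevents.

    Keeps one tenth of the events, but never fewer than `min_events`,
    and never more than the events actually available.
    """
    if nevents <= 0:
        raise ValueError("nevents must be positive")
    kept = min(max(nevents / divisor, min_events), nevents)
    return kept / nevents


for n in (20_000, 100_000, 1_000_000):
    print(n, downscale_fraction(n))
```

With these defaults, a sample below 30000 events is kept in full (fraction 1.0), a sample of 100000 events is downscaled to 30000 (fraction 0.3), and samples above 300000 events are reduced to one tenth.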

  50. pA
      • Since MB will be the main trigger, we propose to use that and downscale
      • For the pA pilot run, all data are requested to be reconstructed, keeping ESDs, friends, and ITS RecPoints
      • Tests on LHC11f2 ongoing: feedback will be asked for
