
Corrupted MC data chunks

Corrupted MC data chunks. Offline weekly, July 7, 2012. The issue: as reported by PWG-LF, numerous sub-jobs from the LHC11b10a MC production have no global tracks (back-propagated ITS tracks), which causes a drop in matching efficiency and incorrect normalization factors. In that production, the effect is 3.5% (+1.2%).


Presentation Transcript


  1. Corrupted MC data chunks. Offline weekly, July 7, 2012

  2. The issue
     • As reported by PWG-LF, numerous sub-jobs from the LHC11b10a MC production have no global tracks (back-propagated ITS tracks)
     • This causes a drop in matching efficiency and incorrect normalization factors
     • In the above production, the effect is 3.5% (+1.2%)
     • Full report in Savannah
     • The effect is present only in MC

  3. Forensics
     • If a file (Trigger.root) is not created during the simulation phase, the string of detectors in the trigger cluster is left empty and all ITS layers are skipped (no ITS tracks)
     • The error generates only a warning in the reconstruction:
       W-AliReconstruction::GetEventInfo: No trigger can be loaded! The trigger information will not be used!
     • The conditions for this always arise in the late part of the simulation, usually, but not always, during digitisation

  4. Forensics (2)
     • Two ‘events’ have been discovered so far:
       • AliRoot aborts during a failed access to OCDB (biggest contributor)
       • Silent crash, no specific error
     • The AliRoot abort generates an ‘Abort’ signal, which should have been printed in sim.log (redirected from the standard error stream)
     • However, in some cases it does not appear…
     • … and subsequently it is not caught by the job validation script
     • The silent crash is not caught by any of the ‘per job’ validations
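The dependence on the word ‘Abort’ showing up in sim.log can be reduced if whatever launches the process also looks at its exit status. A minimal sketch, assuming a POSIX system and a hypothetical wrapper named `run_and_classify` (it is not part of AliRoot or of the actual job scripts): Python's `subprocess` module reports a child killed by a signal as a negative return code.

```python
import signal
import subprocess
import sys


def run_and_classify(cmd):
    """Run cmd and classify the outcome from the exit status alone.

    Hypothetical helper for illustration; the real job wrapper may differ.
    """
    proc = subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    if proc.returncode == 0:
        return "ok"
    if proc.returncode < 0:
        # On POSIX, a negative value means the child was killed by that signal.
        return "killed by " + signal.Signals(-proc.returncode).name
    return "exit code %d" % proc.returncode


if __name__ == "__main__":
    # Stand-in for an aborting process: os.abort() raises SIGABRT.
    print(run_and_classify([sys.executable, "-c", "import os; os.abort()"]))
```

A check of this kind would not help with the silent crash if that process still exits with status 0; it only removes the need for the signal name to reach the log.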

  5. Forensics (3)
     • The defective jobs are not caught by:
       • The validation script – parses only *.log, not stderr/stdout
       • The per-job CheckESD macro, which succeeds also in the ‘corrupted’ case
       • The per-run QA – there is a ‘hint’, but it is diluted as the error is at the ~4% level
     • … In addition, the mean vertex cut eliminates the events
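The gap in the validation script (only *.log files are parsed) can be illustrated with a scan that also covers the captured output streams. This is a sketch only: the pattern list, the file names `stdout` and `stderr`, and the function `find_error_lines` are assumptions for illustration, not the actual validation script.

```python
import os

# Hypothetical patterns; a real validation script would carry its own list.
ERROR_PATTERNS = ("Abort", "No trigger can be loaded")
# Assumed names of the captured output streams in the job directory.
STREAM_NAMES = ("stdout", "stderr")


def find_error_lines(job_dir, patterns=ERROR_PATTERNS):
    """Return (file name, line number, text) for each matching line.

    Scans every *.log file plus the captured stdout/stderr files.
    """
    hits = []
    for name in sorted(os.listdir(job_dir)):
        path = os.path.join(job_dir, name)
        if not os.path.isfile(path):
            continue
        if not (name.endswith(".log") or name in STREAM_NAMES):
            continue
        # errors="replace" keeps the scan going on undecodable bytes.
        with open(path, "r", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if any(pattern in line for pattern in patterns):
                    hits.append((name, number, line.strip()))
    return hits
```

A job would then be rejected whenever the returned list is non-empty.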

  6. Re-validation of the productions
     • Fast and indirect method – size of the sim.log
     [Figure: plot labels ‘LHC11b10a’, ‘Good production’, ‘Bad chunks, 4.9%’]
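The size-based method can be sketched as an outlier test on the sim.log sizes of one production. The slide does not state the selection criterion actually used, so the relative tolerance and the function name `flag_outlier_logs` below are assumptions; the sketch flags sizes that deviate from the median in either direction.

```python
import statistics


def flag_outlier_logs(sizes, tolerance=0.3):
    """Return chunk ids whose sim.log size deviates strongly from the median.

    sizes: dict mapping chunk id -> sim.log size in bytes.
    tolerance: hypothetical relative deviation, to be tuned per production.
    """
    if not sizes:
        return []
    median = statistics.median(sizes.values())
    return sorted(
        chunk for chunk, size in sizes.items()
        if abs(size - median) > tolerance * median
    )
```

Being indirect, a flag from this test is only a hint that a chunk deserves a closer look, not proof of corruption.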

  7. Re-validation of the productions (2)
     • Other cases and Pb+Pb
     [Figure: plot labels ‘LHC11b10c – not straightforward’, ‘PbPb, OK period’]

  8. ‘Suspicious’ cycles
     • Tested all cycles from 2010 (149 cycles), 2011 (104 cycles) and 2012 (62 cycles)

  9. Past productions remedy
     • From the above table, scan rec.log for ‘W-AliReconstruction::GetEventInfo: No trigger…’ to positively identify affected chunks
     • Ongoing…
     • Rename the ESDs and AODs in the catalogue to ‘something else’, which will not show up in the standard analysis searches
     • Mild danger for analyses that use ‘prepared’ collections – jobs will fail…
     • Merged AODs (deltas) will have to be re-merged
     • For Pb+Pb, a cut on ‘zero ITS tracks’ will eliminate the bad chunks
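The rec.log scan can be sketched as follows. The sketch assumes a local directory with one sub-directory per chunk, each holding its rec.log; that layout and the function name `affected_chunks` are hypothetical, and a real scan would go through the file catalogue instead.

```python
import os

# Prefix of the warning quoted in the forensics slide.
WARNING = "W-AliReconstruction::GetEventInfo: No trigger"


def affected_chunks(production_dir, log_name="rec.log"):
    """Return sorted chunk directory names whose rec.log contains the warning."""
    affected = []
    for chunk in sorted(os.listdir(production_dir)):
        log_path = os.path.join(production_dir, chunk, log_name)
        if not os.path.isfile(log_path):
            continue
        with open(log_path, "r", errors="replace") as handle:
            if any(WARNING in line for line in handle):
                affected.append(chunk)
    return affected
```

The resulting list would be the input for the renaming step; chunks without a rec.log are skipped here and would need separate handling.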

  10. Code fixes
     • Job validation – scan all files (implemented)
     • Per-job ‘checkESD’ macro – strengthen the script, positive feedback to validate the job
     • QA – to be discussed
     • Reconstruction logic – abort in case the Trigger.root file is not found
     • Follow-up by Offline, discussion in the weekly meetings
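The proposed change in reconstruction logic, failing hard instead of warning when Trigger.root is missing, amounts to a precondition check. The actual fix belongs in the AliRoot reconstruction code; the Python below only illustrates the logic, with a hypothetical exception class and function name, and it additionally treats an empty file as unusable, which is an assumption beyond the slide.

```python
import os


class MissingSimulationOutput(RuntimeError):
    """Raised when a file expected from the simulation step is not usable."""


def require_simulation_outputs(job_dir, required=("Trigger.root",)):
    """Raise instead of warning if a required file is missing or empty."""
    for name in required:
        path = os.path.join(job_dir, name)
        if not os.path.isfile(path) or os.path.getsize(path) == 0:
            raise MissingSimulationOutput("required file not usable: " + path)
```

Calling this before reconstruction starts turns the quiet ‘No trigger can be loaded’ case into a job failure that ordinary validation can see.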
