
Run Coordinator Report on behalf of everybody involved in Pit Operation

Pit Operation - “Luminosity Production” - is in good hands, with many devoted and competent people from experts to shifters. But as the conclusion will state, we need more to guarantee quality physics.


Presentation Transcript


  1. Run Coordinator Report on behalf of everybody involved in Pit Operation • Pit Operation - “Luminosity Production” - is in good hands, with many devoted and competent people from experts to shifters • But as the conclusion will state, we need more to guarantee quality physics • Experts would also like to be able to devote a bit of time to physics analysis • Also, luckily we left behind a very good team at CERN meeting the challenges of this week with nominal bunch intensities!

  2. Agenda • Concentrate on global topics that are of concern/interest to the entire collaboration • Will not discuss the status of the individual sub-detectors unless they affect global operation • In the past we presented a lot on how we followed and participated in the beam commissioning • Main topics • Operation up to now • Operational status and efficiency • Luminosity • Data Quality • First experience with nominal bunches • Trigger • Organization • Tools to follow operation • Shifter situation, the working model, and the needs for the future

  3. Availability • Machine and Experiment Availability • Extremely low average failure rates × an extremely high number of vital systems = 0.50 (see the sketch below) • Thunderstorms daily now! • Tripped the LHCb magnet twice already • Wednesday: • AFS problem • SPS down • Thunderstorm • VELO motion • …. • A lot for one day….. • Still we took 1 h of physics! • Wrong fill number for 30 min!
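A minimal numerical sketch of why the quoted 0.50 is plausible: the combined availability is the product of many individual availabilities, so even tiny per-system failure rates compound. The per-system number and the system count below are illustrative assumptions, not figures from the talk.

```python
# Illustrative only: assumed per-system availability and system count,
# chosen so that the product lands near the 0.50 quoted on the slide.
per_system_availability = 0.999   # assumed: each vital system is up 99.9% of the time
n_vital_systems = 693             # assumed count of vital machine + experiment systems

combined = per_system_availability ** n_vital_systems
print(f"combined availability ~ {combined:.2f}")   # ~0.50
```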

  4. Pilot Run → Physics Run • Where the plan stopped at the RC report in March:

  5. Integrated Luminosity 2010 Q2 • [Plot: integrated luminosity (Hz/mb) through Q2 2010, annotated with the running phases — Minimum Bias/HLT pass-through at MB < 100 Hz, MB < 1 kHz, HLT1 rejection, HLT2 pass-through, nominal bunches, B down] • B down ~ 7.6 nb-1 • B up ~ 5 nb-1

  6. Operational Efficiency • Cumulative (in-)efficiency logging implemented since fill 1089 • Breakdown into HV, VELO, DAQ and DAQ livetime (trigger throttling), as sketched below • Entered into the Run Database • [Plot: operational luminosity (in-)efficiencies, May 10 – June 5]
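A minimal sketch (hypothetical structure, not the actual LHCb online code) of how such a per-category inefficiency breakdown could be accumulated: for every period in which a category (HV, VELO, DAQ, DAQ livetime) was not ready, the delivered luminosity in that period is booked as lost to that category.

```python
from collections import defaultdict

def book_inefficiencies(not_ready_periods, delivered_lumi):
    """not_ready_periods: list of (t_start, t_end, category) while a category was not ready.
    delivered_lumi: callable (t_start, t_end) -> delivered luminosity in that interval."""
    lost = defaultdict(float)
    for t0, t1, category in not_ready_periods:
        lost[category] += delivered_lumi(t0, t1)   # lumi delivered but not usable
    return dict(lost)

# Hypothetical usage; categories follow the slide's breakdown, units are arbitrary.
periods = [(0, 120, "HV"), (300, 360, "VELO"), (500, 510, "DAQ livetime")]
print(book_inefficiencies(periods, lambda a, b: (b - a) * 1.0e-3))
```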

  7. Operational Procedures and Efficiency • LHCb dependence on LHC: • Short-hand page for LHC Operators and EICs • Completely automated for LHCb Shifters, requiring ‘only’ confirmations • Also Voice Assistance • VELO still to be fully integrated • Very advanced compared to ATLAS and CMS….

  8. Operational Procedure and Efficiency • LHCb State Control

  9. Operational Procedure and Efficiency • Shifter Voice Assistance • Draws attention to new information or changes • LHC Page 1, injection, optimization scans, etc • Instructions for LHCb State Control handling • HV/LV handling, BCM rearm, etc • Undesired events… • Beams lost, run stopped, magnet trip, clock loss • DSS Alarms and Histogram Alarms to be added, and voice quality to be improved • Related work in progress: clean up shifter instructions on the consoles and add a help button to all displays • Collapse of the separation bumps simultaneously in all experiments • Golden orbit established with improved reproducibility • Good luminosity already during Adjust • Optimization scan right at the start of Stable Beams, starting with the experiment with the lowest luminosity • Full VELO powering during ADJUST (TCTs at physics settings and separation bumps collapsed) • Powering of the VELO by central shifters is the next step • Future of VELO closure by Shifter/Expert being discussed • Closing Manager now very user friendly • Aim to have an “on-call” shifter for closing, preferably the same as the piquet • End-of-fill calibrations: automation?

  10. Operational Procedure and Efficiency • Work on automatic recovery from DAQ problems in progress • Added one after the other • Start testing Autopilot • Majority mechanism when configuring the farm to start a run being looked into • Farm software and storage software crashes: still room for improvement • Exclusion/recovery of problematic (sub)farms on the fly while taking data • Routine procedure for shifters • Recovery of monitoring/reconstruction/calibration farms while taking data • Faster recovery of sub-detectors without stopping the run (only the trigger) becoming a routine manoeuvre for most shifters

  11. Trigger Rate Overview • Two numbers for trigger deadtime counting • Trigger Livetime (L0) @ bb-crossings • Trigger Livetime (Lumi) @ bb-crossings • Also major improvements made on monitoring of the HLT (histograms and trends) • Both technical parameters and physics retention

  12. Systems Status • Problem with the DAQ and control switch seems solved • Storage problem earlier this year also solved • Purchase of farm during 2010 Q3, install in November during the ion run • Some outstanding subdetector problems: • Dying VCSELs in the subdetectors are a worry • SPECS connections in the OT • Control of the ISEG HV for the VELO seems solved by changing from Systec to Peak • L0 derandomizer emulation of the Beetle – about to be addressed • … • System Diagnostics Tools has been like an AOB on every agenda forever… • Alarm screen and Log viewer • Well, better to solve the problem than to add the alarm, if that works!

  13. Data Quality • Data Quality is of the highest importance now (together with the trigger) • Main problem: we need more interest/participation from people doing physics analysis • Discover and document which problems are tolerable and which are not • Impact on data quality in order to know the urgency of solving a problem → operational efficiency • Aiming at perfect data obviously matters more than 100% operational efficiency • But recoveries should be well thought through, well planned and swift, and to the extent possible coordinated with other pending recoveries! • How to classify data quality problems for different physics analyses • Establish a routine for use of the Problem Database • Checking in and checking out entries, fast feedback • Procedure outlined for decisions on detector interventions which may have an impact on data quality • Working group set up to address Online Data Quality tools and follow-up • Improvements of the histogram presenter, histogram analysis, alarms, etc • Need for trend plots, a trend presenter and a trend database being looked into • Documenting quality problems and their impacts/recoveries • Reconstruction Farm and associated histograms • More interest from subdetectors would be welcome

  14. Histogram Presenter • Shifter catalogue • Most important/significant histograms with descriptions and references • Several iterations; still needs improvements and links to severity/actions • Alarm panel from automatic histogram analysis • Associate sound/voice to alarms

  15. Data Quality • The tool for registering data quality problems – Problem Database • Shared between Online – Offline • http://lbproblems.cern.ch/ (“Problem DB” from LHCb Welcome page)

  16. Luminosity • Three sources of luminosity online • Counted by ODIN using the non-prescaled L0Calo or L0Muon trigger from the L0DU • Getting the average number of interactions per crossing and the pileup from the fraction of null crossings (see the sketch below) • Correcting the luminosity in real time → recorded luminosity • Beam Loss Scintillators, acceptance determined relative to L0Calo • Luminosity corrected for pileup • LHC collision rate monitors (BRANs) • Not yet calibrated, but in principle only used for cross-checking • The combination gives the delivered luminosity • Recorded in the Online archive, Run Database, LHC displays and logging, and LHC Programme Coordinator plots (delivered) for overall machine performance • Optimization scans are based on this combined luminosity • For offline, lumi triggers containing the luminosity counters – “nanofied” • Constantly at 1 kHz • Care needed when changing thresholds/prescales on the sources of the lumi counters
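A short sketch of the empty-crossing ("zero counting") idea behind the pileup extraction mentioned above: assuming Poisson-distributed interactions, the fraction of bunch crossings with no visible interaction gives the mean number of visible interactions per crossing, which in turn corrects the counted rate for pileup. The numbers are illustrative, not LHCb values.

```python
import math

def mu_from_empty_fraction(n_crossings, n_empty):
    """P(0) = exp(-mu)  =>  mu = -ln(f_empty): mean visible interactions per crossing."""
    return -math.log(n_empty / n_crossings)

def pileup_correction(mu):
    """A crossing with >= 1 interaction fires the counter only once, so the raw rate
    undercounts interactions; scale it by mu / (1 - exp(-mu))."""
    return mu / (1.0 - math.exp(-mu))

mu = mu_from_empty_fraction(n_crossings=1_000_000, n_empty=200_000)   # ~1.61
print(f"mu = {mu:.2f}, pileup correction factor = {pileup_correction(mu):.2f}")
```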

  17. Run Database • http://lbrundb.cern.ch/ (“RunDB” on the LHCb Welcome page) • Tool for anybody in the collaboration to get a rough idea of the data collected • Help/documentation should be linked

  18. Van der Meer Scan • Van der Meer scans • To a large extent automatic, with ODIN connected directly to the scan data received from the LHC in real time and flagging the steps in the data • Allows easy offline analysis • Has allowed a first determination of the length scales (LHC/VELO) and of the absolute luminosity (see the sketch below): • Visible L0Calo cross-section of 60 ± 6 mb (preliminary) • From MC: σ(L0Calo) = σ(L0) × 0.937 = 63.7 × 0.937 = 59.7 mb • Many things still to be verified; another vdM scan is in our planning • Also allows another method to extract the beam shapes and the VELO resolution
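For reference, a sketch of the textbook van der Meer relation used to turn scan data into a visible cross-section (generic form, not the LHCb analysis code): the peak visible interaction rate per crossing and the effective overlap widths from the horizontal and vertical scans give σ_vis = μ_vis^peak · 2π Σx Σy / (N1 N2). All numbers below are assumed placeholders, chosen only so the result lands in the ballpark of the quoted 60 mb.

```python
import math

def sigma_vis(mu_vis_peak, Sigma_x_cm, Sigma_y_cm, N1, N2):
    """Visible cross-section in cm^2 from a van der Meer scan:
    sigma_vis = mu_vis_peak * 2*pi*Sigma_x*Sigma_y / (N1 * N2)."""
    return mu_vis_peak * 2.0 * math.pi * Sigma_x_cm * Sigma_y_cm / (N1 * N2)

# Placeholder inputs (assumed): overlap widths of ~90 um, bunch populations of 1e10 protons.
sig = sigma_vis(mu_vis_peak=0.012, Sigma_x_cm=9.0e-3, Sigma_y_cm=9.0e-3, N1=1.0e10, N2=1.0e10)
print(f"sigma_vis ~ {sig * 1e27:.0f} mb")   # 1 mb = 1e-27 cm^2 -> ~61 mb
```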

  19. Experiment Condition Analysis Tool • Access to experiment condition archive in the online system • Machine settings • Beam parameters measured by machine and LHCb • Backgrounds measured by machine and LHCb • Trigger rates, luminosities, VELO luminous region, bunch profiles • Run performance numbers, etc • Tool also produces LPC files for luminosity, luminous region and bunch profile data

  20. Commissioning Nominal Bunches • Arrived at a dead end with Qbunch ~ 2E10 (max 4-5E10) • More to understand with increasing Qbunch than Nbunch • Summer months with not all experts present • Keep up the luminosity ladder for this year → June 9 – June 25 (16 days!): 3x3@1.15E11, 7x7@5E10, 3x3@0.9E11, 3x3@0.8E11, 13x13@2E10

  21. Luminosity Ladder • Increasing number of nominal bunches through July–August • 170 kJ → 1.5 MJ (see the cross-check below) • Gain experience • Understand the already strange bunch/beam behaviour • LHC Operation does not feel ready for 0.5 – 1 MJ yet, work in progress

Scheme   Qbunch  Bunches  Colliding in LHCb  Stored energy (kJ)  Peak lumi (cm-2 s-1)  Integrated (nb-1)
2x2      1e11    2        1                  112                 2.5E29                0.005 (1 fill)
3x3      1e11    3        2                  168                 5.0E29                0.03 (3 fills)
6x6      1e11    6        4                  336                 1.0E30                0.7 (10 fills)
12x12    1e11    12       8                  672                 2.0E30                2.1 (10 fills)
24x24    1e11    24       16                 1344                4.0E30                4.9 (10 fills)

Trains needed…
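A quick cross-check (my own arithmetic, not from the slide) that the stored-energy column of the ladder is roughly N_bunches × Q_bunch × 3.5 TeV per beam:

```python
E_PROTON_J = 3.5e12 * 1.602e-19      # one 3.5 TeV proton in joules

def stored_energy_kj(n_bunches, protons_per_bunch=1.0e11):
    """Approximate stored energy per beam for the given filling scheme."""
    return n_bunches * protons_per_bunch * E_PROTON_J / 1.0e3

for n in (2, 3, 6, 12, 24):
    print(f"{n:>2} bunches: {stored_energy_kj(n):4.0f} kJ")   # ~112, 168, 336, 673, 1346 kJ
```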

  22. Machine Protection • Complete two-day internal review of the Machine and Experiment Protection • >1.5 (3) MJ • Long list of actions • Will be followed by a complete external review • Dump following lightning strike and power blackout!

  23. First Experience with Nominal Bunches • Four fills with 3x3:

Fill   Qbunch   L0Calo  Pileup  PeakLumi  Efficiency
1179   0.8E11   7500    1.2     0.15      78%  (VELO lumi-monitoring/BPM/new conf)
1182   0.9E11   16000   1.7     0.46      68%  (deadtime, HLT blocked)
1185   1.15E11  19300   2.3     0.73      85%  (RICH, VELO, …)
1186            10000   1.3     0.22      to be patched (wrong fill number but stable)
1188            16000   1.7     0.46      65%  (Storage, HLT, VELO, Trigger OK)

Rocky start!... • Old L0 settings + HLT1 + HLT2Express (stable but 15% deadtime) • Reconfiguring: new L0 settings + HLT1 + HLT2Full (30 min) • Memory and combinatorics – run died and tasks stuck… 2 hours to recover/reconfigure • New L0 + HLT1 + HLT2Express • Completely stable through the entire night

  24. Nominal Bunch in LHCb • We’ve been sailing in a light breeze up to now • Not only interaction pileup but also problem pileup • Pileup 2.3! • Occupancies • E.g. problem with the MTU size for UKL1 • Event size 85 kB (used to be 35 kB) • Storage backpressure • Running with 10% – 20% deadtime at 1500 – 2000 Hz at 85 kB (peak!) • Suspicion is that the MD5 checksum calculation limits the output (again) to 1 Gb/s • Lurking instabilities in weak individual electronics boards? • Desynchronizations, data corruption, strange errors at the beginning of fills….

  25. OT Occupancies • Peak occupancies 22%! Average >7.5% as compared to 5% in the past

  26. L0 Trigger Before and After 1E11 • (0x2710 → 0x1F) • L0-Mb (CALO, MUON, minbias, SPD, SPD40, PU, PU20) → prescale by 100 • Physics • Electron 700 MeV → 1400 MeV • Hadron 1220 MeV → 2260 MeV • Muon 320 MeV → 1000 MeV • Dimuon 320/80 MeV → 400 MeV • Photon 2400 MeV → 2400 MeV • Yet another configuration prepared • L0×HLT1 retention 2%, including HLT2 would allow going to 200 kHz • Would prefer not to use it even if we have to run with a bit of deadtime • Changed to solve the 10% – 20% deadtime problem • System completely stable with deadtime, but long to stop in case of problems…. • 10 kHz of random bb-crossings and be-, eb-, ee-crossings according to the weighting {bb: 0.7, eb: 0.15, be: 0.1, ee: 0.05}

  27. HLT Trigger • Technical problems in the HLT • HLT1 (3D) OK with 7.5% retention • HLT2Express stable but contains only J/ψ, Λ, KS, Ds→φπ, D*→D0π, BeamHalo • HLT2Full (150++ lines) has serious problems and surely a lot of unnecessary overlap • HLT2Core (81 lines) validated with FEST and the data taken during the weekend • Configured in pass-through now to test it and check the output before we have to switch on rejection >6x6 • Best compromise we have for the moment together with L0 TCK 0x1F • First impression is that it was working stably during the fill this night • Processing time for the HLT with HLT2Express observed to be 140 ms… • 450 nodes × 8 tasks / 140 ms ≈ 26 kHz! (arithmetic sketched below) • To be followed up • Should see how this develops with HLT2Core during this night’s fill • Two measures to partly solve the bad memory behaviour and stuck tasks already done • Activating swap space on the local disk of the farm nodes improved the situation significantly • Automatic script prepared which would kill the leader • Requires careful tuning and testing since the memory spread is narrow • Memory/disk in Westmere machines?
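The 26 kHz figure above is just a capacity estimate: the number of concurrently running HLT tasks divided by the per-event processing time. A one-line check with the numbers taken from the slide:

```python
nodes = 450                 # farm nodes
tasks_per_node = 8          # HLT tasks per node
time_per_event_s = 140e-3   # observed processing time with HLT2Express

max_input_rate_hz = nodes * tasks_per_node / time_per_event_s
print(f"~{max_input_rate_hz / 1e3:.0f} kHz")   # ~26 kHz
```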

  28. Trigger Strategy • We managed to take a lot of data containing the full natural mixture of pileup • Invaluable for testing, validating and debugging the HLT • Lucky we got nominal intensity now with few bunches!... • We aim hard to be flexible and should keep this spirit • But converge quickly on a compromise between physics and technical limitations • Most of all, solve bugs and tune the system • Avoid cornering ourselves in phase space now, in panic, by severe cuts • Exploring and understanding is now or never • Procedure for the release of new TCKs now works well and is efficient • But should not be abused!  • FEST is an indispensable tool for testing/debugging/validating the HLT • Make sure it satisfies the needs for the future • More HLT real-time diagnostics tools to be developed • Effect of the L0 derandomizer and trains…. • No proper emulation for the Beetle, and we are forced to exploit only half of the buffer • We currently accept all crossings… → filling scheme for autumn → 25% L0 deadtime

  29. Luminosity Tuning • Two possibilities to reduce the luminosity per bunch • Back off on beta* • Requires several days to a week of machine commissioning • Collision offset in the vertical plane • Beam-beam interaction with an offset between the beams can result in emittance growth • Follow the ongoing tests for ALICE to reduce the luminosity by a factor 30 • Hoped for detailed news from the ALICE beam-offset tests • Attempt during the end-of-fill study this morning, but not completed due to the control software • HOT NEWS while I was in the plane: seems to work fine

  30. Organization • Daily Run Meeting ~30 minutes • EVO every day • Chaired by the Run Chief • 24h summary with the Run Summary attached to the agenda (lumi, efficiency, beam, background) • LHC status and plan • Round table where experts comment on problems • Internal Plan of the Day • Minutes from the Run Meeting and other postings on Run News serve two purposes • Expert follow-up on problems • Inform the collaboration about daily operation – strive for public language in the 24h summary and the plan for the next 24h • Improve • Systematic follow-up on data quality • Check lists • Check-up on piquet routines • Invite more Run Chiefs – already discussed with several candidates • Meetings three days a week when we are ready for this (Monday – Wednesday – Friday) • Requires more discipline from piquets and efficient exchange of information directly with the people involved • Synchronize piquet take-over with overlaps

  31. LHCb Operation on the web • http://lhcbproject.web.cern.ch/lhcbproject/online/comet/Online/ • (“Status” from LHCb welcome page)

  32. Shifter Training • Shifter Training • Completely overhauled and updated training slides • Refresher course now as well • With EVO in the future • Invite piquets to go through the Shifter Histograms with the Data Managers • Insist more on newcomers taking shifts together with already experienced shifters

  33. Contributions to LHCb • In my view the experiment consists of three levels of activities: 1. Maintaining and developing everything from electronics to the last bit of software in the common interest of the experiment. 2. Producing the data we use for analysis, basically carried out by four types of shifters: Shift Leader, Data Manager, Production Manager, Data Quality Checker. 3. Consuming the data and producing physics results. • Activities 1 and 2 should not be compared and counted in the same “sum” • Activities 2 and 3 are instead coupled: “I contribute to producing the data that I analyze” • Huge benefit in taking regular shifts: learn about data quality, and have the opportunity to discuss and exchange information about problems met in your analysis of real data • Shifter situation “far from satisfactory” – what does it mean? • It means that it is vital to improve the situation by: • Maintaining current commitments • And making an additional effort, which is relatively modest when spread across all of LHCb!

  34. Shifter Situation • Shifter model based on the idea of “volunteers” • Not synonymous with “offering a favour” to people heavily involved in operating LHCb • Based on the idea of feeling responsible, in particular for your own data • We need people interested in learning about the detectors and the data they are hopefully going to use • Each group would normally find the representatives themselves, also to a large extent meaning an Experiment Link Person • Why this model? • Because we have neither the tools nor the time and strength to be bureaucratic • However, up to now not sufficiently clear on the size of the required commitments

November 2009 – July 2010      #/24h   #Shifters    #Shifts
Active Shift Leaders            3        30          660
Active Data Managers            3        61 (– Dec)  564
Active Production Managers      2        27          408
Active Data Quality Checkers    1        11           13
Total                           9       129         1768

  35. Authors and Shifters per Institute • [Plot: current normalized contribution per institute, November 2009 – July 2010]

  36. Number of Shifts per Institute per Type • [Plot: shifts per institute broken down by shift type, November 2009 – July 2010]

  37. Shifts per Shifter • [Plots: shifts per shifter for each shift type — Nov 09 – July 10 (3/24h), Nov 09 – July 10 (3/24h), Nov 09 – July 10 (1/24h), Nov 09 – Dec 10 (2/24h)]

  38. Theoretical Commitment Level • Assuming • Perfect uniform availability (no exclusion of weekends, nights) • Immediate replacement of people leaving and no lag in training new people

  39. Shift Commitment Rule of Thumb • Change: subdetector piquet coverage being increasingly assured by non-experts instead of experts • Should free the people with the ideal profile for Shift Leader shifts this year • “One available shifter taking 4-6 shifts every 2 months per 3 authors” (rough arithmetic sketched below) • Recruited 2010-2011 from: • Shift Leaders: a pool of 50-100 people with experience in commissioning/operation of LHCb • Data Managers: all authors doing physics analysis • Production Managers: a pool of 50-100 people with experience with analysis on the Grid • Data Quality: all authors doing physics analysis
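A rough consistency check of that rule of thumb (my own arithmetic; the author count is an assumed order of magnitude, the shift load per 24h comes from the shifter-situation slide):

```python
shifts_per_day = 9        # 3 Shift Leader + 3 Data Manager + 2 Production + 1 DQ per 24h
period_days = 60          # "every 2 months"
shifts_per_shifter = 5    # middle of the quoted 4-6 range

shifters_needed = shifts_per_day * period_days / shifts_per_shifter   # ~108 active shifters
authors = 700                                  # assumed order of magnitude for LHCb
shifters_available = authors / 3               # "one available shifter per 3 authors"

print(f"needed ~{shifters_needed:.0f}, available ~{shifters_available:.0f}")
```

Under these assumptions the pool implied by the rule (~230 available shifters) comfortably covers the ~110 active shifters the schedule actually needs.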

  40. LHC Schedule

  41. Conclusions • Experiment conditions are good, the machine is very clean • Data Quality • Requires fast reaction time and feedback/good communication with offline • Establish the habit and routine • No Data Quality offline now for two weeks! • Finding the appropriate compromise for the trigger and solving the technical issues is of the absolutely highest priority • Dedicate time/luminosity intelligently now • System stability: individually it is good, but multiplied by the number of systems…. • Sensitize everybody to react to any anomaly and act quickly • Big step from 10 years of MC to real data • Masochistic exercise to produce shifter statistics • Need improvements and functions in the ShiftDB tool • Great team work, spirit and perseverance • Join us to produce Your data! • LHC bunch evolution until the end of August • Up to 24 bunches with 16 colliding in LHCb = 1.55 MJ/beam

  42. Accesses • Regular opportunities for access up to now • OT opened 3 times to change an FE box • Impact on data quality • Procedure for filing access requests and handling works well • Taken care of very well by shifters, Run Chiefs and Access Piquet/RPs • Issue: • Still no instruments for measuring radioactivity in a magnetic field! → Complicates accesses where in principle the magnet could be left on
