Run Coordinator Report on behalf of everybody involved in Pit Operation

First, 2010 is a major achievement! THANKS EVERYBODY!


Seasonal Vacation!?

  • A lot of work in a relatively short Winter Stop!

     Not just a pit stop for tire exchange but rather engine overhaul…

Was it just a near miss to disaster?

  • NO! Far from it, but we are not out of the woods for 2011



2010 Challenges – Extreme conditions

  • Operational objective in retrospect:

    • Explore LHCb physics potential

      • Explore and tune detector, trigger and readout performance

  • June MD decision to go to nominal bunch intensity and THEN increase the number of bunches was very beneficial, but left a lot of uncertainty in the luminosity (evolution) per bunch

    • 80% of design luminosity reached with 344 colliding bunches instead of 2622…

(Chart: average number of visible interactions per crossing over the year, compared with the LHCb design specs)

  •  Faced with preparations without knowledge about the ultimate parameters

    • Cannot formulate running conditions and operate this way next year
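The pileup behind these numbers can be sketched with a short calculation. The inputs here are illustrative assumptions, not numbers from the slides: an LHCb design luminosity of ~2×10^32 cm^-2 s^-1 and a visible cross-section of ~60 mb. At fixed total luminosity, mu scales as L/n_b, so colliding in 344 bunches instead of 2622 multiplies the per-crossing pileup several-fold:

```python
# Back-of-the-envelope pileup estimate (illustrative assumptions, see above):
# mu = L * sigma_vis / (n_bunches * f_rev)
f_rev = 11245.0          # LHC revolution frequency [Hz]
sigma_vis = 6e-26        # assumed visible cross-section ~60 mb [cm^2]

def mu(lumi_cm2s, sigma_vis_cm2, n_colliding):
    """Average visible interactions per crossing."""
    return lumi_cm2s * sigma_vis_cm2 / (n_colliding * f_rev)

# 80% of an assumed 2e32 design luminosity, with 344 vs 2622 bunches:
print(mu(0.8 * 2e32, sigma_vis, 344))    # high pileup, a few per crossing
print(mu(0.8 * 2e32, sigma_vis, 2622))   # well below one per crossing
```

With these assumed inputs the 344-bunch case lands near mu ~ 2.5, consistent with the operating point quoted later in the talk.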



2010 Challenges - Commissioning

  • 89 physics fills

  • Very limited daytime to properly commission and tune the trigger with non-CERN-based experts/developers, and a continued increase of pile-up/bunches



Global Operational Performance

  • Main source of operational difficulty

    • Changing ‘surprise’ conditions rather than extreme conditions

EFF Upgrade!

  • CMS: 43.2 pb-1 / 47.0 pb-1 : 91.2%

  • 84% usable by any analysis

  • >92% for muons only

  • Atlas: 45.0 pb-1 / 48.2 pb-1 : 93.6%

  • 93 – 98 % efficiency

    • Except a few one-off problems and “shock-m”

  • Most luminosity delivered with largest geometrical reduction factor

    • We only got 42 pb-1 delivered out of the promised 50 pb-1



Luminosity Discrepancy

  • Systematic luminosity difference IP1/5 and IP8 – Not understood

    • Geometrical factor

      • July – August: LHCb 2 × 270 μrad: 8-9% as compared to Atlas/CMS with 0 μrad

      • B up + external angle: LHCb 2 × (270 − 100) μrad: 3% as compared to Atlas/CMS with 200 μrad

      • B down + external angle: LHCb 2 × (270 + 100) μrad: 9% as compared to Atlas/CMS with 200 μrad

    • Normalization – work starting up to normalize via Alice

    • β* / waist effect? → Observations of strange geometrical effects during scans

(Plot: IP8 vs IP1/5 luminosity ratio, July – October, with the B-down period marked)

  • Will not be an issue in 2011 as soon as we reach our maximum total luminosity

    • 2×10^32 – 5×10^32 cm^-2 s^-1


Trigger Compromise

  • We received individual requests, complaints, and praise for our struggle

    • It wasn’t always easy: many PPG-OPG evening and weekend email exchanges to understand and find the best solutions

(Chart annotations: HLT rate (t0) ~2.5 kHz; luminosity (t0) ~1.5×10^32; mu (t0) ~2.5; trigger deadtime (t0) ~1%; L0 rate (t0) ~350 kHz; TCK changes marked; integrated luminosity per period: 2.2 pb-1, 19.1 pb-1, 12.7 pb-1)



End of Fill Procedure

Beam Dump handshake

  • Lost almost 0.8 pb-1 between beam dump warning and actual dump in total!

  • Modification

    • Movable Device Allowed flag will become “TRUE” also in BEAM DUMP mode

    • Dump handshake remains the same

    • But we no longer “protect” the VELO by dumping the beam if the VELO is not in garage position when LHC intends to dump the beam….

    • May still retract VELO but more room for flexibility in software

    • INJECTION and ADJUST logic remains the same obviously

  • Total

    29 handshakes out of 58 fills

    Sum of differences (Total – LHCb) = 100 min

    Luckily most fills were lost! ;-)

    (Chart: ΔT(WARNING → READY) [min] per handshake for LHCb)



    Normalized Fill Efficiencies

    • All fills normalized to 1 pb-1

    (Chart annotations, July – October: 94%; 233 bunches; high luminosity with high mu; commissioning trigger with nominal bunches; SD DAQ problem during 1.5 h fill; EFF upgrade; Detector Safety System; commercial hardware fault)



    Event Filter Farm ‘Real-Time’ Upgrade

    50 subfarms of 19 nodes

    • 100 servers x 4 farm nodes

    • Configuration/Start Run: >20 min → 6 min by custom-made NFS

    • Installed and commissioned in three days 5-8 October, fully ready for fill 1408

    • 50 subfarms with two new servers (= 2x4 farm nodes) installed in each

    • 19 farm nodes/sub-farm

      • 4 low-end with 8 trigger tasks, 7 middle-end with 12 tasks, 8 high-end with 20 tasks

    • Summary: 950 farm nodes with triple farm capacity

    • Another 100 x 4 farm nodes during winter stop
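The farm arithmetic above can be cross-checked in a few lines. The 950-node total is quoted on the slide; the per-sub-farm task count is derived here from the stated node mix, not quoted:

```python
# Quick consistency check of the EFF numbers quoted above.
subfarms = 50
nodes_per_subfarm = 4 + 7 + 8                  # low-, middle-, high-end = 19
tasks_per_subfarm = 4 * 8 + 7 * 12 + 8 * 20    # trigger tasks (derived)

print(subfarms * nodes_per_subfarm)    # 950 farm nodes in total
print(tasks_per_subfarm)               # 276 trigger tasks per sub-farm
```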



    Operational Difficulties 2010

    • Main sources of operational inefficiencies in short

      • Changing conditions rather than extreme conditions

        • Lack of real knowledge about luminosity evolution

      • Shifter experience and instructions (not the fault of the shifter!)

      • Operational Parameters and system limits

        • Trigger rate

        • CPU consumption

          • Trigger optimized for μ ~ 1.6 and 350 bunches AND OLD FARM (luckily…)

        • Event size

        • One bottleneck was hiding another (HLT CPU → L0 bandwidth)

      • Detector stability

        • HV trips

        • HPD disabling

        • Wrong configuration

        • Desynchronization

        • OPC servers

      • Diagnostics tools, diagnostics tools, diagnostics tools!



    System Performance – School Example

    • Impressive!

    • Many things to analyze, understand and tune

      • In particular with the complete farm

    • We “lose” some nodes during running

      • I.e. for some reason ODIN doesn’t receive their event requests all of a sudden

    (Chart, fill 1453: TCK change; trigger livetime; luminosity; event request rate; system latency; lost nodes O(%); available farm nodes; destination search time)



    Global Operation Observations

    Readout 2011

    • 1 MHz L0 readout

      • Only proven on “paper” up to now (partially and momentarily with idle system)

      • A loaded system has a VERY different behaviour, as already observed in 2010

    • L0 bandwidth per TELL1/UKL1: Work in progress → Test

    • Readout network and storage bandwidth: Should be OK but recabling and additional switch

    • CPU capacity: Extensive testing with 2011 trigger

      • Challenge and work intensive for the next 6 months

      • “Trigger Boundaries” reached within 6 months

    • Load balancing: Monitoring and diagnostics

       → System Performance Overview panel

    • We need time to test all of this extensively!

    • Running at limits

      • Rate and event size (= potential deadtime) influenced by beam orbit variations with displaced beams, background (e.g. [email protected]), and de-bunching, …

      • Careful about running at margin



    Global Operation Observations

    Farm management and Controls

    • Configuration speed consolidation

    • More dynamic farm control needed

       → Majority logic in FSM on CONFIGURE and START RUN to go to READY/RUNNING

      • 10-20% is sufficient to start and prepare the rest on the fly similar to recovery mechanism

      • On CONFIGURE, de-centralize FSM logic to allow nodes to continue from state OFFLINE → READY independently of the state of the other nodes

      • Reconstruction and Monitoring Farm not needed either to (start) take data, only to take GOOD data

         → If in trouble, get them going once data taking has already started

    • Monitoring of incomplete events by counters (in e.g. Node Status panel)

    • Farm system performance overview

    • Farm log messages: Global limit per message for entire farm and not per node….

    • More (proper) use of Message Levels



    Global Operational Observations

    Trigger

    • Operational (functional) diagnostics

    • Performance monitoring

    • We really don't have many knobs; in the HLT we have to do more in a shorter time than before

      Support for detector performance monitoring

    • All sub-detector scans with beam

      • Some should be performed regularly (every n pb-1)

      • Devise proper scheme for each and combinations

      • Permanent Trigger Configurations (with downscaling)

      • Express needs in terms of integrated luminosity

      • Regular scans must be supported by automated recipes

        • Work and testing during shutdown for the recommissioning in March high priority!

          Data quality

    • Too good in 2010?

      • Ad-hoc treatments of trips and other problems

    • “Should” become an issue in 2011…

      • Need to be attentive and have the tools and improve feedback

      • Watch closely experimental conditions and detector effects

    • RMS – Radiation Monitoring System should become important 2011

      • Proposal for new back-end readout to replace VME scaler

      • TFC HUGIN (throttle-OR) is a very flexible hi-speed multi-channel board!



    Approach to Running Conditions 2011

    • Note on “decision” about μ and L:

      • μ has mainly hard limits – rate × event size, CPU time, reconstruction etc.

      • L has soft limits – detector stability

      • Unknown domain of detector operation and unknown domain of accelerator operation

    • Optimize d/dμ Σᵢ hᵢ · ε_OP^(1/2) · [s/b^(1/2)] = 0 for physics output

      • where hᵢ is the importance factor for a specific physics analysis

    • Operational stability ε_OP = ε_DAQ + ε_deadtime > 95%

    • Of course we should also be able to store and process events in reasonable time

    • Ageing – No problem 2010

      • Not necessarily a problem if we assume linear relation with particle flux and we collect more usable luminosity in shorter time

      • LHCb lifetime is integrated luminosity not years

         → Focus on understanding of the ageing mechanism and prognosis

    • Technical ambition 2011

      • 2011: Luminosity increase 2-3x (in 2010: 500x between July and November)

      • Operationally aim for μ ~ 2.0 – 2.5

      • Total luminosity 2 – 5 × 10^32 cm^-2 s^-1; 3 – 4 × 10^32 cm^-2 s^-1 realistic is my feeling from last year

    • Main consequences

      • Careful to run at limit of capacity

      • Manpower to monitor and follow-up on experiment conditions and detector effects

      • Regular scans to understand ageing/detector effects and the associated luminosity penalty

      • Pre-prepared extreme and liberal alternative trigger configurations allowing for flexibility



    Luminosity Leveling by Collision Offset


    • Luminosity leveling applied several times during 2010

      • First time on July 17 and July 18

      • In the steps between trigger configurations

      • Followed bunch behaviour with VELO/BLS and no sign of problems

    • Two beam stability tests done

      • 152 bunches × 1E11 @ 150 ns up to more than 1 sigma

      • 100 bunches × 0.9E11 @ 50 ns up to 6 sigma

      • Tests with several 100 bunches and high intensity not done

    Last but most important consequence:

    Luminosity leveling is crucial to run LHCb at optimum luminosity 2011
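The leveling mechanism above relies on the standard Gaussian-overlap result: separating two equal round beams by a transverse offset d reduces the luminosity by exp(−d²/4σ²). A minimal sketch of that generic formula (not a number from the slides):

```python
import math

# Luminosity leveling by transverse offset: for round Gaussian beams of equal
# single-beam size sigma, an offset d gives the standard reduction factor
#   L(d) / L0 = exp(-d^2 / (4 * sigma^2))
def leveling_factor(d_in_sigma):
    return math.exp(-(d_in_sigma ** 2) / 4.0)

for d in (0.5, 1.0, 2.0, 3.0):
    print(f"offset {d:.1f} sigma -> L/L0 = {leveling_factor(d):.3f}")
```

A 2-sigma offset already cuts the luminosity to ~37% of head-on, which is why modest offsets give a wide leveling range.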



    L0 Rate Variation

    • L0 rate sensitive to many effects

      • Collision offset → orbit variations of 20% – 25% of beam sigma → up to 10% in rate

      • Background such as beam-gas

      • Luminosity-control communication, application, and information latency

    (Plots: L0 rate vs mu; luminosity reduction vs sigma; L0 rate vs sigma)



    Beam-Gas and Vacuum

    • No visible effect of any vacuum increase in LSS8 during 2010

    • Sensitivity at L0 trigger

      • Expect a rate of potentially visible (one track in cavern) beam-gas in LHCb at normal pressure of 1E11/1.6E14 × 20% × 11.245 kHz = 1.4 Hz/bunch

      • L0 selection efficiency 3.1% → 16 Hz @ 368 bunches

      • Also, increased probability to accept a single MB event when accompanied by beam-gas

      • For MB events with no pileup, L0 selection 3.8% → 6%

        • With high pileup the effect is less visible → estimated O(10 Hz)

      • At nominal pressure, a few 10 Hz of beam-gas at 368 bunches

      • Even increasing the pressure by 100x is no worry

      • Increasing vacuum pressure locally will only have a partial effect

      • BUT it adds to particle flux (detector stability and occupancy)
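The per-bunch estimate quoted above can be reproduced directly from the slide's own factors, taking 11.245 kHz as the LHC revolution frequency:

```python
# Reproduce the beam-gas rate estimate from the slide's quoted factors.
f_rev_hz = 11.245e3                 # LHC revolution frequency [Hz]
per_bunch_hz = 1e11 / 1.6e14 * 0.20 * f_rev_hz
print(f"{per_bunch_hz:.1f} Hz/bunch")                        # ~1.4 Hz/bunch

n_bunches = 368
l0_eff = 0.031                      # L0 selection efficiency for beam-gas
print(f"{per_bunch_hz * n_bunches * l0_eff:.0f} Hz at L0")   # ~16 Hz
```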



    L0 Rate Impact on Deadtime

    • Pure L0 Rate limited by the “L0 Derandomizer” readout scheme

      • 1 clock cycle to put an event in

      • 36 clock cycles to read an event out: (36 × 25 ns)^-1 = 1.111… MHz

      • FIFO 16 events deep

      • Common specs emulated by ODIN to regulate L0

        • Upper water mark 16 events, lower water mark 15 events

           → However, write/read controllers are more complicated

      • Exception to global specifications:

        • OTIS chip of OT – Proper emulation in ODIN all 2010

        • Beetle of VELO and ST – Work in progress

      • Consequence of no Beetle emulation:

        • Upper water mark 8 events, lower water mark at 3 events

    (Diagram: events enter from the L0 pipeline on L0 accept, pass through the write/read controller, and are read out to TELL1/UKL1)
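A toy model of the derandomizer scheme makes the watermark difference concrete. This is an illustrative simulation, not a measurement: the 36-cycle drain time and the two watermark pairs (16/15, and 8/3 without Beetle emulation) are from the slide, while the Poisson-accept model and the 1 MHz test rate are assumptions:

```python
import random

# Toy Monte Carlo of the "L0 derandomizer" scheme described above: events
# enter a FIFO (at most one per 25 ns clock cycle) and drain at one event
# per 36 cycles. ODIN throttles L0 when the occupancy reaches the upper
# watermark and releases it at the lower one; throttled accepts count as
# deadtime.
def deadtime_fraction(accept_prob, upper_wm, lower_wm,
                      n_cycles=500_000, seed=1):
    rng = random.Random(seed)
    occupancy, drain_timer, throttled, lost, accepts = 0, 0, False, 0, 0
    for _ in range(n_cycles):
        # Drain side: one event leaves after every 36 busy cycles.
        if occupancy > 0:
            drain_timer += 1
            if drain_timer >= 36:
                occupancy -= 1
                drain_timer = 0
        else:
            drain_timer = 0
        # Throttle hysteresis between the two watermarks.
        if occupancy >= upper_wm:
            throttled = True
        elif occupancy <= lower_wm:
            throttled = False
        # A would-be L0 accept this cycle.
        if rng.random() < accept_prob:
            accepts += 1
            if throttled:
                lost += 1          # accept suppressed -> deadtime
            else:
                occupancy += 1
    return lost / max(accepts, 1)

p = 1.0e6 * 25e-9   # assumed 1 MHz L0 rate expressed per 25 ns cycle
print(f"watermarks 16/15: {deadtime_fraction(p, 16, 15):.1%} deadtime")
print(f"watermarks  8/3 : {deadtime_fraction(p, 8, 3):.1%} deadtime")
```

Because 1 MHz is already 90% of the 1.111 MHz drain rate, the shallower 8/3 watermarks throttle far more often, illustrating why the missing Beetle emulation costs deadtime.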



    Derandomizer & L0 Rate & Filling Schemes

    • Deadtime effect of running at high rate with few bunches

      • Deadtime worse with fewer bunches!!

    (Plot: physics trigger deadtime for 50 ns/600, 50 ns/800, 25 ns/2440 and 75 ns/670 colliding bunches)



    Injection – LHCb a Sitting Duck

    • Injection Losses from un-captured beam

      • Already difficult in 2010 with 0.3%

      • Expected to get worse 2011 with up to 1%

    • Culmination on October 30 with 8b-injections

      • Shot blew a fuse in CALO HV distribution!

      • 30% BCM levels agree with 30% BLM levels

      • We almost became a show stopper

    • Immediate actions

      • LHC: SPS 800 MHz cavity problem and SPS scraping

      • LHCb: Disable 40ms logic during injection phase and raise thresholds (2x-3x)

        • Done in a few hours

      • LHC: Investigate using shifted Abort Gap Cleaning during injection

      • Check timing and origin of splashes with Beam Loss Scintillators

         → Improved the situation significantly and took us through the year

    (Diagram: Beam 2 injected from the SPS via TI8 towards LHCb)



    Injection – Actions 2011

    • Actions for 2011

      • Switch off/lower HV AND LV of sensitive detectors in LHCb during injection

        • Complicated since we need to configure and run LHCb WELL BEFORE next data taking

        • Requires quite a lot of work on DAQ and CONTROL

      • Check timing and origin of splashes with Beam Loss Scintillator and BCM

        • Injection Quality information from BLS+BCM fed back to LHC on each injection

      • Shielding being investigated together with machine

      • Blind BCM during the injection shot using Injection Pulse on direct fibre from RF

      • No relaxed attitude…

    (Plot: longitudinal beam structure with SPS satellites and LHC uncaptured beam)



    Injection Schemes – Just an Idea

    • For 75ns

      • ~100    (8) + 4 x (24)

      • ~200    (8) + 8 x (24)

      • ~300    (8) + 12 x (24)

      • ~400    (8) + 8 x (48)

      • ~500    (8) + 8 x (48)  + 4 x (24)

      • ~600    (8) + 12 x (48)

      • ~700    (8) + 8 x (72) + 4 x (24)

      • ~800    (8) + 8 x (72) + 4 x (48)

      • ~900    (8) + 12 x (72)

    • For 50 ns it will be similar progression to max 1400b (!!), maybe something like: 

      • ~100    (12) + 8 x (12)

      • ~200    (12) + 16 x (12)

      • ~300    (12) + 8 x (36)

      • ~400    (12) + 12 x (36)

      • ~5/600 (12) + 8 x (72)

      • ~700    (12) + 8 x (72) + 4 x (36)

      • ~800    (12) + 12 x (72)

      • ~9/1000 (12) + 8 x (108) + 4 x (36)

      • ~1200    (12) + 12 x (108)

      • ~1400   (12) + 12 x (108) + 4 x (36)
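The rounded "~N" labels in the 75 ns list above can be checked against the exact sums; this is a mechanical bookkeeping sketch (the 50 ns list can be verified the same way), where each scheme is an initial shot plus a number of multi-bunch injections:

```python
# Exact bunch totals behind the rounded "~N" labels of the 75 ns schemes.
schemes_75ns = [
    (8, [(4, 24)]),            # "~100"
    (8, [(8, 24)]),            # "~200"
    (8, [(12, 24)]),           # "~300"
    (8, [(8, 48)]),            # "~400"
    (8, [(8, 48), (4, 24)]),   # "~500"
    (8, [(12, 48)]),           # "~600"
    (8, [(8, 72), (4, 24)]),   # "~700"
    (8, [(8, 72), (4, 48)]),   # "~800"
    (8, [(12, 72)]),           # "~900"
]

def total(first_shot, shots):
    """First shot plus n_shots x (bunches per shot) for each injection type."""
    return first_shot + sum(n * b for n, b in shots)

print([total(f, s) for f, s in schemes_75ns])
# [104, 200, 296, 392, 488, 584, 680, 776, 872]
```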



    LHCb Re-commissioning Plan 2011

    PRELIMINARY

    • Luminosity ramp: back up to 300 in 50-bunch steps with 75 ns

      • → 3 weeks, 2-3 days per step



    Bunch Ramp Up

    • From Mike Lamont:

      • 2 to 3 weeks re-commissioning

        • Virgin set-up followed by full validation (loss maps, asynchronous dumps etc.)

      • 2011 – back-up to 300 in 50 bunch steps

        • Would imagine starting with 75 ns

        • 2010 around 4 days (minimum) per 50 bunch step

        • 50 – 100 – 150 – 200 – 250 – 300

        • Around 3 weeks to get back to 300 bunches

      • 100 bunch steps thereafter.

        • 400 – 500 – 600 – 700 – 800 – 900

        • 3 weeks minimum

    • Ultimate parameters for 2011 (Qb: 1.6E11 × εN: 2E-6 × Nb: 1400)



    Annual Shift Summary

    • Summary includes 2008 – 2009 – 2010 because individual function counters were not reset

      • Total: 7660 shifts equivalent to 13.4 months of running

    (Charts: number of shifts per function, and the equivalent number of months)



    Annual Shift Summary

    • Each author (507) should have contributed to 15.1 shift slots in this period

    • Total number of shifters: 297

      Each shifter contributed to 25.8 shift slots
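The per-author and per-shifter numbers follow directly from the totals quoted above:

```python
# Cross-check of the shift bookkeeping quoted above.
total_shifts = 7660
authors, shifters = 507, 297

print(round(total_shifts / authors, 1))    # 15.1 shift slots per author
print(round(total_shifts / shifters, 1))   # 25.8 shift slots per shifter
```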

    (Chart: number of shifters compared to authors, per function)



    Annual Shift Summary

    (Charts: number of pit, offline, and piquet shifts)



    Rise or Sugar

    (Chart: normalized shift contribution)



    Shifts 2011

    • Current shift situation (number of shifters we have had):

      • Shift Leader: 53

      • Data Manager: 109

      • Production: 36

      • Data Quality: 58

      • How many are still active and how many are available 2011?  Poll

      • Answer to my mail about availability for 2011 if you are already shifter

      • Answer to my call for shifters at the beginning of next year

    • Refresher and trainings in February – March

      • HV training (sensitization) and VELO closure

      • Improve training of SL and DM together with sub-detectors

    • Shifter online running instructions, help, and troubleshooting



    Conclusion

    • A huge thanks to everybody who baby-sat, operated and nursed LHCb!

      • I don’t think we can repeat this enough!

    • Stop meeting and reporting and go back to the office to take care of Our New Year Promises

    • Since I only got 20 minutes for this talk, I’ll stop the conclusion here!

      MERRY CHRISTMAS

      A HAPPY END OF 2010

      HAPPY START 2011

    1 fb-1



    Spare Slides….



    Workshops on Operation 2010

    • 2010 Running (Autopsy) Postmortem Workshop scope

      • Collect (recall!) flaws and drawbacks from 2010 operation

        • Hopefully with some associated solution

        • If not, what is needed, how do we address it?

      • Works and improvements during shutdown

        • Planning and manpower

      • Needs for re-commissioning and special runs 2011

        • Magnet OFF data preferably at 3.5 TeV

        • Etc

      • Main worries for 2011

      • Sub-detector guesstimates of luminosity tolerance

      • Manpower for next year

    • Will not summarize operational performance 2010 and whole workshop here (obviously…)

      • A veeery long to-do list – just the main points

      • http://indico.cern.ch/conferenceDisplay.py?confId=113227

      • Revisit situation end of January – beginning February

    • Also reported yesterday on all aspects of operation with beam to LHC in LPC meeting

      • Andreas reported on desiderata for 2011

      • Input to LHC workshop in Evian December and Chamonix

      • http://indico.cern.ch/conferenceDisplay.py?confId=111076

      • See Andreas’ talk next



    Detector Operation 2011

    • Purely in terms of operation all depends on detector stability

      • Operating at 50ns

        • Experiment conditions

          • Beam-beam effects from bunch behaviour

          • Background (electron cloud + IBS)

          • VELO foil temperature and HV trips

          • Displacing beams (up to several sigmas)

      • Spill-over/signal pileup

        • Spill-over effects in all detectors but RICH

        • Event size at L0

        • Reconstruction performance

           → A short 50 ns run (1 fill @ 100 bunches) allowed us to only partially address these

  • 75ns as long as possible and beneficial



    Luminosity

    • Two online sources, with several x-checks

      • LHCb detector

      • Beam Loss Scintillator

        • Independent from LHCb DAQ

        • Auto-calibrated with LHCb detector while running

        • Very reliable and versatile

          Combination sent to LHC as delivered lumi

    • Applications

      • Injection quality

      • Background with high time resolution

      • Beam-gas rate monitoring and veto in trigger

      • Luminosity

      • Debunched beam

    • Upgrade of BLS during shutdown

      • Faster PMT (quartz) + cable → no spill-over, and additional scintillators



    Longitudinal Scan

    • Shifting timing of beam 2 by ±1 ns

    (Diagram labels: X (IP, t=0); t ~ +dT/2; Z; ~30 mm (20 mm with 90 μrad); ~10 mm (5 mm with 25 μrad); ~200 mm)
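The longitudinal displacement per nanosecond of timing shift is simple kinematics (a generic formula, not a number from the slide): two counter-rotating bunches meet where their arrival times coincide, so a relative shift dt moves the collision point by c·dt/2:

```python
# Longitudinal shift of the luminous region per unit of beam-timing shift:
#   delta_z = c * dt / 2
c = 299_792_458.0   # speed of light [m/s]

def z_shift_mm(dt_seconds):
    return c * dt_seconds / 2.0 * 1e3

print(f"{z_shift_mm(1e-9):.0f} mm per ns of timing shift")   # ~150 mm
```

A ±1 ns scan therefore sweeps the luminous region over ~±150 mm in Z, consistent with the ~200 mm scale on the diagram.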



    Longitudinal Scan

    • Several questions about results:

    • Indicates something fundamental?

      • T0 good for VELO (Z~0)

      • Bad transverse optimization?

      • Lumi region z-size?

    • Should have done mini-scan after

    • Repeat next year!

    (Plot annotations: ratio ~9%; did we lose optimization?; specific luminosity; nominal physics; lumi region z-size decreases strongly when z<0?; luminosity)



    VDM Scan – Lumi Region Movements

    (Plots, courtesy C. Barschel: horizontal and vertical lumi-region movements, 2 beams up to 6σ; annotations: 5 mm effect from XY-rotation of 13 μrad; ~90 mm (100 mm with 90 μrad); ~40 mm (30 mm with 25 μrad); 1200 mm (6σ @ 170 μrad))

