Level-2 Interface Board Status David Saltzberg for L2 Group Level-Two Trigger Review December 7, 2001
Overview • Phase 1: L1interface, Clist, XTRPlist, SVTlist • Phase 2: ISOlist, RECES • Phase 3: Muon board (not in this talk) (Phase 1 and Phase 2 have been done in parallel)
Responsible Physicists • L1 interface: Greg Feild* • Clist: Monica Tecchio, Heather Ray* • XTRPlist, SVTlist: Matt Worcester *, Jane Nachtman, D.S. • RECES: Masa Tanaka *, Karen Byrum * • ISOlist: Steve Kuhlmann * , Bob Blair * * = lives within 50 miles of Fermilab ANL engineers (L1,reces,isolist): John Dawson, Bill Haberichter Special operatives: Stephen Miller, Ted Liu, Peter Wittich
Theory of Operation - I • Input data from “Clients” • L1 interface, RECES • one word/event, no handshake • Clist, XTRPlist, SVTlist, Isolist: • variable length data, buffered by FIFO’s • terminated by EE word • Some info transfer about BC, L2B or event count for sync checking
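The variable-length inputs above (a FIFO stream in which each event is closed by a terminator word) can be sketched as follows. This is only an illustration: the marker value `END_MARKER`, the function name `split_events`, and the assumption that no payload word ever equals the marker are all hypothetical, since the slides say only "EE word".

```python
from typing import Iterable, List

# Hypothetical terminator value; the slides only call it the "EE word".
END_MARKER = 0xEE


def split_events(words: Iterable[int], end_marker: int = END_MARKER) -> List[List[int]]:
    """Split a FIFO word stream into per-event payloads at each terminator word."""
    events: List[List[int]] = []
    current: List[int] = []
    for word in words:
        if word == end_marker:
            # Terminator seen: close the current event (it may be empty).
            events.append(current)
            current = []
        else:
            current.append(word)
    if current:
        # Data arrived but no terminator followed it.
        raise ValueError("stream ended without a terminator word")
    return events
```

A stream that ends without its terminator raises an error here, which mirrors the case of a client that never sends its EE word.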
Theory of Operation - II • Output: Control and Data signals via Magic Bus • Master mode (currently all boards except Reces) • L2P issues “STARTLOAD” • When ready, Interface board requests “Boss” • Board is granted Boss from upstream • Board drives block mode data-transfer on Bus • Boss is released by interface board, and MOD_DONE asserted • When all MOD_DONE bits set, L2P begins processing • Slave mode • Board is addressed over Magic Bus and read in single-word transfers • Alternate Output (TRKlist boards) • VME readout
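The master-mode sequence above can be written out as a toy model. It is a sketch under simplifying assumptions, not the real bus logic: the `Board` class and `run_master_mode` function are hypothetical names, every board is assumed ready immediately, and Boss is passed strictly in list order to stand in for the upstream grant.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Board:
    name: str
    payload: List[int]
    mod_done: bool = False


def run_master_mode(boards: List[Board]) -> List[str]:
    """Return the ordered log of bus actions for one event."""
    log = ["L2P: STARTLOAD"]
    for board in boards:
        board.mod_done = False
    # Assumption: Boss is handed along in list order, one board at a time.
    for board in boards:
        log.append(f"{board.name}: request Boss")
        log.append(f"{board.name}: granted Boss")
        log.append(f"{board.name}: block transfer of {len(board.payload)} words")
        board.mod_done = True
        log.append(f"{board.name}: release Boss, assert MOD_DONE")
    # L2P starts processing only once every MOD_DONE bit is set.
    if all(board.mod_done for board in boards):
        log.append("L2P: begin processing")
    return log
```

Calling `run_master_mode([Board("Clist", [1, 2]), Board("XTRPlist", [3])])` yields one STARTLOAD entry, four entries per board, and a final "begin processing" entry.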
General Error Detection & Handling • In L2P (every event, tens of kHz) • L2P has 600 sec timeout for all MOD_DONE signals • BC, L2B, or counters checked where possible, event by event • Checks for exactly 1 Magic Bus word from L1 board • If error, pull CDF_ERROR (or equivalent) and ask for automatic Halt-Recover-Run to resynch FIFO’s. • In TrigMon (~ 2 Hz) • Check number of words transferred for each board • Check BC across system • Exact bit-for-bit comparison of data vs. emulation and/or alternate source • Offline • Run select parts of TrigMon & Monica’s validation code on look area, stream-g, stream-b, l2-torture runs (~1M events lately)
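Two of the per-event checks listed above (BC agreement across boards, and exactly one word from the L1 board) could look roughly like this. The function name `check_event`, its arguments, and the error strings are hypothetical; the actual L2P code is not shown in the slides.

```python
from typing import Dict, List


def check_event(bc_by_board: Dict[str, int], n_l1_words: int) -> List[str]:
    """Return a list of error descriptions for one event (empty if clean)."""
    errors: List[str] = []
    # All boards that report a BC value should report the same one.
    if len(set(bc_by_board.values())) > 1:
        errors.append("BC mismatch")
    # The L1 interface board should deliver exactly one word per event.
    if n_l1_words != 1:
        errors.append("L1 word count != 1")
    return errors
```

A non-empty result would correspond to the case in which the error line is pulled and a resynchronization is requested.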
Testing Performance of system • Without Beam • “L2 torture” nominally runs at ~20 kHz • Occasionally have run system at ~40 kHz • Runs system with high L2B occupancy • Test patterns for COT tracks and SVT tracks, emulated clusters • 9 interface boards & up to 3 alphas connected • With Beam • Same config. • Get real XFT tracks but often have to run SVT test patterns (no SVX) • Have found other problems (sometimes systemwide) that tests w/o beam do not show. (Real world stuff that no teststand will anticipate) • Extensive tests before Oct. shutdown, preliminary Dec. tests.
Current Boss Arb. Kludge • Glitch on BOSSGROUT (pecl) when taking BOSS can lead to two boards taking Boss. Since this is in hardware (not firmware), simple glitch protection cannot be added • Solution: • Reduce collision rate by putting different delays in boards’ receiving of STARTLOAD (limits deadtimeless L1A rate to 20 kHz; we should have such problems.) • Handle remaining collisions with L2P error handling • New Backplane • In a pinch, could it be fixed with TTL?
Board-by-Board Status (follows...) • Status of “best” board • Highest rate tested & error rate • Limit on (or measurement of) bit error rate • Cooperation with other boards • Plans for further work • Status of spares • Number and status of spares • Known problems? • Status of Documentation • Debugging tools, here and elsewhere • Plans • Other comments
L1 Interface Board • L2 torture tests • tested at 20-40 kHz, no problems • tested ~1M events offline, no errors • no collisions with other boards (by construction) • Known problems • noisier than others, but protected in time • still have to connect ground shield & see • Solving noise here may solve it elsewhere
L1 Interface Board Plots No errors in bit-for-bit comparison
L1 Interface Spares & Debugging tools • Spares • S/N 1 OK • S/N 2 OK (in crate) • S/N 3 3/4 stuffed • Debugging tools • Bit for bit check available offline • If more or less than one word is sent, L2P pulls error • (Pretty simple board, no need for complex diagnostics) • Teststand: Can set bit patterns, check in realtime or later • data source: FRED • data sink: MB to emulator board
L1 Interface Documentation/Plans • DOCS • CDFNOTE 4971 • Webpage: http://hepwww.physics.yale.edu/www_info/yale_cdf/l1crate.html • Schematics have control room hardcopy • PDF files recently sent to Greg-- will put on web and in trigger room • Plans • Keep running • Finish stuffing board #3 (2nd spare) and test • Look into noise problem, not urgent. Wait until after new MB installed
CList Board • Responsibles: Monica Tecchio, Heather Ray • Gets data by fiber from each Locos board • L2 torture tests • works at 20-40 kHz, no errors • no errors found in ~1 M events offline • Known problems • crate 04 has bit 02 stuck low (probably trivial)
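A stuck bit like the one noted above can be flagged by combining many words: a bit that is never set across a sample is a stuck-low candidate, and a bit that is always set is a stuck-high candidate. The sketch below assumes a hypothetical word width and function name (`stuck_bits`); it is only meaningful when the input data would normally exercise every bit.

```python
from functools import reduce
from typing import List, Sequence, Tuple


def stuck_bits(words: Sequence[int], width: int = 16) -> Tuple[List[int], List[int]]:
    """Return (stuck-low candidates, stuck-high candidates) as bit positions."""
    if not words:
        raise ValueError("need at least one word")
    mask = (1 << width) - 1
    # OR of all words: a 0 here means the bit was never seen high.
    all_or = reduce(lambda a, b: a | b, words, 0)
    # AND of all words: a 1 here means the bit was never seen low.
    all_and = reduce(lambda a, b: a & b, words, mask)
    stuck_low = [i for i in range(width) if not (all_or >> i) & 1]
    stuck_high = [i for i in range(width) if (all_and >> i) & 1]
    return stuck_low, stuck_high
```

For example, `stuck_bits([0b1011, 0b0001, 0b1001], width=4)` reports bit 2 as a stuck-low candidate and bit 0 as a stuck-high candidate.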
Clist board plots No errors in bit-for-bit comparisons
Clist Debugging tools • Bit-for-bit comparisons done in online/offline monitoring • If L2 buffer number disagrees L2P pulls error • Clusters can be set • pulling cable in DCAS crate makes a known cluster • in principle software exists to make arbitrary cluster pattern at B0 (need to verify) • Michigan teststand capabilities: • Standalone board tests using VME • Data source: Locos • Data sink: MB & L2P • Test full clustering chain DCAS ---> L2P via MB w/ tracer generating multiple L1A’s
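The bit-for-bit comparison mentioned above amounts to XOR-ing each data word with its emulated counterpart and reporting any bits that differ. A minimal sketch, with a hypothetical function name (`diff_bits`) and word width; the real monitoring code is not reproduced in the slides:

```python
from typing import List, Sequence, Tuple


def diff_bits(data: Sequence[int], emulation: Sequence[int],
              width: int = 32) -> List[Tuple[int, List[int]]]:
    """Return (word index, differing bit positions) for every mismatched word."""
    if len(data) != len(emulation):
        # A word-count mismatch is an error in its own right.
        raise ValueError("word counts differ")
    mismatches: List[Tuple[int, List[int]]] = []
    for index, (d, e) in enumerate(zip(data, emulation)):
        xor = d ^ e  # set bits mark positions where the two words disagree
        bits = [b for b in range(width) if (xor >> b) & 1]
        if bits:
            mismatches.append((index, bits))
    return mismatches
```

An empty result means the two sources agree on every bit of every word.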
Clist Spares/Documentation/Plans • Spares • S/N 1 OK (in system) • S/N 2 flaky VME, otherwise works. • S/N 3 being stuffed • Documentation • webpage for aces, experts & non-experts • http://www-cdf.fnal.gov/internal/cdfoperations/trigger/level2/my.html • will become general L2 webpage (need more disk space) • schematics online in Michigan • hardcopies in trigger room • Plans • Keep running stably with board #1, monitor robustness • Fix flaky VME on board #2 • Make board #3 a second “hot spare”
SVTlist Board Tests • Responsibles: Jane Nachtman, Matt Worcester, D. Saltzberg • L2 Torture Testing: • 20-40 kHz L1A, no errors (SVX off, running SVT test pattern) • Tested with ~1 M events, no bit errors • Special run with checks inside alpha: BER < 10^-6 • No collisions with other boards • Problems • Gets confused if no EE word from SVT; L2P pulls error. • Due to SVX not sending info to SVT • Known problems in SVX have been fixed, others? • Bill A. thinking about an SVT timeout to pull error • Only happens with beam. Checked (painfully) before shutdown & it worked (could even have taken special Oct. SVT runs with it.) • No firmware changes to TRACKlist boards in last 2 months!
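When a test sees zero errors, the result is an upper limit on the error rate rather than a measurement. The slides do not say how their BER limits were derived or what the unit of a trial is (per event, per word, or per bit), so the following is only a sketch of the standard binomial zero-failure limit, with hypothetical function names and a confidence level chosen for illustration.

```python
import math


def ber_upper_limit(n_trials: int, confidence: float = 0.95) -> float:
    """Upper limit on the error probability after n_trials with zero errors.

    Solves (1 - p)^n = 1 - confidence for p.
    """
    return -math.expm1(math.log1p(-confidence) / n_trials)


def trials_needed(target: float, confidence: float = 0.95) -> int:
    """Error-free trials needed to claim an error probability below target."""
    return math.ceil(math.log1p(-confidence) / math.log1p(-target))
```

Under these assumptions, one million error-free trials give a 95% CL limit of roughly 3e-6, and a limit of 10^-6 takes about three million error-free trials.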
Some SVTList Plots • No errors in bit-for-bit comparisons
XTRPlist Board Tests • Responsibles: Jane Nachtman, Matt Worcester, D. Saltzberg • L2 Torture Testing: • 20-40 kHz L1A, no errors • Tested with 1 M events, no detectable errors • XTRD bank has known errors that cause Ntracks mismatch • Correct at L2, wrong in readout • No errors when cut on Ntrack agreement • Handscan of other events looks okay • No collisions with other boards • Problems • Illinois to fix XTRD bank filling errors • One bad pT bit from one XTRP board
XTRPlist plots No errors in bit-for-bit comparisons when number of tracks agrees.
Spares for TRACKlist • SVTList & XTRPlist are both instances of one board: TRACKlist • CPLD change with JTAG connector • one jumper change • Six production TRACKlist boards • Currently 2 in L2P crate--permanent • Currently 2 in SVT crate --1 or both temporary? • one makes nominal SVTD bank. Convenient for booking SVT crate for test runs • having separate boards effectively makes a cable check • another board in SVT crate makes XTRP list---could be removed soon? • Six production boards, at least 2 required in system, maybe 3. Right now using 4.
TRACKlist spares • S/N 1 & 2: (Prototypes, no longer used.) • S/N 3 XTRPlist OK (in L2P crate) • S/N 4 SVTlist OK -- used for SVTD bank • S/N 5 XTRPlist OK -- “hot spare” • S/N 6 SVTlist MB not working, bad connection • S/N 7 SVTlist stuck chisq bit for MB -- used for SVTD bank • S/N 8 SVTlist OK (in L2P crate) • All boards work for VME readout
TRACKlist debugging tools • Can send arbitrary pattern from SVT easily • Can send arbitrary pattern from XTRP (more difficult) • Bit-by-bit checking in TrigMon • Can test BC from XTRP & SVT on every event • UCLA teststand: • data source: merger board • data sink: MB and emulator board and/or VME
TRACKlist plans • Keep running stably • Fix one SVT spare (bad connection makes MB error) • Fix one bad bit on another SVT spare • Wean SVT off of second SVT board • Make sure all six boards are “hot spares” • Print hardcopies of schematics & firmware
TRACKList Documentation • Web-pages: • Specs http://buggs.physics.ucla.edu/~nachtman/board/specifications_v1.ps • TIB instructions: http://www-b0.fnal.gov:8000/level2/tib/tib_main.html • TIB database: http://www-b0.fnal.gov:8000/level2/tib/tib_status.html • TIB schematics etc: http://buggs.physics.ucla.edu/~nachtman/tib.html • Schematics on web in .eps format • Need updated hardcopies printed out
ISOlist status • Responsibles: Steve Kuhlmann, Bob Blair • Calculates 5 isolation sums • DCAS->Iso Pick -->ISOlist • Clique ->Isoclique-> ISOlist • L2 Torture tests (or cosmics) • need to require eta-phi match (~1-3% failure) • perfect at 20-40 kHz in all 5 sums • Problems • with collisions see eta-phi match (still 1-3% failure), but L2P can check and pass the event • In 0.5% of events also scatter of expected vs. seen in all 5 sums (less than analog jitter in Run 1) N.B. the whole scatter comes from crate 1, eta=17.
ISOlist spares • In DCAS crates • Need 1 ISOclique (have 2) • Need 6 isopicks (have 8, 1 with stuck bit) • In L2P crate • Need 1 ISOlist (have 2) • All spares are “hot spares” except for 1 isopick with stuck bit.
ISOlist Debugging Tools • Standard running • ISOpick times out if DCAS does not send data • Standalone code: • writes to ISOclique (only board with VME) a seed • tell it to read out fixed values to ISOlation system • can load different values for different buffer numbers • with a switch, can read energies from DCAS. Essentially this “factors” the problem. • TrigMon & Offline Code • Incorporated isolation variables into Monica’s code • Need to debug some boundary values against the hardware • Teststand at ANL • data source: ISOpick • data sink: MB to emulator board
ISOlist Documentation/Plans • DOCS • CDFnote 5788 • Schematics in hardcopy in binders at ANL but will come to trigger room • PDF files of schematics (firmware & hardware) are available, will be placed on web by Heather • Plans • Continue running & monitor robustness • Go after eta/phi mismatch (needs coordination between ANL and Michigan) • Find & fix flaky bit in DCAS crate
RECES status • Responsibles: Masa Tanaka, Karen Byrum • Four boards in L2P crate receive information from SMXR by fiber • During L2 Torture tests (36 kHz) • In crate, on backplane, but not used by default table • No negative interactions • Special L2 executable (TEST_RECES table) • L1 input is crossing trigger and 4 GeV elec, 8 GeV photon • runs at 20 kHz L1 input, 100 Hz L2A • Maybe small bit errors -- few thousand events • All SMXR to RECES is okay (at end of shutdown) • Problems • Accidental collisions on Alpha readout • Sol’ns: Arnd’s special retry readout code. Stephen will modify FPGA • possible bit errors (10^-3)
RECES Spares/Docs/Plans • Need 4 Reces boards in system • 4 in top crate OK • 2 spare boards OK • Docs • CDF 5132 • Need to put schematics on web & hardcopies in trigger room. • Plans • Keep RECES on backplane during default running • Fix readout problem • Search for BER < 10^-4 in standard datataking & fix
Reces Debugging tools • Special standalone code • VME based. Set trigger threshold, load SMXR’s • Send bit patterns to RECES board, Alpha reads through VME • Check bit-for-bit (checks all bits) • 10 Hz (tens of thousands of events OK) • ANL teststand • Not needed any more • TrigMon plots • temperature plots • checks bit-for-bit errors
Interface Boards: The Bottom Line • L2 crate with Clist, XTRPlist, SVTlist, L1 interface, ISOlist all work at up to full speed (20 kHz) as-is. • Their bit-error rates are measured < 10^-6 (RECES not tested to this level yet.) • Essentially all documentation exists. Some tweaks in progress • There is at least one working spare for every board. • Every board has a real expert living close by • Work in progress fixing up extra boards’ bad bits etc. • In current configuration we can fulfill the charge of running jets, electrons and SVT at 5e31 right now, as-is (assuming all clients are working). “Backups” will only distract.
Goals of Sept. workshop (for interface boards) • sync errors <10^-6 DONE • cut on jets/ “reliable Clist” DONE • “reliable L1 board” DONE • automated HRR DONE • “solve XTRP problem” DONE (don’t remember what it was, but it works) • reliable SVTlist DONE • SVT kludge path DONE • alpha code for cutting on SVT: Simple code DONE, complete cdf4718-lite underway • Solve clist eta/phi errors for electrons: DONE for electrons (iso needs work) • alpha electron code: Debugging • prepare firmware without delays for MB testing DONE • test boards on new MB NOT DONE • test isolist and reces DONE • “improve documentation” DONE -- more to do, as always
Suggestions-I • Spares should not be kept in lower crate unless being used. Otherwise water leak (it has happened before!) will destroy all boards. Currently squatting on other spare space...could use space allocated specifically for L2 spares • Need more disk space for L2 webpages on B0 machine. • SVT group should use XTRP list in TL2D and free up spare TRACKlist board • “Clients” should be kept in stable configuration • D-sized plotter in B0 for printing updated Firmware schematics (.eps or .pdf)
Suggestions-II • Need more of the “good” jumpers (white) • Make MagicBus document a CDFNOTE • File cabinet for all L2 docs. Can be different sized schematics and also text documents so folders would work better than one binder. • web “clearing house” for all L2 web documents. Good documentation exists for all boards, just need a list of links (Heather is working on this.) I think we should not over-structure this at this point...leave the microstructure to the individual groups • When given choice of testing kludge path vs. real path, try real first
Suggestions-III • In next 3-6 months, experts (and their supervisors) should think about training their successors. • Need to implement bit-for-bit emulation SIXD-->TL2D into TrigMon • Need someone to write/implement XFLD-->XTRD emulation • A MB “display” module would be a critical debugging tool (LED’s on each line) much like the old Fastbus display module