This document presents the "Road Grader" algorithm developed by Joshua Sopher and David Nygren to enhance data processing in the IceCube project. The algorithm focuses on recognizing and handling simple single photoelectron (SPE)-like waveforms while suppressing irrelevant data to ensure efficient data transmission without loss of critical information. The methodology includes zero-suppression, run-length encoding, and Huffman encoding to reduce data rates, allowing compliance with strict transmission requirements. This approach prioritizes simplicity and stability, ensuring reliable performance in data acquisition.
"Road Grader" • Joshua Sopher & David Nygren, LBNL • March 20, 2005 • IceCube Collaboration Meeting
Historical perspective • Original notion in PDD: • Most pulses are simple SPE-like waveforms • “Recognize” SPE pulses & process waveforms • Report derived Q & time for these pulses • Don’t process all other complex waveforms • No zero-suppression, report raw waveform • Algorithmic implementation: unpleasant • Two processing methods: bad idea
“Road Grader” Algorithm • Perspective: “Simple is Good” • Road grader scrapes up all good data: • zero-suppression + data compression • Samples near baseline & below threshold are unimportant for timing & charge • All fADC & ATWD waveforms treated identically • Very few parameters to meddle with (and lose track of!) • Stability of data guaranteed
Project Goals • Suppress and compress data to meet the data rate requirement for DOM-to-surface data transmission: < 20 kbytes/s/DOM • Realize the compressor in firmware to minimize processing time. • Efficient operation within the DAQ FPGA design • CPU to be used for state control, message management, etc., not for data processing
Technical description • Waveforms are similar to fax scan lines: • Run-length encoding, followed by Huffman encoding • Suppression replaces baseline data with zeroes • Run-length encoding counts the repetitions of same valued data • Huffman “lite” encoding replaces “zero” bytes with a “zero” bit
Suppression • ATWD and fADC data words are 10 bits wide. • Data below a threshold is replaced by zeros, and data above a threshold is left unchanged • This produces a large run length of zero valued data, for a typical single-pulse waveform
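The suppression step can be sketched in a few lines of Python. This is only an illustration, not the firmware: the function name `zero_suppress`, the `baseline` argument, and the choice to keep samples exactly at the threshold are assumptions. The default of 4 counts above baseline is taken from the ATWD slide.

```python
def zero_suppress(samples, baseline, threshold_counts=4):
    """Replace samples below (baseline + threshold_counts) with 0.

    Samples at or above the cut are returned unchanged. Treating a sample
    exactly at the cut as "kept" is an assumption of this sketch.
    """
    cut = baseline + threshold_counts
    return [s if s >= cut else 0 for s in samples]


# Hypothetical waveform: baseline near 100 counts with a small pulse.
waveform = [100, 101, 99, 102, 143, 189, 122, 101, 100]
suppressed = zero_suppress(waveform, baseline=100)
print(suppressed)
```

The pulse samples pass through untouched, while the baseline samples on either side collapse into runs of zeros that the next stage can count.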
ATWD pre-pulse behavior • Baseline noise is small, ±2 counts peak-to-peak. • Occasional pre-pulse baseline “shift”: -3 counts • Threshold is set 4 counts above the baseline. • Maximum threshold: 8-9 counts • Typical SPE: 200 counts at peak. • Pulse samples with amplitudes above 8-9 counts (~4% of an SPE) are never suppressed.
Threshold impact • Threshold causes not more than one sample of uncompressed data to be lost. • There will be virtually no loss of useable waveforms due to compression. • Pulse (non-zero) data will be identical to uncompressed data. • Reconstructed pulse has negligible errors.
Run length encoding • Zero-suppressed data is run-length encoded. • Run-length is zero for non-repeated data. • Run-length encoding produces number pairs: data followed by the number of repetitions. • Pre-pulse: 0 0 0 0 0 → 0,4 • Pulse: 43 89 22 → 43,0 89,0 22,0
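A minimal Python sketch of this pairing convention follows, where the second number counts repetitions after the first occurrence, so a lone sample has run-length 0. The function names `run_length_encode` and `run_length_decode` are illustrative and not taken from the firmware.

```python
def run_length_encode(samples):
    """Encode samples as (value, repetitions) pairs.

    'repetitions' counts how many times the value repeats after its first
    occurrence, so a non-repeated sample gets a run-length of 0.
    """
    pairs = []
    for s in samples:
        if pairs and pairs[-1][0] == s:
            value, reps = pairs[-1]
            pairs[-1] = (value, reps + 1)
        else:
            pairs.append((s, 0))
    return pairs


def run_length_decode(pairs):
    """Invert run_length_encode: expand each pair into reps + 1 samples."""
    samples = []
    for value, reps in pairs:
        samples.extend([value] * (reps + 1))
    return samples


pairs = run_length_encode([0, 0, 0, 0, 0, 43, 89, 22])
print(pairs)  # [(0, 4), (43, 0), (89, 0), (22, 0)]
```

Decoding the pairs returns the zero-suppressed waveform exactly, which is why the non-zero pulse data survives compression unchanged.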
Huffman encoding • A zero-valued 10-bit word is replaced by a 1-bit wide “zero flag”. • A non-zero flag bit is added to non-zero data, forming an 11-bit word. • Decompression of data requires an additional flag bit, giving 12-bit words.
Compressed data • The compression ratio depends on the sampling rate, the pulse width, and waveform complexity. • Compressed data is 12 bits wide for both repeated zeros & non-repeated non-zero data values. • For a pulse 8 samples wide, with leading and following zeroes, compressed data = 12 + (8 x 12) + 12 = 120 bits = 15 bytes • For a pulse 4 samples wide, with leading and following zeroes, compressed data = 12 + (4 x 12) + 12 = 72 bits = 9 bytes
Data compression ratio • For an 8-sample-wide ATWD pulse: compression ratio = 128 x 10/120 ≈ 10.7 • For a 4-sample-wide fADC pulse: compression ratio = 256 x 10/72 ≈ 35.6 • Every hit also has an 8-byte header that includes the coarse time-stamp (32 bits) + various hit descriptor bits (32 bits)
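The size and ratio arithmetic can be reproduced with a short Python sketch. It assumes the idealized case used on the slides: one leading run of zeros, one trailing run of zeros, and a pulse whose samples are all non-zero and non-repeating, with every compressed word 12 bits wide. The function names are illustrative, and the 8-byte header is not included.

```python
RAW_WORD_BITS = 10         # ATWD and fADC samples are 10 bits wide
COMPRESSED_WORD_BITS = 12  # width of each compressed word, per the slides


def compressed_bits(pulse_samples):
    """Bits for: one leading zero run + the pulse samples + one trailing zero run."""
    return COMPRESSED_WORD_BITS * (1 + pulse_samples + 1)


def compression_ratio(waveform_samples, pulse_samples):
    """Raw waveform bits divided by compressed bits (header excluded)."""
    raw_bits = waveform_samples * RAW_WORD_BITS
    return raw_bits / compressed_bits(pulse_samples)


# ATWD: 128 samples with an 8-sample pulse; fADC: 256 samples with a 4-sample pulse.
print(compressed_bits(8) // 8, "bytes, ratio", round(compression_ratio(128, 8), 1))
print(compressed_bits(4) // 8, "bytes, ratio", round(compression_ratio(256, 4), 1))
```

This gives 15 and 9 bytes, with ratios of roughly 10 and 35. Wider or more complex pulses lower the ratio, since each additional non-zero sample costs another 12-bit word.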
Basic rates • String 21 measured PMT rate = <750 Hz> • LC tag rate (nearest neighbor only) = ~15 Hz • Non-tag rate (mainly SPE) = ~735 Hz • Data rate requirement < 20,000 bytes/s/DOM • This keeps data flow below danger zone: • Network occupancy >50% not allowed
Data flow rate - “Hard” LC • Mode: Hard Local Coincidence • LC tag present: Header + ATWD + fADC data • LC tag absent: no data at all! Hit discarded! • Data rate = (header + ATWD + fADC) x tag rate = (8 + 15 + 9) bytes x 15 Hz = 480 bytes/s • Compression is not really needed…but, • All isolated hit data is lost
Data flow rate - “Soft” LC • Operating mode: Soft Local Coincidence • LC tag present: Header + ATWD + fADC data • LC tag absent: Header only, no ATWD, no fADC data • Tagged data rate = (header + ATWD + fADC) x tag rate = (8 + 15 + 9) bytes x 15 Hz = 480 bytes/s • Non-tagged rate = 8 bytes x 735 Hz = 5880 bytes/s • Sum = 6360 bytes/s • Zero-suppression & run-length encoding needed
Data rates - “Flabby” LC • Mode: Flabby Local Coincidence • LC tag present: Header + ATWD + fADC data • LC tag absent: Header + fADC data, no ATWD • Tagged data rate = (header + ATWD + fADC) x tag rate = (8 + 15 + 9) bytes x 15 Hz = 480 bytes/s • Non-tagged data rate = (8 + (1 + .2) x 9) bytes x 735 Hz = 13,818 bytes/s • Sum = 14,298 bytes/s (reasonable margin)
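The per-mode rate estimates for the three local-coincidence modes can be checked against the 20,000 bytes/s/DOM requirement with a small Python sketch. The byte counts and rates are the ones quoted on the slides. The function name, mode strings, and constant names are illustrative, and the factor 1.2 is simply the slide's (1 + .2) multiplier on the fADC payload, whose origin the slides do not explain.

```python
HEADER_BYTES = 8       # per-hit header
ATWD_BYTES = 15        # compressed 8-sample ATWD pulse
FADC_BYTES = 9         # compressed 4-sample fADC pulse
TAG_RATE_HZ = 15       # LC-tagged hits
NON_TAG_RATE_HZ = 735  # isolated (mainly SPE) hits
FADC_FACTOR = 1.2      # the slide's (1 + .2) multiplier; meaning not stated there
LIMIT_BYTES_PER_S = 20_000


def data_rate_bytes_per_s(mode):
    """Estimated DOM-to-surface data rate for 'hard', 'soft', or 'flabby' LC."""
    tagged = (HEADER_BYTES + ATWD_BYTES + FADC_BYTES) * TAG_RATE_HZ
    if mode == "hard":      # untagged hits are discarded
        untagged = 0
    elif mode == "soft":    # untagged hits send the header only
        untagged = HEADER_BYTES * NON_TAG_RATE_HZ
    elif mode == "flabby":  # untagged hits send header + fADC data
        untagged = (HEADER_BYTES + FADC_FACTOR * FADC_BYTES) * NON_TAG_RATE_HZ
    else:
        raise ValueError(f"unknown mode: {mode}")
    return tagged + untagged


for mode in ("hard", "soft", "flabby"):
    rate = round(data_rate_bytes_per_s(mode))
    print(mode, rate, "bytes/s, within limit:", rate < LIMIT_BYTES_PER_S)
```

All three modes stay under the limit. Flabby LC is the closest, at about 14,298 bytes/s.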
Possible issues • ATWD baseline may need monitoring • Baseline drift, if any, needs to be tracked • Easy to imagine auto-tracking capability • ATWD pulses over-sampled @ 300 MHz • Typical: ~11 samples/pulse • Why is this? Pulses are wider than expected • Delay line + amps + ATWD driver affect r • PMT gain is probably higher than we need • Pulse tail adds many samples, little information
Summary • “Road grader” is conceptually simple. • Reconstructed pulse fidelity is excellent. • Compression ratio meets project goals. • Implementation is pretty well-tested. • Incorporated in the new FPGA for DAQ. • ATWD issues may need some attention. • No obvious flaws preventing utilization.