Wojtek Skulski University of Rochester

The Universe and the FPGA Digital Pulse Processing for Dark Matter Search Wojtek Skulski University of Rochester W.Skulski University of Rochester, July/10/2008 1

Contributors
• Wojtek Skulski (University of Rochester and SkuTek Instrumentation).
• Frank Wolfs (University of Rochester).
• Eryk Druszkiewicz (University of Rochester).
• Large Underground Xenon Detector Collaboration (LUX).

Outline
• Introduction: Dark Matter Search.
• Pulses from the Dark Matter LnXe detector.
• Overview of signal sampling and Digital Pulse Processing.
• Why do we need a trigger?
• Overview of a few trigger architectures.
• Self-triggered digital DAQ becomes Digital Trigger.
• Present status of LUX Digital Trigger.
• Conclusion.

Digital Pulse Processing for Dark Matter Search
• Outputs from the Dark Matter LnXe detector are digitized using flash A/D converters with 12 or 14 bit resolution @ up to 100 MSPS.
• Pulses due to radiation and/or DM events are detected in real time.
• Waveforms containing events of interest are recorded for analysis.
• The electronics has to provide the following:
  • Low-noise analog front end which receives the signals from the phototubes.
  • Flash A/D converters, 12 or 14 bit @ up to 100 MSPS.
  • Field-programmable gate arrays which detect the pulses.
  • Data readout from the FPGA and archiving for offline analysis.
  • GUI for diagnostic data display and control of the experiment.
• The Rochester group is responsible for designing the LUX Trigger System.

History of digitizer development by the author
• 2002. The first single-channel digitizer DDC-1: 12 bits @ 48 MSPS.
• 2003. The first 8-channel digitizer DDC-8: 10 bits @ 40 MSPS.
  • Originally developed for the PHOBOS experiment (advanced trigger). Unfortunately, PHOBOS was discontinued before the DDC-8 could be used.
• 2004. DDC-8/XLM: a minor modification of DDC-8.
• 2006. DDC-8 Pro: 12/14 bits @ 64 MSPS.
  • Developed for student labs. Became the first iteration of the LUXcore trigger: a single, 8-channel board serving 8 phototubes.
• 2007. DDC-8 LUX: 12/14 bits @ up to 80 MSPS + System Connector.
  • More than 8 phototubes need to be served → multiple boards are needed. The System Connector was developed to link together multiple DPP boards. The first version of the Event Builder was developed as the “receiving end” for System Connectors.
• 2008. DDC-8 DSP: 12/14 bits @ up to 125 MSPS and a large FPGA.
  • All previous digitizers used flat-pack FPGAs, whose capacity was exceeded by our DSP algorithms. A transition to the BGA package was made (finally).
• 2008/2009. A new version of the Event Builder with a BGA FPGA is also planned.

A few facts concerning Dark Matter Search

The biggest mystery: where is almost Everything?
• Most of the Universe is missing from the books…
• Ordinary matter accounts for only 5% of the Universe.
[Figure label: “We are here”.]
Source: Connecting Quarks with the Cosmos, The National Academies Press, p.86.

The 1st smoking gun: galactic rotation is too fast.
• Gravitational pull reveals more matter than we can see.
[Figure: rotation curve of the Andromeda galaxy. Axes: orbital velocity, distance from the center. Curves: observation, prediction based on visible matter.]
Source: Connecting Quarks with the Cosmos, The National Academies Press, p.87.

The 2nd smoking gun: large-scale gravitational lensing.
• Light from distant sources is deflected by clusters of galaxies.
• Visible mass cannot account for the observed lensing pattern.
• Reconstructed mass distribution shows mass between galaxies.
[Figure panels: observed lensing; reconstructed mass distribution.]
Source: Connecting Quarks with the Cosmos, The National Academies Press, p.89.

What is the Dark Matter composed of?
• Nobody knows, but there are candidates predicted by theory:
  • Axions: light particles that may explain CP violation.
  • Neutralinos: heavy particles predicted by SUSY.
• The neutralino is neutral, weakly interacting, and as massive as an atom of gold.
• Very rarely it will bounce off an ordinary nucleus and produce some ionization.
• Our experiment will attempt to detect neutralinos deep underground where the background from cosmic rays is very low.
• We will use a two-phase liquid xenon (LNXe) detector named the Large Underground Xenon detector (LUX).

Detectors for Dark Matter Search

Underground low-background laboratory
Cosmic particles are stopped by 1 km of rock. Dark Matter particles penetrate freely.
NB: LUX will be at DUSEL, but here I am showing Boulby.

LUX detector prototype

LUX detector consists of many channels
144 phototubes. Each phototube requires an independent ADC and data-processing channel.
[Figure labels: gas Xe, liquid Xe.]

The principle of 2-phase xenon detector
S1: scintillation in liquid Xe. S2: electroluminescence in gas Xe.
[Figure labels: gas inlet, HV, grids, gas, liquid, 1.5 cm, 2.5 cm, quartz, PMT, S1, S2.]
Figure from: J.T.White, Dark Matter 2002. http://www.physics.ucla.edu/hep/DarkMatter/dmtalks.htm
Figure from: T.J.Sumner et al., http://astro.ic.ac.uk/Research/Gal_DM_Search/report.html

Explanation of signal from a 2-phase xenon detector
[Figure labels, in time order: primary pulse from the liquid Xe; electron drift time in LnXe; amplified pulse from the gas Xe. Time flows from the primary pulse towards the amplified pulse.]
Figure from: T.J.Sumner et al., http://astro.ic.ac.uk/Research/Gal_DM_Search/report.html

Signal processing in a single channel LnXe detector
[Figure labels: primary scintillation in liquid phase (S1); secondary scintillation in gas phase, i.e. electroluminescence (S2).]
• Extract the areas under S1 and S2, and the separation time between S1 and S2.
• Time-stamp the data in order to correlate pulses in different channels.
Figure from: T.J.Sumner et al., http://astro.ic.ac.uk/Research/Gal_DM_Search/report.html
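The extraction step listed on this slide can be sketched in a few lines. This is only an illustration under stated assumptions, not the LUX firmware: the function names, the fixed baseline, and the simple threshold-crossing pulse finder are all hypothetical, and the first two pulses found are simply taken to be S1 and S2.

```python
def find_pulses(samples, baseline, threshold):
    """Return (start, end, area) for each run of samples above baseline + threshold.

    Hypothetical helper: a plain threshold-crossing pulse finder.
    """
    pulses = []
    start = None
    area = 0
    for i, s in enumerate(samples):
        if s - baseline > threshold:
            if start is None:          # rising edge: a new pulse begins
                start, area = i, 0
            area += s - baseline       # accumulate baseline-subtracted area
        elif start is not None:        # falling edge: the pulse ends
            pulses.append((start, i, area))
            start = None
    if start is not None:              # pulse still open at the end of the trace
        pulses.append((start, len(samples), area))
    return pulses


def s1_s2_summary(samples, baseline, threshold, sample_period_ns):
    """Assume the first two pulses are S1 and S2; return areas and separation."""
    pulses = find_pulses(samples, baseline, threshold)
    if len(pulses) < 2:
        return None
    s1, s2 = pulses[0], pulses[1]
    return {
        "s1_area": s1[2],
        "s2_area": s2[2],
        "drift_time_ns": (s2[0] - s1[0]) * sample_period_ns,
    }


# Toy trace: a small S1 followed by a larger S2, sampled every 10 ns (100 MSPS).
trace = [100] * 5 + [110, 120, 110] + [100] * 10 + [150, 200, 200, 150] + [100] * 5
summary = s1_s2_summary(trace, baseline=100, threshold=5, sample_period_ns=10)
```

In a real channel the same three numbers would also carry a time stamp so that pulses from different phototubes can be correlated.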

Multi-channel signal processing
• Each of the 144 phototubes is connected to an independent ADC and data-processing channel, which extracts S1, S2, and the time interval between S1 and S2.
• The distribution of light among the PMTs tells where the interaction happened within the volume.
• Channels are not independent. They are correlated. In addition to processing individual channels, correlated inter-channel processing is also necessary.
• The data acquisition system has a fairly advanced architecture explained in the next section.

Electronics for Dark Matter Search

Digital Pulse Processing (DPP)
• The pulse-processing electronics can be either traditional analog or digital. The latter has advantages over the former: higher integration, more flexibility, and lower cost. It also has a slight disadvantage: it needs to be programmed.
• Outputs from the Dark Matter LnXe detector are digitized using flash A/D converters with 12 or 14 bits @ several tens of MSPS (e.g., 64 or 100 MSPS).
• Pulses are detected in real time → Digital Pulse Processing has to be implemented.
• The electronics has to provide the following:
  • Low-noise analog front end which receives the signals from the phototubes.
  • Flash A/D converters, 12 or 14 bits @ several tens of MSPS.
  • Field-programmable gate arrays which detect the pulses with DPP algorithms.
  • Data readout from the FPGA and archiving for offline analysis.
  • GUI for diagnostic data display and control of the experiment.

Functional diagram of a single DPP channel (DPP = Digital Pulse Processing)
[Diagram blocks. Analog part: analog signal input, analog input stage with gain and offset control, Nyquist filter. Digital part: ADC driven by the sampling clock, sample rate processor, event rate processor, waveform memory, trigger. Outputs: pulse information output, individual channel trigger output, optional external trigger in/out.]

Functional diagram of a multichannel DPP board
[Diagram blocks on one DPP board: several single channels, each with an analog stage and ADC; board-level event processor (formatting, compression, etc.); digital interface for readout, monitoring, and setup; board-wide trigger logic; slow control. Connections: to event builder, to slow control, from trigger subsystem.]

Functional diagram of a multiboard DPP system
[Diagram blocks: signals from detectors feed several DPP boards, which feed the event builder and then recording. A subset of signals from detectors feeds the trigger system. A slow control monitor is attached over the network.]

Which data is interesting?
[Figure labels along the time axis: useful data; not useful; baseline, not useful.]
• Select useful data (so-called events) and reject baseline data.
• Typical rejection ratio is larger than 1000.
Figure from: T.J.Sumner et al., http://astro.ic.ac.uk/Research/Gal_DM_Search/report.html

Estimated rates of “interesting” data samples
Assumptions: 8 channels, 14-bit @ 64 MHz. 200 µs ADC trace per event (contains only the “interesting” samples). 1000 events per second, and 200 µs fully recorded per event. The period of 200 µs covers the electron drift time in the LUX detector.
Rates: 112 megabytes / second per channel. 22.4 kilobytes / second per channel. 448 kilobytes / second per DPP board.
[Diagram: one DPP board with 8 single channels (analog stage and ADC each), board-level event processor (formatting, compression, etc.), digital interface for readout, monitoring, and setup, and a link to the event builder.]

Why is an FPGA necessary?
[Diagram blocks: 14-bit ADC with 64 MHz sampling clock, sample rate processor, event rate processor, waveform memory, trigger.]
112 megabytes / second per channel, times 8 channels → 896 megabytes / second (each board).
An FPGA is the only device which can continuously process such data rates.
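The raw-rate figures on this slide follow directly from the sampling parameters. The short calculation below reproduces them; the function name is a hypothetical helper, and the 200 µs trace length is an assumption taken from the 12,800-sample waveform quoted elsewhere in the talk for 64 MSPS.

```python
def raw_rate_bytes_per_s(sample_rate_hz, bits_per_sample, channels=1):
    """Continuous raw data rate of tightly packed ADC samples, in bytes per second."""
    return sample_rate_hz * bits_per_sample * channels // 8


SAMPLE_RATE_HZ = 64_000_000   # 64 MSPS sampling clock
BITS_PER_SAMPLE = 14          # 14-bit ADC
CHANNELS_PER_BOARD = 8

per_channel = raw_rate_bytes_per_s(SAMPLE_RATE_HZ, BITS_PER_SAMPLE)
per_board = raw_rate_bytes_per_s(SAMPLE_RATE_HZ, BITS_PER_SAMPLE, CHANNELS_PER_BOARD)

# Size of one 200 microsecond trace from one channel (assumed trace length).
samples_per_trace = SAMPLE_RATE_HZ * 200 // 1_000_000
trace_bytes = samples_per_trace * BITS_PER_SAMPLE // 8

print(per_channel / 1e6, "MB/s per channel")
print(per_board / 1e6, "MB/s per board")
print(samples_per_trace, "samples,", trace_bytes / 1e3, "kB per trace")
```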

Why is trigger necessary?
Unrestricted data rate: 896 megabytes / second (each board). Such data rates can be neither managed nor recorded. A pre-selected data rate becomes manageable and can be recorded.
[Diagram blocks: signals from detectors feed several DPP boards, which feed the event builder and then recording. A subset of signals from detectors feeds the trigger system.]
The trigger subsystem pre-selects only “good data” to be recorded.

Limitations of the backplane architecture and the solution: point-to-point fast serial links

Limitations of the VME backplane readout
• Assume 200 µs of waveform memory per channel.
• 14-bit ADC means 15 bits (because of the ADC overflow bit).
• For simplicity let’s say 1 sample = 16 bits.
• Full waveform is 12,800 samples @ 64 MSPS, or 20,000 samples @ 100 MSPS (one ADC channel).
• Trigger data means (pulse area, pulse width, time stamp) = four 16-bit words = 8 bytes.
• If using the VME interface, two types of data transfer cycle are available:
  • MBLT transfer, 4 bytes (32 bits) per transfer → 40 MB/s = 40 bytes/µs.
  • 2eVME transfer, 8 bytes (64 bits) per transfer → 80 MB/s = 80 bytes/µs.
[Diagram labels: trigger data, 4 * 16-bit; full event data (full waveforms), 20,000 16-bit words per channel.]
Trigger data → event builder: 144 channels * 8 bytes = 1152 bytes = 29 µs (MBLT).
Waveforms → event builder: 144 channels * 40 kbytes = 5.760 Mbytes = 144 ms (MBLT).
This means that the VME system can read only 7 full events per second using the 32-bit MBLT protocol, or 14 events per second using the 2eVME protocol. The limitations of the PCI readout will be roughly similar.
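The readout estimate on this slide can be reproduced with a short calculation. It uses the slide's own inputs (144 channels, 20,000 samples of 16 bits per channel, 8 bytes of trigger data per channel, 40 and 80 MB/s bus rates); the variable and function names are illustrative only.

```python
CHANNELS = 144
SAMPLES_PER_WAVEFORM = 20_000   # one channel at 100 MSPS
BYTES_PER_SAMPLE = 2            # 14-bit sample stored as 16 bits
TRIGGER_BYTES_PER_CHANNEL = 8   # four 16-bit words

MBLT_BYTES_PER_S = 40e6         # bus rates as quoted on the slide
TWO_E_VME_BYTES_PER_S = 80e6


def transfer_time_s(n_bytes, bus_bytes_per_s):
    """Time to move n_bytes over a bus, ignoring protocol overhead."""
    return n_bytes / bus_bytes_per_s


trigger_bytes = CHANNELS * TRIGGER_BYTES_PER_CHANNEL
event_bytes = CHANNELS * SAMPLES_PER_WAVEFORM * BYTES_PER_SAMPLE

trigger_time_s = transfer_time_s(trigger_bytes, MBLT_BYTES_PER_S)
event_time_s = transfer_time_s(event_bytes, MBLT_BYTES_PER_S)

events_per_s_mblt = 1 / event_time_s
events_per_s_2evme = 1 / transfer_time_s(event_bytes, TWO_E_VME_BYTES_PER_S)

print(f"trigger packet: {trigger_bytes} bytes, {trigger_time_s * 1e6:.1f} us")
print(f"full event: {event_bytes / 1e6:.2f} MB, {event_time_s * 1e3:.0f} ms")
print(f"full events per second: {events_per_s_mblt:.1f} (MBLT), "
      f"{events_per_s_2evme:.1f} (2eVME)")
```

The three orders of magnitude between the trigger-packet time and the full-event time are what make a trigger-first readout attractive.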

Implications of the backplane performance estimate
The VME system can read no more than 7 full events per second using the 32-bit MBLT transfer @ 40 MB/s (no more than 14 full events/s using the 2eVME protocol @ 80 MB/s). Moreover, these rates will be achieved at 100% dead time, which is not a good situation.
What can we do?
1. Use the trigger to pre-scale the event rate to only 7 “interesting events” per second.
2. Do not read full waveforms. Read only the pulses, and skip the quiescent baseline.
3. Compress the waveforms to reduce the transfer rate.
4. Use a faster transfer rate.
Ad 1. Now we see why the trigger is of such importance in this project.
Ad 2. “Baseline suppression” seems unavoidable in this situation.
Ad 3. Real-time compression requires appropriate FPGA resources.
Ad 4. A point-to-point serial data link offers a MUCH HIGHER data transfer rate than VME.
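Option 2 above, reading only the pulses and skipping the quiescent baseline, can be illustrated with a small sketch. This is a software model under stated assumptions, not the FPGA implementation: the function name, the fixed baseline, and the `pre`/`post` margins kept around each pulse are all hypothetical choices.

```python
def suppress_baseline(samples, baseline, threshold, pre=2, post=2):
    """Keep only samples near a pulse; return a list of (start_index, segment).

    A sample is "in a pulse" when it deviates from the baseline by more than
    `threshold`. `pre` and `post` samples around each pulse are kept as well,
    so the pulse edges survive (margins are an assumption of this sketch).
    """
    n = len(samples)
    keep = [False] * n
    for i, s in enumerate(samples):
        if abs(s - baseline) > threshold:
            for j in range(max(0, i - pre), min(n, i + post + 1)):
                keep[j] = True

    segments = []
    i = 0
    while i < n:
        if keep[i]:
            j = i
            while j < n and keep[j]:
                j += 1
            segments.append((i, samples[i:j]))  # start index acts as a time stamp
            i = j
        else:
            i += 1
    return segments


# Toy trace: 23 samples with one pulse in the middle.
trace = [100] * 10 + [130, 160, 130] + [100] * 10
segments = suppress_baseline(trace, baseline=100, threshold=10)
kept = sum(len(seg) for _, seg in segments)
print(segments, f"kept {kept} of {len(trace)} samples")
```

Storing the start index with each segment keeps the timing information that would otherwise be lost when the baseline is dropped.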

Backplane switching currents may induce noise
An additional consideration is digital noise, which may be injected into the sensitive analog inputs by the large switching currents inherent in a single-ended backplane such as VME or PCI. Both are old-style single-ended bus interfaces involving large transient currents. In such a high-current environment it may not be possible to attain a millivolt noise level.
This means that the VME interface has to be kept inactive during the measurement, and the measurement cannot be restarted until the VME readout is over. The VME readout period defines the dead time of the DAQ system.
Vendors of commercial VME or PCI digitizers claim that the above does not happen. However, in an experiment which is pushing the detection limits towards low-amplitude signals, it is prudent to verify whether or not digital switching currents are indeed harmless.
A radical solution is to use a low-noise standard such as USB-2 or HDMI, which employ Low-Voltage Differential Signalling (LVDS). Only 3.5 mA per link is switched, in differential mode, which avoids inducing noise in nearby circuits.

Fast serial link → fast trigger decision, but similar event readout
• Do not use a backplane such as VME or PCI. Use differential point-to-point data links.
• HDMI: four differential pairs carry up to four DDR data streams from each DPP board to the event builder. A very high data rate can be pushed through a short HDMI cable.
• Flat-pack FPGA packages limit the signaling rate to 200 Mbps. (Implies limitation: 200 Mbits/s * 4 links = 100 Mbytes/s per cable.)
• BGA packages allow a signaling rate of 622 Mbps. (Implies limitation: 622 Mbits/s * 4 links = 311 Mbytes/s per cable.)
• 100 MB/s from each board means a huge improvement over a backplane readout, because all links operate in parallel.
• The most dramatic improvement comes when transferring short trigger packets from the DPP boards to the Event Builder, because the trigger information can fit into the on-chip FPGA memory at the receiving end. The trigger readout rate is dramatically improved.
• There is not enough memory in the receiving Event Builder FPGA to accept the entire waveforms from all DPP boards. They need to be transferred out at the USB-2 rate, which is similar to the backplane rate. Therefore, the full event readout rate is not improved.
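The per-cable limits quoted above are simple products of the per-pair signaling rate and the four differential pairs. The sketch below reproduces them and, as an illustrative extra, estimates how long one board's trigger packet would occupy the cable. The 64-byte packet (8 channels times 8 bytes) is an assumption built from figures elsewhere in the talk, and protocol overhead is ignored.

```python
PAIRS_PER_CABLE = 4


def cable_rate_bytes_per_s(bits_per_s_per_pair, pairs=PAIRS_PER_CABLE):
    """Aggregate payload rate of one cable, ignoring any protocol overhead."""
    return bits_per_s_per_pair * pairs // 8


flat_pack_rate = cable_rate_bytes_per_s(200_000_000)   # flat-pack FPGA limit
bga_rate = cable_rate_bytes_per_s(622_000_000)         # BGA FPGA limit

# Assumed trigger packet for one 8-channel board: 8 bytes per channel.
TRIGGER_PACKET_BYTES = 8 * 8
packet_time_us = TRIGGER_PACKET_BYTES / flat_pack_rate * 1e6

print(flat_pack_rate / 1e6, "MB/s per cable (flat-pack)")
print(bga_rate / 1e6, "MB/s per cable (BGA)")
print(f"{packet_time_us:.2f} us per board trigger packet at the flat-pack rate")
```

Because every board has its own cable, these per-cable times do not add up across boards the way they do on a shared backplane.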

Bottlenecks of the digital pulse processing system

Three bottlenecks of the digital recording system
• A typical digital system has three bottlenecks, which have to be tackled in system design.
• Raw Data Bottleneck.
  • The rate at which a digital system can produce raw data is staggering. For example, a modest system of 100 channels, 14 bits @ 100 MSPS, produces 17.5 gigabytes per second. No matter how large the on-board waveform memories are, they will quickly overflow with raw data. A digital system must either a) provide a method to offload the waveform memories at the same rate at which they are being filled, or b) limit the rate of data production to a more manageable level.
• Data Recording Bottleneck.
  • Data must be recorded. A typical recording medium (a disk or a tape) can take roughly 100 MB/s. The disparity between #1 and #2 is roughly 10^2 to 10^3 (a factor of 175 for the 100-channel example).
• Analysis Bottleneck.
  • Data must be analyzed. If we record at 100 MB/s, then we are recording 1 GB every ten seconds. Analysis can take 10x more CPU effort than recording (optimistically). Therefore, improving the recording rate (bottleneck #2) will exacerbate the analysis glut.
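The raw-data and recording figures on this slide can be checked numerically. The calculation below uses the slide's 100-channel example and its 100 MB/s recording rate; the 1 MB per-channel waveform memory is purely a hypothetical size chosen to show how quickly such a memory fills.

```python
CHANNELS = 100
SAMPLE_RATE_HZ = 100_000_000      # 100 MSPS
BITS_PER_SAMPLE = 14
RECORDING_BYTES_PER_S = 100_000_000   # roughly 100 MB/s to disk or tape

per_channel_rate = SAMPLE_RATE_HZ * BITS_PER_SAMPLE // 8   # bytes per second
system_rate = per_channel_rate * CHANNELS

# Ratio between raw production (bottleneck #1) and recording (bottleneck #2).
disparity = system_rate / RECORDING_BYTES_PER_S

# Hypothetical on-board memory of 1 MB per channel: time until it overflows.
MEMORY_BYTES_PER_CHANNEL = 1_000_000
fill_time_ms = MEMORY_BYTES_PER_CHANNEL / per_channel_rate * 1e3

# Volume recorded in ten seconds at the recording rate.
recorded_in_10_s = RECORDING_BYTES_PER_S * 10

print(system_rate / 1e9, "GB/s raw")
print(disparity, "x more raw data than can be recorded")
print(f"{fill_time_ms:.2f} ms to fill 1 MB per channel")
```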

How to remedy the three bottlenecks?
Some digital DAQ systems take only occasional records of data, followed by long periods of inactivity (e.g., a digital oscilloscope which is recording occasional pulses). In such cases, the average data rate is low, and there is no problem. However, in some cases the DAQ has to work full time, all the time. In such cases we have to tackle the aforementioned bottlenecks.
• The Raw Data Bottleneck can be addressed with real-time data reduction (i.e., digital pulse processing), which extracts only the relevant pulse characteristics, such as amplitude, duration, and pulse shape parameters. Rather than storing the full waveform, we can store only a few numbers. The reduction factor can be 10x or more. An even larger reduction can be achieved when pulses are infrequent, separated by long stretches of uninteresting baseline. The baseline does not have to be recorded at all.
• The Data Recording Bottleneck cannot be improved, because any improvement in this area will impact the next bottleneck. (Whatever is recorded has to be analysed.)
• The Analysis Bottleneck can be eased if the pulse data have already been preprocessed.
Real-time pulse processing is the key to designing a large digital DAQ system.

What is needed to implement real-time pulse processing?
Real-time pulse processing is the key to achieving the data reduction necessary in a large digital DAQ system. There are three key ingredients of real-time digital pulse processing.
• Adequate raw processing power needs to be present. One has to employ FPGAs on the digitizer boards, because only FPGAs provide parallel operation on parallel sample streams. Both FPGAs and digital signal processors (DSPs) may be used on the downstream boards, depending on the requirements of a particular DAQ system.
• Pulse processing algorithms have to be implemented in the FPGA fabric. The FPGA implementation of even simple algorithms (e.g., pulse detection) can be tricky. It requires expertise, which is not always available in the physics community.
• Confidence that the digital pulse processing is working and that it is safe. Physicists are very reluctant when it comes to discarding data. Data reduction does lead to discarding data, and therefore it is viewed with suspicion. From the safety point of view, the preferred solution is to record every sample and then to analyze the recorded data from disk. Such a preference leads to the bottlenecks mentioned on the previous slide.

Can we trust real-time pulse processing?
• Physicists are very reluctant to discard data. Digital pulse processing leads to discarding data. It is therefore a valid question whether real-time digital pulse processing is a good idea.
• Pulse processing needs to be adequately developed, understood, and tested in order to be reliable.
• The critical portion of the waveforms can be recorded in order to perform offline crosschecks with the DPP results.
• In the good old days, analog pulse processing was performed all the time. We did not worry that by doing so we were discarding good data, because there were no “good data” to be recorded other than the output from our shaping amplifiers. Nowadays we need to realize that “shaping amplifiers” could as well be named real-time analog filters. Such analog filters are neither better nor worse than digital filters. If we used to trust the former, we should also accept the latter. Whether the filter is digital or analog, it will be equally trustworthy when it is properly designed.
• Digital pulse processing is necessary. It needs to be accepted after it is understood.

How is trigger implemented?

What is trigger doing?
[Diagram: interesting events, background events, and noise arrive from the detector at the trigger system, which sends a strobe signal to the DAQ.]
The trigger subsystem pre-selects all interesting events, a representative sample of background events, and also some noise. A strobe signal (a pulse) is sent to the DAQ to trigger its operation, such that the accepted data can be digitized and recorded.

How is trigger implemented?
• The trigger is such an important topic that we need to look at its implementation details.
• Three different ways of implementing the trigger:
  • Analog computer built from off-the-shelf NIM 1) modules.
  • Small DPP system processing a subset of the detector signals.
  • The main DPP system also performs the triggering function (i.e., the DAQ system is self-triggered).
1) NIM (Nuclear Instrumentation Module) modules are shown later.

How is trigger connected to the DAQ?
The role of the trigger is to pre-select “interesting events”.
[Diagram blocks. DAQ: signals from detectors feed several DPP boards, which feed the event builder and then recording. Trigger system: receives a subset of signals from detectors and sends a strobe signal (a pulse) to the DAQ.]

Trigger 1: analog computer built from NIM modules
Analog signals are processed in the analog domain.
[Figure labels: five crates full of NIM modules; very many LEMO cables.]

Pros and cons of NIM implementation
• Advantages:
  • Knowledge about programmable logic is not needed.
  • Very little training is required to work on a NIM system. (Every physicist knows how to use an oscilloscope and how to adjust trimming potentiometers.)
  • Practically unlimited number of analog inputs. (However, system complexity grows very quickly with the number of analog inputs, as shown in the previous slide.)
• Disadvantages:
  • Tedious to document. The only documentation consists of oscilloscope screen shots and a written record in a logbook.
  • Very difficult to reproduce the settings.
  • Flaky operation because of many knobs, switches, and connectors.
  • Tuning cannot be performed remotely.
(continued...)

The main advantages of NIM implementation
Why is the antiquated analog approach still being used?
• Very little learning is required to operate the equipment.
• Signals in NIM electronics can be examined anytime with an oscilloscope. No planning is required concerning signal diagnostics.
• Very wide bandwidth and fast rise time of pulses (approx. 1 ns).
• Timing is adjusted in the analog domain. According to common wisdom, analog means infinitely fine time adjustments between pulses. (But in practice the fine time adjustment is very tricky.)
• Timing can be measured and adjusted to a few tens of picoseconds.

Trigger 2: a single Digital Pulse Processor
DDC-8 Pro (LUXcore trigger system). Analog signals are digitized and then processed in the digital domain.
[Figure labels: FPGA Spartan3-400 performs digital processing in real time; 8 analog signals from phototubes; NIM IN, 2 lines from optional NIM analog computer; NIM OUT, 2 lines. The NIM OUT pulse is sent to the DAQ system.]

Pros and cons of a single DPP solution
• Advantages:
  • Tuning can be performed remotely.
  • The trigger can be documented in full by saving all the settings and configuration files.
  • Settings can be reproduced exactly by reloading configuration files.
  • Mechanical knobs, switches, and connectors are eliminated → reliable operation.
• Disadvantages:
  • Only a few out of many phototubes can participate in the trigger decision.
  • Knowledge about programmable logic is needed to develop or modify any DPP system. Very few physicists have working knowledge of programmable logic.

Trigger 3: a few Digital Pulse Processors
This architecture will be used for the LUX trigger.
[Diagram blocks: a few front-end boards, connected by fast uni-directional data links to a single Level-2 decision board, which sends the decision pulse to the DAQ system.]

Pros and cons of the several DPP solution
• Advantages:
  • (A few advantages similar to the previous case of a single DPP board.)
  • Much larger number of inputs.
• Disadvantages:
  • (A few disadvantages similar to the previous case of a single DPP board.)
  • This architecture is in fact the same as the DAQ architecture. Plainly speaking, this is a small and less powerful DAQ system.
Why invest time and effort in a trigger-only solution? Why not design and build a COMPLETE self-triggered DAQ instead? Why bother having two PARTIAL systems rather than one COMPLETE system?

Trigger 4: self-triggered COMPLETE DAQ system
[Diagram blocks: front-end DPP boards, Level-2 event builders, Level-3 event builder, recording.]
Fast bidirectional data links. (Bidirectional in order to send the trigger decision back to the front-end boards.)

Self-triggered DAQ system: how it works.
[Diagram blocks: front-end DPP boards, Level-2 event builders, Level-3 event builder, recording. Two data flows: trigger data, and full event data (full waveforms).]
1. All front-end boards send “trigger data” downstream to the central event builder.
2. The event builder makes the trigger decision (accept, reject) and sends it back.
3. If the decision is “accept”, then the front-end boards send the waveforms.
The communication is carried over very fast, point-to-point, bidirectional data links.
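The three-step exchange above can be modelled in software to make the data flow concrete. This is a toy sketch under stated assumptions, not the LUX design: the class names, the packet fields (taken from the trigger-data description earlier in the talk), and the summed-area acceptance criterion are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TriggerData:
    """Short trigger packet sent by one front-end board (hypothetical layout)."""
    board_id: int
    pulse_area: int
    pulse_width: int
    time_stamp: int


class FrontEndBoard:
    def __init__(self, board_id, waveform, baseline):
        self.board_id = board_id
        self.waveform = waveform
        self.baseline = baseline

    def trigger_data(self, time_stamp):
        # Step 1: summarize the waveform instead of sending it.
        excess = [s - self.baseline for s in self.waveform if s > self.baseline]
        return TriggerData(self.board_id, sum(excess), len(excess), time_stamp)

    def read_waveform(self):
        # Step 3: the full waveform is sent only after an "accept".
        return list(self.waveform)


class EventBuilder:
    def __init__(self, min_total_area):
        # Assumed criterion: accept when the summed pulse area is large enough.
        self.min_total_area = min_total_area

    def decide(self, packets):
        # Step 2: trigger decision based on trigger data only.
        return sum(p.pulse_area for p in packets) >= self.min_total_area


def run_event(boards, builder, time_stamp):
    packets = [b.trigger_data(time_stamp) for b in boards]
    if not builder.decide(packets):
        return None                      # reject: no waveforms are transferred
    return {b.board_id: b.read_waveform() for b in boards}


boards = [FrontEndBoard(0, [100, 120, 100], 100), FrontEndBoard(1, [100, 100, 100], 100)]
accepted = run_event(boards, EventBuilder(min_total_area=10), time_stamp=42)
rejected = run_event(boards, EventBuilder(min_total_area=50), time_stamp=43)
```

The point of the structure is that the expensive transfer in step 3 happens only for accepted events, while the cheap packets in step 1 flow for every candidate.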
