
Novel approach to the particle track reconstruction based on deep learning methods

This paper presents a novel approach to particle track reconstruction based on deep learning methods, specifically applied to the GEM vertex detector. The approach aims to reduce the number of fake hits and improve event reconstruction in high energy and nuclear physics experiments. The method involves a two-step tracking process, combining a directed search and deep recurrent neural networks. The results demonstrate the effectiveness of the approach in improving track reconstruction accuracy.


Presentation Transcript


  1. XXVI Symposium on Nuclear Electronics and Computing - NEC'2017, Montenegro, Budva, Becici, 25-29 September 2017. Novel approach to the particle track reconstruction based on deep learning methods. P. Goncharov, S. Mitsyn, G. Ososkov, LIT JINR, Dubna, Russia, ososkov@jinr.ru; A. Tsytrinov, Gomel State Technical University, Gomel, Belarus

  2. NICA-MPD-SPD-BM@N. General view of the NICA complex with the collider and the experiments MPD, SPD and BM@N.

  3. BM@N experiment (Baryonic Matter at Nuclotron, a fixed-target experiment). Our problem is to reconstruct tracks registered by the GEM vertex detector with 6 GEM stations inside the magnet. All data for further study were simulated in the MPDRoot framework with the Geant4 program. Figures: general view of the GEM detector inside the magnet; UrQMD event Au-Au, 4 A·GeV, average multiplicity ~500; a simulated C-C event and the same event viewed along the beam.

  4. Problems of microstrip gaseous chambers. The main shortcoming is the appearance of fake hits caused by extra spurious strip crossings: for n real hits (electron avalanche centers) one obtains n² − n fakes (spurious crossings). One way to decrease the number of fakes is to rotate the strips of one layer by a small angle (5-15 degrees) with respect to the other layer. Figures: a two-dimensional readout board, where strips cross at a small angle and the upper-layer strips are narrower than the lower-layer strips (strip dimensions 160/800/680 μm); UrQMD events Au-Au, 4 A·GeV with the angle between strips at 90 degrees vs. 15 degrees. Although a small angle between layers removes a lot of fakes, many of them still remain.

  5. Event reconstruction is the key problem of HENP data analysis. • Event reconstruction is one of the most important problems in modern high energy and nuclear physics (HENP). It consists in determining the parameters of vertices and particle tracks for each event. • Traditionally, tracking algorithms based on the combinatorial Kalman filter have been used with great success in HENP experiments for years. • However, the initialization procedure needed to start Kalman filtering requires a tremendous search of hits aimed at obtaining so-called "seeds", i.e. initial approximations of the track parameters of charged particles. • Besides, these state-of-the-art techniques are inherently sequential and scale poorly with the expected increases in detector occupancy under new conditions, such as those of the planned NICA experiments. • Machine learning algorithms bring a lot of potential to this problem due to their capability to model complex non-linear data dependencies, to learn effective representations of high-dimensional data through training, and to parallelize on high-throughput architectures such as GPUs.

  6. Tracking by directed search. BM@N tracking means a combinatorial search through many hits and thousands of fakes situated on sequential stations for exactly those hits that belong to some track, i.e. lie on a smooth curve. Starting from every hit on some station, one should search for a corresponding hit on the next station. One way to reduce the immense combinatorics is to use the curve smoothness to predict a smaller area for searching on the next station. Hits belonging to a track are situated on sequential stations along the particle's way through the detector. They form a dynamical system like a movie, but unfortunately ordinary neural networks, even deep ones with many layers, deep belief networks and such advanced nets as convolutional neural networks, are designed to handle static objects. To handle dynamic objects, a neural net should possess a kind of memory. Therefore we chose two-step tracking, starting with a preprocessing intended to find all possible track-candidates by a directed search, followed by applying a deep recurrent neural network.

  7. Two-step tracking. 1. Preliminary search. The BM@N magnetic field is not homogeneous, but fortunately, due to its vertical direction, the track's YoZ projection is close to a straight line and its XoZ projection is close to a circle. Hence we choose a combined 2D search. Considering the approximate vertex position as a "virtual zero hit" h0, we apply a least-squares straight-line fit in YoZ from h0 sequentially through every station, with selection of acceptable hits. In XoZ, the selection of corresponding hits to join them into track-candidates is done with a "sinus criterion" of closeness of the angle sines between three adjacent segments (see the next slide). By applying a KD-tree algorithm (https://arxiv.org/abs/cs/9901013) we speed up our algorithm significantly, reducing the search area on every next station to a rather small elliptical region; a sketch of such a windowed search follows.
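A minimal sketch of the windowed search, assuming hits arrive as per-station (x, y) arrays and using SciPy's cKDTree; all names and tolerances are illustrative, and a circular window stands in for the elliptical region mentioned above.

```python
# Sketch of the directed search window; names and tolerances are
# illustrative, not taken from the authors' code.
import numpy as np
from scipy.spatial import cKDTree

def candidate_hits(station_hits, predicted_xy, radius):
    """Return indices of hits on a station lying inside the small
    search window around the extrapolated track position."""
    tree = cKDTree(station_hits)          # build once per station
    return tree.query_ball_point(predicted_xy, r=radius)

# Example: extrapolate the YoZ straight-line / XoZ circle fit to the
# next station, then query only the nearby hits instead of all of them.
hits = np.random.rand(1000, 2) * 100.0    # stand-in for a real hit list
near = candidate_hits(hits, predicted_xy=(50.0, 50.0), radius=2.5)
```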

  8. Sinus criterion in the XoZ projection. In XoZ, the selection of corresponding hits to join them into track-candidates is done with a "sinus criterion" of closeness of the sines of angles between three adjacent segments. Admissibility of a hit can be expressed as $|\sin\theta_k - \sin\theta_{k-1}| < \varepsilon + W_{field}$, where $\theta_k$ is the angle of the k-th segment and $W_{field}$ is a correction for the magnetic field inhomogeneity. In the figure, the upper track 1 is accepted, while track 2 is rejected.
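A simplified sketch of the criterion for one bending angle between two adjacent segments; the function names, the epsilon tolerance and the W_field placeholder are illustrative, not the authors' code.

```python
# Minimal sketch of the "sinus criterion": accept hit h2 if the sines
# of the angles of two adjacent segments (h0-h1 and h1-h2) stay close,
# up to a tolerance corrected for the field inhomogeneity.
import math

def sinus_ok(h0, h1, h2, eps, w_field):
    """h0, h1, h2 are (x, z) points on consecutive stations."""
    def seg_sine(a, b):
        dx, dz = b[0] - a[0], b[1] - a[1]
        return dx / math.hypot(dx, dz)   # sine of the segment angle w.r.t. the z axis
    return abs(seg_sine(h1, h2) - seg_sine(h0, h1)) < eps + w_field
```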

  9. Preprocessing results. Input data for the first-step algorithm were simulated by GEANT in the MPDRoot framework for the real BM@N configuration. Figure: real tracks, true found tracks and a ghost track; white dots are both hits and fakes.

  10. 2nd step: track-candidate classification. After the preprocessing we obtain a set of track-candidates, which should be divided into two groups - tracks and ghosts. • Each track-candidate is represented as a sequence of points given by three coordinates (features). Each timestep of the sequence is a specific point on the k-th detector station, so for 6 detector stations we have a matrix with 3 × 6 = 18 features defining a track-candidate (see the sketch after this slide). • As a result of our study, we chose the neural net family named Recurrent Neural Networks (RNN), which are able to process sequential data. RNNs are networks with loops in them, which allow information to be passed from one step of the network to the next. In our study we had to choose the most suitable RNN type and to optimize its structure and parameters in order to achieve higher efficiency and speed up training. • In particular, we found that normalization of the input is rather harmful in our case: when it is applied, the training efficiency gets stuck around 65% and the optimization does not converge to the global minimum. • Eventually the deep network constructed from Gated Recurrent Units (GRU) was found to be the most suitable for tracking and allowed us to obtain the first results on data from the 1st-step algorithm.
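A sketch of how such input might be packed, assuming NumPy arrays; all names and sizes other than the 6 × 3 layout quoted on the slide are illustrative.

```python
# Packing track-candidates for the RNN classifier: each candidate is a
# sequence of 6 station points with 3 coordinates each, i.e. an array
# of shape (n_candidates, 6, 3); labels are 1 for tracks, 0 for ghosts.
import numpy as np

n_candidates, n_stations, n_features = 10000, 6, 3
X = np.zeros((n_candidates, n_stations, n_features), dtype=np.float32)
y = np.zeros(n_candidates, dtype=np.float32)   # 1 = true track, 0 = ghost
# X[i, k, :] holds the coordinates of candidate i's hit on station k;
# per the slide, the inputs are fed to the network without normalization.
```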

  11. The first results of track-candidate classification by deep neural networks. After a series of experiments we found the best architecture and parameters for our deep neural classifier of track-candidates. We trained our network on two datasets (thanks to D. Baranov): • a small dataset with 80K real tracks and 80K ghost seeds; • a big dataset with 82,677 real tracks and 695,887 ghosts. • The testing efficiency is the same for both attempts, trained on the small and on the big dataset, and equals 97.5%.

  12. Recurrent neural networks. One fragment A of an RNN is shown in the scheme. It takes the input value x_t and outputs the value h_t. Inside this cell A there is just a common NN with one hidden layer. A loop allows information to be passed from one step of the network to the next. A recurrent neural network can be thought of as multiple copies of the same network, each passing a message to a successor (an unrolled recurrent neural network). This chain-like nature reveals that recurrent neural networks are intimately related to sequences and lists: they are the natural neural network architecture to use for such data. However, in order to enable an RNN to remember information for long periods of time, it is necessary to improve its structure up to a Long Short-Term Memory (LSTM) network. LSTM is a special kind of RNN capable of learning long-term dependencies.
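The "common NN with one hidden layer" inside cell A can be written compactly; this is the standard vanilla-RNN recurrence (not shown explicitly on the slide):

```latex
h_t = \tanh\left(W_{xh}\, x_t + W_{hh}\, h_{t-1} + b_h\right)
```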

  13. Long Short-Term Memory (LSTM). The core idea of LSTM is a kind of memory named the cell state that works like a conveyor belt for running information. Instead of having a single neural layer, as in an RNN, the chain-like structure of an LSTM includes four layers interacting in a very special way. These layers are able to protect and control the cell state with the mechanism of gates - filters that optionally let information through. They are composed of a sigmoidal layer and a pointwise multiplication operation and have the ability to remove or add information to the cell state. • The LSTM gates operate as follows: • 1. decide what information we are going to remove from the cell state; • 2. decide what new information we are going to store in the cell state (realized in two layers); • 3. update the old cell state C_{t−1} into the new cell state C_t; • 4. decide what part of the cell state we are going to output. The standard gate equations are given below.
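For reference, the standard LSTM equations in the commonly used formulation (the slide describes them only in words):

```latex
\begin{aligned}
f_t &= \sigma(W_f\,[h_{t-1}, x_t] + b_f) && \text{(forget gate)}\\
i_t &= \sigma(W_i\,[h_{t-1}, x_t] + b_i) && \text{(input gate)}\\
\tilde{C}_t &= \tanh(W_C\,[h_{t-1}, x_t] + b_C) && \text{(candidate state)}\\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t && \text{(cell-state update)}\\
o_t &= \sigma(W_o\,[h_{t-1}, x_t] + b_o) && \text{(output gate)}\\
h_t &= o_t \odot \tanh(C_t) && \text{(output)}
\end{aligned}
```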

  14. Gated Recurrent Unit (GRU). A slightly simpler version of LSTM is the GRU. It combines the "forget" and "input" gates into a single "update" gate. In our work we prefer to use GRU, because switching to LSTM gives almost the same efficiency while slowing down execution: one training epoch takes 108 s with LSTM vs. 89 s with GRU.
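For reference, the standard GRU equations, in which the single update gate z_t plays the role of the LSTM forget and input gates (standard formulation, not shown on the slide):

```latex
\begin{aligned}
z_t &= \sigma(W_z\,[h_{t-1}, x_t]) && \text{(update gate)}\\
r_t &= \sigma(W_r\,[h_{t-1}, x_t]) && \text{(reset gate)}\\
\tilde{h}_t &= \tanh(W\,[r_t \odot h_{t-1},\, x_t]) && \text{(candidate state)}\\
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t && \text{(output)}
\end{aligned}
```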

  15. Choosing the method of optimization. Four RNNs with one hidden layer consisting of 64 GRU neurons were trained simultaneously by different optimizers to compare validation efficiencies.

  16. Root Mean Square Backpropagation (RMSProp). The key idea of RMSProp is to update less those weights that are updated too often: instead of the full amount of updates, we use the square of the gradient averaged over the history. Let $g_t$ be the gradient of the objective function w.r.t. the weight at time step $t$; the mean square gradient value at step $t$ is $E[g^2]_t = \gamma E[g^2]_{t-1} + (1 - \gamma) g_t^2$, where $\gamma$ is a weight decay parameter (Hinton suggests $\gamma = 0.9$). The weights are then updated as $w_{t+1} = w_t - \eta\, g_t / (\sqrt{E[g^2]_t} + \epsilon)$, where $\eta$ is the learning rate and $\epsilon \approx 10^{-10}$.
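A small numerical sketch of the update rule above in plain NumPy (the helper name is illustrative):

```python
# One RMSProp step, matching the formulas on the slide: gamma = 0.9
# and epsilon ~ 1e-10 follow the quoted values.
import numpy as np

def rmsprop_step(w, grad, mean_sq, lr=0.001, gamma=0.9, eps=1e-10):
    """Update weights w given the gradient and the running average
    mean_sq of squared gradients; returns the new (w, mean_sq)."""
    mean_sq = gamma * mean_sq + (1.0 - gamma) * grad**2
    w = w - lr * grad / (np.sqrt(mean_sq) + eps)
    return w, mean_sq
```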

  17. Model selection. MaxPool is a maximum pooling layer, realized by applying a max filter to non-overlapping subregions of the layer input; a tiny illustration follows.
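A tiny NumPy illustration of max pooling over non-overlapping 2x2 subregions (the input shape is assumed divisible by the pool size):

```python
# 2x2 max pooling: take the maximum of each non-overlapping 2x2 block.
import numpy as np

def max_pool_2x2(a):
    h, w = a.shape
    return a.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

a = np.arange(16).reshape(4, 4)
print(max_pool_2x2(a))   # -> [[ 5  7]
                         #     [13 15]]
```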

  18. Dropout vs overfitting. An issue with LSTMs is that they can easily overfit the training data, reducing their predictive skill. Dropout helps prevent weights from converging to identical positions. It does this by randomly turning neurons off during forward propagation; back-propagation then proceeds with only the neurons that are turned on. We applied 30% dropout to each of the GRU layers. Using a special recurrent dropout prevents overfitting; although it reduces the total efficiency, it does so only by half a percent.
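A minimal sketch of the 30% dropout configuration, assuming the Keras API (the framework used is not stated on the slides); in Keras, recurrent dropout is the recurrent_dropout argument:

```python
# GRU layer with 30% dropout on both the inputs and the recurrent
# state, mirroring the dropout rate quoted on the slide.
from tensorflow.keras.layers import GRU

gru = GRU(64, return_sequences=True,
          dropout=0.3,             # dropout on the layer inputs
          recurrent_dropout=0.3)   # dropout on the recurrent state
```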

  19. The final structure of the deep neural classifier. • After a series of experiments we found the best architecture for our track-candidate classifier, with five hidden layers. Processing 6500 track-candidates takes only 1 s on a HybriLIT virtual machine with one Nvidia Tesla M60. • The testing efficiency achieved is 97.5%.
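The slide does not spell out the five hidden layers. Combining the details given in the backup slides (32 one-dimensional convolutions, one bidirectional plus one regular GRU layer, 30% dropout, a single sigmoid output), a plausible Keras reconstruction might look as follows; this is a sketch under those assumptions, not the authors' verified code, and any size not quoted on the slides is a guess.

```python
# Plausible reconstruction of the classifier: Conv1D feature
# extraction, bidirectional GRU + GRU with 30% dropout, sigmoid output.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D, GRU, Bidirectional, Dense

model = Sequential([
    Conv1D(32, kernel_size=3, padding='same', activation='relu',
           input_shape=(6, 3)),                   # (6, 3) -> (6, 32)
    Bidirectional(GRU(64, return_sequences=True,
                      dropout=0.3, recurrent_dropout=0.3)),
    GRU(64, dropout=0.3, recurrent_dropout=0.3),
    Dense(1, activation='sigmoid'),               # P(candidate is a track)
])
model.compile(optimizer='rmsprop', loss='binary_crossentropy',
              metrics=['accuracy'])
```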

  20. Conclusion and outlook. • A two-step tracking algorithm was proposed; • preprocessing based on a greedy KD-tree 2D search allows extracting possible track-candidates at the first stage of tracking; • a deep recurrent network at the second stage classifies track-candidates into two groups: true tracks and ghosts; • the test classification efficiency is at the level of 97.5%; • the trained RNN can process 6500 track-candidates in one second on a single Nvidia Tesla M60; • we are going to speed up the preprocessing stage by using either parallel computations or a proper neural network.

  21. Thanks for your attention!

  22. Back up slides

  23. Sigmoid cross-entropy loss. As there are only two classes - tracks and ghosts - the output of the neural network is a single neuron with sigmoid activation that represents the probability of a track-candidate being a track. In the non-track case the probability value should be close to zero. Sigmoid activation function: $\sigma(x) = 1 / (1 + e^{-x})$. Cross-entropy loss: $L = -\left[y \log \hat{y} + (1 - y) \log(1 - \hat{y})\right]$, where $y$ is the true label and $\hat{y}$ is the probability predicted by the RNN.
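A small numeric check of these formulas in plain NumPy (helper names are illustrative):

```python
# Sigmoid activation and binary cross-entropy loss as defined above.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bce(y_true, y_pred, eps=1e-12):
    y_pred = np.clip(y_pred, eps, 1.0 - eps)   # guard against log(0)
    return -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

print(bce(1.0, sigmoid(2.0)))   # small loss: confident, correct answer
print(bce(0.0, sigmoid(2.0)))   # large loss: confident, wrong answer
```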

  24. Convolutional Neural Networks for image recognition. Recall the motivation: 1. DBN training time is too long; 2. unsupervised learning algorithms for training RBMs by approximate maximum likelihood meet the intractable problem of measuring the likelihood that is to be optimized; 3. directly applying regular neural nets to image recognition is useless because of two main factors: (i) feeding a 2D image as a scanned 1D vector means the loss of the image's spatial topology; (ii) full connectivity of a NN, where each neuron is fully connected to all neurons in the previous layer, is too wasteful due to the curse of dimensionality; besides, the huge number of parameters would quickly lead to overfitting. Instead, neurons of a Convolutional Neural Network (CNN) layer are only connected to a small region of the layer before it (LeCun & Bengio, 1995; see also http://cs231n.github.io/convolutional-networks/).

  25. Basics of the CNN architecture. The CNN architecture is a sequence of layers, and every layer transforms one volume of activations to another through a filter which is a differentiable function. There are three main types of layers to build CNN architectures: the Convolutional Layer, the Pooling (subsampling) Layer, and the Fully-Connected Layer (just an MLP with backprop). There are also ReLU (rectified linear unit) layers performing max(0, x). Each layer accepts an input 3D volume (x, y, RGB color) and transforms it to an output 3D volume. To construct all filters of the convolutional layers, the CNN must be trained on a labeled sample with the back-prop method. See https://geektimes.ru/post/74326/ (in Russian) or https://en.wikipedia.org/wiki/Convolutional_neural_network. Figures: a convolutional layer with ReLU activation acting on an input feature map; an example of classification by a CNN; max pooling with a 2x2 filter.

  26. Convolutional feature extraction. Using only three features per timestep is too few for our problem, so we perform 32 one-dimensional convolutions to extract more features. In convolutional neural networks (CNNs), 1D and 2D filters are not really 1- and 2-dimensional: each "1D" filter here is actually a 3x3 filter applied along only one dimension across the 6x3 feature matrix. As we applied padding, i.e. added zeros along the boundary of the feature matrix, each 1D filter convolution gives a 6x1 vector. The convolution layer eventually outputs a 6x32 matrix.
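A shape check of this step, again assuming the Keras API (the framework is an assumption): 32 one-dimensional convolutions with 'same' padding map the (6, 3) input to (6, 32).

```python
# Verify that Conv1D with 32 filters and zero padding preserves the
# sequence length of 6 while expanding 3 features to 32.
from tensorflow.keras.layers import Conv1D, Input
from tensorflow.keras.models import Model

inp = Input(shape=(6, 3))
out = Conv1D(filters=32, kernel_size=3, padding='same')(inp)
print(Model(inp, out).output_shape)   # (None, 6, 32)
```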

  27. Bidirectional RNNs. Bidirectional RNNs are based on the idea that the output at time t may depend not only on the previous elements in the sequence, but also on future elements. This is realized by stacking two RNNs on top of each other; the output is then computed from the hidden states of both RNNs. Using one bidirectional and one regular GRU layer brings faster convergence and more robustness with a great amount of data.
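A minimal sketch of this stacking, once more assuming Keras; the unit counts are illustrative and echo the model sketch given earlier:

```python
# One bidirectional GRU layer (reads the station sequence in both
# directions) followed by a regular GRU layer, as described above.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import GRU, Bidirectional, Dense

model = Sequential([
    Bidirectional(GRU(64, return_sequences=True), input_shape=(6, 3)),
    GRU(64),
    Dense(1, activation='sigmoid'),
])
```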
