
Neural Network Classification of Simulated AUGER data



  1. Neural Network Classification of Simulated AUGER data Giuseppe Longo, on behalf of the Naples team (Aramo, Ambrosio, Donalek, Tagliaferri) Department of Physical Sciences - University of Napoli “Federico II”, Sezione di Napoli; longo@na.infn.it Valencia, October 2003

  2. SOM • The Self-Organizing Map (SOM) is an unsupervised neural network (Kohonen, 1982, 1988) tailored for the visualisation of high-dimensional data. • A SOM consists of neurons (typically 10 to ~1000) organized on a regular low-dimensional grid. • Each neuron is represented by a d-dimensional weight vector, where d is the dimension of the input vectors. Neurons are connected to adjacent neurons by a neighborhood relation, which dictates the topology (structure) of the map. • The neurons organize automatically into a meaningful two-dimensional order in which neurons modelling similar data lie closer to each other on the grid than dissimilar ones. In this sense the SOM is a similarity graph, and a clustering diagram too. • A SOM is trained iteratively. For each vector x from the input data set, the distances between x and all weight vectors of the SOM are calculated using some distance measure (typically the Euclidean distance). The neuron whose weight vector is closest to the input vector x is called the Best-Matching Unit (BMU).
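To make the training loop concrete, here is a minimal sequential-SOM sketch in Python/NumPy. It is not the code used for the AUGER study; the grid size, decay schedules and random initialisation are illustrative assumptions.

```python
import numpy as np

def train_som(data, grid_shape=(10, 10), n_iter=2000, seed=0):
    """Minimal sequential SOM sketch with a shrinking Gaussian
    neighbourhood (hyper-parameters are illustrative assumptions)."""
    rng = np.random.default_rng(seed)
    n_rows, n_cols = grid_shape
    d = data.shape[1]
    # One d-dimensional weight (prototype) vector per map unit.
    weights = rng.normal(size=(n_rows * n_cols, d))
    # Grid coordinates of each unit, used by the neighbourhood function.
    coords = np.array([(i, j) for i in range(n_rows) for j in range(n_cols)], float)

    for t in range(n_iter):
        x = data[rng.integers(len(data))]
        # Best-Matching Unit: the unit whose weight vector is closest (Euclidean).
        bmu = np.argmin(np.linalg.norm(weights - x, axis=1))
        # Learning rate and neighbourhood radius decay over time.
        lr = 0.5 * np.exp(-t / n_iter)
        sigma = max(n_rows, n_cols) / 2.0 * np.exp(-t / n_iter)
        # Gaussian neighbourhood centred on the BMU's grid position.
        grid_dist2 = np.sum((coords - coords[bmu]) ** 2, axis=1)
        h = np.exp(-grid_dist2 / (2.0 * sigma ** 2))
        # Pull every unit towards x, weighted by its closeness to the BMU.
        weights += lr * h[:, None] * (x - weights)
    return weights, coords
```

A 10x10 map would be trained with `weights, coords = train_som(data)`; the resulting `weights` array is reused by the U-matrix and localization sketches further below.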

  3. Relation between U-matrix and maps • In each figure, the hexagon in a given position corresponds to the same map unit. In the U-matrix, additional hexagons exist between all pairs of neighbouring map units. • The component plane (parameter) and the U-matrix may have colour bars showing the scale of the variable. By default, the scale shows the values that the variable takes in the map structure.

  4. Data Mining: SOM - U-matrix - component planes (CDF)

  5. Upper: cell structure. Lower: smoothed (close to confidence levels) U-matrix. The unified distance matrix (U-matrix) visualizes the distances between neighbouring map units and thus shows the cluster structure of the map. Areas of similar colour correspond to neurons that recognize similar objects. It is calculated using all variables.
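As a sketch of how the U-matrix is obtained from the trained map, the helper below averages, for each unit, the distances between its weight vector and those of its grid neighbours. It assumes the `weights` layout of the SOM sketch above and a rectangular 4-neighbour lattice rather than the hexagonal one shown on the slides.

```python
import numpy as np

def u_matrix(weights, grid_shape):
    """Simplified U-matrix: mean distance of each unit's weight vector
    to those of its 4-neighbours on a rectangular grid."""
    n_rows, n_cols = grid_shape
    w = weights.reshape(n_rows, n_cols, -1)
    u = np.zeros((n_rows, n_cols))
    for i in range(n_rows):
        for j in range(n_cols):
            dists = []
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < n_rows and 0 <= nj < n_cols:
                    dists.append(np.linalg.norm(w[i, j] - w[ni, nj]))
            u[i, j] = np.mean(dists)
    return u
```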

  6. Data Mining: SOM - U-matrix - component planes

  7. Validation • Where on the map is a specific data sample located? • The simplest answer is to find the BMU of the data sample. • Localization can also be performed using only a subset of the parameters. [Figure: map localization example, labels "S"]
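A minimal illustration of this localization step, assuming the `weights` array from the SOM sketch above; the `feature_idx` argument is a hypothetical way of restricting the match to a subset of the parameters, as the slide describes.

```python
import numpy as np

def locate_bmu(weights, sample, feature_idx=None):
    """Return the index of the BMU for `sample`; if `feature_idx` is
    given, only that subset of parameters is used for the match."""
    if feature_idx is not None:
        w, s = weights[:, feature_idx], sample[feature_idx]
    else:
        w, s = weights, sample
    return int(np.argmin(np.linalg.norm(w - s, axis=1)))
```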

  8. [Figure: map localization example, labels "G"]

  9. One can also investigate whole data sets using the map. Here is the response for a set of 50 similar objects.

  10. Other visualizations: 3-D U-matrix; similarity coloring; BARPLANE, which shows a bar chart in each map unit.

  11. An intriguing pattern-recognition problem: the curves are very similar in shape.

  12. UNSUPERVISED SOM (120 nodes). SOM similarity-coloring map: each hexagon represents a neuron, and different colors denote different clusters. Neurons are labeled using simulated data (A = proton; B = Helium; C = Oxygen; D = Iron). Success rates: p = 34%, He = 30%, O = 28%, Fe = 41%.

  13. SUPERVISED MLP. Results for two MLPs (22 hidden neurons, softmax activation function). Panel a: conjugate-gradient optimization algorithm. Panel b: gradient-descent optimization algorithm.
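The MLP code itself is not part of the transcript; the sketch below sets up a comparable pair of networks with scikit-learn, which is an assumption rather than the original tool. scikit-learn's multiclass `MLPClassifier` applies a softmax output layer automatically, and since it does not expose a conjugate-gradient optimizer, `lbfgs` and `sgd` are used here as stand-ins for panels a and b.

```python
from sklearn.neural_network import MLPClassifier

# X: shower observables, y: primary labels ('p', 'He', 'O', 'Fe');
# both are placeholders for the simulated AUGER data set.
def build_mlps():
    common = dict(hidden_layer_sizes=(22,), max_iter=2000, random_state=0)
    # Panel a analogue: a second-order optimizer (L-BFGS as a stand-in
    # for conjugate gradient, which scikit-learn does not provide).
    mlp_a = MLPClassifier(solver="lbfgs", **common)
    # Panel b analogue: plain (stochastic) gradient descent.
    mlp_b = MLPClassifier(solver="sgd", learning_rate_init=0.01, **common)
    return mlp_a, mlp_b

# Usage: mlp_a.fit(X_train, y_train); mlp_a.score(X_test, y_test)
```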

  14. Something going on in Napoli (DSF-INFN) • Centro Calcolo Parallelo e GRID (Parallel Computing and GRID Centre), Dipartimento di Scienze Fisiche, CNR, INFN, INFM, INGV (3 Beowulf clusters, ca. 512 nodes; 1 IBM machine with 64 processors)

  15. The future: Generative Topographic Mapping. The Generative Topographic Mapping (GTM) model was introduced by Bishop et al. (1998) as a probabilistic re-formulation of the self-organizing map (SOM). It overcomes the limitations of the SOM while introducing no significant disadvantages.

  16. S.O.M. versus G.T.M. • The SOM algorithm is not derived by optimizing an objective function. • SOM does not define a density model. • Neighbourhood preservation is not guaranteed by the SOM procedure. • There is no certainty that the code-book vectors will converge using SOM. • GTM, in contrast: • The neighbourhood-preserving nature of the mapping is an automatic consequence of the choice of a smooth, continuous function y(x; W). • GTM defines an explicit probability density function in data space. • Convergence of the batch GTM algorithm is guaranteed by the EM (Expectation-Maximization) algorithm.

  17. How GTM works. We define a probability distribution p(x) on the latent-variable space; through the mapping y(x; W) this induces a corresponding distribution in the data space. p(x) plays the role of a prior distribution over x. Since in reality the data only approximately lie on a lower-dimensional manifold, it is appropriate to include a noise model for the vector t, for example a spherical Gaussian: p(t | x, W, β) = (β/2π)^{D/2} exp( −(β/2) ‖y(x; W) − t‖² ), where D is the dimension of the data space and β the inverse noise variance.

  18. The pdf in t-space, for a given value of W, is then obtained by integrating over the x-distribution: p(t | W, β) = ∫ p(t | x, W, β) p(x) dx  (1). We take p(x) to be a sum of delta functions centred on the nodes of a regular grid in latent space, p(x) = (1/K) Σ_{k=1}^{K} δ(x − x_k). This form of p(x) allows the integral in (1) to be performed analytically, and the distribution function in data space takes the form of a Gaussian mixture model: p(t | W, β) = (1/K) Σ_{k=1}^{K} p(t | x_k, W, β).
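As a small numerical counterpart to the mixture above, the following sketch evaluates p(t_n | W, β) for every data point. `Y` is assumed to hold the K mixture centres y(x_k; W); the names are chosen for illustration and match the GTM training sketch given later.

```python
import numpy as np

def gtm_density(T, Y, beta):
    """Evaluate the GTM Gaussian-mixture density p(t_n | W, beta).
    T: (N, D) data matrix; Y: (K, D) mixture centres y(x_k; W)."""
    N, D = T.shape
    dist2 = ((Y[:, None, :] - T[None, :, :]) ** 2).sum(-1)   # (K, N)
    norm = (beta / (2.0 * np.pi)) ** (D / 2.0)
    # Uniform mixing coefficients 1/K -> mean over the K components.
    return norm * np.exp(-0.5 * beta * dist2).mean(axis=0)   # (N,)
```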

  19. To maximize this distribution with respect to W and β, we can use the likelihood L(W, β) = Π_{n=1}^{N} p(t_n | W, β). However, it is usually more convenient to use the log-likelihood: ln L(W, β) = Σ_{n=1}^{N} ln [ (1/K) Σ_{k=1}^{K} p(t_n | x_k, W, β) ].

  20. EM algorithm for GTM. Given some initial values for W and β, the E-step for the GTM is the same as for a general Gaussian mixture model: compute the responsibilities r_kn = p(x_k | t_n, W, β) = p(t_n | x_k, W, β) / Σ_{k'} p(t_n | x_{k'}, W, β), which correspond to the posterior probability that the n-th data point was generated by the k-th component. The M-step then maximizes the expected log-likelihood with respect to W and β: writing the mapping as y(x; W) = W φ(x) for a fixed set of basis functions φ, W is obtained from a weighted least-squares problem, Φᵀ G Φ Wᵀ = Φᵀ R T (with G = diag(Σ_n r_kn) and R the matrix of responsibilities), and the noise is re-estimated as 1/β = (1/(N D)) Σ_{k,n} r_kn ‖y(x_k; W) − t_n‖².

  21. Summary of the GTM algorithm
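The slide's summary figure is not reproduced in the transcript; the sketch below implements the same E/M cycle (responsibilities, weighted least squares for W, re-estimation of β) following Bishop, Svensén & Williams (1998). Grid sizes, the RBF width, the regularizer and the random initialisation are illustrative assumptions, not the settings used in the talk, which would typically use a PCA-based initialisation.

```python
import numpy as np

def gtm_fit(T, K=100, M=16, n_iter=30, reg=1e-3, seed=0):
    """Minimal GTM/EM sketch. T: (N, D) data matrix; K latent grid
    points on a square 2-D grid; M Gaussian basis functions."""
    rng = np.random.default_rng(seed)
    N, D = T.shape

    # Regular grid of latent points x_k in [-1, 1]^2 and RBF centres.
    g = int(np.sqrt(K)); K = g * g
    X = np.array([(a, b) for a in np.linspace(-1, 1, g)
                          for b in np.linspace(-1, 1, g)])
    m = int(np.sqrt(M)); M = m * m
    mu = np.array([(a, b) for a in np.linspace(-1, 1, m)
                           for b in np.linspace(-1, 1, m)])
    sigma = 2.0 / (m - 1)                       # RBF width (assumption)

    # Basis matrix Phi (K x M+1, with a bias column): y(x_k; W) = Phi_k W.
    d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(-1)
    Phi = np.hstack([np.exp(-d2 / (2 * sigma ** 2)), np.ones((K, 1))])

    W = rng.normal(scale=0.1, size=(M + 1, D))  # random init (assumption)
    beta = 1.0

    for _ in range(n_iter):
        Y = Phi @ W                                            # (K, D) centres
        # E-step: responsibilities r_kn = p(x_k | t_n, W, beta).
        dist2 = ((Y[:, None, :] - T[None, :, :]) ** 2).sum(-1)  # (K, N)
        logr = -0.5 * beta * dist2
        logr -= logr.max(axis=0, keepdims=True)                 # stabilise
        R = np.exp(logr); R /= R.sum(axis=0, keepdims=True)
        # M-step: weighted least squares for W, then update beta.
        G = np.diag(R.sum(axis=1))
        A = Phi.T @ G @ Phi + reg * np.eye(M + 1)
        W = np.linalg.solve(A, Phi.T @ (R @ T))
        Y = Phi @ W
        dist2 = ((Y[:, None, :] - T[None, :, :]) ** 2).sum(-1)
        beta = N * D / (R * dist2).sum()
    return X, Phi, W, beta, R
```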

  22. The GTM learning process. The plots show the density model in data space at iterations 0 (the initial configuration), 1, 2, 4, 8 and 15. Data points are plotted as 'o', while the centres of the Gaussian mixture are plotted as '+'. The centres are joined by a line according to their ordering in the latent space.

  23. Visualization. An important potential application of the GTM is visualization. We defined a probability distribution in the data space conditioned on the latent variable. We can therefore use Bayes' theorem, in conjunction with the prior distribution over the latent variable, p(x), to compute the corresponding posterior distribution in latent space for any given point t in data space: p(x_k | t) = p(t | x_k, W, β) p(x_k) / Σ_{k'} p(t | x_{k'}, W, β) p(x_{k'}).
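Given the responsibilities R returned by the training sketch above (which already equal the posterior p(x_k | t_n) under the uniform grid prior), the latent-space visualization reduces to a posterior-mean projection. The helper below is an illustrative one-liner, not code from the talk.

```python
def latent_projection(X, R):
    """Posterior-mean projection of each data point onto the latent plane:
    <x | t_n> = sum_k r_kn * x_k.  X: (K, 2) latent grid; R: (K, N)."""
    return R.T @ X          # (N, 2) latent-space coordinates
```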

  24. Latent-space plane: posterior probability distribution for all points of the data set.

  25. Latent-space plane: posterior probability distribution for all galaxies.

  26. Latent-space plane: posterior probability distribution for all objects of class A.

  27. [Figure: latent-space projections; panels: all points, class A, class B, not classified]
