
Neuromorphic Signal Processing for Auditory Scene Analysis

Presentation Transcript


  1. Neuromorphic Signal Processing for Auditory Scene Analysis Jose C. Principe, Ph.D. Distinguished Professor and Director Computational NeuroEngineering Laboratory, University of Florida Gainesville, FL 32611 principe@cnel.ufl.edu http://www.cnel.ufl.edu

  2. Table of Contents • The need to go beyond traditional signal processing and linear modeling. • Examples: • Dynamic Vector Quantizers. • Signal-to-Symbol Translators. • Entropy-based learning as a model for information processing in distributed systems.

  3. DSP for Man-Made Signals • Digital Signal Processing methods have been developed assuming linear, time-invariant systems and stationary Gaussian processes. • Complex exponentials are eigenfunctions of linear time-invariant systems. • FFTs define frequency over an analysis interval. • Wiener filters are the optimal linear filters for stationary random processes. • Markov models are context insensitive.

  4. Neurobiology reality • To become more productive, we should develop a new systematic theory of biological information processing based on the known biological reality: • Decomposition into real exponentials (mesoscopic) • Local time descriptors (spike trains) • Nonlinear dynamical models • Adaptive distributed representations

  5. Why delay a Neuromorphic Theory of Signal Processing? • A revamped framework is needed to understand biological information processing. It should be based on the distributed nature of the computation, the nonlinear nature of the dynamic PEs, and the competition and association of the interactions at different spatio-temporal scales. • Here we show three examples of how the addition of dynamics has impacted conventional models and is helping us find new paradigms for computation.

  6. Protocol for Time-varying modeling

  7. Protocol for Time-varying modeling

  8. Types of Memory • Generalized feedforward (gamma memory -- see Principe et al. and the sketch below) • Spatial feedback
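
A minimal Python sketch of the gamma memory recursion follows. The leaky-integrator cascade is the de Vries-Principe form; the memory order, the parameter mu, and the sine-wave usage example are illustrative assumptions, not values from the talk.

    import numpy as np

    def gamma_memory(x, order=5, mu=0.5):
        """Generalized feedforward (gamma) memory: a cascade of leaky
        integrators. mu = 1 recovers an ordinary tapped delay line.
        Returns a (len(x), order + 1) array whose columns are the taps."""
        taps = np.zeros((len(x), order + 1))
        taps[:, 0] = x                                   # tap 0 is the raw input
        for n in range(1, len(x)):
            for k in range(1, order + 1):
                # leaky integration of the previous tap at the previous time step
                taps[n, k] = (1.0 - mu) * taps[n - 1, k] + mu * taps[n - 1, k - 1]
        return taps

    # Usage: expose a slow sine wave to a 5-tap gamma memory
    signal = np.sin(2 * np.pi * 0.05 * np.arange(200))
    memory_taps = gamma_memory(signal, order=5, mu=0.5)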

  9. Temporal SOM Research • There are basically two approaches to a temporal self-organizing map (SOM): memory is placed either at the input (embedding) or at the output. See: • Kangas: external memory or hierarchical maps • Chappell and Taylor, Critchley: time constant at each PE • Kohonen and Kangas: hypermap • Goppert and Rosenstiel: combined distance to input and distance to last winner

  10. SOMs for Dynamic Modeling • Principe et al. applied temporal SOMs for local nonlinear dynamical modeling. • SOMs were used to cluster the NASA Langley supersonic wind tunnel dynamics. From the SOM weights, optimal filters were derived to predict the best control strategy to keep the tunnel at the optimum operating point.

  11. SOMs for Dynamic Modeling • See also Ritter and Schulten.

  12. Biological Motivation - NO • Nitric Oxide (NO) exists in the brain • NO produced by firing neurons • NO diffuses rapidly with long half-life • NO helps control the neuron’s synaptic strength (LTP/LTD) • NO is believed to be a “diffusive messenger” • Krekelberg has shown many interesting properties.

  13. Biological Activity Diffusion • Turing's reaction-diffusion equation: a biological method of combining spatial (reaction) information with temporal (diffusion) information. • R-D equations can create wave-fronts; this requires excitable, nonlinear kinetics and relaxation after excitation. • Example: the FitzHugh-Nagumo equations (see the sketch below).
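
For reference, a minimal single-unit sketch of the FitzHugh-Nagumo kinetics in Python; the parameter values are textbook defaults rather than values from the slides, and the diffusive spatial coupling that turns this into a reaction-diffusion wave-front is only noted in the comments.

    import numpy as np

    def fitzhugh_nagumo(I=0.5, a=0.7, b=0.8, eps=0.08, dt=0.1, steps=2000):
        """Euler integration of the FitzHugh-Nagumo excitable kinetics:
            dv/dt = v - v**3/3 - w + I    (fast, excitable variable)
            dw/dt = eps * (v + a - b*w)   (slow recovery / relaxation)
        Coupling v diffusively across a lattice of such units yields the
        traveling wave-fronts of a reaction-diffusion system."""
        v, w = -1.0, 1.0
        trace = np.empty(steps)
        for n in range(steps):
            dv = v - v**3 / 3.0 - w + I
            dw = eps * (v + a - b * w)
            v, w = v + dt * dv, w + dt * dw
            trace[n] = v
        return trace

    membrane = fitzhugh_nagumo()    # relaxation oscillations for this drive I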

  14. Temporal Activity Diffusion (TAD) • Goal: create a truly distributed, spatio-temporal memory • Similar to NO diffusion, applied to the SOM outputs • Activity is "diffused" to neighboring PEs • This lowers the threshold of PEs with temporally active neighbors • Creates temporal and spatial neighborhoods

  15. SOM-TAD • Models diffusion with a traveling wave-front • Activity decays over time

  16. SOM-TAD Equations • Exponential decay of the activity at each PE • Activity creates a traveling wave (build-up) • Winner selected including the "enhancement" • Normal SOM update rule (a schematic sketch follows below)
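
The following Python sketch strings the four ingredients above into a single update step. It is schematic: the decay constant, the nearest-neighbor diffusion rule, and the enhancement gain are assumed illustrative forms, not the exact equations used in the SOM-TAD experiments.

    import numpy as np

    def som_tad_step(weights, activity, x, lr=0.1, decay=0.9, gain=0.5):
        """One schematic SOM-TAD update.
        weights: (n_pe, dim) codebook; activity: (n_pe,) diffused activity."""
        # 1. exponential decay of the activity stored at each PE
        activity = decay * activity
        # 2. activity "diffuses" to neighboring PEs (1-D ring neighborhood here),
        #    building up a traveling wave-front of lowered thresholds
        spread = 0.5 * (np.roll(activity, 1) + np.roll(activity, -1))
        activity = np.maximum(activity, spread)
        # 3. winner selection includes the temporal "enhancement": PEs whose
        #    neighbors were recently active are easier to win
        dist = np.linalg.norm(weights - x, axis=1)
        winner = int(np.argmin(dist - gain * activity))
        # 4. normal SOM update rule toward the input, plus fresh activity
        weights[winner] += lr * (x - weights[winner])
        activity[winner] += 1.0
        return winner, weights, activity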

  17. SOM-TAD Memory • TAD creates a spatially distributed memory

  18. SOM-TAD Application • Adjustable wave-front speed and width • Temporally self-organize spoken phonemes • Words "suit" and "small" • Sampled at 16 kHz, 3 bandpass filters (0.6-1.0 kHz, 1.0-3.5 kHz, and 3.5-7.4 kHz) • See also Ruwisch et al.

  19. Phoneme Organization • Figure: phoneme probabilities (s, m, a, u, t, l) with TAD vs. without TAD.

  20. Phoneme Organization Results • Figure: winners and enhancement.

  21. Plasticity • Temporal information creates plasticity in the VQ • Figure: without temporal info vs. with temporal info.

  22. Tessellation Dynamics • This demonstration shows how the GAS-TAD uses the past of the signal to anticipate the future. Note that the network has FIXED coefficients: all the dynamics that are seen come from the input and from the coupling given by the temporal diffusion memory mechanism.

  23. VQ Results • VQ=[27,27,27,27,27,27,27,27,27,27,27,27,27,27] vs. VQ=[12,12,16,16,25,25,25,25,27,27,27,27,27,27] • GAS-TAD removes noise from the signal, using temporal information to "anticipate" the next input.

  24. VQ for Speech Recognition • GAS-TAD used to VQ speech and remove noise using temporal information • 15 speakers saying the digits one through ten -- 10 for training, 5 for testing • Preprocessing (a sketch follows below): • 10 kHz sampling, 25.6 ms frames, 50% overlap • 12 liftered cepstral coefficients • Mean-filtered 3 at a time to reduce the number of input vectors
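
A hedged Python sketch of this front end; the cepstra here are computed from the FFT log-magnitude with a sinusoidal lifter, which may differ from the exact cepstral analysis used in the original experiments.

    import numpy as np

    def preprocess(speech, fs=10000, frame_ms=25.6, n_ceps=12):
        """Frame at 25.6 ms with 50% overlap, compute 12 liftered cepstral
        coefficients per frame, then mean-filter 3 consecutive frames to
        reduce the number of input vectors."""
        frame_len = int(fs * frame_ms / 1000)            # 256 samples at 10 kHz
        hop = frame_len // 2                             # 50% overlap
        window = np.hamming(frame_len)
        lifter = 1 + (n_ceps / 2) * np.sin(np.pi * np.arange(1, n_ceps + 1) / n_ceps)
        feats = []
        for start in range(0, len(speech) - frame_len + 1, hop):
            frame = speech[start:start + frame_len] * window
            spectrum = np.abs(np.fft.rfft(frame)) + 1e-10
            cepstrum = np.fft.irfft(np.log(spectrum))    # real cepstrum
            feats.append(cepstrum[1:n_ceps + 1] * lifter)
        feats = np.array(feats)
        n = (len(feats) // 3) * 3                        # mean-filter 3 at a time
        return feats[:n].reshape(-1, 3, n_ceps).mean(axis=1)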

  25. Trainable VQ

  26. Training • Each VQ trained with 10 instances of desired digit plus random vectors from other 9 digits

  27. Recognition System • An MLP with a gamma memory at the input was used for recognition • A winner-take-all over the digit outputs determines the recognized digit (schematic sketch below)
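
A schematic Python sketch of this decision stage; the layer sizes, the tanh nonlinearity, and the random weights are illustrative assumptions (in the actual system the input comes from the gamma-memory taps of the VQ output stream and the weights are trained).

    import numpy as np

    def mlp_scores(taps, W1, b1, W2, b2):
        """Forward pass of a small MLP: one score per digit."""
        hidden = np.tanh(taps @ W1 + b1)
        return hidden @ W2 + b2

    rng = np.random.default_rng(0)
    n_taps, n_hidden, n_digits = 6, 10, 10
    W1, b1 = rng.normal(size=(n_taps, n_hidden)), np.zeros(n_hidden)
    W2, b2 = rng.normal(size=(n_hidden, n_digits)), np.zeros(n_digits)

    taps = rng.normal(size=n_taps)        # stand-in for gamma-memory taps
    digit = int(np.argmax(mlp_scores(taps, W1, b1, W2, b2)))   # winner-take-all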

  28. System Performance • Compared no VQ (raw input) vs. NG VQ vs. GAS-TAD VQ • GAS-TAD VQ reduces errors by 40% and 25% compared with the two alternatives • An HMM provided 81% (small database)

  29. Conclusions • The TAD algorithm uses temporal plasticity induced by the diffusion of activity through time and space • Unique spatio-temporal memory • Dynamics that can help disambiguate the static spatial information with temporal information. Principe J., Euliano N., Garani S., "Principles and Networks for self-organization in space time", Neural Networks, Special Issue on SOMs, Aug. 2002 (in press).

  30. New paradigms for computation • Interpretation of the real world requires two basic steps: • Mapping signals into symbols • Processing symbols • For optimality, both have to be accomplished with as little error as possible.

  31. New paradigms for computation • Turing machines process symbols perfectly. But can they map signals to symbols (STS) optimally? • I submit that STS mappings should be implemented by processors that learn directly from the data, using non-convergent (chaotic) dynamics to fully utilize the time dimension for computation.

  32. New paradigms for computation • STS processors interface the infinite complexity of the external world with the finite resources of conventional symbolic information processors. • Such STS processors exist in animal and human brains, and their principles of operation are now becoming known. • This translation is not easy, as we can infer from the size of animal cortices.

  33. New paradigms for computation • Our aim (with Walter Freeman) is to construct a neuromorphic processor in analog VLSI that operates in accordance with the nonlinear (chaotic) neurodynamics of the cerebral cortex. • Besides hierarchical organization, nonlinear dynamics provides the only known mechanism that can communicate local effects over long spatial scales, and unlike hierarchy, chaos does not need any extra hardware.

  34. Freeman’s K0 model • Freeman modeled the hierarchical organization of neural assemblies using K (Katchalsky) sets • The simplest, the K0 set, is a distributed, nonlinear, two-variable dynamic model

  35. Freeman’s PE (KII) • The fundamental building block is a tetrad of K0 nodes interconnected with fixed weights. • The Freeman PE functions as an oscillator; its frequency is set by a, b and the strength of the negative feedback (see the sketch below).
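
A minimal Python sketch of a single K0 node, assuming the commonly cited second-order form x'' + (a+b)x' + ab·x = ab·I(t) followed by Freeman's asymmetric sigmoid; the rate constants and sigmoid parameter below are typical published values and should be checked against the Proc. IEEE paper cited on slide 43. A KII tetrad couples four such nodes (two excitatory, two inhibitory) with fixed weights of the appropriate signs.

    import numpy as np

    A, B, Q0 = 0.22, 0.72, 5.0       # rate constants (1/ms) and sigmoid parameter

    def asym_sigmoid(x):
        """Freeman's asymmetric sigmoid; saturates at -1 on the negative side.
        This is what a K0 node transmits to the nodes it is connected to."""
        return np.maximum(Q0 * (1.0 - np.exp(-(np.exp(x) - 1.0) / Q0)), -1.0)

    def k0_step(x, v, drive, dt=0.05):
        """One Euler step of x'' + (A+B)x' + A*B*x = A*B*drive(t)."""
        acc = A * B * (drive - x) - (A + B) * v
        return x + dt * v, v + dt * acc

    # Response of one node to a constant drive
    x, v = 0.0, 0.0
    for _ in range(400):
        x, v = k0_step(x, v, drive=1.0)
    output = asym_sigmoid(x)         # value passed on through the coupling weights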

  36. Freeman’s KII model • An area of the cortex is modeled as a layer of Freeman PEs, where the excitatory connections are trainable (adaptive). This is a set of coupled oscillators in a space-time lattice.

  37. Freeman’s KII model • How does it work? PEs oscillate at their characteristic frequency when an input is applied, and the oscillation propagates. The spatial coupling depends on the learned weights, so information is coded in the spatial amplitude pattern of quasi-sinusoidal waves.

  38. Freeman’s KII model • Figure: outputs of channels 1-20.

  39. Freeman’s KIII model • The olfactory system is modeled as a multilayer arrangement of Freeman PEs connected with dispersive delays, each layer with its own natural (noncommensurate) frequencies. End result: the system state never settles, creating a chaotic attractor with “wings”.

  40. Freeman’s KIII model • How does it work? • With no input the system is in a state of high-dimensional chaos, searching a large space. • When a known input is applied to the KII network, the system rapidly collapses to one of the low-dimensional “wings” of the attractor. • “Symbols” are coded into these transiently stable attractors.

  41. Freeman’s KIII model • Block diagram: a PG layer of P nodes, an AON layer (single KII unit), a PC layer (single KII unit), and an EC layer, built from excitatory (M1, M2) and inhibitory (G1, G2) nodes and coupled through the transfer functions f1(.)-f4(.).

  42. Freeman’s KIII model • Figure: channel outputs. All these attractors can be used as different symbols.

  43. Conclusion • Coupled nonlinear oscillators can be used as signal-to-symbol translators. • The dynamics can be implemented in mixed-signal VLSI chips to work as intelligent preprocessors for sensory inputs. • The readout of such systems is spatio-temporal and needs to be researched further. Principe J., Tavares V., Harris J., Freeman W., "Design and implementation of a biologically realistic olfactory cortex in analog VLSI", Proc. IEEE, vol. 89, no. 7, pp. 1030-1051, 2001.

  44. Information Theoretic Learning • The mean square error (MSE) criterion has been the workhorse of optimum filtering and neural networks. • We have introduced a new learning principle that applies to both supervised and unsupervised learning based on ENTROPY. • When we distill the method we see that it is based on interactions among pairs of “information particles”, which brings the possibility of using it as a principle for adaptation in highly complex systems.

  45. A Different View of Entropy • Shannon’s entropy: H_S(X) = -∫ p(x) log p(x) dx • Renyi’s entropy: H_α(X) = (1/(1-α)) log ∫ p(x)^α dx • Shannon’s entropy is the special case recovered when α → 1

  46. Quadratic Entropy • Quadratic entropy (α = 2): H_2(X) = -log ∫ p(x)² dx • Information Potential: V(X) = ∫ p(x)² dx, so H_2(X) = -log V(X) • The pdf is estimated by Parzen windowing with a (symmetric) Gaussian kernel

  47. IP as an Estimator of Quadratic Entropy • Information Potential (IP): plugging the Gaussian Parzen estimate into V(X) = ∫ p(x)² dx gives the sample estimator V(X) ≈ (1/N²) Σ_i Σ_j G(x_i - x_j, 2σ²), so H_2(X) is estimated as the negative log of this pairwise sum (sketch below).
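
A minimal Python sketch of this estimator; the kernel size sigma is a free parameter and the example data are synthetic.

    import numpy as np

    def information_potential(y, sigma=1.0):
        """Sample estimator of the quadratic information potential,
            V(Y) = (1/N^2) * sum_i sum_j G(y_i - y_j, 2*sigma^2),
        obtained by plugging a Gaussian Parzen estimate into the integral of
        p(y)^2. Quadratic Renyi entropy is then H2(Y) = -log V(Y)."""
        y = np.asarray(y, dtype=float).reshape(len(y), -1)
        d = y.shape[1]
        diff = y[:, None, :] - y[None, :, :]              # all pairwise differences
        sq = np.sum(diff**2, axis=-1)
        norm = (2.0 * np.pi * 2.0 * sigma**2) ** (-d / 2.0)
        V = norm * np.mean(np.exp(-sq / (4.0 * sigma**2)))
        return V, -np.log(V)

    samples = np.random.default_rng(0).normal(size=(100, 1))
    V, H2 = information_potential(samples, sigma=0.5)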

  48. Information Force (IF) • Between two information particles (IPTs), the force is the derivative of the Gaussian kernel with respect to the particle position: F_ij = ∂G(x_i - x_j, 2σ²)/∂x_i • The overall force on particle i is F_i = ∂V/∂x_i, the sum of all pairwise forces acting on it

  49. Entropy Criterion • Think of the IPTs as outputs of a nonlinear mapper (such as an MLP). How can we train the MLP? • Use the IF as the injected error, then apply the backpropagation algorithm (see the sketch below). • Minimization of entropy means maximization of the IP.
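
A minimal Python sketch of the information forces used as the injected error; it reuses the same Gaussian-kernel assumptions as the IP sketch above, and the update at the end only illustrates the direction of adaptation, not the full backpropagation through an MLP.

    import numpy as np

    def information_forces(y, sigma=1.0):
        """F_i = dV/dy_i, the 'information force' on output sample y_i.
        Injecting F_i in place of the usual error lets backpropagation
        maximize the IP (i.e. minimize quadratic entropy) of the outputs."""
        y = np.asarray(y, dtype=float).reshape(len(y), -1)
        n, d = y.shape
        diff = y[:, None, :] - y[None, :, :]                       # y_i - y_j
        norm = (2.0 * np.pi * 2.0 * sigma**2) ** (-d / 2.0)
        kernel = norm * np.exp(-np.sum(diff**2, axis=-1) / (4.0 * sigma**2))
        pair = kernel[:, :, None] * (-diff) / (2.0 * sigma**2)     # j pulls i toward y_j
        return 2.0 * pair.sum(axis=1) / n**2                       # shape (n, d)

    # Moving the outputs along the forces increases the IP (decreases entropy);
    # in a trained mapper this step would be propagated back to the weights.
    outputs = np.random.default_rng(1).normal(size=(50, 1))
    outputs += 0.5 * information_forces(outputs, sigma=0.5)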

  50. Implications of Entropy Learning • Note that the MLP is being adapted in unsupervised mode, using a property of its own output. • The cost function is totally independent of the mapper, so it can be applied generically. • The algorithm is O(N²).
