ICNNB2005 Plenary Speech: Visual Perceptual Learning

ICNNB2005 Plenary Speech: Visual Perceptual Learning. Zhongzhi Shi, Qingyong Li, Hong Hu, Zheng Zheng. shizz@ics.ict.ac.cn. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080, China. Zhongzhi Shi VPL-ICNNB05

Outline • Introduction • Classification-oriented sparse coding model • Selective attention model based on response saliency • Tolerance relation based granular computing model • Conclusions

Visual Pathways

Visual Pathways • Dorsal pathways analyze motion and spatial relationships between the body and visual stimuli. • Ventral pathways analyze form, with specific regions identifying colors, faces, letters and other stimuli.

Visual Information Processing

Visual perceptual learning: goals • Probe into the visual system. Visual perceptual learning should be considered an active process that embeds particular abstraction, reformulation and approximation within the abstraction framework. • Model the visual information processing mechanism: neural representation, attention mechanism. • Guide computer vision research: feature extraction, feature binding, object recognition.

Gibson’s ecological theory • Perception is entirely data driven: “non-constructivist”, or direct, perception. • Optic array: the patterns of light reaching the retina give a texture gradient, which supports depth perception. • Visual perceptions have their own ‘affordances’: salient perceptual characteristics that suggest the use of an object, e.g., an umbrella. • Open question: how could a three-dimensional image be derived from affordances?

Gestalt theory of perception • The whole visual percept is more than the sum of its parts. • Visual illusions: a visual percept can be interpreted in more than one way, therefore we must have a representation of visual information in our mind.

Empiricism (Richard Gregory) • Sensory input is chaotic, unstable, and distorted; it must be interpreted. • The perceiver generates predictions about the nature of sensory input. • Visual perception is indirect and constructive: a cognitively mediated process based on hypothesis testing.

Marr’s theory of vision • An image-processing theory of visual recognition; it is data driven. • It starts with input to the perceptual system in the form of the retinal image. • Marr then describes four different stages of visual information processing.

Marr’s theory of vision • Grey level description • Primal sketch • 2.5-dimensional sketch • 3-dimensional model sketch

Three levels of analysis: the Marrian framework for understanding complex information processing systems (Marr, 1982) • Computational theory: goals of the computation, appropriateness of the goal, general strategies • Representation/algorithm: how to represent the input and the output; algorithms for transforming from one representation to another • Implementation: how can the representation and algorithm be realized physically (architecture, hardware)?

Visual Perception • Efficient coding hypothesis: the goal of visual perception is to produce an efficient representation of the incoming signal (Attneave 1954). • How to establish a precise quantitative relationship between environmental statistics and neural processing?

Related work • Biological approach: examine the statistical properties of neural responses under natural stimulation conditions. Sparse coding and decorrelation in primary visual cortex during natural vision. Science, 287:1273-1276, Feb 2000. Retinal ganglion cells act largely as independent encoders. Nature, 411:698-701, June 2001. • Computational approach: use the statistical properties of natural images to constrain or derive a model for early sensory processing. • Sparse coding model: Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381:607-609, 1996. • Independent component analysis model: The 'independent components' of natural scenes are edge filters. Vision Research, 37(23):3327-3338, 1997.

What is sparse coding? • An image is represented by a small number of ‘active’ neurons, a_i, out of a large set. • Which neurons are active varies from one image to another. • The distribution of activity on any given unit should be peaked around zero with heavy tails. • Such a distribution has lower entropy than a Gaussian distribution of the same variance.
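
The "peaked around zero with heavy tails" property can be checked numerically through excess kurtosis. The sketch below is only an illustration: it assumes a Laplace distribution as a stand-in for a sparse unit's activity, and the helper names (`excess_kurtosis`, `laplace_sample`) are hypothetical, not from the talk.

```python
import random


def excess_kurtosis(samples):
    """Sample excess kurtosis: fourth standardized moment minus 3."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / n
    fourth = sum((x - mean) ** 4 for x in samples) / n
    return fourth / (var ** 2) - 3.0


def laplace_sample(rng, scale=1.0):
    """Zero-mean Laplace draw: an exponential magnitude with a random sign."""
    magnitude = rng.expovariate(1.0 / scale)
    return magnitude if rng.random() < 0.5 else -magnitude


rng = random.Random(0)  # fixed seed so the comparison is repeatable
gaussian = [rng.gauss(0.0, 1.0) for _ in range(20000)]
sparse = [laplace_sample(rng) for _ in range(20000)]

k_gaussian = excess_kurtosis(gaussian)  # close to 0 for a Gaussian
k_sparse = excess_kurtosis(sparse)      # clearly positive for heavy tails
print(f"Gaussian excess kurtosis: {k_gaussian:.2f}")
print(f"Laplace excess kurtosis:  {k_sparse:.2f}")
```

A clearly positive excess kurtosis is the usual numerical signature of a sparse response distribution.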

Why sparse coding? • It allows for increased storage capacity in associative memories; • It makes the structure in natural signals explicit; • It represents complex data in a way that is easier to read out at subsequent levels of processing; • It saves energy.

Sparse coding model (Field) • Olshausen pointed out that a perceptual system is exposed to a series of small image patches, drawn from one or more larger images, much like the classical receptive fields (CRFs) of neurons. • Imagine that each image patch, represented by the vector I (pixels numbered row-wise), has been formed by the linear combination of N basis functions. • The basis functions form the columns of a fixed matrix, A. • The weighting of this linear combination is given by a vector, s. Each component of this vector has its own associated basis function and represents the response value of a neuron in the visual system. • The linear synthesis model (linear superposition of basis functions) is therefore given by: I = A s.
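
A minimal sketch of the linear synthesis step, assuming the model is I = A s with the basis functions stored as the columns of A. The toy matrix, the coefficient values and the function name `synthesize` are made up for illustration.

```python
def synthesize(A, s):
    """Return the patch I = A s, where the columns of A are basis functions."""
    n_pixels = len(A)
    n_basis = len(s)
    return [sum(A[p][k] * s[k] for k in range(n_basis)) for p in range(n_pixels)]


# Toy example: a 2x2 patch (4 pixels, numbered row-wise) and N = 3 basis functions.
A = [
    [1.0, 0.0, 0.5],
    [0.0, 1.0, 0.5],
    [1.0, 0.0, -0.5],
    [0.0, 1.0, -0.5],
]
s = [2.0, 0.0, 1.0]  # sparse code: the second coefficient is inactive

I = synthesize(A, s)
print(I)  # [2.5, 0.5, 1.5, -0.5]
```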

Sparse coding model (Field) • Olshausen and Field applied two criteria to seek the optimal basis vectors and coefficients. • Sparseness cost function • Minimize the cost function
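
The slide's formulas did not survive in this transcript. As a hedged illustration, the sketch below assumes the cost takes the form used in the Olshausen and Field paper cited under Related work: squared reconstruction error plus a weighted sparseness penalty, E = ||I - A s||^2 + lambda * sum_i S(s_i / sigma), here with S(x) = log(1 + x^2). The values of `lam` and `sigma` are arbitrary.

```python
import math


def sparse_coding_cost(I, A, s, lam=0.1, sigma=1.0):
    """Reconstruction error plus a sparseness penalty on the coefficients."""
    n_pixels, n_basis = len(A), len(s)
    recon = [sum(A[p][k] * s[k] for k in range(n_basis)) for p in range(n_pixels)]
    error = sum((I[p] - recon[p]) ** 2 for p in range(n_pixels))
    penalty = sum(math.log(1.0 + (s_k / sigma) ** 2) for s_k in s)
    return error + lam * penalty


A = [[1.0, 0.0], [0.0, 1.0]]
I = [1.0, 0.0]

cost_sparse = sparse_coding_cost(I, A, [1.0, 0.0])  # exact fit, one active unit
cost_dense = sparse_coding_cost(I, A, [1.0, 0.5])   # an unneeded second active unit
print(cost_sparse, cost_dense)
```

The code with the extra active coefficient is penalized twice: it reconstructs the patch less accurately and it is less sparse.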

Classification-oriented Sparse Coding Model for Pattern Classification • The sparse coding model only states how information should be represented efficiently. • What information should be represented is more important for a visual perception task. • Computations in the early visual cortex are rather interactive and plastic, subject to influence from perceptual inference, task requirements and behavioral experience.

Computations in the early visual cortex (Lee) • A feedforward, one-layer network is limited. • Various levels in cognitive and sensory systems have to work together interactively. • A multi-layer model integrates the unsupervised sparse coding principle with supervised feedback.

Classification-oriented Sparse Coding (COSC) Model for Pattern Classification: goals of the COSC model • Sparseness of the coefficients • Discriminability for the pattern classification task • Combining unsupervised and supervised learning

Classification-oriented Sparse Coding Model for Pattern Classification • Training pattern sets and coefficients • Distance between two coefficient vectors • Within-class distance measure • Between-class distance measure: the distance between a coefficient vector and the center of the class which excludes the vector.

Classification-oriented Sparse Coding Model for Pattern Classification • Fisher-like discriminant distance • Cost function
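
The distance formulas are not present in this transcript, so the following is only one plausible reading, not the authors' definition. It assumes Euclidean distance, takes the within-class distance of a vector as its distance to the center of its own class computed without that vector, takes the between-class distance as its distance to the centers of the other classes, and forms a Fisher-like ratio of the two sums. All function names are hypothetical, and each class needs at least two vectors.

```python
import math


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def center(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def fisher_like_ratio(classes):
    """Ratio of summed within-class to summed between-class distances.

    `classes` maps a label to a list of coefficient vectors.
    Smaller values mean tighter, better separated classes.
    """
    within = 0.0
    between = 0.0
    for label, vectors in classes.items():
        for i, s in enumerate(vectors):
            # Own-class center, leaving the vector itself out (assumption).
            others = vectors[:i] + vectors[i + 1:]
            within += euclidean(s, center(others))
            # Centers of every other class (assumption).
            for other_label, other_vectors in classes.items():
                if other_label != label:
                    between += euclidean(s, center(other_vectors))
    return within / between


separated = {"a": [[0.0, 0.0], [0.0, 1.0]], "b": [[5.0, 0.0], [5.0, 1.0]]}
overlapping = {"a": [[0.0, 0.0], [5.0, 1.0]], "b": [[5.0, 0.0], [0.0, 1.0]]}
ratio_separated = fisher_like_ratio(separated)
ratio_overlapping = fisher_like_ratio(overlapping)
print(ratio_separated, ratio_overlapping)
```

Adding such a ratio to a sparse coding cost would favor coefficients that are both sparse and class-discriminative, which matches the stated goals of the COSC model.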

Classification-oriented Sparse Coding Model for Pattern Classification: learning process by optimization • Objective function • Two nested stages. Inner stage: minimize E with respect to s, with A fixed, by the conjugate gradient method. Outer stage: minimize E with respect to A by the gradient descent method.
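
The two nested stages can be sketched as an alternating minimization. This is a simplified illustration under stated assumptions: E is taken to be reconstruction error plus a log(1 + s^2) sparseness penalty only (the discriminative term is left out), the inner stage uses plain gradient steps as a stand-in for conjugate gradient, and no constraint is placed on the norm of the basis functions. Step sizes, data and function names are made up.

```python
import math


def reconstruct(A, s):
    return [sum(a * c for a, c in zip(row, s)) for row in A]


def cost(patches, A, codes, lam):
    total = 0.0
    for I, s in zip(patches, codes):
        residual = [x - y for x, y in zip(I, reconstruct(A, s))]
        total += sum(r * r for r in residual)
        total += lam * sum(math.log(1.0 + c * c) for c in s)
    return total


def inner_stage(I, A, s, lam, lr=0.05, steps=50):
    """Minimize E over s with A fixed (plain gradient steps, not CG)."""
    s = list(s)
    for _ in range(steps):
        residual = [x - y for x, y in zip(I, reconstruct(A, s))]
        for k in range(len(s)):
            grad = -2.0 * sum(A[p][k] * residual[p] for p in range(len(I)))
            grad += lam * 2.0 * s[k] / (1.0 + s[k] * s[k])
            s[k] -= lr * grad
    return s


def outer_stage(patches, A, codes, lr=0.02):
    """One gradient descent step on A with the codes fixed."""
    grad = [[0.0] * len(A[0]) for _ in A]
    for I, s in zip(patches, codes):
        residual = [x - y for x, y in zip(I, reconstruct(A, s))]
        for p in range(len(A)):
            for k in range(len(s)):
                grad[p][k] += -2.0 * residual[p] * s[k]
    return [[A[p][k] - lr * grad[p][k] for k in range(len(A[0]))]
            for p in range(len(A))]


patches = [[1.0, 0.0], [0.0, 1.0], [0.8, 0.6]]
A = [[0.9, 0.3], [0.2, 0.8]]
codes = [[0.0, 0.0] for _ in patches]
lam = 0.1

initial_cost = cost(patches, A, codes, lam)
for _ in range(20):
    codes = [inner_stage(I, A, s, lam) for I, s in zip(patches, codes)]
    A = outer_stage(patches, A, codes)
final_cost = cost(patches, A, codes, lam)
print(initial_cost, final_cost)
```

The codes are warm-started from the previous outer iteration, so each inner stage only has to track the small change in A.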

Experiment data set and preprocessing • Whitening/low-pass filter • Experiment data set

Experiment result • The set of 144 basis functions learned by the COSC. All have been normalized to fill the grey scale, but with zero always represented by the same grey level.

Experiment result: sparseness performance

Experiment result • Simple classifier • Reconstruction error comparison • Classification performance comparison

Summary • The COSC model can code class-specific features. • The COSC coefficients notably improve classification accuracy without noticeably degrading reconstruction error or sparseness. • The COSC model is an interactive and plastic model supervised by the visual perception task. Task-oriented Sparse Coding Model for Pattern Classification. Lecture Notes in Computer Science, Vol. 3610/2005, pp. 903-914. Learning Sparse and Discriminative Structures in Natural Images for Visual Classification. Submitted to Network: Computation in Neural Systems.

Attention-guided visual sparse coding model • The number of coefficients with large values produced by the sparse coding model is still large relative to the computation capacity of neurons, even though the kurtosis of every response coefficient is high. • A typical scene within a neuron’s classical receptive field (CRF) contains many different patterns, which compete for neural representation because of the limited processing capacity of neurons in the visual system. • The visual attention mechanism is an active strategy in the brain's information processing.

Attention-guided visual sparse coding model: general model • The first attention module transforms the image into a ‘retinal image’ by nonuniformly sampling the input visual stimuli. • The second attention module performs selective attention based on response saliency. • The diagram of the model.

Nonuniform sampling model • The density of photoreceptors in the retina is greatest in the central area (fovea) and decreases toward the retinal periphery. • The resolution of the image representation in the visual cortex is highest for the part of the image projected onto the fovea and decreases rapidly with distance from the fovea center. • Vision sampling model

Nonuniform sampling model • Recursive computation of the Gaussian-like convolution

Nonuniform sampling model • Representation of the input image patch: within the central circle the pixels are fully sampled, just as in the original image; within the first ring surrounding the central circle the resolution is lower; and within the third circle the resolution is lowest.
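
A rough sketch of this three-region sampling. It is not the authors' implementation: block averaging is used as a simple stand-in for the Gaussian-like convolution, and the radii, block sizes and function names (`block_average`, `foveate`) are assumptions chosen for a small 8x8 example.

```python
import math


def block_average(image, block):
    """Replace every block x block tile by its mean (coarser resolution)."""
    size = len(image)
    out = [[0.0] * size for _ in range(size)]
    for r0 in range(0, size, block):
        for c0 in range(0, size, block):
            rows = range(r0, min(r0 + block, size))
            cols = range(c0, min(c0 + block, size))
            mean = sum(image[r][c] for r in rows for c in cols) / (len(rows) * len(cols))
            for r in rows:
                for c in cols:
                    out[r][c] = mean
    return out


def foveate(image, r_center, r_ring):
    """Full resolution near the center, coarser in two surrounding regions."""
    size = len(image)
    mid = (size - 1) / 2.0
    medium = block_average(image, 2)
    coarse = block_average(image, 4)
    out = []
    for r in range(size):
        row = []
        for c in range(size):
            d = math.hypot(r - mid, c - mid)  # distance from the patch center
            if d <= r_center:
                row.append(image[r][c])   # central circle: original pixels
            elif d <= r_ring:
                row.append(medium[r][c])  # first ring: 2x2 averages
            else:
                row.append(coarse[r][c])  # outside: 4x4 averages
        out.append(row)
    return out


patch = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
retinal = foveate(patch, r_center=1.5, r_ring=3.0)
print(retinal[3][3], retinal[3][1], retinal[0][0])  # 27.0 20.5 13.5
```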

Selective attention model • Definition: response saliency is the extent of a neuron's response compared with that of a group of neurons which respond to the same stimulus. • The purpose of response saliency is to represent the conspicuity of every neuron at the same perception level for a stimulus, and to guide the selection of attended neurons based on the response saliency value. • A neuron response with a large response saliency value is chosen for further processing; conversely, a neuron with a small value is omitted.

Selective attention model • Every such pattern is selective for location, orientation and frequency. • Location: center of the excitatory subregion • Orientation: angle (in degrees) between the x-axis and the major axis of the ellipse • Frequency: area of the excitatory subregion

Selective attention model • Discrepancy between A_i and S • Response saliency (RS) value

Selective attention model: selection strategies • Threshold selection mechanism (TSM): a threshold filtering algorithm • Proportion selection mechanism (PSM): a bottleneck filtering algorithm
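
A minimal sketch of the two selection strategies, assuming each coefficient already has a response saliency value. The formula for response saliency is not given in this transcript, so the saliency numbers below are invented, and the function names are hypothetical.

```python
def threshold_selection(coefficients, saliency, threshold):
    """TSM sketch: keep coefficients whose saliency reaches the threshold."""
    return [c if rs >= threshold else 0.0 for c, rs in zip(coefficients, saliency)]


def proportion_selection(coefficients, saliency, proportion):
    """PSM sketch: keep the top `proportion` of coefficients ranked by saliency."""
    n_keep = int(round(proportion * len(coefficients)))
    ranked = sorted(range(len(coefficients)), key=lambda i: saliency[i], reverse=True)
    keep = set(ranked[:n_keep])
    return [c if i in keep else 0.0 for i, c in enumerate(coefficients)]


coefficients = [0.9, -0.1, 0.4, 0.05, -0.7]
saliency = [0.8, 0.1, 0.5, 0.05, 0.6]  # made-up response saliency values

tsm = threshold_selection(coefficients, saliency, threshold=0.5)
psm = proportion_selection(coefficients, saliency, proportion=0.4)
print(tsm)  # [0.9, 0.0, 0.4, 0.0, -0.7]
print(psm)  # [0.9, 0.0, 0.0, 0.0, -0.7]
```

TSM lets the number of surviving coefficients vary with the stimulus, while PSM fixes it in advance, which is why it behaves like a bottleneck.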

Simulation results • Histogram of the coefficients in the model for an input image patch. (a) The original response coefficients produced by sparse coding. (b) The response saliency values. (c) The response coefficients after visual attention, keeping the top 40% of response coefficients sorted by response saliency in descending order.

Simulation results • The input image patch and the reconstructed images. The first column is the original image; the second column is the image reconstructed from the full set of coefficients produced by sparse coding; the third column is the image reconstructed from the coefficients selected by this model.

Simulation results • Reconstruction errors of the sparse coding model (SC) and attention-guided sparse coding model (AGSC)

Summary • This model includes a nonuniform sampling module and a saliency-based, data-driven module, within the framework of the efficient coding hypothesis. • This model markedly reduces the number of activated coefficients for an input stimulus while retaining the essential visual information. • This model designs and implements an active and efficient mechanism that adapts to limited computation capability and improves the efficiency of sparse coding. A Model of Attention-guided Visual Sparse Coding. In Proc. IEEE International Conference on Cognitive Informatics, pp. 98-104. California, USA, 2005.

Outline • Introduction • Classification-oriented sparse coding model • Selective attention model based on response saliency • Tolerance Relation Based Granular Space Model • Conclusions

What is Granular Computing? • “There are three basic concepts that underlie human cognition: granulation, organization and causation. • Informally, granulation involves decomposition of whole into parts; • Organization involves integration of parts into whole; • Causation involves association of causes with effects. • Granulation of an object A leads to a collection of granules of A, with a granule being a clump of points (objects) drawn together by indistinguishability, similarity, proximity or functionality” (Zadeh 1997)

What is Granular Computing? • An umbrella term to cover any theories, methodologies, techniques, and tools that make use of granules in problem solving. • A subset of the universe is called a granule in granular computing. • Basic ingredients of granular computing are subsets, classes, and clusters of a universe.

Motivation of Tolerance Relation Based Granular Computing Model • In 1962, Zeeman proposed that cognitive activities can be viewed as a kind of tolerance space in a function space. • Tolerance spaces, constructed from tolerance relations based on distance functions, were used by Zeeman for stability analysis of dynamic systems. • Tolerance spaces based on distance functions are developed here for the modeling and analysis of information granulation.
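
A small sketch of a distance-based tolerance relation, assuming the common construction in which two objects are related when their distance does not exceed a threshold epsilon. Such a relation is reflexive and symmetric but need not be transitive. The names `tolerant`, `tolerance_granule` and `abs_distance` are hypothetical, as are the numbers.

```python
def tolerant(x, y, distance, epsilon):
    """x and y are in the tolerance relation if they are within epsilon."""
    return distance(x, y) <= epsilon


def tolerance_granule(x, universe, distance, epsilon):
    """All objects of the universe that are tolerant with x."""
    return [y for y in universe if tolerant(x, y, distance, epsilon)]


def abs_distance(x, y):
    return abs(x - y)


universe = [0.0, 0.4, 0.8, 2.0]
granule = tolerance_granule(0.4, universe, abs_distance, epsilon=0.5)
print(granule)  # [0.0, 0.4, 0.8]

# Reflexive and symmetric, but not transitive:
near_a = tolerant(0.0, 0.4, abs_distance, 0.5)  # True
near_b = tolerant(0.4, 0.8, abs_distance, 0.5)  # True
far = tolerant(0.0, 0.8, abs_distance, 0.5)     # False
print(near_a, near_b, far)
```

Because transitivity fails, the granules of different objects may overlap instead of partitioning the universe.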

Motivation of Tolerance Relation Based Granular Computing Model • The entities on the data layer processed by granular computing usually belong to one of two types: symbolic data or continuous real-valued data. • Most of the models and methods of granular computing treat symbolic data and continuous real-valued data separately.

Motivation of Tolerance Relation Based Granular Computing Model • Symbolic features and real-valued features can be generated from each other by feature extraction, feature reduction, classification, discretization, etc. • So we try to construct a uniform granular computing model to study some important problems in pattern recognition and machine learning, such as feature extraction, feature reduction, discretization and classification.
