  1. Developmental Constraints Aid the Acquisition of Binocular Sensitivities by Melissa Dominguez and Robert A. Jacobs. Class presentation for CogSci260, Spring 2004. Hyundo Kim, Natural Computation Group

  2. Developmental Progression 1
  • Human infants are born with limited perceptual, motor, and cognitive abilities relative to adults.
  • These apparent inadequacies may be helpful, perhaps even necessary, stages in development.
  • Limited mental abilities may reflect simple neural representations that are useful stepping-stones or building blocks for the subsequent development of more complex representations (Turkewitz and Kenney, 1982).

  3. Developmental Progression 2
  • Early developmental stages are useful or necessary precursors to more advanced stages.
  • This is sometimes thought of as a 'bootstrapping strategy'.
  • The developmental schedule for the maturation of different brain regions is staggered such that neural systems that develop relatively early provide a suitable framework for the development of later experience-sensitive systems.
  • Newport (1990) hypothesized that children use a bootstrapping strategy when attempting to learn a language ("less is more").

  4. Developmental Progression 3
  • Elman (1993) showed that a recurrent neural network whose memory capacity was initially limited but then gradually increased during the course of training learned aspects of an artificial grammar better than a network whose memory capacity was never limited.
  • Elman claimed that starting small is important to the subsequent acquisition of complex mental abilities.
  • This article considers the hypothesis that systems learning aspects of visual perception may benefit from the use of suitably designed developmental progressions during training.

  5. Developmental Progression 4
  The authors apply this idea to:
  • Binocular disparity sensitivity
  • Motor learning
  • Motion velocity sensitivity

  6. Infant Development
  • Experimental data from developmental psychology show that newborns' visual acuity is very poor, about 1 or 2 cycles per degree (20/400), whereas adult acuity is typically 30 cycles per degree of arc (20/20).
  • Acuity improves approximately linearly from 20/400 to 20/20 over about eight months.
  • Infants are acquiring other visual abilities during this time period.
  • Sensitivity to binocular disparities appears at an age of about 4 months.
  • The authors speculate that poor visual acuity aids in the acquisition of disparity sensitivity.

  7. Basic Structure
  • Figure 1: The developmental and non-developmental models shared a common structure. The bottom portion of this structure is the left and right retinal images. These images are filtered by binocular energy filters, and the outputs of these filters are the inputs to an artificial neural network that is trained to estimate the disparity present in the images. The model illustrated here is model C2M, in which low-spatial-frequency information was received during early stages of training and information at higher frequencies was added as training progressed.

  8. Four Models (Systems)
  Four models of how to supply the visual information were compared:
  • C2M (Coarse-scale-To-Multiscale model): a developmental sequence in which the system was exposed only to low-spatial-frequency information at the start of training, and information at higher spatial frequencies was added to its input as training progressed.
  • F2M (Fine-scale-To-Multiscale model): similar to the above, except that high-spatial-frequency information was supplied first.
  • RD (Random-developmental model): input was randomly selected at each stage of training.
  • ND (Non-developmental model): received information at all spatial frequencies throughout the training period.

  9. Inputs to the System
  • Left and right retinal images were filtered with binocular energy filters tuned to various spatial frequencies.
  • Figure 5: Gabor filters tuned to a low spatial frequency (λ is the wavelength). The right-eye Gabor is phase-shifted by d units relative to the left-eye Gabor. The sum of the filter responses tends to be large when the disparity in the images is d, d − λ, or d + λ. Because the wavelength λ is relatively large, the sum can peak at input disparities (e.g., d + λ) that are far from the disparity (d) to which the sum was thought to be responsive. A similar situation holds for filters tuned to high spatial frequencies, but there the wavelength λ is smaller and the false peaks occur at input disparities that are significantly closer to the disparity to which the sum was thought to be responsive. Consequently, false peaks are more misleading when using filters tuned to low frequencies than when using filters tuned to high frequencies.
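
  The false-peak behavior described above is easy to reproduce numerically. The sketch below is not from the paper; the filter parameters and the grating stimulus are my own assumptions. It sums the responses of a left-eye Gabor and a right-eye Gabor offset by d while the image disparity D of a grating is varied; the summed response peaks at D = d and again at the false disparities D = d − λ and D = d + λ.

```python
import numpy as np

x = np.linspace(-40, 40, 4001)     # spatial axis (arbitrary units)
lam, sigma, d = 10.0, 8.0, 2.0     # wavelength, envelope width, filter offset (assumed values)

def gabor(x, shift=0.0):
    """Even-symmetric Gabor shifted by `shift` along the spatial axis."""
    return np.exp(-(x - shift) ** 2 / (2 * sigma ** 2)) * np.cos(2 * np.pi * (x - shift) / lam)

left_filter = gabor(x)             # left-eye Gabor
right_filter = gabor(x, shift=d)   # right-eye Gabor, offset by d

disparities = np.linspace(-15.0, 15.0, 301)
responses = []
for D in disparities:
    left_img = np.cos(2 * np.pi * x / lam)          # grating seen by the left eye
    right_img = np.cos(2 * np.pi * (x - D) / lam)   # same grating, displaced by D
    responses.append(left_filter @ left_img + right_filter @ right_img)
responses = np.array(responses)

# Local maxima of the summed response: expect peaks near d - lam, d, d + lam (-8, 2, 12).
is_peak = np.r_[False, (responses[1:-1] > responses[:-2]) & (responses[1:-1] > responses[2:]), False]
print(disparities[is_peak])
```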

  10. Gabor Response
  • [Slide figure: the real (even) and imaginary (odd) components of the Gabor filter response.]
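
  The formulas on this slide did not survive in the transcript; the standard even/odd (quadrature-pair) 1-D Gabor functions the slide presumably showed are:

```latex
% Standard quadrature pair of 1-D Gabor functions (textbook form; the slide's
% exact notation is not preserved in the transcript):
g_{\text{even}}(x) = \exp\!\left(-\frac{x^{2}}{2\sigma^{2}}\right)\cos\!\left(\frac{2\pi x}{\lambda} + \phi\right),
\qquad
g_{\text{odd}}(x) = \exp\!\left(-\frac{x^{2}}{2\sigma^{2}}\right)\sin\!\left(\frac{2\pi x}{\lambda} + \phi\right)
```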

  11. Binocular Energy Model
  • [Slide figure: the real (even) and imaginary (odd) filter responses combined in the binocular energy model.]
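
  This slide's equations are likewise missing from the transcript. In the standard binocular energy model (Fleet, Wagner, & Heeger, 1996, cited on slide 26), a binocular complex cell squares and sums the even and odd binocular simple-cell responses:

```latex
% Binocular energy response: L and R are the left- and right-eye linear (simple-cell)
% responses of the even and odd Gabor filters, with the right-eye filter position-
% or phase-shifted by the cell's preferred disparity.
E = \left(L_{\text{even}} + R_{\text{even}}\right)^{2} + \left(L_{\text{odd}} + R_{\text{odd}}\right)^{2}
```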

  12. Prediction
  • If too much information leads a learning system in its early stages of training to form poor internal representations, then the developmental models ought to have an advantage over the non-developmental model.
  • The non-developmental model had a greater number of inputs than the developmental models during the early stages of training, and thus a greater number of modifiable weights.
  • Because learning in neural networks is a search in weight space, the non-developmental model needed to perform an unconstrained search in a high-dimensional weight space. Unconstrained searches frequently lead to the acquisition of poor representations.

  13. Prediction (continued)
  • When comparing the performance of the two developmental models whose stages are based on spatial frequency content (C2M and F2M), one would predict that model C2M should perform best. Motivation for this prediction comes from the field of computer vision.
  • Consider the task of aligning two images of a scene that differ due to a small horizontal offset in their viewpoints (the stereo correspondence problem). If you have never done this before, you might try to align the fine details of each image.
  • If the images are highly textured, this approach is inappropriate, since there are many potential alignments.
  • If, however, the images are blurred, the fine details that caused the confusion are removed, and the problem becomes much simpler (see the sketch after this slide).
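
  A minimal 1-D sketch of that coarse-to-fine idea (my own illustration, not code from the paper): blur both signals heavily, find the best alignment by correlation over all candidate shifts, then refine the estimate on the unblurred signals in a small window around the coarse answer.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def best_shift(left, right, candidates):
    """Return the candidate shift of `right` that best correlates with `left`."""
    scores = [np.dot(left, np.roll(right, s)) for s in candidates]
    return candidates[int(np.argmax(scores))]

def coarse_to_fine_shift(left, right, max_shift=20):
    # Coarse pass: heavy blur removes fine texture, leaving few plausible alignments.
    lb, rb = gaussian_filter1d(left, sigma=8.0), gaussian_filter1d(right, sigma=8.0)
    coarse = best_shift(lb, rb, list(range(-max_shift, max_shift + 1)))
    # Fine pass: search only a small window around the coarse estimate, at full detail.
    return best_shift(left, right, list(range(coarse - 2, coarse + 3)))

rng = np.random.default_rng(0)
scene = rng.standard_normal(512)                          # highly textured 1-D "image"
true_shift = 7
left_img, right_img = scene, np.roll(scene, -true_shift)  # right view displaced by 7
print(coarse_to_fine_shift(left_img, right_img))          # expect 7
```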

  14. Prediction (continued)
  • Figure 1: (A) One image from a stereo pair depicting a textured object and background. Due to the fine details in the stereo pair, solving the stereo correspondence problem is relatively difficult. (B) The image from A, blurred so as to remove the fine details. It is easier to solve the stereo correspondence problem when the stereo pair is blurred because image features tend to be larger, less numerous, and more robust to noise.

  15. Note!
  • Computer vision researchers use a coarse-to-fine strategy when searching for correspondences in individual pairs of images.
  • Here, the authors used a coarse-scale-to-multiscale developmental sequence while training a learning system to detect binocular disparities using many pairs of images.
  • The connection between the two is NOT obvious.

  16. Method 1
  • 35 receptive field locations receiving input from overlapping regions of the retina.
  • At each location, 30 complex cells corresponding to 3 spatial frequencies and 10 phase offsets at each frequency.
  • The outputs of the complex cells were normalized using a softmax nonlinearity.
  • As a result of the normalization, complex cells respond to relative contrast rather than absolute contrast (see the sketch below).
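
  A minimal sketch of that normalization step (a stand-in; the paper's exact softmax parameters are not given in the transcript, and the gain `beta` is an assumption), applied to the 30 complex-cell energies at one receptive field location:

```python
import numpy as np

def softmax_normalize(energies, beta=1.0):
    """Softmax-normalize the complex-cell energies at one receptive field
    location; `beta` is an assumed gain parameter. The outputs sum to 1,
    so the population signals relative rather than absolute activation."""
    z = np.exp(beta * (energies - energies.max()))   # subtract max for numerical stability
    return z / z.sum()

# 30 cells per location: 3 spatial frequencies x 10 phase offsets.
energies = np.abs(np.random.default_rng(1).standard_normal(30))
normalized = softmax_normalize(energies)
print(normalized.sum())   # 1.0
```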

  17. Method 2
  • The normalized outputs of the complex cells were the inputs to an artificial neural network.
  • The network had 1050 (35 × 30) input units. The hidden layer had 32 units organized into 8 groups of 4 units each.
  • Connectivity to the hidden units was set so that each group had a limited receptive field: a group of hidden units received inputs from seven receptive field locations at the complex cell level.
  • Hidden units used a logistic activation function.
  • The output layer consisted of a single linear unit; this unit's output was an estimate of the disparity. (A structural sketch follows.)
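
  A structural sketch of that architecture in plain NumPy (my own reconstruction; in particular, the exact tiling of the seven-location group receptive fields is not given in the transcript and is assumed here):

```python
import numpy as np

N_LOCATIONS, CELLS_PER_LOC = 35, 30   # 35 locations x 30 cells = 1050 inputs
N_GROUPS, UNITS_PER_GROUP = 8, 4      # 32 hidden units in 8 groups of 4
RF_SIZE = 7                           # each group sees 7 receptive field locations

# Hypothetical tiling: group g watches locations 4g .. 4g+6 (the paper's exact
# layout is not given in the transcript).
GROUP_LOCS = [np.arange(4 * g, 4 * g + RF_SIZE) for g in range(N_GROUPS)]

def forward(x, W_h, b_h, w_out, b_out):
    """Map complex-cell outputs x (shape 35 x 30) to a scalar disparity estimate."""
    hidden = []
    for g in range(N_GROUPS):
        patch = x[GROUP_LOCS[g]].ravel()   # this group's 7-location input window
        hidden.append(1.0 / (1.0 + np.exp(-(W_h[g] @ patch + b_h[g]))))  # logistic units
    return w_out @ np.concatenate(hidden) + b_out   # single linear output unit

rng = np.random.default_rng(0)
W_h = rng.normal(0.0, 0.01, (N_GROUPS, UNITS_PER_GROUP, RF_SIZE * CELLS_PER_LOC))
b_h = np.zeros((N_GROUPS, UNITS_PER_GROUP))
w_out, b_out = rng.normal(0.0, 0.01, N_GROUPS * UNITS_PER_GROUP), 0.0
print(forward(rng.random((N_LOCATIONS, CELLS_PER_LOC)), W_h, b_h, w_out, b_out))
```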

  18. Method 3
  • The weights of the ANN were initialized to small random values and were adjusted during training to minimize a sum-of-squared-error cost function using a conjugate gradient optimization procedure.
  • Weight sharing was used at the hidden layer, such that units within each group had the same incoming and outgoing weight values, and a hidden unit had the same set of weight values from each receptive field location at the complex cell level.
  • Models were trained and tested using separate sets of training and test data items.
  • The training set contained 250 randomly generated data items; the test set contained 122 data items generated so as to cover the range of possible binocular disparities uniformly.
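
  A toy version of that training setup (dimensions shrunk and weight sharing omitted; `scipy.optimize.minimize` with the 'CG' method stands in for the paper's conjugate gradient procedure, and the data here are random stand-ins):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
X = rng.standard_normal((250, 12))                                 # stand-in inputs (250 training items)
y = X @ rng.standard_normal(12) + 0.1 * rng.standard_normal(250)   # stand-in disparities

def unpack(theta, n_hidden=5, n_in=12):
    W = theta[: n_hidden * n_in].reshape(n_hidden, n_in)
    v = theta[n_hidden * n_in :]
    return W, v

def sse(theta):
    """Sum-of-squared-error cost of a tiny logistic-hidden, linear-output net."""
    W, v = unpack(theta)
    hidden = 1.0 / (1.0 + np.exp(-(X @ W.T)))   # logistic hidden layer
    pred = hidden @ v                           # single linear output unit
    return np.sum((pred - y) ** 2)

theta0 = 0.01 * rng.standard_normal(5 * 12 + 5)                        # small random initial weights
result = minimize(sse, theta0, method="CG", options={"maxiter": 35})   # stop early, as in the paper
print(result.fun)
```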

  19. Method 4
  • Training was terminated after 35 iterations to avoid overfitting of the training data.
  • The training period was divided into 3 stages. The first and second stages were each 10 iterations; the third stage was 15.
  • During the first stage (for C2M), the ANN received only the outputs of the complex cells tuned to low spatial frequencies (the outputs of the other complex cells were set to zero).
  • In the second stage (C2M), it received low and medium spatial frequencies.
  • In the third stage (C2M), it received the outputs of all complex cells. (A masking sketch follows.)
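
  The stage-dependent input can be implemented by zeroing the disallowed frequency bands, as in this sketch (the column convention assigning cells to bands is hypothetical, not from the paper):

```python
import numpy as np

# Hypothetical column convention: at each of the 35 locations, cells 0-9 are the
# low-, 10-19 the medium-, and 20-29 the high-spatial-frequency complex cells.
FREQ_BANDS = {"low": slice(0, 10), "mid": slice(10, 20), "high": slice(20, 30)}
STAGE_BANDS = {1: ("low",), 2: ("low", "mid"), 3: ("low", "mid", "high")}

def c2m_input(x, stage):
    """Zero out the complex-cell outputs the C2M model does not yet receive."""
    masked = np.zeros_like(x)
    for band in STAGE_BANDS[stage]:
        masked[:, FREQ_BANDS[band]] = x[:, FREQ_BANDS[band]]
    return masked

x = np.random.default_rng(2).random((35, 30))   # complex-cell outputs, 35 locations
stage1 = c2m_input(x, stage=1)
print(bool(stage1[:, 10:].any()))   # False: medium- and high-frequency cells are zeroed
```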

  20. Data Sets
  • The performance of the four models was evaluated on the three data sets shown on the slide.
  • The data sets were gray-scale images with luminance values between 0 and 1 and disparities between 0 and 3 pixels.
  • Ten simulations of each model on each data set were conducted.
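
  The transcript does not describe the three data sets themselves (only the "solid object" set is named later), but a generic way to generate stereo training items with the stated luminance and disparity ranges looks like this (purely illustrative, not the paper's generator):

```python
import numpy as np

def make_stereo_pair(width=64, max_disp=3, rng=None):
    """Generate a 1-D gray-scale stereo pair (luminance in [0, 1]) whose
    pattern is shifted by an integer disparity in [0, max_disp].
    A random-dot-style stand-in for the paper's actual data sets."""
    rng = rng or np.random.default_rng()
    disparity = int(rng.integers(0, max_disp + 1))
    left = rng.random(width)            # luminance values between 0 and 1
    right = np.roll(left, disparity)    # displace the pattern by the disparity
    return left, right, disparity

left, right, d = make_stereo_pair(rng=np.random.default_rng(3))
print(d, np.allclose(left, np.roll(right, -d)))   # the shift is recoverable
```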

  21. Result 1
  • Figure 4: The four models' RMSE on the test set data items after training on the three data sets (the error bars give the standard error of the mean).

  22. Result 2
  • Figure 6: Learning curves for the four models on the three data sets.

  23. Result 3
  • Figure 7: Performance of models C2M, F2M, and ND on images with disparities of different sizes (small, midsize, and large) at the end of each developmental stage (stages 1, 2, and 3). Training and test data items came from the solid object data set.

  24. Result 4
  • Figure 8: RMSE of models C2M, F2M, C-CF-CMF, and F-CF-CMF on the test items from the solid object data set.

  25. Conclusion
  • C2M and F2M consistently outperformed the ND and RD models.
  • The authors therefore concluded that suitably designed developmental sequences can be useful to systems learning to detect binocular disparities, in accord with the less-is-more view of development.
  • The idea that visual development can aid visual learning is a viable hypothesis to be studied.
  • In neural computation, it is well known that systems learn best when they are suitably constrained through the use of domain knowledge.
  • The design of appropriate developmental progressions through the use of domain knowledge provides researchers with an effective means of biasing their learning systems so as to enhance their performance.

  26. References
  • Dominguez, M., & Jacobs, R. A. (2003). Developmental constraints aid the acquisition of binocular disparity sensitivities. Neural Computation, 15, 161–182.
  • Ivanchenko, V., & Jacobs, R. A. (2003). A developmental approach aids motor learning. Neural Computation, 15, 2051–2065.
  • Jacobs, R. A., & Dominguez, M. (2003). Visual development and the acquisition of motion velocity sensitivities. Neural Computation, 15, 761–781.
  • Haykin, S. (1999). Neural Networks: A Comprehensive Foundation (2nd ed.). Prentice Hall.
  • Fleet, D. J., Wagner, H., & Heeger, D. J. (1996). Neural encoding of binocular disparity: Energy models, position shifts, and phase shifts. Vision Research, 36, 1839–1857.
  • Stürzl, W., Hoffmann, U., & Mallot, H. A. (2002). Vergence control and disparity estimation with energy neurons: Theory and implementation. Proc. of ICANN, LNCS 2415, 1255–1260.
