Visual input processing

Presentation Transcript


  1. Cross-sensorial processing – MED7
  Visual input processing
  Lecturer: Smilen Dimitrov

  2. Introduction
  • The immobot base exercise
  • Work on the visual input
  • Goal – object localization in 3D
  • Setup:
    • PC
    • Two Logitech QC Zoom webcams

  3. Setup
  • Setup for a PC:
    • Logitech QuickCam (QC) drivers
    • QuickTime
    • WinVDig (corresponding to the installed version of QuickTime)
    • Max/MSP/Jitter

  4. Setup
  • Camera parameters:
    • Image sensor: 1/4” color 640 x 480 pixel CMOS
    • Lens type: 3P
    • F/#: F/2.4
    • Effective focal length: 5.0 mm

  5. Setup
  • Low-tech configuration – stereo imaging not guaranteed (frame delays)
  • Other options:
    • Bumblebee camera
      • True stereo camera
      • FireWire (power issues, drivers)
    • Axis 206 camera
      • IP camera (drivers)

  6. Goal of the vision processing algorithm
  • Object detection: the application needs to detect the presence of a new object whenever it enters the monitored environment.
  • Object recognition: once a new object is detected, it needs to be classified to determine its type (e.g., a car versus a truck, a tiger versus a deer).
  • Object tracking: assuming the new object is of interest to the application, it can be tracked as it moves through the environment. Tracking involves computing the current location of the object and its trajectory.
  • Color tracking
  • Estimation of 3D location through two-view geometry – stereopsis

  7. Goal of the vision processing algorithm

  8. Color tracking
  • Using an algorithm provided with Max/MSP/Jitter – jit.findbounds
  • Input – min and max of the color range to react to, and video
  • Output – min and max (x, y) coordinates of the rectangle where the color has been found

  9. Color tracking
  • jit.findbounds output – a rectangle
  • Center coordinate of the rectangle (a sketch follows)
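A minimal sketch of this step for Max's js object (the patch wiring is an assumption, not code from the slides): the two coordinate lists that jit.findbounds sends out are received on two inlets, and the rectangle's midpoint is reported as the tracked center.

```javascript
// Sketch for Max's [js] object. Assumes the jit.findbounds "min"
// list (x, y) arrives at inlet 0 and the "max" list at inlet 1.
inlets = 2;
outlets = 1;

var minXY = [0, 0];

function list(x, y) {
    if (inlet == 0) {
        minXY = [x, y];  // store the min corner
    } else {
        // max corner arrived: emit the rectangle center
        outlet(0, (minXY[0] + x) / 2, (minXY[1] + y) / 2);
    }
}
```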

  10. Color tracking – example code
  • Can be performed in Max/MSP JavaScript using jsui – slow!

  11. Color tracking - background
  • Video tracking – the process of locating a moving object (or several) over time using a camera. An algorithm analyses the video frames and outputs the location of moving targets within the video frame.
  • Video tracking systems usually employ a motion model which describes how the image of the target might change for different possible motions of the object to track.
  • Video tracking approaches:
    • Blob tracking: segmentation of the object interior (for example blob detection, block-based correlation or optical flow)
    • Contour tracking: detection of the object boundary (e.g. active contours or the Condensation algorithm)
    • Visual feature matching: registration
  • Color tracking is a type of blob tracking

  12. Color tracking - background
  • Blob detection refers to visual modules aimed at detecting points and/or regions in the image that are either brighter or darker than their surroundings. There are two main classes of blob detectors: (i) differential methods based on derivative expressions, and (ii) methods based on local extrema in the intensity landscape.
  • A blob (binary large object) is an area of touching pixels with the same logical state.
  • A group of pixels organized into a structure is commonly called a blob. Problems related to blobs:
    1. Where are the edges?
    2. Where is the center?
    3. How many pixels does it contain?
    4. What is the average pixel intensity?
    5. What is the blob's orientation (angle)?

  13. Color tracking - background
  • Blob center calculation – simple method (e.g., the centroid of the blob pixels; sketched below)
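The slide's own formula did not survive the transcript; the following is a sketch of one simple method, the centroid (the mean of the foreground pixel coordinates). `isForeground` is a placeholder for whatever per-pixel test is in use.

```javascript
// Centroid of a blob: average the coordinates of every pixel that
// passes the foreground test. `pixels` is a flat row-major buffer.
function blobCenter(pixels, width, height, isForeground) {
    var sumX = 0, sumY = 0, count = 0;
    for (var y = 0; y < height; y++) {
        for (var x = 0; x < width; x++) {
            if (isForeground(pixels[y * width + x])) {
                sumX += x;
                sumY += y;
                count++;
            }
        }
    }
    if (count === 0) return null;          // no blob in this frame
    return [sumX / count, sumY / count];   // (cx, cy)
}
```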

  14. Color tracking - background
  • A blob (binary large object) is an area of touching pixels with the same logical state.
  • All pixels in an image that belong to a blob are in a foreground state.
  • All other pixels are in a background state.
  • In a binary image, pixels in the background have values equal to zero, while every nonzero pixel is part of a binary object.
  • For jit.findbounds, this logical test of belonging to the blob is whether the color of the currently tested pixel falls within the range set to be detected (sketched below)
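A sketch of that per-pixel test (the channel layout and a normalized 0..1 range are assumptions; jit.findbounds applies an equivalent test internally):

```javascript
// A pixel belongs to the blob if every color channel lies inside
// the detection range, mirroring the min/max inputs described above.
function inColorRange(pixel, min, max) {
    return pixel.r >= min.r && pixel.r <= max.r &&
           pixel.g >= min.g && pixel.g <= max.g &&
           pixel.b >= min.b && pixel.b <= max.b;
}

// Example: react only to strongly red pixels
var lo = { r: 0.7, g: 0.0, b: 0.0 };
var hi = { r: 1.0, g: 0.3, b: 0.3 };
inColorRange({ r: 0.9, g: 0.1, b: 0.1 }, lo, hi); // -> true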

  15. Color tracking - background
  • What is easily identifiable by the human eye as several distinct but touching blobs may be interpreted by software as a single blob.
  • A reliable software package will tell you how touching blobs are defined. For example, you can define touching pixels as only those adjacent along the vertical or horizontal axis, or also include diagonally adjacent pixels (see the sketch below).
  • Segmentation of the image – separating the good blobs from the background and from each other, as well as eliminating everything else that is not of interest.
  • Segmentation usually involves a binarization operation – the result is a black and white image
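The two definitions of "touching" can be written down directly as neighbor offsets (a small illustrative sketch, not from the slides):

```javascript
// 4-connectivity: neighbors along the vertical/horizontal axes only.
var N4 = [[0, -1], [-1, 0], [1, 0], [0, 1]];
// 8-connectivity: additionally include the diagonally adjacent pixels.
var N8 = N4.concat([[-1, -1], [1, -1], [-1, 1], [1, 1]]);

// Two pixels touch under a given neighborhood definition
// if their offset appears in the list.
function touching(x1, y1, x2, y2, neighborhood) {
    for (var i = 0; i < neighborhood.length; i++) {
        if (x2 - x1 === neighborhood[i][0] &&
            y2 - y1 === neighborhood[i][1]) return true;
    }
    return false;
}
```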

  16. Color tracking - background
  • Blob analysis – logical – (generally) performed on a black and white image
  • Brightness – rectangle algorithm
  • The rectangle algorithm keeps track of four points in each frame: the topmost, leftmost, rightmost and bottommost points where the brightness exceeds a certain threshold value (sketched below).
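A sketch of the rectangle algorithm as described, assuming a flat grayscale buffer (names are placeholders):

```javascript
// Scan one frame and keep the extreme coordinates at which the
// brightness exceeds the threshold - the four tracked points.
function trackRectangle(gray, width, height, threshold) {
    var top = height, bottom = -1, left = width, right = -1;
    for (var y = 0; y < height; y++) {
        for (var x = 0; x < width; x++) {
            if (gray[y * width + x] > threshold) {
                if (y < top) top = y;
                if (y > bottom) bottom = y;
                if (x < left) left = x;
                if (x > right) right = x;
            }
        }
    }
    if (bottom < 0) return null;  // nothing exceeded the threshold
    return { left: left, top: top, right: right, bottom: bottom };
}
```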

  17. Color tracking - background
  • Tracking types:
    (I) objects of a given nature, e.g., cars, people, faces
    (II) objects of a given nature with a specific attribute, e.g., moving cars, walking people, talking heads, the face of a given person
    (III) objects of a priori unknown nature but of specific interest, e.g., moving objects, objects of semantic interest manually picked in the first frame
  • For (I) and (II), part of the input video frame is searched against a reference model (image patches, or overall shape [geometry]) describing the appearance of the object.
  • For (III), the reference can be extracted from the first frame and kept frozen – color tracking
  • Recent color tracking algorithms:
    • MeanShift
    • Continuously Adaptive Mean Shift (CamShift)

  18. Color tracking - background
  • Advanced application of tracking in stereo – matching
  • Starting from a collection of images or a video sequence, the first step consists in relating the different images to each other.
  • (Figure: two images with their extracted corners.) Note that it is not possible to find the corresponding corner for every corner, but for many of them it is.
  • In our example, we have only one 3D point to deal with – we assume the data obtained from the two cameras are matched

  19. Camera parameters
  • Extrinsic and intrinsic parameters
  • Extrinsic parameters:
    • The orientation of the camera Euclidean coordinate system with respect to the world Euclidean coordinate system. This relation is given by the rotation matrix R and the translation vector t.
    • Thus there are six extrinsic camera parameters: three rotations and three translations.

  20. Camera parameters
  • Extrinsic and intrinsic parameters
  • Intrinsic parameters – coefficients of the calibration matrix K
  • p_x and p_y are the width and the height of the pixels, c = [c_x c_y 1]^T is the principal point (defined as the intersection of the optical axis and the retinal [image] plane – the center of the image plane), and a is the skew angle, as indicated.
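The matrix itself did not survive the transcript; the following is a common textbook form of K, reconstructed to match the symbols above rather than copied from the slide:

```latex
K =
\begin{bmatrix}
  f / p_x & s       & c_x \\
  0       & f / p_y & c_y \\
  0       & 0       & 1
\end{bmatrix}
```

where f is the effective focal length and s is a skew coefficient determined by the skew angle a (s = 0 for rectangular, axis-aligned pixels).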

  21. Stereo 3D localization algorithm

  22. Stereo 3D localization algorithm
  • Problem: given the (x, y) coordinates of the tracked object in the left and right images, estimate its 3D location

  23. Stereo 3D localization algorithm
  • Writing the system for the two cameras (a reconstruction of the equations follows)
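The equations themselves are not preserved in this transcript; the standard pinhole projection system they refer to looks like this (notation follows the camera-parameter slides above):

```latex
\lambda_L \, \mathbf{x}_L = K_L \, [\, R_L \mid \mathbf{t}_L \,]\, \mathbf{X},
\qquad
\lambda_R \, \mathbf{x}_R = K_R \, [\, R_R \mid \mathbf{t}_R \,]\, \mathbf{X}
```

Here X is the homogeneous 3D point, x_L and x_R are the tracked image points in homogeneous coordinates, and λ_L, λ_R are unknown scale factors; each view contributes two independent equations for the three unknowns of X.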

  24. Stereo 3D localization algorithm
  • Special case – canonical configuration – binocular
  • The model has two identical cameras separated only in the X direction by a baseline distance b. The image planes are coplanar in this model.
  • The baseline is aligned with the horizontal coordinate axis, the optical axes of the cameras are parallel, the epipoles move to infinity, and the epipolar lines in the image planes are parallel.
  • Rotation matrices are identity matrices.
  • b – baseline distance, f – focal length
  • Extrinsic parameters
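In this canonical configuration the triangulation reduces to the familiar disparity relations (a sketch, assuming image coordinates measured from each principal point and the world origin at the left camera center):

```latex
d = x_L - x_R, \qquad
Z = \frac{b\,f}{d}, \qquad
X = \frac{x_L\,Z}{f}, \qquad
Y = \frac{y_L\,Z}{f}
```

The disparity d shrinks as the object moves away, so depth resolution degrades with distance.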

  25. Stereo 3D localization algorithm
  • Intrinsic parameters are ignored here – no calibration!
  • We will try to scale the coordinates manually until we get something meaningful.

  26. Stereo 3D localization algorithm
  • Intersection of the lines in 3D is not guaranteed
  • Derivation using the principle behind CPA (closest points of approach)
  • Looking for the closest points on the two lines
  • Solution using parametric equations (one standard derivation is sketched below)
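A sketch of one standard derivation (the slide's own equations are not preserved here): write the two back-projected rays as parametric lines and require the connecting segment to be perpendicular to both.

```latex
L_1(s) = P_0 + s\,\mathbf{u}, \qquad
L_2(t) = Q_0 + t\,\mathbf{v}, \qquad
\mathbf{w}_0 = P_0 - Q_0
```

With a = u·u, b = u·v, c = v·v, d = u·w₀ and e = v·w₀, the perpendicularity conditions u·w(s, t) = 0 and v·w(s, t) = 0 (where w(s, t) = w₀ + s u − t v) solve to

```latex
s_c = \frac{be - cd}{ac - b^2}, \qquad
t_c = \frac{ae - bd}{ac - b^2}
```

and CMID is the midpoint of L₁(s_c) and L₂(t_c).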

  27. Stereo 3D localization algorithm
  • Finally, we obtain the estimated point CMID, which we declare to be our object location O(X, Y, Z)
  • We will use this in code to calculate the object location from the coordinates obtained by color tracking
  • Will be programmed in JavaScript and called from Max/MSP/Jitter (a sketch follows)
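A self-contained sketch of that calculation in the JavaScript dialect used by Max's js object (function and variable names are illustrative, not from the lecture code):

```javascript
// Closest-points-of-approach midpoint for two 3D lines.
// Each line is given by a point P and a direction vector u
// (e.g., a camera center and the ray toward the tracked point).
function dot(a, b)   { return a[0]*b[0] + a[1]*b[1] + a[2]*b[2]; }
function sub(a, b)   { return [a[0]-b[0], a[1]-b[1], a[2]-b[2]]; }
function plus(a, b)  { return [a[0]+b[0], a[1]+b[1], a[2]+b[2]]; }
function scale(a, k) { return [a[0]*k, a[1]*k, a[2]*k]; }

function closestMidpoint(P0, u, Q0, v) {
    var w0 = sub(P0, Q0);
    var a = dot(u, u), b = dot(u, v), c = dot(v, v);
    var d = dot(u, w0), e = dot(v, w0);
    var denom = a * c - b * b;            // 0 when the rays are parallel
    var s = denom !== 0 ? (b * e - c * d) / denom : 0;
    var t = denom !== 0 ? (a * e - b * d) / denom : 0;
    var C1 = plus(P0, scale(u, s));       // closest point on line 1
    var C2 = plus(Q0, scale(v, t));       // closest point on line 2
    return scale(plus(C1, C2), 0.5);      // CMID = O(X, Y, Z)
}
```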

  28. Problems with the approach
  • No calibration – no intrinsic parameters taken into account
  • Low-end cameras – aberrations
  • Low-end cameras – radial distortion
  • No guarantee of time synchronization between the left and right images
  • In general – approximate/illustrative

  29. Implementation in Max/MSP/Jitter
