Stereo Vision
John Morris
Vision Research in CITR
Basics
A single image has no depth information
Humans infer depth from ‘clues’ in the scene, but these are ambiguous
Stereo vision systems take two images of a scene from different viewpoints
Two cameras: Left and Right
Optical centres: OL and OR
Virtual image plane is projection of actual image plane through optical centre
Baseline, b, is the separation between the optical centres
Scene Point, P, imaged at pL and pR
pL = 9
pR = 3
Disparity, d = pL – pR = 9 – 3 = 6
Disparity is the amount by which the two images of P are displaced relative to each other
Depth, $z = \dfrac{b f}{p d}$ (b = baseline, f = focal length, d = disparity in pixels)
p = pixel width
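The depth calculation can be sketched in a few lines of Python. This is only an illustration: it assumes the standard parallel-camera relation z = b·f / (p·d), and the baseline, focal length and pixel width below are made-up values, not taken from the slides. Only pL = 9 and pR = 3 come from the example above. The function name `depth_from_disparity` is hypothetical.

```python
def depth_from_disparity(d_pixels, baseline, focal_length, pixel_width):
    """Depth z = b*f / (p*d) for a parallel-axis stereo pair.

    baseline, focal_length and pixel_width are in metres;
    d_pixels is the disparity in pixels.
    """
    if d_pixels <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return baseline * focal_length / (pixel_width * d_pixels)


# Pixel positions from the slide example
p_left, p_right = 9, 3
d = p_left - p_right          # disparity = 6 pixels

# Assumed camera parameters (illustrative only, not from the slides)
b = 0.1       # baseline: 100 mm
f = 0.008     # focal length: 8 mm
p = 1e-5      # pixel width: 10 micrometres

z = depth_from_disparity(d, b, f, p)
print(f"disparity = {d} px, depth = {z:.2f} m")
```

With these assumed parameters a 6-pixel disparity corresponds to a depth of roughly 13.3 m; halving the disparity would double the depth.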
Using stereo vision systems to measure properties (dimensions here) of a scene
Note that if the cameras are aligned so that the scanlines of both cameras lie in the epipolar planes,
then matching pixels must lie in the same scanline on both images.
This is the epipolar constraint.
[Figure labels: possible scene points; actual scene points]
i.e. the number of pixels between corresponding points in the two images
d = xl – xr is the disparity: the difference in position between the corresponding points in the two images, commonly measured in pixels
Note the reciprocal relationship between disparity and depth!
This is particularly relevant when considering the accuracy of stereo photogrammetry
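Why the reciprocal relationship matters for accuracy can be seen from a short derivation. It assumes the parallel-camera relation z = bf/(pd) with the symbols used in these slides (b baseline, f focal length, p pixel width, d disparity in pixels); the error symbols δz and δd are introduced here for illustration.

```latex
z = \frac{b f}{p\, d}
\quad\Longrightarrow\quad
\frac{\partial z}{\partial d} = -\frac{b f}{p\, d^{2}} = -\frac{p\, z^{2}}{b f}
\quad\Longrightarrow\quad
|\delta z| \approx \frac{p\, z^{2}}{b f}\, |\delta d|
```

Under these assumptions a fixed disparity error (for example one pixel) produces a depth error that grows with the square of the distance, and that shrinks as the baseline or focal length increases.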
b baseline (camera separation)
θ camera angular FoV
Dsens sensor width
n number of pixels
p pixel width
f focal length
a object extent
D distance to object
disparity
Stereo Camera Configuration
Points along these lines have the same left-right (L-R) displacement
These two are equivalent!
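where $\bar{I}_l$ and $\bar{I}_r$ are the average pixel values in the left and right windows.
$c(d) = \sum_{k} \sum_{l} \left| I_l(i+k,\, j+l) - I_r(i+k-d,\, j+l) \right|$, summed over the window offsets k and l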
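A minimal sketch of window matching with this absolute-difference cost follows, assuming the cost is summed over a square window and that the cameras are aligned so the search runs along a single scanline (the epipolar constraint). The function names, the window size and the synthetic test images are all illustrative assumptions, not part of the slides. Images are indexed as `image[row][column]`, so the disparity shifts the column index.

```python
import random


def sad_cost(left, right, row, col, d, half_window):
    """Sum of absolute differences between the window centred on (row, col)
    in the left image and the window shifted d pixels to the left in the right image."""
    total = 0
    for dr in range(-half_window, half_window + 1):
        for dc in range(-half_window, half_window + 1):
            total += abs(left[row + dr][col + dc] - right[row + dr][col + dc - d])
    return total


def best_disparity(left, right, row, col, half_window, max_disparity):
    """Try every disparity 0..max_disparity on the same scanline and
    return (disparity, cost) for the lowest-cost match."""
    costs = {}
    for d in range(max_disparity + 1):
        if col - half_window - d < 0:
            break  # shifted window would fall outside the right image
        costs[d] = sad_cost(left, right, row, col, d, half_window)
    best = min(costs, key=costs.get)
    return best, costs[best]


# Synthetic test pair: the left image is the right image shifted by true_d columns,
# so left[r][c] == right[r][c - true_d] (with wrap-around at the border).
random.seed(0)
height, width, true_d = 20, 40, 6
right_img = [[random.randrange(256) for _ in range(width)] for _ in range(height)]
left_img = [[r_row[(c - true_d) % width] for c in range(width)] for r_row in right_img]

est_d, est_cost = best_disparity(left_img, right_img, row=10, col=20,
                                 half_window=2, max_disparity=10)
print("estimated disparity:", est_d, "cost:", est_cost)
```

On this noise-free synthetic pair the cost falls to zero at the true shift. Real images contain noise, occlusions and low-texture regions, which is one reason other matching functions are worth comparing.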
Many matching functions can be used with varying success!!
Input – Ground truth
Sharp edges are blurred!
Sharp edges and less noise
where w0, ..., w3 are weights
(determining the weights that yield the best matches is a nontrivial task).
… and many more!!