
Evaluation Techniques in Computer Vision

Presentation Transcript


  1. Evaluation Techniques in Computer Vision EE4H, M.Sc 0407191 Computer Vision Dr. Mike Spann m.spann@bham.ac.uk http://www.eee.bham.ac.uk/spannm

  2. Contents • Why evaluate? • Images – synthetic/natural? • Noise • Example 1. Evaluation of thresholding/segmentation methods • Example 2. Evaluation of optical flow methods

  3. Why evaluate? • Computer vision algorithms are complex and difficult to analyse mathematically • Evaluation is usually through measurement of the algorithm’s performance on test images • Use of a range of images to establish performance envelope • Comparison with existing algorithms • Performance on degraded (noise-added) images (robustness) • Sensitivity to algorithm parameter settings

  4. Test images • Real images • ‘Ground truth’ difficult to establish • Pseudo-real images • Could be synthetic objects moving against real background • Often a good compromise • Synthetic images • Noise and illumination variation over object surfaces hard to model realistically

  5. Simple synthetic images • Simple ‘object-background’ synthetic images used to evaluate thresholding and segmentation algorithms • They obey a very simple image model (piecewise constant + Gaussian noise) • Unrealistic in practice – images are not like this!
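As a rough illustration of this image model, here is a minimal Python/NumPy sketch (not from the lecture; the geometry, grey levels and noise level are arbitrary choices) of generating such an object/background test image:

```python
import numpy as np

def synthetic_object_background(size=128, bg=100.0, fg=150.0, sigma=10.0, seed=0):
    """Piecewise-constant object/background image plus zero-mean iid Gaussian noise."""
    rng = np.random.default_rng(seed)
    clean = np.full((size, size), bg, dtype=float)
    q = size // 4
    clean[q:3 * q, q:3 * q] = fg          # a centred square 'object' region
    noisy = clean + rng.normal(0.0, sigma, clean.shape)
    return np.clip(noisy, 0, 255), clean  # test image and its noise-free ground truth
```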

  6. Simple synthetic images [Figure: object/background test images at zero, low and medium noise levels]

  7. Pseudo-real images • More realistic object/background images are better suited to evaluating segmentation algorithms • Images of natural objects under natural illumination • Ground truth can be established using hand segmentation tools (such as those built into many image processing packages)

  8. Pseudo-real images [Figure: example pseudo-real test images – screws, keys, cars, washers]

  9. Simple synthetic edges • Again, a piecewise constant + Gaussian noise image model • 'Ideal' step edge • Gives a precise edge location, but is not achievable by finite-aperture imaging systems

  10. Simple synthetic edges [Figure: synthetic step edges at low, medium and high noise levels]

  11. Pseudo-real edges • More realistic edge profiles can be created by smoothing an ideal step edge • Step edge * Gaussian filter = smoothed edge profile
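A small sketch (my own, using scipy's 1D Gaussian filter; the levels and filter width are arbitrary) of producing such a smoothed edge profile:

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

# Ideal 1D step edge: background level 100, object level 150
step = np.concatenate([np.full(64, 100.0), np.full(64, 150.0)])

# Convolving with a Gaussian gives a smoother, more realistic edge profile
smooth_edge = gaussian_filter1d(step, sigma=3.0)
```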

  12. Pseudo-real movies • The 'yosemite' sequence is a computer-generated rendering of a fly-through of the Yosemite valley • The background clouds are real • Enables the true flow (ground truth) to be determined • Used extensively in the evaluation of optical flow algorithms • yosemite.avi • yosemite_flow.avi

  13. Noise • Often used to evaluate the ‘robustness’ of algorithms • Additive noise usual in optical images but multiplicative is more realistic in sonar/radar images • Noise level proportional to signal level • Usual noise model is independent random variables (usually Gaussian) • Correlated noise often more realistic

  14. Noise • Standard noise model is zero-mean, independent and identically distributed (iid) Gaussian (normal) random variables • Characterised by the variance $\sigma^2$ • Probability distribution of the noise rv's: $p(n) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{n^2}{2\sigma^2}\right)$

  15. Noise • Noise level characterised by the signal-to-noise ratio (SNR) • Usually expressed in dB • Defined as $SNR = 10\log_{10}\left(S/\sigma^2\right)$ dB • $S$ is the mean-square grey level, defined (for an $N \times N$ pixel image $f(x,y)$) as $S = \frac{1}{N^2}\sum_{x,y} f(x,y)^2$
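A hedged Python/NumPy sketch of these definitions (the function names are mine): computing the SNR of an image for a given noise level, and adding Gaussian noise at a requested SNR.

```python
import numpy as np

def snr_db(image, noise_sigma):
    """SNR in dB: 10*log10(mean-square grey level / noise variance)."""
    s = np.mean(image.astype(float) ** 2)     # mean-square grey level S
    return 10.0 * np.log10(s / noise_sigma ** 2)

def add_noise_at_snr(image, target_snr_db, seed=0):
    """Add zero-mean iid Gaussian noise so that the image has the requested SNR."""
    s = np.mean(image.astype(float) ** 2)
    sigma = np.sqrt(s / 10.0 ** (target_snr_db / 10.0))
    rng = np.random.default_rng(seed)
    return image + rng.normal(0.0, sigma, image.shape)
```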

  16. Noise [Figure: example images at different signal-to-noise ratios, including 30 dB and 0 dB]

  17. Noise (mean-square error) • We can regard the mean-square error (difference) between two images as noise • Often used to evaluate image compression algorithms by comparing the original and decompressed images • Image differences can also be expressed as the peak signal-to-noise ratio (PSNR) in dB by taking the signal level as 255

  18. Noise (mean-square error) • For two $N \times N$ images $f$ and $g$, $MSE = \frac{1}{N^2}\sum_{x,y}\left(f(x,y) - g(x,y)\right)^2$ • $PSNR = 10\log_{10}\left(\frac{255^2}{MSE}\right)$ dB
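These two quantities are straightforward to compute; a minimal Python/NumPy sketch (function names are my own):

```python
import numpy as np

def mse(f, g):
    """Mean-square error between two images of the same size."""
    return np.mean((f.astype(float) - g.astype(float)) ** 2)

def psnr_db(f, g, peak=255.0):
    """Peak signal-to-noise ratio in dB, taking the signal level as the peak grey level."""
    return 10.0 * np.log10(peak ** 2 / mse(f, g))
```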

  19. Other types of noise • The other main category of (additive) noise is impulse (sometimes called 'salt and pepper') noise • Characterised by the impulse rate (spatial density of noise impulses) and the mean-square amplitude of the impulses • Can normally be filtered out easily using median filters
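A sketch (my own, using scipy's median filter; the impulse rate and amplitudes are arbitrary choices) of adding salt and pepper noise and removing it:

```python
import numpy as np
from scipy.ndimage import median_filter

def add_salt_and_pepper(image, rate=0.05, seed=0):
    """Corrupt a fraction 'rate' of the pixels with extreme (0 or 255) impulses."""
    rng = np.random.default_rng(seed)
    noisy = image.astype(float).copy()
    mask = rng.random(image.shape) < rate
    noisy[mask] = rng.choice([0.0, 255.0], size=int(mask.sum()))
    return noisy

# A small (e.g. 3x3) median filter removes isolated impulses while preserving edges:
# cleaned = median_filter(noisy_image, size=3)
```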

  20. Other types of noise [Figure: original image, image with salt and pepper noise, and the de-speckled (median filtered) result]

  21. Other types of noise • There are many other types of noise which can be considered in algorithm evaluation • Essentially more sophisticated and realistic probability distributions of the noise rv's • For example, a 'generalised' Gaussian model is often used to model heavy-tailed distributions • However, in my humble opinion, a more realistic source of noise is the deviation of the illumination variation across object surfaces away from the 'ideal'

  22. Other types of noise

  23. Other types of noise

  24. Evaluation of thresholding & segmentation methods • Segmentation and thresholding algorithms essentially group pixels into regions (or classes) • Simplest case is object/background • Simple evaluation metrics just quantify the number of misclassified pixels • For basic image models, such as constant grey level in object/background regions plus iid Gaussian noise, the probability of error can be computed analytically

  25. Evaluation of thresholding & segmentation methods • For a simple object/background image :

  26. Evaluation of thresholding & segmentation methods • Misclassification probability is a function of the threshold T • For a simple constant region grey-level model plus additive iid Gaussian noise we can easily derive an analytical expression for this error probability • Not very useful in practice, as the image model is limited and we also require the ground truth • More useful to simply measure the misclassification error as a function of threshold
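In practice the measurement is simple when the ground truth mask is available; a minimal sketch (my own names, assuming the object is brighter than the background):

```python
import numpy as np

def misclassification_rate(image, gt_object_mask, T):
    """Fraction of pixels wrongly classified when thresholding at T
    (object assumed brighter than background)."""
    predicted_object = image > T
    return np.mean(predicted_object != gt_object_mask)

def error_vs_threshold(image, gt_object_mask):
    """Misclassification error measured over all grey-level thresholds 0..255."""
    return [misclassification_rate(image, gt_object_mask, T) for T in range(256)]
```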

  27. Evaluation of thresholding & segmentation methods • It is usual to represent correct classification probabilities and false alarm probabilities jointly in a receiver operating characteristic (ROC) curve • For example, the ROC curve shows how these vary as a function of threshold for an object/background classification

  28. Evaluation of thresholding & segmentation methods [Figure: ROC curve – probability of correct classification (0.0–1.0, vertical axis) against probability of false alarm (0.0–1.0, horizontal axis), traced from T=255 near the origin to T=0 at the top right]
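A hedged sketch of how such an ROC curve could be traced out for an object/background thresholding (my own code, same brighter-object assumption as above):

```python
import numpy as np

def roc_points(image, gt_object_mask):
    """(false alarm, correct classification) probability pairs as T varies from 0 to 255."""
    points = []
    for T in range(256):
        detected = image > T
        p_correct = np.mean(detected[gt_object_mask])    # object pixels correctly detected
        p_false = np.mean(detected[~gt_object_mask])     # background pixels wrongly detected
        points.append((p_false, p_correct))
    return points
```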

  29. Evaluation of thresholding & segmentation methods • More useful methods of evaluation can be found by taking account of the application of the segmentation • Segmentation is rarely an end in itself, but a component in an overall machine vision system • Also, the level of under- or over-segmentation of an algorithm needs to be determined

  30. Evaluation of thresholding & segmentation methods [Figure: ground truth segmentation compared with under-segmented and over-segmented results]

  31. Evaluation of thresholding & segmentation methods • Under-segmentation is bad as distinct regions are merged • Over-segmentation can be acceptable as sub-regions comprising a single ground truth region can be merged using ‘high’ level knowledge • Also, the level of over-segmentation can be controlled by parameter settings of the algorithm

  32. Evaluation of thresholding & segmentation methods • A possible segmentation metric is to quantify correctly detected regions, over-segmentation and under-segmentation • Depends upon some threshold setting T • Region- rather than pixel-based • Used in Koester and Spann's paper (IEEE Trans. PAMI, 2000) to evaluate range image segmentations

  33. Evaluation of thresholding & segmentation methods • Correct detection • At least T% of the pixels in region k of the segmented image are marked as pixels in region j of the ground truth image • And vice versa [Figure: corresponding regions in the segmented and ground truth images]

  34. Evaluation of thresholding & segmentation methods • Over-segmentation • Region j in the ground truth image corresponds to regions k1, k2… km in the segmented image if : • At least T % of the pixels in region ki are marked as pixels of region j • At least T % of the pixels in region j are marked as pixels in the union of regions k1, k2… km

  35. Evaluation of thresholding & segmentation methods [Figure: over-segmentation – one ground truth region corresponding to several segmented regions]

  36. Evaluation of thresholding & segmentation methods • Under-segmentation • Regions j1, j2… jm in the ground truth image correspond to region k in the segmented image if: • At least T% of the pixels in region k are marked as pixels in the union of regions j1, j2… jm • At least T% of the pixels in region ji are marked as pixels in region k

  37. Evaluation of thresholding & segmentation methods [Figure: under-segmentation – several ground truth regions corresponding to a single segmented region]

  38. Evaluation of thresholding & segmentation methods • The metric also allows us to quantify missed and noise regions • Missed regions – regions in the ground truth image not found in the segmented image • Noise regions – regions in the segmented image not found in the ground truth image • Overall, the average number of correct, over, under, missed and noise regions can be quantified over an image database and different algorithms compared
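As a rough illustration, here is a sketch of the correct-detection test only (my own code; over- and under-segmentation extend the same idea to unions of regions, and label 0 is assumed to be background). The threshold T is given as a fraction, e.g. 0.8 for 80%.

```python
import numpy as np

def correct_detections(gt_labels, seg_labels, T=0.8):
    """Region pairs (j, k) satisfying the mutual T% overlap test for correct detection."""
    pairs = []
    for j in np.unique(gt_labels):
        if j == 0:
            continue                              # skip background
        gt_region = gt_labels == j
        for k in np.unique(seg_labels[gt_region]):
            if k == 0:
                continue
            seg_region = seg_labels == k
            overlap = np.sum(gt_region & seg_region)
            # At least T% of region k lies in region j, and vice versa
            if overlap >= T * np.sum(seg_region) and overlap >= T * np.sum(gt_region):
                pairs.append((int(j), int(k)))
    return pairs
```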

  39. Evaluation of optical flow methods • Optical flow algorithms compute the 2D optical flow vector at each pixel using consecutive frames of a video sequence • Optical flow algorithms are notoriously lacking in robustness • Crucial to evaluate the effectiveness of any method used (or any new method devised) • Ground truth is usually difficult to come by

  40. Evaluation of optical flow methods • The simplest error measure is the magnitude of the difference between the estimated and ground truth flow vectors, averaged over the image: $e = \frac{1}{N^2}\sum_{\mathbf{x}} \left\| \mathbf{v}_{est}(\mathbf{x}) - \mathbf{v}_{true}(\mathbf{x}) \right\|$

  41. Evaluation of optical flow methods • This simple error measurement naturally amplifies errors when the flow vectors are large (for the same relative flow error) • Can normalize the error by the product of the magnitudes of the ground truth flow and flow estimate
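A minimal sketch of both error measures (my own; flow fields assumed stored as (H, W, 2) arrays of (u, v) components):

```python
import numpy as np

def flow_errors(v_est, v_true, eps=1e-6):
    """Per-pixel flow error magnitude and the version normalised by the
    product of the estimated and ground truth flow magnitudes."""
    err = np.linalg.norm(v_est - v_true, axis=-1)
    norm = np.linalg.norm(v_est, axis=-1) * np.linalg.norm(v_true, axis=-1)
    return err, err / (norm + eps)    # eps guards against zero-length flow vectors
```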

  42. Evaluation of optical flow methods • Often the ground truth is not available • A useful (but often crude) way of comparing the quality of two optical flow fields is to compute the displaced frame difference (DFD) statistic • Uses the two consecutive frames of a sequence from which the flows were computed

  43. Evaluation of optical flow methods • The mean-square DFD of a flow field $\mathbf{v}$ computed between consecutive frames $f_t$ and $f_{t+1}$ is $DFD = \frac{1}{N^2}\sum_{\mathbf{x}} \left( f_{t+1}(\mathbf{x} + \mathbf{v}(\mathbf{x})) - f_t(\mathbf{x}) \right)^2$
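A sketch of computing the mean-square DFD (my own code, using bilinear interpolation from scipy to sample the second frame at the displaced positions; the flow is assumed stored as an (H, W, 2) array of (dx, dy) in pixels):

```python
import numpy as np
from scipy.ndimage import map_coordinates

def mean_square_dfd(frame1, frame2, flow):
    """Mean-square displaced frame difference between frame1 and frame2 under 'flow'."""
    h, w = frame1.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    # Sample frame2 at the positions each pixel is displaced to by the flow
    coords = np.array([ys + flow[..., 1], xs + flow[..., 0]])
    warped = map_coordinates(frame2.astype(float), coords, order=1, mode='nearest')
    return np.mean((warped - frame1.astype(float)) ** 2)
```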

  44. Evaluation of optical flow methods • DFD is a crude measure because it says nothing directly about the accuracy of the motion field – just the quality of the pixel mapping from one frame to the next • It also says nothing about the confidence attached to optical flow estimates • However, it is the basis of motion compensation algorithms for most of the current video compression standards (MPEG, H.261 etc.)

  45. Evaluation of optical flow methods • In optical flow estimation, as in other types of estimation algorithms, we are often interested in the quality of the estimates • In classic estimation theory, we often compute confidence limits on estimates • We can say with a certain degree of confidence (say 90%) that the parameter lies within certain bounds • We usually assume that the quantities we are estimating follow some known probability distribution (for example chi-squared)

  46. Evaluation of optical flow methods • In the case of optical flow vectors, confidence regions are ellipses in 2 dimensions • They essentially characterise the distribution of the estimation error • Assuming a normal distribution of the flow error, confidence ellipses can be drawn for any confidence limit • Orientation and shape of the ellipses are determined by the covariance matrix defining the normal distribution • The eigenvalues of the covariance matrix determine the axis lengths of the ellipse for a particular confidence limit
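A sketch of deriving the confidence ellipse parameters from an error covariance matrix (my own code; the chi-squared quantile with 2 degrees of freedom sets the ellipse scale for a given confidence level):

```python
import numpy as np
from scipy.stats import chi2

def confidence_ellipse(cov, confidence=0.9):
    """Half-axis lengths and orientation (radians) of the 2D confidence ellipse
    of a zero-mean Gaussian error with covariance matrix 'cov'."""
    k = chi2.ppf(confidence, df=2)            # chi-squared quantile, 2 degrees of freedom
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigenvalues in ascending order
    half_axes = np.sqrt(k * eigvals)          # minor axis first, major axis second
    angle = np.arctan2(eigvecs[1, 1], eigvecs[0, 1])   # orientation of the major axis
    return half_axes, angle
```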

  47. Evaluation of optical flow methods [Figure: 70%, 90% and 99% confidence ellipses of the flow estimate]

  48. Evaluation of optical flow methods [Figure: Yosemite frame, Yosemite true flow, Yosemite flow (L&K), Yosemite flow (L&K) confidence thresholded]

  49. Conclusions • Evaluation in computer vision is a difficult and often controversial topic • I would suggest 3 rules of thumb to consider when evaluating your work for the purposes of assignments • Consider carefully your test data. Make it as realistic as possible • Make your evaluations as much as possible ‘application driven’ • Make your algorithms ‘self evaluating’ if possible through the use of confidence statistics
