A Bayesian Approach to Recognition


# A Bayesian Approach to Recognition





Presentation Transcript
1. A Bayesian Approach to Recognition Moshe Blank Ita Lifshitz Reverend Thomas Bayes 1702-1761

2. Agenda • Bayesian decision theory • Maximum Likelihood • Bayesian Estimation • Recognition • Simple probabilistic model • Mixture model • More advanced probabilistic model • “One-Shot” Learning

3. Bayesian Decision Theory • We are given a training set T of samples of class c. • Given a query image x, we want to know the probability that it belongs to the class, p(x|T). • We assume the class has a distribution of known form with unknown parameters θ, that is, p(x|θ) is known. • Marginalizing over θ (and assuming x depends on T only through θ) gives: p(x|T) = ∫p(x,θ|T)dθ = ∫p(x|θ)p(θ|T)dθ • What can we do about p(θ|T)?

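4. Maximum Likelihood Estimation • What can we do about p(θ|T)? • Choose the parameter value θML that makes the training data most probable: θML = arg max_θ P(T|θ) • Approximate the posterior by a point mass at that value: p(θ|T) = δ(θ – θML) • The integral then collapses to a single term: ∫p(x|θ)p(θ|T)dθ = p(x|θML)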

5. ML Illustration Assume that the points of T are drawn from some normal distribution with known variance and unknown mean
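The illustration can be made concrete with a minimal sketch, assuming i.i.d. Gaussian samples with known variance. For this model the ML estimate of the mean is the sample average. The data values and function names below are made up for the example.

```python
import math

def gaussian_log_likelihood(samples, mean, variance):
    """Log of P(T | theta) for i.i.d. Gaussian samples with known variance."""
    n = len(samples)
    squared_error = sum((x - mean) ** 2 for x in samples)
    return -0.5 * n * math.log(2 * math.pi * variance) - squared_error / (2 * variance)

def ml_mean(samples):
    """For a Gaussian with known variance, the ML mean is the sample average."""
    return sum(samples) / len(samples)

# Hypothetical training set T and known variance (illustrative values only).
training_set = [2.1, 1.9, 2.4, 1.6, 2.0]
known_variance = 0.25

theta_ml = ml_mean(training_set)
# Moving the mean away from theta_ml in either direction lowers the likelihood.
at_optimum = gaussian_log_likelihood(training_set, theta_ml, known_variance)
nearby = gaussian_log_likelihood(training_set, theta_ml + 0.1, known_variance)
```

Comparing `at_optimum` with `nearby` shows that the sample average is the maximizer of the likelihood for this model.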

6. Bayesian Estimation • The Bayesian Estimation approach considers θ as a random variable. • Before we observe the training data, the parameters are described by a prior p(θ) which is typically very broad. • Once the data is observed, we can make use of Bayes’ formula to find posterior p(θ|T). Since some values of the parameters are more consistent with the data than others, the posterior is narrower than prior.

7. Bayesian Estimation • Unlike ML, Bayesian estimation does not choose a specific value for θ, but instead performs a weighted average over all possible values of θ. • Why is it more accurate than ML?
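To make the weighted average concrete, here is a small sketch for the same toy setting as the ML illustration (Gaussian data, known variance, unknown mean), assuming a Gaussian prior on the mean. That prior is a choice made for this example because it gives closed-form results; the slides do not specify one. All function names are illustrative.

```python
import math

def normal_pdf(x, mean, variance):
    return math.exp(-(x - mean) ** 2 / (2 * variance)) / math.sqrt(2 * math.pi * variance)

def posterior_over_mean(samples, variance, prior_mean, prior_variance):
    """Posterior p(theta | T) for a Gaussian prior on the mean (conjugate case)."""
    n = len(samples)
    sample_mean = sum(samples) / n
    denom = variance + n * prior_variance
    post_mean = (variance * prior_mean + n * prior_variance * sample_mean) / denom
    post_variance = prior_variance * variance / denom
    return post_mean, post_variance

def bayes_predictive(x, samples, variance, prior_mean, prior_variance):
    """p(x | T): integral of p(x | theta) p(theta | T) over theta, in closed form."""
    post_mean, post_variance = posterior_over_mean(
        samples, variance, prior_mean, prior_variance)
    # Averaging over theta widens the predictive by the posterior variance.
    return normal_pdf(x, post_mean, variance + post_variance)

def ml_predictive(x, samples, variance):
    """p(x | theta_ML): plug in the single best parameter value."""
    return normal_pdf(x, sum(samples) / len(samples), variance)
```

With a broad prior the posterior is much narrower than the prior, and the Bayesian predictive has slightly heavier tails than the ML plug-in because it carries the remaining uncertainty about θ.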

8. Maximum Likelihood vs Bayesian • ML and Bayesian estimations are asymptotically equivalent and “consistent”. • ML is typically computationally easier. • ML is often easier to interpret: it returns the single best model (parameter) whereas Bayesian gives a weighted average of models. • But for finite training data (and given a reliable prior) Bayesian is more accurate (uses more of the information). • Bayesian with a “flat” prior is essentially ML; with asymmetric and broad priors the methods lead to different solutions.

9. Agenda • Bayesian decision theory • Recognition • Simple probabilistic model • Mixture model • More advanced probabilistic model • “One-Shot” Learning

10. Objective Given an image, decide whether or not it contains an object of a specific class.

11. Main Issues • Representation • Learning • Recognition

12. Approaches to Recognition • Photometric properties – filter subspaces, neural networks, principal component analysis… • Geometric constraints between low-level object features – alignment, geometric invariance, geometric hashing… • Object Model

13. Model: constellation of Parts • Yuille, ‘91 • Brunelli & Poggio, ‘93 • Lades, v.d. Malsburg et al. ‘93 • Cootes, Lanitis, Taylor et al. ‘95 • Amit & Geman, ‘95, ‘99 • Perona et al. ‘95, ‘96, ‘98, ‘00, ‘02 Fischler & Elschlager, 1973

14. Perona’s Approach • Objects are represented as a probabilistic constellation of rigid parts (features). • The variability within a class is represented by a joint probability density function on the shape of the constellation and the appearance of the parts.

15. Agenda • Bayesian decision theory • Recognition • Simple probabilistic model • Model parameterization • Feature Selection • Learning • Mixture model • More advanced probabilistic model • “One-Shot” Learning

16. Weber, Welling, Perona - 2000 • Unsupervised Learning of Models for Recognition • Towards Automatic Discovery of Object Categories

17. Unsupervised Learning Learn to recognize an object class given a set of class and background pictures, without preprocessing (labeling, segmentation, alignment).

18. Model Description • Each object is constructed of F parts, each of a certain type. • Relations between the part locations define the shape of the object.

19. Image Model • Image is transformed into a collection of parts • Objects are modeled as sub collections

20. Model Parameterization Given an image we detect potential object parts, to obtain the following observable:

21. Hypothesis • When presented with an un-segmented and unlabeled image, we do not know which parts correspond to the foreground. • Assuming the image contains the object, use a vector of indices h to indicate which of the observables correspond to a foreground point (i.e. a real part of the object). • We call h a hypothesis, since it is a guess about the structure of the object. h = (h1, …, hT) is not observable.

22. Additional Hidden Variables • We denote by the locations of the unobserved object parts. • b = sign(h) – binary vector indicates which parts were detected • n = number of background parts detected of each type

23. Probabilistic Model • We can now define a generative probabilistic model for the object class using the probability density function:

24. Model Details Since n, b are determined by Xo, h, we have: By Bayesian rule:

25. Model Details Full table of joint probabilities (for small F) or F independent detection rate probabilities for large F

26. Model Details Poisson probability density function with parameter Mt for detection of feature of type t
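As a rough sketch of this term, the probability of observing a given number of background detections of each type can be computed from Poisson probabilities with means Mt. Treating the part types as independent, so that the per-type probabilities multiply, is an assumption of this example, and the function names are illustrative.

```python
import math

def poisson_pmf(count, rate):
    """Probability of exactly `count` events when the expected number is `rate`."""
    return rate ** count * math.exp(-rate) / math.factorial(count)

def background_count_probability(counts, rates):
    """p(n): product over part types t of Poisson(n_t; M_t).

    Assumes background detections of different types are independent.
    """
    probability = 1.0
    for n_t, m_t in zip(counts, rates):
        probability *= poisson_pmf(n_t, m_t)
    return probability
```

For example, `background_count_probability([0, 0], [1.0, 2.0])` is the chance of seeing no background detections of either type when one and two are expected on average.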

27. Model Details Uniform probability over all hypotheses consistent with n and b

28. Model Details Where - coordinates of all foreground detections, and - coordinates of all background detections

29. Sample object classes

30. Invariance to Translation Rotation and Scale • There is no use in modeling the shape of the object in terms of absolute pixel positions of the features. • We apply a transformation on features’ coordinates to make the shape invariant to translation, rotation and scale. • But the feature detector must be invariant to the transformations as well!
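One simple way to obtain such invariance, shown here as a sketch, is to express every feature relative to two reference features: the first is mapped to the origin and the second to (1, 0). The slides do not say which transformation is actually used, so this particular normalization and the helper names are assumptions of the example.

```python
import cmath

def similarity_transform(points, scale, angle, shift):
    """Scale, rotate (radians) and translate 2D points; used here to test invariance."""
    factor = scale * cmath.exp(1j * angle)
    moved = [complex(x, y) * factor + complex(*shift) for x, y in points]
    return [(z.real, z.imag) for z in moved]

def normalize_shape(points):
    """Map the first point to (0, 0) and the second to (1, 0).

    The remaining coordinates are then unchanged by any translation,
    rotation or uniform scaling of the input.
    """
    z = [complex(x, y) for x, y in points]
    origin = z[0]
    axis = z[1] - z[0]
    if axis == 0:
        raise ValueError("the two reference features must be distinct")
    normalized = [(p - origin) / axis for p in z]
    return [(p.real, p.imag) for p in normalized]
```

Normalizing a shape and a scaled, rotated, shifted copy of it yields the same coordinates, which is the property the shape model needs.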

31. Automatic Part Selection • Find points of interest in all training images • Apply Vector Quantization and clustering to get 100 total candidate patterns.
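The clustering step can be sketched with plain k-means, one common way to do vector quantization. The slides do not state which clustering algorithm is used, so k-means, the strided initialization, and the function names are assumptions of this example; each vector stands for a flattened image patch around a point of interest.

```python
def squared_distance(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))

def quantize(vector, centers):
    """Index of the nearest cluster center (the vector's codeword)."""
    return min(range(len(centers)), key=lambda i: squared_distance(vector, centers[i]))

def k_means(vectors, k, iterations=20):
    """Cluster vectors into k groups and return the cluster centers."""
    if k < 1 or k > len(vectors):
        raise ValueError("k must be between 1 and the number of vectors")
    # Deterministic initialization: take k evenly spaced input vectors.
    stride = len(vectors) // k
    centers = [list(v) for v in vectors[::stride][:k]]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            clusters[quantize(v, centers)].append(v)
        for i, members in enumerate(clusters):
            if members:  # keep the old center if a cluster becomes empty
                centers[i] = [sum(c) / len(members) for c in zip(*members)]
    return centers
```

Calling `k_means(patches, 100)` on the patch vectors would yield 100 candidate patterns, and `quantize` then assigns any new patch to its closest pattern.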

32. Automatic Part Selection Points of interest patterns

33. Method Scheme Part Selection Model Learning Test

34. Automatic Part Selection • Find subset of candidate parts of (small) size F to be used in the model that gives the best performance in the learning phase. 57% 87% 51%
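The selection step can be sketched as a search over subsets of size F. The version below tries every subset, which is only feasible for a small candidate pool; with 100 candidates a greedy or randomized search would be needed instead. The `score` callable is hypothetical and stands for "train a model on these parts and return its performance".

```python
from itertools import combinations

def select_parts(candidates, model_size, score):
    """Return the subset of `model_size` candidates with the highest score.

    `score` is a caller-supplied function (hypothetical here) that maps a
    tuple of candidate parts to a performance value, higher being better.
    """
    best_subset, best_score = None, float("-inf")
    for subset in combinations(candidates, model_size):
        value = score(subset)
        if value > best_score:
            best_subset, best_score = subset, value
    return best_subset, best_score
```

In practice `score` would wrap the learning phase described in the following slides, so each evaluation is expensive and the number of subsets tried matters.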

35. Learning • Goal: find θ = {μ, Σ, p(b), M} which best explains the observed (training) data. • μ, Σ: mean and covariance parameters of the joint Gaussian modeling the shape of the foreground. • b: random variable denoting whether each of the parts of the model is detected or not. • M: average number of background detections for each of the parts.

36. Learning • Goal: find θ = {μ, Σ, p(b), M} which best explains the observed (training) data, i.e. maximizes the likelihood: arg max_θ p(Xo|θ) • Done using the EM method.

37. Expectation Maximization (EM) • EM is an iterative optimization method to estimate some unknown parameters θ, given measurement data, but not given some “hidden” variables J. • We want to maximize the posterior probability of the parameters θ given the data U, marginalizing over J:
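A plausible way to write this objective, using θ, U and J as defined on the slide, is given below. The symbol θ* is a name introduced here for the maximizer, and the sum would become an integral for continuous hidden variables.

```latex
\theta^{*} \;=\; \arg\max_{\theta} \sum_{J} P(\theta, J \mid U)
```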

38. Expectation Maximization (EM) • Choose an initial parameter guess θ0. • E-Step: estimate the unobserved (hidden) data using the observed data and the current parameter guess θk. • M-Step: compute the maximum likelihood estimate of the parameters, θk+1, using the estimated data. • Repeat the two steps with the new guess.

39. Expectation Maximization (EM) • EM alternates between estimating the unknown parameters θ and the hidden variables J. • The EM algorithm converges to a local maximum, so the result depends on the initial guess.
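The alternation can be illustrated on a much simpler problem than the constellation model: fitting a mixture of one-dimensional Gaussians, where the hidden variable J is the component that generated each sample. This is a generic EM sketch under that assumption, not the learning procedure of the model in these slides; names and default values are illustrative.

```python
import math

def normal_pdf(x, mean, variance):
    return math.exp(-(x - mean) ** 2 / (2 * variance)) / math.sqrt(2 * math.pi * variance)

def em_gaussian_mixture(data, initial_means, iterations=50):
    """Fit a 1D Gaussian mixture with EM, starting from the given means."""
    k = len(initial_means)
    means = list(initial_means)
    variances = [1.0] * k
    weights = [1.0 / k] * k
    for _ in range(iterations):
        # E-step: soft guess of the hidden component for every sample.
        responsibilities = []
        for x in data:
            joint = [w * normal_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)]
            total = sum(joint)
            responsibilities.append([j / total for j in joint])
        # M-step: maximum likelihood parameters given the soft assignments.
        for c in range(k):
            mass = sum(r[c] for r in responsibilities)
            weights[c] = mass / len(data)
            means[c] = sum(r[c] * x for r, x in zip(responsibilities, data)) / mass
            spread = sum(r[c] * (x - means[c]) ** 2
                         for r, x in zip(responsibilities, data)) / mass
            variances[c] = max(spread, 1e-6)  # floor avoids a collapsing component
    return weights, means, variances
```

Starting from different `initial_means` can lead to different solutions, which reflects the local-maximum behaviour noted on the slide.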

40. Method Scheme Part Selection Model Learning Test

41. Recognition • Using the maximum a posteriori approach we consider the ratio R = where h0 is the null hypothesis – which explains all parts as background noise. • We accept the image as belonging to the class if R is above a certain threshold.

42. Database • Two classes – faces and cars • 100 training images for each class • 100 test images for each class • Images vary in scale, location of the object, lighting conditions • Images have cluttered background • No manual preprocessing

43. Learning Results

44. Model Performance • Average training and testing errors, measured as 1 − Area(ROC). • The results suggest a 4-part model for faces and a 5-part model for cars as optimal.
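The error measure 1 − Area(ROC) can be computed directly from classifier scores. The sketch below uses the rank interpretation of the ROC area: the fraction of (class image, background image) pairs in which the class image receives the higher score, with ties counted as half. The function names and the use of raw scores are assumptions of the example.

```python
def roc_area(positive_scores, negative_scores):
    """Area under the ROC curve, computed by comparing all score pairs."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))

def roc_error(positive_scores, negative_scores):
    """Error measured as 1 - Area(ROC); 0 means perfect separation."""
    return 1.0 - roc_area(positive_scores, negative_scores)
```

A value of 0.5 corresponds to scores that carry no information about the class, so useful models should fall well below that.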

45. Multiple use of parts Part Labels: Part ‘C’ has high variance along the vertical direction – can be detected in several locations – bumper, license plate or roof.

46. Recognition Results • Average success rate (at the operating point where the false positive and false negative rates are equal): • Faces: 93.5% • Cars: 86.5%

47. Agenda • Bayesian decision theory • Recognition • Simple probabilistic model • Mixture model • More advanced probabilistic model • “One-Shot” Learning

48. Mixture Model • The Gaussian model works well for homogeneous classes, but real-life objects can be far from homogeneous. • Can we extend our approach to multi-modal classes?

49. Mixture Model • An object is modeled using Ω different components, each is a probabilistic model: • Each component “sees the whole picture”. • Components are trained together.

50. Database • Faces with different viewing angles – 0°, 15°, …, 90° • Cars – rear view and side view • Tree leaves – of several types