

  1. Unit #3: Neural Networks and Pattern Recognition. Giansalvo EXIN Cirrincione.

  2. PROBABILITY DENSITY ESTIMATION (labelled vs. unlabelled data). Parametric methods: a specific functional form for the density model is assumed. This contains a number of parameters which are then optimized by fitting the model to the training set. Drawback: the chosen form may not be correct, i.e. it may be incapable of representing the true density.

  3. PROBABILITY DENSITY ESTIMATION. Non-parametric methods: no particular functional form is assumed; the form of the density is determined entirely by the data. Drawback: the number of parameters grows with the size of the training set (TS).

  4. PROBABILITY DENSITY ESTIMATION. Semi-parametric methods: a very general class of functional forms in which the number of adaptive parameters can be increased in a systematic way to build ever more flexible models, but where the total number of parameters in the model can be varied independently of the size of the data set.

  5. Parametric model: normal or Gaussian distribution
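The slide's equation is not reproduced in the transcript; the standard d-dimensional normal density it refers to is

```latex
p(\mathbf{x}) = \frac{1}{(2\pi)^{d/2}\,|\boldsymbol{\Sigma}|^{1/2}}
\exp\!\left\{-\tfrac{1}{2}\,(\mathbf{x}-\boldsymbol{\mu})^{\mathrm{T}}
\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})\right\}
```

with mean vector μ and covariance matrix Σ.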

  6. Parametric model: normal or Gaussian distribution Mahalanobis distance
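The Mahalanobis distance named on the slide is the quadratic form appearing in the exponent of the Gaussian:

```latex
\Delta^{2} = (\mathbf{x}-\boldsymbol{\mu})^{\mathrm{T}}\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})
```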

  7. Parametric model: normal or Gaussian distribution. Contour of constant probability density, on which the density is smaller than its value at the mean by a factor exp(-1/2), i.e. the surface Δ² = 1.

  8. Parametric model: normal or Gaussian distribution. When the covariance matrix is diagonal, the components of x are statistically independent and the density factorizes into a product of one-dimensional normals.

  9. Parametric model: normal or Gaussian distribution

  10. Parametric model: normal or Gaussian distribution. Some properties:
  • any moment can be expressed as a function of μ and Σ
  • under general assumptions, the mean of M random variables tends to be distributed normally in the limit as M tends to infinity (central limit theorem); example: the sum of a set of variables drawn independently from the same distribution
  • under any non-singular linear transformation of the coordinate system, the pdf is again normal, but with different parameters
  • the marginal and conditional densities are normal.

  11. Parametric model: normal or Gaussian distribution. Discriminant functions for independent normal class-conditional pdf's: in general, the decision boundary is quadratic.

  12. Parametric model: normal or Gaussian distribution. Independent normal class-conditional pdf's with a common covariance matrix (Σ_k = Σ): the decision boundary is linear. The standard form of these discriminants is sketched below.
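The slide equations are not reproduced in the transcript; the standard discriminant for normal class-conditional densities is

```latex
g_k(\mathbf{x}) = \ln p(\mathbf{x}\mid C_k) + \ln P(C_k)
  = -\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_k)^{\mathrm{T}}\boldsymbol{\Sigma}_k^{-1}(\mathbf{x}-\boldsymbol{\mu}_k)
    -\tfrac{1}{2}\ln|\boldsymbol{\Sigma}_k| + \ln P(C_k)
```

which is quadratic in x. When Σ_k = Σ for all classes, the quadratic term is the same for every class and cancels in the comparison between discriminants, leaving decision boundaries that are linear in x.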

  13. Parametric model: normal or Gaussian distribution P(C1) = P(C2)

  14. Parametric model: normal or Gaussian distribution P(C1) = P(C2) = P(C3)

  15. Parametric model: normal or Gaussian distribution. With Σ = σ²I the discriminant reduces to the Euclidean distance from x to each class mean: template matching.

  16. Maximum likelihood (ML) finds the optimum values for the parameters by maximizing a likelihood function derived from the training data, the samples being drawn independently from the required distribution.

  17. The joint probability density of the TS, viewed as a function of the parameters θ, is the likelihood of θ for the given TS. ML finds the optimum values for the parameters by maximizing this likelihood function derived from the training data.

  18. Error function: the negative log-likelihood (homework). For a Gaussian pdf, the ML estimates of the parameters are sample averages (homework). The standard expressions are given below.
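The formulas behind the last two slides are not reproduced in the transcript; in standard form, for data points x_1, ..., x_N drawn independently:

```latex
L(\boldsymbol{\theta}) = \prod_{n=1}^{N} p(\mathbf{x}_n \mid \boldsymbol{\theta}),
\qquad
E = -\ln L(\boldsymbol{\theta}) = -\sum_{n=1}^{N} \ln p(\mathbf{x}_n \mid \boldsymbol{\theta})
```

For a Gaussian pdf, setting the derivatives of E to zero gives the sample averages

```latex
\hat{\boldsymbol{\mu}} = \frac{1}{N}\sum_{n=1}^{N}\mathbf{x}_n,
\qquad
\hat{\boldsymbol{\Sigma}} = \frac{1}{N}\sum_{n=1}^{N}
(\mathbf{x}_n-\hat{\boldsymbol{\mu}})(\mathbf{x}_n-\hat{\boldsymbol{\mu}})^{\mathrm{T}}
```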

  19. Bayesian inference: the uncertainty in the values of the parameters is itself described by a probability distribution.

  20. For data points drawn independently from the underlying distribution, the estimated density is obtained by averaging the parametric form over the parameters, with the posterior distribution acting as the weighting factor (see below).
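The equations this slide refers to are not in the transcript; the standard Bayesian expressions are

```latex
p(\boldsymbol{\theta}\mid X) =
\frac{p(X\mid\boldsymbol{\theta})\,p(\boldsymbol{\theta})}
     {\int p(X\mid\boldsymbol{\theta}')\,p(\boldsymbol{\theta}')\,d\boldsymbol{\theta}'},
\qquad
p(X\mid\boldsymbol{\theta}) = \prod_{n=1}^{N} p(\mathbf{x}_n\mid\boldsymbol{\theta}),
\qquad
p(\mathbf{x}\mid X) = \int p(\mathbf{x}\mid\boldsymbol{\theta})\,p(\boldsymbol{\theta}\mid X)\,d\boldsymbol{\theta}
```

where the posterior p(θ|X) is the weighting factor mentioned on the slide.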

  21. For large numbers of observations, the Bayesian representation of the density approaches the maximum likelihood solution. A prior which gives rise to a posterior having the same functional form is said to be a conjugate prior (reproducing densities, e.g. Gaussian).

  22. Example: normal distribution with known σ; find μ given the TS (homework). The resulting posterior involves the sample mean (homework).

  23. Example: normal distribution (continued). The standard result for this example is sketched below.
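The slide's equations are not reproduced; assuming a Gaussian prior N(μ_0, σ_0²) on μ (an assumption here, and the standard conjugate choice for this example), the posterior is Gaussian with

```latex
\mu_N = \frac{N\sigma_0^{2}}{N\sigma_0^{2}+\sigma^{2}}\,\bar{x}
      + \frac{\sigma^{2}}{N\sigma_0^{2}+\sigma^{2}}\,\mu_0,
\qquad
\frac{1}{\sigma_N^{2}} = \frac{1}{\sigma_0^{2}} + \frac{N}{\sigma^{2}}
```

where x̄ is the sample mean; as N grows, μ_N approaches the ML (sample-mean) solution and σ_N tends to zero.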

  24. Batch vs. sequential methods. Iterative techniques allow:
  • no storage of a complete TS
  • on-line learning in real-time adaptive systems
  • tracking of slowly varying systems
  A sequential scheme can be derived from the ML estimate of the mean of a normal distribution (see the update rule below).
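The update rule referred to on the slide, written out (a standard rearrangement of the sample mean):

```latex
\hat{\mu}^{(N)} = \frac{1}{N}\sum_{n=1}^{N} x_n
 = \hat{\mu}^{(N-1)} + \frac{1}{N}\left(x_N - \hat{\mu}^{(N-1)}\right)
```

so each new data point x_N nudges the running estimate without storing the previous samples.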

  25. The Robbins-Monro algorithm. Consider a pair of random variables g and θ which are correlated. The regression function is the conditional expectation f(θ) = E[g | θ]; assume g has finite variance. The goal is to find a root θ* of f(θ) = 0 using only observed values of g.

  26. The Robbins-Monro algorithm. The (positive) coefficients a_N must satisfy three conditions:
  • successive corrections decrease in magnitude, so that the process can converge (lim a_N = 0)
  • the corrections remain sufficiently large that the root is found (Σ a_N = ∞)
  • the accumulated noise has finite variance, so the noise does not spoil convergence (Σ a_N² < ∞)

  27. The Robbins-Monro algorithm. The ML parameter estimate θ̂ can be formulated as a sequential update method using the Robbins-Monro formula.
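The formulas the last three slides refer to are not reproduced in the transcript; in standard form, the Robbins-Monro iteration and its sequential-ML specialization read

```latex
\theta^{(N+1)} = \theta^{(N)} + a_N\, g\!\left(\theta^{(N)}\right)
\qquad\text{(}g\text{ is the value observed when }\theta = \theta^{(N)}\text{)}
```

```latex
\hat{\theta}^{(N)} = \hat{\theta}^{(N-1)} + a_{N-1}\,
\frac{\partial}{\partial \theta}\,\ln p\!\left(x_N \mid \theta\right)\Big|_{\theta=\hat{\theta}^{(N-1)}}
```

with the coefficients a_N chosen to satisfy the three conditions above.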

  28. homework

  29. Consider the case where the pdf is taken to be a normal distribution, with known standard deviation σ and unknown mean μ. Show that, by choosing a_N = σ² / (N+1), the one-dimensional iterative version of the ML estimate of the mean is recovered by using the Robbins-Monro formula for sequential ML. Obtain the corresponding formula for the iterative estimate of σ² and repeat the same analysis.

  30. Histograms (SUPERVISED LEARNING). We can choose both the number of bins M and their starting position on the axis. The number of bins (i.e. the bin width) acts as a smoothing parameter. Curse of dimensionality: in d dimensions the grid requires M^d bins.
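A minimal numpy sketch of the histogram estimator (variable names are illustrative, not from the slides):

```python
import numpy as np

# Histogram density estimate: the number of bins M (equivalently the
# bin width) is the smoothing parameter mentioned on the slide.
rng = np.random.default_rng(0)
samples = rng.normal(size=500)            # 1-D training set
M = 20                                    # number of bins
p_hat, edges = np.histogram(samples, bins=M, density=True)
# p_hat[i] approximates p(x) for x in the bin [edges[i], edges[i+1])
```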

  31. Density estimation in general. The probability that a new vector x, drawn from the unknown pdf p(x), will fall inside some region R of x-space is given by the integral of p over R. If we have N points drawn independently from p(x), the probability that K of them will fall within R is given by the binomial law. This distribution is sharply peaked as N tends to infinity. Assume p(x) is continuous and varies only slightly over the region R of volume V; the resulting chain of approximations is given below.
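The slide's equations are not reproduced in the transcript; in standard form,

```latex
P = \int_{R} p(\mathbf{x}')\,d\mathbf{x}',
\qquad
\Pr(K) = \binom{N}{K} P^{K} (1-P)^{N-K}
```

Since the binomial is sharply peaked, K ≈ N P; and since p(x) is nearly constant over R, P ≈ p(x) V. Combining the two gives the basic non-parametric estimate

```latex
p(\mathbf{x}) \;\simeq\; \frac{K}{N\,V}
```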

  32. Density estimation in general: the trade-off. Assumption #1: R relatively large, so that P is large and the binomial distribution is sharply peaked. Assumption #2: R small, which justifies the assumption that p(x) is nearly constant inside the integration region. Fixing K and determining the volume V from the data gives the K-nearest-neighbours approach.

  33. Density estimation in general: the same trade-off between the two assumptions. Fixing the volume V and determining K from the data gives kernel-based methods.

  34. Kernel-based methods. Take R to be a hypercube centred on x. We can find an expression for K by defining a kernel function H(u), also known as a Parzen window, which acts as a zero-order-hold (ZOH) interpolation function. The resulting estimate is a superposition of N cubes of side h, each centred on one of the data points.
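The kernel and the resulting estimator, not reproduced in the transcript, have the standard form

```latex
H(\mathbf{u}) =
\begin{cases}
1 & |u_j| \le \tfrac{1}{2},\; j = 1,\dots,d\\[2pt]
0 & \text{otherwise}
\end{cases}
\qquad
K = \sum_{n=1}^{N} H\!\left(\frac{\mathbf{x}-\mathbf{x}_n}{h}\right)
\qquad
\tilde{p}(\mathbf{x}) = \frac{1}{N}\sum_{n=1}^{N}\frac{1}{h^{d}}\,
H\!\left(\frac{\mathbf{x}-\mathbf{x}_n}{h}\right)
```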

  35. Kernel-based methods: a smoother estimate is obtained by replacing the hypercube with a smoother kernel function, such as a Gaussian.

  36. Kernel-based methods: figure comparing the ZOH (hypercube) and Gaussian kernel estimates obtained from 30 samples.
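A minimal Python sketch of the kernel-based estimator with the hypercube and Gaussian kernels described above (function and variable names are illustrative, not from the slides):

```python
import numpy as np

def parzen_density(x, data, h, kernel="gaussian"):
    """Kernel (Parzen window) density estimate at the query points x.

    x    : (M, d) query points
    data : (N, d) training samples drawn from the unknown pdf
    h    : window width (smoothing parameter)
    """
    x = np.atleast_2d(x)
    data = np.atleast_2d(data)
    M, d = x.shape
    N = data.shape[0]
    u = (x[:, None, :] - data[None, :, :]) / h        # (M, N, d)
    if kernel == "hypercube":                          # ZOH Parzen window
        H = np.all(np.abs(u) <= 0.5, axis=-1).astype(float)
    else:                                              # Gaussian kernel
        H = np.exp(-0.5 * np.sum(u ** 2, axis=-1)) / (2 * np.pi) ** (d / 2)
    # p(x) ~ (1/N) * sum_n (1/h^d) * H((x - x_n) / h)
    return H.sum(axis=1) / (N * h ** d)

# usage: estimate a 1-D density from 30 samples
rng = np.random.default_rng(0)
samples = rng.normal(size=(30, 1))
grid = np.linspace(-4, 4, 200)[:, None]
p_hat = parzen_density(grid, samples, h=0.5)
```

With the Gaussian kernel, h plays the same smoothing role as the cube side: large h oversmooths, small h gives a noisy estimate.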

  37. Kernel-based methods. Taking the expectation over different selections of the data points x_n, the expected estimated density is a convolution of the true pdf with the kernel function, and so represents a smoothed version of the pdf. For a finite data set there is no non-negative estimator which is unbiased for all continuous pdf's (Rosenblatt, 1956). All of the data points must be stored!

  38. K-nearest neighbours. One potential problem with the kernel-based approach arises from the use of a fixed width parameter h for all of the data points: if h is too large, there may be regions of x-space in which the estimate is oversmoothed, while reducing h may lead to problems in regions of lower density, where the model density becomes noisy. The optimum choice of h may therefore be a function of position. Instead, consider a small hypersphere centred at a point x and allow its radius to grow until it contains precisely K data points. The estimate of the density is then given by K / (N V), where V is the resulting volume of the hypersphere.

  39. K-nearest neighbours. The estimate is not a true probability density since its integral over all x-space diverges. All of the data points must be stored! The search for the nearest neighbours can be speeded up with branch-and-bound techniques. A sketch of the estimator follows.
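A minimal Python sketch of the K-nearest-neighbour density estimate p(x) ≈ K/(N V), shown in one dimension (names are illustrative, not from the slides):

```python
import numpy as np

def knn_density_1d(x, data, K):
    """K-nearest-neighbour density estimate at the points x (1-D case).

    The 'volume' V is the length of the smallest interval centred on x
    that contains exactly K of the N training samples.
    """
    data = np.asarray(data, dtype=float)
    N = data.size
    x = np.atleast_1d(np.asarray(x, dtype=float))
    p = np.empty_like(x)
    for i, xi in enumerate(x):
        radii = np.sort(np.abs(data - xi))   # distances to all samples
        r_k = radii[K - 1]                   # distance to the K-th neighbour
        V = 2.0 * r_k                        # interval length in 1-D
        p[i] = K / (N * V)                   # diverges if x coincides with a sample
    return p

# usage
rng = np.random.default_rng(0)
samples = rng.normal(size=200)
grid = np.linspace(-4, 4, 100)
p_hat = knn_density_1d(grid, samples, K=10)
```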

  40. K-nearest neighbour classification rule. The data set contains N_k points in class C_k and N points in total. Draw a hypersphere around x which encompasses K points irrespective of their class; the class-conditional densities, the unconditional density and the priors can all be estimated from the counts inside the sphere (see below).
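The estimates this construction yields, in standard form (the slide's equations are not in the transcript):

```latex
p(\mathbf{x}\mid C_k) \simeq \frac{K_k}{N_k V},
\qquad
p(\mathbf{x}) \simeq \frac{K}{N V},
\qquad
P(C_k) \simeq \frac{N_k}{N}
\;\;\Longrightarrow\;\;
P(C_k\mid\mathbf{x}) = \frac{p(\mathbf{x}\mid C_k)\,P(C_k)}{p(\mathbf{x})}
\simeq \frac{K_k}{K}
```

where K_k is the number of the K neighbours belonging to class C_k, so the posterior is maximized by the class with the largest count inside the hypersphere.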

  41. K-nearest neighbour classification rule. Find a hypersphere around x which contains K points and then assign x to the class having the majority inside the hypersphere. K = 1 gives the nearest-neighbour rule.

  42. K-nearest neighbour classification rule. K = 1: nearest-neighbour rule. The underlying assumption is that samples which are close in feature space are likely to belong to the same class.
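A minimal Python sketch of the classification rule (majority vote among the K nearest training points; names are illustrative, not from the slides):

```python
import numpy as np

def knn_classify(x, data, labels, K=1):
    """Assign each query point in x to the majority class among its
    K nearest training points (K = 1 gives the nearest-neighbour rule)."""
    x = np.atleast_2d(x)
    data = np.atleast_2d(data)
    labels = np.asarray(labels)
    predictions = []
    for xi in x:
        d2 = np.sum((data - xi) ** 2, axis=1)        # squared distances
        nearest = np.argsort(d2)[:K]                 # indices of the K neighbours
        classes, counts = np.unique(labels[nearest], return_counts=True)
        predictions.append(classes[np.argmax(counts)])
    return np.array(predictions)

# usage: two Gaussian classes in 2-D
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
print(knn_classify([[0.2, 0.1], [2.8, 3.1]], X, y, K=5))
```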

  43. K-nearest neighbour classification rule: 1-NNR example (figure).

  44. A measure of the distance between two density functions: the Kullback-Leibler distance or asymmetric divergence. L ≥ 0, with equality iff the two pdf's are equal.
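The definition the slide refers to, in the form used for comparing a model density p̃(x) with the true density p(x):

```latex
L = -\int p(\mathbf{x})\,
\ln\!\left\{\frac{\tilde{p}(\mathbf{x})}{p(\mathbf{x})}\right\} d\mathbf{x}
\;\ge\; 0
```

with L = 0 if and only if the two densities are equal everywhere.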

  45. homework
