
Hierarchical Double Dirichlet Process Mixture of Gaussian Processes




Presentation Transcript


  1. Presented by Patrick Dallaire – DAMAS Workshop, November 2nd, 2012 Hierarchical Double Dirichlet Process Mixture of Gaussian Processes Paper from Tayal et al. (2012), AAAI

  2. INTRODUCTION

  3. PROBLEM DESCRIPTION • Consider a non-stationary time series such as:

  4. PROPOSED MODELS • Gaussian processes • Infinite mixture of Gaussian processes • Dirichlet process mixture of Gaussian processes • Hierarchical double Dirichlet process mixture of Gaussian processes

  5. OUTLINE • Bayesian modeling • Dirichlet processes • Hierarchical Dirichlet processes • Gaussian processes

  6. Bayesian MODELING

  7. THE BAYESIAN APPROACH • Define a model linking the unknown parameters θ to the data D: p(D | θ) • Specify a prior probability distribution expressing our belief about the parameters: p(θ) • Compute the posterior distribution of the parameters given the data with Bayes' theorem: p(θ | D) = p(D | θ) p(θ) / p(D)

  8. COMPUTATIONAL ISSUES • Bayes' theorem involves several distinct quantities • The shape of the posterior is given by the joint: p(θ | D) ∝ p(D | θ) p(θ) • The marginal likelihood is used as a normalizing constant: p(D) = ∫ p(D | θ) p(θ) dθ

  9. POSTERIOR PREDICTION • The predictive distribution for a new observation y* can be formulated as: p(y* | D) = ∫ p(y* | θ) p(θ | D) dθ • Predictions should account for all the posterior uncertainty about the parameters

  10. CONJUGATE MODELS • Integrals involved in Bayesian inference can be analytically intractable, increasing the computational complexity. • A model is said to be conjugate when the posterior and prior distributions belong to the same family. • Posterior computation for conjugate models can be done analytically.
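As a concrete illustration of conjugacy (not taken from the slides), the Beta-Bernoulli model admits a closed-form posterior update; the prior parameters and data below are illustrative assumptions:

```python
# Sketch: conjugate Beta-Bernoulli model (illustrative example).
# Prior Beta(a, b) on the success probability theta; after observing k
# successes in n Bernoulli trials, the posterior is Beta(a + k, b + n - k).

def beta_bernoulli_posterior(a, b, data):
    """Return posterior Beta parameters after observing 0/1 data."""
    k = sum(data)            # number of successes
    n = len(data)            # number of trials
    return a + k, b + (n - k)

# Start from a uniform prior Beta(1, 1) and observe 7 successes in 10 trials.
a_post, b_post = beta_bernoulli_posterior(1, 1, [1, 1, 1, 0, 1, 1, 0, 1, 0, 1])
posterior_mean = a_post / (a_post + b_post)   # (1 + 7) / (2 + 10) = 2/3
```

No integral is ever computed: the update is pure parameter arithmetic, which is exactly what the slide means by "done analytically".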

  11. GAUSSIAN PROCESSES

  12. INTRODUCTION • Gaussian processes (GPs) are used in supervised learning to estimate a function of interest • GPs are probability distributions over spaces of functions • They belong to the class of nonparametric Bayesian approaches

  13. NORMAL DISTRIBUTION • Let us assume a random variable

  14. NORMAL DISTRIBUTION • We place the random variable at a location, such that:

  15. INDEXING RANDOM VARIABLES • Assume multiple variables, each with its own index and its own location:

  16. MULTIVARIATE NORMAL • According to this construction, we have a set of i.i.d. normally distributed random variables • The joint probability then factorizes as a product of the individual normal densities • What happens when adding covariance?
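The question on the slide can be answered empirically; the following sketch (the dimensions, sample size, and 0.9 correlation are assumptions for illustration) compares independent draws with draws whose covariance matrix has off-diagonal terms:

```python
# Sketch: i.i.d. vs. correlated bivariate normal samples.
import numpy as np

rng = np.random.default_rng(0)

# Independent case: identity covariance, so the joint factorizes.
cov_iid = np.eye(2)
# Correlated case: off-diagonal terms couple the two variables.
cov_dep = np.array([[1.0, 0.9],
                    [0.9, 1.0]])

x_iid = rng.multivariate_normal(np.zeros(2), cov_iid, size=5000)
x_dep = rng.multivariate_normal(np.zeros(2), cov_dep, size=5000)

# Empirical correlations: near 0 for the i.i.d. draws, near 0.9 otherwise.
corr_iid = np.corrcoef(x_iid.T)[0, 1]
corr_dep = np.corrcoef(x_dep.T)[0, 1]
```

With covariance added, knowing one variable tells you something about the other, which is the mechanism Gaussian processes exploit between nearby inputs.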

  17. MULTIVARIATE NORMAL • An example with dependent variables

  18. INFINITE NORMAL • Assume that the random variables are now indexed by input values in a continuous space • Since this space is covered by normal variables, there are infinitely many of them • Let us denote the normal variable located at each input • We must define how these variables covary

  19. DEFINITION • A Gaussian process is a set of random variables such that any finite subset of its variables has a multivariate normal joint distribution • To specify a prior distribution over a space of functions, we define: f ~ GP(m, k), where m is the mean function and k is the covariance function
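The definition above can be exercised directly: evaluating the GP at a finite grid of inputs yields an ordinary multivariate normal to sample from. The squared-exponential kernel and its parameters below are illustrative assumptions, not the paper's choice:

```python
# Sketch: sampling functions from a zero-mean GP prior with a
# squared-exponential covariance (kernel choice is an assumption).
import numpy as np

def sq_exp_kernel(x1, x2, lengthscale=1.0, variance=1.0):
    """Squared-exponential covariance k(x, x') = v * exp(-(x - x')^2 / 2l^2)."""
    d = x1[:, None] - x2[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

x = np.linspace(0.0, 5.0, 50)   # finite set of input locations
K = sq_exp_kernel(x, x)         # covariance of the corresponding variables

rng = np.random.default_rng(1)
# Each draw is one 50-dimensional multivariate normal sample, i.e. one
# function evaluated at the inputs x; the jitter keeps K positive definite.
f = rng.multivariate_normal(np.zeros(len(x)), K + 1e-8 * np.eye(len(x)), size=3)
```

Each row of `f` is one sampled function, which is what the sampling-example slides below visualize.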

  20. SAMPLING EXAMPLE


  24. PRIOR OVER FUNCTIONS • Specifying a GP consists of defining its mean and covariance functions • The covariance function determines how likely the different types of functions are under the prior

  25. LEARNING EXAMPLE


  27. DIRICHLET PROCESSES

  28. DIRICHLET PROCESS • A Dirichlet process (DP) is a distribution over discrete distributions, denoted G ~ DP(α, H) • The parameter H is the base distribution and α is the concentration parameter • Sampling from a DP can be done with the stick-breaking construction:

  29. STICK-BREAKING CONSTRUCTION
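The stick-breaking construction can be sketched in a few lines: break off Beta(1, α)-distributed fractions of a unit-length stick, so that weight k is β_k · ∏_{j<k} (1 − β_j). The truncation to a finite number of atoms is for illustration only:

```python
# Sketch: truncated stick-breaking construction of a DP draw.
# beta_k ~ Beta(1, alpha); pi_k = beta_k * prod_{j<k} (1 - beta_j).
import numpy as np

def stick_breaking(alpha, n_atoms, rng):
    """Return truncated stick-breaking weights for concentration alpha."""
    betas = rng.beta(1.0, alpha, size=n_atoms)
    # Length of stick remaining before each break: 1, (1-b1), (1-b1)(1-b2), ...
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - betas)[:-1]])
    return betas * remaining

rng = np.random.default_rng(42)
weights = stick_breaking(alpha=2.0, n_atoms=100, rng=rng)
# The weights are positive and sum to (almost) one; the atom locations
# would be drawn i.i.d. from the base distribution H.
```

A smaller α breaks off large pieces early, concentrating mass on a few atoms; a larger α spreads the mass over many small atoms.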



  45. CLUSTERING PROPERTY • A random draw from a Dirichlet process is discrete with probability one • Only a finite number of its atoms have an appreciable mass • A data point drawn from the random distribution is associated with a given cluster with probability equal to that cluster's weight
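The clustering property is often presented through the equivalent Chinese restaurant process view (an assumed illustration here, not stated on the slide): point n+1 joins an existing cluster k with probability n_k / (n + α), or opens a new cluster with probability α / (n + α):

```python
# Sketch: Chinese restaurant process view of DP clustering.
import numpy as np

def crp_sample(n_points, alpha, rng):
    """Sample a random partition of n_points items from a CRP(alpha)."""
    assignments = [0]                 # the first point starts cluster 0
    counts = [1]                      # current cluster sizes
    for n in range(1, n_points):
        # Existing clusters weighted by size, plus one slot for a new cluster.
        probs = np.array(counts + [alpha]) / (n + alpha)
        k = rng.choice(len(probs), p=probs)
        if k == len(counts):
            counts.append(1)          # open a new cluster
        else:
            counts[k] += 1
        assignments.append(k)
    return assignments, counts

rng = np.random.default_rng(7)
assignments, counts = crp_sample(200, alpha=1.0, rng=rng)
```

The rich-get-richer weighting reproduces the slide's point: a handful of clusters absorb most of the data, while the rest keep negligible mass.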

  46. DIRICHLET PROCESS MIXTURE OF GAUSSIAN PROCESSES

  47. INTRODUCTION • Gaussian processes using a single stationary set of hyperparameters may be too restrictive • A Dirichlet process can be used to group the observed data into clusters • Each cluster can be given a private set of hyperparameters representing the local behavior of the function

  48. GENERATIVE PROCESS • Partition the data into clusters with the Dirichlet process • For each cluster, sample an input Gaussian and a set of hyperparameters • For each data point in a cluster, sample its input position according to the input Gaussian • Sample the output variables according to the cluster's Gaussian process

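The four generative steps on the slide can be sketched end to end; every concrete choice below (CRP partition, Gaussian input means, gamma-distributed lengthscales, squared-exponential kernel, all constants) is an illustrative assumption, not the paper's exact model:

```python
# Illustrative sketch of the DP mixture of GPs generative process.
import numpy as np

rng = np.random.default_rng(0)
alpha, n_data = 1.0, 60

# 1. Partition the data with the Dirichlet process (CRP representation).
assignments, counts = [0], [1]
for n in range(1, n_data):
    probs = np.array(counts + [alpha]) / (n + alpha)
    k = rng.choice(len(probs), p=probs)
    if k == len(counts):
        counts.append(1)
    else:
        counts[k] += 1
    assignments.append(k)
assignments = np.array(assignments)
n_clusters = len(counts)

# 2. Per cluster: an input Gaussian and a set of GP hyperparameters.
means = rng.normal(0.0, 5.0, size=n_clusters)        # input Gaussian means
lengthscales = rng.gamma(2.0, 0.5, size=n_clusters)  # GP hyperparameters

x = np.empty(n_data)
y = np.empty(n_data)
for c in range(n_clusters):
    idx = np.where(assignments == c)[0]
    # 3. Sample input positions from the cluster's input Gaussian.
    x[idx] = rng.normal(means[c], 1.0, size=len(idx))
    # 4. Sample outputs jointly from the cluster's Gaussian process.
    xi = x[idx]
    d = xi[:, None] - xi[None, :]
    K = np.exp(-0.5 * (d / lengthscales[c]) ** 2) + 1e-6 * np.eye(len(idx))
    y[idx] = rng.multivariate_normal(np.zeros(len(idx)), K)
```

Because each cluster has its own lengthscale and input region, the concatenated data behaves like a non-stationary time series built from locally stationary pieces, matching the problem posed at the start.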

  50. EXAMPLE (figure: popularity data partitioned into clusters)
