
Bayesian methods, priors and Gaussian processes



  1. Bayesian methods, priors and Gaussian processes John Paul Gosling Department of Probability and Statistics

  2. Overview • The Bayesian paradigm • Bayesian data modelling • Quantifying prior beliefs • Data modelling with Gaussian processes

  3. Bayesian methods The beginning, the subjectivist philosophy, and an overview of Bayesian techniques.

  4. Subjective probability • Bayesian statistics involves a very different way of thinking about probability in comparison to classical inference. • The probability of a proposition is defined to be a measure of a person’s degree of belief. • Wherever there is uncertainty, there is probability. • This covers both aleatory and epistemic uncertainty.

  5. Differences with classical inference To a frequentist, data are repeatable, parameters are not: P(data|parameters). To a Bayesian, the parameters are uncertain, the observed data are not: P(parameters|data).

  6. Bayes’s theorem for distributions • In early probability courses, we are taught Bayes’s theorem for events: P(A|B) = P(B|A)P(A)/P(B). • This can be extended to continuous distributions: p(θ|x) = p(x|θ)p(θ)/p(x). • In Bayesian statistics, we use Bayes’s theorem in a particular way: p(θ|x) ∝ p(x|θ)p(θ).
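As a quick illustration of the events form, here is a minimal numeric sketch in Python; the scenario and all of the probabilities in it are invented for illustration.

```python
# A minimal numeric check of Bayes's theorem for events, using an
# invented diagnostic-test scenario (all numbers are illustrative).
p_a = 0.01              # P(A): prior probability of the event
p_b_given_a = 0.95      # P(B|A)
p_b_given_not_a = 0.10  # P(B|not A)

# Law of total probability gives the normalising constant P(B).
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Bayes's theorem: P(A|B) = P(B|A) P(A) / P(B).
p_a_given_b = p_b_given_a * p_a / p_b
print(f"P(A|B) = {p_a_given_b:.3f}")  # ~0.088: the prior 0.01 is updated
```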

  7. Prior to posterior updating Bayes’s theorem is used to update our beliefs: prior + data → posterior. The posterior is proportional to the prior times the likelihood.

  8. Posterior distribution • So, once we have our posterior, we have captured all our beliefs about the parameter of interest. • We can use this to do informal inference, e.g. intervals and summary statistics. • Formally, to make choices about the parameter, we must couple this with decision theory to calculate the optimal decision.

  9. Sequential updating Today’s posterior is tomorrow’s prior: prior beliefs + data → posterior beliefs; posterior beliefs + more data → updated posterior beliefs.
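A minimal sketch of this cycle for a Bernoulli success probability with a conjugate beta prior; the two batches of data below are hypothetical.

```python
from scipy.stats import beta

# Sequential conjugate updating: today's posterior is tomorrow's prior.
# The two batches of Bernoulli data below are hypothetical.
a, b = 2, 2  # Beta(2, 2) prior for a success probability theta
for successes, trials in [(3, 10), (6, 10)]:
    a, b = a + successes, b + (trials - successes)  # posterior -> new prior
    print(f"after {trials} more trials: Beta({a}, {b}), "
          f"mean = {beta.mean(a, b):.3f}")
```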

  10. The triplot • A triplot gives a graphical representation of prior to posterior updating by plotting the prior, the likelihood and the posterior on the same axes.

  11. Audience participation Quantification of our prior beliefs • What proportion of people in this room are left-handed? – call this parameter ψ. • When I toss this coin, what’s the probability of me getting a tail? – call this θ.

  12. A simple example • The archetypal example in probability theory is the outcome of tossing a coin. • Each toss of a coin is a Bernoulli trial with the probability of tails given by θ. • If we carry out 10 independent trials, we know the number of tails (X) will follow a binomial distribution: X | θ ~ Bin(10, θ).

  13. Our prior distribution • A Beta(2,2) distribution may reflect our beliefs about θ.

  14. Our posterior distribution • If we observe X = 3, we get the following triplot:
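A sketch of how such a triplot could be produced with scipy/matplotlib, using the Beta(2,2) prior and X = 3 from the example; the likelihood is rescaled only so the three curves share an axis.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import beta, binom

theta = np.linspace(0, 1, 501)
prior = beta.pdf(theta, 2, 2)                 # Beta(2, 2) prior
like = binom.pmf(3, 10, theta)                # X = 3 tails in 10 tosses
like = like / (like.sum() * (theta[1] - theta[0]))  # rescale to plot
posterior = beta.pdf(theta, 2 + 3, 2 + 7)     # conjugacy: Beta(5, 9)

for curve, label in [(prior, "prior"), (like, "likelihood"),
                     (posterior, "posterior")]:
    plt.plot(theta, curve, label=label)
plt.xlabel("theta")
plt.legend()
plt.show()
```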

  15. Our posterior distribution • If we are more convinced, a priori, that θ = 0.5 and we observe X = 3, we get the following triplot:

  16. Credible intervals • If asked to provide an interval in which there is a 90% chance of θ lying, we can derive this directly from our posterior distribution. • Such an interval is called a credible interval. • In frequentist statistics, there are confidence intervals, which cannot be interpreted in the same way. • In our example, using our first prior distribution, we can report a 95% posterior credible interval for θ of (0.14, 0.62).
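The quoted interval can be checked directly from the Beta(2+3, 2+7) = Beta(5, 9) posterior, for example:

```python
from scipy.stats import beta

# Equal-tailed 95% credible interval from the Beta(5, 9) posterior
# of the coin example.
lower, upper = beta.ppf([0.025, 0.975], 5, 9)
print(f"95% credible interval for theta: ({lower:.2f}, {upper:.2f})")
# approximately (0.14, 0.62), the interval quoted above
```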

  17. Basic linear model • Yesterday we saw a lot of this: y = Xβ + ε. • We have a least squares solution given by β̂ = (XᵀX)⁻¹Xᵀy. • Instead of trying to find the optimal set of parameters, we express our beliefs about them.

  18. Basic linear model • By selecting appropriate priors for the two parameters, we can derive the posterior analytically. • It is a normal inverse-gamma distribution. • Writing m₀ and σ²V₀ for the prior mean and covariance of β, the mean of our posterior distribution is then (V₀⁻¹ + XᵀX)⁻¹(V₀⁻¹m₀ + XᵀXβ̂), which is a weighted average of the LSE β̂ and the prior mean m₀.
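A minimal numerical sketch of that weighted-average posterior mean, assuming a normal prior β ~ N(m₀, σ²V₀); the data and prior settings below are invented for illustration.

```python
import numpy as np

# Conjugate Bayesian linear model y = X b + e with prior b ~ N(m0, s2*V0);
# all data and prior settings below are invented for illustration.
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(20), rng.uniform(0, 4, size=20)])
y = X @ np.array([1.0, 2.0]) + rng.normal(0, 0.5, size=20)

m0 = np.zeros(2)             # prior mean
V0inv = np.eye(2) / 10.0     # prior precision (a fairly weak prior)

b_ls = np.linalg.solve(X.T @ X, X.T @ y)  # least squares estimate
post_mean = np.linalg.solve(V0inv + X.T @ X,
                            V0inv @ m0 + X.T @ X @ b_ls)
print(post_mean)  # a matrix-weighted average of m0 and b_ls
```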

  19. Bayesian model comparison • Suppose we have two plausible models for a set of data, M and N say. • We can calculate posterior odds in favour of M using: P(M|x)/P(N|x) = B × P(M)/P(N), i.e. posterior odds = Bayes factor × prior odds.

  20. Bayesian model comparison • The Bayes factor is calculated using B = P(x|M)/P(x|N), the ratio of the models’ marginal likelihoods. • A Bayes factor that is greater than one would mean that your odds in favour of M increase. • Bayes factors naturally help guard against too much model structure.
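As a sketch, here is the Bayes factor for the earlier coin data (3 tails in 10 tosses), comparing a hypothetical point model M: θ = 0.5 against N: θ ~ Uniform(0, 1); both model choices are made purely for illustration.

```python
from scipy.integrate import quad
from scipy.stats import binom

# Bayes factor for the coin data (3 tails in 10 tosses), comparing
# M: theta = 0.5 against N: theta ~ Uniform(0, 1).
p_x_given_m = binom.pmf(3, 10, 0.5)
p_x_given_n, _ = quad(lambda t: binom.pmf(3, 10, t), 0, 1)  # = 1/11

print(f"B = {p_x_given_m / p_x_given_n:.2f}")  # ~1.29 > 1, favouring M
```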

  21. Advantages/Disadvantages • Bayesian methods are often more complex than frequentist methods. • There is not much software to give scientists off-the-shelf analyses. • Subjectivity: all the inferences are based on somebody’s beliefs.

  22. Advantages/Disadvantages • Bayesian statistics offers a framework to deal with all the uncertainty. • Bayesians make use of more information – not just the data in their particular experiment. • The Bayesian paradigm is very flexible, and it is able to tackle problems that frequentist techniques cannot. • In selecting priors and likelihoods, Bayesians are showing their hands – they can’t get away with making arbitrary choices when it comes to inference. • …

  23. Summary • The basic principles of Bayesian statistics have been covered. • We have seen how we update our beliefs in the light of data. • Hopefully, I’ve convinced you that the Bayesian way is the right way.

  24. Priors Advice on choosing suitable prior distributions and eliciting their parameters.

  25. Importance of priors • As we saw in the previous section, prior beliefs about uncertain parameters are a fundamental part of Bayesian statistics. • When we have few data about the parameter of interest, our prior beliefs dominate inference about that parameter. • In any application, effort should be made to model our prior beliefs accurately.

  26. Weak prior information • If we accept the subjective nature of Bayesian statistics and are not comfortable using subjective priors, then many have argued that we should try to specify prior distributions that represent no prior information. • These prior distributions are called noninformative, reference, ignorance or weak priors. • The idea is to have a completely flat prior distribution over all possible values of the parameter. • Unfortunately, this can lead to improper distributions being used.

  27. Weak prior information • In our coin tossing example, Be(1,1), Be(0.5,0.5) and Be(0,0) have been recommended as noninformative priors. Be(0,0) is improper.
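Because all three candidates are beta priors, their posteriors after the earlier data (3 tails in 10 tosses) follow immediately from conjugacy; a quick sketch:

```python
from scipy.stats import beta

# Posteriors after 3 tails in 10 tosses under each candidate prior;
# even the improper Be(0, 0) prior yields a proper posterior here.
for a0, b0 in [(1, 1), (0.5, 0.5), (0, 0)]:
    a, b = a0 + 3, b0 + 7
    print(f"Be({a0}, {b0}) prior -> Be({a}, {b}) posterior, "
          f"mean = {beta.mean(a, b):.3f}")
```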

  28. Conjugate priors • When we move away from noninformative priors, we might use priors that are in a convenient form. • That is, a form where combining them with the likelihood produces a posterior distribution from the same family. • In our example, the beta distribution is a conjugate prior for a binomial likelihood.

  29. Informative priors • An informative prior is an accurate representation of our prior beliefs. • We are not interested in the prior being part of some conjugate family. • An informative prior is essential when we have few or no data for the parameter of interest. • Elicitation, in this context, is the process of translating someone’s beliefs into a distribution.

  30. Elicitation • It is unrealistic to expect someone to be able to fully specify their beliefs in terms of a probability distribution. • Often, they are only able to report a few summaries of the distribution. • We usually work with medians, modes and percentiles. • Sometimes they are able to report means and variances, but there are more doubts about these values.

  31. Elicitation • Once we have some information about their beliefs, we fit some parametric distribution to them. • These distributions almost never fit the judgements precisely. • There are nonparametric techniques that can bypass this. • Feedback is essential in the elicitation process.

  32. Normal with unknown mean Noninformative prior: with data x₁,…,xₙ | μ ~ N(μ, σ²) and σ² known, the flat (improper) prior p(μ) ∝ 1 gives the posterior μ | x ~ N(x̄, σ²/n).

  33. Normal with unknown mean Conjugate prior: taking μ ~ N(m, v) gives the posterior μ | x ~ N(m₁, v₁), where v₁ = (1/v + n/σ²)⁻¹ and m₁ = v₁(m/v + nx̄/σ²) – a precision-weighted average of the prior mean and the sample mean.
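A short numerical sketch of this conjugate update; the data, known variance and prior settings are invented for illustration.

```python
import numpy as np

# Conjugate update for a normal mean with known data variance:
# x_i ~ N(mu, s2), prior mu ~ N(m, v). All numbers are illustrative.
x = np.array([9.8, 10.4, 10.1, 9.7, 10.2])
s2 = 0.25        # known data variance
m, v = 9.0, 1.0  # prior mean and variance

n = len(x)
post_var = 1.0 / (1.0 / v + n / s2)
post_mean = post_var * (m / v + n * x.mean() / s2)
print(post_mean, post_var)  # precision-weighted compromise
```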

  34. Normal with unknown mean Proper prior:

  35. Structuring prior information • It is possible to structure our prior beliefs in a hierarchical manner: Data model: x | θ; First level of prior: θ | ψ; Second level of prior: ψ. • Here ψ is referred to as the hyperparameter(s).

  36. Structuring prior information • An example of this type of hierarchical structure is a nonparametric regression model. • We want to know about μ, so the other parameters must be integrated out. The other parameters are known as nuisance parameters.

  37. Analytical tractability • The more complexity that is built into your prior and likelihood, the more likely it is that you won’t be able to derive your posterior analytically. • In the 1990s, computational techniques were devised to combat this. • Markov chain Monte Carlo (MCMC) techniques allow us to access our posterior distributions even in complex models.
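As an illustration, here is a minimal random-walk Metropolis sampler targeting the coin-example posterior; the step size and run length are arbitrary choices.

```python
import numpy as np
from scipy.stats import beta, binom

# Random-walk Metropolis targeting the coin-example posterior,
# p(theta | x) proportional to Bin(3; 10, theta) * Beta(theta; 2, 2).
def log_post(t):
    if not 0.0 < t < 1.0:
        return -np.inf
    return binom.logpmf(3, 10, t) + beta.logpdf(t, 2, 2)

rng = np.random.default_rng(0)
theta, draws = 0.5, []
for _ in range(5000):
    prop = theta + rng.normal(0, 0.1)  # symmetric proposal
    if np.log(rng.uniform()) < log_post(prop) - log_post(theta):
        theta = prop                   # accept the move
    draws.append(theta)
print(np.mean(draws))  # ~5/14, the exact Beta(5, 9) posterior mean
```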

  38. Sensitivity analysis • It is clear that the elicitation of prior distributions is far from being a precise science. • A good Bayesian analysis will check that the conclusions are sufficiently robust to changes in the prior. • If they aren’t, we need more data or more agreement on the prior structure.

  39. Summary • Prior distributions are an important part of Bayesian statistics. • They are far from being ad hoc, pick-the-easiest-to-use distributions when modelled properly. • There are classes of noninformative priors that allow us to represent ignorance.

  40. Gaussian processes A Bayesian data modelling technique that fully accounts for uncertainty.

  41. Data modelling: a fully probabilistic method • Bayesian statistics offers a framework to account for uncertainty in data modelling. • In this section, we’ll concentrate on regression using Gaussian processes and the associated Bayesian techniques.

  42. The basic idea We have: y = f(x) + ε or y = f(x), and both f(·) and ε are uncertain. In order to proceed, we must elicit our beliefs about these two. ε can be dealt with as in the previous section.

  43. Gaussian processes A process is Gaussian if and only if every finite sample from the process is a vector-valued Gaussian random variable. • We assume that f(·) follows a Gaussian process a priori. • That is: f(·) ~ GP(m(·), c(·,·)). • i.e. any sample of f(x)’s will follow a multivariate normal distribution.
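A sketch of what this definition means in practice: drawing functions from a GP prior, here assuming a zero mean and a squared-exponential covariance with made-up hyperparameters.

```python
import numpy as np

# Drawing functions from a GP prior: any finite set of inputs gives a
# multivariate normal. Zero mean and a squared-exponential covariance
# with made-up hyperparameters are assumed here.
def cov(x1, x2, var=1.0, ls=1.0):
    return var * np.exp(-0.5 * (x1[:, None] - x2[None, :]) ** 2 / ls**2)

x = np.linspace(0, 4, 100)
K = cov(x, x) + 1e-9 * np.eye(len(x))  # tiny jitter for stability
rng = np.random.default_rng(2)
samples = rng.multivariate_normal(np.zeros(len(x)), K, size=3)
# each row of `samples` is one function drawn from the GP prior
```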

  44. Gaussian processes We have prior beliefs about the form of the underlying model. We observe/experiment to get data about the model, with which we train our GP. We are left with our posterior beliefs about the model, which can have a ‘nice’ form.

  45. A simple example Warning: more audience participation coming up

  46. A simple example • Imagine we have data about some one-dimensional phenomenon. • Also, we’ll assume that there is no observational error. • We’ll start with five data points between 0 and 4. • A priori, we believe f(·) is roughly linear and differentiable everywhere.

  47.–49. A simple example (figures only: the Gaussian process fit to the five data points)
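A minimal sketch of the noise-free GP fit behind these figures, using the standard GP conditioning formulas; the data values, kernel and hyperparameters are stand-ins, not the ones used in the talk.

```python
import numpy as np

# Noise-free GP regression on five points between 0 and 4; the data
# values and kernel settings are stand-ins, not those from the talk.
def cov(x1, x2, var=1.0, ls=1.0):
    return var * np.exp(-0.5 * (x1[:, None] - x2[None, :]) ** 2 / ls**2)

x_obs = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_obs = np.array([0.1, 1.2, 1.9, 3.2, 3.9])  # roughly linear, as assumed
x_new = np.linspace(0, 4, 100)

K = cov(x_obs, x_obs) + 1e-9 * np.eye(len(x_obs))  # jitter only, no noise
K_s = cov(x_new, x_obs)
post_mean = K_s @ np.linalg.solve(K, y_obs)        # interpolates the data
post_cov = cov(x_new, x_new) - K_s @ np.linalg.solve(K, K_s.T)
```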

  50. A simple example with error • Now, we’ll start over and put some Gaussian error on the observations. • Note: in kriging, this is equivalent to adding a nugget effect.
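A self-contained sketch of the same fit with observation error; the noise level is an assumed value. Only the training covariance changes.

```python
import numpy as np

# The same fit with Gaussian observation error: adding the noise
# variance to the diagonal of the training covariance is exactly the
# nugget effect of kriging. sigma_n is an assumed value.
def cov(x1, x2, var=1.0, ls=1.0):
    return var * np.exp(-0.5 * (x1[:, None] - x2[None, :]) ** 2 / ls**2)

x_obs = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_obs = np.array([0.1, 1.2, 1.9, 3.2, 3.9])
x_new = np.linspace(0, 4, 100)
sigma_n = 0.2  # assumed observation-error standard deviation

K = cov(x_obs, x_obs) + sigma_n**2 * np.eye(len(x_obs))  # nugget term
post_mean = cov(x_new, x_obs) @ np.linalg.solve(K, y_obs)
# the posterior mean now smooths the data rather than interpolating
```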
