
Bayes, birds and brains: applications of inference and probabilistic modelling


Presentation Transcript


  1. Bayes, birds and brains: applications of inference and probabilistic modelling Stephen Roberts Pattern Analysis & Machine Learning Research Group University of Oxford http://www.robots.ox.ac.uk/~parg

  2. Introduction • Bayesian inference has a profound impact on the principled handling of uncertainty in practical computation • What this talk aims to do: • Give an overview of Bayesian inference applied to several real-world problem domains • What it does not aim to do: • Give endless equations – these are important and elegant, but they are in the open literature

  3. What’s wrong with sampling? • Nothing – apart from speed and the occasional frequentist way the samples are used… • Not much we can do about speed, though there are lots of clever methods out there which help • Bayesian sampling (using Gaussian processes) • Bayes-Hermite Quadrature [O’Hagan, 1992] • Bayesian Monte Carlo [Rasmussen and Ghahramani, 2003] • Variational Bayes
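
For orientation, the idea behind Bayes-Hermite quadrature and Bayesian Monte Carlo (a standard formulation, not copied from the slide) is to treat the integrand itself as an unknown function with a Gaussian process prior, so the integral becomes a Gaussian random variable whose posterior follows from the GP posterior:

```latex
% Bayesian quadrature / Bayesian Monte Carlo: put a GP prior on the integrand f,
% condition on the sampled evaluations, and read off a Gaussian posterior over Z.
Z = \int f(\theta)\, p(\theta)\, d\theta, \qquad f \sim \mathcal{GP}(0, k)
\qquad\Rightarrow\qquad
\mathbb{E}\!\left[Z \mid f(\theta_{1:n})\right] = \mathbf{z}^{\top} K^{-1} \mathbf{f},
\quad z_i = \int k(\theta, \theta_i)\, p(\theta)\, d\theta
```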

  4. Variational Bayes - 1 Log posterior bounded below by Free Energy
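
The slide's equation is not reproduced in the transcript. The standard bound it refers to (textbook form, with data D, hidden variables and parameters θ, and an approximating distribution q) is:

```latex
% Variational free energy: a lower bound on the log evidence, tight when q matches the posterior
\log p(D) \;\ge\; F[q] \;=\; \int q(\theta)\,\log\frac{p(D,\theta)}{q(\theta)}\,d\theta
          \;=\; \log p(D) \;-\; \mathrm{KL}\!\left[\,q(\theta)\,\big\|\,p(\theta \mid D)\,\right]
```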

  5. Variational Bayes - 2 [Figure: the true joint p(x1,x2) and its factored approximation q(x1)q(x2), plotted over x1 and x2] • A slow and (often) painful derivation leads to an iterative node update for DAGs • This converges to a local optimum – like EM and many other energy minimization approaches – get the priors right!
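
For completeness, the iterative node update referred to above is the standard mean-field result (textbook form, not copied from the slide): with a factored approximation q(θ) = ∏ᵢ qᵢ(θᵢ), the free energy is maximised one factor at a time by

```latex
% Mean-field coordinate update: each factor is the exponentiated expectation of the
% log joint over all other factors, renormalised; cycling over the nodes increases
% F[q] monotonically to a local optimum.
\log q_i^{\ast}(\theta_i) \;=\; \mathbb{E}_{\,\prod_{j \neq i} q_j}\!\bigl[\log p(D, \theta)\bigr] \;+\; \text{const}
```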

  6. Variational Bayes – 3 • Some relief via Variational Message Passing • Same update equations as VB but at a fraction of the pain • Conjugate-exponential family only • Pearl-style message passing on the graphical model using sufficient statistics only • For many applications the factored nature of q degrades performance – need non-factored proposals – extra computation (e.g. some VB models with mixture model nodes)
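
A minimal sketch (not from the talk) of what these iterative node updates look like in practice, for a toy conditionally-conjugate model: x_n ~ N(mu, 1/tau) with independent priors mu ~ N(mu0, s0²) and tau ~ Gamma(a0, b0). The priors, data and hyperparameter values are illustrative assumptions.

```python
# Toy mean-field VB for a Gaussian with unknown mean and precision.  Each factor
# update uses only expected sufficient statistics, as exploited by VMP.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=0.5, size=200)      # synthetic data (assumed)
N, sum_x, sum_x2 = len(x), x.sum(), (x ** 2).sum()

# Priors (illustrative values)
mu0, s0_sq = 0.0, 10.0
a0, b0 = 1e-2, 1e-2

# Initialise the factors q(mu) = N(m, v) and q(tau) = Gamma(a, b)
m, v = 0.0, 1.0
a, b = a0, b0

for _ in range(50):                               # iterate node updates to a local optimum
    E_tau = a / b                                 # <tau> under q(tau)
    v = 1.0 / (1.0 / s0_sq + N * E_tau)           # update q(mu)
    m = v * (mu0 / s0_sq + E_tau * sum_x)

    E_sq_err = sum_x2 - 2.0 * m * sum_x + N * (m ** 2 + v)   # <sum (x_n - mu)^2> under q(mu)
    a = a0 + 0.5 * N                              # update q(tau)
    b = b0 + 0.5 * E_sq_err

print(f"posterior mean of mu  ~ {m:.3f}")
print(f"posterior mean of tau ~ {a / b:.3f}  (true precision 4.0)")
```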

  7. Priors & model selection • Sensitivity to priors • posterior distributions are conjugate with the priors • empirically this can be a problem – know the domain • Model selection • evaluate a set of models by their VFE; rank or integrate • use VFE in a ‘quasi-RJMCMC’ approach • use ridge regression (ARD, weight decay) priors
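
Two standard definitions behind these bullets, stated for reference rather than taken from the slides: the variational free energy of each candidate model approximates its log evidence, so models can be ranked (or averaged) on F, and an ARD / weight-decay prior gives every component its own precision so that unsupported components are driven towards zero:

```latex
% Model selection: rank candidate models by their free energies (approximate log evidences)
F[q; M_k] \;\approx\; \log p(D \mid M_k)
% ARD / ridge-regression style prior: one precision hyperparameter \alpha_i per weight
p(\mathbf{w} \mid \boldsymbol{\alpha}) \;=\; \prod_i \mathcal{N}\!\left(w_i \mid 0, \alpha_i^{-1}\right)
```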

  8. Simple example - ICA • ICA (Bell & Sejnowski, Attias, Amari….) • Bayesian ICA (Roberts 1998, Attias 1999, Miskin & MacKay 2000, Choudrey & Roberts 2000)
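
For readers who want to reproduce the flavour of the ICA examples, here is a minimal, purely illustrative source-separation sketch using scikit-learn's FastICA. It is not the variational Bayesian ICA (vbICA) of the following slides, which additionally infers the number of sources and handles observation noise; the mixing matrix and source signals below are assumptions.

```python
# Minimal (non-Bayesian) ICA illustration with scikit-learn's FastICA -- a stand-in
# for the linear mixing/unmixing setup, not the vbICA model used in the talk.
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
t = np.linspace(0, 8, 2000)
sources = np.column_stack([
    np.sin(2 * t),                       # smooth oscillation
    np.sign(np.sin(3 * t)),              # square wave
    rng.laplace(size=t.size),            # heavy-tailed noise source
])
A = rng.normal(size=(3, 3))              # unknown mixing matrix (assumed)
observations = sources @ A.T             # x = A s

ica = FastICA(n_components=3, random_state=0)
recovered = ica.fit_transform(observations)   # estimated sources (up to scale/permutation)
print("estimated mixing matrix:\n", ica.mixing_)
```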

  9. vbICA – graphical model

  10. vbICA – simple example

  11. How many sources?

  12. vbICA - VFE

  13. vbICA - RJMCMC

  14. vbICA – source suppression

  15. Mixtures of ICAs (Choudrey & Roberts, 2001)

  16. VFE and RR (ARD) work

  17. Example (& a cautionary tale)

  18. … a cautionary tale

  19. Ridge regression…

  20. Variational free energy

  21. Recovered images

  22. A cautionary conclusion… • In high noise regimes use ARD to focus on a small subset of models • These are then investigated in more detail using variational free energy

  23. Priors • If we have prior knowledge regarding the sources or the mixing process, we can use it. • Spatial information • Positive mixing • Positive sources • Structured observations
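
One concrete way the ‘positive mixing / positive sources’ constraint is commonly encoded (an illustration of the idea, not necessarily the exact prior used in the talk) is a rectified-Gaussian prior on the relevant quantity:

```latex
% Rectified-Gaussian prior: Gaussian shape restricted to the non-negative half-line,
% enforcing positivity on a source value or mixing weight a.
p(a) \;\propto\; \mathcal{N}\!\left(a \mid \mu, \sigma^{2}\right)\,\mathbb{I}\!\left[a \ge 0\right]
```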

  24. Positivity

  25. An example

  26. ICA with different priors Which is ‘correct’ though?

  27. Which is ‘correct’?

  28. Epilepsy data

  29. Structure priors • To be an ICA matrix, it must lie on the manifold of decorrelating matrices. These form ‘great circles’ in the matrix space. • Can parameterize using co-ordinates on the manifold. • Where do priors lie?

  30. Gaussian priors Gaussian priors on the mixing process just form great circles – they have little impact if we already compute on the decorrelating manifold, as they are aligned with it.

  31. Structure priors [Figure: potential from a brain source – dipole potential (Knuth)] • Sensor coupling has spatial structure: nearby sensors have similar coupling weights • Gaussian process prior: still gives a great circle in matrix space, but it is very informative as it is not aligned with the decorrelating manifold
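
A small illustrative sketch of how a Gaussian process prior over sensor positions induces spatially smooth coupling weights, i.e. a structured prior on a column of the mixing matrix. The 1-D sensor layout, squared-exponential kernel and hyperparameters are my assumptions; the talk does not specify these details.

```python
# Sketch: GP prior on sensor-coupling weights.  Nearby sensors get correlated weights,
# so samples from this prior are spatially smooth columns of the mixing matrix.
import numpy as np

def squared_exponential(positions, lengthscale=2.0, variance=1.0):
    """Covariance between coupling weights as a function of sensor separation."""
    d = positions[:, None] - positions[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

rng = np.random.default_rng(0)
sensor_positions = np.arange(16, dtype=float)      # hypothetical 1-D sensor array
K = squared_exponential(sensor_positions)          # GP prior covariance over weights
K += 1e-8 * np.eye(len(sensor_positions))          # jitter for numerical stability

# Each draw is one candidate column of the mixing matrix under the structured prior
prior_columns = rng.multivariate_normal(np.zeros(16), K, size=3)
print(prior_columns.round(2))
```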

  32. Phantom head experiments [Figure: results without prior vs. with prior]

  33. Brain-Computer Interfaces ‘direct’ control in real-time using ‘thought’

  34. Motor cortex • When we plan a movement, changes take place in the motor cortex, whether or not the movement takes place. • When we change cognitive task, changes take place in the cortex.

  35. Cursor control – real-time BCI [Figure: cursor-control performance for Bayes with rejection, Bayes, and baseline; dT = 50 ms; max, median and min traces shown]

  36. The curse of feedback [Figure: bits against t (secs)]

  37. Information Engines “If all you have is a hammer, everything looks like a nail.” [Diagram: a long raw bit stream (DATA) feeds an ENGINE (MODEL) that yields INFORMATION, e.g. P(action|data) = 0.95; the parallel labels read potential entropy → machine → useful]

  38. Inside or outside the box? [Diagram: the raw bit stream again] • The inferences we make, and the actions decided upon, have an impact on the data • Learning with changing objectives

  39. Sequential Bayesian inference • Particle filter (SIR) • Humble (variational) Kalman filter: Bayesian inference assuming a generalized (non-)linear Gaussian system • Adaptive system using sequential variational Bayes • BCI application • (Musical score following)
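
A minimal sequential importance resampling (SIR) particle filter sketch, for orientation only. The model – a 1-D random-walk state observed in Gaussian noise – and all parameter values are my assumptions, not those of the BCI system in the talk.

```python
# Minimal SIR (bootstrap) particle filter for a 1-D random-walk state observed in
# Gaussian noise.  Illustrative only; the state/observation models are assumptions.
import numpy as np

rng = np.random.default_rng(0)
T, n_particles = 100, 500
process_sd, obs_sd = 0.1, 0.5

# Simulate a ground-truth trajectory and noisy observations
truth = np.cumsum(rng.normal(0.0, process_sd, size=T))
obs = truth + rng.normal(0.0, obs_sd, size=T)

particles = rng.normal(0.0, 1.0, size=n_particles)
estimates = []
for y in obs:
    # Propagate through the (assumed) random-walk dynamics
    particles = particles + rng.normal(0.0, process_sd, size=n_particles)
    # Weight by the Gaussian observation likelihood
    weights = np.exp(-0.5 * ((y - particles) / obs_sd) ** 2)
    weights /= weights.sum()
    estimates.append(np.sum(weights * particles))      # posterior-mean estimate
    # Resample (the "R" in SIR) to avoid weight degeneracy
    particles = rng.choice(particles, size=n_particles, replace=True, p=weights)

rmse = np.sqrt(np.mean((np.array(estimates) - truth) ** 2))
print(f"filtering RMSE: {rmse:.3f} (observation noise sd {obs_sd})")
```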

  40. Generalised non-linear dynamic classifier – copes with missing inputs & labels, input noise and bit errors on labels, as well as time-delayed information (Penny, Sykacek, Lowne)

  41. Foul stuff…

  42. What it buys us

  43. Thanks to John Gann

  44. PART II: BIRDS

  45. Hidden Markov birds… • Global Positioning System (GPS) • 15 g units • Strapped to the back of the bird • Gives a position every second (Roberts, Guilford, Biro, Lau 2004, 2005, JTB)
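
A toy illustration of the hidden-Markov idea behind the title: filtering a belief over discrete flight behaviours from per-second GPS-derived speeds. All of the states, the emission model and the numbers below are my assumptions made for illustration; the actual models for the pigeon GPS data are in the cited papers.

```python
# Toy hidden Markov filter over two hypothetical flight behaviours ("cruise" vs "loiter"),
# observed through noisy per-second GPS speeds.  Illustrative assumptions throughout.
import numpy as np

states = ["cruise", "loiter"]
transition = np.array([[0.95, 0.05],          # cruise tends to persist
                       [0.10, 0.90]])         # loiter tends to persist
mean_speed = np.array([18.0, 4.0])            # m/s, assumed per-state emission means
speed_sd = 3.0

def emission_likelihood(speed):
    """Gaussian likelihood of an observed GPS speed under each hidden behaviour."""
    return np.exp(-0.5 * ((speed - mean_speed) / speed_sd) ** 2)

rng = np.random.default_rng(0)
observed_speeds = np.concatenate([rng.normal(18, 3, 30), rng.normal(4, 3, 20)])

belief = np.array([0.5, 0.5])                 # prior over behaviours at t = 0
for speed in observed_speeds:
    belief = transition.T @ belief            # predict step
    belief *= emission_likelihood(speed)      # update with the new observation
    belief /= belief.sum()

print({s: round(p, 3) for s, p in zip(states, belief)})
```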
