
MCQMC 2012





  1. From inference to modelling to algorithms and back again. Kerrie Mengersen, QUT Brisbane. MCQMC 2012

  2. Acknowledgements: BRAG. Bayesian methods and models + Fast computation + Applications in environment, health, biology, industry

  3. So what’s the problem? Inferential need → Model → Algorithm

  4. Matchmaking 101: Inferential need ↔ Model

  5. Study 1: Presence/absence models Sama Low Choy Mark Stanaway

  6. Plant biosecurity

  7. Observations and data • Visual inspection symptoms • Presence/absence data • Space and time • Dynamic invasion process: growth, spread • Inference: map the probability of extent over time, at a scale useful for managing trade and eradication • Currently an informal, qualitative approach is used; a hierarchical Bayesian model formalises the information • From inference to model

  8. Hierarchical Bayesian model for plant pest spread • Data Model: Pr(data | incursion process and data parameters): how the data are observed given the underlying pest extent • Process Model: Pr(incursion process | process parameters): potential extent given epidemiology/ecology • Parameter Model: Pr(data and process parameters): prior distributions describing uncertainty in detectability, exposure, growth, … • The posterior distribution of the incursion process (and parameters) is related to the prior distributions and data by: Pr(process, parameters | data) ∝ Pr(data | process, parameters) Pr(process | parameters) Pr(parameters)
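A minimal sketch of slide 8's three-layer decomposition may help fix ideas. Everything here (the site structure, the Beta priors, the binomial detection model) is illustrative, not the talk's actual specification:

```python
# Illustrative data/process/parameter decomposition for a presence/absence
# incursion model. All names and priors are hypothetical.
import numpy as np
from scipy import stats

def log_posterior(z, psi, p_det, y, n_visits):
    """Unnormalised log Pr(process, parameters | data)."""
    # Parameter model: priors on exposure (psi) and detectability (p_det)
    lp = stats.beta.logpdf(psi, 1, 9) + stats.beta.logpdf(p_det, 5, 5)
    # Process model: latent presence z_s at each site given exposure psi
    lp += stats.bernoulli.logpmf(z, psi).sum()
    # Data model: detections y_s out of n_visits, only possible where z_s = 1
    lp += stats.binom.logpmf(y, n_visits, z * p_det).sum()
    return lp

# Toy data: 10 sites, 3 inspections each, pest detected at sites 3 and 8
y = np.array([0, 0, 2, 0, 0, 0, 0, 1, 0, 0])
z = (y > 0).astype(int)  # one configuration of the latent extent
print(log_posterior(z, psi=0.2, p_det=0.6, y=y, n_visits=3))
```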

  9. Early Warning Surveillance • Priors based on emergency plant pest characteristics • exposure rate for colonisation probability • spread rates to link sites together for spatial analysis • Add surveillance data • Posterior evaluation • modest reduction in area freedom • large reduction in estimated extent • residual “risk” maps to target surveillance

  10. Observation parameter estimates, taking the invasion process into account: hosts, host suitability, inspector efficiency; identify their contributions

  11. Study 2: Mixture models Clair Alston

  12. CAT scanning sheep

  13. From inference to model What proportions of the sheep carcase are muscle, fat and bone? • Finite mixture model: y_i ~ Σ_j λ_j N(μ_j, σ_j²) • Include spatial information
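As a rough illustration of the finite mixture on slide 13, here is an EM fit with scikit-learn on simulated voxel intensities. The intensity values are made up, and the spatial information mentioned on the slide is ignored; this is only the non-spatial skeleton:

```python
# Rough illustration of y_i ~ sum_j lambda_j N(mu_j, sigma_j^2) for CT
# voxel intensities. Simulated, made-up intensity values; no spatial term.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
voxels = np.concatenate([
    rng.normal(-80, 20, 5000),   # fat
    rng.normal(40, 15, 8000),    # muscle
    rng.normal(700, 150, 2000),  # bone
]).reshape(-1, 1)

gmm = GaussianMixture(n_components=3, random_state=0).fit(voxels)
print("weights (lambda_j):", gmm.weights_.round(3))
print("means   (mu_j):    ", gmm.means_.ravel().round(1))
# Tissue proportions follow directly from the fitted weights
```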

  14. Inside a sheep

  15. Inside a sheep

  16. Study 3: State space models Nicole White

  17. Parkinson’s Disease

  18. PD symptom data • Current methods for PD subtype classification rely on a few criteria and do not permit uncertainty in subgroup membership. • Alternative: finite mixture model (equivalent to a latent class analysis for multivariate categorical outcomes) • Symptom data: duration of diagnosis, early-onset PD, gender, handedness, side of onset

  19. From inference to model y_ij: subject i’s response to item j 1. Define a finite mixture model based on patient responses to Bernoulli and multinomial questions. 2. Describe subgroups w.r.t. explanatory variables. 3. Obtain each patient’s probability of class membership.
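A bare-bones sketch of the latent class idea on slide 19, as an EM fit of a Bernoulli mixture in numpy. The actual analysis mixed Bernoulli and multinomial items and was fully Bayesian, so this is only the skeleton:

```python
# EM for a Bernoulli mixture (latent class model for binary symptom items).
# Purely illustrative; shapes and priors are hypothetical.
import numpy as np

def bernoulli_mixture_em(Y, K, n_iter=200, seed=0):
    rng = np.random.default_rng(seed)
    n, J = Y.shape
    pi = np.full(K, 1.0 / K)                  # class weights
    theta = rng.uniform(0.25, 0.75, (K, J))   # item probabilities per class
    for _ in range(n_iter):
        # E-step: posterior probability of class membership per patient
        log_r = np.log(pi) + Y @ np.log(theta).T + (1 - Y) @ np.log(1 - theta).T
        r = np.exp(log_r - log_r.max(axis=1, keepdims=True))
        r /= r.sum(axis=1, keepdims=True)
        # M-step: update weights and item probabilities
        pi = r.mean(axis=0)
        theta = np.clip((r.T @ Y) / r.sum(axis=0)[:, None], 1e-6, 1 - 1e-6)
    return pi, theta, r  # r holds each patient's membership probabilities

Y = (np.random.default_rng(1).random((300, 5)) < 0.5).astype(float)
pi, theta, r = bernoulli_mixture_em(Y, K=2)
print(pi.round(3))
```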

  20. PD: Symptom data

  21. PD signal data: “How will they respond?”

  22. Inferential aims • Identify spikes and assign them to an unknown number of source neurons • Compare clusters between segments within a recording, and between recordings at different locations of the brain (3 depths)

  23. Microelectrode recordings Each recording was divided into 2.5 sec segments. Discriminating features found via PCA.
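The feature step on slide 23 might look like the following sketch; the waveform matrix shape and component count are illustrative, not the study's actual settings:

```python
# Stack the spike waveforms from one 2.5 s segment and project onto the
# first few principal components. Simulated, hypothetical shapes.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
waveforms = rng.normal(size=(500, 48))  # 500 detected spikes, 48 samples each

pca = PCA(n_components=3)
features = pca.fit_transform(waveforms)  # y_i = (y_i1, .., y_iP), P = 3
print("variance explained:", pca.explained_variance_ratio_.round(3))
# `features` then feeds the Dirichlet process mixture on the next slide
```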

  24. From inference to model DP Model: y_i | θ_i ~ p(y_i | θ_i); θ_i ~ G; G ~ DP(α, G_0). For P PCs, y_i = (y_i1, .., y_iP) ~ MVN(μ, Σ); G_0 = p(μ | Σ) p(Σ); α ~ Ga(2, 2)
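One convenient stand-in for fitting the DP mixture on slide 24, without writing a sampler, is scikit-learn's truncated variational approximation to a Dirichlet process Gaussian mixture. This is not the talk's implementation (and it fixes the concentration rather than giving it the Ga(2,2) prior), but it shares the generative structure:

```python
# Variational DP Gaussian mixture on the PCA scores, reusing `features`
# from the previous sketch. A stand-in, not the talk's MCMC fit.
from sklearn.mixture import BayesianGaussianMixture

dpgmm = BayesianGaussianMixture(
    n_components=20,                              # truncation level
    weight_concentration_prior_type="dirichlet_process",
    weight_concentration_prior=2.0,               # plays the role of alpha
    covariance_type="full",
).fit(features)

labels = dpgmm.predict(features)                  # spike-to-neuron assignments
print("occupied clusters:", len(set(labels)))
```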

  25. Average waveforms

  26. Study 4: Spatial dynamic factor models Chris Strickland, Ian Turner What can we learn about land use from MODIS data?

  27. Differentiating land use with the SDFM • The 1st factor influences the temporal dynamics in the right half of the image (woodlands) • The 3rd factor influences the left half of the image (grasslands) (Figure panels: 1st trend component; 2nd trend component; common cyclical component)

  28. Matchmaking 101: Inferential need ↔ Model, via smart models

  29. Smart models Tailoring Generalisation Blocking Reparametrisation Reformulation

  30. Example 1: Generalisation Judith Rousseau Mixtures are great, but how do we choose k? Propose an overfitted model (k > k0) for the true density f_0(x) = Σ_{j=1..k0} p_j g_{γ_j}(x). Non-identifiable! All values θ = (p_1^0, .., p_{k0}^0, 0, γ_1^0, .., γ_{k0}^0) and all values θ = (p_1^0, .., p_j, .., p_{k0}^0, p_{k+1}, γ_1^0, .., γ_{k0}^0, γ_j^0) with p_j + p_{k+1} = p_j^0 fit equally well.

  31. So what? • The multiplicity of possible solutions means the MLE does not have stable asymptotic behaviour. • Not important when f_θ is the main object of interest, but important if we want to recover θ. • It thus becomes crucial to know whether the posterior distribution under overfitted mixtures gives interpretable results.

  32. Possible alternatives to avoid overfitting Frühwirth-Schnatter (2006): in the overfitted model, either one of the component weights is zero or two of the component parameters are equal. • Choose priors that bound the posterior away from the unidentifiability sets. • Choose priors that induce shrinkage for elements of the component parameters. Problem: we may then not be able to fit the true model.

  33. Our result Assumptions: • L1 consistency of the posterior • The component model g is three times differentiable, regular, and integrable • The prior on Θ is continuous and positive, and the prior on (p_1, .., p_k) satisfies π(p) ∝ p_1^(a_1−1) … p_k^(a_k−1)

  34. Our result - 1 • If max_j(a_j) < d/2, where d = dim(γ), then asymptotically the posterior π(θ | x) concentrates on the subset of parameters for which f_θ = f_0, so k − k0 components have weight 0. • The reason for this stable behaviour, as opposed to the unstable behaviour of the maximum likelihood estimator, is that integrating out the parameters acts as a penalisation: the posterior essentially puts its mass on the sparsest way to approximate the true density.

  35. Our result - 2 • In contrast, if min(a_j, j ≤ k) > d/2 and k > k0, then two or more components will tend to merge, each with non-negligible weight. This leads to less stable behaviour. • In the intermediate case, min(a_j, j ≤ k) ≤ d/2 ≤ max(a_j, j ≤ k), the situation varies depending on the a_j’s and on the difference between k and k0.

  36. Implications: Model dimension • When d/2 > max{a_j, j = 1, .., k}, the quantity d·k0 + k0 − 1 + Σ_{j ≥ k0+1} a_j appears as an effective dimension of the model • This differs from the number of parameters, d·k + k − 1, and from other “effective numbers of parameters” • Similar results are obtained for other situations

  37. Example 1 y_i ~ N(0,1); fit p N(μ_1, 1) + (1 − p) N(μ_2, 1); a_i = 1 > d/2

  38. Example 2 y_i ~ N(0,1); fit G = p N_2(μ_1, Σ_1) + (1 − p) N_2(μ_2, Σ_2), Σ_j diagonal; d = 3; a_1 = a_2 = 1 < d/2
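A quick numerical check of the message of these two examples, via a bare-bones Gibbs sampler for the univariate case: data from N(0,1), an overfitted two-component fit, Beta(a, a) prior on the weight. With a < d/2 = 0.5 the extra weight should concentrate near zero; with a > 0.5 the components tend to merge. Purely illustrative, with hypothetical priors on the means:

```python
# Gibbs sampler for p N(mu1,1) + (1-p) N(mu2,1) fitted to N(0,1) data,
# weight prior p ~ Beta(a, a), means mu_j ~ N(0, 100). Illustrative only.
import numpy as np

def gibbs_overfitted(y, a, n_iter=5000, seed=0):
    rng = np.random.default_rng(seed)
    n = len(y)
    p, mu = 0.5, np.array([-1.0, 1.0])
    smaller_weight = []
    for _ in range(n_iter):
        # allocate each observation to one of the two components
        logw = np.log([p, 1 - p]) - 0.5 * (y[:, None] - mu) ** 2
        prob = np.exp(logw - logw.max(axis=1, keepdims=True))
        prob /= prob.sum(axis=1, keepdims=True)
        z = rng.random(n) < prob[:, 0]          # True -> component 1
        # conjugate updates: Beta for the weight, normal for the means
        p = rng.beta(a + z.sum(), a + n - z.sum())
        for j, mask in enumerate([z, ~z]):
            post_var = 1.0 / (mask.sum() + 0.01)
            mu[j] = rng.normal(post_var * y[mask].sum(), np.sqrt(post_var))
        smaller_weight.append(min(p, 1 - p))
    return np.array(smaller_weight[1000:])      # drop burn-in

y = np.random.default_rng(1).normal(size=500)
for a in (0.3, 1.0):
    w = gibbs_overfitted(y, a)
    print(f"a={a}: posterior mean of the smaller weight = {w.mean():.3f}")
```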

  39. Conclusions • The result validates the use of Bayesian estimation in mixture models with too many components. • It is one of the few examples where the prior can have an impact asymptotically, even at first order (consistency), and where choosing a less informative prior leads to better results. • It also shows that the penalisation effect of integrating out the parameters, inherent in the Bayesian framework, is useful not only in model choice or testing contexts but also in estimation contexts.

  40. Example 2: Empirical likelihoods Christian Robert • Sometimes the likelihood associated with the data is not completely known or cannot be computed in manageable time (e.g. population genetic models, hidden Markov models, dynamic models), so traditional tools based on stochastic simulation (e.g. regular MCMC) are unavailable or unreliable. • Example: the biosecurity spread model above.

  41. Model alternative: ELvIS • Define parameters of interest as functionals of the cdf F (e.g. moments of F), then use Importance Sampling via the Empirical Likelihood. • Select the F that maximises the likelihood of the data under the moment constraint. • Given a constraint of the form E_F(h(Y)) = θ, the EL is defined as L_el(θ | y) = max_F Π_{i=1:n} {F(y_i) − F(y_{i−1})} • For example, in the 1-D case with θ = E(Y), the empirical likelihood in θ is the maximum of Π_{i=1:n} p_i under the constraints Σ_{i=1:n} p_i = 1 and Σ_{i=1:n} p_i y_i = θ
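For the 1-D mean constraint on slide 41, the empirical likelihood has a standard Lagrange-multiplier solution, p_i = 1 / (n(1 + λ(y_i − θ))), with λ solving a one-dimensional score equation. A sketch under those textbook formulas:

```python
# Empirical likelihood for a mean constraint E_F(Y) = theta.
import numpy as np
from scipy.optimize import brentq

def log_el(theta, y):
    u = y - theta
    if u.min() >= 0 or u.max() <= 0:
        return -np.inf  # theta outside the convex hull of the data
    score = lambda lam: np.sum(u / (1 + lam * u))
    # lam must keep every weight positive: 1 + lam * u_i > 0 for all i
    lo = (-1 + 1e-10) / u.max()
    hi = (-1 + 1e-10) / u.min()
    lam = brentq(score, lo, hi)
    p = 1 / (len(y) * (1 + lam * u))
    return np.sum(np.log(p))

y = np.random.default_rng(0).normal(loc=3.0, size=100)
for theta in (2.5, 3.0, 3.5):
    print(theta, round(log_el(theta, y), 2))  # peaks near the true mean
```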

  42. Quantile distributions • A quantile distribution is defined by a closed-form quantile function F^(−1)(p; θ) and generally has no closed form for the density function. • Properties: very flexible, very fast to simulate (simple inversion of the uniform distribution). • Examples: 3/4/5-parameter Tukey’s lambda distribution and generalisations; the Burr family; the g-and-k and g-and-h distributions.
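Since the quantile function is closed form, simulation is just Q(U) for U ~ Uniform(0,1). A sketch for the g-and-k, assuming the common parameterisation with c = 0.8; the parameter values echo slide 45's Allingham choice:

```python
# Simulate a g-and-k distribution by inversion of uniform draws.
import numpy as np
from scipy.stats import norm

def gk_quantile(u, A, B, g, k, c=0.8):
    z = norm.ppf(u)
    # tanh(g*z/2) == (1 - exp(-g*z)) / (1 + exp(-g*z))
    return A + B * (1 + c * np.tanh(g * z / 2)) * (1 + z**2) ** k * z

rng = np.random.default_rng(0)
u = rng.random(10_000)
x = gk_quantile(u, A=3, B=2, g=1, k=0.5)   # theta = (3, 2, 1, 0.5)
print(x.mean(), x.std())
```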

  43. g-and-k quantile distribution

  44. Methods for estimating a quantile distribution • MLE using a numerical approximation to the likelihood • Moment matching • Generalised bootstrap • Location- and scale-free functionals • Percentile matching • Quantile matching • ABC • Sequential MC approaches for multivariate extensions of the g-and-k

  45. ELvIS in practice • Two values of θ = (A, B, g, k): θ = (0, 1, 0, 0), the standard normal distribution; θ = (3, 2, 1, 0.5), Allingham’s choice • Two priors for θ: U(0,5)^4; and A ~ U(−5,5), B ~ U(0,5), g ~ U(−5,5), k ~ U(−1,1) • Two sample sizes: n = 100 and n = 1000

  46. ELvIS in practice: θ = (3, 2, 1, 0.5), n = 100

  47. Matchmaking 101: Model ↔ Algorithm

  48. A wealth of algorithms! MC, MCMC, IS, SMC, ABC, QMC, VB

  49. From model to algorithm Chris Strickland Models: • Logistic regression • Non-Gaussian state space models • Spatial dynamic factor models Evaluate: • Computation time • Maximum bias • Standard deviation (sd) • Inefficiency factor (IF) • Accuracy rate
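Of the criteria on slide 49, the inefficiency factor is the least standard to compute. A simple sketch: IF = 1 + 2 × the sum of lag autocorrelations, here truncated at the first negative estimate, which is one common rule among several:

```python
# Inefficiency factor (integrated autocorrelation time) of an MCMC chain.
# IF = 1 means i.i.d.-quality draws; larger IF means fewer effective draws.
import numpy as np

def inefficiency_factor(draws, max_lag=200):
    x = draws - draws.mean()
    acf = np.correlate(x, x, mode="full")[len(x) - 1:]
    acf /= acf[0]
    total = 0.0
    for lag in range(1, min(max_lag, len(x) - 1)):
        if acf[lag] < 0:          # truncate at the first negative estimate
            break
        total += acf[lag]
    return 1 + 2 * total

# AR(1) chains: higher persistence -> larger IF
rng = np.random.default_rng(0)
for rho in (0.0, 0.5, 0.9):
    z = np.empty(20_000)
    z[0] = 0.0
    for t in range(1, len(z)):
        z[t] = rho * z[t - 1] + rng.normal()
    print(f"rho={rho}: IF ~ {inefficiency_factor(z):.1f}")
```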
