Deciding, Estimating, Computing, Checking

Presentation Transcript


  1. Deciding, Estimating, Computing, Checking How are Bayesian posteriors used, computed and validated?

  2. Fundamentalist Bayes: The posterior is ALL knowledge you have about the state • Use in decision making: take the action maximizing your utility. You must know the cost of deciding the state is A when it is B (engaging a target as Bomber when it is Civilian, as Civilian when it is Bomber, or waiting for more data) • Estimation: cost of deciding the state is x′ when it is x

  3. Maximum expected utility decision
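
A minimal sketch of this rule on a finite state space, using the Bomber/Civilian example from the previous slide; the posterior and the utility numbers here are made up for illustration:

  % Maximum expected utility on a finite state space (illustrative numbers).
  post = [0.3 0.7];             % posterior P(Bomber), P(Civilian)
  % U(action, state): rows = engage, wait; columns = Bomber, Civilian
  U  = [ 10 -100 ;              % engage: hit bomber vs hit civilian
         -5    0 ];             % wait:   bomber gets through vs fine
  EU = U * post';               % expected utility of each action
  [~, act] = max(EU);           % act = index of the optimal action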

  4. Estimating the state

  5. Loss functions • L(x,y)=(x-y)^2, squared error, optimal estimator is the mean • L(x,y)=|x-y|, absolute error, optimal estimator is the median • L(x,y)=-δ(x-y), Dirac delta loss, optimal estimator is the mode

  6. Loss functions (figure; see HW 2)

  7. Loss functions, pros and cons • Mean: easy to compute, necessary for estimating probabilities, sensitive to outliers • Median: robust, scale-invariant, only applicable in 1D • Mode (Maximum A Posteriori): necessary for a discrete unordered state space, very non-robust otherwise
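
As a concrete illustration of the three optimal estimators above, a small sketch on a discretized posterior (the grid and the posterior shape are made up):

  % Optimal point estimates from a discretized posterior.
  x    = linspace(-5, 5, 1001);                 % grid over the state space
  post = exp(-0.5*(x-1).^2) + 0.5*exp(-2*(x+2).^2);
  post = post / sum(post);                      % normalize
  xmean = sum(x .* post);                       % minimizes squared error
  cdf   = cumsum(post);
  xmed  = x(find(cdf >= 0.5, 1));               % minimizes absolute error
  [~, i] = max(post);  xmode = x(i);            % minimizes Dirac (0-1) loss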

  8. Computing Posteriors • Finite state space: easy • Discretized state space: easy, post=prior.*likelihood; post=post/sum(post) • Analytical prior conjugate wrt likelihood: easy • High-dimensional state space (e.g. a 3D image): difficult, MCMC
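
Spelled out, the discretized recipe above (with the normalization written as a division; the grid, prior and likelihood are illustrative):

  % Discretized-state posterior: multiply prior by likelihood, renormalize.
  x          = linspace(0, 1, 101);        % discretized state space
  prior      = ones(size(x)) / numel(x);   % flat prior on the grid
  likelihood = x.^7 .* (1-x).^3;           % e.g. 7 successes, 3 failures
  post       = prior .* likelihood;
  post       = post / sum(post);           % normalize to sum to one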

  9. Conjugate families • Normal prior N(mu,s2) • Normal likelihood N(mu',s2') • Then the posterior is normal N(mup,s2p), where (x-mu)^2/s2 + (x-mu')^2/s2' = (x-mup)^2/s2p + c • i.e., 1/s2 + 1/s2' = 1/s2p and mu/s2 + mu'/s2' = mup/s2p • Unknown variances are more difficult …
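
A quick numerical check of these two update identities (the prior and likelihood parameters are made up):

  % Normal-normal conjugate update, matching the identities above.
  mu  = 0;  s2  = 4;                 % prior N(mu, s2)
  mu2 = 3;  s22 = 1;                 % likelihood N(mu', s2')
  s2p = 1 / (1/s2 + 1/s22);          % 1/s2p = 1/s2 + 1/s2'
  mup = s2p * (mu/s2 + mu2/s22);     % mup/s2p = mu/s2 + mu'/s2'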

  10. Conjugate families • Beta conjugate wrt Bernoulli trials • Dirichlet conjugate wrt discrete • Wishart conjugate wrt multivariate normal (precision) • Fairly complete table in Wikipedia
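
For instance, the Beta-Bernoulli pair above reduces to adding observed counts to the prior parameters (the data here are made up):

  % Beta prior, Bernoulli trials: posterior parameters are prior + counts.
  a = 1;  b = 1;                     % Beta(1,1) prior, i.e. uniform
  data = [1 0 1 1 0 1 1];            % observed Bernoulli trials
  ap = a + sum(data);                % add successes to a
  bp = b + sum(1 - data);            % add failures to b
  pmean = ap / (ap + bp);            % posterior mean of Beta(ap,bp)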

  11. Wikipedia on conjugate distributions

  12. Markov Chain Monte Carlo

  13. Markov Chain Monte Carlo

  14. MCMC and mixing (figure: target distribution; proposals that are too small, well-sized, and too large)
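
A minimal random-walk Metropolis sketch of the idea; the bimodal target is made up, and prop is the proposal standard deviation whose size the figure panels vary:

  % Random-walk Metropolis: propose a local move, accept with the
  % Metropolis ratio; a too-small or too-large prop gives poor mixing.
  logp = @(x) log(exp(-0.5*(x-2).^2) + exp(-0.5*(x+2).^2));  % bimodal target
  N = 5000;  x = zeros(N,1);  prop = 1.0;   % proposal std
  for t = 2:N
      xc = x(t-1) + prop*randn;             % proposed move
      if log(rand) < logp(xc) - logp(x(t-1))
          x(t) = xc;                        % accept
      else
          x(t) = x(t-1);                    % reject, stay put
      end
  end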

  15. Testing and Cournot’s Principle • Standard Bayesian analysis does not reject a model: it selects the best of those considered. • Cournot’s principle: an event with small probability, singled out in advance, will not happen. • Assume a model M for an experiment and a low-probability event R in the result data. • Perform the experiment. If R happened, something was wrong, and the assumed model M is the obvious suspect. • Thus, the assumption that M was right is rejected.

  16. Test statistic • Define the model to test, the null hypothesis H • Define a real-valued function t(D) on the data space • Find the distribution of t(D) induced by H • Define a rejection region R such that P(t(D) ∈ R) is low (1% or 5%) • R is typically the tails of the distribution, t(D)<l or t(D)>u, where [l,u] is a high-probability interval • If t(D) falls in the rejection region, the null hypothesis H is rejected at significance level P(t(D) ∈ R) (1% or 5%)
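
When the distribution of t(D) under H is not available in closed form, it can be simulated; a sketch with illustrative choices (H: n=20 samples from N(0,1), t = sample mean):

  % Rejection region by simulation under the null hypothesis H.
  nsim = 10000;  ts = zeros(nsim,1);
  for k = 1:nsim
      ts(k) = mean(randn(20,1));     % t(D) for data drawn under H
  end
  ts = sort(ts);
  l = ts(250);  u = ts(9750);        % middle 95%: reject if t<l or t>u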

  17. Kolmogorov-Smirnov test Is the sample from a given distribution? The test statistic d is the max deviation of the empirical cumulative distribution from the theoretical one. If d*sqrt(n) > 2.5, the sample is (probably) not from the target distribution.

  18. Kolmogorov-Smirnov test
  >> rn = randn(10,1);               % sample of size n = 10
  >> jj = (1:10)'/10;                % empirical CDF at the sorted sample
  >> d = max(abs(jj - normcdf(sort(rn))));  % max deviation from the N(0,1)
  >> d*sqrt(10)                      % CDF; compare with the 2.5 threshold

  19. Kolmogorov-Smirnov test

  20. Kolmogorov-Smirnov test

  21. Combining Bayesian and frequentist inference • Posterior for the parameter • Generating a testing set by replication from the posterior predictive distribution (Gelman et al., 2003)

  22. Graphical posterior predictive model checking takes first place in an authoritative book. The left column is a 0-1 coding of a logistic regression of six subjects’ responses (rows) to stimuli (columns). Replications using the posterior and likelihood distribution fill the right six columns. There is clear micro-structure in the left column that is not present in the right ones. Thus, the fitting appears to have been done with an inappropriate (invalid) model.

  23. Cumulative counts of real coal-mining disasters (lower red curve), compared with 100 scenarios of the same number of simulated disasters occurring at random: the real data cannot reasonably have been produced by a constant-intensity process.
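
A sketch of the simulation behind this check; the real disaster dates are not in the transcript, so a clearly synthetic stand-in (with a rate drop mid-window) takes their place, and the window is the usual span of the coal-mining data set:

  % Compare real cumulative counts with constant-intensity scenarios.
  T0 = 1851;  T1 = 1962;                  % observation window
  % Stand-in for the real disaster dates (synthetic, rate drops mid-window):
  times = [T0 + 40*rand(120,1); T0 + 40 + 71*rand(40,1)];
  n = numel(times);
  stairs(sort(times), (1:n)', 'r', 'LineWidth', 2); hold on
  for k = 1:100                           % scenarios with the same count
      s = T0 + (T1 - T0)*rand(n,1);       % uniformly random disaster times
      stairs(sort(s), (1:n)', 'Color', [0.7 0.7 0.7]);
  end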

  24. The useful concept of p-value
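
In short: the p-value of an observed statistic is the probability, under the null hypothesis H, of seeing a value at least as extreme. A one-line sketch for a two-sided z-test (the observed value is made up; normcdf is from the MATLAB statistics toolbox):

  z_obs = 2.1;                          % observed test statistic
  p = 2*(1 - normcdf(abs(z_obs)));      % two-sided p-value, here about 0.036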

  25. Multiple testing • The probability of rejecting a true null hypothesis at the 99% level is 1%. • Thus, if you repeat the test 100 times, each time with new data, you will reject at least once with probability 1 - 0.99^100 ≈ 0.63. • Bonferroni correction, FWE control: in order to reach significance level 1% in an experiment involving 1000 tests, each test should be checked at significance 1%/1000 = 0.001%.

  26. Fiducial Inference R. A. Fisher (1890-1962). In his paper Inverse Probability, he rejected Bayesian analysis on grounds of its dependency on priors and scaling. He launched an alternative concept, 'fiducial analysis'. Although this concept was not much developed after Fisher's time, the standard definition of confidence intervals has a similar flavor. The fiducial argument was apparently the starting point for Dempster in developing evidence theory.

  27. Fiducial inference • Fiducial inference is fairly undeveloped, and also controversial. It is similar in idea to Neyman’s confidence interval, which is used a lot despite philosophical problems and a lack of general understanding. • The objective is to find a region in which a distribution’s parameters lie, with confidence c. • The region is given by an algorithm: if the stated probabilistic assumptions hold, the region contains the parameters with probability c. • However, this holds before the data have been seen, and the estimator is not a sufficient statistic. Somewhat scruffy.

  28. Hedged prediction scheme, Vovk/Gammerman • Given a sequence z1=(x1,y1), z2=(x2,y2), … zn=(xn,yn) AND a new x(n+1), predict y(n+1) • xi is typically a (high-dimensional) feature vector • yi is discrete (classification) or real (regression) • Predict y(n+1) ∈ Y with (say) 95% confidence, or • Predict y(n+1) precisely and state the confidence (classification only) • Predict y(n+1) giving the sequence ‘maximum randomness’, using a computable approximation to Kolmogorov randomness • Can be based on the SVM method
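
A minimal conformal-prediction-style sketch of the 95%-confidence set for classification, with a nearest-neighbour nonconformity score standing in for the SVM-based one; the data and the score are illustrative:

  % Hedged prediction set: include every label whose conformal p-value
  % exceeds 0.05; score = distance to the nearest same-class example.
  X = [randn(20,2); randn(20,2)+3];   % toy features, two clusters
  y = [ones(20,1); 2*ones(20,1)];     % labels
  xnew = [1.5 1.5];                   % new object x(n+1)
  pred = [];
  for lab = [1 2]                     % try each candidate label for y(n+1)
      Xa = [X; xnew];  ya = [y; lab];  n = numel(ya);
      a = zeros(n,1);
      for i = 1:n
          d = sqrt(sum((Xa - Xa(i,:)).^2, 2));  % distances from example i
          d(i) = inf;                           % exclude self
          a(i) = min(d(ya == ya(i)));           % nearest same-class distance
      end
      p = mean(a >= a(n));            % fraction at least as nonconforming
      if p > 0.05, pred = [pred lab]; end
  end
  pred                                % the 95% prediction set for y(n+1)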
