Deciding, Estimating, Computing, Checking



### Deciding, Estimating, Computing, Checking

How are Bayesian posteriors used, computed, and validated?

Fundamentalist Bayes: the posterior is ALL the knowledge you have about the state.
• Use in decision making: take the action maximizing your utility. You must know the cost of deciding the state is A when it is B (engaging a target as Bomber when it is Civilian, as Civilian when it is Bomber, or waiting for more information).
• Estimation: cost of deciding the state is θ′ when it is θ
Loss functions
• L(x,y) = (x-y)^2, squared error; optimal estimator is the mean
• L(x,y) = |x-y|, absolute error; optimal estimator is the median
• L(x,y) = -δ(x-y), Dirac delta loss; optimal estimator is the mode
Loss functions: pros and cons
• Mean: easy to compute, necessary for estimating probabilities, sensitive to outliers
• Median: robust, scale-invariant, only applicable in 1D
• Mode (Maximum A Posteriori): necessary for discrete unordered state spaces, very non-robust otherwise
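The mean/median correspondence with squared/absolute error can be checked empirically. A minimal Python sketch (the slides use MATLAB; the skewed lognormal "posterior" here is an illustrative assumption, not from the slides):

```python
import numpy as np

# Samples from a skewed stand-in posterior (lognormal, chosen arbitrarily).
rng = np.random.default_rng(0)
samples = rng.lognormal(mean=0.0, sigma=1.0, size=20_000)

# Scan candidate point estimates and record each loss.
candidates = np.linspace(0.1, 5.0, 500)
sq_loss = [np.mean((samples - c) ** 2) for c in candidates]
abs_loss = [np.mean(np.abs(samples - c)) for c in candidates]

best_sq = candidates[np.argmin(sq_loss)]    # minimizer of squared error
best_abs = candidates[np.argmin(abs_loss)]  # minimizer of absolute error

print(best_sq, np.mean(samples))    # squared-error minimizer tracks the mean
print(best_abs, np.median(samples)) # absolute-error minimizer tracks the median
```

For this skewed distribution the two minimizers differ clearly (mean ≈ e^0.5, median ≈ 1), which is exactly the outlier sensitivity noted above.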
Computing Posteriors
• Finite state space: easy
• Discretized state space: easy
  post = prior.*likelihood; post = post/sum(post)
• Analytical prior conjugate wrt likelihood: easy
• High-dimensional state space (e.g., a 3D image): difficult; use MCMC
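The discretized case in Python rather than the slides' MATLAB: multiply prior by likelihood on a grid and normalize. The prior N(0,1), observation y = 1.0, and noise sd 0.5 are illustrative choices, not from the slides.

```python
import numpy as np
from scipy.stats import norm

x = np.linspace(-5, 5, 1001)             # discretized state space
dx = x[1] - x[0]
prior = norm.pdf(x, loc=0.0, scale=1.0)  # N(0,1) prior
likelihood = norm.pdf(1.0, loc=x, scale=0.5)  # likelihood of y=1.0 at each state

post = prior * likelihood                # Bayes' rule up to a constant
post /= post.sum() * dx                  # normalize (the slide's sum step)

# For this conjugate case the exact posterior mean is
# (0/1 + 1.0/0.25) / (1/1 + 1/0.25) = 0.8, so the grid should agree.
post_mean = np.sum(x * post) * dx
print(post_mean)
```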
Conjugate families
• Normal prior N(mu,s2)
• Normal likelihood N(mu’,s2’)
• Then the posterior is normal N(mup,s2p), where (x-mu)^2/s2 + (x-mu')^2/s2' = (x-mup)^2/s2p + c
• i.e., 1/s2 + 1/s2' = 1/s2p and mu/s2 + mu'/s2' = mup/s2p
• Unknown variances are more difficult …
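The two precision-weighting identities above translate directly into code. A sketch with illustrative numbers (prior N(0,1), likelihood N(1, 0.25)):

```python
def normal_update(mu, s2, mu2, s22):
    """Normal-normal conjugate update: precisions add,
    and means combine weighted by precision."""
    s2p = 1.0 / (1.0 / s2 + 1.0 / s22)   # 1/s2p = 1/s2 + 1/s2'
    mup = s2p * (mu / s2 + mu2 / s22)    # mup/s2p = mu/s2 + mu'/s2'
    return mup, s2p

mup, s2p = normal_update(0.0, 1.0, 1.0, 0.25)
print(mup, s2p)  # 0.8 0.2
```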
Conjugate families
• Beta conjugate wrt Bernoulli trials
• Dirichlet conjugate wrt discrete
• Wishart conjugate wrt the multivariate normal (as a prior on the precision matrix)
• Fairly complete table in Wikipedia
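The Beta-Bernoulli pair is the simplest entry in that table: a Beta(a,b) prior updated with k successes in n trials gives a Beta(a+k, b+n-k) posterior. A sketch with illustrative counts:

```python
a, b = 1.0, 1.0          # Beta(1,1) = uniform prior
k, n = 7, 10             # observed: 7 successes in 10 Bernoulli trials

a_post, b_post = a + k, b + (n - k)       # conjugate update
post_mean = a_post / (a_post + b_post)    # Beta mean a/(a+b)
print(a_post, b_post, post_mean)          # 8.0 4.0 0.666...
```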
MCMC and mixing

[Figure: Metropolis traces against the target distribution for a small, a good, and a large proposal width]
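The proposal-width effect can be reproduced with a toy random-walk Metropolis sampler on a standard normal target (the widths 0.05, 2.4, and 50 are illustrative choices): tiny steps are almost always accepted but explore slowly, huge steps are almost never accepted, and an intermediate width mixes well.

```python
import numpy as np

def metropolis(width, n=20_000, seed=1):
    """Random-walk Metropolis on a N(0,1) target; returns chain and acceptance rate."""
    rng = np.random.default_rng(seed)
    x, chain, accepted = 0.0, [], 0
    for _ in range(n):
        prop = x + width * rng.standard_normal()
        # log acceptance ratio for target N(0,1): log pi(prop) - log pi(x)
        if np.log(rng.random()) < 0.5 * (x * x - prop * prop):
            x, accepted = prop, accepted + 1
        chain.append(x)
    return np.array(chain), accepted / n

_, acc_small = metropolis(0.05)   # small proposal: acceptance near 1, slow mixing
_, acc_good  = metropolis(2.4)    # good proposal: moderate acceptance
_, acc_large = metropolis(50.0)   # large proposal: acceptance near 0
print(acc_small, acc_good, acc_large)
```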

Testing and Cournot’s Principle
• Standard Bayesian analysis does not reject a model: it selects the best of those considered.
• Cournot's principle: an event with small probability will not happen.
• Assume a model M for an experiment, and pick an event R that has low probability under M in the space of result data.
• Perform the experiment. If R happened, something was wrong, and the assumed model M is the obvious candidate.
• Thus, the assumption that M was right is rejected.
Test statistic
• Define the model to test, the null hypothesis H
• Define a real-valued function t(D) on the data space
• Find the distribution of t(D) induced by H
• Define a rejection region R such that P(t(D) ∈ R) is low (1% or 5%)
• R is typically the tails of the distribution, t(D) < l or t(D) > u, where [l,u] is a high-probability interval
• If t(D) falls in the rejection region, the null hypothesis H has been rejected at significance level P(t(D) ∈ R) (1% or 5%)
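The recipe in one concrete instance, as a sketch: take H = "data are N(0,1)", t(D) = the sample mean, and the rejection region as the two 2.5% tails of the induced N(0, 1/n) distribution (the choice of statistic and level is illustrative).

```python
import numpy as np
from scipy.stats import norm

n, alpha = 100, 0.05
# Under H the sample mean is N(0, 1/n); [l, u] is its central 95% interval.
l = norm.ppf(alpha / 2, scale=1 / np.sqrt(n))
u = norm.ppf(1 - alpha / 2, scale=1 / np.sqrt(n))

rng = np.random.default_rng(0)
t = rng.standard_normal(n).mean()   # t(D) for one simulated dataset
rejected = (t < l) or (t > u)       # reject H iff t(D) is in a tail
print(l, u, rejected)
```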
Kolmogorov-Smirnov test

Is the sample from a given distribution?

The test statistic d is the maximum deviation of the empirical cumulative distribution from the theoretical one.

If d*sqrt(n) > 2.5, the sample is (probably) not from the target distribution.

Kolmogorov-Smirnov test

>> rn = sort(randn(10,1));           % sample of size n = 10
>> jj = (1:10)'/10;                  % empirical CDF at the order statistics
>> d = max(abs(jj - normcdf(rn)))    % KS statistic (simplified)
>> d*sqrt(10)                        % compare with the 2.5 threshold
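The same check is available as a library routine; a Python sketch using `scipy.stats.kstest`, which returns the statistic d and a p-value instead of the d*sqrt(n) > 2.5 rule of thumb (sample size and shift are illustrative):

```python
import numpy as np
from scipy.stats import kstest

rng = np.random.default_rng(0)
rn = rng.standard_normal(1000)

d, p = kstest(rn, "norm")           # sample vs. standard normal CDF
print(d, p)                         # small statistic: no rejection expected

d2, p2 = kstest(rn + 1.0, "norm")   # shifted sample: clear mismatch
print(d2, p2)
```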

Combining Bayesian and frequentist inference
• Posterior for parameter
• Generating testing set (Gelman et al, 2003)

Graphical posterior predictive model checking takes first place in the authoritative book. The left column is a 0-1 coding of a logistic regression of six subjects' responses (rows) to stimuli (columns). Replications using the posterior and likelihood distribution are shown in the right six columns. There is clear micro-structure in the left column that is not present in the right ones. Thus, the fitting appears to have been done with an inappropriate (invalid) model.

Cumulative counts of real coal-mining disasters (lower red curve), compared with 100 scenarios of the same number of simulated disasters occurring at a constant random rate: the real data cannot reasonably be produced by a constant-intensity process.
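That comparison can be sketched with synthetic data (the real coal-mining series is not reproduced here; the event counts, window, and rate change below are invented for illustration). Given n events, a constant-intensity process places them uniformly in the window, so the check is whether the real cumulative curve deviates from the diagonal more than the simulated ones:

```python
import numpy as np

rng = np.random.default_rng(0)
n, T = 190, 110.0   # number of events and window length (illustrative)

def max_dev(times):
    """Max deviation of the cumulative event curve from the constant-rate diagonal."""
    t = np.sort(times)
    expected = np.arange(1, n + 1) / n
    return np.max(np.abs(t / T - expected))

# Synthetic "real" data with a rate drop halfway, vs 100 constant-rate scenarios.
real = np.concatenate([rng.uniform(0, T / 2, 150), rng.uniform(T / 2, T, 40)])
sims = [max_dev(rng.uniform(0, T, n)) for _ in range(100)]

print(max_dev(real), max(sims))  # the changing-rate data typically stand out
```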

Multiple testing
• The probability of rejecting a true null hypothesis at 99% is 1%.
• Thus, if you repeat the test 100 times, each time with new data, you will reject at least once with probability 1 - 0.99^100 ≈ 0.63
• Bonferroni correction (FWE control): to reach significance level 1% in an experiment involving 1000 tests, each individual test should be checked at significance level 1/1000 %
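Both numbers follow from one-line computations:

```python
# Probability of at least one false rejection in 100 independent tests at 1%.
p_reject_once = 0.01
p_any_false_rejection = 1 - (1 - p_reject_once) ** 100
print(p_any_false_rejection)    # ≈ 0.634

# Bonferroni: per-test level for a family-wise 1% over 1000 tests.
per_test_level = 0.01 / 1000    # = 1/1000 of a percent
print(per_test_level)
```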
Fiducial Inference

R A Fisher (1890--1962).

In his paper Inverse Probability, he rejected Bayesian analysis on the grounds of its dependence on priors and scaling.

He launched an alternative concept, 'fiducial analysis'. Although this concept was not much developed after Fisher's time, the standard definition of confidence intervals has a similar flavor. The fiducial argument was apparently the starting point for Dempster in developing evidence theory.

Fiducial inference
• Fiducial inference is fairly undeveloped, and also controversial. It is similar in idea to Neyman's confidence interval, which is used a lot despite philosophical problems and a lack of general understanding.
• The objective is to find a region in which a distribution's parameters lie, with confidence c.
• The region is given by an algorithm: if the stated probabilistic assumptions hold, the region contains the parameters with probability c.
• However, this holds before the data have been seen, and the estimator is not a sufficient statistic. Somewhat scruffy.
Hedged prediction scheme (Vovk/Gammerman)
• Given a sequence z1=(x1,y1), z2=(x2,y2), …, zn=(xn,yn) AND a new x(n+1), predict y(n+1)
• xi is typically a (high-dimensional) feature vector
• yi is discrete (classification) or real (regression)
• Predict y(n+1) ∈ Y with (say) 95% confidence, or
• Predict y(n+1) precisely and state the confidence (classification only)
• Predict the y(n+1) giving the sequence 'maximum randomness', using a computable approximation to Kolmogorov randomness
• Can be based on the SVM method
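The regression variant can be sketched in its simplest split-conformal form: score held-out calibration points by |y - prediction| and return prediction ± the 95th-percentile score for a new x. The linear model and synthetic data are assumptions for illustration, and the plain 0.95 quantile is a slight simplification of the exact conformal quantile level.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 500)
y = 2.0 * x + rng.standard_normal(500)   # synthetic linear data, noise sd 1

x_fit, y_fit = x[:250], y[:250]          # half the data fits the predictor
x_cal, y_cal = x[250:], y[250:]          # half calibrates the scores

slope, intercept = np.polyfit(x_fit, y_fit, 1)
scores = np.abs(y_cal - (slope * x_cal + intercept))   # nonconformity scores
q = np.quantile(scores, 0.95)            # calibration quantile

def predict_interval(x_new):
    """Hedged 95% prediction set for y at x_new."""
    center = slope * x_new + intercept
    return center - q, center + q

lo, hi = predict_interval(5.0)           # true value near 2*5 = 10
print(lo, hi)
```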