

  1. Statistical Methods Bayesian methods 4 Daniel Thorburn Stockholm University 2012-04-12

  2. Outline 14. MCMC – an introduction – Gibbs’ sampling 15. Exercises in subjective probability 16. Further examples of Gibbs’ sampling 17. Dynamic models (or time series analysis in a changing world)

  3. 14. MCMC – an introduction – Gibbs’ sampling

  4. Simulate from posterior and predictive distributions In some cases there is no closed expression for the posterior distribution. In most of these cases it is nevertheless possible to simulate from the posterior. Last time we illustrated this in a simple situation (where the posterior could also be obtained directly): (y, x) is N((0,0), 1, 1, r)

  5. Standard way of simulating a multivariate normal (p-dimensional) • For comparison. Variance Σ. • Find R, the square root of Σ: • e.g. diagonalise Σ = ADA’, where D is diagonal and A orthonormal • Let root(D) be a diagonal matrix whose diagonal consists of the positive roots of the elements of D • Define A root(D) A’ = root(Σ) = R • Check that RR’ = Σ • Construct a vector Z of p iid N(0,1) variables • Now RZ is multivariate normal with variance Σ, • since E(RZ(RZ)’) = R E(ZZ’) R’ = RR’ = Σ
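A minimal sketch of this recipe in code (my own illustration, not from the slides), using the eigendecomposition to build the symmetric square root; the 2×2 covariance with r = 0.8 is chosen only as an example:

```python
import numpy as np

def mvn_draw(mu, Sigma, rng):
    """One draw from N(mu, Sigma) via R = A root(D) A' with Sigma = A D A'."""
    D, A = np.linalg.eigh(Sigma)           # Sigma = A diag(D) A'
    R = A @ np.diag(np.sqrt(D)) @ A.T      # symmetric square root, R R' = Sigma
    Z = rng.standard_normal(len(mu))       # p iid N(0, 1) variables
    return mu + R @ Z                      # R Z has variance R E(ZZ') R' = Sigma

rng = np.random.default_rng(1)
Sigma = np.array([[1.0, 0.8], [0.8, 1.0]])  # the N((0,0), 1, 1, r) example with r = 0.8
print(mvn_draw(np.zeros(2), Sigma, rng))
```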

  6. With MCMC • Instead of describing the distribution with the bivariate normal N(0, 0, 1, 1, r) we describe it by the one-dimensional conditional distributions • Y|X=x ~ Normal(rx, 1−r²) • X|Y=y ~ Normal(ry, 1−r²) • It may be shown that these two conditionals together describe the joint distribution completely

  7. Algorithm • Starting value: anything, e.g. x0 = 0 • Take y1 from the conditional distribution of Y given X = x0: y1 = Normalrand(r·x0, 1−r²). Take x1 from the conditional distribution of X given Y = y1: x1 = Normalrand(r·y1, 1−r²) • Continue like this: • Take yi+1 from the conditional distribution of Y given X = xi: yi+1 = Normalrand(r·xi, 1−r²). Take xi+1 from the conditional distribution of X given Y = yi+1: xi+1 = Normalrand(r·yi+1, 1−r²)
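A minimal sketch of this Gibbs sampler in code (my own illustration, not the author’s Excel sheet); the correlation r = 0.8, the chain length and the burn-in of 100 are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
r, n_iter, burn_in = 0.8, 10_000, 100
x, draws = 0.0, []                              # starting value x0 = 0
for _ in range(n_iter):
    y = rng.normal(r * x, np.sqrt(1 - r**2))    # Y | X = x ~ Normal(r*x, 1 - r^2); scale is the std dev
    x = rng.normal(r * y, np.sqrt(1 - r**2))    # X | Y = y ~ Normal(r*y, 1 - r^2)
    draws.append((y, x))
draws = np.array(draws)[burn_in:]               # discard the burn-in period
print(np.corrcoef(draws.T))                     # off-diagonal entry should be close to r
```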

  8. Algorithm The pairs (Y1,X1), (Y2,X2), (Y3,X3), … form a Markov chain which converges in distribution to N((0,0), 1, 1, r) (convergence is exponentially fast, but slow for |r| close to one). Thus after a ”burn-in” period (say B = 100) the pairs form a sequence of (dependent) random variables from the correct distribution N(0, 0, 1, 1, r) (see Excel sheet). Histograms based on the sequence (Y101,X101), (Y102,X102), (Y103,X103), … describe the marginal distributions. In this way one can also construct scatter plots or describe the multivariate densities in other ways. You need more observations than with independent draws, but just let the computer work and you can get arbitrarily good precision. This was an example where only one-dimensional random variables were drawn but used to describe a multivariate distribution. More generally, lower-dimensional conditional distributions may be used to describe higher-dimensional situations. This was a simple example of the MCMC technique called Gibbs’ sampling. Usually burn-ins of more than 100 are used and the chain is run for at least 10 000 iterations (computer time is cheap). Special techniques are used to speed up convergence. (For more details see last time.)

  9. Next example • Let 1, 8, 7, 9, 2, 5, 8, 9, 5, 6 be observations from a Normal distribution with unknown mean m and unknown variance σ². (n is 10, x-bar is 6, and Σx² = 430) • The priors • The mean m was guessed to be 4.5 with a precision corresponding to 4 measurements and • the variance σ² was guessed to be 6 with a precision corresponding to 5 d.f. • The conditional posteriors are • m given X and σ² is normal with mean 78/14 = 5.571 and variance σ²/14 • σ² given X and m is inverse gamma with parameters 15 and (30 + 430 − 10·6² + 4(m − 4.5)² + 10(m − 6)²)/15 (d.f. and mean) • We may simulate from these two distributions in order to get a sequence of parameter values from their distribution given the data.

  10. Start value: anything, say σ² = 6 • Given σ² = 6 we know that m ~ Normal((60+18)/14, 6/14) = Normal(5.571, 0.429) • Take a random number m1 with this distribution • The first random number in the first pair is 5.571 + 0.654·(−0.391) = 5.314 • Given m = 5.314 we know that σ² is Inverse Gamma(15/2, 2/(5·6 + 430 − 10·6² + 10(5.571 − 6)² + 4(5.571 − 4.5)²)) = G⁻¹(7.5, 0.0197) • Take a random number with this distribution • A random number from G(7.5, 0.0197) = 0.148·χ²(15)/15 is 0.137, so the second random number in the first pair is σ² = 1/0.137 = 7.30 • Given σ² = 7.30 we know that m ~ Normal(5.571, 7.3/14) • Take a random number with this distribution • The first random number in the second pair is 5.571 + 0.722·1.324 = 6.527 • …
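A minimal sketch of this sampler in code (my own illustration; the data and priors are those of slides 9–10, while the chain length and burn-in are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.array([1, 8, 7, 9, 2, 5, 8, 9, 5, 6], dtype=float)
n, xbar, sum_x2 = len(x), x.mean(), np.sum(x**2)    # n = 10, x-bar = 6, sum of x^2 = 430
m0, n0 = 4.5, 4       # prior guess for the mean, worth 4 measurements
v0, d0 = 6.0, 5       # prior guess for the variance, worth 5 d.f.

sigma2, draws = 6.0, []                             # start value sigma^2 = 6
for _ in range(5_000):
    # m | sigma^2, data: Normal with mean (4*4.5 + 10*6)/14 = 5.571 and variance sigma^2/14
    m = rng.normal((n0 * m0 + n * xbar) / (n0 + n), np.sqrt(sigma2 / (n0 + n)))
    # sigma^2 | m, data: inverse gamma with 15 d.f.; the sum of squares below is the one on slide 10
    S = d0 * v0 + sum_x2 - n * xbar**2 + n * (m - xbar)**2 + n0 * (m - m0)**2
    sigma2 = 1.0 / rng.gamma(shape=(d0 + n) / 2, scale=2.0 / S)
    draws.append((m, sigma2))

draws = np.array(draws)[100:]                       # drop a burn-in of 100
print(np.percentile(draws, [2.5, 97.5], axis=0))    # 95 % intervals for m and sigma^2
```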

  11. First ten simulated values

  12. First 500 simulated values Note that the simulated values look fairly independent but have skewed distributions. The two diagrams to the left are often used as diagnostics

  13. Frequency polygon for the posterior distribution of the mean (based on 500 Gibbs iterations). A 95 % interval is 4.02 to 6.95 (usually longer simulations are done)

  14. Frequency polygon for the posterior distribution of the variance (based on 500 Gibbs iterations). A 95 % interval is 3.32 to 18.04

  15. 15. Exercises in subjective probability

  16. Results of your probability assessments Based on your 155 assessments (13 persons, 13 questions, and a few item non-responses). If your assessments had been correct, the line should have gone from the corner (0,0) to (1,1), almost like a straight line

  17. Correct answers • My length 179 cm (3/7) • Length of the Nile 6670 km (3/6) • Iron 7.86 g/cm³ (2/6) • Mexico City 21.2 million inhabitants (1/6) • Gotland area 2994 km² (1/4) • Countries in Africa 54 sovereign states (6/7) • (Average) distance to the sun 8.19 light minutes (150 million km) (2/5) • Students at Stockholm University 64 000 (0/6) • Temperature on Easter day 6 °C (4/6)

  18. Results of your probability assessments Only seven persons handed in. Their joint coverage rate was only 22 out of 53 answers (41 %), much less than the 95 % we aimed for. Best was Jalnaz with 75 % correct intervals. I also measured it with min(error/(interval length/4), 8). If your estimates had been normal, the expected value of this measure would have been 2/π ≈ 0.65. The average was 5.62 (without the minimum function, a few far-off estimates would have increased it to 35.7). With this measure Heeva was best with 3.49. To summarise, you were far too optimistic. Your estimates were best on the question on the number of African states, where only one interval out of seven failed to cover. You can easily train yourself in probability assessment

  19. 16. Further examples of Gibbs’ sampling

  20. Example continued from last time • Consider an agricultural study on six piglets with two explanatory variables which are strongly collinear. The piglets stem from three litters. • Model Yi = a + b·x1i + c·x2i + ηk(i) + εi, where the error terms are independent and Normal(0, ση²) resp. Normal(0, σε²) • (compared to last time we have added unknown variances)

  21. Data • Parameters: a = 47, b = 129, c = −3 • Litter effects A, B, C: 0.87, −0.59, −1.2 • (the piglet data themselves are shown in a table on the slide)

  22. Priors • a, b, c are independent Normal with means (50, 1, 0) and variances 1000, 1000, 50 • ση² and σε² are independent Inverse Gamma with parameters (2, 1) resp. (10, 1) (d.f. and ”mean”)

  23. For this problem three conditional distributions will be used in the posterior: • For (a, b, c, η1, η2, η3) given Y, ση² and σε² • For ση² given Y, a, b, c, η1, η2, η3 and σε², i.e. for ση² given η1, η2 and η3 • For σε² given Y, a, b, c, η1, η2, η3 and ση², i.e. for σε² given εi = yi − a − b·x1i − c·x2i − ηk(i) for i = 1, …, 6 • These conditional posteriors are obtained • in exactly the same way as when we did it last time with known variances (normal with mean μ + TA’(ATA’ + Σ)⁻¹(x − Aμ) and variance T − TA’(ATA’ + Σ)⁻¹AT) • using the theory for conjugate distributions of chi-squared distributions, i.e. Inverse Gamma with parameters 2 + 3/2 and (2 + Σηi²)/3.5 • using the theory for conjugate distributions of chi-squared distributions, i.e. Inverse Gamma with parameters 10 + 6/2 and (10 + Σεi²)/13 • One may thus simulate from the posterior distribution by running through these steps iteratively
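The normal step quoted above can be written as one small function. This is only a sketch of the general formula (my own code; the design matrix A and the observation x in the usage lines are hypothetical, not the piglet data):

```python
import numpy as np

def normal_conditional(mu, T, A, Sigma, x):
    """Posterior of theta ~ N(mu, T) after observing x = A theta + e, e ~ N(0, Sigma)."""
    G = T @ A.T @ np.linalg.inv(A @ T @ A.T + Sigma)   # gain matrix T A'(A T A' + Sigma)^-1
    return mu + G @ (x - A @ mu), T - G @ A @ T        # posterior mean and variance

# Hypothetical usage with the slide 22 priors for (a, b, c) and an invented 2x3 design matrix:
mu = np.array([50.0, 1.0, 0.0]); T = np.diag([1000.0, 1000.0, 50.0])
A = np.array([[1.0, 0.2, 0.3], [1.0, 0.4, 0.1]])
m_post, T_post = normal_conditional(mu, T, A, np.eye(2), x=np.array([60.0, 55.0]))
```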

  24. Predictive distributions • The only motivation for models is their potential for predicting further observations – a model which cannot predict anything is useless • Let us suppose that one is interested in two more piglets, one from a new litter and one from litter one. • It is easy to get the predictive distributions for them. • In the simulation, put in one extra step after each iteration: • take a random observation from the distribution of the data given a, b, c, η1, ση² and σε². • After the iterations, plot the distributions. • E.g. if you want 95 % prediction intervals for those piglets, just take the 2.5 % and 97.5 % percentiles from the simulated distributions, since all simulated observations are drawn from the same distribution as the true values given the data
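As a minimal illustration of such an extra predictive step (my own code, reusing the simpler mean/variance example from slide 10 rather than the piglet model): one future observation is drawn per saved parameter pair, and the 2.5 % and 97.5 % percentiles of those draws give a 95 % prediction interval.

```python
import numpy as np

def predictive_interval(param_draws, rng):
    """param_draws: array of (m, sigma^2) pairs from the Gibbs run sketched after slide 10."""
    y_new = np.array([rng.normal(m, np.sqrt(s2)) for m, s2 in param_draws])  # one predictive draw each
    return np.percentile(y_new, [2.5, 97.5])

# e.g. predictive_interval(draws, np.random.default_rng(2)) with `draws` from the earlier sketch
```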

  25. Example: Missing data or item non-response Suppose we should have observed two quantities X and Y for a sample of 100. Unfortunately some y-values are missing: we have observed all 100 x-values, but 5 y-values are missing due to chance. Model Yi = a + b·xi + εi where the εi are iid normal, i = 1, …, 100. Put (vague?) priors on a, b and σ². Derive the posterior distributions for σ² and for (a, b) given σ².

  26. Draw a random triplet of parameters from the posterior • First for σ² • Then for a and b given σ². • Draw a random five-tuple from the predictive distribution of the missing y-values given the three parameters and their respective x-values. • Repeat B times (e.g. B = 5 or 100). You now have B sets of complete data. • You may now do the analysis that you intended B times, once on each set of completed data

  27. Classical continuation: • The result will be B estimated values q*i and B estimated variances Vqi, i = 1, …, B • Estimate q by the mean Σq*i/B • Estimate the variance that you would have got without non-response by V1 = ΣVqi/B • Estimate the variance due to imputation by the variance between the B estimates, V2 = Σ(q*i − q)²/(B − 1) • The total variance is V = V1 + (B + 1)V2/B (a code sketch of this combining step follows after this slide) • Bayesian continuation • For each completed set b, find the posterior of q. • Draw one (or some, m) random observation(s) from the posterior • The set of B (or mB) random drawings is a sample drawn from the true distribution of q given the data (and the model) • Plot the posterior • Similar approaches may be used for other types of problems/analyses (e.g. factor analysis or life-table models), but with suitable modifications
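A minimal sketch of the classical combining step (my own code; theta_hat and var_hat stand for the B point estimates and their estimated variances from the completed data sets, and the numbers in the usage line are made up):

```python
import numpy as np

def combine_imputations(theta_hat, var_hat):
    theta_hat, var_hat = np.asarray(theta_hat, float), np.asarray(var_hat, float)
    B = len(theta_hat)
    q = theta_hat.mean()                          # overall estimate: mean of the B estimates
    V1 = var_hat.mean()                           # within-imputation (complete-data) variance
    V2 = np.sum((theta_hat - q)**2) / (B - 1)     # between-imputation variance
    return q, V1 + (B + 1) / B * V2               # total variance V = V1 + (B+1)V2/B

# e.g. combine_imputations([5.1, 4.8, 5.3, 5.0, 4.9], [0.20, 0.22, 0.19, 0.21, 0.20])
```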

  28. 17. Dynamic models (or time series analysis in a changing world)

  29. AR-process • Standard classical model Yt = a + b1·yt−1 + b2·yt−2 + εt, where εt is normal with mean 0 and variance σ² (a second-order AR-process) • Usually one looks at the economy and analyses the longest series that can be thought of as generated under a stationary economy without any important changes • Bayesian model: the parameters a, b1, b2 and σ² are random variables • Dynamic model: the parameters also change with time according to some specified model, e.g. at = at−1 + ηt. A model identified ten or twenty years ago is probably not fully adequate today, since the economy/society always changes

  30. A simple dynamic model • Yt = at + b1t·yt−1 + b2t·yt−2 + εt, where • (at, b1t, b2t) = (at−1, b1,t−1, b2,t−1) + ηt • the ηt are trivariate normal with mean (0, 0, 0) and variance Σ, and independent for different t • εt is normal with mean 0 and variance σ², and independent for different t • The prior is that (a0, b10, b20) has a trivariate normal distribution with mean m0 and variance T0 (for simplicity we assume that all variances are known) • Analysis on the next page – fairly straightforward. Don’t worry about the formulas

  31. The prior is that (a0, b10, b20) has a trivariate normal distribution with mean m0 and variance T0 • At time 1, before observing y1, (a1, b11, b21) is normal with mean m0 and variance T0 + Σ. • When y1 has been observed, the distribution of (a1, b11, b21) is updated through Bayes’ theorem to a normal distribution • with mean m1 = m0 + (T0 + Σ)X’(X(T0 + Σ)X’ + σ²)⁻¹(y1 − Xm0) • and variance T1 = T0 + Σ − (T0 + Σ)X’(X(T0 + Σ)X’ + σ²)⁻¹X(T0 + Σ), • where X is the row of regressors (1, yt−1, yt−2) (formulas which we gave for linear models – but do not worry about them) • At time 2, before observing y2, (a2, b12, b22) is normal with mean m1 and variance T1 + Σ. • When y2 has been observed, the distribution of (a2, b12, b22) is updated through Bayes’ theorem to a normal with mean m2 and variance T2 • And so on.
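A minimal sketch (my own code) of one such updating step for the dynamic AR(2) model of slides 30–31. Here x_row stands for the regressors (1, y_{t-1}, y_{t-2}), theta and T for the current posterior mean and variance, Sigma_eta is the slide’s Σ, and sigma2 is assumed known; the numbers in the usage lines are hypothetical:

```python
import numpy as np

def dlm_update(theta, T, x_row, y, Sigma_eta, sigma2):
    P = T + Sigma_eta                          # variance of the parameters before seeing y_t
    x = np.asarray(x_row, dtype=float)
    f = sigma2 + x @ P @ x                     # scalar X(T+Sigma)X' + sigma^2
    K = P @ x / f                              # gain vector (T+Sigma)X' / f
    return theta + K * (y - x @ theta), P - np.outer(K, x @ P)   # updated mean and variance

theta1, T1 = dlm_update(theta=np.array([0.0, 0.5, 0.2]), T=np.eye(3),
                        x_row=(1.0, 1.3, 0.9), y=1.1,
                        Sigma_eta=0.01 * np.eye(3), sigma2=1.0)
```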

  32. Prediction • It is fairly easy to predict future values, taking into account also that the parameters will change in the future • It is not so important to use a time period where the parameters are fairly stable – this analysis takes care of that by down-weighting the information from old data points if the parameters change • With this model there is no need to make the standard assumption that the process is stationary (eigenvalues less than one); the process may well be explosive for a short period • Often observation errors are also added to the model: Zt = Yt + ζt = at + b1t·yt−1 + b2t·yt−2 + εt + ζt. The analysis is not more difficult in this setting than without the extra term – but impossible to make within the standard ARIMA framework. The updating formula is changed and Yt has to be estimated (its posterior is found)

  33. E.g. party preferences • The following is an analysis of the party preferences (bourgeois alliance vs. red-green block) • All measurements have varying precision depending on sample size. • It is assumed that no other mechanism, such as protest preferences during the periods between elections, is at work. The model is a little more complicated due to non-normality (e.g. the sample space is the interval (0,1)). The model also includes both level and trend terms.

  34. Party preference studies, Sept 2006 – Sept 2010

  35. Forecasts of the election result for the period between the last two elections. Percentage of the red-green coalition. (A new forecast is made after each party preference study; the time scale here is the number of party-preference studies.)

  36. Forecast of the election result for the period between the last two elections (probability calculated after each study)

  37. Thank you for your patience. Good luck!
