1 / 45

A Latent Approach to the Statistical Analysis of Space-time Data Dani Gamerman

A Latent Approach to the Statistical Analysis of Space-time Data Dani Gamerman Instituto de Matemática Universidade Federal do Rio de Janeiro Brasil http://acd.ufrj.br/~dani 17th International Workshop on Statistical Modelling Chania, Crete, Greece, 8-12 July 2002. World Cup Algorithm.

emil
Download Presentation

A Latent Approach to the Statistical Analysis of Space-time Data Dani Gamerman

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. A Latent Approach to the Statistical Analysis of Space-time Data Dani Gamerman Instituto de Matemática Universidade Federal do Rio de Janeiro Brasil http://acd.ufrj.br/~dani 17th International Workshop on Statistical Modelling Chania, Crete, Greece, 8-12 July 2002

  2. World Cup Algorithm 1958 1962 1966 1970 1974 1978 1982 1986 1990 1994 1998 2002  One-time only home win Europe  25 miles apart Played in Europe Played in exotic place Played in (L) America Played in Europe Played in L. America Center of the world football ? 2006

  3. A Latent Approach to the Statistical Analysis of Space-time Data Dani Gamerman Instituto de Matemática Universidade Federal do Rio de Janeiro Brasil http://acd.ufrj.br/~dani Joint work with Marina S. Paez (IM-UFRJ) Flavia Landim (IM-UFRJ) Victor de Oliveira (Arkansas) Alan Gelfand (Connecticut) Sudipto Banerjee (Minnesota) 17th International Workshop on Statistical Modelling Chania, Crete, Greece, 8-12 July 2002

  4. Introduction Environmental science – data in the form of a collection of time series that are geographically referenced. Some examples can be found in other areas • Examples: • 1) measurements of pollutants in time over a set of monitoring stations 2) selling price of properties around a neighborhood of interest 3) counts of morbidity/mortality events in time over a collection of geographic regions

  5. Main Objective: spatial interpolation Example: Pollution in Rio de Janeiro Paez, M.S. and Gamerman, D. (2001). Technical report. Statistical Laboratory, UFRJ.

  6. Picture showed mean interpolated values Other features of interest can be obtained Example: Pollution in Rio de JaneiroProb ( PM10 > 100 mg/m3 | Yobs )

  7. SpatialInterpolation • m = number of observations • g = number of grid points • s1, ... ,sm = observed sites • s1n,...,sgn = grid points (to interpolate) • Y1n,...,Ygn = observations in the grid points

  8. Interpolation • 1. Frequentist inference: generate Yn from • 2. Bayesian inference: i ) generate y from • ii ) generate Yn from • y- all model parameters • Ymis - missing data, treated as parameters • If y = y* with probability 1 then • Obtain P(Yn|Yobs)by simulation. • Steps to generate from Yn|Yobs :

  9. GaussianProcess (GP) (or Gaussian Random Field) S - region of Rp (in general, p=2) { w(s) : s S } is a GP if n, s1 , ... , sm S ( w(s1) , ... , w(sn) ) ~ Nn (, ) where  = ( 1, ... , n ) with i=E[w(si)] and  = (ij[w(si), w(sj)] )i,j Usual simplifications: 1) Isotropy [w(si),w(sj)]= q(hij) with hij=|si– sj| 2) Homoscedasticity i = , i Notation: w(.) ~ GP((.),,q)

  10. Statistical Analysis Starting point: regression models Yt(s) = t(s) + e t(s) where t(s) = 0 + 1 X t1(s) + ... + pXtp(s) and et(s) ~ N(0, e2) independent Suppose that Xtj(s) handles temporal autocorrelation Otherwise, we can include a temporal component t Usually et(s) remains spatially correlated In this case, et(s) = e0(s) + et1(s) e0(s)  errors spatially correlated et1(s)  pure residual (white noise) 0(s) = 0 + e0(s)

  11. How to estimate 0(s) ? (c) Inference based on Traditional approach: geostatistical 0(.) ~ GP(0,,q) or e0(.) = 0(.) 0 ~ GP(0,,q) then, 0obs ~ N(0 1, , R) 0obs = (0(s1) , ... , 0(sm) ) Hiperparameters: e2, 2 and 0 Inference 1. At first (3 steps) (a) 0 , 1 , ... , p estimated in the regression model and the residuals rt0(s) = Yt (s) mt(s) are constructed (b) e2, 2 and 0estimated from rt0(s)

  12. Problems: (a) rt0(s)  et (s) (b)  • 2) next step: • 0 , 1 , ... , p and estimated jointly •  solves (a) • but to incorporate uncertainty about is complicated • 3) Natural solution (Kitanidis, 1986; Handcock & Stein, 1993): • specify distribution for 0 • perform Bayesian inference

  13. Model generalization Recall: E[Yt(s)]=0(s) + 1Xt1(s) + ... + pXtp(s) Spatial heterogeneity doesn’t have to be restricted to 0 Example: site by site effect of temperature in the Rio pollution data

  14. We can accommodate spatial variation for other coefficients j, j=1, ... , p. a) prior independence b) same spatial structure and prior correlation between the j(.)´s previous model E [Yt(s)] = 0(s) + 1 Xt1(s) + ... + pXtp(s) Extension of the previous model E [Yt(s)] = 0(s) + 1(s)Xt1(s) + ... + p(s)Xtp(s) (.) = (0(.), 1(.),..., p(.)) One possibility: (.) ~ GP(, , ) Hyperparameters: Y = (g , se , S , q0), where g = (g0, g1,..., gp) Special cases for the j(.)´s:

  15. 1) classical solution (Oehlert, 1993; Solna & Switzer, 1996): (a) 0 (s), 1 (s), ... , p (s) estimated by b0(s), b1(s), ... , bp (s) in the local regression model (b) estimated from the bj(s) (c) inference based on Problems (the same as before): (a) bj(s) j(s) (b)  How to estimate j(s), j=0,1,...,p ? 2) natural solutions: Specify prior distribution for Y In general, independent and non informative priors are used

  16. Model Summary • Parameters: obs , where  = ( ,e2, ,) • jobs = (j(s1) , ... , j(sm) ), j=0, 1, ... , p • obs = (0obs , ... , pobs ) • = ( 0 , 1 , ... , p ) • Data: Yobs = (Y1(s1) , ... , YT(sm)) • Xobs = (X1(s1) , ... , XT(sm))

  17. Simulated data Yt(s) = mt(s) + et(s), t=1,...,30mt(s) = b0(s)+ b1(s) Xt(s)et(s) ~ N(0, e2) independent with e2=1 b0 ~N(g0, s0 ,0(.)) b1 ~N(g1, s1 ,1(.)) Xt(s)~N(g2, s2 ,2(.)), for all time tExponential correlation functions: rj(x) = exp{- qj x} g0= 100 g1= 5 g2= 0 q0= 0.4 q1= 0.8 q2= 1.5s02= 0.1 s12= 1 s22= 0.333

  18. Simulated Data b0 Y b1X e + = +

  19. Inference Parameters: (obs ,) = ( ,e2, ,) Likelihood: L(obs ,) = p(Yobs | obs , e2 ) Prior: p(obs ,)=p( obs | Y) p()p(e2) p(S) p(q) Posterior: (obs ,)  L (obs ,)  p(obs ,) • Many parameters • Complicated functional form • Solution by MCMC

  20. Full Conditionals (a) [ obs | rest ] ~ Normal (b) [  | rest] ~ Normal  use jobs as if they were data (c) [ e2 | rest ] ~ [ e2 | Yobs , obs ] ~ Inverse Gamma (d) [  |rest ] ~ [  | bobs ,  , q] ~Inverse Wishart (e) [  | rest ] ~ j p(j | jobs , j,) p()  again, use jobs as data (geostatistical analysis) hard to sample  Metropolis - Hastings

  21. Results (based on a regular grid of m=25 sites) Histogram of the parameters g0 g1 q0 q1 te t0 t1 ti = si-2

  22. Spatial Interpolation Interpolation grid: s1n , ... , sgn jn = (j(s1n) , ... , j(sgn) ), j=0, 1, ... , p n = (0n , ... , pn ) We need to obtain the interpolation of j´s to interpolate Yn

  23. Spatial Interpolation of ´s (n,obs,| Yobs) = ( n | obs, , Yobs) ( obs, | Yobs) = ( n | obs ,) ( obs, | Yobs) Simulation of [ n | Yobs ] in 2 steps: (a)[ obs, | Yobs ]  using MCMC (b)[ n | obs ,]  using Multivariate Normal • Interpolation of Y´s (Yn,n,| Yobs) = (Yn|n, , Yobs) (n,| Yobs) = (Yn| n ,) (n,| Yobs) Simulation of [Yn |Yobs] also in 2 steps: (a) [ n, | Yobs ]  MCMC and Spatial Interpolation (b) [ Yn| n ,]  using Multivariate Normal

  24. Simulated data: Interpolation of b1 Simulated values Interpolated values

  25. Simulated data: Interpolation of Y30(.) Simulated values Interpolated values

  26. Interpolation of X´s These interpolations assume that the interpolated covariates Xj are available for j=1, ... , p Otherwise, we must interpolate them

  27. Model may be completed with X(.) | x , x ,Sx ~ GP(x, Sx,x(.)) (Xn, x | Yobs , Xobs) = (Xn , x| Xobs ) = (Xn| x, Xobs) (x | Xobs ) Simulation of [Xn|Yobs,Xobs] in 2 steps: (a) [x | Xobs ]  MCMC (b) [Xn| x, Xobs ]  using Multivariate Normal

  28. Histogram of the parameters g0 g1 te q0 q1 q2 t2 t0 t1 Results obtained by interpolating X Precisions less sparse then when X is known

  29. Interpolation of X30(.) Simulated values Interpolated values

  30. Interpolation of Y30(.) Known X Unknown X

  31. Now, the temperature coefficient varies in space Yt(s)= b0 (s)+ b1(s)TEMPt + b´ Xt+ et(s) Application to the pollution data Yt (s) = square root of PM10 at site s and time tXt= (MON, TUE, WED, THU, FRI, SAT) et(s) independents N(0,se2)b0~N(g0, s0 ,0(.))b1~N(g1, s1 ,1(.))ri(.), i=0,1 are exponential correlation functions

  32. Histogram of the hiperparameters sample b0 te q0 t0 t1 q1 where ti = si-2 Results for the pollution data in Rio

  33. G(10-3,10-3) G(10,10c) SSDE = 0.1444 SSDE = 0.0637 where SSDE = Interpolation of the b1 coefficient Prior for t c  obtained by exploratory analysis site by site (OLS) Same idea can be used for q (explanatory geostatistical analysis)

  34. Extension of the previous model previous model Yt(s)= t(s) + et(s) where t(s)=t0(s)+t1(s)Xt1(s)+...+tp(s)Xtp(s) et(s) ~ N(0, e2) independent Yt(s)= t(s) + et(s) where t(s)= 0(s)+ 1(s)Xt1(s)+...+ p(s)Xtp(s) et(s) ~ N(0, e2) independent We can also accommodate temporal variation of the coefficients j, j=0,...,p. Natural specification t(.) | g t ~ GP(t ,S,), independent in time The model must be completed with: (a) prior for (se , S , q) as before (b) specification of the temporal evolution of the t´s

  35. Suggestion - use dynamic models (SVP/TVM) • (Landim & Gamerman, 2000) • t | t-1 ~ N( Gt t-1 , Wt ) •  unknown parameters of the  evolution Model parameters: obs , ,   = ( 0 , e2 , S ,  , W ) where  = ( 1, ... , T) and t= ( t0 , t1, ... , tp ), t=1, ... , T Simulation cycle has 2 changes: I) additional step to  II) modified step to 

  36. Histogram of the posterior of q Application to simulated data Multivariate observations: Yt (s) = (Yt1 (s), Yt2 (s)) Yt (s) = bt0 (s) + bt1(s)Xt1(s) + et(s) bt(.) ~GP (gt, S,) gt = gt-1 + wt same spatial correlation to b0 and b1 r(.) - exponential correlation function with q=1.

  37. Trajectory of g (t) - mean and credibility limits

  38. Interpolation Samples from ytn|yobs are obtained through the algorithm below: 1. Sample from btobs, y| yobs - through MCMC 2. Sample from btn| btobs, y - through Gaussian process 3. Sample from ytn| btn, y - Independent Normal draws Once again, Xtn must be known, otherwise, they will have to be interpolated.

  39. Spatially- and time-varying parameters (STVP) SVP/TVM: STVP: Another possibility: temporal evolution applied directly to the bt processes rather than to their means Yt (s) = bt0 (s) + bt1(s)Xt1(s) + et(s) bt(.) = bt-1(.) + wt(.) wt(.) ~ GP (gt, s,) independent in time Completed with: b0(.) = g0 ~ N(g0,R) Marginal Prior: bt(.)| gt ~ GP (gt, tS,) Not separable at the latent level, unlike SVP/TVM

  40. Computations MCMC algorithm must explore the correlation structures  parameters are visited in blocks (Landim and Gamerman, 2000; Fruhwirth-Schnatter, 1994) • Prediction Based on the forecast distribution of YT+h|Yobs, for YT+h = (YT+h(s1f ),..., YT+h(sFf )), and any collection (s1f,..., sFf) 1. Sample from bTobs, y| Yobs - through MCMC 2. Sample from bT+hf| bTobs, Y - obtained by introduction of bT+hobs bTobs bT+hobs by successive evolution of the b process bT+hobs  bT+hfby interpolation with gaussian process 3. Sample from Ytn| bTn, Y - Independent normal draws

  41. Time-varying locations SVP/TVM: Easily adapted STVP: requires introduction of for updated locations Assume locations st = (st1,..., stnt) at time t btobs is a nt-dimensional vector, t = 1,...,T Both densities in the integrand are multivariate normal The convolution of these two densities can be shown to be normal and required evolution equation for b can be obtained

  42. Can be normalized after suitable transformation gl(y) Example: Rio pollution data Non-Gaussian Observations Two distinct types of non-normality data: • Continuous: l estimated jointly with other model parameters (de Oliveira, Kedem and Short, 1997) • Count data: For example, in the bernoulli or poisson form standard approach: yt(s) ~ EF(mt(s)) spatio-temporal modeling issues: similar computations: harder

  43. Non-Gaussian Evolution Abrupt changes in the process  normality is not suitable Robust alternative: wt(.) ~ GP(gt,S,rq) is replaced by wt(.)| lt ~ GP(gt,lt-1S, rq) and lt ~ G(nt, nt), independent for t=1,...,T Therefore, wt(.) ~ tP(gt,S,rq) lt’s control the magnitude of the evolution

  44. Final Comments • More flexibility to accommodate variations in time and space. • Static coefficient models: samples from the posterior were generated in the software BUGS, with interpolation made in FORTRAN • Extensions to observations in the exponential family and estimation of the normalizing transformation. • Extension to accommodate anisotropic processes to some components of the model.

  45. A Latent Approach to the Statistical Analysis of Space-time Data Dani Gamerman Instituto de Matemática Universidade Federal do Rio de Janeiro Brasil http://acd.ufrj.br/~dani 17th International Workshop on Statistical Modelling Chania, Crete, Greece, 8-12 July 2002

More Related