Towards a Feature-based Approach to Ensemble Data Assimilation
D. McLaughlin – MIT, USA

Presentation Transcript


  1. Towards a Feature-based Approach to Ensemble Data Assimilation
D. McLaughlin – MIT, USA
• How can we incorporate geometric structure into data assimilation?
• Can we do more to ensure that our estimates are physically realistic?
• Can we apply ideas from computer vision & machine learning to geoscience data assimilation problems?

  2. Reservoir Characterization – Motivation/Context
What would we like to obtain? A set of most likely reservoir configuration(s), to support design, risk assessment, and control.
[Diagram: available info → ensemble characterization → robust design & robust control → oil production, risk, $$$]

  3. What is a Realistic Characterization?
Characterization products should “look like” natural features. We need quantitative measures of geological realism:
• Spatial means & covariances?
• Multipoint statistics?
• Connectedness, flow paths, other measures?
Constrain the characterization to produce realistic features. Computer vision approach: use observed features to define realism and train the characterization procedure.

  4. Geophysical Examples
Two characterization problems:
• SGP summer rainstorms. Features: rainy areas. Estimate rain intensity from remote sensing data – an interpolation problem. How can we preserve feature structure when we incorporate noisy measurements?
• Subsurface oil reservoir. Features: geological facies. Estimate facies properties, saturation, and pressure from production & seismic data – an inverse problem.

  5. Bayesian Perspective
Extend the Bayesian formalism to accommodate geometric features:
• Sample space (possible outcomes = geometric objects)
• Events (sets of outcomes = objects with similar properties)
• Probability measure (assigns probabilities to events)
Use Bayes' rule to integrate prior information with new measurements: posterior ∝ likelihood × prior, i.e. p(F | d) ∝ p(d | F) p(F) for a feature F and measurement vector d.
Rather than derive the posterior probability directly, generate an ensemble of representative samples: EnKF, importance sampling & particle filters, MCMC.

  6. Markov Chain Monte Carlo
MCMC – a conceptual framework for feature-based Bayesian estimation:
• Specify the set of possible features & the associated probability measures (prior, likelihood, proposal)
• Generate a sequence of proposed features (geologically realistic)
• Accept proposals that are consistent with the Bayesian posterior, comparing each proposal to the last accepted feature (see the sketch below)
• Identify the most likely features from the accepted proposals
MCMC tests many candidate features, searching for those that are most compatible with prior information and measurements. Accepted proposals are samples from the posterior. Importance sampling, particle filtering & EnKF may be viewed as variants of MCMC.
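The accept/reject step is the heart of the framework. Below is a minimal Metropolis sketch in Python, assuming a symmetric proposal; `log_prior`, `log_likelihood`, and `propose` are hypothetical placeholders for the application-specific pieces, not code from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)

def metropolis(x0, log_prior, log_likelihood, propose, n_steps=10_000):
    """Accept/reject loop: test candidate features, keep those compatible
    with the Bayesian posterior (symmetric proposal assumed)."""
    x = x0
    log_post = log_prior(x) + log_likelihood(x)
    chain = []
    for _ in range(n_steps):
        x_new = propose(x)                      # e.g. a perturbed feature
        log_post_new = log_prior(x_new) + log_likelihood(x_new)
        # Accept with probability min(1, posterior ratio)
        if np.log(rng.uniform()) < log_post_new - log_post:
            x, log_post = x_new, log_post_new
        chain.append(x)                         # chain samples ~ posterior
    return chain
```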

  7. A Naive Approach to MCMC and Importance Sampling
Randomly try lots of solutions until you find a few that fit the data. Given enough time, a hypothetical chimpanzee typing at random would, as part of its output, almost surely produce one of Shakespeare's plays. Quicker with infinite monkeys …
If the system is high-dimensional and there are many measurements, this naive approach is not likely to work. It is essential to pick proposals intelligently – especially to learn from observations and from past successes – and this is application-dependent. For feature estimation, constraining proposals to geologically realistic options should focus the search.

  8. Variations, Special Cases
• Ensemble Kalman filter [Naevdal et al., 2003; Jafarpour & McLaughlin, 2009]: Proposals obtained by transforming prior replicates (Kalman update); see the sketch below. All proposals are samples from the posterior (if the posterior is Gaussian). Easiest, most reliable option.
• Gaussian MCMC [Oliver et al., 1997; Ma et al., 2006; Efendiev et al., 2008]: Prior & measurement error are Gauss-Markov random fields. Random walk, Langevin, and EnKF proposals. Distributional assumptions similar to EnKF, but MCMC is less efficient.
• Importance sampling / particle filtering [Zhou et al., 2006; Ng et al., 2009]: Proposals are samples from the prior (may be non-Gaussian). Weight proposals according to their likelihood. Accept all proposals; obtain the posterior ensemble by resampling. Inefficient.
• Feature-based MCMC in computer vision/medical imaging [Fan et al., 2007]: Proposals obtained by perturbing a boundary curve. Uses Metropolis-Hastings with an asymmetric proposal probability. Appropriate only for simply connected objects.
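As a concrete illustration of the first variant, here is a minimal stochastic (perturbed-observation) EnKF analysis step, sketched for a linear observation operator; the variable names and layout are assumptions, not from the talk:

```python
import numpy as np

def enkf_update(X, d, H, R, rng):
    """Stochastic EnKF analysis step (perturbed observations).
    X: (n_state, n_ens) forecast ensemble; d: (n_obs,) measurements;
    H: (n_obs, n_state) observation operator; R: meas error covariance."""
    n_ens = X.shape[1]
    Y = H @ X                                       # predicted measurements
    Xp = X - X.mean(axis=1, keepdims=True)          # state anomalies
    Yp = Y - Y.mean(axis=1, keepdims=True)          # predicted-meas anomalies
    Cxy = Xp @ Yp.T / (n_ens - 1)                   # state-measurement cov
    Cyy = Yp @ Yp.T / (n_ens - 1) + R               # innovation covariance
    # Perturb the observations so the updated ensemble has the right spread
    D = d[:, None] + rng.multivariate_normal(np.zeros(len(d)), R, n_ens).T
    return X + Cxy @ np.linalg.solve(Cyy, D - Y)    # Kalman-updated replicates
```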

  9. EnKF Performance for the Rainfall Problem
US Great Plains, Summer 2003.
[Figure panels: a) measured (µwave & IR); b) true (radar); c) three representative prior replicates; d) prior ensemble mean; e) corresponding three posterior replicates; f) posterior ensemble mean. Color scale: rain intensity, with cloudy vs. clear regions marked.]

  10. EnKF Performance for the Reservoir Problem
SPE10 virtual reservoir: 10 layers, 13 wells (production & injection), dynamic measurements, oil production from a water flood.
[Figure: true field, a typical posterior replicate, and the ensemble mean for layers 2–5; production curves at wells P3, P6, P9.]
Jafarpour & McLaughlin, 2009.

  11. Pixel-based Feature Descriptions
How do we describe a feature? Discretize over an n-pixel grid:
• Feature support only: 2^n possible features.
• Feature support + texture: ∞ possible features.
A feature is represented as a vector of pixel values, or as a vector of truncated transform coefficients (image compression; compare the original pixel-based image with a version compressed 90% using a DCT/JPEG-based transform). Only some of these possible features are realistic. Can we restrict our attention to these features? What probability measures apply? A transform-coefficient sketch follows.
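A minimal sketch of the transform-coefficient representation, using a 2-D DCT (the transform behind JPEG); the 10×10 coefficient block and the random stand-in image are arbitrary illustrative choices:

```python
import numpy as np
from scipy.fft import dctn, idctn

def compress_feature(img, keep=10):
    """Represent a pixel image by a small block of leading 2-D DCT
    coefficients and reconstruct the compressed approximation."""
    coefs = dctn(img, norm="ortho")                # full transform
    trunc = np.zeros_like(coefs)
    trunc[:keep, :keep] = coefs[:keep, :keep]      # keep low-frequency block
    return idctn(trunc, norm="ortho")

rng = np.random.default_rng(1)
img = rng.random((80, 80))                         # stand-in 6400-pixel image
approx = compress_feature(img)                     # 100 coefs vs. 6400 pixels
```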

  12. Feature Probability Measures – Gaussian
For Bayesian analysis (MCMC, IS) we need the probability of a specified feature: the prior probability, the likelihood, and the proposal probability.
Gaussian assumption: assume the feature (characterized by a pixel vector x) is Gaussian, with a specified mean feature and covariance.
What is the prior probability of each of these features?
• Anisotropic Gaussian field
• Meandering stream (not Gaussian)
• Carbonate shoals (not Gaussian)
The Gaussian prior is convenient but limited.

  13. Feature Ensembles – Training Images & Priors
An alternative to the Gaussian assumption: assume a realistic feature = a prior replicate generated from a specified training image, and infer the corresponding probability from a prior ensemble.
Subsurface reservoir example: multipoint geostatistics (SNESIM) [Sarma et al., 2000; Strebelle, 2002]. The multipoint technique identifies channel patterns within a moving template that scans the training image, counts the number of times each alternative pattern occurs to obtain pattern probabilities, and feeds these to a replicate generator that produces the prior replicates. A pattern-counting sketch follows.
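A sketch of the pattern-counting bookkeeping behind this multipoint approach, for a binary training image and a square moving template (template size and data layout are assumptions):

```python
import numpy as np
from collections import Counter

def pattern_probabilities(training_image, t=3):
    """Scan a binary training image with a t-by-t template, count how many
    times each pattern occurs, and convert the counts to probabilities."""
    counts = Counter()
    n, m = training_image.shape
    for i in range(n - t + 1):
        for j in range(m - t + 1):
            patch = tuple(training_image[i:i + t, j:j + t].ravel())
            counts[patch] += 1                 # one occurrence of this pattern
    total = sum(counts.values())
    return {pattern: c / total for pattern, c in counts.items()}
```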

  14. Feature Ensembles – Training Images & Priors
Rainfall example: the training image is NOWRAD radar, which locates rain within the cloudy region given by GOES cloud cover. In the prior replicates, rain support & intensity vary across replicates while the cloudy region is fixed. Assume realistic feature = any Gauss-Markov field.

  15. Feature Ensembles – Measurement Errors & Likelihood
Subsurface example – inverse problem.
Measurement equation: d = G(F) + e, where G is the forward model and e is the measurement error.
Gaussian assumption: the measurement error probability is Gaussian, e ~ N(0, Cee). Evaluate the likelihood from this error model. Reasonable for production data.
[Figure: measured oil rate vs. time with the forward-model prediction G(F).]
A likelihood-evaluation sketch follows.
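A minimal sketch of evaluating this Gaussian log-likelihood, with `forward_model` standing in for the reservoir simulator G; all names are illustrative:

```python
import numpy as np

def log_likelihood(feature, d, forward_model, Cee):
    """Gaussian log-likelihood log p(d | F) for the meas eq d = G(F) + e,
    with e ~ N(0, Cee)."""
    r = d - forward_model(feature)                 # data-model residual
    quad = r @ np.linalg.solve(Cee, r)             # r^T Cee^{-1} r
    _, logdet = np.linalg.slogdet(Cee)             # log|Cee|, numerically stable
    return -0.5 * (quad + logdet + len(d) * np.log(2.0 * np.pi))
```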

  16. Feature Ensembles – Measurement Errors & Likelihood
Rainfall example – interpolation problem.
An alternative to the Gaussian assumption: difference the noisy measurements and the ground truth (where available) to obtain measurement error replicates, and infer the likelihood from this measurement error ensemble. The error structure here is more complex: the measurement error is large & localized in the vicinity of the rainy area (non-Gaussian).
[Figure: satellite measurement, NOWRAD ground truth, and their difference (error replicate 1), showing false rain, missed rain, and the effect of position error.]

  17. Inferring Non-Gaussian Probabilities from Ensembles
Derive probabilities by clustering (binning) prior or measurement error replicates in sample space. This becomes more difficult as dimensionality increases (emptiness of space). So map the high-dimensional sample space (x1, x2, x3, …) to a lower-dimensional space (y1, y2, …) to assign probabilities. But how? Essential attributes need to be preserved.
One approach (from computational chemistry and data mining), sketched below:
• Construct low-dimensional spaces that are distance-preserving: features close in the original space are close in the low-dimensional space.
• Define probabilities from feature densities in the low-dimensional space (e.g. use kernel densities).
Ongoing research!
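One way this recipe might look in code: a classical MDS embedding (distance-preserving in the least-squares sense) followed by a kernel density estimate in the low-dimensional space. This is a sketch of the general idea, not the specific method used in the talk:

```python
import numpy as np
from scipy.stats import gaussian_kde

def embed_and_density(replicates, k=2):
    """Map replicates (rows) to a k-D space via classical MDS, then fit a
    Gaussian kernel density there. Needs more replicates than dimensions."""
    X = np.asarray(replicates, dtype=float)               # (n_repl, n_pixels)
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # squared distances
    J = np.eye(len(X)) - 1.0 / len(X)                     # double-centering
    B = -0.5 * J @ D2 @ J                                 # Gram matrix
    w, V = np.linalg.eigh(B)
    idx = np.argsort(w)[::-1][:k]                         # k leading eigenpairs
    Y = V[:, idx] * np.sqrt(np.maximum(w[idx], 0.0))      # low-dim coordinates
    return Y, gaussian_kde(Y.T)                           # density in k-D space
```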

  18. Generating Feature Proposals
Proposals:
• Should be realistic – consistent with prior information (e.g. the training image)
• Should cover the support of the posterior probability
• Should have a high acceptance probability (efficient)
A possible option: obtain new proposals by perturbing the feature's level set function – the signed distance function whose zero level set bounds the feature support. Compare to the truncated Gaussian approach [Agbalaka & Oliver, 2008].

  19. Feature Proposals from Perturbed Level Set Functions
Approximate the level set function of a 6400-pixel image with its 100 leading KLT coefficients. Randomly perturb the coefficients to get proposals; larger level set perturbations produce proposals that differ more from the original image (compare the corresponding signed distance functions). Perturbations should be constrained to ensure that proposals are consistent with the training image. A proposal-generation sketch follows.
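A sketch of this proposal mechanism under stated assumptions: the KLT basis comes from an SVD of a prior ensemble of signed distance functions, the feature interior is taken as phi > 0, and sigma is an arbitrary perturbation scale:

```python
import numpy as np

rng = np.random.default_rng(2)

def klt_basis(Phi, k=100):
    """Mean and k leading KLT (PCA) modes of an ensemble of signed distance
    functions. Phi: (n_pixels, n_repl)."""
    mean_phi = Phi.mean(axis=1)
    U, _, _ = np.linalg.svd(Phi - mean_phi[:, None], full_matrices=False)
    return mean_phi, U[:, :k]

def levelset_proposal(coefs, basis, mean_phi, sigma=0.1):
    """Perturb the leading KLT coefficients and re-threshold the perturbed
    level set function to get a proposed feature support."""
    new_coefs = coefs + sigma * rng.standard_normal(coefs.shape)
    phi = mean_phi + basis @ new_coefs             # perturbed level set
    return phi > 0.0, new_coefs                    # inside = positive (assumed)
```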

  20. Preliminary Tests – Importance Sampling
Select proposals from a prior ensemble obtained from the training image; Gaussian measurement error. 2D water flood with dynamic measurements in 13 wells (production & injection).
[Figures: true field and the 5 most likely proposals, based on production data; selected BHP replicates vs. measurements; total oil recovery for the true field, the 10 most likely replicates, and the 700 most likely replicates.]

  21. Research Challenges for Feature-based Ensemble DA
• Identification of application-appropriate probabilistic measures – We cannot always rely on a Gaussian description. How can we infer feature probabilities from training images or measurement error ensembles?
• High dimensionality – Non-Gaussian probabilities are difficult to define and use when the measurement and/or feature spaces are high-dimensional. Probabilities need to be identified in low-dimensional spaces.
• Proposal generation – Proposals should be physically realistic and sufficiently diverse to improve on the prior. But it is more difficult to derive probabilities of complex proposals.
• Ensemble collapse – Methods can become over-confident and get stuck. Versions of this problem are encountered in most Bayesian approaches.
• Computational demands – Large ensemble sizes are infeasible with expensive models. We need reduced-order models to use in the proposal generation/selection process (e.g. Ma et al., 2006).
• What do we do with the results? – It is difficult to interpret very large posterior ensembles. Screen the results and identify the most likely features (modes).
