Introduction to Data Assimilation



  1. Introduction to Data Assimilation 28 November 2012

  2. Thematic Outline of Basic Concepts • What are the basic principles to follow when constructing an initial atmospheric state analysis? • What are the differences between intermittent and continuous data assimilation? • What is the underlying statistical framework for data assimilation?

  3. Constructing an Initial Analysis • Starting Point: a first guess estimate of the atmospheric state • Commonly obtained from a numerical model product, particularly a short-range (1-6 h) forecast. • Partially alleviates issues with model spin-up; smaller-scale circulations are present within such an initial analysis. • Mitigates issues associated with missing observations; the first guess is carried forward until data exists to correct it.

  4. Constructing an Initial Analysis • Incorporate data: modify the first guess based upon data characteristics and physical principles • Data should be incorporated holistically; modifications to a given field should manifest in other, interrelated fields. • Ensure that atmospheric features and gradients are faithfully and accurately depicted. • Modify the first guess most substantially where data density is greatest and less substantially where it is lowest. • Modify the initial state only on the scales that are truly resolved by the model.

  5. Intermittent/Sequential DA [Slide schematic: the assimilation cycle, where n = 0 corresponds to 3DDA and n ≠ 0 corresponds to 4DDA. Typical values for m: ~1-6 h (only for limited-area simulations).]

  6. Intermittent/Sequential DA • Examples: 3D-Var, EnKF, optimal interpolation • Observations are nominally assimilated in batches at a given analysis (or forecast start) time. • Time-space conversion is implicitly used as necessary. • Assimilation uses data to modify the first guess. • The value for m is known as the cycling interval. • Smaller m: more akin to continuous DA (described next).

  7. Intermittent/Sequential DA [Slide schematic: m = 6 h (cycling interval); observations are assimilated in batches centered on 6-h analysis times.]
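To make the cycle concrete, the following is a minimal runnable sketch (not from the lecture) of an intermittent assimilation loop with a scalar state; the toy model, error variances, and observation values are all invented for illustration.

```python
import numpy as np

# Toy intermittent (sequential) DA cycle with a scalar state. The
# "model" below and every number are invented for this sketch; a real
# system would use a full NWP model and multivariate updates.

def forecast_model(x, hours):
    """Toy dynamics: damp the state toward a climatology of zero."""
    return x * np.exp(-0.05 * hours)

def assimilate_batch(x_b, y, var_b=1.0, var_o=0.5):
    """Scalar least-squares update: x_a = x_b + k (y - x_b)."""
    k = var_b / (var_b + var_o)        # weight on the observation
    return x_b + k * (y - x_b)

m = 6                                   # cycling interval (h)
x = 2.0                                 # initial first guess
for y in (1.8, 1.5, 1.1):               # one observation batch per cycle
    x = forecast_model(x, hours=m)      # short-range forecast -> background
    x = assimilate_batch(x, y)          # batch assimilation -> analysis
    print(f"analysis: {x:.3f}")         # analysis seeds the next cycle
```

Each pass through the loop is one cycle: the model carries the previous analysis forward m hours to supply the first guess, and the observation batch centered on the new analysis time corrects it.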

  8. Continuous DA • Example: 4D-Var • Enables data to be assimilated when observed rather than in batches centered on some analysis time. • To continuously assimilate data, the model must always be running. • Thus, the cycling interval is arbitrarily small. • This permits forecasts to be launched at any desired time from the cycled model/DA system.

  9. Continuous DA

  10. Continuous DA Illustrative Example: Newtonian Relaxation/Nudging • Nudges the first guess toward a prescribed, presumably ‘correct’ value provided by observations or an external analysis of observations. • The amount of nudging is proportional to the difference between the first guess and the observation(s). It occurs over a specified relaxation time scale, or τ.

  11. Continuous DA • The nudging form of a prognostic equation: ∂f/∂t = F(f, x, t) + (fobs – f)/τ • f = dependent model variable • fobs = observed value for f • F = physical process terms (advection, parameterizations, etc.) • x = spatial vector (x, y, z) • τ = relaxation time scale

  12. Continuous DA Illustrative Example: Newtonian Relaxation/Nudging • The model integrates the primitive equations, solving them at each model grid point and time step, as normal. • In this method, data assimilation is accomplished through the nudging forcing term that is present within each equation. • This forcing term is zero in the absence of data in close spatiotemporal proximity to the grid point being considered.
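A minimal runnable sketch (not from the lecture) of this equation for a scalar f, integrated with forward Euler; the non-nudging forcing F and every parameter value are invented for illustration.

```python
import numpy as np

# Newtonian relaxation/nudging for a scalar: df/dt = F + (f_obs - f)/tau,
# integrated with forward Euler. F and all values are illustrative only.

def nudge(f0, f_obs, tau, F=-0.1, dt=0.05, n_steps=200):
    """Integrate the nudged tendency equation and return the trajectory."""
    f = f0
    trajectory = [f]
    for _ in range(n_steps):
        tendency = F + (f_obs - f) / tau   # physics + nudging forcing
        f += dt * tendency                 # forward Euler step
        trajectory.append(f)
    return np.array(trajectory)

# Small tau: strong nudging, the state hugs the observation.
# Large tau: weak nudging, the state drifts with the physics term F.
for tau in (0.5, 5.0):
    traj = nudge(f0=280.0, f_obs=282.0, tau=tau)
    print(f"tau = {tau}: final f = {traj[-1]:.2f}")
```

Running the two cases previews the trade-off discussed on slide 14: the small-τ run locks onto the observation almost immediately, while the large-τ run lets the forcing term dominate.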

  13. Continuous DA Illustrative Example: Newtonian Relaxation/Nudging • Relaxation time scale τ is a function of… • An amplitude factor (G), determining how much to change f in relation to the other forcings F. • An influence factor (W), determining over how large of a 4-D area to use the observation to modify the atmospheric state. • An observation quality factor (ε), determining how much weight to give an observation in relation to its expected observational error.

  14. Continuous DA Illustrative Example: Newtonian Relaxation/Nudging • To best make use of an observation, the relaxation time scale must be appropriately set. • Too small: the model adjusts quickly, causing simulated phenomena to temporarily evolve in a non-physical manner. This can impact balance in the simulated atmosphere and the stability of the model. • Too large: observations do not sufficiently correct for errors within the continually-cycling model solution. • We will discuss considerations primarily related to the specification of W as we delve into specific DA examples.

  15. Continuous DA • Assimilating continuously rather than in batches theoretically improves analysis (and forecast) quality. • However, doing so is somewhat more complex and computationally expensive than intermittent DA. • The model is continually running and ingesting observations, and longer-range forecasts are also launched from it at specified intervals. • Time constraints associated with operational NWP have historically limited the widespread use of continuous DA in operational environments.

  16. Cautionary Note • The resolution of observations is typically coarser than that of the model used for DA and forecasts. • Nudging a finer-scale analysis toward a large-scale set of observations can disrupt or dampen the largely spun-up finer-scale circulations within the model. • Care must be taken upon assimilation to modify the state only on the larger scales resolved by the observations.

  17. Data Assimilation • The process by which observations are incorporated into an estimate of the initial atmospheric state. • The true atmospheric state is unknowable. • Want the best possible estimation while satisfying an appropriate balance condition within the model. • Observations are incorporated… • Over a period of time (i.e., not suddenly at one time) • Not just at the location of the observation • Not just for the variable(s) observed

  18. Data Assimilation: Definitions • State vector (x): the vector that defines the simulated atmospheric state. • Analysis, observational, and representativeness errors all keep the state vector from matching reality. • True state vector (xt): the best possible representation of the simulated atmospheric state on the model grid. • Representativeness errors from discretizing a continuous fluid on a finite model grid keep this from matching reality.

  19. Data Assimilation: Definitions • Perfect state vector (xp): reality • Background (xb): a “first guess” estimate of the initial atmospheric state • Analysis (xa): the post-assimilation estimate of the simulated atmospheric state • Each of the aforementioned vectors is of dimension n, where n = # of variables * # of grid points.

  20. Data Assimilation: Definitions • Ideally, xa = xt. Since this is generally not feasible, however, we desire to minimize the error in xa, i.e., the difference xa – xt. • The analysis and background are related to each other through the use of an analysis increment δx that is dependent upon the observations: xa = xb + δx

  21. Data Assimilation: Definitions • Observation vector (y): collection of observations of dimension p, where p = # of observations. • Data assimilation starts by comparing y to xb. • This is done in observation space, for the observed variable at its location, rather than on the model grid. • Forward/transform operator H(x): transforms a field from model space to observation space. • In its simplest form, H(x) is an interpolation operator. It may also act to convert between related variable types.
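A minimal runnable sketch (not from the lecture) of a forward operator acting as linear interpolation from a 1-D model grid to observation locations; the grid, field, and observation locations are invented for illustration.

```python
import numpy as np

# Simple forward/transform operator H(x): linearly interpolate a 1-D
# gridded model field to observation locations. All values are invented.

def H(x_model, grid_coords, obs_coords):
    """Map model space (values at grid_coords) into observation space
    (values at obs_coords) by linear interpolation."""
    return np.interp(obs_coords, grid_coords, x_model)

grid = np.arange(0.0, 10.0)               # model grid point locations
x_b = 280.0 + 0.5 * grid                  # background field on the grid
obs_locations = np.array([2.4, 7.1])      # where the obs were taken

print(H(x_b, grid, obs_locations))        # the background in obs space
```

With H in hand, the background can be compared directly against y at the observation locations, which is exactly the comparison the next slide formalizes.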

  22. Data Assimilation: Definitions • Innovation: difference between the observation vector and the transformed background estimate. • Estimate of how much correction of the background state is necessary based upon the observed field(s). • Analysis residual: difference between the observations and the transformed analysis. • Estimate of how the final analysis differs from observations.
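In symbols (standard notation consistent with these definitions, not shown on the slide): the innovation is d = y – H(xb), and the analysis residual is r = y – H(xa).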

  23. Data Assimilation: Definitions • Because we do not have perfect observations available everywhere on the model grid, the analysis residual will be non-zero. • Instead, we seek to minimize the analysis residual. • Must also keep in mind observational error characteristics and the need to maintain dynamical consistency.

  24. Statistical Framework for Data Assimilation Least-Squares Estimation • Consider the temperature in Milwaukee. • True value: Tt • Two estimates of the temperature… • Observation: To • Background: Tb (however obtained) • Both To and Tb are imperfect measurements of Tt. • Observational error: εo • Background error: εb

  25. Statistical Framework for Data Assimilation • To obtain an analysis temperature (Ta), we need to optimally combine Tb and To based upon their individual error characteristics. • Define: εo = To – Tt and εb = Tb – Tt. • Recall: expected value E( ) • Analogous to the mean of an infinitely-sampled discrete random variable.

  26. Statistical Framework for Data Assimilation • Assume: that the means by which To and Tb are obtained are unbiased. • In other words, errors in To and Tb are random, with E(εo) = E(εb) = 0. • We also assume that we know something about the error characteristics of To (εo) and Tb (εb). • Recall: variance σ2 = (1/N) Σ (xi – μ)2, which defines the average of the squared error. • μ = mean (analogous to Tt) • x = estimate (like To or Tb) • N = population size

  27. Statistical Framework for Data Assimilation • This allows us to write σo2 = E(εo2) and σb2 = E(εb2). • We assume that the background and observational errors are uncorrelated to each other, such that E(εoεb) = 0.

  28. Statistical Framework for Data Assimilation • The least-squares best-fit of To and Tb to obtain Ta is given by: Ta = aoTo + abTb, where ao + ab = 1 (fractional coefficients on To and Tb). • The coefficients ao and ab are chosen to minimize the mean squared error of Ta (defined by σa2), i.e., σa2 = E((Ta – Tt)2) = E((ao(To – Tt) + ab(Tb – Tt))2) (noting that Tt is equivalent to aoTt + abTt in the above).

  29. Statistical Framework for Data Assimilation • Substituting with εo and εb, σa2 = E((aoεo + abεb)2) = ao2E(εo2) + 2aoabE(εoεb) + ab2E(εb2). • Since εo and εb are uncorrelated, the 2aoabE(εoεb) term goes away. • By definition, E(εo2) = σo2 and E(εb2) = σb2. Thus, σa2 = ao2σo2 + ab2σb2.

  30. Statistical Framework for Data Assimilation • Let ao = k. Thus, ab = 1 – k. We call k an optimal weighting factor. Substituting, σa2 = k2σo2 + (1 – k)2σb2. • Recall that we want to minimize the mean squared error σa2. By definition, this occurs for dσa2/dk = 0. Thus, 2kσo2 – 2(1 – k)σb2 = 0.

  31. Statistical Framework for Data Assimilation • If we solve for k, we obtain: k = σb2 / (σo2 + σb2). • This is the background error variance divided by the total error variance. • Larger background uncertainty = more correction by observations, given that k is the weight on To. • Conversely, less background uncertainty = less correction to the background state, given that 1 – k is the weight on Tb.

  32. Statistical Framework for Data Assimilation • Because of the definitions of Ta and k, we can write: Ta = Tb + k(To – Tb). • By definition, To – Tb is the innovation. • Formally, this requires a transform between model and observation space, but we’ll assume this has been done. • The analysis temperature is equal to the background temperature plus an optimally-weighted innovation. • The weighted innovation is simply the analysis increment!

  33. Statistical Framework for Data Assimilation • If we plug in for k, we obtain: Ta = [σb2 / (σo2 + σb2)] To + [σo2 / (σo2 + σb2)] Tb. • Because ao + ab = 1, σo2 / (σo2 + σb2) must equal 1 – k.

  34. Statistical Framework for Data Assimilation • The analysis temperature thus depends upon the variances of the estimates of Tb and To (or, in other words, the expected errors of each estimate). • If the error in one is large, the other is given more weight. • Minimal observation error: analysis resembles observation. • Large observation error: analysis resembles background. • Can make similar arguments based upon background error.

  35. Statistical Framework for Data Assimilation • Plug in with the k minimizing σa2 to obtain: σa2 = σo2σb2 / (σo2 + σb2). • This is equivalent to stating that σa2 = kσo2, or σa2 = (1 – k)σb2. Since both k ≤ 1 and 1 – k ≤ 1, this means that σa2 is less than either σo2 or σb2. • In other words, the analysis variance is smaller than the variance of both the background and observation.

  36. Statistical Framework for Data Assimilation • Likewise, take the inverse of the previous equation: 1/σa2 = 1/σo2 + 1/σb2. • The inverse of the variance is known as the precision. • The precision of the analysis is equal to the additive precisions of the background and observation. • Estimates with less error have higher precision. Two good estimates result in a very good analysis!
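A minimal runnable sketch (not from the lecture) of this scalar least-squares combination; the temperatures and error variances are invented for illustration.

```python
import numpy as np

# Scalar least-squares analysis: optimally combine a background and an
# observation of temperature. All numbers are invented for this sketch.

T_b, var_b = 271.0, 4.0      # background temperature, error variance
T_o, var_o = 273.0, 1.0      # observed temperature, error variance

k = var_b / (var_b + var_o)              # optimal weight on the obs
T_a = T_b + k * (T_o - T_b)              # background + weighted innovation
var_a = var_o * var_b / (var_o + var_b)  # analysis error variance

print(f"k = {k:.2f}, T_a = {T_a:.2f}, var_a = {var_a:.2f}")
# The precision relation from slide 36: 1/var_a = 1/var_o + 1/var_b.
assert np.isclose(1.0 / var_a, 1.0 / var_o + 1.0 / var_b)
```

Here the background is the more uncertain estimate (var_b > var_o), so k = 0.8 and the analysis lands much closer to the observation, with var_a = 0.8 smaller than either input variance.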

  37. Statistical Framework for Data Assimilation Cost Function Minimization • We want to find the analysis that minimizes the combined squared errors in To and Tb, each as weighted by the precision of their measurements: J(Ta) = J(Tb) + J(To) = (Ta – Tb)2/σb2 + (Ta – To)2/σo2 • Before: minimizing analysis variance (similar concept)

  38. Statistical Framework for Data Assimilation • Cost = squared error weighted by precision • High cost = large sq. error, low precision • Low cost = small sq. error, high precision

  39. Statistical Framework for Data Assimilation • Similar to before, the minimum is defined by dJ/dTa = 0, i.e., 2(Ta – Tb)/σb2 + 2(Ta – To)/σo2 = 0. • Manipulating to solve for Ta, we obtain: Ta (1/σb2 + 1/σo2) = Tb/σb2 + To/σo2.

  40. Statistical Framework for Data Assimilation • Continuing from the previous slide, Ta = (Tb/σb2 + To/σo2) / (1/σb2 + 1/σo2) = (σo2Tb + σb2To) / (σo2 + σb2). • Equivalent result to the least-squares method, except using a different framework for the problem!
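A minimal runnable sketch (not from the lecture) that minimizes this scalar cost function numerically and checks the result against the closed-form least-squares answer; scipy's minimize_scalar stands in for the iterative minimizers real variational systems use, and all numbers are invented.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Minimize J(T) = (T - T_b)^2/var_b + (T - T_o)^2/var_o numerically and
# verify it reproduces the least-squares analysis. Values are illustrative.

T_b, var_b = 271.0, 4.0
T_o, var_o = 273.0, 1.0

def J(T):
    """Background and observation terms, each weighted by its precision."""
    return (T - T_b) ** 2 / var_b + (T - T_o) ** 2 / var_o

T_a_var = minimize_scalar(J).x                         # variational answer
T_a_lsq = T_b + var_b / (var_b + var_o) * (T_o - T_b)  # least-squares answer

print(f"variational: {T_a_var:.3f}, least-squares: {T_a_lsq:.3f}")
assert np.isclose(T_a_var, T_a_lsq)                    # same analysis either way
```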

  41. Statistical Framework for Data Assimilation • Both least-squares and cost function minimization are used in real-world data assimilation systems. • 3D-Var, 4D-Var: cost function minimization • Kalman filter: form of least-squares minimization • Thus far, we’ve only considered a simple example. • One variable, one time, one location, no transformation from model to observation space needed. • In reality, however, the problem is multidimensional!

  42. Statistical Framework for Data Assimilation • Whether in one or many dimensions, the accurate computation of the variance terms is crucial to obtaining the best-possible analysis state. • In our simple example, the observation and background variances determined the weighting upon each observation. • In other words, they influenced the magnitude of the analysis increment: Ta – Tb = k(To – Tb).

  43. Statistical Framework for Data Assimilation • In multiple dimensions, they also influence the spread of information. • Controls how a point measurement impacts and/or is influenced by surrounding grid points. • Three different manifestations of information spread… • Between the same variable at different locations. • Between different variables at the same location. • Between different variables at different locations. • Spread can be isotropic (e.g., conically decaying with distance) or non-isotropic (e.g., non-uniform) in nature, depending on the DA method.

  44. Statistical Framework for Data Assimilation • We begin developing the multidimensional problem by considering the background variance, σb2. • The multidimensional analog is the background error covariance matrix, or B. • The purpose of B is to translate information from an innovation vector (y – H(xb)) into a spatially-varying analysis increment (δx) and apply it to the background to minimize the analysis error (xt – xa).
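One standard way this translation is carried out (a minimal sketch, not the lecture's notation) is the optimal-interpolation-style update δx = B Hᵀ (H B Hᵀ + R)⁻¹ (y – H(xb)); below, an invented Gaussian-decay B on a 1-D grid spreads a single-point innovation into a smooth increment.

```python
import numpy as np

# How B spreads information: a single observation's innovation becomes
# a smooth, spatially-varying increment. Grid, length scale, and error
# variances are all invented for this sketch.

n = 11                                      # grid points (one variable)
coords = np.arange(n, dtype=float)
var_b, L = 1.0, 2.0                         # background variance, length scale

dist = np.abs(coords[:, None] - coords[None, :])
B = var_b * np.exp(-0.5 * (dist / L) ** 2)  # isotropic Gaussian-decay B

H = np.zeros((1, n)); H[0, 5] = 1.0         # obs operator: picks grid point 5
var_o = 0.5                                 # observation error variance (R)
innovation = 1.0                            # y - H(x_b) at that point

# Optimal-interpolation-style update: delta_x = B H^T (H B H^T + R)^-1 d
gain = B @ H.T / (H @ B @ H.T + var_o)
delta_x = (gain * innovation).ravel()
print(np.round(delta_x, 3))                 # bell-shaped, peaked at point 5
```

The off-diagonal covariances in B are what let one point observation adjust neighboring grid points; with a diagonal B, the increment would be confined to the observed point.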

  45. Statistical Framework for Data Assimilation • Simple 1-D example: σb2 = E(εb2), the averaged squared error. • Multidimensional analog: B = E(εbεbT), where T denotes the matrix transpose. • This defines an n x n symmetric, square matrix. • Diagonals: error variances of each background element. • Off-diagonals: cross-covariances between pairs of background elements.

  46. Statistical Framework for Data Assimilation • For the case where n = 3, such as for three variables at one grid point, B takes the form (where em is the background error in the m-th state element):

      | E(e1e1)  E(e1e2)  E(e1e3) |
  B = | E(e2e1)  E(e2e2)  E(e2e3) |
      | E(e3e1)  E(e3e2)  E(e3e3) |

  • As noted before, this helps to define both the spread and amplitude of background adjustments.

  47. Statistical Framework for Data Assimilation • But, how do we actually compute (or estimate) B? • Method 1: pre-calculated B • Often determined from an average of many different atmospheric states, whether from observations (climatology or otherwise) or from model analyses or forecasts. • Typically independent of current meteorological conditions. • i.e., “flow independent” – not necessarily ideal!

  48. Statistical Framework for Data Assimilation • Simple multidimensional problem • 1 variable (z) • 1 altitude (500 hPa) • Only ‘spread’ is in space

  49. Statistical Framework for Data Assimilation • Method 2: regime-dependent B • Makes use of the current, regime-dependent “errors of the day” to estimate the B applicable to the current case. • Robust method; represents current best practices. • Computing power constraints have historically limited operational NWP to the flow independent estimates of B. • However, these are slowly giving way to flow-dependent methods. • Prime example: Ensemble Kalman filtering (EnKF)
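A minimal runnable sketch (not from the lecture) of the ensemble idea behind flow-dependent B: treat a set of perturbed background states as samples and take their sample covariance, as EnKF systems do. The synthetic ensemble below stands in for perturbed short-range forecasts.

```python
import numpy as np

# Flow-dependent estimate of B: sample covariance across an ensemble of
# background states. The synthetic members here are random stand-ins for
# an ensemble of perturbed short-range forecasts.

rng = np.random.default_rng(0)
n, n_members = 4, 50                      # state size, ensemble size

mean_state = np.array([270.0, 272.0, 275.0, 278.0])
err_cov_true = 0.5 * np.eye(n) + 0.5      # correlated "errors of the day"
members = mean_state + rng.multivariate_normal(
    np.zeros(n), err_cov_true, size=n_members
)                                         # shape: (n_members, n)

# Rows are members, columns are state elements -> rowvar=False.
B_ens = np.cov(members, rowvar=False)
print(np.round(B_ens, 2))                 # today's flow-dependent B estimate
```

Because the members are regenerated each cycle from the evolving forecast ensemble, the resulting B reflects the current regime rather than a fixed climatology.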

  50. Statistical Framework for Data Assimilation Flow-Independent Case • ub flat (constant westerly u) • Observation y produces a positive innovation maximized at the observation location, nominally spread isotropically in space. • Result: local bulls-eye in ua, as demonstrated by the isotachs.
