Objective Analysis and Data Assimilation
Fred Carr, COMAP NWP Symposium
Monday, 13 December 1999

Exploring the Components of an NWP System

Objective Analysis Definition
The graphic to the right depicts the basic problem of objective analysis, namely that we have irregularly spaced observations that must provide values for points on a regularly spaced grid. (Red dots represent observations and blue dots are grid points.) Objective analysis in NWP is the process of interpolating observed values onto the grid points used by the model in order to define the initial conditions of the atmosphere.
Why isn’t this just a simple exercise in mathematical interpolation? There are several answers to this question.
1. We can use our knowledge of atmospheric behavior to infer additional information from the data available in the area. For example, we can use balance relationships such as geostrophy or mass continuity to introduce dynamical consistency into the analysis. If we use one type of data to improve the analysis of another, the analysis is said to be multivariate (e.g., height data can be used to help the analysis of winds).
2. We can adjust the analysis procedure to filter out scales of motion that can’t be forecast by the model being used. For example, small mesoscale circulations represented in the observations may need to be smoothed out in an analysis for a global model.
3. We can make use of a first guess field, or background field, provided by an earlier forecast from the same model. The blending of the background fields and the observations in the objective analysis process is especially important in data-sparse areas. It allows us to avoid extrapolating observation values into regions distant from the observation sites. The background field can also provide detail, such as frontal locations, in the gaps between observations.
Using a background field also helps to introduce dynamical consistency between the analysis and the model. In other words, that part of the analysis that comes from the background field is already consistent with the physical (dynamic) relationships implied by the equations used in the model.
4. We can also make use of our knowledge of the probable errors associated with each observation. We can weight the reliability of each type of observation based on past records of accuracy.
In simplest terms, the objective analysis equation attempts to determine the value of a particular meteorological variable at a particular grid point (at a particular valid time).
Analysis Equation

In words, the analysis equation says that the analysis value at a grid point equals a background value at that point plus a weighted sum of corrections derived from nearby observations.
In the simplest kind of objective analysis scheme, the background values would not be used and the analysis would be based solely on new observations. In this case, the equation would reduce to a weighted average of the observed values:

x_a = ( Σ_i w_i y_i ) / ( Σ_i w_i )

where the y_i are the observations and the w_i are their weights.
The observations themselves would be interpolated to the grid point by calculating a weighted average of the data. (One type of weight, for example, is proportional to the distance of the data from the grid point. The farther an observation is from the grid point, the less weight it gets.)
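To make this concrete, here is a minimal Python sketch (not from the original module; the station coordinates, values, and the inverse-distance-squared weight are all illustrative assumptions) of an observations-only analysis at a single grid point:

```python
import numpy as np

# Hypothetical observations: (x, y) positions (arbitrary units) and values.
obs_xy  = np.array([[0.0, 1.0], [2.0, 0.5], [1.5, 2.0]])
obs_val = np.array([10.2, 11.5, 9.8])

def analyze_point(grid_xy, obs_xy, obs_val, eps=1e-6):
    """Weighted average of the observations; the weight falls off
    with distance, so nearby observations dominate."""
    dist = np.linalg.norm(obs_xy - grid_xy, axis=1)
    w = 1.0 / (dist + eps) ** 2        # inverse-square distance weighting
    return np.sum(w * obs_val) / np.sum(w)

print(analyze_point(np.array([1.0, 1.0]), obs_xy, obs_val))
```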
If a grid point has no nearby observations, the simple scheme described here is in trouble!
So how can we solve this problem of data-void areas?
One solution is to start our analysis using a short-range forecast of the same field from an earlier run of some NWP model (usually the same one that will use the analysis). If the forecast period is fairly short, say 3-6 hours, then very little error will have accumulated. This forecast (the background field, or first guess) will provide a much better estimate of the atmosphere over data sparse regions than would an extrapolation of distant observations. In the previous example, a 6-hr forecast of the surface low might produce an estimate at point “A” that would be in error by 2-4 hPa rather than 12 hPa.
The first place the background field is used is in calculating the “correction” values for each observation site.
This correction value, known as the observation increment, is the difference between the observed value and the background value interpolated to that observation point. In other words, the “new information” that will be analyzed to the grid point consists of the changes that the observations make to the background field, rather than the observations themselves.
The background field is also used in the final step of the analysis in that the final analysis value is defined as the background value plus the weighted sum of observation increments (corrections).
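A minimal sketch of this two-step use of the background, again with illustrative numbers (surface pressure in hPa) and with the weights assumed to be already determined:

```python
import numpy as np

def analyze_with_background(xb_grid, xb_at_obs, obs_val, weights):
    """Analysis = background at the grid point + weighted sum of
    observation increments (obs minus background at the obs site)."""
    increments = obs_val - xb_at_obs      # the "new information"
    return xb_grid + np.sum(weights * increments)

# Two nearby observations; where the weights shrink to zero the
# analysis simply returns the background value.
xa = analyze_with_background(xb_grid=1008.0,
                             xb_at_obs=np.array([1006.5, 1007.2]),
                             obs_val=np.array([1005.8, 1006.9]),
                             weights=np.array([0.4, 0.3]))
print(xa)   # 1008.0 + 0.4*(-0.7) + 0.3*(-0.3) = 1007.63
```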
The use of the background field ensures that the analysis will blend smoothly from regions with good data coverage to regions with sparse or no data (where the background field is allowed to dominate in determining the analysis value). Because this provides a better analysis of data-sparse regions than an extrapolation of the observations, all objective analysis schemes used in NWP use background fields.
For this reason, a very high priority in improving objective analysis is to improve the background field. Two ways to do this are discussed in the data assimilation section below.
Each observation increment is weighted based on its perceived accuracy and validity. The biggest difference among objective analysis schemes is how the weighting of observation increments is done.
Ideally, the weight factor should take into account four things:
a) Distance from the grid point. Data should be weighted inversely proportional to their distance from the grid point. The closest observations receive the most weight since they should be most representative of the value at the grid point.
Some objective analysis methods (such as the Cressman and Barnes schemes, which are no longer used in NWP) use only this factor in weighting. They are known as distance-dependent schemes. (A sketch of a Cressman-type weight appears after this list.)
b) Accuracy of the observation. If some observations come from a less reliable observing system, the weights should reflect this: more accurate observations should receive more weight.
If two or more observations of the same type are located very close to each other (e.g., surface observations, ACARS data), most operational centers will average these observations to form one value known as a “super-ob.” Since an average value is probably more reliable and representative than a single value, the error assigned to the super-ob will be less, which allows it to have more weight in the analysis.
c) Accuracy of the background field. Forecast errors should be taken into account, just as observation errors are. The error in the background field will be larger in regions that were not updated with new observations during the last analysis step.
d) Density of the observations. If there are a lot of observations in one area, we do not want them to have an exaggerated effect on the analysis value. Redundant data have less independent information to provide to the analysis than an observation that represents a large area by itself (assuming it is reliable).
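To make factor a) concrete, here is the classic Cressman-type distance weight mentioned above (a sketch only; the 500 km radius of influence is an arbitrary illustrative choice):

```python
import numpy as np

def cressman_weight(dist, radius=500.0):
    """Cressman weight: 1.0 at the grid point, falling to 0.0 at and
    beyond the radius of influence (here in km, chosen arbitrarily)."""
    dist = np.asarray(dist, dtype=float)
    w = (radius**2 - dist**2) / (radius**2 + dist**2)
    return np.where(dist < radius, w, 0.0)

print(cressman_weight([0.0, 250.0, 500.0, 800.0]))  # [1.0, 0.6, 0.0, 0.0]
```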
One objective analysis procedure that incorporates all four of the above factors into its weighting is the Optimum Interpolation (OI) scheme. OI is based on a statistical estimation approach that seeks to minimize the analysis errors. Because of the assumptions made in applying OI in operational NWP, the scheme is not totally “optimal,” but its ability to include factors b), c), and d) makes it the most common objective analysis procedure used in NWP.
Consider this example of how an OI scheme handles the uneven distribution of observations.
Initially, all three observations are at an equal distance from each other and from the grid point (we are also assuming no observation error). In this case, all analysis schemes that incorporate distance dependence compute the same weight for each value.
However, if we move observations 2 and 3 toward each other, the OI weights change. In a scheme in which only distance from the grid point is a factor, the weights would always be equal. The OI scheme recognizes that as observations 2 and 3 approach each other, they become more correlated. Thus they represent less independent information to the analysis, and, consequently, will be given less weight.
Note also that even though observation 1 does not move, its weight in the OI scheme increases. As an observation becomes more “lonely” (is less correlated with the other observations), it becomes more important to the analysis.
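The following sketch reproduces this behavior numerically, assuming a Gaussian background-error correlation model with unit length scale and zero observation error (both assumptions of convenience; the positions merely mirror the three-observation example above):

```python
import numpy as np

def oi_weights(obs_xy, grid_xy, L=1.0):
    """OI weights solve (B + R) w = b, where B holds background-error
    correlations between observation sites and b the correlations
    between each site and the grid point. Here R = 0 (perfect obs)."""
    def corr(p, q):
        return np.exp(-np.sum((p - q) ** 2) / L**2)
    n = len(obs_xy)
    B = np.array([[corr(obs_xy[i], obs_xy[j]) for j in range(n)]
                  for i in range(n)])
    b = np.array([corr(p, grid_xy) for p in obs_xy])
    return np.linalg.solve(B, b)

grid = np.array([0.0, 0.0])
# Three observations, all at distance 1 from the grid point and
# equidistant from each other: three equal weights.
even = np.array([[1.0, 0.0], [-0.5, 0.866], [-0.5, -0.866]])
print(oi_weights(even, grid))      # ~[0.33, 0.33, 0.33]
# Move obs 2 and 3 toward each other (still distance 1 from the grid
# point): their weights drop and obs 1's weight rises.
bunched = np.array([[1.0, 0.0], [-0.9, 0.436], [-0.9, -0.436]])
print(oi_weights(bunched, grid))   # ~[0.36, 0.25, 0.25]
```

A purely distance-dependent scheme would return equal weights in both cases, since every observation stays the same distance from the grid point.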
Without factor d), a cluster of observations on one side of a grid point would drag the analysis value at the grid point toward the values on that side of the region.
An important question you might be asking yourself is, “How can a forecaster tell if an analysis is any good?”
It is a useful step for a forecaster to estimate how accurate the analysis is over a particular region. This will help in determining the reliability of the subsequent forecast. Although this is a difficult thing to do, it becomes easier with experience.
Here are three guidelines that may prove useful.
Final Comments on the Analysis Process
Data Assimilation Definition

Because the previous forecast (or background field) is so important to the analysis, this forecast should be as accurate as possible. Data assimilation systems attempt to ensure this in two ways.
They make shorter-range forecasts to be used as background fields. Shorter-range forecasts should be more accurate since they are not extrapolating as far into the future. Therefore, the changes to the background fields made by new observations (the “corrections”) should be smaller.
They insert the new observations into the model run itself. This can be done:
by periodic re-analysis (intermittent 4DDA)
by gradual insertion (dynamic relaxation or “nudging”; see the sketch after this list)
by more advanced mathematical blending techniques (e.g., variational 4DDA)
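To give a flavor of the “gradual insertion” option, here is a toy scalar sketch of nudging (the zero physics tendency and the one-hour relaxation timescale are stand-ins, not an operational configuration):

```python
def nudged_forecast(x0, x_obs, steps=100, dt=60.0, tau=3600.0):
    """Toy model integration with a Newtonian relaxation ("nudging")
    term -(x - x_obs)/tau that pulls the state toward the observed
    value over a timescale tau (seconds) while the model runs."""
    x = x0
    for _ in range(steps):
        physics_tendency = 0.0            # stand-in for the real dynamics
        nudging_tendency = -(x - x_obs) / tau
        x += dt * (physics_tendency + nudging_tendency)
    return x

print(nudged_forecast(x0=280.0, x_obs=283.0))  # drifts toward 283 K
```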
Future Data Assimilation

In the diagram, the first analysis produced is A. Although it fits the data well at T−3, it leads to a forecast that does not fit the observations well by T=0. The band of green dots represents the observations. Note that even data collected at the same time do not necessarily agree with each other.
In spring 1998, NCEP replaced the Regional Optimum Interpolation system (DiMego, 1988) with a variational objective analysis scheme known as 3D-Var. This scheme is similar to the one implemented in the global model in June 1991, initially known as the Spectral Statistical Interpolation (SSI) analysis system (Derber et al., 1991; Parrish and Derber, 1992).
3D-Var is the analysis component of an intermittent data assimilation procedure known as EDAS (the Eta Data Assimilation System), in which an analysis is produced every 3 hours. 3D-Var has most of the beneficial properties of optimum interpolation discussed earlier, but it has several advantages over OI.
Like OI, 3D-Var seeks to produce an analysis by minimizing the difference between the analysis and a judicious combination of a previous forecast (the background or first guess field) and the observations. That is, we want to minimize a “distance function” J which consists of
J = J_B + J_O + J_C

where J_B measures the misfit to the background, J_O the misfit to the observations, and J_C imposes additional (e.g., balance) constraints.
A common form for the J_B term is

J_B = 1/2 (x − x_b)^T B^(-1) (x − x_b)

where x is the analysis variable (e.g., temperature), x_b is the background field, and B is the background error covariance matrix. Its inverse, B^(-1), acts as the weight given to the first guess field (good forecasts get high weight; poor forecasts get low weight).
The J_O term has an analogous form, J_O = 1/2 (y − H(x))^T R^(-1) (y − H(x)), where y is the vector of observations, R is the observation error covariance matrix, and H is the observation operator that converts model variables into the observed quantities; for conventional data, H is essentially spatial interpolation. However, if y represents radiance data from satellites, then H is a set of radiative transfer equations that computes radiances from model temperature and moisture data. Thus the relatively accurate observed radiances are used directly to correct model-estimated radiances, and these corrections are fed back into the analyzed temperature and moisture variables through the solution process.
The solution is obtained by minimizing J through “standard techniques” which we won’t get into. It is important to note that all the points are analyzed at once, using all of the available data.
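A toy sketch of that minimization in Python (a two-variable state, a single linear observation, made-up covariances, and the J_C term omitted), using a generic optimizer:

```python
import numpy as np
from scipy.optimize import minimize

xb = np.array([285.0, 278.0])            # background (first guess), K
B  = np.array([[1.0, 0.5],               # background error covariance:
               [0.5, 1.0]])              # the two points are correlated
y  = np.array([287.0])                   # a single observation of x[0]
R  = np.array([[0.25]])                  # observation error covariance
H  = np.array([[1.0, 0.0]])              # linear observation operator

Binv, Rinv = np.linalg.inv(B), np.linalg.inv(R)

def J(x):
    """J = J_B + J_O (the J_C constraint term is omitted here)."""
    db = x - xb
    do = y - H @ x
    return 0.5 * db @ Binv @ db + 0.5 * do @ Rinv @ do

xa = minimize(J, xb).x       # all points analyzed at once
print(xa)                    # ~[286.6, 278.8]: x[0] pulled toward the
                             # obs, x[1] adjusted via the B correlation
```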
1. Many more “non-traditional” observation types can be included, and they can be included “more properly.” In other words, the observed quantities do not have to be the model variables.
For example, before 3D-Var, satellite radiance data were used to “retrieve” temperature soundings, which are of much lower quality than the radiances themselves. These temperature retrievals, while useful in the Southern Hemisphere, did not improve forecasts over North America. Using the radiance data directly, however, improves all forecasts.
Thus 3D-Var can be used to optimize the information content from all types of satellite imagery and sounder data, GPS data, Doppler radar radial wind and reflectivity, ground-based sensors (e.g. - lidars), etc.
2. All of the observations are used for all of the grid points. Previous schemes used only 30-40 data values per grid point, chosen via subjective “data selection” routines; the chosen data may not be optimal for a particular point if, for example, the atmospheric structure is highly anisotropic. Using all of the observations at once also removes potential discontinuities in the analysis.
3. No separate initialization step is required.
The Eta model forecast variables for which analyses are needed are temperature, wind, specific humidity, and surface pressure (no analysis is done for the cloud water/ice variable due to a lack of observations; this field is allowed to “spin up” during the EDAS cycle). These fields make up the analysis vector x. Recall, however, that the data types are not restricted to these variables.
Current data types used in the analysis include:
1. Surface land wind/temp./moisture obs.
2. Surface marine obs. (ships, buoys)
3. Rawinsondes (u, v, Z, T, RH)
4. Conventional and ACARS aircraft data
5. Cloud-tracked winds from GOES, Japanese and European satellites (via visible, IR and WV imagery)
6. Wind speeds over water from SSM/I
7. GOES and SSM/I precipitable water retrievals (from microwave radiances)
8. Infrared radiances from polar-orbiting and GOES satellites
9. Profiler winds from NOAA’s WPDN
10. VAD wind profiles