Retrieval Algorithm Frameworks • Dave Turner, NOAA National Severe Storms Laboratory • Kerstin Ebell, University of Cologne • DOE / EU Ground-based Cloud and Precipitation Retrieval Workshop
Motivation • Remote sensors seldom measure the quantity that is really desired • So we must “retrieve” the quantity we desire from the observations that are made • Often an ill-defined problem (i.e., there is usually not enough information in the observations) • Classical analogy from Stephens 1991: “Remote sensing is like characterizing an animal from the tracks it makes in the sand”
Tracks in the Sand • What type of animal? • Large or small? • Young or old? • Male or female? • What color?
From Observations to Geophysical Variables [Diagram: the forward RT model F maps the geophysical variable X (what we want to know) to the radiance or backscatter Y (what we observe). The measurement is Ym = Y + ε, and the retrieval maps Ym back to an estimate X̂ = X + δX.]
The Retrieval Challenge • Desire “observations” of geophysical variables to improve our understanding of the Earth system • Remote sensing observations provide information about the Earth system, but are not direct observations of the geophysical variables we desire • Must “retrieve” the geophysical variables from the observations • Typically is an ill-defined problem • Noise hinders the retrieval; so does resolution • Metadata (data about the data) can help constrain the problem • Additional observations also help • Important to consider the uncertainties in the retrieved quantities • Calibration, calibration, calibration
Basic Retrieval Classes • “Regression” methods • Linear, quadratic approaches, neural networks, etc. • “Tuned” to mean conditions; no guarantee that retrieved profiles are consistent with the observation • Computationally fast and always produce an “answer” • Could be developed from • Simulated observations • Collocated observations • “Iterative” methods • Use a forward model and a first guess • Retrieved profiles are consistent with the observation • Significantly slower than regression methods • Often provide case-specific error characterization
Example: Liquid Water Path • Many MWRs observe around 23 and 31 GHz • These observations are sensitive to the LWP and the amount of precipitable water vapor (PWV) in the column • Because of the small size of cloud droplets with respect to the wavelength, the cloud droplets are in the Rayleigh scattering regime and thus the MWR observations are insensitive to cloud droplet size • Observed signal is proportional to the third moment (i.e., ⟨r³⟩) of the size distribution spectrum
“Orthogonal” [Figure labels: LWP, PWV]
Retrieving LWP from the MWR (1) • Purely a statistical method • Can use a historical dataset to determine the coefficients a_x • Requires direct observations of LWP (e.g., from aircraft) • OR a forward radiative transfer model to simulate the observations • Coefficients are site and season dependent • Fast and easy
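A minimal sketch of the simulated-data route to the regression coefficients. The slide's own equation did not survive, so the quadratic form in the two brightness temperatures, the function `toy_forward`, and every numerical value below are assumptions chosen only for illustration; a real retrieval would use an actual radiative transfer model and a site-specific training set.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_forward(pwv_cm, lwp_gm2):
    """Hypothetical linear stand-in for a radiative transfer model (not a real RT code)."""
    tb23 = 10.0 + 13.0 * pwv_cm + 0.012 * lwp_gm2
    tb31 = 8.0 + 4.0 * pwv_cm + 0.030 * lwp_gm2
    return tb23, tb31

# Simulated training set: random atmospheric states plus radiometric noise.
n = 2000
pwv = rng.uniform(0.5, 5.0, n)        # precipitable water vapor [cm]
lwp = rng.uniform(0.0, 500.0, n)      # liquid water path [g/m2]
tb23, tb31 = toy_forward(pwv, lwp)
tb23 = tb23 + rng.normal(0.0, 0.3, n)  # assumed 0.3 K noise
tb31 = tb31 + rng.normal(0.0, 0.3, n)

def design(tb23, tb31):
    """Assumed regression form: LWP = a0 + a1*Tb23 + a2*Tb31 + a3*Tb23^2 + a4*Tb31^2."""
    return np.column_stack([np.ones_like(tb23), tb23, tb31, tb23**2, tb31**2])

# Least-squares fit of the coefficients a_x.
coeffs, *_ = np.linalg.lstsq(design(tb23, tb31), lwp, rcond=None)

def retrieve_lwp(tb23, tb31):
    """Apply the fitted regression to new brightness temperatures."""
    tb23 = np.atleast_1d(np.asarray(tb23, dtype=float))
    tb31 = np.atleast_1d(np.asarray(tb31, dtype=float))
    return design(tb23, tb31) @ coeffs

rmse = np.sqrt(np.mean((retrieve_lwp(tb23, tb31) - lwp) ** 2))
```

The fit always returns a number, even for inputs far outside the training set, which is the "fast and always produces an answer" behavior of regression methods.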
Retrieving LWP from the MWR (2) Compute “opacity” at each frequency • Also a purely statistical method • Again, use historical data or simulated data to determine retrieval coefficients Tmr,dry, Lx • Advantages over other method: • Linear rather than quadratic • Less scatter than other method (i.e., better statistical fit) • Coefficients (Tmr,dry,L1, L2) are site/season dependent
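The opacity step can be sketched as follows. The slide's formula was lost, so this assumes the commonly used relation between brightness temperature and opacity via a mean radiating temperature, with a dry-opacity offset subtracted before the linear combination. The parameter names (`t_mr`, `tau_dry`, `l1`, `l2`) mirror the slide's coefficients, but their exact roles here and any numerical values supplied to these functions are assumptions.

```python
import numpy as np

T_COSMIC = 2.73  # cosmic background brightness temperature [K]

def opacity(tb, t_mr):
    """Opacity from brightness temperature tb [K] and mean radiating temperature t_mr [K].

    Assumed form: tau = ln((t_mr - T_COSMIC) / (t_mr - tb)).
    """
    tb = np.asarray(tb, dtype=float)
    return np.log((t_mr - T_COSMIC) / (t_mr - tb))

def lwp_from_opacity(tb23, tb31, t_mr23, t_mr31, tau_dry23, tau_dry31, l1, l2):
    """Linear retrieval in opacity space: LWP = l1 * tau23' + l2 * tau31'.

    tau' is the opacity with the (assumed) dry contribution removed.
    All coefficients are site and season dependent and must be fitted.
    """
    tau23 = opacity(tb23, t_mr23) - tau_dry23
    tau31 = opacity(tb31, t_mr31) - tau_dry31
    return l1 * tau23 + l2 * tau31
```

Working in opacity removes most of the saturation non-linearity of brightness temperature, which is one way to read the slide's "linear rather than quadratic" advantage.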
Clear Sky LWP Retrieval [Figure: liquid water path [g/m2] vs. hour [UTC] from the opacity regression retrieval]
Improved Regression Retrieval • More information can often improve retrievals • J.C. Liljegren used surface meteorology to “predict” the retrieval coefficients Tmr, dry, L1, L2 • Removed the site and seasonal dependence • Improved accuracy of retrieved LWP
Improved Clear Sky LWP Retrieval [Figure: liquid water path [g/m2] vs. hour [UTC] from the improved regression retrieval and the opacity regression retrieval]
Iterative Retrieval • Retrievals are used to ‘invert’ the radiative transfer • Regression approaches frequently will not agree with the observation in a ‘closure study’ • Iterative retrieval uses the actual forward model in an iterative manner • 1. Start with a first guess of the atmospheric property of interest • 2. Compute the radiance (“obs”) using the forward model • 3. Compare the computed “obs” with the real observation, and modify the guess accordingly • Repeat steps 2-3 until the computed “obs” matches the real observation (within uncertainties) • Results will “close” with observations if the retrieval converged
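The loop described on this slide can be sketched for a single retrieved variable. The forward model `forward` is a hypothetical saturating function invented for illustration, and the Newton-style update is one possible way to "modify the guess"; the slide does not prescribe a particular update rule.

```python
import math

def forward(lwp):
    """Hypothetical forward model: brightness temperature [K] as a function of LWP [g/m2]."""
    return 280.0 - 260.0 * math.exp(-0.0015 * lwp)

def iterative_retrieval(obs, first_guess, obs_sigma, max_iter=50):
    """Adjust the guess until the computed 'obs' matches the real one within obs_sigma.

    Returns (retrieved value, number of updates, converged flag).
    """
    x = first_guess
    for i in range(max_iter):
        calc = forward(x)                 # compute the "obs" from the current guess
        resid = obs - calc                # compare with the real observation
        if abs(resid) <= obs_sigma:
            return x, i, True             # closure reached
        dx = 1e-3
        slope = (forward(x + dx) - calc) / dx   # finite-difference sensitivity
        x = x + resid / slope             # modify the guess (Newton-style step)
    return x, max_iter, False

truth = 150.0
x_hat, n_updates, converged = iterative_retrieval(forward(truth), 50.0, 0.1)
```

When the loop reports convergence, the forward calculation from the retrieved value agrees with the observation to within the stated uncertainty, which is the sense in which the result "closes".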
Considerations • The forward model may have limited sensitivity to the desired variable • Forward model may be highly non-linear, which affects how the solution is found • Multiple solutions may exist for a given observation (i.e., problem is ill-defined) • Uncertainties in the observations should be propagated to the retrieved solution • Retrievals often use other data and/or assumptions that may affect the retrieved solution; uncertainties in these parameters should also be propagated to the solution • Includes model parameters, which are often ignored • Often only partial prior info on the solution is known
Maximum a Posteriori (MAP) • One of several iterative retrieval methods • Uses Bayes theorem: Posterior = (Likelihood × Prior) / Normalizing Constant, i.e., P(A|B) = P(B|A) P(A) / P(B) • A: the variable we desire • B: the observation we have • Incorporates a priori knowledge into the maximum likelihood solution
Estimating the Temperature Outside [Figure, built up in three steps: climatology; observation with its uncertainty; solution with its uncertainty]
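The temperature example can be worked numerically if both the climatology and the observation are assumed to be Gaussian. In that case the solution is the inverse-variance weighted mean. The numbers below are invented for illustration and are not taken from the slide's figure.

```python
def combine_gaussian(prior_mean, prior_sigma, obs, obs_sigma):
    """Combine a Gaussian prior and a Gaussian observation of the same quantity.

    Returns the posterior mean and posterior 1-sigma uncertainty.
    """
    w_prior = 1.0 / prior_sigma**2   # information carried by the climatology
    w_obs = 1.0 / obs_sigma**2       # information carried by the observation
    post_var = 1.0 / (w_prior + w_obs)
    post_mean = post_var * (w_prior * prior_mean + w_obs * obs)
    return post_mean, post_var**0.5

# Hypothetical values: climatology 15 +/- 8 degC, thermometer reading 22 +/- 2 degC.
t_mean, t_sigma = combine_gaussian(15.0, 8.0, 22.0, 2.0)
```

The solution lies between the two inputs, closer to the more certain one, and its uncertainty is smaller than either input's uncertainty.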
Optimal Estimation - 1 [Equation with labeled terms: state vector, a priori, a priori’s covariance, Jacobian, “obs” covariance, observation, forward model] • Technique is an old one, with a long history • Excellent book by Rodgers (2000) • Many good examples exist in the literature • Assumes problem is linear and uncertainties are Gaussian • However, the accuracy of the uncertainty in X is directly related to the ability to properly define the covariance matrix of the observations Sε, which is a non-trivial exercise • Key advantage is that uncertainties in the retrieved state vector X are automatically generated by the method!
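The equation on this slide did not survive extraction; only its term labels did. A commonly used iterative form that contains exactly those terms, following Rodgers (2000), is sketched below. The notation (X for the state vector, X_a for the a priori, S_a for its covariance, K_n for the Jacobian evaluated at iteration n, S_ε for the "obs" covariance, Y for the observation, F for the forward model) is an assumption matched to the labels, and may differ in detail from what the slide showed.

```latex
\begin{aligned}
X^{n+1} &= X_a + \left(S_a^{-1} + K_n^{T} S_\varepsilon^{-1} K_n\right)^{-1}
           K_n^{T} S_\varepsilon^{-1}
           \left[\, Y - F(X^{n}) + K_n \left(X^{n} - X_a\right) \right] \\
\hat{S} &= \left(S_a^{-1} + K_n^{T} S_\varepsilon^{-1} K_n\right)^{-1}
\end{aligned}
```

The second line is the posterior covariance of the retrieved state, which is why the uncertainties come out of the method automatically.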
Optimal Estimation - 2 • Linear • Forward model of the form y = K x • A priori is Gaussian • Nearly linear • Problem is non-linear, but linearization about some prior state is adequate to find a solution • Moderately non-linear • Problem is non-linear; linearization is adequate for error analysis but not for finding a solution • Many problems are like this • Grossly non-linear • Problem is non-linear even within the range of the errors
Optimal Estimation - 3 • Moderately non-linear problems • No general expression for locating optimal solutions, as there is for linear and slightly non-linear problems • Solutions must be found numerically and iteratively • Follow the maximum a posteriori (MAP) approach and minimize the cost function, which is the sum of the “distance” between the observation and the current calculation (weighted by the observational covariance) and the distance between the prior and the current state (weighted by the prior covariance) • Numerical method is the Newtonian method, which finds successively better approximations to the roots of the function g
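The cost function described in words on this slide can be written out as follows. This is the standard Gaussian form from Rodgers (2000), given here as an assumption since the slide shows no formula; g is taken to be half the gradient of the cost function, so its root is the MAP solution.

```latex
\begin{aligned}
J(X) &= \left[Y - F(X)\right]^{T} S_\varepsilon^{-1} \left[Y - F(X)\right]
      + \left[X - X_a\right]^{T} S_a^{-1} \left[X - X_a\right] \\
g(X) &= \tfrac{1}{2}\,\nabla_X J
      = -K^{T} S_\varepsilon^{-1} \left[Y - F(X)\right] + S_a^{-1} \left[X - X_a\right],
      \qquad K = \frac{\partial F}{\partial X}
\end{aligned}
```

Applying Newton's method to g(X) = 0, and neglecting second derivatives of F, gives the Gauss-Newton iteration usually used in optimal estimation.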
Newton Method [Figure: illustration of Newton’s method, from Wikipedia]
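Since the illustration is missing, here is a minimal scalar sketch of Newton's method for finding a root of a function g. The function names and the test function are chosen for illustration only.

```python
def newton(g, dg, x0, tol=1e-10, max_iter=50):
    """Find a root of g by Newton's method, given its derivative dg and a first guess x0."""
    x = x0
    for _ in range(max_iter):
        step = g(x) / dg(x)   # where the tangent line at x crosses zero
        x = x - step
        if abs(step) < tol:
            return x
    raise RuntimeError("Newton iteration did not converge")

# Example: the positive root of g(x) = x^2 - 2.
root = newton(lambda x: x * x - 2.0, lambda x: 2.0 * x, x0=1.0)
```

Each step replaces the function by its tangent at the current point, which is the same linearization idea used when the forward model is replaced by its Jacobian.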
Optimal Estimation - 4 • Does not provide an explicit solution • Does provide a class of solutions and assigns a probability density to each • We choose one state from the ensemble that is described by the posterior covariance matrix • Diagonal elements of the posterior covariance matrix provide the mean squared error of the retrieved state vector • Off-diagonal elements provide information on the correlation between elements of the retrieved state vector
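Reading a posterior covariance matrix as described above can be sketched as follows. The example matrix is invented; the function simply converts a covariance matrix into 1-sigma uncertainties and a correlation matrix.

```python
import numpy as np

def summarize_covariance(cov):
    """Return (1-sigma uncertainties, correlation matrix) for a covariance matrix."""
    cov = np.asarray(cov, dtype=float)
    sigma = np.sqrt(np.diag(cov))            # square root of the diagonal elements
    corr = cov / np.outer(sigma, sigma)      # normalize the off-diagonal elements
    return sigma, corr

# Hypothetical 2x2 posterior covariance for a two-element state vector.
sigma, corr = summarize_covariance([[4.0, -3.0],
                                    [-3.0, 9.0]])
```

A strongly negative off-diagonal correlation, as in this made-up case, means an overestimate of one retrieved variable tends to come with an underestimate of the other.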
Making It More Concrete: MWR Retrieval • X is the state vector • Y is the observation vector • F is the forward radiative transfer model • K is the Jacobian (2x2 matrix) • Sε is the covariance of the observations
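A toy version of this two-channel retrieval is sketched below using a Gauss-Newton iteration in the style of Rodgers (2000). It assumes the state vector is X = [PWV, LWP] and the observation vector is Y = [Tb at 23 GHz, Tb at 31 GHz], consistent with the 2x2 Jacobian; the slide's own definitions were lost. The forward model `forward` and all constants are hypothetical stand-ins, not a real radiative transfer code.

```python
import numpy as np

T_MR, T_COSMIC = 275.0, 2.73              # assumed mean radiating / cosmic temperatures [K]
TAU_DRY = np.array([0.012, 0.020])        # assumed dry opacity at 23 and 31 GHz
A_PWV = np.array([0.050, 0.018])          # assumed opacity per cm of PWV
B_LWP = np.array([0.00008, 0.00018])      # assumed opacity per g/m2 of LWP

def forward(x):
    """Hypothetical F: state [PWV cm, LWP g/m2] -> [Tb23, Tb31] in K."""
    tau = TAU_DRY + A_PWV * x[0] + B_LWP * x[1]
    return T_MR - (T_MR - T_COSMIC) * np.exp(-tau)

def jacobian(x, eps=1e-4):
    """Finite-difference Jacobian K (2x2) of the forward model at x."""
    f0 = forward(x)
    K = np.empty((f0.size, x.size))
    for j in range(x.size):
        xp = x.copy()
        xp[j] += eps
        K[:, j] = (forward(xp) - f0) / eps
    return K

def optimal_estimation(y, x_a, S_a, S_e, max_iter=20, tol=1e-3):
    """Gauss-Newton MAP retrieval; returns (state, posterior covariance, converged)."""
    Sa_inv, Se_inv = np.linalg.inv(S_a), np.linalg.inv(S_e)
    x = x_a.copy()
    converged = False
    for _ in range(max_iter):
        K = jacobian(x)
        S_hat = np.linalg.inv(K.T @ Se_inv @ K + Sa_inv)
        x_new = x_a + S_hat @ K.T @ Se_inv @ (y - forward(x) + K @ (x - x_a))
        dx = x_new - x
        # Stop when the step is small compared with the posterior uncertainty.
        converged = dx @ np.linalg.inv(S_hat) @ dx < tol * x.size
        x = x_new
        if converged:
            break
    return x, S_hat, converged

x_true = np.array([2.0, 100.0])
x_a = np.array([1.5, 50.0])                   # a priori state
S_a = np.diag([1.0**2, 200.0**2])             # loose prior
S_e = np.diag([0.3**2, 0.3**2])               # assumed 0.3 K uncorrelated noise
x_hat, S_hat, converged = optimal_estimation(forward(x_true), x_a, S_a, S_e)
lwp_sigma = np.sqrt(S_hat[1, 1])              # 1-sigma LWP uncertainty from the posterior
```

The returned covariance is what provides the case-specific uncertainty; its off-diagonal element shows how errors in PWV and LWP are coupled in this toy setup.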
Even Better LWP Retrieval [Figure: liquid water path [g/m2] vs. hour [UTC] from the iterative retrieval, the improved regression retrieval, and the opacity regression retrieval; radiometric uncertainty: 15 g/m2]
LWP Relative Uncertainty [Figure: relative uncertainty [%] vs. liquid water path [g/m2], from the posterior] • Radiometric uncertainty in the MWR results in large relative uncertainty in LWP when the LWP is small • Combine different observations to improve the retrieval
Combined Infrared + Microwave Retrieval Forward models FMW and FIR need to be consistent!
LWP Relative Uncertainty [Figure: relative uncertainty [%] vs. liquid water path [g/m2], from the posterior] • Combining the infrared and microwave significantly reduces the relative uncertainty in LWP for small-LWP clouds
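The effect of adding an infrared channel can be illustrated in the linearized, single-variable case, where each channel adds its information (K/σ)² to the inverse posterior variance. The sensitivities below are hypothetical: a constant microwave sensitivity, and an infrared sensitivity that is large for thin clouds and saturates as the cloud becomes opaque. Only that qualitative behavior is intended to match the slide.

```python
import numpy as np

def jac_mw(lwp):
    """Assumed microwave sensitivity [K per g/m2], independent of LWP."""
    return np.full_like(lwp, 0.045)

def jac_ir(lwp):
    """Assumed infrared sensitivity: derivative of 100 * (1 - exp(-0.06 * LWP))."""
    return 100.0 * 0.06 * np.exp(-0.06 * lwp)

def posterior_sigma(lwp, sigma_a, channels):
    """1-sigma posterior LWP uncertainty for a list of (jacobian, obs_sigma) channels."""
    info = np.full_like(lwp, 1.0 / sigma_a**2)     # information from the prior
    for jac, sigma_obs in channels:
        info = info + (jac(lwp) / sigma_obs) ** 2  # information from each channel
    return 1.0 / np.sqrt(info)

lwp = np.array([5.0, 10.0, 20.0, 50.0, 100.0, 200.0])   # [g/m2]
sig_mw = posterior_sigma(lwp, 200.0, [(jac_mw, 0.5)])
sig_both = posterior_sigma(lwp, 200.0, [(jac_mw, 0.5), (jac_ir, 1.0)])
rel_mw = 100.0 * sig_mw / lwp        # relative uncertainty [%], microwave only
rel_both = 100.0 * sig_both / lwp    # relative uncertainty [%], microwave + infrared
```

In this sketch the combined retrieval is far better for thin clouds and essentially identical to the microwave-only retrieval once the infrared signal saturates.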
The Classification Problem • Many retrieval algorithms are only applicable for certain types of clouds (e.g., liquid-only stratiform, ice-cloud only) • Running the incorrect retrieval method often grossly violates the assumptions in the retrieval, leading to huge errors • Need automated methods to classify the cloud conditions at a given time • Allows the correct retrievals to be performed • Classification algorithms provide discrete (vs. continuous) output • There is (and will always be) uncertainty in the sky classification; how to capture this uncertainty and propagate it into the retrieval uncertainty? • Simultaneously retrieve classification and cloud properties?
What is Sε? • Uncertainty in the observations and forward model • Treated as the sum of two covariance matrices: Sε = Sy + Kb Sb Kb^T • Instrument covariance matrix Sy can be difficult to determine • How to quantify this matrix? • How does it depend on conditions? • Forward model parameter uncertainties in Sb • Virtually every forward model has some tunable parameters that have some uncertainty – “known unknowns” • Many forward models make other assumptions that we may not realize which have uncertainties – “unknown unknowns” • Often Kb Sb Kb^T is orders of magnitude larger than Sy
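The sum of the two covariance matrices can be sketched as follows, taking the forward-model term to be Kb Sb Kb^T as named on the slide, where Kb is the sensitivity of the forward model to its parameters b. The numerical values are hypothetical.

```python
import numpy as np

def total_obs_covariance(S_y, K_b, S_b):
    """S_eps = S_y + K_b S_b K_b^T: instrument noise plus forward-model parameter uncertainty."""
    S_y, K_b, S_b = (np.asarray(m, dtype=float) for m in (S_y, K_b, S_b))
    return S_y + K_b @ S_b @ K_b.T

# Two channels with 0.3 K uncorrelated instrument noise.
S_y = np.diag([0.3**2, 0.3**2])
# One uncertain model parameter: assumed sensitivities of 5 and 8 K per unit of b.
K_b = np.array([[5.0], [8.0]])
S_b = np.array([[0.1**2]])           # assumed 1-sigma parameter uncertainty of 0.1
S_eps = total_obs_covariance(S_y, K_b, S_b)
```

Even though the instrument noise here is uncorrelated, the parameter term makes the total covariance non-diagonal, because one uncertain parameter affects both channels at once.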
An Example for Sy: Quantifying the Noise in the AERI Radiance Observations [Figure panels: Unfiltered; PCA Filtered] • Applying the noise filter (NF) reduces random error 4x but introduces some correlated error
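The idea behind a PCA noise filter can be sketched with synthetic spectra: decompose a set of spectra into principal components, keep the leading ones, and reconstruct. This is a generic illustration under invented data, not the AERI filter itself; the number of retained components and the synthetic signal are assumptions.

```python
import numpy as np

def pca_noise_filter(spectra, n_components):
    """Reconstruct spectra (rows = observations, columns = channels) from leading PCs."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=0)
    u, s, vt = np.linalg.svd(spectra - mean, full_matrices=False)
    k = n_components
    return mean + (u[:, :k] * s[:k]) @ vt[:k]

# Synthetic test: 500 spectra, 200 channels, signal confined to 3 components.
rng = np.random.default_rng(1)
basis = rng.normal(size=(3, 200))
scores = rng.normal(size=(500, 3)) * np.array([50.0, 20.0, 10.0])
signal = scores @ basis
noisy = signal + rng.normal(size=signal.shape)      # unit-variance random noise
filtered = pca_noise_filter(noisy, n_components=3)

rms_before = np.sqrt(np.mean((noisy - signal) ** 2))
rms_after = np.sqrt(np.mean((filtered - signal) ** 2))
```

The noise that remains after filtering lives in the retained components and is therefore shared across channels, which is the kind of correlated error the slide warns about when defining Sy.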
An Example for Sb: Quantifying the Impact of other Trace Gases on AERI Retrievals [Figure annotation: spectral region used for H2O profiling]
Summary • Retrieving geophysical variables from observations is a non-trivial process • Error sources include random noise in the obs, bias in obs or forward model, retrieval technique applied, small sensitivity, etc. • Adding information typically improves the retrieval • Reduces noise using ‘redundant’ channels • Improves accuracy when more sensitive channels are added • Try to add channels that are “orthogonal” • Allows additional variables to be retrieved • Forward model uncertainty and parameters important • Defining prior and observational covariances non-trivial
Good Outcome for Workshop (My Opinion Anyway) • Quantifying Sε for the different instruments typically used for cloud / precipitation retrievals • Quantifying Sy • Identifying the important (tunable) forward model parameters • Quantifying Sb • Quantifying Sa for the different geophysical variables • 1-sigma uncertainty in the atmospheric variable we desire • Covariance between variables a and b • Covariance between different height levels i and j • In the matrices S?, both the diagonal and off-diagonal elements are important!
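One simple way to build a prior covariance Sa with both diagonal and off-diagonal elements is to combine 1-sigma uncertainties at each height level with an assumed correlation between levels. The exponential correlation model and the numbers below are assumptions for illustration; in practice Sa is often estimated from a climatology of profiles.

```python
import numpy as np

def build_prior_covariance(sigma, heights_km, corr_length_km):
    """S_a[i, j] = sigma_i * sigma_j * exp(-|z_i - z_j| / corr_length_km)."""
    sigma = np.asarray(sigma, dtype=float)
    z = np.asarray(heights_km, dtype=float)
    corr = np.exp(-np.abs(z[:, None] - z[None, :]) / corr_length_km)
    return corr * np.outer(sigma, sigma)

heights = np.array([0.0, 0.5, 1.0, 2.0, 4.0])     # height levels [km]
sigma = np.array([2.0, 1.8, 1.5, 1.2, 1.0])       # hypothetical 1-sigma values per level
S_a = build_prior_covariance(sigma, heights, corr_length_km=1.5)
```

Setting the off-diagonal elements to zero would tell the retrieval that adjacent levels vary independently, which is rarely true and changes both the solution and its reported uncertainty.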