Biomedical Imaging 2 Class 3,4 – Regularization; Time Series Analysis (Pt. 1); Image Post-processing (Pt. 1) 02/05/08
Well-Posedness, Ill-Posedness • Definition due to Hadamard, 1915: Given the mapping A: X→Y, the equation Ax = y is well-posed if • (Existence) For every y in Y, there is an x in X such that Ax = y. • (Uniqueness) If Ax1 = Ax2, then x1 = x2. • (Stability) A-1 is continuous • x = A-1y • A-1(y + dy) = x + dx • Ax = y is ill-posed if it is not well-posed
Well- and Ill-conditioned Problems • Overdetermined linear systems (more equations than unknowns) are ill-posed, strictly speaking • No exact Solution exists! • Existence is imposed by using least-squares solution • Underdetermined linear systems (fewer equations than unknowns) are ill-posed, strictly speaking • Infinitely many Solutions exist! • Uniqueness is imposed by using minimum-norm solution • Can a discrete linear system be unstable?
Well- and Ill-conditioned Problems • Can a discrete linear system be unstable? • Strictly speaking, no! • A-1(x0 + x) = A-1x0+ A-1∙x • All elements of A-1 and of x are finite • Therefore, all elements of A-1∙x must be finite • However, it certainly can be true that ||A-1∙x||/||A-1x0|| >> ||x||/||x0|| • Small change in input → large change in output • Such a system is called ill-conditioned
Example of Ill-conditioning 4×4 Hilbert matrix: This is a full-rank, non-singular matrix, and so it has a well-defined inverse:
Example of Ill-conditioning Positive and negative products cancel in exactly the right manner: But what happens if we change any element by even a small amount?
Our “image reconstruction” operator is unbiased But it has high variance Example of Ill-conditioning
-4 60 -180 140 Tradeoff Between Bias and Variance The numerical values in y are eventually represented as gray levels or colors in an image: As long as the color pattern makes an interpretable image, do you care if the numerical values are exactly right? That is, are you willing to give up accuracy to gain precision (i.e., decrease variance by increasing bias)?
Regularization term Regularization parameter Regularization • For overdetermined systems, we define the pseudo-inverse A+ = (ATA)-1AT. • For underdetermined systems, we define the pseudo-inverse A+ = AT(AAT)-1. • For the Hilbert matrix, both of the preceding reduce to the true inverse: • (ATA)-1AT = [A-1(AT)-1]AT = A-1[(AT)-1AT] = A-1 • AT(AAT)-1 = AT[(AT)-1A-1] = [AT(AT)-1 ]A-1 = A-1 • Now we introduce one additional term: • A+ = (ATA + αI)-1AT • A+ = AT(AAT + αI)-1
What Does Regularization Do? • Now we introduce one additional term: • A+ = (ATA + αI)-1AT, A+ = AT(AAT + αI)-1 • This particular variety is called Tikhonov regularization • Imposes continuity on the computed y • That is, limits the spatial scale on which solution can change (long-pass filter) • Could replace the I in the regularization term with a discrete 1st, 2nd, etc., derivative operator • Then continuity would be imposed on the corresponding derivative of the solution
Something To Watch Out for • Two things that can go wrong when Tik. Reg. is used: • α is too small (under-regularized case): noise continues to wreak havoc • α is too large (over-regularized case): ability to capture spatial variations of interest is lost • Is there an algorithm guaranteed to produce the optimal α? • Alas, no (when is life ever that easy?) • Special cases; Monte Carlo simulations; trial-and-error
Impact of Noisy Data Unregularized solution (i.e., α = 0): Conclusion: Under-regularized!
Impact of Noisy Data Unregularized solution (i.e., α = 0): Conclusion: Over-regularized!
Impact of Noisy Data Unregularized solution (i.e., α = 0): Conclusion: Getting close?
Other Types of Regularization • Truncated singular value decomposition • Discrete-cosine transform • Statistical (Bayesian) • Requires knowledge of the solution and noise covariances • Iterative • Steepest descent • Conjugate-gradient descent • Richardson-Lucy • Landweber
Time Series Analysis… • Definitions • The branch of quantitative forecasting in which data for one variable are examined for patterns of trend, seasonality, and cycle. nces.ed.gov/programs/projections/appendix_D.asp • Analysis of any variable classified by time, in which the values of the variable are functions of the time periods. www.indiainfoline.com/bisc/matt.html • An analysis conducted on people observed over multiple time periods. www.rwjf.org/reports/npreports/hcrig.html • A type of forecast in which data relating to past demand are used to predict future demand. highered.mcgraw-hill.com/sites/0072506369/student_view0/chapter12/glossary.html • In statistics and signal processing, a time series is a sequence of data points, measured typically at successive times, spaced apart at uniform time intervals. Time series analysis comprises methods that attempt to understand such time series, often either to understand the underlying theory of the data points (where did they come from? what generated them?), or to make forecasts (predictions). en.wikipedia.org/wiki/Time_series_analysis
Time Series Analysis… Varieties • Frequency (spectral) analysis • Fourier transform: amplitude and phase • Power spectrum; power spectral density • Auto-spectral density • Cross-spectral density • Coherence • Correlation Analysis • Cross-correlation function • Cross-covariance • Correlation coefficient function • Autocorrelation function • Cross-spectral density • Auto-spectral density
Time Series Analysis… Varieties • Time-frequency analysis • Short-time Fourier transform • Wavelet analysis • Descriptive Statistics • Mean / median; standard deviation / variance / range • Short-time mean, standard deviation, etc. • Forecasting / Prediction • Autoregressive (AR) • Moving Average (MA) • Autoregressive moving average (ARMA) • Autoregressive integrated moving average (ARIMA) • Random walk, random trend • Exponential weighted moving average
Time Series Analysis… Varieties • Signal separation • Data-driven [blind source separation (BSS), signal source separation (SSS)] • Principal component analysis (PCA) • Independent component analysis (ICA) • Extended spatial decomposition, extended temporal decomposition • Canonical correlation analysis (CCA) • Singular-value decomposition (SVD) an essential ingredient of all • Model-based • General linear model (GLM) • Analysis of variance (ANOVA, ANCOVA, MANOVA, MANCOVA) • e.g., Statistical Parametric Mapping, BrainVoyager, AFNI
c a b A “Family Secret” of Time Series Analysis… • Scary-looking formulas, such as • Are useful and important to learn at some stage, but not really essential for understanding how all these methods work • All the math you really need to know, for understanding, is • How to add: 3 + 5 = 8, 2 - 7 = 2 + (-7) = -5 • How to multiply: 3 × 5 = 15, 2 × (-7) = -14 • Multiplication distributes over addition u× (v1 + v2 + v3 + …) = u×v1 + u×v2 + u×v3 + … • Pythagorean theorem: a2 + b2 = c2
A “Family Secret” of Time Series Analysis… A most fundamental mathematical operation for time series analysis: The xi time series is measurement or image data. The yi time series depends on what type of analysis we’re doing: Fourier analysis: yi is a sinusoidal function Correlation analysis: yi is a second data or image time series Wavelet or short-time FT: non-zero yi values are concentrated in a small range of i, while most of the yis are 0. GLM: yi is an ideal, or model, time series that we expect some of the xi time series to resemble
Example: Fourier Analysis = Σ × sin10πt
Another “Family Secret” of Time Series Analysis… • The second operation that is fundamental to myriad forms of time-series analysis is singular value decomposition (SVD) • Variations of SVD underlie: • Principal component analysis (PCA) • Independent component analysis (ICA) • Canonical correlation analysis (CCA) • Extended spatial/temporal decorrelation
Given an arbitrary N×Nmatrix A and N×1 vector x: ordinarily, b = Ax is different from x in both magnitude and direction. b x Significance of angle between x and b However, for any A there will always be some particular directions such that b will be parallel to x (i.e., b is a simple scalar multiple of x, or Ax = λx) if x lies in one of these directions. An x that satisfies Ax = λx is an eigenvector, and λ is the corresponding eigenvalue.
Homogeneous Linear System: Ax = 0 Recall definition of eigenvectors and eigenvalues: Ax = λx, x 0. Then Ax - λx = Ax - λIx = (A - λI)x = 0. That is, the eigenvalues are those specific values of λ for which the matrix A - λI is singular, and the eigenvectors are the corresponding nullspaces.
Significance of eigenvalues and eigenvectors An N×NA always has N eigenvalues. If A is symmetric, and λ1 and λ2 are two distinct eigenvalues, the corresponding eigenvectors x1 and x2 are necessarily orthogonal. If λ1 = λ2, we can always subtract off x1’s projection onto x2 from x1 (Gram-Schmidt orthogonalization). If A is not symmetric, then its eigenvectors generally are not mutually orthogonal. But recall that the matrices AAT and ATA are always symmetric. The square roots of the eigenvalues of AAT or ATA are the singular values of A. The eigenvectors of AAT or ATA are the singular vectors of A. Computation of the eigenvalues and eigenvectors ofAAT and ATA underlies a very useful linear algebraic technique called singular value decomposition (SVD). SVD is the method that allows us to, among other things, tackle the one case we have not yet seen an explicit example of: finding the “solution” of a linear system when A is not of full rank.
M is symmetric, so MTM = MMT A diagonal matrix is very easy to invert: just reciprocate each diagonal element An orthogonal matrix is very easy to invert: X-1 = XT Significance of eigenvalues and eigenvectors
Significance of eigenvalues and eigenvectors Eigenvalues of A: 1.5002, 0.16914, 0.0067383, 9.6702×10-5 Eigenvalues of A-1: 0.66657, 5.9122, 148.41, 10341 Any arbitrary vector x is equal to a sum of the eigenvectors of A-1: x = av1 + bv2 + cv3 + dv4, for some numbers a, b, c, d. So A-1x = aA-1v1 + bA-1v2 + cA-1v3 + dA-1v4 = 0.66657av1 + 5.9122bv2 + 148.41cv3 + 10341dv4
These two equations are inconsistent. After second round of elimination: But there is a pseudoinverse, A+, which we can find by using SVD: Gaussian Elimination Redux What happens if we try to use Gaussian elimination to solve Ax = b, but A is singular? There is no Solution!
Gaussian Elimination Redux As indicated, for this case AA+≠ I and A+A ≠ I: What is the pseudoinverse “solution,” and what is its significance?