AUTOMATICALLY REGULARIZED NONNEGATIVE SOLUTIONS FOR ILL-CONDITIONED LINEAR SYSTEMS


### AUTOMATICALLY REGULARIZED NONNEGATIVE SOLUTIONS FOR ILL-CONDITIONED LINEAR SYSTEMS

RONDALL E. JONES, Ktech Corporation, Albuquerque, NM 87185, rejones@sandia.gov

Inverse Problems In Engineering Seminar 2006

Motivation
• A first-look solution for sophisticated professionals
• Makes advanced methods available to non-experts
• Enables adaptive regularization in automated contexts
Historical Background
• Tikhonov regularization augments a linear system as follows:

Ax = b

λx = 0

• The value to use for λ is problematic
• The larger the value of λ, the larger the residual and the smoother the solution
• If the error in b is known approximately, the discrepancy principle says to adjust λ until the RMS of the residual equals that known error level.
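The augmented system and the discrepancy principle can be sketched in a few lines. This is a minimal illustration in numpy, not the presenter's software; the function names `tikhonov_solve` and `discrepancy_lambda` and the bracketing interval are mine. The bisection relies on the residual norm growing monotonically with λ, which holds for Tikhonov regularization.

```python
import numpy as np

def tikhonov_solve(A, b, lam):
    """Solve the augmented system [A; lam*I] x = [b; 0] by least squares."""
    n = A.shape[1]
    A_aug = np.vstack([A, lam * np.eye(n)])
    b_aug = np.concatenate([b, np.zeros(n)])
    x, *_ = np.linalg.lstsq(A_aug, b_aug, rcond=None)
    return x

def discrepancy_lambda(A, b, sigma, lo=1e-10, hi=1e4, iters=60):
    """Log-space bisection for the lam whose residual RMS equals sigma."""
    def rms(lam):
        r = A @ tikhonov_solve(A, b, lam) - b
        return np.sqrt(np.mean(r ** 2))
    for _ in range(iters):
        mid = np.sqrt(lo * hi)
        if rms(mid) < sigma:
            lo = mid   # residual still below the noise level: regularize harder
        else:
            hi = mid
    return np.sqrt(lo * hi)
```

On an ill-conditioned test matrix (e.g. a Hilbert matrix) with noise of known RMS σ added to b, the returned λ gives a solution whose residual RMS matches σ and whose error is far smaller than that of the unregularized solve.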
Historical Background
• Wahba (1970s)
• Generalized Cross Validation
• Heuristic
• Not reliable enough
Historical Background
• Jones (1980s)
• Picard condition plus a simple error estimate and use of discrepancy principle
• Fundamental analysis
• Not sufficiently developed for general use
Historical Background
• Hansen (1990s)
• Automated L-curve analysis
• Heuristic
• Quite good; but software not widely available
Historical Background
• O'Leary (2000s)
• Based on certain assumptions, minimized the “distance” of the regularized solution to the true solution
• Fundamental analysis; unbiased, which turns out not necessarily to be an advantage
Historical Background
• Jones (2005+)
• More sophisticated Picard analysis; more robust error analysis
• Fundamental analysis
• Software package with realistic constraints for general usage
The Picard Condition
• Ax = b (with A any shape)
• A = USV^T (using the compact version of the SVD)
• USV^T x = b
• V^T x = S^+ U^T b == t, a vector
• In the ill-conditioned case the singular values s_i → 0 while noise keeps (U^T b)_i at some level ε, so the late t_i = (U^T b)_i / s_i blow up
• x = Vt
• The Picard condition says that the magnitudes of the t_i should generally decrease
• If they instead grow after some point, then noise dominates the b_i and t_i, and the solution will be bad
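The derivation above is easy to reproduce. Below is a small sketch (the function name `picard_vector` is mine) that computes t from the compact SVD; plotting the magnitudes |t_i| is the usual way to inspect the Picard condition.

```python
import numpy as np

def picard_vector(A, b, tol=1e-12):
    """Compute t = S^+ U^T b from the compact SVD A = U S V^T.
    The solution is x = V t, so growing |t_i| signals trouble."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    keep = s > tol * s[0]   # pseudo-inverse: drop (near-)zero singular values
    return (U.T @ b)[keep] / s[keep]
```

For a well-behaved right-hand side the |t_i| generally decrease; for a noisy b on an ill-conditioned A, the late entries grow because tiny s_i divide noise-level coefficients.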
Typical t Vector for a Highly Ill-conditioned Problem (Hansen’s ilaplace)
Usable Rank Determination
• So how to determine when to cut off more elements of t as noise dominated?
• Typical approach is to compute a moving average of the (absolute values of the) ti centered at each ti
• But the average will then have an odd number of terms and will echo any oscillation
Usable Rank Determination: Solution
• Use an even number of terms in the averages
• Find the minimum of that series, avgm
• See if the average ever gets “significantly larger”
• If not, consider it stable.
• If so, find the first average, avgk, which is 3x larger than the minimum
• Look inside that avgk’s span to find the largest ti and set ur=i-1
• Backtrack ur as far as the start of the minimum average as long as the ti are backing down
• This is obviously heuristic
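The steps above can be sketched as follows. As the slide says, this is heuristic: the window width (4) and the "significantly larger" factor (3x) are the tunable parameters, and the exact backtracking rule here is my reading of the description, not the presenter's code.

```python
import numpy as np

def usable_rank(t, width=4, factor=3.0):
    """Cut off t where noise takes over: find the minimum of an even-width
    moving average of |t_i|; if a later average grows past `factor` times
    that minimum, back the usable rank off below the offending peak."""
    a = np.abs(np.asarray(t, dtype=float))
    n = len(a)
    if n <= width:
        return n
    avg = np.array([a[i:i + width].mean() for i in range(n - width + 1)])
    m = int(np.argmin(avg))
    grown = np.where(avg[m:] > factor * avg[m])[0]
    if len(grown) == 0:
        return n                      # averages stay stable: all of t is usable
    k = m + int(grown[0])             # first "significantly larger" average
    peak = k + int(np.argmax(a[k:k + width]))
    ur = peak                         # keep t_0 .. t_{ur-1} (0-based)
    while ur > m and a[ur - 1] <= a[ur]:
        ur -= 1                       # backtrack while the t_i are backing down
    return ur
```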
Our Algorithm
• Compute the SVD and the usable rank, ur, as described
• Compute an estimate of σ, the error in the right hand side, as follows
• Use the discrepancy principle to pick λ
• Apply constraints
• Only the determination of ur is heuristic
Simple Error Estimation
• Ax = b (with A any shape)
• USV^T x = b
• V^T x = S^+ U^T b == t
• {V^T x = t}, rows 1..ur: the usable portion
• {V^T x = t}, rows ur+1..m: the noise-dominated portion
• RMS((U^T b)(ur+1:m)) is a decent estimate of the error (the standard deviation of the elements of b), since U is orthogonal and so preserves the noise statistics
Simple Error Estimation
• Problem: there may be as few as ONE element in the (ur+1:m) tail
• Too crude… and there is a better way
Better Error Estimation
• Ax = b (the original system, (1))
• {V^T x = t}, rows 1..ur (the orthogonalized usable part, (2))
• Subtract from each row of (1) its projection on every row of (2), leaving Ā x = b̄, where b̄ has essentially no image of the true b… just an image of the noise
• RMS(b̄) is almost an estimate of the error in b…
• But the b̄_i are biased estimates because V^T was derived from A. So we must adjust for lost degrees of freedom.
Better Error Estimation
• To correct for bias, use the typical adjustment for lost degrees of freedom:

σ² = (1 / (m − ur)) Σ_{i=1..m} b̄_i²
• The identical formula can be obtained by replacing the orthogonalized usable part with a row-wise orthogonalized decomposition of A (with b carried along)
• The correctness of this formula is rapidly seen in practice
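If I have followed the construction correctly, subtracting from each row of Ax = b its projection onto the orthonormal rows of {V^T x = t} reduces, via the SVD, to removing from b its projection onto the first ur left singular vectors. A sketch under that assumption (the function name `sigma_estimate` is mine):

```python
import numpy as np

def sigma_estimate(A, b, ur):
    """Estimate the RHS error sigma: remove from b its projection onto the
    first `ur` left singular vectors, leaving a noise-only remainder, then
    divide by the remaining degrees of freedom (m - ur)."""
    m = len(b)
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    b_bar = b - U[:, :ur] @ (U[:, :ur].T @ b)   # b-bar: noise-only remainder
    return np.sqrt(np.sum(b_bar ** 2) / (m - ur))
```

On a tall well-posed test problem where ur equals the true rank, the estimate recovers the known noise standard deviation.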
Our Algorithm
• Compute the SVD and the usable rank, ur, as described
• Compute the estimate of σ as just described
• Use the discrepancy principle to pick λ
• Apply constraints
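The four steps (minus constraints, which come next) fit in one sketch. This is an illustration, not the presenter's autoreg package: `ur` is assumed to come from the usable-rank heuristic, and λ is chosen over a log grid rather than by a root finder. Note that one SVD serves every trial λ via filter factors.

```python
import numpy as np

def autoreg(A, b, ur):
    """Sketch of the pipeline for a given usable rank `ur`: one SVD, a
    degrees-of-freedom-corrected sigma estimate, and a discrepancy-principle
    lambda chosen over a log grid."""
    m = len(b)
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    g = U.T @ b
    # error estimate from the noise-dominated remainder of b
    b_bar = b - U[:, :ur] @ g[:ur]
    sigma = np.sqrt(np.sum(b_bar ** 2) / (m - ur))
    # Tikhonov solution via filter factors: one SVD serves every lambda
    def solve(lam):
        return Vt.T @ ((s / (s ** 2 + lam ** 2)) * g)
    grid = np.geomspace(s[0] * 1e-12, 10 * s[0], 400)
    rms = np.array([np.sqrt(np.mean((A @ solve(l) - b) ** 2)) for l in grid])
    lam = grid[int(np.argmin(np.abs(rms - sigma)))]
    return solve(lam), lam, sigma
```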
Non-Negativity
• Traditional approaches exist… but rarely for the ill-conditioned case
• We researched several approaches
• We need some degree of simplicity because non-negativity is just the first of the constraints we intend to support
Non-Negativity
• I experimented with non-traditional approaches. For example…

[ a11 a12 a13 a14 a15 ]   [ x1 ]   [ b1 ]
[ a21 a22 a23 a24 a25 ]   [ x2 ]   [ b2 ]
[ a31 a32 a33 a34 a35 ] x [ x3 ] = [ b3 ]
[ a41 a42 a43 a44 a45 ]   [ x4 ]   [ b4 ]
[ a51 a52 a53 a54 a55 ]   [ x5 ]   [ b5 ]
• x4 can be made zero by subtracting from b its projection onto column 4 of A.
• To see this, note x = (A^T A)^(-1) A^T b. The 4th component of A^T b will be zero after the projection
Non-Negativity
• The problem with such techniques is that there is no control over the increase in problem residual. And the residual typically increases dramatically more than with column zeroing.
Non-Negativity
• Simple setting of a variable (and its column) to zero was compelling
• But in what sequence?
• Experimentation suggests that the most effective variable to set to zero is the one which is hardest to adjust to zero by adjusting the RHS… which is usually the most negative value
• We recognize the existence of much more sophisticated non-negativity algorithms. But we question the added value in our ill-conditioned context with other constraints to be added.
Approach
• Perform the autoreg algorithm
• Iteratively:

- Zero the most negative xi (i.e., zero its column of A); in massive problems, zero several at once

- Re-solve the regularized problem using the λ determined originally
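The loop above can be sketched as follows, assuming λ has already been fixed by the initial regularized solve. This zeroes one variable per pass (the most negative), rather than several at once; the function name is mine.

```python
import numpy as np

def nonneg_tikhonov(A, b, lam):
    """Iteratively zero the most negative variable (by dropping its column
    of A) and re-solve the Tikhonov problem with the same, fixed lambda."""
    m, n = A.shape
    active = np.arange(n)              # indices of columns still free
    for _ in range(n):
        k = len(active)
        A_aug = np.vstack([A[:, active], lam * np.eye(k)])
        b_aug = np.concatenate([b, np.zeros(k)])
        xs, *_ = np.linalg.lstsq(A_aug, b_aug, rcond=None)
        if xs.min() >= 0 or k == 1:
            break
        active = np.delete(active, int(np.argmin(xs)))   # zero the most negative
    x = np.zeros(n)
    x[active] = np.clip(xs, 0.0, None)
    return x
```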
Other Constraints - Equality
• Equality constraints can be implemented by subtracting from each regular equation its projection onto each of the equality constraint equations.
• Then the reduced problem can be solved as before.
• The solutions to parts 1 and 2 are added.
• What if the equality constraints are “bad”? That is, redundant, conflicting, or ill-conditioned?
• Our software handles all these cases comparatively easily.
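The projection scheme for equality constraints can be sketched as below. Unlike the presenter's software, this sketch assumes the constraints are well-conditioned and non-conflicting; the function name is mine, and the optional λ term applies Tikhonov regularization to the reduced part only.

```python
import numpy as np

def solve_with_equality(A, b, C, d, lam=0.0):
    """Remove from each regular equation its projection onto the equality
    constraints C x = d (orthonormalized via QR), solve the reduced
    least-squares problem, and add the two partial solutions."""
    Q, R = np.linalg.qr(C.T)            # columns of Q span the rows of C
    xc = Q @ np.linalg.solve(R.T, d)    # minimum-norm solution of C x = d
    A_red = A - (A @ Q) @ Q.T           # rows deflated against the constraints
    b_red = b - A @ xc                  # right-hand side carried along
    if lam > 0:                         # optional Tikhonov term on the reduced part
        n = A.shape[1]
        A_red = np.vstack([A_red, lam * np.eye(n)])
        b_red = np.concatenate([b_red, np.zeros(n)])
    z, *_ = np.linalg.lstsq(A_red, b_red, rcond=None)
    return xc + z                       # part 1 (constraints) + part 2 (reduced)
```

The minimum-norm least-squares solve keeps the correction z orthogonal to the constraint row space, so C(xc + z) = d holds exactly.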
Other Constraints – Inequality (available in RMatrix)
• Inequality constraints can be implemented by iteratively selecting certain of the inequality constraints to be advanced to equality status.
• Reasoning as for non-negativity, the natural choice of next inequality constraint to advance seems to be the most-violated constraint.
• The algorithm is to do an initial solution to pick lambda, then iteratively pick an inequality constraint to advance to equality status, and re-solve using the equality-constraint algorithm.
• Some theoretical questions about this process arise. These have not been adequately studied.
• What if the inequality constraints are “bad”? That is, redundant, conflicting, or ill-conditioned?
• Our software handles all these cases. Basically, the first constraint selected overrides others which conflict or are redundant.
Beyond Tikhonov
• Tikhonov solves

Ax = b

λIx = 0

• But sometimes the engineer wants a matrix other than I, such as a matrix representing the solution’s first, second, or third derivative rather than the solution itself.

Ax = b

λLx = 0, where row 1 of L1 = (-1 1 0 0 0 0 …)

or row 1 of L2 = ( 1 -2 1 0 0 0 …)

or row 1 of L3 = (-1 3 -3 1 0 0 …)
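The stencils above slide along each row of L. A small sketch of building such an L and solving the augmented system (function names are mine):

```python
import numpy as np

def deriv_matrix(n, order):
    """Build the (n - order) x n matrix whose rows slide the stencil along
    the solution vector: 1 -> (-1 1), 2 -> (1 -2 1), 3 -> (-1 3 -3 1)."""
    stencil = {1: [-1, 1], 2: [1, -2, 1], 3: [-1, 3, -3, 1]}[order]
    L = np.zeros((n - order, n))
    for i in range(n - order):
        L[i, i:i + len(stencil)] = stencil
    return L

def tikhonov_general(A, b, L, lam):
    """Solve the augmented system [A; lam*L] x = [b; 0] by least squares."""
    A_aug = np.vstack([A, lam * L])
    b_aug = np.concatenate([b, np.zeros(L.shape[0])])
    x, *_ = np.linalg.lstsq(A_aug, b_aug, rcond=None)
    return x
```

A quick sanity check on the stencils: L1 annihilates constant vectors, L2 annihilates linear ones, and L3 annihilates quadratics, which is why penalizing λLx smooths the corresponding derivative of the solution.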

Beyond Tikhonov
• The efficiency of the SVD when searching for the correct lambda (only one decomposition for all lambdas tried for one solution) is not available for non-identity L matrices.
• So the computation is relatively expensive.
• Non-negativity can be achieved for these cases in the same manner as Tikhonov. But now the computation is quite expensive because many SVDs (or QRs or whatever) must be done for each newly zeroed column.
• The offered software provides all these features.
• Second-derivative L matrix looks very useful.
Caveats
• No automatic method can ever be perfect
• The “bad” oscillations could be right!
• Our heuristics have been tuned on a variety of problems
• Always check the answer for suitability
Software
• Autoreg3 is in beta test
• Packaged as a single *.h file
• Email the author for a copy
• Matlab versions are available
• RMatrix version is in commercial evaluation
• These are NOT public domain
• Web site will be coming