
Mathematical Analysis of MaxEnt for Mixed Pixel Decomposition



  1. Mathematical Analysis of MaxEnt for Mixed Pixel Decomposition. Lidan Miao, AICIP Group Meeting, Feb. 23, 2006

  2. Motivation • Why do we choose maximum entropy? • What role does the maximum entropy play in the algorithm? • In what sense does the algorithm converge to the optimal solution? (Is there anything else beyond the maximum entropy?)

  3. Decomposition Problem • Decompose a mixture into its constituent materials (assumed known in this presentation) and their proportions • Mixing model: given A and x, find s • A linear regression problem • Physical constraints • Non-negativity: every component of s is nonnegative • Sum-to-one: the components of s sum to 1 (a simulation sketch of the model follows below)
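The mixing model on this slide is the standard linear model x = As + n, with A the matrix of endmember signatures, s the abundance vector, and n noise. A minimal numpy sketch of the forward model under these constraints (all sizes and the noise level are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

m, p = 50, 3            # m spectral bands, p endmembers (illustrative sizes)
A = rng.random((m, p))  # columns of A = endmember signatures (assumed known)

s_true = np.array([0.2, 0.5, 0.3])   # abundances: nonnegative, sum to one
snr_db = 30.0                        # hypothetical noise level

x_clean = A @ s_true                 # linear mixing model x = A s + n
noise_var = np.mean(x_clean**2) / 10**(snr_db / 10)
x = x_clean + rng.normal(scale=np.sqrt(noise_var), size=m)

# Decomposition problem: given A and x, recover s subject to
#   s >= 0  and  sum(s) == 1.
```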

  4. QP & FCLS[1] • Quadratic programming (QP) • Nonlinear optimization • Computationally expensive • Fully constrained least squares (FCLS) • Integrates SCLS (sum-to-one constrained LS) and NCLS (nonnegativity constrained LS) • NCLS is based on the standard NNLS algorithm • Open question: is the least squares estimate the best in all cases? (An FCLS-style solver is sketched after this list.)
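A common way to approximate FCLS in practice is to fold the sum-to-one constraint into an NNLS problem by appending a heavily weighted row of ones. This is a sketch of that augmentation trick, not the exact iterative scheme of [1]; the weight delta is an illustrative parameter:

```python
import numpy as np
from scipy.optimize import nnls

def fcls(A, x, delta=1e3):
    """Approximate fully constrained least squares.

    Appends a weighted sum-to-one row to A and solves the resulting
    nonnegative least squares problem with NNLS, so both physical
    constraints are enforced (sum-to-one only approximately, with the
    tightness controlled by delta).
    """
    m, p = A.shape
    A_aug = np.vstack([A, delta * np.ones((1, p))])  # extra row: delta * 1^T
    x_aug = np.append(x, delta)                      # matching target: delta
    s, _ = nnls(A_aug, x_aug)                        # s >= 0 by construction
    return s
```

Larger values of delta enforce the sum-to-one constraint more tightly, at some cost in numerical conditioning.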

  5. Geometric Illustration • Mixing model and convex combination • Unconstrained least squares solves a linear combination problem • Constrained least squares (QP and FCLS) • Sum-to-one: on the line connecting a1 and a2 • Nonnegativity: in the cone C determined by a1 and a2 • Feasible set of the convex combination: the line segment a1a2 (in symbols below)
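In symbols, the geometry described on this slide is that of the probability simplex; with p = 2 endmembers, the image of the feasible set is exactly the segment between the endmember signatures:

```latex
\[
  \Delta \;=\; \{\, s \in \mathbb{R}^{p} : s \ge 0,\ \mathbf{1}^{\mathsf T} s = 1 \,\},
  \qquad
  \{\, A s : s \in \Delta \,\} \;=\; \operatorname{conv}\{a_1, \dots, a_p\},
\]
which for $p = 2$ is the line segment joining $a_1$ and $a_2$.
```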

  6. MaxEnt[2] • Objective function: maximize entropy (equivalently, minimize the negative relative entropy) • Optimization method: penalty function method • Limitations • Low convergence rate: theoretically, the penalty weight K_k must go to infinity, and for each K_k, s has no closed-form solution, so a numerical method (gradient descent) is needed • Low performance when SNR is high: it can never fit the measurement model exactly, since K_k cannot actually reach infinity (a penalty-method sketch follows below)
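A minimal sketch of the penalty-method MaxEnt described here, assuming the penalized objective -H(s) + K_k ||As - x||^2 with an increasing schedule; the softmax parameterization used to keep s feasible and the step-size rule are assumptions made for compactness, not necessarily the scheme of [2]:

```python
import numpy as np

def maxent_penalty(A, x, K_schedule=(1.0, 10.0, 100.0, 1000.0),
                   n_inner=500, eta=1e-2):
    """Penalty-method MaxEnt sketch (after the idea in [2]).

    Minimizes  -H(s) + K * ||A s - x||**2  over the simplex for an
    increasing penalty schedule K.  Feasibility (s >= 0, sum(s) == 1)
    is maintained by parameterizing s = softmax(z).
    """
    p = A.shape[1]
    z = np.zeros(p)                          # softmax(0) = equal abundances
    for K in K_schedule:                     # exact fit needs K -> infinity
        for _ in range(n_inner):
            s = np.exp(z - z.max()); s /= s.sum()
            r = A @ s - x                    # measurement-model residual
            g_s = np.log(s) + 1.0 + 2.0 * K * (A.T @ r)   # d(obj)/ds
            g_z = s * (g_s - s @ g_s)        # chain rule through softmax
            z -= (eta / K) * g_z             # shrink step as K grows
    s = np.exp(z - z.max())
    return s / s.sum()
```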

  7. Gradient Descent MaxEnt • Optimization formulation: minimize the negative entropy • Optimization method • Lagrange multiplier method • Gradient descent learning (the resulting update is sketched below)
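Putting the two bullets together, the GDME iterate keeps s in maximum-entropy form while the multiplier is learned by gradient steps on the residual. The step size, sign conventions, and stopping tolerance below are assumptions consistent with slides 8, 9, and 12:

```python
import numpy as np

def gdme(A, x, eta=0.1, n_iter=2000, tol=1e-8):
    """Gradient Descent MaxEnt (GDME) sketch.

    s always has the maximum-entropy (Gibbs) form: s_j is proportional
    to exp(a_j . lam), so the exponential keeps every component
    positive and the denominator only normalizes.  The multiplier lam
    is driven by the residual x - A s, so after the first step lam is
    exactly the scaled error vector, as slide 9 notes.
    """
    m, p = A.shape
    lam = np.zeros(m)
    s = np.full(p, 1.0 / p)              # initialization: equal abundances
    for _ in range(n_iter):
        lam += eta * (x - A @ s)         # multiplier step: scaled error vector
        u = A.T @ lam                    # inner products a_j . lam drive s
        s_new = np.exp(u - u.max())      # exponential: nonnegative components
        s_new /= s_new.sum()             # denominator: normalization only
        if np.linalg.norm(s_new - s) < tol:   # "s relatively stable"
            return s_new
        s = s_new
    return s
```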

  8. Convergence Analysis of GDME • Initialization: equal abundances (the negative-entropy minimizer) • Lambda warps the original objective function to fit the data measurement model • The search stays inside the feasible set • Example: true s1 = 0.1619, estimate = 0.1638 • Like MaxEnt[2], the solution is obtained by warping the objective function • Unlike MaxEnt[2], the warping force is a vector instead of a scalar, so it need not go to infinity to fit the measurement model

  9. Convergence Analysis (cont.) • The key is the inner product • Take the first iteration as an example • The multiplier is the scaled error vector • s depends on the inner product of A and lambda • The denominator of s is only for normalization • The exponential function is used to generate nonnegative numbers (the first iteration is written out below)
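Written out, the first iteration described by these bullets looks as follows; the step size $\eta$ and the sign conventions are assumptions matching the sketch above:

```latex
\[
  s^{(0)}_j = \frac{1}{p}, \qquad
  \lambda^{(1)} = \eta\,\bigl(x - A s^{(0)}\bigr), \qquad
  s^{(1)}_j
    = \frac{\exp\!\bigl(a_j^{\mathsf T}\lambda^{(1)}\bigr)}
           {\sum_{k=1}^{p}\exp\!\bigl(a_k^{\mathsf T}\lambda^{(1)}\bigr)},
\]
so the multiplier is the scaled error vector, each component of
$s^{(1)}$ is driven by the inner product $a_j^{\mathsf T}\lambda^{(1)}$,
the exponential keeps it positive, and the denominator only normalizes.
```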

  10. Convergence Analysis (cont.) • The 2D case is simple for visualization • [Figure: where does the iterate move, and toward what objective? Proof sketch on slide]

  11. Convergence Rate

  12. Stopping Conditions • FCLS: stop when all components of s are nonnegative • Generates the least squares solution, minimizing ||As - x|| • When SNR is low, the algorithm overfits the noise • MaxEnt[2]: the same condition as FCLS • Its solution is not least squares; it can never fit the data perfectly • GDME: stop when s is relatively stable • Able to give the least squares solution • The solution lies somewhere between equal abundances and the least squares estimate, as determined by the stopping condition (a concrete test is sketched below)
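One concrete form of the GDME rule "stop when s is relatively stable", with an illustrative norm and tolerance; the slides leave the exact test, and its relationship to SNR, open:

```python
import numpy as np

def is_stable(s_new, s_old, tol=1e-6):
    """Relative-change stopping test for GDME (illustrative choices)."""
    return (np.linalg.norm(s_new - s_old)
            / max(np.linalg.norm(s_old), 1e-12)) < tol
```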

  13. Experimental Results • MaxEnt[2] is too slow to apply to a large image, so the simple data from ref [2] are used • Two groups of tests • 250 mixed pixels with randomly generated abundances • Results averaged over 50 runs • Parameters: 500, 4

  14. Experimental Results (cont.) • Applied to synthetic hyperspectral images • Metrics: ARMSE, AAD, AID (abundance root-mean-square error, angle distance, and information divergence; sketched below)
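Sketches of the three metrics, assuming the definitions standard in the unmixing literature (the slide does not spell them out):

```python
import numpy as np

def abundance_metrics(S_true, S_est, eps=1e-12):
    """ARMSE / AAD / AID for abundance matrices of shape (n_pixels, p)."""
    # ARMSE: per-pixel root-mean-square abundance error, averaged over pixels
    armse = np.mean(np.sqrt(np.mean((S_true - S_est) ** 2, axis=1)))
    # AAD: angle between true and estimated abundance vectors
    cos = np.sum(S_true * S_est, axis=1) / (
        np.linalg.norm(S_true, axis=1) * np.linalg.norm(S_est, axis=1) + eps)
    aad = np.mean(np.arccos(np.clip(cos, -1.0, 1.0)))
    # AID: symmetric KL divergence, treating abundances as distributions
    P = S_true + eps; P /= P.sum(axis=1, keepdims=True)
    Q = S_est + eps;  Q /= Q.sum(axis=1, keepdims=True)
    aid = np.mean(np.sum(P * np.log(P / Q) + Q * np.log(Q / P), axis=1))
    return armse, aad, aid
```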

  15. Summary of GDME • The same target as QP and FCLS, i.e., min ||As - x|| • The maximum entropy formulation incorporates the two constraints through the exponential function and normalization • Does maximum entropy really play a role? • With a carefully selected stopping condition, GDME on average generates better performance in terms of abundance estimation • The convergence rate is faster than QP and MaxEnt[2] and similar to FCLS (based on experiments) • GDME is more flexible, showing strong robustness in low-SNR cases • GDME performs better when the source vectors are close to each other

  16. Future Work • Speed up the learning algorithm • Investigate optimal stopping conditions (what is the relationship between SNR and the stopping condition?) • Study the performance w.r.t. the number of constituent materials

  17. References • [1] D. C. Heinz and C.-I Chang, "Fully constrained least squares linear spectral mixture analysis method for material quantification in hyperspectral imagery," IEEE Trans. Geosci. Remote Sensing, vol. 39, no. 3, pp. 529-545, 2001. • [2] S. Chettri and N. Netanyahu, "Spectral unmixing of remotely sensed imagery using maximum entropy," in Proc. SPIE, vol. 2962, pp. 55-62, 1997.
