

  1. Newton’s Method applied to a scalar function
  Newton’s method for minimizing f(x): given a twice differentiable function f(x) and an initial solution x0, generate a sequence of solutions x1, x2, … and stop if the sequence converges to a solution with ∇f(x) = 0.
  1. Solve -∇f(xk) ≈ ∇²f(xk) Δx for Δx
  2. Let xk+1 = xk + Δx
  3. Let k = k+1
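  A minimal MATLAB sketch of this iteration (not from the slides), assuming user-supplied handles grad_f and hess_f for the gradient and Hessian, and assumed inputs x0, tol, and maxiter:

    % Newton's method for minimizing f(x); grad_f and hess_f are assumed handles.
    x = x0;
    for k = 1:maxiter
        g = grad_f(x);
        if norm(g) < tol            % stop when the gradient is (nearly) zero
            break;
        end
        dx = -(hess_f(x) \ g);      % solve grad^2 f(x_k) * dx = -grad f(x_k)
        x = x + dx;                 % x_{k+1} = x_k + dx
    end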

  2. Newton’s Method applied to LS
  Newton’s method is not directly applicable to most nonlinear regression and inverse problems (the number of model parameters does not equal the number of data points, and there is no exact solution to G(m)=d). Instead we use Newton’s method to minimize a nonlinear LS problem, e.g. fitting a vector of n parameters to a data vector d:
  f(m) = ∑i=1..m [(G(m)i - di)/σi]²
  Let fi(m) = (G(m)i - di)/σi, i = 1, 2, …, m, and F(m) = [f1(m) … fm(m)]T,
  so that f(m) = ∑i=1..m fi(m)² and ∇f(m) = ∑i=1..m ∇[fi(m)²].
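  As an illustration (not from the slides), the weighted residual vector F(m) and misfit f(m) can be assembled as follows, assuming a hypothetical forward-model handle Gfun and given data and uncertainty vectors d and sigma:

    % Weighted residual vector F(m) and objective f(m); Gfun, d, sigma are assumed.
    Fvec = (Gfun(m) - d) ./ sigma;   % f_i(m) = (G(m)_i - d_i)/sigma_i
    fval = sum(Fvec.^2);             % f(m) = sum_i f_i(m)^2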

  3. NM applied to LS: Solve -∇f(mk) ≈ ∇²f(mk) Δm
  LHS: ∇f(mk)j = ∑i 2 fi(mk) ∇fi(mk)j, i.e. ∇f(mk) = 2 J(mk)TF(mk), so the LHS is -2 J(mk)TF(mk)
  RHS: ∇²f(mk) Δm = [2J(mk)TJ(mk) + Q(mk)] Δm, where Q(m) = 2 ∑i fi(m) ∇²fi(m)
  Writing H(m) = 2J(m)TJ(m) + Q(m): -2 J(mk)TF(mk) = H(mk) Δm, so Δm = -H(mk)-1 2J(mk)TF(mk) = -H(mk)-1 ∇f(mk) (eq. 9.19)
  with fi(m) = (G(m)i - di)/σi, i = 1, 2, …, m, F(m) = [f1(m) … fm(m)]T, and J(m) the Jacobian of F(m).

  4. Gauss-Newton (GN) method
  ∇²f(mk) Δm = H(mk) Δm = [2J(mk)TJ(mk) + Q(mk)] Δm. GN ignores Q(m) = 2 ∑i fi(m) ∇²fi(m), i.e. ∇²f(m) ≈ 2J(m)TJ(m), assuming the residuals fi(m) will be reasonably small as we approach m*.
  That is, to solve -∇f(mk) ≈ ∇²f(mk) Δm with ∇f(m)j = ∑i 2 fi(m) ∇fi(m)j, we solve
  J(mk)TJ(mk) Δm = -J(mk)TF(mk)
  with fi(m) = (G(m)i - di)/σi, i = 1, 2, …, m, F(m) = [f1(m) … fm(m)]T.
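  A minimal MATLAB sketch of a GN loop (an illustration, not the textbook code), assuming hypothetical handles Ffun(m) for the weighted residual vector and Jfun(m) for its Jacobian, plus assumed m0, tol, and maxiter:

    % Gauss-Newton iteration; Ffun, Jfun, m0, tol, maxiter are assumptions.
    m = m0;
    for k = 1:maxiter
        F = Ffun(m);                      % weighted residuals f_i(m)
        J = Jfun(m);                      % Jacobian, J(i,j) = dfi/dmj
        dm = -((J'*J) \ (J'*F));          % solve J'J dm = -J'F
        m = m + dm;
        if norm(dm) < tol*(1 + norm(m))
            break;
        end
    end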

  5. Newton’s Method applied to LS
  The Levenberg-Marquardt (LM) method uses [J(mk)TJ(mk) + λI] Δm = -J(mk)TF(mk)
  λ -> 0: GN
  λ -> large: steepest descent (SD) (down-gradient most rapidly). SD provides slow but certain convergence.
  Which value of λ to use? Small values when GN is working well; switch to larger values in problem areas. Start with a small value of λ, then adjust.
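  A minimal MATLAB sketch of the LM step with a simple λ-adjustment rule (the accept/reject strategy and the factor of 10 are illustrative assumptions, not from the slides); Ffun and Jfun are the same hypothetical handles as above:

    % Levenberg-Marquardt iteration; Ffun, Jfun, m0, lambda0, tol, maxiter are assumptions.
    m = m0;  lambda = lambda0;                 % start with a small lambda
    for k = 1:maxiter
        F = Ffun(m);  J = Jfun(m);
        dm = -((J'*J + lambda*eye(size(J,2))) \ (J'*F));
        if sum(Ffun(m + dm).^2) < sum(F.^2)
            m = m + dm;  lambda = lambda/10;   % step helped: move toward GN
        else
            lambda = lambda*10;                % step failed: move toward SD
        end
        if norm(dm) < tol*(1 + norm(m)), break; end
    end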

  6. Steepest descent (SD)
  Problem: minimize a quadratic function f(x) = 0.5 xTAx − bTx (equivalent to solving Ax = b), where b is in Rn and A is an n x n symmetric positive definite matrix. The gradient of f(x) is Ax − b, so xk+1 = xk − αk(Axk − b), where αk is chosen to minimize f(x) along the direction of the negative gradient.

  7. Thus for the quadratic case, the exact line search has a closed form: with rk = b − Axk, setting d/dα f(xk + αrk) = 0 gives αk = rkTrk / (rkTArk).
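  A minimal MATLAB sketch of SD for the quadratic case using this exact step length (A, b, x0, tol, maxiter are assumed inputs):

    % Steepest descent for f(x) = 0.5 x'Ax - b'x, A symmetric positive definite.
    x = x0;
    for k = 1:maxiter
        r = b - A*x;                 % negative gradient, -(Ax - b)
        if norm(r) < tol, break; end
        alpha = (r'*r) / (r'*A*r);   % exact line search for the quadratic case
        x = x + alpha*r;
    end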

  8. Alternative: line search

  9. Example of slow convergence for SD
  Rosenbrock function: f(x1, x2) = (1 − x1)² + 100 (x2 − x1²)²
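  A small MATLAB sketch (not from the slides) that evaluates the Rosenbrock function and its gradient and runs plain gradient descent with a fixed small step to illustrate the slow progress; the starting point (-1.2, 1), the step size, and the iteration count are assumptions:

    % Rosenbrock function, its gradient, and a fixed-step descent demonstration.
    rosen  = @(x) (1 - x(1))^2 + 100*(x(2) - x(1)^2)^2;
    grosen = @(x) [-2*(1 - x(1)) - 400*x(1)*(x(2) - x(1)^2); 200*(x(2) - x(1)^2)];
    x = [-1.2; 1];                   % assumed starting point
    for k = 1:5000
        x = x - 1e-3*grosen(x);      % fixed small step instead of a line search
    end
    fprintf('f = %g at x = (%g, %g)\n', rosen(x), x(1), x(2));
    % x creeps only slowly along the curved valley toward the minimum at (1, 1)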

  10. Statistics of iterative methods
  Cov(Ad) = A Cov(d) AT (d has a multivariate normal distribution)
  Cov(mL2) = (GTG)-1GT Cov(d) G(GTG)-1
  If Cov(d) = σ²I: Cov(mL2) = σ²(GTG)-1
  However, we don’t have a linear relationship between data and estimated model parameters for nonlinear regression, so we cannot use these formulas. Instead:
  F(m* + Δm) ≈ F(m*) + J(m*)Δm
  Cov(m*) ≈ (J(m*)TJ(m*))-1
  This is not exact due to the linearization, so confidence intervals may not be accurate.
  With ri = G(m*)i − di, i = 1, …, m, and s = [∑ ri²/(m − n)]^(1/2):
  Cov(m*) = s²(J(m*)TJ(m*))-1, used to establish confidence intervals.
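  A small MATLAB sketch of these estimates at the converged solution m*, where Jstar, r, and n are assumed to hold the Jacobian J(m*), the residual vector, and the number of model parameters; the 1.96 factor for approximate 95% intervals is an added assumption:

    % Approximate covariance and confidence intervals at the converged m*.
    s2   = sum(r.^2) / (length(r) - n);    % s^2 = sum r_i^2 / (m - n)
    Covm = s2 * inv(Jstar'*Jstar);         % Cov(m*) = s^2 (J'J)^-1
    ci   = 1.96 * sqrt(diag(Covm));        % approximate 95% intervals: m* +/- ci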

  11. Implementation Issues
  1. Explicit (analytical) expressions for derivatives
  2. Finite difference approximation for derivatives
  3. When to stop iterating? ∇f(m) ≈ 0, ||mk+1 − mk|| ≈ 0, |f(mk+1) − f(mk)| ≈ 0 (eqs 9.47-9.49); see the sketch below
  4. Multistart method to optimize globally
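  A sketch of how the three stopping tests (eqs 9.47-9.49) might be coded; the tolerance names and the relative scaling are assumptions:

    % Stop when the gradient, the model update, and the misfit change are all small.
    if norm(grad_f) < tol_g && ...
       norm(m_new - m_old) < tol_m*(1 + norm(m_old)) && ...
       abs(f_new - f_old) < tol_f*(1 + abs(f_old))
        converged = true;
    end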

  12. Iterative Methods
  SVD is impractical when the matrix has 1000s of rows and columns: e.g., for a 256 x 256 cell tomography model with 100,000 ray paths (< 1% ray hits), the system matrix needs ~50 Gb of storage, U ~ 80 Gb, V ~ 35 Gb.
  This is a waste of storage when the system matrix is sparse.
  Iterative methods do not store the system matrix. They provide an approximate solution, rather than the exact one from Gaussian elimination.
  Definition of an iterative method: from a starting point x0, take steps x1, x2, …, hopefully converging toward the right solution x.

  13. Iterative Methods
  Kaczmarz’s algorithm: each of the m rows of G, Gi.m = di, defines a hyperplane in Rn.
  Project the initial solution m(0) onto the hyperplane defined by the first row of G.
  Project the m(1) solution onto the hyperplane defined by the second row of G, … until projections have been done onto all hyperplanes.
  Start a new cycle of projections until convergence.
  If Gm = d has a unique solution, Kaczmarz’s algorithm will converge towards it.
  If there are several solutions, it will converge toward the solution closest to m(0).
  If m(0) = 0, we obtain the minimum-length solution.
  If there is no exact solution, convergence fails, bouncing around near an approximate solution.
  If the hyperplanes are nearly orthogonal, convergence is fast; if they are nearly parallel, convergence is slow.

  14. Kaczmarz’s algorithm: Let Gi. be the ith row of G and consider the hyperplane defined by Gi+1. m = di+1. Since GTi+1. is perpendicular to this hyperplane, the update to m(i) from the constraint due to row i+1 of G will be proportional to GTi+1.:
  m(i+1) = m(i) + βGTi+1.
  Since Gi+1. m(i+1) = di+1, then Gi+1. (m(i) + βGTi+1.) = di+1, so
  Gi+1. m(i) − di+1 = −β Gi+1. GTi+1.
  β = −[Gi+1. m(i) − di+1] / [Gi+1. GTi+1.]

  15. Kaczmarz’s algorithm:
  1) Let m(0) = 0
  2) For i = 0, 1, …, m−1, let m(i+1) = m(i) − GTi+1. [Gi+1. m(i) − di+1] / [Gi+1. GTi+1.]
  3) If the solution has not yet converged, set m(0) = m(m) and go back to step 2

  16. Kaczmarz’s Method [figure: the two hyperplanes (lines) y = x − 1 and y = 1 in the (x, y) plane]

  17. ART algorithm: Specialized for tomography
  Kaczmarz: m(i+1) = m(i) − GTi+1. [Gi+1. m(i) − di+1] / [Gi+1. GTi+1.]
  Thus we are always adding a multiple of a row of G to the current solution. [Gi+1. m(i) − di+1] / [Gi+1. GTi+1.] is a normalized error in equation i+1, and the correction is spread over the elements of m appearing in equation i+1.
  An often used approximation in ART is to replace all non-zero elements in row i+1 of G with ones, thus smearing the needed traveltime correction equally over all cells in ray path i+1.
  SIRT Algorithm: ART with higher accuracy, each cell updated separately.

  18. Kaczmarz’s Algorithm
    function x=kac(A,b,tolx,maxiter)
    % Kaczmarz's algorithm for Gm=d (here Ax=b).
    [m,n]=size(A);
    AP=A';                         % columns of A' are the rows of A
    x=zeros(n,1);
    iter=0;
    n2=zeros(m,1);
    for i=1:m
      n2(i)=norm(AP(:,i),2)^2;     % squared norm of each row of A
    end
    while (iter <= maxiter)
      iter=iter+1;
      newx=x;
      for i=1:m
        newx=newx-((newx'*AP(:,i)-b(i))/(n2(i)))*AP(:,i);   % project onto hyperplane i
      end
      if (norm(newx-x)/(1+norm(x)) < tolx)
        x=newx;
        return;
      end
      x=newx;
    end
    disp('Max iterations exceeded.');
    return;

  19. SIRT Algorithm
    function x=sirt(A,b,tolx,maxiter)
    alpha=1.0;                     % relaxation parameter
    [m,n]=size(A);
    A1=(A>0);                      % indicator of nonzero entries (cells on each ray)
    AP=A';
    A1P=A1';
    x=zeros(n,1);
    iter=0;
    N=zeros(m,1);                  % number of cells hit by each ray
    L=zeros(m,1);                  % length of each ray path
    NRAYS=zeros(n,1);              % number of rays through each cell
    for i=1:m
      N(i)=sum(A1P(:,i));
      L(i)=sum(AP(:,i));
    end
    for i=1:n
      NRAYS(i)=sum(A1(:,i));
    end
    while (1==1)
      iter=iter+1;
      if (iter > maxiter)
        disp('Max iterations exceeded.');
        return;
      end
      newx=x;
      deltax=zeros(n,1);
      for i=1:m
        q=A1P(:,i)'*newx;
        delta=b(i)/L(i)-q/N(i);              % normalized error for ray i
        deltax=deltax+delta*A1P(:,i);        % accumulate corrections from all rays
      end
      newx=newx+alpha*deltax./NRAYS;         % apply averaged correction to each cell
      if (norm(newx-x)/(1+norm(x)) < tolx)
        x=newx;
        return;
      end
      x=newx;
    end

  20. ART Algorithm
    function x=art(A,b,tolx,maxiter)
    alpha=1.0;                     % relaxation parameter
    [m,n]=size(A);
    A1=(A>0);                      % indicator of nonzero entries (cells on each ray)
    AP=A';
    A1P=A1';
    x=zeros(n,1);
    iter=0;
    N=zeros(m,1);                  % number of cells hit by each ray
    L=zeros(m,1);                  % length of each ray path
    for i=1:m
      N(i)=sum(A1(i,:));
      L(i)=sum(A(i,:));
    end
    while (1==1)
      iter=iter+1;
      if (iter > maxiter)
        disp('Max iterations exceeded.');
        return;
      end
      newx=x;
      for i=1:m
        q=A1P(:,i)'*newx;
        delta=b(i)/L(i)-q/N(i);              % normalized error for ray i
        newx=newx+alpha*delta*A1P(:,i);      % smear correction over cells on ray i
      end
      if (norm(newx-x)/(1+norm(x)) < tolx)
        x=newx;
        return;
      end
      x=newx;
    end

  21. [figures: the true model and reconstructions from Kaczmarz, ART, and SIRT (all similar)]

  22. Conjugate Gradients Method
  Symmetric, positive definite system of equations Ax = b:
  min φ(x) = min(1/2 xTAx − bTx)
  ∇φ(x) = Ax − b = 0, or Ax = b
  CG method: construct a basis p0, p1, …, pn−1 such that piTApj = 0 when i ≠ j. Such a basis is mutually conjugate with respect to A. Walk only once in each direction and minimize: x = ∑i αipi, so a maximum of n steps is required!
  φ(α) = 1/2 [∑i αipi]TA[∑i αipi] − bT[∑i αipi] = 1/2 ∑i∑j αiαj piTApj − bT[∑i αipi]  (summations run over i, j = 0, …, n−1)

  23. Conjugate Gradients Method
  piTApj = 0 when i ≠ j: such a basis is mutually conjugate with respect to A (the pi are said to be ‘A-orthogonal’).
  φ(α) = 1/2 ∑i∑j αiαj piTApj − bT[∑i αipi] = 1/2 ∑i αi² piTApi − bT[∑i αipi] = 1/2 ∑i (αi² piTApi − 2 αi bTpi), i.e. n independent terms.
  Thus we minimize φ(α) by minimizing each ith term αi² piTApi − 2 αi bTpi: differentiate with respect to αi and set the derivative to zero, giving αi = bTpi / piTApi.
  I.e., IF we have a mutually conjugate basis, it is easy to minimize φ(α).

  24. Conjugate Gradients Method
  CG constructs sequences of xi, ri = b − Axi, and pi.
  Start: x0 = 0, r0 = b, p0 = r0, α0 = r0Tr0/p0TAp0.
  Assume at the kth iteration we have x0, x1, …, xk; r0, r1, …, rk; p0, p1, …, pk; α0, α1, …, αk.
  Assume the first k+1 basis vectors pi are mutually conjugate with respect to A, the first k+1 ri are mutually orthogonal, and riTpj = 0 for i ≠ j.
  Let xk+1 = xk + αkpk and rk+1 = rk − αkApk, which updates r correctly, since:
  rk+1 = b − Axk+1 = b − A(xk + αkpk) = (b − Axk) − αkApk = rk − αkApk

  25. Conjugate Gradients Method
  Let xk+1 = xk + αkpk and rk+1 = rk − αkApk
  βk+1 = rk+1Trk+1/rkTrk
  pk+1 = rk+1 + βk+1pk
  bTpk = rkTrk (eqs 6.33-6.38)
  Now we need proof of the assumptions:
  rk+1 is orthogonal to ri for i ≤ k (eqs 6.39-6.43)
  rk+1Trk = 0 (eqs 6.44-6.48)
  rk+1 is orthogonal to pi for i ≤ k (eqs 6.49-6.54)
  pk+1TApi = 0 for i ≤ k (eqs 6.55-6.60)
  i = k: pk+1TApk = 0, i.e. CG generates a mutually conjugate basis

  26. Conjugate Gradients Method
  Thus we have shown that CG generates a sequence of mutually conjugate basis vectors. In theory, the method will find an exact solution in n iterations.
  Given a positive definite, symmetric system of equations Ax = b and an initial solution x0, let β0 = 0, p−1 = 0, r0 = b − Ax0, k = 0:
  1. If k > 0, let βk = rkTrk / rk−1Trk−1
  2. Let pk = rk + βkpk−1
  3. Let αk = rkTrk / pkTApk
  4. Let xk+1 = xk + αkpk
  5. Let rk+1 = rk − αkApk
  6. Let k = k+1
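  A minimal MATLAB sketch of this iteration in the style of the earlier codes (A, b, tolx, maxiter are assumed inputs); an illustration, not the textbook implementation:

    function x=cg(A,b,tolx,maxiter)
    % Conjugate gradients for a symmetric positive definite system Ax=b (sketch).
    n=length(b);
    x=zeros(n,1);
    r=b-A*x;                    % r_0 = b - A x_0
    p=zeros(n,1);               % p_{-1} = 0
    rtr=r'*r;
    for k=0:maxiter
      if (sqrt(rtr) < tolx)
        return;
      end
      if (k>0)
        beta=rtr/rtrold;        % beta_k = r_k'r_k / r_{k-1}'r_{k-1}
      else
        beta=0;
      end
      p=r+beta*p;               % p_k = r_k + beta_k p_{k-1}
      Ap=A*p;
      alpha=rtr/(p'*Ap);        % alpha_k = r_k'r_k / p_k'A p_k
      x=x+alpha*p;              % x_{k+1} = x_k + alpha_k p_k
      r=r-alpha*Ap;             % r_{k+1} = r_k - alpha_k A p_k
      rtrold=rtr;
      rtr=r'*r;
    end
    disp('Max iterations exceeded.');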

  27. Conjugate Gradients Least Squares Method
  CG can only be applied to positive definite systems of equations, so it is not applicable to general LS problems. Instead, we can apply the CGLS method to min ||Gm − d||2, i.e. to the normal equations GTGm = GTd.

  28. Conjugate Gradients Least Squares Method
  Apply CG to the normal equations GTGm = GTd:
  rk = GTd − GTGmk = GT(d − Gmk) = GTsk
  sk+1 = d − Gmk+1 = d − G(mk + αkpk) = (d − Gmk) − αkGpk = sk − αkGpk
  Given a system of equations Gm = d, let k = 0, m0 = 0, p−1 = 0, β0 = 0, s0 = d − Gm0 = d, r0 = GTs0:
  1. If k > 0, let βk = rkTrk / [rk−1Trk−1]
  2. Let pk = rk + βkpk−1
  3. Let αk = rkTrk / [Gpk]T[Gpk]
  4. Let mk+1 = mk + αkpk
  5. Let sk+1 = sk − αkGpk
  6. Let rk+1 = GTsk+1
  7. Let k = k+1
  Note that we never compute GTG, only the products Gpk and GTsk+1.
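  A minimal MATLAB sketch of CGLS in the same style (G, d, tolx, maxiter are assumed inputs); note that only the products G*p and G'*s are formed, never G'*G:

    function m=cgls(G,d,tolx,maxiter)
    % Conjugate gradient least squares for min ||Gm-d||_2 (illustrative sketch).
    n=size(G,2);
    m=zeros(n,1);
    p=zeros(n,1);               % p_{-1} = 0
    s=d-G*m;                    % s_0 = d - G m_0
    r=G'*s;                     % r_0 = G' s_0
    rtr=r'*r;
    for k=0:maxiter
      if (sqrt(rtr) < tolx)
        return;
      end
      if (k>0)
        beta=rtr/rtrold;        % beta_k = r_k'r_k / r_{k-1}'r_{k-1}
      else
        beta=0;
      end
      p=r+beta*p;               % p_k = r_k + beta_k p_{k-1}
      Gp=G*p;
      alpha=rtr/(Gp'*Gp);       % alpha_k = r_k'r_k / (G p_k)'(G p_k)
      m=m+alpha*p;              % m_{k+1} = m_k + alpha_k p_k
      s=s-alpha*Gp;             % s_{k+1} = s_k - alpha_k G p_k
      r=G'*s;                   % r_{k+1} = G' s_{k+1}
      rtrold=rtr;
      rtr=r'*r;
    end
    disp('Max iterations exceeded.');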
