4. Method of Steepest Descent

PowerPoint Slideshow about ' 4. Method of Steepest Descent ' - marlin

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript

4. Method of Steepest Descent

  • There are two problems associated with Wiener filtering in practical applications.

    • The matrix inversion operation is difficult to implement.

    • The correlation matrix R and the cross-correlation vector P may not be easy to estimate.

  • To overcome the first problem, we may compute the Wiener solution iteratively.

  • Consider an optimization problem.

  • The simplest procedure for solving the optimization problem iteratively is called the method of steepest descent.

  • Method of steepest descent (SD):

    • Initial guess: w(0)

    • Compute the gradient vector $\nabla J(w(i))$

    • Update w(i) by taking a step in the direction opposite to the gradient

    • Repeat the process (i = i+1)

  • The parameter $\mu$ is called the step size. It controls the rate of convergence.

  • For complex signals, we have

  • Then the update equation for the SD method is
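The equations for this part of the slides did not survive extraction. A reconstruction under the usual Wiener-filter setup is sketched below. The symbol $\sigma_d^2$ (variance of the desired signal) and the factor $\tfrac{1}{2}$ in the update are assumptions, not taken from the slides. The gradient is chosen to agree with the relation $\nabla J = 2Rc$ that the transcript appears to state later.

```latex
\begin{aligned}
J(w) &= \sigma_d^2 - w^H P - P^H w + w^H R w, \\
\nabla J(w) &= -2P + 2Rw, \\
w(i+1) &= w(i) - \tfrac{1}{2}\mu\,\nabla J\bigl(w(i)\bigr)
        = w(i) + \mu\,[\,P - R\,w(i)\,].
\end{aligned}
```

If the slides instead used $w(i+1) = w(i) - \mu\nabla J$, every step-size bound that follows would be halved.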

  • The SD method is a recursive algorithm, so it can become unstable.
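The recursion and its possible instability can be sketched numerically. The snippet below is a minimal illustration, assuming the update takes the form w(i+1) = w(i) + mu (P - R w(i)) and using real-valued signals. The matrix `R`, vector `P`, and both step sizes are hypothetical values chosen only for the demonstration.

```python
import numpy as np

def steepest_descent(R, P, mu, n_iter, w0=None):
    """Iterate w(i+1) = w(i) + mu * (P - R w(i)) and return every iterate."""
    w = np.zeros_like(P, dtype=float) if w0 is None else np.asarray(w0, dtype=float)
    history = [w.copy()]
    for _ in range(n_iter):
        w = w + mu * (P - R @ w)    # step along the negative gradient
        history.append(w.copy())
    return np.array(history)

# Hypothetical 2-tap example (assumed values, not from the slides).
R = np.array([[1.0, 0.5],
              [0.5, 1.0]])          # eigenvalues 0.5 and 1.5
P = np.array([0.5, 0.25])
w_opt = np.linalg.solve(R, P)       # Wiener solution, used only as a reference

stable = steepest_descent(R, P, mu=0.5, n_iter=200)
unstable = steepest_descent(R, P, mu=1.5, n_iter=200)   # larger than 2 / 1.5

print(np.allclose(stable[-1], w_opt))
print(np.linalg.norm(unstable[-1] - w_opt) > np.linalg.norm(unstable[0] - w_opt))
```

With the smaller step size the iterates settle on the Wiener solution. With the larger one the error grows, because the mode belonging to the largest eigenvalue is multiplied by a factor of magnitude greater than one at every step.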

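  • Let $c(i) = w(i) - w_{opt}$.

  • The weight update equation can be written as

  • Since R is the correlation matrix, $R = Q \Lambda Q^H$ with $Q^H Q = I$, where $\Lambda$ is the diagonal matrix of eigenvalues.

  • Let $v(i) = Q^H c(i)$. Then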

  • For the k-th component of v(i), we have

  • Thus, for $v_k(i)$ to converge, it is necessary that

  • Since all eigenvalues are nonnegative,

  • To ensure that every mode converges, we need
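The equations of this stability argument were lost in extraction. The steps can be reconstructed as follows, assuming the update $w(i+1) = w(i) + \mu[P - Rw(i)]$ (an assumed convention; a different scaling of the gradient rescales the bound), with $c(i) = w(i) - w_{opt}$, $R w_{opt} = P$, and $\Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_M)$:

```latex
\begin{aligned}
c(i+1) &= (I - \mu R)\,c(i), \qquad R = Q \Lambda Q^H, \\
v(i+1) &= (I - \mu \Lambda)\,v(i), \qquad v(i) = Q^H c(i), \\
v_k(i) &= (1 - \mu\lambda_k)^i\, v_k(0), \\
|1 - \mu\lambda_k| &< 1 \;\Longrightarrow\; 0 < \mu < \frac{2}{\lambda_k}
  \quad (\lambda_k > 0), \\
0 < \mu &< \frac{2}{\lambda_{\max}} \quad \text{(every mode converges).}
\end{aligned}
```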

  • For the j-th component of w(i), we have

  • Let $\tau_a$ be the time constant of $w_j(i)$. Then

  • The convergence speed is limited by $\lambda_{\min}$. However, we can adjust the step size so that the mode corresponding to $\lambda_{\max}$ converges quickly.

  • We conclude that the factor controlling the rate of convergence is the eigenvalue spread $\lambda_{\max}/\lambda_{\min}$. The smaller the eigenvalue spread, the faster the convergence we can achieve.
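The effect of the eigenvalue spread can be checked with a small experiment. The sketch below works directly in the decoupled coordinates and assumes each mode evolves as v_k(i+1) = (1 - mu lambda_k) v_k(i). The step size mu = 1/lambda_max, the tolerance, and the eigenvalue sets are hypothetical choices for illustration.

```python
import numpy as np

def iterations_to_converge(eigenvalues, tol=1e-6, max_iter=100_000):
    """Count SD iterations until every decoupled mode has decayed below tol.

    Mode k is assumed to evolve as v_k(i+1) = (1 - mu * lambda_k) * v_k(i).
    """
    lam = np.asarray(eigenvalues, dtype=float)
    mu = 1.0 / lam.max()            # assumed step size, inside the stable range
    v = np.ones_like(lam)           # unit initial error in every mode
    for i in range(max_iter):
        if np.max(np.abs(v)) < tol:
            return i
        v = (1.0 - mu * lam) * v
    return max_iter

small_spread = iterations_to_converge([1.0, 2.0])      # eigenvalue spread 2
large_spread = iterations_to_converge([1.0, 100.0])    # eigenvalue spread 100
print(small_spread, large_spread)
```

In both runs the mode of the largest eigenvalue dies out immediately, and the iteration count is set entirely by the slowest mode, whose decay factor is 1 - lambda_min/lambda_max.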

  • The MSE can be analyzed similarly.

  • If the step size is properly chosen,

  • The curve obtained by plotting J[w(i)] versus i is called the learning curve. The time constant associated with the k-th mode is
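The MSE expressions were lost in extraction. A reconstruction is sketched below, assuming the update $w(i+1) = w(i) + \mu[P - Rw(i)]$, the decoupled modes $v_k(i) = (1 - \mu\lambda_k)^i v_k(0)$, and $0 < \mu\lambda_k < 1$ for the time-constant formula. The symbol $\tau_{k,\mathrm{mse}}$ is a name introduced here for illustration.

```latex
\begin{aligned}
J\bigl(w(i)\bigr) &= J_{\min} + \sum_{k=1}^{M} \lambda_k\,|v_k(0)|^2\,(1 - \mu\lambda_k)^{2i}, \\
\lim_{i \to \infty} J\bigl(w(i)\bigr) &= J_{\min}
  \quad \text{for } 0 < \mu < 2/\lambda_{\max}, \\
\tau_{k,\mathrm{mse}} &= \frac{-1}{2\ln(1 - \mu\lambda_k)} \approx \frac{1}{2\mu\lambda_k}
  \quad (\mu\lambda_k \ll 1).
\end{aligned}
```

Because the error enters the MSE squared, each mode of the learning curve decays twice as fast as the corresponding weight mode.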

  • Eigenvectors/eigenvalues of R:

  • Thus, J(w) is a paraboloid. If we cut the paraboloid with planes parallel to the w plane [J(w) = constant], we obtain concentric ellipses.

  • Let $c = w - w_{opt}$. Then $c^H R c = J - J_{\min}$ and $\nabla J = 2Rc$. Note that $\nabla J$ is normal to the contour $c^H R c = \text{constant}$. A principal axis of the ellipsoid passes through the origin ($c = 0$) and is normal to the contour $c^H R c = \text{constant}$. If $c_p$ is a principal axis, it must satisfy

  • Thus, the eigenvectors of R define the principal axes of the error surface.

  • The eigenvalues of R give the second derivatives of the error surface with respect to the principal axes of the contours J = constant; that is, each eigenvalue measures the curvature of J along the corresponding principal axis.
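The geometric argument can be written out as follows. This is a sketch reconstructed from the surrounding text, with $c = w - w_{opt}$, $v = Q^H c$, and a proportionality constant $\alpha$ introduced here for illustration. The second-derivative line is stated for the real-valued case.

```latex
\begin{aligned}
J(w) &= J_{\min} + c^H R c = J_{\min} + \sum_{k=1}^{M} \lambda_k\,|v_k|^2, \\
\nabla J &= 2Rc, \\
2R\,c_p &= \alpha\,c_p \;\Longrightarrow\; R\,c_p = \lambda\,c_p
  \quad (\lambda = \alpha/2), \\
\frac{\partial^2 J}{\partial v_k^2} &= 2\lambda_k.
\end{aligned}
```

The third line says that on a principal axis the gradient, which is normal to the contour, points along the axis itself, so $c_p$ must be an eigenvector of $R$.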

  • Thus, the larger the eigenvalue spread, the more elongated the ellipsoid.
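The link between eigenvalue spread and contour shape can be checked numerically. In the sketch below, the semi-axis of the contour c^T R c = level along the k-th eigenvector has length sqrt(level / lambda_k), so the ratio of the longest to the shortest axis is the square root of the eigenvalue spread. The matrix `R` and the contour `level` are hypothetical values.

```python
import numpy as np

R = np.array([[1.0, 0.9],
              [0.9, 1.0]])           # hypothetical correlation matrix
lam, Q = np.linalg.eigh(R)           # ascending eigenvalues, orthonormal columns

level = 1.0                          # contour c^T R c = level
semi_axes = np.sqrt(level / lam)     # semi-axis length along each eigenvector

# The tips of the axes lie on the contour.
for k in range(2):
    tip = semi_axes[k] * Q[:, k]
    assert np.isclose(tip @ R @ tip, level)

axis_ratio = semi_axes.max() / semi_axes.min()
print(axis_ratio, np.sqrt(lam.max() / lam.min()))
```

Here the eigenvalues are 0.1 and 1.9, so the contours are ellipses whose long axis is sqrt(19) times the short one.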

  • Note that

  • If we translate and rotate the coordinates of w (to v), the components of the weights are decoupled. We can then use a different step size for each mode, which gives the fastest convergence rate.
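The decoupling idea can be sketched in a few lines. The example below assumes each mode is updated as v_k <- (1 - mu_k lambda_k) v_k and picks the per-mode step size mu_k = 1/lambda_k, which is an assumption made here to show the limiting case. It uses `w_opt` to perform the translation, which is not available in practice, so it is an illustration of the geometry and not a usable algorithm. `R` and `P` are hypothetical values.

```python
import numpy as np

R = np.array([[1.0, 0.9],
              [0.9, 1.0]])          # hypothetical values; eigenvalue spread is 19
P = np.array([0.7, 0.3])
w_opt = np.linalg.solve(R, P)       # reference solution, used for the translation

lam, Q = np.linalg.eigh(R)          # R = Q diag(lam) Q^T
w = np.zeros(2)                     # initial guess w(0)

v = Q.T @ (w - w_opt)               # translate by w_opt, rotate by Q
mu_k = 1.0 / lam                    # assumed per-mode step sizes
v = (1.0 - mu_k * lam) * v          # one decoupled update per mode
w = w_opt + Q @ v                   # back to the original coordinates

print(np.allclose(w, w_opt))
```

With this choice every mode is cancelled in a single step, regardless of the eigenvalue spread.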



  • Newton’s method

    • Newton’s method is primarily a method for finding the zeros of an equation.

  • Finding the minimum of a function g(x) means solving the equation g’(x)=0. This leads to the search algorithm


  • Convergence properties:

  • Thus, Newton’s method will converge if
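The search algorithm and its convergence condition did not survive extraction. A reconstruction is sketched below. The first line is the standard scalar Newton iteration applied to $g'(x) = 0$. The remaining lines apply it to the quadratic Wiener cost, assuming $\nabla J = 2(Rw - P)$ and a step size $\mu$ scaled so that $\mu = 1$ is the full Newton step. Both the step size and its scaling are assumptions.

```latex
\begin{aligned}
x(i+1) &= x(i) - \frac{g'\bigl(x(i)\bigr)}{g''\bigl(x(i)\bigr)}, \\
w(i+1) &= w(i) - \tfrac{1}{2}\mu\,R^{-1}\,\nabla J\bigl(w(i)\bigr)
        = w(i) + \mu\,R^{-1}\,[\,P - R\,w(i)\,], \\
c(i+1) &= (1 - \mu)\,c(i), \qquad c(i) = w(i) - w_{opt}, \\
|1 - \mu| &< 1 \;\Longleftrightarrow\; 0 < \mu < 2.
\end{aligned}
```

The decay factor $1 - \mu$ contains no eigenvalue, which is why every mode converges at the same rate.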

  • Properties

    • Convergence of Newton’s method is the same for every mode and does not depend on the eigenvalue spread of R.

    • The computation is more intensive (it requires $R^{-1}$).

    • For a nonquadratic cost function, Newton’s method can easily become unstable.
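The first two properties can be illustrated by running both methods on the same quadratic problem. This is a minimal sketch, assuming the steepest-descent update w + mu_sd (P - R w) and the Newton update w + mu R^{-1} (P - R w). The values of `R`, `P`, and `mu` are hypothetical, and the explicit inverse is computed only for the demonstration.

```python
import numpy as np

R = np.array([[1.0, 0.9],
              [0.9, 1.0]])          # hypothetical values; eigenvalue spread is 19
P = np.array([0.7, 0.3])
w_opt = np.linalg.solve(R, P)
R_inv = np.linalg.inv(R)            # the extra cost Newton's method pays
lam_max = np.linalg.eigvalsh(R).max()

def run(step, n_iter=50):
    """Apply w <- w + step(w) n_iter times from w = 0; return the final error norm."""
    w = np.zeros(2)
    for _ in range(n_iter):
        w = w + step(w)
    return np.linalg.norm(w - w_opt)

mu = 0.5
sd_error = run(lambda w: (mu / lam_max) * (P - R @ w))     # steepest descent
newton_error = run(lambda w: mu * R_inv @ (P - R @ w))     # Newton's method

print(sd_error, newton_error)
```

After the same number of iterations the Newton error has shrunk by 0.5 per step in every direction, while steepest descent is still held back by the mode of the smallest eigenvalue.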

  • Question: if we know $R^{-1}$, we can find $w_{opt}$ directly. Why do we have to use Newton’s method?

  • Reason:

    • An exact $R^{-1}$ may not be necessary. Efficient methods can be applied to find an approximation of $R^{-1}$. This is especially true when the input is time-varying.

  • In general, Newton’s method is seldom used in its straightforward form; only the concept is adopted.