Lecture 5. Function Optimization.

### Lecture 5

Function Optimization

### Why Function Optimization?

There are three main reasons why most problems in robotics, vision, and arguably every other science or endeavor take the form of optimization problems. First, the desired goal may not be achievable, so we try to get as close to it as possible. Second, there may be several ways to achieve the goal, so we can assign a quality measure to all candidate solutions and select the best one. Third, we may not know how to solve the system of equations $f(x) = 0$, so instead we minimize the norm $\|f(x)\|$, which is a scalar function of the unknown vector $x$.

### Local Minimization and Steepest Descent

Suppose that we want to find a local minimum of the scalar function $f$ of the vector variable $x$, starting from an initial point $x_0$. Picking an appropriate $x_0$ is crucial, but also very problem-dependent. We start from $x_0$ and go downhill. At every step of the way, we must make the following decisions:

- Whether to stop.
- In what direction to proceed.
- How long a step to take.

The following algorithm reflects the various "descent minimization" procedures:

    k = 0
    while x_k is not a minimum
        compute step direction p_k with ||p_k|| = 1
        compute step size a_k
        x_{k+1} = x_k + a_k p_k
        k = k + 1
    end
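The loop above can be sketched in Python. The choices made here, a fixed step size and the negative normalized gradient as direction, are illustrative assumptions of mine; the lecture derives better choices for both in the slides that follow.

```python
import numpy as np

def descent(f, grad, x0, step=0.1, tol=1e-8, max_iter=10_000):
    """Generic descent loop: at each step pick a unit direction p_k
    (here the negative normalized gradient) and a step size a_k
    (here a fixed constant, for illustration only)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:      # stop: (near-)stationary point
            break
        p = -g / np.linalg.norm(g)       # unit step direction, ||p|| = 1
        x = x + step * p                 # x_{k+1} = x_k + a_k p_k
    return x

# Minimize f(x) = ||x||^2, whose minimum is at the origin.
x_min = descent(lambda x: x @ x, lambda x: 2 * x, [3.0, -4.0])
```

With a fixed step the iterate can only approach the minimum to within the step size, which is exactly why the step-size rules discussed next matter.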

### Minimization of Positive Definite Functions 1

The best direction of descent is not necessarily the direction of steepest descent. Consider the function

$$f(x) = c + a^T x + \tfrac{1}{2}\, x^T Q x \qquad (1)$$

where $Q$ is a symmetric, positive definite matrix. Positive definite means that for every nonzero $x$ the quantity $x^T Q x$ is positive. In this case, the graph of $f(x) - c$ is a plane $a^T x$ plus a paraboloid $\tfrac{1}{2}\, x^T Q x$.

Of course, if $f$ were this simple, no descent methods would be necessary. In fact, the minimum of $f$ can be found by setting its gradient to zero:

$$\nabla f(x) = a + Qx = 0,$$

so that the minimum $x^*$ is the solution of the linear system

$$Qx = -a.$$

Since $Q$ is positive definite, it is also invertible (why?), and the solution $x^*$ is unique.
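A short numpy sketch of this observation, with a small example $Q$ and $a$ of my own choosing: for a quadratic with positive definite $Q$, no descent iterations are needed, since the minimizer solves $Qx = -a$ directly.

```python
import numpy as np

# Quadratic f(x) = c + a^T x + 0.5 x^T Q x with Q symmetric positive definite.
Q = np.array([[4.0, 1.0],
              [1.0, 3.0]])
a = np.array([1.0, 2.0])
c = 5.0

def f(x):
    return c + a @ x + 0.5 * x @ Q @ x

# The minimum solves the linear system Qx = -a.
x_star = np.linalg.solve(Q, -a)

# The gradient a + Qx vanishes at x*.
grad_at_min = a + Q @ x_star
```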

### Minimization of Positive Definite Functions 2

We now introduce the change of variable

$$y = x - x^*,$$

which shifts the origin of the domain to $x^*$, and study the function

$$e(y) = \tfrac{1}{2}\, y^T Q y. \qquad (2)$$

Then we have

$$f(x) = e(y) + f(x^*),$$

so that $e$ and $f$ differ only by a constant. Since $e$ is simpler, we consider that we are minimizing $e$ rather than $f$.

### The Steepest Descent Direction

At its minimum $y^* = 0$, $e$ reaches a value of zero: $e(y^*) = e(0) = 0$. Let our steepest descent algorithm find this minimum by starting from the initial point

$$y_0 = x_0 - x^*.$$

At every step $k$, the algorithm chooses the direction of steepest descent, which is opposite to the gradient of $e$ evaluated at $y_k$:

$$p_k = -\frac{g_k}{\|g_k\|}, \qquad \text{where } g_k = \nabla e(y_k) = Q y_k.$$

### The Step Size

The most favorable step size takes us from $y_k$ to the lowest point in the direction of $p_k$. It can be found by differentiating the function

$$e(y_k + \alpha p_k) = \tfrac{1}{2}\,(y_k + \alpha p_k)^T Q \,(y_k + \alpha p_k)$$

with respect to $\alpha$:

$$\frac{d}{d\alpha}\, e(y_k + \alpha p_k) = p_k^T Q \,(y_k + \alpha p_k),$$

and setting this derivative to zero yields the optimal step

$$\alpha_k = -\frac{p_k^T Q y_k}{p_k^T Q p_k} = -\frac{p_k^T g_k}{p_k^T Q p_k} = \frac{\|g_k\|^3}{g_k^T Q g_k}.$$

### The Step Size 2

Substituting $p_k = -g_k/\|g_k\|$ and $\alpha_k$ into the update $y_{k+1} = y_k + \alpha_k p_k$, the factors of $\|g_k\|$ cancel, or:

$$y_{k+1} = y_k - \frac{g_k^T g_k}{g_k^T Q g_k}\, g_k. \qquad (3)$$
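A quick numerical sketch of one exact steepest-descent step, with a small $Q$ and starting point chosen by me for illustration. A useful sanity check is that after an exact line search the new gradient is orthogonal to the old search direction, confirming we reached the lowest point along it.

```python
import numpy as np

# One exact steepest-descent step on e(y) = 0.5 y^T Q y, following update (3).
Q = np.array([[4.0, 1.0],
              [1.0, 3.0]])

def e(y):
    return 0.5 * y @ Q @ y

y = np.array([1.0, -2.0])
g = Q @ y                               # gradient of e at y
alpha = (g @ g) / (g @ Q @ g)           # optimal (unnormalized) step along -g
y_next = y - alpha * g                  # update (3)

# The gradient at y_next is orthogonal to the search direction g.
orth = (Q @ y_next) @ g
```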

### e(y) Descent Rate

How much closer does one step bring us to the solution $y^* = 0$? In other words, how much smaller is $e(y_{k+1})$ relative to $e(y_k)$? From the definition (2) of $e(y)$ and equation (3) for $y_{k+1}$, we obtain:

$$\frac{e(y_{k+1})}{e(y_k)} = 1 - \frac{(g_k^T g_k)^2}{(g_k^T Q g_k)\,(y_k^T Q y_k)}. \qquad (4)$$

### e(y) Descent Rate 2

Since $Q$ is invertible, we have

$$y_k = Q^{-1} g_k$$

and

$$y_k^T Q y_k = g_k^T Q^{-1} g_k,$$

which allows us to rewrite (4) as

$$\frac{e(y_{k+1})}{e(y_k)} = 1 - \frac{(g_k^T g_k)^2}{(g_k^T Q g_k)\,(g_k^T Q^{-1} g_k)},$$

or

$$e(y_{k+1}) = \left(1 - \frac{(g_k^T g_k)^2}{(g_k^T Q g_k)\,(g_k^T Q^{-1} g_k)}\right) e(y_k).$$
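The rewritten rate can be checked numerically. This sketch, using an example $Q$ and $y$ of my own choosing, takes one step of (3) and compares the observed ratio $e(y_{k+1})/e(y_k)$ with the closed-form expression:

```python
import numpy as np

# Check: after one exact steepest-descent step, e(y_{k+1})/e(y_k) equals
# 1 - (g^T g)^2 / ((g^T Q g)(g^T Q^{-1} g)).
Q = np.array([[4.0, 1.0],
              [1.0, 3.0]])

def e(y):
    return 0.5 * y @ Q @ y

y = np.array([1.0, -2.0])
g = Q @ y
y_next = y - (g @ g) / (g @ Q @ g) * g

ratio = e(y_next) / e(y)
predicted = 1 - (g @ g) ** 2 / ((g @ Q @ g) * (g @ np.linalg.solve(Q, g)))
```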

### Kantorovich Inequality

Kantorovich inequality: Let $Q$ be a positive definite, symmetric, $n \times n$ matrix. Then, for any vector $y$, there holds

$$\frac{(y^T y)^2}{(y^T Q y)\,(y^T Q^{-1} y)} \;\ge\; \frac{4\,\sigma_1 \sigma_n}{(\sigma_1 + \sigma_n)^2},$$

where $\sigma_1$ and $\sigma_n$ are the largest and smallest singular values of $Q$. This inequality allows us to prove the Steepest Descent Rate theorem.
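A spot-check of the inequality on random vectors, for a fixed symmetric positive definite $Q$ that I chose as an example (not a proof, just numerical evidence):

```python
import numpy as np

# Spot-check the Kantorovich inequality on random vectors.
rng = np.random.default_rng(0)
Q = np.array([[4.0, 1.0],
              [1.0, 3.0]])
sigma = np.linalg.svd(Q, compute_uv=False)   # singular values, descending
bound = 4 * sigma[0] * sigma[-1] / (sigma[0] + sigma[-1]) ** 2

ok = True
for _ in range(1000):
    y = rng.standard_normal(2)
    lhs = (y @ y) ** 2 / ((y @ Q @ y) * (y @ np.linalg.solve(Q, y)))
    ok = ok and lhs >= bound - 1e-12
```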

### Steepest Descent Rate Theorem 1

Steepest Descent Rate theorem: Let

$$f(x) = c + a^T x + \tfrac{1}{2}\, x^T Q x$$

be a quadratic function of $x$, with $Q$ symmetric and positive definite. For any $x_0$, the method of steepest descent

$$x_{k+1} = x_k - \frac{g_k^T g_k}{g_k^T Q g_k}\, g_k, \qquad \text{where } g_k = \nabla f(x_k) = a + Q x_k,$$

### Steepest Descent Rate Theorem 2

converges to the unique minimum point

$$x^* = -Q^{-1} a.$$

Furthermore, the difference at every step satisfies

$$f(x_{k+1}) - f(x^*) \;\le\; \left(\frac{\sigma_1 - \sigma_n}{\sigma_1 + \sigma_n}\right)^2 \left(f(x_k) - f(x^*)\right),$$

where $\sigma_1$ and $\sigma_n$ are respectively the largest and the smallest singular values of $Q$.
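The per-step bound can be observed in practice. This sketch runs steepest descent on an example quadratic of my own choosing and verifies that the gap to the minimum shrinks at least by the theorem's factor at every step:

```python
import numpy as np

# Verify f(x_{k+1}) - f(x*) <= ((s1 - sn)/(s1 + sn))^2 (f(x_k) - f(x*)).
Q = np.array([[4.0, 1.0],
              [1.0, 3.0]])
a = np.array([1.0, 2.0])

def f(x):
    return a @ x + 0.5 * x @ Q @ x

x_star = np.linalg.solve(Q, -a)
s = np.linalg.svd(Q, compute_uv=False)
factor = ((s[0] - s[-1]) / (s[0] + s[-1])) ** 2

x = np.array([5.0, -7.0])
bound_holds = True
for _ in range(10):
    g = a + Q @ x
    x_next = x - (g @ g) / (g @ Q @ g) * g     # steepest descent step
    gap, gap_next = f(x) - f(x_star), f(x_next) - f(x_star)
    bound_holds = bound_holds and gap_next <= factor * gap + 1e-12
    x = x_next
```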

### Proof

From the definitions

$$e(y_k) = \tfrac{1}{2}\, y_k^T Q y_k = f(x_k) - f(x^*)$$

and the descent rate of $e(y)$, we obtain

$$f(x_{k+1}) - f(x^*) = \left(1 - \frac{(g_k^T g_k)^2}{(g_k^T Q g_k)\,(g_k^T Q^{-1} g_k)}\right)\left(f(x_k) - f(x^*)\right) \le \left(1 - \frac{4\,\sigma_1\sigma_n}{(\sigma_1+\sigma_n)^2}\right)\left(f(x_k) - f(x^*)\right) = \left(\frac{\sigma_1 - \sigma_n}{\sigma_1 + \sigma_n}\right)^2 \left(f(x_k) - f(x^*)\right).$$

Here, the Kantorovich inequality was used.

### Analysis

The ratio

$$\kappa(Q) = \frac{\sigma_1}{\sigma_n}$$

is called the condition number of $Q$. The larger the condition number (the ratio between the largest and the smallest singular values), the smaller the ratio

$$\frac{4\,\sigma_1\sigma_n}{(\sigma_1 + \sigma_n)^2},$$

and therefore the slower the convergence.
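A small sketch comparing two example matrices of my own choosing: the ill-conditioned one yields a per-step contraction factor close to 1, i.e. slow convergence.

```python
import numpy as np

# Per-step contraction factor ((s1 - sn)/(s1 + sn))^2 for a well-conditioned
# and an ill-conditioned Q: close to 0 is fast, close to 1 is slow.
def contraction(Q):
    s = np.linalg.svd(Q, compute_uv=False)
    return ((s[0] - s[-1]) / (s[0] + s[-1])) ** 2

well = np.diag([2.0, 1.0])      # condition number 2
ill  = np.diag([100.0, 1.0])    # condition number 100

kappa_well = np.linalg.cond(well)
kappa_ill = np.linalg.cond(ill)
```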

### Illustration

Consider the two-dimensional case, $x \in \mathbb{R}^2$. The figure shows a trajectory $x_k$ superimposed on the isocontours of $f(x)$. The greater the ratio between the singular values of $Q$ (which is the aspect ratio of the elliptical isocontours), the slower the convergence rate. If the isocontours are circular ($\kappa(Q) = 1$), or the trajectory starts from an axis of the ellipses, a single step brings us to $x^*$.

### Convergence Rate

To characterize the speed of convergence of different minimization algorithms, we introduce the order of convergence. It is defined as the largest value of $q$ for which the limit

$$\lim_{k \to \infty} \frac{\|x_{k+1} - x^*\|}{\|x_k - x^*\|^q}$$

is finite. If $\beta$ is the value of this limit, then for large values of $k$ we can write

$$\|x_{k+1} - x^*\| \approx \beta\, \|x_k - x^*\|^q.$$

The distance from $x^*$ is reduced to (roughly) its $q$-th power at every step; therefore, the higher the order of convergence, the better.
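The order $q$ can be estimated from an observed error sequence. A minimal sketch on two synthetic sequences of my own construction: once the errors are small (and $\beta \approx 1$), $q \approx \log e_{k+1} / \log e_k$.

```python
import math

# Estimate the order of convergence q from the tail of an error sequence.
def estimate_order(errors):
    return math.log(errors[-1]) / math.log(errors[-2])

# Linear convergence: e_{k+1} = 0.5 e_k, so q ≈ 1.
linear = [0.5 ** k for k in range(1, 40)]
# Quadratic convergence: e_{k+1} = e_k^2, so q = 2.
quadratic = [0.5 ** (2 ** k) for k in range(1, 6)]

q_lin = estimate_order(linear)
q_quad = estimate_order(quadratic)
```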

### Stopping Criteria

We do not know $x^*$, and therefore we do not know $f(x^*)$; thus the stopping criterion is not trivial. Possible criteria are $|f(x_k) - f(x_{k-1})| < \varepsilon$ or $\|x_k - x_{k-1}\| < \varepsilon$. The second criterion is better, since it indicates proximity to $x^*$.
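A sketch of a descent loop with these stopping tests, on an example quadratic of my own choosing (here both criteria are checked together; in practice one or the other is used):

```python
import numpy as np

# Steepest descent with the stopping criteria from the slide:
# change in x (proximity to x*) and change in f.
Q = np.array([[4.0, 1.0],
              [1.0, 3.0]])
a = np.array([1.0, 2.0])

def f(x):
    return a @ x + 0.5 * x @ Q @ x

eps = 1e-10
x = np.array([5.0, -7.0])
for _ in range(1000):
    g = a + Q @ x
    x_next = x - (g @ g) / (g @ Q @ g) * g
    if np.linalg.norm(x_next - x) < eps and abs(f(x_next) - f(x)) < eps:
        x = x_next
        break
    x = x_next

x_star = np.linalg.solve(Q, -a)
```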

### Line Search

Steepest descent can be applied to general functions $f$, not necessarily quadratic and not defined via equation (1). In these cases, $Q$ is the matrix of the second derivatives of $f$ with respect to $x$, called the Hessian of $f$. Only the $n$ first derivatives are needed to compute the direction $p_k$, but the step size $\alpha_k$ above requires the Hessian of $f(x)$, i.e., second derivatives, and is therefore very expensive. Using a line search instead allows us to reach the minimum of $f(x)$ in the direction $p_k$ without computing the Hessian.

### Line Search 2

Line search runs as follows. Let

$$h(\alpha) = f(x_k + \alpha p_k)$$

be the scalar function of $\alpha$ representing the possible values of $f(x)$ in the direction $p_k$. Let $(a, b, c)$ be three values of $\alpha$ with $a < b < c$ and $h(b) \le h(a), h(c)$, so that the point of (constrained) minimum $\alpha'$ lies between $a$ and $c$: $a < \alpha' < c$. Then the following algorithm allows us to approach $\alpha'$ arbitrarily closely:

    if b - a > c - b
        u = (a + b)/2
        if h(u) < h(b)
            (a, b, c) = (a, u, b)
        else
            (a, b, c) = (u, b, c)
    else
        u = (b + c)/2
        if h(u) < h(b)
            (a, b, c) = (b, u, c)
        else
            (a, b, c) = (a, b, u)
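The bracketing update above translates directly into a derivative-free minimizer. A minimal implementation, assuming a valid initial bracket $(a, b, c)$ with $h(b) \le h(a), h(c)$:

```python
# Bracketing bisection line search: (a, b, c) brackets a minimum of h,
# i.e. a < b < c and h(b) <= h(a), h(c). Each pass bisects the longer
# of the two sub-intervals and keeps the triple that still brackets
# the minimum, so no derivatives of h are ever needed.
def line_search(h, a, b, c, tol=1e-10):
    while c - a > tol:
        if b - a > c - b:                 # left sub-interval is longer
            u = (a + b) / 2
            if h(u) < h(b):
                a, b, c = a, u, b
            else:
                a, b, c = u, b, c
        else:                             # right sub-interval is longer
            u = (b + c) / 2
            if h(u) < h(b):
                a, b, c = b, u, c
            else:
                a, b, c = a, b, u
    return b

# Minimize h(alpha) = (alpha - 0.3)^2 over the bracket (0, 0.5, 1).
alpha_min = line_search(lambda t: (t - 0.3) ** 2, 0.0, 0.5, 1.0)
```

In the descent algorithm, $h$ would be $f(x_k + \alpha p_k)$, so the returned $\alpha$ plays the role of the step size $\alpha_k$.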