


Tutorial 5-6

Function Optimization.

Line Search.

Taylor Series for R^n

Steepest Descent

Newton’s Method

Conjugate Gradients Method



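Line Search

[Figure: the bracketing points a, b, c and the trial point u along the search direction.]

Line search runs as follows. Let

$$h(\alpha) = f(x_k + \alpha p_k)$$

be the scalar function of α representing the possible values of f(x) in the direction of p_k. Let (a, b, c) be three values of α with a < b < c, such that the minimizer α' of h (which gives the constrained minimum x' = x_k + α' p_k of f along p_k) lies between a and c: a < α' < c. This is guaranteed, for example, when h(b) is smaller than both h(a) and h(c).

Then the following algorithm approaches α' arbitrarily closely:

If b - a > c - b:
    u = (a + b)/2
    If h(u) < h(b):
        (a, b, c) = (a, u, b)
    Else:
        (a, b, c) = (u, b, c)
Otherwise (b - a <= c - b):
    u = (b + c)/2
    If h(u) < h(b):
        (a, b, c) = (b, u, c)
    Else:
        (a, b, c) = (a, b, u)

Repeat until c - a is smaller than the desired tolerance.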
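The bracketing rule for line search can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `bracket_line_search`, the tolerance, the starting point `x_k`, the direction `p_k`, and the initial bracket `(0, 0.1, 1)` are all hypothetical choices, not values from the slides. The test function is the elliptic function used later in the tutorial.

```python
def bracket_line_search(h, a, b, c, tol=1e-6, max_iter=200):
    """Shrink a bracket a < b < c around a minimum of the scalar function h.

    Assumes h(b) < h(a) and h(b) < h(c) on entry, so a minimum lies in (a, c).
    """
    for _ in range(max_iter):
        if c - a < tol:
            break
        if b - a > c - b:
            # Left sub-interval is larger: bisect it.
            u = (a + b) / 2.0
            if h(u) < h(b):
                a, b, c = a, u, b
            else:
                a, b, c = u, b, c
        else:
            # Right sub-interval is larger (or equal): bisect it.
            u = (b + c) / 2.0
            if h(u) < h(b):
                a, b, c = b, u, c
            else:
                a, b, c = a, b, u
    return b


def f(x):
    # Elliptic test function f(x1, x2) = (x1 - 1)^2 + 4 (x2 - 2)^2.
    return (x[0] - 1.0) ** 2 + 4.0 * (x[1] - 2.0) ** 2


# Hypothetical starting point and search direction (minus the gradient at x_k).
x_k = (0.0, 0.0)
p_k = (2.0, 16.0)


def h(alpha):
    # Restriction of f to the line x_k + alpha * p_k.
    return f((x_k[0] + alpha * p_k[0], x_k[1] + alpha * p_k[1]))


alpha_min = bracket_line_search(h, 0.0, 0.1, 1.0)
print(alpha_min)
```

The sketch treats the tie case (b - a equal to c - b) as part of the second branch, so one of the two sub-intervals is always bisected.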


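Taylor Series

The Taylor series for f(x) is

$$f(x + \Delta) = f(x) + f'(x)\,\Delta + \frac{1}{2} f''(x)\,\Delta^2 + \dots,$$

where Δ is the displacement from the expansion point x.

For a function of m variables, the expression is

$$f(x + \Delta) = f(x) + \nabla f(x)^T \Delta + \frac{1}{2} \Delta^T H(x)\, \Delta + \dots,$$

with ∇f(x) the gradient and H(x) the Hessian (the matrix of second partial derivatives) of f at x.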


2D Taylor Series: Example

Consider the elliptic function $f(x_1, x_2) = (x_1 - 1)^2 + (2x_2 - 2)^2$ and find the first three terms of its Taylor expansion.
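One way to work this example is sketched below. The slide does not say where to expand, so the expansion about a general point x, and the specialization to the origin, are assumptions made for illustration. Because f is quadratic, the three terms reproduce f exactly.

```latex
% Gradient and Hessian of f(x_1, x_2) = (x_1 - 1)^2 + (2 x_2 - 2)^2
\[
\nabla f(x) = \begin{pmatrix} 2(x_1 - 1) \\ 8(x_2 - 1) \end{pmatrix},
\qquad
H = \begin{pmatrix} 2 & 0 \\ 0 & 8 \end{pmatrix}
\]
% First three terms of the expansion about a general point x
\[
f(x + \Delta) = f(x) + 2(x_1 - 1)\,\Delta_1 + 8(x_2 - 1)\,\Delta_2
              + \Delta_1^2 + 4\,\Delta_2^2
\]
% Assumed expansion point x = (0, 0), where f(0) = 1 + 4 = 5
\[
f(\Delta) = 5 - 2\,\Delta_1 - 8\,\Delta_2 + \Delta_1^2 + 4\,\Delta_2^2
\]
```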


Steepest Descent

[Figure: plot with the steepest-descent direction -f'(0) marked; axis ticks at 1 and 2.]

Consider the elliptic function $f(x_1, x_2) = (x_1 - 1)^2 + 4(x_2 - 2)^2$ and find the first three terms of its Taylor expansion.
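As a rough illustration of steepest descent on this function, the sketch below takes steps along the negative gradient with an exact line minimization. The starting point (0, 0) is an assumption suggested by the figure label -f'(0). The closed-form step length is valid only because this f is quadratic with constant Hessian diag(2, 8). The helper names are hypothetical.

```python
def grad_f(x):
    # Gradient of f(x1, x2) = (x1 - 1)^2 + 4 (x2 - 2)^2.
    return (2.0 * (x[0] - 1.0), 8.0 * (x[1] - 2.0))


def exact_step(g):
    # For a quadratic with Hessian Q = diag(2, 8), the minimizer of
    # f(x - alpha * g) over alpha is alpha = (g.g) / (g.Q.g).
    num = g[0] ** 2 + g[1] ** 2
    den = 2.0 * g[0] ** 2 + 8.0 * g[1] ** 2
    return num / den


def steepest_descent(x, n_steps):
    path = [x]
    for _ in range(n_steps):
        g = grad_f(x)
        if g[0] ** 2 + g[1] ** 2 < 1e-24:
            break  # gradient is numerically zero: already at the minimum
        alpha = exact_step(g)
        x = (x[0] - alpha * g[0], x[1] - alpha * g[1])
        path.append(x)
    return path


# Assumed starting point (0, 0).
path = steepest_descent((0.0, 0.0), 50)
first_step = path[1]
print(first_step)
print(path[-1])
```

The first step moves along (2, 16) but does not land on the minimum (1, 2); later steps zigzag towards it, which is the slow convergence that motivates Newton's method.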


Newton’s Method

In Lecture 5 we saw that the steepest descent method can suffer from slow convergence. Newton’s method fixes this problem in cases where the function f(x) near x* can be approximated by a paraboloid:

$$f(x) \approx f(x_k) + g_k^T \Delta + \frac{1}{2} \Delta^T Q_k \Delta, \qquad (1)$$

where $\Delta = x - x_k$, $g_k = \nabla f(x_k)$ and $Q_k = \nabla^2 f(x_k)$.


Newton’s Method 2

Here $g_k$ is the gradient and $Q_k$ is the Hessian of the function f, evaluated at $x_k$. They appear in the 2nd and 3rd terms of the Taylor expansion of f around $x_k$. At the minimum of the paraboloid, the derivative with respect to ∆ must vanish:

$$g_k + Q_k \Delta = 0. \qquad (2)$$

The solution of this equation, $\Delta = -Q_k^{-1} g_k$, gives the step direction and the step size towards the minimum of the paraboloid (1), which is, presumably, close to the minimum of f(x). The minimization algorithm in which $x_{k+1} = y(x_k) = x_k + \Delta$, with ∆ defined by (2), is called Newton’s method.


Newton’s Method: Example

[Figure: plot with the direction -f'(0) marked; axis ticks at 1 and 2.]

Consider the same elliptic function $f(x_1, x_2) = (x_1 - 1)^2 + 4(x_2 - 2)^2$ and find the first step of Newton’s method.
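A minimal sketch of this first Newton step follows, assuming the starting point (0, 0), which the slide does not state explicitly. The step solves the 2x2 system Q ∆ = -g by Cramer's rule. The helper names are hypothetical.

```python
def grad_f(x):
    # Gradient of f(x1, x2) = (x1 - 1)^2 + 4 (x2 - 2)^2.
    return (2.0 * (x[0] - 1.0), 8.0 * (x[1] - 2.0))


def hessian_f(x):
    # The Hessian of this quadratic is constant.
    return ((2.0, 0.0), (0.0, 8.0))


def solve_2x2(Q, b):
    # Solve Q d = b for a 2x2 matrix Q using Cramer's rule.
    (q11, q12), (q21, q22) = Q
    det = q11 * q22 - q12 * q21
    if det == 0.0:
        raise ValueError("singular Hessian")
    return ((q22 * b[0] - q12 * b[1]) / det,
            (q11 * b[1] - q21 * b[0]) / det)


def newton_step(x):
    g = grad_f(x)
    Q = hessian_f(x)
    delta = solve_2x2(Q, (-g[0], -g[1]))  # Q delta = -g
    return (x[0] + delta[0], x[1] + delta[1])


# Assumed starting point (0, 0).
x1 = newton_step((0.0, 0.0))
print(x1)
```

Since f is itself a paraboloid, the quadratic model is exact and a single step reaches the minimum at (1, 2).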


Conjugate Gradient

Suppose that we want to minimize the quadratic function

$$f(x) = c + a^T x + \frac{1}{2} x^T Q x,$$

where Q is a symmetric, positive definite matrix, and x has n components. As we saw in the explanation of steepest descent, the minimum x* is the solution to the linear system

$$Q x = -a.$$

The explicit solution of this system requires about O(n^3) operations and O(n^2) memory, which is very expensive.


Conjugate Gradients 2

We now consider an alternative solution method that does not need Q, but only the gradient of f,

$$g_k = Q x_k + a,$$

evaluated at n different points $x_1, \dots, x_n$.

[Figure: comparison of the gradient (steepest descent) path and the conjugate gradient path.]


Conjugate Gradients 3

Consider the case n = 3, in which the variable x in f(x) is a three-dimensional vector. Then the quadratic function f(x) is constant over ellipsoids, called isosurfaces, centered at the minimum x*. How can we start from a point $x_0$ on one of these ellipsoids and reach x* by a finite sequence of one-dimensional searches? In steepest descent, for poorly conditioned Hessians, the orthogonal search directions lead to many small steps, that is, to slow convergence.


Conjugate Gradients: Spherical Case

When the ellipsoids are spheres, on the other hand, the convergence is much faster: the first step takes us from $x_0$ to $x_1$, and the line between $x_0$ and $x_1$ is tangent to an isosurface at $x_1$. The next step is in the direction of the gradient, which is orthogonal to $p_0$, and takes us to x* right away. Suppose however that we cannot afford to compute this special direction $p_1$ orthogonal to $p_0$, but that we can only compute some direction $p_1$ orthogonal to $p_0$ (there is an (n-1)-dimensional space of such directions!) and reach the minimum of f(x) in this direction.

In that case n steps will take us to the center x* of the sphere, since the coordinate of the minimum along each of the n directions is independent of the others.


Conjugate Gradients: Elliptical Case

Any set of orthogonal directions, with a line search in each direction, will lead to the minimum for spherical isosurfaces. Given an arbitrary set of ellipsoidal isosurfaces, there is a one-to-one mapping with a spherical system: if $Q = U \Sigma U^T$ is the SVD of the symmetric, positive definite matrix Q, then we can write

$$\frac{1}{2} x^T Q x = \frac{1}{2} y^T y, \qquad (4)$$

where

$$y = \Sigma^{1/2} U^T x. \qquad (5)$$


Elliptical Case 2

Consequently, there must be a condition for the original problem (in terms of Q) that is equivalent to orthogonality for the spherical problem. If two directions $q_i$ and $q_j$ are orthogonal in the spherical context, that is, if

$$q_i^T q_j = 0, \qquad (6)$$

what does this translate into in terms of the directions $p_i$ and $p_j$ for the ellipsoidal problem? We have

$$q_i = \Sigma^{1/2} U^T p_i, \qquad q_j = \Sigma^{1/2} U^T p_j.$$


Elliptical Case 3

Consequently, orthogonality of $q_i$ and $q_j$ becomes

$$p_i^T U \Sigma^{1/2} \Sigma^{1/2} U^T p_j = 0,$$

that is,

$$p_i^T Q p_j = 0. \qquad (7)$$

This condition is called Q-conjugacy, or Q-orthogonality: if equation (7) holds, then $p_i$ and $p_j$ are said to be Q-conjugate or Q-orthogonal to each other, or simply "conjugate".
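The link between Q-conjugacy and ordinary orthogonality can be checked numerically. The sketch below is illustrative only: the rotation angle `theta`, the values in `sigma`, and the helper names are hypothetical. It builds a 2x2 matrix Q = U Σ U^T from a known rotation U, constructs a direction p1 that is Q-conjugate to p0, and maps both through Σ^{1/2} U^T.

```python
import math


def matvec(M, v):
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


def dot(u, v):
    return sum(u_i * v_i for u_i, v_i in zip(u, v))


# Hypothetical rotation U and positive singular values sigma.
theta = 0.3
c, s = math.cos(theta), math.sin(theta)
U = [[c, -s], [s, c]]
sigma = [9.0, 1.0]

# Q = U diag(sigma) U^T, symmetric positive definite by construction.
Q = [[sum(U[i][k] * sigma[k] * U[j][k] for k in range(2)) for j in range(2)]
     for i in range(2)]


def to_spherical(p):
    # q = Sigma^{1/2} U^T p: maps the ellipsoidal problem to the spherical one.
    U_T = [[U[j][i] for j in range(2)] for i in range(2)]
    t = matvec(U_T, p)
    return [math.sqrt(sigma[k]) * t[k] for k in range(2)]


p0 = [1.0, 0.0]
Qp0 = matvec(Q, p0)
p1 = [-Qp0[1], Qp0[0]]  # orthogonal to Q p0, hence Q-conjugate to p0

q0, q1 = to_spherical(p0), to_spherical(p1)
print(dot(p0, matvec(Q, p1)))  # Q-inner product of p0 and p1
print(dot(q0, q1))             # ordinary inner product after the mapping
print(dot(p0, p1))             # ordinary inner product before the mapping
```

The first two printed values agree up to rounding, while the third is generally nonzero: conjugate directions are orthogonal in the spherical coordinates, not in the original ones.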


Elliptical Case 4

In summary, if we can find n directions $p_0, \dots, p_{n-1}$ that are mutually conjugate, i.e. satisfy (7), and if we do line minimization along each direction $p_k$, we reach the minimum in at most n steps. Of course, we cannot use the transformation (5) in the algorithm, because Σ and especially $U^T$ are too large. So we need to find a method for generating n conjugate directions without using either Q or its SVD.


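Hestenes Stiefel Procedure

$$g_0 = g(x_0), \qquad p_0 = -g_0$$

For k = 0, ..., n-1:

$$\alpha_k = \arg\min_{\alpha \ge 0} f(x_k + \alpha p_k)$$

$$x_{k+1} = x_k + \alpha_k p_k$$

$$g_{k+1} = g(x_{k+1})$$

$$\gamma_k = \frac{g_{k+1}^T Q p_k}{p_k^T Q p_k}$$

$$p_{k+1} = -g_{k+1} + \gamma_k p_k$$

where

$$g_k = g(x_k) = \left. \frac{\partial f}{\partial x} \right|_{x = x_k}.$$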


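Hestenes Stiefel Procedure 2

It is simple to see that $p_k$ and $p_{k+1}$ are conjugate. In fact, using $p_{k+1} = -g_{k+1} + \gamma_k p_k$ and $\gamma_k = g_{k+1}^T Q p_k / (p_k^T Q p_k)$,

$$p_k^T Q p_{k+1} = p_k^T Q (-g_{k+1} + \gamma_k p_k) = -p_k^T Q g_{k+1} + \frac{g_{k+1}^T Q p_k}{p_k^T Q p_k}\, p_k^T Q p_k = -p_k^T Q g_{k+1} + g_{k+1}^T Q p_k = 0,$$

where the last equality holds because Q is symmetric.

The proof that $p_i$ and $p_{k+1}$ for i = 0, ..., k are also conjugate can be done by induction, based on the observation that the vectors $p_k$ are found by a generalization of Gram-Schmidt to produce conjugate rather than orthogonal vectors.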
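The procedure can be tried on a small quadratic f(x) = c + a^T x + (1/2) x^T Q x. The following is a sketch under stated assumptions, not a reference implementation: the 3x3 matrix `Q`, the vector `a`, and the function names are hypothetical, and the line minimization uses the closed form α_k = -g_k^T p_k / (p_k^T Q p_k), which is valid only for a quadratic f. This version still uses Q explicitly.

```python
def matvec(M, v):
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


def dot(u, v):
    return sum(u_i * v_i for u_i, v_i in zip(u, v))


def hestenes_stiefel(Q, a, x0):
    """Minimize c + a.x + 0.5 x.Q.x using conjugate directions built with Q."""
    n = len(x0)
    x = list(x0)
    g = [qx + a_i for qx, a_i in zip(matvec(Q, x), a)]  # g = Q x + a
    p = [-g_i for g_i in g]
    directions = []
    for _ in range(n):
        Qp = matvec(Q, p)
        pQp = dot(p, Qp)
        if pQp == 0.0:
            break  # zero direction: the gradient already vanished
        alpha = -dot(g, p) / pQp  # exact line minimum for a quadratic
        x = [x_i + alpha * p_i for x_i, p_i in zip(x, p)]
        directions.append(p)
        g = [qx + a_i for qx, a_i in zip(matvec(Q, x), a)]
        gamma = dot(g, Qp) / pQp  # gamma_k = g_{k+1}.Q.p_k / p_k.Q.p_k
        p = [-g_i + gamma * p_i for g_i, p_i in zip(g, p)]
    return x, directions


# Hypothetical symmetric positive definite Q and linear term a.
Q = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
a = [-1.0, -2.0, -3.0]

x_min, dirs = hestenes_stiefel(Q, a, [0.0, 0.0, 0.0])
residual = [qx + a_i for qx, a_i in zip(matvec(Q, x_min), a)]
print(x_min)
print(residual)
```

After n = 3 line minimizations the gradient Q x + a is zero up to rounding, and the stored directions are mutually Q-conjugate.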


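Removing the Hessian

In the described algorithm the expression for $\gamma_k$ contains the Hessian Q, which is too large. We now show that $\gamma_k$ can be rewritten in terms of the gradient values $g_k$ and $g_{k+1}$ only. To this end, we notice that

$$g_{k+1} = g_k + \alpha_k Q p_k,$$

or

$$\alpha_k Q p_k = g_{k+1} - g_k.$$

Proof:

$$g(x) = a + Q x,$$

so that

$$g_{k+1} = g(x_{k+1}) = g(x_k + \alpha_k p_k) = a + Q (x_k + \alpha_k p_k) = g_k + \alpha_k Q p_k.$$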


Removing the Hessian 2

We can therefore write

$$\gamma_k = \frac{g_{k+1}^T Q p_k}{p_k^T Q p_k} = \frac{g_{k+1}^T \alpha_k Q p_k}{p_k^T \alpha_k Q p_k} = \frac{g_{k+1}^T (g_{k+1} - g_k)}{p_k^T (g_{k+1} - g_k)},$$

and Q has disappeared.

This expression for $\gamma_k$ can be further simplified by noticing that

$$p_k^T g_{k+1} = 0,$$

because the line along $p_k$ is tangent to an isosurface at $x_{k+1}$, while the gradient $g_{k+1}$ is orthogonal to the isosurface at $x_{k+1}$.


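Polak-Ribiere formula

Similarly,

$$p_{k-1}^T g_k = 0.$$

Then, the denominator of $\gamma_k$ becomes

$$p_k^T (g_{k+1} - g_k) = -p_k^T g_k = (g_k - \gamma_{k-1} p_{k-1})^T g_k = g_k^T g_k.$$

In conclusion, we obtain the Polak-Ribiere formula

$$\gamma_k = \frac{g_{k+1}^T (g_{k+1} - g_k)}{g_k^T g_k}.$$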
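A conjugate gradient loop that uses only gradient evaluations might look like the sketch below. Everything here is an illustrative assumption rather than the tutorial's own code: the function names, the example `Q` and `a` (used only inside the gradient callback, never by the solver), and the line minimization. The step length comes from two directional-derivative evaluations, which is exact only when f is quadratic, because p^T g(x + α p) is then linear in α.

```python
def dot(u, v):
    return sum(u_i * v_i for u_i, v_i in zip(u, v))


def polak_ribiere_cg(grad, x0, n_steps):
    """Conjugate gradients using only gradient evaluations (no Hessian)."""
    x = list(x0)
    g = grad(x)
    p = [-g_i for g_i in g]
    for _ in range(n_steps):
        # Directional derivative along p at alpha = 0 and alpha = 1.
        d0 = dot(p, g)
        d1 = dot(p, grad([x_i + p_i for x_i, p_i in zip(x, p)]))
        if d0 == d1:
            break  # zero direction or no curvature along p
        alpha = d0 / (d0 - d1)  # root of the (linear) directional derivative
        x = [x_i + alpha * p_i for x_i, p_i in zip(x, p)]
        g_new = grad(x)
        diff = [gn - go for gn, go in zip(g_new, g)]
        gamma = dot(g_new, diff) / dot(g, g)  # Polak-Ribiere formula
        p = [-gn + gamma * p_i for gn, p_i in zip(g_new, p)]
        g = g_new
    return x


# Hypothetical quadratic f(x) = c + a.x + 0.5 x.Q.x, exposed only via its gradient.
Q = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
a = [-1.0, -2.0, -3.0]


def grad(x):
    return [sum(q_ij * x_j for q_ij, x_j in zip(row, x)) + a_i
            for row, a_i in zip(Q, a)]


x_min = polak_ribiere_cg(grad, [0.0, 0.0, 0.0], 3)
print(x_min)
print(grad(x_min))
```

For a quadratic, this produces the same iterates as the Hessian-based formula for γ_k, so the gradient vanishes (up to rounding) after n = 3 steps.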

