Nonlinear programming Unconstrained optimization techniques
Introduction • This chapter deals with the various methods of solving the unconstrained minimization problem: • It is true that rarely a practical design problem would be unconstrained; still, a study of this class of problems would be important for the following reasons: • The constraints do not have significant influence in certain design problems. • Some of the powerful and robust methods of solving constrained minimization problems require the use of unconstrained minimization techniques. • The unconstrained minimization methods can be used to solve certain complex engineering analysis problems. For example, the displacement response (linear or nonlinear) of any structure under any specified load condition can be found by minimizing its potential energy. Similarly, the eigenvalues and eigenvectors of any discrete system can be found by minimizing the Rayleigh quotient.
Direct search methods Random search method Grid search method Univariate method Pattern search methods Powell’s method Hooke-Jeeves method Rosenbrock’s method Simplex method Descent methods Steepest descent (Cauchy method) Fletcher-Reeves method Newton’s method Marquardt method Quasi-Newton methods Davidon-Fletcher-Powell method Broyden-Fletcher-Goldfarb-Shanno method Classification of unconstrained minimization methods
Direct search methods • They require only the objective function values but not the partial derivatives of the function in finding the minimum and hence are often called the nongradient methods. • The direct search methods are also known as zeroth-order methods since they use zeroth-order derivatives of the function. • These methods are most suitable for simple problems involving a relatively small numbers of variables. • These methods are in general less efficient than the descent methods.
Descent methods • The descent techniques require, in addition to the function values, the first and in some cases the second derivatives of the objective function. • Since more information about the function being minimized is used (through the use of derivatives), descent methods are generally more efficient than direct search techniques. • The descent methods are known as gradient methods. • Among the gradient methods, those requiring only first derivatives of the function are called first-order methods; those requiring both first and second derivatives of the function are termed second-order methods.
General approach • All unconstrained minimization methods are iterative in nature and hence they start from an initial trial solution and proceed toward the minimum point in a sequential manner. • Different unconstrained minimization techniques differ from one another only in the method of generating the new point Xi+1 from Xi and in testing the point Xi+1 for optimality.
Convergence rates In general, an optimization method is said to have convergence of order p if where Xi and Xi+1 denote the points obtained at the end of iterations i and i+1, respectively, X* represents the optimum point, and ||X|| denotes the length or norm of the vector X:
Convergence rates • If p=1 and 0 k 1, the method is said to be linearly convergent(corresponds to slow convergence). • If p=2, the method is said to be quadratically convergent(corresponds to fast convergence). • An optimization method is said to have superlinear convergence (corresponds to fast convergence) if: • The above definitions of rates of convergence are applicable to single-variable as well as multivariable optimization problems.
Condition number • The condition number of an n x n matrix, [A] is defined as:
Scaling of design variables • The rate of convergence of most unconstrained minimization methods can be improved by scaling the design variables. • For a quadratic objective function, the scaling of the design variables changes the condition number of the Hessian matrix. • When the condition number of the Hessian matrix is 1, the steepest descent method, for example, finds the minimum of a quadratic objective function in one iteration.
Scaling of design variables • If f=1/2 XT[A] X denotes a quadratic term, a transformation of the form can be used to obtain a new quadratic term as: The matrix [R] can be selected to make diagonal (i.e., to eliminate the mixed quadratic terms).
Scaling of design variables • For this, the columns of the matrix [R] are to be chosen as the eigenvectors of the matrix [A]. • Next, the diagonal elements of the matrix can be reduced to 1 (so that the condition number of the resulting matrix will be 1) by using the transformation
Scaling of design variables • Where the matrix [S] is given by: • Thus, the complete transformation that reduces the Hessian matrix of f to an identity matrix is given by: so that the quadratic term
Scaling of design variables • If the objective function is not a quadratic, the Hessian matrix and hence the transformations vary with the design vector from iteration to iteration. For example, the second-order Taylor’s series approximation of a general nonlinear function at the design vector Xi can be expressed as: where
Scaling of design variables • The transformations indicated by the equations: can be applied to the matrix [A] given by
Example Find a suitable scaling (or transformation) of variables to reduce the condition number of the Hessian matrix of the following function to 1: Solution: The quadratic function can be expressed as: where As indicated above, the desired scaling of variables can be accomplished in two stages.
Example Stage 1: Reducing [A] to a Diagonal Form, The eigenvectors of the matrix [A] can be found by solving the eigenvalue problem: where i is the ith eigenvalue and ui is the corresponding eigenvector. In the present case, the eigenvalues, i are given by: which yield 1=8+52=15.2111 and 2=8-52=0.7889.
Example The eigenvector ui corresponding toi can be found by solving
Example Thus the transformation that reduces [A] to a diagonal form is given by: This yields the new quadratic term as where
Example And hence the quadratic function becomes: Stage 2: Reducing to a unit matrix The transformation is given by , where
Example Stage 3: Complete Transformation The total transformation is given by
Example With this transformation, the quadratic function of becomes
Example The contour the below equation is:
Example The contour the below equation is:
Example The contour the below equation is:
Direct search methods Random Search Methods: Random serach methods are based on the use of random numbers in finding the minimum point. Since most of the computer libraries have random number generators, these methods can be used quite conveniently. Some of the best known random search methods are: • Random jumping method • Random walk method
Random jumping method • Although the problem is an unconstrained one, we establish the bounds liand ui for each design variable xi, i=1,2,…,n, for generating the random values of xi: • In the random jumping method, we generate sets of n random numbers, (r1, r2,….,rn), that are uniformly distributed between 0 and 1. Each set of these numbers, is used to find a point, X, inside the hypercube defined by the above equation as and the value of the function is evaluated at this point X.
Random jumping method • By generating a large number of random points Xand evaluating the value of the objective function at each of these points, we can take the smallest value of f(X)as the desired minimum point.
Random walk method • The random walk method is based on generating a sequence of improved approximations to the minimum, each derived from the preceding approximation. • Thus, if Xi is the approximation to the minimum obtained in the (i-1)th stage (or step or iteration), the new or improved approximation in the ith stage is found from the relation where is a prescribed scalar step length and ui is a unit random vector generated in the ith stage.
Random walk method The detailed procedure of this method is given by the following steps: • Start with an initial point X1, a sufficiently large initial step length , a minimum allowable step length , and a maximum permissable number of iterations N. • Find the function value f1 = f(X1). • Set the iteration number as i=1 • Generatea set of n random numbers r1, r2, …,rneach lying in the interval [-1 1] and formulate the unit vector u as:
Random walk method 4. The directions generated by the equation are expected to have a bias toward the diagonals of the unit hypercube. To avoid such a bias, the length of the vector R, is computed as: and the random numbers (r1, r2, …,rn) generated are accepted only if R≤1 but are discarded if R>1. If the random numbers are accepted, the unbiased random vector ui is given by:
Random walk method 5. Compute the new vector and the corresponding function value as 6. Compare the values of f and f1. If f < f1, set the new values as X1 =X and f1=f, and go to step 3. If f ≥ f1, go to step 7. • If i ≤ N, set the new iteration number as i = i+1 and go to step 4. On the other hand, if i > N, go to step 8. • Compute the new, reduced step length as = /2. If the new step length is smaller than or equal to , go to step 9. Otherwise (i.e., if the new step length is greater than ), go to step 4. • Stop the procedure by taking Xopt X1 andfopt f1
Example • Minimize using random walk method from the point with a starting step length of =1.0. Take =0.05 and N = 100
Random walk method with direction exploitation • In the random walk method explained, we proceed to generate a new unit random vector ui+1as soon as we find that ui is successful in reducing the function value for a fixed step length . • However, we can expect to achieve a further decrease in the function value by taking a longer step length along the direction ui. • Thus, the random walk method can be improved if the maximum possible step is taken along each successful direction. This can be achieved by using any of the one-dimensional minimization methods discussed in the previous chapter.
Random walk method with direction exploitation • According to this procedure, the new vector Xi+1 is found as: where i* is the optimal step length found along the direction uiso that The search method incorporating this feature is called the random walk method with direction exploitation.
Advantages of random search methods • These methods can work even if the objective function is discontinuous and nondifferentiable at some of the points. • The random methods can be used to find the global minimum when the objective function possesses several relative minima. • These methods are applicable when other methods fail due to local difficulties such as sharply varying functions and shallow regions. • Although the random methods are not very efficient by themselves, they can be used in the early stages of optimization to detect the region where the global minimum is likely to be found. Once this region is found, some of the more efficient techniques can be used to find the precise location of the global minimum point.
Grid-search method • This method involves setting up a suitable grid in the design space, evaluating the objective function at all the grid points, and finding the grid point corresponding to the lowest function values. For example if the lower and upper bounds on the ith design variable are known to be liand ui, respectively, we can divide the range (li ,ui) into pi-1equal partsso that xi(1), xi(2),…, xi(pi) denote the grid points along the xi axis ( i=1,2,..,n). • It can be seen that the grid method requires prohibitively large number of function evaluations in most practical problems. For example, for a problem with 10 design variables (n=10), the number of grid points will be 310=59049 with pi=3 and 410=1,048,576 with pi=4 (i=1,2,..,10).
Grid-search method • For problems with a small number of design variables, the grid method can be used conveniently to find an approximate minimum. • Also, the grid method can be used to find a good starting point for one of the more efficient methods.
Univariate method • In this method, we change only one variable at a time and seek to produce a sequence of improved approximations to the minimum point. • By starting at a base point Xi in the ith iteration, we fix the values of n-1 variables and vary the remaining variable. Since only one variable is changed, the problem becomes a one-dimensional minimization problem and any of the methods discussed in the previous chapter on one dimensional minimization methods can be used to produce a new base point Xi+1. • The search is now continued in a new direction. This new direction is obtained by changing any one of the n-1 variables that were fixed in the previous iteration.
Univariate method • In fact, the search procedure is continued by taking each coordinate direction in turn. After all the n directions are searched sequentially, the first cycle is complete and hence we repeat the entire process of sequential minimization. • The procedure is continued until no further improvement is possible in the objective function in any of the n directions of a cycle. The univariate method can be summarized as follows: • Choose an arbitrary starting point X1and set i=1 • Find the search direction S as
Univariate method • Determine whether i should be positive or negative. For the current direction Si, this means find whether the function value decreases in the positive or negative direction. For this, we take a small probe length () and evaluate fi=f (Xi), f +=f(Xi+ Si), and f -=f(Xi- Si). Iff+ < fi , Si will be the correct direction for decreasing the value of f and if f - < fi , -Si will be the correct one. If both f+ and f – are greater than fi, we takeXi as the minimum along the direction Si.
Univariate method 4. Find the optimal step length i* such that where + or – sign has to be used depending upon whether Si or -Si is the direction for decreasing the function value. 5. Set Xi+1 = Xi ±i*Si depending on the direction for decreasing the function value, andf i+1= f (Xi+1). 6. Set the new value of i=i+1 , and go to step 2. Continue this procedure until no significant change is achieved in the value of the objective function.
Univariate method • The univariate method is very simple and can be implemented easily. • However, it will not converge rapidly to the optimum solution, as it has a tendency to oscillate with steadily decreasing progress towards the optimum. • Hence it will be better to stop the computations at some point near to the optimum point rather than trying to find the precise optimum point. • In theory, the univariate method can be applied to find the minimum of any function that possesses continuous derivatives. • However, if the function has a steep valley, the method may not even converge.
Univariate method For example, consider the contours of a function of two variables with a valley as shown in figure. If the univariate search starts at point P, the function value can not be decreased either in the direction ±S1, or in the direction ±S2. Thus, the search comes to a halt and one may be misled to take the point P, which is certainly not the optimum point, as the optimum point. This situation arises whenever the value of the probe length needed for detecting the proper direction (±S1 or ±S2) happens to be less than the number of significant figures used in the computations.
Example Minimize With the starting point (0,0). Solution: We will take the probe length as 0.01 to find the correct direction for decreasing the function value in step 3. Further, we will use the differential calculus method to find the optimum step length i* along the direction ± Si in step 4.
Example Iteration i=1 Step 2: Choose the search direction S1as Step 3: To find whether the value of f decreases along S1or –S1, we use theprobe length. Since -S1is the correct direction for minimizing f from X1.
Example Step 4: To find the optimum step length 1*, we minimize Step 5: Set
Example Iteration i=2 Step 2: Choose the search direction S2as Step 3: Since S2is the correct direction for decreasing the value of f from X2.