Computacion inteligente

Derivative-Based Optimization


  • Optimization problems

  • Mathematical background

  • Descent Methods

  • The Method of Steepest Descent

  • Conjugate Gradient


  • Objective function – the mathematical function that is optimized by changing the values of the design variables.

  • Design Variables – those variables which we, as designers, can change.

  • Constraints – functions of the design variables which establish limits on individual variables or on combinations of design variables.

3 basic ingredients…

  • an objective function,

  • a set of decision variables,

  • a set of equality/inequality constraints.

The problem is to search for the values of the decision variables that minimize the objective function while satisfying the constraints.
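Written in symbols, the problem stated above takes the following standard form. This is a sketch: the constraint names $g_j$, $h_k$ and the counts $m$, $p$ are notation assumed here, not taken from the slides.

```latex
\begin{aligned}
\min_{x \in \mathbb{R}^n} \quad & f(x) \\
\text{subject to} \quad & g_j(x) \le 0, \quad j = 1, \dots, m, \\
                        & h_k(x) = 0, \quad k = 1, \dots, p.
\end{aligned}
```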


Decision vector: $x = (x_1, x_2, \dots, x_n)^T$



  • Design Variables: decision and objective vector

  • Constraints: equality and inequality

  • Bounds: feasible ranges for variables

  • Objective Function: maximization can be converted to minimization by the duality principle, $\max_x f(x) = -\min_x \,(-f(x))$

  • Identify the quantity or function, f, to be optimized.

  • Identify the design variables: x1, x2, x3, …,xn.

  • Identify the constraints if any exist

    a. Equalities

    b. Inequalities

  • Adjust the design variables (x’s) until f is optimized and all of the constraints are satisfied.

  • Objective functions may be unimodal or multimodal.

    • Unimodal – only one optimum

    • Multimodal – more than one optimum

  • Most search schemes are based on the assumption of a unimodal surface. The optimum determined in such cases is called a local optimum design.

  • The global optimum is the best of all local optimum designs.

  • Existence of global minimum

  • If f(x) is continuous on the feasible set S which is closed and bounded, then f(x) has a global minimum in S

    • A set S is closed if it contains all its boundary points.

    • A set S is bounded if it is contained in the interior of some ball (a circle, in the plane).

compact = closed and bounded



[Figure: surface showing a saddle point and a local maximum]

  • Derivative-based optimization (gradient based)

    • Capable of determining “search directions” according to an objective function’s derivative information

      • steepest descent method;

      • Newton’s method; Newton-Raphson method;

      • Conjugate gradient, etc.

  • Derivative-free optimization

    • random search method;

    • genetic algorithm;

    • simulated annealing; etc.


The scalar $x^T M x = \sum_{i} \sum_{j} m_{ij} x_i x_j$ is called a quadratic form.

  • A square matrix $M$ is positive definite if $x^T M x > 0$ for all $x \neq 0$.

  • It is positive semidefinite if $x^T M x \ge 0$ for all $x$.

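  • A symmetric matrix $M = M^T$ is positive definite if and only if all its eigenvalues satisfy $\lambda_i > 0$ (positive semidefinite ↔ $\lambda_i \ge 0$).

    • Proof (→): Let $v_i$ be an eigenvector for the $i$-th eigenvalue $\lambda_i$, so that $M v_i = \lambda_i v_i$ and $v_i \neq 0$.

    • Then, $0 < v_i^T M v_i = \lambda_i v_i^T v_i = \lambda_i \|v_i\|^2$,

    • which implies $\lambda_i > 0$.

  • Exercise (←): prove that positive eigenvalues imply positive definiteness.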

  • Theorem: If a matrix $M = U^T U$, where $Ux \neq 0$ for every $x \neq 0$, then it is positive definite.

  • Proof. Let $f$ be defined as $f = x^T M x$.

  • If we can show that $f$ is always positive for $x \neq 0$, then $M$ must be positive definite. We can write this as $f = x^T U^T U x = (Ux)^T (Ux)$.

  • Provided that $Ux$ is a nonzero vector for all values of $x$ except $x = 0$, we can write $b = Ux$, i.e. $f = b^T b = \sum_i b_i^2 > 0$,

  • so $f$ must always be positive.
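The factorization $M = U^T U$ also suggests a numerical test. The sketch below attempts a Cholesky factorization $M = L L^T$ (so $U = L^T$) and reports failure when a pivot is not positive. It uses plain Python lists so that it runs without third-party packages; the function names and the two sample matrices are illustrative assumptions, not part of the slides.

```python
import math

def cholesky(M):
    """Return lower-triangular L with M = L L^T, or None if the
    symmetric matrix M (a list of rows) is not positive definite."""
    n = len(M)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                pivot = M[i][i] - s
                if pivot <= 0.0:
                    return None  # non-positive pivot: not positive definite
                L[i][j] = math.sqrt(pivot)
            else:
                L[i][j] = (M[i][j] - s) / L[j][j]
    return L

def is_positive_definite(M):
    return cholesky(M) is not None

def quadratic_form(M, x):
    """Evaluate x^T M x."""
    n = len(x)
    return sum(x[i] * M[i][j] * x[j] for i in range(n) for j in range(n))

M_pd = [[2.0, -1.0], [-1.0, 2.0]]    # eigenvalues 1 and 3
M_indef = [[1.0, 2.0], [2.0, 1.0]]   # eigenvalues 3 and -1

print(is_positive_definite(M_pd), is_positive_definite(M_indef))
print(quadratic_form(M_indef, [1.0, -1.0]))  # a direction with negative value
```

For the indefinite sample, the vector (1, -1) is an eigenvector for the negative eigenvalue, which is why the quadratic form is negative there.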

  • $f: \mathbb{R}^n \to \mathbb{R}$ is a quadratic function if $f(x) = \tfrac{1}{2} x^T Q x - b^T x + c$,

    • where $Q$ is symmetric.

  • Assuming that $Q$ is symmetric entails no loss of generality.

    • Suppose the matrix $P$ is non-symmetric. Because $x^T P x = x^T P^T x$, we have $x^T P x = x^T Q x$ with $Q = \tfrac{1}{2}(P + P^T)$.

    • $Q$ is symmetric.
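A small numerical check of the symmetrization argument: the matrix `P` and the vector `x` below are hypothetical values chosen for illustration, and the helper names are assumptions.

```python
def symmetrize(P):
    """Return Q = (P + P^T) / 2 for a square matrix given as a list of rows."""
    n = len(P)
    return [[0.5 * (P[i][j] + P[j][i]) for j in range(n)] for i in range(n)]

def quadratic_form(M, x):
    """Evaluate x^T M x."""
    n = len(x)
    return sum(x[i] * M[i][j] * x[j] for i in range(n) for j in range(n))

P = [[1.0, 4.0], [0.0, 3.0]]   # hypothetical non-symmetric matrix
Q = symmetrize(P)              # symmetric part of P
x = [2.0, -1.0]

# Both quadratic forms agree even though P != Q
print(quadratic_form(P, x), quadratic_form(Q, x))
```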

  • Given a quadratic function $f$ with symmetric matrix $Q$:

If Q is positive definite, then f is a parabolic “bowl.”

  • Two other shapes can result from the quadratic form.

    • If Q is negative definite, then f is an upside-down parabolic “bowl.”

    • If Q is indefinite, then f describes a saddle.

  • Quadratics are useful in the study of optimization.

    • Often, objective functions are “close to” quadratic near the solution.

    • It is easier to analyze the behavior of algorithms when applied to quadratics.

    • Analysis of algorithms for quadratics gives insight into their behavior in general.

  • The derivative of $f: \mathbb{R} \to \mathbb{R}$ is a function $f': \mathbb{R} \to \mathbb{R}$ given by $f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}$

  • if the limit exists.

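  • Along the axes (partial derivatives): $\frac{\partial f}{\partial x_i}(x) = \lim_{h \to 0} \frac{f(x + h e_i) - f(x)}{h}$, where $e_i$ is the $i$-th coordinate unit vector.

  • In a general direction $v$ (directional derivative): $D_v f(x) = \lim_{h \to 0} \frac{f(x + h v) - f(x)}{h}$.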

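  • Definition: A real-valued function $f: \mathbb{R}^n \to \mathbb{R}$ is said to be continuously differentiable if the partial derivatives $\frac{\partial f}{\partial x_1}, \dots, \frac{\partial f}{\partial x_n}$

  • exist for each $x$ in $\mathbb{R}^n$ and are continuous functions of $x$.

  • In this case, we say $f \in C^1$ (a smooth function of class $C^1$).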

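  • Definition (in the plane): The gradient of $f: \mathbb{R}^2 \to \mathbb{R}$

    is a function $\nabla f: \mathbb{R}^2 \to \mathbb{R}^2$ given by $\nabla f(x, y) = \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right)^T$.

  • Definition: The gradient of $f: \mathbb{R}^n \to \mathbb{R}$ is a function $\nabla f: \mathbb{R}^n \to \mathbb{R}^n$ given by $\nabla f(x) = \left( \frac{\partial f}{\partial x_1}, \dots, \frac{\partial f}{\partial x_n} \right)^T$.

  • The gradient defines the (hyper)plane approximating the function infinitesimally: $\Delta z = \frac{\partial f}{\partial x} \Delta x + \frac{\partial f}{\partial y} \Delta y$.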

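  • By the chain rule, the directional derivative of $f$ at $p$ along $v$ is $D_v f(p) = \frac{d}{dt} f(p + t v) \big|_{t=0} = \langle \nabla f(p), v \rangle$.

  • Proposition 1: Among all unit vectors $v$, and assuming $\nabla f(p) \neq 0$,

    $D_v f(p)$ is maximal choosing $v = \frac{\nabla f(p)}{\|\nabla f(p)\|}$.

Intuitive: the gradient points in the direction of greatest change.

  • Proof:

    • Assign: $v = \frac{\nabla f(p)}{\|\nabla f(p)\|}$;

    • by the chain rule: $D_v f(p) = \left\langle \nabla f(p), \frac{\nabla f(p)}{\|\nabla f(p)\|} \right\rangle = \|\nabla f(p)\|$.

    • On the other hand, for a general unit vector $v$, the Cauchy-Schwarz inequality gives $D_v f(p) = \langle \nabla f(p), v \rangle \le \|\nabla f(p)\| \, \|v\| = \|\nabla f(p)\|$.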
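Proposition 1 can be checked numerically with finite differences. In this sketch the test function `f`, the point `p`, and the helper names are illustrative assumptions; the slope is sampled along 360 unit vectors and compared with the slope along the normalized gradient.

```python
import math

def f(x):
    # Hypothetical smooth test function: f(x1, x2) = x1^2 + 3*x1*x2 + 2*x2^2
    return x[0] ** 2 + 3.0 * x[0] * x[1] + 2.0 * x[1] ** 2

def gradient(func, x, h=1e-6):
    """Central-difference approximation of the gradient of func at x."""
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        g.append((func(xp) - func(xm)) / (2.0 * h))
    return g

def directional_derivative(func, x, v, h=1e-6):
    """Central-difference approximation of the slope of func at x along v."""
    xp = [xi + h * vi for xi, vi in zip(x, v)]
    xm = [xi - h * vi for xi, vi in zip(x, v)]
    return (func(xp) - func(xm)) / (2.0 * h)

p = [1.0, 2.0]
g = gradient(f, p)                                  # analytic gradient is (8, 11)
norm_g = math.sqrt(sum(gi * gi for gi in g))
v_star = [gi / norm_g for gi in g]                  # normalized gradient

# Largest slope over unit vectors sampled every degree around the circle
angles = [2.0 * math.pi * k / 360 for k in range(360)]
best = max(directional_derivative(f, p, [math.cos(t), math.sin(t)]) for t in angles)

print(directional_derivative(f, p, v_star), norm_g, best)
```

The slope along `v_star` should match the gradient norm, and no sampled direction should exceed it.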

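  • Proposition 2: Let $f: \mathbb{R}^n \to \mathbb{R}$ be a smooth ($C^1$) function around $p$.

  • If $f$ has a local minimum (maximum) at $p$, then $\nabla f(p) = 0$.

Intuitive: this is a necessary condition for a local min (max).

  • Proof: intuitive. If $\nabla f(p) \neq 0$, a small step along $-\nabla f(p)$ (respectively $+\nabla f(p)$) would decrease (increase) $f$.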

  • We found the best INFINITESIMAL DIRECTION at each point,

  • Looking for minimum: “blind man” procedure

  • How can we derive the way to the minimum using this knowledge?

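  • The derivative of $f: \mathbb{R}^n \to \mathbb{R}^m$ is a function $Df: \mathbb{R}^n \to \mathbb{R}^{m \times n}$ given by $[Df(x)]_{ij} = \frac{\partial f_i}{\partial x_j}(x)$, $i = 1, \dots, m$, $j = 1, \dots, n$,

called the Jacobian.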

Note that for f: Rn→ R , we have ∇f(x) = Df(x)T.

  • If the derivative of ∇f exists, we say that f is twice differentiable.

    • Write the second derivative as $D^2 f$ (or $F$), and call it the Hessian of $f$; its entries are $[D^2 f(x)]_{ij} = \frac{\partial^2 f}{\partial x_i \partial x_j}(x)$.

  • The level set of a function f: Rn→ R at level c is the set of points S = {x: f(x) = c}.

  • Fact: ∇f(x0) is orthogonal to the level set at x0

  • Proof of fact:

    • Imagine a particle traveling along the level set.

    • Let g(t) be the position of the particle at time t, with g(0) = x0.

    • Note that f(g(t)) = constant for all t.

    • Velocity vector g′(t) is tangent to the level set.

    • Consider $F(t) = f(g(t))$. We have $F'(0) = 0$. By the chain rule, $F'(0) = \nabla f(g(0))^T g'(0) = \nabla f(x_0)^T g'(0) = 0$.

    • Hence, ∇f(x0) and g′(0) are orthogonal.

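  • Suppose $f: \mathbb{R} \to \mathbb{R}$ is in $C^1$. Then, $f(x_0 + h) = f(x_0) + f'(x_0) h + o(h)$.

  • $o(h)$ is a term such that $o(h)/h \to 0$ as $h \to 0$.

  • At $x_0$, $f$ can be approximated by a linear function, and the approximation gets better the closer we are to $x_0$.

  • Suppose $f: \mathbb{R} \to \mathbb{R}$ is in $C^2$. Then, $f(x_0 + h) = f(x_0) + f'(x_0) h + \tfrac{1}{2} f''(x_0) h^2 + o(h^2)$.

  • At $x_0$, $f$ can be approximated by a quadratic function.

  • Suppose $f: \mathbb{R}^n \to \mathbb{R}$.

    • If $f$ is in $C^1$, then $f(x_0 + h) = f(x_0) + \nabla f(x_0)^T h + o(\|h\|)$.

    • If $f$ is in $C^2$, then $f(x_0 + h) = f(x_0) + \nabla f(x_0)^T h + \tfrac{1}{2} h^T D^2 f(x_0) h + o(\|h\|^2)$.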
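The meaning of the remainder terms can be seen numerically. The sketch below uses the exponential function at the point 0.5 as a hypothetical example (chosen because all its derivatives are known) and reports the linear error divided by h and the quadratic error divided by h squared; both ratios should shrink as h decreases.

```python
import math

def taylor_error_ratios(h, x0=0.5):
    """Return (linear error / h, quadratic error / h^2) for f = exp at x0."""
    fx0 = math.exp(x0)            # f(x0); also f'(x0) and f''(x0) for exp
    exact = math.exp(x0 + h)
    linear = fx0 + fx0 * h
    quadratic = linear + 0.5 * fx0 * h * h
    return abs(exact - linear) / h, abs(exact - quadratic) / h ** 2

for h in (1e-1, 1e-2, 1e-3):
    print(h, taylor_error_ratios(h))
```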

  • We already know that ∇f(x0) is orthogonal to the level set at x0.

    • Suppose ∇f(x0) ≠ 0.

  • Fact: ∇f points in the direction of increasing f.

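  • Consider $x_\alpha = x_0 + \alpha \nabla f(x_0)$, $\alpha > 0$.

    • By Taylor's formula, $f(x_\alpha) = f(x_0) + \alpha \|\nabla f(x_0)\|^2 + o(\alpha)$.

  • Therefore, for sufficiently small $\alpha > 0$,

    $f(x_\alpha) > f(x_0)$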


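  • The Wolfe theorem is the link from the previous gradient properties to the constructive algorithm.

  • The problem: $\min_{x \in \mathbb{R}^n} f(x)$

  • We introduce a model for the algorithm:


Step 0: set $i = 0$ and choose a starting point $x_0$

Step 1: if $\nabla f(x_i) = 0$, stop;

else, compute a search direction $d_i \in \mathbb{R}^n$

Step 2: compute the step-size $\lambda_i \in \arg\min_{\lambda \ge 0} f(x_i + \lambda d_i)$

Step 3: set $x_{i+1} = x_i + \lambda_i d_i$, $i = i + 1$, and go to step 1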

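  • The Theorem:

    • Suppose $f: \mathbb{R}^n \to \mathbb{R}$ is $C^1$ smooth, and there exists a continuous function $k: \mathbb{R}^n \to [0, 1]$ with $k(x) > 0$ whenever $\nabla f(x) \neq 0$,

    • and the search vectors $d_i$ constructed by the model algorithm satisfy $\langle \nabla f(x_i), d_i \rangle \le -k(x_i) \, \|\nabla f(x_i)\| \, \|d_i\|$

  • and $d_i \neq 0$ whenever $\nabla f(x_i) \neq 0$.

  • Then

    • if $\{x_i\}_{i=0}^{\infty}$ is the sequence constructed by the algorithm model,

    • then any accumulation point $y$ of this sequence satisfies $\nabla f(y) = 0$.

    • The theorem has a very intuitive interpretation:

    • Always go in a descent direction.

    The principal differences between various descent algorithms lie in the first procedure, the one for determining successive directions.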


    • We now use what we have learned to implement the most basic minimization technique.

    • First we introduce the algorithm, which is a version of the model algorithm.

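    • The problem: $\min_{x \in \mathbb{R}^n} f(x)$

    • Steepest descent algorithm:


    Step 0: set $i = 0$ and choose a starting point $x_0$

    Step 1: if $\nabla f(x_i) = 0$, stop;

    else, compute the search direction $d_i = -\nabla f(x_i)$

    Step 2: compute the step-size $\lambda_i \in \arg\min_{\lambda \ge 0} f(x_i + \lambda d_i)$

    Step 3: set $x_{i+1} = x_i + \lambda_i d_i$, $i = i + 1$, and go to step 1

    • Theorem:

      • If $\{x_i\}_{i=0}^{\infty}$ is a sequence constructed by the SD algorithm, then every accumulation point $y$ of the sequence satisfies $\nabla f(y) = 0$.

      • Proof: from the Wolfe theorem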

    Remark: Wolfe theorem gives us numerical stability if the derivatives aren’t given (are calculated numerically).

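    • How long a step to take?

    Note that the search direction is $d_i = -\nabla f(x_i)$.

    • We are limited to a line search:

  • Choose $\lambda$ to minimize $f(x_i + \lambda d_i)$.

  • At the minimizer, the directional derivative is equal to zero: $\frac{d}{d\lambda} f(x_i + \lambda d_i) = 0$.

      • From the chain rule: $\frac{d}{d\lambda} f(x_i + \lambda d_i) = \nabla f(x_i + \lambda d_i)^T d_i$, so at the chosen step $\nabla f(x_{i+1})^T d_i = 0$.

    • Therefore the method of steepest descent looks like this: each new gradient $\nabla f(x_{i+1})$ is perpendicular to the previous search direction $d_i = -\nabla f(x_i)$, so the path zigzags.

    They are orthogonal!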
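The steps above can be sketched in code for a case where the line search has a closed form. The sketch assumes a quadratic objective $f(x) = \tfrac{1}{2} x^T A x - b^T x$ with $A$ symmetric positive definite, for which the exact step along $d = -g$ is $\lambda = g^T g / (g^T A g)$ with $g = Ax - b$. The sample values of `A`, `b` and the starting point are the ones used in Shewchuk's conjugate gradient tutorial; the function names are assumptions.

```python
def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def steepest_descent(A, b, x0, tol=1e-10, max_iter=1000):
    """Minimize f(x) = 0.5 x^T A x - b^T x (A symmetric positive definite).

    Returns the final iterate and the list of gradients visited.
    """
    x = list(x0)
    gradients = []
    for _ in range(max_iter):
        g = [ax - bi for ax, bi in zip(matvec(A, x), b)]  # gradient A x - b
        gradients.append(g)
        if dot(g, g) ** 0.5 < tol:                        # Step 1: stopping test
            break
        d = [-gi for gi in g]                             # steepest descent direction
        lam = dot(g, g) / dot(d, matvec(A, d))            # Step 2: exact line search
        x = [xi + lam * di for xi, di in zip(x, d)]       # Step 3: update
    return x, gradients

A = [[3.0, 2.0], [2.0, 6.0]]
b = [2.0, -8.0]
x_min, grads = steepest_descent(A, b, [-2.0, -2.0])

print(x_min)                      # close to the solution of A x = b
print(dot(grads[0], grads[1]))    # consecutive gradients are orthogonal
print(len(grads))                 # number of gradient evaluations used
```

Even on this two-variable problem the method needs many iterations, because consecutive directions are orthogonal and the same two directions keep being reused.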


    Find the minimum when x1 is allowed to vary from 0.5 to 1.5 and x2 is allowed to vary from 0 to 2.

    λ arbitrary




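    • From now on we assume that we want to minimize the quadratic function $f(x) = \tfrac{1}{2} x^T A x - b^T x + c$.

    • This is equivalent to solving the linear problem $0 = \nabla f(x) = Ax - b$, that is, $Ax = b$,

    if $A$ is symmetric (in general $\nabla f(x) = \tfrac{1}{2}(A + A^T) x - b$) and positive definite, so that the stationary point is a minimum.

    • The solution is the intersection of the lines.

    • Each ellipsoid has constant $f(x)$.

    In general, the solution $x$ lies at the intersection point of $n$ hyperplanes, each having dimension $n - 1$.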

    • What is the problem with steepest descent?

      • We can repeat the same directions over and over…

    • Wouldn’t it be better if, every time we took a step, we got it right the first time?

    • Conjugate gradient requires at most n gradient evaluations and n line searches to minimize a quadratic function of n variables (in exact arithmetic).


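    • First, let’s define the error as $e_i = x_i - x$.

    • $e_i$ is a vector that indicates how far we are from the solution.

    • Let’s pick a set of orthogonal search directions $d_0, d_1, \dots, d_{n-1}$ (they should span $\mathbb{R}^n$), beginning at the start point $x_0$.

    • In each search direction, we’ll take exactly one step;

    that step will be just the right length to line up evenly with $x$.

    • Using the coordinate axes as search directions, each step fixes one coordinate of the solution.

    • We have $x_{i+1} = x_i + \alpha_i d_i$.

    • Given $d_i$, how do we calculate $\alpha_i$?

    • $e_{i+1}$ should be orthogonal to $d_i$.

      • That is, $d_i^T e_{i+1} = 0$, so $d_i^T (e_i + \alpha_i d_i) = 0$ and $\alpha_i = -\frac{d_i^T e_i}{d_i^T d_i}$.

    • Unfortunately, this method only works if you already know the answer: computing $\alpha_i$ requires $e_i$, and knowing $e_i$ means knowing $x$.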

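    • How do we find $d_j$?

      • Since the search vectors form a basis, $e_0 = \sum_{i=0}^{n-1} \delta_i d_i$.

    On the other hand, with steps $x_{i+1} = x_i + \alpha_i d_i$ and error $e_i = x_i - x$, $e_j = e_0 + \sum_{i=0}^{j-1} \alpha_i d_i$.

    • We want the error to be 0 after n steps:

      • Here is an idea: if $\alpha_i = -\delta_i$ for every $i$, then $e_j = \sum_{i=j}^{n-1} \delta_i d_i$.

    So if $j = n$, then $e_n = 0$.

    • So we look for $d_j$ such that $\alpha_j = -\delta_j$.

      • A simple calculation shows that this holds if we take the directions to be A-orthogonal (conjugate), $d_i^T A d_j = 0$ for $i \neq j$, and choose each step by exact line search.

    The correct choice is $\alpha_j = -\frac{d_j^T A e_j}{d_j^T A d_j} = -\frac{d_j^T \nabla f(x_j)}{d_j^T A d_j}$, which can be computed without knowing $x$ because $A e_j = A x_j - b = \nabla f(x_j)$.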


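    • Conjugate gradient algorithm for minimizing f:

    Step 0: choose $x_0$ and set $d_0 = r_0 = -\nabla f(x_0) = b - A x_0$

    Step 1: $\alpha_i = \frac{r_i^T r_i}{d_i^T A d_i}$

    Step 2: $x_{i+1} = x_i + \alpha_i d_i$

    Step 3: $r_{i+1} = r_i - \alpha_i A d_i$, $\beta_{i+1} = \frac{r_{i+1}^T r_{i+1}}{r_i^T r_i}$

    Step 4: $d_{i+1} = r_{i+1} + \beta_{i+1} d_i$ and repeat n times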
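A minimal sketch of the standard linear conjugate gradient iteration for $f(x) = \tfrac{1}{2} x^T A x - b^T x$, assuming $A$ is symmetric positive definite. The residual `r` stands for $-\nabla f(x) = b - Ax$; the function names and the sample system (taken from Shewchuk's tutorial) are assumptions for illustration, and plain Python lists are used so the code runs without third-party packages.

```python
def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def conjugate_gradient(A, b, x0):
    """Minimize 0.5 x^T A x - b^T x (A symmetric positive definite)
    using at most n conjugate gradient steps."""
    n = len(b)
    x = list(x0)
    r = [bi - ax for bi, ax in zip(b, matvec(A, x))]   # r_0 = -grad f(x_0)
    d = list(r)                                        # d_0 = r_0
    for _ in range(n):
        rr = dot(r, r)
        if rr == 0.0:                                  # already at the minimum
            break
        Ad = matvec(A, d)
        alpha = rr / dot(d, Ad)                        # exact step along d
        x = [xi + alpha * di for xi, di in zip(x, d)]
        r = [ri - alpha * adi for ri, adi in zip(r, Ad)]
        beta = dot(r, r) / rr
        d = [ri + beta * di for ri, di in zip(r, d)]   # next A-orthogonal direction
    return x

A = [[3.0, 2.0], [2.0, 6.0]]
b = [2.0, -8.0]
x_cg = conjugate_gradient(A, b, [-2.0, -2.0])
print(x_cg)   # close to the solution of A x = b after n = 2 steps
```

On this two-variable system the iteration finishes in two steps, in contrast with the many zigzag steps steepest descent needs from the same starting point.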


