Computacion Inteligente

Derivative-Based Optimization

Contents
  • Optimization problems
  • Mathematical background
  • Descent Methods
  • The Method of Steepest Descent
  • Conjugate Gradient
  • Objective Function – mathematical function that is optimized by changing the values of the design variables.
  • Design Variables – those variables which we, as designers, can change.
  • Constraints – functions of the design variables that establish limits on individual variables or on combinations of design variables.
Three basic ingredients:
  • an objective function,
  • a set of decision variables,
  • a set of equality/inequality constraints.

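The problem is to search for the values of the decision variables that minimize the objective function while satisfying the constraints:

$$\min_{\mathbf{x}} f(\mathbf{x}) \quad \text{subject to} \quad h_i(\mathbf{x}) = 0, \quad g_j(\mathbf{x}) \le 0, \quad \mathbf{x}^{L} \le \mathbf{x} \le \mathbf{x}^{U}$$

where $\mathbf{x} = [x_1, x_2, \dots, x_n]^T$ is the decision vector.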

  • Design Variables: decision and objective vector
  • Constraints: equality and inequality
  • Bounds: feasible ranges for variables
  • Objective Function: maximization can be converted to minimization by the duality principle, since maximizing f(x) is equivalent to minimizing −f(x).
  • Identify the quantity or function, f, to be optimized.
  • Identify the design variables: x1, x2, x3, …, xn.
  • Identify the constraints, if any exist:

      a. Equalities

      b. Inequalities

  • Adjust the design variables (x’s) until f is optimized and all of the constraints are satisfied.
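As a small illustration of these steps, the sketch below sets up a hypothetical problem (the objective, the inequality constraint, and the bounds are made up for the example) and adjusts the design variables by brute-force grid search. Grid search is a derivative-free stand-in chosen only to make the formulation concrete; it is not one of the derivative-based methods covered in these slides.

```python
# Hypothetical problem: minimize f(x1, x2) = (x1 - 1)^2 + (x2 - 2)^2
# subject to the inequality constraint x1 + x2 <= 2 and the bounds 0 <= xi <= 3.

def objective(x1: float, x2: float) -> float:
    """Objective function f to be minimized."""
    return (x1 - 1.0) ** 2 + (x2 - 2.0) ** 2

def feasible(x1: float, x2: float) -> bool:
    """Inequality constraint g(x) = x1 + x2 - 2 <= 0 (small tolerance for rounding)."""
    return x1 + x2 - 2.0 <= 1e-12

def grid_search(step: float = 0.01, lower: float = 0.0, upper: float = 3.0):
    """Try every grid point inside the bounds and keep the best feasible one."""
    n = int(round((upper - lower) / step))
    best_x, best_f = None, float("inf")
    for i in range(n + 1):
        x1 = lower + i * step
        for j in range(n + 1):
            x2 = lower + j * step
            if feasible(x1, x2):
                f = objective(x1, x2)
                if f < best_f:
                    best_x, best_f = (x1, x2), f
    return best_x, best_f

best_x, best_f = grid_search()
```

The constrained optimum lies on the boundary x1 + x2 = 2, at (0.5, 1.5), rather than at the unconstrained minimizer (1, 2).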
Objective functions may be unimodal or multimodal.
    • Unimodal – only one optimum
    • Multimodal – more than one optimum
  • Most search schemes are based on the assumption of a unimodal surface. The optimum determined in such cases is called a local optimum design.
  • The global optimum is the best of all local optimum designs.
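A one-dimensional sketch of local versus global optima, assuming the hypothetical multimodal objective $f(x) = x^4 - 3x^2 + x$. A simple fixed-step descent (the step size and iteration count are arbitrary choices) ends in whichever local minimum's basin it starts in; the global optimum is then the better of the local ones.

```python
def f(x: float) -> float:
    """Hypothetical multimodal objective with two local minima."""
    return x**4 - 3 * x**2 + x

def df(x: float) -> float:
    """Derivative f'(x) = 4x^3 - 6x + 1."""
    return 4 * x**3 - 6 * x + 1

def local_descent(x0: float, step: float = 0.01, iters: int = 2000) -> float:
    """Fixed-step gradient descent: converges to the local minimum whose basin contains x0."""
    x = x0
    for _ in range(iters):
        x -= step * df(x)
    return x

left = local_descent(-2.0)    # ends near x = -1.30
right = local_descent(2.0)    # ends near x = 1.13
# Both are local optima; the global optimum is the best of them.
global_best = min(left, right, key=f)
```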
Existence of global minimum
  • If f(x) is continuous on the feasible set S, and S is closed and bounded, then f(x) has a global minimum in S.
    • A set S is closed if it contains all its boundary points.
    • A set S is bounded if it is contained in the interior of some circle (some ball in Rn).

compact = closed and bounded (in Rn)





[Figure: surface showing a saddle point and a local max]

  • Derivative-based optimization (gradient based)
    • Capable of determining “search directions” according to an objective function’s derivative information
      • steepest descent method;
      • Newton’s method; Newton-Raphson method;
      • Conjugate gradient, etc.
  • Derivative-free optimization
      • random search method;
      • genetic algorithm;
      • simulated annealing; etc.

The scalar $x^T M x = \sum_{i=1}^{n} \sum_{j=1}^{n} m_{ij} x_i x_j$ is called a quadratic form.

  • A square matrix M is positive definite if $x^T M x > 0$ for all $x \neq 0$.
  • It is positive semidefinite if $x^T M x \ge 0$ for all $x$.

A symmetric matrix $M = M^T$ is positive definite if and only if its eigenvalues λi > 0 (semidefinite ↔ λi ≥ 0).
    • Proof (→): Let vi be the eigenvector for the i-th eigenvalue λi.
    • Then $0 < v_i^T M v_i = v_i^T (\lambda_i v_i) = \lambda_i \|v_i\|^2$,
    • which implies λi > 0.
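A quick numerical check of the eigenvalue test, assuming NumPy is available; `is_positive_definite` is a hypothetical helper name and the matrices are made-up examples. `numpy.linalg.eigh` returns eigenvalues in ascending order with unit-norm eigenvectors, so the identity $v_i^T M v_i = \lambda_i \|v_i\|^2$ used in the proof can be checked directly.

```python
import numpy as np

def is_positive_definite(M: np.ndarray) -> bool:
    """Eigenvalue test for a symmetric M: positive definite iff all eigenvalues > 0."""
    if not np.allclose(M, M.T):
        raise ValueError("this eigenvalue test assumes M is symmetric")
    return bool(np.all(np.linalg.eigvalsh(M) > 0))

M = np.array([[2.0, -1.0], [-1.0, 2.0]])   # eigenvalues 1 and 3
lam, V = np.linalg.eigh(M)                 # ascending eigenvalues, eigenvectors as columns
v = V[:, 0]                                # eigenvector for the smallest eigenvalue
quad = v @ M @ v                           # equals lam[0] * ||v||^2, as in the proof
```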

Exercise: prove that positive eigenvalues imply positive definiteness.


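Proof. Let f be defined as $f = x^T M x$, with $M = U^T U$.

  • If we can show that f is always positive, then M must be positive definite. We can write this as $f = x^T U^T U x = (Ux)^T (Ux)$.
  • Provided that Ux always gives a nonzero vector for all values of x except when x = 0, we can write b = Ux, i.e. $f = b^T b = \sum_i b_i^2 > 0$,
  • so f must always be positive.
  • Theorem: If a matrix $M = U^T U$, where $Ux \neq 0$ for every $x \neq 0$ (U has full column rank), then M is positive definite.
  • A symmetric M with eigenvalues λi > 0 can be written as $M = V \Lambda V^T = U^T U$ with $U = \Lambda^{1/2} V^T$ nonsingular, so the exercise follows.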
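The construction $M = U^T U$ can be checked numerically in the same spirit. The sketch below (NumPy assumed; the helper `gram_quadratic_forms` and the matrices are hypothetical) compares $x^T M x$ with $\|Ux\|^2$, and then shows what goes wrong when $Ux = 0$ for some nonzero x.

```python
import numpy as np

def gram_quadratic_forms(U: np.ndarray, xs: np.ndarray):
    """For M = U^T U, return pairs (x^T M x, ||Ux||^2) for each row x of xs."""
    M = U.T @ U
    return [(float(x @ M @ x), float((U @ x) @ (U @ x))) for x in xs]

U = np.array([[1.0, 2.0], [0.0, 3.0]])        # hypothetical nonsingular U
xs = np.random.default_rng(0).standard_normal((5, 2))
pairs = gram_quadratic_forms(U, xs)           # each pair agrees and is > 0

# If U is rank deficient, Ux can vanish for x != 0 and M is only semidefinite.
U_singular = np.array([[1.0, 1.0], [1.0, 1.0]])
x_null = np.array([1.0, -1.0])                # U_singular @ x_null == 0
```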
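It is not necessary for Q to be symmetric.
    • Suppose the matrix P is non-symmetric. Since $x^T P x$ is a scalar, $x^T P x = x^T P^T x$, so $x^T P x = x^T Q x$ with $Q = \tfrac{1}{2}(P + P^T)$,

and Q is symmetric.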
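A short numerical check of the symmetrization argument, using an arbitrary non-symmetric P (NumPy assumed; the matrix and vector are made up):

```python
import numpy as np

P = np.array([[1.0, 4.0], [0.0, 3.0]])   # hypothetical non-symmetric matrix
Q = 0.5 * (P + P.T)                      # symmetric part of P
x = np.array([2.0, -1.0])
xPx = x @ P @ x                          # quadratic form with P
xQx = x @ Q @ x                          # same value with the symmetric Q
```

Only the symmetric part of P contributes to the quadratic form; the antisymmetric part $\tfrac{1}{2}(P - P^T)$ always gives zero.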

Given the quadratic function $f(x) = \tfrac{1}{2} x^T Q x - b^T x + c$,

if Q is positive definite, then f is a parabolic “bowl.”

Two other shapes can result from the quadratic form.
    • If Q is negative definite, then f is a parabolic “bowl” upside down.
    • If Q is indefinite, then f describes a saddle.
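The three shapes can be told apart from the signs of the eigenvalues of the symmetric part of Q. The helper below is a hypothetical sketch (NumPy assumed); it reports zero eigenvalues as a separate degenerate case, which the slide does not discuss.

```python
import numpy as np

def quadratic_shape(Q: np.ndarray) -> str:
    """Classify the surface of f(x) = 1/2 x^T Q x - b^T x + c by eigenvalue signs."""
    S = 0.5 * (Q + Q.T)                  # only the symmetric part matters
    eig = np.linalg.eigvalsh(S)
    if np.all(eig > 0):
        return "bowl"                    # positive definite
    if np.all(eig < 0):
        return "upside-down bowl"        # negative definite
    if np.any(eig > 0) and np.any(eig < 0):
        return "saddle"                  # indefinite
    return "degenerate (semidefinite)"   # some eigenvalue is zero
```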
Quadratics are useful in the study of optimization.
    • Often, objective functions are “close to” quadratic near the solution.
    • It is easier to analyze the behavior of algorithms when applied to quadratics.
    • Analysis of algorithms for quadratics gives insight into their behavior in general.
The derivative of f: R → R is a function f′: R → R given by $f'(x) = \lim_{h \to 0} \dfrac{f(x+h) - f(x)}{h}$,
  • if the limit exists.
Definition: A real-valued function f: Rn → R is said to be continuously differentiable if the partial derivatives $\dfrac{\partial f}{\partial x_1}, \dots, \dfrac{\partial f}{\partial x_n}$
  • exist for each x in Rn and are continuous functions of x.
  • In this case, we say f ∈ C1 (f is a smooth function).
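Definition: The gradient of f: R2 → R

is a function ∇f: R2 → R2 given by $\nabla f(x_1, x_2) = \left[\dfrac{\partial f}{\partial x_1}, \dfrac{\partial f}{\partial x_2}\right]^T$.

[Figure: the gradient vector ∇f drawn in the plane]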
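The gradient can be approximated numerically with central differences. In this sketch the helper name, the step size h, and the example function are assumptions for illustration (NumPy assumed).

```python
import numpy as np

def numerical_gradient(f, x, h: float = 1e-6) -> np.ndarray:
    """Central-difference approximation of the gradient of f: R^n -> R at x."""
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for k in range(x.size):
        e = np.zeros_like(x)
        e[k] = h                                   # perturb only coordinate k
        grad[k] = (f(x + e) - f(x - e)) / (2 * h)
    return grad

# Hypothetical example f(x1, x2) = x1^2 + 3 x1 x2, exact gradient [2 x1 + 3 x2, 3 x1]
f = lambda x: x[0] ** 2 + 3 * x[0] * x[1]
g = numerical_gradient(f, [1.0, 2.0])              # close to [8.0, 3.0]
```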

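Proposition 1: the directional derivative

$$\frac{d}{dt} f(p + t v)\Big|_{t=0}, \qquad \|v\| = 1,$$

is maximal choosing $v = \dfrac{\nabla f(p)}{\|\nabla f(p)\|}$.

Intuitive: the gradient points in the direction of greatest change.

Prove it!

    • Assign: $v = \nabla f(p) / \|\nabla f(p)\|$.
    • By the chain rule: $\frac{d}{dt} f(p + t v)\big|_{t=0} = \nabla f(p)^T v = \dfrac{\|\nabla f(p)\|^2}{\|\nabla f(p)\|} = \|\nabla f(p)\|$.
    • On the other hand, for a general unit vector v (Cauchy–Schwarz): $\frac{d}{dt} f(p + t v)\big|_{t=0} = \nabla f(p)^T v \le \|\nabla f(p)\|\,\|v\| = \|\nabla f(p)\|$.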
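Proposition 1 can also be checked by brute force: sample unit vectors v around the circle and see which one maximizes $\nabla f(p)^T v$. The gradient value below is hypothetical, and NumPy is assumed.

```python
import numpy as np

def directional_derivative(grad_p: np.ndarray, v: np.ndarray) -> float:
    """d/dt f(p + t v) at t = 0 equals grad f(p)^T v (chain rule)."""
    return float(grad_p @ v)

grad_p = np.array([3.0, 4.0])                      # hypothetical gradient at p, norm 5
angles = np.linspace(0.0, 2 * np.pi, 3600, endpoint=False)
units = np.stack([np.cos(angles), np.sin(angles)], axis=1)   # unit vectors v
values = np.array([directional_derivative(grad_p, v) for v in units])
best_v = units[np.argmax(values)]                  # close to grad_p / ||grad_p||
```

The largest sampled value is just below $\|\nabla f(p)\| = 5$, matching the Cauchy–Schwarz bound.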
Proposition 2: let f: Rn → R be a smooth (C1) function around p.
  • If f has a local minimum (maximum) at p, then $\nabla f(p) = 0$.

Intuitive: a necessary condition for a local min (max).

We found the best INFINITESIMAL DIRECTION at each point.
  • Looking for the minimum: the “blind man” procedure.
  • How can we derive the way to the minimum using this knowledge?
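The derivative of f: Rn → Rm is a function Df: Rn → Rm×n given by

$$Df(x) = \begin{bmatrix} \frac{\partial f_1}{\partial x_1}(x) & \cdots & \frac{\partial f_1}{\partial x_n}(x) \\ \vdots & \ddots & \vdots \\ \frac{\partial f_m}{\partial x_1}(x) & \cdots & \frac{\partial f_m}{\partial x_n}(x) \end{bmatrix},$$

called the Jacobian.

Note that for f: Rn → R, we have ∇f(x) = Df(x)^T.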

If the derivative of ∇f exists, we say that f is twice differentiable.
    • Write the second derivative as D2f (or F), and call it the Hessian of f: $D^2 f(x) = \left[\dfrac{\partial^2 f}{\partial x_i \partial x_j}(x)\right]_{i,j=1}^{n}$.
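A sketch of the Jacobian and Hessian by central differences (NumPy assumed; `jacobian` is a hypothetical helper and the quadratic is a made-up example). For $f(x) = \tfrac{1}{2}x^T Q x - b^T x$ with symmetric Q, the Hessian should come out equal to Q, and the 1 × n Jacobian of f should be the transpose of its gradient.

```python
import numpy as np

def jacobian(F, x, h: float = 1e-5) -> np.ndarray:
    """Central-difference Jacobian DF(x) of F: R^n -> R^m, returned as an m x n matrix."""
    x = np.asarray(x, dtype=float)
    cols = []
    for k in range(x.size):
        e = np.zeros_like(x)
        e[k] = h
        cols.append((np.asarray(F(x + e)) - np.asarray(F(x - e))) / (2 * h))
    return np.column_stack(cols)                   # column k holds dF/dx_k

# Hypothetical quadratic f(x) = 1/2 x^T Q x - b^T x with symmetric Q
Q = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
f = lambda x: 0.5 * x @ Q @ x - b @ x
grad = lambda x: Q @ x - b                         # exact gradient of f

x0 = np.array([0.5, -1.0])
Df = jacobian(lambda x: np.atleast_1d(f(x)), x0)   # 1 x n row, and grad f = Df^T
H = jacobian(grad, x0)                             # Hessian D^2 f, here equal to Q
```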
Fact: ∇f(x0) is orthogonal to the level set of f through x0.

Proof of fact:
    • Imagine a particle traveling along the level set.
    • Let g(t) be the position of the particle at time t, with g(0) = x0.
    • Note that f(g(t)) = constant for all t.
    • Velocity vector g′(t) is tangent to the level set.
    • Consider F(t) = f(g(t)). We have F′(0) = 0. By the chain rule, $F'(0) = \nabla f(g(0))^T g'(0) = \nabla f(x_0)^T g'(0) = 0$.
    • Hence, ∇f(x0) and g′(0) are orthogonal.
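A numerical version of the particle argument, assuming the hypothetical function $f(x) = x_1^2 + 4x_2^2$, whose level set f = 4 is traced by $g(t) = (2\cos t, \sin t)$ (NumPy assumed):

```python
import numpy as np

def f(x: np.ndarray) -> float:
    """Hypothetical function whose level sets are ellipses."""
    return float(x[0] ** 2 + 4.0 * x[1] ** 2)

def grad_f(x: np.ndarray) -> np.ndarray:
    return np.array([2.0 * x[0], 8.0 * x[1]])

def g(t: float) -> np.ndarray:
    """Position of the particle at time t; f(g(t)) = 4 for all t."""
    return np.array([2.0 * np.cos(t), np.sin(t)])

def g_prime(t: float) -> np.ndarray:
    """Velocity g'(t), tangent to the level set."""
    return np.array([-2.0 * np.sin(t), np.cos(t)])

ts = np.linspace(0.0, 2.0 * np.pi, 50)
# F'(t) = grad f(g(t))^T g'(t) should vanish for every t
dots = [float(grad_f(g(t)) @ g_prime(t)) for t in ts]
```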
Suppose f: R → R is in C1. Then, $f(x_0 + h) = f(x_0) + f'(x_0)\,h + o(h)$.
  • o(h) is a term such that o(h)/h → 0 as h → 0.
  • At x0, f can be approximated by a linear function, and the approximation gets better the closer we are to x0.
Suppose f: R → R is in C2. Then, $f(x_0 + h) = f(x_0) + f'(x_0)\,h + \tfrac{1}{2} f''(x_0)\,h^2 + o(h^2)$.
  • At x0, f can be approximated by a quadratic function.
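Suppose f: Rn → R.
    • If f ∈ C1, then $f(x) = f(x_0) + Df(x_0)(x - x_0) + o(\|x - x_0\|)$.
    • If f ∈ C2, then $f(x) = f(x_0) + Df(x_0)(x - x_0) + \tfrac{1}{2}(x - x_0)^T D^2 f(x_0)(x - x_0) + o(\|x - x_0\|^2)$.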
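How fast the remainder terms vanish can be seen on a simple case, assuming $f = \exp$ at $x_0 = 0$ (where $f = f' = f''$). The first-order error divided by h and the second-order error divided by $h^2$ both shrink as h → 0; the helper name is hypothetical.

```python
import math

def taylor_errors(f, df, d2f, x0: float, h: float):
    """Return (|first-order error| / h, |second-order error| / h^2) for step h."""
    exact = f(x0 + h)
    first = f(x0) + df(x0) * h                   # linear approximation
    second = first + 0.5 * d2f(x0) * h ** 2      # quadratic approximation
    return abs(exact - first) / h, abs(exact - second) / h ** 2

for h in (1e-1, 1e-2, 1e-3):
    e1, e2 = taylor_errors(math.exp, math.exp, math.exp, 0.0, h)
    print(f"h={h:g}  o(h)/h={e1:.2e}  o(h^2)/h^2={e2:.2e}")
```

For exp the ratios behave roughly like h/2 and h/6, so each tends to zero as required.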
We already know that ∇f(x0) is orthogonal to the level set at x0.
    • Suppose ∇f(x0) ≠ 0.
  • Fact: ∇f points in the direction of increasing f.
Consider xα = x0 + α∇f(x0), α > 0.
    • By Taylor's formula, $f(x_\alpha) = f(x_0) + \alpha \|\nabla f(x_0)\|^2 + o(\alpha)$.
  • Therefore, for sufficiently small α,

f(xα) > f(x0)

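This theorem (the Wolfe theorem) is the link from the previous gradient properties to the constructive algorithm.
  • The problem: $\min_{x \in \mathbb{R}^n} f(x)$

We introduce a model for the algorithm:


Step 0: set i = 0

Step 1: if $\nabla f(x_i) = 0$ stop,

else, compute search direction $h_i \in \mathbb{R}^n$

Step 2: compute the step-size $\lambda_i \in \arg\min_{\lambda \ge 0} f(x_i + \lambda h_i)$

Step 3: set $x_{i+1} = x_i + \lambda_i h_i$, go to step 1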
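A minimal Python sketch of the model algorithm, assuming NumPy. The exact minimization in Step 2 is replaced by a golden-section search on a fixed interval [0, 10], and the test ∇f(x_i) = 0 by a small tolerance; both are assumptions, as are the helper names. The caller supplies the rule that produces the search direction $h_i$.

```python
import numpy as np

def golden_section(phi, lo: float = 0.0, hi: float = 10.0, tol: float = 1e-10) -> float:
    """Minimize a 1-D unimodal function phi on [lo, hi] (stand-in for the exact line search)."""
    r = (np.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - r * (b - a), a + r * (b - a)
    while b - a > tol:
        if phi(c) < phi(d):              # minimum lies in [a, d]
            b, d = d, c
            c = b - r * (b - a)
        else:                            # minimum lies in [c, b]
            a, c = c, d
            d = a + r * (b - a)
    return 0.5 * (a + b)

def model_descent(f, grad, direction, x0, eps: float = 1e-8, max_iter: int = 1000):
    """Model descent algorithm; direction(x, g) returns the search vector h_i."""
    x = np.asarray(x0, dtype=float)      # Step 0
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= eps:     # Step 1: stop when the gradient (numerically) vanishes
            break
        h = direction(x, g)              # Step 1: search direction h_i
        lam = golden_section(lambda t: f(x + t * h))   # Step 2: step size lambda_i
        x = x + lam * h                  # Step 3: x_{i+1} = x_i + lambda_i h_i
    return x
```

For example, `direction=lambda x, g: -g` turns this into steepest descent.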

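The Theorem:
    • Suppose f: Rn → R is C1 smooth, and there exists a continuous function k: Rn → [0,1] such that $\nabla f(x) \neq 0 \Rightarrow k(x) > 0$,
    • and the search vectors constructed by the model algorithm satisfy $-\nabla f(x_i)^T h_i \ge k(x_i)\,\|\nabla f(x_i)\|\,\|h_i\|$.
  • Then
    • if $\{x_i\}_{i \ge 0}$ is the sequence constructed by the algorithm model,
    • then any accumulation point y of this sequence satisfies $\nabla f(y) = 0$.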
The theorem has a very intuitive interpretation:
  • Always go in a descent direction.

The principal differences between various descent algorithms lie in the first procedure, the one that determines successive search directions.

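We now use what we have learned to implement the most basic minimization technique.
  • First we introduce the algorithm, which is a version of the model algorithm.
  • The problem: $\min_{x \in \mathbb{R}^n} f(x)$

Steepest descent algorithm:


Step 0: set i = 0

Step 1: if $\nabla f(x_i) = 0$ stop,

else, compute search direction $h_i = -\nabla f(x_i)$

Step 2: compute the step-size $\lambda_i \in \arg\min_{\lambda \ge 0} f(x_i + \lambda h_i)$

Step 3: set $x_{i+1} = x_i + \lambda_i h_i$, go to step 1

    • If $\{x_i\}_{i \ge 0}$ is a sequence constructed by the SD algorithm, then every accumulation point y of the sequence satisfies $\nabla f(y) = 0$.
    • Proof: from the Wolfe theorem with k(x) ≡ 1, since $-\nabla f(x_i)^T h_i = \|\nabla f(x_i)\|\,\|h_i\|$.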

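Remark: the Wolfe theorem gives us numerical stability when the derivatives are not given analytically (i.e., they are calculated numerically): an approximate search direction only has to satisfy the angle condition with some k(x) > 0, not coincide exactly with −∇f.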
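The angle condition in the Wolfe theorem is easy to evaluate numerically. In this sketch (NumPy assumed; the gradient and the error vector are hypothetical values, and `angle_factor` is a made-up helper) the exact steepest-descent direction gives k = 1, and a slightly perturbed direction, such as one built from a finite-difference gradient, still gives k close to 1, so it remains a descent direction.

```python
import numpy as np

def angle_factor(grad_x: np.ndarray, h: np.ndarray) -> float:
    """Largest k with -grad^T h >= k * ||grad|| * ||h|| at this point (cosine of the angle)."""
    return float(-grad_x @ h / (np.linalg.norm(grad_x) * np.linalg.norm(h)))

g = np.array([2.0, -1.0])                  # hypothetical exact gradient at x_i
k_sd = angle_factor(g, -g)                 # steepest descent: k = 1
noise = np.array([1e-3, 2e-3])             # hypothetical numerical-differentiation error
k_approx = angle_factor(g, -(g + noise))   # slightly below 1, still a descent direction
```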

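How long a step to take?

Note that the search direction is $d_k = -\nabla f(x_k)$, so $x_{k+1} = x_k - \lambda \nabla f(x_k)$.

    • We are limited to a line search.
  • Choose λ to minimize f along this line,

i.e., stop where the directional derivative is equal to zero.

    • From the chain rule: $\dfrac{d}{d\lambda} f(x_{k+1}) = \nabla f(x_{k+1})^T \dfrac{d x_{k+1}}{d\lambda} = -\nabla f(x_{k+1})^T \nabla f(x_k) = 0$.
  • Therefore the method of steepest descent looks like this: [Figure: zigzag path of the steepest-descent iterates]

Successive gradients $\nabla f(x_{k+1})$ and $\nabla f(x_k)$ are orthogonal!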
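For a quadratic $f(x) = \tfrac{1}{2}x^T A x - b^T x$ with A symmetric positive definite, the line search has a closed form: setting the derivative above to zero with $g_k = \nabla f(x_k) = Ax_k - b$ gives $\lambda_k = \dfrac{g_k^T g_k}{g_k^T A g_k}$. The sketch below (NumPy, hypothetical 2 × 2 example and helper name) uses this step and records the gradients so their orthogonality can be checked.

```python
import numpy as np

def steepest_descent_quadratic(A, b, x0, iters: int = 50):
    """Steepest descent on f(x) = 1/2 x^T A x - b^T x with the exact step
    lambda_k = (g^T g) / (g^T A g), where g = A x_k - b."""
    x = np.asarray(x0, dtype=float)
    grads = []
    for _ in range(iters):
        g = A @ x - b
        if np.linalg.norm(g) < 1e-14:    # gradient vanished (numerically)
            break
        grads.append(g)
        lam = (g @ g) / (g @ A @ g)      # exact line-search step
        x = x - lam * g
    return x, grads

A = np.array([[3.0, 2.0], [2.0, 6.0]])  # hypothetical SPD matrix
b = np.array([2.0, -8.0])                # A x = b has solution [2, -2]
x, grads = steepest_descent_quadratic(A, b, [-2.0, -2.0])
# consecutive gradients are orthogonal: grads[k + 1] @ grads[k] is about 0
```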



Example: find the minimum when x1 is allowed to vary from 0.5 to 1.5 and x2 is allowed to vary from 0 to 2.

The same example with λ arbitrary (a fixed step size instead of a line search).

From now on we assume we want to minimize the quadratic function $f(x) = \tfrac{1}{2} x^T A x - b^T x + c$.
  • This is equivalent to solving the linear problem $Ax = b$

if A is symmetric (and positive definite), since then $\nabla f(x) = Ax - b$.

Each ellipsoid has constant f(x).

In general, the solution x lies at the intersection point

of n hyperplanes, each having dimension n − 1.
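A small check of this equivalence, assuming NumPy and a made-up symmetric positive definite A: the solution of Ax = b makes the gradient Ax − b vanish, and moving away from it by any p raises f by exactly $\tfrac{1}{2}p^T A p > 0$.

```python
import numpy as np

A = np.array([[4.0, 1.0], [1.0, 3.0]])    # hypothetical symmetric positive definite A
b = np.array([1.0, 2.0])
c = 0.5
f = lambda x: 0.5 * x @ A @ x - b @ x + c

x_star = np.linalg.solve(A, b)            # solution of the linear system A x = b
grad_at_star = A @ x_star - b             # gradient of f vanishes there

# f(x_star + p) - f(x_star) = 1/2 p^T A p > 0 for every p != 0
rng = np.random.default_rng(1)
increases = [f(x_star + p) - f(x_star) for p in rng.standard_normal((10, 2))]
```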

What is the problem with steepest descent?
    • We can repeat the same directions over and over…
  • Wouldn’t it be better if, every time we took a step, we got it right the first time?
  • Conjugate gradient requires n gradient evaluations and n line searches.


  • First, let’s define the error as $e_i = x_i - x$.
  • $e_i$ is a vector that indicates how far we are from the solution.

[Figure: start point]

  • Let’s pick a set of orthogonal search directions $d_0, d_1, \dots, d_{n-1}$ (they should span Rn).
  • In each search direction, we’ll take exactly one step, $x_{i+1} = x_i + \alpha_i d_i$,

and that step will be just the right length to line up evenly with x.

Using the coordinate axes as search directions…
  • Unfortunately, this method only works if you already know the answer.
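Given $x_i$ and $d_i$, how do we calculate $\alpha_i$?
  • $e_{i+1}$ should be orthogonal to $d_i$: $d_i^T e_{i+1} = d_i^T (e_i + \alpha_i d_i) = 0$, so $\alpha_i = -\dfrac{d_i^T e_i}{d_i^T d_i}$.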
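The sketch below follows this recipe literally (NumPy assumed, helper name and example values hypothetical). Because $\alpha_i$ uses $e_i$, it needs the true solution as input, which illustrates the catch: the method reaches the answer in n steps only because it already knows it.

```python
import numpy as np

def orthogonal_directions_solve(x_true, x0, D):
    """One step per orthogonal direction d_i (columns of D), with
    alpha_i = -(d_i^T e_i) / (d_i^T d_i) and e_i = x_i - x_true."""
    x = np.asarray(x0, dtype=float)
    for d in D.T:
        e = x - x_true                   # requires knowing the answer
        alpha = -(d @ e) / (d @ d)
        x = x + alpha * d
    return x

x_true = np.array([2.0, -2.0])           # hypothetical known answer
D = np.eye(2)                            # coordinate axes as search directions
x_n = orthogonal_directions_solve(x_true, [-2.0, -2.0], D)   # reaches x_true in n = 2 steps
```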
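How do we find $\alpha_i$ without knowing $e_i$?
    • Since the search vectors form a basis, $e_0 = \sum_{j=0}^{n-1} \delta_j d_j$.

On the other hand, $e_i = e_0 + \sum_{k=0}^{i-1} \alpha_k d_k$.

So we look for $\alpha_k$ such that $\alpha_k = -\delta_k$, which makes $e_n = 0$.
    • Simple calculation shows that if we take A-orthogonal (conjugate) search directions, $d_i^T A d_j = 0$ for $i \neq j$, then $\delta_j = \dfrac{d_j^T A e_0}{d_j^T A d_j}$.

The correct choice is $\alpha_i = -\dfrac{d_i^T A e_i}{d_i^T A d_i} = \dfrac{d_i^T r_i}{d_i^T A d_i}$, where $r_i = b - A x_i = -A e_i$ is the residual.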
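Conjugate search directions can be built from any linearly independent set by Gram-Schmidt conjugation. The sketch below (NumPy assumed; the helper name, the coordinate axes as seed vectors, and the example system are assumptions) does this and then takes one step per direction with $\alpha_i = d_i^T r_i / (d_i^T A d_i)$, which needs only the residual, not the solution.

```python
import numpy as np

def conjugate_directions_solve(A, b, x0):
    """Method of conjugate directions with Gram-Schmidt conjugation of the
    coordinate axes; one step per direction, alpha_i = d_i^T r_i / (d_i^T A d_i)."""
    n = len(b)
    dirs = []
    for u in np.eye(n):                  # linearly independent seed vectors
        d = u.copy()
        for dj in dirs:                  # remove the A-component along each earlier d_j
            d -= (u @ A @ dj) / (dj @ A @ dj) * dj
        dirs.append(d)
    x = np.asarray(x0, dtype=float)
    for d in dirs:
        r = b - A @ x                    # residual r_i = -A e_i, computable without x_true
        alpha = (d @ r) / (d @ A @ d)
        x = x + alpha * d
    return x, dirs

A = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]])   # hypothetical SPD A
b = np.array([1.0, 2.0, 3.0])
x, dirs = conjugate_directions_solve(A, b, np.zeros(3))
```

Storing every earlier direction makes this O(n^3) work; conjugate gradient avoids that cost by generating each new direction from the current residual.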



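Step 0: $d_0 = r_0 = b - A x_0$

Step 1: $\alpha_i = \dfrac{r_i^T r_i}{d_i^T A d_i}$

Step 2: $x_{i+1} = x_i + \alpha_i d_i$

Step 3: $r_{i+1} = r_i - \alpha_i A d_i$

Step 4: $\beta_{i+1} = \dfrac{r_{i+1}^T r_{i+1}}{r_i^T r_i}$, $d_{i+1} = r_{i+1} + \beta_{i+1} d_i$, and repeat n times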
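A minimal NumPy sketch of these steps, assuming A is symmetric positive definite; the early-exit tolerance and the function name are added assumptions. In exact arithmetic the n iterations reach the solution; in floating point the result agrees closely with `numpy.linalg.solve(A, b)` for small, well-conditioned systems.

```python
import numpy as np

def conjugate_gradient(A, b, x0, tol: float = 1e-12):
    """Conjugate gradient for A x = b (A symmetric positive definite)."""
    b = np.asarray(b, dtype=float)
    x = np.asarray(x0, dtype=float)
    r = b - A @ x                         # Step 0: r_0 = b - A x_0
    d = r.copy()                          #         d_0 = r_0
    rs_old = r @ r
    for _ in range(len(b)):               # repeat n times
        if np.sqrt(rs_old) < tol:         # residual already negligible
            break
        Ad = A @ d
        alpha = rs_old / (d @ Ad)         # Step 1
        x = x + alpha * d                 # Step 2
        r = r - alpha * Ad                # Step 3
        rs_new = r @ r
        d = r + (rs_new / rs_old) * d     # Step 4: beta_{i+1} = rs_new / rs_old
        rs_old = rs_new
    return x
```

Usage: `conjugate_gradient(np.array([[3.0, 2.0], [2.0, 6.0]]), [2.0, -8.0], [-2.0, -2.0])` returns approximately `[2.0, -2.0]` after two iterations.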

  • Conjugate gradient algorithm for minimizing f: