Computacion Inteligente

Derivative-Based Optimization

Contents
  • Optimization problems
  • Mathematical background
  • Descent Methods
  • The Method of Steepest Descent
  • Conjugate Gradient
  • Objective function – the mathematical function which is optimized by changing the values of the design variables.
  • Design Variables – those variables which we, as designers, can change.
  • Constraints – functions of the design variables which establish limits on individual variables or combinations of design variables.
3 basic ingredients…
  • an objective function,
  • a set of decision variables,
  • a set of equality/inequality constraints.

The problem is

to search for the values of the decision variables that minimize the objective function while satisfying the constraints…

Objective: minimize f(x)

Decision vector: x = (x1, …, xn)ᵀ

Bounds: xl ≤ x ≤ xu

Constraints: equality h(x) = 0, inequality g(x) ≤ 0

  • Design Variables: decision and objective vector
  • Constraints: equality and inequality
  • Bounds: feasible ranges for variables
  • Objective Function: maximization can be converted to minimization due to the duality principle
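As a hedged sketch of these ingredients in code (the problem data below is invented for illustration, not taken from the slides): an objective, a decision vector, one equality constraint, and bounds.

```python
import numpy as np

# Hypothetical example problem (not from the slides):
#   minimize f(x) = (x1 - 1)^2 + (x2 - 2)^2
#   subject to the equality constraint x1 + x2 = 2
#   and bounds 0 <= x1, x2 <= 3.

def objective(x):
    return (x[0] - 1.0) ** 2 + (x[1] - 2.0) ** 2

def equality_constraint(x):
    return x[0] + x[1] - 2.0   # feasible when this equals 0

bounds = [(0.0, 3.0), (0.0, 3.0)]

def is_feasible(x, tol=1e-9):
    in_bounds = all(lo <= xi <= hi for xi, (lo, hi) in zip(x, bounds))
    return in_bounds and abs(equality_constraint(x)) < tol

x = np.array([0.5, 1.5])
print(objective(x), is_feasible(x))   # 0.5 True
```

Maximization fits the same mold: maximizing f is minimizing −f.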
  • Identify the quantity or function, f, to be optimized.
  • Identify the design variables: x1, x2, x3, …, xn.
  • Identify the constraints, if any exist:

a. Equalities

b. Inequalities

  • Adjust the design variables (x’s) until f is optimized and all of the constraints are satisfied.
Objective functions may be unimodal or multimodal.
    • Unimodal – only one optimum
    • Multimodal – more than one optimum
  • Most search schemes are based on the assumption of a unimodal surface. The optimum determined in such cases is called a local optimum design.
  • The global optimum is the best of all local optimum designs.
Existence of a global minimum
  • If f(x) is continuous on the feasible set S, which is closed and bounded, then f(x) has a global minimum in S.
    • A set S is closed if it contains all its boundary points.
    • A set S is bounded if it is contained in the interior of some sufficiently large ball.

compact = closed and bounded

[Figure: a surface exhibiting a saddle point and a local max]

Derivative-based optimization (gradient based)
    • Capable of determining “search directions” according to an objective function’s derivative information
      • steepest descent method;
      • Newton’s method; Newton-Raphson method;
      • Conjugate gradient, etc.
  • Derivative-free optimization
      • random search method;
      • genetic algorithm;
      • simulated annealing; etc.
The scalar xᵀMx is called a quadratic form.

  • A square matrix M is positive definite if xᵀMx > 0 for all x ≠ 0.
  • It is positive semidefinite if xᵀMx ≥ 0 for all x.
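A minimal numeric check of the definition, assuming NumPy (the example matrix is invented for illustration): a symmetric matrix is positive definite exactly when all its eigenvalues are positive.

```python
import numpy as np

# Check positive definiteness of a symmetric matrix via its eigenvalues
# (illustrative helper, not from the slides).
def is_positive_definite(M, tol=1e-12):
    M = np.asarray(M, dtype=float)
    eigenvalues = np.linalg.eigvalsh(M)   # eigenvalues of a symmetric matrix
    return bool(np.all(eigenvalues > tol))

M = np.array([[2.0, 1.0],
              [1.0, 2.0]])    # eigenvalues 1 and 3 -> positive definite
print(is_positive_definite(M))    # True
print(is_positive_definite(-M))   # False

# Equivalently, x^T M x > 0 for a random nonzero x:
rng = np.random.default_rng(0)
x = rng.standard_normal(2)
print(x @ M @ x > 0)              # True
```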

A symmetric matrix M = Mᵀ is positive definite if and only if its eigenvalues λi > 0 (semidefinite ↔ λi ≥ 0).
    • Proof (→): Let vi be the eigenvector for the i-th eigenvalue λi, so Mvi = λivi.
    • Then viᵀMvi = λiviᵀvi = λi‖vi‖² > 0,
    • which implies λi > 0.

Exercise (←): prove that positive eigenvalues imply positive definiteness.

Theorem: If a matrix M = UᵀU (with Ux ≠ 0 for all x ≠ 0), then it is positive definite.

Proof. Let f be defined as f = xᵀMx = xᵀUᵀUx.

  • If we can show that f is always positive, then M must be positive definite. We can write this as f = (Ux)ᵀ(Ux).
  • Provided that Ux always gives a nonzero vector for all values of x except x = 0, we can write b = Ux, i.e. f = bᵀb = ‖b‖² > 0,
  • so f must always be positive.
It is not necessary for Q to be symmetric.
    • Suppose the matrix P is non-symmetric. Then xᵀPx = xᵀQx, where Q = (P + Pᵀ)/2

Q is symmetric.
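This symmetrization can be checked numerically; a minimal sketch, with P invented for illustration:

```python
import numpy as np

# Any quadratic form through a non-symmetric P equals the form through
# its symmetric part Q = (P + P^T)/2, since x^T P^T x = x^T P x (a scalar).
P = np.array([[1.0, 4.0],
              [0.0, 3.0]])    # non-symmetric example matrix
Q = 0.5 * (P + P.T)           # symmetric part

rng = np.random.default_rng(1)
for _ in range(3):
    x = rng.standard_normal(2)
    assert np.isclose(x @ P @ x, x @ Q @ x)
print("quadratic forms agree")
```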

Given the quadratic function

f(x) = (1/2)xᵀQx − bᵀx + c

If Q is positive definite, then f is a parabolic “bowl.”

Two other shapes can result from the quadratic form.
    • If Q is negative definite, then f is a parabolic “bowl” upside down.
    • If Q is indefinite then f describes a saddle.
Quadratics are useful in the study of optimization.
    • Often, objective functions are “close to” quadratic near the solution.
    • It is easier to analyze the behavior of algorithms when applied to quadratics.
    • Analysis of algorithms for quadratics gives insight into their behavior in general.
The derivative of f: R → R is a function f′: R → R given by

f′(x) = lim(h→0) [f(x + h) − f(x)] / h

  • if the limit exists.
Definition: A real-valued function f: Rn → R is said to be continuously differentiable if the partial derivatives ∂f/∂x1, …, ∂f/∂xn
  • exist for each x in Rn and are continuous functions of x.
  • In this case, we say f ∈ C1 (a smooth function of class C1).
Definition: The gradient of f: R2 → R is a function ∇f: R2 → R2 given by

∇f = (∂f/∂x1, ∂f/∂x2)ᵀ

in the plane.
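As a hedged illustration (the helper and the test function below are invented, not from the slides), partial derivatives can be approximated by central finite differences, which is a handy sanity check for hand-computed gradients:

```python
import numpy as np

# Central finite-difference approximation of the gradient
# (standard numerical check, not from the slides).
def numerical_gradient(f, x, h=1e-6):
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        grad[i] = (f(x + e) - f(x - e)) / (2.0 * h)
    return grad

# Example: f(x) = x1^2 + 3*x2, with exact gradient (2*x1, 3).
f = lambda x: x[0] ** 2 + 3.0 * x[1]
print(numerical_gradient(f, [2.0, 5.0]))   # approximately [4. 3.]
```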

Proposition 1: The directional derivative ∇f(x)ᵀv, over unit vectors ‖v‖ = 1,

is maximal choosing v = ∇f(x)/‖∇f(x)‖.

Intuition: the gradient points in the direction of greatest change.

Prove it!

Proof:
    • Assign v = ∇f(x)/‖∇f(x)‖.
    • By the chain rule: the directional derivative is ∇f(x)ᵀv = ∇f(x)ᵀ∇f(x)/‖∇f(x)‖ = ‖∇f(x)‖.

Proof (continued):
    • On the other hand, for a general unit vector v, Cauchy–Schwarz gives ∇f(x)ᵀv ≤ ‖∇f(x)‖ ‖v‖ = ‖∇f(x)‖.
Proposition 2: Let f: Rn → R be a smooth function C1 around p.
  • If f has a local minimum (maximum) at p, then ∇f(p) = 0.

Intuition: this is a necessary condition for a local min (max).

We found the best infinitesimal direction at each point.
  • Looking for the minimum: a “blind man” procedure.
  • How can we derive the way to the minimum using this knowledge?
The derivative of f: Rn → Rm is a function Df: Rn → Rm×n given by

Df(x) = [∂fi/∂xj],

called the Jacobian.

Note that for f: Rn → R, we have ∇f(x) = Df(x)ᵀ.
If the derivative of ∇f exists, we say that f is twice differentiable.
    • Write the second derivative as D2f (or F), and call it the Hessian of f.
Proof of fact (the gradient is orthogonal to the level set):
    • Imagine a particle traveling along the level set.
    • Let g(t) be the position of the particle at time t, with g(0) = x0.
    • Note that f(g(t)) = constant for all t.
    • The velocity vector g′(t) is tangent to the level set.
    • Consider F(t) = f(g(t)). We have F′(0) = 0. By the chain rule, F′(0) = ∇f(x0)ᵀg′(0) = 0.
    • Hence, ∇f(x0) and g′(0) are orthogonal.
Suppose f: R → R is in C1. Then

f(x0 + h) = f(x0) + f′(x0)h + o(h),

  • where o(h) is a term such that o(h)/h → 0 as h → 0.
  • At x0, f can be approximated by a linear function, and the approximation gets better the closer we are to x0.
Suppose f: R → R is in C2. Then

f(x0 + h) = f(x0) + f′(x0)h + (1/2)f″(x0)h² + o(h²).

  • At x0, f can be approximated by a quadratic function.
Suppose f: Rn → R.
    • If f is in C1, then f(x0 + h) = f(x0) + ∇f(x0)ᵀh + o(‖h‖).
    • If f is in C2, then f(x0 + h) = f(x0) + ∇f(x0)ᵀh + (1/2)hᵀD2f(x0)h + o(‖h‖²).
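A quick numeric check of the Taylor error orders (the function f(x) = eˣ at x0 = 0 is invented for illustration): halving h should shrink the first-order error roughly like h² and the second-order error roughly like h³.

```python
import math

# First- and second-order Taylor approximations of f(x) = e^x at x0 = 0.
def f(x):
    return math.exp(x)

def taylor1(h):
    return 1.0 + h                  # f(0) + f'(0) h

def taylor2(h):
    return 1.0 + h + 0.5 * h * h    # ... + (1/2) f''(0) h^2

for h in (0.1, 0.01):
    e1 = abs(f(h) - taylor1(h))     # behaves like h^2 / 2
    e2 = abs(f(h) - taylor2(h))     # behaves like h^3 / 6
    print(h, e1, e2)
```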
We already know that ∇f(x0) is orthogonal to the level set at x0.
    • Suppose ∇f(x0) ≠ 0.
  • Fact: ∇f points in the direction of increasing f.
Consider xα = x0 + α∇f(x0), α > 0.
    • By Taylor's formula, f(xα) = f(x0) + α‖∇f(x0)‖² + o(α).
  • Therefore, for sufficiently small α,

f(xα) > f(x0)

This theorem is the link from the previous gradient properties to the constructive algorithm.
  • The problem: minimize f(x) over x ∈ Rn.

We introduce a model for the algorithm:

Data: an initial guess x0 ∈ Rn.

Step 0: set i = 0.

Step 1: if ∇f(xi) = 0, stop;

else, compute the search direction hi.

Step 2: compute the step size λi.

Step 3: set xi+1 = xi + λihi, i = i + 1; go to step 1.


The Theorem:
    • Suppose f: Rn → R is C1 smooth, and there exists a continuous function k: Rn → [0,1], and
    • the search vectors hi constructed by the model algorithm satisfy:
  • Then,
    • if (xi) is the sequence constructed by the algorithm model,
    • any accumulation point y of this sequence satisfies ∇f(y) = 0.
The theorem has a very intuitive interpretation:
  • Always go in a descent direction.

The principal differences between various descent algorithms lie in the first procedure, the one that determines successive search directions.

We now use what we have learned to implement the most basic minimization technique.
  • First we introduce the algorithm, which is a version of the model algorithm.
  • The problem: minimize f(x) over x ∈ Rn.

Steepest descent algorithm:

Data: an initial guess x0 ∈ Rn.

Step 0: set i = 0.

Step 1: if ∇f(xi) = 0, stop;

else, compute the search direction hi = −∇f(xi).

Step 2: compute the step size λi that minimizes f(xi + λhi).

Step 3: set xi+1 = xi + λihi, i = i + 1; go to step 1.


Theorem:
    • If (xi) is a sequence constructed by the SD algorithm, then every accumulation point y of the sequence satisfies ∇f(y) = 0.
    • Proof: from Wolfe's theorem.

Remark: Wolfe's theorem gives us numerical stability even if the derivatives aren't given analytically (they are calculated numerically).

How long a step to take?

Note the search direction is hi = −∇f(xi).

    • We are limited to a line search.
  • Choose λ to minimize f along the line…

…there the directional derivative is equal to zero.

How long a step to take?
    • From the chain rule: d/dλ f(xi + λhi) = ∇f(xi + λhi)ᵀhi = 0.
  • Therefore in the method of steepest descent, each new gradient ∇f(xi+1) is orthogonal to the previous search direction.

They are orthogonal!
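This orthogonality can be observed numerically. Below is a minimal sketch of steepest descent with exact line search on a quadratic (the matrix Q and vector b are invented for illustration; for f(x) = (1/2)xᵀQx − bᵀx the gradient is Qx − b and the exact step along −g is λ = gᵀg / gᵀQg):

```python
import numpy as np

# Steepest descent with exact line search on a quadratic
# f(x) = 1/2 x^T Q x - b^T x (illustrative data, not from the slides).
Q = np.array([[3.0, 1.0],
              [1.0, 2.0]])    # symmetric positive definite
b = np.array([1.0, 1.0])

x = np.zeros(2)
prev_g = None
for _ in range(100):
    g = Q @ x - b                   # gradient at the current point
    if np.linalg.norm(g) < 1e-12:
        break
    if prev_g is not None:          # successive gradients are orthogonal
        assert abs(g @ prev_g) < 1e-10
    lam = (g @ g) / (g @ Q @ g)     # exact minimizer along -g
    x = x - lam * g
    prev_g = g

print(x, np.linalg.solve(Q, b))     # both approximate the minimizer
```

The zig-zag of mutually orthogonal steps is exactly the inefficiency that conjugate gradient, discussed below, removes.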


Given:

Find the minimum when x1 is allowed to vary from 0.5 to 1.5 and x2 is allowed to vary from 0 to 2.

λ arbitrary


From now on we assume we want to minimize the quadratic function

f(x) = (1/2)xᵀAx − bᵀx + c

  • This is equivalent to solving the linear problem Ax = b,

if A is symmetric (and positive definite).
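A quick check of this equivalence, with A and b invented for illustration: the gradient of f is Ax − b, so the solution of the linear system has zero gradient and is therefore the minimizer.

```python
import numpy as np

# For f(x) = 1/2 x^T A x - b^T x with A symmetric positive definite,
# the gradient is A x - b, so the minimizer solves A x = b.
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0])

x_star = np.linalg.solve(A, b)   # solve the linear problem
grad = A @ x_star - b            # gradient of f at x_star
print(np.linalg.norm(grad))      # approximately 0
```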

Each ellipsoid is a level set of constant f(x).

In general, the solution x lies at the intersection point

of n hyperplanes, each having dimension n − 1.

What is the problem with steepest descent?
    • We can repeat the same directions over and over…
  • Wouldn't it be better if, every time we took a step, we got it right the first time?
  • Conjugate gradient requires n gradient evaluations and n line searches.

Solution:

  • First, let's define the error as ei = xi − x, where x is the exact solution.
  • ei is a vector that indicates how far we are from the solution (x0 being the start point).


  • Let's pick a set of orthogonal search directions d0, d1, …, dn−1 (they should span Rn).
  • In each search direction, we'll take exactly one step,

and that step will be just the right length to line up evenly with the solution.
Using the coordinate axes as search directions…
  • Unfortunately, this method only works if you already know the answer.
Given di, how do we calculate the step length αi?
  • ei+1 should be orthogonal to di: diᵀei+1 = 0.
How do we find e0?
    • Since the search vectors form a basis, we can expand e0 in terms of them.

On the other hand…
So we look for search directions di such that the step sizes become computable.
    • A simple calculation shows that if we take the directions to be A-orthogonal (conjugate), diᵀAdj = 0 for i ≠ j,

the correct choice is αi = (diᵀri)/(diᵀAdi), where ri = b − Axi is the residual.

Conjugate gradient algorithm for minimizing f:

Data: an initial guess x0 ∈ Rn.

Step 0: set d0 = r0 = b − Ax0, i = 0.

Step 1: αi = (riᵀri)/(diᵀAdi).

Step 2: xi+1 = xi + αidi, ri+1 = ri − αiAdi.

Step 3: βi+1 = (ri+1ᵀri+1)/(riᵀri), di+1 = ri+1 + βi+1di.

Step 4: set i = i + 1 and repeat n times.
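The steps above can be sketched in NumPy (a textbook conjugate gradient in the style of Shewchuk's notes, cited in the Sources; the matrix data is invented for illustration):

```python
import numpy as np

# Conjugate gradient for minimizing f(x) = 1/2 x^T A x - b^T x,
# i.e. for solving A x = b with A symmetric positive definite.
def conjugate_gradient(A, b, x0=None, tol=1e-10):
    n = b.size
    x = np.zeros(n) if x0 is None else x0.astype(float)
    r = b - A @ x                    # residual = -gradient at x
    d = r.copy()                     # first search direction
    for _ in range(n):               # at most n steps in exact arithmetic
        if np.linalg.norm(r) < tol:
            break
        Ad = A @ d
        alpha = (r @ r) / (d @ Ad)   # exact step along d
        x = x + alpha * d
        r_new = r - alpha * Ad
        beta = (r_new @ r_new) / (r @ r)
        d = r_new + beta * d         # next A-conjugate direction
        r = r_new
    return x

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 1.0])
x = conjugate_gradient(A, b)         # converges in at most n = 2 steps
print(x, np.linalg.solve(A, b))
```

Each direction is A-orthogonal to all previous ones, which is why n line searches suffice.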
Sources
  • J-Shing Roger Jang, Chuen-Tsai Sun and Eiji Mizutani, Slides for Ch. 5 of “Neuro-Fuzzy and Soft Computing: A Computational Approach to Learning and Machine Intelligence”, First Edition, Prentice Hall, 1997.
  • Djamel Bouchaffra. Soft Computing. Course materials. Oakland University. Fall 2005
  • Lucidi delle lezioni, Soft Computing. Materiale Didattico. Dipartimento di Elettronica e Informazione. Politecnico di Milano. 2004
  • Jeen-Shing Wang, Course: Introduction to Neural Networks. Lecture notes. Department of Electrical Engineering. National Cheng Kung University. Fall, 2005
  • Carlo Tomasi, Mathematical Methods for Robotics and Vision. Stanford University. Fall 2000
  • Petros Ioannou, Jing Sun, Robust Adaptive Control. Prentice-Hall, Inc, Upper Saddle River: NJ, 1996
  • Jonathan Richard Shewchuk, An Introduction to the Conjugate Gradient Method Without the Agonizing Pain. Edition 11/4. School of Computer Science. Carnegie Mellon University. Pittsburgh. August 4, 1994
  • Gordon C. Everstine, Selected Topics in Linear Algebra. The George Washington University. 8 June 2004