# Computacion Inteligente - PowerPoint PPT Presentation

Computacion Inteligente. Derivative-Based Optimization. Contents. Optimization problems Mathematical background Descent Methods The Method of Steepest Descent Conjugate Gradient. OPTIMIZATION PROBLEMS.


## PowerPoint Slideshow about 'Computacion Inteligente' - junior

Presentation Transcript

### Computacion Inteligente

Derivative-Based Optimization

• Optimization problems

• Mathematical background

• Descent Methods

• The Method of Steepest Descent

• Conjugate Gradient

OPTIMIZATION PROBLEMS

• Objective function – mathematical function which is optimized by changing the values of the design variables.

• Design Variables – Those variables which we, as designers, can change.

• Constraints – Functions of the design variables which establish limits in individual variables or combinations of design variables.

• an objective function,

• a set of decision variables,

• a set of equality/inequality constraints.

The problem is to search for the values of the decision variables that minimize the objective function while satisfying the constraints:

$\min_x f(x)$, with decision vector $x = (x_1, \dots, x_n)^T$, bounds $x^L \le x \le x^U$, and constraints $g(x) \le 0$, $h(x) = 0$.

• Design Variables: decision and objective vector

• Constraints: equality and inequality

• Bounds: feasible ranges for variables

• Objective Function: maximization can be converted to minimization by the duality principle, since $\max f(x) = -\min(-f(x))$
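The conversion in the last bullet rests on the identity max f(x) = −min(−f(x)). A minimal sketch, assuming a hypothetical helper `argmax_via_min` and a made-up one-dimensional objective evaluated on a grid (neither comes from the slides):

```python
def argmax_via_min(f, candidates):
    """Maximize f over the candidates by minimizing -f (hypothetical helper)."""
    def neg_f(x):
        return -f(x)
    x_best = min(candidates, key=neg_f)   # minimizer of -f is a maximizer of f
    return x_best, f(x_best)

def f(x):
    # Illustrative objective with its maximum at x = 2.
    return -(x - 2.0) ** 2 + 3.0

grid = [i / 100.0 for i in range(401)]    # 0.00, 0.01, ..., 4.00
x_star, f_star = argmax_via_min(f, grid)
```

Any minimization routine can therefore be reused for maximization by negating the objective.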

• Identify the quantity or function, f, to be optimized.

• Identify the design variables: x1, x2, x3, …,xn.

• Identify the constraints if any exist

a. Equalities

b. Inequalities

• Adjust the design variables (x’s) until f is optimized and all of the constraints are satisfied.

• Objective functions may be unimodal or multimodal.

• Unimodal – only one optimum

• Multimodal – more than one optimum

• Most search schemes are based on the assumption of a unimodal surface. The optimum determined in such cases is called a local optimum design.

• The global optimum is the best of all local optimum designs.
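To illustrate the difference between local and global optima, here is a small sketch that runs a fixed-step gradient descent from two starting points on a multimodal function. The function, the step size, and the helper name `local_descent` are illustrative assumptions. The "global" result is only the best of the local minima that were actually found.

```python
def f(x):
    # Illustrative multimodal function with two local minima.
    return x ** 4 - 3.0 * x ** 2 + x

def df(x):
    return 4.0 * x ** 3 - 6.0 * x + 1.0

def local_descent(x, step=0.01, iters=5000):
    """Fixed-step gradient descent; converges to a nearby local minimum."""
    for _ in range(iters):
        x -= step * df(x)
    return x

starts = [-2.0, 2.0]
local_minima = [local_descent(x0) for x0 in starts]
best_found = min(local_minima, key=f)     # best of the local optima found
```

Each start ends in a different valley, so a single run of a local search scheme cannot tell whether its answer is global.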

• Existence of global minimum

• If f(x) is continuous on the feasible set S which is closed and bounded, then f(x) has a global minimum in S

• A set S is closed if it contains all its boundary points.

• A set S is bounded if it is contained in the interior of some circle (in $R^n$, some ball of finite radius).

compact = closed and bounded

[Figure: plot over the axes x1 and x2, with a local max marked]

• Derivative-based optimization: capable of determining “search directions” according to an objective function’s derivative information

• steepest descent method;

• Newton’s method; Newton-Raphson method;

• Derivative-free optimization

• random search method;

• genetic algorithm;

• simulated annealing; etc.

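The scalar $x^T M x = \sum_{i=1}^{n} \sum_{j=1}^{n} m_{ij} x_i x_j$ is called a quadratic form.

• A square matrix M is positive definite if $x^T M x > 0$ for all x ≠ 0.

• It is positive semidefinite if $x^T M x \ge 0$ for all x.

• A symmetric matrix $M = M^T$ is positive definite if and only if its eigenvalues satisfy $\lambda_i > 0$ (semidefinite ↔ $\lambda_i \ge 0$).

• Proof (→): Let $v_i$ be the eigenvector for the i-th eigenvalue $\lambda_i$.

• Then $0 < v_i^T M v_i = \lambda_i v_i^T v_i = \lambda_i \|v_i\|^2$,

• which implies $\lambda_i > 0$.

• Exercise (←): prove that positive eigenvalues imply positive definiteness.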
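As a small numerical illustration of the eigenvalue criterion, the sketch below handles only the 2×2 symmetric case, where the eigenvalues have a closed form. The function names are hypothetical.

```python
import math

def eigenvalues_sym2(a, b, d):
    """Eigenvalues of the symmetric matrix [[a, b], [b, d]], smallest first."""
    mean = (a + d) / 2.0
    radius = math.hypot((a - d) / 2.0, b)
    return mean - radius, mean + radius

def is_positive_definite_sym2(a, b, d):
    """Positive definite iff the smallest eigenvalue is strictly positive."""
    smallest, _ = eigenvalues_sym2(a, b, d)
    return smallest > 0.0

def quadratic_form_sym2(a, b, d, x):
    """x^T M x for M = [[a, b], [b, d]]."""
    return a * x[0] ** 2 + 2.0 * b * x[0] * x[1] + d * x[1] ** 2

pd_example = is_positive_definite_sym2(2.0, 1.0, 2.0)       # eigenvalues 1 and 3
indefinite = is_positive_definite_sym2(1.0, 2.0, 1.0)       # eigenvalues -1 and 3
witness = quadratic_form_sym2(1.0, 2.0, 1.0, (1.0, -1.0))   # a negative value of x^T M x
```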

• Theorem: If a matrix $M = U^T U$, where $Ux \ne 0$ for every x ≠ 0, then M is positive definite.

• Proof. Let f be defined as $f = x^T M x$.

• If we can show that f is always positive, then M must be positive definite. We can write this as $f = x^T U^T U x = (Ux)^T (Ux)$.

• Provided that Ux gives a nonzero vector for all values of x except x = 0, we can write b = Ux, i.e. $f = b^T b = \sum_i b_i^2$,

• so f must always be positive.
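One practical way to look for such a factor U is a Cholesky factorization. The sketch below is a plain-Python illustration (the name `cholesky_upper` is hypothetical, and the slides do not prescribe this method). It returns an upper-triangular U with M = UᵀU, or `None` when a pivot is not positive.

```python
import math

def cholesky_upper(M):
    """Return upper-triangular U with M = U^T U, or None if a pivot is <= 0."""
    n = len(M)
    U = [[0.0] * n for _ in range(n)]
    for i in range(n):
        pivot = M[i][i] - sum(U[k][i] ** 2 for k in range(i))
        if pivot <= 0.0:
            return None                    # M is not positive definite
        U[i][i] = math.sqrt(pivot)
        for j in range(i + 1, n):
            partial = sum(U[k][i] * U[k][j] for k in range(i))
            U[i][j] = (M[i][j] - partial) / U[i][i]
    return U

def gram(U):
    """Compute U^T U, to check the factorization."""
    n = len(U)
    return [[sum(U[k][i] * U[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

M = [[4.0, 2.0], [2.0, 3.0]]
U = cholesky_upper(M)
not_pd = cholesky_upper([[1.0, 2.0], [2.0, 1.0]])
```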

• f: $R^n \to R$ is a quadratic function if $f(x) = \tfrac{1}{2} x^T Q x - b^T x + c$,

• where Q is symmetric.

If Q is positive definite, then f is a parabolic “bowl.”

• Quadratics are useful in the study of optimization.

• Often, objective functions are “close to” quadratic near the solution.

• It is easier to analyze the behavior of algorithms when applied to quadratics.

• Analysis of algorithms for quadratics gives insight into their behavior in general.

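• Definition: A real-valued function f: $R^n \to R$ is said to be continuously differentiable if the partial derivatives $\partial f / \partial x_1, \dots, \partial f / \partial x_n$

• exist for each x in $R^n$ and are continuous functions of x.

• In this case, we say $f \in C^1$ (a smooth function of class $C^1$).

• Definition: The gradient of f: $R^2 \to R$ (in the plane) is a function $\nabla f: R^2 \to R^2$ given by $\nabla f(x_1, x_2) = (\partial f / \partial x_1, \ \partial f / \partial x_2)^T$.

• Definition: The gradient of f: $R^n \to R$ is a function $\nabla f: R^n \to R^n$ given by $\nabla f(x) = (\partial f / \partial x_1, \dots, \partial f / \partial x_n)^T$.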

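Intuition: the gradient points in the direction of greatest change. Prove it!

• Proof: let v be a unit vector, write $\frac{\partial f}{\partial v}(p)$ for the directional derivative of f at p along v, and assume $\nabla f(p) \ne 0$.

• Assign: $v = \nabla f(p) / \|\nabla f(p)\|$.

• By the chain rule: $\frac{\partial f}{\partial v}(p) = \langle \nabla f(p), v \rangle = \|\nabla f(p)\|$.

• On the other hand, for a general unit vector v: $\frac{\partial f}{\partial v}(p) = \langle \nabla f(p), v \rangle \le \|\nabla f(p)\| \, \|v\| = \|\nabla f(p)\|$.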
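The claim can be checked numerically. The sketch below assumes an illustrative function f(x, y) = x² + 3y² and samples unit directions one degree apart. No sampled direction should beat the normalized gradient.

```python
import math

def grad_f(x, y):
    """Analytic gradient of the illustrative function f(x, y) = x**2 + 3*y**2."""
    return (2.0 * x, 6.0 * y)

def directional_derivative(p, v):
    """<grad f(p), v> for a unit vector v."""
    g = grad_f(*p)
    return g[0] * v[0] + g[1] * v[1]

p = (1.0, 1.0)
g = grad_f(*p)
g_norm = math.hypot(g[0], g[1])
g_unit = (g[0] / g_norm, g[1] / g_norm)

# Unit vectors at 0, 1, ..., 359 degrees.
directions = [(math.cos(math.radians(k)), math.sin(math.radians(k)))
              for k in range(360)]
rates = [directional_derivative(p, v) for v in directions]
best = directions[rates.index(max(rates))]   # sampled direction of fastest increase
```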

• Proposition 2: let f: $R^n \to R$ be a smooth function ($C^1$) around p.

• If f has a local minimum (maximum) at p, then $\nabla f(p) = 0$.

Intuition: this condition is necessary for a local minimum (maximum), but not sufficient.

• Proof: intuitive.

• The derivative of f: $R^n \to R^m$ is a function Df: $R^n \to R^{m \times n}$ given by $[Df(x)]_{ij} = \partial f_i / \partial x_j \, (x)$, called the Jacobian.

Note that for f: $R^n \to R$, we have $\nabla f(x) = Df(x)^T$.

• If the derivative of $\nabla f$ exists, we say that f is twice differentiable.

• Write the second derivative as $D^2 f$ (or F), and call it the Hessian of f.

• Fact: $\nabla f(x_0)$ is orthogonal to the level set at $x_0$.

• Proof of fact:

• Imagine a particle traveling along the level set.

• Let g(t) be the position of the particle at time t, with g(0) = x0.

• Note that f(g(t)) = constant for all t.

• Velocity vector g′(t) is tangent to the level set.

• Consider F(t) = f(g(t)). We have F′(0) = 0. By the chain rule, $F'(0) = \nabla f(g(0))^T g'(0) = \nabla f(x_0)^T g'(0) = 0$.

• Hence, ∇f(x0) and g′(0) are orthogonal.
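The particle argument can be replayed numerically. This sketch assumes the illustrative function f(x, y) = x² + 4y² and the parametrization g(t) = (2 cos t, sin t) of its level set f = 4. At every sampled time the gradient and the velocity should have zero inner product.

```python
import math

def grad_f(x, y):
    """Gradient of the illustrative function f(x, y) = x**2 + 4*y**2."""
    return (2.0 * x, 8.0 * y)

def g(t):
    """Particle position on the level set f = 4."""
    return (2.0 * math.cos(t), math.sin(t))

def g_prime(t):
    """Particle velocity, tangent to the level set."""
    return (-2.0 * math.sin(t), math.cos(t))

dots = []
for k in range(12):
    t = 2.0 * math.pi * k / 12
    gx, gy = grad_f(*g(t))
    vx, vy = g_prime(t)
    dots.append(gx * vx + gy * vy)    # should vanish up to rounding
```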

• Suppose f: R → R is in $C^1$. Then $f(x_0 + h) = f(x_0) + f'(x_0) h + o(h)$.

• o(h) is a term such that o(h)/h → 0 as h → 0.

• At x0, f can be approximated by a linear function, and the approximation gets better the closer we are to x0.

• Suppose f: R → R is in $C^2$. Then $f(x_0 + h) = f(x_0) + f'(x_0) h + \tfrac{1}{2} f''(x_0) h^2 + o(h^2)$.

• At x0, f can be approximated by a quadratic function.

• Suppose f: $R^n \to R$.

• If f is in $C^1$, then $f(x_0 + h) = f(x_0) + \nabla f(x_0)^T h + o(\|h\|)$.

• If f is in $C^2$, then $f(x_0 + h) = f(x_0) + \nabla f(x_0)^T h + \tfrac{1}{2} h^T D^2 f(x_0) h + o(\|h\|^2)$.
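A quick numerical look at the remainder terms, assuming the one-dimensional example f = exp at x0 = 0 (chosen only because all its derivatives are known). The first-order error divided by h and the second-order error divided by h² should both shrink as h does.

```python
import math

x0 = 0.0
f = math.exp                      # f, f' and f'' all equal exp

def taylor1(h):
    """First-order Taylor approximation of f(x0 + h)."""
    return f(x0) + f(x0) * h

def taylor2(h):
    """Second-order Taylor approximation of f(x0 + h)."""
    return taylor1(h) + 0.5 * f(x0) * h * h

steps = [0.1, 0.01, 0.001]
ratio1 = [abs(f(x0 + h) - taylor1(h)) / h for h in steps]        # o(h) / h
ratio2 = [abs(f(x0 + h) - taylor2(h)) / h ** 2 for h in steps]   # o(h^2) / h^2
```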

• We already know that ∇f(x0) is orthogonal to the level set at x0.

• Suppose ∇f(x0) ≠ 0.

• Fact: ∇f points in the direction of increasing f.

• Consider $x_\alpha = x_0 + \alpha \nabla f(x_0)$, α > 0.

• By Taylor's formula, $f(x_\alpha) = f(x_0) + \alpha \|\nabla f(x_0)\|^2 + o(\alpha)$.

• Therefore, for sufficiently small α, $f(x_\alpha) > f(x_0)$.

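DESCENT METHODS

Data: $x_0 \in R^n$

Step 0: set i = 0

Step 1: if $\nabla f(x_i) = 0$, stop; else, compute a search direction $h_i \in R^n$

Step 2: compute the step-size $\lambda_i \in \arg\min_{\lambda \ge 0} f(x_i + \lambda h_i)$

Step 3: set $x_{i+1} = x_i + \lambda_i h_i$, set i = i + 1, and go to step 1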
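The model algorithm above can be sketched as a generic loop with a pluggable search direction. Two points are assumptions of this sketch rather than part of the slides: the step-size uses a simple backtracking (Armijo) rule instead of an exact minimization, and the stopping test uses a small tolerance instead of an exact zero gradient. All function names are hypothetical.

```python
import math

def backtracking(f, x, g, h, lam=1.0, shrink=0.5, c=1e-4, max_halvings=60):
    """Step 2 (assumed variant): shrink lam until f decreases sufficiently."""
    fx = f(x)
    slope = sum(gi * hi for gi, hi in zip(g, h))   # negative for a descent direction
    for _ in range(max_halvings):
        trial = [xi + lam * hi for xi, hi in zip(x, h)]
        if f(trial) <= fx + c * lam * slope:
            break
        lam *= shrink
    return lam

def descent(f, grad, direction, x0, tol=1e-8, max_iter=1000):
    x = list(x0)                                   # Step 0
    for _ in range(max_iter):
        g = grad(x)
        if math.sqrt(sum(gi * gi for gi in g)) <= tol:
            break                                  # Step 1: gradient (nearly) zero
        h = direction(x, g)                        # Step 1: search direction
        lam = backtracking(f, x, g, h)             # Step 2: step-size
        x = [xi + lam * hi for xi, hi in zip(x, h)]  # Step 3: update
    return x

# Illustrative objective with minimum at (1, -2).
f = lambda x: (x[0] - 1.0) ** 2 + 2.0 * (x[1] + 2.0) ** 2
grad = lambda x: [2.0 * (x[0] - 1.0), 4.0 * (x[1] + 2.0)]
negative_gradient = lambda x, g: [-gi for gi in g]
x_min = descent(f, grad, negative_gradient, [0.0, 0.0])
```

Swapping the `direction` argument is all it takes to move from one descent method to another.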

• Theorem (Wolfe):

• Suppose f: $R^n \to R$ is $C^1$ smooth, and there exists a continuous function k: $R^n \to [0,1]$ such that $\nabla f(x) \ne 0 \Rightarrow k(x) > 0$ for all x,

• and the search vectors constructed by the model algorithm satisfy $\langle h_i, \nabla f(x_i) \rangle \le -k(x_i) \, \|\nabla f(x_i)\| \, \|h_i\|$,

• and $\nabla f(x_i) \ne 0 \Rightarrow h_i \ne 0$.

• Then,

• if $\{x_i\}$ is the sequence constructed by the model algorithm,

• any accumulation point y of this sequence satisfies $\nabla f(y) = 0$.

• The principal differences between various descent algorithms lie in the first procedure, the one for determining successive directions.

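STEEPEST DESCENT

Data: $x_0 \in R^n$

Step 0: set i = 0

Step 1: if $\nabla f(x_i) = 0$, stop; else, compute the search direction $h_i = -\nabla f(x_i)$

Step 2: compute the step-size $\lambda_i \in \arg\min_{\lambda \ge 0} f(x_i + \lambda h_i)$

Step 3: set $x_{i+1} = x_i + \lambda_i h_i$, set i = i + 1, and go to step 1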

• Theorem:

• If $\{x_i\}$ is a sequence constructed by the SD algorithm, then every accumulation point y of the sequence satisfies $\nabla f(y) = 0$.

• Proof: from the Wolfe theorem.

Remark: the Wolfe theorem gives us numerical stability if the derivatives aren’t given (are calculated numerically).

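Note that the search direction is $h_i = -\nabla f(x_i)$.

• How long a step to take?

• We are limited to a line search.

• Choose λ to minimize $f(x_i + \lambda h_i)$,

• i.e. the directional derivative along $h_i$ is equal to zero at the new point.

• From the chain rule: $\frac{d}{d\lambda} f(x_i + \lambda h_i) \big|_{\lambda = \lambda_i} = \nabla f(x_{i+1})^T h_i = 0$.

• Therefore the method of steepest descent looks like this: consecutive search directions $h_i$ and $h_{i+1} = -\nabla f(x_{i+1})$ are orthogonal!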
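The orthogonality of consecutive directions can be observed on a quadratic, where the exact line search has a closed form. This is a sketch under assumptions: f(x) = ½xᵀAx − bᵀx with A symmetric positive definite, the 2×2 sample system from Shewchuk's tutorial (listed under Sources), and the exact step λ = rᵀr / rᵀAr with r = b − Ax = −∇f(x).

```python
def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def steepest_descent(A, b, x0, tol=1e-10, max_iter=1000):
    """Steepest descent with exact line search for f(x) = 0.5 x^T A x - b^T x."""
    x = list(x0)
    residuals = []
    for _ in range(max_iter):
        r = [bi - ai for bi, ai in zip(b, matvec(A, x))]   # r = -grad f(x)
        residuals.append(r)
        if dot(r, r) ** 0.5 <= tol:
            break
        lam = dot(r, r) / dot(r, matvec(A, r))             # exact minimizer along r
        x = [xi + lam * ri for xi, ri in zip(x, r)]
    return x, residuals

A = [[3.0, 2.0], [2.0, 6.0]]
b = [2.0, -8.0]
x_sd, residuals = steepest_descent(A, b, [-2.0, -2.0])
# Inner products of consecutive search directions; each should be ~0.
consecutive = [dot(r0, r1) for r0, r1 in zip(residuals, residuals[1:])]
```

The iterates zigzag toward the solution (2, −2), which takes many more than two steps even in two dimensions.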

Given:

Find the minimum when x1 is allowed to vary from 0.5 to 1.5 and x2 is allowed to vary from 0 to 2.

λ arbitrary

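If A is symmetric, the gradient of the quadratic form $f(x) = \tfrac{1}{2} x^T A x - b^T x + c$ reduces to $\nabla f(x) = Ax - b$, so minimizing f amounts to solving Ax = b.

In general, the solution x lies at the intersection point of n hyperplanes, each having dimension n – 1.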
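For reference, a short derivation of the symmetric case, assuming the quadratic form used in Shewchuk's tutorial (listed under Sources). The symbols b and c are assumptions here, since the transcript does not show them.

```latex
\begin{aligned}
f(x) &= \tfrac{1}{2} x^T A x - b^T x + c, \\
\nabla f(x) &= \tfrac{1}{2} A^T x + \tfrac{1}{2} A x - b, \\
A = A^T \;&\Rightarrow\; \nabla f(x) = A x - b, \\
\nabla f(x) = 0 \;&\Longleftrightarrow\; A x = b.
\end{aligned}
```

Each row of Ax = b, written as $a_k^T x = b_k$, describes one hyperplane of dimension n – 1, which is where the intersection picture comes from.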

• What is the problem with steepest descent?

• We can repeat the same directions over and over…

• Wouldn’t it be better if, every time we took a step, we got it right the first time?

• First, let’s define the error as $e_i = x_i - x$.

• $e_i$ is a vector that indicates how far we are from the solution.

• Let’s pick a set of orthogonal search directions $d_0, d_1, \dots, d_{n-1}$ (they should span $R^n$).

• In each search direction, we’ll take exactly one step, and that step will be just the right length to line up evenly with x.

• Unfortunately, this method only works if you already know the answer.

• $e_{i+1}$ should be orthogonal to $d_i$: $d_i^T e_{i+1} = 0$.

On the other hand

So if:

The correct choice is

Data

Step 0:

Step 1:

Step 2:

Step 3:

Step 4: and repeat n times

• Conjugate gradient algorithm for minimizing f:
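As a minimal sketch, assuming the standard linear conjugate gradient iteration as presented in Shewchuk's tutorial (listed under Sources), for f(x) = ½xᵀAx − bᵀx with A symmetric positive definite. The step numbering of the slides is not reproduced, and the names `alpha`, `beta`, `r`, and `d` follow Shewchuk's notation rather than anything visible in the transcript.

```python
def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def conjugate_gradient(A, b, x0, tol=1e-10):
    """Linear CG for A x = b, A symmetric positive definite; at most n steps."""
    x = list(x0)
    r = [bi - ai for bi, ai in zip(b, matvec(A, x))]   # residual r = b - A x
    d = list(r)                                        # first direction: steepest descent
    rr = dot(r, r)
    for _ in range(len(b)):                            # repeat n times
        if rr ** 0.5 <= tol:
            break
        Ad = matvec(A, d)
        alpha = rr / dot(d, Ad)                        # exact step along d
        x = [xi + alpha * di for xi, di in zip(x, d)]
        r = [ri - alpha * adi for ri, adi in zip(r, Ad)]
        rr_new = dot(r, r)
        beta = rr_new / rr                             # keeps the new d A-orthogonal to the old
        d = [ri + beta * di for ri, di in zip(r, d)]
        rr = rr_new
    return x

A2 = [[3.0, 2.0], [2.0, 6.0]]
x_cg2 = conjugate_gradient(A2, [2.0, -8.0], [-2.0, -2.0])

A3 = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
x_cg3 = conjugate_gradient(A3, [6.0, 10.0, 8.0], [0.0, 0.0, 0.0])
```

In exact arithmetic the loop reaches the solution in at most n steps. In floating point the result is accurate only up to rounding.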

Sources

• Jyh-Shing Roger Jang, Chuen-Tsai Sun and Eiji Mizutani, Slides for Ch. 5 of “Neuro-Fuzzy and Soft Computing: A Computational Approach to Learning and Machine Intelligence”, First Edition, Prentice Hall, 1997.

• Djamel Bouchaffra. Soft Computing. Course materials. Oakland University. Fall 2005

• Lucidi delle lezioni, Soft Computing. Materiale Didattico. Dipartimento di Elettronica e Informazione. Politecnico di Milano. 2004

• Jeen-Shing Wang, Course: Introduction to Neural Networks. Lecture notes. Department of Electrical Engineering. National Cheng Kung University. Fall, 2005

• Carlo Tomasi, Mathematical Methods for Robotics and Vision. Stanford University. Fall 2000

• Petros Ioannou, Jing Sun, Robust Adaptive Control. Prentice-Hall, Inc, Upper Saddle River: NJ, 1996

• Jonathan Richard Shewchuk, An Introduction to the Conjugate Gradient Method Without the Agonizing Pain. Edition 11/4. School of Computer Science. Carnegie Mellon University. Pittsburgh. August 4, 1994

• Gordon C. Everstine, Selected Topics in Linear Algebra. The George Washington University. 8 June 2004