
Computacion Inteligente



Presentation Transcript


  1. Computacion Inteligente Derivative-Based Optimization

  2. Contents • Optimization problems • Mathematical background • Descent Methods • The Method of Steepest Descent • Conjugate Gradient

  3. OPTIMIZATION PROBLEMS

  4. Objective function – the mathematical function that is optimized by changing the values of the design variables. • Design variables – the variables that we, as designers, can change. • Constraints – functions of the design variables that establish limits on individual variables or on combinations of design variables.

  5. 3 basic ingredients… • an objective function, • a set of decision variables, • a set of equality/inequality constraints. The problem is to search for the values of the decision variables that minimize the objective function while satisfying the constraints…

  6. General form: minimize the objective f(x) over the decision vector x = (x1, …, xn), subject to equality and inequality constraints and to bounds on the variables. • Design variables: the decision vector • Constraints: equality and inequality • Bounds: feasible ranges for the variables • Objective function: maximization can be converted to minimization via the duality principle (maximizing f is equivalent to minimizing −f)
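As a concrete illustration of these ingredients, here is a minimal sketch using scipy.optimize; the particular objective, constraint functions, and bounds below are made-up examples, not taken from the slides.

```python
# Sketch: minimize f(x) = (x1 - 1)^2 + (x2 - 2)^2
# subject to the equality constraint x1 + x2 = 2, the inequality x1 >= 0,
# and simple bounds on both variables. All functions here are illustrative.
import numpy as np
from scipy.optimize import minimize

objective = lambda x: (x[0] - 1.0) ** 2 + (x[1] - 2.0) ** 2     # f(x)

constraints = [
    {"type": "eq",   "fun": lambda x: x[0] + x[1] - 2.0},       # h(x) = 0
    {"type": "ineq", "fun": lambda x: x[0]},                    # g(x) >= 0
]
bounds = [(-5.0, 5.0), (-5.0, 5.0)]                             # feasible ranges

result = minimize(objective, x0=np.zeros(2), method="SLSQP",
                  bounds=bounds, constraints=constraints)
print(result.x)   # approximate minimizer that satisfies the constraints
```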

  7. Identify the quantity or function, f, to be optimized. • Identify the design variables: x1, x2, x3, …, xn. • Identify the constraints, if any exist: (a) equalities, (b) inequalities. • Adjust the design variables (the x’s) until f is optimized and all of the constraints are satisfied.

  8. Objective functions may be unimodal or multimodal. • Unimodal – only one optimum • Multimodal – more than one optimum • Most search schemes are based on the assumption of a unimodal surface. The optimum determined in such cases is called a local optimum design. • The global optimum is the best of all local optimum designs.

  9. Existence of a global minimum • If f(x) is continuous on a feasible set S that is closed and bounded, then f(x) has a global minimum in S. • A set S is closed if it contains all its boundary points. • A set S is bounded if it is contained in the interior of some ball. (Compact = closed and bounded.)

  10. [Figure: plot of an objective function over the design variables x1 and x2.]

  11. [Figure: surface showing a saddle point and a local maximum.]

  12. Derivative-based optimization (gradient based) • Capable of determining “search directions” according to an objective function’s derivative information • steepest descent method; • Newton’s method; Newton-Raphson method; • Conjugate gradient, etc. • Derivative-free optimization • random search method; • genetic algorithm; • simulated annealing; etc.

  13. MATHEMATICAL BACKGROUND

  14. The scalar xTMx is called a quadratic form. • A square matrix M is positive definite if xTMx > 0 for all x ≠ 0. • It is positive semidefinite if xTMx ≥ 0 for all x.

  15. A symmetric matrix M = MT is positive definite if and only if its eigenvalues λi > 0 (semidefinite ↔ λi ≥ 0). • Proof (→): Let vi be the eigenvector for the i-th eigenvalue λi. • Then 0 < viTMvi = viT(λivi) = λi ||vi||2, which implies λi > 0. • Exercise: prove the converse, that positive eigenvalues imply positive definiteness.

  16. Theorem: If a matrix M = UTU (with U nonsingular), then it is positive definite. • Proof: Let f be defined as f(x) = xTMx = xTUTUx. If we can show that f is always positive for x ≠ 0, then M must be positive definite. • Writing b = Ux, we have f(x) = bTb = ||b||2. • Provided that Ux is a nonzero vector for all values of x except x = 0 (which holds when U is nonsingular), f must always be positive, so M is positive definite.
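A quick numerical check of both criteria, assuming M is built as UTU from an arbitrary nonsingular example matrix U (chosen here only for illustration):

```python
# Sketch: verify positive definiteness numerically via eigenvalues and quadratic forms.
import numpy as np

U = np.array([[2.0, 1.0],
              [0.0, 3.0]])            # nonsingular example, so M = U^T U should be positive definite
M = U.T @ U                           # symmetric by construction

eigenvalues = np.linalg.eigvalsh(M)   # eigenvalues of the symmetric matrix M
print(eigenvalues, np.all(eigenvalues > 0))   # all strictly positive -> positive definite

x = np.random.randn(2)                # random nonzero vector
print(x @ M @ x > 0)                  # the quadratic form x^T M x is positive
```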

  17. f: Rn → R is a quadratic function if f(x) = ½ xTQx − bTx + c, where Q is symmetric.

  18. It is not necessary for Q to be symmetric. • Suppose the matrix P is non-symmetric; then xTPx = xTQx with Q = ½(P + PT), and Q is symmetric.

  19. Example: suppose the matrix P is non-symmetric. Replacing P by Q = ½(P + PT) leaves the quadratic form unchanged, and Q is symmetric.
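A small numerical sketch of this symmetrization; the matrix P below is an arbitrary non-symmetric example, not the one from the original slide.

```python
# Sketch: x^T P x equals x^T Q x with Q = (P + P^T) / 2.
import numpy as np

P = np.array([[1.0, 4.0],
              [0.0, 2.0]])               # arbitrary non-symmetric example
Q = 0.5 * (P + P.T)                      # symmetric part of P

x = np.random.randn(2)
print(np.isclose(x @ P @ x, x @ Q @ x))  # True: the two quadratic forms agree
print(np.allclose(Q, Q.T))               # True: Q is symmetric
```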

  20. Given the quadratic function f defined above: if Q is positive definite, then f is a parabolic “bowl.”

  21. Two other shapes can result from the quadratic form. • If Q is negative definite, then f is a parabolic “bowl” upside down. • If Q is indefinite, then f describes a saddle.

  22. Quadratics are useful in the study of optimization. • Often, objective functions are “close to” quadratic near the solution. • It is easier to analyze the behavior of algorithms when applied to quadratics. • Analysis of algorithms for quadratics gives insight into their behavior in general.

  23. The derivative of f: R → R is a function f ′: R → R given by f ′(x) = lim h→0 [f(x + h) − f(x)] / h, if the limit exists.

  24. Along the axes: the partial derivative of f with respect to xi is ∂f/∂xi (x) = lim h→0 [f(x + h ei) − f(x)] / h, where ei is the i-th standard basis vector.

  25. In a general direction v (with ||v|| = 1): the directional derivative is ∂f/∂v (x) = lim h→0 [f(x + h v) − f(x)] / h.
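A numerical sketch of this limit definition; the function, point, and direction below are illustrative choices.

```python
# Sketch: directional derivative of f(x) = x1^2 + 3*x1*x2 at x0 in direction v,
# approximated by the difference quotient (f(x0 + h*v) - f(x0)) / h for shrinking h.
import numpy as np

f = lambda x: x[0] ** 2 + 3.0 * x[0] * x[1]
x0 = np.array([1.0, 2.0])
v = np.array([1.0, 0.0])                  # unit direction along the x1 axis

for h in [1e-2, 1e-4, 1e-6]:
    print((f(x0 + h * v) - f(x0)) / h)    # approaches the partial derivative 2*x1 + 3*x2 = 8
```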

  26. Definition: A real-valued function f: Rn → R is said to be continuously differentiable if the partial derivatives ∂f/∂x1, …, ∂f/∂xn exist for each x in Rn and are continuous functions of x. • In this case, we say f ∈ C1 (a smooth function of class C1).

  27. Definition: The gradient of f: R2 → R is a function ∇f: R2 → R2 given by ∇f(x) = (∂f/∂x1, ∂f/∂x2)T (in the plane).

  28. Definition: The gradient of f: Rn → R is a function ∇f: Rn → Rn given by ∇f(x) = (∂f/∂x1, …, ∂f/∂xn)T.
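A sketch of the gradient computed by central finite differences and compared against the analytic formula; the example function and step size h are illustrative.

```python
# Sketch: numerical gradient of f(x) = x1^2 + 3*x1*x2 via central differences.
import numpy as np

def f(x):
    return x[0] ** 2 + 3.0 * x[0] * x[1]

def grad_f(x):
    # analytic gradient (df/dx1, df/dx2) = (2*x1 + 3*x2, 3*x1)
    return np.array([2.0 * x[0] + 3.0 * x[1], 3.0 * x[0]])

def numerical_gradient(f, x, h=1e-6):
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = 1.0
        g[i] = (f(x + h * e) - f(x - h * e)) / (2.0 * h)   # central difference along axis i
    return g

x0 = np.array([1.0, 2.0])
print(numerical_gradient(f, x0))   # ~ [8., 3.]
print(grad_f(x0))                  # [8., 3.]
```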

  29. The gradient defines a (hyper)plane that approximates the function infinitesimally: z = f(x0) + ∇f(x0)T(x − x0).

  30. By the chain rule, the directional derivative can be written as ∂f/∂v (x) = ∇f(x)T v.

  31. Proposition 1: the directional derivative ∂f/∂v (x) is maximal when choosing v = ∇f(x)/||∇f(x)||. • Intuition: the gradient points in the direction of greatest change. • Prove it!

  32. Proof: • Assign v = ∇f(x)/||∇f(x)||. • By the chain rule, ∂f/∂v (x) = ∇f(x)T v = ∇f(x)T∇f(x)/||∇f(x)|| = ||∇f(x)||.

  33. Proof (continued): • On the other hand, for a general unit vector v, the Cauchy–Schwarz inequality gives ∂f/∂v (x) = ∇f(x)T v ≤ ||∇f(x)|| ||v|| = ||∇f(x)||, so no direction does better than the normalized gradient.
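A numerical illustration of Proposition 1, reusing the example gradient from the sketch above: among many random unit directions, none gives a larger directional derivative than the normalized gradient.

```python
# Sketch: grad_f(x0)^T v is maximized over unit vectors v at v = grad_f(x0)/||grad_f(x0)||.
import numpy as np

def grad_f(x):
    return np.array([2.0 * x[0] + 3.0 * x[1], 3.0 * x[0]])   # same example as above

x0 = np.array([1.0, 2.0])
g = grad_f(x0)
best_v = g / np.linalg.norm(g)
best_value = g @ best_v                        # equals ||grad f(x0)||

random_values = []
for _ in range(1000):
    v = np.random.randn(2)
    v /= np.linalg.norm(v)                     # random unit direction
    random_values.append(g @ v)

print(best_value, max(random_values))          # best_value is never exceeded
```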

  34. Proposition 2: let f: Rn → R be a smooth function (C1) around p. • If f has a local minimum (maximum) at p, then ∇f(p) = 0. • Intuition: this is a necessary condition for a local min (max).

  35. Proof (intuitive): at a local minimum p, the directional derivative ∇f(p)T v must be ≥ 0 in every direction v; applying this to both v and −v forces ∇f(p)T v = 0 for all v, hence ∇f(p) = 0.

  36. We have found the best infinitesimal direction at each point. • Looking for the minimum is then a “blind man” procedure: we can only feel the local slope. • How can we derive the way to the minimum using this knowledge?

  37. The derivative of f: Rn → Rm is a function Df: Rn → Rm×n given by the matrix of partial derivatives [Df(x)]ij = ∂fi/∂xj (x), called the Jacobian. • Note that for f: Rn → R, we have ∇f(x) = Df(x)T.
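A sketch of a numerically estimated Jacobian for an example map f: R2 → R2 (the function is an illustrative choice):

```python
# Sketch: numerical Jacobian of f(x) = (x1*x2, x1 + x2^2), a 2x2 matrix [df_i/dx_j].
import numpy as np

def f(x):
    return np.array([x[0] * x[1], x[0] + x[1] ** 2])

def numerical_jacobian(f, x, h=1e-6):
    m, n = len(f(x)), len(x)
    J = np.zeros((m, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = 1.0
        J[:, j] = (f(x + h * e) - f(x - h * e)) / (2.0 * h)   # column j: derivative w.r.t. x_j
    return J

x0 = np.array([1.0, 2.0])
print(numerical_jacobian(f, x0))
# analytic Jacobian at (1, 2): [[x2, x1], [1, 2*x2]] = [[2, 1], [1, 4]]
```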

  38. If the derivative of ∇f exists, we say that f is twice differentiable. • Write the second derivative as D2f (or F), and call it the Hessian of f: D2f(x) = [∂2f/∂xi∂xj (x)].

  39. The level set of a function f: Rn→ R at level c is the set of points S = {x: f(x) = c}.

  40. Fact: ∇f(x0) is orthogonal to the level set at x0

  41. Proof of fact: • Imagine a particle traveling along the level set. • Let g(t) be the position of the particle at time t, with g(0) = x0. • Note that f(g(t)) = constant for all t. • The velocity vector g′(t) is tangent to the level set. • Consider F(t) = f(g(t)). We have F′(0) = 0. By the chain rule, F′(0) = ∇f(g(0))T g′(0) = ∇f(x0)T g′(0) = 0. • Hence, ∇f(x0) and g′(0) are orthogonal.
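A small numerical check of this fact for an example function, parameterizing a level set as a curve g(t) (the function and parameterization are illustrative):

```python
# Sketch: for f(x) = x1^2 + 2*x2^2, the level set f(x) = c is an ellipse.
# Parameterize it as g(t) and check that grad f(g(0)) is orthogonal to g'(0).
import numpy as np

c = 4.0                                               # level of the level set
grad_f = lambda x: np.array([2.0 * x[0], 4.0 * x[1]])
g = lambda t: np.array([np.sqrt(c) * np.cos(t),       # f(g(t)) = c for every t
                        np.sqrt(c / 2.0) * np.sin(t)])

h = 1e-6
g_prime_0 = (g(h) - g(-h)) / (2.0 * h)                # velocity along the level set at t = 0
x0 = g(0.0)
print(grad_f(x0) @ g_prime_0)                         # ~ 0: gradient is orthogonal to the level set
```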

  42. Suppose f: R → R is in C1. Then f(x0 + h) = f(x0) + f ′(x0) h + o(h), • where o(h) is a term such that o(h)/h → 0 as h → 0. • At x0, f can be approximated by a linear function, and the approximation gets better the closer we are to x0.

  43. Suppose f: R → R is in C2. Then f(x0 + h) = f(x0) + f ′(x0) h + ½ f ″(x0) h2 + o(h2). • At x0, f can be approximated by a quadratic function.

  44. Suppose f: Rn → R. • If f is in C1, then f(x0 + h) = f(x0) + ∇f(x0)T h + o(||h||). • If f is in C2, then f(x0 + h) = f(x0) + ∇f(x0)T h + ½ hT D2f(x0) h + o(||h||2).
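A sketch comparing the first- and second-order Taylor approximations for a simple scalar example (exp around 0), showing the second-order error shrinking faster as h → 0:

```python
# Sketch: Taylor approximations of f(x) = exp(x) around x0 = 0 (scalar case for clarity).
import numpy as np

x0 = 0.0
f0 = df0 = d2f0 = np.exp(x0)                          # f, f', f'' all equal exp(x0)

for h in [1e-1, 1e-2, 1e-3]:
    exact = np.exp(x0 + h)
    first_order = f0 + df0 * h                        # error is o(h)   (in fact ~ h^2 / 2)
    second_order = f0 + df0 * h + 0.5 * d2f0 * h**2   # error is o(h^2) (in fact ~ h^3 / 6)
    print(h, abs(exact - first_order), abs(exact - second_order))
# each time h shrinks by 10x, the first-order error drops ~100x and the second-order error ~1000x
```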

  45. We already know that ∇f(x0) is orthogonal to the level set at x0. • Suppose ∇f(x0) ≠ 0. • Fact: ∇f points in the direction of increasing f.

  46. Consider xα = x0 + α∇f(x0), α > 0. • By Taylor’s formula, f(xα) = f(x0) + α ||∇f(x0)||2 + o(α). • Therefore, for sufficiently small α, f(xα) > f(x0).

  47. DESCENT METHODS

  48. This result is the link from the previous gradient properties to a constructive algorithm. • The problem: minimize f(x) over x ∈ Rn.

  49. We introduce a model for the algorithm: • Data: a starting point x0 ∈ Rn. • Step 0: set i = 0. • Step 1: if a stopping criterion is satisfied (e.g., ∇f(xi) = 0), stop; else, compute a search direction di. • Step 2: compute the step size αi. • Step 3: set xi+1 = xi + αi di, i = i + 1, and go to Step 1.
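A minimal sketch of this algorithm model, instantiated with the steepest-descent direction di = −∇f(xi) and a fixed step size; the objective, step size, and tolerance are illustrative assumptions (the later parts of the presentation develop better choices of direction and step size).

```python
# Sketch: the generic descent loop with d_i = -grad f(x_i) and a fixed step size alpha.
import numpy as np

def grad_f(x):
    # gradient of the example quadratic f(x) = x1^2 + 10*x2^2
    return np.array([2.0 * x[0], 20.0 * x[1]])

x = np.array([5.0, 1.0])        # Data: starting point x_0
alpha = 0.05                    # Step 2: a fixed step size here (a line search is one refinement)
i = 0                           # Step 0
while np.linalg.norm(grad_f(x)) > 1e-6 and i < 10_000:   # Step 1: stopping test
    d = -grad_f(x)              # Step 1: search direction (steepest descent)
    x = x + alpha * d           # Step 3: update the iterate and repeat
    i += 1

print(i, x)                     # ends near the minimizer (0, 0)
```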
