Econ 600: Mathematical Economics

Econ 600: Mathematical Economics July/August 2006 Stephen Hutton

Why optimization? • Almost all economics is about solving constrained optimization problems. Most economic models start by writing down an objective function. • Utility maximization, profit maximization, cost minimization, etc. • Static optimization: most common in microeconomics • Dynamic optimization: most common in macroeconomics

My approach to course • Focus on intuitive explanation of most important concepts, rather than formal proofs. • Motivate with relevant examples • Practice problems and using tools in problem sets • Assumes some basic math background (people with strong background might not find course useful) • For more details, see course notes, textbooks, future courses • Goal of course: introduction to these concepts

Order of material • Course will skip around notes a bit during the static course; specifically, I’ll cover the first half of lecture 1, then give some definitions from lecture 3, then go back to lecture 1 and do the rest in order. • Sorry! 

Why not basic optimization? • Simplest method of unconstrained optimization (set deriv = 0) often fails • Might not identify the optima, or optima might not exist • Solution unbounded • Function not always differentiable • Function not always continuous • Multiple local optima

Norms and Metrics • It is useful to have some idea of “distance” or “closeness” in vector space • The most common measure is Euclidean distance; this is sufficient for our purposes (dealing with n-dimensional real numbers) • General requirements of norm: anything that satisfies conditions 1), 2), 3) (see notes)

Continuity • General intuitive sense of continuity (no gaps or jumps): whenever x is close to x’, f(x) is close to f(x’). • Formal definitions: A sequence of elements {xn} is said to converge to a point x in Rn if for every ε > 0 there is a number N such that for all n > N, ||xn-x|| < ε. • A function f:Rn→Rn is continuous at a point x if for ALL sequences {xn} converging to x, the derived sequence of points in the target space {f(xn)} converges to the point f(x). • A function is continuous if it is continuous at all points in its domain. • What does this mean in 2d? Sequences of points converging to x from below and from above must both give f(xn) converging to f(x). The same idea holds in higher dimensions.
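A small numerical illustration of the sequence definition (a sketch, not part of the notes; the functions `step` and `square` and the sequence xn = -1/n are chosen here purely for illustration). The sequence converges to 0 from below. The continuous function's values approach f(0), while the step function's values stay a full unit away from f(0).

```python
def step(x):
    # Discontinuous at 0: jumps from 0 to 1.
    return 1.0 if x >= 0 else 0.0

def square(x):
    # Continuous everywhere.
    return x * x

# x_n = -1/n converges to 0 from below.
seq = [-1.0 / n for n in range(1, 10001)]

# Distance between f(x_n) at the last term and f(0).
gap_step = abs(step(seq[-1]) - step(0.0))
gap_square = abs(square(seq[-1]) - square(0.0))

print(gap_step, gap_square)
```

The gap for `square` shrinks as n grows; the gap for `step` never does, which is what failing the sequence definition of continuity at 0 looks like.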

Continuity 2 • Why continuity? Needed to guarantee existence of a solution. • So we typically assume continuity of functions to guarantee (with other assumptions) that a solution to the problem exists. • Sometimes continuity is stronger than needed. To guarantee a maximum, upper semi-continuity is enough. To guarantee a minimum, lower semi-continuity is enough. • Upper semi-continuity: for all xn→x, lim sup n→∞ f(xn) ≤ f(x). • Lower semi-continuity: for all xn→x, lim inf n→∞ f(xn) ≥ f(x). • Note that if both hold, then the limit of f(xn) exists and equals f(x), so we have continuity. • Note, figure 6 in notes is wrong

Open sets (notes from lecture 3) • For many set definitions and proofs we use the concept of an open ball of arbitrarily small size. • An open ball is the set of points (or vectors) within a given distance from a particular point (or vector). Formally: let ε be a small positive real number. Bε(x)={y| ||x-y||< ε}. • A set of points S in Rn is open if for every point x in S, there exists an open ball around x that is entirely contained within S. Eg (1,2) is open but (1,2] is not. • Any union of open sets is open. • Any finite intersection of open sets is open.

Interior, closed set (notes in lecture 3) • The interior of a set S is the largest open set contained in S. Formally, Int(S) = ∪i Si, the union of all open subsets Si of S. • If S is open, Int(S)=S. • A set is closed if every convergent sequence of points in the set has its limit in the set. Formally, fix a set S and let {xm} be any convergent sequence of elements in S, with lim m→∞ xm = r. If r is in S for all such convergent sequences, then S is closed. • S is closed if and only if its complement SC is open.

Boundary, bounded, compact (notes in lecture 3) • The boundary of a set S [denoted B(S)] is the set of points x such that for all ε>0, Bε(x)∩S is not empty and Bε(x)∩SC is not empty. Ie every open ball around x contains points both in S and not in S. • S is closed if and only if it contains its boundary, ie B(S) is a subset of S. • A set S is bounded if there is a finite number M such that ||x-y|| ≤ M for all x, y in S (equivalently, S fits inside some open ball of finite radius). • A set in Rn is compact if it is closed and bounded. • These definitions correspond to their commonsense interpretations.

Weierstrass’s Theorem (notes in lecture 3) • Gives us a sufficient condition to ensure that a solution to a constrained optimization problem exists. If the constraint set C is compact and the function f is continuous, then there always exists at least one solution to max f(x) s.t. x is in C. • Formally: Let f:Rn→R be continuous. If C is a non-empty compact subset of Rn, then there exist x* in C, y* in C s.t. f(x*) ≥ f(x) ≥ f(y*) for all x in C.

Vector geometry • We want to extend the slope = 0 idea of an optimum to multiple dimensions. We need some vector tools to do this. • Inner product: x·y = x1y1+x2y2+…+xnyn • Euclidean norm and inner product are related: ||x||² = x·x • Two vectors are orthogonal (perpendicular) if x·y = 0. • The inner product of two vectors v, w is v’w in matrix notation. • If v’w > 0 then v, w form an acute angle. • If v’w < 0 then v, w form an obtuse angle. • If v’w = 0 then v, w are orthogonal.
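The sign rule for the inner product can be sketched in a few lines (illustrative only; the helper names `inner`, `norm` and `angle_type` are not from the notes).

```python
import math

def inner(x, y):
    # Inner product: x1*y1 + x2*y2 + ... + xn*yn.
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    # Euclidean norm, using ||x||^2 = x . x.
    return math.sqrt(inner(x, x))

def angle_type(v, w):
    # Classify the angle between v and w by the sign of v'w.
    p = inner(v, w)
    if p > 0:
        return "acute"
    if p < 0:
        return "obtuse"
    return "orthogonal"

print(angle_type((1, 0), (1, 1)))
print(angle_type((1, 0), (0, 1)))
print(angle_type((1, 0), (-1, 1)))
```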

Linear functions • A function f:V→W is linear if for any two real numbers a,b and any two elements v,v’ in V we have f(av+bv’) = af(v)+bf(v’). • Note that the functions we usually call “linear” in R1 (f(x)=mx+b) are not generally linear; they are affine. (They are linear only if b=0.) • A linear functional is a linear function whose values are real numbers, ie f:V→R. • Every linear functional defined on Rn can be represented by an n-dimensional vector (f1,f2,…,fn) with the feature that f(x) = Σi fixi. • Ie the value of the function at x is the inner product of the defining vector with x. • [Note, in every situation we can imagine dealing with, functionals are also functions.]
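The difference between linear and affine can be checked numerically on a handful of sample points (a sketch only; passing on samples does not prove linearity, and the helper `is_linear_on_samples` is a name made up for this illustration).

```python
def is_linear_on_samples(f, samples, scalars, tol=1e-9):
    # Test f(a*v + b*w) == a*f(v) + b*f(w) on every sampled combination.
    for v in samples:
        for w in samples:
            for a in scalars:
                for b in scalars:
                    lhs = f(a * v + b * w)
                    rhs = a * f(v) + b * f(w)
                    if abs(lhs - rhs) > tol:
                        return False
    return True

samples = [-2.0, 0.5, 3.0]
scalars = [0.0, 1.0, 2.0, -1.5]

linear_result = is_linear_on_samples(lambda x: 3.0 * x, samples, scalars)
affine_result = is_linear_on_samples(lambda x: 3.0 * x + 1.0, samples, scalars)

print(linear_result, affine_result)
```

The affine function fails already at a = b = 0, where f(0) = 1 but af(v)+bf(w) = 0.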

Hyperplanes • A hyperplane is the set of points given by {x:f(x)=c} where f is a linear functional and c is some real number. • Eg1: For R2 a typical hyperplane is a straight line. • Eg2: For R3 a typical hyperplane is a plane. • Think about a hyperplane as one of the level sets of the linear functional f. As we vary c, we change level sets. • The defining vector of f(x) is orthogonal to the hyperplane.

Separating Hyperplanes • A half-space is the set of points on one side of a hyperplane. Formally: HS(f) = {x:f(x)≥c} or HS(f) = {x:f(x)≤c}. • Consider any two disjoint sets: when can we construct a hyperplane that separates the sets? • Examples in notes. • If C lies in a half-space defined by H and H contains a point on the boundary of C, then H is a supporting hyperplane of C.

Convex sets • A set is convex if every convex combination of any two points in the set is also in the set. • No such thing as a concave set. Related but different idea to convex/concave functions. • Formally: a set C in Rn is convex if for all x, y in C and for all λ in [0,1], λx+(1-λ)y is in C. • Any closed convex set can be represented as the intersection of the halfspaces defined by its supporting hyperplanes. • Any halfspace is a convex set.

Separating Hyperplanes 2 • Separating hyperplane theorem: Suppose X, Y are non-empty convex sets in Rn such that the interior of Y∩X is empty and the interior of Y is not empty. Then there exists a vector a in Rn which is the defining vector of a separating hyperplane between X and Y. Proof: in texts. • Applications: general equilibrium theory, second fundamental theorem of welfare economics (conditions under which a Pareto optimal allocation can be supported as a price equilibrium). We need convex preferences to be able to guarantee that there is a price ratio (a hyperplane) that can sustain an equilibrium.

Graphs • The graph is what you normally see when you plot a function. • Formally: the graph of a function f from V to W is the set of ordered pairs {(v,w): v in V, w=f(v)}.

Derivatives • We already know from basic calculus that a necessary condition for x* to be an unconstrained maximum of a function f is that its derivative be zero (if the derivative exists) at x*. • A derivative tells us something about the slope of the graph of the function. • We can also think about the derivative as telling us the slope of the supporting hyperplane to the graph of f at the point (x,f(x)). (See notes.)

Multidimensional derivatives and gradients • We can extend what we know about derivatives from single-dimensional space to multi-dimensional space directly. • The gradient of f at x is just the n-dimensional (column) vector which lists all the partial derivatives, if they exist. • This nx1 matrix is also known as the Jacobian. • The derivative of f is the transpose of the gradient. • The gradient can be interpreted as giving the slope of a supporting hyperplane of the graph of f.

Second order derivatives • We can think about the second derivative of multidimensional functions directly as in the single dimension case. • The first derivative of the function f was an nx1 vector; the second derivative is an nxn matrix known as the Hessian. • If f is twice continuously differentiable (ie all elements of the Hessian exist and are continuous) then the Hessian matrix is symmetric (cross-partial derivatives do not depend on the order of differentiation).
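A finite-difference sketch of the Hessian, to make the symmetry claim concrete. The example function f(x1,x2) = x1²x2 + x2³ and the helper `hessian` are assumptions made for this illustration; the exact Hessian at (1,2) is [[4,2],[2,12]].

```python
def f(x):
    # Example function (illustrative): f(x1, x2) = x1^2 * x2 + x2^3.
    return x[0] ** 2 * x[1] + x[1] ** 3

def hessian(func, x, h=1e-4):
    # Central finite-difference approximation of all second partials.
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def shifted(di, dj):
                y = list(x)
                y[i] += di
                y[j] += dj
                return func(y)
            H[i][j] = (shifted(h, h) - shifted(h, -h)
                       - shifted(-h, h) + shifted(-h, -h)) / (4 * h * h)
    return H

H = hessian(f, [1.0, 2.0])
for row in H:
    print(row)
```

The off-diagonal entries agree up to rounding error, as expected for a twice continuously differentiable function.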

Homogeneous functions • Certain functions in Rn are particularly well-behaved and have useful properties that we can exploit without having to prove them every time. • A function f:Rn→R is homogeneous of degree k if f(tx1,tx2,…,txn) = t^k f(x1,x2,…,xn) for all t>0. In practice we will deal with homogeneous functions of degree 0 and degree 1. Eg: a demand function is homogeneous of degree 0 in prices (in general equilibrium) or in prices and wealth: doubling all prices and income has no impact on demand. • Homogeneous functions allow us to determine the entire behavior of the function from only knowing about the behavior in a small ball around the origin. Why? Because for any point x’, we can write x’ as a scalar multiple of some point x in that ball, so x’=tx. • If k=1 we say that f is linearly homogeneous. • Euler’s theorem: if f is h.o.d. k and differentiable, then x·Df(x) = Σi xi ∂f(x)/∂xi = k f(x).
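A numerical check of homogeneity and of Euler's theorem in its standard form x·Df(x) = k f(x). This is a sketch under assumptions: the Cobb-Douglas function f(x1,x2) = x1^0.3 x2^0.7 (homogeneous of degree 1) is chosen here as an example, and the gradient is approximated by central differences rather than computed exactly.

```python
def f(x1, x2):
    # Cobb-Douglas example (illustrative), homogeneous of degree k = 1.
    return x1 ** 0.3 * x2 ** 0.7

def grad(func, x1, x2, h=1e-6):
    # Central-difference approximation of the two partial derivatives.
    d1 = (func(x1 + h, x2) - func(x1 - h, x2)) / (2 * h)
    d2 = (func(x1, x2 + h) - func(x1, x2 - h)) / (2 * h)
    return d1, d2

k = 1
x1, x2, t = 2.0, 5.0, 3.0

# Homogeneity: f(t*x) should equal t^k * f(x).
scaling_gap = abs(f(t * x1, t * x2) - t ** k * f(x1, x2))

# Euler's theorem: x . Df(x) should equal k * f(x).
d1, d2 = grad(f, x1, x2)
euler_gap = abs(x1 * d1 + x2 * d2 - k * f(x1, x2))

print(scaling_gap, euler_gap)
```

Both gaps should be zero up to floating-point and finite-difference error.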

Homogeneous functions 2 • A ray through x is the line running through x and the origin, running forever in both directions. Formally: a ray is the set {x’ in Rn|x’=tx, for t in R}. • The gradient of a homogeneous function is essentially the same along any ray (linked by a scalar multiple). Ie the gradient at x’=tx is linearly dependent with the gradient at x. Thus level sets along any ray have the same slope. Application: homogeneous utility functions rule out income effects in demand. (At constant prices, consumers demand goods in the same proportions as income changes.)

Homothetic functions • A function f:R+n→R+ is homothetic if f(x)=h(v(x)) where h:R+→R+ is strictly increasing and v:R+n→R+ is h.o.d. k. • Application: we often assume that preferences are homothetic. This gives that indifference sets are related by proportional expansion along rays. • This means that we can deduce the consumer’s entire preference relation from a single indifference set.

More properties of gradients (secondary importance) • Consider a continuously differentiable function, f:Rn→R. The gradient of f (Df(x)) is a vector in Rn which points in the direction of greatest increase of f moving from the point x. • Define a (very small) vector v s.t. Df(x)’v=0 (ie v is orthogonal to the gradient). Then the vector v is moving us away from x in a direction that adds zero to the value of f(x), to first order. Thus, points a small step along v are at (approximately) the same level of f as x. So we have a method of finding the level sets of f(x): solve Df(x)’v=0. Also, v is tangent to the level set of f through x. • The direction of greatest increase of a function at a point x is at right angles to the level set at x.
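A quick numerical sketch of the orthogonality argument, using the illustrative function f(x,y) = x² + y² at the point (1,1) (both are assumptions chosen for this example). A small step along a direction orthogonal to the gradient changes f only to second order, while the same step along the gradient changes f to first order.

```python
def f(x, y):
    # Example function (illustrative): level sets are circles.
    return x * x + y * y

def grad_f(x, y):
    return (2 * x, 2 * y)

x0 = (1.0, 1.0)
g = grad_f(*x0)      # gradient at x0
v = (1.0, -1.0)      # chosen so that Df(x0)'v = 0
step = 1e-4

inner_gv = g[0] * v[0] + g[1] * v[1]

# Change in f from a small step along v (tangent to the level set).
change_tangent = f(x0[0] + step * v[0], x0[1] + step * v[1]) - f(*x0)
# Change in f from the same size step along the gradient.
change_gradient = f(x0[0] + step * g[0], x0[1] + step * g[1]) - f(*x0)

print(inner_gv, change_tangent, change_gradient)
```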

Upper contour sets • The level sets of a function are the sets of points which yield the same value of the function. Formally, for f:Rn→R the level set for the value c is {x:f(x)=c}. Eg: indifference curves are level sets of utility functions. • The upper contour set is the set of points on or above the level set, ie the set {x:f(x)≥c}.

Concave functions • For any two points x and y, we can trace out the line segment joining them through tx+(1-t)y, varying t between 0 and 1. Each such point is a convex combination of x and y. • A function f is concave if for all x, y and all t in [0,1]: f(tx+(1-t)y) ≥ tf(x)+(1-t)f(y), ie the line joining any two points on the graph lies (weakly) below the graph of the function between those two points. • A function is strictly concave if the inequality is strict for all x≠y and 0<t<1.

Convex functions • A function f is convex if for all x, y and all t in [0,1]: f(tx+(1-t)y) ≤ tf(x)+(1-t)f(y), ie the line joining any two points on the graph lies (weakly) above the graph of the function between the points. • A function is strictly convex if the inequality is strict for all x≠y and 0<t<1. • A function f is convex if and only if -f is concave. • The upper contour set of a concave function is a convex set. The lower contour set of a convex function is a convex set.

Concavity, convexity and second derivatives • If f:R→R and f is C2, then f is concave iff f’’(x)≤0 for all x. (If f’’(x)<0 for all x then f is strictly concave; this is sufficient but not necessary.) • If f:R→R and f is C2, then f is convex iff f’’(x)≥0 for all x. (If f’’(x)>0 for all x then f is strictly convex; again sufficient but not necessary, eg f(x)=x^4.)

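Concave functions and gradients • Any concave function lies below the tangent hyperplane defined by its gradient (or by any supergradient if f is not C1): f(y) ≤ f(x)+Df(x)’(y-x) for all x, y. • Any convex function lies above the tangent hyperplane defined by its gradient (or by any subgradient if f is not C1): f(y) ≥ f(x)+Df(x)’(y-x) for all x, y. • Graphically: function lies below/above line tangent to graph at any point.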

Negative and positive (semi-) definite • Consider any square symmetric matrix A. • A is negative semi-definite if x’Ax≤0 for all x. If in addition x’Ax=0 implies that x=0, then A is negative definite. • A is positive semi-definite if x’Ax≥0 for all x. If in addition x’Ax=0 implies that x=0, then A is positive definite.

Principal minors and nsd/psd • Let A be an nxn square matrix. The k’th order leading principal minor of A is the determinant of the kxk matrix obtained by deleting the last n-k rows and columns. • An nxn square symmetric matrix is positive definite if and only if its n leading principal minors are all strictly positive. • An nxn square symmetric matrix is negative definite if and only if its n leading principal minors alternate in sign, starting with a11 < 0 (so the k’th is negative for odd k and positive for even k). • [There are analogous conditions for getting nsd/psd, which require checking all principal minors, not just the leading ones.]
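The leading principal minor test can be sketched in pure Python for small matrices (illustrative only: the helper names are made up here, and the cofactor-expansion determinant is fine for the 2x2 and 3x3 cases in this course but far too slow for large matrices).

```python
def det(m):
    # Determinant by cofactor expansion along the first row.
    n = len(m)
    if n == 1:
        return m[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det(minor)
    return total

def leading_principal_minors(a):
    # k'th minor: determinant of the top-left k x k block.
    return [det([row[:k] for row in a[:k]]) for k in range(1, len(a) + 1)]

def classify(a):
    # Assumes a is square and symmetric.
    minors = leading_principal_minors(a)
    if all(d > 0 for d in minors):
        return "positive definite"
    if all((d < 0) if k % 2 == 1 else (d > 0)
           for k, d in enumerate(minors, start=1)):
        return "negative definite"
    return "inconclusive from leading principal minors"

print(classify([[2, -1], [-1, 2]]))
print(classify([[-2, 1], [1, -2]]))
print(classify([[1, 2], [2, 1]]))
```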

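Reminder: determinant of a 3x3 matrix • You won’t have to take the determinant of a matrix bigger than 3x3 without a computer, but for a 3x3 matrix A with entries aij (row i, column j): det(A) = a11(a22a33-a23a32) - a12(a21a33-a23a31) + a13(a21a32-a22a31).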

Concavity/convexity and nd/pd • An easy way to identify whether a function is convex or concave is from the Hessian matrix. • Suppose f:Rn→R is C2. Then: • f is strictly concave if the Hessian matrix is negative definite for all x (sufficient, not necessary). • f is concave iff the Hessian matrix is negative semi-definite for all x. • f is strictly convex if the Hessian matrix is positive definite for all x (sufficient, not necessary). • f is convex iff the Hessian matrix is positive semi-definite for all x.

Quasi-concavity • A function is quasi-concave if f(tx + (1-t)y) ≥ min{f(x),f(y)} for all x,y in Rn, 0≤t≤1. • Alternatively: a function is quasi-concave if its upper contour sets are convex sets. • A function is strictly quasi-concave if in addition f(tx + (1-t)y)=min{f(x),f(y)} for 0<t<1 implies that x=y. • All concave functions are quasi-concave (but not vice versa). • [Why quasi-concavity? A strictly quasi-concave function has at most one maximizer on a convex set, so any maximum is unique.]
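To see that quasi-concavity, f(tx+(1-t)y) ≥ min{f(x),f(y)}, is weaker than concavity, the two inequalities can be tested on sample points. This is a sketch: the helper names and the example functions x³ and x² are chosen for illustration, and a sample check can only refute a property, not prove it.

```python
def is_quasiconcave_on_samples(f, points, ts):
    # Check f(t*x + (1-t)*y) >= min(f(x), f(y)) on all sampled triples.
    for x in points:
        for y in points:
            for t in ts:
                if f(t * x + (1 - t) * y) < min(f(x), f(y)):
                    return False
    return True

def is_concave_on_samples(f, points, ts):
    # Check f(t*x + (1-t)*y) >= t*f(x) + (1-t)*f(y) on all sampled triples.
    for x in points:
        for y in points:
            for t in ts:
                if f(t * x + (1 - t) * y) < t * f(x) + (1 - t) * f(y):
                    return False
    return True

points = [-2.0, -1.0, 0.0, 1.0, 2.0]
ts = [0.0, 0.25, 0.5, 0.75, 1.0]

def cube(x):
    return x ** 3

def square(x):
    return x ** 2

cube_qc = is_quasiconcave_on_samples(cube, points, ts)
cube_concave = is_concave_on_samples(cube, points, ts)
square_qc = is_quasiconcave_on_samples(square, points, ts)

print(cube_qc, cube_concave, square_qc)
```

The increasing function x³ passes the quasi-concavity check but fails the concavity check, while x² fails quasi-concavity (take x=-2, y=2, t=0.5).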

Quasi-convexity • A function is quasi-convex if f(tx + (1-t)y) ≤ max{f(x),f(y)} for all x,y in Rn, 0≤t≤1. • Alternatively: a function is quasi-convex if its lower contour sets are convex sets. • A function is strictly quasi-convex if in addition f(tx + (1-t)y)=max{f(x),f(y)} for 0<t<1 implies that x=y. • All convex functions are quasi-convex (but not vice versa). • [Why quasi-convexity? A strictly quasi-convex function has at most one minimizer on a convex set, so any minimum is unique.]

Bordered Hessian • The bordered Hessian matrix H is just the Hessian matrix bordered by the gradient (the Jacobian) and its transpose, with a zero in the top-left corner: H = [0, Df(x)’; Df(x), D2f(x)], an (n+1)x(n+1) matrix. • If the leading principal minors of H from k=3 onwards alternate in sign with the first lpm>0, then f is quasi-concave. If they are all negative, then f is quasi-convex.

Concavity and monotonic transformations • (Not in the lecture notes, but useful for solving some of the problem set problems.) • The sum of two concave functions is concave (proof in PS2). • Any monotonically increasing transformation of a concave function is quasi-concave (though not necessarily concave). Formally, if h(x)=g(f(x)), where f is concave and g is monotonically increasing, then h is quasi-concave. • Useful trick: ln(x) is a monotonically increasing transformation (for x>0).

Unconstrained optimization • If x* is a solution to the problem max f(x) over x in Rn, what can we say about characteristics of x*? • A point x is a global maximum of f if for all x’ in Rn, f(x) ≥ f(x’). • A point x is a local maximum of f if there exists an open ball of positive radius around x, Bε(x) s.t. for all x’ in the ball, f(x) ≥ f(x’). • If x is a global maximum then it is a local maximum (but not necessarily vice versa). • If f is C1 and x is a local maximum of f, then the gradient of f at x is 0. [Necessary but not sufficient.] This is the direct extension of the single dimension case.

Unconstrained optimization 2 • Intuition: near a local maximum, a smooth function typically looks concave. Precisely, if f is C2 and the Hessian at a local maximum x is negative definite, then there is an open ball around x, Bε(x) s.t. f is concave on Bε(x). • Likewise, if f is C2 and the Hessian at a local minimum x is positive definite, then there is an open ball around x, Bε(x) s.t. f is convex on Bε(x). • Suppose f is C2. If x is a local maximum, then the Hessian of f at x is negative semi-definite. • Suppose f is C2. If x is a local minimum, then the Hessian of f at x is positive semi-definite. • To identify a global max, we either solve for all local maxima and then compare them, or look for additional features of f that guarantee that any local max is global.

Unconstrained optimization 3 • If f:Rn→R is concave and C1, then Df(x)=0 implies that x is a global maximum of f. (And x being a global maximum implies that the gradient is zero.) So for concave C1 functions, Df(x)=0 is both a necessary and sufficient condition. • In general, we only really look at maximization, since all minimization problems can be turned into maximization problems by looking at -f. • x solves max f(x) if and only if x solves min -f(x).
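A minimal sketch of the sufficiency result for concave functions. The quadratic below is an assumed example, not from the notes: its gradient is zero at (1,-2), and no point on a coarse grid does better, consistent with (though of course not a proof of) (1,-2) being the global maximum.

```python
def f(x1, x2):
    # Concave quadratic, chosen only for illustration.
    return -(x1 - 1.0) ** 2 - 2.0 * (x2 + 2.0) ** 2

def grad_f(x1, x2):
    # Analytical gradient of f.
    return (-2.0 * (x1 - 1.0), -4.0 * (x2 + 2.0))

# Candidate from the first-order condition Df(x) = 0.
x_star = (1.0, -2.0)
gradient_at_x_star = grad_f(*x_star)

# Compare against every point on a grid from -5 to 5 in steps of 0.5.
grid = [i / 2 for i in range(-10, 11)]
best_on_grid = max(f(a, b) for a in grid for b in grid)

print(gradient_at_x_star, f(*x_star), best_on_grid)
```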

Non-differentiable functions (secondary importance) • In economics, we rarely have to deal with non-differentiable functions; normally we assume these away. • The superdifferential of a concave function f at a point x* is the set of all supporting hyperplanes of the graph of f at the point (x*,f(x*)). • A supergradient of a function f at a point x* is an element of the superdifferential of f at x*. • If x* is an unconstrained local maximum of a concave function f:Rn→R, then the vector of n zeros must be an element of the superdifferential of f at x*. • [And equivalently subdifferential, subgradient, local minimum for convex functions.]

Constrained optimization • General form of constrained optimization: max f(x) s.t. x is in C, where C is the constraint set. • Normally we write the constraint by writing out restrictions (eg x ≥ 1) rather than using set notation. • Sometimes (for equality constraints) it is more convenient to solve problems by substituting the constraint(s) into the objective function, and so solving an unconstrained optimization problem. • Most common restrictions: equality or inequality constraints. • Eg: Manager trying to induce worker to provide optimal effort (moral hazard contract).

Constrained optimization 2 • No reason why we can only have one restriction. We can have any number of constraints, which may be of any form. Most typically we use equality and inequality constraints; these are easier to solve analytically than constraints that x belong to some general set. • These restrictions define the constraint set. • Most general notation, while using only inequality constraints: max f(x) s.t. G(x) ≤ 0, where G(x) is an mx1 vector of inequality constraints (m is the number of constraints). The direction of the inequality is a convention; G(x) ≤ 0 is assumed here. • Eg: For the restrictions 3x1+x2≤10, x1≥2, under that convention we have G(x) = (3x1+x2-10, 2-x1)’.

Constrained optimization 3 • We will need limitations on the constraint set to guarantee existence of a solution (Weierstrass’ theorem). • What can happen if the constraint set is not convex, or not closed? (examples) • Denoting constraint sets: {x in Rn: f(x) ≥ c} characterizes all values of x in Rn where f(x) ≥ c.

General typology of constrained maximization • Unconstrained maximization. C is just the whole vector space that x lies in (usually Rn). We know how to solve these. • Lagrange Maximization problems. Here the constraint set is defined solely by equality constraints. • Linear programming problems. Not covered in this course. • Kuhn-Tucker problems. These involve inequality constraints. Sometimes we also allow equality constraints, but we focus on inequality constraints. (Any problem with equality constraints could be transformed by substitution to deal only with inequality constraints.)

Lagrange problems • Covered briefly here, mostly to compare and contrast with Kuhn-Tucker. • Canonical Lagrange problem (CL) is of form: max f(x) s.t. G(x)=0, where x is in Rn and G(x) is an mx1 vector of equality constraints. • Often we have a problem with inequality constraints, but we can use economic logic to show that at our solution the constraints will bind, and so we can solve the problem as if we had equality constraints. • Eg: Consumer utility maximization; if the utility function is increasing in all goods, then the consumer will spend all income. So the budget constraint px≤w becomes px=w.

Lagrange problems 2 • Lagrange theorem: in the canonical Lagrange problem (CL) above, suppose that f and G are C1 and suppose that the nxm matrix DG(x*) has rank m. Then if x* solves CL, there exists a vector λ* in Rm such that Df(x*) + DG(x*)λ* = 0. Ie, for each i=1,…,n: ∂f(x*)/∂xi + Σj λj* ∂gj(x*)/∂xi = 0, where the sum runs over the m constraints j=1,…,m. • This is just a general form of writing what we know from solving Lagrange problems: we get n FOCs that all equal zero at the solution. • The rank m requirement is called the “constraint qualification”; we will come back to this with Kuhn-Tucker. Note that the theorem gives a necessary (not sufficient) condition for x* to solve CL.

Basic example: • max f(x1,x2) s.t. g1(x1,x2) = c1, g2(x1,x2) = c2 • L = f(x1,x2)+λ1(g1(x1,x2)-c1)+λ2(g2(x1,x2)-c2) • FOCs: x1: f1(x1,x2) + λ1g11(x1,x2) + λ2g21(x1,x2) = 0; x2: f2(x1,x2) + λ1g12(x1,x2) + λ2g22(x1,x2) = 0 • Plus constraints: λ1: g1(x1,x2) - c1 = 0; λ2: g2(x1,x2) - c2 = 0 • Here fi denotes ∂f/∂xi and gji denotes ∂gj/∂xi.
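A concrete one-constraint instance of the same recipe may help. The objective f(x1,x2) = x1x2 and the constraint x1+x2 = 2 are made up for this illustration; the Lagrangian follows the sign convention L = f + λ(g - c) used in the basic example.

```latex
\[
\begin{aligned}
&\max_{x_1,x_2}\; x_1 x_2 \quad \text{s.t.} \quad x_1 + x_2 = 2 \\
&L = x_1 x_2 + \lambda\,(x_1 + x_2 - 2) \\
&x_1:\; x_2 + \lambda = 0 \\
&x_2:\; x_1 + \lambda = 0 \\
&\lambda:\; x_1 + x_2 - 2 = 0 \\
&\Rightarrow\; x_1^* = x_2^* = 1, \quad \lambda^* = -1
\end{aligned}
\]
```

The first two FOCs give x1 = x2, and the constraint then pins down x1 = x2 = 1. Substituting x2 = 2 - x1 into the objective gives x1(2 - x1), which is concave in x1, so this stationary point is indeed a maximum.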
