## Information Geometry: Duality, Convexity, and Divergences

Jun Zhang (currently on leave to AFOSR under IPA), University of Michigan, Ann Arbor, Michigan 48104, junz@umich.edu

**Clarify two senses of duality in information geometry**

- Reference duality: choice of the reference vs. comparison point on the manifold.
- Representational duality: choice of a monotonic scaling of the density function.

**Lecture Plan**

1) A revisit to Bregman divergence
2) Generalization (α-divergence on $\mathbb{R}^n$) and α-Hessian geometry
3) Embedding into infinite-dimensional function space
4) Generalized Fisher metric and α-connection on Banach space

**Bregman Divergence**

i) Quadrilateral relation, with the triangular relation (generalized cosine) as a special case.
ii) Reference-representation biduality.

**Canonical Divergence and Fenchel Inequality**

An alternative expression of the Bregman divergence is the canonical divergence $A$. That $A$ is non-negative is a direct consequence of the Fenchel inequality for a strictly convex function $\Phi$ with conjugate $\Phi^*$: $\Phi(x) + \Phi^*(u) - \langle x, u \rangle \ge 0$, where equality holds if and only if $u = \nabla\Phi(x)$.

**Convex Inequality and α-Divergence Induced by It**

The α-divergence $D^{(\alpha)}$ is built from the inequality that defines a strictly convex function $\Phi$. It is easy to show that the resulting quantity is non-negative. It satisfies conjugate-symmetry, along with further easily verifiable identities.

**Significance of Bregman Divergence Among α-Divergence Family**

Proposition: For a smooth function $\Phi: \mathbb{R}^n \to \mathbb{R}$, the following are equivalent:

**Statistical Manifold Structure Induced From Divergence Function (Eguchi, 1983)**

Given a divergence $D(x,y)$ with $D(x,x)=0$, one can derive the Riemannian metric and a pair of conjugate connections by expanding $D(x,y)$ around $x=y$:

i) 2nd order: one (and the same) metric
ii) 3rd order: a pair of conjugate connections

In essence, the relation tying the metric to the pair of conjugate connections is satisfied by such identification of derivatives of $D$.

**α-Hessian Geometry (of Finite-Dimensional Vector Space)**

Theorem. $D^{(\alpha)}$ induces the α-Hessian manifold, i.e.

i) the metric and conjugate affine connections are given explicitly;
ii) the Riemann curvature is given explicitly;
iii) the manifold is equi-affine, with an explicit Tchebychev potential and α-parallel volume form;
iv) there exist biorthogonal coordinates.

**A General Divergence Function(al): From Vector Space to Function Space**

Question: How to extend the above analysis to infinite-dimensional function space?

The divergence functional is defined for any two functions in some function space and an arbitrary, strictly increasing function ρ.

Remark: Induced by convex inequality.

**A Special Case of $D^{(\alpha)}$: Classic α-Divergence**

For parameterized pdfs, such divergence induces an α-independent metric, but α-dependent dual connections.

**Other Examples of $D^{(\alpha)}$**

- Jensen difference
- U-divergence (α = 1)

**A Short Detour: Monotone Scaling**

Define the monotone embedding (“scaling”) of a measurable function $p$ as the transformation $\rho(p)$, where ρ is a strictly monotone function. Observe:

i) ρ is strictly monotone iff $\rho^{-1}$ is strictly monotone;
ii) $\rho(t) = t$ is the identity element;
iii) if $\rho_1$ and $\rho_2$ are strictly monotone, so is their composition.

Therefore, monotone embeddings of a given probability density function form a group, with functional composition as the group operation.

DEFINITION: Recall that a strictly convex function $f$ has a conjugate $f^*$. A ρ-embedding is said to be conjugated to a τ-embedding with respect to $f$ if:

Example: α-embedding

**Parameterized Functions as Forming a Submanifold under Monotone Scaling**

A submanifold is said to be ρ-affine if there exists a countable set of linearly independent functions $\lambda_i(\zeta)$ over a measurable space such that:

Here, θ is called the “natural parameter”. The “expectation parameter” η is defined by projecting the conjugated τ-embedding onto the $\lambda_i(\zeta)$:

Example: For the log-linear model (exponential family), the expectation parameter is:

i) The following potential function is strictly convex:

$\Phi(\theta)$ is called the generating (partition) functional.

ii) Define the corresponding functional under the conjugate representations; it is then the Fenchel conjugate of $\Phi$.
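The conjugacy between the generating functional and its Fenchel conjugate can be checked numerically on a toy log-linear model. The sketch below is only an illustration under stated assumptions, since the slide formulas did not survive: it assumes $\rho = \log$, $f = \exp$, the conjugacy relation $\tau = f' \circ \rho$ (so $\tau(p) = p$), $f^*(t) = t \log t - t$, and a hypothetical four-point sample space with counting measure and two made-up feature functions.

```python
import numpy as np

# Hypothetical setup (not from the slides): a 4-point sample space with
# counting measure; row z holds the feature values (lambda_1(z), lambda_2(z)).
features = np.array([[1.0, 0.0],
                     [0.0, 1.0],
                     [1.0, 1.0],
                     [0.5, -1.0]])

def density(theta):
    """Unnormalized density whose log (the rho-embedding) is affine in theta."""
    return np.exp(features @ theta)

def generating_functional(theta):
    """Assumed form: Phi(theta) = sum_z f(rho(p(z))) with f = exp."""
    return density(theta).sum()

def expectation_parameter(theta):
    """Assumed form: eta_i = sum_z tau(p(z)) * lambda_i(z), with tau(p) = p."""
    return features.T @ density(theta)

def generalized_entropy(theta):
    """Assumed form: Phi*(eta) = sum_z f*(tau(p(z))), f*(t) = t log t - t."""
    p = density(theta)
    return np.sum(p * np.log(p) - p)

theta = np.array([0.3, -0.7])
eta = expectation_parameter(theta)

# eta should coincide with the gradient of Phi (central finite differences).
h = 1e-6
grad = np.array([
    (generating_functional(theta + h * e) - generating_functional(theta - h * e)) / (2 * h)
    for e in np.eye(2)
])

# Fenchel equality at conjugate points: Phi(theta) + Phi*(eta) - <theta, eta> = 0.
gap_conjugate = generating_functional(theta) + generalized_entropy(theta) - theta @ eta

# Fenchel inequality at a non-conjugate pair: the same expression is positive.
theta_other = np.array([-0.2, 0.4])
gap_other = generating_functional(theta_other) + generalized_entropy(theta) - theta_other @ eta

print(grad, eta, gap_conjugate, gap_other)
```

Under these assumptions the finite-difference gradient matches η, the gap vanishes when η is the expectation parameter of θ, and it is strictly positive otherwise.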
$\Phi^*(\eta)$ is called the generalized entropy functional.

Proposition. For the ρ-affine submanifold:

Theorem. The ρ-affine submanifold is an α-Hessian manifold.

**An Application: the (α,β)-Divergence**

Take f = ρ-(β), where the embedding previously called the “alpha-embedding” is now indexed by β.

- α: parameter reflecting reference duality
- β: parameter reflecting representation duality

They reduce to the α-divergence proper $A^{(\alpha)}$ and to the Jensen difference $E^{(\alpha)}$:

**Information Geometry on Banach Space**

Proposition 1. Denote tangent vector fields which are, at a given $p$ on the manifold, themselves functions in Banach space. The metric and dual connections induced by the divergence take the forms:

Written in dually symmetric form:

Corollary 1a. For a finite-dimensional submanifold (parametric model), the metric and dual connections associated with the divergence are given by:

Remark: Under a particular choice, these reduce to the forms of the Fisher metric and the α-connections in classical parametric information geometry.

Remark: The ambient space $B$ is flat, so it embeds, as proper submanifolds,

- the manifold $M_\mu$ of probability density functions (constrained to be positive-valued and normalized to unit measure);
- the finite-dimensional manifold $M_\theta$ of parameterized probability models.

[Figure: $M_\theta$, $M_\mu$, and $B$ (ambient manifold)]

Proposition 2. The curvature $R^{(\alpha)}$ and torsion $T^{(\alpha)}$ tensors associated with any α-connection on the infinite-dimensional function space $B$ are identically zero.

CAVEAT: Topology? (G. Pistone and his colleagues)

Proposition 3. The (α,β)-divergence for the parametric models gives rise to the Fisher metric proper and alpha-connections proper:

Remark: The (α,β)-divergence is the homogeneous f-divergence. As such, it should reproduce the standard Fisher metric and the dual alpha-connections in their proper form.
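The claim that such divergences reproduce the Fisher metric, independently of α, can be probed with a small numerical experiment. The sketch below is an assumption-laden illustration rather than the lecture's own construction: it uses the classic α-divergence in a commonly used form (not the (α,β)-divergence, whose formula is missing here), a Bernoulli family as a hypothetical parametric model, and the Eguchi-style identification of the metric with the negative mixed second derivative of the divergence on the diagonal.

```python
import numpy as np

def alpha_divergence(p, q, alpha):
    """Classic alpha-divergence between positive vectors (assumed form, alpha != +-1)."""
    a, b = (1 - alpha) / 2, (1 + alpha) / 2
    return 4 / (1 - alpha**2) * np.sum(a * p + b * q - p**a * q**b)

def bernoulli(theta):
    """Hypothetical one-parameter model: probabilities (theta, 1 - theta)."""
    return np.array([theta, 1 - theta])

def metric_from_divergence(theta, alpha, h=1e-3):
    """g = -d/dx d/dy D(p_x, p_y) at x = y = theta, by central differences."""
    def D(x, y):
        return alpha_divergence(bernoulli(x), bernoulli(y), alpha)
    mixed = (D(theta + h, theta + h) - D(theta + h, theta - h)
             - D(theta - h, theta + h) + D(theta - h, theta - h)) / (4 * h**2)
    return -mixed

theta = 0.3
fisher = 1 / (theta * (1 - theta))  # Fisher information of a Bernoulli model
estimates = {alpha: metric_from_divergence(theta, alpha) for alpha in (-0.5, 0.0, 0.5)}

for alpha, g in estimates.items():
    print(f"alpha={alpha:+.1f}  metric estimate={g:.4f}  Fisher={fisher:.4f}")
```

Each estimate should agree with the Fisher information up to finite-difference error, whatever the value of α. The α-dependence would only show up at third order, in the connections.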
Again, it is αβ that takes the role of the conventional “alpha” parameter.

**Summary of Current Approach**

Divergence:

- α-divergence: equivalent to δ-divergence (Zhu & Rohwer, 1985); includes KL divergence as a special case
- f-divergence (Csiszár)
- Bregman divergence: equivalent to the canonical divergence
- U-divergence (Eguchi)

Geometry:

- Riemannian metric: Fisher information
- Conjugate connections: α-connection family
- Equi-affine structure: cubic form, Tchebychev 1-form
- Curvature

Convex-based α-divergence for:

- vector space of finite dimension
- function space of infinite dimension

Generalized expressions of the Fisher metric and α-connections.

**References**

- Zhang, J. (2004). Divergence function, duality, and convex analysis. Neural Computation, 16, 159-195.
- Zhang, J. (2005). Referential duality and representational duality in the scaling of multidimensional and infinite-dimensional stimulus space. In E. Dzhafarov and H. Colonius (Eds.), Measurement and representation of sensations: Recent progress in psychological theory. Mahwah, NJ: Lawrence Erlbaum Associates.
- Zhang, J. and Hästö, P. (2006). Statistical manifold as an affine space: A functional equation approach. Journal of Mathematical Psychology, 50, 60-65.
- Zhang, J. (2006). Referential duality and representational duality on statistical manifolds. Proceedings of the Second International Symposium on Information Geometry and Its Applications, Tokyo (pp. 58-67).
- Zhang, J. (2007). A note on curvature of α-connections of a statistical manifold. Annals of the Institute of Statistical Mathematics, 59, 161-170.
- Zhang, J. and Matsuzoe, H. (in press). Dualistic differential geometry associated with a convex function. To appear in a special volume in the Springer series Advances in Mechanics and Mathematics.
- Zhang, J. (under review). Nonparametric information geometry: Referential duality and representational duality on statistical manifolds.
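The relation between the convex-based α-divergence on a finite-dimensional vector space and the Bregman divergence, from the first part of the lecture, can also be explored numerically. The sketch below is a minimal illustration under assumptions: the α-divergence is taken in the form $\frac{4}{1-\alpha^2}\left[\frac{1-\alpha}{2}\Phi(x) + \frac{1+\alpha}{2}\Phi(y) - \Phi\left(\frac{1-\alpha}{2}x + \frac{1+\alpha}{2}y\right)\right]$, which is assumed here because the slide's formula did not survive, and the potential `phi` is an arbitrary strictly convex example chosen for the demonstration.

```python
import numpy as np

def phi(x):
    """Example strictly convex potential (assumption): log-sum-exp plus a quadratic."""
    return np.log(np.sum(np.exp(x))) + 0.5 * x @ x

def grad_phi(x):
    """Gradient of phi: softmax(x) + x."""
    w = np.exp(x - x.max())
    return w / w.sum() + x

def bregman(x, y):
    """Bregman divergence B(x, y) = phi(x) - phi(y) - <x - y, grad phi(y)>."""
    return phi(x) - phi(y) - (x - y) @ grad_phi(y)

def convex_alpha_divergence(x, y, alpha):
    """Assumed form of the alpha-divergence induced by the convex inequality."""
    a, b = (1 - alpha) / 2, (1 + alpha) / 2
    return 4 / (1 - alpha**2) * (a * phi(x) + b * phi(y) - phi(a * x + b * y))

x = np.array([0.2, -1.0, 0.5])
y = np.array([1.0, 0.3, -0.4])

# Near alpha = +1 and alpha = -1 the divergence approaches the two Bregman divergences.
near_plus_one = convex_alpha_divergence(x, y, 1 - 1e-6)
near_minus_one = convex_alpha_divergence(x, y, -1 + 1e-6)
print(near_plus_one, bregman(x, y))
print(near_minus_one, bregman(y, x))

# Swapping the arguments together with the sign of alpha leaves the value unchanged.
print(convex_alpha_divergence(x, y, 0.3), convex_alpha_divergence(y, x, -0.3))

# Non-negativity, including values of alpha outside (-1, 1).
values = [convex_alpha_divergence(x, y, a) for a in (-3.0, -0.5, 0.0, 0.5, 3.0)]
print(values)
```

In this form the Bregman divergence appears as the limiting member of the family at α = ±1, with the two limits differing only in which point serves as the reference.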