Information Geometry: Duality, Convexity, and Divergences

##### Presentation Transcript

1. Information Geometry: Duality, Convexity, and Divergences. Jun Zhang*, University of Michigan, Ann Arbor, Michigan 48104, junz@umich.edu. *Currently on leave to AFOSR under IPA.

2. Goal: clarify two senses of duality in information geometry. Reference duality: the choice of the reference vs. the comparison point on the manifold. Representational duality: the choice of a monotonic scaling of the density function. Lecture plan: 1) Bregman divergence revisited; 2) Generalization (α-divergence on ℝⁿ) and α-Hessian geometry; 3) Embedding into infinite-dimensional function space; 4) Generalized Fisher metric and α-connection on Banach space.

3. Bregman Divergence. i) Quadrilateral relation, with the triangular relation (generalized cosine) as a special case. ii) Reference-representation biduality.
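The formulas on this slide did not survive extraction, so the following is a minimal sketch using the standard textbook form B(x, y) = Φ(x) − Φ(y) − ⟨x − y, ∇Φ(y)⟩. The potential Φ(x) = Σ exp(xᵢ), its conjugate, and the test points are assumptions chosen for illustration, not taken from the talk. The sketch numerically checks a four-point (quadrilateral) identity, its three-point special case, and the biduality B_Φ(x, y) = B_Φ*(∇Φ(y), ∇Φ(x)).

```python
import math

def phi(x):
    # Assumed example potential (not from the slides): Phi(x) = sum_i exp(x_i)
    return sum(math.exp(t) for t in x)

def grad_phi(x):
    return [math.exp(t) for t in x]

def phi_star(v):
    # Convex conjugate of the example Phi, defined for v_i > 0
    return sum(t * math.log(t) - t for t in v)

def grad_phi_star(v):
    return [math.log(t) for t in v]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sub(u, v):
    return [a - b for a, b in zip(u, v)]

def bregman(x, y, pot=phi, grad=grad_phi):
    # B(x, y) = pot(x) - pot(y) - <x - y, grad pot(y)>
    return pot(x) - pot(y) - dot(sub(x, y), grad(y))

# Arbitrary test points (hypothetical values)
x, y, u, v = [0.3, -1.0], [1.2, 0.5], [-0.4, 0.8], [0.1, 0.1]

# Quadrilateral relation: B(x,y) + B(u,v) - B(x,v) - B(u,y) = <x - u, grad(v) - grad(y)>
quad_lhs = bregman(x, y) + bregman(u, v) - bregman(x, v) - bregman(u, y)
quad_rhs = dot(sub(x, u), sub(grad_phi(v), grad_phi(y)))

# Triangular relation (set v = u): B(x,u) + B(u,y) - B(x,y) = <x - u, grad(y) - grad(u)>
tri_lhs = bregman(x, u) + bregman(u, y) - bregman(x, y)
tri_rhs = dot(sub(x, u), sub(grad_phi(y), grad_phi(u)))

# Biduality: swap the arguments and move to conjugate coordinates
dual = bregman(grad_phi(y), grad_phi(x), pot=phi_star, grad=grad_phi_star)

print(quad_lhs - quad_rhs, tri_lhs - tri_rhs, bregman(x, y) - dual)
```

All three printed differences should vanish up to floating-point rounding.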

4. Canonical Divergence and Fenchel Inequality. An alternative expression of the Bregman divergence is the canonical divergence A, written in terms of the convex function and its conjugate. That A is non-negative is a direct consequence of the Fenchel inequality for a strictly convex function, where equality holds if and only if the second argument equals the gradient of the convex function at the first.
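As a hedged illustration, the canonical divergence can be written in the standard form A(x, v) = Φ(x) + Φ*(v) − ⟨x, v⟩, which is the Fenchel gap. The scalar potential below (softplus, whose gradient is the logistic sigmoid and whose conjugate is the negative binary entropy) is an assumed example, not one used in the talk.

```python
import math

def phi(x):
    # Assumed example potential: softplus, strictly convex on R
    return math.log1p(math.exp(x))

def grad_phi(x):
    # Logistic sigmoid, the derivative of softplus
    return 1.0 / (1.0 + math.exp(-x))

def phi_star(v):
    # Convex conjugate of softplus, defined for 0 < v < 1
    return v * math.log(v) + (1 - v) * math.log(1 - v)

def canonical(x, v):
    # Fenchel gap A(x, v) = Phi(x) + Phi*(v) - x * v >= 0
    return phi(x) + phi_star(v) - x * v

def bregman(x, y):
    return phi(x) - phi(y) - (x - y) * grad_phi(y)

x, y = 0.7, -1.3
gap_at_conjugate = canonical(x, grad_phi(x))            # equality case
gaps = [canonical(x, v) for v in (0.1, 0.3, 0.5, 0.9)]  # strict inequality
bregman_as_canonical = canonical(x, grad_phi(y))        # equals B(x, y)

print(gap_at_conjugate, gaps, bregman_as_canonical - bregman(x, y))
```

The gap is zero only at v = ∇Φ(x), and substituting v = ∇Φ(y) recovers the Bregman divergence B(x, y).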

5. Convex Inequality and the α-Divergence Induced by It. By the definition of a strictly convex function Φ, a convex inequality holds, from which it is easy to show that the induced α-divergence is always non-negative. Conjugate symmetry is easily verified.
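The sketch below assumes the α-divergence takes the form given in Zhang (2004), D^(α)(x, y) = 4/(1 − α²) · [ (1 − α)/2 · Φ(x) + (1 + α)/2 · Φ(y) − Φ((1 − α)/2 · x + (1 + α)/2 · y) ], since the slide's own formula was lost. The potential Φ(x) = Σ cosh(xᵢ) and the test points are hypothetical. It checks non-negativity (including |α| > 1), conjugate symmetry D^(α)(x, y) = D^(−α)(y, x), and the Bregman limits as α → ±1.

```python
import math

def phi(x):
    # Assumed example potential: Phi(x) = sum_i cosh(x_i), strictly convex
    return sum(math.cosh(t) for t in x)

def grad_phi(x):
    return [math.sinh(t) for t in x]

def alpha_divergence(x, y, alpha):
    # Scaled gap in the convex inequality, for alpha != +1, -1
    a, b = (1 - alpha) / 2, (1 + alpha) / 2
    mix = [a * s + b * t for s, t in zip(x, y)]
    return 4 / (1 - alpha ** 2) * (a * phi(x) + b * phi(y) - phi(mix))

def bregman(x, y):
    g = grad_phi(y)
    return phi(x) - phi(y) - sum((s - t) * d for s, t, d in zip(x, y, g))

x, y = [0.5, -0.2], [-1.0, 0.4]

values = {alpha: alpha_divergence(x, y, alpha) for alpha in (-3.0, -0.5, 0.0, 0.5, 3.0)}
symmetry_gap = alpha_divergence(x, y, 0.5) - alpha_divergence(y, x, -0.5)
near_plus_one = alpha_divergence(x, y, 1 - 1e-6)    # approaches B(x, y)
near_minus_one = alpha_divergence(x, y, -1 + 1e-6)  # approaches B(y, x)

print(values, symmetry_gap, near_plus_one - bregman(x, y), near_minus_one - bregman(y, x))
```

Under this convention the Bregman divergence appears as the boundary case of the family, which is one way to read the "significance" claimed on the next slide.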

6. Significance of Bregman Divergence Among the α-Divergence Family. Proposition: For a smooth function Φ: ℝⁿ → ℝ, the following are equivalent:

7. Statistical Manifold Structure Induced From a Divergence Function (Eguchi, 1983). Given a divergence D(x,y) with D(x,x)=0, one can derive the Riemannian metric and a pair of conjugate connections by expanding D(x,y) around x=y: i) 2nd order: one (and the same) metric; ii) 3rd order: a pair of conjugate connections. In essence, the compatibility condition between the metric and the conjugate connections is satisfied by such identification of derivatives of D.
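For reference, the identification of derivatives mentioned on this slide is usually written as follows. This is the standard form of Eguchi's relations, given here as a sketch because the slide's own equations were not preserved; the index placement and sign conventions are assumptions that may differ from the talk.

```latex
\begin{align*}
g_{ij}(x) &= -\left.\frac{\partial^2 D(x,y)}{\partial x^i\,\partial y^j}\right|_{y=x}, \\
\Gamma_{ij,k}(x) &= -\left.\frac{\partial^3 D(x,y)}{\partial x^i\,\partial x^j\,\partial y^k}\right|_{y=x}, \\
\Gamma^{*}_{ij,k}(x) &= -\left.\frac{\partial^3 D(x,y)}{\partial y^i\,\partial y^j\,\partial x^k}\right|_{y=x}, \\
\partial_k\, g_{ij} &= \Gamma_{ki,j} + \Gamma^{*}_{kj,i}.
\end{align*}
```

The last line is the conjugacy condition; it follows by differentiating the first line along the diagonal y = x and using the symmetry of each connection in its first two indices.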

8. α-Hessian Geometry (of Finite-Dimensional Vector Space). Theorem. D^(α) induces the α-Hessian manifold, with i) the metric and conjugate affine connections and ii) the Riemann curvature given explicitly in terms of derivatives of Φ.
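The explicit formulas are missing from the transcript. Assuming the α-divergence form from Zhang (2004) and Eguchi's identification g = −∂ₓ∂ᵧD and Γ = −∂ₓ²∂ᵧD on the diagonal, a direct calculation gives g = Φ″, Γ^(α) = (1 − α)/2 · Φ‴ and Γ*^(α) = (1 + α)/2 · Φ‴ in one dimension. The sketch below checks this by finite differences for the hypothetical choice Φ = exp; the step size and evaluation point are arbitrary.

```python
import math

def phi(x):
    # Assumed 1-D example potential; all derivatives equal exp(x)
    return math.exp(x)

def d_alpha(x, y, alpha):
    a, b = (1 - alpha) / 2, (1 + alpha) / 2
    return 4 / (1 - alpha ** 2) * (a * phi(x) + b * phi(y) - phi(a * x + b * y))

def metric(D, x, h=1e-2):
    # g = -(d/dx)(d/dy) D at y = x, by a central mixed difference
    return -(D(x + h, x + h) - D(x + h, x - h)
             - D(x - h, x + h) + D(x - h, x - h)) / (4 * h * h)

def connection(D, x, h=1e-2):
    # Gamma = -(d/dx)^2 (d/dy) D at y = x
    def dxx(y):
        return (D(x + h, y) - 2 * D(x, y) + D(x - h, y)) / (h * h)
    return -(dxx(x + h) - dxx(x - h)) / (2 * h)

def dual_connection(D, x, h=1e-2):
    # Gamma* = -(d/dy)^2 (d/dx) D at y = x: same stencil with arguments swapped
    return connection(lambda s, t: D(t, s), x, h)

x0 = 0.3
results = {}
for alpha in (-0.5, 0.0, 0.5):
    D = lambda s, t, al=alpha: d_alpha(s, t, al)
    results[alpha] = (metric(D, x0), connection(D, x0), dual_connection(D, x0))

print(results)
```

The metric comes out the same for every α, while the two connections split Φ‴ in the proportions (1 − α)/2 and (1 + α)/2.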

9. iii) The manifold is equi-affine, with the Tchebychev potential and the α-parallel volume form given explicitly. iv) There exist biorthogonal coordinates.

10. From Vector Space to Function Space: A General Divergence Function(al). Question: How to extend the above analysis to infinite-dimensional function space? A general divergence functional is defined for any two functions in some function space and an arbitrary, strictly increasing function ρ. Remark: it is induced by the convex inequality.
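The functional itself is not preserved in the transcript. One form consistent with the finite-dimensional construction, obtained by applying the convex inequality pointwise and integrating, is sketched below. The strictly convex function f, the measure μ, and the subscript notation are assumptions here; the published papers in the reference list should be consulted for the authoritative definition.

```latex
\[
D^{(\alpha)}_{f,\rho}(p,q)
= \frac{4}{1-\alpha^{2}} \int \left\{
\frac{1-\alpha}{2}\, f\bigl(\rho(p)\bigr)
+ \frac{1+\alpha}{2}\, f\bigl(\rho(q)\bigr)
- f\!\left( \frac{1-\alpha}{2}\,\rho(p) + \frac{1+\alpha}{2}\,\rho(q) \right)
\right\} d\mu
\]
```

With f = exp and ρ = log, the integrand becomes the familiar expression involving p^{(1−α)/2} q^{(1+α)/2}.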

11. A Special Case of D^(α): Classic α-Divergence. For parameterized pdfs, such a divergence induces an α-independent metric but α-dependent dual connections.
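A minimal sketch of this special case on a finite sample space with counting measure. It assumes the general functional has the convex-inequality form with a strictly convex f and increasing ρ, and that the classic α-divergence corresponds to f = exp, ρ = log; the probability vectors are hypothetical. It also checks the Kullback-Leibler limits under the sign convention used here.

```python
import math

def general_divergence(p, q, alpha, f, rho):
    # Assumed form: scaled gap of the convex inequality for f, applied to rho(p), rho(q)
    a, b = (1 - alpha) / 2, (1 + alpha) / 2
    total = 0.0
    for pi, qi in zip(p, q):
        rp, rq = rho(pi), rho(qi)
        total += a * f(rp) + b * f(rq) - f(a * rp + b * rq)
    return 4 / (1 - alpha ** 2) * total

def classic_alpha_divergence(p, q, alpha):
    # Closed form of the classic alpha-divergence for alpha != +1, -1
    a, b = (1 - alpha) / 2, (1 + alpha) / 2
    return 4 / (1 - alpha ** 2) * sum(
        a * pi + b * qi - pi ** a * qi ** b for pi, qi in zip(p, q))

def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

p = [0.2, 0.5, 0.3]
q = [0.4, 0.4, 0.2]

special_case_gap = (general_divergence(p, q, 0.3, math.exp, math.log)
                    - classic_alpha_divergence(p, q, 0.3))
symmetry_gap = classic_alpha_divergence(p, q, 0.3) - classic_alpha_divergence(q, p, -0.3)
near_minus_one = classic_alpha_divergence(p, q, -1 + 1e-6)  # approaches KL(p || q)
near_plus_one = classic_alpha_divergence(p, q, 1 - 1e-6)    # approaches KL(q || p)

print(special_case_gap, symmetry_gap, near_minus_one - kl(p, q), near_plus_one - kl(q, p))
```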

12. Other Examples of D^(α): Jensen difference; U-divergence (α=1).

13. A Short Detour: Monotone Scaling. Define a monotone embedding ("scaling") of a measurable function p as the transformation ρ(p), where ρ is a strictly monotone function. Observe: i) ρ is strictly monotone iff ρ⁻¹ is strictly monotone; ii) ρ(t) = t is the identity element; iii) if ρ₁, ρ₂ are strictly monotone, so is their composition. Therefore, monotone embeddings of a given probability density function form a group, with functional composition as the group operation. We recall that the derivative f′ of a strictly convex function f is strictly increasing.

14. Definition: A ρ-embedding is said to be conjugated to a τ-embedding with respect to a strictly convex function f (whose conjugate is f*) if a conjugacy condition linking ρ and τ through f holds. Example: the α-embedding.
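The defining condition is missing from the transcript. The sketch below assumes it reads τ(p) = f′(ρ(p)), equivalently ρ(p) = (f*)′(τ(p)), and assumes the α-embedding pair ρ(p) = 2/(1 − α) · p^((1−α)/2), τ(p) = 2/(1 + α) · p^((1+α)/2) together with a matching power-law f. These formulas follow the conventions of Zhang (2004) as best recalled and should be treated as illustrative rather than as the slide's content.

```python
import math

ALPHA = 0.4  # hypothetical value in (-1, 1)

def rho(p, alpha=ALPHA):
    # alpha-embedding
    return 2 / (1 - alpha) * p ** ((1 - alpha) / 2)

def tau(p, alpha=ALPHA):
    # (-alpha)-embedding, the assumed conjugate representation
    return 2 / (1 + alpha) * p ** ((1 + alpha) / 2)

def f(t, alpha=ALPHA):
    # Assumed strictly convex function linking the two embeddings (t > 0)
    return 2 / (1 + alpha) * ((1 - alpha) / 2 * t) ** (2 / (1 - alpha))

def f_star(s, alpha=ALPHA):
    # Convex conjugate of f (s > 0)
    return 2 / (1 - alpha) * ((1 + alpha) / 2 * s) ** (2 / (1 + alpha))

def derivative(fn, t, h=1e-6):
    return (fn(t + h) - fn(t - h)) / (2 * h)

checks = []
for p in (0.1, 0.5, 2.0):
    r, s = rho(p), tau(p)
    checks.append((
        derivative(f, r) - s,        # tau = f'(rho)
        derivative(f_star, s) - r,   # rho = (f*)'(tau)
        f(r) + f_star(s) - r * s,    # Fenchel equality at conjugate points
    ))

print(checks)
```

Each triple should be zero up to finite-difference error, which confirms that the two embeddings are conjugate with respect to this f.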

15. Parameterized Functions as Forming a Submanifold under Monotone Scaling. A submanifold is said to be ρ-affine if there exists a countable set of linearly independent functions λ_i(ζ) over a measurable space such that the ρ-embedding of the density is a linear combination of the λ_i(ζ) with coefficients θ. Here, θ is called the "natural parameter". The "expectation parameter" η is defined by projecting the conjugated τ-embedding onto the λ_i(ζ). Example: the log-linear model (exponential family), with its corresponding expectation parameter.

16. i) The potential function Φ(θ) is strictly convex; Φ(θ) is called the generating (partition) functional. ii) Define Φ*(η) under the conjugate representations; then Φ* is the Fenchel conjugate of Φ. Φ*(η) is called the generalized entropy functional. Proposition (for the ρ-affine submanifold). Theorem. The ρ-affine submanifold is an α-Hessian manifold.
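A toy sketch of the ρ-affine construction for the log-linear case. It assumes ρ = log, f = exp (so τ(p) = p), a four-point sample space with counting measure, Φ(θ) = ∫ f(ρ(p)) dμ, and Φ*(η) = ∫ f*(τ(p)) dμ. The functions λ_i and the value of θ are hypothetical, and the density is left unnormalized. It checks that the expectation parameter is the gradient of Φ and that Φ and Φ* satisfy the Fenchel equality.

```python
import math

# Hypothetical linearly independent functions lambda_i on a 4-point sample space
LAMBDAS = [[1.0, 0.0, 1.0, 2.0],
           [0.0, 1.0, 1.0, -1.0]]

def density(theta):
    # rho = log, so log p(z) = sum_i theta_i * lambda_i(z) (unnormalized)
    n = len(LAMBDAS[0])
    return [math.exp(sum(t * lam[z] for t, lam in zip(theta, LAMBDAS)))
            for z in range(n)]

def potential(theta):
    # Phi(theta) = sum_z f(rho(p(z))) with f = exp, i.e. the total mass of p
    return sum(density(theta))

def expectation(theta):
    # eta_i = sum_z tau(p(z)) * lambda_i(z), with tau(p) = f'(rho(p)) = p
    p = density(theta)
    return [sum(pz * lz for pz, lz in zip(p, lam)) for lam in LAMBDAS]

def entropy_functional(theta):
    # Phi*(eta) = sum_z f*(tau(p(z))) with f*(t) = t log t - t
    return sum(pz * math.log(pz) - pz for pz in density(theta))

def num_grad(fn, theta, h=1e-6):
    grads = []
    for i in range(len(theta)):
        up = list(theta)
        dn = list(theta)
        up[i] += h
        dn[i] -= h
        grads.append((fn(up) - fn(dn)) / (2 * h))
    return grads

theta = [0.3, -0.7]
eta = expectation(theta)
grad = num_grad(potential, theta)
fenchel_gap = (potential(theta) + entropy_functional(theta)
               - sum(t * e for t, e in zip(theta, eta)))

print(eta, grad, fenchel_gap)
```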

17. An Application: the (α,β)-Divergence. Choose f and ρ according to the "alpha-embedding", whose index is now denoted by β. α: parameter reflecting reference duality. β: parameter reflecting representation duality. The (α,β)-divergences reduce to the α-divergence proper A^(α) and to the Jensen difference E^(α).
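The sketch below assumes a closed form for the (α,β)-divergence obtained by inserting the β-indexed power embedding into the convex-inequality functional; the slide's own expression was lost, so the formula in `ab_divergence` is a reconstruction under that assumption. On hypothetical probability vectors it checks the two reductions named on the slide: β → 1 gives the classic α-divergence, and β → −1 gives a Jensen difference built from t·log t.

```python
import math

def ab_divergence(p, q, alpha, beta):
    # Assumed form for |alpha| != 1 and |beta| != 1
    a, b = (1 - alpha) / 2, (1 + alpha) / 2
    s = (1 - beta) / 2
    total = sum(a * pi + b * qi - (a * pi ** s + b * qi ** s) ** (1 / s)
                for pi, qi in zip(p, q))
    return 4 / (1 - alpha ** 2) * 2 / (1 + beta) * total

def alpha_divergence(p, q, alpha):
    a, b = (1 - alpha) / 2, (1 + alpha) / 2
    return 4 / (1 - alpha ** 2) * sum(
        a * pi + b * qi - pi ** a * qi ** b for pi, qi in zip(p, q))

def jensen_difference(p, q, alpha):
    a, b = (1 - alpha) / 2, (1 + alpha) / 2
    def h(t):
        return t * math.log(t)
    return 4 / (1 - alpha ** 2) * sum(
        a * h(pi) + b * h(qi) - h(a * pi + b * qi) for pi, qi in zip(p, q))

p = [0.2, 0.5, 0.3]
q = [0.4, 0.4, 0.2]
alpha = 0.3

to_alpha_gap = ab_divergence(p, q, alpha, 1 - 1e-6) - alpha_divergence(p, q, alpha)
to_jensen_gap = ab_divergence(p, q, alpha, -1 + 1e-6) - jensen_difference(p, q, alpha)

print(to_alpha_gap, to_jensen_gap)
```

Both gaps shrink with the distance of β from ±1; the offsets of 1e-6 are arbitrary.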

18. Information Geometry on Banach Space. Proposition 1. Let the tangent vector fields be, at a given p on the manifold, themselves functions in Banach space. The metric and dual connections induced by the divergence functional take explicit forms, which can also be written in dually symmetric form.

19. Corollary 1a. For a finite-dimensional submanifold (parametric model), the metric and dual connections associated with the divergence functional are given explicitly. Remark: A suitable choice of f and ρ reduces these to the forms of the Fisher metric and the α-connections in classical parametric information geometry.
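To illustrate the reduction to the Fisher metric, the sketch below uses a standard identity from classical information geometry: pairing the derivative of the α-embedding with the derivative of the (−α)-embedding gives the Fisher information for every α. The Bernoulli model, the parameter value, and the finite-difference step are assumptions chosen for illustration; the corollary's general formula is not reproduced here.

```python
import math

def bernoulli(theta):
    # Two-point model: p(0) = 1 - theta, p(1) = theta
    return [1 - theta, theta]

def embed(p, alpha):
    # alpha-embedding of a density value; log p in the case alpha = 1
    if alpha == 1:
        return math.log(p)
    return 2 / (1 - alpha) * p ** ((1 - alpha) / 2)

def metric(theta, alpha, h=1e-6):
    # g = sum_z d/dtheta[embed(p, alpha)] * d/dtheta[embed(p, -alpha)]
    up, dn = bernoulli(theta + h), bernoulli(theta - h)
    total = 0.0
    for pu, pd in zip(up, dn):
        d_rho = (embed(pu, alpha) - embed(pd, alpha)) / (2 * h)
        d_tau = (embed(pu, -alpha) - embed(pd, -alpha)) / (2 * h)
        total += d_rho * d_tau
    return total

theta = 0.3
fisher = 1 / (theta * (1 - theta))  # Fisher information of the Bernoulli model
metrics = {alpha: metric(theta, alpha) for alpha in (-1, -0.5, 0, 0.5, 1)}

print(fisher, metrics)
```

The metric is the same for every α, so the dependence on α is carried entirely by the connections.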

20. Remark: The ambient space B is flat, so it embeds, as proper submanifolds, the manifold M_μ of probability density functions (constrained to be positive-valued and normalized to unit measure) and the finite-dimensional manifold M_θ of parameterized probability models: M_θ ⊂ M_μ ⊂ B (ambient manifold). Proposition 2. The curvature R^(α) and torsion tensors T^(α) associated with any α-connection on the infinite-dimensional function space B are identically zero. Caveat: Topology? (G. Pistone and his colleagues)

21. Proposition 3. The (α,β)-divergence for the parametric models gives rise to the Fisher metric proper and the alpha-connections proper. Remark: The (α,β)-divergence is a homogeneous f-divergence; as such, it should reproduce the standard Fisher metric and the dual α-connections in their proper form. Again, it is the product αβ that takes the role of the conventional "alpha" parameter.

22. Summary of Current Approach
    - Divergence: α-divergence (equivalent to δ-divergence, Zhu & Rohwer, 1985; includes KL divergence as a special case); f-divergence (Csiszár); Bregman divergence (equivalent to the canonical divergence); U-divergence (Eguchi).
    - Geometry: Riemannian metric (Fisher information); conjugate connections (α-connection family); equi-affine structure (cubic form, Tchebychev 1-form); curvature.
    - Convex-based α-divergence for vector space of finite dimension and for function space of infinite dimension; generalized expressions of the Fisher metric and α-connections.

23. References
    - Zhang, J. (2004). Divergence function, duality, and convex analysis. Neural Computation, 16: 159-195.
    - Zhang, J. (2005). Referential duality and representational duality in the scaling of multidimensional and infinite-dimensional stimulus space. In Dzhafarov, E. and Colonius, H. (Eds.), Measurement and representation of sensations: Recent progress in psychological theory. Lawrence Erlbaum Associates, Mahwah, NJ.
    - Zhang, J. and Hästö, P. (2006). Statistical manifold as an affine space: A functional equation approach. Journal of Mathematical Psychology, 50: 60-65.
    - Zhang, J. (2006). Referential duality and representational duality on statistical manifolds. Proceedings of the Second International Symposium on Information Geometry and Its Applications, Tokyo (pp. 58-67).
    - Zhang, J. (2007). A note on curvature of α-connections of a statistical manifold. Annals of the Institute of Statistical Mathematics, 59: 161-170.
    - Zhang, J. and Matsuzoe, H. (in press). Dualistic differential geometry associated with a convex function. To appear in a special volume in the Springer series Advances in Mechanics and Mathematics.
    - Zhang, J. (under review). Nonparametric information geometry: Referential duality and representational duality on statistical manifolds.

24. Questions?