Information Geometry:

Duality, Convexity, and Divergences

Jun Zhang*

University of Michigan

Ann Arbor, Michigan 48104

*Currently on leave to AFOSR under IPA

Slide 2

Clarify two senses of duality in information geometry:

Reference duality: choice of the reference vs. comparison point on the manifold;

Representational duality: choice of a monotonic scaling of the density function.

Lecture Plan

1) A revisit to Bregman divergence

2) Generalization (α-divergence on R^n) and α-Hessian geometry

3) Embedding into infinite-dimensional function space

4) Generalized Fisher metric and α-connection on Banach space

Slide 3

Bregman Divergence

i) Quadrilateral relation:

Triangular relation (generalized cosine) as a special case:

ii) Reference-representation biduality:
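For reference, these relations can be written out in standard notation. The symbols are assumptions rather than the author's own: Φ is a strictly convex, differentiable function on R^n, ∇Φ its gradient, Φ* its convex conjugate, and B_Φ(x, y) the Bregman divergence with y as the reference point. The author's ordering and sign conventions may differ.

```latex
% Bregman divergence (assumed convention: second argument is the reference point)
B_\Phi(x, y) = \Phi(x) - \Phi(y) - \langle x - y,\ \nabla\Phi(y) \rangle \;\ge\; 0

% i) Quadrilateral (four-point) relation
B_\Phi(x, y) + B_\Phi(w, z) - B_\Phi(x, z) - B_\Phi(w, y)
  = \langle x - w,\ \nabla\Phi(z) - \nabla\Phi(y) \rangle

% Triangular relation (generalized cosine): the special case w = z
B_\Phi(x, y) = B_\Phi(x, z) + B_\Phi(z, y)
  + \langle x - z,\ \nabla\Phi(z) - \nabla\Phi(y) \rangle

% ii) Reference-representation biduality: swapping the two points is
% equivalent to passing to the conjugate function and gradient coordinates
B_\Phi(x, y) = B_{\Phi^*}\big(\nabla\Phi(y),\ \nabla\Phi(x)\big)
```

For Φ(x) = ½|x|², the triangular relation reduces to the ordinary law of cosines, which is why it is called a generalized cosine.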

Slide 4

Canonical Divergence and Fenchel Inequality

An alternative expression of the Bregman divergence is the canonical divergence

or explicitly:

That A is non-negative is a direct consequence of the Fenchel inequality for a strictly convex function:

where equality holds if and only if the two arguments are related through the gradient of the convex function.
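A small numerical check can make the link between the canonical divergence and the Fenchel inequality concrete. This is a minimal sketch under stated assumptions: the potential Φ(x) = Σ exp(x_i), its conjugate Φ*(v) = Σ (v_i log v_i − v_i), and the function names `bregman` and `canonical` are illustrative choices, not taken from the slides.

```python
import math

def phi(x):
    """Assumed strictly convex potential: Phi(x) = sum_i exp(x_i)."""
    return sum(math.exp(xi) for xi in x)

def grad_phi(x):
    """Gradient of phi, componentwise exp(x_i)."""
    return [math.exp(xi) for xi in x]

def phi_conj(v):
    """Convex conjugate of phi for v_i > 0: sum_i (v_i log v_i - v_i)."""
    return sum(vi * math.log(vi) - vi for vi in v)

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def bregman(x, y):
    """B(x, y) = phi(x) - phi(y) - <x - y, grad phi(y)>."""
    diff = [xi - yi for xi, yi in zip(x, y)]
    return phi(x) - phi(y) - dot(diff, grad_phi(y))

def canonical(x, v):
    """A(x, v) = phi(x) + phi*(v) - <x, v>, the Fenchel gap."""
    return phi(x) + phi_conj(v) - dot(x, v)

x = [0.3, -1.2, 0.8]
y = [1.0, 0.1, -0.5]
v = grad_phi(y)  # dual (gradient) coordinates of y

# The Fenchel gap vanishes exactly when v is the gradient at x.
gap_at_gradient = canonical(x, grad_phi(x))
```

With these choices, `canonical(x, v)` agrees with `bregman(x, y)` up to rounding, and `gap_at_gradient` is zero up to rounding, which is the equality case of the Fenchel inequality.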

Slide 5

Convex Inequality and the α-Divergence Induced by It

By the definition of a strictly convex function Φ,

It is easy to show that the following is non-negative for all α:

Easily verifiable:
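The construction can be sketched numerically. The code below assumes the convex-mixture form of the α-divergence, D^(α)(x, y) = 4/(1−α²) · [ (1−α)/2 · Φ(x) + (1+α)/2 · Φ(y) − Φ((1−α)/2 · x + (1+α)/2 · y) ], with an example potential Φ(x) = Σ exp(x_i) chosen purely for illustration. It checks three properties: non-negativity, the reference duality D^(α)(x, y) = D^(−α)(y, x), and the Bregman divergence as the limit α → ±1.

```python
import math

def phi(x):
    """Assumed example potential on R^n: sum_i exp(x_i)."""
    return sum(math.exp(xi) for xi in x)

def grad_phi(x):
    return [math.exp(xi) for xi in x]

def bregman(x, y):
    """B(x, y) = phi(x) - phi(y) - <x - y, grad phi(y)>."""
    g = grad_phi(y)
    return phi(x) - phi(y) - sum((xi - yi) * gi for xi, yi, gi in zip(x, y, g))

def alpha_divergence(x, y, alpha):
    """Convex-mixture alpha-divergence; requires alpha != +1 and alpha != -1."""
    a, b = (1 - alpha) / 2, (1 + alpha) / 2
    mix = [a * xi + b * yi for xi, yi in zip(x, y)]
    return 4.0 / (1 - alpha ** 2) * (a * phi(x) + b * phi(y) - phi(mix))

x = [0.3, -1.2, 0.8]
y = [1.0, 0.1, -0.5]

# Non-negativity for several alpha in (-1, 1).
values = {alpha: alpha_divergence(x, y, alpha) for alpha in (-0.9, -0.5, 0.0, 0.5, 0.9)}

# Limits alpha -> +1 and alpha -> -1 approach Bregman divergences.
near_plus_one = alpha_divergence(x, y, 1 - 1e-6)
near_minus_one = alpha_divergence(x, y, -1 + 1e-6)
```

Under this convention, `near_plus_one` approaches `bregman(x, y)` and `near_minus_one` approaches `bregman(y, x)`; a different sign convention for α would swap the two.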

Slide 6

Significance of Bregman Divergence Among the α-Divergence Family

For a smooth function Φ: R^n → R, the following are equivalent:

Slide 7

Statistical Manifold Structure Induced From a Divergence Function (Eguchi, 1983)

Given a divergence D(x,y) with D(x,x)=0, one can derive the Riemannian metric and a pair of conjugate connections by expanding D(x,y) around x=y:

i) 2nd order: one (and the same) metric

ii) 3rd order: a pair of conjugate connections

In essence, the defining condition of conjugate connections is satisfied by such identification of the metric and connections with derivatives of D.
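In the standard statement of Eguchi's construction, the identification reads as follows. The index notation and the use of lowered-index connection coefficients are assumptions about presentation; overall sign conventions vary between authors.

```latex
% 2nd order: the metric
g_{ij}(x) = -\left.\frac{\partial^2 D(x, y)}{\partial x^i\,\partial y^j}\right|_{y = x}

% 3rd order: the pair of conjugate connections
\Gamma_{ij,k}(x) = -\left.\frac{\partial^3 D(x, y)}{\partial x^i\,\partial x^j\,\partial y^k}\right|_{y = x},
\qquad
\Gamma^{*}_{ij,k}(x) = -\left.\frac{\partial^3 D(x, y)}{\partial y^i\,\partial y^j\,\partial x^k}\right|_{y = x}

% Conjugacy condition, obtained by differentiating g_{ij}(x) along x^k
\partial_k\, g_{ij} = \Gamma_{ki,j} + \Gamma^{*}_{kj,i}
```

The last line follows directly from the chain rule: differentiating g_ij(x) = −D_{x^i y^j}(x, x) with respect to x^k produces one term with an extra x-derivative and one with an extra y-derivative.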

Slide 8

α-Hessian Geometry (of Finite-Dimensional Vector Space)

Theorem. D^(α) induces the α-Hessian manifold, i.e.

i) The metric and conjugate affine connections are given by:

ii) The Riemann curvature is given by:

Slide 9

iii) The manifold is equi-affine, with the Tchebychev potential given by:

and the α-parallel volume form given by:

iv) There exist biorthogonal coordinates:
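Part of the theorem can be checked numerically in one dimension. The sketch below assumes the convex-mixture α-divergence D^(α)(x, y) = 4/(1−α²) · [ (1−α)/2 · Φ(x) + (1+α)/2 · Φ(y) − Φ((1−α)/2 · x + (1+α)/2 · y) ] and applies the derivative identification of Eguchi by finite differences. Under these assumptions the metric should come out as Φ'' (independent of α), and the two connections as (1−α)/2 · Φ''' and (1+α)/2 · Φ'''. The potential Φ = exp and all function names are illustrative.

```python
import math

def phi(t):
    """Assumed example potential on R: phi(t) = exp(t), so all derivatives equal exp(t)."""
    return math.exp(t)

def alpha_divergence(x, y, alpha):
    """Convex-mixture alpha-divergence on R; requires alpha != +1, -1."""
    a, b = (1 - alpha) / 2, (1 + alpha) / 2
    return 4.0 / (1 - alpha ** 2) * (a * phi(x) + b * phi(y) - phi(a * x + b * y))

def metric(D, x, h=1e-2):
    """g(x) = -d^2 D / dx dy at y = x, by central differences."""
    mixed = D(x + h, x + h) - D(x + h, x - h) - D(x - h, x + h) + D(x - h, x - h)
    return -mixed / (4 * h * h)

def connection(D, x, h=1e-2):
    """Gamma(x) = -d^3 D / dx dx dy at y = x, by central differences."""
    def dy(u):
        # first derivative in the second argument, evaluated at (u, x)
        return (D(u, x + h) - D(u, x - h)) / (2 * h)
    return -(dy(x + h) - 2 * dy(x) + dy(x - h)) / (h * h)

def dual_connection(D, x, h=1e-2):
    """Gamma*(x) = -d^3 D / dy dy dx at y = x: same stencil with arguments swapped."""
    return connection(lambda u, v: D(v, u), x, h)

alpha, x0 = 0.4, 0.5
D = lambda x, y: alpha_divergence(x, y, alpha)
g = metric(D, x0)
gamma = connection(D, x0)
gamma_star = dual_connection(D, x0)
```

Averaging `gamma` and `gamma_star` gives Φ'''/2, the coefficient of the Levi-Civita connection of the Hessian metric, consistent with the two connections being conjugate.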


Slide 10

From Vector Space to Function Space

Question: How to extend the above analysis to infinite-dimensional function space?

A General Divergence Function(al)

for any two functions in some function space, and an arbitrary, strictly increasing function ρ.

Remark: Induced by convex inequality
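A form consistent with this description, and with the vector-space α-divergence it extends, is sketched below. The symbols are assumptions about notation: p and q are the two functions, f is a strictly convex function, ρ is the strictly increasing function, and μ is the underlying measure.

```latex
D^{(\alpha)}_{f,\rho}(p, q) = \frac{4}{1-\alpha^2} \int \left[
    \frac{1-\alpha}{2}\, f\big(\rho(p)\big)
  + \frac{1+\alpha}{2}\, f\big(\rho(q)\big)
  - f\!\left( \frac{1-\alpha}{2}\, \rho(p) + \frac{1+\alpha}{2}\, \rho(q) \right)
\right] d\mu
```

For α in (−1, 1) the integrand is pointwise non-negative by the convexity of f, which is the sense in which the functional is induced by the convex inequality.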

Slide 11

A Special Case of D^(α): Classic α-Divergence

For parameterized pdfs, such a divergence induces an α-independent metric but α-dependent dual connections:
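For discrete distributions the classic α-divergence is easy to compute directly. The sketch below assumes the common form A^(α)(p, q) = 4/(1−α²) · (1 − Σ p_i^((1−α)/2) · q_i^((1+α)/2)) for normalized p and q; the sign convention for α and the example vectors are assumptions. It illustrates two familiar special cases: a Hellinger-type distance at α = 0 and the Kullback-Leibler divergence in the limits α → ±1.

```python
import math

def classic_alpha_divergence(p, q, alpha):
    """Classic alpha-divergence for positive, normalized p and q; alpha != +1, -1."""
    a, b = (1 - alpha) / 2, (1 + alpha) / 2
    overlap = sum(pi ** a * qi ** b for pi, qi in zip(p, q))
    return 4.0 / (1 - alpha ** 2) * (1 - overlap)

def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) for positive probability vectors."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

p = [0.5, 0.3, 0.2]
q = [0.2, 0.5, 0.3]

# alpha = 0: twice the squared Euclidean distance between square-root densities.
hellinger_type = 2 * sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q))

# Limits: with this convention, alpha -> -1 gives KL(p || q), alpha -> +1 gives KL(q || p).
near_minus_one = classic_alpha_divergence(p, q, -1 + 1e-6)
near_plus_one = classic_alpha_divergence(p, q, 1 - 1e-6)
```

The reference duality is visible here as well: exchanging p and q has the same effect as flipping the sign of α.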

Slide 12

Other Examples of D^(α)

Jensen Difference

U-Divergence (α=1)

Slide 13

A Short Detour: Monotone Scaling

Define monotone embedding (“scaling”) of a measurable function p as the transformation ρ(p), where ρ is a strictly monotone function.

i) ρ is strictly monotone iff ρ^(-1) is strictly monotone;

ii) ρ(t) = t as the identity element;

iii) if ρ1, ρ2 are strictly monotone, so is their composition.

Therefore, monotone embeddings of a given probability density function form a group, with functional composition as group operation.

We recall that for a strictly convex function f:

Slide 14

DEFINITION: A ρ-embedding is said to be conjugated to a τ-embedding with respect to a strictly convex function f (whose conjugate is f*) if:

Example: α-embedding
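The α-embedding example can be checked numerically. The sketch assumes that conjugacy means τ(p) = f'(ρ(p)), that the α-embedding is l^(α)(t) = 2/(1−α) · t^((1−α)/2) (with log t at α = 1), and that the convex function pairing l^(α) with l^(−α) is f(s) = 2/(1+α) · ((1−α)/2 · s)^(2/(1−α)). All three are assumptions about the intended definitions, restricted here to −1 < α < 1.

```python
import math

def alpha_embedding(t, alpha):
    """Assumed alpha-embedding: 2/(1-alpha) * t^((1-alpha)/2), or log t when alpha == 1."""
    if alpha == 1:
        return math.log(t)
    return 2.0 / (1 - alpha) * t ** ((1 - alpha) / 2)

def f(s, alpha):
    """Assumed convex function pairing l^(alpha) with l^(-alpha); needs -1 < alpha < 1, s > 0."""
    return 2.0 / (1 + alpha) * ((1 - alpha) / 2 * s) ** (2.0 / (1 - alpha))

def f_prime(s, alpha, h=1e-6):
    """Central-difference derivative of f in its first argument."""
    return (f(s + h, alpha) - f(s - h, alpha)) / (2 * h)

alpha = 0.3
points = [0.2, 0.7, 1.5]  # sample values of a positive function p

rho = [alpha_embedding(p, alpha) for p in points]     # rho-embedding of p
tau = [alpha_embedding(p, -alpha) for p in points]    # candidate conjugate embedding
tau_from_f = [f_prime(r, alpha) for r in rho]         # f'(rho(p)), computed numerically
```

The lists `tau` and `tau_from_f` agree to within the finite-difference error, so under these assumptions the α-embedding and the (−α)-embedding form a conjugate pair.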

Slide 15

Parameterized Functions as Forming a Submanifold under Monotone Scaling

A sub-manifold is said to be ρ-affine if there exists a countable set of linearly independent functions λi(ζ) over a measurable space such that:

Here, θ is called the “natural parameter”. The “expectation parameter” is defined by projecting the conjugated τ-embedding onto the λi(ζ):

Example: For the log-linear model (exponential family)

The expectation parameter is:
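Read literally, these definitions can be written as follows. The notation is an assumption: p(ζ|θ) is the parameterized function, μ the underlying measure, and the conjugate pair in the example is taken to be ρ = log with f = exp, so that τ(p) = f'(ρ(p)) = p.

```latex
% rho-affine submanifold: the rho-embedding is linear in the natural parameter
\rho\big(p(\zeta \mid \theta)\big) = \sum_i \theta^i\, \lambda_i(\zeta)

% Expectation parameter: projection of the conjugate tau-embedding onto lambda_i
\eta_i = \int \tau\big(p(\zeta \mid \theta)\big)\, \lambda_i(\zeta)\, d\mu

% Log-linear example (rho = log, tau = identity)
\log p(\zeta \mid \theta) = \sum_i \theta^i\, \lambda_i(\zeta),
\qquad
\eta_i = \int p(\zeta \mid \theta)\, \lambda_i(\zeta)\, d\mu
```

In the log-linear case the expectation parameter is the integral of λ_i against p, which is the usual mean-value parameter when p is normalized.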

Slide 16

Proposition. For the ρ-affine submanifold:

i) The following potential function is strictly convex:

Φ(θ) is called the generating (partition) functional.

ii) Define, under the conjugate representations

then Φ*(η) is the Fenchel conjugate of Φ(θ).

Φ*(η) is called the generalized entropy functional.

Theorem. The ρ-affine submanifold is an α-Hessian manifold.
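The proposition can be illustrated on a small log-linear model. The sketch assumes Φ(θ) = Σ_z f(ρ(p)) and Φ*(η) = Σ_z f*(τ(p)) with ρ = log, f = exp, f*(s) = s log s − s, and τ(p) = p, on a hypothetical four-point sample space with made-up feature functions. Densities are left unnormalized, which is also an assumption. It checks that the expectation parameter is the gradient of Φ and that Φ and Φ* satisfy the Fenchel equality.

```python
import math

# Hypothetical finite sample space with two feature functions lambda_1, lambda_2.
features = {
    "z1": (1.0, 0.0),
    "z2": (0.0, 1.0),
    "z3": (1.0, 1.0),
    "z4": (-1.0, 0.5),
}

def density(theta):
    """Unnormalized log-linear model: log p(z) = sum_i theta_i * lambda_i(z)."""
    return {z: math.exp(sum(t * l for t, l in zip(theta, lam)))
            for z, lam in features.items()}

def potential(theta):
    """Phi(theta) = sum_z f(rho(p(z))) with f = exp, rho = log: the total mass."""
    return sum(density(theta).values())

def expectation(theta):
    """eta_i = sum_z tau(p(z)) * lambda_i(z), with tau(p) = p for this pairing."""
    p = density(theta)
    return [sum(p[z] * lam[i] for z, lam in features.items())
            for i in range(len(theta))]

def entropy_functional(theta):
    """Phi*(eta(theta)) = sum_z f*(p(z)) with f*(s) = s log s - s."""
    return sum(pz * math.log(pz) - pz for pz in density(theta).values())

def grad_potential(theta, h=1e-6):
    """Central-difference gradient of the potential."""
    grad = []
    for i in range(len(theta)):
        up = [t + (h if j == i else 0.0) for j, t in enumerate(theta)]
        dn = [t - (h if j == i else 0.0) for j, t in enumerate(theta)]
        grad.append((potential(up) - potential(dn)) / (2 * h))
    return grad

theta = [0.4, -0.7]
eta = expectation(theta)
fenchel_gap = (potential(theta) + entropy_functional(theta)
               - sum(t * e for t, e in zip(theta, eta)))
```

Here `grad_potential(theta)` matches `eta`, and `fenchel_gap` vanishes up to rounding, so θ and η behave as a pair of conjugate coordinates for this example.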

Slide 17

An Application: the (α,β)-Divergence

Take f = ρ^(-1), where ρ is the embedding called the “alpha-embedding”, with its parameter now denoted by β:

α: parameter reflecting reference duality

β: parameter reflecting representation duality

They reduce to the α-divergence proper A^(α) and to the Jensen difference E^(α):

Slide 18

Information Geometry on Banach Space

Proposition 1. Consider tangent vector fields which are, at a given p on the manifold, themselves functions in Banach space. The metric and dual connections induced by the divergence functional take the forms:

Written in dually symmetric form:

Slide 19

Corollary 1a. For a finite-dimensional submanifold (parametric model), the metric and dual connections associated with the divergence functional are given by:

Remark: A particular choice reduces these to the forms of the Fisher metric and the α-connections in classical parametric information geometry.

Slide 20

Remark: The ambient space (manifold) B is flat, so it embeds, as proper submanifolds,

  • the manifold M_μ of probability density functions (constrained to be positive-valued and normalized to unit measure);

  • the finite-dimensional manifold M_θ of parameterized probability models.

Proposition 2. The curvature R^(α) and torsion tensors T^(α) associated with any α-connection on the infinite-dimensional function space B are identically zero.

CAVEAT: Topology? (G. Pistone and his colleagues)

Slide 21

Proposition 3. The (α,β)-divergence for the parametric models gives rise to the Fisher metric proper and alpha-connections proper:

Remark: The (α,β)-divergence is the homogeneous f-divergence. As such, it should reproduce the standard Fisher metric and the dual alpha-connections in their proper form. Again, it is the product αβ that takes the role of the conventional “alpha” parameter.
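For comparison, the standard forms from classical parametric information geometry are recalled below. The notation is an assumption: p = p(ζ|θ) is the parametric density, ∂_i denotes ∂/∂θ^i, and E_θ is the expectation under p. Sign conventions for α differ between authors.

```latex
% Fisher metric
g_{ij}(\theta) = E_\theta\!\left[ \partial_i \log p \;\, \partial_j \log p \right]

% alpha-connection
\Gamma^{(\alpha)}_{ij,k}(\theta) = E_\theta\!\left[
  \left( \partial_i \partial_j \log p
       + \frac{1-\alpha}{2}\, \partial_i \log p \;\, \partial_j \log p \right)
  \partial_k \log p \right]

% Its conjugate with respect to g is the (-alpha)-connection
\Gamma^{*(\alpha)}_{ij,k} = \Gamma^{(-\alpha)}_{ij,k}
```

Following the remark on this slide, the product αβ would occupy the place of α in these expressions.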

Slide 22

Summary of Current Approach

  • equivalent to δ-divergence (Zhu & Rohwer, 1985); includes KL divergence as a special case

  • f-divergence (Csiszár)

  • Bregman divergence: equivalent to the canonical divergence

  • U-divergence (Eguchi)


  • Riemannian metric: Fisher information

  • Conjugate connections: α-connection family

  • Equi-affine structure: cubic form, Tchebychev 1-form


  • Convex-based α-divergence for a vector space of finite dimension and for a function space of infinite dimension

  • Generalized expressions of the Fisher metric

Slide 23

Zhang, J. (2004). Divergence function, duality, and convex analysis. Neural Computation, 16: 159-195.

Zhang, J. (2005). Referential duality and representational duality in the scaling of multidimensional and infinite-dimensional stimulus space. In Dzhafarov, E. and Colonius, H. (Eds.), Measurement and representation of sensations: Recent progress in psychological theory. Lawrence Erlbaum Associates, Mahwah, NJ.

Zhang, J. and Hästö, P. (2006). Statistical manifold as an affine space: A functional equation approach. Journal of Mathematical Psychology, 50: 60-65.

Zhang, J. (2006). Referential duality and representational duality on statistical manifolds. Proceedings of the Second International Symposium on Information Geometry and Its Applications, Tokyo (pp. 58-67).

Zhang, J. (2007). A note on curvature of α-connections of a statistical manifold. Annals of the Institute of Statistical Mathematics, 59: 161-170.

Zhang, J. and Matsuzoe, H. (in press). Dualistic differential geometry associated with a convex function. To appear in a special volume in the Springer series Advances in Mechanics and Mathematics.

Zhang, J. (under review). Nonparametric information geometry: Referential duality and representational duality on statistical manifolds.
