
Slide 1

Information Geometry:

Duality, Convexity, and Divergences

Jun Zhang*

University of Michigan

Ann Arbor, Michigan 48104

*Currently on leave to AFOSR under IPA

Slide 2

Clarify two senses of duality in information geometry:

Reference duality:

choice of the reference vs comparison point on the manifold;

Representational duality:

choice of a monotonic scaling of density function;

Lecture Plan

1) A revisit to Bregman divergence

2) Generalization (α-divergence on R^n) and α-Hessian geometry

3) Embedding into infinite-dimensional function space

4) Generalized Fisher metric and α-connection on Banach space

Slide 3

Bregman Divergence

i) Quadrilateral relation:

Triangular relation (generalized cosine) as a special case:

ii) Reference-representation biduality:
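The formulas on this slide did not survive in the transcript. As a hedged numerical sketch, the code below assumes the common convention B_F(x, y) = F(x) - F(y) - <x - y, ∇F(y)> (the slide may order the arguments differently) and an illustrative convex function F chosen here, not taken from the slides. It checks the triangular relation and the biduality B_F(x, y) = B_F*(∇F(y), ∇F(x)), where F* is the Fenchel conjugate of F.

```python
import math

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def F(x):
    # Illustrative strictly convex function on the positive orthant (negative entropy).
    return sum(xi * math.log(xi) - xi for xi in x)

def grad_F(x):
    return [math.log(xi) for xi in x]

def F_star(u):
    # Fenchel conjugate of the F above: F*(u) = sum_i exp(u_i).
    return sum(math.exp(ui) for ui in u)

def grad_F_star(u):
    return [math.exp(ui) for ui in u]

def bregman(G, grad_G, x, y):
    # B_G(x, y) = G(x) - G(y) - <x - y, grad G(y)>  (argument order is an assumption).
    diff = [xi - yi for xi, yi in zip(x, y)]
    return G(x) - G(y) - dot(diff, grad_G(y))

x, y, z = [0.2, 1.5], [0.7, 0.4], [1.1, 2.0]

# Triangular relation: B(x,y) + B(y,z) - B(x,z) = <x - y, grad F(z) - grad F(y)>.
triangle_lhs = (bregman(F, grad_F, x, y) + bregman(F, grad_F, y, z)
                - bregman(F, grad_F, x, z))
triangle_rhs = dot([xi - yi for xi, yi in zip(x, y)],
                   [gz - gy for gz, gy in zip(grad_F(z), grad_F(y))])

# Biduality: swap the two points and pass to the conjugate coordinates grad F(.).
primal = bregman(F, grad_F, x, y)
dual = bregman(F_star, grad_F_star, grad_F(y), grad_F(x))
```

Swapping the argument order corresponds to the reference duality, and replacing F by F* together with x by ∇F(x) corresponds to the representational duality; the last two lines apply both at once.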

Slide 4

Canonical Divergence and Fenchel Inequality

An alternative expression of Bregman divergence is canonical divergence

or explicitly:

That A is non-negative is a direct consequence of the Fenchel inequality

for a strictly convex function:

where equality holds if and only if
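The expressions on this slide are missing from the transcript. The sketch below assumes the standard forms: canonical divergence A(x, v) = F(x) + F*(v) - <x, v>, and the Fenchel inequality F(x) + F*(v) >= <x, v> with equality exactly when v = ∇F(x). The function F is an illustrative choice, not taken from the slides.

```python
import math

def F(x):
    # Illustrative strictly convex function on R^n.
    return sum(math.exp(xi) for xi in x)

def grad_F(x):
    return [math.exp(xi) for xi in x]

def F_star(v):
    # Fenchel conjugate of the F above, defined for v > 0.
    return sum(vi * math.log(vi) - vi for vi in v)

def canonical_divergence(x, v):
    # A(x, v) = F(x) + F*(v) - <x, v>  (assumed form).
    return F(x) + F_star(v) - sum(xi * vi for xi, vi in zip(x, v))

def bregman(x, y):
    # B_F(x, y) = F(x) - F(y) - <x - y, grad F(y)>.
    g = grad_F(y)
    return F(x) - F(y) - sum((xi - yi) * gi for xi, yi, gi in zip(x, y, g))

x, y = [0.3, -1.0], [1.2, 0.5]

gap_generic = canonical_divergence(x, [0.5, 2.0])    # strictly positive
gap_at_gradient = canonical_divergence(x, grad_F(x))  # zero up to rounding
as_bregman = canonical_divergence(x, grad_F(y))       # equals B_F(x, y)
```

Under these assumptions the canonical divergence is the Bregman divergence with its second argument expressed in the conjugate coordinate v = ∇F(y).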

Slide 5

Convex Inequality and the α-Divergence It Induces

By the definition of a strictly convex function F,

It is easy to show that the following is non-negative for all α,


Easily verifiable:
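The expression itself is not preserved in the transcript. A hedged sketch, assuming the form used in Zhang (2004), D^(α)(x, y) = 4/(1 - α²) · [ (1-α)/2 · F(x) + (1+α)/2 · F(y) - F((1-α)/2 · x + (1+α)/2 · y) ], with an illustrative F chosen here. It checks non-negativity (including |α| > 1), the reference duality D^(α)(x, y) = D^(-α)(y, x), and that the limits α → ±1 recover Bregman divergences.

```python
import math

def F(x):
    # Illustrative strictly convex function on R^n.
    return sum(math.exp(xi) for xi in x)

def grad_F(x):
    return [math.exp(xi) for xi in x]

def alpha_divergence(x, y, alpha):
    # Assumed form; requires alpha != +1 and alpha != -1.
    a, b = (1 - alpha) / 2, (1 + alpha) / 2
    mix = [a * xi + b * yi for xi, yi in zip(x, y)]
    return 4 / (1 - alpha ** 2) * (a * F(x) + b * F(y) - F(mix))

def bregman(x, y):
    # B_F(x, y) = F(x) - F(y) - <x - y, grad F(y)>.
    g = grad_F(y)
    return F(x) - F(y) - sum((xi - yi) * gi for xi, yi, gi in zip(x, y, g))

x, y = [0.3, -1.0], [1.2, 0.5]
alphas = (-3, -0.5, 0, 0.5, 3)

values = [alpha_divergence(x, y, al) for al in alphas]
swapped = [alpha_divergence(y, x, -al) for al in alphas]

# Near alpha = +1 and alpha = -1 the family approaches B_F(x, y) and B_F(y, x).
near_plus_one = alpha_divergence(x, y, 1 - 1e-6)
near_minus_one = alpha_divergence(x, y, -1 + 1e-6)
```

For |α| > 1 the bracket changes sign together with the prefactor 4/(1 - α²), which is why the product stays non-negative.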

Slide 6

Significance of Bregman Divergence Among the α-Divergence Family

For a smooth function F: R^n → R, the following are equivalent:

Slide 7

Statistical Manifold Structure Induced From a Divergence Function (Eguchi, 1983)

Given a divergence D(x,y) with D(x,x)=0, one can derive the Riemannian metric and a pair of conjugate connections:

Expanding D(x,y) around x=y:

In essence,

is satisfied by such identification of derivatives of D.

i) 2nd order: one (and the same) metric

ii) 3rd order: a pair of conjugate connections
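A minimal numerical sketch of the second-order part, assuming Eguchi's identification g_ij(x) = -∂/∂x_i ∂/∂y_j D(x, y) evaluated at y = x. The divergence used is a Bregman divergence of an illustrative F chosen here, for which the induced metric should equal the Hessian of F. The helper names are hypothetical.

```python
import math

def F(x):
    # Illustrative strictly convex function with a non-diagonal Hessian.
    s = sum(x)
    return sum(math.exp(xi) for xi in x) + 0.5 * s * s

def grad_F(x):
    s = sum(x)
    return [math.exp(xi) + s for xi in x]

def hessian_F(x):
    n = len(x)
    return [[(math.exp(x[i]) if i == j else 0.0) + 1.0 for j in range(n)]
            for i in range(n)]

def bregman(x, y):
    g = grad_F(y)
    return F(x) - F(y) - sum((xi - yi) * gi for xi, yi, gi in zip(x, y, g))

def shifted(v, k, step):
    w = list(v)
    w[k] += step
    return w

def eguchi_metric(D, x, h=1e-3):
    # g_ij = -(d/dx_i)(d/dy_j) D(x, y) at y = x, by central differences.
    n = len(x)
    g = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            mixed = (D(shifted(x, i, h), shifted(x, j, h))
                     - D(shifted(x, i, h), shifted(x, j, -h))
                     - D(shifted(x, i, -h), shifted(x, j, h))
                     + D(shifted(x, i, -h), shifted(x, j, -h))) / (4 * h * h)
            g[i][j] = -mixed
    return g

x0 = [0.4, -0.7]
metric = eguchi_metric(bregman, x0)
expected = hessian_F(x0)
```

The third-order derivatives of D, which give the pair of conjugate connections, are not computed in this sketch.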

Slide 8

α-Hessian Geometry (of a Finite-Dimensional Vector Space)

Theorem. D^(α) induces the α-Hessian manifold, i.e.

i) The metric and conjugate affine connections are given by:

ii) Riemann curvature is given by:

Slide 9

iii) The manifold is equi-affine, with the Tchebychev potential given by:

and the α-parallel volume form given by:

iv) There exist biorthogonal coordinates:
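The coordinate expressions are missing from the transcript. As a sketch under the usual assumption that the dual coordinates are u = ∇F(x) with inverse x = ∇F*(u), the code below checks that the two Jacobians are mutually inverse, which is the matrix form of biorthogonality. The convex function F(x) = log(1 + Σ exp(x_i)) is an illustrative choice, not taken from the slides.

```python
import math

def to_dual(x):
    # u = grad F(x) for F(x) = log(1 + sum_i exp(x_i)).
    z = 1.0 + sum(math.exp(xi) for xi in x)
    return [math.exp(xi) / z for xi in x]

def to_primal(u):
    # x = grad F*(u) for F*(u) = sum_i u_i log u_i + (1 - sum u) log(1 - sum u).
    rest = 1.0 - sum(u)
    return [math.log(ui / rest) for ui in u]

def jacobian_dual(x):
    # du_i/dx_j = Hessian of F = diag(u) - u u^T.
    u = to_dual(x)
    n = len(u)
    return [[(u[i] if i == j else 0.0) - u[i] * u[j] for j in range(n)]
            for i in range(n)]

def jacobian_primal(u):
    # dx_i/du_j = Hessian of F* = diag(1/u) + 1/(1 - sum u).
    rest = 1.0 - sum(u)
    n = len(u)
    return [[(1.0 / u[i] if i == j else 0.0) + 1.0 / rest for j in range(n)]
            for i in range(n)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

x = [0.4, -0.7]
u = to_dual(x)
round_trip = to_primal(u)
product = matmul(jacobian_dual(x), jacobian_primal(u))  # should be the identity
```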


Slide 10

From Vector Space to Function Space

Question: How to extend the above analysis to infinite-dimensional function space?

A General Divergence Function(al)

for any two functions in some function space, and an arbitrary, strictly increasing function ρ.

Remark: Induced by convex inequality
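The functional itself is not preserved in the transcript. A hedged sketch on a finite sample space with counting measure, assuming the form from Zhang (2004): 4/(1 - α²) · Σ [ (1-α)/2 · f(ρ(p)) + (1+α)/2 · f(ρ(q)) - f((1-α)/2 · ρ(p) + (1+α)/2 · ρ(q)) ], with f strictly convex and ρ strictly increasing. The choices of f and ρ below are illustrative.

```python
import math

def divergence_functional(p, q, alpha, f, rho):
    # Assumed form of the divergence functional; requires alpha != +1, -1.
    a, b = (1 - alpha) / 2, (1 + alpha) / 2
    total = 0.0
    for pz, qz in zip(p, q):
        rp, rq = rho(pz), rho(qz)
        total += a * f(rp) + b * f(rq) - f(a * rp + b * rq)
    return 4 / (1 - alpha ** 2) * total

p = [0.2, 0.5, 0.3]
q = [0.4, 0.4, 0.2]
alpha = 0.3
a, b = (1 - alpha) / 2, (1 + alpha) / 2

# Choice 1: f = exp, rho = log, so f(rho(p)) = p and the mixed term is p^a q^b.
d_log = divergence_functional(p, q, alpha, math.exp, math.log)
closed_form = 4 / (1 - alpha ** 2) * sum(
    a * pz + b * qz - pz ** a * qz ** b for pz, qz in zip(p, q))

# Choice 2: another convex f and increasing rho, to show the construction is generic.
d_other = divergence_functional(p, q, alpha, lambda t: t * t, lambda t: t ** 3)

# Reference duality: swapping p and q is the same as flipping the sign of alpha.
d_swapped = divergence_functional(q, p, -alpha, math.exp, math.log)
```

Non-negativity follows pointwise from the convexity of f, which is the sense in which the functional is induced by the convex inequality.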

Slide 11

A Special Case of D^(α): Classic α-Divergence

For parameterized pdfs, such divergence induces an α-independent metric, but α-dependent dual connections:
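A small numerical illustration of the α-independence of the metric, assuming the classic α-divergence in the form 4/(1 - α²) · Σ [ (1-α)/2 · p + (1+α)/2 · q - p^((1-α)/2) · q^((1+α)/2) ] and a Bernoulli family as the parametric model (both are choices made here). The induced metric is estimated by finite differences and compared with the Fisher information 1/(θ(1-θ)).

```python
import math

def alpha_divergence(p, q, alpha):
    # Classic alpha-divergence (assumed form); requires alpha != +1, -1.
    a, b = (1 - alpha) / 2, (1 + alpha) / 2
    return 4 / (1 - alpha ** 2) * sum(
        a * pz + b * qz - pz ** a * qz ** b for pz, qz in zip(p, q))

def kl(p, q):
    return sum(pz * math.log(pz / qz) for pz, qz in zip(p, q))

def bernoulli(theta):
    return [theta, 1.0 - theta]

def induced_metric(theta, alpha, h=1e-4):
    # g = -(d/ds)(d/dt) D(p_s, p_t) at s = t = theta, by central differences.
    def D(s, t):
        return alpha_divergence(bernoulli(s), bernoulli(t), alpha)
    mixed = (D(theta + h, theta + h) - D(theta + h, theta - h)
             - D(theta - h, theta + h) + D(theta - h, theta - h)) / (4 * h * h)
    return -mixed

def fisher_information(theta):
    return 1.0 / (theta * (1.0 - theta))

theta = 0.3
metrics = [induced_metric(theta, al) for al in (-0.5, 0.0, 0.5)]

# The limits alpha -> -1 and alpha -> +1 give the two Kullback-Leibler divergences.
p, q = bernoulli(0.3), bernoulli(0.6)
near_minus_one = alpha_divergence(p, q, -1 + 1e-6)
near_plus_one = alpha_divergence(p, q, 1 - 1e-6)
```

The α-dependence shows up only at third order, in the dual connections, which this sketch does not compute.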

Slide 12

Other Examples of D^(α)

Jensen Difference

U-Divergence (α=1)

Slide 13

A Short Detour: Monotone Scaling

Define monotone embedding (“scaling”) of a measurable function p as the transformation ρ(p), where ρ is a strictly monotone function.

i) ρ is strictly monotone iff ρ⁻¹ is strictly monotone;

ii) ρ(t) = t is the identity element;

iii) if ρ1, ρ2 are strictly monotone, so is their composition.

Therefore, monotone embeddings of a given probability density function form a group, with functional composition as the group operation.

We recall that for a strictly convex function f:

Slide 14

DEFINITION: A ρ-embedding is said to be conjugate to a τ-embedding with respect to a strictly convex function f (whose conjugate is f*) if:

Example: α-embedding
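The defining condition and the example formulas are missing from the transcript. The sketch below assumes the condition τ(t) = f'(ρ(t)), equivalently ρ(t) = (f*)'(τ(t)), and the α-embedding l^(α)(t) = 2/(1-α) · t^((1-α)/2) (log t at α = 1), following Zhang (2004). It checks that l^(α) and l^(-α) are conjugate with respect to f(s) = 2/(1+α) · ((1-α)/2 · s)^(2/(1-α)) for |α| < 1.

```python
import math

def alpha_embedding(t, alpha):
    # l^(alpha)(t); assumed form, with the logarithm as the alpha = 1 case.
    if alpha == 1:
        return math.log(t)
    return 2 / (1 - alpha) * t ** ((1 - alpha) / 2)

def f(s, alpha):
    # Strictly convex on s > 0 for |alpha| < 1 (assumed form).
    return 2 / (1 + alpha) * ((1 - alpha) / 2 * s) ** (2 / (1 - alpha))

def f_prime(s, alpha):
    # Derivative of f with respect to s.
    return 2 / (1 + alpha) * ((1 - alpha) / 2 * s) ** ((1 + alpha) / (1 - alpha))

def f_star_prime(v, alpha):
    # Derivative of the conjugate f*, i.e. the inverse function of f_prime.
    return 2 / (1 - alpha) * ((1 + alpha) / 2 * v) ** ((1 - alpha) / (1 + alpha))

points = (0.1, 1.0, 3.5)
alphas = (-0.6, 0.0, 0.4)

# tau(t) = f'(rho(t)) with rho = l^(alpha) should reproduce l^(-alpha).
forward_gaps = [abs(f_prime(alpha_embedding(t, al), al) - alpha_embedding(t, -al))
                for t in points for al in alphas]

# rho(t) = (f*)'(tau(t)) should recover l^(alpha) from l^(-alpha).
backward_gaps = [abs(f_star_prime(alpha_embedding(t, -al), al) - alpha_embedding(t, al))
                 for t in points for al in alphas]
```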

Slide 15

Parameterized Functions as Forming a Submanifold under Monotone Scaling

A submanifold is said to be ρ-affine if there exists a countable set of linearly independent functions λ_i(ζ) over a measurable space such that:

Here, θ is called the “natural parameter”. The “expectation parameter” is defined by projecting the conjugated τ-embedding onto the λ_i(ζ):

Example: For the log-linear model (exponential family)

The expectation parameter is:

Slide 16

i) The following potential function is strictly convex:

F(θ) is called the generating (partition) functional.

ii) Define, under the conjugate representations

then F* is the Fenchel conjugate of F.

F*(η) is called the generalized entropy functional.

Proposition. For the ρ-affine submanifold:

Theorem. The ρ-affine submanifold is an α-Hessian manifold.
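The defining integrals are not preserved in the transcript. A hedged sketch for the log-linear case on a four-point sample space, assuming ρ = log, f = exp (so that τ(p) = p), the generating functional F(θ) = Σ f(ρ(p)), the expectation parameter η_i = Σ τ(p) λ_i, and the generalized entropy F*(η) = Σ f*(τ(p)) with f*(s) = s log s - s. The feature table and the unnormalized densities are illustrative assumptions.

```python
import math

# Hypothetical feature functions lambda_i(zeta) on a four-point sample space.
FEATURES = [
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 1.0, 2.0],
]

def density(theta):
    # rho-affine with rho = log: log p(zeta | theta) = sum_i theta_i lambda_i(zeta).
    n_points = len(FEATURES[0])
    return [math.exp(sum(t * lam[z] for t, lam in zip(theta, FEATURES)))
            for z in range(n_points)]

def generating_functional(theta):
    # F(theta) = sum over zeta of f(rho(p)) = sum of p, since f = exp and rho = log.
    return sum(density(theta))

def expectation_parameter(theta):
    # eta_i = sum over zeta of tau(p) * lambda_i(zeta), with tau(p) = p here.
    p = density(theta)
    return [sum(pz * lz for pz, lz in zip(p, lam)) for lam in FEATURES]

def generalized_entropy(theta):
    # F*(eta) = sum over zeta of f*(tau(p)), with f*(s) = s log s - s.
    return sum(pz * math.log(pz) - pz for pz in density(theta))

def numerical_gradient(theta, h=1e-6):
    grad = []
    for i in range(len(theta)):
        up = list(theta); up[i] += h
        down = list(theta); down[i] -= h
        grad.append((generating_functional(up) - generating_functional(down)) / (2 * h))
    return grad

theta = [0.3, -0.5]
eta = expectation_parameter(theta)
grad = numerical_gradient(theta)

# Fenchel equality at conjugate points: F(theta) + F*(eta) - <theta, eta> = 0.
fenchel_gap = (generating_functional(theta) + generalized_entropy(theta)
               - sum(t * e for t, e in zip(theta, eta)))
```

Under these assumptions the expectation parameter is the gradient of F(θ), and θ and η are conjugate coordinates in the sense of the Fenchel equality.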

Slide 17

An Application: the (α,β)-Divergence

Take f=ρ-(β), where:

called “alpha-embedding”, with the parameter now denoted by β.

α: parameter reflecting reference duality

β: parameter reflecting representation duality

They reduce to α-divergence proper A^(α) and to Jensen difference E^(α):

Slide 18

Information Geometry on Banach Space

Proposition 1. The tangent vector fields are, at a given p on the manifold, themselves functions in the Banach space. The metric and dual connections induced by the divergence functional take the forms:

Written in dually symmetric form:

Slide 19

Corollary 1a. For a finite-dimensional submanifold (parametric model), with

the metric and dual connections associated with the divergence functional are given by:

Remark: Choosing reduces to the forms of Fisher metric and the α-connections in classical parametric information geometry, where

Slide 20

Remark: The ambient space B is flat, so it embeds, as proper submanifolds,

  • the manifold M_μ of probability density functions (constrained to be positive-valued and normalized to unit measure);

  • the finite-dimensional manifold M_θ of parameterized probability models.

B (ambient manifold)

Proposition 2. The curvature R^(α) and torsion T^(α) tensors associated with any α-connection on the infinite-dimensional function space B are identically zero.

CAVEAT: Topology? (G. Pistone and his colleagues)

Slide 21

Proposition 3. The (α,β)-divergence for the parametric models gives rise to the Fisher metric proper and alpha-connections proper:

Remark: The (α,β)-divergence is the homogeneous f-divergence

As such, it should reproduce the standard Fisher metric and the dual alpha-connections in their proper form. Again, it is αβ that takes the role of the conventional “alpha” parameter.

Slide 22

Summary of Current Approach

equivalent to δ-divergence (Zhu & Rohwer, 1995)

includes KL divergence as a special case

f-divergence (Csiszár)

Bregman divergence: equivalent to the canonical divergence

U-divergence (Eguchi)

Riemannian metric: Fisher information

Conjugate connections: α-connection family

Equi-affine structure: cubic form, Tchebychev 1-form

Convex-based α-divergence for

  • vector space of finite dim

  • function space of infinite dim

Generalized expressions of

  • Fisher metric

Slide 23

References

Zhang, J. (2004). Divergence function, duality, and convex analysis. Neural Computation, 16: 159-195.

Zhang, J. (2005). Referential duality and representational duality in the scaling of multidimensional and infinite-dimensional stimulus space. In Dzhafarov, E. and Colonius, H. (Eds.) Measurement and representation of sensations: Recent progress in psychological theory. Lawrence Erlbaum Associates, Mahwah, NJ.

Zhang, J. and Hästö, P. (2006). Statistical manifold as an affine space: A functional equation approach. Journal of Mathematical Psychology, 50: 60-65.

Zhang, J. (2006). Referential duality and representational duality on statistical manifolds. Proceedings of the Second International Symposium on Information Geometry and Its Applications, Tokyo (pp 58-67).

Zhang, J. (2007). A note on curvature of α-connections of a statistical manifold. Annals of the Institute of Statistical Mathematics, 59: 161-170.

Zhang, J. and Matsuzoe, H. (in press). Dualistic differential geometry associated with a convex function. To appear in a special volume in the Springer series of Advances in Mechanics and Mathematics.

Zhang, J. (under review). Nonparametric information geometry: Referential duality and representational duality on statistical manifolds.

Slide 24

Questions?