Information Geometry: Duality, Convexity, and Divergences
Jun Zhang*
University of Michigan, Ann Arbor, Michigan 48104
*Currently on leave to AFOSR under IPA.
Clarify two senses of duality in information geometry:
1) Reference duality: choice of the reference vs. comparison point on the manifold;
2) Representation duality: choice of a monotonic scaling of the density function;
3) Embedding into infinite-dimensional function space;
4) Generalized Fisher metric and α-connection on Banach space.
i) Quadrilateral relation:
Triangular relation (generalized cosine) as a special case:
ii) Reference-representation biduality:
An alternative expression of the Bregman divergence is the canonical divergence A.
That A is non-negative is a direct consequence of the Fenchel inequality
for a strictly convex function F with conjugate F*:
F(x) + F*(u) - <x, u> >= 0,
where equality holds if and only if u = ∇F(x).
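The relation between the two expressions can be checked numerically. The sketch below is only an illustration under stated assumptions: the convex function F(x) = Σ exp(x_i) and its conjugate F*(u) = Σ (u_i log u_i - u_i) are chosen here for convenience and are not taken from the slides, and the Bregman and canonical divergences are written in their commonly used forms.

```python
import numpy as np

# Assumed example: F(x) = sum(exp(x)), a strictly convex function on R^n.
def F(x):
    return np.sum(np.exp(x))

def grad_F(x):
    return np.exp(x)

def F_star(u):
    # Fenchel conjugate of the assumed F, defined for u > 0.
    return np.sum(u * np.log(u) - u)

def bregman(x, y):
    # B(x, y) = F(x) - F(y) - <x - y, grad F(y)>
    return F(x) - F(y) - np.dot(x - y, grad_F(y))

def canonical(x, v):
    # A(x, v) = F(x) + F*(v) - <x, v>, with v in the dual (gradient) coordinates.
    return F(x) + F_star(v) - np.dot(x, v)

rng = np.random.default_rng(0)
x, y = rng.normal(size=3), rng.normal(size=3)

b_xy = bregman(x, y)
a_xv = canonical(x, grad_F(y))      # same point pair, y expressed as v = grad F(y)
a_xx = canonical(x, grad_F(x))      # equality case of the Fenchel inequality
a_xu = canonical(x, rng.uniform(0.1, 3.0, size=3))  # arbitrary positive dual point

print(b_xy, a_xv, a_xx, a_xu)
```

With v = ∇F(y), the two divergences agree to rounding error, and A vanishes when v = ∇F(x).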
By the definition of a strictly convex function F,
it is easy to show that the following is non-negative for all α:
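A small numerical sketch may help here. It assumes the convexity-based form used in Zhang (2004), D^(α)(x, y) = 4/(1 - α²) [ (1-α)/2 F(x) + (1+α)/2 F(y) - F((1-α)/2 x + (1+α)/2 y) ], since the expression itself did not survive on the slide. The choice F(x) = Σ exp(x_i) is purely illustrative.

```python
import numpy as np

# Assumed example of a strictly convex F; not taken from the slides.
def F(x):
    return np.sum(np.exp(x))

def alpha_divergence(x, y, alpha):
    # Assumed form: scaled gap in the convexity inequality for F.
    a = (1.0 - alpha) / 2.0
    b = (1.0 + alpha) / 2.0
    gap = a * F(x) + b * F(y) - F(a * x + b * y)
    return 4.0 / (1.0 - alpha**2) * gap

def bregman(x, y):
    # B(x, y) = F(x) - F(y) - <x - y, grad F(y)>, with grad F = exp here.
    return F(x) - F(y) - np.dot(x - y, np.exp(y))

x = np.array([0.2, -1.0, 0.5])
y = np.array([1.0, 0.3, -0.4])

# Non-negativity for alpha inside and outside (-1, 1).
values = {alpha: alpha_divergence(x, y, alpha) for alpha in (-3.0, -0.5, 0.0, 0.5, 3.0)}

# As alpha approaches 1 the divergence approaches the Bregman divergence.
near_one = alpha_divergence(x, y, 1.0 - 1e-6)

print(values, near_one, bregman(x, y))
```

Under this assumed form, swapping the two points while flipping the sign of α leaves the value unchanged, which is the reference duality referred to in the talk.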
For a smooth function F: R^n → R, the following are equivalent:
Significance of Bregman Divergence Among the α-Divergence Family
Divergence Function (Eguchi, 1983)
Given a divergence D(x,y) with D(x,x)=0, one can derive
the Riemannian metric and a pair of conjugate connections:
Expanding D(x,y) around x=y:
is satisfied by such
derivatives of D.
i) 2nd order: one (and the same) metric
ii) 3rd order: a pair of conjugate connections
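The second-order step can be sketched numerically. The block assumes Eguchi's formula in its usual form, g = -∂_x ∂_y D(x, y) evaluated at x = y, and uses a one-dimensional Bregman divergence with the illustrative choice F(t) = exp(t) + t². The helper names are hypothetical. For a Bregman divergence the induced metric should equal the second derivative of F.

```python
import numpy as np

# Assumed one-dimensional convex function and its derivative (illustrative only).
def F(t):
    return np.exp(t) + t**2

def dF(t):
    return np.exp(t) + 2.0 * t

def bregman(x, y):
    return F(x) - F(y) - (x - y) * dF(y)

def metric_from_divergence(D, t, h=1e-3):
    # Central finite difference for the mixed derivative d^2 D / (dx dy) at x = y = t;
    # the metric is minus this mixed derivative (assumed Eguchi formula).
    mixed = (D(t + h, t + h) - D(t + h, t - h)
             - D(t - h, t + h) + D(t - h, t - h)) / (4.0 * h * h)
    return -mixed

t = 0.3
g = metric_from_divergence(bregman, t)
hessian = np.exp(t) + 2.0   # F''(t) for the assumed F

print(g, hessian)
```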
α-Hessian Geometry (of Finite-Dimensional Vector Space)
Theorem. D^(α) induces the α-Hessian manifold, i.e.
ii) The Riemann curvature is given by:
iii) The manifold is equi-affine, with the Tchebychev potential given by:
and the α-parallel volume form given by:
iv) There exist biorthogonal coordinates:
From Vector Space to Function Space
Question: How to extend the above analysis to infinite-dimensional function space?
for any two functions in some function space, and an arbitrary, strictly
increasing function .
Remark: Induced by convex inequality
For parameterized pdf's, such a divergence induces an α-independent metric,
but α-dependent dual connections:
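The function-space divergence can be illustrated on a grid. The sketch assumes the form given in Zhang (2004), D^(α)(p, q) = 4/(1 - α²) ∫ [ (1-α)/2 f(ρ(p)) + (1+α)/2 f(ρ(q)) - f((1-α)/2 ρ(p) + (1+α)/2 ρ(q)) ] dμ, for a strictly convex f and a strictly increasing ρ. The Gaussian densities, the grid, and the function names are illustrative assumptions. With f = exp and ρ = log the expression should reduce to the classical α-divergence.

```python
import numpy as np

def divergence(p, q, alpha, f, rho, weights):
    # Assumed convexity-based divergence on a discretized function space.
    a = (1.0 - alpha) / 2.0
    b = (1.0 + alpha) / 2.0
    rp, rq = rho(p), rho(q)
    integrand = a * f(rp) + b * f(rq) - f(a * rp + b * rq)
    return 4.0 / (1.0 - alpha**2) * np.sum(integrand * weights)

def alpha_divergence_proper(p, q, alpha, weights):
    # Classical alpha-divergence, written for possibly unnormalized densities.
    a = (1.0 - alpha) / 2.0
    b = (1.0 + alpha) / 2.0
    return 4.0 / (1.0 - alpha**2) * np.sum((a * p + b * q - p**a * q**b) * weights)

# Two Gaussian densities on a uniform grid (illustrative data).
z = np.linspace(-5.0, 5.0, 2001)
weights = np.full_like(z, z[1] - z[0])
p = np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)
q = np.exp(-0.5 * (z - 1.0)**2) / np.sqrt(2.0 * np.pi)

alpha = 0.5
d_general = divergence(p, q, alpha, np.exp, np.log, weights)
d_proper = alpha_divergence_proper(p, q, alpha, weights)
d_self = divergence(p, p, alpha, np.exp, np.log, weights)

# A different pair (f(t) = t^2, rho = identity) gives the squared L2 distance.
d_quad = divergence(p, q, alpha, lambda t: t**2, lambda t: t, weights)
l2_squared = np.sum((p - q)**2 * weights)

print(d_general, d_proper, d_quad, l2_squared)
```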
Define the monotone embedding (“scaling”) of a measurable function p as the
transformation ρ(p), where
ρ is a strictly monotone function.
Therefore, monotone embeddings of a given probability density function
form a group, with functional composition as the group operation:
i) ρ is strictly monotone iff ρ^(-1) is strictly monotone;
ii) ρ(t) = t is the identity element;
iii) if ρ_1, ρ_2 are strictly monotone, so is their composition.
We recall that for a strictly convex function f:
DEFINITION: The ρ-embedding is said to be conjugate to the τ-embedding with
respect to a strictly convex function f (whose conjugate is f*) if:
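A concrete check of conjugate embeddings may be useful. The sketch assumes the condition as stated in Zhang (2004), τ(p) = f'(ρ(p)), equivalently ρ(p) = (f*)'(τ(p)). Both example pairs and all function names below are illustrative choices: the logarithmic pair with f = exp, and a power pair indexed by a parameter β.

```python
import numpy as np

p = np.linspace(0.05, 2.0, 50)   # sample values of a positive function

# Example 1 (assumed): f(t) = exp(t), f*(s) = s log s - s, rho = log.
def f_prime(t):
    return np.exp(t)

def f_star_prime(s):
    return np.log(s)

rho = np.log
tau_p = f_prime(rho(p))          # conjugate embedding; equals p itself
rho_back = f_star_prime(tau_p)   # recovers rho(p) through the conjugate function

# Example 2 (assumed): power embeddings with parameter beta.
beta = 0.5

def rho_beta(t):
    return 2.0 / (1.0 - beta) * t ** ((1.0 - beta) / 2.0)

def tau_beta(t):
    return 2.0 / (1.0 + beta) * t ** ((1.0 + beta) / 2.0)

def f_beta_prime(t):
    # Derivative of f(t) = 2/(1+beta) * ((1-beta)/2 * t) ** (2/(1-beta)).
    return 2.0 / (1.0 + beta) * ((1.0 - beta) / 2.0 * t) ** ((1.0 + beta) / (1.0 - beta))

tau_from_rho = f_beta_prime(rho_beta(p))

print(np.max(np.abs(tau_p - p)), np.max(np.abs(tau_from_rho - tau_beta(p))))
```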
A submanifold is said to be ρ-affine if there exists a countable set of linearly
independent functions λ_i(z) over a measurable space such that:
Here, θ is called the “natural parameter”. The “expectation parameter” is
defined by projecting the conjugate τ-embedding onto the λ_i(z):
Example: For the log-linear model (exponential family)
The expectation parameter is:
Parameterized Functions as Forming a Submanifold under Monotone Scaling
F(θ) is called the generating (partition) functional.
ii) Define, under the conjugate representations
then F* is the Fenchel conjugate of F.
F*(η) is called the generalized entropy functional.
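The duality between the generating functional and the generalized entropy functional can be sketched on a finite sample space. Everything specific here is an assumption made for illustration: ρ = log, f = exp (so τ is the identity and f*(s) = s log s - s), the counting measure on six points, and the two functions λ_1(z) = z, λ_2(z) = z². Under these assumptions the generating functional is the total mass of the unnormalized density, the expectation parameter should equal its gradient, and the two functionals should satisfy the Legendre relation F(θ) + F*(η) = <θ, η>.

```python
import numpy as np

z = np.linspace(0.0, 1.0, 6)     # finite sample space (counting measure)
lam = np.stack([z, z**2])        # assumed lambda_i(z), shape (2, 6)

def density(theta):
    # rho-affine family with rho = log: log p(z) = sum_i theta_i * lambda_i(z).
    return np.exp(theta @ lam)

def generating(theta):
    # Sum over z of f(rho(p(z))) with f = exp, i.e. the total mass.
    return np.sum(density(theta))

def expectation(theta):
    # eta_i = sum over z of tau(p(z)) * lambda_i(z), with tau the identity.
    return lam @ density(theta)

def entropy_functional(theta):
    # Sum over z of f*(tau(p(z))) with f*(s) = s log s - s.
    p = density(theta)
    return np.sum(p * np.log(p) - p)

theta = np.array([0.4, -1.2])
eta = expectation(theta)

# Finite-difference gradient of the generating functional.
h = 1e-6
grad = np.array([(generating(theta + h * e) - generating(theta - h * e)) / (2.0 * h)
                 for e in np.eye(2)])

legendre_gap = generating(theta) + entropy_functional(theta) - theta @ eta

print(eta, grad, legendre_gap)
```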
Proposition. For the ρ-affine submanifold:
Theorem. The ρ-affine submanifold is an α-Hessian manifold.
Take f=r-(b), where:
now denoted by β.
α: parameter reflecting reference duality
β: parameter reflecting representation duality
They reduce to the α-divergence proper A^(α) and to the Jensen difference E^(α):
Proposition 1. Denote tangent vector fields which are,
at given p on the manifold, themselves functions in Banach space. The metric
and dual connections induced by take the forms:
Written in dually
Information Geometry on Banach Space
Corollary 1a. For a finite-dimensional submanifold (parametric model), with
The metric and dual connections associated with are given by:
Remark: Choosing reduces to the forms of the Fisher
metric and the α-connections in classical parametric information geometry, where
Proposition 2. The curvature R^(α) and torsion T^(α) tensors associated with
any α-connection on the infinite-dimensional function space B are identically zero.
CAVEAT: Topology? (G. Pistone and his colleagues)
Proposition 3. The (α,β)-divergence for the parametric models gives rise to the Fisher metric proper and α-connections proper:
Remark: The (α,β)-divergence is the homogeneous f-divergence.
As such, it should reproduce the standard Fisher metric and the dual
α-connections in their proper form. Again, it is αβ that takes the role of
the conventional “alpha” parameter.
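The claim that the induced metric is the Fisher metric, whatever the value of the divergence parameter, can be checked on a simple parametric model. The sketch below makes several assumptions: it uses the classical α-divergence only (the β parameter is not modeled), it takes the metric as minus the mixed second derivative of the divergence on the diagonal (Eguchi's formula in its usual form), and it uses the Bernoulli family as the example. The function names are hypothetical.

```python
import numpy as np

def bernoulli(theta):
    # Probability vector of a Bernoulli distribution with success probability theta.
    return np.array([theta, 1.0 - theta])

def alpha_divergence(p, q, alpha):
    # Classical alpha-divergence between two discrete distributions.
    a = (1.0 - alpha) / 2.0
    b = (1.0 + alpha) / 2.0
    return 4.0 / (1.0 - alpha**2) * np.sum(a * p + b * q - p**a * q**b)

def induced_metric(theta, alpha, h=1e-4):
    # Minus the mixed second derivative of D(theta, theta') at theta = theta',
    # estimated with a central finite difference.
    def D(s, t):
        return alpha_divergence(bernoulli(s), bernoulli(t), alpha)
    mixed = (D(theta + h, theta + h) - D(theta + h, theta - h)
             - D(theta - h, theta + h) + D(theta - h, theta - h)) / (4.0 * h * h)
    return -mixed

theta = 0.3
fisher = 1.0 / (theta * (1.0 - theta))   # Fisher information of the Bernoulli model
metrics = [induced_metric(theta, alpha) for alpha in (-0.5, 0.0, 0.5, 3.0)]

print(fisher, metrics)
```

In this example every tested α returns the same metric to within discretization error, while the dual connections, which are third-order quantities, are the part that depends on α.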
equivalent to the δ-divergence (Zhu & Rohwer, 1995)
includes KL divergence as a special case
equivalent to the canonical divergence
cubic form, Tchebychev 1-form
Convex-based α-divergence for
vector space of finite dim
function space of infinite dim
Generalized expressions of
Zhang, J. (2004). Divergence function, duality, and convex analysis. Neural Computation, 16: 159-195.
Zhang, J. (2005) Referential duality and representational duality in the scaling of multidimensional and infinite-dimensional stimulus space. In Dzhafarov, E. and Colonius, H. (Eds.) Measurement and representation of sensations: Recent progress in psychological theory. Lawrence Erlbaum Associates, Mahwah, NJ.
Zhang, J. and Hästö, P. (2006). Statistical manifold as an affine space: A functional equation approach. Journal of Mathematical Psychology, 50: 60-65.
Zhang, J. (2006). Referential duality and representational duality on statistical manifolds. Proceedings of the Second International Symposium on Information Geometry and Its Applications, Tokyo (pp 58-67).
Zhang, J. (2007). A note on curvature of α-connections of a statistical manifold. Annals of the Institute of Statistical Mathematics, 59: 161-170.
Zhang, J. and Matsuzoe, H. (in press). Dualistic differential geometry associated with a convex function. To appear in a special volume in the Springer series of Advances in Mechanics and Mathematics.
Zhang, J. (under review) Nonparametric information geometry: Referential duality and representational duality on statistical manifolds.