Minimum Information Inference
Naftali Tishby, Amir Globerson
ICNC, CSE, The Hebrew University
TAU, Jan. 2, 2005

Talk outline:
- Classification with probabilistic models: generative vs. discriminative
- The Minimum Information Principle
- Generalization error bounds
Generalization error – it cannot be computed directly.
A generative (e.g. Gaussian) model fits p(x|y) from the mean and variance of the samples in class y.
Why not estimate it directly? Generative classifiers (implicitly) estimate p(x), which is neither needed nor known (important when X is very complex).
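The generative route above can be sketched in a few lines: fit p(x|y) per class from the sample mean and variance, then classify with Bayes' rule. A minimal sketch; the data and all numbers are illustrative, not from the talk. Note that p(x) cancels in the argmax even though the model (implicitly) specifies it:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D data for two classes (illustrative values).
x0 = rng.normal(-1.0, 1.0, 200)   # class 0 samples
x1 = rng.normal(+1.0, 1.0, 200)   # class 1 samples

# Generative step: estimate p(x|y) as a Gaussian from the mean and
# variance of the samples in each class, plus a class prior p(y).
mu = np.array([x0.mean(), x1.mean()])
var = np.array([x0.var(), x1.var()])
prior = np.array([0.5, 0.5])

def log_gauss(x, m, v):
    # Log-density of a univariate Gaussian N(m, v) at x.
    return -0.5 * np.log(2 * np.pi * v) - (x - m) ** 2 / (2 * v)

def predict(x):
    # Bayes' rule: argmax_y log p(y) + log p(x|y); p(x) cancels.
    scores = np.log(prior) + log_gauss(x, mu, var)
    return int(np.argmax(scores))

print(predict(-2.0), predict(2.0))
```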
e_Bayes ≤ (1/2) (H(Y) − I(X;Y))  (Hellman and Raviv, 1970).
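The Hellman–Raviv bound can be verified numerically on a small joint distribution (the distribution below is illustrative, not from the talk). Note that H(Y) − I(X;Y) = H(Y|X), so the bound is half the conditional entropy:

```python
import numpy as np

# Toy joint distribution p(x, y), rows indexed by x, columns by y
# (illustrative values summing to 1).
p = np.array([[0.30, 0.05],
              [0.10, 0.25],
              [0.05, 0.25]])

px = p.sum(axis=1)                 # marginal p(x)
py = p.sum(axis=0)                 # marginal p(y)

def H(q):
    # Shannon entropy in bits, ignoring zero-probability entries.
    q = q[q > 0]
    return -np.sum(q * np.log2(q))

I = H(px) + H(py) - H(p.ravel())   # I(X;Y) in bits

# Bayes error: for each x, the optimal rule errs with the smaller p(x, y).
bayes_err = np.sum(px - p.max(axis=1))

bound = 0.5 * (H(py) - I)          # = 0.5 * H(Y|X)
print(bayes_err, bound)
```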
expected value of φ(X)
f_MI(y|x)

A generalization bound
Given a sequence drawn from p(x), the probability that another, independently drawn sequence appears jointly typical with it – i.e., looks as if the pair were drawn from their joint distribution – decays like 2^(−n·I(X;Y)).
This suggests Minimum Mutual Information (MinMI) as a general principle for joint (typical) inference.
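A sanity check on the principle: when only the singleton marginals are constrained, the MI-minimizing joint is the product of the marginals (zero information). On a 2x2 alphabet the feasible set is one-dimensional, so a scan finds the minimizer exactly. A minimal sketch; the marginal values are illustrative, not from the talk:

```python
import numpy as np

# All joints p(x, y) on a 2x2 alphabet with fixed marginals
# p(x=0)=0.6 and p(y=0)=0.7 (illustrative) form a one-parameter
# family indexed by t = p(0, 0); feasibility requires 0.3 < t < 0.6.
PX0, PY0 = 0.6, 0.7

def mutual_info(t):
    # Joint determined by t and the two fixed marginals.
    p = np.array([[t, PX0 - t],
                  [PY0 - t, 1 - PX0 - PY0 + t]])
    if (p <= 0).any():
        return np.inf                     # outside the simplex
    px = p.sum(axis=1)
    py = p.sum(axis=0)
    return float(np.sum(p * np.log2(p / np.outer(px, py))))

# Scan the feasible interval; the minimizer should sit at the
# product distribution t = p(x=0) * p(y=0) = 0.42, with I = 0.
ts = np.linspace(0.301, 0.599, 2001)
t_star = ts[np.argmin([mutual_info(t) for t in ts])]
print(t_star, mutual_info(t_star))
```

With richer constraints (e.g. the class-conditional expectations of φ(X)), the minimizer is no longer the product distribution, which is where the MinMI solution acquires its log-linear form.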
Looks familiar?
Discriminative 1st-order log-linear models use singleton marginal constraints.
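A discriminative first-order log-linear model, p(y|x) ∝ exp(w·φ(x, y)), can be sketched as logistic regression trained by gradient ascent on the conditional log-likelihood. A minimal sketch; the data, the feature map φ(x) = [x, 1], and the step size are illustrative, not from the talk:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy 1-D data for two classes (illustrative values).
x = np.concatenate([rng.normal(-1, 1, 100), rng.normal(1, 1, 100)])
y = np.concatenate([np.zeros(100, int), np.ones(100, int)])
phi = np.stack([x, np.ones_like(x)], axis=1)   # feature map [x, 1]

w = np.zeros(2)   # weights for class 1 (class 0 weights fixed at 0)

# Gradient ascent on the conditional log-likelihood of the
# log-linear model p(y=1|x) = sigmoid(w . phi(x)).
for _ in range(500):
    p1 = 1 / (1 + np.exp(-phi @ w))
    grad = phi.T @ (y - p1) / len(y)
    w += 0.5 * grad

acc = np.mean(((1 / (1 + np.exp(-phi @ w))) > 0.5) == (y == 1))
print(acc)
```

Only p(y|x) is modeled; no density over X is estimated, in contrast to the generative route.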
The Information Bottleneck approach.