Feature Selection as Relevant Information Encoding

Naftali Tishby
School of Computer Science and Engineering
The Hebrew University, Jerusalem, Israel

NIPS 2001

Many thanks to: Noam Slonim, Amir Globerson, Bill Bialek, Fernando Pereira, Nir Friedman

Feature Selection?
The document clusters preserve the relevant information between the documents and words.
How much does X tell us about Y?
I(X;Y) is a function of the joint probability distribution p(x,y): the minimal number of yes/no questions (bits) one needs to ask about x in order to learn all one can about Y.
Uncertainty removed about X when we know Y:
I(X;Y) = H(X) - H(X|Y) = H(Y) - H(Y|X) = Σ_{x,y} p(x,y) log [ p(x,y) / (p(x) p(y)) ]
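As a small illustration (toy numbers, not from the talk), mutual information can be computed directly from a joint probability table; a minimal NumPy sketch:

import numpy as np

def mutual_information(p_xy):
    """I(X;Y) in bits for a joint probability table p_xy (rows: x, cols: y)."""
    p_x = p_xy.sum(axis=1, keepdims=True)          # marginal p(x)
    p_y = p_xy.sum(axis=0, keepdims=True)          # marginal p(y)
    mask = p_xy > 0                                # convention: 0 log 0 = 0
    return (p_xy[mask] * np.log2(p_xy[mask] / (p_x @ p_y)[mask])).sum()

p_xy = np.array([[0.25, 0.25],
                 [0.00, 0.50]])
print(mutual_information(p_xy))                    # ~0.311 bits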
Bottlenecks and Neural Nets
We are given an input X that needs a short encoding into a bottleneck variable T (small I(X;T)), while preserving as much as possible the information on the relevant signal Y (large I(T;Y)). In short: we want a short representation of X that keeps the information about another variable, Y, as far as possible.
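Formally, this is the Information Bottleneck variational principle (Tishby, Pereira and Bialek, 1999), stated here with T as the bottleneck variable since the slide's own symbols did not survive extraction:

min over p(t|x):  L[p(t|x)] = I(X;T) - β I(T;Y)

The Lagrange multiplier β sets the trade-off between compression and preserved relevant information.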
The effective distortion measure that emerges is the KL divergence between the conditionals:
d(x,t) = D_KL[ p(y|x) || p(y|t) ]
The iterative algorithm (a generalized Blahut-Arimoto) alternates the self-consistent equations:
p(t|x) = [ p(t) / Z(x,β) ] exp( -β D_KL[ p(y|x) || p(y|t) ] )
p(t) = Σ_x p(x) p(t|x)
p(y|t) = (1/p(t)) Σ_x p(x,y) p(t|x)
where Z(x,β) is the normalization (partition) function.
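A minimal NumPy sketch of these updates, assuming a known discrete joint p(x,y); the parameter names (n_clusters, beta, n_iter) are illustrative choices, not from the talk:

import numpy as np

def kl_rows(P, Q):
    """D_KL between every row of P (|X|,|Y|) and every row of Q (|T|,|Y|) -> (|X|,|T|)."""
    eps = 1e-12
    return (P[:, None, :] * (np.log(P[:, None, :] + eps)
                             - np.log(Q[None, :, :] + eps))).sum(axis=2)

def information_bottleneck(p_xy, n_clusters=3, beta=5.0, n_iter=200, seed=0):
    p_x = p_xy.sum(axis=1)                                  # marginal p(x)
    p_y_given_x = p_xy / p_x[:, None]                       # p(y|x)
    rng = np.random.default_rng(seed)
    p_t_given_x = rng.dirichlet(np.ones(n_clusters), size=len(p_x))  # random init
    for _ in range(n_iter):
        p_t = p_x @ p_t_given_x                             # p(t) = sum_x p(x) p(t|x)
        p_xt = p_t_given_x * p_x[:, None]                   # joint p(x,t)
        p_y_given_t = (p_xt.T @ p_y_given_x) / p_t[:, None] # p(y|t) = sum_x p(x|t) p(y|x)
        d = kl_rows(p_y_given_x, p_y_given_t)               # D_KL[p(y|x) || p(y|t)]
        logits = np.log(p_t + 1e-12)[None, :] - beta * d    # log p(t) - beta d(x,t)
        logits -= logits.max(axis=1, keepdims=True)         # numerical stabilization
        p_t_given_x = np.exp(logits)
        p_t_given_x /= p_t_given_x.sum(axis=1, keepdims=True)  # normalization Z(x,beta)
    return p_t_given_x

# Toy joint over 6 x-values and 4 y-values:
p_xy = np.random.default_rng(1).random((6, 4))
p_xy /= p_xy.sum()
print(information_bottleneck(p_xy).round(3))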
Assuming a continuous manifold for the representation, the self-consistent equations become coupled (local in t) eigenfunction equations, with β as an eigenvalue.
Multivariate Information Bottleneck: extending the bottleneck principle to general dependency graphs.
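In the multivariate formulation (Friedman, Mosenzon, Slonim and Tishby, UAI 2001) the mutual information terms are replaced by multi-informations of two dependency graphs, G_in and G_out; the exact slide formulas are missing here, but the principle reads:

Multi-information: I(X_1;…;X_n) = D_KL[ p(x_1,…,x_n) || p(x_1)…p(x_n) ]
Principle: minimize L = I^{G_in} - β I^{G_out} over the compression distributions,
where I^{G} denotes the multi-information associated with the graph G.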
This can be done by alternating maximization of entropy under expectation constraints on d feature functions, ⟨φ_i(x)⟩ for i = 1,…,d. The resulting functions φ_1(x),…,φ_d(x) are our relevant features at rank d.
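A minimal sketch of one such maximum-entropy step over a finite alphabet under linear expectation constraints; this is a generic illustration, not the talk's full alternating procedure, and the function name maxent is invented. The maxent solution has the exponential form p(x) ∝ exp(Σ_i λ_i f_i(x)), and the multipliers λ are found by minimizing the convex dual log Z(λ) - λ·c:

import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp

def maxent(features, targets):
    """features: (k, n) array of f_i(x) over n states; targets: (k,) desired E[f_i].
    Returns the maximum-entropy distribution p over the n states."""
    def dual(lam):
        return logsumexp(lam @ features) - lam @ targets   # log Z(lam) - lam . c
    lam = minimize(dual, np.zeros(features.shape[0]), method="BFGS").x
    logits = lam @ features
    p = np.exp(logits - logits.max())                      # stable exponentiation
    return p / p.sum()

# Example: states {0,1,2,3}, constrain the mean to 2.0.
f = np.arange(4, dtype=float)[None, :]                     # one feature: f(x) = x
p = maxent(f, np.array([2.0]))
print(p, (p * f[0]).sum())                                 # expectation ~ 2.0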