Feature Selection as Relevant Information Encoding
Naftali Tishby
School of Computer Science and Engineering, The Hebrew University, Jerusalem, Israel
NIPS 2001
Many thanks to: Noam Slonim, Amir Globerson, Bill Bialek, Fernando Pereira, Nir Friedman

Feature Selection?
The document clusters preserve the relevant information between the documents and the words.
How much does X tell us about Y?
I(X;Y) is a function of the joint probability distribution p(x,y): the minimal number of yes/no questions (bits) that must be asked about x in order to learn all we can about Y.
Uncertainty removed about X when we know Y:
I(X;Y) = H(X) - H(X|Y) = H(Y) - H(Y|X)
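As a concrete check of this identity, here is a minimal sketch (the function name `mutual_information` is my own) that computes I(X;Y) in bits directly from a joint distribution:

```python
import numpy as np

def mutual_information(p_xy):
    """Mutual information I(X;Y) in bits, computed from the joint p(x,y)."""
    p_xy = np.asarray(p_xy, dtype=float)
    p_x = p_xy.sum(axis=1, keepdims=True)    # marginal p(x)
    p_y = p_xy.sum(axis=0, keepdims=True)    # marginal p(y)
    mask = p_xy > 0                          # use the 0 * log 0 = 0 convention
    return float((p_xy[mask] * np.log2(p_xy[mask] / (p_x @ p_y)[mask])).sum())

# Perfectly correlated bits: knowing Y removes all 1 bit of uncertainty about X.
p = np.array([[0.5, 0.0],
              [0.0, 0.5]])
print(mutual_information(p))  # prints 1.0
```

For an independent pair (a uniform joint), the same function returns 0, matching H(X) - H(X|Y) = 0.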
We seek a compressed representation X̂ of X that needs a short encoding (small I(X;X̂)), while preserving as much as possible of the information about the relevant signal (I(X̂;Y)).
We want a short representation of X that keeps the information about another variable, Y, if possible.
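This tradeoff can be stated compactly as a variational problem; the notation below (X̂ for the compressed representation, β for the tradeoff parameter) is the standard information-bottleneck notation rather than symbols taken from these slides:

```latex
\min_{p(\hat{x}\mid x)} \; \mathcal{L} \;=\; I(X;\hat{X}) \;-\; \beta\, I(\hat{X};Y)
```

Small β favors compression; large β favors preserving the information about Y.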
The self-consistent equations:
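The equations referred to here (which were images in the original slides) are, in the standard formulation of the information bottleneck by Tishby, Pereira, and Bialek, with Z(x, β) a normalization factor:

```latex
\begin{align}
p(\hat{x}\mid x) &= \frac{p(\hat{x})}{Z(x,\beta)}\,
  \exp\!\left(-\beta\, D_{\mathrm{KL}}\!\left[\,p(y\mid x)\,\|\,p(y\mid\hat{x})\,\right]\right)\\
p(y\mid\hat{x}) &= \frac{1}{p(\hat{x})}\sum_{x} p(y\mid x)\,p(\hat{x}\mid x)\,p(x)\\
p(\hat{x}) &= \sum_{x} p(\hat{x}\mid x)\,p(x)
\end{align}
```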
Assuming a continuous manifold for
Coupled (local in ) eigenfunction equations, with as an eigenvalue.
Extending the dependency graphs
This can be done by alternating maximization of entropy under the constraints:
The resulting functions are our relevant features at rank d.
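The alternating scheme can be sketched as the standard iterative information-bottleneck updates. This is a minimal sketch, not the exact algorithm from the slides; the function name `iterative_ib`, the cluster count, and the β value are my own choices:

```python
import numpy as np

def iterative_ib(p_xy, n_clusters, beta, n_iter=100, seed=0):
    """Alternate the IB self-consistent updates; returns soft assignments p(t|x)."""
    rng = np.random.default_rng(seed)
    nx, ny = p_xy.shape
    p_x = p_xy.sum(axis=1)                        # marginal p(x)
    p_y_x = p_xy / p_x[:, None]                   # conditional p(y|x)
    q = rng.random((nx, n_clusters))
    q /= q.sum(axis=1, keepdims=True)             # random initial p(t|x)
    eps = 1e-12                                   # guards against log(0) and /0
    for _ in range(n_iter):
        p_t = q.T @ p_x + eps                     # p(t) = sum_x p(t|x) p(x)
        p_y_t = (q * p_x[:, None]).T @ p_y_x / p_t[:, None]   # p(y|t)
        # KL divergence D[p(y|x) || p(y|t)] for every pair (x, t)
        kl = (p_y_x[:, None, :] *
              np.log((p_y_x[:, None, :] + eps) /
                     (p_y_t[None, :, :] + eps))).sum(axis=2)
        q = p_t[None, :] * np.exp(-beta * kl)     # unnormalized new p(t|x)
        q /= q.sum(axis=1, keepdims=True)
    return q

# Two blocks of x's with disjoint y-statistics should separate into two clusters.
p = np.array([[.20, .05, 0, 0], [.05, .20, 0, 0],
              [0, 0, .20, .05], [0, 0, .05, .20]])
q = iterative_ib(p, n_clusters=2, beta=5.0)
```

With a large enough β, the soft assignments sharpen and the clusters become the relevant features: each cluster's p(y|t) summarizes what its members tell about Y.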