This presentation by Noam Slonim and Naftali Tishby, presented by Yangzhe Xiao, explores the advantages of using word clusters for text classification. By reducing feature dimensionality, word clusters offer a more robust representation, achieving up to an 18% increase in accuracy over word-based approaches. The challenge is to group similar words into clusters while preserving the information they carry about document categories. The Information Bottleneck method addresses this by seeking the clustering that retains as much of that mutual information as possible. The Agglomerative IB algorithm and an analysis of information curves for large and small sample sizes are also discussed.
The Power of Word Clusters for Text Classification
Noam Slonim and Naftali Tishby
Presented by: Yangzhe Xiao
Word-clusters vs. words
• Reduced feature dimensionality.
• More robust (less sensitive to sparse word statistics).
• Up to an 18% increase in classification accuracy.
• Challenge: group similar words into word-clusters that preserve the information about document categories -- the Information Bottleneck (IB) method.
The IB method is based on the following idea: given the empirical joint distribution of two variables, compress one variable so that the mutual information about the other variable is preserved as much as possible.
• Find clusters of the members of the set X, denoted here by X̃, such that the mutual information I(X̃;Y) is maximized, under a constraint on the information extracted from X, I(X̃;X).
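To make the objective concrete, here is a minimal sketch (not from the slides) of estimating the mutual information I(X;Y) in bits from an empirical joint distribution over words and document categories; the function name and the toy counts are illustrative:

```python
import numpy as np

def mutual_information(p_xy):
    """I(X;Y) in bits from an empirical joint distribution p(x, y).

    p_xy: 2-D array whose entries sum to 1 (rows index words x,
    columns index document categories y).
    """
    p_x = p_xy.sum(axis=1, keepdims=True)   # marginal p(x)
    p_y = p_xy.sum(axis=0, keepdims=True)   # marginal p(y)
    mask = p_xy > 0                          # skip empty cells to avoid log(0)
    return float(np.sum(p_xy[mask] * np.log2(p_xy[mask] / (p_x @ p_y)[mask])))

# Toy example: 3 words (rows) vs. 2 categories (columns).
counts = np.array([[30.0, 5.0],
                   [4.0, 25.0],
                   [10.0, 10.0]])
p_xy = counts / counts.sum()
print(mutual_information(p_xy))  # bits the word identity carries about the category
```

The IB objective is then to find a compressed variable X̃ that keeps I(X̃;Y) as close to I(X;Y) as possible.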
This problem has an optimal formal solution, without any assumption about the origin of the joint distribution p(x, y).
The solution takes the form

p(x̃|x) = p(x̃) / Z(β, x) · exp(−β · D_KL[ p(y|x) ‖ p(y|x̃) ]),

where D_KL[ p(y|x) ‖ p(y|x̃) ] is the Kullback-Leibler divergence between the conditional distributions p(y|x) and p(y|x̃), Z(β, x) is a normalization factor, and the single positive parameter β determines the softness of the classification.
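A minimal sketch of this soft assignment rule for a single word x, assuming the cluster prior p(x̃) and the cluster conditionals p(y|x̃) are given (in the full IB algorithm they are themselves updated iteratively); the function name and the eps guard are illustrative:

```python
import numpy as np

def ib_soft_assignment(p_y_given_x, p_y_given_xt, p_xt, beta):
    """Soft IB assignment p(x~|x) for one word x (a sketch, not the authors' code).

    p_y_given_x : conditional p(y|x), shape (n_categories,)
    p_y_given_xt: conditionals p(y|x~), one row per cluster, shape (n_clusters, n_categories)
    p_xt        : cluster prior p(x~), shape (n_clusters,)
    beta        : positive parameter; larger beta -> harder (more deterministic) assignment
    """
    eps = 1e-12  # guard against log(0) / division by zero
    # D_KL[p(y|x) || p(y|x~)] for each candidate cluster x~
    kl = np.sum(p_y_given_x * np.log((p_y_given_x + eps) / (p_y_given_xt + eps)), axis=1)
    unnormalized = p_xt * np.exp(-beta * kl)
    return unnormalized / unnormalized.sum()  # dividing by Z(beta, x)

# Two candidate clusters; the word's p(y|x) is much closer to cluster 0.
p_y_x  = np.array([0.8, 0.2])
p_y_xt = np.array([[0.7, 0.3], [0.1, 0.9]])
print(ib_soft_assignment(p_y_x, p_y_xt, p_xt=np.array([0.5, 0.5]), beta=1.0))
print(ib_soft_assignment(p_y_x, p_y_xt, p_xt=np.array([0.5, 0.5]), beta=20.0))  # near-hard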
Figure: normalized information curves for all 10 iterations, for large and small sample sizes.
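Curves of this kind can be traced with the agglomerative IB procedure mentioned in the summary: start with every word in its own cluster, repeatedly merge the pair of clusters whose merge loses the least information about Y, and record the retained fraction I(X̃;Y)/I(X;Y) after each merge. A minimal greedy sketch, assuming a joint distribution p(x, y) with no all-zero rows; the O(n²) pair search is kept naive for clarity:

```python
import numpy as np

def agglomerative_ib_curve(p_xy):
    """Greedy agglomerative IB (a sketch): merge the cheapest pair of clusters
    at each step and record the fraction of I(X;Y) the clustering retains."""
    eps = 1e-12
    kl = lambda a, b: np.sum(a * np.log2((a + eps) / (b + eps)), axis=-1)

    priors = list(p_xy.sum(axis=1))                       # p(x~), initially p(x)
    conds = list(p_xy / p_xy.sum(axis=1, keepdims=True))  # p(y|x~), initially p(y|x)
    p_y = p_xy.sum(axis=0)
    i_full = sum(p * kl(c, p_y) for p, c in zip(priors, conds))  # I(X;Y)

    curve = []
    while len(priors) > 1:
        best = None
        for i in range(len(priors)):
            for j in range(i + 1, len(priors)):
                p_merged = priors[i] + priors[j]
                merged = (priors[i] * conds[i] + priors[j] * conds[j]) / p_merged
                # information about Y lost by merging clusters i and j
                cost = priors[i] * kl(conds[i], merged) + priors[j] * kl(conds[j], merged)
                if best is None or cost < best[0]:
                    best = (cost, i, j, p_merged, merged)
        _, i, j, p_merged, merged = best
        for idx in sorted((i, j), reverse=True):
            priors.pop(idx); conds.pop(idx)
        priors.append(p_merged); conds.append(merged)
        i_now = sum(p * kl(c, p_y) for p, c in zip(priors, conds))
        curve.append((len(priors), i_now / i_full))       # normalized information
    return curve

# Toy example: 4 word rows, 2 category columns; similar rows merge first.
counts = np.array([[9.0, 1.0], [8.0, 2.0], [1.0, 9.0], [2.0, 8.0]])
for n_clusters, frac in agglomerative_ib_curve(counts / counts.sum()):
    print(n_clusters, round(frac, 3))
```

Plotting the retained fraction against the number of clusters yields a normalized information curve of the kind shown in the figure.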