## The Power of Word Clusters for Text Classification

Noam Slonim and Naftali Tishby. Presented by: Yangzhe Xiao

**Word-clusters vs words**

- Reduced feature dimensionality.
- More robust.
- 18% increase in accuracy.
- Challenge: group similar words into word-clusters that preserve the information about document categories, using the Information Bottleneck (IB) method.

**IB method is based on the following idea:**

Given the empirical joint distribution of two variables, one variable is compressed so that the mutual information about the other variable is preserved as much as possible.

- Find clusters of the members of the set $X$, denoted here by $\tilde{X}$, such that the mutual information $I(\tilde{X};Y)$ is maximized under a constraint on the information extracted from $X$, $I(\tilde{X};X)$.

**The problem has an optimal formal solution that makes no assumption about the origin of the joint distribution $p(x,y)$.**

$$p(\tilde{x}|x) = \frac{p(\tilde{x})}{Z(\beta,x)} \exp\left(-\beta\, D_{KL}\left[p(y|x)\,\|\,p(y|\tilde{x})\right]\right)$$

Here $D_{KL}$ is the Kullback-Leibler divergence between the conditional distributions $p(y|x)$ and $p(y|\tilde{x})$, and $Z(\beta,x)$ is a normalization factor. The single positive parameter $\beta$ determines the softness of the classification.

Figure: Normalized information curves for all 10 iterations, for large and small sample sizes.
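
The IB idea summarized in these slides (compress the words into clusters while keeping as much information about the categories as possible) can be illustrated with a small sketch. The code below is a minimal pure-Python version of the generic IB fixed-point iteration, in which each word's soft assignment to a cluster is proportional to the cluster prior times exp(-β · KL divergence) and the normalizer plays the role of Z(β,x). It is an assumption-laden illustration, not necessarily the clustering procedure the authors used: the function names (`kl_divergence`, `mutual_information`, `cluster_marginals`, `iterative_ib`), the toy joint distribution, the random initialization, and the fixed iteration count are all hypothetical choices. The variable `t` stands for a word-cluster.

```python
import math
import random


def kl_divergence(p, q, eps=1e-12):
    """D_KL[p || q] in nats; q is clipped at eps to avoid log(0)."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def mutual_information(joint):
    """I(A;B) in nats for a joint distribution given as a list of rows."""
    pa = [sum(row) for row in joint]
    pb = [sum(col) for col in zip(*joint)]
    return sum(
        pab * math.log(pab / (pa[i] * pb[j]))
        for i, row in enumerate(joint)
        for j, pab in enumerate(row)
        if pab > 0
    )


def cluster_marginals(p_t_given_x, p_x, p_y_given_x):
    """Return p(t) and p(y|t) implied by the soft assignment p(t|x)."""
    n_x, n_t, n_y = len(p_x), len(p_t_given_x[0]), len(p_y_given_x[0])
    p_t = [sum(p_t_given_x[i][t] * p_x[i] for i in range(n_x)) for t in range(n_t)]
    p_y_given_t = [
        [
            sum(p_y_given_x[i][j] * p_t_given_x[i][t] * p_x[i] for i in range(n_x))
            / max(p_t[t], 1e-300)
            for j in range(n_y)
        ]
        for t in range(n_t)
    ]
    return p_t, p_y_given_t


def iterative_ib(joint_xy, n_clusters, beta, n_iter=200, seed=0):
    """Soft-cluster the rows (x) of joint_xy; assumes every row has p(x) > 0."""
    rng = random.Random(seed)
    p_x = [sum(row) for row in joint_xy]
    p_y_given_x = [[v / p_x[i] for v in row] for i, row in enumerate(joint_xy)]

    # Random soft initialization of p(t|x); each row sums to 1.
    p_t_given_x = []
    for _ in p_x:
        w = [rng.random() + 1e-3 for _ in range(n_clusters)]
        total = sum(w)
        p_t_given_x.append([v / total for v in w])

    for _ in range(n_iter):
        p_t, p_y_given_t = cluster_marginals(p_t_given_x, p_x, p_y_given_x)
        for i, p_y_x in enumerate(p_y_given_x):
            # log of p(t) * exp(-beta * D_KL[p(y|x) || p(y|t)]), before normalizing
            log_w = [
                math.log(max(p_t[t], 1e-300)) - beta * kl_divergence(p_y_x, p_y_given_t[t])
                for t in range(n_clusters)
            ]
            top = max(log_w)  # subtract the max for numerical stability
            w = [math.exp(v - top) for v in log_w]
            z = sum(w)  # plays the role of the normalization factor Z(beta, x)
            p_t_given_x[i] = [v / z for v in w]

    p_t, p_y_given_t = cluster_marginals(p_t_given_x, p_x, p_y_given_x)
    return p_t_given_x, p_t, p_y_given_t


# Toy data (hypothetical): 4 words (rows) and 2 document categories (columns).
joint = [[0.20, 0.05], [0.20, 0.05], [0.05, 0.20], [0.05, 0.20]]
p_t_given_x, p_t, p_y_given_t = iterative_ib(joint, n_clusters=2, beta=10.0)
joint_ty = [[p_t[t] * v for v in p_y_given_t[t]] for t in range(len(p_t))]
print("I(X;Y) =", mutual_information(joint))
print("I(T;Y) =", mutual_information(joint_ty))
for word, row in enumerate(p_t_given_x):
    print("word", word, "p(t|x) =", [round(v, 3) for v in row])
```

In this toy setting, words with the same category profile p(y|x) receive the same cluster assignment, and the information the clusters carry about the categories cannot exceed that of the original words. Lowering `beta` makes the assignments softer; raising it pushes them toward a hard partition.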