Information Bottleneck

Presentation Transcript

  1. Information Bottleneck, presented by Boris Epshtein & Lena Gorelick. Advanced Topics in Computer and Human Vision, Spring 2004

  2. Agenda • Motivation • Information Theory - Basic Definitions • Rate Distortion Theory • Blahut-Arimoto algorithm • Information Bottleneck Principle • IB algorithms • iIB • dIB • aIB • Application

  3. Motivation Clustering Problem

  4. Motivation • “Hard” Clustering – partitioning of the input data into several exhaustive and mutually exclusive clusters • Each cluster is represented by a centroid

  5. Motivation • “Good” clustering – should group similar data points together and dissimilar points apart • Quality of partition – average distortion between the data points and corresponding representatives (cluster centroids)

  6. Motivation • “Soft” Clustering – each data point is assigned to all clusters with some normalized probability • Goal – minimize expected distortion between the data points and cluster centroids

  7. Motivation… Complexity-Precision Trade-off • Too simple a model → poor precision • Higher precision requires a more complex model

  8. Motivation… Complexity-Precision Trade-off • Too simple a model → poor precision • Higher precision requires a more complex model • Too complex a model → overfitting

  9. Motivation… Complexity-Precision Trade-off • Too Complex Model • can lead to overfitting • is hard to learn • Too Simple Model • cannot capture the real structure of the data • Both result in poor generalization • Examples of approaches: • SRM Structural Risk Minimization • MDL Minimum Description Length • Rate Distortion Theory

  10. Agenda • Motivation • Information Theory - Basic Definitions • Rate Distortion Theory • Blahut-Arimoto algorithm • Information Bottleneck Principle • IB algorithms • iIB • dIB • aIB • Application

  11. Definitions… Entropy • The measure of uncertainty about the random variable $X$: $H(X) = -\sum_x p(x) \log p(x)$

  12. Definitions… Entropy - Example • Fair Coin: $H(X) = -\tfrac{1}{2}\log_2\tfrac{1}{2} - \tfrac{1}{2}\log_2\tfrac{1}{2} = 1$ bit • Unfair Coin: $H(X) < 1$ bit
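The coin example can be checked numerically. The following is a minimal sketch, assuming entropy measured in bits, $H(X) = -\sum_x p(x) \log_2 p(x)$; the function name `entropy` and the 0.9/0.1 bias are illustrative choices, not values from the slides.

```python
import math

def entropy(probs):
    """Shannon entropy in bits of a discrete distribution (list of probabilities)."""
    # Terms with p = 0 contribute nothing, by the convention 0 * log 0 = 0.
    return -sum(p * math.log2(p) for p in probs if p > 0)

fair = entropy([0.5, 0.5])     # fair coin
biased = entropy([0.9, 0.1])   # hypothetical unfair coin (bias chosen for illustration)
certain = entropy([1.0, 0.0])  # deterministic outcome: no uncertainty
```

The fair coin attains the maximum for two outcomes, and any bias lowers the entropy toward zero.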

  13. Definitions… Entropy - Illustration [Figure: example distributions, labeled from highest to lowest entropy]

  14. Definitions… Conditional Entropy • The measure of uncertainty about the random variable $X$ given the value of the variable $Y$: $H(X|Y) = -\sum_{x,y} p(x,y) \log p(x|y)$

  15. Definitions… Conditional Entropy - Example
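The numbers of the original example are not preserved in the transcript, so the sketch below uses two hypothetical joint tables instead. It assumes $H(X|Y) = -\sum_{x,y} p(x,y) \log_2 p(x|y)$ and a joint distribution stored as `joint[x][y]`; all names are illustrative.

```python
import math

def conditional_entropy(joint):
    """H(X|Y) in bits, where joint[x][y] = p(x, y)."""
    n_y = len(joint[0])
    p_y = [sum(row[y] for row in joint) for y in range(n_y)]  # marginal p(y)
    h = 0.0
    for row in joint:
        for y, p_xy in enumerate(row):
            if p_xy > 0:
                # p(x|y) = p(x, y) / p(y)
                h -= p_xy * math.log2(p_xy / p_y[y])
    return h

# Hypothetical tables: X independent of Y, and X fully determined by Y.
independent = conditional_entropy([[0.25, 0.25], [0.25, 0.25]])
determined = conditional_entropy([[0.5, 0.0], [0.0, 0.5]])
```

When $Y$ says nothing about $X$ the full bit of uncertainty remains; when $Y$ determines $X$ none remains.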

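  16. Definitions… Mutual Information • $I(X;Y) = \sum_{x,y} p(x,y) \log \frac{p(x,y)}{p(x)\,p(y)} = H(X) - H(X|Y)$ • The reduction in uncertainty of $X$ due to the knowledge of $Y$ • Nonnegative • Symmetric • Convex w.r.t. $p(y|x)$ for a fixed $p(x)$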

  17. Definitions… Mutual Information - Example
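As a stand-in for the example whose numbers were lost, here is a small sketch that computes $I(X;Y) = \sum_{x,y} p(x,y) \log_2 \frac{p(x,y)}{p(x)p(y)}$ from a joint table. The tables and the name `mutual_information` are assumptions made for illustration.

```python
import math

def mutual_information(joint):
    """I(X;Y) in bits, where joint[x][y] = p(x, y)."""
    p_x = [sum(row) for row in joint]
    p_y = [sum(col) for col in zip(*joint)]
    return sum(
        p * math.log2(p / (p_x[i] * p_y[j]))
        for i, row in enumerate(joint)
        for j, p in enumerate(row)
        if p > 0
    )

independent = mutual_information([[0.25, 0.25], [0.25, 0.25]])  # X, Y independent
identical = mutual_information([[0.5, 0.0], [0.0, 0.5]])        # X determines Y
skewed = [[0.4, 0.1], [0.2, 0.3]]                               # hypothetical joint
forward = mutual_information(skewed)
backward = mutual_information([list(col) for col in zip(*skewed)])  # roles swapped
```

The last two values illustrate symmetry: swapping the roles of $X$ and $Y$ leaves the mutual information unchanged.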

  18. Definitions… Kullback Leibler Distance • $D_{KL}(p\|q) = \sum_x p(x) \log \frac{p(x)}{q(x)}$, for $p$ and $q$ over the same alphabet • A distance between distributions (not a true metric) • Nonnegative • Asymmetric
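The nonnegativity and asymmetry can be seen on a small case. This is a sketch assuming $D_{KL}(p\|q) = \sum_x p(x) \log_2 \frac{p(x)}{q(x)}$; the two distributions and the name `kl_divergence` are hypothetical.

```python
import math

def kl_divergence(p, q):
    """D_KL(p || q) in bits for two distributions over the same alphabet."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi == 0:
                return math.inf  # p puts mass where q has none
            total += pi * math.log2(pi / qi)
    return total

p = [0.5, 0.5]  # hypothetical distributions chosen for illustration
q = [0.9, 0.1]
forward = kl_divergence(p, q)
backward = kl_divergence(q, p)
same = kl_divergence(p, p)
```

Here `forward` and `backward` differ, which is why the quantity is not a metric even though it vanishes only when the two distributions coincide.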

  19. Agenda • Motivation • Information Theory - Basic Definitions • Rate Distortion Theory • Blahut-Arimoto algorithm • Information Bottleneck Principle • IB algorithms • iIB • dIB • aIB • Application

  20. Rate Distortion Theory - Introduction • Goal: obtain compact clustering of the data with minimal expected distortion • Distortion measure is a part of the problem setup • The clustering and its quality depend on the choice of the distortion measure

  21. Rate Distortion Theory • Obtain compact clustering of the data with minimal expected distortion, given a fixed set of representatives (Cover & Thomas)

  22. Rate Distortion Theory - Intuition • One extreme: zero distortion, not compact • Other extreme: high distortion, very compact

  23. Rate Distortion Theory – Cont. • The quality of clustering is determined by two quantities • Complexity is measured by $I(T;X)$ (a.k.a. Rate) • Distortion is measured by $E\,d(X,T) = \sum_{x,t} p(x)\,p(t|x)\,d(x,t)$

  24. Rate Distortion Plane [Figure: rate-distortion plane with Ed(X,T) as an axis label; D marks the distortion constraint; the two extremes are labeled Minimal Distortion and Maximal Compression]

  25. Rate Distortion Function • Let $D$ be an upper bound constraint on the expected distortion • Higher values of $D$ mean a more relaxed distortion constraint → stronger compression levels are attainable • Given the distortion constraint $D$, find the most compact model (with smallest complexity $I(T;X)$)

  26. Rate Distortion Function • Given • Set of points $X$ with prior $p(x)$ • Set of representatives $T$ • Distortion measure $d(x,t)$ • Find • The most compact soft clustering $p(t|x)$ of points of $X$ that satisfies the distortion constraint $D$ • Rate Distortion Function: $R(D) = \min_{\{p(t|x)\,:\,E\,d(X,T) \le D\}} I(T;X)$

  27. Rate Distortion Function • Minimize $F[p(t|x)] = I(T;X) + \beta\, E\,d(X,T)$ • Complexity term: $I(T;X)$ • Distortion term: $E\,d(X,T)$ • Lagrange multiplier: $\beta$
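To make the two terms concrete, the sketch below evaluates the complexity $I(T;X)$, the expected distortion $E\,d(X,T)$, and the functional $F = I(T;X) + \beta\,E\,d(X,T)$ for a given soft clustering. It assumes natural logarithms and list-of-lists inputs; the function name `rd_functional` and the toy numbers are illustrative, not taken from the slides.

```python
import math

def rd_functional(p_x, p_t_given_x, d, beta):
    """Return (I(T;X) in nats, E d(X,T), F = I(T;X) + beta * E d(X,T)).

    p_x[x] = p(x), p_t_given_x[x][t] = p(t|x), d[x][t] = d(x, t).
    """
    n_x, n_t = len(p_x), len(p_t_given_x[0])
    # Cluster marginal p(t) = sum_x p(x) p(t|x)
    p_t = [sum(p_x[x] * p_t_given_x[x][t] for x in range(n_x)) for t in range(n_t)]
    info = 0.0
    dist = 0.0
    for x in range(n_x):
        for t in range(n_t):
            p_xt = p_x[x] * p_t_given_x[x][t]
            if p_xt > 0:
                info += p_xt * math.log(p_t_given_x[x][t] / p_t[t])
                dist += p_xt * d[x][t]
    return info, dist, info + beta * dist

# Toy setup (hypothetical): two points, two representatives, 0/1 distortion.
p_x = [0.5, 0.5]
d = [[0.0, 1.0], [1.0, 0.0]]
one_per_point = rd_functional(p_x, [[1.0, 0.0], [0.0, 1.0]], d, beta=2.0)
single_cluster = rd_functional(p_x, [[1.0, 0.0], [1.0, 0.0]], d, beta=2.0)
```

The first assignment has zero distortion but maximal complexity ($\ln 2$ nats); the second has zero complexity but pays the distortion, matching the two extremes of the trade-off.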

  28. Rate Distortion Curve [Figure: rate-distortion curve with Ed(X,T) as an axis label; the two ends are labeled Minimal Distortion and Maximal Compression]

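  29. Rate Distortion Function • Minimize $F[p(t|x)] = I(T;X) + \beta\, E\,d(X,T)$ • Subject to $\sum_t p(t|x) = 1$ for every $x$ • The minimum is attained when $p(t|x) = \frac{p(t)}{Z(x,\beta)}\, e^{-\beta d(x,t)}$ • Normalization: $Z(x,\beta) = \sum_t p(t)\, e^{-\beta d(x,t)}$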

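  30. Solution - Analysis • Solution: $p(t|x) = \frac{p(t)}{Z(x,\beta)}\, e^{-\beta d(x,t)}$ • Known: $d(x,t)$ and $\beta$ • The solution is implicit: $p(t) = \sum_x p(x)\, p(t|x)$ itself depends on $p(t|x)$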

  31. Solution - Analysis • Solution: $p(t|x) = \frac{p(t)}{Z(x,\beta)}\, e^{-\beta d(x,t)}$ • For a fixed $t$: when $x$ is similar to $t$, $d(x,t)$ is small → closer points are attached to $t$ with higher probability

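  32. Solution - Analysis • Solution: $p(t|x) = \frac{p(t)}{Z(x,\beta)}\, e^{-\beta d(x,t)}$ • $\beta \to 0$ reduces the influence of distortion: for a fixed $t$, $p(t|x)$ does not depend on $x$; this + maximal compression → single cluster • $\beta \to \infty$: for a fixed $x$, most of the conditional probability goes to the $t$ with the smallest distortion → hard clustering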

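  33. Solution - Analysis • Solution: $p(t|x) = \frac{p(t)}{Z(x,\beta)}\, e^{-\beta d(x,t)}$ • Intermediate $\beta$ → soft clustering, intermediate complexity • Varying $\beta$ moves between the two extremes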
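The effect of $\beta$ on the solution $p(t|x) \propto p(t)\, e^{-\beta d(x,t)}$ can be seen for a single point. The sketch below holds $p(t)$ fixed, which is a simplification (in the full solution $p(t)$ depends on $p(t|x)$); the prior, the distortions, and the name `soft_assignment` are hypothetical.

```python
import math

def soft_assignment(p_t, d_x, beta):
    """p(t|x) proportional to p(t) * exp(-beta * d(x, t)) for one point x.

    p_t[t] = p(t), d_x[t] = d(x, t).
    """
    weights = [pt * math.exp(-beta * dt) for pt, dt in zip(p_t, d_x)]
    z = sum(weights)  # normalization Z(x, beta)
    return [w / z for w in weights]

p_t = [0.3, 0.7]  # hypothetical cluster prior, held fixed for illustration
d_x = [1.0, 4.0]  # hypothetical distortions from x to the two representatives

no_distortion_weight = soft_assignment(p_t, d_x, beta=0.0)  # equals p(t)
intermediate = soft_assignment(p_t, d_x, beta=0.5)          # soft assignment
near_hard = soft_assignment(p_t, d_x, beta=50.0)            # mass on the closest t
```

At $\beta = 0$ the assignment ignores the point entirely, and as $\beta$ grows the mass concentrates on the representative with the smallest distortion.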

  34. Agenda • Motivation • Information Theory - Basic Definitions • Rate Distortion Theory • Blahut-Arimoto algorithm • Information Bottleneck Principle • IB algorithms • iIB • dIB • aIB • Application

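  35. Blahut – Arimoto Algorithm • Input: $p(x)$, $T$, $d(x,t)$, $\beta$ • Randomly init $p(t)$ • Repeat until convergence: $p(t|x) \leftarrow \frac{p(t)}{Z(x,\beta)}\, e^{-\beta d(x,t)}$, then $p(t) \leftarrow \sum_x p(x)\, p(t|x)$ • Optimize convex function over convex set → the minimum is global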
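The alternating updates can be sketched in a few lines. This is a minimal illustration, assuming the two-step iteration $p(t|x) \propto p(t)\, e^{-\beta d(x,t)}$ followed by $p(t) = \sum_x p(x)\, p(t|x)$; it starts from a uniform $p(t)$ rather than a random one so that the result is reproducible, runs a fixed number of iterations instead of testing convergence, and uses a hypothetical one-dimensional data set with squared distance as the distortion.

```python
import math

def blahut_arimoto(p_x, d, beta, n_iter=200):
    """Alternate the two updates for a fixed set of representatives.

    p_x[x] = p(x), d[x][t] = d(x, t). Returns (p(t|x), p(t)).
    Very large beta * d values can underflow exp(); not handled in this sketch.
    """
    n_x, n_t = len(d), len(d[0])
    p_t = [1.0 / n_t] * n_t  # uniform start (assumption; the slide uses a random init)
    p_t_given_x = [list(p_t) for _ in range(n_x)]
    for _ in range(n_iter):
        # Step 1: p(t|x) <- p(t) exp(-beta d(x,t)) / Z(x, beta)
        for x in range(n_x):
            weights = [p_t[t] * math.exp(-beta * d[x][t]) for t in range(n_t)]
            z = sum(weights)
            p_t_given_x[x] = [w / z for w in weights]
        # Step 2: p(t) <- sum_x p(x) p(t|x)
        p_t = [sum(p_x[x] * p_t_given_x[x][t] for x in range(n_x)) for t in range(n_t)]
    return p_t_given_x, p_t

# Hypothetical data: four points on a line, two fixed representatives.
points = [0.0, 1.0, 10.0, 11.0]
reps = [0.5, 10.5]
d = [[(x - t) ** 2 for t in reps] for x in points]
p_x = [0.25] * 4

soft, p_t = blahut_arimoto(p_x, d, beta=1.0)
flat, flat_p_t = blahut_arimoto(p_x, d, beta=0.0)
```

With $\beta = 1$ each point ends up almost entirely with its nearby representative, while $\beta = 0$ leaves every point with the same assignment $p(t)$.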

  36. Blahut-Arimoto Algorithm Advantages: • Obtains compact clustering of the data with minimal expected distortion • Optimal clustering given fixed set of representatives

  37. Blahut-Arimoto Algorithm Drawbacks: • Distortion measure is a part of the problem setup • Hard to obtain for some problems • Equivalent to determining relevant features • Fixed set of representatives • Slow convergence

  38. Rate Distortion Theory – Additional Insights • Another problem would be to find optimal representatives given the clustering. • Joint optimization of clustering and representatives doesn’t have a unique solution. (like EM or K-means)

  39. Agenda • Motivation • Information Theory - Basic Definitions • Rate Distortion Theory • Blahut-Arimoto algorithm • Information Bottleneck Principle • IB algorithms • iIB • dIB • aIB • Application

  40. Information Bottleneck • Copes with the drawbacks of Rate Distortion approach • Compress the data while preserving “important” (relevant) information • It is often easier to define what information is important than to define a distortion measure. • Replace the distortion upper bound constraint by a lower bound constraint over the relevant information Tishby, Pereira & Bialek, 1999

  41. Information Bottleneck - Example • Given: • Documents • Topics • Joint prior

  42. Information Bottleneck - Example • Obtain: a partitioning of the words into clusters • Words and topics: I(Word;Topic) • Words and clusters: I(Word;Cluster) • Clusters and topics: I(Cluster;Topic)

  43. Information Bottleneck - Example • Extreme case 1: • I(Cluster;Topic)=0 → not informative • I(Word;Cluster)=0 → very compact

  44. Information Bottleneck - Example • Extreme case 2: • I(Cluster;Topic)=max → very informative • I(Word;Cluster)=max → not compact • Goal: minimize I(Word;Cluster) & maximize I(Cluster;Topic)

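  45. Information Bottleneck [Figure: words are compressed into clusters, which are related to topics; the words-clusters link is labeled Compactness and the clusters-topics link is labeled Relevant Information]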

  46. Relevance Compression Curve [Figure: relevance-compression curve; D marks the relevance constraint; the two ends are labeled Maximal Relevant Information and Maximal Compression]

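  47. Relevance Compression Function • Let $D$ be the minimal allowed value of the relevant information $I(T;Y)$, where $Y$ is the relevance variable (topics in the example) • Smaller $D$ → more relaxed relevant information constraint → stronger compression levels are attainable • Given the relevant information constraint $D$, find the most compact model (with smallest $I(T;X)$)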

  48. Relevance Compression Function • Minimize $L[p(t|x)] = I(T;X) - \beta\, I(T;Y)$ • Compression term: $I(T;X)$ • Relevance term: $I(T;Y)$ • Lagrange multiplier: $\beta$
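The sketch below evaluates both terms of the functional $L = I(T;X) - \beta\, I(T;Y)$ for a given soft clustering $p(t|x)$. It assumes that $T$ depends on $Y$ only through $X$, so that $p(t,y) = \sum_x p(x,y)\, p(t|x)$, and uses natural logarithms; the word-topic table and all function names are hypothetical.

```python
import math

def mutual_information(joint):
    """Mutual information in nats of a joint table joint[a][b] = p(a, b)."""
    p_a = [sum(row) for row in joint]
    p_b = [sum(col) for col in zip(*joint)]
    return sum(
        p * math.log(p / (p_a[i] * p_b[j]))
        for i, row in enumerate(joint)
        for j, p in enumerate(row)
        if p > 0
    )

def ib_functional(joint_xy, p_t_given_x, beta):
    """Return (I(T;X), I(T;Y), L = I(T;X) - beta * I(T;Y))."""
    n_x, n_y, n_t = len(joint_xy), len(joint_xy[0]), len(p_t_given_x[0])
    p_x = [sum(row) for row in joint_xy]
    # p(t, x) = p(x) p(t|x)
    joint_tx = [[p_x[x] * p_t_given_x[x][t] for x in range(n_x)] for t in range(n_t)]
    # p(t, y) = sum_x p(x, y) p(t|x)
    joint_ty = [
        [sum(joint_xy[x][y] * p_t_given_x[x][t] for x in range(n_x)) for y in range(n_y)]
        for t in range(n_t)
    ]
    i_tx = mutual_information(joint_tx)
    i_ty = mutual_information(joint_ty)
    return i_tx, i_ty, i_tx - beta * i_ty

# Hypothetical joint p(word, topic): 4 words, 2 topics.
joint = [[0.20, 0.05], [0.20, 0.05], [0.05, 0.20], [0.05, 0.20]]
single_cluster = ib_functional(joint, [[1.0, 0.0]] * 4, beta=5.0)
two_clusters = ib_functional(joint, [[1, 0], [1, 0], [0, 1], [0, 1]], beta=5.0)
```

The single cluster is maximally compact but carries no relevant information. The two-cluster partition pays $\ln 2$ nats of complexity and, because the words within each cluster share the same topic distribution, keeps all of $I(X;Y)$.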

  49. Relevance Compression Curve [Figure: relevance-compression curve; the two ends are labeled Maximal Relevant Information and Maximal Compression]

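  50. Relevance Compression Function • Minimize $L[p(t|x)] = I(T;X) - \beta\, I(T;Y)$ • Subject to $\sum_t p(t|x) = 1$ for every $x$ • The minimum is attained when $p(t|x) = \frac{p(t)}{Z(x,\beta)}\, e^{-\beta\, D_{KL}[p(y|x)\,\|\,p(y|t)]}$ • Normalization: $Z(x,\beta) = \sum_t p(t)\, e^{-\beta\, D_{KL}[p(y|x)\,\|\,p(y|t)]}$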
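One way to read this condition is as a fixed point that can be approached by iteration. The sketch below is an assumption-laden illustration, not the presenters' algorithm: it repeatedly recomputes $p(t) = \sum_x p(x)\, p(t|x)$ and $p(y|t) = \frac{1}{p(t)} \sum_x p(x,y)\, p(t|x)$, then applies $p(t|x) \propto p(t)\, e^{-\beta\, D_{KL}[p(y|x)\|p(y|t)]}$. It assumes a strictly positive joint table and clusters that keep nonzero mass, and the word-topic table and starting assignment are hypothetical.

```python
import math

def kl(p, q):
    """D_KL(p || q) in nats; assumes q > 0 wherever p > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def ib_iterate(joint, p_t_given_x, beta, n_iter=100):
    """Iterate the self-consistent equations; joint[x][y] = p(x, y) > 0."""
    n_x, n_y, n_t = len(joint), len(joint[0]), len(p_t_given_x[0])
    p_x = [sum(row) for row in joint]
    p_y_given_x = [[p_xy / p_x[x] for p_xy in joint[x]] for x in range(n_x)]
    for _ in range(n_iter):
        # p(t) = sum_x p(x) p(t|x)
        p_t = [sum(p_x[x] * p_t_given_x[x][t] for x in range(n_x)) for t in range(n_t)]
        # p(y|t) = (1 / p(t)) sum_x p(x, y) p(t|x)
        p_y_given_t = [
            [sum(joint[x][y] * p_t_given_x[x][t] for x in range(n_x)) / p_t[t]
             for y in range(n_y)]
            for t in range(n_t)
        ]
        # p(t|x) proportional to p(t) exp(-beta * KL[p(y|x) || p(y|t)])
        updated = []
        for x in range(n_x):
            weights = [
                p_t[t] * math.exp(-beta * kl(p_y_given_x[x], p_y_given_t[t]))
                for t in range(n_t)
            ]
            z = sum(weights)
            updated.append([w / z for w in weights])
        p_t_given_x = updated
    return p_t_given_x

# Hypothetical joint p(word, topic) and a slightly asymmetric starting assignment.
joint = [[0.20, 0.05], [0.20, 0.05], [0.05, 0.20], [0.05, 0.20]]
start = [[0.6, 0.4], [0.55, 0.45], [0.4, 0.6], [0.45, 0.55]]

clusters = ib_iterate(joint, start, beta=10.0)
trivial = ib_iterate(joint, start, beta=0.0)
```

For this table the iteration groups the first two words and the last two words, the pairs with identical topic distributions, while $\beta = 0$ collapses everything to $p(t|x) = p(t)$. Unlike the rate-distortion case, the result here can depend on the starting assignment.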