
Presentation Transcript


  1. Using Discretization and Bayesian Inference Network Learning for Automatic Filtering Profile Generation Authors: Wai Lam and Kon Fan Low Announcer: Kyu-Baek Hwang

  2. Contents • Introduction • Overview of the approach • Automatic document pre-processing • Feature selection • Feature discretization • Learning Bayesian networks • Experiments and results • Conclusions and future work

  3. Contents • Introduction • Overview of the approach • Automatic document pre-processing • Feature selection • Feature discretization • Learning Bayesian networks • Experiments and results • Conclusions and future work

  4. Information Filtering

  5. The Filtering Profile • An information filtering system deals with users who have relatively stable, long-term information needs. • Such an information need is usually represented by a filtering profile.

  6. Construction of the Filtering Profile • Collect training data through interactions with users. • Ex) gathering user feedback about relevance judgments for a certain information need or topic. • Analyze this training data and construct the filtering profile with machine learning techniques. • Use this filtering profile to determine the relevance of new documents.

  7. The Uncertainty Issue • It is difficult to specify absolutely whether a document is relevant to a topic, since it may match the topic only partially. • Ex) “the economic policy of government” • A probabilistic approach is therefore appropriate for this kind of task.

  8. Contents • Introduction • Overview of the approach • Automatic document pre-processing • Feature selection • Feature discretization • Learning Bayesian networks • Experiments and results • Conclusions and future work

  9. An Overview of the Approach • For each topic: • Gather training data by interactions with users • Transform each document into an internal form • Feature selection • Discretization of the feature values • Bayesian network learning

  10. Contents • Introduction • Overview of the approach • Automatic document pre-processing • Feature selection • Feature discretization • Learning Bayesian networks • Experiments and results • Conclusions and future work

  11. Document Representation • All stop words are eliminated. • Ex) “the”, “are”, “and”, etc. • Stemming of the remaining words. • Ex) “looks” → “look”, “looking” → “look”, etc. • A document is represented in vector form. • Each element in the vector is either the word frequency or the word weight. • The word weight is calculated as w_i = f_i \cdot \log(N / n_i), • where f_i is the frequency of term i in the document, N is the total number of documents, and n_i is the number of documents that contain term i.
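
A minimal sketch of this preprocessing step. The stop-word list, the crude suffix-stripping stemmer, and the tf-idf style weight are illustrative assumptions; the paper's exact word lists and stemmer (e.g. Porter) may differ.

```python
import math
import re

STOP_WORDS = {"the", "are", "and", "is", "a", "of", "to"}  # illustrative subset

def stem(word):
    # Crude suffix stripping, standing in for a real stemmer such as Porter's.
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    """Lowercase, drop stop words, stem the rest."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [stem(t) for t in tokens if t not in STOP_WORDS]

def word_weights(doc_tokens, doc_freq, num_docs):
    """Word weight w_i = f_i * log(N / n_i) for each term in one document."""
    weights = {}
    for term in doc_tokens:
        weights[term] = weights.get(term, 0) + 1     # f_i: term frequency
    for term, tf in weights.items():
        n_i = doc_freq.get(term, 1)                  # n_i: docs containing term i
        weights[term] = tf * math.log(num_docs / n_i)
    return weights
```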

  12. Word Frequency Representation of a Document

  13. Feature Selection • The expected mutual information measure is given as I(W_i; C_j) = \sum_{w \in \{0,1\}} \sum_{c \in \{C_j, \bar{C}_j\}} P(w, c) \log \frac{P(w, c)}{P(w) P(c)}, • where W_i is a feature and C_j denotes the fact that the document is relevant to topic j. • Mutual information measures the information contained in the term W_i about topic j. • A document is then represented by the vector of its selected feature values.
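
A sketch of the expected mutual information computation over binary term occurrence. The `(token_set, relevant_topics)` pairs are an assumed data representation, not the paper's exact structures.

```python
import math

def expected_mutual_information(docs, term, topic):
    """EMIM between binary term occurrence W_i and topic relevance C_j.

    `docs` is a list of (token_set, relevant_topics) pairs -- an assumed
    representation for illustration.
    """
    n = len(docs)
    counts = {(w, c): 0 for w in (0, 1) for c in (0, 1)}
    for tokens, topics in docs:
        w = 1 if term in tokens else 0
        c = 1 if topic in topics else 0
        counts[(w, c)] += 1
    emim = 0.0
    for (w, c), k in counts.items():
        if k == 0:
            continue                                  # 0 * log(...) term vanishes
        p_wc = k / n
        p_w = (counts[(w, 0)] + counts[(w, 1)]) / n   # marginal P(w)
        p_c = (counts[(0, c)] + counts[(1, c)]) / n   # marginal P(c)
        emim += p_wc * math.log(p_wc / (p_w * p_c))
    return emim
```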

  14. Contents • Introduction • Overview of the approach • Automatic document pre-processing • Feature selection • Feature discretization • Learning Bayesian networks • Experiments and results • Conclusions and future work

  15. Discretization Scheme • The goal of discretization is to find a mapping m such that each feature value is represented by a discrete value. • The mapping is characterized by a series of threshold levels (0, w_1, …, w_k), • where 0 < w_1 < w_2 < … < w_k. • The mapping m has the property m(q) = i whenever w_{i-1} \le q < w_i (taking w_0 = 0), • where q is the feature value.
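
A minimal sketch of such a mapping, assuming 0-based discrete levels and interior thresholds only (the function name is illustrative):

```python
def discretize(q, thresholds):
    """Map feature value q to a discrete level given sorted interior
    thresholds (w_1 < ... < w_k); values above the last threshold fall
    into the top level."""
    for level, w in enumerate(thresholds):
        if q < w:
            return level
    return len(thresholds)  # top region: q >= w_k
```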

  16. Predefined-Level Discretization • One determines the discretization level k and the threshold values in advance. • Ex) Integers between 0 and 15 are discretized into three levels by the threshold values 5.5 and 10.5.
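
With the hypothetical `discretize` helper above, the slide's example plays out as:

```python
>>> [discretize(v, [5.5, 10.5]) for v in range(16)]
[0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2]
```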

  17. Lloyd’s Algorithm • Considers the distribution of feature values. • Step 1: determine the discretization level k. • Step 2: select the initial threshold levels (y_1, y_2, …, y_{k-1}). • Step 3: repeat the following steps for all i. • Calculate the mean feature value μ_i of the i-th region. • Generate all possible threshold levels between μ_i and μ_{i+1}. • Select the threshold level which minimizes the distortion measure D = \sum_i \sum_{q \in R_i} (q - μ_i)^2, where R_i is the i-th region. • Step 4: If the distortion measure of the new set of threshold levels is less than that of the old set, go to Step 3.
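
A runnable sketch of a Lloyd-style 1-D quantizer. It uses the classical squared-error shortcut, in which each new threshold is the midpoint of the two adjacent region means, in place of the slide's exhaustive candidate search, and it assumes no region goes empty.

```python
def lloyd_thresholds(values, k, iters=100):
    """Return k-1 thresholds partitioning `values` into k regions.

    Classical Lloyd update for squared-error distortion: the optimal
    threshold between two regions is the midpoint of their means. This is
    a standard shortcut for the candidate search described on the slide.
    Assumes enough distinct values that no region goes empty.
    """
    values = sorted(values)
    lo, hi = values[0], values[-1]
    thresholds = [lo + (hi - lo) * i / k for i in range(1, k)]  # Step 2: even spacing
    for _ in range(iters):                                      # Steps 3-4
        regions = [[] for _ in range(k)]
        for v in values:
            regions[sum(v >= t for t in thresholds)].append(v)
        means = [sum(r) / len(r) for r in regions]              # mean of each region
        new = [(means[i] + means[i + 1]) / 2 for i in range(k - 1)]
        if new == thresholds:                                   # converged
            break
        thresholds = new
    return thresholds
```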

  18. Relevance Dependence Discretization (1/3) • Considers the dependency between the feature and the relevance of the topic. • The relevance information entropy is given as Ent(S) = -\sum_j P(C_j, S) \log_2 P(C_j, S), • where S is the group of feature values and P(C_j, S) is the proportion of values in S whose documents fall in relevance class C_j.

  19. Relevance Dependence Discretization (2/3) • The partition entropy of the regions induced by threshold w is defined as E(w; S) = \frac{|S_1|}{|S|} Ent(S_1) + \frac{|S_2|}{|S|} Ent(S_2), • where S_1 is the subset of S with feature values smaller than w and S_2 is S − S_1. • The more homogeneous the regions, the smaller the partition entropy. • The partition entropy controls the recursive partition algorithm.

  20. Relevance Dependence Discretization (3/3) • A criterion for the recursive partition algorithm (the Fayyad–Irani MDL test) is: accept threshold w iff Ent(S) − E(w; S) > \frac{\log_2(N - 1)}{N} + \frac{\Delta(w; S)}{N}, with N = |S|, • where Δ(w; S) is defined as \Delta(w; S) = \log_2(3^k - 2) - [k \cdot Ent(S) - k_1 \cdot Ent(S_1) - k_2 \cdot Ent(S_2)], • where • k is the number of relevance classes in the partition S; • k_1 is the number of relevance classes in the partition S_1; • k_2 is the number of relevance classes in the partition S_2.
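
A compact sketch of the recursive relevance-dependence partition, following the Fayyad–Irani MDL criterion above. The `(value, class)` pair representation is an assumption for illustration.

```python
import math

def entropy(labels):
    """Relevance information entropy of a list of class labels."""
    n = len(labels)
    return -sum((labels.count(c) / n) * math.log2(labels.count(c) / n)
                for c in set(labels))

def mdlp_split(pairs):
    """Recursively partition (value, class) pairs; return accepted thresholds."""
    pairs = sorted(pairs)
    labels = [c for _, c in pairs]
    n = len(pairs)
    best = None
    for i in range(1, n):                         # candidate cut points
        if pairs[i - 1][0] == pairs[i][0]:
            continue                              # no boundary between equal values
        s1, s2 = labels[:i], labels[i:]
        e = (len(s1) * entropy(s1) + len(s2) * entropy(s2)) / n
        if best is None or e < best[0]:
            best = (e, i, s1, s2)                 # minimal partition entropy
    if best is None:
        return []
    e, i, s1, s2 = best
    gain = entropy(labels) - e
    k, k1, k2 = len(set(labels)), len(set(s1)), len(set(s2))
    delta = math.log2(3 ** k - 2) - (k * entropy(labels)
                                     - k1 * entropy(s1) - k2 * entropy(s2))
    if gain <= math.log2(n - 1) / n + delta / n:
        return []                                 # MDL criterion rejects the split
    w = (pairs[i - 1][0] + pairs[i][0]) / 2       # threshold between the two values
    return mdlp_split(pairs[:i]) + [w] + mdlp_split(pairs[i:])
```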

  21. Contents • Introduction • Overview of the approach • Automatic document pre-processing • Feature selection • Feature discretization • Learning Bayesian networks • Experiments and results • Conclusions and future work

  22. Bayesian Inference for Document Classification • The probability of C_j given document D follows from Bayes’ theorem: P(C_j | D) = \frac{P(D | C_j) \, P(C_j)}{P(D)}.
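
A minimal sketch of Bayes-rule classification over discretized features. It assumes conditional independence between features, which is the naive simplification; the learned Bayesian networks of the following slides relax exactly this assumption.

```python
def posterior(doc_features, priors, likelihoods):
    """P(C_j | D) via Bayes' theorem under a feature-independence assumption.

    priors[c] = P(c); likelihoods[c][i][v] = P(feature i takes value v | c).
    Both are assumed pre-estimated from training data.
    """
    scores = {}
    for c, prior in priors.items():
        p = prior
        for i, v in enumerate(doc_features):
            p *= likelihoods[c][i][v]             # P(D | c) as a product
        scores[c] = p
    z = sum(scores.values())                      # P(D): normalizing constant
    return {c: p / z for c, p in scores.items()}
```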

  23. Background of Bayesian Networks • [Figure: an example network with a class node C and feature nodes T1–T5.] • The process of inference is to use the evidence from nodes that have observations to find the probability of other nodes in the network.

  24. Learning Bayesian Networks • Parametric learning • The conditional probability for each node is estimated from the training data. • Structural learning • Best-first search • MDL score • A classification-based network simplifies the structural learning process.
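
A sketch of the parametric learning step: estimating each node's conditional probabilities from training counts. The dict-of-cases representation and the add-one (Laplace) smoothing are assumptions, not necessarily the paper's estimator.

```python
from collections import Counter

def estimate_cpt(data, node, parents, arity):
    """Estimate P(node | parents) by counting over training cases.

    `data` is a list of dicts mapping variable name -> discrete value;
    `arity[v]` is the number of states of variable v. Laplace smoothing
    is used here to avoid zero probabilities (an assumption).
    """
    joint = Counter()
    parent_counts = Counter()
    for case in data:
        pa = tuple(case[p] for p in parents)
        joint[(case[node], pa)] += 1
        parent_counts[pa] += 1

    def prob(x, pa):
        return (joint[(x, pa)] + 1) / (parent_counts[pa] + arity[node])

    return prob
```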

  25. MDL Score for Bayesian Networks • The MDL (Minimum Description Length) score for a Bayesian network B is defined as MDL(B) = \sum_X MDL(X), • where X ranges over the nodes in the network. • The score for each node is the sum of a network description length and a data description length: MDL(X) = L_{network}(X) + L_{data}(X).

  26. Complexity of the Network Structure • L_{network} is the network description length; it corresponds to the topological complexity of the network and, in the standard formulation, is computed for a node T_{ji} with parent set Pa as L_{network} = |Pa| \log_2 n + \frac{\log_2 N}{2} (s_j - 1) \prod_{l \in Pa} s_l, • where N is the number of training documents, n is the number of nodes, and s_j is the number of possible states the variable T_{ji} can take.

  27. Accuracy of the Network Structure • The data description length is given by the following formula: L_{data} = \sum_{x, pa} M(x, pa) \log_2 \frac{M(pa)}{M(x, pa)}, • where M(·) is the number of cases that match a particular instantiation in the training data. • The more accurate the network, the shorter this length.
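
A sketch putting the two description lengths together into one MDL score (lower is better). The encoding constants follow the common Lam–Bacchus formulation, which is an assumption about the paper's exact coding scheme.

```python
import math
from collections import Counter

def mdl_score(data, structure, arity):
    """MDL score of a discrete Bayesian network over training `data`.

    `structure` maps node -> tuple of parent names; `arity[v]` is the
    number of states of variable v; `data` is a list of dicts mapping
    variable name -> value.
    """
    N, n = len(data), len(structure)
    score = 0.0
    for node, parents in structure.items():
        # Network description length: parents list + parameter table.
        params = (arity[node] - 1) * math.prod(arity[p] for p in parents)
        l_net = len(parents) * math.log2(n) + (math.log2(N) / 2) * params
        # Data description length: sum over instantiations of
        # M(x, pa) * log2( M(pa) / M(x, pa) ).
        joint, marg = Counter(), Counter()
        for case in data:
            pa = tuple(case[p] for p in parents)
            joint[(case[node], pa)] += 1
            marg[pa] += 1
        l_data = sum(m * math.log2(marg[pa] / m) for (x, pa), m in joint.items())
        score += l_net + l_data
    return score
```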

  28. Contents • Introduction • Overview of the approach • Automatic document pre-processing • Feature selection • Feature discretization • Learning Bayesian networks • Experiments and results • Conclusions and future work

  29. The Process of Information Filtering Based on Bayesian Network Learning • Gather the training documents. • For all training documents, determine the relevance to each topic. • Perform feature selection for each topic. • 5 and 10 features were used in the experiments. • Discretize the feature values. • Learn a Bayesian network for each topic. • Set the probability threshold value for the relevance decision. • Each learned Bayesian network serves as the filtering profile for its topic.
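
A sketch of the final decision step, wiring a learned model into a filtering profile. `network_posterior` stands in for any callable returning P(relevant | discretized features), such as the `posterior` sketch above specialized to one topic; `discretize` is the helper sketched earlier. All names here are illustrative stand-ins, not the paper's API.

```python
def make_filter(network_posterior, cutpoints, features, threshold=0.5):
    """Turn a learned model into a filtering profile: a document -> bool test.

    `cutpoints[f]` holds the discretization thresholds for feature f;
    `doc_weights` maps feature name -> raw feature value for one document.
    """
    def is_relevant(doc_weights):
        levels = tuple(discretize(doc_weights.get(f, 0.0), cutpoints[f])
                       for f in features)
        return network_posterior(levels) > threshold   # relevance decision
    return is_relevant
```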

  30. Document Collections • Reuters-21578 • 29 topics. • In chronological order, the first 7,000 documents were chosen as the training set and the other 14,578 documents were used as the test set. • FBIS (Foreign Broadcast Information Service) • 38 topics used in TREC (Text REtrieval Conference). • In chronological order, 60,000 documents were chosen as the training set and the other 70,471 documents were used as the test set.

  31. Evaluation Metrics for Information Retrieval
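
The metrics table itself did not survive the transcript, but the standard IR measures it presumably covered can be sketched as follows (the set-of-document-ids framing is an assumption):

```python
def precision_recall_f1(relevant, retrieved):
    """Standard IR metrics over sets of document ids."""
    tp = len(relevant & retrieved)                       # true positives
    precision = tp / len(retrieved) if retrieved else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```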

  32. Filtering Performance of the Bayesian Network on the Reuters Collection

  33. Comparison of the Bayesian Network Approach and the Naïve Bayesian Approach

  34. Comparison of the Bayesian Network Approach and the Naïve Bayesian Approach

  35. Filtering Performance of the Bayesian Network on the FBIS Collection

  36. Comparison of the Bayesian Network Approach and the Naïve Bayesian Approach

  37. Comparison of the Bayesian Network Approach and the Naïve Bayesian Approach

  38. Conclusions and Future Work • Discretization methods. • Structural learning. • Large data sets. • Better performance than the naïve Bayesian approach.
