
Finite mixture model of Bounded Semi-Naïve Bayesian Network Classifiers



Presentation Transcript


  1. Finite mixture model of Bounded Semi-Naïve Bayesian Network Classifiers Kaizhu Huang, Irwin King, Michael R. Lyu Multimedia Information Processing Laboratory The Chinese University of Hong Kong Shatin, NT. Hong Kong {kzhuang, king, lyu}@cse.cuhk.edu.hk ICANN&ICONIP2003, June, 2003 Istanbul, Turkey

  2. Outline
  • Abstract
  • Background: Naïve Bayesian Classifiers, Semi-Naïve Bayesian Classifiers, Chow-Liu Tree
  • Bounded Semi-Naïve Bayesian Classifiers
  • Mixture of Bounded Semi-Naïve Bayesian Classifiers
  • Experimental Results
  • Discussion
  • Conclusion

  3. Abstract
  • We propose a technique for constructing semi-naïve Bayesian classifiers in which the number of variables that can be combined into a node is bounded.
  • It has a lower computational cost than traditional semi-naïve Bayesian networks.
  • Experiments show the proposed technique is more accurate.
  • We then upgrade the semi-naïve structure into a mixture structure, which increases its expressive power.
  • Experiments show the mixture approach outperforms other types of classifiers.

  4. A Typical Classification Problem
  • Given a set of symptoms, one wants to find out whether these symptoms give rise to a particular disease.

  5. Background
  • Probabilistic Classifiers
  • The classification mapping function is defined as: c(x) = arg max_{c_l} P(c_l | x) (the posterior probability) = arg max_{c_l} P(x, c_l) / P(x) = arg max_{c_l} P(x, c_l), since P(x) is a constant for a given x with respect to c_l; P(x, c_l) is the joint probability.
  • The joint probability is not easily estimated from the dataset; usually an assumption about the distribution has to be made, e.g., are the attributes dependent or independent?

  6. Related Work
  • Naïve Bayesian Classifiers (NB)
  • Assumption: given the class label C, the attributes are independent: P(x_1, ..., x_n | C) = ∏_i P(x_i | C).
  • Classification mapping function: c(x) = arg max_{c_l} P(c_l) ∏_i P(x_i | c_l).   (1)
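Where Eq. (1) is used for prediction, a minimal sketch for discrete (integer-coded) attributes may help; the helper names and the Laplace smoothing constant `alpha` are illustrative assumptions, not part of the slides:

```python
import numpy as np

def fit_nb(X, y, alpha=1.0):
    """Estimate P(C) and P(x_i | C) tables for discrete attributes.

    X: (n_samples, n_features) integer-coded attributes, y: integer class labels.
    alpha is Laplace smoothing, added here only to avoid zero probabilities.
    """
    classes = np.unique(y)
    n_values = X.max(axis=0) + 1                      # number of states per attribute
    log_prior, log_cond = {}, {}
    for c in classes:
        Xc = X[y == c]
        log_prior[c] = np.log(len(Xc) / len(X))
        log_cond[c] = [np.log((np.bincount(Xc[:, i], minlength=n_values[i]) + alpha)
                              / (len(Xc) + alpha * n_values[i]))
                       for i in range(X.shape[1])]
    return classes, log_prior, log_cond

def predict_nb(model, x):
    """Return arg max_c [ log P(c) + sum_i log P(x_i | c) ], i.e. Eq. (1) in log space."""
    classes, log_prior, log_cond = model
    scores = {c: log_prior[c] + sum(log_cond[c][i][v] for i, v in enumerate(x))
              for c in classes}
    return max(scores, key=scores.get)
```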

  7. Related Work
  • Naïve Bayesian Classifiers
  • NB's performance is comparable with some state-of-the-art classifiers, even though its independence assumption typically does not hold in practice.
  • Question: can the performance be improved when the conditional independence assumption of NB is relaxed?

  8. Related Work
  • Semi-Naïve Bayesian Classifiers (SNB)
  • A looser assumption than NB: given the class label C, independence is assumed only among the joined variables (groups of attributes combined into single nodes).

  9. Related Work
  • Chow-Liu Tree (CLT)
  • Another looser assumption than NB: a dependence tree exists among the variables, given the class variable C. [Figure: a tree dependence structure]
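The CLT structure itself comes from Chow and Liu's algorithm: compute pairwise mutual information and take a maximum-weight spanning tree. The sketch below (using networkx, an assumed dependency) builds the unconditional tree; for the classifier, the construction would normally use class-conditional data or conditional mutual information.

```python
import numpy as np
import networkx as nx
from itertools import combinations

def chow_liu_tree(X):
    """Edges of a Chow-Liu dependence tree over the columns of discrete data X:
    a maximum-weight spanning tree whose edge weights are empirical mutual information."""
    g = nx.Graph()
    for i, j in combinations(range(X.shape[1]), 2):
        joint = np.zeros((X[:, i].max() + 1, X[:, j].max() + 1))
        np.add.at(joint, (X[:, i], X[:, j]), 1)       # joint frequency table of (x_i, x_j)
        joint /= joint.sum()
        pi, pj = joint.sum(axis=1), joint.sum(axis=0)
        nz = joint > 0
        mi = float(np.sum(joint[nz] * np.log(joint[nz] / np.outer(pi, pj)[nz])))
        g.add_edge(i, j, weight=mi)
    return list(nx.maximum_spanning_tree(g).edges())
```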

  10. Summary of Related Work
  • SNB: a conditional independence assumption among the joined variables, given the class. Traditional SNBs are not as well developed as the CLT.
  • CLT: a conditional tree-dependency assumption among the variables. Chow & Liu (1968) developed a globally optimal, polynomial-time algorithm.

  11. Problems of Traditional SNBs
  • Strong assumption? Yes: the semi-independence assumption does not hold in real cases either.
  • Accurate? No: they rely on local heuristic search.
  • Efficient? No: exponential time cost; inefficient even when joining 3 variables.

  12. Our Solution
  • Bounded Semi-Naïve Bayesian Network (B-SNB)
  • Accurate? We use a global combinatorial optimization method.
  • Efficient? We find the network via Linear Programming (LP), which can be solved in polynomial time.
  • Mixture of B-SNB (MBSNB)
  • Strong assumption? The mixture structure is a superclass of B-SNB.

  13. Our Solution [Figure: "Improved significantly"]

  14. Bounded Semi-Naïve Bayesian Network Model Definition
  • Joined variables
  • Completely covering the variable set without overlapping
  • Conditional independence among the joined variables, given the class
  • Bounded cardinality

  15. Constraining the Search Space
  • The search space is large; it is reduced by adding the following constraint: the cardinality of each joined variable is exactly equal to K.
  • Underlying principle: when K is small, a joined variable of cardinality K is more accurate than splitting it into several smaller joined variables.
  • Example: P(a,b)P(c,d) is a closer approximation of P(a,b,c,d) than P(a,b)P(c)P(d).
  • Search space after the reduction:
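The principle behind keeping joined variables at cardinality K can be checked numerically: for any joint distribution, P(a,b)P(c,d) is at least as close (in KL divergence) to P(a,b,c,d) as P(a,b)P(c)P(d), and the gap is exactly the mutual information between c and d. A small self-contained check on a random synthetic distribution (illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)
p = rng.random((2, 2, 2, 2))          # a random joint distribution over binary a, b, c, d
p /= p.sum()

def kl(p, q):
    """KL(p || q) for distributions stored as same-shaped arrays."""
    return float(np.sum(p * np.log(p / q)))

p_ab = p.sum(axis=(2, 3))             # P(a, b)
p_cd = p.sum(axis=(0, 1))             # P(c, d)
p_c, p_d = p_cd.sum(axis=1), p_cd.sum(axis=0)

q_joined = p_ab[:, :, None, None] * p_cd[None, None, :, :]                               # P(a,b)P(c,d)
q_split = p_ab[:, :, None, None] * p_c[None, None, :, None] * p_d[None, None, None, :]   # P(a,b)P(c)P(d)

print("KL(P || P(a,b)P(c,d)) =", kl(p, q_joined))
print("KL(P || P(a,b)P(c)P(d)) =", kl(p, q_split))   # always >= the line above, since I(c; d) >= 0
```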

  16. Searching for the K-Bounded-SNB Model
  • How do we search for the appropriate model?
  • Find the m = [n/K] K-cardinality subsets (joined variables) of the feature set that satisfy the SNB conditions and maximize the log-likelihood.
  • [x] means rounding x to the nearest integer.
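Under maximum-likelihood (frequency) estimates, the data log-likelihood of a candidate partition decomposes into one counting term per joined variable and can be compared directly across partitions. A possible scoring routine (the helper name and the add-one smoothing are assumptions, not from the slides):

```python
import numpy as np
from collections import Counter

def partition_loglik(X, y, partition, alpha=1.0):
    """Empirical log-likelihood of discrete data (X, y) under a B-SNB whose joined
    variables are the column groups in `partition`, e.g. [(0, 1), (2, 3)]."""
    n = len(X)
    classes, class_counts = np.unique(y, return_counts=True)
    ll = float(np.sum(class_counts * np.log(class_counts / n)))   # sum over samples of log P(c)
    for group in partition:
        for c, nc in zip(classes, class_counts):
            rows = X[y == c][:, list(group)]
            counts = Counter(map(tuple, rows))                    # joint counts of the joined variable
            n_states = int(np.prod([X[:, j].max() + 1 for j in group]))
            for cnt in counts.values():
                # each of the cnt samples contributes log P(x_group | c)
                ll += cnt * np.log((cnt + alpha) / (nc + alpha * n_states))
    return ll
```

Without smoothing, maximizing this score over partitions is equivalent to minimizing the summed empirical conditional entropies of the joined variables given the class.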

  17. Global Optimization Procedure
  • Formulate the search as an integer programming (IP) problem: select joined variables so that they do not overlap and together form (cover) the whole variable set.
  • Relax the 0/1 constraints to 0 ≤ x ≤ 1: the integer programming problem becomes a linear programming (LP) problem.
  • Rounding scheme: round the LP solution back into an IP solution.
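One way to read the LP step: give each candidate K-subset S an indicator x_S, require every variable to be covered exactly once, relax x_S ∈ {0,1} to 0 ≤ x_S ≤ 1, solve the LP, and round. The sketch below illustrates that idea with scipy; the scoring function, the greedy rounding, and the assumption that K divides n are all simplifications, and the paper's exact formulation and rounding scheme may differ.

```python
import numpy as np
from itertools import combinations
from scipy.optimize import linprog

def relaxed_partition(n_vars, K, score):
    """Select disjoint K-subsets covering all variables via an LP relaxation.

    score(subset) -> float, e.g. the log-likelihood term of that joined variable.
    Assumes K divides n_vars, so the coverage constraints are feasible.
    """
    subsets = list(combinations(range(n_vars), K))
    scores = np.array([score(s) for s in subsets])

    # Coverage constraints: each variable appears in exactly one selected subset.
    A_eq = np.zeros((n_vars, len(subsets)))
    for j, s in enumerate(subsets):
        A_eq[list(s), j] = 1.0
    b_eq = np.ones(n_vars)

    # IP -> LP: relax x_S in {0, 1} to 0 <= x_S <= 1; linprog minimizes, so negate the scores.
    res = linprog(c=-scores, A_eq=A_eq, b_eq=b_eq, bounds=(0, 1), method="highs")

    # Simple greedy rounding of the fractional solution back into disjoint subsets.
    chosen, covered = [], set()
    for j in np.argsort(-res.x):
        if res.x[j] > 0 and not covered & set(subsets[j]):
            chosen.append(subsets[j])
            covered |= set(subsets[j])
    return chosen
```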

  18. Mixture Upgrading (using EM)
  • E step: estimate the posterior probability (responsibility) of each B-SNB component for every sample.
  • M step: update each component S_k by the B-SNB method.
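The EM loop on this slide follows the usual finite-mixture recipe. A schematic sketch is given below; `fit_component` and `loglik` are hypothetical stand-ins for the (weighted) B-SNB fitting routine and its per-sample joint log-likelihood log P_k(x, c), so this is a skeleton rather than the paper's exact procedure.

```python
import numpy as np

def em_mixture(X, y, n_components, fit_component, loglik, n_iters=20, seed=0):
    """Generic EM for a finite mixture of class-conditional component models.

    fit_component(X, y, weights) -> model     (e.g. a responsibility-weighted B-SNB fit)
    loglik(model, X, y) -> per-sample log P_k(x, c), shape (n_samples,)
    """
    rng = np.random.default_rng(seed)
    resp = rng.dirichlet(np.ones(n_components), size=len(X))   # initial soft assignments
    for _ in range(n_iters):
        # M step: mixing weights, and one component model S_k per responsibility column.
        mix = resp.mean(axis=0)
        models = [fit_component(X, y, resp[:, k]) for k in range(n_components)]
        # E step: responsibilities proportional to mix_k * P_k(x, c).
        log_p = np.stack([np.log(mix[k]) + loglik(models[k], X, y)
                          for k in range(n_components)], axis=1)
        log_p -= log_p.max(axis=1, keepdims=True)               # numerical stabilization
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
    return mix, models
```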

  19. Experimental Setup
  • Datasets: 6 benchmark datasets from the UCI machine learning repository, plus 1 synthetically generated dataset named "XOR".
  • Experimental environment: platform Windows 2000; developing tool Matlab 6.1.

  20. Experimental Results: Overall Prediction Rate (%)
  • We set the bound parameter K to 2 and 3.
  • 2-BSNB denotes the BSNB model with the bound parameter set to 2.

  21. NB vs MBSNB

  22. BSNB vs MBSNB

  23. CLT vs MBSNB

  24. C4.5 vs MBSNB

  25. Average Error Rate Chart

  26. Observations
  • B-SNBs with a large K are not good for sparse datasets. Example: the Post dataset has only 90 samples, and with K = 3 the accuracy decreases.
  • Which value of K is good depends on the properties of the dataset. Example: Tic-Tac-Toe and Vehicle have a 3-variable bias, and with K = 3 the accuracy increases.

  27. Discussion
  • When n cannot be divided by K exactly, i.e., (n mod K) = l with l ≠ 0, the assumption that all joined variables have the same cardinality K is violated. Solution (see the sketch below): find an l-cardinality joined variable with the minimum entropy, then run the optimization on the other n - l variables, since ((n - l) mod K) = 0.
  • How to choose K? When the number of samples in the dataset is small, a large K may not give good performance. A good K should be related to the nature of the dataset; a natural way is to use cross validation to find the optimal K.
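A sketch of the remainder trick from the first bullet; the choice of plain joint entropy, rather than a class-conditional variant, is an assumption, since the slide does not say which is used.

```python
import numpy as np
from itertools import combinations
from collections import Counter

def split_off_remainder(X, K):
    """When n mod K = l != 0: choose the l columns whose joint empirical entropy is
    smallest as one extra joined variable; return that group and the remaining columns."""
    n = X.shape[1]
    l = n % K
    if l == 0:
        return None, list(range(n))                    # nothing to peel off
    best, best_h = None, np.inf
    for group in combinations(range(n), l):
        counts = np.array(list(Counter(map(tuple, X[:, list(group)])).values()))
        p = counts / counts.sum()
        h = -float(np.sum(p * np.log(p)))              # empirical joint entropy of the candidate
        if h < best_h:
            best, best_h = group, h
    rest = [j for j in range(n) if j not in best]
    return list(best), rest
```

For choosing K itself, the cross validation suggested in the second bullet can simply wrap this step together with the partition search.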

  28. Conclusion
  • A novel Bounded Semi-Naïve Bayesian classifier is proposed.
  • A direct combinatorial optimization method enables the B-SNB to achieve global optimization.
  • The transformation of the IP problem into an LP problem reduces the computational complexity to polynomial time.
  • A mixture of B-SNBs is developed, which expands the expressive power of the B-SNB.
  • Experimental results show the mixture approach outperforms other types of classifiers.

  29. Thank you!
