
  1. Boosted Augmented Naive Bayes: Efficient discriminative learning of Bayesian network classifiers. Yushi Jing, GVU, College of Computing, Georgia Institute of Technology; Vladimir Pavlović, Department of Computer Science, Rutgers University; James M. Rehg, GVU, College of Computing, Georgia Institute of Technology

  2. Contribution • Boosting approach to Bayesian network classification: an additive combination of simple models (e.g. Naïve Bayes) trained by weighted maximum-likelihood learning; generalizes Boosted Naïve Bayes (Elkan 1997) • Comprehensive experimental evaluation of BNB • Boosted Augmented Naïve Bayes (BAN): an efficient training algorithm with classification accuracy competitive with Naïve Bayes, TAN, BNC (2004), and ELR (2001)

  3. Bayesian networks • Modular and intuitive graphical representation • Explicit probabilistic representation. Bayesian network classifiers • Joint distribution over the class label and the attributes • Conditional distribution of the class label used for prediction • How can we efficiently train a Bayesian network discriminatively to improve its classification accuracy?
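For reference, the standard factorizations behind the "joint distribution" and "conditional distribution" bullets (textbook notation, not copied from the slides; C is the class label, X_1..X_n the attributes, Pa(X_i) the parents of X_i in the graph):

P(C, X_1, \dots, X_n) = P(C) \prod_{i=1}^{n} P(X_i \mid \mathrm{Pa}(X_i)), \qquad P(C \mid x) = \frac{P(C, x)}{\sum_{c'} P(c', x)}

For Naïve Bayes the only parent of each attribute is C; TAN-style augmented structures additionally allow at most one other attribute as a parent.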

  4. Parameter learning • Maximum-likelihood (ML) parameter learning is efficient • ML maximizes the generative score LL_G • There is no analytic solution for the parameters that maximize the discriminative score CLL_G
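The two scores referred to here are the standard (conditional) log-likelihoods of the training data {(x^m, c^m)}, m = 1..M, under a network with structure G and parameters θ:

\mathrm{LL}_G(\theta) = \sum_{m=1}^{M} \log P_\theta(c^m, x^m), \qquad \mathrm{CLL}_G(\theta) = \sum_{m=1}^{M} \log P_\theta(c^m \mid x^m)

LL_G decomposes over the network's local terms and is maximized in closed form by (weighted) frequency counts; CLL_G does not decompose, which is why discriminative parameter learning such as ELR resorts to gradient-based optimization.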

  5. Model selection • A: ML parameter learning does not optimize CLL_A; ELR_A optimizes CLL_A (Greiner and Zhou, 2002), with excellent classification accuracy but computationally expensive training • B: ML optimizes CLL_B when the structure B is optimal; the BNC algorithm searches for the optimal structure (Grossman and Domingos, 2004) • C: an ensemble of sparse models as an alternative to B, using ML to train each sparse model

  6. Talk outline • Our goal: combine parameter and structure optimization, avoid over-fitting, and retain training efficiency • Minimization function for boosted Bayesian networks • Empirical evaluation of Boosted Naïve Bayes (BNB) • Boosted Augmented Naïve Bayes (BAN) • Empirical evaluation of BAN

  7. Exponential Loss Function (ELF) • A boosted Bayesian network classifier minimizes the ELF • ELF_F is an upper bound on -CLL_F
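A sketch of why this bound holds, in the binary-class case with labels c in {-1, +1}, an additive ensemble output F(x), and the logistic link P(c | x) = 1 / (1 + e^{-c F(x)}) (the usual boosting convention; the paper's exact normalization may differ):

\mathrm{ELF}_F = \sum_{m=1}^{M} e^{-c^m F(x^m)} \;\ge\; \sum_{m=1}^{M} \log\!\left(1 + e^{-c^m F(x^m)}\right) = -\mathrm{CLL}_F,

which follows from log(1 + z) <= z for z >= 0. Minimizing the exponential loss therefore drives down an upper bound on the negative conditional log-likelihood.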

  8. Minimizing ELF via an ensemble method • AdaBoost (population version) constructs F(x) additively to approximately minimize ELF_F • Discriminatively updates the data weights • Tractable ML learning is used to train the parameters of each base model
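A minimal sketch of this loop for Boosted Naïve Bayes, assuming discrete features and a binary class; the function names, the Laplace smoothing constant, and the discrete AdaBoost-style reweighting are illustrative choices, not the paper's implementation:

```python
import numpy as np

def fit_weighted_nb(X, y, w, n_values, alpha=1.0):
    """Weighted-ML Naive Bayes for discrete features.
    X: (M, N) ints, y: (M,) in {0, 1}, w: (M,) weights, n_values[i] = arity of feature i."""
    priors = np.array([w[y == c].sum() for c in (0, 1)])
    priors = priors / priors.sum()
    cond = []                                  # cond[c][i][v] = P(X_i = v | y = c)
    for c in (0, 1):
        wc, Xc = w[y == c], X[y == c]
        tables = []
        for i, K in enumerate(n_values):
            counts = np.array([wc[Xc[:, i] == v].sum() for v in range(K)]) + alpha
            tables.append(counts / counts.sum())
        cond.append(tables)
    return priors, cond

def predict_nb(model, X):
    priors, cond = model
    logp = np.zeros((X.shape[0], 2))
    for c in (0, 1):
        logp[:, c] = np.log(priors[c])
        for i, table in enumerate(cond[c]):
            logp[:, c] += np.log(table[X[:, i]])
    return logp.argmax(axis=1)

def boosted_naive_bayes(X, y, n_values, T=20):
    """AdaBoost loop: refit Naive Bayes by weighted ML, then reweight the data discriminatively."""
    M = X.shape[0]
    w = np.full(M, 1.0 / M)
    ensemble = []
    for _ in range(T):
        model = fit_weighted_nb(X, y, w, n_values)
        pred = predict_nb(model, X)
        err = w[pred != y].sum()
        if err <= 0.0 or err >= 0.5:           # stop if the base learner is perfect or too weak
            break
        beta = 0.5 * np.log((1.0 - err) / err)
        ensemble.append((beta, model))
        w *= np.exp(np.where(pred != y, beta, -beta))   # misclassified examples gain weight
        w /= w.sum()
    return ensemble
```

An ensemble prediction would sum the beta weights over base models voting for each class, which is how the additive model F(x) from the previous slide is realized in practice.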

  9. Results: 25 UCI datasets (BNB) • BNB vs. NB: average error 0.151 vs. 0.173 • [Scatter plot of per-dataset error: BNB better on 10 datasets, NB better on 2, the remaining 13 tied or not significantly different]

  10. Results: 25 UCI datasets (BNB) • BNB vs. NB: 0.151 vs. 0.173 (BNB better on 10 datasets, NB on 2, 13 others) • BNB vs. TAN: 0.151 vs. 0.184 (BNB better on 9, TAN on 2, 14 others) • BNB vs. ELR-NB: 0.151 vs. 0.161 (BNB 5*, ELR-NB 4*, 16 others) • BNB vs. BNC-2P: 0.151 vs. 0.164 (BNB better on 7, BNC-2P on 3, 15 others) • [Scatter plots of per-dataset test error]

  11. Evaluation of BNB • Computationally efficient: training is O(MNT) with T = 5-20 boosting iterations, versus O(MN) for NB • Good classification accuracy: outperforms NB and TAN, competitive with ELR and BNC • Sparse structure + boosting = competitive accuracy • Potential drawback: strongly correlated features (e.g. the Corral dataset)

  12. Structure learning • Challenge: efficiency. The general problem is NP-hard, and even K2 or hill-climbing search examines a polynomial number of structures • Challenge: resisting over-fitting. Structure controls classifier capacity • Our proposed solution: combine sparse models into an ensemble and constrain edge selection

  13. Creating G_tree • Step 1 (Friedman et al. 1999): build the pairwise conditional mutual information table • Create a maximum spanning tree using conditional mutual information as edge weights • Convert the undirected tree into a directed graph • [Figure: example over attributes 1-4]
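A compact sketch of this step: empirical conditional mutual information I(X_i; X_j | C) followed by a maximum spanning tree over the attributes. The function names and the choice of Prim's algorithm are illustrative; a real implementation would precompute contingency tables rather than loop over values.

```python
import itertools
import numpy as np

def cond_mutual_info(xi, xj, y, eps=1e-12):
    """Empirical I(X_i; X_j | C) for discrete arrays xi, xj and class labels y."""
    cmi = 0.0
    for c in np.unique(y):
        mask = (y == c)
        pc = mask.mean()
        a, b = xi[mask], xj[mask]
        for vi in np.unique(a):
            for vj in np.unique(b):
                pij = np.mean((a == vi) & (b == vj))        # P(vi, vj | c)
                pi, pj = np.mean(a == vi), np.mean(b == vj) # P(vi | c), P(vj | c)
                if pij > 0:
                    cmi += pc * pij * np.log(pij / (pi * pj + eps))
    return cmi

def build_g_tree(X, y):
    """Maximum spanning tree over the attributes, weighted by conditional mutual
    information (Prim's algorithm); edges are directed away from attribute 0."""
    N = X.shape[1]
    weight = np.zeros((N, N))
    for i, j in itertools.combinations(range(N), 2):
        weight[i, j] = weight[j, i] = cond_mutual_info(X[:, i], X[:, j], y)
    in_tree, edges = {0}, []
    while len(in_tree) < N:
        i, j = max(((i, j) for i in in_tree for j in range(N) if j not in in_tree),
                   key=lambda e: weight[e])
        edges.append((i, j))      # directed parent -> child, rooted at attribute 0
        in_tree.add(j)
    return edges
```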

  14. Initial structure • Select Naïve Bayes as the initial structure • Create BNB via AdaBoost • Evaluate BNB • [Figure: Naïve Bayes structure over attributes 1-4]

  15. Iteratively adding edges • Candidate edges are added one at a time and each candidate structure is scored by its ensemble CLL • [Figure: candidate structures over attributes 1-4, with ensemble CLL values of -0.75, -0.65, -0.50, and -0.55(?)]
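A sketch of how this greedy loop might look, assuming helper functions build_boosted_ensemble (AdaBoost over a fixed augmented structure, as in the BNB sketch above) and ensemble_cll (conditional log-likelihood of the ensemble); both names, and the choice to score CLL on the training data, are assumptions for illustration rather than the paper's code:

```python
def learn_ban_structure(candidate_edges, X, y):
    """Greedy BAN structure search: start from Naive Bayes, try augmenting edges
    (e.g. the edges of G_tree) one at a time, keep an edge only if the boosted
    ensemble's CLL improves."""
    structure = []                                   # [] = plain Naive Bayes (BNB)
    ensemble = build_boosted_ensemble(structure, X, y)
    best_cll = ensemble_cll(ensemble, X, y)
    for edge in candidate_edges:
        trial = structure + [edge]
        trial_ensemble = build_boosted_ensemble(trial, X, y)
        trial_cll = ensemble_cll(trial_ensemble, X, y)
        if trial_cll > best_cll:                     # e.g. -0.50 > -0.75 on the slide
            structure, ensemble, best_cll = trial, trial_ensemble, trial_cll
    return structure, ensemble
```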

  16. Final BAN structure • The final BAN classifier is an ensemble over the final structure, produced by AdaBoost • [Figure: final augmented structure]

  17. Analysis of BAN • The BAN base structure is sparser than the BNC model • BAN uses an ensemble of sparser models to approximate a densely connected structure • [Figures: example of a BAN model and example of a BNC-2P model]

  18. Computational complexity of BAN • Training complexity: O(MN^2 + MNTS), where M is the number of examples, N the number of attributes, T the number of boosting iterations per structure, and S the number of structures examined (S < N) • O(MN^2): building G_tree • O(MNTS): structure search • Empirical training time: T = 5-25, S = 0-5, approximately 25-100 times the training time of NB
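A back-of-the-envelope reading (ours, not the authors'): dividing O(MN^2 + MNTS) by Naïve Bayes' O(MN) training cost gives an overhead factor of N + TS; with T = 5-25, S = 0-5, and modest N, this lands in the quoted 25-100x range.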

  19. Results (simulated datasets) • 25 different distributions, varying the CPT tables and the number of features • 4000 samples, 5-fold cross-validation • [Figures: the true structure and the Naïve Bayes structure]

  20. Results (simulated datasets): BAN vs. NB • [Scatter plot: BAN better on 19 of the 25 distributions, NB on 0, 6 others tied]

  21. Results (simulated datasets): BAN vs. BNB • BNB achieved the optimal error on 22 of the 25 distributions; BAN outperforms BNB on the remaining 3 • Correct edges are added under BAN • [Figure: true structure; scatter plot with BAN better on 3, BNB on 0, 22 ties]

  22. Results: 25 UCI datasets (BAN) • Standard datasets for Bayesian network classifiers (Friedman et al. 1999; Greiner and Zhou 2002; Grossman and Domingos 2004) • 5-fold cross-validation • Implemented NB, TAN, BAN, BNB, BNC-2P; obtained published results for ELR-NB and ELR-TAN

  23. Results: BAN vs. standard methods • BAN vs. NB: 0.141 vs. 0.173 (BAN better on 10 datasets, NB on 2) • BAN vs. TAN: 0.141 vs. 0.184 (BAN better on 10, TAN on 2) • [Scatter plots of per-dataset error; the remaining datasets show no significant difference]

  24. Results: BAN vs. structure learning • BAN vs. BNC-2P: 0.141 vs. 0.164 (BAN better on 7 datasets, BNC-2P on 1) • BAN contains 0-5 augmented edges; BNC-2P contains 4-16 augmented edges

  25. Results: BAN vs. ELR • BAN vs. ELR-NB: 0.141 vs. 0.161 • BAN vs. ELR-TAN: 0.141 vs. 0.155 • Error statistics taken directly from published results • BAN is more efficient to train • [Scatter plots of per-dataset error]

  26. Evaluation of BAN vs. BNB • Average testing error: 0.141 vs. 0.151 (BAN lower on 16 datasets, BNB lower on 6) • Under a significance test, BAN outperforms BNB on 7 datasets (including Corral, by 2%-5%) and BNB outperforms BAN on 2 (by 0.5%-2%); the remaining 13 differences are not significant • On IRIS and MOFN, BAN chooses BNB (the plain Naïve Bayes structure) as its base structure

  27. Conclusion • An ensemble of sparse models as an alternative to joint structure and parameter optimization • Simple to implement • Very efficient in training • Competitive classification accuracy vs. NB, TAN, HGC, BNC, and ELR

  28. Future work • Extend BAN to handle sequential data • Analyze the class of Bayesian network classifiers that can be approximated with an ensemble of sparse structures • Can the BAN model parameters be obtained through parameter learning, given the final model structure? • Can we use the BAN approach to learn generative models?
