

  1. Ben Perry – M.S. Thesis Defense
  A Genetic Algorithm for Learning Bayesian Network Adjacency Matrices from Data
  Benjamin B. Perry
  Laboratory for Knowledge Discovery in Databases, Kansas State University
  http://www.kddresearch.org
  http://www.cis.ksu.edu/~bbp9857

  2. Overview
  • Bayesian Network
    • Definitions and examples
    • Inference and learning
  • Genetic Algorithms
  • Structure Learning Background
    • Problem
    • K2 algorithm
    • Sparse Candidate
  • Improving K2: Permutation Genetic Algorithm (GASLEAK)
    • Shortcoming: greedy, sensitive to ordering
    • Permutation GA
  • Master’s thesis: Adjacency Matrix GA (SLAM GA)
    • Rationale
    • Evaluation with known Bayesian networks
  • Summary

  3. Bayesian Belief Networks (BBNs): Definition
  • Bayesian Network
    • Directed acyclic graph
    • Vertices (nodes): denote events, or states of affairs (each a random variable)
    • Edges (arcs, links): denote conditional dependencies, causalities
  • Model of conditional dependence assertions (or CI assumptions)
  • Example: the “Ben’s Presentation” BBN (in the spirit of the classic sprinkler network)
    • X1 Sleep: Narcoleptic, Well, Bad, All-nighter
    • X2 Appearance: Good, Bad
    • X3 Memory: Elephant, Good, Bad, None
    • X4 Ben is nervous: Extremely, Yes, No
    • X5 Ben’s presentation: Good, Not so good, Failed miserably
  • General product (chain) rule for BBNs: P(X1, …, Xn) = ∏i P(Xi | Parents(Xi))
  • Applied to the example (a small numeric sketch follows below):
    P(Well, Good, Good, No, Good) = P(Well) · P(Good | Well) · P(Good | Well) · P(No | Good, Good) · P(Good | No)
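
A minimal Python sketch of the chain-rule computation above. The network structure follows the slide, but every probability value is an invented placeholder, not a number from the thesis.

```python
# Chain rule for the "Ben's Presentation" example BBN.
# All CPT entries below are made-up placeholders for illustration only.

# P(Sleep), P(Appearance | Sleep), P(Memory | Sleep),
# P(Nervous | Appearance, Memory), P(Presentation | Nervous)
p_sleep = {"Well": 0.6, "Bad": 0.3, "All-nighter": 0.08, "Narcoleptic": 0.02}
p_appearance = {("Good", "Well"): 0.9}          # only the entries needed here
p_memory = {("Good", "Well"): 0.7}
p_nervous = {("No", "Good", "Good"): 0.8}
p_presentation = {("Good", "No"): 0.85}

# P(Well, Good, Good, No, Good)
#   = P(Well) * P(Good|Well) * P(Good|Well) * P(No|Good,Good) * P(Good|No)
joint = (p_sleep["Well"]
         * p_appearance[("Good", "Well")]
         * p_memory[("Good", "Well")]
         * p_nervous[("No", "Good", "Good")]
         * p_presentation[("Good", "No")])
print(joint)  # product of the five conditional probabilities
```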

  4. Graphical Models of Probability Distributions
  • Idea
    • Want: a model that can be used to perform inference
    • Desired properties
      • Correlations among variables
      • Ability to represent functional, logical, stochastic relationships
      • Probability of certain events
  • Inference: decision support problems
    • Diagnosis (medical, equipment)
    • Pattern recognition (image, speech)
    • Prediction
  • Want to learn: the most likely model that generates the observed data
    • Under certain assumptions (Causal Markovity), it has been shown that we can do it
    • Given: data D (tuples or vectors containing observed values of variables)
    • Return: directed graph (V, E) expressing the target CPTs
  • NEXT: Genetic algorithms

  5. Genetic Algorithms
  • Idea
    • Emulate the natural process of survival of the fittest (example: roaches adapt)
    • Each generation has many diverse individuals
    • Each individual competes for the chance to survive
    • Most common approach: the best individuals live to the next generation and mate
      • Produce children with traits from both parents
      • If the parents are strong, the children might be stronger
  • Major components (operators)
    • Fitness function
    • Chromosome manipulation: crossover (not the “John Edward” type!), mutation
  • From (educated?) guess to gold
    • Initial population is typically random or not much better than random – bad scores
    • Performs well with a non-deceptive search space and good genetic operators
    • Ability to escape local optima with mutation
    • Not guaranteed to get the best answer, but usually gets close
  (A generic elitist GA loop is sketched below.)
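
A minimal sketch of the generic elitist GA loop described on this slide. The `init`, `fitness`, `crossover`, and `mutate` callables and all rates are placeholders, not the thesis implementation.

```python
import random

def genetic_algorithm(init, fitness, crossover, mutate,
                      pop_size=50, generations=100, elite=2, p_mut=0.1):
    """Generic elitist GA: best individuals survive, the fitter half mates."""
    population = [init() for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(population, key=fitness, reverse=True)
        next_gen = scored[:elite]                                # elitism
        while len(next_gen) < pop_size:
            mom, dad = random.sample(scored[:pop_size // 2], 2)  # fitter half mates
            child = crossover(mom, dad)
            if random.random() < p_mut:
                child = mutate(child)         # mutation helps escape local optima
            next_gen.append(child)
        population = next_gen
    return max(population, key=fitness)
```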

  6. Learning Structure: K2 Algorithm
  • Algorithm Learn-BBN-Structure-K2 (D, Max-Parents)
      FOR i ← 1 TO n DO                           // given an ordering of the variables {x1, x2, …, xn}
        WHILE Parents[xi].Size < Max-Parents DO   // greedily add the best candidate parent
          Best ← argmax over xj ∈ Pred(xi) of Score(D, xi, Parents[xi] ∪ {xj})   // maximize the Dirichlet score
          IF (Parents[xi] + Best).Score > Parents[xi].Score THEN Parents[xi] += Best
          ELSE BREAK
      RETURN ({Parents[xi] | i ∈ {1, 2, …, n}})
  • ALARM: A Logical Alarm Reduction Mechanism [Beinlich et al., 1989]
    • BBN model for patient monitoring in surgical anesthesia
    • Vertices (37): findings (e.g., esophageal intubation), intermediates, observables
    • K2 found a BBN differing in only 1 edge from the gold standard (elicited from an expert)
  [Figure: the 37-node ALARM network]
  (A runnable sketch of K2 follows below.)
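
A compact Python sketch of the K2 loop above. The data is assumed to be a list of dicts mapping variable name to observed value, and the score is the standard K2 (Cooper-Herskovits) metric in log form; function names are illustrative, not the thesis code.

```python
from math import lgamma
from collections import defaultdict

def k2_log_score(data, child, parents):
    """Log-scale K2 (Cooper-Herskovits) score contribution of `child` given `parents`."""
    r = len({row[child] for row in data})                # number of child states seen
    counts = defaultdict(lambda: defaultdict(int))       # parent config -> child value -> N_ijk
    for row in data:
        counts[tuple(row[p] for p in parents)][row[child]] += 1
    score = 0.0
    for child_counts in counts.values():
        n_ij = sum(child_counts.values())
        score += lgamma(r) - lgamma(n_ij + r)
        score += sum(lgamma(n + 1) for n in child_counts.values())
    return score

def k2(data, ordering, max_parents):
    """Greedy K2: for each node, add the best preceding parent while it improves the score."""
    parents = {x: [] for x in ordering}
    for i, x in enumerate(ordering):
        old = k2_log_score(data, x, parents[x])
        while len(parents[x]) < max_parents:
            candidates = [z for z in ordering[:i] if z not in parents[x]]
            if not candidates:
                break
            best = max(candidates, key=lambda z: k2_log_score(data, x, parents[x] + [z]))
            new = k2_log_score(data, x, parents[x] + [best])
            if new > old:
                parents[x].append(best)
                old = new
            else:
                break
    return parents
```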

  7. Learning Structure: K2 Downfalls
  • Greedy (may fall into local maxima)
  • Highly dependent upon node ordering
    • The optimal node ordering must be given
    • If the optimal order is already known, an expert could probably create the network directly
  • The number of possible node orderings is n!, which grows even faster than exponentially

  8. Learning Structure: Sparse Candidate
  • General idea
    • Inspect the k best parent candidates at a time (K2 inspects only one)
    • k is typically very small, roughly 5 ≤ k ≤ 15
    • Complexity is exponential in k
  • Algorithm: loop until no improvement or the iteration limit is exceeded
    • Restrict phase: for each node, select the top k parent candidates (by mutual information or M_disc)
    • Maximize phase: build a network by manipulating parents (add, remove, or reverse edges drawn from each node’s candidate set), accepting only changes that improve the network score (Minimum Description Length)
  • Must handle cycles – expensive
    • K2 gets acyclicity for free from its node ordering
  • Next: improving K2
  (A rough sketch of the Restrict/Maximize loop follows below.)
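
A rough Python sketch of the Restrict/Maximize loop described above. Mutual information is used for the Restrict phase; for brevity the Maximize phase only greedily adds candidate edges under a generic `score` callable (for example the `k2_log_score` sketch earlier) rather than a full MDL search with add/remove/reverse moves, so this is an illustration, not the published algorithm.

```python
from math import log
from collections import Counter

def mutual_information(data, x, y):
    """Empirical mutual information (in nats) between variables x and y."""
    n = len(data)
    pxy = Counter((row[x], row[y]) for row in data)
    px = Counter(row[x] for row in data)
    py = Counter(row[y] for row in data)
    return sum((c / n) * log((c * n) / (px[a] * py[b])) for (a, b), c in pxy.items())

def creates_cycle(parents, child, new_parent):
    """True if adding the edge new_parent -> child would close a directed cycle."""
    stack, seen = [new_parent], set()
    while stack:
        node = stack.pop()
        if node == child:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(parents[node])
    return False

def sparse_candidate(data, variables, k, score, iterations=10):
    parents = {v: [] for v in variables}
    for _ in range(iterations):
        # Restrict: keep the top-k candidate parents for each node.
        candidates = {v: sorted((u for u in variables if u != v),
                                key=lambda u: mutual_information(data, v, u),
                                reverse=True)[:k]
                      for v in variables}
        # Maximize (simplified): greedily add candidate edges that improve the score.
        improved = False
        for v in variables:
            for u in candidates[v]:
                if u in parents[v] or creates_cycle(parents, v, u):
                    continue
                if score(data, v, parents[v] + [u]) > score(data, v, parents[v]):
                    parents[v].append(u)
                    improved = True
        if not improved:
            break
    return parents
```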

  9. GASLEAK: A Permutation GA for Variable Ordering
  • Genetic Algorithm for Structure Learning from Evidence, AIS, and K2
  [Diagram: the training data D is split into Dtrain (structure learning) and Dval (inference, with an evidence specification); [1] a Permutation Genetic Algorithm proposes a candidate ordering α; [2] a Representation Evaluator for Bayesian Network Structure Learning Problems returns the ordering fitness f(α); the GA outputs an optimized ordering]

  10. Properties of the Genetic Algorithm
  • Elitist
  • Chromosome representation
    • Integer permutation ordering
    • A sample chromosome for a 5-node BBN might look like: 3 1 2 0 4
  • Seeding
    • Random shuffle
  • Operators
    • Order crossover
    • Swap mutation
  • Fitness
    • RMSE
  • Job farm
    • Java-based; utilizes many machines regardless of OS
  (Sketches of order crossover and swap mutation follow below.)
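
A sketch of the permutation operators named on this slide (order crossover and swap mutation) for an integer-ordering chromosome such as [3, 1, 2, 0, 4]. This mirrors the textbook operators; it is not the GASLEAK source code.

```python
import random

def order_crossover(mom, dad):
    """Copy a slice from mom, then fill the remaining positions in dad's order."""
    n = len(mom)
    lo, hi = sorted(random.sample(range(n), 2))
    child = [None] * n
    child[lo:hi] = mom[lo:hi]
    fill = [g for g in dad if g not in child]
    for i in range(n):
        if child[i] is None:
            child[i] = fill.pop(0)
    return child

def swap_mutation(chrom):
    """Exchange two randomly chosen positions."""
    chrom = list(chrom)
    i, j = random.sample(range(len(chrom)), 2)
    chrom[i], chrom[j] = chrom[j], chrom[i]
    return chrom

# Example usage: order_crossover([3, 1, 2, 0, 4], [0, 1, 2, 3, 4])
```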

  11. GASLEAK Results
  • Not encouraging
  • Bad fitness function or bad evidence b.v.
  • Many graph errors
  [Figure: histogram of estimated fitness for all 8! = 40,320 permutations of the Asia variables]

  12. Master’s Thesis: SLAM GA
  • SLAM GA – Structure Learning Adjacency Matrix Genetic Algorithm
  • Initial population – several approaches were tried:
    • Completely random Bayesian networks (Box-Muller, Max-Parents)
      • Many illegal structures; a fixCycles repair algorithm was written (one possible repair is sketched below)
    • Random networks generated from parents pre-selected by the Restrict phase of Sparse Candidate
      • Performed better than random
    • Aggregate of k networks learned by K2 given random orderings (cycles eliminated) – the best approach
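
The thesis’s actual fixCycles routine is not reproduced in this transcript; the sketch below shows one plausible repair strategy on an n x n adjacency matrix (adj[i][j] == 1 meaning i is a parent of j): find a directed cycle via DFS back-edge detection and delete one of its edges, repeating until the graph is acyclic.

```python
def break_cycles(adj):
    """Delete edges until the graph (adj[i][j] == 1 means edge i -> j) is acyclic."""
    n = len(adj)

    def find_back_edge():
        color = [0] * n                       # 0 unvisited, 1 on stack, 2 finished
        def dfs(u):
            color[u] = 1
            for v in range(n):
                if adj[u][v]:
                    if color[v] == 1:
                        return (u, v)         # back edge: closes a cycle
                    if color[v] == 0:
                        hit = dfs(v)
                        if hit:
                            return hit
            color[u] = 2
            return None
        for s in range(n):
            if color[s] == 0:
                hit = dfs(s)
                if hit:
                    return hit
        return None

    edge = find_back_edge()
    while edge:
        u, v = edge
        adj[u][v] = 0                         # drop one edge on the detected cycle
        edge = find_back_edge()
    return adj
```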

  13. K2 Random-Order Aggregator Instantiater
  [Diagram: the K2 manager runs K2 on the training data D with k random node orderings, producing BBN 1, BBN 2, …, BBN k; the aggregator combines them into a single aggregate BBN]
  • For small networks, k = 1 is best. For larger networks, k = 2 is best.
  (A sketch of this instantiation step follows below.)
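
A sketch of the “K2 with random orderings plus aggregator” instantiation shown in this diagram, reusing the `k2()` and `break_cycles()` sketches above. How the thesis actually merges the k learned networks is not specified in the transcript; a simple union of edges followed by cycle repair is assumed here.

```python
import random

def k2_aggregate_seed(data, variables, k, max_parents):
    """Seed individual: union the edges of k K2 runs with random orderings, then fix cycles."""
    n = len(variables)
    index = {v: i for i, v in enumerate(variables)}
    adj = [[0] * n for _ in range(n)]
    for _ in range(k):
        ordering = random.sample(variables, n)        # a fresh random node ordering
        parents = k2(data, ordering, max_parents)
        for child, pas in parents.items():
            for p in pas:
                adj[index[p]][index[child]] = 1       # union of learned edges (assumption)
    return break_cycles(adj)
```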

  14. SLAM GA
  • Chromosome representation
    • Edge matrix – n² bits
    • Each bit represents a parent edge into a node: 1 = parent, 0 = not a parent
  • Operators
    • Crossover: swap parents, fix cycles

  15. SLAM GA: Crossover

  16. SLAM GA
  • Chromosome representation
    • Edge matrix – n² bits
    • Each bit represents a parent edge into a node: 1 = parent, 0 = not a parent
  • Operators
    • Crossover: swap parents, fix cycles
    • Mutation: reverse, delete, or add a random number of edges; fix cycles
  • Fitness
    • Total Bayesian Dirichlet equivalence (BDe) score, summed over the nodes
  (A sketch of the chromosome and operators follows below.)
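
A sketch of the SLAM GA chromosome and operators as described on this slide: the chromosome is an n x n adjacency matrix (column j holds node j’s parent bits), crossover swaps whole parent sets between the two parents, and mutation adds, deletes, or reverses a few random edges. The interpretation of “swap parents,” the single-child crossover, and all rates are assumptions; `break_cycles()` is the repair sketch from earlier, not the thesis’s fixCycles.

```python
import random

def parent_swap_crossover(a, b):
    """For each node, take its entire parent set from one of the two parent matrices."""
    n = len(a)
    child = [[0] * n for _ in range(n)]
    take_from_a = {j for j in range(n) if random.random() < 0.5}
    for j in range(n):                         # column j = parent set of node j
        source = a if j in take_from_a else b
        for i in range(n):
            child[i][j] = source[i][j]
    return break_cycles(child)

def edge_mutation(adj, n_changes=2):
    """Add, delete, or reverse a few random edges, then repair any cycles."""
    n = len(adj)
    for _ in range(n_changes):
        i, j = random.sample(range(n), 2)
        op = random.choice(("add", "delete", "reverse"))
        if op == "add":
            adj[i][j] = 1
        elif op == "delete":
            adj[i][j] = 0
        else:
            adj[i][j], adj[j][i] = adj[j][i], adj[i][j]
    return break_cycles(adj)

# Fitness: sum a per-node Bayesian Dirichlet score over all nodes, e.g. by
# applying the k2_log_score sketch to each node's parent set.
```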

  17. Results - Asia
  [Figures: best network of the first generation (15 graph errors), the learned network (1 graph error), and the actual network]

  18. Results – Asia

  19. Results - Poker
  [Figures: best network of the first generation (11 graph errors), the learned network (2 graph errors), and the actual network]

  20. Results - Poker

  21. Results - Golf
  [Figures: best network of the first generation (11 graph errors), the learned network (4 graph errors), and the actual network]

  22. Results - Golf

  23. Results – Boerlage92
  [Figures: the initial network, the learned network, and the actual network]

  24. Results - Boerlage92

  25. Results - Alarm

  26. Final Fitness Values

  27. K2 vs. SLAM GA
  • K2
    • Very good if the ordering is known
    • The ordering is often not known
    • Greedy; very dependent on the ordering
  • SLAM GA
    • Stochastic; can climb out of the local-optima trap
    • Can improve on bad structures learned by K2
    • Takes much longer than K2

  28. GASLEAK vs. SLAM GA
  • GASLEAK
    • The gold-standard network was never recovered
    • Much more computationally expensive
      • K2 is run on each [new] individual in each generation
      • Each chromosome must be scored
    • The final network has many graph errors
  • SLAM GA
    • For small networks, the gold-standard network is often recovered
    • Relatively few graph errors in the final network
    • Less computationally intensive
      • The initial population is the most expensive step
      • Each chromosome must be scored

  29. SLAM GA: Ramifications
  • An effective structure-learning algorithm
    • Ideal for small networks
  • Improvement over GASLEAK
    • SLAM GA is faster in spite of using the same GA parameters
    • SLAM GA is more accurate
  • Improvement over K2
    • The aggregate algorithm produces a better initial population
  • The parent-swapping crossover technique is effective
    • Diversifies the search space while retaining past information

  30. SLAM GA: Future Work
  • Parameter tweaking
  • Better fitness function
    • Several ‘bad’ structures score better than the gold standard
    • The GA itself works fine
  • ‘Intelligent’ mutation operator
    • Add edges from a pre-qualified set of candidate parents
  • New instantiation methods
    • Use GASLEAK
    • Other structure-learning algorithms
  • Scalability
    • Job farm

  31. Summary
  • Bayesian Networks
  • Genetic Algorithms
  • Learning Structure: K2, Sparse Candidate
  • GASLEAK
  • SLAM GA
