**Learning Bayesian Networks using Genetic Algorithms**
Ashish Kalya (200701195)
Mentor: Prof. Suman Mitra
Dhirubhai Ambani Institute of Information & Communication Technology (DA-IICT), Gujarat

**Introduction to Bayesian Networks** • A directed acyclic graph (DAG) represents the Bayesian network structure • Conditional probability distributions form the Bayesian network parameters Image source: "bayesianGraph.png", retrieved 1 May 2011, http://www.ra.cs.uni-tuebingen.de/software/JCell/images/docbook/bayesianGraph.png

**Two Approaches for Learning Bayesian Structure** • Constraint-based: finds a Bayesian network structure whose conditional independence constraints match those found in the data. • Heuristic search: traverses the search space heuristically, looking for high-scoring structures, to find the DAG that best explains the data (i.e. could have generated the data). • Example: the K2 algorithm

**The Need for Heuristic Search Algorithms** Ideally we would search the space of all DAGs exhaustively and find the DAG which maximizes the Bayesian scoring criterion. However, for a large (not that large!) number of nodes this becomes infeasible [6]:

| Number of nodes | Number of DAGs |
|---|---|
| 0 | 1 |
| 1 | 1 |
| 2 | 3 |
| 3 | 25 |
| 4 | 543 |
| 5 | 29281 |
| 6 | 3781503 |
| 7 | 1138779265 |
| 8 | 783702329343 |
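The counts in the table follow Robinson's recurrence for labelled DAGs, a(n) = Σ_{k=1}^{n} (−1)^{k+1} C(n,k) 2^{k(n−k)} a(n−k) with a(0) = 1 (the recurrence behind the sequence cited in [6]). A short script, illustrative and not from the original slides, reproduces the table:

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def num_dags(n):
    """Count labelled DAGs on n nodes via Robinson's recurrence."""
    if n == 0:
        return 1
    # Inclusion-exclusion over the k nodes with no incoming edges.
    return sum((-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * num_dags(n - k)
               for k in range(1, n + 1))


for n in range(9):
    print(n, num_dags(n))  # matches the table: 1, 1, 3, 25, 543, ...
```

The super-exponential growth is visible immediately, which is why exhaustive search is abandoned in favour of heuristics.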

**Genetic Algorithms (GAs): Overview** • Inspired by the biological evolution process • Encoding: each individual is coded as a string of a certain finite length, called a chromosome, generally a binary string • Fitness function: maps the space of strings to the rational numbers, assigning each individual a fitness value

**Components of GAs** • Selection: an individual is chosen as a parent with probability proportional to its fitness value • Crossover: two parent strings (chromosomes) are randomly mixed to form new offspring, e.g. parents 00000000 and 11111111 produce offspring 11100000 and 00011111 • Mutation: randomly flips bits of a string (chromosome), e.g. 00000000 becomes 00100000
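The crossover and mutation examples above can be sketched on bit strings (a minimal illustration; the function names are my own, not from the slides):

```python
import random


def one_point_crossover(p1, p2, rng):
    """Swap the tails of two equal-length bit strings at a random cut point."""
    point = rng.randrange(1, len(p1))
    return p1[:point] + p2[point:], p2[:point] + p1[point:]


def mutate(chrom, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return ''.join(b if rng.random() > rate else str(1 - int(b)) for b in chrom)


rng = random.Random(0)
c1, c2 = one_point_crossover('00000000', '11111111', rng)
child = mutate('00000000', 0.01, rng)
```

With all-zero and all-one parents, every offspring of one-point crossover is a block of one symbol followed by a block of the other, exactly as in the slide's example.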

**The Evolutionary Cycle** • Initiate and evaluate the population • Selection of parents • Crossover & mutation produce modified offspring • Evaluate the offspring and form the next population; repeat
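The cycle above can be sketched in code. This is an assumed minimal generational GA with elitism, using truncation selection and the OneMax toy fitness (count of 1-bits) rather than the BIC metric used later in the slides:

```python
import random


def run_ga(fitness, chrom_len, pop_size=20, generations=50,
           crossover_rate=0.9, mutation_rate=0.01, seed=0):
    """Minimal generational GA with elitism over binary chromosomes."""
    rng = random.Random(seed)
    pop = [''.join(rng.choice('01') for _ in range(chrom_len))
           for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=fitness, reverse=True)
        next_pop = [scored[0]]                 # elitist model: keep the best
        while len(next_pop) < pop_size:
            p1, p2 = rng.sample(scored[:pop_size // 2], 2)  # truncation selection
            if rng.random() < crossover_rate:  # one-point crossover
                cut = rng.randrange(1, chrom_len)
                child = p1[:cut] + p2[cut:]
            else:
                child = p1
            child = ''.join(b if rng.random() > mutation_rate else str(1 - int(b))
                            for b in child)    # bit-flip mutation
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=fitness)


best = run_ga(lambda c: c.count('1'), chrom_len=16)
```

Swapping the toy fitness for a network-scoring metric, and the raw bit string for an adjacency-matrix encoding, yields the structure-learning GA described in the following slides.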

**Related Work** • A genetic algorithm built on the score-based greedy search approach was proposed in [2]. • Semantic crossover and mutation operators were introduced in [3]. • Encoding of individuals using a dual chromosome was proposed in [7]; we have used that approach for generating our initial population.

**Scoring Metric == Fitness Function** • Maximum Likelihood Estimate (MLE) • Bayesian Information Criterion (BIC) • BIC penalizes network complexity, since simpler networks are desirable [1]
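The complexity penalty can be illustrated directly. Assuming the common form BIC = log-likelihood − (d/2)·log N, with d free parameters and N cases (higher is better; the numbers below are made up for illustration), a sparser network can beat a denser one that fits the data slightly better:

```python
import math


def bic_score(log_likelihood, num_params, num_cases):
    """BIC = log-likelihood minus the complexity penalty (d/2) * log N.

    Higher is better; networks with more parameters pay a larger penalty.
    """
    return log_likelihood - 0.5 * num_params * math.log(num_cases)


# Hypothetical numbers: the dense network fits better (-1040 > -1050)
# but carries four times the parameters.
sparse = bic_score(log_likelihood=-1050.0, num_params=10, num_cases=500)
dense = bic_score(log_likelihood=-1040.0, num_params=40, num_cases=500)
```

Here `sparse > dense`: the 10 extra log-likelihood points do not pay for 30 extra parameters at N = 500, which is exactly the preference for simple networks the slide describes.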

**Simulation Details** • Algorithm: GA with elitist model • Encoding: a DAG is represented by the string A11 A12 … A1N A21 A22 … A2N … ANN, where A is its adjacency matrix • Fitness function: -1 × Bayesian Information Criterion metric • One-point crossover with crossover rate = 0.9 • Mutation rate = 0.01, 0.1 and a variable rate (see fig.)
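The encoding amounts to flattening the adjacency matrix row by row into a chromosome, and reading it back for evaluation (a sketch; the helper names are my own):

```python
def encode(adj):
    """Flatten an n x n adjacency matrix row by row into a bit-string chromosome."""
    n = len(adj)
    return ''.join(str(adj[i][j]) for i in range(n) for j in range(n))


def decode(chrom, n):
    """Rebuild the n x n adjacency matrix from a chromosome of length n*n."""
    return [[int(chrom[i * n + j]) for j in range(n)] for i in range(n)]


# A 3-node chain 1 -> 2 -> 3 plus the edge 1 -> 3:
adj = [[0, 1, 1],
       [0, 0, 1],
       [0, 0, 0]]
chrom = encode(adj)  # '011001000'
```

Standard string crossover and bit-flip mutation then operate directly on `chrom`.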

**Simulation Details** • Initial population: a set of random DAGs • How do we generate random DAGs? • A strictly upper-triangular adjacency matrix is always acyclic • Permute the order of the nodes and rearrange the rows and columns of the matrix correspondingly • Example: the permutation (1 2 3) → (1 3 2) relabels the chain 1 → 2 → 3 as 1 → 3 → 2
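The construction above, as a sketch (helper names are my own; the acyclicity check uses Kahn's peeling of zero-in-degree nodes rather than DFS, purely for verification):

```python
import random


def random_dag(n, edge_prob=0.5, seed=None):
    """Random strictly upper-triangular matrix (always acyclic),
    then relabel the nodes with a random permutation."""
    rng = random.Random(seed)
    upper = [[1 if j > i and rng.random() < edge_prob else 0
              for j in range(n)] for i in range(n)]
    perm = list(range(n))
    rng.shuffle(perm)
    # Rearranging rows and columns by the same permutation is a relabelling,
    # so the result stays acyclic.
    return [[upper[perm[i]][perm[j]] for j in range(n)] for i in range(n)]


def is_acyclic(adj):
    """Check acyclicity by repeatedly removing zero-in-degree nodes."""
    n = len(adj)
    indeg = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    queue = [v for v in range(n) if indeg[v] == 0]
    seen = 0
    while queue:
        v = queue.pop()
        seen += 1
        for w in range(n):
            if adj[v][w]:
                indeg[w] -= 1
                if indeg[w] == 0:
                    queue.append(w)
    return seen == n  # all nodes removable iff no directed cycle
```

Without the permutation step, every sampled DAG would respect the same node ordering, biasing the initial population; the shuffle spreads the population over orderings.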

**ASIA Network Structure** “A very small belief network for a fictitious medical example about whether a patient has tuberculosis, lung cancer or bronchitis, related to their X-ray, dyspnea, visit-to-Asia and smoking status.” [8] Image source: "asia.png", retrieved 1 May 2011, http://www.stanford.edu/class/cs221/project2_files/asia.png

**Simulation Details** • Algorithms simulated for 500, 1000, 2000 and 5000 cases • Number of generations considered: 50, 100, 150 and 250 • Population sizes considered: 10, 20, 50 and 100

**Issues Faced** • Both the crossover and mutation operators can generate individuals that are not DAGs. • We need to detect cyclic directed graphs and remove their cycles.

**Modified GA with Elitist Model** • A simple directed graph G has a directed cycle if and only if there is a back edge in DFS(G) [5] • Once all new individuals have been generated, we check whether any back edge is present and, if found, remove it. • Cycle: initiate & evaluate population → select parents → crossover & mutation → remove cycles → evaluate offspring; repeat
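A sketch of the repair step (assumed recursive DFS with the usual white/gray/black colouring; an edge into a gray vertex is a back edge, i.e. it closes a cycle, so it is deleted on the spot):

```python
def remove_back_edges(adj):
    """DFS over a directed graph given as an adjacency matrix; delete every
    back edge (an edge to a vertex still on the recursion stack).

    Removing exactly the back edges of one DFS leaves the graph acyclic.
    """
    n = len(adj)
    WHITE, GRAY, BLACK = 0, 1, 2
    colour = [WHITE] * n

    def dfs(u):
        colour[u] = GRAY
        for v in range(n):
            if adj[u][v]:
                if colour[v] == GRAY:   # back edge: v is an ancestor of u
                    adj[u][v] = 0       # deleting it breaks the cycle
                elif colour[v] == WHITE:
                    dfs(v)
        colour[u] = BLACK

    for u in range(n):
        if colour[u] == WHITE:
            dfs(u)
    return adj


# The 3-cycle 0 -> 1 -> 2 -> 0 loses its closing edge 2 -> 0:
repaired = remove_back_edges([[0, 1, 0],
                              [0, 0, 1],
                              [1, 0, 0]])
```

Note that which edge gets removed depends on the DFS start order; the Future Work slide proposes making this choice in an informed way instead.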

**Observations and Analysis (1)** Observed values: • (AF) average, over 10 runs of the GA, of the fitness value of the best individual of each run • (AH) average, over 10 runs, of the Hamming distance between the string representing the best individual of each run and the string representing the original structure • Example: the Hamming distance between 0000011 and 0010010 is 2
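Hamming distance as in the example (a one-line sketch):

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


d = hamming('0000011', '0010010')  # the two strings differ at positions 3 and 7
```

On the chromosome encoding, AH counts edge insertions, deletions, and one-sided reversals between the learned and the true adjacency matrices.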

**Observations and Analysis (2)** • For 500 and 1000 cases, AF values lower than that of the original network were reached quite frequently. • For larger numbers of cases, a larger population size gives better results, but for 500 cases no such difference was observed between population sizes of 50 and 100. • A smaller data size reduces the impact of population size and number of generations.

**Observations and Analysis (3)** • For very close or similar values of AF, we observed very different values of AH. • For the normal GA, low and variable mutation rates gave comparatively better results; for the modified GA, a high mutation rate gave clearly better results. • The modified GA performed better overall.

**Conclusions** • Less than a 0.00000032 fraction of the search space was explored. • Results for AH using the modified GA are comparable with those obtained from the K2 algorithm for the ASIA network [9]. • GAs make sense because approximate answers are acceptable, especially when the number of cases is not large.

**Future Work** • Remove cycles by making an informed choice about which edge to remove • Carry out these simulations with larger and more complex networks

**Acknowledgements** • I would like to thank Prof. Suman Mitra for the initial conceptualization of the idea. His regular inputs and the study material he provided were of great help.

**References** • [1] R. E. Neapolitan, Learning Bayesian Networks, Prentice Hall Series in Artificial Intelligence, Prentice Hall, December 2000. • [2] P. Larrañaga, M. Poza, Y. Yurramendi, R. H. Murga, and C. M. H. Kuijpers, “Structure learning of Bayesian networks by genetic algorithms: a performance analysis of control parameters,” IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 18, no. 9, 1996. • [3] S. Shetty and M. Song, “Structure learning of Bayesian networks using a semantic genetic algorithm-based approach,” in Third International Conference on Information Technology: Research and Education (ITRE 2005), 2005, pp. 454–458. • [4] R. Etxeberria, P. Larrañaga, and J. M. Pikaza, “Analysis of the behaviour of the genetic algorithms when searching Bayesian networks from data,” Pattern Recognition Letters, vol. 18, no. 11–13, pp. 1269–1273, 1997. • [5] J. Bang-Jensen and G. Z. Gutin, Digraphs: Theory, Algorithms and Applications, 2nd ed., Springer, 2010. • [6] B. D. McKay, G. F. Royle, I. M. Wanless, F. E. Oggier, N. J. A. Sloane, and H. Wilf, “Acyclic digraphs and eigenvalues of (0,1)-matrices,” Journal of Integer Sequences, vol. 7, Article 04.3.3, 2004, http://www.cs.uwaterloo.ca/journals/JIS/VOL7/Sloane/sloane15.html • [7] J. Lee, W. Chung, and E. Kim, “Structure Learning of Bayesian Networks Using Dual Genetic Algorithm,” IEICE Trans. Inf. & Syst., 2007. • [8] “Norsys Bayes Net Library” [online], retrieved 22 April 2011, http://www.norsys.com/networklibrary.html# • [9] K. P. Murphy, Bayes Net Toolbox, Technical Report, MIT Artificial Intelligence Laboratory, 2002.

**Thank You**