Bayesian Network

David Grannen, Mathieu Robin, Micheal Lynch, Sohail Akram, Tolu Aina

Bayesianism is a controversial but increasingly popular approach to statistics that offers many benefits, although not everyone is persuaded of its validity.

Bayesian networks are based on a statistical approach presented by the mathematician Thomas Bayes in 1763. It is an approach for calculating probabilities among several variables that are causally related but for which the relationships cannot easily be derived by experimentation. Bayes' formula provides the mathematical tool that combines prior knowledge with current data to produce a posterior distribution.

At first sight the formula can look complicated:

P(a|b) = L(b|a)P(a) / [ L(b|a)P(a) + L(b|not a)P(not a) ]

A medical example: a patient is concerned about his or her chances of experiencing a heart attack. The historical data we have are:
• Share of the population that experiences a heart attack: 20%
• Share of smokers among people who have had a heart attack: 90%
• Share of smokers among people who have not had a heart attack: 60%

P(heart attack | smoker) = L(smoker | heart attack)Prior(heart attack) / [ L(smoker | heart attack)Prior(heart attack) + L(smoker | no heart attack)Prior(no heart attack) ]

P(heart attack | smoker) = (90% * 20%) / [ (90% * 20%) + (60% * 80%) ] = 18% / 66%

P(heart attack | smoker) = 27%
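The arithmetic of the heart-attack example can be checked with a few lines of Python. This is only a sketch: the function name `posterior` and its parameter names are chosen here for illustration and do not come from the slides.

```python
def posterior(prior, likelihood, likelihood_given_not):
    """Bayes' formula for a binary hypothesis a and evidence b.

    prior                -- P(a)
    likelihood           -- L(b | a)
    likelihood_given_not -- L(b | not a)
    """
    numerator = likelihood * prior
    evidence = numerator + likelihood_given_not * (1.0 - prior)
    return numerator / evidence


# Figures from the heart-attack example: 20% prior, 90% and 60% likelihoods.
p = posterior(prior=0.20, likelihood=0.90, likelihood_given_not=0.60)
print(f"P(heart attack | smoker) = {p:.2%}")
```

The denominator is the total probability of the evidence (being a smoker), which is why the two likelihoods are weighted by the prior and its complement.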

Bayesian networks are complex diagrams that organize the body of knowledge in any given area by mapping out cause-and-effect relationships among key variables and encoding them with numbers that represent the extent to which one variable is likely to affect another. This approach allows scientists to combine new data with their existing knowledge or expertise.

In the late 1980s, building on the work of Judea Pearl, a professor of computer science at UCLA, AI researchers discovered that Bayesian networks offered an efficient way to deal with the lack or ambiguity of information that had hampered previous systems. Bayesian networks provide "an overarching graphical framework" that brings together diverse elements of AI and increases the range of its likely application to the real world.

Bayesian applications

Decision-making with Bayesian methods has many applications in software.
• The best-known example is Microsoft's Office Assistant. When a user calls up the assistant, Bayesian methods are used to analyse recent actions in order to work out what the user is attempting to do, and this calculation is constantly revised in the light of new actions.
• Microsoft is the most aggressive in exploiting the Bayesian approach. The company offers a free Web service that helps customers diagnose printing problems with their computers and recommends the quickest way to resolve them. Another Web service helps parents diagnose their children's health problems.

• Scott Musman, a computer consultant in Arlington, Va., recently designed a Bayesian network for the Navy that can identify enemy missiles, aircraft or vessels and recommend which weapons could be used most advantageously against incoming targets.
• General Electric is using Bayesian techniques to develop a system that will take information from sensors attached to an engine and, based on expert opinion built into the system as well as vast amounts of data on past engine performance, pinpoint emerging problems.

Representation of Graphical Models
• Graphical models are graphs in which nodes represent random variables.
• A Bayesian network is a kind of directed graphical model, which takes into account the directionality of the arcs (the arrows between nodes).
• An advantage of a directed graphical model is that one can regard an arc from A to B as indicating that A "causes" B.
(Figure: node A with an arc to node B.)

Graphical Models 2
• Along with the graph, it is necessary to specify the parameters of the model.
• For a directed model, we must specify the Conditional Probability Distribution (CPD) at each node.
• If the variables are discrete, the CPD can be represented as a table (CPT), which lists the probability that the child node takes on each of its different values for each combination of values of its parents.
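A CPT for discrete variables maps naturally onto a dictionary keyed by parent values. The sketch below assumes a binary child with two binary parents. The helper names `make_cpt` and `is_valid_cpt` and all the probabilities are hypothetical, chosen only to illustrate the "one row per parent combination, each row sums to 1" structure.

```python
from itertools import product


def make_cpt(p_true):
    """Expand P(child=True | parents) into rows covering both child values."""
    return {parents: {True: p, False: 1.0 - p} for parents, p in p_true.items()}


def is_valid_cpt(cpt, n_parents, tol=1e-9):
    """A CPT needs one row per parent combination, and each row must sum to 1."""
    expected_rows = set(product([False, True], repeat=n_parents))
    rows_ok = set(cpt) == expected_rows
    sums_ok = all(abs(sum(row.values()) - 1.0) < tol for row in cpt.values())
    return rows_ok and sums_ok


# Hypothetical values: keys are (parent1, parent2), values are P(child=True | parents).
cpt = make_cpt({
    (False, False): 0.05,
    (False, True): 0.70,
    (True, False): 0.80,
    (True, True): 0.95,
})
print(is_valid_cpt(cpt, n_parents=2))
```

With k binary parents the table has 2^k rows, which is why nodes with many parents become expensive to specify.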

Example – Wet grass

Example – Wet grass
• The event "grass is wet" (W) has two possible causes: rain (R) or the sprinkler (S).
• From the table, Pr(W=true | S=true, R=false) = 0.9. Each row sums to 1.0, so Pr(W=false | S=true, R=false) = 0.1.
• Next: developing inference from the Bayesian network.

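Inference
We observe that the grass is wet. There are two possible causes, the sprinkler or rain. Which is more likely?
Pr(S=1|W=1) = Pr(S=1, W=1) / Pr(W=1) = 0.2781 / 0.6471
Pr(R=1|W=1) = Pr(R=1, W=1) / Pr(W=1) = 0.4581 / 0.6471
Each joint probability in a numerator is a sum (Σ) of the full joint distribution over the remaining variables, and Pr(W=1) = 0.6471 is the normalizing constant.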

Inference 2
• Pr(S=1|W=1) = 0.2781 / 0.6471 = 0.4298
• Pr(R=1|W=1) = 0.4581 / 0.6471 = 0.7079
• It is more likely that the grass is wet because it is raining.
• The example given is "bottom-up" reasoning in a Bayes network, from effects to causes. "Top-down" reasoning is also possible: using the same example we can compute the probability that the grass is wet given that it is cloudy.
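The numbers on these two slides can be reproduced by brute-force enumeration of the joint distribution. The slides' CPT figure did not survive, so the tables below are an assumption: they are the values commonly used for this textbook network (Cloudy → Sprinkler, Cloudy → Rain, Sprinkler and Rain → WetGrass), and they agree with the one entry quoted above, Pr(W=true | S=true, R=false) = 0.9. The function names are illustrative.

```python
from itertools import product

# Assumed CPTs (standard textbook values for this network, not taken from the slides).
P_C = 0.5                                   # Pr(Cloudy = 1)
P_S_GIVEN_C = {False: 0.5, True: 0.1}       # Pr(Sprinkler = 1 | Cloudy)
P_R_GIVEN_C = {False: 0.2, True: 0.8}       # Pr(Rain = 1 | Cloudy)
P_W_GIVEN_SR = {                            # Pr(WetGrass = 1 | Sprinkler, Rain)
    (False, False): 0.0, (True, False): 0.9,
    (False, True): 0.9, (True, True): 0.99,
}


def bern(p, value):
    """Probability of a binary value when Pr(value = True) is p."""
    return p if value else 1.0 - p


def joint(c, s, r, w):
    """Joint probability as the product of the local CPDs."""
    return (bern(P_C, c) * bern(P_S_GIVEN_C[c], s)
            * bern(P_R_GIVEN_C[c], r) * bern(P_W_GIVEN_SR[(s, r)], w))


def prob(**fixed):
    """Marginal probability of the fixed assignments, summing out everything else."""
    total = 0.0
    for c, s, r, w in product([False, True], repeat=4):
        world = {"c": c, "s": s, "r": r, "w": w}
        if all(world[name] == value for name, value in fixed.items()):
            total += joint(c, s, r, w)
    return total


p_w = prob(w=True)                                   # normalizing constant
p_s_given_w = prob(s=True, w=True) / p_w             # bottom-up: sprinkler given wet
p_r_given_w = prob(r=True, w=True) / p_w             # bottom-up: rain given wet
p_w_given_c = prob(w=True, c=True) / prob(c=True)    # top-down: wet given cloudy

print(f"Pr(W=1) = {p_w:.4f}")
print(f"Pr(S=1|W=1) = {p_s_given_w:.4f}, Pr(R=1|W=1) = {p_r_given_w:.4f}")
print(f"Pr(W=1|C=1) = {p_w_given_c:.4f}")
```

Enumeration is exponential in the number of variables, so it only suits tiny networks like this one. Practical inference algorithms exploit the graph structure to avoid summing over every combination.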

Inference (cont.)
• Inference is concerned with the question: how can we use graphical models to answer probabilistic queries efficiently?
• It uses Bayes' theorem: P(B|A) = P(A|B) P(B) / P(A)
• Equivalently, in odds form: P(B|A) = odds / (1 + odds), where odds = P(B|A) / P(not B|A)
• A prior probability is based on previously observed data
• A conditional probability has the form P(B|A)

Scenario
• Apartment with a smoke detector
• The smoke detector is near the bathroom
• Taking a shower often triggers the detector (smoke detectors also detect steam)

Scenario (2)
Nodes of the network:
• B (burn dinner)
• O (plan to go out)
• A (smoke alarm)
• S (take shower)
• F (electrical fire)

Bayes theorem (2)
• Conditional probabilities specify the degree of belief in some proposition or propositions based on the assumption that some other propositions are true.
• Therefore the theory has no meaning without prior resolution of the probability of these antecedent propositions.

Approach
• Top down: the probability that an event will occur, given its a priori probability
• Bottom up: reasoning that starts from an effect and tries to determine its causes

Types of inference
(a) Predictive: a can cause b
(b) Diagnostic: b is evidence of a
(c) Intercausal: a and b can both cause c; if a explains c, that is evidence against b ("explaining away", "Berkson's paradox", or "selection bias")
(Figure: one small network per case over the nodes a, b and c.)
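Explaining away is easiest to see with numbers. The sketch below builds the three-node network a → c ← b and compares the belief in b before any evidence, after observing c (diagnostic), and after additionally learning that a is true (intercausal). All probabilities are hypothetical and were picked only to make the effect visible.

```python
# Hypothetical parameters for the network a -> c <- b.
P_A = 0.1   # prior probability of cause a
P_B = 0.1   # prior probability of cause b
P_C_GIVEN_AB = {   # P(c = True | a, b)
    (False, False): 0.01, (True, False): 0.9,
    (False, True): 0.9, (True, True): 0.99,
}


def joint(a, b, c):
    """Joint probability P(a, b, c) as the product of the local CPDs."""
    pa = P_A if a else 1 - P_A
    pb = P_B if b else 1 - P_B
    pc = P_C_GIVEN_AB[(a, b)] if c else 1 - P_C_GIVEN_AB[(a, b)]
    return pa * pb * pc


def p_b_given(c, a=None):
    """P(b = True | c), or P(b = True | c, a) when a is supplied."""
    a_values = [False, True] if a is None else [a]
    numerator = sum(joint(av, True, c) for av in a_values)
    denominator = sum(joint(av, bv, c) for av in a_values for bv in [False, True])
    return numerator / denominator


print(f"prior P(b)          = {P_B:.3f}")
print(f"P(b | c)            = {p_b_given(True):.3f}")
print(f"P(b | c, a is true) = {p_b_given(True, a=True):.3f}")
```

Observing the common effect c raises the belief in b, and then learning that a occurred lowers it again, even though a and b are independent a priori.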

Example
• The a priori probability of a burglary B is 0.0001.
• The conditional probability of an alarm A given a burglary is Pr(A|B).

Example (2)

            Burglary   No Burglary
          +----------+-------------+
Alarm     |   0.95   |    0.01     |
          +----------+-------------+
No Alarm  |   0.05   |    0.99     |
          +----------+-------------+

What is the value of Pr(B|A)?
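The question on this slide can be answered directly with Bayes' formula. The following is a worked sketch that uses only the prior of 0.0001 and the two entries in the Alarm row of the table. The symbol \neg B for "no burglary" is notation introduced here.

```latex
\begin{aligned}
\Pr(B \mid A)
  &= \frac{\Pr(A \mid B)\,\Pr(B)}
          {\Pr(A \mid B)\,\Pr(B) + \Pr(A \mid \neg B)\,\Pr(\neg B)} \\
  &= \frac{0.95 \times 0.0001}{0.95 \times 0.0001 + 0.01 \times 0.9999} \\
  &= \frac{0.000095}{0.010094} \approx 0.0094
\end{aligned}
```

Even after the alarm sounds, a burglary remains unlikely (just under 1%), because the prior is so small compared with the false-alarm rate.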

Bayesian Learning
Sources:
• David Heckerman, "A Tutorial on Learning Bayesian Networks", MSR-TR-95-06
• Nir Friedman and Moises Goldszmidt (Berkeley and SRI International), "Learning Bayesian Networks from Data"

The easier side to Bayesian Learning

Chorus
In the Theory we can build a sample,
With Convergence surely guaranteed,
But beware of autocorrelations,
Or it will take forever to succeed!

Verse 4
When it runs, ain't it thrillin'
To the last Iteration.
It frolics and plays throughout n-space
Walkin’ in a Bayesian Wonderland

Ending
Random walkin’ in a Bayesian Wonderland.

In perspective

Where Learning enters the arena
• Bayesian networks can be summarised as:
    • Efficient representations of probability distributions
        • Local models
        • Independence
    • Effective representations of probability distributions for
        • Computing posterior probabilities
        • Computing the most probable instantiation
        • Decision making
• But there is more: statistical induction -> learning

The Learning Process
• Done by
    • Encoding existing 'expert' knowledge in a Bayesian network
    • Using a database to update this knowledge, creating one or more new Bayesian networks
• Results in
    • Refinement of the original knowledge
    • Sometimes the identification of new distinctions and relationships
    • Robustness to errors in the experts' knowledge
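One common way to realise "encode expert knowledge, then update it with a database" for a single CPT entry is to treat the expert's belief as pseudo-counts and add the observed counts to them (a Beta prior with a posterior-mean estimate). The slides do not say which update rule is meant, so this is only an assumed illustration. The function name and all numbers are hypothetical.

```python
def update_cpt_entry(prior_true, prior_false, n_true, n_false):
    """Posterior mean of P(child=True | parent configuration) under a Beta prior.

    prior_true / prior_false -- pseudo-counts encoding the expert's belief
    n_true / n_false         -- cases counted in the database for the same
                                parent configuration
    """
    total = prior_true + prior_false + n_true + n_false
    return (prior_true + n_true) / total


# Hypothetical: the expert believes 0.9, with the weight of 10 imagined cases.
expert_only = update_cpt_entry(9, 1, 0, 0)

# Hypothetical database: 30 of 50 matching cases had child=True.
refined = update_cpt_entry(9, 1, 30, 20)

print(f"expert estimate: {expert_only:.2f}, after data: {refined:.2f}")
```

The more data there is, the less the pseudo-counts matter, which is one way the learned network becomes robust to errors in the expert's initial estimate.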

Similar to Neural Net Learning
• But with the following advantages:
    • We can easily encode expert knowledge, increasing the efficiency and accuracy of learning
    • Nodes and arcs in learned Bayesian networks often correspond to recognizable distinctions and causal relationships
    • Thus it is easier to understand and interpret the knowledge encoded in the representation

Bayesian Learning – The Problem

Why Learning
• Feasibility of learning
    • Availability of data and computational power
• Need for learning
    • Characteristics of current systems and processes
        • They defy closed-form analysis => need a data-driven approach for characterisation
        • They scale and change fast => need continuous automatic adaptation
    • Examples
        • Communications networks, illegal activities, the brain, economic markets

Why Learn a Bayesian Network
• Combine knowledge engineering and statistical induction
    • Covers the whole spectrum from knowledge-intensive model construction to data-intensive model induction
• More than a learning black box
    • Explanation of outputs
    • Interpretability and modifiability
    • Algorithms for decision making, value of information, diagnosis and repair
• Causal representation, reasoning and discovery
    • e.g. does smoking cause cancer?

A Simple Example
Wang presents a simple example in [2] using only the first four operations, which I reproduce in abbreviated form here. He begins with the following 8 statements:
1. robin (= feathered-creature <1.00, 0.90>
2. bird (= feathered-creature <1.00, 0.90>
3. swan (= bird <1.00, 0.90>
4. swan (= swimmer <1.00, 0.90>
5. gull (= bird <1.00, 0.90>
6. gull (= swimmer <1.00, 0.90>
7. crow (= bird <1.00, 0.90>
8. crow (= swimmer <0.00, 0.90>
(Note that giving a statement with a frequency of 0.00 simply means that it is not true.)

The system is then asked to evaluate the truth value of "robin (= swimmer". It comes to the following conclusions, in this order:
9. robin (= bird <1.00, 0.45> (1 and 2, abduction)
10. bird (= swimmer <1.00, 0.45> (3 and 4, induction)
11. robin (= swimmer <1.00, 0.20> (9 and 10, deduction)
12. bird (= swimmer <1.00, 0.45> (5 and 6, induction)
13. bird (= swimmer <1.00, 0.62> (10 and 12, revision)
14. bird (= swimmer <0.00, 0.45> (7 and 8, induction)
15. bird (= swimmer <0.67, 0.71> (13 and 14, revision)
16. robin (= swimmer <0.67, 0.32> (9 and 15, deduction)

Note that NARS actually comes to a great many more conclusions than this, but the ones shown are the ones that actually lead toward the conclusion. Also, NARS reports the conclusions at both lines 11 and 16, since the guesswork involved necessarily means it needs to be able to change its mind, as it were. The final conclusion, given at line 16, means that two thirds of the relevant evidence indicates that a robin can swim, but that this conclusion has somewhat less than one third of the possible degree of confidence; both of these items, of course, indicate the need for more information :-).

A Comparison with another Learning Technique

Current Topics
• Time
    • Beyond discrete time and beyond fixed rate
• Causality
    • Removing the assumptions
• Hidden variables
    • Where to place them and how many
• Model evaluation and active learning
    • What parts of the model are suspect, and what and how much data is needed

Decision Theory (1)
• What happens when it is time to convert beliefs into actions?
• Decision Theory = Probability Theory + Utility Theory

Decision Theory (2)
• Decompose a multi-attribute utility function into a sum of local utilities
• Each term is a node, which has as parents:
    • The random variables on which it depends
    • The action (control) nodes
• The resulting graph is an influence diagram
• Finally, compute the optimal sequence of actions to perform to maximize expected utility
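The last step, choosing the action with the highest expected utility, can be sketched for the simplest case of a single decision and a single utility node. A full influence diagram would sum several local utilities and optimise a sequence of actions, which this sketch does not attempt. The states, actions, beliefs and utilities below are all hypothetical.

```python
def expected_utility(action, beliefs, utility):
    """Sum of utility(action, state) weighted by the probability of each state."""
    return sum(p * utility[(action, state)] for state, p in beliefs.items())


def best_action(actions, beliefs, utility):
    """Action that maximizes expected utility under the current beliefs."""
    return max(actions, key=lambda a: expected_utility(a, beliefs, utility))


# Hypothetical beliefs, e.g. a posterior produced by a Bayesian network.
beliefs = {"rain": 0.7, "dry": 0.3}

# Hypothetical utility table over (action, state) pairs.
utility = {
    ("umbrella", "rain"): 8, ("umbrella", "dry"): 5,
    ("no umbrella", "rain"): 0, ("no umbrella", "dry"): 10,
}

actions = ["umbrella", "no umbrella"]
for a in actions:
    print(a, expected_utility(a, beliefs, utility))
print("choose:", best_action(actions, beliefs, utility))
```

Probability theory supplies `beliefs`, utility theory supplies `utility`, and the decision rule combines the two, as in the equation on the previous slide.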

Applications (1)
• QMR-DT: a decision-theoretic reformulation of the Quick Medical Reference model

Some Applications
• Biostatistics – Medical Research Council (Bayesian Inference Using Gibbs Sampling, BUGS)
• Data analysis – NASA (AutoClass)
• Collaborative filtering – Microsoft (Microsoft Belief Networks, MSBN)
• Fraud detection – AT&T
• Speech recognition – UC Berkeley

Applications (2)
• Real-time decision making: NASA's Vista system
• Genetics: linkage analysis
• Speech recognition
• Data compression: density estimation
• Coding: turbo codes

Applications: MS Office
• MS Office Assistant: the Lumière Project
• Source: "The Lumière Project: Bayesian User Modeling for Inferring the Goals and Needs of Software Users", by E. Horvitz, J. Breese, D. Heckerman, D. Hovel, K. Rommelse (Microsoft Research)

MS Office (2)
User behaviour is monitored to determine the Assistant's actions. Examples:
• Search
• Focus of attention
• Introspection
• Undesired effects
• Inefficient command sequences
• Domain-specific syntactic and semantic content

MS Office (3)
• Portion of a Bayesian net for inferring the likelihood that a user needs assistance, considering profile information and recent activity
