
Evidence and scenario sensitivities in naïve Bayesian classifiers

Presented by Marwan Kandela & Rejin James.


Presentation Transcript


1. Evidence and scenario sensitivities in naïve Bayesian classifiers Presented by Marwan Kandela & Rejin James. Based on: Silja Renooij and Linda C. van der Gaag, "Evidence and scenario sensitivities in naive Bayesian classifiers," International Journal of Approximate Reasoning, Volume 49, Issue 2, October 2008, Pages 398-416, ISSN 0888-613X, doi:10.1016/j.ijar.2008.02.008. (http://www.sciencedirect.com/science/article/pii/S0888613X08000236)

2. Authors • Silja Renooij and Linda C. van der Gaag are researchers at Utrecht University (The Netherlands), Department of Information and Computing Sciences. • They have many joint publications: peer-reviewed papers, journal articles, and conference/workshop contributions.

3. Introduction • What is a classifier? • It is an algorithm that assigns a new observation to one category from a set of categories. • What is a feature? • The individual observations are described by a set of quantifiable properties known as features.

4. Naïve Bayesian Classifier • A simple probabilistic classifier based on applying Bayes' theorem with a strong (naïve) independence assumption. • It assumes that the presence of a feature is unrelated to the presence of any other feature, given the class variable. (A minimal sketch follows below.)
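To make the assumption concrete, here is a minimal sketch of a naïve Bayesian classifier in Python. This is our illustration, not code from the paper, and the priors, likelihoods and feature names are hypothetical:

```python
# Minimal naive Bayes sketch (illustration only, not from the paper).
# Class posterior: Pr(c | e) is proportional to Pr(c) * product_i Pr(e_i | c),
# using the naive assumption that features are independent given the class.

def naive_bayes_posterior(prior, likelihoods, evidence):
    """prior: {class: Pr(class)};
    likelihoods: {class: {feature: Pr(feature observed | class)}};
    evidence: iterable of observed feature names."""
    joint = {}
    for c, p_c in prior.items():
        p = p_c
        for e in evidence:
            p *= likelihoods[c][e]   # naive independence: multiply per-feature terms
        joint[c] = p
    z = sum(joint.values())          # Pr(e), the normalizing constant
    return {c: p / z for c, p in joint.items()}

# Hypothetical two-class example with two observed findings.
prior = {"disease": 0.2, "healthy": 0.8}
likelihoods = {"disease": {"fever": 0.9, "cough": 0.7},
               "healthy": {"fever": 0.2, "cough": 0.3}}
print(naive_bayes_posterior(prior, likelihoods, ["fever", "cough"]))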

5. Sensitivity Analysis • Sensitivity analysis investigates the robustness of the answers from a Bayesian analysis to uncertainty about the precise details of the analysis. • An answer is robust if it does not depend sensitively on the assumptions and calculation inputs on which it is based. • A sensitivity function is used for this purpose.

6. Sensitivity Function • The parameter probabilities of a Bayesian network are estimated from data or assessed by domain experts, and therefore include some inaccuracies. • A Bayesian network can be subjected to a sensitivity analysis to investigate the effects of these inaccuracies on the computed posterior probabilities. • We focus on sensitivity analyses in which just one parameter is varied; such an analysis is termed a one-way sensitivity analysis. • The effects of the parameter variation are captured by a simple mathematical function, called a sensitivity function.

7. Sensitivity Function • A one-way sensitivity function f(x) can take one of three general forms. It is linear if: • the probability of interest is a prior probability rather than a posterior probability, or • the probability of the entered evidence is unaffected by the parameter variation. • In the other two cases the sensitivity function is a fragment of a rectangular hyperbola, which takes the general form f(x) = t + r / (x − s). • We will focus on this last type of function and assume any sensitivity function to be hyperbolic unless explicitly stated otherwise. • As we all know, a rectangular hyperbola in general has two branches and two asymptotes, which define its center (s, t). • The equation above can be used to calculate the values we need. (A small evaluation sketch follows below.)
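As a quick illustration of the hyperbolic form, a small sketch; the constants r, s and t here are illustrative only, since in practice they follow from the network's parameters:

```python
# Sketch: evaluating a hyperbolic one-way sensitivity function
# f(x) = t + r / (x - s), where (s, t) is the center of the hyperbola.
# The constants below are illustrative, roughly matching the worked
# example later in the talk; they are not taken from the paper.

def sensitivity(x, r, s, t):
    return t + r / (x - s)

r, s, t = -0.08, -0.08, 1.08
for x in (0.05, 0.10, 0.20, 0.50):
    print(f"f({x:.2f}) = {sensitivity(x, r, s, t):.3f}")
```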

8. Rectangular Hyperbola • The figure shows the four quadrants, relative to the center, in which a branch of the hyperbola can lie. • This explains the significance of the r, s and t values calculated on the last slide. • Depending on these values, we can tell in which quadrant the branch falls.

9. Sensitivity to class parameters • Functional forms • A hyperbolic sensitivity function describes an output probability of a naïve Bayesian network in terms of a single class parameter.

10. Sensitivity to class parameters For a class parameter with original value Xₒ and original posterior Pₒ’ for the class of interest, the vertical asymptote of the sensitivity function is defined by s = Xₒ(1 − Pₒ’) / (Xₒ − Pₒ’), and the horizontal asymptote by t = Pₒ’(1 − Xₒ) / (Pₒ’ − Xₒ). We will use an example from the paper to explain the calculations: consider a naïve Bayesian network with a class variable S modeling the possible stages of cancer of the oesophagus. The parameter probabilities for this class variable are given in the paper; the parameter under study is θ(S = IVA) = 0.10.

11. Sensitivity to class parameters • The feature variables of the network capture the results from diagnostic tests. For a particular patient, the available findings are summarized in the input instance e, giving rise to a posterior probability distribution over the class variable. • Recall Xₒ and Pₒ’ from two slides ago: • the original parameter value Xₒ = 0.10, • the original posterior for stage IVA, Pₒ’ = 0.61. • We plug these numbers into the equation and solve for s: s = (1 − 0.61)(0.10) / (0.10 − 0.61) = −0.0765 ≈ −0.08. • The sensitivity function therefore is a fourth-quadrant hyperbola branch. • Why do we say so? How do we know? Because s is negative, the vertical asymptote lies to the left of the unit window, and the branch through (Xₒ, Pₒ’) lies below the horizontal asymptote. (The sketch below reproduces the calculation.)
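A one-line check of the slide's arithmetic, with the values taken from the example above:

```python
# Reproducing the slide's computation of the vertical asymptote s for a
# class parameter, from the original value x0 and the original posterior p0.
x0 = 0.10   # original value of the class parameter theta(S = IVA)
p0 = 0.61   # original posterior Pr(IVA | e)

s = (1 - p0) * x0 / (x0 - p0)
print(round(s, 4))   # -0.0765, i.e. s ~ -0.08: a fourth-quadrant branch
```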

12. Sensitivity to class parameters • We have no need to perform any further computations: we can establish the equations we need directly. • We observe that the functions are fully determined by just the original value of the class parameter being varied and the original posterior probability distribution over the output variable of interest. • Any questions about the calculations?

13. Sensitivity to class parameters • B. Sensitivity properties • Any sensitivity property pertaining to a class parameter can be computed from the sensitivity function we derived on the last slide. • That is why the paper also goes over the sensitivity value and the admissible deviation. • Sensitivity value: by expressing the output probability in terms of a class parameter via the sensitivity function, the following sensitivity value is readily established: |f′(Xₒ)| = Pₒ’(1 − Pₒ’) / (Xₒ(1 − Xₒ)). • We now use different combinations of Xₒ and Pₒ’ to compute sensitivity values from this function and plot them in a graph (a small sketch follows below).
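A small sketch of the plotted quantity, assuming the sensitivity value Pₒ’(1 − Pₒ’)/(Xₒ(1 − Xₒ)) established above; the grid of Xₒ and Pₒ’ values is arbitrary:

```python
# Sketch of the slide's plot: sensitivity values p0*(1-p0)/(x0*(1-x0))
# (the magnitude of f'(x0) for a class parameter) over a grid of
# original parameter values x0 and original posteriors p0.
def class_sensitivity_value(x0, p0):
    return p0 * (1 - p0) / (x0 * (1 - x0))

for x0 in (0.05, 0.10, 0.30, 0.50):
    row = [round(class_sensitivity_value(x0, p0), 2) for p0 in (0.1, 0.3, 0.5)]
    print(f"x0={x0:.2f}: {row}")   # large values only for extreme x0
```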

14. Sensitivity to class parameters The plot shows that larger sensitivity values are found only for rather extreme parameter values in combination with less extreme output probabilities. We therefore conclude that, as long as the class parameters of a naïve Bayesian network are not highly unbalanced, the probability with which an instance is predicted to belong to a particular class is quite insensitive to parameter variation.

15. Sensitivity to class parameters • Admissible deviation • The admissible deviation for a parameter gives the amount of variation that is allowed in its original value before the most likely class value changes. • From the above proposition, we observe that any class parameter can be varied to the boundary of the unit window in one of the two possible directions without changing the most likely class value, as indicated by the ← and → symbols.

16. Sensitivity to class parameters • We use the same oesophagus-cancer example to make our sample calculations and explain them: • Xₒ and Pₒ’ have the same values, only this time we add a third quantity: • the original parameter value Xₒ = 0.10, • the original posterior for stage IVA, Pₒ’ = 0.61, • the original posterior for stage IIA, Pₒ‾ = 0.19. • We chose IIA because it is the second most likely stage after IVA. • Now we plug these values into the formula for xₘ, the parameter value at which the two posteriors intersect: xₘ = Pₒ‾ Xₒ / ((1 − Xₒ) Pₒ’ + Pₒ‾ Xₒ) = (0.19)(0.10) / ((0.90)(0.61) + (0.19)(0.10)) = 0.0334 ≈ 0.03. • So the admissible deviation for the parameter under study is (0.07, →): the parameter can be decreased by at most 0.10 − 0.03 = 0.07, or increased all the way from 0.10 to 1.00, without inducing a change in the most likely stage for the patient. (See the sketch below.)
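The same computation in code, with the values from the example above:

```python
# Reproducing the slide's computation of x_m, the parameter value at which
# the posteriors of the two most likely classes intersect.
x0    = 0.10  # original class parameter theta(S = IVA)
p0    = 0.61  # original posterior of the most likely stage, IVA
p0bar = 0.19  # original posterior of the runner-up stage, IIA

x_m = p0bar * x0 / ((1 - x0) * p0 + p0bar * x0)
print(round(x_m, 4))   # 0.0334, so the admissible deviation is (0.07, ->)
```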

17. Sensitivity to feature parameters • Functional forms • A hyperbolic sensitivity function describes an output probability of a naïve Bayesian network in terms of a single feature parameter.

18. Sensitivity to feature parameters • We still consider stage IVA as our stage of interest and look at the original values: • the original parameter value Xₒ = 0.52, • the original posterior for stage IVA, Pₒ’ = 0.61. • We plug these values into the equation to get s = Xₒ − Xₒ/Pₒ’ = 0.52 − (0.52/0.61) = −0.3325 ≈ −0.33. • Therefore the sensitivity function for stage IVA is again a fourth-quadrant hyperbola branch. (The sketch below reproduces the calculation.)
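Again, a one-line check of the arithmetic:

```python
# Reproducing the slide's computation of the vertical asymptote s for a
# feature parameter, again from x0 and p0 alone.
x0 = 0.52   # original value of the feature parameter under study
p0 = 0.61   # original posterior Pr(IVA | e)

s = x0 - x0 / p0
print(round(s, 4))   # -0.3325, i.e. s ~ -0.33: a fourth-quadrant branch
```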

19. Sensitivity to feature parameters • Without performing any further computations, we establish that, just as with the sensitivity functions for the class parameters, the functions for the feature parameters are exactly determined by the original values of these parameters and the original posterior probability distribution over the output variable of interest. • Sensitivity properties: any sensitivity property pertaining to a network's feature parameters can likewise be computed. • We study the properties of the sensitivity value and the admissible deviation.

20. Sensitivity to feature parameters • Sensitivity value: by expressing the output probability in terms of a feature parameter via the sensitivity function, the following sensitivity value is readily established: |f′(Xₒ)| = Pₒ’(1 − Pₒ’) / Xₒ. • We now use different combinations of Xₒ and Pₒ’ to compute sensitivity values from this function and plot them in a graph (a small sketch follows below).
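A small sketch of the plotted quantity, assuming the feature-parameter sensitivity value Pₒ’(1 − Pₒ’)/Xₒ from above; the grid values are again arbitrary:

```python
# Sketch of the slide's plot for feature parameters: sensitivity values
# p0*(1-p0)/x0 over a grid of original values x0 and posteriors p0.
def feature_sensitivity_value(x0, p0):
    return p0 * (1 - p0) / x0

for x0 in (0.05, 0.20, 0.50, 0.80):
    row = [round(feature_sensitivity_value(x0, p0), 2) for p0 in (0.1, 0.5, 0.9)]
    print(f"x0={x0:.2f}: {row}")   # large values only for small x0
```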

21. Sensitivity to feature parameters The plot shows that larger sensitivity values are found if the original value Xₒ of the parameter under study is quite small and the original posterior probability Pₒ’ is less extreme. We therefore conclude that, as long as the feature parameters of a naïve Bayesian network do not have very small original values, the probability with which an instance is predicted to belong to a particular class is quite insensitive to parameter variation.

22. Sensitivity to feature parameters • Admissible deviation • The admissible deviation for a parameter gives the amount of variation that is allowed in its original value before the most likely class value changes.

23. Sensitivity to feature parameters • From the proposition on the last slide, we observe that any feature parameter can be varied to the boundary of the unit window in at least one direction without changing the most likely class value. • We consider again the naïve Bayesian network and the patient information from Examples 1 and 3, and suppose that we are once more interested in the effect of inaccuracies in the parameter under study. • Xₒ, Pₒ’ and Pₒ‾ have the same values as before: • the original parameter value Xₒ = 0.52, • the original posterior for stage IVA, Pₒ’ = 0.61, • the original posterior for stage IIA, Pₒ‾ = 0.19 (IIA again being the second most likely stage after IVA). • Now we plug these values into the formula for xₘ: xₘ = Pₒ‾ · (Xₒ/Pₒ’) = (0.19)(0.52/0.61) = 0.1619 ≈ 0.16. • So the admissible deviation for the parameter under study is (0.36, →): the parameter can be decreased by at most 0.52 − 0.16 = 0.36, or increased all the way from 0.52 to 1.00, without inducing a change in the most likely stage for the patient. (See the sketch below.)
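The same computation in code:

```python
# Reproducing the slide's computation of x_m for the feature parameter.
x0    = 0.52  # original feature parameter value
p0    = 0.61  # original posterior of the most likely stage, IVA
p0bar = 0.19  # original posterior of the runner-up stage, IIA

x_m = p0bar * x0 / p0
print(round(x_m, 4))   # 0.1619, so the admissible deviation is (0.36, ->)
```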

24. Scenario Sensitivity • Scenario sensitivity means sensitivity to scenarios with possibly additional evidence. • Here scenario sensitivities are discussed in the context of a naïve Bayesian network. • In this section the authors try to establish: • how much impact additional evidence could have on the probability distribution over the class variable, and • how sensitive this impact is to inaccuracies in the network's parameters.

25. Scenario Sensitivity • First we try to figure out the impact of additional evidence on an output probability of interest. For this a ratio is considered. • Let E⁰ and Eⁿ be sets of feature variables with ∅ ⊆ E⁰ ⊂ Eⁿ ⊆ E, and Eⁿ \ E⁰ = {E₁, …, Eₗ}, 1 ≤ l ≤ n. • Let e⁰ and eⁿ be the corresponding instances. • Then, for each class value c, Pr(c | e⁰) is the posterior probability prior to obtaining the new information, and the impact of the additional evidence is captured by the ratio Pr(c | eⁿ) / Pr(c | e⁰).

26. Scenario Sensitivity • We consider a naïve Bayesian network example, which is an extension of Example 1. • An additional CT scan of the upper abdomen (the feature CT-liver) can be performed to provide further evidence about the patient's cancer. • To decide whether we need to perform the scan, we need the posterior probability distributions given an additional positive result and given an additional negative result from the scan. • The probability of a positive test result is Pr(CT-liver = yes | e) = Σ_S Pr(CT-liver = yes | S) · Pr(S | e) = 0.12.

27. Scenario Sensitivity • The original probability of stage IVB is 0.11 (Example 1). • So the new probability of stage IVB is Pr(IVB | eⁿ) = (0.69 / 0.12) · 0.11 ≈ 0.63. • Therefore the most likely stage would change from IVA to IVB given a positive test result. • Similarly, the impact of a negative test result can also be calculated. • The sensitivity of this impact to inaccuracies in the network parameters is then established. The equation used is h(x) = Pr(c | eⁿ)(x) / Pr(c | e⁰)(x), where x is the network parameter being varied. (A small sketch of the update follows below.)
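The update in code, with the values from the example above:

```python
# Reproducing the slide's update: the posterior given the extended
# instance e^n follows from the original posterior by a simple ratio.
pr_ct_given_ivb = 0.69   # Pr(CT-liver = yes | IVB), a network parameter
pr_ct_given_e   = 0.12   # Pr(CT-liver = yes | e), predicted test result
pr_ivb_given_e  = 0.11   # original posterior Pr(IVB | e)

pr_ivb_given_en = (pr_ct_given_ivb / pr_ct_given_e) * pr_ivb_given_e
print(round(pr_ivb_given_en, 2))   # 0.63: IVB overtakes IVA as most likely
```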

28. Scenario Sensitivity • If x is a parameter for a variable from the set Eⁿ \ E⁰, the denominator Pr(c | e⁰)(x) is constant with respect to x. • So h(x) depends only on the numerator, and it can be calculated for each feature parameter x once the sensitivity function for Pr(c | eⁿ) is available. • The previous example is considered again, now studying inaccuracies in the parameter probability x = θ(CT-liver = yes | IVB), where the feature variable is CT-liver. • Pr(IVB | eⁿ) was 0.63, and the original probability of stage IVB was 0.11. • The scenario sensitivity function then follows, and we compute that the probability of IVB increased from 0.11 to 0.63, i.e. by a factor of 5.7. • If x is varied over the unit window, the probability of IVB becomes at most 6.4 times as large. (The sketch below reconstructs this computation.)
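A sketch of the maximal-impact computation, under our assumption that Pr(IVB | eⁿ) as a function of x takes the feature-parameter form f(x) = x / (x − s) with s = Xₒ − Xₒ/Pₒ’; the slide does not spell this out, and small rounding differences explain 6.5 here versus the slide's 6.4:

```python
# Sketch (our reconstruction): the impact ratio h(x) = f(x) / Pr(IVB | e^0)
# is maximal at x = 1, using the feature-parameter sensitivity function
# f(x) = x / (x - s) for Pr(IVB | e^n) in x = theta(CT-liver = yes | IVB).
x0 = 0.69          # original value of theta(CT-liver = yes | IVB)
p0 = 0.63          # posterior Pr(IVB | e^n) at the original parameter value
pr_ivb_e0 = 0.11   # Pr(IVB | e^0), constant in x

s = x0 - x0 / p0               # vertical asymptote, as for feature parameters
f = lambda x: x / (x - s)      # hyperbolic sensitivity function for Pr(IVB | e^n)
print(round(f(x0) / pr_ivb_e0, 1))   # 5.7: impact at the original value
print(round(f(1.0) / pr_ivb_e0, 1))  # ~6.5: the maximal impact (slide: 6.4)
```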

29. Conclusion • Sensitivity-analysis techniques were used to study the effects of parameter inaccuracies on the posterior probability distributions computed from a naïve Bayesian network. • The sensitivity functions are determined solely by the original value of the parameter under study and the original posterior probability distribution. • These properties help us study the robustness of the naïve Bayesian classifier. • A novel notion of scenario sensitivity was introduced. • Scenario sensitivities can be expressed in terms of standard sensitivity functions. • Future work: to study the properties of scenario sensitivity functions for all classifier parameters, and for Bayesian networks in general.
