Intro to AI Uncertainty

# Intro to AI Uncertainty - PowerPoint PPT Presentation

### Intro to AI Uncertainty

Ruth Bergman

Fall 2002

Why Not Use Logic?
• Suppose I want to write down rules about medical diagnosis:

Diagnostic rules: ∀x has(x, sorethroat) ⇒ has(x, cold)

Causal rules: ∀x has(x, cold) ⇒ has(x, sorethroat)

• Clearly, this isn’t right:

Diagnostic case:

• we may not know exactly which collections of symptoms or tests allow us to infer a diagnosis (qualification problem)
• even if we did we may not have that information
• even if we do, how do we know it is correct?

Causal rules:

• Symptoms are not guaranteed to appear; note that the logical rule would also license its contrapositive (no sore throat, therefore no cold)
• There are lots of causes for symptoms; if we miss one we might get an incorrect inference
• How do we reason backwards?
Uncertainty
• The problem with pure FOL is that it deals in black and white
• The world isn’t black and white because of uncertainty:
• Uncertainty due to imprecision or noise
• Uncertainty because we don’t know everything about the domain
• Uncertainty because in practice we often cannot acquire all the information we’d like.
• As a result, we’d like to assign a degree of belief (or plausibility or possibility) to any statement we make
• note this is different than a degree of truth!
Ways of Handling Uncertainty
• MYCIN: operationalize uncertainty with the rules:
• a ⇒ b with certainty 0.7
• we know a with certainty 1
• ergo, we know b with certainty 0.7
• but what if we also know
• a ⇒ c with certainty 0.6
• b v c ⇒ d with certainty 1
• do we know d with certainty 0.7, 0.6, 0.88, 1, ...?
• suppose a ⇒ ~e and ~e ⇒ ~d ...
• In a rule-based system, such non-local dependencies are hard to catch
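To make the ambiguity concrete, here is a minimal sketch that computes three of the candidate certainties for d. The function names and the choice of combination conventions are illustrative assumptions, not a description of how MYCIN itself was implemented.

```python
# Candidate ways to combine the certainties of b and c when the rule is
# "b v c => d with certainty 1". All three conventions are assumptions
# chosen for illustration.

def combine_max(cf_b, cf_c):
    """Treat the disjunction as the stronger of the two beliefs."""
    return max(cf_b, cf_c)

def combine_min(cf_b, cf_c):
    """Treat the disjunction as the weaker of the two beliefs."""
    return min(cf_b, cf_c)

def combine_independent(cf_b, cf_c):
    """Treat b and c as independent events: 1 - (1 - b)(1 - c)."""
    return 1.0 - (1.0 - cf_b) * (1.0 - cf_c)

cf_b, cf_c = 0.7, 0.6  # from a => b (0.7) and a => c (0.6), with a certain
for name, rule in [("max", combine_max),
                   ("min", combine_min),
                   ("independent", combine_independent)]:
    print(name, round(rule(cf_b, cf_c), 2))
```

The independent combination gives the 0.88 candidate, but b and c both derive from a, so treating them as independent is exactly the kind of non-local dependency a local rule cannot see.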
Probability
• Problems such as this have led people to invent lots of calculi for uncertainty; probability still dominates
• Basic idea:
• I have some degree of belief (DoB), a prior probability, about some proposition p
• I receive evidence about p; the evidence is related to p by a conditional probability
• From these two quantities, I can compute an updated DoB about p --- a posterior probability
Probability Review
• Basic probability is on propositions or propositional statements:
• P(A) (A is a proposition)
• P(Accident), P(phonecall), P(Cold)
• P(X = v) (X is a random variable; v a value)
• P(card = JackofClubs), P(weather=sunny), ....
• P(A v B), P(A ^ B), P(~A) ...
• Referred to as the prior or unconditional probability
• The conditional probability of A given B

P(A | B) = P(A,B)/P(B)

• the product rule P(A,B) = P(A | B) * P(B)
• Independence: P(A | B) = P(A)
• A is independent of B (conditional independence is the analogous statement given a third variable C: P(A | B, C) = P(A | C))
Probability Review
• The joint distribution of A and B
• P(A,B) = x ( equivalent to P(A ^ B) = x)

P(A=1, B=1) = .1

P(A=1) = .1 + .2 = .3

P(A=1 | B=1) = .1/.4 = .25
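The joint table for this slide did not survive extraction. The sketch below assumes A and B are binary and fills in the one table consistent with the three numbers quoted above; the entries for A=0 are inferred (so that P(B=1) = .4 and everything sums to 1), not taken from the slides.

```python
# Assumed joint distribution P(A, B) for binary A and B.
# Only P(A=1, B=1) = 0.1 and P(A=1, B=0) = 0.2 are implied directly by the text;
# the A=0 entries are inferred from P(B=1) = 0.4 and normalization.
joint = {
    (1, 1): 0.1,
    (1, 0): 0.2,
    (0, 1): 0.3,
    (0, 0): 0.4,
}

def marginal_a(a):
    """P(A = a): sum the joint over all values of B."""
    return sum(p for (ai, _), p in joint.items() if ai == a)

def marginal_b(b):
    """P(B = b): sum the joint over all values of A."""
    return sum(p for (_, bi), p in joint.items() if bi == b)

def conditional_a_given_b(a, b):
    """P(A = a | B = b) = P(A = a, B = b) / P(B = b)."""
    return joint[(a, b)] / marginal_b(b)

print(round(marginal_a(1), 2))                # marginal of A=1
print(round(conditional_a_given_b(1, 1), 2))  # A=1 given B=1
```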

Bayes Theorem
• P(A,B) = P(A | B) P(B) = P(B | A) P(A)

P(A|B) = P(B | A) P(A) / P(B)

• Example: what is the probability of meningitis when a patient has a stiff neck?

P(S|M) = 0.5

P(M) = 1/50000

P(S) = 1/20

P(M|S) = P(S|M) P(M) / P(S) = (0.5 × 1/50000) / (1/20) = 0.0002

• More generally, with background evidence E:

P(A | B , E) = P(B | A , E) P(A | E)/ P(B | E)
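The meningitis calculation can be checked with a few lines of code. This is a minimal sketch; the function name `bayes` is a hypothetical helper, and the numbers are the ones from the example.

```python
def bayes(likelihood, prior, evidence):
    """Posterior P(A | B) from P(B | A), P(A) and P(B)."""
    return likelihood * prior / evidence

p_s_given_m = 0.5      # P(S | M): stiff neck given meningitis
p_m = 1 / 50000        # P(M): prior probability of meningitis
p_s = 1 / 20           # P(S): prior probability of a stiff neck

p_m_given_s = bayes(p_s_given_m, p_m, p_s)
print(p_m_given_s)
```

Even though half of meningitis patients have a stiff neck, the posterior stays small because the prior P(M) is tiny compared with P(S).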

Alarm System Example
• A burglary alarm system is fairly reliable at detecting burglary
• It may also respond to minor earthquakes
• Neighbors John and Mary will call when they hear the alarm
• John always calls when he hears the alarm
• He sometimes confuses the telephone with the alarm and calls
• Mary sometimes misses the alarm
• Given the evidence of who has or has not called, we would like to estimate the probability of a burglary.
Alarm System Example
• P(Alarm | Burglary): a burglary alarm system is fairly reliable at detecting burglary
• P(Alarm | Earthquake): it may also respond to minor earthquakes
• P(JohnCalls | Alarm), P(MaryCalls | Alarm): neighbors John and Mary will call when they hear the alarm
• John always calls when he hears the alarm
• P(JohnCalls | ~Alarm): he sometimes confuses the telephone with the alarm and calls
• Mary sometimes misses the alarm
• Given the evidence of who has or has not called, we would like to estimate the probability of a burglary: P(Burglary | JohnCalls, MaryCalls)

Influence Diagrams
• Another way to present this information is an influence diagram

[Figure: influence diagram with arcs burglary → alarm, earthquake → alarm, alarm → John calls, alarm → Mary calls]

Influence Diagrams
• A set of random variables.
• A set of directed arcs
• An arc from X to Y means that X has influence on Y.
• Each node has an associated conditional probability table.
• The graph has no directed cycle.

Conditional Probability Tables
• Each row contains the conditional probability for a possible combination of values of the parent nodes
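• Each row must sum to 1 (the tables list only the probability that the node is true; the probability that it is false is the complement)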

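Burglary: P(B) = 0.001

Earthquake: P(E) = 0.002

Alarm:

| B | E | P(A) |
|---|---|------|
| T | T | 0.95 |
| T | F | 0.94 |
| F | T | 0.29 |
| F | F | 0.001 |

John calls:

| A | P(J) |
|---|------|
| T | 0.90 |
| F | 0.05 |

Mary calls:

| A | P(M) |
|---|------|
| T | 0.70 |
| F | 0.01 |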

Belief Network for the Alarm
The Semantics of Belief Networks
• The probability that the alarm sounded but neither a burglary nor an earthquake has occurred and both John and Mary call
• P(J ^ M ^ A ^ ~B ^ ~E) = P(J | A) P(M | A) P(A | ~B ^ ~E) P(~B) P(~E) =

0.9 * 0.7 * 0.001 * 0.999 * 0.998 ≈ 0.000628

• More generally, we can write this as
• $P(x_1, \ldots, x_n) = \prod_i P(x_i \mid \mathrm{Parents}(X_i))$
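The product formula can be sketched directly in code. This is a minimal illustration that hard-codes the alarm network with the conditional probability table values given earlier; the helper names `bern` and `joint` are assumptions made for the example.

```python
# CPT values for the alarm network (probability that each node is true).
P_B = 0.001
P_E = 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # keyed by (B, E)
P_J = {True: 0.90, False: 0.05}                      # keyed by A
P_M = {True: 0.70, False: 0.01}                      # keyed by A

def bern(p_true, value):
    """Probability of a Boolean value given the probability of True."""
    return p_true if value else 1.0 - p_true

def joint(j, m, a, b, e):
    """P(j, m, a, b, e) as the product of each node's CPT entry given its parents."""
    return (bern(P_J[a], j) * bern(P_M[a], m) * bern(P_A[(b, e)], a)
            * bern(P_B, b) * bern(P_E, e))

# Alarm sounded, no burglary, no earthquake, both neighbors call.
p = joint(j=True, m=True, a=True, b=False, e=False)
print(round(p, 6))
```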
Constructing Belief Networks
• Choose the set of variables Xi that describe the domain
• Choose an ordering for the variables
• Ideally, work backward from observables to root causes
• While there are variables left:
• Pick a variable Xi and add it to the network
• Set Parents(Xi) to the minimal set of nodes already in the network such that Xi is conditionally independent of its other predecessors given those parents
• Define the conditional probability table for Xi
• Once you’re done, it’s likely you’ll realize you need to fiddle a little bit!
Node Ordering
• The correct order to add nodes is
• Add the “root causes” first
• Then the variables they influence
• And so on…
• Alarm example: consider the orderings
• MaryCalls, JohnCalls, Alarm, Burglary, Earthquake
• MaryCalls, JohnCalls, Earthquake, Burglary, Alarm

[Figure: alarm network built from an alternative node ordering; nodes John calls, Mary calls, earthquake, burglary, alarm]
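One way to see why ordering matters is to count how many probabilities each resulting network needs. The sketch below is an illustration under assumptions: the parent sets for the two non-causal orderings are not stated in the slides and are taken here from the standard textbook treatment of this example, and all nodes are assumed Boolean.

```python
def cpt_rows(parents):
    """A Boolean node needs one probability per assignment of its parents."""
    return 2 ** len(parents)

def network_size(parent_sets):
    """Total number of probabilities needed to specify the network."""
    return sum(cpt_rows(parents) for parents in parent_sets.values())

# Causal ordering: root causes first.
causal = {
    "Burglary": [], "Earthquake": [],
    "Alarm": ["Burglary", "Earthquake"],
    "JohnCalls": ["Alarm"], "MaryCalls": ["Alarm"],
}

# Assumed parent sets for MaryCalls, JohnCalls, Alarm, Burglary, Earthquake.
ordering_1 = {
    "MaryCalls": [], "JohnCalls": ["MaryCalls"],
    "Alarm": ["MaryCalls", "JohnCalls"],
    "Burglary": ["Alarm"],
    "Earthquake": ["Alarm", "Burglary"],
}

# Assumed parent sets for MaryCalls, JohnCalls, Earthquake, Burglary, Alarm.
ordering_2 = {
    "MaryCalls": [], "JohnCalls": ["MaryCalls"],
    "Earthquake": ["MaryCalls", "JohnCalls"],
    "Burglary": ["MaryCalls", "JohnCalls", "Earthquake"],
    "Alarm": ["MaryCalls", "JohnCalls", "Earthquake", "Burglary"],
}

for name, net in [("causal", causal), ("ordering 1", ordering_1),
                  ("ordering 2", ordering_2)]:
    print(name, network_size(net))
```

Under these assumptions the second ordering needs as many numbers as the full joint distribution over five Boolean variables, so the network buys nothing.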

Probabilistic Inference
• Diagnostic inference (from effects to causes)
• Given that JohnCalls, infer that P(B|J) = 0.016
• Causal inference (from causes to effects)
• Given Burglary, P(J|B) = 0.86 and P(M|B) = 0.67
• Intercausal inference (between causes of a common effect)
• Given Alarm, P(B|A) = 0.376
• If Earthquake is also true, P(B|A^E) = 0.003
• Mixed inference (combining two or more of the above)
• P(A|J ^ ~E) = 0.03
• P(B|J ^ ~E) = 0.017
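For a network this small, all four kinds of query can be answered by brute-force enumeration of the joint distribution. The sketch below is one way to do that, not the algorithm developed later in these slides; the function `prob` and the single-letter variable names are assumptions made for the example. With the table values listed earlier the results land close to the quoted figures; small differences in the last digit (for example, P(B|A) comes out near 0.374) suggest the quoted numbers were computed from a slightly different table, which is a guess rather than something the slides state.

```python
from itertools import product

P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # keyed by (B, E)
P_J = {True: 0.90, False: 0.05}                      # keyed by A
P_M = {True: 0.70, False: 0.01}                      # keyed by A
NAMES = ("B", "E", "A", "J", "M")

def bern(p_true, value):
    """Probability of a Boolean value given the probability of True."""
    return p_true if value else 1.0 - p_true

def joint(b, e, a, j, m):
    """Full joint probability as a product of CPT entries."""
    return (bern(P_B, b) * bern(P_E, e) * bern(P_A[(b, e)], a)
            * bern(P_J[a], j) * bern(P_M[a], m))

def prob(query, evidence=None):
    """P(query | evidence), both given as dicts such as {"B": True}."""
    evidence = evidence or {}
    numerator = denominator = 0.0
    for values in product([True, False], repeat=len(NAMES)):
        world = dict(zip(NAMES, values))
        if any(world[name] != v for name, v in evidence.items()):
            continue  # world contradicts the evidence
        p = joint(*values)
        denominator += p
        if all(world[name] == v for name, v in query.items()):
            numerator += p
    return numerator / denominator

print("diagnostic  P(B|J)   =", round(prob({"B": True}, {"J": True}), 3))
print("causal      P(J|B)   =", round(prob({"J": True}, {"B": True}), 3))
print("intercausal P(B|A)   =", round(prob({"B": True}, {"A": True}), 3))
print("intercausal P(B|A,E) =", round(prob({"B": True}, {"A": True, "E": True}), 3))
```

Enumeration touches every one of the 2^5 worlds, which is fine here but grows exponentially with the number of variables.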

Conditional Independence

D-separation

• a set of nodes E d-separates two sets of nodes X and Y if every undirected path from a node in X to a node in Y is blocked given E
• if E d-separates X and Y, then X and Y are conditionally independent given E

[Figure: three ways a path from X to Y through a node Z can be blocked given evidence E]

Conditional Independence
• An undirected path from X to Y is blocked given E if there is a node Z on the path such that one of the following holds:
• Z is in E and one path arrow leads into Z and one leads out
• Z is in E and both path arrows lead out of Z
• Neither Z nor any descendant of Z is in E and both path arrows lead into Z
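The three blocking conditions translate almost line for line into a path check. The sketch below applies them to the alarm network by enumerating undirected paths, which is only practical for small graphs; the function names and the `PARENTS` dictionary layout are assumptions made for the example.

```python
PARENTS = {
    "Burglary": [], "Earthquake": [],
    "Alarm": ["Burglary", "Earthquake"],
    "JohnCalls": ["Alarm"], "MaryCalls": ["Alarm"],
}

def children(node):
    return [c for c, parents in PARENTS.items() if node in parents]

def descendants(node):
    found, stack = set(), children(node)
    while stack:
        n = stack.pop()
        if n not in found:
            found.add(n)
            stack.extend(children(n))
    return found

def undirected_paths(start, goal, path=None):
    """Yield every simple path from start to goal, ignoring arc direction."""
    path = path or [start]
    if start == goal:
        yield path
        return
    for nxt in PARENTS[start] + children(start):
        if nxt not in path:
            yield from undirected_paths(nxt, goal, path + [nxt])

def blocked(path, evidence):
    """True if some interior node Z of the path satisfies a blocking condition."""
    for prev, z, nxt in zip(path, path[1:], path[2:]):
        arrows_in = (prev in PARENTS[z]) + (nxt in PARENTS[z])
        if arrows_in == 2:
            # Both arrows lead into Z: blocks unless Z or a descendant is observed.
            if z not in evidence and not (descendants(z) & evidence):
                return True
        elif z in evidence:
            # One in and one out, or both out: blocks when Z is observed.
            return True
    return False

def d_separated(x, y, evidence=()):
    evidence = set(evidence)
    return all(blocked(p, evidence) for p in undirected_paths(x, y))

print(d_separated("JohnCalls", "MaryCalls", {"Alarm"}))
print(d_separated("Burglary", "Earthquake"))
print(d_separated("Burglary", "Earthquake", {"JohnCalls"}))
```

The last call illustrates the third condition: observing a descendant of Alarm unblocks the path between its two causes.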
An Inference Algorithm for Belief Networks
• In order to develop an algorithm, we will assume our networks are singly connected
• A network is singly connected if there is at most a single undirected path between nodes in the network
• note this means that any two nodes can be d-separated by removing a single node
• These are also known as polytrees.
• We will then consider a generic node X with parents U1 ... Um and children Y1 ... Yn.
• the other parents of Yi are Zij
• Evidence above X (connected through its parents) is $E_X^+$; evidence below X (connected through its children) is $E_X^-$

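[Figure: Singly Connected Network. Node X with parents U1 ... Um and children Y1 ... Yn, whose other parents are Zij; evidence Ex+ lies above X and Ex- below]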

Inference in Belief Networks
• $P(X \mid E) = P(X \mid E_X^+, E_X^-) = k\,P(E_X^- \mid X, E_X^+)\,P(X \mid E_X^+) = k\,P(E_X^- \mid X)\,P(X \mid E_X^+)$
• the last step follows by noting that X d-separates its parents and children
• Now, we expand the second term by conditioning on the values u of the parents of X

$P(X \mid E_X^+) = \sum_u P(X \mid u, E_X^+)\,P(u \mid E_X^+) = \sum_u P(X \mid u) \prod_i P(u_i \mid E_{U_i \setminus X})$

again, these last facts follow from conditional independence

• Note that we now have a recursive algorithm: the first term in the sum is just a table lookup; the second is what we started with on a smaller set of nodes.

Inference in Belief Networks
• $P(X \mid E) = k\,P(E_X^- \mid X)\,P(X \mid E_X^+)$
• The evaluation of the first factor is similar, but more involved, yielding

$P(E_X^- \mid X) = k_2 \prod_i \sum_{y_i} P(E_{Y_i}^- \mid y_i) \sum_{z_i} P(y_i \mid X, z_i) \prod_j P(z_{ij} \mid E_{Z_{ij} \setminus Y_i})$

• $P(E_{Y_i}^- \mid y_i)$ is a recursive instance of $P(E_X^- \mid X)$
• $P(y_i \mid X, z_i)$ is a conditional probability table entry for $Y_i$
• $P(z_{ij} \mid E_{Z_{ij} \setminus Y_i})$ is a recursive instance of the $P(X \mid E)$ calculation
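A small worked instance may help. The sketch below evaluates the decomposition P(X | E) = k P(Ex- | X) P(X | Ex+) for X = Alarm in the alarm network, assuming the evidence below X is that both John and Mary call and that there is no evidence above X. In that special case the recursion bottoms out immediately: the parents' support is just their priors, and each observed child contributes its table entry. The function names are hypothetical, and the CPT values are the ones given earlier.

```python
P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # keyed by (B, E)
P_J = {True: 0.90, False: 0.05}                      # P(J = true | A)
P_M = {True: 0.70, False: 0.01}                      # P(M = true | A)

def support_alarm(a):
    """P(A = a | E+) with no evidence above A: sum over parent values u of
    P(a | u) times the product of the parents' priors."""
    total = 0.0
    for b in (True, False):
        for e in (True, False):
            p_a = P_A[(b, e)] if a else 1.0 - P_A[(b, e)]
            p_b = P_B if b else 1.0 - P_B
            p_e = P_E if e else 1.0 - P_E
            total += p_a * p_b * p_e
    return total

def evidence_alarm(a):
    """P(E- | A = a) where E- is JohnCalls = true and MaryCalls = true.
    Both children are observed leaves, so each factor is a table lookup."""
    return P_J[a] * P_M[a]

unnormalized = {a: evidence_alarm(a) * support_alarm(a) for a in (True, False)}
k = 1.0 / sum(unnormalized.values())          # normalizing constant
posterior = {a: k * p for a, p in unnormalized.items()}
print(round(posterior[True], 3))              # P(Alarm | JohnCalls, MaryCalls)
```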
The Algorithm

Support-Except(X, V) returns P(X | E_{X\V})

    if EVIDENCE(X) then return point distribution for X
    else
        calculate P(E-_{X\V} | X) = Evidence-Except(X, V)
        U ← PARENTS(X)
        if U is empty
            then return normalized P(E-_{X\V} | X) P(X)
        else
            for each Ui in U
                calculate and store P(Ui | E_{Ui\X}) = Support-Except(Ui, X)
            return k P(E-_{X\V} | X) Σu P(X | u) Πi P(ui | E_{Ui\X})

The Algorithm

Evidence-Except(X, V) returns P(E-_{X\V} | X)

    Y ← CHILDREN(X) − V
    if Y is empty
        then return a uniform distribution
    else
        for each Yi in Y do
            calculate P(E-_{Yi} | yi) = Evidence-Except(Yi, null)
            Zi ← PARENTS(Yi) − X
            for each Zij in Zi
                calculate P(Zij | E_{Zij\Yi}) = Support-Except(Zij, Yi)
        return k2 Πi Σyi P(E-_{Yi} | yi) Σzi P(yi | X, zi) Πj P(zij | E_{Zij\Yi})

The Call
• For a node X, call Support-Except(X,null)
PathFinder
• Diagnostic system for lymph node disease
• Pathfinder IV is a Bayesian model
• 8 hrs devising vocabulary
• 35 hrs defining topology
• 40 hrs to make 14000 probability assessments
• most recent version appears to outperform the experts who designed it!
Other Uncertainty Calculi
• Dempster-Shafer Theory
• Ignorance: there are sets which have no probability
• In this case, the best you can do, in some cases, is bound the probability
• D-S theory is one way of doing this
• Fuzzy Logic
• Suppose we introduce a fuzzy membership function (a degree of membership)
• Logical semantics are based on set membership
• Thus, we get a logic with degrees of truth
• e.g. John is a big man ⇒ bigman(John) w. truth value 0.