## Intro to AI Uncertainty


Why Not Use Logic?

- Suppose I want to write down rules about medical diagnosis:

Diagnostic rules: ∀x has(x, sorethroat) → has(x, cold)

Causal rules: ∀x has(x, cold) → has(x, sorethroat)

- Clearly, this isn’t right:

Diagnostic case:

- we may not know exactly which collections of symptoms or tests allow us to infer a diagnosis (the qualification problem)
- even if we did know, we may not have that information for a given patient
- even if we do have it, how do we know it is correct?

Causal rules:

- Symptoms are not guaranteed to appear; note that the logical encoding would license the contrapositive (no sore throat would imply no cold)
- There are lots of causes for symptoms; if we miss one we might get an incorrect inference
- How do we reason backwards?

Uncertainty

- The problem with pure FOL is that it deals only in black and white
- The world isn’t black and white, because of uncertainty:
- Uncertainty due to imprecision or noise
- Uncertainty because we don’t know everything about the domain
- Uncertainty because in practice we often cannot acquire all the information we’d like.
- As a result, we’d like to assign a degree of belief (or plausibility or possibility) to any statement we make
- note this is different than a degree of truth!

Ways of Handling Uncertainty

- MYCIN: operationalize uncertainty with rules:
- a → b with certainty 0.7
- we know a with certainty 1
- ergo, we know b with certainty 0.7
- but, what if we also know
- a → c with certainty 0.6
- b v c → d with certainty 1
- do we know d with certainty 0.7, 0.6, 0.88, 1, ...?
- suppose also a → ~e and ~e → ~d ...
- In a rule-based system, such non-local dependencies are hard to catch

Probability

- Problems such as this have led people to invent lots of calculi for uncertainty; probability still dominates
- Basic idea:
- I have some DoB (a prior probability) about some proposition p
- I receive evidence about p; the evidence is related to p by a conditional probability
- From these two quantities, I can compute an updated DoB about p --- a posterior probability

Probability Review

- Basic probability is on propositions or propositional statements:
- P(A) (A is a proposition)
- P(Accident), P(phonecall), P(Cold)
- P(X = v) (X is a random variable; v a value)
- P(card = JackofClubs), P(weather=sunny), ....
- P(A v B), P(A ^ B), P(~A) ...
- Referred to as the prior or unconditional probability
- The conditional probability of A given B

P(A | B) = P(A,B)/P(B)

- the product rule P(A,B) = P(A | B) * P(B)
- Independence: P(A | B) = P(A)
- A is independent of B

Probability Review

- The joint distribution of A and B
- P(A,B) = x (equivalent to P(A ^ B) = x)
- Example joint distribution:

|  | B = 1 | B = 0 |
| --- | --- | --- |
| A = 1 | 0.1 | 0.2 |
| A = 0 | 0.3 | 0.4 |

P(A=1, B=1) = 0.1

P(A=1) = 0.1 + 0.2 = 0.3

P(A=1 | B=1) = 0.1/0.4 = 0.25
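A quick way to sanity-check these numbers is to sum over the joint table directly. Below is a minimal Python sketch (the dict representation and variable names are just illustrative):

```python
# Joint distribution over (A, B) from the table above.
joint = {(1, 1): 0.1, (1, 0): 0.2, (0, 1): 0.3, (0, 0): 0.4}

p_a1 = sum(p for (a, b), p in joint.items() if a == 1)  # P(A=1) = 0.3
p_b1 = sum(p for (a, b), p in joint.items() if b == 1)  # P(B=1) = 0.4
p_a1_given_b1 = joint[(1, 1)] / p_b1                    # 0.1 / 0.4 = 0.25

# Note A and B are not independent: P(A=1 | B=1) = 0.25 != P(A=1) = 0.3
print(p_a1, p_b1, p_a1_given_b1)
```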

Bayes Theorem

- P(A,B) = P(A | B) P(B) = P(B | A) P(A)

P(A|B) = P(B | A) P(A) / P(B)

- Example: what is the probability of meningitis when a patient has a stiff neck?

P(S|M) = 0.5

P(M) = 1/50000

P(S) = 1/20

P(M|S) = P(S|M)P(M)/P(S) = (0.5 * 1/50000) / (1/20) = 0.0002

- More generally, conditioning everything on background evidence E:

P(A | B , E) = P(B | A , E) P(A | E)/ P(B | E)
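The meningitis example is a one-liner in code; a minimal sketch (variable names are illustrative):

```python
# Bayes' theorem: P(M | S) = P(S | M) P(M) / P(S)
p_s_given_m = 0.5        # P(S | M): stiff neck given meningitis
p_m = 1 / 50000          # P(M): prior probability of meningitis
p_s = 1 / 20             # P(S): prior probability of a stiff neck

p_m_given_s = p_s_given_m * p_m / p_s
print(p_m_given_s)       # 0.0002
```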

Alarm System Example

- A burglary alarm system is fairly reliable at detecting burglary
- It may also respond to minor earthquakes
- Neighbors John and Mary will call when they hear the alarm
- John always calls when he hears the alarm
- He sometimes confuses the telephone with the alarm and calls
- Mary sometimes misses the alarm
- Given the evidence of who has or has not called, we would like to estimate the probability of a burglary.

Alarm System Example

- P(Alarm|Burglary) A burglary alarm system is fairly reliable at detecting burglary
- P(Alarm|Earthquake) It may also respond to minor earthquakes
- P(JohnCalls|Alarm), P(MaryCalls|Alarm) Neighbors John and Mary will call when they hear the alarm
- John always calls when he hears the alarm
- P(JohnCalls|~Alarm) He sometimes confuses the telephone with the alarm and calls
- Mary sometimes misses the alarm
- Given the evidence of who has or has not called, we would like to estimate the probability of a burglary. P(Burglary|JohnCalls,MaryCalls)


Influence Diagrams

- Another way to present this information is an influence diagram

[Diagram: burglary → alarm; alarm → John calls; alarm → Mary calls]

Influence Diagrams

- A set of random variables
- A set of directed arcs
- An arc from X to Y means that X has influence on Y
- Each node has an associated conditional probability table
- The graph has no directed cycle


Conditional Probability Tables

- Each row contains the conditional probability for one possible combination of values of the parent nodes
- Each row must sum to 1 (the tables below list only P(true); P(false) is 1 minus the entry)

Priors: P(B) = 0.001 (burglary), P(E) = 0.002 (earthquake)

Alarm:

| B | E | P(A) |
| --- | --- | --- |
| T | T | 0.95 |
| T | F | 0.94 |
| F | T | 0.29 |
| F | F | 0.001 |

John calls:

| A | P(J) |
| --- | --- |
| T | 0.90 |
| F | 0.05 |

Mary calls:

| A | P(M) |
| --- | --- |
| T | 0.70 |
| F | 0.01 |

Belief Network for the Alarm

The Semantics of Belief Networks

- The probability that the alarm sounded but neither a burglary nor an earthquake has occurred and both John and Mary call
- P(J ^ M ^ A ^ ~B ^ ~E) = P(J | A) P(M | A) P(A | ~B, ~E) P(~B) P(~E)

= 0.9 * 0.7 * 0.001 * 0.999 * 0.998 ≈ 0.00062

- More generally, we can write this as
- P(x1, ... xn) = πi P(xi | Parents(Xi))
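Using the CPTs from the tables above, the product-of-parents formula is easy to check in code; a minimal sketch (the dict encodings are illustrative):

```python
# CPTs for the alarm network (from the tables above).
P_B, P_E = 0.001, 0.002                                # P(B=true), P(E=true)
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}     # P(A=true | B, E)
P_J = {True: 0.90, False: 0.05}                        # P(J=true | A)
P_M = {True: 0.70, False: 0.01}                        # P(M=true | A)

# P(J, M, A, ~B, ~E) = P(J|A) P(M|A) P(A|~B,~E) P(~B) P(~E)
p = P_J[True] * P_M[True] * P_A[(False, False)] * (1 - P_B) * (1 - P_E)
print(p)  # 0.000628..., the ≈0.00062 computed above
```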

Constructing Belief Networks

- Choose the set of variables Xi that describe the domain
- Choose an ordering for the variables
- Ideally, work backward from observables to root causes
- While there are variables left:
- Pick a variable Xi and add it to the network
- Set Parents(Xi) to the minimal set of already-added nodes such that Xi is conditionally independent of the other earlier nodes given its parents
- Define the conditional probability table for Xi
- Once you’re done, it’s likely you’ll realize you need to fiddle a little bit!

Node Ordering

- The correct order to add nodes is
- Add the “root causes” first
- Then the variables they influence
- And so on…
- Alarm example: consider these non-causal orderings, which yield networks with more arcs and harder-to-assess probabilities:
- MaryCalls, JohnCalls, Alarm, Burglary, Earthquake
- MaryCalls, JohnCalls, Earthquake, Burglary, Alarm


Probabilistic Inference

- Diagnostic inference (from effects to causes)
- Given that JohnCalls, infer that P(B|J) = 0.016
- Causal inference (from causes to effects)
- Given Burglary, P(J|B) = 0.86 and P(M|B) = 0.67
- Intercausal inference (between causes of a common effect)
- Given Alarm, P(B|A) = 0.376
- If Earthquake is also true, P(B|A^E) = 0.003
- Mixed inference (combining two or more of the above)
- P(A|J ^ ~E) = 0.03
- P(B|J ^ ~E) = 0.017
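All four kinds of inference can be checked by brute-force enumeration over the full joint distribution. Here is a minimal sketch, exact but exponential in the number of variables, so only viable for tiny networks; the posteriors quoted above are reproduced up to small rounding differences:

```python
from itertools import product

# CPTs for the alarm network, as in the tables above.
P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}
P_M = {True: 0.70, False: 0.01}

def val(p_true, x):
    """P(X=x) given P(X=true)."""
    return p_true if x else 1.0 - p_true

def joint(b, e, a, j, m):
    """Full joint via P(x1, ..., xn) = prod_i P(xi | Parents(Xi))."""
    return (val(P_B, b) * val(P_E, e) * val(P_A[(b, e)], a)
            * val(P_J[a], j) * val(P_M[a], m))

def posterior(query, evidence):
    """P(query=true | evidence); evidence maps variable names to booleans."""
    names = ("b", "e", "a", "j", "m")
    num = den = 0.0
    for world in product((True, False), repeat=5):
        w = dict(zip(names, world))
        if any(w[k] != v for k, v in evidence.items()):
            continue  # world inconsistent with the evidence
        p = joint(*world)
        den += p
        if w[query]:
            num += p
    return num / den

print(posterior("b", {"j": True}))              # diagnostic:      ~0.016
print(posterior("j", {"b": True}))              # causal:          ~0.85
print(posterior("b", {"a": True}))              # intercausal:     ~0.37
print(posterior("b", {"a": True, "e": True}))   # explaining away: ~0.003
```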


Conditional Independence: D-separation

- a set of nodes E d-separates two sets of nodes X and Y if every undirected path from a node in X to a node in Y is blocked given E
- if X and Y are d-separated by E, then X and Y are conditionally independent given E

[Figure: the three ways a node Z on a path from X to Y can block the path given evidence E]

Conditional Independence

- An undirected path from X to Y is blocked given E if there is a node Z on the path such that:
- Z is in E and Z has one arrow leading in and one arrow leading out (a chain), or
- Z is in E and Z has both arrows leading out (a fork), or
- neither Z nor any descendant of Z is in E, and both path arrows lead into Z (a collider)
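The three blocking rules are mechanical enough to encode directly. A minimal sketch, assuming the path has already been classified at Z as a chain, fork, or collider (the function and argument names are illustrative):

```python
def blocks(pattern, z, evidence, descendants_of_z):
    """Does node z block an undirected path, given evidence set E?
    pattern: 'chain' (one arrow in, one out at z), 'fork' (both arrows
    leading out of z), or 'collider' (both path arrows leading into z)."""
    if pattern in ("chain", "fork"):
        return z in evidence                 # rules 1 and 2
    # Collider: blocked unless z or one of its descendants is observed.
    return z not in evidence and not (descendants_of_z & evidence)

# Burglary -> Alarm <- Earthquake: Alarm is a collider on this path, so the
# path is blocked a priori but becomes unblocked once a descendant is observed.
print(blocks("collider", "Alarm", set(), {"JohnCalls", "MaryCalls"}))           # True
print(blocks("collider", "Alarm", {"JohnCalls"}, {"JohnCalls", "MaryCalls"}))   # False
```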

An Inference Algorithm for Belief Networks

- In order to develop an algorithm, we will assume our networks are singly connected
- A network is singly connected if there is at most a single undirected path between any two nodes in the network
- note this means that any two nodes can be d-separated by removing a single node on the path between them
- These are also known as polytrees.
- We will then consider a generic node X with parents U1, ..., Um and children Y1, ..., Yn
- the parents of each child Yi, other than X, are Zij
- Evidence above X (connected through its parents) is Ex+; evidence below X (connected through its children) is Ex-

Inference in Belief Networks

- P(X | E) = P(X | Ex+, Ex-) = k P(Ex- | X, Ex+) P(X | Ex+) = k P(Ex- | X) P(X | Ex+)
- the last follows by noting that X d-separates its parents and children
- Now, we note that we can apply the product rule to the second term

P(X | Ex+) = Σu P(X | u, Ex+) P(u | Ex+) = Σu P(X | u) πi P(ui | EUi\X)

again, these last facts follow from conditional independence

- Note that we now have a recursive algorithm: the first factor in the sum is just a table lookup; each P(ui | EUi\X) is the same computation we started with, on a smaller part of the network.

Inference in Belief Networks

- P(X|E) = k P(Ex- | X) P(X | Ex+)
- The evaluation of the first factor, P(Ex- | X), is similar but more involved, yielding

P(Ex- | X) = k2 πi Σyi P(E-Yi | yi) Σzi P(yi | X, zi) πj P(zij | EZij\Yi)

- P(E-Yi | yi) is a recursive instance of P(Ex- | X)
- P(yi | X, zi) is a conditional probability table entry for Yi
- P(zij | EZij\Yi) is a recursive instance of the P(X|E) calculation

The Algorithm

```
Support-Except(X, V)  returns P(X | EX\V)
  if Evidence(X) then return the point distribution for X
  else
    calculate P(E-X\V | X) = Evidence-Except(X, V)
    U ← Parents(X)
    if U is empty
      then return k P(E-X\V | X) P(X), normalized
    else
      for each Ui in U
        calculate and store P(Ui | EUi\X) = Support-Except(Ui, X)
      return k P(E-X\V | X) Σu P(X | u) πi P(ui | EUi\X)
```

The Algorithm

```
Evidence-Except(X, V)  returns P(E-X\V | X)
  Y ← Children(X) − {V}
  if Y is empty
    then return a uniform distribution
  else
    for each Yi in Y do
      calculate P(E-Yi | yi) = Evidence-Except(Yi, null)
      Zi ← Parents(Yi) − {X}
      for each Zij in Zi
        calculate P(Zij | EZij\Yi) = Support-Except(Zij, Yi)
    return k2 πi Σyi P(E-Yi | yi) Σzi P(yi | X, zi) πj P(zij | EZij\Yi)
```

The Call

- To compute P(X | E) for a node X, call Support-Except(X, null)

PathFinder

- Diagnostic system for lymph node disease
- Pathfinder IV is a Bayesian model
- 8 hours devising the vocabulary
- 35 hours defining the topology
- 40 hours making 14,000 probability assessments
- the most recent version appears to outperform the experts who designed it!

Other Uncertainty Calculi

- Dempster-Shafer Theory
- Ignorance: there may be sets of outcomes to which no probability has been assigned
- In this case, the best you can do, in some cases, is bound the probability
- D-S theory is one way of doing this
- Fuzzy Logic
- Suppose we introduce a fuzzy membership function (a degree of membership in a set)
- Logical semantics are based on set membership
- Thus, we get a logic with degrees of truth
- e.g., “John is a big man” becomes bigman(John) with a truth value between 0 and 1
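For concreteness, a minimal sketch of such a membership function; the 170 cm / 190 cm thresholds are invented for illustration, not from the slides:

```python
def bigman(height_cm: float) -> float:
    """Degree of membership in the fuzzy set 'big man', from 0.0 to 1.0.
    Piecewise linear: 0 below 170 cm, 1 above 190 cm (illustrative cutoffs)."""
    return min(1.0, max(0.0, (height_cm - 170.0) / 20.0))

print(bigman(165))  # 0.0 -- not a big man at all
print(bigman(184))  # 0.7 -- "John is a big man" with truth value 0.7
print(bigman(195))  # 1.0 -- fully a big man
```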
