
Intro to AI Uncertainty

Ruth Bergman

Fall 2002

Why Not Use Logic?
  • Suppose I want to write down rules about medical diagnosis:

Diagnostic rules: ∀x has(x, sorethroat) ⇒ has(x, cold)

Causal rules: ∀x has(x, cold) ⇒ has(x, sorethroat)

  • Clearly, this isn’t right:

Diagnostic case:

      • we may not know exactly which collections of symptoms or tests allow us to infer a diagnosis (qualification problem)
      • even if we did we may not have that information
      • even if we do, how do we know it is correct?

Causal rules:

      • Symptoms are not guaranteed to appear; note that the logical formulation would rely on the contrapositive
      • There are lots of causes for symptoms; if we miss one we might get an incorrect inference
      • How do we reason backwards?
Uncertainty
  • The problem with pure FOL is that it deals only in black and white
  • The world isn’t black and white, because of uncertainty:
    • Uncertainty due to imprecision or noise
    • Uncertainty because we don’t know everything about the domain
    • Uncertainty because in practice we often cannot acquire all the information we’d like.
  • As a result, we’d like to assign a degree of belief (or plausibility or possibility) to any statement we make
    • note this is different than a degree of truth!
Ways of Handling Uncertainty
  • MYCIN: operationalize uncertainty with rules:
      • a ⇒ b with certainty 0.7
      • we know a with certainty 1
    • ergo, we know b with certainty 0.7
    • but if we also know
      • a ⇒ c with certainty 0.6
      • b v c ⇒ d with certainty 1
    • do we know d with certainty .7, .6, .88, 1, ...?
      • suppose a ⇒ ~e and ~e ⇒ ~d ...
    • In a rule-based system, such non-local dependencies are hard to catch
Probability
  • Problems such as this have led people to invent lots of calculi for uncertainty; probability still dominates
  • Basic idea:
    • I have some degree of belief, DoB (a prior probability), about some proposition p
    • I receive evidence about p; the evidence is related to p by a conditional probability
    • From these two quantities, I can compute an updated DoB about p --- a posterior probability
Probability Review
  • Basic probability is on propositions or propositional statements:
    • P(A) (A is a proposition)
      • P(Accident), P(phonecall), P(Cold)
    • P(X = v) (X is a random variable; v a value)
      • P(card = JackofClubs), P(weather=sunny), ....
    • P(A v B), P(A ^ B), P(~A) ...
    • Referred to as the prior or unconditional probability
  • The conditional probability of A given B

P(A | B) = P(A,B)/P(B)

    • the product rule P(A,B) = P(A | B) * P(B)
  • Independence: P(A | B) = P(A)
    • A is independent of B
Probability Review
  • The joint distribution of A and B
    • P(A,B) = x (equivalent to P(A ^ B) = x)

          B     ~B
A=1      0.1   0.2
A=0      0.3   0.4

P(A=1, B) = 0.1

P(A=1) = 0.1 + 0.2 = 0.3

P(A=1 | B) = 0.1/0.4 = 0.25
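These lookups can be sketched in a few lines of Python. The A=1 row comes from the slide; the A=0 row is an illustrative assumption chosen so that P(B) = 0.4 and the table sums to 1:

```python
# Joint distribution over two binary variables A and B.
# The (1, *) entries are from the slide; the (0, *) entries are
# assumed values consistent with P(B) = 0.4 and a total of 1.
joint = {
    (1, 1): 0.1,  # A=1, B
    (1, 0): 0.2,  # A=1, ~B
    (0, 1): 0.3,  # A=0, B  (assumed)
    (0, 0): 0.4,  # A=0, ~B (assumed)
}

def p_a(value):
    """Marginal P(A = value): sum the joint over B."""
    return sum(p for (a, _), p in joint.items() if a == value)

def p_b(value):
    """Marginal P(B = value): sum the joint over A."""
    return sum(p for (_, b), p in joint.items() if b == value)

def p_a_given_b(a, b):
    """Conditional P(A = a | B = b) = P(A, B) / P(B)."""
    return joint[(a, b)] / p_b(b)

print(round(p_a(1), 2))             # 0.3
print(round(p_a_given_b(1, 1), 2))  # 0.25
```

Marginalizing and then dividing is exactly the definition P(A | B) = P(A,B)/P(B) from the previous slide.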

Bayes Theorem
  • P(A,B) = P(A | B) P(B) = P(B | A) P(A)

P(A|B) = P(B | A) P(A) / P(B)

  • Example: what is the probability of meningitis when a patient has a stiff neck?

P(S|M) = 0.5

P(M) = 1/50000

P(S) = 1/20

P(M|S) = P(S|M)P(M)/P(S) = (0.5 × 1/50000) / (1/20) = 0.0002

  • More generally, conditioning on background evidence E:

P(A | B, E) = P(B | A, E) P(A | E) / P(B | E)
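The meningitis calculation is a one-liner once Bayes' rule is written as a function (a minimal sketch; the argument names are ours, not the slide's):

```python
# Bayes' theorem: posterior P(A|B) from the likelihood P(B|A),
# the prior P(A), and the evidence probability P(B).
def posterior(likelihood, prior, evidence):
    return likelihood * prior / evidence

# Meningitis example: P(S|M) = 0.5, P(M) = 1/50000, P(S) = 1/20.
p_m_given_s = posterior(likelihood=0.5, prior=1/50000, evidence=1/20)
print(p_m_given_s)  # approximately 0.0002
```

Note how small the posterior stays despite the high likelihood: the tiny prior P(M) dominates.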

Alarm System Example
  • A burglary alarm system is fairly reliable at detecting burglary
  • It may also respond to minor earthquakes
  • Neighbors John and Mary will call when they hear the alarm
  • John always calls when he hears the alarm
  • He sometimes confuses the telephone with the alarm and calls
  • Mary sometimes misses the alarm
  • Given the evidence of who has or has not called, we would like to estimate the probability of a burglary.
Alarm System Example
  • P(Alarm|Burglary) A burglary alarm system is fairly reliable at detecting burglary
  • P(Alarm|Earthquake) It may also respond to minor earthquakes
  • P(JohnCalls|Alarm), P(MaryCalls|Alarm) Neighbors John and Mary will call when they hear the alarm
  • John always calls when he hears the alarm
  • P(JohnCalls|~Alarm) He sometimes confuses the telephone with the alarm and calls
  • Mary sometimes misses the alarm
  • Given the evidence of who has or has not called, we would like to estimate the probability of a burglary. P(Burglary|JohnCalls,MaryCalls)
Influence Diagrams

[Diagram: Burglary and Earthquake each point to Alarm; Alarm points to John calls and Mary calls]
  • Another way to present this information is an influence diagram
Influence Diagrams

[Diagram: Burglary/Earthquake → Alarm → John calls / Mary calls]
  • A set of random variables.
  • A set of directed arcs
    • An arc from X to Y means that X has influence on Y.
  • Each node has an associated conditional probability table.
  • The graph has no directed cycle.
Conditional Probability Tables

[Diagram: Burglary/Earthquake → Alarm → John calls / Mary calls]
  • Each row contains the conditional probability for a possible combination of values of the parent nodes
  • Each row must sum to 1
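The row constraint is easy to check mechanically. A quick sketch, using the alarm CPT that appears on the next slide (the dict layout is our own choice):

```python
# A CPT stored as {parent-value combination: row of probabilities}.
# Rows here are [P(A | parents), P(~A | parents)] for the alarm node,
# with the numbers from the alarm belief network.
alarm_cpt = {
    (True, True):   [0.95, 0.05],
    (True, False):  [0.94, 0.06],
    (False, True):  [0.29, 0.71],
    (False, False): [0.001, 0.999],
}

def rows_sum_to_one(cpt, tol=1e-9):
    """Return True iff every row of the CPT sums to 1 (within tol)."""
    return all(abs(sum(row) - 1.0) <= tol for row in cpt.values())

print(rows_sum_to_one(alarm_cpt))  # True
```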
Belief Network for the Alarm

[Diagram: Burglary and Earthquake point to Alarm; Alarm points to John calls and Mary calls, with each node annotated by the tables below]

P(B) = 0.001        P(E) = 0.002

B E | P(A)
T T | 0.95
T F | 0.94
F T | 0.29
F F | 0.001

A | P(J)        A | P(M)
T | 0.90        T | 0.70
F | 0.05        F | 0.01
The Semantics of Belief Networks
  • The probability that the alarm sounded but neither a burglary nor an earthquake has occurred and both John and Mary call
    • P(J ^ M ^ A ^ ~B ^ ~E) = P(J | A) P(M | A) P(A | ~B ^ ~E) P(~B) P(~E) =

0.9 * 0.7 * 0.001 * 0.999* 0.998 = 0.00062

  • More generally, we can write this as
    • P(x1, ..., xn) = Πi P(xi | Parents(Xi))
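Plugging the alarm network's numbers into this factorization reproduces the calculation above (a quick sketch; variable names are ours):

```python
# Chain-rule factorization for the event J ^ M ^ A ^ ~B ^ ~E,
# using the alarm network's tables.
p_j_given_a = 0.90        # P(J | A)
p_m_given_a = 0.70        # P(M | A)
p_a_given_nb_ne = 0.001   # P(A | ~B, ~E)
p_not_b = 1 - 0.001       # P(~B)
p_not_e = 1 - 0.002       # P(~E)

p = p_j_given_a * p_m_given_a * p_a_given_nb_ne * p_not_b * p_not_e
print(round(p, 5))  # 0.00063 (the slide truncates to 0.00062)
```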
Constructing Belief Networks
  • Choose the set of variables Xi that describe the domain
  • Choose an ordering for the variables
    • Ideally, work backward from observables to root causes
  • While there are variables left:
    • Pick a variable Xi and add it to the network
    • Set Parents{Xi} to the minimal set of nodes such that conditional independence holds
    • Define the conditional probability table for Xi
  • Once you’re done, it’s likely you’ll realize you need to fiddle a little bit!
Node Ordering
  • The correct order to add nodes is
    • Add the “root causes” first
    • Then the variables they influence
    • And so on…
  • Alarm example: consider the ordering
    • MaryCalls, JohnCalls, Alarm, Burglary, Earthquake
    • MaryCalls, JohnCalls, Earthquake, Burglary, Alarm

[Diagram: the five alarm-network nodes (John calls, Mary calls, Earthquake, Burglary, Alarm) redrawn under the orderings above]

Probabilistic Inference
  • Diagnostic inference (from effects to causes)
    • Given that JohnCalls, infer that P(B|J) = 0.016
  • Causal inference (from causes to effects)
    • Given Burglary, P(J|B) = 0.86 and P(M|B) = 0.67
  • Intercausal inference (between causes of a common effect)
    • Given Alarm, P(B|A) = 0.376
    • If Earthquake is also true, P(B|A^E) = 0.003
  • Mixed inference (combining two or more of the above)
    • P(A|J ^ ~E) = 0.03
    • P(B|J ^ ~E) = 0.017
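These queries can be reproduced by brute-force enumeration over the joint distribution. This is only a sketch (not the polytree algorithm developed on the later slides), using the network's published tables:

```python
from itertools import product

# Alarm network parameters.
P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}   # P(JohnCalls=true | Alarm)
P_M = {True: 0.70, False: 0.01}   # P(MaryCalls=true | Alarm)

def joint(b, e, a, j, m):
    """P(b, e, a, j, m) via the chain-rule factorization."""
    p = (P_B if b else 1 - P_B) * (P_E if e else 1 - P_E)
    p *= P_A[(b, e)] if a else 1 - P_A[(b, e)]
    p *= P_J[a] if j else 1 - P_J[a]
    p *= P_M[a] if m else 1 - P_M[a]
    return p

def prob_b_given_j():
    """Diagnostic query P(Burglary | JohnCalls=true) by enumeration."""
    num = sum(joint(True, e, a, True, m)
              for e, a, m in product([True, False], repeat=3))
    den = sum(joint(b, e, a, True, m)
              for b, e, a, m in product([True, False], repeat=4))
    return num / den

print(round(prob_b_given_j(), 3))  # 0.016
```

Enumeration is exponential in the number of hidden variables, which is exactly why the polytree algorithm below is worth developing.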
Conditional Independence

D-separation

[Diagram: node sets X and Y, an evidence set E, and candidate blocking nodes Z on the paths between them]
  • a set of nodes E d-separates two sets of nodes X and Y if every undirected path from a node in X to a node in Y is blocked given E
  • if every undirected path from X to Y is d-separated by E, then X and Y are conditionally independent given E
Conditional Independence

[Diagram: the three path configurations through a node Z between X and Y]
  • An undirected path from X to Y is blocked given E if there is a node Z s.t.
    • Z is in E and there is one arrow leading in and one arrow leading out
    • Z is in E and Z has both arrows leading out
    • Neither Z nor any descendant of Z is in E and both path arrows lead into Z
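The three blocking rules can be written as a small predicate. The `kind` labels ("chain", "fork", "collider") are our shorthand for the three arrow configurations, not terminology from the slides:

```python
# Does a node Z on an undirected path block the path given evidence E?
# kind: "chain"    -> one arrow leading in, one leading out of Z
#       "fork"     -> both arrows leading out of Z
#       "collider" -> both arrows leading into Z
def blocks(kind, z_in_e, z_or_descendant_in_e):
    """Return True if Z blocks the path, per the three rules above."""
    if kind in ("chain", "fork"):
        # Rules 1 and 2: Z itself must be in the evidence set E.
        return z_in_e
    if kind == "collider":
        # Rule 3: neither Z nor any descendant of Z is in E.
        return not z_or_descendant_in_e
    raise ValueError(f"unknown configuration: {kind}")

print(blocks("chain", z_in_e=True, z_or_descendant_in_e=True))       # True
print(blocks("collider", z_in_e=False, z_or_descendant_in_e=False))  # True
```

A path is blocked if any node on it blocks it; X and Y are d-separated by E when every path between them is blocked.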
An Inference Algorithm for Belief Networks
  • In order to develop an algorithm, we will assume our networks are singly connected
    • A network is singly connected if there is at most one undirected path between any two nodes in the network
      • note this means that any two nodes can be d-separated by removing a single node
    • These are also known as polytrees.
  • We will then consider a generic node X with parents U1 ... Um and children Y1 ... Yn.
    • the parents of Yi are Zij
    • evidence above X is Ex+; below X it is Ex-
Singly Connected Network

[Diagram: evidence Ex+ above X, reaching it through parents U1 ... Um; evidence Ex- below X, reaching it through children Yi and their other parents Zij]

Inference in Belief Networks
  • P(X | Ex) = P(X | Ex+, Ex-) = k P(Ex- | X, Ex+) P(X | Ex+) = k P(Ex- | X) P(X | Ex+)
    • the last follows by noting that X d-separates its parents and children
  • Now, we note that we can apply the product rule to the second term

P(X | Ex+) = Σu P(X | u, Ex+) P(u | Ex+) = Σu P(X | u) Πi P(ui | EUi\X)

again, these last facts follow from conditional independence

  • Note that we now have a recursive algorithm: the first term in the sum is just a table lookup; the second is what we started with on a smaller set of nodes.


Inference in Belief Networks
  • P(X|E) = k P(Ex- | X) P(X | Ex+)
  • The evaluation of the first factor, P(Ex- | X), is similar but more involved, yielding

P(Ex- | X) = k2 Πi Σyi P(E-Yi | yi) Σzi P(yi | X, zi) Πj P(zij | EZij\Yi)

  • P(E-Yi | yi) is a recursive instance of P(Ex- | X)
  • P(yi | X, zi ) is a conditional probability table entry for Yi
  • P(zij | EZij/Yi) is a recursive instance of the P(X|E) calculation
The Algorithm

Support-Except(X, V) returns P(X | Ex\V)

  if Evidence(X) then return the point distribution for X
  else
    calculate P(E-X\V | X) = Evidence-Except(X, V)
    U ← Parents(X)
    if U is empty
      then return normalized P(E-X\V | X) P(X)
    else
      for each Ui in U
        calculate and store P(Ui | EUi\X) = Support-Except(Ui, X)
      return k P(E-X\V | X) Σu P(X | u) Πi P(ui | EUi\X)

The Algorithm

Evidence-Except(X, V) returns P(E-X\V | X)

  Y ← Children(X) − V
  if Y is empty
    then return a uniform distribution
  else
    for each Yi in Y do
      calculate P(E-Yi | yi) = Evidence-Except(Yi, null)
      Zi ← Parents(Yi) − X
      for each Zij in Zi
        calculate P(Zij | EZij\Yi) = Support-Except(Zij, Yi)
    return k2 Πi Σyi P(E-Yi | yi) Σzi P(yi | X, zi) Πj P(zij | EZij\Yi)

The Call
  • For a node X, call Support-Except(X,null)
PathFinder
  • Diagnostic system for lymph node disease
  • Pathfinder IV is a Bayesian model
    • 8 hrs devising vocabulary
    • 35 hrs defining topology
    • 40 hrs to make 14000 probability assessments
    • most recent version appears to outperform the experts who designed it!
Other Uncertainty Calculi
  • Dempster-Shafer Theory
    • Ignorance: there are sets which have no probability
    • In this case, the best you can do, in some cases, is bound the probability
    • D-S theory is one way of doing this
  • Fuzzy Logic
    • Suppose we introduce a fuzzy membership function (a degree of membership)
    • Logical semantics are based on set membership
    • Thus, we get a logic with degrees of truth
      • e.g. John is a big man ⇒ bigman(John) with truth value 0.