
Notes for CS3310 Artificial Intelligence Part 7: Reasoning under uncertainty



  1. Notes for CS3310 Artificial Intelligence Part 7: Reasoning under uncertainty • Prof. Neil C. Rowe, Naval Postgraduate School • Version of January 2006

  2. Rules with probabilities • Probability arguments in predicates indicate their degree of truth on a scale of 0 to 1. • a(0.7) :- b(1.0). ("a is true with probability 0.7 if b is true", or "a is true 70% of the time that b is true".) (0.7 is a "rule strength probability", or the conditional probability of "a" given "b"; mathematicians call this p(a|b).) • a(P) :- b(P). ("a is true with probability P if b is too.") (P in b(P) is an "evidence probability".) • a(0.7) :- b(P), P > 0.5. ("a is true with probability 0.7 if b is more than 50% certain.") (0.5 is an "inference probability threshold".) • a(P) :- b(P2), P is P2*0.5. ("a is true half the time when b is true.") (0.5 is another form of rule strength probability.) • a(P) :- b(P2), c(P3), P is P2*P3. ("a is true with probability the product of the probabilities of b and c.") (P is the "andcombine" of P2 and P3; the probability of "a" decreases with a decrease in either "b" or "c".)
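
For illustration, a minimal runnable sketch of these conventions (the facts for b and c and their probabilities are hypothetical):

  b(0.9).                              % evidence: b is 90% certain
  c(0.8).                              % evidence: c is 80% certain
  a(P) :- b(P2), c(P3), P is P2*P3.    % andcombine under independence

The query ?- a(P). then succeeds with P = 0.72 (that is, 0.9*0.8).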

  3. Probability examples for car repair • Many useful expert-system rules in the form "cause if effect" require probabilities. • Probabilities are always the last argument. • You wouldn't use all of the following in an expert system, just the one or two most correct. • battery(dead,0.6) :- ignition(wont_start,1.0). • battery(dead,P) :- voltmeter(battery_terminals,abnormal,P). • battery(dead,P) :- voltmeter(battery_terminals,abnormal,P), P > 0.1.

  4. More probability examples for car repair • battery(dead,P) :- electrical_problem(P2), P is P2*0.5. • battery(dead,P) :- age(battery,old,P2), P is P2*0.1. • battery(dead,P) :- electrical_problem(P2), age(battery,old,P3), P is P2*P3*0.57. • battery(dead,L) :- electrical_problem(P2), age(battery,old,P3), problem_yesterday(P4), expensive(battery,P5), L is (0.5*P2)+(0.1*P3)+(0.6*P4)-(0.1*P5). • L in the last rule is a "likelihood", not a probability: it can be more than 1.0. This last rule is like an artificial neuron.

  5. Combining rule strengths with evidence probabilities • Given: battery(dead,0.6) :- electrical_problem. • But suppose electrical_problem is itself uncertain. Maybe the lights don't work and the radio won't play in a car, but there could be other causes. • Let's say the probability of an electrical problem is 0.7. How do we combine that with 0.6? It should decrease the overall probability somehow. • Let "H" be "battery dead" (i.e., the "hypothesis"), and "E" be "electrical problem" (the "evidence"). • Then from probability theory: • p(H ∧ E) = p(H | E) p(E) • ("The probability of H and E is the product of the probability of H given E and the probability of E.")

  6. Combining probabilities, cont. • The left side is what we want to know: the probability that both the battery is dead and that we have an electrical problem. • The first thing on the right side is the conditional probability of H given E, or 0.6 in the example. • The last thing is the a priori probability of E, or 0.7 in the example. • So the answer is 0.42. And in general: • battery(dead,P) :- electrical_problem(P2), P is P2*0.6.
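
As a runnable sketch of this combination (the evidence fact is hypothetical):

  electrical_problem(0.7).    % hypothetical evidence probability
  battery(dead,P) :- electrical_problem(P2), P is P2*0.6.

The query ?- battery(dead,P). then gives P = 0.42, which is p(H ∧ E) = p(H|E) p(E) = 0.6 * 0.7.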

  7. Combining disjunctive evidence • total_fuse(State,P) :- bagof(X,fuse(State,X),XL), orcombine(XL,P). • fuse(blown,P) :- cord(frayed,P2), notworking(1.0), P is P2*0.3. • fuse(blown,P) :- sound(pop,T,P2), event(plug_in_device,T2), almost_simultaneous(T,T2,P3), andcombine([P2,P3],P4), P is P4*0.8. • cord(break_in_cord,P) :- cord(frayed,P2), notworking(1.0), P is P2*0.4. • almost_simultaneous(T1,T2,P) :- P is 1/(1+((T1-T2)*(T1-T2))).

  8. Notes on the previous page • almost_simultaneous is a "fuzzy" routine; it computes a probability that two times (measured in seconds) are close enough together to be "almost simultaneous" for an average person. • bagof collects the probabilities of rules that succeed in concluding that the fuse is in some state. It is built into Prolog. Its 3 arguments are a variable, a predicate expression containing the variable, and the list of possible bindings of that variable. • Use: • orcombine([P],P). • orcombine([P1|L],P) :- orcombine(L,P2), P is P1+P2-(P1*P2). • andcombine([P],P). • andcombine([P1|L],P) :- andcombine(L,P2), P is P1*P2.
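
Example queries using these definitions (results shown as comments):

  ?- andcombine([0.9,0.5],P).   % P = 0.45
  ?- orcombine([0.9,0.5],P).    % P = 0.95, i.e. 0.9 + 0.5 - 0.9*0.5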

  9. Examples of Prolog's built-in "bagof" • Given the database: • f(a,c). • f(a,e). • f(b,c). • f(d,e). • Then: • ?- bagof(X,f(X,c),L). • L=[a,b] • ("Make a list of all X such that f(X,c) succeeds.") • ?- bagof([X,Y],f(X,Y),L). • L=[[a,c],[a,e],[b,c],[d,e]] • ("Make a list of all X,Y pairs for which f(X,Y) succeeds.") • ?- bagof(X,f(X,Y),L). • L=[a,b] Y=c • L=[a,d] Y=e • ("Bind Y to something and make a list of the X values for which f(X,Y) succeeds.") • ?- bagof(X,Y^f(X,Y),L). • L=[a,a,b,d] • ("Find all X for which there exists a Y such that f(X,Y) succeeds." Bagof keeps duplicates; setof would remove them, giving [a,b,d].)

  10. Zero probabilities are not the same as negations • Suppose we want to add probabilities and a rule strength of 0.8 to the rule: • a if b and not c. • What if c has a probability of 0.001? Does that count as "not"? • If evidence has probabilities, we should avoid negating it; just invert the associated probability. • If b and c are independent, the example becomes: • a(P) :- b(P2), c(P3), P is P2*(1-P3)*0.8.
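
A quick worked check with hypothetical evidence facts:

  b(0.9).      % b is 90% certain
  c(0.001).    % c is almost certainly false
  a(P) :- b(P2), c(P3), P is P2*(1-P3)*0.8.

The query ?- a(P). gives P = 0.9 * 0.999 * 0.8 = 0.71928, nearly the full rule strength times the probability of b.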

  11. Practice in writing probability rules • (Use diagnosis and symptom predicates of 2 arguments.) • 1. "If it's definitely foggy today, it will be foggy tomorrow with probability 0.75." • 2. "If it's definitely humid and not unusually warm, it will be foggy tomorrow with probability 0.9." • 3. Rewrite (1) assuming fogginess today has a degree of likelihood (as when you're indoors).

  12. Practice in writing probability rules, cont. • 4. Rewrite (2) assuming that being humid and being unusually warm have degrees of likelihood and are independent. • 5. What is the probability that it will be foggy tomorrow, assuming that it is certainly foggy today, it is humid with probability 0.5, and it is unusually warm with probability 0.2? Assume all probabilities are independent.

  13. Probabilities from statistics on a population • Suppose 16 cases (cars) appear in the repair shop today. • Case 1 (B,S)  Case 2 (B,S)  Case 3  Case 4 • Case 5 (B,S)  Case 6 (B,S)  Case 7  Case 8 • Case 9 (B)  Case 10 (S)  Case 11 (S)  Case 12 (S) • Case 13  Case 14  Case 15 (S)  Case 16 • B = battery is dead, S = car won't start • Notice: p(B ∪ S) = p(B) + p(S) - p(B ∩ S) • since 9/16 = 5/16 + 8/16 - 4/16. • But the formula holds for any B and S.

  14. Classic probability combination formulae • Given probabilities "p" and "q" to combine. • These formulae are commutative and associative. With three to combine, apply the formula to any two, then combine that result with a third; etc. • Independence means the presence of one event does not change the probability of the other event. Conservative and liberal assumptions are appropriate if an event implies the presence or absence of another. • Andcombine: conservative max(0, p+q-1); independence p*q; liberal min(p,q). • Orcombine: conservative max(p,q); independence p+q-p*q; liberal min(1, p+q).
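
These six formulae as Prolog predicates (a sketch; the predicate names are illustrative):

  and_conservative(P,Q,R) :- R is max(0, P+Q-1).
  and_independent(P,Q,R)  :- R is P*Q.
  and_liberal(P,Q,R)      :- R is min(P,Q).
  or_conservative(P,Q,R)  :- R is max(P,Q).
  or_independent(P,Q,R)   :- R is P+Q-(P*Q).
  or_liberal(P,Q,R)       :- R is min(1, P+Q).

For three or more probabilities, fold any one of these over the list, as the recursive orcombine and andcombine did earlier.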

  15. Exercise with the combination methods • Given three pieces of evidence supporting the same conclusion with probabilities 0.8, 0.6, and 0.5 each. • Andcombine: • conservative: • independence: • liberal: • Orcombine: • conservative: • independence: • liberal: • Note always: conservative "and" ≤ independence "and" ≤ liberal "and" ≤ conservative "or" ≤ independence "or" ≤ liberal "or"

  16. Fuzziness • It means that some input is numeric instead of true/false. We must usually convert it to a probability. • Examples: speed of a missile S, for threat assessment; a patient's temperature T, for a medical expert system. • We could compute f(T) = | T - 98.6 |, and the larger this number is, the sicker the patient is. Problem: this can be > 1, so it isn't a probability, and can't be orcombined or andcombined. • We could compute g(T) = 1-(1/(1+((T-98.6)(T-98.6)))). This will be 0 when T = 98.6, and approaches 1 if T is very high or very low. But the steepness of the curve is not adjustable. • We could compute h(T) = 1-(1/(1+((T-98.6)(T-98.6)/K))), where K can be adjusted. • We could compute i(T) = 1 - exp(-(T-98.6)(T-98.6)/K), where "exp(x)" means e to the x power. This uses the normal distribution (hence has sound theory behind it), and is adjustable. • There are also ways to handle fuzziness without converting to a probability, such as fuzzy set theory.
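
The last conversion as a Prolog predicate (a sketch; the value K = 10 is an arbitrary choice for illustration):

  % Probability that a patient with temperature T is sick,
  % using i(T) = 1 - exp(-(T-98.6)^2/K).
  sick(T,P) :- K = 10, P is 1 - exp(-((T-98.6)*(T-98.6))/K).

For example, ?- sick(98.6,P). gives P = 0.0, and ?- sick(101.6,P). gives P of about 0.59.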

  17. Bayes' Rule for uncertainty handling • Let H be some hypothesis or conclusion; let E be some collection of evidence. The laws of probability give the following theorem: • p(H | E) = p(E | H) p(H) / p(E) • (p(H|E) means the probability of H given E.) This allows us to reason "backwards" from evidence to causes (or "hypotheses"); the real world moves from causes to evidence. • If E = E1 ∧ E2 ∧ ..., the needed probabilities are harder to get. But then we may be able to assume independence of some of the factors, and multiply them. This idea is used in "Bayesian networks", which illustrate which factors affect which others.

  18. Examples of Bayes' Rule to get rule strengths • E1 (evidence 1) is car won't start; E2 is radio won't play; E3 is headlights don't shine; E = E1 ∧ E2 ∧ E3; H (hypothesis) is battery is dead. • Assume p(E1) = 0.05, p(E2) = 0.04, p(E3) = 0.08, and p(H) = 0.03. Then by Bayes' Rule, assuming dead battery implies all the evidence: p(H | E1) = p(E1 | H) p(H) / p(E1) = 1*0.03/0.05 = 0.6; p(H | E2) = 1*0.03/0.04 = 0.75; p(H | E3) = 1*0.03/0.08 = 0.375. • Now suppose all three pieces of evidence are present: • --By conservative assumption: max(0.6,0.75) = 0.75, max(0.75,0.375) = 0.75 • --By liberal assumption: min(1,0.6+0.75) = 1, min(1,1+0.375) = 1 • --By independence assumption: p(H | E1 ∧ E2 ∧ E3) = 1 - (1-0.6)(1-0.75)(1-0.375) = 0.9375.
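
These inversions as a Prolog sketch (the prior/2 facts are hypothetical restatements of the numbers above, and p(E|H) = 1 is assumed as on the slide):

  prior(e1,0.05).  prior(e2,0.04).  prior(e3,0.08).
  prior(h,0.03).
  % p(H|E) = p(E|H) p(H) / p(E), with p(E|H) = 1 here
  bayes(E,PHgE) :- prior(E,PE), prior(h,PH), PHgE is 1.0*PH/PE.

Then ?- bayes(e1,P). gives P = 0.6, and similarly 0.75 and 0.375 for e2 and e3.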

  19. Naïve Bayes reasoning • Suppose E1 and E2 are two pieces of evidence for hypothesis H. Then by Bayes' Rule: • p(H | (E1 ∧ E2)) = p((E1 ∧ E2) | H) p(H) / p(E1 ∧ E2) • If we assume E1 and E2 are "conditionally independent" of one another with respect to H: • p(H | (E1 ∧ E2)) = p(E1|H) p(E2|H) p(H) / p(E1 ∧ E2) • Use Bayes' Rule twice, and this is equal to: • p(H|E1) p(E1) p(H|E2) p(E2) p(H) / (p(E1 ∧ E2) p(H) p(H)) • Also, if E1 and E2 are conditionally independent with respect to ~H: • p(~H | (E1 ∧ E2)) = p(E1|~H) p(E2|~H) p(~H) / p(E1 ∧ E2) • = p(~H|E1) p(E1) p(~H|E2) p(E2) p(~H) / (p(E1 ∧ E2) p(~H) p(~H)) • Setting the ratio of left sides equal to the ratio of right sides, the p(E1), p(E2), and p(E1 ∧ E2) cancel out and we have: • p(H | (E1 ∧ E2)) / p(~H | (E1 ∧ E2)) = [p(H|E1)p(~H)/(p(~H|E1)p(H))] * [p(H|E2)p(~H)/(p(~H|E2)p(H))] * [p(H)/p(~H)]

  20. Naïve Bayes reasoning, cont. • Define odds as o(X) = p(X)/p(~X) = p(X)/(1-p(X)). • Then p(X) = o(X)/(1+o(X)). • Then the equation becomes: • o(H | (E1 ∧ E2)) = [o(H | E1) / o(H)] * [o(H | E2) / o(H)] * o(H) • This is the odds form of "Naïve Bayes inference". • With more than two pieces of evidence: • o(H | (E1 ∧ E2 ∧ ... ∧ En)) = [o(H | E1) / o(H)] * [o(H | E2) / o(H)] * ... * [o(H | En) / o(H)] * o(H) • So positive evidence increases odds and negative evidence decreases odds. • To use: convert probabilities to odds; apply the above formula; convert odds back to probabilities.
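
The odds-form computation as a Prolog sketch (predicate names are illustrative; the list holds the conditional probabilities p(H|Ei)):

  odds(P,O) :- O is P/(1-P).
  prob(O,P) :- P is O/(1+O).
  nb(PH,[],O) :- odds(PH,O).              % no evidence: just the prior odds
  nb(PH,[PHgE|Rest],O) :-
      nb(PH,Rest,O2), odds(PH,OH), odds(PHgE,OE),
      O is O2*OE/OH.                      % each evidence multiplies by o(H|Ei)/o(H)
  naive_bayes(PH,Evidence,P) :- nb(PH,Evidence,O), prob(O,P).

For example, ?- naive_bayes(0.03,[0.6,0.75,0.375],P). combines the three conditional probabilities from the Bayes' Rule example above, giving P of about 0.9996.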

  21. Bayesian reasoning for air defense • A ship would like to assign probabilities of hostility to objects observed on radar. • Factors that can be used: speed, altitude, use of abrupt turns, whether in an airlane, source airport, and apparent destination. • Keep statistics on objects observed in some area of the world, and correlate these to the eventual identities discovered for the objects. Use these to derive odds of hostility. • Odds for each factor can be learned from experience, though an adversary could try to fool you.

  22. Bayesian reasoning for naval air defense

  23. Stochastic grammar rules to generate behavior • Another way to use probabilities is to use them to generate behavior. For instance, attach them to rules of a "context-free grammar" to generate random strings, like random error messages. (The probabilities of the alternatives for each nonterminal should sum to 1.) • Prob. 0.4: msg :- write('Fatal error at '), number, write(': '), fault. • Prob. 0.6: msg :- write('Error in '), number, write(': '), fault. • Prob. 0.5: number :- digit, digit, digit, digit, digit. • Prob. 0.5: number :- digit, digit, digit, digit. • Prob. 0.1: digit :- write('0'). • Prob. 0.1: digit :- write('1'). • .... • Prob. 0.5: fault :- write('Segmentation fault'). • Prob. 0.3: fault :- write('Protection violation'). • Prob. 0.2: fault :- write('File not found').
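
One way to execute such annotated rules is to draw a random number and pick an alternative by cumulative probability. A sketch for the fault nonterminal (random/1, which binds a random float between 0 and 1, is assumed as in SWI-Prolog):

  fault :- random(X),
           ( X < 0.5 -> write('Segmentation fault')
           ; X < 0.8 -> write('Protection violation')
           ; write('File not found') ).

The thresholds 0.5 and 0.8 are the cumulative probabilities 0.5 and 0.5+0.3.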

  24. Artificial neural net example • Suppose you want to classify shapes in photographs. • Construct something like an inference network (and-or-not graph), but where probabilities are combined rather than boolean operations computed: a neural network. • For China Lake photographs, shapes in the image can be "sky", "dirt", "aircraft", and "person". Useful evidence is "blueness", "redness", "has many right angles", and "color uniformity". Good initial weights must be estimated by intuition.

  25. Example artificial neural network • [Diagram with an equation relating outputs to inputs: the inputs blueness, redness, # right angles, and uniformity feed neurons numbered 1-4; an intermediate "manmade-ness" node; outputs sky, dirt, aircraft, and person.]

  26. Neural nets • Like inference networks, but probabilities are computed instead of just "true" or "false". Two alternatives: • (1) The probabilities are expressed digitally. Then "and" and "or" gates are special-purpose integrated circuits computing the formulae. • (2) The probabilities are analog voltages, and the gates are analog integrated circuits. • Neural nets can "learn" by adjusting rule-strength probabilities to improve performance. • However, there are many ways an AI system can learn by itself besides using a neural net: caching, indexing, reorganizing, and generalizing. Neural nets are not the only way to make an AI system learn.

  27. The artificial neuron • The most common way is a device or program that computes f = g(w1*h(x1) + w2*h(x2) + ... + wn*h(xn)). • The x's are inputs; f is the output probability; and the w's are adjustable constants ("weights"). The g and the h represent some nonlinear monotonic function, often 1/(1+exp(-x)) or (exp(x)-exp(-x))/(exp(x)+exp(-x)), or 1 minus these (the second is called the "hyperbolic tangent function"). • Increasing inputs for g should mean increasing outputs, but as the inputs get large, the increase in f slows down. This is like neurons in the brain. It helps prevent information overload. • This is also like a liberal orcombine, but with included rule strengths on each input.
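
A sketch of such a neuron in Prolog, with the sigmoid 1/(1+exp(-x)) as g and no input function h (predicate names are illustrative):

  % Output F of a neuron with the given inputs and weights.
  neuron(Inputs,Weights,F) :-
      weighted_sum(Inputs,Weights,S),
      F is 1/(1 + exp(-S)).
  weighted_sum([],[],0).
  weighted_sum([X|Xs],[W|Ws],S) :-
      weighted_sum(Xs,Ws,S2), S is S2 + W*X.

For example, ?- neuron([1.0,0.5],[2.0,-1.0],F). computes g(2.0*1.0 - 1.0*0.5) = g(1.5), about 0.82.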

  28. General 3-input 4-output 2-layer artificial neural network • [Diagram: Input 1, Input 2, and Input 3 feed a 2-layer network producing four outputs.]

  29. More about artificial neural networks • You can have multiple levels (“layers”), with neuron outputs as inputs of other neurons. • If your “g” function is linear, the artificial neurons are “perceptrons”. • Inputs can be booleans (represented as 0 or 1), weighted and combined just like probabilities. • You have one output neuron for every final conclusion. • Output of a neuron can be compared to a threshold; then you get a boolean, and can use logical reasoning from then on.

  30. Backward propagation • Most neural networks are multilayer. • "Backward propagation" or "backpropagation" is the most popular way that these networks learn. • It works by estimating the partial derivative of an incorrect output value with respect to each weight, then uses that to determine how much to change each weight. • It assumes: • The weight connecting concept i at layer j to concept k at layer j+1 is the same as the weight connecting concept k back to i. • This is much like assuming p(A|B) = p(B|A), which is rarely true. Nonetheless, it often works!
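
For a single sigmoid neuron the idea reduces to the "delta rule". A sketch of one weight update (Eta is the learning rate; the derivative (F-T)*F*(1-F)*X comes from the error (1/2)(F-T)^2, with F the actual and T the target output):

  % One gradient-descent step on weight W for input X.
  update_weight(W,F,T,X,Eta,W2) :-
      W2 is W - Eta*(F-T)*F*(1-F)*X.

Backpropagation applies the same idea layer by layer, passing error derivatives backward through the network.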

  31. Short practice questions on list processing • What is the first answer of a Prolog interpreter to each of the following queries, assuming the definitions of list-processing predicates given in the notes? • ?- member(foo,[bar,foo,baz]). • ?- member('foo',[bar,foo,baz]). • ?- member(Foo,[bar,foo,baz]). • ?- member('Foo',[bar,foo,baz]). • ?- member(foo,[[bar,foo]]). • ?- member([X,Y],[[bar,foo]]). • ?- delete(bar,[foo,bar,bag,bar,bag],Ans). • ?- delete(bar,[[foo,bar],[bag,bar,bag]],Ans). • ?- delete(bar,[foo,bar],[foo]). • ?- length([foo,bar,baz],Count). • ?- length([foo,bar],3). • ?- first([foo,bar,baz],First). • ?- last([foo,bar,baz],Last). • ?- append([foo,bar],[foo,bar],X). • ?- append([foo,bar],X,[foo,bar,baz,bag]). • ?- append(X,Y,[foo,bar,baz,bag]). • ?- append(X,X,[foo,bar,baz,bag]). • ?- append([X],[Y],[foo,bar,baz,bag]). • ?- append([X|Y],Z,[foo,bar,baz]). • ?- append([X,Y],Z,[foo,bar,baz]). • ?- append(X,[bar|Y],[foo,bar,baz]).

  32. Review on rule-cycle hybrid chaining • a :- v, t. • a :- b, u, not(t). • m(X) :- n(X), b. • b :- c. • t :- r, s. • u :- v, r. • r. • v. • c. • n(12).

  33. Practice question on probabilities • Given: • 1. 10 times out of 50, when a patient complains of chest pains, it's a heart problem. • 2. 35 times out of 50, when a patient complains of chest pains, it's indigestion. • 3. 6 times out of 10, if a patient has a family history of heart problems, then they have a heart problem. • 4. Tom complains of chest pains and has a family history of heart problems. • 5. 1 in 10 patients has a heart problem.

  34. Practice question on probabilities, cont. • (a) Write the above information as rules and facts. • (b) What is the probability that Tom has a heart problem, using the independence assumption for combining probabilities? • (c) Write a rule for the probability someone has a heart problem given they both have chest pains and a family history; again assume independence. • (d) Rewrite the last rule assuming chest pains and family history have a degree of uncertainty and they are independent of one another.

  35. Practice question on probabilities, cont. (2) • (e) Using that last rule, what is the probability that Tom has a heart problem if he thinks with 0.7 probability he has chest pains, and experts would say with 0.8 probability that he has a family history of heart problems? • (f) Using the Naïve Bayes odds formula for the rules and facts without evidence uncertainty, what is the probability that Tom has a heart problem? • (g) Using a perceptron neuron with equal weights of 1 on the two factors of chest pains and family history, no input nonlinear function, and x*x/(1+(x*x)) as the nonlinear output function, what is the probability that Tom has a heart problem with the evidence probabilities in (e)?

  36. List-defining practice questions • 1. Using append, define a Prolog function predicate third_last(List,Item) that returns the third item from the end of a list. • 2. Define in Prolog a function predicate allpairs that returns a list of all the pairs of items that occur in a list. The first item in each pair must have occurred before the second item of the pair in the original list argument. • 3. A Prolog rule to find the middle N items (that is, N items exactly in the center) of a list L is: • (i) midn(L,N,M) :- append(L1,L2,L), append(L3,M,L1), length(M,N), length(L2,Q), length(L3,Q). • (ii) midn(L,N,M) :- append(L1,L2,L), append(M,L3,L1), length(M,N), length(L2,Q), length(L3,Q). • (iii) midn(L,N,M) :- append(L1,L2,L), append(L3,M,L1), length(M,Q), length(L2,Q), length(L3,N). • (iv) midn(L,N,M) :- append(L1,L2,L), append(M,L3,L1), length(M,Q), length(L2,Q), length(L3,N).

  37. Another probability practice question • Suppose that previous experience says that in general: • There's a 50% chance of a bug in 100 lines of Ada code; • There's a 60% chance of a bug in 100 lines of C++ code; • Tom has a 30% chance of finding a bug in 100 lines if there is one; • Dick has a 40% chance of finding a bug in 100 lines if there is one. • Suppose a program contains 100 lines of C++ code and 100 lines of Ada code. Tom debugs the C++ and Dick debugs the Ada. What is the probability a bug will be found, assuming probabilities are independent? Show your arithmetic.
