
Computer Science CPSC 322 Lecture 28 Uncertainty and Probability (Ch. 6.1, 6.1.1, 6.1.3)

Presentation Transcript


  1. Computer Science CPSC 322 Lecture 28 Uncertainty and Probability (Ch. 6.1, 6.1.1, 6.1.3)

  2. Lecture Overview • Recap of Lecture 27 • Logic Wrap-Up • Intro to Reasoning Under Uncertainty • Motivation • Introduction to Probability • Random Variables and Possible World Semantics • Probability Distributions • Marginalization (time permitting)

  3. Datalog • An extension of propositional definite clause (PDC) logic • We now have variables • We now have relationships between variables • We can express knowledge that holds for a set of individuals, writing more powerful clauses by introducing variables, such as: live(W) ← wire(W) ∧ connected_to(W,W1) ∧ wire(W1) ∧ live(W1). • We can ask generic queries, e.g. “which wires are connected to w1?”: ?connected_to(W, w1)

  4. Summary of Datalog Syntax • Term ti: a variable (e.g. X, L1) or a constant (e.g. alan, w1) • Atom: an atomic proposition p (e.g. sunny, liveOutside) or p(t1, …, tn) (e.g. in(alan, r123)) • Definite clause: a ← b1 ∧ … ∧ bn, where a and b1 … bn are atoms (e.g. in(X,Z) ← in(X,Y) ∧ part_of(Y,Z)) • Query: ?body

  5. Datalog Semantics • The role of semantics is still to connect symbols and sentences in the language with the target domain. Main difference: • we need to create a correspondence both between terms and individuals, and between predicate symbols and relations • We won’t cover the formal definition of Datalog semantics, but if you are interested see Sections 12.3.1 and 12.3.2 in the textbook

  6. Datalog: Top Down Proof Procedure • Extension of the top-down procedure for PDCL that uses unification: the ability to deal with variables when matching atoms in the answer clause with rule heads in the KB • Not covered in the course. See P&M Section 12.4.2, p. 511 for the details. • Example KB: in(alan, r123). part_of(r123, cs_building). in(X,Y) ← part_of(Z,Y) ∧ in(X,Z). • Example query: ?in(alan, cs_building), giving the answer clause yes ← in(alan, cs_building). Using clause in(X,Y) ← part_of(Z,Y) ∧ in(X,Z) with Y = cs_building, X = alan: yes ← part_of(Z, cs_building) ∧ in(alan, Z). Using clause part_of(r123, cs_building) with Z = r123: yes ← in(alan, r123). Using fact in(alan, r123): yes ← , so the query succeeds.
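To make unification-based top-down (SLD) resolution concrete, here is a minimal Python sketch under simplifying assumptions. It is illustrative code, not the course's or the textbook's implementation: the knowledge-base encoding (clauses as head/body tuples), the convention that strings starting with an uppercase letter are variables, and the helper names unify, rename and solve are all invented for this sketch. It runs on the example KB above and answers the generic query ?in(alan, Y) rather than the slide's ground query, so that variable bindings are visible.

```python
# Minimal sketch of a top-down (SLD) proof procedure with unification for
# Datalog-style definite clauses. Facts/rules are (head, body) pairs; terms
# starting with an uppercase letter are treated as variables.

KB = [
    (("in", "alan", "r123"), []),                                  # in(alan, r123).
    (("part_of", "r123", "cs_building"), []),                      # part_of(r123, cs_building).
    (("in", "X", "Y"), [("part_of", "Z", "Y"), ("in", "X", "Z")]), # in(X,Y) <- part_of(Z,Y) & in(X,Z).
]

def is_var(t):
    return isinstance(t, str) and t[:1].isupper()

def walk(t, s):
    # Follow substitution chains until reaching a constant or an unbound variable.
    while is_var(t) and t in s:
        t = s[t]
    return t

def unify(a1, a2, s):
    # Unify two atoms (predicate + arguments) under substitution s; None on failure.
    if a1[0] != a2[0] or len(a1) != len(a2):
        return None
    s = dict(s)
    for t1, t2 in zip(a1[1:], a2[1:]):
        t1, t2 = walk(t1, s), walk(t2, s)
        if t1 == t2:
            continue
        if is_var(t1):
            s[t1] = t2
        elif is_var(t2):
            s[t2] = t1
        else:
            return None
    return s

def rename(clause, n):
    # Rename rule variables apart so they don't clash with query variables.
    def r(t):
        return f"{t}_{n}" if is_var(t) else t
    head, body = clause
    return tuple(map(r, head)), [tuple(map(r, b)) for b in body]

def solve(goals, s, depth=0):
    # Yield every substitution that makes all goals provable from KB.
    if not goals:
        yield s
        return
    first, rest = goals[0], goals[1:]
    first = tuple(walk(t, s) for t in first)
    for clause in KB:
        head, body = rename(clause, depth)
        s2 = unify(first, head, s)
        if s2 is not None:
            yield from solve(body + rest, s2, depth + 1)

# Generic query: ?in(alan, Y) -- "which rooms/buildings is alan in?"
for answer in solve([("in", "alan", "Y")], {}):
    print("Y =", walk("Y", answer))   # prints r123 and cs_building
```

Renaming rule variables apart before each resolution step is what keeps the rule's X, Y, Z from clashing with the query's variables, mirroring the role of renaming in the full SLD procedure the slide points to in P&M Section 12.4.2.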

  7. Learning Goals For Logic • PDCL syntax & semantics • Verify whether a logical statement belongs to the language of propositional definite clauses • Verify whether an interpretation is a model of a PDCL KB. • Verify when a conjunction of atoms is a logical consequence of a KB • Bottom-up proof procedure • Define/read/write/trace/debug the Bottom Up (BU) proof procedure • Prove that the BU proof procedure is sound and complete • Top-down proof procedure • Define/read/write/trace/debug the Top-down (SLD) proof procedure • Define it as a search problem • Datalog • Represent simple domains in Datalog • Apply the Top-down proof procedure in Datalog

  8. Logics: Big Picture • Families of logics: Propositional Logics, including Propositional Definite Clause Logics (semantics and proof theory) and Satisfiability Testing (SAT); First-Order Logics; Description Logics • Example applications: Hardware Verification, Production Systems, Product Configuration, Ontologies, Cognitive Architectures, Semantic Web, Video Games, Summarization, Tutoring Systems, Information Extraction

  9. Semantic Web: Extracting data • Examples for typical queries • How much is a typical flight to Mexico for a given date? • What’s the cheapest vacation package to some place in the Caribbean in a given week? • Plus, the hotel should have a white sandy beach and scuba diving • If webpages are based on basic HTML • Humans need to scout for the information and integrate it • Computers are not reliable enough (yet?) • Natural language processing (NLP) can be powerful (see Watson and Siri!) • But some information may be in pictures (beach), or implicit in the text, so existing NLP techniques still don’t get all the info.

  10. More structured representation: the Semantic Web • Beyond HTML pages only made for humans • Languages and formalisms based on description logics that allow websites to include rich, explicit information on relevant concepts, individuals and their relationships • Goal: software agents that can roam the web and carry out sophisticated tasks on our behalf, based on these richer representations • Different from searching content for keywords and popularity: infer meaning from content based on metadata and assertions that have already been made; automatically classify and integrate information • For further material, see the P&M text, Chapter 13, and the Introduction to the Semantic Web tutorial given at the 2011 Semantic Technology Conference: http://www.w3.org/People/Ivan/CorePresentations/SWTutorial/

  11. Examples of ontologies for the Semantic Web • “Ontology”: logic-based representation of the world • eClassOwl: eBusiness ontology • for products and services • 75,000 classes (types of individuals) and 5,500 properties • National Cancer Institute’s ontology: 58,000 classes • Open Biomedical Ontologies Foundry: several ontologies • including the Gene Ontology to describe • gene and gene product attributes in any organism or protein sequence • OpenCyc project: a 150,000-concept ontology including • Top-level ontology • describes general concepts such as numbers, time, space, etc • Hierarchical composition: superclasses and subclasses • Many specific concepts such as “OLED display”, “iPhone”

  12. A different example of applications of logic • Cognitive Tutors (http://pact.cs.cmu.edu/): computer tutors for a variety of domains (math, geometry, programming, etc.) • Provide individualized support for problem-solving exercises, as good human tutors do • Rely on logic-based, detailed computational models of the skills and misconceptions underlying a learning domain • Carnegie Learning (http://www.carnegielearning.com/): a company that commercializes these tutors, sold to hundreds of thousands of high schools in the USA

  13. Where are we? • Course map (Environment: Deterministic vs. Stochastic; Problem Type: Static vs. Sequential; each cell lists Representation and Reasoning Technique): • Static, Deterministic: Constraint Satisfaction (Vars + Constraints; Search, Arc Consistency) and Query (Logics; Search) • Static, Stochastic: Query (Belief Nets; Variable Elimination) • Sequential, Deterministic: Planning (STRIPS; Search) • Sequential, Stochastic: Planning (Decision Nets; Variable Elimination) and Markov Processes (Value Iteration) • Done with Deterministic Environments

  14. Where are we? • The same course map as on the previous slide • The stochastic column (Belief Nets, Decision Nets, Markov Processes, with Variable Elimination and Value Iteration as reasoning techniques) is the Second Part of the Course

  15. Where Are We? • The same course map • Within the stochastic column, we’ll focus on Belief Nets (reasoning technique: Variable Elimination)

  16. Lecture Overview • Recap of Lecture 27 • Logic Wrap-Up • Intro to Reasoning Under Uncertainty • Motivation • Introduction to Probability • Random Variables and Possible World Semantics • Probability Distributions • Marginalization (time permitting)

  17. Two main sources of uncertainty (From Lecture 2) • Sensing Uncertainty: The agent cannot fully observe a state of interest. For example: • Right now, how many people are in this room? In this building? • What disease does this patient have? • Where is the soccer player behind me? • Effect Uncertainty: The agent cannot be certain about the effects of its actions. For example: • If I work hard, will I get an A? • Will this drug work for this patient? • Where will the ball go when I kick it?

  18. Motivation for uncertainty • To act in the real world, we almost always have to handle uncertainty (both effect and sensing uncertainty) • Deterministic domains are an abstraction • Sometimes this abstraction enables more powerful inference • Now we don’t make this abstraction anymore • AI main focus shifted from logic to probability in the 1980s • The language of probability is very expressive and general • New representations enable efficient reasoning • We will see some of these, in particular Bayesian networks • Reasoning under uncertainty is part of the ‘new’ AI • This is not a dichotomy: framework for probability is logical! • New frontier: combine logic and probability

  19. Interesting article about AI and uncertainty • “The machine age” , by Peter Norvig (head of research at Google) • New York Post, 12 February 2011 • http://www.nypost.com/f/print/news/opinion/opedcolumnists/the_machine_age_tM7xPAv4pI4JslK0M1JtxI • “The things we thought were hard turned out to be easier.” • Playing grandmaster level chess, or proving theorems in integral calculus • “Tasks that we at first thought were easy turned out to be hard.” • A toddler (or a dog) can distinguish hundreds of objects (ball, bottle, blanket, mother, …) just by glancing at them • Very difficult for computer vision to perform at this level • “Dealing with uncertainty turned out to be more important than thinking with logical precision.” • Reasoning under uncertainty (and lots of data) are key to progress

  20. Lecture Overview • Recap of Lecture 27 • Logic Wrap-Up • Intro to Reasoning Under Uncertainty • Motivation • Introduction to Probability • Random Variables and Possible World Semantics • Probability Distributions • Marginalization (time permitting)

  21. Probability as a measure of uncertainty/ignorance • Probability measures an agent's degree of belief in the truth of propositions about states of the world • It does not measure how true a proposition is • Propositions are true or false. We simply may not know exactly which. • Example: I roll a fair die. What is ‘the’ (my) probability that the result is a ‘6’?

  22. Probability as a measure of uncertainty/ignorance • Probability measures an agent's degree of belief in the truth of propositions about states of the world • It does not measure how true a proposition is • Propositions are true or false. We simply may not know exactly which. • Example: I roll a fair die. What is ‘the’ (my) probability that the result is a ‘6’? • It is 1/6 ≈ 16.7%. • I now look at the die. What is ‘the’ (my) probability now? • My probability is now … • Your probability … • What if I tell some of you the result is even? • Their probability becomes …

  23. Probability as a measure of uncertainty/ignorance • Probability measures an agent's degree of belief in the truth of propositions about states of the world • It does not measure how true a proposition is • Propositions are true or false. We simply may not know exactly which. • Example: I roll a fair die. What is ‘the’ (my) probability that the result is a ‘6’? • It is 1/6 ≈ 16.7%. • I now look at the die. What is ‘the’ (my) probability now? • My probability is now either 1 or 0, depending on what I observed. • Your probability hasn’t changed: 1/6 ≈ 16.7% • What if I tell some of you the result is even? • Their probability increases to 1/3 ≈ 33.3%, if they believe me • Different agents can have different degrees of belief in (probabilities for) a proposition, based on the evidence they have.

  24. Probability as a measure of uncertainty/ignorance • Probability measures an agent's degree of belief in the truth of propositions about states of the world • Belief in a proposition f can be measured in terms of a number between 0 and 1: this is the probability of f • E.g. P(“roll of fair die came out as a 6”) = 1/6 ≈ 16.7% = 0.167 • Using probabilities between 0 and 1 is purely a convention. • P(f) = 0 means that f is believed to be: A. Probably true B. Probably false C. Definitely false D. Definitely true

  25. Probability as a measure of uncertainty/ignorance • Probability measures an agent's degree of belief in the truth of propositions about states of the world • Belief in a proposition f can be measured in terms of a number between 0 and 1: this is the probability of f • E.g. P(“roll of fair die came out as a 6”) = 1/6 ≈ 16.7% = 0.167 • Using probabilities between 0 and 1 is purely a convention. • P(f) = 0 means that f is believed to be definitely false: the probability of f being true is zero. • Likewise, P(f) = 1 means f is believed to be definitely true.

  26. Lecture Overview • Recap of Lecture 27 • Logic Wrap-Up • Intro to Reasoning Under Uncertainty • Motivation • Introduction to Probability • Random Variables and Possible World Semantics • Probability Distributions • Marginalization (time permitting)

  27. Probability Theory and Random Variables • Probability Theory: a system of logical axioms and formal operations for sound reasoning under uncertainty • Basic element: random variable X • X is a variable like the ones we have seen in CSP/Planning/Logic, but the agent can be uncertain about the value of X • As usual, the domain of a random variable X, written dom(X), is the set of values X can take • Types of variables • Boolean: e.g., Cancer (does the patient have cancer or not?) • Categorical: e.g., CancerType could be one of {breastCancer, lungCancer, skinMelanomas} • Numeric: e.g., Temperature (integer or real) • We will focus on Boolean and categorical variables

  28. Possible Worlds • A possible world specifies an assignment to each random variable • E.g., if we model only two Boolean variables Cavity and Toothache, then there are 4 distinct possible worlds: w1: Cavity = T ∧ Toothache = T w2: Cavity = T ∧ Toothache = F w3: Cavity = F ∧ Toothache = T w4: Cavity = F ∧ Toothache = F • Possible worlds are mutually exclusive and exhaustive • w ╞ f means that proposition f is true in world w • A probability measure µ(w) over possible worlds w is a nonnegative real number such that • µ(w) sums to 1 over all possible worlds w • The probability of proposition f is defined by P(f) = Σ_{w ╞ f} µ(w), i.e. the sum of the probabilities of the worlds w in which f is true • Why does this make sense?

  29. Possible Worlds • A possible world specifies an assignment to each random variable • E.g., if we model only two Boolean variables Cavity and Toothache, then there are 4 distinct possible worlds: w1: Cavity = T ∧ Toothache = T w2: Cavity = T ∧ Toothache = F w3: Cavity = F ∧ Toothache = T w4: Cavity = F ∧ Toothache = F • Possible worlds are mutually exclusive and exhaustive • w ╞ f means that proposition f is true in world w • A probability measure µ(w) over possible worlds w is a nonnegative real number such that • µ(w) sums to 1 over all possible worlds w • The probability of proposition f is defined by P(f) = Σ_{w ╞ f} µ(w), i.e. the sum of the probabilities of the worlds w in which f is true • Why does this make sense? Because for sure we are in one of these worlds!
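As an illustrative sketch (not from the slides), the possible-world semantics above can be written out directly in Python: worlds are complete assignments to the random variables, µ is a nonnegative measure over worlds that sums to 1, and P(f) sums µ over the worlds where f holds. The µ values used for the Cavity/Toothache worlds are made up for illustration, since the slides give no numbers for this example.

```python
from itertools import product

# Random variables and their domains (both Boolean here).
domains = {"Cavity": [True, False], "Toothache": [True, False]}

# The 4 possible worlds: one complete assignment to every random variable.
worlds = [dict(zip(domains, values)) for values in product(*domains.values())]

# A probability measure mu over possible worlds: nonnegative, sums to 1.
# NOTE: these numbers are invented for illustration; the slides give none
# for the Cavity/Toothache example.
mu = {
    (True, True): 0.12,    # w1: Cavity = T, Toothache = T
    (True, False): 0.08,   # w2: Cavity = T, Toothache = F
    (False, True): 0.08,   # w3: Cavity = F, Toothache = T
    (False, False): 0.72,  # w4: Cavity = F, Toothache = F
}
assert abs(sum(mu.values()) - 1.0) < 1e-9

def prob(f):
    """P(f) = sum of mu(w) over the possible worlds w in which f is true."""
    return sum(mu[(w["Cavity"], w["Toothache"])] for w in worlds if f(w))

print(prob(lambda w: w["Cavity"]))                     # P(Cavity = T) ≈ 0.2
print(prob(lambda w: w["Cavity"] and w["Toothache"]))  # P(Cavity ∧ Toothache) = 0.12
print(prob(lambda w: True))                            # P(true) ≈ 1.0, as it must be
```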

  30. Possible Worlds Semantics • w ╞ f means that proposition f is true in world w • A probability measure µ(w) over possible worlds w is a nonnegative real number such that • µ(w) sums to 1 over all possible worlds w • The probability of proposition f is defined by: P(f) = Σ_{w ╞ f} µ(w). • Example: weather in Vancouver • one Boolean variable: Weather with domain {sunny, cloudy} • Possible worlds: w1: Weather = sunny w2: Weather = cloudy • If p(Weather = sunny) = 0.4, what is the probability of p(Weather = cloudy)? A. 0.4 B. 1 C. 0.6 D. Not enough information to answer

  31. Possible Worlds Semantics • w ╞ f means that proposition f is true in world w • A probability measure µ(w) over possible worlds w is a nonnegative real number such that • µ(w) sums to 1 over all possible worlds w • The probability of proposition f is defined by: P(f) = Σ_{w ╞ f} µ(w). • Example: weather in Vancouver • one Boolean variable: Weather with domain {sunny, cloudy} • Possible worlds: w1: Weather = sunny w2: Weather = cloudy • If p(Weather = sunny) = 0.4, what is the probability of p(Weather = cloudy)? • p(Weather = sunny) = 0.4 means that µ(w1) is 0.4 • µ(w1) and µ(w2) have to sum to 1 (those are the only 2 possible worlds) • So µ(w2) has to be 0.6, and thus p(Weather = cloudy) = 0.6

  32. One more example • Now we have an additional variable: Temperature, with domain {hot, mild, cold} • There are now 6 possible worlds over Weather and Temperature • What’s the probability of it being cloudy and cold? Hint: 0.10 + 0.20 + 0.10 + 0.05 + 0.35 = 0.8 A. 0.1 B. 0.2 C. 0.3 D. 1

  33. One more example • Now we have an additional variable: Temperature, with domain {hot, mild, cold} • There are now 6 possible worlds over Weather and Temperature • What’s the probability of it being cloudy and cold? Hint: 0.10 + 0.20 + 0.10 + 0.05 + 0.35 = 0.8 • It is 0.2: the probabilities of all possible worlds have to sum to 1, and the other five worlds already sum to 0.8

  34. One more example • Remember: the probability of proposition f is defined by P(f) = Σ_{w ╞ f} µ(w), the sum of the probabilities of the worlds w in which f is true • Now we have an additional variable: Temperature, with domain {hot, mild, cold} • There are now 6 possible worlds over Weather and Temperature • What’s the probability of it being cloudy or cold? A. 1 B. 0.6 C. 0.3 D. 0.7

  35. One more example • Remember: the probability of proposition f is defined by P(f) = Σ_{w ╞ f} µ(w), the sum of the probabilities of the worlds w in which f is true • Now we have an additional variable: Temperature, with domain {hot, mild, cold} • There are now 6 possible worlds over Weather and Temperature • What’s the probability of it being cloudy or cold? • µ(w3) + µ(w4) + µ(w5) + µ(w6) = 0.7
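The same computation can be done in code for the Weather/Temperature example. The slide's joint table is not reproduced in this transcript, so the per-world values below are a reconstruction; they are consistent with every number that is stated (p(sunny) = 0.4 from the earlier slide, the hint 0.10 + 0.20 + 0.10 + 0.05 + 0.35 = 0.8, P(cloudy and cold) = 0.2, and P(cloudy or cold) = 0.7), but the exact table should be checked against the original slides.

```python
# The Weather/Temperature example over 6 possible worlds.
# Per-world values are a reconstruction consistent with the stated numbers;
# check them against the original slide's table.
mu = {
    ("sunny", "hot"): 0.10,
    ("sunny", "mild"): 0.20,
    ("sunny", "cold"): 0.10,
    ("cloudy", "hot"): 0.05,
    ("cloudy", "mild"): 0.35,
    ("cloudy", "cold"): 0.20,
}
assert abs(sum(mu.values()) - 1.0) < 1e-9  # a probability measure sums to 1

def prob(f):
    # P(f) = sum of mu(w) over the worlds w where proposition f holds.
    return sum(p for (weather, temp), p in mu.items() if f(weather, temp))

print(prob(lambda w, t: w == "cloudy" and t == "cold"))  # 0.2
print(prob(lambda w, t: w == "cloudy" or t == "cold"))   # ≈ 0.7
print(prob(lambda w, t: w == "sunny"))                   # ≈ 0.4, matching the earlier slide
```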

  36. Lecture Overview • Recap of Lecture 27 • Logic Wrap-Up • Intro to Reasoning Under Uncertainty • Motivation • Introduction to Probability • Random Variables and Possible World Semantics • Probability Distributions • Marginalization (time permitting)

  37. Probability Distributions • Definition (probability distribution): a probability distribution P on a random variable X is a function dom(X) → [0,1] such that x ↦ P(X = x) • Consider the case where possible worlds are simply assignments to one random variable. • When dom(X) is infinite we need a probability density function • We will focus on the finite case
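As a small illustration (assumed code, not from the slides): for a finite random variable, a probability distribution is just a mapping from dom(X) to [0,1] whose values sum to 1, e.g. the Weather distribution implied by the earlier example.

```python
# A probability distribution on one finite random variable:
# a map dom(Weather) -> [0, 1] whose values sum to 1.
P_weather = {"sunny": 0.4, "cloudy": 0.6}  # numbers from the earlier weather slides

assert all(0.0 <= p <= 1.0 for p in P_weather.values())
assert abs(sum(P_weather.values()) - 1.0) < 1e-9

print(P_weather["cloudy"])  # P(Weather = cloudy) = 0.6
```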

  38. Joint Distribution • Joint distribution over random variables X1, …, Xn: a probability distribution over the joint random variable <X1, …, Xn> with domain dom(X1) × … × dom(Xn) (the Cartesian product) • Think of a joint distribution over n variables as the n-dimensional table of the corresponding possible worlds • Each row corresponds to an assignment X1 = x1, …, Xn = xn and its probability P(X1 = x1, …, Xn = xn) • We can also write P(X1 = x1 ∧ … ∧ Xn = xn) • The sum of probabilities across the whole table is 1. • E.g., the {Weather, Temperature} example from before
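A brief sketch (again illustrative, not course code) of a joint distribution as a table keyed by the Cartesian product of the domains, reusing the reconstructed Weather/Temperature numbers from the sketch above; P(X1 = x1 ∧ … ∧ Xn = xn) is then a single table lookup.

```python
from itertools import product

domains = {"Weather": ["sunny", "cloudy"], "Temperature": ["hot", "mild", "cold"]}

# One entry per element of dom(Weather) x dom(Temperature) (the Cartesian product);
# the values are the reconstructed ones used in the sketch above.
joint = dict(zip(product(*domains.values()),
                 [0.10, 0.20, 0.10, 0.05, 0.35, 0.20]))

assert len(joint) == 2 * 3                    # 6 possible worlds / table rows
assert abs(sum(joint.values()) - 1.0) < 1e-9  # the whole table sums to 1

# P(Weather = cloudy ∧ Temperature = mild) is a single lookup in the table:
print(joint[("cloudy", "mild")])              # 0.35
```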

  39. Learning Goals For Today’s Class • Heads up: study these concepts, especially marginalization. If you don’t understand them well you will get lost quickly. • Define and give examples of random variables, their domains and probability distributions • Calculate the probability of a proposition f given µ(w) for the set of possible worlds • Define a joint probability distribution (JPD) • NEXT TIME • Marginalize over specific variables to compute distributions over any subset of the variables • Conditioning
