Knowledge Representations

- One large distinction between an AI system and a normal piece of software is that an AI system must reason using worldly knowledge
- What types of knowledge?
- Facts
- Axioms
- Statements (which may or may not be true)
- Rules
- Cases
- Experiences
- Associations (which may not be truth preserving)
- Descriptions
- Probabilities and Statistics

- Early systems used either
- semantic networks or predicate calculus to represent knowledge
- or used simple search spaces if the domain/problem had very limited amounts of knowledge (e.g., simple planning as in blocks world)

- With the early expert systems in the 70s, a significant shift took place to production systems, which combined representation and process (chaining) and even uncertainty handling (certainty factors)
- later, frames (an early version of OOP) were introduced

- Problem-specific approaches were introduced such as scripts and conceptual dependencies (CDs) for language representation
- In the 1980s, there was a shift from rules to model-based approaches
- Since the 1990s, Bayesian networks and hidden Markov models have become popular
- First, we will take a brief look at some of the representations

- Given a problem expressed as a state space (whether explicitly or implicitly)
- Formally, we define a search space as [N, A, S, GD]
- N = set of nodes or states of a graph
- A = set of arcs (edges) between nodes that correspond to the steps in the problem (the legal actions or operators)
- S = a nonempty subset of N that represents start states
- GD = a nonempty subset of N that represents goal states

- Our problem becomes one of traversing the graph from a node in S to a node in GD
- Example:
- 3 missionaries and 3 cannibals are on one side of the river with a boat that can take at most 2 people across the river (and needs at least 1 person to sail)
- how can we move the 3 missionaries and 3 cannibals across the river such that the cannibals never outnumber the missionaries on either side of the river (lest the cannibals start eating the missionaries!)

- We can represent a state as a 6-item tuple: (a, b, c, d, e, f)
- a/b = number of missionaries/cannibals on left shore
- c/d = number of missionaries/cannibals in boat
- e/f = number of missionaries/cannibals on right shore
- where a + b + c + d + e + f = 6
- a >= b (unless a = 0), c >= d (unless c = 0), and e >= f (unless e = 0)

- Legal operations (moves) are
- 0, 1, 2 missionaries get into boat
- 0, 1, 2 missionaries get out of boat
- 0, 1, 2 cannibals get into boat
- 0, 1, 2 cannibals get out of boat
- boat sails from left shore to right shore
- boat sails from right shore to left shore
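
The search-space formulation above can be sketched as a breadth-first traversal from the start state to the goal state. This is a minimal sketch, not the representation used on the slides: it assumes a simplified 3-item state (missionaries on the left shore, cannibals on the left shore, whether the boat is on the left shore) in place of the 6-item tuple, and it treats each crossing as a single operator. The names `is_safe`, `successors`, and `solve` are hypothetical.

```python
from collections import deque

TOTAL = 3  # missionaries and cannibals each
# Each crossing carries 1 or 2 people: (missionaries, cannibals) in the boat.
MOVES = [(1, 0), (2, 0), (0, 1), (0, 2), (1, 1)]


def is_safe(m_left, c_left):
    """Cannibals never outnumber missionaries on a shore that has missionaries."""
    if not (0 <= m_left <= TOTAL and 0 <= c_left <= TOTAL):
        return False
    m_right, c_right = TOTAL - m_left, TOTAL - c_left
    left_ok = m_left == 0 or m_left >= c_left
    right_ok = m_right == 0 or m_right >= c_right
    return left_ok and right_ok


def successors(state):
    """Yield every legal state reachable with one crossing (the arcs A)."""
    m_left, c_left, boat_on_left = state
    sign = -1 if boat_on_left else 1  # people leave the shore the boat is on
    for dm, dc in MOVES:
        nxt = (m_left + sign * dm, c_left + sign * dc, not boat_on_left)
        if is_safe(nxt[0], nxt[1]):
            yield nxt


def solve(start=(3, 3, True), goal=(0, 0, False)):
    """Breadth-first search from a node in S to a node in GD."""
    frontier = deque([start])
    parent = {start: None}
    while frontier:
        state = frontier.popleft()
        if state == goal:
            path = []
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        for nxt in successors(state):
            if nxt not in parent:
                parent[nxt] = state
                frontier.append(nxt)
    return None


path = solve()
print(len(path) - 1, "crossings")
```

Breadth-first search is used here only because it returns a shortest sequence of crossings; any graph traversal over the same nodes and arcs would fit the definition.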

- We often know stuff about objects (whether physical or abstract)
- These objects have attributes (components, values) and/or relationships with other things

- So, one way to represent knowledge is to enumerate the objects and describe them through their attributes and relationships
- Common forms of such relationship representations are
- semantic networks – a network consists of nodes which are objects and values, and edges (links/arcs) which are annotated to include how the nodes are related
- predicate calculus – predicates are often relationships and arguments for the predicates are objects
- frames – in essence, objects (from object-oriented programming) where attributes are the data members and the values are the specific values stored in those members – in some cases, they are pointers to other objects

Here, we see the same information being represented using two different representational techniques – a semantic network (above) and predicates (to the left)

Here we see a real-world situation of three blocks and a predicate calculus representation for expressing this knowledge

We equip our system with rules, such as the rule below, to draw conclusions about and manipulate this blocks world

This rule says “if there does not exist a Y that is on X, then X is clear”
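
The "clear" rule can be sketched directly over a set of `on` facts. The original figure is not available, so the block names and the arrangement below (block a on block b, with b and c on the table) are hypothetical.

```python
# Hypothetical facts: on(a, b) holds; b and c rest on the table.
BLOCKS = {"a", "b", "c"}
ON = {("a", "b")}  # (upper block, lower block)


def clear(x):
    """X is clear if there does not exist a Y such that on(Y, X)."""
    return not any((y, x) in ON for y in BLOCKS)


print(sorted(block for block in BLOCKS if clear(block)))
```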

- Collins and Quillian were the first to use semantic networks in AI by storing in the network the objects and their relationships
- their intention was to represent English sentences
- edges would typically be annotated with these descriptors or relations
- isa – class/subclass
- instance – the first object is an instance of the class
- has – contains or has this as a physical property
- can – has the ability to
- made of, color, texture, etc

A semantic network to represent the sentences “a canary can sing/fly”, “a canary is a bird/animal”, “a canary is a canary”, “a canary has skin”
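
A semantic network like this one can be sketched as a set of labeled edges, with properties inherited along `isa` links. The placement of each property at a particular level (for example, "can fly" attached to bird and "has skin" attached to animal) is an assumption, since the figure is not available, and the function name `holds` is hypothetical.

```python
# Each edge is (source node, relation, target node).
EDGES = {
    ("canary", "isa", "bird"),
    ("bird", "isa", "animal"),
    ("canary", "can", "sing"),
    ("bird", "can", "fly"),
    ("animal", "has", "skin"),
}


def holds(node, relation, value):
    """Check a relation on the node itself, then climb the isa links."""
    while node is not None:
        if (node, relation, value) in EDGES:
            return True
        parents = [t for (s, r, t) in EDGES if s == node and r == "isa"]
        node = parents[0] if parents else None  # single inheritance only
    return False


print(holds("canary", "can", "fly"))   # inherited from bird
print(holds("canary", "has", "skin"))  # inherited from animal
```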

- Quillian demonstrated how to use the semantic network to represent word meanings
- each word would have one or more networks, with links that attach words to their definition “planes”
- the word plant is represented as three planes, each of which has links to additional word planes

- The semantic network requires a graph representation which may not be a very efficient use of memory
- Another representation is the frame
- the idea behind a frame was originally that it would represent a “frame of memory” – for instance, by capturing the objects and their attributes for a given situation or moment in time
- a frame would contain slots where a slot could contain
- identification information (including whether this frame is a subclass of another frame)
- relationships to other frames
- descriptors of this frame
- procedural information on how to use this frame (code to be executed)
- defaults for slots
- instance information (or an identification of whether the frame represents a class or an instance)

Here is a partial frame representing a hotel room

The room contains a chair, bed, and phone where the bed contains a mattress and a bed frame (not shown)
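
A frame with slots, defaults inherited from a parent frame, and pointers to other frames might be sketched as follows. The `Frame` class, the generic `room` frame, and its `walls` default are all hypothetical; only the chair, bed, phone, mattress, and bed frame come from the description above.

```python
class Frame:
    """A named collection of slots with an optional parent (isa) frame."""

    def __init__(self, name, isa=None, **slots):
        self.name = name
        self.isa = isa
        self.slots = slots

    def get(self, slot):
        """Look up a slot locally, falling back to defaults in parent frames."""
        frame = self
        while frame is not None:
            if slot in frame.slots:
                return frame.slots[slot]
            frame = frame.isa
        raise KeyError(slot)


# Hypothetical parent frame supplying a default value.
room = Frame("room", walls=4)
bed = Frame("bed", parts=["mattress", "bed frame"])
hotel_room = Frame(
    "hotel room",
    isa=room,
    contains=["chair", "bed", "phone"],
    bed=bed,  # a slot can point to another frame
)

print(hotel_room.get("walls"))             # default inherited from room
print(hotel_room.get("bed").get("parts"))  # follows the pointer to the bed frame
```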

- A production system is
- a set of rules (if-then or condition-action statements)
- working memory
- the current state of the problem solving, which includes new pieces of information created by previously applied rules

- inference engine (the author calls this a “recognize-act” cycle)
- forward-chaining, backward-chaining, a combination, or some other form of reasoning such as a sponsor-selector, or agenda-driven scheduler

- conflict resolution strategy
- when selecting a rule, there may be several applicable rules; which one should we select? The choice may be based on a conflict resolution strategy such as “first rule”, “most specific rule”, “most salient rule”, “rule with most actions”, “random”, etc

- The idea behind a production system’s reasoning is that rules will describe steps in the problem solving space where a rule might
- be an operation in a game like a chess move
- translate a piece of input data into an intermediate conclusion
- piece together several intermediate conclusions into a specific conclusion
- translate a goal into substeps

- So a solution using a production system is a collection of rules that are chained together
- forward chaining – reasoning from data to conclusions where working memory is searched for conditions that match the left-hand side of the given rules
- backward chaining – reasoning from goals to operations where an initial goal is unfolded into the steps needed to solve that goal, that is, the process is one of subgoaling
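
The two chaining directions can be contrasted in a short sketch. The rule base below is hypothetical and purely propositional (each rule is a set of conditions and one conclusion), and the backward chainer assumes the rules contain no cycles.

```python
# Hypothetical rules: (left-hand side conditions, right-hand side conclusion).
RULES = [
    ({"fever", "cough"}, "flu"),
    ({"flu"}, "rest"),
]


def forward_chain(facts):
    """Data to conclusions: fire any rule whose conditions are in working memory."""
    memory = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in RULES:
            if conditions <= memory and conclusion not in memory:
                memory.add(conclusion)
                changed = True
    return memory


def backward_chain(goal, facts):
    """Goal to subgoals: prove the goal by proving some rule's conditions."""
    if goal in facts:
        return True
    return any(
        all(backward_chain(condition, facts) for condition in conditions)
        for conditions, conclusion in RULES
        if conclusion == goal
    )


print(forward_chain({"fever", "cough"}))
print(backward_chain("rest", {"fever", "cough"}))
```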

- Problem: given a 4-gallon jug (X) and a 3-gallon jug (Y), fill X with exactly 2 gallons of water
- assume an infinite amount of water is available

- Rules/operators
- 1. If X = 0 then X = 4 (fill X)
- 2. If Y = 0 then Y = 3 (fill Y)
- 3. If X > 0 then X = 0 (empty X)
- 4. If Y > 0 then Y = 0 (empty Y)
- 5. If X + Y >= 3 and X > 0 then X = X – (3 – Y) and Y = 3 (fill Y from X)
- 6. If X + Y >= 4 and Y > 0 then X = 4 and Y = Y – (4 – X) (fill X from Y)
- 7. If X + Y <= 3 and X > 0 then X = 0 and Y = X + Y (empty X into Y)
- 8. If X + Y <= 4 and Y > 0 then X = X + Y and Y = 0 (empty Y into X)
- rule numbers used on the next slide
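
The eight operators can be encoded as condition/action pairs and searched for a rule sequence that leaves 2 gallons in X. This is a sketch under assumptions: breadth-first search stands in for the production system's control strategy, and both assignments in a rule are taken to use the values of X and Y from before the rule fires. The names `solve` and `replay` are hypothetical.

```python
from collections import deque

# Rule number -> (condition, action); each takes the current (x, y).
RULES = {
    1: (lambda x, y: x == 0, lambda x, y: (4, y)),                          # fill X
    2: (lambda x, y: y == 0, lambda x, y: (x, 3)),                          # fill Y
    3: (lambda x, y: x > 0, lambda x, y: (0, y)),                           # empty X
    4: (lambda x, y: y > 0, lambda x, y: (x, 0)),                           # empty Y
    5: (lambda x, y: x + y >= 3 and x > 0, lambda x, y: (x - (3 - y), 3)),  # fill Y from X
    6: (lambda x, y: x + y >= 4 and y > 0, lambda x, y: (4, y - (4 - x))),  # fill X from Y
    7: (lambda x, y: x + y <= 3 and x > 0, lambda x, y: (0, x + y)),        # empty X into Y
    8: (lambda x, y: x + y <= 4 and y > 0, lambda x, y: (x + y, 0)),        # empty Y into X
}


def solve(start=(0, 0), target_x=2):
    """Return a shortest list of rule numbers that leaves target_x gallons in X."""
    frontier = deque([start])
    plan = {start: []}
    while frontier:
        state = frontier.popleft()
        if state[0] == target_x:
            return plan[state]
        for number, (condition, action) in RULES.items():
            if condition(*state):
                nxt = action(*state)
                if nxt not in plan:
                    plan[nxt] = plan[state] + [number]
                    frontier.append(nxt)
    return None


def replay(rule_numbers, state=(0, 0)):
    """Apply the rules in order, checking each condition, and return the final state."""
    for number in rule_numbers:
        condition, action = RULES[number]
        assert condition(*state), f"rule {number} does not match {state}"
        state = action(*state)
    return state


plan = solve()
print(plan, replay(plan))
```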

- In a production system, what happens when more than one rule matches?
- a conflict resolution strategy dictates how to select from between multiple matching rules

- Simple conflict resolution strategies include
- random
- first match
- most/least recently matched rule
- rule which has matched for the longest/shortest number of cycles (refractoriness)
- most salient rule (each rule is given a salience before you run the production system)

- More complex resolution strategies might
- select the rule with the most/least number of conditions (specificity/generality)
- or most/least number of actions (biggest/smallest change to the state)
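
A few of these strategies can be sketched as a single selection function over the set of matching rules. The `Rule` record, its fields, and the example rules are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class Rule:
    name: str
    conditions: tuple
    actions: tuple
    salience: int = 0


def resolve(matching, strategy="first"):
    """Pick one rule from the list of rules that currently match."""
    if strategy == "first":
        return matching[0]
    if strategy == "random":
        return random.choice(matching)
    if strategy == "most_salient":
        return max(matching, key=lambda rule: rule.salience)
    if strategy == "most_specific":
        return max(matching, key=lambda rule: len(rule.conditions))
    if strategy == "most_actions":
        return max(matching, key=lambda rule: len(rule.actions))
    raise ValueError(f"unknown strategy: {strategy}")


# Hypothetical conflict set.
matching = [
    Rule("r1", ("a",), ("x",), salience=5),
    Rule("r2", ("a", "b", "c"), ("x",), salience=1),
    Rule("r3", ("a", "b"), ("x", "y", "z"), salience=3),
]
print(resolve(matching, "most_specific").name)
```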

- By the early 1970s, the production system approach was found to be more than adequate for constructing large scale expert systems
- in 1971, researchers at Stanford began constructing MYCIN, a medical diagnostic system
- it contained a very large rule base
- it used backward chaining
- to deal with the uncertainty of medical knowledge, it introduced certainty factors (sort of like probabilities)
- in 1975, it was tested against medical experts and performed as well as or better than the doctors it was compared to

(defrule 52
  if (site culture is blood)
     (gram organism is neg)
     (morphology organism is rod)
     (burn patient is serious)
  then .4
     (identity organism is pseudomonas))

If the culture was taken from the patient’s blood and the gram of the organism is negative and the morphology of the organism is rods and the patient is a serious burn patient, then conclude that the identity of the organism is pseudomonas (.4 certainty)

- Mycin’s process starts with “diagnose-and-treat”
- repeat
- identify all rules that can provide the conclusion currently sought
- match right hand sides (that is, search for rules whose right hand sides match anything in working memory)
- use conflict resolution to identify a single rule
- fire that rule
- find and remove a piece of knowledge which is no longer needed
- find and modify a piece of knowledge now that more specific information is known
- add a new subgoal (left-hand side conditions that need to be proved)

- until the action done is added to working memory

- Mycin would first identify the illness, possibly ordering more tests to be performed, and then given the illness, generate a treatment
- Mycin consisted of about 600 rules

- Another success story is DEC’s R1
- later renamed XCON

- This system would take customer orders and configure specific VAX computers for those orders including
- completing the order if the order was incomplete
- how the various components (drive and tape units, mother board(s), etc) would be placed inside the mainframe cabinet
- how the wiring would take place among the various components

- R1 would perform forward chaining over about 10,000 rules
- over a 6 year period, it configured some 80,000 orders with a 95-98% accuracy rating
- ironically, whereas planning/design is viewed as a backward chaining task, R1 used forward chaining because, in this particular case, the problem is data driven, starting with user input of the computer system’s specifications
- R1’s solutions were similar in quality to human solutions

- Constraint rules
- if device requires battery then select battery for device
- if select battery for device then pick battery with voltage(battery) = voltage(device)

- Configuration rules
- if we are in the floor plan stage and there is space for a power supply and there is no power supply available then add a power supply to the order
- if step is configuring, propose alternatives and there is an unconfigured device and no container was chosen and no other device that can hold it was chosen and selecting a container wasn’t proposed yet and no problems for selecting containers were identified then propose selecting a container
- if the step is distributing a massbus device and there is a single port disk drive that has not been assigned to a massbus and there are no unassigned dual port disk drives and the number of devices that each massbus should support is known and there is a massbus that has been assigned at least one disk drive and that should support additional disk drives and the type of cable needed to connect the disk drive is known, then assign the disk drive to this massbus

- To avoid the difficulties with Frames and Nets, Schank and Rieger offered two network-like representations that would have implied uses and built-in semantics: conceptual dependencies and scripts
- the conceptual dependency was derived as a form of semantic network that would have specific types of links to be used for representing specific pieces of information in English sentences
- the action of the sentence
- the objects affected by the action or that brought about the action
- modifiers of both actions and objects

- they defined 11 primitive actions, called ACTs
- every possible action can be categorized as one of these 11
- an ACT would form the center of the CD, with links attaching the objects and modifiers

- The sentence is “John ate the egg”
- The INGEST act means to ingest an object (eat, drink, swallow)
- the P above the double arrow indicates past tense
- the INGEST action must have an object (the O indicates it was the object Egg) and a direction (the object went from John’s mouth to John’s insides)
- we might infer that it was “an egg” instead of “the egg” as there is nothing specific to indicate which egg was eaten
- we might also infer that John swallowed the egg whole as there is nothing to indicate that John chewed the egg!

- Is this list complete?
- what actions are missing?

- Could we reduce this list to make it more concise?
- other researchers have developed other lists of primitive actions including just 3 – physical actions, mental actions and abstract actions

The sentence is “John prevented Mary from giving a book to Bill”

This sentence has two ACTs, DO and ATRANS

DO was not in the list of 11, but can be thought of as “caused to happen”

The c/ means a negative conditional, in this case it means that John caused this not to happen

The ATRANS is a giving relationship with the object being a Book and the action being from Mary to Bill – “Mary gave a book to Bill”

As with the previous example, there is no way of telling whether it is “a book” or “the book”

- The other structured representation developed by Schank (along with Abelson) is the script
- a description of the typical actions that are involved in a typical situation
- they defined a script for going to a restaurant

- scripts provide an ability for default reasoning when information is not available that directly states that an action occurred
- so we may assume, unless otherwise stated, that a diner at a restaurant was served food, that the diner paid for the food, and that the diner was served by a waiter/waitress

- A script would contain
- entry condition(s) and results (exit conditions)
- actors (the people involved)
- props (physical items at the location used by the actors)
- scenes (individual events that take place)

- The script would use the 11 ACTs from CD theory

- The script does not contain atypical actions
- although there are options such as whether the customer was pleased or not

- There are multiple paths through the scenes to make for a robust script
- what would a “going to the movies” script look like? would it have similar props, actors, scenes? how about “going to class”?

- One of the drawbacks of the knowledge representations demonstrated thus far is that all knowledge is grouped into a single, large collection of representations
- the rules taken as a whole for instance don’t denote what rules should be used in what circumstance

- Another approach is to divide the representations into logical groupings
- this permits easier design, implementation, testing and debugging because you know what that particular group is supposed to do and what knowledge should go into it
- it should be noted that by distributing the knowledge, we might use different problem solving agents for each set of knowledge so that the knowledge is stored using different representations

- Which leads us to the idea of having multiple problem solving agents
- each agent is responsible for solving some specialized type of problem(s) and knows where to obtain its own input
- each agent has its own knowledge sources, some internal, some external
- since external agents may have their own forms of representation, the agent must know
- how to find the proper agents
- how to properly communicate with these other agents
- how to interpret the information that it receives from these agents
- how to recover from a situation where the expected agent(s) is/are not available

- Agents are interactive problem solvers that have these properties
- situated – the agent is part of the problem solving environment – it can obtain its own input from its environment and it can affect its environment through its output
- autonomous – the agent operates independently of other agents and can control its own actions and internal states
- flexible – the agent is both responsive and proactive – it can go out and find what it needs to solve its problem(s)
- social – the agent can interact with other agents including humans

- Some researchers also insist that agents have
- mobility – have the ability to move from their current environment to a new environment (e.g., migrate to another processor)
- delegation – hand off portions of the problem to other agents
- cooperation – if multiple agents are tasked with the same problem, can their solutions be combined?

- The WWW is a collection of data and knowledge in an unstructured format
- Humans can often take knowledge from disparate sources and put together a coherent picture; can problem solving agents do the same?

- Agents on the semantic web all have their own capabilities and know where to look for knowledge, whether that is a static source, an agent that can provide the needed information through its own processing, or a human
- The common approach is to model the knowledge of a web site using an ontology
- ontologies give agents the ability to translate the results of another agent, or the data provided from a website, into a version of knowledge that they can understand and use

- Expert System construction used to be a trial-and-error sort of approach with the knowledge engineers
- once they had knowledge from the experts, they would fill in their knowledge base and test it out

- By the end of the 80s, it was discovered that creating an actual domain model was the way to go – build a model of the knowledge before implementing anything
- A model might be
- a dependency graph of what can cause what to happen
- or an associational model which is a collection of malfunctions and the manifestations we would expect to see from those malfunctions
- or a functional model where component parts are enumerated and described by function and behavior

- The emphasis changed to knowledge acquisition tools (KADS)
- domain experts enter their knowledge as a graphical model that contains the component parts of the item being diagnosed/designed, their functions, and rules for deciding how to diagnose or design each one

- Here is a model developed by NASA for a Livingston propulsion system for rockets
- a reactive self-configuring autonomous system
- knowledge modeled using propositional calc (instead of predicate calc – there are a finite number of elements, each will be modeled by its own proposition)

Helium is the fuel tank

Oxidizer is mixed to cause the fuel to burn

Acc is the accelerometer which, along with sensors in the valves, is used as input to control the system

Pyro valves are used as control – once they change state, they stay in that state – so they are used to change the flow of fuel when an error is detected, opening or closing a new pathway from tank to engine

- The idea is that the configuration manager tries to keep the spacecraft moving but at the lowest cost configuration
- Sensors feed into the ME (mode estimator) to determine if the system is functioning and in the lowest configuration
- If not, the MR (mode reconfiguration) plans a new mode by determining what valves to open and close
- Since this is a spacecraft, the output of the MR is a set of actions that cause valves to open or close directly

The high level planner generates a sequence of hardware configuration goals, such as the amount of propellant that should be used; it is the configuration manager that must translate these goals into actions

The design of an elevator can be used to generate a diagnostic system for elevator problems, or in VT’s case, a system that can design new elevators

- Representations generally represent knowledge as fact
- However, often, knowledge and the use of the knowledge brings with it a degree of uncertainty
- how can we represent and reason with uncertainty?

- We find two forms of uncertainty
- unsure input
- unknown – do not know the answer so you have to say unknown
- unclear – answer doesn’t fit question (e.g., not yes but 80% yes)
- vague data – is a 100 degree temp a “high fever” or just “fever”?
- ambiguous/noisy data – data may not be easily interpretable

- non-truth preserving knowledge (most rules are associational, not truth preserving)
- unlike “if you are a man then you are mortal”, a doctor might reason from symptoms to diseases
- “all men are mortal” denotes a class/subclass relationship, which is truth preserving
- but the symptom to disease reasoning is based on associations and is not guaranteed to be true

- First used in the Mycin system, the idea is that we will attribute a measure of belief to any conclusion that we draw
- CF(H | E) = MB(H | E) – MD(H | E)
- certainty factor for hypothesis H given evidence E is the measure of belief we have for H minus measure of disbelief we have for H

- CFs are applied to hypotheses that are drawn from rules
- CFs can be combined as we associate a CF with each condition and each conclusion of each rule

- To use CFs, we need
- to annotate every rule with a CF value (this comes from the expert)
- ways to combine CFs when we use AND, OR, →

- Combining rules are straightforward:
- for AND use min
- for OR use max
- for → use * (multiplication)

- Assume we have the following rules:
- A → B (.7)
- A → C (.4)
- D → F (.6)
- B AND G → E (.8)
- C OR F → H (.5)

- We know A, D and G are true (so each have a value of 1.0)
- B is .7 (A is 1.0, the rule is true at .7, so B is true at 1.0 * .7 = .7)
- C is .4
- F is .6
- B AND G is min(.7, 1.0) = .7 (G is 1.0, B is .7)
- E is .7 * .8 = .56
- C OR F is max(.4, .6) = .6
- H is .6 * .5 = .30

- Another combining rule is needed when we can conclude the same hypothesis from two or more rules
- we already used C OR F → H (.5) to conclude H with a CF of .30
- let’s assume that we also have the rule E → H (.5)
- since E is .56, we have H at .56 * .5 = .28

- We now believe H at .30 and at .28, which is true?
- the two rules both support H, so we want to draw a stronger conclusion in H since we have two independent means of support for H

- We will use the formula CF1 + CF2 – CF1*CF2
- CF(H) = .30 + .28 - .30 * .28 = .496
- our belief in H has been strengthened through two different chains of logic
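
The certainty factor arithmetic in this example can be reproduced in a few lines. This is a sketch of the combining rules as stated on the slides (min for AND, max for OR, multiplication across a rule, and CF1 + CF2 – CF1*CF2 for two rules supporting the same hypothesis); the function and variable names are hypothetical.

```python
def apply_rule(premise_cf, rule_cf):
    """The conclusion's CF is the premise's CF times the rule's CF."""
    return premise_cf * rule_cf


def combine(cf1, cf2):
    """Merge two independent sources of support for the same hypothesis."""
    return cf1 + cf2 - cf1 * cf2


cf = {"A": 1.0, "D": 1.0, "G": 1.0}                   # known to be true
cf["B"] = apply_rule(cf["A"], 0.7)                    # A -> B (.7)
cf["C"] = apply_rule(cf["A"], 0.4)                    # A -> C (.4)
cf["F"] = apply_rule(cf["D"], 0.6)                    # D -> F (.6)
cf["E"] = apply_rule(min(cf["B"], cf["G"]), 0.8)      # B AND G -> E (.8)
h_from_c_or_f = apply_rule(max(cf["C"], cf["F"]), 0.5)  # C OR F -> H (.5)
h_from_e = apply_rule(cf["E"], 0.5)                   # E -> H (.5)
cf["H"] = combine(h_from_c_or_f, h_from_e)

print({name: round(value, 3) for name, value in cf.items()})
```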

- Prior to CFs, Zadeh introduced fuzzy logic to introduce “shades of grey” into logic
- other logics are two-valued, true or false only

- Here, any proposition can take on a value in the interval [0, 1]
- Being a logic, Zadeh introduced the algebra to support logical operators of AND, OR, NOT, →
- X AND Y = min(X, Y)
- X OR Y = max(X, Y)
- NOT X = (1 – X)
- X → Y = X * Y

- Where the values of X, Y are determined by where they fall in the interval [0, 1]

- Fuzzy sets are to normal sets what fuzzy logic is to logic
- fuzzy set theory is based on fuzzy values from fuzzy logic but includes set operations instead of logic operations

- The basis for fuzzy sets is defining a fuzzy membership function for a set
- a fuzzy set is a set of items along with their membership values in the set where the membership value defines how closely that item is to being in that set

- Example: the set tall might be denoted as
- tall = { x | f(x) = 1.0 if x > 6’2”, .8 if x > 6’, .6 if x > 5’10”, .4 if x > 5’8”, .2 if x > 5’6”, 0 otherwise}
- so we can say that a person is tall at .8 if they are 6’1” or we can say that the set of tall people is {Anne/.2, Bill/1.0, Chuck/.6, Fred/.8, Sue/.6}
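
The step-shaped membership function for tall can be written out directly. Heights are converted to inches here (6’2” = 74, 6’ = 72, 5’10” = 70, 5’8” = 68, 5’6” = 66); the function name `tall` and the threshold table are illustrative.

```python
# (threshold in inches, membership value), checked from tallest to shortest.
THRESHOLDS = [(74, 1.0), (72, 0.8), (70, 0.6), (68, 0.4), (66, 0.2)]


def tall(height_in_inches):
    """Membership of a height in the fuzzy set tall."""
    for threshold, membership in THRESHOLDS:
        if height_in_inches > threshold:
            return membership
    return 0.0


print(tall(73))  # a person who is 6'1"
```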

- Typically, a membership function is a continuous function (often represented in a graph form like above)
- given a value y, the membership value for y is u(y), determined by tracing the curve and seeing where it falls on the u(x) axis

- How do we define a membership function?
- this is an open question

- 1. fuzzify the input(s) using fuzzy membership functions
- 2. apply fuzzy logic rules to draw conclusions
- we use the previous rules for AND, OR, NOT, →

- 3. if conclusions are supported by multiple rules, combine the conclusions
- like CF, we need a combining function, this may be done by computing a “center of gravity” using calculus

- 4. defuzzify conclusions to get specific conclusions
- defuzzification requires translating a numeric value into an actionable item

- Fuzzy logic is often applied to domains where we can easily derive fuzzy membership functions and have a few rules but not a lot
- fuzzy logic begins to break down when we have more than a dozen or two rules

- We have an atmospheric controller which can increase or decrease the temperature of the air and can increase or decrease the fan based on these simple rules
- if air is warm and dry, decrease the fan and increase the coolant
- if air is warm and not dry, increase the fan
- if air is hot and dry, increase the fan and increase the coolant slightly
- if air is hot and not dry, increase the fan and coolant
- if air is cold, turn off the fan and decrease the coolant

- Our input obviously requires the air temperature and the humidity, the membership function for air temperature is shown to the right

if it is 60, it would be considered cold 0, warm 1, hot 0

if it is 85, it would be cold 0, warm .3 and hot .7

- Temperature = 85, humidity indicates dry .6
- hot .7, warm .3, cold 0, dry .6, not dry .4 (not dry = 1 – dry = 1 - .6)

- Rule 1 has “warm and dry”
- warm is .3, dry is .6, so “warm and dry” = min(.3, .6) = .3

- Rule 2 has “warm and not dry”
- min(.3, .4) = .3

- Rule 3 has “hot and dry” = min(.7, .6) = .6
- Rule 4 has “hot and not dry” = min(.7, .4) = .4
- our fifth rule gives us 0 since cold is 0

- Our conclusions from the first four rules are to
- decrease the fan and increase the coolant at levels of .3
- increase the fan at a level of .3
- increase the fan at .6 and increase the coolant slightly
- increase the fan and the coolant at levels of .4

- To combine our results, we might add the contributions: a net fan increase of .3 + .6 + .4 - .3 = 1.0 and a coolant increase (assume “increase slightly” means increase by ¼) of .3 + .6/4 + .4 = .85
- Finally, we defuzzify “increase the fan by 1.0” and “increase the coolant by .85” to actionable amounts
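
The firing strength of each of the five controller rules can be checked with a short sketch that applies min for AND and 1 – X for NOT to the fuzzified inputs (hot .7, warm .3, cold 0, dry .6). With these inputs, “hot and dry” evaluates to min(.7, .6) and “hot and not dry” to min(.7, .4). The dictionary layout and names are hypothetical, and the combination and defuzzification steps are left out.

```python
# Fuzzified inputs for temperature = 85 and the given humidity reading.
hot, warm, cold, dry = 0.7, 0.3, 0.0, 0.6
not_dry = 1 - dry  # NOT X = 1 - X

# Firing strength of each rule's condition (AND = min).
strength = {
    "1: warm and dry": min(warm, dry),
    "2: warm and not dry": min(warm, not_dry),
    "3: hot and dry": min(hot, dry),
    "4: hot and not dry": min(hot, not_dry),
    "5: cold": cold,
}

for rule, value in strength.items():
    print(rule, round(value, 2))
```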

- The most common applications for fuzzy logic are for controllers
- devices that, based on input, make minor modifications to their settings – for instance
- air conditioner controller that uses the current temperature, the desired temperature, and the number of open vents to determine how much to turn up or down the blower
- camera aperture control (up/down, focus, negate a shaky hand)
- a subway car for braking and acceleration

- Fuzzy logic has been used for expert systems
- but the systems tend to perform poorly when more than just a few rules are chained together
- in our previous example, we just had 5 stand-alone rules
- when we chain rules, the fuzzy values are multiplied (e.g., .5 from one rule * .3 from another rule * .4 from another rule, our result is .06)

- The D-S Theory goes beyond CF and Fuzzy Logic by providing us two values to indicate the utility of a hypothesis
- belief – as before, like the CF or fuzzy membership value
- plausibility – adds to our belief by determining if there is any evidence (belief) for opposing the hypothesis

- We want to know if h is a reasonable hypothesis
- we have evidence in favor of h giving us a belief of .7
- we have no evidence against h, this would imply that the plausibility is greater than the belief
- p(h) = 1 – b(~h) = 1 (since we have no evidence against h, b(~h) = 0)

- Consider two hypotheses, h1 and h2 where we have no evidence in favor of either, so b(h1) = b(h2) = .5
- we have evidence that suggests ~h2 is less believable than ~h1 so that b(~h2) = .3 and b(~h1) = .5
- h1 = [.5, .5] and h2 = [.5, .7] so h2 is more believable

- D-S theory gives us a way to compute the belief for any number of subsets of the hypotheses, and modify the beliefs as new evidence is introduced
- the formula to compute belief (given below) is a bit complex
- so we present an example to better understand it
- but the basic idea is this: we have a belief value for how well some piece of evidence supports a group (subset) of hypotheses
- we introduce a new evidence and multiply the belief from the first with the belief in support of the new evidence for those hypotheses that are in the intersection of the two subsets

- the denominator is used to normalize the computed beliefs, and is 1 unless the intersection includes some null subsets

- There are four possible hypotheses for a given patient, cold (C), flu (F), migraine (H), meningitis (M)
- we introduce a piece of evidence, m1 = fever, which supports {C, F, M} at .6
- we also have {Q} (the entire set) with support 1 - .6 = .4
- now we add the evidence m2 = nausea which can support {C, F, H} at .7 so that Q = .3
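- we combine the two sets of beliefs into m3 by multiplying the supports of each pair of sets and assigning the product to their intersection:
- m3{C, F} = .6 * .7 = .42
- m3{C, F, M} = .6 * .3 = .18
- m3{C, F, H} = .4 * .7 = .28
- m3{Q} = .4 * .3 = .12

Since m3 has no empty sets, the denominator is 1, so the set of values in m3 is already normalized and we do not have to do anything else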

- When we had m1, we had two sets, {C, F, M} and {Q}
- When we combined it with m2 (with two sets of its own,{C, F, H} and {Q}), the result was four sets
- the intersection of {C, F, M} and {C, F, H} = {C, F}
- the intersection of {C, F, M} and {Q} = {C, F, M}
- the intersection of {C, F, H} and {Q} = {C, F, H}
- the intersection of {Q} and {Q} = {Q}

- Now we add a further piece of evidence, m4, which supports {M} at .8, so m4{M} = .8 and m4{Q} = .2
- combining m3 with m4 gives m5; some of the intersections are empty, so our denominator will no longer be 1 and we will have to compute it after computing the numerators

Sum of empty sets = .336 + .224 = .56, so the denominator is 1 - .56 = .44

- m5{M} = (.096 + .144) / .44 = .545
- m5{C, F} = .084 / .44 = .191
- m5{C, F, H} = .056 / .44 = .127
- m5{C, F, M} = .036 / .44 = .082
- m5{Q} = .024 / .44 = .055
- m5{ } = .336 + .224 = .56 (the conflicting mass, which the normalization removes rather than divides by .44)

The value attached to { } is the largest of these because the evidence tends to contradict itself (some symptoms indicate meningitis, another symptom indicates no meningitis)
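
The combination steps of this example can be reproduced with a short Python sketch. It is an illustration under stated assumptions, not the slides' own implementation: the function name `combine`, the frozenset representation of subsets and the returned `conflict` value are choices made here.

```python
from itertools import product


def combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function maps a frozenset of hypotheses to its mass.
    Returns (normalized masses, conflict), where conflict is the total
    mass that fell on empty intersections.
    """
    raw = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            raw[common] = raw.get(common, 0.0) + mass_a * mass_b
        else:
            conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("the two pieces of evidence are totally contradictory")
    return {s: mass / (1.0 - conflict) for s, mass in raw.items()}, conflict


Q = frozenset("CFHM")                    # the entire set of hypotheses
m1 = {frozenset("CFM"): 0.6, Q: 0.4}     # fever
m2 = {frozenset("CFH"): 0.7, Q: 0.3}     # nausea
m4 = {frozenset("M"): 0.8, Q: 0.2}       # third piece of evidence

m3, conflict3 = combine(m1, m2)
m5, conflict5 = combine(m3, m4)

for name, masses, conflict in (("m3", m3, conflict3), ("m5", m5, conflict5)):
    print(name, "conflict =", round(conflict, 3))
    for subset, mass in sorted(masses.items(), key=lambda item: -item[1]):
        print("  {" + ", ".join(sorted(subset)) + "}", round(mass, 3))
```

Returning the conflict separately keeps the normalized masses summing to 1 while still exposing how contradictory the combined evidence is.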

- Bayes derived the following formula
- p(h | E) = p(E | h) * p(h) / sum for all i (p(E | hi) * p(hi))
- the probability that h is true given evidence E
- p(h | E) – conditional probability
- what is the probability that h is true given the evidence E

- p(E | h) – evidential probability
- what is the probability that evidence E will appear if h is true?

- p(h) – prior probability (or a priori probability)
- what is the probability that h is true in general without any evidence?

- the denominator normalizes the conditional probabilities to add up to 1

- To solve a problem with Bayesian probabilities
- we need the evidential probabilities p(E | h1), p(E | h2), p(E | h3), … and the prior probabilities p(h1), p(h2), p(h3), … for all hypotheses; computing p(h1 | E), p(h2 | E), p(h3 | E), … is then just a straightforward series of calculations

- The sidewalk is wet, we want to determine the most likely cause
- it rained overnight (h1)
- we ran the sprinkler overnight (h2)
- wet sidewalk (E)

- Assume the following
- there was a 50% chance of rain – p(h1) = .5
- sprinkler is run two nights a week – p(h2) = 2/7 = .28
- p(wet sidewalk | rain overnight) = .8
- p(wet sidewalk | sprinkler) = .9

- Now we compute the two conditional probabilities
- p(h1 | E) = (.5 * .8) / (.5 * .8 + .28 * .9) = .61
- p(h2 | E) = (.28 * .9) / (.5 * .8 + .28 * .9) = .39
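
The same calculation can be written as a small Python sketch. The function name `posteriors` and the dictionary keys are assumptions made for illustration; the numbers are the ones assumed above.

```python
def posteriors(priors, likelihoods):
    """Apply Bayes' formula to every hypothesis.

    priors[h] is p(h) and likelihoods[h] is p(E | h).
    The shared denominator normalizes the results so they add up to 1.
    """
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(joint.values())
    return {h: value / total for h, value in joint.items()}


priors = {"rain": 0.5, "sprinkler": 0.28}
likelihoods = {"rain": 0.8, "sprinkler": 0.9}

result = posteriors(priors, likelihoods)
for hypothesis, probability in result.items():
    print(hypothesis, round(probability, 2))
```

Because the denominator only sums over the hypotheses listed, the result is meaningful only if rain and the sprinkler are treated as the complete set of candidate causes.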

- There is a flaw with our previous example
- if it is likely that it will rain, we will probably not run the sprinkler even if it is the night we usually run it, and if it does not rain, we will probably be more likely to run the sprinkler the next night

- So we have to be aware of whether events are independent or not
- two events are independent if P(A & B) = P(A) * P(B)
- where & means “intersect”

- equivalently, when P(B) ≠ 0, P(A) = P(A | B)
- knowing B is true does not affect the probability of A being true

- We can also modify our computation by using the formula for conditionally independent events
- P(A & B | C) = P(A | C) * P(B | C)
- again, & is used to mean intersection
- we will expand on this shortly

- In our wet sidewalk example, E consisted of one piece of evidence, wet sidewalk
- what if we have many pieces of evidence?

- Consider a diagnostic case where there are 10 possible symptoms that we might look for to determine whether a patient has a cold (h1), flu (h2) or sinus infection (h3)
- E is some subset of {e1, e2, e3, e4, e5, e6, e7, e8, e9, e10}

- To use Bayes’ formula, we need to know
- p(h1), p(h2), p(h3) as well as
- p(e1 | h1), p(e1 | h2), p(e1 | h3)
- p(e2 | h1), p(e2 | h2), p(e2 | h3)
- p(e3 | h1), p(e3 | h2), p(e3 | h3)

- But our patient may have several symptoms
- So we also need
- p(e1, e2 | h1), p(e1, e2 | h2), p(e1, e2 | h3)
- p(e1, e3 | h1), p(e1, e3 | h2), p(e1, e3 | h3)
- p(e2, e3 | h1), p(e2, e3 | h2), p(e2, e3 | h3)
- p(e1, e2, e3 | h1), p(e1, e2, e3 | h2), p(e1, e2, e3 | h3)

- How many different probabilities will we need?
- with 10 pieces of evidence, there are 2^10 = 1024 different combinations for E, so we will need 3 * 1024 = 3072 evidential probabilities (to go along with the 3 prior probabilities, one for each hypothesis)
- imagine if E comprised a set of 50 pieces of evidence instead!
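
A quick sketch makes the growth concrete. The first function follows the count above. The second goes one step beyond it and assumes the pieces of evidence are conditionally independent given each hypothesis (the P(A & B | C) = P(A | C) * P(B | C) formula introduced earlier), in which case only the individual p(ei | hj) values are needed. Both function names are hypothetical.

```python
def evidential_count_all_subsets(num_hypotheses: int, num_evidence: int) -> int:
    """One probability per hypothesis for every possible subset E of the evidence."""
    return num_hypotheses * 2 ** num_evidence


def evidential_count_independent(num_hypotheses: int, num_evidence: int) -> int:
    """Assumes conditional independence: one p(e_i | h_j) per evidence/hypothesis pair."""
    return num_hypotheses * num_evidence


for n in (10, 50):
    print(n, "pieces of evidence:",
          evidential_count_all_subsets(3, n), "vs",
          evidential_count_independent(3, n))
```

With three hypotheses the first count grows exponentially in the number of pieces of evidence, while the second grows only linearly.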

- We can apply the Bayesian formulas for independent and conditionally dependent events in a network form
- we want to determine the likely cause for seeing orange barrels, flashing lights and bad traffic on the highway
- two hypotheses: construction, accident (see the figure below)
- notice T (bad traffic) can be caused by either construction or an accident, orange barrels are only evidence of construction and flashing lights are only evidence of an accident (although it could also be that a driver has been pulled over)
- construction and accident are not directly related to each other – this will help simplify the problem

- Cause-effect situations are temporal
- at time i, an event arises and causes an event at time i+1
- the Bayesian belief network is static: it captures a situation at a single point in time
- we need a dynamic network instead

- The dynamic Bayesian network is similar to our previous networks except that each edge represents not merely a dependency, but a temporal change
- when you take the branch from state i to state i+1, you are not only indicating that state i can cause i+1 but that i was at a time prior to i+1

Here is a state diagram that represents possible utterances for the word “tomato”

Each node represents both a sound and a segment of time