- 107 Views
- Uploaded on

Download Presentation
## PowerPoint Slideshow about 'One-Off vs. Sequential Decision Processes Wrap Up' - olathe

**An Image/Link below is provided (as is) to download presentation**

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript

Lecture Overview

- Recap Lecture 35
- Decision networks for single-stage decision problems
- VE for single-stage decision problems
- Sequential decision problems

What’s Next?

Representation

- Environment

Reasoning

Technique

Stochastic

Deterministic

Problem Type

Arc

Consistency

Now we will look at acting in stochastic environments

Constraint Satisfaction

Vars +

Constraints

Search

Static

Belief Nets

Logics

Variable

Elimination

Query

Search

Decision Nets

Sequential

STRIPS

Variable

Elimination

Planning

Search

Decisions Under Uncertainty: Intro

- An agent's decision will depend on
- What actions are available
- What beliefs the agent has
- Which goals the agent has
- Differences between deterministic and stochastic setting
- Obvious difference in representation: need to represent our uncertain beliefs
- Actions will be pretty straightforward: represented as decision variables
- Goals will be interesting: we'll move from all-or-nothing goals to a richer notion:
- rating how happy the agent is in different situations.
- Putting these together, we'll extend Bayesian Networks to make a new representation called Decision Networks

Decisions Under Uncertainty: Intro

- An intelligent agent can represent and reason about situations of this nature by using
- Probability to measure the uncertainty in actions outcome
- Utility to measure agent’s preferences over the various outcomes
- Combined in a measure of expected utility that can be used to identify the action with the best expected outcome
- We saw how this works for Single Action (akaOne-Off) Decisions
- One or more primitive decisions that can be treated as a single macro decision to be made before acting

Recap: Single vs. Sequential Actions

- Single Action (aka One-Off Decisions)
- One or more primitive decisions that can be treated as a single macro decision to be made before acting
- Sequence of Actions (Sequential Decisions)
- Repeat:
- observe
- act
- Outcomes of previous decisions can influence subsequent ones
- The agent needs to define a policy, defining a priori how it will act under any possible set of circumstances

Recap: Optimal single-stage decisions

Best decision: (wear pads, short way)

Probability

Utility

E[U|D]

0.2

35

35

83

95

0.8

30

35

0.01

74.55

75

0.99

0.2

35

3

80.6

100

0.8

35

0

0.01

79.2

80

0.99

Lecture Overview

- Recap Lecture 35
- Decision networks for single-stage decision problems
- VE for single-stage decision problems
- Sequential decision problems

Single-Stage decision networks

- Extend belief networks with:
- Decision nodes, that the agent chooses the value for
- Parents: only other decision nodes allowed
- Domain is the set of possible actions
- Drawn as a rectangle
- Exactly one utility node
- Parents: all random & decision variables on which the utility depends
- Does not have a domain
- Drawn as a diamond
- Explicitly shows dependencies
- E.g., which variables affect the probability of an accident?

Types of nodes in decision networks

- A random variable is drawn as an ellipse.
- Arcs into the node represent probabilistic dependence
- As in Bayesian networks: a random variable is conditionally independent of its non-descendants given its parents
- A decision variable is drawn as an rectangle.
- Arcs into the node represent information available when the decision is made
- A utility node is drawn as a diamond.
- Arcs into the node represent variables that the utility depends on.
- Specifies a utility for each instantiation of its parents

Example Decision Network

Decision nodes do not have an associated table.

Lecture Overview

- Recap Lecture 35
- Decision networks for single-stage decision problems
- VE for single-stage decision problems
- Sequential decision problems

Computing the optimal decision: we can use VE

- Denote
- the random variables as X1, …, Xn
- the decision variables as D
- the parents of node N as pa(N)
- To find the optimal decision we can use VE:
- Create a factor for each conditional probability and for the utility
- Sum out all random variables, one at a time
- This creates a factor on D that gives the expected utility for each di
- Choose the di with the maximum value in the factor

VE Example: Step 1, create initial factors

f1(A,W)

Abbreviations:

W = Which WayP = Wear PadsA = Accident

f2(A,W,P)

VE example: step 2, sum out A

Step 2a: compute product f1(A,W) × f2(A,W,P)

What is the right form for the product f1(A,W) × f2(A,W,P)?

f(A,W)

f(A,P)

f(A)

f(A,P,W)

VE example: step 2, sum out A

Step 2a: compute product f(A,W,P) = f1(A,W) × f2(A,W,P)

What is the right form for the product f1(A,W) × f2(A,W,P)?

- It is f(A,P,W): the domain of the product is the union of the multiplicands’ domains
- f(A,P,W) = f1(A,W) × f2(A,W,P)
- I.e., f(A=a,P=p,W=w) = f1(A=a,W=w) × f2(A=a,W=w,P=p)

VE example: step 2, sum out A

Step 2a: compute product f(A,W,P) = f1(A,W) × f2(A,W,P)

f(A=a,P=p,W=w) = f1(A=a,W=w) × f2(A=a,W=w,P=p)

0.99 * 30

0.01 * 80

0.99 * 80

0.8 * 30

VE example: step 2, sum out A

Step 2a: compute product f(A,W,P) = f1(A,W) × f2(A,W,P)

f(A=a,P=p,W=w) = f1(A=a,W=w) × f2(A=a,W=w,P=p)

VE example: step 2, sum out A

Step 2a: compute product f(A,W,P) = f1(A,W) × f2(A,W,P)

f(A=a,P=p,W=w) = f1(A=a,W=w) × f2(A=a,W=w,P=p)

VE example: step 2, sum out A

The final factor encodes the expected utility of each decision

Step 2b: sum A out of the product f(A,W,P):

VE example: step 2, sum out A

The final factor encodes the expected utility of each decision

Step 2b: sum A out of the product f(A,W,P):

0.99*80 + 0.8*95

0.2*35 + 0.2*0.3

0.2*35 + 0.8*95

0.8 * 95 + 0.8*100

VE example: step 2, sum out A

The final factor encodes the expected utility of each decision

Step 2b: sum A out of the product f(A,W,P):

VE example: step 2, sum out A

Step 2b: sum A out of the product f(A,W,P):

The final factor encodes the expected utility of each decision

VE example: step 3, choose decision with max E(U)

The final factor encodes the expected utility of each decision

- Thus, taking the short way but wearing pads is the best choice, with an expected utility of 83

Step 2b: sum A out of the product f(A,W,P):

Variable Elimination for Single-Stage Decision Networks: Summary

- Create a factor for each conditional probability and for the utility
- Sum out all random variables, one at a time
- This creates a factor on D that gives the expected utility for each di
- Choose the di with the maximum value in the factor

Sequential Decision Problems

- An intelligent agent doesn't make a multi-step decision and carry it out blindly
- It would take new observations it makes into account
- A more typical scenario:
- The agent observes, acts, observes, acts, …
- Subsequent actions can depend on what is observed
- What is observed often depends on previous actions
- Often the sole reason for carrying out an action is to provide information for future actions
- For example: diagnostic tests, spying
- General Decision networks:
- Just like single-stage decision networks, with one exception:the parents of decision nodes can include random variables

Sequential Decision Problems: Example

- In our Fire Alarm domain
- If there is a report you can decide to call the fire department
- Before doing that, you can decide to check if you can see smoke, but this takes time and will delay calling
- A decision (e.g. Call) can

depends ona random variable

(e.g. SeeSmoke )

- Each decision Di has an information set of variables pa(Di), whose value will be known at the time decision Di is made
- pa(CheckSmoke) = {Report}
- pa(Call) = {Report, CheckSmoke, See Smoke}

Decision node: Agent decides

Chance node: Chance decides

Sequential Decision Problems: Example

- Example for sequential decision problem
- Each decision Di has an information set of variables pa(Di), whose value will be known at the time decision Di is made
- pa(Test) = {Symptoms}
- pa(Treatment) = {Test, Symptoms, TestResult}

Decision node: Agent decides

Chance node: Chance decides

Sequential Decision Problems

- What should an agent do?
- The agent observes, acts, observes, acts, …
- Subsequent actions can depend on what is observed
- What is observed often depends on previous actions
- The agent needs a conditional plan of what it will do given every possible set of circumstances
- We will formalize this conditional plan as a policy

Learning Goals for Decision Under Uncertainty

- Compare and contrast stochastic single-stage (one-off) decisions vs. multistage (sequential) decisions
- Define a Utility Function on possible worlds
- Define and compute optimal one-off decisions
- Represent one-off decisions as single stage decision networks
- Compute optimal one-off decisions by Variable Elimination
- Material covered in the slides coming next won’t be covered in the final

Policies for Sequential Decision Problems

Definition (Policy)A policy is a sequence of δ1 ,…..,δn decision functions

δi : dom(pa(Di )) → dom(Di)

This policy means that when the agent has observed

odom(pa(Di )) , it will do δi(o)

There are 22=4 possible decision functions δcsfor Check Smoke:

- Decision function needs to specify a value for

each instantiation of parents

CheckSmoke

Policies for Sequential Decision Problems

Definition (Policy)A policy is a sequence of δ1 ,…..,δn decision functions

δi : dom(pa(Di )) → dom(Di)

when the agent has observed odom(pDi) , it will do δi(o)

There are 28=256 possible decision functions δcsfor Call:

How many policies are there?

- If a decision D has k binary parents,
- there are 2kdifferentassignments of values to the parents
- If there are b possible value for a decision variable with k binary parents
- There are b2k different decision functions for that variable
- because there are 2k possible instantiations for the parents and for every instantiation of those parents, the decision function could pick any of b values
- If there are d decision variables, each with k binary parents and b possible values
- There are (b2k)ddifferent policies
- because there are b2k possible decision functions for each decision, and a policy is a combination of d such decision functions
- Still, we want to use the same high-level approach we used for one-off decisions
- find the “most promising” policy given the inherent uncertainty, i.e. the policy with the maximum expected utility

Optimality of a policy

w⊧ indicates a possible world w that satisfies a policy,

Finding the optimal policy

- Variable elimination for decision networks can find the optimal policy in O(d * b2k)
- Relies on dynamic programming to consider each decision function only once
- Much faster than enumerating policies (or search in policy space), but still doubly exponential
- CS422: approximation algorithms for finding optimal policies

Big Picture: Planning under Uncertainty

Probability Theory

Decision Theory

Markov Decision Processes (MDPs)

One-Off Decisions/ Sequential Decisions

Partially Observable MDPs (POMDPs)

Fully Observable MDPs

Decision Support Systems(medicine, business, …)

Economics

Control Systems

Robotics

Decision Theory: Decision Support Systems

Support for management: e.g. hiring

Source: R.E. Neapolitan, 2007

CPSC 322,

Decision Theory: Decision Support Systems

Computational Sustainability: New interdisciplinary field, AI is a key component

- Models and methods for decision making concerning the management and allocation of resources
- to solve most challenging problems related to sustainability, E.g.
- Energy: when and where to produce green energy most economically?
- Which parcels of land to purchase to protect endangered species?
- Urban planning: how to use budget for best development in 30 years?

Source: http://www.computational-sustainability.org/

Planning Under Uncertainty

- Learning and Using models of Patient-Caregiver Interactions During Activities of Daily Living
- by using POMDP (an extension of decision networks that model the temporal evolution of the world)
- Goal: Help older adults living with cognitive disabilities (such as Alzheimer's) when they:
- forget the proper sequence of tasks that need to be completed
- lose track of the steps that they have already completed

Source: Jesse Hoey UofT 2007

Planning Under Uncertainty

Autonomous driving: DARPA Urban Challenge – Stanford’s Junior

Source: Sebastian Thrun

Planning Under Uncertainty

Navigation and obstacle avoidance in a collaboratively controlled smart wheelchair (Viswanathan et al., 2008)

Uses POMDP to model

user’s state and the

wheelchair’s state.

Gives prompts to user

and prevents collisions.

Allows older adults with

mild impairments access

to powered wheelchairs

safely.

Planning Under Uncertainty

Helicopter control: MDP, reinforcement learning

(states: all possible positions, orientations, velocities and angular velocities)

Source: Andrew Ng

We are done!

Representation

- Environment

Reasoning

Technique

Stochastic

Deterministic

Problem Type

Arc

Consistency

Constraint Satisfaction

Vars +

Constraints

Search

Static

Belief Nets

Logics

Variable

Elimination

Query

Search

Decision Nets

Sequential

STRIPS

Variable

Elimination

Planning

Search

Remember: 422 and 340 expand on these topics, if you are interested

Announcements

- Final: Fri. April 13, 8:30am
- Like the midterm, it will be a mix of short questions and longer, problem-style questions
- Short questions will be very similar to the posted practice questions
- Good Luck!

Announcements

Review material for the final

- Complete list of learning goals
- Practice short questions for the second part of the course (plus the practice questions that were posted before the final)
- Homework-style questions on Planning Under Uncertainty, Practice exercises (new one on decisions under Uncertainty)

See extended office hours for next week on course page

Download Presentation

Connecting to Server..