Artificial intelligence
CH 17

Making complex decisions


Group (9)

  • Team Members:

  • Ahmed Helal Eid

  • Mina Victor William

  • Supervised by:

  • Dr. Nevin M. Darwish


Agenda

  • Introduction

  • Sequential Decision Problems

  • Optimality in sequential decision problems

  • Value Iteration

  • The value iteration algorithm

  • Policy Iteration


Introduction

  • Previously, in Chapter 16

    • Making simple decisions.

    • Concerned with episodic decision problems, in which the utility of each action's outcome was well known.

      • Episodic environment: the agent's experience is divided into atomic episodes, each consisting of the agent perceiving and then performing a single action.


Introduction

  • In this chapter

    • The computational issues involved in making decisions in a stochastic environment.

    • Sequential decision problems,

      • in which the agent's utility depends on a sequence of decisions.

      • Sequential decision problems, which incorporate utilities, uncertainty, and sensing, include search and planning problems as special cases.



Sequential Decision Problems

What if the environment were deterministic?

Unfortunately, the environment does not go along with this situation.

[Figure: the 4×3 grid world (columns 1 to 4, rows 1 to 3); the agent starts at (1,1), and the terminal states are +1 at (4,3) and -1 at (4,2).]

Actions A(s) in every state are (Up, Down, Left, Right).


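Sequential Decision Problems

[Figure: model for stochastic motion in the 4×3 grid world; the intended move succeeds with probability 0.8, and with probability 0.1 each the agent moves at right angles to the intended direction.]

  • Executing [Up, Up, Right, Right, Right] from start reaches +1 along the intended path with probability 0.8^5 = 0.32768.

  • The same plan can also reach +1 by accident, slipping around the other way along [Right, Right, Up, Up, Right], with probability 0.1^4 × 0.8 = 0.00008; the total probability of success is therefore 0.32776.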
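The two probabilities on this slide can be checked by pushing a probability distribution through the transition model one action at a time. This is a minimal sketch under stated assumptions: the grid encoding assumes the standard AIMA version of the 4×3 world, which has an obstacle at (2,2), and the names `transitions` and `plan_distribution` are illustrative, not taken from the slides.

```python
from collections import defaultdict

# Assumed layout (AIMA 4x3 world): columns 1..4, rows 1..3, obstacle at (2, 2).
WALL = (2, 2)
TERMINALS = {(4, 3): 1.0, (4, 2): -1.0}
STATES = {(x, y) for x in range(1, 5) for y in range(1, 4)} - {WALL}
MOVES = {"Up": (0, 1), "Down": (0, -1), "Left": (-1, 0), "Right": (1, 0)}
SIDEWAYS = {"Up": ("Left", "Right"), "Down": ("Left", "Right"),
            "Left": ("Up", "Down"), "Right": ("Up", "Down")}


def move(s, direction):
    """Deterministic move; bumping into the obstacle or an edge leaves s unchanged."""
    dx, dy = MOVES[direction]
    nxt = (s[0] + dx, s[1] + dy)
    return nxt if nxt in STATES else s


def transitions(s, a):
    """P(s' | s, a): 0.8 in the intended direction, 0.1 at each right angle."""
    probs = defaultdict(float)
    side1, side2 = SIDEWAYS[a]
    for direction, p in ((a, 0.8), (side1, 0.1), (side2, 0.1)):
        probs[move(s, direction)] += p
    return dict(probs)


def plan_distribution(start, plan):
    """Distribution over states after executing a fixed action sequence.

    Terminal states are absorbing: once the agent enters one, it stays there."""
    dist = {start: 1.0}
    for a in plan:
        nxt = defaultdict(float)
        for s, p in dist.items():
            if s in TERMINALS:
                nxt[s] += p
                continue
            for s2, q in transitions(s, a).items():
                nxt[s2] += p * q
        dist = dict(nxt)
    return dist


dist = plan_distribution((1, 1), ["Up", "Up", "Right", "Right", "Right"])
print(round(dist[(4, 3)], 5))  # 0.32776
```

Five moves is the minimum needed to get from (1,1) to (4,3), so only the two monotone paths that avoid the obstacle and the -1 state contribute, which is why the total is exactly 0.8^5 + 0.1^4 × 0.8.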


Sequential Decision Problems

  • Transition model T(s, a, s')

    • The probability of reaching state s' if action a is done in state s.

  • Markovian transition

    • The probability of reaching s' from s depends only on s (and the action a), not on the history of earlier states.

  • Utility function

    • Depends on a sequence of states (the environment history).

  • Reward R(s)

    • The agent receives a reward in each state, positive or negative.

[Figure: the 4×3 grid world with R(s) = -0.04 in every nonterminal state and rewards +1 and -1 in the two terminal states.]

If the agent reaches the +1 goal after 10 steps, its utility is (-0.04 × 10) + 1 = 0.6.


Markov Decision Process (MDP)

  • We use MDPs to solve sequential decision problems.

  • We eventually want to find the best choice of action for each state.

  • An MDP consists of:

    • a set of actions A(s)

      • the actions available in each state s

    • a transition model P(s' | s, a)

      • describing the probability of reaching s' using action a in s

      • transitions are Markovian: they depend only on s (and a), not on previous states

    • a reward function R(s)

      • the reward an agent receives for arriving in state s
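The ingredients listed above can be bundled into a small data structure. This is a hypothetical Python representation: the `MDP` class and `make_4x3_world` are illustrative names, and the layout with an obstacle at (2,2) is assumed from the AIMA version of the grid. It also makes the Markov property explicit, since the transition function receives only the current state and action.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Tuple

State = Tuple[int, int]


@dataclass(frozen=True)
class MDP:
    """The ingredients of an MDP, plus the set of terminal states."""
    states: FrozenSet[State]
    actions: Callable[[State], List[str]]                   # A(s)
    transition: Callable[[State, str], Dict[State, float]]  # P(s' | s, a)
    reward: Callable[[State], float]                        # R(s)
    terminals: FrozenSet[State]


def make_4x3_world(step_reward: float = -0.04) -> MDP:
    """Assumed AIMA 4x3 layout: obstacle at (2, 2), exits at (4, 3) and (4, 2)."""
    wall = (2, 2)
    exits = {(4, 3): 1.0, (4, 2): -1.0}
    states = frozenset((x, y) for x in range(1, 5) for y in range(1, 4)) - {wall}
    moves = {"Up": (0, 1), "Down": (0, -1), "Left": (-1, 0), "Right": (1, 0)}
    sideways = {"Up": ("Left", "Right"), "Down": ("Left", "Right"),
                "Left": ("Up", "Down"), "Right": ("Up", "Down")}

    def go(s, d):
        nxt = (s[0] + moves[d][0], s[1] + moves[d][1])
        return nxt if nxt in states else s  # bump: stay put

    def actions(s):
        return [] if s in exits else list(moves)

    def transition(s, a):
        # Markovian: the result depends only on the current state s and action a.
        probs = defaultdict(float)
        for d, p in ((a, 0.8), (sideways[a][0], 0.1), (sideways[a][1], 0.1)):
            probs[go(s, d)] += p
        return dict(probs)

    def reward(s):
        return exits.get(s, step_reward)

    return MDP(states, actions, transition, reward, frozenset(exits))


world = make_4x3_world()
print(world.transition((1, 1), "Up"))  # {(1, 2): 0.8, (1, 1): 0.1, (2, 1): 0.1}
```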


Sequential Decision Problems

What does a solution to the problem look like?

  • Policy (π)

    • A solution must specify what the agent should do in any state that the agent might reach.

  • π(s)

    • The action recommended by the policy π for state s.

  • Optimal policy (π*)

    • The policy that yields the highest expected utility.
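To make the idea of a policy concrete, the sketch below stores π as a Python dict from states to actions and follows it through the stochastic world until a terminal state is reached. The policy shown is a hand-written, hypothetical example (not claimed to be optimal), `run_episode` is an illustrative name, and the grid encoding is the same assumed AIMA 4×3 world with an obstacle at (2,2).

```python
import random
from collections import defaultdict

# Assumed layout (AIMA 4x3 world): columns 1..4, rows 1..3, obstacle at (2, 2).
WALL = (2, 2)
TERMINALS = {(4, 3): 1.0, (4, 2): -1.0}
STATES = {(x, y) for x in range(1, 5) for y in range(1, 4)} - {WALL}
MOVES = {"Up": (0, 1), "Down": (0, -1), "Left": (-1, 0), "Right": (1, 0)}
SIDEWAYS = {"Up": ("Left", "Right"), "Down": ("Left", "Right"),
            "Left": ("Up", "Down"), "Right": ("Up", "Down")}


def move(s, direction):
    """Deterministic move; bumping into the obstacle or an edge leaves s unchanged."""
    dx, dy = MOVES[direction]
    nxt = (s[0] + dx, s[1] + dy)
    return nxt if nxt in STATES else s


def transitions(s, a):
    """P(s' | s, a): 0.8 in the intended direction, 0.1 at each right angle."""
    probs = defaultdict(float)
    for direction, p in ((a, 0.8), (SIDEWAYS[a][0], 0.1), (SIDEWAYS[a][1], 0.1)):
        probs[move(s, direction)] += p
    return dict(probs)


# Hypothetical hand-written policy: one action for every nonterminal state.
POLICY = {(1, 1): "Up", (1, 2): "Up", (1, 3): "Right",
          (2, 1): "Left", (2, 3): "Right",
          (3, 1): "Left", (3, 2): "Up", (3, 3): "Right",
          (4, 1): "Left"}


def run_episode(policy, start=(1, 1), seed=0, max_steps=1000):
    """Follow policy[s] from start, sampling each outcome from P(s' | s, a)."""
    rng = random.Random(seed)
    s, history = start, [start]
    for _ in range(max_steps):
        if s in TERMINALS:
            break
        outcomes = transitions(s, policy[s])
        s = rng.choices(list(outcomes), weights=list(outcomes.values()))[0]
        history.append(s)
    return history


print(run_episode(POLICY))
```

Because actions are stochastic, the same policy produces different histories on different runs; what stays fixed is the action chosen in each state.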


Continued

[Figure: optimal policies in the 4×3 grid world for R(s) < -1.6284 and for -0.4278 < R(s) < -0.0850.]

Continued

[Figure: optimal policies in the 4×3 grid world for -0.0221 < R(s) < 0 and for R(s) > 0.]


The Horizon

  • Finite horizon:

    • There is a fixed time N after which nothing matters (the game is over).

    • The optimal policy is non-stationary.

Is there a finite or infinite horizon for decision making?


Example of Finite Horizon

[Figure: the 4×3 grid world with horizon N = 3; the agent starts at (1,1).]

  • The optimal action in a given state could change over time.


Optimality in sequential decision problems

  • Infinite horizon:

    • There is no fixed deadline (the time at which the agent is in a state doesn't matter).

    • The optimal policy is stationary.


Example of Infinite Horizon

[Figure: the 4×3 grid world with horizon N = 100; the agent starts at (1,1).]

  • The optimal action in a given state does not change over time.
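The contrast between a short and a long horizon can be reproduced with backward induction over the number of remaining steps. The sketch below is an illustration under assumptions: undiscounted rewards, R(s) = -0.04 for nonterminal states, U_0(s) = R(s) when no steps remain, and the assumed AIMA 4×3 layout with an obstacle at (2,2); `best_first_action` is an illustrative name. In state (3,1) it picks Up when only 3 steps remain (the only way to reach +1 in time), but Left, the longer and safer route, when 100 steps remain, which is the non-stationarity described for the finite horizon.

```python
from collections import defaultdict

# Assumed layout (AIMA 4x3 world): columns 1..4, rows 1..3, obstacle at (2, 2).
WALL = (2, 2)
TERMINALS = {(4, 3): 1.0, (4, 2): -1.0}
STATES = {(x, y) for x in range(1, 5) for y in range(1, 4)} - {WALL}
MOVES = {"Up": (0, 1), "Down": (0, -1), "Left": (-1, 0), "Right": (1, 0)}
SIDEWAYS = {"Up": ("Left", "Right"), "Down": ("Left", "Right"),
            "Left": ("Up", "Down"), "Right": ("Up", "Down")}
STEP_REWARD = -0.04  # assumed R(s) for nonterminal states


def move(s, direction):
    """Deterministic move; bumping into the obstacle or an edge leaves s unchanged."""
    dx, dy = MOVES[direction]
    nxt = (s[0] + dx, s[1] + dy)
    return nxt if nxt in STATES else s


def transitions(s, a):
    """P(s' | s, a): 0.8 in the intended direction, 0.1 at each right angle."""
    probs = defaultdict(float)
    for direction, p in ((a, 0.8), (SIDEWAYS[a][0], 0.1), (SIDEWAYS[a][1], 0.1)):
        probs[move(s, direction)] += p
    return dict(probs)


def q_value(U, s, a):
    """Expected utility of the successor state: sum_s' P(s' | s, a) * U(s')."""
    return sum(p * U[s2] for s2, p in transitions(s, a).items())


def best_first_action(s, steps_left):
    """Optimal action in s when exactly `steps_left` actions remain (no discounting).

    U_0(s) = R(s); U_k(s) = R(s) + max_a sum_s' P(s'|s,a) U_{k-1}(s') for
    nonterminal s, while terminal states keep their reward."""
    U = {t: TERMINALS.get(t, STEP_REWARD) for t in STATES}
    for _ in range(steps_left - 1):
        U = {t: (TERMINALS[t] if t in TERMINALS
                 else STEP_REWARD + max(q_value(U, t, a) for a in MOVES))
             for t in STATES}
    return max(MOVES, key=lambda a: q_value(U, s, a))


print(best_first_action((3, 1), 3), best_first_action((3, 1), 100))  # Up Left
```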


Optimality in sequential decision problems

We are mainly going to use infinite-horizon utility functions because:

  • with no fixed deadline, there is no reason to behave differently in the same state at different times;

  • hence, the optimal action depends only on the current state, and the optimal policy is stationary.




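Optimality in sequential decision problems

How do we calculate the utility of a state sequence?

  • Additive rewards: $U_h([s_0, s_1, s_2, \ldots]) = R(s_0) + R(s_1) + R(s_2) + \cdots$

  • Discounted rewards: $U_h([s_0, s_1, s_2, \ldots]) = R(s_0) + \gamma R(s_1) + \gamma^2 R(s_2) + \cdots$

The discount factor $\gamma$ is a number between 0 and 1.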
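Both definitions reduce to one function of the reward sequence, with γ = 1 giving additive rewards. This hypothetical helper, `history_utility`, reproduces the earlier 10-step example (utility 0.6) and shows the same history under an assumed γ = 0.9.

```python
def history_utility(rewards, gamma=1.0):
    """U_h([s0, s1, ...]) = sum over t of gamma**t * R(s_t).

    gamma = 1 gives additive rewards; 0 < gamma < 1 gives discounted rewards."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))


# 10 steps with R(s) = -0.04, then the +1 terminal state.
history = [-0.04] * 10 + [1.0]
print(round(history_utility(history), 4))       # 0.6
print(round(history_utility(history, 0.9), 4))  # 0.0881
```

Discounting shrinks the +1 at the end to 0.9^10 ≈ 0.349, so a distant reward counts for much less than under additive rewards.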


Optimality in sequential decision problems

What if there is no terminal state, or the agent never reaches one?

  • If the environment doesn't contain a terminal state, or if the agent never reaches one, then

    • all environment histories will be infinitely long, and utilities with additive rewards will generally be infinite.


Optimality in sequential decision problems

Solution

  • With discounted rewards, the utility of an infinite sequence is finite if the rewards are bounded by $R_{\max}$ and $\gamma < 1$:

    $$U_h([s_0, s_1, \ldots]) = \sum_{t=0}^{\infty} \gamma^t R(s_t) \le \sum_{t=0}^{\infty} \gamma^t R_{\max} = \frac{R_{\max}}{1-\gamma}$$
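A quick numeric check of this bound, as a sketch with assumed values γ = 0.9 and R_max = 1: a history that earns R_max at every step approaches R_max/(1 - γ) = 10, and a random history with rewards in [-R_max, R_max] stays within it. `discounted_sum` is an illustrative name, and a long finite prefix stands in for the infinite sequence.

```python
import random


def discounted_sum(rewards, gamma):
    """sum_t gamma**t * R(s_t) over a finite prefix of a history."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))


gamma, r_max = 0.9, 1.0
bound = r_max / (1 - gamma)

# A history that always earns R_max approaches the bound from below.
best_case = discounted_sum([r_max] * 1000, gamma)
print(round(best_case, 6), round(bound, 6))  # 10.0 10.0

# Any history whose rewards stay in [-R_max, R_max] cannot exceed it in magnitude.
rng = random.Random(42)
rewards = [rng.uniform(-r_max, r_max) for _ in range(1000)]
print(abs(discounted_sum(rewards, gamma)) <= bound)
```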


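Optimal Policies for utilities of states

  • Expected utility obtained by executing a policy π starting in state s, where $S_t$ is the (random) state the agent reaches at time t and $S_0 = s$:

    $U^{\pi}(s) = E\left[\sum_{t=0}^{\infty} \gamma^t R(S_t)\right]$

  • The optimal policy π* has the highest expected utility and is given by:

    $\pi^*_s = \operatorname{argmax}_{\pi} U^{\pi}(s)$

  • Using the true utility $U(s) = U^{\pi^*}(s)$, the optimal action in each state is:

    $\pi^*(s) = \operatorname{argmax}_{a \in A(s)} \sum_{s'} P(s' \mid s, a)\, U(s')$

  • This sets π*(s) to the action a in A(s) that gives the highest expected utility.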
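The rule that sets π*(s) to the action with the highest expected utility is a one-step look-ahead. The sketch below applies it to utilities for the 4×3 world with R(s) = -0.04 and γ = 1, using the values reported in AIMA (Russell and Norvig) rounded to three decimals; those numbers, the grid encoding with an obstacle at (2,2), and the helper names are assumptions, not taken from these slides.

```python
from collections import defaultdict

# Assumed layout (AIMA 4x3 world): columns 1..4, rows 1..3, obstacle at (2, 2).
WALL = (2, 2)
TERMINALS = {(4, 3): 1.0, (4, 2): -1.0}
STATES = {(x, y) for x in range(1, 5) for y in range(1, 4)} - {WALL}
MOVES = {"Up": (0, 1), "Down": (0, -1), "Left": (-1, 0), "Right": (1, 0)}
SIDEWAYS = {"Up": ("Left", "Right"), "Down": ("Left", "Right"),
            "Left": ("Up", "Down"), "Right": ("Up", "Down")}


def move(s, direction):
    """Deterministic move; bumping into the obstacle or an edge leaves s unchanged."""
    dx, dy = MOVES[direction]
    nxt = (s[0] + dx, s[1] + dy)
    return nxt if nxt in STATES else s


def transitions(s, a):
    """P(s' | s, a): 0.8 in the intended direction, 0.1 at each right angle."""
    probs = defaultdict(float)
    for direction, p in ((a, 0.8), (SIDEWAYS[a][0], 0.1), (SIDEWAYS[a][1], 0.1)):
        probs[move(s, direction)] += p
    return dict(probs)


# Assumed utilities (AIMA values for R(s) = -0.04, gamma = 1, rounded).
U = {(1, 3): 0.812, (2, 3): 0.868, (3, 3): 0.918, (4, 3): 1.0,
     (1, 2): 0.762, (3, 2): 0.660, (4, 2): -1.0,
     (1, 1): 0.705, (2, 1): 0.655, (3, 1): 0.611, (4, 1): 0.388}


def best_action(s, U):
    """pi*(s) = argmax over a of sum_s' P(s' | s, a) * U(s')."""
    return max(MOVES, key=lambda a: sum(p * U[s2] for s2, p in transitions(s, a).items()))


policy = {s: best_action(s, U) for s in STATES if s not in TERMINALS}
for y in (3, 2, 1):
    print(" ".join(policy.get((x, y), "-").ljust(5) for x in range(1, 5)))
```

Note that in (3,1) the look-ahead prefers Left (the long way round) over Up, because Up risks slipping sideways toward the -1 exit.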


Optimal Policies for utilities of states

  • The optimal policy is actually independent of the start state:

    • the sequence of actions taken will differ from one start state to another, but the policy itself never changes;

    • this comes from the nature of a Markov decision problem with discounted utilities over infinite horizons.

  • The utility U(s) is therefore also independent of the start state: it depends only on the state s itself.


Optimal Policies for utilities of states

The utilities are higher for states closer to the +1 exit, because fewer steps are required to reach it.



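Value Iteration Algorithm

  • The utilities are hard to calculate directly

    • because the Bellman equations, one per state, $U(s) = R(s) + \gamma \max_{a \in A(s)} \sum_{s'} P(s' \mid s, a)\, U(s')$, are nonlinear (the max operator is not linear), so we use an iterative algorithm.

  • Basic idea

    • Start with an initial value for all states, then

    • update each state from its neighbours using the Bellman update $U_{i+1}(s) \leftarrow R(s) + \gamma \max_{a \in A(s)} \sum_{s'} P(s' \mid s, a)\, U_i(s')$ until the values reach equilibrium.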
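A minimal sketch of the basic idea (start from initial values and repeatedly update each state from its neighbours), assuming γ = 1, R(s) = -0.04, the AIMA 4×3 layout with an obstacle at (2,2), and a fixed stopping tolerance in place of the ε-based test, which needs γ < 1. All names are illustrative. With these settings the utilities should come out close to those usually shown for this world, for example about 0.705 for the start state (1,1), and they grow toward the +1 exit.

```python
from collections import defaultdict

# Assumed layout (AIMA 4x3 world): columns 1..4, rows 1..3, obstacle at (2, 2).
WALL = (2, 2)
TERMINALS = {(4, 3): 1.0, (4, 2): -1.0}
STATES = {(x, y) for x in range(1, 5) for y in range(1, 4)} - {WALL}
MOVES = {"Up": (0, 1), "Down": (0, -1), "Left": (-1, 0), "Right": (1, 0)}
SIDEWAYS = {"Up": ("Left", "Right"), "Down": ("Left", "Right"),
            "Left": ("Up", "Down"), "Right": ("Up", "Down")}


def move(s, direction):
    """Deterministic move; bumping into the obstacle or an edge leaves s unchanged."""
    dx, dy = MOVES[direction]
    nxt = (s[0] + dx, s[1] + dy)
    return nxt if nxt in STATES else s


def transitions(s, a):
    """P(s' | s, a): 0.8 in the intended direction, 0.1 at each right angle."""
    probs = defaultdict(float)
    for direction, p in ((a, 0.8), (SIDEWAYS[a][0], 0.1), (SIDEWAYS[a][1], 0.1)):
        probs[move(s, direction)] += p
    return dict(probs)


def q_value(U, s, a):
    """Expected utility of the successor state: sum_s' P(s' | s, a) * U(s')."""
    return sum(p * U[s2] for s2, p in transitions(s, a).items())


def value_iteration(gamma=1.0, step_reward=-0.04, tol=1e-12, max_iter=10_000):
    """Apply the Bellman update to every state until the largest change is below tol."""
    U = {s: 0.0 for s in STATES}  # initial values
    for _ in range(max_iter):
        new_U = {s: (TERMINALS[s] if s in TERMINALS
                     else step_reward + gamma * max(q_value(U, s, a) for a in MOVES))
                 for s in STATES}
        delta = max(abs(new_U[s] - U[s]) for s in STATES)
        U = new_U
        if delta < tol:  # equilibrium reached (to numerical tolerance)
            break
    return U


U = value_iteration()
for y in (3, 2, 1):
    print(" ".join(f"{U[(x, y)]:6.3f}" if (x, y) in U else "  ####" for x in range(1, 5)))
```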





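Value Iteration Algorithm

  • When to terminate?

    • When the Bellman update is small, so that the error compared with the true utility function is also small ($\|\cdot\|$ is the maximum over all states):

      if $\|U_{i+1} - U_i\| < \epsilon(1-\gamma)/\gamma$ then $\|U_{i+1} - U\| < \epsilon$

  • Why use the threshold $\epsilon(1-\gamma)/\gamma$, with $\epsilon = c R_{\max}$?

    • Recall: if $\gamma < 1$ and the horizon is infinite, then every utility $U_h$ is bounded by $R_{\max}/(1-\gamma)$.

    • The Bellman update is a contraction by a factor $\gamma$, so $\|U_{i+1} - U\| \le \frac{\gamma}{1-\gamma}\|U_{i+1} - U_i\|$; a change below $\epsilon(1-\gamma)/\gamma$ therefore guarantees an error below $\epsilon$.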
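The termination test can be checked empirically. This sketch assumes γ = 0.9, ε = 10^-3, R(s) = -0.04, and the AIMA 4×3 layout with an obstacle at (2,2); it stops as soon as the max-norm change drops below ε(1 - γ)/γ, then compares the result with a far more converged reference. The guarantee says the error must be below ε. Function names are illustrative.

```python
from collections import defaultdict

# Assumed layout (AIMA 4x3 world): columns 1..4, rows 1..3, obstacle at (2, 2).
WALL = (2, 2)
TERMINALS = {(4, 3): 1.0, (4, 2): -1.0}
STATES = {(x, y) for x in range(1, 5) for y in range(1, 4)} - {WALL}
MOVES = {"Up": (0, 1), "Down": (0, -1), "Left": (-1, 0), "Right": (1, 0)}
SIDEWAYS = {"Up": ("Left", "Right"), "Down": ("Left", "Right"),
            "Left": ("Up", "Down"), "Right": ("Up", "Down")}


def move(s, direction):
    """Deterministic move; bumping into the obstacle or an edge leaves s unchanged."""
    dx, dy = MOVES[direction]
    nxt = (s[0] + dx, s[1] + dy)
    return nxt if nxt in STATES else s


def transitions(s, a):
    """P(s' | s, a): 0.8 in the intended direction, 0.1 at each right angle."""
    probs = defaultdict(float)
    for direction, p in ((a, 0.8), (SIDEWAYS[a][0], 0.1), (SIDEWAYS[a][1], 0.1)):
        probs[move(s, direction)] += p
    return dict(probs)


def q_value(U, s, a):
    return sum(p * U[s2] for s2, p in transitions(s, a).items())


def bellman_update(U, gamma, step_reward=-0.04):
    """One Bellman update applied to every state; terminals keep their reward."""
    return {s: (TERMINALS[s] if s in TERMINALS
                else step_reward + gamma * max(q_value(U, s, a) for a in MOVES))
            for s in STATES}


def max_norm(U, V):
    """||U - V||: the largest absolute difference over all states."""
    return max(abs(U[s] - V[s]) for s in STATES)


def value_iteration_eps(gamma, eps):
    """Stop once ||U_{i+1} - U_i|| < eps * (1 - gamma) / gamma and return U_{i+1}."""
    U, iterations = {s: 0.0 for s in STATES}, 0
    while True:
        new_U = bellman_update(U, gamma)
        iterations += 1
        if max_norm(new_U, U) < eps * (1 - gamma) / gamma:
            return new_U, iterations
        U = new_U


gamma, eps = 0.9, 1e-3
U_approx, n = value_iteration_eps(gamma, eps)

# Reference utilities: keep iterating long after the stopping test fired.
U_ref = U_approx
for _ in range(500):
    U_ref = bellman_update(U_ref, gamma)

print(n, max_norm(U_approx, U_ref) < eps)
```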


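Policy iteration

  • The policy iteration algorithm alternates two steps:

  • Policy evaluation: given a policy $\pi_i$, calculate $U_i = U^{\pi_i}$, the utility of each state if $\pi_i$ were to be executed.

  • Policy improvement: calculate a new policy $\pi_{i+1}$, choosing in each state the action that maximizes expected utility according to $U_i$ (one-step look-ahead).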


Policy Iteration

  • Algorithm

    start with a policy $\pi_0$

    repeat

      Policy evaluation: for each state s, calculate $U_i(s)$ given by policy $\pi_i$

        • a simplified version of the Bellman update equation, with no need for the max: $U_i(s) = R(s) + \gamma \sum_{s'} P(s' \mid s, \pi_i(s))\, U_i(s')$

      set unchanged to true

      Policy improvement: for each state s

        • if the max expected utility over all actions gives a better result than $\pi_i(s)$,

        • set $\pi_{i+1}(s)$ to the better action and set unchanged to false

    until unchanged
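A sketch of the policy iteration loop above, under assumptions: the AIMA 4×3 layout with an obstacle at (2,2), R(s) = -0.04, and γ = 0.9 so that iterative policy evaluation converges even for a poor initial policy. Policy evaluation here repeats the max-free Bellman update instead of solving the linear equations exactly, which is a swapped-in simplification; a small tolerance on improvements keeps near-ties from causing endless switching. All names are illustrative.

```python
from collections import defaultdict

# Assumed layout (AIMA 4x3 world): columns 1..4, rows 1..3, obstacle at (2, 2).
WALL = (2, 2)
TERMINALS = {(4, 3): 1.0, (4, 2): -1.0}
STATES = {(x, y) for x in range(1, 5) for y in range(1, 4)} - {WALL}
MOVES = {"Up": (0, 1), "Down": (0, -1), "Left": (-1, 0), "Right": (1, 0)}
SIDEWAYS = {"Up": ("Left", "Right"), "Down": ("Left", "Right"),
            "Left": ("Up", "Down"), "Right": ("Up", "Down")}


def move(s, direction):
    """Deterministic move; bumping into the obstacle or an edge leaves s unchanged."""
    dx, dy = MOVES[direction]
    nxt = (s[0] + dx, s[1] + dy)
    return nxt if nxt in STATES else s


def transitions(s, a):
    """P(s' | s, a): 0.8 in the intended direction, 0.1 at each right angle."""
    probs = defaultdict(float)
    for direction, p in ((a, 0.8), (SIDEWAYS[a][0], 0.1), (SIDEWAYS[a][1], 0.1)):
        probs[move(s, direction)] += p
    return dict(probs)


def q_value(U, s, a):
    return sum(p * U[s2] for s2, p in transitions(s, a).items())


def policy_evaluation(pi, gamma, step_reward, tol=1e-13):
    """Iterative evaluation: the Bellman update without the max, using a = pi[s]."""
    U = {s: 0.0 for s in STATES}
    while True:
        new_U = {s: (TERMINALS[s] if s in TERMINALS
                     else step_reward + gamma * q_value(U, s, pi[s]))
                 for s in STATES}
        if max(abs(new_U[s] - U[s]) for s in STATES) < tol:
            return new_U
        U = new_U


def policy_iteration(gamma=0.9, step_reward=-0.04):
    pi = {s: "Up" for s in STATES if s not in TERMINALS}  # arbitrary pi_0
    while True:
        U = policy_evaluation(pi, gamma, step_reward)
        unchanged = True
        for s in pi:
            best = max(MOVES, key=lambda a: q_value(U, s, a))
            # Switch only on a clear improvement, so near-ties cannot cause cycling.
            if q_value(U, s, best) > q_value(U, s, pi[s]) + 1e-9:
                pi[s] = best
                unchanged = False
        if unchanged:
            return pi, U


pi, U = policy_iteration()
for y in (3, 2, 1):
    print(" ".join(pi.get((x, y), "-").ljust(5) for x in range(1, 5)))
```

When the loop stops, no state can be improved by one-step look-ahead, so the returned utilities satisfy the Bellman equations (up to the tolerances), which is what makes the final policy optimal.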