Intelligence Agents

(Chapter 2)



Presentation Transcript
An Agent in its Environment

[Figure: the AGENT receives sensor input from the ENVIRONMENT and produces action output.]

Agent Environments
  • accessible (the agent can get complete state info) vs. inaccessible (most real-world environments)
  • episodic (temporary or one-shot) vs. non-episodic (history sensitive)
    • in an episodic environment there is no link between the performance of the agent in different scenarios
    • the agent need not reason about interactions between this and future episodes
  • static (changed only by the agent) vs. dynamic
    • the physical world is highly dynamic
Environments - Episodic vs. non-episodic (sequential)
  • This sounds like reactive vs. non-reactive, BUT we are talking about the environment, not the agent.
  • Episode: one interaction
  • In episodic environments, the system behaves the same way whenever it is in the same state (known to the agent).
  • In non-episodic environments, the system appears to change its behavior over time.
  • Suppose the penalty for not doing an assignment is one of:
    1. get a zero
    2. fail the course
    3. get an incomplete
    4. replace the score with the average of the others
  • All are legal actions, but action 1 is what has always happened in the past.
  • If action 2 is taken because the system gets “upset”, the environment is history sensitive (non-episodic).
  • The system responds to the whole run, not just to the current action and state.
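The contrast between the two kinds of environment can be sketched in a few lines of Python. This is only an illustration: the function names, the state label, and the rule that the system gets "upset" after three missed assignments are assumptions made for the example, not part of the slides.

```python
def episodic_penalty(state):
    """Episodic: the response depends only on the current state."""
    return "get a zero" if state == "assignment missing" else "no penalty"


def history_sensitive_penalty(run):
    """Non-episodic: the response depends on the whole run of past states."""
    misses = sum(1 for s in run if s == "assignment missing")
    if misses == 0:
        return "no penalty"
    # Assumption for illustration: the system gets "upset" at the third miss.
    return "get a zero" if misses < 3 else "fail the course"


history = ["assignment missing"] * 3
print(episodic_penalty(history[-1]))       # same answer every time
print(history_sensitive_penalty(history))  # answer changes with the history
```

In the episodic version the third missed assignment is treated exactly like the first; in the history-sensitive version the same state produces a different response because of what came before.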
Agent Environments (cont)
  • deterministic (outcome uniquely defined) vs. non-deterministic effect
    • the agent has a limited sphere of influence
    • actions can fail to produce the desired results
    • the agent does not have complete control (influence only)
    • a sufficiently complex environment may appear non-deterministic even if it is deterministic
  • discrete (fixed, finite number of actions) vs. continuous

[chess vs. taxi driving]

vwyn (visit with your neighbor)
  • Give examples of environments which fall in each of the categories (static, deterministic, accessible, episodic, discrete).
Dynamic Environments
  • The agent must gather information to determine the state of the environment.
  • Other processes can interfere with the actions it attempts to perform, so information gathering must continue after an action has been selected.
  • Consider a dynamic wumpus world.
The most complex systems
  • inaccessible
  • non-deterministic
  • dynamic
  • continuous
  • Such environments are called open (Hewitt, 1986)
Intelligent Agents
  • reactivity
  • proactiveness
  • social ability

Purely Reactive agents are simple processing units that perceive and react to changes in their environment. Such agents do not have a symbolic representation of the world and do not use complex symbolic reasoning.

  • Advocates of reactive agent systems claim that intelligence is not a property of the individual active entity but is distributed in the system: it emerges.

Intelligence is seen as an emergent property of the entire activity of the system; the model tries to mimic the behaviour of large communities of simple living beings, such as communities of insects.

  • Building purely goal-directed systems is not hard, and neither is building purely reactive systems. It is balancing both that is hard. This is not surprising, as it is comparatively rare to find humans who do this very well.
  • Example: AlphaWolves. Each agent has a personality, but is also directed with other goals by a human participant. How do you howl at the head wolf if you are shy?
A reactive system is one that maintains an on-going interaction with its environment and responds to changes (in time for the response to be useful)
  • must make local decisions which have global consequences
  • Consider printer control. A policy may unfairly deny service over the long run, even though each decision seems appropriate in the short term. This is likely in episodic environments.
A little intelligence goes a long way.
  • Oren Etzioni (speaking about the commercial experience of NETBOT, Inc): We made our agents dumber and dumber and dumber until finally they made money!
  • NetBot’s Jango represents one of the most visible uses of agents on the Internet, in this case an application that helps users do comparison shopping on the Web.

Intentional systems are systems “whose behaviour can be predicted by the method of attributing belief, desires and rational acumen” (Dennett, 1987).

  • Dennett identifies different “grades” of intentional systems: A first order intentional system has beliefs and desires but no beliefs and desires about beliefs and desires. A second order system does.
  • first order: I desire an A in the class
  • second order: I desire that you should desire an A in the class
  • first order: I believe you are honest
  • second order: I desire to believe you are honest
  • second order: I believe you believe you are honest
  • Shoham: such a mentalistic or intentional view of agents is not just another invention of computer scientists but is a useful paradigm for describing complex distributed systems.
  • BDI architecture
Abstract Architectures for Agents
  • Assume the environment may be in any of a finite set $E$ of discrete, instantaneous states:
  • $E = \{e_0, e_1, e_2, \ldots\}$
  • Agents have a repertoire of possible actions which transform the state of the environment:
  • $Ac = \{\alpha_0, \alpha_1, \alpha_2, \alpha_3, \ldots\}$
  • A run, $r$, is a sequence of interleaved environment states and actions
    • let $R$ be the set of all such finite sequences
    • let $R^{Ac}$ be the subset of these that end in an action
    • let $R^{E}$ be the subset of these that end in an environment state
    • $R = R^{Ac} \cup R^{E}$
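As a small illustration of how the set of runs splits into the two subsets, a run can be held as a tuple that alternates states and actions, starting with a state. The tuple representation and the function names are assumptions made for this sketch.

```python
def ends_in_action(run):
    """A run (e0, a0, e1, a1, ...) starts with a state and alternates,
    so a run of even length ends in an action."""
    return len(run) % 2 == 0


def ends_in_state(run):
    """A non-empty run of odd length ends in an environment state."""
    return len(run) % 2 == 1


run = ("e0", "a0", "e1")
print(ends_in_state(run), ends_in_action(run + ("a1",)))
```

Every non-empty run falls into exactly one of the two subsets, which is why the full set of runs is their union.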
State Transformer Functions
  • Recall that $\mathcal{P}(\{1,2,3\})$ is the set of all subsets of $\{1,2,3\}$:
  • $\{\{\},\{1\},\{2\},\{3\},\{1,2\},\{1,3\},\{2,3\},\{1,2,3\}\}$
  • The state transformer function $\tau$ (tau) represents the behavior of the environment:
  • $\tau : R^{Ac} \to \mathcal{P}(E)$
  • Note that the result of applying $\tau$ is non-deterministic (and hence goes to the power set of environment states)
  • Note that environments are
    • history dependent (dependent on the whole run)
    • non-deterministic (goes to the power set)
  • If $\tau(r) = \emptyset$, then there are no possible successor states to $r$. The system has ended the run.
  • Formally, an environment $Env$ is a triple $Env = \langle E, e_0, \tau \rangle$ where $E$ is a set of environment states, $e_0 \in E$ is the initial state, and $\tau$ is the state transformer function.
  • An agent maps runs (ending in an environment state) into actions:

$Ag : R^{E} \to Ac$

Let $\mathcal{AG}$ be the set of all such agents, which perform actions based on the entire history of the system. Notice that the agent is deterministic (even though the environment is not).
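A minimal sketch of these definitions, using a made-up coin-flipping environment: the state and action names (`start`, `flip`, `stop`, `heads`, `tails`) are invented for illustration. The transformer returns a set of possible successor states, with the empty set signalling the end of the run, while the agent returns exactly one action.

```python
from typing import FrozenSet, Tuple

Run = Tuple[str, ...]  # interleaved states and actions: (e0, a0, e1, a1, ...)


def tau(run: Run) -> FrozenSet[str]:
    """Toy state transformer: maps a run ending in an action to the set
    of possible next states. An empty set means the run has ended."""
    last_action = run[-1]
    if last_action == "flip":
        return frozenset({"heads", "tails"})  # non-deterministic outcome
    if last_action == "stop":
        return frozenset()                    # no successor states
    raise ValueError(f"unknown action: {last_action}")


def agent(run: Run) -> str:
    """Deterministic agent: maps a run ending in a state to one action.
    It looks at the whole history, flipping twice and then stopping."""
    return "flip" if run.count("flip") < 2 else "stop"


# The environment as a triple (set of states, initial state, transformer).
env = (frozenset({"start", "heads", "tails"}), "start", tau)
print(agent(("start",)), sorted(tau(("start", "flip"))))
```

The asymmetry is visible in the return types: `tau` yields a set of states, `agent` yields a single action.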

For Wumpus World:
  • $e_0$ = initial board, agent at (1,1), arrows = 1, points = 0
  • Actions = north, south, east, west, shoot N/S/E/W
  • $\tau$ produces the set of states that are possible after a run ending in an action
    • location could have changed
    • arrows could have changed
    • points could have changed
    • knowledge could have changed
  • If deterministic, how many next states are possible?
  • A system is a pair containing an agent and an environment.
  • The set of runs of agent $Ag$ in environment $Env$ is $R(Ag, Env)$
  • We assume $R(Ag, Env)$ contains only terminated runs.
  • A sequence $(e_0, \alpha_0, e_1, \alpha_1, e_2, \alpha_2, \ldots)$ represents a run of an agent $Ag$ in environment $Env = \langle E, e_0, \tau \rangle$ where
    • $e_0$ is the initial state of $Env$
    • $\alpha_0 = Ag(e_0)$
    • for $u > 0$,
      • $e_u \in \tau((e_0, \alpha_0, e_1, \alpha_1, \ldots, \alpha_{u-1}))$: each environment state comes from the set of possible results.
      • $\alpha_u = Ag((e_0, \alpha_0, e_1, \alpha_1, \ldots, \alpha_{u-1}, e_u))$: the action is what the agent would do, given the run so far.
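The definition of a run can be turned into a small enumeration procedure. The sketch below uses a toy environment and agent (the names `go`, `halt`, `a`, `b`, and `start` are invented for illustration) and collects every terminated run by alternately asking the agent for an action and the state transformer for the possible next states.

```python
def tau(run):
    """Toy transformer: after "go" the next state is "a" or "b";
    after "halt" there are no successors."""
    return {"a", "b"} if run[-1] == "go" else set()


def agent(run):
    """Toy agent: acts once, then halts after seeing a second state."""
    states_seen = (len(run) + 1) // 2
    return "go" if states_seen < 2 else "halt"


def terminated_runs(agent, tau, e0):
    """Enumerate all terminated runs of `agent` starting from state e0."""
    finished = []
    frontier = [(e0,)]
    while frontier:
        run = frontier.pop()
        run = run + (agent(run),)   # the agent picks its action from the run so far
        successors = tau(run)       # the environment offers a set of next states
        if not successors:
            finished.append(run)    # no successor states: the run has ended
        for e in sorted(successors):
            frontier.append(run + (e,))
    return finished


for r in sorted(terminated_runs(agent, tau, "start")):
    print(r)
```

Because the environment is non-deterministic, one deterministic agent gives rise to several runs; here there are two, one per possible outcome of `go`.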
Purely Reactive Agents
  • Some agents decide what to do without reference to their history
  • $action : E \to Ac$ (not dependent on the run)
  • A thermostat is purely reactive:
  • $action(e) = \text{off}$ if $e$ is okay, and $\text{on}$ otherwise
  • Now introduce a perception system:
  • the see function is the agent's ability to observe the environment
  • the action function represents the agent’s decision-making process
  • The output of see is a percept: $see : E \to Per$
  • $action : Per^{*} \to Ac$ (maps a sequence of percepts into actions)
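The thermostat example, split into perception and action selection, might look as follows. The numeric setpoint and the percept names are assumptions made for this sketch; the slides only say that the heater is off when the temperature is okay and on otherwise.

```python
from typing import List

SETPOINT_C = 20.0  # assumed comfort threshold, not taken from the slides


def see(e: float) -> str:
    """Perception: map an environment state (a temperature) to a percept."""
    return "okay" if e >= SETPOINT_C else "cold"


def action(percepts: List[str]) -> str:
    """Action selection over a percept sequence. A purely reactive agent
    ignores everything except the most recent percept."""
    return "off" if percepts[-1] == "okay" else "on"


percepts = [see(t) for t in (22.5, 21.0, 18.0)]
print(action(percepts))
```

Although `action` accepts the whole percept sequence, only the last element influences the result, which is what makes this agent purely reactive.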






Mars explorer

  • Mars explorer (L. Steels)
    • objective
      • to explore a distant planet and, in particular, to collect samples of a precious rock
      • the location of the samples is not known in advance, but it is known that they tend to be clustered
      • mother ship broadcasts a radio signal
        • weakens with distance
      • no map available
      • collaborative

[Figure labels: mother ship, autonomous vehicle, precious rock]

Mars explorer (cont.)
  • single explorer solution:
    • behaviours / rules

1. if obstacle then change direction

2. if carrying samples and at base then drop them

3. if carrying samples and not at base then travel toward ship

4. if detect sample then pick it up

5. if true then walk randomly

    • total order relation
        • 1 < 2 < 3 < 4 < 5
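One way to read the rule list together with its ordering is as a priority scheme: rules are tried from 1 to 5 and the first one whose condition holds determines the action. The sketch below assumes that reading; the percept keys (`obstacle`, `carrying`, `at_base`, `sample_detected`) are names invented for the example.

```python
# Rules in priority order (rule 1 first). Each rule is (condition, action).
RULES = [
    (lambda p: p["obstacle"], "change direction"),
    (lambda p: p["carrying"] and p["at_base"], "drop samples"),
    (lambda p: p["carrying"] and not p["at_base"], "travel toward ship"),
    (lambda p: p["sample_detected"], "pick up sample"),
    (lambda p: True, "walk randomly"),
]


def select_action(percept):
    """Return the action of the first rule whose condition fires;
    lower-priority rules are suppressed."""
    for condition, act in RULES:
        if condition(percept):
            return act


percept = {"obstacle": False, "carrying": True,
           "at_base": False, "sample_detected": True}
print(select_action(percept))  # rule 3 wins over rule 4
```

The final rule has the condition `True`, so some action is always selected even when nothing else applies.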
Mars explorer (cont.)
  • multiple explorer solution ?
    • think about it …
    • if one agent found a cluster of rocks – communicate ?
        • range ?
        • position ?
        • how to deal with such messages ? may be far off …
    • indirect communication:
      • each agent carries “radioactive crumbs”, which can be dropped, picked up and detected by passing robots
      • communication via environment is called stigmergy
example – Mars explorer (cont.)
  • solution inspired by ant foraging behaviour
      • an agent creates a “trail” of radioactive crumbs back to the mother ship whenever it finds a rock sample
      • if another agent comes across a trail, it can follow it to the sample cluster
  • refinement:
      • agents following a trail to the samples pick up some crumbs, making the trail fainter
      • a trail leading to an empty cluster will eventually be removed
Subsumption Architecture: example – Mars explorer (cont.)
  • modified rule set
    1. if detect an obstacle then change direction
    2. if carrying samples and at the base then drop samples
    3. if carrying samples and not at the base then drop 2 crumbs and travel toward ship
    4. if detect a sample then pick up sample
    5. if sense crumbs then pick up 1 crumb and travel away from ship
    6. if true then move randomly (nothing better to do)
  • order relation: 1 < 2 < 3 < 4 < 5 < 6
  • achieves near-optimal performance in many situations
  • cheap and robust solution (the loss of a single agent is not critical)
  • L. Steels argues that (deliberative) agents are “entirely unrealistic” for this problem.
Mars explorer (cont.)
  • advantages
    • simple
    • economic
    • computationally tractable
    • robust against failure
  • disadvantages
    • agents act short-term since they use only local information
    • no learning
    • how to engineer such agents? Difficult if more than 10 rules interact
    • no formal tools to analyse and predict
Agents with State (NOT reactive)
  • We now consider agents that maintain state.
Agents with State
  • These agents have some internal data structure, which is typically used to record information about the environment state and history. Let $I$ be the set of all internal states of the agent.
  • The perception function see for a state-based agent is unchanged:
  • $see : E \to Per$
  • The action-selection function action is now defined as a mapping
  • $action : I \to Ac$
  • from internal states to actions. An additional function next is introduced, which maps an internal state and a percept to an internal state:
  • $next : I \times Per \to I$
Agent Control Loop
  1. Agent starts in some initial internal state $i_0$
  2. Observes its environment state $e$, and generates a percept $see(e)$
  3. Internal state of the agent is then updated via the next function, becoming $next(i_0, see(e))$
  4. The action selected by the agent is $action(next(i_0, see(e)))$
  5. Goto 2
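The loop can be sketched with a toy agent whose internal state counts consecutive "hot" percepts and which raises an alarm after two in a row. The threshold, the percept and action names, and the use of a finite list of environment states in place of an endless loop are all assumptions made for the example; `next_state` stands in for the slides' next function to avoid shadowing Python's built-in `next`.

```python
HOT_THRESHOLD = 25.0  # assumed value, for illustration only


def see(e):
    """Perception: environment state (a temperature) to percept."""
    return "hot" if e > HOT_THRESHOLD else "ok"


def next_state(i, percept):
    """State update: count consecutive hot percepts, reset otherwise."""
    return i + 1 if percept == "hot" else 0


def action(i):
    """Action selection from the internal state alone."""
    return "alarm" if i >= 2 else "wait"


def control_loop(i0, environment_states):
    i = i0                              # start in the initial internal state
    chosen = []
    for e in environment_states:        # observe the environment state
        i = next_state(i, see(e))       # update the internal state
        chosen.append(action(i))        # select an action, then repeat
    return i, chosen


print(control_loop(0, [30.0, 31.0, 20.0]))
```

The second reading triggers the alarm only because the first one was also hot, which a purely reactive agent could not express.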
Tasks for Agents
  • We build agents in order to carry out tasks for us
  • The task must be specified by us…
  • But we want to tell agents what to do without telling them how to do it
Utility Functions over States
  • One possibility: associate utilities with individual states — the task of the agent is then to bring about states that maximize utility
  • A task specification is a function
  • u : E Reals
  • which associates a real number with every environment state
  • How do we specify the task to be carried out? By telling the system what states we like.
  • Normally utilities express a degree of happiness; such utilities are called cardinal.
  • If we only know that one state is better than another, but not by how much, we say the utility is “ordinal” or ranked.
  • A more restricted situation is when a state is either good or bad (success or failure). This is a binary preference function or a predicate utility.
We need utilities to act in reasonable ways: Preference Patterns
  • Consider some abstract set $C$ with elements $c_i$. Thus, $C = \{c_i : i \in I\}$ where $I$ is some index set. For example, $C$ can be the set of consequences that can arise from taking an action from a particular state.
  • A preference pattern is a binary relation over $C$. The following notation is used to describe a relation between various elements of $C$ [3]:
    • $c_i \succ c_j$: $c_i$ is preferred to $c_j$.
    • $c_i \sim c_j$: the agent is indifferent between $c_i$ and $c_j$; the two elements are equally preferred.
    • $c_i \succeq c_j$: $c_i$ is at least as preferred as $c_j$.
  • A preference pattern is a linear ordering [2]. As such, it has the following properties:
    • For all $c \in C$, $c \succeq c$.
    • For all $c_i, c_j \in C$, if $c_i \succeq c_j$ and $c_j \succeq c_i$ then $c_i \sim c_j$.
    • For all $c_i, c_j, c_k \in C$, if $c_i \succeq c_j$ and $c_j \succeq c_k$ then $c_i \succeq c_k$.
    • For all $c_i, c_j \in C$, either $c_i \succeq c_j$ or $c_j \succeq c_i$.
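The ordering properties can be checked mechanically for a small set of options. In this sketch the "at least as preferred as" relation is passed in as a Python function; the function name, the example options, and the numeric ranks are all invented for illustration. Indifference is taken to mean that the relation holds in both directions, so only reflexivity, transitivity, and completeness need testing.

```python
from itertools import product


def is_linear_preference(C, weakly_prefers):
    """Check that `weakly_prefers(a, b)` ("a is at least as preferred
    as b") is reflexive, transitive, and complete over the options C."""
    reflexive = all(weakly_prefers(c, c) for c in C)
    complete = all(weakly_prefers(a, b) or weakly_prefers(b, a)
                   for a, b in product(C, repeat=2))
    transitive = all(weakly_prefers(a, c)
                     for a, b, c in product(C, repeat=3)
                     if weakly_prefers(a, b) and weakly_prefers(b, c))
    return reflexive and complete and transitive


# A ranking by numbers always yields a valid preference pattern.
rank = {"tea": 2, "coffee": 2, "water": 1}
print(is_linear_preference(rank, lambda a, b: rank[a] >= rank[b]))

# A cyclic relation (rock beats scissors beats paper beats rock) does not.
beats = {("rock", "scissors"), ("scissors", "paper"), ("paper", "rock")}
print(is_linear_preference(["rock", "paper", "scissors"],
                           lambda a, b: a == b or (a, b) in beats))
```

The cyclic example fails on transitivity, which is the kind of preference pattern that cannot be represented by any real-valued utility.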
Axioms of Utility Functions

The Utility Theorem simply says that if an agent has a preference relation that satisfies the axioms of preference, then a real-valued utility function can be constructed that reflects this preference relation.

The notation $[p, A;\ 1-p, B]$ denotes a lottery where, with probability $p$, the option $A$ is won and, with probability $1 - p$, the option $B$ is won.

Constructing the Utility
  • Example. You are graduating from college soon, and you have four job offers: one from Microsoft (as a programmer), one from McDonald’s (as a hamburger maker), one from Walmart (as a checkout clerk), and one from Sun (as a tester). Suppose that your preferences are as follows:
  • Microsoft $\succ$ Sun $\succ$ Walmart $\succ$ McDonald’s.
  • Construct a utility that represents this preference pattern.
  • The first step in creating U is to assign a real number to the most and least preferred options.

Example continued. In the set of possible jobs, Microsoft is most preferred and McDonald’s is least preferred. Suppose that I choose the following values for each option:

  • U(Microsoft) = 100.0, U(McDonald’s) = 1.0.
  • By the continuity property, we know that there exists a p such that the agent is indifferent between an option and the lottery where the most preferred option is received with probability p and the least preferred option is received with probability 1 − p. For all options, identify this p. For each option, A, assign the utility of A as follows:
  • U(A) = pU(Most Preferred) + (1 − p)U(Least Preferred).
  • By finding the value p you are essentially identifying how strongly you feel about the option.

Suppose for Sun we let p = .9.

  • U(Sun) = .9*100 + .1*1 = 90.1
  • Suppose for Walmart we let p = .2.
  • U(Walmart) = .2*100 + .8*1 = 20.8
  • Since Sun is preferred to Walmart, its p is higher.
  • Note that we are indifferent between
  • [.9, Microsoft; .1, McDonald’s] and working at Sun.
  • The expected utility of the two choices is the same!
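The construction can be replayed in a few lines. The endpoint utilities (100.0 and 1.0) and the indifference probabilities (0.9 for Sun, 0.2 for Walmart) are the ones chosen in the example; the function and variable names are made up for this sketch.

```python
def utility_from_lottery(p, u_best, u_worst):
    """U(A) = p * U(most preferred) + (1 - p) * U(least preferred),
    where p is the probability at which the agent is indifferent
    between A and the lottery over the best and worst options."""
    return p * u_best + (1 - p) * u_worst


# Endpoint utilities chosen in the example.
U = {"Microsoft": 100.0, "McDonald's": 1.0}

# Indifference probabilities chosen in the example.
indifference_p = {"Sun": 0.9, "Walmart": 0.2}

for job, p in indifference_p.items():
    U[job] = utility_from_lottery(p, U["Microsoft"], U["McDonald's"])

ranking = sorted(U, key=U.get, reverse=True)
print({job: round(u, 1) for job, u in U.items()})
print(ranking)
```

Sorting by the constructed utilities recovers the original preference order, which is the sense in which the utility function represents the preference pattern.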
  • Difficulties with utility-based approaches:
    • where do the numbers come from?
    • we don’t always think in terms of utilities!
    • hard to formulate tasks in these terms
    • exponentially many states: extracting utilities may be difficult. Simple additive utilities don’t express substitutes or complements.
  • Advantages
    • human-like: maximize pleasure
    • aids reuse: change rewards, get new behavior
    • flexible: can adapt to change in environment (new opportunity, option becomes less advantageous)
Expected Utility & Optimal Agents
  • Write $P(r \mid Ag, Env)$ to denote the probability that run $r$ occurs when agent $Ag$ is placed in environment $Env$, noting that results are non-deterministic: we don’t know what state will result from our action.
  • Note that the sum over all runs is 1:
  • $\sum_{r \in R(Ag, Env)} P(r \mid Ag, Env) = 1$
Expected Utility (on average utility) & Optimal Agents
  • Then the optimal agent $Ag_{opt}$ in an environment $Env$ is the one that maximizes expected utility:
  • $Ag_{opt} = \arg\max_{Ag \in \mathcal{AG}} \sum_{r \in R(Ag, Env)} u(r)\, P(r \mid Ag, Env)$
  • arg max returns the argument (here, the agent) which maximizes the formula; $u(r)$ denotes the utility assigned to run $r$
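A small numerical sketch of choosing the agent with the highest expected utility follows. The two agents, their run probabilities, and the utilities are invented for illustration, and the utility of a run is assumed here to be the utility of its final state.

```python
def expected_utility(run_probs, u):
    """run_probs maps each run to its probability given the agent and
    environment; u gives the utility of a run."""
    # The probabilities of all runs of one agent must sum to 1.
    assert abs(sum(run_probs.values()) - 1.0) < 1e-9
    return sum(p * u(r) for r, p in run_probs.items())


def optimal_agent(agents, u):
    """Return the name of the agent whose expected utility is highest."""
    return max(agents, key=lambda name: expected_utility(agents[name], u))


# Hypothetical utilities over final states and run distributions.
state_utility = {"safe": 5.0, "win": 10.0, "lose": 0.0}


def u(run):
    return state_utility[run[-1]]  # assumption: a run is worth its last state


agents = {
    "cautious": {("e0", "wait", "safe"): 1.0},
    "bold": {("e0", "go", "win"): 0.6, ("e0", "go", "lose"): 0.4},
}

for name, run_probs in agents.items():
    print(name, expected_utility(run_probs, u))
print(optimal_agent(agents, u))
```

With these numbers the bold agent has the higher expected utility even though it sometimes ends in the worst state.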