Instrumental/Operant Conditioning

Instrumental/Operant Conditioning

Instrumental Conditioning involves three key elements: • a response • usually an arbitrary motor response • relevance or belongingness • an outcome (the reinforcer) • bigger reward = better conditioning • contrast effects • a relation between the response and outcome • contiguity • contingency

Contiguity • delay of reinforcement • bridge the delay with a conditioned reinforcer • marking procedure • studies of delay of reinforcement show that a perfect causal relation between the response and outcome is not sufficient to produce strong instrumental responding • even with a perfect causal relation, conditioning does • not occur if reinforcement is delayed too long • led to the conclusion that response-reinforcer • contiguity, rather than contingency, was the critical • factor

Skinner’s Superstition Experiment A landmark study in the debate about the role of contiguity versus contingency Method Food presented to pigeons every 15 s regardless of the behavior of the bird. Result Birds showed stereotyped behavior patterns as time for food delivery approached.

Skinner’s operant conditioning explanation: • Adventitious (accidental) reinforcement of the bird’s behavior • Stresses the importance of contiguity between R and the reinforcer Logic of Skinner’s explanation: • Animal could pick out which response was being reinforced • Animal was insensitive to contingency • Extinction was much weaker than acquisition

The Staddon and Simmelhag Superstition Experiment A landmark study that challenged Skinner’s interpretation Method Basically the same procedure as Skinner, except fixed time interval of 12 s, and birds were observed and their behavior recorded on all sessions.

Responding at Asymptote Fixed Time R1 Fixed Interval R1 R7 R2 Terminal Responses R8 R1 = magazine wall R2 = pecking key R3 = pecking floor R4 = ¼ circle R5 = flapping wings R6 = window wall R7 = pecking wall R8 = moving along mag wall R9 = preening R10 = beak to ceiling R11 = head in magazine R8 R3 R4 R4 R3 R1 R1 R7 R2 Interim Responses R8 R6 R7 R3 R5

Importantly, early in training,terminal responses were not contiguous with food, interim responses were; According to Skinner’s adventitious reinforcement hypothesis, interim responses should have become the terminal responses, but they did not. • thus superstitious key pecking is not due to adventitious reinforcement.

Classical Conditioning Explanation of Superstition CS US UR Time since food Food Pecking CR Pecking

Animal Behavior Terminology • Appetitive Behavior – Occurs when reinforcement not available • Consumatory Behavior – Occurs when reinforcement is about to appear

Contingency • effects of controllability of reinforcers • a strong contingency between the response and a reinforcer means the response controls the reinforcer • most of the research has focused on control over aversive stimulation • contemporary research originated with studies by Seligman and colleagues • they investigated the effects of uncontrollable shock on subsequent escape-avoidance learning in dogs • the major finding was that exposure to uncontrollable shock disrupted subsequent learning • this phenomenon called the learned-helplessness effect

The learned-helplessness effect The design for LH experiments is outlined in Table 5.2, page 153 Experiment by Seligman and Maier (1967) - demonstrated the basic LH effect - 3 groups of dogs Phase 1: Group 1: Escape - restrained and given unsignaled shock to hindfeet • could terminate the shock by pressing either of 2 panels on • either side of snout Group 2: Yoked - placed in same restraint and given same # and pattern of shocks • could not terminate the shocks by pressing the panels; they were • shocked whenever Escape animals were shocked Group 3: Control - just put in restraint

The learned-helplessness effect Phase 2: - all dogs were treated alike • put in a 2-compartment shuttlebox and taught a • normal escape/avoidance reaction • dogs could avoid shock by responding during a 10-s • warning L or escape shock once it came on by jumping • to other side of compartment • if subject did not respond in 60 s, the shock was • terminated Thus, the experiment tested whether prior inescapable shock affected escape/avoidance learning

The learned-helplessness effect Results: The Escape group learned as easily as the Control group The Yoked group showed an impairment This deficit in learning is the learned-helplessness effect

The learned-helplessness effect • the yoked group received the same number of shocks as the escape group, so the failure to learn is not simply due to having received shock in phase 1 • rather, the failure to learn was due to the inability to control shock in phase 1 • no matter which response they performed, their behavior was unrelated to shock offset in phase 1 • according to Seligman and Maier, the lack of control in phase 1 led to the development of the general expectation that behavior is irrelevant to the shock offset • this expectation of lack of control transferred to the new situation in phase 2, causing retardation of learning

Explanations of the LH effect The learned-helplessness hypothesis • based on the conclusion that animals can perceive the • contingency between their behavior and the reinforcer • so, the original theory emphasized the lack of control • over outcomes • according to this position, when the outcomes are • independent of the subject’s behavior, the subject • develops a state of learned helplessness which is • manifest in 2 ways: • there is a motivations loss indicated by a decline in • performance and heightened level of passivity • the subject has a generalized expectation that reinforcers • will continue to be independent of its behavior • this persistent belief is the cause of the future learning deficit

The LH hypothesis has been challenged by studies showing that it is not the lack of control that leads to the LH outcome, but rather the inability to predict the reinforcer • receiving predictable, inescapable shock is less damaging than receiving unsignaled shock • if inescapable shock is signaled, then see less learning deficit • if you present a cue that tells the animal the shock is coming, • then see less learning deficit • animal still can’t escape the shock (i.e., still uncontrollable), • but they know its coming • presentation of stimuli following offset of inescapable eliminates the LH deficit • this was demonstrated in an experiment by Jackson & Minor, • (1988)

Jackson & Minor (1988) 4 groups of rats Phase 1: Escape Group: • received unsignaled shock that rats could terminate by turning • a small wheel There were 2 Yoked groups Feedback Group: • house-light was turned off for a few seconds when shock ended No-Feedback Group: • no stimulus was given when shock was turned off No-Shock Control Group: Phase 2: • all rats trained in a shuttlebox where they could run to other • side to turn off shock

Results: Escape and No shock groups performed better than the Yoked Group – this is the typical LH effect Yoked/feedback group learned as well as the Escape and No shock groups

The Jackson & Minor (1988) experiment demonstrated that receiving a feedback stimulus following shock offset eliminated the typical learning deficit So, the learning deficit is not due to lack of control, as suggested by the LH Hypothesis, but rather, it is due to a lack of predictability

Explanations of the LH effect 1. The learned-helplessness hypothesis 2. Attentional deficits • inescapable shock may cause animals to pay less attention • to their actions • if an animal fails to pay attention to its behavior, it will • have difficulty associating its actions with reinforces in • escape-avoidance conditioning • if the response is marked by an external stimulus, which • helps the animal pay attention to the appropriate response, • the LH deficit is reduced • demonstrated in an experiment by Maier, Jackson & Tomie • (1987) – described on p 154-155 of your text.

Chapter 6 Schedules of reinforcement

Schedules of Reinforcement A schedule of reinforcement is a program, or rule, that determines how and when the occurrence of a response will be followed by a reinforcer The simplest schedule is a continuous reinforcement schedule, abbreviated CRF - every response = reinforcer - rare in everyday life - more common to have partial, or intermittent, reinforcement

Schedules of Intermittent Reinforcement There are 4 basic schedules of intermittent reinforcement Ratio schedules – reward is based on the number of responses Interval schedules – a response that occurs after a certain period of time has elapsed is reinforced Each of these classes is further divided into Fixed or Variable schedules

Schedules of Intermittent Reinforcement Fixed Interval Fixed Ratio Variable Ratio Variable Interval

Fixed Ratio (FR) There must be a specified number of responses to obtain each reinforcer. This number is FIXED (i.e., the same number to obtain each reinforcer) thus the reinforcer is predictable. Effect: Post-reinforcement pause (PRP; longer with higher ratios) followed by a ratio run (NB: ratio strain when very high ratios required) Example: ‘piecework’ in factories

PRP Ratio Run

Variable Ratio (VR) There must a specified number of responses made to obtain each reinforcer The specified number of responses required varies from one reinforcer to the next; Thus the reinfrocer is unpredictable. Effect: generally no PRP; steady very high rate Example: slot machines

Fixed Interval (FI) A specified interval of time must pass during which no reinforcer is delivered no matter how many responses are made Once the interval is over, then the next response produces the reinforcer The time interval is fixed (i.e., the same time interval must pass before each reinforcer), thus the reinforcer is predictable. Effect: Postreinforcement pause (PRP) followed by gradually accelerating response rate, producing the characteristic FI scallop Example: Exams spaced apart at certain intervals – study more closer to MT and Final

Variable Interval (VI) A specified interval of time must pass during which no reinforcer is delivered no matter how many responses are made Once the interval is over, then the next response produces the reinforcer The specified length of the time interval varies from one reinforcer to the next, thus the reinforcer is not predictable Effect: generally no PRP; stable rate of responding Example: time between customers in a store

Instrumental/Operant Conditioning

Instrumental/Operant Conditioning

Presentation Transcript

Instrumental or operant conditioning

Operant Conditioning

Psychology in Action (8e) by Karen Huffman

DAGs intro with exercises 3h D irected A cyclic G raph

Loneliness

Learning

Basic Principles of Operant Conditioning

Instrumental Conditioning

Myers PSYCHOLOGY

Module 22: Operant Conditioning

Instrumental Conditioning

Operant Conditioning

Operant Conditioning

Operant Conditioning

Operant Conditioning

Intro to Operant Conditioning

Operant Conditioning:

Operant Conditioning

Operant Conditioning

CHAPTER 12

Operant Conditioning

Module 10