Operant Conditioning

Chapter 13, Unit 4 Psychology Operant Conditioning

Introduction • While classical conditioning (CC) is useful for explaining learned behaviour, there are many other learned behaviours that CC cannot explain, such as behaviours that are voluntary. • Much of our learning occurs by trial and error. • We all make adjustments to our behaviour according to the outcomes or consequences it produces. • Operant conditioning is the learning that takes place as a result of these consequences.

Trial and error learning • Describes an organism’s attempts to learn, or to solve a problem, by trying alternative possibilities until a correct solution or desired outcome is achieved. • Involves a number of attempts (trials) and a number of incorrect choices (errors) before the correct behaviour is learned. • Also referred to as instrumental learning, as the individual is ‘instrumental’ in learning the correct response. • More recently however, this learning has been referred to as operant conditioning, because the individual ‘operates’ on the environment to solve a problem.

Trial and error learning cont… • It involves: • Motivation • Exploration • Incorrect and correct responses • Reward • Receiving a reward of some kind leads to the repeated performance of the correct response, strengthening the association between the behaviour and its outcome.

Thorndike’s experiments with cats • American psychologist Edward Lee Thorndike (1874-1949) undertook the first studies of trial and error learning. • He put a hungry cat in a ‘puzzle box’ and placed a piece of fish outside the box where it could be seen (and smelt) but was just outside the cat’s reach. • The cat had to learn to escape from the box by operating a latch to release a door on the side of the box. • Learning was measured by the time it took the cat to escape on consecutive trials.

The law of effect • The results of his experiments led Thorndike to develop the law of effect. • It states that a behaviour that is followed by ‘satisfying’ consequences is strengthened and a behaviour that is followed by ‘annoying’ consequences is weakened. • In the puzzle-box experiments, behaviour that enabled the cat to escape and get to the food (satisfying) was more likely to occur and behaviour that kept the cat in the box (annoying) was less likely to occur. • The cat became instrumental in obtaining its release to get the food.
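The law of effect can be illustrated with a small sketch. The Python below is only an illustration under stated assumptions: the numeric 'strength' scale from 0 to 1, the step size of 0.1, and the names `apply_law_of_effect`, `strengths` and `step` are hypothetical and do not come from Thorndike's work.

```python
def apply_law_of_effect(strengths, behaviour, satisfying, step=0.1):
    """Return a copy of `strengths` with one behaviour strengthened or weakened.

    Assumption: strength is a number between 0 and 1 that changes by a fixed
    `step`; this scale is illustrative only.
    """
    updated = dict(strengths)
    change = step if satisfying else -step
    # Clamp so the illustrative strength stays within 0..1.
    updated[behaviour] = min(1.0, max(0.0, updated[behaviour] + change))
    return updated


# Hypothetical starting strengths for two behaviours of the cat in the puzzle box.
cat = {"operate latch": 0.5, "scratch at walls": 0.5}
cat = apply_law_of_effect(cat, "operate latch", satisfying=True)      # escapes, reaches the food
cat = apply_law_of_effect(cat, "scratch at walls", satisfying=False)  # stays in the box
```

After these two updates the latch response is stronger than it was and the scratching response is weaker, which is the pattern the law of effect describes.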

Operant conditioning • The term operant conditioning (OC) was not introduced until some years after Thorndike’s experiments with cats escaping from puzzle boxes. • The term was coined by American psychologist Burrhus Skinner. • He referred to the responses observed in trial and error learning as ‘operants’. • An operant is a response that occurs and acts on the environment to produce some kind of effect. • OC is based on the principle that an organism will tend to repeat behaviours that have desirable consequences, or that will enable it to avoid undesirable consequences. • Organisms will tend not to repeat behaviours that have undesirable consequences.

Burrhus Skinner (1904-1990) • Began his own experiments in the 1930s, and used the term operant conditioning to emphasise that animals and people learn to operate on the environment to produce desired consequences. • He also contrasted operants with respondents in CC. • Respondents are behaviours elicited by known or recognised stimuli (e.g. the meat powder making the dog salivate in Pavlov’s experiment). • He believed that all behaviour can be explained by the relationships between the behaviour, its antecedents (the events that precede or come before it), and its consequences. • He argued that any behaviour that is followed by a consequence will change in strength and frequency depending on the nature of that consequence.

The Skinner box • Skinner created an apparatus called a Skinner box, which is a small chamber in which an experimental animal learns to make a particular response for which the consequences can be controlled by the researcher. • It is attached to a cumulative recorder which indicates how often each response is made (frequency) and the rate of response (speed).

Skinner’s experiments with rats • 1938: Skinner used the box to demonstrate OC. • 1. A hungry rat is placed in the box. • 2. It scurries around and randomly touches parts of the floor and walls. • 3. The rat accidentally presses the lever and a food pellet is released into the box. • After additional repetitions, the rat’s random acts subsided and were replaced with more consistent lever pressing. • Eventually, the rat was pressing the lever as fast as it could eat each pellet. • The pellet is the reward for the correct response. • Skinner referred to the different types of rewards as reinforcers.

Skinner’s experiment cont… • The hunger of the rats was their motivation for frantic activity. • Skinner believed that there was no need to search for internal agents to explain changes in behaviour. • This view was based on the notion that behaviour can be understood in terms of environmental or external influences.

Elements of operant conditioning: Reinforcement & Punishment

Elements of operant conditioning • Central to OC is reinforcement (reward). • A response that is rewarded is strengthened, whereas one that is punished is weakened.

Reinforcement • Reinforcement may involve receiving a pleasant stimulus or ‘escaping’ an unpleasant stimulus. • Reinforcement is applying a positive stimulus or removing a negative stimulus to subsequently strengthen or increase the likelihood of the particular response that it follows. • A reinforcer is any object or event that increases the probability that an operant behaviour will occur again. • The term reinforcer is often used interchangeably with the term reward, although they are not technically the same. • One difference is that a reward suggests an outcome that is positive, whereas a stimulus is a reinforcer only if it strengthens the preceding behaviour. • Also, a stimulus can be rewarding because it is pleasurable, but it cannot be said to reinforce unless it increases the likelihood of a response occurring. A person might enjoy eating chocolate and find it pleasurable, but chocolate cannot be considered to be a reinforcer unless it promotes or strengthens a particular response.

Schedules of reinforcement • The schedule of reinforcement is the way in which the reinforcement is delivered in experimental settings. • It influences the speed of learning and the strength of the learned response. • Reinforcement may be provided on a continuous or partial reinforcement schedule. • Continuous reinforcement is when every correct response in the early stages of learning is reinforced (the reinforcer is typically provided immediately after every correct response). • Partial reinforcement is the process of reinforcing some correct responses but not all of them. It may be delivered in a number of ways or by different ‘schedules’.

Schedules of reinforcement cont… • The term schedule of reinforcement refers to the frequency and manner in which a desired response is reinforced. • For instance, reinforcement can be given after a certain number of correct responses have been made (i.e. a ratio schedule), or after a certain amount of time has elapsed since the last reinforcer (i.e. an interval schedule). • Furthermore, reinforcement may be given on a regular basis, such as after every 6th correct response, or every 30 seconds following a correct response (that is, fixed); or it may be unpredictable (that is, variable).
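The difference between these schedules can be sketched in code. The Python below is a rough illustration only, not a model of any particular experiment: the function names `fixed_ratio`, `variable_ratio` and `fixed_interval` are hypothetical, time is assumed to be counted in seconds from the start of the session, and the variable schedule assumes the number of responses required is drawn uniformly around the stated average.

```python
import random


def fixed_ratio(n):
    """Reinforce every n-th correct response (e.g. n=6: every 6th response)."""
    count = 0

    def respond():
        nonlocal count
        count += 1
        if count == n:
            count = 0
            return True   # reinforcer delivered
        return False

    return respond


def variable_ratio(mean_n, rng):
    """Reinforce after an unpredictable number of responses averaging mean_n.

    Assumption: the required number is drawn uniformly from 1..(2*mean_n - 1).
    """
    def draw():
        return rng.randint(1, 2 * mean_n - 1)

    count, target = 0, draw()

    def respond():
        nonlocal count, target
        count += 1
        if count >= target:
            count, target = 0, draw()
            return True
        return False

    return respond


def fixed_interval(seconds):
    """Reinforce the first correct response made at least `seconds` after the
    previous reinforcer (or after the session start, taken as time 0)."""
    last = 0.0

    def respond(now):
        nonlocal last
        if now - last >= seconds:
            last = now
            return True
        return False

    return respond


fr6 = fixed_ratio(6)
fr_outcomes = [fr6() for _ in range(12)]            # True on the 6th and 12th response

fi30 = fixed_interval(30)
fi_outcomes = [fi30(t) for t in (10, 30, 45, 61)]   # responses at 10 s, 30 s, 45 s, 61 s

vr6 = variable_ratio(6, random.Random(0))
vr_total = sum(vr6() for _ in range(600))           # number of reinforcers in 600 responses
```

On the fixed schedules the learner could, in principle, predict which response will be reinforced; on the variable schedule it cannot, which is the sense of 'unpredictable' used above.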

Positive reinforcement • The food pellet in the Skinner box is a positive reinforcer for the hungry rat pressing the lever. • A positive reinforcer is a stimulus that strengthens or increases the likelihood of a desired response by providing a satisfying consequence (reward). • Positive reinforcement occurs when a positive reinforcer is given or applied after the desired response has been made.

Negative reinforcement • A negative reinforcer is any unpleasant or aversive stimulus that, when removed or avoided, strengthens or increases the likelihood of a desired response. • Negative reinforcement is the removal or avoidance of an unpleasant stimulus. It has the effect of increasing the likelihood of a response being repeated. • E.g. a Skinner box has a grid on the floor through which a mild electrical current can be passed continuously. The rat can feel the unpleasant foot shock (stimulus). When the rat presses the lever, the electric current is switched off and the mild shock is taken away. • The removal of the shock (negative reinforcer) is referred to as negative reinforcement.

Distinction between + & - reinforcers • Positive reinforcers are given and negative reinforcers are removed or avoided. • Yet because both procedures lead to desirable consequences, each procedure strengthens (reinforces) the behaviour that produced the consequence. • Examples of negative reinforcement in everyday life: • Turning off a scary video • Taking an aspirin to remove a headache • Not drink-driving for fear of losing your licence • In these examples, the removal or avoidance of the negative reinforcer provides a satisfying or desirable consequence.

A quick calculation • Positive reinforcer (+) = adding something pleasant • Negative reinforcer (-) = subtracting something unpleasant

Punishment • Punishment is the delivery of an unpleasant stimulus following a response, or the removal of a pleasant stimulus following a response. • A punisher has the same unpleasant quality as a negative reinforcer, but the punisher is given or applied, whereas the negative reinforcer is removed or avoided. • When closely associated with a response, punishment weakens the response, or decreases the probability of that response occurring again over time.
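The definitions of positive reinforcement, negative reinforcement and punishment can be summarised as a simple decision rule. The Python below is a minimal sketch of that rule; the function name `classify_consequence` and its two boolean parameters are hypothetical, chosen only for illustration.

```python
def classify_consequence(stimulus_is_pleasant: bool, stimulus_is_added: bool) -> str:
    """Classify a consequence using the definitions in these notes.

    - pleasant stimulus added     -> positive reinforcement (strengthens response)
    - unpleasant stimulus removed -> negative reinforcement (strengthens response)
    - unpleasant stimulus added, or pleasant stimulus removed
                                  -> punishment (weakens response)
    """
    if stimulus_is_pleasant and stimulus_is_added:
        return "positive reinforcement"
    if not stimulus_is_pleasant and not stimulus_is_added:
        return "negative reinforcement"
    return "punishment"


# Examples taken from the notes:
pellet = classify_consequence(stimulus_is_pleasant=True, stimulus_is_added=True)      # food pellet delivered
shock_off = classify_consequence(stimulus_is_pleasant=False, stimulus_is_added=False)  # foot shock switched off
```

Note that both kinds of reinforcement strengthen the response; only the two punishment cases weaken it.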

Factors that influence the effectiveness of reinforcement and punishment • Reinforcement is intended to increase the likelihood of a behaviour being repeated and punishment is intended to decrease the likelihood of a behaviour being repeated. • In OC, what happens after the response is performed is very important in determining the strength of learning and the rate at which it occurs. • E.g. the point in the process of OC at which the consequence is presented, the time lapse between the response and the consequence, and the appropriateness of the consequence used are all important in determining the effectiveness of reinforcement or punishment, and therefore of learning.

Order of presentation • For reinforcement and punishment to be used effectively, they must be presented after the response, never before. • This ensures that an organism learns the consequences of a particular response. • E.g. giving a child a lolly every time they use the toilet instead of their nappy while they are being toilet trained.

Timing • Reinforcement and punishment are most effective when given immediately after the response has occurred. • This allows for association between the response and the reinforcer or punisher. • It also influences the strength of the response, e.g. if there is a delay, the learning will generally be very slow to progress and in some cases may not occur at all. • This is easily controlled in a lab, but not as easily in everyday life. • E.g. there is a delay between studying hard in Year 12 and receiving your desired ENTER. • Or a detention for misbehaviour may be served more than a day after the misdemeanour.

Appropriateness • For any stimulus to be a reinforcer, it must provide a pleasing or satisfying consequence (reward) for its recipient. • Technically, it will not be known if something will act as a reinforcer until after it has been used. • Also, it cannot be assumed that a reinforcer that works in one situation will work in another. • Similarly, for any stimulus to be an appropriate punisher, it must provide a consequence that is unpleasant and therefore likely to decrease the likelihood of the undesirable behaviour. • An inappropriate punisher can have the opposite effect and produce the same consequence as a reinforcer.

Key processes in operant conditioning • Acquisition, Extinction, Spontaneous Recovery, Stimulus Generalisation and Stimulus Discrimination are involved in both CC and OC; however, the way in which these processes occur is slightly different in operant conditioning.

Acquisition • Refers to the overall learning process during which a specific response is established. • Differs from acquisition in CC, as the means by which the behaviour is acquired is different and the types of behaviours acquired through OC are usually more complex than the reflexive, involuntary responses that become learned responses in CC. • In OC, acquisition is the establishment of a response through reinforcement. • The speed at which the response is established depends on whether continuous or partial reinforcement is used. • Also, a gradual progression towards a more complex target behaviour can be achieved by reinforcing successive approximations. This is known as shaping.

Acquisition cont… • Shaping is a procedure in which reinforcement is given for any response that successively approximates and ultimately leads to the final desired response, or target behaviour. • Consequently, shaping is also known as the method of successive approximations. • Skinner used shaping in one experiment where he set a target behaviour for a pigeon to turn a complete circle in an anticlockwise direction. • Initially, he reinforced the pigeon with a food pellet, delivered through a mechanically operated door, every time it turned slightly to the left. • He then waited until the pigeon turned further to the left before reinforcing it. • By limiting the reinforcement only to those responses that gradually edged towards the target behaviour, Skinner was able to condition the pigeon to turn complete circles regularly.
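Shaping by successive approximations can be sketched as a criterion that rises each time it is met. The Python below is a simplified illustration only: the function name `shape`, the use of turn angles in degrees, and the 45-degree step are assumptions made for the example and are not details of Skinner's procedure.

```python
def shape(turn_angles, target=360, step=45):
    """Reinforce only turns that meet the current criterion, then raise it.

    turn_angles: how far (in degrees) the pigeon turned left on each response.
    Returns a list of True/False (reinforced or not) and the final criterion.
    Assumption: the criterion rises by a fixed `step` up to `target`.
    """
    criterion = step
    reinforced = []
    for angle in turn_angles:
        if angle >= criterion:
            reinforced.append(True)                     # food pellet delivered
            criterion = min(criterion + step, target)   # now require a bigger turn
        else:
            reinforced.append(False)                    # not close enough, no pellet
    return reinforced, criterion


# Hypothetical sequence of turns: a turn that earned a pellet earlier (50 degrees)
# is no longer reinforced once the criterion has moved on.
outcomes, final_criterion = shape([30, 50, 50, 100, 140])
```

The key design point is that the same response can be reinforced early in training and ignored later, which is what moves behaviour towards the target.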

Extinction • In OC, extinction is the gradual decrease in the strength or rate of a conditioned (learned) response following consistent non-reinforcement of the response. • It is said to occur when a conditioned response is no longer present. • With OC, extinction occurs over time, after reinforcement is no longer given. • E.g. when Skinner stopped reinforcing his rats or pigeons with food pellets, their conditioned response (e.g. of lever pressing or turning circles) was eventually extinguished. • Extinction is less likely to occur when partial reinforcement is used, i.e. when reinforcement does not regularly follow every correct response, as the uncertainty of the reinforcement leads to a greater tendency for the response to continue.

Spontaneous recovery • As in CC, extinction is often not permanent in OC. • After the apparent extinction of the conditioned response (CR), spontaneous recovery can occur and the organism will once again show the response in the absence of any reinforcement.

Stimulus generalisation • This occurs when the correct response is made (usually at a reduced level) to another stimulus that is similar, but not necessarily identical, to the stimulus that was present when the CR was reinforced. • E.g. the sound of a car backfiring as it goes past an athletics carnival may cause the athletes to generalise this sound to that of the starter’s pistol and respond as if the race had started.

Stimulus discrimination • In OC, stimulus discrimination occurs when an organism makes the correct response to a stimulus and is reinforced, but does not respond to any other stimulus, even when stimuli are similar (but not identical). • Skinner taught lab animals to discriminate between similar stimuli by reinforcing some responses but not others. • E.g. a pigeon in a Skinner box could be taught to discriminate between a red and a green light by reinforcing the pigeon when it pecked a target while the green light was illuminated, but not while the red one was. • Also, sniffer dogs are used in airports throughout the world to detect the smuggling of contraband items (e.g. drugs). • They have been taught this by OC.

Comparison of CC and OC • The role of the learner, the timing of the stimulus and response, and the nature of the response

Similarities of CC and OC • Acquisition • Extinction • Spontaneous recovery • Stimulus generalisation and discrimination • Both types of conditioning are achieved as a result of the repeated association of two events that follow each other closely in time. • These similarities have led some psychologists to believe that both OC and CC are variants of a single learning process. • E.g. when Little Albert learned to fear the rat, his response (trembling) was CC. But when he learned to avoid the rat by crawling away (a response that had the effect of reducing his fear), that was an example of OC.

Differences of CC & OC • OC • Emphasis on the consequences of a response. • Involves voluntary responses • CC • The behaviour of the learner does not have any environmental consequences. • Response is involuntary.

The role of the learner • In CC the learner is relatively passive when either the CS or the UCS is presented. • In OC the learner must actively operate on the environment so as to obtain the reinforcement or the punishment.

Timing of the stimulus and response • In CC the response depends on the presentation of the UCS occurring first. • In OC the presentation of the reinforcer depends on the response occurring first. • In CC, the timing of the two stimuli (CS, then UCS) produces an association between them that conditions the learner to anticipate the UCS and respond to it even if it is not presented. • In OC, the association that is conditioned is between the stimulus and the response. • In CC the timing of the two stimuli needs to be very close and the sequencing is vital: the CS must come before the UCS. • In OC, while learning generally occurs faster when the reinforcement or punishment occurs soon after the response, there can be a considerable time difference between them.

The nature of the response • In CC, the response by the learner is usually a reflexive, involuntary one. • In OC, the response by the learner is usually a voluntary one. • In CC, the response is often one involving the action of the autonomic nervous system, and the association of the 2 stimuli is often not a conscious or deliberate one. • In OC, the response is more likely to involve the central nervous system and to be conscious, intentional and often goal-directed.
