An Introduction to Operant (Instrumental) Conditioning

Citation: Huitt, W., & Hummel, J. (1997). An introduction to operant (instrumental) conditioning. Educational Psychology Interactive. Valdosta, GA: Valdosta State University. Retrieved [date] from http://www.edpsycinteractive.org/topics/behavior/operant.html




A human being fashions his consequences as surely as he fashions his goods or his dwelling. Nothing that he says, thinks or does is without consequences.
- Norman Cousins, 20th-century editor and author

The major theorists in the development of operant conditioning are Edward Thorndike, John Watson, and B. F. Skinner. This approach to behaviorism played a major role in the development of the science of psychology, especially in the United States. These theorists proposed that learning is the result of the application of consequences; that is, learners begin to connect certain responses with certain stimuli, and this connection changes the probability of the response (i.e., learning occurs).

Thorndike labeled this type of learning instrumental. Using consequences, he taught kittens to manipulate a latch (i.e., the latch served as an instrument). Skinner renamed instrumental learning operant conditioning because the term is more descriptive (i.e., in this type of learning, one is "operating" on, and is influenced by, the environment). Where classical conditioning illustrates S-->R learning, operant conditioning is often viewed as R-->S learning, since it is the consequence that follows the response that determines whether the response is likely or unlikely to occur again. It is through operant conditioning that voluntary responses are learned.

The three-term model of operant conditioning (S --> R --> S) incorporates the idea that a response cannot occur without an environmental event (i.e., an antecedent stimulus) preceding it. While the antecedent stimulus in operant conditioning does not elicit or cause the response (as it does in classical conditioning), it can influence it. When the antecedent does influence the likelihood of a response occurring, it is technically called a discriminative stimulus.

It is the stimulus that follows a voluntary response (i.e., the response's consequence) that changes the probability of the response occurring again. There are two types of consequences: positive (sometimes called pleasant) and negative (sometimes called aversive). These can be added to or taken away from the environment in order to change the probability of a given response occurring again.

General Principles

There are four major techniques or methods used in operant conditioning. They result from combining the two major purposes of operant conditioning (increasing or decreasing the probability that a specific behavior will occur in the future), the two types of stimuli used (positive/pleasant or negative/aversive), and the two actions taken (adding or removing the stimulus), as summarized in the table below.

                          Outcome of Conditioning
                    Increase Behavior          Decrease Behavior

  Positive          Positive Reinforcement     Response Cost
  Stimulus          (add stimulus)             (remove stimulus)

  Negative          Negative Reinforcement     Punishment
  Stimulus          (remove stimulus)          (add stimulus)
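
Read programmatically, the table is just a two-key lookup: the type of stimulus and the desired outcome jointly determine the technique and the action taken. Here is a minimal sketch in Python; the dictionary and its labels are our own illustration, not part of the original formulation.

    # Sketch: the contingency table above as a lookup (labels are illustrative).
    TECHNIQUES = {
        # (stimulus type, desired outcome): (technique, action)
        ("positive", "increase"): ("positive reinforcement", "add stimulus"),
        ("positive", "decrease"): ("response cost", "remove stimulus"),
        ("negative", "increase"): ("negative reinforcement", "remove stimulus"),
        ("negative", "decrease"): ("punishment", "add stimulus"),
    }

    technique, action = TECHNIQUES[("negative", "increase")]
    print(f"{technique} ({action})")  # negative reinforcement (remove stimulus)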

Schedules of Consequences

Stimuli are presented in the environment according to a schedule, of which there are two basic categories: continuous and intermittent. Continuous reinforcement simply means that the behavior is followed by a consequence each time it occurs. Intermittent schedules are based either on the passage of time (interval schedules) or the number of correct responses emitted (ratio schedules). The consequence can be delivered after the same amount of time or the same number of correct responses each time (fixed), or after an amount of time or number of correct responses that varies around a particular average (variable). This results in four classes of intermittent schedules. [Note: Continuous reinforcement is actually a specific example of a fixed ratio schedule with only one response emitted before a consequence occurs.]

1. Fixed interval -- the first correct response after a set amount of time has passed is reinforced (i.e., a consequence is delivered). The time period required is always the same.

Notice that in the context of positive reinforcement, this schedule produces a scalloping effect during learning (a dramatic dropoff in responding immediately after reinforcement). Also notice the number of behaviors observed in a 30-minute time period.

2. Variable interval -- the first correct response after a set amount of time has passed is reinforced. After the reinforcement, a new time period (shorter or longer) is set, with the periods averaging to a specific value over the sum total of trials.

Notice that this schedule reduces the scalloping effect, and the number of behaviors observed in the 30-minute time period is slightly increased.

3. Fixed ratio -- a reinforcer is given after a specified number of correct responses. This schedule is best for learning a new behavior.

Notice that behavior is relatively stable between reinforcements, with a slight pause after a reinforcement is given. Also notice that the number of behaviors observed during the 30-minute time period is larger than that seen under either of the interval schedules.

4. Variable ratio -- a reinforcer is given after a number of correct responses that varies around a particular average. After each reinforcement, the number of correct responses required for the next reinforcement changes. This schedule is best for maintaining behavior.

Notice that the number of responses per time period increases as the schedule of reinforcement is changed from fixed interval to variable interval and from fixed ratio to variable ratio.
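
To make the mechanics of the four schedules concrete, the sketch below simulates a learner who emits one correct response per second for 30 minutes and counts the consequences delivered under each schedule. The specific values (a 60-second interval, a ratio of 10, and the ranges the variable schedules draw from) are arbitrary choices for illustration, not values from the text.

    import random

    def simulate(schedule, seconds=30 * 60):
        """Count consequences delivered while a learner responds once per second."""
        delivered = 0
        responses = 0          # correct responses since the last consequence
        elapsed = 0            # seconds since the last consequence
        interval_target = 60   # variable interval re-draws this after each delivery
        ratio_target = 10      # variable ratio re-draws this after each delivery
        for _ in range(seconds):
            elapsed += 1
            responses += 1     # one correct response each second
            if schedule == "fixed interval":
                hit = elapsed >= 60
            elif schedule == "variable interval":
                hit = elapsed >= interval_target
            elif schedule == "fixed ratio":
                hit = responses >= 10
            elif schedule == "variable ratio":
                hit = responses >= ratio_target
            else:
                raise ValueError(schedule)
            if hit:
                delivered += 1
                responses = 0
                elapsed = 0
                # Variable schedules vary around an average (60 s, 10 responses).
                interval_target = random.randint(30, 90)
                ratio_target = random.randint(5, 15)
        return delivered

    for s in ("fixed interval", "variable interval",
              "fixed ratio", "variable ratio"):
        print(s, simulate(s))

Because the simulated responder is steady, the interval schedules deliver about one consequence per minute no matter how fast responding occurs, while the ratio schedules deliver one consequence per ten responses; this asymmetry is one reason ratio schedules tend to sustain higher rates of responding.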

In summary, schedules of consequences are often called schedules of reinforcement because there is only one schedule that is appropriate for administering response cost and punishment: continuous (i.e., a fixed ratio of one). In fact, certainty of the application of a consequence is the most important aspect of using response cost and punishment. Learners must know, without a doubt, that an undesired or inappropriate target behavior will be followed by removal of a positive/pleasant stimulus or the addition of a negative/aversive stimulus. Using an intermittent schedule when one is attempting to reduce a behavior may actually strengthen the behavior, certainly an unwanted end result.

Premack Principle

The Premack Principle, often called "grandma's rule," states that a high-frequency activity can be used to reinforce a low-frequency behavior. Access to the preferred activity is contingent on completing the low-frequency behavior. The high-frequency behavior to use as a reinforcer can be determined by:

  1. asking students what they would like to do;
  2. observing students during their free time; or
  3. determining what might be expected behavior for a particular age group.

Analyzing Examples of Operant Conditioning

There are five basic processes in operant conditioning: positive and negative reinforcement strengthen behavior; punishment, response cost, and extinction weaken behavior.

  1. Positive Reinforcement--the term reinforcement always indicates a process that strengthens a behavior; the word positive has two cues associated with it. First, a positive or pleasant stimulus is used in the process, and second, the reinforcer is added (i.e., "positive" as in the "+" sign for addition). In positive reinforcement, a positive reinforcer is added after a response and increases the frequency of the response.
  2. Negative Reinforcement--the term reinforcement always indicates a process that strengthens a behavior; the word negative has two cues associated with it. First, a negative or aversive stimulus is used in the process, and second, the reinforcer is subtracted (i.e., "negative" as in the "-" sign for subtraction). In negative reinforcement, the negative reinforcer is removed after the response, which increases the frequency of the response. (Note: There are two types of negative reinforcement: escape and avoidance. In general, the learner must first learn to escape before he or she learns to avoid.)
  3. Response Cost--if positive reinforcement strengthens a response by adding a positive stimulus, then response cost has to weaken a behavior by subtracting a positive stimulus. After the response, the positive stimulus is removed, which decreases the frequency of the response.
  4. Punishment--if negative reinforcement strengthens a behavior by subtracting a negative stimulus, then punishment has to weaken a behavior by adding a negative stimulus. After a response, a negative or aversive stimulus is added, which decreases the frequency of the response.
  5. Extinction--no longer reinforcing a previously reinforced response (whether it was maintained by positive or negative reinforcement) weakens the behavior; its frequency decreases.

Rules in analyzing examples. The following questions can help in determining whether operant conditioning has occurred.

a. What behavior in the example was increased or decreased?

b. Was the behavior increased (if so, the process has to be either positive or negative reinforcement) or decreased (if so, the process is response cost, punishment, or extinction)?

c. What was the consequence / stimulus that followed the behavior in the example?

d. Was the consequence / stimulus added or removed? If added, the process was either positive reinforcement or punishment. If it was subtracted, the process was either negative reinforcement or response cost. (If a previously delivered reinforcer was simply withheld, the process was extinction.)
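
These questions amount to a small decision procedure. Here is a sketch; the function and argument names are our own invention for illustration.

    def classify(behavior, consequence):
        """behavior: "strengthened" or "weakened".
        consequence: "added", "removed", or "withheld" (a previously
        delivered reinforcer no longer follows the response)."""
        if consequence == "withheld":
            return "extinction"
        if behavior == "strengthened":
            return ("positive reinforcement" if consequence == "added"
                    else "negative reinforcement")
        return "punishment" if consequence == "added" else "response cost"

    print(classify("weakened", "added"))        # punishment
    print(classify("strengthened", "added"))    # positive reinforcement
    print(classify("weakened", "removed"))      # response cost
    print(classify("strengthened", "removed"))  # negative reinforcement

The four calls correspond, in order, to worked examples a through d below.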

Examples. The following examples are provided to assist you in analyzing examples of operant conditioning.

a. Billy likes to camp out in the backyard. He camped out every Friday during the month of June. The last time he camped out, some older kids snuck up to his tent while he was sleeping and threw a bucket of cold water on him. Billy has not camped out for three weeks.

1. What behavior was changed? camping out

2. Was the behavior strengthened or weakened? weakened (eliminates positive and negative reinforcement)

3. What was the consequence? having water thrown on him

4. Was the consequence added or subtracted? added

Since a consequence was added and the behavior was weakened, the process was punishment.

b. Every time Madge raises her hand in class she is called on. She raised her hand 3 times during the first class, 3 times during the second, and 4 times during the last class.

1. What behavior was changed? hand raising

2. Was the behavior strengthened or weakened? strengthened (eliminates response cost, punishment, and extinction)

3. What was the consequence? being called on

4. Was the consequence added or subtracted? added

Since the consequence was added and the behavior was strengthened, the process is positive reinforcement.

c. Gregory is being reinforced using a token economy. When he follows a direction or command, he earns a point. At the end of each day, he can "buy" free time, TV privileges, etc. with his points. When he misbehaves or doesn't follow a command, he loses points. Gregory used to call his mom names. Since he has been on the point system, his name calling has been reduced to almost zero.

1. What behavior was changed? name calling

2. Was the behavior strengthened or weakened? weakened (eliminates positive and negative reinforcement)

3. What was the consequence? losing points

4. Was the consequence added or subtracted? subtracted

Since the consequence was subtracted and the behavior was weakened, the process is response cost.

d. John used to skip his six-month dental checkups. Instead, he waited until a tooth really hurt, then went to the dentist. After two emergency trips to the dentist, John now goes every six months.

1. What behavior was changed? going to the dentist

2. Was the behavior strengthened or weakened? strengthened (eliminates response cost, punishment, and extinction)

3. What was the consequence? tooth no longer hurting

4. Was the consequence added or subtracted? subtracted

Since the consequence was subtracted and the behavior was strengthened, the process is negative reinforcement.

Applications of Operant Conditioning to Education

Our knowledge about operant conditioning has greatly influenced educational practices. Children at all ages exhibit behavior, and teachers and parents are, by definition, behavior modifiers: if a child is behaviorally the same at the end of the academic year, the teacher has not done his or her job. Children are supposed to learn (i.e., show relatively permanent change in behavior or behavior potential) as a result of the experiences they have in the school/classroom setting.

Behavioral studies in classroom settings have established principles that help teachers organize and arrange classroom experiences to facilitate both academic and social behavior. Instruction itself has also been the focus of numerous studies, which have resulted in a variety of teaching models for educators at all levels. Programmed instruction is one such model: it requires that learning proceed in small steps, that the learner be an active (rather than passive) participant, and that immediate corrective feedback be provided at each step.
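
As a concrete illustration of those three requirements, here is a minimal sketch of a programmed-instruction loop. The frames and their wording are invented for this example; no particular program is being reproduced.

    # Small steps, an active response at each step, immediate corrective feedback.
    FRAMES = [
        ("Reinforcement makes a behavior ___ likely to occur again.", "more"),
        ("Punishment makes a behavior ___ likely to occur again.", "less"),
        ("In negative reinforcement, the stimulus that is removed is ___.",
         "aversive"),
    ]

    def run_program(frames):
        for prompt, answer in frames:
            while True:                  # stay on a step until it is mastered
                response = input(prompt + " ").strip().lower()
                if response == answer:
                    print("Correct!")    # immediate confirmation
                    break
                print(f"Not quite; the answer is '{answer}'. Try again.")

    if __name__ == "__main__":
        run_program(FRAMES)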


