Volume 6, Number 2
Submitted: September 8, 2000
Resubmitted: January 22, 2001
Accepted: January 23, 2001
Publication date: January 26, 2001
United States Military Academy at West Point
Unlike most research on reinforcement schedules, which has utilized animal subjects, this study utilized human subjects. When human subjects have been used in the past, the subjects have not usually received accurate knowledge of their prior performance. Moreover, with animals, it is probably impossible both to convey information about prior results to them and determine their awareness of it. In contrast, this operant experiment measured response times in human subjects after the subjects were accurately informed of their prior performance. Sixteen college students in an experimental psychology class were tested for reaction time in groups under different reinforcement schedules (control, 0.5, and 0.1). The six control group subjects received no knowledge of their results; the five 0.5 group subjects received knowledge of results and were reinforced when their reaction time was quicker than the median time of their previous 40 trials; and the 0.1 group (of five subjects) received knowledge of results and were reinforced when their reaction time was in the top 10% of their previous 40 trials. It was found that the relationship between reaction time and knowledge of results and reinforcement was equivocal. However, the slow reaction time of possibly one subject in the 0.5 group probably resulted in inordinately slow average reaction times that distorted the results. There remains support for the contention that reinforcement schedules improve reaction times in humans and that successive approximation remains useful.
Since Thorndike (1898), the idea of learning through reinforcement has gathered great impetus and achieved much success. However, Thorndike's learning model was considered incomplete because he manipulated and changed only behaviors that had a non-zero probability of occurring. Skinner (1938) also recognized the possibility of changing behaviors that had an initial non-zero probability of occurring. However, Skinner went further and used shaping to produce new (zero probability) behavior. His concept of shaping, or successive approximation, involved reinforcing behaviors that were successively closer to a targeted response. Behavior could be selected or extinguished by using current configurations of reinforcement contingencies in the manner in which species adapt to environmental conditions. Skinner believed that behavior could be shaped until the subject had reached her or his biological potential. Accordingly, to measure the ability of humans to adapt to different reinforcements, this study was conducted to determine humans' ability to differentiate among responses and to determine the effect that reinforcement schedules have on the reaction times of humans (see, generally, Galbicka, 1988; Galbicka, 1994).
Allman and Platt (1973) sought to refine Skinner's technique of successive approximation. They suggested that the important variable in shaping was not the degree of contact with reinforcement, but rather the usually confounded variable of selectiveness of reinforcement. Thus, Platt (1974) concluded that, to control contact and selectiveness of reinforcement in response shaping, a researcher must be able to "order" a set of behaviors with respect to response generalization. Platt based his notion on Thorndike's simple concept of contact, that is, the sense that a larger proportion of behavioral events are drawn from the reinforced class (although additional work focuses on independent variables rather than "contact;" Galbicka, 1988 & 1999).
Accordingly, Platt introduced "selectiveness." Platt defined measurable behaviors by determining what proportion of measurable behaviors emitted by a subject fell within the range of reinforced values on a pre-determined shaping dimension. Thus, any criteria that reinforce all behavioral events near the targeted behavior will represent the same degree of selectiveness. Criteria that reinforce all behavioral events, within a specific range of values, and which contain a larger proportion of current measurable behaviors, will represent both less selectiveness and a different degree of contact. Consequently, Platt developed a percentile reinforcement schedule that would adequately introduce selectiveness and control contact.
In analyzing the effects of the frequency of reinforcements, researchers have often suggested that variable or intermittent reinforcement schedules are most effective in shaping human behavior (Barnes & Keenan, 1994; Hantula, 1992; and Whitehall & McDonald, 1993) (see generally Goltz, 1992). Platt (1974) believed that by utilizing percentile reinforcement schedules a researcher could analyze variables that had been accepted as facilitating behavioral change, but had remained confusing. Reinforcement must be defined in terms of an organism's current behavior so as to control particular relationships between behavior and reinforcement criteria. Platt determined that, to achieve a maximum behavioral effect, researchers must control the proportion of measurable behavioral events by setting a specific value within the range of reinforcement values on the shaping dimension. Essentially, reinforcement and knowledge of results (after either a correct or incorrect response) can facilitate behavioral change. The present study tried to determine what reinforcement schedule would provide maximum change.
Although much research has been done on reaction time (RT), there has been less research on the effect that knowledge of results (KR) has on behavior, especially with regard to humans. Bower and Ongley (1975) have indicated that reaction time was shorter with knowledge of results than with no knowledge of results. Rychto (1973) also found that reaction time improved when subjects had full knowledge of their performance and when they received an evaluation from the experimenter. While the subjects in this study may not have acquired full knowledge of their prior performance, the subjects' current performance was based on a general, if not specific and full, knowledge of the connections between prior and current reaction times, and not just on prior performance itself. Therefore, the construction of this experiment raises but does not answer the question of whether full and less than full knowledge of results can serve as equivalent reinforcers.
This experiment utilizes human subjects, while most of the research on reinforcement schedules has utilized animals, usually rats and pigeons. In rats, a greater frequency of food delivery increases their responses (Lydersen, 1993), and their response rates peak approximately twenty minutes after the beginning of a session, regardless of the length of a session (McSweeney, 1992). Armus (1988) found that reaction times of rats can be shortened to less than 1 second when the rats are required to exhibit increasingly higher response effort, especially when a fixed-ratio reinforcement schedule is used. Like Armus (1988), Elsmore and McBride (1994), in an eight-arm radial maze study, achieved similar results with both fixed-interval and random-interval schedules of reinforcement. But Huang, Krukar, and Miles (1992) found that rats respond more frequently after receiving continuous reinforcement as opposed to partial reinforcement. Yet optimality theory suggests that periodic responding is more effective in shaping behavior than random responding (Broadbent, 1994).
However, there is little research on how percentile reinforcement schedules, combined with knowledge of results, affect reaction times in humans, and there is little research on humans and percentile schedules at all. Although some research involves humans, it does not involve reaction times (see Cohen & Blair, 1998). In one experiment involving human subjects, Hantula and Crowell (1994) found that only subjects who experienced irregular-partial reinforcement (compared with subjects receiving intermittent and continuous reinforcement) from a stock investment continued investing in a failed stock, although the issue of accuracy was not material to that study. In contrast, in the present study, the human subjects were informed correctly about their prior performance. Researchers (Neef, Mace, & Shade, 1993) have provided human subjects with access to their performance, but their research focused on only two subjects, both of whom were seriously emotionally disturbed, and their results were highly variable.
Other research, while useful, has incorporated only "gross" reinforcement. That is, researchers have administered reinforcement based on a subject's performance, but the subject was cognitively unaware of her or his actual performance. The subject may have accurately inferred from "honest" (as opposed to manipulated) positive or negative reinforcement her or his actual performance. But researchers have not both expressly and accurately informed (i.e., through knowledge of results) human subjects about their actual performance. For example, Cerutti (1994) shaped guessing among subjects but did not inform them about their guesses, and Grabitz and Hammerl (1993) analyzed sequential and quantitative constraints. In neither experiment were the subjects explicitly aware of their actual performances. Also, Williams and Johnston (1992) analyzed conjugate reinforcement schedules. In the present study, except with regard to the control group (which received no reinforcement), the subjects received at least accurate information about their prior performances.
Sixteen students from a college experimental psychology class served as subjects. They were randomly assigned to three experimental groups (6 in the control group and 5 in each of the other groups).
A standard sound-proof room served as the test room. It was illuminated by an overhead 25-watt non-glare light bulb. Each subject sat in a three-foot-high swivel chair and was administered the required trials by a Cromemco Act V computer.
The reinforcement schedule consisted of control, 0.5, and 0.1 conditions. In the control group, six subjects were administered 240 trials each day for three consecutive days. Trials were given in blocks of 60 so that means and medians were recorded for each block as well as for each day. To begin each trial, the phrase "get ready" would flash on the computer's screen. After the initial presentation of the "get ready" signal, 1300, 1500, or 1800 milliseconds (ms) would elapse, varied randomly, before a "beep" would sound. Subjects were instructed to press an appropriate key on a keyboard as quickly as possible after hearing the beep. The "get ready" would disappear 500 ms after the beep, leaving the screen blank for 3500 ms. The subjects in the control group received no reinforcement. Next, a "get ready" would signal a new trial.
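The timing of a single control-group trial, as described above, can be laid out as a short sketch. This is only an illustration under stated assumptions, not the program that ran on the Cromemco computer: the function name `control_trial_timeline`, the dictionary keys, and the choice of a uniform random draw among the three foreperiods are hypothetical (the text says only that the foreperiod "varied randomly").

```python
import random

# Values taken from the procedure described in the text (milliseconds).
FOREPERIODS_MS = (1300, 1500, 1800)  # delay between "get ready" and the beep
PROMPT_OFFSET_MS = 500               # "get ready" disappears this long after the beep
BLANK_MS = 3500                      # blank screen before the next trial


def control_trial_timeline(rng):
    """Return event times (ms from trial onset) for one control-group trial.

    Assumption: the foreperiod is drawn uniformly from the three values.
    """
    beep = rng.choice(FOREPERIODS_MS)
    get_ready_off = beep + PROMPT_OFFSET_MS
    next_trial = get_ready_off + BLANK_MS
    return {
        "get_ready_on": 0,
        "beep": beep,
        "get_ready_off": get_ready_off,
        "next_trial": next_trial,
    }


if __name__ == "__main__":
    timeline = control_trial_timeline(random.Random())
    print(timeline)
```

Under these assumptions, each control trial lasts the foreperiod plus 4000 ms, so a trial takes between 5300 and 5800 ms regardless of the subject's response.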
The 0.5 group contained five subjects and was identical to the control group, except that instead of viewing a blank screen after a response, the five 0.5 group subjects would receive results of their presses. That is, if a response was made more quickly than the median response for the 40 previous trials, the phrase "Good-That's a fast one" appeared on the screen for 1500 ms. [Until the first 40 trials were completed each day, the results were based on how many trials had been completed already.] If a subject's response was slower than the median of the previous 40 trials, then the phrase "too slow" would appear for 1500 ms. Following these results, 2000 ms would elapse before the next trial began. The comparison memory always included the 40 most recent trials (i.e., the memory continuously updated).
The 0.1 group consisted of five subjects and was identical to the 0.5 group, except that to achieve the reward of "Good--that's a fast one" on the screen, the subjects had to respond, with regard to reaction time, in the top 10% of the responses for their previous 40 trials.
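The feedback rule for the 0.5 and 0.1 groups can be sketched as a percentile criterion over a sliding window of the most recent trials. The following is a minimal illustration and not the original program. The class name `PercentileSchedule`, its method names, the rank-based comparison (which approximates the median rule for the 0.5 group), the treatment of ties, and the decision to give no reinforcement on the very first trial are all assumptions that the text does not specify.

```python
from collections import deque


class PercentileSchedule:
    """Hypothetical sketch of a percentile reinforcement criterion.

    proportion: share of recent trials a response must beat to be
                reinforced (0.5 or 0.1 in the study).
    window:     number of most recent trials kept for comparison
                (40 in the study).
    """

    def __init__(self, proportion, window=40):
        self.proportion = proportion
        # A deque with maxlen drops the oldest trial automatically,
        # so the comparison memory is continuously updated.
        self.recent = deque(maxlen=window)

    def is_reinforced(self, rt_ms):
        """Return True if rt_ms ranks in the fastest share, then store it."""
        n = len(self.recent)
        # Count stored trials that were strictly faster than this response.
        faster = sum(1 for past in self.recent if past < rt_ms)
        # Before the window is full, the criterion uses however many
        # trials are available. Assumption: no history means no reward.
        reinforced = n > 0 and faster < self.proportion * n
        self.recent.append(rt_ms)
        return reinforced


if __name__ == "__main__":
    schedule = PercentileSchedule(proportion=0.1)
    for rt in (310, 295, 330, 280, 305):  # made-up reaction times in ms
        message = "Good" if schedule.is_reinforced(rt) else "too slow"
        print(rt, message)
```

With a full window of 40 trials, this sketch reinforces a response in the 0.1 condition only when fewer than four stored trials were faster, and in the 0.5 condition only when fewer than twenty were faster.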
In all groups, if a subject anticipated the "beep" and pressed the key before the beep or within 80 ms after the beep (a response time that was considered physically impossible in this study), the phrase "you jumped the gun--prepare for the next trial" appeared, and that trial was discarded and another replaced it. The computer stored all results for future reference.
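The rule for discarding anticipatory responses and replacing them with additional trials can be sketched as follows. This is an illustration only. The function name `collect_valid_trials`, the convention that a negative reaction time means the key was pressed before the beep, and the choice to count a response at exactly 80 ms as anticipatory are assumptions.

```python
ANTICIPATION_CUTOFF_MS = 80  # from the text: responses within 80 ms were discarded


def collect_valid_trials(raw_rts_ms, n_required):
    """Keep valid reaction times until n_required have been collected.

    raw_rts_ms: reaction times in presentation order; a negative value
                stands for a key press before the beep (assumed convention).
    Returns (valid_times, number_discarded).
    """
    valid = []
    discarded = 0
    for rt in raw_rts_ms:
        if len(valid) == n_required:
            break  # enough valid trials; remaining input is not needed
        if rt <= ANTICIPATION_CUTOFF_MS:
            # "Jumped the gun": the trial is dropped and a later one replaces it.
            discarded += 1
            continue
        valid.append(rt)
    return valid, discarded


if __name__ == "__main__":
    # Made-up reaction times, not data from the study.
    times, dropped = collect_valid_trials([250, -30, 60, 81, 300, 400], 3)
    print(times, dropped)
```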
The means of the reaction times and the means of the medians of the reaction times (in milliseconds) for each group are shown in Table 1.
Table 1. Means and Means of Medians of Reaction Times (in milliseconds), by Group [table values missing from this copy]
A one-way analysis of variance was conducted on the blocks of scores. The main effect of reinforcement was not significant (F = 0.66, p > .05). Also, a one-way analysis of variance was used to compare performance on the initial block of trials with the performance on the final four blocks of trials. This difference also was not significant (F = 0.75, p > .05).
A one-way analysis of variance comparing the medians of the blocks of the groups was not significant (F = 0.49, p > .05). Additionally, a one-way analysis of variance conducted on the medians of the final four blocks of the groups and the initial blocks of the groups was not significant (F = 0.80, p > .05).
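For readers unfamiliar with the statistic, the F ratio in a one-way analysis of variance compares variability between group means with variability within groups. The following sketch shows the standard computation. It is not the analysis program used in this study, the function name `one_way_anova_f` is hypothetical, and the scores in the usage example are invented for illustration and are not the study's data.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of scores."""
    all_scores = [x for group in groups for x in group]
    grand_mean = sum(all_scores) / len(all_scores)
    group_means = [sum(group) / len(group) for group in groups]

    # Between-groups sum of squares: spread of group means around the grand mean.
    ss_between = sum(
        len(group) * (mean - grand_mean) ** 2
        for group, mean in zip(groups, group_means)
    )
    # Within-groups sum of squares: spread of scores around their own group mean.
    ss_within = sum(
        (x - mean) ** 2
        for group, mean in zip(groups, group_means)
        for x in group
    )

    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


if __name__ == "__main__":
    # Invented scores for three groups, for illustration only.
    f_ratio, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
    print(f_ratio, df_b, df_w)
```

An F ratio near or below 1, like the values reported above, indicates that the group means differ by no more than would be expected from the variability within the groups.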
The results are equivocal regarding the proposition that reinforcement schedules and successive approximation affect behavioral change. As shown in Table 1, the subjects in the 0.5 group, a response-reinforced group that received knowledge of results, took longer to respond than the subjects in the control group, who received no reinforcement and no knowledge of results. However, the 0.1 group (the group that received knowledge of results and had its responses reinforced only if a subject's response was in the top 10% of the previous 40 responses), as expected, consistently exhibited quicker reaction times than the other groups.
Rather than conclude that reinforcement schedules are invalid, it would be expedient to conduct this study using a larger number of subjects. For example, each condition (control, 0.5, and 0.1) could contain a minimum of 10 subjects. Although the analyses of variance were not significant, the control and 0.1 groups nevertheless performed as expected. Thus, the unaccountably slow reaction times of the 0.5 group appear to have caused the lack of statistical significance. This was apparently due to the inordinately slow reaction times of one subject in the 0.5 group. Furthermore, the unusually large standard deviations for the 0.5 group, shown in Table 1, indicate that even one subject with slow reaction times may have distorted the analysis.
To extend a future study further, an alternative control group might be considered. That is, in the present study, the control group received no response feedback, but the 0.1 and 0.5 groups received feedback on the speed of their responses. A new control group might receive the same frequency of feedback independent of the reaction time, although such feedback might not be characterized as "accurate" feedback in that this feedback would be received regardless of the subjects' reaction times (fast or slow). Nonetheless, such a control group would be receiving the same exposure to feedback (i.e., "contact"; Platt, 1974) as the 0.1 and 0.5 groups. This alternative control group would make it possible to compare the selective and non-selective effects of differential feedback of reaction times.
Finally, instead of thinking in terms of humans' knowledge of results, perhaps future researchers might characterize the computer messages as "reinforcers" and the procedure employed as "response feedback." However, while this characterization would utilize more traditional terminology, the characterization would be general in nature and might not help to determine the exact relationship between the responses and the consequences (knowledge of results) in the experiment. Moreover, the fact that there were no differences between reaction times on the initial and final four blocks (for both mean and median reaction times) may lead to a conclusion that practice did not affect reaction times. Although such a conclusion seems counter-intuitive, even unsupported by other research, it may be that reinforcement was not adequately presented in this experiment. It is possible that the subjects quickly reached asymptote in their reaction times.
It would be helpful to replicate this experiment with three groups (control, 0.5, and 0.1), using perhaps 18 or 21 subjects as opposed to the 16 in this experiment, to determine whether reinforcement existed and what role the computer messages served. Also, a within-subject analysis could be conducted wherein subjects could be informed randomly of their reaction times before they are informed of their reaction times on the basis of a reinforcement schedule. In a new study, these subjects might receive more material reinforcers (such as money or credit), rather than the reinforcer, "Good-That's a fast one," which is dependent solely on individual "feelings." Also, the subjects could be exposed to the reinforcers for longer periods of time to measure acquisition even more precisely.
In future experiments, each subject might be tested in each group. By putting the subjects from the 0.5 group in either the control or 0.1 group, and then comparing previous means and results of all groups, a more comprehensive conclusion might ultimately be reached. The inordinately slow responses of one subject would thereby be mitigated. Also, it would be useful simply to increase the number of subjects. Then, sharper correlations could be drawn by reporting reaction times for blocks of trials during the experiment. Comparing the improvement of the reinforced subjects with that of the control (or non-reinforced) subjects would then show whether any improvement was attributable to practice or to reinforcement.
Allman, H. D. and J. R. Platt (1973). "Differential reinforcement of interresponse times with controlled probability of reinforcement per response." Learning and Motivation, 4:40-73.
Armus, H. L. (1988). "Effect of response effort requirement on relative frequency of short interresponse times: CRF and FR-5 reinforcement schedules." Bulletin of the Psychonomic Society, 26:139-40.
Barnes, D. and M. Keenan (1994). "Response-reinforcer contiguity and human performance on simple time-based reinforcement schedules." Psychological Record, 44:63-90.
Bower, G. H. and G. C. Ongley (1975). "The effect of knowledge of results upon contingent negative variation in a reaction time situation with a variable foreperiod." Physiological Psychology, 3:257-260.
Broadbent, H. A. (1994). "Periodic behavior in a random environment." Journal of Experimental Psychology: Animal Behavior Processes, 20:156-175.
Cerutti, D. T. (1994). "Compliance with instructions: Effects of randomness in scheduling and monitoring." Psychological Record, 44:259-269.
Cohen, D. J. and C. Blair (1998). "Mental rotation and temporal contingencies." Journal of the Experimental Analysis of Behavior, 70:203-214.
Elsmore, T. F. and S. A. McBride (1994). "An eight alternative concurrent schedule: Foraging in a radial maze." Journal of the Experimental Analysis of Behavior, 61:331-348.
Galbicka, G. (1988). "Differentiating the behavior of organisms." Journal of the Experimental Analysis of Behavior, 50:343-354.
Galbicka, G. (1994). "Shaping in the 21st Century: moving percentile schedules into applied settings." Journal of Applied Behavior Analysis, 27:739-760.
Goltz, S. M. (1992). "A sequential learning analysis of decisions in organizations to escalate investments despite continuing costs or losses." Journal of Applied Behavior Analysis, 25:561-574.
Grabitz, H. and M. Hammerl (1993). "Transfer effects as a function of sequential and quantitative schedule constraints." Integrative Physiological and Behavioral Science, 28:182-185.
Hantula, D. A. (1992). "The basic importance of escalation." Journal of Applied Behavior Analysis, 25:579-583.
Hantula, D. A. and C. R. Crowell (1994). "Intermittent reinforcement and escalation processes in sequential decision making: A replication and theoretical analysis." Journal of Organizational Behavior Management, 14:7-36.
Huang, I., J. D. Krukar and S. P. Miles (1992). "Effects of reinforcement schedules on rats' choice behavior in extinction." Journal of General Psychology, 119:201-211.
Lydersen, T. (1993). "Schedule-induced timeout: Effects of timeout-contingent delayed reinforcement." Behavioural Processes, 31:323-335.
McSweeney, F. K. (1992). "Rate of reinforcement and session duration as determinants of within-session patterns of responding." Animal Learning and Behavior, 20:160-169.
Neef, N. A., F. C. Mace and D. B. Shade (1993). "Impulsivity in students with serious emotional disturbance: The interactive effects of reinforcer rate, delay, and quality." Journal of Applied Behavior Analysis, 26:37-52.
Platt, J. R. (1974). "Percentile reinforcement: paradigms for experimental analysis of response shaping." In G. H. Bower (Ed.), The Psychology of Learning and Motivation: Advances in Research and Theory (pp. 271-296). San Diego, CA: Academic Press.
Rychto, T. (1973). "Effects of knowledge of results on simple reaction time." Polish Psychological Bulletin, 4 (1):23-25.
Skinner, B. F. (1938). The Behavior of Organisms. New York: Appleton.
Thorndike, E. L. (1898). "Animal Intelligence: An experimental study of the associative processes in animals." Psychological Review Monographs, 2 (4, Whole No. 8).
Whitehall, B. V. and B. A. McDonald (1993). "Improving learning persistence of military personnel by enhancing motivation in a technological training program." Simulation and Gaming, 24:294-313.
Williams, D. C. and J. M. Johnston (1992). "Continuous versus discrete dimensions of reinforcement schedules: An integrative analysis." Journal of the Experimental Analysis of Behavior, 58(1):205-228.
Tim Bakken is an assistant professor of law at the United States Military Academy at West Point. He received J.D. and M.S. degrees from the University of Wisconsin at Madison. Tim-Bakken@usma.edu.