Nicoline Grinager Ambrose
and Ehud Yairi,
"The Tudor Study: Data and Ethics,"
American Journal of Speech-Language Pathology,
vol. 11, 190-203, May 2002
Fair Use excerpts reproduced
an additional appendix to
"Retroactive Moral Judgment and the Evolution of Ethics in Human Subjects Research:
A Case Study in Context,"
June 17, 2001
In brief, during 2001 a small, dusty 1939 master's thesis was put under a journalist's microscope and became the target for a national, supermarket-tabloid-style media blitz of criticism involving the ethics of "causing stuttering" in human research subjects.
The relevance of the Ambrose-Yairi study 63 years later is that the authors conclude, in effect, that the 1939 thesis did not prove the theory with which it is credited. In other words, the researcher did not, and could not have, "caused stuttering" in the subjects.
For this and many other reasons the authors also conclude that almost all of the ethical criticisms of the study are misplaced and unjustified.
Needless to say, the scientists who have authored this article were not known to, and were in no way encouraged by, Nicholas Johnson. Indeed, their conclusions gain in credibility precisely because they are essentially "attacking" the findings of the researcher and supervisor of the 1939 study -- although the "attack" is that of thoughtful, careful and serious academics, not that of yellow journalists. They simply believe that their "reanalysis of the original data . . . failed to confirm the hypotheses of the investigator and her advisor." In short, their conclusions can in no sense be dismissed as those of colleagues, friends, supporters or apologists for "the investigator and her advisor."
Anyone seriously interested in the subject of the Ambrose-Yairi study will of course want to read it in full. If you are a member of the American Speech-Language-Hearing Association it is available to you online. It also should be available to any library patron through inter-library loan if not available locally.
For purposes of citation,
page numbers are indicated [in brackets]. -- N.J., August 26, 2002
* * *
Our reanalysis of the original data from the Tudor study failed to confirm the hypotheses of the investigator and her advisor. Differences in judges' perceptual ratings pre- and posttreatment were statistically nonsignificant. Similarly, mean differences in frequencies of specific disfluency types pre- and posttreatment were nonsignificant. For the disfluencies counted in this study, there were no observable or significant increases or decreases in any disfluency type for any of the four groups. The only larger change was for the normally fluent group labeled stutterers (Group IIA) who increased interjections in their speech, a disfluency type that is typically regarded as normal. Some individuals in this group had considerable change in one or two types,
but none that provided direct evidence of stuttering. This group of six children is of primary interest because they received negative comments about their normal disfluencies, and have complained, according to Dryer's investigative report, about long-term serious problems related to fluency.
As we have already noted,
for Group IIA (normal speakers labeled as stuttering), none of the judges
described any of the children's speech as stuttered at the end of the study.
This conclusion is reflected in Tudor's final statement: "All of the subjects
in Group IIA (the children to whom the label "stutterer" had been attached)
showed similar types of speech behavior during the experimental period.
A decrease in verbal output was characteristic of all six subjects; that
is, they were reluctant to speak and spoke only when they were urged to.
Second, their rate of speaking was decreased. They spoke more slowly and
with greater exactness. They had a tendency to weigh each word before they
said it. Third, the length of response was shortened. The two younger subjects
responded with only one word whenever possible. Fourth, they all became
more self-conscious. They appeared shy and embarrassed in many situations.
Fifth, they accepted the fact that there was something definitely wrong
with their speech. Sixth, every subject reacted to his speech interruptions
in some manner. Some hung their heads, other gasped and covered their mouths
with their hands; others laughed with embarrassment. In every case the
children's behavior changed noticeably" (p.147). Thus, neither the independent
judges, nor the investigator, as biased as she might
have been, mention stuttering as the outcome of this study. Additionally, the investigator makes strong statements in her conclusions concerning reduced verbal output, slower speaking rate, etc., but no data whatsoever were provided to back up such statements. From a scientific point of view, the absence of any reliability data renders interpretations of data, individual or group, as supporting treatment effects meaningless. Tudor's repeated remarks about the emotional and behavioral changes of the children are informative, but they do not constitute sufficient, credible evidence in support of her interpretations that they resulted from the experimental treatment. She was in no position to make meaningful pre-post treatment comparisons. She did not know the children prior to the study and failed to take elementary steps to obtain any baseline data about their personality, emotionality, and other behaviors to which she refers.
Assessments of the two types
of quantifiable data, perceptual and speech, clearly indicate that all
four experimental questions were answered in the negative. That is, the
study failed to demonstrate any significant influence of labeling on the
level of disfluent speech either in children who stutter or in children
who do not stutter, under the experimental conditions as described. It
is particularly apparent that the most critical experimental condition
failed to show that more disfluency (or stuttering) was induced at the
end of the study by means of occasional labeling in Group IIA, children
normally speaking. Consequently, the conclusion that the data supported Johnson's diagnosogenic theory in any way is untenable.
It is not clear when and why the speech-related difficulties reported for some members of Group IIA emerged. It is possible that an already existing mild stuttering was not detected, or was overlooked, at the beginning of the study, reinforced throughout the study, but still not detected at its conclusion. It is also possible, based on the investigator's comments, that the methods may have instilled negative reactions to speaking over the course of the study. It is less than likely, however, that stuttering would have emerged as a result of the treatment months after the treatment was terminated, when no stuttering was reported at the end of the study. It is very difficult, therefore, to know if, and how, the experimental procedures are related to the reported current long-term communication problems of some of the individuals from that group. Many factors that are completely independent of the study might have been operating, as Dryer reported other experts have also indicated.
In spite of the immense international popularity accorded to the diagnosogenic theory for more than three decades, dissenting voices were soon heard (e.g., Wingate, 1962). Four main types of scientific evidence have been mounted against it:
1. The nature of speech disfluencies exhibited by children who begin stuttering and by normally speaking children. A substantial body of data reported by several investigators has indicated that (a) although disfluencies in young children are normal, most children produce only a few of them (e.g., Yairi, 1981); (b) objective analyses show that disfluencies in the speech of children close to the time when they first begin stuttering are abnormal from the start of the disorder, and differ substantially, along several dimensions, from disfluencies of normally speaking children (Ambrose & Yairi, 1999; Throneburg & Yairi, 1994; Yairi & Lewis, 1984); Johnson et al.'s (1959) own disfluency data do not support his assertions as shown by McDearmon (1968), and (c) parents have often reported that they perceived abnormal speech in their children who stutter from the first day of stuttering onset (Yairi, 1983). Data-backed objections to Johnson's theory were expressed by Yairi and Lewis (1984), stating that "The present findings, then, do not support the assertion that the disfluent speech behavior of children just regarded as stutterers is basically similar to the disfluent speech of those children not regarded as stutterers. Although other researchers, particularly Wingate (1962), objected to Johnson's diagnosogenic theory of the onset of stuttering, this study provides more direct evidence which calls the theory into question" (p. 154).
2. Experimental punishment of stuttering. Another type of evidence that negates the main assumptions of the theory is available from studies showing that adverse contingencies to stuttering, including negative verbal reactions (Cooper, Cady, & Robbins, 1970), loud sound bursts (Flanagan, Goldiamond, & Azrin, 1958), and electrical shock (Siegel and Martin, 1965) could result in an effect opposite to what would have been predicted by Johnson's theory; that is, substantial decline in stuttering. Experiments with preschool
children show the same effect. Martin, Kuhl, and Haroldson (1972) called attention to children's stuttering, yet stuttering dropped to near-zero levels.
3. Parental correction of stuttering. Studies in which parents report advising their children to "stop" and "slow down" when they stutter (obviously calling attention to it) showed that the children recovered from stuttering in spite of their parents' negative reactions (see review by Wingate, 1976). According to the diagnosogenic theory, these are exactly the reactions that should have increased stuttering.
4. Non-environmental etiologies. Strong evidence has emerged for non-environmental factors underlying stuttering etiology. Continuous indications during the past 40 years for strong genetic components to stuttering (Ambrose, Cox, & Yairi, 1997; Howie, 1981; Kidd, Heimbuch, & Records, 1981) have provided impetus for linkage analyses studies designed to identify general location of responsible genes. Indeed, Cox et al. (2000) reported three chromosomes suspected to contain loci for genes transmitting susceptibility to stuttering. In addition, using twin pairs where one or both twins stuttered, Felsenfeld et al. (2001) examined the proportions of genetic and environmental effects involved in the variability of the expression of stuttering. Results showed that about 70% can be accounted for by genetics, and about 30% are attributed to unique, or nonshared, environmental effects. There was no significant effect for shared environmental factors. Working in a different direction, other investigators have found anomalies in the speech-language area of the brains of people who stutter, that may be associated with increased risk for the development of stuttering (Foundas, Bollich, Corey, Hurley, & Heilman, 2001).
These and other developments of modern science have rendered the diagnosogenic theory obsolete as an acceptable explanation of the direct cause of stuttering. Based on current scientific evidence we reject the notion that several clinical sessions with Ms. Tudor, held at most on the average once per two weeks and complemented by ineffective orphanage staff participation, caused stuttering in most of the targeted children. Our re-analysis shows that, in fact, the Tudor study yielded the earliest evidence against the diagnosogenic theory. Had the study been published at the time and subjected to thorough scrutiny, it is quite possible that modern history of stuttering might have been quite different. One cannot deny, however, that the theory contains important elements. For example, the psychological dynamics that affect stuttering after it has begun, for which Johnson's theory may have had relevance, are not contested in our evaluation of the Tudor study. It is quite likely that parent-child interaction during the stage of stuttering plays an important role in the further course of the disorder. It is possible that the procedures employed in the Tudor study resulted in unpleasant, perhaps painful emotional reactions in the participants. Such influence, however, should not be confused with the cause of stuttering.
The ethical issues that were raised in the Mercury's article regarding the Tudor study deserve a more thorough analysis in several dimensions. First is the use of human subjects. Strong condemnations of the study and those responsible for it were expressed immediately after the newspaper article was published. In a follow-up article, the Mercury News (2001b) published a formal apology issued by the University of Iowa, and John Bernthal (2001), President of the American Speech-Language-Hearing Association, expressed reservations about the study. The present authors also stated that "it is unquestionable that the study was ethically wrong" (Yairi & Ambrose, 2001, p. 17). But the study must be viewed in the context of its time. To begin with, we believe that differences in standards that prevailed at that time should be recognized. Strict human subjects regulations did not exist and the research culture was considerably more lax than it is today. For example, it has been reported (Paden, 2001) that during that period, other investigators at the University of Iowa conducted laboratory experiments using gunshot noise (without live ammunition) to study the effect of startle on the stuttering of students. Thus, we are faced with the complicated question of whether ethics are relative or absolute. Specifically, when considering if the study was ethical, do the period in which it was conducted and that period's acceptable standards, context, and mindframe matter?
Inasmuch as there is willingness to recognize differences in standards that existed 60 years ago, the remaining major concern in the case of the Tudor study is whether or not the experimenter and her mentor intended to cause harm by turning normally speaking children into children who stutter. Our review of the study reveals no such apparent intent. The study investigated whether the level of disfluency could be changed as a result of labeling. It was not to create stutterers. Even if there was an unstated goal to increase disfluency to a level perceived as stuttered speech, there is no indication that Tudor or Johnson believed that, if successful, this would make the children chronic stutterers. This, in our opinion, is a critical point in judging the ethics of those involved in the conduct of the study. In this respect, one should also keep in mind that Johnson published a pamphlet in 1934 stating that the cause of stuttering is organic, primarily from interference with a child's natural handedness. He had only expressed the idea of negative evaluation of one's own disfluency as the first step towards stuttering in 1938, referring to listener reactions in a single sentence. Wingate's (2001) statement that Johnson had discussed labeling in these two sources is false. Thus, the Tudor study was completed three years before he briefly articulated his early ideas of the diagnosogenic theory in 1942. It is quite possible that his notions about the full relation between normal disfluency, listener reactions, and stuttering were still in a formative stage when the Tudor study was planned. The theory's full exposition came 20 years after the study was completed (Johnson et al., 1959), when he believed that his data gave credence to the theory.
It is also interesting to note that no criticism has been directed toward the part of the study that was designed to see if labeling would increase stuttering in children who were already stuttering. Is it more acceptable to make an
existing disorder worse? Indeed, there seems to be a tacit assumption that the increase is only temporary and, therefore, acceptable in lieu of the expected benefits to understanding the dynamics of the disorder. There is an appreciable body of scientific literature reporting studies designed to test conditions assumed to have the potential of temporarily increasing stuttering. Cooper, Cady, and Robbins (1970) tested effects of verbal reinforcement of disfluencies of stutterers and normally speaking controls. Oelschlaeger and Brutten (1975) paid stutterers for each time they stuttered to see if stuttering could be reinforced, as did Starkweather and Lucker with several children (1978). Other examples of operant conditioning experiments that attempted to increase stuttering are summarized in Costello and Ingham (1984). In fact, experimentation with delayed auditory feedback was thought to have elicited stuttering from normally fluent speakers (e.g., Lee, 1951). It appears to us that few would question the ethics of these experiments in their historical context. The point that we want to make is that Johnson may well have conceived the Tudor study within this frame of reference, assuming that any possible increase in disfluency would be temporary.
* * *
Regarding Dryer's assertion that there was a cover-up of the study by Johnson, it has, in fact, been available at the University of Iowa library and has been checked out by a good number of readers, including the present authors. Johnson may have ended collaboration with Tudor, and made no further reference to the study, but the actual document was
available. Other theses he directed were also never published (e.g., Taylor, 1937).
A third dimension of ethics pertains to the behavior and action of other parties involved in reporting the Tudor Study. In this regard, the Mercury's decision to release personal information for nine identifiable participants in the Tudor study is questionable, whether or not the newspaper had permission to identify them by name and/or by number and age. Were the nine individuals aware that publishing their names or numbers allowed readers access to all of the personal information in the study, which is a public document, such as IQ scores? Also, while Dryer's decision to contact participants with the information that they were part of a study is legitimate and responsible, providing them with findings--of which the reporter's understanding and interpretation were clearly inadequate--is irresponsible. The participants have every right to know what really happened, but it is less than fair to provide them with inaccurate information.
Summary and Epilogue
A critical review of the Tudor study has revealed fundamental flaws in its design and execution. As it stands, the study failed to provide any credible scientific support that stuttering was produced "in the laboratory" or to lend other support to the diagnosogenic theory of stuttering. It appears, though, that the procedures employed caused unpleasant reactions in some of the children, which interfered with their verbal communication. Our assessment of the ethical issues suggests that the study should be viewed within the common standards of the period, that there is no evidence of intent to harm, and that the objective of increasing disfluent speech should not be confused with instilling chronic stuttering in normally fluent children. It is clear that such a project would not be allowed under present standards of the scientific community.
Finally, in spite of the controversy regarding the Tudor thesis, there is no question that Johnson's contribution to the study of stuttering remains very significant in many positive ways. Although his theory on stuttering onset has not prevailed, we must remember that sequences of errors are the stepping stones for progress in science. His work provided tremendous impetus for the study of various aspects of stuttering, particularly in calling our attention to the, then neglected, formative stages of the disorder. His clinical belief that if stuttering is learned, it can also be unlearned, has inspired many people who stutter to improve their speech.
On the positive side, Tudor
Jacobs has expressed sincere regret for any harm that may have been caused
by the study, . . .