CHAPTER 3
Development of Verbal, Quantitative, and Subject Matter Competence
Consistent with the structure and organization of How College Affects Students, this chapter reviews the evidence from the 1990s pertaining to the influence of college on the acquisition of subject matter knowledge and academic skills. (Chapter Four focuses on more general cognitive development and intellectual skills less directly, or specifically, tied to the academic program.) Our notion of subject matter knowledge and academic skills casts a wide net that reflects the often diverse ways in which the knowledge and skill outcomes of college have been assessed. Examples include, but are not limited to, the general and specific academic knowledge and skills assessed by standardized tests (such as the Graduate Record Examination, the College Basic Academic Subjects Examination, and specific subtests of the Collegiate Assessment of Academic Proficiency), measures of content learning in specific courses, level of verbal and mathematical competence, and individual self-reports of gains in general and specific dimensions of academic knowledge and skills.
Our review in this chapter synthesizes the findings of studies that employ these and related measures of knowledge and academic skills. We do not include cumulative grade point average, or college grades, within this rubric. To be sure, there is evidence to suggest that college grades in some disciplines have moderate correlations (in the .35 to .50 range) with standardized measures of achievement such as the Graduate Record Examination (D. Smith, 1992) and that grades remain a statistically significant predictor of GRE scores when controls are in effect for important confounding influences (A. Astin & Astin, 1993). Other research, however, suggests that the associations between college grades and standardized measures are so small that they call into question the validity of grades as an objective measure of learning (Bridgeman & Lewis, 1994). We acknowledge that in certain circumstances grades may well reflect learning. At the same time, it is also clear that grades are influenced by many factors essentially extraneous to how much one learns during college. These factors include the type and selectivity of the institution attended (A. Astin & Astin, 1993; Kuh & Hu, 1998), the student's major field of study or academic discipline (Barnes, Bull, Campbell, & Perry, 1998; D. Cheng & Chen, 1999; Ekstrom & Villegas, 1994; Kuh & Hu, 1998; Menec, Perry, & Hunter, 1996; Pollio, 1996; M. Thompson & Smart, 1998), situational constraints such as stress and workload (Hatcher, Prus, Englehard, & Farmer, 1991), faculty cognitive style (T. O'Brien & Thompson, 1994), faculty attitudes toward testing and grading (L. Cross, Frary, & Weber, 1993), and the specific type of course or coursework being taken (Kuh & Hu, 1998; Sabot & Wakeman-Linn, 1991).
Because of these potential confounding influences, as well as others identified by Pascarella & Terenzini (1991), we concluded that it is extremely hazardous to make comparisons of student learning based on cumulative grades across different courses, different academic disciplines, or different institutions. Consequently, as in our predecessor volume, we treat student grades not as an outcome of college that stands for how much is learned, but rather as an indicator of the extent to which a student successfully complies with the academic norms or requirements of the institution. Thus, grades are viewed as one among a number of dimensions of the college experience (both academic and non-academic) where the student may demonstrate different levels of involvement, competence, or achievement. In subsequent chapters, we review the evidence pertaining to the influence of college grades on various dimensions of postcollege achievement.
Change During College
Conclusions from How College Affects Students
In our 1991 synthesis, we uncovered 17 studies conducted between 1934 and 1981 that estimated the extent to which students made gains during college on standardized tests of subject matter knowledge or academic skills. The findings were markedly consistent across five decades of research. Our best estimate of freshman-to-senior gains from this body of evidence was that they averaged approximately .56 of a standard deviation for general verbal skills, .24 of a standard deviation for general mathematical or quantitative skills, and .87 of a standard deviation for specific subject matter knowledge.[1] These numbers represented improvements over entering student competencies of approximately 21 percentile points, 9.5 percentile points and 30.8 percentile points, respectively.
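The conversion from an effect size in standard deviation units to percentile points can be sketched as follows. This is a minimal illustration, assuming that scores are normally distributed and that the comparison student starts at the 50th percentile; the function name `percentile_point_gain` is ours and does not come from the studies reviewed.

```python
from statistics import NormalDist


def percentile_point_gain(effect_size_sd: float) -> float:
    """Percentile points gained by a student who starts at the 50th
    percentile and improves by `effect_size_sd` standard deviations.

    Assumes a normal score distribution (an assumption of this sketch).
    """
    return (NormalDist().cdf(effect_size_sd) - 0.5) * 100


# Freshman-to-senior gains reported in the 1991 synthesis, in SD units.
reported_gains = {
    "general verbal skills": 0.56,
    "general quantitative skills": 0.24,
    "specific subject matter knowledge": 0.87,
}

for area, d in reported_gains.items():
    print(f"{area}: {d:.2f} SD is about "
          f"{percentile_point_gain(d):.1f} percentile points")
```

Under the normality assumption, the three effect sizes map to roughly 21, 9.5, and 30.8 percentile points, which matches the figures given in the text.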
The second major conclusion was that the evidence was unclear as to when during the postsecondary experience these changes or gains in subject matter knowledge and academic skills are most likely to occur. Some evidence suggested that the greatest gains occurred during the first two years of college while other evidence suggested that students continue to make important gains through their senior year.
Evidence from the 1990s
With a few exceptions, the literature of the 1990s paid relatively little attention to estimating the gains in subject matter knowledge or academic skills that occurred during college. The modest literature that does exist consists largely of cross-sectional studies (e.g., comparing current freshmen with current seniors) using standardized instruments, or student self-reports of gains made during college. Indeed, we uncovered only two longitudinal studies, both of which were conducted at single institutions. Thorndike and Andrieu-Parker (1992) gave the College Basic Academic Subjects Examination (CBASE) to a sample of 197 students in October of their first year of college and then tested them again with the same instrument 18 months later, in May of their second year of college. The CBASE is a standardized, criterion-referenced achievement test focusing on the degree to which students have mastered particular skills and competencies consistent with completion of general education coursework. It consists of four general subject areas, each of which has either two or three subareas: English (reading & literature, writing); Mathematics (general mathematics proficiency, algebra, geometry); Science (laboratory and field work, understanding fundamental concepts); and Social Studies (history, social sciences) (Osterlind & Merz, 1990). On average, students demonstrated gains in all four subject areas of about .25 of a standard deviation. This number converts to a 10 percentile point gain. In other words, if incoming students at the institution were functioning at the 50th percentile, after about two years of exposure to college they were functioning at the 60th percentile.
Underwood, Maes, Alstadt, and Boivin (1996) administered the College Outcome Measures Program (COMP) Objective Test to 41 students when they were beginning freshmen and then again when they were seniors. The COMP is a standardized test that assesses "liberal arts competencies" such as using science, using art, and solving problems. The freshman-to-senior growth on the COMP total score was 9.6 points. No standard deviations were reported, however, so we could not directly compute an effect size. Our reading of other studies that have used the COMP suggests that the typical standard deviation is in the neighborhood of 12 points. From this we indirectly inferred an increase on the COMP total score from the freshman to senior year as about .80 of a standard deviation, or 29 percentile points.
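The indirect inference described above amounts to dividing the raw gain by an assumed standard deviation. A minimal sketch follows; the 12-point standard deviation is the rough value taken from other COMP studies, not a figure reported by Underwood et al., and the helper name `inferred_effect_size` is hypothetical.

```python
from statistics import NormalDist


def inferred_effect_size(raw_gain: float, assumed_sd: float) -> float:
    """Standardized gain: raw score gain divided by an assumed SD."""
    return raw_gain / assumed_sd


comp_gain = 9.6         # freshman-to-senior COMP total score gain (reported)
assumed_comp_sd = 12.0  # ASSUMPTION: typical SD inferred from other studies

d = inferred_effect_size(comp_gain, assumed_comp_sd)
# Percentile points above the 50th percentile, assuming normal scores.
points = (NormalDist().cdf(d) - 0.5) * 100

print(f"effect size: {d:.2f} SD, about {points:.0f} percentile points")
```

Because the result scales inversely with the assumed standard deviation, a somewhat larger or smaller true value would shift the estimate accordingly, which is why the text treats it as an indirect inference.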
While the Thorndike and Andrieu-Parker (1992) and the Underwood et al. (1996) studies have the advantage of being longitudinal (i.e., following the same students over time), they are substantially limited in generalizability by their small, single institution samples. Evidence with greater generalizability is reported in a cross-sectional study (i.e., students of different class standing measured at the same time) by Osterlind (1996; 1997). Osterlind administered the CBASE to nearly 75,000 students in 56 colleges and universities located in 13 different states. The institutions included research universities, regional universities, and liberal arts colleges. We employed the statistical information reported by Osterlind to compute effect sizes estimating the CBASE advantages of seniors over freshmen. The assumption of such cross-sectional comparisons is that the senior-freshman comparison reflects the influence of college. On the English test seniors had an advantage over freshmen of .77 of a standard deviation (a 27.9 percentile point advantage); for Mathematics the advantage was .55 of a standard deviation (20.9 percentile points); on the Science test the advantage was .62 of a standard deviation (23.2 percentile points); and on the Social Studies test the advantage of seniors over freshmen was .73 of a standard deviation (26.7 percentile points).
Because it reports CBASE scores for each class level, the Osterlind (1996; 1997) study also permits one to estimate the pattern or timing of changes that occur during college. Using his reported class level means and the pooled standard deviations we estimated the advantage of sophomores over freshmen on the four CBASE subject areas to be as follows: English, .66 of a standard deviation; Mathematics, .72 of a standard deviation; Science, .58 of a standard deviation; and Social Studies, .60 of a standard deviation. Thus, based on estimates from the CBASE data, it would appear that the vast majority of the changes in subject matter knowledge or academic skills that take place during college occur during the first two years. Indeed, the estimated advantages of sophomores over freshmen on the CBASE English, Science, and Social Studies tests were 86%, 94%, and 82% as large, respectively, as the corresponding advantages of seniors over freshmen on the same tests. On the Mathematics test, the estimated advantage of sophomores over freshmen was actually about 1.3 times as large as the corresponding advantage of seniors over freshmen.
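The timing comparison is simple arithmetic on the two sets of effect sizes: each sophomore-over-freshman advantage is divided by the corresponding senior-over-freshman advantage. The sketch below uses only the effect sizes quoted in the text; the variable and function names are ours.

```python
# Unadjusted CBASE advantages over freshmen, in SD units, as estimated
# in the text from Osterlind (1996; 1997).
senior_advantage = {
    "English": 0.77, "Mathematics": 0.55,
    "Science": 0.62, "Social Studies": 0.73,
}
sophomore_advantage = {
    "English": 0.66, "Mathematics": 0.72,
    "Science": 0.58, "Social Studies": 0.60,
}


def sophomore_share(subject: str) -> float:
    """Sophomore advantage as a fraction of the senior advantage."""
    return sophomore_advantage[subject] / senior_advantage[subject]


for subject in senior_advantage:
    print(f"{subject}: {sophomore_share(subject):.0%} of the senior advantage")
```

A ratio above 100%, as for Mathematics, means the estimated sophomore advantage exceeds the senior advantage.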
It may well be that the pattern or timing of estimated changes suggested by Osterlind's (1996; 1997) extensive data reflect in part the nature of the instrument employed. Recall that the CBASE is a standardized test designed to measure mastery of skills and competencies imparted in general education coursework. Since much of this coursework may be taken during the first two years of college, one might expect the majority of advances in CBASE subject areas to occur by the sophomore year. The fact that the growth in mathematics skills estimated during the first two years of college is larger by about a third than the corresponding difference between freshmen and seniors may also reflect the fact that most mathematics courses required as part of general education coursework are taken early during college. Subsequently, students who do not major in quantitative or scientific fields may actually regress somewhat in mathematics skills by their senior year (Flowers, Osterlind, Pascarella, & Pierson, 1999; Wolfle, 1983).
In addition to estimated changes or growth on standardized measures of subject matter knowledge, there is also a body of literature that documents student self-reports of gains in these areas. Not surprisingly, findings from this research are generally consistent with those reported from studies using objective, standardized instruments. Across a variety of self-report instruments, the evidence is consistent in suggesting that the higher one's class standing (e.g., seniors versus freshmen) the greater the gains one reports and that students attending college full-time and completing more credit hours report significantly greater gains than students attending part-time who complete fewer credit hours (Bauer, 1995; Conklin, 1990b; Feldman, 1994; Greer, Weston, & Alm, 1991; Knight, 1994; Kuh & Hu, 1999a; Lincoln, 1991; Pettit, 1992; Tan, 1995; R. Williams, 1996). Because of the nature of the self-report instruments used and because of the fact that many studies do not report the requisite statistical information (i.e., group standard deviations as well as group means), however, it is difficult to estimate effect sizes from this literature.
While the evidence on gains in subject matter knowledge and academic skills appears much less extensive in the 1990s than it was in the literature we reviewed for How College Affects Students, we observed little that would change the general, and quite unsurprising, conclusion from our predecessor volume. Students on average do, in fact, make statistically significant and, in some areas, substantial gains in subject matter knowledge and academic skills during college.
This conclusion, however, needs to be understood within the context of two additional, and important, lines of work conducted during the 1990s. The first of these is rather sobering. Barton and Lapointe (1995) report the results of a massive national study of literacy conducted by the Educational Testing Service. The sample of over 26,000 individuals was representative of American adults 16 years of age and older. Three types of literacy were considered (prose, document, and quantitative), with each type having five levels of competence, level 5 being the highest. Consistent with the evidence on gains reviewed above, Barton and Lapointe found that the relative level of all three types of literacy was strongly related to degree of exposure to postsecondary education. College graduates had higher levels of literacy than those with some college, who, in turn, had higher levels of literacy than those who had never attended college. In terms of absolute levels of literacy, however, the performance of college graduates ranged, in the words of Barton and Lapointe (1995, p. 2), "from a lot less than impressive to mediocre to near alarming, depending on who is making the judgment." For example, only 53% of college graduates performed at level 4 or 5 in prose literacy, where they could integrate and synthesize information from complex passages or make high level inferences based on text. Similarly, only 47% of college graduates performed at level 4 or 5 in document literacy, where they were required to make high-level inferences from complex documents that contain distracting information. Finally, in quantitative literacy only 53% of college graduates were functioning at levels 4 or 5, where they could handle two or more arithmetic operations in a sequence or where they could perform multiple arithmetic operations.
The sobering conclusions of the Barton and Lapointe (1995) study are significant in their own right. However, their study is also important because it moves beyond simply documenting how much change occurs during college to estimating the functioning of college graduates (as well as those with less exposure to postsecondary education) against absolute criteria of performance. As such, it suggests an important direction for future inquiry. Knowing the specifics of how well college graduates can function in an absolute sense on salient dimensions of subject matter competence and academic skills may be equally, if not more, important than knowing how much they change or grow during college.
The second line of work leads to a somewhat more optimistic conclusion. Semb and Ellis (1994) synthesized the results of nearly 100 individual studies of knowledge retention (long-term memory) of subject matter content typically taught to adult or near-adult populations. Their synthesis attempted to estimate the average relative percentage of loss of the content originally learned in terms of three types of tasks: recognition, recall, and cognitive skills (e.g., concept identification, prediction, explanation, comprehension). Across all studies the mean percent loss scores were 16.17% for recognition, 28.25% for recall, and 13.32% for cognitive skills. In short, students can apparently retain much of the knowledge taught in classrooms. Perhaps even more significant, it was found that increasing the level of original learning increases later retention performance (Semb & Ellis, 1992; Semb, Ellis, & Araujo, 1993). In subsequent sections of this chapter and the chapter on cognitive skills and intellectual growth, we will review evidence concerning the effectiveness of different instructional approaches in enhancing student learning.
Net Effects of College
It is one thing to conclude that increases in subject matter knowledge and academic skills occur during college. It is quite another to conclude that these increases occur because of college. Problems inherent to both the longitudinal and cross-sectional designs of the studies reviewed in the previous section make it risky to assume that the amount students change during college or the average advantage of seniors over freshmen reflects the unique or net impact of college. For example, in longitudinal studies that follow the same individuals over time, the difference between scores in the first and fourth year of college might at least partially reflect the fact that students are growing older. The natural process of maturation could lead to a certain amount of increased knowledge and skill acquisition between the first and fourth year of college that is independent of the effects of college. This growth might occur through such mechanisms as work, travel, or personal reading, quite apart from one's normal course of study. To the extent that gains between the first and fourth years of college capture normal maturation as well as the influence of college itself, they may spuriously overestimate the latter.
Threats to the internal validity of longitudinal studies are discussed in considerable detail in the technical appendix and pp. 66-67 of How College Affects Students. (By "internal validity" we mean the extent to which gains in learning can be attributed to the impact of college and not other influences.) Suffice it to say that, in the absence of a comparison group of similar individuals who have less or perhaps even no exposure to college, it is extremely difficult to determine how much of the documented freshman-to-senior gains are due to the impact of college and how much are due to other influences.
In cross-sectional studies (e.g., those comparing test scores of separate samples of freshmen and seniors at the same time), differential institutional recruiting standards for successive comparison classes and/or the natural attrition of less capable students from freshman to senior year might yield a markedly more selective population of seniors than the population of freshmen with whom they are compared. Consequently, the differences observed between freshmen and seniors could simply be the result of comparing samples from populations differing in ability or motivation rather than a reflection of the influence of college.
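The selective-attrition threat can be illustrated with a deliberately stylized calculation. The sketch below assumes, purely for illustration, that freshman scores follow a standard normal distribution, that exactly the lowest-scoring fraction of students leaves before the senior year, and that no one learns anything. None of these assumptions or numbers comes from the studies reviewed; the 20% attrition rate and the function name are hypothetical.

```python
from statistics import NormalDist


def selection_only_advantage(attrition_rate: float) -> float:
    """Mean score of the surviving students, in freshman SD units, when
    the lowest-scoring `attrition_rate` fraction drops out and scores do
    not change at all.

    Uses the mean of a standard normal truncated from below at cutoff a:
    pdf(a) / (1 - cdf(a)).
    """
    std_normal = NormalDist()
    cutoff = std_normal.inv_cdf(attrition_rate)
    return std_normal.pdf(cutoff) / (1.0 - attrition_rate)


# ASSUMPTION: a hypothetical 20% attrition concentrated at the bottom.
spurious_gap = selection_only_advantage(0.20)
print(f"senior 'advantage' with zero learning: {spurious_gap:.2f} SD")
```

Real attrition is only partly related to test performance, so this is closer to a worst case than to an estimate; the point is only that a sizable senior advantage can arise from a change in who is tested.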
Studies that attempt to determine whether differences in subject matter knowledge and skill acquisition are attributable to differences in exposure to postsecondary education (i.e., the net effect of college) are by necessity more complex in scope and design than investigations that simply document change during college or the advantage of seniors over freshmen. Typically, they involve comparing groups with different levels of exposure to postsecondary education and the use of various statistical procedures (e.g., multiple regression, analysis of covariance) to control or adjust for the influence of salient confounding variables such as academic ability or motivation.
Conclusions from How College Affects Students
In our 1991 synthesis we found relatively little research that focused on the net effects of college on subject matter knowledge and skills. The research we did uncover was concerned largely with verbal and quantitative skills. For verbal skills, the estimated net effect of graduating from college compared to not attending college was between .26 and .32 of a standard deviation. This estimation converts to an advantage of between 10.3 and 12.6 percentile points. For mathematical skills, the estimated net effect of college was between .29 and .32 of a standard deviation, which converts to an advantage of between 11.4 and 12.6 percentile points. The instruments used to assess verbal skills and mathematical knowledge in this body of research were far from comprehensive. Consequently, they probably did not fully capture the impact of college.
Evidence from the 1990s
As in our 1991 synthesis, we did not uncover a large body of research that estimated the net effects of college on subject matter knowledge and academic skills. Indeed, we uncovered only two studies that focused primarily on this issue, Flowers, Osterlind, Pascarella, and Pierson (1999) and Myerson, Rank, Raines, and Schnitzler (1998). Flowers, Osterlind, Pascarella, and Pierson (1999) reanalyzed a subset of Osterlind's (1996; 1997) extensive sample of students who took the College Basic Academic Subjects Examination (CBASE). Their sample consisted of between 18,500 and 20,500 students at 56 four-year colleges and universities located in 16 different states. They weighted the sample to make it more representative and introduced statistical controls for student precollege academic ability (composite verbal and quantitative ACT score or the SAT equivalent), race, sex, college grades, cumulative credits taken in postsecondary education, and the average ACT score of the sample of students at the institution attended. With statistical adjustments made for these control variables, they reported senior advantages over freshmen on the four CBASE subject areas as follows: English = .59 of a standard deviation, or 22 percentile points; Mathematics = .32 of a standard deviation, or 13 percentile points; Science = .47 of a standard deviation, or 18 percentile points; and Social Studies = .46 of a standard deviation, or 18 percentile points. The addition of age as a control variable had only a trivial influence on these estimates.
The results of the Flowers et al. (1999) analyses lead to two additional observations. First, their estimates of the net effects of college (i.e., the senior-freshman comparisons adjusted for the set of confounding variables) are markedly smaller than the unadjusted senior-freshman differences reported by Osterlind (1996; 1997). Using our estimates of effect sizes from Osterlind (1996; 1997), it would appear that the introduction of statistical controls by Flowers et al. reduced the senior advantage over freshmen by about 23% on the English test, about 42% on the Mathematics test, 24% on the Science test, and about 37% on the Social Studies test. This finding is not particularly surprising given the potentially higher rates of attrition of the least academically able students during college. Part of the senior advantage over freshmen on CBASE competencies estimated from the Osterlind (1996; 1997) investigation likely reflects uncontrolled or unadjusted differences between freshmen and seniors in academic ability.
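The percentage reductions quoted above follow from comparing each unadjusted effect size with its adjusted counterpart. A minimal sketch, using only the effect sizes given in the text (the names are ours):

```python
# Senior-over-freshman CBASE advantages in SD units.
unadjusted = {  # our estimates from Osterlind (1996; 1997)
    "English": 0.77, "Mathematics": 0.55,
    "Science": 0.62, "Social Studies": 0.73,
}
adjusted = {    # net of controls, from Flowers et al. (1999)
    "English": 0.59, "Mathematics": 0.32,
    "Science": 0.47, "Social Studies": 0.46,
}


def percent_reduction(subject: str) -> float:
    """Share of the unadjusted advantage removed by the controls."""
    before, after = unadjusted[subject], adjusted[subject]
    return (before - after) / before * 100


for subject in unadjusted:
    print(f"{subject}: controls reduce the advantage by "
          f"{percent_reduction(subject):.0f}%")
```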
A second observation concerns the consistency of findings from the Flowers et al. (1999) investigation and our 1991 conclusions from How College Affects Students. Recall that in our previous synthesis we concluded that college had a net positive impact on quantitative skills of between .29 and .32 of a standard deviation. This reported advantage is remarkably similar to the net senior advantage over freshmen of .32 of a standard deviation on the CBASE Mathematics test reported by Flowers et al. Our 1991 conclusion concerning verbal skills was that college had a positive net impact of between .26 and .32 of a standard deviation. This advantage is somewhat smaller than the senior advantage over freshmen on the CBASE English test of .59 of a standard deviation reported in the Flowers et al. investigation. This inconsistency may be explainable in large part by the instruments used. The studies we reviewed for our 1991 synthesis focused largely on vocabulary development while the CBASE English test covers a much more wide-ranging set of skills, such as literature comprehension and evaluation, and writing skills. As such, the latter is more likely to capture the comprehensive impact of a postsecondary program of study than instruments that simply measure vocabulary development.
Myerson, Rank, Raines, and Schnitzler (1998) analyzed data from a cohort of the nationally representative National Longitudinal Survey of Youth to estimate the net impact of exposure to postsecondary education on the Armed Forces Qualification Test (AFQT). The AFQT score represents a composite of scores on verbal and quantitative tests consisting of word knowledge, paragraph completion, arithmetic reasoning and mathematics knowledge. Using a creative cross-sectional design, they controlled for selective attrition by conducting analyses only on those who ultimately completed a bachelor's degree, but had no postgraduate work, and who had completed either 12, 13, 14, 15, or 16 years of formal schooling at the time they took the AFQT in 1980. In addition to controlling for selective attrition through their design, Myerson et al. also introduced statistical controls for race, age, and socioeconomic status. With these controls in place, they found that years of postsecondary education completed had a statistically significant, positive effect on AFQT scores. The statistical detail they report makes it difficult to determine the size of the effect. However, using the graphs and the sample characteristics they present, we estimate, for their entire sample, that having a bachelor's degree (i.e., 16 years of education) produced an advantage compared to 12 years of education of about .25 of a standard deviation (about 10 percentile points). Although it should be noted that this number is a less precise effect-size estimate than others reported in this section, it deviates only slightly from our conclusions about the net effects of college on verbal and quantitative skills in How College Affects Students. (Recall that we estimated the impact on verbal skills to be between .26 and .32 of a standard deviation and the impact on quantitative skills to be between .29 and .32 of a standard deviation.) Since the AFQT consists largely of tests measuring verbal and quantitative competencies, the similarities are perhaps not too surprising.
Although not nearly as comprehensive in scope or as focused on estimating the net effects of college as the Flowers et al. (1999) or the Myerson et al. (1998) investigations, there are nevertheless several additional studies with findings that bear on the influence of college on the acquisition of subject matter knowledge and skills. Hagedorn, Siadat, Nora, and Pascarella (1997) analyzed data from the National Study of Student Learning (NSSL) to determine the factors leading to gains in mathematics during the first year of college. NSSL was a longitudinal study focusing on the factors influencing learning and cognitive development in college. It traced the growth of students from 23 two- and four-year colleges located in 16 states through their third year of postsecondary education. The study's dependent variable was change during the first year of college on a 35-item mathematics test from the Collegiate Assessment of Academic Proficiency (CAAP). The CAAP mathematics test measured the student's ability to solve mathematical problems typical to college curricula while emphasizing quantitative reasoning over formula memorization. With statistical controls in place for such factors as gender, race, study involvement, extracurricular involvement, work responsibilities, and type of math courses taken, number of enrolled credit hours had a significant positive effect on gains in all analyses conducted. On average, students enrolled for between 21 and 24 semester hours had about a .30 of a standard deviation (11.8 percentile points) advantage in first year mathematics gains over students enrolled for 6 or fewer hours.
Another investigation analyzing data from the National Study of Student Learning sought to determine the institutional factors and college experiences that influenced reading comprehension through the third year of college for African-American students (Flowers & Pascarella, 1999). The dependent measure was the CAAP reading comprehension test, a 36-item standardized test that assesses reading comprehension as a product of skill in inferring, reasoning, and generalizing. With statistical controls for level of precollege reading comprehension, academic motivation, socioeconomic status, age, the average cognitive ability of students at the institution attended, study and work involvement, and the pattern of coursework taken, number of credit hours completed had a significant positive effect on third-year reading comprehension. After three years of postsecondary education, African-American students who had completed between 60 and 72 credit hours had a reading comprehension advantage of about .45 of a standard deviation (17.4 percentile points) over students who had completed fewer than 20 credit hours.
In addition to studies employing standardized measures such as the CBASE or the CAAP, a number of investigations have estimated the net impact of college on subject matter knowledge and skills using student self reports (Knight, 1994; W. Knox, Lindsay, & Kolb, 1992; R. Williams, 1996). Consistent with findings from those studies using objective, standardized measures, the results of these investigations clearly suggest that students with greater exposure to postsecondary education report learning more or experiencing greater gains in science and technology, general education, and the arts and humanities than those individuals with less exposure. These statistically significant advantages favoring those individuals with more exposure to college persist in the presence of statistical controls for such factors as age, gender, race, family and job responsibilities, socioeconomic status, major, campus involvement, and academic ability. Because of the nature of the instruments employed and the statistical data reported, however, estimating effect sizes for these studies is problematic.
We uncovered only one study (Flowers, Osterlind et al., 1999) that permitted estimation of the timing of the net effects of college on subject matter knowledge and skills. In addition to reporting the CBASE advantages of seniors over freshmen (i.e., the estimate of the overall effect of college), Flowers et al. also reported the corresponding net advantages of sophomores over freshmen. (In both analyses, statistical controls were made for individual precollege academic ability, race, sex, college grades, postsecondary credits taken, and the average ACT score of the students at the institution attended.) Their findings suggest that in Mathematics and Science virtually all of the net effects of college take place by the sophomore year. In English approximately 75% of the net effect of college and in Social Studies about 85% of the net effect of college was evident by the sophomore year. It is worth reiterating that this pattern may reflect the fact that the instrument used in the study (the College Basic Academic Subjects Examination) is designed to assess academic competencies imparted in general education coursework, much of which is typically designed to be taken in the first two years of college. Instruments not so focused on measuring the effects of general education might have yielded a different pattern of effects.
Between-College Effects
In estimating the net effects of college attendance on learning in general, the problem was one of distinguishing the unique impact of differential exposure to postsecondary education from differences in the characteristics or traits of individuals that might influence the learning outcomes being assessed. Students who attend college full time, for example, may have substantially different levels of academic ability, secondary school preparation, and academic motivation than those with less or no exposure to postsecondary education. In trying to estimate between-college effects, one is confronted with an analogous methodological problem: namely, how does one identify the unique or net influence of attending different kinds of postsecondary institutions given the fact that different kinds of institutions recruit and enroll substantially different kinds of students (A. Astin, 1991; A. Astin & Astin, 1993; Pascarella & Terenzini, 1991)? Since students who attend college are not distributed randomly across different postsecondary institutions, it is likely that the achievement, as well as the other outcomes associated with different types of postsecondary institutions, is confounded by differences in the motivations, academic aptitudes, secondary school experiences, and aspirations of the students they enroll. Sometimes the shorthand for this ubiquitous and irksome methodological problem is "separating the socialization effect from the recruitment effect" (Pascarella & Terenzini, 1991). We will confront this socialization versus recruitment issue, albeit in different forms, repeatedly throughout the remainder of the book.
Attempts to separate the socialization effects of different institutions from the effects of the differences in background characteristics among the students they recruit and enroll typically entail the use of reasonably complex statistical controls (e.g., regression analysis, path analysis, analysis of covariance). While there are a number of variations, the typical study we reviewed used some form of regression analysis to estimate the extent to which institutional characteristics or environments affected learning outcomes above and beyond the influence of individual student characteristics. (For a more detailed presentation and discussion of these methods, see Ethington, 1997; or Pascarella & Terenzini, 1991, Appendix A.)
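The logic of such statistical controls can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are simulated, the variable names (ability, selectivity, achievement) are hypothetical, and the simulated "truth" is that achievement depends on individual ability only. The sketch compares the unadjusted slope of achievement on institutional selectivity with the slope that remains after individual ability is partialled out of both variables, which is equivalent to the selectivity coefficient in a regression that also includes ability. It is not the model used in any study reviewed here.

```python
import random

def slope(x, y):
    """Ordinary least-squares slope of y on a single predictor x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def residuals(x, y):
    """The part of y that is not linearly predictable from x."""
    b = slope(x, y)
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return [yi - (my + b * (xi - mx)) for xi, yi in zip(x, y)]

rng = random.Random(0)  # fixed seed so the simulation is repeatable
n = 5000

# Hypothetical standardized precollege ability for each student.
ability = [rng.gauss(0, 1) for _ in range(n)]
# Recruitment effect (assumed): abler students enroll at more selective colleges.
selectivity = [0.7 * a + rng.gauss(0, 0.7) for a in ability]
# Assumed truth: achievement depends on ability only, not on selectivity.
achievement = [1.0 * a + rng.gauss(0, 1) for a in ability]

# Unadjusted association: confounds recruitment with socialization.
naive = slope(selectivity, achievement)
# Net association: both variables are first purged of individual ability.
net = slope(residuals(ability, selectivity), residuals(ability, achievement))

print(f"unadjusted selectivity slope: {naive:.2f}")
print(f"selectivity slope net of ability: {net:.2f}")
```

In this simulation the unadjusted slope is sizable even though selectivity has no effect by construction, while the slope net of ability is close to zero. Actual studies control for many more precollege characteristics than the single variable used here.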
The study of between-college effects has been complicated by a major student demographic trend that has grown more pronounced over the last quarter of a century, but which has become particularly identifiable in the literature of the last ten years. Both Adelman (1994; 1998a; 1998b) and McCormick (1997) provide substantial evidence from nationally representative data sets, such as the National Longitudinal Study of the High School Class of 1972 (NLS-72), High School and Beyond (HS & B), and the Beginning Postsecondary Survey (BPS), indicating that, as a group, American undergraduates have become highly mobile consumers of postsecondary education. For example, during the period 1972-1984, 38.8% of individuals in the NLS-72 sample who started at a four-year college eventually attended two or more undergraduate institutions. The corresponding population parameter estimates for the 1982-93 period from the HS & B sample and for the 1989-94 period from the BPS sample are 52.3% and 50.1%, respectively (Adelman, 1998a).
Given such a national rate of multi-institutional attendance, one might reasonably question both the interpretability and generalizability of studies that estimate institutional effects on such outcomes as learning and cognitive growth (as well as other outcomes such as values, attitudes, psychosocial development, or moral development). How does one estimate institutional effects when the effects are both multiple and, quite likely, take place over varying lengths of time? Not surprisingly, we uncovered no investigation that even attempts this extremely complex and daunting task. Rather, the body of evidence on between-college effects gets around this issue essentially by ignoring it in one of two typical ways. The first is composed of studies that estimate institutional effects over a comparatively short period, such as the first year of college. The second involves studies that follow students for an extended period of time but confine their analyses to the subsample who attend only one postsecondary institution. When properly conducted, such investigations can provide important insights into the nature and magnitude of institutional effects on students. Indeed, given the penchant for nationally visible periodicals to use institutional resources and "selectivity" as proxies for the quality of the education one supposedly receives, insights based on rigorously collected and analyzed data are a valuable counterbalance. Nevertheless, it is important to underscore that the findings of the vast majority of between-college studies may not necessarily generalize to the substantial numbers of American students who attend more than one undergraduate institution.
Conclusions from
Our 1991 synthesis found little consistent evidence indicating that measures of institutional "quality" or environmental characteristics had more than small, and generally trivial, net influences on how much a student learns during college. When student precollege traits were controlled statistically, only three variables had statistically significant, positive associations with standardized measures of achievement across at least two independent samples: frequency of student-faculty interaction, degree of curricular flexibility, and faculty members' formal educational level. It is important, however, to underscore the fact that the magnitude of these associations was quite small and perhaps of questionable practical importance. Given the persistent conventional wisdom of major differences in "educational quality" among American postsecondary institutions (Ewell, 1989; Gilmore, 1990; Grunig, 1997; L. Lewis & Kingston, 1989; Schmitz, 1993), we conceded that the weight of evidence was counter-intuitive. However, we also underscored the distinct possibility, consistent with the evidence, that with differences in the academic capabilities of student bodies taken into account, four-year institutions with substantial differences in their stock of human, financial, and educational resources may have essentially similar impacts on how much students learn as undergraduates. We also suggested the possibility of a diminishing return relationship between institutional resources (e.g., library size) and learning that may at least explain why increases in level of resources beyond a certain threshold level are not similarly matched by a proportional increase in measured student learning.
Evidence from the 1990s
Compared to our previous synthesis, we uncovered a substantial body of evidence focusing on between-college effects on the development of verbal, quantitative, and subject-matter competence. This evidence falls generally into three categories: 1) institutional characteristics, 2) institutional type, and 3) institutional environments.
Institutional Characteristics
Of all the institutional characteristics that one might study in relation to between-college effects, the one considered most consistently in the studies we reviewed was institutional student-body selectivity. This characteristic was typically defined operationally as the average scores of entering students, or in some cases the entire undergraduate student body, on standardized tests such as the SAT, the ACT, or their equivalent. There are at least two reasons why institutional selectivity is such a prominent variable in research on between-college effects. First, as compared to measures of the institutional environment, which can at times be somewhat subjective and abstract concepts, institutional selectivity is a relatively easy-to-obtain, straightforward, low-inference measure. It can be communicated by a single number (e.g., an average SAT of 1200 or an average ACT of 26.5) that is both reasonably recognizable and understandable by large numbers of people. Second, and certainly not unrelated, is the fact that institutional selectivity is often used as a proxy for institutional "quality," both by scholars who study the impact of college (Brewer, Eide, & Ehrenberg, 1996; Dale & Krueger, 1999; K. Daniel, Black, & Smith, 1996b; Hagedorn et al., 1999; Hilmer, 1997; Katchadorian & Boli, 1994; Lillard & Gerner, 1999; Rumberger & Thomas, 1993), and by the more general public (consider college rankings in such outlets as U.S. News and World Report). It is almost as if it were axiomatic that if the college you attend is difficult to get into you must be getting a "better" education.
We uncovered 10 studies, based on three independent samples, that investigated the impact of college selectivity on various standardized measures of academic achievement. Consistent with our 1991 synthesis, the weight of evidence from these studies provides little support for the premise that attendance at a selective institution has a consistent and substantial positive influence on how much one learns, at least as measured by standardized tests. One series of important studies analyzed longitudinal data from the 1985 cohort of the Cooperative Institutional Research Program (CIRP). This sample of students in slightly more than 200 four-year institutions was followed up in 1989, with analyses focusing on individuals who took various standardized tests typically used in the admission process to graduate or professional school (e.g., the Verbal and Quantitative portions of the Graduate Record Examination, the Medical College Admissions Test, the Law School Admissions Test, and the National Teachers Examination) (Anaya, 1992, 1996, 1999a, 1999b, 1999c, 2001; A. Astin & Astin, 1993; Opp, 1991). Although the specified prediction equations differ across studies, the general approach taken in the various analyses was to first statistically control for important student precollege variables (e.g., SAT/ACT scores, high school achievement, aspirations, family background) and college experiences (e.g., major, interactions with faculty and peers, measures of academic and social engagement). With these controls in place, estimates were then made of the net influence of different institutional characteristics on the various standardized tests.
Institutional selectivity (i.e., the average SAT/ACT score of entering students) had trivial and statistically non-significant effects on the Quantitative score of the Graduate Record Exam, the Medical College Admissions Test, the Law School Admissions Test, and all three subscores of the National Teachers Examination (i.e., Communication, General Knowledge, and Professional Knowledge).[2] Anaya (1999b) found a very small positive effect of institutional selectivity on GRE Verbal scores. However, in an earlier series of analyses using the same data, but a more extensive set of controls, the effect of institutional selectivity on GRE Verbal scores was non-significant (Anaya, 1996).
It is important to note that standardized tests, such as the Graduate Record Examination, are taken by a relatively small percentage of students while they are actually in college. For example, extrapolating from the description of Astin's (1993b) national Cooperative Institutional Research Program sample, we estimate that roughly 20-25 percent of all students in four-year colleges actually took the GRE while in college between 1985 and 1989. Thus, estimates of the impact of institutional selectivity taken from the 1985-89 CIRP data may have limited generalizability to the entire population of college students. However, other investigations, based on independent samples, report evidence that is quite consistent with that yielded by the extensive analyses of the 1985-89 CIRP sample. In their cross-sectional analyses of the College Basic Academic Subjects Examination, described in the previous section of this chapter on the net effects of college, Flowers et al. (1999) sought to determine if the net difference on CBASE tests between freshmen and seniors differed in magnitude at institutions differing in student-body selectivity (estimated by the average ACT score of the student body at the 56 institutions in the sample). With controls in place for such factors as individual ACT scores, college grades, race, and gender, institutional selectivity had a trivial and statistically non-significant impact on freshman-senior differences across all four tests (English, Mathematics, Science, and Social Studies).
Similarly, analyzing longitudinal data from 23 two- and four-year colleges, Edison, Doyle, and Pascarella (1998) and Whitt, Pascarella, Pierson, Elkins, and Marth (in press) found that institutional selectivity (operationally defined as the average reading comprehension, mathematics, and critical thinking levels of entering students) had no statistically significant general effects across gender (i.e., for both men and women) on standardized measures of writing skills or science reasoning at the end of the second year of college or on reading comprehension at the end of the third year of college.
Evidence from studies using self-reports of student learning is generally consistent with evidence from studies employing standardized tests.[3] Knox, Lindsay, and Kolb (1992) analyzed data from the National Longitudinal Study of the High School Class of 1972. With controls in effect for such factors as academic ability, race, gender, major, and place of residence, institutional selectivity was essentially unrelated to how much respondents reported having learned during college. Consistent findings are reported by Astin (1993a) and Toutkoushian and Smart (2001) in the prediction of student self-reports of gains along dimensions such as general knowledge, knowledge of a field or discipline, problem solving, and writing skills. Similarly, Kuh and Hu (1999b) found that student-body selectivity among 33 research universities was essentially unrelated to the size of gains seniors reported they made during college in 23 areas of learning and personal development. Hayek and Kuh (1998) did find that seniors from selective liberal arts colleges reported making greater gains on a measure of "capacity for life-long learning" (consisting of academic skills, thinking skills, and functioning in groups) than did seniors from doctoral-granting universities, comprehensive colleges, and less-selective liberal arts colleges. However, the only control in effect in their analyses was student socioeconomic status. Given the absence of more salient controls for the background characteristics of the students reporting the extent of gains during college, it is problematic to interpret this finding as being the result of the impact of selective liberal arts colleges. It may simply be that such colleges recruit and enroll students who are more open to the impacts of postsecondary education to begin with. Similar students attending other institutions in Hayek and Kuh's sample might report learning gains of essentially the same magnitude.
Understanding just why selectivity tells us so little about an institution's net impact on how much students learn is a complicated issue. Probably the most useful research in this regard is provided in a creative study of course examination rigor at 40 research universities differing in undergraduate student-body selectivity (Braxton, 1993; Nordvall & Braxton, 1996). A sample of 115 final examinations from the institutions was categorized by the percentage of questions asked in four areas, reflecting ascending levels of complexity and sophistication: knowledge (simple recall or recognition of course content); comprehension (ability to grasp meaning of course content so that one can explain or summarize course material); application (ability to apply course content to new or real situations); and critical thinking (ability to analyze, synthesize, and evaluate). Controlling for course level, the discipline in which the course was offered, whether or not it was intended for majors, and class size, institutional selectivity did have a small negative association with the percentage of knowledge-level questions asked. This finding is seemingly consistent with Astin's (1993b) finding that institutional selectivity is negatively associated with the use of multiple choice questions on tests. More importantly, however, selectivity had no significant relationship with the percentage of examination questions asked at the higher-order levels of comprehension, application, or critical thinking. This finding suggests that more selective research universities tend not to give any more rigorous examinations than less selective ones. To the extent that rigor in course examinations reflects similar rigor in the instruction received (an association that cannot be determined from the study), it may be that undergraduate selectivity alone is simply not a particularly effective way of identifying universities that have demanding academic programs.
To be sure, there is other work on the relationship between institutional characteristics and student learning in the literature of the 1990s. For example, net of student background characteristics and other influences, institutional size was found to have small, but statistically significant, positive effects on Graduate Record Examination quantitative scores as well as on National Teachers Examination communications skills and professional knowledge scores (A. Astin & Astin, 1993; Opp, 1991). Similarly, the percent of undergraduate students who are Asian had statistically significant, positive net effects on both the verbal and quantitative scores of the GRE (Anaya, 1996). Finally, and perhaps not surprisingly, Astin's (1993b) analyses found that, even when controls are made for SAT scores and other important variables, there is some evidence that different standardized test scores are influenced by the dominance of certain academic majors in an institution. For example, GRE verbal scores were positively influenced by the percentage of history or political science majors, while scores on the Law School Admissions Test were positively influenced by the percentage of social sciences majors. Unfortunately, all of these findings are based on a single sample, the 1985-89 CIRP data discussed previously. We found little in the way of attempts by other scholars to verify the robustness of the findings through replication on independent samples.
Institutional Type
A small body of literature attempts to estimate the net influence of attendance at different types of institutions on students' acquisition of subject matter knowledge and academic skills. The vast majority of this research focuses on comparisons of two-year community colleges versus four-year institutions, historically black colleges versus predominantly white colleges, women's institutions versus coeducational institutions, and comparisons among institutions differing in classifications that might include several dimensions such as public universities, private four-year colleges, comprehensive universities, etc.
A series of studies analyzing data from the National Study of Student Learning (NSSL) sought to estimate the comparative effects of attending two-year versus four-year institutions. In the initial study, Bohr, Pascarella, Nora, Zusman, Jacobs, Desler, and Bulakowski (1994) compared samples of students from a single two-year college and a large research university, both located in the same urban area, on first-year gains in standardized measures of reading comprehension and mathematics. With controls for precollege scores on the two standardized tests, residence, age, work responsibilities, and full- or part-time enrollment, first-year gains made by two-year college students were essentially the same in magnitude as the gains made by four-year college students. This finding was replicated, with a similar analytic model, on an independent sample of students from five two-year and six four-year institutions located in eight different states (Pascarella, Bohr, Nora, & Terenzini, 1995). When this second sample was followed through the second year of postsecondary education and with controls made for such factors as precollege academic ability, age, race, and full- versus part-time enrollment, there were only trivial and statistically non-significant differences between two- and four-year college students on standardized measures of writing skills and science reasoning. This parity persisted irrespective of whether the comparison group was the six four-year colleges used in the Pascarella et al. (1995a) study or a more varied and academically selective group of 18 four-year institutions (Pascarella, Edison, Nora, Hagedorn, & Terenzini, 1995-96).
Though not using standardized measures of learning, other investigations comparing the academic performance of two- and four-year college students, or the quality of instruction they receive, report findings that are consistent with those yielded by the National Study of Student Learning (Banta & Associates, 1993; Conklin, 1990; Montondon & Eikner, 1997). Thus, the small body of evidence we reviewed suggests that two-year colleges, which enroll nearly 40% of all students in postsecondary education nationally, may be fostering learning along such basic dimensions as reading comprehension, mathematics, writing skills, and scientific reasoning with about the same level of proficiency as a substantial segment of four-year institutions (Pascarella, 1999).
Another small, but important, body of research in the 1990s has focused on the extent to which college racial composition influences the acquisition of subject matter competence and academic skills of African-American students. There are about 100 historically black colleges (HBCs) and universities in the United States (Roebuck & Murty, 1993) which educate about 27% of all African-American college students (Higgins, Cook, Ekeler, Sawyer, & Prichard, 1993; Nettles, Perna, & Freeman, 1999). While these institutions have been an important source of the country's African-American leaders and professionals, it is clear that, compared to predominantly white institutions (PWIs), HBCs as a group function at a distinct disadvantage in terms of financial and educational resources (Gladieux & Swail, 1999; "Vital signs," 1996). Despite this resource disadvantage, the evidence we reviewed suggests that HBCs may be at least as proficient as PWIs, if not more so, in fostering the learning of African-American students.
A series of analyses of the National Study of Student Learning (NSSL) data base followed a sample of African-American students attending two HBCs (one public and one private) and 16 public and private PWIs for three years (Bohr, Pascarella, Nora, & Terenzini, 1995; Flowers & Pascarella, 1999; Pascarella, Edison, Hagedorn, Nora, & Terenzini, 1996). With controls in place for such factors as precollege academic ability, sex, academic motivation, age, socioeconomic status, and the average academic ability of the entering students at each institution, differences between African-American students at HBCs and PWIs on standardized measures of end-of-first-year reading comprehension and mathematics and end-of-second-year science reasoning were trivial in magnitude and statistically non-significant. On standardized measures of end-of-second-year writing skills and end-of-third-year reading comprehension, African-American students at HBCs had a statistically significant advantage over their counterparts at PWIs of about .33 of a standard deviation (13 percentile points). Similar results have been reported when student learning is measured by student self-reports rather than objective, standardized tests. African-American students at HBCs typically report learning gains that are equal to (M. Kim, 2002b; Watson & Kuh, 1996) or greater (DeSousa & Kuh, 1996; Flowers & Pascarella, 1999) in magnitude than their counterparts attending PWIs.
It is not entirely clear how historically black colleges are able to provide educational experiences that compensate for their relative lack of educational resources. However, some clues are available. A fairly extensive body of literature suggests that HBCs provide a social-psychological climate that is more conducive to the academic adjustment and comfort of African-American students than do PWIs (for a brief review of these studies, see Bohr et al., 1995). Consistent with this, DeSousa and Kuh (1996) found that African-American students attending HBCs had a distinct advantage (about .7 of a standard deviation) over their counterparts attending PWIs on a scale measuring their level of effort and involvement in such academic activities as writing experiences, course learning, interaction with faculty, library use, science learning, and interactions with peers based on course content. Such differences in academic effort and involvement may at least partially explain why comparative outcomes between HBCs and PWIs on various measures of student learning do not reflect differences in educational resources.
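The conversion between an effect size in standard deviation units and percentile points, used above for the .33 advantage (13 percentile points), can be sketched as follows. The sketch assumes that scores in the comparison group are normally distributed, so that the comparison group mean sits at the 50th percentile; the function name is hypothetical and introduced only for illustration.

```python
from statistics import NormalDist

def percentile_point_advantage(effect_size_sd: float) -> float:
    """Percentile points above the 50th percentile at which a group mean falls
    when it lies `effect_size_sd` standard deviations above the comparison
    group mean, assuming normally distributed scores."""
    return (NormalDist().cdf(effect_size_sd) - 0.5) * 100

# The two effect sizes mentioned in the text for HBC versus PWI comparisons.
for d in (0.33, 0.70):
    points = percentile_point_advantage(d)
    print(f"{d:.2f} SD -> about {points:.0f} percentile points")
```

Under the same normality assumption, the .7 standard deviation advantage in academic effort and involvement corresponds to roughly 26 percentile points.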
Although women's institutions educate only a very small percentage (about 2%) of women in American postsecondary education (Ricci, 1994), they have been the focus of considerable scholarly attention. This attention is likely due to the fact that graduates of these institutions are remarkably over-represented in terms of professional achievement and leadership positions (H. Astin & Leland, 1991; Tidball, Smith, Tidball, & Wolf-Wendel, 1999). (We will review this literature in greater detail in a subsequent chapter on career and economic returns to college.) Compared to the literature linking attendance at women's institutions with career achievement, there is only a very small body of evidence on the net impact of women's institutions on student learning in college. The evidence is mixed but, overall, does not support the notion that single-sex institutions foster women's acquisition of subject matter competence or academic skills any more proficiently than coeducational institutions. For example, analyzing the 1985-89 CIRP data, Anaya (1992; 1996) found that, when controls were in effect for such factors as SAT scores, sex, secondary school grades, and whether a student majored in engineering or the physical sciences, attending a women's institution had a small but statistically significant negative impact on GRE quantitative scores and essentially no impact on GRE verbal scores. Similarly, analyzing the same data and with essentially the same analytical approach, Astin (1993b) found that attendance at a women's institution had a small negative influence on Medical College Admissions Test scores. Findings reported by D. Smith, Wolf, and Morrison (1995), based on the 1986-90 CIRP data, are somewhat at odds with those reported by Anaya and Astin. However, this incongruence may be attributable in large measure to the fact that their dependent variable, a scale termed "learning goals and outcomes," was a composite of self-reported items that tapped women's gains in preparation for graduate school and academic self-confidence, as well as gains in general and specific knowledge. Net of such factors as SAT scores, high school grades, and socioeconomic status, attendance at a women's institution had essentially no direct influence on "learning goals and outcomes," but did have a small, positive indirect effect, mediated primarily through increased academic involvement.
Finally, there is a small body of research that attempts to estimate the net effects of institutional classifications such as public university, private four-year college, and the like, on student acquisition of subject matter competence and academic skills. The findings from this research are confusing and, at times, seemingly contradictory. For example, Anaya (1999b) found that, even when controls were made for such factors as SAT scores and academic major, attending a public four-year college negatively influenced GRE verbal scores, while attending a private four-year college negatively influenced GRE quantitative scores. Conversely, employing a similar analytic model with a subsample of the same data, Anaya (1999c) found that attending either a public or private university had positive effects on Medical College Admissions Test scores. Similarly equivocal findings are reported in studies using various student self-reports of how much they learned during college. Kuh and Hu (1999b) report that students at private research universities report greater gains than their counterparts at public research universities, but when Knox, Lindsay, and Kolb (1992) introduced controls for such factors as student academic ability and major field of study, the effect of attending a private institution became small and statistically non-significant. Smart (1996, 1997) found that attendance at a research or doctoral-granting university, a comprehensive college or university, or a liberal arts college was unrelated to self-reports of learning gains when controls were made for precollege dispositions to enter different fields of study.[4]
That such broad classifications tell us little that is consistent or definitive about institutional effects on student learning should not come as too great a surprise. Classifications such as public research university, comprehensive college or university, or private liberal arts college are so general that each might include institutions differing substantially on other characteristics that have more immediate and important implications for how much students learn. Put differently, there may be so much within-classification variability among institutions that it washes out between-classification influences.
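The point about within-classification variability can be made concrete with a small sketch that computes the share of total variance in an outcome lying between groups (often called eta squared). The institution-level values below are invented solely for illustration and are not drawn from any study reviewed here; the classification labels and the function name are likewise hypothetical.

```python
from statistics import mean

def between_group_share(groups):
    """Share of total variance that lies between groups (eta squared).

    `groups` maps a classification label to a list of institution-level values.
    """
    all_values = [v for values in groups.values() for v in values]
    grand_mean = mean(all_values)
    ss_total = sum((v - grand_mean) ** 2 for v in all_values)
    ss_between = sum(
        len(values) * (mean(values) - grand_mean) ** 2
        for values in groups.values()
    )
    return ss_between / ss_total

# Hypothetical net effects on learning (in SD units) for institutions within
# three broad classifications: similar averages, wide spread inside each group.
hypothetical_effects = {
    "public research university": [-0.30, -0.05, 0.10, 0.35],
    "comprehensive college or university": [-0.25, -0.10, 0.15, 0.30],
    "private liberal arts college": [-0.20, 0.00, 0.05, 0.35],
}

share = between_group_share(hypothetical_effects)
print(f"share of variance between classifications: {share:.3f}")
```

With values like these, nearly all of the variation is found among institutions within the same classification, which is the situation in which classification would be expected to show little consistent relationship to learning.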
Institutional Environments
While institutional characteristics, such as student body selectivity and size, and institutional classifications, such as two-year college or historically black college, are relatively straightforward indicators of institutional traits requiring little inference, the same cannot always be said of measures of the institutional environment. In the research we reviewed, the nature of institutional environments is often inferred from aggregate student (or faculty) responses on items, either about themselves or their experiences at the institution, that cluster into internally consistent (i.e., reliable) scales. Scholars then attempt to name the scale in a way that accurately reflects or signifies the underlying construct being measured by the cluster of items. Thus, for example, Arnold, Kuh, Vesper, and Schuh (1991) used student perceptions of the nature of their interactions with peers, faculty, administrators, and professional staff at their institutions to form an environmental measure which they termed "supportive personal relationships." Similarly, Astin (1993b) used a cluster of items measuring faculty goals for themselves and undergraduates which he termed "faculty commitment to the student's personal development/altruism."
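One common index of the internal consistency of such a scale is Cronbach's alpha, sketched below. Its use here is an assumption for illustration, since the studies cited do not all report which reliability index they employed, and the item responses are invented (six hypothetical respondents rating four hypothetical items on a five-point scale).

```python
from statistics import variance

def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(responses[0])                      # number of items in the scale
    items = list(zip(*responses))              # one tuple of scores per item
    sum_item_variances = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum_item_variances / total_variance)

# Hypothetical responses: rows are respondents, columns are scale items.
hypothetical_responses = [
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [3, 3, 2, 3],
    [1, 2, 1, 2],
    [4, 4, 5, 4],
]

alpha = cronbach_alpha(hypothetical_responses)
print(f"Cronbach's alpha: {alpha:.2f}")
```

Values closer to 1 indicate that the items vary together across respondents, which is the sense in which a cluster of items is described as internally consistent.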
This method is certainly a very legitimate, and often creative, means for assessing different dimensions of the environmental emphasis of an institution. However, one is often presented with a non-trivial challenge when attempting to synthesize the results of different investigations that measure institutional environments in this way. Whereas low inference characteristics such as student academic selectivity or size are relatively objective and interpretable across studies, different scholars often measure institutional environments in substantially different ways or from a very different perspective. This difference means that obtaining comparable findings of environmental effects across different investigations is frequently problematic. Consequently, in an attempt to make some sense of the different findings, we have chosen to group them into the following general categories or general environmental emphases: scholarship and learning; relationships among students, faculty, and professional staff; vocational/professional training; and racial/gender equity.
The most extensive and useful work on the impact of institutional environments on academic learning has been conducted by Astin (1993b; 1996) and by Kuh and his colleagues (Arnold et al., 1991; Arnold, Kuh, Vesper, & Schuh, 1993; Kuh, Arnold, & Vesper, 1990, 1991; Kuh, Pace, & Vesper, 1997; Kuh & Vesper, 1992; Watson & Kuh, 1996). While the estimated environmental effects they report appear to be generally quite small, both report consistent, and perhaps not surprising, evidence to suggest that the acquisition of subject matter knowledge and academic skills is enhanced by institutional environments that emphasize scholarship and learning. For example, in analyses of several different independent samples, a scale measuring the strength of an institution's emphasis on scholarly, intellectual, and aesthetic matters had statistically significant, positive relationships with students' self-reported gains in areas such as general education, understanding science and technology, and understanding the arts, literature, and humanities (Arnold et al., 1993; Davis & Murrell, 1993; Kuh et al., 1997; Kuh, Schuh et al., 1991; Kuh & Vesper, 1992). These relationships persisted even with controls in different studies for confounding influences such as academic preparation, educational aspirations, socioeconomic status, race, work responsibilities, and other dimensions of the institutional environment. Corroborating evidence is reported by Whitt et al. (in press). With controls for student ability, institutional selectivity, and measures of academic effort and involvement, they found that the same scale used by Kuh and his colleagues had statistically significant positive effects on an objective, standardized measure of writing skills. Astin (1993b) takes a somewhat different approach in that he looks at environmental characteristics from a faculty as well as a student perspective. Nevertheless, the results he reports are also consistent with those of Kuh and colleagues. Controlling for student academic ability and a battery of other confounding influences, Astin found that the average scholarly orientation of an institution's faculty had statistically significant, positive effects on scores on the Law School Admissions Test and on both the verbal and quantitative scores of the Graduate Record Examination.
There is also a body of evidence to suggest that the nature of an institution's social or relational environment has implications for student learning. This evidence, however, is not totally consistent, and the direction of the findings appears to depend on whether one measures learning with student self-reports or objective, standardized measures. In a series of studies carried out primarily by Kuh and his colleagues, an environmental measure was employed that taps the extent to which students regard their relationships with peers, faculty, and administrators at their institution as friendly, approachable, and helpful (the negative end is competitive, remote, and frigid) (Kuh et al., 1997). Across the different studies conducted, this scale has typically had statistically significant, positive effects on students' self-reports of gains in such areas as general education skills and understanding the arts, literature, and humanities. Moreover, these effects persisted even in the presence of controls for confounding influences such as students' academic preparation, socioeconomic status, work responsibilities, and other dimensions of the institutional environment (Arnold et al., 1993; Davis & Murrell, 1993; Kuh et al., 1997; Kuh, Schuh et al., 1991; Watson & Kuh, 1996). Using somewhat different measures of the social/relational environment of the institution, similar results are reported by Glover (1996) for two-year colleges and Graham (1997; 1998) for both two- and four-year colleges.
Related investigations by Kuh, Arnold, and Vesper (1990) and Hayek and Kuh (1998) have disaggregated this overall measure of an institution's social environment and report that it is the extent to which peers and student groups are seen as friendly and supportive and to which faculty are seen as approachable, helpful, and encouraging that have the most important positive implications for how much students report learning during college. Using a different approach to the measurement of institutional environments, Astin (1993b) reports generally similar findings in his analyses of the 1985-1989 CIRP data. With controls for important student precollege traits and other potentially confounding influences, a scale termed "student orientation of faculty," that measured the extent to which an institution's faculty were accessible to students and concerned with them as individuals, had a statistically significant, positive impact on students' self-reported growth in writing skills. More generally, the student orientation of the institution, as indicated by the percentage of the institutional budget spent on student services, also positively influenced self-reported writing skill gains.
Although they are perhaps not directly comparable, somewhat different results are obtained when the acquisition of subject matter competence and academic skills is assessed with objective, standardized measures rather than student self-reports. For example, Astin (1993b) found that faculty perceptions of the extent to which an institution's environment was characterized by competition among students had net positive impacts on both Graduate Record Examination quantitative scores and scores on the Medical College Admissions Test. Similarly, the percentage of an institution's budget spent on student services, which positively influenced self-reported writing gains, had a negative impact on the general knowledge score of the National Teachers Examination.
A third body of evidence has focused on whether or not an institution's environmental emphasis on vocational preparation influences the acquisition of subject matter knowledge or academic skills. Nearly all of the research in this area has employed a two-item scale, developed by Kuh and colleagues, that taps students' perceptions of the practical value of the coursework at their institution and the extent to which their institution emphasizes the development of vocational and occupational competence (Kuh et al., 1990). There is replicated evidence on different samples to suggest that an institution's practical/vocational emphasis, as measured by this scale, has statistically significant positive effects on students' self-reported gains in understanding science and technology. This effect persists even in the presence of controls for institutional type, individual student academic and social involvement, and measures of the scholarly and social environment of the institution (Arnold et al., 1991, 1993; Kuh et al., 1990; Kuh, Arnold et al., 1991; Watson & Kuh, 1996). There is less consistent evidence to suggest that an institution's practical/vocational emphasis has a net influence on more general education gains such as students' self-reported growth in understanding literature and the arts or in writing skills. Some investigations suggest that it does (Hayek & Kuh, 1999), some suggest that it does not (Arnold et al., 1993; Davis & Murrell, 1993; Watson & Kuh, 1996), and still others suggest that the effects are positive in some institutional types and negative in others (Kuh et al., 1997).
When the criterion is objective, standardized tests rather than student self-reports, there is little evidence to suggest that the practical/vocational emphasis of an institution has consistent net effects on student learning. Using the scale developed by Kuh et al. (1990), and controlling for student academic ability, measures of student academic and social involvement during college, as well as other measures of the institutional environment, Whitt et al. (in press) found that the practical/vocational emphasis of a community college environment had a statistically significant, positive effect on reading comprehension. However, this effect did not hold for four-year colleges, and the same environmental dimension had only trivial, non-significant impacts on other learning outcomes such as mathematics, writing skills, or science reasoning.
Finally, there is a body of literature arguing that an institutional environment that is relatively free of racial or gender bias may foster the learning of students of color and women (Gallos, 1995; Hayes & Flannery, 1997; Kraft, 1991; Sandler, Silverberg, & Hall, 1996). There is some evidence to support this contention, although in the case of gender equity the evidence is complicated by how one defines the institutional environment. Analyzing data from the National Study of Student Learning, Cabrera, Nora, Terenzini, Pascarella, and Hagedorn (1999) sought to determine if a measure of perceived racial discrimination inhibited the self-reported learning gains of both Caucasian and African-American students. The measure of discrimination in the institutional environment was based on students' perceptions of such things as witnessing discriminatory gestures or words directed at minorities or witnessing acts of racism or prejudice. With statistical controls in place for such factors as secondary school and college grades, parental encouragement, and measures of the social and academic experience of college, the level of prejudice or discrimination in the institutional environment had a statistically significant and negative total impact on the self-reported gains of African-American students in quantitative skills, analytical thinking, and understanding the fine arts. About two-thirds of the effect was indirect, mediated through the negative influence of prejudice/discrimination on the quality of African-American students' academic and social experiences. The environmental prejudice/discrimination scale had little or no impact on the self-reported learning gains of Caucasian students.
Similar findings are reported by Hagedorn, Siadat, Nora, and Pascarella (1997) using essentially the same sample and measure of environmental prejudice/discrimination as Cabrera et al. (1999), but with gains on an objective measure of mathematics skills as the dependent variable. With controls for such factors as full- or part-time enrollment, quality of teaching received, study habits, work responsibilities, and social involvement, environmental prejudice/discrimination negatively influenced gains in mathematics skills made by students of color. In contrast to Cabrera et al., however, Hagedorn et al. also found environmental prejudice/discrimination negatively influenced the gains in mathematics skills made by Caucasian students.
Both the Cabrera et al. (1999) and Hagedorn et al. (1997) investigations were conducted on the same sample, and we failed to uncover research based on an independent sample that directly replicates their findings.[5] However, Astin's (1993b) extensive analyses on the 1985-89 CIRP data provide evidence that is not inconsistent with their findings. With controls for student academic ability and a large number of other potentially confounding influences, Astin found that the diversity emphasis of an institution had a significant, positive effect on the quantitative score of the Graduate Record Examination.
A number of scholars have posited that American coeducational postsecondary institutions often create a "chilly climate" for women which can be detrimental to their intellectual and personal development (Allen & Niss, 1990; Boyer, 1987; R. Hall & Sandler, 1982, 1984; Holland & Eisenhart, 1990; Sandler et al., 1996). Although the presence of such a "chilly climate" has been challenged by observational studies of classroom behaviors (Brady & Eisler, 1996; Cornelius, Gray, & Constantinople, 1990; Fassinger, 1995; D. Williams, 1990), other researchers have attempted to determine if women's perceptions of gender inequity in an institution's environment inhibit their learning (Hagedorn, Siadat, Nora et al., 1997; Pascarella et al., 1997; Whitt, Edison, Pascarella, Nora, & Terenzini, 1999b). This latter body of research is based on analyses of a single sample (the National Study of Student Learning), and gender inequity (or the chilly climate) is assessed by a scale measuring women's perceptions of the extent to which they have experienced such things as being singled out in class for their gender, being treated differently by faculty because they are women, or observing prejudice against women by other students at the institution.
The findings from this research are mixed. When the institutional environment is assessed from the level of individual women's perceptions of gender inequity, the weight of evidence supports the contention that gender inequity has a negative impact on how much women learn during college. Controlling for such factors as precollege academic ability, the type of coursework taken, and other measures of academic and social involvement, the gender inequity or chilly climate scale had statistically significant, negative impacts on: a standardized measure of learning consisting of reading, mathematics, and critical thinking for two-year college women (Pascarella et al., 1997); gains on a standardized measure of mathematics skills for women at both two- and four-year colleges (Hagedorn, Siadat, Nora et al., 1997); and self-reported gains in writing and thinking skills, understanding science, and understanding the arts and humanities for women at both two- and four-year colleges (Whitt et al., 1999b). The only exception to this was a small positive influence of institutional gender inequity on reading comprehension for women at four-year colleges (Whitt et al., 1999b).
When environmental gender equity was assessed at the aggregate institutional level, however, the nature of the findings was markedly different. Whitt et al. (in press) redefined the gender inequity environment as the average responses on the gender inequity/chilly climate scale by the sample of women at each institution. Subsequently, this aggregate measure of institutional-level gender inequity was found to have positive associations with women's gains on a standardized measure of mathematics skills. This association persisted even in the presence of statistical controls for factors such as a student's precollege academic ability, extensive measures of a student's academic and social effort and involvement, and other measures of the aggregate institutional environment.
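The difference between the two operational definitions discussed above (each woman's own perception versus the average perception at her institution) can be illustrated with a minimal sketch. The records, institution names, and scale values below are hypothetical and are not drawn from the National Study of Student Learning; the code only shows how an individual-level score and an aggregate institutional score end up as two distinct variables for the same respondent.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (institution, one woman's score on a perceived
# gender inequity scale). All values are invented for illustration.
responses = [
    ("College A", 2.0),
    ("College A", 4.0),
    ("College B", 1.0),
    ("College B", 2.0),
    ("College B", 3.0),
]


def institutional_means(records):
    """Average the individual scale scores within each institution."""
    by_institution = defaultdict(list)
    for institution, score in records:
        by_institution[institution].append(score)
    return {name: mean(scores) for name, scores in by_institution.items()}


def attach_both_measures(records):
    """Return one row per respondent carrying both versions of the measure."""
    means = institutional_means(records)
    return [
        {
            "institution": institution,
            "individual_score": score,               # her own perception
            "institution_mean": means[institution],  # aggregate environment
        }
        for institution, score in records
    ]


rows = attach_both_measures(responses)
```

In this sketch two women at the same institution share one aggregate score while differing on their individual scores, which is one reason the two definitions need not yield the same estimated effects.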
A Final Thought on Between-College Effects
If there is one thing that characterizes the research on between-college effects on the acquisition of subject-matter knowledge and academic skills, it is that in the most internally valid studies even the statistically significant effects tend to be quite small and often trivial in magnitude. (For excellent examples of studies that estimate the size of between-college effects relative to other influences, see Angoff & Johnson, 1990; Ethington, 1998; Kuh & Hu, 1999a.) There may, of course, be some methodological and measurement reasons for this phenomenon (see, for example, Pascarella & Terenzini, 1991, pp. 80-83). However, perhaps the most useful explanation from a substantive standpoint is one argued cogently by Baird (1988; 1991) and Pace (1997) and supported empirically in creative studies by Smart and his colleagues (Smart, 1997; Smart & Feldman, 1998). Specifically, we find few major between-college influences because aggregation of characteristics or environmental stimuli at the institutional level provides indexes that are simply too remote from the actual social and intellectual forces that shape individual learning in college. Though it likely varies by institutional size and mission, the vast majority of colleges and universities in the American postsecondary system have important subenvironments with more immediate and powerful impacts on individual students. Because of these subenvironments, we should probably anticipate a greater diversity of impacts within than between institutions. We turn now to a synthesis of the impact of some of these subenvironments in the next section on within-college effects.
Conclusions from
In our 1991 synthesis we came to the following general conclusions:
1. Although it is likely that students learn the most during college in the subject matter specific to their major, we found little methodologically sound evidence to indicate that academic major had a differential influence on the acquisition of subject matter knowledge or academic skills outside the major.
2. What is learned during college is differentially influenced by the pattern of courses taken, even when student ability is controlled. This research is in its nascent stages, however, and we cannot yet determine if replicable patterns of differential coursework effects on learning exist across institutions.
3. There is little consistent evidence to suggest that mastery of factual subject matter content is accomplished any more efficiently in small, discussion-oriented classrooms than in large, lecture-oriented classrooms. However, the evidence does suggest that the former may be preferable when the goal of instruction is affective or higher-order cognitive skills.
4. Considerable evidence exists to suggest that certain individualized instructional approaches or systems emphasizing small, modularized units of content, mastery of one unit before moving to the next, timely and frequent feedback to students on their progress, and active student involvement in the process of learning are consistently effective in improving subject matter learning over more traditional instructional formats such as lecture and recitation. Of the five individualized instructional approaches reviewed, four of them (audio-tutorial, computer-based, programmed, and visual-based instruction) showed statistically significant, modest learning advantages over traditional approaches of 6 to 10 percentile points. The typical advantage attributable to the personalized system of instruction (PSI or the Keller Plan) was 19 percentile points.
5. Given the nature of the evidence, it is probably an overstatement to say that we know what causes effective teaching. However, we do know what effective teachers do and how they behave in the classroom. Student subject matter learning appears to be enhanced when teachers (1) structure and organize class time well, (2) are clear in their explanation of concepts, (3) have a good command of the subject matter and are enthusiastic in its presentation, (4) present unambiguous learning stimuli to students (for example, using examples and analogies to identify key points, signaling a topic transition clearly), (5) avoid vague terms and language mazes, and (6) have good rapport with students in class (are open to student opinions, encourage class discussion, and the like) and are accessible to students outside of class. Perhaps the most important finding in research on teacher behaviors that are associated with student learning is that some behaviors may themselves be learnable (for example, structuring and organizing class time efficiently, teacher clarity).
6. Not all subject matter learning in college is simply a function of what the institution does to the student in instructional settings. Rather, much depends on the quality of the student's effort in making use of the range of learning opportunities provided by the institution. Instructional strategies such as tutoring and studying material for the purpose of teaching it to someone else appear to enhance student involvement or effort in learning, thereby enhancing subject matter mastery. Course work, however, may not be the only arena where student involvement or effort is associated with increased learning. Though not as methodologically sound as that from instructional experiments, there is nevertheless a considerable body of correlational evidence to suggest that how much a student perceives himself or herself as having learned in college is a function of his or her effort in the social as well as the academic system of the institution. Such effort seems to be independently and positively influenced by living on campus (versus commuting to college) and by attending a small institution.
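The percentile-point advantages cited in conclusion 4 can be related to effect sizes expressed in standard deviation units. The following is a minimal sketch, assuming that achievement scores are normally distributed and that a percentile-point advantage describes how far above the 50th percentile of the comparison group the average student in the treatment group falls. Both assumptions, and the function names, are ours for illustration; the conversion is not taken from the studies reviewed.

```python
from statistics import NormalDist

# Standard normal distribution (mean 0, standard deviation 1).
# Normality of achievement scores is an assumption of this sketch.
_STANDARD_NORMAL = NormalDist()


def percentile_advantage(effect_size):
    """Percentile points above the 50th percentile for a given effect size (in SD units)."""
    return (_STANDARD_NORMAL.cdf(effect_size) - 0.5) * 100


def effect_size_for_advantage(percentile_points):
    """Inverse conversion: effect size (in SD units) implied by a percentile-point advantage."""
    return _STANDARD_NORMAL.inv_cdf(0.5 + percentile_points / 100)


psi_effect = effect_size_for_advantage(19)   # advantage reported for PSI
low_effect = effect_size_for_advantage(6)    # lower bound for the other four approaches
high_effect = effect_size_for_advantage(10)  # upper bound for the other four approaches
```

Under these assumptions, a 19 percentile point advantage corresponds to roughly half a standard deviation, and advantages of 6 to 10 percentile points correspond to roughly 0.15 to 0.25 of a standard deviation.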
Evidence in the 1990s pertaining to within-college effects on the acquisition of verbal, quantitative, and subject matter competence is both extensive and varied. Indeed, we uncovered literally hundreds of relevant studies and scores more that were marginally relevant. In order to make sense of this voluminous literature, we have grouped the studies into the following eight general topics or clusters: (1) academic major, (2) coursework patterns, (3) class size, (4) general pedagogical approaches, (5) focused classroom instructional techniques, (6) teacher behaviors, (7) academic effort/involvement, and (8) social and extracurricular effort/involvement. Within these general clusters, we have limited our review to specific areas of inquiry where there is a reasonable body of evidence upon which to base conclusions. Thus, we have tended to exclude evidence based on the findings of single studies which lack replication or, at least, the attempt at replication.
Academic Major
We found little in the research from the 1990s that is at odds with the conclusions from our previous synthesis concerning the impact of academic major. Net of precollege ability and other potential confounding influences, undergraduates tended to make the greatest gains in subject matter areas consistent with their major area of study. This finding was particularly true of majors that place a premium on quantitative competencies. For example, analyzing the 1985-89 CIRP data, Astin (1993b) and Anaya (1992; 1996) found that majoring in physical science, engineering and technical fields had small positive effects on GRE quantitative scores. Similar findings for the physical and biological sciences are reported by Angoff and Johnson (1990) with an independent sample of over 20,000 students who took both the SAT and the GRE. The evidence is less clear-cut for the acquisition of verbal skills. Astin (1993b) found that majoring in the social sciences positively influenced GRE verbal scores, but this was not replicated by Anaya (1992; 1996), working with the same data, or by Angoff and Johnson (1990), analyzing an independent sample. Similarly, Lehman and Nisbett (1990) found that gains in verbal reasoning during college were essentially unrelated to whether a student majored in the natural sciences, humanities, social sciences, or psychology.
The evidence with respect to the impact of major on professionally related knowledge is, at best, mixed and therefore less than convincing. Analyzing the same data, Astin (1993b) and Opp (1991) report somewhat different findings with respect to the net impact of majoring in education on National Teachers Examination scores. Controlling for academic ability and other confounding influences, Astin found that majoring in education negatively influenced the general knowledge score on the NTE, but positively influenced the NTE professional knowledge score. Opp, however, reported that majoring in education (versus another major) neither facilitated nor inhibited scores on any of the NTE area tests. We suspect that this difference in findings from the same sample reflects differences between the two studies in the confounding influences taken into account.
There is also evidence to suggest that academic major may have little consistent net impact on scores on the Medical College Admission Test. Net of confounding influences, Astin (1993b) found that no specific major enhanced scores on the MCAT, although scores were negatively influenced by majoring in the allied health fields. Anaya's (1999a) analyses suggest that, with the possible exception of a very small, positive influence of majoring in the physical sciences, undergraduate major field of study is essentially unrelated to MCAT scores. Consistent with the conclusion that academic major has little or no impact on MCAT scores is a body of recent research suggesting that performance in medical school, as measured by such criteria as clinical science grades, scores on the National Board of Medical Examiners Examination, or U.S. Medical Licensing Examination scores, is largely unrelated to undergraduate major when undergraduate grades and other confounding influences are taken into account (M. Hall & Stocks, 1995; S. Smith, 1998; Sorenson & Jackson, 1997).[6]
Coursework Patterns[7]
It seems logical, perhaps even axiomatic, that what students learn in college is related to the coursework they take as undergraduates (E. Jones & Ratcliff, 1990, 1991; Pike, 1992a; Ratcliff, Jones, Guthrie, & Oehler, 1991; Ratcliff & Yaeger, 1994). Not surprisingly, there is a substantial body of knowledge to support this contention, even when student academic ability is controlled. However, beyond the unsurprising conclusion that students tend to acquire the most knowledge and the highest level of academic skills in areas where they take the most courses, it has been difficult to uncover consistent patterns of effects across institutions. This difficulty in itself is not particularly surprising since even courses with the same name or topic may differ substantially across institutions in terms of both the content covered and the level at which they are taught.
The most focused research in this area has been conducted by Ratcliff and colleagues (E. Jones & Ratcliff, 1991; Ratcliff, 1993; Ratcliff & Jones, 1993; Ratcliff & Yaeger, 1994). This body of research has most often used the Graduate Record Examination as the dependent variable and has typically attempted to control for student precollege academic ability, usually in the form of SAT or ACT scores. The method is empirically driven and essentially consists of calculating residual GRE scores; that is, the difference between one's actual GRE score and the GRE score one is predicted to get based on precollege academic ability scores. Subsequently, student transcripts are examined and the courses reported on them are clustered into patterns based on the residual scores of the students who enrolled. In this way, clusters of courses associated with the greatest net learning (i.e., the largest positive residual scores) can be identified (Ratcliff & Jones, 1993). This "Coursework Cluster Analysis Model" (Ratcliff, 1993) has been applied at a number of different institutions and it would appear that there are two basic generalizations from the findings: 1) students who took different patterns of coursework learned different things and developed different academic skills, at least as measured by different item types on the Graduate Record Examination, and 2) the structure of general education or a core curriculum in institutions did not produce important impacts on learning as measured by the GRE. For example, quantitative skills were linked to upper-division course work in economics, music, physical therapy, and business as well as general mathematics courses. Because the coursework cluster analytic model is empirically rather than theoretically driven, however, the findings concerning general education may have an alternative explanation. The linking of non-quantitative courses with high mathematics residuals may be correlational rather than causal. Students who develop strong quantitative abilities in general education courses may simply be more likely to subsequently enroll in specific non-quantitative courses.
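The residual-score logic described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the Coursework Cluster Analysis Model itself: it predicts GRE scores from a single precollege score with ordinary least squares and then averages the residuals of the students enrolled in each course, rather than performing a cluster analysis of coursework patterns. The student records, scores, and course codes are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical student records; scores and course codes are invented.
students = [
    {"sat": 1000, "gre": 520, "courses": ["ECON301", "HIST110"]},
    {"sat": 1100, "gre": 540, "courses": ["HIST110", "ENGL120"]},
    {"sat": 1200, "gre": 610, "courses": ["ECON301", "MATH101"]},
    {"sat": 1300, "gre": 630, "courses": ["MATH101", "ENGL120"]},
]


def fit_line(xs, ys):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def residual_scores(records):
    """Actual GRE score minus the GRE score predicted from the precollege score."""
    intercept, slope = fit_line(
        [r["sat"] for r in records], [r["gre"] for r in records]
    )
    return [r["gre"] - (intercept + slope * r["sat"]) for r in records]


def mean_residual_by_course(records):
    """Average residual of the students enrolled in each course, highest first."""
    by_course = defaultdict(list)
    for record, residual in zip(records, residual_scores(records)):
        for course in record["courses"]:
            by_course[course].append(residual)
    averaged = {course: mean(values) for course, values in by_course.items()}
    return sorted(averaged.items(), key=lambda item: item[1], reverse=True)


ranking = mean_residual_by_course(students)
```

Courses at the top of the ranking are those whose enrollees scored higher on the GRE than their precollege scores would predict. As the text cautions, such an association is correlational and does not by itself show that the courses produced the gains.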
It is also quite possible that the Graduate Record Examination is not the most content-valid instrument for assessing the impact of a general education curriculum. Indeed, analyses of data from a national sample and a single institution sample of seniors who took the College Basic Academic Subjects Examination (CBASE), a test specifically designed to measure the outcomes of general education, found that whether or not students took general education courses in physical sciences/engineering, business/accounting/economics/statistics, or liberal arts significantly predicted scores on the four CBASE subtests, even in the presence of controls for precollege academic ability (Pike, 1992b). In a more focused, single institution study Olsen (1991) reported similar results for the impact of general education science-oriented course work on sophomores' scores on the science reasoning module of the Collegiate Assessment of Academic Proficiency. Unfortunately, a somewhat counterintuitive finding reported by Knight (1993a; 1993b) increases ambiguity about the effects of general education coursework. Using the College Outcome Measures Project (COMP) Objective Test (another measure of general education skills) as the dependent measure, Knight found just the reverse of what might be anticipated. With institutional type and entering academic ability taken into account, students attending institutions where less than 40% of undergraduate curricular requirements were devoted to general education and where there was not equal distribution of general education courses within the requirement had significantly higher gains on the COMP than students at institutions where 40% or more of the undergraduate curriculum was devoted to general education and where there was equal distribution of courses within the general education requirement. This finding seems more consistent with those of Ratcliff (1993) than with those of either Pike or Olsen.
Although it is often not the major focus of inquiry, evidence pertaining to the impact of coursework on the acquisition of subject matter competence and academic skills is reported in a substantial number of additional studies. The outcomes examined include: historical knowledge (Grossweiler & Slevin, 1995); mathematics skills (Gray & Taylor, 1989); reading comprehension (Bohr, 1994/1995); writing skills (Edison et al., 1998); verbal and quantitative scores on the Graduate Record Examination (Anaya, 1992, 1996; A. Astin, 1993b; Hurtado, 1990); general academic knowledge and skills (Pike, 1991b); the Medical College Admissions Test (Anaya, 1999c; A. Astin, 1993b); the Law School Admissions Test (A. Astin, 1993b); and the National Teachers Examination (A. Astin, 1993b). Nearly all of the studies introduce controls for precollege academic ability (SAT/ACT scores), and, in some cases, other important confounding influences such as race, gender, academic motivation, and full- or part-time enrollment. While there are one or two unexpected findings and exceptions (Pike, 1991b), the weight of evidence from this body of studies is quite clear in suggesting that scores on each of the outcome instruments are significantly, if modestly, improved by taking undergraduate courses that emphasize content or the acquisition of skills consistent with what the instrument measures. Put differently, other things being equal, undergraduates learn and become skilled in what they study. Thus, history knowledge is improved by the number of history courses taken; mathematics and quantitative skills by the number of mathematics and science courses taken; reading comprehension and verbal ability by the number of literature, English, foreign language, and writing skills courses (and somewhat surprisingly, by the number of science courses taken); writing skills by the number of courses taken in the arts and humanities; Medical College Admissions Test scores by the number of science courses; and scores on the Law School Admissions Test and the National Teachers Examination by the number of interdisciplinary courses taken.[8], [9]
Class Size
The literature we reviewed from the decade of the 1990s suggests that we may need to revise, at least to a certain extent, our 1991 conclusion that subject matter knowledge is acquired with equal proficiency in large as well as small classes. We uncovered ten studies that focus on the effects of class size on course learning. All of the investigations are quasi-experimental or correlational in design and, with some variation across studies, each attempts to control for important confounding influences such as academic ability or prior achievement, instructor experience, amount of homework, and the like. Unfortunately, five of the studies used course grade as the measure of learning, one study used both course grade and a common final examination, and four studies (all in the field of economics) employed an objective standardized measure: the Test for Understanding of College Economics (TUCE).
The weight of evidence from the body of research using course grade as the dependent measure is reasonably clear in suggesting that, other factors being equal, increasing class size has a statistically significant, negative influence on subject matter learning. This finding held in five of the six studies that used course grade as a measure of learning (Biner, Welsh, Barone, Summers, & Dean, 1997; Keil & Partell, 1998; Raimondo, Esposito, & Gershenberg, 1990; Scheck, Kinicki, & Webster, 1994; P. Thompson, 1991). The sixth study (Goldfinch, 1996) found no difference in course grades between students in small classes and students in large classes, but the former had significantly higher scores than the latter on a common final examination.
In the four studies of economics classes using the same standardized measure of economics knowledge (TUCE) as the dependent variable, the weight of evidence is less clear. This lack of clarity may be attributable to the use of different samples from the same data, or different operational definitions of class size. In the most extensive of these investigations, Zietz and Cochran (1997) analyzed data from 189 separate economics courses taught at 53 separate institutions. The institutions included community colleges, liberal arts colleges, comprehensive universities, and doctoral granting institutions. With controls for such factors as institutional control and type, pre-course TUCE score, cumulative college grades, gender, interest in economics, course meetings per week, amount of homework, and instructor experience and professional preparation, Zietz and Cochran found that increasing class size beyond 30 students negatively influenced individual student scores on the post-course administration of the TUCE. However, a study of a subsample of the same data, but using classes rather than individuals as the unit of analysis, failed to find that class size negatively influenced achievement on the TUCE (Kennedy & Siegfried, 1997). Similarly, a third study of 12 economics sections at a single institution (Lopus & Maxwell, 1995) found that class size had a statistically non-significant impact on course achievement when controls were in effect for a battery of potential confounding influences, including pre-course TUCE scores, college grades, and course instructor characteristics. Most recently, Becker and Powers (2001) suggest that, when a statistical adjustment is made for sample attrition, initial class size has a negative influence on pre- to post-course gains on the TUCE. However, they caution that their results are only suggestive and tentative. The conflicting evidence and continuing methodological problems surrounding this small body of research make it difficult to form a firm conclusion.
Taken as a body of research it would appear that, when learning is measured by course grade, class size may have a negative influence. However, when learning is assessed by a standardized measure, the evidence is not compelling that class size has a negative influence, at least in the field of economics.
General Pedagogical Approaches
In this section we attempt to synthesize a vast literature on the learning impacts of general or broad-based pedagogical/instructional approaches. These approaches include: learning for mastery, computer and information technology, distance learning, active learning, collaborative learning, cooperative learning, small-group learning, supplemental instruction, constructivist-oriented approaches (e.g., constructivist teaching, problem-based learning), and learning communities. Because of the literally hundreds of studies that have been conducted on these general instructional approaches we cannot review each investigation in detail. Rather, we will attempt to provide an overall assessment of what the weight of evidence suggests, citing as many relevant studies as we can. In some areas we benefit from the results of research syntheses that have already been conducted.
Learning For Mastery
In our 1991 synthesis, we reviewed the accumulated evidence pertaining to the effects on the acquisition of subject matter knowledge of the personalized system of instruction (PSI or the Keller Plan). In addition to concluding that PSI resulted in an average course content achievement advantage (over conventional methods) of about 19 percentile points, we also concluded that the instructional feature accounting for the majority of PSI's impact was the requirement that students demonstrate mastery of one unit of course content before moving on to the next. Mastery learning is not the exclusive domain of the personalized system of instruction, however. There is also a body of evidence on an instructional procedure termed "learning for mastery" (LFM), or group-based mastery learning (Bloom, 1968; Guskey, 1985), that we overlooked in our 1991 synthesis. Like PSI, learning for mastery requires students to demonstrate mastery of course content against an absolute criterion. However, as explicated by Guskey and Pigott (1988), it differs from PSI in that PSI requires students to pace themselves through self-instructional materials while the instructor provides support and individual assistance only when needed. In contrast, the mastery learning model relies primarily on a group-based and instructor-paced approach to instruction. As such, mastery learning is generally more conveniently adapted to classroom situations where a single teacher has charge of 25 or more students and both instructional time and the curriculum are relatively fixed. In a mastery learning class, the teacher determines the pace of the original instruction and directs accompanying feedback and corrective procedures (Guskey & Pigott, 1988). Students who fail unit quizzes in learning for mastery courses usually receive individual or group tutorial help on the unit before moving on to the new material or before being allowed to take the course final examination (Kulik, Kulik, & Bangert-Drowns, 1990).
(Excellent, detailed discussions of the implementation of learning for mastery are provided by Bloom, 1968; Guskey, 1985; Guskey & Pigott, 1988; Kulik et al., 1990).
We uncovered two meta-analyses of learning for mastery at the postsecondary level. Guskey and Pigott (1988) synthesized the results of 12 college-level studies, while Kulik, Kulik and Bangert-Drowns (1990) based their conclusions on a synthesis of 19 studies conducted with college samples. Our best estimate is that there is about 60-65% overlap in the two groups of studies considered. Both meta-analyses considered only experimental or quasi-experimental studies, and the subject areas included algebra, history, education, biology, reading, psychology and English. Guskey and Pigott found that group-based mastery learning approaches had an average course achievement advantage over conventional (non-mastery) approaches of .41 of a standard deviation (16 percentile points) while Kulik, Kulik, and Bangert-Drowns report a corresponding achievement advantage of .68 of a standard deviation (25 percentile points). It would also appear, in comparison to non-mastery approaches, that students in mastery taught classes spend more class time on-task or engaged in learning and that instructors in mastery taught classes use class time more efficiently (Guskey & Pigott, 1988), both of which may account for its impact on knowledge acquisition.
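The percentile-point figures reported for these effect sizes follow from a standard conversion that assumes normally distributed achievement scores: the average student in the mastery group falls at the percentile given by the normal cumulative distribution function evaluated at the effect size, and the advantage is that percentile minus 50. A minimal sketch of the arithmetic in Python follows; the function name is ours and purely illustrative, and the normality assumption is the usual convention rather than something either meta-analysis is quoted as stating.

```python
import math


def percentile_point_advantage(effect_size_sd: float) -> float:
    """Percentile-point advantage of the average treated student over the
    average comparison student, assuming normally distributed scores."""
    # Standard normal CDF, computed from the error function.
    cdf = 0.5 * (1.0 + math.erf(effect_size_sd / math.sqrt(2.0)))
    return 100.0 * cdf - 50.0


# Effect sizes (in standard deviation units) as reported in the text.
reported = [
    ("Guskey & Pigott (1988)", 0.41),
    ("Kulik, Kulik, & Bangert-Drowns (1990)", 0.68),
]

for label, d in reported:
    points = percentile_point_advantage(d)
    print(f"{label}: {d:.2f} SD is about {points:.0f} percentile points")
```

Under that assumption, .41 and .68 of a standard deviation correspond to roughly 16 and 25 percentile points, matching the figures given above.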
Interestingly, there is not a great deal of evidence to suggest that group-based mastery learning is any less effective than mastery learning within the PSI self-paced model. For example, Kulik, Kulik, and Bangert-Drowns (1990) report an achievement advantage based on 67 college-level PSI studies of .48 of a standard deviation, compared to .68 for group-based mastery learning studies. Moreover, Semb, Ellis, and Araujo (1993) present experimental evidence suggesting parity between introductory psychology students taught by PSI or group-based mastery approaches, not only in end-of-course achievement, but also in course content retained four and eleven months after the end of the course.
Computers and Information Technology
It appears to be widely accepted that computers and related information technologies have the potential to fundamentally transform the nature of teaching and learning in postsecondary education (e.g., Green, 1996; Kozma & Johnston, 1991; Kuh & Vesper, 1999; Upcraft, Terenzini, & Kruger, 1999; West, 1996). The promise is, indeed, substantial.
Used appropriately and in concert with powerful pedagogical approaches, technology is supposed to enhance student learning productivity. It does this by enriching synchronous classroom activities and providing students with engaging self-paced and asynchronous learning opportunities that enable students to learn more than they would otherwise at costs ultimately equal to or below that of traditional classroom based instruction. . . . (Kuh & Vesper, 1999, p. 1)
Not surprisingly, during the decade of the 1990s there has been an extensive body of inquiry on the use of computers and related technologies to assist or augment postsecondary instruction. We found little in this literature that would lead us to fundamentally alter our 1991 conclusion that computer-assisted instruction is linked to modest increases in course-level achievement. In the early 1990s, a number of scholars produced either narrative or quantitative (meta-analytic) research reviews suggesting that computer-based instruction leads to modest improvements in subject-matter acquisition with a decrease in instructional time (Cartright, 1993; P. Cohen & Daganay, 1992; Kulik & Kulik, 1991; Leonard, 1990; Liao & Bright, 1991; McComb, 1994; Teich, 1991; Weller, 1997).
The most comprehensive of these research syntheses is the meta-analysis of Kulik and Kulik (1991). They synthesized the results of 149 experimental and quasi-experimental studies conducted from 1984 to 1991 with postsecondary samples. The course content was in mathematics, science, social science, reading and language, and vocational training. Ninety-one (91) of the studies were classified as computer-assisted instruction (CAI), where the computer provides (a) drill-and-practice exercises, but not new material, or (b) tutorial instruction that includes new material; 17 of the studies were classified as computer-managed instruction (CMI), where the computer evaluates student test performance, then guides students to appropriate instructional resources, and keeps records of student progress; and 35 of the studies were classified as computer-enriched instruction (CEI), where the computer (a) serves as a problem-solving tool, (b) generates data at the student's request to illustrate relationships in models of social or physical reality, or (c) executes programs developed by the student. In all three categories, computer-based instruction (versus traditional instructional approaches such as lecture, discussion, and text) produced average improvements in tested understanding of course content that were statistically significant. The average effect sizes were: CAI: .27 of a standard deviation (11 percentile points); CMI: .43 of a standard deviation (17 percentile points); and CEI: .34 of a standard deviation (13 percentile points). Overall, only chance differences in effect sizes were found for studies based on true experiments and those based on quasi-experiments. Weighting the effect sizes by the number of studies in each category, we computed an effect size across all three types of computer-based instruction of .31 of a standard deviation (12 percentile points).
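The overall estimate of .31 is a study-weighted mean of the three category effect sizes. A minimal sketch of that arithmetic in Python, using the category counts and effect sizes exactly as given in the text; the variable and function names are ours and purely illustrative.

```python
# (number of studies, mean effect size in SD units) for each category,
# taken from the figures reported in the text for Kulik and Kulik (1991).
categories = {
    "CAI": (91, 0.27),  # computer-assisted instruction
    "CMI": (17, 0.43),  # computer-managed instruction
    "CEI": (35, 0.34),  # computer-enriched instruction
}


def weighted_mean_effect(cats: dict) -> float:
    """Mean effect size, weighting each category by its number of studies."""
    total_studies = sum(n for n, _ in cats.values())
    weighted_sum = sum(n * d for n, d in cats.values())
    return weighted_sum / total_studies


overall = weighted_mean_effect(categories)
print(f"Study-weighted mean effect size: {overall:.2f} SD")
```

Weighting by study count gives each study equal influence, so the larger CAI category pulls the overall estimate toward its smaller effect size.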
Thirty-two (32) of the 149 postsecondary studies reviewed by Kulik and Kulik (1991) also compared the instructional time for students in computer-based and traditional classes. The ratio of instructional time for computer-based students to instructional time for students in traditional classes averaged .70 across all comparisons. In other words, students in computer-based classes required about 70% as much instructional time as their counterparts in traditionally taught classes.
Since the publication of Kulik and Kulik's (1991) meta-analysis, the impact of computer-based instruction on learning has continued to be the focus of substantial inquiry. Most of this research employs true experimental or quasi-experimental designs in which various forms of computer-based instruction are compared with traditional or conventional instructional approaches such as lecture, discussion, or text. The weight of evidence from this more recent body of research is quite consistent in suggesting that, compared to similar students taught by traditional instructional methods, the knowledge acquisition of students in computer-based courses is either: 1) significantly better (Agarwal & Day, 1998; Alavi, 1994; Askar & Koksal, 1993; Connolly, Eisenberg, Hunt, & Wiseman, 1991; Faryniarz & Lockwood, 1992; Gregor & Cuskelly, 1994; Huang & Aloi, 1991; Lowe & Bickel, 1993; Marttunen, 1997; Mose & Maney, 1993; Reisman, 1993; Riding & Chambers, 1992; Vitale & Romance, 1992; Williamson & Abraham, 1995), or 2) not significantly different (Adams, Kandt, Thronmartin, & Waldrop, 1991; Billings & Cobb, 1992; Carlsen & Andre, 1992; Guy & Frisby, 1992; Marrison & Frick, 1993; Murphy & Davidson, 1991; Smeaton & Keogh, 1999; Taraban & Rynearson, 1998; Tjaden & Martin, 1995; White, 1999). There was only isolated evidence in which students taught by traditional methods significantly outperformed students receiving computer-based instruction (Watkins, 1998). We computed an effect size for all of the studies in this recent body of literature that provided the requisite statistical information. The average effect size, favoring computer-based instruction, was .28 of a standard deviation (11 percentile points). There appeared to be little or no difference in the mean effect sizes of studies using true experiments or quasi-experiments.
Though admittedly this is a rougher estimate, it is nevertheless quite consistent with the overall effect size of .31 of a standard deviation that we derived from Kulik and Kulik's (1991) comprehensive meta-analysis. Also consistent with Kulik and Kulik's conclusions was evidence suggesting that students in computer-based classes require less instructional time than their counterparts in traditionally-taught classes (Leonard, 1992; Taraban & Rynearson, 1998; Tjaden & Martin, 1995).[10]
Although our synthesis found extensive work focusing on computer use in individual courses, we uncovered only two studies of the impact of computer use during college on student learning. Possibly because they use different measures of computer use, different institutional samples, and different measures of student learning, the results of the two studies are only partially consistent. Kuh and Vesper (1999) analyzed data from over 125,000 students at 205 four-year institutions. With controls for such factors as college grades, age, gender, work responsibilities, parental education, and educational aspirations, a measure of the extent to which students felt they gained a familiarity with computers was significantly and positively associated with self-reported gains in such areas as writing clearly, problem solving, and self-directed learning.
Flowers, Pascarella, and Pierson (1999) considered the impact of both e-mail and different types of computer use on objective, standardized measures of end-of-first-year reading comprehension and mathematics. Their sample was drawn from the 5 two-year and 18 four-year institutions participating in the National Study of Student Learning. With controls in place for such factors as precollege reading and mathematics achievement, academic motivation, patterns of coursework taken, the academic selectivity of the institution attended, and the quality of instruction received, they found differences in the impact of computer use between the two- and four-year samples. Electronic mail use had no significant impact on either reading comprehension or mathematics for the four-year sample, but had statistically significant, if modest, negative impacts on both outcomes in the two-year sample. Consistent with the literature on computer-based classroom instruction, the use of computers for class assignments had a net positive impact on reading comprehension for two-year college students. The corresponding effect for four-year college students, however, was trivial and statistically non-significant. Engaging in computer word-processing had a small but statistically significant, positive impact on reading comprehension for the four-year sample which persisted even when additional controls were introduced for students' first-year writing experiences.
Interestingly, the extensive body of research indicating consistent, albeit modest, positive impacts of computer-based instructional approaches on student learning appears to have initiated an ongoing and vigorously contested debate in the educational technology field. Clark (1991; 1994) has argued that the results of such research and of research syntheses such as those by Kulik and Kulik (1991) are questionable because the medium of instruction does not influence learning; the quality of teaching and instruction does. The essence of his argument appears to be that studies comparing the relative advantage for student learning of one instructional medium (e.g., computers) over another (e.g., lecture) will inevitably confound the instructional medium with the quality or type of instruction received. Consequently, all the supposed learning benefits attributed to computer-based instruction could just as easily be explained by the specific instructional methods they support or augment (Ehrmann, 1995; Weller, 1996).
On the other, or at least a different, side of this debate, Kozma (1991; 1994a; 1994b) and Reiser (1994) have argued that the specific ways computers or instructional technology are employed are not irrelevant to instruction or student learning. Indeed, certain technological attributes make certain kinds of instructional approaches possible, or enhance their impact. Moreover, some types of computer applications may be particularly effective in supporting some kinds of instructional approaches or learning goals, with some kinds of learners. In effect, computer or information technology approaches may interact with learner characteristics. The essence of these arguments would appear to be that it is probably fruitless to focus on computers as a conveyer of some type of instructional approach. Rather, what counts is how information technology or visual media integrated with instructional approaches can facilitate knowledge construction and meaning-making and, thereby, increased learning, on the part of students (Kozma, 1994a, 1994b).[11]
Distance Learning
Paralleling, and indeed dependent upon, the growth and development of information and media technologies has been the dramatic growth of distance or remote site instructional offerings in postsecondary education (El-Khawas, 1995; Keegan, 1993; Moore & Thompson, 1997; Walsh & Reese, 1995). For example, a 1997 report by the National Center for Education Statistics found that about 60% of American public two- and four-year institutions offered distance education courses, usually in the form of either one-way prerecorded courses or two-way interactive video courses (Lewis, Farris, & Alexander, 1997). Distance education has been used to deliver remote site or off-campus courses in a variety of fields such as religious education, business, library science, teacher education, general studies, medicine and nursing, social sciences, social work and scientific and technical fields (Burgess, 1994).
Literally hundreds of studies have addressed the issue of whether instruction delivered to remote sites, via various media technologies, is as effective as conventional on-campus, face-to-face instruction. In most instances, this research question is essentially the same as asking whether instructional media, such as television, videotapes, two-way interactive video, or computer conferencing positively or negatively influence student learning. Within the context of on-campus versus remote site instruction, the clear weight of evidence from this research appears to support the contention of Clark and others that the specific medium of instruction has little impact on how much students learn