Meaningful Assessment
Jon Travis

Citation: Travis, J. (1996). Meaningful assessment. Clearing House, 69(5), 308-312.


As accountability becomes a major driving force in American secondary education, the use of tests to evaluate performance has increased (Feuer, Fulton, and Morison 1993). Realistically, the use of testing for such summative grading or ranking of individuals or institutions encompasses only one of three basic purposes for evaluation (Bloom, Hastings, and Madaus 1971; Mitchell 1992); the other two are student placement and the improvement of learning, instruction, or an institution. Moreover, if students are tested simply to establish criteria for assigning grades and communicating institutional accountability, many truly educational outcomes of evaluation are ignored.

Critical Weaknesses of Assessment Procedures

Whether prepared by teachers for classroom use or designed by large publishing houses for standardized evaluation of students, educators, and institutions, traditional testing follows a typical objective format. The criteria measured in these numerous objective tests tend to be confined to knowledge and skills. In fact, the knowledge and skills determined to be important in the classroom are not even well represented in most standardized tests (Bickel 1994; Darling-Hammond and Lieberman 1992; Mitchell 1992; Neill and Medina 1989; Shepard 1989), which rely chiefly on lower cognitive skills (Kellaghan and Madaus 1991). Moreover, traditional testing is limited in measuring skills as well (Wiggins 1989).

Unfortunately, some educators maintain the assumption that knowledge and skills are the only viable criteria for measuring student outcomes in education. What they do not realize is that two additional criteria have been suggested as a necessary part of assessment: attitudes and behavior (Parnell 1990; Terenzini 1989). Cohen and Brawer (1969) suggested that behavior change in students is the ultimate criterion for measuring student learning. Although standardized tests are somewhat limited in measuring skill development, they may be almost useless in ascertaining student attitudes and behavior changes. Wiggins's (1989) medical analogy, comparing traditional testing to the use of pulse rate as a measure of a person's total health, is indeed apropos. Therefore, as long as we use traditional tests, we will have incomplete measurement.

Because those tests tend to be norm-referenced (comparing students and institutions with one another), they do little to enhance intended student outcomes (Wiggins 1989). Instead, students and educators have come to focus merely on the test scores, rather than using testing as a learning tool. Indeed, educators frequently teach students the test material itself (Brandt 1989; Haney and Madaus 1989; Mitchell 1992; Neill and Medina 1989; Shepard 1989; Worthen 1993). In an extreme example of this type of test preparation, it may even be possible for teachers to prepare students to pass tests that require reading, writing, and mathematics skills without actually developing these skills (Kellaghan and Madaus 1991).

Yet another pervasive inadequacy in standardized testing is the assumption that all students can be assessed using the same instrument (Brandt 1989; Herman 1992; Jervis 1989; Kellaghan and Madaus 1991; Maeroff 1991; Mitchell 1992). That notion is related to the harshest criticism usually leveled at standardized tests, that they are essentially biased (Feuer, Fulton, and Morison 1993; Haney and Madaus 1989; Wiggins 1989, 1993; Zappardino 1995). Presuming that each student has unique experiences, background, and learning styles, no single instrument could realistically be sufficient to measure such individual development. Herman (1992) indicated that the traditional, standardized format is anything but good assessment. Yet, that traditional testing practice remains widespread (Darling-Hammond and Lieberman 1992; Kellaghan and Madaus 1991; Stewart 1993).

Alternatives to traditional objective testing may be similarly flawed. These alternative assessments include performance-based tasks, portfolios, journals, interviews, and attitude inventories. They have been criticized for needing excessive administration time, continual criteria adjustment, and repeated alteration of evaluation questions (Maeroff 1991). Performance assessment, in particular, has been condemned for lack of validity and fairness (Cizek 1991); for insufficient psychometric rigor and excessive expense in both money and time (O'Neil 1992; Nuttall 1992); and for a lack of technical quality, standardization, and continuity among institutions (Worthen 1993). Because alternative assessment is still rather new, a lack of critical commentary from within has also been noted (Worthen 1993).

Perhaps the most serious problem with alternative assessments in connection with grading is the difficulty of maintaining consistency among different students (Popham 1995). Realistically, though, the process of grading, with its inherent subjectivity and arbitrariness, is perhaps the major problem itself. To alleviate such a problem, educators might consider the criterion of improvement or development by the student as an indicator of achievement. Belcher (1987) referred to this as "value-added assessment" (31). Based on the student's individual capabilities and prior progress, which are more readily captured through alternative assessment measures, this criterion of student growth should be recognized as more educationally responsible.

Meaningful Assessment

To ensure that any assessment activity used in schools is meaningful to the education process, educators must attend to the purpose of each test and to the criteria being measured. Assessment conducted merely for accountability reasons is not instructionally sound. To be meaningful, the act of assessment must in some way enhance the learning process. Hence, the specific approach used to obtain meaningful assessment is likewise critical.

Establishing Purpose for Assessment

The key, of course, for any type of assessment activity is to establish at the outset the purpose or goals of the process (Chittenden 1991; Kellaghan and Madaus 1991; Wiggins 1989). Wiggins (1989) challenged educators to determine who should really be served by these tests: schools, the public, politicians, or the students themselves. (Even the needs of the staff members who enter grades into computers are sometimes considered to be primary [Wiggins 1989].) One might hope that the true beneficiaries of any assessment would be our students.

In addition to identifying who is intended to benefit from any given testing process, another logical consideration is the intended outcome. Consequently, if an educator is chiefly concerned with assigning arbitrary rankings to students, other teachers, or institutions for the purpose of complying with mandates for records, summative evaluation is certainly indicated. If we are concerned with the pragmatic utility of assessment, however, we must establish whether summative evaluation can in fact be meaningful to the learning process. Of course, psychologists have concluded that such forms of evaluation can stimulate motivation by recognizing and rewarding intended performance. Unfortunately, we tend to expect more from summative forms of assessment, using them to provide the basis for high-stakes decision making (Popham 1995), even though beneficial results of the high-stakes assessment agenda have not been overwhelming.

Other purposes for assessment include student placement, which is better served by diagnostic assessment, and the improvement of learning, instruction, or an institution, which generally requires a formative procedure. Intended to measure progress and suggest areas for improvement, formative evaluation is considered to be especially valuable to the education process (Kellaghan and Madaus 1991; Wiggins 1989). This broad view of assessment and evaluation is more inclusive than Popham's (1995) position, which created an unfortunate distinction between teacher evaluation and student grading. He proposed that formative and summative assessments are the domain of teacher evaluation and implied that student assessment applies to students only in the context of grading. Nevertheless, students can improve their learning behaviors, especially if given helpful feedback in the assessment process (Stewart 1993).

Selecting Criteria

Once the purpose has been identified, the specific criteria to be measured, criteria that will assist the educator in meeting the selected purpose, should be determined. For both diagnostic and formative evaluation, rather finite measurement criteria may be useful on occasion. All four criteria groups (knowledge, skills, attitudes, and behavior) are desirable, especially for summative evaluation. Although the collection of data relating to student behavior and attitudes may be plagued by limitations (Maeroff 1991), constructing instruments to measure these attributes is still possible. In fact, even quantitative measurement of student attitudes and behavior has been attempted successfully (Benson 1993).

Likewise, multiple sources and types of data are preferable for summative evaluation. Both traditional and alternative methodologies can be considered, as long as no one measure is used exclusively for any single evaluation. The use of multiple sources of data, as in the case of faculty evaluation (Cashin 1990), will yield a more objective and truer picture of the student's development than traditional procedures (Brandt 1989; Mitchell 1992; Shepard 1989). That is especially true when the educator wants to take varying learning styles and strategies into consideration (Pintrich and Johnson 1990).

Such a holistic approach to gathering assessment data can be compared with the qualitative researcher's quest for a more complete understanding of the context. Just as qualitative research is validated by an exhaustive picture of the environmental context, evaluation can be similarly confirmed by collecting as much assessment data as possible. If we are indeed bound to the process of grading, a more complete collection of information on each student would offer more clarity and precision to educators, students, and parents. Grades and school records, after all, leave out important explanatory information as well as details on the significant achievements of students (Dixon 1989).

What is needed, essentially, is a new paradigm of student assessment that incorporates the four criteria of knowledge, skills, behavior, and attitudes and emphasizes the multiple measurement approach, which is basic to sound evaluation and qualitative research. Such a comprehensive view of assessment is crucial to the improvement of teaching and learning (Wiggins 1993). Once the purpose of assessment and the criteria to be measured are determined, a specific assessment technique may be selected.

Alternative Assessment Techniques

A wide range of options is available to the teacher who wishes to improve student assessment. What makes the following assessment techniques meaningful to the learning process is that they offer students the opportunity to recognize their progress and, more important, to discover what steps they can take to improve.

Performance assessment. Common to disciplines that typically focus on performance anyway, such as the arts and athletics, performance assessment can be used for any objective that is skill-based or behavioral. This type of assessment basically involves asking students to do something. Because much of education relates to skill development, performance assessment can be adapted and implemented easily.

Authentic assessment. Designed to reflect the behavior and skills required in real-world situations, authentic assessment focuses on student task performance and adds an aspect of relevance to the learning procedure (Herman, Aschbacher, and Winters 1992; Popham 1995). Intended to meet the needs of the individual learner (Puckett and Black 1994), authentic assessment gives students the opportunity to share the ownership of both the assessment and the learning processes (Zessoules and Gardner 1991). Thus assessment can be used to improve learning, and students share in the formative development of their own learning.

Portfolios. Often included in discussions of performance assessment, student portfolios offer a range of flexibility that makes this technique attractive to all disciplines. Containing student-selected examples of individual work and accomplishments, the portfolio can be designed to reflect work in progress or to highlight the student's completed efforts (Popham 1995). Like the teaching portfolio (Seldin 1991), the student version can be employed either in improving learning or in measuring achievement. As with many of the alternative assessment techniques, portfolios enable educators to address individual student differences and place much control of the assessment procedure in the hands of the student.

Journal, or learning log. Journals have been used in English curricula for many years as a tool for increasing student writing and motivation for writing. The fundamental purpose of the journal as an assessment tool is to allow the student to communicate directly with the teacher regarding individual progress, particular concerns, and reflections on the learning process (Herman, Aschbacher, and Winters 1992). Obviously, this communication can easily be subsumed by the portfolio, but it can stand by itself as well.

Interview. As the name implies, the interview involves direct personal communication between teachers and students, parents, or other teachers (Herman, Aschbacher, and Winters 1992). Although talking with students may not seem like assessment, much of this type of communication, which can provide information about student thinking processes, achievement, and mastery, is in fact a type of student assessment (Stiggins 1994). Clearly a time-consuming process, this type of assessment can be accomplished more easily through the application of the qualitative paradigm, which allows for limited, but purposeful, sampling. Not bound by standardization, each interview may be specific to the particular student's needs. However, successful interviews require participants to be open, honest, and focused on the purpose of the assessment (Stiggins 1994).

Attitude inventory. Attitude measurement is intended to identify both favorable and unfavorable student feelings about school personnel and activities (Stiggins 1994). Although some students may not take such opinion surveys seriously, teachers can enrich the process by emphasizing known student concerns and demonstrating a serious intent to correct difficulties. Although student motivation can be influenced significantly by specific student attitudes, the consistent use of attitude inventories in the classroom is not common (Stiggins 1994). Certainly, this type of assessment technique can provide substantial benefits.

Many of the alternative assessment innovations have been devoted primarily to student summative evaluation. Yet, each of the student assessment techniques can be equally useful in developing both student learning and instruction (Erwin 1991). Even objective testing can enhance the instructional focus on student development if the outcome of such testing underscores the necessary steps for correcting errors in judgment and thinking. Of particular benefit is the use of computer-assisted testing with the opportunity for immediate feedback. Naturally, teachers do not need to depend on computer availability to teach students the importance of learning from their mistakes.

Models specifically designed to assist in formative evaluation have also been developed and demonstrate a significant capability for improving both teaching and learning. One such model is classroom assessment (Angelo and Cross 1993). Originally designed for improving college teaching and learning, the classroom assessment model can be adapted easily to the secondary classroom. Because the purpose of this technique is essentially to foster communication about the learning process between faculty and students, the secondary classroom is an ideal environment for its use. Through this model, teachers can discover how students are learning and what instructional techniques work best for them. Thus, instruction and learning can be refined. Each of the fifty assessment techniques is simple to learn, adapt, and implement. Reactions from faculty who have used one or more of the techniques have been overwhelmingly positive (Angelo 1991).

A similar type of procedure for gathering instructional input was developed by Weimer, Parrett, and Kerns (1988). Consisting of activities and forms designed to promote reflection on instruction, this manual can be implemented by a teacher just as readily as Angelo and Cross's techniques. The inventories and questionnaires are designed to elicit specific information from students, colleagues, and instructors; the information can be used to improve instruction, classroom climate, and entire courses. Like Angelo and Cross, Weimer, Parrett, and Kerns have designed their formative tools for college faculty, although the secondary applications are similarly obvious.

Coffman (1991) offered yet another approach, which she called small-group diagnosis. This simple, four-step process is also intended to gather feedback from students during the course of a semester. With an outside facilitator in place of the instructor, students meet in small groups to construct responses relating to what they like about a course, what needs improvement, and what changes they suggest. The groups then meet together to pool their input and transfer their comments onto one form. Later the instructor meets with the facilitator to discuss the comments and possible changes in approach. Once changes are decided on, the instructor confers with the class about the process.

These approaches to formative evaluation all emphasize dialogue with students. Because a frequent complaint of secondary students is a lack of voice in classroom procedures (Goldstein 1995), the dialogue fostered by these techniques may provide additional benefits.

References

Angelo, T. A., ed. 1991. New directions for teaching and learning: Classroom research: Early lessons from success, no. 46. San Francisco: Jossey-Bass.

Angelo, T. A., and K. P. Cross. 1993. Classroom assessment techniques: A handbook for college teachers. San Francisco: Jossey-Bass.

Belcher, M. J. 1987. Value-added assessment: College education and student growth. In New directions for community colleges: Issues in student assessment, no. 59, edited by D. Bray and M. J. Belcher, 31-38. San Francisco: Jossey-Bass.

Benson, M. D. 1993. Behavior and attitude testing of Chicago junior high students. Phi Delta Kappan 74: 492-94.

Bickel, F. 1994. Student assessment: The project method revisited. Clearing House 68: 40-42.

Bloom, B. S., J. T. Hastings, and G. F. Madaus. 1971. Handbook on formative and summative evaluation of student learning. New York: McGraw-Hill.

Brandt, R. 1989. On misuse of testing: A conversation with George Madaus. Educational Leadership 46(7): 26-29.

Cashin, W. E. 1990. Assessing teaching effectiveness. In How administrators can improve teaching, edited by P. Seldin, 89-103. San Francisco: Jossey-Bass.

Chittenden, E. 1991. Authentic assessment, evaluation, and documentation of student performance. In Expanding student assessment, edited by V. Perrone, 22-31. Alexandria, Va.: Association for Supervision and Curriculum Development.

Cizek, G. J. 1991. Innovation or enervation? Performance assessment in perspective. Phi Delta Kappan 72: 695-99.

Coffman, S. J. 1991. Improving your teaching through small-group diagnosis. College Teaching 39: 80-82.

Cohen, A. M., and F. B. Brawer. 1969. Measuring faculty performance. Washington, D.C.: American Association of Junior Colleges.

Darling-Hammond, L., and A. Lieberman. 1992. The shortcomings of standardized tests. Chronicle of Higher Education, 29 Jan., B1-B2.

Dixon, A. 1989. Deliver us from eagles. In Disaffection from school? The early years, edited by G. Barrett, 13-24. London: Falmer Press.

Erwin, T. D. 1991. Assessing student learning and development: A guide to the principles, goals, and methods of determining college outcomes. San Francisco: Jossey-Bass.

Feuer, M. J., K. Fulton, and P. Morison. 1993. Better tests and testing practices: Options for policy makers. Phi Delta Kappan 74: 530-33.

Goldstein, S. 1995. Understanding and managing children's classroom behavior. New York: John Wiley & Sons.

Haney, W., and G. Madaus. 1989. Searching for alternatives to standardized tests: Whys, whats, and whithers. Phi Delta Kappan 70: 683-87.

Herman, J. L. 1992. What research tells us about good assessment. Educational Leadership 49(8): 74-78.

Herman, J. L., P. R. Aschbacher, and L. Winters. 1992. A practical guide to alternative assessment. Alexandria, Va.: Association for Supervision and Curriculum Development.

Jervis, K. 1989. Daryl takes a test. Educational Leadership 46(7): 10-15.

Kellaghan, T., and G. F. Madaus. 1991. National testing: Lessons for America from Europe. Educational Leadership 49(3): 87-93.

Maeroff, G. I. 1991. Assessing alternative assessment. Phi Delta Kappan 73: 272-81.

Mitchell, R. 1992. Testing for learning: How new approaches to evaluation can improve American schools. New York: Free Press.

Neill, D. M., and N. J. Medina. 1989. Standardized testing: Harmful to educational health. Phi Delta Kappan 70: 688-97.

Nuttall, D. L. 1992. Performance assessment: The message from England. Educational Leadership 49(8): 54-57.

O'Neil, J. 1992. Putting performance assessment to the test. Educational Leadership 49(8): 14-19.

Parnell, D. 1990. Dateline 2000: The new higher education agenda. Washington, D.C.: American Association of Community Colleges.

Pintrich, P. R., and G. R. Johnson. 1990. Assessing and improving students' learning strategies. In New directions for teaching and learning: The changing face of college teaching, no. 42, edited by M. D. Svinicki, 93-102. San Francisco: Jossey-Bass.

Popham, W. J. 1995. Classroom assessment: What teachers need to know. Boston: Allyn & Bacon.

Puckett, M. B., and J. K. Black. 1994. Authentic assessment of the young child: Celebrating development and learning. New York: Merrill.

Seldin, P. 1991. The teaching portfolio: A practical guide to improved performance and promotion/tenure decisions. Bolton, Mass.: Anker.

Shepard, L. A. 1989. Why we need better assessments. Educational Leadership 46(7): 4-9.

Stewart, D. M. 1993. Standardized testing in a national context. In Higher learning in America 1980-2000, edited by A. Levine, 344-59. Baltimore: Johns Hopkins University Press.

Stiggins, R. J. 1994. Student-centered classroom assessment. New York: Merrill.

Terenzini, P. T. 1989. Assessment with open eyes: Pitfalls in studying student outcomes. Journal of Higher Education 60: 644-64.

Weimer, M., J. L. Parrett, and M. Kerns. 1988. How am I teaching? Forms and activities for acquiring instructional input. Madison, Wis.: Magna.

Wiggins, G. 1989. A true test: Toward more authentic and equitable assessment. Phi Delta Kappan 70: 703-13.

Wiggins, G. 1993. Assessing student performance: Exploring the purpose and limits of testing. San Francisco: Jossey-Bass.

Worthen, B. R. 1993. Critical issues that will determine the future of alternative assessment. Phi Delta Kappan 74: 444-54.

Zappardino, P. H. 1995. FairTest: Charting a course for testing reform. Clearing House 68: 248-52.

Zessoules, R., and H. Gardner. 1991. Authentic assessment: Beyond the buzzword and into the classroom. In Expanding student assessment, edited by V. Perrone, 47-71. Alexandria, Va.: Association for Supervision and Curriculum Development.


Author Affiliation: Jon E. Travis is an assistant professor and the director of the Center for Community College Education in the Department of Secondary and Higher Education, East Texas State University, Commerce.