Aligning State-Level Standards and Assessments

At an orchestra concert, a music critic evaluates the performance against the score the composer put to paper.

At an employee's performance review, a supervisor evaluates the employee's performance by how well the employee measured up to the expectations outlined in the job description. 

In a classroom in a standards-based education system, the teacher evaluates student performance using assessments that align well with the established standards. Standards and assessments work together to guide the system toward students learning what they are expected to know and be able to do.

But correspondence between state-level standards and assessments tends to be only moderate, particularly in terms of depth of knowledge and range of knowledge, according to a recent study by WCER researcher Norman Webb and colleagues at WCER's National Institute for Science Education (NISE).

Terms used: Standards refers to the most general expectations for a grade and content area. Goal refers to the next level of specificity. Objectives further delineate the expectations stated in a goal.

Webb and colleagues analyzed the alignment of assessments and standards in mathematics and science in grades 3 through 10 in four states. Members of Webb's 17-person team are affiliated with national organizations including the Council of Chief State School Officers, the National Research Council, and the National Center for Improving Science Education. The team's June 1999 report, jointly published by NISE and the Council of Chief State School Officers, identifies the match between standards and assessments using the following four criteria; a short code sketch after the list shows how each criterion reduces to a count or percentage:

  • Categorical concurrence: To what degree do standards and assessments address the same content categories? This criterion is met if the same or consistent categories of content appear in both documents.
  • Depth-of-knowledge consistency: What degree of depth or complexity of knowledge do standards and assessments require? This criterion is met if the assessment is as demanding cognitively as the expectations standards set for students. Webb and colleagues judged depth of knowledge at four levels:
    1. Recall of a fact, information, or procedure.
    2. Skill in using information, conceptual knowledge, or procedures, typically involving two or more steps.
    3. Strategic thinking, requiring reasoning, developing a plan or sequence of steps, involving some complexity, having more than one possible answer, generally taking less than 10 minutes to do.
    4. Extended thinking, requiring an investigation, time to think through and process multiple conditions of the problem or task, and more than 10 minutes of non-routine manipulation.
  • Range-of-knowledge correspondence: Does the span of knowledge a standard expects of students correspond to the span of knowledge that students need to correctly answer the assessment items or activities?
  • Balance of representation: To what extent are assessment items evenly distributed across the learning objectives within a standard?
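
In practice, each criterion reduces to a count or percentage computed over reviewers' coding of assessment items against learning objectives. The Python sketch below illustrates that reduction; the data structures and the balance index are assumptions for illustration, not the report's actual coding scheme or formulas.

    from collections import Counter, defaultdict
    from dataclasses import dataclass

    @dataclass
    class Objective:
        standard: str   # the standard this objective falls under
        oid: str        # objective identifier
        depth: int      # expected depth-of-knowledge level, 1-4

    @dataclass
    class Item:
        oid: str        # objective the reviewers judged the item to measure
        depth: int      # depth-of-knowledge level the item demands, 1-4

    def alignment_measures(objectives, items):
        """Compute per-standard raw measures for the four alignment criteria."""
        std_of = {o.oid: o.standard for o in objectives}
        depth_of = {o.oid: o.depth for o in objectives}
        objs_by_std = defaultdict(set)
        for o in objectives:
            objs_by_std[o.standard].add(o.oid)

        items_by_std = defaultdict(list)
        for it in items:
            items_by_std[std_of[it.oid]].append(it)

        measures = {}
        for std, objs in objs_by_std.items():
            its = items_by_std.get(std, [])
            n = len(its)
            if n == 0:
                measures[std] = {"items": 0, "depth": 0.0, "range": 0.0, "balance": 0.0}
                continue
            # Categorical concurrence: number of items measuring the standard.
            # Depth-of-knowledge consistency: share of items at or above the
            # expected depth of the objective they measure.
            depth_ok = sum(it.depth >= depth_of[it.oid] for it in its) / n
            # Range-of-knowledge correspondence: share of the standard's
            # objectives measured by at least one item.
            hit = {it.oid for it in its}
            rng = len(hit) / len(objs)
            # Balance of representation: evenness of items across the
            # objectives actually hit (a simple index, assumed here; the
            # report's exact formula may differ).
            counts = Counter(it.oid for it in its)
            balance = 1 - sum(abs(1 / len(hit) - c / n) for c in counts.values()) / 2
            measures[std] = {"items": n, "depth": depth_ok, "range": rng, "balance": balance}
        return measures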

How the alignments rated

Alignment between assessments and standards varied, without any discernible pattern, across grade levels, content areas, and states. The assessments and standards of two of the four states satisfied the categorical concurrence criterion, which requires an assessment to include at least six items measuring content from a standard. (The cutoff of six is based on an estimate of the number of items needed to produce a reasonably reliable scale of students' mastery of a standard's content.) The other two states lacked a sufficient number of assessment items for more than one-quarter of their standards. Even where a large number of assessment items was used at a grade level, some states distributed their items so unevenly that one-fourth or more of the standards had fewer than six items each.

Alignment was weak on depth-of-knowledge consistency. To meet this criterion, at least 50 percent of the assessment items corresponding to a learning objective had to be at or above the objective's level of knowledge. A high percentage of the state assessments used items that were less demanding than the corresponding objectives.

The lowest degree of alignment was found on range-of-knowledge correspondence. To be judged acceptable, at least 50 percent of the objectives within a standard had to have a related assessment item or activity. This cutoff is based on the assumption that students' knowledge should be tested on content from over half of the domain of knowledge for a standard. Only one of the four states attained a high degree of range-of-knowledge correspondence on at least two of the analyses completed across all grades and content areas: in that state, 86 percent of the grade 8 mathematics standards met this criterion, and all of the grade 10 mathematics standards met it. In the other states, items generally clustered among a few of the objectives rather than covering the full range of objectives within a standard. As a consequence, many of the tests measured students' knowledge of only a small proportion of the full domain of content specified by the standards.

Most of the assessments and standards analyzed were aligned on balance of representation. Webb explains that items were generally distributed among the corresponding objectives without a disproportionate number measuring any one objective.
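
Combining these raw measures with the cutoffs described above, a standard's verdict on each criterion could be sketched as follows. The six-item and 50 percent thresholds come from the study; the balance cutoff is a placeholder, since the article does not report a numeric one.

    def meets_criteria(m, balance_cutoff=0.7):
        """Judge one standard's measures from alignment_measures() against
        the article's cutoffs. The balance cutoff is an illustrative
        placeholder, not a figure from the report."""
        return {
            "categorical_concurrence": m["items"] >= 6,   # at least six items
            "depth_consistency": m["depth"] >= 0.5,       # >= 50% at/above depth
            "range_correspondence": m["range"] >= 0.5,    # >= 50% of objectives hit
            "balance_of_representation": m["balance"] >= balance_cutoff,
        }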

Conclusion

A major goal of Webb's study was to develop a valid and reliable process for analyzing the alignment between standards and assessments. The process distinguishes among the different attributes of alignment and detects specific ways that alignment can be improved.

During this project, reviewers were not given extensive training at the beginning of the institute because they were to engage actively in designing and clarifying the process as they analyzed the standards and assessments state by state and grade by grade. Afterward, reviewers said they could have benefited from more clarification of the four depth-of-knowledge levels (recall, skill in using information, strategic thinking, and extended thinking). They also wanted further guidelines for identifying what knowledge an assessment item measures and what range of knowledge a standard, goal, or objective expects students to exhibit, along with more specific rules and limits for coding an assessment item as related to more than one objective.

Webb's study refined the procedures for determining degrees of alignment, making them more standardized and practical, so that states and districts can better gauge the agreement between their own standards and assessments.

For more information, see Norman L. Webb, "Alignment of Science and Mathematics Standards and Assessments in Four States," published jointly by the National Institute for Science Education and the Council of Chief State School Officers, June 30, 1999 (Monograph No. 18).