Vol. 3, No. 2 - August 1998

Measuring Progress Toward Equity in Science and Mathematics Education

By Mary M. Kennedy

 

 

 

 

Benefits of teacher professional development for students
have too seldom been examined

 

There is a much-maligned event in education called the one-shot workshop.  This event has been criticized by virtually every teacher who has ever participated in it and by virtually everyone else even vaguely interested in improving teaching.  Researchers and policy analysts, critical of the one-shot workshop, have generated a number of proposals for how continuing education programs for teachers should be organized, arguing that they be lengthy rather than brief, that teachers have a role in defining the content rather than having the topics imposed on them, that the scheduled meetings be interspersed with classroom practice rather than concentrated into a short period of time, or that they allow teachers to work in groups, rather than in isolation (Corcoran, 1995; Goldenberg & Gallimore, 1991; Little, 1993; Loucks-Horsley et al., 1998).

There is a common-sense appeal to these ideas.  It makes sense that, if you really want to alter teaching practice, you need more than a 2-hour workshop.  But the ultimate benefits of these recommended changes have seldom been examined. 

This brief examines these contentions by reviewing studies of professional development that examine benefits to students.  A major finding from this review is that program content, what is being taught (e.g., management strategies, knowledge of how students learn specific school subject matter), is an important predictor of benefit to students.  This finding should not be a surprise, but it is a surprise in light of the literature describing optimal professional development: This literature does not address content as much as it addresses program form and structure. 

 

 The Literature

From the total pool of 93 studies found that examined the effectiveness of various approaches to continuing teacher education in either mathematics or science, only 10 included evidence of benefits to students.  The paucity of evidence for how these programs ultimately benefitted students is itself an important finding. 

Table 1

Studies Included in This Review

| Citation | Subject Matter Context | Grade Span of Participating Students | Source of Participants | Form and Distribution of Contact Time | Total Contact Hours* | Study Duration in Months* |
|---|---|---|---|---|---|---|
| Category 1: Content Focus is on Teaching Behaviors that Apply Generically to All School Subjects | | | | | | |
| Stallings and Krasavage (1986) | Math | 2-4 | school-wide projects | Distributed workshops | | 16 |
| Stevens and Slavin (1995) | Math | K-6 | school-wide projects | Distributed workshops | | 8 |
| Category 2: Content Focus is on Teaching Behaviors that Apply to a Particular Subject | | | | | | |
| Good, Grouws & Ebmeier (1983) | Math | 4-12 | individual volunteers | 2 @ 1.5 | 3 | 4 |
| Good & Grouws (1979) | Math | 4 | individual volunteers | 2 @ 1.5 | 3 | 4 |
| Mason & Good (1993) | Math | 4-6 | individual volunteers | 3 @ 1.5 | 4.5 | 5 |
| Otto and Schuck (1983) | Science | 8 | individual volunteers | 5 @ variable | 16 | 2.5 |
| Rubin & Norman (1992) | Science | 6-9 | individual volunteers | Univ. course (10 @ 3) | 30 | 3 |
| Lawrenz & McCreath (1988) | Science | 1-8 | individual volunteers | Univ. course (15 @ 3) | 45 | 8 |
| Marek and Methven (1991) | Science | 1-5 | individual volunteers | 4 wk Summer Institute | 100 | 8 |
| Category 3: Content Focus is on Curriculum or Pedagogy Justified by How Students Learn | | | | | | |
| Cobb et al. (1991) | Math | 2 | individual volunteers | 1 wk Sum. Inst. + Distributed | 150 | 8 |
| Wood and Sellers (1996) | Math | 2-3 | individual volunteers | 1 wk Sum. Inst. + Distributed | 150 | 16 |
| Category 4: Content Focus is on How Students Learn and How to Assess Student Learning | | | | | | |
| Carpenter et al. (1989) | Math | 1 | individual volunteers | 4 wk Summer Institute | 80 | |


Table 1 groups studies according to program content.[2] The four categories include, respectively:

1. Programs that prescribe a set of teaching behaviors that are expected to apply generically to all school subjects. These behaviors might result from process-product research or might include things like cooperative grouping. In either case, the methods are expected to be equally effective across school subjects.

   Technically, category 1 programs are not aimed specifically at mathematics or science education but instead offer a set of ideas that are presumed to be applicable to all school subjects. However, because such programs constitute a large fraction of professional development, and because such programs typically include mathematics test scores in their portfolio of outcomes, two studies that illustrate this line of work are included.

2. Programs that prescribe a set of teaching behaviors that seem generic, but are proffered as applying to one particular school subject, such as mathematics or science. Though presented in the context of a particular subject, the behaviors themselves have a generic quality, in that they are expected to be generally applicable across all topics in that subject.

3. Programs that provide general guidance on both curriculum and pedagogy for teaching a particular subject, and that justify their recommended practices with references to knowledge about how students learn this subject.

4. Programs that provide knowledge about how students learn particular subject matter but do not provide specific guidance on the practices that should be used to teach that subject.

The programs being examined in these studies also differ along many of the dimensions that reformers care about: duration, intensity, focus on individual teachers versus whole schools, and so forth. One difference that also needs to be attended to, however, is the duration of the study itself. Some of the studies followed students for an entire school year or longer, while others followed them for only a semester or less. Longer study durations can reduce apparent program effects because they increase the likelihood that other events (staffing changes, fire drills or other traumas, other policy changes, etc.) will disrupt program influences. Thus, as we examine the findings from these studies, we need to be wary of findings that are based on short-term studies: Their program effects may appear larger than those of longer-term studies, not because of differences in program quality but because of differences in the length of the study.

One difference among these categories is especially important: their tacit model of how the programs will eventually influence student achievement. Underlying these different approaches to continuing professional education are different assumptions about the path between the program and its eventual effects on student learning. Figure 1 illustrates these differing sets of assumptions. Developers of programs in categories 1 and 2 expect their programs first to change teacher behaviors, and expect that these behavioral changes will, in turn, lead to student learning. Those in categories 3 and 4, on the other hand, expect their programs first to change teacher knowledge; they tend to be relatively less prescriptive about teaching practices. The category 3 program provides teachers with knowledge about how students learn mathematics, with some curriculum materials, and with some ideas about new practices that will better promote student learning. The program in category 4 focuses even more narrowly on teacher knowledge, specifically knowledge of how students learn particular mathematical ideas. These program developers do, of course, expect teaching practice to change, but instead of prescribing the details of the new practice, they assume that changes in teacher knowledge will stimulate teachers to devise their own new teaching practices, which will, in turn, lead to student learning.

These four categories of program content, then, reflect a continuum from more prescriptive to more discretionary, and from more focused on behavior to more focused on ideas.



 
Figure 1: Three Paths to Student Learning

Programs Aimed at Improving Student Learning in Mathematics

For both of the illustrative studies in category 1, contact time was extensive and distributed throughout the school year, teachers received in-class visitations, and the programs worked with whole schools rather than individual volunteers. Thus, these programs represent the kind of professional development that has been recommended.

The category 2 studies focusing on mathematics consist entirely of programs sponsored by Tom Good and his colleagues (Good & Grouws, 1979; Good, Grouws, & Ebmeier, 1983; Mason & Good, 1993), and all are variations of the Missouri Mathematics Model. These programs are typically very brief, consisting of just two 1½-hour sessions during which the specific recommended teaching behaviors and their rationales are explained. Teachers also receive a manual with a more detailed discussion of the Missouri Mathematics Model.

The programs in categories 3 and 4 differ considerably from those in category 2, and are similar to one another in their theoretical orientations. Both are interested in student cognition, both assume some form of constructivist theory of learning, and both are interested in increasing teachers’ attention to problem solving and reasoning in place of recall of computational procedures. Both category 3 studies focus on a single program, the Problem-Centered Mathematics Program. This program provides teachers with knowledge about student learning and thinking in mathematics, gives them mathematics problems that are designed to be challenging for students at the grade level they teach, and gives them a class discussion format that encourages thoughtful engagement with these problems. There is just one study in category 4. It examines the Cognitively Guided Instruction program, which is similar to the Problem-Centered Mathematics Program in its orientation to mathematics, but focuses more on the particular mathematical content that students learn in the relevant grade levels and on the particular kinds of difficulties they are likely to have in learning this content. It does not define what teachers should do with this knowledge.

Table 2 shows the size of program effects on student achievement in mathematics that were found in each of these categories of studies.  Each number indicates the size of the program effect in standardized units relative to a comparison group.   With the exception of category 1 programs, which worked with whole schools, all studies involved teachers who volunteered to participate and who were randomly assigned to experimental and comparison groups.
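To make the phrase "standardized units" concrete, the sketch below computes one common form of standardized effect size: the difference between the experimental and comparison group means, divided by a pooled standard deviation. This is an illustration only. The brief does not say which formula each study used, so the pooled-SD form, the function name, and the scores are all assumptions invented for the example.

```python
from math import sqrt
from statistics import mean, variance


def standardized_effect_size(treatment, comparison):
    """Standardized mean difference using a pooled standard deviation.

    Assumed formula (not taken from the brief):
        (mean_treatment - mean_comparison) / pooled_sd
    """
    n_t, n_c = len(treatment), len(comparison)
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = (
        (n_t - 1) * variance(treatment) + (n_c - 1) * variance(comparison)
    ) / (n_t + n_c - 2)
    return (mean(treatment) - mean(comparison)) / sqrt(pooled_var)


# Hypothetical test scores, made up purely for illustration.
treatment_scores = [52.0, 55.0, 58.0, 61.0, 64.0]
comparison_scores = [50.0, 53.0, 56.0, 59.0, 62.0]

d = standardized_effect_size(treatment_scores, comparison_scores)
print(round(d, 2))
```

Read this way, an effect of .50 means the average student in the experimental group scored about half a standard deviation above the average student in the comparison group, and a negative value means the comparison group scored higher.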

                                                                             

Table 2

Average Standardized Effect Sizes Achieved in Mathematics Studies

| Study | Basic Skills | Reasoning, Problem Solving | Attitudes toward Math |
|---|---|---|---|
| Category 1 | -.14 | .10 | |
| Category 2 | .17 | .05 | |
| Category 3 | .13 | .50 | .13 |
| Category 4 | .52 | .40 | |

Researchers tended to measure three types of effects: basic skills, advanced reasoning, and attitudes toward the subject. Basic skills were generally assessed with traditional standardized achievement tests, and researchers devised their own procedures for assessing advanced reasoning and attitudes. Table 2 suggests that programs in categories 3 and 4 tend to demonstrate greater gains in reasoning and problem solving as well as comparable or greater gains in basic skills. Even in basic skills, the smallest program effects appear in category 1 and the largest in category 4. This pattern of outcomes suggests that the content of programs does indeed make a difference, and that programs that focus on subject matter knowledge and on student learning of particular subject matter are likely to have larger positive benefits for student learning than are programs that focus mainly on teaching behaviors. This pattern is particularly striking in light of the fact that the two programs in category 1 more closely approximate the ideal in terms of form and structure than do the programs in categories 3 and 4.

Why do the programs in categories 3 and 4 have greater effects on students? Several hypotheses have been suggested. One early hypothesis was that teachers in category 1 and 2 programs could not improve their mathematics teaching because they did not have adequate subject matter knowledge. However, the more successful programs in this review were not providing subject matter knowledge per se, but rather knowledge about how students learn that subject matter. No doubt teachers acquired some subject matter knowledge along the way in these programs, but this was not the central focus of programs in either category 3 or category 4.

Another hypothesis is that by giving teachers a greater understanding of how students learn, programs in categories 3 and 4 enable teachers to continue to develop and refine their own practices. That is, it is the lack of prescriptiveness that makes this knowledge valuable. In contrast, the Madeline Hunter program (category 1) and the Missouri Mathematics Model (category 2) both prescribe virtually invariant daily routines. The Problem-Centered Mathematics Program (category 3), though not so rigid, also recommends a pattern for classroom activities and a set of learning activities. The Cognitively Guided Instruction program (category 4) provided teachers with the least specific information about what they should do in their classrooms and the most specific information about the mathematics content they would be teaching and how students learn that content.

 

Programs Aimed at Improving Student Learning in Science

The four science studies that provided student outcome data fell entirely into category 2: They claim to offer teachers techniques that are uniquely suited to science teaching, but the techniques themselves still have a generic character, in that they do not depend on the particular science content being taught. For instance, Rubin & Norman (1992) taught teachers to model particular science processes such as generating hypotheses, identifying and controlling variables, and defining things operationally. In their program, they used generic lesson formats to train teachers in how to model each of these skills. Modeling the skill of “identifying and controlling variables,” for example, consists of asking aloud such questions as, “What is the manipulated variable in this experimental situation?”

However, the category 2 studies in science reflect two different models of teaching.  To reflect this difference in program content, Table 3 provides two sub-groupings within its category 2 programs. 

Table 3

Standardized Effect Sizes Attained in Each Science Study

| Category 2: Focus on Behaviors that Apply to this Particular Subject | Basic Skills | Reasoning, Problem Solving | Attitudes toward Science |
|---|---|---|---|
| (a) Modeling as a Teaching Strategy | | .71 | |
| (b) Learning Cycle as a Teaching Strategy | | .43 | .15 |

The effects shown in Table 3 are larger than their counterparts in Table 2, a difference that probably reflects greater alignment between instructional content and assessment content.  Because science curricula in American schools are less standardized than mathematics curricula, and because science content is not normally included in standardized achievement tests, science researchers are more likely to devise their own curriculum materials and their own outcome measures. This was the case in these studies.  Consequently, there is likely to be a greater articulation between the content taught in participating “treatment” classrooms and the content assessed by the science researchers than is the case in mathematics programs.

Like Table 2, Table 3 appears to suggest that program content matters. It suggests that programs that taught teachers to model scientific processes had greater effects than those focusing on the learning cycle. However, the two studies of modeling were extremely brief, extending only 2½ and 3 months, respectively, whereas the two studies that taught teachers the learning cycle were full-year studies. Consequently, differences in effects that appear to reflect program content could be a function of study duration rather than program content.

 

The Relevance of Program Form and Structure

The programs examined in this small body of research represent a variety of program structures, and this variety enables us to examine the merits of several hypotheses about critical features of continuing professional education. The patterns suggest that program content is a central predictor of benefit to students. They also suggest that many other program dimensions are less reliable predictors of benefits for students. Briefly, in this small sample of studies, we can conclude that:

          Differences in total contact hours were unrelated to student outcomes. Programs in category 1 provided far more contact hours than programs in category 2, and yet had smaller effects on student learning. Similarly, the category 3 program provided more contact hours than the category 4 program did, and yet did not yield a noticeable advantage for students.

          Evidence was mixed for the benefits of distributed time.   The studies in mathematics did not support this hypothesis, for the mathematics program with the most substantial overall influences on student learning consisted of a summer institute with no distributed seminars during the next academic year.  Conversely, the one program that demonstrated negative effects on student learning, the Madeline Hunter program studied by Stallings and Krasavage,  provided both seminars and in-class visitations throughout the school year.  Studies in science, on the other hand, offer some support: One of the studies that focused on the learning cycle provided a concentrated summer institute, while the other provided a university course with sessions distributed across a full school semester.  The distributed program appeared to produce greater benefits to students than did the concentrated summer institute.

          In-class visitations were not associated with greater benefits: None of the programs that provided them produced noticeably greater benefits to student learning than programs that did not.

          The fact that the category 1 programs—those working with whole schools—demonstrated the smallest influences on student learning among these studies suggests that providing services to whole school staffs may not be the most important feature of continuing professional education for teachers.  However, it is likely that whole school programs involve at least some teachers who did not volunteer to participate, and this fact may reduce the apparent program benefits.

 

Summary and Conclusion

Based on the studies reviewed here, a strong case can be made for attending more to the content of continuing professional education and for attending less to the structural and organizational features of such programs. In these studies, programs whose content focused mainly on teachers’ behaviors demonstrated smaller influences on student learning than did programs whose content focused on how students learn particular subject matter. These more successful professional development programs were not simply courses in mathematics or science, but instead were about what to teach and how students learn that subject matter. Cohen and Hill (1998), in their study of California mathematics reform, also find that the content of professional development is important. The programs in categories 3 and 4 were very specific in their focus. They did not address generic learning, but instead addressed the learning of particular mathematical ideas.

An equally important finding from this review is the lack of clear benefit of several popular structural program features.  The programs reviewed here differed in the total number of contact hours they provided teachers, in whether or how that time was distributed, in whether that time included in-class visitations, and in whether teachers participated as members of whole schools or as individuals.   The reason for the lack of clear benefit for these program dimensions is likely related to the important role of program content:  A program whose content is not valuable will not be improved by increasing the number of contact hours, distributing contact hours over time, providing in-class visits, and so forth.  Structural features alone provide no guarantee of improved teacher learning or of eventual benefit to students.  What is still unclear, however, is whether, given important content, these structural features of programs might further enhance the program's benefit to students.  The central message from these studies, though, is to attend to content first, before attending to structure.

While the findings presented here suggest that the content of professional development should be attended to before its form and structure, they also suggest that effective professional development in mathematics and science treats teachers as professionals. Reform advocates have argued that teachers will profit more from knowledge and insights that they can develop in their own ways than from prescriptions that give them little practical leeway, and the pattern of program effects shown here suggests that these reformers are right.

 

[1] This Brief is a summary of material in a research monograph by Mary Kennedy (1998), Form and substance in inservice teacher education (Research Monograph No. 13). Madison: University of Wisconsin–Madison, National Institute for Science Education.

[2] References for the 12 studies in Table 1 are provided at the end of this Brief.

 

                                                   Studies Reviewed