NSBA's Center for Public Education issues report on redesigning teacher evaluation systems

From the NSBA Center for Public Education

To see the full report, click here: Teacher Evaluations

Building a better evaluation system: At a glance

The push to change teacher evaluation systems, and especially to include statistical measures of teachers’ effect on student learning, is here. In 2005, 13 states were able to link teachers to their students’ performance data; in 2010, 35 states were able to do so and the number is expected to grow. The Obama administration’s Race to the Top (RTTT) effort urged states and districts to use this teacher-student link in teacher evaluations in order to be eligible for grants. In response, 17 states reportedly changed their evaluation systems to improve their chances of receiving RTTT funds. Private foundations like the Gates Foundation have also used their resources to examine teachers’ effectiveness and encourage the use of such measures. Clearly, this is a fast-moving train that will likely affect many if not most school districts eventually.

In order to prepare, here’s what you should know:

-The current system is lacking. Current evaluation systems fail to identify the true variation in teacher effectiveness because they rate all but a few teachers as “satisfactory.” One study of teacher evaluation systems nationwide found that only 1 percent of teachers are evaluated as “unsatisfactory.” In districts that use multiple evaluation levels, only 6 percent of teachers rate below the top two categories. Other research shows that there is wide variability among teachers, even within schools, but it is hidden by inadequate evaluation tools. Until now, most evaluation systems have not been linked to any measure of students’ actual learning, partly because data systems have not been in place. Randi Weingarten, head of the American Federation of Teachers, sums it up: “As important as evaluation is to assessing teacher performance, what passes for teacher evaluation in many districts frankly isn’t up to this important task.”

-Improving teacher effectiveness can have a dramatic impact on student learning. Research has shown that teachers have the single greatest impact on students’ performance, more than family background, socioeconomic status, or school. By improving teacher effectiveness, districts could improve student achievement and save money at the same time, because they would be able to identify ineffective teachers early and provide them with appropriate support, rather than having to replace struggling teachers who leave the profession because of a lack of assistance. Designing and implementing a quality teacher evaluation system, one that identifies strong teaching where it exists and targets interventions where they’re needed for improvement, would take additional funds and careful thought, but the benefits would be significant.

-Value-added models have flaws, but are much better than the system we have now. The fairest way to identify strong teaching is through a system that looks at student gains. Value-added models, which work to isolate the impact a teacher has on his or her students’ achievement from other factors, are the latest refinement of such a system. However, value-added models have come under intense scrutiny and criticism, and the criticism needs to be considered. Most importantly, value-added scores, while better than other measures, still fluctuate enough that people question their precision. For instance, multiple studies have found that among teachers ranked in the top 20 percent of effectiveness one year, about a third of them were still in the top 20 percent the following year, although the vast majority stayed in the top half. The wide fluctuation shows that some of the difference in year-to-year scores was due to statistical imprecision instead of an actual change in the teacher’s effectiveness. However, while imprecision is a concern, the variation in scores should be considered against the current evaluation system, which almost certainly misidentifies many ineffective teachers as “satisfactory.” One study that compared teachers’ instructional practices to value-added scores concluded “[Value-added scores]…seem to be capturing important differences in the quality of instruction.” Another study found that value-added scores were useful in predicting which teachers would be successful in the future. As long as they are used in concert with other methods of evaluation, value-added scores provide a useful insight into teachers’ impact.

-Statistical measures are used effectively to evaluate people in other industries. Using imprecise statistical measures in evaluations is a generally accepted practice in fields outside of teaching. Major League Baseball, for instance, bases its million-dollar salary decisions largely on a player’s statistics, which can vary from year to year about as much as teachers’ scores do in value-added models. Other professionals evaluated on similarly imprecise year-to-year measures include realtors, investors (judged on their rate of return), and utility company repairmen. The question is not whether value-added models meet a criterion of perfection, but whether including them as part of a comprehensive teacher evaluation system would be an improvement over what is in place now.

-There are ways to improve value-added models. The more years of data are used, the more precise value-added models become. For instance, the chance of misidentification drops by 10 percentage points when three years of data are used instead of one. Better state assessments, and aligning the assessments to what is taught, could also improve value-added models.

-Multiple measures are the way to go. Virtually all researchers advocate using value-added data as one of multiple measures when making decisions about teachers. Using traditional measures, such as classroom observation, along with value-added data will present a fuller, more accurate picture of a teacher’s true effectiveness. In current formulas that use value-added models, the value-added score generally accounts for 25 to 50 percent of the total rating. Which measures to use and how much weight to put on each are decisions best made locally based on data, resources available, and the district’s goals for the teacher evaluation system.
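
As a rough illustration of the last two points, the Python sketch below averages several years of value-added scores and then combines the result with a classroom observation score at the 25 and 50 percent weights mentioned above. The function names, the 0-100 scale, the simple averaging, and the sample numbers are all assumptions made for illustration; they do not come from the report or from any district's actual formula.

```python
from statistics import mean


def multi_year_value_added(yearly_scores):
    """Average a teacher's value-added scores across years.

    Simple averaging is an assumption here; it stands in for whatever
    multi-year method a real value-added model would use.
    """
    if not yearly_scores:
        raise ValueError("at least one year of data is required")
    return mean(yearly_scores)


def composite_rating(value_added, observation, value_added_weight):
    """Weighted average of two scores on the same (hypothetical) 0-100 scale."""
    if not 0.0 <= value_added_weight <= 1.0:
        raise ValueError("value_added_weight must be between 0 and 1")
    return (value_added_weight * value_added
            + (1.0 - value_added_weight) * observation)


if __name__ == "__main__":
    # Hypothetical teacher: three years of value-added scores, one observation score.
    value_added = multi_year_value_added([62.0, 48.0, 55.0])
    observation = 80.0
    for weight in (0.25, 0.50):
        rating = composite_rating(value_added, observation, weight)
        print(f"value-added weight {weight:.2f}: composite rating {rating:.2f}")
```

Running the sketch with different weights shows how much the final rating depends on the weighting, which is a local decision.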

For board policy issues, click on the link above to the full report.
