Teacher Evaluation: Lesson One

Let’s begin at the end, with final scoring.

We all know that New York State created an evaluation system for New York City on June 1. We also know that there are three subcomponents in the new system: a state learning measure (worth 20 points), a local learning measure (worth 20 points), and an Other Measure, which, for ease, I’ll call observations (worth 60 points). We know also that the state laid out the broad framework for how learning measures and the new observation system will be implemented in our schools. But while teachers are becoming more familiar with the individual measures, they are far less familiar with how it all adds up once the year is done.

So let’s begin at the end with the cut scores the commissioner imposed. Cut scores are the numerical breaking points between levels (between Developing and Effective, for example). There are cut scores within each subcomponent as well as within the range for the final rating, which goes from 0 to 100. The subcomponent cut scores for NYC are different from – and more favorable than – the cut scores the rest of the state has to use. First, let’s look at NYC’s cuts.

For NYC, the state has assigned a certain number of points to each level of performance in all three subcomponents. Thus, a teacher who is Effective in all three subcomponents receives at least 15 points for the state (Comparable) measure, 15 points for the local measure, and 45 for the observation measure. This teacher would wind up with an overall score of at least 75, and be considered Effective overall. If that same teacher had received, say, 53 points for observations, but fewer points on the learning measures (for example, an Ineffective/11 and a Developing/14), the overall score would still be Effective (78).
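For anyone who wants to see that arithmetic spelled out, here is a minimal Python sketch. The function name overall_score is hypothetical, chosen only for illustration; the point ranges (0-20, 0-20, 0-60) and both examples are taken from the paragraph above.

```python
def overall_score(state_points, local_points, observation_points):
    """Add the three subcomponent scores into a 0-100 composite.

    Hypothetical helper for illustration; the ranges follow the post:
    state 0-20, local 0-20, observations 0-60.
    """
    if not 0 <= state_points <= 20:
        raise ValueError("state measure must be between 0 and 20")
    if not 0 <= local_points <= 20:
        raise ValueError("local measure must be between 0 and 20")
    if not 0 <= observation_points <= 60:
        raise ValueError("observation measure must be between 0 and 60")
    return state_points + local_points + observation_points


# Effective at the lowest rung of all three subcomponents.
print(overall_score(15, 15, 45))  # 75

# Ineffective/11 and Developing/14 on learning, 53 on observations.
print(overall_score(11, 14, 53))  # 78
```

Both totals land at or above 75, which is why both teachers in the example come out Effective overall.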

So, you may wonder, just how do teachers earn their 11, or 15, or 50 points? The short – but very important – answer here is that they do not earn them, or at least not directly. Rather, each subcomponent has its own way of expressing a meaningful result, and that expression has nothing to do with 0-20 or 0-60 score ranges. An observation result might show which of the four levels of the Danielson rubric best reflects your teaching. A learning result might indicate the percentage of your students who met their targets. Those different kinds of results from different aspects of teaching have to be converted into a common language, and that language is the 0-100 scale.

Comparing NYC’s Cut Scores To the Rest of the State

Understanding the scores as conversions from something else is crucial, especially in light of some blog posts that surfaced last week, wherein the writers – after comparing the NYC-imposed cut scores to the cut scores mandated throughout the state – came to the perfectly understandable, but entirely erroneous, conclusion that NYC teachers were thoroughly and completely screwed.

The reality is just the opposite. The cut scores in force throughout the rest of the state are the problem; the NYC cuts are actually the fix.

So let’s compare. First, here are the statewide cuts. Note in particular the circled number.

And here, again, are our NYC scoring ranges:

As the circle indicates, a teacher in NYC who is rated as Developing receives at least 13 points for the state measure. The same is true for the local measure.

Anywhere else in the state, however, that same teacher would only receive 3 points.

In other words, teachers in New York City earn 10 more points than do teachers elsewhere for the same level of performance (that is, for the first rung of Developing).

How the Conversions Work

Of course, if you don’t understand that the 0-100 scale is a conversion, you might get a little freaked out, and that’s what happened with a lot of bloggers. What they wanted to know was why NYC teachers have to earn a whole 13 points just to get past Ineffective, when everywhere else in the state, they have to earn just three. It’s an understandable question but, again, teachers don’t earn points; rather, the results they earn can be converted into either 13 points (in our world) or only three (in theirs).

So, let’s say that a teacher’s learning results are based on the percentage of students who meet their learning targets. Many districts use such a method for determining learning measures, and NYC will be one of them. Other districts can negotiate exactly what constitutes performance at the first level of Developing, but no matter what they negotiate, the conversion remains at three points. Let’s compare NYC’s conversion to that of the four so-called model districts that use the same system:

District	If…	…then the teacher is…	…which converts to…
NYC	60% of students meet their target	Developing	13 points
Pembroke	65% of students meet their target	Developing	3 points
Syracuse	80% of students meet their target	Developing	3 points
Kings Park	50% of students meet their target	Developing	3 points
Jamesville	50% of students meet their target	Developing	3 points

In these districts, teachers will be rated as Developing if somewhere between 50% and 80% of their students meet targets. In NYC, teachers with similar success get more points in the conversion.
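The table can also be written as a small lookup, which makes the gap in the conversion easy to see. This is only a sketch: the dictionary and the function name point_gap are hypothetical, and the numbers are copied straight from the rows above.

```python
# District -> (percent of students meeting target, points after conversion)
# for the first rung of Developing, as listed in the table above.
DEVELOPING_CONVERSIONS = {
    "NYC": (60, 13),
    "Pembroke": (65, 3),
    "Syracuse": (80, 3),
    "Kings Park": (50, 3),
    "Jamesville": (50, 3),
}


def point_gap(district, baseline="NYC"):
    """Points the baseline district converts to, minus the other district's."""
    baseline_points = DEVELOPING_CONVERSIONS[baseline][1]
    district_points = DEVELOPING_CONVERSIONS[district][1]
    return baseline_points - district_points


for name, (percent, points) in DEVELOPING_CONVERSIONS.items():
    print(f"{name}: {percent}% meeting target -> Developing -> {points} points")

print(point_gap("Syracuse"))  # 10
```

The same result, Developing, is worth 10 more points in NYC than in any of the four model districts.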

Why the Special Scoring Bands?

So why was the state so generous with NYC? Actually, it’s got nothing to do with generosity. Like I said before, these cut scores are a fix.

You’ll see this if you look once more at the statewide cut scores, and particularly at the composite scores. A teacher needs at least 65 points in order to be rated as Developing. So look at what happens to a teacher who is considered Developing in all three subcomponents. Since teachers who are considered Developing in both learning measures can be assigned as few as six points (three in each) in the statewide cut scores, a teacher who is also Developing in the observation measure would need to be assigned 59 points (because 65 minus 6 equals 59) in order to be rated Developing overall.

In other words, teachers rated as Developing in all three measures would wind up as Ineffective overall if they received fewer than 59 of the 60 points in the observation measure, just because of a scoring anomaly. Districts could negotiate whatever they wanted for the 60 points, but 59 wasn’t possible. After all, if 59 points get assigned to a Developing teacher, how many points would go to teachers who were Highly Effective? On the other hand, what if the district dropped the Developing floor to, say, 58? Then some Developing teachers would get only 64 points overall (58 plus six) and be labeled Ineffective.
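The anomaly is easy to check with a few lines of Python. Again, this is a sketch rather than anything official: the function name is hypothetical, and the two constants are the statewide figures quoted above (65 composite points to reach Developing, and as few as three points for Developing on each learning measure).

```python
STATEWIDE_DEVELOPING_COMPOSITE_FLOOR = 65  # composite points needed for Developing
STATEWIDE_DEVELOPING_LEARNING_FLOOR = 3    # lowest Developing score on a learning measure


def observation_points_needed(state_points, local_points,
                              composite_floor=STATEWIDE_DEVELOPING_COMPOSITE_FLOOR):
    """Smallest observation score (out of 60) that still reaches composite_floor."""
    return max(0, composite_floor - (state_points + local_points))


# A teacher at the first rung of Developing on both learning measures.
needed = observation_points_needed(
    STATEWIDE_DEVELOPING_LEARNING_FLOOR,
    STATEWIDE_DEVELOPING_LEARNING_FLOOR,
)
print(needed)  # 59

# If a district set the Developing floor for observations at 58 instead:
total = 58 + 2 * STATEWIDE_DEVELOPING_LEARNING_FLOOR
print(total, total >= STATEWIDE_DEVELOPING_COMPOSITE_FLOOR)  # 64 False
```

Either way the district loses: a 59-point floor leaves no room above Developing, and anything lower pushes some Developing teachers below 65.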

And the problem wasn’t just at that Developing cut – it showed up throughout the scoring ranges. Ultimately, depending on the observation bands the districts used, teachers rated DDD in the subcomponents – or even EDD – could potentially find themselves thrown into the category of Ineffective overall, unless the cuts were fixed.

For smaller districts that might not matter, since the vast majority of scores would add up just fine, so long as those districts were smart about where they set the cuts. But in a city the size of NYC, hundreds or even thousands of teachers might have been labeled “Ineffective” regardless of their substantive results. The actual number would be impossible to predict, but in anticipation of that problem, the state gave us cut scores that guaranteed that the meaning of the Final Rating would reflect the meaning of the subcomponents from which it was derived.

One last question remains: Why did the state set the problematic cut scores for the rest of the state in the first place? Basically, the state wanted cuts that would make it impossible for teachers who were Ineffective in two different learning measures to overcome that rating, even if they received all 60 of the observation points. The statewide cuts do that, but they also create the potential for many additional Ineffective ratings that simply can’t be justified in any system. For NYC, all of those additional Ineffectives have been eliminated, but the state has still included language to the effect that a teacher who was rated as Ineffective on both learning measures must be considered Ineffective overall, regardless of the score. That is an outcome that would have affected us in the previous system and would affect us in this one as well.

I started the previous paragraph saying that one question remains, but of course that is laughable. I know that this post is a long way from covering all the questions teachers have about the new evaluation system. Future posts are on their way.

But whatever pros and cons we may find in other aspects of the new evaluation, the final score cuts are an improvement. It was a change that only the state could make – and it did.
