Teacher evaluation is about relationships and learning, not about scores

Long ago, I co-wrote a policy paper advocating for a teacher evaluation system built on the recognition that evaluation is a conversation, one that requires the context of a professional learning community, with input not only from the administration but also from one’s peers.

While pre- and post-conferences are included in most current evaluation systems, in reality the focus is on 1) compliance (paperwork), 2) the stakes and consequences attached to that paperwork, and 3) the demands of a very subjective rubric, rather than on the practices and content that will move student learning forward.

So it should come as no surprise that few teachers are rated poorly by their principals. These systems have become all about summative evaluation rather than formative feedback, and thus have lost sight of the real purpose of the system in the first place: to improve teacher practice and student learning. Effective principals will use the system to have those conversations, but they won’t rate their teachers poorly on paper unless they intend to push them out of the building.

Research shows: Elect Democrats to fight segregated schools

Partisan tensions between individualism/choice and systems/regulation in action.

Andy Rotherham argues against safe spaces

“. . . challenging people to become bigger than themselves is at its core an act of respect and love. Shielding them from challenge, especially in their most formative years, is fundamentally deeply disrespectful to them and their education.”

He’s talking about higher ed. But this also applies, arguably even more importantly, to K-12.

John King and Arne Duncan plead for sanity in regulations to protect students

“Protecting students and taxpayers shouldn’t be a partisan political issue.”

It shouldn’t. Unfortunately, in our country and in these times, it is.

The Problem with Robot Teachers

“I . . . worry that we’re slowly evolving toward a system where the affluent get that kind of education and the poor get automated schooling.”

A middle school in the South Bronx harnesses the power of testing & practice

This Bronx school is applying what we know from decades of research: repeated quizzing and practice of key skills and concepts, spaced out over time, transfers learning into long-term memory.

Kudos to MS 343. When you think about just how much of an outlier this approach is, it’s pretty disturbing. Most schools do not have a coherent and systematic approach to what they teach, nor do they consider how they are reinforcing what is most essential to learn across grades and classrooms.

Speaking of practice, here are 10 teaching techniques worth practicing

UK educator Tom Sherrington offers a useful list of pedagogical methods worth spending time mastering, based on Deans for Impact’s advice on deliberate practice.

NYCDOE is pressing ATRs into schools

Dan Weisberg writes an op-ed in The 74 against the move, claiming that “Principals would go back to hiding vacancies and would justifiably argue that they can’t be held accountable for student learning if they don’t get to pick their teams.”

His claim appears to be justified, as a recent Chalkbeat article reports:

“I’m going to make sure my school doesn’t have a vacancy,” said one Bronx principal who wished to remain anonymous due to the sensitive nature of the topic. “I’m not going to post a vacancy if someone will place an ATR there. I’ll be as strategic as I can and figure out another way.”

I think Weisberg’s suggestion makes much more sense: set a time limit on how long someone can be in the ATR pool.

Randi Weingarten calls DeVos’s brand of choice what it is, but what is her union doing to fight segregation?

I think Weingarten is pointing out an inconvenient truth by calling vouchers a “polite form of segregation,” given their history and the folks who most typically foam at the mouth over them.

But I do wonder what exactly she and her union are doing to fight segregated schools. Public schools are doing plenty on their own to contribute to segregation, even before charters or vouchers enter the picture.


School Climate Matters

A classroom in Guipuscoa

Chalk up more research confirming what-we’ve-been-saying-all-along here at Schools & Ecosystems: a school’s learning environment impacts student learning.

In case you didn’t know, NYC has been collecting what folks call “school climate” data via surveys administered to teachers, parents, and students since 2007. It’s important information to have about a school and, to my mind, arguably more important than test scores (though I believe both should be considered).

Last July, I quoted Match Education’s Mike Goldstein asking an important question about all this data:

Is anyone aware of scholars and reporters digging deep into this data set?  Is there any other data set in the USA just as good?

I think it’d be hugely productive to identify NYC schools which have made progress in “Total Climate” — and then study why.

Well, Mike, you’ve got your answer.

NYU’s Research Alliance for New York City Schools published a study, using NYC’s school climate data, that demonstrates that a school’s learning environment affects not only student learning but also teacher retention. As Chalkbeat NY’s Alex Zimmerman reports:

Each measure, the report found, is independently linked to decreases in teacher turnover. And gains on two of those measures, high academic expectations and school safety, were directly connected to better scores on state math exams.

The study found that if a school improved from the 50th percentile across the study’s four measures of school climate (leadership, expectations, relationships, and safety) to the 84th percentile, teacher turnover would decline by 25 percent, or 3.8 percentage points.

A similar percentile increase in measures of school safety and high academic expectations alone boosted math scores enough to account for an extra month and a half of instruction. (Improvements in school climate also boosted language arts scores on state tests, but those gains weren’t statistically significant.)

It’s important to note that this study confined its focus to the following aspects of school climate:

  • safety and order
  • leadership and professional development
  • high academic expectations
  • teacher relationships and collaboration

Missing from this examination (and mostly from the surveys themselves) is a focus on the physical environment of a school. The surveys do include questions about a school’s cleanliness and condition, but, as we’ve also been arguing on this blog, a building’s actual design, including its access (or lack of access) to natural light and greenery, its colors, its furniture, and so on (all largely subconscious factors), has an impact on learning and relationships in a school.

If your school is interested in collecting school climate data, the US Department of Education is sharing free surveys and information for collection of data similar to NYC’s. Check it out and share.

Accountability for the Long Term


I receive a monthly newsletter from bcg.perspectives that I scan for any relevant connections to school systems. Its work often centers on business policy, but sometimes it has a direct or indirect connection to the education sector.

A recent post, “Gauging Long-Term Impact in the Social Sector,” on developing a system of long-term evaluation for a large international nongovernmental organization (INGO), offers lessons well worth considering in developing systems of long-term accountability for schools.

The INGO in question, SOS Children’s Villages, works toward “improving the situation of children who are at risk of losing, or who have already lost, parental care” across 134 countries.

The assessment methodology that SOS Children’s Villages developed jointly with BCG evaluates two elements of the programs’ long-term impact: the nonfinancial (or all-around) impact on the individual program participants and the community and the financial impact on society. The determination of the long-term impact on individual participants is based largely on information gleaned from interviews of former program participants by external researchers. This is supplemented by qualitative research conducted through focus group discussions with former child participants and their caregivers. . . 

The programs’ long-term financial impact on society is gauged by the programs’ social return on investment (SROI), a comparison of the programs’ total costs and benefits to society. . .  The calculation of societal benefits is based on easily quantifiable elements.

This combination of intensive qualitative and correlated quantitative data gathering seems to make great sense when considering systems for school accountability. Many school systems have relied primarily on isolated testing data, but why not go straight to the source and interview the people we most seek to impact: students, families, and communities? And then correlate that with longer-term impacts via “social return on investment”? What are the long-term outcomes for students after they graduate?

Raising test scores is wonderful. But enriching one’s community and society over the long haul is the true goal of education. Developing better combinations of quantitative and qualitative evaluation that can help us determine the long-term impact of our school systems is key to not losing sight of that higher purpose.

Friedrichs v CTA, and Thinking Probabilistically

Yeah, that headline was a mouthful.

But here’s the thing. You’re going to hear a lot of ed folks declaiming on the potential outcome of the Friedrichs v. California Teachers Association SCOTUS case over the next few days. And for good reason: this case may well prove more determinative of the future of public education in this country than ESSA.*

I’ve been reading Daniel Kahneman’s excellent Thinking, Fast and Slow lately.** Kahneman’s book is all about ideas we’ve touched on here before, such as cognitive bias and uncertainty. We’ve also looked at how “probabilistic thinking” could be used to overcome bias. So when I happened across this article on how “superforecasters” use probabilistic thinking, along with a “base rate” or “reference class,” to make more accurate predictions, it jibed well with my understanding, and I think there are useful lessons to heed as the Friedrichs case is heard over the course of this week.

Rather than ideologically proclaiming sweeping predictions, as experts are wont to do, “superforecasters” are less certain about their predictions, which, ironically, makes them better predictors. Professor Philip Tetlock distinguishes between “hedgehogs” and “foxes,” and notes that superforecasters are more akin to foxes:***

According to Tetlock, foxes are more pragmatic and open-minded, aggregating information from a wide variety of sources. They talk in terms of probability and possibility, rather than certainty, and they tend to use words like “however,” “but,” “although” and “on the other hand” when speaking. . . 

Unfortunately, most of the predictions you see in the media lack the specificity necessary to test them, like a specific time frame or probability, Tetlock says. . . 

Instead, Tetlock advocates for something he calls “adversarial collaboration” — getting people with opposing opinions in an argument to make very specific predictions about the future in a public setting, so onlookers can measure which side was more correct.

What does this have to do with Friedrichs? Well, I would suggest asking education “experts,” who will write about their ideas on the case, to assign a probability to their predicted outcome.

Based on my own, extremely limited understanding of the case, I think there’s a 65% chance that Friedrichs will win. I could well be completely wrong. But you’ve got my prediction here, in writing, with a timestamp on it, so you can hold me accountable to this.

I’ll write more on my thoughts on the case soon, but in the meantime, my thinking on Friedrichs v. CTA in a nutshell:

I think public sector unions need to change and adapt much more rapidly to a changing workforce and economy, but I believe strongly that unions provide a necessary counterbalance to government and private financial interests. If Friedrichs wins, as I’m afraid she might, we will witness a drastic further decline in the power of unions in our country. I believe this will be to the detriment of our nation’s long-term interests.

The only commentator I’ve seen thus far who’s beginning to think ahead to this outcome is Dan Weisberg of TNTP. He doesn’t assign a probability to the outcome, but implies it when he says the following:

Unfortunately for the unions, at least five Supreme Court justices appear to be more sympathetic to the teachers’ arguments than I am. The Court practically invited this challenge when it stopped just short of striking down agency fees in a similar case a few years ago.

I’m hoping our unions are already preparing for the worst, because no amount of impassioned op-eds can influence the outcome at this point.

*My apologies to non-US readers for the US-specific jargon in this post.

**thanks to Deputy Chancellor Josh Wallack, who recently bestowed the book on educators at a dinner hosted by the NYC DOE Office of Leadership.

***We've looked at hedgehogs and foxes here before:


UPDATE 2/13/16:

Justice Scalia has just died, which completely changes the odds. While I had first assigned a 65% probability to Friedrichs winning this case, my forecast has shifted closer to 40%. Read more on SCOTUSblog: "The most immediate and important implications involve that union case. A conservative ruling in that case is now unlikely to issue."


Classroom Observation Significantly Influenced by Context

“Despite the intense focus on the use of student test scores to gauge teacher performance, the majority of our nation’s teachers receive annual evaluation ratings based primarily on classroom observations (Steinberg & Donaldson, in press). These observation-based performance measures aim to capture teachers’ instructional practice and their ability to structure and maintain high-functioning classroom environments. However, little is known about the ways that classroom context—the settings in which teachers work and the students that they teach—shapes measures of teacher effectiveness based on classroom observations. Given the widespread adoption of high-stakes evaluation systems that rely heavily on classroom observations, it is critical that we have a clearer understanding of how the composition of teachers’ classrooms influences their observation scores.

. . . We find that teacher performance, based on classroom observation, is significantly influenced by the context in which teachers work. In particular, students’ prior year (i.e., incoming) achievement is positively related to a teacher’s measured performance captured by the FFT.” [Bold added]

—Matthew Steinberg (University of Pennsylvania) and Rachel Garrett (American Institutes for Research), “Panel Paper: Classroom Context and Measured Teacher Performance: What Do Teacher Observation Scores Really Measure?”

Public Debates on Education are Ideological, Rather than Sociological

“Yet it struck me that most of the tensions the struggling school experienced that year were sociological rather than ideological: They concerned the challenge of bringing together people of different races and backgrounds (most of the families were low-income and black whereas most of the teachers were young, white, and middle-class) around a shared vision of what education can and should be. Yet our public debate is centered squarely on the ideological rather than the sociological. We endlessly debate the overall “worth” of various institutions—from “no excuses” charter schools to teachers unions—with a political or ideological framing. But we rarely venture inside, scrutinizing the arguably more important question of how people relate, or fail to relate, within these realms. Venturing inside—at least in a meaningful way—takes time, trust, and an open mind.”

—Sarah Carr, “There Are No Simple Lessons About New Orleans Charter Schools After Katrina. Here’s How I Learned That.” on Slate

Pineapple Express: Tests Shortchanging Student Literary Analysis Skills

The infamous Pineapple Passage on the 8th grade NY state test is rightfully making the rounds online. It’s a prime example of something that has, surprisingly, gone relatively unmentioned so far*: as test-makers attempt to make test questions “higher order” by requiring inference and reading between the lines, they necessarily skirt the fine line between what is easily quantifiable and what must be qualified by interpretation.

I noticed on my 5th graders’ ELA exam this past week that many of the questions were so open to interpretation as to be perplexing in a multiple-choice format.

I was an English major in college. Though I can’t claim extensive experience with it, I am not a complete stranger to literary analysis, and I know that literary criticism can be highly subjective (though not as subjective as folks outside the “fuzzy” majors may assert). Much as in the world of art criticism, consensus on a particular work is arrived at through a long-form process of back-and-forth akin to peer review. Papers are written, professors stake careers on counterpoints, and over time, paradigms shift and the critique of a given work merges with the living history of a society. (Not sure if this last part was stated very clearly, by the way, but I’m trying to push this post out while the issue is still relevant, so let me know how I can rewrite that.)

It’s via the process of dialogue, therefore, that perspectives on literature evolve. It’s qualitative. You can’t assign a number to it without putting it into context.

Yet test-makers, due to pressure from policymakers concerned foremost with the short-term and the political, are attempting to assign numbers to the process of deeper literary analysis, which simply can’t be done. Ostensibly, they are measuring reading comprehension, but this disassociated push for “higher order thinking” mistakes simple comprehension of plot, setting, and character for deeper interpretation of what the text might mean.

For example, on the 5th grade test, there was a story about a family living in a cabin when a blizzard suddenly strikes. Through a misunderstanding, the mother (or grandmother, I don’t remember which) gets locked in the cellar by the father, who doesn’t know she’s in there. When he finally realizes what has happened, he opens the cellar, and the mother comes out, rubbing her hands and stomping her feet, and quips something along the lines of “If you were going to lock me up somewhere, it should have been in the barn.”

A question then asks (wording might not be exact): After coming out of the cellar, the mother MOST LIKELY felt:
A. amused
B. anxious
C. angry
D. relieved

I was perplexed by this one. As I train my students to do, I kept going back to the passage and re-reading, for evidence, the section where she comes out. One could argue that she was somewhat amused, because she cracks a joke the moment she pops out. She might also be angry. One can infer that she is relieved, because who wouldn’t be relieved to be released after being locked up in the darkness?

The answer they want, obviously, is that she is relieved. But given that one could make an argument, based upon the evidence from her statement and via making a deeper inference about her evident dry wit, that she was also amused, it seems highly suspect to give a kid 0 points for one answer, and 1 point for the other. In other words, if you are testing a kid’s inferencing ability, well, then both answers are plausible when applying that skill.

There were a number of questions like this throughout the test. They are most certainly challenging questions, and interesting from a purely academic standpoint. But they are not amusing to me as I watch my students with exceptional learning needs with their heads bowed for over 2 hours, grappling with passages that are well above their reading level. I witness children who whisper “I can’t do this” and put their heads down in the middle of the test. As GothamSchools noted in a recent article, this is akin to torture, and this grand experiment by short-sighted adults who simplemindedly clamor for quick and easy data has real and very human consequences for children.

Multiple-choice questions have their value and purpose. They’re like sticking your finger into the air after licking it: they give you a quick sense of which direction the wind is blowing. But we need to stop pretending that we’re getting a true picture of an individual child’s ability to analyze and infer. So while NY may have “canned the pineapple,” we need to can the tests.

*This great opinion piece in the NY Times, “Teach the Books, Touch the Heart,” makes a parallel argument: tests are hardly culture-neutral, and we should scrap multiple-choice tests altogether in favor of written exams based on passages and books children have read in class. Great advice, and a must-read. Thanks to @KellyDillon1 for tweeting out the link.