Mary-Lee Mulholland (Mount Royal University)

The Utopia of Peer Evaluations of Teaching – A Cautionary Tale

In The Utopia of Rules: On Technology, Stupidity, and the Secret Joys of Bureaucracy, David Graeber (2015) reflects on how bureaucracy, through an economy of paperwork, evaluation, and performance reviews, has spread from the corporate sector to the government, not-for-profit, and education sectors. Universities are not exempt from this bureaucratic boom; rather, they are ripe for its proliferation. The corporatization of universities through administrative bloat, performance measures, audit culture, bureaucracies of virtue, and discourses of “excellence” has been analyzed and thoroughly critiqued by many anthropologists (Graeber 2015, 2018, Krautwurst 2013, Menzies 2010, Roseman 2010, Shore and Wright 1999). One such economy currently booming is the peer evaluation of teaching used in employment, tenure, and promotion decisions at universities.

As Graeber notes, the rise of many of these bureaucratic systems stems from “good intentions run amok” (2015: 8). This is certainly the case with the bureaucratization of excellence in teaching. Since the 1990s, there has been a worthy shift in universities toward recognizing and investing in teaching as an important part of our work (Bernstein 2008, Hutchings 1995). As a result, we have witnessed the growth of teaching support centres, teaching awards and grants, the Scholarship of Teaching and Learning (SoTL), and an emphasis on student evaluations and student surveys such as the National Survey of Student Engagement (NSSE). This is even reflected within CASCA, with the introduction of the CASCA Awards for Teaching Excellence and the formation of the Critical Pedagogy in the Canadian Anthropology Network.

I think the recognition and support of good teaching are incredibly important, and these are things I am quite passionate about. It is vital that our teaching, like our research, be well-informed, engaged, and rigorous. After all, it is through our teaching that anthropology makes its biggest impact. However, this emphasis on teaching comes, at least in part, from the neoliberal understanding of universities as service providers (rather than knowledge producers), exemplified by the student-as-consumer phenomenon (Bunce et al. 2017). In this context, “good teaching” practices are packaged under trendy terms such as HIPs (high-impact teaching practices), active learning, or service-learning (Mulholland 2016), while the art of the well-developed lecture seems to be underappreciated (Shiva 2021). This focus on good teaching comes at the same time as instructors are burdened with higher course loads, precarious employment, larger class sizes, and mounting service duties. Likewise, as our teaching and service loads increase, support and time for research are dwindling. This is despite the fact that small class sizes and ongoing engagement with developments in our field are two of the most important factors in “good teaching” (Cadez et al. 2017, Mulholland 2016).

Following the bureaucratic logic described by Graeber, as good teaching becomes more central to the conditions of our employment, it must therefore be evaluated. The first measure used to assess teaching, beginning in the 1960s, was the student evaluation, and within a few decades universities had begun using student evaluations in various types of performance reviews, including tenure (Gelber 2020). Almost immediately, teachers and researchers pushed back, warning against the pitfalls of student evaluations, including bias, statistical insignificance, and the inability of students to assess the expertise and pedagogy of their instructors (Heffernan 2022). In fact, many unions and faculty associations today have secured the right to have student evaluations excluded from formal assessments of teaching. It is in this context that peer evaluations of teaching emerged as a viable alternative (Cavanagh 1996, Hutchings 1995).

In the early 1990s, scholars such as Pat Hutchings with the American Association for Higher Education (1995) began researching and advocating for “peer reviews” of teaching in universities and colleges. Importantly, these early studies focused on using peer reviews of teaching for formative, not summative, purposes. This distinction is key. Formative peer observations are characterized as self-reflective, inquiry-based, experimental, and collaborative (Centra 1993, Iqbal 2014, Yiend et al. 2014). Moreover, highly effective peer observations are reciprocal, with faculty members observing each other’s classes (Yiend et al. 2014). The research on formative peer observations overwhelmingly and conclusively indicates that they are worthwhile and effective strategies for growth and development in teaching. In contrast, summative peer observations of teaching are used for accountability, as a measure of performance, and for quality assurance in the evaluation of precariously employed or pre-tenure instructors. The research is also very clear that when peer observations are used for summative purposes, many of their benefits are lost (Cavanagh 1996, Centra 2003). This is due to peer bias and subjectivity, reluctance to critique vulnerable colleagues in this context, the loss of reflexive and collaborative features, redundancy, and power imbalances. In fact, many of these issues are the same as those raised about student evaluations.

Despite these concerns, summative peer evaluations of teaching are becoming increasingly common at universities across Canada. As a case in point, let me describe the bureaucracy of peer evaluations of teaching at my own university. When I began at Mount Royal University in 2010, the tenure process lasted five years. During this time, I was required to have three peer observations annually, for a total of fifteen over the tenure process. A few years into my tenure, the university reduced this to seven observations over five years (three of which must be completed by the chair, in the first, third, and fourth years). In addition, contract faculty are required to have evaluations completed by the chair or chair-designate every three years (notice the substitution of peer with chair). The evaluations, although said to be both formative and summative, are largely summative and highly redundant: summative because they are used as criteria for employment (for the precariously employed) and tenure, and redundant because there is very little variation among the observations, the vast majority of which are positive. Moreover, at Mount Royal University, peer evaluations are the responsibility of tenured faculty and chairs (who in fact do many of these evaluations), which clearly undermines the principle of peer-to-peer observation.

Quite simply, these evaluations are used to check a box indicating whether or not the person can teach. They become formative only if the instructor is struggling in the classroom and requires significant development to meet the standard of a “good teacher.” In addition, these peer evaluations have become a major (and, I would argue, unnecessary) service burden on the tenured colleagues who are required to do them. At Mount Royal University, tenured faculty must attend a training workshop on peer evaluations of teaching, attend a pre-observation meeting, observe a class, attend a post-observation meeting, and complete a multi-page form with five sections requiring written observations and analysis. As we are a small university with small departments, doing peer evaluations is a regular part of our service and a particular burden for chairs.

This year, I chaired a committee for my faculty association that monitors the evaluation of faculty (including student and peer evaluations of teaching, annual reports, and tenure and promotion). And, yes, I do see the irony in a committee to evaluate evaluations. We did a quick survey of other universities in Canada and discovered that most require two to three peer evaluations of teaching for pre-tenure faculty and the majority have no requirements for precariously employed faculty. However, many of these universities are increasing the frequency of these observations for pre-tenure faculty (many seem to be moving toward annual observations), introducing them for precariously employed instructors, and building the bureaucracy that goes with them. A quick search of any university’s website will turn up training videos, workshops, criteria, forms, reports, and requirements for peer observations of teaching. Peer teaching evaluations are quickly becoming an industry unto themselves.

At this point, allow me to circle back to David Graeber and his article “Are you in a BS job? In academe, you’re hardly alone” (2018), where he states:

In most universities nowadays—and this seems to be true almost everywhere—academic staff find themselves spending less and less time studying, teaching, and writing about things, and more and more time measuring, assessing, discussing, and quantifying the way in which they study, teach, and write about things (or the way in which they propose to do so in the future). 

With this in mind, I caution my colleagues who are looking to embrace peer evaluations of teaching as a counter to student evaluations, or as a means to ensure and support good teaching – they are not the affirmation of good teaching you are looking for. When used for summative purposes, these evaluations are largely bureaucratic and undermine the potential of collaborative and reflexive peer observations of teaching. In short, more summative peer evaluations of teaching will not lead to better teachers. Rather, they take our labour away from what really matters – teaching and research.


