Teaching and the Two-Second Rule

Students only need two seconds to decide whether a teacher is good.

How do we decide a teacher is “good”? Are attractive teachers rated more favorably by their students? James Felton and colleagues (2008) reviewed RateMyProfessors.com ratings for 6,852 faculty members from 369 institutions. The best predictor of a good rating was “hotness”. In fact, Felton found a whopping 0.64 correlation between hotness and quality ratings.

If you are not high on the hotness scale, don’t despair. When we observe others, our brains process a lot of data, and only a small part is related to attractiveness. The amygdala assigns an affective valence to punishment and reward stimuli that is coordinated and processed via the frontal cortex during decision making (Damasio, 1996). Because social situations are highly associated with reward and punishment, we are able to rapidly intuit the essential information needed to navigate a social milieu.


From a review of the impression formation literature, Ambady and Rosenthal (1993) determined that most researchers had not controlled for the effect of mediating variables such as physical attractiveness when considering trait judgments from thin slices of behavior. They also found that criterion variables were typically measured by self-reports, which can be biased, rather than by independent measures (Ambady & Rosenthal, 1993). Finally, they determined that the generalizability of the existing body of literature was limited by its reliance on controlled perceiver-target dyads. Their study of rapid impression formation and teacher effectiveness adds practical relevance to intuition research by using a real-life context and a functional criterion measure. Specifically, Ambady and Rosenthal tested whether people can form immediate impressions that are useful in predicting performance on a practical measure such as teacher evaluations. In addition, they wanted to identify specific nonverbal correlates of “good teaching”.

In the first study, Ambady and Rosenthal explored the predictive validity of student raters’ thin-slice judgments of college teachers against actual end-of-semester student evaluations. The 10s thin-slice samples were taken from videotapes of 13 teaching fellows at a private university. Student judges were asked to view the thin slices and then rate the teachers on global traits (e.g., confidence, warmth, empathy) and physical attractiveness. Judges were also asked to code the type and frequency of specific nonverbal behaviors exhibited.

Three 10-second thin-slice videos were created from whole-class videos of each teacher. Clips were taken from the beginning, middle, and end of the class. The thin slices were then randomized and copied onto a master video file. The frequency and type of nonverbal behaviors exhibited during each 10s video were coded by independent raters who were blind to the hypothesis. Likert scales were used for rating attractiveness and personality traits for each of the targets. No training was provided to independent raters.

The reliabilities of the judges’ ratings of the global traits were computed by intra-class correlations for all 9 judges. Inter-rater reliabilities were moderate to high. Factor analysis showed that the global dimensions were highly inter-correlated, and one factor accounting for 71% of the variance emerged. Because of this high inter-correlation among the 15 traits, a composite was formed, and it was used to confirm that the order in which the thin slices were presented had no main effect on judges’ ratings. Because inter-correlations among specific behaviors were low, however, each behavior was considered individually as a mediator in the analysis. Reliabilities of judges’ ratings of these behaviors were high with the exception of “fidgets with objects”, which may not have occurred frequently enough for good reliability.

Global trait judgments were then correlated with actual student ratings. Remarkably, 9 of the 15 dimensions as well as the composite variable reached significance. Specific behaviors were also correlated with the composite trait variable. While teachers who looked down, leaned forward, fidgeted, and sat down were rated somewhat less favorably, there was no relationship between the composite of positive traits and behaviors such as smiling or shaking the head. This indicates that judges were not differentially responsive to specific body language cues. Rather, there was concurrence on global impression formation. Finally, physical attractiveness was analyzed as a mediator. Inter-rater reliability for independent judges on a 5-point Likert scale was .80. When physical attractiveness was controlled, the impact on the relationship between the composite trait variable and the criterion (teacher effectiveness ratings) was negligible, indicating that subjects were responding to global impressions and not to the attractiveness of the teachers when making trait judgments.

In a second study, a sample of teachers was drawn from a public high school. In this case, the criterion variable was principal ratings of teacher effectiveness on a scale of 1-5. All of the teachers chosen for the study received 4’s and 5’s from the principal. It is unclear whether there was some selection bias due to the volunteer process. This may have affected the robustness of the analysis, as a wider range of teachers would be needed to fully evaluate a linear relationship between impressions and the criterion. Impressions based on thin slices predicted teacher effectiveness ratings, and these ratings were not mediated by nonverbal behaviors or the attractiveness of the teacher. Ambady and Rosenthal do not offer evidence on the accuracy of principal ratings, but they do show that there was no relationship between attractiveness ratings and principal effectiveness ratings for the same judge.


Intra-class correlations show the degree to which observations in the same group resemble one another. One application is assessing the consistency of ratings made by more than one rater (inter-observer variability) on the same trait or characteristic, as well as the variability of an individual judge’s scores on a particular dimension. The authors used factor analysis to check the internal consistency of the personality trait variables and obtained a single global factor. The global factor was used to compute an overall effect size, which was significant. Effect sizes for the 10s, 5s, and 2s clip sequences were also compared for each sample, and the strong predictive power of thin slices was consistent across clip lengths. We only need two seconds to form an accurate impression!
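To make the intra-class correlation idea concrete, here is a minimal Python sketch. It assumes the one-way random-effects form of the statistic, which is only one of several variants and may not be the one Ambady and Rosenthal used. The function name `icc_oneway` and the ratings table are hypothetical, made up purely for illustration.

```python
def icc_oneway(ratings):
    """One-way random-effects intra-class correlation (an assumed variant).

    ratings: list of rows, one row per target (teacher),
             one column per judge; every row has the same length.
    """
    n = len(ratings)        # number of targets
    k = len(ratings[0])     # number of judges per target
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]

    # Variance between targets (how much teachers differ from one another).
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    # Variance within targets (how much judges disagree about one teacher).
    ms_within = sum(
        (value - m) ** 2
        for row, m in zip(ratings, row_means)
        for value in row
    ) / (n * (k - 1))

    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical 9-point ratings: 4 teachers (rows) by 3 judges (columns).
example_ratings = [
    [7, 8, 7],
    [3, 4, 2],
    [5, 5, 6],
    [8, 9, 9],
]
print(round(icc_oneway(example_ratings), 3))
```

Values near 1 mean the judges largely agree on how the teachers differ from one another; values near 0 mean the judges’ disagreement swamps the differences between teachers.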


Partial correlation estimates the association between two variables after removing the variance each shares with a third. In other words, the third variable is held constant. In this case, the three variables are the criterion (dependent), physical attractiveness (independent), and behavior (independent). The result is an estimate of the relative contributions of physical attractiveness and behavior respectively, so that we can partial out the actual predictive power of the judges’ first impressions (as measured by their 9-point positive trait ratings).
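For readers who want to see the arithmetic, the first-order partial correlation described above can be sketched in a few lines of Python using the standard formula built from three pairwise Pearson correlations. The function names and all of the numbers below are hypothetical and chosen only for illustration; they are not data from the study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def partial_corr(x, y, z):
    """Correlation between x and y with z held constant."""
    r_xy = pearson(x, y)
    r_xz = pearson(x, z)
    r_yz = pearson(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical values for six teachers (illustration only).
trait_composite = [6.5, 4.0, 7.2, 5.1, 8.0, 3.8]   # judges' first impressions
evaluations = [4.1, 3.0, 4.5, 3.6, 4.8, 2.9]       # criterion ratings
attractiveness = [3, 4, 2, 5, 3, 2]                # variable to hold constant

print(round(pearson(trait_composite, evaluations), 3))
print(round(partial_corr(trait_composite, evaluations, attractiveness), 3))
```

If the two printed values are close, controlling for attractiveness changes little, which is the pattern the post reports for the thin-slice judgments.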

The discrete behaviors were factor analyzed to determine whether their inter-correlations produced components of related behaviors. Raters recorded the type and frequency of each behavior, but there is little explanation as to why specific behaviors were chosen. Behaviors were independent of one another rather than clustering around “positive” and “negative” factors; thus, the relationship of each behavior to overall teacher rating is an area for more study. An analysis of specific “teacher behaviors” that are associated with positive or negative student impressions would be interesting. Alternatively, a review of the literature on positive and negative body language might inform a coding scheme with higher specificity.

Ambady: “One would think that preparation and organization should count–and I’m sure it does to some extent, but behavior, charisma, and the factors that go into holding an audience count, too.”

Ambady, N., & Rosenthal, R. (1993). Half a minute: Predicting teacher evaluations from thin slices of nonverbal behavior and physical attractiveness. Journal of Personality and Social Psychology, 64(3), 431-441.

Felton, J., Koper, P. T., Mitchell, J., & Stinson, M. (2008). Attractiveness, easiness and other issues: Student evaluations of professors on Ratemyprofessors.com. Assessment & Evaluation in Higher Education, 33(1), 45-61. doi:10.1080/02602930601122803



