David A. Kenny
April 4, 2021


Chapter 2 of Interpersonal Perception: The Foundation of Social Relationships.


First impression research is of two types: zero acquaintance and thin slices. Zero acquaintance, a term coined by Linda Albright, Thomas Malloy, and me in 1988, refers to the condition in which the perceiver views the target, but the target is not interacting with anyone. Many zero-acquaintance studies create groups of strangers on the first day of class and have them make judgments of each other. Zero acquaintance provides the least information about the target.

A thin-slice study, a term coined by Nalini Ambady and Robert Rosenthal, involves the perceiver viewing the target for a brief amount of time, usually no more than a few minutes. Many thin-slice studies use videos to present target information.

The first question is to what extent first impressions are consensual (two perceivers have similar first impressions). For example, are Jane's and Helen's first impressions of Sally similar? The measure of consensus is the proportion of variance due to the target. Alternatively, it can be viewed as the correlation between two different perceivers of a set of targets.
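The equivalence between the two views of consensus can be illustrated with a small simulation. This is only a sketch under stated assumptions: the data are hypothetical, each rating is modeled as a target effect plus an idiosyncratic part, differences between perceivers in their average rating are ignored, and the function names are illustrative rather than taken from any published analysis.

```python
import random


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate_two_perceivers(n_targets, target_share, seed=0):
    """Hypothetical ratings by two perceivers of the same targets.

    Each rating = target effect + idiosyncratic part, with total variance 1.
    target_share is the proportion of rating variance due to the target.
    """
    rng = random.Random(seed)
    sd_target = target_share ** 0.5
    sd_other = (1.0 - target_share) ** 0.5
    jane, helen = [], []
    for _ in range(n_targets):
        t = rng.gauss(0.0, sd_target)  # shared across both perceivers
        jane.append(t + rng.gauss(0.0, sd_other))
        helen.append(t + rng.gauss(0.0, sd_other))
    return jane, helen


# Assumed value: 20 percent of the variance is due to the target.
jane, helen = simulate_two_perceivers(n_targets=20000, target_share=0.20)
consensus_r = pearson(jane, helen)
print(round(consensus_r, 2))
```

With many targets, the correlation between the two perceivers should come out close to the assumed proportion of target variance.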

The second question is to what extent first impressions are accurate. What is the correlation of the target effect with the truth? An obvious but difficult question is what counts as the truth. More is said on this topic when target accuracy is discussed, but here the truth is defined as the target's self-rating, ratings by peers, or the target's behavior.

The third question discussed here is the stability of first impressions. Is a first impression a lasting perception?

It might be thought that at zero acquaintance the perceiver does not know anything about the target. However, the perceiver knows the target's age, gender, race, dress, and physical attractiveness. How the target moves and the target's voice also contain information. This information can be used to make judgments of personality through stereotypes. For instance, at least in western cultures, physical attractiveness is associated with judgments of extroversion, whereas in China physical attractiveness is associated with judgments of intelligence. For there to be consensus at zero acquaintance, there must be shared beliefs about the links of physical appearance and nonverbal behavior with personality. These shared beliefs about targets are called stereotypes. Appearance information and nonverbal behaviors are associated with different personality traits. Lay people believe that only bigoted people use stereotypes, but social psychological research has documented that everyone uses stereotypes and much of that use is unconscious.

The lay view is that stereotypes are inherently false, which is true in the sense that not all members of the social group have the trait. However, some stereotypes are true of some people, creating what is called a kernel of truth.

Glen Cleeton and F. B. Knight in 1924 were interested in phrenology, the “science” of predicting personality by measurements of the head. As participants, they had 20 women who were members of two sororities and 10 men who were members of a fraternity. In groups of 10 same-gendered persons, they were placed on stage and were rated on eight traits by 70 people who were described as “casual observers” and were composed of “businessmen, school superintendents and students of personnel management” (pp. 223-224). The eight traits were Sound Judgment, Intellectual Capacity, Frankness, Will Power, Ability to Make Friends, Leadership, Originality, and Impulsiveness. The average level of consensus between two judges was .20, meaning that about twenty percent of the variance is due to the target. To assess the validity of these judgments, Cleeton and Knight used peer ratings as the measure of the truth and found an average correlation of xx.

The second major study of zero acquaintance that I know of was published by Warren Norman and Lou Goldberg in 1966. They had University of Michigan students rate each other's personality on the first day of class.  The study yielded two sets of very surprising results. First, students agreed with one another about each other's personality.  Second, and even more surprisingly, these ratings by virtual strangers agreed with the targets' self-ratings.  So if two people agreed that a third person was friendly, that person tended to see him or herself as friendly. Thus, these judgments at zero acquaintance are consensual and appear to be valid.

Perhaps because of their counter-intuitive nature, the findings of the Norman and Goldberg study were almost totally ignored. In 1988 at the University of Connecticut, Linda Albright, Thomas Malloy, and I published three studies of zero acquaintance using the paradigm that Norman and Goldberg developed. (You can download the three raw data sets from this study: MALZER, KENZER, and ZERO.) We had students on the first day of class form a circle and rate each other. We replicated the results of Norman and Goldberg. Across three separate studies with 259 participants, there was consensus on the dimensions of Extroversion and Conscientiousness and there were correlations between perceiver judgments and self-ratings; we, however, used a more elaborate statistical model (the Social Relations Model).

Since the publication of the Albright, Malloy, and Kenny study, there have been numerous follow-up studies, done at the University of Connecticut and around the world. Each of these studies shows evidence of agreement in ratings and correlations with self-ratings. In Kenny, Horner, Kashy, and Chu (1992), we studied zero acquaintance in more controlled settings. Participants viewed 20-second video clips in a laboratory setting. We were able to replicate the basic results of consensus being highest for judgments of Extroversion and somewhat weaker for Conscientiousness.

Several important studies were conducted by Peter Borkenau and Annette Liebler (1992) in Germany. They undertook special measures to ensure that the targets and judges were unacquainted. Additionally, they used a community sample of targets, not just college students as in other studies. Borkenau and Liebler found even stronger levels of agreement or consensus, likely because a community sample would be more heterogeneous than a sample of college students. They also consistently found evidence of correlations between zero-acquaintance ratings and self-ratings. They have also done the most detailed investigation of the cues used to make personality judgments at zero acquaintance. Here is a brief summary of what they found:
        Extroversion: friendly expression, attractive, made-up face
        Agreeableness: feminine, short strides, effortful reading
        Conscientiousness: effortful reading, not relaxed sitting, formal dress
        Emotional Stability: masculine, light garments, stiff walking
        Culture: informal dress, weak voice, soft voice

Maurice Levesque and I (1993) examined more closely the finding that perceivers generally agree about who is and is not extroverted, and we included a behavioral measure of accuracy. Participants formed groups of unacquainted women and, without interacting, each woman rated the other group members on traits indicating the Big 5 factors and made a series of behavioral predictions (e.g., talkativeness). We found not only that there was consensus on ratings of talkativeness but also that, when the women were videotaped interacting one-on-one with each other, women who were judged as more talkative actually talked more.

Only one published zero-acquaintance study involved non-western subjects (Albright, Malloy, Dong, Kenny, Fang, Winquist, & Yu, 1997). This study was conducted at Beijing Normal University in China. Although some results were somewhat different, they were basically consistent with results from studies of persons from western cultures: Perceivers agree in their ratings at zero acquaintance and these ratings correlate with self-ratings.

Research has consistently shown that agreement at zero acquaintance is strongest for Extroversion. Remember that people know how talkative someone is, even if they have never seen the person engage in conversation. Agreement is also generally shown for ratings of Conscientiousness. Agreement for Agreeableness, Emotional Stability, and Culture tends to be weaker. As for validity, it seems to be greatest for Conscientiousness. Recall, however, that Borkenau and Liebler found even stronger levels of agreement and validity with a community sample.

Fortunately, there is a zero-acquaintance study of liking conducted by Mitja Back, Stefan Schmukle, and Boris Egloff in 2011. They recruited all 74 of the students intending to study psychology at the Johannes Gutenberg University of Mainz in Germany. They did find some agreement for liking, .14.

Although correlations of ratings at zero acquaintance with self-ratings suggest that the zero-acquaintance ratings are valid, there are alternative explanations. For instance, it might be that a person successfully conveys a false image through clothing and speech. Someone who falsely thinks he or she is athletic might wear athletic clothing, and perceivers would mistakenly conclude that the person is athletic. Stronger evidence that zero-acquaintance ratings are accurate is needed.

The Levesque and Kenny 1993 study done at the University of Connecticut provides the strongest evidence of validity to date. Zero-acquaintance ratings of Extroversion predicted how much each woman talked and how much each gestured. Many of these correlations were very strong.

The bulk of the research on accuracy based on minimal information is within the thin-slice tradition. In the original 1992 thin-slice meta-analysis, Ambady and Rosenthal found an average correlation of .40.

Sam Gosling, Sei Jin Ko, Thomas Mannarelli, and Margaret Morris (2002) had individuals make judgments about a target after seeing the target's office or bedroom. They showed that strangers could reliably and validly judge targets' personalities by looking at their bedrooms and offices. Consensus and accuracy correlations were generally stronger than those found in zero-acquaintance research. This study demonstrates that accuracy is possible without ever meeting (or even seeing) a person.

Nalini Ambady, Mark Hallahan, and Brett Connor (1998) have demonstrated consensus and better-than-chance accuracy at judging sexual orientation on the basis of 10-second and 1-second video clips, still photographs, and even a 10-second figural outline display. Evidence suggests that gay men and lesbians are more accurate than heterosexuals at judging sexual orientation in some conditions (still photographs and 1-second clips) but not others (10-second clips).

In 2015, Alexander Todorov and colleagues questioned the validity of thin-slice and zero-acquaintance research. They give a 1928 quote from Clark Hull which pretty much summarizes their views: “The results as a whole certainly look very bad for the judgment of character on the basis of photographs” (p. 119). To my mind their objections do not invalidate the accuracy findings reviewed above, but readers should draw their own conclusions. Please note that the focus of Todorov and colleagues is on the ability of perceivers to make predictions about a target from the face, but they criticize studies like those of Borkenau and Liebler, which did not just use faces. They make five points, each of which is considered in turn.

The first is that many researchers have failed to properly control for gender, ethnicity, and age. The suggestion is that when such controls are made, the accuracy would disappear. This is an issue in some of the studies, but for most of the studies it is not an issue. For example, in the 2002 study by Sam Gosling and colleagues, when they controlled for gender and ethnicity, the basic pattern of results remained essentially unchanged.

The second issue is that they view above-chance accuracy as a rather feeble benchmark which says little about the validity of these sorts of judgments.  The reader can decide this for him or herself.  To my mind, showing any validity of first impressions is impressive.  It is not clear to me why the bar needs to be raised, but the reader may feel differently. 

The third issue is that perceivers do not seem to be aware of any skill in this area. To back up this claim they look at the relationship between who is better at making these sorts of judgments and their awareness of the skill. As discussed in the Individual Differences in Accuracy page, it is very likely that if there are individual differences in accuracy, they are very small. But even if there are differences, why is it important that perceivers be self-aware of their skill?

The fourth issue is the inherent variation of facial images. If a picture is taken of a target, one picture may look very different from another picture. Consider the famous mug shots of Nick Nolte and Mel Gibson after their arrests. Looking at them you do not see anything near a matinee idol. However, the argument makes any correlation of judgments with the truth even more impressive, because of the inherent variability of faces. For this to be an issue, the researcher would have “to cheat” by selecting photographs to confirm their hypothesis. This argument would seem to have little merit. From Cleeton and Knight to the zero-acquaintance studies of the 1980s, live targets were used and so there was no selection of facial images by researchers to confirm their hypothesis.

The fifth issue is that even if there is accuracy, that accuracy does not imply a direct biological link between facial features and personality.  That link might be mediated by the self‑fulfilling prophecy which was discussed above.  I have some trouble understanding this objection.  Just because Todorov and colleagues want a biological link, why should I have to care?
In sum, I do not find the arguments made by Todorov and colleagues to be persuasive.

Certainly there are limits to the conclusion that first impressions are accurate. First, realize that the accuracy correlations reported above refer to the average judgment (the crowd), not the accuracy of an individual judgment. Second, there have been studies that have looked for accuracy of impressions based on minimal information, such as shoes.
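The gap between the accuracy of the crowd and the accuracy of a single judge can be made concrete with a standard psychometric result for the correlation of an average of k judges with a criterion. The sketch below is my illustration and is not taken from any of the studies reviewed here; it assumes that every judge has the same accuracy correlation and that every pair of judges shows the same consensus correlation, and the input values are hypothetical.

```python
def crowd_accuracy(single_judge_r, consensus, k):
    """Accuracy correlation of the mean rating of k judges.

    Assumes equal single-judge accuracy (single_judge_r) and equal
    pairwise agreement between judges (consensus); both are correlations.
    """
    return single_judge_r * (k / (1.0 + (k - 1) * consensus)) ** 0.5


# Hypothetical inputs: single-judge accuracy of .20 and consensus of .20.
for k in (1, 4, 20):
    print(k, round(crowd_accuracy(0.20, 0.20, k), 3))
```

Under these assumptions, accuracy for the average judgment rises with the number of judges but is bounded by the single-judge accuracy divided by the square root of consensus, so a respectable crowd correlation can coexist with weak accuracy for any one perceiver.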

There is the old saying that a first impression is a lasting impression.  In part this is due to the fact that if a target makes a poor first impression, the perceiver will likely decide to avoid the target, and that target never has a chance to make a better second impression.  However, what does happen if the target gets a second chance?  Is it the case that the second impression is not all that different from the first impression?

The 1992 paper that I wrote with Caryl Horner, Ling-chuan Chu, and Deborah Kashy examined the stability of first impressions from zero acquaintance to interacting with the target for about ten minutes across two studies. We examined Extroversion and found strong stability for the target effect, r = .90, but not nearly the same degree of stability for the relationship effect, r = .40. The finding, then, is that if a person makes a poor first impression and perceivers agree, that component does not change much over time. However, if the perceiver sees the target in an idiosyncratic way, that part does change quite a bit.
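To make the distinction between the target effect and the relationship effect concrete, here is a simplified sketch that splits a table of ratings into a grand mean, perceiver effects, target effects, and a residual relationship part. It assumes a complete perceiver-by-target table with hypothetical numbers; it is not the estimation procedure of the Social Relations Model used in the studies above, which handles round-robin data (people do not rate themselves) and separates relationship from error.

```python
def decompose(ratings):
    """Two-way decomposition of ratings[i][j], perceiver i rating target j.

    Returns (grand mean, perceiver effects, target effects, relationship
    residuals). Assumes no missing cells.
    """
    n_p, n_t = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n_p * n_t)
    # Perceiver effect: how high this perceiver rates targets on average.
    perceiver = [sum(row) / n_t - grand for row in ratings]
    # Target effect: how the target is seen on average (the consensual part).
    target = [
        sum(ratings[i][j] for i in range(n_p)) / n_p - grand
        for j in range(n_t)
    ]
    # Relationship part: what is unique to this perceiver-target pair.
    relationship = [
        [ratings[i][j] - grand - perceiver[i] - target[j] for j in range(n_t)]
        for i in range(n_p)
    ]
    return grand, perceiver, target, relationship


# Hypothetical ratings: 3 perceivers (rows) by 3 targets (columns).
ratings = [
    [5, 3, 4],
    [6, 2, 4],
    [7, 4, 1],
]
grand, perceiver, target, relationship = decompose(ratings)
print(target)
print(relationship)
```

Stability of the target effect would then be the correlation of the target effects across two occasions, and stability of the relationship effect the correlation of the residuals across occasions.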

However, the more interesting question concerns evaluation. If initially somebody evaluates a target negatively, can the target improve his or her impression after more interaction? Two very different studies examined this hypothesis. The first study is near and dear to the hearts of many of the readers as it concerns teacher evaluations. If, as a teacher, you make a bad first impression, can you overcome it if you are given a second chance? In 2015, Jennifer Gross and colleagues asked 145 college students to evaluate 10 professors, each on the basis of a thin slice: a 6-minute video of the professor lecturing. Then three to ten weeks later, students listened live to each professor lecturing for 40 minutes and evaluated him or her. Both the target and the relationship components in these evaluations can be computed at each time and their stability over time can be assessed. The target effect, how positively the teacher is evaluated, was highly stable, r = .86. Thus, if a teacher made a poor impression in the short lecture to students, about 93 percent of the time he or she again made a poor impression when given a second chance. However, the relationship effect was not nearly as stable, r = .24.
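The 93 percent figure is consistent with converting the correlation into a rate by taking one half plus half the correlation, the conversion used in the binomial effect size display. The text does not state how the figure was derived, so treating it this way is an assumption; the function name below is illustrative.

```python
def same_side_rate(r):
    """Convert a correlation to the proportion of cases expected to fall on
    the same side of the median at both occasions, using .5 + r / 2.

    This is the binomial-effect-size-display conversion; applying it to the
    stability correlations in the text is an assumption.
    """
    return 0.5 + r / 2.0


print(round(same_side_rate(0.86), 2))  # target effect stability
print(round(same_side_rate(0.24), 2))  # relationship effect stability
```

By the same conversion, the relationship effect's stability of .24 corresponds to a rate only modestly above one half, which is one way to see how much less stable that component is.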

What about interpersonal liking? If initially you dislike someone, what are the chances that it will change? Return to the study conducted by Mitja Back and colleagues in 2011, in which liking was measured for 73 college students at zero acquaintance at the beginning of the academic year. Liking was then re-measured for 54 students at the end of the academic year, about nine months later. The stability correlation of the target effect is .42 and the correlation for the relationship effect is .23.

The conclusion is straightforward: The consensual part of the first impression changes very slowly, but the non-consensual part does change.

