I’ve seen several links to, and discussions of, this paper about judging classical music competitions today.
The experimenter had people observe clips of musicians in competitions, then guess how well the musicians placed. Subjects guessed better when given video-only clips as compared to audio clips or audio+video clips. Conclusion: people care about looks far more than they think or admit they do.
But I don’t think we can jump to that conclusion based on this paper, for a few reasons.
First, the clips were taken from the top three places at prestigious international competitions. These people are already the very best; there was probably very little variation between them. If we rate the auditory quality of the music they played out of 100, maybe they’re at 94, 95, and 96, or something. It’s not surprising that experts didn’t accurately judge who would win based on sound.
The failure of audio clips to predict competition placement is similar to how SAT scores aren’t very good predictors of the performance of Caltech students. If you took randomly-selected students from everyone applying to college and admitted them to Caltech, SAT score would be an excellent predictor of their success. But Caltech only admits people with very high SAT scores to begin with, so there’s not that much variation available to do the predicting.
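To make the restriction-of-range point concrete, here is a minimal simulation sketch. Everything in it is an assumption of mine, not something from the paper: test scores are standard normal, "success" is the score plus independent noise of equal size, and "admission" keeps only the top 1% of scorers.

```python
import random

def pearson(xs, ys):
    """Plain Pearson correlation coefficient, written out to avoid dependencies."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

rng = random.Random(0)

# Hypothetical model: success = score + independent noise (both standard normal).
scores = [rng.gauss(0, 1) for _ in range(100_000)]
success = [s + rng.gauss(0, 1) for s in scores]

# Correlation across the whole applicant pool.
r_all = pearson(scores, success)

# Correlation among the "admitted" only: the top 1% by score.
cutoff = sorted(scores)[int(0.99 * len(scores))]
admitted = [(s, y) for s, y in zip(scores, success) if s >= cutoff]
r_admitted = pearson([s for s, _ in admitted], [y for _, y in admitted])

print(f"all applicants: r = {r_all:.2f}")
print(f"top 1% only:    r = {r_admitted:.2f}")
```

In this toy model the score is exactly as informative for every individual in both groups; the correlation only drops among the admitted because there is so little score variation left to predict with.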
Meanwhile, the variation in how the musicians move and express themselves physically could be large: 50, 70, and 90, for example. So even if judges base their scores mostly on the quality of playing, the visual aspect can still dominate the final rankings. The data don’t support the author’s claim “the findings demonstrate that people actually depend primarily on visual information when making judgments about music performance.” To show that, you’d need to show that visual information still trumps auditory information even when the players are not at about the same level. And it’s not like people with visual information did very well; they got to roughly 50% accuracy. If you go from a distribution of 1/3-1/3-1/3 to 1/2-1/4-1/4, you’ve reduced your entropy by about five percent (from log2(3) ≈ 1.58 bits to 1.5 bits).
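The entropy figure can be checked in a few lines. This is just the arithmetic, assuming that guessing among three finalists with no information is the uniform 1/3-1/3-1/3 distribution and that "roughly 50% accurate" is modeled as 1/2-1/4-1/4, as in the paragraph above. The helper name `entropy_bits` is my own.

```python
from math import log2

def entropy_bits(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * log2(p) for p in probs if p > 0)

uninformed = [1/3, 1/3, 1/3]   # pure guessing among three finalists
with_video = [1/2, 1/4, 1/4]   # assumed shape after seeing the video clip

h_before = entropy_bits(uninformed)   # log2(3)
h_after = entropy_bits(with_video)    # 0.5*1 + 0.25*2 + 0.25*2 = 1.5
reduction = (h_before - h_after) / h_before

print(f"before: {h_before:.3f} bits, after: {h_after:.3f} bits")
print(f"relative reduction: {reduction:.1%}")
```

So the video clips remove only a small fraction of the uncertainty about who won, even though 50% sounds like a big jump from 33%.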
Additionally, the clips used in this paper were six seconds long. So what we’ve shown is that you get a better quick, gut-instinct impression from video than from audio, but this doesn’t say a whole lot about the judges, who were watching and listening to the entire performance. (Edit: as a commenter pointed out, the paper contains a vague description of the results holding with clips of up to one minute.)
Perhaps visual aspects of the performance are correlated with auditory aspects. Further, maybe six seconds is enough time to get a good feel for the visual aspects, but not the audio aspects (six seconds might not even be one entire phrase of the music). In that case, expert judgments during competitions could be based almost entirely on the audio aspect, but people would still predict those judgments better from videos.
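Here is a toy simulation of that scenario, to show it is at least internally consistent. All of the numbers are made up for illustration: judges rank three finalists purely on audio quality, visual presentation is correlated with audio quality (`rho`), and a short clip gives a very noisy read on the audio but a fairly clean read on the visuals.

```python
import random

def pick_best(values):
    """Index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])

def simulate(trials=20_000, rho=0.8, audio_clip_noise=3.0,
             video_clip_noise=0.3, seed=0):
    """Return (audio_rate, video_rate): how often a subject who picks the
    best-seeming finalist from a short clip matches the judges' winner.
    All parameters are hypothetical."""
    rng = random.Random(seed)
    audio_hits = video_hits = 0
    for _ in range(trials):
        # True audio quality of the three finalists.
        quality = [rng.gauss(0, 1) for _ in range(3)]
        # Judges hear the full performance and score on audio alone.
        winner = pick_best(quality)
        # Visual presentation is correlated with audio quality.
        visual = [rho * q + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1)
                  for q in quality]
        # Six seconds of audio: a very noisy estimate of quality.
        heard = [q + rng.gauss(0, audio_clip_noise) for q in quality]
        # Six seconds of video: a fairly clean estimate of the visuals.
        seen = [v + rng.gauss(0, video_clip_noise) for v in visual]
        audio_hits += pick_best(heard) == winner
        video_hits += pick_best(seen) == winner
    return audio_hits / trials, video_hits / trials

audio_rate, video_rate = simulate()
print(f"audio-only clips: {audio_rate:.0%} correct")
print(f"video-only clips: {video_rate:.0%} correct")
```

With these assumed parameters, the video-only subjects match the judges more often than the audio-only subjects, even though the simulated judges never look at anything. That doesn't show this is what happened in the study, only that the study's result can't rule it out.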
It’s interesting that people were bad at predicting which choice (audio, visual, audio+visual) would give them the best results, but people have very little experience with this contrived task, so it’s not especially surprising. Further, I think the conclusions of the paper are probably true (visual impressions matter a lot in music performance), but I hold that belief based on my general model of how people work. The evidence in this paper is somewhat lacking, and it’s disappointing that a news source like NPR fails to state the important fact that the clips were not complete recordings, but very short, six-second impressions.