On March 7, 1907 — almost 111 years ago to the day — the English statistician Francis Galton published a peculiar observation.
At a county fair held in Plymouth, 800 visitors had participated in a competition to guess the weight of an ox. While most people's estimates were too high or too low, missing the true weight of 1,198 lbs. by an average of 37 lbs., the median of everyone's guesses was off by only 9 lbs., or less than 1 percent of the true weight of the ox.
This example illustrates what has come to be known as the "wisdom of crowds" effect. In some cases, the average of a large number of independent estimates can be quite accurate, even when the estimators have no special expertise.
"The average competitor," Galton wrote of the ox competition, "was probably as well fitted for making a just estimate of the dressed weight of the ox, as an average voter is of judging the merits of most political issues on which he votes."
The wisdom of crowds capitalizes on the fact that when people make errors, those errors aren't always the same. Some people will tend to overestimate, and some to underestimate. When enough of these errors are averaged together, they cancel each other out, resulting in a more accurate estimate. That's why the effect benefits from a large and diverse "crowd." If people are similar in the sense that they tend to make the same errors, then their errors won't cancel each other out. A crowd with many overestimators will yield a global average that still falls too high; a crowd with many underestimators will yield a global average that still falls too low.
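To make the cancellation concrete, here is a small simulation sketch. It is not Galton's data: the only number taken from the fair is the true weight of 1,198 lbs. The bell-shaped spread of guesses, the 45 lb. scatter and the 40 lb. shared lean are all assumptions chosen for illustration, and `crowd_estimate` is a made-up helper name.

```python
import random
import statistics

TRUE_WEIGHT = 1198  # lbs., the ox's true weight as reported above


def crowd_estimate(n_people, bias, noise_sd, seed):
    """Simulate one crowd; each guess = truth + shared bias + personal noise.

    Returns (typical individual error, error of the crowd's average).
    The Gaussian noise model is an assumption, not something Galton measured.
    """
    rng = random.Random(seed)
    guesses = [TRUE_WEIGHT + bias + rng.gauss(0, noise_sd) for _ in range(n_people)]
    individual_error = statistics.mean([abs(g - TRUE_WEIGHT) for g in guesses])
    crowd_error = abs(statistics.mean(guesses) - TRUE_WEIGHT)
    return individual_error, crowd_error


# Diverse crowd: personal errors scatter on both sides of the truth.
diverse_individual, diverse_crowd = crowd_estimate(800, bias=0, noise_sd=45, seed=1)

# Similar crowd: same personal noise (same seed), but everyone also leans
# 40 lbs. too high (a hypothetical figure).
skewed_individual, skewed_crowd = crowd_estimate(800, bias=40, noise_sd=45, seed=1)

print(f"diverse: typical guess off by {diverse_individual:.1f} lbs., average off by {diverse_crowd:.1f} lbs.")
print(f"skewed:  typical guess off by {skewed_individual:.1f} lbs., average off by {skewed_crowd:.1f} lbs.")
```

In the diverse crowd, the average lands far closer to the truth than a typical individual does. In the skewed crowd, averaging removes the personal scatter but leaves the shared 40 lb. lean almost untouched.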
In more technical terms, the wisdom of crowds requires that people's estimates be independent. Studies have found that when people can observe the estimates of others, the accuracy of the crowd typically goes down. People's errors become correlated or dependent, and are less likely to cancel each other out. We follow our peers, to the detriment of the performance of the group.
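The cost of dependence can be sketched the same way. In the toy model below, each person's error is a mix of a component shared by the whole crowd (a stand-in for following one's peers) and a private component. The mixing weight `rho`, the 45 lb. spread and the helper `crowd_rms_error` are all illustrative assumptions and are not taken from any of the studies mentioned here.

```python
import math
import random


def crowd_rms_error(n_people, rho, noise_sd=45.0, n_crowds=2000, seed=0):
    """Root-mean-square error of the crowd average over many simulated crowds.

    Each individual error = sqrt(rho) * shared + sqrt(1 - rho) * private,
    so every person has the same spread (noise_sd) and any two people's
    errors have correlation rho. This is an assumed toy model.
    """
    rng = random.Random(seed)
    total_squared_error = 0.0
    for _ in range(n_crowds):
        shared = rng.gauss(0, noise_sd)  # one draw shared by the whole crowd
        errors = [
            math.sqrt(rho) * shared + math.sqrt(1 - rho) * rng.gauss(0, noise_sd)
            for _ in range(n_people)
        ]
        crowd_average_error = sum(errors) / n_people
        total_squared_error += crowd_average_error ** 2
    return math.sqrt(total_squared_error / n_crowds)


independent = crowd_rms_error(n_people=100, rho=0.0)  # nobody sees anyone else
influenced = crowd_rms_error(n_people=100, rho=0.5)   # half of each error's variance is shared

print(f"independent crowd: average off by about {independent:.1f} lbs.")
print(f"influenced crowd:  average off by about {influenced:.1f} lbs.")
```

Under this model the variance of the crowd average is `noise_sd**2 * (rho + (1 - rho) / n_people)`. The private part shrinks as the crowd grows, but the shared part does not, which is why adding more people cannot rescue a crowd whose errors move together.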
But a new paper offers an interesting twist on this classic phenomenon. When a crowd is subdivided into smaller "crowds" whose members are allowed to deliberate about the right answer, the averaged answers of those small groups not only overcome the costs of introducing dependence, but even outperform the crowd as a whole.
The new paper, published last month in Nature Human Behaviour and authored by Joaquin Navajas and colleagues, reports the results of a large-scale study of estimation. More than 4,000 people attending an event were asked to provide estimates for eight values, such as the height of the Eiffel Tower. They were then subdivided into groups of five estimators and encouraged to discuss half of the eight values to arrive at a consensus estimate for the group.
The key finding was that the averages from these "deliberating crowds" of five were more accurate than those from an equal number of independent individuals. For instance, the average obtained from the estimates of four deliberating groups of five was significantly more accurate than the average obtained from 20 independent individuals. In fact, averaging four deliberating groups resulted in a more accurate estimate than averaging 1,400 individual estimates.
These benefits were not observed for the estimated values that were not discussed by the group, so they somehow derived from the group-level process itself. But what, exactly, were the groups doing to achieve this impressive effect?
In a follow-up study with 100 university students, the researchers tried to get a better sense of what the deliberating crowds actually did. Did they tend to go with the answers of those who were most confident about their estimates? Did they gravitate towards the answers of those least willing to change their minds? This happened some of the time, but it wasn't the dominant response. Most frequently, the groups reported that they "shared arguments and reasoned together." Somehow, these arguments and reasoning resulted in a global reduction in error, rather than introducing correlated errors that undermined the wisdom of crowds.
The new paper by Navajas and colleagues reports only two studies, one large and one small, and it focuses exclusively on estimates concerning trivia or general knowledge. As a result, many questions remain. But the potential implications for group decision-making and deliberation are enormous. If a small number of deliberating groups can outperform a much larger number of individuals, this suggests that procedures like "deliberative polling" could be a promising strategy for public and private communities to pursue.
Galton introduced his 1907 paper by noting that "[i]n these democratic days, any investigation into the trustworthiness and peculiarities of popular judgments is of interest."
More than a century later, this interest most certainly remains.
Tania Lombrozo is a psychology professor at the University of California, Berkeley. She writes about psychology, cognitive science and philosophy, with occasional forays into parenting and veganism. You can keep up with more of what she is thinking on Twitter: @TaniaLombrozo