Noise: A Flaw in Human Judgment

| | |
|---|---|
| Authors | Daniel Kahneman, Olivier Sibony, and Cass R. Sunstein |
| Published | May 18, 2021 |
| Pages | 454 |
| Greek Publisher | Κάτοπτρο |
What’s it about?
Noise (2021) is an exploration into the chaotic and costly role that randomness plays in human judgment. By uncovering the mechanisms behind how our minds and societies work, the authors show how noise – unwanted variability in decisions – is both inescapable and elusive. We can, however, with a few solid strategies, make our judgments less noisy and our world fairer.
About the author
Daniel Kahneman is an economist and psychologist and the author of the groundbreaking Thinking, Fast and Slow. Kahneman’s work earned him a Nobel Prize in Economics in 2002 and the Presidential Medal of Freedom in 2013. He’s currently a professor emeritus at Princeton University.
Cass R. Sunstein is a legal scholar and the author or coauthor of several books, including Nudge, written with Richard Thaler. Sunstein served as a top administrator in the Obama White House and is the founder and director of the Program on Behavioral Economics and Public Policy at Harvard University.
Olivier Sibony is a fellow at Oxford University, a former senior partner at McKinsey & Company, and the author of You’re About to Make a Terrible Mistake!
Basic Key Ideas
Imagine you’re holding a stopwatch in your hand. Without looking, start the watch and then stop it after exactly ten seconds.
If you do this a couple of times in a row, you'll notice that hitting ten seconds on the dot is just about impossible. Sometimes you'll stop a little short; sometimes you'll go long. Sometimes you're off by milliseconds; other times you're off by a whole second, or even more. Either way, this little experiment leaves you with a set of errors that have no discernible pattern and no apparent cause.
This is an example of noise, or random mistakes in judgment. And while your errors in this little stopwatch experiment are innocent enough, as you’re about to learn, variations in judgment like these can have far more serious consequences. Welcome to the strange world of noise.
In these blinks, you’ll learn
- what the weather has to do with your chances of getting into college;
- why you – and everyone else – are terrible at predicting the future; and
- why our narrative-seeking brains wreak havoc on our judgments.
To get a better grasp on the random and strange nature of the kind of noise we’re talking about here, let’s imagine you’re a high school senior, and you and your best friend are self-professed academic nerds. You’ve both earned straight As, nailed the SATs, and landed admissions interviews at the same Ivy League university.
You go to your interview and everything goes swimmingly. Your high marks impress the admissions officer and you cross the campus back to your car feeling great, the sun on your face and a cool breeze at your back.
Your friend has her appointment with the same admissions officer on the following day. Just like for you, her interview is a smooth ride. But when she leaves, the rain clouds that have gathered all afternoon break open into a downpour.
Weeks pass, and you each receive a letter from the admissions office. Turns out, they’ve rejected you but accepted your friend. Your mind reels. Why? What does she have that you don’t?
Here’s the first key message: Unrelated and unpredictable factors can have an alarming impact on human judgment.
According to a 2003 paper evocatively titled “Clouds Make Nerds Look Good” by behavioral scientist Uri Simonsohn, the weather might have made the difference. Simonsohn discovered that on cloudier days, college admissions officers pay more attention to grades and scores.
On sunnier days, on the other hand, admissions officers are more sensitive to nonacademic qualities, meaning that on the day of your interview, the officer might have been more interested in athletics and artistic talent than straight As and SAT scores.
Then again, perhaps the admissions officer's decision had nothing to do with the weather at all, and more to do with the interviewees who preceded you. Perhaps those students were great candidates, and the admissions officer simply didn't want to go on an acceptance streak.
But wait. Other irrelevant factors may also have influenced the decision. The admissions officer might have been hungry; he may have felt that the sunny day was too hot, despite the air conditioning in the office; his hometown football team might have just lost an important game. Researchers have shown that each of these irrelevant factors can affect the decisions of bank loan officers, baseball umpires, physicians, and judges.
Importantly, in all of these scenarios, one person repeatedly confronts substantially the same situation, yet makes different judgments. Researchers call this variability occasion noise, and it’s one of the major categories of noise. But it’s not the only one.
Let’s do another thought experiment, one that takes place at a carnival. Specifically, at a shooting arcade.
You and a friend, BB rifles in hand, have just fired several metal pellets at paper targets hanging at the far end of the range. You’re both terrible shots, but in different ways.
On your paper target, the misses are scattershot. Taken at arm’s length, you can see there’s no pattern. Your shots are noisy.
But your friend's paper target tells a different story. His shots cluster together, but not on the bullseye; they land low and to the left. It's as though he believes the real bullseye is actually down there. Or perhaps the barrel of his rifle is bent. What's behind his systematic inaccuracy?
The key message here is: Noise and bias are different, though bias can lead to noise.
Whenever we make errors systematically, we call that bias.
Day to day, we employ the term to describe prejudice for or against certain groups of people. In the field of psychology, the term is often used to identify cognitive mechanisms that skew our judgments.
Take conclusion bias, for example, which causes us to bend our judgments toward a desired outcome, making us interpret evidence in a skewed way. Just consider one Miami immigration court, where the chance of getting asylum was found to swing from 5 to 88 percent depending on which of two judges heard the case. These two judges were most likely suffering from conclusion bias. Needless to say, this kind of bias can have life-changing consequences.
If asylum decisions were pellets in a BB gun, the two Miami judges would each produce paper targets with shots clustered off target. But if the asylum decisions of the entire Miami courthouse were mapped out, including the variable decisions of other judges, the courthouse’s paper target would show a scattershot mess.
This kind of variability, where judgments within a system are unjustifiably inconsistent with one another, is called system noise.
Remember the occasion noise of the Ivy League admissions officer? That too may have resulted from bias. But whether we’re sussing out occasion noise or system noise, we have to examine our paper target at arm’s length. If we hold the target too close to our eyes, the noise ceases to be apparent.
Now, let’s turn to another area prone to noise: predictions.
A judge deciding bail faces a mighty responsibility. Should she keep a defendant in jail pending trial, or let him go?
If she wrongly denies bail, the defendant will lose his freedom and perhaps his job. His family might even lose their home. None of this loss will have served justice. On the other hand, if she wrongly grants bail, he may flee or, worse, commit another crime.
Weighing these consequences, a bail judge calls upon her experience and the record before her to predict what the defendant would do if released.
Unfortunately, humans – and that includes judges – are terrible at making accurate predictions.
The key message here is: When we make predictions we’re easily led astray by what feels right.
In 2018, a team led by computational and behavioral science researcher Sendhil Mullainathan built an algorithm to make bail judgments. Trained on the records of approximately 760,000 real-life bail hearings, the algorithm could have reduced crime by released defendants by up to 24 percent without increasing the jail population, or shrunk the jail population by up to 42 percent without increasing crime.
Other studies have found that a rudimentary formula that considers only two factors – the defendant’s age and the number of court dates they’ve missed – also outperforms human judges.
This raises the question: Why would experts with years of training and experience finish a distant third behind algorithms and back-of-the-envelope math?
The answer is simple. Judges are human.
When we attempt to predict the future, we’re seeking closure. It’s an attempt to solve a mental puzzle, and when we come up with an answer, we experience an internal signal that says, Yes, that’s it!
A satisfying prediction coheres with the way we see the world, and the power of this emotional reward often blinds us to a basic limitation of predictions: objective ignorance.
We don’t know what we don’t know, and what we do know might be wrong, incomplete, or misleading.
Rules and algorithms are also ignorant. More so, in fact. But they’re free from internal signals, perceptions of the world, and emotional rewards. In short, they outperform us because they’re free from noise.
You might have noticed that blinks often begin with a story.
We sketch out a time and place, an event that involves a character, someone with a goal and obstacles to overcome. We do this because the human mind loves a story, and information that works within a story tends to stick.
So far we’ve looked at noise and some of its consequences in the justice and college admissions systems. As we’ve said, anywhere people make judgments, there’s bound to be noise. But if noise is everywhere, why haven’t we heard more about it?
The key message here is: We ignore noise because it doesn’t make for a good story.
Much of the psychological insight of recent decades has explored our deep attachment to narrative. The human mind, we’ve learned, understands the world by making stories to explain what we observe.
For example, psychologists have identified what's called the fundamental attribution error: our habit of crediting or blaming people for outcomes that are better explained by mere circumstance. In other words, we see characters and plots everywhere.

When our reality is challenged, the psychological mechanism of naive realism, the self-reinforcing belief that we perceive reality just as it is, bolsters our narratives by ruling out troublesome counter-narratives. And when the truly unexpected occurs, the mind works to bring it within what the authors call the valley of the normal, where the strange is made understandable by attributing a cause in hindsight.
That brings us to the main point here: noise resists narrative. Noise isn't causal, and it doesn't cohere with our patterns of understanding. To the degree that it's a story at all, it's a frustrating and apparently meaningless one.
Without a story to accommodate noise, it escapes our notice. We either miss it entirely, edit it out of our awareness, or perceive it as an instance of bias.
Bias, after all, does well in a story. It has causal force.
Noise, on the other hand, can only be observed statistically. The variability of bail denials, college admissions, asylum hearings, and hiring decisions may feel like bias when experienced firsthand. And as we’ve said, bias may be a contributing factor in some cases. But when we back up and view these phenomena in the aggregate, their random, chaotic nature becomes apparent.
Before moving on, let’s just quickly recap what we’ve learned so far, which is this: Wherever there’s human judgment, there’ll be noise and, sometimes, that noise can have alarming, life-changing consequences.
So, that naturally leads to the question: What can we do about it? How do we turn down the noise?
To begin answering this question, let's go back to the fall of 1906, when Francis Galton, a polymath and cousin of the famed evolutionist Charles Darwin, visited a county fair in Plymouth. While strolling through the stalls, he came across an ox-weighing competition. Galton, who, among other things, was a theorist of intelligence, listened with curiosity as close to 800 villagers gave their best estimates of the ox's weight. No one hit the correct answer: 1,198 pounds. When the competition was over, Galton asked the organizers to lend him the tickets for statistical analysis.
When plotted on a graph, the estimates were scattershot, running over and under the true weight by variable, unpredictable amounts. The villagers were noisy. But when Galton calculated the mean of the villagers' estimates, he noticed something surprising: it was nearly perfect, off by only a single pound.
The key message here is: We can cancel out noise by averaging multiple, independent judgments on a single question.
Galton had stumbled across a phenomenon now known as the wisdom-of-crowds effect. In hundreds of different circumstances, getting independent judgments from multiple judges and then averaging their responses has been shown to bring you close to the truth.
When you ask people to guess the number of jelly beans in a jar, the distance of a randomly chosen city, or the temperature a week from today, their answers will vary. They’ll be noisy. However, when averaged out, the noise in one response counteracts the contradictory noise in another. The noise cancels itself out.
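The cancellation effect is easy to see in a quick simulation. The sketch below is not from the book; the true weight and crowd size are borrowed from Galton's ox story, and the normally distributed error model is our own assumption:

```python
import random

random.seed(42)

TRUE_WEIGHT = 1198          # the ox's actual weight, in pounds
N_VILLAGERS = 800           # roughly the number of guessers at the fair

# Assume each villager's guess is the truth plus independent random noise.
guesses = [TRUE_WEIGHT + random.gauss(0, 100) for _ in range(N_VILLAGERS)]

worst_miss = max(abs(g - TRUE_WEIGHT) for g in guesses)
mean_guess = sum(guesses) / len(guesses)

# Individual guesses miss by up to a few hundred pounds,
# yet their average lands within a few pounds of the truth.
print(f"worst individual miss: {worst_miss:.0f} lb")
print(f"error of the average:  {abs(mean_guess - TRUE_WEIGHT):.1f} lb")
```

Note that the list comprehension bakes in the independence of each guess; if the guesses influenced one another, their errors would correlate and the cancellation would weaken.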
But the wisdom of the crowd comes with some essential caveats.
First, each judge must be independent of the others. When you ask a group a question all together, the individuals respond as much to the group as they do to the question itself.
Also, the wisdom of the crowd only bears out when each individual considers the exact same case. Asking each person a different question will get you nowhere.
Finally, the wisdom of the crowd doesn't guard against bias. If the crowd shares a bias, a systematic error in judgment, averaging the members' responses won't dilute that error; it will simply reproduce it. For example, if a hiring committee shares a bias against women, the mean of its members' judgments on a female job candidate will carry that same bias.
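A quick numeric sketch (invented numbers, not from the book) shows why averaging can't rescue a shared bias: the random scatter cancels, but the systematic error survives intact.

```python
import random

random.seed(7)

TRUE_VALUE = 100            # the value every judge is aiming at
SHARED_BIAS = -20           # every judge systematically lowballs by 20

# Each judgment = truth + the shared bias + individual random noise.
judgments = [TRUE_VALUE + SHARED_BIAS + random.gauss(0, 10) for _ in range(500)]
mean_judgment = sum(judgments) / len(judgments)

# The average lands near 80: the noise has canceled out,
# but the shared bias of -20 remains in full.
print(f"mean judgment: {mean_judgment:.1f}")
```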
So far, we’ve spent a fair bit of time talking about judges and the random and occasionally inexplicable variability in the sentences they hand out. The random quality of this injustice didn’t escape the notice of Marvin E. Frankel, a US District Judge.
Early in his career, Frankel realized that he could, for example, send a convicted bank robber to prison for up to 25 years, or he could choose a sentence of one day. The judgment, Frankel saw, ultimately depended on his own views, predilections, and biases.
In 1973, Frankel published a book showcasing disparities in sentencing for substantially similar crimes. One case in point: one small-time check counterfeiter got 30 days, while another got 15 years for essentially the same crime.
Individual anecdotes, however, can be explained away. So, Frankel set out to create a more durable, systematic portrait.
The key message here is: To fight noise, you’ve got to first make it visible through a noise audit.
In 1981, Frankel led a research team that asked 208 federal judges to sentence criminal defendants from 16 fictional vignettes.
Frankel’s team presented the scenarios to each judge individually and then mapped out the variation between the judges’ proposed sentences in each case. The study, and several others like it, statistically proved that shocking variability, not consistency, was the norm in criminal sentencing.
Frankel showed that an audit can determine the level of noise in any institution where a pool of experts takes on substantially similar cases. Here’s how it’s done.
First, determine your bullseye. Decide how much variability in judgment is acceptable. For an insurance executive, the question might be, what’s a tolerable difference in the payout recommended by separate claims adjusters inspecting the same flooded basement?
Next, gather your judges, or claims adjusters as the case may be, and present them with scenarios. Make sure to provide them with a numerical expression for each judgment, such as years in a prison sentence or dollars in an insurance payout.
Finally, map out the judges’ answers in relation to your bullseye. And voilà, you now have a diagnostic portrait of the noise in your institution. Now, what can you do about it?
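The arithmetic behind such an audit is simple enough to sketch. The adjuster payouts below are invented for illustration; the point is only that each scenario's noise is measured as the spread of judgments around their own mean, so no "correct" payout is ever needed:

```python
from statistics import mean, pstdev

# Hypothetical audit: five claims adjusters each price the same two scenarios.
audit = {
    "flooded basement": [9500, 12000, 8800, 15000, 11200],
    "hail-damaged roof": [4200, 4500, 3900, 7800, 4100],
}

for scenario, payouts in audit.items():
    avg = mean(payouts)
    spread = pstdev(payouts)       # standard deviation across adjusters
    noise = spread / avg           # relative noise, comparable across scenarios
    print(f"{scenario}: mean ${avg:,.0f}, spread ${spread:,.0f}, noise {noise:.0%}")
```

Whether the resulting spread (here about 19 percent on one scenario and 30 percent on the other) is tolerable is exactly the "bullseye" question from step one; the audit only makes the number visible.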
Imagine you’re on an operating table, about to go under the knife. Just before you nod off under the influence of a general anesthetic, the lead surgeon steps over to the sink. She lathers her hands up with soap and then places them under the hot water. With that one simple act, she has prevented an unknowable number of pathogens from entering your body.
We can do much the same with noise. By adopting principles we’ll call decision hygiene, we can make our world substantially less noisy.
The key message here is: Decision hygiene can reduce noise, and like regular hygiene, it’s about discipline and prevention.
The first step to developing decision hygiene – the equivalent to the surgeon washing her hands with soap – is learning how to stop and think statistically before making any important decision.
We learned in a previous blink that our narrative-seeking minds make a story of everything. We dive in and imbue the particulars of a case with cause and meaning. While this is a natural tendency, it’s an invitation for noise. Instead, you should aim to take what Kahneman calls the outside view: frame every case in reference to a body of other similar cases.
For example, let’s say you’ve got a new CEO at work and you’re wondering whether she’ll be successful. The CEO’s education, reputation, and performance history might give you some clues, but it’s a mass of complex and potentially misleading information.
A less noisy approach would begin by exploring outcomes from similar situations. For example, you could find out the average rate of turnover of CEOs in your industry, or explore how often new CEOs result in rising share values.
While building a strong statistical framework, you should resist premature intuition.
We all love a judgment that feels right, but you want it to feel right for the right reasons. Stow that gut feeling you may have about the CEO’s alma mater or that hunch on what went down at her last job. Instead, save the emotional reward for a judgment that jibes with a well-founded view of what’s most likely.
On a related note, creating a judgment that’s both complex and coherent is emotionally rewarding, but that reward can mislead you. If possible, break tough cases into separate questions and hand them off to independent judges.
For example, connecting the dots between your CEO’s tenure and the company’s stock value might be a fun mental puzzle, but it could also be irrelevant.
We have one more critical principle of decision hygiene, and it comes in the form of a cautionary tale.
In 1984, Judge Frankel was victorious. Congress enacted the Sentencing Reform Act and soon after instituted strict sentencing guidelines based on an analysis of 10,000 real cases. Under the new rules, judges could only consider the crime and the defendant’s criminal history. The judge would assign a numerical value to each and the resulting score dictated the range of possible sentences.
As a result, noise in sentencing plummeted.
For example, before the act, a man convicted of dealing drugs might have faced a sentence that varied by years depending purely on which judge happened to hear the case. Afterward, that variability shrank to a range of a few months.
Judges across the country complained bitterly. They’d fine-tuned their sense of justice through years of study and experience, and now their discretion was gone, replaced by a crude math problem.
The key message here is: For noise reduction to stick, it’s essential that judges buy in.
In 2005, the Supreme Court ruled on constitutional grounds that the sentencing guidelines could be advisory only, stripping them of their mandatory force. A few years later, Harvard law professor Crystal Yang analyzed hundreds of thousands of criminal cases sentenced after the guidelines lost their teeth. The disparity between harsh sentences and the average had doubled. Personal values had reemerged as a basis for sentencing. Noise had returned.
In hindsight, Judge Frankel and his allies missed a crucial step in their campaign for noise reduction. They’d failed to make the judges agree on the ultimate purpose of judgment.
The goal of judgment should be accuracy, not personal expression.
Unlike literary criticism, competitive sport, filmmaking or any other field where diversity in opinion and style breeds richness and growth, variability among experts judging substantially similar cases is a problem. When two radiologists independently read the same x-ray and arrive at different conclusions, one of them is wrong. In other words, when judges step up to the shooting arcade, they must first agree on the same bullseye.
Once the judges agree that accuracy is the highest priority, the auditors should invite them to create the test scenarios. Failing to do so all but guarantees that the audit will meet with hostile scrutiny. Next, the judges have to see the extent and costs of the noise.
Kahneman, for example, conducted an audit in an insurance company and discovered that underwriters averaged a 55 percent difference in the premiums they set for customers. For the underwriters to appreciate the importance of reducing this noise, they had to understand that losses from over- and underpricing ran in the hundreds of millions of dollars.
Finally, when instituting decision hygiene, judges must take part in formulating practical, system-specific rules that balance noise reduction against other costs. For example, a backlash against rising crime led some US states to adopt a "three strikes" rule that mandated life imprisonment for defendants convicted of a third felony. The rule reduced noise, but did so without accounting for a defendant's history, the severity of her crime, or her capacity for rehabilitation.
Audits, hygiene, rules, habit, and prevention. Noise reduction isn’t very glamorous work. But by now, you’ll likely agree that noise exacts a terrible cost. It wastes resources, spurs injustice, and results in personal tragedy. It erodes faith in institutions of law, medicine, education, and work. Now that we’ve uncovered it, it’s our job to reduce it.
The key message in these blinks:
Random and unwanted variability in human judgment is everywhere, and whether we see it or not, we pay a heavy price. The good news is we can reduce noise if we change our mindset and adopt principles of prevention.
Actionable advice:
Tap into the wisdom of the crowd within. As you learned earlier, averaging multiple, independent judgments of a single question can counteract the noise in those judgments and leave you with a strikingly accurate answer. The thing is, if you ask yourself the same question multiple times, you can achieve much the same effect.
Try it. Over the next few days, ask yourself the following question: what share of the world’s airports is in the United States?
The average of your responses will be relatively noiseless and surprisingly close to the truth.
Spoiler alert: The answer is 32 percent.
SECOND REVIEW FROM SHORTFORM
About Book
Why do two similar people, convicted of the same crime, receive drastically different sentences? Why does one job candidate get hired when an equally qualified candidate isn’t even interviewed? The answer, according to Daniel Kahneman (Thinking, Fast and Slow), Olivier Sibony, and Cass R. Sunstein (Nudge), is noise—unexpected and unwanted variance in human judgments. Kahneman, Sibony, and Sunstein argue that understanding and counteracting this kind of noise is key to improving the judgments that affect some of the most important aspects of our lives, including our justice system, medical care, education, and business decisions.
Drawing on decades of research and their own experiences as noise consultants, the authors explain what noise is, where it comes from, and how we can reduce it to make our world more fair and consistent. In this guide, we’ll elaborate on the concept of noise by connecting it to the authors’ previous work and to similar ideas in fields ranging from financial trading to baseball scouting.
Noise, by Daniel Kahneman, Olivier Sibony, and Cass R. Sunstein, is about how to improve the judgments that affect some of the most important aspects of our lives, including our justice system, medical care, education, and business decisions. As the title suggests, the book focuses on noise, which the authors define as unexpected and unwanted variance in human judgments. The authors argue that if we can understand what noise is, then we can reduce it—and thereby drastically reduce unfairness, loss of money, and even loss of life.
Noise draws on the authors’ expertise across multiple fields. Kahneman is a Nobel Prize-winning psychologist and the author of the award-winning Thinking, Fast and Slow. Sibony is a professor of strategy, a business consultant, and an author of business strategy books. Sunstein is a legal scholar and co-author of the award-winning Nudge. Noise also draws on decades of research and the authors’ own experiences as noise consultants in business settings. The book aims at a general audience, but it may be of particular interest to anyone in charge of an organization that depends on human judgment.
This guide is organized into two major parts. We begin by explaining what noise is and why it’s such a problem. To understand it better, we break noise down into different types and analyze the psychological tendencies that produce it. Then, we explore strategies designed to reduce noise. Throughout the guide, we expand on Noise’s arguments by connecting them to similar ideas from other works and to contexts ranging from financial trading to baseball scouting.
What Noise Is and Why It Matters
Before we can tackle the problem of noise, we have to understand what noise is, where it comes from, and why it’s a problem worth solving. This section begins by looking at how noise introduces error into judgments and how these errors lead to unfairness, financial loss, and physical harm. Then, we’ll look at how noise arises as a result of the way our minds work.
What Noise Is
The authors define noise as one of the two main errors in human judgment (the other being bias). To understand noise, we must first define judgment.
Judgment
A judgment is an attempt to mentally assign a value to something in order to choose a course of action. The authors break judgments down into predictions and evaluations.
Predictions aim to come as close as possible to some correct value or answer. The authors point out that insurance underwriters make predictions when they prepare quotes, and they’re aiming at a theoretical goldilocks number (just right). If the premium is too low, the company loses money. If the premium is too high, the company loses customers. (Shortform note: A similar predictive calculation comes into play in any field that, like insurance, requires balancing risk against potential profit. In a worst-case scenario, noise in these calculations can lead to a full-blown financial collapse.)
Likewise, doctors make predictions when they diagnose patients; they are trying to find the correct cause(s) of the patients’ ailments. The authors point out that you can measure the accuracy of a predictive judgment by comparing the prediction to the correct answer once it’s known. (Shortform note: Although in principle you can check a prediction’s accuracy by comparing it to the result, measuring predictive accuracy is a complicated task in its own right. Many predictions are too vague or too qualified for us to really judge them.)
Other judgments are evaluations; they have no correct answer, but instead require the decision-maker to balance pros and cons as best as possible. The authors point out that judges make evaluations when they decide how to sentence criminals, or whether to grant asylum. Similarly, teachers make evaluations when they grade essays. The authors contrast evaluative judgments with predictive judgments by pointing out that since there is no “correct” answer to an evaluative judgment, you can’t measure the accuracy of an evaluation in the same way you can with a prediction.
Critiques of Noise’s Statistical Basis
Noise uses relatively complex statistical concepts and formulas to argue that reducing noise is always beneficial because, mathematically, doing so reduces overall error. These statistical concepts rely on comparing erroneous values to a known, correct value—which doesn’t exist in evaluative judgments. As a result, some reviewers have questioned whether it’s accurate to call variances in evaluations “noise.” Meanwhile, other reviewers have critiqued the authors’ use and explanation of statistical concepts and terminology.
Readers should be aware of these criticisms. With that in mind, this guide forgoes these statistical foundations in order to focus on the larger, actionable principles that run throughout Noise.
As a general rule, judgments are neither purely factual nor purely a matter of taste. A doctor reading the results of your blood panel isn't making a judgment; she's recording facts (though if she sees an anomaly, she might make a judgment about its cause). Likewise, your preference for one band over another isn't a judgment either; it's pure opinion.
Given how important professional judgments are in so many areas of our lives, we would certainly hope that these judgments are as accurate and consistent as possible. Even so, we accept a certain amount of deviation from one judger to the next and from one case to the next, because judgments take place precisely where qualified, well-informed, reasonable people have some room to disagree.
(Shortform note: This is especially true in the case of evaluative judgments, which occur in situations where individual subjectivity comes into play. We know and accept that some teachers and some judges are more or less lenient than others. All the same, we also expect that our schools, courts, and other public institutions should be fair and consistent. One teacher might give a paper a B+, whereas another might give that same paper an A-. But if the same student paper receives an A from one teacher and an F from another, something has gone wrong.)
(Shortform note: This guide occasionally uses the term "judger" or "judgers" as a generic way to describe anyone who makes a professional judgment as defined above. We use this term to avoid confusion with judges, the people who preside over courtrooms, though by our definition these judges are also judgers.)
Noise (and Bias)
To improve judgments, the authors argue that we need to reduce error as much as possible by correcting for noise and bias. The authors use the following metaphor to explain noise and bias: Think of a judgment as a target at a shooting range, where inconsistency = noise and inaccuracy = bias.
- If you see a target with all the shots grouped tightly around the bullseye, you know that the shooters were both accurate and consistent (no bias, no noise).
- If you see a target with all the shots grouped tightly some distance away from the bullseye, you know the shooters were consistent but not accurate (no noise, but biased).
- If the shots are generally centered on the bullseye, but not tightly grouped, you know the shooters were generally accurate but inconsistent (not biased, but noisy).
- If you see a target with the shots spread widely and not centered on the bullseye, you know the shooters were both noisy and biased.
The authors make the further point that if you flip over the targets so that you can see the shots but not the bullseye, you can no longer detect bias or accuracy, but you can still easily see noise, because noise refers to the spread of the shots. That means that you can detect and correct for noise without knowing the correct answer to a prediction. It also means that you can detect noise in evaluative judgments, which, as we’ve seen, are situations where there is no correct answer by which to measure the quality of the judgment. In both cases, you can also address noise without needing to know whether the judgments were biased or not. That’s because noise is the degree of inconsistency between one judgment and the next.
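This "flipped target" idea translates directly into a calculation: noise is just the spread of judgments around their own center, computed without ever knowing where the bullseye is. A minimal sketch with made-up shot positions:

```python
from statistics import mean

# Shot positions seen from the back of the target; the bullseye is unknown.
shots = [(2.0, 3.1), (2.4, 2.7), (1.8, 3.5), (2.9, 2.2), (2.2, 3.0)]

# Center of the cluster: the shooters' own average point of impact.
cx = mean(x for x, _ in shots)
cy = mean(y for _, y in shots)

# Noise: average distance of the shots from their own center.
noise = mean(((x - cx) ** 2 + (y - cy) ** 2) ** 0.5 for x, y in shots)
print(f"noise (mean scatter): {noise:.2f}")

# Bias, by contrast, would be the distance from (cx, cy) to the bullseye,
# which cannot be computed here because the bullseye isn't visible.
```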
Don’t Mistake Low Noise for Accuracy
It’s important to keep in mind that reducing noise is not a matter of improving accuracy per se—it simply means reducing variation. This is one way in which the target shooting metaphor could be misleading. We might be tempted to think that reducing noise would get us closer to the bullseye, when in fact, reducing noise only means bringing our shots closer to each other. They might still be off the mark.
For example, in the image above, it’s possible that after reducing noise, all of the shots might converge around the shot marked “A” in the graphic, rather than around the bullseye as we would hope. In that case, we’d still need to improve our overall aim by reducing bias or finding other ways to be more accurate.
In The Signal and the Noise, Nate Silver provides a further caution against conflating noise reduction with accuracy. Silver uses a graphic almost identical to the one in Noise, but in his case, the spread of the shots represents precision—the tighter the grouping, the more precise the forecast. The problem, he says, is that when we look at forecasts, we tend to mistake precision for accuracy, sometimes with devastating results—as in the 2008 financial crisis. High precision (in other words, low noise) creates the illusion that forecasters are on target and thereby masks both uncertainty and bias.
While noise and bias contribute equally to overall error, the authors stress that noise is a more pressing problem than bias because it’s less recognized and less understood. They argue that as a society, we realize that bias is an issue and try to prevent or correct for it. The same isn’t true of noise. They also suggest that noise is harder to grasp than bias because it exists only at the statistical level (you need a certain number of shots before you can see how spread out they are), which is not how we naturally think. We’ll explore this idea at length later.
We might be tempted to think that noise averages out over time. The authors argue that it doesn’t. They point out that if the target being aimed at is a just sentence, an accurate diagnosis, or a prudent business decision, then every miss is costly, and these costs don’t cancel each other out—they compound each other.
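The claim that noise and bias play symmetric roles in overall error comes from what the book calls the error equation: mean squared error equals bias squared plus noise squared. A minimal sketch with invented judgments of a quantity whose true value is known (using the population variance, so the identity is exact):

```python
# Invented judgments of a quantity whose true value is 100.
judgments = [94.0, 103.0, 96.0, 108.0, 94.0]
truth = 100.0

n = len(judgments)
mean = sum(judgments) / n

bias = mean - truth
noise_sq = sum((j - mean) ** 2 for j in judgments) / n  # population variance
mse = sum((j - truth) ** 2 for j in judgments) / n      # mean squared error

# The error equation: MSE = bias^2 + noise^2 (an exact algebraic identity).
assert abs(mse - (bias ** 2 + noise_sq)) < 1e-9
print(f"MSE = {mse:.1f}, bias^2 = {bias ** 2:.1f}, noise^2 = {noise_sq:.1f}")
```

Because bias and noise enter the equation identically, a given reduction in noise improves overall error exactly as much as the same reduction in bias would.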
Is “Noise” a New Idea?
The authors argue that noise, as they define it, is a new and unexplored idea. Some of Noise’s critics suggest that the book’s arguments are just a repackaged version of previous work by other authors or of common-sense ideas.
To be sure, the ideas in Noise do draw on similar work from other sources. While the definition of noise as variance specific to human judgment does seem to be novel, the broader idea of noise as statistical variance isn’t. For example, Fischer Black identifies noise (which he contrasts with information) as a fundamental component of several economic models. Similarly, Nate Silver defines noise as junk data (as opposed to signal, by which he means useful information or meaningful patterns) in his treatise on how to improve our predictions.
Moreover, Noise builds extensively on ideas that Kahneman previously explored in Thinking, Fast and Slow. As we’ll see, many of the underlying sources of noise trace back to errors and biases outlined in that book. In fact, we could even think of Noise as an exploration of what happens when the thinking errors from Thinking, Fast and Slow manifest on a larger scale in systems and organizations.
Three Types of Noise
Because we’ve defined noise as variability in outcomes, it would be easy to assume that noise is purely random. It isn’t. Once we know what to look for, we can see that noise comes in three main types: level noise, pattern noise, and occasion noise.
1) Level noise occurs when one judger’s average judgment differs consistently from the average of all judgers. For example, some teachers grade more harshly or more leniently than others over time. Similarly, some economic forecasters are consistently more optimistic or more pessimistic than their peers. The key idea is that level noise compares each judger’s overall tendency with the average tendency of all judgers. (Shortform note: Level noise may not be consistent over time; in fact, it may be noisy itself. One study has shown that graders inflate scores over time because they mistake their growing comfort with the act of grading for an increase in the quality of the materials being graded. An effect like this doesn’t invalidate Noise’s point so much as it demonstrates how complicated the problem is.)
2) Pattern noise is the deviation that occurs when a judger is unusually affected by a specific situation for one reason or another. For example, a forecaster might typically be more optimistic than most, but a specific scenario (for example, evaluating a startup company) causes her to be more pessimistic than most of her peers would be about the same case.
Pattern noise occurs as the result of people’s personalities and unique experiences. Some of this noise is stable over time. Conversely, some pattern noise is transient—the result of current or recent circumstances.
- For example, maybe our forecaster lost money on startups earlier in her career, and now she is always cautious about them. This is an example of stable pattern noise.
- Or, maybe our forecaster read an article this morning about a different startup that failed spectacularly, and so she’s feeling cautious about startups right now; yesterday she might have felt differently. This is an example of transient pattern noise.
(Shortform note: There’s some overlap between the concept of transient pattern noise and the concept of occasion noise outlined below. Though the authors don’t say so explicitly, the difference seems to be that transient pattern noise results from factors specific to an individual, whereas occasion noise consists of more universally applicable factors that affect everyone in similar ways.)
3) Occasion noise describes the variability within a single person caused by numerous seemingly random factors. The authors point to studies showing that judgments can be affected by any of the following:
- The weather: One study shows that college admissions officers weigh academic credentials more heavily on cloudy days and non-academic factors more heavily on sunny days.
- Mood (and sports): Several studies show that judges sentence more harshly following a loss by their local football team and more leniently following a win. Presumably, their sports-induced mood is influencing their judgment.
- Time of day: Studies have found that doctors are more likely to prescribe opioids toward the end of the day than earlier in the day. It’s possible that when doctors are tired, stressed, and rushed, they make diagnostic mistakes and possibly reach for a quick fix in pill form.
- The order in a series of judgments: When judges have granted asylum to several people in a row, they become increasingly likely to deny asylum (and vice versa), likely due to an unconscious attempt to maintain balance.
- The order that information is presented: If you hear that a politician is smart, driven, charismatic, and ruthless, you probably form a different picture than if you heard about a politician who is ruthless, charismatic, driven, and smart.
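For a fully crossed set of judgments (every judger rating every case), level noise and pattern noise can be separated with a simple variance decomposition, a sketch of the kind of analysis the authors describe. All ratings below are invented; occasion noise would show up inside the pattern term here, since a single table can’t distinguish stable deviations from transient ones:

```python
# Invented ratings: rows = judgers, columns = cases (same cases for everyone).
ratings = [
    [6.0, 4.0, 8.0],  # judger A
    [7.0, 6.0, 8.0],  # judger B (consistently higher: level noise)
    [5.0, 5.0, 5.0],  # judger C (reacts to cases differently: pattern noise)
]
nj, nc = len(ratings), len(ratings[0])

grand = sum(sum(row) for row in ratings) / (nj * nc)
judger_mean = [sum(row) / nc for row in ratings]
case_mean = [sum(ratings[j][c] for j in range(nj)) / nj for c in range(nc)]

# Level noise^2: variance of the judgers' personal averages.
level_sq = sum((m - grand) ** 2 for m in judger_mean) / nj

# Pattern noise^2: variance of what remains after removing each judger's
# overall level and each case's average difficulty.
pattern_sq = sum(
    (ratings[j][c] - judger_mean[j] - case_mean[c] + grand) ** 2
    for j in range(nj) for c in range(nc)
) / (nj * nc)

# System noise^2: judger-to-judger variance on each case, averaged over cases.
system_sq = sum(
    sum((ratings[j][c] - case_mean[c]) ** 2 for j in range(nj)) / nj
    for c in range(nc)
) / nc

# Exact identity the book relies on: system noise^2 = level^2 + pattern^2.
assert abs(system_sq - (level_sq + pattern_sq)) < 1e-9
```

On this sample data, judger B’s high personal average shows up in the level term, while judger C’s case-by-case deviations show up in the pattern term.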
How to Fight Occasion Noise
Occasion noise is tricky. Like pattern noise, it can be hard to predict in advance or even to notice while it’s happening. Plus, you can’t exactly standardize the weather to make sure everyone gets the same judgments. That said, a few techniques might help minimize the influence of occasion noise:
- Some types of occasion noise (such as that arising from the order of information) can be controlled with proper decision-making strategies like the ones listed later in this guide.
- Other occasion noise (like tired doctors overprescribing opioids) might be prevented by watching out for information overload and excessive stress, though that’s easier said than done in many professions.
- We might be able to mitigate factors like weather, mood, and time of day by having multiple judgers assess a situation independently before comparing notes. The idea is that different judgers will be subject to different occasion factors, which will balance each other out. Ideally, the judgers will also follow a procedure that minimizes occasion noise by telling them what information to pay attention to (see the sample hiring procedure later in this guide to learn how).
The Three Types of Noise in Action
Though the authors break noise into three types for analytical purposes, it’s worth pointing out that in practice, any or all of the three types can be at play in a given situation. To get a sense of how that works, imagine the following. You’ve been convicted of shoplifting. Judge Thompson will decide your sentence.
Judge Thompson, on average, delivers much lighter sentences than his colleagues. This is level noise—and reason for you to be optimistic about your punishment.
His parents own a small retail business that has had serious problems with shoplifting over the years. He therefore sentences shoplifters much more harshly than do his peers. This is pattern noise—and bad news for you.
He just returned from a relaxing holiday and is in a great mood this morning. This is occasion noise—maybe you’ll get a break after all.
If your case had gone to a different judge, all of these variables would be different—and so would your sentence. This is one way the justice system is noisy. But if you’d committed a different crime, or even if you’d caught Judge Thompson on a different day, your sentence would likewise be different. That’s another way the justice system is noisy. And the same holds true for any system relying on human judgments.
Where Noise Comes From
Once we understand what noise is and why it matters, we can move toward finding ways to reduce it. But to do so effectively, we need to look more closely at where noise comes from. We’ve already seen a few sources of noise–such as personal biases (preferences, backgrounds, affiliations, beliefs, and so on) that lead to level noise and pattern noise, as well as the more random factors (mood, weather, the order or timing of decisions or information, and so on) that contribute to occasion noise. In addition, noise occurs because of the way our minds see the world and because of the way we act in group situations.
(Shortform note: A lot of the ideas in this section reflect Kahneman’s previous work in Thinking, Fast and Slow. Noise acknowledges these connections but mostly glosses over them. We spell them out more clearly below since they help clarify the ideas in this section.)
Psychological Source #1: Cause and Effect Thinking
The authors argue that a major reason why our judgments are noisy is that we think about the world in terms of cause and effect. This is also why we don’t notice noise and have a hard time understanding it when it’s pointed out: noise is statistical and made up of many cases, whereas our minds tend to consider one case at a time.
This preference for causality misleads us because it operates through the lens of hindsight. Once an outcome is known, we look back at what we knew about the situation and single out one or more factors as the cause. The authors point out that most events are neither completely surprising nor completely expected. We accept these “normal” events without much scrutiny, and as a result, in hindsight they seem as though they were completely predictable–when in reality, we couldn’t have reliably predicted them had we tried.
Because the causes of past events seem obvious and inevitable in hindsight, we believe we can predict future ones. We don’t see just how arbitrary and contingent most events are (at least from the perspective of prediction). The authors explain that this is because the relevant causes often become known only at the same moment as the outcome itself.
The Narrative Fallacy
These ideas are related to the narrative fallacy from Thinking, Fast and Slow, whereby we explain occurrences as though they fit a coherent story, when in fact they may have been completely random. Imagine that a company conducted a round of layoffs by firing employees whose names were drawn out of a hat. If you didn’t know how the layoffs were decided, and if one of the fired employees was your friend, you might recall an argument your friend recently had with her boss and assume the boss had it out for her.
If, on the other hand, the employee in question was someone you didn’t like much, perhaps you’d attribute the firing to some deficiency in skill or character. In either case, you would probably conclude that the firing made logical sense (even if it was unfair) and believe that it was foreseeable; in reality, it was entirely random.
Psychological Source #2: Matching Operation
Another source of noise arises from an intuitive matching operation by which we attempt to predict or evaluate something by comparing it to similar things we have more information about. This operation introduces noise because of the oversimplifications inherent in the procedure, as well as the limit on our ability to discern quantitative differences with any great resolution. (Shortform note: This is a type of heuristic, a concept explored in Thinking, Fast and Slow. A heuristic is an operation our mind performs to solve a difficult problem quickly. Specifically, the mind tries to substitute something easier or more familiar to generate an answer.)
For example, you can tell if it’s sunny or cloudy out. That’s a qualitative judgment: Are there clouds in the sky or not? You can generally tell if it’s hot or cold, too. But if you were exposed to a series of different temperatures and asked to rank them from coldest to warmest, you would quickly make mistakes. (According to the authors, studies have found that we can rank things into about seven levels of quality or intensity before we start to make ranking errors.) You’ll do okay if you can directly compare one item to another, but if you’re given a set of items and asked to rank or categorize them, you’ll make errors more easily than you’d think.
Finally, the authors point out that any type of judgment that requires assessing things on a scale becomes noisier as the scale becomes less defined. Without proper context and a shared frame of reference for what values mean and how they should be assigned, judgers are forced to guess in a way that makes the judgment arbitrary. Since each person guesses differently, the scale becomes noisy. (Shortform note: We’ll explore ways to improve rating scales later in this guide.)
Social Sources of Noise
Not only do we each produce our own noise through the way we understand the world, but when people work in groups to reach judgments, social factors add new sources of noise.
For one thing, popularity (real or perceived) affects how people view information. An idea that attracts public support early on is more likely to succeed, regardless of its inherent merit. This phenomenon is called an information cascade. When one person shares an opinion, the next speaker is more likely to agree with that person unless they have good reasons not to, and the effect becomes stronger with each person who gets on board. When a group is making a judgment, many people might start out undecided or with mixed feelings. Their decision, therefore, is usually determined by the opinion that began the cascade.
Because most people are in agreement at the end of the process, we think the outcome was inevitable—but it wasn’t. Given a different starting point, a different outcome could have occurred. This isn’t obvious to us because each real-world situation (like this one) only plays out once.
Groups are also susceptible to polarization, which means that members move to a more extreme version of their initial opinions. If each member of a hiring committee feels mildly enthusiastic about candidate A, by the end of the meeting, they might now feel passionately excited about candidate A. Conversely, if some members feel mildly enthusiastic about candidate A while others feel mildly enthusiastic about candidate B, then the polarizing effect could lead to a stalemate in which half the committee strongly supports A while strongly opposing B, and vice versa.
Overcoming Groupthink
Information cascades and polarization can also feed into each other. Through random chance, the first speaker influences the group in a certain direction (starting an information cascade), and then the polarization effect ensures that the group moves decisively in that direction—even if no member of the group felt particularly decisive about any direction when they came into the meeting. The interaction between these two phenomena might be one source of what’s traditionally been called “groupthink.”
In Originals, Adam Grant argues that in a corporate setting, groupthink also directly results from the company’s attitudes toward dissenting voices. In these settings, being transparent, inviting dissent, and choosing leaders who genuinely welcome criticism can all help reduce the risk of groupthink when making decisions. These principles are worth keeping in mind when we discuss the wisdom of crowds later in this guide. You can only tap into crowd wisdom when a group consists of people with different viewpoints and when those people feel free to voice their ideas.
How to Reduce or Eliminate Noise
Now that we understand what noise is and where it comes from, we can look at steps to reduce or eliminate noise from judgments. The authors of Noise offer a few solutions, including mechanical judgment tools (models and algorithms) that can replace or augment human judgment as well as a set of suggestions for reducing noise in human decision making.
Detecting and Measuring Noise
Typically, the first step in reducing noise is figuring out how much noise there is in the first place. This step is necessary because administrators tend to believe that their organizations make judgments consistently, and until they can see the problem firsthand, they may be resistant to change.
To determine how much noise is present in a company, organization, or system, the authors outline a noise audit process they use when consulting with businesses. The book includes an appendix with detailed guidelines for conducting a noise audit. The general gist is that an organization would give a set of sample cases to all of its members whose job it is to make judgments about such cases. For example, an insurance company would give a set of sample claims to all of its adjusters. The judgers being audited complete their judgments independently, and then the results are compared to see how much variability there is throughout the organization.
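The comparison step at the end of an audit can be sketched as follows. The claim amounts are invented, and summarizing noise as the standard deviation divided by the mean is just one reasonable choice (the book reports a related measure based on pairwise differences between adjusters):

```python
import statistics

# Invented audit data: each adjuster's dollar assessment of the same claims.
audit = {
    "claim_1": [12000, 15000, 9500, 14000, 11000],
    "claim_2": [4000, 4200, 3900, 4100, 4050],
}

noise_report = {}
for claim, amounts in audit.items():
    mean = statistics.mean(amounts)
    spread = statistics.pstdev(amounts)         # population standard deviation
    noise_report[claim] = 100 * spread / mean   # spread as a share of the average
    print(f"{claim}: mean ${mean:,.0f}, noise {noise_report[claim]:.0f}% of the mean")
```

In this invented data, the adjusters disagree far more about claim_1 than claim_2, which is exactly the kind of case-by-case variability an audit is meant to surface.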
Mechanical Judgments
Once noise has been detected, there are several options for reducing it. One option is to remove human judgment from the equation altogether. To do so, decision-making can be handled via statistical models or computer algorithms.
Though they touch on several mechanical judgment methods (which we’ll explain briefly below), the authors are more interested in reducing noise in human judgment rather than replacing human judgment with mechanical judgment. This is in part because, as the authors point out, mechanical predictions currently can’t do anything humans can’t do—they just do it with better predictive accuracy. The authors argue that this improved accuracy mostly results from the elimination of noise, and so we might see the efficacy of mechanical judgments more as a demonstration of the benefits of noise reduction rather than as a blanket solution to the problem of noise.
(Shortform note: Although the authors come down in favor of improving rather than replacing human judgments, they perhaps don’t make this point as clearly as they could, given the way some reviewers focus on the dangers of algorithms as a major criticism of the book’s recommendations. Indeed, the authors spend a lot of time explaining models and algorithms and defending them from potential criticism, which perhaps creates a misleading impression of how central they are to Noise’s proposed course of action. To keep the focus on ways to improve human judgment, we’ve kept the following discussion of models and algorithms brief and to the point.)
Statistical Models
One way to make predictions is by using a statistical model. A statistical model is a formula that uses weighted variables to calculate the probability of an outcome. For example, you could build a statistical model that predicts the likelihood of a student graduating college by assigning weights to factors like high school GPA, SAT scores, number of extracurricular activities, whether the student’s parents graduated college, and so on.
Studies have shown that simple statistical models that apply weighted averages of relevant variables consistently outperform human predictive judgments. In fact, the authors cite studies suggesting that any statistical model, whether carefully crafted or cobbled together at random, can predict outcomes better than humans can.
The authors argue that this superior performance is simply because statistical models (and by extension, algorithms) eliminate noise. Even the crudest or most arbitrary model has the advantage of being consistent in every single case. And while human judgers can weigh subtle subjective factors that a model can’t take into account, the authors suggest that this subjectivity tends to add more noise than predictive clarity. As we saw earlier, we’re not very good at recognizing which factors are relevant to our predictions.
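A minimal sketch of such a model for the graduation example, with invented predictor values and weights. The equal-weights variant at the end stands in for the “cobbled together” models mentioned above: crude or not, a formula applied identically to every case produces zero noise.

```python
# Invented, standardized predictors for one student (0 = average cohort member).
student = {"hs_gpa": 1.2, "sat": 0.4, "extracurriculars": -0.3, "parent_grad": 1.0}

# Hypothetical weights from a fitted model (illustrative only).
weights = {"hs_gpa": 0.5, "sat": 0.3, "extracurriculars": 0.1, "parent_grad": 0.1}

# The model is just a weighted sum of the predictors.
score = sum(weights[k] * student[k] for k in student)

# An "improper" equal-weights model: every predictor counts the same.
equal_score = sum(student.values()) / len(student)

# Either formula, applied to every student the same way, is perfectly
# consistent: identical inputs always produce identical outputs.
print(f"weighted score: {score:.2f}, equal-weights score: {equal_score:.2f}")
```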
Computer Algorithms
Another more recent and more complex form of mechanical judgment is the computer algorithm. The authors explain that computer algorithms build on the basic idea of statistical modeling, but they also come with additional benefits that improve their accuracy. Because they take into account massive data sets and can be programmed to learn from their own analysis, algorithms can detect patterns that humans cannot. These patterns can form new rules that improve the accuracy of the judgments.
The authors acknowledge that algorithms are not perfect—and that if they are trained using data that reflects human bias, they will reproduce that bias. For example, if an algorithm built to predict criminal recidivism is built from a data set that reflects racial biases in the justice system, the algorithm will perpetuate those racial biases. (Shortform note: For example, after years of development, Amazon discovered that its recruitment algorithm systematically favored men over women. Likewise, Facebook’s advertising algorithms have come under fire for helping to spread everything from fake news to hate speech.)
Combining Mechanical and Human Judgment
Because the authors are most interested in finding ways to improve human judgment, they don’t give much attention to the option of combining human and mechanical judgment. This hybrid approach has real-life precedents and may sometimes be the best way to tackle a problem.
For example, after the success of Michael Lewis’s Moneyball, some baseball teams began favoring rigorous statistical analysis over traditional scouting when deciding which players to acquire. At the time, there wasn’t a great statistical way to measure players’ fielding skills, so some teams neglected defense in favor of more easily measured offensive skills. In practice, these teams gave up so many runs that they offset the benefits of their new statistical approach.
In more recent years, most teams have adopted statistical modeling techniques, but the most successful teams have combined these models with old-fashioned human scouting. This hybrid approach works because scouts can account for things that models can’t, such as the mental factors needed to succeed in professional baseball.
Baseball provides a counterargument to Noise’s cautions against human subjectivity. But the key here is that teams have learned how to combine human and mechanical judgments in ways that maximize the strengths and minimize the weaknesses of each.
Decision Hygiene
Despite the potential advantages of mechanical judgments, the authors are most interested in finding ways to reduce noise in human judgments. They say that the best way to improve human judgments is by implementing “decision hygiene”—consistent, preventative measures put in place to minimize the chance of noise. Decision hygiene consists of a loose set of suggestions, practices, and principles which we explore below. (Shortform note: With one exception (see the Sample Hiring Procedure below), the authors don’t lay out a specific, systematic course of action. Presumably, organizations should strive to implement as many of the following suggestions as are relevant and practicable.)
Think Statistically
Recall that our normal, causal way of thinking is prone to errors and biases that manifest as noise. To make our thinking more accurate, we have to take a statistical view. The authors suggest that instead of treating each case as its own unique item, we should learn to think of it as a member of a larger class of similar things. Then, when predicting how likely an outcome is, we should consider how likely that outcome is across the whole class. Returning to an earlier example, if we’re trying to predict the likelihood that a student will graduate from college, we first need to know what percentage of all incoming college students end up graduating from college.
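In code terms, taking the statistical view means starting from the base rate of the whole class before considering anything about the individual case. A tiny sketch with an invented cohort:

```python
# Invented cohort outcomes: 1 = graduated, 0 = did not.
cohort = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]

# The statistical (outside) view: before weighing anything specific about
# one student, anchor the prediction on the class-wide base rate.
base_rate = sum(cohort) / len(cohort)
print(f"base rate of graduation: {base_rate:.0%}")
```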
How to Think Statistically
Our failure to think statistically is a major theme in Thinking, Fast and Slow. In that book, Kahneman offers a more detailed look at thinking errors of this type and suggests ways to overcome them. As is also suggested in Noise, the basic idea is to take base probabilities into account.
In The Signal and the Noise, Nate Silver suggests another approach to statistical thinking based on a statistical formula known as Bayes’ Theorem. When making a prediction using Bayes’ Theorem, you start with a preliminary guess about the likelihood of an event. Ideally, this guess is based on hard data, such as a base probability. Then you make some calculations in which you adjust the starting probability in the face of specific evidence relating to the thing you are trying to predict. Finally, you repeat this process as many times as you can, each time starting with your most recently updated probability.
This approach has two advantages. First, it explicitly accounts for the noise in human judgment by building human estimates and predictions into the formula. Second, it calls for repeated testing of a prediction or hypothesis in order to improve accuracy in response to updated evidence. Interestingly, Silver argues that a Bayesian approach would have prevented the replicability crisis that has recently plagued the sciences—including some of the studies in Thinking, Fast and Slow.
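Silver’s procedure amounts to repeated application of Bayes’ Theorem. The sketch below uses invented numbers: a base-rate prior and assumed likelihoods for each piece of evidence, with one update applied per piece:

```python
def bayes_update(prior: float, p_if_true: float, p_if_false: float) -> float:
    """One Bayes' Theorem step: revise a probability given one piece of evidence."""
    numerator = p_if_true * prior
    return numerator / (numerator + p_if_false * (1 - prior))

# Invented example: predicting whether a startup succeeds.
p = 0.10  # start from a base rate: ~10% of comparable startups succeed

# Evidence 1: experienced founding team
# (assume it's seen in 60% of successes but only 30% of failures).
p = bayes_update(p, 0.60, 0.30)

# Evidence 2: weak early sales
# (assume it's seen in 20% of successes but 50% of failures).
p = bayes_update(p, 0.20, 0.50)

print(f"updated probability of success: {p:.2f}")
```

Each call returns the updated probability, which then serves as the prior for the next piece of evidence, matching Silver’s advice to keep revising as new information arrives.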
Choose (and Train) Better Judgers
The authors argue that it’s possible to improve the quality of human judgers. We can do so by finding better judgers in the first place and by helping judgers improve their techniques and processes.
There are two factors to keep in mind when identifying good judgers. Some fields deal with objectively right or wrong outcomes; in these cases, judgers can be measured by their results. Other fields, as the authors point out, rest instead on expertise, which has no objective metric. But judgers in any field can be assessed by their overall intelligence, their cognitive style, and their open-mindedness; these traits correlate with better judgment. The authors emphasize, however, that intelligence alone doesn’t make someone a good judger. The other two traits are just as important, if not more so.
The authors also note that some members of the general population are superforecasters, and their predictions are consistently more accurate than those of the average trained expert. Ideally, these are the people we should hire or appoint as judgers. The authors identify several traits exhibited by superforecasters that we can use to choose better judgers, or to better train the judgers already in place:
- They are open-minded.
- They are willing to update their opinions and predictions when new evidence arises.
- They naturally think statistically; unlike most of us, it does occur to them to consider factors like base rates.
- They break down problems and consider elements using probability rather than relying on a holistic “gut feeling” about the answer.
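The last trait, breaking a problem into parts, can be sketched as a Fermi-style calculation (all estimates invented). Rather than producing one holistic guess, the judger estimates each sub-question separately and combines the pieces:

```python
# Invented sub-estimates for "will this product launch succeed?"
# Each sub-question is estimated on its own rather than guessed holistically.
sub_estimates = {
    "prototype works at scale": 0.8,
    "regulatory approval in time": 0.7,
    "key competitor doesn't launch first": 0.6,
}

# If the sub-questions are roughly independent, the joint probability is
# their product, usually far lower than a confident gut feeling suggests.
p_success = 1.0
for p in sub_estimates.values():
    p_success *= p

print(f"estimated probability of success: {p_success:.2f}")
```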
Hedgehogs and Foxes
Noise’s discussion of superforecasters draws on the work of Philip E. Tetlock and Dan Gardner. In Superforecasting, Tetlock and Gardner offer a particularly colorful description of what makes superforecasters so super: They tend to be foxes, not hedgehogs. The basic idea is that a person with a hedgehog personality tends to see the world through the lens of one big idea, to make snap judgments, and to be extremely confident in his or her predictions. By contrast, a person with a fox personality tends to collect little bits of information about many things, to approach a problem slowly and from multiple angles, and to be cautious and qualified in his or her predictions.
As you might guess, Tetlock and Gardner suggest that foxes make better predictors than hedgehogs. Luckily, the rest of us can practice fox skills, too. We can learn to recognize and avoid our own cognitive biases. We can generate multiple perspectives on a problem. And we can learn how to break down problems into smaller questions.
If these techniques feel familiar, that’s because they are essentially the same as many of the recommendations in Noise.
Sequence Information Carefully
Because judgments are subject to influence from information, contextual clues, confirmation bias, and so on, it’s important to carefully control and sequence the information that judgers receive. The authors provide a few guidelines for implementing this strategy:
- As a basic rule, judgers should only be given what they need when they need it.
- We must make sure independent judgments really are independent; if the person verifying a result already knows the first person’s conclusion, he or she is more likely to simply confirm it.
- Finally, the authors suggest that judgers should document their conclusions at each step in the process, and if new information leads them to change their decisions, they should explain why.
(Shortform note: It’s also important to consider how much information judgers receive. Both Malcolm Gladwell and Nate Silver point out that information overload leads to bad decisions, either because we don’t focus on what’s most important, or because we get overwhelmed and fall back on familiar patterns and preconceived notions.)
Aggregate Judgments
Another way to reduce noise, and to actually turn it into a positive, is by aggregating judgments. You can collect several independent judgments and then compare or average them; or you can assemble teams who will reach a judgment together. According to the authors, these techniques harness the wisdom of crowds, a demonstrated effect by which the judgments of a group of people tend, as a whole, to center on the correct answer.
This technique works best if you assemble a team whose strengths, weaknesses, and biases balance each other out. The idea is to get as many different perspectives on a problem as you can in hopes of finding the best answer somewhere in the middle.
(Shortform note: The authors say elsewhere that noise doesn’t average out, but that’s for a bunch of noisy decisions in a system; here we’re talking about averaging out opinions before a final decision is made and before any action is taken.)
One practical way to aggregate judgments within a typical meeting setting is the estimate-talk-estimate procedure:
- First, each member of the group privately makes an estimate—some kind of forecast, prediction, or assessment.
- Then each person explains and justifies his or her estimate.
- After the discussion, each member then makes a new estimate based on the discussion. These second-round judgments are aggregated into a final decision.
Because this procedure requires that each person start with an independent judgment, it reduces the noise that comes from information cascades and polarization. At the same time, it balances individual psychological biases by encouraging outlier opinions to move toward the middle. (Shortform note: This estimate-talk-estimate procedure has drawbacks as well. For example, because its goal is to build consensus, it can discourage dissent and lead to a false sense of agreement, much like the information cascades it is meant to avoid. Alternative approaches like the Policy Delphi and Argument Delphi avoid this pitfall by aiming not at consensus, but at generating a wide range of dissenting perspectives.)
How to Make Better Judgments on Your Own
Most of the authors' suggestions for noise reduction are targeted at organizations, but what if you want to improve your own judgments as an individual? Some of the suggestions in this section are simple enough to adopt on your own. For example, you can practice thinking statistically or breaking down problems by yourself. But how can you aggregate judgments if you're working alone rather than in a group?
The trick is to generate as many perspectives as possible before you make a decision or a prediction. One way to do that is to read as much as you can about the problem at hand. Find as many different perspectives and opinions as possible—remember, you are trying to replicate the benefit of crowd wisdom, which only works when you bring together a diversity of viewpoints.
Another way to generate alternate perspectives is to deliberately search for information that would disprove your prediction or your preferred course of action. This technique is called negative empiricism, and it gives you more perspective on a problem while also avoiding some of the logical fallacies you might otherwise fall prey to.
Break Judgments Into Smaller Components
The authors suggest that it’s easier to avoid noise when you break an overall decision into a set of smaller, more concrete subjudgments. Standardized procedures, checklists, guidelines, and assessments help here. For example, educators can reduce noise in essay grading by using rubrics. Asking the grader to assign individual scores to the paper’s originality, logical clarity, organization, and grammar before computing a final grade makes judgment easier. Breaking down a judgment in this way also helps make sure that every judger is following the same procedures and paying attention to the same factors.
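As a rough illustration of this kind of decomposition, here is a Python sketch of rubric-based grading. The component names echo the essay example above, but the weights and scores are hypothetical, not the authors' prescription:

```python
# Hypothetical rubric: equal weights are an assumption for illustration.
RUBRIC = {
    "originality": 0.25,
    "logical clarity": 0.25,
    "organization": 0.25,
    "grammar": 0.25,
}

def final_grade(scores):
    """Combine per-component scores (0-100) into a weighted final grade.

    Forcing a score for every component ensures each grader attends
    to the same factors, which is the point of the decomposition.
    """
    missing = set(RUBRIC) - set(scores)
    if missing:
        raise ValueError(f"missing component scores: {missing}")
    return sum(RUBRIC[c] * scores[c] for c in RUBRIC)

grade = final_grade({
    "originality": 80,
    "logical clarity": 90,
    "organization": 70,
    "grammar": 85,
})
print(grade)  # 81.25
```

The structure matters more than the numbers: every grader scores the same components, and the holistic judgment is computed from the parts rather than formed impressionistically.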
The authors concede that this strategy isn't perfect. They point out that in the field of mental health, the DSM—a manual meant to aid and standardize psychiatric diagnoses—has hardly reduced diagnostic noise. One reason is that psychiatrists and psychologists tend to read signs and symptoms through the lens of their training and background. In other words, different theoretical understandings of the mind and of mental disorders shape how different professionals interpret the facts they're presented with.
Are Some Fields Just Noisy?
The authors suggest that mental health diagnoses are inconsistent because of the different training and theoretical orientations of different mental health professionals. That’s true, but there’s also reason to think that mental health might be an inherently noisy field.
One reason for this is that mental health conditions overlap and influence each other: if you suffer from depression, there’s a good chance you also suffer from anxiety. Likewise, it can be difficult to separate mental health from physical health. Moreover, professionals disagree on the best practices for diagnosing and treating mental health issues, including basic questions such as whether a given set of symptoms is a disorder or just a difference.
These factors suggest that some fields might be more prone to noise—and more resistant to noise reduction—than others. That's not to say that mental health care, for instance, can't be made less noisy. Doing so just might require analysis and reform beyond the scope of the decision hygiene techniques we're exploring here.
Use Rules and Standards
One way to break judgments into smaller parts is to implement rules and/or standards. (Shortform note: The authors introduce rules and standards as part of a larger discussion about the pros and cons of implementing noise reduction. We think it's worth looking at rules and standards as decision hygiene strategies, which is why we've included them here.)
- Rules offer explicit guidance typically tied to objective measures. For example, there is a maximum allowable blood alcohol content above which a driver can be charged with drunk driving.
- Standards are suggestive guidelines that require some amount of subjective interpretation and implementation. For example, law enforcement officers are trained to recognize potential signs of impairment (e.g., erratic driving) and to issue field sobriety tests.
In deciding between rules and standards, the authors say we should first determine which will lead to more errors. They also point out that sometimes it isn’t possible to implement rules because the people making the rules can’t agree (for example, because of political or moral differences) or because the people making the rules don’t have the information needed to write an appropriate rule.
The authors further suggest that in some cases, the best approach is to combine rules and standards. Mandatory sentencing guidelines take this approach, setting a minimum and maximum sentence for a given crime (rule) and otherwise asking judges to determine a just sentence for each individual case (standard).
Second-Order Decisions
Rules and standards are examples of what Sunstein and Edna Ullmann-Margalit call second-order decisions—strategies we use to reduce our cognitive burdens when decisions are too numerous, too repetitive, too difficult, or too ambiguous to make one by one. Other second-order decisions include:
- Presumptions, which are rule-like guidelines that allow the possibility of exceptions in some cases.
- Routines, such as always brushing your teeth right before bed.
- Taking small, reversible steps, such as pet-sitting for a neighbor's dog before making the commitment to adopt a dog of your own.
- Picking at random rather than choosing deliberately, such as throwing a dart at a map to decide where to go on vacation.
- Delegating, such as allowing your partner to choose dinner tonight.
- Heuristics, such as the matching operation described earlier in this guide.
Use Better Scales
As noted earlier, a lot of noise comes from our attempts to judge things using scales. If a scale is unclear, too complex, or inappropriate for the task, there will be noise. If judgers must interpret or calibrate the scale themselves, there will be noise. Therefore, where scales are useful or necessary, we need to design better ones.
The authors argue that, as a general rule, comparative scales are less noisy than absolute scales. They give the example of job performance ratings, which are noisy in part because traditional numerical scales are unclear and are interpreted differently from one reviewer to the next. What constitutes a "6" in "communication skills" or in "leadership"? Without explicit guidance about what the numbers mean and how they correspond to the qualities they measure, each person will have a different understanding of how to score an employee.
Instead of evaluating employees with an absolute number, the authors say it's better to rank them. For example, rather than scoring an employee's communication skills out of 10, ask whether those skills fall in the top 20% of the company, the next 20%, and so on. As noted earlier, we are generally better at comparing things than at quantifying them in the abstract.
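A rank-based quintile scale of this sort can be sketched in a few lines of Python. The function name, the employee names, and the raw scores are all hypothetical; a real performance system would be considerably more involved:

```python
def quintile_labels(scores):
    """Convert raw ratings into comparative quintile buckets
    ("top 20%", "next 20%", ...) by rank within the group."""
    buckets = ["top 20%", "next 20%", "middle 20%",
               "fourth 20%", "bottom 20%"]
    # Rank everyone from highest to lowest raw score.
    ranked = sorted(scores, key=scores.get, reverse=True)
    n = len(ranked)
    labels = {}
    for i, person in enumerate(ranked):
        # Map rank position i to one of five equal-sized buckets.
        labels[person] = buckets[min(i * 5 // n, 4)]
    return labels

ratings = {"Ana": 9, "Ben": 7, "Cal": 8, "Dee": 5, "Eli": 6}
print(quintile_labels(ratings))
```

Note that the raw scores only establish an ordering; the output is purely comparative, which is what makes the resulting labels mean the same thing from one reviewer to the next.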
(Shortform note: Recall the earlier discussion of matching operations and the way our minds substitute an easier question in place of a more complex one. Without clear guidance, something similar probably happens with a vague rating scale, as we replace the question “How does X’s communication rate out of 10?” with something like “How impressed am I with X’s communication?” or “How clear do I find X?”)
A comparative scale also provides concrete anchor points and clear descriptions or markers for each point. A good anchor point correlates a specific value on the scale with a relevant example of the thing being evaluated (if you’re grading a paper and you know that a “C” grade represents average work, that’s your anchor point). To minimize noise, anchor points should be provided ahead of time so that each judger starts with the same frame of reference.
(Shortform note: Anchoring is another concept drawn from Thinking, Fast and Slow. The basic idea of anchoring is that an initial piece of information (for example, a suggested donation amount) has a major influence on the actions we take (in this case, how much we decide to donate). By suggesting that scales come with clear anchor points, the authors of Noise seek to take advantage of this psychological effect by using it to calibrate judgers’ assessments.)
Example: A Sample Hiring Procedure
To get a sense of how to apply these decision hygiene practices in a real-world setting, the authors provide an overview of Google’s hiring process. In brief, the process is as follows:
- Determine what skills are most important to the position you are hiring for.
- Develop scales to measure each candidate on each skill determined in step 1.
- Interview each candidate multiple times with different interviewers (Google uses four interviews). The purpose of the interviews is to rate the candidates on their skills. Interviews must be conducted independently of each other (interviewers can’t compare notes yet).
- The hiring team meets to discuss their results, review the data they have amassed, and, finally, to share their impressions and make an overall decision.
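The scoring-and-aggregation core of this procedure (steps 2 through 4) might be sketched as follows. The candidates, skills, ratings, and 1-5 scale are all invented for illustration; this is not Google's actual system:

```python
from statistics import mean

# Hypothetical data: each candidate receives one rating per skill
# from each of four independent interviewers (assumed 1-5 scale).
ratings = {
    "candidate_a": {
        "coding":        [4, 5, 4, 4],
        "communication": [3, 4, 3, 4],
    },
    "candidate_b": {
        "coding":        [3, 3, 4, 3],
        "communication": [5, 4, 5, 5],
    },
}

def candidate_scores(ratings):
    """Aggregate independent interviewer ratings per skill, then
    average across skills into one overall score per candidate."""
    overall = {}
    for candidate, skills in ratings.items():
        per_skill = {skill: mean(r) for skill, r in skills.items()}
        overall[candidate] = mean(per_skill.values())
    return overall

print(candidate_scores(ratings))
# candidate_a: 3.875, candidate_b: 4.0
```

In the real process, these aggregated numbers would inform, not replace, the hiring team's final discussion; the point of computing them first is that the data anchors the conversation.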
Putting It All Together
The above process synthesizes many of the suggestions we've explored in the second half of this guide. The authors acknowledge as much, though they don't map the connections explicitly. The following analysis shows how Google's procedure incorporates several decision hygiene techniques:
- In step 1, the company breaks down a bigger judgment—Whom should we hire?—into smaller components.
- In step 2, the company applies the insights about building better scales to ensure that it collects high-quality, consistent data.
- In step 3, the company carefully sequences information, making sure interviews are truly independent of one another. This sequencing is aided by clear rules and standards that govern the interviews and keep them consistent and on point.
- In step 4, the company aggregates judgments by asking multiple interviewers to reach a decision together.
On a larger level, the process as a whole also sequences information by asking interviewers not to consider their subjective impressions until this last step. By doing so, the company leaves room for its hiring team to have personal reactions to candidates, but it makes sure those reactions are mediated by the hard data collected throughout the rest of the process and by the group wisdom of the multiple interviewers (each of whom has had a chance to independently form an opinion about the candidate).
While this procedure specifically describes a hiring process, the authors point out that the process can easily be adapted to other types of business decisions, such as whether to make an investment or whether to acquire or merge with a rival company. (Shortform note: The authors provide a detailed hypothetical scenario to show how to do this. To keep it simple, refer back to the principles outlined above: Break down the problem, figure out how to gather the data you need, keep careful control over the information-gathering process, and then aggregate the data and the resulting judgments.)
Shortform Takeaway: Improving Evaluative Judgments
As you've seen throughout this guide, many of the ideas in the book have been explored elsewhere, including in the authors' own previous works. But at least one important takeaway does seem new: the argument that we should treat evaluations the same way we treat predictions.
To explain that point, recall that Noise breaks judgments down into two types: predictive (e.g., forecasting a stock's future value) and evaluative (e.g., grading an essay). We're accustomed to treating evaluative judgments as inherently subjective—there's no correct answer against which to measure them, so it seems there's no way to make evaluations better or more accurate. Certainly there is less literature on how to improve evaluations than on how to improve predictions.
Yet if we accept Noise's arguments that evaluations and predictions are the same kind of thing (judgments), that both suffer from noise to the same degree and for the same reasons, and that reducing noise is a good thing, then it follows that we can improve our evaluative judgments by subjecting them to the same advice many authors have already offered for making better predictions. Noise makes this point early on, but it warrants highlighting because it appears to be genuinely original.