Edge Master Class 2015: A Short Course in Superforecasting, Class IV

Skillful Backward and Forward Reasoning in Time: Superforecasting Requires "Counterfactualizing"
Philip Tetlock [9.15.15]

 


Edge Master Class 2015 with Philip Tetlock — A Short Course in Superforecasting



Philip Tetlock:    What I want to do today is three things: I want to clear up some confusion from last time about counterfactual reasoning and how it’s intertwined with superforecasting; I want to link up some of the things that are in the final sets of slides on sacred values and taboo cognition and superforecasting; and then I want to give you some examples of superforecasting in action and talk about condensing it all into four big problems and one killer-app solution.

First, there is some confusion about why thinking about possible paths and counterfactuals would be relevant to superforecasting. Let me give you two concrete examples of how superforecasting is correlated with how people think about the past and possible pasts—ways things could have been—and why that’s intricately intertwined with the skill sets that are essential for long-term performance as a superforecaster.

A famous economist, Albert Hirschman, had a wonderful phrase, “self-subversion.” Some people, he thought, were capable of thinking in self-subverting ways. What would a self-subverting liberal or conservative say about the Cold War? A self-subverting liberal might say, “I don’t like Reagan. I don’t think he was right, but yes, there may be some truth to the counterfactual that if he hadn’t been in power and doing what he did, the Soviet Union might still be around.” A self-subverting conservative might say, “I like Reagan a lot, but it’s quite possible that the Soviet Union would have disintegrated anyway because there were lots of other forces in play.”                                                              

Self-subversion is an integral part of what makes superforecasting cognition work. It’s the willingness to tolerate dissonance. It’s hard to be an extremist when you engage in self-subverting counterfactual cognition. That’s the first example. The second example deals with how regular people think about fate and how superforecasters think about it, which is, they don’t. Regular people often invoke fate, “it was meant to be,” as an explanation for things.

When someone wins a lottery, they know that there is a huge component of chance in lotteries, but they really don’t know it. When someone wins a lottery because they chose the magic ball number on the basis of the birth dates or death dates of loved ones, they’re inclined to say that it was meant to be; it had to work out the way it did. Many regular people do that. Superforecasters think that’s total nonsense. It was statistically inevitable that someone was going to win, and someone won.

Superforecasters take it even further into other realms where even very sophisticated people don’t take it. I don’t know how many of you ever saw the movie, Sliding Doors, with Gwyneth Paltrow, in which her fate hinges on whether or not she makes a subway connection, hence the Sliding Doors title. In the world in which she doesn’t make the connection, she goes back home and discovers that her husband is a jerk and her life gets onto a much better track. In the world in which she does get onto the subway, her life continues on a much more miserable course. This is the idea of what Danny called in one of his earlier papers, “close-call counterfactual.” A lot of movies have close-call counterfactuals, things that almost happen, cliffhangers. It’s a very common theme in Hollywood.                                                              

When you ask people to think about ways in which they might not have met their loved one—you might not have gone to that party, or you might have missed that subway connection, or you might have taken a different job somewhere else—we all know that there are countless contingencies in life that underlie our meeting and our connecting with particular people who are near and dear to us now.                                                              

When you bring that to the attention of ordinary people, that makes them all the more convinced that the love of their life was meant to be; it was fated in some mysterious, teleological fashion. Don’t push me on what they mean by “fated” because they don’t really know. They sense it was fated, that it was meant to be. When you ask superforecasters this, they look at you as if you’re crazy. That’s just not the way the world is. They’re willing to tolerate the dissonant thought that, yes, I could easily be married to somebody else, I could easily be with somebody else doing very different things. Superforecasters’ minds are wired in different ways, and how they think about possible paths sheds valuable light on that. I hope that helps to clarify some of the confusion about why we were talking about counterfactuals. They’re important diagnostic tools for assessing people’s mental models of the past, and those mental models very much connect to how they think about possible futures. You can’t dissociate them.                                                              

The other thing I want to bring up from last time was something Dean Kamen brought up toward the end about the legal system as a forecasting system. I responded by talking about a famous article by Barack Obama’s constitutional law professor, Larry Tribe, “Trial By Mathematics.” He warned against thinking of the legal system as a forecasting system, as making explicit tradeoffs between false positives and false negatives. Most people in the legal system resonate to Tribe’s argument that it would delegitimize the system if they were to explicitly acknowledge that they’re making those sorts of tradeoffs.                                                              

When you talk with superforecasters about this, they tend to dislike Tribe’s argument. When you talk to law professors, a lot of them say, “Yeah, that’s a very important function of the legal system. Even if it means that we have to degrade our accuracy, it’s better to maintain this legitimation myth that ‘beyond a reasonable doubt’ really is beyond a reasonable doubt. And no, we’re not going to talk about 88 percent or 97 percent probabilities of guilt to try to become granular about it. It’s going to be this way or that way, and when it’s that way, it’s just that way. There’s a finality to it all.” Superforecasters tend not to like that; even when they understand the legitimation argument, they tend not to resonate to it.

When I asked around the table how many of you agreed with the Tribe thesis that it’s a bad idea to be explicit about those sorts of tradeoffs, a significant number said that, yes, it was probably a bad idea, and a significant number said, no. An even more significant number of you didn’t know or were otherwise engaged. How is that relevant to superforecasting?                                                              

Superforecasters have less respect for taboo boundaries on cognition. They chafe at the boundaries of the thinkable imposed by the social order. They’re more likely, for example, to be puzzled by why people would get annoyed at Larry Summers when he made some of his more famous controversial remarks in the course of his career. One of the early ones that many people have forgotten now, when he was the Chief Economist at the World Bank, was that he wondered out loud in a memo why Central Africa was so underpolluted.

It was a rigorous economic argument: there’s a lot of effective demand for clean air in rich countries, so why don’t they move their dirty industrial operations to Central Africa, where people desperately need jobs and it would raise appallingly low per capita incomes? Everybody would be better off. This is a Pareto improvement. Anyone taking Econ 101 should get the point, but a lot of people were outraged by Econ 101. Superforecasters aren’t, and that outrage is one of the reasons why Larry Summers did not become the Chair of the Council of Economic Advisers in the first Clinton Administration. There were people in the Gore team and elsewhere in the Clinton team who said, “Do we really want this guy being the top spokesperson for the Administration on economic policy?” They thought again.

He also ran into trouble at Harvard when he speculated about why women are underrepresented in STEM disciplines. He wasn’t drawing any conclusions; he was generating a lot of hypotheses. If there’s anything superforecasters do not have a problem with, it’s generating hypotheses. Hypothesis generation would never be a taboo topic for superforecasters. There are lots of ways in which superforecasters run against the social grain.

I’ll give you one more example inside the intelligence community, and it’s the same basic issue we’re talking about with Larry Tribe and the legal system. A wonderful movie recently came out, Good Kill, with Ethan Hawke. It’s a good movie. It’s about these guys who operate drones out on the bases in the desert of Nevada, and they make life and death decisions about when to press the button. They have lots of guidelines about when to do it and they have supervisors buzzing over them all the time, but there are no explicit tradeoffs about how much collateral damage is acceptable in order to get a terrorist of a certain level of importance. There are no explicit guidelines for how much collateral damage is acceptable to save a certain number of American lives. These are tradeoffs that you don’t want to have expressed. If someone were to convince the people in the Department of Defense and CIA in charge of these operations that we could achieve Pareto improvements—we could reduce the level of collateral damage, reduce side casualties, and increase our hit rate for terrorists—by making people more aware of exactly how they’re making their tradeoffs and training them in certain ways to navigate the tradeoff space, the Department of Defense would then have a paper trail that says X number of civilian lives is worth X number of terrorist kills or saved American lives.

Danny once quoted the former President of France talking about the Phillips curve (the unemployment-inflation tradeoff in macroeconomics) saying, “That number should not exist.” There are certain kinds of tradeoffs you don’t want to have existing, and superforecasters balk at that. Again, a politician doesn’t want to be associated with, “I’m willing to throw this many people out of work to get the inflation rate below X.” Those are explicit things you just don’t want to say. Superforecasters are much more likely to say them, which may be one of the reasons why superforecasters don’t rise very high in deeply politicized organizations. Does that make it a little clearer how counterfactuals are relevant to superforecasting? Good, okay.

What I’m going to suggest is that we take a quick look at slides in sessions five and six toward the very end. If you see anything in there that you would like to talk about further, I’d be glad to talk about it. I want to set aside most of our time this morning for you talking rather than my talking. I am particularly interested in getting your ideas about where work in forecasting tournaments should go next.                                                              

While you’re looking at those slides, I’ll just say one other thing briefly. I sent by email an article that Aaron Brown wrote. He is the Chief Risk Officer of AQR—Applied Quantitative Research—and it’s an example of how a smart senior executive could use the superforecasting ten commandments to solve problems that he or she cares about.

As I said yesterday, there is no substitute for getting on the bicycle and trying to ride it. That’s what Aaron Brown does every day. He is/was a world class poker player, he loves to gamble, he is a serious Wall Street guy. He likes continually testing his probability judgments; he’s immersed in the habit of doing that over and over again.

Is there anything in the sessions five and six that catches your imagination? A couple of the things I’ve already mentioned.                                                     

Rodney Brooks:    What’s the Lincoln Bedroom story?                                                     

Tetlock:    In the mid-1990s, there were accusations that the Clinton Administration was auctioning off access to the Lincoln Bedroom to the highest political donors. The market clearing price seemed to be around $250,000, and there was outrage. The Clinton Administration responded, “We’re not auctioning it off. This is politics. Friends do favors for friends.”                                                              

If you frame it as friends doing favors for friends, that’s called an “equality-matching social schema,” and it becomes much more acceptable. When you frame it as an auction, it’s a market-pricing social schema, and that makes it much more offensive, indeed illegal. Republicans were very suspicious throughout, but Democrats were much more likely to cut Clinton slack when he gave an equality-matching excuse. It’s an interesting example of how even though we claim to have a great concern for protecting sacred values, we often can switch it off tactically. There is a debate in the research literature about whether sacred values are pseudo-sacred. A lot of superforecasters would think the whole notion of sacred values is silly. Of course, when people claim that they’re sacred, they’re pseudo-sacred because you can’t attach infinite importance to anything in a world of scarce resources.

Margaret Levi:    I want to move that into thinking about hedgehogs and about the whole ecosystem that you’ve been talking about in terms of the intelligence community. You’ve been focusing on the foxes here, and there’s also a role, as you say, for hedgehogs. Obviously, hedgehogs can think maybe in a longer time dimension, they probably have more difficulty making the tradeoffs around those values, but they must bring some other things into the ecosystem that make it valuable.                                                     

Maybe you could elaborate a little bit about that. The secondary piece of that is, to what extent can somebody be both a fox and a hedgehog? Is that an impossible set?                                                     

Tetlock:    Isaiah Berlin certainly thought hybrid creatures are possible. He either said Tolstoy was a hedgehog who wanted to be a fox or a fox who wanted to be a hedgehog.                                                     

Levi:   But could he be both?                                                     

Tetlock:    He implied yes.                                                     

Rory Sutherland:    It was the second one.                                                     

Tetlock:    Here is another slightly difficult thing to wrap your head around, and that is that hedgehogs represent a core scientific value. Any idiot can explain a lot with a lot; the achievement is explaining a lot with a little. Parsimony is a core scientific value. The capacity to integrate a wide range of facts within an elegant set of internally consistent explanatory principles is regarded as a very high achievement in science. You want to give Nobel Prizes for things like that. It’s really good stuff. It’s a style of thinking that is ranked very high in the scientific status pecking order.

More fox-like thinking is regarded by many famous scientists as being somewhat second rate. Those are the guys who come up and clear up the anomalies, the complications, the messy stuff—the cleanup crew. It is interesting that the world seems to be structured in a somewhat perverse fashion: a style of thinking that works best in basic science may not work very well when it comes to applied forecasting of messy real-world events. It’s a bit of a perverse twist of the knife.

Levi:    But they, obviously, for you in the ecosystem, work together. I’m still looking for how they work together in the intelligence community ecosystem. The emphasis has been on the fox.                                                          

Tetlock:    There is a multiplicity of ways they can work together. Tom Friedman, for example, has a certain hedgehog-iness to him. He was often characterized as a globalization hedgehog. The world is flat; that’s a big idea, that the world is flat. Tom Friedman is a good question generator. There is no evidence he’s a good forecaster. In fact, there were some slight indications he might not be very good at it, but in some ways, he’s an extraordinarily insightful question generator. Forecasting tournaments require insightful question generation. That’s one deep complementarity between hedgehogs and foxes.                                                     

Stewart Brand:    Are superforecasters good question generators?                                                     

Levi:    That’s what I’m trying to find out.                                                     

Tetlock:    We don’t know how correlated these important skill sets are. Is there a general factor, g, a general intelligence thing such that you can be a superforecaster, a superexplainer, a superquestioner—you just do everything well? There are some people who can do it all, but I don’t think there are many of them.

Levi:    It’s not a null set, but you don’t think it’s a very big set.                                                     

Tetlock:    I don’t think it’s a very big set.                                                     

Sutherland:    In the business world, which is structured differently from academia, you have different job functions. The CEO probably is required to be more fox-like, the CFO is required to be a hedgehog. There’s an interesting literature on whether Chief Financial Officers make good CEOs subsequently.                                                     

Tetlock:    That’s an interesting hypothesis. I could imagine a whole seminar organized around that.                                                     

Sutherland:    Academia is unusually pyramidical in its structure.                                                     

Tetlock:    Yes, but that’s not the only way in which hedgehogs and foxes could be complementary in the forecasting tournament ecosystem. There are other ways they could be as well. I said yesterday that foxes have a somewhat parasitic approach. They’re looking for the best bits and pieces of truth in different hedgehog frameworks. They see a good idea on the liberal side, a good idea on the conservative side, a good idea on the doomster side, a good idea on the boomster side. They cross these ideological and disciplinary boundaries in a more fluid way.

They don’t have that much loyalty. Hedgehogs tend to have some loyalty to a community of co-believers and its core tenets; indeed, the important hedgehogs create communities of co-believers. Foxes, and also superforecasters, tend to be boundary transgressors. They’re more likely to wander across economics and politics.

Levi:    But would they be as good without there being some hedgehogs up there that they can draw things from?                                                     

Tetlock:    Probably not.                                                     

Levi:    I’m just trying to populate the ecosystem.                                                     

Tetlock:    The answer is probably not.                                                     

Salar Kamangar:    You mentioned that there’s an argument about the degree to which cognitive debiasing works. Counterfactual thinking, clustering, and these others seem like cognitive debiasing techniques. Have you been able to quantify which ones are the most important in predictions? What would Danny say if he disagrees and thinks cognitive debiasing doesn’t work as well?

Tetlock:    Great question. I don’t think Danny would make a categorical statement like that. We don’t know enough to answer your question very well because, in the first four years of the IARPA forecasting tournament, we were involved in a horserace. Our job was to improvise and develop the best tools we possibly could to win that horserace. We would throw everything, including the kitchen sink, into it if we thought it would work. Our training is quite multidimensional.

If you ask me which components of the training regimens we developed are most critical for boosting accuracy, that would require us to run a series of controlled experiments with random assignment to conditions, in which we assess how well forecasters perform when they do or do not get exposure to one or another element of training. That would require a much larger tournament. This was already a very expensive and large tournament, but it would require a much larger and more expensive tournament still to do that. I may have some hunches about it, but they would be nothing more than hunches.

Kamangar:    Is this something you’d like to do at some point?                                                     

Tetlock:    It’s something that’s worth doing, especially if you’re in an organization where you care about augmenting accuracy and you want to do it in the most cost effective way possible.                                                     

Kamangar:    What’s a criticism of cognitive debiasing? When people say it doesn’t work, why would they say that?                                                     

Tetlock:    Different psychological theorists have somewhat different views of how cognitive biases arise, and of what even constitutes a cognitive bias. I’m going to turn this over to Danny in one second. The more you think that cognitive biases are rooted in basic perceptual and cognitive processes to which people have virtually no conscious or introspective access, the more pessimistic you’re likely to be about debiasing. The paradigm case of a perceptual bias that Danny uses in Thinking, Fast and Slow is the Müller-Lyer illusion. Unfortunately, we don’t have a screen to flash it up. The Müller-Lyer illusion is an illusion even after it’s explained to you. You see the lines as different lengths even though they’re the same length. You can pull out a ruler and you can say, “Yes, they’re the same,” so you’ve got the System 2 override on that, but you still see it as different. That’s a particularly tenacious type of bias.

To what extent are the biases that impair the accuracy of subjective probability estimates of possible futures like Müller-Lyer? To what extent are they biases that are more amenable to conscious control? Overconfidence is, for example, to some degree toward the other end. Danny, what do you think?                                                     

Daniel Kahneman:    I don’t have a clear answer to that question, but I can ask another one, which is going to be related. I’ve talked a lot about System 1 and System 2. System 1 is an associative system that contains a representation of the world, and System 2 does more sequential type logical reasoning. I have been wondering about your superforecasters and to what extent their skill is that they have a rich representation of the complex world—which is something that I would call an intelligent System 1—as against the ability to enumerate possibilities, to systematically go over things, which is much more System 2 operation. Today, it seems to me you were emphasizing System 2, but maybe I’m wrong. The ability to enumerate, to classify, to evaluate, as against having just a very rich and accurate model of reality. We know that’s an important dimension of individual differences. Some people that we call intelligent have that, and others have a different kind of intelligence where they’re not very good at getting a detailed and rich representation of the situation, but they are good at thinking logically and in-depth about problems. What’s your view about superforecasters and that? It’s related but not identical.                                                     

Tetlock:    One of the more interesting things that happened in the forecasting tournament toward the end was the conversation that got started between you and Barbara about the temporal scope sensitivity of forecasting. This is a little bit of a backup here.                                                     

About twenty years ago, Danny was involved in a debate with a bunch of economists over a method known as contingent valuation—asking people how much they value cleaning up a lake in Ontario, or saving some fish, or various environmental public goods. The contingent valuation method is a survey for doing that. The survey might ask, “How much would you value cleaning up this lake?” and people respond, “$10.” Danny was suspicious of this method and he ran some experiments on the scope sensitivity of the answers people were giving in contingent valuation surveys. How much would you be willing to pay to clean up all the lakes in Ontario? There are about 1000 of them; Ontario is a big place. Maybe there are more than 1000. The answer wasn’t very different from $10. Or, how much are you willing to pay to save 100 oil-soaked ducks, versus how much to save 10,000 oil-soaked ducks? Again, the answers seemed to be very similar.

What were people doing when they answered a question like this? They don’t seem to be engaging in a very careful cost-benefit analysis, do they? They seem to be answering in a more emotive attitudinal way saying, “Yes, I’m against this sort of thing. I want to make a statement.”                                                              

What do ducks have to do with whether Assad is going to survive in Syria? Our superforecasters got questions about how likely Assad was to vacate the office of President of Syria in the next six months or the next twelve months. Danny mentioned the term propensity yesterday, as opposed to probability. If you’re thinking just in terms of causal propensities, you’re going to say something like, “What’s going on there? There’s a lot of attrition in the Syrian army, but they do have Hezbollah backing them up, and they have the Iranian revolutionary guard. But then there’s ISIS, and then there is the Nusra Front and then the Israelis are involved. There are a lot of things going on there.” You have a complex model of reality, but you’re not thinking probabilistically, you’re thinking in terms of propensities. You say, “On balance, I think he’s somewhat fragile. I’d say there’s a 65 percent chance he’ll survive the next six months.”                                                              

But people who are thinking that way are also likely to say 65 percent for about twelve months as well. They’re not responding differently to the time dimension. Now, that can’t be true, can it? That’s crazy. Barb and Danny had a conversation about whether superforecasters would be scope sensitive or not, and it turns out the supers are not perfectly scope sensitive but they are a lot better at it than most people. Scope sensitivity is a very hard thing to pull off.

By the way, this is how you test it. Any idiot can be scope sensitive if you do it in what’s called “repeated subjects design.” If I say, “How likely is [Assad] to survive six months versus twelve months?” and I ask the questions back to back, almost everyone will see the need to say that something becomes likelier the longer out we stretch the time frame. But if I randomly assign half the forecasters the task of making a six-month judgment and I randomly assign half the forecasters the task of making a twelve-month judgment, then we can determine if that population of forecasters is thinking in a probabilistic scope sensitive way. [If they are,] they are going to give different answers to six and twelve months, but if they’re thinking in a purely causal propensity fashion, the answers are likely to be close to interchangeable.                                                              
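A back-of-the-envelope illustration of what a scope-sensitive pair of answers looks like. This is a minimal sketch, not anything the forecasters literally compute: the constant 6-percent-per-month chance that the event occurs is an invented figure, used only to show why the six-month and twelve-month answers should differ.

    # Illustrative only: assume a constant monthly "hazard" that the event occurs.
    # A scope-sensitive forecaster's six- and twelve-month answers should differ
    # in roughly this way; a pure propensity thinker tends to give the same number.

    monthly_hazard = 0.06  # hypothetical 6% chance per month

    def prob_within(months, hazard=monthly_hazard):
        # P(event happens at least once in the window), assuming independent months
        return 1 - (1 - hazard) ** months

    print(round(prob_within(6), 2))   # ~0.31
    print(round(prob_within(12), 2))  # ~0.52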

What’s the truth of the matter? The superforecasters are not perfect probabilistic thinkers. Superforecasting is an art that’s gradually being perfected. They are getting better at it. Some of them are so committed to it, they just do get better and better. What Barb and Danny found—what Barb found in particular, but Danny was advising on this—was that superforecasters are partially scope sensitive and they’re certainly way more scope sensitive than ordinary people are, which suggests that they are thinking, to some degree, in a genuinely probabilistic way, not just in a causal propensity way.                            

Kahneman:    It’s very interesting because you could imagine that somebody who is truly intelligent has an intelligent System 1 and has a sensitivity to time built into it, so it’s propensity all the time. It is very helpful. My question, and it should be easy to test—Barb, you should know—is whether they do it explicitly or implicitly. That is, whether they are aware that they’re responding to six months and they have an idea of the rate per month and they multiplied roughly by six, or whether they are so sensitive that even in the between subjects when they see twelve months, it feels longer to them than it would if the same people had seen six months. Which is it, Barb?                                                     

Barbara Mellers:    It’s this hybrid. We have this picture of human nature from your work that people make all kinds of mistakes. They think about probabilities in terms of propensities. Superforecasters are running the mental simulations: “Okay, the question is about six months, but what do I really think if it was twelve months or two years?”                                                     

Kahneman:    They are asking themselves?                                                     

Mellers:    I think they are. We have given them lots of Kahneman and Tversky-like problems to see if they fall prey to the same sorts of biases and errors. The answer is, sort of: some of them do, but not as many, and not nearly as frequently as you see with the rest of us ordinary mortals. The other thing that’s interesting is that when they do miss the right answer, they don’t make the kinds of mistakes that regular people make. They do something that’s a little bit more thoughtful. They integrate base rates with case-specific information a little bit more.

Tetlock:    They’re closer to Bayesians.                                                     

Mellers:    Right. They’re a little less sensitive to framing effects. The reference point doesn’t have quite the enormous role that it does with most people. It’s a more intelligent picture, it’s not a perfect picture, but it’s hopeful, in terms of a view of where people could go.                                                     

Tetlock:    This was a purely research conversation. To what extent do superforecasters show or not show particular K-T—Kahneman-Tversky— kinds of effects? It could easily evolve into a training program in its own right. You could give people feedback, and we have done this now. I would venture to say that if we were to rerun our scope sensitivity studies and some of the other K-T experiments that were performed on supers now, they would do better. They’re learning more each time. This sounds a little hokey, but superforecasters get a little more super each time they invest in this process. Not all superforecasters are equal; there are some who are truly deeply dedicated to this as a skill cultivation exercise, and there are others who are just smart people who are doing well on the side.


Part 2                                                     

Kahneman:    This raises an interesting issue as to what is the boundary between System 1 and System 2? It is like riding a bicycle. That is, maybe you become sensitive to base rates without even having to think about it. When you see a problem, you look for a reference class. Maybe when you see a problem that has a date associated with it, you’re sensitive to the distance in the same way, by the way. If I ask you about getting somewhere, you make a calculation very quickly of how far it is and how long it will take you to get there. It doesn’t have to be explicit, but you have the sensitivity to the right variables built into it, and you know that it’s going to make a difference whether you walk or ride in the car and you don’t have to think about it.

You described me as a pessimist, but I am inclined to be convinced by what you are showing to the extent that it’s trainable. Then I have been too pessimistic about what can be done.                                                     

Robert Axelrod:    Can you tell us what we know about the transfer from 2 to 1? For example, in mathematics you get used to the idea that A plus B is B plus A, and you don’t think about it anymore. But at first, it’s deliberate. Maybe your question about the boundaries has to do with the transfer from one to the other.

Kahneman:    That’s right. That’s true with all skills. You start out deliberate and then eventually, when you get good at it, you don’t have to think about it anymore. The idea of how many of those there are, the base-rate question, occurs to you immediately when somebody asks you to make a forecast; or how long a period we’re talking about occurs to you immediately. You incorporate that in the causal model so it becomes part of the propensity. It’s a more complex propensity.

Axelrod:    Can we teach things that are currently Type 2 in a way that transforms them into Type 1?

Kahneman:    This is really the claim that Phil and Barb are making.                                                     

Axelrod:    Why do you put it on them? Is that to say you don’t support it?                                                     

Kahneman:    Phil describes me accurately as a pessimist. I started out as a skeptic, and it’s hard to stay completely skeptical when you see this evidence. It’s very clear that they have identified a population of people who are different from the basic model of progression that many of us, that at least I, have offered.                                                     

Axelrod:    Why do you say different as opposed to just parameterize those who are better at something?                                                     

Kahneman:    They are not qualitatively different, but they are better at quite a few things.                                                     

Mellers:    Over the years, we have lassoed everybody we know into this tournament. The single piece of the training that people say is the most important, and nobody has contradicted this, is the suggestion to consider reference classes. First of all, before you do anything else, think about different reference classes that you could use: African dictators who have been around for thirty years, say, even if there are only two of them in this set. I don’t think people naturally think about reference classes. They think about the case-specific information that Danny’s talked about in his research, the juicy stuff that we experience ourselves, not the statistical aspects of it.
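To make that reference-class move concrete, here is a minimal sketch of starting from a base rate and then updating on case-specific evidence, rather than letting the vivid details swamp the statistics. The reference class, its size, and the likelihoods are all invented for illustration.

    # Hypothetical question: "Will long-entrenched dictator X be out of power within a year?"

    # Step 1: reference class. Suppose that of 40 comparable long-tenured autocrats
    # (invented figure), 6 lost power within a year of a similar point in time.
    base_rate = 6 / 40                      # prior P(out within a year) = 0.15

    # Step 2: case-specific evidence, e.g. visible elite defections. Suppose such
    # defections show up in 60% of cases where the leader falls, 10% where he survives.
    p_evidence_given_fall    = 0.60
    p_evidence_given_survive = 0.10

    # Step 3: Bayesian update, so the juicy evidence moves but does not erase the base rate.
    posterior = (base_rate * p_evidence_given_fall) / (
        base_rate * p_evidence_given_fall
        + (1 - base_rate) * p_evidence_given_survive
    )
    print(round(posterior, 2))  # ~0.51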

Just a few hints go a long way with these smart people, who also, I should say, have this other aspect that we haven’t talked about, and that is an incredible generosity with their time. How we ever managed to get these wonderful people to give ten, twenty, thirty hours of their week to us for X number of years is amazing to me. When we talk to them about that, they say, “Oh, this is just fun. It’s my hobby. I just really enjoy doing this. I don’t want to let my teammates down. It’s my favorite thing to do when I’m not doing my day job.” But wow, what an incredible subject population.

Brand:    Barb, would you say something about what you were telling me on Friday about people who are engaged in this forecasting exercise becoming less ideological and more open-minded?                                                     

Mellers:    It’s a wonderful finding that’s very recent, as in the last month or two. Each year that people entered the tournament, we gave them a lot of psychological tests, and one of them is a test known as actively open-minded thinking. For those of you who use the jargon, it asks, how much do you avoid the confirmation bias? How much do you look for information that goes against your priors and your favorite views? We had that baseline for everybody when they came in. Then, at the end of year four, we gave the test again to everybody who survived. What we saw was a significant increase in open-mindedness for those who were in the tournament.

This is not, by itself, strong evidence that tournaments make you more open-minded. It could have been the case that all the closed-minded people dropped out early on and it was just simply a sampling effect. Another piece of the evidence that I love is that you can plot these psychological equivalents of dose-response curves. If you’re in the tournament for one year, you get this much of an increase in open-mindedness, two years, three years, four years, and it keeps going up like this.

The other effect is there’s a reduction in polarization of political orientation. You ask people if they’re a staunch liberal or a staunch conservative, and after participating in the tournament, you see decreases at these extremes. They’re taking on a more nuanced perspective, and that’s hopeful. That’s a wonderful sign, and focusing on accuracy is probably the mechanism: once you have to focus on something else besides yours truly—on winning this thing—you’ve got to jettison those partisan biases, say goodbye to the emotions and all the rest, and focus on that one goal. That may help increase open-mindedness, reduce group polarization, and make the world a better place in which to be.

Axelrod:    You said the word emotions, and you implied that you’re better off putting them aside.

Mellers:    Your which?                                                     

Axelrod:    That you should put aside emotions for the sake of accuracy, but what about the possibility that emotions guide you?                                                     

Mellers:    Oh, sure. There’s a lot of information in emotions.                                                     

Tetlock:    They’re not very useful for answering the IARPA questions though, for the most part.                                                     

Mellers:    If you want a Palestinian state, for example, and you want the UN to vote in a particular way, and you want it so badly that it gets in the way of seeing what’s really happening, you’ve got a problem and you’re not going to do as well.

Axelrod:    I understand that part, but is there another part? For example, if you’re working on a probability, and as you come up with something, you say, “It just doesn’t feel right; I don’t think I’ve quite got to the bottom of this yet.”                                                     

Mellers:    Oh, that’s extremely useful. I completely agree.                                                     

Axelrod:    That’s also the kind of emotion. Maybe not calling it that; it’s not love-hate, but it’s satisfied-dissatisfied.                                                     

Mellers:    Yes, it’s some kind of intuition that says, “Don’t believe it yet.”                                                     

Kahneman:    A lot is known about the sense of cognitive unease, and cognitive unease is something that you don’t want to dismiss. That is a signal.                                                     

D.A. Wallach:    You sent out the article by the fellow at AQR. If you look at the most successful quantitative financial traders, are there lessons to be learned from the algorithms that they have been developing over the past twenty, thirty years that are transferrable to the way you would recommend people approach forecasting?                                                     

Tetlock:    The conversations I’ve had with people in Wall Street firms suggest that there is quite a sharp institutional partition between the quants and the people who are doing more behavioral sorts of things. I don’t think there has been a lot of crossover like that. I know there are some algorithm people who have tried to exploit biases documented in the behavioral economics literature. Danny, you would know more about that than I would because Dick Thaler and other people have been involved in that. There have been efforts to do that. I don’t have a lot of expertise on that, but I know it has happened to some degree but not a lot.

Wallach:    I guess what I’m wondering is to what extent are the learnings you take from these experiments ones that would guide us towards behaving more algorithmically? Is the idea that the best forecasters are effectively behaving like more faithful algorithms with fixed processes and procedures that they perform on information they take in?                                                     

Tetlock:    A lot of human beings take a certain umbrage at the thought that they’re just an algorithm. A lot of superforecasters would not. They would be quite open to the possibility that you could create a statistical model of their judgment policies in particular domains, and that statistical model might, in the long run, outperform them because it is applied more reliably. They’re quite open to that. Most people do not like that idea very much. That’s another way superforecasters differ: not all of them, as Barb is pointing out, since they’re a somewhat heterogeneous group, but the best among them would be quite open to that idea.

Wallach:    Do you perceive any creativity in what they do? In other words, is there anything you’ve learned about what they do that makes them different fundamentally than an algorithm?                                                     

Tetlock:    Creativity is one of those buzzwords. I’m a little hesitant to talk about creativity. There are things they do that feel somewhat creative, but what they mostly do is they think very carefully and they think carefully about how they think. They’re sensitive to blind spots, and they kick themselves pretty hard when they slip into what they knew was a trap. They slip into it anyway because some of these traps are sneaky and they’re hard to avoid.                                                              

That was true for a few of the superforecasters who slipped into the Yasser Arafat polonium trap—the question wasn’t, “Did Israel kill Arafat?” That’s true for superforecasters who slipped into a trap with a question about the Arctic sea ice mass. Some of them took that to be a proxy for a global warming question. Arctic sea ice is a somewhat unpredictable thing, and you’ve got to be careful about what kinds of predictions you make for which particular timeframe.

When they fall for things like that, they learn from it, they grow from it. There are elements of creativity, but it’s a very subjective thing. They are certainly creative in the sense that they identify comparison classes for starting their initial probability estimates that other people wouldn’t have identified. They’re creative in ferreting out signals. For example, is it creative to notice that a politician is not doing what he should be doing if he were going to do something else, say, if he were going to run for President of Russia?                                                     

They do lots of things that are not big C creative in that sense of Picasso or Einstein, but they’re creative in a very functional workaday way.                                                     

Sutherland:    You said there’s a very strong correlation between these people and people who like to do puzzles. Is this right?                                                     

Tetlock:    This is a somewhat subjective assessment on my part. Barb and I were watching this wonderful documentary called Wordplay, with Will Shortz, the guy who edits the New York Times crossword puzzle. They have these competitions among the best crossword puzzle solvers; they meet in Connecticut every year. The fastest time for solving a New York Times-style crossword puzzle is about three minutes, and that’s a world champion.

We met the superforecasters in three different conferences—one in Berkeley and two at Wharton. In each case, if we were clinical psychologists, our clinical sense was that there are some marked similarities among these people.                                                     

Sutherland:    Just an interesting question. Sherlock Holmes is always portrayed by his creator as this embodiment of rationality, but he isn’t. There is a high degree of creativity in noticing the fact that a dog didn’t bark, and so on. Would you, Daniel, say that puzzle-solving is System 1 or System 2? Or is it the ability to flip between the two very rapidly?                                                     

Kahneman:    We were talking about that. If you define creativity as the ability to see remote associations—the remote association test is very much like a cryptic puzzle, as Rory was describing it—what it involves is a very rich associative world, an ability to form connections, see them, and check them. The checking operation is vital. You can have too many associations, but you’ve got to be able to sort them out and pick the best. My sense is that if you wanted a division, it would be between the creation, the formation, the detection of those hypotheses or remote connections, and the ability to evaluate them. The evaluation is more System 2 and the detection is more System 1, if you want to use that language.

Tetlock:    The remote associations test, for those of you who have never heard of it, is something that runs something like this: moon, mouse, blue, what’s the common associate?                                                     

Jennifer Jacquet:    Cheese.                                                          

Tetlock:    Psychologists are not allowed to answer. Oh Jennifer, I’m sorry. Yes, okay, cheese. You’re very creative. That’s just a quick example of how that works.                                                     

Wael Ghonim:    What percent of the decision-making process of those superforecasters could be replaced by algorithms? Let’s say, in a world where you could ask the right questions, a lot of them rely on history, which is something that algorithms could do a very good job of assisting with.

Tetlock:    Yes. These are important questions. They’re questions that to some degree require us to move out of the world of forecasting tournaments and move into the world of controlled experimentation. You can create simulated environments in the laboratory. One paradigm is known as the multiple-cue probability learning task, and Barb has done some work with me and a professor at Texas on this. Let’s say you are trying to predict how well someone would do in a certain kind of job or something like that. There are three cues, and the multiple R-squared is either .35 or .6 or something like that, so we are experimentally manipulating how much predictability there is in the world, and manipulating the degree to which that predictability can be canned (or formalized). That is to say, there is an algorithm that [says that] if you follow this particular rule of thumb, you will be able to achieve the maximum R-squared of .35 or .6, but sometimes, it’s not cannable; it’s not easily canned. The question is how do people perform in these types of worlds?
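For readers who want the structure of such a task, here is a minimal simulation sketch. The cue weights, the noise level, and the decision to leave one cue out of the "canned" rule are all invented; the point is only to illustrate an environment in which an official algorithm captures part, but not all, of the available predictability.

    import random

    random.seed(0)

    # Three cues predict an outcome; total predictability is set by the noise level,
    # and the "organization-approved" rule uses only two cues, so some real signal
    # lies outside what the canned algorithm can capture.
    N = 2000
    data = []
    for _ in range(N):
        c1, c2, c3 = (random.gauss(0, 1) for _ in range(3))
        noise = random.gauss(0, 1.0)               # raise or lower to change R-squared
        y = 0.5 * c1 + 0.4 * c2 + 0.3 * c3 + noise
        data.append((c1, c2, c3, y))

    def mse(predict):
        return sum((y - predict(c1, c2, c3)) ** 2 for c1, c2, c3, y in data) / N

    canned = lambda c1, c2, c3: 0.5 * c1 + 0.4 * c2             # ignores cue 3
    fuller = lambda c1, c2, c3: 0.5 * c1 + 0.4 * c2 + 0.3 * c3  # finds the extra signal

    print(round(mse(canned), 3), round(mse(fuller), 3))
    # The gap between the two errors is the payoff for going beyond the official rule;
    # set the weight on c3 to zero and outcome-driven searching has nothing left to find.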

Let’s go back to what we talked about in the intelligence community. They believe in process. They want their analysts to be accountable for following the best analytical processes. Analysts feel, and many of their managers feel, that it would be palpably unfair to hold them accountable for predicting outcomes when there is a big stochastic component to the resolution of those outcomes. All you’d wind up doing is giving credit to the lucky and blaming the unlucky, and that’s going to demoralize people. It’s not going to work out very well.                                                     

One of the things we find in experimental simulations—like multiple-cue probability learning tasks—is that when you hold people accountable for the process and the process is pretty good, people do pretty well. Not too surprising. But when you hold people accountable for process and there is some variance in the multiple-cue task that’s not captured by the organization-approved algorithm, then the process-accountable people don’t find it. The outcome-accountable people are more likely to find it, so when there is something there to be found that the official organizational algorithm cannot capture, holding people outcome accountable causes them to find it. However, the price is that when there is not something there to be found, the outcome-accountable people will look for things that don’t exist and it will degrade their accuracy.

You, in a sense, have to make a judgment call about how much predictability there is in the domain-specific environment you care about, how adequate your official organizational algorithms are, and if it’s worth incentivizing people to go beyond those algorithms by holding them outcome accountable.                                                     

Kahneman:    There are some people here who are experts on deep learning. The nature of algorithms is changing very rapidly, and now you have deeply nonlinear algorithms. We’ve been thinking mostly of linear systems. I was just wondering what you think because there is a lot of data about historical and strategic and military traits. If you had a very large database, what you can find with deep learning can surprise people. I was wondering what your associations are.                                                              

Brooks:    It’s like the substitution thing: “Deep learning is going to solve everything.” I don’t think the amount of data they’ve got in these domains is anywhere near enough for deep learning. It takes massive amounts of data.

Peter Lee:    For the specific case of the remote association test, deep learning would crush it. It would outperform any human.

Brooks:    But on a very specific aspect, and it’s not going to be generalizable. People mistake the performance of deep learning for competence. Deep learning, wherever you apply it, is going to be very restricted, a circumscribed little piece. When it performs well on that little piece, people assume that, as they would with a person who performed that way, there must be a much wider competence around it, and they overgeneralize. I just get allergic every time everyone says “deep learning” and “there’s a lot of data,” because I think it’s very rare. Peter may disagree with me.

Lee:    No, I agree.                                                     

Brooks:    It’s become the panacea, the imagined panacea.                                                         

Lee:    It highlights the difficulty of coming up with tests that fairly distinguish.                                                     

W. Daniel Hillis:    I have a specific example that shows the kind of place where they do work. I’m involved in a little company that does weather prediction, and it does weather prediction in an odd way that assumes that there are lots of models out there which are the equivalent of hedgehogs. It looks at all the hedgehogs and decides which ones are useful or tend to work in which situations. It does a meta-learning on top of this data, and it gets lots and lots of feedback. It’s constantly getting feedback as to what’s working when, and so on. There’s a learning algorithm on top of everybody else’s reasoned models and physical models and things like that, and it outperforms any other model on a class of near-term weather prediction.

Tetlock:    I’d love to see that. Is that in the literature?

Brand:    It’s called Dark Sky.                                                     

Hillis:    It does a better job at forecasting the weather. It does work because it’s a very constrained area and with lots of data. It was the bestselling app on iPhone last week.
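The meta-learning idea Hillis describes can be sketched in a few lines: keep several underlying "hedgehog" models, track each one's recent error using the constant feedback, and weight their forecasts accordingly. This is a minimal illustration under invented assumptions (two toy temperature models, inverse-error weighting), not Dark Sky's actual method.

    # Toy meta-forecaster: blends several fixed models, reweighting them by recent error.

    class MetaForecaster:
        def __init__(self, models, decay=0.9):
            self.models = models                 # list of functions: input -> forecast
            self.errors = [1e-6] * len(models)   # running (decayed) absolute error
            self.decay = decay

        def forecast(self, x):
            # Inverse-error weighting: models that have been more accurate count more.
            weights = [1.0 / e for e in self.errors]
            total = sum(weights)
            return sum(w * m(x) for w, m in zip(weights, self.models)) / total

        def feedback(self, x, outcome):
            # The constant feedback is what makes the scheme work: rescore each model.
            for i, m in enumerate(self.models):
                self.errors[i] = (self.decay * self.errors[i]
                                  + (1 - self.decay) * abs(m(x) - outcome))

    # Two stand-in "hedgehog" models of tomorrow's temperature.
    persistence = lambda today: today      # "tomorrow is like today"
    climatology = lambda today: 15.0       # "tomorrow is the long-run average"
    meta = MetaForecaster([persistence, climatology])

    for today, actual in [(20, 19), (19, 21), (21, 22), (22, 20)]:
        print(round(meta.forecast(today), 1))
        meta.feedback(today, actual)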

Lee:    I was introduced to this remote association test yesterday over lunch, so I did ask someone in my labs to try to run that experiment. What I predict will happen is that the deep learning system will produce not only one answer but 100 answers.                                                     

Kahneman:    The selection is going to be the problem.                                                     

Lee:    Yes. Ranking or rating the ones that would resonate, or connect, or be comprehensible to humans in the most obvious ways and the most amusing ways, would be beyond the sphere of competence that Rodney is talking about.                                                     

Kahneman:    That would make it interesting because finding the associations, you certainly can, but whether you can develop a deep model of what people will find amusing or interesting, that selection is the real challenge.                                                     

Kamangar:    On this question about the role of data, it seems like you’re framing the question about what makes a good forecaster in one way, which is how ideologically they interpret information, versus a more crafty approach where they borrow from all the theories. If you had to write a computer program, you would think about it in a different way. You would say, “How much data do you have, and then, how well do you interpret that data?” You might look at it through that lens, and some of the examples you use, when you talk about researchers who are good at collecting information, emailing sources for direct information, and the super teams which collect more diverse data, have to do more with information gathering. I’m curious why you wouldn’t break up the framing, or the problem, and ask the question of how much data are you able to gather and how well are you able to interpret that data?

Tetlock:    That’s a good idea.                                                                    

Mellers:    It’s interesting, different superforecasters on a team seem to take different roles. Some people are good, they’re like Watson, they go off and find all this obscure stuff and then present it to their teammates. Other people build little models to think about supply and demand, or whatever the question is. And other people are the cheerleaders on the sides saying, “Phil, we haven’t heard from you for a while, do you have any thoughts about yada yada yada?” If there was Watson sitting there on the team providing all of the data anybody needed in the world, and the discussion was then on what’s important, how should we think about this, how do we put it together, that’s the immediate future on how to make this better. At least get the data for everybody so they’re not spending time doing that.                                                                    

Brian Christian:    Thinking about algorithms—learning algorithms for gathering data—connects interestingly to this notion of taboo thinking. In computational linguistics, if you’re doing machine translation, one approach is to build this massive database of bigrams; you digest this massive corpus and you just record the number of times you see each word pair.

Of course, there is going to be an enormous number of zeros in this table. One of the important techniques in many applications is to remember to renormalize this database so that it assigns nonzero probability to things that you’ve never seen. To me, there is something nice in this idea that you need to remember this critical step: for all of the things that you’ve never seen or never experienced, give them some epsilon of probability; don’t give them literally zero probability, where it’s going to ruin your model. It’s this quantitative taboo avoidance, which you need.
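A minimal sketch of that smoothing step, using a tiny invented corpus. The only point is that every unseen word pair gets a small epsilon of probability rather than exactly zero.

    from collections import Counter

    corpus = "the cat sat on the mat the cat ate".split()
    vocab = sorted(set(corpus))

    # Raw bigram counts: most possible word pairs never occur, so their count is zero.
    counts = Counter(zip(corpus, corpus[1:]))

    def bigram_prob(w1, w2, epsilon=0.1):
        # Additive (add-epsilon) smoothing: never assign exactly zero probability.
        unigram = sum(1 for w in corpus[:-1] if w == w1)
        return (counts[(w1, w2)] + epsilon) / (unigram + epsilon * len(vocab))

    print(round(bigram_prob("the", "cat"), 3))  # seen pair: well above zero
    print(round(bigram_prob("cat", "on"), 3))   # unseen pair: small but nonzero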

Tetlock:    Yes. That’s one of the taboos that superforecasters might resonate to: don’t use zero or one on the probability scale.                                                                    

Wallach:    Maybe an interesting thing that could come out of this conversation would be some roadmap for pairing up your superforecasters with top deep learning folks. A lot of things I am encountering, I’m not an expert in the field, but in radiological imaging, for example, there are all these companies that are now going to read scans better than people do. All of them seem to be outperforming the doctors, but the only way they get there is by sitting there with top radiologists for weeks and weeks. Have you guys had any conversations with folks about getting your superforecasters to be coaches?

Tetlock:    The superforecasters would enjoy doing something like that. This is a particularly challenging domain for some of the reasons that Dean was emphasizing yesterday, which is that we’re not dealing with strictly unique events, but we are dealing with events that are more toward the unique end of the continuum. It makes it more challenging. Each human body is unique but not that unique.

Dean Kamen:    The coaching that you see, is it of other superforecasters or is it of people who then become excited and become more involved and become superforecasters?

Tetlock:    I’m sorry, would you mind repeating?                                                                    

Kamen:    You talked about coaching as being a critical part, and you pointed out that computers can do things better only after they deal with the best of the best and essentially do what Danny was suggesting, which was take all the hedgehogs and put them together and glean from it. Is coaching something you’ve given appropriate thought to as to whether it’s the real reason your program works, or whether it’s an artifact, or whether it’s critically important, or an interesting side note?                                                                    

Tetlock:    Some people are more coachable in superforecasting skills than others, and both the basic training we provide and the encouragement teammates on superforecasting teams provide each other bring them up to another level. Coaching is an important part of the future of this. It hasn’t been formalized as much as it should have been, but that’s an important direction to go. This is a skill that requires an enormous amount of patience. Any demanding skill requires a lot of deep practice. Learning to play the piano requires a lot of deep practice. Anders Ericsson and other people who study expertise—the acquisition of expert-level performance—emphasize the deep-practice aspect. One of the differences between this and learning to play the piano is that the piano keys don’t occasionally change on you at random. In this domain, they do. It requires grittiness cubed. It requires a degree of social support—coaching. You have to hold each other up because sometimes, the environment can be so demoralizing in its capriciousness.