Edge Master Class 2015: A Short Course in Superforecasting, Class V

Condensing it All Into Four Big Problems and a Killer App Solution
Philip Tetlock [9.22.15]

Philip Tetlock:  If you turn to session six, slides 117 and 118, you’re going to see a little piece on the seductive power of scenarios. Imagine you’ve got one of these between-subjects designs in which half of the people read the top slide and half read the bottom slide, and then they make a judgment about the plausibility or probability of this outcome.

The first one, on slide 117, is the likelihood of a flood anywhere in North America in the next thirty years killing 1000 people or more. The next one is the likelihood of a flood anywhere in North America, triggered by an earthquake causing a dam to collapse in the next thirty years killing 1000 people or more.

[Slide 117]

[Slide 118]

You can imagine randomly assigning people so that half read one version and half read the other. A moment’s contemplation reveals that it would be very odd if people judged the bottom slide—the more detailed one—to be more probable than the top one.

It’s probably obvious to everybody around this room because this is an analytically high-powered group, but it’s not obvious to most people. It’s an important part of forecaster training, and it’s an important part of becoming a superforecaster—being aware that there is a similarity across three superficially very different things: how the subjects in Danny’s contingent valuation experiments judge the value of ducks and lakes in Ontario (scope insensitivity), the problems regular forecasters have in distinguishing the likelihood of Assad falling in six months versus twelve months, and the dam scenarios. Is it clear how these things are related?
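
The common thread can be written down as two probability inequalities. Here is a minimal sketch in Python; the numbers are hypothetical stand-ins for a forecaster’s answers, not data from these studies.

    # Two coherence checks: the conjunction rule and scope sensitivity.
    # All probabilities below are hypothetical illustrations.

    def obeys_conjunction_rule(p_flood, p_flood_and_quake):
        """P(A and B) can never exceed P(A): an earthquake-triggered flood
        is a special case of a flood, so it cannot be more likely."""
        return p_flood_and_quake <= p_flood

    def obeys_scope(p_within_6_months, p_within_12_months):
        """An event happening within six months entails it happening within
        twelve, so the six-month probability must be the smaller one."""
        return p_within_6_months <= p_within_12_months

    # The vivid scenario seduces people into violating the conjunction rule:
    print(obeys_conjunction_rule(p_flood=0.10, p_flood_and_quake=0.25))   # False
    # And a scope-insensitive forecaster gives Assad the same odds either way:
    print(obeys_scope(p_within_6_months=0.40, p_within_12_months=0.35))   # False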

This also ties in somewhat to this vexing problem of people’s judgments of explanations and forecasting accuracy because we are tempted by rich narratives. It’s easy for us to transport ourselves imaginatively into the more detailed narrative that’s more interesting, more engaging. It’s a better story. And of course, it’s less probable. You don’t want to get suckered by attribute substitution. You don’t want to replace the question, is this an accurate forecast? with, is this an interesting story? It is tempting in many situations to do that. I make that point on slide 119.

[Slide 119]

[Slide 120]

On slide 120 I talk about something we talked a bit about yesterday, which is that scenarios can be a source of bias. They can cause you to attribute too much probability to too many possibilities. You can violate the axioms of probability. Probabilities can start adding up to more than one; that’s a warning sign that you’re incoherent. But you can fight fire with fire, and you can use scenarios when you think backward in time. I mentioned that yesterday in connection with counterfactuals and hindsight bias. Getting people to imagine counterfactual alternatives to reality is a way of counteracting hindsight bias.
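
The warning sign Tetlock mentions is easy to operationalize: the probabilities a forecaster assigns to mutually exclusive, exhaustive scenarios should sum to one. A minimal sketch, with hypothetical scenario names and numbers:

    def coherence_gap(scenario_probs):
        """How far a set of mutually exclusive, exhaustive scenario
        probabilities strays from summing to 1.0; zero means coherent."""
        return sum(scenario_probs.values()) - 1.0

    # Vivid scenario exercises tend to inflate each branch's probability:
    probs = {"status quo": 0.50, "gradual reform": 0.40, "collapse": 0.35}
    gap = coherence_gap(probs)
    if gap > 0:
        print(f"Probabilities sum to {1.0 + gap:.2f}: incoherent, trim them back.")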

Hindsight bias is a difficulty people have in remembering their past states of ignorance, or remembering what they thought before. Counterfactual scenarios can reconnect us to our past states of ignorance and that can be a useful, humbling exercise. In the long run, it’s good mental hygiene and it’s useful for debiasing.                                 

Our most recent training modules for superforecasting emphasize this bicycle-riding metaphor: balancing offsetting errors. Some errors are more likely in some environments than others. Some errors may be more likely in general than others. Each of the errors on slide 121 is logically and psychologically possible, and it’s helpful for people to be aware of them. You have the perils of over-adjusting to evidence or under-adjusting to evidence. Over-adjusting to evidence is when people see the big crowds forming in Moscow in early 2012 and think Putin is finished. Under-adjusting to evidence can also occur in many situations.

[Slide 121]

Going a little bit further back in history, people in the U.S. foreign policy establishment were very slow to recognize the reality of the Sino-Soviet split (as any student of American politics is well aware). They had this mental model of the Communist monolith, and it was somewhat misleading. There is a tension between under-confidence and over-confidence, a tension between over-predicting and under-predicting change, and a tension between over- and underestimating the uniqueness of situations. All of these are possible mistakes. We urge people to look for comparison classes, so we’re weighted somewhat toward alerting people to ways in which the situation is similar to other situations. Both errors are clearly possible.

What are we trying to do in the biggest picture sense? The last slide is a quote from Harold Bloom, a retired English professor at Yale and a Shakespeare scholar. The quote, on slide 122, is, “One learns from Shakespeare that self-overhearing is the prime function of soliloquy. Hamlet teaches us how to talk to oneself, not how to talk to others.” That’s a strange quote. What does it mean?

[Slide 122]

What we’re trying to encourage in training is not only getting people to monitor their thought processes, but to listen to themselves think about how they think. That sounds dangerously like an infinite regress into nowhere, but the capacity to listen to yourself, talk to yourself, and decide whether you like what you’re hearing is very useful. It’s not something you can sustain neurologically for very long. It’s a fleeting achievement of consciousness, but it’s a valuable one and it’s relevant to superforecasting.                                 

There are two big themes in this set of presentations. One of them is about self-improvement, and the article I circulated from Aaron Brown is purely in this self-improvement mode. It’s how to use these commandments to make yourself a better gambler, a better investor, smarter, and richer; that’s a clear self-improvement focus.                                 

The thing that’s been most on my mind in the last several months looms very large in my consciousness right now. It’s less focused on self-improvement—how to make you smarter, richer, et cetera—and more focused on how to make society a better place and how to improve the quality of political debate and dialogue. That’s what this last handout is about: condensing it all into four big problems and a killer app solution.

Superforecasting is relevant here and we can explain how, but it’s also relevant to this larger agenda of improving the quality of high stakes policy debates. The first big problem I see is that in virtually all high stakes policy debates we observe now, the participants are motivated less by pure accuracy goals than by an assortment of other goals. They’re motivated by ego defense, defending their past positions; by self-promotion, claiming credit for things (correctly or incorrectly); and by affirming their loyalty to a community of co-believers, because the status of pundits hinges critically on where they stand in their social networks. If you’re a high-profile liberal or conservative pundit, you know that if you take one for the home team, they’re going to pick you up and keep you moving along. There’s a story about Dick Morris, on Fox News, saying how there was going to be a big Romney surge, and of course, there wasn’t. If you’re a survey expert, or a pollster, it’s not good to be associated with a prediction of that sort.

The explanation he offered was quite astonishing from a forecasting tournament point of view, and that was that he knew that the Romney campaign was falling apart. He knew that they were losing, but wanted to buck up the home team. He admitted it. He admitted he was not playing an epistemic game at all. There was no accuracy component. He didn’t believe it, he just said what he thought he needed to say. That’s an extreme case in which pundits are attaching zero value to accuracy.                                 

In many cases, pundits do know in the back of their minds that if they’re bullish on something that their community of co-believers is promoting aggressively in the policy domain, then even if their predictions start to look suspect, they’re still going to be in good standing in their community. This is what I call functionalist blurring. You can’t read people’s minds when they make a forecast or a quasi-forecast in a high stakes policy debate. You can’t say, “Look, now they’re playing a .6 accuracy game, a .2 self-esteem game, and a .2 affirming-loyalty-to-the-community-of-co-believers game”; you can’t parse it out exactly like that. Although you can do experiments in which you can parse this out a little bit.

Even the pundits themselves are not aware of what they’re doing. If you asked them to introspect, they probably couldn’t give you a clear answer. Functionalist blurring is a very common phenomenon in policy debates. That’s point one. Point two is this attribute substitution among debaters—high stakes partisans.

Stewart Brand:    One point on [functionalist blurring]. You say it is impossible to fix by introspection.              

Tetlock:    Yes. It’s extremely difficult. You can run experiments in which you manipulate the goals that people are supposed to have and see how that affects what they do, but people have limited introspective access to what they’re doing in these situations. One way you can tell if you’re not playing a pure accuracy game is if you feel your probabilities are shifting when you move into the tournament. If you introspect on that shift, you’ll get some intuitive sense of the degree to which you’re playing a pure accuracy game.              

Robert Axelrod:    There are some institutional things you could do. A person could talk to a predictor alone and say, “Okay, what do you actually think?”

Tetlock:    The second big point—we’ve already talked about attribute substitution. High stakes partisans want to simplify an otherwise intolerably complicated world. They use attribute substitution a lot. They take hard questions and replace them with easy ones and they act as if the answers to the easy ones are answers to the hard ones. That is a very general tendency. We talked about it quite a bit yesterday. The third thing we talked a bit about yesterday is rhetorical obfuscation as an essential survival strategy if you’re a political pundit. To preserve their self and public images in an environment that throws up a lot of surprises, which, of course, the political world does, high stakes partisans have to learn to master an arcane art: the art of appearing to go out on a predictive limb without actually doing it, of appearing to be making a forecast without making a forecast.                                 

They say decisive-sounding things about Eurozone collapse or this or that, but there are so many “may’s” and “possibly’s” and “could’s” and so forth that turning it into a probability estimate that can be scored for accuracy is virtually impossible. A, they can’t keep score of themselves. B, there is no way to tell ex post which side gets closer to the truth because each side has rhetorically positioned itself in a way that allows it to explain what happened ex post.              
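
An explicit probability estimate fixes exactly this: it can be scored. The Good Judgment Project scored forecasters with Brier scores; here is a minimal sketch for binary questions, with hypothetical forecasts and outcomes.

    def brier_score(forecasts, outcomes):
        """Mean squared error between stated probabilities and what happened
        (1 = the event occurred, 0 = it did not). 0.0 is perfect; saying a
        flat 0.5 on every question earns 0.25."""
        return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)

    # A decisive-sounding pundit who is confidently wrong vs. a calibrated forecaster:
    print(brier_score([0.9, 0.8, 0.9], [0, 0, 1]))   # ~0.49
    print(brier_score([0.3, 0.2, 0.7], [0, 0, 1]))   # ~0.07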

Rory Sutherland:    That’s possibly a media bias as well as a pundit bias in the sense that if you’re too noncommittal you never get invited back.              

Tetlock:    Absolutely.              

Sutherland:    The press has an instinctive aversion to anybody who qualifies too much.              

Tetlock:    Yes. Attribute substitution, point four, is not just going on among the debaters, it’s going on in the audience as well. Audiences are remarkably forgiving of all these epistemological sins that debaters are committing. There is a tendency to take partisan claims more or less at face value as long as the partisans belong to their community of co-believers.                                 

Does my side know the answer? is the really hard question. The easier one is, whom do I trust more to know the answer, my side or their side? I trust my side more to know the answer. Attribute substitution is a profound idea, and it allows us to think we know a lot of things that we don’t know. The net result of attribute substitution among both debaters and audiences is it makes it very hard to learn lessons from history that we weren’t already ideologically predisposed to learn because history hinges on counterfactuals. Counterfactuals are invisible; you get to make up the data as a kind of ideological projection test.                                 

The beauty of forecasting tournaments is that they’re pure accuracy games that impose an unusual monastic discipline on how people go about making probability estimates of the possible consequences of policy options. It’s a way of reducing escape clauses for the debaters, as well as reducing motivated reasoning room for the audience.                                 

Tournaments, if they’re given a real shot, have a potential to raise the quality of debates by incentivizing competition to be more accurate and reducing functionalist blurring that makes it so difficult to figure out who is closer to the truth.

The problem, and this pushes me more toward the pessimistic end of the continuum again, is that the potential of forecasting tournaments to improve the quality of debates could go untapped for many decades, or even centuries, because there are such powerful forces arrayed against adopting forecasting tournaments.

The rhetorical question at the end is, why should high status incumbents agree to play in a level-playing-field competition in which the best possible outcome is that they retain their alpha pundit status within their community of co-believers? Why do that? If the answer is that Tom Friedman just doesn’t return the phone call, that none of them return the phone call, you don’t have them engaged in forecasting tournaments. Is there a way around that? If people have creative ideas for how we can use tournaments to disrupt stale status hierarchies, and we have a lot of stale status hierarchies in policy debates, that would be a big contribution to our society.

Axelrod:    A small contribution to that might be that you have different communities. There is the community that reads the New York Times, so Friedman is not going to do this, but you could set up a separate community, like you have, where status is related to the quality of performance in the competition.

Tetlock:    But it’s a tiny community and even if it gets twenty times more prominence than any other work I’ve ever done, it’s still tiny compared to these other things.              

Axelrod:    It could be done within organizations, too. There already are economic competitions that predict unemployment and interest rates six months in advance. Those people don’t necessarily write op-ed pieces, but they do have a reputation within their communities that is based on accuracy.              

Tetlock:    My gut tells me that just as superforecasters seem to come mostly from more techie kinds of environments, the forecasting tournament model is going to need to be initiated from the techie world. I’m not expecting it to come from the New York Times or the Wall Street Journal.              

Brand:    What you’re trying to affect is the quality of public discourse, it sounds like.              

Tetlock:    Yes, exactly.

Brand:    The question is, and it sounds like you guys have gone partly into it, what would happen if you had real public access, real public visibility, public transparency, maybe even some public participation in one of these tournaments? By engaging the public with this process and participation, do you think that would be the second generation tournament—to go public?

Tetlock:    Mass engagement is very important. I don’t think it’s sufficient for doing sophisticated second generation tournaments that help to improve the quality of policy debates, but it starts to get people’s attention. It becomes much harder for the high status incumbents to ignore it when very large numbers of people are engaged, including some high status people. Once the high status people become engaged, then it would start to cascade. Mass engagement is important, but engaging with high quality questions is equally important. Where do high quality questions come from? Sometimes they come from hedgehog-y pundits, the people who are not so indirectly being criticized here.

Dean Kamen:    On the specific issue of how you kick-start this thing to be broadly interacted with either by the general public or by more people, it’s like the problem Peter Diamandis continues to have with how to make a prize that will get enough public attention and get somebody big to put up enough money. It’s very hard. While we all know the climate change issue is a real issue, how do you make a prize that has all the required qualities? It has a simple well-defined set of rules, it has a clearly-defined winner, people can follow it and understand it as they go. It has almost nothing to do with the actual problem to be solved, but the structure around it makes it easy for the media to follow it, et cetera.

For a second, forget about how you’re going to get the superforecasters and how well they’re going to do, and come up with a problem that meets the criteria that everybody cares about in some way, that they may not know they care about—aging, health, the environment. Come up with something that, if properly phrased, would get everybody’s attention, if it was a one-liner or a headliner. Then you also come up with a finite time frame to discuss it and put some clear rules around who did better at accomplishing the goal. If you did that, I suspect you could get various people that have a passion for this, that appreciate the importance of better forecasting, maybe eliminating the next war that’s a mistake. You then can go to some of the people that have big funds available to them and say, “We’re going to do this prize, it’s an X Prize, and the person that gets closest on the date we reveal will get $10 million,” or something. It would not be hard to get somebody to fund something if you met the first two criteria: that it’s broadly interesting, and that the contest could be easily understood. People could appreciate there is a winner.

As for getting the media attention that you think you wouldn’t get even at twenty times your size, you’d use it against them—the fact that the Tom Friedmans don’t respond. Whether it’s in the New York Times, or you have an article about it or an op-ed, or on public media where you have the Jon Stewarts of the world saying, “Hmm, this project is very important, lots of people are doing it, but look at these five super-pundits that seem to have expert opinions on everything; they wouldn’t respond.” It would be pretty straightforward, if you got a credible contest going, to see that they were conspicuous by their absence, that the real pundits on these things are nowhere to be seen.

That would probably start the more serious dialogue that you want, which is, why is it that all these great pundits won’t do this? It might start the next op-ed that says, “We’ve looked at these guys and their track record is either nonexistent or bad,” which nobody seems to do. You pointed out over and over again that they don’t keep score. Just getting the public to realize that they don’t keep score, getting the public to realize that they speak with enough double-talk that, while they made a profound statement here, the small print eroded any possibility of holding them accountable. Everything you are concerned about doing, you could probably humanize pretty well in a competition if you talk to people that have expertise.

D.A. Wallach:    This would be a good TV show. Why don’t you make it a huge TV show? It’s got constant new material. I bet Vice or HBO would totally do this.              

Kamen:    Reality forecasting.              

Sutherland:    There’s also a very simple thing, which is you just start a New York Times column called “What Have You Changed Your Mind About?” If the answer to that is nothing, you look pretty stupid. You’re stealing it from Keynes, aren’t you, when he was accused of inconsistency and he said, “When the facts change, I change my mind, sir. What do you do?” Simply reframing it so that the mark of intelligence is not mental consistency but mental flexibility might help.

Barbara Mellers:    The nice thing about what Dean just said was, there is the carrot—the $10 million—and the stick. Here is a case where shame comes in again. It’s necessary to get the Tom Friedmans of the world feeling bad enough about not being part of it that they become a part of it.


Part 2              

Wael Ghonim:    You mentioned that Tom Friedman, if I remember correctly, said that he would be willing to do it anonymously.              

Tetlock:    No, I didn’t. A small number, less than 10 percent of the people that Dan Gardner surveyed, were willing to do it anonymously. I’m not going to comment on who they were. I will say this: Tom was not among them.

Ghonim:    The carrot, to me from a user perspective, is the desire to be right and the desire to announce that I am right. That’s enough of a carrot in this game. I would be involved in this game to prove that I am right. There is a social virality aspect of it. If I am 100 percent right, I like to show off on every single platform to tell people that I am 100 percent right.              

Tetlock:    Supers like to do that; they’re proud.              

Ghonim:    Yes. The problem is the risk of being wrong. What if I’m actually 60 percent wrong? What if I’m 80 percent wrong? Maybe there is a pseudo-anonymous model whereby you only disclose the things that I’m right about, and you don’t disclose my accuracy level. Internally, you give better macro data. For example, would Syria’s civil war end by the end of the year? You are collecting all the data from all the people and, based on that historical performance, you know your confidence level is 75 percent, yes. You are not going to expose anyone who did the wrong forecasting, and anyone who did the right forecasting would eventually have that on their profile, that they were right about Syria. On one hand, you collected the macro data and you have the right forecast for everybody. You don’t care about shaming those who got it wrong, and you gave good credit for the people who did the right forecasting.              

Tetlock:    It reminds me of the old B.F. Skinner boxes; it’s only reward. There is no punishment there.              

Ghonim:    There is a punishment internally in the system because their weight in the overall forecast next time they forecast is going to be much less, which is really what you are after, right? You are not after individual results. You are not after telling someone you are 75 percent right. It doesn’t matter. You are after: would this happen 75 percent of the time?
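
A minimal sketch of the pooling scheme Ghonim is describing: everyone’s answer goes into the aggregate, but a poor track record quietly shrinks a person’s weight instead of publicly shaming them. The names, forecasts, and track records are hypothetical, and this is not the Good Judgment Project’s actual aggregation algorithm.

    def weighted_forecast(forecasts, past_brier):
        """Combine individual probabilities, weighting each forecaster by
        historical accuracy (here, simply 1 minus their mean Brier score)."""
        weights = {name: 1.0 - b for name, b in past_brier.items()}
        total = sum(weights.values())
        return sum(weights[n] * p for n, p in forecasts.items()) / total

    # "Will the civil war end by the end of the year?"
    forecasts = {"alice": 0.80, "bob": 0.60, "carol": 0.30}
    past_brier = {"alice": 0.10, "bob": 0.25, "carol": 0.45}   # lower is better
    print(f"Crowd estimate: {weighted_forecast(forecasts, past_brier):.2f}")   # 0.61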

Tetlock:    In the second generation tournaments, the focus is on improving the quality of public debate. Incentivizing people to become superforecasters may be an important part of improving the probability estimate stream, but it’s a secondary goal to the societal goal.              

Ghonim:    Why wouldn’t Thomas Friedman do that? You’re not going to expose all the mistakes that he is making, and you’re going to give him some good credit for the stuff he got right. Everybody in the system probably knows Tom Friedman has missed 50 percent, but at least they know the 50 percent that he was right about.

Brand:    What I would like in the second generation tournament is access to it as a public person, and the ability to record my own prediction—my own forecast—in a way that I can revisit it when I turned out to have been wrong. Turning out to have been wrong is the most important learning part of this whole process: being comfortable acknowledging it, and doing the kind of analysis these people who become superforecasters go through—why was I wrong? “Oh, I had an ideological blip and I didn’t even realize it.”

Maybe I share my rightness and my wrongness with a group of people whose opinions I would like to pool and be part of and show off to, or be proudly humble in front of: “Look at how wrong I was on this. What do you think?” I’d like a place to, without big consequence, try my abilities.

I was a professional futurist for twenty-five years; we did scenarios and all this stuff. I bet I’m pretty good, or I’m pretty bad; either one would be interesting. I want to do that in a way where I’m forced to be honest with myself.

Ghonim:    That’s a great approach as well.              

Jennifer Jacquet:    In that training part of your regime, that’s private, right? Other people don’t see the results of that beginning work that they do, is that correct?              

Mellers:    Does everyone see everyone’s scores? No. It’s your model. The only scores that are seen are the top 10 percent, and then everybody else can remain anonymous.              

Jacquet:    And in the cognitive bias training that they go through, that’s just for them?              

Daniel Kahneman:    I’m curious about your emphasis on tournaments. It would seem, from what you have said, that an organization could set up a forecasting team, selected in some way from among its employees, train them internally, and send them forecasting problems. Somehow it seems to me that the emphasis on tournaments is restrictive, that there is more to what you are proposing than tournaments. In principle, you’re talking about finding people who think reasonably well to begin with, training them to think better, putting them in teams, and putting them to work. What am I missing? Why are you emphasizing tournaments so much?

Tetlock:    There are different levels of analysis here. There is a private sector spinoff from the Good Judgment Project which is engaged with organizations in creating tournaments inside organizations, and doing exactly all those things that you described and that we talked about yesterday as critical for spotting and cultivating skill. That can all happen within any private sector or public organization. The focus here is on a societal level. I can’t think of a better way of crystallizing what I see as the entrenchment, the ossification, of status and close-mindedness that is making it so hard to learn from policy debates, making it so hard for policy debates to be meaningfully cumulative rather than rhetorical kabuki dances.              

Salar Kamangar:    Can you think of any alternatives, Danny?              

Kahneman:    The issue here is with the client. This is true in the context of policy, whether the decision-makers are ready for this, or what the social impact would be of having a subgroup of people improving their forecasting. If you’re viewing this as a campaign, the television program and large prize and so on would be extremely useful. But ultimately, I’m not sure that you can improve society’s thinking about social problems by having essentially dispassionate people think dispassionately about problems when there are politically motivated hedgehogs who shout a lot louder.

Tetlock:    The minimalist goal is to make it marginally more embarrassing to be incorrigibly close-minded, just marginally. The more ambitious goal is to make it substantially more embarrassing, and that requires talent and resources of the sort that academics like myself don’t possess. I don’t know how to create a TV show.              

Kamangar:    I have a question about the debate over prediction versus predicting policy—the discussion that you and Bob had earlier. There are certain policy decisions that are fairly unique in that they hinge a lot on a specific prediction—the WMD example that you gave. You might argue that that was the sideshow and there was another story, but there was a long public debate about whether there were WMD, and that could have been a pivotal decision that would have changed the decision to go into Iraq. Most decisions, when it comes to policy, aren’t that way, like to go into Syria or not. You have a lot of factors you have to consider. Nevertheless, if you were in charge or advising our intelligence agencies on their hiring protocol, would you advise them to give a boost to people that do well in your tournaments versus those that don’t when it comes to making those kinds of decisions, like should we go into Syria?

Tetlock:    I would. In fact, it’s always happened to some degree, as Barb knows from one of our superforecaster conferences. There was a person from the intelligence community, moderately senior, who encouraged the superforecasters to apply for jobs there. I don’t know how many did since the vast majority are already pretty gainfully employed. It’s a good idea, and I don’t think it occurs to people inside the IC to have people like superforecasters in many of the domains where superforecasters are generating good probability estimates. Just on a purely personnel selection level, that would be a good idea. The hard issue that you raised has to do with, yes, WMD in Iraq was a situation where there was one pivotal point of disagreement, and it was a real disagreement, although you can argue other things are going on. Other policy debates are more multidimensional. Which policy alternatives are more or less attractive hinges on somewhat intricate clusters of questions that would need to be generated.

The debates we saw over ObamaCare, the debates we see over quantitative easing, a lot of the debates we see, you can imagine multiple indicators that could tip the scales of plausibility toward one side or the other of the debate. That’s the hard part. That was the thing I dwelled on a bit yesterday, and the reason I dwelled on it is I see it as so crucial to making this work. Mass engagement is crucial, but so is the quality of the questions, and generating high quality questions is something we need to put a lot of energy into.              

Ghonim:    Would hiring superforecasters bias their judgments once they are within [the organization]? Right now, they are somehow independent when they make their forecasts, but if they are within the organization they might have [biases].

Tetlock:    Oh sure, lots of things could happen to them. Very few of them will ever join the intelligence community.              

Ghonim:    But do you think it would introduce less accuracy to their forecasting?              

Tetlock:    I’m not sure. Once you enter an organization, you’re not playing a pure accuracy game. You’re playing a lot of other games. You’ve got to get along with the boss, you’ve got to get along with other people. There are all sorts of things you have to do to get along. Those inevitably would degrade accuracy to some degree, yes.              

Jacquet:    Also, Phil, people could opt out. They could just answer whatever questions they wanted, right? And Thomas Friedman has to answer every question.              

Tetlock:    Does he?              

Jacquet:    Isn’t that a major difference? In a sense, but he’s obligated every week to do …              

Kahneman:    To answer her question.              

Jacquet:    Yes, that’s right.              

Mellers:    But it’s his question, so …              

Jacquet:    It’s his question already, yes, that’s true. If they could answer their own questions, it would be like his job.              

Mellers:    Yes. It would be nice to have an accuracy guru in the company who everybody just trusts because the whole focus of that person is not to take sides, just completely engage in the single goal.              

Ghonim:    Could you think of examples? Does this happen anywhere?              

Mellers:    There was a guy at United Airlines who knew how to fix everything at the last possible minute when everybody else had given up. He was eighty or something like that. People were very upset when he was taken out of the workplace for three or four days to create this belief network: When this thing rattles and that thing breaks, what’s the probability that this-and-such is wrong? Apparently, it was helping United Airlines for many years. It was this incredibly valuable accuracy belief network, and that was his claim to fame.              

Tetlock:    They wanted to create a model of him, right?              

Mellers:    They wanted a model of him for what he was building.              

Tetlock:    Yes. And they did, didn’t they?              

Mellers:    Yes.              
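
Each node of such a belief network is a conditional-probability update of the kind Mellers quotes: when this thing rattles, how likely is it that this-and-such is wrong? A minimal one-node sketch; all the numbers are hypothetical.

    def posterior(prior, p_symptom_if_fault, p_symptom_if_ok):
        """Bayes' rule: update the probability of a fault after observing
        a symptom such as a rattle."""
        numerator = p_symptom_if_fault * prior
        evidence = numerator + p_symptom_if_ok * (1.0 - prior)
        return numerator / evidence

    # Rattles are common when the part is broken, rare when it is fine:
    print(posterior(prior=0.05, p_symptom_if_fault=0.90, p_symptom_if_ok=0.10))
    # -> about 0.32: one observation turns a 5 percent hunch into a 32 percent suspicion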

Brand:    The question I have is the difference between doing a forecasting exercise in something one cares deeply about, and something one doesn’t care that much about but is interested enough to do the research.              

Tetlock:    And which one do you do better on. Does it help to care deeply?              

Brand:    Presumably, the most educational is both: comparing your score on ones you don’t care about and are objective, and the ones you care deeply about and are basically subjective. The question is, can you overcome that subjectivity—emotional stance, and all of it—using these techniques and thereby become an honest witness of the world? Has the work so far indicated anything of how those two play out?              

Tetlock:    We don’t really have data that speaks very directly to that. There is experimental literature though that looks at these offsetting motives—the desire to be accurate on the one hand and the desire to reach a specific conclusion on the other. Obviously, caring a lot about the issue is going to motivate you to immerse yourself in material, but if it motivates you to immerse yourself in the material solely for the purpose of creating a biased case for a particular conclusion, that’s not going to be very helpful for accuracy. We don’t know this, but there probably is some moderate optimal level of stress for promoting performance.              

Mellers:    A superforecaster would talk about this. I remember there was a question about whether Assad would be in power until day X. A week or two before day X, people were saying, “I can’t believe this, but I’m hoping that Assad will be in power for the next week. I think I’m losing my moral compass.” The recognition of this tension is step one.

Brand:    They’re responding to the enforced objectivity, surprised that they care, now that the objective perspective is making cynics out of them. It’s what I would use as a public observer. If I can record my own votes, my own forecasts, I would be watching the ones I care about versus the ones I don’t care about to see how wrongness or rightness plays out over time. If I’m always wrong or often wrong about the ones I care about, that would be an invitation to adjust at a pretty deep level.

Tetlock:    That is the big challenge.              

Kamen:    Stepping way back, as somebody whom I would never contradict, Professor Kahneman, suggested, from where we started with your comments yesterday morning about this extraordinary group of people you’ve collected—they give you twenty, thirty, or forty hours a week for a $250 prize—I can’t help but think of how parallel what you are doing is to my collecting mentors for my FIRST program, which I’ve been doing for twenty-five years. By comparison, I now have 125,000 of them, working with 40-some-odd-thousand schools. Maybe the whole reason you got people that are 20 percent better than everybody else has nothing to do with anything you’re talking about. You found a way to select people that have so much passion, that are already so good, that love it so much—anybody that loves something and does it as a hobby will do better than somebody who you pay to do it professionally, whether it’s painting your house or whatever.

Maybe you’re guilty of some of your own biases, or not looking at the raw data properly. Even if you did turn them into professionals, maybe [they’d] lose what [they] got because now they are under pressure. It’s worse than if [you] hadn’t found the ones with passion; it’s, “It’s a job, I’ve got to get it done by 5:00,” and the pressure of the boss.

The final parallel that I would offer up for you to think about is, we have been told more than once over the years that this is going to scale and get into all the schools, and we need to start paying people. I listened to the President of the United States in one of his State of the Unions, and I talked to him about it a few weeks ago. He said, “I want to have 60,000 new math and science teachers in the schools.” You look at why they aren’t there: somebody that graduates from college with a degree in English or journalism can go for various kinds of jobs. A teacher, sadly, probably gets paid less than some of the others, but nowhere near [the amount] somebody who graduates with a degree in computer science gets. You say, “Mr. President, you can’t find 60,000 people. You won’t be able to afford to put them in a classroom. Thirdly, what makes you think they want to be in a classroom? They chose to do this.” For all sorts of reasons, we are where we are because we’re here; there’s a reason, and you’re not going to fix that.

On the other hand, I already have twice that many people. I have more than doubled the number of people you claim you need who have true expertise in engineering, and they donate all their time free. They’re available, they’re here, they’re working. They’re like your people; every one of them is probably doing more for these kids in these programs than anybody you could find and pay because they’re passionate about what they do. They care about the issue; they’re trying to train these kids. You say, “Yes, but it’s not solving the problem. Building a robot didn’t help anybody,” but your whole point from the beginning was: I want to make a group of people that are better at forecasting. I would argue that first we want to make kids better at understanding engineering.                                 

In order to do that, we had to attract world class people in a way that’s fun. We didn’t say, “Do a job that has value today”; we worked hard to make it fun, like your competition. If you did try to turn it into something other than that, you would immediately take on the trappings of “It’s work, it’s progression, it’s a job,” and you’re going to go into the same gene pool of people that are already doing it professionally.

I would go to the other extreme and come up with some fun, exciting prognostications that you make public, make it clear that all the “experts” are nowhere to be seen, make sure all people with passion run out to do it and win big prizes because it’s fun. It will involve the public and you will make a first analogy to getting people that care about the issue to excite other people to care about an issue. They will all develop core talent and expertise by practice at those issues, and then they will later self-select to become the professionals that do it better than everybody else.                                 

You need to make sure it’s fun, make sure that the public can understand what’s going on. You’re very sophisticated, very academic. Everybody around this table finds things interesting that most of the public not only doesn’t find interesting, they can’t even understand what you’re talking about. You’ve got to make this thing much more human at some level, make it more exciting, and get more people involved in it, and let the rest of it take care of itself.              

Tetlock:    There was a throwaway line you said that I want to fixate on. You didn’t put a lot of emphasis on it, but it’s just exactly right. We are where we are for a reason. We have the stale status hierarchies we have, and the punditocracy in our democracy; they exist for a reason. The major players do not have strong incentives for unilateral defection, which, in game theory terms, means that we may be locked in a suboptimal equilibrium trap. It’s also true that we are where we are for a reason in another sense: Barb and I and the other people on the Good Judgment Project are academics; we don’t know how to do a lot of the things that need to be done to scale this up. We don’t have that skill set. We have some advisory capacity and we could be helpful, but to make that transition, someone else needs to take this over and do it, someone with the right connections, the right resources, the right skill sets beyond the academic. We are where we are for a reason. I like that. We should just close on that phrase.

John Brockman:    It seems to me that this group, this subject, is way too sophisticated for the general audience. If it were to be presented to a general audience, let’s say in a book, it would have to be addressed in a form that the so-called general audience (and I don’t believe there is a general audience) would be ready for. Right now, I don’t think there is a general audience that’s ready to hear this.

Tetlock:    We need to move to scale this up because the activity will be too easy to ignore otherwise.              

Kamen:    We don’t have enough kids doing math, science, engineering, and things that will matter to us and to their future, but it’s hard work and they don’t understand it, so let’s just talk to engineers. The whole point is that, given the right opportunity, given the right incentives to all kids, in particular, women and minorities who never thought they could or should or would be able, if they put their passion there, they can. The goal is to get way more people involved in something good for them and good for the future. To do it, you’re not dumbing it down, you’re just making it available in a way that the intellectuals, in this case the ivory towers of engineers, never did before.                                 

If you want the public not to support people that send them into dumb wars or enact policies that are self-destructive in a democracy, you can’t just assume that the elites will solve the problem. Your whole premise has been that we listen to the ones that are superficially dumb, that always have an easy answer that’s simple and easy to understand, but whose common element is that it’s typically wrong.

Given the right incentives, more people could get better at predicting properly, and maybe the public would care about that and watch for it. Instead of assuming you’ve got to just talk to the people that might be interested, you’ve got to redouble your efforts to make it something that is accessible and fun and appreciated by a much broader group of people, first by pointing out how important it is. I don’t think that’s hard to do. Everybody in the world wants one thing more than anything else: I want to know what’s going to happen to myself, to my kids, to the stock market. I don’t think it’s a hard premise to say to people, “Would you like to be able to predict the future better than you can now?” Who’s not going to like that? I don’t think you have to keep this thing at an esoteric, academic plane in order to get what you want out of it. You don’t have to dumb it down, either; you just have to figure out how to relate to a broader group of people.              

Brockman:    I would agree with most of what you said. I wouldn’t use the word “elites,” I would say odd, or interesting people.              

Wallach:     You look at the audience for existing punditry and it’s ravenous. People watch this all day, every day. Everyone has MSNBC on.

All they’re watching is people giving opinions. People also love competition. I can’t imagine, just in theory, that people wouldn’t like that punditry more if there were a competitive element layered onto it.

Ghonim:    It’s entertaining.     

Wallach:     It’s more entertaining, right? It makes it more dramatic.

Peter Lee:    This feels right to me as well, but one of the tricky things is that with something like robotics, policy-wise and politically, it’s neutral or positive. Where the questions come from and what they are, especially in the early…

Tetlock:    That was one reason for institutionalizing a setup in which each side would nominate questions; there would be some neutral ground. You don’t want the tournament to be perceived as somehow partisan. There has to be the appearance and the reality—as close to reality as you can get—of a level playing field. I have to say, what Dean has said captures almost perfectly what my hope is. I see no better note on which to close.