The Intelligence Advanced Research Projects Activity (IARPA) is sponsoring the Good Judgment Project, a four-year research study organized as part of a government-sponsored forecasting tournament. Thousands of people around the world predict global events. Their collective forecasts are surprisingly accurate.
The Good Judgment research team is based at the University of Pennsylvania and the University of California, Berkeley. The project is led by psychologists Philip Tetlock, author of Expert Political Judgment; Barbara Mellers, an expert on judgment and decision-making; and Don Moore, an expert on overconfidence. Other team members are experts in psychology, economics, statistics, computer science, and interface design.
Experts from the Intelligence Advanced Research Projects Activity responded to questions in late January.
Q: Many organizations have pointed to your groundbreaking work. Have you been surprised by the results of the project so far? I understand a pharmacist and a professor of public policy and international affairs are two of your “superforecasters” due to the accuracy of their predictions. Some media reports have suggested that the forecasters are more intuitive than the intelligence analysts?
A: We were pleasantly surprised by several findings, including the performance boosts from teaming and training and, of course, the discovery of our “superforecasters.” It’s gratifying to find evidence that forecasting is a skill that can be learned and that forecasters who achieve outstanding results in one year tend to be among the top performers in subsequent years and do not merely “regress toward the mean.”
Q: How are participants selected? Other than asking if applicants have an academic degree, do you seek to select participants?
A: Participants self-select based on their interest and willingness to devote time to the forecasting tournament. Even the “requirement” to have a bachelor’s degree can be waived for adults who have expressed interest in the project but have not completed their undergraduate degree. Interested readers can still sign up to participate at http://www.goodjudgmentproject.com.
Q: What do you think draws people to participate? What training do they receive? I understand all the materials are unclassified? Do participants work individually or always in teams?
A: Participants are drawn to the project for diverse reasons, including a desire to improve their own forecasting skills, interest in world affairs and a desire to contribute to research that may lead to more reliable forecasts for the IC and other organizations. Many say they find the tournament fun and that they enjoy competition for its own sake.
We have several modes of forecasting: some participants work individually, while others are assigned to small forecasting teams; some make their forecasts in a survey environment, whereas others place play-money bets in prediction markets.
The training participants receive depends on their (random) assignment to an experimental group, because we are always testing the efficacy of our training materials. Many participants receive what we call “probabilistic reasoning training” that includes tips for how to avoid common cognitive biases. We also provide training that is specific to the forecasting environment (individual versus team, survey versus prediction market).
Q: I understand that machine reasoning via the advanced algorithms used generally beats out human reasoning in predicting events. Is there any part of the analysis process where humans are more adept?
A: The Good Judgment Project actually blends human forecasting with algorithms: we use aggregation algorithms to combine individual human forecasts into the most accurate possible crowd-sourced forecasts. Although forecasting using computer models works well if there are many similar events to be forecast (such as sales of thousands of similar products), forecasting geopolitical events without human input is much harder. For example, it isn’t obvious what the right comparison class is when estimating the probability of a coup in a particular central African country, and there are not thousands of comparable historical events to draw on.
One of our research colleagues, Jay Ulfelder, has been collecting data that directly compare human and machine-generated forecasts as part of the Early Warning Project. These events have extremely low base rates, so machine-generated forecasts tend to do very well for the vast majority of cases because the algorithms predict (correctly) that the events won’t occur. Preliminary results, however, suggest that human forecasters may have an edge in detecting changes in the level of risk over time, something that is obviously of great importance to policymakers.
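The base-rate point above can be illustrated with a small scoring sketch. It assumes forecasts are scored with a Brier score (mean squared error between the stated probability and the 0/1 outcome); the interview does not name the scoring rule, and the data below are hypothetical, not Early Warning Project results.

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between probability forecasts and 0/1 outcomes.

    Lower is better; 0.0 is a perfect score.
    """
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(outcomes)


# Hypothetical data: 100 cases, of which only 2 saw the rare event occur.
outcomes = [1, 1] + [0] * 98

# A forecaster that always states the 2% base rate for every case.
always_base_rate = [0.02] * 100

# A forecaster that detects elevated risk in the two cases where the event occurs.
risk_aware = [0.60, 0.60] + [0.02] * 98

base_rate_score = brier_score(always_base_rate, outcomes)
risk_aware_score = brier_score(risk_aware, outcomes)

print(f"always base rate: {base_rate_score:.4f}")
print(f"risk aware:       {risk_aware_score:.4f}")
```

In this made-up example the constant base-rate forecast already scores close to zero because almost every case is a non-event, which is why spotting the few cases where risk has shifted is where a forecaster can still add value.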
Q: This is Season 4 for the GJP. I understand there will be several modes of forecasting and/or participation in “side experiments” — can you talk about what the questions will be?
A: This year’s forecasting questions are grouped into several “clusters” that represent the broad policy questions on which we’re hoping to shed light. The clusters include:
- Will China become more confrontational in the Asia-Pacific and/or towards the United States?
- Will Iran become more cooperative in the Middle East and/or towards the United States?
- Will Russia become more confrontational towards its neighbors, Europe, and/or the United States?
- Will the global economy become more volatile?
- Will North Korea become more confrontational in the Asia-Pacific and/or towards the United States?
- Will Europe move towards tighter economic and political integration?
- Will environmental issues and/or resource scarcity affect interstate relations and/or regional/global stability?
- Will the global trade regime become more protectionist?
- Will domestic or bilateral conflicts in the Middle East/North Africa contribute to regional instability?
Within each cluster, we offer numerous specific forecasting questions. For example, within the cluster about European economic and political integration, we asked a question in fall 2014 about whether voters in Scotland would pass the independence referendum, and within the Iran cluster, we have a question currently open that asks when Iran will release Jason Rezaian, the Washington Post’s Tehran bureau chief, who has been detained for over five months. We also have a “grab bag” category for questions that don’t fall within any of the clusters, but are nonetheless of significant geopolitical interest.
Q: Have you shared the results of the project with the intelligence community yet? Is it too early to determine how the results of the project will be used — perhaps to change hiring practices — or to improve outcomes by assembling teams based on the results of your study?
A: GJP regularly sends the forecast results to IARPA for data analysis.
Q: What role has technology played in the participants’ forecasting?
A: Participation occurs entirely online, so forecasters use their own technology (computer and Internet connection) to access the background data survey, training materials and forecasting websites. The forecasting websites themselves are part of the technology package. GJP has developed a survey platform, with a special emphasis on tools to assist team-based forecasting. It has also worked with other vendors to enhance forecasting via prediction markets and via a special platform designed for an experiment we’re running in Year 4. That experiment tests how best we can elicit and aggregate forecasts on “continuous” outcomes (developing probability estimates for the entire range of dates during which an outcome of interest, such as a negotiated end to a conflict, might occur).
And, of course, most participants also use web searches to locate information that may help them to make more accurate forecasts. Finally, as noted above, the Good Judgment Project research team makes extensive use of aggregation algorithms to combine individual forecasts into “wisdom-of-the-crowd” predictions that are generally more accurate than the forecasts of any single participant.
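As a rough illustration of the “wisdom-of-the-crowd” idea, the sketch below averages several forecasters’ probabilities and compares the result with each individual. It assumes the simplest possible aggregation rule (an unweighted mean) and a Brier score for accuracy; it is not the project’s actual aggregation algorithm, and the forecasters and numbers are hypothetical.

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between probability forecasts and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(outcomes)


# Hypothetical outcomes for four yes/no questions (1 = event occurred).
outcomes = [1, 0, 1, 0]

# Hypothetical probability forecasts from three participants, one per question.
forecasts = {
    "A": [0.9, 0.1, 0.4, 0.5],
    "B": [0.5, 0.2, 0.9, 0.6],
    "C": [0.6, 0.6, 0.8, 0.1],
}

# Unweighted mean across participants for each question (assumed rule).
crowd = [sum(ps) / len(ps) for ps in zip(*forecasts.values())]

crowd_score = brier_score(crowd, outcomes)
individual_scores = {
    name: brier_score(ps, outcomes) for name, ps in forecasts.items()
}

for name, score in individual_scores.items():
    print(f"forecaster {name}: {score:.4f}")
print(f"crowd mean:   {crowd_score:.4f}")
```

In this example each participant is badly off on a different question, so the errors partly cancel in the average and the crowd forecast scores better than every individual. That outcome depends on the data chosen here; a plain mean is only guaranteed to do at least as well as the average individual score, not to beat the best forecaster.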
Q: How does the GJP fit into the Aggregative Contingent Estimation program?
A: IARPA’s ACE Program is sponsoring the GJP research. The goal of the Aggregative Contingent Estimation program is to generate accurate and timely probabilistic forecasts for geopolitical events by aggregating the judgments of many widely dispersed analysts. The approach has been to elicit, weight, and combine independent forecasts from over 15,000 research participants, using information about the participants and their patterns of judgment. The potential impact of the program is to improve intelligence estimates to support decision-makers, identify the most accurate analysts, and measure the effects of analytic training and tradecraft. ACE has achieved a 50-plus percent reduction in error compared to the current state of the art.
IARPA ACE contracts were competitively awarded; at the start of Year 1 of the program in 2011, five ACE performer teams were funded. The ACE program was structured as a forecasting tournament, with teams competing with each other to develop the most accurate forecasting methods. One performer, the Good Judgment Project, led by researchers at the University of Pennsylvania and UC Berkeley, outperformed the other teams on accuracy by such a significant margin that GJP was the only ACE team IARPA funded beginning in Year 3 of the program.