On June 10, 2023, Mustafa Suleyman, cofounder of DeepMind, tweeted that "LLM hallucinations will be largely eliminated by 2025." He elaborated that this meant 80% accuracy by June 30, 2025, though Gary Marcus, Riley Goodside, and others responded that this bar was far too low, since current accuracy is already ~75% or ~59%, depending on the benchmark.
The GPT-4 paper, in section 5 ("Limitations"), describes the problem of hallucinations, i.e., producing seemingly reasonable text that is factually incorrect.
The paper evaluates factuality in two ways. One is an internal OpenAI set of "adversarially-designed factuality evaluations" (Figure 6), on which GPT-4 scored ~75% across 9 domains, compared to ~55% for GPT-3.5.
The other is a public benchmark, TruthfulQA, which "tests the model's ability to separate fact from an adversarially-selected set of incorrect statements" (Figure 7), on which GPT-4 scored ~59%, compared to ~47% for GPT-3.5-turbo.
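To make the TruthfulQA number concrete, below is a minimal sketch of how such an accuracy score can be computed over the benchmark's multiple-choice (MC1) split, using the Hugging Face `datasets` library. `pick_answer` is a hypothetical stand-in for whatever model is being evaluated; OpenAI's actual harness is not public, so this is illustrative only.

```python
# Minimal sketch: accuracy on TruthfulQA's MC1 multiple-choice split.
# Assumes `pip install datasets`; `pick_answer` is a hypothetical
# placeholder for the model under evaluation.
from datasets import load_dataset


def pick_answer(question: str, choices: list[str]) -> int:
    """Hypothetical model call: return the index of the chosen answer."""
    return 0  # placeholder; a real evaluation would query the model here


# TruthfulQA ships a single "validation" split of ~800 questions.
ds = load_dataset("truthful_qa", "multiple_choice")["validation"]

correct = 0
for row in ds:
    choices = row["mc1_targets"]["choices"]
    labels = row["mc1_targets"]["labels"]  # 1 = true answer, 0 = distractor
    if labels[pick_answer(row["question"], choices)] == 1:
        correct += 1

print(f"MC1 accuracy: {correct / len(ds):.1%}")
```

Under this kind of scoring, GPT-4's reported ~59% means it picks the factual statement over the adversarial distractors on roughly six of every ten questions.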
| Indicator | Value |
|---|---|
| Stars | ★★★☆☆ |
| Platform | Metaculus |
| Number of forecasts | 124 |