Will OpenAI release an LLM product or API that hallucinates 5x less than GPT-4 did when it was released, by June 30, 2025?

Metaculus
★★★☆☆
80%
Likely
Yes

Question description

On June 10, 2023, Mustafa Suleyman, cofounder of DeepMind, tweeted that "LLM hallucinations will be largely eliminated by 2025." He later clarified that he meant 80% accuracy by June 30, 2025; Gary Marcus, Riley Goodside, and others responded that this bar was far too low, since current accuracy is already ~75% or ~59%, depending on the benchmark.

The GPT-4 paper, in section 5 on "Limitations", describes the problem of hallucinations, i.e. producing seemingly-reasonable text that is factually incorrect.

The GPT-4 paper uses two evals for factuality. One is an internal OpenAI set of "adversarially-designed factuality evaluations" (Figure 6), on which GPT-4 scored ~75% across 9 domains, compared to ~55% for GPT-3.5.

The other is a public benchmark, TruthfulQA, which "tests the model's ability to separate fact from an adversarially-selected set of incorrect statements" (Figure 7), on which GPT-4 scored ~59%, compared to ~47% for GPT-3.5-turbo.
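To make the resolution bar concrete, here is a minimal worked calculation. It assumes (this interpretation is mine, not the question's official resolution criteria) that "hallucinates 5x less" means a fivefold reduction in the error rate, taken as 1 − accuracy on each benchmark:

```python
# GPT-4's release-time accuracy on the two factuality evals cited above.
gpt4_accuracy = {
    "internal adversarial evals": 0.75,
    "TruthfulQA": 0.59,
}

# Assumption: "5x less hallucination" = error rate (1 - accuracy) divided by 5.
for name, accuracy in gpt4_accuracy.items():
    error = 1 - accuracy          # GPT-4's hallucination rate at release
    target_error = error / 5      # fivefold reduction in errors
    target_accuracy = 1 - target_error
    print(f"{name}: error {error:.0%} -> target accuracy {target_accuracy:.1%}")
```

Under this reading, a qualifying model would need roughly 95% on the internal evals and roughly 92% on TruthfulQA, both well above Suleyman's 80% figure.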

Indicators

Indicator | Value
--- | ---
Stars | ★★★☆☆
Platform | Metaculus
Number of forecasts | 124

Capture


Last updated: 2024-10-07

Embed

<iframe src="https://metaforecast.org/questions/embed/metaculus-17443" height="600" width="600" frameborder="0"></iframe>