Will OpenAI release an LLM product or API that hallucinates 5x less than GPT-4 did when it was released, by June 30, 2025?
Question description
On June 10, 2023, Mustafa Suleyman, cofounder of DeepMind, tweeted that "LLM hallucinations will be largely eliminated by 2025." He elaborated that this meant 80% accuracy by June 30, 2025, though Gary Marcus, Riley Goodside, and others responded that this bar was too low, since current accuracy is already ~75% or ~59%, depending on the benchmark.
The GPT-4 paper, in section 5 on "Limitations", describes the problem of hallucinations, i.e. producing seemingly-reasonable text that is factually incorrect.
The GPT-4 paper uses two evaluations for factuality. One is an internal OpenAI set of "adversarially-designed factuality evaluations" (Figure 6), on which GPT-4 scored ~75% across nine domains, compared to ~55% for GPT-3.5.
The other is the public TruthfulQA benchmark, which "tests the model's ability to separate fact from an adversarially-selected set of incorrect statements" (Figure 7), on which GPT-4 scored ~59%, compared to ~47% for GPT-3.5-turbo.
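One way to interpret "hallucinates 5x less" is to treat the hallucination rate as 1 minus benchmark accuracy and divide that error rate by five. This is an assumption for illustration; the question's actual resolution criteria may define the threshold differently. Under that reading, the scores above imply the following target accuracies:

```python
# Sketch: one possible reading of "hallucinates 5x less than GPT-4",
# assuming hallucination rate = 1 - benchmark accuracy.
# (This interpretation is an assumption, not the official resolution rule.)

def required_accuracy(gpt4_accuracy: float, factor: float = 5.0) -> float:
    """Accuracy needed to cut the error (hallucination) rate by `factor`."""
    error_rate = 1.0 - gpt4_accuracy
    return 1.0 - error_rate / factor

# GPT-4's reported scores from the paper:
print(required_accuracy(0.75))  # internal factuality evals: 0.95
print(required_accuracy(0.59))  # TruthfulQA: 0.918
```

So under this reading, a qualifying model would need roughly 95% on the internal factuality evaluations or about 92% on TruthfulQA.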
Indicators
| Indicator | Value |
|---|---|
| Stars | ★★★☆☆ |
| Platform | Metaculus |
| Number of forecasts | 124 |
Capture
On June 10, 2023, Mustafa Suleyman, cofounder of DeepMind, tweeted that "LLM hallucinations will be largely eliminated by 2025." He elaborated this meant 80% accuracy by June 30, 2025, though Gary Marcus, Riley Goodside, and others responded this bar...
Embed
<iframe src="https://metaforecast.org/questions/embed/metaculus-17443" height="600" width="600" frameborder="0"></iframe>