Will OpenAI release an LLM product or API that hallucinates 5x less than GPT-4 did when it was released, by June 30, 2025?
Question description
On June 10, 2023, Mustafa Suleyman, cofounder of DeepMind, tweeted that "LLM hallucinations will be largely eliminated by 2025." He elaborated that this meant 80% accuracy by June 30, 2025, though Gary Marcus, Riley Goodside, and others responded that this bar was too low, since current accuracy is already ~75% or ~59%, depending on the benchmark.
The GPT-4 paper, in section 5 on "Limitations", describes the problem of hallucinations, i.e. producing seemingly-reasonable text that is factually incorrect.
The GPT-4 paper uses two evaluations for factuality. One is an internal OpenAI set of "adversarially-designed factuality evaluations" (Figure 6), on which GPT-4 scored ~75% across nine domains, compared to ~55% for GPT-3.5.
The other is the public TruthfulQA benchmark, which "tests the model's ability to separate fact from an adversarially-selected set of incorrect statements" (Figure 7), on which GPT-4 scored ~59%, compared to ~47% for GPT-3.5-turbo.
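One way to interpret "hallucinates 5x less" is to treat the hallucination rate as 1 minus benchmark accuracy and divide that error rate by five. This is an assumption for illustration; the question's actual resolution criteria may define the threshold differently. Under that reading, the scores above imply the following target accuracies:

```python
# Sketch: one possible reading of "hallucinates 5x less than GPT-4",
# assuming hallucination rate = 1 - benchmark accuracy.
# (This interpretation is an assumption, not the official resolution rule.)

def required_accuracy(gpt4_accuracy: float, factor: float = 5.0) -> float:
    """Accuracy needed to cut the error (hallucination) rate by `factor`."""
    error_rate = 1.0 - gpt4_accuracy
    return 1.0 - error_rate / factor

# GPT-4's reported scores from the paper:
print(required_accuracy(0.75))  # internal factuality evals: 0.95
print(required_accuracy(0.59))  # TruthfulQA: 0.918
```

So under this reading, a qualifying model would need roughly 95% on the internal factuality evaluations or about 92% on TruthfulQA.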
Indicators
| Indicator | Value |
|---|---|
| Stars | ★★★☆☆ |
| Platform | Metaculus |
| Number of forecasts | 124 |
Capture
On June 10, 2023, Mustafa Suleyman, cofounder of DeepMind, tweeted that "LLM hallucinations will be largely eliminated by 2025." He elaborated this meant 80% accuracy by June 30, 2025, though Gary Marcus, Riley Goodside, and others responded this bar...
Embed
<iframe src="https://metaforecast.org/questions/embed/metaculus-17443" height="600" width="600" frameborder="0"></iframe>