If and when this graph is extended to 10^14 parameter models trained on 10^14 elapsed tokens of similar-quality data, will the 10^14 parameter learning curve have slowed down substantially?

Metaculus
★★★☆☆
29%
Unlikely

Question description

Consider figure 15 from this paper.

Some people (arguably the authors of this paper) predict that as we scale models past GPT-3's size (the 10^11 parameter learning curve), models with parameter count X trained on X elapsed tokens will score close to the L(D) line at X elapsed tokens.

We are interested in whether the trendline will instead "plateau", or at least become substantially slower than the line L(D), by the end of the next 3 orders of magnitude of parameter count. For the sake of specificity, let's say "substantially slower" means less than half as steep as L(D) on this graph.
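
One way to read the "half as steep" criterion: on a log-log plot, a power law L(D) = (D_c/D)^α is a straight line of slope -α, so the question is whether the observed learning curve's log-log slope falls below α/2 in magnitude. Below is a minimal Python sketch of that comparison; the exponent, constant, and observed losses are all illustrative placeholders, not values taken from the paper.

```python
import numpy as np

# Assumed power-law form for the data-scaling line: L(D) = (D_c / D) ** ALPHA_D.
# Both constants below are hypothetical placeholders, not the paper's fitted values.
ALPHA_D = 0.095   # assumed data-scaling exponent
D_C = 5.4e13      # assumed constant (tokens)

def l_of_d(d_tokens):
    """Loss predicted by the L(D) power law at d_tokens elapsed tokens."""
    return (D_C / d_tokens) ** ALPHA_D

def loglog_slope(d1, d2, l1, l2):
    """Slope between two points on a log-log plot of loss vs. elapsed tokens."""
    return (np.log(l2) - np.log(l1)) / (np.log(d2) - np.log(d1))

# Slope of L(D) itself; for a pure power law this is exactly -ALPHA_D.
d1, d2 = 1e12, 1e14
ld_slope = loglog_slope(d1, d2, l_of_d(d1), l_of_d(d2))

# Suppose the 10^14-parameter learning curve is measured at the same two
# token counts (these losses are made up for illustration):
observed_slope = loglog_slope(d1, d2, 2.10, 1.95)

# Resolution sketch: "substantially slower" = less than half as steep as L(D).
plateaued = abs(observed_slope) < 0.5 * abs(ld_slope)
print(f"L(D) slope: {ld_slope:.3f}, observed: {observed_slope:.3f}, "
      f"plateaued: {plateaued}")
```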

Indicators

Indicator            Value
Stars                ★★★☆☆
Platform             Metaculus
Number of forecasts  109

Capture

Last updated: 2024-04-23

Embed

<iframe src="https://metaforecast.org/questions/embed/metaculus-6939" height="600" width="600" frameborder="0"></iframe>