If and when this graph is extended to 10^14 parameter models trained on 10^14 elapsed tokens of similar-quality data, will the 10^14 parameter learning curve have slowed down substantially?
Question description
Consider figure 15 from this paper.
Some people (arguably the authors of this paper) predict that as we scale models past GPT-3's size (the 10^11 parameter learning curve), models with parameter count X trained on X elapsed tokens will score close to the L(D) line at X elapsed tokens.
We are interested in whether, instead, the trendline will "plateau", or at least be substantially slower than the L(D) line, by the end of the next 3 orders of magnitude of parameter count. For the sake of specificity, "substantially slower" means less than half as steep as L(D) on this graph.
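For concreteness, here is one way to read the slope criterion, assuming (as a sketch; the question itself does not state this) that L(D) has the power-law form $L(D) = (D_c/D)^{\alpha_D}$ used in Kaplan et al.'s scaling-law work. On log-log axes that is a straight line with slope $-\alpha_D$:

$$\log L(D) = \alpha_D \log D_c - \alpha_D \log D,$$

so "less than half as steep as L(D)" amounts to the empirical learning curve's local log-log slope satisfying $\left|\frac{d \log L}{d \log D}\right| < \alpha_D / 2$ over the extrapolated token range.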
Indicators
| Indicator | Value |
|---|---|
| Stars | ★★★☆☆ |
| Platform | Metaculus |
| Number of forecasts | 109 |
Embed
<iframe src="https://metaforecast.org/questions/embed/metaculus-6939" height="600" width="600" frameborder="0"></iframe>