If and when this graph is extended to 10^14 parameter models trained on 10^14 elapsed tokens of similar-quality data, will the 10^14 parameter learning curve have slowed down substantially?
Question description
Consider figure 15 from this paper.
Some people (arguably the authors of this paper) predict that as we scale models past GPT-3's size (the 10^11 parameter learning curve), models with parameter count X trained on X elapsed tokens will score close to the L(D) line at X elapsed tokens.
We are interested in whether, instead, the trendline will "plateau", or at least be substantially slower than the L(D) line, by the end of the next 3 orders of magnitude of parameter count. For the sake of specificity, "substantially slower" means less than half as steep as L(D) on this graph.
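For concreteness, here is one way to read the slope criterion, assuming (as a sketch; the question itself does not state this) that L(D) has the power-law form $L(D) = (D_c/D)^{\alpha_D}$ used in Kaplan et al.'s scaling-law work. On log-log axes that is a straight line with slope $-\alpha_D$:

$$\log L(D) = \alpha_D \log D_c - \alpha_D \log D,$$

so "less than half as steep as L(D)" amounts to the empirical learning curve's local log-log slope satisfying $\left|\frac{d \log L}{d \log D}\right| < \alpha_D / 2$ over the extrapolated token range.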
Indicators
| Indicator | Value |
|---|---|
| Stars | ★★★☆☆ |
| Platform | Metaculus |
| Number of forecasts | 109 |
Embed
<iframe src="https://metaforecast.org/questions/embed/metaculus-6939" height="600" width="600" frameborder="0"></iframe>