Will a flagship (>60T training bytes) open-weights LLM from Meta which doesn't use a tokenizer be released in 2025?

Manifold Markets ★★☆☆☆
Yes: 20% (Unlikely)

Question description #

Resolves YES if, in 2025, Meta releases weights of an LLM that is trained on at least 60T bytes of data (roughly equivalent to the 15T tokens used to train the Llama 3.1 models) and that does not use standard fixed-vocabulary tokenization.

A qualifying model must be released under a license roughly as permissive as the Llama 3.1 license.

This market was spurred by recent research from Meta showing a proof-of-concept for a tokenizer-free LLM. A qualifying model from Meta does not need to use the patching technique from this paper as long as it's not using tokenization.

https://ai.meta.com/research/publications/byte-latent-transformer-patches-scale-better-than-tokens/
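
For scale, the question's own numbers imply roughly 4 bytes per token. The sketch below works through that arithmetic and illustrates the distinction the market turns on: raw-byte input (a vocabulary of at most 256 symbols) versus a fixed-vocabulary tokenizer. The whitespace "tokenizer" here is a hypothetical toy stand-in for a real BPE vocabulary, not anything Meta ships.

# Bytes-per-token arithmetic implied by the question text:
bytes_threshold = 60e12   # 60T training bytes (the question's threshold)
llama31_tokens = 15e12    # 15T tokens used to train Llama 3.1
print(bytes_threshold / llama31_tokens)  # 4.0 bytes per token

text = "Patches scale better than tokens."

# Byte-level input: raw UTF-8 bytes, at most 256 distinct symbols.
# This is the kind of input a tokenizer-free model consumes.
byte_ids = list(text.encode("utf-8"))

# Fixed-vocabulary tokenization (what a qualifying model must NOT use),
# sketched with a toy whitespace vocabulary instead of a real BPE table.
toy_vocab = {w: i for i, w in enumerate(sorted(set(text.split())))}
token_ids = [toy_vocab[w] for w in text.split()]

print(len(byte_ids), "byte ids vs", len(token_ids), "toy token ids")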

Indicators #

Stars: ★★☆☆☆
Platform: Manifold Markets
Forecasters: 7
Volume: M1.2k

Capture #

Resizable preview: a capture of the question card (title, current forecast of 20% "Unlikely", truncated description, star rating, platform, forecaster count, and volume).
Last updated: 2025-05-19

Embed #

<iframe src="https://metaforecast.org/questions/embed/manifold-s6EP5PudyE" height="600" width="600" frameborder="0"></iframe>
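
The src path ends with the question ID (manifold-s6EP5PudyE), so the same pattern should embed any Metaforecast question; the height and width attributes can be adjusted to fit the host page.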
