2 Sources
[1]
Bolmo's architecture unlocks efficient byte‑level LM training without sacrificing quality
Enterprises that want tokenizer-free multilingual models are increasingly turning to byte-level language models to reduce brittleness in noisy or low-resource text. To tap into that niche -- and make it practical at scale -- the Allen Institute for AI (Ai2) introduced Bolmo, a new family of models that leverages its Olmo 3 models by "byteifying" them and reusing their backbone and capabilities. The company launched two versions, Bolmo 7B and Bolmo 1B, which it describes as "the first fully open byte-level language model." The company said the two models performed competitively with -- and in some cases surpassed -- other byte-level and character-based models.

Byte-level language models operate directly on raw UTF-8 bytes, eliminating the need for a predefined vocabulary or tokenizer. This allows them to handle misspellings, rare languages, and unconventional text more reliably -- key requirements for moderation, edge deployments, and multilingual applications. For enterprises deploying AI across multiple languages, noisy user inputs, or constrained environments, tokenizer-free models offer a way to reduce operational complexity. Ai2's Bolmo is an attempt to make that approach practical at scale -- without retraining from scratch.

How Bolmo works and how it was built

Ai2 said it trained the Bolmo models using its Dolma 3 data mix, which helped train its flagship Olmo models, along with open code datasets and character-level data. The company said its goal "is to provide a reproducible, inspectable blueprint for byteifying strong subword language models in a way the community can adopt and extend." To meet this goal, Ai2 will release its checkpoints, code, and a full paper to help other organizations build byte-level models on top of its Olmo ecosystem.

Since training a byte-level model completely from scratch can get expensive, Ai2 researchers instead chose an existing Olmo 3 7B checkpoint to byteify in two stages. In the first stage, Ai2 froze the Olmo 3 transformer and trained only certain components, such as the local encoder and decoder, the boundary predictor, and the language modeling head. This stage was designed to be "cheap and fast," requiring just 9.8 billion tokens. The second stage unfreezes the model and continues training on additional tokens. (A simplified sketch of this two-stage setup follows this article.) Ai2 said the byte-level approach allows Bolmo to avoid the vocabulary bottlenecks that limit traditional subword models.

Strong performance among its peers

Byte-level language models are not as mainstream as small language models or LLMs, but this is a growing field in research. Meta released its BLT architecture research last year, aiming to offer a model that is robust, processes raw data, and doesn't rely on fixed vocabularies. Other research models in this space include ByT5, Stanford's MrT5, and Canine.

Ai2 evaluated Bolmo using its evaluation suite, covering math, STEM reasoning, question answering, general knowledge, and code. Bolmo 7B showed strong performance, posting strong results on character-focused benchmarks like CUTE and EXECUTE and improving accuracy over the base Olmo 3 LLM. Bolmo 7B also outperformed models of comparable size in coding, math, multiple-choice QA, and character-level understanding.

Why enterprises may choose byte-level models

Enterprises find value in a hybrid model structure, using a mix of models and model sizes. Ai2 makes the case that organizations should also consider byte-level models, not only for robustness and multilingual understanding but also because the approach "naturally plugs into an existing model ecosystem."
"A key advantage of the dynamic hierarchical setup is that compression becomes a toggleable knob," the company said. For enterprises already running heterogeneous model stacks, Bolmo suggests that byte-level models may no longer be purely academic. By retrofitting a strong subword model rather than training from scratch, Ai2 is signaling a lower-risk path for organizations that want robustness without abandoning existing infrastructure.
[2]
World's first fully open byte-level AI models: Bolmo 7B and 1B explained, and how they differ
Allen Institute Bolmo models challenge tokenized LLMs with raw byte processing

The Allen Institute for AI (Ai2) has released Bolmo, a new family of AI models that represents a shift in how machines can process language. While byte-level architectures like ByT5 and Byte Latent Transformer have existed in research circles, Bolmo 7B and 1B are distinct as the first fully open model family to implement this approach competitively at these parameter scales. This release addresses specific limitations found in standard Large Language Models (LLMs) such as Llama or GPT. By removing the tokenizer and reading raw text as bytes, these models offer a robust alternative for handling noisy data and languages that are often underrepresented in standard token vocabularies.

Standard LLMs do not read text character by character. They rely on a tokenizer that breaks text into chunks called tokens based on a fixed vocabulary. This works well for standard English but can be brittle when facing typos or rare words that fall outside that list. Bolmo eliminates the tokenizer. It reads raw UTF-8 bytes directly, and its vocabulary consists of the 256 possible values of a byte. This allows the model to process the atomic structure of data without relying on a predefined dictionary. The result is a model that is natively robust to spelling errors and "noisy" text because it cannot encounter an "unknown" token. The researchers achieved this by adapting existing high-performance models using a technique called "byteification," effectively retrofitting a byte-level input mechanism onto a standard transformer architecture.

The two models in this release differ primarily in their base architecture and intended scale. Bolmo 7B is the larger and more capable model. It was adapted from the Olmo 3 7B base. Because it builds upon the newer Olmo 3 architecture, it retains strong general-purpose capabilities. In benchmarks, it demonstrates that a byte-level model can achieve performance parity with standard token-based models of the same size, while showing particular strength in tasks requiring character-level manipulation. Bolmo 1B is the smaller variant, derived from the Olmo 2 1B base. Its smaller parameter count makes it significantly faster and less computationally intensive to run. While it does not share the same advanced base architecture as the 7B version, it provides a functional entry point for byte-level processing on hardware with limited resources.

The significance of Bolmo is not that it renders tokenizers obsolete, but that it proves tokenizer-free models can be competitive. The documentation shows that these models do not suffer the significant performance penalty often associated with byte-level processing in the past. They offer developers and researchers a functional, open-weight option for applications where standard tokenization fails, such as processing garbled text, complex code strings, or morphologically complex languages.
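To see what "reading raw UTF-8 bytes" means in practice, the short Python sketch below shows how any string, including typos and non-Latin scripts, maps onto IDs drawn from a fixed 256-entry vocabulary. This is a generic illustration of byte-level input, not Bolmo's actual input pipeline.

```python
def to_byte_ids(text: str) -> list[int]:
    """Encode text as UTF-8 and return one ID per byte (values 0-255)."""
    return list(text.encode("utf-8"))

# A clean word, a misspelled word, and a non-Latin word all map onto
# the same 256-value vocabulary -- there is no "unknown token" case.
for word in ["language", "langauge", "भाषा"]:
    print(word, "->", to_byte_ids(word))

# A subword tokenizer with a fixed vocabulary would typically split the
# typo and the Devanagari word into rare or fallback tokens, whereas the
# byte view degrades gracefully.
```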
The Allen Institute for AI has released Bolmo 7B and 1B, the first fully open byte-level models that eliminate tokenizers by processing raw UTF-8 bytes directly. Built by adapting Olmo 3 models through a byteification technique, these models handle misspellings, rare languages, and noisy text more reliably than traditional tokenized LLMs while maintaining competitive performance.
The Allen Institute for AI (Ai2) has unveiled Bolmo, a family of open byte-level language models that marks a significant shift in how AI systems process text [1]. The release includes Bolmo 7B and 1B, which Ai2 describes as "the first fully open byte-level language model," designed to handle multilingual applications and noisy text without relying on traditional tokenization [1]. Unlike standard LLMs such as GPT or Llama that break text into predefined token chunks, these tokenizer-free models operate directly on raw UTF-8 bytes, using a vocabulary of just 256 possible byte values [2].

Byte-level models eliminate the brittleness inherent in traditional subword models by processing raw text as bytes. Standard tokenization works well for common English text but struggles with typos, rare words, and underrepresented languages that fall outside fixed vocabularies [2]. By reading text at the atomic byte level, Bolmo cannot encounter an "unknown" token, making it natively robust to noisy text, spelling errors, and unconventional inputs [2]. This approach proves particularly valuable for enterprises deploying AI across moderation systems, edge deployments, and multilingual applications where reliability matters more than perfect accuracy on clean data [1].

Rather than training from scratch, Ai2 researchers developed a cost-effective approach by adapting Olmo 3 models using what they call "byteification" [2]. The process occurred in two stages: first, researchers froze the Olmo 3 transformer while training only specific components, such as the local encoder and decoder, the boundary predictor, and the language modeling head, using just 9.8 billion tokens [1]. The second stage unfroze the model and trained it on additional tokens from Ai2's Dolma 3 data mix, which also powered the flagship Olmo models, along with open code datasets and character-level data [1]. This retrofitting approach signals a lower-risk path for organizations wanting robustness without abandoning existing infrastructure [1].

Bolmo 7B demonstrated strong performance across Ai2's evaluation suite covering math, STEM reasoning, question answering, general knowledge, and code [1]. The model posted strong results on character-focused benchmarks like CUTE and EXECUTE while also improving accuracy over the base Olmo 3 LLM [1]. In tasks requiring character-level manipulation, coding, math, and multiple-choice QA, Bolmo 7B surpassed models of comparable size [1]. The documentation shows these models achieve performance parity with standard token-based models without suffering the significant performance penalty historically associated with byte-level processing [2]. While byte-level models remain less mainstream than typical LLMs, the field is growing with research efforts like Meta's BLT architecture, ByT5, Stanford's MrT5, and Canine [1].

Bolmo 1B, derived from the Olmo 2 1B base, offers a smaller parameter count that makes it faster and less computationally intensive for hardware with limited resources [2]. Ai2 positions these open byte-level language models as part of a hybrid model strategy, arguing that organizations should consider them not only for robustness and multilingual understanding but also because the technology "naturally plugs into an existing model ecosystem" [1]. The dynamic hierarchical setup makes compression a toggleable feature, offering flexibility for enterprises already running heterogeneous model stacks [1]. To support adoption, Ai2 will release model checkpoints, code, and a full paper, providing what the company calls "a reproducible, inspectable blueprint for byteifying strong subword models in a way the community can adopt and extend" [1]. This open-source approach enables developers to build functional solutions for applications where standard tokenization fails, such as processing garbled text, complex code strings, or morphologically complex languages [2].
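The "compression as a toggleable knob" idea behind the dynamic hierarchical setup can be pictured as a boundary score deciding how many raw bytes get grouped into each patch before the expensive backbone runs. The sketch below is a hypothetical simplification for intuition, not Ai2's boundary predictor; the threshold stands in for the knob.

```python
import torch

def patch_starts(scores: torch.Tensor, threshold: float) -> torch.Tensor:
    """Given per-byte boundary scores, mark where a new patch begins.
    A lower threshold yields more, shorter patches (less compression);
    a higher threshold yields fewer, longer patches (more compression)."""
    starts = scores > threshold
    starts[0] = True                 # the first byte always opens a patch
    return starts

scores = torch.rand(32)              # stand-in for boundary-predictor outputs over 32 bytes
for threshold in (0.3, 0.7):
    n_patches = int(patch_starts(scores, threshold).sum())
    print(f"threshold={threshold}: 32 bytes -> {n_patches} patches")
```

Turning the threshold up or down trades sequence length against per-step cost, which is why a hierarchical byte model can expose compression as a runtime setting rather than a fixed tokenizer choice.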