Allen Institute for AI launches Bolmo, the first fully open byte-level models for robust text processing

The Allen Institute for AI has released Bolmo 7B and 1B, the first fully open byte-level models that eliminate tokenizers by processing raw UTF-8 bytes directly. Built by adapting Olmo 3 models through a byteification technique, these models handle misspellings, rare languages, and noisy text more reliably than traditional tokenized LLMs while maintaining competitive performance.

Allen Institute for AI Introduces Bolmo to Transform Language Processing

The Allen Institute for AI (Ai2) has unveiled Bolmo, a family of open byte-level language models that marks a significant shift in how AI systems process text [1]. The release includes Bolmo 7B and 1B, which Ai2 describes as "the first fully open byte-level language model," designed to handle multilingual applications and noisy text without relying on traditional tokenization [1]. Unlike standard LLMs such as GPT or Llama, which break text into predefined token chunks, these tokenizer-free models operate directly on raw UTF-8 bytes, using a vocabulary of just 256 possible byte values [2].
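To make the tokenizer-free idea concrete, here is a minimal Python sketch (the sample strings are illustrative, not from Ai2's materials) showing how arbitrary text, including accented characters and non-Latin scripts, reduces to UTF-8 byte values drawn from a fixed set of 256 symbols:

```python
# Any text, in any script, reduces to a sequence of UTF-8 byte values (0-255),
# so a byte-level model needs a vocabulary of only 256 symbols.
samples = ["hello", "héllo", "こんにちは", "misspeling!"]
for text in samples:
    byte_ids = list(text.encode("utf-8"))
    print(f"{text!r} -> {len(byte_ids)} bytes: {byte_ids}")
```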

How Byte-Level Models Address Vocabulary Bottlenecks

Byte-level models eliminate the brittleness inherent in traditional subword models by processing raw text as bytes. Standard tokenization works well for common English text but struggles with typos, rare words, and underrepresented languages that fall outside fixed vocabularies [2]. Because it reads text at the atomic byte level, Bolmo can never encounter an "unknown" token, making it natively robust to noisy text, spelling errors, and unconventional inputs [2]. This approach is particularly valuable for enterprises deploying AI across moderation systems, edge deployments, and multilingual applications where reliability matters more than perfect accuracy on clean data [1].
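As a minimal illustration of why there can be no unknown token at the byte level, the toy sketch below contrasts a hypothetical fixed subword vocabulary, which collapses unseen spellings into a catch-all `<unk>` id, with plain UTF-8 byte ids; it is not Ai2's tokenizer or code:

```python
# Toy contrast between a fixed subword vocabulary and byte-level input.
# The tiny vocabulary is hypothetical; real tokenizers are far larger,
# but the failure mode for out-of-vocabulary strings is the same.
subword_vocab = {"the": 0, "cat": 1, "sat": 2, "<unk>": 3}

def subword_ids(words):
    # Unseen words collapse to the single "<unk>" id, losing their spelling.
    return [subword_vocab.get(w, subword_vocab["<unk>"]) for w in words]

def byte_ids(text):
    # Bytes are never out of vocabulary: every input maps to values 0-255.
    return list(text.encode("utf-8"))

print(subword_ids(["the", "caat", "sat"]))  # typo becomes <unk>
print(byte_ids("the caat sat"))             # typo is preserved byte by byte
```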


Adapting Olmo 3 Models Through Byteification Technique

Rather than training from scratch, Ai2 researchers developed a cost-effective approach that adapts Olmo 3 models using what they call "byteification" [2]. The process occurred in two stages: first, researchers froze the Olmo 3 transformer and trained only specific components, namely the local encoder and decoder, the boundary predictor, and the language modeling head, using just 9.8 billion tokens [1]. The second stage unfroze the model and trained it on additional tokens from Ai2's Dolma 3 data mix, which also powered the flagship Olmo models, along with open code datasets and character-level data [1]. This retrofitting approach signals a lower-risk path for organizations that want robustness without abandoning existing infrastructure [1].
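The staged freezing and unfreezing at the heart of byteification can be sketched in a few lines of PyTorch; the toy model below and its module names (local_encoder, boundary_predictor, backbone, local_decoder, lm_head) are illustrative assumptions for the sake of the example, not Ai2's released code:

```python
import torch
import torch.nn as nn

# Toy stand-in for a byteified model; the real Bolmo components differ.
class ToyByteLM(nn.Module):
    def __init__(self, d=64):
        super().__init__()
        self.local_encoder = nn.Embedding(256, d)      # embeds raw byte ids
        self.boundary_predictor = nn.Linear(d, 1)      # scores patch boundaries
        self.backbone = nn.TransformerEncoderLayer(d, nhead=4, batch_first=True)
        self.local_decoder = nn.Linear(d, d)
        self.lm_head = nn.Linear(d, 256)               # predicts the next byte

model = ToyByteLM()

# Stage 1: freeze the pretrained backbone and train only the byte-level parts
# (local encoder/decoder, boundary predictor, LM head).
for p in model.backbone.parameters():
    p.requires_grad = False
stage1_params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(stage1_params, lr=1e-4)

# Stage 2: unfreeze everything and continue training on the full data mix.
for p in model.parameters():
    p.requires_grad = True
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
```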

Competitive Performance Against Established Benchmarks

Bolmo 7B demonstrated strong performance across Ai2's evaluation suite covering math, STEM reasoning, question answering, general knowledge, and code [1]. It performed especially well on character-focused benchmarks such as CUTE and EXECUTE while also improving accuracy over the base Olmo 3 LLM [1]. In tasks requiring character-level manipulation, coding, math, and multiple-choice QA, Bolmo 7B surpassed models of comparable size [1]. The documentation shows these models achieve performance parity with standard token-based transformer systems without the significant performance penalty historically associated with byte-level processing [2]. While byte-level models remain less mainstream than typical LLMs, the field is growing, with research efforts such as Meta's BLT architecture, ByT5, Stanford's MrT5, and Canine [1].

Practical Applications for Enterprise Deployments

Bolmo 1B, derived from the Olmo 2 1B base, offers a smaller parameter count that makes it faster and less computationally demanding on resource-constrained hardware [2]. Ai2 positions these open byte-level language models as part of a hybrid model strategy, arguing that organizations should consider them not only for robustness and multilingual understanding but also because the technology "naturally plugs into an existing model ecosystem" [1]. The dynamic hierarchical setup makes compression a toggleable feature, offering flexibility for enterprises already running heterogeneous model stacks [1]. To support adoption, Ai2 will release model checkpoints, code, and a full paper, providing what the company calls "a reproducible, inspectable blueprint for byteifying strong subword models in a way the community can adopt and extend" [1]. This open-source approach enables developers to build working solutions for applications where standard tokenization fails, such as processing garbled text, complex code strings, or highly morphological languages [2].
