2 Sources
[1]
Small AI Models Are Changing the Game for Developers
While most of the AI world is racing to build ever-bigger language models like OpenAI's GPT-5 and Anthropic's Claude Sonnet 4.5, the Israeli AI startup AI21 is taking a different path. The company has just unveiled Jamba Reasoning 3B, a compact, open-source, 3-billion-parameter model that can handle massive context windows of 250,000 tokens (meaning it can "remember" and reason over much more text than typical language models) while running at high speed, even on consumer devices. The launch highlights a growing shift: smaller, more efficient models could shape the future of AI just as much as raw scale.

"We believe in a more decentralized future for AI -- one where not everything runs in massive data centers," says Ori Goshen, Co-CEO of AI21, in an interview with IEEE Spectrum. "Large models will still play a role, but small, powerful models running on devices will have a significant impact" on both the future and the economics of AI, he says.

Jamba Reasoning 3B is built for developers who want to create edge-AI applications and specialized systems that run efficiently on-device. It is designed to handle long sequences of text and challenging tasks like math, coding, and logical reasoning -- all while running with impressive speed on everyday devices like laptops and mobile phones. The model can also work in a hybrid setup: simple jobs are handled locally on the device, while heavier problems are sent to powerful cloud servers. According to AI21, this smarter routing could dramatically cut AI infrastructure costs for certain workloads -- potentially by an order of magnitude.

With 3 billion parameters, Jamba Reasoning 3B is tiny by today's AI standards. Models like GPT-5 or Claude run well past 100 billion parameters, and even smaller models, such as Llama 3 (8B) or Mistral (7B), are more than twice the size of AI21's model, Goshen notes. That compact size makes it all the more remarkable that the model can handle a 250,000-token context window on consumer devices. Some proprietary models, like GPT-5, offer even longer context windows, but Jamba sets a new high-water mark among open-source models. The previous open-model record of 128,000 tokens was held by Meta's Llama 3.2 (3B), Microsoft's Phi-4 Mini, and DeepSeek R1, some of which are much larger models.

Jamba Reasoning 3B can process more than 17 tokens per second even when working at full capacity -- that is, with extremely long inputs that use its full 250,000-token context window. Many other models slow down or struggle once their input length exceeds 100,000 tokens.

Goshen explains that the model is built on an architecture called Jamba, which combines two types of neural network designs: transformer layers, familiar from other large language models, and Mamba layers, which are designed to be more memory-efficient. This hybrid design enables the model to handle long documents, large codebases, and other extensive inputs directly on a laptop or phone -- using about one-tenth the memory of traditional transformers. Goshen says the model runs much faster than traditional transformers because it relies less on a memory component called the KV cache, which can slow down processing as inputs get longer.

The model's hybrid architecture gives it an advantage in both speed and memory efficiency, even with very long inputs, confirms a software engineer who works in the LLM industry. The engineer requested anonymity because they're not authorized to comment on other companies' models.
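The KV-cache point can be made concrete with back-of-the-envelope arithmetic. The sketch below assumes a hypothetical plain-transformer configuration; the layer count, head setup, and precision are illustrative, not Jamba Reasoning 3B's actual specifications:

```python
# Back-of-the-envelope KV-cache size for a plain transformer.
# Every configuration number here is an illustrative assumption,
# not Jamba Reasoning 3B's actual architecture.
layers = 28          # hypothetical transformer layer count
kv_heads = 8         # key/value heads (grouped-query attention)
head_dim = 128       # dimension per head
bytes_per_val = 2    # fp16/bf16 storage

def kv_cache_bytes(context_tokens: int) -> int:
    # Keys and values (the 2x) are cached at every layer
    # for every token in the context.
    return 2 * layers * kv_heads * head_dim * bytes_per_val * context_tokens

for tokens in (8_000, 100_000, 250_000):
    gib = kv_cache_bytes(tokens) / 2**30
    print(f"{tokens:>7} tokens -> {gib:5.1f} GiB of KV cache")
```

Under these assumed settings, the cache alone reaches roughly 27 GiB at the full 250,000-token window, more than most laptops can spare, which illustrates why replacing most attention layers with constant-memory Mamba layers matters and is consistent with the roughly one-tenth memory figure cited above.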
As more users run generative AI locally on laptops, models need to handle long context lengths quickly without consuming too much memory. At 3 billion parameters, Jamba meets these requirements, says the engineer, making it a model optimized for on-device use.

Jamba Reasoning 3B is open source under the permissive Apache 2.0 license and available on popular platforms such as Hugging Face and LM Studio. The release also comes with instructions for fine-tuning the model through an open-source reinforcement-learning platform called VERL, making it easier and more affordable for developers to adapt the model to their own tasks. "Jamba Reasoning 3B marks the beginning of a family of small, efficient reasoning models," Goshen said. "Scaling down enables decentralization, personalization, and cost efficiency. Instead of relying on expensive GPUs in data centers, individuals and enterprises can run their own models on devices. That unlocks new economics and broader accessibility."
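Because the model ships on Hugging Face under Apache 2.0, a first local experiment should look roughly like the standard transformers workflow below. This is a minimal sketch: the checkpoint id is an assumption based on AI21's naming conventions, so check the company's Hugging Face page for the real one.

```python
# Minimal sketch: loading an open checkpoint for local inference
# with the Hugging Face transformers library.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "ai21labs/AI21-Jamba-Reasoning-3B"  # assumed repo id; verify on Hugging Face

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

prompt = "Summarize the trade-offs of hybrid SSM-transformer models."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```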
[2]
AI21 Labs' Jamba Reasoning 3B is a powerful tiny model that promises to transform AI economics - SiliconANGLE
Generative artificial intelligence developer AI21 Labs Inc. says it wants to bring agentic AI workloads out of the data center and onto users' devices with its newest model, Jamba Reasoning 3B. Launched today, Jamba Reasoning 3B is one of the smallest models the company has ever released, and it's the latest addition to the Jamba family of open-source models available under an Apache 2.0 license.

It's a small language model, or SLM, built atop AI21 Labs' own hybrid SSM-transformer architecture, which sets it apart from most large language models, which are based on transformer-only frameworks. SSM stands for "state space model," a class of highly efficient algorithms for sequential modeling that track a current state and then predict what the next state will be. Jamba Reasoning 3B combines the transformer architecture with AI21 Labs' own Mamba neural network architecture and boasts a context window of 256,000 tokens, with the ability to handle up to one million. It demonstrates efficiency gains of between two and five times those of similar lightweight models. In a blog post, the company explained that Jamba Reasoning 3B uses RoPE (rotary position embedding) scaling to stretch its attention mechanism, allowing it to handle tasks with much less compute power than larger models.

AI21 Labs highlighted the model's impressive performance, with a "combined intelligence" to "output tokens per second" ratio that surpasses similarly sized LLMs such as Alibaba Cloud's Qwen3 4B, Google LLC's Gemma 3 4B, Meta Platforms Inc.'s Llama 3.2 3B, IBM Corp.'s Granite 4.0 Micro, and Microsoft's Phi-4 Mini. That evaluation was based on a series of benchmarks, including IFBench, MMLU-Pro, and Humanity's Last Exam.

AI21 Labs believes there will be a big market for tiny language models like Jamba Reasoning 3B, which is designed to be customized using retrieval-augmented generation techniques that provide it with more contextual knowledge. The company cites research showing that anywhere from 40% to 70% of enterprise AI tasks can be handled efficiently by smaller models. In doing so, companies can benefit from 10-to-30-times lower costs. "On-device SLMs like Jamba Reasoning 3B enable cost-effective, heterogeneous compute allocation -- processing simple tasks locally while reserving cloud resources for complex reasoning," the company explained.

SLMs can also power most "AI agents," which perform tasks autonomously on behalf of human workers, with a high degree of efficiency, the company said. In agentic workflows, Jamba Reasoning 3B can act as an "on-device controller" orchestrating agents' operations, activating cloud-based LLMs only when extra compute power is needed to complete more sophisticated tasks. This means SLMs could power much lower-latency agentic workflows, with additional benefits such as offline resilience and enhanced data privacy. "This ushers in a decentralized AI era, akin to the 1980s' shift from mainframes to personal computers, empowering local computation while seamlessly integrating cloud capabilities for greater scalability," the company wrote.

AI21 Labs co-Chief Executive Ori Goshen told VentureBeat in an interview that SLMs like Jamba Reasoning 3B can free up data centers to focus only on the hardest AI problems and help solve economic challenges faced by the industry.
"What we're seeing right now in the industry is an economics issue, where there are very expensive data center build-outs, and the revenue that is generated [from them] versus the depreciation rate of all their chips shows that the math doesn't add up," he explained. The company provided a number of examples of where AI is better processed locally by SMBs. For instance, contact centers can run customer service agents on small devices to handle customer calls and decide if they can handle issues themselves, if a more powerful model should do it, or if the issue needs to be taken care of by a human agent. Futurum Group analyst Brad Shimmin told AI Business that the theory behind state space models is an old one, but until recently the technology hasn't existed to create them. "Now you can use this state space model idea because it scales really well and is extremely fast," he said.
AI21 Labs introduces Jamba Reasoning 3B, a compact 3-billion-parameter AI model with impressive capabilities. This small language model challenges the trend of ever-larger AI systems, offering efficiency and versatility for on-device applications.
Israeli AI startup AI21 Labs has unveiled Jamba Reasoning 3B, a groundbreaking 3-billion-parameter language model that challenges the prevailing trend of ever-larger AI systems. This compact, open-source model boasts an impressive 250,000-token context window and can run efficiently on consumer devices, marking a significant shift in AI development [1].

Jamba Reasoning 3B combines transformer layers with Mamba layers, creating a hybrid architecture that enables efficient processing of long documents and extensive inputs directly on laptops or phones. The model can process more than 17 tokens per second, even when working at full capacity with its maximum context window [1].

AI21 Labs claims that Jamba Reasoning 3B outperforms similarly sized models in terms of combined intelligence and output tokens per second. Benchmarks include comparisons with models from Alibaba Cloud, Google, Meta, IBM, and Microsoft [2].

The introduction of Jamba Reasoning 3B could have far-reaching implications for AI economics. AI21 Labs suggests that 40% to 70% of enterprise AI tasks can be efficiently handled by smaller models, potentially reducing costs by a factor of 10 to 30 [2].

Ori Goshen, Co-CEO of AI21, emphasizes the potential for a more decentralized AI future: "Large models will still play a role, but small, powerful models running on devices will have a significant impact" [1].
Jamba Reasoning 3B is designed for developers creating edge-AI applications and specialized systems. Its hybrid setup allows for local processing of simple tasks while routing more complex problems to cloud servers. This approach could dramatically reduce AI infrastructure costs for certain workloads [1].
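A minimal sketch of what that local-first routing could look like follows; the difficulty heuristic, threshold, and the run_local/run_cloud helpers are hypothetical stand-ins for illustration, not AI21's actual system.

```python
# Hypothetical local-vs-cloud router for a small on-device model.
# run_local and run_cloud are placeholder stand-ins, not real APIs.

def estimate_difficulty(prompt: str) -> float:
    """Crude heuristic: long prompts and reasoning keywords score higher.
    A production router might use a learned classifier instead."""
    keywords = ("prove", "derive", "plan", "debug", "step by step")
    score = min(len(prompt) / 10_000, 1.0)
    score += 0.2 * sum(kw in prompt.lower() for kw in keywords)
    return min(score, 1.0)

def run_local(prompt: str) -> str:
    return f"[on-device SLM] answer to: {prompt[:40]}..."

def run_cloud(prompt: str) -> str:
    return f"[cloud LLM] answer to: {prompt[:40]}..."

def route(prompt: str, threshold: float = 0.5) -> str:
    """Handle easy requests on-device; escalate hard ones to the cloud."""
    if estimate_difficulty(prompt) < threshold:
        return run_local(prompt)
    return run_cloud(prompt)

print(route("Summarize this short email."))              # stays on device
print(route("Derive a plan and prove it is optimal."))   # escalates to cloud
```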
The model's efficiency makes it suitable for various applications, including contact-center agents that triage customer calls on-device and agentic workflows in which the model acts as an on-device controller, calling cloud-based LLMs only when extra compute is needed [2].
Brad Shimmin, an analyst at Futurum Group, notes that while the theory behind state space models is not new, recent technological advancements have made their implementation possible. "Now you can use this state space model idea because it scales really well and is extremely fast," Shimmin explains [2].

As the AI industry grapples with the economics of large-scale data center deployments, models like Jamba Reasoning 3B offer a potential solution. By enabling more efficient, decentralized AI processing, these small language models could reshape the landscape of AI development and deployment in the coming years.