2 Sources
[1]
IBM's open source Granite 4.0 Nano AI models can run locally directly in your browser
In an industry where model size is often seen as a proxy for intelligence, IBM is charting a different course -- one that values efficiency over enormity, and accessibility over abstraction. The 114-year-old tech giant's four new Granite 4.0 Nano models, released today, range from just 350 million to 1.5 billion parameters, a fraction of the size of their server-bound cousins from the likes of OpenAI, Anthropic, and Google.

These models are designed to be highly accessible: the 350M variants can run comfortably on a modern laptop CPU with 8-16GB of RAM, while the 1.5B models typically require a GPU with at least 6-8GB of VRAM for smooth performance -- or sufficient system RAM and swap for CPU-only inference. This makes them well-suited for developers building applications on consumer hardware or at the edge, without relying on cloud compute. In fact, the smallest ones can even run locally in your own web browser (see the sketch below), as Joshua Lochner, aka Xenova, creator of Transformers.js and a machine learning engineer at Hugging Face, wrote on the social network X.

All the Granite 4.0 Nano models are released under the Apache 2.0 license -- suitable for researchers and enterprise or indie developers alike, including commercial use. They are natively compatible with llama.cpp, vLLM, and MLX, and are certified under ISO 42001 for responsible AI development -- a standard IBM helped pioneer.

But in this case, small doesn't mean less capable -- it might just mean smarter design. These compact models are built not for data centers, but for edge devices, laptops, and local inference, where compute is scarce and latency matters. And despite their small size, the Nano models are showing benchmark results that rival or even exceed the performance of larger models in the same category. The release is a signal that a new AI frontier is rapidly forming -- one not dominated by sheer scale, but by strategic scaling.

What Exactly Did IBM Release?

The Granite 4.0 Nano family includes four open-source models now available on Hugging Face:

* Granite-4.0-H-1B (~1.5B parameters) - Hybrid-SSM architecture
* Granite-4.0-H-350M (~350M parameters) - Hybrid-SSM architecture
* Granite-4.0-1B - Transformer-based variant, parameter count closer to 2B
* Granite-4.0-350M - Transformer-based variant

The H-series models -- Granite-4.0-H-1B and H-350M -- use a hybrid state space model (SSM) architecture that combines efficiency with strong performance, ideal for low-latency edge environments. Meanwhile, the standard transformer variants -- Granite-4.0-1B and 350M -- offer broader compatibility with tools like llama.cpp, designed for use cases where the hybrid architecture isn't yet supported. In practice, the transformer 1B model is closer to 2B parameters, but it aligns performance-wise with its hybrid sibling, offering developers flexibility based on their runtime constraints.

"The hybrid variant is a true 1B model. However, the non-hybrid variant is closer to 2B, but we opted to keep the naming aligned to the hybrid variant to make the connection easily visible," explained Emma, Product Marketing lead for Granite, during a Reddit "Ask Me Anything" (AMA) session on r/LocalLLaMA.

A Competitive Class of Small Models

IBM is entering a crowded and rapidly evolving market of small language models (SLMs), competing with offerings like Qwen3, Google's Gemma, LiquidAI's LFM2, and even Mistral's dense models in the sub-2B parameter space.
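To make the in-browser claim concrete, here is a minimal sketch of what browser-side inference with Transformers.js looks like. The repo name and quantization setting are assumptions for illustration -- check the Hugging Face Hub for the actual ONNX-converted Granite 4.0 Nano repository:

```ts
// Minimal sketch of in-browser inference with Transformers.js.
// ASSUMPTION: the model repo below is hypothetical -- substitute the
// real ONNX-converted Granite 4.0 Nano repo from the Hugging Face Hub.
import { pipeline } from "@huggingface/transformers";

// Build a text-generation pipeline; "q4" requests 4-bit quantized
// weights so the download stays small enough for a browser tab.
const generator = await pipeline(
  "text-generation",
  "onnx-community/granite-4.0-350m-ONNX", // hypothetical repo name
  { dtype: "q4" },
);

// Chat-style input; the pipeline applies the model's chat template.
const messages = [
  { role: "user", content: "Explain a state space model in one sentence." },
];

const output = await generator(messages, { max_new_tokens: 100 });
// The last message in generated_text is the assistant's reply.
console.log(output[0].generated_text.at(-1).content);
```

On hardware with WebGPU support, passing `device: "webgpu"` in the pipeline options can speed this up considerably; otherwise inference falls back to WebAssembly on the CPU.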
While OpenAI and Anthropic focus on models that require clusters of GPUs and sophisticated inference optimization, IBM's Nano family is aimed squarely at developers who want to run performant LLMs on local or constrained hardware.

In benchmark testing, IBM's new models consistently top the charts in their class. According to data shared on X by David Cox, VP of AI Models at IBM Research:

* On IFEval (instruction following), Granite-4.0-H-1B scored 78.5, outperforming Qwen3-1.7B (73.1) and other 1-2B models.
* On BFCLv3 (function/tool calling), Granite-4.0-1B led with a score of 54.8, the highest in its size class.
* On safety benchmarks (SALAD and AttaQ), the Granite models scored over 90%, surpassing similarly sized competitors.

Overall, the Granite-4.0-1B achieved a leading average benchmark score of 68.3% across general knowledge, math, code, and safety domains. This performance is especially significant given the hardware constraints these models are designed for. They require less memory, run faster on CPUs or mobile devices, and don't need cloud infrastructure or GPU acceleration to deliver usable results.

Why Model Size Still Matters -- But Not Like It Used To

In the early wave of LLMs, bigger meant better -- more parameters translated to better generalization, deeper reasoning, and richer output. But as transformer research matured, it became clear that architecture, training quality, and task-specific tuning could allow smaller models to punch well above their weight class. IBM is banking on this evolution. By releasing open, small models that are competitive in real-world tasks, the company is offering an alternative to the monolithic AI APIs that dominate today's application stack. In fact, the Nano models address three increasingly important needs: keeping sensitive data on-device rather than sending it to cloud servers, operating offline or at the edge where latency matters, and avoiding the recurring cost of cloud inference.

Community Response and Roadmap Signals

IBM's Granite team didn't just launch the models and walk away -- they took to Reddit's open-source community r/LocalLLaMA to engage directly with developers. In an AMA-style thread, Emma (Product Marketing, Granite) answered technical questions, addressed concerns about naming conventions, and dropped hints about what's next. Notable confirmations from the thread:

* A larger Granite 4.0 model is currently in training
* Reasoning-focused models ("thinking counterparts") are in the pipeline
* IBM will release fine-tuning recipes and a full training paper soon
* More tooling and platform compatibility is on the roadmap

Users responded enthusiastically to the models' capabilities, especially in instruction-following and structured response tasks. One commenter summed it up: "This is big if true for a 1B model -- if quality is nice and it gives consistent outputs. Function-calling tasks, multilingual dialog, FIM completions... this could be a real workhorse." Another user remarked: "The Granite Tiny is already my go-to for web search in LM Studio -- better than some Qwen models. Tempted to give Nano a shot."

Background: IBM Granite and the Enterprise AI Race

IBM's push into large language models began in earnest in late 2023 with the debut of the Granite foundation model family, starting with models like Granite.13b.instruct and Granite.13b.chat. Released for use within its Watsonx platform, these initial decoder-only models signaled IBM's ambition to build enterprise-grade AI systems that prioritize transparency, efficiency, and performance.
The company open-sourced select Granite code models under the Apache 2.0 license in mid-2024, laying the groundwork for broader adoption and developer experimentation. The real inflection point came with Granite 3.0 in October 2024 -- a fully open-source suite of general-purpose and domain-specialized models ranging from 1B to 8B parameters. These models emphasized efficiency over brute scale, offering capabilities like longer context windows, instruction tuning, and integrated guardrails. IBM positioned Granite 3.0 as a direct competitor to Meta's Llama, Alibaba's Qwen, and Google's Gemma -- but with a uniquely enterprise-first lens. Later versions, including Granite 3.1 and Granite 3.2, introduced even more enterprise-friendly innovations: embedded hallucination detection, time-series forecasting, document vision models, and conditional reasoning toggles.

The Granite 4.0 family, launched in October 2025, represents IBM's most technically ambitious release yet. It introduces a hybrid architecture that blends transformer and Mamba-2 layers -- aiming to combine the contextual precision of attention mechanisms with the memory efficiency of state-space models. This design allows IBM to significantly reduce memory and latency costs for inference, making Granite models viable on smaller hardware while still outperforming peers in instruction-following and function-calling tasks. The launch also includes ISO 42001 certification, cryptographic model signing, and distribution across platforms like Hugging Face, Docker, LM Studio, Ollama, and watsonx.ai.

Across all iterations, IBM's focus has been clear: build trustworthy, efficient, and legally unambiguous AI models for enterprise use cases. With a permissive Apache 2.0 license, public benchmarks, and an emphasis on governance, the Granite initiative not only responds to rising concerns over proprietary black-box models but also offers a Western-aligned open alternative to the rapid progress of teams like Alibaba's Qwen. In doing so, Granite positions IBM as a leading voice in what may be the next phase of open-weight, production-ready AI.

A Shift Toward Scalable Efficiency

In the end, IBM's release of the Granite 4.0 Nano models reflects a strategic shift in LLM development: from chasing parameter-count records to optimizing usability, openness, and deployment reach. By combining competitive performance, responsible development practices, and deep engagement with the open-source community, IBM is positioning Granite as not just a family of models, but a platform for building the next generation of lightweight, trustworthy AI systems. For developers and researchers looking for performance without overhead, the Nano release offers a compelling signal: you don't need 70 billion parameters to build something powerful -- just the right ones.
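The community quotes above single out function calling as the sweet spot for models this size. As a hedged illustration of that workflow -- not IBM's official tooling -- here is what a tool call against a locally served model might look like via Ollama's chat API; the model tag is a placeholder:

```ts
// Hedged sketch of local tool calling via Ollama's /api/chat endpoint.
// ASSUMPTIONS: an Ollama server is running on localhost:11434, and the
// tag "granite4:350m" stands in for whatever tag IBM publishes.
const response = await fetch("http://localhost:11434/api/chat", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "granite4:350m", // hypothetical tag
    stream: false,
    messages: [{ role: "user", content: "What's the weather in Boston?" }],
    // JSON Schema description of one callable tool
    tools: [{
      type: "function",
      function: {
        name: "get_weather",
        description: "Get the current weather for a city",
        parameters: {
          type: "object",
          properties: { city: { type: "string" } },
          required: ["city"],
        },
      },
    }],
  }),
});

const data = await response.json();
// A model that handles tool calling well should return a structured
// call in message.tool_calls -- e.g. name "get_weather" with arguments
// { city: "Boston" } -- instead of answering in free text.
console.log(data.message.tool_calls ?? data.message.content);
```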
[2]
IBM releases small open-source Granite 4 models for mobile devices and browsers - SiliconANGLE
IBM releases small open-source Granite 4 models for mobile devices and browsers

IBM Corp. today announced the release of Granite 4 Nano, a family of extremely small generative artificial intelligence models designed to run at the edge, on-device or in browsers. The company said the models exhibit extremely high performance for their size and represent the company's smallest models yet.

The Granite 4.0 Nano family includes four instruct models and their base model counterparts, between 1.5 billion and 350 million parameters. Parameters are the internal values that a large language model learns during training to understand context from user text queries and generate answers. Larger LLMs need increased computing power and energy, leading to increased operational costs. They also require specialized hardware, such as powerful graphics processing units and substantial machine memory.

Tiny LLMs require far less compute and memory, meaning that they can run on consumer hardware such as laptops, PCs and mobile devices. The tradeoff is a reduction in accuracy and contextual knowledge, which is trimmed from the models to reduce their size; however, with advanced compression techniques, a lot of knowledge and capability can be packed into a smaller footprint. Very small LLMs enhance privacy and security, provide offline access to reasoning, and allow complete control and customization. By avoiding the transmission of sensitive data to cloud servers, local LLMs can also be cost-effective, as they do not incur cloud expenses.

The models include Granite 4.0 H 1B and 350M -- 1.5 billion and 350 million parameter models featuring the model family's hybrid architecture -- and two alternative traditional transformer-based versions designed to be compatible where hybrid workloads may not have optimized support. Granite 4 models have a specialized architecture developed by IBM that combines an additional algorithm with the transformer design that powers most LLMs. Transformers use an attention algorithm to understand and generate text by focusing on the most important parts of an input. IBM hybridized the transformer with processing components based on the Mamba neural network architecture, which is more hardware-efficient than traditional transformers.

There is a lot of competition in the sub-billion to near-1-billion parameter model design space, where developers focus on performance and capability. Rivals include the Qwen models from Alibaba Group Ltd., the Liquid Foundation Models from Liquid AI Inc. and the Gemma models designed by Google LLC. IBM stated that Granite Nano models perform better than several similarly sized models across various benchmarks in general knowledge, math, coding and safety. Additionally, the Nano models outperformed competitors for agentic workflows, including instruction following and tool calling, on IFEval, or Instruction-Following Evaluation, and Berkeley's Function Calling Leaderboard v3. Granite 4.0 H 1B reached top marks in accuracy on IFEval at 78.5, compared with Qwen3 1.7B at 73.1 and Gemma 3 1B at 59.3. In tool calling, the same model secured 54.8 on Berkeley's leaderboard, compared with Qwen3 at 52.2 and Gemma 3 at 16.3.

IBM released all the Granite 4 Nano models under the open-source Apache 2.0 license, which is highly permissive. The license allows for broad commercial use and includes special considerations for research.
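To see why the Mamba-style components are more hardware-efficient, consider a toy contrast -- a conceptual sketch only, not IBM's actual layers. An attention step must revisit every cached token, so its per-token cost and memory grow with context length, while a state-space step folds all history into a fixed-size state:

```ts
// Toy contrast (conceptual only -- not IBM's real architecture).

// Attention-style step: the KV-cache stand-in grows with every token.
function attentionStep(cache: number[], x: number): number {
  cache.push(x);
  // toy "attention": uniform average over everything seen so far, O(n)
  return cache.reduce((sum, v) => sum + v, 0) / cache.length;
}

// SSM-style step: h' = a*h + b*x, y = c*h' (scalars standing in for
// the A, B, C matrices of a real state-space layer), O(1) per token.
function ssmStep(h: number, x: number, a = 0.9, b = 0.1, c = 1.0): [number, number] {
  const hNext = a * h + b * x;
  return [hNext, c * hNext];
}

const tokens = [0.2, 0.7, 0.1, 0.9];
const cache: number[] = []; // grows with sequence length
let state = 0;              // stays fixed-size forever
for (const x of tokens) {
  const attnOut = attentionStep(cache, x);
  const [nextState, ssmOut] = ssmStep(state, x);
  state = nextState;
  console.log({ attnOut, ssmOut });
}
```

The fixed-size state is what lets hybrid models keep memory and latency flat as context grows, which is exactly the property that matters on laptops and phones.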
IBM unveils four new open-source Granite 4.0 Nano AI models ranging from 350M to 1.5B parameters, designed to run locally on consumer hardware including laptops and web browsers. These compact models outperform competitors in benchmarks while requiring minimal computing resources.
IBM has released four new Granite 4.0 Nano AI models that fundamentally challenge the prevailing "bigger is better" philosophy in artificial intelligence [1]. The models, ranging from just 350 million to 1.5 billion parameters, represent a fraction of the size of server-bound models from companies like OpenAI, Anthropic, and Google, yet deliver competitive performance in their class [2].

The 114-year-old tech giant's approach prioritizes efficiency and accessibility over raw computational power. The smallest 350M variants can run comfortably on modern laptop CPUs with 8-16GB of RAM, while the 1.5B models typically require a GPU with at least 6-8GB of VRAM for optimal performance [1]. Remarkably, the smallest models can even run locally in web browsers, as demonstrated by Joshua Lochner, creator of Transformers.js and machine learning engineer at Hugging Face [1].

The Granite 4.0 Nano family includes four distinct models now available on Hugging Face under the permissive Apache 2.0 license [1]. The lineup consists of Granite-4.0-H-1B and H-350M models featuring hybrid state space model (SSM) architecture, alongside standard transformer variants Granite-4.0-1B and 350M [2].

The hybrid architecture represents IBM's innovative approach, combining transformer design with processing components based on the Mamba neural network architecture, which proves more hardware-efficient than traditional transformers [2]. The H-series models excel in low-latency edge environments, while the transformer variants offer broader compatibility with existing tools like llama.cpp [1].

Despite their compact size, the Granite 4.0 Nano models demonstrate impressive benchmark results that rival or exceed larger competitors. According to data from David Cox, VP of AI Models at IBM Research, the Granite-4.0-H-1B scored 78.5 on the IFEval instruction-following benchmark, significantly outperforming Qwen3-1.7B at 73.1 and Gemma 3 1B at 59.3 [1][2].

In function-calling capabilities, measured by Berkeley's Function Calling Leaderboard v3, the Granite-4.0-1B achieved a leading score of 54.8, surpassing Qwen3 at 52.2 and significantly outperforming Gemma 3 at 16.3 [2]. The models also excelled in safety benchmarks, scoring over 90% on SALAD and AttaQ evaluations [1].

IBM enters a crowded small language model market that includes competitors like Qwen3, Google's Gemma, LiquidAI's LFM2, and Mistral's dense models in the sub-2B parameter space [1]. However, while major players like OpenAI and Anthropic focus on models requiring GPU clusters, IBM targets developers seeking performant LLMs for local or constrained hardware environments [2].

The models are certified under ISO 42001 for responsible AI development, a standard IBM helped pioneer, and maintain native compatibility with popular frameworks including llama.cpp, vLLM, and MLX [1]. This comprehensive compatibility ensures broad adoption potential across diverse development environments and use cases.