3 Sources
[1]
Google's Latest Open-Source AI Model Can Run Locally on Just 2GB RAM
Gemma 3n was released as an early preview in May. The AI model is available in two variants -- E2B and E4B -- and is built on the MatFormer architecture.

Google released the full version of Gemma 3n, its latest open-source model in the Gemma 3 family of artificial intelligence (AI) models, on Thursday. First announced in May, the new model is designed and optimised for on-device use cases and features several architecture-based improvements. Notably, the large language model (LLM) can run locally on just 2GB of RAM, which means it can be deployed and operated even on a smartphone, provided the device has AI-enabled processing power.

In a blog post, the Mountain View-based tech giant announced the release of the full version of Gemma 3n. The model follows the launch of the Gemma 3 and GemmaSign models and joins the Gemmaverse. Since it is an open-source model, the company has provided its model weights as well as a cookbook to the community. The model itself is available under a permissive Gemma license, which allows both academic and commercial usage.

Gemma 3n is a multimodal AI model. It natively supports image, audio, video, and text inputs, although it can only generate text outputs. It is also multilingual, supporting 140 languages for text and 35 languages when the input is multimodal.

Google says Gemma 3n has a "mobile-first architecture" built on the Matryoshka Transformer, or MatFormer, architecture. This is a nested transformer, named after the Russian nesting dolls in which one doll fits inside another, and it offers a way of training AI models at several parameter sizes at once.

Gemma 3n comes in two sizes -- E2B and E4B -- where the "E" stands for effective parameters. Despite being five billion and eight billion parameters in size respectively, the models keep just two billion and four billion parameters active. This is achieved using a technique called Per-Layer Embeddings (PLE), in which only the most essential parameters need to be loaded into fast memory (VRAM); the rest remain as per-layer embeddings and can be handled by the CPU.

Under the MatFormer scheme, the E4B variant nests the E2B model, so training the larger model simultaneously trains the smaller one. Users can therefore pick E4B for more advanced operations or E2B for faster outputs without a noticeable difference in output quality. Google is also letting users create custom-sized models by tweaking certain internal parts; for this, it is releasing the MatFormer Lab tool, which lets developers test different combinations to find suitable custom model sizes.

Currently, Gemma 3n is available to download via Google's Hugging Face and Kaggle listings. Users can also try Gemma 3n in Google AI Studio, and Gemma models can be deployed directly to Cloud Run from AI Studio.
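For readers who want to see what running the model locally looks like in code, here is a minimal sketch using the Hugging Face Transformers library. The model ID and generation settings are assumptions based on Google's Hugging Face listing, not an official recipe, and the gated Gemma repositories require accepting the license first.

```python
# A minimal sketch of running the Gemma 3n E2B variant locally with Hugging Face
# Transformers. The model ID below is assumed from Google's Hugging Face listing;
# verify the exact name and accept the Gemma license on huggingface.co first.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="google/gemma-3n-E2B-it",  # assumed instruction-tuned E2B checkpoint
    device_map="auto",               # uses a GPU if present, otherwise the CPU
)

prompt = "Explain the MatFormer architecture in one sentence."
result = generator(prompt, max_new_tokens=100)
print(result[0]["generated_text"])
```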
[2]
Meet Gemma 3n: Google's lightweight AI model that works offline with just 2GB RAM
It is able to achieve this efficiency by shifting part of the workload to the CPU, not just the NPU.

Google has officially rolled out Gemma 3n, its latest on-device AI model, first teased back in May 2025. What makes this launch exciting is that Gemma 3n brings full-scale multimodal processing -- audio, video, image, and text -- straight to smartphones and edge devices, all without needing constant internet access or heavy cloud support. It's a big step forward for developers looking to bring powerful AI features to low-power devices running on limited memory.

At the core of Gemma 3n is a new architecture called MatFormer, short for Matryoshka Transformer. Think Russian nesting dolls: smaller, fully functional models tucked inside bigger ones. This setup lets developers scale AI performance based on the device's capability. You get two versions: E2B runs on just 2GB of RAM, and E4B works with around 3GB. Despite packing 5 to 8 billion raw parameters, both versions behave like much smaller models when it comes to resource use. That's thanks to smart design choices like Per-Layer Embeddings (PLE), which shift some of the load from the GPU to the CPU, helping save memory. It also features KV Cache Sharing, which speeds up processing of long audio and video inputs by nearly 2x -- perfect for real-time use cases like voice assistants and mobile video analysis.

Gemma 3n isn't just light on memory; it's stacked with serious capabilities. For speech-based features, it uses an audio encoder adapted from Google's Universal Speech Model, which means it can handle speech-to-text and even language translation directly on your phone. It's already showing solid results, especially when translating between English and European languages like Spanish, French, Italian, and Portuguese. On the visual front, it's powered by Google's new MobileNet-V5 -- a lightweight but powerful vision encoder that can process video at up to 60fps on phones like the Pixel. That means smooth, real-time video analysis without breaking a sweat. And it's not just fast -- it's also more accurate than older models.

Developers can plug into Gemma 3n using popular tools like Hugging Face Transformers, Ollama, MLX, llama.cpp, and more. Google has also kicked off the Gemma 3n Impact Challenge, offering a $150,000 prize pool for apps that showcase the model's offline magic.

The best part? Gemma 3n runs entirely offline -- no cloud, no connection, just pure on-device AI. With support for 140 languages for text and the ability to understand multimodal content in 35, it's a game-changer for building AI apps where connectivity is patchy or privacy is a priority. Want to try Gemma 3n for yourself? Here's one way to get started:
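As a rough illustration of the Ollama route mentioned above, the sketch below assumes a Gemma 3n model has already been pulled to the local machine; the exact model tag is an assumption and may differ on your setup.

```python
# A rough sketch of querying Gemma 3n through a local Ollama server using the
# ollama Python client (pip install ollama). The model tag "gemma3n:e2b" is an
# assumption -- run `ollama list` to see which Gemma 3n tag you actually pulled.
import ollama

response = ollama.chat(
    model="gemma3n:e2b",
    messages=[
        {"role": "user", "content": "Translate 'good morning' into Spanish."},
    ],
)
print(response["message"]["content"])  # the model's text reply, fully offline
```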
[3]
Gemma 3n: Google's open-weight AI model that brings on-device intelligence
Gemma 3n's open weights give developers unmatched freedom to build, customize, and deploy on-device AI. The future of AI isn't just in vast server farms powering chatbots from afar. Increasingly, it's about models smart enough to run right on your phone, tablet, or laptop, delivering intelligence without needing an internet connection. Google's newly launched Gemma 3n is a major leap in this direction, offering a potent blend of small size, multimodal abilities, and open access. And crucially, it arrived before similar efforts from OpenAI.

At the heart of Gemma 3n's significance is its status as an open-weight model. In simple terms, an open-weight model is an AI system where the actual model data, the "weights" it learned during training, is publicly shared. This allows developers to download, inspect, modify, fine-tune, and run the model on their own hardware. This contrasts with closed-weight models like OpenAI's GPT-4 or Google's Gemini, where the model runs only on company servers and users interact with it via an API. Open-weight models give developers more control, encourage innovation, and let AI run independently on local devices, something increasingly important for privacy, security, and offline use.

Gemma 3n is the latest in Google's family of open-weight AI models, specifically designed for on-device AI, that is, AI that can run directly on edge devices like smartphones, tablets, and laptops. The "n" in its name stands for "nano," a nod to its compact size and efficiency. What sets Gemma 3n apart is its ability to handle multimodal inputs natively. Earlier models were text-only, but Gemma 3n can process text, images, audio, and even video as input, generating text responses in return. This opens up possibilities for real-time transcription, translation, image understanding, and video analysis, all done directly on the device.

Gemma 3n isn't just smaller, it's smarter in how it uses resources. The model comes in two sizes: E2B, which runs in roughly 2GB of RAM, and E4B, which needs around 3GB. Both versions bring high-quality AI performance to devices that would have struggled with earlier-generation models.

Gemma 3n's architecture reflects its on-device focus. MatFormer allows the model to flexibly scale its compute usage depending on hardware limits, a concept Google calls "elastic inference." The audio encoder is based on Google's Universal Speech Model (USM), which enables high-quality speech-to-text and translation directly on-device. The vision encoder is powered by the lightweight MobileNet-V5, which supports fast, efficient video analysis at up to 60FPS on modern smartphones.

OpenAI has long spoken of on-device AI, and GPT-4o showed what's possible in terms of efficiency, but its models remain cloud-bound. You can't download or modify GPT-4o; it runs on OpenAI's servers. Google, with Gemma 3n, has delivered what OpenAI so far hasn't: a powerful, open-weight, multimodal AI model that can run locally, offline, and at scale on everyday hardware. It's available now via Hugging Face, Kaggle, Google AI Studio, and other developer-friendly platforms.

Gemma 3n represents more than just another model release. It signals a new phase of AI development: one where powerful models don't just sit in the cloud, but live on devices in your pocket. It opens the door to smarter, more private, more customizable AI, and raises the bar for what on-device AI can be.
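As a small illustration of what open weights mean in practice, the sketch below downloads the raw checkpoint files for local inspection or fine-tuning. The repository ID is assumed from Google's Hugging Face listing, and gated Gemma repositories typically require accepting the license and authenticating with a token first.

```python
# A small sketch of what "open weights" means in practice: the checkpoint files
# themselves can be fetched and kept locally. The repo ID is assumed from
# Google's Hugging Face listing; license acceptance and a Hugging Face access
# token are usually needed for gated Gemma repositories.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="google/gemma-3n-E2B-it")
print(f"Model weights and config downloaded to: {local_dir}")
```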
Google releases Gemma 3n, an open-source AI model designed for on-device use, capable of running on just 2GB RAM. This multimodal model supports various input types and works across 140 languages, marking a significant advancement in accessible AI technology.
Google has officially released Gemma 3n, its latest open-source AI model, marking a significant leap in on-device artificial intelligence capabilities. This new addition to the Gemma 3 family of AI models is designed to operate efficiently on devices with limited resources, running on as little as 2GB of RAM [1][2].
Gemma 3n stands out for its multimodal functionality, capable of processing various input types including text, images, audio, and video. While it can handle these diverse inputs, the model generates text-only outputs. Impressively, Gemma 3n supports 140 languages for text input and 35 languages for multimodal inputs, making it a versatile tool for developers worldwide [1][3].
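To make the multimodal claim concrete, the hedged sketch below shows one plausible way to ask Gemma 3n about an image through the Transformers image-text-to-text pipeline; the model ID, image URL, and message layout are assumptions and may need adjusting to your library version.

```python
# A hedged sketch of multimodal use: passing an image plus a text question to
# Gemma 3n via the Transformers "image-text-to-text" pipeline. The model ID,
# image URL, and message layout are assumptions, not an official example.
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="google/gemma-3n-E4B-it", device_map="auto")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/street_scene.jpg"},  # hypothetical image
            {"type": "text", "text": "Describe what is happening in this photo."},
        ],
    }
]

result = pipe(text=messages, max_new_tokens=64)
# The pipeline returns the conversation with the model's reply appended last.
print(result[0]["generated_text"][-1]["content"])
```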
At the core of Gemma 3n's efficiency is its "mobile-first architecture" based on the Matryoshka Transformer (MatFormer). This nested transformer design, inspired by Russian nesting dolls, allows AI models of different parameter sizes to be trained simultaneously [1]. The model comes in two variants: E2B, with roughly two billion effective parameters and a memory footprint of about 2GB, and E4B, with roughly four billion effective parameters needing around 3GB.
Despite having 5 to 8 billion raw parameters, these variants behave like much smaller models in terms of resource usage. This efficiency is achieved through techniques such as Per-Layer Embeddings (PLE), which optimizes memory usage by shifting some workload from GPU to CPU [1][2].
Gemma 3n's ability to run entirely offline is a game-changer for AI applications. It eliminates the need for constant internet connectivity or heavy cloud support, making it ideal for use in areas with limited connectivity or where privacy is a priority [2][3].
The model incorporates advanced components for specific tasks: an audio encoder adapted from Google's Universal Speech Model (USM) enables on-device speech-to-text and translation, while the lightweight MobileNet-V5 vision encoder supports real-time video analysis at up to 60fps on modern smartphones [2][3].
As an open-source model, Gemma 3n is available under a permissive license that allows both academic and commercial usage. Google has provided model weights and a cookbook to the community, encouraging innovation and customization [1][3].
Developers can access Gemma 3n through various platforms, including Hugging Face, Kaggle, and Google AI Studio, and can work with it using tools such as Hugging Face Transformers, Ollama, MLX, and llama.cpp; Gemma models can also be deployed directly to Cloud Run from AI Studio [1][2].
Gemma 3n represents a significant step forward in democratizing AI technology. Its ability to run powerful, multimodal AI models on everyday devices with limited resources opens up new possibilities for developers and end-users alike. This release puts Google ahead in the race to bring sophisticated AI capabilities to edge devices, outpacing competitors like OpenAI in delivering open-weight, on-device AI solutions [3].
As the AI industry continues to evolve, Gemma 3n sets a new standard for what's possible in on-device intelligence, promising a future where powerful AI assistants and tools are accessible to a broader range of devices and users.
Microsoft's in-house AI chip, codenamed Braga, has been delayed by at least six months, pushing its mass production to 2026. The chip is expected to underperform compared to Nvidia's Blackwell, highlighting the challenges tech giants face in developing custom AI processors.
8 Sources
Technology
22 hrs ago
Meta Platforms is in advanced talks with private capital firms to raise $29 billion for building AI data centers, highlighting the company's aggressive push into artificial intelligence infrastructure.
4 Sources
Business and Economy
22 hrs ago
OpenAI has begun using Google's TPUs to power ChatGPT and other products, marking a significant shift from its reliance on NVIDIA GPUs and Microsoft's data centers.
4 Sources
Technology
14 hrs ago
Anthropic, a leading AI company, has initiated the Economic Futures Program to research and address the potential economic fallout of AI, including job losses and market disruptions. The program offers grants, hosts policy forums, and aims to gather comprehensive data on AI's economic impact.
5 Sources
Business and Economy
22 hrs ago
Facebook, owned by Meta, is asking users for permission to access and process photos from their camera rolls, including those not yet shared on the platform, to generate AI-powered creative suggestions.
4 Sources
Technology
22 hrs ago