2 Sources
[1]
New Intel driver lets you dedicate 93% of system memory to the iGPU for VRAM, enabling support for larger AI models
TL;DR: Intel's new driver for Arc Pro GPUs increases integrated GPU memory allocation to 93% of system RAM, enabling larger LLM inference on select models like the Arc Pro B390 and B370. This supports running substantial AI models on affordable hardware, though performance depends on memory bandwidth and computational power.

Intel's latest driver release, 32.0.101.8517, for Arc Pro GPUs increases the integrated GPU's memory allocation to enable broader LLM inference support. The new driver allows users to allocate up to 93% of their system RAM to the integrated GPU. While the driver currently supports only a select number of SKUs, Intel is paving the way for larger LLM inference workloads without hitting memory capacity bottlenecks.

Traditional memory partitioning usually limits a GPU to 50% of system RAM. AMD's Variable Graphics Memory (VGM) allows high-end configurations, such as the Strix Halo, to allocate 96GB from a 128GB pool to the iGPU. Intel has been more aggressive in this regard. Last year, Intel raised the limit to 87% with its new "Shared GPU Memory Override" for Core Ultra Series 2 processors. The latest driver release pushes that boundary further to 93% for local AI inference. This applies only to integrated Arc Pro GPUs, such as the Arc Pro B390 and Arc Pro B370. While the allocation update is the headline feature for integrated GPUs, the driver also supports discrete Arc Pro A- and B-series cards.

This allows users to run much larger LLMs without expensive hardware. On a 32GB system, this allocation provides enough memory to run a Qwen 2.5 32B model at 4-bit quantization with a comfortable context window. Meanwhile, workstations equipped with 64GB of RAM can run heavyweight models like Llama 3 70B, with enough headroom for the KV cache and system stability.

While this is impressive, computational power and bandwidth still affect the model's run time. Intel's Core Ultra Series 3 (Panther Lake) chips feature fast LPDDR5X-9600 memory, delivering bandwidth in the 150 GB/s range. AMD's Strix Halo, on the other hand, has a 256-bit memory bus that delivers 256 GB/s of bandwidth. This ensures large models not only fit in memory but also run at respectable speeds.

Apple Silicon, however, remains the gold standard. The M5 Max offers 614 GB/s of bandwidth, but its real advantage is the Unified Memory Architecture (UMA). Apple's UMA avoids the traditional partitioning found in the x86 world: instead of setting a hard limit or fence, the entire memory pool is natively accessible to both the CPU and the GPU. We've seen UMA's quirks in action, with a user running a 400B LLM on an iPhone 17 Pro. Apple offers efficiency and speed, while Intel and AMD are competing on flexibility and affordability for AI workloads, especially with the advent of LPCAMM2.
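As a sanity check on those figures, here is a minimal back-of-the-envelope sketch in Python. The 4-bit weight math is the standard bits-to-bytes conversion, but treating weights as the whole footprint (and ignoring loader overhead) is a simplifying assumption, not a measurement:

```python
# Rough fit check: do 4-bit quantized weights fit in the allocatable iGPU memory?
# Figures are approximations; real runtimes add overhead beyond raw weights.

def model_footprint_gb(params_billions: float, bits_per_param: float = 4.0) -> float:
    """Approximate weight memory for a quantized model, in decimal GB."""
    return params_billions * 1e9 * (bits_per_param / 8) / 1e9

def allocatable_vram_gb(system_ram_gb: float, share: float = 0.93) -> float:
    """Memory the new driver lets the iGPU claim from system RAM (93% cap)."""
    return system_ram_gb * share

for ram_gb, params_b, model in [(32, 32, "Qwen 2.5 32B"), (64, 70, "Llama 3 70B")]:
    vram = allocatable_vram_gb(ram_gb)
    weights = model_footprint_gb(params_b)
    print(f"{ram_gb} GB RAM -> {vram:.1f} GB iGPU budget | {model} @ 4-bit "
          f"~{weights:.1f} GB weights | ~{vram - weights:.1f} GB headroom")
```

On a 32GB system this leaves roughly 14GB of headroom past the weights, and on a 64GB system roughly 25GB, which is where the article's "comfortable context window" claim comes from.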
[2]
Intel's Latest Drivers Let Users Allocate Up To 93% of System Memory To Arc iGPUs For Wider AI LLM Support
Intel now gives users the ability to allocate up to 93% of system memory to Arc iGPUs, enabling wider AI LLM support. Intel has dropped a new HotFix driver for Arc Pro Graphics, 32.0.101.8517 - Q1.26 R2, which lets users allocate even more system memory to the GPU, ideal for running larger AI LLMs.

Previously, Intel's drivers could allocate up to 87% of system memory to the GPU; with 32 GB of system memory, that meant up to 28 GB. With the newest drivers, you can now allocate up to 93% of system memory. For a 32 GB configuration, that's 30 GB of VRAM allocated to the GPU.

The new driver is now available and works with Intel's Arc Pro iGPUs, such as the Arc Pro B390 and Arc Pro B370, and is also compatible with several Intel Arc Pro GPUs within the Battlemage and Alchemist lines. Although no other changes are mentioned in the driver release, Intel is working towards broader ISV certifications for its Arc Pro GPUs.

This is a higher allocation cap than AMD's Ryzen AI chips, which allow up to 87%, still very useful for running bigger LLMs. On AI MAX+ platforms, you can allocate a massive 112 GB of memory to the GPU while running 128 GB of system memory.
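For reference, the quoted whole-GB figures follow directly from the percentage caps; a quick sketch (the rounding to whole gigabytes mirrors the article's numbers):

```python
# How the allocation caps translate into usable iGPU memory.
# Percentages and RAM sizes are taken from the article.

configs = [
    ("Intel, previous 87% cap",   32,  0.87),
    ("Intel, new 93% cap",        32,  0.93),
    ("AMD AI MAX+ (112 of 128)", 128,  112 / 128),  # ~87.5%
]

for name, ram_gb, share in configs:
    print(f"{name}: {ram_gb} GB x {share:.1%} -> {ram_gb * share:.0f} GB for the iGPU")
```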
Intel released a new driver for Arc Pro GPUs that allows users to allocate up to 93% of system RAM to integrated GPUs, up from the previous 87% limit. This memory allocation breakthrough enables users to run substantially larger Large Language Models on affordable hardware without hitting memory capacity constraints.
Intel has released driver version 32.0.101.8517 for Arc Pro GPUs, introducing a significant capability that allows users to allocate up to 93% of system memory to the integrated GPU [1][2]. This represents a notable increase from the previous 87% limit that Intel established last year with its "Shared GPU Memory Override" feature for Core Ultra Series 2 processors [1]. The driver release specifically targets Arc Pro GPUs including the Arc Pro B390 and Arc Pro B370, while also supporting discrete Arc Pro A- and B-series cards from the Battlemage and Alchemist lineups [1][2].
The expanded system-memory-to-iGPU allocation directly addresses one of the primary bottlenecks in running AI models locally: VRAM capacity. Traditional memory partitioning typically limits a GPU to 50% of system RAM, creating significant constraints for LLM inference tasks [1]. With this Intel driver update, a system equipped with 32GB of RAM can now allocate 30GB to the GPU, providing sufficient memory to run models like Qwen 2.5 32B at 4-bit quantization with a comfortable context window [1][2]. Workstations with 64GB of RAM gain even more capability, able to handle heavyweight Large Language Models like Llama 3 70B while maintaining enough headroom for the KV cache and system stability [1].
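The "headroom for the KV cache" point can be put in numbers with the standard per-token KV-cache formula. The Llama 3 70B shape below (80 layers, 8 KV heads under grouped-query attention, head dimension 128) matches the published architecture, but the fp16-cache assumption and the arithmetic are an illustrative sketch:

```python
# Estimate KV-cache memory for a decoder-only LLM.
# Per token: 2 tensors (K and V) * layers * kv_heads * head_dim * bytes per element.

def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                context_tokens: int, bytes_per_elem: int = 2) -> float:
    per_token_bytes = 2 * layers * kv_heads * head_dim * bytes_per_elem
    return per_token_bytes * context_tokens / 1e9

# Llama 3 70B: 80 layers, 8 KV heads (GQA), head_dim 128; fp16 cache assumed.
for ctx in (8_192, 32_768):
    print(f"{ctx:>6}-token context -> {kv_cache_gb(80, 8, 128, ctx):.1f} GB KV cache")
```

Even a 32K-token context costs only around 11GB here, which comfortably fits in the roughly 25GB left over on a 64GB workstation after the 4-bit weights.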
Intel's approach positions the company aggressively against AMD in the AI inference space. While AMD's Ryzen AI chips currently allow up to 87% memory allocation, AMD's Variable Graphics Memory (VGM) technology in high-end configurations like Strix Halo can allocate 96GB from a 128GB pool to the iGPU [1][2]. On AI MAX+ platforms, users can allocate a massive 112GB of memory to the GPU while running 128GB of system memory [2]. However, memory capacity alone doesn't determine performance. Intel's Core Ultra Series 3 (Panther Lake) chips feature fast LPDDR5X-9600 memory delivering bandwidth around 150 GB/s, while AMD's Strix Halo achieves 256 GB/s through its 256-bit memory bus [1]. Apple Silicon maintains an advantage with the M5 Max offering 614 GB/s memory bandwidth, though Intel and AMD are competing on flexibility and affordability through technologies like LPCAMM2 [1]. Apple's Unified Memory Architecture eliminates traditional partitioning entirely, allowing the entire memory pool to be natively accessible to both CPU and GPU simultaneously [1].
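Those bandwidth figures matter because single-stream LLM decoding is typically memory-bound: each generated token requires streaming roughly the full set of quantized weights from memory, so bandwidth divided by resident model size gives a crude upper bound on decode speed. The sketch below ignores compute, caching, and KV-cache traffic, so real throughput lands below these ceilings:

```python
# Memory-bandwidth ceiling on decode speed: tokens/s <= bandwidth / bytes per token,
# assuming each token reads ~all weights. A rough upper bound, not a benchmark.

model_gb = 16  # e.g. a 32B model quantized to 4-bit

platforms = {
    "Intel Panther Lake (LPDDR5X-9600)": 150,  # GB/s, figures from the article
    "AMD Strix Halo (256-bit bus)":      256,
    "Apple M5 Max":                      614,
}

for name, bw_gbs in platforms.items():
    print(f"{name}: ceiling ~{bw_gbs / model_gb:.0f} tokens/s")
```

At 150 GB/s a 16GB model tops out around 9 tokens/s, while 614 GB/s allows roughly 38, which is why capacity parity alone doesn't close the gap with Apple Silicon.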
Summarized by Navi