Phison aiDAPTIV+ delivers 10X faster AI inference on consumer PCs using NAND flash as memory


Phison showcased aiDAPTIV+ at CES 2026, a software and hardware solution that accelerates AI inference by up to 10 times on consumer PCs. The technology uses NAND flash as memory expansion, enabling systems with just 32GB of DRAM to run AI models that would otherwise require roughly 96GB. Partners including Nvidia, AMD, MSI, and Acer demonstrated the technology, which addresses the memory bottleneck limiting AI deployment on ordinary hardware.

Phison Demonstrates 10X Faster AI Inference with aiDAPTIV+ Technology

At CES 2026, Phison unveiled consumer PCs running AI inference up to ten times faster using its aiDAPTIV+ software and hardware solution. The technology, first introduced as an enterprise proof-of-concept in mid-2024, has now been repositioned to enable AI models on client PCs, dramatically expanding its potential use cases [1]. Partners including Nvidia, AMD, MSI, Acer, Asus, and Corsair demonstrated systems featuring the technology at the event.

Source: Tom's Hardware

Memory Bottleneck for AI Models Drives Innovation

According to Phison CEO Pua Khein Seng, the real constraint in AI isn't compute power but memory capacity. "In AI models, the real bottleneck isn't computing power - it's memory," Pua explained in an exclusive interview. "If you don't have enough memory, the system crashes" [2]. This fundamental limitation affects everything from laptops running local inference to hyperscalers building AI data centers, yet industry focus has remained heavily weighted toward GPU compute capabilities.

Source: TechRadar

NAND Flash Memory Expansion Enables Larger AI Models

The aiDAPTIV+ platform transforms NAND flash into a managed memory tier alongside DRAM, enabling systems to handle far larger AI models than their installed memory would normally permit. When tokens no longer fit into the GPU's key-value (KV) cache during inference, conventional systems evict older KV entries, forcing the GPU to recompute them from scratch when needed again. With aiDAPTIV+, these tokens are instead written to flash and retained for future reuse, dramatically reducing memory requirements [1]. Acer demonstrated running a gpt-oss-120b model on a laptop with just 32GB of memory, a feat that would typically require approximately 96GB using conventional approaches.
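The mechanics of a flash-backed KV tier can be illustrated with a small sketch. The Python below is a toy model under stated assumptions: the TieredKVCache class, the one-file-per-entry layout under a temporary directory, and the least-recently-used eviction policy are illustrative choices, not Phison's firmware or driver design, and real KV entries would be GPU tensors rather than Python lists.

```python
# Toy sketch (not Phison's implementation): a two-tier KV cache where entries
# evicted from a DRAM budget spill to "flash" (here a temp directory) instead of
# being discarded, so a later request can reload them rather than recompute them.
import os
import pickle
import tempfile
from collections import OrderedDict

class TieredKVCache:
    def __init__(self, dram_budget_entries, flash_dir=None):
        self.dram_budget = dram_budget_entries
        self.dram = OrderedDict()                      # hot tier: key -> KV data
        self.flash_dir = flash_dir or tempfile.mkdtemp(prefix="kv_flash_")

    def _flash_path(self, key):
        return os.path.join(self.flash_dir, f"{key}.kv")

    def put(self, key, kv):
        self.dram[key] = kv
        self.dram.move_to_end(key)
        # Spill the least recently used entry to flash once DRAM is over budget.
        while len(self.dram) > self.dram_budget:
            old_key, old_kv = self.dram.popitem(last=False)
            with open(self._flash_path(old_key), "wb") as f:
                pickle.dump(old_kv, f)

    def get(self, key, recompute_fn):
        if key in self.dram:                           # hit in DRAM
            self.dram.move_to_end(key)
            return self.dram[key]
        path = self._flash_path(key)
        if os.path.exists(path):                       # hit in flash: reload, no recompute
            with open(path, "rb") as f:
                kv = pickle.load(f)
        else:                                          # true miss: recompute from scratch
            kv = recompute_fn(key)
        self.put(key, kv)
        return kv

if __name__ == "__main__":
    cache = TieredKVCache(dram_budget_entries=2)
    for t in range(5):
        cache.put(t, {"k": [t] * 4, "v": [t] * 4})     # tokens 0..2 spill to flash
    kv = cache.get(0, recompute_fn=lambda t: {"k": [], "v": []})
    print("token 0 served from flash:", kv["k"])
```

The design point the sketch captures is that a flash miss costs a read rather than a full prefill recomputation, which is why the approach pays off even though flash is far slower than DRAM.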

Time to First Token Improvements Address User Experience

Phison's intelligent storage solution significantly improves Time to First Token, the critical delay between submitting a prompt and seeing the first output. "If you ask your device something and have to wait 60 seconds for the first word, would you wait?" Pua asked. "When I ask something, I can wait two seconds. But if it takes 10 seconds, users will think it's garbage" [2]. The technology stores frequently used KV cache in SSDs, allowing quick retrieval when users repeat or revisit queries rather than recomputing everything from scratch.
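The effect on Time to First Token comes down to simple arithmetic: reused prompt tokens cost an SSD read instead of a prefill pass. The figures in the sketch below are invented round numbers for illustration, not measured aiDAPTIV+ performance.

```python
# Toy illustration (assumed costs, not measurements): time to first token when a
# prompt's KV cache must be recomputed versus when it can be streamed back from SSD.
RECOMPUTE_MS_PER_TOKEN = 2.0   # hypothetical prefill cost per prompt token
SSD_READ_MS_PER_TOKEN = 0.2    # hypothetical cost to reload cached KV from flash

def time_to_first_token(prompt_tokens, cached_tokens=0):
    """Return a rough TTFT estimate in milliseconds."""
    reused = min(cached_tokens, prompt_tokens)
    recomputed = prompt_tokens - reused
    return reused * SSD_READ_MS_PER_TOKEN + recomputed * RECOMPUTE_MS_PER_TOKEN

if __name__ == "__main__":
    prompt = 8000  # e.g. a long document pasted into the chat, in tokens
    print(f"cold prompt  : {time_to_first_token(prompt) / 1000:.1f} s")
    print(f"cached prompt: {time_to_first_token(prompt, cached_tokens=prompt) / 1000:.1f} s")
```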

Enterprise AI Infrastructure Through Infinitix Partnership

Phison has partnered with AI infrastructure management software provider Infinitix to integrate aiDAPTIV+ with the AI-Stack platform, delivering an enterprise-grade solution for AI training and inference that unifies hardware and software [3]. Built on a Kubernetes-native architecture, AI-Stack integrates GPU partitioning, aggregation, and cross-node computing with full support for Nvidia and AMD GPUs. The partnership brings storage-layer capabilities into AI infrastructure scheduling, allowing enterprises to integrate heterogeneous compute, memory, and storage resources. Infinitix CEO WenYu Chen noted that AI has entered a phase where "the priority is no longer raw compute power, but how efficiently that power is managed, scaled, and converted into business value" [3].

Source: DIGITIMES

Implications for Consumer PCs and Small Businesses

The technology is especially relevant for Mixture of Experts models and agentic AI workloads, where a 120-billion-parameter MoE model can be handled with 32GB of DRAM compared to the roughly 96GB required conventionally [1]. This capability brings large-model inference and limited training to consumer PCs with entry-level or even integrated GPUs, potentially valuable for developers and small businesses unable to make substantial AI investments. The implementation involves AI-aware SSDs based on advanced Phison controllers, special firmware, and software, making deployment straightforward for PC makers and value-added resellers targeting premium models for developers and power users.
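A crude back-of-envelope on the article's own figures shows what the flash tier has to absorb. The calculation folds weights, KV cache, and runtime overhead into one number and assumes the flash portion is simply whatever does not fit in DRAM; both are simplifications for illustration.

```python
# Back-of-envelope arithmetic using only the figures quoted above.
conventional_gb = 96      # approximate conventional footprint cited for the 120B workload
dram_gb = 32              # DRAM installed in the demonstrated laptop
params_billion = 120

bits_per_param = conventional_gb * 8 / params_billion   # GB and billions cancel out
offloaded_gb = conventional_gb - dram_gb                # assumed to live on flash

print(f"~{bits_per_param:.1f} bits/parameter implied by the 96 GB figure")
print(f"~{offloaded_gb} GB that the flash tier would need to hold")
```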

Rethinking GPU Economics and HBM Constraints

Pua's memory-first perspective extends to how organizations build GPU servers, noting that many companies buy extra GPUs primarily to aggregate VRAM rather than for compute throughput. "Without our solution, people buy multiple GPU cards primarily to aggregate memory, not for compute power," he explained. "Most of those expensive GPUs end up idle because they're just being used for their memory" [2]. By using high-speed SSDs and intelligent memory expansion to overcome HBM and GDDR constraints, aiDAPTIV+ allows GPUs to be purchased and scaled for compute instead, letting enterprises deploy large language model training and inference without fully investing in high-end HBM GPUs [3]. Pua further argues that cloud service providers' profits now track storage capacity: they have invested over $200 billion in GPUs, but their revenue comes from inference, which requires massive data storage.
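The "GPUs bought for memory, not compute" point is also just arithmetic. The sketch below uses hypothetical round numbers for VRAM capacity and model footprint, not vendor specifications or Phison's figures.

```python
# Rough illustration of buying GPUs to aggregate memory rather than compute.
import math

model_footprint_gb = 640   # assumed working set for a large LLM (weights + KV cache)
vram_per_gpu_gb = 80       # assumed HBM capacity per accelerator
gpus_for_compute = 2       # assumed number whose FLOPs alone would cover the workload

gpus_for_memory = math.ceil(model_footprint_gb / vram_per_gpu_gb)
print(f"GPUs needed just to hold the model in HBM: {gpus_for_memory}")
print(f"GPUs that are effectively idle compute:    {gpus_for_memory - gpus_for_compute}")
```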
