3 Sources
[1]
Phison demos 10X faster AI inference on consumer PCs with software and hardware combo that enables 3x larger AI models -- Nvidia, AMD, MSI, and Acer systems demoed with aiDAPTIV+
At CES 2026, Phison demonstrated consumer PCs with its aiDAPTIV+ software/hardware combo running AI inference up to ten times faster than without its specialized suite of technologies. When Phison introduced aiDAPTIV+ in mid-2024, it essentially turned NAND flash into a managed memory tier alongside DRAM so that large AI models could train or run on systems without enough DDR5 and/or HBM memory, but at the time it was merely a proof-of-concept aimed at enterprises. By early 2026, the positioning has changed: Phison now sees the technology as an enabler of AI inference on client PCs, which broadly expands its use cases.

Normally, when tokens no longer fit into the GPU's key-value (KV) cache during inference, older KV entries are evicted, so if the model needs those tokens again (as in long-context or agentic loops), the GPU must recompute them from scratch, which makes inference inefficient on systems with limited memory capacity. On a system equipped with Phison's aiDAPTIV+ stack, tokens that no longer fit into the GPU's KV cache are instead written to flash and retained for reuse, which reduces memory requirements in many cases and dramatically improves the time to first token, the time it takes to produce the first word of a response.

The renewed focus of Phison's aiDAPTIV+ platform is to let ordinary PCs with entry-level or even integrated GPUs handle far larger AI models than their installed DRAM would normally permit. Bringing large-model inference and limited training to desktops and notebooks may be valuable for developers and small businesses that cannot afford big AI investments at the moment, so Phison lined up aiDAPTIV+ testing partners showing systems with the technology at CES 2026, including Acer, Asus, Corsair, Emdoor, MSI, and even Nvidia. For example, Acer managed to run a gpt-oss-120b model on a laptop with just 32GB of memory, which opens the door to a number of applications.

According to Phison's internal testing, aiDAPTIV+ can accelerate inference response times by up to 10 times, as well as reduce power consumption and improve time to first token on notebook PCs. The larger the model and the longer the context, the higher the gain, so the technology is especially relevant for Mixture of Experts (MoE) models and agentic AI workloads. Phison claims that a 120-billion-parameter MoE model can be handled with 32GB of DRAM, compared with roughly 96GB required by conventional approaches, because inactive parameters are kept in flash rather than resident in main memory.

Given that Phison's aiDAPTIV+ stack consists of an AI-aware SSD (or SSDs) based on an advanced Phison controller, special firmware, and software, implementing the technology should be fairly straightforward. This matters for PC makers, value-added resellers, and small businesses interested in the capability, so it is reasonable to expect a number of them to adopt it in premium models aimed at developers and power users. For Phison, that means broader use of its controllers as well as added revenue from selling the aiDAPTIV+ stack to partners.
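Neither Phison nor the article spells out how the flash tier is managed internally, but the spill-instead-of-evict idea is easy to picture. The following is a minimal Python sketch under assumed simplifications (a plain LRU policy, pickled entries, and invented names such as FlashBackedKVCache); it is not Phison's implementation or API.

```python
import os
import pickle
from collections import OrderedDict


class FlashBackedKVCache:
    """Toy two-tier KV cache: hot entries stay in DRAM, evicted entries spill
    to an SSD-backed directory instead of being discarded.

    Illustrative only -- not Phison's aiDAPTIV+ implementation.
    """

    def __init__(self, spill_dir: str, dram_capacity: int = 1024):
        self.dram = OrderedDict()           # key -> KV entry (any picklable object)
        self.dram_capacity = dram_capacity  # max entries held in DRAM
        self.spill_dir = spill_dir
        os.makedirs(spill_dir, exist_ok=True)

    def _spill_path(self, key: str) -> str:
        return os.path.join(self.spill_dir, f"{key}.kv")

    def put(self, key: str, kv_entry) -> None:
        self.dram[key] = kv_entry
        self.dram.move_to_end(key)
        # Spill the least-recently-used entry to flash instead of dropping it.
        while len(self.dram) > self.dram_capacity:
            old_key, old_entry = self.dram.popitem(last=False)
            with open(self._spill_path(old_key), "wb") as f:
                pickle.dump(old_entry, f)

    def get(self, key: str):
        if key in self.dram:                 # DRAM hit
            self.dram.move_to_end(key)
            return self.dram[key]
        path = self._spill_path(key)
        if os.path.exists(path):             # flash hit: reload, no recompute
            with open(path, "rb") as f:
                entry = pickle.load(f)
            self.put(key, entry)
            return entry
        return None                          # miss: caller must recompute
```

On a long-context or agentic workload, a hit in the flash tier turns a full prefill recomputation into an SSD read, which is where the time-to-first-token gains would come from.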
[2]
'In AI models, the real bottleneck isn't computing power -- it's memory': Phison CEO on 244TB SSDs, PLC NAND, why high-bandwidth flash isn't a good idea, and why CSP profit goes hand in hand with storage capacity
In our exclusive interview, Phison CEO Pua Khein Seng also told us where AI actually makes its money.

The technology industry increasingly talks about GPUs being central to AI infrastructure, but the limiting factor that decides which models you can run is actually memory. In a wide-ranging interview, Phison CEO Pua Khein Seng, who invented the world's first single-chip USB flash drive, told TechRadar Pro that the focus on compute has distracted from a more basic constraint that shows up everywhere, from laptops running local inference to hyperscalers building AI data centers. "In AI models, the real bottleneck isn't computing power - it's memory," Pua said. "If you don't have enough memory, the system crashes."

That thinking is behind Phison's aiDAPTIV+ work, which the company discussed publicly at CES 2026 and which is essentially a way to extend AI processing to integrated GPU systems by using NAND flash as a memory pool. Pua describes it as using SSD capacity to compensate for DRAM limits and keep GPUs focused on compute instead of waiting on memory. "Our invention uses SSDs as a complement to DRAM memory," he says. "We use this as memory expansion."

A practical goal is improving responsiveness during inference, especially Time to First Token (TTFT), the delay between submitting a prompt and seeing the first output. Pua argues that a long TTFT makes local AI feel broken, even when the model eventually completes the task. "If you ask your device something and have to wait 60 seconds for the first word, would you wait?" he says. "When I ask something, I can wait two seconds. But if it takes 10 seconds, users will think it's garbage."

Pua links TTFT improvements to better reuse of memory-heavy inference data, particularly the KV cache, comparing the status quo to a doctor repeating the same instructions to every patient because nothing is saved between visits. "In AI inference, there's something called KV cache - it's like cookies in web browsing," he explained. "Most systems don't have enough DRAM, so every time you ask the same question, it has to recompute everything." Phison's approach, Pua added, is to "store frequently used cache in the storage" so the system can retrieve it quickly when a user repeats or revisits a query.

That memory-first framing extends beyond laptops into how companies build GPU servers. Pua notes that many organizations buy extra GPUs not for compute throughput, but to collect more VRAM, which leads to wasted silicon. "Without our solution, people buy multiple GPU cards primarily to aggregate memory, not for compute power," he adds. "Most of those expensive GPUs end up idle because they're just being used for their memory." If SSDs can provide a larger memory pool, Pua says, GPUs can be bought and scaled for compute instead. "Once you have enough memory, then you can focus on compute speed," he notes. "If one GPU is slow, you can add two, four, or eight GPUs to improve computing power."

From there, Pua widened the lens to the economics of hyperscalers and AI infrastructure, describing the current wave of GPU spending as necessary but incomplete, because the business case for AI depends on inference, and inference depends on data storage. "CSPs have invested over $200 billion in GPUs," he says. "They're not making money directly from GPUs. The revenue comes from inference, which requires massive data storage." He summarized the situation with a line he returned to repeatedly: "CSP profit equals storage capacity."
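TTFT itself is easy to measure against any runtime that streams tokens as they are produced. A minimal sketch, assuming a hypothetical generate_stream(prompt) generator (the name is illustrative, not a specific library's API):

```python
import time
from typing import Iterator, Tuple


def time_to_first_token(stream: Iterator[str]) -> Tuple[float, str]:
    """Measure TTFT for any streaming generator that yields decoded tokens.

    Returns (seconds until the first token arrived, the full response text).
    """
    start = time.perf_counter()
    first_token_latency = None
    tokens = []
    for tok in stream:
        if first_token_latency is None:
            first_token_latency = time.perf_counter() - start  # the TTFT metric
        tokens.append(tok)
    if first_token_latency is None:          # the stream produced nothing
        first_token_latency = float("inf")
    return first_token_latency, "".join(tokens)


# Usage (generate_stream is a placeholder for whatever streaming API the
# local runtime exposes -- not a specific library call):
# ttft, text = time_to_first_token(generate_stream("Summarize this document"))
```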
That argument also feeds into Phison's push toward extreme-capacity enterprise SSDs. The company has announced a 244TB model, and Pua told us, "Our current 122TB drive uses our X2 controller with 16-layer NAND stacking. To reach 244TB, we simply need 32-layer stacking. The design is complete, but the challenge is manufacturing yield." He also outlined an alternative route: higher-density NAND dies. "We're waiting for 4Tb NAND dies, with those, we could achieve 244TB with just 16 layers," he said, adding that timing would depend on manufacturing maturity.

On PLC NAND, Pua was clear that Phison doesn't control when it arrives, but he told us he intends to support it once manufacturers can ship it reliably. "PLC is five-bit NAND, that's primarily a NAND manufacturer decision, not ours," he said. "When NAND companies mature their PLC technology, our SSD designs will be ready to support it."

He was more skeptical about a different storage trend: tying flash directly into GPU-style memory stacks, sometimes discussed under labels like high-bandwidth flash. Pua argued the endurance mismatch creates a nasty failure mode. "The challenge with integrating NAND directly with GPUs is the write cycle limitation," he said. "NAND has finite program/erase cycles. If you integrate them, when the NAND reaches end-of-life, you have to discard the entire expensive GPU card." Phison's preferred model is modular: "keeping SSDs as replaceable, plug-and-play components. When an SSD wears out, you simply replace it while keeping the expensive GPU."

Taken together, Pua's view of the AI hardware future is less about chasing ever-larger GPUs and more about building systems where memory capacity is cheap, scalable, and replaceable. Whether the target is local inference on an integrated GPU or rack-scale inference in a hyperscaler, the company is betting that storage density and memory expansion will decide what's practical long before another jump in compute does.
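Both routes to 244TB come down to proportional scaling. A minimal sketch of the arithmetic, assuming (as the interview implies) that today's roughly 122TB drive uses 2Tb-class dies in 16-die stacks, and holding package count and overprovisioning constant:

```python
def scaled_capacity_tb(base_tb: float, die_density_tb: float, dies_per_stack: int,
                       base_density_tb: float = 2.0, base_stack: int = 16) -> float:
    """Scale a known drive capacity by NAND die density and stack height.

    Assumes capacity is proportional to (bits per die) x (dies per stack),
    with package count and overprovisioning held constant -- a simplification.
    """
    return base_tb * (die_density_tb / base_density_tb) * (dies_per_stack / base_stack)


base = 122.0  # today's drive: 16-die stacks, assumed 2Tb-class dies
print(scaled_capacity_tb(base, die_density_tb=2.0, dies_per_stack=32))  # 244.0 via 32-die stacking
print(scaled_capacity_tb(base, die_density_tb=4.0, dies_per_stack=16))  # 244.0 via 4Tb dies, 16-die stacks
```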
[3]
Phison, Infinitix build enterprise AI infrastructure stack
Demand for high-performance compute and storage for AI training and inference continues to climb. Phison has partnered with AI infrastructure management software provider Infinitix to integrate its aiDAPTIV+ intelligent storage technology with Infinitix's AI-Stack platform, delivering an enterprise-grade AI training and inference solution that unifies hardware and software.

Phison said the collaboration uses high-speed SSDs and intelligent memory expansion to overcome the hardware constraints of traditional HBM and GDDR. By integrating aiDAPTIV+ with AI-Stack, enterprises can incorporate hardware acceleration into AI workload scheduling in Kubernetes-native environments, enabling end-to-end performance optimisation from model training to inference deployment.

Infinitix CEO WenYu Chen said AI has entered a phase of large-scale adoption driven by architectural and platform capabilities, where the priority is no longer raw compute power, but how efficiently that power is managed, scaled, and converted into business value. The partnership brings storage-layer capabilities into AI infrastructure scheduling, allowing enterprises to integrate heterogeneous compute, memory, and storage resources in Kubernetes-native environments. This enables AI data centres to deploy large-scale model training and inference with more flexible and cost-efficient architectures, supporting scalable, enterprise-class AI platforms.

Phison CEO KS Pua said AI is rapidly shifting from single-GPU computing toward system-level architectures spanning multiple nodes and resources. With aiDAPTIV+, Phison incorporates the NAND storage layer into AI memory and compute architectures, redefining how AI systems scale. Through AI-Stack's native scheduling capabilities, NAND storage, memory, and compute resources can operate in coordination across enterprise environments.

Built on a Kubernetes-native architecture, AI-Stack integrates GPU partitioning, aggregation, and cross-node computing, with full support for Nvidia and AMD GPUs. It enables unified management of conventional GPU servers and Phison's aiDAPTIV+ nodes on a single platform. With multi-tenant access control, automated scheduling, centralised monitoring, and billing mechanisms, the platform reduces the complexity of AI infrastructure governance and operations. Enterprises can deploy large language model training and inference without fully investing in high-end HBM GPUs.

Phison said the two companies will continue to deepen cooperation across AI, intelligent storage, and cloud operations to support efficient, scalable data infrastructure.
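DIGITIMES does not describe the scheduling interface AI-Stack exposes, so the following is only a rough illustration of what bringing storage-layer capabilities into Kubernetes-native scheduling could look like in practice: a pod manifest (written here as a Python dict) that pins an inference workload to a flash-expanded node and requests an extended resource for the flash tier. The example.com/aidaptiv-node label and example.com/flash-memory resource name are invented for illustration; only nvidia.com/gpu is a standard Kubernetes device-plugin resource name.

```python
import json

# Hypothetical pod spec steering an inference job onto a flash-expanded node.
# The "example.com/..." label and extended-resource name are invented; they
# are not documented Phison, Infinitix, or Kubernetes APIs.
pod_manifest = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "llm-inference"},
    "spec": {
        "nodeSelector": {"example.com/aidaptiv-node": "true"},
        "containers": [
            {
                "name": "inference",
                "image": "registry.example.com/llm-serving:latest",
                "resources": {
                    "limits": {
                        "nvidia.com/gpu": "1",            # standard GPU device-plugin resource
                        "example.com/flash-memory": "64Gi",  # invented resource for the flash tier
                    }
                },
            }
        ],
    },
}

print(json.dumps(pod_manifest, indent=2))
```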
Phison showcased aiDAPTIV+ at CES 2026, a software and hardware solution that accelerates AI inference by up to 10 times on consumer PCs. The technology uses NAND flash as memory expansion, enabling systems with just 32GB of DRAM to run AI models requiring 96GB through conventional approaches. Partners including Nvidia, AMD, MSI, and Acer demonstrated the technology, which addresses the memory bottleneck limiting AI deployment on ordinary hardware.
At CES 2026, Phison unveiled consumer PCs running AI inference up to ten times faster using its aiDAPTIV+ software and hardware solution. The technology, first introduced as an enterprise proof-of-concept in mid-2024, has now been repositioned to enable AI models on client PCs, dramatically expanding its potential use cases
1
. Partners including Nvidia, AMD, MSI, Acer, Asus, and Corsair demonstrated systems featuring the technology at the event.
Source: Tom's Hardware
According to Phison CEO Pua Khein Seng, the real constraint in AI isn't compute power but memory capacity. "In AI models, the real bottleneck isn't computing power - it's memory," Pua explained in an exclusive interview. "If you don't have enough memory, the system crashes"
2
. This fundamental limitation affects everything from laptops running local inference to hyperscalers building AI data centers, yet industry focus has remained heavily weighted toward GPU compute capabilities.
Source: TechRadar
The aiDAPTIV+ platform transforms NAND flash into a managed memory tier alongside DRAM, enabling systems to handle far larger AI models than their installed memory would normally permit. When tokens no longer fit into the GPU's key-value (KV) cache during inference, conventional systems evict older KV entries, forcing the GPU to recompute them from scratch when needed again. With aiDAPTIV+, these tokens are instead written to flash and retained for future reuse, dramatically reducing memory requirements
1
. Acer successfully demonstrated running a gpt-oss-120b model on a laptop with just 32GB of memory, a feat that would typically require approximately 96GB using conventional approaches. Phison's intelligent storage solution significantly improves Time to First Token, the critical delay between submitting a prompt and seeing the first output. "If you ask your device something and have to wait 60 seconds for the first word, would you wait?" Pua asked. "When I ask something, I can wait two seconds. But if it takes 10 seconds, users will think it's garbage"
2
. The technology stores frequently used KV cache in SSDs, allowing quick retrieval when users repeat or revisit queries rather than recomputing everything from scratch. Phison has partnered with AI infrastructure management software provider Infinitix to integrate aiDAPTIV+ with the AI-Stack platform, delivering an enterprise-grade solution for AI training and inference that unifies hardware and software
3
. Built on a Kubernetes-native architecture, AI-Stack integrates GPU partitioning, aggregation, and cross-node computing with full support for Nvidia and AMD GPUs. The partnership brings storage-layer capabilities into AI infrastructure scheduling, allowing enterprises to integrate heterogeneous compute, memory, and storage resources. Infinitix CEO WenYu Chen noted that AI has entered a phase where "the priority is no longer raw compute power, but how efficiently that power is managed, scaled, and converted into business value"3
.
Source: DIGITIMES
The technology is especially relevant for Mixture of Experts models and agentic AI workloads: a 120-billion-parameter MoE model can be handled with 32GB of DRAM compared with the roughly 96GB required conventionally, because inactive experts stay on flash rather than in main memory (a rough sketch of this pattern appears at the end of this summary)
1
. This capability brings large-model inference and limited training to consumer PCs with entry-level or even integrated GPUs, potentially valuable for developers and small businesses unable to make substantial AI investments. The implementation involves AI-aware SSDs based on advanced Phison controllers, special firmware, and software, making deployment straightforward for PC makers and value-added resellers targeting premium models for developers and power users. Pua's memory-first perspective extends to how organizations build GPU servers: many companies buy extra GPUs primarily to aggregate VRAM rather than for compute throughput. "Without our solution, people buy multiple GPU cards primarily to aggregate memory, not for compute power," he explained. "Most of those expensive GPUs end up idle because they're just being used for their memory"
2
. By using high-speed SSDs and intelligent memory expansion to overcome HBM and GDDR constraints, aiDAPTIV+ allows GPUs to be purchased and scaled for compute instead, with enterprises able to deploy large language model training and inference without fully investing in high-end HBM GPUs3
. Pua argues that CSP profit equals storage capacity, as cloud service providers have invested over $200 billion in GPUs but generate revenue from inference, which requires massive data storage.
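As referenced above, the 32GB-versus-96GB claim for a 120-billion-parameter MoE model rests on keeping inactive experts on flash. Neither source shows how that residency is managed, so the following is a minimal, hypothetical sketch of the general pattern using memory-mapped weight files; the file layout, naming, and the unweighted averaging of expert outputs are all simplifications for illustration, not Phison's mechanism.

```python
import os
import numpy as np


class FlashExpertStore:
    """Toy MoE expert offloading: each expert's weight matrix lives in a .npy
    file on the SSD and is memory-mapped, so only the experts the router
    actually selects are paged into DRAM.

    Illustrative only -- not how aiDAPTIV+ manages parameter residency.
    """

    def __init__(self, expert_dir: str):
        self.expert_dir = expert_dir   # directory of pre-exported expert weights

    def expert(self, expert_id: int) -> np.ndarray:
        # mmap_mode="r" keeps the weights on flash; pages are read in on access.
        path = os.path.join(self.expert_dir, f"expert_{expert_id:04d}.npy")
        return np.load(path, mmap_mode="r")

    def forward(self, x: np.ndarray, routed_ids) -> np.ndarray:
        # Only the handful of experts routed to for this token touch DRAM;
        # the rest of the model's parameters stay on the SSD.
        outputs = [self.expert(eid) @ x for eid in routed_ids]
        return np.mean(outputs, axis=0)  # stand-in for the router's weighted sum
```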