3 Sources
[1]
Phison demos 10X faster AI inference on consumer PCs with software and hardware combo that enables 3x larger AI models -- Nvidia, AMD, MSI, and Acer systems demoed with aiDAPTIV+
At CES 2026, Phison demonstrated consumer PCs with its aiDAPTIV+ software/hardware combo running AI inference up to ten times faster than without its specialized suite of technologies. When Phison introduced aiDAPTIV+ in mid-2024, it essentially turned NAND flash into a managed memory tier alongside DRAM so that large AI models could train or run on systems without enough DDR5 and/or HBM memory, but at the time it was merely a proof-of-concept aimed at enterprises. By early 2026, the positioning has changed: Phison now sees the technology as an enabler of AI inference on client PCs, which broadly expands its use cases.

Normally, when tokens no longer fit into the GPU's key-value (KV) cache during inference, older KV entries are evicted, so if the model needs those tokens again (as in long-context or agentic loops), the GPU must recompute them from scratch, which makes inference inefficient on systems with limited memory capacity. On a system equipped with Phison's aiDAPTIV+ stack, tokens that no longer fit into the GPU's KV cache are instead written to flash and retained for reuse, which reduces memory requirements in many cases and dramatically improves the time to first token, the time it takes to produce the first word of a response.

The renewed focus of Phison's aiDAPTIV+ platform is to let ordinary PCs with entry-level or even integrated GPUs handle far larger AI models than their installed DRAM would normally permit. Bringing large-model inference and limited training to desktops and notebooks may be valuable for developers and small businesses that cannot afford big AI investments at the moment, so Phison lined up aiDAPTIV+ testing partners showing systems with the technology at CES 2026, including Acer, Asus, Corsair, Emdoor, MSI, and even Nvidia. For example, Acer managed to run a gpt-oss-120b model on a laptop with just 32GB of memory, which opens the door to a number of applications.

According to Phison's internal testing, aiDAPTIV+ can accelerate inference response times by up to 10 times, as well as reduce power consumption and improve time to first token on notebook PCs. The larger the model and the longer the context, the higher the gain, so the technology is especially relevant for Mixture of Experts (MoE) models and agentic AI workloads. Phison claims that a 120-billion-parameter MoE model can be handled with 32GB of DRAM, compared with roughly 96GB required by conventional approaches, because inactive parameters are kept in flash rather than resident in main memory.

Given that Phison's aiDAPTIV+ stack consists of an AI-aware SSD (or SSDs) based on an advanced Phison controller, special firmware, and software, implementing the technology should be fairly straightforward. This matters for PC makers, value-added resellers, and small businesses interested in the capability, so it is reasonable to expect a number of them to adopt it in premium models aimed at developers and power users. For Phison, that means broader use of its controllers as well as added revenue from selling the aiDAPTIV+ stack to partners.
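Neither Phison nor the article spells out how the flash tier is managed internally, but the spill-instead-of-evict idea is easy to picture. The following is a minimal Python sketch under assumed simplifications (a plain LRU policy, pickled entries, and invented names such as FlashBackedKVCache); it is not Phison's implementation or API.

```python
import os
import pickle
from collections import OrderedDict


class FlashBackedKVCache:
    """Toy two-tier KV cache: hot entries stay in DRAM, evicted entries spill
    to an SSD-backed directory instead of being discarded.

    Illustrative only -- not Phison's aiDAPTIV+ implementation.
    """

    def __init__(self, spill_dir: str, dram_capacity: int = 1024):
        self.dram = OrderedDict()           # key -> KV entry (any picklable object)
        self.dram_capacity = dram_capacity  # max entries held in DRAM
        self.spill_dir = spill_dir
        os.makedirs(spill_dir, exist_ok=True)

    def _spill_path(self, key: str) -> str:
        return os.path.join(self.spill_dir, f"{key}.kv")

    def put(self, key: str, kv_entry) -> None:
        self.dram[key] = kv_entry
        self.dram.move_to_end(key)
        # Spill the least-recently-used entry to flash instead of dropping it.
        while len(self.dram) > self.dram_capacity:
            old_key, old_entry = self.dram.popitem(last=False)
            with open(self._spill_path(old_key), "wb") as f:
                pickle.dump(old_entry, f)

    def get(self, key: str):
        if key in self.dram:                 # DRAM hit
            self.dram.move_to_end(key)
            return self.dram[key]
        path = self._spill_path(key)
        if os.path.exists(path):             # flash hit: reload, no recompute
            with open(path, "rb") as f:
                entry = pickle.load(f)
            self.put(key, entry)
            return entry
        return None                          # miss: caller must recompute
```

On a long-context or agentic workload, a hit in the flash tier turns a full prefill recomputation into an SSD read, which is where the time-to-first-token gains would come from.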
[2]
'In AI models, the real bottleneck isn't computing power -- it's memory': Phison CEO on 244TB SSDs, PLC NAND, why high-bandwidth flash isn't a good idea, and why CSP profit goes hand in hand with storage capacity
In our exclusive interview, Phison CEO Pua Khein Seng also told us where AI actually makes its money.

The technology industry increasingly talks about GPUs being central to AI infrastructure, but the limiting factor that decides which models you can run is actually memory. In a wide-ranging interview, Phison CEO Pua Khein Seng, who invented the world's first single-chip USB flash drive, told TechRadar Pro that the focus on compute has distracted from a more basic constraint that shows up everywhere, from laptops running local inference to hyperscalers building AI data centers. "In AI models, the real bottleneck isn't computing power - it's memory," Pua said. "If you don't have enough memory, the system crashes."

That thinking is behind Phison's aiDAPTIV+ work, which the company discussed publicly at CES 2026 and which is essentially a way to extend AI processing to integrated GPU systems by using NAND flash as a memory pool. Pua describes it as using SSD capacity to compensate for DRAM limits and keep GPUs focused on compute instead of waiting on memory. "Our invention uses SSDs as a complement to DRAM memory," he says. "We use this as memory expansion."

A practical goal is improving responsiveness during inference, especially Time to First Token (TTFT), the delay between submitting a prompt and seeing the first output. Pua argues that a long TTFT makes local AI feel broken, even when the model eventually completes the task. "If you ask your device something and have to wait 60 seconds for the first word, would you wait?" he says. "When I ask something, I can wait two seconds. But if it takes 10 seconds, users will think it's garbage."

Pua links TTFT improvements to better reuse of memory-heavy inference data, particularly the KV cache, comparing the status quo to a doctor repeating the same instructions to every patient because nothing is saved between visits. "In AI inference, there's something called KV cache - it's like cookies in web browsing," he explained. "Most systems don't have enough DRAM, so every time you ask the same question, it has to recompute everything." Phison's approach, Pua added, is to "store frequently used cache in the storage" so the system can retrieve it quickly when a user repeats or revisits a query.

That memory-first framing extends beyond laptops into how companies build GPU servers. Pua notes that many organizations buy extra GPUs not for compute throughput, but to collect more VRAM, which leads to wasted silicon. "Without our solution, people buy multiple GPU cards primarily to aggregate memory, not for compute power," he adds. "Most of those expensive GPUs end up idle because they're just being used for their memory." If SSDs can provide a larger memory pool, Pua says, GPUs can be bought and scaled for compute instead. "Once you have enough memory, then you can focus on compute speed," he notes. "If one GPU is slow, you can add two, four, or eight GPUs to improve computing power."

From there, Pua widened the lens to the economics of hyperscalers and AI infrastructure, describing the current wave of GPU spending as necessary but incomplete, because the business case for AI depends on inference, and inference depends on data storage. "CSPs have invested over $200 billion in GPUs," he says. "They're not making money directly from GPUs. The revenue comes from inference, which requires massive data storage." He summarized the situation with a line he returned to repeatedly: "CSP profit equals storage capacity."
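TTFT itself is easy to measure against any runtime that streams tokens as they are produced. A minimal sketch, assuming a hypothetical generate_stream(prompt) generator (the name is illustrative, not a specific library's API):

```python
import time
from typing import Iterator, Tuple


def time_to_first_token(stream: Iterator[str]) -> Tuple[float, str]:
    """Measure TTFT for any streaming generator that yields decoded tokens.

    Returns (seconds until the first token arrived, the full response text).
    """
    start = time.perf_counter()
    first_token_latency = None
    tokens = []
    for tok in stream:
        if first_token_latency is None:
            first_token_latency = time.perf_counter() - start  # the TTFT metric
        tokens.append(tok)
    if first_token_latency is None:          # the stream produced nothing
        first_token_latency = float("inf")
    return first_token_latency, "".join(tokens)


# Usage (generate_stream is a placeholder for whatever streaming API the
# local runtime exposes -- not a specific library call):
# ttft, text = time_to_first_token(generate_stream("Summarize this document"))
```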
That argument also feeds into Phison's push toward extreme-capacity enterprise SSDs. The company has announced a 244TB model, and Pua told us, "Our current 122TB drive uses our X2 controller with 16-layer NAND stacking. To reach 244TB, we simply need 32-layer stacking. The design is complete, but the challenge is manufacturing yield." He also outlined an alternative route: higher-density NAND dies. "We're waiting for 4Tb NAND dies, with those, we could achieve 244TB with just 16 layers," he said, adding that timing would depend on manufacturing maturity.

On PLC NAND, Pua was clear that Phison doesn't control when it arrives, but he told us he intends to support it once manufacturers can ship it reliably. "PLC is five-bit NAND, that's primarily a NAND manufacturer decision, not ours," he said. "When NAND companies mature their PLC technology, our SSD designs will be ready to support it."

He was more skeptical about a different storage trend: tying flash directly into GPU-style memory stacks, sometimes discussed under labels like high-bandwidth flash. Pua argued the endurance mismatch creates a nasty failure mode. "The challenge with integrating NAND directly with GPUs is the write cycle limitation," he said. "NAND has finite program/erase cycles. If you integrate them, when the NAND reaches end-of-life, you have to discard the entire expensive GPU card." Phison's preferred model is modular: "keeping SSDs as replaceable, plug-and-play components. When an SSD wears out, you simply replace it while keeping the expensive GPU."

Taken together, Pua's view of the AI hardware future is less about chasing ever-larger GPUs and more about building systems where memory capacity is cheap, scalable, and replaceable. Whether the target is local inference on an integrated GPU or rack-scale inference in a hyperscaler, the company is betting that storage density and memory expansion will decide what's practical long before another jump in compute does.
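Both routes to 244TB come down to proportional scaling. A minimal sketch of the arithmetic, assuming (as the interview implies) that today's roughly 122TB drive uses 2Tb-class dies in 16-die stacks, and holding package count and overprovisioning constant:

```python
def scaled_capacity_tb(base_tb: float, die_density_tb: float, dies_per_stack: int,
                       base_density_tb: float = 2.0, base_stack: int = 16) -> float:
    """Scale a known drive capacity by NAND die density and stack height.

    Assumes capacity is proportional to (bits per die) x (dies per stack),
    with package count and overprovisioning held constant -- a simplification.
    """
    return base_tb * (die_density_tb / base_density_tb) * (dies_per_stack / base_stack)


base = 122.0  # today's drive: 16-die stacks, assumed 2Tb-class dies
print(scaled_capacity_tb(base, die_density_tb=2.0, dies_per_stack=32))  # 244.0 via 32-die stacking
print(scaled_capacity_tb(base, die_density_tb=4.0, dies_per_stack=16))  # 244.0 via 4Tb dies, 16-die stacks
```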
[3]
Phison, Infinitix build enterprise AI infrastructure stack
Demand for high-performance compute and storage for AI training and inference continues to climb. Phison has partnered with AI infrastructure management software provider Infinitix to integrate its aiDAPTIV+ intelligent storage technology with Infinitix's AI-Stack platform, delivering an enterprise-grade AI training and inference solution that unifies hardware and software.

Phison said the collaboration uses high-speed SSDs and intelligent memory expansion to overcome the hardware constraints of traditional HBM and GDDR. By integrating aiDAPTIV+ with AI-Stack, enterprises can incorporate hardware acceleration into AI workload scheduling in Kubernetes-native environments, enabling end-to-end performance optimisation from model training to inference deployment.

Infinitix CEO WenYu Chen said AI has entered a phase of large-scale adoption driven by architectural and platform capabilities, where the priority is no longer raw compute power, but how efficiently that power is managed, scaled, and converted into business value. The partnership brings storage-layer capabilities into AI infrastructure scheduling, allowing enterprises to integrate heterogeneous compute, memory, and storage resources in Kubernetes-native environments. This enables AI data centres to deploy large-scale model training and inference with more flexible and cost-efficient architectures, supporting scalable, enterprise-class AI platforms.

Phison CEO KS Pua said AI is rapidly shifting from single-GPU computing toward system-level architectures spanning multiple nodes and resources. With aiDAPTIV+, Phison incorporates the NAND storage layer into AI memory and compute architectures, redefining how AI systems scale. Through AI-Stack's native scheduling capabilities, NAND storage, memory, and compute resources can operate in coordination across enterprise environments.

Built on a Kubernetes-native architecture, AI-Stack integrates GPU partitioning, aggregation, and cross-node computing, with full support for Nvidia and AMD GPUs. It enables unified management of conventional GPU servers and Phison's aiDAPTIV+ nodes on a single platform. With multi-tenant access control, automated scheduling, centralised monitoring, and billing mechanisms, the platform reduces the complexity of AI infrastructure governance and operations. Enterprises can deploy large language model training and inference without fully investing in high-end HBM GPUs.

Phison said the two companies will continue to deepen cooperation across AI, intelligent storage, and cloud operations to support efficient, scalable data infrastructure.
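DIGITIMES does not describe the scheduling interface AI-Stack exposes, so the following is only a rough illustration of what bringing storage-layer capabilities into Kubernetes-native scheduling could look like in practice: a pod manifest (written here as a Python dict) that pins an inference workload to a flash-expanded node and requests an extended resource for the flash tier. The example.com/aidaptiv-node label and example.com/flash-memory resource name are invented for illustration; only nvidia.com/gpu is a standard Kubernetes device-plugin resource name.

```python
import json

# Hypothetical pod spec steering an inference job onto a flash-expanded node.
# The "example.com/..." label and extended-resource name are invented; they
# are not documented Phison, Infinitix, or Kubernetes APIs.
pod_manifest = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "llm-inference"},
    "spec": {
        "nodeSelector": {"example.com/aidaptiv-node": "true"},
        "containers": [
            {
                "name": "inference",
                "image": "registry.example.com/llm-serving:latest",
                "resources": {
                    "limits": {
                        "nvidia.com/gpu": "1",            # standard GPU device-plugin resource
                        "example.com/flash-memory": "64Gi",  # invented resource for the flash tier
                    }
                },
            }
        ],
    },
}

print(json.dumps(pod_manifest, indent=2))
```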
Phison showcased aiDAPTIV+ at CES 2026, a software and hardware solution that accelerates AI inference by up to 10 times on consumer PCs. The technology uses NAND flash as memory expansion, enabling systems with just 32GB of DRAM to run AI models requiring 96GB through conventional approaches. Partners including Nvidia, AMD, MSI, and Acer demonstrated the technology, which addresses the memory bottleneck limiting AI deployment on ordinary hardware.
At CES 2026, Phison unveiled consumer PCs running AI inference up to ten times faster using its aiDAPTIV+ software and hardware solution. The technology, first introduced as an enterprise proof-of-concept in mid-2024, has now been repositioned to enable AI models on client PCs, dramatically expanding its potential use cases
1
. Partners including Nvidia, AMD, MSI, Acer, Asus, and Corsair demonstrated systems featuring the technology at the event.
Source: Tom's Hardware
According to Phison CEO Pua Khein Seng, the real constraint in AI isn't compute power but memory capacity. "In AI models, the real bottleneck isn't computing power - it's memory," Pua explained in an exclusive interview. "If you don't have enough memory, the system crashes"
2
. This fundamental limitation affects everything from laptops running local inference to hyperscalers building AI data centers, yet industry focus has remained heavily weighted toward GPU compute capabilities.
Source: TechRadar
The aiDAPTIV+ platform transforms NAND flash into a managed memory tier alongside DRAM, enabling systems to handle far larger AI models than their installed memory would normally permit. When tokens no longer fit into the GPU's key-value (KV) cache during inference, conventional systems evict older KV entries, forcing the GPU to recompute them from scratch when needed again. With aiDAPTIV+, these tokens are instead written to flash and retained for future reuse, dramatically reducing memory requirements
1
. Acer successfully demonstrated running a gpt-oss-120b model on a laptop with just 32GB of memory, a feat that would typically require approximately 96GB using conventional approaches. Phison's intelligent storage solution significantly improves Time to First Token, the critical delay between submitting a prompt and seeing the first output. "If you ask your device something and have to wait 60 seconds for the first word, would you wait?" Pua asked. "When I ask something, I can wait two seconds. But if it takes 10 seconds, users will think it's garbage"
2
. The technology stores frequently used KV cache in SSDs, allowing quick retrieval when users repeat or revisit queries rather than recomputing everything from scratch. Phison has partnered with AI infrastructure management software provider Infinitix to integrate aiDAPTIV+ with the AI-Stack platform, delivering an enterprise-grade solution for AI training and inference that unifies hardware and software
3
. Built on a Kubernetes-native architecture, AI-Stack integrates GPU partitioning, aggregation, and cross-node computing with full support for Nvidia and AMD GPUs. The partnership brings storage-layer capabilities into AI infrastructure scheduling, allowing enterprises to integrate heterogeneous compute, memory, and storage resources. Infinitix CEO WenYu Chen noted that AI has entered a phase where "the priority is no longer raw compute power, but how efficiently that power is managed, scaled, and converted into business value"3
.
Source: DIGITIMES
The technology is especially relevant for Mixture of Experts models and agentic AI workloads: a 120-billion-parameter MoE model can be handled with 32GB of DRAM compared with the roughly 96GB required conventionally, because inactive experts stay on flash rather than in main memory (a rough sketch of this pattern appears at the end of this summary)
1
. This capability brings large-model inference and limited training to consumer PCs with entry-level or even integrated GPUs, potentially valuable for developers and small businesses unable to make substantial AI investments. The implementation involves AI-aware SSDs based on advanced Phison controllers, special firmware, and software, making deployment straightforward for PC makers and value-added resellers targeting premium models for developers and power users. Pua's memory-first perspective extends to how organizations build GPU servers: many companies buy extra GPUs primarily to aggregate VRAM rather than for compute throughput. "Without our solution, people buy multiple GPU cards primarily to aggregate memory, not for compute power," he explained. "Most of those expensive GPUs end up idle because they're just being used for their memory"
2
. By using high-speed SSDs and intelligent memory expansion to overcome HBM and GDDR constraints, aiDAPTIV+ allows GPUs to be purchased and scaled for compute instead, with enterprises able to deploy large language model training and inference without fully investing in high-end HBM GPUs3
. Pua argues that CSP profit equals storage capacity, as cloud service providers have invested over $200 billion in GPUs but generate revenue from inference, which requires massive data storage.
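As referenced above, the 32GB-versus-96GB claim for a 120-billion-parameter MoE model rests on keeping inactive experts on flash. Neither source shows how that residency is managed, so the following is a minimal, hypothetical sketch of the general pattern using memory-mapped weight files; the file layout, naming, and the unweighted averaging of expert outputs are all simplifications for illustration, not Phison's mechanism.

```python
import os
import numpy as np


class FlashExpertStore:
    """Toy MoE expert offloading: each expert's weight matrix lives in a .npy
    file on the SSD and is memory-mapped, so only the experts the router
    actually selects are paged into DRAM.

    Illustrative only -- not how aiDAPTIV+ manages parameter residency.
    """

    def __init__(self, expert_dir: str):
        self.expert_dir = expert_dir   # directory of pre-exported expert weights

    def expert(self, expert_id: int) -> np.ndarray:
        # mmap_mode="r" keeps the weights on flash; pages are read in on access.
        path = os.path.join(self.expert_dir, f"expert_{expert_id:04d}.npy")
        return np.load(path, mmap_mode="r")

    def forward(self, x: np.ndarray, routed_ids) -> np.ndarray:
        # Only the handful of experts routed to for this token touch DRAM;
        # the rest of the model's parameters stay on the SSD.
        outputs = [self.expert(eid) @ x for eid in routed_ids]
        return np.mean(outputs, axis=0)  # stand-in for the router's weighted sum
```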