Nvidia's Flagship GPUs Face Critical Virtualization Bug, Prompting $1,000 Bounty for Fix

Reviewed byNidhi Govil

3 Sources

Share

Nvidia's RTX 5090 and RTX PRO 6000 GPUs are experiencing a severe virtualization reset bug that renders the cards unresponsive, requiring system reboots. The issue affects AI workloads and virtualization setups, with Nvidia working on a fix.

Nvidia's Latest GPUs Face Severe Virtualization Bug

Nvidia's flagship GPUs, the RTX 5090 and RTX PRO 6000, are reportedly plagued by a critical virtualization reset bug that renders the cards unresponsive and requires a full system reboot to recover

1

2

. This issue is particularly concerning for AI workloads and virtualization setups, prompting a $1,000 bug bounty for anyone who can identify a fix or root cause.

Source: Tom's Hardware

Source: Tom's Hardware

The Nature of the Bug

The problem occurs after a GPU has been passed through to a virtual machine (VM) using KVM and VFIO. When the guest is shut down or the GPU is reassigned, the host issues a PCIe function-level reset (FLR), which is a standard cleanup procedure for passthrough devices

1

. However, instead of returning to a known-good state, the GPU fails to respond, becoming unreadable to system tools and requiring a complete power cycle of the machine to restore normal operation.

Impact on Users and Businesses

CloudRift, a GPU cloud provider, has reported encountering this issue on multiple Blackwell-equipped systems in production

1

2

. The bug is particularly problematic for multi-tenant AI workloads and home lab setups using virtualization, as a single card failure can take down the entire host system. Users across various forums, including Proxmox and Level1Techs, have reported similar behaviors, with some experiencing complete host hangs after shutting down Windows guests

1

3

.

Source: Wccftech

Source: Wccftech

Nvidia's Response and Community Efforts

Nvidia has reportedly acknowledged the issue and is actively working on developing a fix

2

3

. In the meantime, CloudRift has issued a $1,000 public bug bounty for anyone able to identify a fix or root cause of the problem

1

2

. The community is actively engaged in finding solutions, with users experimenting with various BIOS settings and configurations to mitigate the issue.

Implications for the Industry

This bug highlights the challenges faced by cutting-edge hardware in complex virtualization environments. As GPUs become increasingly crucial for AI and machine learning workloads, such issues can have significant implications for businesses relying on these technologies. The incident also underscores the importance of thorough testing and rapid response to hardware-level bugs in the fast-paced world of GPU development and deployment.

TheOutpost.ai

Your Daily Dose of Curated AI News

Don’t drown in AI news. We cut through the noise - filtering, ranking and summarizing the most important AI news, breakthroughs and research daily. Spend less time searching for the latest in AI and get straight to action.

© 2025 Triveous Technologies Private Limited
Instagram logo
LinkedIn logo