9 Sources
[1]
Nvidia chips become the first GPUs to fall to Rowhammer bit-flip attacks
Nvidia is recommending a mitigation for customers of one of its GPU product lines that will degrade performance by up to 10 percent in a bid to protect users from exploits that could let hackers sabotage work projects and possibly cause other compromises. The move comes in response to an attack a team of academic researchers demonstrated against Nvidia's RTX A6000, a widely used GPU for high-performance computing that's available from many cloud services. A vulnerability the researchers discovered opens the GPU to Rowhammer, a class of attack that exploits a physical weakness in the DRAM chip modules that store data. Rowhammer allows hackers to change or corrupt data stored in memory by rapidly and repeatedly accessing -- or hammering -- a physical row of memory cells. By repeatedly hammering carefully chosen rows, the attack induces bit flips in nearby rows, meaning a digital zero is converted to a one or vice versa. Until now, Rowhammer attacks have been demonstrated only against memory chips for CPUs, used for general computing tasks.
Like catastrophic brain damage
That changed last week as researchers unveiled GPUhammer, the first known successful Rowhammer attack on a discrete GPU. Traditionally, GPUs were used for rendering graphics and cracking passwords. In recent years, GPUs have become the workhorses for tasks such as high-performance computing, machine learning, neural networking, and other AI uses. No company has benefited more from the AI and HPC boom than Nvidia, which last week became the first company to reach a $4 trillion valuation. While the researchers demonstrated their attack against only the A6000, it likely works against other GPUs from Nvidia, the researchers said. The researchers' proof-of-concept exploit was able to tamper with deep neural network models used in machine learning for things like autonomous driving, healthcare applications, and medical imaging for analyzing MRI scans.
GPUHammer flips a single bit in the exponent of a model weight -- for example, in y, where a floating-point number is represented as x times 2^y. The single bit flip can increase the exponent value by 128. The result alters the model weight by a whopping factor of 2^128, degrading model accuracy from 80 percent to 0.1 percent, said Gururaj Saileshwar, an assistant professor at the University of Toronto and co-author of an academic paper demonstrating the attack. "This is like inducing catastrophic brain damage in the model: with just one bit flip, accuracy can crash from 80% to 0.1%, rendering it useless," Saileshwar wrote in an email. "With such accuracy degradation, a self-driving car may misclassify stop signs (reading a stop sign as a speed limit 50 mph sign), or stop recognizing pedestrians. A healthcare model might misdiagnose patients. A security classifier may fail to detect malware." In response, Nvidia is recommending users implement a defense that could degrade overall performance by as much as 10 percent. Among machine learning inference workloads the researchers studied, the slowdown affects the "3D U-Net ML Model" the most. This model is used for an array of HPC tasks, such as medical imaging. The performance hit is caused by the resulting reduction in bandwidth between the GPU and the memory module, which the researchers estimated at 12 percent. There's also a 6.25 percent loss in memory capacity across the board, regardless of the workload. Performance degradation will be highest for applications that access large amounts of memory. A figure in the researchers' academic paper provides the overhead breakdowns for the workloads tested. Rowhammer attacks present a threat to memory inside the typical laptop or desktop computer in a home or office, but most Rowhammer research in recent years has focused on the threat inside cloud environments. That's because these environments often allot the same physical CPU or GPU to multiple users.
A malicious attacker can run Rowhammer code on a cloud instance that has the potential to tamper with the data a CPU or GPU is processing on behalf of a different cloud customer. Saileshwar said that Amazon Web Services and smaller providers such as Runpod and Lambda Cloud all provide A6000 instances. (He added that AWS enables a defense that prevents GPUhammer from working.)
Not your parents' Rowhammer
Rowhammer attacks on GPUs are difficult to perform for various reasons. For one thing, GPUs access data from GDDR (graphics double data rate) memory physically located on the GPU board, rather than the DDR (double data rate) modules that are separate from the CPUs accessing them. The proprietary physical mapping of the thousands of banks inside a typical GDDR board is entirely different from their DDR counterparts. That means that hammering patterns required for a successful attack are completely different. Further complicating attacks, the physical addresses for GPUs aren't exposed, even to a privileged user, making reverse engineering harder. GDDR modules also have up to four times higher memory latency and faster refresh rates. One of the physical characteristics Rowhammer exploits is that the increased frequency of accesses to a DRAM row disturbs the charge in neighboring rows, introducing bit flips there. Bit flips are much harder to induce with higher latencies, because fewer row accesses fit in before the next refresh. GDDR modules also contain proprietary mitigations that can further stymie Rowhammer attacks. In response to GPUhammer, Nvidia published a security notice last week reminding customers of a protection formally known as system-level error-correcting code. ECC works by using what are known as memory words to store redundant control bits next to the data bits inside the memory chips. CPUs and GPUs use these words to quickly detect and correct flipped bits. GPUs based on Nvidia's Hopper and Blackwell architectures already have ECC turned on. On other architectures, ECC is not enabled by default.
The means for enabling the defense vary by the architecture. Checking the settings in Nvidia GPUs designated for data centers can be done out-of-band using a system's BMC (baseboard management controller) and software such as Redfish to check for the "ECCModeEnabled" status. ECC status can also be checked using an in-band method that uses the system CPU to probe the GPU. The protection does come with its limitations, as Saileshwar explained in an email: On NVIDIA GPUs like the A6000, ECC typically uses SECDED (Single Error Correction, Double Error Detection) codes. This means single-bit errors are automatically corrected in hardware and double-bit errors are detected and flagged, but not corrected. So far, all the Rowhammer bit flips we detected are single-bit errors, so ECC serves as a sufficient mitigation. But if Rowhammer induces 3 or more bit flips in an ECC code word, ECC may not be able to detect it or may even cause a miscorrection and a silent data corruption. So, using ECC as a mitigation is like a double-edged sword. Saileshwar said that other Nvidia chips may also be vulnerable to the same attack. He singled out GDDR6-based GPUs in Nvidia's Ampere generation, which are used for machine learning and gaming. Newer GPUs, such as the H100 (with HBM3) or RTX 5090 (with GDDR7), feature on-die ECC, meaning the error detection is built directly into the memory chips. "This may offer better protection against bit flips," Saileshwar said. "However, these protections haven't been thoroughly tested against targeted Rowhammer attacks, so while they may be more resilient, vulnerability cannot yet be ruled out." In the decade since the discovery of Rowhammer, GPUhammer is the first variant to flip bits inside discrete GPUs and the first to attack GDDR6 GPU memory modules. All attacks prior to GPUhammer targeted CPU memory chips such as DDR3/4 or LPDDR3/4. That includes a 2018 Rowhammer variant.
While it used a GPU as the hammer, the memory being targeted remained LPDDR3/4 memory chips. GDDR forms of memory have a different form factor. It follows different standards and is soldered onto the GPU board, in contrast to LPDDR, which is in a chip located on hardware apart from the CPUs. Besides Saileshwar, the researchers behind GPUhammer include Chris S. Lin and Joyce Qu from the University of Toronto. They will be presenting their research next month at the 2025 Usenix Security Conference.
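Saileshwar's description of SECDED earlier in this article can be reproduced with a toy code. The sketch below is an 8-bit extended Hamming code (4 data bits, 3 Hamming check bits, 1 overall parity bit), a textbook SECDED construction. It is not the code Nvidia's memory controller uses, which protects much wider words and whose details the sources do not give; the flip helper is a hypothetical stand-in for Rowhammer-induced flips. It shows the three behaviors from the quote: one flip corrected, two flips detected but not corrected, three flips silently miscorrected.

```python
from __future__ import annotations


def encode(data: int) -> list[int]:
    """Encode 4 data bits as an 8-bit extended Hamming (SECDED) codeword."""
    code = [0] * 8  # index 0: overall parity bit; indices 1..7: Hamming(7,4) positions
    code[3], code[5], code[6], code[7] = [(data >> i) & 1 for i in range(4)]
    code[1] = code[3] ^ code[5] ^ code[7]  # checks positions whose index has bit 0 set
    code[2] = code[3] ^ code[6] ^ code[7]  # checks positions whose index has bit 1 set
    code[4] = code[5] ^ code[6] ^ code[7]  # checks positions whose index has bit 2 set
    code[0] = sum(code[1:]) % 2            # makes the total number of 1s even
    return code


def decode(received: list[int]) -> tuple[str, int | None]:
    """Return ("ok" | "corrected" | "uncorrectable", decoded 4-bit value or None)."""
    code = list(received)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos          # XOR of set positions is 0 for a valid codeword
    odd_parity = sum(code) % 2 == 1  # an odd number of bits has flipped
    if syndrome == 0 and not odd_parity:
        status = "ok"
    elif odd_parity:
        # Assume exactly one flip; syndrome 0 means the overall parity bit itself flipped.
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Even number of flips with a nonzero syndrome: detected, not correctable.
        return "uncorrectable", None
    return status, code[3] | code[5] << 1 | code[6] << 2 | code[7] << 3


def flip(code: list[int], *positions: int) -> list[int]:
    """Hypothetical helper: copy of code with the given bit positions inverted."""
    out = list(code)
    for p in positions:
        out[p] ^= 1
    return out


word = encode(0b1011)
print(decode(word))                 # ('ok', 11)
print(decode(flip(word, 6)))        # ('corrected', 11)
print(decode(flip(word, 2, 6)))     # ('uncorrectable', None)
print(decode(flip(word, 3, 5, 6)))  # ('corrected', 12): silent miscorrection
```

The last line is the "double-edged sword": three flips look like a single correctable error, so the decoder confidently returns the wrong value.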
[2]
New Rowhammer attack silently corrupts AI models on GDDR6 Nvidia cards -- 'GPUHammer' attack drops AI accuracy from 80% to 0.1% on RTX A6000
A group of researchers has discovered a new attack called GPUHammer that can flip bits in the memory of NVIDIA GPUs, quietly corrupting AI models and causing serious damage, without ever touching the actual code or data input. Fortunately, Nvidia is already ahead of the bad actors and has put out guidelines on how to mitigate the risk involved in this situation. Regardless, if you're using a card with GDDR6 memory, this is worth paying attention to. The team behind the discovery, from the University of Toronto, showed how the attack could drop an AI model's accuracy from 80% to under 1% -- just by flipping a single bit in memory. It's not just theoretical either, as they ran it on a real NVIDIA RTX A6000, using a technique that repeatedly hammers memory cells until one nearby flips, messing with whatever's stored there. GPUHammer is a GPU-focused version of a known hardware issue called Rowhammer. It's been around for a while in the world of CPUs and RAM. Basically, modern memory chips are so tightly packed that repeatedly reading or writing one row can cause electrical interference that flips bits in nearby rows. That flipped bit could be anything -- a number, a command, or part of a neural network's weight -- and that's where things go wrong. Until now, this was mostly a concern for DDR4 system memory, but GPUHammer proves it can happen on GDDR6 VRAM too, which is what powers many modern NVIDIA cards, especially in AI and workstation workloads. This is a serious cause for concern, at least in specific situations. The researchers showed that even with some safeguards in place, they could cause multiple bit flips across several memory banks. In one case, this completely broke a trained AI model, making it essentially useless. The scary part is that it doesn't require access to your data. The attacker just needs to share the same GPU in a cloud environment or server, and they could potentially interfere with your workload.
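To make the hammering idea concrete, here is a deliberately simplified Python model; it is not the researchers' GPUHammer code, which runs as CUDA on real GDDR6. It assumes that each activation of a row adds one unit of "disturbance" to its two physically adjacent rows, that a victim row flips once its disturbance reaches a hypothetical threshold, and that a periodic refresh wipes out accumulated disturbance. The row count, threshold, and refresh interval are illustrative values only.

```python
from __future__ import annotations


class ToyBank:
    """Toy DRAM bank: activating a row disturbs its two physical neighbors."""

    def __init__(self, num_rows: int, flip_threshold: int) -> None:
        self.num_rows = num_rows
        # Hypothetical disturbance count at which a victim row shows a bit flip.
        self.flip_threshold = flip_threshold
        self.disturbance = [0] * num_rows
        self.flipped = set()  # flips already recorded are not undone by refresh

    def activate(self, row: int) -> None:
        for neighbor in (row - 1, row + 1):
            if 0 <= neighbor < self.num_rows:
                self.disturbance[neighbor] += 1
                if self.disturbance[neighbor] >= self.flip_threshold:
                    self.flipped.add(neighbor)

    def refresh(self) -> None:
        # Refresh restores every cell's charge, so accumulated disturbance is lost.
        self.disturbance = [0] * self.num_rows


def hammer(bank: ToyBank, aggressors: list[int], rounds: int,
           refresh_every: int | None = None) -> set[int]:
    """Activate each aggressor row once per round; optionally refresh periodically."""
    for r in range(1, rounds + 1):
        for row in aggressors:
            bank.activate(row)
        if refresh_every is not None and r % refresh_every == 0:
            bank.refresh()
    return bank.flipped


# Double-sided hammering: row 6 sits between aggressors 5 and 7 and is hit twice per round.
print(hammer(ToyBank(16, 1_000), [5, 7], 500))                     # {6}
# Single-sided hammering with the same number of rounds never reaches the threshold.
print(hammer(ToyBank(16, 1_000), [5], 500))                        # set()
# Frequent refresh resets the victim before it accumulates enough disturbance.
print(hammer(ToyBank(16, 1_000), [5, 7], 500, refresh_every=100))  # set()
```

The third case is the intuition behind why faster refresh makes hammering harder; real attacks also have to contend with hidden address mappings and in-DRAM mitigations that this model ignores.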
As mentioned, the attack was tested on an RTX A6000, but the risk applies to a wide range of Ampere, Ada, Hopper, and Turing GPUs, especially those used in workstations and servers. NVIDIA has published a full list of affected models and recommends ECC for most of them. That said, newer GPUs like the RTX 5090 and H100 have built-in ECC directly on the chip, which handles this automatically -- no user setup required. However, if you're just sitting at home worried about your personal setup, this isn't the kind of attack you'd see targeting individual gamers or home PCs. It's more relevant to shared GPU environments like cloud gaming servers, AI training clusters, or VDI setups where multiple users run workloads on the same hardware. That being said, the core idea that memory on a GPU can be tampered with silently is something the entire industry needs to take seriously, especially as more games, apps, and services start leaning on AI. NVIDIA has responded with a simple but important recommendation: turn on ECC (Error Correction Code) if your GPU supports it. ECC is a feature that adds redundancy to memory so it can detect and fix errors like these bit flips. Keep in mind, enabling ECC does come with a small trade-off -- up to 10% slower performance for machine learning tasks, and about 6-6.5% less usable VRAM. But for serious AI work, the peace of mind is worth it. You can enable it with Nvidia's nvidia-smi command-line tool (nvidia-smi -e 1, followed by a reboot), and the same tool can report whether ECC is active. Attacks like GPUHammer don't just crash systems or cause glitches. They tamper with the integrity of AI itself, affecting how models behave or make decisions. And because it all happens at the hardware level, these changes are nearly invisible unless you know exactly what to look for. In regulated industries like healthcare, finance, or autonomous driving, that could cause serious problems -- wrong decisions, security failures, even legal consequences.
Even though the average user isn't directly at risk, GPUHammer is a wake-up call. As GPUs continue to evolve beyond gaming into AI, creative work, and productivity, so do the risks. Memory safety, even on a GPU, is no longer optional.
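For the in-band check described in this article, a script can ask nvidia-smi for the current and pending ECC modes of each GPU. This is a sketch that assumes the driver's --query-gpu fields ecc.mode.current and ecc.mode.pending are available (recent drivers list them under nvidia-smi --help-query-gpu). The sample output is hypothetical, and GPUs without ECC support report a placeholder such as [N/A]. A pending mode that differs from the current one means a change, such as one made with nvidia-smi -e 1, only takes effect after a reboot.

```python
import shutil
import subprocess

QUERY = "index,ecc.mode.current,ecc.mode.pending,name"


def parse_ecc_modes(csv_text: str) -> list[dict]:
    """Parse `nvidia-smi --query-gpu=... --format=csv,noheader` output into dicts."""
    gpus = []
    for line in csv_text.strip().splitlines():
        # The name goes last so that maxsplit=3 keeps any commas in a GPU name intact.
        index, current, pending, name = (f.strip() for f in line.split(",", 3))
        gpus.append({"index": int(index), "name": name,
                     "ecc_current": current, "ecc_pending": pending,
                     "reboot_needed": current != pending})
    return gpus


def query_ecc_modes() -> list[dict]:
    """Run nvidia-smi on this machine (requires an installed NVIDIA driver)."""
    if shutil.which("nvidia-smi") is None:
        raise RuntimeError("nvidia-smi not found on PATH")
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    return parse_ecc_modes(result.stdout)


# Hypothetical output: ECC was just switched on for GPU 0 but the machine has not rebooted.
sample = "0, Disabled, Enabled, NVIDIA RTX A6000\n"
for gpu in parse_ecc_modes(sample):
    print(gpu)
```

Keeping the parser separate from the subprocess call lets the parsing logic be tested on machines without a GPU.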
[3]
Nvidia A6000 GPUs flip memory bits if beaten by GPUHammer
The Rowhammer attack on computer memory is back, and for the first time, it's able to mess with bits in Nvidia GPUs, despite defenses designed to protect against this kind of hacking. Last week, Nvidia issued a security advisory, telling customers about the possible threat, which was disclosed to the company and cloud providers in January by researchers from Canada's University of Toronto. The researchers, Chris (Shaopeng) Lin, Joyce Qu, and Gururaj Saileshwar, describe their findings in a paper titled "GPUHammer: Rowhammer Attacks on GPU Memories are Practical." Scheduled to be presented at the USENIX Security 2025 shindig in August, the paper describes "the first Rowhammer attack on Nvidia GPUs with GDDR6 DRAM." It focuses specifically on Nvidia A6000 GPUs with GDDR6 memory; newer GPUs like the H100 and RTX 5090 do not appear to be susceptible to this particular exploit. The Rowhammer attack dates back to 2014, when computer scientists from Carnegie Mellon University and Intel published a paper describing how repeatedly accessing the same memory row in a DRAM chip could flip the stored electronic bits, resulting in data corruption and errors. Intel knew about the issue since at least 2012, when it began filing relevant patents to protect systems. The attack generally requires the attacker and victim to be tenants on the same hardware, with enough privileges to run the attack code. There is, however, a variant that operates over the network under certain conditions. In the eleven years since its public disclosure, the memory-smashing technique has been applied to many different devices and applications, including browsers, VMs, Android phones, flash storage, network devices that have remote direct memory access (RDMA) enabled, FPGAs, Arm chips, and AMD chips. Now it's Nvidia's turn.
GPUHammer presents a particularly concerning threat because it can be used to meddle with AI models, which rely heavily on GPUs. The researchers showed they can use GPUHammer to alter the weights of a deep neural network to make AI model inference (output) less accurate, an attack technique referred to as Terminal Brain Damage in a 2019 research paper. "In our exploit, we show for the first time that such an attack can be executed using our Rowhammer-induced bit-flips on GPUs, and the resultant tampering of the DNN weights resident in the GPU memory can impact the DNN accuracy significantly," the authors state in their paper. They claim that in their proof-of-concept attack, they were able to degrade the accuracy of machine-learning models by up to 80 percentage points, despite the presence of a defense called Target Row Refresh in GDDR6 memory. Organizations running AI applications in a cloud environment with other tenants thus could find their models making significant mispredictions if subject to a GPUHammer beating. Nvidia does have a mitigation: enabling Error Correction Codes (ECC) using the command nvidia-smi -e 1 and then rebooting. The consequence of doing so, however, is a performance hit of up to about 10 percent and a reduction in memory capacity of about 6.25 percent.
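The reported 6.25 percent capacity cost is exactly 1/16 of memory. As a back-of-the-envelope sketch, assuming the RTX A6000's 48 GB of GDDR6 and treating the reported figure as a flat fraction of total capacity, enabling ECC leaves 45 GB for applications. The 10 percent figure is a worst-case, workload-dependent slowdown, so it is not modeled here.

```python
from fractions import Fraction

# The reported ECC capacity overhead, kept exact to show it is a clean fraction.
ECC_CAPACITY_OVERHEAD = Fraction(625, 10_000)  # 6.25 percent


def usable_memory_gb(total_gb: float, overhead: Fraction = ECC_CAPACITY_OVERHEAD) -> float:
    """Memory left for applications once ECC reserves its share (flat-fraction assumption)."""
    return float(total_gb * (1 - overhead))


print(ECC_CAPACITY_OVERHEAD)  # 1/16
print(usable_memory_gb(48))   # 45.0 (RTX A6000 with 48 GB of GDDR6)
```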
[4]
NVIDIA shares guidance to defend GDDR6 GPUs against Rowhammer attacks
NVIDIA is warning users to activate System Level Error-Correcting Code mitigation to protect against Rowhammer attacks on graphics processors with GDDR6 memory. The company is reinforcing the recommendation as new research demonstrates a Rowhammer attack against an NVIDIA A6000 GPU (graphics processing unit). Rowhammer is a hardware fault that can be triggered through software processes and stems from memory cells being too close to each other. The attack was first demonstrated on DRAM used as CPU system memory, but it can affect GPU memory, too. It works by accessing a memory row with enough read-write operations, which causes the value of adjacent data bits to flip from one to zero and vice versa, changing the in-memory information. The effect could be a denial-of-service condition, data corruption, or even privilege escalation. System Level Error-Correcting Codes (ECC) can preserve the integrity of the data by adding redundant bits and correcting single-bit errors to maintain data reliability and accuracy. In workstation and data center GPUs, where VRAM handles large datasets and precise calculations related to AI workloads, ECC must be enabled to prevent crucial errors in their operation. NVIDIA's security notice notes that researchers at the University of Toronto showed "a potential Rowhammer attack against an NVIDIA A6000 GPU with GDDR6 Memory" where System-Level ECC was not enabled. The academic researchers developed GPUHammer, an attack method to flip bits in GPU memories. Although hammering is harder on GDDR6 because of higher latency and faster refresh compared with CPU-based DDR4, the researchers were able to demonstrate that Rowhammer attacks on GPU memory banks are possible. The GPU maker notes that newer GPUs like Blackwell RTX 50 Series (GeForce), Blackwell Data Center GB200, B200, B100, and Hopper Data Center H100, H200, H20, and GH200 come with built-in on-die ECC protection, which requires no intervention from the user.
One way to check if System Level ECC is enabled is to use an out-of-band method that utilizes the system's BMC (Baseboard Management Controller) and hardware interface software, like the Redfish API, to check the "ECCModeEnabled" status. Tools like NSM Type 3 and NVIDIA SMBPBI can also be used for configuration, though they require access to the NVIDIA Partner Portal. A second, in-band method also exists, using the nvidia-smi command-line utility from the system's CPU to check and enable ECC where supported. Rowhammer represents a real security concern that could cause data corruption or enable attacks in multi-tenant environments like cloud servers where vulnerable GPUs may be deployed. However, the real risk is context-dependent, and exploiting Rowhammer reliably is complicated, requiring specific conditions, high access rates, and precise control, which makes the attack difficult to execute.
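For the out-of-band route, the property to look for is ECCModeEnabled. Where it sits inside a BMC's Redfish resources can vary, so this sketch simply walks an already-fetched JSON document and reports every occurrence. The sample payload and its layout are hypothetical, and fetching the resource from the BMC (URL, credentials, TLS) is deliberately left out.

```python
import json
from typing import Any, Iterator, Tuple


def find_ecc_mode(node: Any, path: str = "") -> Iterator[Tuple[str, Any]]:
    """Yield (json_path, value) for every "ECCModeEnabled" key in a Redfish payload."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}/{key}"
            if key == "ECCModeEnabled":
                yield child, value
            else:
                yield from find_ecc_mode(value, child)
    elif isinstance(node, list):
        for i, item in enumerate(node):
            yield from find_ecc_mode(item, f"{path}/{i}")


# Hypothetical GPU resource as a BMC might return it.
sample = json.loads("""
{
  "Id": "GPU_0",
  "ProcessorType": "GPU",
  "MemorySummary": {"ECCModeEnabled": false}
}
""")

for location, enabled in find_ecc_mode(sample):
    print(location, "->", "enabled" if enabled else "DISABLED")
```

Searching by key rather than hard-coding one path keeps the check usable if a given BMC exposes the property in a different place.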
[5]
GPUHammer: New RowHammer Attack Variant Degrades AI Models on NVIDIA GPUs
NVIDIA is urging customers to enable System-level Error Correction Codes (ECC) as a defense against a variant of a RowHammer attack demonstrated against its graphics processing units (GPUs). "Risk of successful exploitation from RowHammer attacks varies based on DRAM device, platform, design specification, and system settings," the GPU maker said in an advisory released this week. Dubbed GPUHammer, the attacks mark the first-ever RowHammer exploit demonstrated against NVIDIA's GPUs (e.g., NVIDIA A6000 GPU with GDDR6 Memory), allowing malicious GPU users to tamper with other users' data by triggering bit flips in GPU memory. The most concerning consequence of this behavior, University of Toronto researchers found, is the degradation of an artificial intelligence (AI) model's accuracy from 80% to less than 1%. RowHammer is to modern DRAM what Spectre and Meltdown are to contemporary CPUs. While both are hardware-level security vulnerabilities, RowHammer targets the physical behavior of DRAM memory, whereas Spectre exploits speculative execution in CPUs. RowHammer causes bit flips in nearby memory cells due to electrical interference in DRAM stemming from repeated memory access, while Spectre and Meltdown allow attackers to obtain privileged information from memory via a side-channel attack, potentially leaking sensitive data. In 2022, academics from the University of Michigan and Georgia Tech described a technique called SpecHammer that combines RowHammer and Spectre to launch speculative attacks. The approach essentially entails triggering a Spectre v1 attack by using RowHammer bit flips to insert malicious values into victim gadgets. GPUHammer is the latest variant of RowHammer, but one that's capable of inducing bit flips in NVIDIA GPUs despite the presence of mitigations like Target Row Refresh (TRR).
In a proof-of-concept developed by the researchers, using a single-bit flip to tamper with a victim's ImageNet deep neural network (DNN) models can degrade model accuracy from 80% to 0.1%. Exploits like GPUHammer threaten the integrity of AI models, which increasingly rely on GPUs to perform parallel processing and carry out computationally demanding tasks, and they open up a new attack surface for cloud platforms. To mitigate the risk posed by GPUHammer, it's advised to enable ECC with the command "nvidia-smi -e 1". Newer NVIDIA GPUs like H100 or RTX 5090 are not affected because they feature on-die ECC, which helps detect and correct errors arising due to voltage fluctuations associated with smaller, denser memory chips. "Enabling Error Correction Codes (ECC) can mitigate this risk, but ECC can introduce up to a 10% slowdown for [machine learning] inference workloads on an A6000 GPU," Chris (Shaopeng) Lin, Joyce Qu, and Gururaj Saileshwar, the lead authors of the study, said, adding that it also reduces memory capacity by 6.25%. The disclosure comes as researchers from NTT Social Informatics Laboratories and CentraleSupélec presented CrowHammer, a type of RowHammer attack that enables a key recovery attack against the FALCON (FIPS 206) post-quantum signature scheme, which has been selected by NIST for standardization. "Using RowHammer, we target Falcon's RCDT [reverse cumulative distribution table] to trigger a very small number of targeted bit flips, and prove that the resulting distribution is sufficiently skewed to perform a key recovery attack," the study said. "We show that a single targeted bit flip suffices to fully recover the signing key, given a few hundred million signatures, with more bit flips enabling key recovery with fewer signatures."
[6]
GPUHammer Attack on NVIDIA GDDR6: Corrupts AI Models
Graphics cards are no longer just about rendering games; they're core to today's AI workloads. That's why the newly discovered GPUHammer attack is grabbing attention. Developed by researchers at the University of Toronto, GPUHammer silently flips bits in GDDR6 memory on NVIDIA GPUs. Even a single bit flip in a neural network weight can wreck an AI model's performance. In one demonstration on an RTX A6000, model accuracy plunged from 80 percent to almost zero. So how does GPUHammer work? It builds on Rowhammer, a flaw in densely packed DRAM chips where hammering one row causes electrical interference in its neighbors. Until now, this was mainly an issue for DDR4 in CPUs, but it turns out GDDR6 VRAM on NVIDIA cards is vulnerable too. An attacker in a shared environment -- like a cloud GPU instance -- simply hammers specific memory addresses until a useful bit flips. Because the attacker never touches your software or data directly, traditional antivirus or integrity checks won't spot anything amiss. When attackers flip bits in model parameters, weight matrices can change in unpredictable ways. Imagine your image classifier suddenly mislabels every cat as a toaster. That's exactly the kind of havoc GPUHammer can wreak. And it's not theoretical: the team measured multiple flips across different memory banks, even when basic safeguards were active. They broke a fully trained model, rendering it nearly useless for its intended purpose. Does this mean every AI project on NVIDIA hardware is at risk? Not necessarily. ECC -- Error Correction Code -- is your friend here. ECC adds redundant bits to each memory block, spotting and fixing most single-bit errors on the fly. NVIDIA's advice is to turn on ECC if your card supports it. You'll trade off up to 10 percent of your machine learning throughput and lose around 6.25 percent of VRAM, but you'll gain resilience against these subtle attacks.
Newer NVIDIA data center GPUs, such as those based on the Hopper and Blackwell architectures, already have ECC protection built in, including on-die ECC in the memory chips themselves. That means you get hardware-level protection without manual setup. Older workstation and server cards, including many Ampere, Ada, and Turing models, still need you to enable ECC via driver settings or the control panel. If you're running mixed workloads in a virtualized environment, it's a quick step to boost security. For individual users who run models at home on a single GPU, GPUHammer is unlikely to show up. You'd need someone else to be sharing the exact same graphics processor and intentionally hammering memory rows. That scenario is common in multi-tenant clouds, AI training clusters, or virtual desktop infrastructure, but rare on a standalone desktop. Still, the core insight is sobering: GPU memory isn't immune to hardware-level tampering. As AI accelerators become ubiquitous, ensuring memory integrity will be critical. Research teams and cloud providers should audit their GPU configurations, confirm ECC is active, and monitor for unexplained drops in model performance. Developers can also implement application-level checks, like hashing critical data structures, to detect unexpected corruptions. In the long term, GPU manufacturers will likely refine memory designs or introduce more robust error detection. Academic work like GPUHammer highlights the importance of hardware-software co-design in security. By bringing these vulnerabilities to light, the University of Toronto researchers are helping the industry build stronger, more reliable AI platforms.
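The application-level check suggested above can be as simple as fingerprinting a model's weights when they are loaded and re-checking them later. This Python sketch hashes fixed-size chunks with SHA-256, so a mismatch also points to the region that changed. The chunk size and the stand-in weight buffer are assumptions, and on a real GPU the weights would first have to be copied back from device memory, a step not shown here.

```python
import hashlib
from array import array


def chunk_digests(data: bytes, chunk_size: int = 1024) -> list[str]:
    """SHA-256 digest of each fixed-size chunk of a buffer."""
    return [hashlib.sha256(data[i:i + chunk_size]).hexdigest()
            for i in range(0, len(data), chunk_size)]


def changed_chunks(data: bytes, reference: list[str], chunk_size: int = 1024) -> list[int]:
    """Indices of chunks whose digest no longer matches the reference."""
    current = chunk_digests(data, chunk_size)
    return [i for i, (a, b) in enumerate(zip(current, reference)) if a != b]


weights = array("f", [0.5] * 1024).tobytes()  # 4096 bytes of stand-in model weights
baseline = chunk_digests(weights)              # recorded when the model is loaded

tampered = bytearray(weights)
tampered[2050] ^= 0x40                            # simulate a single flipped bit
print(changed_chunks(bytes(tampered), baseline))  # [2]
print(changed_chunks(weights, baseline))          # []
```

Smaller chunks localize corruption more precisely at the cost of storing more digests; a single whole-buffer hash is enough if you only need a yes/no answer.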
[7]
What is Rowhammer bit-flip attack which forced Nvidia to issue security alert
Researchers discovered that Nvidia's A6000 GPUs are prone to Rowhammer attacks. This allows hackers to tamper with user data on shared GPUs. Nvidia has issued an alert, advising users to enable Error Correction Code. Newer GPUs with GDDR7 or HBM3 memory have built-in protection. This vulnerability poses a risk in multi-tenant environments. A team of researchers has revealed that Nvidia's A6000 GPUs using GDDR6 memory are vulnerable to Rowhammer attacks, which can allow hackers to interfere with users' data on a shared GPU in the cloud, like AI models, even if they don't have direct access to it. In response to the research, Nvidia issued an alert asking users to ensure system-level ECC is enabled across a list of affected NVIDIA products. "Specific generations of DRAM devices starting with DDR4, LPDDR5, HBM3, and GDDR7 implement On-Die ECC (OD-ECC) to help with DRAM scaling. OD-ECC indirectly protects Rowhammer bit flips. Note: OD-ECC is not adjustable by users. If OD-ECC is present, it is always enabled," the company said in its alert. Rowhammer is a security issue where repeated access to memory can silently change data. Researchers just showed that this attack now works on Nvidia GPUs, so Nvidia issued an alert and recommends using ECC to stay safe. Rowhammer for GPUs, also called GPUHammer, is a new hardware attack that targets graphics cards, specifically NVIDIA GPUs with GDDR6 memory. Rowhammer itself is a hardware vulnerability found in computer memory chips (DRAM). Normally, data is safely stored in separate rows of memory cells. But if you rapidly and repeatedly access (hammer) one row, it introduces bit flips in adjacent memory rows. This can "flip" bits of data even if no one directly accessed that data. "Since 2014, this vulnerability has been widely studied in CPUs and CPU-based memories like DDR3, DDR4, and LPDDR4.
However, with critical AI and ML workloads now running on discrete GPUs in the cloud, it is vital to assess the vulnerability of GPU memories to Rowhammer attacks," the researchers said. Rowhammer is a circuit-level DRAM vulnerability that lets attackers flip bits in neighboring memory rows. It had only been shown on CPU DRAM before, but now, researchers have demonstrated the first Rowhammer bit-flip attack on GPU DRAM, specifically on GDDR6 memory used in NVIDIA GPUs like the A6000. Attackers can potentially use Rowhammer to mess with another user's data on a shared GPU in the cloud, like AI models, even if they don't have direct access to it. The research proved that even a single bit flip could destroy the accuracy of an AI model. The attack works in shared environments where multiple people or programs are using the same GPU at the same time (multi-tenant setups). Nvidia confirmed the Rowhammer risk on some GPUs and advised customers to turn on Error Correction Code (ECC), a memory feature that can catch and fix single-bit errors. This helps stop Rowhammer attacks, though it might slow down some AI workloads by up to 10%. Newer Nvidia GPUs (like those with GDDR7 or HBM3 memory) already have built-in protections (on-die ECC) against this type of attack.
[8]
Nvidia chips hacked, fall victim to Rowhammer bit-flip attacks; here's how to secure the AI GPUs
Canadian researchers have discovered a vulnerability, named GPUHammer, in Nvidia A6000 GPUs, enabling Rowhammer bit-flip attacks. This attack allows malicious users to sabotage AI models by tampering with data, potentially degrading model accuracy significantly. Nvidia suggests enabling System-Level ECC as a simple fix, especially in multi-tenant environments where simultaneous GPU access is required for the attack. A team of Canadian researchers has demonstrated that Nvidia A6000 GPUs are vulnerable to Rowhammer bit-flip attacks, which can allow attackers to sabotage artificial intelligence models running on the tech giant's widely used hardware. The attack, called GPUHammer, was created by University of Toronto researchers Chris Lin, Joyce Qu, and Gururaj Saileshwar, and it may pose significant risks to AI usage. It is the first attack to show Rowhammer bit flips on GPU memories, specifically on GDDR6 memory in an NVIDIA A6000 GPU. According to the researchers, the attacks induce bit flips across all tested DRAM banks, despite in-DRAM defenses like TRR, using user-level CUDA code. These bit flips allow a malicious GPU user to tamper with another user's data on the GPU in shared, time-sliced environments. "In a proof-of-concept, we use these bit flips to tamper with a victim's DNN models and degrade model accuracy from 80% to 0.1%, using a single bit flip," the researchers wrote. Rowhammer lets attackers alter or corrupt memory data by rapidly and repeatedly accessing a specific row of memory cells. This repeated hammering of selected rows causes bit flips in adjacent rows, turning digital zeros into ones or vice versa. So far, Rowhammer attacks have only been shown on memory chips used in CPUs for general-purpose computing. Reacting to the new research, Nvidia released a security notice saying that the fix is simple: users just need to enable System-Level ECC, or error-correcting code.
This simple setting adds redundant bits to stored data, so if one bit gets flipped, the system can automatically correct it before anything goes wrong. "For enterprise customer environments that require enhanced levels of assurance and integrity, NVIDIA recommends using professional and data center products (instead of consumer-grade graphics hardware) and ensuring that ECC is enabled to prevent Rowhammer-style attacks. This is enabled by default on the Hopper and Blackwell Data Center class of GPUs," Nvidia said in a statement. When evaluating the risk, it's important to consider whether the GPU setup is single-tenant or multi-tenant. A Rowhammer attack between tenants can only be carried out if they access the GPU simultaneously.
[9]
Research Reveals GPUHammer's Capability To Destroy AI Model Accuracy On GDDR6 Memory GPUs From 80% To Just 0.1%
With single-bit flips in DRAM banks, GPUHammer can drive AI model accuracy below 1% on high-end GPUs equipped with GDDR6 VRAM. Researchers at the University of Toronto demonstrated how Rowhammer attacks can sharply reduce the accuracy of AI models running on GPUs by inducing bit flips in GPU memory banks. The Rowhammer vulnerability, which lets attackers corrupt the data stored in memory cells, can also affect GPU memory, as the researchers showed. By inducing bit flips across the tested DRAM banks of the video memory, in this case the GDDR6 VRAM of the NVIDIA RTX A6000, the researchers were able to significantly degrade the accuracy of AI models. This worked even in the presence of in-DRAM defenses such as Target Row Refresh (TRR), and a single bit flip in an FP16 value dropped DNN prediction accuracy from 80% to just 0.1% across major ImageNet models.
GPUHammer works in three steps: reverse-engineering DRAM bank mappings, maximizing hammering efficiency, and synchronizing with DRAM refresh cycles. The researchers explain each step in detail on their website; together, these steps let them trigger single-bit flips across four DRAM banks at roughly 12K activations per flip.
In short, the GDDR6 memory on the RTX A6000 proved vulnerable, but other GPUs with GDDR6 memory, such as the RTX 3080, did not show the same results. This may be due to differences in the GDDR6 memory on the two GPUs, since NVIDIA sources memory chips from different vendors, including Samsung, SK Hynix, and Micron. Similarly, no bit flips were observed on the NVIDIA RTX 5090, nor on data center cards such as the A100 and H100, which use HBM (High Bandwidth Memory).
Fortunately, RTX A6000 owners can mitigate GPUHammer by enabling ECC (Error-Correcting Code), which can detect and correct single-bit flips.
Nonetheless, enabling ECC can hurt the RTX A6000's performance: users may see up to 10% slower ML inference and up to a 6.25% loss of usable VRAM capacity. NVIDIA has also issued a security notice about this vulnerability and advises enabling System-Level ECC on affected GPUs. Many modern GPUs, such as Hopper and Blackwell data center parts, already have ECC enabled by default.
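To put the quoted overheads in concrete terms, the short calculation below applies them to the RTX A6000's 48 GB of GDDR6. The 100 ms baseline latency is a made-up example, and both percentages are the worst-case figures cited for ECC on this card, so real workloads may lose less.

```python
# Worst-case cost of enabling ECC on an RTX A6000, using the quoted figures.
# Assumption: the 6.25% loss applies to the full 48 GB; the 100 ms baseline
# inference latency is a hypothetical example value.

VRAM_GB = 48.0      # RTX A6000 advertised GDDR6 capacity
VRAM_LOSS = 0.0625  # up to 6.25% (1/16) less usable VRAM with ECC on
SLOWDOWN = 0.10     # up to 10% slower ML inference with ECC on


def ecc_cost(vram_gb, baseline_ms):
    """Return (usable VRAM in GB, worst-case latency in ms) with ECC enabled."""
    usable = vram_gb * (1 - VRAM_LOSS)
    worst_latency = baseline_ms * (1 + SLOWDOWN)
    return usable, worst_latency


usable, latency = ecc_cost(VRAM_GB, baseline_ms=100.0)
print(f"usable VRAM: {usable:.1f} GB, worst-case latency: {latency:.1f} ms")
# usable VRAM: 45.0 GB, worst-case latency: 110.0 ms
```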
Researchers demonstrate the first Rowhammer attack on NVIDIA GPUs, potentially compromising AI model accuracy. NVIDIA recommends enabling ECC as a mitigation, despite performance trade-offs.
Researchers from the University of Toronto have unveiled GPUHammer, the first successful Rowhammer attack targeting NVIDIA GPUs with GDDR6 memory. This groundbreaking discovery extends the reach of Rowhammer vulnerabilities beyond traditional CPU memory, posing significant threats to AI model integrity and cloud computing environments 1.
GPUHammer exploits physical weaknesses in GDDR6 memory chips, allowing attackers to induce bit flips by repeatedly accessing specific memory rows. This technique can corrupt data stored in GPU memory without directly altering code or input data 2.
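The hammering mechanism can be pictured with a deliberately simplified model: each activation of an aggressor row adds one unit of disturbance to a neighboring victim row, a periodic refresh wipes that disturbance out, and a bit flips only if the disturbance crosses a threshold before the next refresh. The threshold of 12,000 borrows the roughly 12K activations per flip reported for GPUHammer on the RTX A6000; the one-unit-per-activation rule, the refresh intervals, and the function name are assumptions, and real DRAM cells behave far less uniformly.

```python
# Toy Rowhammer model: disturbance accumulation vs. periodic refresh.
# Hypothetical simplification for intuition; not a model of real GDDR6 physics.

FLIP_THRESHOLD = 12_000  # assumed disturbance needed to flip a victim bit


def victim_flips(activations, refresh_every, threshold=FLIP_THRESHOLD):
    """Simulate `activations` aggressor-row activations and count victim flips.

    Each activation adds one unit of disturbance to the adjacent victim row.
    Every `refresh_every` activations the victim is refreshed, which restores
    its charge and resets the disturbance to zero.
    """
    disturbance = 0
    flips = 0
    for i in range(1, activations + 1):
        disturbance += 1
        if disturbance >= threshold:
            flips += 1       # enough charge leaked: a bit flips
            disturbance = 0  # treat the flipped cell as settled
        if i % refresh_every == 0:
            disturbance = 0  # periodic refresh restores the victim row
    return flips


print(victim_flips(100_000, refresh_every=10_000))  # 0
print(victim_flips(100_000, refresh_every=20_000))  # 5
```

In the first run, the victim is refreshed before the disturbance can reach the threshold, so nothing flips; once 20,000 activations fit between refreshes, one flip gets through per window. The toy model only offers rough intuition for why packing enough activations between refreshes matters; it is not the researchers' analysis.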
The researchers demonstrated the attack on an NVIDIA RTX A6000 GPU, a widely used model in high-performance computing and cloud services. By flipping a single bit in the exponent of a model weight, they were able to degrade AI model accuracy from 80% to 0.1%, effectively rendering the model useless 1.
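To see why one flipped exponent bit can wreck a model, the sketch below flips individual bits of an FP16 (IEEE 754 half-precision) weight using Python's `struct` module, which supports the half-precision `e` format. The weight value 0.5 and the choice of bit 14, the most significant exponent bit, are illustrative assumptions; the sources say only that the flipped bit was in the exponent.

```python
import struct


def flip_bit_fp16(value, bit):
    """Return `value` after flipping one bit of its IEEE 754 half-precision encoding.

    FP16 layout: bit 15 = sign, bits 14..10 = exponent, bits 9..0 = mantissa.
    """
    # Reinterpret the 16-bit float as an unsigned integer, flip, and convert back.
    (raw,) = struct.unpack("<H", struct.pack("<e", value))
    (flipped,) = struct.unpack("<e", struct.pack("<H", raw ^ (1 << bit)))
    return flipped


weight = 0.5  # hypothetical model weight
print(flip_bit_fp16(weight, 14))  # 32768.0: top exponent bit turns 2**-1 into 2**15
print(flip_bit_fp16(weight, 0))   # 0.50048828125: lowest mantissa bit barely matters
```

A weight that jumps from 0.5 to 32,768 can dominate every activation it feeds into, whereas a flip in the lowest mantissa bit changes the value by less than 0.1%.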
The potential impact of GPUHammer on AI applications is severe. Gururaj Saileshwar, an assistant professor at the University of Toronto and co-author of the study, likened the effect to "inducing catastrophic brain damage in the model" 1. This could lead to critical failures in domains that rely on deep neural networks, such as autonomous driving, healthcare applications, and medical imaging for analyzing MRI scans.
The attack is particularly concerning in shared GPU environments, such as cloud servers, where multiple users run workloads on the same hardware 2.
In response to the GPUHammer threat, NVIDIA has issued a security advisory recommending the activation of System-Level Error-Correcting Code (ECC) for affected GPU models 3. ECC adds redundancy to memory, allowing for the detection and correction of bit flips 4.
To enable ECC, administrators can use NVIDIA's command-line tool (this requires root privileges, and the new ECC mode takes effect after a reboot):
nvidia-smi -e 1
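Changes to the ECC mode made with `nvidia-smi -e` take effect only after a reboot, so it can be worth confirming which mode each GPU is actually running. The sketch below wraps `nvidia-smi --query-gpu` with the `index`, `ecc.mode.current`, and `ecc.mode.pending` fields; the helper names are hypothetical, and GPUs without ECC support may report `[N/A]` instead of `Enabled` or `Disabled`.

```python
import subprocess


def parse_ecc_modes(csv_text):
    """Parse `nvidia-smi --query-gpu=index,ecc.mode.current,ecc.mode.pending
    --format=csv,noheader` output into a list of (index, current, pending)."""
    rows = []
    for line in csv_text.strip().splitlines():
        index, current, pending = (field.strip() for field in line.split(","))
        rows.append((int(index), current, pending))
    return rows


def query_ecc_modes():
    """Run nvidia-smi and return the parsed ECC modes for every GPU."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=index,ecc.mode.current,ecc.mode.pending",
         "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_ecc_modes(out)


if __name__ == "__main__":
    try:
        for index, current, pending in query_ecc_modes():
            # A pending mode that differs from the current one means a reboot is still needed.
            note = " (reboot needed)" if current != pending else ""
            print(f"GPU {index}: ECC current={current}, pending={pending}{note}")
    except (FileNotFoundError, subprocess.CalledProcessError) as err:
        print(f"could not query nvidia-smi: {err}")
```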
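However, this mitigation comes with trade-offs: on the RTX A6000, up to 10% slower ML inference performance and up to 6.25% less usable VRAM capacity.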
The GPUHammer attack potentially affects a wide range of NVIDIA GPUs with GDDR6 memory, including models from the Ampere, Ada, Hopper, and Turing architectures 2. However, newer GPUs like the RTX 5090 and H100 have built-in on-die ECC, providing inherent protection against this type of attack 5.
As GPUs continue to evolve beyond gaming into AI, creative work, and productivity, the discovery of GPUHammer serves as a wake-up call for the industry. It highlights the need for ongoing research into hardware vulnerabilities and the development of robust security measures to protect the integrity of AI models and other critical applications relying on GPU acceleration.