4 Sources
[1]
Lack of PCIe bandwidth can nerf RTX 5090 by up to 25% in content creation workloads -- Puget data confirms performance hit when using older generations and fewer lanes
Large language model AI workloads are unaffected, but video editing and game development took a real hit.
It's always been clear that PCIe bandwidth has an effect on gaming, especially with the most powerful GPUs. But what about content creation? Puget Systems put this idea to the test and found that a lack of PCIe bandwidth can seriously hinder performance in video rendering and game development. That suggests professionals working with multiple add-in cards may need to take care with card placement to get the best possible performance for their workloads.
PCIe generation is largely determined by your motherboard's specifications. But just because a board supports the latest PCIe 5 on its 16x slots doesn't mean those slots are all equal. Plugging cards into both slots can cause them to default to a mere 8x lane configuration, halving bandwidth for both cards. Installing additional NVMe SSDs can also reduce the number of available lanes, potentially impacting performance in a range of professional tasks. Older motherboards that only support older PCIe standards face the same limitation.
Puget proved that PCIe bandwidth can limit performance in its DaVinci Resolve benchmarks. The best performance was, unsurprisingly, with the PCIe configuration offering the most bandwidth. Puget found only margin-of-error differences between PCIe 5 16x, PCIe 5 8x, and PCIe 4 16x. But when switching down to PCIe 5 4x, PCIe 4 8x, or PCIe 3 16x, performance took a 10% hit. Dropping one tier further to PCIe 4 4x or PCIe 3 8x produced a total dip of 25% from the original figures. That's the kind of drop-off that could seriously affect a business, potentially reducing profits through longer turnaround times or more staff and equipment hours. After Effects saw a smaller drop-off, with only the slowest PCIe configurations falling outside the margin of error, though those still show an impact from reduced PCIe bandwidth.
Puget recorded similar results in its Unreal Engine 5.5 virtual production tests, where PCIe 4 4x and PCIe 3 8x were around 7% slower than the highest-bandwidth configurations. It also recorded a five percent difference between the highest and lowest bandwidth options in Blender, while the Llama large language model benchmark showed little effect.
In conclusion, available PCIe bandwidth can have a notable impact on performance in professional applications, particularly in video editing. However, it should be noted that this test was performed with an Nvidia RTX 5090 graphics card. That's the fastest GPU in the world outside of some professional options, and it demands the most PCIe bandwidth of any card. Systems built around more modest GPUs may not suffer as much from PCIe bandwidth constraints. For those on the cutting edge, though, be careful with your add-in card configurations if you want to manage PCIe bandwidth effectively. Or just wait for PCIe 6.
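The DaVinci Resolve tiers in Puget's data line up with nominal link bandwidth rather than with PCIe generation or lane count alone. As a rough sketch, assuming the usual headline figures of about 1, 2, and 4 GB/s per lane for PCIe 3.0, 4.0, and 5.0, the reported results can be expressed as a lookup keyed by bandwidth. The relative scores are Puget's approximate figures for a single RTX 5090 (including roughly 54% at PCIe 3.0 x4, from Puget's own write-up); the function and table names are hypothetical, and the numbers should not be read as predictions for other GPUs or for bandwidths Puget did not test.

```python
from typing import Optional

# Headline (nominal) per-direction bandwidth per lane in GB/s, ignoring encoding overhead.
GB_PER_LANE = {3: 1, 4: 2, 5: 4}

# Approximate DaVinci Resolve "Overall" score relative to PCIe 5.0 x16 on an RTX 5090,
# as reported by Puget Systems, keyed by nominal link bandwidth in GB/s.
# 64 and 32 GB/s were within the margin of error of each other, so both map to 1.00.
RESOLVE_RELATIVE_PERF = {64: 1.00, 32: 1.00, 16: 0.90, 8: 0.75, 4: 0.54}


def link_bandwidth_gbps(gen: int, lanes: int) -> int:
    """Nominal per-direction bandwidth of a PCIe 3.0-5.0 link."""
    return GB_PER_LANE[gen] * lanes


def resolve_relative_perf(gen: int, lanes: int) -> Optional[float]:
    """Return the reported Resolve tier for a link, or None if that bandwidth wasn't tested."""
    return RESOLVE_RELATIVE_PERF.get(link_bandwidth_gbps(gen, lanes))


# The nine configurations Puget tested, each mapped to its reported tier.
for gen, lanes in [(5, 16), (5, 8), (4, 16), (5, 4), (4, 8), (3, 16), (4, 4), (3, 8), (3, 4)]:
    rel = resolve_relative_perf(gen, lanes)
    print(f"PCIe {gen}.0 x{lanes}: {link_bandwidth_gbps(gen, lanes)} GB/s, ~{rel:.0%} of 5.0 x16")
```

Configurations with equal nominal bandwidth, such as 5.0 x4, 4.0 x8, and 3.0 x16, land in the same tier, which is the clustering the results describe.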
[2]
Impact of PCIe 5.0 Bandwidth on GPU Content Creation Performance
With the release of the NVIDIA Blackwell GPUs and RDNA 4-based Radeon 9000-series GPUs, we finally have consumer video cards that support the PCIe 5.0 standard. Although motherboards have supported it for some time now, no devices other than storage took advantage of it. This raises the question: what impact does PCIe 5.0's increased bandwidth have on GPU performance in content creation applications?
PCI Express (abbreviated PCIe or PCI-e) is a technology used to connect various internal computer devices to the motherboard. Its physical connectors and communication schema are used for drives, GPUs, and add-in cards such as RAID, HBA, and network cards. Since 2003, the standard has gone through a variety of revisions and updates. Currently, the most common PCIe specification on new high-end motherboards is PCIe 5.0 x16, often alongside some 4.0 lanes.
The primary difference between PCI Express versions is transfer rate. A PCIe connection between devices has two defining features: the number of lanes and the PCIe version. Most motherboard slots have between four and sixteen lanes (x4, x8, or x16), with the occasional x1 or x2 slot. Each lane has a maximum transfer rate defined by the PCIe version, and since PCIe 3.0, each new version has doubled that rate. For example, PCIe 5.0 supports up to 32 GT/s per lane, so an x16 slot has 16 lanes at 32 GT/s for a maximum throughput of about 64 GB/s in each direction. If that same slot were running PCIe 4.0, it would have 16 lanes at 16 GT/s for a throughput of up to 32 GB/s. Alternatively, you could reach 32 GB/s with eight lanes at PCIe 5.0.
At present, consumer desktop motherboards tend to offer few "free" PCIe lanes, and we are often disappointed with the quantity, connectivity, and placement of their PCIe slots.
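The lane-times-rate arithmetic above can be sketched in a few lines. This is an illustration rather than a spec-exact model: it assumes the standard per-lane rates of 8, 16, and 32 GT/s for PCIe 3.0, 4.0, and 5.0 and the 128b/130b line encoding those generations use, and it ignores packet and protocol overhead, so real-world throughput is lower still. The function names are hypothetical.

```python
from fractions import Fraction

# Per-lane signaling rate in GT/s. With NRZ signaling, each transfer carries one bit per lane.
GT_PER_LANE = {3: 8, 4: 16, 5: 32}

# PCIe 3.0 through 5.0 use 128b/130b encoding: 128 payload bits per 130 bits sent.
ENCODING_EFFICIENCY = Fraction(128, 130)


def nominal_gb_per_s(gen: int, lanes: int) -> float:
    """Headline per-direction bandwidth: GT/s x lanes / 8 bits per byte."""
    return GT_PER_LANE[gen] * lanes / 8


def encoded_gb_per_s(gen: int, lanes: int) -> float:
    """Per-direction bandwidth after 128b/130b encoding (packet overhead not included)."""
    return float(GT_PER_LANE[gen] * lanes * ENCODING_EFFICIENCY / 8)


for gen, lanes in [(5, 16), (4, 16), (5, 8), (4, 4), (3, 4)]:
    print(
        f"PCIe {gen}.0 x{lanes}: {nominal_gb_per_s(gen, lanes):.0f} GB/s nominal, "
        f"{encoded_gb_per_s(gen, lanes):.2f} GB/s after encoding"
    )
```

Running it shows 5.0 x8 and 4.0 x16 with identical bandwidth, and puts the often-quoted 64 GB/s for 5.0 x16 at roughly 63 GB/s once encoding is accounted for.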
Although it depends on the price point being targeted, many boards have a primary 5.0 x16 slot and only a few other slots, typically at 4.0 x4 or even 3.0 x1. One reason we like the ASUS ProArt boards we often carry is their PCIe slot layout and support. However, this isn't merely motherboard vendors being cheap. Instead of maximizing add-in card support, they typically dedicate many of the available PCIe lanes (from the CPU or chipset) to features like M.2 slots, USB ports, and Ethernet/WiFi. The drawback is that once a GPU is installed, there may be no way to add a second one at full bandwidth (if at all). Even our preferred boards require that the GPU run at x8 if we want to install most add-in cards or additional GPUs.
Given that add-in cards (GPUs or otherwise) may need to run at lower-than-maximum bandwidth due to limited PCIe lane availability, it is reasonable to ask what that costs: how much performance is lost when video cards operate at less than their maximum PCIe bandwidth?
An Illustration of the PCIe Slot Problem
To illustrate the difficulties of running multiple PCIe devices on modern consumer motherboards, we grabbed a handful of the best-selling AM5 and LGA 1851 motherboards on Newegg, alongside our preferred ASUS ProArt X870 and Z890 Creator boards. Most of these aren't the cheapest options available, but they also aren't the most expensive. They are, arguably, the most popular for new PC builds, though.
The first thing we notice when looking at these is that, save for the ASUS TUF board, none have more than three PCIe expansion slots. Apart from the primary slot, none of those slots offer more than 4.0 x4, save on the ProArt boards. On many boards, one of the three is even slower, at 3.0 x1 or x2. (Note that, while physically x16 in length, most of the non-primary slots are only electrically wired for x4 or less.)
The TUF board does offer a bevy of 4.0 x4 and 4.0 x1 slots, while the ProArts can run x16 in either of the top two slots, though both drop to x8 when both are in use.
Of course, not everyone needs lots of add-in cards. For many users, a single GPU is the only card they'll install. But we have found that professionals frequently require a GPU plus at least one add-in card. Based on this, we think the primary bandwidths to watch in the upcoming results are 5.0 x16, 5.0 x8, and 4.0 x4. For those on older motherboards considering a GPU upgrade, 3.0 x16 and x8 are also likely relevant.
We choose our benchmarks to cover many workflows and tasks to provide a balanced look at each application and its hardware interactions. However, many users have more specialized workflows, so we also provide individual results for each benchmark. If a specific area in an application comprises most of your work, examining those results will give a more accurate picture of the performance differences between components. Otherwise, we recommend skipping this section and focusing on our more in-depth analysis in the following sections.
Video Editing / Motion Graphics: DaVinci Resolve Studio & After Effects
In both DaVinci Resolve and After Effects, we only included the "Overall" scores, because we saw little difference in the overall performance trends when we separated them by workflow. However, the raw results tables above include the scores for specific workflows, such as 3D in After Effects or Intraframe media in Resolve.
Starting with DaVinci Resolve (Chart #1), we found that GPU PCIe bandwidth does noticeably affect overall performance. At the high end of the bandwidth spectrum, PCIe 5.0 x16, 5.0 x8, and 4.0 x16 perform relatively similarly. The 5.0 x16 result is technically ahead, but within what we would consider the margin of error for this type of testing.
After those three, the next grouping is all the 16 GB/s combinations: 5.0 x4, 4.0 x8, and 3.0 x16. This cluster performs at about 90% of the prior one. We don't love a 10% performance reduction just from using a slower slot, but it is often acceptable. The next tier down, however, isn't: 3.0 x8 and 4.0 x4 were only 75% as fast as the full-bandwidth (5.0 x16) result, and the slowest option, 3.0 x4, delivered only 54% of the performance. While running a GPU in any of those configurations is likely rare, we definitely recommend avoiding these bandwidths for a GPU used with DaVinci Resolve.
We see less overall effect in After Effects (Chart #2). Visually, unlike DaVinci Resolve, the bars are less clustered by color, and there is less of a stair-step pattern. The three lowest bandwidths are still the three slowest results, though. Here, the results from 64 GB/s down to 16 GB/s are all within the margin of error, essentially random. Once we drop to 8 GB/s with 3.0 x8, we are outside that margin (though only with respect to the grouping). At 8 GB/s, 4.0 x4 is also slower than the higher-bandwidth results. Finally, 3.0 x4 is 10% slower than the configurations at 16 GB/s or greater. Our recommendation is to worry less about PCIe bandwidth in After Effects, but to avoid a very low-bandwidth situation like 3.0 x4.
Game Dev / Virtual Production: Unreal Engine
Our Unreal Engine benchmark results fall somewhere between DaVinci Resolve and After Effects. Like the former, there is clear clustering by bandwidth, but, like AE, there are few distinct "steps." 5.0 x16, x8, and x4, as well as 4.0 x16 and x8, and 3.0 x16 are all functionally identical. 3.0 x16 may be a touch slower than the rest, but it is just within the margin of error for this testing. We do, however, see results outside that margin at the lower bandwidths: 4.0 x4 and 3.0 x8 are 93% as fast as the 64 GB/s results, and 3.0 x4 trails with 90% of the performance.
Overall, none of these are huge differences in performance. As we discussed above, while a 10% performance hit isn't great, it is acceptable in some cases. We would urge caution when dropping a GPU to 4.0 x4 or below, but it may be a tradeoff worth making for multi-GPU setups or to accommodate add-in cards.
GPU Rendering: Blender & Octane
For this article, we tested with three rendering benchmarks: V-Ray, Blender, and Octane. However, our V-Ray results seemed particularly anomalous, so we haven't included them in the charts, though they are in the results table above. In Blender and Octane, we see essentially no effect of bandwidth on performance. In Blender, the total variation from the average is about 5%, while in Octane it is 2.5%. All the results are largely within the margin of error, so we can't draw many conclusions; in this case, that means there is likely no effect. This makes sense, as the scenes are all contained within GPU VRAM, and loading time isn't counted. Overall, there seems to be little to no downside to installing a GPU in a reduced-bandwidth slot for offline rendering applications.
AI: LLM (llama)
Finally, our Llama.cpp benchmark looks at GPU performance in prompt processing and token generation. For both workflows, the results seem effectively random, with no discernible pattern. The overall difference in performance is also fairly small, about 6% for prompt processing. Because of this, we would generally say that bandwidth has little effect on AI performance. However, we would caution that our LLM benchmark is very small, and LLM setups frequently involve multiple GPUs or offload some of the model to system RAM. In either of these cases, we expect that PCIe bandwidth could have a large effect on overall performance.
Does GPU PCIe Bandwidth Affect Content Creation Performance?
On modern motherboards, you often only get one PCIe slot at a full 5.0 x16 bandwidth.
Additional slots may be 5.0 x8, but are likely much lower, at 4.0 x4 or below. Because of this, multi-GPU setups or configurations with add-in cards may leave one or more GPUs with dramatically reduced PCIe bandwidth. Although most of the workflows we tested don't show much performance loss at 4.0 x4, that's not true across the board.
In video editing/motion graphics, we saw the largest impact. PCIe 5.0 x16, 5.0 x8, and 4.0 x16 were functionally equivalent. Below that, however, we started to see differences, especially in DaVinci Resolve. In that application, 3.0 x16 was 10% slower, and our typical-case 4.0 x4 was about 25% slower. These margins are smaller in After Effects, but still present. We recommend caution when configuring a system for video editing with multiple add-in cards, as reducing the number of lanes available to the GPU can have a measurable impact on performance.
Our Unreal Engine benchmark also showed performance impacts from PCIe bandwidth, though they are more minor. We only saw a noticeable hit once bandwidth was reduced to 4.0 x4 (or equivalent), with an average FPS drop of 7%. 3.0 x4 was slightly worse, at 10% slower than maximum bandwidth. While we are less concerned about this amount of lost performance, it should still be kept in mind.
Offline renderers and LLM benchmarks showed no impact from PCIe bandwidth on performance. This makes sense, as both tend to load their work fully into GPU VRAM and crash if they can't. There are some exceptions to this with LLMs, but operating out of system RAM is a huge slowdown. Thus, while reduced PCIe bandwidth may slow initial model or scene loading, it should have a negligible impact on performance after that. Our one note of caution is that, in situations where you are pooling VRAM to fit a model, PCIe bandwidth may have a large effect. We were not able to test that here.
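To put the point about initial loading into perspective, a back-of-the-envelope lower bound on the time needed just to push data across the link is size divided by bandwidth. The sketch below uses nominal link figures from this article and a hypothetical 20 GB payload (not one of Puget's test assets); the function name is illustrative, and it ignores storage read speed, protocol overhead, and driver latency, so real load times will be longer.

```python
def min_transfer_seconds(size_gb: float, link_gb_per_s: float) -> float:
    """Lower bound on the time to copy size_gb across a link of the given bandwidth."""
    if link_gb_per_s <= 0:
        raise ValueError("link bandwidth must be positive")
    return size_gb / link_gb_per_s


# Nominal per-direction bandwidths (GB/s) for several configurations discussed above.
LINKS = {"5.0 x16": 64, "4.0 x16": 32, "3.0 x16": 16, "4.0 x4": 8, "3.0 x4": 4}

PAYLOAD_GB = 20  # hypothetical model or scene size, chosen only for illustration

for name, gbps in LINKS.items():
    print(f"{name}: at least {min_transfer_seconds(PAYLOAD_GB, gbps):.2f} s to move {PAYLOAD_GB} GB")
```

A one-time cost of a few seconds at load is easy to absorb, which fits the negligible impact measured for rendering and the small LLM benchmark. The same arithmetic suggests why pooling VRAM or offloading to system RAM is different: there, data crosses the link repeatedly rather than once.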
When we configure the systems we sell, we balance the need for maximum component performance against the add-in cards our customers need to do their work. Frequently, this means running the primary GPU at PCIe 5.0 x8, which cuts its PCIe bandwidth in half. However, as this article shows, that major reduction in bandwidth often has minimal impact on real-world performance. Outside of a few uncommon situations, this testing confirms that as long as you have a modern motherboard that supports PCIe 5.0, running the GPU at x8 is not an issue. However, lower-end motherboards that require the GPU to run at 4.0 x4 may introduce performance penalties.
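The x8 tradeoff described above comes from how boards like the ProArt models share one pool of CPU lanes between their top two x16-length slots: either slot can run at x16 alone, but both drop to x8 when both are populated. The sketch below models only that behavior; it is a hypothetical simplification (the class and function names are illustrative, and bandwidths use nominal headline figures), and real boards document their own lane-sharing rules, which can also involve M.2 slots, in their manuals.

```python
from dataclasses import dataclass
from typing import List

# Nominal per-direction GB/s per lane for PCIe 3.0, 4.0, and 5.0.
GB_PER_LANE = {3: 1, 4: 2, 5: 4}


@dataclass
class SlotLink:
    slot: str
    gen: int
    lanes: int

    @property
    def nominal_gb_per_s(self) -> int:
        return GB_PER_LANE[self.gen] * self.lanes


def allocate_shared_x16(top_used: bool, second_used: bool, gen: int = 5) -> List[SlotLink]:
    """Hypothetical model: 16 CPU lanes shared by two slots, x16 alone or x8/x8 together."""
    if top_used and second_used:
        return [SlotLink("top", gen, 8), SlotLink("second", gen, 8)]
    if top_used:
        return [SlotLink("top", gen, 16)]
    if second_used:
        return [SlotLink("second", gen, 16)]
    return []


for top, second in [(True, False), (True, True)]:
    for link in allocate_shared_x16(top, second):
        print(f"{link.slot} slot: PCIe {link.gen}.0 x{link.lanes}, {link.nominal_gb_per_s} GB/s")
```

With a GPU alone, the top slot gets 64 GB/s; adding a second card halves that to 32 GB/s, the 5.0 x8 case that Puget found functionally equivalent to 5.0 x16.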
[3]
PCIe Lane Configurations Can Cost the RTX 5090 up to 25% Performance in Content Creation
The NVIDIA GeForce RTX 5090 is built with PCIe 5.0 x16 support, offering up to 64 GB/s of bandwidth. This is a notable step up from the RTX 4090, which runs at PCIe 4.0 x16 with 32 GB/s of bandwidth. Recently, Puget Systems ran a series of tests to see how different PCIe versions and lane counts affect the RTX 5090's performance, especially for content creators working with video editing and 3D rendering tools. Their findings show that when PCIe bandwidth is limited, the RTX 5090's performance can drop significantly, by as much as 25% in some cases.
The tests focused heavily on DaVinci Resolve, a popular video editing application. Running the RTX 5090 at PCIe 5.0 x16, PCIe 5.0 x8, or even PCIe 4.0 x16 made little difference to performance; the results were almost identical. However, when bandwidth was cut to PCIe 5.0 x4, PCIe 4.0 x8, or PCIe 3.0 x16, performance dropped by roughly 10%. The slowdown became more pronounced at PCIe 4.0 x4 or PCIe 3.0 x8, cutting performance by about 25%. Other applications behaved differently; for example, Adobe After Effects only showed small performance drops once bandwidth fell to 8 GB/s or below. Unreal Engine 5.5 tests showed about a 7% drop in frame rates at the lowest PCIe bandwidth configurations. Offline rendering programs like Blender and OctaneBench, plus AI model benchmarks such as Llama, showed almost no impact from PCIe speed differences.
Why does this happen? The RTX 5090's PCIe 5.0 interface uses NRZ signaling, the same basic method as PCIe 4.0, but with stricter measures to keep the signal clean and accurately timed. Although the card is backward compatible with PCIe 4.0, 3.0, and earlier versions, the bandwidth it actually gets depends heavily on your motherboard. Most motherboards allocate a full x16 lane configuration to only one PCIe slot. Any other slots or NVMe drives have to share the remaining lanes, often running at x8, x4, or less.
This lane sharing means your RTX 5090 might run at PCIe 4.0 x4 or even lower in some setups, which limits the GPU's ability to move data efficiently. This limitation matters most when working with large, high-resolution video timelines or complex 3D scenes where fast data transfer between the GPU and the CPU or storage is critical. If your RTX 5090 runs in a slot with reduced lanes, projects can take longer to render or export. For professionals using multi-GPU configurations, installing the RTX 5090 in a slot that supports the full x16 lanes is crucial to maintaining expected performance.
Sources: Puget Systems, via Tom's Hardware
[4]
NVIDIA RTX 5090 Loses Over 25% Performance Without Full PCIe Bandwidth, With Noticeable Losses in Rendering Workloads
NVIDIA's flagship Blackwell GPU apparently loses a massive chunk of performance if it isn't operated at full PCIe bandwidth, particularly in video editing applications.
Many factors determine a GPU's performance across workloads, and one important factor is the PCIe bandwidth available to it. With the latest PCIe 5.0 generation, insufficient PCIe lanes could severely degrade GPU performance, especially in intensive workloads like video rendering and content creation. Puget Systems conducted extensive testing on NVIDIA's GeForce RTX 5090 to determine the performance impact of reduced PCIe bandwidth, and the results show that the difference is quite significant.
The PCIe bandwidth allocated to a GPU can drop if another device is installed in a slot that shares lanes with the primary one, since the lanes are then split between the devices. Many motherboards have a single PCIe 5.0 x16 slot, which allows the RTX 5090 to operate at full performance, but connecting other add-in cards, such as a PCIe network card, can affect GPU performance.
Puget Systems tested NVIDIA's flagship Blackwell GPU in rendering applications and AI workloads, and the GPU appeared to perform at its best only with full PCIe bandwidth. Starting with After Effects, the RTX 5090 saw a significant performance hit when dropped from PCIe 5.0 x16 to PCIe 3.0 x4, a difference of roughly 10%. Similarly, in DaVinci Resolve, the performance hit was more than 20% when the GPU ran at PCIe 3.0 x4, and there were noticeable hits as well when the number of lanes dropped from x16 to x4 within the same generation. This shows that running multiple add-in cards on your motherboard could affect GPU performance massively. In game development benchmarks, particularly Unreal Engine, almost all PCIe configurations showed little change in performance.
This was also true for AI workloads, such as the Llama.cpp benchmark, where performance was unaffected when lower PCIe bandwidth was assigned to the GPU. The drop isn't significant here because these applications depend more on GPU VRAM. While the average consumer shouldn't worry much about PCIe bandwidth, professionals, especially content creators, should keep it in mind.
Recent tests reveal that NVIDIA's RTX 5090 GPU can suffer significant performance drops in content creation tasks when PCIe bandwidth is limited, highlighting the importance of proper PCIe configuration for professionals.
Recent tests conducted by Puget Systems have revealed that the NVIDIA RTX 5090, the latest flagship GPU in the Blackwell series, can experience significant performance drops when operating under limited PCIe bandwidth conditions. This finding has important implications for professionals in content creation and other GPU-intensive fields 1.
PCIe (PCI Express) is the standard interface for connecting high-speed components like GPUs to a computer's motherboard. The bandwidth available to a GPU depends on two factors: the PCIe generation (e.g., PCIe 3.0, 4.0, or 5.0) and the number of lanes allocated (typically x16, x8, or x4). The RTX 5090 supports PCIe 5.0 x16, offering up to 64 GB/s of bandwidth 2.
The impact of reduced PCIe bandwidth varies significantly across different applications:
Video Editing: DaVinci Resolve showed the largest impact, with performance dropping about 10% at PCIe 5.0 x4, 4.0 x8, or 3.0 x16 and about 25% at PCIe 4.0 x4 or 3.0 x8 1.
Motion Graphics: Applications like After Effects showed less dramatic but still noticeable performance reductions, particularly at the lowest bandwidth configurations 1.
Game Development: Unreal Engine 5.5 tests revealed approximately 7% slower performance at the lowest PCIe bandwidth configurations 1.
AI Workloads: Interestingly, large language model benchmarks like Llama showed little to no impact from reduced PCIe bandwidth 4.
These findings have significant implications for professionals, especially those in content creation:
Motherboard Selection: The choice of motherboard becomes crucial, as many consumer boards limit the number of full-bandwidth PCIe slots 2.
Multi-GPU Setups: Professionals using multiple GPUs or add-in cards need to be particularly cautious, as adding cards can reduce the available lanes for each device 3.
Workflow Optimization: The impact varies by application, so professionals should consider their specific workflows when configuring their systems 4.
As GPUs continue to advance, the demand for PCIe bandwidth is likely to increase. This trend underscores the importance of staying informed about hardware configurations and their potential impact on performance. For those on the cutting edge of content creation and other GPU-intensive tasks, careful consideration of PCIe bandwidth allocation will be crucial to maximize the potential of high-end GPUs like the RTX 5090 1.
Summarized by
Navi