2 Sources
[1]
$200 'socketed' Nvidia AI GPU for servers hacked into a PCIe card with custom PCB and 3D-printed cooling -- modded Tesla V100 SMX data center GPU runs AI LLMs and is more efficient than many modern midrange offerings in AI inference
Running LLMs locally on your GPU requires a lot of VRAM, which can drive the rig's cost up exponentially these days. Amidst the ongoing AI boom, the best value lies in older, often forgotten silicon that's still capable, which is exactly what YouTuber Hardware Haven found. He took an Nvidia V100 server GPU with an SMX interface, which is similar to using a socketed processor, and converted it to a standard PCIe bus, which plugged into a consumer motherboard. It ended up performing quite well for its stature (and cost), even against modern SKUs. The contraption begins with an Nvidia Tesla V100 AI GPU that uses the SMX2 socket and is designed for rack-scale deployments. The SMX interface is a mezzanine-based connector that mounts GPUs flat against a specialized baseboard, similar to a CPU socket, and the GPU is then screwed down to the baseboard. The host was able to acquire this GPU for just $100, and the accompanying SMX-to-PCIe x16 adapter was also around $100, bringing the total cost of the setup to $200. The V100 comes with either 16 or 32GB of HBM2 (we're working with 16GB here, sporting 900 GB/s of bandwidth), and it's based on the Turing architecture. The PCIe adapter card didn't come with any cooling of its own, and since the V100 is literally just a heatsink on a PCB, the YouTuber designed and 3D-printed a duct for it. He attached an 80mm Notcua fan on the end to draw in fresh air toward the heatsink. The adapter also has 2x 8-pin PCIe power connectors for, well, power, along with 3x 4-pin PWM headers. It does not feature a secondary SMX socket for NVLink; however, such sockets are much more expensive. Once the GPU was ready and slotted into a standard Ryzen system, it was time to test just how artificially intelligent a 2017 card is. Keep in mind that the V100 has no display output, so you need integrated graphics in your CPU to actually use your computer. In Ollama, using gpt-oss-20b, the V100 was able to crank out 130 tokens per second, while the Radeon RX 7800 XT in the YouTuber's daily driver system only achieved about 90 tokens per second. Both cards have 16 GB of VRAM, and the RX 7800 XT is even newer with supposedly more efficient silicon, but then again, Nvidia is the gold standard for software support in these benchmarks. So, the host switched to an RTX 3060 12 GB (the best Nvidia GPU he had on hand) to compare against the V100, which is also built on newer Ampere architecture. Running Google's gemma4: e4b, the V100 topped out at 108 tokens per second, while the 3060 12 GB only managed about 76 tokens per second, but it did so consuming less power -- 293W on the V100 versus 235W on the RTX 3060. If we calculate tokens per watt, that comes out to around 0.37 for the V100, slightly more efficient than the 0.33 tokens per second per watt on the 3060. Power-limiting the V100 to 100W (it comes with 300W out of the box) dropped the power draw to 170W in the same test, while still producing 95 tok/s. To make the comparison fair, the YouTuber also limited the 3060 to 100W; it ended up consuming 171W and producing just 68 tokens per second. So, with both new results, the V100 achieves an efficiency score of 0.55 tokens/s per watt, while the 3060 12 GB was stuck at 0.39 tokens/s per watt. Even though the V100 proved much more efficient overall, despite being several generations old, its idle power draw is the real crux. It sips 45W just sitting doing nothing, compared to 35W on the RTX 3060. Finally, the YouTuber also tested Frigate NVR, which ended up performing really well on the V100, better than the RTX 3060, but consumed more power, as you'd expect. The host's previous setup for Frigate was an Intel-based N100 mini PC that struggled to ever detect his dog on mobilenetv2, but the V100 was able to identify it instantly. Monitoring just two cameras made the V100 pull over 100W, though; the RTX 3060 was similar in this regard, while the older N100 consumed only 26W when operating six different cameras. That marks the end of the benchmarking. This V100 experiment turned out to be a success overall, but the virality of the original video and the fact that we're writing this article mean these bad boys are about to go up in price. So, if you're interested, make sure to grab one before it's too late; the YouTuber found it for just $100 on eBay, and the PCIe adapters for early SMX sockets are cheap enough as well. The 32GB variant of the V100 goes for $500, however. Follow Tom's Hardware on Google News, or add us as a preferred source, to get our latest news, analysis, & reviews in your feeds.
[2]
NVIDIA's V100, An 8-Year Old GPU, Now Sells for $100 and Crushes Modern Consumer Cards in AI LLM Workloads
New GPUs are optimized substantially for AI workloads, but what if old GPUs like the 8-year-old NVIDIA V100, costing around $100, start to outperform recent offerings in LLMs? The NVIDIA Volta generation was the first purely dedicated data center series that wasn't available in the standard consumer gaming segment. Volta was the first family to feature the Tensor Core architecture, which has since become the staple for its AI advancements. The tensor core architecture was designed to handle AI tasks and has evolved massively since the Volta family, but Hardware Haven decided to test an 8-year-old V100 GPU to see how it holds up in modern-day AI LLMs. But first, let's recap the specifications of the NVIDIA Tesla V100 GPU. The Tesla V100 was available in two distinct form factors, an SXM board and a PCIe variant. The SXM models were housed primarily in data centers using a mezzanine connector, which allowed direct power and NVLink routing. The V100 tested is an SXM2 model, which features 5120 cores, 320 TMUs, 128 ROPs, and 640 Tensor Cores. It packed 6 MB of L2 cache, a clock speed of up to 1530 MHz, and either 16 or 32 GB of HBM2 memory across a 4096-bit wide bus interface, resulting in 898 GB/s bandwidth. The GPU had a 250W TDP, which feels minuscule vs the current 1KW+ Blackwell models. Back then, the NVIDIA Tesla V100 was priced at over $10,000 US, but today, it can be bought off eBay for just $100 US for the 16 GB variant. But the main problem isn't the price of the GPU, it's compatibility with a standard PC. No PC supports SXM2 standards. This required an SXM to PCIe adapter, which comes with its own dedicated 2x8-pin connector configuration and three 4-pin fan headers. The other hurdle was the cooling solution. The NVIDIA Tesla series is designed for large-scale data centers and runs passively with a large heatsink. The heatsink and backplate on the GPU are high-end, but not able to sustain a 24/7 operation inside standard PCs. This required the techtuber to come up with his own cooler duct, which was 3D printed, and a single Noctua fan that offered direct airflow to the heatsink. The total cost of the GPU and the add-ons ended up slightly over $200 US, which is still lower than the models used for comparison, such as the RTX 3060 12 GB and the RX 7800 XT 16 GB. The first AI LLM used for testing was GPT-oss with 20b parameters. Here, the NVIDIA V100 system was able to produce around 130 Tokens/s versus the RX 7800 XT, which only managed around 90 Tokens/s. Compared to the NVIDIA GeForce RTX 3060 12 GB, which is roughly 5-years old, the NVIDIA V100 was 42% faster in Gemma4:e4b (ollama+openwebui) in token generation speed. What's more impressive is the power efficiency of the 8-year-old GPU, which, although it had a higher power draw, was 12% ahead of the newer Ampere-based GPU. The GPU was also tested with a 100W power limit, where it once again outperformed the RTX 3060 with a 41% lead in power efficiency in token/sec/watt tests. While this proves that older GPUs are still viable for AI LLMs, offering great value and efficiency, they do require extra modding, which isn't up to everyone to perform. The 32 GB model does cost around $400 - $500 US, but the extra memory capacity can further help in bigger AI LLMs. With that said, the tech outlet aims to do additional testing in the future, so be sure to check out their channel and the complete video below:
Share
Copy Link
YouTuber Hardware Haven converted a $100 Nvidia V100 data center GPU with SMX interface into a PCIe card using a custom adapter and 3D-printed cooling. The modded Tesla V100 delivered 130 tokens per second in AI inference tests, outperforming both the RTX 3060 and RX 7800 XT while proving more efficient than modern midrange offerings at just 0.55 tokens per second per watt.
Running AI LLM workloads locally demands substantial VRAM, driving costs skyward for enthusiasts and developers alike. YouTuber Hardware Haven discovered an unexpected solution: the Nvidia V100, an 8-year-old data center GPU originally priced over $10,000, now available on eBay for just $100
1
. The modded Tesla V100 experiment required converting the server-grade AI GPU from its SMX interface to a standard PCIe card, but the results proved compelling for anyone seeking affordable local AI inference capabilities.
Source: Tom's Hardware
The Nvidia V100 features an SXM2 socket designed for rack-scale deployments, mounting flat against specialized baseboards through a mezzanine-based connector similar to CPU sockets
2
. Hardware Haven acquired an SMX-to-PCIe x16 adapter for approximately $100, bringing total project cost to $2001
. The custom PCB adapter includes 2x 8-pin PCIe power connectors and 3x 4-pin PWM headers, though it lacks secondary SMX sockets for NVLink connectivity.Since the Tesla V100 relies on passive cooling with large heatsinks designed for data center airflow, Hardware Haven engineered a 3D-printed cooling duct paired with an 80mm Noctua fan to direct airflow across the heatsink
1
. The V100 tested features 5120 cores, 640 Tensor Cores, and 16GB of HBM2 memory across a 4096-bit bus interface delivering 900 GB/s bandwidth2
.
Source: Wccftech
In Ollama testing using gpt-oss-20b, the Nvidia V100 achieved 130 tokens per second, significantly outpacing the RX 7800 XT which managed only 90 tokens per second despite both cards featuring 16GB of VRAM
1
. Against the RTX 3060 12GB running Google's gemma4:e4b, the V100 delivered 108 tokens per second compared to just 76 tokens per second from the newer Ampere-based card, representing a 42% advantage in token generation speed2
.The V100 consumed 293W at full performance versus 235W for the RTX 3060, translating to 0.37 tokens per second per watt for the V100 compared to 0.33 for the RTX 3060
1
. These results demonstrate that the aging AI GPU remains more efficient than modern midrange offerings in AI inference tasks, leveraging Nvidia's established software support and the Tensor Core architecture first introduced with the Volta generation2
.Related Stories
When power-limited to 100W from its 300W default, the V100 consumed 170W while maintaining 95 tokens per second output. The RTX 3060 under identical 100W limits drew 171W but produced only 68 tokens per second
1
. This resulted in the V100 achieving 0.55 tokens per second per watt versus 0.39 for the RTX 3060, marking a 41% lead in power efficiency2
. The V100's idle power draw of 45W compared to 35W on the RTX 3060 represents the primary tradeoff for users running continuous AI inference operations1
.The viral success of Hardware Haven's experiment signals potential price increases for used V100 units as demand surges
1
. The 32GB variant currently sells for $400-$500, offering expanded capacity for larger AI LLM workloads while remaining cost-competitive with new consumer hardware2
. This trend highlights how older data center silicon continues delivering value in specialized workloads, though the modding requirements and lack of display outputs limit accessibility for mainstream users. Watch for similar adapter solutions emerging for other server-grade GPUs as the AI boom drives demand for affordable VRAM-rich hardware capable of handling local AI inference at scale.Summarized by
Navi
08 Sept 2025•Technology

24 Apr 2025•Business and Economy

04 Apr 2025•Technology

1
Science and Research

2
Technology

3
Policy and Regulation
