NVIDIA and Stability AI Optimize Stable Diffusion 3.5 for RTX GPUs, Reducing VRAM Usage by 40%

Reviewed by Nidhi Govil


NVIDIA collaborates with Stability AI to optimize Stable Diffusion 3.5, reducing VRAM requirements and doubling performance on RTX GPUs. This breakthrough enables more widespread use of the powerful AI image generation model.

NVIDIA and Stability AI Collaborate on Stable Diffusion 3.5 Optimization

NVIDIA has partnered with Stability AI to significantly enhance the performance and accessibility of Stable Diffusion 3.5, one of the world's most popular AI image generation models. The collaboration has yielded substantial improvements in both speed and VRAM usage, making the model accessible to a wider range of NVIDIA RTX GPUs [1][2].

Reducing VRAM Requirements

Source: TweakTown

The base Stable Diffusion 3.5 Large model originally required more than 18GB of VRAM, limiting it to high-end GPUs. Through quantization, NVIDIA and Stability AI have cut the VRAM requirement by 40%, bringing it down to 11GB [1]. As a result, the model now runs on five GeForce RTX 50 Series GPUs instead of just one, significantly expanding its potential user base [2].
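The reported numbers can be sanity-checked with simple arithmetic, and the memory savings follow from storing weights at lower precision. The sketch below is illustrative only: the 18GB baseline and 40% figure come from the article, while the int8 weight quantization is a generic example of the technique, not Stability AI's actual pipeline (which the article attributes to FP8-class quantization via TensorRT).

```python
import numpy as np

# Back-of-the-envelope check of the reported VRAM reduction.
baseline_gb = 18.0                        # SD3.5 Large baseline (per the article)
reduction = 0.40                          # reported reduction
optimized_gb = baseline_gb * (1 - reduction)
print(f"Optimized VRAM: ~{optimized_gb:.1f} GB")  # ~10.8 GB, i.e. the quoted 11GB

# Generic 8-bit weight quantization: store each weight in 1 byte plus a
# per-tensor scale, instead of 4 bytes for float32 (or 2 for BF16).
weights = np.random.randn(1024).astype(np.float32)
scale = np.abs(weights).max() / 127.0
q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
dequant = q.astype(np.float32) * scale    # approximate reconstruction

print("int8 bytes vs fp32 bytes:", q.nbytes, "vs", weights.nbytes)
```

The trade-off is a small reconstruction error (bounded by the scale factor here), which production pipelines manage with calibration so that image quality is preserved.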

Performance Enhancements with TensorRT

In addition to VRAM reduction, NVIDIA has applied its TensorRT software development kit (SDK) to optimize the Stable Diffusion 3.5 Large and Medium models. These optimizations take full advantage of the Tensor Cores in RTX GPUs, resulting in impressive performance gains [1]:

  1. SD3.5 Large: 2.3x performance boost compared to BF16 PyTorch
  2. SD3.5 Medium: 1.7x performance increase over BF16 PyTorch

TensorRT for RTX: Reimagined for AI PCs

NVIDIA has also introduced a new version of TensorRT specifically designed for RTX AI PCs. This updated SDK offers several key improvements [1]:

  1. Just-in-time (JIT) on-device engine building
  2. 8x smaller package size
  3. Seamless AI deployment to over 100 million RTX AI PCs

The new TensorRT for RTX is now available as a standalone SDK for developers, allowing for easier integration and optimization of AI models on RTX hardware [1].

Implications for AI Development and Deployment

These advancements have significant implications for AI developers and end-users:

  1. Wider accessibility: More GPUs can now run complex AI models like Stable Diffusion 3.5
  2. Improved efficiency: Reduced VRAM usage and increased performance allow for faster and more resource-efficient AI operations
  3. Easier deployment: The new TensorRT for RTX SDK simplifies the process of optimizing and deploying AI models on RTX hardware

Future Developments

NVIDIA and Stability AI are planning to release Stable Diffusion 3.5 as an NVIDIA NIM microservice in July, further simplifying access and deployment for creators and developers across various applications [1].

As AI models continue to grow in complexity and capability, optimizations like these will play a crucial role in making advanced AI tools accessible to a broader range of users and devices. With NVIDIA counting over 100 million RTX AI PCs worldwide, the potential impact of these improvements is substantial [2].

TheOutpost.ai


© 2025 Triveous Technologies Private Limited