7 Sources
[1]
OpenAI's New Open Models Accelerated Locally on NVIDIA GeForce RTX and RTX PRO GPUs
The groundbreaking open-weight models are now available with optimizations for RTX AI PCs. In collaboration with OpenAI, NVIDIA has optimized the company's new open-source gpt-oss models for NVIDIA GPUs, delivering smart, fast inference from the cloud to the PC. These new reasoning models enable agentic AI applications such as web search and in-depth research.

With the launch of gpt-oss-20b and gpt-oss-120b, OpenAI has opened cutting-edge models to millions of users. AI enthusiasts and developers can use the optimized models on NVIDIA RTX AI PCs and workstations through popular tools and frameworks like Ollama, llama.cpp and Microsoft AI Foundry Local, and expect performance of up to 256 tokens per second on the NVIDIA GeForce RTX 5090 GPU.

"OpenAI showed the world what could be built on NVIDIA AI -- and now they're advancing innovation in open-source software," said Jensen Huang, founder and CEO of NVIDIA. "The gpt-oss models let developers everywhere build on that state-of-the-art open-source foundation, strengthening U.S. technology leadership in AI -- all on the world's largest AI compute infrastructure."

The models' release highlights NVIDIA's AI leadership from training to inference and from cloud to AI PC. Both gpt-oss-20b and gpt-oss-120b are flexible, open-weight reasoning models with chain-of-thought capabilities and adjustable reasoning-effort levels, built on the popular mixture-of-experts architecture. The models are designed to support features like instruction-following and tool use, and were trained on NVIDIA H100 GPUs. AI developers can learn more and get started using instructions from the NVIDIA Technical Blog.

The models support context lengths of up to 131,072 tokens, among the longest available in local inference. This means they can reason through long-context problems, ideal for tasks such as web search, coding assistance, document comprehension and in-depth research. The OpenAI open models are also the first MXFP4 models supported on NVIDIA RTX. MXFP4 preserves high model quality while delivering fast, efficient performance that requires fewer resources than other precision types.

The easiest way to test these models on RTX AI PCs, on GPUs with at least 24GB of VRAM, is the new Ollama app. Ollama is popular with AI enthusiasts and developers for its ease of integration, and the new user interface (UI) includes out-of-the-box support for OpenAI's open-weight models. Ollama is fully optimized for RTX, making it ideal for consumers looking to experience the power of personal AI on their PC or workstation. Once installed, Ollama enables quick, easy chatting with the models: simply select the model from the dropdown menu and send a message. Because Ollama is optimized for RTX, no additional configuration or commands are required to get top performance on supported GPUs.

Ollama's new app also includes other features, like easy support for PDF or text files within chats, multimodal support on applicable models so users can include images in their prompts, and easily customizable context lengths for working with large documents or chats. Developers can also use Ollama via its command line interface or software development kit (SDK) to power their applications and workflows, as in the sketch below.

Enthusiasts and developers can also try the gpt-oss models on RTX AI PCs through various other applications and frameworks, all powered by RTX, on GPUs that have at least 16GB of VRAM.
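For developers taking the SDK route just mentioned, here is a minimal sketch using Ollama's Python package; the gpt-oss:20b model tag is an assumption based on Ollama's published model library and may differ in your install.

```python
# Minimal chat call through Ollama's Python SDK (pip install ollama).
# Assumes the Ollama app is running and the model has been pulled first,
# e.g. with `ollama pull gpt-oss:20b` (tag assumed; check your library).
import ollama

response = ollama.chat(
    model="gpt-oss:20b",
    messages=[{"role": "user", "content": "Give me three uses for a local LLM."}],
)
print(response["message"]["content"])
```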
NVIDIA continues to collaborate with the open-source community on both llama.cpp and the GGML tensor library to optimize performance on RTX GPUs. Recent contributions include implementing CUDA Graphs to reduce launch overhead and adding algorithms that cut CPU-side work. Check out the llama.cpp GitHub repository to get started.

Windows developers can also access OpenAI's new models via Microsoft AI Foundry Local, currently in public preview. Foundry Local is an on-device AI inferencing solution that integrates into workflows via the command line, SDK or application programming interfaces. Foundry Local uses ONNX Runtime, optimized through CUDA, with support for NVIDIA TensorRT for RTX coming soon. Getting started is easy: install Foundry Local and run "foundry model run gpt-oss-20b" in a terminal.

The release of these open-source models kicks off the next wave of AI innovation from enthusiasts and developers looking to add reasoning to their AI-accelerated Windows applications.
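For wiring either runtime into an application, llama.cpp's bundled llama-server exposes an OpenAI-compatible HTTP API (Foundry Local offers a similar local endpoint). A sketch under those assumptions; the port, model name and GGUF filename below are placeholders, not fixed values:

```python
# Query a local llama-server instance through its OpenAI-compatible API.
# Assumes it was started with something like:
#   llama-server -m gpt-oss-20b.gguf --port 8080   (filename hypothetical)
# Requires: pip install openai
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")  # key unused locally
completion = client.chat.completions.create(
    model="gpt-oss-20b",  # placeholder; the server serves whichever model it loaded
    messages=[{"role": "user", "content": "Explain what a mixture-of-experts model is."}],
)
print(completion.choices[0].message.content)
```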
[2]
OpenAI and NVIDIA Propel AI Innovation With New Open Models Optimized for the World's Largest AI Inference Infrastructure
NVIDIA delivers industry-leading gpt-oss-120b performance of 1.5 million tokens per second on a single NVIDIA Blackwell GB200 NVL72 rack-scale system. Two new open-weight AI reasoning models from OpenAI released today bring cutting-edge AI development directly into the hands of developers, enthusiasts, enterprises, startups and governments everywhere -- across every industry and at every scale.

NVIDIA's collaboration with OpenAI on these open models -- gpt-oss-120b and gpt-oss-20b -- is a testament to the power of community-driven innovation and highlights NVIDIA's foundational role in making AI accessible worldwide. Anyone can use the models to develop breakthrough applications in generative, reasoning and physical AI, healthcare and manufacturing -- or even unlock new industries as the next industrial revolution driven by AI continues to unfold.

OpenAI's new flexible, open-weight text-reasoning large language models (LLMs) were trained on NVIDIA H100 GPUs and run inference best on the hundreds of millions of GPUs running the NVIDIA CUDA platform across the globe. With software optimizations for the NVIDIA Blackwell platform, the models offer optimal inference on NVIDIA GB200 NVL72 systems, achieving 1.5 million tokens per second -- driving massive efficiency for inference.

"OpenAI showed the world what could be built on NVIDIA AI -- and now they're advancing innovation in open-source software," said Jensen Huang, founder and CEO of NVIDIA. "The gpt-oss models let developers everywhere build on that state-of-the-art open-source foundation, strengthening U.S. technology leadership in AI -- all on the world's largest AI compute infrastructure."

NVIDIA Blackwell Delivers Advanced Reasoning

As advanced reasoning models like gpt-oss generate exponentially more tokens, the demand on compute infrastructure increases dramatically. Meeting this demand calls for purpose-built AI factories powered by NVIDIA Blackwell, an architecture designed to deliver the scale, efficiency and return on investment required to run inference at the highest level. NVIDIA Blackwell includes innovations such as NVFP4 4-bit precision, which enables ultra-efficient, high-accuracy inference while significantly reducing power and memory requirements. This makes it possible to deploy trillion-parameter LLMs in real time, which can unlock billions of dollars in value for organizations.

Open Development for Millions of AI Builders Worldwide

NVIDIA CUDA is the world's most widely available computing infrastructure, letting users deploy and run AI models anywhere, from the powerful NVIDIA DGX Cloud platform to NVIDIA GeForce RTX- and NVIDIA RTX PRO-powered PCs and workstations. There are over 450 million NVIDIA CUDA downloads to date, and starting today, the massive community of CUDA developers gains access to these latest models, optimized to run on the NVIDIA technology stack they already use. Demonstrating their commitment to open-sourcing software, OpenAI and NVIDIA have collaborated with top open framework providers to deliver model optimizations for FlashInfer, Hugging Face, llama.cpp, Ollama and vLLM, in addition to NVIDIA TensorRT-LLM and other libraries, so developers can build with their framework of choice.

A History of Collaboration, Building on Open Source

Today's model releases underscore how NVIDIA's full-stack approach helps bring the world's most ambitious AI projects to the broadest user base possible.
It's a story that goes back to the earliest days of NVIDIA's collaboration with OpenAI, which began in 2016 when Huang hand-delivered the first NVIDIA DGX-1 AI supercomputer to OpenAI's headquarters in San Francisco. Since then, the companies have been working together to push the boundaries of what's possible with AI, providing the core technologies and expertise needed for massive-scale training runs. By optimizing OpenAI's gpt-oss models for NVIDIA Blackwell and RTX GPUs, along with its extensive software stack, NVIDIA is enabling faster, more cost-effective AI advancements for its 6.5 million developers across 250 countries using 900+ NVIDIA software development kits and AI models -- and counting. Learn more by reading the NVIDIA Technical Blog and the latest installment of the NVIDIA RTX AI Garage blog series.
[3]
OpenAI and NVIDIA set global AI benchmark with gpt-oss models
These models mark a major step forward in open AI development, offering state-of-the-art performance, broad flexibility, and efficiency across a wide range of deployment environments. Trained on NVIDIA H100 GPUs and optimized for deployment across NVIDIA's massive CUDA ecosystem, the models run best on Blackwell-powered GB200 NVL72 systems, achieving inference speeds of 1.5 million tokens per second. Both models are released under the Apache 2.0 license, allowing full commercial and research use.

"OpenAI showed the world what could be built on NVIDIA AI -- and now they're advancing innovation in open-source software," said Jensen Huang, founder and CEO of NVIDIA. "The gpt-oss models let developers everywhere build on that state-of-the-art open-source foundation, strengthening U.S. technology leadership in AI -- all on the world's largest AI compute infrastructure."

The gpt-oss-120b model achieves near-parity with OpenAI's o4-mini on core reasoning benchmarks and can run on a single 80 GB GPU, while the smaller gpt-oss-20b matches the performance of o3-mini and is optimized to run on edge devices with just 16 GB of memory.
[4]
OpenAI's new open-weight reasoning model can be run locally on an RTX card but you still need a pretty beefy rig to run it
If you like the premise of AI doing, well, something in your rig, but don't much fancy feeding your information back to a data set for future use, a local LLM is likely the answer to your prayers. With OpenAI's latest models, you can do just that, assuming you have the hardware to power them.

Announced in collaboration with Nvidia, gpt-oss-20b and gpt-oss-120b are both live and available to download via Nvidia's website (or via Hugging Face). You can access a cloud-based demo with toggleable reasoning levels via gpt-oss.com.

What makes these models even more interesting than usual is that they are open-weight. Effectively, weights are the numerical values that determine the strength of the connections between a neural network's neurons. Training an AI is largely a matter of measuring the model's output against the desired result and then, with a technique known as backpropagation, adjusting the weights in the network's layers so the error shrinks as it propagates back through the network (see the short sketch at the end of this section). Getting access to those weights gives you more information on how an AI works and how you can expect it to behave given different stimuli. It also allows a level of fine-tuned training, which makes sense given the models can and will run locally.

Open-weight models are a bit of a rarity (OpenAI's last open-weight model was GPT-2 in 2019), though DeepSeek's models are also open-weight. DeepSeek was disruptive for many reasons (like surprise-releasing with an unexpectedly advanced free model), and we put OpenAI and DeepSeek head-to-head to build a PC to decide which one was better. We crowned neither the winner.

Both new OpenAI models (gpt-oss-20b and gpt-oss-120b) are open-weight, and both are reasoning models, which effectively 'think' before giving an answer. This is the same sort of model that's said to be behind agentic AI, essentially breaking down broader questions and tasks into a smaller chain of steps. It's worth noting that these models aren't intended to replace GPT-5, OpenAI's upcoming advanced cloud-based model.

Gpt-oss-120b can reportedly run on an 80 GB GPU, and OpenAI reports it offers similar performance to its o4-mini model. This means RTX Pro desktops can run it, but you're unlikely to have one in your home rig. Gpt-oss-20b, however, can run on a 16 GB GPU, and OpenAI claims it offers similar performance to the o3-mini "on common benchmarks."

You won't be left behind if you're all Team Red either, as AMD CEO Lisa Su congratulated Sam Altman on X and stated, "AMD is proud to be a Day 0 partner enabling these models to run everywhere - across cloud, edge and clients. The power of open models is clear... and this is a big step forward." The Radeon RX 9070 XT, or an AMD Ryzen AI CPU with 32 GB of memory, can also run the latest 20b model. If you've ever wanted to run a local AI and have power to spare, these new models may be worth playing around with. The Ryzen AI Max+ 395, with a 128 GB RAM configuration, can run the full-fat 120b model, and we've been playing with it on a new 128 GB desktop machine sporting that chip; it's certainly impressive.

The release of these open-weight models comes at a particularly interesting time for AI in general, too. Recently, we saw Meta expanding AI capabilities by setting up data centers in tents (data tenters, if you will) to catch up with its competition. If you don't particularly care about the AI machinations of these tech giants (I get it), Nvidia launched its AI-powered gaming assistant recently, with Microsoft's going into beta.
As well as this, Razer announced one of three AI hubs opening up around the world. AI is worming its way into most facets of digital life, and should you want it to make its way into your PC even when you're offline, you now have that option.
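As promised above, here is a tiny, self-contained sketch of what 'weights' and backpropagation mean in practice: a single-weight model fitted by gradient descent. This is purely didactic and bears no resemblance to how gpt-oss was actually trained.

```python
# One neuron, one weight: learn y = 3x by gradient descent.
# The "backpropagation" here is just the derivative of the squared error
# with respect to the weight; large networks chain this rule layer by layer.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 6.0, 9.0, 12.0]      # targets generated by y = 3x

w = 0.0                          # the weight we are learning
lr = 0.02                        # learning rate

for _ in range(200):
    preds = [w * x for x in xs]
    # d/dw of mean((w*x - y)^2) = mean(2 * (pred - y) * x)
    grad = sum(2 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / len(xs)
    w -= lr * grad               # nudge the weight against the gradient

print(f"learned weight: {w:.3f}")   # converges to ~3.0
```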
[5]
You Can Now Deploy gpt-oss-20b Offline on NVIDIA GeForce RTX GPUs with 16GB VRAM
With recent updates from NVIDIA and OpenAI, you can now run sophisticated language models entirely on your own PC -- no cloud account, no monthly fees. All you need is a GeForce RTX card with at least 16 GB of VRAM. This opens the door to powerful AI capabilities directly on your desktop, whether you're analyzing documents offline, generating code, or building custom agents.

For most users, the sweet spot is the 20 billion-parameter gpt-oss-20b model. It sits comfortably on any RTX card with 16 GB of graphics memory -- think RTX 4080 or better. In our tests, an RTX 5090 handled about 256 tokens per second, which feels interactive for chatbots and small-scale data processing. If you need more firepower, the larger 120 billion-parameter model -- gpt-oss-120b -- demands GPUs with 80 GB of VRAM. That's found in data-center grade hardware like NVIDIA's Blackwell GB200 NVL72, which can crunch over 1.5 million tokens per second and handle dozens of users at once.

Choosing your software stack

You've got three main paths to bring these models online locally:

* Ollama: It's drop-in simple. Pick your model, fire up your chat and you're good to go. Ollama even lets you feed in PDFs or lengthy instructions, keeping everything in context for a coherent conversation.
* Microsoft AI Foundry Local: This is for coders who want full control. It builds on ONNX Runtime and taps into CUDA and TensorRT to squeeze maximum throughput from your GPU. If you're integrating AI into a larger application, this is a rock-solid choice.
* llama.cpp with NVIDIA optimizations: Geared toward open-source enthusiasts, this setup brings Flash Attention, CUDA Graph acceleration and the new MXFP4 numerical format right to your GPU drivers. It's a bit more hands-on to configure, but the performance gains can be significant.

Running AI locally isn't just a neat trick -- it has real practical upsides. You're free from subscription costs and bandwidth constraints, and you never have to send sensitive data to third-party servers. That makes this setup ideal for sectors like finance, healthcare or government, where privacy and compliance are top priorities. Plus, complete local control means you can fine-tune models, build specialized agents and stitch AI directly into your internal tools and workflows.

Getting started

* Confirm you have an RTX-series card with at least 16 GB of VRAM (a quick check is sketched below).
* Install your toolkit of choice -- Ollama for ease, AI Foundry for development or llama.cpp for open-source performance.
* Download the desired model weights (gpt-oss-20b or gpt-oss-120b) and configure your runtime.
* Begin experimenting -- load documents, craft prompts or code up your own autonomous assistant.
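For that first step, here is a small Python sketch (an illustration, assuming the nvidia-ml-py package and an installed NVIDIA driver) that reads your GPU's total VRAM and maps it to the article's guidelines:

```python
# Check GPU name and total VRAM via NVML (pip install nvidia-ml-py).
# The 16 GB / 80 GB thresholds follow the guidance in this article.
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)          # first GPU
name = pynvml.nvmlDeviceGetName(handle)
if isinstance(name, bytes):                            # older bindings return bytes
    name = name.decode()
total_gb = pynvml.nvmlDeviceGetMemoryInfo(handle).total / 1024**3
pynvml.nvmlShutdown()

print(f"{name}: {total_gb:.0f} GB VRAM")
if total_gb >= 80:
    print("Meets the 80 GB guideline for gpt-oss-120b.")
elif total_gb >= 16:
    print("Meets the 16 GB guideline for gpt-oss-20b.")
else:
    print("Below the 16 GB guideline for gpt-oss-20b.")
```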
[6]
NVIDIA's RTX GPUs Deliver Fastest AI Performance On OpenAI's Latest "gpt-oss" Models
NVIDIA & OpenAI have brought the latest gpt-oss family of open AI models to consumers, offering the highest performance on RTX GPUs.

NVIDIA's RTX 5090 Delivers 250 Tokens/s Performance on OpenAI's gpt-oss 20b AI Model, PRO GPUs Also Ready For gpt-oss 120b

Press Release: Today, NVIDIA announced its collaboration with OpenAI to bring the new gpt-oss family of open models to consumers, allowing state-of-the-art AI that was once exclusive to cloud data centers to run with incredible speed on RTX-powered PCs and workstations.

NVIDIA founder and CEO Jensen Huang underscored the importance of this launch: "OpenAI showed the world what could be built on NVIDIA AI -- and now they're advancing innovation in open-source software. The gpt-oss models let developers everywhere build on that state-of-the-art open-source foundation, strengthening U.S. technology leadership in AI -- all on the world's largest AI compute infrastructure."

The launch ushers in a new generation of faster, smarter on-device AI supercharged by the horsepower of GeForce RTX GPUs and RTX PRO GPUs. Two new variants are available, designed to serve the entire ecosystem:

* The gpt-oss-20b model is optimized to run at peak performance on NVIDIA RTX AI PCs with at least 16GB of VRAM, delivering up to 250 tokens per second on an RTX 5090 GPU.
* The larger gpt-oss-120b model is supported on professional workstations accelerated by NVIDIA RTX PRO GPUs.

Trained on NVIDIA H100 GPUs, these are the first models to support MXFP4 precision on NVIDIA RTX, a technique that increases model quality and accuracy at no incremental performance cost compared with older methods. Both models support context lengths of up to 131,072 tokens, among the longest available in local inference. They're built on a flexible mixture-of-experts (MoE) architecture, featuring chain-of-thought capabilities and support for instruction-following and tool use.

This week's RTX AI Garage covers how AI enthusiasts and developers can get started with the new OpenAI models on NVIDIA RTX GPUs.
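To build intuition for what MXFP4-style precision means, here is a toy Python sketch of block-scaled 4-bit quantization: each block of values shares one power-of-two scale, and each value is rounded to the nearest representable FP4 (E2M1) magnitude. This is a simplified illustration of the general idea, not the MXFP4 specification or NVIDIA's implementation.

```python
# Toy block-scaled 4-bit quantization in the spirit of MXFP4.
# Storage cost per block: one shared scale + a 4-bit code per element.
import math

FP4_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]  # E2M1 values

def quantize_block(block):
    # Power-of-two scale chosen so the largest element fits within FP4's max (6.0)
    amax = max(abs(v) for v in block) or 1.0
    scale = 2.0 ** math.ceil(math.log2(amax / 6.0))
    codes = [math.copysign(min(FP4_MAGNITUDES, key=lambda m: abs(abs(v) / scale - m)), v)
             for v in block]
    return scale, codes

def dequantize_block(scale, codes):
    return [scale * c for c in codes]

weights = [0.12, -0.5, 0.03, 0.9, -1.7, 0.44, 0.0, 2.2]   # one tiny "block"
scale, codes = quantize_block(weights)
print(dequantize_block(scale, codes))  # coarse 4-bit approximation of the originals
```

Real MXFP4 kernels apply this idea per 32-element block across billions of weights, which is where the memory and bandwidth savings come from.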
[7]
OpenAI GPT-OSS Models Optimized for NVIDIA RTX GPUs
NVIDIA and OpenAI have collaborated to release the gpt-oss family of open-weight AI models, optimized for NVIDIA RTX GPUs. These models, gpt-oss-20b and gpt-oss-120b, bring advanced AI capabilities to consumer PCs and professional workstations, delivering faster on-device performance, enhanced efficiency, and greater accessibility for developers and AI enthusiasts.

The latest OpenAI models feature a cutting-edge architecture, extended context lengths, and support for a wide range of AI applications, and they are accessible through tools like Ollama, llama.cpp, and Microsoft AI Foundry Local. The easiest way to test them on RTX AI PCs, on GPUs with at least 24GB of VRAM, is the new Ollama app. Ollama is fully optimized for RTX, making it ideal for consumers looking to experience the power of personal AI on their PC or workstation.

The gpt-oss family consists of two distinct models, each tailored to specific hardware requirements and performance needs. Both support extended context lengths of up to 131,072 tokens, allowing them to handle complex reasoning tasks and process large-scale documents. This capability is particularly advantageous for applications such as legal document analysis, academic research, and other tasks requiring long-form comprehension and detailed analysis. The models also incorporate several technological advancements that collectively deliver high-speed, accurate results across a variety of use cases. Customizable context lengths let users tailor the models to specific requirements, whether summarizing extensive documents or generating detailed responses to complex queries, making them suitable for both general-purpose use and specialized applications, from enterprise workflows to individual projects.

To assist adoption and integration, OpenAI and NVIDIA have provided a suite of developer tools that simplify deployment and testing of the gpt-oss models, keeping them accessible to developers of varying expertise levels and letting them experiment with advanced AI solutions without extensive expertise in AI infrastructure.

The gpt-oss models were trained on NVIDIA H100 GPUs using NVIDIA's state-of-the-art AI training infrastructure, then optimized for inference on NVIDIA RTX GPUs, showcasing NVIDIA's leadership in end-to-end AI technology. This approach delivers high-performance AI on both cloud-based and local devices, making advanced AI accessible to a broader audience. Additionally, the models use CUDA Graphs, a feature that minimizes computational overhead and enhances performance.
This optimization is particularly valuable for real-time applications, where speed and efficiency are critical. Because the gpt-oss models are open-weight, developers can customize and extend their capabilities; this openness encourages innovation and collaboration within the AI community and enables tailored solutions for specific use cases.

NVIDIA has also contributed to open-source frameworks such as GGML and llama.cpp, further enhancing the accessibility and performance of the gpt-oss models. These frameworks give developers the tools needed to optimize AI models for a variety of hardware configurations, from consumer-grade PCs to enterprise-level systems.

The release of the gpt-oss models marks a pivotal moment in the evolution of AI technology. By harnessing the power of NVIDIA RTX GPUs, they deliver exceptional performance, flexibility, and accessibility, and their open-weight nature, combined with robust developer tools, positions them as valuable assets for driving innovation across a wide range of applications, whether for individual developers or large organizations.
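Since CUDA Graphs come up both in the llama.cpp contributions and here, a minimal PyTorch sketch of the general capture-and-replay pattern may help; it illustrates the technique itself, not NVIDIA's actual llama.cpp integration.

```python
# CUDA Graphs: record a sequence of GPU ops once, then replay it with a
# single launch, avoiding per-kernel CPU launch overhead. Requires PyTorch
# with a CUDA-capable GPU.
import torch

model = torch.nn.Linear(256, 256).cuda()
static_input = torch.randn(8, 256, device="cuda")

# Warm up on a side stream so one-time initialization isn't captured
s = torch.cuda.Stream()
s.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(s):
    for _ in range(3):
        model(static_input)
torch.cuda.current_stream().wait_stream(s)

graph = torch.cuda.CUDAGraph()
with torch.cuda.graph(graph):
    static_output = model(static_input)   # recorded, not executed

# Replay with fresh data: overwrite the captured input buffer in place
static_input.copy_(torch.randn(8, 256, device="cuda"))
graph.replay()                            # one call runs the whole recorded sequence
torch.cuda.synchronize()
print(static_output.sum().item())
```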
OpenAI and NVIDIA collaborate to release open-weight AI models, gpt-oss-20b and gpt-oss-120b, optimized for local deployment on NVIDIA GPUs, enabling developers to run advanced AI models offline on personal computers and workstations.

In a groundbreaking move, OpenAI and NVIDIA have joined forces to release two new open-weight AI reasoning models, gpt-oss-20b and gpt-oss-120b. This collaboration marks a significant step forward in democratizing access to advanced AI technologies, allowing developers, enthusiasts, and organizations to run sophisticated language models locally on their own hardware [1][2].

The gpt-oss-20b model, designed for broader accessibility, can run on GPUs with at least 16GB of VRAM and offers performance comparable to OpenAI's o3-mini model on common benchmarks [3]. For more demanding applications, the gpt-oss-120b model achieves near-parity with OpenAI's o4-mini on core reasoning benchmarks and requires an 80GB GPU [3].

NVIDIA has optimized these models for its hardware, with impressive performance figures: up to 256 tokens per second on a GeForce RTX 5090 [1] and 1.5 million tokens per second on a single GB200 NVL72 rack-scale system [2].

Users have multiple options for deploying these models locally, including Ollama, llama.cpp, and Microsoft AI Foundry Local [1][5].

The release of these open-weight models under the Apache 2.0 license allows for full commercial and research use, potentially accelerating AI innovation across various sectors [3]. Jensen Huang, founder and CEO of NVIDIA, emphasized the significance of the release: "OpenAI showed the world what could be built on NVIDIA AI -- and now they're advancing innovation in open-source software. The gpt-oss models let developers everywhere build on that state-of-the-art open-source foundation, strengthening U.S. technology leadership in AI -- all on the world's largest AI compute infrastructure." [2]

While the gpt-oss-20b model is accessible to a wide range of users with RTX GPUs featuring at least 16GB of VRAM, the more powerful gpt-oss-120b model requires more substantial hardware. AMD has also announced support for these models, with CEO Lisa Su confirming compatibility with AMD AI CPUs and GPUs [4].

Running these models locally offers several advantages, including enhanced privacy, reduced dependence on cloud services, and the ability to work offline. This makes the technology particularly attractive for sectors like finance, healthcare, and government, where data sensitivity is a primary concern [5].

As AI continues to integrate into various aspects of computing and industry, the release of these open-weight models by OpenAI and NVIDIA represents a significant milestone in making advanced AI capabilities more accessible and customizable for developers and organizations worldwide.