2 Sources
[1]
How to run OpenAI's new gpt-oss-20b LLM on your computer
All you need is 24GB of RAM and, unless you have a GPU with its own VRAM, quite a lot of patience.

Hands On Earlier this week, OpenAI released two open-weight models, both under the gpt-oss name. Because you can download them, you can run them locally. The lighter model, gpt-oss-20b, has 21 billion parameters and requires about 16GB of free memory. The heavier model, gpt-oss-120b, has 117 billion parameters and needs 80GB of memory to run. By way of comparison, a frontier model like DeepSeek R1 has 671 billion parameters and needs about 875GB to run, which is why LLM developers and their partners are building massive datacenters as fast as they can.

Unless you're running a high-end AI server, you probably can't deploy gpt-oss-120b on your home system, but a lot of folks have the memory necessary to work with gpt-oss-20b. Your computer needs either a GPU with at least 16GB of dedicated VRAM, or 24GB or more of system memory (leaving at least 8GB for the OS and software to consume). Performance will depend heavily on memory bandwidth, so a graphics card with GDDR7 or GDDR6X memory (1,000+ GB/s) will far outperform a typical notebook or desktop's DDR4 or DDR5 (20-100 GB/s).

Below, we'll explain how to use the new language model for free on Windows, Linux, and macOS. We'll be using Ollama, a free client app that makes downloading and running this LLM very easy.

It's easy to run the new LLM on Windows. To do so, first download and install Ollama for Windows. After you open Ollama, you'll see a field marked "Send a message" and, at bottom right, a drop-down list of available models that uses gpt-oss:20b as its default. You can choose a different model, but let's stick with this one. Enter any prompt. I started with "Write a letter" and Ollama began downloading 12.4GB worth of model data. The download is not fast. Once the download completes, you can prompt gpt-oss-20b as you wish; just click the arrow button to submit your request.

You can also run Ollama from the command prompt if you don't mind going without a GUI. I recommend doing so because the CLI offers a "verbose mode" that delivers performance statistics, such as the time taken to complete a query. To run Ollama from the command prompt, first enter:

  ollama run gpt-oss:20b

(If this is the first time you've run this, it will need to download the model from the internet.) Then, at the prompt, enter:

  /set verbose

Finally, enter your prompt.

If you're not already in a Linux terminal, start by launching one. Then, at the command prompt, enter the following:

  curl -fsSL https://ollama.com/install.sh | sh

You'll then wait as the software downloads and installs. Then enter the following command to start Ollama with gpt-oss:20b as its model:

  ollama run gpt-oss:20b

Your system will have to download about 13GB of data before you can enter your first prompt.

If you're on a modern (M1 or later) Mac, running gpt-oss-20b is as simple as it is on Windows. Start by downloading and running the macOS version of the Ollama installer. Launch Ollama and make sure that gpt-oss:20b is the selected model. Now enter your prompt, click the up arrow button, and you're good to go.
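Incidentally, if you'd rather script a timing test than read the CLI's verbose statistics, Ollama also exposes a local REST API that reports token counts and generation durations. Here's a minimal sketch, assuming a default install listening on port 11434 and a model that's already downloaded (the prompt and the jq filter are our own, not part of Ollama):

  # Request a completion from the local Ollama server, then compute
  # decode speed from the token count and generation time it reports.
  curl -s http://localhost:11434/api/generate \
    -d '{"model": "gpt-oss:20b", "prompt": "Who was the first president of the US?", "stream": false}' \
    | jq '{tokens: .eval_count, tokens_per_second: (.eval_count / .eval_duration * 1e9)}'

The eval_duration field is reported in nanoseconds, hence the 1e9 factor.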
To see just how well gpt-oss-20b performs on a local computer, we tested two different prompts on three different devices. We first asked gpt-oss-20b to "Write a fan letter to Taylor Swift, telling her how much I love her songs" and then followed up with the much simpler "Who was the first president of the US?"

We used the following hardware to test those prompts: a ThinkPad X1 Carbon laptop with integrated graphics, a MacBook, and a desktop PC with an Nvidia RTX 6000 GPU.

On the ThinkPad X1 Carbon, performance was very poor, in large part because Ollama doesn't take advantage of its integrated graphics or neural processing unit (NPU). It took a full 10 minutes and 13 seconds to output a 600-word letter to Taylor Swift. As with all prompts for gpt-oss-20b, the system spent the first minute or two showing its reasoning in a process it calls "thinking." Only after that does it show the output. Getting the simple, two-sentence answer to "Who was the first president of the US?" took 51 seconds. But at least our letter to Taylor was full of heartfelt emo lines like this: "It's not just the songs, Taylor; it's your authenticity. You've turned your scars into verses and your triumphs into choruses."

Though it had the same memory speed, the MacBook far outperformed the ThinkPad, completing the fan letter in 26 seconds and answering the presidential question in just three seconds. As we might expect, the RTX 6000-powered desktop delivered our letter in just six seconds and our George Washington answer in less than half a second.

Overall, if you're running this on a system with a powerful GPU or on a recent Mac, you can expect good performance. If you're using an Intel or AMD-powered laptop with integrated graphics that Ollama doesn't support, processing will be offloaded to the CPU, and you may want to go for lunch after entering your prompt. Or you might try your luck with LM Studio, another popular application for running LLMs locally on your PC. ®
[2]
How to set up and run OpenAI's 'gpt-oss-20b' open weight model locally on your Mac - 9to5Mac
This week, OpenAI released its long-awaited open-weight model called gpt-oss. Part of the appeal of gpt-oss is that you can run it locally on your own hardware, including Macs with Apple silicon. Here's how to get started and what to expect.

First, gpt-oss comes in two flavors: gpt-oss-20b and gpt-oss-120b. The former is described as a medium open-weight model, while the latter is considered a heavy open-weight model. The medium model is what Apple silicon Macs with enough resources can expect to run locally. The difference? Expect the smaller model to hallucinate more than the much larger model, owing to the difference in data set size. That's the tradeoff for an otherwise faster model that's actually capable of running on high-end Macs. Still, the smaller model is a neat tool that's freely available if you have a Mac with enough resources and a curiosity about running large language models locally.

You should also be aware of the differences between running a local model and, say, ChatGPT. By default, the open-weight local model lacks a lot of the modern chatbot features that make ChatGPT useful. For example, responses don't draw on web results, which often help limit hallucinations.

OpenAI recommends at least 16GB of RAM to run gpt-oss-20b, but Macs with 32GB of RAM or more will obviously perform better. Based on early user feedback, 16GB of RAM is really the floor for what's needed just to experiment. (AI is a big reason that Apple stopped selling Macs with 8GB of RAM not that long ago.)

Preamble aside, actually getting started is super simple. First, install Ollama on your Mac. This is basically the window for interfacing with gpt-oss-20b. You can find the app at ollama.com/download, or download the Mac version from this download link. Next, open Terminal on your Mac and enter this command:

  ollama pull gpt-oss:20b

This will prompt your Mac to download gpt-oss-20b, which uses around 15GB of disk storage. Finally, you can launch Ollama and select gpt-oss:20b as your model. You can even put Ollama in airplane mode in the app's settings panel to ensure everything is happening locally. No sign-in required.

To test gpt-oss-20b, just enter a prompt into the text field and watch the model get to work. Again, hardware resources dictate model performance here. Ollama will use every resource it can when running the model, so your Mac may slow to a crawl while the model is thinking. My best Mac is a 15-inch M4 MacBook Air with 16GB of RAM. While the model functions, it's a tall order even for experimentation on my machine. Responding to "hello" took a little more than five minutes. Responding to "who was the 13th president" took longer, at around 43 minutes. You really do want more RAM if you plan to spend more than a few minutes experimenting.

Decide you want to remove the local model and reclaim that disk space? Enter this terminal command:

  ollama rm gpt-oss:20b
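If you want to verify what's on disk before deleting anything, Ollama's built-in subcommands cover it. A quick sketch, assuming a default installation:

  # List downloaded models and their sizes on disk
  ollama list

  # Show models currently loaded in memory and whether they're running on CPU or GPU
  ollama ps

Note that ollama ps is only informative while a model is actually loaded, i.e. during or shortly after a chat session.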
OpenAI releases gpt-oss-20b, an open-weight language model that can be run locally on personal computers with sufficient hardware resources. The article explores the setup process, hardware requirements, and performance across different devices.
OpenAI has made a significant move in the AI landscape by releasing two open-weight models named gpt-oss [1]. The lighter model, gpt-oss-20b, boasts 21 billion parameters and requires about 16GB of free memory, while the heavier gpt-oss-120b has 117 billion parameters and needs 80GB of memory to run [1].
To run gpt-oss-20b locally, users need either a GPU with at least 16GB of dedicated VRAM or 24GB or more of system memory [1]. Performance depends heavily on memory bandwidth, with graphics cards featuring GDDR7 or GDDR6X memory significantly outperforming typical notebook or desktop DDR4 or DDR5 [1].
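Before downloading roughly 13GB of weights, it's worth confirming your machine clears that bar. A rough sketch for a Linux box with an Nvidia card (nvidia-smi ships with Nvidia's driver; free is standard on Linux):

  # Total VRAM on the GPU, in MiB (want >= 16384 for an all-GPU run)
  nvidia-smi --query-gpu=memory.total --format=csv,noheader

  # Total and available system RAM, in GiB (want 24GB+ if the GPU can't hold the model)
  free -g

On macOS, sysctl hw.memsize reports physical RAM in bytes.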
Tests conducted on various devices revealed significant performance differences: a ThinkPad X1 Carbon with integrated graphics took over ten minutes to write a 600-word fan letter, a MacBook finished the same task in 26 seconds, and an RTX 6000-powered desktop needed just six seconds [1].
Setting up gpt-oss-20b is relatively straightforward using Ollama, a free client app that simplifies the download and running process [1][2]. The setup process varies slightly by operating system: on Windows and macOS you install the Ollama app and select gpt-oss:20b as the model, while on Linux you install Ollama from the terminal and launch the model with a single command, as sketched below [1][2].
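A compact summary of the terminal route, assuming a default Ollama installation (the commands mirror those in the source articles):

  # Linux only: install Ollama via the official script
  curl -fsSL https://ollama.com/install.sh | sh

  # Any platform: download the model (if needed) and start chatting
  ollama run gpt-oss:20b

  # Done experimenting? Reclaim roughly 13GB of disk space
  ollama rm gpt-oss:20b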
While running gpt-oss-20b locally offers exciting possibilities, users should be aware of certain limitations: the local model lacks web search and other modern chatbot features, it is more prone to hallucination than its larger sibling, and it can slow the host machine to a crawl while generating responses [2].
The release of gpt-oss-20b represents a significant step towards making advanced AI models more accessible to individual users and researchers. By allowing local deployment, OpenAI is enabling a wider range of experiments and applications, potentially accelerating AI innovation and understanding [1][2].
However, the hardware requirements and performance limitations highlight the ongoing challenges in democratizing access to cutting-edge AI technologies. As the field progresses, we may see further developments aimed at optimizing these models for more widespread use on consumer-grade hardware.