Anurag is an experienced journalist and author who's been covering tech for the past 5 years, with a focus on Windows, Android, and Apple. He's written for sites like Android Police, Neowin, Dexerto, and MakeTechEasier. Anurag's always pumped about tech and loves getting his hands on the latest gadgets. When he's not procrastinating, you'll probably find him catching the newest movies in theaters or scrolling through Twitter from his bed.
I don't enjoy typing out long voice recordings, whether it's an interview clip, a meeting recap, or a rough idea I dictated while walking. Manually turning audio into text is slow, and absolutely no one should be doing it in the big year 2026. There are plenty of AI transcription tools that solve that problem, but they introduce another one. Your recordings get uploaded, processed, and stored on infrastructure that you do not control. As much as I dislike typing out voice notes, I would still rather not share them with just anyone. Instead of relying on those services, I run Whisper locally. It is an open-source speech recognition model released in 2022 under the MIT license, and it allows you to transcribe audio completely offline.
Why do I use Whisper?
It runs locally and is open-source
Whisper was trained on 680,000 hours of multilingual audio and performs really well across accents, background noise, and mixed-language conversations. It is one of the few tools from OpenAI that is actually open, which means it can be downloaded and executed on your own machine. That also means your audio files never leave your device, and you do not need to create an account. Heck, it does not even have a GUI. You just run it in the terminal.
Tools such as Otter.ai and Fireflies.ai require you to upload recordings to their servers for processing. Even if they offer encryption and compliance guarantees, you are still placing trust in systems you cannot audit yourself. Running Whisper locally removes that layer of dependency entirely.
Now that I have mentioned those tools, here is my issue with how intrusive they can be. When someone adds them to a meeting, they often gain access to participant details and start sending follow-up emails, summaries, and reminders that not everyone explicitly signed up for. You might not have invited the tool yourself, yet you still end up inside its workflow. On top of that, both companies have faced scrutiny in the past over data handling and privacy concerns, which makes the whole arrangement harder to ignore.
Coming back to Whisper, it supports 99 languages and can automatically detect which language is being spoken. It can also translate speech from any of those languages into English as part of the same pass. Many cloud tools focus heavily on English, but Whisper handles multilingual audio without additional configuration.
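Here is what automatic detection and translation look like from Python. This minimal sketch assumes the openai-whisper package and FFmpeg are installed, and it follows the detect_language() pattern from the project's README. The helper names top_language and detect_and_translate, and the choice of the base model, are illustrative assumptions.

```python
# Sketch: detect the spoken language, then translate the speech into English.
# Assumes openai-whisper and FFmpeg are installed; top_language() and
# detect_and_translate() are hypothetical helpers, not part of Whisper.
from __future__ import annotations


def top_language(probs: dict[str, float]) -> str:
    """Return the language code with the highest probability."""
    return max(probs, key=probs.get)


def detect_and_translate(path: str, model_name: str = "base") -> tuple[str, str]:
    import whisper  # imported lazily so top_language() works without Whisper

    model = whisper.load_model(model_name)

    # Detection only looks at a 30-second window: pad or trim the audio to fit.
    audio = whisper.pad_or_trim(whisper.load_audio(path))
    mel = whisper.log_mel_spectrogram(audio, n_mels=model.dims.n_mels).to(model.device)
    _, probs = model.detect_language(mel)
    language = top_language(probs)

    # task="translate" makes Whisper output English text, whatever the source language.
    result = model.transcribe(path, task="translate")
    return language, result["text"]


if __name__ == "__main__":
    print(top_language({"en": 0.10, "de": 0.85, "fr": 0.05}))  # de
```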
Technically, Whisper uses a Transformer-based encoder-decoder architecture and is available in multiple model sizes: tiny, base, small, medium, and large, plus a faster turbo variant. The smallest models are lightweight and fast. The largest model contains around 1.5 billion parameters and delivers higher accuracy, but it requires significantly more memory and GPU resources. There are also English-only variants (tiny.en, base.en, small.en, and medium.en) that tend to perform better on English audio than their multilingual counterparts, especially at the smaller sizes. You can choose the model that matches your hardware and your accuracy needs.
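As a rough illustration of matching model size to hardware, the sketch below picks the largest model that fits a given amount of GPU memory. The VRAM numbers are the approximate figures from the openai/whisper README. pick_model is a hypothetical helper, not part of Whisper.

```python
# Sketch: choose the biggest Whisper model whose approximate VRAM need fits.
# The VRAM figures are the rough values from the openai/whisper README;
# pick_model() is a hypothetical helper.

# (name, approx. parameters in millions, approx. VRAM in GB, has English-only variant)
MODELS = [
    ("tiny", 39, 1, True),
    ("base", 74, 1, True),
    ("small", 244, 2, True),
    ("medium", 769, 5, True),
    ("large", 1550, 10, False),
]


def pick_model(vram_gb: float, english_only: bool = False) -> str:
    """Walk from largest to smallest and return the first model that fits."""
    for name, _params, need_gb, has_en in reversed(MODELS):
        if need_gb <= vram_gb:
            # English-only (.en) variants exist for every size except large.
            return f"{name}.en" if english_only and has_en else name
    # Nothing fits the GPU: fall back to the smallest model.
    return "tiny.en" if english_only else "tiny"


print(pick_model(8))                      # medium
print(pick_model(4, english_only=True))   # small.en
print(pick_model(12, english_only=True))  # large
```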
Setting up Whisper locally on a PC
It is simpler than it sounds
Running Whisper locally does require some setup, but it is manageable. Since it is distributed as a Python library, you need Python and pip installed on your system. Once that is in place, you can install Whisper directly from its GitHub repository using:
pip install git+https://github.com/openai/whisper.git
Whisper also depends on FFmpeg for handling audio and video formats. On Debian or Ubuntu, you can install it with:
sudo apt update && sudo apt install ffmpeg
On macOS, Homebrew makes it straightforward with brew install ffmpeg. Windows users can install it using Chocolatey with choco install ffmpeg, assuming Chocolatey is already set up.
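Before the first run, a quick sanity check can confirm both prerequisites are in place. This small sketch uses only the Python standard library, and check_setup is a hypothetical name. It only verifies two things: that some module called whisper is importable, and that an ffmpeg executable is on your PATH.

```python
# Sketch: confirm the Python package and FFmpeg are both available.
# check_setup() is a hypothetical helper using only the standard library.
from __future__ import annotations

import importlib.util
import shutil


def check_setup() -> dict[str, bool]:
    return {
        # The pip install provides an importable module named "whisper".
        "whisper": importlib.util.find_spec("whisper") is not None,
        # Whisper calls the ffmpeg executable to decode audio and video files.
        "ffmpeg": shutil.which("ffmpeg") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_setup().items():
        print(f"{name}: {'found' if ok else 'missing'}")
```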
After installation, transcribing a file is done directly from the command line. Navigate to the folder containing your recording and run:
whisper --model base --language en --task transcribe your_audio_file.mp3
Replace the filename with your actual audio file. Whisper processes the recording locally and, by default, writes the transcript to the current folder in several formats at once (TXT, SRT, VTT, TSV, and JSON). If you prefer to use it from a Python script, you can load a model with whisper.load_model() and call model.transcribe() on the file.
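For the scripted route, a minimal sketch might look like this, assuming openai-whisper and FFmpeg are installed. whisper.load_model() and model.transcribe() are the library's own calls. format_segments and transcribe_to_file are hypothetical helpers, and saving the result next to the audio file is just one possible choice.

```python
# Sketch: transcribe one file from Python and save the text plus timestamped lines.
# Assumes openai-whisper and FFmpeg are installed; format_segments() and
# transcribe_to_file() are hypothetical helpers.
from __future__ import annotations

from pathlib import Path


def format_segments(segments: list[dict]) -> str:
    """Turn Whisper-style segments into '[start --> end] text' lines."""
    return "\n".join(
        f"[{seg['start']:.2f} --> {seg['end']:.2f}] {seg['text'].strip()}"
        for seg in segments
    )


def transcribe_to_file(audio_path: str, model_name: str = "base") -> Path:
    import whisper  # lazy import: only needed when actually transcribing

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path, language="en", task="transcribe")

    # Write the full text first, then the timestamped segments.
    out_path = Path(audio_path).with_suffix(".txt")
    body = result["text"].strip() + "\n\n" + format_segments(result["segments"]) + "\n"
    out_path.write_text(body, encoding="utf-8")
    return out_path


if __name__ == "__main__":
    demo = [{"start": 0.0, "end": 2.5, "text": " Hello there."}]
    print(format_segments(demo))  # [0.00 --> 2.50] Hello there.
```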
It is worth noting that larger models demand more processing power and memory. If your system does not have a capable GPU, smaller models like base or small offer a practical balance between speed and accuracy.
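One way to act on that advice in a script is to check for a CUDA GPU and fall back to a smaller model on the CPU. This sketch assumes PyTorch is available; openai-whisper installs it as a dependency. pick_runtime is a hypothetical helper, and the specific model names are illustrative picks rather than recommendations from the Whisper project.

```python
# Sketch: pick (model, device, fp16) depending on whether a CUDA GPU is present.
# pick_runtime() is a hypothetical helper; the model names are illustrative choices.
from __future__ import annotations


def pick_runtime(has_cuda: bool | None = None) -> tuple[str, str, bool]:
    """Return (model_name, device, fp16)."""
    if has_cuda is None:
        try:
            import torch  # installed alongside openai-whisper
            has_cuda = torch.cuda.is_available()
        except ImportError:
            has_cuda = False
    if has_cuda:
        return "medium", "cuda", True
    # Half precision (FP16) is not supported on CPU, so request FP32 instead.
    return "base", "cpu", False


# Usage with Whisper installed:
#   import whisper
#   name, device, fp16 = pick_runtime()
#   model = whisper.load_model(name, device=device)
#   result = model.transcribe("your_audio_file.mp3", fp16=fp16)
```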
Once installed, Whisper supports a broad range of formats, including WAV, MP3, M4A, FLAC, and even video files such as MP4 or MKV. Thanks to its integration with FFmpeg, you do not need to extract audio manually from video files before processing.
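Video files work because FFmpeg first decodes whatever you pass in into plain audio. The sketch below approximates that step with the standard library: it asks FFmpeg for 16 kHz mono 16-bit PCM, similar to what whisper.load_audio() does internally. ffmpeg_command and decode_audio are hypothetical helpers written for illustration.

```python
# Sketch of the decoding step that lets Whisper accept video as well as audio.
# ffmpeg_command() and decode_audio() are hypothetical stdlib-only helpers that
# approximate what whisper.load_audio() does.
from __future__ import annotations

import array
import subprocess
import sys

SAMPLE_RATE = 16_000  # Whisper models expect 16 kHz audio


def ffmpeg_command(path: str, sr: int = SAMPLE_RATE) -> list[str]:
    return [
        "ffmpeg", "-nostdin", "-threads", "0",
        "-i", path,               # any input FFmpeg understands: MP3, FLAC, MP4, MKV...
        "-f", "s16le",            # raw little-endian 16-bit samples
        "-ac", "1",               # downmix to mono
        "-acodec", "pcm_s16le",
        "-ar", str(sr),           # resample to the target rate
        "-",                      # write to stdout
    ]


def decode_audio(path: str) -> list[float]:
    raw = subprocess.run(ffmpeg_command(path), capture_output=True, check=True).stdout
    samples = array.array("h")
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # the stream is little-endian
    # Scale 16-bit integers to floats in [-1.0, 1.0), as Whisper does.
    return [s / 32768.0 for s in samples]
```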
When you run the transcription command, Whisper displays progress in the terminal and outputs text segments with timestamps. After completion, the transcript is saved locally in formats such as TXT or SRT, depending on your configuration.
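To make the SRT output less abstract, here is a small sketch that turns timestamped segments into SRT subtitle blocks. Whisper already writes SRT files for you, so this is only an illustration. to_srt and srt_timestamp are hypothetical helpers, and they assume segments shaped like the start, end, and text entries in Whisper's result["segments"].

```python
# Sketch: map timestamped segments onto the SRT subtitle format.
# srt_timestamp() and to_srt() are hypothetical helpers.
from __future__ import annotations


def srt_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[dict]) -> str:
    blocks = []
    for i, seg in enumerate(segments, start=1):
        # Each block: index, time range, text, separated by a blank line.
        blocks.append(
            f"{i}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)


print(srt_timestamp(3661.5))  # 01:01:01,500
```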
If you leave out --model, the CLI uses a default model meant to balance accuracy and resource usage (turbo in recent releases, small in older ones). You can still pick the model size with --model, pin the language with --language, or control the output with flags such as --output_format and --output_dir.
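If you script your transcriptions, it can help to pass those options explicitly rather than rely on defaults. This sketch only assembles and runs the whisper command. --model, --language, --task, --output_format, and --output_dir are real CLI flags. The names build_whisper_cmd and run_whisper are assumptions, and so are the defaults chosen here: the small model, TXT output, and a transcripts folder.

```python
# Sketch: build and run the whisper CLI with explicit options.
# build_whisper_cmd() and run_whisper() are hypothetical wrappers; the default
# values below are illustrative choices, not Whisper's own defaults.
from __future__ import annotations

import subprocess


def build_whisper_cmd(
    audio: str,
    model: str = "small",
    language: str | None = None,
    task: str = "transcribe",
    output_format: str = "txt",
    output_dir: str = "transcripts",
) -> list[str]:
    cmd = [
        "whisper", audio,
        "--model", model,
        "--task", task,
        "--output_format", output_format,
        "--output_dir", output_dir,
    ]
    if language:  # leave it out to let Whisper auto-detect the language
        cmd += ["--language", language]
    return cmd


def run_whisper(audio: str, **options) -> None:
    # check=True raises CalledProcessError if whisper exits with an error.
    subprocess.run(build_whisper_cmd(audio, **options), check=True)


# Example: run_whisper("meeting.m4a", language="en", output_format="srt")
```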
Self-hosted tools give you control
We have normalized uploading everything to someone else's infrastructure. Notes, recordings, documents, and conversations often pass through multiple third-party systems before we even think about it. Avoiding that entirely may not be realistic, especially if you rely on mainstream cloud platforms for other parts of your workflow. But transcription does not have to be one of those compromises.