The best AI speech recognition, translation, and multilingual dubbing solution 🚀
한국어 ∙ English ∙ 中文简体 ∙ 中文繁體 ∙ 日本語 ∙ Deutsch ∙ Español ∙ Português
Voice-Pro is a state-of-the-art web app that transforms multimedia content creation. It integrates YouTube video downloading, voice separation, speech recognition, translation, and text-to-speech into a single, powerful tool for creators, researchers, and multilingual professionals.
* 🔊 Top-tier speech recognition: Whisper, Faster-Whisper, Whisper-Timestamped, WhisperX
* 🎤 Zero-shot voice cloning: F5-TTS, E2-TTS, CosyVoice
* 📢 Multilingual text-to-speech: Edge-TTS, kokoro (Paid version includes Azure TTS)
* 🎥 YouTube processing & audio extraction: yt-dlp
* 🌍 Instant translation for 100+ languages: Deep-Translator (Paid version includes Azure Translator)
A robust alternative to ElevenLabs, Voice-Pro empowers podcasters, developers, and creators with advanced voice solutions.
* Because the team is focused on developing WeConnect, Voice-Pro will not receive new development or updates for the time being.
* We have made all Voice-Pro code open source and completely free. Voice-Pro can now be freely distributed and modified by anyone.
* Voice-Pro works well on Windows with an NVIDIA GPU. Operation on Mac and Linux has not been verified.
* Please leave your requests on the or pages.
* Troubleshooting: In most cases, issues can be resolved by deleting the folder and then running followed by .
* YouTube video downloads & audio extraction
* Voice separation with Demucs
* Supports 100+ languages for speech recognition & translation
* Speech-to-Text: Whisper, Faster-Whisper, Whisper-Timestamped, WhisperX
* Text-to-Speech: Edge-TTS, F5-TTS, CosyVoice, kokoro
* Instant speech recognition
* Multilingual translation on the fly
* Customizable audio inputs
* All-in-one hub: YouTube downloads, noise removal, subtitles, translation, & TTS
* Supports all ffmpeg-compatible formats
* Output options: WAV, FLAC, MP3
* Subtitles & recognition for 100+ languages
* TTS with speed, volume, & pitch controls
* Subtitle-focused: 90+ languages
* Video-integrated subtitle display
* Word-level highlighting & denoise options
* Translation for 100+ languages
* Supports subtitle files (ASS, SSA, SRT, etc.)
* Real-time voice recognition & translation
* Options: Edge-TTS, F5-TTS, CosyVoice, kokoro
* Celeb voice podcasts & multilingual support
* Please request the voice you want to add on the Issues page.
* OS: Windows 10/11 (64-bit); Linux and Mac (not verified)
* GPU: NVIDIA with CUDA 12.4 (recommended)
* VRAM: 4GB+ (8GB+ preferred)
* RAM: 4GB+
* Storage: 20GB+ free space
* Internet: Required
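The storage requirement above can be checked before installing. The sketch below uses only Python's standard library (`shutil.disk_usage`); the function names `free_disk_gb` and `has_enough_disk_space` are hypothetical helpers, not part of Voice-Pro, and the 20GB default is simply the figure from the requirements list.

```python
import shutil

GIB = 1024 ** 3  # bytes per GiB (slightly stricter than a decimal GB)

def free_disk_gb(path: str = ".") -> float:
    """Return the free space on the drive containing `path`, in GiB."""
    return shutil.disk_usage(path).free / GIB

def has_enough_disk_space(path: str = ".", min_free_gb: float = 20.0) -> bool:
    """Hypothetical pre-install check against the 20GB+ storage requirement."""
    return free_disk_gb(path) >= min_free_gb

if __name__ == "__main__":
    free = free_disk_gb(".")
    status = "OK" if has_enough_disk_space(".") else "insufficient"
    print(f"Free space: {free:.1f} GB ({status})")
```

Run it from the folder where you plan to install Voice-Pro, since free space is measured on that folder's drive.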
Install Voice-Pro with ease using configure.bat and start.bat (use configure.sh and start.sh on Mac/Linux).
* Clone or download the latest release (Source code (zip)) from
* 🚀 update.bat: Refreshes Python environment (faster than reinstall)
* Run uninstall.bat or delete the folder (portable install)
* Close the command window and run start.bat again.
* Open your browser manually and enter the address shown in the command window (e.g., http://127.0.0.1:7870) in the address bar.
* Check GPU memory usage on the Performance tab of Windows Task Manager.
* Set the Denoise level to 0 or 1. Denoise level 2 requires at least 8GB of GPU memory.
* Set Compute Type to an int type. Float types give better quality but require more GPU memory.
* Subtitle quality tends to improve with larger Whisper models (large > medium > small > base > tiny), but this is not guaranteed.
* Among compute types, float types give the best accuracy. Int types use model quantization to reduce GPU memory usage and increase speed, at the cost of some accuracy.
* Raising the denoise level removes more background sound, so only the remaining voice is used for speech recognition. This does not always guarantee better results.
* Due to WeConnect development work, there will be no Voice-Pro updates for the time being.
* All Voice-Pro code has been made open source. It is now completely free to use.
* WeConnect is a communication platform for global cultural exchange.
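The GPU-memory advice in the troubleshooting and tips items above can be summarized as a small decision helper. This is a hypothetical sketch, not Voice-Pro's code. The function names, the choice of denoise level 1 on low memory, and the assumption that Voice-Pro's Compute Type option corresponds to faster-whisper/CTranslate2 compute types such as `"float16"` and `"int8"` are all illustrative. Only the 8GB threshold for denoise level 2 comes from the text above.

```python
from dataclasses import dataclass

# Whisper model sizes, smallest to largest, as listed in the tips above.
# Larger models usually, but not always, produce better subtitles.
WHISPER_MODEL_SIZES = ["tiny", "base", "small", "medium", "large"]

@dataclass
class Settings:
    denoise_level: int
    compute_type: str

def suggest_settings(free_vram_gb: float) -> Settings:
    """Hypothetical helper: choose denoise level and compute type from free VRAM."""
    if free_vram_gb >= 8:
        # Denoise level 2 needs at least 8GB; float types give better quality.
        return Settings(denoise_level=2, compute_type="float16")
    # Low memory: denoise level 0 or 1, plus a quantized int compute type.
    return Settings(denoise_level=1, compute_type="int8")

def transcribe(audio_path: str, model_size: str = "medium", free_vram_gb: float = 4.0):
    """Transcribe with faster-whisper (assumes `pip install faster-whisper` and a CUDA GPU)."""
    # Imported lazily so suggest_settings() works without faster-whisper installed.
    from faster_whisper import WhisperModel

    if model_size not in WHISPER_MODEL_SIZES:
        raise ValueError(f"unknown model size: {model_size}")
    settings = suggest_settings(free_vram_gb)
    model = WhisperModel(model_size, device="cuda", compute_type=settings.compute_type)
    segments, _info = model.transcribe(audio_path)
    # segments is a generator; each item has start/end times (seconds) and text.
    return [(seg.start, seg.end, seg.text) for seg in segments]
```

Quantized int types trade some accuracy for lower memory use and higher speed, so the helper only falls back to them when the float path would not fit.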
The following table lists SaaS platforms supporting subtitling, translation, and text-to-speech (TTS/dubbing) functionalities. Costs are calculated for processing a 60-minute Korean video, including subtitle generation, English translation, and English dubbing, based on the latest available pricing data as of April 15, 2025.
* Maestra: Premium Plan ($158/month, 1200 credits). 60-min video: 60 credits (subtitles) + 60 credits (translation) + 60 credits (dubbing) = 180 credits. Cost = (180/1200) * $158 = $23.70.
* Kapwing: Pro plan (~$24/month, limited minutes). Estimated $0.50~$0.67/min for subtitles+translation+dubbing (based on per-minute pricing trends). 60-min cost: $30~$40. Exact pricing requires confirmation.
* VEED.IO: Pro plan (~$24/month). Subtitles+translation estimated at $0.40~$0.60/min. No TTS, so partial processing. 60-min cost: $24~$36. Confirm at veed.io.
* HappyScribe: Pay-as-you-go (~$0.20/min transcription, $0.20/min translation, $0.20/min dubbing). 60-min cost: $36~$48 (assuming combined services). Confirm at happyscribe.com.
* Sonix: Standard plan (~$10/hour transcription, additional for translation/dubbing). Estimated $0.50~$0.67/min total. 60-min cost: $30~$40. Confirm at sonix.ai.
* Descript: Creator plan (~$24/month, limited hours). Estimated $0.60~$0.80/min for subtitles+translation+dubbing. 60-min cost: $36~$48. Confirm at descript.com.
* AppTek: Custom pricing for enterprise. No public per-minute rates. Contact apptek.ai for quotes.
* Transkriptor: Pay-as-you-go ($0.05~$0.10/min transcription, similar for translation). No TTS, so partial processing. 60-min cost: $12~$18. Confirm at transkriptor.com.
* Cost for 60-min Video: Costs are approximate and assume processing a 60-minute Korean video for subtitles, English translation, and English dubbing (where available). Platforms without TTS (e.g., VEED.IO, Transkriptor) reflect partial processing costs.
* Language Support: Most platforms support Korean and English. Verify specific language availability on their websites.
* Use Cases:
* Pricing Updates: Pricing may vary due to plan changes or promotions. Check official websites for the latest details.
* For contributions or specific use case recommendations, open an issue or submit a pull request in this repository!
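The Maestra figure in the cost list above prorates a monthly credit plan, while most other estimates multiply a per-minute rate by 60 minutes. The sketch below reproduces that arithmetic using only numbers from the list. The function names `credit_plan_cost` and `per_minute_cost` are hypothetical.

```python
def credit_plan_cost(credits_used: int, plan_credits: int, plan_price: float) -> float:
    """Prorate a monthly credit plan: (credits_used / plan_credits) * plan_price."""
    return round(credits_used * plan_price / plan_credits, 2)

def per_minute_cost(rate_per_min: float, minutes: int = 60) -> float:
    """Cost of processing `minutes` of video at a flat per-minute rate."""
    return round(rate_per_min * minutes, 2)

if __name__ == "__main__":
    # Maestra: 60 credits each for subtitles, translation, and dubbing.
    maestra = credit_plan_cost(60 * 3, plan_credits=1200, plan_price=158)
    print(f"Maestra: ${maestra:.2f}")  # Maestra: $23.70

    # Kapwing: estimated $0.50~$0.67/min for a 60-minute video.
    low, high = per_minute_cost(0.50), per_minute_cost(0.67)
    print(f"Kapwing: ${low:.2f}~${high:.2f}")  # Kapwing: $30.00~$40.20
```

The Kapwing upper bound comes out to $40.20, which the list rounds to $40.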
Hello, I'm David from the Voice-Pro team. Our team seeks out the best AI technologies in the industry and makes them easy and convenient for anyone to use. We are a small startup in Korea that has only been around for a year. We are working hard to help you and other creators produce great content.
Your ⭐⭐⭐⭐⭐ review would be greatly appreciated as it helps our business grow with you. Please help support our small team.
Thank you, ABUS Customer Service
* If you want to participate in and help us with this project, feel free to open an issue.
* If something goes wrong, please submit a pull request to improve this project.
* Any type of contribution is welcome.
* For inquiries related to purchases, business partnerships, technical tuning, investments, and other matters, please contact us by email ([email protected]).
* If you like this project, please star this repository. We would greatly appreciate it. ⭐⭐⭐
* You can support Voice-Pro with a donation here:
* Email: [email protected]
* Homepage (Korean): https://www.wctokyoseoul.com
* Paid Version Purchase: Shopify (Global), Naver (Korean)