2 Sources
[1]
Now We're Talking: NVIDIA Releases Open Dataset, Models for Multilingual Speech AI
The new Granary dataset, featuring around 1 million hours of audio, was used to train high-accuracy and high-throughput AI models for audio transcription and translation.

Of the roughly 7,000 languages in the world, only a tiny fraction are supported by AI language models. NVIDIA is tackling the problem with a new dataset and models that support the development of high-quality speech recognition and translation AI for 25 European languages -- including languages with limited available data like Croatian, Estonian and Maltese.

These tools will enable developers to more easily scale AI applications to support global users with fast, accurate speech technology for production-scale use cases such as multilingual chatbots, customer service voice agents and near-real-time translation services. They include:

* Granary, a massive, open-source corpus of multilingual speech datasets that contains around a million hours of audio, including nearly 650,000 hours for speech recognition and over 350,000 hours for speech translation.
* NVIDIA Canary-1b-v2, a billion-parameter model trained on Granary for high-quality transcription of European languages, plus translation between English and two dozen supported languages.
* NVIDIA Parakeet-tdt-0.6b-v3, a streamlined, 600-million-parameter model designed for real-time or large-volume transcription of Granary's supported languages.

The paper behind Granary will be presented at Interspeech, a language processing conference taking place in the Netherlands, Aug. 17-21. The dataset, as well as the new Canary and Parakeet models, are now available on Hugging Face.

How Granary Addresses Data Scarcity

To develop the Granary dataset, the NVIDIA speech AI team collaborated with researchers from Carnegie Mellon University and Fondazione Bruno Kessler. The team passed unlabeled audio through an innovative processing pipeline, powered by the NVIDIA NeMo Speech Data Processor toolkit, that turned it into structured, high-quality data.
This pipeline allowed the researchers to enhance public speech data into a usable format for AI training, without the need for resource-intensive human annotation. It's available in open source on GitHub.

With Granary's clean, ready-to-use data, developers can get a head start building models that tackle transcription and translation tasks in nearly all of the European Union's 24 official languages, plus Russian and Ukrainian. For European languages underrepresented in human-annotated datasets, Granary provides a critical resource for developing more inclusive speech technologies that better reflect the linguistic diversity of the continent -- all while using less training data. The team demonstrated in their Interspeech paper that, compared to other popular datasets, it takes around half as much Granary training data to achieve a target accuracy level for automatic speech recognition (ASR) and automatic speech translation (AST).

Tapping NVIDIA NeMo to Turbocharge Transcription

The new Canary and Parakeet models offer examples of the kinds of models developers can build with Granary, customized to their target applications. Canary-1b-v2 is optimized for accuracy on complex tasks, while Parakeet-tdt-0.6b-v3 is designed for high-speed, low-latency tasks. By sharing the methodology behind the Granary dataset and these two models, NVIDIA is enabling the global speech AI developer community to adapt this data processing workflow to other ASR or AST models or additional languages, accelerating speech AI innovation.

Canary-1b-v2, available under a permissive license, expands the Canary family's supported languages from four to 25. It offers transcription and translation quality comparable to that of models 3x larger, while running inference up to 10x faster. NVIDIA NeMo, a modular software suite for managing the AI agent lifecycle, accelerated speech AI model development.
NeMo Curator, part of the software suite, enabled the team to filter out synthetic examples from the source data so that only high-quality samples were used for model training. The team also harnessed the NeMo Speech Data Processor toolkit for tasks like aligning transcripts with audio files and converting data into the required formats.

Parakeet-tdt-0.6b-v3 prioritizes high throughput and is capable of transcribing 24-minute audio segments in a single inference pass. The model automatically detects the input audio language and transcribes it without additional prompting steps. Both Canary and Parakeet models provide accurate punctuation, capitalization and word-level timestamps in their outputs.

Read more on GitHub and get started with Granary on Hugging Face.
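As a rough illustration of the transcription workflow the article describes, the sketch below loads the Parakeet checkpoint through NeMo and transcribes a batch of audio files. This is not code from NVIDIA's release: the `ASRModel.from_pretrained`/`transcribe` calls are the standard NeMo ASR API, but the exact usage for this model should be checked against its Hugging Face model card.

```python
# Hedged sketch: batch transcription with Parakeet via NVIDIA NeMo.
# Assumes `nemo_toolkit[asr]` is installed; the model ID matches the
# Hugging Face release named in the article.

PARAKEET_MODEL_ID = "nvidia/parakeet-tdt-0.6b-v3"


def transcribe(audio_paths):
    """Download the pretrained checkpoint and transcribe a list of audio files.

    Returns one hypothesis per input file; punctuation, capitalization and
    word-level timestamps come from the model itself, per the article.
    """
    # Deferred import: loading NeMo (and the ~600M-parameter checkpoint)
    # is heavy, so nothing is downloaded until this function is called.
    from nemo.collections.asr.models import ASRModel

    model = ASRModel.from_pretrained(model_name=PARAKEET_MODEL_ID)
    return model.transcribe(audio_paths)


if __name__ == "__main__":
    # Example usage (requires local audio files and a GPU for best throughput):
    # hypotheses = transcribe(["interview_et.wav", "podcast_mt.wav"])
    pass
```

Because the model detects the input language automatically, no language flag or prompt is passed; the same call works for any of the 25 supported languages.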
[2]
Nvidia releases massive, high-quality AI-ready European language dataset and tools - SiliconANGLE
Only a tiny fraction of the over 7,000 languages on Earth are supported by AI models, so today Nvidia Corp. announced a massive new artificial intelligence-ready dataset and models to support the development of high-quality translation AI for European languages.

The new dataset, named Granary, is a massive open-source corpus of multilingual audio comprising over a million hours in total, including around 650,000 hours for speech recognition and 350,000 hours for speech translation. Nvidia's speech AI team collaborated with researchers from Carnegie Mellon University and Fondazione Bruno Kessler to process unlabeled audio and public speech data into information usable for AI training. The dataset is available openly and for free on GitHub.

Granary includes 25 European languages, covering nearly all of the European Union's 24 official languages, plus Russian and Ukrainian. The dataset also contains languages with limited available data, such as Croatian, Estonian and Maltese. This is critically important: human-annotated data for these underrepresented languages enables developers to create more inclusive speech technologies for the audiences who speak them, while using less training data in their AI applications and models.

Nvidia fine-tuned its dataset for European languages, focusing on high-quality audio and annotation specific to those language families, which allows models to use less data. The team demonstrated in their research paper that, compared to other popular datasets, it takes around half as much Granary training data to achieve high accuracy for automatic speech recognition and automatic speech translation.

Alongside Granary, Nvidia also released new Canary and Parakeet models to demonstrate what can be created with the dataset.
The two models are Canary-1b-v2, a model optimized for high accuracy on complex tasks, and Parakeet-tdt-0.6b-v3, a smaller model designed for high-speed, low-latency translation and transcription tasks.

The new Canary is available under a fairly permissive license for commercial and research use, expanding Canary's supported languages from four to 25. It offers transcription and translation quality comparable to models 3x larger while running inference up to 10x faster. At 1 billion parameters, it can run completely on-device on most next-generation flagship smartphones for speech translation on the fly.

Parakeet prioritizes high throughput and is capable of ingesting and transcribing 24 minutes of audio in a single pass. It can detect the audio language and transcribe without additional prompting. Both Canary and Parakeet provide accurate punctuation, capitalization and word-level timestamps in their outputs.

Other AI models that provide massively multilingual capabilities include Cohere for AI's Aya Expanse, a family of high-performance multilingual models developed by the nonprofit research lab run by the AI startup Cohere Inc. It is part of the Aya Collection, one of the largest multilingual dataset collections to date, which includes 513 million examples as well as Aya-101, an open AI model covering more than 100 languages.

Nvidia has provided additional information on GitHub about how to fine-tune models using the Granary dataset, including how the company trained Canary and Parakeet, and has made the new massive multilingual dataset available to developers on Hugging Face.
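For developers who want to inspect Granary before committing to a full download, a streaming read from Hugging Face is one plausible starting point. Note the caveats: the repository ID and the config/split names below are illustrative guesses, not confirmed by either article, so check the actual dataset card first.

```python
# Hedged sketch: streaming a few Granary examples from Hugging Face.
# The repo ID and config/split names are ASSUMPTIONS for illustration only;
# consult the real dataset card on Hugging Face before relying on them.
from itertools import islice

GRANARY_REPO = "nvidia/Granary"  # assumed repository ID


def stream_samples(config_name, n=5):
    """Yield the first n examples of one language config without downloading
    the full corpus, using the `datasets` library's streaming mode."""
    from datasets import load_dataset  # deferred: pip install datasets

    ds = load_dataset(GRANARY_REPO, config_name, split="train", streaming=True)
    return list(islice(ds, n))


if __name__ == "__main__":
    # Example usage (requires network access; "hr" is a hypothetical
    # config name for Croatian):
    # for sample in stream_samples("hr"):
    #     print(sample.keys())
    pass
```

Streaming mode matters here because, at roughly a million hours of audio, pulling the whole corpus locally is impractical for exploratory work.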
NVIDIA releases Granary, a massive open-source dataset for multilingual speech AI, along with new AI models to support 25 European languages, addressing the challenge of limited language support in AI applications.
NVIDIA has unveiled Granary, a groundbreaking open-source dataset aimed at revolutionizing multilingual speech AI development. This massive corpus of audio data, encompassing around 1 million hours, is set to address the longstanding challenge of limited language support in AI language models [1].

Out of approximately 7,000 languages worldwide, only a small fraction are currently supported by AI language models. Granary targets this issue by providing high-quality speech recognition and translation AI capabilities for 25 European languages, including those with limited available data such as Croatian, Estonian and Maltese [1]. The release comprises:

* Granary Dataset: An open-source corpus containing nearly 650,000 hours of audio for speech recognition and over 350,000 hours for speech translation [1].
* NVIDIA Canary-1b-v2: A billion-parameter model trained on Granary, optimized for high-quality transcription and translation between English and 24 other supported languages [1].
* NVIDIA Parakeet-tdt-0.6b-v3: A streamlined 600-million-parameter model designed for real-time or large-volume transcription tasks [1].

The development of Granary involved collaboration between NVIDIA's speech AI team, Carnegie Mellon University and Fondazione Bruno Kessler. They utilized an innovative processing pipeline powered by the NVIDIA NeMo Speech Data Processor toolkit to transform unlabeled audio into structured, high-quality data without resource-intensive human annotation [1].

Granary's clean, ready-to-use data allows developers to build models for transcription and translation tasks more efficiently. The research team demonstrated that, compared to other popular datasets, Granary requires only about half as much training data to achieve target accuracy levels for automatic speech recognition (ASR) and automatic speech translation (AST) [1].
Source: NVIDIA Blog
The Canary-1b-v2 model, available under a permissive license, expands language support from four to 25 languages. It offers transcription and translation quality comparable to models three times larger while running inference up to 10 times faster [1]. At 1 billion parameters, it can run completely on-device on most next-generation flagship smartphones for speech translation on the fly [2].

Parakeet-tdt-0.6b-v3, on the other hand, prioritizes high throughput and can transcribe 24-minute audio segments in a single inference pass. It automatically detects the input audio language and transcribes without additional prompting steps [1][2].
.Source: SiliconANGLE
By sharing the methodology behind Granary and these models, NVIDIA aims to accelerate speech AI innovation globally. The dataset and models are now available on Hugging Face, with additional information on GitHub for developers interested in fine-tuning models using Granary [1][2].

This release represents a significant step toward more inclusive speech technologies that better reflect linguistic diversity, particularly for European languages underrepresented in human-annotated datasets. It enables developers to create AI applications supporting global users with fast, accurate speech technology for various use cases, including multilingual chatbots, customer service voice agents and near-real-time translation services [1].
Summarized by Navi