NVIDIA Unveils Granary: A Groundbreaking Multilingual Dataset for Speech AI

Reviewed by Nidhi Govil

NVIDIA releases Granary, a massive open-source dataset for multilingual speech AI, along with new AI models to support 25 European languages, addressing the challenge of limited language support in AI applications.

NVIDIA's Granary: A Leap Forward in Multilingual Speech AI

NVIDIA has unveiled Granary, an open-source dataset built to advance multilingual speech AI development. The corpus contains around 1 million hours of audio and targets the longstanding problem of limited language support in AI language models [1].

Expanding Language Support

Out of approximately 7,000 languages worldwide, only a small fraction are currently supported by AI language models. Granary targets this issue by providing high-quality speech recognition and translation AI capabilities for 25 European languages, including those with limited available data such as Croatian, Estonian, and Maltese [1].

Key Components of the Release

  1. Granary Dataset: An open-source corpus containing nearly 650,000 hours of audio for speech recognition and over 350,000 hours for speech translation [1].

  2. NVIDIA Canary-1b-v2: A billion-parameter model trained on Granary, optimized for high-quality transcription and translation between English and 24 other supported languages [1].

  3. NVIDIA Parakeet-tdt-0.6b-v3: A streamlined 600-million-parameter model designed for real-time or large-volume transcription tasks [1].
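The two task-specific figures add up to the headline total of around 1 million hours. A minimal Python sketch of that arithmetic follows. It uses the rounded numbers quoted above rather than exact counts, which the article does not give, and the function name `granary_split` is hypothetical.

```python
# Rounded figures quoted in the article; exact hour counts are not given there.
ASR_HOURS = 650_000  # "nearly 650,000 hours" for speech recognition
AST_HOURS = 350_000  # "over 350,000 hours" for speech translation


def granary_split(asr_hours: int, ast_hours: int) -> dict:
    """Return the combined hours and each task's share of the corpus."""
    total = asr_hours + ast_hours
    return {
        "total_hours": total,
        "asr_share": asr_hours / total,
        "ast_share": ast_hours / total,
    }


summary = granary_split(ASR_HOURS, AST_HOURS)
print(f"total: {summary['total_hours']:,} hours")
print(f"ASR share: {summary['asr_share']:.0%}, AST share: {summary['ast_share']:.0%}")
```

With these rounded inputs, roughly two thirds of the corpus serves speech recognition and one third serves speech translation.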

Innovative Data Processing

Granary was developed by NVIDIA's speech AI team in collaboration with Carnegie Mellon University and Fondazione Bruno Kessler. The team used a processing pipeline powered by the NVIDIA NeMo Speech Data Processor toolkit to turn unlabeled audio into structured, high-quality data without resource-intensive human annotation [1].

Efficiency and Accuracy

Granary's clean, ready-to-use data allows developers to build models for transcription and translation tasks more efficiently. The research team demonstrated that, compared to other popular datasets, Granary requires only about half as much training data to achieve target accuracy levels for automatic speech recognition (ASR) and automatic speech translation (AST) [1].

Model Capabilities and Applications

Source: NVIDIA Blog

The Canary-1b-v2 model, available under a permissive license, expands language support from four to 25 languages. It offers transcription and translation quality comparable to models three times larger while running inference up to 10 times faster [1]. At 1 billion parameters, it can run completely on-device on most next-gen flagship smartphones for speech translation on the fly [2].

Parakeet-tdt-0.6b-v3, by contrast, prioritizes high throughput and can transcribe 24-minute audio segments in a single inference pass. It automatically detects the language of the input audio and transcribes it without additional prompting steps [1][2].
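To make the 24-minute figure concrete, the Python sketch below plans how a longer recording could be cut into windows that each fit a single pass. This is an illustration under stated assumptions, not NVIDIA's method: the article does not say how longer audio is handled, the helper `plan_segments` is hypothetical, and a real pipeline would more likely cut at pauses or use overlapping windows than at fixed offsets.

```python
MAX_SEGMENT_SECONDS = 24 * 60  # the 24-minute single-pass figure quoted in the article


def plan_segments(
    duration_s: float, max_len_s: float = MAX_SEGMENT_SECONDS
) -> list[tuple[float, float]]:
    """Split [0, duration_s) into consecutive (start, end) windows of at most max_len_s.

    Hypothetical helper: fixed-offset cuts are an assumption made for illustration.
    """
    if duration_s < 0 or max_len_s <= 0:
        raise ValueError("duration must be non-negative and max length positive")
    segments = []
    start = 0.0
    while start < duration_s:
        end = min(start + max_len_s, duration_s)
        segments.append((start, end))
        start = end
    return segments


# A 60-minute recording yields three windows: 24, 24, and 12 minutes.
for start, end in plan_segments(60 * 60):
    print(f"{start / 60:.0f} min -> {end / 60:.0f} min")
```

Each window could then be passed to the model separately and the transcripts concatenated in order.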

Broader Impact and Availability

Source: SiliconANGLE

By sharing the methodology behind Granary and these models, NVIDIA aims to accelerate speech AI innovation globally. The dataset and models are now available on Hugging Face, with additional information on GitHub for developers interested in fine-tuning models using Granary [1][2].

The release is a step toward more inclusive speech technologies that better reflect linguistic diversity, particularly for European languages underrepresented in human-annotated datasets. It lets developers build fast, accurate speech applications for users worldwide, including multilingual chatbots, customer service voice agents, and near-real-time translation services [1].

TheOutpost.ai

© 2025 Triveous Technologies Private Limited