Sarvam AI outperforms Gemini and ChatGPT on India-centric tasks with homegrown models


Bengaluru-based Sarvam AI has delivered breakthrough results with its Vision OCR tool and Bulbul V3 text-to-speech engine, beating global systems from Google and OpenAI on benchmarks focused on Indian languages. The startup's performance marks a significant moment for India's sovereign AI ambitions and for efforts to serve the country's linguistic diversity.

Sarvam AI Outperformed Global Giants on India-Centric Tasks

Bengaluru-based Sarvam AI has captured attention across India's technology ecosystem after its AI models delivered superior performance compared to systems from Google and OpenAI on benchmarks designed for Indian languages. The startup's Sarvam Vision OCR tool achieved 84.3% accuracy on document recognition tasks, while its advanced version reached 93.28% accuracy, outperforming Gemini and ChatGPT on specific India-centric tasks [2]. Founded in 2023 by Dr Vivek Raghavan and Dr Pratyush Kumar, Sarvam AI set out to create compact, efficient foundational models capable of running on phones and modest infrastructure while effectively handling Indian linguistic diversity [1]. This homegrown AI revolution demonstrates that locally built systems can compete with major global technology firms when tailored for specific linguistic contexts.

Sarvam Vision Delivers Breakthrough in Document Intelligence

The 3-billion-parameter Sarvam Vision model focuses on OCR, layout understanding, and visual reasoning across India's various languages and scripts [4]. In evaluations against widely used global models from leading AI research labs, the Vision tool registered higher accuracy on benchmarks for Indian language document recognition [1]. The vision-language model interprets nested tables, scene-based text, and chart data across various Indian language scripts and layouts, addressing real-world challenges in sectors such as banking, education, and public services where paper-based and multilingual communication remains common. Sarvam AI has made its APIs free for developers through February 2026, signaling confidence in the model's capabilities [4].

Source: Analytics Insight


Bulbul V3 Text-to-Speech Engine Expands to 11 Indian Languages

Sarvam AI's Bulbul V3 model, launched ahead of the India-AI Impact Summit 2026, generates expressive text-to-speech output across 11 Indian languages, with plans to expand to 22 languages [2]. Independent blind listening studies showed that Bulbul V3 handled numerals, named entities, and code-mixed text more effectively than several competing systems, including those from OpenAI [1]. While ElevenLabs ranked highest in overall sound quality, Bulbul V3 beat competitors like Cartesia Sonic-3 in general evaluations and performed best in telephony quality tests [3]. The text-to-speech engine supports real-time audio output with control over natural pauses, emphasis, and pace, making it suitable for live conversations, call centres, and interactive applications. The model also includes consent-based voice cloning with safeguards built for large enterprise use, accessible through the Sarvam Dashboard with unlimited API usage available until February 28, 2026 [3].

Sovereign AI Ambitions and the IndiaAI Mission

Sarvam AI's approach aligns with growing interest in sovereign AI solutions built within India to meet local regulatory and privacy expectations. The startup is among 12 entities selected under the Rs 10,300 crore IndiaAI Mission, where sovereign Indian AI models are expected to be unveiled [3]. This mandate explains the company's focus on Indic OCR, multilingual voice synthesis, and document intelligence: the foundational infrastructure for governance, fintech, and citizen services [4]. By focusing on India's unique challenges and strengths, this philosophy contrasts with dominant global AI narratives that prioritise breadth of capability over local specificity. Supporters argue this focus could reduce dependency on foreign AI infrastructure while driving innovation attuned to cultural and linguistic diversity [1].

Completing the Indic Stack with Speech Recognition

Sarvam Audio, launched earlier in the same week as Vision and Bulbul V3, extends speech recognition across 22 Indian languages with strong performance on accents, noise, and multi-speaker environments [4]. The rapid-fire trio of models spanning vision, speech recognition, and text-to-speech represents a comprehensive Indic stack designed to address real needs that large generic systems sometimes overlook. Voice technologies that speak and understand India's vernacular languages can broaden the reach of digital services, especially in regions where English is not predominant, making AI more inclusive and relevant for a broader population [1]. Sarvam AI joined the AI Alliance in 2024, announcing itself as a serious player from India on the world stage [4]. The real test will be adoption: if government services, enterprises, and developers begin integrating these models at scale, Sarvam could become the reference layer for India's AI ecosystem, much like UPI did for fintech [4].

Source: Digit



TheOutpost.ai


© 2026 Triveous Technologies Private Limited