4 Sources
[1]
India's homegrown AI revolution: How Sarvam AI outperformed global giants in key India-Centric tasks
Bengaluru-based Sarvam AI is redefining India's role in artificial intelligence by building foundational models that excel on tasks tailored for the nation's linguistic diversity. In recent evaluations, its OCR tool and Indic voice synthesis systems registered performance that beat well-known systems from global players on benchmarks focused on Indian languages. In an era dominated by large artificial intelligence systems developed by major global technology firms, the emergence of a locally built AI suite that performs competitively on India-centric benchmarks marks a significant moment for the country's technology ecosystem. Bengaluru-based Sarvam AI has drawn attention after its tools delivered strong results in tasks that matter for real-world Indian applications. These accomplishments have sparked discussion among technologists, business leaders, and users about what it means to build artificial intelligence rooted in local language and use-case needs. Founded in 2023 by a team including Dr Vivek Raghavan and Dr Pratyush Kumar, Sarvam AI set out to create compact, efficient models capable of running on phones and modest infrastructure while effectively handling India's complex linguistic landscape. At its core, the company focuses on language models, speech processing, and optical character recognition systems tailored for Indian languages rather than exclusively on general-purpose large language models that require massive cloud resources. One of the defining achievements reported recently involves the performance of Sarvam AI's Vision tool, an optical character recognition model designed to read and interpret documents in native Indian scripts. In evaluations against widely used global models, including those offered by leading AI research labs, the Vision model registered higher accuracy on benchmarks for Indian language document recognition.
For many use cases across government and business where understanding diverse formats of handwritten text and mixed-language content is essential, this represents a practical breakthrough. Alongside document recognition, Sarvam AI also highlighted progress in voice synthesis technologies, particularly with its Bulbul V3 model designed to generate expressive text-to-speech output in a range of Indian languages. Independent tests, including blind listening studies and automated error analysis, showed that Bulbul V3 handled numerals, named entities and code-mixed text more effectively than several competitive systems. This focus on quality and clarity of synthesised speech is important for applications such as voice agents, customer support systems and accessibility tools where natural engagement matters. While media coverage and company announcements note that these results reflect performance on specific tasks rather than a comprehensive comparison across all capabilities, the outcomes have nonetheless captured attention because they demonstrate that tailored engineering and careful data curation can deliver strong results for complex localised problems. Experts have emphasised that successes on benchmarks do not automatically equate to overall superiority across every AI domain, but do validate the potential of focused models to address real needs that large generic systems sometimes overlook. The practical implications for Indian users are significant. Tools that can reliably recognise text across diverse document layouts and languages can streamline workflows in sectors such as banking, education and public services, where paper-based and multilingual communication is common. Similarly, voice technologies that speak and understand India's vernacular languages can broaden the reach of digital services, especially in regions where English is not predominant. These innovations promise to make AI more inclusive and relevant for a broader population.
Sarvam AI's approach also aligns with the growing interest in what is often described as sovereign AI solutions that are built within the country and designed to meet local regulatory and privacy expectations. By focusing on India's unique challenges and strengths, this philosophy contrasts with dominant global AI narratives that tend to prioritise breadth of capability and scale over local specificity. Supporters argue that this focus could reduce dependency on foreign AI infrastructure while driving innovation that is attuned to cultural and linguistic diversity. Questions remain about how these localised AI systems will evolve and compete with global offerings in broader tasks beyond document reading and speech synthesis. Independent benchmarking and adoption by third parties will be key indicators of how far the technology can scale. For now, Sarvam AI's results have provided strong evidence that targeted solutions built with a deep understanding of specific linguistic contexts can generate performance that resonates with users and stakeholders across India's rapidly growing AI community. Sarvam AI's recent achievements highlight a shift in the artificial intelligence landscape in India from primarily adopting global models to innovating locally for tasks that matter most within the country. As enterprises, governments and developers seek tools that understand India's linguistic and cultural diversity, the emergence of capable indigenous AI solutions opens new opportunities for digital transformation and inclusion.
[2]
Sarvam AI Outshines Gemini and ChatGPT with 84.3% OCR Accuracy, Global Eyes on India
Sarvam AI Gains Global Backing as Vision Hits 93.28% Accuracy and Bulbul V3 Expands to 11 Indian Languages India takes a major step forward in artificial intelligence at a global scale. Sarvam AI, a Bengaluru-based startup, has surprised the tech community with AI models that perform better than Gemini and ChatGPT on specific India-focused tasks. Sarvam AI has delivered a breakthrough that changes long-held perceptions. This startup has shown that India can build world-class artificial intelligence models from the ground up. This achievement has shifted attention toward India as a serious AI innovator.
[3]
Better than Google Gemini and ChatGPT? Indian startup Sarvam AI claims to beat global models
Launched ahead of the India-AI Impact Summit 2026, Bulbul V3 strengthens India's homegrown AI ecosystem, with real-time speech, enterprise features, and consent-based voice cloning. Bengaluru-based startup Sarvam AI has recently launched Bulbul V3, which is a new text-to-speech model designed for Indian languages, accents, and real-world use cases. The company says the model delivers more natural and stable speech than global rivals and has already outperformed tools from Google and OpenAI in key evaluations. With Bulbul V3, Sarvam is positioning itself as a serious player in voice AI, an area long dominated by US-based companies. Moreover, Bulbul V3 is one of several tools Sarvam has launched in a 14-day rollout ahead of the India-AI Impact Summit 2026 in New Delhi. The startup is also among the 12 entities selected under the Rs 10,300 crore India AI Mission, where sovereign Indian AI models are expected to be unveiled later this month. Sarvam says Bulbul V3 is designed around the realities of Indian speech. People often mix languages in a single sentence, pronounce the same word differently across regions, and use names or expressions that global systems struggle to handle. According to the company, Bulbul V3 manages these challenges without breaking flow or meaning. As per the reports, the model is capable of generating speech with natural pauses, emphasis, and pace. Furthermore, it also supports real-time audio output, which is useful for live conversations, call centres, and interactive apps. Sarvam says that the fast response time is highly important in such settings, as delayed responses can hurt the user experience. Bulbul V3 was tested by an independent third party through blind listening studies across 11 languages.
Human listeners compared audio clips from different AI models without knowing which system produced them. While ElevenLabs ranked highest in overall sound quality, Bulbul V3 beat competitors like Cartesia Sonic-3 in general evaluations. Sarvam also said Bulbul V3 performed best in telephony quality tests, which are important for phone-based services. The model showed fewer skipped words and mispronunciations compared to rivals. In related document and speech tasks through Sarvam Vision, the company has earlier claimed better results than Google Gemini and ChatGPT on certain benchmarks. The new model also allows users to create custom AI voices through consent-based voice cloning. Sarvam says the feature includes safeguards and is built for large enterprise use. Developers can access the model through the Sarvam Dashboard, with unlimited API usage available until February 28, 2026.
[4]
Bulbul to Vision: Sarvam AI challenges global models with Indic stack
Indigenous models push India toward sovereign AI leadership If India's AI ambitions needed a pre-India AI Impact Summit flex, Sarvam AI delivered it loud and clear. Days before the India AI Impact Summit 2026 kicks off in New Delhi, the Bengaluru-based startup has rolled out a rapid-fire trio of models spanning vision, speech recognition and text-to-speech. The timing of the announcements from Sarvam AI isn't accidental. It's a signal that India's indigenous AI stack is serious about earning its seat at the global table. At the centre of the announced updates is Sarvam Vision. It's a 3-billion-parameter vision-language model built around multilingual document intelligence. According to Sarvam AI, Vision is designed to better understand images, charts and scanned documents across India's various languages. Specifically, the Sarvam Vision model focuses on OCR, layout understanding and visual reasoning, according to the release notes. What's new here isn't just another VLM (vision language model), but a VLM that claims to be distinctly tuned for making sense of the haphazard maze of Indian paperwork and public-facing digital infrastructure. Sarvam Vision claims leading performance on global OCR and document benchmarks, while outperforming models like Gemini-class systems and other OCR engines on Indian language accuracy - especially in low-resource languages. The Sarvam Vision model is capable of interpreting nested tables, scene-based text and chart data across various Indian language scripts and layouts, and to prove this Sarvam has made its APIs free for developers through February 2026 - which goes to show just how confident they are about the model's performance. The second key announcement from Sarvam AI is Bulbul V3, their newest text-to-speech engine. Built for over a dozen Indian languages (expanding to 22), this text-to-voice model focuses on production-grade voice that challenges the likes of ElevenLabs, rather than something that's just demo-friendly. 
Sarvam AI highlights improvements in Bulbul V3 with respect to natural speech generation across regional accents and scripts, and it's billed as a major step forward for Indic voice generation and synthesis. Sarvam claims Bulbul V3 outperforms several global competitors in robustness and telephony-grade scenarios, where speech is mixed with deliberate numeric pronunciations for added complexity. Add real-time streaming, voice cloning and 35+ voice options, and everything from customer support in call centres to conversational agents in public services at scale is possible with Sarvam AI's Bulbul V3 - not just AI-based narration in YouTube videos. Completing the stack is Sarvam Audio, launched earlier in the same week, extending speech recognition across 22 Indian languages with strong performance on accents, noise and multi-speaker environments. Sarvam has already joined the AI Alliance back in 2024, announcing itself as a serious AI player from India on the world stage. What's unmistakable with these announcements is that Sarvam isn't chasing ChatGPT users but trying to solve for true India-scale usability. This matters because Sarvam sits at the heart of India's sovereign AI ambitions. In case you didn't know, Sarvam AI has already been selected under the IndiaAI Mission to help build a homegrown foundational model for the country, where achieving linguistic diversity and strategic autonomy is key. That mandate explains the company's focus on Indic OCR, multilingual voice and document intelligence - which is undoubtedly the plumbing of governance, fintech and citizen services. In practical terms, Sarvam's latest launches push India closer to owning its full AI stack - from speech and vision to foundational models - rather than renting intelligence from Silicon Valley. The real test will be adoption.
If government services, enterprises and developers begin integrating these models at scale, Sarvam could become the reference layer for India's AI ecosystem - much like UPI did for fintech.
Bengaluru-based Sarvam AI has reported strong results for its Vision OCR tool and Bulbul V3 text-to-speech engine, which beat global systems from Google and OpenAI on benchmarks focused on Indian languages. The startup's performance marks a significant moment for India's sovereign AI ambitions and for tools built around the country's linguistic diversity.
Bengaluru-based Sarvam AI has captured attention across India's technology ecosystem after its AI models outperformed systems from Google and OpenAI on benchmarks designed for Indian languages. Reports cite 84.3% accuracy for the startup's Sarvam Vision OCR tool on document recognition tasks, along with a separate figure of 93.28%, ahead of Gemini and ChatGPT on specific India-centric tasks [2]. Founded in 2023 by a team including Dr Vivek Raghavan and Dr Pratyush Kumar, Sarvam AI set out to create compact, efficient foundational models capable of running on phones and modest infrastructure while handling India's linguistic diversity [1]. The results suggest that locally built systems can compete with major global technology firms when tailored to specific linguistic contexts.
The 3-billion-parameter Sarvam Vision model focuses on OCR, layout understanding, and visual reasoning across India's various languages and scripts [4]. In evaluations against widely used global models from leading AI research labs, the Vision tool registered higher accuracy on benchmarks for Indian language document recognition [1]. The vision-language model interprets nested tables, scene-based text, and chart data across various Indian language scripts and layouts, addressing real-world challenges in sectors such as banking, education, and public services, where paper-based and multilingual communication remains common. Sarvam AI has made its APIs free for developers through February 2026, which the coverage reads as a sign of confidence in the model's capabilities [4].
Source: Analytics Insight
Sarvam AI's Bulbul V3 model, launched ahead of the India-AI Impact Summit 2026, generates expressive text-to-speech output across 11 Indian languages, with plans to expand to 22 [2][4]. Independent tests, including blind listening studies and automated error analysis, showed that Bulbul V3 handled numerals, named entities, and code-mixed text more effectively than several competing systems [1], and the company says the model has outperformed tools from Google and OpenAI in key evaluations [3]. While ElevenLabs ranked highest in overall sound quality, Bulbul V3 beat competitors like Cartesia Sonic-3 in general evaluations and, according to Sarvam, performed best in telephony quality tests [3]. The text-to-speech engine supports real-time audio output with natural pauses, emphasis, and pace, making it suitable for live conversations, call centres, and interactive applications. The model also includes consent-based voice cloning with safeguards built for large enterprise use, and it is accessible through the Sarvam Dashboard with unlimited API usage available until February 28, 2026 [3].
Sarvam AI's approach aligns with growing interest in sovereign AI: solutions built within India to meet local regulatory and privacy expectations. The startup is among 12 entities selected under the Rs 10,300 crore IndiaAI Mission, where sovereign Indian AI models are expected to be unveiled [3]. This mandate explains the company's focus on Indic OCR, multilingual voice synthesis, and document intelligence, the foundational infrastructure for governance, fintech, and citizen services [4]. By focusing on India's unique challenges and strengths, this philosophy contrasts with dominant global AI narratives that prioritise breadth of capability over local specificity. Supporters argue this focus could reduce dependency on foreign AI infrastructure while driving innovation attuned to cultural and linguistic diversity [1].
Sarvam Audio, launched earlier in the same week as Vision and Bulbul V3, extends speech recognition across 22 Indian languages with strong performance on accents, noise, and multi-speaker environments [4]. The rapid-fire trio of models spanning vision, speech recognition, and text-to-speech forms an Indic stack designed to address real needs that large generic systems sometimes overlook. Voice technologies that speak and understand India's vernacular languages can broaden the reach of digital services, especially in regions where English is not predominant, making AI more inclusive and relevant for a broader population [1]. Sarvam AI joined the AI Alliance in 2024, announcing itself as a serious player from India on the world stage [4]. The real test will be adoption: if government services, enterprises, and developers begin integrating these models at scale, Sarvam could become the reference layer for India's AI ecosystem, much like UPI did for fintech [4].
Source: Digit
Summarized by Navi