2 Sources
[1]
Neysa and Pipeshift team up for AI inference play in India
Bengaluru's Neysa and AI startup Pipeshift are joining forces. They aim to meet India's growing need for AI inference services. This partnership will tackle increasing costs and delays in AI adoption. India's AI market is projected to reach billions. The collaboration focuses on specialized, local AI systems. Bengaluru GPU service provider Neysa and AI startup Pipeshift, which offers managed AI inference services provider have partnered to tap into the growing demand in India for AI inference amid rising AI costs, and latency due to AI adoption. Karan Kirpalani, chief product officer, Neysa, said that India's inference landscape at the scale and diversity requires complex AI ecosystems. "We are seeing a lot of demand for multi-modal, multilingual AI systems that are specialised and local. The partnership between Pipeshift and Neysa addresses those concerns," he said. Kirpalani said that while there is no single report on the Indian inference market, the ballpark is about $28-30 billion as of 2025 and pegs the global inference market to be about $125 billion, in 2025, by conservative estimates.
[2]
Neysa and Pipeshift launch real-time inference for open-source AI models, fully deployed within India
The partnership enables Indian enterprises to run production-scale AI with lower latency, predictable economics, and in-country data control India is emerging as one of the world's largest inference-heavy AI markets, driven by the rise of voice agents, enterprise copilots, AI assistants, and reasoning workflows across sectors. Yet much of the infrastructure serving those workloads continues to sit outside the country. Against this backdrop, Neysa, a purpose built AI Compute and Acceleration Cloud provider, and Pipeshift, a managed inference platform for open-source AI models, today announced a partnership to launch production-grade real-time inference infrastructure fully deployed within India. As enterprises scale AI adoption across customer support, software development, analytics, operations, and enterprise workflows, inference is increasingly becoming a recurring dependency layer tied to foreign infrastructure, foreign pricing, and dollar-denominated APIs. While shared token-based APIs helped companies get started with AI adoption, many Indian enterprises are now encountering production-scale challenges around unpredictable latency, escalating token costs, shared infrastructure bottlenecks, and overseas data routing. The partnership addresses this gap by extending Velocis, Neysa's AI Acceleration Cloud System, with dedicated, low-latency real-time inference for enterprises deploying production AI applications. Pipeshift's inference platform running on Neysa's AI-native GPU infrastructure, enables enterprises to deploy single-tenant inference environments for open-source models including Gemma, Qwen, GPT-OSS, Llama, DeepSeek, and Mistral through OpenAI-compatible APIs, without managing underlying GPU infrastructure themselves. Designed for latency-sensitive workloads including voice AI, enterprise search, copilots, workflow automation, and reasoning-based systems, the platform is tuned at the kernel and inference-engine level for production traffic, dynamically auto-scales during demand spikes, and keeps prompts, inference, and enterprise data fully within India. The platform also supports workloads including speech-to-text, text-to-speech, OCR, and enterprise automation systems within a unified infrastructure environment. The platform also eliminates shared rate limits, cold-start delays, and cross-region routing overheads that often affect shared inference environments, while enabling enterprises to transition between newer GPU generations and open-source model releases without rearchitecting applications. Commenting on the announcement, Karan Kirpalani, Chief Product Officer, Neysa, said, "Scaling open-source models introduces a dual bottleneck: volatile token economics and high Time-to-First-Token (TTFT) driven by shared rate limits and cross-region routing. By integrating Pipeshift's inference-engine optimizations directly onto Neysa's single-tenant, optimized bare metal, we eliminate this friction entirely. The upshot for enterprises is a seamless, OpenAI-compatible drop-in replacement that guarantees cold-starts, predictable and highly optimized token latency, and absolute sovereign data control at scale." "There is a clear line between AI that works in a demo and AI that works in production. Crossing that line takes more than a good model. It takes infrastructure that holds latency under load and keeps costs predictable at scale. That is the line our partnership with Neysa helps Indian companies cross," said Arko Chattopadhyay, Co-Founder and CEO, Pipeshift. Typical deployment timelines from evaluation to production are under two weeks, allowing enterprises to move production AI workloads without rebuilding applications or reconfiguring existing API integrations. Early deployments on the platform already include production AI workloads across voice AI and enterprise automation. Nurix AI achieved a 3x reduction in Time to First Token (TTFT) for its voice AI deployments in India. "We needed sub-second LLM latency for voice agents in production, and real-time inference from Neysa and Pipeshift cut our TTFT 3x versus our prior setup in India," said Pushkar Patel, Nurix AI. Arrowhead AI is using the platform for multilingual inference workloads. "We fine-tuned a custom LLM for colloquial Indian languages and needed a deployment partner who could give us predictable tail latency in production. Neysa and Pipeshift had our fine-tuned model live as an inference endpoint within a day, and now also host the SLMs we use for predictive caching and the custom containers running ASR models," said Vengadanathan Srinivasan, CTO, Arrowhead AI. The platform is immediately available for enterprises evaluating production-scale open-source AI deployments across customer support, voice AI, enterprise copilots, workflow automation, and regulated AI workloads.
Share
Copy Link
Bengaluru-based Neysa and AI startup Pipeshift have partnered to launch production-grade real-time AI inference infrastructure entirely within India. The collaboration addresses growing demand for AI inference services in India, targeting a market estimated at $28-30 billion in 2025. The platform enables enterprises to deploy open-source AI models with lower latency, predictable costs, and full in-country data control.
Bengaluru GPU service provider Neysa and managed inference platform Pipeshift have announced a partnership to launch production-grade real-time inference for open-source AI models fully deployed within India
2
. The collaboration aims to tap into surging demand for AI inference services in India as enterprises scale AI adoption across customer support, software development, analytics, and enterprise workflows1
. According to Karan Kirpalani, Chief Product Officer at Neysa, India's AI inference market is estimated at $28-30 billion as of 2025, while the global market stands at approximately $125 billion1
.
Source: ET
As AI adoption in India accelerates, many enterprises face production-scale challenges including unpredictable latency, escalating token costs, shared infrastructure bottlenecks, and overseas data routing
2
. While shared token-based APIs helped companies initiate AI projects, much of the infrastructure serving these workloads continues to sit outside the country, creating dependency on foreign infrastructure and dollar-denominated APIs2
. The partnership between Neysa and Pipeshift directly addresses this gap by extending Velocis, Neysa's AI Acceleration Cloud System, with dedicated, low-latency infrastructure for production-grade AI applications2
.Pipeshift's managed inference platform running on Neysa's AI-native GPU infrastructure enables enterprises to deploy single-tenant inference environments for open-source AI models including Gemma, Qwen, GPT-OSS, Llama, DeepSeek, and Mistral through OpenAI-compatible APIs
2
. The platform is designed for latency-sensitive workloads including voice AI, enterprise search, enterprise copilots, workflow automation, and reasoning-based systems2
. Critically, the infrastructure keeps prompts, inference, and enterprise data fully within India, providing in-country data control that addresses sovereignty concerns2
.Kirpalani explained that scaling open-source models introduces a dual bottleneck: volatile token economics and high Time-to-First-Token driven by shared rate limits and cross-region routing. "By integrating Pipeshift's inference-engine optimizations directly onto Neysa's single-tenant, optimized bare metal, we eliminate this friction entirely," he said
2
. The platform eliminates shared rate limits, cold-start delays, and cross-region routing overheads that often affect shared inference environments, while dynamically auto-scaling during demand spikes2
.Related Stories
Early production deployments demonstrate the platform's capabilities for local AI systems. Nurix AI achieved a 3x reduction in Time to First Token for its voice AI deployments in India. "We needed sub-second LLM latency for voice agents in production, and real-time inference from Neysa and Pipeshift cut our TTFT 3x versus our prior setup in India," said Pushkar Patel from Nurix AI
2
. Arrowhead AI is using the platform for multilingual inference workloads, with CTO Vengadanathan Srinivasan noting that Neysa and Pipeshift had their fine-tuned model live as an inference endpoint within a day2
.Kirpalani noted that India's inference landscape requires complex AI ecosystems at significant scale and diversity. "We are seeing a lot of demand for multi-modal, multilingual AI systems that are specialised and local. The partnership between Pipeshift and Neysa addresses those concerns," he said
1
. Arko Chattopadhyay, Co-Founder and CEO of Pipeshift, emphasized the production readiness focus: "There is a clear line between AI that works in a demo and AI that works in production. Crossing that line takes more than a good model. It takes infrastructure that holds latency under load and keeps costs predictable at scale"2
. The platform is immediately available for enterprises evaluating production-scale deployments across customer support, workflow automation, and regulated AI workloads, with typical deployment timelines from evaluation to production under two weeks2
.Summarized by
Navi
22 Oct 2024•Business and Economy

16 Feb 2026•Startups

17 Feb 2026•Business and Economy

1
Business and Economy

2
Technology

3
Policy and Regulation
