Sail Research raises $80M to slash costs for running AI agents by up to 10 times

Sail Research, founded by ex-Apple and ex-NVIDIA engineers, has emerged from stealth with $80 million in funding to tackle one of AI's most expensive problems. The startup claims it can serve tokens at up to 10 times lower cost by building AI infrastructure specifically for autonomous AI agents that run for hours or days, rather than quick chatbot exchanges.

Ex-Apple Engineers Target AI's Most Expensive Problem

Sail Research has emerged from stealth with $80 million in funding at a $450 million valuation to rethink how AI infrastructure supports autonomous AI agents [1][2]. The San Francisco startup, founded by ex-Apple engineers Neil Movva and Samir Menon, raised combined seed and Series A rounds led by Sequoia and Kleiner Perkins, respectively, with participation from Redpoint Ventures, Theory Ventures, Vine Ventures, CRV, A*, and Abstract Ventures [1]. The company's pitch addresses a pressing industry challenge: enterprise AI bills have tripled even as per-token prices have fallen, because agentic workflows consume tokens at 50 to 500 times the rate of simple chat interactions [2].
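The arithmetic behind that claim can be sketched in a few lines, assuming a bill is simply tokens consumed multiplied by the per-token price. The function name and the tenfold price drop below are illustrative assumptions, not figures from Sail or its investors; only the 50x and 500x multipliers come from the reporting above.

```python
def bill_ratio(token_multiplier: float, price_multiplier: float) -> float:
    """Return new_bill / old_bill, assuming bill = tokens * price_per_token."""
    return token_multiplier * price_multiplier


# Illustrative assumption: the per-token price falls to one tenth of its old level.
PRICE_MULTIPLIER = 0.1

# 50x and 500x are the token-consumption multipliers cited for agentic workflows.
for token_multiplier in (50, 500):
    ratio = bill_ratio(token_multiplier, PRICE_MULTIPLIER)
    print(f"{token_multiplier}x tokens at {PRICE_MULTIPLIER}x price: bill changes by {ratio:.1f}x")
```

Under this simple model, the bill shrinks only when the price falls by a larger factor than token consumption grows, which is why cheaper tokens and larger invoices can coexist.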

Source: Fortune

Source: Fortune

Why Running AI Agents Cheaper Matters Now

The cost challenges of AI agent deployments stem from a fundamental mismatch in design philosophy. Current AI infrastructure was built for humans waiting at a prompt, where speed matters most. AI agents operate differently: they work autonomously for hours or days, executing thousands of concurrent calls on a single task and burning through billions of tokens in the process [1]. Goldman Sachs forecasts a 24-fold increase in token consumption by 2030, making the economics of long-running agents a critical bottleneck for enterprise adoption [2]. This gap between infrastructure capabilities and agent requirements is what stops many promising applications from leaving the lab.

Optimizing for Throughput, Not Speed

Sail Research's approach centers on a deliberate architectural trade-off: sacrificing real-time responsiveness to maximize computing efficiency. "Most inference infrastructure was designed to minimise latency on a single request, but that's the wrong optimisation for agents," said Samir Menon, co-founder and CTO [1]. The company rebuilt its inference engine from the chip level up, prioritizing throughput over latency so that it can sustain thousands of calls over extended periods without spiraling costs [1]. Movva, who previously worked at NVIDIA, Apple, and Together AI, claims customers often see cost improvements of between 3x and 10x over comparable alternatives [2].
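The general throughput-versus-latency trade-off can be illustrated with a toy batching model. This is a sketch under stated assumptions, not a description of Sail's engine: it assumes each decoding step has a fixed overhead plus a small per-sequence cost, and the millisecond constants are hypothetical values chosen only to show the shape of the curve.

```python
def step_time_ms(batch_size: int, fixed_ms: float = 20.0, per_seq_ms: float = 0.5) -> float:
    """Time for one decoding step that emits one token per sequence in the batch.

    fixed_ms and per_seq_ms are hypothetical constants for this toy model.
    """
    return fixed_ms + per_seq_ms * batch_size


def tokens_per_second(batch_size: int) -> float:
    """Aggregate throughput: every step produces batch_size tokens."""
    return batch_size * 1000.0 / step_time_ms(batch_size)


for batch_size in (1, 8, 64, 256):
    latency = step_time_ms(batch_size)       # what a single waiting user feels per token
    throughput = tokens_per_second(batch_size)  # what the hardware delivers overall
    print(f"batch={batch_size:>3}  per-token latency={latency:6.1f} ms  throughput={throughput:8.1f} tok/s")
```

In this model, larger batches make each individual request slower while the hardware produces far more tokens per second overall, which is the direction of the trade-off Menon describes for agents that do not need an instant reply.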

The platform delivers two core products. The first is a specialized inference engine that customizes open-source tools to push GPU performance toward maximum efficiency while spreading workloads across providers for resilience [1]. The second is "Sailboxes": sandbox environments designed to run for hours or days that charge only for active working time, eliminating the dead-time costs that accumulate during long tasks. Sail's inference engine topped BrowseComp-Plus, a deep-research evaluation, hitting 90.72% accuracy at up to 10 times lower cost than leading alternatives [1].
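The difference between billing for wall-clock time and billing only for active time can be sketched as follows. The hourly rate, the interval format, and the function names are hypothetical; Sail's actual pricing and metering are not described in the reporting.

```python
from typing import List, Tuple

# Each interval is (start_hour, end_hour) during which the sandbox was actively working.
Interval = Tuple[float, float]


def wall_clock_cost(total_hours: float, rate_per_hour: float) -> float:
    """Cost when the sandbox is billed for its entire lifetime."""
    return total_hours * rate_per_hour


def active_time_cost(active_intervals: List[Interval], rate_per_hour: float) -> float:
    """Cost when only the active intervals are billed (assumes no overlaps)."""
    active_hours = sum(end - start for start, end in active_intervals)
    return active_hours * rate_per_hour


# Hypothetical 10-hour task that is busy for 3 hours and idle the rest of the time.
intervals = [(0.0, 1.0), (4.0, 5.5), (9.5, 10.0)]
RATE = 1.0  # hypothetical rate in dollars per hour

print(f"wall-clock billing: ${wall_clock_cost(10.0, RATE):.2f}")
print(f"active-time billing: ${active_time_cost(intervals, RATE):.2f}")
```

The gap between the two figures grows with the share of time an agent spends waiting, for example on tool calls or model responses.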

Early Traction in the Agentic AI Era

Sail launched its inference service in March and has already scaled to processing trillions of tokens per week [2]. Early customer Detail.dev uses Sail to run code-review agents that spend three to four hours, sometimes longer, analyzing entire codebases for bugs that quick reviews miss. "The abundance of tokens that we provide lets them be maximally ambitious in how they scan through code bases," Movva told Fortune [2]. The platform's API integrates with existing OpenAI workflows and supports open models including DeepSeek, Gemma, GLM, Kimi, and Nemotron [1].

The company's thesis, which it calls "abundant intelligence," argues that the more compute and context an agent receives, the better its output. The challenge is making that compute cheap enough to distribute freely [1]. "Sail exists to make intelligence abundant," said Movva. "Every decision we make, from the chip level to the API, is about giving teams the tokens, the scale, and the runtime to build agents without limits" [1].

Betting on Infrastructure for a Different Future

The investor lineup signals confidence in Sail's direction. Kleiner Perkins partner Aditya Naganath, who led the Series A, had been developing an investment thesis that the next wave of AI would center on software working autonomously rather than on chatbots. "The infrastructure layer for the agent era is one of the most important bets in AI right now," Naganath said [1]. The company also attracted notable angels including John Hennessy, chairman of Alphabet; Lip-Bu Tan, CEO of Intel; and Tri Dao, chief scientist at Together AI, along with individuals from Anthropic, OpenAI, SpaceX, and Thinking Machines [1].

The timing aligns with broader market dynamics. Inference has become the most valuable layer in AI infrastructure, as evidenced by Nebius recently paying $643 million for the 20-person startup Eigen AI [1]. Token prices have stayed flat or risen over the past six months despite earlier predictions of continued decline, while demand for compute grows faster than supply [2]. However, competitive risks loom. Frontier labs like Anthropic, OpenAI, and Google are building their own inference infrastructure, potentially commoditizing the layer Sail targets. Movva's counter focuses on specialization: "We feel an emotional pain when we see a GPU be idle or wasted in any way," he said [2].

© 2026 TheOutpost.AI All rights reserved