2 Sources
[1]
Sail raises $80M to make AI agents cheaper to run
Sail Research has raised $80m to make AI agents cheaper to run. The startup, founded by ex-Apple and ex-NVIDIA engineers, says it can serve the tokens agents burn through at up to 10 times lower cost.

AI agents are hungry. Leave one running for hours and it can chew through billions of tokens on a single task. That gets expensive fast, and the bill is what stops many agents from leaving the lab. A new startup called Sail Research thinks it can fix the economics.

Sail has raised $80m in combined seed and Series A funding at a $450m valuation. Sequoia led the seed round and Kleiner Perkins led the Series A. Redpoint Ventures, Theory Ventures, Vine Ventures, CRV, A* and Abstract Ventures also joined.

The angel list reads like its own headline. It includes John Hennessy, the chairman of Alphabet, Lip-Bu Tan, the chief executive of Intel, and Tri Dao, the chief scientist at Together AI. The San Francisco company also drew angels from Anthropic, OpenAI, SpaceX and Thinking Machines.

Built for agents, not people

Sail's pitch starts with a simple observation. Engineers built today's AI infrastructure for a human waiting at a prompt. That user wants one thing above all: speed. An agent is different. It works on its own for hours or days, and it cares about scale, reliability and cost.

That gap is the whole opportunity. A person needs a fast reply. An agent needs to sustain thousands of calls over a long stretch without the price spiralling. Sail argues the existing stack optimises for the wrong thing.

"Most inference infrastructure was designed to minimise latency on a single request, but that's the wrong optimisation for agents," said Samir Menon, co-founder and chief technology officer. Agents, he says, need to hold throughput across thousands of concurrent calls over hours. Sail rebuilt the stack around that constraint.

The thesis has a name. Sail calls it "abundant intelligence," the idea that the more compute and context an agent gets, the better its work. The job is to make that compute cheap enough to hand over freely.

How it claims to cut the cost

Sail sells two things. First comes the inference engine. Sail rebuilt it for throughput, not speed, to serve agents spending billions of tokens on one task. The company claims it delivers up to 10 times lower cost per token than rivals.

The second is a sandbox it calls Sailboxes. These environments run for hours or days, not seconds. Crucially, they only charge for the time an agent is actually working, which trims the dead-time costs that pile up on long tasks.

The savings come from squeezing the whole stack. Sail customises open-source inference engines to push GPU performance toward the frontier. It spreads workloads across providers for resilience. It also hunts for cheap, underused compute wherever it sits.

There is a benchmark to point to. Sail says its inference topped BrowseComp-Plus, a deep-research evaluation. It hit 90.72% accuracy at up to 10 times lower cost than leading alternatives.

The platform also plugs in easily. Its API works with existing OpenAI workflows and supports open models including DeepSeek, Gemma, GLM, Kimi and Nemotron.

The founders and the bet

The team comes from the hardware side of AI. Co-founder and chief executive Neil Movva spent years at NVIDIA pushing GPU performance to its limits, then worked on infrastructure at Apple and Together AI. Menon also comes from Apple, where he built systems at large scale.

That background shapes the product. Sail's edge, the founders argue, comes from tight integration all the way from the silicon to the API. Control the full path and you can open up the trade-off between cost and latency in a way a single layer cannot.

"Sail exists to make intelligence abundant," Movva said. "Every decision we make, from the chip level to the API, is about giving teams the tokens, the scale, and the runtime to build agents without limits." The framing is deliberately big. The company wants to sound like plumbing for a much larger future.

Kleiner Perkins is buying the premise. "The infrastructure layer for the agent era is one of the most important bets in AI right now," said partner Aditya Naganath. He praised the founders' mix of compute expertise and systems rigour, the kind that comes from building at the limits of scale.

A crowded, costly market

The timing fits a clear trend. Inference, the cost of actually running a model, has become the most valuable layer in AI infrastructure. Nebius recently paid $643m for the 20-person startup Eigen AI, a sign of how badly the industry wants people who can make chips produce more tokens for less.

The money is chasing a real problem. Token prices have collapsed, yet enterprise AI bills have tripled, because agents consume so many more tokens per task. Cutting the price per token is one of the few levers that bends the curve back down.

Sail is not alone in pulling it. Others attack the same cost from different angles. Fractile is building inference chips as an alternative to NVIDIA, while GPU clouds like RunPod rent raw compute by the hour. The layer is filling up fast.

The capital backs that up. Inference specialist Baseten recently raised $1.5bn at a valuation as high as $13bn. Against those numbers, Sail's $450m valuation looks modest, which leaves it plenty of room to grow if the thesis holds.

The open question

The backdrop is enormous. Forecasters expect global AI spending to hit $2.5tn in 2026, yet the most ambitious agent workloads remain out of reach for most companies. Sail wants to be the reason that changes.

It already has paying customers to point to. The web-data firm Parallel, the code-review platform Detail.dev and the startup Jack and Jill all run on Sail. Detail.dev says it has pushed trillions of tokens through the platform and likes the economics.

The risk is that efficiency is a moving target. Every rival is chasing the same 10x, and frontier labs keep cutting their own prices. A cost edge built on clever engineering can erode as the whole field gets cheaper. Sail is betting its full-stack approach is harder to copy than a single trick.

If agents really do become the main way AI gets used, the company that makes them affordable to run could matter enormously. Whether that company is Sail, at the scale of trillions of tokens, is the question this round leaves open.
[2]
Exclusive: A former Apple engineer thinks AI infrastructure is built for the wrong future. Investors just gave him $80 million to fix it | Fortune
For months, Kleiner Perkins partner Aditya Naganath had been mulling over his investing thesis that the next wave of AI wasn't going to be a chatbot -- it was going to be software that does the work autonomously, for hours at a time, across thousands of tasks at once. The trouble was, nobody had built the plumbing for it yet. Then he met Neil Movva.

"It felt obvious to both of us that you're going to need a different, specific inference platform built for these long-running agents," Naganath told Fortune.

Now, six months after Naganath and Movva first chatted, Movva's startup, Sail Research, has launched from stealth with $80 million in seed and Series A funding at a $450 million valuation, Fortune learned exclusively. Kleiner Perkins led the Series A. Sequoia, Redpoint, Theory Ventures, Vine Ventures, and CRV also participated.

Sail Research wants to fix one of AI's expensive problems. AI infrastructure was designed for quick, single exchanges -- think a chatbot answering a question. But enterprises are increasingly deploying AI agents that run autonomously for hours, reading entire codebases, screening hundreds of job candidates, or researching complex topics without a human in the loop. At that scale, enterprise AI bills have tripled even as per-token prices have fallen, because agentic workflows consume tokens at a rate 50 to 500 times higher than simple chat. Goldman Sachs forecasts a 24-fold increase in token consumption by 2030.

Movva's solution is an end-to-end infrastructure platform built from the lowest level of the chip up. Sail writes the software that orchestrates and optimizes how AI models run on existing chips. Think of it like a highly efficient traffic system that tells the hardware exactly how to allocate its resources, squeezing far more work out of the same physical computing power.

Most AI serving platforms optimize for low latency, meaning they prioritize getting you an answer fast. Sail does the opposite, sacrificing real-time responsiveness to pack far more computing work into every unit of power. The tradeoff is deliberate: Sail can't power a voice assistant or a live chatbot. But for agents that run for hours? Movva claims customers often see between 3x and 10x cost improvements over comparable alternatives.

"We only care about efficiency," Movva told Fortune. "It's quite difficult to build an inference engine for both throughput and latency at the same time. Everyone else is optimizing for latency, and we just care about throughput."

Movva, 28, is one of a small number of engineers who have worked at every meaningful layer of the AI stack. He watched NVIDIA pivot from gaming chips to AI silicon in 2016 and 2017. He joined Apple to work on the chip powering computer vision on a billion iPhones -- then grew frustrated that Apple's ambition topped out at animoji (the animated characters users can apply on FaceTime). From there, he went to Together AI, one of the leading open-source model inference providers, to get back to GPU-level work. What he saw there crystallized Sail's thesis: Together had been built for interactive applications and had made every architectural trade-off accordingly. Long-horizon agents needed something built from scratch with different priorities.

Co-founder and CTO Samir Menon also comes from Apple, where he worked in security engineering at scale. The two met on the first day of freshman year at Stanford -- they took the same classes, and saw the same academic counselor. Movva jokes that Menon got slightly better grades. They reunited in late 2025 to rebuild the inference stack from scratch.

Sail launched its inference service in March and has already ramped to processing trillions of tokens per week. One early customer, Detail.dev, uses Sail to run code-review agents that spend three to four hours -- sometimes longer -- digging through an entire codebase hunting for bugs that five-minute reviews miss. "The abundance of tokens that we provide lets them be maximally ambitious in how they scan through code bases," Movva said.

But the competitive risk is real. Together AI is a formidable incumbent, and it's also a Kleiner Perkins portfolio company. Naganath's view is that the two are not in conflict: Together owns the interactive, chat-based market; Sail owns the long-running agent workload. "Being specific and purpose-built should win out in the long run," he said.

The larger threat may come from the frontier labs -- Anthropic, OpenAI, and Google -- which are building their own inference infrastructure and could, in theory, commoditize the layer Sail is betting on. Movva's counter: token prices have been flat or rising for six months, demand for compute is growing faster than supply, and the world needs someone focused obsessively on squeezing the most intelligence out of every available GPU. "We feel an emotional pain when we see a GPU be idle or wasted in any way," he said.

Naganath's bull case is simple: "The belief that inference is going to be a 10x -- even 100x -- bigger market than it is today."
Sail Research, founded by ex-Apple and ex-NVIDIA engineers, has emerged from stealth with $80 million in funding to tackle one of AI's most expensive problems. The startup claims it can serve tokens at up to 10 times lower cost by building AI infrastructure specifically for autonomous AI agents that run for hours or days, rather than quick chatbot exchanges.
Sail Research has emerged from stealth with $80 million in funding at a $450 million valuation to rethink how AI infrastructure supports autonomous AI agents [1][2]. The San Francisco startup, founded by ex-Apple engineers Neil Movva and Samir Menon, secured combined seed and Series A rounds led by Sequoia and Kleiner Perkins respectively, with participation from Redpoint Ventures, Theory Ventures, Vine Ventures, CRV, A*, and Abstract Ventures [1]. The company's pitch addresses a pressing industry challenge: enterprise AI bills have tripled even as per-token prices have fallen, because agentic workflows consume tokens at a rate 50 to 500 times higher than simple chat interactions [2].
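The arithmetic behind that claim can be sketched in a few lines. This is an illustration only, not a model from either source: it assumes a bill is simply tokens consumed times price per token, and that an entire workload moves from chat to agents. The function names are hypothetical, and the only figures taken from the reporting are the 3x bill increase, the 50x to 500x token range, and the 10x upper end of Sail's claimed saving.

```python
def relative_bill(token_multiplier: float, price_multiplier: float) -> float:
    """Bill relative to a chat baseline, assuming bill = tokens * price per token."""
    return token_multiplier * price_multiplier


def implied_price_multiplier(token_multiplier: float, bill_multiplier: float) -> float:
    """Per-token price change that would yield the given bill change."""
    return bill_multiplier / token_multiplier


if __name__ == "__main__":
    for tokens in (50, 500):  # token-consumption range quoted in the text
        # Price level consistent with a tripled bill under the stated assumption.
        price = implied_price_multiplier(tokens, bill_multiplier=3.0)
        print(f"{tokens}x tokens, 3x bill -> price at {price:.1%} of baseline")
        # Hypothetical further 10x per-token cut (upper end of Sail's claim).
        after_cut = relative_bill(tokens, price / 10)
        print(f"  after a 10x cut: bill at {after_cut:.2f}x baseline")
```

Under these assumptions, even a steep fall in per-token price is outpaced by the growth in tokens consumed, which is why a further cut in cost per token is the lever the company is selling.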
The cost challenges of AI agent deployments stem from a fundamental mismatch in design philosophy. Current AI infrastructure was built for humans waiting at a prompt, where speed matters most. AI agents operate differently: they work autonomously for hours or days, sustaining thousands of concurrent calls and burning through billions of tokens on a single task [1]. Goldman Sachs forecasts a 24-fold increase in token consumption by 2030, making the economics of long-running agents a critical bottleneck for enterprise adoption [2]. This gap between infrastructure capabilities and agent requirements is what stops many promising applications from leaving the lab.

Sail Research's approach centers on a deliberate architectural trade-off: sacrificing real-time responsiveness to maximize computing efficiency. "Most inference infrastructure was designed to minimise latency on a single request, but that's the wrong optimisation for agents," said Samir Menon, co-founder and CTO [1]. The company rebuilt its inference stack from the chip level up, prioritizing throughput over latency so that agents can sustain thousands of calls over extended periods without costs spiraling [1][2]. Movva, who previously worked at NVIDIA, Apple, and Together AI, claims customers often see cost improvements of 3x to 10x over comparable alternatives [2].

The platform delivers two core products. The first is an inference engine, built by customizing open-source inference engines to push GPU performance toward the frontier, with workloads spread across providers for resilience [1]. The second is "Sailboxes," sandbox environments designed to run for hours or days that charge only for the time an agent is actually working, trimming the dead-time costs that accumulate during long tasks [1]. Sail says its inference topped BrowseComp-Plus, a deep-research evaluation, reaching 90.72% accuracy at up to 10 times lower cost than leading alternatives [1].

Sail launched its inference service in March and has already scaled to processing trillions of tokens per week [2]. Early customer Detail.dev uses Sail to run code-review agents that spend three to four hours, sometimes longer, analyzing entire codebases for bugs that five-minute reviews miss. "The abundance of tokens that we provide lets them be maximally ambitious in how they scan through code bases," Movva told Fortune [2]. The platform's API works with existing OpenAI workflows and supports open models including DeepSeek, Gemma, GLM, Kimi, and Nemotron [1].

The company's thesis, which it calls "abundant intelligence," holds that the more compute and context an agent receives, the better its output. The challenge is making that compute cheap enough to hand over freely [1]. "Sail exists to make intelligence abundant," said Movva. "Every decision we make, from the chip level to the API, is about giving teams the tokens, the scale, and the runtime to build agents without limits." [1]

The investor lineup signals confidence in Sail's direction. Kleiner Perkins partner Aditya Naganath, who led the Series A, had been developing an investment thesis that the next wave of AI would be software that works autonomously rather than chatbots [2]. "The infrastructure layer for the agent era is one of the most important bets in AI right now," Naganath said [1]. The company also attracted notable angels including John Hennessy, chairman of Alphabet; Lip-Bu Tan, CEO of Intel; and Tri Dao, chief scientist at Together AI, along with individuals from Anthropic, OpenAI, SpaceX, and Thinking Machines [1].

The timing aligns with broader market dynamics. Inference has become the most valuable layer in AI infrastructure, as evidenced by Nebius recently paying $643 million for the 20-person startup Eigen AI [1]. Movva argues that token prices have been flat or rising for six months and that demand for compute is growing faster than supply [2]. However, competitive risks loom. Frontier labs such as Anthropic, OpenAI, and Google are building their own inference infrastructure and could commoditize the layer Sail targets [2]. Movva's counter rests on specialization, a single-minded focus on squeezing the most out of every available GPU: "We feel an emotional pain when we see a GPU be idle or wasted in any way," he said [2].

Summarized by Navi