2 Sources
[1]
This startup is betting tokenmaxxing will create the next compute giant | TechCrunch
"Give me tokens. Just give me tokens. I want them fast. I want them cheap. I want them now." That's the mantra for developers building software on generative AI models, or at least what Parasail CEO Mike Henry hears. Parasail provides a cloud computing service to companies running AI models for inference, and Henry told TechCrunch it generates 500 billion tokens a day. How's that for tokenmaxxing?

Henry was an executive at Groq, the LLM-focused chipmaker, where he built the company's cloud offering, an early recognition that developers building software on AI models would want cloud processing specialized to their needs. Now, after coming out of stealth a year ago, Parasail has raised a $32 million Series A to do that at scale.

Henry has a background in physical chip design, but Parasail isn't committed to owning its own chips. While some of its GPUs are its own, the company mainly rents processing time at 40 data centers in 15 countries around the globe, and buys more from liquidity markets, orchestrating that all behind the scenes to drive down the cost of inference requests. By allocating workloads cleverly and avoiding demand peaks, the company aims to compete with firms that own their own silicon and might be constrained by existing customer commitments and workloads.

The company's potential relies on the continued proliferation of open-source models and agents outside of frontier labs. Parasail's executives and investors say this is driven by the growing cost and friction of using offerings from companies like Anthropic and OpenAI. Instead, a hybrid architecture is emerging, according to Andreas Stuhlmüller, the CEO of Elicit, a startup that has raised a $22 million Series A to develop a research assistant for scientific literature. His customers at top pharmaceutical companies use the LLM-based tool to review and analyze data from tens of thousands of scientific papers.

"We've moved more towards open models because it's pretty rough sending 100,000s of requests to an API endpoint," Stuhlmüller told TechCrunch, especially now that the company is relying on agents to improve its offering, splitting up tasks and working more strategically over longer time horizons. Open models handle the initial screening to drive down the cost of the work, before a more capable frontier model provides a final answer.

The proliferation of model queries, as agents become an increasingly common part of software development, is driving the investment in companies like Parasail that provide the infrastructure for cheap inference. Samir Kumar, a partner at Touring Capital who co-led this round, told TechCrunch he expects inference to be at least 20% of the cost of building software in the future.

How much of that market could be Parasail's? In the crowded cloud compute space, Henry argues that his firm's focus on inference (no training allowed) and willingness to take on startup customers without long-term commitments sets his offering apart from larger cloud-computing companies focused on enterprise business, and even better-funded competitors in the cloud inference space, like Fireworks AI and Baseten. Of course, there's a different kind of risk when all of your customers are seed and Series B startups in the unpredictable AI sector.

Steve Jang, a partner at Kindred Ventures, the other co-leader in this fundraising, says the economics of deploying models will demand the kind of compute brokerage Parasail provides. And that's before widespread use of models for content generation and robotics. "Everyone thought there was an AI bubble. There's no AI bubble," he told TechCrunch. "Inference demand is far outstripping supply."
[2]
Parasail raises $32M for its pay-per-token inference cloud - SiliconANGLE
Artificial intelligence infrastructure startup Parasail Inc. today announced that it has raised $32 million in funding. Touring Capital and Kindred Ventures jointly led the Series A round. They were joined by several other funds including Samsung Electronics Co.'s startup investment arm.

Renting graphics processing units from a cloud provider often requires companies to sign long-term procurement agreements. That's often not practical for startups with limited resources and enterprises working on small-scale AI pilot projects. Parasail operates an inference-optimized cloud platform, the AI Supercloud, that enables customers to buy GPU capacity on a pay-per-token basis without long-term contracts.

The most advanced graphics card that the company currently offers is the H200, which Nvidia Corp. launched in early 2024. The chipmaker has since introduced two newer GPU generations that offer significantly better performance. Some of Parasail's GPUs run in internally operated clusters, while others are hosted by partners. The company reportedly has access to GPUs in 40 data centers across more than 15 countries.

Doing away with long-term GPU contracts isn't the only way that Parasail's platform streamlines inference. The company says that developers can deploy AI workloads with as few as five lines of code. Once a model is up and running, Parasail automates administrative tasks such as kernel configuration. Kernels are the GPU-optimized code snippets that make up an AI model.

Parasail enables customers to access chips in multiple ways. There are two serverless hosting options that automate much of the work usually involved in managing GPU clusters. Parasail also offers dedicated endpoints, hardware environments that trade off some simplicity for better performance. Developers can tailor their dedicated endpoints' configuration to each AI workload. Parasail makes it possible to define how and when new GPU capacity is added when traffic to an AI model grows.

Additionally, dedicated endpoints support a neural network compression method called quantization that reduces inference costs. Parasail's lineup of infrastructure offerings is rounded out by a batch processing service. It's geared towards AI workloads that process large data volumes and prioritize cost-efficiency over performance. A scientific publisher, for example, could use the service to summarize academic paper archives.

"AI is becoming the core infrastructure for modern software. But the infrastructure layer itself hasn't kept up," said Parasail founder and Chief Executive Officer Mike Henry. "We built Parasail so teams can deploy custom AI at massive scale without negotiating contracts, managing fragmented GPU supply, or hiring performance engineering teams."
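The quantization the article mentions can be illustrated with a minimal sketch: storing weights as 8-bit integers plus one scale factor instead of 32-bit floats, which shrinks model memory (and therefore serving cost) roughly 4x at a small accuracy loss. The values and helper names below are illustrative only, not Parasail's implementation.

```python
def quantize_int8(weights):
    """Symmetric int8 quantization: map each float into [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127.0
    return [round(w / scale) for w in weights], scale

def dequantize(q, scale):
    """Recover approximate float weights at inference time."""
    return [v * scale for v in q]

# Toy weight vector: ints take a quarter of the space of float32 values.
weights = [0.42, -1.27, 0.003, 0.9, -0.5]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
```

The reconstruction error is bounded by half a quantization step (about `scale / 2`), which is why int8 is usually acceptable for inference even though it would be too coarse for training.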
AI infrastructure startup Parasail has raised $32 million in Series A funding led by Touring Capital and Kindred Ventures to expand its cloud computing service for AI inference. The company processes 500 billion tokens daily across 40 data centers in 15 countries, offering developers GPU capacity without long-term contracts. Parasail's pay-per-token model targets the growing demand for cost-efficient inference as AI agents proliferate.
Parasail, an AI infrastructure startup, has closed a $32 million Series A funding round co-led by Touring Capital and Kindred Ventures, with participation from Samsung Electronics' startup investment arm and other investors [1][2]. The funding will fuel the expansion of Parasail's cloud computing service, which currently generates 500 billion tokens per day for developers running AI models [1]. Mike Henry, Parasail's founder and CEO, previously built the cloud offering at Groq, the LLM-focused chipmaker, where he recognized that developers building software on AI models would need cloud processing tailored to their needs [1].

The AI inference platform, branded the AI Supercloud, addresses a critical pain point in the market by offering GPU capacity on a pay-per-token basis without long-term contracts [2]. Traditional cloud providers often require companies to sign extended procurement agreements to rent GPUs, which is impractical for startups with limited resources and for enterprises running small-scale AI pilot projects [2]. Parasail operates across 40 data centers in more than 15 countries, renting processing time and purchasing additional capacity from liquidity markets to orchestrate cost-efficient AI inference behind the scenes [1][2].
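That behind-the-scenes orchestration is essentially a brokerage problem: route each inference job to the cheapest provider that currently has headroom. Neither article describes Parasail's actual scheduler, so the greedy routine below is only a hedged sketch of the idea, with made-up provider names, prices, and capacities.

```python
from dataclasses import dataclass

@dataclass
class Provider:
    name: str
    price_per_mtok: float  # hypothetical $ per million tokens
    free_capacity: int     # tokens of headroom right now

def route(job_tokens, providers):
    """Greedy broker: pick the cheapest provider that can absorb the job."""
    candidates = [p for p in providers if p.free_capacity >= job_tokens]
    if not candidates:
        raise RuntimeError("no capacity available; job must queue")
    best = min(candidates, key=lambda p: p.price_per_mtok)
    best.free_capacity -= job_tokens  # reserve the capacity
    return best

pool = [
    Provider("dc-eu-1", price_per_mtok=0.40, free_capacity=2_000_000),
    Provider("dc-us-3", price_per_mtok=0.25, free_capacity=500_000),
    Provider("spot-market", price_per_mtok=0.15, free_capacity=100_000),
]
# 400k tokens is too large for the cheap spot capacity,
# so the next-cheapest provider with room wins.
chosen = route(400_000, pool)
```

A real broker would also weigh latency, data residency, and spot-price volatility, which is presumably where the "clever allocation" the TechCrunch piece describes comes in.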
Parasail simplifies the deployment of AI workloads by enabling developers to launch models with as few as five lines of code [2]. The platform automates administrative tasks such as kernel configuration, streamlining workload management for developers [2]. Customers can access GPUs through multiple hosting options, including two serverless configurations that automate cluster management and dedicated endpoints that trade some simplicity for customizable performance [2]. The most advanced graphics card the company currently offers is Nvidia's H200, though Nvidia has since released two newer GPU generations [2].
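The articles don't show what those "five lines" look like, but pay-per-token inference clouds typically expose an HTTP chat-completions endpoint. The snippet below builds such a request with only the standard library; the URL, model name, and API key are placeholders, not documented Parasail values.

```python
import json
import urllib.request

API_URL = "https://api.example-inference-cloud.com/v1/chat/completions"  # placeholder
payload = {
    "model": "example-open-model",  # hypothetical hosted open model
    "messages": [{"role": "user", "content": "Summarize this abstract: ..."}],
    "max_tokens": 256,
}
req = urllib.request.Request(
    API_URL,
    data=json.dumps(payload).encode(),
    headers={"Authorization": "Bearer YOUR_API_KEY",
             "Content-Type": "application/json"},
)
# urllib.request.urlopen(req)  # not executed here: the endpoint is a placeholder
```

Because billing is per token, the `max_tokens` cap doubles as a cost ceiling for each request, which is part of what makes the pay-per-token model predictable for small teams.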
Parasail's growth strategy relies on the continued proliferation of open-source models and agents outside frontier labs, driven by the increasing cost and friction of using offerings from companies like Anthropic and OpenAI [1]. Andreas Stuhlmüller, CEO of Elicit, a startup that raised a $22 million Series A to develop a research assistant for scientific literature, explained that his company has moved toward open models because "it's pretty rough sending 100,000s of requests to an API endpoint" [1]. Elicit's customers at top pharmaceutical companies use the LLM-based tool to review and analyze data from tens of thousands of scientific papers, employing a hybrid architecture in which open models handle initial screening before frontier models provide final answers [1].
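Elicit's hybrid setup is a model cascade: a cheap open model screens every document, and only the items it flags go to the expensive frontier model. The two model functions below are trivial stubs standing in for real API calls, so this is a hedged sketch of the pattern rather than Elicit's actual pipeline.

```python
def cheap_screen(doc):
    """Stub for an open model: cheaply decide if a paper looks relevant."""
    return "oncology" in doc.lower()  # stand-in heuristic, not a real model

def frontier_analyze(doc):
    """Stub for a frontier model: expensive, called only on survivors."""
    return f"detailed analysis of: {doc}"

def cascade(papers):
    """Screen everything cheaply; escalate only the flagged subset."""
    flagged = [p for p in papers if cheap_screen(p)]
    return [frontier_analyze(p) for p in flagged]

papers = [
    "Oncology trial results, phase III",
    "Unrelated materials-science survey",
    "Pediatric oncology dosing study",
]
results = cascade(papers)  # the frontier stub runs on 2 of the 3 papers
```

The economics follow directly: if screening costs a fraction of a cent per paper and analysis costs dollars, the cascade's total cost scales with the flagged subset rather than the full corpus.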
The proliferation of model queries, as agents become an increasingly common part of software development, is driving investment in companies like Parasail that provide infrastructure for cheap inference [1]. Samir Kumar, a partner at Touring Capital, expects inference to account for at least 20% of the cost of building software in the future [1]. Steve Jang, a partner at Kindred Ventures, emphasized the market opportunity: "Everyone thought there was an AI bubble. There's no AI bubble. Inference demand is far outstripping supply" [1]. Henry argues that Parasail's focus on inference (no training allowed) and its willingness to serve startup customers without long-term commitments distinguish the company from larger cloud-computing firms focused on enterprise business and from better-funded competitors like Fireworks AI and Baseten [1]. "AI is becoming the core infrastructure for modern software. But the infrastructure layer itself hasn't kept up," Henry said [2].

Summarized by Navi
03 Apr 2025•Technology
