Inferact raises $150M to commercialize vLLM and build next-gen AI inference infrastructure


The creators of vLLM have launched Inferact with $150 million in seed funding at an $800 million valuation, led by Andreessen Horowitz and Lightspeed Venture Partners. As AI shifts from training to deployment, technologies that make AI inference faster and more affordable are attracting significant investor attention. The startup plans to enhance the open-source project while building commercial infrastructure.

Inferact Secures Major Funding to Transform AI Inference

Inferact has emerged from stealth with $150 million in seed funding at an $800 million valuation, marking one of the most significant early-stage raises in the AI infrastructure space [1]. The round was co-led by Andreessen Horowitz and Lightspeed Venture Partners, with participation from Sequoia Capital, Altimeter Capital, Redpoint Ventures, ZhenFund, Databricks' venture capital arm, and the UC Berkeley Chancellor's Fund [2][3]. The startup was founded by the maintainers of vLLM, the leading open-source inference engine that has become essential infrastructure for AI model deployment across the industry.

Source: AIM

From Open Source to Commercial Powerhouse

Inferact CEO Simon Mo, along with co-founders Woosuk Kwon, Kaichao You, and Roger Wang, built vLLM in 2023 at UC Berkeley's Sky Computing Lab under the guidance of Databricks co-founder Ion Stoica [3]. The decision to commercialize vLLM reflects a broader industry trend as AI shifts from training models to deploying them in applications through inference [1]. The open-source project has attracted over 2,000 contributors and supports more than 500 model architectures and 200 accelerator types [2]. Production users include Meta, Google, Character AI, Amazon's cloud service, and the shopping app [1][2].

Technical Innovation Driving Inference Performance

The vLLM project addresses critical bottlenecks in AI inference through sophisticated memory management and optimization techniques. When an LLM processes a prompt, it performs its calculations incrementally and saves intermediate results to a KV cache, which traditionally demands a large, contiguous memory reservation [3]. PagedAttention, the project's signature technique, instead stores KV cache data in non-adjacent blocks of memory, significantly reducing memory waste and lowering hardware requirements for large language models (LLMs) [3].
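To make the idea concrete, here is a minimal, hypothetical Python sketch of a PagedAttention-style block table. It is illustrative only: vLLM's real implementation manages GPU memory with CUDA kernels rather than Python lists, and every name below is invented for the example.

```python
# Toy model of PagedAttention-style paging: a block table maps each
# sequence's logical token positions onto physical cache blocks that
# need not be adjacent in memory. Illustrative only; names are invented.

BLOCK_SIZE = 16  # tokens stored per physical block (a common vLLM default)

class BlockTable:
    def __init__(self, free_pool):
        self.free_pool = free_pool  # shared pool of physical block IDs
        self.blocks = []            # this sequence's blocks, in logical order

    def slot_for(self, position):
        """Return (physical_block, offset) for token `position`'s KV entry,
        allocating a fresh block only when the current one fills up."""
        if position % BLOCK_SIZE == 0 and position // BLOCK_SIZE == len(self.blocks):
            self.blocks.append(self.free_pool.pop())  # grab any free block
        return self.blocks[position // BLOCK_SIZE], position % BLOCK_SIZE

# Two concurrent sequences draw from one pool: neither needs a large
# contiguous reservation sized for its maximum possible length.
pool = list(range(8))
seq_a, seq_b = BlockTable(pool), BlockTable(pool)
for pos in range(18):
    seq_a.slot_for(pos)
    seq_b.slot_for(pos)
print("seq A blocks:", seq_a.blocks)  # non-adjacent IDs, e.g. [7, 5]
print("seq B blocks:", seq_b.blocks)  # e.g. [6, 4]
```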

vLLM also employs quantization to compress models' weights, and it lets models generate multiple tokens per step rather than one at a time, reducing response times [3].
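As a hedged sketch of how these options surface in practice, the snippet below uses vLLM's documented offline Python API. The checkpoint name is a placeholder; running it assumes a model published in AWQ quantized format.

```python
# Sketch: loading a quantized model with vLLM's offline Python API.
# The checkpoint is a placeholder; any AWQ-format model would do.
from vllm import LLM, SamplingParams

llm = LLM(
    model="TheBloke/Llama-2-7B-Chat-AWQ",  # pre-quantized weights (assumption)
    quantization="awq",                     # run the compressed weights
)
params = SamplingParams(temperature=0.8, max_tokens=64)

# generate() batches prompts and returns one RequestOutput per prompt.
outputs = llm.generate(["Explain KV caching in one sentence."], params)
print(outputs[0].outputs[0].text)
```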

Source: SiliconANGLE

Building the Next-Generation Commercial Inference Engine

Inferact plans to develop a next-generation commercial inference engine that makes deploying AI models as simple as spinning up a serverless database [2]. "We see a future where serving AI becomes effortless. Today, deploying a frontier model at scale requires a dedicated infrastructure team. Tomorrow, it should be as simple as spinning up a serverless database. The complexity doesn't disappear; it gets absorbed into the infrastructure we're building," Woosuk Kwon posted [2]. Job postings indicate the company will equip its software with observability, troubleshooting, and disaster recovery features, likely running on Kubernetes [3].
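vLLM already gestures at that experience today: its server exposes an OpenAI-compatible HTTP API, so a deployed model can be queried with a stock client. A brief sketch, assuming a server started locally with `vllm serve <model>` (the URL and model name are placeholders):

```python
# Querying a vLLM deployment through its OpenAI-compatible endpoint.
# Assumes a server started with: vllm serve meta-llama/Llama-3.1-8B-Instruct
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # placeholder address for the vLLM server
    api_key="EMPTY",                      # vLLM accepts a dummy key by default
)

resp = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # must match the served model
    messages=[{"role": "user", "content": "What makes inference hard at scale?"}],
)
print(resp.choices[0].message.content)
```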

Source: TechCrunch

Strategic Positioning in a Competitive Landscape

Inferact's debut mirrors the recent commercialization of SGLang as RadixArk, which secured capital at a $400 million valuation in a round led by Accel [1]. Both projects were incubated in Ion Stoica's UC Berkeley lab, underscoring the university's outsized role in producing critical AI infrastructure [1]. Andreessen Horowitz emphasized that its investment represents "an explicit bet that the future will bring incredible diversity of AI apps, agents, and workloads running on a variety of hardware platforms" [2]. The funding will give the team the financial and developer resources to handle increasing model complexity, hardware diversity, and deployment scale, while continuing to expand hardware support for emerging architectures [2]. Inferact is actively hiring engineers and researchers to work at the frontier of inference, where models meet hardware at scale [2].
