3 Sources
[1]
Inference startup Inferact lands $150M to commercialize vLLM
The creators of the open-source project vLLM have announced that they have turned the popular tool into a VC-backed startup, Inferact, raising $150 million in seed funding at an $800 million valuation. The round was co-led by Andreessen Horowitz and Lightspeed Venture Partners, confirming TechCrunch's earlier reporting that vLLM had raised capital from a16z. Inferact's debut mirrors the recent commercialization of the SGLang project as RadixArk, which sources told us secured capital at a $400 million valuation led by Accel, as we reported on Wednesday. As the focus in AI shifts from training models to deploying them in applications, a process known as inference, technologies like vLLM and SGLang that make these AI tools run faster and more affordably are attracting investor attention. Both vLLM and SGLang were incubated in 2023 at the UC Berkeley lab of Databricks co-founder Ion Stoica. Inferact CEO Simon Mo, one of the project's original creators, told Bloomberg that existing users of vLLM include Amazon's cloud service and the shopping app.
[2]
Andreessen-Backed Inferact Raises $150 Mn to Develop Next-Gen Commercial Inference Engine | AIM
Inferact, an AI startup founded by vLLM project maintainers Simon Mo, Woosuk Kwon, Kaichao You, and Roger Wang, has secured $150 million in seed funding, valuing the company at $800 million. The round was led by venture capital firms Andreessen Horowitz (a16z) and Lightspeed, with support from Sequoia Capital, Altimeter Capital, Redpoint Ventures, and ZhenFund, the company announced on January 22.

According to the company, vLLM is a key player at the intersection of models and hardware, collaborating with vendors to provide immediate support for new architectures and silicon. It supports over 500 model architectures and 200 accelerator types, with a strong ecosystem of more than 2,000 contributors. The company aims to support vLLM's growth by providing financial and developer resources to handle increasing model complexity, hardware diversity, and deployment scale.

"We see a future where serving AI becomes effortless. Today, deploying a frontier model at scale requires a dedicated infrastructure team. Tomorrow, it should be as simple as spinning up a serverless database. The complexity doesn't disappear; it gets absorbed into the infrastructure we're building," Kwon posted on X.

The startup also plans to develop a next-generation commercial inference engine that works with existing providers to improve software performance and flexibility. vLLM is the leading open-source inference engine and one of the largest open-source projects of any kind, used in production by companies like Meta, Google, Character AI, and many others. The team plans to further enhance vLLM's performance, deepen support for emerging model architectures, and expand coverage across advanced hardware. They believe the AI industry requires inference infrastructure that is not confined within proprietary limitations.

"For a16z infra, investing in the vLLM community is an explicit bet that the future will bring incredible diversity of AI apps, agents, and workloads running on a variety of hardware platforms," a16z said on X. Inferact is also hiring engineers and researchers to work at the frontier of inference, "where models meet hardware at scale," Kwon said.
[3]
Inferact launches with $150M in funding to commercialize vLLM
A group of artificial intelligence researchers today launched Inferact Inc., a new startup that will commercialize the open-source vLLM project. The company is backed by $150 million in seed funding. Andreessen Horowitz and Lightspeed led the round with participation from Databricks Inc.'s venture capital arm, the UC Berkeley Chancellor's Fund and several other backers. Their investment values Inferact at $800 million.

Inferact's founding team includes computer science professor and Databricks co-founder Ion Stoica. He is currently the director of the University of California at Berkeley's Sky Computing Lab, which developed the original version of vLLM in 2023. The project's pool of code contributors has since grown to more than 2,000 developers.

Software teams use vLLM to speed up inference workloads. The tool boosts performance by applying a long list of optimizations to large language models. Many of those optimizations, including a particularly important vLLM feature called PagedAttention, focus on reducing models' memory use. When an LLM receives a prompt, it completes a small portion of the calculations needed to produce an answer and saves the results to a so-called KV cache. It then performs another portion of the calculations, updates the KV cache with the new results and repeats the process until a prompt response is generated. Storing all those results requires a significant amount of memory. PagedAttention makes it possible to store KV cache data in non-adjacent sections of a server's RAM. That feature and certain other capabilities significantly reduce memory waste, which lowers LLMs' hardware consumption. In addition, vLLM uses a method called quantization to compress AI models' weights and thereby shrink their memory footprint.

Besides optimizing RAM use, vLLM can also boost inference speeds. LLMs usually generate prompt responses one token at a time. With vLLM, developers can configure their models to generate multiple tokens at once to reduce loading times for users.

"We see a future where serving AI becomes effortless," Inferact co-founder Woosuk Kwon wrote in a blog post. "Today, deploying a frontier model at scale requires a dedicated infrastructure team. Tomorrow, it should be as simple as spinning up a serverless database. The complexity doesn't disappear; it gets absorbed into the infrastructure we're building."

The blog post hints that Inferact plans to launch a paid serverless version of vLLM. Many startups focused on commercializing open-source projects take that route. Usually, managed versions of open-source technologies automate administrative tasks such as provisioning infrastructure and downloading updates. An Inferact job posting indicates that it plans to equip its software with observability, troubleshooting and disaster recovery features. The listing suggests that the software will run on Kubernetes.

Kwon wrote in today's blog post that the Inferact team, which includes several core vLLM maintainers, will also enhance the upstream open-source version. The company plans to release new performance optimizations and support for emerging AI model architectures. Additionally, Inferact will enable vLLM to run on more types of data center hardware.
The creators of vLLM have launched Inferact with $150 million in seed funding at an $800 million valuation, in a round co-led by Andreessen Horowitz and Lightspeed Venture Partners. As AI shifts from training to deployment, technologies that make AI inference faster and more affordable are attracting significant investor attention. The startup plans to enhance the open-source project while building commercial infrastructure.
Inferact has emerged from stealth with $150 million in seed funding at an $800 million valuation, marking one of the most significant early-stage raises in the AI infrastructure space [1]. The round was co-led by Andreessen Horowitz and Lightspeed Venture Partners, with participation from Sequoia Capital, Altimeter Capital, Redpoint Ventures, ZhenFund, Databricks' venture capital arm, and the UC Berkeley Chancellor's Fund [2][3]. The startup was founded by the maintainers of vLLM, the leading open-source inference engine, which has become essential infrastructure for AI model deployment across the industry.
Inferact CEO Simon Mo, along with co-founders Woosuk Kwon, Kaichao You, and Roger Wang, built vLLM at UC Berkeley's Sky Computing Lab under the guidance of Databricks co-founder Ion Stoica in 2023 [3]. The decision to commercialize vLLM reflects a broader industry trend as the focus in AI shifts from training models to deploying them in applications, a process known as inference [1]. The open-source project has attracted over 2,000 contributors and supports more than 500 model architectures and 200 accelerator types [2]. Production users include Meta, Google, Character AI, Amazon's cloud service, and the shopping app [1][2].

The vLLM project addresses critical bottlenecks in AI inference through sophisticated memory management and optimization techniques. When an LLM processes a prompt, it performs its calculations incrementally and saves the intermediate results to a KV cache, which consumes substantial memory [3]. PagedAttention, a particularly important feature, makes it possible to store KV cache data in non-adjacent sections of a server's RAM, significantly reducing memory waste and lowering hardware consumption for large language models (LLMs) [3]. The tool also employs quantization to compress AI models' weights, and it can be configured to generate multiple tokens at once rather than one at a time, reducing loading times for users [3].
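To make the paging idea concrete, here is a minimal sketch of block-table bookkeeping in the spirit of PagedAttention. This is not vLLM's actual implementation; the class name, the 16-token block size, and the pool size are all illustrative assumptions. The point is that each sequence keeps a table of fixed-size blocks drawn from a shared pool, so memory is claimed one block at a time and the blocks need not be contiguous.

# Toy sketch of the block-table idea behind PagedAttention.
# NOT vLLM's actual code: the class name, 16-token block size,
# and pool size are illustrative assumptions.

BLOCK_SIZE = 16  # tokens stored per physical block

class PagedKVCache:
    def __init__(self, num_blocks: int):
        self.free_blocks = list(range(num_blocks))  # shared physical pool
        self.block_tables = {}                      # seq_id -> list of block ids
        self.seq_lens = {}                          # seq_id -> tokens written so far

    def append_token(self, seq_id: int) -> tuple[int, int]:
        """Reserve a slot for one new token; return (block_id, offset).
        A fresh block is claimed only when the last one fills up, so memory
        grows on demand instead of being reserved for the maximum length."""
        n = self.seq_lens.get(seq_id, 0)
        table = self.block_tables.setdefault(seq_id, [])
        if n % BLOCK_SIZE == 0:                       # last block full, or none yet
            if not self.free_blocks:
                raise MemoryError("KV cache pool exhausted")
            table.append(self.free_blocks.pop())      # any free block will do;
        self.seq_lens[seq_id] = n + 1                 # blocks need not be adjacent
        return table[-1], n % BLOCK_SIZE

    def free(self, seq_id: int) -> None:
        # Return a finished sequence's blocks to the pool for reuse.
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.seq_lens.pop(seq_id, None)

cache = PagedKVCache(num_blocks=8)
for _ in range(20):                    # decode 20 tokens for one request
    block_id, offset = cache.append_token(seq_id=0)
cache.free(0)                          # blocks are recycled for other requests

Because freed blocks return directly to a shared pool, short and long requests can interleave without fragmenting one large contiguous buffer, which is the memory-waste reduction the coverage describes.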
Inferact plans to develop a next-generation commercial inference engine that makes deploying AI models as simple as spinning up a serverless database [2]. "We see a future where serving AI becomes effortless. Today, deploying a frontier model at scale requires a dedicated infrastructure team. Tomorrow, it should be as simple as spinning up a serverless database. The complexity doesn't disappear; it gets absorbed into the infrastructure we're building," Woosuk Kwon posted [2]. Job postings indicate the company will equip its software with observability, troubleshooting, and disaster recovery features, likely running on Kubernetes [3].
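As a point of reference for what deployment looks like today, the snippet below uses the open-source vLLM library's offline Python API. LLM, SamplingParams, and generate() are vLLM's documented entry points; the model name is only an example, and any details of Inferact's commercial layer on top remain speculation.

# Generating text with the open-source vLLM library's offline API.
# `LLM` and `SamplingParams` are vLLM's documented entry points; the
# model name is only an example chosen for a quick local test.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")
params = SamplingParams(temperature=0.8, max_tokens=64)

outputs = llm.generate(["The future of AI inference is"], params)
for out in outputs:
    print(out.outputs[0].text)

vLLM also ships an OpenAI-compatible HTTP server (started with the vllm serve command), which is the kind of deployment a managed, Kubernetes-based offering would presumably automate with the observability and recovery features the job listings describe.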
Inferact's debut mirrors the recent commercialization of SGLang as RadixArk, which secured capital at a $400 million valuation led by Accel [1]. Both projects were incubated at Ion Stoica's UC Berkeley lab, underscoring the university's role as a wellspring of critical AI infrastructure [1]. Andreessen Horowitz said its investment represents "an explicit bet that the future will bring incredible diversity of AI apps, agents, and workloads running on a variety of hardware platforms" [2]. The team plans to use the funding to provide the financial and developer resources needed to handle increasing model complexity, hardware diversity, and deployment scale, while deepening support for emerging model architectures and advanced hardware [2]. Inferact is actively hiring engineers and researchers to work at the frontier of inference, where models meet hardware at scale [2].