LMArena raises $150M at $1.7B valuation, tripling its worth in seven months with AI evaluation platform


LMArena, the AI evaluation platform that started as a UC Berkeley research project, has raised $150 million in Series A funding at a $1.7 billion valuation. The startup, which lets users compare AI models through crowdsourced evaluations, tripled its valuation in just seven months and now serves over 5 million monthly users across 150 countries.

LMArena Secures $150 Million Series A Funding at $1.7 Billion Valuation

LMArena, the AI startup that transformed from a UC Berkeley research project into a commercial powerhouse, announced on Tuesday that it raised $150 million in Series A funding at a post-money valuation of $1.7 billion [1][2]. The round was co-led by Felicis and UC Investments, with participation from Andreessen Horowitz, The House Fund, LDVP, Kleiner Perkins, Lightspeed Venture Partners, and Laude Ventures [1]. This marks a remarkable trajectory for the AI evaluation platform, which tripled its valuation in approximately seven months after raising $100 million at a $600 million valuation in May [1].

Source: The Next Web

Crowdsourced AI Model Performance Testing Drives Platform Growth

LMArena, formerly known as Chatbot Arena, operates a web-based platform that allows users to compare large language models through anonymous, crowdsourced evaluations [2]. The platform sends each user prompt to two different AI models and displays their responses side-by-side, with users selecting which model performed better [4]. This approach addresses a critical gap in traditional benchmarks, which often fail to capture how AI systems behave in real-world, open-ended human interactions [3]. The platform now serves more than 5 million monthly users across 150 countries, generating 60 million conversations per month [1].
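The articles summarized here do not spell out how LMArena aggregates these votes, but the basic idea of turning anonymous side-by-side votes into a ranking can be illustrated with an Elo-style update, a common choice for pairwise-comparison leaderboards. The model names, votes, and K-factor below are invented for illustration; this is a minimal sketch, not LMArena's implementation.

```python
from collections import defaultdict

# Illustrative only: aggregate pairwise "which response was better" votes
# into a ranking with an Elo-style update. Model names and constants are
# assumptions, not LMArena's actual pipeline.

K = 32  # assumed sensitivity of each rating update

def expected_score(rating_a, rating_b):
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def record_vote(ratings, winner, loser):
    """Apply one vote: the winner gains rating, the loser drops by the same amount."""
    surprise = 1.0 - expected_score(ratings[winner], ratings[loser])
    ratings[winner] += K * surprise
    ratings[loser] -= K * surprise

ratings = defaultdict(lambda: 1000.0)  # every model starts at the same baseline

# Hypothetical (winner, loser) pairs from anonymous side-by-side battles
votes = [("model-a", "model-b"), ("model-a", "model-c"), ("model-c", "model-b")]
for winner, loser in votes:
    record_vote(ratings, winner, loser)

# Leaderboard: highest rating first
for model, rating in sorted(ratings.items(), key=lambda kv: -kv[1]):
    print(f"{model}: {rating:.1f}")
```

Because every vote compares two responses to the same prompt, the resulting ratings reflect relative human preference rather than scores against a fixed answer key.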

Source: SiliconANGLE

AI Model Leaderboards Become Industry Standard for Human Preference

The startup's AI model leaderboards have become essential infrastructure for the AI industry, ranking models such as OpenAI's GPT series, Google's Gemini, Anthropic's Claude, and xAI's Grok across multiple tasks, including text, web development, vision, and text-to-image generation [1]. Gemini 3 Pro currently tops the leaderboard, followed by Gemini 3 Flash and xAI's Grok 4.1 [4]. What distinguishes LMArena from traditional benchmarks is its focus on human preference rather than isolated accuracy scores [3]. The platform captures how people respond to tone, clarity, and real-world usefulness, providing a living signal that changes as prompts become less predictable [3].

Source: TechCrunch

Commercial Launch Achieves $30 Million Annualized Run Rate

In September, LMArena publicly launched AI Evaluations, a commercial service that allows enterprises, model labs, and developers to hire the company to perform model evaluations through its community [1]. This service achieved an annualized consumption rate of $30 million as of December, less than four months after launch [1][3]. The rapid revenue growth demonstrates strong demand for neutral, third-party AI evaluation infrastructure as enterprises struggle to determine which AI models to trust [3].

Addressing Data Contamination and Benchmark Limitations

Traditional AI benchmarks face significant quality issues, particularly data contamination, which occurs when a model has already encountered the answers to benchmark questions in external sources, allowing it to reproduce them rather than solve them [4]. LMArena mitigates these issues by using continuously refreshed prompts crowdsourced from real users rather than static question sets [4]. As generative AI models grew larger and more similar, benchmark improvements became marginal, with models optimizing for the tests themselves rather than for real use cases [3]. "To measure the real utility of AI, we need to put it in the hands of real users. LMArena does exactly this," said Anastasios Angelopoulos, co-founder and CEO of LMArena [2].
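To see why a static question set is fragile, consider one common way contamination is screened for: checking whether word n-grams from benchmark questions already appear in text a model may have seen during training. The corpus, questions, and n-gram length below are hypothetical, and this is not LMArena's pipeline; the point is that a fixed benchmark can leak and be memorized in advance, whereas a stream of fresh user prompts cannot.

```python
# Hypothetical contamination check: flag benchmark questions whose word n-grams
# also appear in text a model may have trained on. The corpus, questions, and
# n-gram length are illustrative values, not LMArena's actual method.

def ngrams(text, n=5):
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(max(len(words) - n + 1, 0))}

def is_contaminated(question, corpus_ngrams, n=5):
    """True if any n-gram from the question already appears in the corpus."""
    return bool(ngrams(question, n) & corpus_ngrams)

# Pretend training corpus that happens to contain a benchmark item verbatim.
corpus = "study guide: the boiling point of water at sea level is 100 degrees celsius"
corpus_ngrams = ngrams(corpus)

questions = [
    "the boiling point of water at sea level is what temperature",  # leaked item
    "draft a polite reply declining this meeting invitation",       # fresh user prompt
]

for q in questions:
    flag = "contaminated" if is_contaminated(q, corpus_ngrams) else "clean"
    print(f"{flag}: {q}")
```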

UC Berkeley Researchers Build Trust Through User Feedback

Founded by UC Berkeley researchers Anastasios Angelopoulos and Wei-Lin Chiang, LMArena began as Chatbot Arena, an open research project originally funded through grants and donations [1]. The platform challenges the assumption that trust in AI will emerge naturally as models improve, instead treating trust as social and contextual, built through experience rather than vendor claims [3]. AI developers including OpenAI have used the platform to test new models before broad release, with GPT-5 tested under the codename "summit" [4].

Investment Signals AI Evaluation Becoming Critical Infrastructure

The $150 million Series A reflects investor confidence that AI evaluation itself is becoming essential infrastructure as the number of models explodes [3]. LMArena will use the fresh capital to operate its platform, expand its technical team, and strengthen its research capabilities [2]. For regulators and policymakers, human-anchored signals matter because oversight frameworks need evidence that reflects real usage rather than idealized scenarios [3]. While competitors such as Scale AI's SEAL Showdown have emerged with more granular rankings, and academic research notes that voting-based leaderboards can be susceptible to manipulation, demand for richer, human-grounded signals beyond traditional benchmarks continues to grow [3].
