4 Sources
[1]
LMArena lands $1.7B valuation four months after launching its product | TechCrunch
LMArena, a startup that originally launched as a UC Berkeley research project in 2023, announced on Tuesday that it raised a $150 million Series A at a post-money valuation of $1.7 billion. The round was led by Felicis and the university's fund UC Investments. The startup bolted out of the gate as a commercial venture with a $100 million seed round in May at a $600 million valuation. This new round means it raised $250 million in about seven months. LMArena is best known for its crowdsourced AI model performance leaderboards. Its consumer website lets a user type a prompt that it sends to two models, with the user then choosing which model did a better job. Those results, which now span more than 5 million monthly users across 150 countries and 60 million conversations a month, the company says, fuel the leaderboards. It ranks various models on a variety of tasks including text, web development, vision, text-to-image, and other criteria. The models it tests include various flavors of OpenAI GPT, Google Gemini, Anthropic Claude, and Grok, as well as ones that are geared toward specialties like image generation, text-to-image, or reasoning. The company began as Chatbot Arena, an open research project built by UC Berkeley researchers Anastasios Angelopoulos and Wei-Lin Chiang, and was originally funded through grants and donations. LMArena's leaderboards became something of an obsession among model makers. When LMArena started pursuing revenue, it partnered with select model companies such as OpenAI, Google, and Anthropic to make their flagship models available for its community to evaluate. In April, a group of competitors published a paper alleging that this helped those model makers game the startup's benchmarks, an allegation LMArena has vehemently denied. In September, it publicly launched a commercial service, AI Evaluations, in which enterprises, model labs, and developers can hire the company to perform model evaluations through its community. This gave LMArena an annualized "consumption rate" -- as the company describes its annual recurring revenue (ARR) -- of $30 million as of December, less than four months after launch. That trajectory, and the startup's popularity, were enough for VCs to pile in for the Series A, which included participation from Andreessen Horowitz, The House Fund, LDVP, Kleiner Perkins, Lightspeed Venture Partners and Laude Ventures.
[2]
AI startup LMArena triples its valuation to $1.7 billion in latest fundraise
Jan 6 (Reuters) - LMArena said on Tuesday its valuation had tripled to $1.7 billion in about eight months, following a new funding round where it raised $150 million, as investors continue to pour money into artificial intelligence startups. Investor enthusiasm for generative AI surged after ChatGPT's launch in 2022 showed its commercialization potential, leading to a race for adoption and a scramble on Wall Street for exposure to key firms in the boom. LMArena, formerly known as Chatbot Arena, is a web-based platform that allows users to compare large language models, including OpenAI's ChatGPT, Anthropic's Claude and Google's Gemini, through anonymous, crowd-sourced evaluations. "To measure the real utility of AI, we need to put it in the hands of real users. LMArena does exactly this," said Anastasios Angelopoulos, co-founder and CEO of LMArena. The fundraise was co-led by Felicis and UC Investments (University of California), with participation from Silicon Valley giant Andreessen Horowitz, The House Fund, LDVP, Kleiner Perkins, Lightspeed Venture Partners and Laude Ventures. LMArena said it will use the fresh capital to operate its platform, expand its technical team and strengthen research capabilities. In the previous funding round in May, the company raised $100 million at the seed-round level, led by a16z and UC Investments. Reporting by Pragyan Kalita in Bengaluru; Editing by Vijay Kishore.
[3]
LMArena raises $150M at $1.7B valuation to rethink AI evaluation
The AI industry has become adept at measuring itself. Benchmarks improve, model scores rise, and every new release arrives with a list of metrics meant to signal progress. And yet, somewhere between the lab and real life, something keeps slipping. Which model actually feels better to use? Which answers would a human trust? Which system would you put in front of customers, employees, or citizens and feel comfortable standing behind? That gap is where LMArena has quietly built its business, and why investors just put $150 million behind it at a $1.7 billion valuation, in a Series A round. The lead investors were Felicis and UC Investments, with participation from major venture firms (Andreessen Horowitz, Kleiner Perkins, Lightspeed, The House Fund, Laude Ventures). For years, benchmarks were the currency of AI credibility: accuracy scores, reasoning tests and standardized datasets. They worked until they didn't. As models grew larger and more similar, benchmark improvements became marginal. Worse, models began to optimize for the tests themselves rather than real use cases. Static evaluations struggled to reflect how AI behaves in open-ended, messy human interactions. At the same time, AI systems moved out of labs and into everyday workflows: drafting emails, writing code, powering customer support, assisting with research and advising professionals. The question shifted from "Can the model do this?" to "Should we trust it when it does?" That's a different kind of measurement problem. LMArena's answer was simple and radical: stop scoring models in isolation. On its platform, users submit a prompt and receive two anonymized responses. No branding. No model names. Just answers. Then the user picks the better one, or neither. One vote. One comparison. Repeated millions of times. The result isn't a definitive "best," but a living signal of human preference: how people respond to tone, clarity, verbosity and real-world usefulness. When the prompt isn't clean or predictable, that signal changes. And it captures something benchmarks often miss. LMArena isn't about whether a model produces a factually correct answer. It's about whether humans prefer it when it does. That distinction is subtle but meaningful in practice. Rankings on the Arena leaderboard are now referenced by developers and labs before releases and product decisions. Major models from OpenAI, Google and Anthropic are regularly evaluated there. Without traditional marketing, LMArena became a mirror the industry watches. The $150 million round isn't just a vote of confidence in LMArena's product. It signals that AI evaluation itself is becoming infrastructure. As the number of models explodes, enterprise buyers face a new question: not how to get AI, but which AI to trust. Vendor claims and classical benchmarks don't always translate to real-world reliability. Internal testing is expensive and slow. A neutral, third-party signal, something that sits between model builders and users, is emerging as a critical layer. That's where LMArena lives. In September 2025, it launched AI Evaluations, a commercial service that turns its crowdsourced comparison engine into a product enterprises and labs can pay to access. LMArena says this service achieved an annualized run rate of about $30 million within months of launch. For regulators and policymakers, this kind of human-anchored signal matters too. LMArena's approach isn't without debate.
Platforms that rely on public voting and crowdsourced signals can reflect the preferences of active users, which may not align with the needs of specific professional domains. In response, competitors like Scale AI's SEAL Showdown have emerged, aiming to offer more granular, representative model rankings across languages, regions and professional contexts. Academic research also notes that voting-based leaderboards can be susceptible to manipulation if safeguards aren't in place, and that such systems may favor superficially appealing responses over technically correct ones if quality control isn't rigorous. These debates highlight that no single evaluation method captures every dimension of model behavior, but they also underscore the demand for richer, human-grounded signals beyond traditional benchmarks. There's a quiet assumption in AI that trust will emerge naturally as models improve. Better reasoning, so the logic goes, will lead to better outcomes. That framing treats alignment as a technical problem with technical solutions. LMArena challenges that idea. Trust, in real contexts, is social and contextual. It's built through experience, not claims. It's shaped by feedback loops that don't collapse under scale. By letting users, not companies, decide what works, LMArena introduces friction where the industry often prefers momentum. It slows things down just enough to ask, "Is this actually better, or just newer?" That's an uncomfortable question in a market driven by constant release cycles. It's also why LMArena's rise feels inevitable. LMArena doesn't promise safety. It doesn't declare models good or bad. It doesn't replace regulation or responsibility. What it does is simpler and more powerful: it keeps score in public. As AI systems become embedded in everyday decisions, tracking performance over time becomes less optional. Someone has to notice regressions, contextual shifts and usability patterns. In sports, referees and statisticians fill this role. In markets, auditors and rating agencies do. In AI, we're still inventing that infrastructure.
[4]
AI evaluation startup LMArena raises $150M at $1.7B valuation - SiliconANGLE
LMArena, a startup that helps artificial intelligence developers benchmark their models' output quality, has raised $150 million in funding. The company announced the Series A investment today. The round was led by Felicis and UC Investments, the University of California system's asset management arm. LMArena was founded in 2023 by two UC Berkeley researchers. The round also drew contributions from Andreessen Horowitz, The House Fund, LDVP, Kleiner Perkins, Lightspeed Venture Partners and Laude Ventures. Many of the firms on the list also backed LMArena's $100 million seed round in May. The company disclosed that its valuation has tripled in the seven months since to $1.7 billion. AI benchmarks usually comprise sample prompts and correct answers to those prompts. To test an AI model's output quality, developers provide it with the sample prompts and then compare the model's responses to the correct answers. The percentage of the questions that the model answers accurately serves as a measure of its performance. In practice, AI benchmarks don't always provide an accurate picture of model quality. One of the reasons is a phenomenon called data contamination. It occurs when a model finds an existing answer to a benchmark question in an external source. LMArena, officially Arena Intelligence Inc., operates a cloud platform that helps AI developers mitigate benchmark quality issues such as data contamination. Instead of relying on static question sets to evaluate AI models, the platform uses a continuously refreshed set of prompts. Those prompts are crowdsourced from consumers. LMArena's platform provides a chatbot interface that enables users to search the web, generate code and perform other tasks. LMArena sends each prompt to two different AI models and displays the algorithms' output side-by-side. From there, the user selects the better response. This feedback is used to benchmark neural networks' performance. LMArena uses the data it collects to power a regularly updated ranking of top-performing AI models. Gemini 3 Pro, a reasoning model that Google LLC released in November, is currently at the top of the list. It's followed by a scaled-down version of the algorithm called Gemini 3 Flash and xAI Corp.'s Grok 4.1. AI developers use the company's leaderboard to collect feedback on new models before broadly releasing them. OpenAI Group PBC, for example, tested GPT-5 on LMArena under the codename "summit" before its release. Additionally, LMArena provides AI developers with research datasets that can be used for tasks such as mapping out model jailbreaking tactics. "We cannot deploy AI responsibly without knowing how it delivers value to humans," said LMArena co-founder and Chief Executive Officer Anastasios Angelopoulos. "To measure the real utility of AI, we need to put it in the hands of real users." Today's funding round comes about four months after the company launched its first commercial service. AI Evaluations, as the offering is called, helps AI developers assess their models using feedback from LMArena users. In addition to evaluation results, the service provides access to samples of the underlying feedback data that can be used to verify the numbers.
LMArena, the AI evaluation platform that started as a UC Berkeley research project, has raised $150 million in Series A funding at a $1.7 billion valuation. The startup, which lets users compare AI models through crowdsourced evaluations, tripled its valuation in just seven months and now serves over 5 million monthly users across 150 countries.
LMArena, the AI startup that transformed from a UC Berkeley research project into a commercial powerhouse, announced on Tuesday that it raised $150 million in Series A funding at a post-money valuation of $1.7 billion [1][2]. The round was co-led by Felicis and UC Investments, with participation from Andreessen Horowitz, The House Fund, LDVP, Kleiner Perkins, Lightspeed Venture Partners, and Laude Ventures [1]. This marks a remarkable trajectory for the AI evaluation platform, which tripled its valuation in approximately seven months after raising $100 million at a $600 million valuation in May [1].
LMArena, formerly known as Chatbot Arena, operates a web-based platform that allows users to compare large language models through anonymous, crowdsourced evaluations [2]. The platform sends each user prompt to two different AI models and displays their responses side-by-side, with users selecting which model performed better [4]. This approach addresses a critical gap in traditional benchmarks, which often fail to capture how AI systems behave in real-world, open-ended human interactions [3]. The platform now serves more than 5 million monthly users across 150 countries, generating 60 million conversations per month [1].
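The sources describe this voting flow but not the math that turns votes into rankings. As an illustration only, the Python sketch below shows one common way pairwise preferences can be aggregated into a leaderboard, an Elo-style rating update; the model names, the 1000-point starting score, and the K factor are assumptions for the example, not details of LMArena's actual system.

```python
from collections import defaultdict

K = 32  # update step size; assumed value for illustration only

ratings = defaultdict(lambda: 1000.0)  # every model starts from the same baseline

def expected_score(r_a, r_b):
    # Probability that the first model wins under the Elo model.
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def record_vote(model_a, model_b, winner=None):
    # One anonymous side-by-side comparison: winner is model_a, model_b,
    # or None when the user prefers neither (treated as a tie here).
    e_a = expected_score(ratings[model_a], ratings[model_b])
    s_a = 0.5 if winner is None else (1.0 if winner == model_a else 0.0)
    ratings[model_a] += K * (s_a - e_a)
    ratings[model_b] += K * ((1.0 - s_a) - (1.0 - e_a))

# Hypothetical votes between two made-up models.
record_vote("model-x", "model-y", winner="model-x")
record_vote("model-x", "model-y", winner="model-x")
record_vote("model-x", "model-y", winner=None)

for name, score in sorted(ratings.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.1f}")
```

Production leaderboards often fit a statistical model such as Bradley-Terry over the full vote history rather than applying sequential updates, but either way the point is the same: millions of individual preferences roll up into a single comparable score per model.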
The startup's AI model leaderboards have become essential infrastructure for the AI industry, ranking various models from OpenAI GPT, Google Gemini, Anthropic Claude, and Grok across multiple tasks including text, web development, vision, and text-to-image generation [1]. Gemini 3 Pro currently tops the leaderboard, followed by Gemini 3 Flash and xAI's Grok 4.1 [4]. What distinguishes LMArena from traditional benchmarks is its focus on human preference rather than isolated accuracy scores [3]. The platform captures how people respond to tone, clarity, and real-world usefulness, providing a living signal that changes as prompts become less predictable [3].
In September, LMArena publicly launched AI Evaluations, a commercial service that allows enterprises, model labs, and developers to hire the company to perform model evaluations through its community [1]. This service achieved an annualized consumption rate of $30 million as of December, less than four months after launch [1][3]. The rapid revenue growth demonstrates strong demand for neutral, third-party AI evaluation infrastructure as enterprises struggle to determine which AI models to trust [3].

Traditional AI benchmarks face significant quality issues, particularly data contamination, which occurs when a model finds existing answers to benchmark questions in external sources [4]. LMArena mitigates these issues by using continuously refreshed prompts crowdsourced from real users rather than static question sets [4]. As generative AI models grew larger and more similar, benchmark improvements became marginal, with models optimizing for tests themselves rather than real use cases [3]. "To measure the real utility of AI, we need to put it in the hands of real users. LMArena does exactly this," said Anastasios Angelopoulos, co-founder and CEO of LMArena [2].
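To make the contrast concrete, here is a minimal sketch of the static-benchmark scoring described in source [4]: a fixed list of prompts with known reference answers, graded by exact match. The prompts, answers, and the `fake_model` stand-in are hypothetical; the takeaway is that because the question set never changes, a model that has already seen the answer key during training (data contamination) scores well without demonstrating real capability.

```python
# Hypothetical static benchmark: fixed prompts with known reference answers.
benchmark = [
    {"prompt": "What is the capital of France?", "answer": "Paris"},
    {"prompt": "What is 17 * 3?", "answer": "51"},
]

def fake_model(prompt):
    # Stand-in for a real model API call, used only for illustration.
    canned = {
        "What is the capital of France?": "Paris",
        "What is 17 * 3?": "51",
    }
    return canned.get(prompt, "I don't know")

# Score by exact match: the percentage of questions answered correctly.
correct = sum(1 for item in benchmark
              if fake_model(item["prompt"]).strip() == item["answer"])
print(f"Static benchmark accuracy: {correct / len(benchmark):.0%}")
```

A crowdsourced arena sidesteps this failure mode because its prompts come from live users and change continuously, so there is no fixed answer key that can leak into training data.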
Founded by UC Berkeley researchers Anastasios Angelopoulos and Wei-Lin Chiang, LMArena began as Chatbot Arena, an open research project originally funded through grants and donations [1]. The platform challenges the assumption that trust in AI will emerge naturally as models improve, instead treating trust as social and contextual, built through experience rather than vendor claims [3]. AI developers including OpenAI have used the platform to test new models before broad release, with GPT-5 tested under the codename "summit" [4].

The $150 million Series A funding reflects investor confidence that AI evaluation itself is becoming essential infrastructure as the number of models explodes [3]. LMArena will use the fresh capital to operate its platform, expand its technical team, and strengthen research capabilities [2]. For regulators and policymakers, human-anchored signals matter as oversight frameworks need evidence reflecting real usage rather than idealized scenarios [3]. While competitors like Scale AI's SEAL Showdown have emerged offering more granular rankings, and academic research notes voting-based leaderboards can be susceptible to manipulation, the demand for richer, human-grounded signals beyond traditional benchmarks continues to grow [3].

Summarized by Navi