2 Sources
[1]
AI punters lose their shirts on Premier League bets
AI models from Google, OpenAI and Anthropic lost money betting on football matches over a Premier League season, according to a new study suggesting even the most advanced systems struggle to analyse the real world over long periods of time. The "KellyBench" report, released this week by AI start-up General Reasoning, highlights the gap between AI's rapidly advancing capabilities in certain tasks, such as writing software, and its shortcomings in other kinds of human problems.

London-based General Reasoning tested eight top AI systems in a virtual recreation of the 2023-24 Premier League season, providing them with detailed historical data and statistics about each team and previous games. The AIs were instructed to build models that would maximise returns and manage risk. The AI "agents" then placed bets on the outcomes of matches and the number of goals scored, to test how they could adapt to new events and updated player data as the season progressed. The AIs could not access the internet to retrieve results, and each was given three attempts to turn a profit.

Anthropic's Claude Opus 4.6 fared best, with an average loss of 11 per cent, nearly breaking even on one attempt. xAI's Grok 4.20 went bankrupt once and failed to complete the other two tries. Google's Gemini 3.1 Pro managed to turn a 34 per cent profit on one go but went bankrupt on another.

"Every frontier model we evaluated lost money over the season and many experienced ruin," the authors of the paper concluded, with the AI "systematically underperforming humans" in this scenario.

The results offer some comfort to white-collar professionals and businesses that are fretting that AI could take their jobs as it roils the shares of industries from finance to marketing. Ross Taylor, one of the study's authors and General Reasoning's chief executive, said: "There is so much hype about AI automation but there's not a lot of measurement of putting AI into a long time-horizon setting."
He added that many of the benchmarks typically used to test AI are flawed because they are set in "very static environments" that bear little resemblance to the chaos and complexity of the real world. General Reasoning's paper, which has not yet been peer reviewed, provides a counterweight to growing excitement in Silicon Valley about the huge recent leaps in AI's ability to complete computer programming tasks with little to no human intervention. Taylor, a former Meta AI researcher, said: "If you . . . try AI on some real-world tasks, it does really badly . . . Yes, software engineering is very important and economically valuable, but there are lots of other activities with longer time horizons that are important to look at."
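The benchmark's name presumably nods to the Kelly criterion, the classic formula for sizing bets to maximise long-run bankroll growth while managing the risk of ruin. As a minimal illustrative sketch (not code from the study; the win probability and odds below are invented values), the fraction of the bankroll to stake works out as:

```python
def kelly_fraction(p_win, decimal_odds):
    """Fraction of bankroll to stake under the Kelly criterion.

    p_win: the bettor's estimated probability that the bet wins.
    decimal_odds: total payout per unit staked, including the stake
                  (e.g. 2.0 means an even-money bet).
    Returns 0 when the bet has no positive expected value.
    """
    b = decimal_odds - 1.0                      # net winnings per unit staked
    f = (b * p_win - (1.0 - p_win)) / b         # Kelly formula: (bp - q) / b
    return max(f, 0.0)                          # never stake on a losing edge

# A 55% estimated win probability at even money suggests staking
# about 10% of the bankroll:
print(round(kelly_fraction(0.55, 2.0), 4))
# No edge (40% at even money) means no bet:
print(kelly_fraction(0.40, 2.0))
```

Whether the tested agents actually applied Kelly-style staking is not stated in the coverage; the sketch only shows the kind of bankroll discipline the benchmark is named after.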
[2]
AI Can Code, But It Can't Bet: Why Top Models Are Going Broke On Sports Markets
Frontier AI models are more powerful than ever, but new research suggests some of the hype around autonomous AI may be getting ahead of reality. General Reasoning, an AI research firm, released KellyBench this week, a long-horizon test that places AI agents inside a simulated English Premier League betting market and asks them to grow a bankroll over a full season. The results were not flattering.

Every Model Lost Money

Every model lost money. Claude did best, finishing down just 11%, but that was still a loss. Grok 4.20 fared worst, burning through nearly 90% of its bankroll. xAI, Elon Musk's company behind Grok, has experienced heavy leadership turnover and scaling challenges in its attempt to catch up with the leading models.

The firm rated each model on a 44-point sophistication rubric developed with quantitative betting experts. No model scored higher than a third of the available points. "Models struggle to behave coherently over long time horizons," the researchers wrote, "often failing to act upon their analysis or failing to adapt as the world changes."

The Gap Between Hype And Capability

That gap between hype and reality is already moving markets. Nearly 80,000 tech workers were laid off in the first quarter of 2026 alone, with almost half of those cuts attributed to AI. The Citrini scenario holds that AI agents will rapidly displace white-collar workers, triggering a credit and deflationary spiral. KellyBench may give that thesis pause: if frontier models can't yet beat a football betting market, the timeline for the kind of autonomous financial decision-making the scenario requires may be longer than many assume.

On Kalshi, traders currently price the Citrini scenario at around 23%, a market that has attracted over $25 million in volume. A Polymarket contract on whether the AI bubble bursts by December 31, 2026, currently sits at 20%, with $2.5 million traded. If model progress plateaus, that figure may start to look underpriced.
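Grok's near-ruin illustrates a classic result behind the Kelly criterion: staking too large a fraction of the bankroll makes the expected log-growth per bet negative, so the bankroll shrinks geometrically even when each bet has positive expected value. A short sketch with invented numbers (not figures from the study):

```python
import math

def expected_log_growth(f, p_win, decimal_odds):
    """Expected log-growth of the bankroll per bet when staking fraction f.

    Positive values mean the bankroll compounds upward over many bets;
    negative values mean it decays toward ruin.
    """
    b = decimal_odds - 1.0   # net winnings per unit staked
    return p_win * math.log(1 + f * b) + (1 - p_win) * math.log(1 - f)

# With a 55% edge at even money, the Kelly stake of 10% grows the
# bankroll, while over-staking half the bankroll on every bet shrinks it:
print(expected_log_growth(0.10, 0.55, 2.0) > 0)   # modest, positive growth
print(expected_log_growth(0.50, 0.55, 2.0) < 0)   # negative growth: ruin
```

This is why "grow a bankroll over a full season" is a harsh long-horizon test: a single stretch of oversized stakes can wipe out an otherwise well-calibrated bettor.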
What It Means For NVDA

KellyBench won't move those stocks today, but as a data point on the limits of current AI capability, it nudges the probability needle away from the Citrini bull case for AI disruption and toward a slower-burn scenario.
Frontier AI models from Google, OpenAI, and Anthropic failed to profit from betting on a simulated Premier League season in the KellyBench study. The research reveals advanced AI systems struggle with long-term, real-world prediction tasks despite excelling at coding, challenging assumptions about autonomous AI financial decision-making and its impact on white-collar employment.
AI models consistently lost money when challenged to bet on Premier League football matches, in a comprehensive new study that questions the current trajectory of AI capabilities. London-based AI start-up General Reasoning released KellyBench this week, a long-horizon test that placed eight frontier AI models, including systems from Google, OpenAI, Anthropic, and xAI, into a virtual recreation of the 2023-24 Premier League season [1]. The results paint a sobering picture of AI betting performance and highlight the gap between AI hype and reality that has dominated Silicon Valley discourse.
The study provided each AI system with detailed historical data and statistics about teams and previous games, instructing them to build models that would maximize returns and manage risk. Anthropic's Claude Opus 4.6 performed best among the tested models, with an average loss of 11 per cent and nearly breaking even on one attempt [1]. However, even this relatively strong performance still represented a net loss. xAI's Grok 4.20 fared worst, going bankrupt once and failing to complete the other two attempts. Google's Gemini 3.1 Pro showed inconsistent results, managing to turn a 34 per cent profit on one attempt but going bankrupt on another [1].

General Reasoning rated each model on a 44-point sophistication rubric developed with quantitative betting experts, and no model scored higher than a third of the available points [2]. The researchers concluded that "models struggle to behave coherently over long time horizons, often failing to act upon their analysis or failing to adapt as the world changes" [2].
The KellyBench study exposes critical weaknesses in how AI systems handle long-term prediction and adaptation to evolving circumstances. Ross Taylor, one of the study's authors and General Reasoning's chief executive, noted that "there is so much hype about AI automation but there's not a lot of measurement of putting AI into a long time-horizon setting" [1]. The former Meta AI researcher emphasized that many benchmarks typically used to test AI are flawed because they operate in "very static environments" that bear little resemblance to the chaos and complexity of the real world [1].

This disconnect between controlled testing environments and real-world scenarios raises questions about the autonomous AI financial decision-making capabilities that many industry leaders have touted. The AI agents were given three attempts each to turn a profit, with access to updated player data as the season progressed, yet "every frontier model we evaluated lost money over the season and many experienced ruin," with the AI "systematically underperforming humans" in this scenario [1].
The findings offer perspective on concerns about AI displacing white-collar professionals, even as nearly 80,000 tech workers were laid off in the first quarter of 2026 alone, with almost half of those cuts attributed to AI automation [2]. The Citrini scenario, which holds that AI agents will rapidly displace white-collar workers and trigger a credit and deflationary spiral, may need reconsideration in light of these results. On Kalshi, traders currently price the Citrini scenario at around 23 per cent, a market that has attracted over $25 million in volume [2].

Taylor pointed out the contrast between AI's impressive software engineering capabilities and its struggles with other real-world tasks: "If you try AI on some real-world tasks, it does really badly. Yes, software engineering is very important and economically valuable, but there are lots of other activities with longer time horizons that are important to look at" [1]. A Polymarket contract on whether the AI market bubble bursts by December 31, 2026, currently sits at 20 per cent, with $2.5 million traded [2].