Researchers propose new framework to protect users when AI agents fumble with their money

A consortium of researchers from Microsoft Research, Google DeepMind, Columbia University, and AI startups has unveiled the Agentic Risk Standard, a financial protection framework designed to compensate users when AI agents misexecute trades or cause financial losses. The open-source protocol introduces escrow, collateral, and underwriting mechanisms to address what researchers call a "guarantee gap" between probabilistic AI reliability and the enforceable guarantees users need for high-stakes tasks.

A New Financial Safety Net for AI Agents

As AI agents increasingly handle payments, trades, and financial transactions, a fundamental question has emerged: who pays when these systems make mistakes? Researchers from Microsoft Research, Google DeepMind, Columbia University, Virtuals Protocol, and t54 Labs have proposed the Agentic Risk Standard (ARS), a financial protection framework designed to compensate users when AI agents fail [1]. Published on April 8 and made available as an open-source protocol on GitHub, ARS aims to do for AI agents what escrow, insurance, and clearinghouses do for traditional financial markets [1].

Source: Decrypt

The framework addresses what researchers identify as a critical vulnerability in the emerging agentic economy. Unlike conventional AI safety research that focuses on reducing the probability of failure, ARS takes a complementary approach by formalizing what happens financially when systems inevitably fail [1]. "Most trustworthy AI research aims to reduce the probability of failure," said Wenyue Hua, Senior Researcher at Microsoft Research. "That work is essential, but probability is not a guarantee. ARS takes a complementary approach: instead of trying to make the model perfect, we formalize what happens financially when it isn't" [1].

The Guarantee Gap and Why Current Safety Measures Fall Short

The core problem ARS tackles is what researchers call a "guarantee gap": a disconnect between the probabilistic reliability that AI safety techniques provide and the enforceable guarantees users need before delegating high-stakes tasks [1]. Large language models are inherently stochastic, meaning that no matter how well trained or tuned an AI agent is, it can still hallucinate and make mistakes [1]. When that agent is executing financial API calls or managing brokerage accounts, even a single failure can produce an immediate, realized loss [1].

"Technical safeguards can offer only probabilistic reliability, whereas users in high-stakes settings often require enforceable guarantees over outcomes," the paper stated [2]. Without a way to bound potential losses, users rationally limit AI delegation to low-risk tasks, constraining broader adoption of agent-based services [1]. The financial ecosystem currently has no way to operate other than to defer all liability back to a human, according to t54 founder Chandler Fang [1].

How the Settlement Framework Actually Works

The Agentic Risk Standard borrows directly from centuries of financial engineering, managing risk through a layered settlement framework [1]. The system distinguishes between two types of AI jobs, each requiring a different level of user protection [1].

For standard service tasks that carry limited financial exposure, such as generating slide decks or writing reports, escrow-based settlement is sufficient [1]. Service fees are held in escrow vaults and released only upon verified task delivery [1]. For high-risk financial operations involving currency trading, leveraged positions, or other tasks requiring access to user funds before outcomes can be verified, the framework introduces underwriting [1].
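The escrow flow described above is simple enough to sketch in a few lines. The following Python toy model is an illustration only; the class and field names are assumptions for exposition, not the ARS implementation:

```python
class EscrowVault:
    """Toy escrow settlement: the service fee is locked until task
    delivery is verified, then released to the provider; if delivery
    fails verification, the fee is returned to the user."""

    def __init__(self, fee: float) -> None:
        self.fee = fee
        self.locked = True
        self.provider_balance = 0.0
        self.user_refund = 0.0

    def settle(self, delivery_verified: bool) -> None:
        if not self.locked:
            raise RuntimeError("vault already settled")
        self.locked = False
        if delivery_verified:
            self.provider_balance = self.fee  # release fee to provider
        else:
            self.user_refund = self.fee       # return fee to the user

vault = EscrowVault(fee=50.0)
vault.settle(delivery_verified=True)
print(vault.provider_balance)  # 50.0
```

The key property is that the fee can end up in exactly one place, and only after the verification step runs, which is why escrow alone suffices when the fee itself is the full exposure.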

Underwriting brings in a risk-bearing third party that evaluates the danger of an AI failure, requires service providers to post collateral, charges a premium, and commits to reimbursing users if things go wrong [1][2]. It's the same logic that governs derivatives markets, where clearinghouses stand between counterparties so that a single default doesn't cascade [1].
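The economics of that underwriting layer can be sketched with standard actuarial pricing. The premium formula, loading factor, and parameter names below are illustrative assumptions, not figures from the ARS paper:

```python
from dataclasses import dataclass

@dataclass
class UnderwrittenJob:
    """Illustrative model of an underwritten high-risk agent task."""
    exposure: float     # maximum user funds at risk
    p_fail: float       # underwriter's estimated failure probability
    load: float = 1.25  # safety margin over expected loss

    def premium(self) -> float:
        # Classic actuarial pricing: expected loss times a loading factor.
        return self.exposure * self.p_fail * self.load

    def required_collateral(self, coverage_ratio: float = 1.0) -> float:
        # Provider posts collateral sized to cover the exposure.
        return self.exposure * coverage_ratio

    def settle(self, failed: bool) -> float:
        """Payout owed to the user once the job's outcome is known."""
        return self.exposure if failed else 0.0

job = UnderwrittenJob(exposure=10_000.0, p_fail=0.02)
print(job.premium())              # 250.0
print(job.required_collateral())  # 10000.0
```

Note how the whole scheme hinges on `p_fail`: if the underwriter misestimates the agent's failure rate, premiums and collateral are mispriced, which is exactly the bottleneck the researchers flag later in the article.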

Regulatory Attention and Real-World Implementation Challenges

Financial regulators are already circling. FINRA's 2026 regulatory oversight report, released in December, included a first-ever section on generative AI, warning broker-dealers to develop procedures specifically targeting AI hallucinations and to scrutinize AI agents that may act "beyond the user's actual or intended scope and authority" [1]. The SEC and other agencies are watching closely [1].

ARS is pitched not as a set of rules but as a protocol: a standardized state machine that governs how funds are locked, how claims are filed, and how reimbursements are triggered when AI agents fail [1]. The researchers tested the system using a simulation that ran 5,000 trials, though they acknowledged the experiment was limited and not designed to reflect real-world failure rates [2].
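A state machine of that kind is easy to picture concretely. The states and transitions below are a hypothetical reconstruction of what "locked funds, claims, reimbursements" might look like, not the protocol as published on GitHub:

```python
from enum import Enum, auto

class State(Enum):
    LOCKED = auto()      # fee (and any collateral) held in escrow
    DELIVERED = auto()   # provider reports task completion
    RELEASED = auto()    # delivery verified; funds go to provider
    CLAIMED = auto()     # user files a failure claim
    REIMBURSED = auto()  # claim upheld; user repaid from escrow/collateral

# Allowed transitions; any other move is rejected by the protocol.
TRANSITIONS = {
    State.LOCKED:     {State.DELIVERED, State.CLAIMED},
    State.DELIVERED:  {State.RELEASED, State.CLAIMED},
    State.CLAIMED:    {State.REIMBURSED},
    State.RELEASED:   set(),
    State.REIMBURSED: set(),
}

class SettlementMachine:
    def __init__(self) -> None:
        self.state = State.LOCKED

    def advance(self, to: State) -> None:
        if to not in TRANSITIONS[self.state]:
            raise ValueError(
                f"illegal transition {self.state.name} -> {to.name}")
        self.state = to

job = SettlementMachine()
job.advance(State.DELIVERED)
job.advance(State.RELEASED)  # happy path: verified delivery releases funds
```

Encoding the lifecycle as an explicit transition table is what makes the guarantee enforceable rather than probabilistic: funds cannot be released and a claim cannot be skipped unless the machine passes through the prescribed states.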

The real bottleneck, researchers acknowledge, will be building accurate risk-pricing models for agentic behavior [1]. Accurate failure-rate estimates remain the main challenge, as both over- and underestimation create systemic risks [2]. The paper noted that non-financial harms such as hallucination, defamation, or psychological harm remain outside the framework [2]. Future work will focus on risk modeling for diverse failure modes, empirical measurement of failure frequencies under deployment-like conditions, and designing underwriting and collateral schedules that remain robust under detector error and strategic behavior [2].

© 2026 TheOutpost.AI All rights reserved