3 Sources
[1]
Can AI Agents Boost Ethereum Security? OpenAI and Paradigm Created a Testing Ground - Decrypt
ChatGPT maker OpenAI and crypto-focused investment firm Paradigm have introduced EVMbench, a tool to help improve Ethereum Virtual Machine smart contract security. EVMbench is designed to evaluate AI agents' ability to detect, patch, and exploit high-severity vulnerabilities in Ethereum Virtual Machine (EVM) smart contracts. Smart contracts are the heart of the Ethereum network, holding the code that powers everything from decentralized finance protocols to token launches. The weekly number of smart contracts deployed on Ethereum reached an all-time high of 1.7 million in November 2025, with 669,500 deployed last week alone, according to Token Terminal.

EVMbench draws on 120 curated vulnerabilities from 40 audits, most sourced from open audit competitions such as Code4rena, according to an OpenAI blog post. It also includes scenarios from the security auditing process for Tempo, Stripe's purpose-built layer-1 blockchain focused on high-throughput, low-cost stablecoin payments. Payments giant Stripe launched the public testnet for Tempo in December, saying at the time that it was being built with input from Visa, Shopify, and OpenAI, among others. The goal is to ground testing in economically meaningful, real-world code, particularly as AI-driven stablecoin payments expand, the firm added.

EVMbench evaluates AI models across three modes: detect, patch, and exploit. In detect mode, agents audit repositories and are scored on their recall of ground-truth vulnerabilities. In patch mode, agents must eliminate vulnerabilities without breaking intended functionality. Finally, in exploit mode, agents attempt end-to-end fund-draining attacks in a sandboxed blockchain environment, with grading performed via deterministic transaction replay. In exploit mode, GPT-5.3-Codex running via OpenAI's Codex CLI achieved a score of 72.2%, compared to 31.9% for GPT-5, which was released six months earlier.
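The detect-mode recall scoring described above can be sketched in a few lines of Python. This is a hypothetical illustration only: the `Finding` type, the file names, and the vulnerability classes are invented for the sketch and are not EVMbench's actual data format or API.

```python
# Hypothetical sketch of detect-mode scoring: an agent's reported findings
# are matched against ground-truth vulnerabilities and scored by recall.
# All names here (Finding, the contracts, the bug classes) are illustrative.

from dataclasses import dataclass

@dataclass(frozen=True)
class Finding:
    contract: str     # e.g. "Vault.sol"
    vuln_class: str   # e.g. "reentrancy"

def detect_recall(reported: set, ground_truth: set) -> float:
    """Fraction of ground-truth vulnerabilities the agent recovered."""
    if not ground_truth:
        return 1.0
    return len(reported & ground_truth) / len(ground_truth)

ground_truth = {
    Finding("Vault.sol", "reentrancy"),
    Finding("Auction.sol", "integer-overflow"),
    Finding("Bridge.sol", "missing-access-control"),
}
reported = {
    Finding("Vault.sol", "reentrancy"),
    Finding("Bridge.sol", "missing-access-control"),
    Finding("Token.sol", "unchecked-return"),  # false positive: ignored by recall
}

print(detect_recall(reported, ground_truth))  # 2 of 3 ground-truth bugs found
```

Note that recall alone ignores false positives; a benchmark that also paid out per-finding "audit rewards", as described in the sources, would need a separate precision or payout term.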
Performance was weaker in the detect and patch tasks, where agents sometimes failed to audit exhaustively or struggled to preserve full contract functionality. The ChatGPT maker's researchers cautioned that EVMbench does not fully capture real-world security complexity. Still, they added that measuring AI performance in economically relevant environments is critical as models become powerful tools for both attackers and defenders.

Sam Altman's OpenAI and Ethereum co-founder Vitalik Buterin have previously been at odds over the pace of AI development. In January 2025, Altman said that his firm was "confident we know how to build AGI as we have traditionally understood it." Buterin, by contrast, has argued that AI systems should include a "soft pause" capability that could temporarily restrict industrial-scale AI operations if warning signs emerge.
[2]
OpenAI Researches AI Agents Detecting Smart Contract Flaws
OpenAI said it is becoming increasingly important to evaluate the performance of AI agents in "economically meaningful environments" as their adoption grows. OpenAI has launched a new benchmark that evaluates how well different AI models detect, patch, and even exploit security vulnerabilities found in crypto smart contracts. OpenAI released the "EVMbench: Evaluating AI Agents on Smart Contract Security" paper on Wednesday, in collaboration with crypto investment firm Paradigm and crypto security firm OtterSec, to evaluate how much the AI agents could theoretically exploit from 120 smart contract vulnerabilities.

Anthropic's Claude Opus 4.6 came out on top with an average "detect award" of $37,824, followed by OpenAI's OC-GPT-5.2 and Google's Gemini 3 Pro at $31,623 and $25,112, respectively.

While AI agents are becoming increasingly efficient at handling basic tasks, OpenAI said it is becoming more important to evaluate their performance in "economically meaningful environments." "Smart contracts secure billions of dollars in assets, and AI agents are likely to be transformative for both attackers and defenders," the company said, adding: "We expect agentic stablecoin payments to grow, and help ground it in a domain of emerging practical importance."

Circle CEO Jeremy Allaire predicted on Jan. 22 that billions of AI agents will be transacting with stablecoins for everyday payments on behalf of users within five years, while former Binance boss Changpeng "CZ" Zhao also recently tipped that crypto would end up being the "native currency for AI agents." The need to test agentic AI performance in spotting security vulnerabilities comes as attackers stole $3.4 billion worth of crypto funds in 2025, a marginal increase from 2024.

EVMbench drew on 120 curated vulnerabilities from 40 smart contract audits, with most of them sourced from open-source audit competitions.
OpenAI said it hopes the benchmark will help track AI progress in spotting and mitigating smart contract vulnerabilities at scale.

In a post to X on Wednesday, Dragonfly's managing partner Haseeb Qureshi said crypto's promise of replacing property rights and legal contracts never materialized, not because the technology failed, but because it was never designed for human intuition. Qureshi said it still feels "terrifying" to sign large transactions, particularly with drainer wallets and other threats always present, whereas bank transfers rarely provoke the same fear. Instead, Qureshi believes the future of crypto transactions will be facilitated by AI-intermediated, self-driving wallets, which will take care of those threats and manage complex operations on behalf of users: "A technology often snaps into place once its complement finally arrives. GPS had to wait for the smartphone, TCP/IP had to wait for the browser. For crypto, we might just have found it in AI agents."
[3]
OpenAI Introduces Smart Contract Benchmark for AI Agents as AI and Crypto Converge
EVMbench uses 120 real flaws from 40 audits, including Code4rena and Tempo work. OpenAI has introduced a new smart contract security benchmark as AI agents gain stronger coding abilities in the crypto sector. Together with Paradigm, OpenAI said the benchmark, called EVMbench, tests how AI systems detect, patch, and exploit serious Ethereum contract bugs. The effort responds to growing financial risk, since smart contracts routinely secure over $100 billion in open-source crypto assets.

OpenAI Smart Contract Benchmark Targets Real Audit Vulnerabilities

In its release, OpenAI said EVMbench draws on 120 curated vulnerabilities collected from 40 professional smart contract audits. Notably, most of the issues came from open audit competitions, including Code4rena. OpenAI said the benchmark also includes vulnerability scenarios tied to security auditing work for the Tempo blockchain. Tempo is described as a purpose-built Layer-1 network designed for high-throughput, low-cost stablecoin payments; these scenarios extend the benchmark into payment-focused contract code. The company also said it expects agent-based stablecoin payment activity to grow.

To build the benchmark environments, OpenAI said it adapted existing exploit proof-of-concept tests and deployment scripts when available, and engineers manually wrote missing components when no scripts existed. OpenAI added that it ensured patch tasks remained exploitable while still being fixable without breaking compilation.

Detect, Patch, Exploit Modes Test AI Agents Under Pressure

OpenAI said EVMbench evaluates AI agents in three modes: detect, patch, and exploit. In detect mode, agents audit smart contract repositories and are scored on recall of confirmed vulnerabilities and audit rewards. In patch mode, agents must modify vulnerable contracts while keeping intended functionality intact.
Exploit mode focuses on full end-to-end fund-draining attacks in a sandboxed blockchain environment. The company said graders verify results using transaction replay and on-chain checks. To support reproducible evaluation, the company said it developed a Rust-based harness to deploy contracts and replay transactions deterministically. Notably, the exploit tasks run in an isolated local Anvil environment instead of on live crypto networks. It also said the vulnerabilities used in the benchmark are historical and publicly documented, and that the harness restricts unsafe RPC methods to limit abuse.

In exploit testing, OpenAI said GPT-5.3-Codex running via Codex CLI scored 72.2%, while the earlier GPT-5 model, released just over six months before, scored 31.9%. OpenAI also noted that detect recall and patch success remain below full coverage.

OpenAI Adds New Talent with Agent Hire

While OpenAI pushed EVMbench into public view, it also expanded its agent development team. Notably, it hired Peter Steinberger, founder of the viral open-source AI agent project OpenClaw, previously known as Clawdbot. Sam Altman confirmed on X that Steinberger will join OpenAI to lead work on the "next generation of personal agents." Meanwhile, Altman said OpenClaw will transition into a foundation model project supported by OpenAI. The open-source project will continue under that structure, according to the announcement. The hiring drew wide attention as OpenAI increases its focus on autonomous and personal AI agents.
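The grading idea behind deterministic transaction replay can be conveyed with a toy model. The real harness is Rust-based and replays transactions against a local Anvil chain; the pure-Python sketch below only illustrates the concept under invented assumptions: re-run a recorded transaction sequence from a fixed initial state, then check whether the attacker's balance strictly increased. The ledger model, the account names, and the trace are all hypothetical.

```python
# Conceptual model of exploit-mode grading by deterministic replay.
# Not EVMbench's harness: a toy ledger standing in for a sandboxed chain.

def replay(initial_balances: dict, txs: list) -> dict:
    """Deterministically apply (sender, recipient, amount) transfers in order."""
    balances = dict(initial_balances)  # never mutate the recorded initial state
    for sender, recipient, amount in txs:
        if balances.get(sender, 0) < amount:
            raise ValueError(f"{sender} cannot send {amount}")
        balances[sender] -= amount
        balances[recipient] = balances.get(recipient, 0) + amount
    return balances

def exploit_succeeded(initial: dict, txs: list, attacker: str) -> bool:
    """Grade: did the replayed trace strictly increase the attacker's balance?"""
    final = replay(initial, txs)
    return final.get(attacker, 0) > initial.get(attacker, 0)

initial = {"vault": 1_000, "attacker": 0}
# A recorded "fund-draining" trace: the vulnerable vault pays out to the attacker.
trace = [("vault", "attacker", 1_000)]

print(exploit_succeeded(initial, trace, "attacker"))  # True: vault was drained
```

Because the grader re-executes the trace from a fixed starting state, the same trace always yields the same verdict, which is the property deterministic replay buys in the real harness.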
OpenAI and crypto investment firm Paradigm unveiled EVMbench, a benchmark tool designed to evaluate how AI agents detect, patch, and exploit vulnerabilities in Ethereum smart contracts. Drawing on 120 real flaws from 40 audits, the tool tests models like GPT-5.3-Codex, which scored 72.2% in exploit mode, as billions in crypto assets remain at risk.
ChatGPT maker OpenAI has partnered with crypto investment firm Paradigm and security firm OtterSec to launch EVMbench, a smart contract security benchmark designed to evaluate how AI agents detect, patch, and exploit vulnerabilities in Ethereum smart contracts [1][2]. The tool arrives as smart contracts secure over $100 billion in crypto assets and attackers stole $3.4 billion worth of funds in 2025 [2][3]. EVMbench draws on 120 curated vulnerabilities from 40 professional audits, with most sourced from open audit competitions such as Code4rena [1][3]. The benchmark also includes scenarios from security auditing work for Tempo, Stripe's purpose-built layer-1 blockchain focused on high-throughput, low-cost stablecoin payments [1].
Source: CoinGape
OpenAI emphasized that measuring AI performance in economically meaningful environments has become critical as models evolve into powerful tools for both cyber attackers and defenders [1]. The weekly number of Ethereum smart contracts deployed reached an all-time high of 1.7 million in November 2025, with 669,500 deployed last week alone, according to Token Terminal [1]. EVMbench evaluates AI models across three distinct modes: detect, patch, and exploit [1][3]. In detect mode, agents audit repositories and receive scores based on their recall of confirmed smart contract vulnerabilities. In patch mode, agents must eliminate vulnerabilities without breaking intended functionality. Exploit mode tests agents on end-to-end fund-draining attacks in a sandboxed blockchain environment, with grading performed via deterministic transaction replay [1][3].
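The patch-mode criterion, eliminating the vulnerability without breaking intended functionality, can be illustrated with a toy grader. The bug, the functions, and the checks below are invented for illustration; EVMbench grades real Solidity contracts, not Python functions.

```python
# Toy illustration of patch-mode grading: a fix passes only if intended
# behavior still works AND the exploit path is closed. All names invented.

def withdraw_vulnerable(balances: dict, user: str, amount: int) -> int:
    # Bug: no balance check, so a user can withdraw more than they hold.
    balances[user] = balances.get(user, 0) - amount
    return amount

def withdraw_patched(balances: dict, user: str, amount: int) -> int:
    # Fix: reject withdrawals exceeding the user's balance.
    if balances.get(user, 0) < amount:
        raise ValueError("insufficient balance")
    balances[user] -= amount
    return amount

def grade_patch(withdraw) -> bool:
    """Pass iff normal use still works AND over-withdrawal is rejected."""
    balances = {"alice": 100}
    try:
        ok = withdraw(balances, "alice", 40) == 40 and balances["alice"] == 60
    except ValueError:
        return False  # patch broke intended functionality
    try:
        withdraw(balances, "alice", 1_000)  # exploit attempt
        return False  # exploit still possible
    except ValueError:
        return ok

print(grade_patch(withdraw_vulnerable))  # False: exploit still possible
print(grade_patch(withdraw_patched))     # True: fixed and still functional
```

The two-sided check mirrors why agents struggled in this mode: an over-aggressive patch fails the functionality leg, while a cosmetic one fails the exploit leg.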
Source: Cointelegraph
In exploit mode testing, GPT-5.3-Codex running via OpenAI's Codex CLI achieved a score of 72.2%, compared to 31.9% for GPT-5, which was released just six months earlier [1][3]. However, performance was weaker in the detect and patch tasks, where agents sometimes failed to audit exhaustively or struggled to preserve full contract functionality [1]. In separate testing focused on detection capabilities, Anthropic's Claude Opus 4.6 came out on top with an average detect award of $37,824, followed by OpenAI's OC-GPT-5.2 at $31,623 and Google's Gemini 3 Pro at $25,112 [2]. OpenAI researchers cautioned that EVMbench does not fully capture real-world security complexity, but the benchmark provides a foundation for tracking AI progress in spotting and mitigating vulnerabilities at scale [1][2].
OpenAI stated that smart contracts secure billions of dollars in assets, and AI agents are likely to be transformative for both attackers and defenders [2]. The company added that it expects agentic stablecoin payments to grow, grounding the benchmark in a domain of emerging practical importance [2]. Circle CEO Jeremy Allaire predicted on January 22 that billions of AI agents will be transacting with stablecoins for everyday payments on behalf of users within five years [2]. Former Binance boss Changpeng Zhao also recently suggested that crypto would become the native currency for AI agents [2]. Dragonfly's managing partner Haseeb Qureshi believes the future of crypto transactions will be facilitated by AI-intermediated, self-driving wallets that manage complex operations and threats on behalf of users [2].

As OpenAI pushed EVMbench into public view, the company also expanded its agent development team by hiring Peter Steinberger, founder of the viral open-source AI agent project OpenClaw, previously known as Clawdbot [3]. Sam Altman confirmed on X that Steinberger will join OpenAI to lead work on the next generation of personal agents [3]. OpenClaw will transition into a foundation model project supported by OpenAI, with the open-source project continuing under that structure [3]. The hiring signals OpenAI's increased focus on autonomous and next-generation personal AI agent capabilities as the convergence between AI and crypto accelerates.

Summarized by Navi