OpenAI has unveiled a benchmarking framework aimed at measuring how effectively AI agents can detect, mitigate, and even exploit security vulnerabilities in crypto smart contracts. The project, titled “EVMbench: Evaluating AI Agents on Smart Contract Security,” was released in collaboration with Paradigm and OtterSec, two organizations with deep exposure to blockchain security and investment. The study assesses AI agents against a curated set of 120 potential weaknesses drawn from 40 smart contract audits, seeking to quantify not just detection and patching capabilities but also the theoretical exploit potential of these agents in a controlled environment.
The OpenAI PDF accompanying the study details the detect awards earned by the AI agents, along with the evaluation methodology and the scenarios used to simulate real-world smart-contract risk. The authors emphasize that while AI agents have evolved to automate a wide range of routine tasks, assessing their performance in “economically meaningful environments” is essential to understanding how they will perform under pressure in production systems.
OpenAI notes that it expects agentic technologies to broaden the scope of payments and settlement, including stablecoins used in automated workflows. The discussion around AI-enabled payments extends beyond security testing to the broader question of how autonomous systems will participate in daily financial activity. The company’s own projections suggest that agentic payments could become more commonplace, grounding AI capabilities in practical use cases that touch everyday consumer transactions.
In tandem with the benchmark results, Circle CEO Jeremy Allaire has publicly forecast that billions of AI agents could be transacting with stablecoins for everyday payments within the next five years. That view intersects with a recurring theme in crypto circles: the potential for crypto to become the native currency of AI agents, a narrative that has drawn attention from industry leaders and investors alike. While such predictions remain speculative, the underlying trend is clear: AI automation is moving from the lab to the transaction layer, where it could reshape how value moves across networks.
The study arrives at a moment when crypto security continues to be a significant risk factor for investors. Attackers pulled roughly $3.4 billion from crypto funds in 2025, a figure that highlights the urgency of improved tooling and faster, more reliable patching mechanisms. The EVMbench framework is positioned, in part, as a way to measure whether AI agents can meaningfully contribute to defensive capabilities at scale, reducing exploitation opportunities and accelerating threat mitigation.
To build the benchmark, researchers drew on 120 curated vulnerabilities spanning 40 smart contract audits, with many weaknesses traced back to open-source audit challenges. OpenAI argues the benchmark will help track AI progress in recognizing and mitigating contract-level weaknesses at scale, offering a standardized way to compare future AI models as they evolve. The study also provides a lens into how AI might be applied to normalizing risk assessment across a wide range of smart-contract architectures, rather than focusing solely on isolated cases.
In a contemporaneous thread on X, Haseeb Qureshi, a partner at Dragonfly, argued that crypto’s promise of replacing property rights and traditional contracts never materialized, not because the technology failed, but because it was never designed with human intuition in mind. He highlighted the persistent fear associated with signing large transactions in an environment where drainer wallets and other attack vectors remain a constant threat, in stark contrast to the comparatively smoother experience of traditional bank transfers.
Qureshi contends that the next phase of crypto transactions could be enabled by AI-intermediated, self-driving wallets. Such wallets would monitor risk, manage complex operations, and autonomously respond to threats on behalf of users, potentially reducing the friction and fear that characterize large transfers today.
The broader takeaway from this thread is that AI agents may play a critical role in transforming how people interact with crypto, shifting from manual, error-prone transactions to automated, risk-aware processes that can scale with adoption. As AI agents begin to demonstrate more competence in handling security concerns, users could see improved reliability and resilience in decentralized finance workflows, even as the underlying technologies continue to mature.
The EVMbench study demonstrates that large language models and related AI agents are beginning to perform meaningful security work in the smart contract space, with clearly quantifiable differences across models. Claude Opus 4.6’s lead in average detect awards signals that certain architectures may be more adept at spotting and mitigating vulnerabilities within complex contract logic, while others trail, offering a spectrum of capabilities that researchers will likely want to refine. The involvement of multiple industry partners in the project underscores the growing consensus that AI-enabled security and automated risk management could become essential for operating at scale in decentralized environments.
As the field evolves, observers will be watching for how quickly AI agents can transition from detection to remediation, and whether these agents can operate reliably in live systems without introducing new risks. The conversation about AI-driven wallets and autonomous payments touches on a broader set of questions around security governance, user consent, and regulatory alignment. If the trajectory suggested by OpenAI and its partners continues, AI-assisted tools could become a core component of future crypto infrastructure, changing both the risk calculus and the user experience in meaningful ways. The next round of benchmarks, alongside real-world deployments, will help determine how quickly this vision materializes and what safeguards must accompany it.
This article was originally published as “OpenAI Pits AI Agents Against Each Other to Red-Team Smart Contracts” on Crypto Breaking News.


