
OpenAI Drops EVMbench After Claude Vibe Code Disaster

2026/02/20 02:30

OpenAI launches EVMbench to test AI agents on smart contract security days after Claude Opus 4.6-assisted code triggered a $1.78M DeFi exploit.

Smart contracts protect over $100 billion in open-source crypto assets. That number alone should explain why OpenAI’s latest move is drawing serious attention. The company, working alongside crypto investment firm Paradigm, rolled out EVMbench, a benchmark designed to test how well AI agents detect, exploit, and patch high-severity smart contract vulnerabilities.

The benchmark draws from 120 curated vulnerabilities pulled across 40 audits. Most of those came from open code audit competitions. What makes it different is the scope. EVMbench tests three distinct capability modes: detect, patch, and exploit, each measured separately and graded through a Rust-based harness that replays transactions in a sandboxed local environment. No live networks involved.


The Number That Should Worry Everyone

In exploit mode, GPT-5.3-Codex via Codex CLI scored 72.2%. Six months back, GPT-5 sat at 31.9% on the same metric. That is a jump of 40.3 percentage points, more than double the earlier score. OpenAI confirmed the figures in its official announcement on X, framing EVMbench as both a measurement tool and a call to action for the security community.

Detect and patch scores remain lower. Agents in the detection setting sometimes identify a single vulnerability and then stop. They do not exhaust the codebase. In patch mode, the challenge is preserving full contract functionality while removing the flaw. That balance is still giving models trouble.


A $1.78M Oracle Error Nobody Caught

The backdrop to all of this matters. Security researcher evilcos flagged on X that the DeFi lending protocol Moonwell suffered a loss of approximately $1.78 million. The cause was an oracle configuration error. A price feed formula was written incorrectly, setting cbETH’s value at $1.12 instead of approximately $2,200.

That is a low-level mistake, the kind a careful audit should catch. The GitHub pull request for proposal MIP-X43 showed commits co-authored by Claude Opus 4.6, Anthropic’s latest and most capable model at the time.
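To make the scale of that mispricing concrete, here is a minimal sketch of a price sanity check, written in Python for readability. It is not Moonwell's code or any real oracle's API. The function names and the 10% tolerance are hypothetical, and the only figures taken from the incident are the $1.12 and roughly $2,200 prices reported above.

```python
from decimal import Decimal


def relative_deviation(candidate: Decimal, reference: Decimal) -> Decimal:
    """Return |candidate - reference| / reference as a fraction."""
    if reference <= 0:
        raise ValueError("reference price must be positive")
    return abs(candidate - reference) / reference


def is_plausible(
    candidate: Decimal,
    reference: Decimal,
    max_deviation: Decimal = Decimal("0.10"),  # hypothetical 10% tolerance
) -> bool:
    """Accept a new price only if it stays close to an independent reference."""
    return relative_deviation(candidate, reference) <= max_deviation


# Figures reported for the incident: $1.12 configured versus roughly $2,200.
reference_price = Decimal("2200")
print(is_plausible(Decimal("1.12"), reference_price))  # False
print(is_plausible(Decimal("2150"), reference_price))  # True
```

A guard like this does not say why a formula is wrong. It only flags that a configured price sits far from an independent reference, which is the kind of signal a reviewer could act on before a proposal goes on-chain.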

Smart contract auditor pashov posted on X, calling it possibly the first exploit tied to vibe-coded Solidity. He was careful to note that human reviewers still hold final responsibility. A security auditor signs off before anything goes on-chain. But something in that chain broke down.

What EVMbench Is Actually Built to Do

The benchmark also includes vulnerability scenarios from the security audit of the Tempo blockchain, a purpose-built L1 designed for high-throughput stablecoin payments. That addition pushes EVMbench into payment-oriented contract code, an area where OpenAI expects agentic stablecoin activity to grow.

Each exploit task runs in an isolated Anvil instance. Transactions replay deterministically. The grading setup restricts unsafe RPC methods and was red-teamed internally to stop agents from gaming results. Vulnerabilities used are historical and publicly documented.
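The article does not describe how the harness restricts RPC methods, so the following is only a sketch of the general idea, not EVMbench's implementation. It assumes a filter that sits in front of the local node and refuses node-control namespaces such as `anvil_` and `evm_`, which on a development node can rewrite balances, storage, or time. The blocked prefix list and function names are hypothetical.

```python
import json

# Hypothetical blocklist: namespaces that let a caller manipulate a dev node
# directly (for example anvil_setBalance or evm_snapshot) rather than going
# through ordinary transactions.
BLOCKED_PREFIXES = ("anvil_", "evm_", "hardhat_", "debug_")


def is_allowed(method: str) -> bool:
    """Return True if the JSON-RPC method is outside the blocked namespaces."""
    return not method.startswith(BLOCKED_PREFIXES)


def filter_request(raw: str) -> dict:
    """Decide whether to forward a JSON-RPC request or answer with an error."""
    request = json.loads(raw)
    method = request.get("method", "")
    if is_allowed(method):
        return {"forward": True, "request": request}
    return {
        "forward": False,
        "response": {
            "jsonrpc": "2.0",
            "id": request.get("id"),
            # -32601 is the JSON-RPC 2.0 "Method not found" code.
            "error": {"code": -32601, "message": f"method {method} is not permitted"},
        },
    }


blocked = filter_request('{"jsonrpc": "2.0", "id": 1, "method": "anvil_setBalance", "params": []}')
print(blocked["forward"])  # False
```

A stricter design would invert this into an allowlist of the few methods an agent needs, so that any method the filter has not seen before is rejected by default.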

OpenAI is also committing $10M in API credits to accelerate cyber defense, with priority given to open-source software and critical infrastructure. Its security research agent, Aardvark, is expanding into private beta. Free codebase scanning for widely used open-source projects is part of that push.

The Vibe-Coding Question With Real Stakes

Pashov’s post on X raised what many in the DeFi space had been avoiding. When AI writes production Solidity code and humans approve it fast, the review layer gets thin. The Moonwell incident showed exactly how thin it can get.

OpenAI acknowledged that cybersecurity is inherently dual-use. Its response is evidence-based. Safety training, automated monitoring, and access controls for advanced capabilities are part of that. But a 72.2% exploit score on a public benchmark is the kind of number that does not stay quiet.

EVMbench’s full task set, tooling, and evaluation code are now public. The goal is to let researchers track AI cyber capabilities as they grow, and build defenses at the same pace. Whether that pace is fast enough is the question nobody has answered yet.

The post OpenAI Drops EVMbench After Claude Vibe Code Disaster appeared first on Live Bitcoin News.

