OpenAI EVMbench Results: How Claude, GPT-5 and Gemini Ranked on Crypto Security

2026/02/19 20:30

TLDR

  • OpenAI released EVMbench, a benchmark that tests AI models on finding and fixing smart contract security flaws
  • Built with Paradigm and OtterSec, it draws on 120 real vulnerabilities from 40 audits
  • Anthropic’s Claude Opus 4.6 ranked first with a detect award of $37,824
  • OpenAI’s GPT-5.2 placed second at $31,623, Google’s Gemini 3 Pro third at $25,112
  • Crypto hackers stole $3.4 billion in 2025, making AI security tools more pressing

OpenAI has launched a new benchmark called EVMbench, built to test how well AI models can detect, exploit, and fix vulnerabilities in smart contracts.

The tool was created alongside crypto investment firm Paradigm and security firm OtterSec. Results were published in a research paper on Wednesday, February 18.

Smart contracts are permanent pieces of code that run on blockchains like Ethereum. They control billions of dollars across lending platforms and decentralized exchanges. Once deployed, they cannot easily be changed, so a single flaw can lead to major losses.

EVMbench used 120 real vulnerabilities pulled from 40 smart contract audits, most sourced from open-source security competitions.

Each AI model was scored using a “detect award,” which estimates the dollar value an AI could theoretically recover by correctly identifying a flaw in a contract.

How Each AI Model Ranked

Anthropic’s Claude Opus 4.6 took the top spot with an average detect award of $37,824.

OpenAI’s own OC-GPT-5.2 came in second at $31,623. Google’s Gemini 3 Pro placed third at $25,112.
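
To make the gaps between the three scores concrete, the short Python sketch below ranks the models by the average detect awards reported above and computes each model's shortfall relative to the leader. The dictionary and function names are illustrative assumptions, not part of EVMbench, and the figures are simply the ones quoted in this article.

```python
# Average detect awards in USD, as quoted in this article.
# The model labels follow the article's TLDR; nothing here comes from EVMbench itself.
detect_awards = {
    "Claude Opus 4.6": 37_824,
    "GPT-5.2": 31_623,
    "Gemini 3 Pro": 25_112,
}


def rank_models(awards):
    """Return (model, award) pairs sorted from highest to lowest award."""
    return sorted(awards.items(), key=lambda kv: kv[1], reverse=True)


def gaps_to_leader(awards):
    """Return each model's dollar shortfall relative to the top score."""
    ranked = rank_models(awards)
    top_award = ranked[0][1]
    return {model: top_award - award for model, award in ranked}


ranking = rank_models(detect_awards)
gaps = gaps_to_leader(detect_awards)

for position, (model, award) in enumerate(ranking, start=1):
    print(f"{position}. {model}: ${award:,} (gap to leader: ${gaps[model]:,})")
```

On these figures, second place trails the leader by $6,201 and third place by $12,712.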

The benchmark tested three core skills: finding security bugs, exploiting those bugs in a controlled setting, and patching the broken code without disrupting the contract.

Why OpenAI Built This Tool

Crypto attackers stole $3.4 billion in 2025, a slight increase from the year before. OpenAI said testing AI performance in “economically meaningful environments” is becoming more important as AI adoption grows.

OpenAI also noted it expects AI agents to play a growing role in stablecoin payments. Circle CEO Jeremy Allaire predicted in January that billions of AI agents will be transacting with stablecoins within five years.

What Comes Next

Dragonfly managing partner Haseeb Qureshi posted on X that smart contracts were never designed for human intuition. He said signing large transactions still feels “terrifying” due to threats like drainer wallets, unlike a standard bank transfer.

Qureshi believes AI-managed wallets will eventually handle these risks for everyday users. He compared the pairing to GPS meeting the smartphone.

OpenAI said it hopes EVMbench becomes a long-term standard for tracking AI progress in blockchain security.

For now, Claude Opus 4.6's top detect award score is the latest data point from the published study.

The post OpenAI EVMbench Results: How Claude, GPT-5 and Gemini Ranked on Crypto Security appeared first on Blockonomi.
