
OpenAI EVMbench Results: How Claude, GPT-5 and Gemini Ranked on Crypto Security

2026/02/19 20:30

TLDR

  • OpenAI released EVMbench, a benchmark that tests AI models on finding and fixing smart contract security flaws
  • Built with Paradigm and OtterSec, it draws on 120 real vulnerabilities from 40 audits
  • Anthropic’s Claude Opus 4.6 ranked first with an average detect award of $37,824
  • OpenAI’s GPT-5.2 placed second at $31,623, Google’s Gemini 3 Pro third at $25,112
  • Crypto hackers stole $3.4 billion in 2025, making AI security tools more pressing

OpenAI has launched a new benchmark called EVMbench, built to test how well AI models can detect, exploit, and fix vulnerabilities in smart contracts.

The tool was created alongside crypto investment firm Paradigm and security firm OtterSec. Results were published in a research paper on Wednesday, February 18.

Smart contracts are permanent pieces of code that run on blockchains like Ethereum. They control billions of dollars across lending platforms and decentralized exchanges. Once deployed, they cannot easily be changed, so a single flaw can lead to major losses.

EVMbench used 120 real vulnerabilities pulled from 40 smart contract audits, most sourced from open-source security competitions.

Each AI model was scored using a “detect award,” which estimates the dollar value an AI could theoretically recover by correctly identifying a flaw in a contract.

How Each AI Model Ranked

Anthropic’s Claude Opus 4.6 took the top spot with an average detect award of $37,824.

OpenAI’s own OC-GPT-5.2 came in second at $31,623. Google’s Gemini 3 Pro placed third at $25,112.
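
The gaps between the three reported scores can be worked out with a few lines of Python. This is only an illustrative sketch: the function names `rank_models` and `lead_over_next` are made up here, and the figures are the average detect awards quoted above, not data pulled from the benchmark itself.

```python
# Average detect awards in USD, as quoted in this article.
detect_awards = {
    "Claude Opus 4.6": 37_824,
    "OC-GPT-5.2": 31_623,
    "Gemini 3 Pro": 25_112,
}


def rank_models(awards):
    """Return (model, award) pairs sorted from highest to lowest award."""
    return sorted(awards.items(), key=lambda item: item[1], reverse=True)


def lead_over_next(ranking):
    """For each adjacent pair, return (leader, trailer, USD gap, gap as % of trailer's award)."""
    gaps = []
    for (leader, hi), (trailer, lo) in zip(ranking, ranking[1:]):
        gap = hi - lo
        gaps.append((leader, trailer, gap, round(100 * gap / lo, 1)))
    return gaps


ranking = rank_models(detect_awards)
gaps = lead_over_next(ranking)

for place, (model, award) in enumerate(ranking, start=1):
    print(f"{place}. {model}: ${award:,}")
for leader, trailer, gap, pct in gaps:
    print(f"{leader} leads {trailer} by ${gap:,} ({pct}%)")
```

On these figures, the first-place lead over second is $6,201 and the second-place lead over third is $6,511.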

The benchmark tested three core skills: finding security bugs, exploiting those bugs in a controlled setting, and patching the broken code without disrupting the contract.

Why OpenAI Built This Tool

Crypto attackers stole $3.4 billion in 2025, a slight increase from the year before. OpenAI said testing AI performance in “economically meaningful environments” is becoming more important as AI adoption grows.

OpenAI also noted it expects AI agents to play a growing role in stablecoin payments. Circle CEO Jeremy Allaire predicted in January that billions of AI agents will be transacting with stablecoins within five years.

What Comes Next

Dragonfly managing partner Haseeb Qureshi posted on X that smart contracts were never designed for human intuition. He said signing large transactions still feels “terrifying” due to threats like drainer wallets, unlike a standard bank transfer.

Qureshi believes AI-managed wallets will eventually handle these risks for everyday users. He compared the pairing to GPS meeting the smartphone.

OpenAI said it hopes EVMbench becomes a long-term standard for tracking AI progress in blockchain security.

As of the published study, Claude Opus 4.6 holds the top detect award score.

The post OpenAI EVMbench Results: How Claude, GPT-5 and Gemini Ranked on Crypto Security appeared first on Blockonomi.
