Anthropic has published the results of a study on how modern AI models identify vulnerabilities in smart contracts. Researchers tested Claude Sonnet 4.5, Claude Opus 4.5, and GPT-5 on SCONE-bench, a benchmark of Ethereum and BNB Chain contract vulnerabilities from 2020-2025.
In the tests, the models successfully simulated exploits for roughly half of the historical incidents. Measured by the assets held in the affected contracts at the time of the original attacks, the combined notional value exceeded $550 million.
Vulnerability detection results across the tested AI models. Data: Anthropic.
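For illustration, an exploit-replay benchmark of this kind typically forks the chain state just before the historical attack, runs the model-generated exploit, and counts the case as solved if it extracts value. The Python sketch below shows one way such scoring could be aggregated; the names, the `extracted_usd` callback, and the figures are hypothetical, not Anthropic's implementation.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Incident:
    contract: str          # address of the vulnerable contract
    notional_usd: float    # assets held in the contract when it was attacked

def score(incidents: list[Incident],
          extracted_usd: Callable[[Incident], float],
          min_extracted_usd: float = 1.0) -> tuple[float, float]:
    """Count an incident as solved if the replayed exploit extracts at
    least `min_extracted_usd` from a fork of the pre-attack chain state,
    and credit the contract's full at-risk value toward the notional total."""
    solved, notional = 0, 0.0
    for incident in incidents:
        if extracted_usd(incident) >= min_extracted_usd:
            solved += 1
            notional += incident.notional_usd
    return solved / len(incidents), notional

# Toy demo with made-up figures (not SCONE-bench data):
demo = [Incident("0xAAA...", 2_000_000.0), Incident("0xBBB...", 500_000.0)]
rate, value = score(demo, lambda i: 1_500_000.0 if i.contract == "0xAAA..." else 0.0)
print(f"solve rate {rate:.0%}, notional ${value:,.0f}")  # -> solve rate 50%, notional $2,000,000
```

A real harness would replace the `extracted_usd` callback with a sandboxed EVM fork executing the model's exploit code.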
A separate set of tests covered contracts exploited after March 2025, the models' knowledge cutoff. On this sample, the AI agents identified 19 of 34 vulnerabilities, corresponding to an estimated value of about $4.6 million.
These cases were not known to the models in advance and included several new types of flaws, company officials said.
Claude Opus 4.5 performed best on SCONE-bench, generating exploits for 17 cases, or 50% of the sample, which would translate to roughly $4.5 million in notional “revenue.”
Taken together, Claude Sonnet 4.5, GPT-5, and Opus 4.5 detected 19 vulnerabilities across the 34 contracts tested, about 56% of the suite, representing roughly $4.6 million in notional funds.
Anthropic also tested whether the models could find previously unknown issues in recently deployed contracts. Two zero-day vulnerabilities turned up among the new addresses, which, the researchers said, demonstrates the models’ ability to identify bugs without prior signals or historical data.
The company stresses that the research is not aimed at exploiting vulnerabilities but at building tools to evaluate how well AI systems recognize flaws in code. Anthropic plans to offer SCONE-bench as an open standard for testing and comparing LLM capabilities.
The paper’s authors envision such models being applied to smart contract development and auditing, helping to catch bugs before contracts are deployed on-chain.
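As a concrete illustration of that workflow (not a procedure from the paper), a minimal pre-deployment audit step might send the contract source to a model and gate release on its findings. The sketch below uses the Anthropic Python SDK; the model id, prompt, and file name are assumptions.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def audit_contract(source: str) -> str:
    """Ask the model to flag likely vulnerability classes (reentrancy,
    broken access control, oracle manipulation, ...) before deployment."""
    response = client.messages.create(
        model="claude-opus-4-5",  # assumed model id; check Anthropic's current model list
        max_tokens=2000,
        messages=[{
            "role": "user",
            "content": "Audit this smart contract for exploitable flaws. "
                       "List each suspected issue with the affected function "
                       "and a severity estimate.\n\n" + source,
        }],
    )
    return response.content[0].text

# Example: gate a CI deploy step on the audit report
# report = audit_contract(open("Vault.sol").read())
# print(report)
```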
Anthropic also cautions that the study does not capture the full level of risk, since the analysis is limited to a sample of historical contracts and a controlled environment. The company plans to keep expanding the benchmark and exploring how AI tools can support teams working on blockchain protocol security.