Categories: Cyber Security News

OpenAI Launches EVMbench to Detect, Patch, and Exploit Vulnerabilities in Blockchain Environments

OpenAI, in collaboration with crypto investment firm Paradigm, has introduced EVMbench, a new benchmark designed to evaluate the ability of AI agents to detect, patch, and exploit high-severity vulnerabilities in smart contracts.

The release marks a significant step in measuring AI capabilities within economically consequential environments, as smart contracts routinely secure over $100 billion in open-source crypto assets.

EVMbench draws on 120 curated vulnerabilities sourced from 40 security audits, with the majority derived from open code audit competitions on platforms such as Code4rena.

The benchmark also incorporates vulnerability scenarios from the security auditing process of the Tempo blockchain, a purpose-built Layer 1 designed for high-throughput stablecoin payments. This extends EVMbench’s scope into payment-oriented smart contract code, an area where agentic stablecoin transactions are expected to grow substantially.

Three Evaluation Modes

EVMbench evaluates AI agents across three distinct capability modes, each targeting a different phase of the smart contract security lifecycle.

Detect: Agents audit a smart contract repository and are scored on recall of ground-truth vulnerabilities and associated audit rewards.

Patch: Agents modify vulnerable contracts while preserving intended functionality, verified through automated tests and exploit checks.

Exploit: Agents execute end-to-end fund-draining attacks against deployed contracts in a sandboxed blockchain environment, graded via transaction replay and on-chain verification.
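
To make the detect-mode scoring concrete, here is a minimal Python sketch of recall and reward coverage against a ground-truth list. It is an illustration only, not EVMbench's grader: the Finding schema, the detect_scores function, the vulnerability IDs, and the dollar amounts are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """A ground-truth vulnerability from an audit (hypothetical schema)."""
    vuln_id: str
    reward_usd: float  # audit reward attached to this finding (made-up values below)


def detect_scores(ground_truth: list[Finding], reported_ids: set[str]) -> dict[str, float]:
    """Return recall and the share of audit rewards covered by the agent's report."""
    found = [f for f in ground_truth if f.vuln_id in reported_ids]
    total_reward = sum(f.reward_usd for f in ground_truth)
    return {
        "recall": len(found) / len(ground_truth) if ground_truth else 0.0,
        "reward_share": sum(f.reward_usd for f in found) / total_reward if total_reward else 0.0,
    }


truth = [Finding("H-01", 6000.0), Finding("H-02", 3000.0), Finding("H-03", 1000.0)]
# The agent reports two known issues plus one ("X-99") that is not in the ground truth.
scores = detect_scores(truth, {"H-01", "H-03", "X-99"})
print(scores)
```

Because recall only counts matches against the ground truth, the extra "X-99" report neither raises nor lowers the score. A recall-style metric therefore cannot tell whether such a report is a real issue or a false positive.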

To support reproducible evaluation, OpenAI developed a Rust-based harness that deploys contracts deterministically and restricts unsafe RPC methods. All exploit tasks run in an isolated local Anvil environment rather than on live networks.
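
The article does not say which RPC methods the harness restricts or how. As a rough sketch of the idea, the Python snippet below filters JSON-RPC requests by method namespace before they would reach a local node. The blocked prefixes and the function names are assumptions chosen for illustration; the real harness is written in Rust and may work differently.

```python
import json

# Hypothetical deny-list: namespaces that let a caller rewrite chain state
# directly (balances, impersonation, time travel) instead of exploiting the
# contract. This is NOT EVMbench's actual list.
BLOCKED_PREFIXES = ("anvil_", "hardhat_", "evm_", "debug_")


def is_allowed(method: object) -> bool:
    """Allow only non-empty string method names outside the blocked namespaces."""
    return isinstance(method, str) and bool(method) and not method.startswith(BLOCKED_PREFIXES)


def filter_request(raw: str) -> dict:
    """Return the request to forward, or a JSON-RPC 2.0 error response."""
    request = json.loads(raw)
    if is_allowed(request.get("method")):
        return {"forward": request}
    return {
        "jsonrpc": "2.0",
        "id": request.get("id"),
        "error": {"code": -32601, "message": "Method not found"},
    }


ok = filter_request('{"jsonrpc": "2.0", "id": 1, "method": "eth_sendRawTransaction", "params": []}')
denied = filter_request('{"jsonrpc": "2.0", "id": 2, "method": "anvil_setBalance", "params": []}')
print("forward" in ok, denied["error"]["code"])
```

Filtering at the RPC layer means an agent cannot pass an exploit task by setting its own balance. It has to move funds through ordinary transactions against the deployed contracts.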

Frontier model performance on EVMbench reveals clear behavioral differences across task types. In the exploit mode, GPT‑5.3‑Codex achieved a score of 72.2%, a substantial improvement over the 31.9% scored by GPT‑5, which was released approximately six months earlier.

Agents consistently perform best on exploit tasks, where the objective is explicit: drain funds and iterate until successful. Detect and patch modes remain harder, with agents sometimes stopping after identifying a single vulnerability rather than completing a full audit, and struggling to remove subtle flaws without breaking existing contract functionality.

OpenAI acknowledged that EVMbench does not fully reflect the difficulty of real-world smart contract security, and that its grading system cannot currently distinguish between true vulnerabilities and false positives when agents find issues beyond the human-auditor baseline.

Alongside the benchmark release, OpenAI committed $10 million in API credits through its Cybersecurity Grant Program to accelerate defensive security research, particularly for open-source software and critical infrastructure.

The company also announced the expansion of Aardvark, its security research agent, through a private beta program. EVMbench’s tasks, tooling, and evaluation framework have been released publicly to support continued research into AI-driven cyber capabilities.

