Categories: Cyber Security News

OpenAI Launches EVMbench to Detect, Patch, and Exploit Vulnerabilities in Blockchain Environments

OpenAI, in collaboration with crypto investment firm Paradigm, has introduced EVMbench, a new benchmark designed to evaluate the ability of AI agents to detect, patch, and exploit high-severity vulnerabilities in smart contracts.

The release marks a significant step in measuring AI capabilities within economically consequential environments, as smart contracts routinely secure over $100 billion in open-source crypto assets.

EVMbench draws on 120 curated vulnerabilities sourced from 40 security audits, with the majority derived from open code audit competitions on platforms such as Code4rena.

The benchmark also incorporates vulnerability scenarios from the security auditing process of the Tempo blockchain, a purpose-built Layer 1 designed for high-throughput stablecoin payments. This extends EVMbench’s scope into payment-oriented smart contract code, an area where agentic stablecoin transactions are expected to grow substantially.

Three Evaluation Modes

EVMbench evaluates AI agents across three distinct capability modes, each targeting a different phase of the smart contract security lifecycle.

Mode	Description
Detect	Agents audit a smart contract repository and are scored on recall of ground-truth vulnerabilities and associated audit rewards
Patch	Agents modify vulnerable contracts while preserving intended functionality, verified through automated tests and exploit checks
Exploit	Agents execute end-to-end fund-draining attacks against deployed contracts in a sandboxed blockchain environment, graded via transaction replay and on-chain verification
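
To make the detect-mode scoring more concrete, the sketch below computes plain recall and an award-weighted recall over a set of ground-truth findings. It is an illustration only: the finding identifiers, award amounts, and function names are hypothetical, and the actual EVMbench grader may match findings and weight audit rewards differently.

```python
from typing import Dict, Set


def detect_recall(reported: Set[str], ground_truth: Set[str]) -> float:
    """Fraction of ground-truth vulnerabilities that the agent reported."""
    if not ground_truth:
        return 0.0
    return len(reported & ground_truth) / len(ground_truth)


def award_weighted_recall(reported: Set[str], awards: Dict[str, float]) -> float:
    """Share of the total audit award attached to findings the agent reported."""
    total = sum(awards.values())
    if total == 0:
        return 0.0
    found = sum(amount for vuln, amount in awards.items() if vuln in reported)
    return found / total


# Hypothetical audit: three high-severity findings and their awards (USD).
awards = {"H-01": 6000.0, "H-02": 3000.0, "H-03": 1000.0}
reported = {"H-01", "H-03", "X-99"}  # X-99 is not in the ground truth

print(detect_recall(reported, set(awards)))     # 2 of 3 findings recalled
print(award_weighted_recall(reported, awards))  # 7000 of 10000 in awards
```

Under this reading, an agent that stops after one finding is penalized on both measures, and more heavily on the weighted one if it missed the highest-paying issue.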

To support reproducible evaluation, OpenAI developed a Rust-based harness that deploys contracts deterministically and restricts unsafe RPC methods. All exploit tasks run in an isolated local Anvil environment rather than on live networks.
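
The idea of restricting unsafe RPC methods can be sketched as a small filter placed in front of a local node. This is a Python illustration, not the Rust harness itself: the deny-list of namespaces and the `filter_rpc` function are assumptions, and the article does not say which methods the real harness blocks.

```python
from typing import Any, Dict

# Assumed deny-list: namespaces that let a caller rewrite chain state directly
# on a local Anvil node (for example anvil_setBalance or evm_revert).
BLOCKED_PREFIXES = ("anvil_", "hardhat_", "evm_", "debug_")


def filter_rpc(request: Dict[str, Any]) -> Dict[str, Any]:
    """Reject blocked JSON-RPC methods; mark everything else for forwarding."""
    method = str(request.get("method", ""))
    if method.startswith(BLOCKED_PREFIXES):
        # -32601 is the JSON-RPC 2.0 "Method not found" error code.
        return {
            "jsonrpc": "2.0",
            "id": request.get("id"),
            "error": {"code": -32601, "message": f"method {method} is not available"},
        }
    return {"forward": True, "request": request}


blocked = filter_rpc({"jsonrpc": "2.0", "id": 1, "method": "anvil_setBalance", "params": []})
allowed = filter_rpc({"jsonrpc": "2.0", "id": 2, "method": "eth_sendRawTransaction", "params": []})
print(blocked["error"]["code"], allowed["forward"])
```

A filter like this keeps an agent from simply crediting itself funds through a node cheat code, so that a successful exploit has to go through ordinary transactions.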

Frontier model performance on EVMbench reveals clear behavioral differences across task types. In exploit mode, GPT‑5.3‑Codex achieved a score of 72.2%, a substantial improvement over the 31.9% scored by GPT‑5, a model released roughly six months earlier.
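
For a sense of scale, the short calculation below works out the gap between the two reported exploit-mode scores. Only the two percentages come from the article; the variable names are illustrative.

```python
gpt_5_3_codex = 72.2  # exploit-mode score (%), as reported
gpt_5 = 31.9          # exploit-mode score (%), as reported

absolute_gain = gpt_5_3_codex - gpt_5  # difference in percentage points
relative_gain = gpt_5_3_codex / gpt_5  # newer score as a multiple of the older one

print(f"{absolute_gain:.1f} percentage points")  # 40.3 percentage points
print(f"{relative_gain:.2f}x the earlier score")  # 2.26x the earlier score
```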

Agents consistently perform best on exploit tasks, where the objective is explicit: drain funds and iterate until successful. Detect and patch modes remain harder, with agents sometimes stopping after identifying a single vulnerability rather than completing a full audit, and struggling to remove subtle flaws without breaking existing contract functionality.

OpenAI acknowledged that EVMbench does not fully reflect the difficulty of real-world smart contract security, and that its grading system cannot currently distinguish between true vulnerabilities and false positives when agents find issues beyond the human-auditor baseline.
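
The grading limitation can be illustrated with a grader that only knows the human-auditor findings. In the sketch below, anything an agent reports beyond that baseline falls into an "ungraded" bucket, because nothing in the ground truth can confirm or refute it. The function name and finding identifiers are hypothetical and are not taken from the EVMbench code.

```python
from typing import Dict, Set


def partition_findings(reported: Set[str], ground_truth: Set[str]) -> Dict[str, Set[str]]:
    """Split an agent's report by what a ground-truth-only grader can say about it."""
    return {
        "confirmed": reported & ground_truth,  # matches a human-auditor finding
        "missed": ground_truth - reported,     # known issue the agent did not report
        "ungraded": reported - ground_truth,   # may be a real bug or a false positive
    }


result = partition_findings(
    reported={"H-01", "H-02", "NEW-A", "NEW-B"},
    ground_truth={"H-01", "H-02", "H-03"},
)
print(sorted(result["confirmed"]), sorted(result["missed"]), sorted(result["ungraded"]))
```

Recall can be computed from the first two buckets alone, while a precision figure would require someone to review every item in the third.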

Alongside the benchmark release, OpenAI committed $10 million in API credits through its Cybersecurity Grant Program to accelerate defensive security research, particularly for open-source software and critical infrastructure.

The company also announced the expansion of Aardvark, its security research agent, through a private beta program. EVMbench’s tasks, tooling, and evaluation framework have been released publicly to support continued research into AI-driven cyber capabilities.

The post OpenAI Launches EVMbench to Detect, Patch, and Exploit Vulnerabilities in Blockchain Environments appeared first on Cyber Security News.

Published by

rssfeeds-admin

3 months ago
