OpenAI Launches EVMbench: A New Framework to Detect and Exploit Blockchain Vulnerabilities

OpenAI has collaborated with crypto investment firm Paradigm to release EVMbench, a new benchmark designed to evaluate how well artificial intelligence agents detect, patch, and exploit smart contract vulnerabilities.

With smart contracts currently securing over $100 billion in open-source crypto assets, the ability of AI to read, write, and audit that code is becoming critical to financial infrastructure.

EVMbench Capabilities and Methodology

The EVMbench framework is built on a dataset of 120 curated high-severity vulnerabilities sourced from 40 different audits and open code competitions.

It also includes specific vulnerability scenarios derived from the security audit of the Tempo blockchain.

To keep evaluation safe and reproducible, the system uses a Rust-based harness that restricts unsafe RPC methods and runs all exploit tasks in an isolated, local Anvil environment rather than on live networks.

This allows for rigorous testing without risking actual assets or network stability.
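The idea of restricting unsafe RPC methods can be illustrated with a minimal sketch. This is not the EVMbench harness, which is written in Rust. The article does not say which methods are blocked, so the prefix list below is an assumption, and the function names are hypothetical.

```python
# Hypothetical deny-list: the article does not name the restricted methods.
# These prefixes cover node "cheat" calls such as anvil_setBalance or evm_mine,
# which would let an agent change chain state without a real transaction.
BLOCKED_PREFIXES = ("anvil_", "hardhat_", "evm_", "debug_")


def is_allowed(method: str) -> bool:
    """Return True if a JSON-RPC method may be forwarded to the local node."""
    return not method.startswith(BLOCKED_PREFIXES)


def filter_request(request: dict) -> dict:
    """Forward allowed requests; answer blocked ones with a JSON-RPC error."""
    method = request.get("method", "")
    if is_allowed(method):
        return {"forward": True, "request": request}
    return {
        "forward": False,
        "response": {
            "jsonrpc": "2.0",
            "id": request.get("id"),
            # -32601 is the standard JSON-RPC "method not found" code.
            "error": {"code": -32601, "message": f"method {method} is disabled"},
        },
    }
```

A proxy built this way would sit between the agent and the local node, so standard calls such as `eth_call` pass through while state-manipulation shortcuts are rejected.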

The framework evaluates agents across three distinct capability modes designed to mimic real-world security tasks.

The Detect mode challenges agents to audit a smart contract repository and identify the ground-truth vulnerabilities documented in the original audits.

EVMbench Capability Modes

Capability Mode	Description	Verification Method
Detect	Agents audit repositories to find known vulnerabilities.	Scored on recall of ground-truth vulnerabilities and audit rewards.
Patch	Agents modify contracts to remove exploits while keeping functionality.	Verified through automated tests to ensure the exploit is gone and code compiles.
Exploit	Agents attempt to drain funds from a deployed contract.	Graded programmatically via transaction replay on a sandboxed blockchain.
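Recall-based scoring for Detect mode can be sketched as follows. This is a simplified illustration: it assumes findings can be matched by an identifier, which the article does not confirm, and it ignores the audit-reward component. The identifiers are made up.

```python
def detect_recall(ground_truth: set[str], reported: set[str]) -> float:
    """Fraction of ground-truth vulnerabilities that the agent reported."""
    if not ground_truth:
        return 0.0
    found = ground_truth & reported  # extra (false-positive) reports are ignored
    return len(found) / len(ground_truth)


# Hypothetical identifiers for illustration only.
truth = {"H-01", "H-02", "H-03", "H-04"}
agent_report = {"H-01", "H-03", "X-99"}
print(detect_recall(truth, agent_report))  # 0.5
```

Because recall only counts hits against the known list, an agent is not penalized here for reporting extra issues. A real grader may treat those differently.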

The Patch mode requires agents to fix the identified issues without breaking the contract’s intended functionality or causing compilation errors.

Finally, the Exploit mode tests an agent’s ability to execute end-to-end fund-draining attacks in a sandboxed environment, providing a clear metric of offensive capability that defenders must guard against.
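The Patch criteria described above amount to a conjunction of three checks. The sketch below assumes each check has already been run and reduced to a boolean. The type and field names are hypothetical and are not taken from EVMbench.

```python
from dataclasses import dataclass


@dataclass
class PatchResult:
    """Outcome of running the automated checks against a patched contract."""
    compiles: bool                # patched code builds without errors
    functional_tests_pass: bool   # intended behaviour is preserved
    exploit_still_succeeds: bool  # the original attack still works


def patch_accepted(result: PatchResult) -> bool:
    """A patch counts only if it builds, keeps functionality, and stops the exploit."""
    return (
        result.compiles
        and result.functional_tests_pass
        and not result.exploit_still_succeeds
    )
```

Under this rule, a patch that blocks the exploit by disabling the affected function would fail, since the functional tests would no longer pass.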

Model Performance and Safety Initiatives

The release of EVMbench highlights significant progress in AI model capabilities regarding cybersecurity tasks.

In the Exploit mode evaluation, OpenAI’s GPT-5.3-Codex achieved a success rate of 72.2 percent. That is more than double the 31.9 percent scored by the GPT-5 model, which was released just over six months ago.

Despite these gains in offensive testing, the report notes that performance remains weaker in detection and patching tasks.

Agents often struggle to maintain full functionality while removing subtle bugs, indicating that human oversight remains essential in the auditing process.

Recognizing the dual-use nature of cybersecurity tools, OpenAI is emphasizing an evidence-based approach to aid defenders.

This includes expanding its security research agent, Aardvark, and committing $10 million in API credits through its Cybersecurity Grant Program.

These initiatives aim to accelerate cyber defense for open-source software and critical infrastructure.

While EVMbench has limitations, such as not supporting complex timing mechanics or mainnet forks, it represents a significant step toward standardizing how AI interacts with blockchain security.
