OpenAI Launches EVMbench

OpenAI just stepped into the EVM, the Ethereum Virtual Machine, and brought a benchmark with teeth. The San Francisco AI lab, founded in 2015 by Sam Altman, Elon Musk, Greg Brockman, Ilya Sutskever, Wojciech Zaremba and others, introduced EVMbench, an open framework designed to test whether AI agents can actually perform smart contract security work or simply narrate competence. In a cycle crowded with model demos and leaderboard theater, this is the kind of tech news that separates claims from capability.

EVMbench was developed in collaboration with Paradigm, the San Francisco-based crypto investment firm founded in 2018 by Matt Huang and Fred Ehrsam, and OtterSec, the web3 security auditing firm founded by Robert Chen. The premise is simple and unforgiving. AI agents must detect real vulnerabilities, patch them without breaking core functionality, and exploit them end to end inside a sandboxed EVM. Detect, Patch, Exploit. Three verbs that turn theory into consequence.
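To make the three modes concrete, here is a minimal Python sketch of how a grader might score a single agent attempt. It is an illustration under stated assumptions, not EVMbench's actual code: the `Mode`, `Attempt` and `grade` names are hypothetical, Detect is scored here as simple recall against the audit's known findings, and finding IDs such as "H-01" are made up.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    DETECT = "detect"
    PATCH = "patch"
    EXPLOIT = "exploit"


@dataclass(frozen=True)
class Attempt:
    """Hypothetical record of one agent run against one audited codebase."""
    mode: Mode
    reported_ids: frozenset[str] = frozenset()  # DETECT: finding IDs the agent reported
    exploit_blocked: bool = False               # PATCH: the known exploit no longer works
    tests_pass: bool = False                    # PATCH: core functionality still passes tests
    exploit_succeeded: bool = False             # EXPLOIT: funds actually moved in the sandbox


def grade(attempt: Attempt, known_ids: frozenset[str]) -> float:
    """Score an attempt in [0, 1]; these rules are illustrative assumptions."""
    if attempt.mode is Mode.DETECT:
        # Assumed metric: recall over the findings listed in the audit report.
        if not known_ids:
            return 0.0
        return len(attempt.reported_ids & known_ids) / len(known_ids)
    if attempt.mode is Mode.PATCH:
        # A fix counts only if it stops the exploit AND keeps functionality intact.
        return 1.0 if attempt.exploit_blocked and attempt.tests_pass else 0.0
    # EXPLOIT: binary outcome, did the end-to-end attack execute?
    return 1.0 if attempt.exploit_succeeded else 0.0
```

The Patch rule encodes the constraint described above: a fix only counts if the attack stops working and core functionality survives, so a patch that blocks the exploit by breaking the contract scores zero.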

Underneath the surface sit 120 curated vulnerabilities drawn from 40 real audit reports, including findings connected to Tempo, the payments-focused Layer 1 co-developed by Paradigm and Stripe. These are not synthetic puzzles drafted for academic comfort. They are extracted from live codebases where security failures carry financial weight. Agents scan repositories, produce vulnerability reports, submit code fixes and, when prompted, attempt to drain funds in a controlled environment via JSON-RPC. If the exploit executes, the benchmark records it. If the patch breaks functionality, it records that too.
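The Exploit check ultimately rests on observable chain state. Here is a minimal sketch, assuming the sandbox exposes a standard Ethereum JSON-RPC endpoint and that success is judged by a drop in the victim contract's balance. That criterion is an assumption; the benchmark's real checks may be richer. The sketch builds an `eth_getBalance` request, parses the hex-encoded wei result, and compares before and after. The helper names and the `threshold_wei` parameter are hypothetical, and the code only builds and parses payloads, so no node is needed to run it.

```python
import json


def get_balance_request(address: str, request_id: int = 1) -> str:
    # Standard Ethereum JSON-RPC 2.0 call; "latest" reads the most recent block.
    return json.dumps({
        "jsonrpc": "2.0",
        "method": "eth_getBalance",
        "params": [address, "latest"],
        "id": request_id,
    })


def parse_balance(response_body: str) -> int:
    # The result is a hex-encoded quantity in wei, e.g. "0x0".
    payload = json.loads(response_body)
    if "error" in payload:
        raise RuntimeError(payload["error"])
    return int(payload["result"], 16)


def funds_drained(before_wei: int, after_wei: int, threshold_wei: int = 1) -> bool:
    # Hypothetical success criterion: the victim lost at least threshold_wei.
    return before_wei - after_wei >= threshold_wei
```

In a live harness, the request body would be POSTed to the sandbox node's RPC URL before and after the agent runs. For example, `parse_balance('{"jsonrpc":"2.0","id":1,"result":"0xde0b6b3a7640000"}')` returns `10**18`, one ether in wei.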

The technical paper credits Justin Wang, Andreas Bigger, Xiaohai Xu, Justin W. Lin, Andy Applebaum, Tejal Patwardhan, Alpin Yukseloglu, and Olivia Watkins. OpenAI, Paradigm, OtterSec. Research depth from AI, crypto, and frontline auditing in one frame. Most coverage names the organizations, not the individuals, but the author list tells you who engineered the test itself. That matters in tech news, where credibility is often buried beneath volume.

Smart contracts secure tens of billions of dollars in value. Every overlooked vulnerability is a liability with a clock attached. As AI models sharpen in code understanding and generation, the question shifts from "can they write contracts?" to "can they secure them?" and, if not, "can they exploit them faster than human defenders can react?"

OpenAI has already expanded its security posture with initiatives like Trusted Access for Cyber and API credits aimed at defenders. EVMbench aligns with that trajectory. It is not a product pitch. It is a measuring instrument published in public view. In a market saturated with forward-looking statements, measurement is leverage.

Crypto has long pursued trust-minimized systems. AI pursues capability-maximized systems. EVMbench stands between them and forces both to prove performance under pressure. This is the strain of tech news that does not fade after a 24-hour cycle. It lingers in roadmaps, audit workflows, and model evaluations. The only real question now is who runs the benchmark next, and what it reveals when their agent meets the code.

Related Articles

GoodShip and Cargado Announce June 2026 Cross-Border Freight Benchmarking Partnership

Perplexity Launches Brain to Give AI Agents Self-Improving Work Memory

Anaplan Launches Agentic Enterprise to Expand Enterprise AI Decision Infrastructure

Jane Street's $7B CoreWeave Bet Signals a Compute Shift

AI Capital Flows Are Becoming the Market's Strongest Signal

More from Jesse Landry

Kind Designs Raises $10M Pre-Series A for Living Seawalls

Alpaca Raises $135M to Expand AI-Native Brokerage Infrastructure

Lendistry Secures $100M East West Bank Credit Facility for Airport Lending

Trending

Skate to Where the Workload Is Going Examines the Next Enterprise AI PC Decision

Dayforce

360 Capital