Jesse Landry

OpenAI Launches EVMbench


OpenAI just stepped into the EVM and brought a benchmark with teeth. From San Francisco, the AI lab founded in 2015 by Sam Altman, Elon Musk, Greg Brockman, Ilya Sutskever, Wojciech Zaremba and others introduced EVMbench, an open framework designed to test whether AI agents can actually perform in smart contract security or simply narrate competence. In a cycle crowded with model demos and leaderboard theater, this is the kind of tech news that separates claims from capability.

EVMbench was developed in collaboration with Paradigm, the San Francisco-based crypto investment firm founded in 2018 by Matt Huang and Fred Ehrsam, and OtterSec, the web3 security auditing firm founded by Robert Chen. The premise is simple and unforgiving. AI agents must detect real vulnerabilities, patch them without breaking core functionality, and exploit them end to end inside a sandboxed EVM. Detect, Patch, Exploit. Three verbs that turn theory into consequence.

Underneath the surface sit 120 curated vulnerabilities drawn from 40 real audit reports, including findings connected to Tempo, the payments-focused Layer 1 co-developed by Paradigm and Stripe. These are not synthetic puzzles drafted for academic comfort. They are extracted from live codebases where security failures carry financial weight. Agents scan repositories, produce vulnerability reports, submit code fixes, and, when prompted, attempt to drain funds in a controlled environment via JSON-RPC. If the exploit executes, the benchmark records it. If the patch breaks functionality, it records that too.
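To make the exploit step concrete, here is a minimal sketch of how a drained contract could be confirmed over JSON-RPC. It is not EVMbench's actual grader. `eth_getBalance` is a standard Ethereum JSON-RPC method, but the helper names, the address, the canned responses, and the "balance went down" success rule are all assumptions made for illustration.

```python
import json


def balance_request(address: str, request_id: int = 1) -> str:
    """Build an eth_getBalance JSON-RPC 2.0 request body for the latest block."""
    return json.dumps({
        "jsonrpc": "2.0",
        "method": "eth_getBalance",
        "params": [address, "latest"],
        "id": request_id,
    })


def parse_balance(response_body: str) -> int:
    """Decode the hex-encoded wei balance from a JSON-RPC response body."""
    return int(json.loads(response_body)["result"], 16)


def exploit_drained(before_wei: int, after_wei: int) -> bool:
    """Hypothetical success rule: the target contract holds less than before."""
    return after_wei < before_wei


# Hypothetical target address; a real run would POST this body to the sandbox node.
request_body = balance_request("0x0000000000000000000000000000000000000001")

# Canned responses stand in for the sandboxed node (assumed values).
before = parse_balance('{"jsonrpc": "2.0", "id": 1, "result": "0xde0b6b3a7640000"}')
after = parse_balance('{"jsonrpc": "2.0", "id": 2, "result": "0x0"}')

print(exploit_drained(before, after))
```

A balance comparison is only one plausible signal. A real harness would also need to replay the project's own tests after a patch to catch fixes that break functionality.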

The technical paper credits Justin Wang, Andreas Bigger, Xiaohai Xu, Justin W. Lin, Andy Applebaum, Tejal Patwardhan, Alpin Yukseloglu, and Olivia Watkins. Three organizations, OpenAI, Paradigm, and OtterSec, put research depth from AI, crypto, and frontline auditing in one frame. Most coverage names the organizations, not the individuals, but the author list tells you who engineered the test itself. That matters in tech news, where credibility is often buried beneath volume.

Smart contracts secure tens of billions in value. Every overlooked vulnerability is a liability with a clock attached. As AI models sharpen in code understanding and generation, the question shifts from whether they can write contracts to whether they can secure them and, if not, whether they can exploit them faster than human defenders can react.

OpenAI has already expanded its security posture with initiatives like Trusted Access for Cyber and API credits aimed at defenders. EVMbench aligns with that trajectory. It is not a product pitch. It is a measuring instrument published in public view. In a market saturated with forward-looking statements, measurement is leverage.

Crypto has long pursued trust-minimized systems. AI pursues capability-maximized systems. EVMbench stands between them and forces both to prove performance under pressure. This is the strain of tech news that does not fade after a 24-hour cycle. It lingers in roadmaps, audit workflows, and model evaluations. The only real question now is who runs the benchmark next, and what it reveals when their agent meets the code.