RadixArk did not show up quietly. It arrived the way real infrastructure stories do, already running hot, already embedded, already necessary. Founded in August 2025 in Redwood City, California, RadixArk spun out of SGLang, an open source inference engine born two years earlier inside the UC Berkeley lab of Ion Stoica. This was not a demo project looking for a business model. This was production software solving a problem most people only notice when their AI bill hits seven figures and the GPUs start sweating.
On January 20, 2026, the market caught up. RadixArk was reported to be valued at $400 million in a funding round led by Accel. The amount raised was not disclosed, which is often how you know the leverage sits with the builders. Five months from public launch to a valuation that puts the company squarely in the infrastructure heavyweight class is not luck. It is what happens when the pipes matter more than the paint.
At the center of this is SGLang, now stewarded by RadixArk as both an open source engine and a commercial platform. It is deployed across more than 100,000 GPUs globally, generating trillions of tokens every day. It is the dominant inference engine for AMD, the default for xAI, and deeply embedded across cloud providers, frontier labs, and enterprise platforms. Throughput gains of five to six times over baseline systems are not marketing copy. They are operational math that changes balance sheets.
The technical spine is RadixAttention, a KV cache reuse system that treats inference as a memory problem instead of a brute force one. Requests are organized in a radix tree over their tokens, so shared prompt prefixes are detected and reused automatically. Redundant compute disappears. Cache hit rates climb as high as ninety-nine percent in production. Structured output runs three times faster. JSON stops being a tax. This is the kind of work that does not trend on social media, but it quietly decides who can afford to scale.
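To see why shared prefixes turn into reuse, here is a minimal Python sketch of the core idea: a tree keyed by tokens, where each new request walks the tree to find how much of its prompt has already been computed. This is an illustration under stated assumptions, not SGLang's implementation; real RadixAttention compresses token runs into radix-tree edges, stores GPU KV-cache blocks at the nodes, and evicts them under an LRU policy. Every name below is hypothetical.

```python
class RadixNode:
    def __init__(self):
        self.children = {}  # token -> RadixNode
        self.kv = None      # stand-in for cached attention KV state

class PrefixCache:
    """Token-level trie; a true radix tree would compress edges."""

    def __init__(self):
        self.root = RadixNode()

    def match_prefix(self, tokens):
        """Return how many leading tokens already have cached KV state."""
        node, hits = self.root, 0
        for t in tokens:
            if t not in node.children:
                break
            node = node.children[t]
            hits += 1
        return hits

    def insert(self, tokens):
        """Record a request's tokens, 'computing' KV only where it is missing."""
        node = self.root
        for t in tokens:
            node = node.children.setdefault(t, RadixNode())
            if node.kv is None:
                node.kv = f"kv({t})"  # placeholder for real GPU cache blocks

cache = PrefixCache()
system_prompt = [101, 7, 7, 42, 42]                 # shared system-prompt tokens
cache.insert(system_prompt + [5, 9])                # request 1: computed from scratch
hits = cache.match_prefix(system_prompt + [5, 13])  # request 2: mostly cached
print(f"reused {hits} of {len(system_prompt) + 2} tokens")  # reused 6 of 7
```

The arithmetic is the point: when thousands of requests share a long system prompt, the matched prefix dominates each sequence, which is how production hit rates climb toward that ninety-nine percent figure.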
The company is led by Ying Sheng, Co-Founder and Chief Executive Officer, who left xAI to build this full time, and Banghua Zhu, Co-Founder and Chief Technology Officer, a UC Berkeley PhD, NVIDIA Principal Research Scientist, and University of Washington professor. The broader SGLang core team reads like a systems hall of fame in the making, including Lianmin Zheng, Liangsheng Yin, Yineng Zhang, Ke Bao, Byron Hsu, Chenyang Zhao, and Zhiqiang Xie, backed by academic gravity from Ion Stoica and Joseph E. Gonzalez. This is not a solo act. It is a deep bench.
RadixArk is also widening the frame with Miles, an enterprise reinforcement learning framework released in November 2025 that closes the loop between training and inference. On-policy learning, zero KL drift, production-scale MoE support. Inference is no longer the end of the story. It is part of a feedback system that compounds advantage.
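What "on-policy" buys is easiest to see in miniature. The sketch below is a generic REINFORCE loop, not Miles code, and every name in it is ours. The point is only that the weights that sample an action are the same weights the gradient updates, so the behavior policy and the trained policy never drift apart, which is the property the zero KL drift claim is about.

```python
import math
import random

random.seed(0)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

theta = 0.0  # single policy parameter: pi(action=1) = sigmoid(theta)
lr = 0.5

for step in range(200):
    # Sampling ("inference") and the update ("training") share one parameter:
    # the action comes from the current theta, and the gradient is taken
    # against that same theta, so there is no stale-policy correction needed.
    p1 = sigmoid(theta)
    action = 1 if random.random() < p1 else 0
    r = 1.0 if action == 1 else 0.0                # stand-in reward signal
    grad_logp = (1 - p1) if action == 1 else -p1   # d/dtheta of log pi(action)
    theta += lr * r * grad_logp                    # on-policy REINFORCE update

print(f"pi(action=1) after training: {sigmoid(theta):.3f}")  # climbs toward 1
```

In a real system the sampler is an inference fleet and the update is a gradient step over token log-probs, but the invariant is the same: push the updated weights back to the sampler before the next rollout batch.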
AI does not fail because models are dumb. It fails because infrastructure gets expensive, brittle, and slow. RadixArk is betting that the next decade belongs to the teams that make intelligence cheap, fast, and repeatable, and its engine is already running in production while the rest of the market is still arguing about prompts.